Reverse Geocode creates addresses from point geometries and returns them as string values. This process requires a Spark DataFrame containing the points that you want to reverse geocode and a locator. The tool matches the points against reference data in a locator and returns the addresses of the points as strings along with other output columns.
Usage notes
- The input DataFrame needs to have a point column to be able to run Reverse Geocode.
- If the spatial reference of the input DataFrame is different from that of the locator, the input will be transformed to match the locator.
- The fields from the input DataFrame will always be included in the output.
- The result fields in the output DataFrame are determined by the predefined parameter in the setOutFields() setter:
  - Minimal: Returns the Match_addr and Addr_type fields. This is the default option.
  - MinimalAndUserFields: Returns the fields defined in Minimal and the custom output fields available in the locator. User-defined fields can be configured during the process of creating a locator in ArcGIS Pro. For more information about locators, read the geocoding core concept.
  - All: Returns all available output fields including any custom output fields defined in your locator.
- When an input DataFrame contains a field that has the same name as one of the output fields in the reverse geocoded result, the output field will be automatically renamed with a suffix of "1". For example, if a field named Address already exists in the input DataFrame, the result DataFrame will have a field named Address from the input, and Address1 representing the output field.
- The output DataFrame will contain the same number of records as the input DataFrame. Unmatched records are indicated by a null value for Match_addr.
- If there are no records in the locator that can be associated with the input geometry, a match address will not be returned. The following are common causes for unmatched records:
  - The geometry contains null coordinates.
  - The coordinates are invalid or cannot be transformed to the locator's spatial reference.
  - The locator does not contain reference addresses near the geometry.
  - An address type was specified for which there are no good matches within a reasonable distance.
- You can use setLanguageCode() to set the language in which reverse geocoded addresses will be returned. When a given language code is not available in the locator, the tool will return results in the default language of the locator. The code should follow the ISO 639-3 standard.
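The renaming rule for clashing field names can be sketched in plain Python. This only illustrates the behavior described in the notes above and is not the tool's implementation. The helper name `resolve_output_names` is hypothetical, and the sketch assumes the suffixed name does not itself clash with another input field, a case the notes do not cover.

```python
def resolve_output_names(input_fields, result_fields):
    """Return the column names expected in the output DataFrame.

    Input fields are kept unchanged. A result field whose name already
    exists among the input fields gets the suffix "1".
    """
    existing = set(input_fields)
    renamed = [
        name + "1" if name in existing else name
        for name in result_fields
    ]
    return list(input_fields) + renamed


# The input already has an Address column, so the result field is renamed.
print(resolve_output_names(["shape", "Address"],
                           ["Match_addr", "Addr_type", "Address"]))
# ['shape', 'Address', 'Match_addr', 'Addr_type', 'Address1']
```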
Limitations
Geocoding with GeoAnalytics Engine requires a locator file. Using a locator service, such as the ArcGIS World Geocoding Service, is not supported.
Results
The result of Reverse Geocode is a copy of the input DataFrame with new fields added depending on the setOutFields() setter. The fields that are returned are based on the value of the predefined parameter in that setter.
There are three options:
- Minimal: Match_addr and Addr_type are returned. This is the default option.
- MinimalAndUserFields: Match_addr, Addr_type, and any custom output fields available in the locator are returned.
- All: All fields are returned including any custom fields defined in your locator.
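As a rough illustration of the three options, the sketch below maps an option name to the result fields it would add. The function `expected_result_fields` and the sample field lists are hypothetical; the real set of fields depends on the locator. The comparison is case-insensitive only because the example in this topic passes "minimal" while the option list spells it Minimal. Whether the tool itself ignores case is an assumption.

```python
MINIMAL_FIELDS = ["Match_addr", "Addr_type"]


def expected_result_fields(option, standard_fields, user_fields):
    """Return the result fields implied by a setOutFields option.

    standard_fields: all standard output fields of the locator (assumed input).
    user_fields: custom output fields defined in the locator (assumed input).
    """
    key = option.lower()
    if key == "minimal":
        return list(MINIMAL_FIELDS)
    if key == "minimalanduserfields":
        return MINIMAL_FIELDS + list(user_fields)
    if key == "all":
        return list(standard_fields) + list(user_fields)
    raise ValueError(f"Unknown option: {option!r}")


# Hypothetical locator with four standard fields and one custom field.
standard = ["Match_addr", "Addr_type", "City", "Postal"]
custom = ["ParcelID"]
print(expected_result_fields("MinimalAndUserFields", standard, custom))
# ['Match_addr', 'Addr_type', 'ParcelID']
```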
The fields are detailed in the table below.
Field | Description |
---|---|
Loc_name | The name of the locator used to return a match result. This field is available only if the locator used for matching the table is a composite locator. |
Match_addr | The address where the matched location actually resides based on the information of the matched candidate. |
LongLabel | A longer version of Match_addr containing more administrative information. |
ShortLabel | A shortened version of Match_addr. |
Addr_type | The geocoded address type, which indicates the level at which the address matched. Supported match levels vary between countries. The table at the bottom of this section describes some possible values. |
Type | The feature type for results returned by a search. The Type field only includes a value for candidates with an address type of POI or Locality. For example, the feature type of Starbucks might be Coffee Shop. |
PlaceName | The formal name of a geocode match candidate (e.g., Paris or Starbucks). |
Place_addr | The full street address of a place, including street, city, and region (e.g., 275 Columbus Ave., New York, New York). |
Phone | The primary phone number of a place. |
URL | The URL of the primary website for a place. |
Rank | A number that indicates the importance of a result relative to other results with the same name. The smaller numbers represent higher-ranked features. Rank values are based on population or feature type. For example, there are cities in France and Texas named Paris. Paris, France, has a greater population than Paris, Texas, so it will have a higher rank. |
AddBldg | The name of a building (e.g., Empire State Building). |
AddNum | The alphanumeric value that represents the portion of an address typically known as a house number or building number. This value is returned for Point and Street matches only. |
AddNumFrom | A value representing the beginning number of a street address range. It is relative to direction of feature digitization and is not always the smallest number in the range. This value is provided for Street match results. |
AddNumTo | A value representing the ending number of a street address range. It is relative to direction of feature digitization and is not always the largest number in the range. This value is provided for Street match results. |
AddRange | The full address number range for the street segment that an address lies on, in the format AddNumFrom-AddNumTo. For example, the AddRange value for the street address 123 Main St. may be 101-199. |
Side | The side of the street where an address resides relative to the direction of feature digitization. This value is not relative to the direction of travel along the street. L indicates that an address is matched to the left side while R means the address is matched to the right side of the street. No value indicates that the address is not matched or the locator could not determine the side of the street. |
StPreDir | An address element defining the direction of a street, which occurs before the primary street name (e.g., North in North Main Street). |
StPreType | An address element defining the leading type of a street (e.g., Avenida in Avenida Central or Rue in Rue Lapin). |
StName | An address element defining the primary name of a street (e.g., Main in North Main Street). |
StType | An address element defining the trailing type of a street (e.g., Street in Main Street). |
StDir | An address element defining the direction of a street, which occurs after the primary street name (e.g., North in Main Street North). |
StPreDir1 | An address element defining the leading direction of the first street in an intersection. |
StPreType1 | An address element defining the leading type of the first street in an intersection. |
StName1 | An address element defining the primary name of the first street in an intersection. |
StDir1 | An address element defining the trailing direction of the first street in an intersection. |
StPreDir2 | An address element defining the leading direction of the second street in an intersection. |
StPreType2 | An address element defining the leading type of the second street in an intersection. |
StName2 | An address element defining the primary name of the second street in an intersection. |
StDir2 | An address element defining the trailing direction of the second street in an intersection. |
BldgName | The name or number of a building subunit (e.g., A in Building A). |
BldgType | The classification of a building subunit. Examples include building, hangar, and tower. |
LevelType | The classification of a floor subunit. Examples include floor, level, and department. |
LevelName | The name or number of a floor subunit (e.g., 3 in Level 3). |
UnitType | The classification of a unit subunit. Examples include unit, apartment, and suite. |
UnitName | The name or number of a unit subunit (e.g., 2B in Apartment 2B). |
SubAddr | The full subunit value for a candidate with an address type of Subaddress. |
StAddr | The street address of a place without a zone, such as city or state (e.g., 275 Columbus Ave). |
Address | The full address of a place (e.g., 2000 MCMILLAN AVE, COMPTON, CA 90220). |
Block | The name of the block-level administrative division for a candidate. A block is the smallest administrative area for a country. It can be described as a subdivision of sector or neighborhood or a named city block. It is not commonly used. |
Sector | The name of the sector-level administrative division for a candidate. A sector is a subdivision of neighborhood, district, or a collection of blocks. It is not commonly used. |
Nbrhd | The name of the neighborhood-level administrative division for a candidate. A neighborhood is a subsection of a city or district. For example, Little Italy is the name of a neighborhood in the city of San Diego, California. |
Neighborhood | The name of the neighborhood-level administrative division for a candidate. It is an alias for the field Nbrhd. |
District | The name of the district-level administrative division for a candidate, such as a subdivision of a city. For example, Wilhelmsburg is a district in the city of Hamburg in Germany. |
City | The name of the city-level administrative division for a candidate. City is a subdivision of a subregion or region. For example, Atlanta is a city within Fulton County in the state of Georgia. |
MetroArea | The name of the metropolitan area-level administrative division for a candidate. This is usually an urban area consisting of a large city and the smaller cities surrounding it. This can potentially intersect multiple subregions or regions. An example is the Kolkata Metropolitan Area in India. |
Subregion | The name of the subregion-level administrative division for a candidate. Subregion is a subdivision of a region. For example, San Diego County is a subregion of the state of California. |
Region | The name of the region-level administrative division for a candidate. This can be a subdivision of a country or territory. It is typically the largest administrative area for a country (such as state or province) if the Territory administrative division is not used. |
RegionAbbr | The abbreviated region name. For example, the abbreviated name for California is CA. |
Territory | The name of the territory-level administrative division for a candidate. This is a subdivision of a country and is not commonly used. An example is the Sudeste macroregion of Brazil, which encompasses the states of Espírito Santo, Minas Gerais, Rio de Janeiro, and São Paulo. |
Postal | An alphanumeric address element defining the primary postal code (e.g., V7M 2B4 or 92374). |
PostalExt | An alphanumeric address element defining the postal code extension (e.g., 8110 in 92373-8110). |
Country | A three-character code for a country that follows the ISO 3166-1 alpha-3 standard. |
CntryName | The full country name for an address candidate. The name may be in the same language as the input address, or in the language specified by the lang parameter. If the full country name is not available in the specified language, the primary language of the country is used (e.g., 日本 for Japan). |
LangCode | A three-character language code representing the language of the address. The code follows the ISO 639-3 standard. |
X | The primary x-coordinate of the matched address in the spatial reference of the locator. |
Y | The primary y-coordinate of the matched address in the spatial reference of the locator. |
DisplayX | The display x-coordinate of an address returned in the spatial reference of the locator. |
DisplayY | The display y-coordinate of an address returned in the spatial reference of the locator. |
Xmin | The minimum x-coordinate of a geocode result. |
Xmax | The maximum x-coordinate of a geocode result. |
Ymin | The minimum y-coordinate of a geocode result. |
Ymax | The maximum y-coordinate of a geocode result. |
ExInfo | A collection of strings from the input that could not be matched to any part of an address and were used to score or penalize the result. |
The table below outlines the possible values for Addr_type:
Value | Description |
---|---|
Subaddress | A street address based on points that represent house and building subaddress locations. Typically, this is the most spatially accurate match level. The subaddress elements of unit type and unit identifier help to distinguish one subaddress within or between structures from another when several occur within the same location. Reference data contains address points or polygons with associated house numbers, street names, and subaddress elements, along with administrative divisions and optional postal code. An example is 3836 Emerald Ave., Suite C, La Verne, CA 91750. |
PointAddress | A street address based on points that represent house and building locations. Reference data contains address points with associated house numbers and street names, along with administrative divisions and optional postal code. The X, Y, and geometry output values for a PointAddress match represent the street entry location for the address; this is the location used for routing operations. The DisplayX and DisplayY values represent the rooftop or actual location of the address. An example is 380 New York St., Redlands, CA 92373. |
Parcel | A plot of land that is considered real property and may include one or more homes or other structures. A parcel typically has an address and parcel identification number assigned to it, such as 17 011100120063. |
StreetAddress | A street address that differs from PointAddress because the house number is interpolated from a range of numbers. Reference data contains street centerlines with house number ranges, along with administrative divisions and optional postal code information. An example is 647 Haight St., San Francisco, CA 94117. |
StreetInt | A street address consisting of a street intersection along with city and optional state and postal code information. An example is Redlands Blvd. & New York St., Redlands, CA 92373. |
StreetAddressExt | An estimated street address match that is returned when parameter matchOutOfRange=true and the input house number exceeds the house number range for the matched street segment. |
POI | Points of interest. Reference data consists of administrative division, place-names, businesses, landmarks, and geographic features. An example is Starbucks. |
DistanceMarker | A street address that represents the linear distance along a street, typically in kilometers or miles, from a designated origin location. An example is Carr 682 KM 4, Barceloneta, 00617. |
StreetMidBlock | The estimated midpoint of a range of house numbers along a street segment that correspond to a city block. An example is 100 Block of Grant Ave, Millville, New Jersey. The location returned for a StreetMidBlock match is more precise than that of a StreetName match, but less precise than a StreetAddress match. This is currently only functional for the United States. |
StreetName | Similar to a street address but without the house number. Reference data contains street centerlines with associated street names (no numbered address ranges), along with administrative divisions and optional postal code. An example is W Olive Ave., Redlands, CA 92373. |
PostalExt | A postal code with an additional extension (e.g., 90210-3841). Reference data is postal code points with extensions. |
Postal | Postal code (e.g., 90210). Reference data is postal code points. |
PostalLoc | A combination of postal code and city name. Reference data is typically a union of postal boundaries and administrative (locality) boundaries. An example is 7132 Frauenkirchen. |
Locality | A place-name representing a populated place. The Type output field provides more detailed information about the type of populated place. Possible Type values for Locality matches include Block, Sector, Neighborhood, District, City, MetroArea, County, State or Province, Territory, Country, and Zone. |
Feature | A geocoding result returned by a locator created with the Create Feature Locator tool in ArcGIS Pro. |
LatLong | An x,y coordinate pair. The LatLong address type is returned when an x,y coordinate pair such as 117.155579,32.703761 is the input. |
XY-XY | A match based on the assumption that the first coordinate of the input is longitude and the second is latitude. |
YX-YX | A match based on the assumption that the first coordinate of the input is latitude and the second is longitude. |
MGRS | A Military Grid Reference System (MGRS) location, such as 46VFM5319397841. |
USNG | A United States National Grid (USNG) location, such as 15TXN29753883. |
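To see which of the match levels above a run actually produced, the Addr_type column of the result can be tallied. The helper below is a minimal sketch: `count_by_address_type` is a hypothetical name, and `result` is assumed to be the Spark DataFrame returned by the tool. It uses only the standard PySpark DataFrame methods `groupBy`, `count`, and `orderBy`.

```python
def count_by_address_type(result):
    """Count rows per Addr_type, most frequent first.

    `result` is assumed to be the DataFrame returned by
    ReverseGeocode().run(...), so it has an Addr_type column.
    """
    return (
        result.groupBy("Addr_type")        # one group per match level
        .count()                           # adds a column named "count"
        .orderBy("count", ascending=False)
    )


# Usage, assuming `result` exists in the session:
# count_by_address_type(result).show()
```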
How Reverse Geocode works
See the geocoding core concept topic for more information on the geocoding process.
Performance notes
To improve performance, limit the number of output fields returned in the tool output. For example, returning only the Minimal set of output fields should take less time to complete than returning All output fields.
Syntax
For more details, go to the GeoAnalytics Engine API reference for reverse geocode.
Setter | Description | Required |
---|---|---|
run(dataframe) | Runs the Reverse Geocode tool using the provided DataFrame. | Yes |
setLocator() | Sets the address locator that will be used to reverse geocode the geometries. | Yes |
setFeatureTypes() | Specifies the possible match types that will be returned. A single value or multiple values can be specified. Available values are Subaddress, PointAddress, StreetAddress, DistanceMarker, StreetInt, StreetName, Postal, Locality, and POI. | No |
setLanguageCode() | Sets the language in which reverse geocoded addresses are returned. | No |
setOutFields() | Sets the fields that will be included in the output DataFrame. The predefined parameter can accept three options: 'Minimal' (default), 'MinimalAndUserFields', and 'All'. | No |
Examples
Run Reverse Geocode
```python
# Log in
import geoanalytics
geoanalytics.auth(username="myusername", password="mypassword")

# Imports
from geoanalytics.tools import ReverseGeocode
from geoanalytics.sql import functions as ST

# URL to the public schools data
data_url = r"https://services1.arcgis.com/Ua5sjt3LWTPigjyD/arcgis/rest/services/" \
           "Public_School_Location_201819/FeatureServer/0"

# Create a public schools DataFrame
df = spark.read.format("feature-service").load(data_url) \
    .withColumn("shape", ST.transform("shape", 4326)) \
    .select("shape") \
    .where("STATE='CA'")

# Access the locator
# This needs to be accessible to the machine that is running the Reverse Geocode tool.
# If running on a cluster, it needs to be accessible to all nodes in the cluster.
north_america_locator = r"/data/NA_locator.loc"

# Use Reverse Geocode to convert the coordinates into addresses
result = ReverseGeocode() \
    .setLocator(north_america_locator) \
    .setOutFields("minimal") \
    .setLanguageCode("ENG") \
    .setFeatureTypes("POI") \
    .run(df)

# Show the first 5 outputs
result.show(5)
```

```
+--------------------+--------------------+---------+
|               shape|          Match_addr|Addr_type|
+--------------------+--------------------+---------+
|{"x":-118.2159902...| Vasquez High School|      POI|
|{"x":-118.1856342...|Meadowlark Elemen...|      POI|
|{"x":-118.1951402...|  High Desert School|      POI|
|{"x":-121.9655031...|California School...|      POI|
|{"x":-121.9633661...|California School...|      POI|
+--------------------+--------------------+---------+
only showing top 5 rows
```
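Because unmatched records keep their row and carry a null Match_addr, the result of the example above could be separated into matched and unmatched rows as a follow-up step. This is a minimal sketch rather than part of the tool: `split_by_match` is a hypothetical helper, and it relies only on the standard PySpark `DataFrame.where` method with SQL expression strings.

```python
def split_by_match(result):
    """Split a Reverse Geocode result into matched and unmatched rows.

    `result` is assumed to be the DataFrame returned by
    ReverseGeocode().run(...). Unmatched rows have a null Match_addr.
    """
    matched = result.where("Match_addr IS NOT NULL")
    unmatched = result.where("Match_addr IS NULL")
    return matched, unmatched


# Usage, assuming `result` from the example above:
# matched, unmatched = split_by_match(result)
# print(unmatched.count())
```

Inspecting the unmatched rows is a reasonable first step when checking for the common causes listed in the usage notes, such as null coordinates.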
Version table
Release | Notes |
---|---|
1.3.0 | Tool introduced |