Introduction
The batch_geocode()
function in the arcgis.geocoding
module geocodes an entire list of addresses. Geocoding many addresses at once is also known as bulk geocoding. You can use this function to find the following types of locations:
- Street addresses (e.g. 27488 Stanford Ave, Bowden, North Dakota, or 380 New York St, Redlands, CA 92373)
- Administrative place names, such as city, county, state, province, or country names (e.g. Seattle, Washington, State of Mahārāshtra, or Liechtenstein)
- Postal codes (e.g. 92591 or TW9 1DN)
Batch sizes (max and suggested batch sizes)
There is a limit to the maximum number of addresses that can be geocoded in a single batch request with the geocoder. The MaxBatchSize
property defines this limit. For instance, if MaxBatchSize=2000, and 3000 addresses are sent as input, only the first 2000 will be geocoded. The SuggestedBatchSize
property is also useful as it specifies the optimal number of addresses to include in a single batch request.
Both of these properties can be determined by querying the geocoder:
from arcgis.gis import GIS
from arcgis.geocoding import get_geocoders, batch_geocode
from arcgis.map import Map
from arcgis.map.popups import PopupInfo
gis = GIS(profile="your_enterprise_profile")
# use the first of GIS's configured geocoders
geocoder = get_geocoders(gis)[0]
print("For current geocoder:")
print(" - MaxBatchSize: " + str(geocoder.properties.locatorProperties.MaxBatchSize))
print(" - SuggestedBatchSize: " + str(geocoder.properties.locatorProperties.SuggestedBatchSize))
For current geocoder:
 - MaxBatchSize: 1000
 - SuggestedBatchSize: 150
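Because addresses beyond the batch limit are not geocoded, a longer list has to be split before it is sent. The sketch below is one way to do that; `chunked` and `geocode_in_batches` are hypothetical helper names (not part of `arcgis`), and the geocoding function is passed in as an argument so the helper itself does not depend on a live connection.

```python
from typing import Callable, Iterator, List, Sequence


def chunked(items: Sequence, size: int) -> Iterator[list]:
    """Yield consecutive slices of `items` holding at most `size` entries."""
    if size < 1:
        raise ValueError("size must be a positive integer")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def geocode_in_batches(addresses: Sequence, batch_size: int,
                       geocode_fn: Callable[[list], list]) -> List:
    """Call `geocode_fn` once per chunk and concatenate whatever it returns."""
    results = []
    for batch in chunked(addresses, batch_size):
        results.extend(geocode_fn(batch))
    return results


# Illustration only: a stand-in function instead of a real geocoder.
print(geocode_in_batches(["a", "b", "c"], 2, lambda batch: [s.upper() for s in batch]))
```

With a connected GIS, the same helper could presumably be called as `geocode_in_batches(addresses, geocoder.properties.locatorProperties.SuggestedBatchSize, batch_geocode)`.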
Batch geocode single line addresses, multi-line addresses
The batch_geocode()
function supports searching for lists of places and addresses. Each address in the list can be specified as a single line of text (single field format), or in multi-field format with the address components separated into multiple parameters.
The code snippet below calls help() on the batch_geocode
function (imported earlier) to display its signature and parameters along with a brief description:
help(batch_geocode)
Help on function batch_geocode in module arcgis.geocoding._functions: batch_geocode(addresses: 'Union[list[str], dict[str, str]]', source_country: 'Optional[str]' = None, category: 'Optional[str]' = None, out_sr: 'Optional[dict]' = None, geocoder: 'Optional[Geocoder]' = None, as_featureset: 'bool' = False, match_out_of_range: 'bool' = True, location_type: 'str' = 'street', search_extent: 'Optional[Union[list[dict[str, Any]], dict[str, Any]]]' = None, lang_code: 'str' = 'EN', preferred_label_values: 'Optional[str]' = None, out_fields: 'Optional[str]' = None) The ``batch_geocode`` function geocodes an entire list of addresses. .. note:: Geocoding many addresses at once is also known as bulk geocoding. ========================= ================================================================ **Parameter** **Description** ------------------------- ---------------------------------------------------------------- addresses Required list of strings or dictionaries. A list of addresses to be geocoded. For passing in the location name as a single line of text - single field batch geocoding - use a string. For passing in the location name as multiple lines of text multifield batch geocoding - use the address fields described in the Geocoder documentation. .. note:: The maximum number of addresses that can be geocoded in a single request is limited to the SuggestedBatchSize property of the locator. 
Syntax: addresses = ["380 New York St, Redlands, CA", "1 World Way, Los Angeles, CA", "1200 Getty Center Drive, Los Angeles, CA", "5905 Wilshire Boulevard, Los Angeles, CA", "100 Universal City Plaza, Universal City, CA 91608", "4800 Oak Grove Dr, Pasadena, CA 91109"] OR addresses= [{ "Address": "380 New York St.", "City": "Redlands", "Region": "CA", "Postal": "92373" },{ "OBJECTID": 2, "Address": "1 World Way", "City": "Los Angeles", "Region": "CA", "Postal": "90045" }] ------------------------- ---------------------------------------------------------------- source_country Optional string, The ``source_country`` parameter is only supported by geocoders published using StreetMap Premium locators. .. note:: Added at 10.3 and only supported by geocoders published with ArcGIS 10.3 for Server and later versions. ------------------------- ---------------------------------------------------------------- category Optional String. The ``category`` parameter is only supported by geocode services published using StreetMap Premium locators. ------------------------- ---------------------------------------------------------------- out_sr Optional dictionary, The spatial reference of the x/y coordinates returned by a geocode request. This is useful for applications using a map with a spatial reference different than that of the geocode service. ------------------------- ---------------------------------------------------------------- as_featureset Optional boolean, if True, the result set is returned as a FeatureSet object, else it is a dictionary. ------------------------- ---------------------------------------------------------------- geocoder Optional :class:`~arcgis.geocoding.Geocoder`, the geocoder to be used. .. note:: If not specified, the active ``GIS`` instances first ``Geocoder`` is used. 
------------------------- ---------------------------------------------------------------- match_out_of_range Optional, A Boolean which specifies if StreetAddress matches should be returned even when the input house number is outside of the house number range defined for the input street. ------------------------- ---------------------------------------------------------------- location_type Optional, Specifies if the output geometry of PointAddress matches should be the rooftop point or street entrance location. Valid values are rooftop and street. ------------------------- ---------------------------------------------------------------- search_extent Optional, a set of bounding box coordinates that limit the search area to a specific region. The input can either be a comma-separated list of coordinates defining the bounding box or a JSON envelope object. ------------------------- ---------------------------------------------------------------- lang_code Optional, sets the language in which geocode results are returned. See the table of supported countries for valid language code values in each country. ------------------------- ---------------------------------------------------------------- preferred_label_values Optional, allows simple configuration of output fields returned in a response from the World Geocoding Service by specifying which address component values should be included in output fields. Supports a single value or a comma-delimited collection of values as input. e.g. ='matchedCity,primaryStreet' ------------------------- ---------------------------------------------------------------- out_fields Optional String. A string of comma seperated fields names used to limit the return attributes of a geocoded location. ========================= ================================================================ .. 
code-block:: python # Usage Example >>> batched = batch_geocode(addresses = ["380 New York St, Redlands, CA", "1 World Way, Los Angeles, CA", "1200 Getty Center Drive, Los Angeles, CA", "5905 Wilshire Boulevard, Los Angeles, CA", "100 Universal City Plaza, Universal City, CA 91608", "4800 Oak Grove Dr, Pasadena, CA 91109"] as_featureset = True, match_out_of_range = True, ) >>> type(batched) <:class:`~arcgis.features.FeatureSet> :return: A dictionary or :class:`~arcgis.features.FeatureSet`
The addresses
parameter is a list of addresses to be geocoded. Each entry can be given as:
- a single line of text (single-field batch geocoding): use a string.
- multiple fields (multi-field batch geocoding): use a dictionary keyed by the address fields described in Part 3.
The Geocoder provides localized versions of the input field names in all locales supported by it.
Single Line Addresses
addresses = ["380 New York St, Redlands, CA",
"1 World Way, Los Angeles, CA",
"1200 Getty Center Drive, Los Angeles, CA",
"5905 Wilshire Boulevard, Los Angeles, CA",
"100 Universal City Plaza, Universal City, CA 91608",
"4800 Oak Grove Dr, Pasadena, CA 91109"]
results = batch_geocode(addresses)
map0 = Map("Los Angeles")
map0
map0.zoom = 9
for address in results:
    address['location'].update({"spatialReference" : {"wkid" : 4326}})
for address in results:
    map0.content.draw(address['location'])
    print(address['score'])
100
100
99.41
100
100
100
Each match has keys for score, location, attributes and address:
results[0].keys()
dict_keys(['address', 'location', 'score', 'attributes'])
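Since every match carries a score, a common follow-up is to separate confident matches from those that need review. The sketch below assumes results shaped like the dictionaries above; `split_by_score` is a hypothetical helper, the sample records are hand-written stand-ins rather than real service output, and the threshold is an arbitrary choice.

```python
def split_by_score(results, min_score=90.0):
    """Partition matches into (accepted, needs_review) by their 'score' key."""
    accepted, needs_review = [], []
    for match in results:
        if match["score"] >= min_score:
            accepted.append(match)
        else:
            needs_review.append(match)
    return accepted, needs_review


# Hand-written sample records that only mimic the structure of real results.
sample_results = [
    {"address": "380 New York St, Redlands, California, 92373", "score": 100},
    {"address": "example address with a weaker match", "score": 99.41},
]

accepted, needs_review = split_by_score(sample_results, min_score=99.5)
print(len(accepted), "accepted;", len(needs_review), "to review")
```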
Multi-line Addresses
The earlier example showed how to call batch_geocode()
with single line addresses. The following example illustrates how to call batch_geocode()
with a list of multi-field addresses.
addresses= [{
"Address": "380 New York St.",
"City": "Redlands",
"Region": "CA",
"Postal": "92373"
},{
"Address": "1 World Way",
"City": "Los Angeles",
"Region": "CA",
"Postal": "90045"
}]
results = batch_geocode(addresses)
map1 = Map("Los Angeles")
map1
map1.zoom = 9
for address in results:
    address['location'].update({"spatialReference" : {"wkid" : 4326}})
for address in results:
    map1.content.draw(address['location'])
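Multi-field addresses often start out as rows in a delimited file. As a minimal sketch, assuming the file's column headers already match the field names used above (Address, City, Region, Postal), the rows can be turned into the list of dictionaries that batch_geocode() accepts. `rows_to_multifield` is a hypothetical helper name.

```python
import csv
import io

# Field names taken from the multi-field example above.
FIELD_NAMES = ("Address", "City", "Region", "Postal")


def rows_to_multifield(csv_text):
    """Convert CSV text with matching headers into multi-field address dicts."""
    reader = csv.DictReader(io.StringIO(csv_text))
    addresses = []
    for row in reader:
        record = {}
        for name in FIELD_NAMES:
            value = (row.get(name) or "").strip()
            if value:  # skip empty cells instead of sending blank fields
                record[name] = value
        addresses.append(record)
    return addresses


csv_text = (
    "Address,City,Region,Postal\n"
    "380 New York St.,Redlands,CA,92373\n"
    "1 World Way,Los Angeles,CA,90045\n"
)
print(rows_to_multifield(csv_text))
```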
Get geocoded results as a FeatureSet
object
When as_featureset
is set to True, the geocoded results are returned as a FeatureSet
object, which is more convenient to plot on a map or view as a DataFrame
than the default output of dict
objects.
results_fset = batch_geocode(addresses,
as_featureset=True)
results_fset
<FeatureSet> 2 features
map1b = Map("Los Angeles")
map1b
map1b.zoom = 9
for feature in results_fset.features:
    map1b.content.draw(feature.geometry)
results_fset.sdf
| | ResultID | Loc_name | Status | Score | Match_addr | LongLabel | ShortLabel | Addr_type | Type | PlaceName | ... | Y | DisplayX | DisplayY | Xmin | Xmax | Ymin | Ymax | ExInfo | OBJECTID | SHAPE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | World | M | 100 | 380 New York St, Redlands, California, 92373 | 380 New York St, Redlands, CA, 92373, USA | 380 New York St | PointAddress | | | ... | 34.057252 | -117.19479 | 34.057265 | -117.19579 | -117.19379 | 34.056265 | 34.058265 | | 1 | {"x": -117.195649834906, "y": 34.057251584743,... |
| 1 | 1 | World | M | 100 | 1 World Way, Los Angeles, California, 90045 | 1 World Way, Los Angeles, CA, 90045, USA | 1 World Way | PointAddress | | | ... | 33.945058 | -118.398162 | 33.944686 | -118.399162 | -118.397162 | 33.943686 | 33.945686 | | 2 | {"x": -118.398102660849, "y": 33.945058427725,... |
2 rows × 61 columns
Batch geocoding using geocode_from_items()
The geocode_from_items()
function geocodes a table or file of addresses and returns the geocoded results. It supports CSV, XLS or table input. The task geocodes the entire file regardless of size. We can first take a look at its signature with help()
:
from arcgis.geocoding import geocode_from_items
help(geocode_from_items)
Help on function geocode_from_items in module arcgis.geocoding._functions: geocode_from_items(input_data: 'Union[Item, str, FeatureLayer]', output_type: 'str' = 'Feature Layer', geocode_service_url: 'Optional[Union[str, Geocoder]]' = None, geocode_parameters: 'Optional[dict[str, Any]]' = None, country: 'Optional[str]' = None, output_fields: 'Optional[str]' = None, header_rows_to_skip: 'int' = 1, output_name: 'Optional[str]' = None, category: 'Optional[str]' = None, context: 'Optional[dict[str, Any]]' = None, gis: 'Optional[GIS]' = None) The ``geocode_from_items`` method creates :class:`~arcgis.geocoding.Geocoder` objects from an :class:`~arcgis.gis.Item` or ``Layer`` objects. .. note:: ``geocode_from_items`` geocodes the entire file regardless of size. ===================== ================================================================ **Parameter** **Description** --------------------- ---------------------------------------------------------------- input_data required Item, string, Layer. Data to geocode. --------------------- ---------------------------------------------------------------- output_type optional string. Export item types. Allowed values are "CSV", "XLS", or "FeatureLayer". .. note:: The default for ``output_type`` is "FeatureLayer". --------------------- ---------------------------------------------------------------- geocode_service_url optional string of Geocoder. Optional :class:`~arcgis.geocoding.Geocoder` to use to spatially enable the dataset. --------------------- ---------------------------------------------------------------- geocode_parameters optional dictionary. This includes parameters that help parse the input data, as well the field lengths and a field mapping. This value is the output from the ``analyze_geocode_input`` available on your server designated to geocode. It is important to inspect the field mapping closely and adjust them accordingly before submitting your job, otherwise your geocoding results may not be accurate. 
It is recommended to use the output from ``analyze_geocode_input`` and modify the field mapping instead of constructing this dictionary by hand. **Values** ``field_info`` - A list of triples with the field names of your input data, the field type (usually TEXT), and the allowed length (usually 255). Example: [['ObjectID', 'TEXT', 255], ['Address', 'TEXT', 255], ['Region', 'TEXT', 255], ['Postal', 'TEXT', 255]] ``header_row_exists`` - Enter true or false. ``column_names`` - Submit the column names of your data if your data does not have a header row. ``field_mapping`` - Field mapping between each input field and candidate fields on the geocoding service. Example: [['ObjectID', 'OBJECTID'], ['Address', 'Address'], ['Region', 'Region'], ['Postal', 'Postal']] --------------------- ---------------------------------------------------------------- country optional string. If all your data is in one country, this helps improve performance for locators that accept that variable. --------------------- ---------------------------------------------------------------- output_fields optional string. Enter the output fields from the geocoding service that you want returned in the results, separated by commas. To output all available outputFields, leave this parameter blank. Example: score,match_addr,x,y --------------------- ---------------------------------------------------------------- header_rows_to_skip optional integer. Describes on which row your data begins in your file or table. The default is 1 (since the first row contains the headers). The default is 1. --------------------- ---------------------------------------------------------------- output_name optional string, The task will create a feature service of the results. You define the name of the service. --------------------- ---------------------------------------------------------------- category optional string. Enter a category for more precise geocoding results, if applicable. 
Some geocoding services do not support category, and the available options depend on your geocode service. --------------------- ---------------------------------------------------------------- context optional dictionary. Context contains additional settings that affect task execution. Batch Geocode has the following two settings: 1. Extent (extent) - A bounding box that defines the analysis area. Only those points in inputLayer that intersect the bounding box are analyzed. 2. Output Spatial Reference (outSR) - The output features are projected into the output spatial reference. Syntax: { "extent" : {extent} "outSR" : {spatial reference} } --------------------- ---------------------------------------------------------------- gis optional ``GIS``, the :class:`~arcgis.gis.GIS` on which this tool runs. .. note:: If not specified, the active ``GIS`` is used. ===================== ================================================================ .. code-block:: python # Usage Example >>> fl_item = geocode_from_items(csv_item, output_type='Feature Layer', geocode_parameters={"field_info": ['Addresses', 'TEXT', 255], "column_names": ["Addresses"], "field_mapping": ['Addresses', 'Address'] }, output_name="address_file_matching", gis=gis) >>> type(fl_item) <:class:`~arcgis.gis.Item`> :return: A :class:`~arcgis.gis.Item` object.
The geocode_from_items()
function is popular because it allows the user to input a web item (e.g. a CSV
file that has been uploaded to your organization beforehand) and generate a resulting web item (in this case, we have specified the output_type
as Feature Layer
). Let's look at an example below:
csv_item = gis.content.search("addresses", item_type="CSV")[0]
csv_item
from arcgis.geocoding import analyze_geocode_input
my_geocode_parameters = analyze_geocode_input(
    input_table_or_item=csv_item,
    input_file_parameters={
        "fileType": "csv",
        "headerRowExists": "true",
        "columnDelimiter": "COMMA",
        "textQualifier": ""
    }
)
my_geocode_parameters
{'header_row_exists': True, 'column_delimiter': 'COMMA', 'text_qualifier': '', 'field_info': '[["Incomplete", "TEXT", 255], ["Country", "TEXT", 255], ["Address", "TEXT", 255]]', 'field_mapping': '[["Address", "Address"], ["Incomplete", ""], ["Country", "CountryCode"]]', 'column_names': '', 'file_type': 'csv', 'singleline_field': 'SingleLine'}
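In the output above, field_mapping is a JSON-encoded string rather than a Python list, so adjusting it means decoding, editing, and re-encoding it. The sketch below shows one way to do that on a copy of the parameters; `remap_field` is a hypothetical helper, and keeping the value as a JSON string is an assumption based on the printed output rather than on documented behavior.

```python
import json


def remap_field(geocode_parameters, input_field, candidate_field):
    """Return a copy of the parameters with one input field mapped differently."""
    params = dict(geocode_parameters)  # shallow copy; the original is left alone
    mapping = json.loads(params["field_mapping"])
    for pair in mapping:
        if pair[0] == input_field:
            pair[1] = candidate_field
            break
    else:
        raise KeyError(f"{input_field!r} is not present in field_mapping")
    # Assumption: the value should stay a JSON string, as in the output above.
    params["field_mapping"] = json.dumps(mapping)
    return params


# Subset of the dictionary printed above, reproduced here for illustration.
sample_parameters = {
    "field_mapping": '[["Address", "Address"], ["Incomplete", ""], ["Country", "CountryCode"]]',
}

# Example: stop mapping the "Country" column to any geocoder field.
updated = remap_field(sample_parameters, "Country", "")
print(updated["field_mapping"])
```

The returned dictionary can then be passed as geocode_parameters in place of the unmodified output.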
from arcgis.geocoding import geocode_from_items
fl_item = geocode_from_items(
    input_data=csv_item,
    output_type="Feature Layer",
    geocode_parameters=my_geocode_parameters,
    output_name="address_all_matching",
    gis=gis
)
fl_item
Example of geocoding POIs (category param)
category
parameter
The category
parameter is a place or address type which can be used to filter batch geocoding results. The parameter supports input of single category values or multiple comma-separated values.
Single category filtering example:
category="Address"
Multiple category filtering example:
category="Address,Postal"
We will now explore some examples taking advantage of the category
parameter, in the following order:
- airports using their codes
- a list of city names
- restaurants of a few different sub-categories (Peruvian, Japanese, Korean, French..)
Example: Finding airports using their codes
airports = batch_geocode(["LAX", "SFO", "ONT", "FAT", "LGB"], category="airport")
map2 = Map("California")
map2
map2.zoom = 6
for airport in airports:
    airport['location'].update({"spatialReference" : {"wkid" : 4326}})
for airport in airports:
    popup = PopupInfo(**{
        "title" : airport['attributes']['PlaceName'],
        "description" : airport['address']
    })
    map2.content.draw(airport['location'], popup)
Examples of source_country
and lang_code
source_country
parameter
The source_country
parameter is a value representing the country. When a value is passed for this parameter, all of the addresses in the input table are sent to the specified country locator to be geocoded. For example, if source_country="USA"
is passed in a batch_geocode()
call, it is assumed that all of the addresses are in the United States, and so all of the addresses are sent to the USA country locator. Using this parameter can increase batch geocoding performance when all addresses are within a single country.
Acceptable values include the full country name, the ISO 3166-1 two-letter (alpha-2) country code
, or the ISO 3166-1 three-letter (alpha-3) country code
.
A list of supported countries and codes is available here.
Example:
source_country="USA"
lang_code
parameter
The lang_code
parameter is optional. When specified, you can set the language in which geocode results are returned. See the table of supported countries for valid language code values in each country.
Example: Finding Indian Cities and Return Results in Hindi
india_cities = batch_geocode(["Mumbai", "New Delhi", "Kolkata"],
                             category="city",
                             source_country="IND",
                             lang_code="HI")
for city in india_cities:
    print(city['address'])
कोलकाता, पश्चिम बंगाल
मुंबई, महाराष्ट्र
नई दिल्ली, दिल्ली
india_map = Map("India")
india_map
for city in india_cities:
    city['location'].update({"spatialReference" : {"wkid" : 4326}})
for city in india_cities:
    india_map.content.draw(city['location'])
Getting results in desired coordinate system
out_sr
parameter
This parameter is the spatial reference of the x/y coordinates returned by the geocode method. It is useful for applications using a map with a spatial reference different than that of the geocoder.
The spatial reference can be specified as either a well-known ID (WKID) or as a JSON spatial reference object. If out_sr is not specified, the spatial reference of the output locations is the same as that of the geocoder. The World Geocoding Service spatial reference is WGS84 (WKID = 4326).
For a list of valid WKID values, see the Coordinate Systems Reference.
Example (102100 is the WKID for the Web Mercator projection):
out_sr=102100
for airport in airports:
    if airport["address"] == "LAX":
        print(airport['address'])
        print(airport['location'])
LAX
{'x': -118.405581418239, 'y': 33.945016955146, 'spatialReference': {'wkid': 4326}, 'type': 'point'}
The default output spatial reference is WGS84 (WKID 4326), as shown in the previous cell. If we specify out_sr
as 102100, the x/y coordinates returned by batch_geocode()
are in Web Mercator, as displayed below:
airports_2 = batch_geocode(["LAX", "SFO", "ONT", "FAT", "LGB"],
category="airport",
out_sr=102100)
for airport in airports_2:
    if airport['address'] == "LAX":
        print(airport['address'])
        print(airport['location'])
LAX
{'x': -13180849.0306, 'y': 4021421.5337999985}
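To see how the two coordinate pairs relate, the WGS84 longitude/latitude can be projected by hand with the standard spherical Web Mercator formulas. This is only a sanity-check sketch, not what the geocoder runs internally; `wgs84_to_web_mercator` is a hypothetical helper and the radius is the WGS84 semi-major axis.

```python
import math

# WGS84 semi-major axis in metres, used as the sphere radius in Web Mercator.
EARTH_RADIUS_M = 6378137.0


def wgs84_to_web_mercator(lon_deg, lat_deg):
    """Project longitude/latitude in degrees to Web Mercator x/y in metres."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


# LAX coordinates from the WGS84 result printed earlier.
x, y = wgs84_to_web_mercator(-118.405581418239, 33.945016955146)
print(x, y)
```

The projected values should land close to the x/y pair returned with out_sr=102100 above.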
Avoiding fallbacks
You can also use category filtering to avoid "low resolution" fallback matches. By default, if the World Geocoding Service cannot find a match for an input address, it will automatically search for a lower match level, such as a street name, city, or postal code. For batch geocoding, a user may prefer that no match is returned in these cases so that they are not charged for the geocode. If a user passes category="Point Address,Street Address" in a batch_geocode()
call, no fallback will occur if address matches cannot be found; the user will only be charged for the actual address matches.
Example: Batch geocode with fallback allowed (no category)
In the example below, the second address is not matched to a point address; due to fallback, it is matched at a lower level (a street name) instead.
results = batch_geocode(["380 New York St Redlands CA 92373",
"? Stanford Dr Escondido CA"])
for result in results:
    print("Score " + str(result['score']) + " : " + result['address'])
Score 100 : 380 New York St, Redlands, California, 92373
Score 92.97 : Sanford Rd, Escondido, California, 92026
Example: Batch geocode with no fallback allowed (category="Street Address")
In the example below, the category is set to Street Address, so there is no low-resolution fallback when an address-level match is not found for the second address. As a result, no match is returned for the second address:
results = batch_geocode([ "380 New York St Redlands CA 92373",
"? Stanford Dr Escondido CA"],
category="Street Address")
for result in results:
    print("Score " + str(result['score']) + " : " + result['address'])
Score 100 : 380 New York St, Redlands, California, 92373
Score 0 :
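When fallbacks are disabled, it helps to know which inputs came back unmatched so they can be corrected and resubmitted. The sketch below assumes that each result's attributes include a ResultID equal to the zero-based position of the corresponding input address, as the ResultID column in the FeatureSet table earlier suggests. That mapping is an assumption and should be checked against your geocoder. `unmatched_inputs` is a hypothetical helper, and the sample records are hand-written stand-ins.

```python
def unmatched_inputs(addresses, results):
    """Return the input addresses whose result came back with a score of 0."""
    missing = []
    for match in results:
        if match["score"] == 0:
            # ASSUMPTION: ResultID is the zero-based index of the input address.
            missing.append(addresses[match["attributes"]["ResultID"]])
    return missing


addresses = ["380 New York St Redlands CA 92373", "? Stanford Dr Escondido CA"]

# Hand-written stand-ins for batch_geocode() output, not real service responses.
sample_results = [
    {"address": "380 New York St, Redlands, California, 92373", "score": 100,
     "attributes": {"ResultID": 0}},
    {"address": "", "score": 0, "attributes": {"ResultID": 1}},
]

print(unmatched_inputs(addresses, sample_results))
```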
Conclusions
In Part 4, we explored the batch_geocode()
function and how its advanced parameters can help fine-tune and filter the geocoded results.