The ArcGIS API for Python allows GIS analysts and data scientists to query, visualize, analyze, and transform their spatial data using the powerful GeoAnalytics Tools available in their organization. Learn more about the analysis capabilities of the API at the documentation site.
The big data analysis tools can be accessed via the arcgis.geoanalytics module.
Tools Overview
The GeoAnalytics tools are presented through a set of submodules within the arcgis.geoanalytics module. To view the list of tools available, refer to the page titled Working with big data. On this page, we will learn how to execute big data tools.
Get started
The arcgis.geoanalytics module provides types and functions for distributed analysis of large datasets. These GeoAnalytics tools work with big data registered in the GIS's datastores as well as with feature layers.
Use arcgis.geoanalytics.is_supported(gis) to check if GeoAnalytics is supported in your GIS.
Feature Input
You can run the GeoAnalytics Tools on the following:
- arcgis.features.FeatureLayer (hosted, hosted feature layer views, and from feature services)
- arcgis.features.FeatureCollection
- Big data file shares registered with ArcGIS GeoAnalytics Server
Feature Output
The output from running GeoAnalytics Tools can be one of two options:
- A hosted feature layer with data stored in ArcGIS Data Store registered with the portal's hosting server.
- A dataset stored to a big data file share (a folder, cloud store, HDFS location) that you have registered with your GeoAnalytics Server.
Refer to this page for detailed information about feature layers and features.
Next, we will specify which big data file share the GeoAnalytics results will save to. If set to None, the arcgis.env.output_datastore will reset to the default. Allowed string values are: spatiotemporal or relational.
import arcgis
arcgis.geoanalytics.define_output_datastore(datastore='relational')
True
Environment settings
The arcgis.env module provides a shared environment used by the different modules. It stores globals, such as the currently active GIS, the default geocoder, and more. It also stores environment settings that are common among all tools, such as the output spatial reference, cell size, etc.
Set spatial reference
The GeoAnalytics Tools use a process spatial reference during execution. Analyses with square or hexagon bins require a projected coordinate system. We'll use the World Cylindrical Equal Area projection (WKID 54034) below (as it is the default used when running tools in ArcGIS Online). All results are stored in the spatiotemporal datastore of the Enterprise in the WGS 84 Spatial Reference.
See the GeoAnalytics Documentation for a full explanation of analysis environment settings.
arcgis.env.process_spatial_reference = 54034
Verbosity of messages
The ArcGIS Platform, including the ArcGIS API for Python, manages and transforms geographic data with a large suite of tools and functions collectively known as geoprocessing. The GeoAnalytics Tools in the ArcGIS API for Python are a subset of geoprocessing tools that operate in the context of a geoprocessing environment. You can set various aspects of this environment to control how tools are executed and what messages you receive during and after the execution. See the Logging and error handling section in the API for Python Geoprocessing Guide's Advanced concepts for ways to control messaging, including the arcgis.env.verbose setting.
arcgis.env.verbose=True
Context Parameter
ArcGIS GeoAnalytics Server tasks that have the outSR property in their Context parameter will save results in the specified spatial reference. If you are saving the results to the spatiotemporal data store, all results will be projected to World Geographic Coordinate System 1984 after analysis for storage, and the outSR will not be used. Set the spatial reference that results will be analyzed in using the Process Spatial Reference property.
GeoAnalytics operations use the following context parameters defined in the arcgis.env module:
Context Parameter | Description |
---|---|
out_spatial_reference | Used for setting the output spatial reference. |
process_spatial_reference | Used for setting the processing spatial reference. |
analysis_extent | Used for setting the analysis extent. |
output_datastore | Used for setting the output datastore to be used. |
#example
context = {
"extent": {
"xmin": -122.68,
"ymin": 45.53,
"xmax": -122.45,
"ymax": 45.6,
"spatialReference": {
"wkid": 4326
}
},
"outSR" : {"wkid" : 3857},
"dataStore" : "relational"
}
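As an illustration, the context dictionary above can be assembled with a small helper. This helper is our own sketch, not part of the arcgis API; the key names ("extent", "outSR", "dataStore") follow the example above.

```python
# Hypothetical helper for assembling a GeoAnalytics context dictionary.
def build_context(extent=None, out_sr_wkid=None, datastore=None):
    context = {}
    if extent is not None:
        context["extent"] = extent
    if out_sr_wkid is not None:
        context["outSR"] = {"wkid": out_sr_wkid}
    if datastore is not None:
        # Guard against unsupported datastore values.
        if datastore not in ("relational", "spatiotemporal"):
            raise ValueError("datastore must be 'relational' or 'spatiotemporal'")
        context["dataStore"] = datastore
    return context

ctx = build_context(
    extent={"xmin": -122.68, "ymin": 45.53, "xmax": -122.45, "ymax": 45.6,
            "spatialReference": {"wkid": 4326}},
    out_sr_wkid=3857,
    datastore="relational",
)
```

The resulting dictionary can then be passed as the context argument of a GeoAnalytics tool.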
Executing a GeoAnalytics tool
In the previous guide, you learned how to register a big data file share with your ArcGIS GeoAnalytics Server. When you add a big data file share, a corresponding item gets created on your portal. You can search for it like any other portal Item and query its layers.
# connect to Enterprise GIS
from arcgis.gis import GIS
import arcgis.geoanalytics
portal_gis = GIS("your_enterprise_profile")
When no parameters are specified with geoanalytics methods, they use the active GIS connection, which you can query with the arcgis.env.active_gis property. However, if you are working with more than one GIS object, you can specify the desired GIS object as the gis parameter of the method. For example, let us create a connection to an Enterprise deployment and check if GeoAnalytics is supported.
Ensure your GIS supports GeoAnalytics
After connecting to the Enterprise portal, you need to ensure the ArcGIS Enterprise GIS is set up with a licensed GeoAnalytics server. To do so, we will call the is_supported() method.
arcgis.geoanalytics.is_supported(gis=portal_gis)
True
Search big data file share item
Adding a big data file share to the GeoAnalytics server adds a corresponding big data file share item in the portal. We can search for these types of items using the item_type parameter.
search_result = portal_gis.content.search("bigDataFileShares_ServiceCallsOrleans",
item_type = "big data file share",
max_items=40)
search_result
[<Item title:"bigDataFileShares_ServiceCallsOrleans" type:Big Data File Share owner:portaladmin>]
data_item = search_result[0]
data_item
Querying the layers property of the item returns a list of layers representing the data; each one is an API Layer object.
#displays layers in the item
data_item.layers
[<Layer url:"https://pythonapi.playground.esri.com/ga/rest/services/DataStoreCatalogs/bigDataFileShares_ServiceCallsOrleans/BigDataCatalogServer/yearly_calls">]
calls = data_item.layers[0] #select first layer
calls
<Layer url:"https://pythonapi.playground.esri.com/ga/rest/services/DataStoreCatalogs/bigDataFileShares_ServiceCallsOrleans/BigDataCatalogServer/yearly_calls">
Access the aggregate_points() tool through the summarize_data module. This example uses the Aggregate Points tool to aggregate the point features representing service calls into one-meter hexagonal bins, matching the parameters used below. The tool creates an output feature layer in your portal that you can access once processing is complete.
from arcgis.geoanalytics.summarize_data import aggregate_points
from datetime import datetime as dt
Sync execution
By default, all the tools have the future parameter set to False. The tools return output results as feature layer items.
agg_result1 = aggregate_points(calls,
bin_type='Hexagon',
bin_size=1,
bin_size_unit='Meters',
output_name="aggregate results of call" + str(dt.now().microsecond))
agg_result1
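The output_name in the call above appends the current microsecond so that repeated runs do not collide with an existing item name. That pattern can be sketched as a small helper (the helper name is ours, not part of the API); using a full timestamp makes collisions even less likely than the microsecond alone:

```python
from datetime import datetime as dt

def unique_output_name(prefix):
    # Append a timestamp so repeated runs produce distinct item names.
    return prefix + dt.now().strftime("%Y%m%d%H%M%S%f")

name = unique_output_name("aggregate results of call")
```

The returned name can be passed directly as the output_name argument of a tool.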
Async execution
If future=True, a GPJob is returned rather than results. The GPJob can be queried for the status of the execution.
agg_result2 = aggregate_points(calls,
bin_type='Hexagon',
bin_size=1,
bin_size_unit='Meters',
output_name="aggregate results of call" + str(dt.now().microsecond),
future=True)
agg_result2
<AggregatePoints GA Job: jd47e5f0d6f82413fb31be8bd6ec476d7>
agg_result2.result()
{"messageCode":"BD_101054","message":"Some records have either missing or invalid geometries."} {"messageCode":"BD_101088","message":"Some result features were clipped to the valid extent of the resulting spatial reference."}
The aggregate points tool returns a feature layer item that contains the processed results.
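If you submit several jobs with future=True, you can poll each GPJob before collecting its results. A minimal sketch, assuming the job object exposes done() and result() as shown above (the helper function itself is ours, not part of the API):

```python
import time

def wait_for_result(job, poll_interval=2.0):
    # Poll the job until the server reports completion, then fetch the result.
    while not job.done():
        time.sleep(poll_interval)
    return job.result()
```

For example, `agg_layer = wait_for_result(agg_result2)` blocks until the aggregation finishes and returns the output item. Calling result() directly achieves the same end, but a polling loop lets you interleave other work or report progress between checks.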
Apply spatial filter
The context parameter helps to set spatial and temporal filters. It takes the following keys to set an extent or time filter.
The tool output above shows that some data points fall outside New Orleans because of missing or invalid geometries. We want to explore data points within New Orleans city limits, so we will run the tool only in the zoomed extent. Let's set our area of interest to the zoomed extent of the map.
ext = m1.extent
ext
{'spatialReference': {'latestWkid': 3857, 'wkid': 102100}, 'xmin': -10022118.236961203, 'ymin': 3491517.7562587974, 'xmax': -10017417.35972154, 'ymax': 3493428.6819659774}
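Per the earlier context example, the extent is expected under an "extent" key of the context dictionary, so the map widget's extent dictionary should be wrapped before being passed to the tool. A trivial sketch of that wrapping (the helper name is ours):

```python
def extent_to_context(ext):
    # Wrap a map widget extent dictionary in the "extent" key
    # that the GeoAnalytics context parameter expects.
    return {"extent": ext}

ext = {'spatialReference': {'latestWkid': 3857, 'wkid': 102100},
       'xmin': -10022118.24, 'ymin': 3491517.76,
       'xmax': -10017417.36, 'ymax': 3493428.68}
ctx = extent_to_context(ext)
```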
agg_result3 = aggregate_points(calls,
                               bin_type='Hexagon',
                               bin_size=1,
                               bin_size_unit='Meters',
                               output_name="aggregate results of call" + str(dt.now().microsecond),
                               context={'extent': ext})
agg_result3
agg_result3
Attaching log redirect Log level set to DEBUG Detaching log redirect
Apply filter by field value
Using the filter property, you can apply a filter on feature layers to run your analysis only on a subset of the data.
item = portal_gis.content.get('67908048c99f44998dfd464de004bffa')
item
fl = item.layers[0]
fl.query(as_df=True).columns
Index(['BLOCK_ADDRESS', 'Disposition', 'DispositionText', 'INSTANT_DATETIME', 'Location', 'MapX', 'MapY', 'NOPD_Item', 'OBJECTID', 'PoliceDistrict', 'Priority', 'SHAPE', 'TimeArrive', 'TimeClosed', 'TimeCreate', 'TimeDispatch', 'TypeText', 'Type_', 'Zip', 'globalid'], dtype='object')
# Apply a filter on Zip field
fl.filter = 'Zip=70119'
agg_result4 = aggregate_points(fl,
bin_type='Hexagon',
bin_size=1,
bin_size_unit='Meters',
output_name="aggregate results of call" + str(dt.now().microsecond))
agg_result4
Attaching log redirect Log level set to DEBUG {"messageCode":"BD_101068","message":"Bin generation and analysis requires a projected coordinate system and a default projection of World Cylindrical Equal Area has been applied."} Detaching log redirect
The screenshot above displays the aggregated results for the 70119 zip code.
Apply time filter
You can also apply a time filter using the time_filter property, which filters a time-enabled feature layer by datetime. When you apply the filter, the analysis will only be performed on the time-filtered features. Refer to this page for more details.
# Apply a filter by datetime
fl.time_filter = '2017'
In this guide, we have learned about the analysis capabilities available in the arcgis.geoanalytics module and how some common concepts, such as environment settings, sync execution, filters, etc., can be applied across all tools. In the next guide, we will learn in more detail about the tools available in the arcgis.geoanalytics.summarize_data submodule.