What is an outlier analysis?
An outlier analysis is the process of identifying both clusters and anomalous values (outliers) in spatial data. It determines whether the attribute value or point count of each feature differs significantly from that of its neighbors, as measured by the resulting z-score and p-value. To execute the analysis, use the spatial analysis service and the FindOutliers operation.
The analysis classifies features as being:
- High-High: a high value surrounded by other high values
- High-Low: a high value surrounded by low values
- Low-High: a low value surrounded by high values
- Low-Low: a low value surrounded by other low values.
A feature is part of a cluster when its value is similar to the values of its neighbors. A feature is considered an outlier when its value is dissimilar to the values of its neighbors.
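The four categories can be illustrated with a simplified sketch. It assumes a toy rule, not the service's statistical test: a value counts as "high" when it exceeds the mean of all values, and a neighborhood counts as "high" when the mean of the neighbors does. The function name `classify_feature` and the sample numbers are hypothetical, and no z-score or p-value is computed.

```python
from statistics import mean

def classify_feature(value, neighbor_values, global_mean):
    """Label a feature High-High, High-Low, Low-High, or Low-Low.

    Toy rule (an assumption for illustration): "high" means above the
    global mean. The real analysis also tests statistical significance.
    """
    own = "High" if value > global_mean else "Low"
    surrounding = "High" if mean(neighbor_values) > global_mean else "Low"
    return f"{own}-{surrounding}"

# Hypothetical crash counts per grid cell, paired with each cell's neighbors.
cells = {
    "a": (40, [38, 42, 35]),  # high value, high neighbors
    "b": (45, [3, 5, 4]),     # high value, low neighbors
    "c": (2, [39, 41, 44]),   # low value, high neighbors
    "d": (4, [2, 5, 3]),      # low value, low neighbors
}
global_mean = mean(value for value, _ in cells.values())

labels = {
    name: classify_feature(value, neighbors, global_mean)
    for name, (value, neighbors) in cells.items()
}
for name, label in labels.items():
    # Matching halves form a cluster; mismatched halves form an outlier.
    kind = "cluster" if label in ("High-High", "Low-Low") else "outlier"
    print(name, label, kind)
```

In this toy data, cells `b` and `c` are the outliers because their values disagree with their surroundings.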
An outlier analysis helps to find spatial trends and patterns in the data that may not be visible at first glance.
Real-world examples of this analysis include the following:
- Finding outliers (either high or low counts) of traffic crashes or crimes.
- Determining whether there are outlier (anomalous) spending trends or real estate prices.
- Finding whether some areas of a country might have a higher population despite being surrounded by lower population numbers.
How to perform an outlier analysis
The general steps for performing an outlier analysis are as follows:
- Review the parameters for the FindOutliers operation.
- Send a request to get the spatial analysis service URL.
- Execute a job request with the following URL and parameters:
  - URL: https://<YOUR_ANALYSIS_SERVICE>/arcgis/rest/services/tasks/GPServer/FindOutliers/submitJob
  - Parameters:
    - analysisLayer: Your dataset as a hosted feature layer or feature collection.
    - analysisField: A numeric field.
    - shapeType: Fishnet or Hexagon.
    - outputName: A string representing the name of the hosted feature layer to return with the results.
- Check the status.
- Get the output layer results.
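The "check the status" step in the list above can be sketched as a polling loop. This is a minimal sketch under stated assumptions: the job status URL layout (`.../FindOutliers/jobs/<jobId>`) and the `esriJobSucceeded` and `esriJobFailed` status strings follow the usual ArcGIS geoprocessing REST conventions and should be checked against the service reference. The names `wait_for_job` and `fetch_json` are hypothetical, and the HTTP call is injected so the loop can run without network access.

```python
import time

# Status strings assumed from the usual ArcGIS geoprocessing job conventions.
DONE = {"esriJobSucceeded"}
FAILED = {"esriJobFailed", "esriJobCancelled", "esriJobTimedOut"}

def job_status_url(analysis_url, job_id):
    """Build the (assumed) status URL for a submitted FindOutliers job."""
    return f"{analysis_url.rstrip('/')}/FindOutliers/jobs/{job_id}"

def wait_for_job(fetch_json, analysis_url, job_id, interval=2.0, max_polls=30):
    """Poll the job until it succeeds, fails, or max_polls is reached.

    fetch_json is any callable that takes a URL and returns the parsed
    JSON response as a dict (for example, a wrapper around an HTTP client).
    """
    url = job_status_url(analysis_url, job_id)
    for _ in range(max_polls):
        status = fetch_json(url).get("jobStatus")
        if status in DONE:
            return status
        if status in FAILED:
            raise RuntimeError(f"Job {job_id} ended with status {status}")
        time.sleep(interval)
    raise TimeoutError(f"Job {job_id} did not finish after {max_polls} polls")

# Stand-in for the HTTP call: two "executing" replies, then success.
replies = iter(["esriJobExecuting", "esriJobExecuting", "esriJobSucceeded"])

def fake_fetch(url):
    return {"jobId": "j123", "jobStatus": next(replies)}

final_status = wait_for_job(
    fake_fetch,
    "https://example.com/arcgis/rest/services/tasks/GPServer",
    "j123",
    interval=0,
)
print(final_status)
```

Injecting `fetch_json` keeps the retry logic separate from authentication and transport, which vary between clients.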
To see examples using ArcGIS API for Python, ArcGIS REST JS, and the ArcGIS REST API, go to Examples below.
URL request
https://<YOUR_ANALYSIS_SERVICE>/arcgis/rest/services/tasks/GPServer/FindOutliers/submitJob?<parameters>
Required parameters
Name | Description | Examples |
---|---|---|
f | The format of the data returned. | f=json, f=pjson |
token | An OAuth 2.0 access token. | token=<ACCESS_TOKEN> |
analysisLayer | The point or polygon feature layer. | {url |
analysisField | Use if the analysis layer contains polygons. | counts, rates, averages |
Key parameters
Name | Description | Examples |
---|---|---|
shapeType | Use if the analysis layer contains points. | hexagon, fishnet |
bounding | When the analysis layer contains points and no analysisField is specified, you can provide a boundary for the analysis. | {url |
outputName | A string representing the name of the hosted feature layer to return with the results. NOTE: If you do not include this parameter, the results are returned as a feature collection (JSON). | {"service |
context | A bounding box or output spatial reference for the analysis. | "extent" |
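As an illustration of how these parameters fit together, the following sketch assembles a `submitJob` URL and a form-encoded body for a point layer. It is a sketch, not a definitive client: the helper name `build_find_outliers_request` is hypothetical, the `analysisLayer` value is assumed to be a JSON object with a `url` key (the same shape as the layer dictionary in the home value example), and nothing is sent over the network.

```python
import json
from urllib.parse import urlencode

def build_find_outliers_request(analysis_url, layer_url, token,
                                shape_type=None, analysis_field=None):
    """Return (url, body) for a FindOutliers submitJob request."""
    url = f"{analysis_url.rstrip('/')}/FindOutliers/submitJob"
    params = {
        "f": "json",
        "token": token,
        # Assumed shape: a JSON object that points at the feature layer.
        "analysisLayer": json.dumps({"url": layer_url}),
    }
    if shape_type:      # use when the analysis layer contains points
        params["shapeType"] = shape_type
    if analysis_field:  # use when the analysis layer contains polygons
        params["analysisField"] = analysis_field
    return url, urlencode(params)

url, body = build_find_outliers_request(
    "https://<YOUR_ANALYSIS_SERVICE>/arcgis/rest/services/tasks/GPServer",
    "https://services3.arcgis.com/GVgbJbqm8hXASVYi/arcgis/rest/services/Traffic_Crashes/FeatureServer/0",
    "<ACCESS_TOKEN>",
    shape_type="fishnet",
)
print(url)
print(body)
```

Because the output name parameter is omitted here, the results would be returned as a feature collection, as noted in the table.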
Code examples
Identify outliers in traffic crashes
This example uses the FindOutliers operation to determine where there are statistically significant outliers of Traffic crashes counted within a fishnet grid. The anomalous areas, shown as dark red and dark blue, indicate either significantly high or significantly low instances of crashes compared to neighboring clusters.
In the analysis, the analysisLayer value is the Traffic crashes hosted feature layer. The points in the layer are counted within a fishnet, which was set in the shapeType parameter.
APIs
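# Assumes an authenticated GIS connection is already active in this session.
from arcgis.features.analysis import find_outliers

traffic_layer = "https://services3.arcgis.com/GVgbJbqm8hXASVYi/arcgis/rest/services/Traffic_Crashes/FeatureServer/0"

# Count the crash points within a fishnet grid and look for outliers.
results = find_outliers(
    analysis_layer=traffic_layer,
    shape_type="fishnet",
    # Outputs results as a hosted feature layer.
    output_name="Find outliers results",
)

# Query the first layer of the hosted result and report its record count.
result_features = results["outliers_result_layer"].layers[0].query()
print(
    f"The find outliers layer has {len(result_features.features)} new records"
)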
Service requests
Request
POST arcgis.com/sharing/rest/portals/self HTTP/1.1
Content-Type: application/x-www-form-urlencoded
&f=json
&token=<ACCESS_TOKEN>
Response (JSON)
{
  "helperServices": {
    // Other parameters
    "analysis": {
      "url": "https://<YOUR_ANALYSIS_SERVICE>/arcgis/rest/services/tasks/GPServer"
    },
    "geoenrichment": {
      "url": "https://geoenrich.arcgis.com/arcgis/rest/services/World/GeoenrichmentServer"
    }
  }
}
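To use this response, read the analysis service URL from `helperServices.analysis.url`. The sketch below parses a response shaped like the one above. The function name `get_analysis_url` is hypothetical, and the sample omits the `// Other parameters` comment because strict JSON parsers reject comments.

```python
import json

def get_analysis_url(portal_self):
    """Return the spatial analysis service URL from a portals/self response."""
    try:
        return portal_self["helperServices"]["analysis"]["url"]
    except KeyError as missing:
        raise ValueError(f"Response has no {missing} entry") from None

sample = json.loads("""
{
  "helperServices": {
    "analysis": {
      "url": "https://<YOUR_ANALYSIS_SERVICE>/arcgis/rest/services/tasks/GPServer"
    }
  }
}
""")

analysis_url = get_analysis_url(sample)
# The job request URL is the service URL plus the operation and submitJob.
print(f"{analysis_url}/FindOutliers/submitJob")
```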
Identify home value outliers
This example uses an outlier analysis to determine where there are statistically significant outliers for home values in Portland. The anomalous areas, shown as dark red and dark blue, indicate either significantly high or significantly low home values compared to neighboring clusters.
In the analysis, the analysisLayer value is the Enriched Portland hexagon bins hosted feature layer. The feature layer was created using generated hexagon bins that were enriched using data from the GeoEnrichment service. To analyze home values, you set the analysisField to the AVGVAL_CY attribute.
To learn how to generate hexagon bins, go to generate tessellations.
APIs
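# Assumes an authenticated GIS connection is already active in this session.
from arcgis.features.analysis import find_outliers

# Feature input format:
# https://developers.arcgis.com/rest/analysis/api-reference/feature-input.htm
pdx_homes = {
    "url": "https://services3.arcgis.com/GVgbJbqm8hXASVYi/arcgis/rest/services/Enriched_PDX_hex_bin_tessellations/FeatureServer/0",
    # Exclude bins where the average home value is 0.
    "filter": "AVGVAL_CY <> 0",
}

results = find_outliers(
    analysis_layer=pdx_homes,
    analysis_field="AVGVAL_CY",
    # Outputs results as a hosted feature layer.
    output_name="Find outliers results",
)

# Query the first layer of the hosted result and report its record count.
result_features = results["outliers_result_layer"].layers[0].query()
print(f"The outliers layer has {len(result_features.features)} new records")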
Service requests
Request
POST arcgis.com/sharing/rest/portals/self HTTP/1.1
Content-Type: application/x-www-form-urlencoded
&f=json
&token=<ACCESS_TOKEN>
Response (JSON)
{
  "helperServices": {
    // Other parameters
    "analysis": {
      "url": "https://<YOUR_ANALYSIS_SERVICE>/arcgis/rest/services/tasks/GPServer"
    },
    "geoenrichment": {
      "url": "https://geoenrich.arcgis.com/arcgis/rest/services/World/GeoenrichmentServer"
    }
  }
}