Find Outliers

The FindOutliers task analyzes point data (such as crime incidents, traffic accidents, or trees) or field values associated with points or area features (such as the number of people in each census tract or the total sales for retail stores). It finds statistically significant spatial clusters of high values and low values and statistically significant high or low spatial outliers within those clusters.

The result map layer shows high outliers in red and low outliers in dark blue. Clusters of high values appear pink, and clusters of low values appear light blue. Features drawn in beige are neither statistically significant outliers nor part of a statistically significant cluster; the spatial pattern associated with these features may be the result of random chance.

Request URL

http://<analysis url>/FindOutliers/submitJob

Limits

If the existing output feature layer already has processInfo strings and custom pop-ups, overwriting it does not update the processInfo parameter or the pop-ups.

Request parameters

analysisLayer

(Required)

The point or polygon feature layer for which outliers will be calculated.

Syntax: As described in detail in the Feature input topic, this parameter can be one of the following:

  • A URL to a feature service layer with an optional filter to select specific features
  • A feature collection

Examples:

  • {"url": <feature service layer url>, "filter": <where clause>}
  • {"layerDefinition": {}, "featureSet": {}, "filter": <where clause>}

analysisField

(Required if analysisLayer contains polygons)

The numeric field that will be analyzed. This field can represent the following:

  • Counts (such as the number of traffic accidents)
  • Rates (such as the number of crimes per square mile)
  • Averages (such as the mean math test score)
  • Indices (such as a customer satisfaction score)

If an analysisField value is not provided, the outlier results will be based on point densities only.

Syntax: "analysisField": "Average_Score"

dividedByField

The numeric field in the analysisLayer value that will be used to normalize the data. For example, if points represent crimes, dividing by total population will result in an analysis of crimes per capita rather than raw crime counts.

You can use esriPopulation to enrich each area feature with the most recent population values, which will then be used as the attribute to divide by. This option uses credits.

Syntax: "dividedByField": "esriPopulation"

boundingPolygonLayer

When the analysis layer is points and no analysisField value is provided, you can provide polygon features that define where incidents could have occurred. For example, if you are analyzing boating accidents in a harbor, the outline of the harbor can provide a boundary for where accidents could occur. When no bounding areas are provided, only locations with at least one point will be included in the analysis.

Syntax: As described in detail in the Feature input topic, this parameter can be one of the following:

  • A URL to a feature service layer with an optional filter to select specific features
  • A feature collection

aggregationPolygonLayer

When the analysisLayer value contains points and no analysisField value is specified, you can provide polygon features into which the points will be aggregated and analyzed, such as administrative units. The number of points that fall within each polygon is counted, and the point count in each polygon is analyzed.

Syntax: As described in detail in the Feature input topic, this parameter can be one of the following:

  • A URL to a feature service layer with an optional filter to select specific features
  • A feature collection

permutations

Specifies the number of permutations that will be used. Permutations are used to determine how likely it would be to find the actual spatial distribution of the values you are analyzing by random chance. Choosing the number of permutations is a balance between precision and processing time. A lower number of permutations can be used when first exploring a problem, but it is a best practice to increase the permutations to the highest number feasible for final results.

  • Speed—Implements 199 permutations and results in p-values with a precision of 0.005
  • Balance—Implements 499 permutations and results in p-values with a precision of 0.002
  • Precision—Implements 999 permutations and results in p-values with a precision of 0.001

Values: Speed | Balance | Precision

Example: "permutations": "Precision"
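The precision figures listed above equal 1 / (N + 1) for N permutations, which is the smallest value a permutation pseudo p-value of the form (R + 1) / (N + 1) can take, where R counts random permutations at least as extreme as the observed statistic. The Python sketch below assumes that form; the helper names are hypothetical and this is not the service's own code.

```python
# Number of permutations behind each documented option.
PERMUTATIONS = {"Speed": 199, "Balance": 499, "Precision": 999}


def p_value_resolution(n_permutations: int) -> float:
    """Smallest pseudo p-value achievable with n random permutations: 1 / (n + 1)."""
    return 1.0 / (n_permutations + 1)


def pseudo_p_value(observed: float, simulated: list[float]) -> float:
    """Folded pseudo p-value (R + 1) / (n + 1).

    R counts permuted statistics at least as extreme as the observed one,
    on the same side of the permutation mean as the observed value.
    """
    mean = sum(simulated) / len(simulated)
    if observed >= mean:
        r = sum(1 for s in simulated if s >= observed)
    else:
        r = sum(1 for s in simulated if s <= observed)
    return (r + 1) / (len(simulated) + 1)


for name, n in PERMUTATIONS.items():
    print(f"{name}: {n} permutations -> resolution {p_value_resolution(n):.3f}")
```

Even when no permutation is as extreme as the observed value (R = 0), the pseudo p-value cannot fall below 1 / (N + 1), which is why more permutations are needed for finer p-values.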

shapeType

Specifies the shape of the polygon mesh the input features will be aggregated into.

  • Fishnet—The input features will be aggregated into a grid of square (fishnet) cells.
  • Hexagon—The input features will be aggregated into a grid of hexagonal cells.

Example: "shapeType": "Hexagon"

cellSize

The size of the grid cells that will be used to aggregate the features. When aggregating into a hexagon grid, this distance is used as the height to construct the hexagon polygons.

Example: "cellSize": 500

cellSizeUnits

Specifies the units of the cellSize value. You must specify a value if a value for cellSize has been set.

Values: Miles | Feet | Kilometers | Meters

Example: "cellSizeUnits": "Meters"
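To get a feel for what a cellSize value means on the ground, the sketch below converts it to meters and computes the area of one cell. It assumes that the hexagon "height" is the distance between two parallel sides of a regular hexagon; the function name and the conversion table are illustrative, not part of the service.

```python
import math

# Meters per unit for the documented cellSizeUnits values.
METERS_PER_UNIT = {
    "Meters": 1.0,
    "Kilometers": 1000.0,
    "Feet": 0.3048,
    "Miles": 1609.344,
}


def cell_area_m2(cell_size: float, cell_size_units: str, shape_type: str) -> float:
    """Area of one aggregation cell in square meters.

    Assumption: for Hexagon, cellSize is the flat-to-flat height of a
    regular hexagon (the distance between two parallel sides).
    """
    if cell_size_units not in METERS_PER_UNIT:
        raise ValueError(f"unsupported cellSizeUnits: {cell_size_units!r}")
    size_m = cell_size * METERS_PER_UNIT[cell_size_units]
    if shape_type == "Fishnet":
        return size_m ** 2  # square cell whose side is cellSize
    if shape_type == "Hexagon":
        # A regular hexagon with flat-to-flat height h has area (sqrt(3) / 2) * h^2.
        return math.sqrt(3) / 2 * size_m ** 2
    raise ValueError(f"unsupported shapeType: {shape_type!r}")


print(round(cell_area_m2(500, "Meters", "Fishnet")))  # 250000
print(round(cell_area_m2(500, "Meters", "Hexagon")))  # 216506
```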

distanceBand

The spatial extent of the analysis neighborhood. This value determines which features will be analyzed together to assess local clustering.

distanceBandUnits

Specifies the units of the distanceBand value. You must specify a value if a value for distanceBand has been set.

Values: Miles | Feet | Kilometers | Meters

Example: "distanceBandUnits": "Meters"
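One common way to turn a distance band into neighbors is a fixed-distance rule: every other feature within the band counts as a neighbor. The sketch below assumes that rule, binary weights, and projected coordinates in meters; the function is hypothetical, and the service may measure or weight neighbors differently.

```python
import math

METERS_PER_UNIT = {"Meters": 1.0, "Kilometers": 1000.0, "Feet": 0.3048, "Miles": 1609.344}


def distance_band_neighbors(points, distance_band, distance_band_units="Meters"):
    """For each point, list the indices of the other points within the band.

    Assumes projected (x, y) coordinates in meters; every feature within the
    band is a neighbor with weight 1, and a feature is never its own neighbor.
    """
    band_m = distance_band * METERS_PER_UNIT[distance_band_units]
    neighbors = []
    for i, (xi, yi) in enumerate(points):
        neighbors.append([
            j for j, (xj, yj) in enumerate(points)
            if j != i and math.hypot(xj - xi, yj - yi) <= band_m
        ])
    return neighbors


pts = [(0, 0), (300, 0), (0, 350), (5000, 5000)]
print(distance_band_neighbors(pts, 0.5, "Kilometers"))  # [[1, 2], [0, 2], [0, 1], []]
```

The last point has no neighbors within 500 meters, so it cannot be assessed for local clustering at this band; choosing the band is therefore a trade-off between local detail and having enough neighbors per feature.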

outputName

If provided, the task will create a feature service of the results. You define the name of the service. If an outputName value is not provided, the task will return a feature collection.

Syntax:

{
  "serviceProperties": {
    "name": "<service name>"
  }
}
In ArcGIS Online or ArcGIS Enterprise 11.1 and later, you can overwrite an existing feature service by providing the itemId value of the existing feature service and setting the overwrite property to true. Including the serviceProperties parameter is optional. As described in the Feature output topic, you must either be the owner of the feature service or have administrative privileges to perform the overwrite.

Syntax:

{
  "itemProperties": {
    "itemId": "<itemID of the existing feature service>",
    "overwrite": true
  }
}
or
{
  "serviceProperties": {
    "name": "<existing service name>"
  },
  "itemProperties": {
    "itemId": "<itemID of the existing feature service>",
    "overwrite": true
  }
}
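The outputName shapes above can be produced by a small helper such as the hypothetical one below; the service name in the usage line is a placeholder. Overwriting requires ArcGIS Online or ArcGIS Enterprise 11.1 or later.

```python
import json


def make_output_name(service_name=None, overwrite_item_id=None):
    """Build the outputName value as a JSON string.

    service_name only             -> create a new feature service
    overwrite_item_id only        -> overwrite an existing service by item ID
    both                          -> overwrite and also name the service
    """
    if service_name is None and overwrite_item_id is None:
        raise ValueError("provide a service name, an item ID to overwrite, or both")
    output = {}
    if service_name is not None:
        output["serviceProperties"] = {"name": service_name}
    if overwrite_item_id is not None:
        output["itemProperties"] = {"itemId": overwrite_item_id, "overwrite": True}
    return json.dumps(output)


print(make_output_name("OutlierResults"))
print(make_output_name(overwrite_item_id="<itemID of the existing feature service>"))
```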

context

The context parameter contains the following additional settings that affect task operation:

  • Extent (extent)—A bounding box that defines the analysis area. Only input features that intersect the bounding box will be analyzed.
  • Output spatial reference (outSR)—The output features will be projected into the output spatial reference.
  • Random number seed (randomGenerator)—A string containing the integer seed value and the generator type used to initialize the random number generator. The generator type is always MERSENNE_TWISTER; for example, "13 MERSENNE_TWISTER". This parameter is available in ArcGIS Enterprise 11.2 or later.

Syntax:

{
"extent" : {extent},
"outSR" : {"wkid": 4326},
"randomGenerator" : "13 MERSENNE_TWISTER"
}
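A sketch of assembling the context value in Python, assuming the setting names shown above. The make_context and parse_random_generator helpers are hypothetical. The last lines only illustrate why a fixed seed makes permutation results repeatable; Python's own Mersenne Twister will not reproduce the service's random draws.

```python
import json
import random

GENERATOR_TYPE = "MERSENNE_TWISTER"  # the only seed type the documentation lists


def make_context(extent=None, out_wkid=None, seed=None):
    """Build the context parameter as a JSON string, omitting unset settings."""
    context = {}
    if extent is not None:
        context["extent"] = extent  # passed through unchanged (an envelope object)
    if out_wkid is not None:
        context["outSR"] = {"wkid": out_wkid}
    if seed is not None:
        context["randomGenerator"] = f"{int(seed)} {GENERATOR_TYPE}"
    return json.dumps(context)


def parse_random_generator(value):
    """Split a string such as "13 MERSENNE_TWISTER" into (13, "MERSENNE_TWISTER")."""
    seed_text, generator = value.split()
    if generator != GENERATOR_TYPE:
        raise ValueError(f"unexpected generator type: {generator!r}")
    return int(seed_text), generator


print(make_context(out_wkid=4326, seed=13))
# {"outSR": {"wkid": 4326}, "randomGenerator": "13 MERSENNE_TWISTER"}

# Illustration only: Python's random module is also a Mersenne Twister, so one
# seed always reproduces the same shuffle. It will not match the service's draws.
first, second = list(range(10)), list(range(10))
random.Random(13).shuffle(first)
random.Random(13).shuffle(second)
assert first == second
```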

f

The response format. The default response format is html.

Values: html | json
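Putting several parameters together, a submitJob request is typically sent as a form-encoded POST in which object-valued parameters are JSON strings. The sketch below assumes that convention and sets f to json so that the response can be parsed; the builder function is hypothetical, passing token as a form field is an assumption, and <analysis url> is left as the placeholder used throughout this topic.

```python
import json
from urllib.parse import parse_qs, urlencode


def build_submit_job(analysis_url, analysis_layer_url, analysis_field=None,
                     permutations=None, output_name=None, token=None):
    """Return (url, body) for a FindOutliers submitJob POST request.

    JSON-valued parameters are serialized with json.dumps, then the whole
    parameter set is form-encoded. Only a subset of parameters is covered.
    """
    params = {"analysisLayer": json.dumps({"url": analysis_layer_url}), "f": "json"}
    if analysis_field is not None:
        params["analysisField"] = analysis_field
    if permutations is not None:
        params["permutations"] = permutations  # "Speed", "Balance", or "Precision"
    if output_name is not None:
        params["outputName"] = json.dumps({"serviceProperties": {"name": output_name}})
    if token is not None:
        params["token"] = token  # assumption: token sent as a form field
    url = analysis_url.rstrip("/") + "/FindOutliers/submitJob"
    return url, urlencode(params).encode("utf-8")


url, body = build_submit_job(
    "http://<analysis url>",
    "http://<feature service layer url>",
    analysis_field="Average_Score",
    permutations="Precision",
)
print(url)
print(parse_qs(body.decode("utf-8")))
# To send it: urllib.request.urlopen(urllib.request.Request(url, data=body))
```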

Response

When you submit a request, the service assigns a unique job ID for the transaction.

Syntax:

{
"jobId": "<unique job identifier>",
"jobStatus": "<job status>"
}

After the initial request is submitted, you can use the job ID to check the status of the job and messages as described in Check job status. Once the job has successfully completed, use the job ID to retrieve the results. To track the status, you can make a request of the following form:

http://<analysis url>/FindOutliers/jobs/<jobId>
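A minimal polling loop, assuming the job resource returns JSON with a jobStatus field like the submit response. The terminal status names other than esriJobSucceeded are assumed from the general geoprocessing job model, and both helper functions are hypothetical. The status getter is passed in so that the loop can be exercised without a network call.

```python
import json
import time
from urllib.parse import urlencode
from urllib.request import urlopen

# Statuses after which polling can stop; only esriJobSucceeded yields results.
TERMINAL_STATUSES = {"esriJobSucceeded", "esriJobFailed", "esriJobCancelled", "esriJobTimedOut"}


def fetch_job_status(analysis_url, job_id, token=None):
    """GET <analysis url>/FindOutliers/jobs/<jobId> as JSON and return jobStatus."""
    params = {"f": "json"}
    if token is not None:
        params["token"] = token
    url = f"{analysis_url.rstrip('/')}/FindOutliers/jobs/{job_id}?{urlencode(params)}"
    with urlopen(url) as response:
        return json.load(response)["jobStatus"]


def wait_for_job(get_status, poll_seconds=5.0, timeout_seconds=600.0, sleep=time.sleep):
    """Poll get_status() until a terminal status is returned or the timeout elapses."""
    waited = 0.0
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        if waited >= timeout_seconds:
            raise TimeoutError(f"job still {status} after {waited:.0f} s")
        sleep(poll_seconds)
        waited += poll_seconds


# Usage (network call, not run here):
# status = wait_for_job(lambda: fetch_job_status("http://<analysis url>", job_id, token))
```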

Analysis results

When the status of the job request is esriJobSucceeded, you can access the results of the analysis by making a request of the following form:

http://<analysis url>/FindOutliers/jobs/<jobId>/results/outlierResultLayer?token=<your token>&f=json

outlierResultLayer

The result of the FindOutliers task is a feature layer that provides information about statistically significant outlier features.

If the input analysis layer (analysisLayer) contains points and an analysisField value is provided, the result will be points. For all other scenarios (polygons, or points when no analysisField value is provided), the output will be polygons.

The result layer has the following attributes:

  • ID field (FID)—An integer field with a unique value for every feature.
  • AnalysisField or Join_Count—When an analysisField value is provided, it is copied to the result with the same name and properties. When no analysisField value is provided, an integer field is created with values reflecting the number of points in each aggregation polygon. If an aggregationPolygonLayer value is provided, those polygons are used for aggregation. Otherwise, a fishnet or hexagon polygon mesh is created to overlay the points, and the cells of the mesh are used as aggregation polygons.
  • Hot/Cold Intensity—A numeric (double) field with standard deviations representing the intensity of spatial clustering.
  • Confidence Bin—A field for symbolizing the results. Values range from -3 to +3 and reflect statistical significance. Use blue to draw values less than zero and red to draw values greater than zero. Use the darkest blue for features equal to -3, medium blue for -2, and light blue for -1. Use the darkest red for features equal to 3, medium red for 2, and the lightest red or pink for 1. A confidence bin value of zero means no statistically significant clustering, and the features should be drawn in white or beige (the color should be neutral to not draw attention).

Example:

{"url": "http://<analysis url>/FindOutliers/jobs/<jobId>/results/outlierResultLayer"}

The result has properties for parameter name, data type, and value. The contents of value depend on the outputName parameter value provided in the initial request.

  • If an outputName value was provided, value contains the URL to the feature service layer.

    {
    "paramName":"outlierResultLayer", 
    "dataType":"GPString",
    "value":{"url":"<hosted featureservice layer url>"}
    }

  • If no outputName value was provided, value contains a feature collection.

    {
    "paramName": "outlierResultLayer",
    "dataType": "GPString",
    "value":{"layerDefinition": {}, "featureSet": {} }
    }
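Because value takes one of two shapes, client code can branch on its keys. The hypothetical helper below assumes the results response has already been parsed into a Python dictionary.

```python
def read_outlier_result(result):
    """Classify a parsed outlierResultLayer response.

    Returns ("url", <layer url>) when outputName was provided, or
    ("featureCollection", <value dict>) when it was not.
    """
    if result.get("paramName") != "outlierResultLayer":
        raise ValueError("expected the outlierResultLayer parameter")
    value = result["value"]
    if "url" in value:
        return "url", value["url"]
    if "featureSet" in value:
        return "featureCollection", value
    raise ValueError("value is neither a layer URL nor a feature collection")


hosted = {"paramName": "outlierResultLayer", "dataType": "GPString",
          "value": {"url": "<hosted featureservice layer url>"}}
print(read_outlier_result(hosted))
```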

See Feature output for more information about how the result layer or collection is accessed.

processInfo

Contains strings that describe the workflow used by FindOutliers when calculating the result. These strings are used for reporting by the Find Outliers tool in Map Viewer in ArcGIS Online. You can create custom reports for your application using these strings. There are four parts in the returned JSON:

  • messageCode—The serial number for each unique message
  • message—Text that may contain parameters (in ${paramsName} format) that must be replaced by values
  • params—A dictionary of the keys and values to be inserted into ${paramsName} in the message
  • style—The style used to format the report produced by the Find Outliers tool in Map Viewer.

Example:

{
"messageCode" : "SS_84464",
"message" : "The optimal fixed distance band is based on the average distance to ${NumNeighs} nearest neighbors: ${DistanceInfo}",
"params" : { "NumNeighs" : "20" , "DistanceInfo" : "446.8956 Meters" },
"style" : "<ul><li></li></ul><br></br>"
}
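Python's string.Template uses the same ${name} placeholder syntax as message, so filling in a processInfo entry can be sketched as below. The render_process_info name is hypothetical; safe_substitute leaves any placeholder without a matching key unchanged instead of raising an error.

```python
import json
from string import Template


def render_process_info(entry):
    """Replace the ${paramsName} placeholders of one processInfo entry."""
    return Template(entry["message"]).safe_substitute(entry.get("params", {}))


entry = json.loads("""{
  "messageCode": "SS_84464",
  "message": "The optimal fixed distance band is based on the average distance to ${NumNeighs} nearest neighbors: ${DistanceInfo}",
  "params": {"NumNeighs": "20", "DistanceInfo": "446.8956 Meters"},
  "style": "<ul><li></li></ul><br></br>"
}""")
print(render_process_info(entry))
# The optimal fixed distance band is based on the average distance to 20 nearest neighbors: 446.8956 Meters
```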

Cluster and outlier analysis

The FindOutliers task calculates the Anselin Local Moran's I statistic for each feature in a feature layer. The service examines each feature in the context of all features, as well as in the context of its neighboring features. To be a statistically significant outlier, a feature must have a high value or incident count and be surrounded by features with low values or incident counts, or have a low value and be surrounded by features with high values. To be part of a statistically significant cluster, a feature must have a high value or incident count and be surrounded by other features with high values or incident counts, or have a low value and be surrounded by other features with low values. The local sum for a feature's neighbors is compared proportionally to the sum of all features, and the feature is also compared to its neighbors. When the local sum is very different from the expected local sum, or the feature's value is very different from the expected value, and that difference is too large to be the result of random chance, the result is a statistically significant z-score.
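To make the comparison concrete, the sketch below computes a local Moran's I value for each feature from its neighbor list, estimates a pseudo p-value by conditional permutation (feature i is held fixed while its neighbors are replaced by randomly drawn features), and labels significant features HH or LL (clusters) or HL or LH (high or low outliers). It uses the textbook Anselin form with binary distance-band weights and a 0.05 threshold; these choices, the function name, and the toy data are assumptions, and the service's exact normalization and significance handling may differ.

```python
import random


def local_morans_i(values, neighbors, permutations=199, alpha=0.05, seed=13):
    """Return one (I_i, pseudo_p, label) tuple per feature.

    values: analysis-field value per feature.
    neighbors: list of neighbor-index lists (binary weights).
    label: "HH"/"LL" = cluster, "HL"/"LH" = outlier, "" = not significant.
    """
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    m2 = sum(zi * zi for zi in z) / n  # variance of the attribute
    if m2 == 0:
        raise ValueError("all values are identical; Local Moran's I is undefined")
    rng = random.Random(seed)
    results = []
    for i in range(n):
        k = len(neighbors[i])
        if k == 0:
            results.append((0.0, 1.0, ""))  # no neighbors within the band
            continue
        lag = sum(z[j] for j in neighbors[i])  # sum of neighbor deviations
        observed = z[i] * lag / m2
        # Conditional permutation: keep feature i, draw k other features at random.
        others = [z[j] for j in range(n) if j != i]
        simulated = [z[i] * sum(rng.sample(others, k)) / m2 for _ in range(permutations)]
        sim_mean = sum(simulated) / permutations
        if observed >= sim_mean:
            extreme = sum(1 for s in simulated if s >= observed)
        else:
            extreme = sum(1 for s in simulated if s <= observed)
        p = (extreme + 1) / (permutations + 1)
        label = ""
        if p <= alpha:
            # First letter: feature above/below the mean; second: its neighbors.
            label = ("H" if z[i] > 0 else "L") + ("H" if lag > 0 else "L")
        results.append((observed, p, label))
    return results


# Toy data: feature 0 is high, and its only neighbors (1 and 2) are the lowest values.
values = [100, 0, 1] + list(range(10, 37))
neighbors = [[1, 2]] + [[] for _ in range(29)]
print(local_morans_i(values, neighbors)[0])
```

A negative I_i combined with a high value and low-valued neighbors is what the red "high outlier" symbol in the result layer represents.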

Potential applications

Applications include crime analysis, epidemiology, voting pattern analysis, economic geography, retail analysis, traffic incident analysis, and demographics. Examples include the following:

  • Where are anomalous spending patterns in Los Angeles?
  • Where are the sharpest boundaries between affluence and poverty in the study area?
  • Where are there unexpectedly high rates of diabetes across the study area?
  • Which stores are struggling or low performing despite being surrounded by high-performing stores?
  • Where are there unexpectedly high rates of insurance claims in the greater Phoenix area?
  • Are there counties in the United States with unusually low life expectancy?

Outlier analysis considerations

Consider the following when undertaking an outlier analysis:

  • What is the analysis field?

    The Find Outliers analysis tool assesses whether high or low values (the number of crimes, accident severity, or dollars spent on sporting goods, for example) cluster spatially. The field containing those values is the analysis field. When the analysis layer represents incident points and you are only interested in locating high and low incident densities, choose NO ANALYSIS FIELD (that is, do not provide an analysisField value). The FindOutliers task will then overlay the incident points with a fishnet or hexagon mesh (or use the aggregationPolygonLayer polygons, if provided) and count the number of incidents within each cell. The incident counts are then used as the analysis field.

  • What is the question?

    How you construct the analysis field determines the types of questions that can be answered. Are you most interested in determining where there are lots of incidents, or where high and low values for a particular attribute cluster spatially? If so, run the FindOutliers task on the raw values or raw incident counts. This type of analysis is particularly helpful for resource allocation problems. Alternatively (or in addition), you can locate areas with unexpectedly high values in relation to some other variable. If you are analyzing foreclosures, for example, you may expect more foreclosures in locations with more homes (that is, at some level, you expect the number of foreclosures to be a function of the number of houses). For each analysis layer area, divide the number of foreclosures by the number of homes (for example, with the dividedByField parameter), and run the FindOutliers task on this ratio. For this analysis, you are no longer asking "Where are there unusually low foreclosures?" You are asking "Where are there unexpectedly low numbers of foreclosures, given the number of homes?" By creating a rate or ratio prior to analysis, you can control for certain expected relationships (for example, that the number of foreclosures is a function of housing stock) and identify unexpected outlier areas.

  • Does the analysis layer contain at least 30 features?

    Results aren't reliable with fewer than 30 features.
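The aggregation described under the first consideration can be sketched as a simple fishnet count. The code assumes projected coordinates in the same units as the cell size and a square grid anchored at a chosen origin; the function name is hypothetical, and only cells that contain at least one point are returned.

```python
import math
from collections import Counter


def fishnet_counts(points, cell_size, origin=(0.0, 0.0)):
    """Count points per square cell; keys are (column, row) cell indices.

    Assumes projected coordinates in the same units as cell_size. Cells with
    no points are absent, as when no bounding polygons are supplied.
    """
    ox, oy = origin
    return Counter(
        (math.floor((x - ox) / cell_size), math.floor((y - oy) / cell_size))
        for x, y in points
    )


incidents = [(10, 10), (20, 480), (510, 30), (520, 40), (530, 990)]
print(fishnet_counts(incidents, 500))
```

The resulting counts play the role of the analysis field, which is also why the 30-feature guideline applies to the number of cells produced rather than to the number of incident points.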

Calculations

The Anselin Local Moran's I statistic is calculated for each feature.

Mathematics for the Local Moran's I statistic
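For reference, a commonly published form of the statistic is shown below, where x_i is the analysis-field value of feature i, \bar{X} is the mean over all n features, and w_{i,j} is the spatial weight between features i and j (1 for neighbors within a fixed distance band, 0 otherwise). This is the textbook Anselin form, not necessarily the exact normalization the service uses. A positive I_i indicates that a feature is similar to its neighbors (a cluster), and a negative I_i indicates that it differs from them (an outlier).

```latex
\begin{aligned}
z_i &= x_i - \bar{X}, \qquad m_2 = \frac{1}{n} \sum_{k=1}^{n} z_k^2, \\
I_i &= \frac{z_i}{m_2} \sum_{j=1,\, j \neq i}^{n} w_{i,j}\, z_j, \\
z_{I_i} &= \frac{I_i - \mathrm{E}[I_i]}{\sqrt{\mathrm{V}[I_i]}}, \qquad
\mathrm{E}[I_i] = -\frac{\sum_{j \neq i} w_{i,j}}{n - 1}.
\end{aligned}
```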
