Forest-based Classification And Regression

URL:: https://<geoanalytics-url>/ForestBasedClassificationAndRegression
Methods:: GET
Version Introduced:: 10.7

Description

The ForestBasedClassificationAndRegression operation creates models and generates predictions using an adaptation of Leo Breiman's random forest algorithm, which is a supervised machine learning method. Predictions can be performed for both categorical variables (classification) and continuous variables (regression). Explanatory variables can take the form of fields in the attribute table of the training features. In addition to validation of model performance on the training data, predictions can be made to another feature dataset.

The following are examples:

You have seagrass occurrence and a number of environmental explanatory variables that have been enriched using a multivariable grid to calculate distances to factories upstream and major ports. Future seagrass occurrence can be predicted based on future projections for those same environmental explanatory variables.
You have crop yield data at hundreds of farms across the country, along with other attributes at each of those farms (number of employees, acreage, and so on). Using this data, you can provide a set of features representing farms where you don't have crop yield (but you do have all of the other variables), and make a prediction about crop yield.
Housing values can be predicted based on the prices of houses sold in the current year. The sale price of homes sold, along with information about the number of bedrooms, distance to schools, proximity to major highways, average income, and crime counts, can be used to predict sale prices of similar homes.

Request parameters

Parameter	Details
`predictionType` (Required)	Specifies the operation mode of the tool. The tool can either train a model only to assess its performance, or train a model and predict features. Prediction types are as follows: `Train`: This is the default. A model will be trained, but no predictions will be generated. Use this option to assess the accuracy of your model before generating predictions. `TrainAndPredict`: Predictions or classifications will be generated for features. Explanatory variables must be provided for both the training features and the features to be predicted. The output of this option will be a feature service, model diagnostics, and an optional table of variable importance. REST examples: web example: `Train`; scripting example: `"predictionType": "TrainAndPredict"`
`inFeatures` (Required)	The features that will be used to train the model. This layer must include fields representing the variable to predict and the explanatory variables. Syntax: As described in Feature input, this parameter can be one of the following: a URL to a feature service layer with an optional filter to select specific features; a URL to a big data catalog service layer with an optional filter to select specific features; or a feature collection. REST examples: web example: `{"url": "https://myportal.domain.com/server/rest/services/Hosted/hurricaneTrack/FeatureServer/0", "filter": "Month = 'September'"}`; scripting example: `"inFeatures": {"url": "https://myportal.domain.com/server/rest/services/Hosted/hurricaneTrack/FeatureServer/0", "filter": "Month = 'September'"}`
`featuresToPredict` (Required if using `TrainAndPredict`)	A feature layer representing locations where predictions will be made. This layer must include explanatory variable fields that correspond to fields used in `inFeatures`. This parameter is only used when `predictionType` is `TrainAndPredict` and is required in that case. Syntax: As described in Feature input, this parameter can be one of the following: a URL to a feature service layer with an optional filter to select specific features; a URL to a big data catalog service layer with an optional filter to select specific features; or a feature collection. REST examples: web example: `{"url": "https://myportal.domain.com/server/rest/services/Hosted/hurricaneTrack/FeatureServer/0", "filter": "Month = 'September'"}`; scripting example: `"featuresToPredict": {"url": "https://myportal.domain.com/server/rest/services/Hosted/hurricaneTrack/FeatureServer/0", "filter": "Month = 'September'"}`
`variablePredict` (Required)	The variable from the `inFeatures` parameter containing the values to be used to train the model and a Boolean denoting whether it's categorical. This field contains known (training) values of the variable that will be used to predict at unknown locations. REST examples: web example: `{"fieldName": "variablePredict", "categorical": true}`; scripting example: `"variablePredict": {"fieldName": "variablePredict", "categorical": true}`
`explanatoryVariables` (Required)	A list of fields representing the explanatory variables, each with a Boolean value denoting whether the field is categorical. The explanatory variables help predict the value or category of the `variablePredict` parameter. Set `categorical` to `true` for any variable that represents classes or categories (such as land cover or presence or absence) and to `false` if the variable is continuous. In the examples, `fieldName` is the name of a field in `inFeatures` used to predict `variablePredict`. A string field should always be set as `true`, and a continuous value should always be set as `false`. REST examples: web example: `[{"fieldName": "CrimeType", "categorical": true},{"fieldName": "population", "categorical": false}]`; scripting example: `"explanatoryVariables": [{"fieldName": "isSunny", "categorical": true},{"fieldName": "isWeekend","categorical": true},{"fieldName": "hoursOutside", "categorical": false}]`
`numberOfTrees` (Optional)	The number of trees to create in the forest model. More trees will generally result in more accurate model prediction, but the model will take longer to calculate. The default number of trees is 100. Values must be greater than 0. REST examples: web example: `20`; scripting example: `"numberOfTrees": 50`
`minimumLeafSize` (Optional)	The minimum number of observations required to keep a leaf (that is, the terminal node on a tree without further splits). The default minimum for regression is 5, and the default for classification is 1. For very large data, increasing these numbers will decrease the run time of the tool. REST examples: web example: `3`; scripting example: `"minimumLeafSize": 6`
`maximumTreeDepth` (Optional)	The maximum number of splits that will be made down a tree. With a large maximum depth, more splits will be created, which may increase the chance of overfitting the model. The default is data driven and depends on the number of trees created and the number of variables included. REST examples: web example: `14`; scripting example: `"maximumTreeDepth": 10`
`sampleSize` (Optional)	The percentage of the `inFeatures` used for each decision tree. The default is 100 percent of the data. Each decision tree in the forest is created using a random sample or subset (approximately two-thirds) of the training data available. Using a lower percentage of the input data for each decision tree increases the speed of the tool for very large datasets. REST examples: web example: `95`; scripting example: `"sampleSize": 70`
`randomVariables` (Optional)	The number of explanatory variables used to create each decision tree. Each of the decision trees in the forest is created using a random subset of the explanatory variables specified. Increasing the number of variables used in each decision tree will increase the chances of overfitting your model, particularly if there are one or two dominant variables. A common practice is to use the square root of the total number of explanatory variables if your `variablePredict` is categorical, or to divide the total number of explanatory variables by 3 if `variablePredict` is numeric. REST examples: web example: `3`; scripting example: `"randomVariables": 2`
`percentageForValidation` (Optional)	The percentage (between 0 percent and 50 percent) of `inFeatures` to reserve as the test dataset for validation. The default is 10 percent. The model will be trained without this random subset of data, and the observed values for those features will be compared to the predicted values. REST examples: web example: `15`; scripting example: `"percentageForValidation": 45`
`createVariableOfImportanceTable` (Optional)	A Boolean that specifies whether an output table will be generated that contains information describing the importance of each explanatory variable used in the model created. Values: `true` | `false` REST examples: web example: `false`; scripting example: `"createVariableOfImportanceTable": false`
`explanatoryVariableMatching` (Optional)	A list of the `explanatoryVariables` specified from the `inFeatures` and their corresponding fields from the `featuresToPredict`. By default, if an explanatory variable is not mapped, it will match to a field with the same name in the `featuresToPredict`. This parameter is only used if there is a `featuresToPredict` input. You do not need to use it if the names and types of the fields match between your two input datasets. `predictionLayerField` is the name of a field specified in the `explanatoryVariables` parameter and `trainingLayerField` is the field that will match the field in `explanatoryVariables`. REST examples: web example: `[{"predictionLayerField": "CrimeType", "trainingLayerField": "TypeOfCrime"},{"predictionLayerField": "population", "trainingLayerField": "population"}]`; scripting example: `"explanatoryVariableMatching": [{"predictionLayerField": "isSunny", "trainingLayerField": "isSunny2010"}]`
`outputTrainedName` (Required)	The task will create a feature service of the results. This parameter defines the name of that service. REST examples: web example: `myOutput`; scripting example: `"outputTrainedName": "myOutput"`
`context` (Optional)	The `context` parameter contains additional settings that affect task execution. For this task, there are four settings: Extent (`extent`): A bounding box that defines the analysis area. Only those features that intersect the bounding box will be analyzed. Processing spatial reference (`processSR`): The features will be projected into this coordinate system for analysis. Output spatial reference (`outSR`): The features will be projected into this coordinate system after the analysis to be saved. The output spatial reference for the spatiotemporal big data store is always WGS84. Data store (`dataStore`): Results will be saved to the specified data store. The default is the spatiotemporal big data store. Syntax: `{"extent": {extent}, "processSR": {spatial reference}, "outSR": {spatial reference}, "dataStore": {data store}}`
`f`	The response format. The default response format is `html`. Values: `html` | `json` | `pjson`
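
The constraints and rules of thumb in the parameter table can be checked on the client before a job is submitted. The following Python sketch is illustrative only: the helper names `suggest_random_variables` and `check_parameters` are hypothetical and not part of the GeoAnalytics API, and rounding the `randomVariables` heuristic down with a floor of 1 is an assumption of this sketch.

```python
import math


def suggest_random_variables(n_explanatory, categorical):
    """Rule-of-thumb value for randomVariables.

    Square root of the variable count when variablePredict is categorical,
    one third of the count when it is numeric. Rounding down and the floor
    of 1 are assumptions of this sketch, not documented behavior.
    """
    if n_explanatory < 1:
        raise ValueError("at least one explanatory variable is required")
    if categorical:
        value = math.isqrt(n_explanatory)
    else:
        value = n_explanatory // 3
    return max(1, value)


# Parameters marked (Required) in the table.
REQUIRED = ("predictionType", "inFeatures", "variablePredict",
            "explanatoryVariables", "outputTrainedName")


def check_parameters(params):
    """Return a list of problems found; an empty list means none were found."""
    problems = [f"missing required parameter: {name}"
                for name in REQUIRED if name not in params]
    prediction_type = params.get("predictionType")
    if prediction_type is not None and prediction_type not in ("Train", "TrainAndPredict"):
        problems.append("predictionType must be Train or TrainAndPredict")
    if prediction_type == "TrainAndPredict" and "featuresToPredict" not in params:
        problems.append("featuresToPredict is required for TrainAndPredict")
    if params.get("numberOfTrees", 1) <= 0:
        problems.append("numberOfTrees must be greater than 0")
    if not 0 <= params.get("percentageForValidation", 10) <= 50:
        problems.append("percentageForValidation must be between 0 and 50")
    return problems
```

For example, with nine explanatory variables the heuristic suggests 3 for both a categorical and a numeric `variablePredict`.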

Example usage

Below is a sample request URL for ForestBasedClassificationAndRegression:

https://machine.domain.com/webadaptor/rest/services/System/GeoAnalyticsTools/GPServer/ForestBasedClassificationAndRegression/submitJob?
predictionType=Train&inFeatures={"url":"https://webadaptor.domain.com/server/rest/services/Hurricane/hurricaneTrack/0"}&featuresToPredict={"url":"https://webadaptor.domain.com/server/rest/services/USA/cities/0"}&variablePredict={"fieldName":"shelterCapacity","categorical":true}&explanatoryVariables=[{"fieldName":"townDensity","categorical":true}]&numberOfTrees=20&minimumLeafSize=6&maximumTreeDepth=10&sampleSize=95&randomVariables=3&percentageForValidation=10&createVariableOfImportanceTable=false&explanatoryVariableMatching=[{"predictionLayerField":"Hurricane2019","trainingLayerField":"hurricanesIn2019"},{"predictionLayerField":"ShelterLocations","trainingLayerField":"CorpusChristiShelters"}]&outputTrainedName=myOutput&context={"extent":{"xmin":-122.68,"ymin":45.53,"xmax":-122.45,"ymax":45.6,"spatialReference":{"wkid":4326}}}&f=json
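
A request URL of this kind is easier to assemble programmatically, because the JSON-valued parameters must be serialized and percent-encoded. The following Python sketch shows one way to do that with the standard library; the function name `build_submit_url` is hypothetical, and the host and layer URLs are the placeholder values used in this topic.

```python
import json
from urllib.parse import urlencode


def build_submit_url(tool_url, params):
    """Build a submitJob URL, serializing nested values as compact JSON."""
    encoded = {}
    for key, value in params.items():
        if isinstance(value, bool):
            # JSON-style booleans, as used in the REST examples.
            encoded[key] = "true" if value else "false"
        elif isinstance(value, (dict, list)):
            encoded[key] = json.dumps(value, separators=(",", ":"))
        else:
            encoded[key] = str(value)
    # urlencode percent-encodes quotes, braces, and other reserved characters.
    return f"{tool_url}/submitJob?{urlencode(encoded)}"


url = build_submit_url(
    "https://machine.domain.com/webadaptor/rest/services/System/"
    "GeoAnalyticsTools/GPServer/ForestBasedClassificationAndRegression",
    {
        "predictionType": "Train",
        "inFeatures": {"url": "https://webadaptor.domain.com/server/rest/services/Hurricane/hurricaneTrack/0"},
        "variablePredict": {"fieldName": "shelterCapacity", "categorical": True},
        "explanatoryVariables": [{"fieldName": "townDensity", "categorical": True}],
        "numberOfTrees": 20,
        "createVariableOfImportanceTable": False,
        "outputTrainedName": "myOutput",
        "f": "json",
    },
)
```

Passing Python dictionaries and lists rather than hand-written JSON strings avoids unbalanced brackets in the query string.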

Response

When you submit a request, the service assigns a unique job ID for the transaction.

Syntax:

{
  "jobId": "<unique job identifier>",
  "jobStatus": "<job status>"
}

After the initial request is submitted, you can use jobId to periodically check the status of the job and messages as described in Check job status. Once the job has successfully completed, use jobId to retrieve the results. To track the status, you can make a request of the following form:

https://<analysis-url>/ForestBasedClassificationAndRegression/jobs/<jobId>
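
Periodic status checks can be wrapped in a small polling loop. The Python sketch below is a minimal illustration, not an official client: `wait_for_job` and `fetch_status_over_http` are hypothetical names, the polling interval and limit are arbitrary, and the set of terminal values assumes the standard geoprocessing job statuses (`esriJobSucceeded`, `esriJobFailed`, `esriJobTimedOut`, `esriJobCancelled`).

```python
import json
import time
from urllib.parse import urlencode
from urllib.request import urlopen

# Statuses after which the job will not change again (assumed list).
TERMINAL_STATUSES = {"esriJobSucceeded", "esriJobFailed",
                     "esriJobTimedOut", "esriJobCancelled"}


def fetch_status_over_http(job_url, token):
    """Request the job resource as JSON and return the parsed response."""
    query = urlencode({"f": "json", "token": token})
    with urlopen(f"{job_url}?{query}") as response:
        return json.load(response)


def wait_for_job(fetch_status, interval_seconds=5.0, max_polls=120, sleep=time.sleep):
    """Call fetch_status() until jobStatus is terminal; return the last response."""
    for attempt in range(max_polls):
        info = fetch_status()
        if info.get("jobStatus") in TERMINAL_STATUSES:
            return info
        if attempt < max_polls - 1:
            sleep(interval_seconds)
    raise TimeoutError(f"job still running after {max_polls} status checks")
```

A call might look like `wait_for_job(lambda: fetch_status_over_http(job_url, token))`, where `job_url` is the job URL of the form shown above. Passing the fetch function as an argument keeps the loop testable without a server.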

Access results

When the status of the job request is esriJobSucceeded, you can access the results of the analysis by making a request of the following form:

https://<analysis-url>/ForestBasedClassificationAndRegression/jobs/<jobId>/results/<response type>?token=<your token>&f=json

Response	Description
`outputTrained`	The input features that are fit to the model. The type of feature (point, line, or polygon) depends on the input layers. Request example: `{"url": "https://<analysis-url>/ForestBasedClassificationAndRegression/jobs/<jobId>/results/outputTrained"}` The result has properties for parameter name, data type, and value. The contents of `value` depend on the `outputTrainedName` parameter provided in the initial request. The `value` contains the URL of the feature service layer. Result example: `{"paramName":"outputTrained", "dataType":"GPRecordSet", "value":{"url":"<hosted featureservice layer url>"}}` See Feature output for more information about how the result layer is accessed.
`outputPredicted`	The features predicted using the model. The type of feature (table, point, line, or polygon) depends on the input layers. This result is optional and is only returned when `featuresToPredict` is provided as input. Request example: `{"url": "https://<analysis-url>/ForestBasedClassificationAndRegression/jobs/<jobId>/results/outputPredicted"}`
`variableOfImportance`	A table representing the variable importance from the model fit. This result is optional and is only returned when `createVariableOfImportanceTable` is `true`. Request example: `{"url": "https://<analysis-url>/ForestBasedClassificationAndRegression/jobs/<jobId>/results/variableOfImportance"}`
`processInfo`	The `processInfo` parameter contains strings that summarize the `ForestBasedClassificationAndRegression` result. These strings are used for reporting tool results. You can create custom reports for your application using these strings. There are four parts in the returned JSON as follows: `messageCode`: The serial number for each unique message. `message`: Text that may or may not contain parameters (in `${paramsName}` format) that must be replaced by values. `params`: A dictionary of the keys and values to be inserted into the `${paramsName}` parameter in the message. `style`: The formatting of the report produced by the Forest-based Classification And Regression tool in the map viewer. Example: `{"messageCode" : "SS_84507", "message" : ["Attribute", "Min", "Max", "SD", "Mean","Input"], "params" : {}, "style" : "<table><tr><th></th><th></th><th></th><th></th><th></th><th></th></tr>"}`
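
The `${paramsName}` placeholders in a `processInfo` message use the same syntax as Python's `string.Template`, so the substitution described above can be sketched with the standard library. The function name `render_process_message` is hypothetical, and the sample entry with its `EX_0001` code is invented for illustration rather than taken from real tool output.

```python
from string import Template


def render_process_message(entry):
    """Fill ${name} placeholders in a processInfo message from its params.

    Handles both a single string and a list of strings, since the example
    in this topic shows `message` as a list. Unknown placeholders are left
    as-is (safe_substitute) rather than raising an error.
    """
    params = entry.get("params", {})
    message = entry["message"]
    if isinstance(message, list):
        return [Template(str(part)).safe_substitute(params) for part in message]
    return Template(message).safe_substitute(params)


# Invented entry, for illustration only.
sample_entry = {
    "messageCode": "EX_0001",
    "message": "${count} features were used for training.",
    "params": {"count": "1200"},
    "style": "",
}
rendered = render_process_message(sample_entry)
```

The `style` value is not interpreted here; how it is combined with the rendered text is left to the reporting application.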