Detect incidents

Detect Incidents examines time-sequential records using a specified condition. Records that meet the condition are marked as incidents. The result contains the input DataFrame records, with additional fields stating if the record is an incident, the status of the incident, the duration of the incident, and a unique incident identifier.

Usage notes

The following table outlines terminology for Detect Incidents:

Term	Description
Track	A sequence of records that are time enabled with time type instant. Records are determined to be in the sequence by a track identifier field and are ordered by time. For example, a city could have a fleet of snowplow trucks that record their location every 10 minutes. The vehicle ID could represent the distinct tracks.
Incident	Records that meet a condition of interest.
Instant	A single moment in time represented by a start time and no end time. Inputs to Detect Incidents must have a time type of instant. Learn how to enable time on a data frame.
Interval	A duration of time represented by a start and an end time.
Record of interest	Describes the record being analyzed. During analysis, all records are analyzed.

Detect Incidents will create a new output. It will not modify the input DataFrame.
Detect Incidents can be completed on DataFrames that are tabular or have a geometry. The input DataFrame must have an instant timestamp.
Only input records that have a time value will be used. Any record that does not have a time value will be excluded from analysis.
Tracks are represented by the unique combination of one or more track fields. For example, if the flightID and Destination fields are used as track identifiers, the records ID007, Solden and ID007, Tokyo would be in different tracks since they have different values for the Destination field.
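The way a combination of track fields separates records can be sketched in plain Python. This is only an illustration of the grouping idea, not the tool's implementation; the `records` list, the `time` key, and the `group_into_tracks` helper are hypothetical names introduced for the example.

```python
from collections import defaultdict

# Hypothetical flight records; the field names follow the example in the text.
records = [
    {"flightID": "ID007", "Destination": "Solden", "time": 1},
    {"flightID": "ID007", "Destination": "Tokyo", "time": 1},
    {"flightID": "ID007", "Destination": "Solden", "time": 2},
]

def group_into_tracks(rows, track_fields):
    """Group rows by the tuple of their track field values, ordered by time."""
    tracks = defaultdict(list)
    for row in rows:
        key = tuple(row[field] for field in track_fields)
        tracks[key].append(row)
    for rows_in_track in tracks.values():
        rows_in_track.sort(key=lambda r: r["time"])
    return dict(tracks)

tracks = group_into_tracks(records, ["flightID", "Destination"])
print(sorted(tracks))  # [('ID007', 'Solden'), ('ID007', 'Tokyo')]
```

Both records share the same flightID, but they fall into two tracks because the Destination values differ.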
Conditions are created using Arcade expressions. A start condition is required, and an end condition is optional. If you only apply a start condition, the incident starts when the start condition evaluates to true and ends when the start condition evaluates to false. For example, if the values in a track are [0, 10, 15, 20, 40, 10, 12, -2, -12] and the start condition is $feature['values'] > 15, the incident flags are [0: False, 10: False, 15: False, 20: True, 40: True, 10: False, 12: False, -2: False, -12: False]; only values above 15 are incidents. If you also apply an end condition of $feature['values'] < 0, the results are [0: False, 10: False, 15: False, 20: True, 40: True, 10: True, 12: True, -2: False, -12: False]. In this case, the incident starts when the start condition is met, and each subsequent row is an incident until the end condition is true. These examples are outlined in the following table:

Position	1	2	3	4	5	6	7	8	9
Value	0	10	15	20	40	10	12	-2	-12
Start: `$feature['values'] > 15` and no End	False	False	False	True	True	False	False	False	False
Start: `$feature['values'] > 15` and End: `$feature['values'] < 0`	False	False	False	True	True	True	True	False	False
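The two rows of the table can be reproduced with a small plain-Python sketch. It models the conditions as Python callables rather than Arcade expressions, and it assumes a simple state machine for the case with an end condition; the `flag_incidents` helper is hypothetical and is not part of the GeoAnalytics Engine API.

```python
def flag_incidents(values, start, end=None):
    """Return one True/False incident flag per value of a single track.

    Assumed semantics, inferred from the table above:
    - no end condition: a row is an incident whenever start(row) is true
    - with an end condition: an incident begins when start is true and
      continues until a row satisfies end
    """
    flags = []
    active = False
    for v in values:
        if end is None:
            active = start(v)
        elif active:
            active = not end(v)
        else:
            active = start(v) and not end(v)
        flags.append(active)
    return flags

values = [0, 10, 15, 20, 40, 10, 12, -2, -12]
start_only = flag_incidents(values, start=lambda v: v > 15)
with_end = flag_incidents(values, start=lambda v: v > 15, end=lambda v: v < 0)

print(start_only)  # [False, False, False, True, True, False, False, False, False]
print(with_end)    # [False, False, False, True, True, True, True, False, False]
```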

Applying a time boundary split segments tracks at a defined interval. For example, if you set a value of 1 day using setTimeBoundarySplit(), and a value of 9:00 a.m. on January 1, 1990 using the time_boundary_reference parameter, each track will be split at 9:00 a.m. every day. This split reduces computing time, as it creates smaller tracks for analysis. If splitting by a recurring time boundary makes sense for your analysis, it is recommended for big data processing.
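To make the daily 9:00 a.m. example concrete, the following plain-Python sketch assigns timestamps to boundary-aligned segments. It only illustrates the arithmetic; the `boundary_segment` helper and the sample timestamps are hypothetical, and placing a record that falls exactly on a boundary into the later segment is an assumption rather than documented behavior.

```python
from datetime import datetime, timedelta

def boundary_segment(timestamp, reference, interval):
    """Return the index of the interval-aligned segment containing timestamp."""
    # Floor division of two timedeltas yields a whole number of intervals.
    return (timestamp - reference) // interval

reference = datetime(1990, 1, 1, 9, 0)  # 9:00 a.m. on January 1, 1990
interval = timedelta(days=1)            # split every 1 day

# Hypothetical record times from a single track
times = [
    datetime(2000, 8, 4, 8, 0),   # before the 9:00 a.m. boundary
    datetime(2000, 8, 4, 9, 0),   # exactly on the boundary
    datetime(2000, 8, 4, 20, 0),  # same day, after the boundary
    datetime(2000, 8, 5, 10, 0),  # after the next day's boundary
]
segments = [boundary_segment(t, reference, interval) for t in times]

# Records with the same segment index stay in the same piece of the track.
print(segments[1] == segments[2])      # True
print(segments[0] == segments[1])      # False
print(segments[3] - segments[1])       # 1
```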
Conditions can be track aware.
A track can have multiple incidents.
The duration of an incident is calculated in milliseconds as the time of the record of interest minus the start of an incident. The duration is only calculated if the record has a status of Started, OnGoing, or Ended. The duration for a row with the status of Started is always 0.
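The duration rule can be sketched in plain Python as follows. The `incident_durations_ms` helper, the sample times, and the sample statuses are hypothetical and only illustrate the calculation described above for a single track.

```python
from datetime import datetime, timedelta

def incident_durations_ms(times, statuses):
    """Return the duration in milliseconds for each row of a single track.

    The duration is the row's time minus the time of the row that started
    the incident; rows that have no incident status get None.
    """
    durations = []
    incident_start = None
    for t, status in zip(times, statuses):
        if status == "Started":
            incident_start = t
        if status in ("Started", "OnGoing", "Ended") and incident_start is not None:
            durations.append((t - incident_start) // timedelta(milliseconds=1))
        else:
            durations.append(None)
        if status == "Ended":
            incident_start = None
    return durations

# Hypothetical records taken every 6 hours
times = [datetime(2000, 8, 4, hour) for hour in (0, 6, 12, 18)]
statuses = [None, "Started", "OnGoing", "Ended"]

durations = incident_durations_ms(times, statuses)
print(durations)  # [None, 0, 21600000, 43200000]
```

Dividing a duration by 60 * 60 * 1000 converts it to hours, so 21600000 milliseconds is 6 hours.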

Limitations

The input must be a time-enabled DataFrame of type instant. Any records that do not have time will not be included in the output.

Results

Results will include the fields from the input DataFrame as well as the following additional fields:

Field	Description	Notes
`IncidentID`	A unique ID given to every row that is an incident.
`IncidentStatus`	A string field representing the status of an incident. The value will be `null` if the row is not an incident, `Started` if the row is the first row of an incident to meet the start condition, `OnGoing` if the row is still an incident, and `Ended` when a row is no longer an incident. A single track can have multiple segments of incidents. For example, a track with values `[0, 10, 15, 20, 40, 10, 12, -2, -12]` and a start condition of `$feature['values'] > 15` will result in `IncidentStatus` values of `[0: null, 10: null, 15: null, 20: Started, 40: OnGoing, 10: Ended, 12: null, -2: null, -12: null]`.
`IncidentDuration`	The length of time, in milliseconds, that an incident has lasted. This is calculated as the difference between the time of the record of interest and the time of the record that started the incident.
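The `IncidentStatus` example in the table can be reproduced with a short plain-Python sketch. It covers only the case with a start condition and no end condition, uses `None` in place of `null`, and the `incident_statuses` helper is a hypothetical name, not part of the tool.

```python
def incident_statuses(flags):
    """Map per-row incident flags of one track to IncidentStatus-like values."""
    statuses = []
    previous = False
    for flag in flags:
        if flag and not previous:
            statuses.append("Started")   # first row of a new incident
        elif flag and previous:
            statuses.append("OnGoing")   # incident continues
        elif previous:
            statuses.append("Ended")     # first row after the incident
        else:
            statuses.append(None)        # not part of any incident
        previous = flag
    return statuses

values = [0, 10, 15, 20, 40, 10, 12, -2, -12]
flags = [v > 15 for v in values]  # stands in for $feature['values'] > 15

statuses = incident_statuses(flags)
print(statuses)
# [None, None, None, 'Started', 'OnGoing', 'Ended', None, None, None]
```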

Performance notes

Improve the performance of Detect Incidents by doing one or more of the following:

Only analyze the records in your area of interest. You can pick the records of interest by using one of the following SQL functions:
- ST_Intersection: Clip to an area of interest represented by a polygon. This will modify your input records.
- ST_BboxIntersects: Select records that intersect an envelope.
- ST_EnvIntersects: Select records having an envelope that intersects the envelope of another geometry.
- ST_Intersects: Select records that intersect another dataset or area of interest represented by a polygon.
Specify Incidents as the value for the output_mode parameter in the setOutputMode() setter.
Split your tracks using setTimeBoundarySplit().

Syntax

For more details, go to the GeoAnalytics Engine API reference for detect incidents.

Setter	Description	Required
`run(dataframe)`	Runs the Detect Incidents tool using the provided DataFrame.	Yes
`setEndConditionExpression(end_condition_expression)`	Sets the condition used to end incidents. If there is an end condition, any record that meets the start condition expression and does not meet the end condition expression is an incident.	No
`setOutputMode(output_mode)`	Sets which observations are returned. Choose from `'All'` (default) or `'Incidents'`.	No
`setStartConditionExpression(start_condition_expression)`	Sets the condition used to start incidents. If there is no end condition expression specified, any record that meets this condition is an incident. If there is an end condition, any record that meets the start condition expression and does not meet the end condition expression is an incident.	Yes
`setTimeBoundarySplit(time_boundary_split, time_boundary_split_unit, time_boundary_reference=None)`	Sets boundaries to limit calculations to defined spans of time.	No
`setTrackFields(*track_fields)`	Sets one or more fields used to identify distinct tracks.	Yes

Examples

Run Detect Incidents

Python

# Log in
import geoanalytics
geoanalytics.auth(username="myusername", password="mypassword")

# Imports
from geoanalytics.tools import DetectIncidents
from geoanalytics.sql import functions as ST
from pyspark.sql import functions as F

# Path to the Atlantic hurricanes data
data_path = r"https://sampleserver6.arcgisonline.com/arcgis/rest/services/" \
            "Hurricanes/MapServer/0"

# Create an Atlantic hurricanes DataFrame
df = spark.read.format("feature-service").load(data_path) \
                    .st.set_time_fields("Date_Time")

# Run Detect Incidents to find where the windspeed of a hurricane is greater than
# or equal to 70 nautical miles per hour (knots)
result = DetectIncidents() \
   .setTrackFields("EVENTID") \
   .setStartConditionExpression(start_condition_expression="$feature.WINDSPEED >= 70") \
   .setOutputMode("Incidents") \
   .run(dataframe=df)

# Convert IncidentDuration from milliseconds to hours
result = result.withColumn("IncidentDuration_hours",
                           F.col("IncidentDuration") / (60 * 60 * 1000))

# Show the 5 OnGoing records for hurricane Alberto with the longest incident durations
result.filter(result["EVENTID"] == 'Alberto') \
      .filter(result["IncidentStatus"] == 'OnGoing') \
      .select("EVENTID", "IncidentStatus", "IncidentDuration","IncidentDuration_hours",
             F.date_format("Date_Time", "yyyy-MM-dd").alias("Date_Time")) \
      .sort("IncidentDuration_hours", "Date_Time", ascending=False).show(5, truncate=False)

Result
+-------+--------------+----------------+----------------------+----------+
|EVENTID|IncidentStatus|IncidentDuration|IncidentDuration_hours|Date_Time |
+-------+--------------+----------------+----------------------+----------+
|Alberto|OnGoing       |302400000       |84.0                  |2000-08-22|
|Alberto|OnGoing       |280800000       |78.0                  |2000-08-22|
|Alberto|OnGoing       |259200000       |72.0                  |2000-08-22|
|Alberto|OnGoing       |237600000       |66.0                  |2000-08-21|
|Alberto|OnGoing       |237600000       |66.0                  |2000-08-13|
+-------+--------------+----------------+----------------------+----------+
only showing top 5 rows

Plot results

Python

# Create a world continents DataFrame for plotting
continents_path = "https://services.arcgis.com/P3ePLMYs2RVChkJx/ArcGIS/rest/services/" \
                "World_Continents/FeatureServer/0"
continents_subset_df = spark.read.format("feature-service").load(continents_path) \
                    .where("""CONTINENT = 'North America' or CONTINENT = 'South America' or
                              CONTINENT = 'Africa' or CONTINENT = 'Europe'""")

# Plot the resulting incidents for hurricane Alberto with the continents data
continents_subset_plot = continents_subset_df.st.plot(facecolor="lightgreen",
                                                      edgecolors="black",
                                                      alpha=0.3,
                                                      figsize=(20,10),
                                                      basemap="light")
alberto_tracks_plot = df.where("EVENTID = 'Alberto'").st.plot(color="lightgrey",
                                                              edgecolors="darkgrey",
                                                              ax=continents_subset_plot)
result_plot = result.where("EVENTID = 'Alberto'").st.plot(cmap_values="IncidentStatus",
                                                          cmap="tab10",
                                                          is_categorical=True,
                                                          legend=True,
                                                          legend_kwds={"title": "Incident status",
                                                                       "loc" : "upper left"},
                                                          ax=alberto_tracks_plot)
result_plot.set_title("Detected incidents for hurricane Alberto where windspeed was greater\n"
 "than or equal to 70 nautical miles per hour (knots)", {'fontsize': 12})
result_plot.set_xlabel("X (Meters)")
result_plot.set_xlim(left=-20000000, right=8000000)
result_plot.set_ylabel("Y (Meters)")
result_plot.set_ylim(bottom=-8000000, top=20000000);

Plotting example for a Detect Incidents result, showing the records of hurricane Alberto with high wind speeds.

Version table

Release	Notes
1.0.0	Python tool introduced