geoanalytics.tracks.functions

after

geoanalytics.tracks.functions.after(track, offset)

Returns a linestring column representing the subset of the input track that comes after the offset distance or offset duration from the start of the track. An offset column can be created with ST_CreateDistance or ST_CreateDuration. You can also define an offset with a tuple containing a number and a unit (e.g., (10, “kilometers”) or (5, “minutes”)).

Returns null if a track is invalid.

Refer to the GeoAnalytics guide for examples and usage notes: TRK_After

Parameters:

track – Linestring column.
offset (pyspark.sql.Column/struct/tuple) – The offset distance or offset duration. The offset must be greater than zero.

Returns:

Linestring column representing the subset of the input track that comes after the offset distance or offset duration from the start of the track.

Return type:

pyspark.sql.Column
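
To make the slicing behavior concrete, here is a minimal pure-Python sketch. It is not the GeoAnalytics implementation: it assumes a track is a list of `(x, y, m)` vertices whose M-values are timestamps in seconds, handles only duration offsets, and uses the hypothetical names `after_duration` and `_lerp`. The cut point is linearly interpolated between the two vertices that bracket it.

```python
from typing import List, Tuple

# Assumption: a vertex is (x, y, m), where m is a timestamp in seconds.
Vertex = Tuple[float, float, float]


def _lerp(a: Vertex, b: Vertex, t: float) -> Vertex:
    """Linearly interpolate x, y and m between vertices a and b (0 <= t <= 1)."""
    return (a[0] + (b[0] - a[0]) * t,
            a[1] + (b[1] - a[1]) * t,
            a[2] + (b[2] - a[2]) * t)


def after_duration(track: List[Vertex], offset_seconds: float) -> List[Vertex]:
    """Hypothetical model of `after` for a duration offset."""
    if offset_seconds <= 0:
        raise ValueError("offset must be greater than zero")
    cut = track[0][2] + offset_seconds  # absolute time of the cut
    for i in range(len(track) - 1):
        a, b = track[i], track[i + 1]
        if a[2] <= cut < b[2]:
            t = (cut - a[2]) / (b[2] - a[2])
            # Start at the interpolated cut vertex, then keep every later vertex.
            return [_lerp(a, b, t)] + track[i + 1:]
    return []  # Assumption: an offset at or past the track end leaves nothing.
```

For example, `after_duration([(0, 0, 0), (10, 0, 10), (20, 0, 20)], 5)` begins at the interpolated vertex `(5.0, 0.0, 5.0)` and keeps the two later vertices.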

aggr_create_track

geoanalytics.tracks.functions.aggr_create_track(point, timestamp)

Operates on a grouped DataFrame and creates tracks from the points in each group, where each point represents an entity’s observed location at an instant. The output tracks are linestrings that follow the shortest path between consecutive observations. Each vertex in the linestring has a timestamp (stored as the M-value), and the vertices are ordered sequentially by time. You can group your DataFrame using DataFrame.groupBy() or with a GROUP BY clause in a SQL statement.

Refer to the GeoAnalytics guide for examples and usage notes: TRK_Aggr_CreateTrack

Parameters:

point (pyspark.sql.Column) – Point geometry column.
timestamp – Timestamp column to order points by.

Returns:

Linestring column representing the result tracks.

Return type:

pyspark.sql.Column
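
The group-then-order behavior can be sketched in plain Python. This is an illustrative model only: the observation layout `(entity_id, x, y, timestamp_seconds)` and the function name `create_tracks` are assumptions, and in Spark the grouping itself is done by `DataFrame.groupBy()` or `GROUP BY` as described above.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Assumption: each observation is (entity_id, x, y, timestamp_seconds).
Observation = Tuple[str, float, float, float]
Vertex = Tuple[float, float, float]  # (x, y, m)


def create_tracks(observations: Iterable[Observation]) -> Dict[str, List[Vertex]]:
    """Hypothetical model of grouping followed by aggr_create_track."""
    groups: Dict[str, List[Vertex]] = defaultdict(list)
    for entity_id, x, y, ts in observations:
        # The observation timestamp becomes the vertex's M-value.
        groups[entity_id].append((x, y, ts))
    # Order each entity's vertices sequentially by their M-value.
    return {key: sorted(vertices, key=lambda v: v[2]) for key, vertices in groups.items()}
```

Observations may arrive in any order; sorting by the M-value is what makes each track's vertices sequential.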

before

geoanalytics.tracks.functions.before(track, offset)

Returns a linestring column representing the subset of the input track that is between the track start and the offset distance or offset duration. An offset column can be created with ST_CreateDistance or ST_CreateDuration. You can also define an offset with a tuple containing a number and a unit (e.g., (10, “kilometers”) or (5, “minutes”)).

Returns null if a track is invalid.

Refer to the GeoAnalytics guide for examples and usage notes: TRK_Before

Parameters:

track – Linestring column.
offset (pyspark.sql.Column/struct/tuple) – The offset distance or offset duration. The offset must be greater than zero.

Returns:

Linestring column representing the subset of the input track that is between the track start and the offset distance or offset duration.

Return type:

pyspark.sql.Column
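
A distance offset can be modelled the same way by walking the segments and accumulating length. The sketch below is not the library's code; it assumes planar `(x, y, m)` vertices with x/y in meters, and the name `before_distance` is hypothetical. The M-value of the cut vertex is interpolated along with its coordinates.

```python
import math
from typing import List, Tuple

# Assumption: vertices are (x, y, m) in a projected system with x/y in meters.
Vertex = Tuple[float, float, float]


def before_distance(track: List[Vertex], offset_m: float) -> List[Vertex]:
    """Hypothetical model of `before` for a distance offset."""
    if offset_m <= 0:
        raise ValueError("offset must be greater than zero")
    out = [track[0]]
    travelled = 0.0
    for a, b in zip(track, track[1:]):
        seg = math.hypot(b[0] - a[0], b[1] - a[1])
        if travelled + seg >= offset_m:
            t = (offset_m - travelled) / seg
            # Interpolate x, y and m so the cut vertex keeps a consistent timestamp.
            out.append((a[0] + (b[0] - a[0]) * t,
                        a[1] + (b[1] - a[1]) * t,
                        a[2] + (b[2] - a[2]) * t))
            return out
        travelled += seg
        out.append(b)
    return out  # Assumption: an offset longer than the track returns the whole track.
```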

between

geoanalytics.tracks.functions.between(track, start_offset, end_offset)

Returns a linestring column representing the subset of the input track that comes between the two offset distances or offset durations. An offset column can be created with ST_CreateDistance or ST_CreateDuration. You can also define an offset with a tuple containing a number and a unit (e.g., (10, “kilometers”) or (5, “minutes”)).

Returns null if a track is invalid.

Refer to the GeoAnalytics guide for examples and usage notes: TRK_Between

Parameters:

track – Linestring column.
start_offset (pyspark.sql.Column/struct/tuple) – The start offset distance or start offset duration. The offset must be greater than zero.
end_offset (pyspark.sql.Column/struct/tuple) – The end offset distance or end offset duration. The offset must be greater than zero.

Returns:

Linestring column representing the subset of the input track that comes between the two offset distances or offset durations on the track.

Return type:

pyspark.sql.Column
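
The window between two duration offsets can be sketched as an entry cut, the original vertices strictly inside the window, and an exit cut. This is a hedged illustration, not the GeoAnalytics implementation: it assumes `(x, y, m)` vertices with M in seconds, the names `between_duration` and `_at_time` are hypothetical, and rejecting an end offset that does not exceed the start offset is an assumption of the sketch.

```python
from typing import List, Tuple

Vertex = Tuple[float, float, float]  # (x, y, m); m in seconds (assumption)


def _at_time(a: Vertex, b: Vertex, m: float) -> Vertex:
    """Interpolate the point on segment a->b whose M-value equals m."""
    t = (m - a[2]) / (b[2] - a[2])
    return (a[0] + (b[0] - a[0]) * t, a[1] + (b[1] - a[1]) * t, m)


def between_duration(track: List[Vertex], start_s: float, end_s: float) -> List[Vertex]:
    """Hypothetical model of `between` for duration offsets."""
    if start_s <= 0 or end_s <= 0:
        raise ValueError("offsets must be greater than zero")
    if end_s <= start_s:
        raise ValueError("end offset must exceed start offset")  # assumption
    lo = track[0][2] + start_s
    hi = track[0][2] + end_s
    out: List[Vertex] = []
    for a, b in zip(track, track[1:]):
        if a[2] <= lo < b[2]:
            out.append(_at_time(a, b, lo))  # entry cut
        if lo < b[2] < hi:
            out.append(b)  # original vertex strictly inside the window
        if a[2] < hi <= b[2]:
            out.append(_at_time(a, b, hi))  # exit cut
            break
    return out
```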

collapse_dwells

geoanalytics.tracks.functions.collapse_dwells(track, distance_threshold, duration_threshold)

Returns a linestring column representing the input track with its dwell segments collapsed.

TRK_CollapseDwells removes dwell segments from the input track and connects the remaining points, preserving the start and end points of each dwell.

Returns null if a track is invalid.

The ST_CreateDistance and ST_CreateDuration functions can be used to define the distance and duration thresholds. You can also define them with a tuple containing a number and a unit (e.g., (10, “kilometers”) or (5, “minutes”)).

Refer to the GeoAnalytics Engine guide for examples and usage notes: TRK_CollapseDwells

Parameters:

track (pyspark.sql.Column) – Linestring column.
distance_threshold (pyspark.sql.Column/struct/tuple) – The distance threshold used to define a dwell.
duration_threshold (pyspark.sql.Column/struct/tuple) – The duration threshold used to define a dwell.

Returns:

Linestring column representing the input track with the dwell segments collapsed.

Return type:

pyspark.sql.Column

distance_along

geoanalytics.tracks.functions.distance_along(track, point, max_deviation=0.0, output_unit=None)

Returns a double column representing the length of the track between the track start and where the point intersects the track. You can optionally specify max_deviation, the maximum distance a point can be from the track while still being considered on the track. The value is in the units of the track’s spatial reference.

If the input track and point do not have the same spatial reference, the point will be transformed to the spatial reference of the track.

The result is returned in the units specified by output_unit. When output_unit is None, the result is in the units of the input track’s spatial reference if it is projected; otherwise, the result is in meters.

Returns null if a track is invalid.

Refer to the GeoAnalytics guide for examples and usage notes: TRK_DistanceAlong

Parameters:

track – Linestring column.
point (pyspark.sql.Column) – Point column.
max_deviation (float/int, optional) – Numeric value representing the maximum distance a point can be from the track while still being considered on the track.
output_unit (str, optional) – The units of the result. Choose from Meters, Kilometers, Feet, Yards, Miles, or NauticalMiles.

Returns:

DoubleType column representing the length of the track between the track start and where the point intersects the track.

Return type:

pyspark.sql.Column
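
Conceptually, the point is projected onto each segment in turn, and the length travelled up to the first segment within max_deviation is returned. The sketch below models this in plain Python under stated assumptions: planar `(x, y, m)` vertices in meters, the first matching segment wins on self-intersecting tracks, a small 1e-9 tolerance absorbs floating-point error, and `None` stands in for a point that is not on the track. The function name is hypothetical.

```python
import math
from typing import List, Optional, Tuple

Vertex = Tuple[float, float, float]  # (x, y, m); planar meters (assumption)


def distance_along(track: List[Vertex], point: Tuple[float, float],
                   max_deviation: float = 0.0) -> Optional[float]:
    """Hypothetical model of `distance_along` in the track's planar units."""
    px, py = point
    travelled = 0.0
    for a, b in zip(track, track[1:]):
        dx, dy = b[0] - a[0], b[1] - a[1]
        seg = math.hypot(dx, dy)
        if seg == 0:
            continue  # zero-length segment adds no distance
        # Fraction along the segment of the point's orthogonal projection, clamped to [0, 1].
        t = max(0.0, min(1.0, ((px - a[0]) * dx + (py - a[1]) * dy) / (seg * seg)))
        qx, qy = a[0] + t * dx, a[1] + t * dy
        if math.hypot(px - qx, py - qy) <= max_deviation + 1e-9:
            return travelled + t * seg
        travelled += seg
    return None  # assumption: the point is not on the track
```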

distance_within

geoanalytics.tracks.functions.distance_within(track, geometry, output_unit=None)

Returns a double column representing the distance traveled within a geometry. The geometry type can be linestring or polygon. The result is returned in the units specified by output_unit. When output_unit is None, the result is in the units of the input track’s spatial reference if it is projected; otherwise, the result is in meters.

If the track and geometry columns are in different spatial references, the function automatically transforms the geometry into the spatial reference of the track.

Refer to the GeoAnalytics Engine guide for examples and usage notes: TRK_DistanceWithin

Parameters:

track (pyspark.sql.Column) – Linestring column.
geometry (pyspark.sql.Column) – Geometry column. The geometry type can be linestring or polygon.
output_unit (str, optional) – The units of the result. Choose from Meters, Kilometers, Feet, Yards, Miles, or NauticalMiles.

Returns:

DoubleType column representing the distance traveled within the geometry.

Return type:

pyspark.sql.Column

duration

geoanalytics.tracks.functions.duration(track, output_unit='seconds')

Returns a double column representing the duration of the input track. The duration is the difference between the first and last timestamps in the track. The result is returned in the units specified by output_unit. Returns null for invalid tracks.

Refer to the GeoAnalytics guide for examples and usage notes: TRK_Duration

Parameters:

track – Linestring column.
output_unit (str, optional) – The units of the result. Choose from Milliseconds, Seconds, Minutes, Hours, or Days.

Returns:

DoubleType column representing the track duration.

Return type:

pyspark.sql.Column
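
Since the duration is just the last M-value minus the first, the only real work is the unit conversion. A minimal sketch, assuming `(x, y, m)` vertices with M in seconds; the name `track_duration` and the lower-casing of unit names are choices of this illustration, not the library's API.

```python
from typing import List, Tuple

Vertex = Tuple[float, float, float]  # (x, y, m); m in seconds (assumption)

# Seconds in one unit of each supported output_unit (lower-cased names).
_SECONDS_PER_UNIT = {
    "milliseconds": 0.001,
    "seconds": 1.0,
    "minutes": 60.0,
    "hours": 3600.0,
    "days": 86400.0,
}


def track_duration(track: List[Vertex], output_unit: str = "seconds") -> float:
    """Hypothetical model of `duration`: last M-value minus first M-value."""
    seconds = track[-1][2] - track[0][2]
    return seconds / _SECONDS_PER_UNIT[output_unit.lower()]
```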

duration_along

geoanalytics.tracks.functions.duration_along(track, point, max_deviation=0.0, output_unit='seconds')

Returns a double column representing the duration of the track between the track start and where the point intersects the track. You can optionally specify max_deviation, the maximum distance a point can be from the track while still being considered on the track. The value is in the units of the track’s spatial reference.

The result is returned in the units specified by output_unit. The default is seconds.

If the input track and point do not have the same spatial reference, the point will be transformed to the spatial reference of the track.

Returns null if a track is invalid.

Refer to the GeoAnalytics guide for examples and usage notes: TRK_DurationAlong

Parameters:

track (pyspark.sql.Column) – Linestring column.
point (pyspark.sql.Column) – Point column.
max_deviation (float/int, optional) – Numeric value representing the maximum distance a point can be from the track while still being considered on the track.
output_unit (str, optional) – The units of the result. Choose from Milliseconds, Seconds, Minutes, Hours, or Days.

Returns:

DoubleType column representing the duration of the track between the track start and where the point intersects the track.

Return type:

pyspark.sql.Column

duration_within

geoanalytics.tracks.functions.duration_within(track, geometry, output_unit='seconds')

Returns a double column representing the duration of the track that intersects a linestring or polygon. The result is returned in the units specified by output_unit. The default is seconds.

If the track and geometry columns are in different spatial references, the function automatically transforms the geometry into the spatial reference of the track.

Refer to the GeoAnalytics Engine guide for examples and usage notes: TRK_DurationWithin

Parameters:

track (pyspark.sql.Column) – Linestring column.
geometry (pyspark.sql.Column) – Geometry column. The geometry type can be linestring or polygon.
output_unit (str, optional) – The units of the result. Choose from Milliseconds, Seconds, Minutes, Hours, or Days.

Returns:

DoubleType column representing the duration of the track that intersects the geometry.

Return type:

pyspark.sql.Column

end_timestamp

geoanalytics.tracks.functions.end_timestamp(track)

Returns a timestamp column containing the last timestamp of each input track. Returns null for invalid tracks.

Refer to the GeoAnalytics guide for examples and usage notes: TRK_EndTimestamp

Parameters:

track – Linestring column.

Returns:

Timestamp column with the end timestamp of each track.

Return type:

pyspark.sql.Column

entry_exit_points

geoanalytics.tracks.functions.entry_exit_points(track, geometry)

Returns an array of structs representing the points at which a track enters or exits a linestring or polygon. Each entry or exit point struct contains the following fields:

point: the geometry of the entry or exit point.
time: the timestamp of the entry or exit point, formatted as YYYY-MM-DD hh:mm:ss.s.
track_endpoint: a boolean value that is True if the entry or exit point is the start or end point of the track.

If the track and geometry columns are in different spatial references, the function automatically transforms the geometry into the spatial reference of the track. The spatial reference of the point geometry in the output is the same as the track.

Refer to the GeoAnalytics Engine guide for examples and usage notes: TRK_EntryExitPoints

Parameters:

track (pyspark.sql.Column) – Linestring column.
geometry (pyspark.sql.Column) – Geometry column. The geometry type can be linestring or polygon.

Returns:

Array column representing the points at which the track enters or exits the linestring or polygon.

Return type:

pyspark.sql.Column

find_dwells

geoanalytics.tracks.functions.find_dwells(track, distance_threshold, duration_threshold)

Returns an array of tracks, each track representing the points of the input track where the track is dwelling.

A track is considered to be dwelling when its points travel less than the distance threshold over a duration that exceeds the duration threshold. Each segment of the track where this condition is met is a dwell.

TRK_FindDwells returns an array of tracks, each representing a dwelling portion of the input track.

Returns null if a track is invalid.

The ST_CreateDistance and ST_CreateDuration functions can be used to define the distance and duration thresholds. You can also define them with a tuple containing a number and a unit (e.g., (10, “kilometers”) or (5, “minutes”)).

Refer to the GeoAnalytics Engine guide for examples and usage notes: TRK_FindDwells

Parameters:

track (pyspark.sql.Column) – Linestring column.
distance_threshold (pyspark.sql.Column/struct/tuple) – The distance threshold used to define a dwell.
duration_threshold (pyspark.sql.Column/struct/tuple) – The duration threshold used to define a dwell.

Returns:

Array column representing the tracks created from the dwell segments of the input track.

Return type:

pyspark.sql.Column
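
The exact dwell-detection algorithm is not described here, so the following is only a simple heuristic that follows the stated condition, not the GeoAnalytics method. It assumes planar `(x, y, m)` vertices in meters and seconds, grows a run while each next vertex stays within the distance threshold of the run's first (anchor) vertex, and keeps runs whose time span exceeds the duration threshold. The function name is hypothetical.

```python
import math
from typing import List, Tuple

Vertex = Tuple[float, float, float]  # (x, y, m); planar meters, m in seconds (assumption)


def find_dwells(track: List[Vertex], distance_threshold: float,
                duration_threshold: float) -> List[List[Vertex]]:
    """Hypothetical anchor-based dwell finder; not the library's algorithm."""
    dwells: List[List[Vertex]] = []
    i, n = 0, len(track)
    while i < n:
        ax, ay, am = track[i]
        j = i
        # Grow the run while the next vertex stays within the threshold of the anchor.
        while j + 1 < n and math.hypot(track[j + 1][0] - ax,
                                       track[j + 1][1] - ay) < distance_threshold:
            j += 1
        if track[j][2] - am > duration_threshold:
            dwells.append(track[i:j + 1])  # long enough: record the run as a dwell
            i = j + 1
        else:
            i += 1  # too short: try the next vertex as the anchor
    return dwells
```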

is_valid

geoanalytics.tracks.functions.is_valid(track)

Returns a boolean column where the result is True if the input linestring is a valid track; otherwise, it returns False. A linestring is a valid track if it is non-null, non-empty, and has M-values that are distinct and strictly increasing.

Refer to the GeoAnalytics guide for examples and usage notes: TRK_IsValid

Parameters:

track – Linestring column.

Returns:

Boolean column that is True if the input linestring is a valid track and False otherwise.

Return type:

pyspark.sql.Column
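
The validity rule maps directly onto a check of the M-values. This plain-Python sketch models the documented rule, not the library code; it assumes a track is a sequence of `(x, y, m)` vertices, treats a missing or NaN M-value as invalid (an assumption), and uses the hypothetical name `is_valid_track`.

```python
import math
from typing import Optional, Sequence, Tuple

Vertex = Tuple[float, float, Optional[float]]  # (x, y, m)


def is_valid_track(track: Optional[Sequence[Vertex]]) -> bool:
    """Hypothetical model of `is_valid`: non-null, non-empty, strictly increasing M."""
    if not track:  # covers None and an empty sequence
        return False
    ms = [v[2] for v in track]
    if any(m is None or math.isnan(m) for m in ms):
        return False  # assumption: a vertex without a usable M-value is invalid
    # Strictly increasing M-values are also necessarily distinct.
    return all(b > a for a, b in zip(ms, ms[1:]))
```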

lcss

geoanalytics.tracks.functions.lcss(track1, track2, search_distance, search_duration=None)

Returns a double column representing the size of the longest common subsequence between the two input tracks.

Returns null if a track is invalid.

The longest common subsequence is the number of pairs of observations, one from each track, that fall within the search distance and search duration thresholds.

The ST_CreateDistance and ST_CreateDuration functions can be used to define the search distance and search duration parameters. You can also define them with a tuple containing a number and a unit (e.g., (10, “kilometers”) or (5, “minutes”)).

TRK_LCSS uses planar distance calculations when the tracks are in a projected coordinate system and geodesic distance calculations when the tracks are in a geographic coordinate system. If one of the tracks has an unknown spatial reference, the function will use planar distance calculations.

Refer to the GeoAnalytics guide for examples and usage notes: TRK_LCSS

Parameters:

track1 (pyspark.sql.Column) – Linestring column.
track2 (pyspark.sql.Column) – Linestring column.
search_distance (pyspark.sql.Column/struct/tuple) – Distance used to calculate the longest common subsequence. It can be set using ST_CreateDistance.
search_duration (pyspark.sql.Column/struct/tuple, optional) – Duration used to calculate the longest common subsequence. It can be set using ST_CreateDuration.

Returns:

DoubleType column representing the size of the longest common subsequence between the two tracks.

Return type:

pyspark.sql.Column
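
The sketch below uses the classic dynamic-programming formulation of LCSS, in which matched pairs must preserve the order of both tracks. That ordering constraint, planar distances, and M-values in seconds are assumptions of this illustration rather than documented details of TRK_LCSS, and the function name is hypothetical. When `search_duration` is `None`, only the distance test applies.

```python
import math
from typing import List, Optional, Tuple

Vertex = Tuple[float, float, float]  # (x, y, m); planar units, m in seconds (assumption)


def lcss(track1: List[Vertex], track2: List[Vertex], search_distance: float,
         search_duration: Optional[float] = None) -> float:
    """Hypothetical classic LCSS between two tracks."""
    n, m = len(track1), len(track2)
    # dp[i][j] = LCSS of the first i vertices of track1 and the first j of track2.
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        a = track1[i - 1]
        for j in range(1, m + 1):
            b = track2[j - 1]
            close = math.hypot(a[0] - b[0], a[1] - b[1]) <= search_distance
            if search_duration is not None:
                close = close and abs(a[2] - b[2]) <= search_duration
            if close:
                dp[i][j] = dp[i - 1][j - 1] + 1  # extend the common subsequence
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return float(dp[n][m])
```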

length

geoanalytics.tracks.functions.length(track, output_unit=None)

Returns a double column representing the length of the input track. Returns null for invalid tracks.

The result is returned in the units specified by output_unit. When output_unit is None, the result is in the units of the input track’s spatial reference if it is projected; otherwise, the result is in meters.

Planar distance calculations are used if the input tracks have a projected spatial reference or no spatial reference. Chordal distance calculations are used if the input tracks have a geographic spatial reference. For more information, see Coordinate systems and transformations.

Refer to the GeoAnalytics guide for examples and usage notes: TRK_Length

Parameters:

track – Linestring column.
output_unit (str, optional) – The units of the result. Choose from Meters, Kilometers, Feet, Yards, Miles, or NauticalMiles.

Returns:

DoubleType column representing the track length.

Return type:

pyspark.sql.Column
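
For a projected track, the length is the sum of planar segment lengths followed by a unit conversion. This sketch covers only that planar case (it does not model the chordal calculation used for geographic tracks), assumes coordinates in meters, and uses the hypothetical name `track_length`. The conversion factors are the exact international definitions of each unit.

```python
import math
from typing import List, Optional, Tuple

Vertex = Tuple[float, float, float]  # (x, y, m); planar meters (assumption)

# Meters in one unit of each supported output_unit (lower-cased names).
_METERS_PER_UNIT = {
    "meters": 1.0,
    "kilometers": 1000.0,
    "feet": 0.3048,
    "yards": 0.9144,
    "miles": 1609.344,
    "nauticalmiles": 1852.0,
}


def track_length(track: List[Vertex], output_unit: Optional[str] = None) -> float:
    """Hypothetical planar model of `length`."""
    meters = sum(math.hypot(b[0] - a[0], b[1] - a[1]) for a, b in zip(track, track[1:]))
    if output_unit is None:
        return meters  # stays in the (assumed metric) spatial reference units
    return meters / _METERS_PER_UNIT[output_unit.lower()]
```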

query

geoanalytics.tracks.functions.query(track, offset)

Returns a point column representing the location that is the offset distance or offset duration along the input track, measured from the track start. An offset column can be created with ST_CreateDistance or ST_CreateDuration. You can also define an offset with a tuple containing a number and a unit (e.g., (10, “kilometers”) or (5, “minutes”)).

Returns null if a track is invalid.

Refer to the GeoAnalytics guide for examples and usage notes: TRK_Query

Parameters:

track – Linestring column.
offset (pyspark.sql.Column/struct/tuple) – The offset distance or offset duration. The offset must be greater than zero.

Returns:

Point column representing the location that is the offset distance or offset duration along the input track.

Return type:

pyspark.sql.Column
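
A duration query amounts to finding the segment whose time range contains the target time and interpolating a position on it. A minimal sketch, assuming `(x, y, m)` vertices with M in seconds; the name `query_duration` and the `None` result for an offset beyond the track end are assumptions of this illustration.

```python
from typing import List, Optional, Tuple

Vertex = Tuple[float, float, float]  # (x, y, m); m in seconds (assumption)


def query_duration(track: List[Vertex], offset_seconds: float) -> Optional[Tuple[float, float]]:
    """Hypothetical model of `query` for a duration offset; returns (x, y)."""
    if offset_seconds <= 0:
        raise ValueError("offset must be greater than zero")
    target = track[0][2] + offset_seconds
    for a, b in zip(track, track[1:]):
        if a[2] <= target <= b[2]:
            t = (target - a[2]) / (b[2] - a[2])
            # Linear interpolation of the location between the bracketing vertices.
            return (a[0] + (b[0] - a[0]) * t, a[1] + (b[1] - a[1]) * t)
    return None  # assumption: the offset lies beyond the track end
```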

speed

geoanalytics.tracks.functions.speed(track, output_unit='meterspersecond')

Returns a double column representing the speed of the input track. The speed is the length of the track (see TRK_Length) divided by the duration of the track (see TRK_Duration). The result is returned in the units specified by output_unit. Returns null for invalid tracks.

Refer to the GeoAnalytics guide for examples and usage notes: TRK_Speed

Parameters:

track – Linestring column.
output_unit (str, optional) – The units of the result. Choose from MetersPerSecond, MilesPerHour, NauticalMilesPerHour, FeetPerSecond, or KilometersPerHour.

Returns:

DoubleType column representing the track speed.

Return type:

pyspark.sql.Column
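
Because speed is defined as length divided by duration, it can be sketched by combining the two calculations and converting from meters per second. The sketch assumes planar `(x, y, m)` vertices in meters and seconds, uses the hypothetical name `track_speed`, and lower-cases unit names for lookup.

```python
import math
from typing import List, Tuple

Vertex = Tuple[float, float, float]  # (x, y, m); planar meters, m in seconds (assumption)

# Meters per second in one unit of each supported output_unit (lower-cased names).
_MPS_PER_UNIT = {
    "meterspersecond": 1.0,
    "kilometersperhour": 1000.0 / 3600.0,
    "milesperhour": 1609.344 / 3600.0,
    "nauticalmilesperhour": 1852.0 / 3600.0,
    "feetpersecond": 0.3048,
}


def track_speed(track: List[Vertex], output_unit: str = "meterspersecond") -> float:
    """Hypothetical model of `speed`: planar length divided by duration."""
    meters = sum(math.hypot(b[0] - a[0], b[1] - a[1]) for a, b in zip(track, track[1:]))
    seconds = track[-1][2] - track[0][2]  # positive for a valid track
    return (meters / seconds) / _MPS_PER_UNIT[output_unit.lower()]
```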

split_by_distance

geoanalytics.tracks.functions.split_by_distance(track, distance)

Returns an array of tracks created by splitting the input track into segments with each segment no longer than the specified distance. The distance can be created with ST_CreateDistance or with a tuple containing a number and a unit (e.g., (10, “kilometers”)).

Returns null if a track is invalid.

Refer to the GeoAnalytics guide for examples and usage notes: TRK_SplitByDistance

Parameters:

track – Linestring column.
distance (pyspark.sql.Column/struct/tuple) – The maximum length of result tracks. The distance must be greater than zero.

Returns:

Array column representing the tracks created by splitting the input track.

Return type:

pyspark.sql.Column

split_by_distance_gap

geoanalytics.tracks.functions.split_by_distance_gap(track, gap_distance)

Returns an array of tracks created by splitting the input track wherever two consecutive vertices are farther apart than the specified gap distance. The track is split by removing the segment between the two vertices. The distance can be created with ST_CreateDistance or with a tuple containing a number and a unit (e.g., (10, “kilometers”)).

Returns null if a track is invalid.

Refer to the GeoAnalytics guide for examples and usage notes: TRK_SplitByDistanceGap

Parameters:

track – Linestring column.
gap_distance (pyspark.sql.Column/struct/tuple) – The maximum distance allowed between two consecutive track vertices. The distance must be greater than zero.

Returns:

Array column representing the tracks created by splitting the input track.

Return type:

pyspark.sql.Column
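
Splitting on a distance gap drops each over-long segment and starts a new track at its far vertex. This plain-Python sketch assumes planar `(x, y, m)` vertices in meters and uses the hypothetical name `split_by_distance_gap`; whether the library keeps or discards single-vertex pieces is not stated, so this sketch simply keeps them.

```python
import math
from typing import List, Tuple

Vertex = Tuple[float, float, float]  # (x, y, m); planar meters (assumption)


def split_by_distance_gap(track: List[Vertex], gap_distance: float) -> List[List[Vertex]]:
    """Hypothetical model of `split_by_distance_gap`."""
    if gap_distance <= 0:
        raise ValueError("gap distance must be greater than zero")
    parts: List[List[Vertex]] = [[track[0]]]
    for a, b in zip(track, track[1:]):
        if math.hypot(b[0] - a[0], b[1] - a[1]) > gap_distance:
            parts.append([b])  # remove the long segment and start a new track
        else:
            parts[-1].append(b)
    return parts
```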

split_by_duration

geoanalytics.tracks.functions.split_by_duration(track, duration)

Returns an array of tracks created by splitting the input track into segments with each segment no longer than the specified duration. The duration can be created with ST_CreateDuration or with a tuple containing a number and a unit (e.g., (5, “minutes”)).

Returns null if a track is invalid.

Refer to the GeoAnalytics guide for examples and usage notes: TRK_SplitByDuration

Parameters:

track – Linestring column.
duration (pyspark.sql.Column/struct/tuple) – The maximum duration of result tracks. The duration must be greater than zero.

Returns:

Array column representing the tracks created by splitting the input track.

Return type:

pyspark.sql.Column

split_by_dwells

geoanalytics.tracks.functions.split_by_dwells(track, distance_threshold, duration_threshold)

Returns an array of tracks, each track representing a part of the input track where a dwell condition is not met.

TRK_SplitByDwells splits the input track by removing the dwell segments. The output is an array of tracks, each representing a portion of the input track where it is in motion.

Returns null if a track is invalid.

The ST_CreateDistance and ST_CreateDuration functions can be used to define the distance and duration thresholds. You can also define them with a tuple containing a number and a unit (e.g., (10, “kilometers”) or (5, “minutes”)).

Refer to the GeoAnalytics Engine guide for examples and usage notes: TRK_SplitByDwells

Parameters:

track (pyspark.sql.Column) – Linestring column.
distance_threshold (pyspark.sql.Column/struct/tuple) – The distance threshold used to define a dwell.
duration_threshold (pyspark.sql.Column/struct/tuple) – The duration threshold used to define a dwell.

Returns:

Array column representing the tracks created by removing the dwell segments from the input track and splitting it at the start and end of each dwell.

Return type:

pyspark.sql.Column

split_by_time_gap

geoanalytics.tracks.functions.split_by_time_gap(track, gap_duration)

Returns an array of tracks created by splitting the input track wherever the time between two consecutive vertices exceeds the specified gap duration. The track is split by removing the segment between the two vertices. The duration can be created with ST_CreateDuration or with a tuple containing a number and a unit (e.g., (5, “minutes”)).

Returns null if a track is invalid.

Refer to the GeoAnalytics guide for examples and usage notes: TRK_SplitByTimeGap

Parameters:

track – Linestring column.
gap_duration (pyspark.sql.Column/struct/tuple) – The maximum duration allowed between two consecutive track vertices. The duration must be greater than zero.

Returns:

Array column representing the tracks created by splitting the input track.

Return type:

pyspark.sql.Column
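
The time-gap variant compares consecutive M-values instead of positions. A minimal sketch, assuming `(x, y, m)` vertices with M in seconds; the name `split_by_time_gap` is hypothetical, and keeping single-vertex pieces is an assumption of this illustration.

```python
from typing import List, Tuple

Vertex = Tuple[float, float, float]  # (x, y, m); m in seconds (assumption)


def split_by_time_gap(track: List[Vertex], gap_seconds: float) -> List[List[Vertex]]:
    """Hypothetical model of `split_by_time_gap`."""
    if gap_seconds <= 0:
        raise ValueError("gap duration must be greater than zero")
    parts: List[List[Vertex]] = [[track[0]]]
    for a, b in zip(track, track[1:]):
        if b[2] - a[2] > gap_seconds:
            parts.append([b])  # remove the segment spanning the gap; start a new track
        else:
            parts[-1].append(b)
    return parts
```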

start_timestamp

geoanalytics.tracks.functions.start_timestamp(track)

Returns a timestamp column containing the first timestamp of each input track. Returns null for invalid tracks.

Refer to the GeoAnalytics guide for examples and usage notes: TRK_StartTimestamp

Parameters:

track – Linestring column.

Returns:

Timestamp column with the start timestamp of each track.

Return type:

pyspark.sql.Column