geoanalytics.tracks.functions

after

geoanalytics.tracks.functions.after(track, offset)

Returns a linestring column representing the subset of the input track that comes after the offset distance or offset duration from the start of the track. An offset column can be created with ST_CreateDistance or ST_CreateDuration. You can also define an offset with a tuple containing a number and a unit (e.g., (10, "kilometers") or (5, "minutes")).

Returns null if a track is invalid.

Refer to the GeoAnalytics Engine guide for examples and usage notes: TRK_After

Parameters
  • track – Linestring column.

  • offset (pyspark.sql.Column) – The offset distance or offset duration. The offset must be greater than zero.

Returns

Linestring column representing the subset of the input track that comes after the offset distance or offset duration from the start of the track.

Return type

pyspark.sql.Column

aggr_create_track

geoanalytics.tracks.functions.aggr_create_track(point, time)

Operates on a grouped DataFrame and creates tracks using the points in each group, where each point represents an entity’s observed location at an instant. The output tracks are linestrings that represent the shortest path between each observation. Each vertex in the linestring has a timestamp (stored as the M-value) and the vertices are ordered sequentially. You can group your DataFrame using DataFrame.groupBy() or with a GROUP BY clause in a SQL statement.
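
The grouping and ordering described above can be pictured with a small pure-Python sketch. It is not the library's implementation and does not use Spark. The function name create_tracks and the row layout (entity_id, x, y, timestamp) are assumptions made for illustration only.

```python
from collections import defaultdict


def create_tracks(observations):
    """Group (entity_id, x, y, timestamp) rows into one track per entity.

    Each track is a list of (x, y, m) vertices, where m is the timestamp,
    ordered by m. This mirrors the description only; it is a hypothetical
    helper, not part of geoanalytics.
    """
    groups = defaultdict(list)
    for entity_id, x, y, timestamp in observations:
        groups[entity_id].append((x, y, timestamp))
    # Order each group's vertices by timestamp (the M-value).
    return {
        entity_id: sorted(vertices, key=lambda v: v[2])
        for entity_id, vertices in groups.items()
    }
```

In this sketch the dictionary key plays the role of the grouping column passed to DataFrame.groupBy().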

Refer to the GeoAnalytics Engine guide for examples and usage notes: TRK_Aggr_CreateTrack

Parameters
  • point (pyspark.sql.Column) – Point geometry column.

  • time – Timestamp column to order points by.

Returns

Linestring column representing the result tracks.

Return type

pyspark.sql.Column

before

geoanalytics.tracks.functions.before(track, offset)

Returns a linestring column representing the subset of the input track that is between the track start and the offset distance or offset duration. An offset column can be created with ST_CreateDistance or ST_CreateDuration. You can also define an offset with a tuple containing a number and a unit (e.g., (10, "kilometers") or (5, "minutes")).

Returns null if a track is invalid.

Refer to the GeoAnalytics Engine guide for examples and usage notes: TRK_Before

Parameters
  • track – Linestring column.

  • offset (pyspark.sql.Column) – The offset distance or offset duration. The offset must be greater than zero.

Returns

Linestring column representing the subset of the input track that is between the track start and the offset distance or offset duration.

Return type

pyspark.sql.Column

between

geoanalytics.tracks.functions.between(track, start_offset, end_offset)

Returns a linestring column representing the subset of the input track that comes between the two offset distances or offset durations. An offset column can be created with ST_CreateDistance or ST_CreateDuration. You can also define an offset with a tuple containing a number and a unit (e.g., (10, "kilometers") or (5, "minutes")).

Returns null if a track is invalid.
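
As a rough illustration of clipping by two duration offsets, the sketch below works on a plain list of (x, y, m) vertices with m in seconds. The names position_at and between_duration are hypothetical, linear interpolation between vertices is an assumption, and returning None when an offset falls outside the track is a choice made for this sketch rather than documented library behavior.

```python
def position_at(track, t):
    """Linearly interpolate the (x, y, m) vertex at time t, or None if outside."""
    for (x0, y0, m0), (x1, y1, m1) in zip(track, track[1:]):
        if m0 <= t <= m1:
            f = (t - m0) / (m1 - m0)  # valid tracks have m1 > m0
            return (x0 + f * (x1 - x0), y0 + f * (y1 - y0), float(t))
    return None


def between_duration(track, start_offset, end_offset):
    """Clip a track to the window between two offsets (seconds from start)."""
    if not 0 < start_offset < end_offset:
        return None
    t_start = track[0][2] + start_offset
    t_end = track[0][2] + end_offset
    first = position_at(track, t_start)
    last = position_at(track, t_end)
    if first is None or last is None:
        return None  # sketch choice: offsets must fall on the track
    # Keep original vertices strictly inside the window.
    inner = [v for v in track if t_start < v[2] < t_end]
    return [first] + inner + [last]
```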

Refer to the GeoAnalytics Engine guide for examples and usage notes: TRK_Between

Parameters
  • track – Linestring column.

  • start_offset (pyspark.sql.Column) – The start offset distance or start offset duration. The offset must be greater than zero.

  • end_offset (pyspark.sql.Column) – The end offset distance or end offset duration. The offset must be greater than zero.

Returns

Linestring column representing the subset of the input track that comes between the two offset distances or offset durations on the track.

Return type

pyspark.sql.Column

distance_along

geoanalytics.tracks.functions.distance_along(track, point, max_deviation=0.0, output_units=None)

Returns a double column representing the length of the track between the track start and where the point intersects the track. You can optionally specify max_deviation, the maximum distance a point can be from the track while still being considered on the track. The value is in the units of the track’s spatial reference.

If the input track and point do not have the same spatial reference, the point will be transformed to the spatial reference of the track.

The result is returned in the units specified by output_units. When output_units is None, the result is in the units of the input track’s spatial reference if it is projected; otherwise, the result is in meters.

Returns null if a track is invalid.
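
The role of max_deviation can be sketched in plain Python. The helper below is a hypothetical planar approximation: it snaps the point to the nearest location on the track and measures the length up to that location. Returning None when the point is farther away than max_deviation is an assumption of this sketch, not documented behavior, and no unit conversion or spatial reference handling is attempted.

```python
import math


def distance_along(track, point, max_deviation=0.0):
    """Planar length from the track start to the location nearest `point`.

    `track` is a list of (x, y, m) vertices and `point` is an (x, y) pair.
    """
    px, py = point
    best = None  # (deviation, distance along the track)
    travelled = 0.0
    for (x0, y0, _), (x1, y1, _) in zip(track, track[1:]):
        dx, dy = x1 - x0, y1 - y0
        seg_sq = dx * dx + dy * dy
        if seg_sq == 0:
            f = 0.0
        else:
            # Fraction along the segment of the closest location, clamped to [0, 1].
            f = ((px - x0) * dx + (py - y0) * dy) / seg_sq
            f = max(0.0, min(1.0, f))
        deviation = math.hypot(px - (x0 + f * dx), py - (y0 + f * dy))
        seg = math.sqrt(seg_sq)
        if best is None or deviation < best[0]:
            best = (deviation, travelled + f * seg)
        travelled += seg
    if best is None or best[0] > max_deviation:
        return None
    return best[1]
```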

Refer to the GeoAnalytics Engine guide for examples and usage notes: TRK_DistanceAlong

Parameters
  • track – Linestring column.

  • point (pyspark.sql.Column) – Point column.

  • max_deviation (float/int, optional) – Numeric value representing the maximum distance a point can be from the track while still being considered on the track.

  • output_units (str, optional) – The units of the result. Choose from Meters, Kilometers, Feet, Yards, Miles, or NauticalMiles.

Returns

DoubleType column representing the length of the track between the track start and where the point intersects the track.

Return type

pyspark.sql.Column

duration

geoanalytics.tracks.functions.duration(track, output_units='seconds')

Returns a double column representing the duration of the input track. The duration is the difference between the first and last timestamps in the track. The result is returned in the units specified by output_units. Returns null for invalid tracks.
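
The calculation itself is small enough to sketch in plain Python. The helper below is hypothetical and assumes M-values are stored as seconds; treating unit names case-insensitively is also an assumption.

```python
# Seconds in one of each supported output unit.
_SECONDS_PER_UNIT = {
    "milliseconds": 0.001,
    "seconds": 1.0,
    "minutes": 60.0,
    "hours": 3600.0,
    "days": 86400.0,
}


def track_duration(track, output_units="seconds"):
    """Difference between the last and first M-values of (x, y, m) vertices."""
    span_seconds = track[-1][2] - track[0][2]
    return span_seconds / _SECONDS_PER_UNIT[output_units.lower()]
```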

Refer to the GeoAnalytics Engine guide for examples and usage notes: TRK_Duration

Parameters
  • track – Linestring column.

  • output_units (str, optional) – The units of the result. Choose from Milliseconds, Seconds, Minutes, Hours, or Days.

Returns

DoubleType column representing the track duration.

Return type

pyspark.sql.Column

duration_along

geoanalytics.tracks.functions.duration_along(track, point, max_deviation=0.0, output_units='seconds')

Returns a double column representing the duration of the track between the track start and where the point intersects the track. You can optionally specify max_deviation, the maximum distance a point can be from the track while still being considered on the track. The value is in the units of the track’s spatial reference.

The result is returned in the units specified by output_units. The default is seconds.

If the input track and point do not have the same spatial reference, the point will be transformed to the spatial reference of the track.

Returns null if a track is invalid.

Refer to the GeoAnalytics Engine guide for examples and usage notes: TRK_DurationAlong

Parameters
  • track (pyspark.sql.Column) – Linestring column.

  • point (pyspark.sql.Column) – Point column.

  • max_deviation (float/int, optional) – Numeric value representing the maximum distance a point can be from the track while still being considered on the track.

  • output_units (str, optional) – The units of the result. Choose from Milliseconds, Seconds, Minutes, Hours, or Days.

Returns

DoubleType column representing the duration of the track between the track start and where the point intersects the track.

Return type

pyspark.sql.Column

end_timestamp

geoanalytics.tracks.functions.end_timestamp(track)

Returns a timestamp column containing the last timestamp of each input track. Returns null for invalid tracks.

Refer to the GeoAnalytics Engine guide for examples and usage notes: TRK_EndTimestamp

Parameters

track – Linestring column.

Returns

Timestamp column with the end timestamp of each track.

Return type

pyspark.sql.Column

is_valid

geoanalytics.tracks.functions.is_valid(track)

Returns a boolean column where the result is True if the input linestring is a valid track; otherwise, it returns False. A linestring is a valid track if it is non-null, non-empty, and has M-values that are distinct and strictly increasing.
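
The validity rule can be restated as a short pure-Python check. The function name is_valid_track and the list-of-(x, y, m) representation are assumptions for illustration; the library operates on linestring columns instead.

```python
def is_valid_track(track):
    """Return True if `track` is non-null, non-empty, and has increasing M-values.

    `track` is a list of (x, y, m) vertices, or None.
    """
    if not track:  # covers both None (null) and an empty list
        return False
    m_values = [vertex[2] for vertex in track]
    # Strictly increasing implies the M-values are also distinct.
    return all(a < b for a, b in zip(m_values, m_values[1:]))
```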

Refer to the GeoAnalytics Engine guide for examples and usage notes: TRK_IsValid

Parameters

track – Linestring column.

Returns

Boolean column indicating whether each input linestring is a valid track.

Return type

pyspark.sql.Column

lcss

geoanalytics.tracks.functions.lcss(track1, track2, search_distance, search_duration=None)

Returns a double column representing the size of the longest common subsequence between the two input tracks.

Returns null if a track is invalid.

The longest common subsequence is a count of the pairs of observations, one from each track, that fall within the search distance and search duration thresholds of each other.

The ST_CreateDistance and ST_CreateDuration functions can be used to define the search distance and search duration parameters. You can also define them with a tuple containing a number and a unit (e.g., (10, "kilometers") or (5, "minutes")).

TRK_LCSS uses planar distance calculations when the tracks are in a projected coordinate system and geodesic distance calculations when the tracks are in a geographic coordinate system. If one of the tracks has an unknown spatial reference, the function will use planar distance calculations.
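
One common way to compute a longest common subsequence with distance and time thresholds is the classic dynamic-programming recurrence, sketched below in plain Python. Whether TRK_LCSS uses exactly this recurrence is an assumption. The sketch uses planar distance only, and the function name lcss_size is hypothetical.

```python
import math


def lcss_size(track1, track2, search_distance, search_duration=None):
    """Size of the longest common subsequence of two (x, y, m) vertex lists.

    Two observations match when they are within search_distance of each
    other and, if search_duration is given, within that many seconds.
    """
    def matches(a, b):
        close = math.hypot(a[0] - b[0], a[1] - b[1]) <= search_distance
        timely = search_duration is None or abs(a[2] - b[2]) <= search_duration
        return close and timely

    n, m = len(track1), len(track2)
    # table[i][j] holds the LCSS size of track1[:i] and track2[:j].
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if matches(track1[i - 1], track2[j - 1]):
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return float(table[n][m])
```

Matched pairs must appear in the same order in both tracks, which is what distinguishes a subsequence count from a simple count of nearby pairs.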

Refer to the GeoAnalytics Engine guide for examples and usage notes: TRK_LCSS

Parameters
  • track1 (pyspark.sql.Column) – Linestring column.

  • track2 (pyspark.sql.Column) – Linestring column.

  • search_distance (pyspark.sql.Column/struct/tuple) – Distance used to calculate the longest common subsequence. It can be set using ST_CreateDistance.

  • search_duration (pyspark.sql.Column/struct/tuple) – Duration used to calculate the longest common subsequence. It can be set using ST_CreateDuration.

Returns

DoubleType column representing the size of the longest common subsequence between the two tracks.

Return type

pyspark.sql.Column

length

geoanalytics.tracks.functions.length(track, output_units=None)

Returns a double column representing the length of the input track. Returns null for invalid tracks.

The result is returned in the units specified by output_units. When output_units is None, the result is in the units of the input track’s spatial reference if it is projected; otherwise, the result is in meters.

Planar distance calculations are used if the input tracks have a projected spatial reference or no spatial reference. Chordal distance calculations are used if the input tracks have a geographic spatial reference. For more information, see Coordinate systems and transformations.

Refer to the GeoAnalytics Engine guide for examples and usage notes: TRK_Length

Parameters
  • track – Linestring column.

  • output_units (str, optional) – The units of the result. Choose from Meters, Kilometers, Feet, Yards, Miles, or NauticalMiles.

Returns

DoubleType column representing the track length.

Return type

pyspark.sql.Column

query

geoanalytics.tracks.functions.query(track, offset)

Returns a point column representing the location that is the offset distance or offset duration along the input track, measured from the track start. An offset column can be created with ST_CreateDistance or ST_CreateDuration. You can also define an offset with a tuple containing a number and a unit (e.g., (10, "kilometers") or (5, "minutes")).

Returns null if a track is invalid.
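
For a distance offset, the lookup amounts to walking the track until the offset is reached and interpolating within that segment. The sketch below is a hypothetical planar version on (x, y, m) vertex lists. Interpolating the M-value linearly and returning None when the offset is longer than the track are assumptions of this sketch.

```python
import math


def query_distance(track, offset):
    """Return the (x, y, m) location `offset` planar units from the track start."""
    if offset <= 0:
        return None
    travelled = 0.0
    for (x0, y0, m0), (x1, y1, m1) in zip(track, track[1:]):
        seg = math.hypot(x1 - x0, y1 - y0)
        if seg > 0 and travelled + seg >= offset:
            f = (offset - travelled) / seg  # fraction of this segment to cover
            return (x0 + f * (x1 - x0), y0 + f * (y1 - y0), m0 + f * (m1 - m0))
        travelled += seg
    return None  # sketch choice: offset exceeds the track length
```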

Refer to the GeoAnalytics Engine guide for examples and usage notes: TRK_Query

Parameters
  • track – Linestring column.

  • offset (pyspark.sql.Column) – The offset distance or offset duration. The offset must be greater than zero.

Returns

Point column representing the location that is the offset distance or offset duration along the input track.

Return type

pyspark.sql.Column

speed

geoanalytics.tracks.functions.speed(track, output_units='meterspersecond')

Returns a double column representing the speed of the input track. The speed is the length of the track (see TRK_Length) divided by the duration of the track (see TRK_Duration). The result is returned in the units specified by output_units. Returns null for invalid tracks.
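
The relationship between length, duration, and speed can be sketched in plain Python. The helper below is hypothetical and assumes planar coordinates in meters and M-values in seconds. Treating unit names case-insensitively is also an assumption.

```python
import math

# Meters per second represented by one of each supported output unit.
_METERS_PER_SECOND = {
    "meterspersecond": 1.0,
    "kilometersperhour": 1000.0 / 3600.0,
    "milesperhour": 1609.344 / 3600.0,
    "feetpersecond": 0.3048,
    "nauticalmilesperhour": 1852.0 / 3600.0,
}


def track_speed(track, output_units="meterspersecond"):
    """Track length divided by track duration for (x, y, m) vertices."""
    length = sum(
        math.hypot(x1 - x0, y1 - y0)
        for (x0, y0, _), (x1, y1, _) in zip(track, track[1:])
    )
    duration = track[-1][2] - track[0][2]  # positive for a valid track
    return (length / duration) / _METERS_PER_SECOND[output_units.lower()]
```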

Refer to the GeoAnalytics Engine guide for examples and usage notes: TRK_Speed

Parameters
  • track – Linestring column.

  • output_units (str, optional) – The units of the result. Choose from MetersPerSecond, MilesPerHour, NauticalMilesPerHour, FeetPerSecond, or KilometersPerHour.

Returns

DoubleType column representing the track speed.

Return type

pyspark.sql.Column

split_by_distance

geoanalytics.tracks.functions.split_by_distance(track, distance)

Returns an array of tracks created by splitting the input track into segments with each segment no longer than the specified distance. The distance can be created with ST_CreateDistance or with a tuple containing a number and a unit (e.g., (10, "kilometers")).

Returns null if a track is invalid.
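
One plausible reading of this splitting rule is to cut the track every time the given distance has been covered, inserting an interpolated vertex at each cut. The sketch below follows that reading on (x, y, m) vertex lists with planar distances. Where exactly the library places its cuts is not stated here, so the cut placement, the interpolated M-values, and the function name are all assumptions.

```python
import math


def split_by_distance(track, distance):
    """Split a track into parts that are each at most `distance` long."""
    if distance <= 0:
        return None
    parts, current = [], [track[0]]
    room = distance  # length still available in the current part
    for (x0, y0, m0), (x1, y1, m1) in zip(track, track[1:]):
        if room <= 0:  # the previous vertex landed exactly on a cut
            parts.append(current)
            current, room = [current[-1]], distance
        seg = math.hypot(x1 - x0, y1 - y0)
        done = 0.0  # length of this segment already consumed
        while seg - done > room:
            done += room
            f = done / seg
            cut = (x0 + f * (x1 - x0), y0 + f * (y1 - y0), m0 + f * (m1 - m0))
            current.append(cut)
            parts.append(current)
            current, room = [cut], distance  # next part starts at the cut
        room -= seg - done
        current.append((x1, y1, m1))
    parts.append(current)
    return parts
```

Each cut vertex ends one part and starts the next, so consecutive parts share an endpoint and no length is lost.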

Refer to the GeoAnalytics Engine guide for examples and usage notes: TRK_SplitByDistance

Parameters
  • track – Linestring column.

  • distance (pyspark.sql.Column) – The maximum length of result tracks. The distance must be greater than zero.

Returns

Array column representing the tracks created by splitting the input track.

Return type

pyspark.sql.Column

split_by_distance_gap

geoanalytics.tracks.functions.split_by_distance_gap(track, gap_distance)

Returns an array of tracks created by splitting the input track wherever two vertices are farther apart than the specified gap distance. The track is split by removing the segment between the two vertices. The distance can be created with ST_CreateDistance or with a tuple containing a number and a unit (e.g., (10, "kilometers")).

Returns null if a track is invalid.

Refer to the GeoAnalytics Engine guide for examples and usage notes: TRK_SplitByDistanceGap

Parameters
  • track – Linestring column.

  • gap_distance (pyspark.sql.Column) – The maximum distance allowed between two track vertices. The distance must be greater than zero.

Returns

Array column representing the tracks created by splitting the input track.

Return type

pyspark.sql.Column

split_by_duration

geoanalytics.tracks.functions.split_by_duration(track, duration)

Returns an array of tracks created by splitting the input track into segments with each segment no longer than the specified duration. The duration can be created with ST_CreateDuration or with a tuple containing a number and a unit (e.g., (5, "minutes")).

Returns null if a track is invalid.

Refer to the GeoAnalytics Engine guide for examples and usage notes: TRK_SplitByDuration

Parameters
  • track – Linestring column.

  • duration (pyspark.sql.Column) – The maximum duration of result tracks. The duration must be greater than zero.

Returns

Array column representing the tracks created by splitting the input track.

Return type

pyspark.sql.Column

split_by_time_gap

geoanalytics.tracks.functions.split_by_time_gap(track, gap_duration)

Returns an array of tracks created by splitting the input track wherever two vertices are farther apart than the specified gap duration. The track is split by removing the segment between the two vertices. The duration can be created with ST_CreateDuration or with a tuple containing a number and a unit (e.g., (5, "minutes")).

Returns null if a track is invalid.
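
Removing the segment between two vertices that are too far apart in time can be sketched as follows. The helper is hypothetical and works on (x, y, m) vertex lists with M-values in seconds. It keeps pieces that end up with a single vertex as they are; how the library treats such pieces is not stated here.

```python
def split_by_time_gap(track, gap_seconds):
    """Split a track wherever consecutive M-values differ by more than the gap."""
    if gap_seconds <= 0:
        return None
    parts, current = [], [track[0]]
    for previous, vertex in zip(track, track[1:]):
        if vertex[2] - previous[2] > gap_seconds:
            # Drop the segment between the two vertices and start a new part.
            parts.append(current)
            current = []
        current.append(vertex)
    parts.append(current)
    return parts
```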

Refer to the GeoAnalytics Engine guide for examples and usage notes: TRK_SplitByTimeGap

Parameters
  • track – Linestring column.

  • gap_duration (pyspark.sql.Column) – The maximum duration allowed between two track vertices. The duration must be greater than zero.

Returns

Array column representing the tracks created by splitting the input track.

Return type

pyspark.sql.Column

start_timestamp

geoanalytics.tracks.functions.start_timestamp(track)

Returns a timestamp column containing the first timestamp of each input track. Returns null for invalid tracks.

Refer to the GeoAnalytics Engine guide for examples and usage notes: TRK_StartTimestamp

Parameters

track – Linestring column.

Returns

Timestamp column with start timestamp of each track.

Return type

pyspark.sql.Column