The sql module in GeoAnalytics Engine includes over 150 functions that extend the Spark SQL API by enabling spatial queries on DataFrame columns. These functions support creating geometries, operating on geometries, evaluating spatial relationships, summarizing geometries, and more. Like the functions in the pyspark.sql module, GeoAnalytics Engine SQL functions can be called with Python functions or in a PySpark SQL query statement.
In contrast to analysis tools, which are aware of all columns in a DataFrame and use all rows to compute a result if required, SQL functions typically operate on only one or two columns at a time.
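For example, a function that evaluates a spatial relationship reads only the geometry columns passed to it. The sketch below is illustrative rather than from the original docs: the column names are assumptions, and ST_Distance is assumed to be exposed through the Python functions module as shown.

```python
from geoanalytics.sql import functions as ST

# Only the two geometry columns are read by the function; all other
# columns pass through the DataFrame unchanged. The column names
# "origin" and "destination" are illustrative assumptions.
df_with_dist = my_df.withColumn("dist", ST.distance("origin", "destination"))
```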
In the code snippet below, a point column is added to a DataFrame. The geometries are created from a WKT string column using the ST_PointFromText Python function.

from geoanalytics.sql import functions as ST

# The column name "wkt" and the spatial reference 4326 are example values.
dataframe = my_df.withColumn("point", ST.point_from_text("wkt", sr=4326))
If the DataFrame is registered as a temporary view, the same operation can be called in a SQL statement, as shown in the code snippet below.
dataframe = spark.sql("
In both cases shown above, the same distributed engine executes the query. The syntax used to call the function is a matter of developer preference.
GeoAnalytics Engine includes five spatial data types:
- Point
- Multipoint
- Linestring
- Polygon
- Unknown
These types are used to represent geographic data and are often required as arguments to or returned from SQL functions.
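A quick way to see these types in practice is to inspect a DataFrame's schema, where a geometry column reports a spatial data type rather than a primitive Spark type. A minimal sketch, assuming the DataFrame with a point column built in the earlier snippet:

```python
# printSchema is plain PySpark; the "point" column created above reports
# a spatial data type (point) rather than a primitive type like string.
dataframe.printSchema()
```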
The spatial reference of the spatial data is stored with the geometry column and is 0 if not set. The spatial reference can be set when creating a geometry column from a text or binary column by using the sr parameter in the constructor functions (e.g., ST_PointFromText, ST_LineFromBinary, etc.). Alternatively, you can use ST_SRID or ST_SRText to set the spatial reference. The spatial reference is set automatically when loading data from shapefiles or feature services.
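A sketch of both approaches follows. The column names and the well-known ID 4326 are example assumptions, and ST_SRID is used with a second argument to set the spatial reference, per the description above:

```python
from geoanalytics.sql import functions as ST

# Set the spatial reference when creating the geometry column...
points = my_df.withColumn("point", ST.point_from_text("wkt", sr=4326))

# ...or set it on an existing geometry column with ST_SRID (the Python
# alias is assumed here to be ST.srid).
points = points.withColumn("point", ST.srid("point", 4326))
```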
What's next?
Learn more about how to set up your data and run tools and SQL functions: