The sql module in GeoAnalytics Engine includes over 150 functions that extend the Spark SQL API by enabling spatial queries on DataFrame columns. These functions support creating geometries, operating on geometries, evaluating spatial relationships, summarizing geometries, and more. Like the functions in the pyspark.sql module, GeoAnalytics Engine SQL functions can be called with Python functions or in a PySpark SQL query statement.
In contrast to analysis tools, which are aware of all columns in a DataFrame and use all rows to compute a result if required, SQL functions typically operate on only one or two columns at a time.
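For example, a function that evaluates a spatial relationship reads only the geometry columns passed to it. The sketch below is illustrative rather than from the original docs: the column names are assumptions, and ST_Distance is assumed to be exposed through the Python functions module as shown.

```python
from geoanalytics.sql import functions as ST

# Only the two geometry columns are read by the function; all other
# columns pass through the DataFrame unchanged. The column names
# "origin" and "destination" are illustrative assumptions.
df_with_dist = my_df.withColumn("dist", ST.distance("origin", "destination"))
```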
In the code snippet below, a point column is added to a DataFrame. The geometries are created from a WKT string column using the ST_PointFromText Python function.

from geoanalytics.sql import functions as ST

# The column name "wkt" and the spatial reference 4326 are example values.
dataframe = my_df.withColumn("point", ST.point_from_text("wkt", sr=4326))
If the DataFrame is registered as a temporary view, the same operation can be called in a SQL statement, as shown in the code snippet below.
dataframe = spark.sql("
In both cases shown above, the same distributed engine executes the query. The syntax used to call the function is a matter of developer preference.
GeoAnalytics Engine includes five spatial data types:
- Point
- Multipoint
- Linestring
- Polygon
- Unknown
These types are used to represent geographic data and are often required as arguments to or returned from SQL functions.
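A quick way to see these types in practice is to inspect a DataFrame's schema, where a geometry column reports a spatial data type rather than a primitive Spark type. A minimal sketch, assuming the DataFrame with a point column built in the earlier snippet:

```python
# printSchema is plain PySpark; the "point" column created above reports
# a spatial data type (point) rather than a primitive type like string.
dataframe.printSchema()
```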
The spatial reference of the spatial data is stored with the geometry column and is 0 if not set. The spatial reference can be set when creating a geometry column from a text or binary column by using the sr parameter in the constructor functions (e.g., ST_PointFromText, ST_LineFromBinary, etc.). Alternatively, you can use ST_SRID or ST_SRText to set the spatial reference. The spatial reference is set automatically when loading data from shapefiles or feature services.
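A sketch of both approaches follows. The column names and the well-known ID 4326 are example assumptions, and ST_SRID is used with a second argument to set the spatial reference, per the description above:

```python
from geoanalytics.sql import functions as ST

# Set the spatial reference when creating the geometry column...
points = my_df.withColumn("point", ST.point_from_text("wkt", sr=4326))

# ...or set it on an existing geometry column with ST_SRID (the Python
# alias is assumed here to be ST.srid).
points = points.withColumn("point", ST.srid("point", 4326))
```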
What's next?
Learn more about how to set up your data and run tools and SQL functions: