Install GeoAnalytics Engine on Databricks

The Databricks Lakehouse Platform provides a unified set of tools for building, deploying, sharing, and maintaining enterprise-grade data solutions at scale. Databricks integrates with cloud storage and security in your cloud account, and manages and deploys cloud infrastructure on your behalf. Databricks Runtime releases include open source technologies such as Apache Spark, along with a number of proprietary tools that integrate and extend these technologies to improve performance and ease of use.

GeoAnalytics Engine can be installed on Databricks in Azure, AWS, or Google Cloud Platform to add spatial data science and analysis capabilities to your Databricks Lakehouse. After installing GeoAnalytics Engine, you will be able to run spatial SQL functions and analysis tools using a Spark cluster managed by Databricks. Because GeoAnalytics Engine extends PySpark, you can spatially enable your data wherever it lives and run spatial analysis workflows alongside other data science and machine learning technologies in a Databricks notebook.

The table below summarizes the Databricks Runtime releases supported by each version of GeoAnalytics Engine. Both the Databricks Runtime and Databricks Runtime for Machine Learning are supported.

GeoAnalytics Engine	Databricks Runtimes
1.1.x	7.3-12.1
1.2.x	7.3-13.2
1.3.x	9.1-14.2
1.4.x	9.1-15.2
1.5.x	11.3-15.4
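
As a quick way to check a planned combination against the table above, here is a minimal sketch in plain Python. The names SUPPORTED_RUNTIMES and is_supported are hypothetical helpers written for this page, not part of GeoAnalytics Engine, and the ranges are simply copied from the table.

```python
# Supported Databricks Runtime ranges, copied from the table above.
# Keys are GeoAnalytics Engine release lines; values are inclusive (min, max).
SUPPORTED_RUNTIMES = {
    "1.1": ("7.3", "12.1"),
    "1.2": ("7.3", "13.2"),
    "1.3": ("9.1", "14.2"),
    "1.4": ("9.1", "15.2"),
    "1.5": ("11.3", "15.4"),
}


def _as_tuple(version):
    """Turn '13.3' or '13.3.x-scala2.12' into (13, 3) for numeric comparison."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def is_supported(engine_version, runtime_version):
    """Return True if the runtime is inside the range listed for the engine release."""
    release = ".".join(engine_version.split(".")[:2])
    if release not in SUPPORTED_RUNTIMES:
        return False
    low, high = SUPPORTED_RUNTIMES[release]
    return _as_tuple(low) <= _as_tuple(runtime_version) <= _as_tuple(high)


print(is_supported("1.5.0", "14.3"))  # True
print(is_supported("1.1.2", "13.3"))  # False
```

Versions are compared as integer pairs rather than strings, because a string comparison would place "9.1" after "11.3".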

By following the steps below, you can leverage GeoAnalytics Engine within a PySpark notebook hosted on Databricks. To complete the installation, you will need:

- An active subscription to AWS, Azure, or Google Cloud Platform.
- GeoAnalytics Engine install files. If you have a GeoAnalytics Engine subscription with a username and password, you can download the ArcGIS GeoAnalytics Engine distribution here after signing in. If you have a license file, follow the instructions provided with your license file to download the GeoAnalytics Engine distribution.
- A GeoAnalytics Engine subscription, or a license file.

Prepare the workspace

If you do not have an active Databricks account, create one by starting a Databricks free trial.
Create a Databricks workspace if you don't have one already and open the workspace in a web browser.
Find the jar file downloaded previously and upload it to a Unity Catalog volume in Databricks. Copy or make note of the jar path in File API format, for example /Volumes/engine/jars/geoanalytics_2_12_x_x_x.jar. Depending on the analysis you plan to run, optionally upload the following jars:
- esri-projection-geographic, if you need to perform a transformation that requires supplementary projection data.
- geoanalytics-natives to use geocoding or network analysis tools.
Note
Unity Catalog volumes are supported on Databricks Runtime 13.3 LTS and above. If you do not have access to volumes, you can upload the jars to DBFS.
Use the script below as a Cluster-scoped init script to install GeoAnalytics Engine on only this cluster. You can alternatively use it as a Global init script to install GeoAnalytics Engine on all clusters in your Databricks workspace. Replace JAR_PATH with the File API path noted in step 3.
```
#!/bin/bash
cp JAR_PATH /databricks/jars/
```
If you need to perform a transformation that requires supplementary projection data, add the first line in the example below to the script and replace PROJECTION_DATA_JAR_PATH with the corresponding File API path noted in step 3. Follow these steps for every esri-projection-geographic jar that you previously uploaded.

If you are planning to use geocoding or network analysis tools, add the second line in the example below to the script and replace GEOANALYTICS_NATIVES_JAR_PATH with the corresponding File API path noted in step 3.
```
cp PROJECTION_DATA_JAR_PATH /databricks/jars/
cp GEOANALYTICS_NATIVES_JAR_PATH /databricks/jars/
```
Note
To use ST_H3Bin or ST_H3Bins with Databricks Runtime 11.1 or earlier, you must copy the H3 Java bindings to /databricks/jars on your cluster using an init script, as in the examples above. This is not required on later Databricks Runtime releases.
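
If you upload several jars, it can be easier to generate the init script than to edit it by hand. The sketch below is one possible way to do that in Python; build_init_script is a hypothetical helper, and the placeholder paths are the same ones used in the steps above and must be replaced with your own File API paths.

```python
import shlex


def build_init_script(engine_jar, extra_jars=()):
    """Return init script text that copies each jar into /databricks/jars/."""
    lines = ["#!/bin/bash"]
    for jar_path in (engine_jar, *extra_jars):
        # shlex.quote leaves ordinary paths unchanged and protects paths with spaces.
        lines.append(f"cp {shlex.quote(jar_path)} /databricks/jars/")
    return "\n".join(lines) + "\n"


script = build_init_script(
    "/Volumes/engine/jars/geoanalytics_2_12_x_x_x.jar",
    extra_jars=["PROJECTION_DATA_JAR_PATH", "GEOANALYTICS_NATIVES_JAR_PATH"],
)
print(script)
```

The printed text can then be saved as the init script and referenced when you create the cluster.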

Create a cluster

In the workspace sidebar, go to Compute, and click the Create compute button to open the New compute page. Choose a name for your cluster.
Choose to deploy either a Multi node or Single node cluster and select a Policy and an Access mode.

Note
GeoAnalytics Engine does not support the Shared access mode for cluster configurations. You can choose either Single User or No Isolation Shared for access mode. Refer to Databricks Access Modes for more details.
Choose a supported Databricks Runtime Version. See Databricks runtime releases for details on runtime components.
Choose your preferred Worker Type and Driver Type options.
For the other parameters, use the default or change them to your preference.
Under Advanced Options find Spark Config and paste in the configuration below.
```
spark.plugins com.esri.geoanalytics.Plugin
spark.serializer org.apache.spark.serializer.KryoSerializer
spark.kryo.registrator com.esri.geoanalytics.KryoRegistrator
```
Note
To write shapefiles in Databricks, the spark.sql.sources.commitProtocolClass Spark configuration property must be set to org.apache.spark.sql.execution.datasources.SQLHadoopMapReduceCommitProtocol or another supported commit protocol class. For more information see Spark configuration.

Note
To read feature services in Databricks, the spark.databricks.delta.formatCheck.enabled Spark configuration property must be set to false. For more information see Spark configuration.
Under Advanced Options, find the Init Scripts tab and specify the path to the init script created previously.
Select Create compute.
Install the wheel (.whl) file downloaded previously. Installing the file will make it available to import as a Python library in a notebook. You can choose to either install the library for every cluster in your workspace or only on the cluster you are creating now. Install any other Python libraries you will need at this time.
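
Once the cluster is running, you may want to confirm that the three Spark configuration properties from the steps above were applied. The following is a minimal sketch under stated assumptions: missing_settings is a hypothetical helper, and it accepts any callable of the form get(key, default). In a notebook, spark.conf.get should fit that shape, but this has not been verified on every Databricks Runtime.

```python
# The three properties pasted into Spark Config in the steps above.
REQUIRED_SPARK_CONF = {
    "spark.plugins": "com.esri.geoanalytics.Plugin",
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
    "spark.kryo.registrator": "com.esri.geoanalytics.KryoRegistrator",
}


def missing_settings(get_conf):
    """Return {key: expected} for each required property that is not set as expected.

    Containment is used instead of equality because spark.plugins can hold a
    comma-separated list of plugin classes.
    """
    problems = {}
    for key, expected in REQUIRED_SPARK_CONF.items():
        actual = get_conf(key, None) or ""
        if expected not in actual:
            problems[key] = expected
    return problems


# Stand-in for a cluster where one property was left out (made-up example).
example_conf = {
    "spark.plugins": "com.esri.geoanalytics.Plugin",
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
}
print(missing_settings(example_conf.get))
# {'spark.kryo.registrator': 'com.esri.geoanalytics.KryoRegistrator'}
```

In a notebook attached to the cluster, the equivalent call would be missing_settings(spark.conf.get). An empty dictionary means all three properties were found.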

(Optional) Check cluster status and view logs

To confirm that your cluster was created successfully, open the cluster's Event Log and look for an event with Event Type RUNNING. Its message usually indicates that compute is running.
If cluster creation failed, the Event Log contains an event with Event Type TERMINATING. The message of the TERMINATING event should give more context about the failure. For example, if the message contains Reason: Init script failure, check the init script logs.
If the failure reason isn't clear from the Event Log, check the Driver Logs, which provide more information in standard output, standard error, and Log4j logs to help with debugging.
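
The checks above amount to a small decision procedure, sketched here in Python. The event dictionaries use an assumed, simplified shape with only type and message keys. They are not the exact structure Databricks returns, and triage is a hypothetical helper.

```python
def triage(events):
    """Return (status, next_step) based on the most recent decisive event.

    events is a list of dicts ordered oldest first. The 'type' and 'message'
    keys are an assumed, simplified shape for this sketch.
    """
    for event in reversed(events):
        if event["type"] == "RUNNING":
            return "running", "no action needed"
        if event["type"] == "TERMINATING":
            if "Init script failure" in event.get("message", ""):
                return "failed", "check the init script logs"
            return "failed", "check the Driver Logs"
    return "pending", "wait for a RUNNING or TERMINATING event"


# Made-up event log for a cluster whose init script failed.
sample_events = [
    {"type": "CREATING", "message": ""},
    {"type": "TERMINATING", "message": "Reason: Init script failure"},
]
print(triage(sample_events))  # ('failed', 'check the init script logs')
```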

Authorize GeoAnalytics Engine

Create a new notebook or open an existing one. Choose "Python" as the default language and attach the notebook to the cluster created previously.
Import the geoanalytics library and authorize it using your username and password or a license file. See Authorization for more information. For example:
```
import geoanalytics
geoanalytics.auth(username="User1", password="p@ssw0rd")
```
Try out the API by importing the SQL functions as an easy-to-use alias like ST and listing the first 20 functions in a notebook cell:
```
from geoanalytics.sql import functions as ST
spark.sql("show user functions like 'ST_*'").show()
```
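
The authorization example above hardcodes a username and password in the notebook. As one possible alternative, the sketch below reads them from environment variables and passes them to whatever authorization function you supply. The variable names GAE_USERNAME and GAE_PASSWORD and the helper authorize_from_env are assumptions made for this example; only geoanalytics.auth(username=..., password=...) comes from the steps above.

```python
import os


def authorize_from_env(auth, env=os.environ):
    """Call auth(username=..., password=...) with values read from env.

    GAE_USERNAME and GAE_PASSWORD are hypothetical variable names chosen
    for this sketch; use whatever names you set on your cluster.
    """
    missing = [name for name in ("GAE_USERNAME", "GAE_PASSWORD") if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return auth(username=env["GAE_USERNAME"], password=env["GAE_PASSWORD"])


# In a notebook attached to the cluster:
#   import geoanalytics
#   authorize_from_env(geoanalytics.auth)
```

Passing the authorization function as an argument keeps the helper importable outside a cluster and avoids storing credentials in the notebook itself.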

What’s next?

You can now use any SQL function, track function, or analysis tool in the geoanalytics module.

See Data sources and Using DataFrames to learn more about how to access your data from your notebook. Also see Visualize results to get started with viewing your data on a map. For examples of what else is possible with GeoAnalytics Engine, check out the sample notebooks, tutorials, and blog posts.