ArcGIS GeoAnalytics Engine can be installed on a personal computer, a standalone Spark cluster, or a managed Spark service in the cloud. If you have a GeoAnalytics Engine subscription with a username and password, you can download the ArcGIS GeoAnalytics Engine distribution here after signing in. If you have a license file, follow the instructions provided with your license file to download the GeoAnalytics Engine distribution.
The ArcGIS GeoAnalytics Engine distribution includes the following files and directories:
geoanalytics_2.12-x.x.x.jar
—ArcGIS GeoAnalytics Engine plugin for Apache Spark.
geoanalytics-natives_2.12-x.x.x.jar
—An optional plugin for performing network analysis and geocoding. Compatible with Linux x64, macOS (Intel x86_64 & Apple Silicon M1 and M2), and Windows x64.
geoanalytics-x.x.x.zip
—ArcGIS GeoAnalytics Engine Python distribution in zip format.
geoanalytics-x.x.x-py3-none-any.whl
—ArcGIS GeoAnalytics Engine Python distribution in wheel format.
help/samples/
—Sample notebooks with example workflows using GeoAnalytics Engine.
help/doc/
—Documentation for offline users.
License
—Copyright information, licenses, and user agreements.
You can also choose to install supplementary projection data with ArcGIS GeoAnalytics Engine. Installation instructions are included in the specific install guide for each environment. For further information, see the README included with the ArcGIS GeoAnalytics Engine Projection Engine Data distribution or learn more about Coordinate systems and transformations.
GeoAnalytics Engine must be authorized before running any tool or function. For more information, see Authorization.
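For example, in a running PySpark session that has the GeoAnalytics Engine jar and Python package installed, username-and-password authorization looks roughly like this (a minimal sketch; the credentials shown are placeholders):

```python
import geoanalytics

# Authorize GeoAnalytics Engine for this Spark session.
# Replace the placeholder credentials with your subscription's username
# and password, or follow the instructions provided with a license file.
geoanalytics.auth(username="user1", password="p@ssw0rd")
```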
Installing on a personal computer
Apache Spark supports a local deployment mode that is useful for testing in a shell or notebook prior to using resources on a larger Spark cluster. This deployment mode lets you run PySpark code on your personal computer's resources as a single-node cluster.
See this guide for instructions on using GeoAnalytics Engine in Spark local mode.
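As a sketch, a local-mode session that loads the GeoAnalytics Engine jar might be created as follows; the jar path is a placeholder, and the spark.plugins, serializer, and Kryo registrator settings should be verified against the local-mode install guide for your version:

```python
from pyspark.sql import SparkSession

# Build a single-node ("local mode") Spark session that loads the
# GeoAnalytics Engine jar. local[*] uses all cores on this machine.
spark = (
    SparkSession.builder
    .master("local[*]")
    .appName("geoanalytics-local")
    .config("spark.jars", "/path/to/geoanalytics_2.12-x.x.x.jar")  # placeholder path
    .config("spark.plugins", "com.esri.geoanalytics.Plugin")
    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .config("spark.kryo.registrator", "com.esri.geoanalytics.KryoRegistrator")
    .getOrCreate()
)
```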
Installing on a Spark standalone cluster
For working with large datasets, a cluster or managed Spark service lets you scale out compute resources and take full advantage of Spark. Spark cluster mode allows you to configure Apache Spark on any number of nodes in a cluster of machines that you deploy. See this guide for instructions on using GeoAnalytics Engine in Spark cluster mode.
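For illustration, connecting to a standalone cluster differs from local mode mainly in the master URL; in this sketch the master host is a placeholder, and the jar path must be readable from every node:

```python
from pyspark.sql import SparkSession

# Connect to an existing standalone cluster instead of local mode.
# spark://master-host:7077 is a placeholder for your cluster's master URL.
spark = (
    SparkSession.builder
    .master("spark://master-host:7077")
    .appName("geoanalytics-cluster")
    .config("spark.jars", "/shared/path/geoanalytics_2.12-x.x.x.jar")  # placeholder path visible to all nodes
    .config("spark.plugins", "com.esri.geoanalytics.Plugin")
    .getOrCreate()
)
```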
Installing on a managed Spark service in the cloud
GeoAnalytics Engine supports use with the following managed Spark services:
Amazon EMR
Azure Synapse Analytics
Databricks
Google Cloud Dataproc
Within each service, you can deploy customized Spark clusters and PySpark notebooks. The advantages of deploying a Spark cluster in the cloud include low startup costs, the ability to deploy and shut down resources quickly, and the option to scale resources up or down as needed.
Dependencies
GeoAnalytics Engine extends Spark and thus requires Spark and its dependencies to be installed prior to using the API. The table below summarizes which versions of Spark and its dependencies are supported by each version of GeoAnalytics Engine. Refer to the Spark documentation to learn which versions of each dependency are supported by the Spark version you have installed.
| GeoAnalytics Engine | Spark | Python | Java | Scala |
|---|---|---|---|---|
| 1.0.x | 3.0.1-3.2.x | 3.7+ | 8/11 | 2.12 |
| 1.1.x | 3.0.1-3.3.x | 3.7+ | 8/11 | 2.12 |
| 1.2.x | 3.0.1-3.4.x | 3.7+ | 8/11 | 2.12 |
| 1.3.x-1.4.x | 3.0.1-3.5.x | 3.7+ | 8/11 | 2.12 |
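One informal way to confirm that an environment matches a supported row in the table above is to print the versions from a PySpark session; note that the Java check below goes through Spark's internal JVM gateway, so treat it as a convenience rather than a stable API:

```python
import sys
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Print the versions that the table above constrains.
print("Spark: ", spark.version)
print("Python:", sys.version.split()[0])
print("Java:  ", spark.sparkContext._jvm.java.lang.System.getProperty("java.version"))
```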
Support for new versions of Spark or its dependencies may be added with any minor release, while support for older versions may be dropped with any major release. For more information, see Versioning policy. Managed Spark services hosted in the cloud are often pre-configured with Spark dependencies and ready to use. See the install guide for each cloud provider for the list of runtimes supported by GeoAnalytics Engine.
With Spark 3.5.x, writing vector tiles or plotting in a notebook requires you to configure protobuf-java on your Spark cluster. This is configured for you automatically in Databricks; on EMR, use the setup script to copy the jar to /usr/lib/spark/jars/ on your cluster. For other environments, protobuf-java version 2.5.0 is recommended.
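In environments where you manage the session yourself, one way to pull in protobuf-java is through Spark's Maven package resolution; this is a sketch, and spark.jars.packages must be set before the session is created:

```python
from pyspark.sql import SparkSession

# Resolve protobuf-java 2.5.0 from Maven Central at session startup so that
# vector tile writing and notebook plotting work with Spark 3.5.x.
spark = (
    SparkSession.builder
    .config("spark.jars.packages", "com.google.protobuf:protobuf-java:2.5.0")
    .getOrCreate()
)
```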