A file geodatabase is an Esri geospatial data format that stores and manages spatial and nonspatial data. It can store various types of geographic data, including nonspatial tables, feature classes, feature datasets, and raster datasets.
GeoAnalytics Engine supports loading tables and feature classes of point, multipoint, line, and polygon geometries.
After loading the file geodatabase into a Spark DataFrame, you can perform analysis and visualize the data by using the SQL functions and tools available in GeoAnalytics Engine in addition to functions offered in Spark.
The following table shows the Python syntax for loading the file geodatabase into a Spark DataFrame.
Load | Save |
---|---|
spark.read.format("filegdb").load() | Not supported |
spark.read.load(format="filegdb") | Not supported |
When you load the file geodatabase, specify the path of the file geodatabase and the name of the table or feature class using the options below.
DataFrameReader option | Example | Description |
---|---|---|
gdbPath | .option("gdbPath", "s3a://my-bucket/my-folder/example.gdb") | The path to the file geodatabase. It is required for loading the file geodatabase. |
gdbName | .option("gdbName", "us_lakes") | The name of the table or feature class in the file geodatabase. |
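As a minimal sketch of how the two options combine, the helper below builds the reader chain and only sets gdbName when a name is given. The function name `read_filegdb` is hypothetical and not part of GeoAnalytics Engine; it assumes `spark` is an existing SparkSession on which GeoAnalytics Engine is installed and authorized.

```python
def read_filegdb(spark, gdb_path, gdb_name=None):
    """Load a table, a feature class, or the catalog from a file geodatabase.

    read_filegdb is a hypothetical convenience wrapper, not a library function.
    `spark` is assumed to be a SparkSession with GeoAnalytics Engine available.
    """
    # gdbPath is always required.
    reader = spark.read.format("filegdb").option("gdbPath", gdb_path)
    if gdb_name is not None:
        # Set gdbName only when a specific table or feature class is wanted.
        reader = reader.option("gdbName", gdb_name)
    return reader.load()
```

For example, `read_filegdb(spark, "s3a://my-bucket/my-folder/example.gdb", "us_lakes")` would request a single feature class, while leaving out the third argument sends only gdbPath to the reader.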
If you don't specify gdbName, the complete catalog of the datasets in the file geodatabase will be loaded.
Above is an example view of a file geodatabase in the ArcGIS Pro Catalog pane.
When you load the catalog for the file geodatabase named example.gdb, the table catalog includes the dataset name (Name), the dataset type (DatasetType), and the geometry type (GeometryType) for each table and feature class.
# Load the file geodatabase catalog from an S3 bucket
spark.read.format("filegdb").option("gdbPath", "s3a://my-bucket/my-folder/example.gdb").load().show()
+-------------+-------------+------------+
| Name| DatasetType|GeometryType|
+-------------+-------------+------------+
|ca_population| Table| null|
| ca_parks|Feature Class| Point|
| us_lakes|Feature Class| Polygon|
| us_rivers|Feature Class| Polyline|
| calls|Feature Class| MultiPoint|
+-------------+-------------+------------+
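The catalog can also be inspected programmatically, for instance to find out which entries are feature classes before loading them. The sketch below assumes the catalog DataFrame has the Name and DatasetType columns shown above and is small enough to collect to the driver; `feature_class_names` is a hypothetical helper name.

```python
def feature_class_names(catalog_df):
    """Return the names of the feature classes listed in a catalog DataFrame.

    `catalog_df` is assumed to be the DataFrame loaded without gdbName,
    with the Name and DatasetType columns shown in the catalog output.
    """
    return [
        row["Name"]
        for row in catalog_df.collect()  # the catalog is assumed to be small
        if row["DatasetType"] == "Feature Class"
    ]
```

Each returned name could then be passed as the gdbName option to load that feature class.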
Usage notes
- GeoAnalytics Engine will load the date data type in a table or feature class as a TimestampType. If there is one date column in a table or feature class, it will be automatically set as the time field in a Spark DataFrame. If there are multiple date columns, you can call st.set_time_fields() to enable time.
- The table or feature class name is unique in a file geodatabase. To load the table or feature class in a feature dataset, you can access the data with gdbName without specifying the name or path to the feature dataset. In the above example, you can load the feature class us_lakes with syntax spark.read.format("filegdb").option("gdbPath", "s3a://my-bucket/my-folder/example.gdb").option("gdbName", "us_lakes").load().
- When GeoAnalytics Engine accesses a file geodatabase, it will not lock the table, feature class, or feature dataset. You can freely edit or modify the file geodatabase with other processes such as ArcGIS Pro.
- When loading the catalog of the file geodatabase, the name of the feature dataset is not included in the table catalog.
- GeoAnalytics Engine doesn't support loading mosaic datasets or raster datasets stored in a file geodatabase.
- GeoAnalytics Engine doesn't support saving data into file geodatabases.
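The note on multiple date columns can be illustrated with a short sketch. It assumes that set_time_fields is reached through the `st` accessor that GeoAnalytics Engine adds to DataFrames, that it accepts the time column names as positional arguments, and that it returns a DataFrame; check these assumptions against the API reference. The helper name `load_with_time_fields` and the column names in the usage note are hypothetical.

```python
def load_with_time_fields(spark, gdb_path, gdb_name, *time_fields):
    """Load a table or feature class and optionally choose its time fields.

    Hypothetical helper. `spark` is assumed to be a SparkSession with
    GeoAnalytics Engine available, so that DataFrames expose `st`.
    """
    df = (
        spark.read.format("filegdb")
        .option("gdbPath", gdb_path)
        .option("gdbName", gdb_name)
        .load()
    )
    if time_fields:
        # Assumption: set_time_fields takes column names and returns a DataFrame.
        df = df.st.set_time_fields(*time_fields)
    return df
```

A call such as `load_with_time_fields(spark, "s3a://my-bucket/my-folder/example.gdb", "calls", "start_time", "end_time")` would pick two date columns explicitly, where `start_time` and `end_time` stand in for whatever date columns the feature class actually contains.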