The Spatially Enabled DataFrame (SEDF) is a simple, intuitive object that can easily manipulate geometric and attribute data.
Note: The Spatially Enabled DataFrame is implemented as the GeoAccessor class in the API and is based upon the Pandas DataFrame object. The SEDF provides excellent memory management and the ability to handle larger datasets, and it is coded with the recommended Pandas pattern.
The Spatially Enabled DataFrame inserts a custom namespace called spatial into the popular Pandas DataFrame structure to give it spatial abilities. This allows you to use intuitive, pandorable operations on both the attribute and spatial columns. Thus, the SEDF is based on data structures inherently suited to data analysis, with natural operations for filtering and inspecting subsets of values, which are fundamental to statistical and geographic manipulations.
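To make the idea of a custom namespace concrete, here is a minimal sketch of how Pandas lets a library register one. The `demo` namespace, the `DemoAccessor` class and its methods are hypothetical stand-ins and are not part of the ArcGIS API; only `pd.api.extensions.register_dataframe_accessor` is an actual Pandas call.

```python
import pandas as pd


# "demo" is a hypothetical namespace used only for illustration.
@pd.api.extensions.register_dataframe_accessor("demo")
class DemoAccessor:
    def __init__(self, pandas_obj):
        # Pandas passes in the DataFrame the namespace was accessed on.
        self._df = pandas_obj

    @staticmethod
    def from_points(points):
        """Build a DataFrame from (name, x, y) tuples."""
        return pd.DataFrame(points, columns=["NAME", "X", "Y"])

    def bounds(self):
        """Return (xmin, ymin, xmax, ymax) of the X and Y columns."""
        return (self._df["X"].min(), self._df["Y"].min(),
                self._df["X"].max(), self._df["Y"].max())


# Class-level access: no existing data frame is needed.
df = pd.DataFrame.demo.from_points([("a", 0.0, 1.0), ("b", 2.0, 3.0)])

# Instance-level access: the accessor wraps this particular data frame.
print(type(df).__name__)
print(df.demo.bounds())
```

Accessing the namespace on the class (`pd.DataFrame.demo`) yields the accessor class itself, so constructor-style helpers can be called without an existing data frame, while instance access (`df.demo`) operates on that frame's columns.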
The dataframe reads from many sources, including shapefiles, Pandas DataFrames, feature classes, GeoJSON, and Feature Layers.
This document outlines some fundamentals of using the Spatially Enabled DataFrame object for working with GIS data.
import pandas as pd
from arcgis.features import GeoAccessor, GeoSeriesAccessor
Accessing GIS data
GIS users need to work with both published layers on remote servers (web layers) and local data, but the ability to manipulate these datasets without permanently copying the data is lacking. The Spatially Enabled DataFrame solves this problem because it is an in-memory object that can read, write and manipulate geospatial data.
The SEDF integrates with Esri's ArcPy site-package as well as the open source pyshp, shapely and fiona packages. This means the ArcGIS API for Python SEDF can use any of these geometry engines to give you options for easily working with geospatial data regardless of your platform. The SEDF transforms data into the formats you desire so you can use Python functionality to analyze and visualize geographic information.
Data can be read and scripted to automate workflows and just as easily visualized on maps in Jupyter Lab notebooks. The SEDF can export data as feature classes or publish them directly to servers for sharing according to your needs. Let's explore some of the different options available with the versatile Spatially Enabled DataFrame namespaces:
Reading Web Layers
Feature layers hosted on ArcGIS Online or ArcGIS Enterprise can be easily read into a Spatially Enabled DataFrame using the from_layer method. Once you read it into a SEDF object, you can create reports, manipulate the data, or convert it to a form that is comfortable and makes sense for its intended purpose.
Example: Retrieving an ArcGIS Online item and using the layers property to inspect the first 5 records of the layer
from arcgis import GIS
gis = GIS()
item = gis.content.search(
    "USA Major Cities", item_type="Feature layer", outside_org=True)[0]
flayer = item.layers[0]
# create a Spatially Enabled DataFrame object
sdf = pd.DataFrame.spatial.from_layer(flayer)
sdf.head()
| | AGE_10_14 | AGE_15_19 | AGE_20_24 | AGE_25_34 | AGE_35_44 | AGE_45_54 | AGE_55_64 | AGE_5_9 | AGE_65_74 | AGE_75_84 | ... | PLACEFIPS | POP2010 | POPULATION | POP_CLASS | RENTER_OCC | SHAPE | ST | STFIPS | VACANT | WHITE |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2144 | 2314 | 2002 | 3531 | 3887 | 5643 | 6353 | 2067 | 5799 | 2850 | ... | 0408220 | 39540 | 40346 | 6 | 6563 | {"x": -12751215.004681978, "y": 4180278.406256... | AZ | 04 | 6703 | 32367 |
1 | 876 | 867 | 574 | 1247 | 1560 | 2122 | 2342 | 733 | 2157 | 975 | ... | 0424895 | 14364 | 14847 | 6 | 1397 | {"x": -12755627.731115643, "y": 4164465.572856... | AZ | 04 | 1389 | 12730 |
2 | 1000 | 1003 | 833 | 2311 | 2063 | 2374 | 3631 | 1068 | 6165 | 3776 | ... | 0425030 | 26265 | 26977 | 6 | 1963 | {"x": -12734674.294574209, "y": 3850472.723091... | AZ | 04 | 9636 | 22995 |
3 | 2730 | 2850 | 2194 | 4674 | 5240 | 7438 | 8440 | 2499 | 8145 | 4608 | ... | 0439370 | 52527 | 55041 | 7 | 6765 | {"x": -12725332.21151233, "y": 4096532.0908223... | AZ | 04 | 9159 | 47335 |
4 | 2732 | 2965 | 2024 | 3182 | 3512 | 3109 | 1632 | 2497 | 916 | 467 | ... | 0463470 | 25505 | 29767 | 6 | 1681 | {"x": -12770984.257542243, "y": 3826624.133935... | AZ | 04 | 572 | 16120 |
5 rows × 51 columns
When you inspect the type of the object, you get back a standard pandas DataFrame object. However, this object now has an additional SHAPE column that allows you to perform geometric operations. In other words, this DataFrame is now geo-aware.
type(sdf)
pandas.core.frame.DataFrame
Further, the DataFrame has a new spatial property that provides access to the geoprocessing operations that can be performed on the object. The rest of the guides in this section go into the details of how to use these functionalities. So, sit tight.
Reading Feature Layer Data
As seen above, the SEDF can consume a Feature Layer served from either ArcGIS Online or ArcGIS Enterprise orgs. Let's take a step-by-step approach to break down the notebook cell above and then extract a subset of records from the feature layer.
Example: Examining Feature Layer content
Use the from_layer method on the SEDF to instantiate a data frame from an item's layer and inspect the first 5 records.
known_item = gis.content.search(
    "USA Major Cities", item_type="Feature layer", outside_org=True)[0]
known_item
# Obtain the first feature layer from the item
fl = known_item.layers[0]
# Use the `from_layer` static method in the 'spatial' namespace on the Pandas DataFrame
sdf = pd.DataFrame.spatial.from_layer(fl)
# Return the first 5 records.
sdf.head()
| | AGE_10_14 | AGE_15_19 | AGE_20_24 | AGE_25_34 | AGE_35_44 | AGE_45_54 | AGE_55_64 | AGE_5_9 | AGE_65_74 | AGE_75_84 | ... | PLACEFIPS | POP2010 | POPULATION | POP_CLASS | RENTER_OCC | SHAPE | ST | STFIPS | VACANT | WHITE |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2144 | 2314 | 2002 | 3531 | 3887 | 5643 | 6353 | 2067 | 5799 | 2850 | ... | 0408220 | 39540 | 40346 | 6 | 6563 | {"x": -12751215.004681978, "y": 4180278.406256... | AZ | 04 | 6703 | 32367 |
1 | 876 | 867 | 574 | 1247 | 1560 | 2122 | 2342 | 733 | 2157 | 975 | ... | 0424895 | 14364 | 14847 | 6 | 1397 | {"x": -12755627.731115643, "y": 4164465.572856... | AZ | 04 | 1389 | 12730 |
2 | 1000 | 1003 | 833 | 2311 | 2063 | 2374 | 3631 | 1068 | 6165 | 3776 | ... | 0425030 | 26265 | 26977 | 6 | 1963 | {"x": -12734674.294574209, "y": 3850472.723091... | AZ | 04 | 9636 | 22995 |
3 | 2730 | 2850 | 2194 | 4674 | 5240 | 7438 | 8440 | 2499 | 8145 | 4608 | ... | 0439370 | 52527 | 55041 | 7 | 6765 | {"x": -12725332.21151233, "y": 4096532.0908223... | AZ | 04 | 9159 | 47335 |
4 | 2732 | 2965 | 2024 | 3182 | 3512 | 3109 | 1632 | 2497 | 916 | 467 | ... | 0463470 | 25505 | 29767 | 6 | 1681 | {"x": -12770984.257542243, "y": 3826624.133935... | AZ | 04 | 572 | 16120 |
5 rows × 51 columns
NOTE: See Pandas DataFrame head() method documentation for details.
You can also use SQL queries to return a subset of records by leveraging the ArcGIS API for Python's Feature Layer object itself. When you run a query() on a FeatureLayer, you get back a FeatureSet object. Calling the sdf property of the FeatureSet returns a Spatially Enabled DataFrame object. We then use the data frame's head() method to return the first 5 records and a subset of columns from the DataFrame:
Example: Feature Layer Query Results to a Spatially Enabled DataFrame
We'll use the AGE_45_54 column to query the feature layer and return a new DataFrame with a subset of records. We can use the built-in zip() function to print the data frame attribute field names, and then use data frame syntax to view specific attribute fields in the output:
# Filter feature layer records with a SQL query.
# See https://developers.arcgis.com/rest/services-reference/query-feature-service-layer-.htm
df = fl.query(where="AGE_45_54 < 1500").sdf
for a, b, c, d in zip(df.columns[::4], df.columns[1::4], df.columns[2::4], df.columns[3::4]):
    print("{:<30}{:<30}{:<30}{:<}".format(a, b, c, d))
AGE_10_14                     AGE_15_19                     AGE_20_24                     AGE_25_34
AGE_35_44                     AGE_45_54                     AGE_55_64                     AGE_5_9
AGE_65_74                     AGE_75_84                     AGE_85_UP                     AGE_UNDER5
AMERI_ES                      ASIAN                         AVE_FAM_SZ                    AVE_HH_SZ
BLACK                         CAPITAL                       CLASS                         FAMILIES
FEMALES                       FHH_CHILD                     FID                           HAWN_PI
HISPANIC                      HOUSEHOLDS                    HSEHLD_1_F                    HSEHLD_1_M
HSE_UNITS                     MALES                         MARHH_CHD                     MARHH_NO_C
MED_AGE                       MED_AGE_F                     MED_AGE_M                     MHH_CHILD
MULT_RACE                     NAME                          OBJECTID                      OTHER
OWNER_OCC                     PLACEFIPS                     POP2010                       POPULATION
POP_CLASS                     RENTER_OCC                    SHAPE                         ST
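Note that zip() stops at the shortest of its inputs, so when the number of columns is not a multiple of four the trailing names are silently left out: the layer has 51 columns, yet only 48 names are listed and STFIPS, VACANT and WHITE are missing. A minimal sketch of one way to print every name, using itertools.zip_longest from the standard library; the `names` list is a made-up stand-in for `list(df.columns)` and `rows_of` is a hypothetical helper:

```python
from itertools import zip_longest

# Hypothetical stand-in for list(df.columns); any sequence of strings works.
names = ["AGE_10_14", "AGE_15_19", "NAME", "POP2010", "SHAPE", "ST", "STFIPS"]


def rows_of(values, width=4):
    """Group values into rows of `width`, padding the last row with ''."""
    slices = [values[i::width] for i in range(width)]
    return [list(row) for row in zip_longest(*slices, fillvalue="")]


rows = rows_of(names)
for row in rows:
    # Left-align each name in a 30-character column, as in the cell above.
    print("".join("{:<30}".format(name) for name in row).rstrip())
```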
# Return a subset of columns on just the first 5 records
df[['NAME', 'AGE_45_54', 'POP2010']].head()
| | NAME | AGE_45_54 | POP2010 |
---|---|---|---|
0 | Somerton | 1411 | 14287 |
1 | Anderson | 1333 | 9932 |
2 | Camp Pendleton South | 127 | 10616 |
3 | Citrus | 1443 | 10866 |
4 | Commerce | 1478 | 12823 |
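The where argument is an ordinary SQL expression passed as a string, so it can be assembled programmatically. The sketch below is one possible way to do that; `where_clause` and `all_of` are hypothetical helpers written for this example, not functions of the ArcGIS API for Python, and the resulting string would simply be passed on as `fl.query(where=...)`.

```python
def where_clause(field, op, value):
    """Return a SQL expression such as "AGE_45_54 < 1500"."""
    allowed = {"=", "<>", "<", "<=", ">", ">="}
    if op not in allowed:
        raise ValueError("unsupported operator: {}".format(op))
    if isinstance(value, str):
        # SQL string literals use single quotes; double any embedded quote.
        value = "'{}'".format(value.replace("'", "''"))
    return "{} {} {}".format(field, op, value)


def all_of(*clauses):
    """Combine several expressions with AND."""
    return " AND ".join("({})".format(clause) for clause in clauses)


print(where_clause("AGE_45_54", "<", 1500))
print(all_of(where_clause("ST", "=", "AZ"),
             where_clause("POP2010", ">=", 10000)))
```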
Accessing local GIS data
The SEDF can also access local geospatial data. Depending upon what Python modules you have installed, you'll have access to a wide range of functionality:
- If the ArcPy module is installed, meaning you have installed ArcGIS Pro and have installed the ArcGIS API for Python in that same environment, the DataFrame then has methods to read a subset of the ArcGIS Desktop supported data types, most notably:
  - feature classes
  - shapefiles
  - Web layers and ArcGIS Online Hosted Feature Layers
  - OGC Service layers
- If the ArcPy module is not installed, the SEDF from_featureclass method only supports consuming an Esri shapefile.
Note: You must install the pyshp package to read shapefiles in environments that don't have access to ArcPy.
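Before reading local data, it can help to check which of these optional packages the current interpreter can actually import. The following is a rough sketch using only the standard library; `available_packages` and `readable_local_formats` are hypothetical helpers, and the second one encodes nothing more than the two rules listed above.

```python
from importlib.util import find_spec

# Import names of the optional packages; pyshp is imported as "shapefile".
OPTIONAL_PACKAGES = {
    "ArcPy": "arcpy",
    "pyshp": "shapefile",
    "shapely": "shapely",
    "fiona": "fiona",
}


def available_packages():
    """Map each package label to True if it can be imported here."""
    return {label: find_spec(module) is not None
            for label, module in OPTIONAL_PACKAGES.items()}


def readable_local_formats(found):
    """Summarize what from_featureclass can read under the rules above."""
    if found["ArcPy"]:
        return ["feature class", "shapefile"]
    if found["pyshp"]:
        return ["shapefile"]
    return []


found = available_packages()
print(found)
print(readable_local_formats(found))
```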
Example: Reading a Shapefile
Note: You must authenticate to ArcGIS Online or ArcGIS Enterprise to use the from_featureclass() method to read a shapefile with a Python interpreter that does not have access to ArcPy:
# Sign in with a username and password...
g2 = GIS("https://www.arcgis.com", "username", "password")
# ...or, alternatively, with a stored profile.
g2 = GIS(profile="your_organization_profile")
# A raw string keeps the backslashes in the Windows path literal.
sdf = pd.DataFrame.spatial.from_featureclass(
    r"path\to\your\data\census_example\cities.shp")
sdf.tail()
| | FID | NAME | CLASS | ST | STFIPS | PLACEFIP | CAPITAL | AREALAND | AREAWATER | POP_CLASS | ... | MARHH_NO_C | MHH_CHILD | FHH_CHILD | FAMILIES | AVE_FAM_SZ | HSE_UNITS | VACANT | OWNER_OCC | RENTER_OCC | SHAPE |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
3552 | 3552 | East Providence | City | RI | 44 | 22960 | | 13.405 | 3.208 | 6 | ... | 5658 | 306 | 1414 | 12850 | 2.99 | 21309 | 779 | 12096 | 8434 | {'x': -71.3608270663031, 'y': 41.8015001782688... |
3553 | 3553 | Pawtucket | City | RI | 44 | 54640 | | 8.736 | 0.259 | 7 | ... | 6740 | 754 | 3242 | 18520 | 3.07 | 31819 | 1772 | 13331 | 16716 | {'x': -71.3759815680945, 'y': 41.8755001649055... |
3554 | 3554 | Fall River | City | MA | 25 | 23000 | | 31.022 | 7.202 | 7 | ... | 9011 | 759 | 4247 | 23558 | 3.00 | 41857 | 3098 | 13521 | 25238 | {'x': -71.1469910908576, 'y': 41.6981001567767... |
3555 | 3555 | Somerset | Census Designated Place | MA | 25 | 62465 | | 8.109 | 3.867 | 6 | ... | 2771 | 91 | 287 | 5260 | 2.98 | 7143 | 156 | 5723 | 1264 | {'x': -71.15319106847441, 'y': 41.748500174901... |
3556 | 3556 | New Bedford | City | MA | 25 | 45000 | | 20.122 | 3.904 | 7 | ... | 8813 | 910 | 4701 | 24083 | 3.01 | 41511 | 3333 | 16711 | 21467 | {'x': -70.93370908847608, 'y': 41.651800155406... |
5 rows × 48 columns
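Windows paths deserve some care in Python string literals, because a backslash starts an escape sequence. The sketch below uses a made-up path to show the difference between a plain and a raw string, and uses pathlib from the standard library to inspect the result.

```python
from pathlib import PureWindowsPath

# Hypothetical path, chosen because "\t" and "\n" are valid escapes.
plain = "c:\temp\new_cities.shp"   # contains a tab and a newline character
raw = r"c:\temp\new_cities.shp"    # raw string: backslashes kept as typed

print("\t" in plain, "\n" in plain)   # escapes were interpreted
print("\t" in raw, "\n" in raw)       # no escapes in the raw string

# PureWindowsPath parses Windows-style paths on any platform.
path = PureWindowsPath(raw)
print(path.name, path.suffix)
```

Forward slashes are another option, since they need no escaping in a string literal.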
Example: Reading a Featureclass from FileGDB
You must have fiona installed if you use the from_featureclass() method to read a feature class from FileGDB with a Python interpreter that does not have access to ArcPy.
# A raw string keeps the backslashes in the Windows path literal.
sdf = pd.DataFrame.spatial.from_featureclass(
    r"path\to\your\data\census_example\census.gdb\cities")
sdf.head()
| | OBJECTID | FID | NAME | CLASS | ST | STFIPS | PLACEFIP | CAPITAL | AREALAND | AREAWATER | ... | MARHH_NO_C | MHH_CHILD | FHH_CHILD | FAMILIES | AVE_FAM_SZ | HSE_UNITS | VACANT | OWNER_OCC | RENTER_OCC | SHAPE |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 0 | College | Census Designated Place | AK | 02 | 16750 | | 18.670 | 0.407 | ... | 936 | 152 | 339 | 2640 | 3.13 | 4501 | 397 | 2395 | 1709 | {'x': -147.82719115699996, 'y': 64.84830019400... |
1 | 2 | 1 | Fairbanks | City | AK | 02 | 24230 | | 31.857 | 0.815 | ... | 2259 | 395 | 1058 | 7187 | 3.15 | 12357 | 1282 | 3863 | 7212 | {'x': -147.72638162999996, 'y': 64.83809069700... |
2 | 3 | 2 | Kalispell | City | MT | 30 | 40075 | | 5.458 | 0.004 | ... | 1433 | 147 | 480 | 3494 | 2.92 | 6532 | 390 | 3458 | 2684 | {'x': -114.31606412399998, 'y': 48.19780017900... |
3 | 4 | 3 | Post Falls | City | ID | 16 | 64810 | | 9.656 | 0.045 | ... | 1851 | 205 | 467 | 4670 | 3.13 | 6697 | 328 | 4611 | 1758 | {'x': -116.93792709799999, 'y': 47.71555468000... |
4 | 5 | 4 | Dishman | Census Designated Place | WA | 53 | 17985 | | 3.378 | 0.000 | ... | 1096 | 131 | 345 | 2564 | 2.96 | 4408 | 257 | 2635 | 1516 | {'x': -117.27780913799995, 'y': 47.65654568400... |
5 rows × 49 columns
Saving Spatially Enabled DataFrames
The SEDF can export data to various data formats for use in other applications.
Export Options
Export to Feature Class
The SEDF allows for the export of whole datasets or partial datasets.
Example: Export a whole dataset to a shapefile:
sdf.spatial.to_featureclass(location=r"c:\output_examples\census.shp")
'c:\\output_examples\\census.shp'
On macOS and Linux machines, and on Windows machines whose Python interpreter does not have access to ArcPy, the ArcGIS API for Python can only write to the shapefile format with the to_featureclass method. Writing to file geodatabases requires the ArcPy site-package.
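The rule above can be checked before an export is attempted. This is a minimal sketch under the assumption that the target type can be told from the path alone (a .shp suffix, or a folder ending in .gdb); `output_kind` and `can_write` are hypothetical helpers, not part of the API.

```python
from pathlib import PureWindowsPath


def output_kind(location):
    """Classify an export target as 'shapefile' or 'file geodatabase'."""
    # PureWindowsPath accepts both backslash and forward-slash separators.
    path = PureWindowsPath(location)
    if path.suffix.lower() == ".shp":
        return "shapefile"
    if any(part.lower().endswith(".gdb") for part in path.parts):
        return "file geodatabase"
    return "unknown"


def can_write(location, has_arcpy):
    """Shapefiles can always be written; geodatabases only with ArcPy."""
    kind = output_kind(location)
    return kind == "shapefile" or (kind == "file geodatabase" and has_arcpy)


print(can_write(r"c:\output_examples\census.shp", has_arcpy=False))
print(can_write(r"c:\output_examples\census.gdb\cities", has_arcpy=False))
```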
Example: Export dataset with a subset of columns and top 5 records to a shapefile:
for a, b, c, d in zip(sdf.columns[::4], sdf.columns[1::4], sdf.columns[2::4], sdf.columns[3::4]):
    print("{:<30}{:<30}{:<30}{:<}".format(a, b, c, d))
PLACENS                       GEOID                         NAMELSAD                      CLASSFP
FUNCSTAT                      ALAND                         AWATER                        INTPTLAT
columns = ['NAME', 'ST', 'CAPITAL', 'STFIPS', 'POP2000', 'POP2007', 'SHAPE']
sdf[columns].head().spatial.to_featureclass(
    location=r"/path/to/your/data/directory/sdf_head_output.shp")
'/path/to/your/data/directory/sdf_head_output.shp'
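When exporting a subset, selecting a column name that is not in the data frame raises a KeyError in Pandas, and the column list in the example explicitly includes SHAPE, presumably because the geometry has to travel with the attributes. A small sketch of a pre-flight check; `export_columns` is a hypothetical helper and the `available` list is made up for the example:

```python
def export_columns(requested, available, geometry="SHAPE"):
    """Validate the requested columns and keep the geometry column."""
    missing = [name for name in requested if name not in available]
    if missing:
        raise KeyError("columns not in the data frame: {}".format(missing))
    columns = list(requested)
    if geometry not in columns:
        # Append the geometry column if the caller left it out.
        columns.append(geometry)
    return columns


# Made-up stand-in for list(sdf.columns).
available = ["NAME", "ST", "CAPITAL", "STFIPS", "SHAPE"]
print(export_columns(["NAME", "ST"], available))
```

The returned list could then be used in place of `columns` in the cell above.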
Example: Export dataset to a featureclass in FileGDB:
sdf.spatial.to_featureclass(location=r"c:\output_examples\census.gdb\cities")
Publish as a Feature Layer
The SEDF allows for the publishing of datasets as feature layers.
Example: Publishing as a feature layer:
lyr = sdf.spatial.to_featurelayer('census_cities', folder='census')
lyr