Part-2 Data IO with SeDF - Accessing Data

Introduction

In part-1 of this guide series, we started with an introduction to the Spatially enabled DataFrame (SeDF), the spatial and geom namespaces, and looked at a quick example of SeDF in action. In this part of the guide series, we will look at how GIS data can be accessed from various data formats using SeDF.

GIS users work with different vector-based spatial data formats, like published layers on remote servers (web layers) and local data. The Spatially enabled DataFrame allows the users to read, write, and manipulate spatial data by bringing the data in-memory.

The SeDF integrates with Esri's ArcPy site-package, as well as the open source pyshp, shapely and fiona packages. This means that the SeDF can use either shapely or arcpy geometry engines to provide you with options for easily working with geospatial data, regardless of your platform. The SeDF transforms the data into the formats you desire, allowing you to use Python functionality to analyze and visualize geographic information.

Data can be read and scripted to automate workflows and visualized on maps in a Jupyter notebook. Let's explore the options available for accessing GIS data with the versatile Spatially enabled DataFrame.

The data used in this guide is available as an item. We will start by importing some libraries and downloading and extracting the data needed for the analysis in this guide.

# Import Libraries
import pandas as pd
from arcgis.features import GeoAccessor, GeoSeriesAccessor
from arcgis.gis import GIS
from IPython.display import display
import zipfile
import os
import shutil
# Create a GIS connection
gis = GIS()
agol_gis = GIS("https://www.arcgis.com", "arcgis_python", "amazing_arcgis_123")
# Get the data item
data_item = gis.content.get('c7140ae3d7ae4fd0817181461019aa75')
data_item
sedf_guide_data
Data for Spatially enabled DataFrame Guides
Shapefile by api_data_owner
Last Modified: November 11, 2021
0 comments, 4 views

The cell below downloads and extracts the data from the data item to your machine.

# Download and extract the data
def unzip_data():
    """
    This function:
    - creates a directory `sedf_data` to download the data from the item
    - downloads the item as `sedf_guide_data.zip` file in the sedf_data directory
    - unzips and extracts the data to './sedf_data/cities'.
    """
    try:

        # path to downloaded data folder
        data_dir = os.path.join(os.getcwd(), 'sedf_data')

        # start from a clean data directory: remove it if it already exists
        if os.path.isdir(data_dir):
            shutil.rmtree(data_dir)
            print('Removed existing data directory')
        os.makedirs(data_dir)

        data_item.download(data_dir)    # download the data item
        # path to zipped file inside data folder
        zipped_file_path = os.path.join(data_dir, 'sedf_guide_data.zip')

        # unzip the data; the context manager closes the archive even on error
        with zipfile.ZipFile(zipped_file_path, 'r') as zip_ref:
            zip_ref.extractall(data_dir)

        # path to new cities directory
        cities_dir = os.path.join(data_dir, 'cities')
        print(f'Dataset unzipped at: {os.path.relpath(cities_dir)}')

    except Exception as e:
        print(f'Error unzipping file: {e}')


# Extract data
unzip_data()
Removed existing data directory
Dataset unzipped at: sedf_data\cities

Accessing GIS Data

The Spatially enabled DataFrame reads from many sources, including Feature layers, Feature classes, Shapefiles, Pandas DataFrames and more. Let's dive into the details of accessing GIS data from various sources.

Read in Web Feature Layers

Feature layers hosted on ArcGIS Online or ArcGIS Enterprise can be easily read into a Spatially enabled DataFrame using the from_layer() method.

The example below shows how the search() method can be used to find an ArcGIS Online item and how the layers property of the item can be used to access its data.

gis = GIS()
item = gis.content.search(
    "USA Major Cities", item_type="Feature layer", outside_org=True)[0]
item
USA Major Cities
This layer presents the locations of cities within the United States with populations of approximately 10,000 or greater, all state capitals, and the national capital.
Feature Layer Collection by esri_dm
Last Modified: May 19, 2020
1 comments, 33,841,105 views
# Obtain the first feature layer from the item
flayer = item.layers[0]

# Use the `from_layer` static method in the 'spatial' namespace on the Pandas' DataFrame
sdf = pd.DataFrame.spatial.from_layer(flayer)

# Check shape
sdf.shape
(3886, 50)
# Check first few records
sdf.head()
   AGE_10_14  AGE_15_19  AGE_20_24  AGE_25_34  AGE_35_44  AGE_45_54  AGE_55_64  AGE_5_9  AGE_65_74  AGE_75_84  ...  PLACEFIPS  POP2010  POPULATION  POP_CLASS  RENTER_OCC  SHAPE                                              ST  STFIPS  VACANT   WHITE
0       1313       1058        734       2031       1767       1446       1136     1503        665        486  ...    1601990    13816       15181          6        1271  {"x": -12462673.723706165, "y": 5384674.994080...  ID      16     271   13002
1        890        817        818       1799       1235       1330       1143     1099        721        579  ...    1607840    11899       11946          6        1441  {"x": -12506251.313993266, "y": 5341537.793529...  ID      16     318    9893
2      12750      13959      16966      32135      27048      29595      24177    12933      12176       7087  ...    1608830   205671      225405          8       33359  {"x": -12938676.6836459, "y": 5403597.04949123...  ID      16    6996  182991
3        790        768        699       1445       1136       1134        935      959        679        464  ...    1611260    10345       10727          6        1461  {"x": -12667411.402393516, "y": 5241722.820606...  ID      16     241    7984
4       3803       3779       3687       7571       5559       4744       3624     4397       2296       1222  ...    1612250    46237       53942          7        5196  {"x": -12989383.674504515, "y": 5413226.487333...  ID      16    1428   35856

5 rows × 50 columns

# Check type of sdf
type(sdf)
pandas.core.frame.DataFrame
# Access spatial namespace
sdf.spatial.geometry_type
['point']

We can see that the dataset has 3886 records and 50 columns. Inspecting the type of sdf object and accessing the spatial namespace shows us that a Spatially enabled DataFrame has been created from all the data in the layer.

Memory usage and the query() operation

The from_layer() method attempts to read all the data from the layer into memory. This approach works for small datasets. With large datasets, however, it becomes important to use memory efficiently and query only for what is necessary.

Let's take a look at the memory usage of the existing SeDF using the memory_usage() method from Pandas.

# Check memory usage of current sdf
mem_used = sdf.memory_usage().sum() / (1024**2)  # converting to megabytes
print(f'Shape of data: {sdf.shape}')
print(f'Memory used: {round(mem_used, 2)} MB')
Shape of data: (3886, 50)
Memory used: 1.48 MB

Because from_layer() reads all the data into memory, the sdf object holds all 3886 records and 50 columns and uses 1.48 MB of memory.
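
A caveat when reading these numbers: pandas' memory_usage() defaults to deep=False, which for object-dtype columns counts only the references and not the objects they point to. The sketch below compares the two estimates on a small stand-in DataFrame. The values are borrowed from the layer, but the frame itself is hypothetical, and how much the difference matters for a real SeDF depends on how its columns are stored.

```python
import pandas as pd

# Hypothetical stand-in for a SeDF: one numeric column and one text column.
df = pd.DataFrame({
    "AGE_45_54": [1446, 1330, 1134],
    "NAME": ["Ammon", "Blackfoot", "Burley"],
})

# Default (deep=False): object columns are measured by reference size only.
shallow_mb = df.memory_usage().sum() / (1024 ** 2)

# deep=True: pandas also inspects the Python objects held in object columns.
deep_mb = df.memory_usage(deep=True).sum() / (1024 ** 2)

print(f"Shallow estimate: {shallow_mb:.6f} MB")
print(f"Deep estimate:    {deep_mb:.6f} MB")
```

The deep estimate is never smaller than the shallow one, so it is the safer figure when deciding whether a layer fits comfortably in memory.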

But what if we only needed a small amount of data for our analysis and did not need to bring everything from the layer into the memory? Good question... let's see how we can achieve that.

The query() method is a powerful operation that allows you to use SQL-like queries to return only a subset of records. Since the processing is performed on the server, this operation is not restricted by the capacity of your computer.

The method returns a FeatureSet object; however, the return type can be changed to a Spatially enabled DataFrame object by specifying the parameter as_df=True.

Let's subset the data using query(), create a new SeDF, and check the memory usage. We'll use the AGE_45_54 column to query the layer and get a subset of records.

# Filter feature layer records with a query.
sub_sdf = flayer.query(where="AGE_45_54 < 1500", as_df=True)
sub_sdf.shape
(316, 50)
# Check memory usage of current sdf
mem_used = sub_sdf.memory_usage().sum() / (1024**2)  # converting to megabytes
print(f'Memory used is: {round(mem_used, 2)} MB')
Memory used is: 0.12 MB

Now that we are only querying for records where AGE_45_54 < 1500, the result is a smaller DataFrame with 316 records and 50 columns. Because the filtering is performed on the server, only the matching subset is loaded into memory, reducing usage from 1.48 MB to 0.12 MB.
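
When the where string is assembled from variables rather than typed by hand, string values need SQL quoting. The following is a minimal sketch, assuming standard SQL string literals (single quotes, with embedded single quotes doubled). The helpers sql_string_literal and build_where are hypothetical and not part of the ArcGIS API for Python.

```python
def sql_string_literal(value: str) -> str:
    """Quote a Python string as a SQL string literal, doubling single quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_where(field: str, op: str, value) -> str:
    """Build a simple `field op value` clause; strings are quoted, numbers are not."""
    if isinstance(value, str):
        rendered = sql_string_literal(value)
    else:
        rendered = str(value)
    return f"{field} {op} {rendered}"


# Combine two conditions into one where clause.
where = " AND ".join([
    build_where("AGE_45_54", "<", 1500),
    build_where("NAME", "=", "Coeur d'Alene"),
])
print(where)
```

The resulting string could then be passed as the where argument of flayer.query().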

The query() method allows you to specify a number of optional parameters that may further refine and transform the results. One such key parameter is out_fields. With out_fields, you can subset your data by specifying a list of field names to return.

# Filter feature layer with where and out_fields
out_fields = ['NAME', 'ST', 'POP_CLASS', 'AGE_45_54']
sub_sdf2 = flayer.query(where="AGE_45_54 < 1500",
                        out_fields=out_fields,
                        as_df=True)
sub_sdf2.shape
(316, 6)
# Check head
sub_sdf2.head()
   FID  NAME       ST  POP_CLASS  AGE_45_54  SHAPE
0    1  Ammon      ID          6       1446  {"x": -12462673.723706165, "y": 5384674.994080...
1    2  Blackfoot  ID          6       1330  {"x": -12506251.313993266, "y": 5341537.793529...
2    4  Burley     ID          6       1134  {"x": -12667411.402393516, "y": 5241722.820606...
3    6  Chubbuck   ID          6       1494  {"x": -12520053.904151963, "y": 5300220.333409...
4   12  Jerome     ID          6       1155  {"x": -12747828.64784961, "y": 5269214.8197742...
# Check memory usage of current sdf
mem_used = sub_sdf2.memory_usage().sum() / (1024**2)  # converting to megabytes
print(f'Memory used is: {round(mem_used, 2)} MB')
Memory used is: 0.01 MB

Using out_fields, we have further reduced memory usage by bringing only the necessary fields into memory.

Create SeDF from FeatureSet

As mentioned earlier, the query() method returns a FeatureSet object. The FeatureSet object contains useful information about the data that can be accessed through its various properties.

Let's use the AGE_45_54 column to query the layer, get the result as a FeatureSet, and check some of its properties.

# Filter feature layer to return a feature set.
fset = flayer.query(where="AGE_45_54 < 1500")
# Check type
type(fset)
arcgis.features.feature.FeatureSet
# Check length
len(fset.features)
316
# Check geometry of a feature in the featureset
fset.features[0].geometry
{'x': -12462673.723706165,
 'y': 5384674.994080178,
 'spatialReference': {'wkid': 102100, 'latestWkid': 3857}}
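
The x and y values above are in Web Mercator meters (wkid 102100, latestWkid 3857), not degrees. As a rough sanity check, they can be converted to longitude and latitude with the standard spherical Mercator inverse. This is a sketch for illustration only: web_mercator_to_lonlat is a hypothetical helper, and proper reprojection is better left to the geometry engine.

```python
import math

EARTH_RADIUS_M = 6378137.0  # sphere radius used by Web Mercator (WGS84 semi-major axis)


def web_mercator_to_lonlat(x: float, y: float) -> tuple[float, float]:
    """Convert Web Mercator meters to (longitude, latitude) in degrees."""
    lon = math.degrees(x / EARTH_RADIUS_M)
    lat = math.degrees(2.0 * math.atan(math.exp(y / EARTH_RADIUS_M)) - math.pi / 2.0)
    return lon, lat


# Coordinates of the first feature shown above.
lon, lat = web_mercator_to_lonlat(-12462673.723706165, 5384674.994080178)
print(round(lon, 2), round(lat, 2))
```

The result lands near 112°W, 43.5°N, which is consistent with a city in eastern Idaho.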

The fields property of a FeatureSet returns a list containing information about each column recorded as a dictionary. Let's use the fields property to access information about the first column.

# Check details of a column in the feature set
fset.fields[0]
{'name': 'FID',
 'type': 'esriFieldTypeOID',
 'alias': 'FID',
 'sqlType': 'sqlTypeInteger',
 'domain': None,
 'defaultValue': None}

Let's get the names of the columns in the data.

# Get column names
f_names = [f['name'] for f in fset.fields]
f_names[:5]
['FID', 'NAME', 'CLASS', 'ST', 'STFIPS']
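
Because fields is a plain list of dictionaries, the same pattern extends to other lookups, such as mapping each field name to its type. In this sketch the first entry is copied from the output above. The second entry is an assumed example of a text field and is not taken from the layer.

```python
# First dict copied from the fset.fields[0] output; the second is a hypothetical text field.
fields = [
    {"name": "FID", "type": "esriFieldTypeOID", "alias": "FID",
     "sqlType": "sqlTypeInteger", "domain": None, "defaultValue": None},
    {"name": "NAME", "type": "esriFieldTypeString", "alias": "NAME",
     "sqlType": "sqlTypeOther", "domain": None, "defaultValue": None},
]

# Map field name -> field type.
field_types = {f["name"]: f["type"] for f in fields}

# Keep only the names of text fields.
string_fields = [name for name, ftype in field_types.items()
                 if ftype == "esriFieldTypeString"]

print(field_types)
print(string_fields)
```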

Now, let's create a Spatially enabled DataFrame from a FeatureSet using the .sdf property.

# Create SeDF from FeatureSet
fset_df = fset.sdf
fset_df.shape
(316, 50)
# Check head
fset_df.head(2)
   FID  NAME       CLASS  ST  STFIPS  PLACEFIPS  CAPITAL  POP_CLASS  POPULATION  POP2010  ...  MARHH_NO_C  MHH_CHILD  FHH_CHILD  FAMILIES  AVE_FAM_SZ  HSE_UNITS  VACANT  OWNER_OCC  RENTER_OCC  SHAPE
0    1  Ammon      city   ID      16    1601990                   6       15181    13816  ...        1131        106        335      3352        3.61       4747     271       3205        1271  {"x": -12462673.723706165, "y": 5384674.994080...
1    2  Blackfoot  city   ID      16    1607840                   6       11946    11899  ...        1081        174        381      2958        3.31       4547     318       2788        1441  {"x": -12506251.313993266, "y": 5341537.793529...

2 rows × 50 columns

# Check geometry type
fset_df.spatial.geometry_type
['point']

The spatial namespace shows that a Spatially enabled DataFrame has been created from a FeatureSet.

Create SeDF from FeatureCollection

Tools within the ArcGIS API for Python often return a FeatureCollection object as a result of some analysis. A FeatureCollection is an in-memory collection of Feature objects with rendering information. Similar to feature layers, feature collections can also be used to store features. With a feature collection, a service is not created to serve out the feature data.

Let's create a SeDF from a FeatureCollection. Here, we:

  • Import the Major Ports feature layer.
  • Create 5 mile buffers using the create_buffers() tool, which returns a FeatureCollection.
  • Call the query() method on the FeatureCollection to get a FeatureSet, then use the .sdf property of that FeatureSet to create a SeDF.
# Get the ports item
ports_item = gis.content.get("405963eaea24428c9db236ec289760eb")
ports_item
Major Ports
This feature layer, utilizing data from the U.S. Department of Transportation, depicts Major Ports in the United States by total tonnage.
Feature Layer Collection by Federal_User_Community
Last Modified: October 27, 2021
0 comments, 157,223 views
# Get the ports layer
ports_lyr = ports_item.layers[0]
ports_lyr
<FeatureLayer url:"https://geo.dot.gov/server/rest/services/NTAD/Ports_Major/MapServer/0">
# Create buffers
from arcgis.features.use_proximity import create_buffers
ports_buffer50 = create_buffers(
    ports_lyr, distances=[5], units='Miles', gis=agol_gis)
# Check type of result from the analysis
type(ports_buffer50)
arcgis.features.feature.FeatureCollection

The create_buffers() tool resulted in a FeatureCollection.

Now, we will create a SeDF from the FeatureCollection object.

# Create SeDF
sedf_fc = ports_buffer50.query().sdf
sedf_fc.head(2)
   OBJECTID_1  OBJECTID   ID  PORT   PORT_NAME            GRAND_TOTA  FOREIGN_TO  IMPORTS  EXPORTS  DOMESTIC  BUFF_DIST  ORIG_FID  AnalysisArea  SHAPE
0           1         1  124  C4947  Unalaska Island, AK     1652281     1236829   426251   810578    415452          5         1     78.528402  {"rings": [[[-18806114.3995, 7138385.537799999...
1           2         2   85  C4410  Kahului, Maui, HI       3615449       20391    20391        0   3595058          5         2     78.528402  {"rings": [[[-17418472.419, 2388455.4312999994...
# Check geometry type
sedf_fc.spatial.geometry_type
['polygon']

The spatial namespace shows that a Spatially enabled DataFrame has been created from a FeatureCollection.

Read in local GIS data

Local geospatial data, such as feature classes and shapefiles, can be easily accessed using the Spatially enabled DataFrame. The from_featureclass() method can be used to access local data. Let's look at some examples.

Reading a Shapefile

A locally stored shapefile can be accessed by passing the location of the file in the from_featureclass() method.

Note: In the absence of arcpy, the PyShp package must be present in your current conda environment in order to read shapefiles.
To check if PyShp is present, you can run the following in a cell: !conda list pyshp
To install PyShp, you can run the following in a cell: !conda install pyshp
# Reading from shape file
shp_df = pd.DataFrame.spatial.from_featureclass(
    location="./sedf_data/cities/cities.shp")
shp_df.shape
(3886, 51)
shp_df.spatial.geometry_type
['point']

The spatial namespace shows that a Spatially enabled DataFrame has been created from the shapefile stored locally.

Shapefile from a URL

The url of a zipped shapefile can be used to create a SeDF by passing the url as location in the from_featureclass() method. The image below shows how the operation can be performed.

Note: This operation requires PyShp to be available in the environment.

[Image: a notebook cell passing the URL of a zipped shapefile as location to from_featureclass()]

Reading a Featureclass

A featureclass can be accessed from a File Geodatabase by passing its location in the from_featureclass() method.

Note: In the absence of arcpy, the Fiona package must be present in your current conda environment in order to read a featureclass.
To check if Fiona is present, you can run the following in a cell: !conda list fiona
To install Fiona, you can run the following in a cell: !conda install fiona
# Reading from FGDB
fcls_df = pd.DataFrame.spatial.from_featureclass(
    location="./sedf_data/cities/cities.gdb/cities")
fcls_df.shape
(3886, 51)
# Check head
fcls_df.head(2)
   OBJECTID  age_10_14  age_15_19  age_20_24  age_25_34  age_35_44  age_45_54  age_55_64  age_5_9  age_65_74  ...  placefips  pop2010  population  pop_class  renter_occ  st  stfips  vacant  white  SHAPE
0         1       1313       1058        734       2031       1767       1446       1136     1503        665  ...    1601990    13816       15181          6        1271  ID      16     271  13002  {"x": -12462673.7237, "y": 5384674.994099997, ...
1         2        890        817        818       1799       1235       1330       1143     1099        721  ...    1607840    11899       11946          6        1441  ID      16     318   9893  {"x": -12506251.314, "y": 5341537.793499999, "...

2 rows × 51 columns

# Check geometry type
fcls_df.spatial.geometry_type
['point']

The spatial namespace shows that a Spatially enabled DataFrame has been created from the featureclass stored locally.

Specify optional parameters

The from_featureclass() method allows users to specify optional parameters when the ArcPy library is available in the current environment. These parameters are:

  • sql_clause: a pair of SQL prefix and postfix clauses, sql_clause=(prefix,postfix), organized in a list or a tuple can be passed to query specific data. The parameter allows only a small set of operations to be performed. Learn more about the allowed operations here.
  • where_clause: where statement to subset the data. Learn more about it here.
  • fields: to subset the data for specific fields.
  • spatial_filter: a geometry object to filter the results.
Note: The operations below can only be performed in an environment that contains arcpy.
Subset data for specific fields
# Subset for fields
fcls_flds = pd.DataFrame.spatial.from_featureclass(location="./sedf_data/cities/cities.gdb/cities",
                                                   fields=['st', 'pop_class'])
fcls_flds.shape
(3886, 3)
# Check head
fcls_flds.head(2)
   st  pop_class  SHAPE
0  ID          6  {"x": -12462673.7237, "y": 5384674.994099997, ...
1  ID          6  {"x": -12506251.314, "y": 5341537.793499999, "...
Subset using where_clause

Learn more about how to use where_clause here.

# Subset using where_clause
fcls_whr = pd.DataFrame.spatial.from_featureclass(location="./sedf_data/cities/cities.gdb/cities",
                                                  where_clause="st='ID' and pop_class=6")
fcls_whr.shape
(15, 51)
# Check head
fcls_whr.head(2)
   OBJECTID  age_10_14  age_15_19  age_20_24  age_25_34  age_35_44  age_45_54  age_55_64  age_5_9  age_65_74  ...  placefips  pop2010  population  pop_class  renter_occ  st  stfips  vacant  white  SHAPE
0         1       1313       1058        734       2031       1767       1446       1136     1503        665  ...    1601990    13816       15181          6        1271  ID      16     271  13002  {"x": -12462673.7237, "y": 5384674.994099997, ...
1         2        890        817        818       1799       1235       1330       1143     1099        721  ...    1607840    11899       11946          6        1441  ID      16     318   9893  {"x": -12506251.314, "y": 5341537.793499999, "...

2 rows × 51 columns

Subset using fields and where_clause
# Subset using fields and where_clause
flds_whr = pd.DataFrame.spatial.from_featureclass(location="./sedf_data/cities/cities.gdb/cities",
                                                  fields=[
                                                      'st', 'pop_class', 'age_10_14', 'age_15_19'],
                                                  where_clause="st='ID' and pop_class=6")
flds_whr.shape
(15, 5)
# Check head
flds_whr.head(2)
   st  pop_class  age_10_14  age_15_19  SHAPE
0  ID          6       1313       1058  {"x": -12462673.7237, "y": 5384674.994099997, ...
1  ID          6        890        817  {"x": -12506251.314, "y": 5341537.793499999, "...
Subset using sql_clause

sql_clause can be combined with fields and where_clause to further subset the data. You can learn more about the allowed operations here. Now let's look at some examples.

Prefix sql_clause - DISTINCT operation
# Prefix Sql clause - DISTINCT operation
fcls_sql1 = pd.DataFrame.spatial.from_featureclass(location="./sedf_data/cities/cities.gdb/cities",
                                                   sql_clause=("DISTINCT pop_class", None))

# Check shape
fcls_sql1.shape
(3886, 51)
# Check head
fcls_sql1.head(2)
   OBJECTID  age_10_14  age_15_19  age_20_24  age_25_34  age_35_44  age_45_54  age_55_64  age_5_9  age_65_74  ...  placefips  pop2010  population  pop_class  renter_occ  st  stfips  vacant  white  SHAPE
0       941       1247       1213       1043       2022       1692       2116       1827     1187       1037  ...    0507330    15620       14771          6        3006  AR      05    1303   6216  {"x": -10006810.091, "y": 4290154.581699997, "...
1      1405        796        748        754       1999       1717       2062       1450      760        851  ...    2466850    12677       13188          6         814  MD      24     281  11613  {"x": -8517714.7855, "y": 4744316.880199999, "...

2 rows × 51 columns
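
Note that the DISTINCT example still returns all 3886 rows. One plausible explanation, offered here as an assumption rather than documented behavior, is that DISTINCT applies across every selected field, so rows that differ in any column (such as OBJECTID) all count as distinct. The sketch below illustrates that SQL behavior with Python's built-in sqlite3 module. It is an analogy in SQLite, not the arcpy implementation, and the objectid values are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cities (objectid INTEGER, name TEXT, st TEXT, pop_class INTEGER)"
)
# Names and population classes follow the guide's Idaho rows; objectid values are illustrative.
conn.executemany("INSERT INTO cities VALUES (?, ?, ?, ?)", [
    (1, "Ammon", "ID", 6),
    (2, "Blackfoot", "ID", 6),
    (3, "Boise City", "ID", 8),
    (5, "Caldwell", "ID", 7),
])

# DISTINCT over all columns: every row is unique, so nothing is dropped.
all_cols = conn.execute(
    "SELECT DISTINCT objectid, name, st, pop_class FROM cities"
).fetchall()

# DISTINCT over a single column: duplicates of pop_class collapse.
one_col = conn.execute(
    "SELECT DISTINCT pop_class FROM cities ORDER BY pop_class"
).fetchall()
conn.close()

print(len(all_cols))
print(one_col)
```

Under that assumption, restricting fields to the columns of interest is what lets a DISTINCT prefix actually reduce the number of rows.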

Postfix sql_clause with specific fields

Here, we will subset the data for the state and population class fields and apply a postfix clause.

# Postfix Sql clause with specific fields
fcls_sql2 = pd.DataFrame.spatial.from_featureclass(location="./sedf_data/cities/cities.gdb/cities",
                                                   fields=['st', 'pop_class'],
                                                   sql_clause=(None, "ORDER BY st, pop_class"))
# Check shape
fcls_sql2.shape
(3886, 3)
# Check head
fcls_sql2.head()
   st  pop_class  SHAPE
0  AK          6  {"x": -16417572.1606, "y": 9562359.403800003, ...
1  AK          6  {"x": -16455422.2224, "y": 9574022.0224, "spat...
2  AK          6  {"x": -16444303.0276, "y": 9568008.9705, "spat...
3  AK          6  {"x": -14962313.3618, "y": 8031014.926600002, ...
4  AK          6  {"x": -16657118.680399999, "y": 8746757.662600...
Prefix and Postfix sql_clause with specific fields and where_clause

Here, we will subset the data using where_clause, keep specific fields, and then apply both prefix and postfix clause.

# Prefix and Postfix sql_clause
fcls_sql3_df = pd.DataFrame.spatial.from_featureclass(location="./sedf_data/cities/cities.gdb/cities",
                                                      fields=[
                                                          'st', 'name', 'pop_class', 'age_10_14'],
                                                      where_clause="st='ID'",
                                                      sql_clause=("DISTINCT pop_class", "ORDER BY name"))

# Check Shape
fcls_sql3_df.shape
(22, 5)
# Check head
fcls_sql3_df.head()
   st  name        pop_class  age_10_14  SHAPE
0  ID  Ammon               6       1313  {"x": -12462673.7237, "y": 5384674.994099997, ...
1  ID  Blackfoot           6        890  {"x": -12506251.314, "y": 5341537.793499999, "...
2  ID  Boise City          8      12750  {"x": -12938676.683600001, "y": 5403597.049500...
3  ID  Burley              6        790  {"x": -12667411.4024, "y": 5241722.820600003, ...
4  ID  Caldwell            7       3803  {"x": -12989383.6745, "y": 5413226.487300001, ...
Using spatial_filter

spatial_filter can be used to query the results by using a spatial relationship with another geometry. The spatial filtering is even more powerful when integrated with Geoenrichment. Let's use this approach to filter our results for the state of Idaho. In this example, we will:

  • use arcgis.geoenrichment.Country to derive the geometries for the state of Idaho.
  • use arcgis.geometry.filters.intersects(geometry, sr=None) to create a geometry filter object that filters results whose geometry intersects with the specified geometry (i.e. filter data points within the boundary of Idaho).
  • pass the geometry filter object to spatial_filter to get desired results.
Note: To perform enrichment operations, GeoEnrichment must be configured in your GIS organization. GeoEnrichment consumes credits, and you can learn more about credit consumption here.
# Basic Imports
from arcgis.geometry import Geometry
from arcgis.geometry.filters import intersects
from arcgis.geoenrichment import Country
# Create country object
usa = Country.get('US', gis=agol_gis)
type(usa)
arcgis.geoenrichment.enrichment.Country
# Get boundaries for Idaho
named_area_ID = usa.search(query='Idaho', layers=['US.States'])
display(named_area_ID[0])
named_area_ID[0].geometry.as_arcpy
<NamedArea name:"Idaho" area_id="16", level="US.States", country="147">
# Create spatial reference
sr_id = named_area_ID[0].geometry["spatialReference"]
sr_id
{'wkid': 4326, 'latestWkid': 4326}
# Construct a geometry filter using the filter geometry
id_state_filter = intersects(named_area_ID[0].geometry,
                             sr=sr_id)
type(id_state_filter)
dict
# Pass geometry filter object as a spatial_filter
fcls_spfl_df = pd.DataFrame.spatial.from_featureclass(location="./sedf_data/cities/cities.gdb/cities",
                                                      fields=[
                                                          'st', 'name', 'pop_class', 'age_10_14'],
                                                      spatial_filter=id_state_filter)
# Check shape
fcls_spfl_df.shape
(22, 5)
# Check head
fcls_spfl_df.head()
   st  name        pop_class  age_10_14  SHAPE
0  ID  Ammon               6       1313  {"x": -12462673.7237, "y": 5384674.994099997, ...
1  ID  Blackfoot           6        890  {"x": -12506251.314, "y": 5341537.793499999, "...
2  ID  Boise City          8      12750  {"x": -12938676.683600001, "y": 5403597.049500...
3  ID  Burley              6        790  {"x": -12667411.4024, "y": 5241722.820600003, ...
4  ID  Caldwell            7       3803  {"x": -12989383.6745, "y": 5413226.487300001, ...

The result shows the data points filtered for Idaho as defined by the spatial filter.
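
Conceptually, an intersects filter between point features and a polygon comes down to a point-in-polygon test. The sketch below shows one classic way to perform that test (ray casting) in plain Python. It only illustrates the idea and is not how arcpy or the geometry filter is implemented. The rectangle is a hypothetical stand-in for the Idaho boundary, and the Ammon coordinates are approximate.

```python
def point_in_polygon(x, y, ring):
    """Ray casting test: True if (x, y) lies inside the ring of (x, y) vertices."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside  # each crossing to the right flips the state
    return inside


# Hypothetical lon/lat rectangle loosely enclosing Idaho (not the real boundary).
box = [(-117.3, 42.0), (-111.0, 42.0), (-111.0, 49.0), (-117.3, 49.0)]

# Approximate lon/lat for Ammon, ID, and the Niles, IL record from the CMS sample data.
points = {"Ammon": (-111.95, 43.48), "Niles": (-87.79, 42.01)}

inside = {name: point_in_polygon(px, py, box) for name, (px, py) in points.items()}
print(inside)
```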

You can learn more about applying spatial filters in our Working with geometries guide series.

Read in DataFrame with Addresses

A SeDF can be easily created from a DataFrame with address information using the from_df() method. This method geocodes the addresses using the first configured geocoder in your GIS. The locations generated after geocoding are used as the geometry of the SeDF.

You can learn more about geocoding in our Finding Places with geocoding guide series.

Note: The from_df() method performs a batch geocoding operation which consumes credits. If a geocoder is not specified, then the first configured geocoder in your GIS organization will be used. Learn more about credit consumption here.

To avoid credit consumption, you may specify your own geocoder.

Let's look at an example of using from_df(). We will read addresses into a DataFrame using the pd.read_csv() method. Next, we will create a SeDF by passing the DataFrame and address column as parameters to the from_df() method.

# Read the csv file with address into a DataFrame
orders_df = pd.read_csv("./sedf_data/cities/orders.csv")

# Check head
orders_df.head()
   Address
0  602 Murray Cir, Sausalito, CA 94965
1  340 Stockton St, San Francisco, CA 94108
2  3619 Balboa St, San Francisco, CA 94121
3  1274 El Camino Real, San Bruno, CA 94066
4  625 Monterey Blvd, San Francisco, CA 94127

The DataFrame shows a column with address information.

# Use from_df to create SeDF
orders_sdf = pd.DataFrame.spatial.from_df(
    df=orders_df, address_column="Address")
orders_sdf.head()
   Address                                     SHAPE
0  602 Murray Cir, Sausalito, CA 94965         {"x": -122.47885242199999, "y": 37.83735920100...
1  340 Stockton St, San Francisco, CA 94108    {"x": -122.44955096499996, "y": 37.73152250200...
2  3619 Balboa St, San Francisco, CA 94121     {"x": -122.49772620499999, "y": 37.77567413500...
3  1274 El Camino Real, San Bruno, CA 94066    {"x": -122.40685153899994, "y": 37.78910429100...
4  625 Monterey Blvd, San Francisco, CA 94127  {"x": -122.42218381299995, "y": 37.63856151200...
# Check geometry type
orders_sdf.spatial.geometry_type
['point']

The spatial namespace shows that a Spatially enabled DataFrame has been created from a Pandas DataFrame with address information.

Read in DataFrame with Lat/Long Information

As we saw in part-1 of this guide series, a SeDF can be created from any Pandas DataFrame with location information (Latitude and Longitude) using the from_xy() method.

Let's look at an example. We will read the data with latitude and longitude information into a DataFrame using the pd.read_csv() method. Then, we will create a SeDF by passing the DataFrame, latitude, and longitude as parameters to the from_xy() method.

# Read the data
cms_df = pd.read_csv('./sedf_data/cities/sample_cms_data.csv')

# Return the first 5 records
cms_df.head()
Provider NameProvider CityProvider StateResidents Total Admissions COVID-19Residents Total COVID-19 CasesResidents Total COVID-19 DeathsNumber of All BedsTotal Number of Occupied BedsLONGITUDELATITUDE
0GROSSE POINTE MANORNILESIL556129961-87.79297342.012012
1MILLER'S MERRY MANORDUNKIRKIN0004643-85.19765140.392722
2PARKWAY MANORMARIONIL00013184-88.98294437.750143
3AVANTARA LONG GROVELONG GROVEIL61410195131-87.98644242.160843
4HARMONY NURSING & REHAB CENTERCHICAGOIL197516180116-87.72635341.975505
# Create a SeDF
cms_sedf = pd.DataFrame.spatial.from_xy(
    df=cms_df, x_column='LONGITUDE', y_column='LATITUDE', sr=4326)

# Check head
cms_sedf.head()
Provider NameProvider CityProvider StateResidents Total Admissions COVID-19Residents Total COVID-19 CasesResidents Total COVID-19 DeathsNumber of All BedsTotal Number of Occupied BedsLONGITUDELATITUDESHAPE
0GROSSE POINTE MANORNILESIL556129961-87.79297342.012012{"spatialReference": {"wkid": 4326}, "x": -87....
1MILLER'S MERRY MANORDUNKIRKIN0004643-85.19765140.392722{"spatialReference": {"wkid": 4326}, "x": -85....
2PARKWAY MANORMARIONIL00013184-88.98294437.750143{"spatialReference": {"wkid": 4326}, "x": -88....
3AVANTARA LONG GROVELONG GROVEIL61410195131-87.98644242.160843{"spatialReference": {"wkid": 4326}, "x": -87....
4HARMONY NURSING & REHAB CENTERCHICAGOIL197516180116-87.72635341.975505{"spatialReference": {"wkid": 4326}, "x": -87....

The SHAPE column shows that a Spatially enabled DataFrame has been created from a Pandas DataFrame with latitude and longitude information.
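
To make the mapping explicit, each SHAPE value is a point whose x comes from the longitude column and whose y comes from the latitude column, tagged with the spatial reference. The sketch below builds equivalent point dictionaries with the standard library only. The helper to_point is hypothetical and simply mirrors the structure visible in the SHAPE column; it does not reproduce how from_xy() is implemented.

```python
import csv
import io

# Two records from the sample data, reduced to the columns needed here.
csv_text = """Provider Name,LONGITUDE,LATITUDE
GROSSE POINTE MANOR,-87.792973,42.012012
MILLER'S MERRY MANOR,-85.197651,40.392722
"""


def to_point(row, x_column, y_column, wkid):
    """Build a point dictionary: x is longitude, y is latitude."""
    return {
        "x": float(row[x_column]),
        "y": float(row[y_column]),
        "spatialReference": {"wkid": wkid},
    }


rows = list(csv.DictReader(io.StringIO(csv_text)))
shapes = [to_point(r, "LONGITUDE", "LATITUDE", 4326) for r in rows]
print(shapes[0])
```

Swapping the two columns is a common mistake: longitude belongs in x_column and latitude in y_column.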

Read in GeoPandas DataFrame

A SeDF can be easily created from a GeoPandas GeoDataFrame using the from_geodataframe() method. We will first create a GeoDataFrame and then convert it to a SeDF.

Create a GeoDataFrame

Here, we will create a GeoDataFrame from a Pandas DataFrame, cms_df, defined above.

# Import libraries
from geopandas import GeoDataFrame
from shapely.geometry import Point
# Read the data
cms_df = pd.read_csv('./sedf_data/cities/sample_cms_data.csv')

# Create Geopandas DataFrame
gdf = GeoDataFrame(cms_df.drop(['LONGITUDE', 'LATITUDE'], axis=1),
                   crs='EPSG:4326',
                   geometry=[Point(xy) for xy in zip(cms_df.LONGITUDE, cms_df.LATITUDE)])
gdf.shape
(124, 9)
# Check head
gdf.head(2)
Provider NameProvider CityProvider StateResidents Total Admissions COVID-19Residents Total COVID-19 CasesResidents Total COVID-19 DeathsNumber of All BedsTotal Number of Occupied Bedsgeometry
0GROSSE POINTE MANORNILESIL556129961POINT (-87.79297 42.01201)
1MILLER'S MERRY MANORDUNKIRKIN0004643POINT (-85.19765 40.39272)

A GeoDataFrame has been created with a geometry column that stores the geometry of the dataset.

Create a SeDF from GeoDataFrame

Here, we will create a SeDF from the gdf GeoDataFrame created above using the from_geodataframe() method.

# Create a SeDF
sedf_gpd = pd.DataFrame.spatial.from_geodataframe(gdf)
sedf_gpd.head(2)
Provider NameProvider CityProvider StateResidents Total Admissions COVID-19Residents Total COVID-19 CasesResidents Total COVID-19 DeathsNumber of All BedsTotal Number of Occupied BedsSHAPE
0GROSSE POINTE MANORNILESIL556129961{"x": -87.792973, "y": 42.012012, "spatialRefe...
1MILLER'S MERRY MANORDUNKIRKIN0004643{"x": -85.197651, "y": 40.392722, "spatialRefe...
# Check geometry type
sedf_gpd.spatial.geometry_type
['point']

The spatial namespace shows that a Spatially enabled DataFrame has been created from a GeoDataFrame.

Read in feather format data

A SeDF can be easily created from data in feather format using the from_feather() method. By default, the method uses SHAPE as the spatial_column that holds the geospatial information, but any other column with spatial information can be specified.

# Check head
cms_sedf.head(2)
Provider NameProvider CityProvider StateResidents Total Admissions COVID-19Residents Total COVID-19 CasesResidents Total COVID-19 DeathsNumber of All BedsTotal Number of Occupied BedsLONGITUDELATITUDESHAPE
0GROSSE POINTE MANORNILESIL556129961-87.79297342.012012{"spatialReference": {"wkid": 4326}, "x": -87....
1MILLER'S MERRY MANORDUNKIRKIN0004643-85.19765140.392722{"spatialReference": {"wkid": 4326}, "x": -85....
# Create SeDf by reading from feather
sedf_fthr = pd.DataFrame.spatial.from_feather(
    './sedf_data/cities/sample_cms_data.feather')
sedf_fthr.head(2)
Provider NameProvider CityProvider StateResidents Total Admissions COVID-19Residents Total COVID-19 CasesResidents Total COVID-19 DeathsNumber of All BedsTotal Number of Occupied BedsLONGITUDELATITUDESHAPE
0GROSSE POINTE MANORNILESIL556129961-87.79297342.012012{"x": -87.792973, "y": 42.012012, "spatialRefe...
1MILLER'S MERRY MANORDUNKIRKIN0004643-85.19765140.392722{"x": -85.197651, "y": 40.392722, "spatialRefe...
# Check geometry type
sedf_fthr.spatial.geometry_type
['point']

The spatial namespace shows that a Spatially enabled DataFrame has been created from feather format data.

Read in Non-spatial Table data

Non-spatial table data can be hosted on ArcGIS Online or ArcGIS Enterprise, or it can be stored locally in a File Geodatabase. A SeDF can be easily created from such non-spatial table data using the following methods:

Using the from_table() method

A DataFrame can be created from local non-spatial data using the from_table() method. The method can read a CSV file (in any environment) or a table stored in a File Geodatabase (with ArcPy only).

Reading a CSV file
# Create SeDF
tbl_df = pd.DataFrame.spatial.from_table(
    filename='./sedf_data/cities/sample_cms_data.csv')
tbl_df.head(2)
Provider NameProvider CityProvider StateResidents Total Admissions COVID-19Residents Total COVID-19 CasesResidents Total COVID-19 DeathsNumber of All BedsTotal Number of Occupied BedsLONGITUDELATITUDE
0GROSSE POINTE MANORNILESIL556129961-87.79297342.012012
1MILLER'S MERRY MANORDUNKIRKIN0004643-85.19765140.392722

A Pandas DataFrame without any spatial information is returned.
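
The table above still carries LONGITUDE and LATITUDE columns even though it has no SHAPE column. The sketch below uses plain pandas (not from_table()) to confirm that no geometry column is present and to show how the coordinate columns could be turned into point dictionaries in the same form as the SHAPE values seen earlier. The inline CSV text is a stand-in that keeps only a subset of the columns, and the 4326 spatial reference is assumed from the earlier outputs.

```python
import io

import pandas as pd

# Stand-in for the CSV file: a subset of columns, values taken from the output above
csv_text = """Provider Name,Provider City,Provider State,LONGITUDE,LATITUDE
GROSSE POINTE MANOR,NILES,IL,-87.792973,42.012012
MILLER'S MERRY MANOR,DUNKIRK,IN,-85.197651,40.392722
"""
df = pd.read_csv(io.StringIO(csv_text))

# A plain table has no geometry column
has_geometry = "SHAPE" in df.columns
print(has_geometry)  # False

# Build Esri-style point dictionaries from the coordinate columns
# (wkid 4326 is an assumption based on the SHAPE values shown earlier)
points = [
    {"x": float(x), "y": float(y), "spatialReference": {"wkid": 4326}}
    for x, y in zip(df["LONGITUDE"], df["LATITUDE"])
]
print(len(points))  # 2
```

In practice, the SeDF's own from_xy() method is the more direct route from coordinate columns to a spatially enabled DataFrame; the sketch only illustrates what kind of information is missing from a non-spatial table.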

Reading table from a File Geodatabase
Note: The operation below can only be performed in an environment that contains arcpy.
# Create SeDF
tbl_df2 = pd.DataFrame.spatial.from_table(
    filename="./sedf_data/cities/cities.gdb/cities_table_export")
tbl_df2.head(2)
OBJECTIDNAMEOTHEROWNER_OCCPLACEFIPSPOP2010POPULATIONPOP_CLASSRENTER_OCCSTSTFIPSVACANTWHITE
01Ammon30732051601990138161518161271ID1627113002
12Blackfoot107727881607840118991194661441ID163189893

A Pandas DataFrame without any spatial information is returned.

Using the from_layer() method

A DataFrame can be created from hosted non-spatial data using the from_layer() method.

tbl_item = agol_gis.content.get("019215fdda4b4b3eb5b4712f3b06f544")
tbl_item
sedf_major_cities_table

Table Layer by api_data_owner
Last Modified: September 30, 2024
0 comments, 3 views
# Get table url
tbl = tbl_item.tables[0]
tbl
<Table url:"https://services7.arcgis.com/JEwYeAy2cc8qOe3o/arcgis/rest/services/sedf_major_cities_table/FeatureServer/0">
import pandas as pd
tbl_df2 = pd.DataFrame.spatial.from_layer(tbl)
tbl_df2.head(2)
   OBJECTID  PLACEFIPS  POP2010  POPULATION  POP_CLASS  STFIPS  CLASS  ObjectId2
0         0    1601990    13816       15181          6      16   city          1
1         1    1607840    11899       11946          6      16   city          2

A Pandas DataFrame without any spatial information is returned.

Read in data from 'lite and portable' databases

Geospatial data stored in a mobile geodatabase (.geodatabase) or a SQLite Database can be easily accessed using the Spatially enabled DataFrame.

  • A mobile geodatabase (.geodatabase) is a collection of various types of GIS datasets contained in a single file on disk that can store, query, and manage spatial and nonspatial data. Mobile geodatabases are stored in an SQLite database.

  • SQLite is a full-featured relational database that is portable and interoperable, which makes it ubiquitous in mobile app development.

The from_featureclass() method can be used to create a SeDF by reading in data from these databases. Let's look at some examples.

Note: The operations below can only be performed in an environment that contains arcpy.

Read from a mobile geodatabase

# Reading from mobile geodatabase
mobile_gdb_df = pd.DataFrame.spatial.from_featureclass(
    location="./sedf_data/cities/cities_mobile.geodatabase/main.cities")
mobile_gdb_df.shape
(3886, 51)
# Check head
mobile_gdb_df.head(2)
OBJECTIDage_10_14age_15_19age_20_24age_25_34age_35_44age_45_54age_55_64age_5_9age_65_74...placefipspop2010populationpop_classrenter_occststfipsvacantwhiteSHAPE
011313105873420311767144611361503665...1601990138161518161271ID1627113002{"x": -12462673.7237, "y": 5384674.994099997, ...
1289081781817991235133011431099721...1607840118991194661441ID163189893{"x": -12506251.314, "y": 5341537.793499999, "...

2 rows × 51 columns

# Check geometry type
mobile_gdb_df.spatial.geometry_type
['point']

The spatial namespace shows that a Spatially enabled DataFrame has been created.

Read from a SQLite database

# Reading from sqlite database
sqlite_df = pd.DataFrame.spatial.from_featureclass(
    location="./sedf_data/cities/cities.sqlite/main.cities")
sqlite_df.shape
(3886, 51)
# Check head
sqlite_df.head(2)
OBJECTIDage_10_14age_15_19age_20_24age_25_34age_35_44age_45_54age_55_64age_5_9age_65_74...placefipspop2010populationpop_classrenter_occststfipsvacantwhiteSHAPE
011313105873420311767144611361503665...1601990138161518161271ID1627113002{"x": -12462673.7237, "y": 5384674.994099997, ...
1289081781817991235133011431099721...1607840118991194661441ID163189893{"x": -12506251.314, "y": 5341537.793499999, "...

2 rows × 51 columns

# Check geometry type
sqlite_df.spatial.geometry_type
['point']

The spatial namespace shows that a Spatially enabled DataFrame has been created.
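
Both examples address the feature class as main.cities inside a single database file. Because these files are SQLite databases, the table names they contain can be inspected with Python's built-in sqlite3 module, without arcpy. The sketch below builds a small stand-in database to show the idea; the file name demo_cities.sqlite and its four-column schema are hypothetical, and the two rows reuse values from the output above.

```python
import os
import sqlite3
import tempfile

# Build a tiny stand-in database (hypothetical file name and schema)
db_path = os.path.join(tempfile.mkdtemp(), "demo_cities.sqlite")
conn = sqlite3.connect(db_path)
conn.execute(
    "CREATE TABLE cities (placefips TEXT, pop2010 INTEGER, population INTEGER, st TEXT)"
)
conn.executemany(
    "INSERT INTO cities VALUES (?, ?, ?, ?)",
    [("1601990", 13816, 15181, "ID"), ("1607840", 11899, 11946, "ID")],
)
conn.commit()

# List the tables stored in the file; 'main' is SQLite's default schema name
table_names = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]
row_count = conn.execute("SELECT COUNT(*) FROM main.cities").fetchone()[0]
conn.close()

print(table_names)  # ['cities']
print(row_count)    # 2
```

This kind of inspection only helps with discovering table names and attribute values. Geometry in these databases is typically stored in a binary encoding, so from_featureclass() remains the way to obtain a SeDF with a usable SHAPE column.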

Conclusion

In this guide, we explored how the Spatially enabled DataFrame (SeDF) can be used to read spatial data from various formats. We started by reading data from web feature layers and using the query() operation to optimize performance and results. We then read data from local sources such as file geodatabases and shapefiles. Next, we explained how data with address or coordinate information, data in a geopandas GeoDataFrame, and data in feather format can be used to create a SeDF. We also discussed reading non-spatial table data. Finally, we showed how a SeDF can be created from lite and portable databases.

In the next part of the guide series, you will learn about exporting data using Spatially enabled DataFrame.

Note: Given the importance and popularity of Spatially enabled DataFrame, we are revisiting our documentation for this topic. Our goal is to enhance the existing documentation to showcase the various capabilities of Spatially enabled DataFrame in detail with even more examples this time.

Creating quality documentation is time-consuming and exhaustive, but we are committed to providing you with the best experience possible. With that in mind, we will be rolling out the revamped guides on this topic as different parts of a guide series (like the Data Engineering or Geometry guide series). This is "part-2" of the guide series for Spatially Enabled DataFrame. You will continue to see the existing documentation as we revamp it to add new parts. Stay tuned for more on this topic.
