Enrich Example

As in Part 1, we are looking at the example of a large retailer evaluating potential sites for a new location. This retailer is interested in using key criteria they know are important based on previous experience to evaluate a few candidates. These criteria include competition, traffic, economic feasibility and market potential for the areas surrounding the potential sites. Utilizing the GeoEnrichment module, the real estate site selection team can include demographic variables such as lifestyle, income, spending and education to understand potential customers in the study areas surrounding the candidate sites.

Although we will go a similar route, in this example all we have to start with are addresses.

from arcgis.geoenrichment import Country, enrich
from arcgis.gis import GIS

gis = GIS(profile="your_online_profile")
country = Country("usa", gis=gis)
country
<Country - United States (GIS @ https://geosaurus.maps.arcgis.com version:10.3)>
import pandas as pd

candidate_df = pd.read_csv("../data/health.csv").loc[
    :, ["Name", "Address", "City", "State", "Zip Code"]
]

candidate_df
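|   | Name | Address | City | State | Zip Code |
|---|------|---------|------|-------|----------|
| 0 | Facility 1 | 2468 SOUTH ST ANDREWS PLACE | LOS ANGELES | CA | 90018 |
| 1 | Facility 2 | 2300 W. WASHINGTON BLVD. | LOS ANGELES | CA | 90018 |
| 2 | Facility 3 | 4060 E. WHITTIER BLVD. | LOS ANGELES | CA | 90023 |
| 3 | Facility 4 | 6070 W. PICO BOULEVARD | LOS ANGELES | CA | 90035 |
| 4 | Facility 5 | 1480 S. LA CIENEGA BL | LOS ANGELES | CA | 90035 |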

Next, we are going to concatenate the address into one column and rename the Name column to loc_id to match more closely with the example from the GeoEnrichment Part 1 notebook.

# create full address string to make geocoding easier
candidate_df["full_address"] = candidate_df.apply(
    lambda r: ", ".join((r["Address"], r["City"], r["State"])) + f' {r["Zip Code"]}',
    axis=1,
)

# filter columns
candidate_df = candidate_df.loc[:, ["Name", "full_address"]].rename(
    columns={"Name": "loc_id"}
)

candidate_df
|   | loc_id | full_address |
|---|--------|--------------|
| 0 | Facility 1 | 2468 SOUTH ST ANDREWS PLACE, LOS ANGELES, CA 9... |
| 1 | Facility 2 | 2300 W. WASHINGTON BLVD., LOS ANGELES, CA 90018 |
| 2 | Facility 3 | 4060 E. WHITTIER BLVD., LOS ANGELES, CA 90023 |
| 3 | Facility 4 | 6070 W. PICO BOULEVARD, LOS ANGELES, CA 90035 |
| 4 | Facility 5 | 1480 S. LA CIENEGA BL, LOS ANGELES, CA 90035 |
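
One thing to watch when building address strings: if the source file stores ZIP codes as integers, codes that begin with zero lose their leading digits. The following is a minimal sketch in plain Python, assuming records shaped like the rows above. `build_full_address` is a hypothetical helper written for illustration and is not part of the ArcGIS API.

```python
def build_full_address(record):
    """Join address parts into one geocodable string, padding the ZIP code to five digits."""
    street_city_state = ", ".join(
        str(record[key]).strip() for key in ("Address", "City", "State")
    )
    # zfill restores leading zeros lost when a ZIP code was parsed as an integer
    zip_code = str(record["Zip Code"]).strip().zfill(5)
    return f"{street_city_state} {zip_code}"


# one record copied from the candidate table, with the ZIP code as an integer
example = {
    "Address": "2300 W. WASHINGTON BLVD.",
    "City": "LOS ANGELES",
    "State": "CA",
    "Zip Code": 90018,
}

full_address = build_full_address(example)
print(full_address)
```

Because a pandas row supports the same key lookup, the function could likely stand in for the lambda above via `candidate_df.apply(build_full_address, axis=1)`, applied before the columns are filtered.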

Enrich Variables

We are going to use the same variables for enrichment as in Part 1.

analysis_variables = [
    "TOTPOP_CY",  # Population: Total Population (Esri)
    "DIVINDX_CY",  # Diversity Index (Esri)
    "AVGHHSZ_CY",  # Average Household Size (Esri)
    "MEDAGE_CY",  # Age: Median Age (Esri)
    "MEDHINC_CY",  # Income: Median Household Income (Esri)
    "BACHDEG_CY",  # Education: Bachelor's Degree (Esri)
]

analysis_variables
['TOTPOP_CY',
 'DIVINDX_CY',
 'AVGHHSZ_CY',
 'MEDAGE_CY',
 'MEDHINC_CY',
 'BACHDEG_CY']

Define Study Areas

The enrich capability in Business Analyst requires polygon areas in order to apportion demographic data to the input geographies. In this case, the inputs are addresses defining store locations. Geocoding can give us the point location of each store, but the enrich method still needs an area around each point to apportion demographic data.
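
Apportionment can be pictured as a weighted sum: each source polygon contributes its value in proportion to how much of it falls inside the study area. The sketch below uses made-up block groups and overlap fractions purely for illustration. It is a simplification, not the weighting the enrichment service actually applies, and `apportion` is a hypothetical helper.

```python
def apportion(block_groups):
    """Estimate a study-area total from (value, fraction_inside) pairs."""
    total = 0.0
    for value, fraction_inside in block_groups:
        if not 0.0 <= fraction_inside <= 1.0:
            raise ValueError("fraction_inside must be between 0 and 1")
        # each block group contributes only the share that overlaps the study area
        total += value * fraction_inside
    return total


# hypothetical block groups: (total population, fraction inside the study area)
hypothetical_block_groups = [
    (1200, 1.0),   # entirely inside the study area
    (800, 0.5),    # half inside
    (2000, 0.25),  # one quarter inside
]

estimated_population = apportion(hypothetical_block_groups)
print(estimated_population)
```

A point or an address on its own has no overlap with anything, which is why an area has to be defined around it first.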

First, we use geocoding to get the geographic location of each store. Since we are working with a Pandas DataFrame, we can take advantage of data manipulation and schema pruning in two ways: first, by concatenating the address components into a single column for geocoding (done above), and second, by reducing the geocoding response to just the columns we need for subsequent analysis steps.

from arcgis.features import GeoAccessor
from arcgis.geocoding import get_geocoders

# ensure using intended geocoder
agol_geocoder = get_geocoders(gis)[0]

# geocode the addresses and prune the returned columns
geocode_df = GeoAccessor.from_df(
    candidate_df, address_column="full_address", geocoder=agol_geocoder
).loc[:, ["loc_id", "full_address", "SHAPE"]]

# following pruning schema, re-enable spatial
geocode_df.spatial.set_geometry("SHAPE")
assert geocode_df.spatial.validate()

geocode_df
|   | loc_id | full_address | SHAPE |
|---|--------|--------------|-------|
| 0 | Facility 1 | 2468 SOUTH ST ANDREWS PLACE, LOS ANGELES, CA 9... | {"x": -118.31127251419741, "y": 34.03313999252... |
| 1 | Facility 2 | 2300 W. WASHINGTON BLVD., LOS ANGELES, CA 90018 | {"x": -118.31183535899584, "y": 34.03988893331... |
| 2 | Facility 3 | 4060 E. WHITTIER BLVD., LOS ANGELES, CA 90023 | {"x": -118.1843180294075, "y": 34.023902464669... |
| 3 | Facility 4 | 6070 W. PICO BOULEVARD, LOS ANGELES, CA 90035 | {"x": -118.37276542483494, "y": 34.05264979417... |
| 4 | Facility 5 | 1480 S. LA CIENEGA BL, LOS ANGELES, CA 90035 | {"x": -118.37613251915946, "y": 34.05099298527... |

As in the example from the first Notebook, study areas can be polygons defined manually beforehand and provided as input. They can also be standard geographic areas defined with the unique identifiers for the areas, such as postal (ZIP) codes. Finally, as is the case with our example, study areas can be provided as lines or points. Since lines and points do not define an area, in these cases, polygons are created on the server to use for apportioning data to each location.

By default, the polygon created around each line or point is a five-kilometer straight-line buffer. This can be controlled using the proximity parameters of the enrich method: proximity_type, proximity_value and proximity_metric. For line geometries, only the straight-line method can be used, but for point geometries, any transportation network method available in the GIS can be used to define the area surrounding the points, thus delineating the study areas.
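
To make the idea of a straight-line buffer concrete, the sketch below approximates one as a ring of vertices at a fixed great-circle distance from a point. It assumes a spherical Earth, and the function names are hypothetical. This guide does not describe how the server builds its buffers, so treat this as an illustration of the geometry rather than the service's method.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius, spherical approximation


def destination(lon, lat, bearing_deg, distance_km):
    """Return the (lon, lat) reached by travelling distance_km along bearing_deg."""
    lat1, lon1 = math.radians(lat), math.radians(lon)
    bearing = math.radians(bearing_deg)
    angular = distance_km / EARTH_RADIUS_KM  # angular distance in radians
    lat2 = math.asin(
        math.sin(lat1) * math.cos(angular)
        + math.cos(lat1) * math.sin(angular) * math.cos(bearing)
    )
    lon2 = lon1 + math.atan2(
        math.sin(bearing) * math.sin(angular) * math.cos(lat1),
        math.cos(angular) - math.sin(lat1) * math.sin(lat2),
    )
    return math.degrees(lon2), math.degrees(lat2)


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in kilometers between two (lon, lat) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def buffer_ring(lon, lat, radius_km=5.0, vertices=36):
    """Approximate a circular buffer as a closed ring of (lon, lat) vertices."""
    ring = [
        destination(lon, lat, i * 360.0 / vertices, radius_km)
        for i in range(vertices)
    ]
    ring.append(ring[0])  # repeat the first vertex to close the ring
    return ring


# rounded coordinates near Facility 1, taken from the geocoding output
center_lon, center_lat = -118.3113, 34.0331
ring = buffer_ring(center_lon, center_lat)
print(len(ring), ring[0])
```

A network-based study area such as a drive time replaces this circle with a shape that follows the road network, so it is usually irregular rather than round.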

Discover Available Travel Modes

In this example, we know customers will travel about eight minutes to visit the store locations. The enrich method can create eight-minute drive time areas around the stores for us, but we need to know how to provide the correct inputs. We can discover available travel modes using the travel_modes property.

country.travel_modes
|   | name | alias | description | type | impedance | impedance_category | time_attribute_name | distance_attribute_name | travel_mode_id | travel_mode_dict |
|---|------|-------|-------------|------|-----------|--------------------|---------------------|-------------------------|----------------|------------------|
| 0 | driving_time | Driving Time | Models the movement of cars and other similar ... | AUTOMOBILE | TravelTime | temporal | TravelTime | Kilometers | FEgifRtFndKNcJMJ | {"attributeParameterValues": [{"attributeName"... |
| 1 | driving_distance | Driving Distance | Models the movement of cars and other similar ... | AUTOMOBILE | Kilometers | distance | TravelTime | Kilometers | iKjmHuBSIqdEfOVr | {"attributeParameterValues": [{"attributeName"... |
| 2 | trucking_time | Trucking Time | Models basic truck travel by preferring design... | TRUCK | TruckTravelTime | temporal | TruckTravelTime | Kilometers | ZzzRtYcPLjXFBKwr | {"attributeParameterValues": [{"attributeName"... |
| 3 | trucking_distance | Trucking Distance | Models basic truck travel by preferring design... | TRUCK | Kilometers | distance | TruckTravelTime | Kilometers | UBaNfFWeKcrRVYIo | {"attributeParameterValues": [{"attributeName"... |
| 4 | walking_time | Walking Time | Follows paths and roads that allow pedestrian ... | WALK | WalkTime | temporal | WalkTime | Kilometers | caFAgoThrvUpkFBW | {"attributeParameterValues": [{"attributeName"... |
| 5 | walking_distance | Walking Distance | Follows paths and roads that allow pedestrian ... | WALK | Kilometers | distance | WalkTime | Kilometers | yFuMFwIYblqKEefX | {"attributeParameterValues": [{"attributeName"... |
| 6 | rural_driving_time | Rural Driving Time | Models the movement of cars and other similar ... | AUTOMOBILE | TravelTime | temporal | TravelTime | Kilometers | NmNhNDUwZmE1YTlj | {"attributeParameterValues": [{"attributeName"... |
| 7 | rural_driving_distance | Rural Driving Distance | Models the movement of cars and other similar ... | AUTOMOBILE | Kilometers | distance | TravelTime | Kilometers | Yzk3NjI1NTU5NjVj | {"attributeParameterValues": [{"attributeName"... |

Any value from the name column can be used directly as the proximity_type input to the enrich method to define the study area. For this example, to define eight-minute drive times, we can populate the proximity parameters accordingly.

enrich_df = country.enrich(
    geocode_df,
    enrich_variables=analysis_variables,
    proximity_type="driving_time",
    proximity_value=8,
    proximity_metric="minutes",
)


enrich_df
|   | loc_id | full_address | source_country | area_type | buffer_units | buffer_units_alias | buffer_radii | aggregation_method | population_to_polygon_size_rating | apportionment_confidence | has_data | medage_cy | totpop_cy | avghhsz_cy | bachdeg_cy | medhinc_cy | divindx_cy | SHAPE |
|---|--------|--------------|----------------|-----------|--------------|--------------------|--------------|--------------------|-----------------------------------|--------------------------|----------|-----------|-----------|------------|------------|------------|------------|-------|
| 0 | Facility 1 | 2468 SOUTH ST ANDREWS PLACE, LOS ANGELES, CA 9... | USA | NetworkServiceArea | Minutes | Drive Time Minutes | 8.0 | BlockApportionment:US.BlockGroups;PointsLayer:... | 2.191 | 2.576 | 1 | 32.8 | 276718.0 | 2.81 | 33723.0 | 48083.0 | 87.8 | {"rings": [[[-118.31409427984764, 34.064380416... |
| 1 | Facility 2 | 2300 W. WASHINGTON BLVD., LOS ANGELES, CA 90018 | USA | NetworkServiceArea | Minutes | Drive Time Minutes | 8.0 | BlockApportionment:US.BlockGroups;PointsLayer:... | 2.191 | 2.576 | 1 | 33.7 | 305454.0 | 2.68 | 43130.0 | 50213.0 | 88.1 | {"rings": [[[-118.31409427984764, 34.072465217... |
| 2 | Facility 3 | 4060 E. WHITTIER BLVD., LOS ANGELES, CA 90023 | USA | NetworkServiceArea | Minutes | Drive Time Minutes | 8.0 | BlockApportionment:US.BlockGroups;PointsLayer:... | 2.191 | 2.576 | 1 | 30.5 | 170309.0 | 3.62 | 9400.0 | 52719.0 | 65.7 | {"rings": [[[-118.16227969122916, 34.070668594... |
| 3 | Facility 4 | 6070 W. PICO BOULEVARD, LOS ANGELES, CA 90035 | USA | NetworkServiceArea | Minutes | Drive Time Minutes | 8.0 | BlockApportionment:US.BlockGroups;PointsLayer:... | 2.191 | 2.576 | 1 | 38.5 | 201739.0 | 2.21 | 54857.0 | 96918.0 | 80.4 | {"rings": [[[-118.36597175035031, 34.088185662... |
| 4 | Facility 5 | 1480 S. LA CIENEGA BL, LOS ANGELES, CA 90035 | USA | NetworkServiceArea | Minutes | Drive Time Minutes | 8.0 | BlockApportionment:US.BlockGroups;PointsLayer:... | 2.191 | 2.576 | 1 | 38.6 | 198064.0 | 2.18 | 54132.0 | 97074.0 | 79.6 | {"rings": [[[-118.37652690642967, 34.088185662... |
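
Time-based and distance-based travel modes call for different units, so it can help to derive the proximity inputs from a travel mode's impedance_category instead of typing them by hand. The sketch below hard-codes a few rows from the travel modes table. `proximity_kwargs` is a hypothetical helper. The "minutes" metric is the one used in this guide, while "kilometers" for distance modes is an assumption to check against the enrich documentation.

```python
# name -> impedance_category, copied from the travel modes table
TRAVEL_MODE_CATEGORIES = {
    "driving_time": "temporal",
    "driving_distance": "distance",
    "walking_time": "temporal",
    "walking_distance": "distance",
}

# assumed unit per category; only "minutes" is confirmed by this guide
DEFAULT_METRIC = {"temporal": "minutes", "distance": "kilometers"}


def proximity_kwargs(mode_name, value):
    """Build the three proximity keyword arguments for a travel mode name."""
    if mode_name not in TRAVEL_MODE_CATEGORIES:
        raise ValueError(f"unknown travel mode: {mode_name!r}")
    if value <= 0:
        raise ValueError("proximity value must be positive")
    category = TRAVEL_MODE_CATEGORIES[mode_name]
    return {
        "proximity_type": mode_name,
        "proximity_value": value,
        "proximity_metric": DEFAULT_METRIC[category],
    }


kwargs = proximity_kwargs("driving_time", 8)
print(kwargs)
```

The resulting dictionary could then be unpacked into the call, for example `country.enrich(geocode_df, enrich_variables=analysis_variables, **kwargs)`.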

The response includes metadata related to how the enrichment was performed. However, if we are only interested in the demographic columns added, we can filter using the available enrich variable names.

# get just the enrich columns
enrich_cols = [
    c for c in enrich_df if c in country.enrich_variables.name.str.lower().values
]

# combine the enrich columns with a few others we want to keep
keep_cols = ["loc_id"] + enrich_cols + ["SHAPE"]

# filter the enrich data frame to just these columns
enrich_df = enrich_df.loc[:, keep_cols].set_index("loc_id")

# re-enable spatial awareness
enrich_df.spatial.set_geometry("SHAPE")

enrich_df
| loc_id | medage_cy | totpop_cy | avghhsz_cy | bachdeg_cy | medhinc_cy | divindx_cy | SHAPE |
|--------|-----------|-----------|------------|------------|------------|------------|-------|
| Facility 1 | 32.8 | 276718.0 | 2.81 | 33723.0 | 48083.0 | 87.8 | {"rings": [[[-118.31409427984764, 34.064380416... |
| Facility 2 | 33.7 | 305454.0 | 2.68 | 43130.0 | 50213.0 | 88.1 | {"rings": [[[-118.31409427984764, 34.072465217... |
| Facility 3 | 30.5 | 170309.0 | 3.62 | 9400.0 | 52719.0 | 65.7 | {"rings": [[[-118.16227969122916, 34.070668594... |
| Facility 4 | 38.5 | 201739.0 | 2.21 | 54857.0 | 96918.0 | 80.4 | {"rings": [[[-118.36597175035031, 34.088185662... |
| Facility 5 | 38.6 | 198064.0 | 2.18 | 54132.0 | 97074.0 | 79.6 | {"rings": [[[-118.37652690642967, 34.088185662... |

Evaluate Results

An extremely effective starting point for analysis is visualizing the results. Here, we are using matplotlib to visualize the differences between the locations based on the enriched data.

# warnings are silenced below because of a deprecation warning inside matplotlib
import warnings

import matplotlib.pyplot as plt

warnings.filterwarnings("ignore")

fig, axs = plt.subplots(2, 3)
fig.set_figheight(10.0)
fig.set_figwidth(18.0)
fig.subplots_adjust(hspace=0.4)

plt.sca(axs[0, 0])
_ = enrich_df.medage_cy.plot(title="Median Age", kind="bar")

plt.sca(axs[0, 1])
_ = enrich_df.totpop_cy.plot(title="Total Population", kind="bar")

plt.sca(axs[0, 2])
_ = enrich_df.avghhsz_cy.plot(title="Average Household Size", kind="bar")

plt.sca(axs[1, 0])
_ = enrich_df.bachdeg_cy.plot(title="Bachelor's Degree", kind="bar")

plt.sca(axs[1, 1])
_ = enrich_df.medhinc_cy.plot(title="Median Household Income", kind="bar")

plt.sca(axs[1, 2])
_ = enrich_df.divindx_cy.plot(title="Diversity Index", kind="bar")
<Figure size 1296x720 with 6 Axes>

As in Part 1:
Facility 1 and Facility 2 have larger populations and are more diverse, with lower incomes. Facility 3 is far younger, with larger households, less education, and lower incomes. Facility 4 and Facility 5 are older, more educated, and have higher incomes.
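
Counts such as bachdeg_cy scale with population, so a rough share can make the locations easier to compare. The sketch below divides the bachelor's degree count by total population, using values copied from the enriched table. This is a coarse approximation: education counts are typically reported for the adult population, which is not among the variables retrieved here, so the shares are only useful for relative comparison.

```python
# (totpop_cy, bachdeg_cy) per location, copied from the enriched table
enriched = {
    "Facility 1": (276718.0, 33723.0),
    "Facility 2": (305454.0, 43130.0),
    "Facility 3": (170309.0, 9400.0),
    "Facility 4": (201739.0, 54857.0),
    "Facility 5": (198064.0, 54132.0),
}

# rough share of residents counted as holding a bachelor's degree
bachelor_share = {
    loc_id: bachdeg / totpop for loc_id, (totpop, bachdeg) in enriched.items()
}

# rank locations from highest to lowest share
ranked = sorted(bachelor_share, key=bachelor_share.get, reverse=True)

for loc_id in ranked:
    print(f"{loc_id}: {bachelor_share[loc_id]:.1%}")
```

On this rough measure, Facility 4 and Facility 5 lead and Facility 3 trails, which is consistent with the summary above.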

For a discount department store, Facility 2 is the most attractive location, with Facility 1 a close second. The diversity and lower incomes in these areas suggest that people will favor lower prices.

For a quick-service restaurant, Facility 3 may be the best option to meet the needs of a young, busy and price-conscious population.

The key demographic indicators will differ depending on the characteristics of the business looking for a new location. Using GeoEnrichment with the ArcGIS API for Python gives quick access to demographic variables for informed decision making.
