Which college district has the fewest low-income families?

A local cable operator in the county ran a pilot program to provide low-cost computers and Internet access to low-income families with kids in high school. The pilot showed a marked improvement in school performance for these kids, and it has brought the company a fair amount of positive publicity and goodwill in the community.

Company officials now want to set up a similar program for community college students. The company provides Internet access to the five community college districts in the county, and officials are aware that the colleges are under a lot of pressure: they face funding cuts at the same time as increased demand for enrollment. To improve the situation, the colleges are turning more and more to distance learning, primarily via the Internet. By providing computers and Internet access, the cable company can enable more low-income students to take advantage of online classes.

This case study uses the ArcGIS API for Python to find the community college district with the fewest low-income families, where the company could set up its program for these students.

We will use the summarize_within tool to get the number of low-income households within each community college district. We will also visualize the result using the map widget.

Connect to your ArcGIS Online organization

We first establish a connection to our organization, which could be an ArcGIS Online organization or an ArcGIS Enterprise deployment. To run the code with the ArcGIS API for Python, we need to provide the credentials of a user within that organization.

from arcgis.gis import GIS
import pandas as pd

Please sign in to your organization to continue executing this notebook.

gis = GIS('home')

Get data for analysis

san_diego_data = gis.content.search('title:CommunityCollege_CensusTracts owner:api_data_owner',
                                    'Feature layer',
                                    outside_org=True)

san_diego_data

[<Item title:"CommunityCollege_CensusTracts" type:Feature Layer Collection owner:api_data_owner>]

from IPython.display import display

for item in san_diego_data:
    display(item)

CommunityCollege_CensusTracts

Feature Layer Collection by api_data_owner
Last Modified: April 11, 2020
0 comments, 89 views

san_diego_item = san_diego_data[0] # get first item from the list of items

for lyr in san_diego_item.layers:
    print(lyr.properties.name)

census_tract_income
Community_College_Dist

Since the item is a Feature Layer Collection, accessing the layers property will give us a list of Feature Layers.

census_tract_income = san_diego_item.layers[0]

community_college_dist = san_diego_item.layers[1]

m1 = gis.map('San Diego')
m1

m1.add_layer(community_college_dist)

m2 = gis.map('San Diego')
m2

m2.add_layer(census_tract_income)

Find the community college district with the fewest low-income families

Convert the layer into a pandas DataFrame to calculate the number of households in each tract with income less than $30,000.

sdf = pd.DataFrame.spatial.from_layer(census_tract_income)

sdf.columns

Index(['FID', 'TRACT', 'INCOME_ALL', 'INCOME_LES', 'INCOME_10K', 'INCOME_15K',
       'INCOME_20K', 'INCOME_25K', 'INCOME_30K', 'INCOME_35K', 'INCOME_40K',
       'INCOME_45K', 'INCOME_50K', 'INCOME_60K', 'INCOME_75K', 'INCOME_100',
       'INCOME_125', 'INCOME_150', 'INCOME_200', 'Shape__Area',
       'Shape__Length', 'SHAPE'],
      dtype='object')

sdf.head()

	FID	TRACT	INCOME_ALL	INCOME_LES	INCOME_10K	INCOME_15K	INCOME_20K	INCOME_25K	INCOME_30K	INCOME_35K	...	INCOME_50K	INCOME_60K	INCOME_75K	INCOME_100	INCOME_125	INCOME_150	INCOME_200	Shape__Area	Shape__Length	SHAPE
0	1	7700	4148	243	205	158	195	229	279	278	...	445	526	370	379	73	127	125	1724049.019531	6919.424522	{"rings": [[[-13051046.6746253, 3866695.333166...
1	2	7800	2510	294	132	180	160	135	250	116	...	280	263	178	107	64	52	9	2889814.199219	11223.567885	{"rings": [[[-13049196.649225, 3869830.7042951...
2	3	7901	2953	240	156	154	191	209	233	168	...	325	393	233	150	49	42	25	1785775.15625	5749.634908	{"rings": [[[-13051806.5792234, 3868598.509832...
3	4	7903	2429	154	163	184	174	171	139	195	...	288	145	310	124	30	43	19	1075470.988281	4651.499315	{"rings": [[[-13050375.5212048, 3868973.977334...
4	5	7904	3157	335	219	187	208	218	199	188	...	304	316	326	162	53	67	19	1318393.753906	4961.527797	{"rings": [[[-13050786.6266337, 3868042.625540...

5 rows × 22 columns

The census tract layer contains the number of households in each of several income categories, such as less than $10,000, $10,000 to $15,000, $15,000 to $20,000, and so on.

The aim of the project is to provide support to families with an annual income less than $30,000.

We will add a field to the census tract dataframe and sum the number of households in each tract with income less than $30,000.

sdf['income_lt_30k'] = sdf['INCOME_LES'] + sdf['INCOME_10K'] + sdf['INCOME_15K'] + sdf['INCOME_20K'] + sdf['INCOME_25K']

sdf.income_lt_30k.head()

0    1030
1     901
2     950
3     846
4    1167
Name: income_lt_30k, dtype: Int32
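As a quick sanity check, the five bracket columns can be summed by hand for the rows shown by sdf.head(). The sketch below is plain Python and does not need a GIS connection; the values are copied from the table above, and the names LOW_INCOME_COLUMNS and sample_rows are introduced here only for illustration.

```python
# Columns covering household income below $30,000, as used in the sum above.
LOW_INCOME_COLUMNS = ["INCOME_LES", "INCOME_10K", "INCOME_15K",
                      "INCOME_20K", "INCOME_25K"]

# First five tracts, copied from the sdf.head() output.
sample_rows = [
    {"TRACT": 7700, "INCOME_LES": 243, "INCOME_10K": 205, "INCOME_15K": 158,
     "INCOME_20K": 195, "INCOME_25K": 229},
    {"TRACT": 7800, "INCOME_LES": 294, "INCOME_10K": 132, "INCOME_15K": 180,
     "INCOME_20K": 160, "INCOME_25K": 135},
    {"TRACT": 7901, "INCOME_LES": 240, "INCOME_10K": 156, "INCOME_15K": 154,
     "INCOME_20K": 191, "INCOME_25K": 209},
    {"TRACT": 7903, "INCOME_LES": 154, "INCOME_10K": 163, "INCOME_15K": 184,
     "INCOME_20K": 174, "INCOME_25K": 171},
    {"TRACT": 7904, "INCOME_LES": 335, "INCOME_10K": 219, "INCOME_15K": 187,
     "INCOME_20K": 208, "INCOME_25K": 218},
]

# Sum the low-income brackets for each tract.
totals = [sum(row[col] for col in LOW_INCOME_COLUMNS) for row in sample_rows]
print(totals)
```

The totals agree with the income_lt_30k values printed above. On the full dataframe, the same sum could also be written as sdf[LOW_INCOME_COLUMNS].sum(axis=1), which avoids repeating the column names.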

sdf.head()

	FID	TRACT	INCOME_ALL	INCOME_LES	INCOME_10K	INCOME_15K	INCOME_20K	INCOME_25K	INCOME_30K	INCOME_35K	...	INCOME_60K	INCOME_75K	INCOME_100	INCOME_125	INCOME_150	INCOME_200	Shape__Area	Shape__Length	SHAPE	income_lt_30k
0	1	7700	4148	243	205	158	195	229	279	278	...	526	370	379	73	127	125	1724049.019531	6919.424522	{"rings": [[[-13051046.6746253, 3866695.333166...	1030
1	2	7800	2510	294	132	180	160	135	250	116	...	263	178	107	64	52	9	2889814.199219	11223.567885	{"rings": [[[-13049196.649225, 3869830.7042951...	901
2	3	7901	2953	240	156	154	191	209	233	168	...	393	233	150	49	42	25	1785775.15625	5749.634908	{"rings": [[[-13051806.5792234, 3868598.509832...	950
3	4	7903	2429	154	163	184	174	171	139	195	...	145	310	124	30	43	19	1075470.988281	4651.499315	{"rings": [[[-13050375.5212048, 3868973.977334...	846
4	5	7904	3157	335	219	187	208	218	199	188	...	316	326	162	53	67	19	1318393.753906	4961.527797	{"rings": [[[-13050786.6266337, 3868042.625540...	1167

5 rows × 23 columns

sdf.shape

(605, 23)

We will import the spatially enabled dataframe back into the GIS and create a feature layer.

census_tract = gis.content.import_data(sdf,
                                       title='CensusTract',
                                       tags='datascience')

census_tract

CensusTract

Feature Layer Collection by arcgis_python
Last Modified: April 18, 2023
0 comments, 0 views

Get the number of low-income households in each district

We will summarize census tracts by community college districts to find the total number of low-income households in each district. If a tract falls in two or more districts, the value for that tract will be split proportionally between the districts (based on the area of the tract in each district).
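To make the proportional split concrete, here is a minimal sketch of area-weighted apportionment in plain Python. It only illustrates the idea described above and is not the tool's actual implementation; the function name apportion_by_area, the district names, and all numbers are hypothetical.

```python
def apportion_by_area(tract_value, tract_area, overlap_areas):
    """Split a tract's value between districts in proportion to overlap area.

    overlap_areas maps a district name to the area of the tract that falls
    inside that district (same units as tract_area).
    """
    return {district: tract_value * area / tract_area
            for district, area in overlap_areas.items()}


# Hypothetical tract: 1,000 low-income households over 2.0 square miles,
# with 1.2 square miles in District A and 0.8 square miles in District B.
shares = apportion_by_area(1000, 2.0, {"District A": 1.2, "District B": 0.8})
print(shares)
```

In this example, 60% of the tract's area lies in District A, so District A is credited with 60% of the tract's households and District B with the remaining 40%. This also explains why the summarized totals later in the notebook are not whole numbers.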

from arcgis.features.summarize_data import summarize_within
from datetime import datetime as dt

tracts_within_boundary = summarize_within(community_college_dist,
                                          census_tract,
                                          summary_fields=["income_lt_ SUM"],
                                          shape_units='SquareMiles',
                                          output_name='TractsWithinBoundary' + str(dt.now().microsecond))

{"cost": 0.61}
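Note that the summary field is named income_lt_ rather than income_lt_30k. The field name appears to have been shortened to ten characters when the dataframe was imported, which would be consistent with the shapefile field-name limit, although that cause is an assumption on our part. The sketch below shows how such an entry could be built in the "FIELD STATISTIC" form used above; the helper summary_field and its max_len default are hypothetical and not part of the ArcGIS API.

```python
def summary_field(field_name, statistic, max_len=10):
    """Build a 'FIELD STATISTIC' entry for summary_fields.

    max_len=10 is an assumed truncation length, chosen to match the
    field name seen in the published layer.
    """
    return f"{field_name[:max_len]} {statistic.upper()}"


entry = summary_field("income_lt_30k", "sum")
print(entry)
```

If the published layer keeps the full field name in your environment, check lyr.properties.fields on the imported layer and use the name reported there instead.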

tracts_within_boundary

TractsWithinBoundary560119

Feature Layer Collection by jyaist_geosaurus
Last Modified: July 27, 2023
0 comments, 0 views

m3 = gis.map('San Diego')
m3

m3.add_layer(tracts_within_boundary)

The map displays the community college districts. Each district now carries the summed number of households with income less than $30,000 per year from the census tracts that fall within it.

tracts_within_boundary_lyr = tracts_within_boundary.layers[0]

sdf = pd.DataFrame.spatial.from_layer(tracts_within_boundary_lyr)

sdf.columns

Index(['OBJECTID_1', 'OBJECTID', 'DISTRICT', 'Shape_Leng', 'sum_income_lt_',
       'sum_Area_SquareMiles', 'Polygon_Count', 'AnalysisArea', 'SHAPE'],
      dtype='object')

sdf.sort_values(['sum_income_lt_'], inplace=True)

sdf.head()

	OBJECTID_1	OBJECTID	DISTRICT	Shape_Leng	sum_income_lt_	sum_Area_SquareMiles	Polygon_Count	AnalysisArea	SHAPE
1	2	3	MIRA COSTA COMMUNITY COLLEGE	529254.242323	28286.961822	179.904967	87	180.057307	{"rings": [[[-13069560.2323, 3941041.6565], [-...
0	1	5	SOUTHWESTERN COMMUNITY COLLEGE	484545.196366	40860.319742	171.085801	111	171.34353	{"rings": [[[-13045570.4191, 3857253.9374], [-...
3	4	1	GROSSMONT-CUYAMACA COMMUNITY COLLEGE	962386.012704	48778.127326	1137.093733	118	1137.329793	{"rings": [[[-13000869.2404, 3890488.4302], [-...
4	5	2	PALOMAR COMMUNITY COLLEGE	1538204.61099	56548.178816	2554.695555	152	2554.78782	{"rings": [[[-13078663.7712, 3962536.5573], [-...
2	3	4	SAN DIEGO COMMUNITY COLLEGE	608636.549263	127840.876015	217.566587	253	217.5859	{"rings": [[[-13039483.6168, 3888486.6244], [-...
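The ranking can also be reproduced without the dataframe. The sketch below copies the sum_income_lt_ values from the table above into a plain dictionary (the variable names are ours), sorts them in ascending order, and prints each district's share of the county total.

```python
# Summed low-income households per district, copied from the table above.
district_totals = {
    "MIRA COSTA COMMUNITY COLLEGE": 28286.961822,
    "SOUTHWESTERN COMMUNITY COLLEGE": 40860.319742,
    "GROSSMONT-CUYAMACA COMMUNITY COLLEGE": 48778.127326,
    "PALOMAR COMMUNITY COLLEGE": 56548.178816,
    "SAN DIEGO COMMUNITY COLLEGE": 127840.876015,
}

# Sort ascending so the district with the fewest households comes first.
ranked = sorted(district_totals.items(), key=lambda item: item[1])
fewest_name, fewest_value = ranked[0]
county_total = sum(district_totals.values())

for name, value in ranked:
    print(f"{name:<40} {value:>10.0f} {value / county_total:6.1%}")
```

Sorting ascending mirrors the sort_values call above, so the first entry of ranked corresponds to the first row of sdf.head().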

Visualize the district with the fewest low-income households

m4 = gis.map('San Diego')
m4

m4.add_layer(tracts_within_boundary, {"renderer":"ClassedSizeRenderer",
                                      "field_name": "sum_income_lt_"})

The Mira Costa district has the fewest low-income households: about 28,300, compared with about 40,900 in Southwestern, the next lowest. That's where the pilot program could be set up.

Conclusion

We have located the community college district with the fewest low-income families. We can assess the success of the project over the next 6 months and then give recommendations on expanding the program to other areas in the county.