Forward geocoding is the process of taking an address or place information and identifying its location on the globe.
To geocode addresses, the arcgisgeocode package provides the function find_address_candidates(). This function geocodes a single address at a time and returns up to 50 address candidates (ranked by a score).
There are two ways in which you can provide address information:
- Provide the entire address as a string via the single_line argument
- Provide parts of the address using the arguments address, city, region, postal, etc.
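As a minimal sketch of the two input styles, the snippet below builds the same address both ways. The address is the one used in the single line example in this article; the `parts` and `single` names are illustrative only. The geocoding calls need the package and network access, so they are guarded and only run in an interactive session.

```r
# Address components (hypothetical variable name `parts`)
parts <- list(
  address = "380 New York Street",
  city    = "Redlands",
  region  = "California",
  postal  = "92373"
)

# The same location expressed as one string for the single_line argument
single <- paste(parts$address, parts$city, parts$region, parts$postal, sep = ", ")
single
#> [1] "380 New York Street, Redlands, California, 92373"

# Guarded: requires arcgisgeocode and a connection to the geocoding service
if (interactive() && requireNamespace("arcgisgeocode", quietly = TRUE)) {
  by_parts <- arcgisgeocode::find_address_candidates(
    address = parts$address,
    city = parts$city,
    region = parts$region,
    postal = parts$postal,
    max_locations = 1L
  )

  by_line <- arcgisgeocode::find_address_candidates(
    single_line = single,
    max_locations = 1L
  )
}
```

Supplying components can help when the data already stores them in separate columns; otherwise the single string is usually less work.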
Single line address geocoding
It can be tough to parse addresses into their components. Using the single_line argument is a very flexible way of geocoding addresses. Doing so utilizes the ArcGIS World Geocoder’s address parsing capabilities.
For example, we can geocode the same location using 3 decreasingly specific addresses.
library(arcgisgeocode)

addresses <- c(
  "380 New York Street Redlands, California, 92373, USA",
  "Esri Redlands",
  "ESRI CA"
)

locs <- find_address_candidates(
  addresses,
  max_locations = 1L
)

locs$geometry
#> Geometry set for 3 features
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: -117.1948 ymin: 34.05726 xmax: -117.1948 ymax: 34.05726
#> Geodetic CRS: WGS 84
#> POINT (-117.1948 34.05726)
#> POINT (-117.1957 34.05609)
#> POINT (-117.1957 34.05609)
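In each case, it finds the correct location, although the two less specific queries resolve to a slightly different point than the full street address.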
Geocoding from a dataframe
Most commonly, you will need to geocode addresses from a column in a data.frame. It is important to note that the find_address_candidates() function does not work well inside a dplyr::mutate() call, because it can return more than one candidate per input address while mutate() expects exactly one result per row.
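The size mismatch can be reproduced offline. The sketch below uses a hypothetical stand-in, `fake_candidates()`, which is not part of arcgisgeocode; it only mimics a function that may return more rows than it received. It assumes dplyr 1.1.0 or later, where reframe() is available.

```r
library(dplyr)

# Hypothetical stand-in for a geocoder: inputs containing "between" are treated
# as ambiguous and yield two candidate rows, every other input yields one.
fake_candidates <- function(x) {
  n <- ifelse(grepl("between", x), 2L, 1L)
  data.frame(
    input_id = rep(seq_along(x), times = n),
    match_addr = rep(toupper(x), times = n)
  )
}

stores <- data.frame(
  original_address = c(
    "3010 6th Ave, Tacoma, WA 98406",
    "between 3206 N. 15th and 1414, N Alder St, Tacoma, WA 98406",
    "814 6th Ave, Tacoma, WA 98405"
  )
)

# mutate() needs one result per row: four candidate rows for three inputs fails
res_mutate <- tryCatch(
  mutate(stores, fake_candidates(original_address)),
  error = function(e) e
)
inherits(res_mutate, "error")
#> [1] TRUE

# reframe() places no constraint on the number of rows returned
res_reframe <- reframe(stores, fake_candidates(original_address))
nrow(res_reframe)
#> [1] 4
```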
Let’s read in a CSV of bike stores in Tacoma, WA. To use find_address_candidates() with a data.frame, it is recommended to create a unique identifier of the row positions so that results can be matched back to the rows they came from.
library(dplyr)

fp <- "https://www.arcgis.com/sharing/rest/content/items/9a9b91179ac44db1b689b42017471ae6/data"

bike_stores <- readr::read_csv(fp) |>
  mutate(id = row_number())

bike_stores
#> # A tibble: 10 × 3
#>    store_name                           original_address                                               id
#>    <chr>                                <chr>                                                       <int>
#>  1 Cascadia Wheel Co.                   3320 N Proctor St, Tacoma, WA 98407                             1
#>  2 Puget Sound Bike and Ski Shop        between 3206 N. 15th and 1414, N Alder St, Tacoma, WA 98406     2
#>  3 Takoma Bike & Ski                    3010 6th Ave, Tacoma, WA 98406                                  3
#>  4 Trek Bicycle Tacoma University Place 3550 Market Pl W Suite 102, University Place, WA 98466          4
#>  5 Opalescent Cyclery                   814 6th Ave, Tacoma, WA 98405                                   5
#>  6 Sound Bikes                          108 W Main, Puyallup, WA 98371                                  6
#>  7 Trek Bicycle Tacoma North End        3009 McCarver St, Tacoma, WA 98403                              7
#>  8 Second Cycle                         1205 M.L.K. Jr Way, Tacoma, WA 98405                            8
#>  9 Penny bike co.                       6419 24th St NE, Tacoma, WA 98422                               9
#> 10 Spider's Bike, Ski & Tennis Lab      3608 Grandview St, Gig Harbor, WA 98335                        10
To geocode addresses from a data.frame, you can use dplyr::reframe().
bike_stores |>
  reframe(
    find_address_candidates(original_address)
  )
#> # A tibble: 13 × 62
#> input_id result_id loc_name status score match_addr long_label short_label addr_type type_field place_name
#> <int> <int> <chr> <chr> <dbl> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 1 NA World M 100 3320 N Proctor St, … 3320 N Pr… 3320 N Pro… PointAdd… <NA> <NA>
#> 2 2 NA World M 97.6 N 15th St & N Alder… N 15th St… N 15th St … StreetInt <NA> <NA>
#> 3 2 NA World M 97.3 1414 N Alder St, Ta… 1414 N Al… 1414 N Ald… PointAdd… <NA> <NA>
#> 4 2 NA World M 94.7 S 15th St & S Alder… S 15th St… S 15th St … StreetInt <NA> <NA>
#> 5 2 NA World M 84.4 3206 N 15th St, Tac… 3206 N 15… 3206 N 15t… PointAdd… <NA> <NA>
#> 6 3 NA World M 100 3010 6th Ave, Tacom… 3010 6th … 3010 6th A… PointAdd… <NA> <NA>
#> 7 4 NA World M 100 3550 Market Pl W, S… 3550 Mark… 3550 Marke… Subaddre… <NA> <NA>
#> 8 5 NA World M 100 814 6th Ave, Tacoma… 814 6th A… 814 6th Ave PointAdd… <NA> <NA>
#> 9 6 NA World M 100 108 W Main, Puyallu… 108 W Mai… 108 W Main PointAdd… <NA> <NA>
#> 10 7 NA World M 100 3009 McCarver St, T… 3009 McCa… 3009 McCar… PointAdd… <NA> <NA>
#> 11 8 NA World M 100 1205 Martin Luther … 1205 Mart… 1205 Marti… PointAdd… <NA> <NA>
#> 12 9 NA World M 97.9 6419 24th St NE, Ta… 6419 24th… 6419 24th … PointAdd… <NA> <NA>
#> 13 10 NA World M 100 3608 Grandview St, … 3608 Gran… 3608 Grand… PointAdd… <NA> <NA>
#> # ℹ 51 more variables: place_addr <chr>, phone <chr>, url <chr>, rank <dbl>, add_bldg <chr>, add_num <chr>,
#> # add_num_from <chr>, add_num_to <chr>, add_range <chr>, side <chr>, st_pre_dir <chr>, st_pre_type <chr>,
#> # st_name <chr>, st_type <chr>, st_dir <chr>, bldg_type <chr>, bldg_name <chr>, level_type <chr>, level_name <chr>,
#> # unit_type <chr>, unit_name <chr>, sub_addr <chr>, st_addr <chr>, block <chr>, sector <chr>, nbrhd <chr>,
#> # district <chr>, city <chr>, metro_area <chr>, subregion <chr>, region <chr>, region_abbr <chr>, territory <chr>,
#> # zone <chr>, postal <chr>, postal_ext <chr>, country <chr>, cntry_name <chr>, lang_code <chr>, distance <dbl>,
#> # x <dbl>, y <dbl>, display_x <dbl>, display_y <dbl>, xmin <dbl>, xmax <dbl>, ymin <dbl>, ymax <dbl>, …
Notice that an input_id can have multiple results: here, input_id 2 appears four times. This is because the max_locations argument was not specified. To ensure only the best match is returned, set max_locations = 1.
geocoded <- bike_stores |>
  reframe(
    find_address_candidates(original_address, max_locations = 1)
  ) |>
  # reframe drops the sf class, so it must be added back
  sf::st_as_sf()

geocoded
#> Simple feature collection with 10 features and 61 fields
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: -122.5871 ymin: 47.19164 xmax: -122.294 ymax: 47.32301
#> Geodetic CRS: WGS 84
#> # A tibble: 10 × 62
#> input_id result_id loc_name status score match_addr long_label short_label addr_type type_field place_name
#> <int> <int> <chr> <chr> <dbl> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 1 NA World M 100 3320 N Proctor St, … 3320 N Pr… 3320 N Pro… PointAdd… <NA> <NA>
#> 2 2 NA World M 97.6 N 15th St & N Alder… N 15th St… N 15th St … StreetInt <NA> <NA>
#> 3 3 NA World M 100 3010 6th Ave, Tacom… 3010 6th … 3010 6th A… PointAdd… <NA> <NA>
#> 4 4 NA World M 100 3550 Market Pl W, S… 3550 Mark… 3550 Marke… Subaddre… <NA> <NA>
#> 5 5 NA World M 100 814 6th Ave, Tacoma… 814 6th A… 814 6th Ave PointAdd… <NA> <NA>
#> 6 6 NA World M 100 108 W Main, Puyallu… 108 W Mai… 108 W Main PointAdd… <NA> <NA>
#> 7 7 NA World M 100 3009 McCarver St, T… 3009 McCa… 3009 McCar… PointAdd… <NA> <NA>
#> 8 8 NA World M 100 1205 Martin Luther … 1205 Mart… 1205 Marti… PointAdd… <NA> <NA>
#> 9 9 NA World M 97.9 6419 24th St NE, Ta… 6419 24th… 6419 24th … PointAdd… <NA> <NA>
#> 10 10 NA World M 100 3608 Grandview St, … 3608 Gran… 3608 Grand… PointAdd… <NA> <NA>
#> # ℹ 51 more variables: place_addr <chr>, phone <chr>, url <chr>, rank <dbl>, add_bldg <chr>, add_num <chr>,
#> # add_num_from <chr>, add_num_to <chr>, add_range <chr>, side <chr>, st_pre_dir <chr>, st_pre_type <chr>,
#> # st_name <chr>, st_type <chr>, st_dir <chr>, bldg_type <chr>, bldg_name <chr>, level_type <chr>, level_name <chr>,
#> # unit_type <chr>, unit_name <chr>, sub_addr <chr>, st_addr <chr>, block <chr>, sector <chr>, nbrhd <chr>,
#> # district <chr>, city <chr>, metro_area <chr>, subregion <chr>, region <chr>, region_abbr <chr>, territory <chr>,
#> # zone <chr>, postal <chr>, postal_ext <chr>, country <chr>, cntry_name <chr>, lang_code <chr>, distance <dbl>,
#> # x <dbl>, y <dbl>, display_x <dbl>, display_y <dbl>, xmin <dbl>, xmax <dbl>, ymin <dbl>, ymax <dbl>, …
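If all candidates have already been retrieved, the best match per input can also be selected afterwards. The sketch below is one possible approach, not the package's own method: it uses a small hand-built table (the `candidates` and `best` names are illustrative) whose input_id and score values are copied from the first six rows of the 13-row result shown earlier, and keeps the highest score for each input_id.

```r
library(dplyr)

# input_id and score copied from the first six rows of the 13-row result
candidates <- data.frame(
  input_id = c(1L, 2L, 2L, 2L, 2L, 3L),
  score = c(100, 97.6, 97.3, 94.7, 84.4, 100)
)

# Keep the single highest-scoring candidate for each input address
best <- candidates |>
  group_by(input_id) |>
  slice_max(score, n = 1, with_ties = FALSE) |>
  ungroup()

best$input_id
#> [1] 1 2 3
```

Setting max_locations = 1 is still preferable when only the top match is needed, since fewer candidates have to be returned in the first place. Either way, geocoded ends up with one row per input_id.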
With this result, you can now join the address fields back onto the bike_stores data.frame using a left_join().
left_join(
  bike_stores,
  geocoded,
  by = c("id" = "input_id")
) |>
  # left_join keeps the class of the first table
  # must add sf class back on
  sf::st_as_sf()
#> Simple feature collection with 10 features and 63 fields
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: -122.5871 ymin: 47.19164 xmax: -122.294 ymax: 47.32301
#> Geodetic CRS: WGS 84
#> # A tibble: 10 × 64
#> store_name original_address id result_id loc_name status score match_addr long_label short_label addr_type
#> <chr> <chr> <int> <int> <chr> <chr> <dbl> <chr> <chr> <chr> <chr>
#> 1 Cascadia Wheel C… 3320 N Proctor … 1 NA World M 100 3320 N Pr… 3320 N Pr… 3320 N Pro… PointAdd…
#> 2 Puget Sound Bike… between 3206 N.… 2 NA World M 97.6 N 15th St… N 15th St… N 15th St … StreetInt
#> 3 Takoma Bike & Ski 3010 6th Ave, T… 3 NA World M 100 3010 6th … 3010 6th … 3010 6th A… PointAdd…
#> 4 Trek Bicycle Tac… 3550 Market Pl … 4 NA World M 100 3550 Mark… 3550 Mark… 3550 Marke… Subaddre…
#> 5 Opalescent Cycle… 814 6th Ave, Ta… 5 NA World M 100 814 6th A… 814 6th A… 814 6th Ave PointAdd…
#> 6 Sound Bikes 108 W Main, Puy… 6 NA World M 100 108 W Mai… 108 W Mai… 108 W Main PointAdd…
#> 7 Trek Bicycle Tac… 3009 McCarver S… 7 NA World M 100 3009 McCa… 3009 McCa… 3009 McCar… PointAdd…
#> 8 Second Cycle 1205 M.L.K. Jr … 8 NA World M 100 1205 Mart… 1205 Mart… 1205 Marti… PointAdd…
#> 9 Penny bike co. 6419 24th St NE… 9 NA World M 97.9 6419 24th… 6419 24th… 6419 24th … PointAdd…
#> 10 Spider's Bike, S… 3608 Grandview … 10 NA World M 100 3608 Gran… 3608 Gran… 3608 Grand… PointAdd…
#> # ℹ 53 more variables: type_field <chr>, place_name <chr>, place_addr <chr>, phone <chr>, url <chr>, rank <dbl>,
#> # add_bldg <chr>, add_num <chr>, add_num_from <chr>, add_num_to <chr>, add_range <chr>, side <chr>,
#> # st_pre_dir <chr>, st_pre_type <chr>, st_name <chr>, st_type <chr>, st_dir <chr>, bldg_type <chr>, bldg_name <chr>,
#> # level_type <chr>, level_name <chr>, unit_type <chr>, unit_name <chr>, sub_addr <chr>, st_addr <chr>, block <chr>,
#> # sector <chr>, nbrhd <chr>, district <chr>, city <chr>, metro_area <chr>, subregion <chr>, region <chr>,
#> # region_abbr <chr>, territory <chr>, zone <chr>, postal <chr>, postal_ext <chr>, country <chr>, cntry_name <chr>,
#> # lang_code <chr>, distance <dbl>, x <dbl>, y <dbl>, display_x <dbl>, display_y <dbl>, xmin <dbl>, xmax <dbl>, …
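Once the results are joined, the score column can be used to pick out matches that may deserve a manual check. The sketch below rebuilds only the id and score columns from the output above in a hand-made table (the `joined_scores` and `needs_review` names are illustrative), and the cutoff of 100 is an arbitrary assumption rather than a recommended threshold.

```r
library(dplyr)

# id and score copied from the joined output above
joined_scores <- data.frame(
  id = 1:10,
  score = c(100, 97.6, 100, 100, 100, 100, 100, 100, 97.9, 100)
)

# Assumed cutoff: anything below a perfect score is flagged for review
needs_review <- joined_scores |>
  filter(score < 100)

needs_review$id
#> [1] 2 9
```

Here the flagged rows are the store whose address was given as "between" two locations and the one matched with a score of 97.9.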