Tutorial: Download data

Learn how to automate downloading data from a portal using the ArcGIS API for Python.

In this tutorial, you will use the ArcGIS API for Python to download and extract data from the Los Angeles GeoHub. The data sets are Trailheads (CSV), Trails (GeoJSON), and Parks and Open Space (Shapefile).

The data will be stored locally on your machine.

Prerequisites

The ArcGIS API for Python tutorials use Jupyter Notebooks to execute Python code. If you are new to this environment, please see the guide to install the API and use notebooks locally.

Steps

Import modules and log in

  1. Import the GIS class and create a connection to ArcGIS Online. Also import Path from pathlib and ZipFile from zipfile, both of which are part of the Python standard library. Because the data is public, an anonymous connection to ArcGIS Online is sufficient to download it.

    from arcgis.gis import GIS
    from pathlib import Path
    from zipfile import ZipFile
    
    gis = GIS()
    
    

Access the item by ID

  1. Create a variable to store the ID of the public data item.

    from arcgis.gis import GIS
    from pathlib import Path
    from zipfile import ZipFile
    
    gis = GIS()
    
    public_data_item_id = 'a04933c045714492bda6886f355416f2'
    
    
  2. The content property of a GIS object is an instance of the ContentManager class, which is used to manage content in ArcGIS Online. Its get() method makes an HTTP request to retrieve an Item object.

    gis = GIS()
    
    public_data_item_id = 'a04933c045714492bda6886f355416f2'
    
    # `ContentManager.get` will return `None` if there is no Item with ID `a04933c045714492bda6886f355416f2`
    data_item = gis.content.get(public_data_item_id)
    data_item
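
Because `ContentManager.get` returns `None` when no item matches the ID, it can help to fail early with a clear message before calling any method on the result. The following is a minimal sketch of such a check. The name `require_item` is a hypothetical helper introduced here for illustration; it is not part of the ArcGIS API for Python.

```python
def require_item(item, item_id):
    """Return `item` unchanged, or raise if the lookup returned None.

    `require_item` is a hypothetical helper, not part of the ArcGIS API
    for Python. `item` is whatever `gis.content.get(item_id)` returned.
    """
    if item is None:
        raise LookupError(f"No item found with ID {item_id!r}")
    return item
```

In the notebook, this could wrap the lookup as `data_item = require_item(gis.content.get(public_data_item_id), public_data_item_id)`, so that a mistyped ID raises immediately instead of failing later at `data_item.download(...)`.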
    
    

Download the item

  1. Download LA_Hub_datasets.zip to a data directory under the notebook's current working directory.

    public_data_item_id = 'a04933c045714492bda6886f355416f2'
    
    # `ContentManager.get` will return `None` if there is no Item with ID `a04933c045714492bda6886f355416f2`
    data_item = gis.content.get(public_data_item_id)
    data_item
    
    # configure where to save the data, and where the ZIP file is located
    data_path = Path('./data')
    if not data_path.exists():
        data_path.mkdir()
    zip_path = data_path.joinpath('LA_Hub_Datasets.zip')
    extract_path = data_path.joinpath('LA_Hub_datasets')
    data_item.download(save_path=data_path)
    
    
  2. Use ZipFile to extract the contents of the dataset.

    # `ContentManager.get` will return `None` if there is no Item with ID `a04933c045714492bda6886f355416f2`
    data_item = gis.content.get(public_data_item_id)
    data_item
    
    # configure where to save the data, and where the ZIP file is located
    data_path = Path('./data')
    if not data_path.exists():
        data_path.mkdir()
    zip_path = data_path.joinpath('LA_Hub_Datasets.zip')
    extract_path = data_path.joinpath('LA_Hub_datasets')
    data_item.download(save_path=data_path)
    
    # open the archive in a `with` block so the file handle is closed after extraction
    with ZipFile(zip_path) as zip_file:
        zip_file.extractall(path=data_path)
    
    
  3. Call glob('*') on extract_path to list the contents of the extracted directory.

    from arcgis.gis import GIS
    from pathlib import Path
    from zipfile import ZipFile
    
    gis = GIS()
    
    public_data_item_id = 'a04933c045714492bda6886f355416f2'
    
    # `ContentManager.get` will return `None` if there is no Item with ID `a04933c045714492bda6886f355416f2`
    data_item = gis.content.get(public_data_item_id)
    data_item
    
    # configure where to save the data, and where the ZIP file is located
    data_path = Path('./data')
    if not data_path.exists():
        data_path.mkdir()
    zip_path = data_path.joinpath('LA_Hub_Datasets.zip')
    extract_path = data_path.joinpath('LA_Hub_datasets')
    data_item.download(save_path=data_path)
    
    # open the archive in a `with` block so the file handle is closed after extraction
    with ZipFile(zip_path) as zip_file:
        zip_file.extractall(path=data_path)
    
    files = [file.name for file in extract_path.glob('*')]
    files
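
The download and extraction steps above can also be gathered into one function. The sketch below is one possible arrangement, not the tutorial's own code. It assumes that `download(save_path=...)` returns the path of the saved file, or `None` if there was nothing to download, and it uses that returned path instead of a hard-coded ZIP file name. The function name `download_and_extract` is hypothetical.

```python
from pathlib import Path
from zipfile import ZipFile


def download_and_extract(item, data_dir):
    """Download `item` into `data_dir`, unzip it there, and list the archive members.

    Hypothetical helper. Assumes `item.download(save_path=...)` returns the
    path of the saved file, or None if no data was available.
    """
    data_path = Path(data_dir)
    # create the target directory (and any parents) if it does not exist yet
    data_path.mkdir(parents=True, exist_ok=True)

    downloaded = item.download(save_path=str(data_path))
    if downloaded is None:
        raise RuntimeError("The item has no downloadable data")

    zip_path = Path(downloaded)
    # the `with` block closes the archive once extraction is finished
    with ZipFile(zip_path) as zip_file:
        zip_file.extractall(path=data_path)
        names = sorted(zip_file.namelist())
    return zip_path, names
```

With the `data_item` retrieved earlier, a call such as `zip_path, names = download_and_extract(data_item, './data')` would leave the extracted files under `./data` and return the archive path together with the names of its members.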
