- 👟 Ready To Run!
- 📦 Content Management
- 🗃️ Administration
Requirements
- 🔒 Administrator Privileges
Some organizations require specific background and descriptive information on data items before they'll consider them valid data holdings. This background and descriptive information is known as metadata. An item's metadata can record whatever information is important for the organization to know about that item. In addition to descriptive information, this might include information about how accurate and recent the item is, restrictions associated with using and sharing the item, and important processes in its life cycle.
Each organization can define the metadata attributes necessary for the item to be considered valid. In addition, an organization may rely on specific metadata standards and styles to help identify the information it needs to know about geospatial and relevant nonspatial resources and how to store and present that information. For more details and approaches for storing metadata, see the Enterprise Metadata documentation. Note that while item metadata is similar in concept to conventional metadata (information that describes and explains data), an item must follow certain standards and specifications, in the form of metadata properties, to be regarded as a valid ArcGIS item.
This notebook demonstrates one potential method to inspect items to ensure they contain certain default Item Description metadata properties an organization has deemed necessary. The notebook outputs a csv file with a value of False for each property an Item does not have, True for those it does, plus some additional item attributes.
Import the necessary libraries and connect to the GIS
import os
import datetime as dt
import pandas as pd
from arcgis.gis import GIS, ItemTypeEnum, ItemProperties
gis = GIS("home")
When you add an Item to your Organization, certain metadata properties are required, including an item title and tags. The item type is also required, and with that type a set of typeKeywords is automatically added to the item. No matter how you add items to the Organization, these metadata properties are present.
Let's specify an additional list of properties that our organization will require to describe items in our Organization. We'll create a list of strings to make sure items have a description, a thumbnail (other than the default), and a snippet.
Define the Organization's valid Metadata Profile
item_profile = ['description', 'thumbnail', 'snippet']
Next, we'll define a function that loops through our item profile list, and inspects the value for each profile attribute for the items each user in our Organization owns. For each thumbnail, we'll check to see whether the default thumbnail has been changed.
We'll then create a list of True/False values for each item:
- True if it has the property or has added a thumbnail
- False if the property is missing or the item uses the default thumbnail.
We'll then append the item id and url (if present) to this True/False list for later use to create an informative file.
Define a function to inspect the metadata of an item
def get_missing_item_attrs(portal_item):
    """Returns a list of True/False values for specific
    properties as well as the item id and url (if
    applicable) for each item in the portal.
    """
    non_compliance = []
    for attr in item_profile:
        if attr == 'thumbnail':
            # A thumbnail only counts if it is set and is not the default one
            if getattr(portal_item, attr) is not None:
                if 'ago_downloaded' in getattr(portal_item, attr):
                    non_compliance.append(False)
                else:
                    non_compliance.append(True)
            else:
                non_compliance.append(False)
        else:
            # Any other property counts as present unless its value is None
            if getattr(portal_item, attr) is None:
                non_compliance.append(False)
            else:
                non_compliance.append(True)
    # Append identifying information after the True/False values
    non_compliance.append(portal_item.id)
    non_compliance.append(portal_item.url)
    return non_compliance
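To see how these rules behave without connecting to a portal, here is a minimal sketch that applies the same checks to stand-in objects. The `check_item` helper, the `SimpleNamespace` objects, and the thumbnail paths are hypothetical and exist only for illustration; the `'ago_downloaded'` marker for the default thumbnail is taken from the function above.

```python
from types import SimpleNamespace

item_profile = ['description', 'thumbnail', 'snippet']

def check_item(item, profile=item_profile):
    """Return one True/False value per profile attribute."""
    status = []
    for attr in profile:
        value = getattr(item, attr, None)
        if attr == 'thumbnail':
            # A missing thumbnail and the default thumbnail both count as False.
            status.append(value is not None and 'ago_downloaded' not in value)
        else:
            status.append(value is not None)
    return status

# Hypothetical stand-ins for portal items; these are not arcgis Item objects.
complete = SimpleNamespace(description="Stream buffers",
                           thumbnail="thumbnail/custom.png",
                           snippet="Buffered stream overlay")
sparse = SimpleNamespace(description=None,
                         thumbnail="thumbnail/ago_downloaded.png",
                         snippet="Test app")

print(check_item(complete))  # [True, True, True]
print(check_item(sparse))    # [False, False, True]
```

Because the check only tests for `None`, an empty string would still count as present. An organization that wants to reject empty values could test the value's truthiness instead.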
Create a Data Structure for each item's metadata status
Now we'll use a Python dictionary to create a data structure so we can inspect each item. We'll create a list of users in the GIS. While looping over the list of users, we'll examine each folder the user owns for items and call the function we defined above on each item to create a list of the status for each metadata attribute we're interested in.
We'll then use the list for each item to populate a dictionary. Each key will be a unique name for each item. Since item titles in an Organization can be identical, we'll use string slicing and concatenation to combine the first 50 characters of the title with the item's creation time into a name that identifies each item. Each value will be a list with the True/False attributes regarding the metadata plus the item id and url.
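The key construction can be sketched in isolation. The `make_item_key` helper and the sample timestamps below are hypothetical; the sketch assumes, as the notebook's division by 1000 implies, that an item's `created` value is a Unix timestamp in milliseconds.

```python
def make_item_key(title, created_ms):
    """Combine a truncated title with the creation time in whole seconds."""
    return title[:50] + '_' + str(int(created_ms / 1000))

# Illustrative values: two items that share a title but were created seconds apart.
key_a = make_item_key("StreamOverlay178515_Buffer", 1615887205123)
key_b = make_item_key("StreamOverlay178515_Buffer", 1615887208456)

print(key_a)
print(key_b)
print(key_a != key_b)
```

Two items with the same title created within the same second would still produce the same key and overwrite each other in the dictionary. If that is a concern, appending the item id instead of the timestamp is one way to avoid it.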
In addition to this dictionary, the cell below prints information on each user, each folder the user owns, and number of items in each folder.
item_profile_status = {}
for user in gis.users.search():
    print(f"{user.username.upper()}\n{'-'*50}")
    print(f"\tRoot Folder: {user.username.lower()}\n\t{'='*25}")
    if user.items():
        print(f"\t\t- {len(user.items())} items")
        for item in user.items():
            missing_item_atts = get_missing_item_attrs(item)
            item_profile_status[item.title[:50] + '_' +
                                str(int(item.created/1000))] = missing_item_atts
    else:
        print(f"\t\t- {len(user.items())} items")
    if user.folders:
        for folder in user.folders:
            if user.items(folder=folder):
                print(f"\t{folder['title']}\n\t{'='*25}")
                print(f"\t\t- {len(user.items(folder=folder))} items")
                for item in user.items(folder=folder):
                    missing_item_atts = get_missing_item_attrs(item)
                    item_profile_status[item.title[:50] + '_' +
                                        str(int(item.created/1000))] = missing_item_atts
            else:
                print(f"\t{folder['title'].capitalize()}\n\t{'='*25}")
                print(f"\t\t-0 items")
    print("\n")
Create a Pandas Dataframe for writing out to a csv file
Let's first inspect the first five elements from the dictionary of data items:
list(item_profile_status.items())[:5]
[('StreamOverlay178515_Buffer_1615887205', [True, True, True, '8ace59c5a8be401bbddaccfae0a39305', '']), ('StreamOverlay178515_Buffer_1615887208', [True, True, True, 'bc6a732940e84e67a07b4dc299e0f5cf', 'https://services7.arcgis.com/JEwYeAy2cc8qOe3o/arcgis/rest/services/StreamOverlay178515_Buffer/FeatureServer']), ('test1_1615944782', [False, False, True, '7c50884101b14a5986a419ed756a629c', 'https://geosaurus.maps.arcgis.com/apps/webappviewer/index.html?id=7c50884101b14a5986a419ed756a629c']), ('test1_1615944783', [False, False, False, '8a9aec45b9474b3eb61d940c0712f15f', '//geosaurus.maps.arcgis.com/sharing/rest/content/items/7c50884101b14a5986a419ed756a629c/package']), ('澜沧江流域2010土地利用_1615962321', [False, False, False, 'f83e5fe270a84d29abe9ce77c110e02f', None])]
Now we'll create a list based upon our original item profile list. We'll add two members to the list corresponding to the item id and url values we recorded for each item.
new_item_profile = item_profile + ['itemID', 'url']
new_item_profile
['description', 'thumbnail', 'snippet', 'itemID', 'url']
Next, we'll create the dataframe, using the new list as the index, and then transpose it so that each item becomes a row:
pd.set_option('display.max_colwidth', 175) # for display of lengthy text values
item_profile_df = pd.DataFrame(data=item_profile_status,
index=new_item_profile).T
item_profile_df.head()
| | description | thumbnail | snippet | itemID | url |
|---|---|---|---|---|---|
| StreamOverlay178515_Buffer_1615887205 | True | True | True | 8ace59c5a8be401bbddaccfae0a39305 | |
| StreamOverlay178515_Buffer_1615887208 | True | True | True | bc6a732940e84e67a07b4dc299e0f5cf | https://services7.arcgis.com/JEwYeAy2cc8qOe3o/arcgis/rest/services/StreamOverlay178515_Buffer/FeatureServer |
| test1_1615944782 | False | False | True | 7c50884101b14a5986a419ed756a629c | https://geosaurus.maps.arcgis.com/apps/webappviewer/index.html?id=7c50884101b14a5986a419ed756a629c |
| test1_1615944783 | False | False | False | 8a9aec45b9474b3eb61d940c0712f15f | //geosaurus.maps.arcgis.com/sharing/rest/content/items/7c50884101b14a5986a419ed756a629c/package |
| 澜沧江流域2010土地利用_1615962321 | False | False | False | f83e5fe270a84d29abe9ce77c110e02f | None |
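The effect of the `index` argument and the transpose is easier to see on a small dictionary. The following is a sketch with made-up item names and ids in the same shape as `item_profile_status`. The final filter is an illustrative addition, not part of the original workflow.

```python
import pandas as pd

profile = ['description', 'thumbnail', 'snippet']
columns = profile + ['itemID', 'url']

# Made-up status lists shaped like the values of item_profile_status.
status = {
    'roads_1600000000': [True, True, True, 'id-001', None],
    'test_1600000001': [False, False, True, 'id-002', None],
}

# Each dictionary key becomes a column; the index labels the five list positions.
wide = pd.DataFrame(data=status, index=columns)
# Transposing turns every item into a row and every attribute into a column.
df = wide.T

# Rows with at least one False among the profile columns.
incomplete = df[~df[profile].astype(bool).all(axis=1)]

print(df.shape)
print(incomplete.index.tolist())
```

The cast to `bool` is there because the transposed frame mixes booleans and strings, so its columns have the `object` dtype.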
Write the dataframe to a csv file and add it as an item
We'll add a timestamp to the output file to ensure uniqueness when adding the csv item to the Organization.
output_dir = "/arcgis/home/"
out_file = "org_item_profile_status_" + \
           str(int(dt.datetime.now().timestamp())) + \
           ".csv"
out_path = os.path.join(output_dir, out_file)

item_profile_df.to_csv(out_path, index_label='item_name')

root_folder = gis.content.folders.get()
# Pass the item properties as keyword arguments
new_item_props = ItemProperties(
    title=out_file,
    item_type=ItemTypeEnum.CSV.value,
    tags="item_metadata_report",
    snippet="Report on item attributes from API"
)
root_folder.add(
    item_properties=new_item_props,
    file=out_path
).result()
You may download this item if you wish. If you decide to delete it after use, run the code below after setting item_id to the id of this file in your organization.
item = gis.content.get(item_id)
item.delete()
Conclusion
This notebook checked attribute values for an organization's items against a pre-defined list of properties for item metadata, and based upon those values recorded the status of each metadata property. It combined these values with the id and url for any service backing the item (if applicable) and then wrote the results to a csv file that was added to the Organization. This file can then be analyzed to message item owners to update the metadata for items to comply with organizational requirements.
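As one possible starting point for that analysis, the sketch below reads a report in the layout written above and collects the missing properties for each item id. The sample rows are made up and held in memory for illustration; in practice you would open the saved csv file instead.

```python
import csv
import io

# Made-up sample rows in the report's layout.
report = io.StringIO(
    "item_name,description,thumbnail,snippet,itemID,url\n"
    "roads_1600000000,True,True,True,id-001,\n"
    "test_1600000001,False,False,True,id-002,\n"
)

profile = ['description', 'thumbnail', 'snippet']
missing_by_item = {}
for row in csv.DictReader(report):
    # Booleans are stored in the csv as the strings "True" and "False".
    missing = [attr for attr in profile if row[attr] != "True"]
    if missing:
        missing_by_item[row['itemID']] = missing

print(missing_by_item)  # {'id-002': ['description', 'thumbnail']}
```

Each item id can then be passed to `gis.content.get` to retrieve the item, and the item's `owner` attribute names the user to contact.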