Content Management: Validate item metadata

👟 Ready To Run!
📦 Content Management
🗃️ Administration

Requirements

🔒 Administrator Privileges

Some organizations require specific background and descriptive information on data items before they'll consider them valid data holdings. This background and descriptive information is known as metadata. An item's metadata can record whatever information is important for the organization to know about that item. In addition to descriptive information, this might include how accurate and recent the item is, restrictions on using and sharing the item, and important processes in its life cycle.

Each organization can define the metadata attributes necessary for an item to be considered valid. In addition, an organization may rely on specific metadata standards and styles to help identify the information it needs to know about geospatial and relevant nonspatial resources, and how to store and present that information. For more details and approaches for storing metadata, see the Enterprise Metadata documentation. Note that while item metadata is similar in concept to conventional metadata (information that describes and explains data), an item must also follow certain standards and specifications, expressed as metadata properties, to be regarded as a valid ArcGIS item.

This notebook demonstrates one potential method to inspect items to ensure they contain certain default Item Description metadata properties an organization has deemed necessary. The notebook outputs a csv file with a value of False for each property an Item does not have, True for those it does, plus some additional item attributes.

Import the necessary libraries and connect to the GIS

import os
import datetime as dt

import pandas as pd

from arcgis.gis import GIS, ItemTypeEnum, ItemProperties

gis = GIS("home")

When you add an Item to your Organization, certain metadata properties are required, including an item title and tags. The item type is also required, and with that type a set of typeKeywords is automatically added to the item. No matter how you add items to the Organization, these metadata properties are present.

Let's specify an additional list of properties that our organization will require to describe items in our Organization. We'll create a list of strings to make sure items have a description, a thumbnail (other than the default), and a snippet.

Define the Organization's valid Metadata Profile

item_profile = ['description', 'thumbnail', 'snippet']

Next, we'll define a function that loops through our item profile list, and inspects the value for each profile attribute for the items each user in our Organization owns. For each thumbnail, we'll check to see whether the default thumbnail has been changed.

We'll then create a list of True/False values for each item:

True if the item has the property, or has a thumbnail other than the default
False if the property is missing or the item uses the default thumbnail

We'll then append the item id and url (if present) to this True/False list for later use to create an informative file.

Define a function to inspect the metadata of an item

def get_missing_item_attrs(portal_item):
    """Returns a list of True/False values for specific
    properties as well as the item id and url (if
    applicable) for each item in the portal.
    """
    non_compliance = []
    for attr in item_profile:
        value = getattr(portal_item, attr)
        if attr == 'thumbnail':
            # A thumbnail counts only if it exists and is not the default
            non_compliance.append(
                value is not None and 'ago_downloaded' not in value
            )
        else:
            # Any other property counts as present when it is not None
            non_compliance.append(value is not None)
    # Append the item id and url for use in the report
    non_compliance.append(portal_item.id)
    non_compliance.append(portal_item.url)
    return non_compliance
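
The function above treats a property as present whenever its value is not None, so an empty or whitespace-only description would still count as compliant. If your organization wants to flag those as well, a stricter check might look like the sketch below. The helper name `has_property` and the `SimpleNamespace` stand-in item are hypothetical and exist only so the example runs without a GIS connection; the 'ago_downloaded' test mirrors the one used above.

```python
from types import SimpleNamespace

def has_property(portal_item, attr):
    """Return True if the item carries a usable value for attr.

    Hypothetical helper: unlike the notebook's function, empty or
    whitespace-only strings are treated as missing.
    """
    value = getattr(portal_item, attr, None)
    if value is None:
        return False
    if isinstance(value, str) and not value.strip():
        return False
    if attr == 'thumbnail' and 'ago_downloaded' in value:
        # Same default-thumbnail test as in the notebook's function
        return False
    return True

# Stand-in for an Item so the sketch runs without a GIS connection
fake_item = SimpleNamespace(
    description="   ",
    thumbnail="thumbnail/ago_downloaded.png",
    snippet="Stream buffer output",
)
status = [has_property(fake_item, attr)
          for attr in ('description', 'thumbnail', 'snippet')]
print(status)
```

Whether blank strings should count as missing is a policy decision for the organization, so treat this as one option rather than a replacement for the check above.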

Create a Data Structure for each item's metadata status

Now we'll use a Python dictionary to create a data structure so we can inspect each item. We'll create a list of users in the GIS. While looping over the list of users, we'll examine each folder the user owns for items and call the function we defined above on each item to create a list of the status for each metadata attribute we're interested in.

We'll then use the list for each item to populate a dictionary. Each key will be a unique name for each item (since item titles in an Organization can be identical, we'll use string slicing and concatenation to combine item attributes into a name that uniquely identifies each item). Each value will be a list with the True/False attributes regarding the metadata plus the item id and url.
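
As a minimal sketch of that key construction, the snippet below combines the first 50 characters of a title with the creation time in whole seconds. The function name `make_item_key` and the `SimpleNamespace` stand-in are hypothetical; the only assumption about a real item is that it exposes `title` and a `created` value in milliseconds, as the notebook's code relies on.

```python
from types import SimpleNamespace

def make_item_key(item, max_title_len=50):
    """Build a dictionary key from an item's title and creation time."""
    # item.created is in milliseconds since the epoch; keep whole seconds
    created_seconds = int(item.created / 1000)
    return f"{item.title[:max_title_len]}_{created_seconds}"

# Stand-in for an Item so the sketch runs without a GIS connection
fake_item = SimpleNamespace(title="test1", created=1615944782000)
key = make_item_key(fake_item)
print(key)
```

Two items with the same title created within the same second would still produce the same key under this scheme. If that is a concern, appending the item id instead of the timestamp is one possible alternative.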

In addition to this dictionary, the cell below prints information on each user, each folder the user owns, and number of items in each folder.

item_profile_status = {}
for user in gis.users.search():
    print(f"{user.username.upper()}\n{'-'*50}")
    print(f"\tRoot Folder: {user.username.lower()}\n\t{'='*25}")
    if user.items():
        print(f"\t\t- {len(user.items())} items")
        for item in user.items():
            missing_item_atts = get_missing_item_attrs(item)
            item_profile_status[item.title[:50] + '_' +
                str(int(item.created/1000))] = missing_item_atts
    else:
        print(f"\t\t- {len(user.items())} items")
    if user.folders:
        for folder in user.folders:
            if user.items(folder=folder):
                print(f"\t{folder['title']}\n\t{'='*25}")
                print(f"\t\t- {len(user.items(folder=folder))} items")
                for item in user.items(folder=folder):
                    missing_item_atts = get_missing_item_attrs(item)
                    item_profile_status[item.title[:50] + '_' +
                        str(int(item.created/1000))] = missing_item_atts
            else:
                print(f"\t{folder['title'].capitalize()}\n\t{'='*25}")
                print("\t\t- 0 items")
    print("\n")
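
The loop above calls `user.items()` several times for the same folder, and each call is likely to be a separate request to the portal. One way to fetch each folder's items only once is a small generator, sketched below. `FakeUser` and `iter_user_items` are hypothetical names; `FakeUser` only imitates the two members the notebook uses (`items(folder=...)` and `folders`) so the example runs without a GIS connection.

```python
class FakeUser:
    """Hypothetical stand-in mimicking the two members used above."""

    def __init__(self, root_items, folder_items):
        self._root = root_items
        self._by_folder = folder_items
        self.folders = [{'title': title} for title in folder_items]

    def items(self, folder=None):
        if folder is None:
            return list(self._root)
        return list(self._by_folder[folder['title']])

def iter_user_items(user):
    """Yield (folder_title, item) pairs for the root folder and each subfolder."""
    for item in user.items():
        yield 'Root Folder', item
    for folder in user.folders or []:
        # One call per folder; the result is consumed directly
        for item in user.items(folder=folder):
            yield folder['title'], item

user = FakeUser(['a', 'b'], {'Maps': ['c'], 'Empty': []})
pairs = list(iter_user_items(user))
print(pairs)
```

With a helper like this, the dictionary could be filled in a single loop over the pairs, at the cost of losing the per-folder item counts that the original cell prints.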

Create a Pandas Dataframe for writing out to a csv file

Let's first inspect the first five elements from the dictionary of data items:

list(item_profile_status.items())[:5]

[('StreamOverlay178515_Buffer_1615887205',
  [True, True, True, '8ace59c5a8be401bbddaccfae0a39305', '']),
 ('StreamOverlay178515_Buffer_1615887208',
  [True,
   True,
   True,
   'bc6a732940e84e67a07b4dc299e0f5cf',
   'https://services7.arcgis.com/JEwYeAy2cc8qOe3o/arcgis/rest/services/StreamOverlay178515_Buffer/FeatureServer']),
 ('test1_1615944782',
  [False,
   False,
   True,
   '7c50884101b14a5986a419ed756a629c',
   'https://geosaurus.maps.arcgis.com/apps/webappviewer/index.html?id=7c50884101b14a5986a419ed756a629c']),
 ('test1_1615944783',
  [False,
   False,
   False,
   '8a9aec45b9474b3eb61d940c0712f15f',
   '//geosaurus.maps.arcgis.com/sharing/rest/content/items/7c50884101b14a5986a419ed756a629c/package']),
 ('澜沧江流域2010土地利用_1615962321',
  [False, False, False, 'f83e5fe270a84d29abe9ce77c110e02f', None])]
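
Before building a dataframe, a quick tally can show which property is missing most often. The sketch below counts False values per property for a dictionary shaped like the output above. The function name `count_missing` is hypothetical, and `sample_status` holds three entries copied from that output.

```python
item_profile = ['description', 'thumbnail', 'snippet']

# Three entries copied from the output above
sample_status = {
    'StreamOverlay178515_Buffer_1615887205':
        [True, True, True, '8ace59c5a8be401bbddaccfae0a39305', ''],
    'test1_1615944782':
        [False, False, True, '7c50884101b14a5986a419ed756a629c',
         'https://geosaurus.maps.arcgis.com/apps/webappviewer/index.html'
         '?id=7c50884101b14a5986a419ed756a629c'],
    '澜沧江流域2010土地利用_1615962321':
        [False, False, False, 'f83e5fe270a84d29abe9ce77c110e02f', None],
}

def count_missing(status, profile):
    """Count, per profile property, how many items are flagged False."""
    counts = {name: 0 for name in profile}
    for values in status.values():
        # zip stops at the end of profile, so the id and url are ignored
        for name, present in zip(profile, values):
            if not present:
                counts[name] += 1
    return counts

missing_counts = count_missing(sample_status, item_profile)
print(missing_counts)
```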

Now we'll create a list based upon our original item profile list. We'll add two members to the list corresponding to the item id and url values we recorded for each item.

new_item_profile = item_profile + ['itemID', 'url']

new_item_profile

['description', 'thumbnail', 'snippet', 'itemID', 'url']

Next, we'll create the dataframe. We pass the new list as the index, so each item starts out as a column, and then transpose the result so that each item becomes a row:

pd.set_option('display.max_colwidth', 175) # for display of lengthy text values

item_profile_df = pd.DataFrame(data=item_profile_status, 
                               index=new_item_profile).T
item_profile_df.head()

	description	thumbnail	snippet	itemID	url
StreamOverlay178515_Buffer_1615887205	True	True	True	8ace59c5a8be401bbddaccfae0a39305
StreamOverlay178515_Buffer_1615887208	True	True	True	bc6a732940e84e67a07b4dc299e0f5cf	https://services7.arcgis.com/JEwYeAy2cc8qOe3o/arcgis/rest/services/StreamOverlay178515_Buffer/FeatureServer
test1_1615944782	False	False	True	7c50884101b14a5986a419ed756a629c	https://geosaurus.maps.arcgis.com/apps/webappviewer/index.html?id=7c50884101b14a5986a419ed756a629c
test1_1615944783	False	False	False	8a9aec45b9474b3eb61d940c0712f15f	//geosaurus.maps.arcgis.com/sharing/rest/content/items/7c50884101b14a5986a419ed756a629c/package
澜沧江流域2010土地利用_1615962321	False	False	False	f83e5fe270a84d29abe9ce77c110e02f	None

Write the dataframe to a `csv` file and add it as an item

We'll add a timestamp to the output file to ensure uniqueness when adding the csv item to the Organization.

output_dir = "/arcgis/home/"
out_file = "org_item_profile_status_" + \
            str(int(dt.datetime.now().timestamp())) + \
            ".csv"

item_profile_df.to_csv(os.path.join(output_dir, out_file), 
                       index_label='item_name')

root_folder = gis.content.folders.get()

# ItemProperties takes keyword arguments rather than "key": value pairs
new_item_props = ItemProperties(
    title=out_file,
    item_type=ItemTypeEnum.CSV.value,
    tags="item_metadata_report",
    snippet="Report on item attributes from API"
)

root_folder.add(
    item_properties=new_item_props,
    file=os.path.join(output_dir, out_file)
).result()

org_item_profile_status_1686348049

CSV by MMajumdar_geosaurus
Last Modified: June 09, 2023
0 comments, 0 views

You can download this item if you wish. To delete it once you no longer need it, run the cell below after setting item_id to the id of this file in your organization.

item = gis.content.get(item_id)
item.delete()

Conclusion

This notebook checked attribute values for an organization's items against a pre-defined list of properties for item metadata, and based upon those values recorded the status of the metadata property. It combined these values with the id and url for any service backing the item (if applicable) and then wrote the results to a csv file that was added to the Organization. This file can then be analyzed to message item owners to update the metadata for items to comply with organizational requirements.
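
As one possible follow-up, the sketch below reads a report with the same columns back in and lists the items that still need attention. It uses the standard library's `csv` module and an in-memory string in place of the real file; the two rows are copied from the output shown earlier, and the variable names are illustrative. Note that values read from a csv file come back as the strings 'True' and 'False', not booleans.

```python
import csv
import io

# In-memory stand-in for the report; columns match the file written above
csv_text = (
    "item_name,description,thumbnail,snippet,itemID,url\n"
    "StreamOverlay178515_Buffer_1615887205,True,True,True,"
    "8ace59c5a8be401bbddaccfae0a39305,\n"
    "test1_1615944782,False,False,True,"
    "7c50884101b14a5986a419ed756a629c,"
    "https://geosaurus.maps.arcgis.com/apps/webappviewer/index.html"
    "?id=7c50884101b14a5986a419ed756a629c\n"
)

checked = ['description', 'thumbnail', 'snippet']
needs_update = []
for row in csv.DictReader(io.StringIO(csv_text)):
    # Compare against the string 'True', since csv values are text
    missing = [name for name in checked if row[name] != 'True']
    if missing:
        needs_update.append((row['itemID'], missing))

print(needs_update)
```

The item ids in the result could then be used to look up each item's owner before sending a reminder.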