Detecting Swimming Pools using Automated Deep Learning

  • 🔬 Data Science
  • 🥠 Deep Learning and Object Detection

Introduction and objective

Deep learning has achieved great success with state-of-the-art results, but taking it to the field and solving real-world problems is still a challenge. Integrating the latest AI research with ArcGIS opens up a world of opportunities. This notebook demonstrates an end-to-end deep learning workflow using the ArcGIS API for Python. The workflow consists of three major steps: (1) extract training data, (2) train a deep learning object detection model, and (3) deploy the model for inference and create maps. To illustrate this process, we detect swimming pools in Redlands, CA using remote sensing imagery.

Part 1 - Export training data

To export training data, we need a labeled feature class that contains the bounding box for each object and a raster layer that contains all the pixels and band information. For this swimming pool detection case, we created the feature class by hand-labeling the bounding box of each swimming pool in Redlands using ArcGIS Pro, with USA NAIP Imagery: Color Infrared as the raster data.

from arcgis.gis import GIS

# Connect to gis
gis = GIS('home')
pool_bb = gis.content.get('0da0026a3a6d47dc8da0bcff6cf5bfb2')
pool_bb
SwimmingPoolLabels

Feature Layer Collection by api_data_owner
Last Modified: March 31, 2021
0 comments, 250 views
naip_item = gis.content.get('41f2bbe1f73f4388a2e1694df437f265')
naip_item
NAIP_Imagery
NAIP imagery - raster layer.
Tiled Imagery Layer by api_data_owner
Last Modified: September 11, 2024
0 comments, 40 views

With the feature class and raster layer, we are now ready to export training data using the 'Export Training Data For Deep Learning' tool in ArcGIS Pro. In addition to the feature class, raster layer, and output folder, we also need to specify a few other parameters, such as tile size (size of the image chips), stride size (distance to move in the X direction when creating the next image chip), chip format (TIFF, PNG, or JPEG), and metadata format (how the bounding boxes are stored).
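To make the relationship between tile size and stride concrete, the sketch below counts the chips a sliding window would produce over a raster. The function name chip_origins and the raster dimensions are hypothetical. The sketch also assumes the same stride in X and Y and that only fully contained tiles are kept; the actual tool may handle edges differently.

```python
def chip_origins(width, height, tile_size, stride):
    """Return the top-left pixel offsets of every full tile in a raster.

    Assumption: partial tiles at the right and bottom edges are skipped.
    """
    origins = []
    for y in range(0, height - tile_size + 1, stride):
        for x in range(0, width - tile_size + 1, stride):
            origins.append((x, y))
    return origins


# Hypothetical 1024 x 1024 raster with 448-pixel tiles.
no_overlap = chip_origins(1024, 1024, tile_size=448, stride=448)
half_overlap = chip_origins(1024, 1024, tile_size=448, stride=224)

print(len(no_overlap))    # stride equal to tile size: tiles do not overlap
print(len(half_overlap))  # smaller stride: overlapping tiles, more chips
```

A stride smaller than the tile size yields overlapping chips, which increases the number of training samples at the cost of a longer export.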

Depending on the size of your data, the tile and stride sizes, and your computing resources, this operation can take anywhere from 15 minutes to 2 hours, based on our experiments. Do not re-run it once it has completed unless you want to change the settings.

Part 2 - Model training

If you've already completed Part 1, you should have both the training chips and the swimming pool labels. Change the path to your own exported training data folder, which contains the "images" and "labels" folders.
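Before pointing the notebook at your own export folder, it can help to confirm that the folder has the expected layout. The following is a minimal sketch using only the standard library; the helper name has_training_layout is hypothetical, and the check covers only the two subfolders mentioned above, not any other files the export may produce.

```python
from pathlib import Path
import tempfile


def has_training_layout(folder):
    """Check that a folder contains both 'images' and 'labels' subfolders."""
    folder = Path(folder)
    return (folder / "images").is_dir() and (folder / "labels").is_dir()


# Demonstration on a throwaway directory (a stand-in for a real export folder).
with tempfile.TemporaryDirectory() as tmp:
    print(has_training_layout(tmp))   # False: nothing exported yet
    (Path(tmp) / "images").mkdir()
    (Path(tmp) / "labels").mkdir()
    print(has_training_layout(tmp))   # True
```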

Necessary imports

import os
import zipfile
from pathlib import Path

from arcgis.learn import prepare_data, AutoDL, ImageryModel
training_data = gis.content.get('73a29df69b344ce8b94fdb4c9df7103d')
training_data
detecting_swimming_pools_using_satellite_image_and_deep_learning

Image Collection by api_data_owner
Last Modified: August 28, 2020
0 comments, 556 views
filepath = training_data.download(file_name=training_data.name)
# Unzip training data
with zipfile.ZipFile(filepath, 'r') as zip_ref:
    zip_ref.extractall(Path(filepath).parent)
data_path = Path(os.path.join(os.path.splitext(filepath)[0]))

Prepare data that will be used for training

data = prepare_data(data_path, 
                    batch_size=4, 
                    chip_size=448,
                    class_mapping={'0': 'pool'})
data.classes
['background', 'pool']

Visualize training data

To get a sense of what the training data looks like, the show_batch() method of the data object randomly picks a few training chips and visualizes them.

%%time
data.show_batch()
CPU times: total: 20.1 s
Wall time: 1.24 s
<Figure size 1500x1500 with 9 Axes>

Load model architecture

model = AutoDL(data, total_time_limit=0.30)
Given time to process the dataset is: 0.3 hours
Number of images that can be processed in the given time: 5.0
Time required to process the entire dataset of 1845 images is 99.57 hours
%%time
model.fit()
09-09-2024 20:04:31: Selected networks: SingleShotDetector RetinaNet FasterRCNN YOLOv3 DETReg ATSS CARAFE CascadeRCNN CascadeRPN DCN Detectors DoubleHeads DynamicRCNN EmpiricalAttention FCOS FoveaBox FSAF GHM LibraRCNN PaFPN PISA RegNet RepPoints Res2Net SABL VFNet
09-09-2024 20:04:31: Current network - SingleShotDetector. 
09-09-2024 20:04:31: Total time alloted to train the SingleShotDetector model is 0:00:16
09-09-2024 20:04:31: Maximum number of epochs will be 20 to train SingleShotDetector
09-09-2024 20:04:31: Initializing the SingleShotDetector network.
09-09-2024 20:05:19: SingleShotDetector initialized with resnet34 backbone
09-09-2024 20:05:19: finding desired batch size for the data object.
09-09-2024 20:05:19: Optimized batch size for SingleShotDetector with the selected backbone is 64
09-09-2024 20:05:47: Best learning rate for SingleShotDetector with the selected data is 0.00363078054770101
09-09-2024 20:05:47: Fitting SingleShotDetector
09-09-2024 20:06:28: Training completed
09-09-2024 20:06:28: Computing the network metrices
09-09-2024 20:06:45: Finished training SingleShotDetector.
09-09-2024 20:06:45: Exiting.
09-09-2024 20:06:45: Saving the model
09-09-2024 20:07:09: model saved at ~\AppData\Local\Temp\detecting_swimming_pools_using_satellite_image_and_deep_learning\models\AutoDL_SingleShotDetector_resnet34_2024-09-09_20-06-29
09-09-2024 20:07:10: Current network - RetinaNet. 
09-09-2024 20:07:10: Total time alloted to train the RetinaNet model is 0:00:30
09-09-2024 20:07:10: Maximum number of epochs will be 20 to train RetinaNet
09-09-2024 20:07:10: Initializing the RetinaNet network.
09-09-2024 20:49:13: RetinaNet initialized with resnet50 backbone
09-09-2024 20:49:13: finding desired batch size for the data object.
09-09-2024 20:49:13: Optimized batch size for RetinaNet with the selected backbone is 64
09-09-2024 20:49:40: Best learning rate for RetinaNet with the selected data is 9.120108393559096e-05
09-09-2024 20:49:40: Fitting RetinaNet
09-09-2024 20:50:15: Training completed
09-09-2024 20:50:15: Computing the network metrices
100.00% [46/46 00:11<00:00]
09-09-2024 20:50:27: Finished training RetinaNet.
09-09-2024 20:50:27: Exiting.
09-09-2024 20:50:27: Saving the model
Computing model metrics...
100.00% [46/46 00:11<00:00]
09-09-2024 20:50:48: model saved at ~\AppData\Local\Temp\detecting_swimming_pools_using_satellite_image_and_deep_learning\models\AutoDL_RetinaNet_resnet50_2024-09-09_20-50-15
09-09-2024 20:50:49: Current network - FasterRCNN. 
09-09-2024 20:50:49: Total time alloted to train the FasterRCNN model is 0:00:30
09-09-2024 20:50:49: Insufficient time to train the FasterRCNN for 20 epochs. 0 epochs can only be trained in the remaining time.
09-09-2024 20:50:49: The time left to train the FasterRCNN is not sufficent.
09-09-2024 20:50:49: Remaining networks will be skipped due to limited time, Stopping the training process.
09-09-2024 20:50:49: Collating and evaluating model performances.
09-09-2024 20:50:49: Exiting.
CPU times: total: 2h 9min 57s
Wall time: 46min 17s
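
The time-budget figures printed when the AutoDL object was created can be related with simple arithmetic. The sketch below assumes that processing time scales linearly with the number of images and that the result is rounded down. This is only a plausible reading of the log output, not documented AutoDL behavior.

```python
import math

total_time_limit_hours = 0.3   # value passed to AutoDL above
dataset_size = 1845            # images, from the log output
full_dataset_hours = 99.57     # estimate printed in the log output

# Assumption: the budget scales linearly with the number of images.
hours_per_image = full_dataset_hours / dataset_size
images_in_budget = math.floor(total_time_limit_hours / hours_per_image)

print(images_in_budget)
```

Under these assumptions the result agrees with the "5.0" images reported in the log, which illustrates how small a 0.3 hour budget is relative to the full dataset.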

Get the best model path

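# Build the models folder path (Windows-specific: relies on the APPDATA variable)
path = Path(os.getenv('APPDATA')).parents[0]
# Pass path components separately to avoid backslash escape sequences in the string
path = Path(path, 'Local', 'Temp', 'detecting_swimming_pools_using_satellite_image_and_deep_learning', 'models')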
best_backbone = model._best_backbone
best_model = model.best_model
best_model, best_backbone
('SingleShotDetector', 'resnet34')
model.score() # Check the scores and other details of the trained models
   Model               train_loss  valid_loss  average_precision_score  lr        training time  backbone
0  SingleShotDetector  450.839417  155.492569  0.010427                 0.003631  0:01:56        resnet34
1  RetinaNet           3.292400    3.414389    0.000439                 0.000091  0:43:05        resnet50
files = [f for f in path.iterdir() if f.is_dir()]
folder_list = []

for file in files:
    if best_model in file.name and best_backbone in file.name:
        folder = file.name
        folder_list.append(folder)
        
best_model_folder = sorted(folder_list)[-1]
emd_file = best_model_folder + '.emd'

path = Path(path, best_model_folder, emd_file)
path = os.path.abspath(path) #path of the best model
score = model.average_precision_score()
score.sort_values(by='pool', ascending=False)
   Model               pool
0  SingleShotDetector  0.010427
0  RetinaNet           0.000439
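
The ranking shown in the table can also be reproduced in plain Python. This is a minimal sketch with the scores copied by hand from the output above; the dictionary is an illustration only and is not how AutoDL stores its results.

```python
# Average precision for the 'pool' class, copied from the table above.
scores = {
    "SingleShotDetector": 0.010427,
    "RetinaNet": 0.000439,
}

# Sort model names from highest to lowest score.
ranking = sorted(scores, key=scores.get, reverse=True)
best_name = ranking[0]

print(best_name)  # SingleShotDetector
```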

Finetuning best model

from arcgis.learn import ImageryModel
finetune_model = ImageryModel()
finetune_model.load(path, data)
lr = finetune_model.lr_find()
<Figure size 640x480 with 1 Axes>
finetune_model.fit(epochs=10, lr=lr)
epoch  train_loss  valid_loss  average_precision  time
0      47.962368   45.468967   0.185395           02:20
1      37.229790   37.099052   0.499507           02:21
2      32.476410   36.612148   0.440375           02:20
3      29.966869   33.942570   0.511389           02:21
4      30.256081   34.916912   0.461470           02:20
5      28.695295   33.780155   0.501860           02:20
6      27.232103   36.281372   0.425415           02:19
7      24.844667   36.177143   0.442514           02:20
8      25.169331   32.932259   0.481551           02:20
9      25.038212   32.194958   0.503439           02:20
finetune_model.average_precision_score()
100.00% [46/46 00:10<00:00]
{'pool': 0.5034394905969962}
finetune_model.fit(epochs=10)
epoch  train_loss  valid_loss  average_precision  time
0      24.063225   32.582569   0.495154           02:22
1      23.060495   35.708080   0.424427           02:22
2      22.952930   32.752361   0.487197           02:25
3      23.241489   33.141312   0.477005           02:22
4      26.947018   34.039207   0.465994           02:25
5      26.588541   34.467522   0.448040           02:23
6      24.711823   33.374062   0.466779           02:23
7      23.389183   33.137882   0.475220           02:22
8      25.681923   32.232948   0.491468           02:23
9      25.365545   32.406380   0.489742           02:21
finetune_model.average_precision_score()
100.00% [46/46 00:10<00:00]
{'pool': 0.4914678402778204}
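
The two runs above show average precision fluctuating from epoch to epoch, and the second run ends slightly below the first (about 0.4915 versus 0.5034). A quick way to see where a run peaked is to scan the per-epoch values. The sketch below uses the average_precision column of the first fine-tuning run, copied by hand from the output above.

```python
# average_precision per epoch from the first fine-tuning run above.
first_run_ap = [
    0.185395, 0.499507, 0.440375, 0.511389, 0.461470,
    0.501860, 0.425415, 0.442514, 0.481551, 0.503439,
]

# Index of the epoch with the highest average precision.
best_epoch = max(range(len(first_run_ap)), key=first_run_ap.__getitem__)
print(best_epoch, first_run_ap[best_epoch])  # 3 0.511389
```

The final epoch is not necessarily the best one, so it is worth comparing scores before deciding whether additional epochs helped.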

Detect and visualize swimming pools in validation set

Now that we have the model, let's look at how it performs. Here we plot 5 rows of images using a threshold of 0.2. The threshold is a measure of the probability that a swimming pool exists; a higher value means more confidence.
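
To illustrate what a threshold does, the sketch below filters a list of detections by confidence score. The detections, their field names, and the helper filter_by_threshold are all hypothetical, and whether the library uses an inclusive or exclusive comparison is an assumption here.

```python
# Hypothetical detections: each has a bounding box and a confidence score.
detections = [
    {"bbox": (10, 12, 40, 44), "score": 0.91},
    {"bbox": (100, 80, 130, 115), "score": 0.35},
    {"bbox": (200, 150, 222, 170), "score": 0.12},
]


def filter_by_threshold(items, thresh):
    """Keep only detections whose confidence is at least `thresh`."""
    return [d for d in items if d["score"] >= thresh]


kept = filter_by_threshold(detections, thresh=0.2)
print(len(kept))  # 2
```

Raising the threshold removes low-confidence boxes and reduces false positives, but it can also discard real pools that the model was unsure about.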

finetune_model.show_results(thresh=0.2)
<Figure size 800x2000 with 10 Axes>

As we can see, with only 20 epochs, we are already seeing reasonable results. Further improvement can be achieved through more sophisticated hyperparameter tuning. Let's save the model for further training or inference later. The model is saved into a models folder; by default, this folder is created inside the data_path that you specified at the very beginning of this notebook.

finetune_model.save('PoolDetection_USA_20')
Computing model metrics...
100.00% [46/46 00:09<00:00]
WindowsPath('~/AppData/Local/Temp/detecting_swimming_pools_using_satellite_image_and_deep_learning/models/PoolDetection_USA_20')

Part 3 - Model inference

To test our model, let's get a raster image with some swimming pools.

Visualize detected pools on map

predicted_result = gis.content.get('793d2060d14746d19ee4c45d3eda7724')
predicted_result
detected_pools
detected_pools
Feature Layer Collection by api_data_owner
Last Modified: June 14, 2022
0 comments, 122 views
result_map = gis.map('Redlands, CA')
result_map.content.add(naip_item.layers[0])
result_map.content.add(predicted_result.layers[0])
result_map.extent = {'spatialReference': {'latestWkid': 3857, 'wkid': 102100},
                      'xmin': -13041805.641506677,'ymin': 4032306.1515633,
                      'xmax': -13039489.838415546,'ymax': 4033022.7487034127}
result_map
<PIL.PngImagePlugin.PngImageFile image mode=RGBA size=2001x752>
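
The extent above is expressed in Web Mercator metres (wkid 102100, latestWkid 3857). To check that it covers the intended area, the corner coordinates can be converted to longitude and latitude with the standard spherical Mercator inverse. This is a minimal sketch using only the standard library; the function name web_mercator_to_lonlat is hypothetical and is not part of the ArcGIS API for Python.

```python
import math

EARTH_RADIUS_M = 6378137.0  # sphere radius used by Web Mercator (EPSG:3857)


def web_mercator_to_lonlat(x, y):
    """Convert Web Mercator metres to (longitude, latitude) in degrees."""
    lon = math.degrees(x / EARTH_RADIUS_M)
    lat = math.degrees(math.atan(math.sinh(y / EARTH_RADIUS_M)))
    return lon, lat


# Corner coordinates copied from the extent set on the map above.
lon_min, lat_min = web_mercator_to_lonlat(-13041805.641506677, 4032306.1515633)
lon_max, lat_max = web_mercator_to_lonlat(-13039489.838415546, 4033022.7487034127)

print(round(lon_min, 3), round(lat_min, 3))
print(round(lon_max, 3), round(lat_max, 3))
```

The resulting coordinates fall near 117.2 degrees west and 34.0 degrees north, which is consistent with Redlands, CA.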

Conclusion

In this notebook, we have covered a lot of ground. In Part 1, we discussed how to export training data for deep learning using ArcGIS Pro. We then demonstrated how to prepare the input data, train an object detection model, visualize the results, and apply the model to an unseen image using the Detect Objects Using Deep Learning tool in ArcGIS Pro.

