Training a deep learning model is a long, iterative process, so it is important to have a tool that visualizes training progress and lets you monitor how the model is learning. TensorBoard is an open source toolkit that helps us understand training progress and improve model performance by tuning hyperparameters. It displays a dashboard where logs can be visualized as graphs, images, histograms, embeddings, text, and so on. It also helps track information such as gradients, losses, metrics, and intermediate outputs [1, 2].
The arcgis.learn module integrates TensorBoard into the model training process, which makes it possible to monitor training as it runs. In this guide, we will learn how model training can be monitored using TensorBoard within the arcgis.learn module.
Note: TensorBoard is supported in ArcGIS API for Python version 1.8.3 and later.
Prerequisite
The specific Python libraries mentioned below need to be installed in your deep learning environment.
pip install tensorboard==2.2.1
pip install tensorboardX==2.1
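Before training, it can be useful to confirm that both libraries are visible to the active Python environment. The following is a minimal sketch and not part of arcgis.learn: the helper name check_prerequisites is hypothetical, and it relies only on the standard library's importlib.metadata (Python 3.8 or later).

```python
from importlib import metadata

# Versions named in the Prerequisite section of this guide.
REQUIRED = {"tensorboard": "2.2.1", "tensorboardX": "2.1"}


def check_prerequisites(required=REQUIRED):
    """Hypothetical helper: map each package name to its installed
    version string, or to None when the package is not installed."""
    found = {}
    for name in required:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, installed in check_prerequisites().items():
        wanted = REQUIRED[name]
        status = installed if installed is not None else "not installed"
        print(f"{name}: {status} (guide recommends {wanted})")
```

If a package is reported as not installed, run the corresponding pip command above in the same environment.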
Model training with TensorBoard
The arcgis.learn module currently supports TensorBoard for the following models:
SingleShotDetector
UnetClassifier
FeatureClassifier
RetinaNet
PSPNetClassifier
MaskRCNN
DeepLab
FasterRCNN
SuperResolution
Pix2Pix
CycleGAN
ImageCaptioner
MultiTaskRoadExtractor
from arcgis.learn import UnetClassifier, prepare_data

data_path = r'training_data'
data = prepare_data(data_path, batch_size=4)
# Choose the model you want to train from the list above.
unet_model = UnetClassifier(data)
After instantiating the model object, we train the model by calling the model.fit() method with the tensorboard parameter set to True. This trains the model for the specified number of epochs while logging its progress for visualization in TensorBoard. By default, the tensorboard parameter is set to False.
unet_model.fit(2, lr=0.0001, tensorboard=True)
Monitor training on Tensorboard using the following command: 'tensorboard --host=DELDEVAL047 --logdir="C:\Users\Karthik\Desktop\Base\Tensorboard\Kent_LULC\training_log"'
epoch | train_loss | valid_loss | accuracy | dice | time |
---|---|---|---|---|---|
0 | 1.489619 | 1.355104 | 0.522247 | 0.522247 | 00:25 |
1 | 1.323257 | 1.155571 | 0.593830 | 0.593830 | 00:24 |
When the tensorboard flag is enabled, the command needed to launch TensorBoard is printed, as shown above. If the libraries listed under Prerequisite are not installed, model training still continues, but a warning message prompts the user to install them.
Launch TensorBoard in a browser
To view TensorBoard in your default web browser, run the command printed during training in an Anaconda prompt. TensorBoard then prints a message containing the URL at which it is being served.
It is possible to run TensorBoard on a different port by passing the required port number in the command (for example, --port=8008). The default port is 6006.
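The port option can also be assembled programmatically, for instance when launching TensorBoard from a script. This is a minimal sketch under stated assumptions: build_tensorboard_command is a hypothetical helper, the log directory name is a placeholder, and only the --logdir, --port, and --host flags of the tensorboard command line are used.

```python
import subprocess


def build_tensorboard_command(logdir, port=6006, host=None):
    """Hypothetical helper: return the tensorboard command as an
    argument list. 6006 is the default port mentioned in this guide."""
    command = ["tensorboard", f"--logdir={logdir}", f"--port={port}"]
    if host is not None:
        command.append(f"--host={host}")
    return command


if __name__ == "__main__":
    # "training_log" is a placeholder; use the directory printed by fit().
    cmd = build_tensorboard_command("training_log", port=8008)
    print(" ".join(cmd))
    # Uncomment to start TensorBoard as a background process:
    # process = subprocess.Popen(cmd)
```

Passing the command as an argument list avoids shell quoting problems when the log directory path contains spaces.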
TensorBoard is now accessible from any web browser at the URL printed when the tensorboard command is executed. Opening this URL brings up the TensorBoard dashboard:
- In the 'SCALARS' tab, graphs of the various metrics and statistics can be visualized.
- In the 'IMAGES' tab, the intermediate outputs of the model are displayed. Using this feature, we can compare the model's outputs across epochs and visually compare outputs across different training runs.
This can be done while training is still in progress, because the graphs and images are updated at the end of each epoch rather than only after the entire training process completes.