Image Classification and Object Localization
In this lab, you'll build a CNN from scratch to:
classify the main subject in an image
localize it by drawing bounding boxes around it.
You'll use the MNIST dataset to synthesize a custom dataset for the task:
Place each "digit" image at a random location on a black canvas of size 75 x 75.
Calculate the corresponding bounding boxes for those "digits".
The bounding box prediction can be modelled as a "regression" task, which means that the model will predict a numeric value (as opposed to a category).
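The two synthesis steps above can be sketched with NumPy. This is a minimal illustration, not the lab's own code: the helper name `place_digit`, the all-ones stand-in digit, and the choice to normalize box coordinates by the canvas size are assumptions made for the example.

```python
import numpy as np

CANVAS_SIZE = 75   # side length of the black canvas, from the lab text
DIGIT_SIZE = 28    # MNIST images are 28 x 28 pixels


def place_digit(digit, rng):
    """Paste a 28 x 28 digit at a random spot on a 75 x 75 black canvas.

    Returns the canvas and the box (xmin, ymin, xmax, ymax).
    Assumption: coordinates are normalized to [0, 1] by the canvas size.
    """
    max_offset = CANVAS_SIZE - DIGIT_SIZE  # largest offset that keeps the digit inside
    xmin = int(rng.integers(0, max_offset + 1))
    ymin = int(rng.integers(0, max_offset + 1))

    canvas = np.zeros((CANVAS_SIZE, CANVAS_SIZE), dtype=np.float32)
    canvas[ymin:ymin + DIGIT_SIZE, xmin:xmin + DIGIT_SIZE] = digit

    box = np.array(
        [xmin, ymin, xmin + DIGIT_SIZE, ymin + DIGIT_SIZE], dtype=np.float32
    ) / CANVAS_SIZE
    return canvas, box


rng = np.random.default_rng(0)
fake_digit = np.ones((DIGIT_SIZE, DIGIT_SIZE), dtype=np.float32)  # stand-in for a real MNIST digit
canvas, box = place_digit(fake_digit, rng)
print(canvas.shape, box)
```

Because the four box values are continuous numbers rather than class labels, predicting them is a regression problem.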
Imports
Visualization Utilities
These utilities draw bounding boxes around the digits and visualize the data and predictions.
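As a rough idea of what such a utility can look like, here is a sketch that draws one box with Pillow. The helper name `draw_box` is hypothetical, and the sketch assumes boxes are given as normalized (xmin, ymin, xmax, ymax) values in [0, 1].

```python
import numpy as np
from PIL import Image, ImageDraw


def draw_box(image, box, color):
    """Draw one (xmin, ymin, xmax, ymax) box on a PIL image.

    Assumption: box coordinates are normalized to [0, 1].
    """
    width, height = image.size
    xmin, ymin, xmax, ymax = box
    # Scale to pixels and keep the far edge inside the image.
    pixel_box = [
        round(xmin * width),
        round(ymin * height),
        min(round(xmax * width), width - 1),
        min(round(ymax * height), height - 1),
    ]
    ImageDraw.Draw(image).rectangle(pixel_box, outline=color)
    return image


canvas = np.zeros((75, 75), dtype=np.uint8)            # empty black canvas
image = Image.fromarray(canvas).convert("RGB")          # RGB so the outline can be colored
image = draw_box(image, (0.2, 0.2, 0.6, 0.6), color=(0, 255, 0))
```

Calling `draw_box` twice with different colors is enough to overlay a true box and a predicted box on the same image.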
Selecting Between Strategies
TPU or GPU detection
Depending on the hardware available, you'll use different distribution strategies. For a review of distribution strategies, see the second course in this specialization, "Custom and Distributed Training with TensorFlow", week 4, "Distributed Training".
If the TPU is available, then you'll be using the TPU Strategy. Otherwise:
If more than one GPU is available, then you'll use the Mirrored Strategy
If one GPU is available or if just the CPU is available, you'll use the default strategy.
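The selection logic described above could be written roughly as follows. The function name `select_strategy` is an assumption for this sketch; the TensorFlow calls are the public `tf.distribute` and `tf.config` APIs, and `tf.distribute.TPUStrategy` assumes TensorFlow 2.3 or later.

```python
import tensorflow as tf


def select_strategy():
    """Return a tf.distribute strategy matching the visible hardware."""
    try:
        # Raises ValueError when no TPU can be found.
        resolver = tf.distribute.cluster_resolver.TPUClusterResolver()
    except ValueError:
        resolver = None

    if resolver is not None:
        tf.config.experimental_connect_to_cluster(resolver)
        tf.tpu.experimental.initialize_tpu_system(resolver)
        return tf.distribute.TPUStrategy(resolver)

    gpus = tf.config.list_logical_devices("GPU")
    if len(gpus) > 1:
        # Several GPUs: replicate the model on each of them.
        return tf.distribute.MirroredStrategy([gpu.name for gpu in gpus])

    # One GPU or CPU only: the default (no-op) strategy.
    return tf.distribute.get_strategy()


strategy = select_strategy()
print("Replicas in sync:", strategy.num_replicas_in_sync)
```

Model creation and compilation would then happen inside `with strategy.scope():` so that variables are placed according to the chosen strategy.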
Parameters
The global batch size is the batch size per replica (64 in this case) times the number of replicas in the distribution strategy.
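As a quick check of the arithmetic, the sketch below computes the global batch size for a few replica counts. The helper name `global_batch_size` is made up for the example; in practice the replica count would come from the strategy's `num_replicas_in_sync`.

```python
PER_REPLICA_BATCH_SIZE = 64  # per-replica value stated in the lab


def global_batch_size(num_replicas, per_replica=PER_REPLICA_BATCH_SIZE):
    """Global batch size = per-replica batch size x number of replicas."""
    return per_replica * num_replicas


for replicas in (1, 2, 8):
    print(replicas, "replica(s) ->", global_batch_size(replicas))
```

With a single replica the global batch size stays at 64, while 8 replicas give 512.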
Loading and Preprocessing the Dataset
Define some helper functions that will pre-process your data:
read_image_tfds: randomly overlays the "digit" image on top of a larger canvas.
get_training_dataset: loads data and splits it to get the training set.
get_validation_dataset: loads and splits the data to get the validation set.
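A `tf.data` pipeline along these lines might look like the sketch below. It is an approximation under stated assumptions: the names `overlay_digit` and `make_training_dataset` are hypothetical, random arrays stand in for MNIST so that nothing has to be downloaded, and the box is assumed to be normalized by the canvas size.

```python
import numpy as np
import tensorflow as tf


def overlay_digit(image, label):
    """Overlay a 28 x 28 digit on a 75 x 75 canvas; return image and both targets."""
    # maxval 48 is exclusive, so offsets fall in 0..47 and the digit stays inside.
    xmin = tf.random.uniform((), 0, 48, dtype=tf.int32)
    ymin = tf.random.uniform((), 0, 48, dtype=tf.int32)
    image = tf.reshape(image, (28, 28, 1))
    image = tf.image.pad_to_bounding_box(image, ymin, xmin, 75, 75)
    image = tf.cast(image, tf.float32) / 255.0
    xmin_f = tf.cast(xmin, tf.float32)
    ymin_f = tf.cast(ymin, tf.float32)
    # Assumption: box coordinates are normalized to [0, 1].
    box = tf.stack([xmin_f, ymin_f, xmin_f + 28.0, ymin_f + 28.0]) / 75.0
    return image, (tf.one_hot(label, 10), box)


def make_training_dataset(images, labels, batch_size):
    ds = tf.data.Dataset.from_tensor_slices((images, labels))
    ds = ds.map(overlay_digit, num_parallel_calls=tf.data.AUTOTUNE)
    ds = ds.shuffle(len(images), reshuffle_each_iteration=True)
    ds = ds.repeat()  # endless stream; training then needs steps_per_epoch
    ds = ds.batch(batch_size, drop_remainder=True)
    return ds.prefetch(tf.data.AUTOTUNE)


# Random stand-in data with MNIST's shape and value range.
fake_images = np.random.randint(0, 256, size=(256, 28, 28), dtype=np.uint8)
fake_labels = np.random.randint(0, 10, size=(256,), dtype=np.int64)
train_ds = make_training_dataset(fake_images, fake_labels, batch_size=64)
```

A validation pipeline would typically reuse the same mapping function but skip the shuffle step.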
Visualize Data
Define the Network
Here, you'll define your custom CNN.
feature_extractor: these convolutional layers extract the features of the image.
classifier: this defines the output layer that predicts among 10 categories (digits 0 through 9).
bounding_box_regression: this defines the output layer that predicts 4 numeric values, which define the coordinates of the bounding box (xmin, ymin, xmax, ymax).
final_model: this combines the layers for feature extraction, classification and bounding box prediction.
Notice that this is another example of a branching model, because the model splits to produce two kinds of output (a category and a set of numbers).
Since you've learned to use the Functional API earlier in the specialization (course 1), you have the flexibility to define this kind of branching model!
define_and_compile_model: chooses the optimizer and metrics, then compiles the model.
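To make the branching structure concrete, here is a small Functional API sketch that reuses the function names listed above. The layer sizes, the Adam optimizer, and the loss and metric choices are assumptions for illustration and may differ from the lab's actual code.

```python
import tensorflow as tf


def feature_extractor(inputs):
    # Assumed layer sizes; the lab may use a deeper stack.
    x = tf.keras.layers.Conv2D(16, 3, activation="relu")(inputs)
    x = tf.keras.layers.AveragePooling2D(2)(x)
    x = tf.keras.layers.Conv2D(32, 3, activation="relu")(x)
    x = tf.keras.layers.AveragePooling2D(2)(x)
    x = tf.keras.layers.Flatten()(x)
    return tf.keras.layers.Dense(128, activation="relu")(x)


def classifier(features):
    # 10 softmax outputs, one per digit.
    return tf.keras.layers.Dense(10, activation="softmax", name="classification")(features)


def bounding_box_regression(features):
    # 4 linear outputs: xmin, ymin, xmax, ymax.
    return tf.keras.layers.Dense(4, name="bounding_box")(features)


def final_model(inputs):
    features = feature_extractor(inputs)
    # The model branches here into a classification head and a regression head.
    outputs = [classifier(features), bounding_box_regression(features)]
    return tf.keras.Model(inputs=inputs, outputs=outputs)


def define_and_compile_model():
    inputs = tf.keras.layers.Input(shape=(75, 75, 1))
    model = final_model(inputs)
    model.compile(
        optimizer="adam",  # assumed optimizer
        loss={"classification": "categorical_crossentropy", "bounding_box": "mse"},
        metrics={"classification": "accuracy", "bounding_box": "mse"},
    )
    return model


model = define_and_compile_model()
model.summary()
```

Naming the two output layers lets each head get its own loss: cross-entropy for the category and mean squared error for the box coordinates.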
Train and validate the model
Train the model.
You can choose the number of epochs depending on the level of performance that you want and the time that you have.
Each epoch will take just a few seconds if you're using the TPU.
Intersection over union
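Calculate the intersection over union (IoU) metric to evaluate the model's performance. IoU is the area where the predicted and true boxes overlap divided by the area of their union.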
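One way to compute the metric for a batch of boxes is sketched below with NumPy. It assumes boxes are rows of (xmin, ymin, xmax, ymax); the small `eps` term is an added safeguard against division by zero, not something stated in the lab text.

```python
import numpy as np


def intersection_over_union(pred_boxes, true_boxes, eps=1e-10):
    """IoU for two arrays of (xmin, ymin, xmax, ymax) boxes, each of shape (N, 4)."""
    pred_boxes = np.asarray(pred_boxes, dtype=np.float64)
    true_boxes = np.asarray(true_boxes, dtype=np.float64)

    # Corners of the overlap rectangle.
    xmin = np.maximum(pred_boxes[:, 0], true_boxes[:, 0])
    ymin = np.maximum(pred_boxes[:, 1], true_boxes[:, 1])
    xmax = np.minimum(pred_boxes[:, 2], true_boxes[:, 2])
    ymax = np.minimum(pred_boxes[:, 3], true_boxes[:, 3])

    # Clamp at zero so disjoint boxes get zero overlap instead of a negative area.
    overlap = np.maximum(xmax - xmin, 0.0) * np.maximum(ymax - ymin, 0.0)

    pred_area = (pred_boxes[:, 2] - pred_boxes[:, 0]) * (pred_boxes[:, 3] - pred_boxes[:, 1])
    true_area = (true_boxes[:, 2] - true_boxes[:, 0]) * (true_boxes[:, 3] - true_boxes[:, 1])
    union = pred_area + true_area - overlap

    return overlap / (union + eps)


true_boxes = np.array([[0.0, 0.0, 0.4, 0.4]] * 3)
pred_boxes = np.array([
    [0.0, 0.0, 0.4, 0.4],   # identical to the true box
    [0.2, 0.0, 0.6, 0.4],   # shifted right by half a box width
    [0.5, 0.5, 0.9, 0.9],   # no overlap at all
])
iou = intersection_over_union(pred_boxes, true_boxes)
print(iou)
```

In this example the identical box scores about 1, the half-shifted box about 1/3, and the disjoint box 0. A common convention is to count a prediction as correct when its IoU exceeds a chosen threshold.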
Visualize predictions
The following code will make predictions and visualize both the classification and the predicted bounding boxes.
The true bounding boxes are drawn in green, and the model's predicted bounding boxes are drawn in red.
The predicted number is shown below the image.