Real-time collaboration for Jupyter Notebooks, Linux Terminals, LaTeX, VS Code, R IDE, and more,
all in one place. Commercial Alternative to JupyterHub.
Week 2 Assignment: Zombie Detection
Welcome to this week's programming assignment! You will use the Object Detection API and retrain RetinaNet to spot zombies using just 5 training images. You will set up the model to restore pre-trained weights and fine-tune the classification layers.
Important: This colab notebook has read-only access so you won't be able to save your changes. If you want to save your work periodically, please click File -> Save a Copy in Drive
to create a copy in your account, then work from there.
Installation
You'll start by installing the TensorFlow 2 Object Detection API.
Imports
Let's now import the packages you will use in this assignment.
Exercise 1: Import Object Detection API packages
Import the necessary modules from the object_detection package.
From the utils package:
- config_util: You'll use this to read model configurations from a .config file and then modify that configuration.
- visualization_utils: Please give this the alias viz_utils, as this is what will be used in some visualization code that is given to you later.
From the builders package:
- model_builder: This builds your model according to the model configuration that you'll specify.
Utilities
You'll define a couple of utility functions for loading images and plotting detections. This code is provided for you.
Download the Zombie data
Now you will get 5 images of zombies that you will use for training.
The zombies are hosted in a Google bucket.
You can download and unzip the images into a local training/ directory by running the cell below.
Exercise 2: Visualize the training images
Next, you'll want to inspect the images that you just downloaded.
Please replace instances of None below to load and visualize the 5 training images. You can inspect the training directory (using the Files button on the left side of this Colab) to see the filenames of the zombie images. The paths for the images will look like this:
To set file paths, you'll use os.path.join. As an example, if you wanted to create the path './parent_folder/file_name1.txt', you could write:
os.path.join('./parent_folder', 'file_name' + str(1) + '.txt')
You should see the 5 training images after running this cell. If not, please inspect your code, particularly the image_path.
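The path-building pattern just described can be sketched as follows; the filenames here are an assumption for illustration (check the training/ directory in the Colab file browser for the real names):

```python
import os

# Hypothetical filenames for illustration only -- check the real names in
# the training/ directory via the Colab file browser.
train_image_dir = './training'
train_image_paths = [
    os.path.join(train_image_dir, 'zombie_' + str(i) + '.jpg')
    for i in range(1, 6)
]
# On Linux, train_image_paths[0] is './training/zombie_1.jpg'
```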
In this section, you will create your ground truth boxes. You can either draw your own boxes or use a prepopulated list of coordinates that we have provided below.
Option 1: draw your own ground truth boxes
If you want to draw your own, please run the next cell and the following test code. If not, then skip these optional cells.
Draw a box around the zombie in each image.
- Click the next image button to go to the next image.
- Click submit when it says "All images completed!!".
Make sure not to make the bounding box too big.
If the box is too big, the model might learn the features of the background (e.g. door, road, etc) in determining if there is a zombie or not.
Include the entire zombie inside the box.
As an example, scroll to the beginning of this notebook to look at the bounding box around the zombie.
View your ground truth box coordinates
Whether you chose to draw your own or use the given boxes, please check your list of ground truth box coordinates.
Below, we add the class annotations. For simplicity, we assume just a single class, though it should be straightforward to extend this to handle multiple classes. We will also convert everything to the format that the training loop expects (e.g., conversion to tensors, one-hot representations, etc.).
Exercise 3: Define the category index dictionary
You'll need to tell the model which integer class ID to assign to the 'zombie' category, and what 'name' to associate with that integer id.
- zombie_class_id: By convention, class ID integers start numbering from 1, 2, 3, onward. If there is ever a 'background' class, it could be assigned the integer 0, but in this case, you're just predicting the one zombie class. Since you are predicting just one class (zombie), please assign 1 to the zombie class ID.
- category_index: Please define the category_index dictionary, which will have the same structure as this:
Define category_index similar to the example dictionary above, except for zombies. This will be used by the succeeding functions to know the class id and name of zombie images.
- num_classes: Since you are predicting one class, please assign 1 to the number of classes that the model will predict. This will be used during data preprocessing and again when you configure the model.
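Following the exercise text above, one consistent way to fill in these three values might look like this (the dictionary shape mirrors the example structure described above; the 'zombie' name string is the only assumption):

```python
# Class IDs start at 1 by convention; 0 is reserved for a background class.
zombie_class_id = 1

# Maps the class ID to its metadata, as described above.
category_index = {
    zombie_class_id: {
        'id': zombie_class_id,
        'name': 'zombie',
    }
}

# The model predicts a single class.
num_classes = 1
```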
Expected Output:
Data preprocessing
You will now do some data preprocessing so it is formatted properly before it is fed to the model:
- Convert the class labels to one-hot representations.
- Convert everything (i.e. train images, gt boxes, and class labels) to tensors.
This code is provided for you.
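As a plain-Python illustration of the one-hot step (the provided notebook code uses TensorFlow ops such as tf.one_hot and tf.convert_to_tensor; this stand-in only shows the idea):

```python
def one_hot(class_id, num_classes):
    """Return a one-hot list for a 1-indexed class ID (a stand-in for
    tf.one_hot applied to zero-indexed labels)."""
    vec = [0.0] * num_classes
    vec[class_id - 1] = 1.0   # label 1 (zombie) maps to index 0
    return vec

# Five training images, all labeled 'zombie' (class ID 1):
gt_classes_one_hot = [one_hot(1, num_classes=1) for _ in range(5)]
```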
Visualize the zombies with their ground truth bounding boxes
You should see the 5 training images with the bounding boxes after running the cell below. If not, please re-run the annotation tool again or use the prepopulated gt_boxes array given.
Download the checkpoint containing the pre-trained weights
Next, you will download RetinaNet and copy it inside the object detection directory.
When working with models that are at the frontiers of research, the models and checkpoints may not yet be organized in a central location like the TensorFlow Garden (https://github.com/tensorflow/models).
You'll often read a blog post from the researchers, who will usually provide information on:
how to use the model
where to download the models and pre-trained checkpoints.
It's good practice to do some of this "detective work", so that you'll feel more comfortable when exploring new models yourself! So please try the following steps:
Go to the TensorFlow Blog, where researchers announce new findings.
In the search box at the top of the page, search for "retinanet".
In the search results, click on the blog post titled "TensorFlow 2 meets the Object Detection API" (it may be the first search result).
Skim through this blog and look for links to either the checkpoints or to Colabs that will show you how to use the checkpoints.
Try to fill out the following code cell below, which does the following:
Download the compressed SSD Resnet 50 version 1, 640 x 640 checkpoint.
Untar (decompress) the tar file
Move the decompressed checkpoint to models/research/object_detection/test_data/
If you want some help getting started, please click on the "Initial Hints" cell to get some hints.
Initial Hints
General Hints to get started
- The link to the blog is TensorFlow 2 meets the Object Detection API
- In the blog, you'll find the text "COCO pre-trained weights", which links to a list of checkpoints in GitHub titled TensorFlow 2 Detection Model Zoo.
- If you read each checkpoint name, you'll find the one for SSD Resnet 50 version 1, 640 by 640.
- If you right-click on the desired checkpoint link, you can save the link address, and use it in the code cell below to get the checkpoint.
- For more hints, please click on the cell "More Hints"
More Hints
- To see how to download the checkpoint, look in the blog for links to Colab tutorials.
- For example, the blog links to a Colab titled Intro to Object Detection Colab
- In the Colab, you'll see the section titled "Build a detection model and load pre-trained model weights", which is followed by a code cell showing how to download, decompress, and relocate a checkpoint. Use similar syntax, except use the URL to the ssd resnet50 version 1 640x640 checkpoint instead.
- If you're feeling stuck, please click on the cell "Even More Hints".
Even More Hints
- The blog post also links to a notebook titled Eager Few Shot Object Detection Colab
- In this notebook, look for the section titled "Create model and restore weights for all but last layer". The code cell below it shows how to download the exact checkpoint that you're interested in.
- You can also review the lecture videos for this week, which show the same code.
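The three steps (download, untar, move) can be sketched in plain Python. The commented-out checkpoint URL is an assumption you should verify against the Model Zoo page you found above, and the notebook itself may use shell commands (!wget, !tar) instead:

```python
import os
import shutil
import tarfile
import urllib.request

def fetch_and_extract(url, dest_dir):
    """Download a .tar.gz archive, decompress it, and move the extracted
    folder into dest_dir."""
    fname = url.split('/')[-1]                 # e.g. 'something.tar.gz'
    urllib.request.urlretrieve(url, fname)     # download into the cwd
    with tarfile.open(fname) as tar:
        tar.extractall()                       # untar next to it
    extracted = fname.replace('.tar.gz', '')
    shutil.move(extracted, os.path.join(dest_dir, extracted))

# The URL below is an assumption taken from the TF2 Detection Model Zoo --
# verify it against the link you found in the blog before using it:
# CHECKPOINT_URL = ('http://download.tensorflow.org/models/object_detection/'
#                   'tf2/20200711/ssd_resnet50_v1_fpn_640x640_coco17_tpu-8.tar.gz')
# fetch_and_extract(CHECKPOINT_URL, 'models/research/object_detection/test_data/')
```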
Configure the model
Here, you will configure the model for this use case.
Exercise 5.1: Locate and read from the configuration file
pipeline_config
In the Colab, on the left side table of contents, click on the folder icon to display the file browser for the current workspace.
Navigate to models/research/object_detection/configs/tf2. The folder has multiple .config files. Look for the file corresponding to ssd resnet 50 version 1 640x640.
You can double-click the config file to view its contents. This may help you as you complete the next few code cells to configure your model.
Set the pipeline_config to a string that contains the full path to the resnet config file, in other words: models/research/.../... .config
configs
If you look at the module config_util that you imported, it contains the following function:
Please use this function to load the configuration from your pipeline_config. configs will now contain a dictionary.
Exercise 5.3: Modify model_config
Modify num_classes from the default 90 to the num_classes that you set earlier in this notebook. num_classes is nested under ssd. You'll need to use dot notation obj.x and NOT bracket notation obj['x'] to access num_classes.
Freeze batch normalization
Batch normalization is not frozen in the default configuration.
If you inspect the model_config object, you'll see that freeze_batchnorm is nested under ssd just like num_classes. Freeze batch normalization by setting the relevant field to True.
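Here is a toy illustration of the dot-notation requirement. SimpleNamespace stands in for the protobuf config object, and only the field names ssd.num_classes and ssd.freeze_batchnorm are taken from the discussion above:

```python
from types import SimpleNamespace

# SimpleNamespace stands in for the protobuf message inside the configs
# dictionary; like the real object, it requires attribute (dot) access.
model_config = SimpleNamespace(
    ssd=SimpleNamespace(num_classes=90, freeze_batchnorm=False)
)

model_config.ssd.num_classes = 1           # match the num_classes set earlier
model_config.ssd.freeze_batchnorm = True   # freeze batch normalization

# Bracket notation such as model_config['ssd'] would raise a TypeError on
# the real protobuf object, just as it does on this stand-in.
```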
Build the model
Recall that you imported model_builder.
You'll use model_builder to build the model according to the configurations that you have just downloaded and customized.
Exercise 5.4: Build the custom model
model_builder has a function build:
model_config: Set this to the model configuration that you just customized.
is_training: Set this to True.
You can keep the default value for the remaining parameter.
Note that it will take some time to build the model.
Expected Output:
Restore weights from your checkpoint
Now, you will selectively restore weights from your checkpoint.
Your end goal is to create a custom model which reuses parts of, but not all of, the layers of RetinaNet (currently stored in the variable detection_model). The parts of RetinaNet that you want to reuse are:
Feature extraction layers
Bounding box regression prediction layer
The part of RetinaNet that you will not want to reuse is the classification prediction layer (since you will define and train your own classification layer specific to zombies).
For the parts of RetinaNet that you want to reuse, you will also restore the weights from the checkpoint that you selected.
Inspect the detection_model
First, take a look at the type of the detection_model and its Python class.
Find the source code for detection_model
You'll see that the type of the model is object_detection.meta_architectures.ssd_meta_arch.SSDMetaArch. Please practice some detective work and open up the source code for this class in the GitHub repository. Recall that at the start of this assignment, you cloned from this repository: TensorFlow Models.
Navigate through these subfolders: models -> research -> object_detection.
If you get stuck, go to this link: object_detection
Take a look at this 'object_detection' folder and look for the remaining folders to navigate based on the class type of detection_model: object_detection.meta_architectures.ssd_meta_arch.SSDMetaArch
Hopefully you'll find the meta_architectures folder, and within it you'll notice a file named ssd_meta_arch.py. Please open and view this ssd_meta_arch.py file.
View the variables in detection_model
Now, check the class variables that are in detection_model.
You'll see that detection_model contains several variables:
Two of these will be relevant to you:
Inspect _feature_extractor
Take a look at the ssd_meta_arch.py code.
So detection_model._feature_extractor is a feature extractor, which you will want to reuse for your zombie detector model.
Inspect _box_predictor
View the ssd_meta_arch.py file (which is the source code for detection_model)
Notice that in the init constructor for class SSDMetaArch(model.DetectionModel),
Inspect _box_predictor
Please take a look at the class type of detection_model._box_predictor
You'll see that the class type of _box_predictor is
You can navigate through the GitHub repository to this path:
Notice that there is a file named convolutional_keras_box_predictor.py. Please open that file.
View variables in _box_predictor
Also view the variables contained in _box_predictor:
Among the variables listed, a few will be relevant to you:
In the source code for convolutional_keras_box_predictor.py that you just opened, look at the source code to get a sense for what these three variables represent.
Inspect base_tower_layers_for_heads
If you look at the convolutional_keras_box_predictor.py file, you'll notice this:
base_tower_layers_for_heads is a dictionary with two key-value pairs:
- BOX_ENCODINGS: points to a list of layers
- CLASS_PREDICTIONS_WITH_BACKGROUND: points to a list of layers
If you scan the code, you'll see that for both of these, the lists are filled with all the layers that appear BEFORE the prediction layer.
So detection_model._box_predictor._base_tower_layers_for_heads contains:
- The layers that come right before the final bounding box prediction
- The layers that come right before the final class prediction.
You will want to use these in your model.
Inspect _box_prediction_head
If you again look at the convolutional_keras_box_predictor.py file, you'll see this:
So detection_model._box_predictor._box_prediction_head points to the bounding box prediction layer, which you'll want to use for your model.
Inspect _prediction_heads
If you again look at convolutional_keras_box_predictor.py file, you'll see this
You'll also see this docstring
So detection_model._box_predictor._prediction_heads is a dictionary that points to both prediction layers:
- The layer that predicts the bounding boxes
- The layer that predicts the class (category).
Which layers will you reuse?
Remember that you are reusing the model for its feature extraction and bounding box detection.
You will create your own classification layer and train it on zombie images.
So you won't need to reuse the class prediction layer of detection_model.
Define checkpoints for desired layers
You will now isolate the layers of detection_model that you wish to reuse so that you can restore the weights to just those layers.
First, define checkpoints for the box predictor
Next, define checkpoints for the model, which will point to this box predictor checkpoint as well as the feature extraction layers.
Please use tf.train.Checkpoint.
As a reminder of how to use tf.train.Checkpoint:
Pretend that detection_model contains these variables for which you want to restore weights:
- detection_model._ice_cream_sundae
- detection_model._pies._apple_pie
- detection_model._pies._pecan_pie
Notice that the pies are nested within ._pies.
If you just want the ice cream sundae and apple pie variables (and not the pecan pie) then you can do the following:
Next, in order to connect these together in a node graph, do this:
Finally, define a checkpoint that uses the key model and takes in the tmp_model_checkpoint.
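Since tf.train.Checkpoint simply turns each keyword argument into a tracked attribute, the toy example above can be illustrated with a minimal stand-in class (in the notebook you would use the real tf.train.Checkpoint, which tracks variables the same way via keyword arguments):

```python
class Checkpoint:
    """Minimal stand-in for tf.train.Checkpoint: every keyword argument
    becomes a tracked attribute, forming a node graph of objects."""
    def __init__(self, **kwargs):
        for name, value in kwargs.items():
            setattr(self, name, value)

# Toy detection_model matching the pretend variables above:
class Pies:
    _apple_pie = 'apple weights'
    _pecan_pie = 'pecan weights'

class ToyDetectionModel:
    _ice_cream_sundae = 'sundae weights'
    _pies = Pies()

detection_model = ToyDetectionModel()

# Track only the apple pie (not the pecan pie):
tmp_pies_checkpoint = Checkpoint(_apple_pie=detection_model._pies._apple_pie)

# Connect the nodes together, keeping the sundae at the top level:
tmp_model_checkpoint = Checkpoint(
    _ice_cream_sundae=detection_model._ice_cream_sundae,
    _pies=tmp_pies_checkpoint,
)

# Finally, a checkpoint that uses the key 'model':
checkpoint = Checkpoint(model=tmp_model_checkpoint)
```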
You'll then be ready to restore the weights from the checkpoint that you downloaded.
Try this out step by step!
Exercise 6.1: Define Checkpoints for the box predictor
Please define box_predictor_checkpoint to be a checkpoint for these two layers of the detection_model's box predictor:
- The base tower layers (the layers that precede both the class prediction and bounding box prediction layers).
- The box prediction head (the prediction layer for bounding boxes).
Note, you won't include the class prediction layer.
Expected output
You should expect to see a list of variables that include the following:
Expected output
Among the variables of this checkpoint, you should see:
Exercise 6.3: Restore the checkpoint
You can now restore the checkpoint.
First, find and set the checkpoint_path.
- Using the "files" browser on the left side of Colab, navigate to models -> research -> object_detection -> test_data.
- If you completed the previous code cell that downloads and moves the checkpoint, you'll see a subfolder named "checkpoint".
The 'checkpoint' folder contains three files:
checkpoint
ckpt-0.data-00000-of-00001
ckpt-0.index
Please set checkpoint_path to the full path models/.../ckpt-0. Notice that you don't want to include a file extension after ckpt-0.
IMPORTANT: Please don't set the path to include the .index extension in the checkpoint file name. If you do set it to ckpt-0.index, there won't be any immediate error message, but later during training, you'll notice that your model's loss doesn't improve, which means that the pre-trained weights were not restored properly.
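A checkpoint path built with os.path.join, with no file extension, might look like this (the directory layout follows the test_data folder described above):

```python
import os

# No '.index' or '.data-*' suffix: restore() expects the common prefix
# 'ckpt-0' shared by the checkpoint's files.
checkpoint_path = os.path.join(
    'models', 'research', 'object_detection', 'test_data',
    'checkpoint', 'ckpt-0'
)
```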
Next, define one last checkpoint using tf.train.Checkpoint().
For the single keyword argument,
Set the key as
model=
Set the value to your temporary model checkpoint that you just defined.
IMPORTANT: You'll need to set the keyword argument as model= and not something else like detection_model=. If you set this keyword argument to anything else, it won't show an immediate error, but when you train your model on the zombie images, your model loss will not decrease (your model will not learn).
Finally, call this checkpoint's .restore() function, passing in the path to the checkpoint.
Exercise 7: Run a dummy image to generate the model variables
Run a dummy image through the model so that its variables are created. You'll need to select the trainable variables later in Exercise 9, and right now that list is still empty: try running len(detection_model.trainable_variables) in a code cell and you will get 0. Passing a dummy image through the forward pass creates these variables.
Recall that detection_model is an object of type object_detection.meta_architectures.ssd_meta_arch.SSDMetaArch. Important methods that are available in the detection_model object are:
preprocess(): takes in a tensor representing an image and returns image, shapes.
For the dummy image, you can declare a tensor of zeros with a shape that the preprocess() method can accept (i.e. [batch, height, width, channels]). Remember that your images have dimensions 640 x 640 x 3, and you can pass in a batch of 1 when making the dummy image.
predict(): takes in image, shapes, which are created by the preprocess() function call, and returns a prediction in a Python dictionary. This will pass the dummy image through the forward pass of the network and create the model variables.
postprocess(): takes in the prediction_dict and shapes, and returns a dictionary of post-processed predictions of detected objects ("detections").
Note: Please use the recommended variable names, which include the prefix tmp_, since these variables won't be used later; you'll define similarly-named variables later for predicting on actual zombie images.
Expected Output:
Eager mode custom training loop
With the data and model now set up, you can proceed to configure the training.
Exercise 8: Set training hyperparameters
Set an appropriate learning rate and optimizer for the training.
- batch_size: You can use 4. You can increase the batch size up to 5, since you have just 5 images for training.
- num_batches: You can use 100. You can increase the number of batches, but the training will take longer to complete.
- learning_rate: You can use 0.01. When you run the training loop later, notice how the initial loss INCREASES before decreasing. You can try a lower learning rate to see if you can avoid this increased loss.
- optimizer: You can use tf.keras.optimizers.SGD. Set the learning rate, and set the momentum to 0.9.
Training will be fairly quick, so we do encourage you to experiment a bit with these hyperparameters!
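For intuition, SGD with momentum applies the update v <- momentum * v - learning_rate * grad, then w <- w + v. A plain-Python sketch of a few updates on a single scalar weight (the notebook uses tf.keras.optimizers.SGD, which applies this per variable):

```python
def sgd_momentum_step(w, grad, velocity, learning_rate=0.01, momentum=0.9):
    """One SGD-with-momentum update (mirrors the Keras formulation:
    velocity = momentum * velocity - learning_rate * grad; w = w + velocity)."""
    velocity = momentum * velocity - learning_rate * grad
    return w + velocity, velocity

w, v = 1.0, 0.0
for _ in range(3):  # pretend the gradient is a constant 0.5 for three steps
    w, v = sgd_momentum_step(w, grad=0.5, velocity=v)
# the weight moves further on each step as the velocity accumulates
```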
Choose the layers to fine-tune
To make use of transfer learning and pre-trained weights, you will train just certain parts of the detection model, namely, the last prediction layers.
Please take a minute to inspect the layers of
detection_model
.
Notice that there are some layers whose names are prefixed with the following:
Among these, which do you think are the prediction layers at the "end" of the model?
Recall that when inspecting the source code to restore the checkpoints (convolutional_keras_box_predictor.py), you noticed that:
- _base_tower_layers_for_heads: refers to the layers that are placed right before the prediction layer
- _box_prediction_head: refers to the prediction layer for the bounding boxes
- _prediction_heads: refers to the set of prediction layers (both for classification and for bounding boxes)
So you can see that in the source code for this model, "tower" refers to layers that are before the prediction layer, and "head" refers to the prediction layers.
Exercise 9: Select the prediction layer variables
Based on inspecting detection_model.trainable_variables, please select the prediction layer variables that you will fine-tune:
The bounding box head variables (which predict bounding box coordinates)
The class head variables (which predict the class/category)
You have a few options for doing this:
You can access them by their list index:
Alternatively, you can use string matching to select the variables:
Hint: There are a total of four variables that you want to fine tune.
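A sketch of the string-matching option. The variable names below are made-up stand-ins shaped like the real ones (print detection_model.trainable_variables in the notebook to confirm the actual prefixes):

```python
# Made-up stand-ins for variable names; the real names come from
# detection_model.trainable_variables, so verify the exact prefixes there.
all_variable_names = [
    'WeightSharedConvolutionalBoxPredictor/BoxPredictionTower/conv2d_0/kernel',
    'WeightSharedConvolutionalBoxPredictor/WeightSharedConvolutionalBoxHead/BoxPredictor/kernel',
    'WeightSharedConvolutionalBoxPredictor/WeightSharedConvolutionalBoxHead/BoxPredictor/bias',
    'WeightSharedConvolutionalBoxPredictor/WeightSharedConvolutionalClassHead/ClassPredictor/kernel',
    'WeightSharedConvolutionalBoxPredictor/WeightSharedConvolutionalClassHead/ClassPredictor/bias',
    'ResNet50V1_FPN/bottleneck_block/conv2d/kernel',
]

# Select only the box head and class head ("head" = prediction layer):
prefixes_to_train = [
    'WeightSharedConvolutionalBoxPredictor/WeightSharedConvolutionalBoxHead',
    'WeightSharedConvolutionalBoxPredictor/WeightSharedConvolutionalClassHead',
]

to_fine_tune = [
    name for name in all_variable_names
    if any(name.startswith(prefix) for prefix in prefixes_to_train)
]
# Four variables match: box head kernel/bias plus class head kernel/bias.
```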
Expected Output:
Train your model
You'll define a function that handles training for one batch, which you'll later use in your training loop.
First, walk through these code cells to learn how you'll perform training using this model.
The detection_model is of class SSDMetaArch, and its source code shows that it has the function preprocess. This preprocesses the images so that they can be passed into the model (for training or prediction):
You can preprocess each image and save the outputs into two separate lists:
- One list of the preprocessed images
- One list of the true shape for each preprocessed image
Make a prediction
The detection_model also has a .predict function. According to the source code for predict:
Notice that .predict takes its inputs as tensors. If you try to pass in the preprocessed images and true shapes as lists, you'll get an error. But don't worry! You can check how to properly use predict:
Notice that the source code documentation says that preprocessed_inputs and true_image_shapes are expected to be tensors and not lists of tensors. One way to turn a list of tensors into a single tensor is to use tf.concat.
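As a list-level analogue of what tf.concat(preprocessed_image_list, axis=0) does along the batch axis:

```python
# Each nested list stands in for a [1, H, W, C] tensor in the list that
# preprocess() produced; concatenating along axis 0 chains the batch entries.
preprocessed_image_list = [[['img1-data']], [['img2-data']], [['img3-data']]]

# List-level equivalent of tf.concat(preprocessed_image_list, axis=0):
preprocessed_image_tensor = [
    batch_entry
    for image in preprocessed_image_list
    for batch_entry in image
]
# preprocessed_image_tensor now has one entry per image, i.e. batch size 3
```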
Now you can make predictions for the images. According to the source code, predict returns a dictionary containing the prediction information, including:
The bounding box predictions
The class predictions
Calculate loss
Now that your model has made its prediction, you want to compare it to the ground truth in order to calculate a loss.
The detection_model has a loss function. It takes in:
- The prediction dictionary that comes from your call to .predict().
- The true image shapes that come from your call to .preprocess(), followed by the conversion from a list to a tensor.
Try calling .loss. You'll see an error message that you'll address in order to run the .loss function.
This is giving an error about groundtruth_classes_list:
Notice in the docstring for loss (shown above), it says:
So you'll first want to set the ground truth (true labels and true bounding boxes) before you calculate the loss.
This makes sense, since the loss is comparing the prediction to the ground truth, and so the loss function needs to know the ground truth.
Provide the ground truth
The source code for providing the ground truth is located in the parent class of SSDMetaArch, model.DetectionModel.
Here is the link to the code for provide_ground_truth.
You'll set two parameters in provide_ground_truth:
The true bounding boxes
The true classes
Now you can calculate the loss
You can now calculate the gradient and optimize the variables that you selected to fine tune.
Use tf.GradientTape
Exercise 10: Define the training step
Please complete the function below to set up one training step.
Preprocess the images
Make a prediction
Calculate the loss (and make sure the loss function has the ground truth to compare with the prediction)
Calculate the total loss:
total_loss = localization_loss + classification_loss
Note: this is different from the example code that you saw above.
Calculate gradients with respect to the variables you selected to train.
Optimize the model's variables
Run the training loop
Run the training loop using the training step function that you just defined.
Expected Output:
Total loss should be decreasing and should be less than 1 after fine tuning. For example:
Load test images and run inference with new model!
You can now test your model on a new set of images. The cell below downloads 237 images of a walking zombie and stores them in a results/ directory.
You will load these images into numpy arrays to prepare them for inference.
You can now loop through the test images and get the detection scores and bounding boxes to overlay on the original images. We will save each result in a results dictionary, and the autograder will use this to evaluate your results.
Expected Output: Ideally the three boolean values at the bottom should be True. But if you only get two, you can still try submitting. This compares your resulting bounding boxes for each zombie image to some preloaded coordinates (i.e. the hardcoded values in the test cell above). Depending on how you annotated the training images, it's possible that some of your results differ for these three frames but you still get good results overall when all images are examined by the grader. If two or all are False, please try annotating the images again with a tighter bounding box, or use the predefined gt_boxes list.
You can also check if the model detects a zombie class in the images by examining the scores key of the results dictionary. You should get scores higher than 88.0 here.
You can also display some still frames and inspect them visually. If you don't see a bounding box around the zombie, please consider re-annotating the ground truth or using the predefined gt_boxes here.
Create a zip of the zombie-walk images.
You can download this if you'd like to create your own animations.
Create Zombie animation
Unfortunately, using IPyImage in the notebook (as you've done in the rubber ducky detection tutorial) for the large gif generated here will disconnect the runtime. To view the animation, you can instead use the Files pane on the left and double-click on zombie-anim.gif. That will open a preview page on the right. It will take 2 to 3 minutes to load and see the walking zombie.
Save results file for grading
Run the cell below to save your results. Download the results.data file and upload it to the grader in the classroom.
Congratulations on completing this assignment! Please go back to the Coursera classroom and upload results.data to the Graded Lab item for Week 2.