Predicting Bounding Boxes
Welcome to Course 3, Week 1 Programming Assignment!
In this week's assignment, you'll build a model to predict bounding boxes around the birds in images.
You will use transfer learning on any of the pre-trained models available in Keras.
You'll be using the Caltech Birds - 2010 dataset.
How to submit your work
Notice that there is not a "submit assignment" button in this notebook.
To have your work checked and graded, you'll train the model, save it, and then upload the saved model to Coursera.
0.1 Set up your Colab
Because you cannot save the changes you make to this Colab, you have to make a copy of this notebook in your own Drive and run that copy.
You can do so by going to File -> Save a copy in Drive.
Close this Colab and open the copy that you made in your own Drive. Then continue to the next step to set up the data location.
Set up the data location
A copy of the dataset that you'll be using is stored in a publicly viewable Google Drive folder. You'll want to add a shortcut to it to your own Google Drive.
Go to this Google Drive folder named TF3 C3 W1 Data.
Next to the folder name "TF3 C3 W1 Data", hover your mouse over the triangle to reveal the drop-down menu.
Use the drop-down menu to select "Add shortcut to Drive". A pop-up menu will open.
In the pop-up menu, "My Drive" is selected by default. Click the ADD SHORTCUT button. This should add a shortcut to the folder TF3 C3 W1 Data within your own Google Drive at the location content/drive.
To verify, go to the left-side menu and click on "My Drive". Scroll through your files to look for the shortcut TF3 C3 W1 Data.
Please make sure this happens, as you'll be reading the data for this notebook from this folder.
0.4 Mount your drive
Please run the next code cell and follow these steps to mount your Google Drive so that it can be accessed by this Colab.
Run the code cell below. A web link will appear below the cell.
Please click on the web link. It opens a new tab in your browser that asks you to choose your Google account.
Choose your Google account to log in.
The page will display "Google Drive File Stream wants to access your Google Account". Please click "Allow".
The page will now show a code (a line of text). Please copy the code and return to this Colab.
Paste the code into the text box that is labeled "Enter your authorization code:" and press <Enter>.
The text will now say "Mounted at /content/drive/"
Store the path to the data.
Remember to follow the steps to set up the data location (above) so that you'll have a shortcut to the data in your Google Drive.
1.1 Bounding Boxes Utilities
We have provided you with some functions which you will use to draw bounding boxes around the birds in the image.
draw_bounding_box_on_image: Draws a single bounding box on an image.
draw_bounding_boxes_on_image: Draws multiple bounding boxes on an image.
draw_bounding_boxes_on_image_array: Draws multiple bounding boxes on an array of images.
1.2 Data and Predictions Utilities
We've given you some helper functions and code that are used to visualize the data and the model's predictions.
display_digits_with_boxes: This displays a row of "digit" images along with the model's predictions for each image.
plot_metrics: This plots a given metric (like loss) as it changes over multiple epochs of training.
read_image_tfds
Resizes image to (224, 224).
Normalizes image.
Translates and normalizes bounding boxes.
read_image_with_shape
This is very similar to read_image_tfds, except it also keeps a copy of the original image (before pre-processing) and returns this as well.
Makes a copy of the original image.
Resizes image to (224, 224).
Normalizes image.
Translates and normalizes bounding boxes.
read_image_tfds_with_original_bbox
This function reads image from data.
It also denormalizes the bounding boxes (it undoes the bounding box normalization that is performed by the previous two helper functions).
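To make the normalize and denormalize steps concrete, here is a minimal sketch in plain Python. It is not the notebook's helper code (those helpers work on TensorFlow tensors). The function names, the (xmin, ymin, xmax, ymax) pixel-coordinate layout, and the choice of scaling pixels to the range [-1, 1] are assumptions made for illustration.

```python
def normalize_pixel(value):
    """Map an 8-bit pixel value from [0, 255] to [-1.0, 1.0] (assumed convention)."""
    return value / 127.5 - 1.0


def normalize_bbox(bbox, width, height):
    """Scale pixel coordinates (xmin, ymin, xmax, ymax) by the image size."""
    xmin, ymin, xmax, ymax = bbox
    return (xmin / width, ymin / height, xmax / width, ymax / height)


def denormalize_bbox(bbox, width, height):
    """Undo normalize_bbox: map fractions of the image size back to pixels."""
    xmin, ymin, xmax, ymax = bbox
    return (xmin * width, ymin * height, xmax * width, ymax * height)


# Hypothetical 500 x 400 image with a box given in pixel coordinates.
original = (50.0, 100.0, 250.0, 300.0)
normalized = normalize_bbox(original, width=500, height=400)
restored = denormalize_bbox(normalized, width=500, height=400)

print(normalized)
print(restored)
print(normalize_pixel(0), normalize_pixel(255))
```

Because the normalized box is expressed as fractions of the image size, it stays valid after the image is resized to (224, 224), and multiplying by the original width and height recovers the original pixel coordinates.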
dataset_to_numpy_util
This function converts a dataset into numpy arrays of images and boxes.
This will be used when visualizing the images and their bounding boxes.
dataset_to_numpy_with_original_bboxes_util
This function converts a dataset into numpy arrays of:
original images
resized and normalized images
bounding boxes
This will be used for plotting the original images with true and predicted bounding boxes.
Visualize the training images and their bounding box labels
Visualize the validation images and their bounding boxes
2.3 Load and prepare the datasets for the model
These next two functions read and prepare the datasets that you'll feed to the model.
They use read_image_tfds to resize and normalize each image and its bounding box label.
They also perform shuffling and batching.
You'll use these functions to create training_dataset and validation_dataset, which you will give to the model that you're about to build.
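The map, shuffle, and batch pattern described above can be sketched with tf.data on synthetic data. This is not the notebook's get_training_dataset: the preprocess function is a simplified, hypothetical stand-in for read_image_tfds (it only resizes and rescales the image and passes the box through unchanged), and the image sizes, buffer size, and batch size are made-up values.

```python
import tensorflow as tf


def preprocess(image, bbox):
    """Hypothetical stand-in for read_image_tfds: resize and rescale the image."""
    image = tf.image.resize(image, (224, 224))  # returns float32
    image = image / 127.5 - 1.0                 # assumed scaling to [-1, 1]
    return image, bbox


# Synthetic stand-in data: 10 blank 100 x 120 RGB images with one box each.
images = tf.zeros((10, 100, 120, 3), dtype=tf.uint8)
boxes = tf.zeros((10, 4), dtype=tf.float32)

dataset = (
    tf.data.Dataset.from_tensor_slices((images, boxes))
    .map(preprocess)
    .shuffle(10)  # buffer covers the whole toy dataset
    .batch(4)
)

# Inspect the shapes of the first batch.
for image_batch, box_batch in dataset.take(1):
    print(image_batch.shape, box_batch.shape)
```

Each element the model receives is a pair of an image batch and a box batch, which is why the dataset can later be passed to fit without a separate label argument.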
3. Define the Network
Bounding box prediction is treated as a "regression" task, in that you want the model to output numerical values.
You will be performing transfer learning with MobileNet V2. The model architecture is available in TensorFlow Keras.
You'll also use pretrained 'imagenet' weights as a starting point for further training. These weights are also readily available.
You will choose to retrain all layers of MobileNet V2 along with the final layers that you add on top of it.
Note: For the following exercises, please use the TensorFlow Keras Functional API (as opposed to the Sequential API).
Exercise 1
Please build a feature extractor using MobileNetV2.
First, create an instance of the MobileNetV2 model.
Please check out the documentation for MobileNetV2
Set the following parameters:
input_shape: (height, width, channel): input images have height and width of 224 by 224, and have red, green and blue channels.
include_top: you do not want to keep the "top" fully connected layer, since you will customize your model for the current task.
weights: Use the pre-trained 'imagenet' weights.
Next, make the feature extractor for your specific inputs by passing the inputs into your mobilenet model.
For example, if you created a model object called some_model and have inputs stored in x, you'd invoke the model and pass in your inputs like this: some_model(x) to get the feature extractor for your given inputs x.
Note: please use mobilenet_v2 and not mobile_net or mobile_net_v3.
Exercise 2
Next, you'll define the dense layers to be used by your model.
You'll be using the following layers:
GlobalAveragePooling2D: pools the features.
Flatten: flattens the pooled layer.
Dense: Add two dense layers:
A dense layer with 1024 neurons and a relu activation.
A dense layer following that with 512 neurons and a relu activation.
Note: Remember, please build the model using the Functional API syntax (as opposed to the Sequential API).
Exercise 3
Now you'll define a layer that outputs the bounding box predictions.
You'll use a Dense layer.
Remember that you have 4 units in the output layer, corresponding to (xmin, ymin, xmax, ymax).
The prediction layer follows the previous dense layer, which is passed into this function as the variable x.
For grading purposes, please set the name parameter of this Dense layer to be 'bounding_box'.
Exercise 4
Now, you'll use those functions that you have just defined above to construct the model.
feature_extractor(inputs)
dense_layers(features)
bounding_box_regression(x)
Then you'll define the model object using Model. Set the two parameters:
inputs
outputs
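The chain of calls in Exercises 1 to 4 can be sketched with the Keras Functional API as follows. Treat this as one possible reading of the instructions, not the graded solution. The weights argument on feature_extractor and final_model is a hypothetical addition so that the sketch can be built with weights=None without downloading anything; the assignment itself asks for the 'imagenet' weights.

```python
import tensorflow as tf


def feature_extractor(inputs, weights="imagenet"):
    """Run the inputs through MobileNetV2 without its top classifier."""
    mobilenet = tf.keras.applications.MobileNetV2(
        input_shape=(224, 224, 3), include_top=False, weights=weights
    )
    return mobilenet(inputs)


def dense_layers(features):
    """Pool and flatten the features, then apply two relu dense layers."""
    x = tf.keras.layers.GlobalAveragePooling2D()(features)
    x = tf.keras.layers.Flatten()(x)
    x = tf.keras.layers.Dense(1024, activation="relu")(x)
    x = tf.keras.layers.Dense(512, activation="relu")(x)
    return x


def bounding_box_regression(x):
    """Four linear outputs: (xmin, ymin, xmax, ymax)."""
    return tf.keras.layers.Dense(4, name="bounding_box")(x)


def final_model(inputs, weights="imagenet"):
    """Connect the three pieces and wrap them in a Model."""
    features = feature_extractor(inputs, weights=weights)
    x = dense_layers(features)
    outputs = bounding_box_regression(x)
    return tf.keras.Model(inputs=inputs, outputs=outputs)


inputs = tf.keras.layers.Input(shape=(224, 224, 3))
# weights=None keeps this sketch offline; use "imagenet" for transfer learning.
model = final_model(inputs, weights=None)
print(model.output_shape)
```

Each function takes a tensor and returns a tensor, which is what distinguishes the Functional API from stacking layers in a Sequential model.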
Exercise 5
Define the input layer, define the model, and then compile the model.
inputs: define an Input layer.
Set the shape parameter. Check your definition of feature_extractor to see the expected dimensions of the input image.
model: use the final_model function that you just defined to create the model.
compile the model: Check the Model documentation for how to compile the model.
Set the optimizer parameter to Stochastic Gradient Descent using SGD.
When using SGD, set the momentum to 0.9 and keep the default learning rate.
Set the loss function to mean squared error (see the SGD documentation for an example of how to choose mean squared error loss).
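As a rough illustration of the compile step, the sketch below compiles a tiny stand-in model with SGD (momentum 0.9, default learning rate) and mean squared error loss. The stand-in model with 8 input features is hypothetical and only keeps the example small; in the assignment you would compile the model returned by final_model.

```python
import tensorflow as tf

# Tiny stand-in model (hypothetical): 8 input features, 4 box coordinates.
inputs = tf.keras.layers.Input(shape=(8,))
outputs = tf.keras.layers.Dense(4, name="bounding_box")(inputs)
model = tf.keras.Model(inputs=inputs, outputs=outputs)

# SGD with momentum 0.9; the learning rate is left at its default value.
model.compile(
    optimizer=tf.keras.optimizers.SGD(momentum=0.9),
    loss="mse",  # mean squared error
)

model.summary()
```

Note that the loss is an argument to compile, not a setting of the optimizer itself.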
Run the cell below to define your model and print the model summary.
Your expected model summary:
4.1 Prepare to Train the Model
You'll fit the model here, but first you'll set some of the parameters that go into fitting the model.
EPOCHS: You'll train the model for 50 epochs.
BATCH_SIZE: Set the BATCH_SIZE to an appropriate value. You can look at the ungraded labs from this week for some examples.
length_of_training_dataset: this is the number of training examples. You can find this value by getting the length of visualization_training_dataset.
Note: You won't be able to get the length of the object training_dataset. (You'll get an error message.)
length_of_validation_dataset: this is the number of validation examples. You can find this value by getting the length of visualization_validation_dataset.
Note: You won't be able to get the length of the object validation_dataset.
steps_per_epoch: This is the number of steps it will take to process all of the training data.
If the number of training examples is not evenly divisible by the batch size, there will be one last batch that is not the full batch size.
Try to calculate the number of steps it would take to train on all the full batches plus one more batch containing the remaining training examples. There are a couple of ways you can calculate this.
You can use regular division / and import math to use math.ceil() (see the Python math module docs).
Alternatively, you can use // for integer division, % to check for a remainder after integer division, and an if statement.
validation_steps: This is the number of steps it will take to process all of the validation data. You can use calculations similar to those for steps_per_epoch, but for the validation dataset.
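Both ways of counting steps can be sketched in a few lines. The function names and the example counts (3000 examples, batch size 64) are made up for illustration and are not the values used in this assignment.

```python
import math


def steps_with_ceil(num_examples, batch_size):
    """Regular division, rounded up to include the last partial batch."""
    return math.ceil(num_examples / batch_size)


def steps_with_remainder(num_examples, batch_size):
    """Integer division, plus one extra step if examples are left over."""
    steps = num_examples // batch_size
    if num_examples % batch_size > 0:
        steps += 1
    return steps


# 3000 examples with batch size 64: 46 full batches plus one batch of 56.
print(steps_with_ceil(3000, 64))       # 47
print(steps_with_remainder(3000, 64))  # 47

# When the batch size divides evenly, no extra step is added.
print(steps_with_ceil(128, 64))        # 2
```

The same calculation applies to validation_steps, using the number of validation examples instead.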
4.2 Fit the model to the data
Check out the parameters that you can set to fit the Model. Please set the following parameters.
x: this can be a tuple of both the features and labels, as is the case here when using a tf.data dataset.
Please use the variable returned from get_training_dataset().
Note: don't set the y parameter when x is already set to both the features and labels.
steps_per_epoch: the number of steps to train in order to train on all examples in the training dataset.
validation_data: this is a tuple of both the features and labels of the validation set.
Please use the variable returned from get_validation_dataset().
validation_steps: the number of steps to go through the validation set, batch by batch.
epochs: the number of epochs.
If all goes well your model's training will start.
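For reference, here is a small, self-contained sketch of a fit call that uses these parameters. Everything in it is a stand-in: the model is a single Dense layer, the data is random, and the sizes are tiny so that it runs in seconds. The make_dataset helper is hypothetical and only mimics the role of get_training_dataset() and get_validation_dataset().

```python
import math
import tensorflow as tf

# Hypothetical small values so the sketch runs quickly.
NUM_TRAIN, NUM_VAL, BATCH_SIZE, EPOCHS = 20, 8, 4, 2


def make_dataset(num_examples, training):
    """Random (features, boxes) pairs, repeated forever and batched."""
    features = tf.random.uniform((num_examples, 8))
    boxes = tf.random.uniform((num_examples, 4))
    dataset = tf.data.Dataset.from_tensor_slices((features, boxes))
    if training:
        dataset = dataset.shuffle(num_examples)
    return dataset.repeat().batch(BATCH_SIZE)


training_dataset = make_dataset(NUM_TRAIN, training=True)
validation_dataset = make_dataset(NUM_VAL, training=False)

# Stand-in model: 8 input features mapped to 4 box coordinates.
inputs = tf.keras.layers.Input(shape=(8,))
outputs = tf.keras.layers.Dense(4, name="bounding_box")(inputs)
model = tf.keras.Model(inputs=inputs, outputs=outputs)
model.compile(optimizer=tf.keras.optimizers.SGD(momentum=0.9), loss="mse")

# The dataset supplies both features and labels, so y is not passed.
history = model.fit(
    training_dataset,
    steps_per_epoch=math.ceil(NUM_TRAIN / BATCH_SIZE),
    validation_data=validation_dataset,
    validation_steps=math.ceil(NUM_VAL / BATCH_SIZE),
    epochs=EPOCHS,
    verbose=0,
)
print(sorted(history.history))
```

Because the stand-in datasets repeat indefinitely, steps_per_epoch and validation_steps are what tell Keras where one pass over the data ends.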
5.4 Evaluate performance using IoU
You can see how well your model predicts bounding boxes on the validation set by calculating the Intersection-over-union (IoU) score for each image.
You'll find the IoU calculation implemented for you.
Predict on the validation set of images.
Apply the intersection_over_union function to these predicted bounding boxes.
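To see what the IoU score measures, here is a minimal sketch for a single pair of boxes in plain Python. It is not the intersection_over_union function provided in the notebook. The function name iou and the (xmin, ymin, xmax, ymax) box layout are assumptions for illustration.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two (xmin, ymin, xmax, ymax) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Corners of the overlapping rectangle (if the boxes overlap at all).
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)

    # Clamp at zero so disjoint boxes get zero intersection area.
    intersection = max(ix2 - ix1, 0.0) * max(iy2 - iy1, 0.0)

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection

    return intersection / union if union > 0 else 0.0


print(iou((0, 0, 2, 2), (0, 0, 2, 2)))  # identical boxes: 1.0
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # overlap 1, union 7
print(iou((0, 0, 1, 1), (2, 2, 3, 3)))  # disjoint boxes: 0.0
```

A score of 1.0 means the predicted box matches the true box exactly, and a score of 0.0 means the two boxes do not overlap.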