Week 3 Assignment: Image Segmentation of Handwritten Digits
In this week's assignment, you will build a model that predicts the segmentation masks (pixel-wise label maps) of handwritten digits. This model will be trained on the M2NIST dataset, a multi-digit MNIST. If you've done the ungraded lab on the CamVid dataset, then many of the steps here will look familiar.
You will build a Convolutional Neural Network (CNN) from scratch for the downsampling path and use a Fully Convolutional Network, FCN-8, to upsample and produce the pixel-wise label map. The model will be evaluated using the intersection over union (IOU) and Dice Score. Finally, you will download the model and upload it to the grader in Coursera to get your score for the assignment.
Exercises
We've given you some boilerplate code to work with, and below are the 5 exercises you need to complete before you can successfully generate the segmentation masks.
Imports
As usual, let's start by importing the packages you will use in this lab.
Download the dataset
M2NIST is a multi-digit MNIST. Each image contains up to 3 MNIST digits, and the corresponding labels file has the segmentation masks.
The dataset is available on Kaggle and you can find it here.
To make it easier for you, we're hosting it on Google Cloud so you can download it without Kaggle credentials.
Load and Preprocess the Dataset
This dataset can be easily preprocessed since it is available as NumPy array files (.npy):
combined.npy has the image files containing the multiple MNIST digits. Each image is of size 64 x 84 (height x width, in pixels).
segmented.npy has the corresponding segmentation masks. Each segmentation mask is also of size 64 x 84.
This dataset has 5000 samples and you can make appropriate training, validation, and test splits as required for the problem.
With that, let's define a few utility functions for loading and preprocessing the dataset.
You can now load the preprocessed dataset and define the training, validation, and test sets.
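As a rough illustration of what that might look like (the file names combined.npy and segmented.npy come from the description above, but the split sizes and preprocessing steps here are assumptions rather than the notebook's exact utilities):

```python
import numpy as np

# Hypothetical loading/preprocessing sketch; the notebook's utilities may differ.
images = np.load('combined.npy')    # (5000, 64, 84) grayscale images
masks = np.load('segmented.npy')    # (5000, 64, 84) class ids: digits 0-9, background = 10

# Add a channels-last dimension and scale pixel values to [0, 1].
images = images[..., np.newaxis].astype('float32') / 255.0

# One-hot encode the masks into 11 channels (10 digit classes + background).
masks = np.eye(11, dtype='float32')[masks.astype(int)]

# Assumed split sizes (the notebook's own test split contains 192 samples).
train_x, val_x, test_x = images[:4300], images[4300:4808], images[4808:]
train_y, val_y, test_y = masks[:4300], masks[4300:4808], masks[4808:]
```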
Let's Take a Look at the Dataset
You may want to visually inspect the dataset before and after training. Like above, we've included utility functions to help show a few images as well as their annotations (i.e. labels).
You can view a subset of the images from the dataset with the list_show_annotation() function defined above. Run the cells below to see the image on the left and its pixel-wise ground truth label map on the right.
You can see from the images above the colors assigned to each class (i.e. 0 to 9 plus the background). If you don't like these colors, feel free to rerun the cell where colors is defined to get another set of random colors. Alternatively, you can assign the RGB values for each class yourself instead of relying on random values.
Define the Model
As discussed in the lectures, the image segmentation model will have two paths:
Downsampling Path - This part of the network extracts the features in the image. This is done through a series of convolution and pooling layers. The final output is a reduced image (because of the pooling layers) with the extracted features. You will build a custom CNN from scratch for this path.
Upsampling Path - This takes the output of the downsampling path and generates the predictions while also converting the image back to its original size. You will use an FCN-8 decoder for this path.
Define the Basic Convolution Block
Exercise 1
Please complete the function below to build the basic convolution block for our CNN. This will have two Conv2D layers, each followed by a LeakyReLU, then a max pooling layer and batch normalization. Use the functional syntax to stack these layers.
When defining the Conv2D layers, note that our data inputs will have the 'channels' dimension last. You may want to check the data_format argument in the docs regarding this. Take note of the padding argument too, as you did in the ungraded labs.
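As a rough sketch of the structure being described, under assumed argument names and filter/kernel defaults (follow the boilerplate's actual signature in the graded cell):

```python
import tensorflow as tf

# Hypothetical sketch of the basic block: two Conv2D + LeakyReLU pairs,
# followed by max pooling and batch normalization (functional syntax).
def conv_block(inputs, filters, kernel_size=3, pool_size=2, pool_stride=2):
    x = tf.keras.layers.Conv2D(filters, kernel_size, padding='same',
                               data_format='channels_last')(inputs)
    x = tf.keras.layers.LeakyReLU()(x)
    x = tf.keras.layers.Conv2D(filters, kernel_size, padding='same',
                               data_format='channels_last')(x)
    x = tf.keras.layers.LeakyReLU()(x)
    x = tf.keras.layers.MaxPooling2D(pool_size=pool_size, strides=pool_stride)(x)
    x = tf.keras.layers.BatchNormalization()(x)
    return x
```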
Expected Output:
Please pay attention to the (type) and Output Shape columns. The layer name beside the type may differ depending on how many times you ran the cell (e.g. input_7 instead of input_1).
Define the Downsampling Path
Exercise 2
Now that we've defined the building block of our encoder, you can build the downsampling path. Please complete the function below to create the encoder. It should chain together five convolution building blocks to create a feature extraction CNN, minus the fully connected layers.
Notes:
To optimize processing, it is best to resize the images so that their dimensions are powers of 2. We know that our dataset images have the size 64 x 84. 64 is already a power of 2; 84, on the other hand, is not and needs to be padded to 96. You can refer to the ZeroPadding2D layer on how to do this. Remember that you will only pad the width (84) and not the height (64). See the sketch after these notes.
We recommend keeping the pool size and stride parameters constant at 2.
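A sketch of how the encoder might chain these pieces together, assuming the hypothetical conv_block above and made-up filter counts (the boilerplate's actual signature and filter sizes may differ):

```python
# Hypothetical encoder sketch: pad the width from 84 to 96, then stack
# five conv_block calls, keeping each block's output for the decoder.
def FCN(inputs):
    x = tf.keras.layers.ZeroPadding2D(padding=(0, 6))(inputs)  # 64x84 -> 64x96 (width only)
    f1 = conv_block(x,  filters=32)   # 32 x 48
    f2 = conv_block(f1, filters=64)   # 16 x 24
    f3 = conv_block(f2, filters=128)  # 8 x 12
    f4 = conv_block(f3, filters=256)  # 4 x 6
    f5 = conv_block(f4, filters=256)  # 2 x 3
    return f1, f2, f3, f4, f5
```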
Expected Output:
You should see the layers of your conv_block() being repeated 5 times like the output below.
Define the FCN-8 decoder
Exercise 3
Now you can define the upsampling path taking the outputs of convolutions at each stage as arguments. This will be very similar to what you did in the ungraded lab (VGG16-FCN8-CamVid) so you can refer to it if you need a refresher.
Note: remember to set the data_format parameter for the Conv2D layers.
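Below is a rough sketch of the FCN-8 idea, assuming the encoder returns five feature maps and only the deepest three are fused via skip connections; the layer choices, kernel sizes, and cropping here are assumptions, not the graded solution:

```python
# Hypothetical FCN-8 decoder sketch: upsample the deepest feature map,
# fuse it with skip connections from the two previous stages, then
# upsample by 8 and crop back to the original 64 x 84 resolution.
def fcn8_decoder(convs, n_classes=11):
    f1, f2, f3, f4, f5 = convs

    # 2x upsample of f5, then add the fourth-stage skip connection.
    o = tf.keras.layers.Conv2DTranspose(n_classes, kernel_size=4, strides=2,
                                        padding='same', data_format='channels_last')(f5)
    o2 = tf.keras.layers.Conv2D(n_classes, 1, padding='same',
                                data_format='channels_last')(f4)
    o = tf.keras.layers.Add()([o, o2])

    # 2x upsample again, then add the third-stage skip connection.
    o = tf.keras.layers.Conv2DTranspose(n_classes, kernel_size=4, strides=2,
                                        padding='same', data_format='channels_last')(o)
    o2 = tf.keras.layers.Conv2D(n_classes, 1, padding='same',
                                data_format='channels_last')(f3)
    o = tf.keras.layers.Add()([o, o2])

    # 8x upsample to the padded input size (64 x 96), crop the padded width,
    # and apply a per-pixel softmax over the 11 classes.
    o = tf.keras.layers.Conv2DTranspose(n_classes, kernel_size=8, strides=8,
                                        data_format='channels_last')(o)
    o = tf.keras.layers.Cropping2D(cropping=((0, 0), (6, 6)))(o)
    o = tf.keras.layers.Activation('softmax')(o)
    return o
```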
Here again is the diagram you saw in class showing how it should work:
Expected Output:
Define the Complete Model
The downsampling and upsampling paths can now be combined as shown below.
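Using the hypothetical helper names from the sketches above (FCN for the encoder and fcn8_decoder for the decoder), the combination might look like this:

```python
# Hypothetical sketch combining the two paths into a single Keras model.
def segmentation_model():
    inputs = tf.keras.layers.Input(shape=(64, 84, 1))
    convs = FCN(inputs)                           # downsampling path
    outputs = fcn8_decoder(convs, n_classes=11)   # upsampling path
    return tf.keras.Model(inputs=inputs, outputs=outputs)

model = segmentation_model()
model.summary()
```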
Compile the Model
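The compile settings aren't spelled out in this text; one plausible configuration for a pixel-wise multi-class problem (assumed here, not necessarily the notebook's choice) is:

```python
# Assumed optimizer/loss/metric choices; the notebook's own settings may differ.
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
```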
Model Training
Expected Output:
The losses should generally be decreasing and the accuracies should generally be increasing. For example, the first 4 epochs should produce output similar to this:
Model Evaluation
Make Predictions
Let's get the predictions using our test dataset as input and print the shape.
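A minimal sketch, assuming the test images are available as a NumPy array named test_x (the actual notebook may feed a tf.data dataset instead):

```python
# Hypothetical: run inference on the test set and inspect the output shape.
results = model.predict(test_x)
print(results.shape)  # expected: (192, 64, 84, 11)
```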
As you can see, the resulting shape is (192, 64, 84, 11). This means that for each of the 192 images in our test set, 11 prediction slices are generated (i.e. one for each class: 0 to 9 plus the background). Thus, if you want to see the probability that the upper leftmost pixel of the 1st image belongs to class 0, you can print results[0,0,0,0]. If you want the probability of the same pixel for class 10, print results[0,0,0,10].
What we're interested in is the index of the class with the highest probability at each pixel across these 11 slices, combined into a single image. We can do that by taking the argmax along this axis.
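For example, assuming results holds the predictions from the previous step (the class dimension is the last axis in the (batch, height, width, class) layout):

```python
import numpy as np

# Collapse the 11 class-probability slices into a single label map per image
# by taking the index of the most probable class at each pixel.
results = np.argmax(results, axis=3)
print(results.shape)     # (192, 64, 84)
print(results[0, 0, 0])  # class of the upper leftmost pixel of the first image,
                         # usually 10 (background)
```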
The new array generated per image now only contains the indices of the class with the highest probability. Let's see the output class of the upper leftmost pixel. As you might have observed earlier when you inspected the dataset, the upper left corner is usually just part of the background (class 10). The actual digits are written somewhere in the middle parts of the image.
We will use this results array when we evaluate our predictions.
Metrics
We showed in the lectures two ways to evaluate your predictions: the intersection over union (IOU) and the Dice score. Recall that:
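These are the standard definitions, where $A$ is the set of pixels predicted for a class and $B$ is the set of ground truth pixels for that class:

$$\text{IOU} = \frac{|A \cap B|}{|A \cup B|} \qquad \text{Dice} = \frac{2\,|A \cap B|}{|A| + |B|}$$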
The code below computes these for you, as you've also seen in the ungraded lab. A small smoothing factor is introduced in the denominators to prevent possible division by zero.
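A sketch of how this per-class computation might look, assuming integer label maps for the ground truth and the argmax predictions (the notebook's own helper may be organized differently):

```python
import numpy as np

def class_wise_metrics(y_true, y_pred, n_classes=11, smoothing_factor=0.00001):
    """Hypothetical per-class IOU and Dice for two integer label maps."""
    class_wise_iou, class_wise_dice = [], []
    for i in range(n_classes):
        intersection = np.sum((y_pred == i) & (y_true == i))
        combined_area = np.sum(y_true == i) + np.sum(y_pred == i)

        # Smoothing factor in the denominators guards against division by zero.
        iou = (intersection + smoothing_factor) / (combined_area - intersection + smoothing_factor)
        dice = 2 * (intersection + smoothing_factor) / (combined_area + smoothing_factor)

        class_wise_iou.append(iou)
        class_wise_dice.append(dice)
    return class_wise_iou, class_wise_dice
```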
Visualize Predictions
Compute IOU Score and Dice Score of your model
Save the Model
Once you're satisfied with the results, you will need to save your model so you can upload it to the grader in the Coursera classroom. After running the cell below, please look for student_model.h5 in the File Explorer on the left and download it. Then go back to the Coursera classroom and upload it to the Lab item that points to the autograder of Week 3.
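The save itself is typically a single Keras call, for example:

```python
# Save the trained model in HDF5 format so it can be uploaded to the grader.
model.save('student_model.h5')
```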
Congratulations on completing this assignment on image segmentation!