Ungraded Lab: U-Net for Image Segmentation
This notebook illustrates how to build a UNet for semantic image segmentation. This architecture is also a fully convolutional network and is similar to the model you just built in the previous lesson. A key difference is the use of skip connections from the encoder to the decoder. You will see how this is implemented later as you build each part of the network.
At the end of this lab, you will be able to use the UNet to output segmentation masks that show which pixels of an input image are part of the background, foreground, and outline.
Imports
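The import cell itself isn't reproduced here, but a minimal sketch of the imports the rest of this lab assumes (TensorFlow, TensorFlow Datasets, NumPy, and Matplotlib) could look like this:

```python
# Minimal sketch of the imports assumed by the rest of this lab.
import tensorflow as tf
import tensorflow_datasets as tfds

import numpy as np
import matplotlib.pyplot as plt
```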
Download the Oxford-IIIT Pets dataset
You will be training the model on the Oxford-IIIT Pets dataset. This contains pet images, their classes, segmentation masks, and head regions of interest. You will only use the images and segmentation masks in this lab.
This dataset is already included in TensorFlow Datasets and you can simply download it. The segmentation masks are included in versions 3 and above. The cell below will download the dataset and place the results in a dictionary named `dataset`. It will also collect information about the dataset and we'll assign it to a variable named `info`.
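The download cell isn't shown here, but a minimal sketch of it, assuming the standard `tfds.load()` call for `oxford_iiit_pet:3.*.*`, could look like this:

```python
# Download the Oxford-IIIT Pets dataset (versions 3+ include segmentation masks).
# `dataset` is a dict with 'train' and 'test' splits; `info` holds dataset metadata.
dataset, info = tfds.load('oxford_iiit_pet:3.*.*', with_info=True)
```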
Let's briefly examine the contents of the dataset you just downloaded.
Prepare the Dataset
You will now prepare the train and test sets. The following utility functions preprocess the data. These include:
- simple augmentation by flipping the image
- normalizing the pixel values
- resizing the images
Another preprocessing step is to adjust the segmentation mask's pixel values. The `README` in the annotations folder of the dataset mentions that the pixels in the segmentation mask are labeled as follows:
Label | Class Name |
---|---|
1 | foreground |
2 | background |
3 | Not Classified |
For convenience, let's subtract `1` from these values and we will interpret these as `{'pet', 'background', 'outline'}`:
Label | Class Name |
---|---|
0 | pet |
1 | background |
2 | outline |
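The utility functions themselves aren't reproduced here. Below is a minimal sketch of what they could look like, assuming a hypothetical 128 x 128 input size; the function names (`random_flip`, `normalize`, `load_image_train`, `load_image_test`) are illustrative and may differ from the actual lab code:

```python
def random_flip(input_image, input_mask):
    # Flip the image and its mask together half of the time (simple augmentation).
    if tf.random.uniform(()) > 0.5:
        input_image = tf.image.flip_left_right(input_image)
        input_mask = tf.image.flip_left_right(input_mask)
    return input_image, input_mask


def normalize(input_image, input_mask):
    # Scale pixel values to [0, 1] and shift mask labels from {1, 2, 3} to {0, 1, 2}.
    input_image = tf.cast(input_image, tf.float32) / 255.0
    input_mask -= 1
    return input_image, input_mask


def load_image_train(datapoint):
    # Resize, augment, and normalize a training example.
    input_image = tf.image.resize(datapoint['image'], (128, 128))
    input_mask = tf.image.resize(datapoint['segmentation_mask'], (128, 128))
    input_image, input_mask = random_flip(input_image, input_mask)
    input_image, input_mask = normalize(input_image, input_mask)
    return input_image, input_mask


def load_image_test(datapoint):
    # Test images are only resized and normalized (no augmentation).
    input_image = tf.image.resize(datapoint['image'], (128, 128))
    input_mask = tf.image.resize(datapoint['segmentation_mask'], (128, 128))
    input_image, input_mask = normalize(input_image, input_mask)
    return input_image, input_mask
```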
You can now call the utility functions above to prepare the train and test sets. The dataset you downloaded from TFDS already contains these splits and you will use those by simply accessing the `train` and `test` keys of the `dataset` dictionary.
Note: The `tf.data.experimental.AUTOTUNE` you see in this notebook is simply a constant equal to `-1`. This value is passed to allow certain methods to automatically set parameters based on available resources. For instance, the `num_parallel_calls` parameter below will be set dynamically based on the available CPUs. The docstrings will show if a parameter can be autotuned. Here is the entry describing what it does to `num_parallel_calls`.
Now that the splits are loaded, you can then prepare batches for training and testing.
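A sketch of that batching step, assuming the hypothetical preprocessing functions above and illustrative `BATCH_SIZE` and `BUFFER_SIZE` values:

```python
BATCH_SIZE = 64       # illustrative batch size
BUFFER_SIZE = 1000    # illustrative shuffle buffer size

# Map the preprocessing functions over the splits, letting tf.data tune parallelism.
train = dataset['train'].map(load_image_train,
                             num_parallel_calls=tf.data.experimental.AUTOTUNE)
test = dataset['test'].map(load_image_test)

# Shuffle, batch, and prefetch the training pipeline; only batch the test pipeline.
train_dataset = train.cache().shuffle(BUFFER_SIZE).batch(BATCH_SIZE).repeat()
train_dataset = train_dataset.prefetch(buffer_size=tf.data.experimental.AUTOTUNE)
test_dataset = test.batch(BATCH_SIZE)
```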
Let's define a few more utilities to help us visualize our data and metrics.
Finally, you can take a look at an image example and its corresponding mask from the dataset.
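A possible sketch of such a visualization utility, plus a call that displays one training example; the `display()` helper below is illustrative, not necessarily the one defined in the lab:

```python
def display(display_list, titles=('Input Image', 'True Mask', 'Predicted Mask')):
    # Show each image/mask in display_list side by side.
    plt.figure(figsize=(15, 5))
    for i, item in enumerate(display_list):
        plt.subplot(1, len(display_list), i + 1)
        plt.title(titles[i])
        plt.imshow(tf.keras.preprocessing.image.array_to_img(item))
        plt.axis('off')
    plt.show()

# Peek at one training example and its segmentation mask.
for image, mask in train.take(1):
    display([image, mask])
```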
Define the model
With the dataset prepared, you can now build the UNet. Here is the overall architecture as shown in class:
A UNet consists of an encoder (downsampler) and decoder (upsampler) with a bottleneck in between. The gray arrows correspond to the skip connections that concatenate encoder block outputs to each stage of the decoder. Let's see how to implement these starting with the encoder.
Encoder
Like the FCN model you built in the previous lesson, the encoder here will have repeating blocks (red boxes in the figure below), so it's best to create functions for them to keep the code modular. These encoder blocks will contain two Conv2D layers activated by ReLU, followed by a MaxPooling and Dropout layer. As discussed in class, each stage will have an increasing number of filters, and the dimensionality of the features will be reduced because of the pooling layer.
The encoder utilities will have three functions:
- `conv2d_block()` - to add two convolution layers and ReLU activations
- `encoder_block()` - to add pooling and dropout to the conv2d blocks. Recall that in UNet, you need to save the output of the convolution layers at each block so this function will return two values to take that into account (i.e. output of the conv block and the dropout)
- `encoder()` - to build the entire encoder. This will return the output of the last encoder block as well as the output of the previous conv blocks. These will be concatenated to the decoder blocks as you'll see later.
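A minimal sketch of these three utilities is shown below; the filter counts, dropout rate, and four-block depth are illustrative assumptions rather than the graded solution:

```python
def conv2d_block(input_tensor, n_filters, kernel_size=3):
    # Two Conv2D layers, each followed by a ReLU activation.
    x = input_tensor
    for _ in range(2):
        x = tf.keras.layers.Conv2D(n_filters, kernel_size,
                                   activation='relu',
                                   kernel_initializer='he_normal',
                                   padding='same')(x)
    return x


def encoder_block(inputs, n_filters, pool_size=(2, 2), dropout=0.3):
    # Conv block followed by max pooling and dropout.
    # Returns the conv output (for the skip connection) and the pooled/dropout output.
    f = conv2d_block(inputs, n_filters)
    p = tf.keras.layers.MaxPooling2D(pool_size)(f)
    p = tf.keras.layers.Dropout(dropout)(p)
    return f, p


def encoder(inputs):
    # Four encoder blocks with doubling filters; keep each conv output for the skips.
    f1, p1 = encoder_block(inputs, n_filters=64)
    f2, p2 = encoder_block(p1, n_filters=128)
    f3, p3 = encoder_block(p2, n_filters=256)
    f4, p4 = encoder_block(p3, n_filters=512)
    return p4, (f1, f2, f3, f4)
```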
Bottleneck
A bottleneck follows the encoder block and is used to extract more features. This does not have a pooling layer so the dimensionality remains the same. You can use the `conv2d_block()` function defined earlier to implement this.
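A sketch of the bottleneck, reusing the `conv2d_block()` sketched above with an assumed filter count of 1024:

```python
def bottleneck(inputs):
    # Extract more features without further downsampling.
    return conv2d_block(inputs, n_filters=1024)
```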
Decoder
Finally, we have the decoder which upsamples the features back to the original image size. At each upsampling level, you will take the output of the corresponding encoder block and concatenate it before feeding to the next decoder block. This is summarized in the figure below.
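A sketch of the decoder utilities, assuming `Conv2DTranspose` upsampling and filter counts that mirror the encoder sketch above; the final 1x1 convolution produces the per-pixel class scores:

```python
def decoder_block(inputs, conv_output, n_filters, kernel_size=3, strides=2, dropout=0.3):
    # Upsample, concatenate with the matching encoder output, then apply a conv block.
    u = tf.keras.layers.Conv2DTranspose(n_filters, kernel_size,
                                        strides=strides, padding='same')(inputs)
    c = tf.keras.layers.concatenate([u, conv_output])
    c = tf.keras.layers.Dropout(dropout)(c)
    return conv2d_block(c, n_filters)


def decoder(inputs, convs, output_channels):
    # Mirror the encoder: four upsampling stages, then a 1x1 conv for per-pixel classes.
    f1, f2, f3, f4 = convs
    c6 = decoder_block(inputs, f4, n_filters=512)
    c7 = decoder_block(c6, f3, n_filters=256)
    c8 = decoder_block(c7, f2, n_filters=128)
    c9 = decoder_block(c8, f1, n_filters=64)
    outputs = tf.keras.layers.Conv2D(output_channels, 1, activation='softmax')(c9)
    return outputs
```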
Putting it all together
You can finally build the UNet by chaining the encoder, bottleneck, and decoder. You will specify the number of output channels, and in this particular case, that would be `3`. That is because there are three possible labels for each pixel: 'pet', 'background', and 'outline'.
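A sketch of that final assembly, assuming the hypothetical `encoder()`, `bottleneck()`, and `decoder()` functions above and a 128 x 128 x 3 input:

```python
OUTPUT_CHANNELS = 3  # 'pet', 'background', 'outline'

def unet():
    # Chain encoder -> bottleneck -> decoder into a Keras Model.
    inputs = tf.keras.layers.Input(shape=(128, 128, 3))
    encoder_output, convs = encoder(inputs)
    bottleneck_output = bottleneck(encoder_output)
    outputs = decoder(bottleneck_output, convs, output_channels=OUTPUT_CHANNELS)
    return tf.keras.Model(inputs=inputs, outputs=outputs)

model = unet()
model.summary()
```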
Compile and Train the model
Now, all that is left to do is to compile and train the model. The loss you will use is `sparse_categorical_crossentropy`. This is because the network is trying to assign each pixel a label, just like in multi-class prediction. In the true segmentation mask, each pixel has a value in {0, 1, 2}. The network here is outputting three channels. Essentially, each channel is trying to learn to predict a class, and `sparse_categorical_crossentropy` is the recommended loss for such a scenario.
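A sketch of the compile and train step, assuming the dataset pipeline sketched earlier; the optimizer, epoch count, and step counts below are illustrative assumptions:

```python
# Compile with sparse categorical crossentropy since mask labels are integer class ids.
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

EPOCHS = 10  # illustrative; the actual lab may train for a different number of epochs
TRAIN_LENGTH = info.splits['train'].num_examples
STEPS_PER_EPOCH = TRAIN_LENGTH // BATCH_SIZE
VALIDATION_STEPS = info.splits['test'].num_examples // BATCH_SIZE

model_history = model.fit(train_dataset,
                          epochs=EPOCHS,
                          steps_per_epoch=STEPS_PER_EPOCH,
                          validation_steps=VALIDATION_STEPS,
                          validation_data=test_dataset)
```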
You can plot the train and validation loss to see how the training went. This should show generally decreasing values per epoch.
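A minimal sketch of that plot, assuming the `model_history` object returned by `model.fit()` above:

```python
# Plot training and validation loss per epoch.
plt.plot(model_history.history['loss'], label='train loss')
plt.plot(model_history.history['val_loss'], label='val loss')
plt.xlabel('epoch')
plt.ylabel('loss')
plt.legend()
plt.show()
```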
Make predictions
The model is now ready to make some predictions. You will use the test dataset you prepared earlier to feed input images that the model has not seen before. The utilities below will help in processing the test dataset and model predictions.
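A sketch of such a prediction step, assuming an illustrative `create_mask()` helper that takes the channel-wise argmax of the model output and the `display()` helper sketched earlier:

```python
def create_mask(pred_mask):
    # Take the argmax over the 3 channels to get a single-channel class mask.
    pred_mask = tf.argmax(pred_mask, axis=-1)
    pred_mask = pred_mask[..., tf.newaxis]
    return pred_mask[0].numpy()

# Predict on a batch of unseen test images and visualize the first result.
for image, mask in test_dataset.take(1):
    pred_mask = model.predict(image)
    display([image[0], mask[0], create_mask(pred_mask)])
```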
Compute class-wise metrics
Like the previous lab, you will also want to compute the IOU and Dice Score. This is the same function you used previously.
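A sketch of that function, assuming the true and predicted masks have already been reduced to integer label arrays; the small smoothing factor simply avoids division by zero for classes that are absent from an image:

```python
def class_wise_metrics(y_true, y_pred):
    # Compute IOU and Dice score per class from integer label masks.
    class_wise_iou = []
    class_wise_dice_score = []
    smoothing_factor = 0.00001

    for i in range(3):
        intersection = np.sum((y_pred == i) * (y_true == i))
        y_true_area = np.sum(y_true == i)
        y_pred_area = np.sum(y_pred == i)
        combined_area = y_true_area + y_pred_area

        iou = (intersection + smoothing_factor) / (combined_area - intersection + smoothing_factor)
        class_wise_iou.append(iou)

        dice_score = 2 * (intersection + smoothing_factor) / (combined_area + smoothing_factor)
        class_wise_dice_score.append(dice_score)

    return class_wise_iou, class_wise_dice_score
```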
With all the utilities defined, you can now proceed to showing the metrics and feeding test images.
Show Predictions
That's all for this lab! In the next section, you will learn about another type of image segmentation model: Mask R-CNN for instance segmentation!