Ungraded Lab: Fully Convolutional Neural Networks for Image Segmentation
This notebook illustrates how to build a Fully Convolutional Neural Network for semantic image segmentation.
You will train the model on a custom dataset prepared by divamgupta. This contains video frames from a moving vehicle and is a subsample of the CamVid dataset.
You will be using a pretrained VGG-16 network for the feature extraction path, then followed by an FCN-8 network for upsampling and generating the predictions. The output will be a label map (i.e. segmentation mask) with predictions for 12 classes. Let's begin!
Imports
Download the Dataset
We hosted the dataset in a Google bucket so you will need to download it first and unzip to a local directory.
The dataset you just downloaded contains folders for images and annotations. The images contain the video frames while the annotations contain the pixel-wise label maps. Each label map has the shape `(height, width, 1)`, with each point in this space denoting the corresponding pixel's class. Classes are in the range `[0, 11]` (i.e. 12 classes) and the pixel labels correspond to these classes:
| Value | Class Name |
| ----- | ------------- |
| 0 | sky |
| 1 | building |
| 2 | column/pole |
| 3 | road |
| 4 | side walk |
| 5 | vegetation |
| 6 | traffic light |
| 7 | fence |
| 8 | vehicle |
| 9 | pedestrian |
| 10 | byciclist |
| 11 | void |
For example, if a pixel is part of a road, then that point will be labeled `3` in the label map. Run the cell below to create a list containing the class names:
Note: bicyclist is misspelled as 'byciclist' in the dataset. We won't handle data cleaning in this example, but you can inspect and clean the data if you want to use this as a starting point for a personal project.
Load and Prepare the Dataset
Next, you will load and prepare the train and validation sets for training. There are some preprocessing steps needed before the data is fed to the model. These include:
- resizing the height and width of the input images and label maps (224 x 224px by default)
- normalizing the input images' pixel values to fall in the range `[-1, 1]`
- reshaping the label maps from `(height, width, 1)` to `(height, width, 12)`, with each slice along the third axis set to `1` if the pixel belongs to the class corresponding to that slice's index, and `0` otherwise. For example, if a pixel is part of a road, then using the table above, the point at slice #3 will be labeled `1` and it will be `0` in all other slices. To illustrate using simple arrays:
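A minimal illustration in NumPy, using a toy 2 x 2 label map rather than the actual dataset:

```python
import numpy as np

# Toy 2 x 2 label map: three "sky" pixels (class 0) and one "road" pixel (class 3)
label_map = np.array([[0, 0],
                      [0, 3]])              # shape: (height, width)

n_classes = 12
one_hot = np.eye(n_classes)[label_map]      # shape: (height, width, 12)

print(one_hot[1, 1, 3])   # 1.0 -> the "road" slice is hot for that pixel
print(one_hot[1, 1, 0])   # 0.0 -> every other slice is zero for that pixel
```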
The following function will do the preprocessing steps mentioned above.
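The exact implementation lives in the notebook's code cell; the sketch below shows one way `map_filename_to_image_and_mask()` could look, assuming JPEG-encoded files, the 224 x 224 target size, and the `[-1, 1]` scaling described above (the argument names are assumptions):

```python
import tensorflow as tf

def map_filename_to_image_and_mask(t_filename, a_filename, classes=12, height=224, width=224):
    '''Reads an image/annotation pair and applies the preprocessing steps described above.'''
    # Read and decode the raw files
    image = tf.image.decode_jpeg(tf.io.read_file(t_filename))
    annotation = tf.image.decode_jpeg(tf.io.read_file(a_filename))

    # Resize both the image and its label map
    image = tf.image.resize(image, (height, width))
    annotation = tf.image.resize(annotation, (height, width))
    image = tf.reshape(image, (height, width, 3))
    annotation = tf.cast(annotation, dtype=tf.int32)
    annotation = tf.reshape(annotation, (height, width, 1))

    # One-hot encode the label map: (height, width, 1) -> (height, width, 12)
    stack_list = []
    for c in range(classes):
        mask = tf.equal(annotation[:, :, 0], tf.constant(c))
        stack_list.append(tf.cast(mask, dtype=tf.int32))
    annotation = tf.stack(stack_list, axis=2)

    # Normalize pixel values to [-1, 1]
    image = image / 127.5
    image -= 1

    return image, annotation
```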
The dataset also already has separate folders for train and test sets. As described earlier, these sets will have two folders: one corresponding to the images, and the other containing the annotations.
You will use the following functions to create the TensorFlow datasets from the images in these folders. Notice that before creating the batches in `get_training_dataset()` and `get_validation_set()`, the images are first preprocessed using the `map_filename_to_image_and_mask()` function you defined earlier.
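A rough sketch of what these helpers might look like, assuming the images and annotations are listed from their respective folders (the batch size, shuffle buffer, and the `get_dataset_slice_paths()` helper are assumptions):

```python
import os
import tensorflow as tf

BATCH_SIZE = 64  # assumption; adjust to whatever the notebook uses

def get_dataset_slice_paths(image_dir, label_map_dir):
    '''Builds parallel lists of image and annotation file paths.'''
    image_paths = [os.path.join(image_dir, f) for f in sorted(os.listdir(image_dir))]
    label_map_paths = [os.path.join(label_map_dir, f) for f in sorted(os.listdir(label_map_dir))]
    return image_paths, label_map_paths

def get_training_dataset(image_paths, label_map_paths):
    '''Preprocesses, shuffles, batches, and repeats the training set.'''
    dataset = tf.data.Dataset.from_tensor_slices((image_paths, label_map_paths))
    dataset = dataset.map(map_filename_to_image_and_mask)
    dataset = dataset.shuffle(100, reshuffle_each_iteration=True)
    dataset = dataset.batch(BATCH_SIZE)
    dataset = dataset.repeat()
    dataset = dataset.prefetch(tf.data.AUTOTUNE)
    return dataset

def get_validation_set(image_paths, label_map_paths):
    '''Preprocesses and batches the validation set (no shuffling needed).'''
    dataset = tf.data.Dataset.from_tensor_slices((image_paths, label_map_paths))
    dataset = dataset.map(map_filename_to_image_and_mask)
    dataset = dataset.batch(BATCH_SIZE)
    dataset = dataset.repeat()
    return dataset
```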
You can now generate the training and validation sets by running the cell below.
Let's Take a Look at the Dataset
You will also need utilities to help visualize the dataset and the model predictions later. First, you need to assign a color mapping to the classes in the label maps. Since the dataset has 12 classes, you need a list of 12 colors. You can use Seaborn's `color_palette()` to generate this.
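For instance (assuming the class list from earlier is stored in a variable such as `class_names`):

```python
import seaborn as sns

# Generate one distinct (r, g, b) tuple per class
colors = sns.color_palette(None, len(class_names))

# Optionally scale to the 0-255 range if you prefer integer-style RGB values for plotting
colors = [tuple(255 * channel for channel in color) for color in colors]
```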
Please run the cells below to see sample images from the train and validation sets. You will see the image and the label maps side by side.
Define the Model
You will now build the model and prepare it for training. As mentioned earlier, this will use a VGG-16 network for the encoder and FCN-8 for the decoder. This is the diagram as shown in class:
For this exercise, you will notice a slight difference from the lecture because the dataset images are 224x224 instead of 32x32. You'll see how this is handled in the next cells as you build the encoder.
Define Pooling Block of VGG
As you saw in Course 1 of this specialization, VGG networks have repeating blocks, so to keep the code neat, it's best to create a function to encapsulate this process. Each block has convolutional layers followed by a max pooling layer which downsamples the image.
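One possible shape for such a helper (the function name and signature are assumptions, but the layer naming follows Keras' VGG-16 convention so the pretrained weights can be matched later):

```python
import tensorflow as tf

def conv_block(x, n_convs, filters, kernel_size, activation, pool_size, pool_stride, block_name):
    '''Stacks n_convs Conv2D layers followed by a MaxPooling2D layer that downsamples the feature map.'''
    for i in range(n_convs):
        x = tf.keras.layers.Conv2D(filters=filters,
                                   kernel_size=kernel_size,
                                   activation=activation,
                                   padding='same',
                                   name=f"{block_name}_conv{i + 1}")(x)
    # Downsample by pool_stride at the end of the block
    x = tf.keras.layers.MaxPooling2D(pool_size=pool_size,
                                     strides=pool_stride,
                                     name=f"{block_name}_pool")(x)
    return x
```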
Download VGG weights
First, please run the cell below to get pre-trained weights for VGG-16. You will load this in the next section when you build the encoder network.
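If you are recreating this outside the course environment, the no-top VGG-16 weights published with Keras are one commonly used substitute; the lab hosts its own copy, so treat the URL below as an assumption rather than the notebook's exact source:

```python
import tensorflow as tf

# Publicly released no-top VGG-16 weights (assumed substitute for the lab's hosted file)
vgg_weights_url = ('https://github.com/fchollet/deep-learning-models/releases/'
                   'download/v0.1/vgg16_weights_tf_dim_ordering_tf_kernels_notop.h5')

vgg_weights_path = tf.keras.utils.get_file('vgg16_notop.h5', vgg_weights_url)
print(vgg_weights_path)
```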
Define VGG-16
You can build the encoder as shown below.

- You will create 5 blocks with an increasing number of filters at each stage.
- The number of convolutions, kernel size, activation, pool size and pool stride will remain constant.
- You will load the pretrained weights after creating the VGG-16 network.
- Additional convolution layers will be appended to extract more features.
- The output will contain the output of the last layer and the previous four convolution blocks.
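Putting the bullet points above together, a sketch of the encoder could look like the following. It reuses the `conv_block()` helper and `vgg_weights_path` from the earlier sketches, and the filter counts follow the standard VGG-16 configuration; treat it as one workable layout rather than the notebook's exact code:

```python
def VGG_16(image_input):
    '''Builds the VGG-16 feature-extraction path and returns the intermediate and final feature maps.'''
    # Block 1: 64 filters
    x = conv_block(image_input, n_convs=2, filters=64, kernel_size=(3, 3),
                   activation='relu', pool_size=(2, 2), pool_stride=(2, 2), block_name='block1')
    p1 = x

    # Block 2: 128 filters
    x = conv_block(x, n_convs=2, filters=128, kernel_size=(3, 3),
                   activation='relu', pool_size=(2, 2), pool_stride=(2, 2), block_name='block2')
    p2 = x

    # Block 3: 256 filters
    x = conv_block(x, n_convs=3, filters=256, kernel_size=(3, 3),
                   activation='relu', pool_size=(2, 2), pool_stride=(2, 2), block_name='block3')
    p3 = x

    # Block 4: 512 filters
    x = conv_block(x, n_convs=3, filters=512, kernel_size=(3, 3),
                   activation='relu', pool_size=(2, 2), pool_stride=(2, 2), block_name='block4')
    p4 = x

    # Block 5: 512 filters
    x = conv_block(x, n_convs=3, filters=512, kernel_size=(3, 3),
                   activation='relu', pool_size=(2, 2), pool_stride=(2, 2), block_name='block5')
    p5 = x

    # Load the pretrained weights into the VGG-16 portion of the network
    vgg = tf.keras.Model(image_input, p5)
    vgg.load_weights(vgg_weights_path)

    # Additional convolutions replacing VGG's fully connected layers to extract more features
    c6 = tf.keras.layers.Conv2D(4096, (7, 7), activation='relu', padding='same', name='conv6')(p5)
    c7 = tf.keras.layers.Conv2D(4096, (1, 1), activation='relu', padding='same', name='conv7')(c6)

    # Return the last layer's output plus the previous four convolution blocks
    return (p1, p2, p3, p4, c7)
```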
Define FCN 8 Decoder
Next, you will build the decoder using deconvolution layers. Please refer to the diagram for FCN-8 at the start of this section to visualize what the code below is doing. It will involve two summations before upsampling to the original image size and generating the predicted mask.
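A sketch of such a decoder is shown below. The kernel sizes, strides, and cropping are chosen so the upsampled maps line up with the 14 x 14 and 28 x 28 skip connections produced by a 224 x 224 input; treat it as one workable configuration rather than the notebook's exact code:

```python
def fcn8_decoder(convs, n_classes):
    '''Upsamples the encoder output and fuses it with skip connections from pool3 and pool4.'''
    f1, f2, f3, f4, f5 = convs

    # Upsample the last feature map (x2) and add the pool4 prediction (first summation)
    o = tf.keras.layers.Conv2DTranspose(n_classes, kernel_size=(4, 4), strides=(2, 2), use_bias=False)(f5)
    o = tf.keras.layers.Cropping2D(cropping=(1, 1))(o)
    o2 = tf.keras.layers.Conv2D(n_classes, (1, 1), activation='relu', padding='same')(f4)
    o = tf.keras.layers.Add()([o, o2])

    # Upsample again (x2) and add the pool3 prediction (second summation)
    o = tf.keras.layers.Conv2DTranspose(n_classes, kernel_size=(4, 4), strides=(2, 2), use_bias=False)(o)
    o = tf.keras.layers.Cropping2D(cropping=(1, 1))(o)
    o2 = tf.keras.layers.Conv2D(n_classes, (1, 1), activation='relu', padding='same')(f3)
    o = tf.keras.layers.Add()([o, o2])

    # Final x8 upsampling back to the input resolution, then a per-pixel softmax over the 12 classes
    o = tf.keras.layers.Conv2DTranspose(n_classes, kernel_size=(8, 8), strides=(8, 8), use_bias=False)(o)
    o = tf.keras.layers.Activation('softmax')(o)

    return o
```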
Define Final Model
You can now build the final model by connecting the encoder and decoder blocks.
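For example, assuming the encoder and decoder sketches above:

```python
def segmentation_model():
    '''Connects the VGG-16 encoder and the FCN-8 decoder into one Keras model.'''
    inputs = tf.keras.layers.Input(shape=(224, 224, 3))
    convs = VGG_16(image_input=inputs)
    outputs = fcn8_decoder(convs, n_classes=12)
    return tf.keras.Model(inputs=inputs, outputs=outputs)

model = segmentation_model()
model.summary()
```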
Compile the Model
Next, the model will be configured for training. You will need to specify the loss, optimizer and metrics. You will use `categorical_crossentropy` as the loss function since the label map is transformed into one-hot encoded vectors for each pixel in the image (i.e. `1` in one slice and `0` in the other slices, as described earlier).
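A minimal compile step could look like this; the optimizer and its hyperparameters are assumptions, as only the `categorical_crossentropy` loss is specified by the text above:

```python
# SGD with momentum is one reasonable choice here (assumed, not the notebook's exact settings)
sgd = tf.keras.optimizers.SGD(learning_rate=1e-2, momentum=0.9, nesterov=True)

model.compile(loss='categorical_crossentropy',
              optimizer=sgd,
              metrics=['accuracy'])
```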
Train the Model
The model can now be trained. This will take around 30 minutes to run and you will reach around 85% accuracy for both train and val sets.
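The training call itself is a standard `model.fit()` on the datasets created earlier (assumed here to be stored in `training_dataset` and `validation_dataset`); the epoch count, batch size, and image counts below are placeholders, not the notebook's exact values:

```python
# Placeholder values; substitute the actual image counts and the notebook's settings
BATCH_SIZE = 64
EPOCHS = 170
train_count = 367
validation_count = 101

history = model.fit(training_dataset,
                    steps_per_epoch=train_count // BATCH_SIZE,
                    validation_data=validation_dataset,
                    validation_steps=validation_count // BATCH_SIZE,
                    epochs=EPOCHS)
```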
Evaluate the Model
After training, you will want to see how your model is doing on a test set. For segmentation models, you can use the intersection-over-union and the dice score as metrics to evaluate your model. You'll see how it is implemented in this section.
Make Predictions
You can get output segmentation masks by using the `predict()` method. As you may recall, the output of our segmentation model has the shape `(height, width, 12)`, where `12` is the number of classes. Each pixel value in those 12 slices indicates the probability of that pixel belonging to that particular class. If you want to create the predicted label map, you can take the `argmax()` along that axis. This is shown in the following cell.
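For example, reusing the validation dataset and the placeholder counts from the training sketch above:

```python
import numpy as np

# results has shape (num_images, 224, 224, 12): per-pixel class probabilities
results = model.predict(validation_dataset, steps=validation_count // BATCH_SIZE)

# Take the most likely class per pixel to get label maps of shape (num_images, 224, 224)
predicted_label_maps = np.argmax(results, axis=3)
```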
Compute Metrics
The function below computes the IOU and dice score of the prediction and ground truth masks. From the lectures, these are given by:

$$IOU = \frac{\text{area of overlap}}{\text{area of union}}$$

$$\text{dice score} = \frac{2 \times \text{area of overlap}}{\text{combined area}}$$

The code below does that for you. A small smoothing factor is added to the denominators to prevent possible division by zero.
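A sketch of such a function, computing both metrics per class from integer-valued label maps (the function name and the exact value of the smoothing factor are assumptions):

```python
import numpy as np

def class_wise_metrics(y_true, y_pred):
    '''Computes per-class IOU and dice scores from integer-valued label maps of the same shape.'''
    class_wise_iou = []
    class_wise_dice_score = []
    smoothing_factor = 0.00001  # avoids division by zero for classes absent from both masks

    for i in range(12):
        intersection = np.sum((y_pred == i) * (y_true == i))
        y_true_area = np.sum(y_true == i)
        y_pred_area = np.sum(y_pred == i)
        combined_area = y_true_area + y_pred_area

        # IOU = overlap / union, where union = combined area minus the overlap counted twice
        iou = (intersection + smoothing_factor) / (combined_area - intersection + smoothing_factor)
        class_wise_iou.append(iou)

        # dice = 2 * overlap / combined area
        dice_score = 2 * (intersection + smoothing_factor) / (combined_area + smoothing_factor)
        class_wise_dice_score.append(dice_score)

    return class_wise_iou, class_wise_dice_score
```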
Show Predictions and Metrics
You can now see the predicted segmentation masks side by side with the ground truth. The metrics are also overlaid so you can evaluate how your model is doing.
Display Class Wise Metrics
You can also compute the class-wise metrics so you can see how your model performs across all images in the test set.
That's all for this lab! In the next section, you will work on another architecture for building a segmentation model: the UNet.