Real-time collaboration for Jupyter Notebooks, Linux Terminals, LaTeX, VS Code, R IDE, and more,
all in one place. Commercial Alternative to JupyterHub.
Assignment 3 Ungraded Sections - Part 1: BERT Loss Model
Welcome to part 1 of testing the models for this week's assignment. We will perform decoding using the BERT Loss model. In this notebook we'll take an input, mask (hide) random word(s) in it, and see how well the model recovers the "Target" answer(s).
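To make the mask-and-recover idea concrete, here is a minimal sketch of the masking step in plain Python. The function name `mask_tokens` and the `<mask_i>` placeholders are illustrative only; the actual notebook uses the course's subword tokenizer and sentinel tokens, but the principle is the same: hidden words become the targets the model must predict.

```python
import random

def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Randomly hide tokens; return the masked input and the target words.

    Hypothetical helper for illustration: each hidden word is replaced by a
    numbered placeholder, and the original word is saved as a target.
    """
    rng = random.Random(seed)
    masked, targets = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(f"<mask_{len(targets)}>")  # placeholder the model must fill
            targets.append(tok)                      # the "Target" answer
        else:
            masked.append(tok)
    return masked, targets

masked, targets = mask_tokens(
    "the quick brown fox jumps over the lazy dog".split(),
    mask_prob=0.3, seed=42)
print(masked)
print(targets)
```

The model sees only `masked`; decoding succeeds when it reproduces `targets`.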
Colab
Since this ungraded lab takes a lot of time to run on Coursera, we have prepared a Colab for you as an alternative.
If you run into a page that looks similar to the one below, with the option Open with, this means you need to download the Colaboratory app. You can do so via Open with -> Connect more apps -> type "Colaboratory" in the search bar -> install. After installation it should look like this. Click on Open with Google Colaboratory.
Part 2: BERT Loss
Now that you have created the encoder, we will not make you train it. Training could easily take a few days, depending on which GPUs/TPUs you are using. Very few people train a full transformer from scratch. Instead, most people load a pretrained model and fine-tune it on a specific task. That is exactly what you are about to do. Let's start by initializing the model and then loading it in.
Initialize the model from the saved checkpoint.
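The notebook restores the pretrained weights with the framework's own loader (in Trax, `model.init_from_file`). As a rough sketch of what that round trip involves, here is a stdlib-only stand-in, assuming a gzip-compressed pickle of the weights; `save_checkpoint` and `init_from_checkpoint` are hypothetical names, not part of any library.

```python
import gzip
import pickle

def save_checkpoint(path, weights):
    # Checkpoints are commonly stored as compressed serialized weight arrays.
    with gzip.open(path, "wb") as f:
        pickle.dump(weights, f)

def init_from_checkpoint(path):
    # Restore previously trained weights instead of training from scratch.
    with gzip.open(path, "rb") as f:
        return pickle.load(f)
```

The real loader also rebuilds the model's layer structure so the restored weights slot into the right places; this sketch only shows the save/restore step.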
Run the cell below to decode.
Note: This will take some time to run
At this point the RAM is almost full because the model and the decoding are memory heavy. Run decoding only once; running it a second time with another example may give you an answer that makes no sense or repeats words. If that happens, restart the runtime (see how at the start of the notebook) and run all the cells again.
You should also be aware that the decoding quality is not very good, because max_length was reduced from 50 to 5 so that this runs faster in this environment. The Colab version uses the original max_length, so check that one for the actual decoding.
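The effect of a small max_length can be seen in a toy greedy-decoding loop. The loop below is a generic sketch, not the course's decoder: `next_token_fn` is a hypothetical stand-in for the model that maps the tokens generated so far to the next token id, with 0 acting as end-of-sequence.

```python
def greedy_decode(next_token_fn, max_length):
    """Greedy decoding: emit the next token until EOS (0) or max_length."""
    output = []
    for _ in range(max_length):
        tok = next_token_fn(output)
        if tok == 0:  # end-of-sequence reached
            break
        output.append(tok)
    return output

# Toy "model" that wants to emit ten tokens (1..10) and then stop.
fake_model = lambda prefix: (len(prefix) + 1) if len(prefix) < 10 else 0

print(greedy_decode(fake_model, max_length=50))  # full answer: [1, 2, ..., 10]
print(greedy_decode(fake_model, max_length=5))   # truncated:   [1, 2, 3, 4, 5]
```

With max_length=5 the loop stops after five tokens even though the model "wanted" to say more, which is exactly why the answers in this environment look cut off compared to the Colab version.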