Working with JAX numpy and calculating perplexity: Ungraded Lecture Notebook
Normally you would import `numpy` and rename it as `np`. However, in this week's assignment you will notice that this convention has been changed: now standard `numpy` is not renamed, and `trax.fastmath.numpy` is renamed as `np`.
The rationale behind this change is that you will be using Trax's numpy (which is compatible with JAX) far more often. Trax's numpy supports most of the same functions as the regular numpy so the change won't be noticeable in most cases.
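A minimal sketch of this import convention (assuming `trax` and its JAX backend are installed):

```python
import numpy                  # regular numpy keeps its full name

from trax import fastmath

np = fastmath.numpy           # Trax's JAX-backed numpy takes the usual np alias
```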
One important change to take into consideration is that the type of the resulting objects will differ depending on the version of numpy used. With regular numpy you get `numpy.ndarray`, but with Trax's numpy you will get `jax.interpreters.xla.DeviceArray`. These two types map to each other, so if you find some error logs mentioning the `DeviceArray` type, don't worry about it: treat it like you would treat an `ndarray` and march ahead.
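A quick way to see the difference, reusing the aliases above (the exact class name printed for the Trax array depends on the installed JAX version):

```python
import numpy
from trax import fastmath

np = fastmath.numpy

a = numpy.array([1.0, 2.0, 3.0])
b = np.array([1.0, 2.0, 3.0])

print(type(a))   # <class 'numpy.ndarray'>
print(type(b))   # a JAX DeviceArray (exact class path varies by JAX version)
```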
You can get a randomized numpy array by using the `numpy.random.random()` function. This is one of the functionalities that Trax's numpy does not currently support in the same way as the regular numpy.
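For example, with regular numpy (the shape and seed here are just for illustration):

```python
import numpy

numpy.random.seed(32)                       # fix the seed for reproducibility
numpy_array = numpy.random.random((5, 10))  # uniform floats in [0, 1)
print(numpy_array.shape)                    # (5, 10)
```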
You can easily cast regular numpy arrays or lists into trax numpy arrays using the `trax.fastmath.numpy.array()` function:
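A short sketch, reusing the `numpy_array` created above:

```python
from trax import fastmath

np = fastmath.numpy

trax_numpy_array = np.array(numpy_array)    # cast a regular ndarray
from_list = np.array([1.0, 2.0, 3.0])       # plain Python lists work as well
print(type(trax_numpy_array))
```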
Hope you now understand the differences (and similarities) between these two versions of numpy. Great!
The previous section was a quick look at Trax's numpy. However, this notebook also aims to teach you how to calculate the perplexity of a trained model.
Calculating Perplexity
Perplexity is a metric that measures how well a probability model predicts a sample, and it is commonly used to evaluate language models. It is defined as:
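For a sentence $W = w_1, w_2, \dots, w_N$:

$$PP(W) = \sqrt[N]{\prod_{i=1}^{N} \frac{1}{P(w_i \mid w_1, \dots, w_{i-1})}}$$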
As an implementation hack, you would usually take the log of that formula. This lets you work directly with the log probabilities that the RNN outputs, and it converts exponents into products and products into sums, which makes the computation simpler and more efficient. You should also take care of the padding, since you do not want to include it when calculating the perplexity (otherwise the perplexity measure would look artificially good). The algebra behind this process is shown next:
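Taking logs of the definition above gives the quantity that is actually computed, where $N$ counts only the real (non-padding) words:

$$\log PP(W) = -\frac{1}{N} \sum_{i=1}^{N} \log P(w_i \mid w_1, \dots, w_{i-1})$$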
You will be working with a real example from this week's assignment. The example is made up of:
- `predictions`: batch of tensors corresponding to lines of text predicted by the model.
- `targets`: batch of actual tensors corresponding to lines of text.
Notice that the predictions have an extra dimension with the same length as the size of the vocabulary used.
Because of this you will need a way of reshaping `targets` to match this shape. For this you can use `trax.layers.one_hot()`.
Notice that `predictions.shape[-1]` will return the size of the last dimension of `predictions`.
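A sketch of that step, assuming `predictions` and `targets` are the batched arrays from the assignment:

```python
from trax import layers as tl
from trax import fastmath

np = fastmath.numpy

# One-hot encode the targets, using the vocabulary size read off the predictions
reshaped_targets = tl.one_hot(targets, n_categories=predictions.shape[-1])
print(reshaped_targets.shape)   # now matches predictions.shape
```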
By calculating the product of the predictions and the reshaped targets and summing across the last dimension, the total log perplexity can be computed:
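Continuing the sketch, the sum over the vocabulary axis keeps only the log probability assigned to the true word at each position:

```python
# Elementwise product with the one-hot targets, then sum over the vocabulary axis
total_log_ppx = np.sum(predictions * reshaped_targets, axis=-1)
print(total_log_ppx.shape)   # (batch_size, sequence_length)
```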
Now you will need to account for the padding so this metric is not artificially deflated (since a lower perplexity means a better model). To identify which elements are padding and which are not, you can compare the targets against the padding token with `np.equal()` and build a tensor with 1s in the positions of actual values and 0s where there is padding.
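A sketch of that mask, assuming the padding token id is 0:

```python
# np.equal(targets, 0) is 1 at padding positions; flip it so real tokens get 1.0
non_pad = 1.0 - np.equal(targets, 0)
print(non_pad)
```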
By computing the product of the total log perplexity and the `non_pad` tensor we remove the effect of padding on the metric:
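Continuing the sketch:

```python
# Zero out contributions that come from padded positions
real_log_ppx = total_log_ppx * non_pad
```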
You can check the effect of filtering out the padding by looking at the two log perplexity tensors:
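For instance, printing both tensors shows that the padded positions are now exactly 0:

```python
print(f'log probabilities before filtering padding:\n\n{total_log_ppx}\n')
print(f'log probabilities after filtering padding:\n\n{real_log_ppx}')
```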
To get a single average log perplexity across all the elements in the batch you can sum across both dimensions and divide by the number of non-padding elements. Notice that the result will be the negative of the real log perplexity of the model:
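Putting it together (negating and exponentiating recovers the perplexity itself):

```python
# Average over the real (non-padding) positions only
log_ppx = np.sum(real_log_ppx) / np.sum(non_pad)
print(f'log perplexity: {-log_ppx}, perplexity: {np.exp(-log_ppx)}')
```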
Congratulations on finishing this lecture notebook! Now you should have a clear understanding of how to work with Trax's numpy and how to compute the perplexity to evaluate your language models. Keep it up!