Path: blob/master/Natural Language Processing with Sequence Models/Week 2 - Recureent Neural Networks for Language Modelling/C3_W2_lecture_notebook_GRU.ipynb
Creating a GRU model using Trax: Ungraded Lecture Notebook
For this lecture notebook you will be using Trax's layers. These are the building blocks for creating neural networks with Trax.
Trax lets you define neural network architectures by stacking layers (similar to other libraries such as Keras). For this, the Serial() combinator is often used, since it stacks layers serially using function composition.
As an example, consider a simple vanilla NN built with Serial: one hidden (dense) layer with 128 units and an output (dense) layer with 10 units, followed by a final LogSoftmax layer.
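A minimal sketch of that architecture in Trax could look like the following. The function name `build_mlp` is hypothetical, and the ReLU between the two dense layers is an assumption, since the text does not name an activation. The import sits inside the function only so that the snippet still loads on a machine without Trax.

```python
def build_mlp():
    """Return the small feed-forward network described above as a Trax Serial layer."""
    # Imported lazily so this snippet can be loaded even where Trax is not installed.
    from trax import layers as tl

    return tl.Serial(
        tl.Dense(128),    # hidden dense layer with 128 units
        tl.Relu(),        # assumed activation; in Trax it is a layer of its own
        tl.Dense(10),     # output dense layer with 10 units
        tl.LogSoftmax(),  # final log-softmax layer
    )
```

With Trax installed, `mlp = build_mlp()` gives you the model object, and `print(mlp)` lists its sublayers.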
Each of the layers within the Serial combinator layer is considered a sublayer. Notice that, unlike in similar libraries, activation functions in Trax are considered layers. To learn more about the Serial layer, check the Trax documentation.
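To make the idea of serial stacking concrete, here is a toy sketch in plain Python. It is not Trax's implementation: `ToySerial` and the three small functions are hypothetical names made up for illustration. It only shows how a combinator can keep a list of sublayers and apply them one after another, with the activation treated as just another layer.

```python
class ToySerial:
    """Toy stand-in for a serial combinator: applies its sublayers in order."""

    def __init__(self, *layers):
        # Flatten one level of nesting so a list of layers can be passed as one argument.
        self.sublayers = []
        for layer in layers:
            if isinstance(layer, (list, tuple)):
                self.sublayers.extend(layer)
            else:
                self.sublayers.append(layer)

    def __call__(self, x):
        # Function composition: the output of each sublayer feeds the next one.
        for layer in self.sublayers:
            x = layer(x)
        return x


def double(x):
    return 2 * x


def relu(x):
    # The activation is one more callable in the stack, like any other layer.
    return max(0.0, x)


def minus_three(x):
    return x - 3


model = ToySerial(double, relu, minus_three)
print(len(model.sublayers))  # 3
print(model(2.0))            # 1.0
print(model(-1.0))           # -3.0
```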
You can try printing this object. Printing the model gives you exactly the same information as the model's definition itself.
By just looking at the definition you can clearly see what is going on inside the neural network. Trax is very straightforward in the way a network is defined; that is one of the things that makes it awesome!
GRU MODEL
To create a GRU model you will need to be familiar with the following layers (each one is described in the Trax layers documentation):
ShiftRight: Shifts the tensor to the right by padding on axis 1. The mode should be specified; it refers to the context in which the model is being used. Possible values are 'train', 'eval' or 'predict'; predict mode is for fast inference. Defaults to 'train'.
Embedding: Maps discrete tokens to vectors. It will have shape (vocabulary length x dimension of output vectors). The dimension of output vectors (also called d_feature) is the number of elements in the word embedding.
GRU: The GRU layer. It leverages another Trax layer called GRUCell. The number of GRU units should be specified and should match the number of elements in the word embedding. If you want to stack two consecutive GRU layers, you can do it with a Python list comprehension.
Dense: Vanilla Dense layer.
LogSoftmax: Log Softmax function.
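To get a feel for what three of these layers compute, here is a plain-Python sketch. It is not how Trax implements them: `shift_right`, `embed` and `log_softmax` are hypothetical helpers that work on nested lists, and the tiny token batch and embedding table are made-up values.

```python
import math


def shift_right(batch, n_positions=1, pad_value=0):
    """Shift each sequence right along axis 1, padding on the left and keeping the length."""
    shifted = []
    for seq in batch:
        padded = [pad_value] * n_positions + list(seq)
        shifted.append(padded[:len(seq)])
    return shifted


def embed(token_ids, table):
    """Map each token id to its row in a (vocab_size x d_feature) table."""
    return [table[token_id] for token_id in token_ids]


def log_softmax(logits):
    """Numerically stable log-softmax over a list of scores."""
    top = max(logits)
    log_total = top + math.log(sum(math.exp(x - top) for x in logits))
    return [x - log_total for x in logits]


tokens = [[5, 6, 7], [8, 9, 10]]
print(shift_right(tokens))  # [[0, 5, 6], [0, 8, 9]]

table = [[0.0, 0.1], [1.0, 1.1], [2.0, 2.1]]  # vocab_size=3, d_feature=2
print(embed([2, 0], table))  # [[2.0, 2.1], [0.0, 0.1]]

log_probs = log_softmax([1.0, 2.0, 3.0])
print(sum(math.exp(lp) for lp in log_probs))  # approximately 1.0
```

In a language model, the right shift means that the prediction at each position only sees the tokens that came before it.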
Putting everything together, the GRU model stacks these layers in the order listed: ShiftRight, Embedding, one or more GRU layers, Dense, and LogSoftmax.
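A minimal sketch of that stack in Trax could look like this. The function name `build_gru_model` is hypothetical and the hyperparameter values are illustrative assumptions, not values taken from the text. As before, the import sits inside the function only so that the snippet loads without Trax installed.

```python
# Hyperparameters; these particular values are illustrative assumptions.
mode = 'train'
vocab_size = 256
model_dimension = 512
n_layers = 2


def build_gru_model(vocab_size, d_model, n_layers, mode='train'):
    """Return a GRU language model as a Trax Serial layer."""
    # Imported lazily so this snippet can be loaded even where Trax is not installed.
    from trax import layers as tl

    return tl.Serial(
        tl.ShiftRight(mode=mode),  # pass the mode explicitly when not training
        tl.Embedding(vocab_size=vocab_size, d_feature=d_model),
        # Stack n_layers GRU layers; n_units matches the embedding size.
        [tl.GRU(n_units=d_model) for _ in range(n_layers)],
        tl.Dense(n_units=vocab_size),  # one score per vocabulary entry
        tl.LogSoftmax(),
    )
```

With Trax installed, `GRU = build_gru_model(vocab_size, model_dimension, n_layers, mode)` builds the model. The list of GRU layers is passed to Serial as a single argument and becomes part of the same stack.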
A small helper function can print information for every layer (each sublayer within Serial).
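One way to write such a helper is sketched below. It only assumes that the model exposes a `sublayers` attribute, as a Serial layer does. The `demo` object and its placeholder strings are made up so that the snippet runs without Trax; they are not real Trax output.

```python
from types import SimpleNamespace


def show_layers(model, layer_prefix="Serial.sublayers"):
    """Print the number of sublayers, then each sublayer with its index."""
    print(f"Total layers: {len(model.sublayers)}\n")
    for i, sublayer in enumerate(model.sublayers):
        print("========")
        print(f"{layer_prefix}_{i}: {sublayer}\n")


# Stand-in with a `sublayers` attribute, used only so the demo runs without Trax.
demo = SimpleNamespace(sublayers=["first layer", "second layer", "third layer"])
show_layers(demo)
```

With a real model you would call it the same way, for example `show_layers(GRU)` on a GRU model built with Serial.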
Try changing the parameters defined before the GRU model and see how it changes!
Hopefully you are now more familiar with creating GRU models using Trax.
You will train this model in this week's assignment and see it in action.
GRU and the Trax minions will return in this week's endgame.