Recurrent Neural Networks in Theano
Credits: Forked from summerschool2015 by mila-udem
First, we import some dependencies:
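The import cell is not reproduced here; a minimal set for the sketches that follow, assuming NumPy, Theano, and matplotlib, might be:

```python
import numpy as np
import theano
import theano.tensor as T
import matplotlib.pyplot as plt

floatX = theano.config.floatX  # float32 or float64, depending on configuration
```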
We now define a class that uses scan to apply an RNN to a sequence of data vectors. The constructor initializes the shared variables, after which the instance can be called on a symbolic variable to construct an RNN graph. Note that this class only handles the computation of the hidden layer activations. We'll define a set of output weights later.
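A sketch of such a class, assuming tanh units; the names (SimpleRNN, W_xh, W_hh, b_h) and the initialization scheme are our own choices, not necessarily the notebook's:

```python
class SimpleRNN(object):
    """Compute hidden-layer activations for an input sequence with scan."""

    def __init__(self, input_size, hidden_size):
        self.hidden_size = hidden_size
        # Small random input-to-hidden weights and a scaled-down
        # identity matrix for the recurrent weights
        self.W_xh = theano.shared(np.random.uniform(
            -0.1, 0.1, (input_size, hidden_size)).astype(floatX), name='W_xh')
        self.W_hh = theano.shared(
            (0.9 * np.eye(hidden_size)).astype(floatX), name='W_hh')
        self.b_h = theano.shared(np.zeros(hidden_size, dtype=floatX), name='b_h')
        self.parameters = [self.W_xh, self.W_hh, self.b_h]

    def __call__(self, x):
        # x is a symbolic matrix of shape (time steps, input_size)
        def step(x_t, h_tm1):
            return T.tanh(T.dot(x_t, self.W_xh) +
                          T.dot(h_tm1, self.W_hh) + self.b_h)

        h0 = T.zeros((self.hidden_size,))
        h, _ = theano.scan(step, sequences=x, outputs_info=[h0])
        return h
```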
For visualization purposes and to keep the optimization time manageable, we will train the RNN on a short synthetic chaotic time series. Let's first have a look at the data:
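The exact series from the original notebook is not reproduced here; as a hypothetical stand-in, the logistic map in its chaotic regime gives a comparable short chaotic sequence:

```python
# Stand-in data: the logistic map with r = 3.9 is chaotic and cheap to generate
n_steps = 1000
data = np.empty(n_steps, dtype=floatX)
data[0] = 0.5
for i in range(1, n_steps):
    data[i] = 3.9 * data[i - 1] * (1.0 - data[i - 1])

plt.plot(data[:200])
plt.title('First 200 steps of the synthetic chaotic series')
plt.show()
```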
To train an RNN model on this sequence, we need to build a Theano graph that computes the cost and its gradient. In this case, the task will be to predict the next time step, and the error objective will be the mean squared error (MSE). We also need to define shared variables for the output weights. Finally, we add a regularization term to the cost.
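Continuing the sketch; the sizes, the name W_ho, and the regularization coefficient are our assumptions:

```python
hidden_size = 10
rnn = SimpleRNN(input_size=1, hidden_size=hidden_size)

# Output weights mapping hidden activations to a scalar prediction
W_ho = theano.shared(np.random.uniform(
    -0.1, 0.1, (hidden_size, 1)).astype(floatX), name='W_ho')
b_o = theano.shared(np.zeros(1, dtype=floatX), name='b_o')
parameters = rnn.parameters + [W_ho, b_o]

x = T.matrix('x')  # input sequence, shape (time steps, 1)
t = T.matrix('t')  # targets: the same sequence shifted one step ahead

h = rnn(x)
y = T.dot(h, W_ho) + b_o

# Mean squared error plus a small L2 penalty on all parameters
mse = T.mean((y - t) ** 2)
l2 = sum(T.sum(p ** 2) for p in parameters)
cost = mse + 1e-4 * l2
gradients = T.grad(cost, parameters)
```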
We now compile the function that will update the parameters of the model using gradient descent.
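With plain gradient descent, the update function might be compiled along these lines (the learning rate is an arbitrary choice):

```python
learning_rate = 0.01
updates = [(p, p - learning_rate * g)
           for p, g in zip(parameters, gradients)]
train_step = theano.function([x, t], cost, updates=updates)
```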
We can now train the network by supplying this function with our data and calling it repeatedly.
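A simple training loop over the sketched function:

```python
# Inputs are steps 0..N-2; targets are the same series shifted by one
inputs = data[:-1, None]   # shape (N - 1, 1)
targets = data[1:, None]

for epoch in range(2000):
    c = train_step(inputs, targets)
    if epoch % 200 == 0:
        print('epoch %d, cost %.6f' % (epoch, c))
```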
Since we're only looking at a very small toy problem here, the model probably already memorized the train data quite well. Let's find out by plotting the predictions of the network:
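For example, compiling a prediction function and overlaying its output on the targets:

```python
predict = theano.function([x], y)

predictions = predict(inputs)
plt.plot(targets[:200], label='target')
plt.plot(predictions[:200], label='prediction')
plt.legend()
plt.show()
```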
Small-scale optimization problems of this type often benefit from more advanced second-order methods. The following block defines some functions that let you experiment with off-the-shelf optimization routines; in this case we use BFGS.
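The original helper functions are not shown here; one way to wire the sketched graph into scipy.optimize.minimize is to flatten the parameters into a single vector:

```python
from scipy.optimize import minimize

cost_and_grads = theano.function([x, t], [cost] + gradients)

def set_parameters(v):
    # Unpack a flat float64 vector into the shared parameter variables
    offset = 0
    for p in parameters:
        shape = p.get_value().shape
        size = int(np.prod(shape))
        p.set_value(v[offset:offset + size].reshape(shape).astype(floatX))
        offset += size

def objective(v):
    # Return the cost and its gradient for the given flat parameter vector
    set_parameters(v)
    outputs = cost_and_grads(inputs, targets)
    c, grads = outputs[0], outputs[1:]
    flat_grad = np.concatenate([g.ravel() for g in grads])
    return float(c), flat_grad.astype('float64')

v0 = np.concatenate([p.get_value().ravel()
                     for p in parameters]).astype('float64')
result = minimize(objective, v0, jac=True, method='BFGS')
set_parameters(result.x)
```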
Generating sequences
Predicting a single step ahead is a relatively easy task. It would be more interesting to see whether the network has actually learned to generate multiple time steps so that it can continue the sequence. Write code that generates the next 1000 examples after processing the train sequence.
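One possible solution sketch (not the notebook's reference implementation), reusing the hypothetical pieces defined above: compile a single-step function and feed each prediction back in as the next input.

```python
# One step of the network as a compiled function of the previous state
h_tm1 = T.vector('h_tm1')
x_t = T.vector('x_t')
h_t = T.tanh(T.dot(x_t, rnn.W_xh) + T.dot(h_tm1, rnn.W_hh) + rnn.b_h)
y_t = T.dot(h_t, W_ho) + b_o
one_step = theano.function([x_t, h_tm1], [y_t, h_t])

# Warm up the hidden state on the training sequence
h_val = np.zeros(hidden_size, dtype=floatX)
for value in inputs:
    out, h_val = one_step(value, h_val)

# Free-run: each prediction becomes the next input
generated = []
current = out
for _ in range(1000):
    generated.append(current[0])
    current, h_val = one_step(current, h_val)

plt.plot(generated)
plt.title('1000 generated steps')
plt.show()
```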
Things to Try
The quality of the generated sequence is probably not very good. Let's try to improve on it. Things to consider are:
- The initial weight values
- Using L2/L1 regularization (see the sketch after this list)
- Using weight noise
- The number of hidden units
- The non-linearity
- Adding direct connections between the input and the output
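For instance, the regularization item could be explored by modifying the sketched cost; the penalty coefficients below are arbitrary starting points:

```python
# Hypothetical variation on the cost above: add an L1 penalty
# alongside the existing L2 term
l1 = sum(T.sum(T.abs_(p)) for p in parameters)
cost = mse + 1e-4 * l2 + 1e-5 * l1
gradients = T.grad(cost, parameters)
```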