## Introduction
In this demo, you'll see a more practical application of RNNs/LSTMs as character-level language models. The emphasis will be more on parallelization and using RNNs with data from Fuel.
To get started, we first need to download the training text, the validation text and a file that contains a dictionary mapping characters to integers. We also need to import quite a few modules.
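The download and import cell is not reproduced here. Below is a minimal sketch of the setup it describes, assuming the two text files and a pickled character dictionary are already in the working directory; the file name `char2code.pkl` and the variable names are illustrative, not the notebook's originals.

```python
# Sketch of the imports this demo relies on (Python 2 / Theano era).
# 'char2code.pkl' is an illustrative name for the downloaded dictionary file.
import cPickle as pickle

import numpy as np
import theano
import theano.tensor as T

floatX = theano.config.floatX

# Load the dictionary that maps characters to integer codes.
with open('char2code.pkl', 'rb') as f:
    char2code = pickle.load(f)
code2char = {code: char for char, code in char2code.items()}

vocab_size = len(char2code)
```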
## The Model

The code below shows an implementation of an LSTM network. Note that there are several variations of the LSTM in use and that this one does not include the so-called 'peephole connections'. We used a separate method for the dynamic update to make it easier to generate from the network later. The `index_dot` function doesn't save much verbosity, but it makes clear that certain dot products have been replaced with indexing operations because this network will be applied to discrete data. Last but not least, note the addition of the `mask` argument, which is used to ignore certain parts of the input sequence.
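The original implementation is not included in this copy of the notebook. The following is a minimal sketch of such an LSTM layer (without peephole connections), assuming the imports above, integer-coded inputs of shape (time, batch) and a mask of the same shape; the names `LstmLayer` and `_step` are illustrative.

```python
def index_dot(indices, w):
    # For discrete inputs, a dot product with a one-hot vector is just a
    # row lookup in the weight matrix.
    return w[indices.flatten()]


class LstmLayer(object):
    def __init__(self, rng, input, mask, n_in, n_h):
        # input: (time, batch) integer matrix, mask: (time, batch) float matrix.
        def shared_normal(shape, scale=0.05):
            return theano.shared(np.asarray(
                rng.normal(scale=scale, size=shape), dtype=floatX))

        def shared_zeros(shape):
            return theano.shared(np.zeros(shape, dtype=floatX))

        # Input-to-hidden and hidden-to-hidden weights for the input (i),
        # forget (f), cell (c) and output (o) gates.
        self.W_xi, self.W_hi = shared_normal((n_in, n_h)), shared_normal((n_h, n_h))
        self.W_xf, self.W_hf = shared_normal((n_in, n_h)), shared_normal((n_h, n_h))
        self.W_xc, self.W_hc = shared_normal((n_in, n_h)), shared_normal((n_h, n_h))
        self.W_xo, self.W_ho = shared_normal((n_in, n_h)), shared_normal((n_h, n_h))
        self.b_i, self.b_f = shared_zeros(n_h), shared_zeros(n_h)
        self.b_c, self.b_o = shared_zeros(n_h), shared_zeros(n_h)
        self.params = [self.W_xi, self.W_hi, self.W_xf, self.W_hf,
                       self.W_xc, self.W_hc, self.W_xo, self.W_ho,
                       self.b_i, self.b_f, self.b_c, self.b_o]

        # Scan over the time dimension; the dynamic update lives in a separate
        # method so it can be reused for generation later.
        init_h = T.zeros((input.shape[1], n_h), dtype=floatX)
        init_c = T.zeros((input.shape[1], n_h), dtype=floatX)
        (h_seq, c_seq), _ = theano.scan(fn=self._step,
                                        sequences=[input, mask],
                                        outputs_info=[init_h, init_c])
        self.output = h_seq

    def _step(self, x_t, m_t, h_tm1, c_tm1):
        i_t = T.nnet.sigmoid(index_dot(x_t, self.W_xi) + T.dot(h_tm1, self.W_hi) + self.b_i)
        f_t = T.nnet.sigmoid(index_dot(x_t, self.W_xf) + T.dot(h_tm1, self.W_hf) + self.b_f)
        c_tilde = T.tanh(index_dot(x_t, self.W_xc) + T.dot(h_tm1, self.W_hc) + self.b_c)
        o_t = T.nnet.sigmoid(index_dot(x_t, self.W_xo) + T.dot(h_tm1, self.W_ho) + self.b_o)
        c_t = f_t * c_tm1 + i_t * c_tilde
        h_t = o_t * T.tanh(c_t)
        # Where the mask is 0 the sequence has ended: keep the previous state.
        m = m_t.dimshuffle(0, 'x')
        return m * h_t + (1. - m) * h_tm1, m * c_t + (1. - m) * c_tm1
```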
The next block contains some code that computes the cross-entropy for masked sequences, as well as a stripped-down version of the logistic regression class from the deep learning tutorials, which we will need later.
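Again as a hedged sketch rather than the original code, assuming predictions of shape (time, batch, vocabulary), integer targets of shape (time, batch) and the definitions from the sketches above:

```python
def sequence_categorical_crossentropy(prediction, targets, mask):
    # Flatten the time and batch dimensions, compute the per-timestep
    # cross-entropy and average only over the unmasked positions.
    pred_flat = prediction.reshape(
        (prediction.shape[0] * prediction.shape[1], prediction.shape[2]))
    targets_flat = targets.flatten()
    mask_flat = mask.flatten()
    ce = T.nnet.categorical_crossentropy(pred_flat, targets_flat)
    return T.sum(ce * mask_flat) / T.sum(mask_flat)


class LogisticRegression(object):
    # Stripped-down softmax output layer applied to a (time, batch, n_in) input.
    def __init__(self, rng, input, n_in, n_out):
        self.W = theano.shared(np.asarray(
            rng.normal(scale=0.05, size=(n_in, n_out)), dtype=floatX))
        self.b = theano.shared(np.zeros(n_out, dtype=floatX))
        energy = T.dot(input, self.W) + self.b
        energy_flat = energy.reshape(
            (energy.shape[0] * energy.shape[1], energy.shape[2]))
        p_flat = T.nnet.softmax(energy_flat)
        self.p_y_given_x = p_flat.reshape(
            (energy.shape[0], energy.shape[1], energy.shape[2]))
        self.params = [self.W, self.b]
```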
# Processing the Data

The data in `traindata.txt` and `valdata.txt` is simply English text, formatted so that every sentence is separated by a newline symbol. We'll use some of the functionality of Fuel to perform the following preprocessing steps (see the code sketch after this list):
- Convert everything to lowercase
- Map characters to indices
- Group the sentences into batches
- Convert each batch into a matrix/tensor as long as the longest sequence, padding all shorter sequences with zeros
- Add a mask matrix that encodes the length of each sequence (a timestep at which the mask is 0 indicates that there is no data available)
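A sketch of how these steps can be expressed with Fuel's `TextFile` dataset and stream transformers, assuming `char2code` is the dictionary loaded earlier and that it contains the `<UNK>` token; the batch size is an illustrative value.

```python
from fuel.datasets import TextFile
from fuel.streams import DataStream
from fuel.schemes import ConstantScheme
from fuel.transformers import Batch, Padding

batch_size = 100  # illustrative value

def lower(s):
    return s.lower()

# Character-level dataset that lowercases every line and maps characters to
# the integer codes from the dictionary.
train_dataset = TextFile(files=['traindata.txt'],
                         dictionary=char2code,
                         bos_token=None,
                         eos_token=None,
                         unk_token='<UNK>',
                         level='character',
                         preprocess=lower)

# Group sentences into batches and pad every batch to the length of its
# longest sequence; Padding also adds the corresponding mask source.
train_stream = Padding(Batch(DataStream(train_dataset),
                             iteration_scheme=ConstantScheme(batch_size)))
# The validation stream can be built the same way from 'valdata.txt'.
```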
## The Theano Graph

We'll now define the complete Theano graph for computing costs and gradients, among other things. The cost will be the cross-entropy of the next character in the sequence, which the network will try to predict based on the previous characters.
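A sketch of that graph, using the illustrative `LstmLayer` and `LogisticRegression` classes from above; plain SGD updates are used here as a stand-in, and the hidden layer size is an assumed value.

```python
n_h = 500  # illustrative hidden layer size

x = T.lmatrix('x')       # (time, batch) character indices
mask = T.matrix('mask')  # (time, batch) sequence mask

rng = np.random.RandomState(1234)
lstm_layer = LstmLayer(rng, x, mask, n_in=vocab_size, n_h=n_h)
softmax_layer = LogisticRegression(rng, lstm_layer.output, n_in=n_h, n_out=vocab_size)

# Predict the next character: the targets are the inputs shifted by one
# timestep, and the mask is shifted accordingly so padding does not contribute.
cost = sequence_categorical_crossentropy(softmax_layer.p_y_given_x[:-1],
                                         x[1:],
                                         mask[1:])

params = lstm_layer.params + softmax_layer.params
grads = T.grad(cost, params)
learning_rate = 0.1
updates = [(p, p - learning_rate * g) for p, g in zip(params, grads)]
```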
We can now compile the function that computes the gradients and updates the parameters. We also add a function that computes the cost without updating, for monitoring purposes.
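The compiled functions might look like this; `update_model` matches the call visible in the traceback further down, while the name `evaluate_model` is illustrative.

```python
update_model = theano.function(inputs=[x, mask], outputs=cost, updates=updates)
evaluate_model = theano.function(inputs=[x, mask], outputs=cost)
```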
## Generating Sequences

To see whether the network learns something useful (and to make monitoring of the results more entertaining), we'll also write some code to generate sequences. For this, we'll first compile a function that computes a single state update for the network, so that we have more control over the values of each variable at each time step.
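The notebook's generation code is not reproduced in this copy. The sketch below compiles a single-step update (reusing the `_step` method of the illustrative `LstmLayer`) and uses it to sample text character by character; all names, including the sampling prefix, are illustrative.

```python
# Symbolic variables for one timestep.
x_t = T.lvector('x_t')      # current character code for each sequence
h_tm1 = T.matrix('h_tm1')   # previous hidden state
c_tm1 = T.matrix('c_tm1')   # previous cell state

# One state update with a mask of ones (generated characters are always "real").
h_t, c_t = lstm_layer._step(x_t, T.ones_like(h_tm1[:, 0]), h_tm1, c_tm1)
energy_t = T.dot(h_t, softmax_layer.W) + softmax_layer.b
p_t = T.nnet.softmax(energy_t)

single_step = theano.function([x_t, h_tm1, c_tm1], [p_t, h_t, c_t])


def sample_text(prefix, n_steps, rng_np=np.random):
    # Feed the prefix one character at a time, then sample from the softmax.
    h = np.zeros((1, n_h), dtype=floatX)
    c = np.zeros((1, n_h), dtype=floatX)
    codes = [char2code[ch] for ch in prefix.lower()]
    n_prefix = len(codes)
    for i in range(n_prefix - 1 + n_steps):
        p, h, c = single_step(np.array([codes[i]], dtype='int64'), h, c)
        if i >= n_prefix - 1:
            # Sample the next character from the predicted distribution.
            codes.append(rng_np.choice(len(p[0]), p=p[0] / p[0].sum()))
    return ''.join(code2char[code] for code in codes)
```

The traceback below is the output of the main training loop, which was interrupted by hand. A sketch of such a loop, matching the `update_model(x_.T, mask_.T)` call visible in the traceback:

```python
iteration = 0
for epoch in range(10):
    for x_, mask_ in train_stream.get_epoch_iterator():
        iteration += 1
        # Fuel delivers (batch, time) arrays; the graph expects (time, batch).
        cross_entropy = update_model(x_.T, mask_.T)
        if iteration % 500 == 0:
            print('iteration %d, cross-entropy %f' % (iteration, cross_entropy))
            print(sample_text('the meaning of life is ', 100))
```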
```
---------------------------------------------------------------------------
KeyboardInterrupt                         Traceback (most recent call last)
<ipython-input-13-7c09df6ae427> in <module>()
      9         iteration += 1
     10
---> 11         cross_entropy = update_model(x_.T, mask_.T)
     12
     13

/home/pbrakel/Repositories/Theano/theano/compile/function_module.pyc in __call__(self, *args, **kwargs)
    577         t0_fn = time.time()
    578         try:
--> 579             outputs = self.fn()
    580         except Exception:
    581             if hasattr(self.fn, 'position_of_error'):

/home/pbrakel/Repositories/Theano/theano/scan_module/scan_op.pyc in rval(p, i, o, n)
    649         # default arguments are stored in the closure of `rval`
    650
--> 651         def rval(p=p, i=node_input_storage, o=node_output_storage, n=node):
    652             r = p(n, [x[0] for x in i], o)
    653             for o in node.outputs:

KeyboardInterrupt:
```
It can take a while before the generated text starts to look reasonable, but here are some things to experiment with:
- Smarter optimization algorithms (or at least momentum)
- Initializing the recurrent weights orthogonally
- The sizes of the initial weights and biases (think about what the gates do)
- Different sentence prefixes
- Changing the temperature of the character distribution during generation (see the sketch after this list). What happens when you generate deterministically?
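As a rough sketch of the last point, assuming `p` is a probability vector as returned by the illustrative `single_step` function above:

```python
import numpy as np

def apply_temperature(p, temperature):
    # Raising the probabilities to the power 1/T and renormalizing is the same
    # as dividing the softmax energies by T before normalizing.
    scaled = p ** (1.0 / temperature)
    return scaled / scaled.sum()

# temperature < 1 sharpens the distribution, temperature > 1 flattens it, and
# np.argmax(p) corresponds to fully deterministic generation (the T -> 0 limit).
```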