Character level language model - Dinosaurus Island
Welcome to Dinosaurus Island! 65 million years ago, dinosaurs existed, and in this assignment, they have returned.
You are in charge of a special task: Leading biology researchers are creating new breeds of dinosaurs and bringing them to life on earth, and your job is to give names to these dinosaurs. If a dinosaur does not like its name, it might go berserk 😉. So choose wisely!
Luckily you're equipped with some deep learning now, and you will use it to save the day! Your assistant has collected a list of all the dinosaur names they could find, and compiled them into a dataset. To create new dinosaur names, you will build a character-level language model to generate new names. Your algorithm will learn the different name patterns, and randomly generate new names. Hopefully this algorithm will keep you and your team safe from the dinosaurs' wrath!
By the time you complete this assignment, you'll be able to:
Store text data for processing using an RNN
Build a character-level text generation model using an RNN
Sample novel sequences in an RNN
Explain the vanishing/exploding gradient problem in RNNs
Apply gradient clipping as a solution for exploding gradients
Begin by loading in some functions that are provided for you in rnn_utils. Specifically, you have access to functions such as rnn_forward and rnn_backward, which are equivalent to those you've implemented in the previous assignment.
1.1 - Dataset and Preprocessing
The characters are a-z (26 characters) plus the "\n" (newline character).
In this assignment, the newline character "\n" plays a role similar to the <EOS> (or "End of sentence") token discussed in lecture. Here, "\n" indicates the end of the dinosaur name rather than the end of a sentence.
char_to_ix: In the cell below, you'll create a Python dictionary (i.e., a hash table) that maps each character to an index from 0 to 26.
ix_to_char: Then, you'll create a second Python dictionary that maps each index back to the corresponding character. This will help you figure out which index corresponds to which character in the probability distribution output of the softmax layer.
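As a rough sketch of that preprocessing (the two names below are a toy stand-in for the full dataset text read in by the notebook):

```python
# Sketch of the character/index mappings described above, using toy data.
data = "aachenosaurus\naardonyx\n"                       # stand-in for the dataset contents
chars = sorted(set(data))                                 # unique characters, including '\n'
char_to_ix = {ch: i for i, ch in enumerate(chars)}        # character -> index
ix_to_char = {i: ch for i, ch in enumerate(chars)}        # index -> character
```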
1.2 - Overview of the Model
Your model will have the following structure:
Initialize parameters
Run the optimization loop
Forward propagation to compute the loss function
Backward propagation to compute the gradients with respect to the loss function
Clip the gradients to avoid exploding gradients
Using the gradients, update your parameters with the gradient descent update rule.
Return the learned parameters

At each time-step, the RNN tries to predict what the next character is, given the previous characters.
$X = (x^{\langle 1 \rangle}, x^{\langle 2 \rangle}, ..., x^{\langle T_x \rangle})$ is a list of characters from the training set.
$Y = (y^{\langle 1 \rangle}, y^{\langle 2 \rangle}, ..., y^{\langle T_x \rangle})$ is the same list of characters but shifted one character forward.
At every time-step $t$, $y^{\langle t \rangle} = x^{\langle t+1 \rangle}$. The prediction at time $t$ is the same as the input at time $t+1$.
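For a concrete sense of this shift, here is a purely hypothetical example using the made-up name "trex" (the None placeholder for the zero input vector and the "\n" label are explained in Section 3):

```python
# Illustrative only: how X and Y line up for the (hypothetical) training name "trex".
X = [None, 't', 'r', 'e', 'x']    # None stands in for the zero input vector at t = 1
Y = ['t', 'r', 'e', 'x', '\n']    # Y[t] is X[t+1]: each label is the next input character
```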
2.1 - Clipping the Gradients in the Optimization Loop
In this section you will implement the clip function that you will call inside of your optimization loop.
Exploding gradients
When gradients are very large, they're called "exploding gradients."
Exploding gradients make the training process more difficult, because the updates may be so large that they "overshoot" the optimal values during back propagation.
Recall that your overall loop structure usually consists of:
forward pass,
cost computation,
backward pass,
parameter update.
Before updating the parameters, you will perform gradient clipping to make sure that your gradients are not "exploding."
Gradient clipping
In the exercise below, you will implement a function clip that takes in a dictionary of gradients and returns a clipped version of the gradients, if needed.
There are different ways to clip gradients.
You will use a simple element-wise clipping procedure, in which every element of the gradient vector is clipped to fall between some range [-N, N].
For example, if N = 10, the range is [-10, 10]:
If any component of the gradient vector is greater than 10, it is set to 10.
If any component of the gradient vector is less than -10, it is set to -10.
If any components are between -10 and 10, they keep their original values.

Exercise 1 - clip
Return the clipped gradients of your dictionary gradients.
Your function takes in a maximum threshold and returns the clipped versions of the gradients.
You can check out numpy.clip for more info.
You will need to use the out = ... argument. Using the out parameter allows you to update a variable "in-place". If you don't use the out argument, the clipped result is stored in a new variable and the gradient variables dWax, dWaa, dWya, db, and dby are not updated.
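As a rough sketch of that in-place clipping (the dictionary keys follow the gradient names above; the threshold and the gradient values are arbitrary):

```python
import numpy as np

def clip(gradients, maxValue):
    """Clip each gradient array in the dictionary to [-maxValue, maxValue], in place."""
    for key in ['dWax', 'dWaa', 'dWya', 'db', 'dby']:
        np.clip(gradients[key], -maxValue, maxValue, out=gradients[key])
    return gradients

# Example with made-up gradient values:
grads = {'dWax': np.array([[12.0, -3.0]]), 'dWaa': np.array([[-15.0, 4.0]]),
         'dWya': np.array([[8.0]]), 'db': np.array([[20.0]]), 'dby': np.array([[-1.0]])}
clip(grads, 10)
print(grads['dWax'])   # [[10. -3.]]
```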
Expected values
2.2 - Sampling
Now, assume that your model is trained, and you would like to generate new text (characters). The process of generation is explained in the picture below:

Step 1: Input the "dummy" vector of zeros $x^{\langle 1 \rangle} = \vec{0}$. This is the default input before you've generated any characters. Also set $a^{\langle 0 \rangle} = \vec{0}$.
Step 2: Run one step of forward propagation to get $a^{\langle 1 \rangle}$ and $\hat{y}^{\langle 1 \rangle}$. Here are the equations:
hidden state: $a^{\langle t+1 \rangle} = \tanh(W_{ax} x^{\langle t+1 \rangle} + W_{aa} a^{\langle t \rangle} + b)$
activation: $z^{\langle t+1 \rangle} = W_{ya} a^{\langle t+1 \rangle} + b_y$
prediction: $\hat{y}^{\langle t+1 \rangle} = \text{softmax}(z^{\langle t+1 \rangle})$
Details about $\hat{y}^{\langle t+1 \rangle}$:
Note that $\hat{y}^{\langle t+1 \rangle}$ is a (softmax) probability vector (its entries are between 0 and 1 and sum to 1).
$\hat{y}_i^{\langle t+1 \rangle}$ represents the probability that the character indexed by "i" is the next character.
A softmax() function is provided for you to use.
Additional Hints
$x^{\langle 1 \rangle}$ is x in the code. When creating the one-hot vector, make a numpy array of zeros, with the number of rows equal to the number of unique characters, and the number of columns equal to one. It's a 2D and not a 1D array.
$a^{\langle 0 \rangle}$ is a_prev in the code. It is a numpy array of zeros, where the number of rows is $n_a$, and the number of columns is 1. It is a 2D array as well. $n_a$ is retrieved by getting the number of columns in $W_{aa}$ (the numbers need to match in order for the matrix multiplication $W_{aa} a^{\langle 0 \rangle}$ to work).
Official documentation for numpy.dot and numpy.tanh
Step 3: Sampling:
Now that you have $\hat{y}^{\langle t+1 \rangle}$, you want to select the next letter in the dinosaur name. If you select the most probable, the model will always generate the same result given a starting letter. To make the results more interesting, use np.random.choice to select a next letter that is likely, but not always the same.
Pick the next character's index according to the probability distribution specified by $\hat{y}^{\langle t+1 \rangle}$.
This means that if $\hat{y}_i^{\langle t+1 \rangle} = 0.16$, you will pick the index "i" with 16% probability.
Use np.random.choice.
Example of how to use np.random.choice(): you will pick the index (idx) according to the probability distribution passed in via the p argument, as sketched below.
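A small self-contained illustration (the probability values are made up for the example):

```python
import numpy as np

probs = np.array([[0.1], [0.0], [0.7], [0.2]])                # 2D column vector, like y in the code
idx = np.random.choice(range(len(probs)), p=probs.ravel())    # p must be 1D, hence ravel()
# idx is 2 with probability 0.7, 3 with probability 0.2, 0 with probability 0.1, and never 1
```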
Note that the value passed to p should be a 1D vector.
Also notice that $\hat{y}^{\langle t+1 \rangle}$, which is y in the code, is a 2D array, so you'll need to flatten it (e.g., with ravel()) before passing it as p.
Also note: while in your implementation the first argument to np.random.choice is just an ordered list [0, 1, ..., vocab_len - 1], it is not appropriate to use char_to_ix.values(). The order of values returned by a Python dictionary's .values() call is the order in which they were added to the dictionary, and the grader may use a different insertion order when it runs your routine than when you run it in your notebook.
Step 4: Update $x^{\langle t \rangle}$ to $x^{\langle t+1 \rangle}$
The last step to implement in sample() is to update the variable x, which currently stores $x^{\langle t \rangle}$, with the value of $x^{\langle t+1 \rangle}$.
You will represent $x^{\langle t+1 \rangle}$ by creating a one-hot vector corresponding to the character that you have chosen as your prediction.
You will then forward propagate $x^{\langle t+1 \rangle}$ in Step 1 and keep repeating the process until you get a "\n" character, indicating that you have reached the end of the dinosaur name.
Additional Hints
In order to reset x before setting it to the new one-hot vector, you'll want to set all the values to zero.
You can either create a new numpy array: numpy.zeros
Or fill all values with a single number: numpy.ndarray.fill
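A minimal sketch of that update, assuming idx holds the sampled index and vocab_size the number of characters (both values below are made up for illustration):

```python
import numpy as np

vocab_size = 27                  # assumed vocabulary size
idx = 3                          # assumed sampled index

x = np.zeros((vocab_size, 1))    # reset x to all zeros (2D column vector)
x[idx] = 1                       # one-hot encode the character chosen as the prediction
```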
Expected output
What you should remember:
Very large, or "exploding", gradients can produce updates so large that they "overshoot" the optimal values during back prop, making training difficult
Clip gradients before updating the parameters to avoid exploding gradients
Sampling is a technique you can use to pick the index of the next character according to a probability distribution.
To begin character-level sampling:
Input a "dummy" vector of zeros as a default input
Run one step of forward propagation to get $a^{\langle 1 \rangle}$ (your first character) and $\hat{y}^{\langle 1 \rangle}$ (probability distribution for the following character)
When sampling, avoid generating the same result each time given the starting letter (and make your names more interesting!) by using np.random.choice
3 - Building the Language Model
It's time to build the character-level language model for text generation!
3.1 - Gradient Descent
In this section you will implement a function performing one step of stochastic gradient descent (with clipped gradients). You'll go through the training examples one at a time, so the optimization algorithm will be stochastic gradient descent.
As a reminder, here are the steps of a common optimization loop for an RNN:
Forward propagate through the RNN to compute the loss
Backward propagate through time to compute the gradients of the loss with respect to the parameters
Clip the gradients
Update the parameters using gradient descent
Exercise 3 - optimize
Implement the optimization process (one step of stochastic gradient descent).
The following functions are provided:
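(The code listing these helpers was dropped from this copy of the notebook. Assuming they match the previous assignment's rnn_utils, their signatures look roughly like this:)

```python
def rnn_forward(X, Y, a_prev, parameters):
    """Forward propagation through the RNN; computes the cross-entropy loss.
    Returns the loss and a cache of values needed for backpropagation."""
    ...

def rnn_backward(X, Y, parameters, cache):
    """Backpropagation through time; computes gradients of the loss w.r.t. the parameters.
    Returns the gradients and the hidden states."""
    ...

def update_parameters(parameters, gradients, learning_rate):
    """Updates parameters in place using the gradient descent update rule."""
    ...
```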
Recall that you previously implemented the clip function in Exercise 1.
Parameters
Note that the weights and biases inside the parameters dictionary are being updated by the optimization, even though parameters is not one of the returned values of the optimize function. The parameters dictionary is passed by reference into the function, so changes to this dictionary are visible even when it is accessed outside of the function.
Python dictionaries and lists are "pass by reference", which means that if you pass a dictionary into a function and modify it within the function, this changes that same dictionary (the function does not receive a copy).
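A tiny demonstration of that behavior (the function and dictionary here are hypothetical, just to show the in-place update):

```python
def scale_gradients(grads, factor):
    # Modifies the dictionary in place; nothing needs to be returned.
    for key in grads:
        grads[key] = grads[key] * factor

grads = {"dWax": 2.0, "dWaa": -4.0}
scale_gradients(grads, 0.5)
print(grads)   # {'dWax': 1.0, 'dWaa': -2.0} -- the caller's dictionary was changed
```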
Expected output
Given the dataset of dinosaur names, you'll use each line of the dataset (one name) as one training example.
Every 2000 steps of stochastic gradient descent, you will sample several randomly chosen names to see how the algorithm is doing.
Exercise 4 - model
Implement model().
When examples[index] contains one dinosaur name (a string), you can create an example (X, Y) using the approach sketched below:
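(The code snippet that originally appeared here was dropped from this copy. Below is a self-contained sketch reconstructing the idea from the hints that follow; the toy names and step number are made up for illustration.)

```python
# Self-contained sketch with toy data (the real notebook uses the shuffled dataset and
# the char_to_ix dictionary built earlier).
examples = ["trex", "raptor"]                              # toy example names
chars = sorted(set("".join(examples) + "\n"))
char_to_ix = {ch: i for i, ch in enumerate(chars)}

j = 3                                                      # current optimization step (illustrative)
idx = j % len(examples)                                    # wrap the index so it cycles through the list
single_example_ix = [char_to_ix[ch] for ch in examples[idx]]
X = [None] + single_example_ix                             # None flags the initial zero input vector
Y = X[1:] + [char_to_ix["\n"]]                             # labels: inputs shifted by one, ending with "\n"
```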
Set the index idx into the list of examples
Using the for-loop, walk through the shuffled list of dinosaur names in the list "examples."
For example, if there are n_e examples, and the for-loop increments the index to n_e and beyond, think of how you would make the index cycle back to 0, so that you can continue feeding examples into the model when j is n_e, n_e + 1, etc.
Hint: n_e + 1 divided by n_e is 1 with a remainder of 1, so (n_e + 1) % n_e equals 1. % is the modulo operator in python.
Extract a single example from the list of examples
single_example: use the idx index that you set previously to get one word from the list of examples.
Convert a string into a list of characters: single_example_chars
single_example_chars: a string is a sequence of characters, so you can iterate over it. You can use a list comprehension (recommended over for-loops) to generate the list of characters.
For more on list comprehensions, see the Python documentation on list comprehensions.
Convert list of characters to a list of integers: single_example_ix
Create a list that contains the index numbers associated with each character.
Use the dictionary char_to_ix.
You can combine this with the list comprehension that is used to get a list of characters from a string.
Create the list of input characters: X
rnn_forward uses the None value as a flag to set the input vector as a zero-vector.
Prepend the list [None] in front of the list of input characters.
There is more than one way to prepend a value to a list. One way is to add two lists together: ['a'] + ['b'].
Get the integer representation of the newline character: ix_newline
ix_newline: the newline character signals the end of the dinosaur name. Get the integer representation of the newline character '\n' using char_to_ix.
Set the list of labels (integer representation of the characters): Y
The goal is to train the RNN to predict the next letter in the name, so the labels are the list of characters that are one time-step ahead of the characters in the input X.
For example, Y[0] contains the same value as X[1].
The RNN should predict a newline at the last letter, so add ix_newline to the end of the labels.
Append the integer representation of the newline character to the end of Y.
Note that append is an in-place operation. It might be easier for you to add two lists together.
When you run the following cell, you should observe your model outputting random-looking characters at the first iteration. After a few thousand iterations, your model should learn to generate reasonable-looking names.
Expected output
Conclusion
You can see that your algorithm has started to generate plausible dinosaur names towards the end of training. At first, it was generating random characters, but towards the end you could begin to see dinosaur names with cool endings. Feel free to run the algorithm even longer and play with hyperparameters to see if you can get even better results! Our implementation generated some really cool names like maconucon, marloralus and macingsersaurus. Your model hopefully also learned that dinosaur names tend to end in saurus, don, aura, tor, etc.
If your model generates some non-cool names, don't blame the model entirely -- not all actual dinosaur names sound cool. (For example, dromaeosauroides is an actual dinosaur name and is in the training set.) But this model should give you a set of candidates from which you can pick the coolest!
This assignment used a relatively small dataset, so that you're able to train an RNN quickly on a CPU. Training a model of the English language requires a much bigger dataset, and usually much more computation, and could run for many hours on GPUs. We trained our dinosaur name model for quite some time, and so far our favorite name is the great, the fierce, the undefeated: Mangosaurus!
Congratulations!
You've finished the graded portion of this notebook and created a working language model! Awesome job.
By now, you've:
Stored text data for processing using an RNN
Built a character-level text generation model
Explored the vanishing/exploding gradient problem in RNNs
Applied gradient clipping to avoid exploding gradients
You've also hopefully generated some dinosaur names that are cool enough to please you and also avoid the wrath of the dinosaurs. If you had fun with the assignment, be sure not to miss the ungraded portion, where you'll be able to generate poetry like the Bard Himself. Good luck and have fun!
4 - Writing like Shakespeare (OPTIONAL/UNGRADED)
The rest of this notebook is optional and is not graded, but it's quite fun and informative, so you're highly encouraged to try it out!
A similar task to character-level text generation (but more complicated) is generating Shakespearean poems. Instead of learning from a dataset of dinosaur names, you can use a collection of Shakespearean poems. Using LSTM cells, you can learn longer-term dependencies that span many characters in the text -- e.g., where a character appearing early in a sequence can influence what a different character should be, much later in the sequence. These long-term dependencies were less important with dinosaur names, since the names were quite short.

Below, you can implement a Shakespeare poem generator with Keras. Run the following cell to load the required packages and models. This may take a few minutes.
To save you some time, a model has already been trained for ~1000 epochs on a collection of Shakespearean poems called "The Sonnets."
Let's train the model for one more epoch. When it finishes training for an epoch (this will also take a few minutes), you can run generate_output, which will prompt you for an input (< 40 characters). The poem will start with your sentence, and your RNN Shakespeare will complete the rest of the poem for you! For example, try "Forsooth this maketh no sense" (without the quotation marks!). Depending on whether you include the space at the end, your results might also differ, so try it both ways, and try other inputs as well.
Congratulations on finishing this notebook!
The RNN Shakespeare model is very similar to the one you built for dinosaur names. The only major differences are:
LSTMs instead of the basic RNN to capture longer-range dependencies
The model is a deeper, stacked LSTM model (2 layer)
Using Keras instead of a from-scratch NumPy implementation to simplify the code
5 - References
This exercise took inspiration from Andrej Karpathy's implementation: https://gist.github.com/karpathy/d4dee566867f8291f086. To learn more about text generation, also check out Karpathy's blog post.