English-to-Spanish translation with a sequence-to-sequence Transformer
Author: fchollet
Date created: 2021/05/26
Last modified: 2024/11/18
Description: Implementing a sequence-to-sequence Transformer and training it on a machine translation task.
Introduction
In this example, we'll build a sequence-to-sequence Transformer model, which we'll train on an English-to-Spanish machine translation task.
You'll learn how to:
- Vectorize text using the Keras `TextVectorization` layer.
- Implement a `TransformerEncoder` layer, a `TransformerDecoder` layer, and a `PositionalEmbedding` layer.
- Prepare data for training a sequence-to-sequence model.
- Use the trained model to generate translations of never-seen-before input sentences (sequence-to-sequence inference).
The code featured here is adapted from the book Deep Learning with Python, Second Edition (chapter 11: Deep learning for text). The present example is fairly barebones, so for detailed explanations of how each building block works, as well as the theory behind Transformers, I recommend reading the book.
Setup
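The setup code isn't reproduced here, but a minimal set of imports for this example might look like the following sketch (the backend choice is an assumption; Keras 3 can also run this example on the jax and torch backends):

```python
import os

# Assumption: use the TensorFlow backend ("jax" or "torch" would also work).
os.environ["KERAS_BACKEND"] = "tensorflow"

import pathlib
import random
import re
import string

import keras
from keras import layers
from keras.layers import TextVectorization
import tensorflow as tf  # used only for tf.data pipelines
```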
Downloading the data
We'll be working with an English-to-Spanish translation dataset provided by Anki. Let's download it:
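For instance, using keras.utils.get_file with the TensorFlow-hosted mirror of the dataset (the URL and the extracted layout below are assumptions):

```python
import pathlib
import keras

# Download and extract the zipped English-Spanish sentence pairs (assumed mirror URL).
zip_path = keras.utils.get_file(
    fname="spa-eng.zip",
    origin="http://storage.googleapis.com/download.tensorflow.org/data/spa-eng.zip",
    extract=True,
)
# Depending on the Keras version, get_file may return the archive path or the
# extracted directory; the extracted file is assumed to live at spa-eng/spa.txt.
text_file = pathlib.Path(zip_path).parent / "spa-eng" / "spa.txt"
```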
Parsing the data
Each line contains an English sentence and its corresponding Spanish sentence. The English sentence is the source sequence and the Spanish one is the target sequence. We prepend the token "[start]" and append the token "[end]" to the Spanish sentence.
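Concretely, the parsing step might look like the following sketch, assuming spa.txt is a tab-separated file of English/Spanish pairs (as in the Anki dataset):

```python
with open(text_file, encoding="utf-8") as f:
    lines = f.read().split("\n")[:-1]

text_pairs = []
for line in lines:
    eng, spa = line.split("\t")
    # Mark the beginning and end of each target (Spanish) sentence.
    spa = "[start] " + spa + " [end]"
    text_pairs.append((eng, spa))
```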
Here's what our sentence pairs look like:
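For example, printing a few random pairs (sketch):

```python
import random

for _ in range(5):
    print(random.choice(text_pairs))
```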
Next, we'll format our datasets.
At each training step, the model will seek to predict target words N+1 (and beyond) using the source sentence and the target words 0 to N.
As such, the training dataset will yield a tuple `(inputs, targets)`, where:

- `inputs` is a dictionary with the keys `encoder_inputs` and `decoder_inputs`. `encoder_inputs` is the vectorized source sentence and `decoder_inputs` is the target sentence "so far", that is to say, the words 0 to N used to predict word N+1 (and beyond) in the target sentence.
- `target` is the target sentence offset by one step: it provides the next words in the target sentence -- what the model will try to predict.
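A sketch of this formatting step is shown below. It assumes that `eng_vectorization` and `spa_vectorization` are `TextVectorization` layers already adapted to the source and target texts, and that `train_pairs` and `val_pairs` are splits of the sentence pairs; these specific names are assumptions.

```python
import tensorflow as tf

batch_size = 64  # matches the batch shape discussed below

def format_dataset(eng, spa):
    eng = eng_vectorization(eng)
    spa = spa_vectorization(spa)
    return (
        {
            "encoder_inputs": eng,          # full source sentence
            "decoder_inputs": spa[:, :-1],  # target tokens 0..N ("so far")
        },
        spa[:, 1:],  # target offset by one step: the tokens to predict
    )

def make_dataset(pairs):
    eng_texts, spa_texts = zip(*pairs)
    dataset = tf.data.Dataset.from_tensor_slices((list(eng_texts), list(spa_texts)))
    dataset = dataset.batch(batch_size)
    dataset = dataset.map(format_dataset, num_parallel_calls=tf.data.AUTOTUNE)
    return dataset.cache().shuffle(2048).prefetch(tf.data.AUTOTUNE)

train_ds = make_dataset(train_pairs)
val_ds = make_dataset(val_pairs)
```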
Let's take a quick look at the sequence shapes (we have batches of 64 pairs, and all sequences are 20 steps long):
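For example (sketch, using the train_ds dataset defined above):

```python
for inputs, targets in train_ds.take(1):
    print(f'inputs["encoder_inputs"].shape: {inputs["encoder_inputs"].shape}')
    print(f'inputs["decoder_inputs"].shape: {inputs["decoder_inputs"].shape}')
    print(f"targets.shape: {targets.shape}")
```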
Next, we assemble the end-to-end model.
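The layer implementations are not reproduced here, but assuming `TransformerEncoder`, `TransformerDecoder` and `PositionalEmbedding` are the custom layers listed in the introduction, the end-to-end assembly might look like this sketch (the hyperparameter values are assumptions consistent with the summary printed below):

```python
import keras
from keras import layers

# Assumed hyperparameters (consistent with the model summary below).
vocab_size = 15000
sequence_length = 20
embed_dim = 256
latent_dim = 2048
num_heads = 8

encoder_inputs = keras.Input(shape=(None,), dtype="int64", name="encoder_inputs")
x = PositionalEmbedding(sequence_length, vocab_size, embed_dim)(encoder_inputs)
encoder_outputs = TransformerEncoder(embed_dim, latent_dim, num_heads)(x)

decoder_inputs = keras.Input(shape=(None,), dtype="int64", name="decoder_inputs")
x = PositionalEmbedding(sequence_length, vocab_size, embed_dim)(decoder_inputs)
x = TransformerDecoder(embed_dim, latent_dim, num_heads)(x, encoder_outputs)
x = layers.Dropout(0.5)(x)
decoder_outputs = layers.Dense(vocab_size, activation="softmax")(x)

transformer = keras.Model(
    [encoder_inputs, decoder_inputs], decoder_outputs, name="transformer"
)
```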
Training our model
We'll use accuracy as a quick way to monitor training progress on the validation data. Note that machine translation typically uses BLEU scores as well as other metrics, rather than accuracy.
Here we only train for 1 epoch, but to get the model to actually converge you should train for at least 30 epochs.
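The training step itself can be as simple as the following sketch (the optimizer and loss are assumptions consistent with integer-encoded targets and the accuracy metric mentioned above):

```python
epochs = 1  # should be at least 30 for the model to converge

transformer.summary()
transformer.compile(
    "rmsprop",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)
transformer.fit(train_ds, epochs=epochs, validation_data=val_ds)
```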
```
Model: "transformer"
┏━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓
┃ Layer (type)        ┃ Output Shape      ┃    Param # ┃ Connected to      ┃
┡━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩
│ encoder_inputs      │ (None, None)      │          0 │ -                 │
│ (InputLayer)        │                   │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ decoder_inputs      │ (None, None)      │          0 │ -                 │
│ (InputLayer)        │                   │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ positional_embeddi… │ (None, None, 256) │  3,845,120 │ encoder_inputs[0… │
│ (PositionalEmbeddi… │                   │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ not_equal           │ (None, None)      │          0 │ encoder_inputs[0… │
│ (NotEqual)          │                   │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ positional_embeddi… │ (None, None, 256) │  3,845,120 │ decoder_inputs[0… │
│ (PositionalEmbeddi… │                   │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ transformer_encoder │ (None, None, 256) │  3,155,456 │ positional_embed… │
│ (TransformerEncode… │                   │            │ not_equal[0][0]   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ not_equal_1         │ (None, None)      │          0 │ decoder_inputs[0… │
│ (NotEqual)          │                   │            │                   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ transformer_decoder │ (None, None, 256) │  5,259,520 │ positional_embed… │
│ (TransformerDecode… │                   │            │ transformer_enco… │
│                     │                   │            │ not_equal_1[0][0… │
│                     │                   │            │ not_equal[0][0]   │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ dropout_3 (Dropout) │ (None, None, 256) │          0 │ transformer_deco… │
├─────────────────────┼───────────────────┼────────────┼───────────────────┤
│ dense_4 (Dense)     │ (None, None,      │  3,855,000 │ dropout_3[0][0]   │
│                     │ 15000)            │            │                   │
└─────────────────────┴───────────────────┴────────────┴───────────────────┘
 Total params: 19,960,216 (76.14 MB)
 Trainable params: 19,960,216 (76.14 MB)
 Non-trainable params: 0 (0.00 B)
```
Finally, let's demonstrate how to translate brand new English sentences. We simply feed the vectorized English sentence into the model along with the target token "[start]", then repeatedly generate the next token until we hit the token "[end]".
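A greedy decoding loop along these lines might look like the following sketch; `eng_vectorization`, `spa_vectorization` and the trained `transformer` model are assumed from the earlier steps, and the helper names below are assumptions:

```python
from keras import ops

# Map token indices back to words in the target-language vocabulary.
spa_vocab = spa_vectorization.get_vocabulary()
spa_index_lookup = dict(zip(range(len(spa_vocab)), spa_vocab))
max_decoded_sentence_length = 20

def decode_sequence(input_sentence):
    tokenized_input_sentence = eng_vectorization([input_sentence])
    decoded_sentence = "[start]"
    for i in range(max_decoded_sentence_length):
        tokenized_target_sentence = spa_vectorization([decoded_sentence])[:, :-1]
        predictions = transformer(
            {
                "encoder_inputs": tokenized_input_sentence,
                "decoder_inputs": tokenized_target_sentence,
            }
        )
        # Greedily pick the most likely next token.
        sampled_token_index = int(ops.convert_to_numpy(ops.argmax(predictions[0, i, :])))
        sampled_token = spa_index_lookup[sampled_token_index]
        decoded_sentence += " " + sampled_token
        if sampled_token == "[end]":
            break
    return decoded_sentence

print(decode_sequence("I love to write."))
```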
After 30 epochs, we get results such as:
She handed him the money. [start] ella le pasó el dinero [end]
Tom has never heard Mary sing. [start] tom nunca ha oído cantar a mary [end]
Perhaps she will come tomorrow. [start] tal vez ella vendrá mañana [end]
I love to write. [start] me encanta escribir [end]
His French is improving little by little. [start] su francés va a [UNK] sólo un poco [end]
My hotel told me to call you. [start] mi hotel me dijo que te [UNK] [end]