"""
Title: Timeseries classification with a Transformer model
Author: [Theodoros Ntakouris](https://github.com/ntakouris)
Date created: 2021/06/25
Last modified: 2021/08/05
Description: This notebook demonstrates how to do timeseries classification using a Transformer model.
Accelerator: GPU
"""

"""
## Introduction

This is the Transformer architecture from
[Attention Is All You Need](https://arxiv.org/abs/1706.03762),
applied to timeseries instead of natural language.

This example requires TensorFlow 2.4 or higher.

## Load the dataset

We are going to use the same dataset and preprocessing as the
[TimeSeries Classification from Scratch](https://keras.io/examples/timeseries/timeseries_classification_from_scratch)
example.
"""

import numpy as np
import keras
from keras import layers


def readucr(filename):
    # Each row of the FordA .tsv files is a label followed by the timeseries values.
    data = np.loadtxt(filename, delimiter="\t")
    y = data[:, 0]
    x = data[:, 1:]
    return x, y.astype(int)


root_url = "https://raw.githubusercontent.com/hfawaz/cd-diagram/master/FordA/"

x_train, y_train = readucr(root_url + "FordA_TRAIN.tsv")
x_test, y_test = readucr(root_url + "FordA_TEST.tsv")

# Add a channel dimension: (samples, timesteps) -> (samples, timesteps, 1).
x_train = x_train.reshape((x_train.shape[0], x_train.shape[1], 1))
x_test = x_test.reshape((x_test.shape[0], x_test.shape[1], 1))

n_classes = len(np.unique(y_train))

# Shuffle the training set.
idx = np.random.permutation(len(x_train))
x_train = x_train[idx]
y_train = y_train[idx]

# Relabel the classes from {-1, 1} to {0, 1}.
y_train[y_train == -1] = 0
y_test[y_test == -1] = 0
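"""
As a quick sanity check (not part of the original example), we can print the
shapes and label values we just produced: `x_train` should now be a 3D array of
shape `(samples, timesteps, 1)` and the labels should be 0/1.
"""

print("x_train:", x_train.shape, "y_train:", y_train.shape)
print("x_test: ", x_test.shape, "y_test: ", y_test.shape)
print("classes:", np.unique(y_train))  # expected: [0 1]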
"""
## Build the model

Our model processes a tensor of shape `(batch size, sequence length, features)`,
where `sequence length` is the number of time steps and `features` is the number
of values observed at each time step (here 1, since each input is a univariate
timeseries).

You can replace your classification RNN layers with this one: the
inputs are fully compatible!

We include residual connections, layer normalization, and dropout.
The resulting layer can be stacked multiple times.

The projection layers are implemented through `keras.layers.Conv1D`.
"""

# This implementation applies Layer Normalization before the residual connection
# to improve training stability by producing better-behaved gradients and often
# eliminating the need for learning rate warm-up.


def transformer_encoder(inputs, head_size, num_heads, ff_dim, dropout=0):
    # Attention and Normalization
    x = layers.MultiHeadAttention(
        key_dim=head_size, num_heads=num_heads, dropout=dropout
    )(inputs, inputs)
    x = layers.Dropout(dropout)(x)
    x = layers.LayerNormalization(epsilon=1e-6)(x)
    res = x + inputs

    # Feed Forward Part
    x = layers.Conv1D(filters=ff_dim, kernel_size=1, activation="relu")(res)
    x = layers.Dropout(dropout)(x)
    x = layers.Conv1D(filters=inputs.shape[-1], kernel_size=1)(x)
    x = layers.LayerNormalization(epsilon=1e-6)(x)
    return x + res

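"""
As a quick check (not part of the original example), note that a
`transformer_encoder` block preserves the shape of its input, which is what makes
it stackable and a drop-in replacement for an RNN layer. The `_probe` tensor
below exists only for this illustration:
"""

_probe = keras.Input(shape=(x_train.shape[1], x_train.shape[2]))
_encoded = transformer_encoder(_probe, head_size=256, num_heads=4, ff_dim=4)
print(_probe.shape, "->", _encoded.shape)  # time and feature dimensions are unchanged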
"""
The main part of our model is now complete. We can stack several of these
`transformer_encoder` blocks and then add the final Multi-Layer Perceptron
classification head. Besides a stack of `Dense` layers, we need to reduce the
output tensor of the stacked `transformer_encoder` blocks down to a vector of
features for each data point in the current batch. A common way to achieve this
is to use a pooling layer. For this example, a `GlobalAveragePooling1D` layer
is sufficient.
"""


def build_model(
    input_shape,
    head_size,
    num_heads,
    ff_dim,
    num_transformer_blocks,
    mlp_units,
    dropout=0,
    mlp_dropout=0,
):
    inputs = keras.Input(shape=input_shape)
    x = inputs
    # Stack the transformer encoder blocks.
    for _ in range(num_transformer_blocks):
        x = transformer_encoder(x, head_size, num_heads, ff_dim, dropout)

    # Classification head: pool over the time dimension, then an MLP, then softmax.
    x = layers.GlobalAveragePooling1D(data_format="channels_last")(x)
    for dim in mlp_units:
        x = layers.Dense(dim, activation="relu")(x)
        x = layers.Dropout(mlp_dropout)(x)
    outputs = layers.Dense(n_classes, activation="softmax")(x)
    return keras.Model(inputs, outputs)


"""
## Train and evaluate
"""

input_shape = x_train.shape[1:]

model = build_model(
    input_shape,
    head_size=256,
    num_heads=4,
    ff_dim=4,
    num_transformer_blocks=4,
    mlp_units=[128],
    mlp_dropout=0.4,
    dropout=0.25,
)

model.compile(
    loss="sparse_categorical_crossentropy",
    optimizer=keras.optimizers.Adam(learning_rate=1e-4),
    metrics=["sparse_categorical_accuracy"],
)
model.summary()

# Stop early (and restore the best weights) if the validation loss stops improving.
callbacks = [keras.callbacks.EarlyStopping(patience=10, restore_best_weights=True)]

model.fit(
    x_train,
    y_train,
    validation_split=0.2,
    epochs=150,
    batch_size=64,
    callbacks=callbacks,
)

model.evaluate(x_test, y_test, verbose=1)
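"""
As a final illustration (not part of the original example), the trained model can
be used for inference with `model.predict`, which returns per-class probabilities
from the softmax head; `argmax` recovers the predicted labels:
"""

probs = model.predict(x_test[:5])
print(probs.shape)  # (5, n_classes)
print("predicted:", probs.argmax(axis=-1), "actual:", y_test[:5])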
"""
## Conclusions

In about 110-120 epochs (25s each on Colab), the model reaches a training
accuracy of ~0.95, a validation accuracy of ~0.84, and a test accuracy of ~0.85,
without hyperparameter tuning. And that is for a model with fewer than 100k
parameters. Of course, parameter count and accuracy could be improved by a
hyperparameter search and a more sophisticated learning rate schedule, or a
different optimizer, as sketched below.
"""
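"""
As one illustrative variation (not part of the original example), the constant
learning rate could be replaced with a cosine decay schedule and `Adam` swapped
for `AdamW` (available in recent Keras versions). The values below are untuned
placeholders:
"""

# Illustrative sketch only: an alternative optimizer that could be passed to
# `model.compile` in place of Adam(learning_rate=1e-4).
lr_schedule = keras.optimizers.schedules.CosineDecay(
    initial_learning_rate=1e-3,
    decay_steps=150 * (len(x_train) // 64),  # roughly epochs * steps per epoch
)
alt_optimizer = keras.optimizers.AdamW(learning_rate=lr_schedule, weight_decay=1e-4)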