CoCalc -- timeseries_classification

GitHub Repository: keras-team/keras-io
Path: blob/master/examples/timeseries/timeseries_classification_transformer.py
³⁵⁰⁷ views
1
"""
2
Title: Timeseries classification with a Transformer model
3
Author: [Theodoros Ntakouris](https://github.com/ntakouris)
4
Date created: 2021/06/25
5
Last modified: 2021/08/05
6
Description: This notebook demonstrates how to do timeseries classification using a Transformer model.
7
Accelerator: GPU
8
"""
9

10
"""
11
## Introduction
12

13
This is the Transformer architecture from
14
[Attention Is All You Need](https://arxiv.org/abs/1706.03762),
15
applied to timeseries instead of natural language.
16

17
This example requires TensorFlow 2.4 or higher.
18

19
## Load the dataset
20

21
We are going to use the same dataset and preprocessing as the
22
[TimeSeries Classification from Scratch](https://keras.io/examples/timeseries/timeseries_classification_from_scratch)
23
example.
24
"""
25

26
import numpy as np
27
import keras
28
from keras import layers
29

30

31
def readucr(filename):
32
    data = np.loadtxt(filename, delimiter="\t")
33
    y = data[:, 0]
34
    x = data[:, 1:]
35
    return x, y.astype(int)
36

37

38
root_url = "https://raw.githubusercontent.com/hfawaz/cd-diagram/master/FordA/"
39

40
x_train, y_train = readucr(root_url + "FordA_TRAIN.tsv")
41
x_test, y_test = readucr(root_url + "FordA_TEST.tsv")
42

43
x_train = x_train.reshape((x_train.shape[0], x_train.shape[1], 1))
44
x_test = x_test.reshape((x_test.shape[0], x_test.shape[1], 1))
45

46
n_classes = len(np.unique(y_train))
47

48
idx = np.random.permutation(len(x_train))
49
x_train = x_train[idx]
50
y_train = y_train[idx]
51

52
y_train[y_train == -1] = 0
53
y_test[y_test == -1] = 0
54

55
"""
56
## Build the model
57

58
Our model processes a tensor of shape `(batch size, sequence length, features)`,
59
where `sequence length` is the number of time steps and `features` is each input
60
timeseries.
61

62
You can replace your classification RNN layers with this one: the
63
inputs are fully compatible!
64

65
We include residual connections, layer normalization, and dropout.
66
The resulting layer can be stacked multiple times.
67

68
The projection layers are implemented through `keras.layers.Conv1D`.
69
"""
70

71
# This implementation applies Layer Normalization before the residual connection
72
# to improve training stability by producing better-behaved gradients and often
73
# eliminating the need for learning rate warm-up.
74

75

76
def transformer_encoder(inputs, head_size, num_heads, ff_dim, dropout=0):
77
    # Attention and Normalization
78
    x = layers.MultiHeadAttention(
79
        key_dim=head_size, num_heads=num_heads, dropout=dropout
80
    )(inputs, inputs)
81
    x = layers.Dropout(dropout)(x)
82
    x = layers.LayerNormalization(epsilon=1e-6)(x)
83
    res = x + inputs
84

85
    # Feed Forward Part
86
    x = layers.Conv1D(filters=ff_dim, kernel_size=1, activation="relu")(res)
87
    x = layers.Dropout(dropout)(x)
88
    x = layers.Conv1D(filters=inputs.shape[-1], kernel_size=1)(x)
89
    x = layers.LayerNormalization(epsilon=1e-6)(x)
90
    return x + res
91

92

93
"""
94
The main part of our model is now complete. We can stack multiple of those
95
`transformer_encoder` blocks and we can also proceed to add the final
96
Multi-Layer Perceptron classification head. Apart from a stack of `Dense`
97
layers, we need to reduce the output tensor of the `TransformerEncoder` part of
98
our model down to a vector of features for each data point in the current
99
batch. A common way to achieve this is to use a pooling layer. For
100
this example, a `GlobalAveragePooling1D` layer is sufficient.
101
"""
102

103

104
def build_model(
105
    input_shape,
106
    head_size,
107
    num_heads,
108
    ff_dim,
109
    num_transformer_blocks,
110
    mlp_units,
111
    dropout=0,
112
    mlp_dropout=0,
113
):
114
    inputs = keras.Input(shape=input_shape)
115
    x = inputs
116
    for _ in range(num_transformer_blocks):
117
        x = transformer_encoder(x, head_size, num_heads, ff_dim, dropout)
118

119
    x = layers.GlobalAveragePooling1D(data_format="channels_last")(x)
120
    for dim in mlp_units:
121
        x = layers.Dense(dim, activation="relu")(x)
122
        x = layers.Dropout(mlp_dropout)(x)
123
    outputs = layers.Dense(n_classes, activation="softmax")(x)
124
    return keras.Model(inputs, outputs)
125

126

127
"""
128
## Train and evaluate
129
"""
130

131
input_shape = x_train.shape[1:]
132

133
model = build_model(
134
    input_shape,
135
    head_size=256,
136
    num_heads=4,
137
    ff_dim=4,
138
    num_transformer_blocks=4,
139
    mlp_units=[128],
140
    mlp_dropout=0.4,
141
    dropout=0.25,
142
)
143

144
model.compile(
145
    loss="sparse_categorical_crossentropy",
146
    optimizer=keras.optimizers.Adam(learning_rate=1e-4),
147
    metrics=["sparse_categorical_accuracy"],
148
)
149
model.summary()
150

151
callbacks = [keras.callbacks.EarlyStopping(patience=10, restore_best_weights=True)]
152

153
model.fit(
154
    x_train,
155
    y_train,
156
    validation_split=0.2,
157
    epochs=150,
158
    batch_size=64,
159
    callbacks=callbacks,
160
)
161

162
model.evaluate(x_test, y_test, verbose=1)
163

164
"""
165
## Conclusions
166

167
In about 110-120 epochs (25s each on Colab), the model reaches a training
168
accuracy of ~0.95, validation accuracy of ~84 and a testing
169
accuracy of ~85, without hyperparameter tuning. And that is for a model
170
with less than 100k parameters. Of course, parameter count and accuracy could be
171
improved by a hyperparameter search and a more sophisticated learning rate
172
schedule, or a different optimizer.
173

174
"""
175

176
Product

Resources

Company