"""
Title: Traffic forecasting using graph neural networks and LSTM
Author: [Arash Khodadadi](https://www.linkedin.com/in/arash-khodadadi-08a02490/)
Date created: 2021/12/28
Last modified: 2023/11/22
Description: This example demonstrates how to do timeseries forecasting over graphs.
Accelerator: GPU
"""

"""
## Introduction

This example shows how to forecast traffic conditions using graph neural networks and LSTM.
Specifically, we are interested in predicting the future values of the traffic speed given
a history of the traffic speed for a collection of road segments.

One popular method to
solve this problem is to consider each road segment's traffic speed as a separate
timeseries and predict the future values of each timeseries
using the past values of that same timeseries.

This method, however, ignores the dependency of the traffic speed of one road segment on
the neighboring segments. To take into account the complex interactions between
the traffic speed on a collection of neighboring roads, we can define the traffic network
as a graph and consider the traffic speed as a signal on this graph. In this example,
we implement a neural network architecture which can process timeseries data over a graph.
We first show how to process the data and create a
[tf.data.Dataset](https://www.tensorflow.org/api_docs/python/tf/data/Dataset) for
forecasting over graphs. Then, we implement a model which uses graph convolution and
LSTM layers to perform forecasting over a graph.

The data processing and the model architecture are inspired by this paper:

Yu, Bing, Haoteng Yin, and Zhanxing Zhu. "Spatio-temporal graph convolutional networks:
a deep learning framework for traffic forecasting." Proceedings of the 27th International
Joint Conference on Artificial Intelligence, 2018.
([github](https://github.com/VeritasYin/STGCN_IJCAI-18))
"""

"""
## Setup
"""

import os

os.environ["KERAS_BACKEND"] = "tensorflow"

import pandas as pd
import numpy as np
import typing
import matplotlib.pyplot as plt

import tensorflow as tf
import keras
from keras import layers
from keras import ops

"""
## Data preparation
"""

"""
### Data description

We use a real-world traffic speed dataset named `PeMSD7`. We use the version
collected and prepared by [Yu et al., 2018](https://arxiv.org/abs/1709.04875)
and available
[here](https://github.com/VeritasYin/STGCN_IJCAI-18/tree/master/dataset).

The data consists of two files:

- `PeMSD7_W_228.csv` contains the distances between 228
stations across District 7 of California.
- `PeMSD7_V_228.csv` contains the traffic
speed collected for those stations on weekdays in May and June of 2012.

The full description of the dataset can be found in
[Yu et al., 2018](https://arxiv.org/abs/1709.04875).
"""

"""
### Loading data
"""

url = "https://github.com/VeritasYin/STGCN_IJCAI-18/raw/master/dataset/PeMSD7_Full.zip"
data_dir = keras.utils.get_file(origin=url, extract=True, archive_format="zip")
# Strip the archive name to get the directory containing the extracted CSV files.
# (`str.rstrip` strips a *set of characters*, not a suffix, so use `removesuffix`.)
data_dir = data_dir.removesuffix("PeMSD7_Full.zip")

route_distances = pd.read_csv(
    os.path.join(data_dir, "PeMSD7_W_228.csv"), header=None
).to_numpy()
speeds_array = pd.read_csv(
    os.path.join(data_dir, "PeMSD7_V_228.csv"), header=None
).to_numpy()

print(f"route_distances shape={route_distances.shape}")
print(f"speeds_array shape={speeds_array.shape}")

"""
### Sub-sampling roads

To reduce the problem size and make the training faster, we will only
work with a sample of 26 roads out of the 228 roads in the dataset.
We have chosen the roads by starting from road 0, choosing the 5 closest
roads to it, and continuing this process until we have 26 roads in total. You can choose
any other subset of the roads. We chose the roads in this way to increase the likelihood
of having roads with correlated speed timeseries.
`sample_routes` contains the IDs of the selected roads.
"""

sample_routes = [
    0, 1, 4, 7, 8, 11, 15, 108, 109, 114, 115, 118, 120,
    123, 124, 126, 127, 129, 130, 132, 133, 136, 139, 144, 147, 216,
]
route_distances = route_distances[np.ix_(sample_routes, sample_routes)]
speeds_array = speeds_array[:, sample_routes]

print(f"route_distances shape={route_distances.shape}")
print(f"speeds_array shape={speeds_array.shape}")

"""
### Data visualization

Here are the timeseries of the traffic speed for two of the routes:
"""

plt.figure(figsize=(18, 6))
plt.plot(speeds_array[:, [0, -1]])
plt.legend(["route_0", "route_25"])

"""
We can also visualize the correlation between the timeseries of different routes.
"""

plt.figure(figsize=(8, 8))
plt.matshow(np.corrcoef(speeds_array.T), 0)
plt.xlabel("road number")
plt.ylabel("road number")

"""
Using this correlation heatmap, we can see that, for example, the speeds in
routes 4, 5, and 6 are highly correlated.
"""

"""
### Splitting and normalizing data

Next, we split the speed values array into train/validation/test sets,
and normalize the resulting arrays. The mean and standard deviation used for
normalization are computed on the training split only, so that no information
from the validation and test periods leaks into training:
"""

train_size, val_size = 0.5, 0.2


def preprocess(data_array: np.ndarray, train_size: float, val_size: float):
    """Splits data into train/val/test sets and normalizes the data.

    Args:
        data_array: ndarray of shape `(num_time_steps, num_routes)`
        train_size: A float value between 0.0 and 1.0 that represents the proportion of the dataset
            to include in the train split.
        val_size: A float value between 0.0 and 1.0 that represents the proportion of the dataset
            to include in the validation split.

    Returns:
        `train_array`, `val_array`, `test_array`
    """

    num_time_steps = data_array.shape[0]
    num_train, num_val = (
        int(num_time_steps * train_size),
        int(num_time_steps * val_size),
    )
    train_array = data_array[:num_train]
    mean, std = train_array.mean(axis=0), train_array.std(axis=0)

    train_array = (train_array - mean) / std
    val_array = (data_array[num_train : (num_train + num_val)] - mean) / std
    test_array = (data_array[(num_train + num_val) :] - mean) / std

    return train_array, val_array, test_array


train_array, val_array, test_array = preprocess(speeds_array, train_size, val_size)

print(f"train set size: {train_array.shape}")
print(f"validation set size: {val_array.shape}")
print(f"test set size: {test_array.shape}")

"""
### Creating TensorFlow Datasets

Next, we create the datasets for our forecasting problem. The forecasting problem
can be stated as follows: given a sequence of the
road speed values at times `t+1, t+2, ..., t+T`, we want to predict the future values of
the road speeds for times `t+T+1, ..., t+T+h`. So for each time `t` the inputs to our
model are `T` vectors each of size `N` and the targets are `h` vectors each of size `N`,
where `N` is the number of roads.
"""

"""
We use the Keras built-in function
`keras.utils.timeseries_dataset_from_array`.
The function `create_tf_dataset()` below takes as input a `numpy.ndarray` and returns a
`tf.data.Dataset`. In this function `input_sequence_length=T` and `forecast_horizon=h`.

The argument `multi_horizon` needs more explanation. Assume `forecast_horizon=3`.
If `multi_horizon=True` then the model will make a forecast for time steps
`t+T+1, t+T+2, t+T+3`, so the target for each sample will have shape
`(3, num_routes)`. But if `multi_horizon=False`, the model will make a forecast
only for time step `t+T+3`, so the target will have shape `(1, num_routes)`.

You may notice that the input tensor in each batch has shape
`(batch_size, input_sequence_length, num_routes, 1)`. The last dimension is added to
make the model more general: at each time step, the input features for each road may
contain multiple timeseries. For instance, one might want to use temperature timeseries
in addition to historical values of the speed as input features. In this example,
however, the last dimension of the input is always 1.

We use the last 12 values of the speed in each road to forecast the speed for 3 time
steps ahead. But first, the toy sketch below shows how `timeseries_dataset_from_array`
pairs input windows with target windows.
"""

batch_size = 64
input_sequence_length = 12
forecast_horizon = 3
multi_horizon = False


def create_tf_dataset(
    data_array: np.ndarray,
    input_sequence_length: int,
    forecast_horizon: int,
    batch_size: int = 128,
    shuffle=True,
    multi_horizon=True,
):
    """Creates a tensorflow dataset from a numpy array.

    This function creates a dataset where each element is a tuple `(inputs, targets)`.
    `inputs` is a Tensor
    of shape `(batch_size, input_sequence_length, num_routes, 1)` containing
    the `input_sequence_length` past values of the timeseries for each node.
    `targets` is a Tensor of shape `(batch_size, forecast_horizon, num_routes)`
    containing the `forecast_horizon`
    future values of the timeseries for each node.

    Args:
        data_array: np.ndarray with shape `(num_time_steps, num_routes)`
        input_sequence_length: Length of the input sequence (in number of timesteps).
        forecast_horizon: If `multi_horizon=True`, the target will be the values of the timeseries for 1 to
            `forecast_horizon` timesteps ahead. If `multi_horizon=False`, the target will be the value of the
            timeseries `forecast_horizon` steps ahead (only one value).
        batch_size: Number of timeseries samples in each batch.
        shuffle: Whether to shuffle output samples, or instead draw them in chronological order.
        multi_horizon: See `forecast_horizon`.

    Returns:
        A tf.data.Dataset instance.
    """

    inputs = keras.utils.timeseries_dataset_from_array(
        np.expand_dims(data_array[:-forecast_horizon], axis=-1),
        None,
        sequence_length=input_sequence_length,
        shuffle=False,
        batch_size=batch_size,
    )

    target_offset = (
        input_sequence_length
        if multi_horizon
        else input_sequence_length + forecast_horizon - 1
    )
    target_seq_length = forecast_horizon if multi_horizon else 1
    targets = keras.utils.timeseries_dataset_from_array(
        data_array[target_offset:],
        None,
        sequence_length=target_seq_length,
        shuffle=False,
        batch_size=batch_size,
    )

    dataset = tf.data.Dataset.zip((inputs, targets))
    if shuffle:
        dataset = dataset.shuffle(100)

    return dataset.prefetch(16).cache()


train_dataset, val_dataset = (
    create_tf_dataset(data_array, input_sequence_length, forecast_horizon, batch_size)
    for data_array in [train_array, val_array]
)

test_dataset = create_tf_dataset(
    test_array,
    input_sequence_length,
    forecast_horizon,
    batch_size=test_array.shape[0],
    shuffle=False,
    multi_horizon=multi_horizon,
)
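
"""
As a quick sanity check (a sketch, not part of the original pipeline), we can
peek at the shapes of one training batch. Note that the train/val datasets use
the default `multi_horizon=True`, so their targets cover all 3 horizons:

```python
inputs, targets = next(iter(train_dataset))
print(inputs.shape)   # (batch_size, input_sequence_length, num_routes, 1), e.g. (64, 12, 26, 1)
print(targets.shape)  # (batch_size, forecast_horizon, num_routes), e.g. (64, 3, 26)
```
"""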


"""
### Roads Graph

As mentioned before, we assume that the road segments form a graph.
The `PeMSD7` dataset contains the distances between road segments. The next step
is to create the graph adjacency matrix from these distances. Following
[Yu et al., 2018](https://arxiv.org/abs/1709.04875) (equation 10) we assume there
is an edge between two nodes in the graph if the distance between the corresponding roads
is less than a threshold.
"""


def compute_adjacency_matrix(
    route_distances: np.ndarray, sigma2: float, epsilon: float
):
    """Computes the adjacency matrix from the distances matrix.

    It uses the formula in https://github.com/VeritasYin/STGCN_IJCAI-18#data-preprocessing to
    compute an adjacency matrix from the distance matrix.
    The implementation follows that paper.

    Args:
        route_distances: np.ndarray of shape `(num_routes, num_routes)`. Entry `i,j` of this array is the
            distance between roads `i,j`.
        sigma2: Determines the width of the Gaussian kernel applied to the square distances matrix.
        epsilon: A threshold specifying if there is an edge between two nodes. Specifically, `A[i,j]=1`
            if `np.exp(-w2[i,j] / sigma2) >= epsilon` and `A[i,j]=0` otherwise, where `A` is the adjacency
            matrix and `w2=route_distances * route_distances`

    Returns:
        A boolean graph adjacency matrix.
    """
    num_routes = route_distances.shape[0]
    route_distances = route_distances / 10000.0
    w2, w_mask = (
        route_distances * route_distances,
        np.ones([num_routes, num_routes]) - np.identity(num_routes),
    )
    return (np.exp(-w2 / sigma2) >= epsilon) * w_mask


"""
The function `compute_adjacency_matrix()` returns a boolean adjacency matrix
where 1 means there is an edge between two nodes. We use the following class
to store the information about the graph.
"""


class GraphInfo:
    def __init__(self, edges: typing.Tuple[list, list], num_nodes: int):
        self.edges = edges
        self.num_nodes = num_nodes


sigma2 = 0.1
epsilon = 0.5
adjacency_matrix = compute_adjacency_matrix(route_distances, sigma2, epsilon)
node_indices, neighbor_indices = np.where(adjacency_matrix == 1)
graph = GraphInfo(
    edges=(node_indices.tolist(), neighbor_indices.tolist()),
    num_nodes=adjacency_matrix.shape[0],
)
print(f"number of nodes: {graph.num_nodes}, number of edges: {len(graph.edges[0])}")

"""
## Network architecture

Our model for forecasting over the graph consists of a graph convolution
layer and an LSTM layer.
"""

"""
### Graph convolution layer

Our implementation of the graph convolution layer resembles the implementation
in [this Keras example](https://keras.io/examples/graph/gnn_citations/). Note that
in that example the input to the layer is a 2D tensor of shape `(num_nodes, in_feat)`,
but in our example the input to the layer is a 4D tensor of shape
`(num_nodes, batch_size, input_seq_length, in_feat)`. The graph convolution layer
performs the following steps:

- The nodes' representations are computed in `self.compute_nodes_representation()`
by multiplying the input features by `self.weight`.
- The aggregated neighbors' messages are computed in `self.compute_aggregated_messages()`
by first aggregating the neighbors' representations and then multiplying the results by
`self.weight`.
- The final output of the layer is computed in `self.update()` by combining the nodes'
representations and the neighbors' aggregated messages.
"""


class GraphConv(layers.Layer):
    def __init__(
        self,
        in_feat,
        out_feat,
        graph_info: GraphInfo,
        aggregation_type="mean",
        combination_type="concat",
        activation: typing.Optional[str] = None,
        **kwargs,
    ):
        super().__init__(**kwargs)
        self.in_feat = in_feat
        self.out_feat = out_feat
        self.graph_info = graph_info
        self.aggregation_type = aggregation_type
        self.combination_type = combination_type
        self.weight = self.add_weight(
            initializer=keras.initializers.GlorotUniform(),
            shape=(in_feat, out_feat),
            dtype="float32",
            trainable=True,
        )
        self.activation = layers.Activation(activation)

    def aggregate(self, neighbour_representations):
        aggregation_func = {
            "sum": tf.math.unsorted_segment_sum,
            "mean": tf.math.unsorted_segment_mean,
            "max": tf.math.unsorted_segment_max,
        }.get(self.aggregation_type)

        if aggregation_func:
            return aggregation_func(
                neighbour_representations,
                self.graph_info.edges[0],
                num_segments=self.graph_info.num_nodes,
            )

        raise ValueError(f"Invalid aggregation type: {self.aggregation_type}")

    def compute_nodes_representation(self, features):
        """Computes each node's representation.

        The nodes' representations are obtained by multiplying the features tensor with
        `self.weight`. Note that
        `self.weight` has shape `(in_feat, out_feat)`.

        Args:
            features: Tensor of shape `(num_nodes, batch_size, input_seq_len, in_feat)`

        Returns:
            A tensor of shape `(num_nodes, batch_size, input_seq_len, out_feat)`
        """
        return ops.matmul(features, self.weight)

    def compute_aggregated_messages(self, features):
        neighbour_representations = tf.gather(features, self.graph_info.edges[1])
        aggregated_messages = self.aggregate(neighbour_representations)
        return ops.matmul(aggregated_messages, self.weight)

    def update(self, nodes_representation, aggregated_messages):
        if self.combination_type == "concat":
            h = ops.concatenate([nodes_representation, aggregated_messages], axis=-1)
        elif self.combination_type == "add":
            h = nodes_representation + aggregated_messages
        else:
            raise ValueError(f"Invalid combination type: {self.combination_type}.")
        return self.activation(h)

    def call(self, features):
        """Forward pass.

        Args:
            features: tensor of shape `(num_nodes, batch_size, input_seq_len, in_feat)`

        Returns:
            A tensor of shape `(num_nodes, batch_size, input_seq_len, out_feat)`
            (`2 * out_feat` in the last dimension when `combination_type="concat"`).
        """
        nodes_representation = self.compute_nodes_representation(features)
        aggregated_messages = self.compute_aggregated_messages(features)
        return self.update(nodes_representation, aggregated_messages)
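
"""
As a quick shape check (a sketch using the `graph` built above; the sizes are
arbitrary), note how `combination_type="concat"` doubles the feature dimension:

```python
x = tf.random.normal((graph.num_nodes, 8, 12, 1))  # (num_nodes, batch, seq, in_feat)
layer = GraphConv(in_feat=1, out_feat=4, graph_info=graph)
print(layer(x).shape)  # (26, 8, 12, 8): concat of two out_feat=4 tensors
```
"""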


"""
### LSTM plus graph convolution

By applying the graph convolution layer to the input tensor, we get another tensor
containing the nodes' representations over time (another 4D tensor). For each time
step, a node's representation is informed by the information from its neighbors.

To make good forecasts, however, we need not only information from a node's neighbors
but also need to process that information over time. To this end, we can pass each
node's tensor through a recurrent layer. The `LSTMGC` layer below first applies
a graph convolution layer to the inputs and then passes the results through an
`LSTM` layer.
"""


class LSTMGC(layers.Layer):
    """Layer comprising a graph convolution layer followed by LSTM and dense layers."""

    def __init__(
        self,
        in_feat,
        out_feat,
        lstm_units: int,
        input_seq_len: int,
        output_seq_len: int,
        graph_info: GraphInfo,
        graph_conv_params: typing.Optional[dict] = None,
        **kwargs,
    ):
        super().__init__(**kwargs)

        # graph conv layer
        if graph_conv_params is None:
            graph_conv_params = {
                "aggregation_type": "mean",
                "combination_type": "concat",
                "activation": None,
            }
        self.graph_conv = GraphConv(in_feat, out_feat, graph_info, **graph_conv_params)

        self.lstm = layers.LSTM(lstm_units, activation="relu")
        self.dense = layers.Dense(output_seq_len)

        self.input_seq_len, self.output_seq_len = input_seq_len, output_seq_len

    def call(self, inputs):
        """Forward pass.

        Args:
            inputs: tensor of shape `(batch_size, input_seq_len, num_nodes, in_feat)`

        Returns:
            A tensor of shape `(batch_size, output_seq_len, num_nodes)`.
        """

        # convert shape to (num_nodes, batch_size, input_seq_len, in_feat)
        inputs = ops.transpose(inputs, [2, 0, 1, 3])

        gcn_out = self.graph_conv(
            inputs
        )  # gcn_out has shape: (num_nodes, batch_size, input_seq_len, out_feat)
        shape = ops.shape(gcn_out)
        num_nodes, batch_size, input_seq_len, out_feat = (
            shape[0],
            shape[1],
            shape[2],
            shape[3],
        )

        # LSTM takes only 3D tensors as input
        gcn_out = ops.reshape(
            gcn_out, (batch_size * num_nodes, input_seq_len, out_feat)
        )
        lstm_out = self.lstm(
            gcn_out
        )  # lstm_out has shape: (batch_size * num_nodes, lstm_units)

        dense_output = self.dense(
            lstm_out
        )  # dense_output has shape: (batch_size * num_nodes, output_seq_len)
        output = ops.reshape(dense_output, (num_nodes, batch_size, self.output_seq_len))
        return ops.transpose(
            output, [1, 2, 0]
        )  # returns Tensor of shape (batch_size, output_seq_len, num_nodes)
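
"""
A matching shape check for `LSTMGC` (again a sketch with arbitrary sizes; the
layer maps a batch of input windows to one forecast vector per output step):

```python
x = tf.random.normal((8, 12, graph.num_nodes, 1))  # (batch, seq, num_nodes, in_feat)
layer = LSTMGC(1, 4, lstm_units=16, input_seq_len=12, output_seq_len=3, graph_info=graph)
print(layer(x).shape)  # (8, 3, 26)
```
"""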


"""
## Model training
"""

in_feat = 1
batch_size = 64
epochs = 20
input_sequence_length = 12
forecast_horizon = 3
multi_horizon = False
out_feat = 10
lstm_units = 64
graph_conv_params = {
    "aggregation_type": "mean",
    "combination_type": "concat",
    "activation": None,
}

st_gcn = LSTMGC(
    in_feat,
    out_feat,
    lstm_units,
    input_sequence_length,
    forecast_horizon,
    graph,
    graph_conv_params,
)
inputs = layers.Input((input_sequence_length, graph.num_nodes, in_feat))
outputs = st_gcn(inputs)

model = keras.models.Model(inputs, outputs)
model.compile(
    optimizer=keras.optimizers.RMSprop(learning_rate=0.0002),
    loss=keras.losses.MeanSquaredError(),
)
model.fit(
    train_dataset,
    validation_data=val_dataset,
    epochs=epochs,
    callbacks=[keras.callbacks.EarlyStopping(patience=10)],
)

"""
## Making forecasts on test set

Now we can use the trained model to make forecasts for the test set. Below, we
compute the MAE of the model and compare it to the MAE of naive forecasts.
The naive forecast is the last observed value of the speed for each node.
"""

x_test, y = next(test_dataset.as_numpy_iterator())
y_pred = model.predict(x_test)
plt.figure(figsize=(18, 6))
plt.plot(y[:, 0, 0])
plt.plot(y_pred[:, 0, 0])
plt.legend(["actual", "forecast"])

# Mean absolute error of the naive forecast (the last observed input value)
# and of the model's forecast.
naive_mae, model_mae = (
    np.abs(x_test[:, -1, :, 0] - y[:, 0, :]).mean(),
    np.abs(y_pred[:, 0, :] - y[:, 0, :]).mean(),
)
print(f"naive MAE: {naive_mae}, model MAE: {model_mae}")

"""
Of course, the goal here is to demonstrate the method,
not to achieve the best performance. To improve the
model's accuracy, all model hyperparameters should be tuned carefully. In addition,
several `LSTMGC` blocks can be stacked to increase the representation power
of the model.
"""