CoCalc -- timeseries_traffic

GitHub Repository: keras-team/keras-io
Path: blob/master/examples/timeseries/timeseries_traffic_forecasting.py
³⁵⁰⁷ views
1
"""
2
Title: Traffic forecasting using graph neural networks and LSTM
3
Author: [Arash Khodadadi](https://www.linkedin.com/in/arash-khodadadi-08a02490/)
4
Date created: 2021/12/28
5
Last modified: 2023/11/22
6
Description: This example demonstrates how to do timeseries forecasting over graphs.
7
Accelerator: GPU
8
"""
9

10
"""
11
## Introduction
12

13
This example shows how to forecast traffic condition using graph neural networks and LSTM.
14
Specifically, we are interested in predicting the future values of the traffic speed given
15
a history of the traffic speed for a collection of road segments.
16

17
One popular method to
18
solve this problem is to consider each road segment's traffic speed as a separate
19
timeseries and predict the future values of each timeseries
20
using the past values of the same timeseries.
21

22
This method, however, ignores the dependency of the traffic speed of one road segment on
23
the neighboring segments. To be able to take into account the complex interactions between
24
the traffic speed on a collection of neighboring roads, we can define the traffic network
25
as a graph and consider the traffic speed as a signal on this graph. In this example,
26
we implement a neural network architecture which can process timeseries data over a graph.
27
We first show how to process the data and create a
28
[tf.data.Dataset](https://www.tensorflow.org/api_docs/python/tf/data/Dataset) for
29
forecasting over graphs. Then, we implement a model which uses graph convolution and
30
LSTM layers to perform forecasting over a graph.
31

32
The data processing and the model architecture are inspired by this paper:
33

34
Yu, Bing, Haoteng Yin, and Zhanxing Zhu. "Spatio-temporal graph convolutional networks:
35
a deep learning framework for traffic forecasting." Proceedings of the 27th International
36
Joint Conference on Artificial Intelligence, 2018.
37
([github](https://github.com/VeritasYin/STGCN_IJCAI-18))
38
"""
39

40
"""
41
## Setup
42
"""
43

44
import os
45

46
os.environ["KERAS_BACKEND"] = "tensorflow"
47

48
import pandas as pd
49
import numpy as np
50
import typing
51
import matplotlib.pyplot as plt
52

53
import tensorflow as tf
54
import keras
55
from keras import layers
56
from keras import ops
57

58
"""
59
## Data preparation
60
"""
61

62
"""
63
### Data description
64

65
We use a real-world traffic speed dataset named `PeMSD7`. We use the version
66
collected and prepared by [Yu et al., 2018](https://arxiv.org/abs/1709.04875)
67
and available
68
[here](https://github.com/VeritasYin/STGCN_IJCAI-18/tree/master/dataset).
69

70
The data consists of two files:
71

72
- `PeMSD7_W_228.csv` contains the distances between 228
73
stations across the District 7 of California.
74
- `PeMSD7_V_228.csv` contains traffic
75
speed collected for those stations in the weekdays of May and June of 2012.
76

77
The full description of the dataset can be found in
78
[Yu et al., 2018](https://arxiv.org/abs/1709.04875).
79
"""
80

81
"""
82
### Loading data
83
"""
84

85
url = "https://github.com/VeritasYin/STGCN_IJCAI-18/raw/master/dataset/PeMSD7_Full.zip"
86
data_dir = keras.utils.get_file(origin=url, extract=True, archive_format="zip")
87
data_dir = data_dir.rstrip("PeMSD7_Full.zip")
88

89
route_distances = pd.read_csv(
90
    os.path.join(data_dir, "PeMSD7_W_228.csv"), header=None
91
).to_numpy()
92
speeds_array = pd.read_csv(
93
    os.path.join(data_dir, "PeMSD7_V_228.csv"), header=None
94
).to_numpy()
95

96
print(f"route_distances shape={route_distances.shape}")
97
print(f"speeds_array shape={speeds_array.shape}")
98

99
"""
100
### sub-sampling roads
101

102
To reduce the problem size and make the training faster, we will only
103
work with a sample of 26 roads out of the 228 roads in the dataset.
104
We have chosen the roads by starting from road 0, choosing the 5 closest
105
roads to it, and continuing this process until we get 25 roads. You can choose
106
any other subset of the roads. We chose the roads in this way to increase the likelihood
107
of having roads with correlated speed timeseries.
108
`sample_routes` contains the IDs of the selected roads.
109
"""
110

111
sample_routes = [
112
    0,
113
    1,
114
    4,
115
    7,
116
    8,
117
    11,
118
    15,
119
    108,
120
    109,
121
    114,
122
    115,
123
    118,
124
    120,
125
    123,
126
    124,
127
    126,
128
    127,
129
    129,
130
    130,
131
    132,
132
    133,
133
    136,
134
    139,
135
    144,
136
    147,
137
    216,
138
]
139
route_distances = route_distances[np.ix_(sample_routes, sample_routes)]
140
speeds_array = speeds_array[:, sample_routes]
141

142
print(f"route_distances shape={route_distances.shape}")
143
print(f"speeds_array shape={speeds_array.shape}")
144

145
"""
146
### Data visualization
147

148
Here are the timeseries of the traffic speed for two of the routes:
149
"""
150

151
plt.figure(figsize=(18, 6))
152
plt.plot(speeds_array[:, [0, -1]])
153
plt.legend(["route_0", "route_25"])
154

155
"""
156
We can also visualize the correlation between the timeseries in different routes.
157
"""
158

159
plt.figure(figsize=(8, 8))
160
plt.matshow(np.corrcoef(speeds_array.T), 0)
161
plt.xlabel("road number")
162
plt.ylabel("road number")
163

164
"""
165
Using this correlation heatmap, we can see that for example the speed in
166
routes 4, 5, 6 are highly correlated.
167
"""
168

169
"""
170
### Splitting and normalizing data
171

172
Next, we split the speed values array into train/validation/test sets,
173
and normalize the resulting arrays:
174
"""
175

176
train_size, val_size = 0.5, 0.2
177

178

179
def preprocess(data_array: np.ndarray, train_size: float, val_size: float):
180
    """Splits data into train/val/test sets and normalizes the data.
181

182
    Args:
183
        data_array: ndarray of shape `(num_time_steps, num_routes)`
184
        train_size: A float value between 0.0 and 1.0 that represent the proportion of the dataset
185
            to include in the train split.
186
        val_size: A float value between 0.0 and 1.0 that represent the proportion of the dataset
187
            to include in the validation split.
188

189
    Returns:
190
        `train_array`, `val_array`, `test_array`
191
    """
192

193
    num_time_steps = data_array.shape[0]
194
    num_train, num_val = (
195
        int(num_time_steps * train_size),
196
        int(num_time_steps * val_size),
197
    )
198
    train_array = data_array[:num_train]
199
    mean, std = train_array.mean(axis=0), train_array.std(axis=0)
200

201
    train_array = (train_array - mean) / std
202
    val_array = (data_array[num_train : (num_train + num_val)] - mean) / std
203
    test_array = (data_array[(num_train + num_val) :] - mean) / std
204

205
    return train_array, val_array, test_array
206

207

208
train_array, val_array, test_array = preprocess(speeds_array, train_size, val_size)
209

210
print(f"train set size: {train_array.shape}")
211
print(f"validation set size: {val_array.shape}")
212
print(f"test set size: {test_array.shape}")
213

214
"""
215
### Creating TensorFlow Datasets
216

217
Next, we create the datasets for our forecasting problem. The forecasting problem
218
can be stated as follows: given a sequence of the
219
road speed values at times `t+1, t+2, ..., t+T`, we want to predict the future values of
220
the roads speed for times `t+T+1, ..., t+T+h`. So for each time `t` the inputs to our
221
model are `T` vectors each of size `N` and the targets are `h` vectors each of size `N`,
222
where `N` is the number of roads.
223
"""
224

225
"""
226
We use the Keras built-in function
227
`keras.utils.timeseries_dataset_from_array`.
228
The function `create_tf_dataset()` below takes as input a `numpy.ndarray` and returns a
229
`tf.data.Dataset`. In this function `input_sequence_length=T` and `forecast_horizon=h`.
230

231
The argument `multi_horizon` needs more explanation. Assume `forecast_horizon=3`.
232
If `multi_horizon=True` then the model will make a forecast for time steps
233
`t+T+1, t+T+2, t+T+3`. So the target will have shape `(T,3)`. But if
234
`multi_horizon=False`, the model will make a forecast only for time step `t+T+3` and
235
so the target will have shape `(T, 1)`.
236

237
You may notice that the input tensor in each batch has shape
238
`(batch_size, input_sequence_length, num_routes, 1)`. The last dimension is added to
239
make the model more general: at each time step, the input features for each raod may
240
contain multiple timeseries. For instance, one might want to use temperature timeseries
241
in addition to historical values of the speed as input features. In this example,
242
however, the last dimension of the input is always 1.
243

244
We use the last 12 values of the speed in each road to forecast the speed for 3 time
245
steps ahead:
246
"""
247

248
batch_size = 64
249
input_sequence_length = 12
250
forecast_horizon = 3
251
multi_horizon = False
252

253

254
def create_tf_dataset(
255
    data_array: np.ndarray,
256
    input_sequence_length: int,
257
    forecast_horizon: int,
258
    batch_size: int = 128,
259
    shuffle=True,
260
    multi_horizon=True,
261
):
262
    """Creates tensorflow dataset from numpy array.
263

264
    This function creates a dataset where each element is a tuple `(inputs, targets)`.
265
    `inputs` is a Tensor
266
    of shape `(batch_size, input_sequence_length, num_routes, 1)` containing
267
    the `input_sequence_length` past values of the timeseries for each node.
268
    `targets` is a Tensor of shape `(batch_size, forecast_horizon, num_routes)`
269
    containing the `forecast_horizon`
270
    future values of the timeseries for each node.
271

272
    Args:
273
        data_array: np.ndarray with shape `(num_time_steps, num_routes)`
274
        input_sequence_length: Length of the input sequence (in number of timesteps).
275
        forecast_horizon: If `multi_horizon=True`, the target will be the values of the timeseries for 1 to
276
            `forecast_horizon` timesteps ahead. If `multi_horizon=False`, the target will be the value of the
277
            timeseries `forecast_horizon` steps ahead (only one value).
278
        batch_size: Number of timeseries samples in each batch.
279
        shuffle: Whether to shuffle output samples, or instead draw them in chronological order.
280
        multi_horizon: See `forecast_horizon`.
281

282
    Returns:
283
        A tf.data.Dataset instance.
284
    """
285

286
    inputs = keras.utils.timeseries_dataset_from_array(
287
        np.expand_dims(data_array[:-forecast_horizon], axis=-1),
288
        None,
289
        sequence_length=input_sequence_length,
290
        shuffle=False,
291
        batch_size=batch_size,
292
    )
293

294
    target_offset = (
295
        input_sequence_length
296
        if multi_horizon
297
        else input_sequence_length + forecast_horizon - 1
298
    )
299
    target_seq_length = forecast_horizon if multi_horizon else 1
300
    targets = keras.utils.timeseries_dataset_from_array(
301
        data_array[target_offset:],
302
        None,
303
        sequence_length=target_seq_length,
304
        shuffle=False,
305
        batch_size=batch_size,
306
    )
307

308
    dataset = tf.data.Dataset.zip((inputs, targets))
309
    if shuffle:
310
        dataset = dataset.shuffle(100)
311

312
    return dataset.prefetch(16).cache()
313

314

315
train_dataset, val_dataset = (
316
    create_tf_dataset(data_array, input_sequence_length, forecast_horizon, batch_size)
317
    for data_array in [train_array, val_array]
318
)
319

320
test_dataset = create_tf_dataset(
321
    test_array,
322
    input_sequence_length,
323
    forecast_horizon,
324
    batch_size=test_array.shape[0],
325
    shuffle=False,
326
    multi_horizon=multi_horizon,
327
)
328

329

330
"""
331
### Roads Graph
332

333
As mentioned before, we assume that the road segments form a graph.
334
The `PeMSD7` dataset has the road segments distance. The next step
335
is to create the graph adjacency matrix from these distances. Following
336
[Yu et al., 2018](https://arxiv.org/abs/1709.04875) (equation 10) we assume there
337
is an edge between two nodes in the graph if the distance between the corresponding roads
338
is less than a threshold.
339
"""
340

341

342
def compute_adjacency_matrix(
343
    route_distances: np.ndarray, sigma2: float, epsilon: float
344
):
345
    """Computes the adjacency matrix from distances matrix.
346

347
    It uses the formula in https://github.com/VeritasYin/STGCN_IJCAI-18#data-preprocessing to
348
    compute an adjacency matrix from the distance matrix.
349
    The implementation follows that paper.
350

351
    Args:
352
        route_distances: np.ndarray of shape `(num_routes, num_routes)`. Entry `i,j` of this array is the
353
            distance between roads `i,j`.
354
        sigma2: Determines the width of the Gaussian kernel applied to the square distances matrix.
355
        epsilon: A threshold specifying if there is an edge between two nodes. Specifically, `A[i,j]=1`
356
            if `np.exp(-w2[i,j] / sigma2) >= epsilon` and `A[i,j]=0` otherwise, where `A` is the adjacency
357
            matrix and `w2=route_distances * route_distances`
358

359
    Returns:
360
        A boolean graph adjacency matrix.
361
    """
362
    num_routes = route_distances.shape[0]
363
    route_distances = route_distances / 10000.0
364
    w2, w_mask = (
365
        route_distances * route_distances,
366
        np.ones([num_routes, num_routes]) - np.identity(num_routes),
367
    )
368
    return (np.exp(-w2 / sigma2) >= epsilon) * w_mask
369

370

371
"""
372
The function `compute_adjacency_matrix()` returns a boolean adjacency matrix
373
where 1 means there is an edge between two nodes. We use the following class
374
to store the information about the graph.
375
"""
376

377

378
class GraphInfo:
379
    def __init__(self, edges: typing.Tuple[list, list], num_nodes: int):
380
        self.edges = edges
381
        self.num_nodes = num_nodes
382

383

384
sigma2 = 0.1
385
epsilon = 0.5
386
adjacency_matrix = compute_adjacency_matrix(route_distances, sigma2, epsilon)
387
node_indices, neighbor_indices = np.where(adjacency_matrix == 1)
388
graph = GraphInfo(
389
    edges=(node_indices.tolist(), neighbor_indices.tolist()),
390
    num_nodes=adjacency_matrix.shape[0],
391
)
392
print(f"number of nodes: {graph.num_nodes}, number of edges: {len(graph.edges[0])}")
393

394
"""
395
## Network architecture
396

397
Our model for forecasting over the graph consists of a graph convolution
398
layer and a LSTM layer.
399
"""
400

401
"""
402
### Graph convolution layer
403

404
Our implementation of the graph convolution layer resembles the implementation
405
in [this Keras example](https://keras.io/examples/graph/gnn_citations/). Note that
406
in that example input to the layer is a 2D tensor of shape `(num_nodes,in_feat)`
407
but in our example the input to the layer is a 4D tensor of shape
408
`(num_nodes, batch_size, input_seq_length, in_feat)`. The graph convolution layer
409
performs the following steps:
410

411
- The nodes' representations are computed in `self.compute_nodes_representation()`
412
by multiplying the input features by `self.weight`
413
- The aggregated neighbors' messages are computed in `self.compute_aggregated_messages()`
414
by first aggregating the neighbors' representations and then multiplying the results by
415
`self.weight`
416
- The final output of the layer is computed in `self.update()` by combining the nodes
417
representations and the neighbors' aggregated messages
418
"""
419

420

421
class GraphConv(layers.Layer):
422
    def __init__(
423
        self,
424
        in_feat,
425
        out_feat,
426
        graph_info: GraphInfo,
427
        aggregation_type="mean",
428
        combination_type="concat",
429
        activation: typing.Optional[str] = None,
430
        **kwargs,
431
    ):
432
        super().__init__(**kwargs)
433
        self.in_feat = in_feat
434
        self.out_feat = out_feat
435
        self.graph_info = graph_info
436
        self.aggregation_type = aggregation_type
437
        self.combination_type = combination_type
438
        self.weight = self.add_weight(
439
            initializer=keras.initializers.GlorotUniform(),
440
            shape=(in_feat, out_feat),
441
            dtype="float32",
442
            trainable=True,
443
        )
444
        self.activation = layers.Activation(activation)
445

446
    def aggregate(self, neighbour_representations):
447
        aggregation_func = {
448
            "sum": tf.math.unsorted_segment_sum,
449
            "mean": tf.math.unsorted_segment_mean,
450
            "max": tf.math.unsorted_segment_max,
451
        }.get(self.aggregation_type)
452

453
        if aggregation_func:
454
            return aggregation_func(
455
                neighbour_representations,
456
                self.graph_info.edges[0],
457
                num_segments=self.graph_info.num_nodes,
458
            )
459

460
        raise ValueError(f"Invalid aggregation type: {self.aggregation_type}")
461

462
    def compute_nodes_representation(self, features):
463
        """Computes each node's representation.
464

465
        The nodes' representations are obtained by multiplying the features tensor with
466
        `self.weight`. Note that
467
        `self.weight` has shape `(in_feat, out_feat)`.
468

469
        Args:
470
            features: Tensor of shape `(num_nodes, batch_size, input_seq_len, in_feat)`
471

472
        Returns:
473
            A tensor of shape `(num_nodes, batch_size, input_seq_len, out_feat)`
474
        """
475
        return ops.matmul(features, self.weight)
476

477
    def compute_aggregated_messages(self, features):
478
        neighbour_representations = tf.gather(features, self.graph_info.edges[1])
479
        aggregated_messages = self.aggregate(neighbour_representations)
480
        return ops.matmul(aggregated_messages, self.weight)
481

482
    def update(self, nodes_representation, aggregated_messages):
483
        if self.combination_type == "concat":
484
            h = ops.concatenate([nodes_representation, aggregated_messages], axis=-1)
485
        elif self.combination_type == "add":
486
            h = nodes_representation + aggregated_messages
487
        else:
488
            raise ValueError(f"Invalid combination type: {self.combination_type}.")
489
        return self.activation(h)
490

491
    def call(self, features):
492
        """Forward pass.
493

494
        Args:
495
            features: tensor of shape `(num_nodes, batch_size, input_seq_len, in_feat)`
496

497
        Returns:
498
            A tensor of shape `(num_nodes, batch_size, input_seq_len, out_feat)`
499
        """
500
        nodes_representation = self.compute_nodes_representation(features)
501
        aggregated_messages = self.compute_aggregated_messages(features)
502
        return self.update(nodes_representation, aggregated_messages)
503

504

505
"""
506
### LSTM plus graph convolution
507

508
By applying the graph convolution layer to the input tensor, we get another tensor
509
containing the nodes' representations over time (another 4D tensor). For each time
510
step, a node's representation is informed by the information from its neighbors.
511

512
To make good forecasts, however, we need not only information from the neighbors
513
but also we need to process the information over time. To this end, we can pass each
514
node's tensor through a recurrent layer. The `LSTMGC` layer below, first applies
515
a graph convolution layer to the inputs and then passes the results through a
516
`LSTM` layer.
517
"""
518

519

520
class LSTMGC(layers.Layer):
521
    """Layer comprising a convolution layer followed by LSTM and dense layers."""
522

523
    def __init__(
524
        self,
525
        in_feat,
526
        out_feat,
527
        lstm_units: int,
528
        input_seq_len: int,
529
        output_seq_len: int,
530
        graph_info: GraphInfo,
531
        graph_conv_params: typing.Optional[dict] = None,
532
        **kwargs,
533
    ):
534
        super().__init__(**kwargs)
535

536
        # graph conv layer
537
        if graph_conv_params is None:
538
            graph_conv_params = {
539
                "aggregation_type": "mean",
540
                "combination_type": "concat",
541
                "activation": None,
542
            }
543
        self.graph_conv = GraphConv(in_feat, out_feat, graph_info, **graph_conv_params)
544

545
        self.lstm = layers.LSTM(lstm_units, activation="relu")
546
        self.dense = layers.Dense(output_seq_len)
547

548
        self.input_seq_len, self.output_seq_len = input_seq_len, output_seq_len
549

550
    def call(self, inputs):
551
        """Forward pass.
552

553
        Args:
554
            inputs: tensor of shape `(batch_size, input_seq_len, num_nodes, in_feat)`
555

556
        Returns:
557
            A tensor of shape `(batch_size, output_seq_len, num_nodes)`.
558
        """
559

560
        # convert shape to  (num_nodes, batch_size, input_seq_len, in_feat)
561
        inputs = ops.transpose(inputs, [2, 0, 1, 3])
562

563
        gcn_out = self.graph_conv(
564
            inputs
565
        )  # gcn_out has shape: (num_nodes, batch_size, input_seq_len, out_feat)
566
        shape = ops.shape(gcn_out)
567
        num_nodes, batch_size, input_seq_len, out_feat = (
568
            shape[0],
569
            shape[1],
570
            shape[2],
571
            shape[3],
572
        )
573

574
        # LSTM takes only 3D tensors as input
575
        gcn_out = ops.reshape(
576
            gcn_out, (batch_size * num_nodes, input_seq_len, out_feat)
577
        )
578
        lstm_out = self.lstm(
579
            gcn_out
580
        )  # lstm_out has shape: (batch_size * num_nodes, lstm_units)
581

582
        dense_output = self.dense(
583
            lstm_out
584
        )  # dense_output has shape: (batch_size * num_nodes, output_seq_len)
585
        output = ops.reshape(dense_output, (num_nodes, batch_size, self.output_seq_len))
586
        return ops.transpose(
587
            output, [1, 2, 0]
588
        )  # returns Tensor of shape (batch_size, output_seq_len, num_nodes)
589

590

591
"""
592
## Model training
593
"""
594

595
in_feat = 1
596
batch_size = 64
597
epochs = 20
598
input_sequence_length = 12
599
forecast_horizon = 3
600
multi_horizon = False
601
out_feat = 10
602
lstm_units = 64
603
graph_conv_params = {
604
    "aggregation_type": "mean",
605
    "combination_type": "concat",
606
    "activation": None,
607
}
608

609
st_gcn = LSTMGC(
610
    in_feat,
611
    out_feat,
612
    lstm_units,
613
    input_sequence_length,
614
    forecast_horizon,
615
    graph,
616
    graph_conv_params,
617
)
618
inputs = layers.Input((input_sequence_length, graph.num_nodes, in_feat))
619
outputs = st_gcn(inputs)
620

621
model = keras.models.Model(inputs, outputs)
622
model.compile(
623
    optimizer=keras.optimizers.RMSprop(learning_rate=0.0002),
624
    loss=keras.losses.MeanSquaredError(),
625
)
626
model.fit(
627
    train_dataset,
628
    validation_data=val_dataset,
629
    epochs=epochs,
630
    callbacks=[keras.callbacks.EarlyStopping(patience=10)],
631
)
632

633
"""
634
## Making forecasts on test set
635

636
Now we can use the trained model to make forecasts for the test set. Below, we
637
compute the MAE of the model and compare it to the MAE of naive forecasts.
638
The naive forecasts are the last value of the speed for each node.
639
"""
640

641
x_test, y = next(test_dataset.as_numpy_iterator())
642
y_pred = model.predict(x_test)
643
plt.figure(figsize=(18, 6))
644
plt.plot(y[:, 0, 0])
645
plt.plot(y_pred[:, 0, 0])
646
plt.legend(["actual", "forecast"])
647

648
naive_mse, model_mse = (
649
    np.square(x_test[:, -1, :, 0] - y[:, 0, :]).mean(),
650
    np.square(y_pred[:, 0, :] - y[:, 0, :]).mean(),
651
)
652
print(f"naive MAE: {naive_mse}, model MAE: {model_mse}")
653

654
"""
655
Of course, the goal here is to demonstrate the method,
656
not to achieve the best performance. To improve the
657
model's accuracy, all model hyperparameters should be tuned carefully. In addition,
658
several of the `LSTMGC` blocks can be stacked to increase the representation power
659
of the model.
660
"""
661

662
Product

Resources

Company