Week 1: Working with time series
Welcome! In this assignment you will be working with time series data. All of the data is going to be generated and you will implement several functions to split the data, create forecasts and evaluate the quality of those forecasts.
TIPS FOR SUCCESSFUL GRADING OF YOUR ASSIGNMENT:
All cells are frozen except those where you need to submit your solutions or where it is explicitly stated that you can interact with them.
You can add new cells to experiment, but the grader will ignore them, so don't rely on newly created cells to host your solution code; use the provided places for this.
You can add the comment # grade-up-to-here in any graded cell to signal the grader that it must only evaluate up to that point. This is helpful if you want to check if you are on the right track even if you are not done with the whole assignment. Be sure to remember to delete the comment afterwards!
Avoid using global variables unless you absolutely have to. The grader tests your code in an isolated environment without running all cells from the top. As a result, global variables may be unavailable when scoring your submission. Global variables that are meant to be used will be defined in UPPERCASE.
This assignment builds one block on top of the other, so it is very important that you pass all unit tests before continuing to the next section; otherwise you might have issues grading your submission.
To submit your notebook, save it and then click on the blue submit button at the beginning of the page.
Let's get started!
The next cell includes a bunch of helper functions to generate and plot the time series:
Generate time series data
Using the previous functions, generate data that resembles a real-life time series.
Notice that `TIME` represents the values in the x-coordinate while `SERIES` represents the values in the y-coordinate. This naming is used to avoid confusion with other kinds of data in which `x` and `y` have different meanings.
This is a good time to also define some useful global variables.
Exercise 1: train_val_split
Now that you have the time series, let's split it so you can start forecasting.
Complete the `train_val_split` function below, which receives the `time` (x coordinate) and `series` (y coordinate) data along with the time step at which to split them. Notice that this split step defaults to 1100, since this is an appropriate point to split the series into training and validation sets.
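As a rough illustration of the idea (not the graded solution), splitting at a fixed step amounts to slicing both arrays at the same index. The helper name `split_at`, the parameter name `time_step`, and the toy series of 1461 points are assumptions made for this sketch only.

```python
import numpy as np

def split_at(time, series, time_step=1100):
    """Hypothetical helper: slice time and series at the same index."""
    time_train, series_train = time[:time_step], series[:time_step]
    time_valid, series_valid = time[time_step:], series[time_step:]
    return time_train, series_train, time_valid, series_valid

# Toy data (assumed length of 1461 steps, not taken from the assignment)
toy_time = np.arange(1461)
toy_series = np.sin(toy_time / 50.0)

t_train, s_train, t_valid, s_valid = split_at(toy_time, toy_series)
print(len(s_train), len(s_valid))  # 1100 361
```

Because both arrays are cut at the same index, each value in the training or validation series keeps its matching time step.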
Expected Output:
Evaluation Metrics
Exercise 2: compute_metrics
Now that you have successfully split the data into training and validation sets, you will need a way of knowing how good your forecasts are. For this, complete the `compute_metrics` function below. This function receives the true series and the forecast and returns the `mse` and the `mae` between the two curves. You should use functions provided by `tf.keras.losses` to compute the MSE and MAE errors.
Notice that this function does not receive any time (x coordinate) data, since it assumes that both series have the same values for the x coordinate.
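To make the two metrics concrete, here is a small NumPy sketch of what they measure. It is a stand-in for illustration only: the exercise asks for the functions in `tf.keras.losses`, and the helper name `mse_mae` is hypothetical.

```python
import numpy as np

def mse_mae(true_series, forecast):
    """Hypothetical NumPy stand-in showing what MSE and MAE compute."""
    true_series = np.asarray(true_series, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    errors = forecast - true_series
    mse = np.mean(errors ** 2)      # mean of squared errors
    mae = np.mean(np.abs(errors))   # mean of absolute errors
    return mse, mae

# One forecast value is off by 2, the other two are exact
mse, mae = mse_mae([1.0, 2.0, 3.0], [1.0, 4.0, 3.0])
print(mse, mae)
```

Here the single error of 2 contributes 4 to the squared sum and 2 to the absolute sum, so MSE penalizes the large miss more heavily than MAE.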
Expected Output:
Forecasting
Now that you have a way of measuring the performance of your forecasts it is time to actually start doing some forecasts. Your goal is to predict the values in the validation set.
Let's start easy by using a naive forecast.
Naive Forecast
Exercise 3: naive_forecast
Define the `naive_forecast` variable below. Remember that the naive forecast simply takes the last value to predict the next one. This means that the forecast series should be identical to the validation series but delayed one time step.
Hint:
You need to pass the correct elements of the original series `SERIES` to compute the `naive_forecast`. Here are a few things to keep in mind:
To make the forecast for the first element in the validation set, you need the value of the very last element of the training set.
You should leave out the last element of the series, since the forecast obtained from this value does not exist in the validation set, and you will not be able to compute the evaluation metrics if this element is kept.
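The two points in the hint can be checked on a tiny example. The sketch below uses a made-up seven-element series and a split index of 4; both are assumptions for illustration, not values from the assignment.

```python
import numpy as np

split = 4  # assumed split index for this toy example
toy_series = np.array([10.0, 11.0, 13.0, 12.0, 14.0, 15.0, 17.0])

toy_valid = toy_series[split:]         # values to predict: [14, 15, 17]
# Start one step before the split and drop the final element
toy_naive = toy_series[split - 1:-1]   # forecasts: [12, 14, 15]

print(toy_valid)
print(toy_naive)
```

The first forecast is the last training value, every later forecast is the previous validation value, and both arrays have the same length.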
Expected Output:

Let's zoom in on the end of the validation period:
Expected Output:

You should see that the naive forecast lags 1 step behind the time series and that both series end on the same time step.
Now let's compute the mean squared error and the mean absolute error between the forecasts and the actual values in the validation period:
Expected Output:
That's our baseline. Now let's try a moving average.
Moving Average
Exercise 4: moving_average_forecast
Complete the `moving_average_forecast` function below. This function receives a `series` and a `window_size` and computes the moving average forecast for every point after the initial `window_size` values.
This function should receive the complete `SERIES` and, just for this exercise, you will get the prediction for all of `SERIES`. The returned prediction will then be sliced to match the validation period, so your function doesn't need to account for matching the series to the validation period.
You cannot compute the moving average for the first `window_size` values since there aren't enough values to compute the desired average. So if you use the whole `SERIES` and a `window_size` of 50, your function should return a series with `len(SERIES) - 50` elements.
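A minimal sketch of this behavior, assuming the forecast for step `t` is the mean of the `window_size` values immediately before it. The helper name `rolling_mean_forecast` and the six-element series are hypothetical; this is one straightforward way to write it, not necessarily the intended solution.

```python
import numpy as np

def rolling_mean_forecast(series, window_size):
    """Hypothetical helper: forecast step t as the mean of the previous window."""
    series = np.asarray(series, dtype=float)
    forecast = [
        series[t - window_size:t].mean()
        for t in range(window_size, len(series))
    ]
    return np.array(forecast)

toy_series = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
toy_forecast = rolling_mean_forecast(toy_series, window_size=2)
print(toy_forecast)       # [1.5 2.5 3.5 4.5]
print(len(toy_forecast))  # 4, i.e. len(toy_series) - window_size
```

No forecast exists for the first `window_size` steps, which is why the output is shorter than the input by exactly that amount.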
Expected Output:

Expected Output:
That's worse than the naive forecast! The moving average does not anticipate trend or seasonality, so let's try to remove them by using differencing.
Differencing
Exercise 5: diff_series
Since the seasonality period is 365 days, we will subtract the value at time t – 365 from the value at time t.
Define the `diff_series` and `diff_time` variables below to achieve this. Notice that `diff_time` contains the x-coordinate values for `diff_series`.
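The effect of this subtraction can be seen on a synthetic series. The sketch below assumes a toy series of 1461 steps with a linear trend and a sine-shaped yearly pattern; these choices are illustrative and are not the assignment's data.

```python
import numpy as np

period = 365  # seasonality period stated in the text
toy_time = np.arange(1461)  # assumed length
# Assumed toy series: linear trend plus a pattern that repeats every 365 steps
toy_series = 0.05 * toy_time + 10 * np.sin(2 * np.pi * toy_time / period)

# Value at time t minus value at time t - 365
toy_diff = toy_series[period:] - toy_series[:-period]
# The first differenced value belongs to time step 365
toy_diff_time = toy_time[period:]

print(len(toy_diff), toy_diff_time[0])  # 1096 365
```

In this toy case the seasonal term cancels exactly and the trend reduces to a constant (0.05 × 365), which is the sense in which differencing removes both.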
Expected Output:

Exercise 6: diff_moving_average
Great, the trend and seasonality seem to be gone, so now we can use the moving average.
Define the `diff_moving_avg` variable.
Notice that the `window_size` has already been defined and that you will need to perform the correct slicing for the series to match the validation period.
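The slicing is mostly index bookkeeping: differencing shifts the start of the series by 365 steps, and the moving average shifts it again by the window size. The sketch below tracks only the time indices; the window of 50, the split at 1100, and the length of 1461 are assumed values for illustration.

```python
import numpy as np

period, split, window = 365, 1100, 50  # window and split are assumed here
toy_time = np.arange(1461)             # assumed length

# Position k in the differenced series corresponds to time k + period
toy_diff_time = toy_time[period:]
# Position j in its moving average corresponds to time j + period + window
toy_ma_time = toy_diff_time[window:]

# Index of the first validation step inside the moving-average output
start = split - period - window
print(start, toy_ma_time[start])  # 685 1100
print(len(toy_ma_time[start:]))   # 361
```

Slicing the moving average from `start` onward therefore leaves one forecast per validation step.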
Expected Output:

Exercise 7: diff_moving_avg_plus_past
Now let's bring back the trend and seasonality by adding the past values from t – 365. For each value you want to forecast, you will be adding the exact same point, but from the previous cycle in the original time series.
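As a sketch of this step, the values from one cycle earlier can be taken with a single slice and added to the differenced forecast. The toy series and the constant stand-in for the differenced forecast below are assumptions chosen so that the result can be checked exactly.

```python
import numpy as np

period, split = 365, 1100   # period from the text, default split step
toy_time = np.arange(1461)  # assumed length
toy_series = 0.05 * toy_time + 10 * np.sin(2 * np.pi * toy_time / period)

# Stand-in for a forecast of the differenced series over the validation period
# (for this toy series the true difference is the constant 0.05 * 365)
toy_diff_forecast = np.full(len(toy_series) - split, 0.05 * period)

# Original values exactly one cycle before each validation step
toy_past = toy_series[split - period:-period]

toy_forecast = toy_diff_forecast + toy_past
print(np.allclose(toy_forecast, toy_series[split:]))  # True
```

With real, noisy data the match is not exact, but the slicing is the same: step `t` of the forecast is paired with step `t - 365` of the original series.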
Expected Output:

Expected Output:
Better than the naive forecast, good. However, the forecasts look a bit too random because we're just adding past values, which were noisy.
Exercise 8: smooth_past_series
Let's use a moving average on past values to remove some of the noise. Use a `window_size=10` for this smoothing.
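One way to set this up, sketched under assumptions: widen the slice of past values by half a window on each side so that a 10-step moving average is roughly centered on `t - 365` and still yields one value per validation step. The text does not specify the window placement, so this centering is an assumption, as are the helper `rolling_mean_forecast` and the toy series.

```python
import numpy as np

def rolling_mean_forecast(series, window_size):
    """Hypothetical helper: mean of the window_size values before each step."""
    series = np.asarray(series, dtype=float)
    return np.array([
        series[t - window_size:t].mean()
        for t in range(window_size, len(series))
    ])

period, split, window = 365, 1100, 10
half = window // 2
toy_series = np.arange(1461, dtype=float)  # assumed toy data

# Widen the past slice by half a window at each end (assumed centering)
toy_past_slice = toy_series[split - period - half:-(period - half)]
toy_smooth_past = rolling_mean_forecast(toy_past_slice, window)

print(len(toy_smooth_past))  # 361
print(toy_smooth_past[0])    # 734.5
```

For the first validation step (t = 1100), the average covers steps 730 to 739, which sits around t - 365 = 735.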
Expected Output:

Expected Output:
Congratulations on finishing this week's assignment!
You have successfully implemented functions for time series splitting and evaluation while also learning how to deal with time series data and how to code forecasting methods!
Keep it up!