Path: blob/master/examples/keras_rs/ipynb/basic_retrieval.ipynb
Recommending movies: retrieval
Author: Fabien Hertschuh, Abheesht Sharma
Date created: 2025/04/28
Last modified: 2025/04/28
Description: Retrieve movies using a two-tower model.
Introduction
Recommender systems are often composed of two stages:
The retrieval stage is responsible for selecting an initial set of hundreds of candidates from all possible candidates. The main objective of this model is to efficiently weed out all candidates that the user is not interested in. Because the retrieval model may be dealing with millions of candidates, it has to be computationally efficient.
The ranking stage takes the outputs of the retrieval model and fine-tunes them to select the best possible handful of recommendations. Its task is to narrow down the set of items the user may be interested in to a shortlist of likely candidates.
In this tutorial, we're going to focus on the first stage, retrieval. If you are interested in the ranking stage, have a look at our ranking tutorial.
Retrieval models are often composed of two sub-models:
A query tower computing the query representation (normally a fixed-dimensionality embedding vector) using query features.
A candidate tower computing the candidate representation (an equally-sized vector) using the candidate features.
The outputs of the two models are then multiplied together to give a query-candidate affinity score, with higher scores expressing a better match between the candidate and the query.
In this tutorial, we're going to build and train such a two-tower model using the MovieLens dataset.
We're going to:
Get our data and split it into a training and test set.
Implement a retrieval model.
Fit and evaluate it.
Test running predictions with the model.
The dataset
The MovieLens dataset is a classic dataset from the GroupLens research group at the University of Minnesota. It contains a set of ratings given to movies by a set of users, and is a standard for recommender systems research.
The data can be treated in two ways:
It can be interpreted as expressing which movies the users watched (and rated), and which they did not. This is a form of implicit feedback, where users' watches tell us which things they prefer to see and which they'd rather not see.
It can also be seen as expressing how much the users liked the movies they did watch. This is a form of explicit feedback: given that a user watched a movie, we can tell how much they liked it by looking at the rating they have given.
In this tutorial, we are focusing on a retrieval system: a model that predicts a set of movies from the catalogue that the user is likely to watch. For this, the model will try to predict the rating users would give to all the movies in the catalogue. We will therefore use the explicit rating data.
Let's begin by choosing JAX as the backend we want to run on, and import all the necessary libraries.
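A minimal setup sketch; it assumes the Keras Recommenders package is installed and importable as keras_rs:

```python
import os

# The backend must be set before Keras is imported.
os.environ["KERAS_BACKEND"] = "jax"

import keras
import tensorflow as tf  # Used only for the tf.data input pipeline.
import tensorflow_datasets as tfds

import keras_rs
```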
Preparing the dataset
Let's first have a look at the data.
We use the MovieLens dataset from TensorFlow Datasets. Loading movielens/100k-ratings yields a tf.data.Dataset object containing the ratings alongside user and movie data. Loading movielens/100k-movies yields a tf.data.Dataset object containing only the movies data.
Note that since the MovieLens dataset does not have predefined splits, all the data is under the train split.
The ratings dataset returns a dictionary of movie id, user id, the assigned rating, timestamp, movie information, and user information:
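A minimal loading-and-inspection sketch using the standard TFDS API:

```python
# Everything is under the "train" split, as noted above.
ratings = tfds.load("movielens/100k-ratings", split="train")
movies = tfds.load("movielens/100k-movies", split="train")

# Print one ratings record to see its fields.
for data in ratings.take(1).as_numpy_iterator():
    print(data)
```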
In the MovieLens dataset, user IDs are integers (represented as strings) starting at 1, with no gaps. Normally, you would need to create a lookup table to map user IDs to integers from 0 to N-1. But as a simplification, we'll use the user ID directly as an index in our model, in particular to look up the user embedding from the user embedding table. So we need to know the number of users.
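One way to get that count, sketched with the tf.data API: since IDs start at 1 with no gaps, the largest ID equals the number of users.

```python
users_count = int(
    ratings.map(lambda x: tf.strings.to_number(x["user_id"], out_type=tf.int32))
    .reduce(tf.constant(0, tf.int32), tf.maximum)
    .numpy()
)
```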
The movies dataset contains the movie id, movie title, and the genres it belongs to. Note that the genres are encoded with integer labels.
In the MovieLens dataset, movie IDs are integers (represented as strings) starting at 1, with no gaps. Normally, you would need to create a lookup table to map movie IDs to integers from 0 to N-1. But as a simplification, we'll use the movie ID directly as an index in our model, in particular to look up the movie embedding from the movie embedding table. So we need to know the number of movies.
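Here the count is simpler to get: the movies dataset has exactly one record per movie, so its cardinality is the movie count.

```python
movies_count = int(movies.cardinality().numpy())
```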
In this example, we're going to focus on the ratings data. Other tutorials explore how to use the movie information data as well as the user information to improve the model quality.
We keep only the user_id, movie_id and rating fields in the dataset. Our input is the user_id. The labels are the movie_id alongside the rating for the given movie and user.
The rating is a number between 1 and 5; we rescale it to be between 0 and 1.
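A preprocessing sketch implementing the above; the field names "user_id", "movie_id" and "user_rating" are assumptions about the TFDS schema:

```python
def preprocess_rating(x):
    return (
        # Input: the user id as an integer.
        tf.strings.to_number(x["user_id"], out_type=tf.int32),
        # Labels: the movie id, plus the rating rescaled from [1, 5] to [0, 1].
        {
            "movie_id": tf.strings.to_number(x["movie_id"], out_type=tf.int32),
            "rating": (x["user_rating"] - 1.0) / 4.0,
        },
    )
```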
To fit and evaluate the model, we need to split the data into a training and evaluation set. In a real recommender system, this would most likely be done by time: the data up to time T would be used to predict interactions after T.
In this simple example, however, let's use a random split, putting 80% of the ratings in the train set, and 20% in the test set.
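A sketch of the split, reusing preprocess_rating from above; the batch size of 1,000 is an illustrative choice:

```python
shuffled_ratings = ratings.map(preprocess_rating).shuffle(
    100_000, seed=42, reshuffle_each_iteration=False
)
# 80,000 ratings for training, the remaining 20,000 for evaluation.
train_ratings = shuffled_ratings.take(80_000).batch(1000).cache()
test_ratings = shuffled_ratings.skip(80_000).take(20_000).batch(1000).cache()
```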
Implementing the Model
Choosing the architecture of our model is a key part of modelling.
We are building a two-tower retrieval model, therefore we need to combine a query tower for users and a candidate tower for movies.
The first step is to decide on the dimensionality of the query and candidate representations. This is the embedding_dimension argument in our model constructor. We'll test with a value of 32. Higher values will correspond to models that may be more accurate, but will also be slower to fit and more prone to overfitting.
Query and Candidate Towers
The second step is to define the model itself. In this simple example, the query tower and candidate tower are simply embeddings, with nothing else. We'll use Keras' Embedding layer.
We can easily extend the towers to make them arbitrarily complex using standard Keras components, as long as we return an embedding_dimension-wide output at the end.
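In isolation, the two towers amount to just this (a sketch reusing the users_count and movies_count computed earlier; the + 1 accounts for the unused ID 0, as explained in the training section below):

```python
embedding_dimension = 32

# Query tower: user id -> query vector.
user_embedding = keras.layers.Embedding(users_count + 1, embedding_dimension)
# Candidate tower: movie id -> candidate vector.
candidate_embedding = keras.layers.Embedding(movies_count + 1, embedding_dimension)
```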
Retrieval
The retrieval itself will be performed by the BruteForceRetrieval layer from Keras Recommenders. This layer computes the affinity scores for the given users and all the candidate movies, then returns the top K movies in order.
Note that during training, we don't actually need to perform any retrieval, since the only affinity scores we need are the ones for the users and movies in the batch. As an optimization, we skip the retrieval entirely in the call method when training.
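A standalone sketch of the layer on random data; the constructor arguments and the update_candidates method are assumptions about the keras_rs API:

```python
import numpy as np

# Score 100 random 32-dimensional candidates against 2 random queries and
# return the indices of the top 5 candidates for each query.
retrieval = keras_rs.layers.BruteForceRetrieval(k=5, return_scores=False)
retrieval.update_candidates(np.random.normal(size=(100, 32)).astype("float32"))

queries = np.random.normal(size=(2, 32)).astype("float32")
top_candidate_ids = retrieval(queries)  # Shape: (2, 5).
```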
Loss
The next component is the loss used to train our model. In this case, we use a mean squared error loss to measure the difference between the predicted movie ratings and the actual ratings from users.
Note that we override compute_loss from the keras.Model class. This allows us to compute the query-candidate affinity score, which is obtained by multiplying the outputs of the two towers together. That affinity score can then be passed to the loss function.
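Putting the towers, the retrieval layer and the loss together, here is a minimal sketch of the full model. The way build() shares the candidate embedding variable with the retrieval layer is an assumption about the BruteForceRetrieval internals:

```python
class RetrievalModel(keras.Model):
    def __init__(self, num_users, num_candidates, embedding_dimension=32, **kwargs):
        super().__init__(**kwargs)
        # Query tower: one embedding per user.
        self.user_embedding = keras.layers.Embedding(num_users, embedding_dimension)
        # Candidate tower: one embedding per movie.
        self.candidate_embedding = keras.layers.Embedding(
            num_candidates, embedding_dimension
        )
        # The layer that performs the top-k retrieval at inference time.
        self.retrieval = keras_rs.layers.BruteForceRetrieval(
            k=10, return_scores=False
        )
        self.loss_fn = keras.losses.MeanSquaredError()

    def build(self, input_shape):
        self.user_embedding.build(input_shape)
        self.candidate_embedding.build(input_shape)
        # The candidates are the movie embeddings; sharing the variable keeps
        # the retrieval layer in sync with the trained table.
        self.retrieval.candidate_embeddings = self.candidate_embedding.embeddings
        self.retrieval.build(input_shape)
        super().build(input_shape)

    def call(self, inputs, training=False):
        user_embeddings = self.user_embedding(inputs)
        result = {"user_embeddings": user_embeddings}
        if not training:
            # Skip the expensive top-k retrieval during training.
            result["predictions"] = self.retrieval(user_embeddings)
        return result

    def compute_loss(self, x, y, y_pred, sample_weight, training=True):
        # Affinity score: dot product of the two tower outputs.
        candidate_embeddings = self.candidate_embedding(y["movie_id"])
        user_embeddings = y_pred["user_embeddings"]
        scores = keras.ops.sum(
            keras.ops.multiply(user_embeddings, candidate_embeddings),
            axis=1,
            keepdims=True,
        )
        labels = keras.ops.expand_dims(y["rating"], -1)
        return self.loss_fn(labels, scores, sample_weight)
```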
Fitting and evaluating
After defining the model, we can use the standard Keras model.fit() to train and evaluate the model.
Let's first instantiate the model. Note that we add + 1 to the number of users and movies to account for the fact that ID zero is not used for either (IDs start at 1), but still takes a row in the embedding tables.
Then train the model. Evaluation takes a bit of time, so we only evaluate the model every 5 epochs.
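A training sketch under the assumptions above; the Adagrad optimizer, its learning rate and the epoch count are illustrative choices:

```python
model = RetrievalModel(users_count + 1, movies_count + 1)
model.compile(optimizer=keras.optimizers.Adagrad(learning_rate=0.1))

# validation_freq=5 evaluates on the test set every 5 epochs only.
history = model.fit(
    train_ratings,
    validation_data=test_ratings,
    validation_freq=5,
    epochs=50,
)
```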
Making predictions
Now that we have a model, we would like to be able to make predictions.
So far, we have only handled movies by id. Now is the time to create a mapping keyed by movie IDs to be able to surface the titles.
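A sketch of that mapping, reusing the movies dataset loaded earlier:

```python
movie_id_to_title = {
    int(x["movie_id"]): x["movie_title"].decode("utf-8")
    for x in movies.as_numpy_iterator()
}
```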
We then simply use the Keras model.predict() method. Under the hood, it calls the BruteForceRetrieval layer to perform the actual retrieval.
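A prediction sketch; the user ID is an arbitrary example, and the retrieved indices map directly to movie IDs because we used raw IDs as embedding indices:

```python
user_id = 42  # A hypothetical user to recommend movies for.
predictions = model.predict(keras.ops.convert_to_tensor([user_id]))
predictions = keras.ops.convert_to_numpy(predictions["predictions"])

print(f"Top recommendations for user {user_id}:")
for movie_id in predictions[0]:
    print(movie_id_to_title[int(movie_id)])
```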
Note that this model can retrieve movies already watched by the user. We could easily add logic to remove them if that is desirable.
Item-to-item recommendation
In this model, we created a user-movie model. However, for some applications (for example, product detail pages) it's common to perform item-to-item (for example, movie-to-movie or product-to-product) recommendations.
Training models like this would follow the same pattern as shown in this tutorial, but with different training data. Here, we had a user and a movie tower, and used (user, movie) pairs to train them. In an item-to-item model, we would have two item towers (for the query and candidate item), and train the model using (query item, candidate item) pairs. These could be constructed from clicks on product detail pages.