Memory-efficient embeddings for recommendation systems
Author: Khalid Salama
Date created: 2021/02/15
Last modified: 2023/11/15
Description: Using compositional & mixed-dimension embeddings for memory-efficient recommendation models.
Introduction
This example demonstrates two techniques for building memory-efficient recommendation models by reducing the size of the embedding tables, without sacrificing model effectiveness:
Quotient-remainder trick, by Hao-Jun Michael Shi et al., which reduces the number of embedding vectors to store, yet produces a unique embedding vector for each item without explicit definition.
Mixed Dimension embeddings, by Antonio Ginart et al., which stores embedding vectors with mixed dimensions, where less popular items have reduced dimension embeddings.
We use the 1M version of the Movielens dataset. The dataset includes around 1 million ratings from 6,000 users on 4,000 movies.
Setup
Prepare the data
Download and process data
Create train and eval data splits
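As a minimal sketch of these steps, assuming the standard MovieLens 1M archive layout (a ratings.dat file with ::-separated fields) and an illustrative random 85/15 split:

```python
import pandas as pd
from zipfile import ZipFile
import keras

# Download and extract the MovieLens 1M archive.
movielens_zip = keras.utils.get_file(
    "ml-1m.zip", "http://files.grouplens.org/datasets/movielens/ml-1m.zip"
)
with ZipFile(movielens_zip) as zf:
    zf.extract("ml-1m/ratings.dat")

# ratings.dat uses "::" as a separator, which requires the python engine.
ratings = pd.read_csv(
    "ml-1m/ratings.dat",
    sep="::",
    names=["user_id", "movie_id", "rating", "unix_timestamp"],
    engine="python",
)

# Illustrative random 85/15 train/eval split.
train_data = ratings.sample(frac=0.85, random_state=42)
eval_data = ratings.drop(train_data.index)
```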
Define dataset metadata and hyperparameters
Train and evaluate the model
Experiment 1: baseline collaborative filtering model
Implement embedding encoder
Implement the baseline model
Notice that the number of trainable parameters is 623,744, i.e. (6040 users + 3706 movies) x 64 embedding dimensions.
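For reference, a minimal sketch of such a baseline; the layer wiring is an assumption (0-based contiguous ids, dot-product score), not necessarily the author's exact model:

```python
import keras
from keras import layers, ops

num_users, num_movies, embedding_dim = 6040, 3706, 64  # MovieLens 1M counts

# Inputs are assumed to be 0-based contiguous integer ids.
user_input = keras.Input(shape=(), dtype="int64", name="user_id")
movie_input = keras.Input(shape=(), dtype="int64", name="movie_id")

user_embedding = layers.Embedding(num_users, embedding_dim)(user_input)
movie_embedding = layers.Embedding(num_movies, embedding_dim)(movie_input)

# Predicted affinity is the dot product of the two embeddings.
logits = ops.sum(user_embedding * movie_embedding, axis=1, keepdims=True)

baseline_model = keras.Model([user_input, movie_input], logits)
# Total parameters: (6040 + 3706) * 64 = 623,744.
```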
Experiment 2: memory-efficient model
Implement Quotient-Remainder embedding as a layer
The Quotient-Remainder technique works as follows. For a vocabulary of size vocabulary_size and an embedding size embedding_dim, instead of creating a vocabulary_size X embedding_dim embedding table, we create two num_buckets X embedding_dim embedding tables, where num_buckets is much smaller than vocabulary_size. An embedding for a given item index is generated via the following steps:
1. Compute the quotient_index as index // num_buckets.
2. Compute the remainder_index as index % num_buckets.
3. Lookup quotient_embedding from the first embedding table using quotient_index.
4. Lookup remainder_embedding from the second embedding table using remainder_index.
5. Return quotient_embedding * remainder_embedding.
This technique not only reduces the number of embedding vectors that need to be stored and trained, but also generates a unique embedding vector of size embedding_dim for each item. Note that q_embedding and r_embedding can be combined using other operations, like Add and Concatenate. A minimal sketch of such a layer follows.
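The sketch below implements the steps above as a Keras layer; the class name and arguments are illustrative, and it assumes vocabulary_size <= num_buckets ** 2 so that every quotient index fits in the first table:

```python
import keras
from keras import layers, ops


class QREmbedding(layers.Layer):
    def __init__(self, vocabulary_size, embedding_dim, num_buckets, **kwargs):
        super().__init__(**kwargs)
        # Every quotient index must fit in the first table.
        assert vocabulary_size <= num_buckets**2
        self.num_buckets = num_buckets
        # Two small num_buckets x embedding_dim tables replace the single
        # vocabulary_size x embedding_dim table.
        self.q_embeddings = layers.Embedding(num_buckets, embedding_dim)
        self.r_embeddings = layers.Embedding(num_buckets, embedding_dim)

    def call(self, indices):
        quotient_index = ops.floor_divide(indices, self.num_buckets)
        remainder_index = ops.mod(indices, self.num_buckets)
        quotient_embedding = self.q_embeddings(quotient_index)
        remainder_embedding = self.r_embeddings(remainder_index)
        # Element-wise product yields a distinct vector per item index.
        return quotient_embedding * remainder_embedding
```

For example, with vocabulary_size = 6040 and num_buckets = 78 (78 ** 2 >= 6040), the two tables hold 2 x 78 x embedding_dim parameters instead of 6040 x embedding_dim.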
Implement Mixed Dimension embedding as a layer
In the mixed dimension embedding technique, we train embedding vectors with full dimensions for the frequently queried items, while training embedding vectors with reduced dimensions for less frequent items, plus a projection weights matrix that brings low-dimension embeddings up to the full dimension.
More precisely, we define blocks of items of similar frequencies. For each block, a block_vocab_size X block_embedding_dim embedding table and a block_embedding_dim X full_embedding_dim projection weights matrix are created. Note that if block_embedding_dim equals full_embedding_dim, the projection weights matrix becomes an identity matrix. Embeddings for a given batch of item indices are generated via the following steps:
1. For each block, lookup the block_embedding_dim embedding vectors using indices, and project them to the full_embedding_dim.
2. If an item index does not belong to a given block, an out-of-vocabulary embedding is returned. Each block thus returns a batch_size X full_embedding_dim tensor.
3. A mask is applied to the embeddings returned from each block in order to convert the out-of-vocabulary embeddings to vectors of zeros. That is, for each item in the batch, a single non-zero embedding vector is returned across all the block embeddings.
4. Embeddings retrieved from the blocks are combined by summation to produce the final batch_size X full_embedding_dim tensor.
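A minimal sketch of such a layer, assuming item indices are sorted by popularity and partitioned into contiguous blocks; all names here are illustrative:

```python
import keras
from keras import layers, ops


class MDEmbedding(layers.Layer):
    def __init__(
        self, block_vocab_sizes, block_embedding_dims, full_embedding_dim, **kwargs
    ):
        super().__init__(**kwargs)
        self.block_vocab_sizes = block_vocab_sizes
        # Start offset of each block in the global (popularity-sorted) index space.
        self.block_offsets = [
            sum(block_vocab_sizes[:i]) for i in range(len(block_vocab_sizes))
        ]
        self.block_embeddings = []
        self.block_projectors = []
        for vocab_size, dim in zip(block_vocab_sizes, block_embedding_dims):
            self.block_embeddings.append(layers.Embedding(vocab_size, dim))
            # Project reduced-dimension embeddings up to the full dimension;
            # when dim == full_embedding_dim, the projection is the identity.
            self.block_projectors.append(
                layers.Identity()
                if dim == full_embedding_dim
                else layers.Dense(full_embedding_dim, use_bias=False)
            )

    def call(self, indices):
        output = None
        for offset, vocab_size, embedding, projector in zip(
            self.block_offsets,
            self.block_vocab_sizes,
            self.block_embeddings,
            self.block_projectors,
        ):
            local = indices - offset
            in_block = ops.logical_and(local >= 0, local < vocab_size)
            # Clip so out-of-block indices still yield a valid (masked) lookup.
            safe = ops.clip(local, 0, vocab_size - 1)
            projected = projector(embedding(safe))
            # Zero out embeddings for items that do not belong to this block,
            # so each item keeps exactly one non-zero block embedding.
            mask = ops.cast(ops.expand_dims(in_block, -1), projected.dtype)
            block_output = projected * mask
            output = block_output if output is None else output + block_output
        return output
```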
Implement the memory-efficient model
In this experiment, we are going to use the Quotient-Remainder technique to reduce the size of the user embeddings, and the Mixed Dimension technique to reduce the size of the movie embeddings.
While the paper uses an alpha-power rule to determine the embedding dimensions of each block, we simply set the number of blocks and the embedding dimensions of each block based on the histogram visualization of movie popularity.
You can see that we can group the movies into three blocks, and assign them 64, 32, and 16 embedding dimensions, respectively. Feel free to experiment with a different number of blocks and dimensions, for example along the lines of the sketch below.
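Wiring the two sketched layers together might look like the following; the bucket count and block sizes are illustrative choices, not necessarily those that yield the parameter count reported next:

```python
import keras
from keras import ops

user_input = keras.Input(shape=(), dtype="int64", name="user_id")
movie_input = keras.Input(shape=(), dtype="int64", name="movie_id")

# Quotient-remainder embedding for the 6040 users: two 78 x 64 tables
# (78 ** 2 >= 6040) instead of one 6040 x 64 table.
user_embedding = QREmbedding(
    vocabulary_size=6040, embedding_dim=64, num_buckets=78
)(user_input)

# Mixed-dimension embedding for the 3706 movies: three popularity blocks
# with 64-, 32-, and 16-dimensional embeddings projected to 64 dimensions.
# The block sizes below are illustrative.
movie_embedding = MDEmbedding(
    block_vocab_sizes=[500, 1000, 2206],
    block_embedding_dims=[64, 32, 16],
    full_embedding_dim=64,
)(movie_input)

logits = ops.sum(user_embedding * movie_embedding, axis=1, keepdims=True)
memory_efficient_model = keras.Model([user_input, movie_input], logits)
```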
Notice that the number of trainable parameters is 117,968, more than 5x smaller than the 623,744 parameters of the baseline model.