Perceptual Path Length (PPL)
Please note that this is an optional notebook, meant to introduce more advanced concepts if you're up for a challenge, so don't worry if you don't completely follow!
Perceptual path length (PPL) is a metric that was introduced as part of StyleGAN to evaluate how well a generator manages to smoothly interpolate between points in its latent space. In essence, if you travel between two images produced by a generator on a straight line in the latent space, PPL measures the expected "jarringness" of each step in the interpolation, summed over the steps of many random paths. In this notebook, you'll walk through the motivation and mechanism behind PPL.
The StyleGAN2 paper noted that this metric also "correlates with consistency and stability of shapes," which led to one of the major changes between the two papers.
And don't worry, we don't expect you to be familiar with StyleGAN yet - you'll learn more about it later in this course!
Perceptual Similarity
Like FID, which you learned about this week, PPL uses the feature embeddings of a deep convolutional neural network. Specifically, it uses the distance between two image embeddings as proposed in The Unreasonable Effectiveness of Deep Features as a Perceptual Metric by Zhang et al. (CVPR 2018). In this approach, unlike in FID, a VGG16 network is used instead of an Inception network.
Perceptual similarity is closely related to the distance between two feature vectors, with one key difference: the features are passed through a learned transformation, which is trained to match human intuition on image similarity. Specifically, when shown two images that are each transformations of the same base image, the LPIPS ("Learned Perceptual Image Patch Similarity") metric is meant to assign a lower distance to the one that people think is closer to the base image.
Figure from The Unreasonable Effectiveness of Deep Features as a Perceptual Metric, showing a source image in the center and two transformations of it. Humans generally found the right-side image more similar to the center image than the left-side image, and the LPIPS metric matches this.
For our implementation, we can use the lpips library, implemented by the authors of the perceptual similarity paper.
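As a minimal sketch of how the library is used (the image tensors here are random placeholders standing in for real images, assumed to be scaled to $[-1, 1]$ as the library expects):

```python
import torch
import lpips

# LPIPS metric backed by VGG16 features, matching the backbone used for PPL
loss_fn = lpips.LPIPS(net='vgg')

# Placeholder image batches in NCHW format, scaled to [-1, 1]
image_a = torch.rand(1, 3, 64, 64) * 2 - 1
image_b = torch.rand(1, 3, 64, 64) * 2 - 1

# A lower distance means the two images are perceptually more similar
distance = loss_fn(image_a, image_b)
print(distance.item())
```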
You'll define your generator and a function to visualize the images.
You'll also load a generator, pre-trained on CelebA, like in the main assignment for this week.
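A minimal sketch of those pieces is below. The Generator architecture, z_dim, and checkpoint name are placeholders standing in for the ones from the main assignment, not the exact course code.

```python
import torch
from torch import nn
from torchvision.utils import make_grid
import matplotlib.pyplot as plt

class Generator(nn.Module):
    '''A small DCGAN-style generator that maps a latent vector to a 64x64 RGB image.'''
    def __init__(self, z_dim=64, hidden_dim=64):
        super().__init__()
        self.z_dim = z_dim
        self.gen = nn.Sequential(
            self._block(z_dim, hidden_dim * 8, 4, 1, 0),           # 1x1 -> 4x4
            self._block(hidden_dim * 8, hidden_dim * 4, 4, 2, 1),   # 4x4 -> 8x8
            self._block(hidden_dim * 4, hidden_dim * 2, 4, 2, 1),   # 8x8 -> 16x16
            self._block(hidden_dim * 2, hidden_dim, 4, 2, 1),       # 16x16 -> 32x32
            nn.ConvTranspose2d(hidden_dim, 3, kernel_size=4, stride=2, padding=1),  # 32x32 -> 64x64
            nn.Tanh(),  # outputs in [-1, 1], matching what LPIPS expects
        )

    def _block(self, in_channels, out_channels, kernel_size, stride, padding):
        return nn.Sequential(
            nn.ConvTranspose2d(in_channels, out_channels, kernel_size, stride, padding, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, noise):
        # noise: (batch, z_dim) -> (batch, z_dim, 1, 1) for the transposed convolutions
        return self.gen(noise.view(len(noise), self.z_dim, 1, 1))

def show_tensor_images(image_tensor, num_images=4, nrow=2):
    '''Display a grid of images whose values lie in [-1, 1].'''
    image_grid = make_grid((image_tensor[:num_images].detach().cpu() + 1) / 2, nrow=nrow)
    plt.imshow(image_grid.permute(1, 2, 0))
    plt.axis('off')
    plt.show()

gen = Generator(z_dim=64)
# Hypothetical checkpoint name -- in the course you would load the CelebA-pretrained weights here:
# gen.load_state_dict(torch.load('pretrained_celeba.pth', map_location='cpu'))
gen.eval()
```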
From LPIPS to PPL
Note that perceptual path length builds directly on the LPIPS metric.
As you'll learn, StyleGAN does not operate directly on the randomly sampled latent vector $z$. Instead, it learns a mapping $f$ from $z$ to $w$ -- that is, $w = f(z)$. You'll learn more about this later, but for now, all you need to know is that there are two spaces over which you can calculate PPL: $\mathcal{W}$-space and $\mathcal{Z}$-space.
Linear Interpolation ($\mathcal{W}$-space)
For the $\mathcal{W}$ space, PPL is defined as follows using linear interpolation:
First, you sample two points in $\mathcal{W}$-space, $w_1$ and $w_2$, from two randomly sampled points in $\mathcal{Z}$-space. For simplicity, we'll let $f$ be the identity function here.
You will use your generator to produce two images interpolating between $w_1$ and $w_2$: one where the amount of $w_2$ is $t$, and one where the amount of $w_2$ is $t + \epsilon$. You can think of $t$ as sampling a random point along the path interpolating between $w_1$ and $w_2$.
You can use the torch.lerp function for linear interpolation, and sample a random $t$ uniformly from 0 to 1 using torch.rand. Also, here we can set $\epsilon$ to a larger value for visualization, even though in the StyleGAN paper $\epsilon = 10^{-4}$.
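Here is a sketch of those steps, reusing the gen defined in the earlier sketch; the value of $\epsilon$ below is an illustrative choice, not necessarily the one used in the course notebook.

```python
import torch

z_dim = 64
epsilon = 0.1  # much larger than the paper's 1e-4, purely so the two images look visibly different

# Sample the two endpoints of the path; with the identity mapping, w = z
w_1 = torch.randn(1, z_dim)
w_2 = torch.randn(1, z_dim)

# Sample a random interpolation amount t ~ U(0, 1)
t = torch.rand(1)

# Two nearby points along the straight line from w_1 to w_2
interpolated_1 = torch.lerp(w_1, w_2, t)
interpolated_2 = torch.lerp(w_1, w_2, t + epsilon)

# Generate the corresponding images
with torch.no_grad():
    img_1 = gen(interpolated_1)
    img_2 = gen(interpolated_2)
```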
Now you can visualize these images and evaluate their LPIPS:
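Continuing the sketch, reusing show_tensor_images and the LPIPS metric loss_fn from the earlier cells:

```python
# Display the two nearby images side by side
show_tensor_images(torch.cat([img_1, img_2]), num_images=2, nrow=2)

# Measure how perceptually different the two nearby images are
lpips_distance = loss_fn(img_1, img_2)
print(f"LPIPS distance: {lpips_distance.item():.4f}")
```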
Finally, you need to account for the impact of different values of $\epsilon$, so that the perceptual path length converges as $\epsilon \to 0$. In order to do this, PPL divides by $\epsilon^2$.
This leaves you with the following overall equation:

$$l_{\mathcal{W}} = \mathbb{E}\left[\frac{1}{\epsilon^2}\, d\!\left(g(\text{lerp}(w_1, w_2;\, t)),\ g(\text{lerp}(w_1, w_2;\, t + \epsilon))\right)\right]$$

where $g$ is the generator and $d(\cdot, \cdot)$ is the LPIPS distance.
You'll notice the expectation symbol: that's because this is all repeated many times in order to approximate PPL.
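A sketch of that Monte Carlo estimate, reusing gen and loss_fn from above; the number of samples and the batch size are arbitrary choices for illustration.

```python
def estimate_ppl_w(gen, loss_fn, num_samples=1024, batch_size=64, z_dim=64, epsilon=1e-4):
    '''Approximate W-space PPL (identity mapping) by averaging scaled LPIPS distances.'''
    distances = []
    for _ in range(num_samples // batch_size):
        # Endpoints of a random path and a random point t along it
        w_1 = torch.randn(batch_size, z_dim)
        w_2 = torch.randn(batch_size, z_dim)
        t = torch.rand(batch_size, 1)
        with torch.no_grad():
            img_a = gen(torch.lerp(w_1, w_2, t))
            img_b = gen(torch.lerp(w_1, w_2, t + epsilon))
            # Divide by epsilon^2 so the estimate converges as epsilon -> 0
            distances.append(loss_fn(img_a, img_b).view(-1) / (epsilon ** 2))
    return torch.cat(distances).mean().item()

print(estimate_ppl_w(gen, loss_fn))
```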
Spherical Interpolation ($\mathcal{Z}$-space)
Because you sample points in $\mathcal{Z}$-space from a Gaussian, we use spherical interpolation instead of linear interpolation to interpolate in $\mathcal{Z}$-space. We can use scipy.spatial.geometric_slerp for this.
The spherical interpolation is defined as

$$\text{slerp}(z_1, z_2;\, t) = \frac{\sin[(1 - t)\,\Omega]}{\sin \Omega}\, z_1 + \frac{\sin[t\,\Omega]}{\sin \Omega}\, z_2$$

where $\Omega = \arccos(\hat{z}_1 \cdot \hat{z}_2)$ and $\hat{x}$ denotes the normalized version of $x$. The overall $\mathcal{Z}$-space equation is then the same as before, with slerp in place of lerp:

$$l_{\mathcal{Z}} = \mathbb{E}\left[\frac{1}{\epsilon^2}\, d\!\left(G(\text{slerp}(z_1, z_2;\, t)),\ G(\text{slerp}(z_1, z_2;\, t + \epsilon))\right)\right]$$
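A sketch of one Monte Carlo sample of $\mathcal{Z}$-space PPL is below. Note that geometric_slerp expects unit vectors, so the Gaussian samples are normalized first; this is an illustrative simplification rather than the exact notebook code, and it reuses gen and loss_fn from the earlier cells.

```python
import numpy as np
import torch
from scipy.spatial import geometric_slerp

z_dim = 64
epsilon = 1e-4

# geometric_slerp expects unit vectors, so normalize the Gaussian samples first
z_1 = np.random.randn(z_dim)
z_2 = np.random.randn(z_dim)
z_1 = z_1 / np.linalg.norm(z_1)
z_2 = z_2 / np.linalg.norm(z_2)

# Sample t so that both t and t + epsilon stay inside [0, 1]
t = np.random.rand() * (1 - epsilon)
points = geometric_slerp(z_1, z_2, np.array([t, t + epsilon]))
nearby = torch.tensor(points, dtype=torch.float32)  # shape (2, z_dim)

# One Monte Carlo sample of Z-space PPL: LPIPS between the two nearby images, scaled by 1/epsilon^2
with torch.no_grad():
    images = gen(nearby)
sample_ppl = loss_fn(images[0:1], images[1:2]).item() / (epsilon ** 2)
print(sample_ppl)
```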
There you have it! Now you understand how PPL works - hopefully this makes you excited to start learning about StyleGAN.