Real-time collaboration for Jupyter Notebooks, Linux Terminals, LaTeX, VS Code, R IDE, and more,
all in one place. Commercial Alternative to JupyterHub.
Path: blob/master/Build Basic Generative Adversarial Networks (GANs)/Week 3 - Wasserstein GANs with Gradient Penalty/SNGAN.ipynb
Spectrally Normalized Generative Adversarial Networks (SN-GAN)
Please note that this is an optional notebook, meant to introduce more advanced concepts if you're up for a challenge, so don't worry if you don't completely follow!
Goals
In this notebook, you'll learn about and implement spectral normalization, a weight normalization technique to stabilize the training of the discriminator, as proposed in Spectral Normalization for Generative Adversarial Networks (Miyato et al. 2018).
Background
As its name suggests, SN-GAN normalizes the weight matrices in the discriminator by their corresponding spectral norm, which helps control the Lipschitz constant of the discriminator. As you have learned with WGAN, Lipschitz continuity is important in ensuring the boundedness of the optimal discriminator. In the WGAN case, this makes it so that the underlying W-loss function for the discriminator (or more precisely, the critic) is valid.
As a result, spectral normalization helps improve stability and avoid training failures such as vanishing gradients and mode collapse.
Spectral Norm
Notationally, the spectral norm of a matrix $W$ is typically represented as $\sigma(W)$. For neural network purposes, this matrix represents a weight matrix in one of the network's layers. The spectral norm of a matrix is the matrix's largest singular value, which can be obtained via singular value decomposition (SVD).
A Quick Refresher on SVD
SVD is a generalization of eigendecomposition and is used to factorize a matrix as $W = U\Sigma V^\top$, where $U$ and $V$ are orthogonal matrices and $\Sigma$ is a matrix with the singular values on its diagonal. Note that $W$ doesn't have to be square.

$$\Sigma = \begin{pmatrix} \sigma_1 & & \\ & \ddots & \\ & & \sigma_n \end{pmatrix}$$

where $\sigma_1$ and $\sigma_n$ are the largest and smallest singular values, respectively. Intuitively, larger values correspond to larger amounts of stretching a matrix can apply to another vector. Following this notation, $\sigma(W) = \sigma_1$.
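As a quick sanity check, the largest singular value can be computed directly in PyTorch. This is a small illustrative sketch, not part of the original notebook:

```python
import torch

torch.manual_seed(0)
W = torch.randn(4, 3)  # a non-square "weight matrix"

# Full SVD: S holds the singular values in descending order
U, S, Vh = torch.linalg.svd(W)
spectral_norm = S[0]  # sigma_1, the largest singular value
```

The same value is what `torch.linalg.matrix_norm(W, ord=2)` returns, since the spectral norm is exactly the operator 2-norm of the matrix.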
Applying SVD to Spectral Normalization
To spectrally normalize the weight matrix, you divide every value in the matrix by its spectral norm. As a result, a spectrally normalized matrix $\bar{W}_{SN}$ can be expressed as

$$\bar{W}_{SN} = \dfrac{W}{\sigma(W)}.$$

In practice, computing the SVD of $W$ is expensive, so the authors of the SN-GAN paper do something very neat. They instead approximate the left and right singular vectors, $u$ and $v$ respectively, through power iteration such that $\sigma(W) \approx u^\top W v$.

Starting from random initialization, $u$ and $v$ are updated according to

$$v := \dfrac{W^\top u}{\lVert W^\top u\rVert_2}, \qquad u := \dfrac{Wv}{\lVert Wv\rVert_2}.$$
In practice, one round of iteration is sufficient to "achieve satisfactory performance" as per the authors.
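The update rule above can be sketched directly in PyTorch. This is an illustrative implementation of power iteration (variable names follow the text, not the notebook's own code); extra iterations are used here only to make the estimate tight:

```python
import torch

torch.manual_seed(0)
W = torch.randn(8, 5)

u = torch.randn(8)
for _ in range(100):  # one iteration is typically enough in practice
    v = W.T @ u
    v = v / v.norm()
    u = W @ v
    u = u / u.norm()

sigma_est = u @ W @ v   # approximates sigma(W), the largest singular value
W_sn = W / sigma_est    # spectrally normalized weight
```

After normalization, the spectral norm of `W_sn` is approximately 1.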
Don't worry if you don't completely follow this! The algorithm is conveniently implemented as `torch.nn.utils.spectral_norm` in PyTorch, so as long as you get the general gist of how it might be useful and when to use it, then you're all set.
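For instance, here is a minimal sketch of wrapping a layer (the layer type and sizes are illustrative): after a few forward passes, the power-iteration estimate converges and the effective weight has spectral norm close to 1.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.utils.spectral_norm(nn.Linear(16, 16))

# Each forward pass in training mode runs one power-iteration update
x = torch.randn(4, 16)
for _ in range(50):
    _ = layer(x)

sigma = torch.linalg.matrix_norm(layer.weight, ord=2)  # close to 1
```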
A Bit of History on Spectral Normalization
This isn't the first time that spectral norm has been proposed in the context of deep learning models. There's a paper called Spectral Norm Regularization for Improving the Generalizability of Deep Learning (Yoshida et al. 2017) that proposes spectral norm regularization, which they showed to improve the generalizability of models by adding extra loss terms onto the loss function (just as L2 regularization and gradient penalty do!). These extra loss terms specifically penalize the spectral norm of the weights. You can think of this as data-independent regularization because the gradient with respect to $W$ isn't a function of the minibatch.
Spectral normalization, on the other hand, sets the spectral norm of the weight matrices to 1 -- it's a much harder constraint than adding a loss term, which is a form of "soft" regularization. As the authors show in the paper, you can think of spectral normalization as data-dependent regularization, since the gradient with respect to $W$ is dependent on the mini-batch statistics (shown in Section 2.1 of the main paper). Spectral normalization essentially prevents the transformation of each layer from becoming too sensitive in one direction and mitigates exploding gradients.
DCGAN with Spectral Normalization
In the rest of this notebook, you will walk through how to apply spectral normalization to DCGAN as an example, using your earlier DCGAN implementation. You can always add spectral normalization to your other models too.
Here, you start with the same setup and helper functions as you've seen before.
DCGAN Generator
Since spectral normalization is only applied to the matrices in the discriminator, the generator implementation is the same as the original.
DCGAN Discriminator
For the discriminator, you can wrap each `nn.Conv2d` with `nn.utils.spectral_norm`. In the backend, this introduces parameters for $u$ and $v$ in addition to $W$ so that $\sigma(W)$ can be computed as $u^\top W v$ at runtime.
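A hedged sketch of what one such discriminator block might look like (the channel sizes, input resolution, and block structure here are illustrative, not the notebook's exact architecture):

```python
import torch
import torch.nn as nn

def make_disc_block(in_ch, out_ch):
    # spectral_norm registers weight_u / weight_v on the wrapped conv layer
    return nn.Sequential(
        nn.utils.spectral_norm(
            nn.Conv2d(in_ch, out_ch, kernel_size=4, stride=2, padding=1)
        ),
        nn.BatchNorm2d(out_ch),
        nn.LeakyReLU(0.2),
    )

block = make_disc_block(1, 16)
out = block(torch.randn(2, 1, 28, 28))  # 28x28 input -> 14x14 feature map
```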
PyTorch also provides a `nn.utils.remove_spectral_norm` function, which collapses the 3 separate parameters into a single explicit weight $\bar{W}_{SN}$. You should only apply this to your convolutional layers during inference to improve runtime speed.
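A small sketch of that inference-time cleanup (the layer sizes are illustrative):

```python
import torch
import torch.nn as nn

conv = nn.utils.spectral_norm(nn.Conv2d(3, 8, kernel_size=3))
_ = conv(torch.randn(1, 3, 8, 8))  # run once so u and v are updated

# Fold weight_orig, weight_u, and weight_v back into a single explicit weight
conv = nn.utils.remove_spectral_norm(conv)
```

After removal, the layer carries an ordinary `weight` parameter again and no longer pays the power-iteration cost on each forward pass.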
It is important to note that spectral norm does not eliminate the need for batch norm. Spectral norm affects the weights of each layer, while batch norm affects the activations of each layer. You can see both in a discriminator architecture, but you can also see just one of them. Hope this is something you have fun experimenting with!
Training SN-DCGAN
You can now put everything together and train a spectrally normalized DCGAN! Here are all your parameters for initialization and optimization.
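Illustrative hyperparameters in the spirit of a DCGAN setup; the exact values below are assumptions, not necessarily the notebook's originals:

```python
import torch
import torch.nn as nn

# Assumed values -- tune to taste
n_epochs = 50
z_dim = 64                     # noise vector dimension
batch_size = 128
lr = 2e-4
beta_1, beta_2 = 0.5, 0.999    # Adam betas commonly used for GANs
device = "cuda" if torch.cuda.is_available() else "cpu"
criterion = nn.BCEWithLogitsLoss()
```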
Now, initialize the generator, the discriminator, and the optimizers.
Finally, train the whole thing! And babysit those outputs 😃
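Since the notebook's own training cells aren't reproduced here, below is a compressed, self-contained sketch of the loop: tiny stand-in networks and random "images" replace the full DCGAN and dataset, so only the alternating updates themselves are shown.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
z_dim, img_dim = 16, 64

# Stand-in networks: the discriminator's layers are spectrally normalized
gen = nn.Sequential(nn.Linear(z_dim, 64), nn.ReLU(), nn.Linear(64, img_dim))
disc = nn.Sequential(
    nn.utils.spectral_norm(nn.Linear(img_dim, 64)),
    nn.LeakyReLU(0.2),
    nn.utils.spectral_norm(nn.Linear(64, 1)),
)

criterion = nn.BCEWithLogitsLoss()
gen_opt = torch.optim.Adam(gen.parameters(), lr=2e-4, betas=(0.5, 0.999))
disc_opt = torch.optim.Adam(disc.parameters(), lr=2e-4, betas=(0.5, 0.999))

for step in range(100):
    real = torch.randn(32, img_dim)  # stand-in for a real minibatch

    # Discriminator update: push real -> 1, fake -> 0
    noise = torch.randn(32, z_dim)
    fake = gen(noise).detach()
    disc_loss = (criterion(disc(real), torch.ones(32, 1)) +
                 criterion(disc(fake), torch.zeros(32, 1))) / 2
    disc_opt.zero_grad()
    disc_loss.backward()
    disc_opt.step()

    # Generator update: make the discriminator output 1 on fakes
    noise = torch.randn(32, z_dim)
    gen_loss = criterion(disc(gen(noise)), torch.ones(32, 1))
    gen_opt.zero_grad()
    gen_loss.backward()
    gen_opt.step()
```

In the real notebook, `real` would come from a DataLoader over image batches and the networks would be the convolutional generator and spectrally normalized discriminator defined above.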