# -*- coding: utf-8 -*-
"""
DCGAN Tutorial
==============

**Author**: `Nathan Inkawhich <https://github.com/inkawhich>`__

"""


######################################################################
# Introduction
# ------------
#
# This tutorial will give an introduction to DCGANs through an example. We
# will train a generative adversarial network (GAN) to generate new
# celebrities after showing it pictures of many real celebrities. Most of
# the code here is from the DCGAN implementation in
# `pytorch/examples <https://github.com/pytorch/examples>`__, and this
# document will give a thorough explanation of the implementation and shed
# light on how and why this model works. Don't worry, no prior knowledge
# of GANs is required, but a first-timer may need to spend some time
# reasoning about what is actually happening under the hood.
# Also, for the sake of time it will help to have a GPU, or two. Let's
# start from the beginning.
#
# Generative Adversarial Networks
# -------------------------------
#
# What is a GAN?
# ~~~~~~~~~~~~~~
#
# GANs are a framework for teaching a deep learning model to capture the
# training data distribution so we can generate new data from that same
# distribution. GANs were invented by Ian Goodfellow in 2014 and first
# described in the paper `Generative Adversarial
# Nets <https://papers.nips.cc/paper/5423-generative-adversarial-nets.pdf>`__.
# They are made of two distinct models, a *generator* and a
# *discriminator*. The job of the generator is to spawn 'fake' images that
# look like the training images. The job of the discriminator is to look
# at an image and output whether it is a real training image or a
# fake image from the generator. During training, the generator is
# constantly trying to outsmart the discriminator by generating better and
# better fakes, while the discriminator is working to become a better
# detective and correctly classify the real and fake images. The
# equilibrium of this game is when the generator is generating perfect
# fakes that look as if they came directly from the training data, and the
# discriminator is left to always guess at 50% confidence whether the
# generator output is real or fake.
#
# Now, let's define some notation to be used throughout this tutorial,
# starting with the discriminator. Let :math:`x` be data representing an
# image. :math:`D(x)` is the discriminator network which outputs the
# (scalar) probability that :math:`x` came from the training data rather
# than the generator. Here, since we are dealing with images, the input to
# :math:`D(x)` is an image of CHW size 3x64x64. Intuitively, :math:`D(x)`
# should be HIGH when :math:`x` comes from training data and LOW when
# :math:`x` comes from the generator. :math:`D(x)` can also be thought of
# as a traditional binary classifier.
#
# For the generator's notation, let :math:`z` be a latent space vector
# sampled from a standard normal distribution. :math:`G(z)` represents the
# generator function which maps the latent vector :math:`z` to data-space.
# The goal of :math:`G` is to estimate the distribution that the training
# data comes from (:math:`p_{data}`) so it can generate fake samples from
# that estimated distribution (:math:`p_g`).
#
# So, :math:`D(G(z))` is the (scalar) probability that the output of the
# generator :math:`G` is a real image. As described in `Goodfellow's
# paper <https://papers.nips.cc/paper/5423-generative-adversarial-nets.pdf>`__,
# :math:`D` and :math:`G` play a minimax game in which :math:`D` tries to
# maximize the probability it correctly classifies reals and fakes
# (:math:`\log D(x)`), and :math:`G` tries to minimize the probability that
# :math:`D` will predict its outputs are fake (:math:`\log(1-D(G(z)))`).
# From the paper, the GAN loss function is
#
# .. math:: \underset{G}{\text{min}} \underset{D}{\text{max}} V(D,G) = \mathbb{E}_{x\sim p_{data}(x)}\big[\log D(x)\big] + \mathbb{E}_{z\sim p_{z}(z)}\big[\log(1-D(G(z)))\big]
#
# In theory, the solution to this minimax game is where
# :math:`p_g = p_{data}`, and the discriminator guesses randomly whether
# its inputs are real or fake. However, the convergence theory of GANs is
# still being actively researched and in reality models do not always
# train to this point.
#
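######################################################################
# To make the two log terms concrete, here is a tiny numeric sketch (the
# values are toy numbers, not from the paper). A Monte Carlo estimate of
# :math:`V(D,G)` is just the mean of :math:`\log D(x)` over a real batch
# plus the mean of :math:`\log(1-D(G(z)))` over a fake batch. :math:`D`
# pushes this sum toward its maximum of 0, while at the game's
# equilibrium (:math:`D = 0.5` everywhere) it equals
# :math:`-\log 4 \approx -1.39`.

import torch  # also imported below with the rest of the setup

# Hypothetical discriminator outputs on a real and a fake batch
D_real = torch.tensor([0.90, 0.80, 0.95])  # D(x): confident "real" on real images
D_fake = torch.tensor([0.10, 0.20, 0.05])  # D(G(z)): confident "fake" on fakes

# Estimate of V(D,G) = E[log D(x)] + E[log(1 - D(G(z)))]
V = torch.log(D_real).mean() + torch.log(1 - D_fake).mean()
print(V.item())  # ~ -0.25, well above the equilibrium value -log 4, so D is winning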
######################################################################
# What is a DCGAN?
# ~~~~~~~~~~~~~~~~
#
# A DCGAN is a direct extension of the GAN described above, except that it
# explicitly uses convolutional and convolutional-transpose layers in the
# discriminator and generator, respectively. It was first described by
# Radford et al. in the paper `Unsupervised Representation Learning With
# Deep Convolutional Generative Adversarial
# Networks <https://arxiv.org/pdf/1511.06434.pdf>`__. The discriminator
# is made up of strided
# `convolution <https://pytorch.org/docs/stable/nn.html#torch.nn.Conv2d>`__
# layers, `batch
# norm <https://pytorch.org/docs/stable/nn.html#torch.nn.BatchNorm2d>`__
# layers, and
# `LeakyReLU <https://pytorch.org/docs/stable/nn.html#torch.nn.LeakyReLU>`__
# activations. The input is a 3x64x64 image and the output is a
# scalar probability that the input is from the real data distribution.
# The generator is comprised of
# `convolutional-transpose <https://pytorch.org/docs/stable/nn.html#torch.nn.ConvTranspose2d>`__
# layers, batch norm layers, and
# `ReLU <https://pytorch.org/docs/stable/nn.html#relu>`__ activations. The
# input is a latent vector, :math:`z`, that is drawn from a standard
# normal distribution and the output is a 3x64x64 RGB image. The strided
# conv-transpose layers allow the latent vector to be transformed into a
# volume with the same shape as an image. In the paper, the authors also
# give some tips about how to set up the optimizers, how to calculate the
# loss functions, and how to initialize the model weights, all of which
# will be explained in the coming sections.
#

#%matplotlib inline
import argparse
import os
import random
import torch
import torch.nn as nn
import torch.nn.parallel
import torch.optim as optim
import torch.utils.data
import torchvision.datasets as dset
import torchvision.transforms as transforms
import torchvision.utils as vutils
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.animation as animation
from IPython.display import HTML

# Set random seed for reproducibility
manualSeed = 999
#manualSeed = random.randint(1, 10000) # use if you want new results
print("Random Seed: ", manualSeed)
random.seed(manualSeed)
torch.manual_seed(manualSeed)
torch.use_deterministic_algorithms(True) # Needed for reproducible results


######################################################################
# Inputs
# ------
#
# Let's define some inputs for the run:
#
# - ``dataroot`` - the path to the root of the dataset folder. We will
#   talk more about the dataset in the next section.
# - ``workers`` - the number of worker threads for loading the data with
#   the ``DataLoader``.
# - ``batch_size`` - the batch size used in training. The DCGAN paper
#   uses a batch size of 128.
# - ``image_size`` - the spatial size of the images used for training.
#   This implementation defaults to 64x64. If another size is desired,
#   the structures of D and G must be changed. See
#   `here <https://github.com/pytorch/examples/issues/70>`__ for more
#   details.
# - ``nc`` - number of color channels in the input images. For color
#   images this is 3.
# - ``nz`` - length of the latent vector.
# - ``ngf`` - relates to the depth of feature maps carried through the
#   generator.
# - ``ndf`` - sets the depth of feature maps propagated through the
#   discriminator.
# - ``num_epochs`` - number of training epochs to run. Training for
#   longer will probably lead to better results but will also take much
#   longer.
# - ``lr`` - learning rate for training. As described in the DCGAN paper,
#   this number should be 0.0002.
# - ``beta1`` - beta1 hyperparameter for Adam optimizers. As described in
#   the paper, this number should be 0.5.
# - ``ngpu`` - number of GPUs available. If this is 0, code will run in
#   CPU mode. If this number is greater than 0 it will run on that number
#   of GPUs.
#

# Root directory for dataset
dataroot = "data/celeba"

# Number of workers for dataloader
workers = 2

# Batch size during training
batch_size = 128

# Spatial size of training images. All images will be resized to this
# size using a transform.
image_size = 64

# Number of channels in the training images. For color images this is 3
nc = 3

# Size of z latent vector (i.e. size of generator input)
nz = 100

# Size of feature maps in generator
ngf = 64

# Size of feature maps in discriminator
ndf = 64

# Number of training epochs
num_epochs = 5

# Learning rate for optimizers
lr = 0.0002

# Beta1 hyperparameter for Adam optimizers
beta1 = 0.5

# Number of GPUs available. Use 0 for CPU mode.
ngpu = 1


######################################################################
# Data
# ----
#
# In this tutorial we will use the `Celeb-A Faces
# dataset <http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html>`__ which can
# be downloaded at the linked site, or from `Google
# Drive <https://drive.google.com/drive/folders/0B7EVK8r0v71pTUZsaXdaSnZBZzg>`__.
# The dataset will download as a file named ``img_align_celeba.zip``. Once
# downloaded, create a directory named ``celeba`` and extract the zip file
# into that directory. Then, set the ``dataroot`` input for this notebook to
# the ``celeba`` directory you just created. The resulting directory
# structure should be:
#
# .. code-block:: sh
#
#    /path/to/celeba
#        -> img_align_celeba
#            -> 188242.jpg
#            -> 173822.jpg
#            -> 284702.jpg
#            -> 537394.jpg
#               ...
#
# This is an important step because we will be using the ``ImageFolder``
# dataset class, which requires there to be subdirectories in the
# dataset root folder.
#
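# As an optional guard, we can fail early with a clear message if the
# layout is wrong, since ``ImageFolder`` needs at least one class
# subdirectory under the root. A minimal check, assuming the layout
# shown above:
assert any(os.path.isdir(os.path.join(dataroot, d)) for d in os.listdir(dataroot)), \
    "dataroot must contain at least one subdirectory of images (e.g. img_align_celeba)"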
######################################################################
# Now, we can create the dataset, create the
# dataloader, set the device to run on, and finally visualize some of the
# training data.
#

# We can use an image folder dataset the way we have it set up.
# Create the dataset
dataset = dset.ImageFolder(root=dataroot,
                           transform=transforms.Compose([
                               transforms.Resize(image_size),
                               transforms.CenterCrop(image_size),
                               transforms.ToTensor(),
                               transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
                           ]))
# Create the dataloader
dataloader = torch.utils.data.DataLoader(dataset, batch_size=batch_size,
                                         shuffle=True, num_workers=workers)

# Decide which device we want to run on
device = torch.device("cuda:0" if (torch.cuda.is_available() and ngpu > 0) else "cpu")

# Plot some training images
real_batch = next(iter(dataloader))
plt.figure(figsize=(8,8))
plt.axis("off")
plt.title("Training Images")
plt.imshow(np.transpose(vutils.make_grid(real_batch[0].to(device)[:64], padding=2, normalize=True).cpu(),(1,2,0)))
plt.show()


######################################################################
# Implementation
# --------------
#
# With our input parameters set and the dataset prepared, we can now get
# into the implementation. We will start with the weight initialization
# strategy, then talk about the generator, discriminator, loss functions,
# and training loop in detail.
#
# Weight Initialization
# ~~~~~~~~~~~~~~~~~~~~~
#
# From the DCGAN paper, the authors specify that all model weights shall
# be randomly initialized from a Normal distribution with ``mean=0``,
# ``stdev=0.02``. The ``weights_init`` function takes an initialized model as
# input and reinitializes all convolutional, convolutional-transpose, and
# batch normalization layers to meet these criteria. This function is
# applied to the models immediately after initialization.
#

# custom weights initialization called on ``netG`` and ``netD``
def weights_init(m):
    classname = m.__class__.__name__
    if classname.find('Conv') != -1:
        nn.init.normal_(m.weight.data, 0.0, 0.02)
    elif classname.find('BatchNorm') != -1:
        nn.init.normal_(m.weight.data, 1.0, 0.02)
        nn.init.constant_(m.bias.data, 0)


######################################################################
# Generator
# ~~~~~~~~~
#
# The generator, :math:`G`, is designed to map the latent space vector
# (:math:`z`) to data-space. Since our data are images, converting
# :math:`z` to data-space means ultimately creating an RGB image with the
# same size as the training images (i.e. 3x64x64). In practice, this is
# accomplished through a series of strided two dimensional convolutional
# transpose layers, each paired with a 2d batch norm layer and a relu
# activation. The output of the generator is fed through a tanh function
# to return it to the input data range of :math:`[-1,1]`. It is worth
# noting the existence of the batch norm functions after the
# conv-transpose layers, as this is a critical contribution of the DCGAN
# paper. These layers help with the flow of gradients during training. An
# image of the generator from the DCGAN paper is shown below.
#
# .. figure:: /_static/img/dcgan_generator.png
#    :alt: dcgan_generator
#
# Notice how the inputs we set in the input section (``nz``, ``ngf``, and
# ``nc``) influence the generator architecture in code.
# ``nz`` is the length
# of the z input vector, ``ngf`` relates to the size of the feature maps
# that are propagated through the generator, and ``nc`` is the number of
# channels in the output image (set to 3 for RGB images). Below is the
# code for the generator.
#

# Generator Code

class Generator(nn.Module):
    def __init__(self, ngpu):
        super(Generator, self).__init__()
        self.ngpu = ngpu
        self.main = nn.Sequential(
            # input is Z, going into a convolution
            nn.ConvTranspose2d(nz, ngf * 8, 4, 1, 0, bias=False),
            nn.BatchNorm2d(ngf * 8),
            nn.ReLU(True),
            # state size. ``(ngf*8) x 4 x 4``
            nn.ConvTranspose2d(ngf * 8, ngf * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf * 4),
            nn.ReLU(True),
            # state size. ``(ngf*4) x 8 x 8``
            nn.ConvTranspose2d(ngf * 4, ngf * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf * 2),
            nn.ReLU(True),
            # state size. ``(ngf*2) x 16 x 16``
            nn.ConvTranspose2d(ngf * 2, ngf, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf),
            nn.ReLU(True),
            # state size. ``(ngf) x 32 x 32``
            nn.ConvTranspose2d(ngf, nc, 4, 2, 1, bias=False),
            nn.Tanh()
            # state size. ``(nc) x 64 x 64``
        )

    def forward(self, input):
        return self.main(input)


######################################################################
# Now, we can instantiate the generator and apply the ``weights_init``
# function. Check out the printed model to see how the generator object is
# structured.
#

# Create the generator
netG = Generator(ngpu).to(device)

# Handle multi-GPU if desired
if (device.type == 'cuda') and (ngpu > 1):
    netG = nn.DataParallel(netG, list(range(ngpu)))

# Apply the ``weights_init`` function to randomly initialize all weights
# to ``mean=0``, ``stdev=0.02``.
netG.apply(weights_init)

# Print the model
print(netG)
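######################################################################
# As a quick sanity check, we can push a batch of latent vectors through
# the untrained generator and confirm that it produces the expected
# ``(nc) x 64 x 64`` volumes. This is a minimal sketch; the batch size of
# 16 and the name ``z_check`` are arbitrary choices for illustration.

with torch.no_grad():
    z_check = torch.randn(16, nz, 1, 1, device=device)
    print(netG(z_check).shape)  # expected: torch.Size([16, 3, 64, 64])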
######################################################################
# Discriminator
# ~~~~~~~~~~~~~
#
# As mentioned, the discriminator, :math:`D`, is a binary classification
# network that takes an image as input and outputs a scalar probability
# that the input image is real (as opposed to fake). Here, :math:`D` takes
# a 3x64x64 input image, processes it through a series of Conv2d,
# BatchNorm2d, and LeakyReLU layers, and outputs the final probability
# through a Sigmoid activation function. This architecture can be extended
# with more layers if necessary for the problem, but there is significance
# to the use of the strided convolution, BatchNorm, and LeakyReLU. The
# DCGAN paper mentions it is good practice to use strided convolution
# rather than pooling to downsample because it lets the network learn its
# own pooling function. Also, batch norm and leaky relu functions promote
# healthy gradient flow, which is critical for the learning process of both
# :math:`G` and :math:`D`.
#

#########################################################################
# Discriminator Code

class Discriminator(nn.Module):
    def __init__(self, ngpu):
        super(Discriminator, self).__init__()
        self.ngpu = ngpu
        self.main = nn.Sequential(
            # input is ``(nc) x 64 x 64``
            nn.Conv2d(nc, ndf, 4, 2, 1, bias=False),
            nn.LeakyReLU(0.2, inplace=True),
            # state size. ``(ndf) x 32 x 32``
            nn.Conv2d(ndf, ndf * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ndf * 2),
            nn.LeakyReLU(0.2, inplace=True),
            # state size. ``(ndf*2) x 16 x 16``
            nn.Conv2d(ndf * 2, ndf * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ndf * 4),
            nn.LeakyReLU(0.2, inplace=True),
            # state size. ``(ndf*4) x 8 x 8``
            nn.Conv2d(ndf * 4, ndf * 8, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ndf * 8),
            nn.LeakyReLU(0.2, inplace=True),
            # state size. ``(ndf*8) x 4 x 4``
            nn.Conv2d(ndf * 8, 1, 4, 1, 0, bias=False),
            nn.Sigmoid()
        )

    def forward(self, input):
        return self.main(input)


######################################################################
# Now, as with the generator, we can create the discriminator, apply the
# ``weights_init`` function, and print the model's structure.
#

# Create the Discriminator
netD = Discriminator(ngpu).to(device)

# Handle multi-GPU if desired
if (device.type == 'cuda') and (ngpu > 1):
    netD = nn.DataParallel(netD, list(range(ngpu)))

# Apply the ``weights_init`` function to randomly initialize all weights
# to ``mean=0``, ``stdev=0.02``.
netD.apply(weights_init)

# Print the model
print(netD)
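######################################################################
# Again as a sanity check (a minimal sketch, reusing the untrained
# generator), the discriminator should map any 3x64x64 batch to one
# scalar per image, squashed into :math:`(0,1)` by the final Sigmoid.

with torch.no_grad():
    fake_check = netG(torch.randn(16, nz, 1, 1, device=device))
    probs = netD(fake_check).view(-1)
    print(probs.shape, probs.min().item(), probs.max().item())  # 16 values in (0, 1)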
######################################################################
# Loss Functions and Optimizers
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#
# With :math:`D` and :math:`G` set up, we can specify how they learn
# through the loss functions and optimizers. We will use the Binary Cross
# Entropy loss
# (`BCELoss <https://pytorch.org/docs/stable/generated/torch.nn.BCELoss.html#torch.nn.BCELoss>`__)
# function which is defined in PyTorch as:
#
# .. math:: \ell(x, y) = L = \{l_1,\dots,l_N\}^\top, \quad l_n = - \left[ y_n \cdot \log x_n + (1 - y_n) \cdot \log (1 - x_n) \right]
#
# Notice how this function provides the calculation of both log components
# in the objective function (i.e. :math:`\log(D(x))` and
# :math:`\log(1-D(G(z)))`). We can specify which part of the BCE equation to
# use with the :math:`y` input. This is accomplished in the training loop
# which is coming up soon, but it is important to understand how we can
# choose which component we wish to calculate just by changing :math:`y`
# (i.e. the ground truth labels).
#
# Next, we define our real label as 1 and the fake label as 0. These
# labels will be used when calculating the losses of :math:`D` and
# :math:`G`, and this is also the convention used in the original GAN
# paper. Finally, we set up two separate optimizers, one for :math:`D` and
# one for :math:`G`. As specified in the DCGAN paper, both are Adam
# optimizers with learning rate 0.0002 and Beta1 = 0.5. For keeping track
# of the generator's learning progression, we will generate a fixed batch
# of latent vectors that are drawn from a Gaussian distribution
# (i.e. ``fixed_noise``). In the training loop, we will periodically input
# this ``fixed_noise`` into :math:`G`, and over the iterations we will see
# images form out of the noise.
#

# Initialize the ``BCELoss`` function
criterion = nn.BCELoss()

# Create batch of latent vectors that we will use to visualize
# the progression of the generator
fixed_noise = torch.randn(64, nz, 1, 1, device=device)

# Establish convention for real and fake labels during training
real_label = 1.
fake_label = 0.

# Setup Adam optimizers for both G and D
optimizerD = optim.Adam(netD.parameters(), lr=lr, betas=(beta1, 0.999))
optimizerG = optim.Adam(netG.parameters(), lr=lr, betas=(beta1, 0.999))
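######################################################################
# To see how the :math:`y` input selects the log component, here is a
# tiny illustration (toy numbers, not part of the training loop): with a
# target of 1 the ``BCELoss`` reduces to :math:`-\log(x)`, and with a
# target of 0 it reduces to :math:`-\log(1-x)`.

x_demo = torch.tensor([0.9])
print(criterion(x_demo, torch.ones(1)))   # -log(0.9) ~ 0.105
print(criterion(x_demo, torch.zeros(1)))  # -log(0.1) ~ 2.303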
######################################################################
# Training
# ~~~~~~~~
#
# Finally, now that we have all of the parts of the GAN framework defined,
# we can train it. Be mindful that training GANs is somewhat of an art
# form, as incorrect hyperparameter settings lead to mode collapse with
# little explanation of what went wrong. Here, we will closely follow
# Algorithm 1 from `Goodfellow's paper <https://papers.nips.cc/paper/5423-generative-adversarial-nets.pdf>`__,
# while abiding by some of the best
# practices shown in `ganhacks <https://github.com/soumith/ganhacks>`__.
# Namely, we will "construct different mini-batches for real and fake"
# images, and also adjust G's objective function to maximize
# :math:`\log(D(G(z)))`. Training is split up into two main parts. Part 1
# updates the Discriminator and Part 2 updates the Generator.
#
# **Part 1 - Train the Discriminator**
#
# Recall, the goal of training the discriminator is to maximize the
# probability of correctly classifying a given input as real or fake. In
# terms of Goodfellow, we wish to "update the discriminator by ascending
# its stochastic gradient". Practically, we want to maximize
# :math:`\log(D(x)) + \log(1-D(G(z)))`. Due to the separate mini-batch
# suggestion from `ganhacks <https://github.com/soumith/ganhacks>`__,
# we will calculate this in two steps. First, we
# will construct a batch of real samples from the training set, forward
# pass through :math:`D`, calculate the loss (:math:`\log(D(x))`), then
# calculate the gradients in a backward pass. Secondly, we will construct
# a batch of fake samples with the current generator, forward pass this
# batch through :math:`D`, calculate the loss (:math:`\log(1-D(G(z)))`),
# and *accumulate* the gradients with a backward pass. Now, with the
# gradients accumulated from both the all-real and all-fake batches, we
# call a step of the Discriminator's optimizer.
#
# **Part 2 - Train the Generator**
#
# As stated in the original paper, we want to train the Generator by
# minimizing :math:`\log(1-D(G(z)))` in an effort to generate better fakes.
# As mentioned, this was shown by Goodfellow to not provide sufficient
# gradients, especially early in the learning process. As a fix, we
# instead wish to maximize :math:`\log(D(G(z)))`. In the code we accomplish
# this by: classifying the Generator output from Part 1 with the
# Discriminator, computing G's loss *using real labels as ground truth*,
# computing G's gradients in a backward pass, and finally updating G's
# parameters with an optimizer step. It may seem counter-intuitive to use
# the real labels as ground truth labels for the loss function, but this
# allows us to use the :math:`\log(x)` part of the ``BCELoss`` (rather
# than the :math:`\log(1-x)` part) which is exactly what we want.
#
# Finally, we will do some statistic reporting and at the end of each
# epoch we will push our ``fixed_noise`` batch through the generator to
# visually track the progress of G's training. The training statistics
# reported are:
#
# - **Loss_D** - discriminator loss calculated as the sum of losses for
#   the all-real and all-fake batches (:math:`\log(D(x)) + \log(1 - D(G(z)))`).
# - **Loss_G** - generator loss calculated as :math:`\log(D(G(z)))`.
# - **D(x)** - the average output (across the batch) of the discriminator
#   for the all-real batch. This should start close to 1 and then
#   theoretically converge to 0.5 as G gets better. Think about why
#   this is.
# - **D(G(z))** - average discriminator outputs for the all-fake batch.
#   The first number is before D is updated and the second number is
#   after D is updated. These numbers should start near 0 and converge to
#   0.5 as G gets better. Think about why this is.
#
# **Note:** This step might take a while, depending on how many epochs you
# run and if you removed some data from the dataset.
#

# Training Loop

# Lists to keep track of progress
img_list = []
G_losses = []
D_losses = []
iters = 0

print("Starting Training Loop...")
# For each epoch
for epoch in range(num_epochs):
    # For each batch in the dataloader
    for i, data in enumerate(dataloader, 0):

        ############################
        # (1) Update D network: maximize log(D(x)) + log(1 - D(G(z)))
        ###########################
        ## Train with all-real batch
        netD.zero_grad()
        # Format batch
        real_cpu = data[0].to(device)
        b_size = real_cpu.size(0)
        label = torch.full((b_size,), real_label, dtype=torch.float, device=device)
        # Forward pass real batch through D
        output = netD(real_cpu).view(-1)
        # Calculate loss on all-real batch
        errD_real = criterion(output, label)
        # Calculate gradients for D in backward pass
        errD_real.backward()
        D_x = output.mean().item()

        ## Train with all-fake batch
        # Generate batch of latent vectors
        noise = torch.randn(b_size, nz, 1, 1, device=device)
        # Generate fake image batch with G
        fake = netG(noise)
        label.fill_(fake_label)
        # Classify all fake batch with D
        output = netD(fake.detach()).view(-1)
        # Calculate D's loss on the all-fake batch
        errD_fake = criterion(output, label)
        # Calculate the gradients for this batch, accumulated (summed) with previous gradients
        errD_fake.backward()
        D_G_z1 = output.mean().item()
        # Compute error of D as sum over the fake and the real batches
        errD = errD_real + errD_fake
        # Update D
        optimizerD.step()

        ############################
        # (2) Update G network: maximize log(D(G(z)))
        ###########################
        netG.zero_grad()
        label.fill_(real_label)  # fake labels are real for generator cost
        # Since we just updated D, perform another forward pass of all-fake batch through D
        output = netD(fake).view(-1)
        # Calculate G's loss based on this output
        errG = criterion(output, label)
        # Calculate gradients for G
        errG.backward()
        D_G_z2 = output.mean().item()
        # Update G
        optimizerG.step()

        # Output training stats
        if i % 50 == 0:
            print('[%d/%d][%d/%d]\tLoss_D: %.4f\tLoss_G: %.4f\tD(x): %.4f\tD(G(z)): %.4f / %.4f'
                  % (epoch, num_epochs, i, len(dataloader),
                     errD.item(), errG.item(), D_x, D_G_z1, D_G_z2))

        # Save Losses for plotting later
        G_losses.append(errG.item())
        D_losses.append(errD.item())

        # Check how the generator is doing by saving G's output on fixed_noise
        if (iters % 500 == 0) or ((epoch == num_epochs-1) and (i == len(dataloader)-1)):
            with torch.no_grad():
                fake = netG(fixed_noise).detach().cpu()
            img_list.append(vutils.make_grid(fake, padding=2, normalize=True))

        iters += 1


######################################################################
# Results
# -------
#
# Finally, let's check out how we did. Here, we will look at three
# different results. First, we will see how D and G's losses changed
# during training. Second, we will visualize G's output on the
# ``fixed_noise`` batch for every epoch.
# And third, we will look at a batch of real data
# next to a batch of fake data from G.
#
# **Loss versus training iteration**
#
# Below is a plot of D & G's losses versus training iterations.
#

plt.figure(figsize=(10,5))
plt.title("Generator and Discriminator Loss During Training")
plt.plot(G_losses,label="G")
plt.plot(D_losses,label="D")
plt.xlabel("iterations")
plt.ylabel("Loss")
plt.legend()
plt.show()


######################################################################
# **Visualization of G's progression**
#
# Remember how we saved the generator's output on the ``fixed_noise``
# batch after every epoch of training? Now, we can visualize the training
# progression of G with an animation. Press the play button to start the
# animation.
#

#%%capture
fig = plt.figure(figsize=(8,8))
plt.axis("off")
ims = [[plt.imshow(np.transpose(i,(1,2,0)), animated=True)] for i in img_list]
ani = animation.ArtistAnimation(fig, ims, interval=1000, repeat_delay=1000, blit=True)

HTML(ani.to_jshtml())


######################################################################
# **Real Images vs. Fake Images**
#
# Finally, let's take a look at some real images and fake images side by
# side.
#

# Grab a batch of real images from the dataloader
real_batch = next(iter(dataloader))

# Plot the real images
plt.figure(figsize=(15,15))
plt.subplot(1,2,1)
plt.axis("off")
plt.title("Real Images")
plt.imshow(np.transpose(vutils.make_grid(real_batch[0].to(device)[:64], padding=5, normalize=True).cpu(),(1,2,0)))

# Plot the fake images from the last epoch
plt.subplot(1,2,2)
plt.axis("off")
plt.title("Fake Images")
plt.imshow(np.transpose(img_list[-1],(1,2,0)))
plt.show()


######################################################################
# Where to Go Next
# ----------------
#
# We have reached the end of our journey, but there are several places you
# could go from here. You could:
#
# - Train for longer to see how good the results get
# - Modify this model to take a different dataset and possibly change the
#   size of the images and the model architecture (see the sketch below)
# - Check out some other cool GAN projects
#   `here <https://github.com/nashory/gans-awesome-applications>`__
# - Create GANs that generate
#   `music <https://www.deepmind.com/blog/wavenet-a-generative-model-for-raw-audio/>`__
#
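######################################################################
# For the second bullet, here is a minimal sketch of how the data
# pipeline might be retargeted to a different ``torchvision`` dataset
# (MNIST here, chosen only as an example). Grayscale data means
# ``nc = 1``, and both ``Generator`` and ``Discriminator`` must be
# re-instantiated after changing it; the training loop itself stays
# the same.

dataset = dset.MNIST(root="data/mnist", download=True,
                     transform=transforms.Compose([
                         transforms.Resize(image_size),
                         transforms.ToTensor(),
                         transforms.Normalize((0.5,), (0.5,)),
                     ]))
nc = 1  # single channel; rebuild netG and netD with this value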