Real-time collaboration for Jupyter Notebooks, Linux Terminals, LaTeX, VS Code, R IDE, and more,
all in one place.
Real-time collaboration for Jupyter Notebooks, Linux Terminals, LaTeX, VS Code, R IDE, and more,
all in one place.
Path: blob/main/advanced_source/neural_style_tutorial.py
Views: 712
"""1Neural Transfer Using PyTorch2=============================345**Author**: `Alexis Jacq <https://alexis-jacq.github.io>`_67**Edited by**: `Winston Herring <https://github.com/winston6>`_89Introduction10------------1112This tutorial explains how to implement the `Neural-Style algorithm <https://arxiv.org/abs/1508.06576>`__13developed by Leon A. Gatys, Alexander S. Ecker and Matthias Bethge.14Neural-Style, or Neural-Transfer, allows you to take an image and15reproduce it with a new artistic style. The algorithm takes three images,16an input image, a content-image, and a style-image, and changes the input17to resemble the content of the content-image and the artistic style of the style-image.181920.. figure:: /_static/img/neural-style/neuralstyle.png21:alt: content122"""2324######################################################################25# Underlying Principle26# --------------------27#28# The principle is simple: we define two distances, one for the content29# (:math:`D_C`) and one for the style (:math:`D_S`). :math:`D_C` measures how different the content30# is between two images while :math:`D_S` measures how different the style is31# between two images. Then, we take a third image, the input, and32# transform it to minimize both its content-distance with the33# content-image and its style-distance with the style-image. Now we can34# import the necessary packages and begin the neural transfer.35#36# Importing Packages and Selecting a Device37# -----------------------------------------38# Below is a list of the packages needed to implement the neural transfer.39#40# - ``torch``, ``torch.nn``, ``numpy`` (indispensables packages for41# neural networks with PyTorch)42# - ``torch.optim`` (efficient gradient descents)43# - ``PIL``, ``PIL.Image``, ``matplotlib.pyplot`` (load and display44# images)45# - ``torchvision.transforms`` (transform PIL images into tensors)46# - ``torchvision.models`` (train or load pretrained models)47# - ``copy`` (to deep copy the models; system package)4849import torch50import torch.nn as nn51import torch.nn.functional as F52import torch.optim as optim5354from PIL import Image55import matplotlib.pyplot as plt5657import torchvision.transforms as transforms58from torchvision.models import vgg19, VGG19_Weights5960import copy616263######################################################################64# Next, we need to choose which device to run the network on and import the65# content and style images. Running the neural transfer algorithm on large66# images takes longer and will go much faster when running on a GPU. We can67# use ``torch.cuda.is_available()`` to detect if there is a GPU available.68# Next, we set the ``torch.device`` for use throughout the tutorial. Also the ``.to(device)``69# method is used to move tensors or modules to a desired device.7071device = torch.device("cuda" if torch.cuda.is_available() else "cpu")72torch.set_default_device(device)7374######################################################################75# Loading the Images76# ------------------77#78# Now we will import the style and content images. The original PIL images have values between 0 and 255, but when79# transformed into torch tensors, their values are converted to be between80# 0 and 1. The images also need to be resized to have the same dimensions.81# An important detail to note is that neural networks from the82# torch library are trained with tensor values ranging from 0 to 1. If you83# try to feed the networks with 0 to 255 tensor images, then the activated84# feature maps will be unable to sense the intended content and style.85# However, pretrained networks from the Caffe library are trained with 086# to 255 tensor images.87#88#89# .. note::90# Here are links to download the images required to run the tutorial:91# `picasso.jpg <https://pytorch.org/tutorials/_static/img/neural-style/picasso.jpg>`__ and92# `dancing.jpg <https://pytorch.org/tutorials/_static/img/neural-style/dancing.jpg>`__.93# Download these two images and add them to a directory94# with name ``images`` in your current working directory.9596# desired size of the output image97imsize = 512 if torch.cuda.is_available() else 128 # use small size if no GPU9899loader = transforms.Compose([100transforms.Resize(imsize), # scale imported image101transforms.ToTensor()]) # transform it into a torch tensor102103104def image_loader(image_name):105image = Image.open(image_name)106# fake batch dimension required to fit network's input dimensions107image = loader(image).unsqueeze(0)108return image.to(device, torch.float)109110111style_img = image_loader("./data/images/neural-style/picasso.jpg")112content_img = image_loader("./data/images/neural-style/dancing.jpg")113114assert style_img.size() == content_img.size(), \115"we need to import style and content images of the same size"116117118######################################################################119# Now, let's create a function that displays an image by reconverting a120# copy of it to PIL format and displaying the copy using121# ``plt.imshow``. We will try displaying the content and style images122# to ensure they were imported correctly.123124unloader = transforms.ToPILImage() # reconvert into PIL image125126plt.ion()127128def imshow(tensor, title=None):129image = tensor.cpu().clone() # we clone the tensor to not do changes on it130image = image.squeeze(0) # remove the fake batch dimension131image = unloader(image)132plt.imshow(image)133if title is not None:134plt.title(title)135plt.pause(0.001) # pause a bit so that plots are updated136137138plt.figure()139imshow(style_img, title='Style Image')140141plt.figure()142imshow(content_img, title='Content Image')143144######################################################################145# Loss Functions146# --------------147# Content Loss148# ~~~~~~~~~~~~149#150# The content loss is a function that represents a weighted version of the151# content distance for an individual layer. The function takes the feature152# maps :math:`F_{XL}` of a layer :math:`L` in a network processing input :math:`X` and returns the153# weighted content distance :math:`w_{CL}.D_C^L(X,C)` between the image :math:`X` and the154# content image :math:`C`. The feature maps of the content image(:math:`F_{CL}`) must be155# known by the function in order to calculate the content distance. We156# implement this function as a torch module with a constructor that takes157# :math:`F_{CL}` as an input. The distance :math:`\|F_{XL} - F_{CL}\|^2` is the mean square error158# between the two sets of feature maps, and can be computed using ``nn.MSELoss``.159#160# We will add this content loss module directly after the convolution161# layer(s) that are being used to compute the content distance. This way162# each time the network is fed an input image the content losses will be163# computed at the desired layers and because of auto grad, all the164# gradients will be computed. Now, in order to make the content loss layer165# transparent we must define a ``forward`` method that computes the content166# loss and then returns the layer’s input. The computed loss is saved as a167# parameter of the module.168#169170class ContentLoss(nn.Module):171172def __init__(self, target,):173super(ContentLoss, self).__init__()174# we 'detach' the target content from the tree used175# to dynamically compute the gradient: this is a stated value,176# not a variable. Otherwise the forward method of the criterion177# will throw an error.178self.target = target.detach()179180def forward(self, input):181self.loss = F.mse_loss(input, self.target)182return input183184######################################################################185# .. note::186# **Important detail**: although this module is named ``ContentLoss``, it187# is not a true PyTorch Loss function. If you want to define your content188# loss as a PyTorch Loss function, you have to create a PyTorch autograd function189# to recompute/implement the gradient manually in the ``backward``190# method.191192######################################################################193# Style Loss194# ~~~~~~~~~~195#196# The style loss module is implemented similarly to the content loss197# module. It will act as a transparent layer in a198# network that computes the style loss of that layer. In order to199# calculate the style loss, we need to compute the gram matrix :math:`G_{XL}`. A gram200# matrix is the result of multiplying a given matrix by its transposed201# matrix. In this application the given matrix is a reshaped version of202# the feature maps :math:`F_{XL}` of a layer :math:`L`. :math:`F_{XL}` is reshaped to form :math:`\hat{F}_{XL}`, a :math:`K`\ x\ :math:`N`203# matrix, where :math:`K` is the number of feature maps at layer :math:`L` and :math:`N` is the204# length of any vectorized feature map :math:`F_{XL}^k`. For example, the first line205# of :math:`\hat{F}_{XL}` corresponds to the first vectorized feature map :math:`F_{XL}^1`.206#207# Finally, the gram matrix must be normalized by dividing each element by208# the total number of elements in the matrix. This normalization is to209# counteract the fact that :math:`\hat{F}_{XL}` matrices with a large :math:`N` dimension yield210# larger values in the Gram matrix. These larger values will cause the211# first layers (before pooling layers) to have a larger impact during the212# gradient descent. Style features tend to be in the deeper layers of the213# network so this normalization step is crucial.214#215216def gram_matrix(input):217a, b, c, d = input.size() # a=batch size(=1)218# b=number of feature maps219# (c,d)=dimensions of a f. map (N=c*d)220221features = input.view(a * b, c * d) # resize F_XL into \hat F_XL222223G = torch.mm(features, features.t()) # compute the gram product224225# we 'normalize' the values of the gram matrix226# by dividing by the number of element in each feature maps.227return G.div(a * b * c * d)228229230######################################################################231# Now the style loss module looks almost exactly like the content loss232# module. The style distance is also computed using the mean square233# error between :math:`G_{XL}` and :math:`G_{SL}`.234#235236class StyleLoss(nn.Module):237238def __init__(self, target_feature):239super(StyleLoss, self).__init__()240self.target = gram_matrix(target_feature).detach()241242def forward(self, input):243G = gram_matrix(input)244self.loss = F.mse_loss(G, self.target)245return input246247248######################################################################249# Importing the Model250# -------------------251#252# Now we need to import a pretrained neural network. We will use a 19253# layer VGG network like the one used in the paper.254#255# PyTorch’s implementation of VGG is a module divided into two child256# ``Sequential`` modules: ``features`` (containing convolution and pooling layers),257# and ``classifier`` (containing fully connected layers). We will use the258# ``features`` module because we need the output of the individual259# convolution layers to measure content and style loss. Some layers have260# different behavior during training than evaluation, so we must set the261# network to evaluation mode using ``.eval()``.262#263264cnn = vgg19(weights=VGG19_Weights.DEFAULT).features.eval()265266267268######################################################################269# Additionally, VGG networks are trained on images with each channel270# normalized by mean=[0.485, 0.456, 0.406] and std=[0.229, 0.224, 0.225].271# We will use them to normalize the image before sending it into the network.272#273274cnn_normalization_mean = torch.tensor([0.485, 0.456, 0.406])275cnn_normalization_std = torch.tensor([0.229, 0.224, 0.225])276277# create a module to normalize input image so we can easily put it in a278# ``nn.Sequential``279class Normalization(nn.Module):280def __init__(self, mean, std):281super(Normalization, self).__init__()282# .view the mean and std to make them [C x 1 x 1] so that they can283# directly work with image Tensor of shape [B x C x H x W].284# B is batch size. C is number of channels. H is height and W is width.285self.mean = torch.tensor(mean).view(-1, 1, 1)286self.std = torch.tensor(std).view(-1, 1, 1)287288def forward(self, img):289# normalize ``img``290return (img - self.mean) / self.std291292293######################################################################294# A ``Sequential`` module contains an ordered list of child modules. For295# instance, ``vgg19.features`` contains a sequence (``Conv2d``, ``ReLU``, ``MaxPool2d``,296# ``Conv2d``, ``ReLU``…) aligned in the right order of depth. We need to add our297# content loss and style loss layers immediately after the convolution298# layer they are detecting. To do this we must create a new ``Sequential``299# module that has content loss and style loss modules correctly inserted.300#301302# desired depth layers to compute style/content losses :303content_layers_default = ['conv_4']304style_layers_default = ['conv_1', 'conv_2', 'conv_3', 'conv_4', 'conv_5']305306def get_style_model_and_losses(cnn, normalization_mean, normalization_std,307style_img, content_img,308content_layers=content_layers_default,309style_layers=style_layers_default):310# normalization module311normalization = Normalization(normalization_mean, normalization_std)312313# just in order to have an iterable access to or list of content/style314# losses315content_losses = []316style_losses = []317318# assuming that ``cnn`` is a ``nn.Sequential``, so we make a new ``nn.Sequential``319# to put in modules that are supposed to be activated sequentially320model = nn.Sequential(normalization)321322i = 0 # increment every time we see a conv323for layer in cnn.children():324if isinstance(layer, nn.Conv2d):325i += 1326name = 'conv_{}'.format(i)327elif isinstance(layer, nn.ReLU):328name = 'relu_{}'.format(i)329# The in-place version doesn't play very nicely with the ``ContentLoss``330# and ``StyleLoss`` we insert below. So we replace with out-of-place331# ones here.332layer = nn.ReLU(inplace=False)333elif isinstance(layer, nn.MaxPool2d):334name = 'pool_{}'.format(i)335elif isinstance(layer, nn.BatchNorm2d):336name = 'bn_{}'.format(i)337else:338raise RuntimeError('Unrecognized layer: {}'.format(layer.__class__.__name__))339340model.add_module(name, layer)341342if name in content_layers:343# add content loss:344target = model(content_img).detach()345content_loss = ContentLoss(target)346model.add_module("content_loss_{}".format(i), content_loss)347content_losses.append(content_loss)348349if name in style_layers:350# add style loss:351target_feature = model(style_img).detach()352style_loss = StyleLoss(target_feature)353model.add_module("style_loss_{}".format(i), style_loss)354style_losses.append(style_loss)355356# now we trim off the layers after the last content and style losses357for i in range(len(model) - 1, -1, -1):358if isinstance(model[i], ContentLoss) or isinstance(model[i], StyleLoss):359break360361model = model[:(i + 1)]362363return model, style_losses, content_losses364365366######################################################################367# Next, we select the input image. You can use a copy of the content image368# or white noise.369#370371input_img = content_img.clone()372# if you want to use white noise by using the following code:373#374# .. code-block:: python375#376# input_img = torch.randn(content_img.data.size())377378# add the original input image to the figure:379plt.figure()380imshow(input_img, title='Input Image')381382383######################################################################384# Gradient Descent385# ----------------386#387# As Leon Gatys, the author of the algorithm, suggested `here <https://discuss.pytorch.org/t/pytorch-tutorial-for-neural-transfert-of-artistic-style/336/20?u=alexis-jacq>`__, we will use388# L-BFGS algorithm to run our gradient descent. Unlike training a network,389# we want to train the input image in order to minimize the content/style390# losses. We will create a PyTorch L-BFGS optimizer ``optim.LBFGS`` and pass391# our image to it as the tensor to optimize.392#393394def get_input_optimizer(input_img):395# this line to show that input is a parameter that requires a gradient396optimizer = optim.LBFGS([input_img])397return optimizer398399400######################################################################401# Finally, we must define a function that performs the neural transfer. For402# each iteration of the networks, it is fed an updated input and computes403# new losses. We will run the ``backward`` methods of each loss module to404# dynamically compute their gradients. The optimizer requires a “closure”405# function, which reevaluates the module and returns the loss.406#407# We still have one final constraint to address. The network may try to408# optimize the input with values that exceed the 0 to 1 tensor range for409# the image. We can address this by correcting the input values to be410# between 0 to 1 each time the network is run.411#412413def run_style_transfer(cnn, normalization_mean, normalization_std,414content_img, style_img, input_img, num_steps=300,415style_weight=1000000, content_weight=1):416"""Run the style transfer."""417print('Building the style transfer model..')418model, style_losses, content_losses = get_style_model_and_losses(cnn,419normalization_mean, normalization_std, style_img, content_img)420421# We want to optimize the input and not the model parameters so we422# update all the requires_grad fields accordingly423input_img.requires_grad_(True)424# We also put the model in evaluation mode, so that specific layers425# such as dropout or batch normalization layers behave correctly.426model.eval()427model.requires_grad_(False)428429optimizer = get_input_optimizer(input_img)430431print('Optimizing..')432run = [0]433while run[0] <= num_steps:434435def closure():436# correct the values of updated input image437with torch.no_grad():438input_img.clamp_(0, 1)439440optimizer.zero_grad()441model(input_img)442style_score = 0443content_score = 0444445for sl in style_losses:446style_score += sl.loss447for cl in content_losses:448content_score += cl.loss449450style_score *= style_weight451content_score *= content_weight452453loss = style_score + content_score454loss.backward()455456run[0] += 1457if run[0] % 50 == 0:458print("run {}:".format(run))459print('Style Loss : {:4f} Content Loss: {:4f}'.format(460style_score.item(), content_score.item()))461print()462463return style_score + content_score464465optimizer.step(closure)466467# a last correction...468with torch.no_grad():469input_img.clamp_(0, 1)470471return input_img472473474######################################################################475# Finally, we can run the algorithm.476#477478output = run_style_transfer(cnn, cnn_normalization_mean, cnn_normalization_std,479content_img, style_img, input_img)480481plt.figure()482imshow(output, title='Output Image')483484# sphinx_gallery_thumbnail_number = 4485plt.ioff()486plt.show()487488489490