Path: blob/main/beginner_source/introyt/introyt1_tutorial.py
"""
**Introduction** ||
`Tensors <tensors_deeper_tutorial.html>`_ ||
`Autograd <autogradyt_tutorial.html>`_ ||
`Building Models <modelsyt_tutorial.html>`_ ||
`TensorBoard Support <tensorboardyt_tutorial.html>`_ ||
`Training Models <trainingyt.html>`_ ||
`Model Understanding <captumyt.html>`_

Introduction to PyTorch
=======================

Follow along with the video below or on `youtube <https://www.youtube.com/watch?v=IC0_FRiX-sw>`__.

.. raw:: html

   <div style="margin-top:10px; margin-bottom:10px;">
     <iframe width="560" height="315" src="https://www.youtube.com/embed/IC0_FRiX-sw" frameborder="0" allow="accelerometer; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
   </div>

PyTorch Tensors
---------------

Follow along with the video beginning at `03:50 <https://www.youtube.com/watch?v=IC0_FRiX-sw&t=230s>`__.

First, we’ll import PyTorch.

"""

import torch

######################################################################
# Let’s see a few basic tensor manipulations. First, just a few of the
# ways to create tensors:
#

z = torch.zeros(5, 3)
print(z)
print(z.dtype)


#########################################################################
# Above, we create a 5x3 matrix filled with zeros, and query its datatype
# to find out that the zeros are 32-bit floating point numbers, which is
# the default in PyTorch.
#
# What if you wanted integers instead?
# You can always override the
# default:
#

i = torch.ones((5, 3), dtype=torch.int16)
print(i)


######################################################################
# You can see that when we do change the default, the tensor helpfully
# reports this when printed.
#
# It’s common to initialize learning weights randomly, often with a
# specific seed for the PRNG for reproducibility of results:
#

torch.manual_seed(1729)
r1 = torch.rand(2, 2)
print('A random tensor:')
print(r1)

r2 = torch.rand(2, 2)
print('\nA different random tensor:')
print(r2) # new values

torch.manual_seed(1729)
r3 = torch.rand(2, 2)
print('\nShould match r1:')
print(r3) # repeats values of r1 because of re-seed


#######################################################################
# PyTorch tensors perform arithmetic operations intuitively. Tensors of
# the same shape may be added, multiplied, etc. Operations with scalars
# are distributed over the tensor:
#

ones = torch.ones(2, 3)
print(ones)

twos = torch.ones(2, 3) * 2 # every element is multiplied by 2
print(twos)

threes = ones + twos       # addition allowed because shapes match
print(threes)              # tensors are added element-wise
print(threes.shape)        # this has the same dimensions as input tensors

r1 = torch.rand(2, 3)
r2 = torch.rand(3, 2)
# uncomment this line to get a runtime error
# r3 = r1 + r2


######################################################################
# Here’s a small sample of the mathematical operations available:
#

r = (torch.rand(2, 2) - 0.5) * 2 # values between -1 and 1
print('A random matrix, r:')
print(r)

# Common mathematical operations are supported:
print('\nAbsolute value of r:')
print(torch.abs(r))

# ...as are trigonometric functions:
print('\nInverse sine of r:')
print(torch.asin(r))

# ...and linear algebra operations like determinant and singular value decomposition
print('\nDeterminant of r:')
print(torch.det(r))
print('\nSingular value decomposition of r:')
print(torch.svd(r))

# ...and statistical and aggregate operations:
print('\nAverage and standard deviation of r:')
print(torch.std_mean(r))
print('\nMaximum value of r:')
print(torch.max(r))


##########################################################################
# There’s a good deal more to know about the power of PyTorch tensors,
# including how to set them up for parallel computations on GPU - we’ll be
# going into more depth in another video.
#
# PyTorch Models
# --------------
#
# Follow along with the video beginning at `10:00 <https://www.youtube.com/watch?v=IC0_FRiX-sw&t=600s>`__.
#
# Let’s talk about how we can express models in PyTorch.
#

import torch                     # for all things PyTorch
import torch.nn as nn            # for torch.nn.Module, the parent object for PyTorch models
import torch.nn.functional as F  # for the activation function


#########################################################################
# .. figure:: /_static/img/mnist.png
#    :alt: le-net-5 diagram
#
#    *Figure: LeNet-5*
#
# Above is a diagram of LeNet-5, one of the earliest convolutional neural
# nets, and one of the drivers of the explosion in Deep Learning. It was
# built to read small images of handwritten numbers (the MNIST dataset),
# and correctly classify which digit was represented in the image.
#
# Here’s the abridged version of how it works:
#
# - Layer C1 is a convolutional layer, meaning that it scans the input
#   image for features it learned during training. It outputs a map of
#   where it saw each of its learned features in the image. This
#   “activation map” is downsampled in layer S2.
# - Layer C3 is another convolutional layer, this time scanning C1’s
#   activation map for *combinations* of features.
#   It also puts out an
#   activation map describing the spatial locations of these feature
#   combinations, which is downsampled in layer S4.
# - Finally, the fully-connected layers at the end, F5, F6, and OUTPUT,
#   are a *classifier* that takes the final activation map, and
#   classifies it into one of ten bins representing the 10 digits.
#
# How do we express this simple neural network in code?
#

class LeNet(nn.Module):

    def __init__(self):
        super(LeNet, self).__init__()
        # 1 input image channel (black & white), 6 output channels, 5x5 square convolution
        # kernel
        self.conv1 = nn.Conv2d(1, 6, 5)
        self.conv2 = nn.Conv2d(6, 16, 5)
        # an affine operation: y = Wx + b
        self.fc1 = nn.Linear(16 * 5 * 5, 120)  # 5*5 from image dimension
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        # Max pooling over a (2, 2) window
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        # If the window is square, you can specify just a single number
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

    def num_flat_features(self, x):
        size = x.size()[1:]  # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features


############################################################################
# Looking over this code, you should be able to spot some structural
# similarities with the diagram above.
#
# This demonstrates the structure of a typical PyTorch model:
#
# - It inherits from ``torch.nn.Module`` - modules may be nested - in fact,
#   even the ``Conv2d`` and ``Linear`` layer classes inherit from
#   ``torch.nn.Module``.
# - A model will have an ``__init__()`` function, where it instantiates
#   its layers, and loads any data artifacts it might
#   need (e.g., an NLP model might load a vocabulary).
# - A model will
#   have a ``forward()`` function. This is where the actual
#   computation happens: An input is passed through the network layers
#   and various functions to generate an output.
# - Other than that, you can build out your model class like any other
#   Python class, adding whatever properties and methods you need to
#   support your model’s computation.
#
# Let’s instantiate this object and run a sample input through it.
#

net = LeNet()
print(net)                         # what does the object tell us about itself?

input = torch.rand(1, 1, 32, 32)   # stand-in for a 32x32 black & white image
print('\nImage batch shape:')
print(input.shape)

output = net(input)                # we don't call forward() directly
print('\nRaw output:')
print(output)
print(output.shape)


##########################################################################
# There are a few important things happening above:
#
# First, we instantiate the ``LeNet`` class, and we print the ``net``
# object. A subclass of ``torch.nn.Module`` will report the layers it has
# created and their shapes and parameters. This can provide a handy
# overview of a model if you want to get the gist of its processing.
#
# Below that, we create a dummy input representing a 32x32 image with 1
# color channel. Normally, you would load an image tile and convert it to
# a tensor of this shape.
#
# You may have noticed an extra dimension to our tensor - the *batch
# dimension.* PyTorch models assume they are working on *batches* of data
# - for example, a batch of 16 of our image tiles would have the shape
# ``(16, 1, 32, 32)``. Since we’re only using one image, we create a batch
# of 1 with shape ``(1, 1, 32, 32)``.
#
# We ask the model for an inference by calling it like a function:
# ``net(input)``. The output of this call represents the model’s
# confidence that the input represents a particular digit.
# (Since this
# instance of the model hasn’t learned anything yet, we shouldn’t expect
# to see any signal in the output.) Looking at the shape of ``output``, we
# can see that it also has a batch dimension, the size of which should
# always match the input batch dimension. If we had passed in an input
# batch of 16 instances, ``output`` would have a shape of ``(16, 10)``.
#
# Datasets and Dataloaders
# ------------------------
#
# Follow along with the video beginning at `14:00 <https://www.youtube.com/watch?v=IC0_FRiX-sw&t=840s>`__.
#
# Below, we’re going to demonstrate using one of the ready-to-download,
# open-access datasets from TorchVision, how to transform the images for
# consumption by your model, and how to use the DataLoader to feed batches
# of data to your model.
#
# The first thing we need to do is transform our incoming images into a
# PyTorch tensor.
#

#%matplotlib inline

import torch
import torchvision
import torchvision.transforms as transforms

transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2470, 0.2435, 0.2616))])


##########################################################################
# Here, we specify two transformations for our input:
#
# - ``transforms.ToTensor()`` converts images loaded by Pillow into
#   PyTorch tensors.
# - ``transforms.Normalize()`` adjusts the values of the tensor so
#   that their average is zero and their standard deviation is 1.0. Most
#   activation functions have their strongest gradients around x = 0, so
#   centering our data there can speed learning.
#   The values passed to the transform are the means (first tuple) and the
#   standard deviations (second tuple) of the RGB values of the images in
#   the dataset.
#   You can calculate these values yourself by running these
#   few lines of code::
#
#       from torch.utils.data import ConcatDataset
#       transform = transforms.Compose([transforms.ToTensor()])
#       trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
#                                               download=True, transform=transform)
#
#       # stack all train images together into a tensor of shape
#       # (50000, 3, 32, 32)
#       x = torch.stack([sample[0] for sample in ConcatDataset([trainset])])
#
#       # get the mean and standard deviation of each channel
#       mean = torch.mean(x, dim=(0,2,3)) # tensor([0.4914, 0.4822, 0.4465])
#       std = torch.std(x, dim=(0,2,3)) # tensor([0.2470, 0.2435, 0.2616])
#
#
# There are many more transforms available, including cropping, centering,
# rotation, and reflection.
#
# Next, we’ll create an instance of the CIFAR10 dataset. This is a set of
# 32x32 color image tiles representing 10 classes of objects: 6 of animals
# (bird, cat, deer, dog, frog, horse) and 4 of vehicles (airplane,
# automobile, ship, truck):
#

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)


##########################################################################
# .. note::
#    When you run the cell above, it may take a little time for the
#    dataset to download.
#
# This is an example of creating a dataset object in PyTorch. Downloadable
# datasets (like CIFAR-10 above) are subclasses of
# ``torch.utils.data.Dataset``. ``Dataset`` classes in PyTorch include the
# downloadable datasets in TorchVision, Torchtext, and TorchAudio, as well
# as utility dataset classes such as ``torchvision.datasets.ImageFolder``,
# which will read a folder of labeled images.
# You can also create your own
# subclasses of ``Dataset``.
#
# When we instantiate our dataset, we need to tell it a few things:
#
# - The filesystem path to where we want the data to go.
# - Whether or not we are using this set for training; most datasets
#   will be split into training and test subsets.
# - Whether we would like to download the dataset if we haven’t already.
# - The transformations we want to apply to the data.
#
# Once your dataset is ready, you can give it to the ``DataLoader``:
#

trainloader = torch.utils.data.DataLoader(trainset, batch_size=4,
                                          shuffle=True, num_workers=2)


##########################################################################
# A ``Dataset`` subclass wraps access to the data, and is specialized to
# the type of data it’s serving. The ``DataLoader`` knows *nothing* about
# the data, but organizes the input tensors served by the ``Dataset`` into
# batches with the parameters you specify.
#
# In the example above, we’ve asked a ``DataLoader`` to give us batches of
# 4 images from ``trainset``, randomizing their order (``shuffle=True``),
# and we told it to spin up two workers to load data from disk.
#
# It’s good practice to visualize the batches your ``DataLoader`` serves:
#

import matplotlib.pyplot as plt
import numpy as np

classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

def imshow(img):
    # unnormalize: undo the per-channel Normalize(mean, std) applied above
    mean = torch.tensor([0.4914, 0.4822, 0.4465]).view(3, 1, 1)
    std = torch.tensor([0.2470, 0.2435, 0.2616]).view(3, 1, 1)
    img = img * std + mean
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))


# get some random training images
dataiter = iter(trainloader)
images, labels = next(dataiter)

# show images
imshow(torchvision.utils.make_grid(images))
# print labels
print(' '.join('%5s' % classes[labels[j]] for j in range(4)))


########################################################################
# Running the above cell should show you a strip of four images, and the
# correct
# label for each.
#
# Training Your PyTorch Model
# ---------------------------
#
# Follow along with the video beginning at `17:10 <https://www.youtube.com/watch?v=IC0_FRiX-sw&t=1030s>`__.
#
# Let’s put all the pieces together, and train a model:
#

#%matplotlib inline

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

import torchvision
import torchvision.transforms as transforms

import matplotlib
import matplotlib.pyplot as plt
import numpy as np


#########################################################################
# First, we’ll need training and test datasets. If you haven’t already,
# run the cell below to make sure the dataset is downloaded. (It may take
# a minute.)
#

transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4,
                                          shuffle=True, num_workers=2)

testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=4,
                                         shuffle=False, num_workers=2)

classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')


######################################################################
# We’ll run our check on the output from ``DataLoader``:
#

import matplotlib.pyplot as plt
import numpy as np

# functions to show an image


def imshow(img):
    img = img / 2 + 0.5     # unnormalize
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))


# get some random training images
dataiter = iter(trainloader)
images, labels = next(dataiter)

# show images
imshow(torchvision.utils.make_grid(images))
# print labels
print(' '.join('%5s' % classes[labels[j]] for j in range(4)))


##########################################################################
# This is the model we’ll train. If it looks familiar, that’s because it’s
# a variant of LeNet - discussed earlier in this video - adapted for
# 3-color images.
#

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x


net = Net()


######################################################################
# The last ingredients we need are a loss function and an optimizer:
#

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)


##########################################################################
# The loss function, as discussed earlier in this video, is a measure of
# how far from our ideal output the model’s prediction was. Cross-entropy
# loss is a typical loss function for classification models like ours.
#
# The **optimizer** is what drives the learning. Here we have created an
# optimizer that implements *stochastic gradient descent,* one of the more
# straightforward optimization algorithms. Besides parameters of the
# algorithm, like the learning rate (``lr``) and momentum, we also pass in
# ``net.parameters()``, which is a collection of all the learning weights
# in the model - which is what the optimizer adjusts.
#
# Finally, all of this is assembled into the training loop.
# Go ahead and
# run this cell, as it will likely take a few minutes to execute:
#

for epoch in range(2):  # loop over the dataset multiple times

    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # get the inputs
        inputs, labels = data

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # print statistics
        running_loss += loss.item()
        if i % 2000 == 1999:    # print every 2000 mini-batches
            print('[%d, %5d] loss: %.3f' %
                  (epoch + 1, i + 1, running_loss / 2000))
            running_loss = 0.0

print('Finished Training')


########################################################################
# Here, we are doing only **2 training epochs** (line 1) - that is, two
# passes over the training dataset. Each pass has an inner loop that
# **iterates over the training data** (line 4), serving batches of
# transformed input images and their correct labels.
#
# **Zeroing the gradients** (line 9) is an important step. Gradients are
# accumulated over a batch; if we do not reset them for every batch, they
# will keep accumulating, which will provide incorrect gradient values,
# making learning impossible.
#
# In line 12, we **ask the model for its predictions** on this batch.
# In
# the following line (13), we compute the loss - the difference between
# ``outputs`` (the model prediction) and ``labels`` (the correct output).
#
# In line 14, we do the ``backward()`` pass, and calculate the gradients
# that will direct the learning.
#
# In line 15, the optimizer performs one learning step - it uses the
# gradients from the ``backward()`` call to nudge the learning weights in
# the direction it thinks will reduce the loss.
#
# The remainder of the loop does some light reporting on the epoch number,
# how many training instances have been completed, and what the collected
# loss is over the training loop.
#
# **When you run the cell above,** you should see something like this:
#
# .. code-block:: sh
#
#    [1,  2000] loss: 2.235
#    [1,  4000] loss: 1.940
#    [1,  6000] loss: 1.713
#    [1,  8000] loss: 1.573
#    [1, 10000] loss: 1.507
#    [1, 12000] loss: 1.442
#    [2,  2000] loss: 1.378
#    [2,  4000] loss: 1.364
#    [2,  6000] loss: 1.349
#    [2,  8000] loss: 1.319
#    [2, 10000] loss: 1.284
#    [2, 12000] loss: 1.267
#    Finished Training
#
# Note that the loss is monotonically decreasing, indicating that our
# model is continuing to improve its performance on the training dataset.
#
# As a final step, we should check that the model is actually doing
# *general* learning, and not simply “memorizing” the dataset.
# Memorizing the training data is
# called **overfitting,** and usually indicates that the dataset is too
# small (not enough examples for general learning), or that the model has
# more learning parameters than it needs to correctly model the dataset.
#
# This is the reason datasets are split into training and test subsets -
# to test the generality of the model, we ask it to make predictions on
# data it hasn’t trained on:
#

correct = 0
total = 0
with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print('Accuracy of the network on the 10000 test images: %d %%' % (
    100 * correct / total))


#########################################################################
# If you followed along, you should see that the model is roughly 50%
# accurate at this point. That’s not exactly state-of-the-art, but it’s
# far better than the 10% accuracy we’d expect from a random output. This
# demonstrates that some general learning did happen in the model.
#
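#########################################################################
# The accuracy bookkeeping in the evaluation loop above can be tried in
# isolation. The cell below is a minimal sketch on hand-made data: the
# ``logits`` and ``labels`` tensors are invented stand-ins for one batch of
# model outputs and ground-truth classes (they do not come from CIFAR-10
# or from the trained ``net``), so the resulting number only illustrates
# the arithmetic, not the model's quality.
#

```python
import torch

# Hypothetical batch of 4 samples and 3 classes. These values are invented
# for illustration and stand in for ``net(images)`` in the evaluation loop.
logits = torch.tensor([[2.0, 0.5, 0.1],
                       [0.2, 1.5, 0.3],
                       [0.1, 0.2, 3.0],
                       [1.0, 0.9, 0.8]])
labels = torch.tensor([0, 1, 2, 1])  # invented ground-truth class indices

# torch.max over dim 1 returns, per row, the largest value and its index;
# the index of the largest score is taken as the predicted class.
_, predicted = torch.max(logits, 1)

total = labels.size(0)                        # number of samples in the batch
correct = (predicted == labels).sum().item()  # number of matching predictions
accuracy = 100 * correct / total

print('Predicted classes:', predicted.tolist())
print('Accuracy: %.1f %%' % accuracy)
```

# In the real loop, ``total`` and ``correct`` are summed over every batch
# served by ``testloader`` before the percentage is computed.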