Path: blob/main/beginner_source/introyt/introyt1_tutorial.py
"""
**Introduction** ||
`Tensors <tensors_deeper_tutorial.html>`_ ||
`Autograd <autogradyt_tutorial.html>`_ ||
`Building Models <modelsyt_tutorial.html>`_ ||
`TensorBoard Support <tensorboardyt_tutorial.html>`_ ||
`Training Models <trainingyt.html>`_ ||
`Model Understanding <captumyt.html>`_

Introduction to PyTorch
=======================

Follow along with the video below or on `youtube <https://www.youtube.com/watch?v=IC0_FRiX-sw>`__.

.. raw:: html

   <div style="margin-top:10px; margin-bottom:10px;">
     <iframe width="560" height="315" src="https://www.youtube.com/embed/IC0_FRiX-sw" frameborder="0" allow="accelerometer; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
   </div>

PyTorch Tensors
---------------

Follow along with the video beginning at `03:50 <https://www.youtube.com/watch?v=IC0_FRiX-sw&t=230s>`__.

First, we’ll import PyTorch.

"""

import torch

######################################################################
# Let’s see a few basic tensor manipulations. First, just a few of the
# ways to create tensors:
#

z = torch.zeros(5, 3)
print(z)
print(z.dtype)


#########################################################################
# Above, we create a 5x3 matrix filled with zeros, and query its datatype
# to find out that the zeros are 32-bit floating point numbers, which is
# the default in PyTorch.
#
# What if you wanted integers instead?
# You can always override the default:
#

i = torch.ones((5, 3), dtype=torch.int16)
print(i)


######################################################################
# You can see that when we do change the default, the tensor helpfully
# reports this when printed.
#
# It’s common to initialize learning weights randomly, often with a
# specific seed for the PRNG for reproducibility of results:
#

torch.manual_seed(1729)
r1 = torch.rand(2, 2)
print('A random tensor:')
print(r1)

r2 = torch.rand(2, 2)
print('\nA different random tensor:')
print(r2) # new values

torch.manual_seed(1729)
r3 = torch.rand(2, 2)
print('\nShould match r1:')
print(r3) # repeats values of r1 because of re-seed


#######################################################################
# PyTorch tensors perform arithmetic operations intuitively. Tensors of
# the same shape may be added, multiplied, etc. Operations with scalars
# are distributed over the tensor:
#

ones = torch.ones(2, 3)
print(ones)

twos = torch.ones(2, 3) * 2 # every element is multiplied by 2
print(twos)

threes = ones + twos       # addition allowed because shapes match
print(threes)              # tensors are added element-wise
print(threes.shape)        # this has the same dimensions as input tensors

r1 = torch.rand(2, 3)
r2 = torch.rand(3, 2)
# uncomment this line to get a runtime error
# r3 = r1 + r2


######################################################################
# Here’s a small sample of the mathematical operations available:
#

r = (torch.rand(2, 2) - 0.5) * 2 # values between -1 and 1
print('A random matrix, r:')
print(r)

# Common mathematical operations are supported:
print('\nAbsolute value of r:')
print(torch.abs(r))

# ...as are trigonometric functions:
print('\nInverse sine of r:')
print(torch.asin(r))

# ...and linear algebra operations like determinant and singular value decomposition
print('\nDeterminant of r:')
print(torch.det(r))
print('\nSingular value decomposition of r:')
print(torch.svd(r))

# ...and statistical and aggregate operations:
print('\nAverage and standard deviation of r:')
print(torch.std_mean(r))
print('\nMaximum value of r:')
print(torch.max(r))


##########################################################################
# There’s a good deal more to know about the power of PyTorch tensors,
# including how to set them up for parallel computations on GPU - we’ll be
# going into more depth in another video.
#
# PyTorch Models
# --------------
#
# Follow along with the video beginning at `10:00 <https://www.youtube.com/watch?v=IC0_FRiX-sw&t=600s>`__.
#
# Let’s talk about how we can express models in PyTorch.
#

import torch                     # for all things PyTorch
import torch.nn as nn            # for torch.nn.Module, the parent object for PyTorch models
import torch.nn.functional as F  # for the activation function


#########################################################################
# .. figure:: /_static/img/mnist.png
#    :alt: le-net-5 diagram
#
# *Figure: LeNet-5*
#
# Above is a diagram of LeNet-5, one of the earliest convolutional neural
# nets, and one of the drivers of the explosion in Deep Learning. It was
# built to read small images of handwritten numbers (the MNIST dataset),
# and correctly classify which digit was represented in the image.
#
# Here’s the abridged version of how it works:
#
# - Layer C1 is a convolutional layer, meaning that it scans the input
#   image for features it learned during training. It outputs a map of
#   where it saw each of its learned features in the image. This
#   “activation map” is downsampled in layer S2.
# - Layer C3 is another convolutional layer, this time scanning C1’s
#   activation map for *combinations* of features.
#   It also puts out an
#   activation map describing the spatial locations of these feature
#   combinations, which is downsampled in layer S4.
# - Finally, the fully-connected layers at the end, F5, F6, and OUTPUT,
#   are a *classifier* that takes the final activation map, and
#   classifies it into one of ten bins representing the 10 digits.
#
# How do we express this simple neural network in code?
#

class LeNet(nn.Module):

    def __init__(self):
        super(LeNet, self).__init__()
        # 1 input image channel (black & white), 6 output channels, 5x5 square convolution
        # kernel
        self.conv1 = nn.Conv2d(1, 6, 5)
        self.conv2 = nn.Conv2d(6, 16, 5)
        # an affine operation: y = Wx + b
        self.fc1 = nn.Linear(16 * 5 * 5, 120)  # 5*5 from image dimension
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        # Max pooling over a (2, 2) window
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        # If the window is square, you can specify just a single number
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

    def num_flat_features(self, x):
        size = x.size()[1:]  # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features


############################################################################
# Looking over this code, you should be able to spot some structural
# similarities with the diagram above.
#
# This demonstrates the structure of a typical PyTorch model:
#
# - It inherits from ``torch.nn.Module`` - modules may be nested - in fact,
#   even the ``Conv2d`` and ``Linear`` layer classes inherit from
#   ``torch.nn.Module``.
# - A model will have an ``__init__()`` function, where it instantiates
#   its layers, and loads any data artifacts it might
#   need (e.g., an NLP model might load a vocabulary).
# - A model will
#   have a ``forward()`` function. This is where the actual
#   computation happens: An input is passed through the network layers
#   and various functions to generate an output.
# - Other than that, you can build out your model class like any other
#   Python class, adding whatever properties and methods you need to
#   support your model’s computation.
#
# Let’s instantiate this object and run a sample input through it.
#

net = LeNet()
print(net)                         # what does the object tell us about itself?

input = torch.rand(1, 1, 32, 32)   # stand-in for a 32x32 black & white image
print('\nImage batch shape:')
print(input.shape)

output = net(input)                # we don't call forward() directly
print('\nRaw output:')
print(output)
print(output.shape)


##########################################################################
# There are a few important things happening above:
#
# First, we instantiate the ``LeNet`` class, and we print the ``net``
# object. A subclass of ``torch.nn.Module`` will report the layers it has
# created and their shapes and parameters. This can provide a handy
# overview of a model if you want to get the gist of its processing.
#
# Below that, we create a dummy input representing a 32x32 image with 1
# color channel. Normally, you would load an image tile and convert it to
# a tensor of this shape.
#
# You may have noticed an extra dimension to our tensor - the *batch
# dimension.* PyTorch models assume they are working on *batches* of data
# - for example, a batch of 16 of our image tiles would have the shape
# ``(16, 1, 32, 32)``. Since we’re only using one image, we create a batch
# of 1 with shape ``(1, 1, 32, 32)``.
#
# We ask the model for an inference by calling it like a function:
# ``net(input)``. The output of this call represents the model’s
# confidence that the input represents a particular digit.
# (Since this instance of the model hasn’t learned anything yet, we
# shouldn’t expect to see any signal in the output.) Looking at the shape
# of ``output``, we can see that it also has a batch dimension, the size
# of which should always match the input batch dimension. If we had passed
# in an input batch of 16 instances, ``output`` would have a shape of
# ``(16, 10)``.
#
# Datasets and Dataloaders
# ------------------------
#
# Follow along with the video beginning at `14:00 <https://www.youtube.com/watch?v=IC0_FRiX-sw&t=840s>`__.
#
# Below, we’re going to demonstrate using one of the ready-to-download,
# open-access datasets from TorchVision, how to transform the images for
# consumption by your model, and how to use the DataLoader to feed batches
# of data to your model.
#
# The first thing we need to do is transform our incoming images into
# PyTorch tensors.
#

#%matplotlib inline

import torch
import torchvision
import torchvision.transforms as transforms

transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2470, 0.2435, 0.2616))])


##########################################################################
# Here, we specify two transformations for our input:
#
# - ``transforms.ToTensor()`` converts images loaded by Pillow into
#   PyTorch tensors.
# - ``transforms.Normalize()`` adjusts the values of the tensor so
#   that their average is zero and their standard deviation is 1.0. Most
#   activation functions have their strongest gradients around x = 0, so
#   centering our data there can speed learning.
#   The values passed to the transform are the means (first tuple) and the
#   standard deviations (second tuple) of the RGB values of the images in
#   the dataset.
#   You can calculate these values yourself by running these
#   few lines of code:
#   ```
#   from torch.utils.data import ConcatDataset
#   transform = transforms.Compose([transforms.ToTensor()])
#   trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
#                                           download=True, transform=transform)
#
#   #stack all train images together into a tensor of shape
#   #(50000, 3, 32, 32)
#   x = torch.stack([sample[0] for sample in ConcatDataset([trainset])])
#
#   #get the mean and standard deviation of each channel
#   mean = torch.mean(x, dim=(0,2,3)) #tensor([0.4914, 0.4822, 0.4465])
#   std = torch.std(x, dim=(0,2,3)) #tensor([0.2470, 0.2435, 0.2616])
#
#   ```
#
# There are many more transforms available, including cropping, centering,
# rotation, and reflection.
#
# Next, we’ll create an instance of the CIFAR10 dataset. This is a set of
# 32x32 color image tiles representing 10 classes of objects: 6 of animals
# (bird, cat, deer, dog, frog, horse) and 4 of vehicles (airplane,
# automobile, ship, truck):
#

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)


##########################################################################
# .. note::
#      When you run the cell above, it may take a little time for the
#      dataset to download.
#
# This is an example of creating a dataset object in PyTorch. Downloadable
# datasets (like CIFAR-10 above) are subclasses of
# ``torch.utils.data.Dataset``. ``Dataset`` classes in PyTorch include the
# downloadable datasets in TorchVision, Torchtext, and TorchAudio, as well
# as utility dataset classes such as ``torchvision.datasets.ImageFolder``,
# which will read a folder of labeled images.
# You can also create your own subclasses of ``Dataset``.
#
# When we instantiate our dataset, we need to tell it a few things:
#
# - The filesystem path to where we want the data to go.
# - Whether or not we are using this set for training; most datasets
#   will be split into training and test subsets.
# - Whether we would like to download the dataset if we haven’t already.
# - The transformations we want to apply to the data.
#
# Once your dataset is ready, you can give it to the ``DataLoader``:
#

trainloader = torch.utils.data.DataLoader(trainset, batch_size=4,
                                          shuffle=True, num_workers=2)


##########################################################################
# A ``Dataset`` subclass wraps access to the data, and is specialized to
# the type of data it’s serving. The ``DataLoader`` knows *nothing* about
# the data, but organizes the input tensors served by the ``Dataset`` into
# batches with the parameters you specify.
#
# In the example above, we’ve asked a ``DataLoader`` to give us batches of
# 4 images from ``trainset``, randomizing their order (``shuffle=True``),
# and we told it to spin up two workers to load data from disk.
#
# It’s good practice to visualize the batches your ``DataLoader`` serves:
#

import matplotlib.pyplot as plt
import numpy as np

classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

def imshow(img):
    # unnormalize: invert Normalize(mean, std) from the transform above
    mean = torch.tensor((0.4914, 0.4822, 0.4465)).view(3, 1, 1)
    std = torch.tensor((0.2470, 0.2435, 0.2616)).view(3, 1, 1)
    img = (img * std + mean).clamp(0, 1)
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))


# get some random training images
dataiter = iter(trainloader)
images, labels = next(dataiter)

# show images
imshow(torchvision.utils.make_grid(images))
# print labels
print(' '.join('%5s' % classes[labels[j]] for j in range(4)))


########################################################################
# Running the above cell should show you a strip of four images, and the
# correct
# label for each.
#
# Training Your PyTorch Model
# ---------------------------
#
# Follow along with the video beginning at `17:10 <https://www.youtube.com/watch?v=IC0_FRiX-sw&t=1030s>`__.
#
# Let’s put all the pieces together, and train a model:
#

#%matplotlib inline

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

import torchvision
import torchvision.transforms as transforms

import matplotlib
import matplotlib.pyplot as plt
import numpy as np


#########################################################################
# First, we’ll need training and test datasets. If you haven’t already,
# run the cell below to make sure the dataset is downloaded. (It may take
# a minute.)
#

transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4,
                                          shuffle=True, num_workers=2)

testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=4,
                                         shuffle=False, num_workers=2)

classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')


######################################################################
# We’ll run our check on the output from ``DataLoader``:
#

import matplotlib.pyplot as plt
import numpy as np

# functions to show an image


def imshow(img):
    img = img / 2 + 0.5     # unnormalize
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))


# get some random training images
dataiter = iter(trainloader)
images, labels = next(dataiter)

# show images
imshow(torchvision.utils.make_grid(images))
# print labels
print(' '.join('%5s' % classes[labels[j]] for j in range(4)))


##########################################################################
# This is the model we’ll train. If it looks familiar, that’s because it’s
# a variant of LeNet - discussed earlier in this video - adapted for
# three-channel (RGB) color images.
#

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x


net = Net()


######################################################################
# The last ingredients we need are a loss function and an optimizer:
#

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)


##########################################################################
# The loss function, as discussed earlier in this video, is a measure of
# how far from our ideal output the model’s prediction was. Cross-entropy
# loss is a typical loss function for classification models like ours.
#
# The **optimizer** is what drives the learning. Here we have created an
# optimizer that implements *stochastic gradient descent,* one of the more
# straightforward optimization algorithms. Besides parameters of the
# algorithm, like the learning rate (``lr``) and momentum, we also pass in
# ``net.parameters()``, which is a collection of all the learning weights
# in the model - which is what the optimizer adjusts.
#
# Finally, all of this is assembled into the training loop.
# Go ahead and run this cell, as it will likely take a few minutes to
# execute:
#

for epoch in range(2):  # loop over the dataset multiple times

    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # get the inputs
        inputs, labels = data

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # print statistics
        running_loss += loss.item()
        if i % 2000 == 1999:    # print every 2000 mini-batches
            print('[%d, %5d] loss: %.3f' %
                  (epoch + 1, i + 1, running_loss / 2000))
            running_loss = 0.0

print('Finished Training')


########################################################################
# Here, we are doing only **2 training epochs** (line 1) - that is, two
# passes over the training dataset. Each pass has an inner loop that
# **iterates over the training data** (line 4), serving batches of
# transformed input images and their correct labels.
#
# **Zeroing the gradients** (line 9) is an important step. Gradients are
# accumulated over a batch; if we do not reset them for every batch, they
# will keep accumulating, which will provide incorrect gradient values,
# making learning impossible.
#
# In line 12, we **ask the model for its predictions** on this batch.
# In the following line (13), we compute the loss - the difference between
# ``outputs`` (the model prediction) and ``labels`` (the correct output).
#
# In line 14, we do the ``backward()`` pass, and calculate the gradients
# that will direct the learning.
#
# In line 15, the optimizer performs one learning step - it uses the
# gradients from the ``backward()`` call to nudge the learning weights in
# the direction it thinks will reduce the loss.
#
# The remainder of the loop does some light reporting on the epoch number,
# how many training instances have been completed, and what the collected
# loss is over the training loop.
#
# **When you run the cell above,** you should see something like this:
#
# .. code-block:: sh
#
#    [1,  2000] loss: 2.235
#    [1,  4000] loss: 1.940
#    [1,  6000] loss: 1.713
#    [1,  8000] loss: 1.573
#    [1, 10000] loss: 1.507
#    [1, 12000] loss: 1.442
#    [2,  2000] loss: 1.378
#    [2,  4000] loss: 1.364
#    [2,  6000] loss: 1.349
#    [2,  8000] loss: 1.319
#    [2, 10000] loss: 1.284
#    [2, 12000] loss: 1.267
#    Finished Training
#
# Note that the loss is monotonically decreasing, indicating that our
# model is continuing to improve its performance on the training dataset.
#
# As a final step, we should check that the model is actually doing
# *general* learning, and not simply “memorizing” the dataset.
# Such memorization is called **overfitting,** and usually indicates that
# the dataset is too small (not enough examples for general learning), or
# that the model has more learning parameters than it needs to correctly
# model the dataset.
#
# This is the reason datasets are split into training and test subsets -
# to test the generality of the model, we ask it to make predictions on
# data it hasn’t trained on:
#

correct = 0
total = 0
with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print('Accuracy of the network on the 10000 test images: %d %%' % (
    100 * correct / total))


#########################################################################
# If you followed along, you should see that the model is roughly 50%
# accurate at this point. That’s not exactly state-of-the-art, but it’s
# far better than the 10% accuracy we’d expect from a random output. This
# demonstrates that some general learning did happen in the model.
#
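
######################################################################
# The 10% chance baseline mentioned above can be checked without a
# model. The block below is a minimal sketch in plain Python, not part of
# the original tutorial: the ``accuracy`` helper, the balanced stand-in
# labels, and the seed are assumptions made for illustration. It scores
# uniformly random guesses with the same ``correct / total`` ratio used
# in the test loop.
#

```python
import random

NUM_CLASSES = 10      # CIFAR-10 has ten classes
NUM_SAMPLES = 10000   # same size as the CIFAR-10 test set


def accuracy(predicted, labels):
    """Fraction of predictions that match the labels (correct / total)."""
    correct = sum(p == t for p, t in zip(predicted, labels))
    return correct / len(labels)


rng = random.Random(1729)  # fixed seed so the sketch is repeatable

# hypothetical stand-in labels: a balanced set, 1000 examples per class
labels = [i % NUM_CLASSES for i in range(NUM_SAMPLES)]

# a "model" that has learned nothing: pick a class uniformly at random
guesses = [rng.randrange(NUM_CLASSES) for _ in labels]

baseline = accuracy(guesses, labels)
print('Accuracy of random guessing: %.1f %%' % (100 * baseline))
```

######################################################################
# The printed value should land close to 10%, which is the reference
# point for judging the trained network’s test accuracy.
#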