Book a Demo!
CoCalc Logo Icon
StoreFeaturesDocsShareSupportNewsAboutPoliciesSign UpSign In
pytorch
GitHub Repository: pytorch/tutorials
Path: blob/main/beginner_source/introyt/introyt1_tutorial.py
1367 views
1
"""
2
**Introduction** ||
3
`Tensors <tensors_deeper_tutorial.html>`_ ||
4
`Autograd <autogradyt_tutorial.html>`_ ||
5
`Building Models <modelsyt_tutorial.html>`_ ||
6
`TensorBoard Support <tensorboardyt_tutorial.html>`_ ||
7
`Training Models <trainingyt.html>`_ ||
8
`Model Understanding <captumyt.html>`_
9
10
Introduction to PyTorch
11
=======================
12
13
Follow along with the video below or on `youtube <https://www.youtube.com/watch?v=IC0_FRiX-sw>`__.
14
15
.. raw:: html
16
17
<div style="margin-top:10px; margin-bottom:10px;">
18
<iframe width="560" height="315" src="https://www.youtube.com/embed/IC0_FRiX-sw" frameborder="0" allow="accelerometer; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
19
</div>
20
21
PyTorch Tensors
22
---------------
23
24
Follow along with the video beginning at `03:50 <https://www.youtube.com/watch?v=IC0_FRiX-sw&t=230s>`__.
25
26
First, we’ll import pytorch.
27
28
"""
29
30
import torch
31
32
######################################################################
33
# Let’s see a few basic tensor manipulations. First, just a few of the
34
# ways to create tensors:
35
#
36
37
z = torch.zeros(5, 3)
38
print(z)
39
print(z.dtype)
40
41
42
#########################################################################
43
# Above, we create a 5x3 matrix filled with zeros, and query its datatype
44
# to find out that the zeros are 32-bit floating point numbers, which is
45
# the default PyTorch.
46
#
47
# What if you wanted integers instead? You can always override the
48
# default:
49
#
50
51
i = torch.ones((5, 3), dtype=torch.int16)
52
print(i)
53
54
55
######################################################################
56
# You can see that when we do change the default, the tensor helpfully
57
# reports this when printed.
58
#
59
# It’s common to initialize learning weights randomly, often with a
60
# specific seed for the PRNG for reproducibility of results:
61
#
62
63
torch.manual_seed(1729)
64
r1 = torch.rand(2, 2)
65
print('A random tensor:')
66
print(r1)
67
68
r2 = torch.rand(2, 2)
69
print('\nA different random tensor:')
70
print(r2) # new values
71
72
torch.manual_seed(1729)
73
r3 = torch.rand(2, 2)
74
print('\nShould match r1:')
75
print(r3) # repeats values of r1 because of re-seed
76
77
78
#######################################################################
79
# PyTorch tensors perform arithmetic operations intuitively. Tensors of
80
# similar shapes may be added, multiplied, etc. Operations with scalars
81
# are distributed over the tensor:
82
#
83
84
ones = torch.ones(2, 3)
85
print(ones)
86
87
twos = torch.ones(2, 3) * 2 # every element is multiplied by 2
88
print(twos)
89
90
threes = ones + twos # addition allowed because shapes are similar
91
print(threes) # tensors are added element-wise
92
print(threes.shape) # this has the same dimensions as input tensors
93
94
r1 = torch.rand(2, 3)
95
r2 = torch.rand(3, 2)
96
# uncomment this line to get a runtime error
97
# r3 = r1 + r2
98
99
100
######################################################################
101
# Here’s a small sample of the mathematical operations available:
102
#
103
104
r = (torch.rand(2, 2) - 0.5) * 2 # values between -1 and 1
105
print('A random matrix, r:')
106
print(r)
107
108
# Common mathematical operations are supported:
109
print('\nAbsolute value of r:')
110
print(torch.abs(r))
111
112
# ...as are trigonometric functions:
113
print('\nInverse sine of r:')
114
print(torch.asin(r))
115
116
# ...and linear algebra operations like determinant and singular value decomposition
117
print('\nDeterminant of r:')
118
print(torch.det(r))
119
print('\nSingular value decomposition of r:')
120
print(torch.svd(r))
121
122
# ...and statistical and aggregate operations:
123
print('\nAverage and standard deviation of r:')
124
print(torch.std_mean(r))
125
print('\nMaximum value of r:')
126
print(torch.max(r))
127
128
129
##########################################################################
130
# There’s a good deal more to know about the power of PyTorch tensors,
131
# including how to set them up for parallel computations on GPU - we’ll be
132
# going into more depth in another video.
133
#
134
# PyTorch Models
135
# --------------
136
#
137
# Follow along with the video beginning at `10:00 <https://www.youtube.com/watch?v=IC0_FRiX-sw&t=600s>`__.
138
#
139
# Let’s talk about how we can express models in PyTorch
140
#
141
142
import torch # for all things PyTorch
143
import torch.nn as nn # for torch.nn.Module, the parent object for PyTorch models
144
import torch.nn.functional as F # for the activation function
145
146
147
#########################################################################
148
# .. figure:: /_static/img/mnist.png
149
# :alt: le-net-5 diagram
150
#
151
# *Figure: LeNet-5*
152
#
153
# Above is a diagram of LeNet-5, one of the earliest convolutional neural
154
# nets, and one of the drivers of the explosion in Deep Learning. It was
155
# built to read small images of handwritten numbers (the MNIST dataset),
156
# and correctly classify which digit was represented in the image.
157
#
158
# Here’s the abridged version of how it works:
159
#
160
# - Layer C1 is a convolutional layer, meaning that it scans the input
161
# image for features it learned during training. It outputs a map of
162
# where it saw each of its learned features in the image. This
163
# “activation map” is downsampled in layer S2.
164
# - Layer C3 is another convolutional layer, this time scanning C1’s
165
# activation map for *combinations* of features. It also puts out an
166
# activation map describing the spatial locations of these feature
167
# combinations, which is downsampled in layer S4.
168
# - Finally, the fully-connected layers at the end, F5, F6, and OUTPUT,
169
# are a *classifier* that takes the final activation map, and
170
# classifies it into one of ten bins representing the 10 digits.
171
#
172
# How do we express this simple neural network in code?
173
#
174
175
class LeNet(nn.Module):
176
177
def __init__(self):
178
super(LeNet, self).__init__()
179
# 1 input image channel (black & white), 6 output channels, 5x5 square convolution
180
# kernel
181
self.conv1 = nn.Conv2d(1, 6, 5)
182
self.conv2 = nn.Conv2d(6, 16, 5)
183
# an affine operation: y = Wx + b
184
self.fc1 = nn.Linear(16 * 5 * 5, 120) # 5*5 from image dimension
185
self.fc2 = nn.Linear(120, 84)
186
self.fc3 = nn.Linear(84, 10)
187
188
def forward(self, x):
189
# Max pooling over a (2, 2) window
190
x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
191
# If the size is a square you can only specify a single number
192
x = F.max_pool2d(F.relu(self.conv2(x)), 2)
193
x = x.view(-1, self.num_flat_features(x))
194
x = F.relu(self.fc1(x))
195
x = F.relu(self.fc2(x))
196
x = self.fc3(x)
197
return x
198
199
def num_flat_features(self, x):
200
size = x.size()[1:] # all dimensions except the batch dimension
201
num_features = 1
202
for s in size:
203
num_features *= s
204
return num_features
205
206
207
############################################################################
208
# Looking over this code, you should be able to spot some structural
209
# similarities with the diagram above.
210
#
211
# This demonstrates the structure of a typical PyTorch model:
212
#
213
# - It inherits from ``torch.nn.Module`` - modules may be nested - in fact,
214
# even the ``Conv2d`` and ``Linear`` layer classes inherit from
215
# ``torch.nn.Module``.
216
# - A model will have an ``__init__()`` function, where it instantiates
217
# its layers, and loads any data artifacts it might
218
# need (e.g., an NLP model might load a vocabulary).
219
# - A model will have a ``forward()`` function. This is where the actual
220
# computation happens: An input is passed through the network layers
221
# and various functions to generate an output.
222
# - Other than that, you can build out your model class like any other
223
# Python class, adding whatever properties and methods you need to
224
# support your model’s computation.
225
#
226
# Let’s instantiate this object and run a sample input through it.
227
#
228
229
net = LeNet()
230
print(net) # what does the object tell us about itself?
231
232
input = torch.rand(1, 1, 32, 32) # stand-in for a 32x32 black & white image
233
print('\nImage batch shape:')
234
print(input.shape)
235
236
output = net(input) # we don't call forward() directly
237
print('\nRaw output:')
238
print(output)
239
print(output.shape)
240
241
242
##########################################################################
243
# There are a few important things happening above:
244
#
245
# First, we instantiate the ``LeNet`` class, and we print the ``net``
246
# object. A subclass of ``torch.nn.Module`` will report the layers it has
247
# created and their shapes and parameters. This can provide a handy
248
# overview of a model if you want to get the gist of its processing.
249
#
250
# Below that, we create a dummy input representing a 32x32 image with 1
251
# color channel. Normally, you would load an image tile and convert it to
252
# a tensor of this shape.
253
#
254
# You may have noticed an extra dimension to our tensor - the *batch
255
# dimension.* PyTorch models assume they are working on *batches* of data
256
# - for example, a batch of 16 of our image tiles would have the shape
257
# ``(16, 1, 32, 32)``. Since we’re only using one image, we create a batch
258
# of 1 with shape ``(1, 1, 32, 32)``.
259
#
260
# We ask the model for an inference by calling it like a function:
261
# ``net(input)``. The output of this call represents the model’s
262
# confidence that the input represents a particular digit. (Since this
263
# instance of the model hasn’t learned anything yet, we shouldn’t expect
264
# to see any signal in the output.) Looking at the shape of ``output``, we
265
# can see that it also has a batch dimension, the size of which should
266
# always match the input batch dimension. If we had passed in an input
267
# batch of 16 instances, ``output`` would have a shape of ``(16, 10)``.
268
#
269
# Datasets and Dataloaders
270
# ------------------------
271
#
272
# Follow along with the video beginning at `14:00 <https://www.youtube.com/watch?v=IC0_FRiX-sw&t=840s>`__.
273
#
274
# Below, we’re going to demonstrate using one of the ready-to-download,
275
# open-access datasets from TorchVision, how to transform the images for
276
# consumption by your model, and how to use the DataLoader to feed batches
277
# of data to your model.
278
#
279
# The first thing we need to do is transform our incoming images into a
280
# PyTorch tensor.
281
#
282
283
#%matplotlib inline
284
285
import torch
286
import torchvision
287
import torchvision.transforms as transforms
288
289
transform = transforms.Compose(
290
[transforms.ToTensor(),
291
transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2470, 0.2435, 0.2616))])
292
293
294
##########################################################################
295
# Here, we specify two transformations for our input:
296
#
297
# - ``transforms.ToTensor()`` converts images loaded by Pillow into
298
# PyTorch tensors.
299
# - ``transforms.Normalize()`` adjusts the values of the tensor so
300
# that their average is zero and their standard deviation is 1.0. Most
301
# activation functions have their strongest gradients around x = 0, so
302
# centering our data there can speed learning.
303
# The values passed to the transform are the means (first tuple) and the
304
# standard deviations (second tuple) of the rgb values of the images in
305
# the dataset. You can calculate these values yourself by running these
306
# few lines of code::
307
#
308
# from torch.utils.data import ConcatDataset
309
# transform = transforms.Compose([transforms.ToTensor()])
310
# trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
311
# download=True, transform=transform)
312
#
313
# # stack all train images together into a tensor of shape
314
# # (50000, 3, 32, 32)
315
# x = torch.stack([sample[0] for sample in ConcatDataset([trainset])])
316
#
317
# # get the mean of each channel
318
# mean = torch.mean(x, dim=(0,2,3)) # tensor([0.4914, 0.4822, 0.4465])
319
# std = torch.std(x, dim=(0,2,3)) # tensor([0.2470, 0.2435, 0.2616])
320
#
321
#
322
# There are many more transforms available, including cropping, centering,
323
# rotation, and reflection.
324
#
325
# Next, we’ll create an instance of the CIFAR10 dataset. This is a set of
326
# 32x32 color image tiles representing 10 classes of objects: 6 of animals
327
# (bird, cat, deer, dog, frog, horse) and 4 of vehicles (airplane,
328
# automobile, ship, truck):
329
#
330
331
trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
332
download=True, transform=transform)
333
334
335
##########################################################################
336
# .. note::
337
# When you run the cell above, it may take a little time for the
338
# dataset to download.
339
#
340
# This is an example of creating a dataset object in PyTorch. Downloadable
341
# datasets (like CIFAR-10 above) are subclasses of
342
# ``torch.utils.data.Dataset``. ``Dataset`` classes in PyTorch include the
343
# downloadable datasets in TorchVision, Torchtext, and TorchAudio, as well
344
# as utility dataset classes such as ``torchvision.datasets.ImageFolder``,
345
# which will read a folder of labeled images. You can also create your own
346
# subclasses of ``Dataset``.
347
#
348
# When we instantiate our dataset, we need to tell it a few things:
349
#
350
# - The filesystem path to where we want the data to go.
351
# - Whether or not we are using this set for training; most datasets
352
# will be split into training and test subsets.
353
# - Whether we would like to download the dataset if we haven’t already.
354
# - The transformations we want to apply to the data.
355
#
356
# Once your dataset is ready, you can give it to the ``DataLoader``:
357
#
358
359
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4,
360
shuffle=True, num_workers=2)
361
362
363
##########################################################################
364
# A ``Dataset`` subclass wraps access to the data, and is specialized to
365
# the type of data it’s serving. The ``DataLoader`` knows *nothing* about
366
# the data, but organizes the input tensors served by the ``Dataset`` into
367
# batches with the parameters you specify.
368
#
369
# In the example above, we’ve asked a ``DataLoader`` to give us batches of
370
# 4 images from ``trainset``, randomizing their order (``shuffle=True``),
371
# and we told it to spin up two workers to load data from disk.
372
#
373
# It’s good practice to visualize the batches your ``DataLoader`` serves:
374
#
375
376
import matplotlib.pyplot as plt
377
import numpy as np
378
379
classes = ('plane', 'car', 'bird', 'cat',
380
'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
381
382
def imshow(img):
383
img = img / 2 + 0.5 # unnormalize
384
npimg = img.numpy()
385
plt.imshow(np.transpose(npimg, (1, 2, 0)))
386
387
388
# get some random training images
389
dataiter = iter(trainloader)
390
images, labels = next(dataiter)
391
392
# show images
393
imshow(torchvision.utils.make_grid(images))
394
# print labels
395
print(' '.join('%5s' % classes[labels[j]] for j in range(4)))
396
397
398
########################################################################
399
# Running the above cell should show you a strip of four images, and the
400
# correct label for each.
401
#
402
# Training Your PyTorch Model
403
# ---------------------------
404
#
405
# Follow along with the video beginning at `17:10 <https://www.youtube.com/watch?v=IC0_FRiX-sw&t=1030s>`__.
406
#
407
# Let’s put all the pieces together, and train a model:
408
#
409
410
#%matplotlib inline
411
412
import torch
413
import torch.nn as nn
414
import torch.nn.functional as F
415
import torch.optim as optim
416
417
import torchvision
418
import torchvision.transforms as transforms
419
420
import matplotlib
421
import matplotlib.pyplot as plt
422
import numpy as np
423
424
425
#########################################################################
426
# First, we’ll need training and test datasets. If you haven’t already,
427
# run the cell below to make sure the dataset is downloaded. (It may take
428
# a minute.)
429
#
430
431
transform = transforms.Compose(
432
[transforms.ToTensor(),
433
transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])
434
435
trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
436
download=True, transform=transform)
437
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4,
438
shuffle=True, num_workers=2)
439
440
testset = torchvision.datasets.CIFAR10(root='./data', train=False,
441
download=True, transform=transform)
442
testloader = torch.utils.data.DataLoader(testset, batch_size=4,
443
shuffle=False, num_workers=2)
444
445
classes = ('plane', 'car', 'bird', 'cat',
446
'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
447
448
449
######################################################################
450
# We’ll run our check on the output from ``DataLoader``:
451
#
452
453
import matplotlib.pyplot as plt
454
import numpy as np
455
456
# functions to show an image
457
458
459
def imshow(img):
460
img = img / 2 + 0.5 # unnormalize
461
npimg = img.numpy()
462
plt.imshow(np.transpose(npimg, (1, 2, 0)))
463
464
465
# get some random training images
466
dataiter = iter(trainloader)
467
images, labels = next(dataiter)
468
469
# show images
470
imshow(torchvision.utils.make_grid(images))
471
# print labels
472
print(' '.join('%5s' % classes[labels[j]] for j in range(4)))
473
474
475
##########################################################################
476
# This is the model we’ll train. If it looks familiar, that’s because it’s
477
# a variant of LeNet - discussed earlier in this video - adapted for
478
# 3-color images.
479
#
480
481
class Net(nn.Module):
482
def __init__(self):
483
super(Net, self).__init__()
484
self.conv1 = nn.Conv2d(3, 6, 5)
485
self.pool = nn.MaxPool2d(2, 2)
486
self.conv2 = nn.Conv2d(6, 16, 5)
487
self.fc1 = nn.Linear(16 * 5 * 5, 120)
488
self.fc2 = nn.Linear(120, 84)
489
self.fc3 = nn.Linear(84, 10)
490
491
def forward(self, x):
492
x = self.pool(F.relu(self.conv1(x)))
493
x = self.pool(F.relu(self.conv2(x)))
494
x = x.view(-1, 16 * 5 * 5)
495
x = F.relu(self.fc1(x))
496
x = F.relu(self.fc2(x))
497
x = self.fc3(x)
498
return x
499
500
501
net = Net()
502
503
504
######################################################################
505
# The last ingredients we need are a loss function and an optimizer:
506
#
507
508
criterion = nn.CrossEntropyLoss()
509
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
510
511
512
##########################################################################
513
# The loss function, as discussed earlier in this video, is a measure of
514
# how far from our ideal output the model’s prediction was. Cross-entropy
515
# loss is a typical loss function for classification models like ours.
516
#
517
# The **optimizer** is what drives the learning. Here we have created an
518
# optimizer that implements *stochastic gradient descent,* one of the more
519
# straightforward optimization algorithms. Besides parameters of the
520
# algorithm, like the learning rate (``lr``) and momentum, we also pass in
521
# ``net.parameters()``, which is a collection of all the learning weights
522
# in the model - which is what the optimizer adjusts.
523
#
524
# Finally, all of this is assembled into the training loop. Go ahead and
525
# run this cell, as it will likely take a few minutes to execute:
526
#
527
528
for epoch in range(2): # loop over the dataset multiple times
529
530
running_loss = 0.0
531
for i, data in enumerate(trainloader, 0):
532
# get the inputs
533
inputs, labels = data
534
535
# zero the parameter gradients
536
optimizer.zero_grad()
537
538
# forward + backward + optimize
539
outputs = net(inputs)
540
loss = criterion(outputs, labels)
541
loss.backward()
542
optimizer.step()
543
544
# print statistics
545
running_loss += loss.item()
546
if i % 2000 == 1999: # print every 2000 mini-batches
547
print('[%d, %5d] loss: %.3f' %
548
(epoch + 1, i + 1, running_loss / 2000))
549
running_loss = 0.0
550
551
print('Finished Training')
552
553
554
########################################################################
555
# Here, we are doing only **2 training epochs** (line 1) - that is, two
556
# passes over the training dataset. Each pass has an inner loop that
557
# **iterates over the training data** (line 4), serving batches of
558
# transformed input images and their correct labels.
559
#
560
# **Zeroing the gradients** (line 9) is an important step. Gradients are
561
# accumulated over a batch; if we do not reset them for every batch, they
562
# will keep accumulating, which will provide incorrect gradient values,
563
# making learning impossible.
564
#
565
# In line 12, we **ask the model for its predictions** on this batch. In
566
# the following line (13), we compute the loss - the difference between
567
# ``outputs`` (the model prediction) and ``labels`` (the correct output).
568
#
569
# In line 14, we do the ``backward()`` pass, and calculate the gradients
570
# that will direct the learning.
571
#
572
# In line 15, the optimizer performs one learning step - it uses the
573
# gradients from the ``backward()`` call to nudge the learning weights in
574
# the direction it thinks will reduce the loss.
575
#
576
# The remainder of the loop does some light reporting on the epoch number,
577
# how many training instances have been completed, and what the collected
578
# loss is over the training loop.
579
#
580
# **When you run the cell above,** you should see something like this:
581
#
582
# .. code-block:: sh
583
#
584
# [1, 2000] loss: 2.235
585
# [1, 4000] loss: 1.940
586
# [1, 6000] loss: 1.713
587
# [1, 8000] loss: 1.573
588
# [1, 10000] loss: 1.507
589
# [1, 12000] loss: 1.442
590
# [2, 2000] loss: 1.378
591
# [2, 4000] loss: 1.364
592
# [2, 6000] loss: 1.349
593
# [2, 8000] loss: 1.319
594
# [2, 10000] loss: 1.284
595
# [2, 12000] loss: 1.267
596
# Finished Training
597
#
598
# Note that the loss is monotonically descending, indicating that our
599
# model is continuing to improve its performance on the training dataset.
600
#
601
# As a final step, we should check that the model is actually doing
602
# *general* learning, and not simply “memorizing” the dataset. This is
603
# called **overfitting,** and usually indicates that the dataset is too
604
# small (not enough examples for general learning), or that the model has
605
# more learning parameters than it needs to correctly model the dataset.
606
#
607
# This is the reason datasets are split into training and test subsets -
608
# to test the generality of the model, we ask it to make predictions on
609
# data it hasn’t trained on:
610
#
611
612
correct = 0
613
total = 0
614
with torch.no_grad():
615
for data in testloader:
616
images, labels = data
617
outputs = net(images)
618
_, predicted = torch.max(outputs.data, 1)
619
total += labels.size(0)
620
correct += (predicted == labels).sum().item()
621
622
print('Accuracy of the network on the 10000 test images: %d %%' % (
623
100 * correct / total))
624
625
626
#########################################################################
627
# If you followed along, you should see that the model is roughly 50%
628
# accurate at this point. That’s not exactly state-of-the-art, but it’s
629
# far better than the 10% accuracy we’d expect from a random output. This
630
# demonstrates that some general learning did happen in the model.
631
#
632
633