# -*- coding: utf-8 -*-
"""
What is `torch.nn` *really*?
============================

**Authors:** Jeremy Howard, `fast.ai <https://www.fast.ai>`_. Thanks to Rachel Thomas and Francisco Ingham.
"""

###############################################################################
# We recommend running this tutorial as a notebook, not a script. To download the notebook (``.ipynb``) file,
# click the link at the top of the page.
#
# PyTorch provides the elegantly designed modules and classes `torch.nn <https://pytorch.org/docs/stable/nn.html>`_ ,
# `torch.optim <https://pytorch.org/docs/stable/optim.html>`_ ,
# `Dataset <https://pytorch.org/docs/stable/data.html?highlight=dataset#torch.utils.data.Dataset>`_ ,
# and `DataLoader <https://pytorch.org/docs/stable/data.html?highlight=dataloader#torch.utils.data.DataLoader>`_
# to help you create and train neural networks.
# In order to fully utilize their power and customize
# them for your problem, you need to really understand exactly what they're
# doing. To develop this understanding, we will first train a basic neural net
# on the MNIST data set without using any features from these modules; we will
# initially only use the most basic PyTorch tensor functionality. Then, we will
# incrementally add one feature from ``torch.nn``, ``torch.optim``, ``Dataset``, or
# ``DataLoader`` at a time, showing exactly what each piece does, and how it
# works to make the code either more concise, or more flexible.
#
# **This tutorial assumes you already have PyTorch installed, and are familiar
# with the basics of tensor operations.** (If you're familiar with Numpy array
# operations, you'll find the PyTorch tensor operations used here nearly identical.)
#
# MNIST data setup
# ----------------
#
# We will use the classic `MNIST <http://deeplearning.net/data/mnist/>`_ dataset,
# which consists of black-and-white images of hand-drawn digits (between 0 and 9).
#
# We will use `pathlib <https://docs.python.org/3/library/pathlib.html>`_
# for dealing with paths (part of the Python 3 standard library), and will
# download the dataset using
# `requests <http://docs.python-requests.org/en/master/>`_. We will only
# import modules when we use them, so you can see exactly what's being
# used at each point.

from pathlib import Path
import requests

DATA_PATH = Path("data")
PATH = DATA_PATH / "mnist"

PATH.mkdir(parents=True, exist_ok=True)

URL = "https://github.com/pytorch/tutorials/raw/main/_static/"
FILENAME = "mnist.pkl.gz"

if not (PATH / FILENAME).exists():
    content = requests.get(URL + FILENAME).content
    (PATH / FILENAME).open("wb").write(content)

###############################################################################
# This dataset is in numpy array format, and has been stored using pickle,
# a python-specific format for serializing data.

import pickle
import gzip

with gzip.open((PATH / FILENAME).as_posix(), "rb") as f:
    ((x_train, y_train), (x_valid, y_valid), _) = pickle.load(f, encoding="latin-1")

###############################################################################
# Each image is 28 x 28, and is being stored as a flattened row of length
# 784 (=28x28). Let's take a look at one; we need to reshape it to 2d
# first.

from matplotlib import pyplot
import numpy as np

pyplot.imshow(x_train[0].reshape((28, 28)), cmap="gray")
# ``pyplot.show()`` only if not on Colab
try:
    import google.colab
except ImportError:
    pyplot.show()
print(x_train.shape)

###############################################################################
# PyTorch uses ``torch.tensor``, rather than numpy arrays, so we need to
# convert our data.

import torch

x_train, y_train, x_valid, y_valid = map(
    torch.tensor, (x_train, y_train, x_valid, y_valid)
)
n, c = x_train.shape
print(x_train, y_train)
print(x_train.shape)
print(y_train.min(), y_train.max())

###############################################################################
# Neural net from scratch (without ``torch.nn``)
# -----------------------------------------------
#
# Let's first create a model using nothing but PyTorch tensor operations. We're assuming
# you're already familiar with the basics of neural networks. (If you're not, you can
# learn them at `course.fast.ai <https://course.fast.ai>`_).
#
# PyTorch provides methods to create random or zero-filled tensors, which we will
# use to create our weights and bias for a simple linear model. These are just regular
# tensors, with one very special addition: we tell PyTorch that they require a
# gradient. This causes PyTorch to record all of the operations done on the tensor,
# so that it can calculate the gradient during back-propagation *automatically*!
#
# For the weights, we set ``requires_grad`` **after** the initialization, since we
# don't want that step included in the gradient. (Note that a trailing ``_`` in
# PyTorch signifies that the operation is performed in-place.)
#
# .. note:: We are initializing the weights here with
#    `Xavier initialisation <http://proceedings.mlr.press/v9/glorot10a/glorot10a.pdf>`_
#    (by multiplying with ``1/sqrt(n)``).

import math

weights = torch.randn(784, 10) / math.sqrt(784)
weights.requires_grad_()
bias = torch.zeros(10, requires_grad=True)

###############################################################################
# Thanks to PyTorch's ability to calculate gradients automatically, we can
# use any standard Python function (or callable object) as a model! So
# let's just write a plain matrix multiplication and broadcasted addition
# to create a simple linear model. We also need an activation function, so
# we'll write ``log_softmax`` and use it. Remember: although PyTorch
# provides lots of prewritten loss functions, activation functions, and
# so forth, you can easily write your own using plain Python. PyTorch will
# even create fast GPU or vectorized CPU code for your function
# automatically.

def log_softmax(x):
    return x - x.exp().sum(-1).log().unsqueeze(-1)

def model(xb):
    return log_softmax(xb @ weights + bias)

######################################################################################
# In the above, the ``@`` stands for the matrix multiplication operation. We will call
# our function on one batch of data (in this case, 64 images). This is
# one *forward pass*. Note that our predictions won't be any better than
# random at this stage, since we start with random weights.

bs = 64  # batch size

xb = x_train[0:bs]  # a mini-batch from x
preds = model(xb)  # predictions
preds[0], preds.shape
print(preds[0], preds.shape)

###############################################################################
# As you see, the ``preds`` tensor contains not only the tensor values, but also a
# gradient function. We'll use this later to do backprop.
#
# Let's implement negative log-likelihood to use as the loss function
# (again, we can just use standard Python):


def nll(input, target):
    return -input[range(target.shape[0]), target].mean()

loss_func = nll
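
###############################################################################
# The indexing trick inside ``nll`` is worth a second look: ``input[range(n), target]``
# picks out, for each row ``i``, the single entry in column ``target[i]``. A tiny
# illustration with made-up numbers (the tensors ``t`` and ``idx`` below are
# throwaway examples, not part of the model):

t = torch.tensor([[0.1, 0.2, 0.3],
                  [0.4, 0.5, 0.6]])
idx = torch.tensor([2, 0])
print(t[range(idx.shape[0]), idx])  # tensor([0.3000, 0.4000]) -- one entry per row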

###############################################################################
# Let's check our loss with our random model, so we can see if we improve
# after a backprop pass later.

yb = y_train[0:bs]
print(loss_func(preds, yb))


###############################################################################
# Let's also implement a function to calculate the accuracy of our model.
# For each prediction, if the index with the largest value matches the
# target value, then the prediction was correct.

def accuracy(out, yb):
    preds = torch.argmax(out, dim=1)
    return (preds == yb).float().mean()

###############################################################################
# Let's check the accuracy of our random model, so we can see if our
# accuracy improves as our loss improves.

print(accuracy(preds, yb))

###############################################################################
# We can now run a training loop. For each iteration, we will:
#
# - select a mini-batch of data (of size ``bs``)
# - use the model to make predictions
# - calculate the loss
# - ``loss.backward()`` updates the gradients of the model, in this case, ``weights``
#   and ``bias``.
#
# We now use these gradients to update the weights and bias. We do this
# within the ``torch.no_grad()`` context manager, because we do not want these
# actions to be recorded for our next calculation of the gradient. You can read
# more about how PyTorch's Autograd records operations
# `here <https://pytorch.org/docs/stable/notes/autograd.html>`_.
#
# We then set the
# gradients to zero, so that we are ready for the next loop.
# Otherwise, our gradients would record a running tally of all the operations
# that had happened (i.e. ``loss.backward()`` *adds* the gradients to whatever is
# already stored, rather than replacing them).
#
# .. tip:: You can use the standard Python debugger to step through PyTorch
#    code, allowing you to check the various variable values at each step.
#    Uncomment ``set_trace()`` below to try it out.
#

from IPython.core.debugger import set_trace

lr = 0.5  # learning rate
epochs = 2  # how many epochs to train for

for epoch in range(epochs):
    for i in range((n - 1) // bs + 1):
        # set_trace()
        start_i = i * bs
        end_i = start_i + bs
        xb = x_train[start_i:end_i]
        yb = y_train[start_i:end_i]
        pred = model(xb)
        loss = loss_func(pred, yb)

        loss.backward()
        with torch.no_grad():
            weights -= weights.grad * lr
            bias -= bias.grad * lr
            weights.grad.zero_()
            bias.grad.zero_()
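
###############################################################################
# The ``grad.zero_()`` calls above matter because ``backward()`` *accumulates*
# gradients rather than overwriting them. A tiny standalone check (``w`` here is
# just a throwaway tensor for illustration, not part of our model):

w = torch.ones(3, requires_grad=True)
(w * 2).sum().backward()
(w * 2).sum().backward()
print(w.grad)  # tensor([4., 4., 4.]) -- the second backward() added to the first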

###############################################################################
# That's it: we've created and trained a minimal neural network (in this case, a
# logistic regression, since we have no hidden layers) entirely from scratch!
#
# Let's check the loss and accuracy and compare those to what we got
# earlier. We expect the loss to have decreased and the accuracy to
# have increased, and they have.

print(loss_func(model(xb), yb), accuracy(model(xb), yb))

###############################################################################
# Using ``torch.nn.functional``
# ------------------------------
#
# We will now refactor our code, so that it does the same thing as before, only
# we'll start taking advantage of PyTorch's ``nn`` classes to make it more concise
# and flexible. At each step from here, we should be making our code one or more
# of: shorter, more understandable, and/or more flexible.
#
# The first and easiest step is to make our code shorter by replacing our
# hand-written activation and loss functions with those from ``torch.nn.functional``
# (which is generally imported into the namespace ``F`` by convention). This module
# contains all the functions in the ``torch.nn`` library (whereas other parts of the
# library contain classes). As well as a wide range of loss and activation
# functions, you'll also find here some convenient functions for creating neural
# nets, such as pooling functions. (There are also functions for doing convolutions,
# linear layers, etc, but as we'll see, these are usually better handled using
# other parts of the library.)
#
# If you're using negative log likelihood loss and log softmax activation,
# then Pytorch provides a single function ``F.cross_entropy`` that combines
# the two. So we can even remove the activation function from our model.

import torch.nn.functional as F

loss_func = F.cross_entropy

def model(xb):
    return xb @ weights + bias

###############################################################################
# Note that we no longer call ``log_softmax`` in the ``model`` function. Let's
# confirm that our loss and accuracy are the same as before:

print(loss_func(model(xb), yb), accuracy(model(xb), yb))
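
###############################################################################
# If you want to convince yourself that ``F.cross_entropy`` really is just
# ``log_softmax`` followed by our ``nll``, you can check it directly with the
# functions we wrote earlier (a quick sanity check, not a required step; ``logits``
# is just a temporary name for the model output):

logits = model(xb)
print(torch.allclose(F.cross_entropy(logits, yb), nll(log_softmax(logits), yb)))  # True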

###############################################################################
# Refactor using ``nn.Module``
# -----------------------------
# Next up, we'll use ``nn.Module`` and ``nn.Parameter``, for a clearer and more
# concise training loop. We subclass ``nn.Module`` (which itself is a class and
# able to keep track of state). In this case, we want to create a class that
# holds our weights, bias, and method for the forward step. ``nn.Module`` has a
# number of attributes and methods (such as ``.parameters()`` and ``.zero_grad()``)
# which we will be using.
#
# .. note:: ``nn.Module`` (uppercase ``M``) is a PyTorch-specific concept, and is a
#    class we'll be using a lot. ``nn.Module`` is not to be confused with the Python
#    concept of a (lowercase ``m``) `module <https://docs.python.org/3/tutorial/modules.html>`_,
#    which is a file of Python code that can be imported.

from torch import nn

class Mnist_Logistic(nn.Module):
    def __init__(self):
        super().__init__()
        self.weights = nn.Parameter(torch.randn(784, 10) / math.sqrt(784))
        self.bias = nn.Parameter(torch.zeros(10))

    def forward(self, xb):
        return xb @ self.weights + self.bias

###############################################################################
# Since we're now using an object instead of just using a function, we
# first have to instantiate our model:

model = Mnist_Logistic()

###############################################################################
# Now we can calculate the loss in the same way as before. Note that
# ``nn.Module`` objects are used as if they are functions (i.e. they are
# *callable*), but behind the scenes Pytorch will call our ``forward``
# method automatically.

print(loss_func(model(xb), yb))

###############################################################################
# Previously for our training loop we had to update the values for each parameter
# by name, and manually zero out the grads for each parameter separately, like this:
#
# .. code-block:: python
#
#    with torch.no_grad():
#        weights -= weights.grad * lr
#        bias -= bias.grad * lr
#        weights.grad.zero_()
#        bias.grad.zero_()
#
#
# Now we can take advantage of ``model.parameters()`` and ``model.zero_grad()`` (which
# are both defined by PyTorch for ``nn.Module``) to make those steps more concise
# and less prone to the error of forgetting some of our parameters, particularly
# if we had a more complicated model:
#
# .. code-block:: python
#
#    with torch.no_grad():
#        for p in model.parameters(): p -= p.grad * lr
#        model.zero_grad()
#
#
# We'll wrap our little training loop in a ``fit`` function so we can run it
# again later.

def fit():
    for epoch in range(epochs):
        for i in range((n - 1) // bs + 1):
            start_i = i * bs
            end_i = start_i + bs
            xb = x_train[start_i:end_i]
            yb = y_train[start_i:end_i]
            pred = model(xb)
            loss = loss_func(pred, yb)

            loss.backward()
            with torch.no_grad():
                for p in model.parameters():
                    p -= p.grad * lr
                model.zero_grad()

fit()

###############################################################################
# Let's double-check that our loss has gone down:

print(loss_func(model(xb), yb))

###############################################################################
# Refactor using ``nn.Linear``
# ----------------------------
#
# We continue to refactor our code. Instead of manually defining and
# initializing ``self.weights`` and ``self.bias``, and calculating ``xb @
# self.weights + self.bias``, we will instead use the Pytorch class
# `nn.Linear <https://pytorch.org/docs/stable/nn.html#linear-layers>`_ for a
# linear layer, which does all that for us. Pytorch has many types of
# predefined layers that can greatly simplify our code, and often make it
# faster too.

class Mnist_Logistic(nn.Module):
    def __init__(self):
        super().__init__()
        self.lin = nn.Linear(784, 10)

    def forward(self, xb):
        return self.lin(xb)

###############################################################################
# We instantiate our model and calculate the loss in the same way as before:

model = Mnist_Logistic()
print(loss_func(model(xb), yb))

###############################################################################
# We are still able to use our same ``fit`` method as before.

fit()

print(loss_func(model(xb), yb))

###############################################################################
# Refactor using ``torch.optim``
# ------------------------------
#
# Pytorch also has a package with various optimization algorithms, ``torch.optim``.
# We can use the ``step`` method from our optimizer to take an optimization step,
# instead of manually updating each parameter.
#
# This will let us replace our previous manually coded optimization step:
#
# .. code-block:: python
#
#    with torch.no_grad():
#        for p in model.parameters(): p -= p.grad * lr
#        model.zero_grad()
#
# and instead use just:
#
# .. code-block:: python
#
#    opt.step()
#    opt.zero_grad()
#
# (``opt.zero_grad()`` resets the gradient to 0 and we need to call it before
# computing the gradient for the next minibatch.)

from torch import optim

###############################################################################
# We'll define a little function to create our model and optimizer so we
# can reuse it in the future.

def get_model():
    model = Mnist_Logistic()
    return model, optim.SGD(model.parameters(), lr=lr)

model, opt = get_model()
print(loss_func(model(xb), yb))

for epoch in range(epochs):
    for i in range((n - 1) // bs + 1):
        start_i = i * bs
        end_i = start_i + bs
        xb = x_train[start_i:end_i]
        yb = y_train[start_i:end_i]
        pred = model(xb)
        loss = loss_func(pred, yb)

        loss.backward()
        opt.step()
        opt.zero_grad()

print(loss_func(model(xb), yb))

###############################################################################
# Refactor using Dataset
# ------------------------------
#
# PyTorch has an abstract Dataset class. A Dataset can be anything that has
# a ``__len__`` function (called by Python's standard ``len`` function) and
# a ``__getitem__`` function as a way of indexing into it.
# `This tutorial <https://pytorch.org/tutorials/beginner/data_loading_tutorial.html>`_
# walks through a nice example of creating a custom ``FacialLandmarkDataset`` class
# as a subclass of ``Dataset``.
#
# PyTorch's `TensorDataset <https://pytorch.org/docs/stable/_modules/torch/utils/data/dataset.html#TensorDataset>`_
# is a Dataset wrapping tensors. By defining a length and way of indexing,
# this also gives us a way to iterate, index, and slice along the first
# dimension of a tensor. This will make it easier to access both the
# independent and dependent variables in the same line as we train.
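
###############################################################################
# To make the interface concrete, here is a minimal hand-written Dataset that
# implements just ``__len__`` and ``__getitem__`` over our MNIST tensors (the
# class name ``SimpleMnistDataset`` is ours, purely for illustration; we'll use
# the built-in ``TensorDataset`` below instead):

from torch.utils.data import Dataset

class SimpleMnistDataset(Dataset):
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __len__(self):
        # called by len(ds)
        return len(self.x)

    def __getitem__(self, i):
        # called by ds[i]; returns one (input, target) pair (or a slice of pairs)
        return self.x[i], self.y[i]

print(len(SimpleMnistDataset(x_train, y_train)))  # 50000 training examples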

from torch.utils.data import TensorDataset

###############################################################################
# Both ``x_train`` and ``y_train`` can be combined in a single ``TensorDataset``,
# which will be easier to iterate over and slice.

train_ds = TensorDataset(x_train, y_train)

###############################################################################
# Previously, we had to iterate through minibatches of ``x`` and ``y`` values separately:
#
# .. code-block:: python
#
#    xb = x_train[start_i:end_i]
#    yb = y_train[start_i:end_i]
#
#
# Now, we can do these two steps together:
#
# .. code-block:: python
#
#    xb, yb = train_ds[i * bs: i * bs + bs]
#

model, opt = get_model()

for epoch in range(epochs):
    for i in range((n - 1) // bs + 1):
        xb, yb = train_ds[i * bs: i * bs + bs]
        pred = model(xb)
        loss = loss_func(pred, yb)

        loss.backward()
        opt.step()
        opt.zero_grad()

print(loss_func(model(xb), yb))

###############################################################################
# Refactor using ``DataLoader``
# ------------------------------
#
# PyTorch's ``DataLoader`` is responsible for managing batches. You can
# create a ``DataLoader`` from any ``Dataset``. ``DataLoader`` makes it easier
# to iterate over batches. Rather than having to use ``train_ds[i*bs : i*bs+bs]``,
# the ``DataLoader`` gives us each minibatch automatically.

from torch.utils.data import DataLoader

train_ds = TensorDataset(x_train, y_train)
train_dl = DataLoader(train_ds, batch_size=bs)

###############################################################################
# Previously, our loop iterated over batches ``(xb, yb)`` like this:
#
# .. code-block:: python
#
#    for i in range((n - 1) // bs + 1):
#        xb, yb = train_ds[i * bs: i * bs + bs]
#        pred = model(xb)
#
# Now, our loop is much cleaner, as ``(xb, yb)`` are loaded automatically from the data loader:
#
# .. code-block:: python
#
#    for xb, yb in train_dl:
#        pred = model(xb)

model, opt = get_model()

for epoch in range(epochs):
    for xb, yb in train_dl:
        pred = model(xb)
        loss = loss_func(pred, yb)

        loss.backward()
        opt.step()
        opt.zero_grad()

print(loss_func(model(xb), yb))

###############################################################################
# Thanks to PyTorch's ``nn.Module``, ``nn.Parameter``, ``Dataset``, and ``DataLoader``,
# our training loop is now dramatically smaller and easier to understand. Let's
# now try to add the basic features necessary to create effective models in practice.
#
# Add validation
# -----------------------
#
# So far, we were just trying to get a reasonable training loop set up for
# use on our training data. In reality, you **always** should also have
# a `validation set <https://www.fast.ai/2017/11/13/validation-sets/>`_, in order
# to identify if you are overfitting.
#
# Shuffling the training data is
# `important <https://www.quora.com/Does-the-order-of-training-data-matter-when-training-neural-networks>`_
# to prevent correlation between batches and overfitting. On the other hand, the
# validation loss will be identical whether we shuffle the validation set or not.
# Since shuffling takes extra time, it makes no sense to shuffle the validation data.
#
# We'll use a batch size for the validation set that is twice as large as
# that for the training set. This is because the validation set does not
# need backpropagation and thus takes less memory (it doesn't need to
# store the gradients). We take advantage of this to use a larger batch
# size and compute the loss more quickly.

train_ds = TensorDataset(x_train, y_train)
train_dl = DataLoader(train_ds, batch_size=bs, shuffle=True)

valid_ds = TensorDataset(x_valid, y_valid)
valid_dl = DataLoader(valid_ds, batch_size=bs * 2)

###############################################################################
# We will calculate and print the validation loss at the end of each epoch.
#
# (Note that we always call ``model.train()`` before training, and ``model.eval()``
# before inference, because these are used by layers such as ``nn.BatchNorm2d``
# and ``nn.Dropout`` to ensure appropriate behavior for these different phases.)

model, opt = get_model()

for epoch in range(epochs):
    model.train()
    for xb, yb in train_dl:
        pred = model(xb)
        loss = loss_func(pred, yb)

        loss.backward()
        opt.step()
        opt.zero_grad()

    model.eval()
    with torch.no_grad():
        valid_loss = sum(loss_func(model(xb), yb) for xb, yb in valid_dl)

    print(epoch, valid_loss / len(valid_dl))

###############################################################################
# Create fit() and get_data()
# ----------------------------------
#
# We'll now do a little refactoring of our own. Since we go through a similar
# process twice of calculating the loss for both the training set and the
# validation set, let's make that into its own function, ``loss_batch``, which
# computes the loss for one batch.
#
# We pass an optimizer in for the training set, and use it to perform
# backprop. For the validation set, we don't pass an optimizer, so the
# method doesn't perform backprop.


def loss_batch(model, loss_func, xb, yb, opt=None):
    loss = loss_func(model(xb), yb)

    if opt is not None:
        loss.backward()
        opt.step()
        opt.zero_grad()

    return loss.item(), len(xb)

###############################################################################
# ``fit`` runs the necessary operations to train our model and compute the
# training and validation losses for each epoch.

import numpy as np

def fit(epochs, model, loss_func, opt, train_dl, valid_dl):
    for epoch in range(epochs):
        model.train()
        for xb, yb in train_dl:
            loss_batch(model, loss_func, xb, yb, opt)

        model.eval()
        with torch.no_grad():
            losses, nums = zip(
                *[loss_batch(model, loss_func, xb, yb) for xb, yb in valid_dl]
            )
        val_loss = np.sum(np.multiply(losses, nums)) / np.sum(nums)

        print(epoch, val_loss)

###############################################################################
# ``get_data`` returns dataloaders for the training and validation sets.


def get_data(train_ds, valid_ds, bs):
    return (
        DataLoader(train_ds, batch_size=bs, shuffle=True),
        DataLoader(valid_ds, batch_size=bs * 2),
    )

###############################################################################
# Now, our whole process of obtaining the data loaders and fitting the
# model can be run in 3 lines of code:

train_dl, valid_dl = get_data(train_ds, valid_ds, bs)
model, opt = get_model()
fit(epochs, model, loss_func, opt, train_dl, valid_dl)

###############################################################################
# You can use these basic 3 lines of code to train a wide variety of models.
# Let's see if we can use them to train a convolutional neural network (CNN)!
#
# Switch to CNN
# -------------
#
# We are now going to build our neural network with three convolutional layers.
# Because none of the functions in the previous section assume anything about
# the model form, we'll be able to use them to train a CNN without any modification.
#
# We will use PyTorch's predefined
# `Conv2d <https://pytorch.org/docs/stable/nn.html#torch.nn.Conv2d>`_ class
# as our convolutional layer. We define a CNN with 3 convolutional layers.
# Each convolution is followed by a ReLU. At the end, we perform an
# average pooling. (Note that ``view`` is PyTorch's version of Numpy's
# ``reshape``.)

class Mnist_CNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1)
        self.conv2 = nn.Conv2d(16, 16, kernel_size=3, stride=2, padding=1)
        self.conv3 = nn.Conv2d(16, 10, kernel_size=3, stride=2, padding=1)

    def forward(self, xb):
        xb = xb.view(-1, 1, 28, 28)
        xb = F.relu(self.conv1(xb))
        xb = F.relu(self.conv2(xb))
        xb = F.relu(self.conv3(xb))
        xb = F.avg_pool2d(xb, 4)
        return xb.view(-1, xb.size(1))

lr = 0.1

###############################################################################
# `Momentum <https://cs231n.github.io/neural-networks-3/#sgd>`_ is a variation on
# stochastic gradient descent that takes previous updates into account as well
# and generally leads to faster training.

model = Mnist_CNN()
opt = optim.SGD(model.parameters(), lr=lr, momentum=0.9)

fit(epochs, model, loss_func, opt, train_dl, valid_dl)
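
###############################################################################
# Roughly, the momentum variant of SGD keeps a running "velocity" per parameter
# and updates against that instead of the raw gradient. A simplified sketch of
# the idea (ignoring extras such as weight decay, dampening, and Nesterov
# momentum; ``sgd_momentum_step``, ``param``, and ``velocity`` are illustrative
# names of our own, not the optimizer's internals):

def sgd_momentum_step(param, velocity, grad, lr=0.1, momentum=0.9):
    # the velocity is an exponentially-decaying sum of past gradients
    velocity = momentum * velocity + grad
    # the parameter moves against the velocity rather than the raw gradient
    return param - lr * velocity, velocity

p, v, g = torch.zeros(3), torch.zeros(3), torch.ones(3)
p, v = sgd_momentum_step(p, v, g)
p, v = sgd_momentum_step(p, v, g)
print(p, v)  # repeated gradients in the same direction build up a larger step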

###############################################################################
# Using ``nn.Sequential``
# ------------------------
#
# ``torch.nn`` has another handy class we can use to simplify our code:
# `Sequential <https://pytorch.org/docs/stable/nn.html#torch.nn.Sequential>`_ .
# A ``Sequential`` object runs each of the modules contained within it, in a
# sequential manner. This is a simpler way of writing our neural network.
#
# To take advantage of this, we need to be able to easily define a
# **custom layer** from a given function. For instance, PyTorch doesn't
# have a ``view`` layer, and we need to create one for our network. ``Lambda``
# will create a layer that we can then use when defining a network with
# ``Sequential``.

class Lambda(nn.Module):
    def __init__(self, func):
        super().__init__()
        self.func = func

    def forward(self, x):
        return self.func(x)


def preprocess(x):
    return x.view(-1, 1, 28, 28)

###############################################################################
# The model created with ``Sequential`` is simple:

model = nn.Sequential(
    Lambda(preprocess),
    nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1),
    nn.ReLU(),
    nn.Conv2d(16, 16, kernel_size=3, stride=2, padding=1),
    nn.ReLU(),
    nn.Conv2d(16, 10, kernel_size=3, stride=2, padding=1),
    nn.ReLU(),
    nn.AvgPool2d(4),
    Lambda(lambda x: x.view(x.size(0), -1)),
)

opt = optim.SGD(model.parameters(), lr=lr, momentum=0.9)

fit(epochs, model, loss_func, opt, train_dl, valid_dl)

###############################################################################
# Wrapping ``DataLoader``
# -----------------------------
#
# Our CNN is fairly concise, but it only works with MNIST, because:
#
#  - It assumes the input is a 28\*28 long vector
#  - It assumes that the final CNN grid size is 4\*4 (since that's the average pooling kernel size we used)
#
# Let's get rid of these two assumptions, so our model works with any 2d
# single channel image. First, we can remove the initial ``Lambda`` layer by
# moving the data preprocessing into a generator:

def preprocess(x, y):
    return x.view(-1, 1, 28, 28), y


class WrappedDataLoader:
    def __init__(self, dl, func):
        self.dl = dl
        self.func = func

    def __len__(self):
        return len(self.dl)

    def __iter__(self):
        for b in self.dl:
            yield (self.func(*b))

train_dl, valid_dl = get_data(train_ds, valid_ds, bs)
train_dl = WrappedDataLoader(train_dl, preprocess)
valid_dl = WrappedDataLoader(valid_dl, preprocess)

###############################################################################
# Next, we can replace ``nn.AvgPool2d`` with ``nn.AdaptiveAvgPool2d``, which
# allows us to define the size of the *output* tensor we want, rather than
# the *input* tensor we have. As a result, our model will work with any
# size input.

model = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1),
    nn.ReLU(),
    nn.Conv2d(16, 16, kernel_size=3, stride=2, padding=1),
    nn.ReLU(),
    nn.Conv2d(16, 10, kernel_size=3, stride=2, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    Lambda(lambda x: x.view(x.size(0), -1)),
)

opt = optim.SGD(model.parameters(), lr=lr, momentum=0.9)

###############################################################################
# Let's try it out:

fit(epochs, model, loss_func, opt, train_dl, valid_dl)

###############################################################################
# Using your GPU
# ---------------
#
# If you're lucky enough to have access to a CUDA-capable GPU (you can
# rent one for about $0.50/hour from most cloud providers) you can
# use it to speed up your code. First check that your GPU is working in
# Pytorch:

print(torch.cuda.is_available())

###############################################################################
# And then create a device object for it:

dev = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")

###############################################################################
# Let's update ``preprocess`` to move batches to the GPU:


def preprocess(x, y):
    return x.view(-1, 1, 28, 28).to(dev), y.to(dev)


train_dl, valid_dl = get_data(train_ds, valid_ds, bs)
train_dl = WrappedDataLoader(train_dl, preprocess)
valid_dl = WrappedDataLoader(valid_dl, preprocess)

###############################################################################
# Finally, we can move our model to the GPU.

model.to(dev)
opt = optim.SGD(model.parameters(), lr=lr, momentum=0.9)

###############################################################################
# You should find it runs faster now:

fit(epochs, model, loss_func, opt, train_dl, valid_dl)

###############################################################################
# Closing thoughts
# -----------------
#
# We now have a general data pipeline and training loop which you can use for
# training many types of models using Pytorch. To see how simple training a model
# can now be, take a look at the `mnist_sample notebook <https://github.com/fastai/fastai_dev/blob/master/dev_nb/mnist_sample.ipynb>`__.
#
# Of course, there are many things you'll want to add, such as data augmentation,
# hyperparameter tuning, monitoring training, transfer learning, and so forth.
# These features are available in the fastai library, which has been developed
# using the same design approach shown in this tutorial, providing a natural
# next step for practitioners looking to take their models further.
#
# We promised at the start of this tutorial we'd explain through example each of
# ``torch.nn``, ``torch.optim``, ``Dataset``, and ``DataLoader``. So let's summarize
# what we've seen:
#
# - ``torch.nn``:
#
#   + ``Module``: creates a callable which behaves like a function, but can also
#     contain state (such as neural net layer weights). It knows what ``Parameter`` (s) it
#     contains and can zero all their gradients, loop through them for weight updates, etc.
#   + ``Parameter``: a wrapper for a tensor that tells a ``Module`` that it has weights
#     that need updating during backprop. Only tensors with the ``requires_grad`` attribute set are updated.
#   + ``functional``: a module (usually imported into the ``F`` namespace by convention)
#     which contains activation functions, loss functions, etc, as well as non-stateful
#     versions of layers such as convolutional and linear layers.
# - ``torch.optim``: Contains optimizers such as ``SGD``, which update the weights
#   of ``Parameter`` (s) during the optimization step.
# - ``Dataset``: An abstract interface of objects with a ``__len__`` and a ``__getitem__``,
#   including classes provided with Pytorch such as ``TensorDataset``.
# - ``DataLoader``: Takes any ``Dataset`` and creates an iterator which returns batches of data.