"""
`Learn the Basics <intro.html>`_ ||
`Quickstart <quickstart_tutorial.html>`_ ||
`Tensors <tensorqs_tutorial.html>`_ ||
`Datasets & DataLoaders <data_tutorial.html>`_ ||
`Transforms <transforms_tutorial.html>`_ ||
`Build Model <buildmodel_tutorial.html>`_ ||
`Autograd <autogradqs_tutorial.html>`_ ||
**Optimization** ||
`Save & Load Model <saveloadrun_tutorial.html>`_

Optimizing Model Parameters
===========================

Now that we have a model and data, it's time to train, validate, and test our model by optimizing its parameters on
our data. Training a model is an iterative process; in each iteration the model makes a guess about the output, calculates
the error in its guess (*loss*), collects the derivatives of the error with respect to its parameters (as we saw in
the `previous section <autograd_tutorial.html>`_), and **optimizes** these parameters using gradient descent. For a more
detailed walkthrough of this process, check out this video on `backpropagation from 3Blue1Brown <https://www.youtube.com/watch?v=tIeHLnjs5U8>`__.

Prerequisite Code
-----------------
We load the code from the previous sections on `Datasets & DataLoaders <data_tutorial.html>`_
and `Build Model <buildmodel_tutorial.html>`_.
"""

import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor

training_data = datasets.FashionMNIST(
    root="data",
    train=True,
    download=True,
    transform=ToTensor()
)

test_data = datasets.FashionMNIST(
    root="data",
    train=False,
    download=True,
    transform=ToTensor()
)

train_dataloader = DataLoader(training_data, batch_size=64)
test_dataloader = DataLoader(test_data, batch_size=64)

class NeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10),
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits

model = NeuralNetwork()


##############################################
# Hyperparameters
# -----------------
#
# Hyperparameters are adjustable parameters that let you control the model optimization process.
# Different hyperparameter values can impact model training and convergence rates
# (`read more <https://pytorch.org/tutorials/beginner/hyperparameter_tuning_tutorial.html>`__ about hyperparameter tuning).
#
# We define the following hyperparameters for training:
#
# - **Number of Epochs** - the number of times to iterate over the dataset
# - **Batch Size** - the number of data samples propagated through the network before the parameters are updated
# - **Learning Rate** - how much to update the model's parameters at each batch/epoch. Smaller values yield slower learning, while larger values may result in unpredictable behavior during training.
#

learning_rate = 1e-3
batch_size = 64
epochs = 5


#####################################
# Optimization Loop
# -----------------
#
# Once we set our hyperparameters, we can then train and optimize our model with an optimization loop. Each
# iteration of the optimization loop is called an **epoch**.
#
# Each epoch consists of two main parts:
#
# - **The Train Loop** - iterate over the training dataset and try to converge to optimal parameters.
# - **The Validation/Test Loop** - iterate over the test dataset to check if model performance is improving.
#
# Let's briefly familiarize ourselves with some of the concepts used in the training loop. Jump ahead to
# see the :ref:`full-impl-label` of the optimization loop.
#
# Loss Function
# ~~~~~~~~~~~~~~~~~
#
# When presented with some training data, our untrained network is likely not to give the correct
# answer. The **loss function** measures the degree of dissimilarity between the obtained result and the target value,
# and it is the loss function that we want to minimize during training. To calculate the loss, we make a
# prediction using the inputs of our given data sample and compare it against the true data label value.
#
# Common loss functions include `nn.MSELoss <https://pytorch.org/docs/stable/generated/torch.nn.MSELoss.html#torch.nn.MSELoss>`_ (Mean Square Error) for regression tasks, and
# `nn.NLLLoss <https://pytorch.org/docs/stable/generated/torch.nn.NLLLoss.html#torch.nn.NLLLoss>`_ (Negative Log Likelihood) for classification.
# `nn.CrossEntropyLoss <https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html#torch.nn.CrossEntropyLoss>`_ combines ``nn.LogSoftmax`` and ``nn.NLLLoss``.
#
# We pass our model's output logits to ``nn.CrossEntropyLoss``, which will normalize the logits and compute the prediction error.

# Initialize the loss function
loss_fn = nn.CrossEntropyLoss()
#####################################
# Optimizer
# ~~~~~~~~~~~~~~~~~
#
# Optimization is the process of adjusting model parameters to reduce model error in each training step. **Optimization algorithms** define how this process is performed (in this example we use Stochastic Gradient Descent).
# All optimization logic is encapsulated in the ``optimizer`` object. Here, we use the SGD optimizer; additionally, there are many `different optimizers <https://pytorch.org/docs/stable/optim.html>`_
# available in PyTorch, such as Adam and RMSProp, which work better for different kinds of models and data.
#
# We initialize the optimizer by registering the model's parameters that need to be trained and passing in the learning rate hyperparameter.

optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

#####################################
# Inside the training loop, optimization happens in three steps (sketched in isolation right after this list):
#
# * Call ``optimizer.zero_grad()`` to reset the gradients of the model parameters. Gradients by default add up; to prevent double-counting, we explicitly zero them at each iteration.
# * Backpropagate the prediction loss with a call to ``loss.backward()``. PyTorch deposits the gradients of the loss w.r.t. each parameter.
# * Once we have our gradients, we call ``optimizer.step()`` to adjust the parameters by the gradients collected in the backward pass.
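#####################################
# The following minimal sketch (an illustration added here, not part of the original tutorial code)
# performs these three steps once on a single batch pulled from ``train_dataloader``,
# before we wrap the same logic in the full ``train_loop`` below:

X_batch, y_batch = next(iter(train_dataloader))  # one batch of 64 images and labels
optimizer.zero_grad()                            # 1. reset parameter gradients
batch_loss = loss_fn(model(X_batch), y_batch)    # forward pass and loss
batch_loss.backward()                            # 2. backpropagate gradients of the loss
optimizer.step()                                 # 3. update parameters using the collected gradients
print(f"Loss after one illustrative step: {batch_loss.item():>7f}")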
########################################
# .. _full-impl-label:
#
# Full Implementation
# -----------------------
# We define ``train_loop`` that loops over our optimization code, and ``test_loop`` that
# evaluates the model's performance against our test data.

def train_loop(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    # Set the model to training mode - important for batch normalization and dropout layers
    # Unnecessary in this situation but added for best practices
    model.train()
    for batch, (X, y) in enumerate(dataloader):
        # Compute prediction and loss
        pred = model(X)
        loss = loss_fn(pred, y)

        # Backpropagation
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

        if batch % 100 == 0:
            loss, current = loss.item(), batch * batch_size + len(X)
            print(f"loss: {loss:>7f} [{current:>5d}/{size:>5d}]")


def test_loop(dataloader, model, loss_fn):
    # Set the model to evaluation mode - important for batch normalization and dropout layers
    # Unnecessary in this situation but added for best practices
    model.eval()
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    test_loss, correct = 0, 0

    # Evaluating the model with torch.no_grad() ensures that no gradients are computed during test mode.
    # It also reduces unnecessary gradient computation and memory usage for tensors with requires_grad=True.
    with torch.no_grad():
        for X, y in dataloader:
            pred = model(X)
            test_loss += loss_fn(pred, y).item()
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()

    test_loss /= num_batches
    correct /= size
    print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")


########################################
# We initialize the loss function and optimizer, and pass them to ``train_loop`` and ``test_loop``.
# Feel free to increase the number of epochs to track the model's improving performance.

loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

epochs = 10
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train_loop(train_dataloader, model, loss_fn, optimizer)
    test_loop(test_dataloader, model, loss_fn)
print("Done!")


#################################################################
# Further Reading
# -----------------------
# - `Loss Functions <https://pytorch.org/docs/stable/nn.html#loss-functions>`_
# - `torch.optim <https://pytorch.org/docs/stable/optim.html>`_
# - `Warmstart Training a Model <https://pytorch.org/tutorials/recipes/recipes/warmstarting_model_using_parameters_from_a_different_model.html>`_
#