"""1`Learn the Basics <intro.html>`_ ||2`Quickstart <quickstart_tutorial.html>`_ ||3`Tensors <tensorqs_tutorial.html>`_ ||4`Datasets & DataLoaders <data_tutorial.html>`_ ||5`Transforms <transforms_tutorial.html>`_ ||6`Build Model <buildmodel_tutorial.html>`_ ||7**Autograd** ||8`Optimization <optimization_tutorial.html>`_ ||9`Save & Load Model <saveloadrun_tutorial.html>`_1011Automatic Differentiation with ``torch.autograd``12=================================================1314When training neural networks, the most frequently used algorithm is15**back propagation**. In this algorithm, parameters (model weights) are16adjusted according to the **gradient** of the loss function with respect17to the given parameter.1819To compute those gradients, PyTorch has a built-in differentiation engine20called ``torch.autograd``. It supports automatic computation of gradient for any21computational graph.2223Consider the simplest one-layer neural network, with input ``x``,24parameters ``w`` and ``b``, and some loss function. It can be defined in25PyTorch in the following manner:26"""2728import torch2930x = torch.ones(5) # input tensor31y = torch.zeros(3) # expected output32w = torch.randn(5, 3, requires_grad=True)33b = torch.randn(3, requires_grad=True)34z = torch.matmul(x, w)+b35loss = torch.nn.functional.binary_cross_entropy_with_logits(z, y)363738######################################################################39# Tensors, Functions and Computational graph40# ------------------------------------------41#42# This code defines the following **computational graph**:43#44# .. figure:: /_static/img/basics/comp-graph.png45# :alt:46#47# In this network, ``w`` and ``b`` are **parameters**, which we need to48# optimize. Thus, we need to be able to compute the gradients of loss49# function with respect to those variables. In order to do that, we set50# the ``requires_grad`` property of those tensors.5152#######################################################################53# .. note:: You can set the value of ``requires_grad`` when creating a54# tensor, or later by using ``x.requires_grad_(True)`` method.5556#######################################################################57# A function that we apply to tensors to construct computational graph is58# in fact an object of class ``Function``. This object knows how to59# compute the function in the *forward* direction, and also how to compute60# its derivative during the *backward propagation* step. A reference to61# the backward propagation function is stored in ``grad_fn`` property of a62# tensor. You can find more information of ``Function`` `in the63# documentation <https://pytorch.org/docs/stable/autograd.html#function>`__.64#6566print(f"Gradient function for z = {z.grad_fn}")67print(f"Gradient function for loss = {loss.grad_fn}")6869######################################################################70# Computing Gradients71# -------------------72#73# To optimize weights of parameters in the neural network, we need to74# compute the derivatives of our loss function with respect to parameters,75# namely, we need :math:`\frac{\partial loss}{\partial w}` and76# :math:`\frac{\partial loss}{\partial b}` under some fixed values of77# ``x`` and ``y``. To compute those derivatives, we call78# ``loss.backward()``, and then retrieve the values from ``w.grad`` and79# ``b.grad``:80#8182loss.backward()83print(w.grad)84print(b.grad)858687######################################################################88# .. 

######################################################################
# .. note::
#   - We can only obtain the ``grad`` properties for the leaf
#     nodes of the computational graph, which have the ``requires_grad`` property
#     set to ``True``. For all other nodes in our graph, gradients will not be
#     available.
#   - We can only perform gradient calculations using
#     ``backward`` once on a given graph, for performance reasons. If we need
#     to do several ``backward`` calls on the same graph, we need to pass
#     ``retain_graph=True`` to the ``backward`` call.
#


######################################################################
# Disabling Gradient Tracking
# ---------------------------
#
# By default, all tensors with ``requires_grad=True`` track their
# computational history and support gradient computation. However, there
# are some cases when we do not need to do that, for example, when we have
# trained the model and just want to apply it to some input data, i.e. we
# only want to do *forward* computations through the network. We can stop
# tracking computations by surrounding our computation code with a
# ``torch.no_grad()`` block:
#

z = torch.matmul(x, w) + b
print(z.requires_grad)

with torch.no_grad():
    z = torch.matmul(x, w) + b
print(z.requires_grad)


######################################################################
# Another way to achieve the same result is to use the ``detach()`` method
# on the tensor:
#

z = torch.matmul(x, w) + b
z_det = z.detach()
print(z_det.requires_grad)

######################################################################
# There are reasons you might want to disable gradient tracking:
#   - To mark some parameters in your neural network as **frozen parameters**
#     (see the short sketch below).
#   - To **speed up computations** when you are only doing the forward pass,
#     because computations on tensors that do not track gradients are more
#     efficient.
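
######################################################################
# As an illustration of freezing (a minimal sketch; the two-layer
# ``Sequential`` model below is hypothetical and not the network used above),
# calling ``requires_grad_(False)`` on a layer's parameters excludes them
# from gradient computation:

model = torch.nn.Sequential(torch.nn.Linear(5, 3), torch.nn.Linear(3, 1))
for param in model[0].parameters():
    param.requires_grad_(False)  # freeze the first layer only

# only the second layer's parameters still require gradients
print([p.requires_grad for p in model.parameters()])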
This is166# exactly what allows you to use control flow statements in your model;167# you can change the shape, size and operations at every iteration if168# needed.169170######################################################################171# Optional Reading: Tensor Gradients and Jacobian Products172# --------------------------------------------------------173#174# In many cases, we have a scalar loss function, and we need to compute175# the gradient with respect to some parameters. However, there are cases176# when the output function is an arbitrary tensor. In this case, PyTorch177# allows you to compute so-called **Jacobian product**, and not the actual178# gradient.179#180# For a vector function :math:`\vec{y}=f(\vec{x})`, where181# :math:`\vec{x}=\langle x_1,\dots,x_n\rangle` and182# :math:`\vec{y}=\langle y_1,\dots,y_m\rangle`, a gradient of183# :math:`\vec{y}` with respect to :math:`\vec{x}` is given by **Jacobian184# matrix**:185#186# .. math::187#188#189# J=\left(\begin{array}{ccc}190# \frac{\partial y_{1}}{\partial x_{1}} & \cdots & \frac{\partial y_{1}}{\partial x_{n}}\\191# \vdots & \ddots & \vdots\\192# \frac{\partial y_{m}}{\partial x_{1}} & \cdots & \frac{\partial y_{m}}{\partial x_{n}}193# \end{array}\right)194#195# Instead of computing the Jacobian matrix itself, PyTorch allows you to196# compute **Jacobian Product** :math:`v^T\cdot J` for a given input vector197# :math:`v=(v_1 \dots v_m)`. This is achieved by calling ``backward`` with198# :math:`v` as an argument. The size of :math:`v` should be the same as199# the size of the original tensor, with respect to which we want to200# compute the product:201#202203inp = torch.eye(4, 5, requires_grad=True)204out = (inp+1).pow(2).t()205out.backward(torch.ones_like(out), retain_graph=True)206print(f"First call\n{inp.grad}")207out.backward(torch.ones_like(out), retain_graph=True)208print(f"\nSecond call\n{inp.grad}")209inp.grad.zero_()210out.backward(torch.ones_like(out), retain_graph=True)211print(f"\nCall after zeroing gradients\n{inp.grad}")212213214######################################################################215# Notice that when we call ``backward`` for the second time with the same216# argument, the value of the gradient is different. This happens because217# when doing ``backward`` propagation, PyTorch **accumulates the218# gradients**, i.e. the value of computed gradients is added to the219# ``grad`` property of all leaf nodes of computational graph. If you want220# to compute the proper gradients, you need to zero out the ``grad``221# property before. In real-life training an *optimizer* helps us to do222# this.223224######################################################################225# .. note:: Previously we were calling ``backward()`` function without226# parameters. This is essentially equivalent to calling227# ``backward(torch.tensor(1.0))``, which is a useful way to compute the228# gradients in case of a scalar-valued function, such as loss during229# neural network training.230#231232######################################################################233# --------------234#235236#################################################################237# Further Reading238# ~~~~~~~~~~~~~~~~~239# - `Autograd Mechanics <https://pytorch.org/docs/stable/notes/autograd.html>`_240241242