# -*- coding: utf-8 -*-
r"""
Introduction to PyTorch
***********************

Introduction to Torch's tensor library
======================================

All of deep learning is computations on tensors, which are
generalizations of a matrix that can be indexed in more than 2
dimensions. We will see exactly what this means in-depth later. First,
let's look at what we can do with tensors.
"""
# Author: Robert Guthrie

import torch

torch.manual_seed(1)


######################################################################
# Creating Tensors
# ~~~~~~~~~~~~~~~~
#
# Tensors can be created from Python lists with the torch.tensor()
# function.
#

# torch.tensor(data) creates a torch.Tensor object with the given data.
V_data = [1., 2., 3.]
V = torch.tensor(V_data)
print(V)

# Create a matrix
M_data = [[1., 2., 3.], [4., 5., 6]]
M = torch.tensor(M_data)
print(M)

# Create a 3D tensor of size 2x2x2.
T_data = [[[1., 2.], [3., 4.]],
          [[5., 6.], [7., 8.]]]
T = torch.tensor(T_data)
print(T)


######################################################################
# What is a 3D tensor anyway? Think about it like this. If you have a
# vector, indexing into the vector gives you a scalar. If you have a
# matrix, indexing into the matrix gives you a vector. If you have a 3D
# tensor, then indexing into the tensor gives you a matrix!
#
# A note on terminology:
# when I say "tensor" in this tutorial, it refers
# to any torch.Tensor object. Matrices and vectors are special cases of
# torch.Tensors, where their dimension is 2 and 1 respectively. When I am
# talking about 3D tensors, I will explicitly use the term "3D tensor".
#

# Index into V and get a scalar (0 dimensional tensor)
print(V[0])
# Get a Python number from it
print(V[0].item())

# Index into M and get a vector
print(M[0])

# Index into T and get a matrix
print(T[0])


######################################################################
# You can also create tensors of other data types. To create a tensor of
# integer types, try torch.tensor([[1, 2], [3, 4]]) (where all elements
# in the list are integers). You can also specify a data type explicitly
# with the ``dtype`` argument (e.g. ``dtype=torch.long``).
# Check the documentation for more data types, but
# Float and Long will be the most common.
#
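
# A minimal illustration of the ``dtype`` argument (the variable names here
# are just for demonstration):

# The same nested list, stored as 64-bit integers (Long) ...
x_long = torch.tensor([[1, 2], [3, 4]], dtype=torch.long)
print(x_long)

# ... and stored as 32-bit floats (Float)
x_float = torch.tensor([[1, 2], [3, 4]], dtype=torch.float)
print(x_float)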


######################################################################
# You can create a tensor with random data and the supplied dimensionality
# with torch.randn()
#

x = torch.randn((3, 4, 5))
print(x)


######################################################################
# Operations with Tensors
# ~~~~~~~~~~~~~~~~~~~~~~~
#
# You can operate on tensors in the ways you would expect.

x = torch.tensor([1., 2., 3.])
y = torch.tensor([4., 5., 6.])
z = x + y
print(z)


######################################################################
# See `the documentation <https://pytorch.org/docs/torch.html>`__ for a
# complete list of the massive number of operations available to you. They
# expand beyond just mathematical operations.
#
# One helpful operation that we will make use of later is concatenation.
#

# By default, it concatenates along the first axis (concatenates rows)
x_1 = torch.randn(2, 5)
y_1 = torch.randn(3, 5)
z_1 = torch.cat([x_1, y_1])
print(z_1)

# Concatenate columns:
x_2 = torch.randn(2, 3)
y_2 = torch.randn(2, 5)
# second arg specifies which axis to concat along
z_2 = torch.cat([x_2, y_2], 1)
print(z_2)

# If your tensors are not compatible, torch will complain. Uncomment to see the error
# torch.cat([x_1, x_2])


######################################################################
# Reshaping Tensors
# ~~~~~~~~~~~~~~~~~
#
# Use the .view() method to reshape a tensor. This method receives heavy
# use, because many neural network components expect their inputs to have
# a certain shape. Often you will need to reshape before passing your data
# to the component.
#

x = torch.randn(2, 3, 4)
print(x)
print(x.view(2, 12))  # Reshape to 2 rows, 12 columns
# Same as above. If one of the dimensions is -1, its size can be inferred
print(x.view(2, -1))


######################################################################
# Computation Graphs and Automatic Differentiation
# ================================================
#
# The concept of a computation graph is essential to efficient deep
# learning programming, because it allows you to not have to write the
# backpropagation gradients yourself. A computation graph is simply a
# specification of how your data is combined to give you the output. Since
# the graph totally specifies what parameters were involved with which
# operations, it contains enough information to compute derivatives. This
# probably sounds vague, so let's see what is going on using the
# fundamental flag ``requires_grad``.
#
# First, think from a programmer's perspective. What is stored in the
# torch.Tensor objects we were creating above? Obviously the data and the
# shape, and maybe a few other things. But when we added two tensors
# together, we got an output tensor. All this output tensor knows is its
# data and shape. It has no idea that it was the sum of two other tensors
# (it could have been read in from a file, it could be the result of some
# other operation, etc.)
#
# If ``requires_grad=True``, the Tensor object keeps track of how it was
# created. Let's see it in action.
#

# Tensor factory methods have a ``requires_grad`` flag
x = torch.tensor([1., 2., 3], requires_grad=True)

# With requires_grad=True, you can still do all the operations you previously
# could
y = torch.tensor([4., 5., 6], requires_grad=True)
z = x + y
print(z)

# BUT z knows something extra.
print(z.grad_fn)


######################################################################
# So Tensors know what created them. z knows that it wasn't read in from
# a file, it wasn't the result of a multiplication or exponential or
# whatever. And if you keep following z.grad_fn, you will find yourself at
# x and y.
#
# But how does that help us compute a gradient?
#

# Let's sum up all the entries in z
s = z.sum()
print(s)
print(s.grad_fn)


######################################################################
# So now, what is the derivative of this sum with respect to the first
# component of x? In math, we want
#
# .. math::
#
#    \frac{\partial s}{\partial x_0}
#
# Well, s knows that it was created as a sum of the tensor z. z knows
# that it was the sum x + y. So
#
# .. math::  s = \overbrace{x_0 + y_0}^\text{$z_0$} + \overbrace{x_1 + y_1}^\text{$z_1$} + \overbrace{x_2 + y_2}^\text{$z_2$}
#
# And so s contains enough information to determine that the derivative
# we want is 1!
#
# Of course this glosses over the challenge of how to actually compute
# that derivative. The point here is that s is carrying along enough
# information that it is possible to compute it. In reality, the
# developers of PyTorch program the sum() and + operations to know how to
# compute their gradients, and run the backpropagation algorithm. An
# in-depth discussion of that algorithm is beyond the scope of this
# tutorial.
#


######################################################################
# Let's have PyTorch compute the gradient, and see that we were right:
# (note that if you run this block multiple times, the gradient will increment.
# That is because PyTorch *accumulates* the gradient into the .grad
# property, since for many models this is very convenient.)
#

# Calling .backward() on any tensor will run backprop, starting from it.
s.backward()
print(x.grad)
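

######################################################################
# As a minimal sketch of that accumulation behavior (the ``z2`` and ``s2``
# names are just for illustration): running the same forward and backward
# pass again adds to ``.grad``, and ``.grad.zero_()`` resets it in-place.
#

z2 = x + y
s2 = z2.sum()
s2.backward()
print(x.grad)    # both backward passes have been summed into x.grad

x.grad.zero_()   # reset the accumulated gradient
print(x.grad)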


######################################################################
# Understanding what is going on in the block below is crucial for being a
# successful programmer in deep learning.
#

x = torch.randn(2, 2)
y = torch.randn(2, 2)
# By default, user-created Tensors have ``requires_grad=False``
print(x.requires_grad, y.requires_grad)
z = x + y
# So you can't backprop through z
print(z.grad_fn)

# ``.requires_grad_( ... )`` changes an existing Tensor's ``requires_grad``
# flag in-place. The input flag defaults to ``True`` if not given.
x = x.requires_grad_()
y = y.requires_grad_()
# z contains enough information to compute gradients, as we saw above
z = x + y
print(z.grad_fn)
# If any input to an operation has ``requires_grad=True``, so will the output
print(z.requires_grad)

# Now z has the computation history that relates itself to x and y
# Can we just take its values, and **detach** it from its history?
new_z = z.detach()

# ... does new_z have information to backprop to x and y?
# NO!
print(new_z.grad_fn)
# And how could it? ``z.detach()`` returns a tensor that shares the same storage
# as ``z``, but with the computation history forgotten. It doesn't know anything
# about how it was computed.
# In essence, we have broken the Tensor away from its past history.

###############################################################
# You can also stop autograd from tracking history on Tensors
# with ``.requires_grad=True`` by wrapping the code block in
# ``with torch.no_grad():``
print(x.requires_grad)
print((x ** 2).requires_grad)

with torch.no_grad():
    print((x ** 2).requires_grad)
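
# Gradient tracking is only suspended inside the ``with`` block; once we
# leave it, operations on ``x`` are tracked again (a quick check, added
# here just for illustration):
print((x ** 2).requires_grad)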