GitHub Repository: pytorch/tutorials
Path: blob/main/beginner_source/nlp/pytorch_tutorial.py
# -*- coding: utf-8 -*-
r"""
Introduction to PyTorch
***********************

Introduction to Torch's tensor library
======================================

All of deep learning is computations on tensors, which are
generalizations of a matrix that can be indexed in more than 2
dimensions. We will see exactly what this means in-depth later. First,
let's look at what we can do with tensors.
"""
# Author: Robert Guthrie

import torch

torch.manual_seed(1)


######################################################################
# Creating Tensors
# ~~~~~~~~~~~~~~~~
#
# Tensors can be created from Python lists with the torch.tensor()
# function.
#

# torch.tensor(data) creates a torch.Tensor object with the given data.
V_data = [1., 2., 3.]
V = torch.tensor(V_data)
print(V)

# Creates a matrix
M_data = [[1., 2., 3.], [4., 5., 6]]
M = torch.tensor(M_data)
print(M)

# Create a 3D tensor of size 2x2x2.
T_data = [[[1., 2.], [3., 4.]],
          [[5., 6.], [7., 8.]]]
T = torch.tensor(T_data)
print(T)


######################################################################
# What is a 3D tensor anyway? Think about it like this. If you have a
# vector, indexing into the vector gives you a scalar. If you have a
# matrix, indexing into the matrix gives you a vector. If you have a 3D
# tensor, then indexing into the tensor gives you a matrix!
#
# A note on terminology:
# when I say "tensor" in this tutorial, it refers
# to any torch.Tensor object. Matrices and vectors are special cases of
# torch.Tensors, where their dimensions are 2 and 1, respectively. When I am
# talking about 3D tensors, I will explicitly use the term "3D tensor".
#

# Index into V and get a scalar (0 dimensional tensor)
print(V[0])
# Get a Python number from it
print(V[0].item())

# Index into M and get a vector
print(M[0])

# Index into T and get a matrix
print(T[0])


######################################################################
# You can also create tensors of other data types. To create a tensor of integer types, try
# torch.tensor([[1, 2], [3, 4]]) (where all elements in the list are integers).
# You can also specify a data type by passing in ``dtype=torch.data_type``.
# Check the documentation for more data types, but
# Float and Long will be the most common.
#

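# For example (a small added illustration, not part of the original tutorial):
# an all-integer list produces a LongTensor, and ``dtype`` picks the type
# explicitly.
int_tensor = torch.tensor([[1, 2], [3, 4]])
print(int_tensor.dtype)    # torch.int64 (Long)
float_tensor = torch.tensor([[1, 2], [3, 4]], dtype=torch.float32)
print(float_tensor.dtype)  # torch.float32
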
######################################################################
# You can create a tensor with random data and the supplied dimensionality
# with torch.randn()
#

x = torch.randn((3, 4, 5))
print(x)


######################################################################
# Operations with Tensors
# ~~~~~~~~~~~~~~~~~~~~~~~
#
# You can operate on tensors in the ways you would expect.

x = torch.tensor([1., 2., 3.])
y = torch.tensor([4., 5., 6.])
z = x + y
print(z)


######################################################################
# See `the documentation <https://pytorch.org/docs/torch.html>`__ for a
# complete list of the massive number of operations available to you. They
# expand beyond just mathematical operations.
#
# One helpful operation that we will make use of later is concatenation.
#

# By default, it concatenates along the first axis (concatenates rows)
x_1 = torch.randn(2, 5)
y_1 = torch.randn(3, 5)
z_1 = torch.cat([x_1, y_1])
print(z_1)

# Concatenate columns:
x_2 = torch.randn(2, 3)
y_2 = torch.randn(2, 5)
# second arg specifies which axis to concat along
z_2 = torch.cat([x_2, y_2], 1)
print(z_2)

# If your tensors are not compatible, torch will complain. Uncomment to see the error
# torch.cat([x_1, x_2])

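# A safe illustration of that error (added here, not in the original
# tutorial): x_1 is 2x5 and x_2 is 2x3, so their column counts differ and
# concatenating them along rows raises a RuntimeError.
try:
    torch.cat([x_1, x_2])
except RuntimeError as err:
    print("cat failed:", err)
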
######################################################################
# Reshaping Tensors
# ~~~~~~~~~~~~~~~~~
#
# Use the .view() method to reshape a tensor. This method receives heavy
# use, because many neural network components expect their inputs to have
# a certain shape. Often you will need to reshape before passing your data
# to the component.
#

x = torch.randn(2, 3, 4)
print(x)
print(x.view(2, 12))  # Reshape to 2 rows, 12 columns
# Same as above. If one of the dimensions is -1, its size can be inferred
print(x.view(2, -1))


######################################################################
# Computation Graphs and Automatic Differentiation
# ================================================
#
# The concept of a computation graph is essential to efficient deep
# learning programming, because it allows you to not have to write the
# back propagation gradients yourself. A computation graph is simply a
# specification of how your data is combined to give you the output. Since
# the graph totally specifies what parameters were involved with which
# operations, it contains enough information to compute derivatives. This
# probably sounds vague, so let's see what is going on using the
# fundamental flag ``requires_grad``.
#
# First, think from a programmer's perspective. What is stored in the
# torch.Tensor objects we were creating above? Obviously the data and the
# shape, and maybe a few other things. But when we added two tensors
# together, we got an output tensor. All this output tensor knows is its
# data and shape. It has no idea that it was the sum of two other tensors
# (it could have been read in from a file, it could be the result of some
# other operation, etc.)
#
# If ``requires_grad=True``, the Tensor object keeps track of how it was
# created. Let's see it in action.
#

# Tensor factory methods have a ``requires_grad`` flag
x = torch.tensor([1., 2., 3], requires_grad=True)

# With requires_grad=True, you can still do all the operations you previously
# could
y = torch.tensor([4., 5., 6], requires_grad=True)
z = x + y
print(z)

# BUT z knows something extra.
print(z.grad_fn)


######################################################################
# So Tensors know what created them. z knows that it wasn't read in from
# a file, it wasn't the result of a multiplication or exponential or
# whatever. And if you keep following z.grad_fn, you will find yourself at
# x and y.
#
# But how does that help us compute a gradient?
#

# Let's sum up all the entries in z
s = z.sum()
print(s)
print(s.grad_fn)


######################################################################
# So now, what is the derivative of this sum with respect to the first
# component of x? In math, we want
#
# .. math::
#
#    \frac{\partial s}{\partial x_0}
#
#
#
# Well, s knows that it was created as a sum of the tensor z. z knows
# that it was the sum x + y. So
#
# .. math::  s = \overbrace{x_0 + y_0}^\text{$z_0$} + \overbrace{x_1 + y_1}^\text{$z_1$} + \overbrace{x_2 + y_2}^\text{$z_2$}
#
# And so s contains enough information to determine that the derivative
# we want is 1!
#
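# To spell out that step: :math:`x_0` appears exactly once in the sum, so
#
# .. math::  \frac{\partial s}{\partial x_0} = \frac{\partial}{\partial x_0}\left(x_0 + y_0 + x_1 + y_1 + x_2 + y_2\right) = 1
#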
# Of course this glosses over the challenge of how to actually compute
# that derivative. The point here is that s is carrying along enough
# information that it is possible to compute it. In reality, the
# developers of PyTorch program the sum() and + operations to know how to
# compute their gradients, and run the back propagation algorithm. An
# in-depth discussion of that algorithm is beyond the scope of this
# tutorial.
#


######################################################################
# Let's have PyTorch compute the gradient, and see that we were right:
# (note if you run this block multiple times, the gradient will increment.
# That is because PyTorch *accumulates* the gradient into the .grad
# property, since for many models this is very convenient.)
#

# calling .backward() on any tensor will run backprop, starting from it.
s.backward()
print(x.grad)

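# A quick demonstration of that accumulation (added here, not part of the
# original tutorial): rebuilding the graph and calling .backward() again adds
# the new gradient to what is already stored in x.grad.
s2 = (x + y).sum()
s2.backward()
print(x.grad)  # each entry is now 2., the old 1. plus the new 1.

# Zeroing the stored gradient in-place resets the accumulator; this is roughly
# what optimizer.zero_grad() does for model parameters between updates.
x.grad.zero_()
print(x.grad)
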
######################################################################
# Understanding what is going on in the block below is crucial for being a
# successful programmer in deep learning.
#

x = torch.randn(2, 2)
y = torch.randn(2, 2)
# By default, user created Tensors have ``requires_grad=False``
print(x.requires_grad, y.requires_grad)
z = x + y
# So you can't backprop through z
print(z.grad_fn)

# ``.requires_grad_( ... )`` changes an existing Tensor's ``requires_grad``
# flag in-place. The input flag defaults to ``True`` if not given.
x = x.requires_grad_()
y = y.requires_grad_()
# z contains enough information to compute gradients, as we saw above
z = x + y
print(z.grad_fn)
# If any input to an operation has ``requires_grad=True``, so will the output
print(z.requires_grad)

# Now z has the computation history that relates itself to x and y
# Can we just take its values, and **detach** it from its history?
new_z = z.detach()

# ... does new_z have information to backprop to x and y?
# NO!
print(new_z.grad_fn)
# And how could it? ``z.detach()`` returns a tensor that shares the same storage
# as ``z``, but with the computation history forgotten. It doesn't know anything
# about how it was computed.
# In essence, we have broken the Tensor away from its past history

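# A quick check of that shared storage (added here, not part of the original
# tutorial): an in-place edit of new_z is visible through z, because detach()
# does not copy the underlying data.
new_z[0, 0] = 100.
print(z[0, 0])  # also 100.
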
###############################################################
# You can also stop autograd from tracking history on Tensors
# with ``.requires_grad=True`` by wrapping the code block in
# ``with torch.no_grad():``
print(x.requires_grad)
print((x ** 2).requires_grad)

with torch.no_grad():
    print((x ** 2).requires_grad)

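# Outside the ``with torch.no_grad():`` block, tracking resumes as usual
# (a small added check, not part of the original tutorial).
print((x ** 2).requires_grad)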