"""
(beta) Running the compiled optimizer with an LR Scheduler
============================================================

**Author:** `Michael Lazos <https://github.com/mlazos>`_
"""

#########################################################
# The optimizer is a key algorithm for training any deep learning model.
# In this example, we will show how to pair the optimizer, which has been compiled using ``torch.compile``,
# with an LR scheduler to accelerate training convergence.
#
# .. note::
#
#    This tutorial requires PyTorch 2.3.0 or later.

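# Optionally, you can guard against older PyTorch builds before running the rest
# of the recipe. The check below is a minimal sketch of one way to parse the
# version string; treat it as an illustrative assumption rather than the
# canonical check.

import torch

# keep only the numeric major/minor components of ``torch.__version__``
major, minor = (int(v) for v in torch.__version__.split("+")[0].split(".")[:2])
if (major, minor) < (2, 3):
    print("Exiting because this tutorial requires PyTorch 2.3.0 or later.")
    import sys
    sys.exit(0)
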
#####################################################################
# Model Setup
# ~~~~~~~~~~~~~~~~~~~~~
# For this example, we'll use a simple sequence of linear layers.
#

import torch

# Create simple model
model = torch.nn.Sequential(
    *[torch.nn.Linear(1024, 1024, False, device="cuda") for _ in range(10)]
)
input = torch.rand(1024, device="cuda")

# run forward pass
output = model(input)

# run backward to populate the grads for our optimizer below
output.sum().backward()

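# Optional sanity check (a minimal sketch, not required by the recipe): confirm
# that the backward pass above populated a gradient for every parameter before
# the optimizer consumes them.
assert all(p.grad is not None for p in model.parameters())
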
#####################################################################
# Setting up and running the compiled optimizer with LR Scheduler
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#
# In this section, we'll use the Adam optimizer with the LinearLR scheduler
# and create a helper function to wrap the ``step()`` call for each of them
# in ``torch.compile()``.
#
# .. note::
#
#    ``torch.compile`` is only supported on CUDA devices that have a compute capability of 7.0 or higher.


# exit cleanly if we are on a device that doesn't support ``torch.compile``
if torch.cuda.get_device_capability() < (7, 0):
    print("Exiting because torch.compile is not supported on this device.")
    import sys
    sys.exit(0)

# !!! IMPORTANT !!! Wrap the lr in a Tensor if we are pairing the
# optimizer with an LR Scheduler.
# Without this, torch.compile will recompile as the value of the LR
# changes.
opt = torch.optim.Adam(model.parameters(), lr=torch.tensor(0.01))
sched = torch.optim.lr_scheduler.LinearLR(opt, total_iters=5)

@torch.compile(fullgraph=False)
def fn():
    opt.step()
    sched.step()


# Warmup runs to compile the function
for _ in range(5):
    fn()
    print(opt.param_groups[0]["lr"])

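######################################################################
# Optionally, you can time the compiled ``step()`` calls against their eager
# counterparts, similar to the compiled optimizer tutorial linked at the end.
# The sketch below is illustrative: ``eager_fn`` is a hypothetical helper, the
# measured numbers depend on your hardware, and timing mutates the optimizer
# state and advances the scheduler.

import torch.utils.benchmark as benchmark


def eager_fn():
    opt.step()
    sched.step()


# ``fn`` above is already compiled and warmed up, so both helpers can be timed directly
eager_t = benchmark.Timer(stmt="eager_fn()", globals={"eager_fn": eager_fn}).blocked_autorange()
compiled_t = benchmark.Timer(stmt="fn()", globals={"fn": fn}).blocked_autorange()
print(f"eager:    {eager_t.median * 1e6:.1f} us")
print(f"compiled: {compiled_t.median * 1e6:.1f} us")
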
######################################################################
# Extension: What happens with a non-tensor LR?
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# For the curious, we will show how to peek into what happens with ``torch.compile`` when we don't wrap the
# LR in a tensor.

# No longer wrap the LR in a tensor here
opt = torch.optim.Adam(model.parameters(), lr=0.01)
sched = torch.optim.lr_scheduler.LinearLR(opt, total_iters=5)

@torch.compile(fullgraph=False)
def fn():
    opt.step()
    sched.step()

# Set up logging to view recompiles
torch._logging.set_logs(recompiles=True)

# Warmup runs to compile the function
# We will now recompile on each iteration
# as the value of the lr is mutated.
for _ in range(5):
    fn()

######################################################################
# With this example, we can see that we recompile the optimizer a few times
# due to the guard failure on the ``lr`` in ``param_groups[0]``.

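######################################################################
# To see the contrast, you can re-wrap the LR in a tensor and compile a fresh
# helper while the recompile logging is still enabled. After any initial
# compilation, repeated calls should no longer trigger a recompile on every
# iteration, because the scheduler now mutates the LR tensor in place. This is
# an optional check; ``fn_tensor_lr`` is an illustrative name.

opt = torch.optim.Adam(model.parameters(), lr=torch.tensor(0.01))
sched = torch.optim.lr_scheduler.LinearLR(opt, total_iters=5)


@torch.compile(fullgraph=False)
def fn_tensor_lr():
    opt.step()
    sched.step()


for _ in range(5):
    fn_tensor_lr()

# turn the recompile logging back off once done experimenting
torch._logging.set_logs(recompiles=False)
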
######################################################################
# Conclusion
# ~~~~~~~~~~
#
# In this tutorial we showed how to pair the optimizer compiled with ``torch.compile``
# with an LR Scheduler to accelerate training convergence. We used a model consisting
# of a simple sequence of linear layers with the Adam optimizer paired
# with a LinearLR scheduler to demonstrate the LR changing across iterations.
#
# See also:
#
# * `Compiled optimizer tutorial <https://pytorch.org/tutorials/recipes/compiling_optimizer.html>`__ - an intro to the compiled optimizer.
# * `Compiling the optimizer with PT2 <https://dev-discuss.pytorch.org/t/compiling-the-optimizer-with-pt2/1669>`__ - deeper technical details on the compiled optimizer.