Overview

Learning rate schedulers provide different strategies for adjusting the learning rate during training. All schedulers inherit from torch.optim.lr_scheduler.LRScheduler.

Basic Usage

import torch.optim as optim
from torch.optim.lr_scheduler import StepLR

optimizer = optim.SGD(model.parameters(), lr=0.1)
scheduler = StepLR(optimizer, step_size=30, gamma=0.1)

for epoch in range(100):
    for input, target in dataset:
        optimizer.zero_grad()
        output = model(input)
        loss = loss_fn(output, target)
        loss.backward()
        optimizer.step()
    scheduler.step()  # Update learning rate after each epoch
Call scheduler.step() after optimizer.step() in each epoch.

StepLR

Decays the learning rate by gamma every step_size epochs.
torch.optim.lr_scheduler.StepLR(optimizer, step_size, gamma=0.1, last_epoch=-1)

Parameters

optimizer (Optimizer, required): Wrapped optimizer
step_size (int, required): Period of learning rate decay, in epochs
gamma (float, default 0.1): Multiplicative factor of learning rate decay
last_epoch (int, default -1): Index of the last epoch; use -1 for fresh training

Example

# Assuming optimizer uses lr = 0.05 for all groups
# lr = 0.05     if epoch < 30
# lr = 0.005    if 30 <= epoch < 60
# lr = 0.0005   if 60 <= epoch < 90
scheduler = StepLR(optimizer, step_size=30, gamma=0.1)
for epoch in range(100):
    train(...)
    validate(...)
    scheduler.step()

MultiStepLR

Decays the learning rate by gamma at specified epoch milestones.
torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones, gamma=0.1, last_epoch=-1)

Parameters

optimizer (Optimizer, required): Wrapped optimizer
milestones (list[int], required): List of epoch indices; must be increasing
gamma (float, default 0.1): Multiplicative factor of learning rate decay

Example

# Assuming optimizer uses lr = 0.05 for all groups
# lr = 0.05     if epoch < 30
# lr = 0.005    if 30 <= epoch < 80
# lr = 0.0005   if epoch >= 80
scheduler = MultiStepLR(optimizer, milestones=[30,80], gamma=0.1)

ExponentialLR

Decays the learning rate by gamma every epoch.
torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma, last_epoch=-1)

Parameters

optimizer (Optimizer, required): Wrapped optimizer
gamma (float, required): Multiplicative factor of learning rate decay

Example

scheduler = ExponentialLR(optimizer, gamma=0.9)
# lr_epoch = lr_initial * gamma^epoch

CosineAnnealingLR

Sets the learning rate using a cosine annealing schedule.
torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max, eta_min=0, last_epoch=-1)

Parameters

optimizer (Optimizer, required): Wrapped optimizer
T_max (int, required): Maximum number of iterations (half period of the cosine schedule)
eta_min (float, default 0): Minimum learning rate

Example

scheduler = CosineAnnealingLR(optimizer, T_max=100, eta_min=0)
# lr varies smoothly from initial_lr to eta_min over T_max epochs

Formula

lr_t = eta_min + (lr_initial - eta_min) * (1 + cos(π * t / T_max)) / 2
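The closed form above can be checked numerically against the scheduler itself. A minimal sketch; the model, base lr, T_max, and eta_min are arbitrary illustration values:

```python
import math

import torch
from torch import nn, optim
from torch.optim.lr_scheduler import CosineAnnealingLR

# Arbitrary model and hyperparameters, chosen only for illustration
model = nn.Linear(4, 1)
optimizer = optim.SGD(model.parameters(), lr=0.1)
scheduler = CosineAnnealingLR(optimizer, T_max=10, eta_min=0.001)

lr_initial, eta_min, T_max = 0.1, 0.001, 10
for t in range(1, T_max + 1):
    optimizer.step()
    scheduler.step()
    # Closed-form value from the formula above
    expected = eta_min + (lr_initial - eta_min) * (1 + math.cos(math.pi * t / T_max)) / 2
    assert abs(scheduler.get_last_lr()[0] - expected) < 1e-8
```

At t = T_max the cosine term reaches -1 and the learning rate bottoms out at eta_min.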

ReduceLROnPlateau

Reduces learning rate when a metric has stopped improving.
torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='min', factor=0.1,
                                           patience=10, threshold=1e-4,
                                           threshold_mode='rel', cooldown=0,
                                           min_lr=0, eps=1e-8)

Parameters

optimizer (Optimizer, required): Wrapped optimizer
mode (str, default 'min'): One of 'min' or 'max'. In 'min' mode, the lr is reduced when the monitored metric stops decreasing
factor (float, default 0.1): Factor by which the learning rate is reduced: new_lr = lr * factor
patience (int, default 10): Number of epochs with no improvement after which the learning rate is reduced
threshold (float, default 1e-4): Threshold for measuring the new optimum
threshold_mode (str, default 'rel'): One of 'rel' or 'abs'. In 'rel' mode, the threshold is relative to the best metric
cooldown (int, default 0): Number of epochs to wait before resuming normal operation after the lr has been reduced
min_lr (float | list[float], default 0): Scalar or list of scalars; lower bound on the learning rate

Example

scheduler = ReduceLROnPlateau(optimizer, 'min', patience=5)
for epoch in range(100):
    train(...)
    val_loss = validate(...)
    scheduler.step(val_loss)  # Pass the metric to monitor
Unlike other schedulers, ReduceLROnPlateau requires a metric argument in step().

CyclicLR

Sets the learning rate according to cyclical learning rate policy.
torch.optim.lr_scheduler.CyclicLR(optimizer, base_lr, max_lr, step_size_up=2000,
                                  step_size_down=None, mode='triangular',
                                  gamma=1.0, scale_fn=None, scale_mode='cycle',
                                  cycle_momentum=True, base_momentum=0.8,
                                  max_momentum=0.9, last_epoch=-1)

Parameters

optimizer (Optimizer, required): Wrapped optimizer
base_lr (float | list[float], required): Initial learning rate (lower boundary in the cycle)
max_lr (float | list[float], required): Upper learning rate boundary in the cycle
step_size_up (int, default 2000): Number of training iterations in the increasing half of a cycle
mode (str, default 'triangular'): One of 'triangular', 'triangular2', or 'exp_range'

Example

scheduler = CyclicLR(optimizer, base_lr=0.001, max_lr=0.1, step_size_up=2000, mode='triangular')
for epoch in range(10):
    for batch in train_loader:
        train_batch(...)
        optimizer.step()
        scheduler.step()  # Call per batch, not per epoch

OneCycleLR

Sets the learning rate according to the 1cycle learning rate policy.
torch.optim.lr_scheduler.OneCycleLR(optimizer, max_lr, total_steps=None,
                                    epochs=None, steps_per_epoch=None,
                                    pct_start=0.3, anneal_strategy='cos',
                                    cycle_momentum=True, base_momentum=0.85,
                                    max_momentum=0.95, div_factor=25.0,
                                    final_div_factor=1e4, last_epoch=-1)

Parameters

optimizer (Optimizer, required): Wrapped optimizer
max_lr (float | list[float], required): Upper learning rate boundary in the cycle
total_steps (int, optional): Total number of steps in the cycle. Provide either this or both epochs and steps_per_epoch
epochs (int, optional): Number of epochs to train for
steps_per_epoch (int, optional): Number of steps per epoch
pct_start (float, default 0.3): Percentage of the cycle spent increasing the learning rate
anneal_strategy (str, default 'cos'): One of 'cos' or 'linear'

Example

scheduler = OneCycleLR(optimizer, max_lr=0.1, epochs=10, steps_per_epoch=100)
for epoch in range(10):
    for batch in train_loader:
        train_batch(...)
        optimizer.step()
        scheduler.step()

LinearLR

Changes the learning rate linearly from start_factor to end_factor over total_iters, which makes it useful for both warmup and decay.
torch.optim.lr_scheduler.LinearLR(optimizer, start_factor=1.0/3, end_factor=1.0,
                                  total_iters=5, last_epoch=-1)

Parameters

optimizer (Optimizer, required): Wrapped optimizer
start_factor (float, default 1.0/3): Factor applied to the learning rate in the first epoch
end_factor (float, default 1.0): Factor applied to the learning rate at the end of the linear schedule
total_iters (int, default 5): Number of iterations after which the factor reaches end_factor
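Example

A common use is linear warmup from a small fraction of the base learning rate. A minimal sketch; the model and factor values are arbitrary illustration choices:

```python
import torch
from torch import nn, optim
from torch.optim.lr_scheduler import LinearLR

model = nn.Linear(4, 1)  # arbitrary model for illustration
optimizer = optim.SGD(model.parameters(), lr=0.1)
# Warm up from 0.1 * 0.01 = 0.001 to the base lr of 0.1 over 5 epochs
scheduler = LinearLR(optimizer, start_factor=0.01, end_factor=1.0, total_iters=5)

for epoch in range(5):
    optimizer.step()
    scheduler.step()
# The factor has now reached end_factor, so lr is back at the base value
```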

ConstantLR

Multiplies the learning rate by a constant factor until total_iters is reached, then restores the base learning rate.
torch.optim.lr_scheduler.ConstantLR(optimizer, factor=1.0/3, total_iters=5, last_epoch=-1)

Parameters

optimizer (Optimizer, required): Wrapped optimizer
factor (float, default 1.0/3): Factor applied to the learning rate until total_iters is reached
total_iters (int, default 5): Number of epochs during which the factor is applied
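Example

A minimal sketch showing the reduced lr during the first total_iters epochs and the base lr afterwards; model and values are arbitrary illustration choices:

```python
import torch
from torch import nn, optim
from torch.optim.lr_scheduler import ConstantLR

model = nn.Linear(4, 1)  # arbitrary model for illustration
optimizer = optim.SGD(model.parameters(), lr=0.05)
# lr = 0.05 * 0.5 = 0.025 for epochs 0..3, then back to the base lr of 0.05
scheduler = ConstantLR(optimizer, factor=0.5, total_iters=4)

start_lr = scheduler.get_last_lr()[0]  # reduced lr during the constant phase
for epoch in range(4):
    optimizer.step()
    scheduler.step()
end_lr = scheduler.get_last_lr()[0]  # base lr after total_iters
```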

PolynomialLR

Decays the learning rate using a polynomial function.
torch.optim.lr_scheduler.PolynomialLR(optimizer, total_iters=5, power=1.0, last_epoch=-1)

Parameters

optimizer (Optimizer, required): Wrapped optimizer
total_iters (int, default 5): Number of iterations over which to decay the learning rate
power (float, default 1.0): Power of the polynomial
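Example

The schedule follows lr_t = lr_initial * (1 - t / total_iters) ** power for t <= total_iters, reaching zero at t = total_iters (power=1.0 gives linear decay). A minimal sketch with arbitrary illustration values:

```python
import torch
from torch import nn, optim
from torch.optim.lr_scheduler import PolynomialLR

model = nn.Linear(4, 1)  # arbitrary model for illustration
optimizer = optim.SGD(model.parameters(), lr=0.1)
# Quadratic decay to zero over 4 epochs: lr_t = 0.1 * (1 - t / 4) ** 2
scheduler = PolynomialLR(optimizer, total_iters=4, power=2.0)

lrs = []
for epoch in range(4):
    optimizer.step()
    scheduler.step()
    lrs.append(scheduler.get_last_lr()[0])
# lrs[1] (t = 2) is 0.1 * 0.5 ** 2 = 0.025; lrs[-1] (t = 4) is 0
```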

LambdaLR

Sets the learning rate of each parameter group to the initial lr multiplied by a factor computed by a user-supplied function.
torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda, last_epoch=-1)

Parameters

optimizer (Optimizer, required): Wrapped optimizer
lr_lambda (function | list[function], required): Function (or one function per parameter group) that computes a multiplicative factor given an integer epoch

Example

# One lambda per parameter group; this assumes the optimizer has two param groups
lambda1 = lambda epoch: epoch // 30
lambda2 = lambda epoch: 0.95 ** epoch
scheduler = LambdaLR(optimizer, lr_lambda=[lambda1, lambda2])

ChainedScheduler

Chains multiple learning rate schedulers.
torch.optim.lr_scheduler.ChainedScheduler(schedulers)

Parameters

schedulers (list, required): List of schedulers to chain

Example

scheduler1 = ConstantLR(optimizer, factor=0.1, total_iters=2)
scheduler2 = ExponentialLR(optimizer, gamma=0.9)
scheduler = ChainedScheduler([scheduler1, scheduler2])

SequentialLR

Receives a list of schedulers and milestones to switch between them.
torch.optim.lr_scheduler.SequentialLR(optimizer, schedulers, milestones, last_epoch=-1)

Parameters

optimizer (Optimizer, required): Wrapped optimizer
schedulers (list, required): List of schedulers to run in sequence
milestones (list[int], required): Epoch indices at which to switch to the next scheduler; must be increasing

Example

scheduler1 = ConstantLR(optimizer, factor=0.1, total_iters=2)
scheduler2 = ExponentialLR(optimizer, gamma=0.9)
scheduler = SequentialLR(optimizer, schedulers=[scheduler1, scheduler2], milestones=[2])
# Uses scheduler1 for epochs [0, 1], then scheduler2 for epochs [2, ...]

Common Methods

All schedulers implement these methods:

step()

scheduler.step(epoch=None)
Update the learning rate. Passing epoch explicitly is deprecated; call step() with no arguments.

get_last_lr()

lr_list = scheduler.get_last_lr()
Return last computed learning rate by scheduler.

state_dict()

state = scheduler.state_dict()
Return the state of the scheduler as a dict.

load_state_dict()

scheduler.load_state_dict(state_dict)
Load the scheduler’s state.
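
Together, state_dict() and load_state_dict() make a scheduler checkpointable alongside its optimizer. A minimal sketch; the model and scheduler settings are arbitrary, and in practice the checkpoint dict would be written with torch.save:

```python
import torch
from torch import nn, optim
from torch.optim.lr_scheduler import StepLR

model = nn.Linear(4, 1)  # arbitrary model for illustration
optimizer = optim.SGD(model.parameters(), lr=0.1)
scheduler = StepLR(optimizer, step_size=30, gamma=0.1)

# Advance a few epochs, then snapshot both states together
for _ in range(35):
    optimizer.step()
    scheduler.step()
checkpoint = {
    "optimizer": optimizer.state_dict(),
    "scheduler": scheduler.state_dict(),
}

# Restore into freshly constructed objects; the scheduler resumes
# at epoch 35 with the already-decayed learning rate
new_optimizer = optim.SGD(model.parameters(), lr=0.1)
new_scheduler = StepLR(new_optimizer, step_size=30, gamma=0.1)
new_optimizer.load_state_dict(checkpoint["optimizer"])
new_scheduler.load_state_dict(checkpoint["scheduler"])
```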

Cheat Sheet

Scheduler          Use Case
StepLR             Drop LR by a factor every N epochs
MultiStepLR        Drop LR at specific epoch milestones
ExponentialLR      Exponential decay each epoch
CosineAnnealingLR  Cosine annealing for smooth decay
ReduceLROnPlateau  Reduce LR when a validation metric plateaus
CyclicLR           Cyclical learning rates (per batch)
OneCycleLR         Super-convergence with the 1cycle policy
LinearLR           Linear warmup or decay

See Also