Overview

Learning rate schedulers provide different strategies for adjusting the learning rate during training. All schedulers inherit from torch.optim.lr_scheduler.LRScheduler.

Basic Usage

import torch.optim as optim
from torch.optim.lr_scheduler import StepLR

optimizer = optim.SGD(model.parameters(), lr=0.1)
scheduler = StepLR(optimizer, step_size=30, gamma=0.1)

for epoch in range(100):
    for input, target in dataset:
        optimizer.zero_grad()
        output = model(input)
        loss = loss_fn(output, target)
        loss.backward()
        optimizer.step()
    scheduler.step()  # Update learning rate after each epoch
Call scheduler.step() after optimizer.step() in each epoch.

StepLR

Decays the learning rate by gamma every step_size epochs.
torch.optim.lr_scheduler.StepLR(optimizer, step_size, gamma=0.1, last_epoch=-1)

Parameters

optimizer (Optimizer, required): Wrapped optimizer
step_size (int, required): Period of learning rate decay, in epochs
gamma (float, default 0.1): Multiplicative factor of learning rate decay
last_epoch (int, default -1): Index of the last epoch; use -1 for fresh training

Example

# Assuming optimizer uses lr = 0.05 for all groups
# lr = 0.05     if epoch < 30
# lr = 0.005    if 30 <= epoch < 60
# lr = 0.0005   if 60 <= epoch < 90
scheduler = StepLR(optimizer, step_size=30, gamma=0.1)
for epoch in range(100):
    train(...)
    validate(...)
    scheduler.step()

MultiStepLR

Decays the learning rate by gamma at specified epoch milestones.
torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones, gamma=0.1, last_epoch=-1)

Parameters

optimizer (Optimizer, required): Wrapped optimizer
milestones (list[int], required): List of epoch indices; must be increasing
gamma (float, default 0.1): Multiplicative factor of learning rate decay

Example

# Assuming optimizer uses lr = 0.05 for all groups
# lr = 0.05     if epoch < 30
# lr = 0.005    if 30 <= epoch < 80
# lr = 0.0005   if epoch >= 80
scheduler = MultiStepLR(optimizer, milestones=[30,80], gamma=0.1)

ExponentialLR

Decays the learning rate by gamma every epoch.
torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma, last_epoch=-1)

Parameters

optimizer (Optimizer, required): Wrapped optimizer
gamma (float, required): Multiplicative factor of learning rate decay

Example

scheduler = ExponentialLR(optimizer, gamma=0.9)
# lr_epoch = lr_initial * gamma^epoch

CosineAnnealingLR

Sets the learning rate using a cosine annealing schedule.
torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max, eta_min=0, last_epoch=-1)

Parameters

optimizer (Optimizer, required): Wrapped optimizer
T_max (int, required): Maximum number of iterations (half period of the cosine schedule)
eta_min (float, default 0): Minimum learning rate

Example

scheduler = CosineAnnealingLR(optimizer, T_max=100, eta_min=0)
# lr varies smoothly from initial_lr to eta_min over T_max epochs

Formula

lr_t = eta_min + (lr_initial - eta_min) * (1 + cos(π * t / T_max)) / 2
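The closed form above can be checked numerically against the scheduler itself. A minimal sketch; the model, base lr, T_max, and eta_min are arbitrary illustration values:

```python
import math

import torch
from torch import nn, optim
from torch.optim.lr_scheduler import CosineAnnealingLR

# Arbitrary model and hyperparameters, chosen only for illustration
model = nn.Linear(4, 1)
optimizer = optim.SGD(model.parameters(), lr=0.1)
scheduler = CosineAnnealingLR(optimizer, T_max=10, eta_min=0.001)

lr_initial, eta_min, T_max = 0.1, 0.001, 10
for t in range(1, T_max + 1):
    optimizer.step()
    scheduler.step()
    # Closed-form value from the formula above
    expected = eta_min + (lr_initial - eta_min) * (1 + math.cos(math.pi * t / T_max)) / 2
    assert abs(scheduler.get_last_lr()[0] - expected) < 1e-8
```

At t = T_max the cosine term reaches -1 and the learning rate bottoms out at eta_min.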

ReduceLROnPlateau

Reduces learning rate when a metric has stopped improving.
torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='min', factor=0.1,
                                           patience=10, threshold=1e-4,
                                           threshold_mode='rel', cooldown=0,
                                           min_lr=0, eps=1e-8)

Parameters

optimizer (Optimizer, required): Wrapped optimizer
mode (str, default 'min'): One of 'min' or 'max'. In 'min' mode, the lr is reduced when the monitored metric stops decreasing
factor (float, default 0.1): Factor by which the learning rate is reduced: new_lr = lr * factor
patience (int, default 10): Number of epochs with no improvement after which the learning rate is reduced
threshold (float, default 1e-4): Threshold for measuring the new optimum
threshold_mode (str, default 'rel'): One of 'rel' or 'abs'. In 'rel' mode, the threshold is relative to the best metric
cooldown (int, default 0): Number of epochs to wait before resuming normal operation after the lr has been reduced
min_lr (float | list[float], default 0): Scalar or list of scalars; lower bound on the learning rate

Example

scheduler = ReduceLROnPlateau(optimizer, 'min', patience=5)
for epoch in range(100):
    train(...)
    val_loss = validate(...)
    scheduler.step(val_loss)  # Pass the metric to monitor
Unlike other schedulers, ReduceLROnPlateau requires a metric argument in step().

CyclicLR

Sets the learning rate according to cyclical learning rate policy.
torch.optim.lr_scheduler.CyclicLR(optimizer, base_lr, max_lr, step_size_up=2000,
                                  step_size_down=None, mode='triangular',
                                  gamma=1.0, scale_fn=None, scale_mode='cycle',
                                  cycle_momentum=True, base_momentum=0.8,
                                  max_momentum=0.9, last_epoch=-1)

Parameters

optimizer (Optimizer, required): Wrapped optimizer
base_lr (float | list[float], required): Initial learning rate (lower boundary in the cycle)
max_lr (float | list[float], required): Upper learning rate boundary in the cycle
step_size_up (int, default 2000): Number of training iterations in the increasing half of a cycle
mode (str, default 'triangular'): One of 'triangular', 'triangular2', or 'exp_range'

Example

scheduler = CyclicLR(optimizer, base_lr=0.001, max_lr=0.1, step_size_up=2000, mode='triangular')
for epoch in range(10):
    for batch in train_loader:
        train_batch(...)
        optimizer.step()
        scheduler.step()  # Call per batch, not per epoch

OneCycleLR

Sets the learning rate according to the 1cycle learning rate policy.
torch.optim.lr_scheduler.OneCycleLR(optimizer, max_lr, total_steps=None,
                                    epochs=None, steps_per_epoch=None,
                                    pct_start=0.3, anneal_strategy='cos',
                                    cycle_momentum=True, base_momentum=0.85,
                                    max_momentum=0.95, div_factor=25.0,
                                    final_div_factor=1e4, last_epoch=-1)

Parameters

optimizer (Optimizer, required): Wrapped optimizer
max_lr (float | list[float], required): Upper learning rate boundary in the cycle
total_steps (int, optional): Total number of steps in the cycle. Provide either this or both epochs and steps_per_epoch
epochs (int, optional): Number of epochs to train for
steps_per_epoch (int, optional): Number of steps per epoch
pct_start (float, default 0.3): Percentage of the cycle spent increasing the learning rate
anneal_strategy (str, default 'cos'): One of 'cos' or 'linear'

Example

scheduler = OneCycleLR(optimizer, max_lr=0.1, epochs=10, steps_per_epoch=100)
for epoch in range(10):
    for batch in train_loader:
        train_batch(...)
        optimizer.step()
        scheduler.step()

LinearLR

Changes the learning rate linearly from start_factor to end_factor over total_iters, which makes it useful for both warmup and decay.
torch.optim.lr_scheduler.LinearLR(optimizer, start_factor=1.0/3, end_factor=1.0,
                                  total_iters=5, last_epoch=-1)

Parameters

optimizer (Optimizer, required): Wrapped optimizer
start_factor (float, default 1.0/3): Factor applied to the learning rate in the first epoch
end_factor (float, default 1.0): Factor applied to the learning rate at the end of the linear schedule
total_iters (int, default 5): Number of iterations after which the factor reaches end_factor
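Example

A common use is linear warmup from a small fraction of the base learning rate. A minimal sketch; the model and factor values are arbitrary illustration choices:

```python
import torch
from torch import nn, optim
from torch.optim.lr_scheduler import LinearLR

model = nn.Linear(4, 1)  # arbitrary model for illustration
optimizer = optim.SGD(model.parameters(), lr=0.1)
# Warm up from 0.1 * 0.01 = 0.001 to the base lr of 0.1 over 5 epochs
scheduler = LinearLR(optimizer, start_factor=0.01, end_factor=1.0, total_iters=5)

for epoch in range(5):
    optimizer.step()
    scheduler.step()
# The factor has now reached end_factor, so lr is back at the base value
```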

ConstantLR

Multiplies the learning rate by a constant factor until total_iters is reached, then restores the base learning rate.
torch.optim.lr_scheduler.ConstantLR(optimizer, factor=1.0/3, total_iters=5, last_epoch=-1)

Parameters

optimizer (Optimizer, required): Wrapped optimizer
factor (float, default 1.0/3): Factor applied to the learning rate until total_iters is reached
total_iters (int, default 5): Number of epochs during which the factor is applied
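Example

A minimal sketch showing the reduced lr during the first total_iters epochs and the base lr afterwards; model and values are arbitrary illustration choices:

```python
import torch
from torch import nn, optim
from torch.optim.lr_scheduler import ConstantLR

model = nn.Linear(4, 1)  # arbitrary model for illustration
optimizer = optim.SGD(model.parameters(), lr=0.05)
# lr = 0.05 * 0.5 = 0.025 for epochs 0..3, then back to the base lr of 0.05
scheduler = ConstantLR(optimizer, factor=0.5, total_iters=4)

start_lr = scheduler.get_last_lr()[0]  # reduced lr during the constant phase
for epoch in range(4):
    optimizer.step()
    scheduler.step()
end_lr = scheduler.get_last_lr()[0]  # base lr after total_iters
```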

PolynomialLR

Decays the learning rate using a polynomial function.
torch.optim.lr_scheduler.PolynomialLR(optimizer, total_iters=5, power=1.0, last_epoch=-1)

Parameters

optimizer (Optimizer, required): Wrapped optimizer
total_iters (int, default 5): Number of iterations over which to decay the learning rate
power (float, default 1.0): Power of the polynomial
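Example

The schedule follows lr_t = lr_initial * (1 - t / total_iters) ** power for t <= total_iters, reaching zero at t = total_iters (power=1.0 gives linear decay). A minimal sketch with arbitrary illustration values:

```python
import torch
from torch import nn, optim
from torch.optim.lr_scheduler import PolynomialLR

model = nn.Linear(4, 1)  # arbitrary model for illustration
optimizer = optim.SGD(model.parameters(), lr=0.1)
# Quadratic decay to zero over 4 epochs: lr_t = 0.1 * (1 - t / 4) ** 2
scheduler = PolynomialLR(optimizer, total_iters=4, power=2.0)

lrs = []
for epoch in range(4):
    optimizer.step()
    scheduler.step()
    lrs.append(scheduler.get_last_lr()[0])
# lrs[1] (t = 2) is 0.1 * 0.5 ** 2 = 0.025; lrs[-1] (t = 4) is 0
```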

LambdaLR

Sets the learning rate of each parameter group to the initial lr multiplied by a factor computed by a user-supplied function.
torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda, last_epoch=-1)

Parameters

optimizer (Optimizer, required): Wrapped optimizer
lr_lambda (function | list[function], required): Function (or one function per parameter group) that computes a multiplicative factor given an integer epoch

Example

# One lambda per parameter group; this assumes the optimizer has two param groups
lambda1 = lambda epoch: epoch // 30
lambda2 = lambda epoch: 0.95 ** epoch
scheduler = LambdaLR(optimizer, lr_lambda=[lambda1, lambda2])

ChainedScheduler

Chains multiple learning rate schedulers.
torch.optim.lr_scheduler.ChainedScheduler(schedulers)

Parameters

schedulers (list, required): List of schedulers to chain

Example

scheduler1 = ConstantLR(optimizer, factor=0.1, total_iters=2)
scheduler2 = ExponentialLR(optimizer, gamma=0.9)
scheduler = ChainedScheduler([scheduler1, scheduler2])

SequentialLR

Receives a list of schedulers and milestones to switch between them.
torch.optim.lr_scheduler.SequentialLR(optimizer, schedulers, milestones, last_epoch=-1)

Parameters

optimizer (Optimizer, required): Wrapped optimizer
schedulers (list, required): List of schedulers to run in sequence
milestones (list[int], required): Epoch indices at which to switch to the next scheduler; must be increasing

Example

scheduler1 = ConstantLR(optimizer, factor=0.1, total_iters=2)
scheduler2 = ExponentialLR(optimizer, gamma=0.9)
scheduler = SequentialLR(optimizer, schedulers=[scheduler1, scheduler2], milestones=[2])
# Uses scheduler1 for epochs [0, 1], then scheduler2 for epochs [2, ...]

Common Methods

All schedulers implement these methods:

step()

scheduler.step(epoch=None)
Update the learning rate. Passing epoch explicitly is deprecated; call step() with no arguments.

get_last_lr()

lr_list = scheduler.get_last_lr()
Return last computed learning rate by scheduler.

state_dict()

state = scheduler.state_dict()
Return the state of the scheduler as a dict.

load_state_dict()

scheduler.load_state_dict(state_dict)
Load the scheduler’s state.
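
Together, state_dict() and load_state_dict() make a scheduler checkpointable alongside its optimizer. A minimal sketch; the model and scheduler settings are arbitrary, and in practice the checkpoint dict would be written with torch.save:

```python
import torch
from torch import nn, optim
from torch.optim.lr_scheduler import StepLR

model = nn.Linear(4, 1)  # arbitrary model for illustration
optimizer = optim.SGD(model.parameters(), lr=0.1)
scheduler = StepLR(optimizer, step_size=30, gamma=0.1)

# Advance a few epochs, then snapshot both states together
for _ in range(35):
    optimizer.step()
    scheduler.step()
checkpoint = {
    "optimizer": optimizer.state_dict(),
    "scheduler": scheduler.state_dict(),
}

# Restore into freshly constructed objects; the scheduler resumes
# at epoch 35 with the already-decayed learning rate
new_optimizer = optim.SGD(model.parameters(), lr=0.1)
new_scheduler = StepLR(new_optimizer, step_size=30, gamma=0.1)
new_optimizer.load_state_dict(checkpoint["optimizer"])
new_scheduler.load_state_dict(checkpoint["scheduler"])
```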

Cheat Sheet

Scheduler          Use Case
StepLR             Drop LR by a factor every N epochs
MultiStepLR        Drop LR at specific epoch milestones
ExponentialLR      Exponential decay each epoch
CosineAnnealingLR  Cosine annealing for smooth decay
ReduceLROnPlateau  Reduce LR when a validation metric plateaus
CyclicLR           Cyclical learning rates (per batch)
OneCycleLR         Super-convergence with the 1cycle policy
LinearLR           Linear warmup or decay

See Also