## Overview
Learning rate schedulers provide different strategies for adjusting the learning rate during training. All schedulers inherit from `torch.optim.lr_scheduler.LRScheduler`.
## Basic Usage
Call `scheduler.step()` after `optimizer.step()` in each epoch.

## StepLR
Decays the learning rate by `gamma` every `step_size` epochs.
### Parameters
- `optimizer` – Wrapped optimizer
- `step_size` – Period of learning rate decay (in epochs)
- `gamma` – Multiplicative factor of learning rate decay
- `last_epoch` – The index of the last epoch; use `-1` for fresh training

### Example
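A minimal sketch of the usual pattern (the `model` here is a hypothetical stand-in):

```python
import torch

model = torch.nn.Linear(10, 2)  # hypothetical stand-in model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
# lr = 0.1 for epochs 0-29, 0.01 for epochs 30-59, 0.001 for epochs 60-89
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)

for epoch in range(90):
    # forward/backward passes would go here
    optimizer.step()
    scheduler.step()  # called once per epoch, after optimizer.step()
```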
## MultiStepLR
Decays the learning rate by `gamma` at specified epoch milestones.

### Parameters
- `optimizer` – Wrapped optimizer
- `milestones` – List of epoch indices; must be increasing
- `gamma` – Multiplicative factor of learning rate decay

### Example
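A minimal sketch (the `model` is a hypothetical stand-in):

```python
import torch

model = torch.nn.Linear(10, 2)  # hypothetical stand-in model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
# Drop the LR by 10x at epochs 30 and 80
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[30, 80], gamma=0.1)

for epoch in range(100):
    optimizer.step()
    scheduler.step()
```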
## ExponentialLR
Decays the learning rate by `gamma` every epoch.

### Parameters
- `optimizer` – Wrapped optimizer
- `gamma` – Multiplicative factor of learning rate decay

### Example
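A minimal sketch (the `model` is a hypothetical stand-in):

```python
import torch

model = torch.nn.Linear(10, 2)  # hypothetical stand-in model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.9)

for epoch in range(5):
    optimizer.step()
    scheduler.step()  # lr is multiplied by 0.9 each epoch
```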
## CosineAnnealingLR
Sets the learning rate using a cosine annealing schedule.

### Parameters
- `optimizer` – Wrapped optimizer
- `T_max` – Maximum number of iterations (half period of the cosine schedule)
- `eta_min` – Minimum learning rate

### Example
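A minimal sketch (the `model` is a hypothetical stand-in):

```python
import torch

model = torch.nn.Linear(10, 2)  # hypothetical stand-in model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
# Anneal from 0.1 down to 0.001 over 50 epochs
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=50, eta_min=0.001)

for epoch in range(50):
    optimizer.step()
    scheduler.step()
```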
### Formula
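Writing $\eta_{\max}$ for the initial learning rate and $T_{cur}$ for the number of steps taken so far, the schedule is:

$$\eta_t = \eta_{\min} + \frac{1}{2}\left(\eta_{\max} - \eta_{\min}\right)\left(1 + \cos\!\left(\frac{T_{cur}}{T_{max}}\,\pi\right)\right)$$

At $T_{cur} = 0$ this gives $\eta_{\max}$, and at $T_{cur} = T_{max}$ it gives $\eta_{\min}$.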
## ReduceLROnPlateau
Reduces the learning rate when a metric has stopped improving.

### Parameters
- `optimizer` – Wrapped optimizer
- `mode` – One of `'min'` or `'max'`. In min mode, the LR is reduced when the metric stops decreasing
- `factor` – Factor by which the learning rate is reduced: `new_lr = lr * factor`
- `patience` – Number of epochs with no improvement after which the learning rate is reduced
- `threshold` – Threshold for measuring the new optimum
- `threshold_mode` – One of `'rel'` or `'abs'`. In rel mode, the threshold is relative to the best metric
- `cooldown` – Number of epochs to wait before resuming normal operation after the LR has been reduced
- `min_lr` – Scalar or list of scalars; lower bound on the learning rate

### Example
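A minimal sketch; the `model` and the constant `val_loss` are hypothetical stand-ins for a real training/validation loop:

```python
import torch

model = torch.nn.Linear(10, 2)  # hypothetical stand-in model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode='min', factor=0.1, patience=10)

for epoch in range(30):
    val_loss = 1.0  # hypothetical (here, never-improving) validation loss
    optimizer.step()
    scheduler.step(val_loss)  # the monitored metric must be passed in
```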
Unlike other schedulers, `ReduceLROnPlateau` requires a metric argument in `step()`.

## CyclicLR
Sets the learning rate according to a cyclical learning rate policy.

### Parameters
- `optimizer` – Wrapped optimizer
- `base_lr` – Initial learning rate (lower boundary in the cycle)
- `max_lr` – Upper learning rate boundary in the cycle
- `step_size_up` – Number of training iterations in the increasing half of a cycle
- `mode` – One of `'triangular'`, `'triangular2'`, or `'exp_range'`

### Example
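A minimal sketch (the `model` is a hypothetical stand-in; momentum is set because `CyclicLR` cycles it by default):

```python
import torch

model = torch.nn.Linear(10, 2)  # hypothetical stand-in model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
scheduler = torch.optim.lr_scheduler.CyclicLR(
    optimizer, base_lr=0.001, max_lr=0.1, step_size_up=2000, mode='triangular')

for batch in range(4000):  # one full cycle: 2000 iterations up, 2000 down
    optimizer.step()
    scheduler.step()  # CyclicLR is stepped per batch, not per epoch
```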
## OneCycleLR
Sets the learning rate according to the 1cycle learning rate policy.

### Parameters
- `optimizer` – Wrapped optimizer
- `max_lr` – Upper learning rate boundary in the cycle
- `total_steps` – Total number of steps in the cycle; either provide this or (`epochs`, `steps_per_epoch`)
- `epochs` – Number of epochs to train for
- `steps_per_epoch` – Number of steps per epoch
- `pct_start` – Percentage of the cycle spent increasing the learning rate
- `anneal_strategy` – One of `'cos'` or `'linear'`

### Example
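A minimal sketch (the `model` is a hypothetical stand-in):

```python
import torch

model = torch.nn.Linear(10, 2)  # hypothetical stand-in model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=0.1, epochs=10, steps_per_epoch=100)

for epoch in range(10):
    for batch in range(100):
        optimizer.step()
        scheduler.step()  # stepped per batch: exactly epochs * steps_per_epoch calls
```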
## LinearLR
Decays the learning rate linearly.

### Parameters
- `optimizer` – Wrapped optimizer
- `start_factor` – The number the learning rate is multiplied by in the first epoch
- `end_factor` – The number the learning rate is multiplied by at the end of the linear change
- `total_iters` – Number of iterations until the multiplicative factor reaches `end_factor`
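A minimal warmup sketch (the `model` is a hypothetical stand-in):

```python
import torch

model = torch.nn.Linear(10, 2)  # hypothetical stand-in model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
# Warm up from 10% to 100% of the base LR over the first 5 epochs
scheduler = torch.optim.lr_scheduler.LinearLR(
    optimizer, start_factor=0.1, end_factor=1.0, total_iters=5)

for epoch in range(5):
    optimizer.step()
    scheduler.step()
```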
## ConstantLR
Multiplies the learning rate by a constant factor until `total_iters` is reached.

### Parameters
- `optimizer` – Wrapped optimizer
- `factor` – The number the learning rate is multiplied by until `total_iters`
- `total_iters` – Number of epochs with the constant learning rate
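A minimal sketch (the `model` is a hypothetical stand-in):

```python
import torch

model = torch.nn.Linear(10, 2)  # hypothetical stand-in model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
# Train at half the base LR for 4 epochs, then at the full base LR
scheduler = torch.optim.lr_scheduler.ConstantLR(optimizer, factor=0.5, total_iters=4)

for epoch in range(6):
    optimizer.step()
    scheduler.step()
```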
## PolynomialLR
Decays the learning rate using a polynomial function.

### Parameters
- `optimizer` – Wrapped optimizer
- `total_iters` – Number of iterations over which to decay the learning rate
- `power` – Power of the polynomial
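A minimal sketch (the `model` is a hypothetical stand-in):

```python
import torch

model = torch.nn.Linear(10, 2)  # hypothetical stand-in model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
# Quadratic decay from 0.1 down to 0 over 30 iterations
scheduler = torch.optim.lr_scheduler.PolynomialLR(
    optimizer, total_iters=30, power=2.0)

for epoch in range(30):
    optimizer.step()
    scheduler.step()
```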
## LambdaLR
Sets the learning rate using a custom lambda function.

### Parameters
- `optimizer` – Wrapped optimizer
- `lr_lambda` – Function or list of functions that computes a multiplicative factor given an integer epoch

### Example
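A minimal sketch (the `model` is a hypothetical stand-in):

```python
import torch

model = torch.nn.Linear(10, 2)  # hypothetical stand-in model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
# The lambda returns a factor that multiplies the *initial* LR each epoch
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lr_lambda=lambda epoch: 0.95 ** epoch)

for epoch in range(10):
    optimizer.step()
    scheduler.step()
```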
## ChainedScheduler
Chains multiple learning rate schedulers.

### Parameters
- `schedulers` – List of chained schedulers

### Example
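A minimal sketch (the `model` is a hypothetical stand-in); each call to `step()` applies every chained scheduler in order, so their multiplicative effects combine:

```python
import torch

model = torch.nn.Linear(10, 2)  # hypothetical stand-in model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
# Halve the LR for the first 4 epochs while also decaying it exponentially
scheduler = torch.optim.lr_scheduler.ChainedScheduler([
    torch.optim.lr_scheduler.ConstantLR(optimizer, factor=0.5, total_iters=4),
    torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.9),
])

for epoch in range(4):
    optimizer.step()
    scheduler.step()
```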
## SequentialLR
Receives a list of schedulers and milestones at which to switch between them.

### Parameters
- `optimizer` – Wrapped optimizer
- `schedulers` – List of schedulers
- `milestones` – List of integers that reflects the milestone points

### Example
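A common warmup-then-anneal sketch (the `model` is a hypothetical stand-in):

```python
import torch

model = torch.nn.Linear(10, 2)  # hypothetical stand-in model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
# 5 epochs of linear warmup, then cosine annealing for the remaining 95
warmup = torch.optim.lr_scheduler.LinearLR(optimizer, start_factor=0.1, total_iters=5)
cosine = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=95)
scheduler = torch.optim.lr_scheduler.SequentialLR(
    optimizer, schedulers=[warmup, cosine], milestones=[5])

for epoch in range(100):
    optimizer.step()
    scheduler.step()
```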
## Common Methods
All schedulers implement these methods:
- `step()` – Update the learning rate
- `get_last_lr()` – Return the last computed learning rate(s)
- `state_dict()` – Return the scheduler state for checkpointing
- `load_state_dict()` – Restore the scheduler state
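A minimal checkpointing sketch (the `model` is a hypothetical stand-in):

```python
import torch

model = torch.nn.Linear(10, 2)  # hypothetical stand-in model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)

lrs = scheduler.get_last_lr()     # one entry per parameter group, e.g. [0.1]
state = scheduler.state_dict()    # serializable dict, suitable for checkpointing

# Restore into a freshly built scheduler, e.g. after loading a checkpoint
restored = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)
restored.load_state_dict(state)
```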
## Cheat Sheet
| Scheduler | Use Case |
|---|---|
| StepLR | Drop LR by a factor every N epochs |
| MultiStepLR | Drop LR at specific epoch milestones |
| ExponentialLR | Exponential decay each epoch |
| CosineAnnealingLR | Cosine annealing for smooth decay |
| ReduceLROnPlateau | Reduce LR when validation metric plateaus |
| CyclicLR | Cyclical learning rates (per batch) |
| OneCycleLR | Super-convergence with 1cycle policy |
| LinearLR | Linear warmup or decay |
## See Also
- Optimizers - Optimization algorithms
- Learning Rate Finder Tutorial