Overview
torch.autograd provides classes and functions for automatic differentiation of arbitrary scalar-valued functions. It requires minimal changes to existing code: you only need to declare the Tensors with requires_grad=True.
Supported types: floating point (half, float, double, bfloat16) and complex (cfloat, cdouble) tensors.
Core Functions
torch.autograd.backward()

Computes the sum of gradients of given tensors with respect to graph leaves. This function accumulates gradients in the leaves; you might need to zero the .grad attributes before calling it.

Parameters:
- tensors: Tensors of which the derivative will be computed.
- grad_tensors: The “vector” in the Jacobian-vector product. Should be a sequence of matching length. Required for non-scalar tensors.
- retain_graph: If False, the graph used to compute the gradients will be freed. Defaults to the value of create_graph.
- create_graph: If True, the graph of the derivative will be constructed, allowing higher-order derivatives.
- inputs: Inputs w.r.t. which the gradient will be accumulated into .grad. If not provided, gradients accumulate into all leaf tensors.
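A minimal usage sketch, equivalent to calling tensor.backward() on a scalar output:

```python
import torch

# Leaf tensor with gradient tracking enabled
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = (x ** 2).sum()  # scalar output

# Accumulates d(y)/d(x) into x.grad
torch.autograd.backward(y)

print(x.grad)  # tensor([2., 4., 6.]), since d/dx x^2 = 2x
```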
torch.autograd.grad()

Computes and returns the sum of gradients of outputs with respect to inputs.

Parameters:
- outputs: Outputs of the differentiated function.
- inputs: Inputs w.r.t. which the gradient will be returned (not accumulated into .grad).
- grad_outputs: The “vector” in the vector-Jacobian product. Usually gradients w.r.t. each output.
- retain_graph: If False, the graph will be freed. Defaults to the value of create_graph.
- create_graph: If True, the graph of the derivative will be constructed, allowing higher-order derivatives.
- allow_unused: If False, raise an error if any input was not used when computing the outputs. Defaults to the value of materialize_grads.
- is_grads_batched: If True, the first dimension of each tensor in grad_outputs is treated as a batch dimension for vectorized Jacobian computation.
- materialize_grads: If True, set the gradient for unused inputs to zero instead of None.

Returns: tuple of gradients, one for each input.
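A minimal sketch contrasting grad() with backward(): gradients are returned rather than accumulated.

```python
import torch

x = torch.tensor([1.0, 2.0], requires_grad=True)
y = (x ** 3).sum()

# Returns the gradients as a tuple instead of writing to x.grad
(grads,) = torch.autograd.grad(outputs=y, inputs=x)

print(grads)   # tensor([ 3., 12.]), since d/dx x^3 = 3x^2
print(x.grad)  # None: grad() does not touch .grad
```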
Gradient Control
torch.autograd.no_grad()

Context manager that disables gradient calculation. Disabling gradient calculation is useful for inference, when you are sure you will not call backward(). It reduces memory consumption and speeds up computations. Can also be used as a decorator.
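A short sketch of both usages, as a context manager and as a decorator:

```python
import torch

x = torch.ones(3, requires_grad=True)

# As a context manager: operations inside are not tracked
with torch.no_grad():
    y = x * 2
assert not y.requires_grad

# As a decorator: the whole function runs without tracking
@torch.no_grad()
def double(t):
    return t * 2

z = double(x)
assert not z.requires_grad
```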
torch.autograd.enable_grad()

Context manager that enables gradient calculation. Enables gradients in a region where they were disabled (e.g., inside a no_grad context).
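A minimal sketch of re-enabling gradients inside a disabled region:

```python
import torch

x = torch.ones(2, requires_grad=True)

with torch.no_grad():
    with torch.enable_grad():  # re-enable tracking inside no_grad
        y = x * 3

assert y.requires_grad  # tracking was restored for y
```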
torch.autograd.set_grad_enabled()

Context manager to set gradient calculation on or off. Useful for conditionally enabling or disabling gradients.

Parameters:
- mode: Flag whether to enable grad (True) or disable (False).
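A sketch of conditional gradient tracking driven by a flag:

```python
import torch

def forward(x, train):
    # Tracking is enabled only when train=True
    with torch.set_grad_enabled(train):
        return x * 2

x = torch.ones(2, requires_grad=True)
assert forward(x, train=True).requires_grad
assert not forward(x, train=False).requires_grad
```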
torch.autograd.inference_mode()

Context manager that disables autograd and reduces overhead. Similar to no_grad() but faster and with more restrictions: tensors created in inference mode cannot be used with autograd afterward.

Key differences from no_grad():
- Lower overhead (faster)
- Tensors created inside cannot require gradients later
- View relationships are not tracked
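The restrictions above can be observed directly; a minimal sketch:

```python
import torch

with torch.inference_mode():
    t = torch.ones(3) * 2  # created as an inference tensor

assert t.is_inference()

# Inference tensors cannot be pulled into autograd later:
try:
    t.requires_grad_(True)
except RuntimeError:
    print("inference tensors cannot be used with autograd")
```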
Debugging and Validation
torch.autograd.gradcheck()

Checks gradients computed via small finite differences against analytical gradients.

Parameters:
- func: A Python function that takes Tensor inputs and returns a Tensor or tuple of Tensors.
- inputs: Inputs to the function.
- eps: Perturbation for finite differences.
- atol: Absolute tolerance.
- rtol: Relative tolerance.
- raise_exception: Whether to raise an exception on failure.

Returns: True if gradients are correct.
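A minimal sketch; note that gradcheck expects double-precision inputs for numerical stability:

```python
import torch
from torch.autograd import gradcheck

# Double precision keeps finite-difference error small
inp = torch.randn(3, dtype=torch.double, requires_grad=True)

# Compare the analytical gradient of torch.sin against finite differences
ok = gradcheck(torch.sin, (inp,), eps=1e-6, atol=1e-4)
print(ok)  # True
```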
torch.autograd.gradgradcheck()

Checks gradients of gradients (second derivatives). The remaining parameters are similar to gradcheck().

Parameters:
- func: A Python function that takes Tensor inputs and returns a Tensor or tuple of Tensors.
- inputs: Inputs to the function.
- grad_outputs: Gradient outputs for computing second derivatives.
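A minimal sketch verifying second derivatives numerically:

```python
import torch
from torch.autograd import gradgradcheck

inp = torch.randn(3, dtype=torch.double, requires_grad=True)

# Verify second derivatives of x**3 (d2/dx2 = 6x) against finite differences
ok = gradgradcheck(lambda x: (x ** 3).sum(), (inp,))
print(ok)  # True
```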
torch.autograd.detect_anomaly()

Context manager that enables anomaly detection for autograd. When enabled, the forward pass runs with anomaly detection, and the backward pass raises an error pointing at the forward operation that produced a NaN gradient.

Parameters:
- check_nan: Whether to check for NaN in the backward pass.
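A sketch that deliberately produces a NaN so the anomaly is caught in backward:

```python
import torch

x = torch.tensor([-1.0], requires_grad=True)
caught = False

with torch.autograd.detect_anomaly():
    y = torch.sqrt(x)    # sqrt of a negative number: NaN in forward
    try:
        y.backward()     # backward of sqrt propagates the NaN
    except RuntimeError:
        caught = True    # anomaly detection raised and named SqrtBackward

print("NaN gradient detected:", caught)
```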
Custom Functions
torch.autograd.Function

Base class for creating custom autograd functions. To create a custom function, subclass Function and implement the forward() and backward() static methods.

Methods:
- forward(ctx, *args, **kwargs): Performs the operation. Must be implemented by subclass.
  - ctx: Context object to save information for backward.
  - *args, **kwargs: Input tensors and other arguments.
  - Returns: Output tensor(s).
- backward(ctx, *grad_outputs): Defines the gradient formula. Must be implemented by subclass.
  - ctx: Context object with saved tensors.
  - *grad_outputs: Gradients w.r.t. the outputs.
  - Returns: Gradients w.r.t. the inputs (one per input, or None).
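A minimal custom Function sketch computing x**2 with a hand-written gradient:

```python
import torch

class Square(torch.autograd.Function):
    """Custom autograd op: forward computes x**2, backward applies 2x."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)  # stash the input for the backward pass
        return x ** 2

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        return grad_output * 2 * x  # chain rule: d(x^2)/dx = 2x

x = torch.tensor([3.0], requires_grad=True)
y = Square.apply(x)  # custom functions are invoked via .apply()
y.backward()
print(x.grad)  # tensor([6.])
```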
Context Methods

Methods available on ctx (the context object) in custom functions:
- ctx.save_for_backward(*tensors): Saves tensors to use in the backward pass.
- ctx.saved_tensors: Retrieves the saved tensors in backward.
- ctx.mark_non_differentiable(*tensors): Marks output tensors as non-differentiable.
- ctx.set_materialize_grads(value): Sets whether to materialize gradient tensors that are None.

Advanced Features
Gradient Accumulation

Gradients accumulate in the .grad attribute by default; zero them between steps when accumulation is not desired.
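A minimal sketch showing accumulation across two backward calls, then resetting:

```python
import torch

x = torch.tensor([1.0], requires_grad=True)

(x * 2).backward()
(x * 3).backward()
print(x.grad)    # tensor([5.]): 2 + 3 accumulated

x.grad.zero_()   # reset before the next step
print(x.grad)    # tensor([0.])
```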
Higher Order Gradients
Compute gradients of gradients:
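A minimal sketch: create_graph=True keeps the graph of the first derivative so it can itself be differentiated.

```python
import torch

x = torch.tensor(2.0, requires_grad=True)
y = x ** 3

# First derivative, kept differentiable
(first,) = torch.autograd.grad(y, x, create_graph=True)   # 3x^2 = 12
# Second derivative, taken through the first
(second,) = torch.autograd.grad(first, x)                  # 6x   = 12

print(first.item(), second.item())  # 12.0 12.0
```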
Checkpointing
Trade compute for memory by recomputing forward pass during backward:
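A minimal sketch using torch.utils.checkpoint; the block's activations are recomputed during backward instead of being stored:

```python
import torch
from torch.utils.checkpoint import checkpoint

def block(x):
    # Intermediate activations here are not kept in memory
    return torch.relu(x @ x.t()).sum()

x = torch.randn(8, 8, requires_grad=True)
out = checkpoint(block, x, use_reentrant=False)
out.backward()
assert x.grad is not None  # gradients still flow through the checkpoint
```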
Jacobian and Hessian
Compute full Jacobian or Hessian matrices:
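A minimal sketch using torch.autograd.functional:

```python
import torch
from torch.autograd.functional import jacobian, hessian

def f(x):
    return (x ** 2).sum()

x = torch.tensor([1.0, 2.0])
J = jacobian(f, x)  # gradient of a scalar function: 2x
H = hessian(f, x)   # constant second derivative: 2 * I

print(J)  # tensor([2., 4.])
print(H)  # tensor([[2., 0.], [0., 2.]])
```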
Performance Tips
Reduce Memory Usage

- Use inference_mode() for inference.
- Clear gradients efficiently, e.g. with zero_grad(set_to_none=True).
- Use gradient checkpointing for large models.
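The second tip can be sketched as follows: set_to_none=True frees gradient memory rather than filling it with zeros.

```python
import torch

model = torch.nn.Linear(4, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

loss = model(torch.randn(3, 4)).sum()
loss.backward()

# Frees the gradient buffers instead of writing zeros into them
opt.zero_grad(set_to_none=True)
assert model.weight.grad is None
```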
Avoid Common Pitfalls

- In-place operations can cause errors when autograd still needs the original values.
- Detach tensors when mixing autograd and non-autograd code.
- retain_graph=True can leak memory if the graph is kept alive unnecessarily.
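The first two pitfalls can be sketched concretely:

```python
import torch

# 1. In-place ops on tensors needed for backward fail at backward time
x = torch.tensor([1.0, 2.0], requires_grad=True)
y = x.exp()              # backward of exp reuses its output...
inplace_error = False
try:
    y.add_(1)            # ...so mutating y in place invalidates the graph
    y.sum().backward()
except RuntimeError:
    inplace_error = True
print("in-place error raised:", inplace_error)

# 2. detach() to leave the graph, e.g. for logging or NumPy interop
val = y.detach()
assert not val.requires_grad
```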
Related APIs

- torch Module: Core PyTorch functions
- Tensor API: Tensor operations and methods
- torch.nn: Neural network modules
- torch.optim: Optimization algorithms