PyTorch’s autograd package provides automatic differentiation for all operations on tensors. This is the foundation of training neural networks using backpropagation.
torch.autograd automatically computes gradients (derivatives) of tensor operations. When you perform operations on tensors with requires_grad=True, PyTorch builds a computational graph and can automatically compute gradients via backpropagation.
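The simplest case is a scalar output, where backward() needs no arguments. A minimal sketch (variable names are illustrative):

```python
import torch

# A scalar output: gradients can be computed with a bare backward()
x = torch.tensor(3.0, requires_grad=True)
y = x ** 2 + 2 * x   # dy/dx = 2x + 2

y.backward()
print(x.grad)  # tensor(8.)  since 2*3 + 2 = 8
```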
Autograd is a define-by-run framework: the computational graph is built dynamically as operations execute. This makes it easy to use control flow (if statements, loops) in your models.
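Because the graph is rebuilt on every forward pass, ordinary Python control flow participates naturally. A small sketch (the function and values are made up for illustration):

```python
import torch

def f(x):
    # The graph is built as this Python code runs, so branches and loops are fine
    if x.sum() > 0:
        return (x ** 2).sum()
    out = x
    for _ in range(3):   # each loop iteration adds nodes to the graph
        out = out * 2
    return out.sum()

x = torch.tensor([1.0, 2.0], requires_grad=True)
f(x).backward()
print(x.grad)  # tensor([2., 4.]) — the positive branch was taken, so grad is 2*x
```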
For non-scalar outputs, you must pass a gradient argument to backward():
```python
import torch

x = torch.tensor([[1.0, 2.0], [3.0, 4.0]], requires_grad=True)
y = x ** 2

# For a non-scalar output, backward() needs a gradient argument
grad_output = torch.ones_like(y)
y.backward(grad_output)

print(x.grad)
# tensor([[2., 4.],
#         [6., 8.]])
# Gradient is 2*x element-wise
```
How Computational Graphs Work

PyTorch builds a dynamic computational graph (a DAG) to track operations.
When you perform operations on tensors with requires_grad=True, PyTorch creates a graph of Function objects. Each tensor has a .grad_fn attribute pointing to the function that created it. During .backward(), PyTorch traverses this graph in reverse (backpropagation) to compute gradients using the chain rule.
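You can see this bookkeeping directly by inspecting .grad_fn. A short sketch (the exact Function names printed may vary by PyTorch version):

```python
import torch

x = torch.tensor([1.0, 2.0], requires_grad=True)
y = x ** 2
z = y.sum()

# Each intermediate tensor records the Function that produced it
print(y.grad_fn)   # e.g. <PowBackward0 ...>
print(z.grad_fn)   # e.g. <SumBackward0 ...>
# Leaf tensors created by the user have no grad_fn
print(x.grad_fn)   # None

z.backward()       # traverse the graph in reverse, applying the chain rule
print(x.grad)      # tensor([2., 4.])  (d/dx of sum(x**2) = 2x)
```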
torch.no_grad() disables gradient tracking (useful for inference):
```python
import torch

x = torch.tensor([1.0], requires_grad=True)

# Normal operation
y = x ** 2
print(y.requires_grad)  # True

# With no_grad
with torch.no_grad():
    y = x ** 2
    print(y.requires_grad)  # False

# As a decorator
@torch.no_grad()
def inference(model, x):
    return model(x)
```
torch.inference_mode() is faster than no_grad() for inference, but more restrictive:
```python
import torch
import torch.nn as nn

model = nn.Linear(10, 5)
x = torch.randn(32, 10)

with torch.inference_mode():
    output = model(x)
# Faster than no_grad, but tensors created inside
# can't be used in autograd computations later
```
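The restriction is worth seeing concretely: tensors created under inference_mode are marked as inference tensors, and using one in a computation that autograd would need to record raises a RuntimeError. A minimal sketch of that behavior:

```python
import torch

x = torch.tensor([2.0], requires_grad=True)
with torch.inference_mode():
    y = x * 3  # y is an "inference tensor"

# Mixing y back into a grad-tracked computation fails
raised = False
try:
    (y * x).sum().backward()
except RuntimeError:
    raised = True
print("RuntimeError raised:", raised)
```

If you need the result outside inference mode, clone it in normal mode instead.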
Gradient clipping keeps exploding gradients in check (a common issue with RNNs):

```python
import torch
import torch.nn as nn

model = nn.LSTM(10, 20, 2)
optimizer = torch.optim.Adam(model.parameters())

# Inside the training loop (a toy loss, for illustration):
output, _ = model(torch.randn(5, 3, 10))
loss = output.sum()
loss.backward()

# Clip gradients by norm
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
# Or clip by value
# torch.nn.utils.clip_grad_value_(model.parameters(), clip_value=0.5)

optimizer.step()
```
Anomaly detection helps locate the operation that produced a bad gradient:

```python
import torch

# Enable anomaly detection (slower; use only for debugging)
with torch.autograd.detect_anomaly():
    x = torch.tensor([1.0], requires_grad=True)
    y = x ** 2
    z = 1 / (y - 1)  # Produces inf when y == 1
    z.backward()
```