Overview
The torch.nn.functional module provides functional implementations of neural network operations. Unlike torch.nn modules, these functions are stateless and don’t maintain learnable parameters.
Use F as an alias: import torch.nn.functional as F
Convolution Operations
conv2d
F.conv2d(input, weight, bias=None, stride=1, padding=0, dilation=1, groups=1)
Applies a 2D convolution over an input image.
- input: Input tensor of shape (minibatch, in_channels, iH, iW)
- weight: Filters of shape (out_channels, in_channels/groups, kH, kW)
- bias: Optional bias tensor of shape (out_channels). Default: None
- stride: Stride of the convolving kernel. Default: 1
- padding (int | tuple | str, default: 0): Padding added to both sides of the input. Can be 'valid', 'same', or an integer
- dilation: Spacing between kernel elements. Default: 1
- groups: Number of blocked connections from input to output channels. Default: 1
Example:
import torch
import torch.nn.functional as F
filters = torch.randn(8, 4, 3, 3)
inputs = torch.randn(1, 4, 5, 5)
output = F.conv2d(inputs, filters, padding=1)
conv1d
F.conv1d(input, weight, bias=None, stride=1, padding=0, dilation=1, groups=1)
Applies a 1D convolution over an input signal.
Shape:
- Input:
(minibatch, in_channels, iW)
- Weight:
(out_channels, in_channels/groups, kW)
- Output:
(minibatch, out_channels, oW)
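A minimal sketch of conv1d, with made-up shapes (16 filters of width 5 over a 3-channel signal):

```python
import torch
import torch.nn.functional as F

# Hypothetical shapes: 16 output channels, 3 input channels, kernel width 5
filters = torch.randn(16, 3, 5)
inputs = torch.randn(8, 3, 50)  # batch of 8 signals, length 50
output = F.conv1d(inputs, filters, padding=2)  # padding=2 keeps oW == iW for stride 1, kernel 5
print(output.shape)  # torch.Size([8, 16, 50])
```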
conv3d
F.conv3d(input, weight, bias=None, stride=1, padding=0, dilation=1, groups=1)
Applies a 3D convolution over an input signal.
Shape:
- Input:
(minibatch, in_channels, iT, iH, iW)
- Weight:
(out_channels, in_channels/groups, kT, kH, kW)
- Output:
(minibatch, out_channels, oT, oH, oW)
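A minimal sketch of conv3d on a volumetric input, with made-up shapes:

```python
import torch
import torch.nn.functional as F

# Hypothetical shapes: 8 filters over a 4-channel volumetric input
filters = torch.randn(8, 4, 3, 3, 3)
inputs = torch.randn(2, 4, 10, 16, 16)  # (minibatch, in_channels, T, H, W)
output = F.conv3d(inputs, filters, padding=1)  # padding=1 preserves T, H, W for a 3x3x3 kernel
print(output.shape)  # torch.Size([2, 8, 10, 16, 16])
```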
Pooling Operations
max_pool2d
F.max_pool2d(input, kernel_size, stride=None, padding=0, dilation=1,
ceil_mode=False, return_indices=False)
Applies 2D max pooling over an input signal.
- input: Input tensor of shape (minibatch, in_channels, iH, iW)
- kernel_size: Size of the pooling region
- stride (int | tuple, default: kernel_size): Stride of the pooling operation
- padding: Implicit negative infinity padding on both sides. Default: 0
- dilation: Stride between elements within a sliding window. Default: 1
- ceil_mode: Use ceil instead of floor to compute output shape. Default: False
- return_indices: Return the max indices along with outputs. Default: False
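A quick sketch with a hypothetical 8x8 feature map, including the return_indices variant:

```python
import torch
import torch.nn.functional as F

input = torch.randn(1, 3, 8, 8)  # hypothetical 8x8 feature map
output = F.max_pool2d(input, kernel_size=2)  # stride defaults to kernel_size
print(output.shape)  # torch.Size([1, 3, 4, 4])

# return_indices=True also returns the argmax locations (useful for max_unpool2d)
output, indices = F.max_pool2d(input, kernel_size=2, return_indices=True)
```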
avg_pool2d
F.avg_pool2d(input, kernel_size, stride=None, padding=0, ceil_mode=False,
count_include_pad=True, divisor_override=None)
Applies 2D average pooling over an input signal.
- count_include_pad: Include zero-padding in the averaging calculation. Default: True
- divisor_override: If specified, used as divisor instead of pooling region size. Default: None
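A small sketch (contrived 2x2 input) showing how count_include_pad changes the result at padded borders:

```python
import torch
import torch.nn.functional as F

# count_include_pad controls whether padded zeros enter the average
x = torch.ones(1, 1, 2, 2)
with_pad = F.avg_pool2d(x, kernel_size=2, padding=1, count_include_pad=True)
without_pad = F.avg_pool2d(x, kernel_size=2, padding=1, count_include_pad=False)
print(with_pad[0, 0, 0, 0].item())     # 0.25 — three padded zeros dilute the average
print(without_pad[0, 0, 0, 0].item())  # 1.0  — only the real element is counted
```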
adaptive_avg_pool2d
F.adaptive_avg_pool2d(input, output_size)
Applies 2D adaptive average pooling.
- output_size: Target output size (H_out, W_out)
Example:
input = torch.randn(1, 64, 8, 9)
output = F.adaptive_avg_pool2d(input, (5, 7))
print(output.shape) # torch.Size([1, 64, 5, 7])
Activation Functions
relu
F.relu(input, inplace=False)
Applies the rectified linear unit function element-wise: ReLU(x) = max(0, x)
- inplace: Perform the operation in-place. Default: False
Example:
input = torch.randn(2)
output = F.relu(input)
leaky_relu
F.leaky_relu(input, negative_slope=0.01, inplace=False)
Applies element-wise: LeakyReLU(x) = max(0, x) + negative_slope * min(0, x)
- negative_slope: Controls the angle of the negative slope. Default: 0.01
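A minimal sketch showing the effect of negative_slope on negative inputs:

```python
import torch
import torch.nn.functional as F

x = torch.tensor([-2.0, 0.0, 3.0])
out = F.leaky_relu(x, negative_slope=0.1)  # negatives are scaled by 0.1 instead of clamped
print(out)  # tensor([-0.2000, 0.0000, 3.0000])
```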
gelu
F.gelu(input, approximate='none')
Applies the Gaussian Error Linear Units function.
- approximate: Approximation type, 'none' or 'tanh'. Default: 'none'
When approximate='none': GELU(x) = x * Φ(x), where Φ(x) is the Cumulative Distribution Function for the Gaussian Distribution.
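A sketch verifying the exact form against the closed-form x * Φ(x), with Φ built from erf:

```python
import torch
import torch.nn.functional as F

x = torch.linspace(-3, 3, 7)
exact = F.gelu(x)  # approximate='none' uses the exact Gaussian CDF
# The same values via x * Phi(x), where Phi(x) = 0.5 * (1 + erf(x / sqrt(2)))
phi = 0.5 * (1.0 + torch.erf(x / 2.0 ** 0.5))
print(torch.allclose(exact, x * phi, atol=1e-6))  # True
```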
sigmoid
F.sigmoid(input)
Applies the element-wise sigmoid function: σ(x) = 1 / (1 + exp(-x))
tanh
F.tanh(input)
Applies the hyperbolic tangent function element-wise: tanh(x) = (exp(x) - exp(-x)) / (exp(x) + exp(-x))
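Both are also available as torch.sigmoid and torch.tanh, which recent PyTorch releases recommend over the functional aliases. A minimal check of the fixed points at zero:

```python
import torch

x = torch.tensor([-1.0, 0.0, 1.0])
s = torch.sigmoid(x)  # torch.sigmoid is the recommended spelling
t = torch.tanh(x)
print(s[1].item())  # 0.5 — sigmoid(0) is exactly 0.5
print(t[1].item())  # 0.0 — tanh(0) is exactly 0.0
```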
softmax
F.softmax(input, dim=None, dtype=None)
Applies the Softmax function to an n-dimensional input Tensor.
- dim: Dimension along which Softmax will be computed
- dtype (torch.dtype, default: None): Desired data type of returned tensor
Formula:
Softmax(x_i) = exp(x_i) / Σ_j exp(x_j)
Example:
input = torch.randn(2, 3)
output = F.softmax(input, dim=1)
print(output.sum(dim=1)) # tensor([1., 1.])
log_softmax
F.log_softmax(input, dim=None, dtype=None)
Applies a softmax followed by a logarithm.
For numerical stability and better gradient flow, use log_softmax instead of manually computing log(softmax(x)).
Example:
input = torch.randn(2, 3)
output = F.log_softmax(input, dim=1)
elu
F.elu(input, alpha=1.0, inplace=False)
Applies the Exponential Linear Unit (ELU) function.
- alpha: The α value for the ELU formulation. Default: 1.0
selu
F.selu(input, inplace=False)
Applies the Scaled Exponential Linear Unit (SELU) function.
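A short sketch contrasting the two on the same input; SELU multiplies an ELU-like curve by a fixed scale (approximately 1.0507) chosen for self-normalizing networks:

```python
import torch
import torch.nn.functional as F

x = torch.tensor([-1.0, 0.0, 2.0])
e = F.elu(x)   # x for x > 0, alpha * (exp(x) - 1) for x <= 0
s = F.selu(x)  # like ELU, but with fixed alpha and an extra fixed output scale
print(e)  # tensor([-0.6321, 0.0000, 2.0000])
print(s)
```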
Normalization Functions
batch_norm
F.batch_norm(input, running_mean, running_var, weight=None, bias=None,
training=False, momentum=0.1, eps=1e-05)
Applies Batch Normalization for each channel across a batch of data.
- input: Input tensor of shape (N, C, *) where * means any spatial dimensions
- running_mean: Running mean tensor of shape (C,)
- running_var: Running variance tensor of shape (C,)
- weight: Learnable scale parameter of shape (C,). Default: None
- bias: Learnable shift parameter of shape (C,). Default: None
- momentum: Momentum for running statistics. Default: 0.1
- eps: Value added for numerical stability. Default: 1e-05
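A minimal sketch with made-up shapes; in training mode the batch statistics are used for normalization and the running buffers are updated in place:

```python
import torch
import torch.nn.functional as F

input = torch.randn(4, 3, 8, 8)
running_mean = torch.zeros(3)
running_var = torch.ones(3)
# training=True: normalize with batch statistics and update the running buffers
out = F.batch_norm(input, running_mean, running_var, training=True, momentum=0.1)
print(out.mean(dim=(0, 2, 3)))  # per-channel means are ~0 after normalization
```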
layer_norm
F.layer_norm(input, normalized_shape, weight=None, bias=None, eps=1e-05)
Applies Layer Normalization over a mini-batch of inputs.
- normalized_shape: Input shape from an expected input; normalization is applied over the last len(normalized_shape) dimensions
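A quick sketch normalizing over the last dimension of a made-up (batch, seq, features) tensor:

```python
import torch
import torch.nn.functional as F

x = torch.randn(2, 5, 10)  # hypothetical (batch, seq, features)
out = F.layer_norm(x, normalized_shape=(10,))  # normalize over the last dimension
print(out.mean(-1).abs().max() < 1e-5)  # each feature vector now has ~zero mean
```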
group_norm
F.group_norm(input, num_groups, weight=None, bias=None, eps=1e-05)
Applies Group Normalization over a mini-batch of inputs.
- num_groups: Number of groups to separate channels into
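A quick sketch with made-up shapes; 6 channels split into 3 groups of 2, and each (sample, group) slice is normalized independently:

```python
import torch
import torch.nn.functional as F

x = torch.randn(2, 6, 4, 4)
out = F.group_norm(x, num_groups=3)  # 6 channels -> 3 groups of 2 channels
# each (sample, group) slice is normalized to ~zero mean
print(out.reshape(2, 3, -1).mean(-1).abs().max() < 1e-5)
```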
Dropout Functions
dropout
F.dropout(input, p=0.5, training=True, inplace=False)
Randomly zeroes some elements of the input tensor with probability p.
- p: Probability of an element to be zeroed. Default: 0.5
Example:
input = torch.randn(20, 16)
output = F.dropout(input, p=0.2, training=True)
dropout2d
F.dropout2d(input, p=0.5, training=True, inplace=False)
Randomly zeros out entire channels (channel-wise dropout).
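A sketch making the channel-wise behavior visible on an all-ones input: a dropped channel becomes all zeros, and surviving channels are rescaled by 1 / (1 - p):

```python
import torch
import torch.nn.functional as F

x = torch.ones(1, 8, 4, 4)
out = F.dropout2d(x, p=0.5, training=True)
# whole channels are zeroed; survivors are rescaled by 1 / (1 - p) = 2
per_channel = out.view(8, -1)
print(sorted(per_channel.unique().tolist()))  # a subset of [0.0, 2.0]
```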
Linear Functions
linear
F.linear(input, weight, bias=None)
Applies a linear transformation: y = xA^T + b
- input: Input tensor of shape (*, in_features)
- weight: Weight tensor of shape (out_features, in_features)
- bias: Optional bias tensor of shape (out_features,). Default: None
Example:
input = torch.randn(128, 20)
weight = torch.randn(30, 20)
output = F.linear(input, weight)
print(output.shape) # torch.Size([128, 30])
bilinear
F.bilinear(input1, input2, weight, bias=None)
Applies a bilinear transformation: y = x1^T A x2 + b
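A minimal sketch with made-up sizes; the weight has one (in1_features, in2_features) matrix per output feature:

```python
import torch
import torch.nn.functional as F

# Hypothetical sizes: combine a 10-dim and a 20-dim feature into 30 outputs
input1 = torch.randn(4, 10)
input2 = torch.randn(4, 20)
weight = torch.randn(30, 10, 20)  # (out_features, in1_features, in2_features)
output = F.bilinear(input1, input2, weight)
print(output.shape)  # torch.Size([4, 30])
```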
Loss Functions
cross_entropy
F.cross_entropy(input, target, weight=None, size_average=None,
ignore_index=-100, reduce=None, reduction='mean',
label_smoothing=0.0)
Computes the cross entropy loss between input logits and target.
- input: Predicted unnormalized logits of shape (N, C) or (N, C, d1, d2, ..., dK)
- target: Ground truth class indices of shape (N) or (N, d1, d2, ..., dK)
- reduction: Specifies reduction: 'none', 'mean', or 'sum'. Default: 'mean'
- label_smoothing: Label smoothing factor in [0.0, 1.0]. Default: 0.0
Example:
input = torch.randn(3, 5, requires_grad=True)
target = torch.randint(5, (3,), dtype=torch.long)
loss = F.cross_entropy(input, target)
loss.backward()
mse_loss
F.mse_loss(input, target, size_average=None, reduce=None, reduction='mean')
Measures the element-wise mean squared error.
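A small worked sketch; with reduction='mean' the squared errors (0, 0, 4) average to 4/3:

```python
import torch
import torch.nn.functional as F

input = torch.tensor([1.0, 2.0, 3.0])
target = torch.tensor([1.0, 2.0, 5.0])
loss = F.mse_loss(input, target)  # mean of (0^2, 0^2, 2^2)
print(loss.item())  # 1.3333...
per_element = F.mse_loss(input, target, reduction='none')
print(per_element)  # tensor([0., 0., 4.])
```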
binary_cross_entropy
F.binary_cross_entropy(input, target, weight=None, size_average=None,
reduce=None, reduction='mean')
Measures the Binary Cross Entropy between the target and input probabilities.
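Note that the input must already contain probabilities in [0, 1] (e.g. after a sigmoid); for raw logits, binary_cross_entropy_with_logits is the numerically stabler choice. A small worked sketch where the loss comes out to ln(2):

```python
import torch
import torch.nn.functional as F

probs = torch.tensor([0.5, 0.5])   # must already be probabilities in [0, 1]
target = torch.tensor([1.0, 0.0])
loss = F.binary_cross_entropy(probs, target)
print(loss.item())  # ln(2) ≈ 0.6931
```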
nll_loss
F.nll_loss(input, target, weight=None, size_average=None, ignore_index=-100,
reduce=None, reduction='mean')
Negative log likelihood loss.
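nll_loss expects log-probabilities, so it pairs naturally with log_softmax; the combination is equivalent to cross_entropy on the raw logits. A quick check:

```python
import torch
import torch.nn.functional as F

logits = torch.randn(3, 5)
target = torch.tensor([1, 0, 4])
nll = F.nll_loss(F.log_softmax(logits, dim=1), target)
ce = F.cross_entropy(logits, target)
print(torch.allclose(nll, ce))  # True
```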
Utility Functions
pad
F.pad(input, pad, mode='constant', value=0)
Pads a tensor.
- pad: m-element tuple specifying padding sizes, given as pairs starting from the last dimension
- mode: 'constant', 'reflect', 'replicate', or 'circular'. Default: 'constant'
Example:
t = torch.tensor([[1, 2], [3, 4]])
padded = F.pad(t, (1, 1, 1, 1), mode='constant', value=0)
# tensor([[0, 0, 0, 0],
# [0, 1, 2, 0],
# [0, 3, 4, 0],
# [0, 0, 0, 0]])
interpolate
F.interpolate(input, size=None, scale_factor=None, mode='nearest',
align_corners=None, recompute_scale_factor=None, antialias=False)
Down/up samples the input to given size or scale_factor.
- size (int | tuple, default: None): Output spatial size
- scale_factor (float | tuple, default: None): Multiplier for spatial size
- mode: Algorithm: 'nearest', 'linear', 'bilinear', 'bicubic', 'trilinear', or 'area'. Default: 'nearest'
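A sketch of both directions on a made-up feature map, upsampling via scale_factor and downsampling via size:

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 8, 8)
up = F.interpolate(x, scale_factor=2, mode='bilinear', align_corners=False)
down = F.interpolate(x, size=(4, 4), mode='area')
print(up.shape)    # torch.Size([1, 3, 16, 16])
print(down.shape)  # torch.Size([1, 3, 4, 4])
```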