Overview

The torch.nn.functional module provides functional implementations of neural network operations. Unlike their torch.nn module counterparts, these functions are stateless and maintain no learnable parameters; weights and biases are passed in explicitly.
By convention, import it as F: import torch.nn.functional as F

Convolution Operations

conv2d

F.conv2d(input, weight, bias=None, stride=1, padding=0, dilation=1, groups=1)
Applies a 2D convolution over an input image.
input
Tensor
required
Input tensor of shape (minibatch, in_channels, iH, iW)
weight
Tensor
required
Filters of shape (out_channels, in_channels/groups, kH, kW)
bias
Tensor
default:"None"
Optional bias tensor of shape (out_channels)
stride
int | tuple
default:"1"
Stride of the convolving kernel
padding
int | tuple | str
default:"0"
Padding added to both sides. Can be 'valid', 'same', or integer
dilation
int | tuple
default:"1"
Spacing between kernel elements
groups
int
default:"1"
Number of blocked connections from input to output channels
Example:
import torch
import torch.nn.functional as F

filters = torch.randn(8, 4, 3, 3)
inputs = torch.randn(1, 4, 5, 5)
output = F.conv2d(inputs, filters, padding=1)

conv1d

F.conv1d(input, weight, bias=None, stride=1, padding=0, dilation=1, groups=1)
Applies a 1D convolution over an input signal. Shape:
  • Input: (minibatch, in_channels, iW)
  • Weight: (out_channels, in_channels/groups, kW)
  • Output: (minibatch, out_channels, oW)
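The shape rules above can be exercised with a quick sketch (the sizes are arbitrary choices, not part of the API):

```python
import torch
import torch.nn.functional as F

# A toy batch: 2 signals, 4 input channels, length 16
inputs = torch.randn(2, 4, 16)
# 8 output channels, 4 input channels, kernel width 3
filters = torch.randn(8, 4, 3)

# padding=1 with a width-3 kernel preserves the signal length
output = F.conv1d(inputs, filters, padding=1)
print(output.shape)  # torch.Size([2, 8, 16])
```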

conv3d

F.conv3d(input, weight, bias=None, stride=1, padding=0, dilation=1, groups=1)
Applies a 3D convolution over an input signal. Shape:
  • Input: (minibatch, in_channels, iT, iH, iW)
  • Weight: (out_channels, in_channels/groups, kT, kH, kW)
  • Output: (minibatch, out_channels, oT, oH, oW)

Pooling Operations

max_pool2d

F.max_pool2d(input, kernel_size, stride=None, padding=0, dilation=1, 
             ceil_mode=False, return_indices=False)
Applies 2D max pooling over an input signal.
input
Tensor
required
Input tensor of shape (minibatch, in_channels, iH, iW)
kernel_size
int | tuple
required
Size of the pooling region
stride
int | tuple
default:"kernel_size"
Stride of the pooling operation
padding
int | tuple
default:"0"
Implicit negative infinity padding
dilation
int | tuple
default:"1"
Stride between elements within a sliding window
ceil_mode
bool
default:"False"
Use ceil instead of floor to compute output shape
return_indices
bool
default:"False"
Return the max indices along with outputs
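A short sketch of the two call patterns (sizes chosen arbitrarily). With return_indices=True the function returns a (values, indices) pair; the indices are the flat positions of each maximum, which max_unpool2d can later consume:

```python
import torch
import torch.nn.functional as F

inputs = torch.randn(1, 3, 8, 8)

# 2x2 pooling with the default stride (= kernel_size) halves each spatial dim
output = F.max_pool2d(inputs, kernel_size=2)
print(output.shape)  # torch.Size([1, 3, 4, 4])

# return_indices=True additionally yields the argmax positions
output, indices = F.max_pool2d(inputs, kernel_size=2, return_indices=True)
print(indices.shape)  # torch.Size([1, 3, 4, 4])
```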

avg_pool2d

F.avg_pool2d(input, kernel_size, stride=None, padding=0, ceil_mode=False, 
             count_include_pad=True, divisor_override=None)
Applies 2D average pooling over an input signal.
count_include_pad
bool
default:"True"
Include zero-padding in averaging calculation
divisor_override
int
default:"None"
If specified, used as divisor instead of pooling region size
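The effect of count_include_pad is easiest to see on a tiny all-ones input, where the corner window covers one real element and three zero pads:

```python
import torch
import torch.nn.functional as F

# A 1x1x2x2 tensor of ones makes the padding behaviour easy to verify
x = torch.ones(1, 1, 2, 2)

# By default the zero padding is counted in the divisor...
incl = F.avg_pool2d(x, kernel_size=2, padding=1, count_include_pad=True)
# ...with count_include_pad=False only real elements are counted
excl = F.avg_pool2d(x, kernel_size=2, padding=1, count_include_pad=False)

print(incl[0, 0, 0, 0].item())  # 0.25 (one 1 divided by the full window of 4)
print(excl[0, 0, 0, 0].item())  # 1.0  (one 1 divided by the 1 non-pad element)
```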

adaptive_avg_pool2d

F.adaptive_avg_pool2d(input, output_size)
Applies 2D adaptive average pooling.
output_size
int | tuple
required
Target output size (H_out, W_out)
Example:
input = torch.randn(1, 64, 8, 9)
output = F.adaptive_avg_pool2d(input, (5, 7))
print(output.shape)  # torch.Size([1, 64, 5, 7])

Activation Functions

relu

F.relu(input, inplace=False)
Applies the rectified linear unit function element-wise: ReLU(x) = max(0, x)
input
Tensor
required
Input tensor
inplace
bool
default:"False"
Perform operation in-place
Example:
input = torch.randn(2)
output = F.relu(input)

leaky_relu

F.leaky_relu(input, negative_slope=0.01, inplace=False)
Applies element-wise: LeakyReLU(x) = max(0, x) + negative_slope * min(0, x)
negative_slope
float
default:"0.01"
Controls the angle of the negative slope

gelu

F.gelu(input, approximate='none')
Applies the Gaussian Error Linear Units function.
approximate
str
default:"'none'"
Approximation type: 'none' or 'tanh'
When approximate='none':
GELU(x) = x * Φ(x)
Where Φ(x) is the Cumulative Distribution Function for Gaussian Distribution.
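On recent PyTorch versions the two approximate modes can be compared directly; the tanh approximation tracks the exact erf-based definition closely:

```python
import torch
import torch.nn.functional as F

x = torch.linspace(-3, 3, 7)
exact = F.gelu(x)                        # erf-based definition
approx = F.gelu(x, approximate='tanh')   # tanh approximation
print(torch.allclose(exact, approx, atol=1e-2))  # True: the two differ only slightly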

sigmoid

F.sigmoid(input)
Applies the element-wise sigmoid function: σ(x) = 1 / (1 + exp(-x))

tanh

F.tanh(input)
Applies the hyperbolic tangent function element-wise.

softmax

F.softmax(input, dim=None, dtype=None)
Applies the Softmax function to an n-dimensional input Tensor.
input
Tensor
required
Input tensor
dim
int
default:"None"
Dimension along which Softmax will be computed. Pass it explicitly; the implicit choice made when dim is None is deprecated
dtype
torch.dtype
default:"None"
Desired data type of returned tensor
Formula:
Softmax(x_i) = exp(x_i) / Σ_j exp(x_j)
Example:
input = torch.randn(2, 3)
output = F.softmax(input, dim=1)
print(output.sum(dim=1))  # tensor([1., 1.])

log_softmax

F.log_softmax(input, dim=None, dtype=None)
Applies a softmax followed by a logarithm.
For numerical stability and better gradient flow, use log_softmax instead of manually computing log(softmax(x)).
Example:
input = torch.randn(2, 3)
output = F.log_softmax(input, dim=1)

elu

F.elu(input, alpha=1.0, inplace=False)
Applies the Exponential Linear Unit (ELU) function.
alpha
float
default:"1.0"
The α value for the ELU formulation

selu

F.selu(input, inplace=False)
Applies the Scaled Exponential Linear Unit (SELU) function.

Normalization Functions

batch_norm

F.batch_norm(input, running_mean, running_var, weight=None, bias=None, 
             training=False, momentum=0.1, eps=1e-05)
Applies Batch Normalization for each channel across a batch of data.
input
Tensor
required
Input tensor of shape (N, C, *) where * means any spatial dimensions
running_mean
Tensor
required
Running mean tensor of shape (C,)
running_var
Tensor
required
Running variance tensor of shape (C,)
weight
Tensor
default:"None"
Learnable scale parameter of shape (C,)
bias
Tensor
default:"None"
Learnable shift parameter of shape (C,)
training
bool
default:"False"
If True, normalize with batch statistics and update the running statistics in place; otherwise normalize with the running statistics
momentum
float
default:"0.1"
Momentum for running statistics
eps
float
default:"1e-05"
Value added for numerical stability
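A minimal sketch of the functional call (sizes arbitrary). In training mode the running buffers are updated in place, and the normalized output has approximately zero mean per channel:

```python
import torch
import torch.nn.functional as F

x = torch.randn(4, 3, 5, 5)   # (N, C, H, W)
running_mean = torch.zeros(3)
running_var = torch.ones(3)

out = F.batch_norm(x, running_mean, running_var, training=True, momentum=0.1)
print(out.shape)  # torch.Size([4, 3, 5, 5])
# without weight/bias, each channel of the output has ~zero mean
print(out.mean(dim=(0, 2, 3)).abs().max().item() < 1e-4)  # True
```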

layer_norm

F.layer_norm(input, normalized_shape, weight=None, bias=None, eps=1e-05)
Applies Layer Normalization over a mini-batch of inputs.
normalized_shape
list | tuple
required
Shape of the trailing input dimensions to normalize over
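A quick sketch of the common transformer-style usage, normalizing over the last dimension only (sizes arbitrary):

```python
import torch
import torch.nn.functional as F

x = torch.randn(2, 5, 10)
# normalized_shape=(10,) normalizes each length-10 row independently
out = F.layer_norm(x, normalized_shape=(10,))
print(out.shape)  # torch.Size([2, 5, 10])
print(out.mean(dim=-1).abs().max().item() < 1e-4)  # True: each row has ~zero mean
```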

group_norm

F.group_norm(input, num_groups, weight=None, bias=None, eps=1e-05)
Applies Group Normalization over a mini-batch of inputs.
num_groups
int
required
Number of groups to separate channels into
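A minimal sketch (sizes arbitrary; num_groups must divide the channel count). Statistics are computed per sample and per group of channels:

```python
import torch
import torch.nn.functional as F

x = torch.randn(2, 6, 4, 4)
# 6 channels split into 3 groups of 2 channels each
out = F.group_norm(x, num_groups=3)
print(out.shape)  # torch.Size([2, 6, 4, 4])
```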

Dropout Functions

dropout

F.dropout(input, p=0.5, training=True, inplace=False)
Randomly zeroes some elements of the input tensor with probability p.
p
float
default:"0.5"
Probability of an element to be zeroed
training
bool
default:"True"
Apply dropout if True
Example:
input = torch.randn(20, 16)
output = F.dropout(input, p=0.2, training=True)

dropout2d

F.dropout2d(input, p=0.5, training=True, inplace=False)
Randomly zeroes out entire channels (each channel is a 2D feature map), each independently with probability p.
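The channel-wise behaviour is easy to verify on an all-ones input: a channel is either zeroed entirely or scaled by 1/(1-p), never partially dropped:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.ones(1, 8, 4, 4)
out = F.dropout2d(x, p=0.5, training=True)

# Sum each 4x4 channel: a surviving channel of ones scaled by 1/(1-0.5)=2
# sums to 32.0, a dropped channel sums to 0.0 - nothing in between
per_channel = out.sum(dim=(2, 3))
print(per_channel)
```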

Linear Functions

linear

F.linear(input, weight, bias=None)
Applies a linear transformation: y = xA^T + b
input
Tensor
required
Input tensor of shape (*, in_features)
weight
Tensor
required
Weight tensor of shape (out_features, in_features)
bias
Tensor
default:"None"
Optional bias tensor of shape (out_features,)
Example:
input = torch.randn(128, 20)
weight = torch.randn(30, 20)
output = F.linear(input, weight)
print(output.shape)  # torch.Size([128, 30])

bilinear

F.bilinear(input1, input2, weight, bias=None)
Applies a bilinear transformation: y = x1^T A x2 + b
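A shape sketch (sizes arbitrary); the weight tensor has shape (out_features, in1_features, in2_features):

```python
import torch
import torch.nn.functional as F

x1 = torch.randn(128, 20)
x2 = torch.randn(128, 30)
weight = torch.randn(40, 20, 30)  # (out_features, in1_features, in2_features)

output = F.bilinear(x1, x2, weight)
print(output.shape)  # torch.Size([128, 40])
```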

Loss Functions

cross_entropy

F.cross_entropy(input, target, weight=None, size_average=None, 
                ignore_index=-100, reduce=None, reduction='mean', 
                label_smoothing=0.0)
Computes the cross entropy loss between input logits and target. The size_average and reduce arguments are deprecated; use reduction instead.
input
Tensor
required
Predicted unnormalized logits of shape (N, C) or (N, C, d1, d2, ..., dK)
target
Tensor
required
Ground truth class indices of shape (N) or (N, d1, d2, ..., dK)
reduction
str
default:"'mean'"
Specifies reduction: 'none', 'mean', or 'sum'
label_smoothing
float
default:"0.0"
Label smoothing factor in [0.0, 1.0]
Example:
input = torch.randn(3, 5, requires_grad=True)
target = torch.randint(5, (3,), dtype=torch.long)
loss = F.cross_entropy(input, target)
loss.backward()

mse_loss

F.mse_loss(input, target, size_average=None, reduce=None, reduction='mean')
Measures the element-wise mean squared error.
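A quick sketch (sizes arbitrary) showing the default 'mean' reduction against the unreduced per-element loss:

```python
import torch
import torch.nn.functional as F

input = torch.randn(3, 5, requires_grad=True)
target = torch.randn(3, 5)

loss = F.mse_loss(input, target)                        # scalar, mean over all elements
per_elem = F.mse_loss(input, target, reduction='none')  # same shape as input
print(per_elem.shape)  # torch.Size([3, 5])
loss.backward()
```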

binary_cross_entropy

F.binary_cross_entropy(input, target, weight=None, size_average=None, 
                       reduce=None, reduction='mean')
Measures the Binary Cross Entropy between the target and input probabilities.
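Note that this function expects probabilities in [0, 1], not raw logits, so pass logits through a sigmoid first (or use binary_cross_entropy_with_logits). A minimal sketch:

```python
import torch
import torch.nn.functional as F

logits = torch.randn(3, requires_grad=True)
target = torch.tensor([0., 1., 1.])

probs = torch.sigmoid(logits)  # squash logits to probabilities
loss = F.binary_cross_entropy(probs, target)
loss.backward()
print(loss.item() >= 0)  # True: BCE is non-negative
```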

nll_loss

F.nll_loss(input, target, weight=None, size_average=None, ignore_index=-100, 
           reduce=None, reduction='mean')
Negative log likelihood loss.
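nll_loss expects log-probabilities, so it is typically paired with log_softmax; the combination is equivalent to calling cross_entropy on the raw logits:

```python
import torch
import torch.nn.functional as F

input = torch.randn(3, 5)
target = torch.tensor([1, 0, 4])

loss = F.nll_loss(F.log_softmax(input, dim=1), target)
print(torch.isclose(loss, F.cross_entropy(input, target)))  # tensor(True)
```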

Utility Functions

pad

F.pad(input, pad, mode='constant', value=0)
Pads the input tensor.
pad
tuple
required
m-element tuple of padding sizes, applied to the last dimensions first; m/2 must not exceed the number of input dimensions
mode
str
default:"'constant'"
'constant', 'reflect', 'replicate', or 'circular'
Example:
t = torch.tensor([[1, 2], [3, 4]])
padded = F.pad(t, (1, 1, 1, 1), mode='constant', value=0)
# tensor([[0, 0, 0, 0],
#         [0, 1, 2, 0],
#         [0, 3, 4, 0],
#         [0, 0, 0, 0]])

interpolate

F.interpolate(input, size=None, scale_factor=None, mode='nearest', 
              align_corners=None, recompute_scale_factor=None, antialias=False)
Down/up samples the input to given size or scale_factor.
size
int | tuple
default:"None"
Output spatial size
scale_factor
float | tuple
default:"None"
Multiplier for spatial size
mode
str
default:"'nearest'"
Algorithm: 'nearest', 'linear', 'bilinear', 'bicubic', 'trilinear', 'area'
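Exactly one of size and scale_factor should be given. A quick sketch of both call patterns (sizes arbitrary):

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 8, 8)

# Upsample to an explicit output size...
up = F.interpolate(x, size=(16, 16), mode='bilinear', align_corners=False)
print(up.shape)    # torch.Size([1, 3, 16, 16])

# ...or resize by a scale factor
down = F.interpolate(x, scale_factor=0.5, mode='nearest')
print(down.shape)  # torch.Size([1, 3, 4, 4])
```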