Overview
The torch.nn.functional module provides functional implementations of neural network operations. Unlike torch.nn modules, these functions are stateless and don’t maintain learnable parameters.
Use F as an alias: import torch.nn.functional as F
Convolution Operations
conv2d
F.conv2d(input, weight, bias=None, stride=1, padding=0, dilation=1, groups=1)
Applies a 2D convolution over an input image.
- input: Input tensor of shape (minibatch, in_channels, iH, iW)
- weight: Filters of shape (out_channels, in_channels/groups, kH, kW)
- bias: Optional bias tensor of shape (out_channels). Default: None
- stride: Stride of the convolving kernel. Default: 1
- padding (int | tuple | str, default: 0): Padding added to both sides of the input. Can be 'valid', 'same', or an integer
- dilation: Spacing between kernel elements. Default: 1
- groups: Number of blocked connections from input to output channels. Default: 1
Example:
import torch
import torch.nn.functional as F
filters = torch.randn(8, 4, 3, 3)
inputs = torch.randn(1, 4, 5, 5)
output = F.conv2d(inputs, filters, padding=1)
conv1d
F.conv1d(input, weight, bias=None, stride=1, padding=0, dilation=1, groups=1)
Applies a 1D convolution over an input signal.
Shape:
- Input:
(minibatch, in_channels, iW)
- Weight:
(out_channels, in_channels/groups, kW)
- Output:
(minibatch, out_channels, oW)
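A minimal sketch of conv1d, with made-up shapes (16 filters of width 5 over a 3-channel signal):

```python
import torch
import torch.nn.functional as F

# Hypothetical shapes: 16 output channels, 3 input channels, kernel width 5
filters = torch.randn(16, 3, 5)
inputs = torch.randn(8, 3, 50)  # batch of 8 signals, length 50
output = F.conv1d(inputs, filters, padding=2)  # padding=2 keeps oW == iW for stride 1, kernel 5
print(output.shape)  # torch.Size([8, 16, 50])
```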
conv3d
F.conv3d(input, weight, bias=None, stride=1, padding=0, dilation=1, groups=1)
Applies a 3D convolution over an input signal.
Shape:
- Input:
(minibatch, in_channels, iT, iH, iW)
- Weight:
(out_channels, in_channels/groups, kT, kH, kW)
- Output:
(minibatch, out_channels, oT, oH, oW)
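A minimal sketch of conv3d on a volumetric input, with made-up shapes:

```python
import torch
import torch.nn.functional as F

# Hypothetical shapes: 8 filters over a 4-channel volumetric input
filters = torch.randn(8, 4, 3, 3, 3)
inputs = torch.randn(2, 4, 10, 16, 16)  # (minibatch, in_channels, T, H, W)
output = F.conv3d(inputs, filters, padding=1)  # padding=1 preserves T, H, W for a 3x3x3 kernel
print(output.shape)  # torch.Size([2, 8, 10, 16, 16])
```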
Pooling Operations
max_pool2d
F.max_pool2d(input, kernel_size, stride=None, padding=0, dilation=1,
ceil_mode=False, return_indices=False)
Applies 2D max pooling over an input signal.
- input: Input tensor of shape (minibatch, in_channels, iH, iW)
- kernel_size: Size of the pooling region
- stride (int | tuple, default: kernel_size): Stride of the pooling operation
- padding: Implicit negative infinity padding on both sides. Default: 0
- dilation: Stride between elements within a sliding window. Default: 1
- ceil_mode: Use ceil instead of floor to compute output shape. Default: False
- return_indices: Return the max indices along with outputs. Default: False
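A quick sketch with a hypothetical 8x8 feature map, including the return_indices variant:

```python
import torch
import torch.nn.functional as F

input = torch.randn(1, 3, 8, 8)  # hypothetical 8x8 feature map
output = F.max_pool2d(input, kernel_size=2)  # stride defaults to kernel_size
print(output.shape)  # torch.Size([1, 3, 4, 4])

# return_indices=True also returns the argmax locations (useful for max_unpool2d)
output, indices = F.max_pool2d(input, kernel_size=2, return_indices=True)
```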
avg_pool2d
F.avg_pool2d(input, kernel_size, stride=None, padding=0, ceil_mode=False,
count_include_pad=True, divisor_override=None)
Applies 2D average pooling over an input signal.
- count_include_pad: Include zero-padding in the averaging calculation. Default: True
- divisor_override: If specified, used as divisor instead of pooling region size. Default: None
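A small sketch (contrived 2x2 input) showing how count_include_pad changes the result at padded borders:

```python
import torch
import torch.nn.functional as F

# count_include_pad controls whether padded zeros enter the average
x = torch.ones(1, 1, 2, 2)
with_pad = F.avg_pool2d(x, kernel_size=2, padding=1, count_include_pad=True)
without_pad = F.avg_pool2d(x, kernel_size=2, padding=1, count_include_pad=False)
print(with_pad[0, 0, 0, 0].item())     # 0.25 — three padded zeros dilute the average
print(without_pad[0, 0, 0, 0].item())  # 1.0  — only the real element is counted
```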
adaptive_avg_pool2d
F.adaptive_avg_pool2d(input, output_size)
Applies 2D adaptive average pooling.
- output_size: Target output size (H_out, W_out)
Example:
input = torch.randn(1, 64, 8, 9)
output = F.adaptive_avg_pool2d(input, (5, 7))
print(output.shape) # torch.Size([1, 64, 5, 7])
Activation Functions
relu
F.relu(input, inplace=False)
Applies the rectified linear unit function element-wise: ReLU(x) = max(0, x)
- inplace: Perform the operation in-place. Default: False
Example:
input = torch.randn(2)
output = F.relu(input)
leaky_relu
F.leaky_relu(input, negative_slope=0.01, inplace=False)
Applies element-wise: LeakyReLU(x) = max(0, x) + negative_slope * min(0, x)
- negative_slope: Controls the angle of the negative slope. Default: 0.01
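A minimal sketch showing the effect of negative_slope on negative inputs:

```python
import torch
import torch.nn.functional as F

x = torch.tensor([-2.0, 0.0, 3.0])
out = F.leaky_relu(x, negative_slope=0.1)  # negatives are scaled by 0.1 instead of clamped
print(out)  # tensor([-0.2000, 0.0000, 3.0000])
```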
gelu
F.gelu(input, approximate='none')
Applies the Gaussian Error Linear Units function.
- approximate: Approximation type, 'none' or 'tanh'. Default: 'none'
When approximate='none': GELU(x) = x * Φ(x), where Φ(x) is the Cumulative Distribution Function for the Gaussian Distribution.
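A sketch verifying the exact form against the closed-form x * Φ(x), with Φ built from erf:

```python
import torch
import torch.nn.functional as F

x = torch.linspace(-3, 3, 7)
exact = F.gelu(x)  # approximate='none' uses the exact Gaussian CDF
# The same values via x * Phi(x), where Phi(x) = 0.5 * (1 + erf(x / sqrt(2)))
phi = 0.5 * (1.0 + torch.erf(x / 2.0 ** 0.5))
print(torch.allclose(exact, x * phi, atol=1e-6))  # True
```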
sigmoid
F.sigmoid(input)
Applies the element-wise sigmoid function: σ(x) = 1 / (1 + exp(-x))
tanh
F.tanh(input)
Applies the hyperbolic tangent function element-wise: tanh(x) = (exp(x) - exp(-x)) / (exp(x) + exp(-x))
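Both are also available as torch.sigmoid and torch.tanh, which recent PyTorch releases recommend over the functional aliases. A minimal check of the fixed points at zero:

```python
import torch

x = torch.tensor([-1.0, 0.0, 1.0])
s = torch.sigmoid(x)  # torch.sigmoid is the recommended spelling
t = torch.tanh(x)
print(s[1].item())  # 0.5 — sigmoid(0) is exactly 0.5
print(t[1].item())  # 0.0 — tanh(0) is exactly 0.0
```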
softmax
F.softmax(input, dim=None, dtype=None)
Applies the Softmax function to an n-dimensional input Tensor.
- dim: Dimension along which Softmax will be computed
- dtype (torch.dtype, default: None): Desired data type of returned tensor
Formula:
Softmax(x_i) = exp(x_i) / Σ_j exp(x_j)
Example:
input = torch.randn(2, 3)
output = F.softmax(input, dim=1)
print(output.sum(dim=1)) # tensor([1., 1.])
log_softmax
F.log_softmax(input, dim=None, dtype=None)
Applies a softmax followed by a logarithm.
For numerical stability and better gradient flow, use log_softmax instead of manually computing log(softmax(x)).
Example:
input = torch.randn(2, 3)
output = F.log_softmax(input, dim=1)
elu
F.elu(input, alpha=1.0, inplace=False)
Applies the Exponential Linear Unit (ELU) function.
- alpha: The α value for the ELU formulation. Default: 1.0
selu
F.selu(input, inplace=False)
Applies the Scaled Exponential Linear Unit (SELU) function.
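A short sketch contrasting the two on the same input; SELU multiplies an ELU-like curve by a fixed scale (approximately 1.0507) chosen for self-normalizing networks:

```python
import torch
import torch.nn.functional as F

x = torch.tensor([-1.0, 0.0, 2.0])
e = F.elu(x)   # x for x > 0, alpha * (exp(x) - 1) for x <= 0
s = F.selu(x)  # like ELU, but with fixed alpha and an extra fixed output scale
print(e)  # tensor([-0.6321, 0.0000, 2.0000])
print(s)
```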
Normalization Functions
batch_norm
F.batch_norm(input, running_mean, running_var, weight=None, bias=None,
training=False, momentum=0.1, eps=1e-05)
Applies Batch Normalization for each channel across a batch of data.
- input: Input tensor of shape (N, C, *) where * means any spatial dimensions
- running_mean: Running mean tensor of shape (C,)
- running_var: Running variance tensor of shape (C,)
- weight: Learnable scale parameter of shape (C,). Default: None
- bias: Learnable shift parameter of shape (C,). Default: None
- momentum: Momentum for running statistics. Default: 0.1
- eps: Value added for numerical stability. Default: 1e-05
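A minimal sketch with made-up shapes; in training mode the batch statistics are used for normalization and the running buffers are updated in place:

```python
import torch
import torch.nn.functional as F

input = torch.randn(4, 3, 8, 8)
running_mean = torch.zeros(3)
running_var = torch.ones(3)
# training=True: normalize with batch statistics and update the running buffers
out = F.batch_norm(input, running_mean, running_var, training=True, momentum=0.1)
print(out.mean(dim=(0, 2, 3)))  # per-channel means are ~0 after normalization
```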
layer_norm
F.layer_norm(input, normalized_shape, weight=None, bias=None, eps=1e-05)
Applies Layer Normalization over a mini-batch of inputs.
- normalized_shape: Input shape from an expected input; normalization is applied over the last len(normalized_shape) dimensions
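A quick sketch normalizing over the last dimension of a made-up (batch, seq, features) tensor:

```python
import torch
import torch.nn.functional as F

x = torch.randn(2, 5, 10)  # hypothetical (batch, seq, features)
out = F.layer_norm(x, normalized_shape=(10,))  # normalize over the last dimension
print(out.mean(-1).abs().max() < 1e-5)  # each feature vector now has ~zero mean
```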
group_norm
F.group_norm(input, num_groups, weight=None, bias=None, eps=1e-05)
Applies Group Normalization over a mini-batch of inputs.
- num_groups: Number of groups to separate channels into
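A quick sketch with made-up shapes; 6 channels split into 3 groups of 2, and each (sample, group) slice is normalized independently:

```python
import torch
import torch.nn.functional as F

x = torch.randn(2, 6, 4, 4)
out = F.group_norm(x, num_groups=3)  # 6 channels -> 3 groups of 2 channels
# each (sample, group) slice is normalized to ~zero mean
print(out.reshape(2, 3, -1).mean(-1).abs().max() < 1e-5)
```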
Dropout Functions
dropout
F.dropout(input, p=0.5, training=True, inplace=False)
Randomly zeroes some elements of the input tensor with probability p.
- p: Probability of an element to be zeroed. Default: 0.5
Example:
input = torch.randn(20, 16)
output = F.dropout(input, p=0.2, training=True)
dropout2d
F.dropout2d(input, p=0.5, training=True, inplace=False)
Randomly zeros out entire channels (channel-wise dropout).
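A sketch making the channel-wise behavior visible on an all-ones input: a dropped channel becomes all zeros, and surviving channels are rescaled by 1 / (1 - p):

```python
import torch
import torch.nn.functional as F

x = torch.ones(1, 8, 4, 4)
out = F.dropout2d(x, p=0.5, training=True)
# whole channels are zeroed; survivors are rescaled by 1 / (1 - p) = 2
per_channel = out.view(8, -1)
print(sorted(per_channel.unique().tolist()))  # a subset of [0.0, 2.0]
```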
Linear Functions
linear
F.linear(input, weight, bias=None)
Applies a linear transformation: y = xA^T + b
- input: Input tensor of shape (*, in_features)
- weight: Weight tensor of shape (out_features, in_features)
- bias: Optional bias tensor of shape (out_features,). Default: None
Example:
input = torch.randn(128, 20)
weight = torch.randn(30, 20)
output = F.linear(input, weight)
print(output.shape) # torch.Size([128, 30])
bilinear
F.bilinear(input1, input2, weight, bias=None)
Applies a bilinear transformation: y = x1^T A x2 + b
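A minimal sketch with made-up sizes; the weight has one (in1_features, in2_features) matrix per output feature:

```python
import torch
import torch.nn.functional as F

# Hypothetical sizes: combine a 10-dim and a 20-dim feature into 30 outputs
input1 = torch.randn(4, 10)
input2 = torch.randn(4, 20)
weight = torch.randn(30, 10, 20)  # (out_features, in1_features, in2_features)
output = F.bilinear(input1, input2, weight)
print(output.shape)  # torch.Size([4, 30])
```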
Loss Functions
cross_entropy
F.cross_entropy(input, target, weight=None, size_average=None,
ignore_index=-100, reduce=None, reduction='mean',
label_smoothing=0.0)
Computes the cross entropy loss between input logits and target.
- input: Predicted unnormalized logits of shape (N, C) or (N, C, d1, d2, ..., dK)
- target: Ground truth class indices of shape (N) or (N, d1, d2, ..., dK)
- reduction: Specifies reduction: 'none', 'mean', or 'sum'. Default: 'mean'
- label_smoothing: Label smoothing factor in [0.0, 1.0]. Default: 0.0
Example:
input = torch.randn(3, 5, requires_grad=True)
target = torch.randint(5, (3,), dtype=torch.long)
loss = F.cross_entropy(input, target)
loss.backward()
mse_loss
F.mse_loss(input, target, size_average=None, reduce=None, reduction='mean')
Measures the element-wise mean squared error.
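A small worked sketch; with reduction='mean' the squared errors (0, 0, 4) average to 4/3:

```python
import torch
import torch.nn.functional as F

input = torch.tensor([1.0, 2.0, 3.0])
target = torch.tensor([1.0, 2.0, 5.0])
loss = F.mse_loss(input, target)  # mean of (0^2, 0^2, 2^2)
print(loss.item())  # 1.3333...
per_element = F.mse_loss(input, target, reduction='none')
print(per_element)  # tensor([0., 0., 4.])
```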
binary_cross_entropy
F.binary_cross_entropy(input, target, weight=None, size_average=None,
reduce=None, reduction='mean')
Measures the Binary Cross Entropy between the target and input probabilities.
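Note that the input must already contain probabilities in [0, 1] (e.g. after a sigmoid); for raw logits, binary_cross_entropy_with_logits is the numerically stabler choice. A small worked sketch where the loss comes out to ln(2):

```python
import torch
import torch.nn.functional as F

probs = torch.tensor([0.5, 0.5])   # must already be probabilities in [0, 1]
target = torch.tensor([1.0, 0.0])
loss = F.binary_cross_entropy(probs, target)
print(loss.item())  # ln(2) ≈ 0.6931
```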
nll_loss
F.nll_loss(input, target, weight=None, size_average=None, ignore_index=-100,
reduce=None, reduction='mean')
Negative log likelihood loss.
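nll_loss expects log-probabilities, so it pairs naturally with log_softmax; the combination is equivalent to cross_entropy on the raw logits. A quick check:

```python
import torch
import torch.nn.functional as F

logits = torch.randn(3, 5)
target = torch.tensor([1, 0, 4])
nll = F.nll_loss(F.log_softmax(logits, dim=1), target)
ce = F.cross_entropy(logits, target)
print(torch.allclose(nll, ce))  # True
```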
Utility Functions
pad
F.pad(input, pad, mode='constant', value=0)
Pads a tensor.
- pad: m-element tuple specifying padding sizes, given as pairs starting from the last dimension
- mode: 'constant', 'reflect', 'replicate', or 'circular'. Default: 'constant'
Example:
t = torch.tensor([[1, 2], [3, 4]])
padded = F.pad(t, (1, 1, 1, 1), mode='constant', value=0)
# tensor([[0, 0, 0, 0],
# [0, 1, 2, 0],
# [0, 3, 4, 0],
# [0, 0, 0, 0]])
interpolate
F.interpolate(input, size=None, scale_factor=None, mode='nearest',
align_corners=None, recompute_scale_factor=None, antialias=False)
Down/up samples the input to given size or scale_factor.
- size (int | tuple, default: None): Output spatial size
- scale_factor (float | tuple, default: None): Multiplier for spatial size
- mode: Algorithm: 'nearest', 'linear', 'bilinear', 'bicubic', 'trilinear', or 'area'. Default: 'nearest'
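A sketch of both directions on a made-up feature map, upsampling via scale_factor and downsampling via size:

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 8, 8)
up = F.interpolate(x, scale_factor=2, mode='bilinear', align_corners=False)
down = F.interpolate(x, size=(4, 4), mode='area')
print(up.shape)    # torch.Size([1, 3, 16, 16])
print(down.shape)  # torch.Size([1, 3, 4, 4])
```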