Overview

The torch.nn module provides the building blocks for creating neural networks in PyTorch. It contains layers, activation functions, loss functions, and utilities for building and training deep learning models.

Key Components

Module Base Class

All neural network modules inherit from torch.nn.Module, which provides:
  • Parameter management
  • Forward pass computation
  • Training/evaluation mode switching
  • Device and dtype management
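The capabilities above can be sketched with a minimal module (the class and layer sizes here are illustrative, not part of the API):

```python
import torch
import torch.nn as nn

# Minimal module to illustrate what nn.Module provides out of the box.
class Tiny(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)

    def forward(self, x):
        return self.fc(x)

m = Tiny()

# Parameter management: weights and biases are tracked automatically.
print(sum(p.numel() for p in m.parameters()))  # 4*2 + 2 = 10

# Training/evaluation mode switching.
m.eval()
print(m.training)  # False

# Device and dtype management: .to() moves/converts all parameters.
m = m.to(torch.float64)
print(m.fc.weight.dtype)  # torch.float64
```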

Main Submodules

Modules

Neural network layers and containers (Linear, Conv2d, Sequential, etc.)
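As a quick sketch, a container such as Sequential chains layers so each output feeds the next input (the layer sizes below are arbitrary):

```python
import torch
import torch.nn as nn

# Sequential applies its child modules in order.
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
out = model(torch.randn(4, 8))
print(out.shape)  # torch.Size([4, 1])
```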

Functional

Functional API for operations without learnable parameters
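For example, stateless operations like ReLU can be called directly from torch.nn.functional without constructing a module:

```python
import torch
import torch.nn.functional as F

x = torch.tensor([-1.0, 0.0, 2.0])
# F.relu has no learnable parameters, so no module object is needed.
print(F.relu(x))  # tensor([0., 0., 2.])
```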

Initialization

Weight initialization methods for neural networks
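A typical pattern is to re-initialize a layer's tensors in place; the in-place init functions end in a trailing underscore:

```python
import torch
import torch.nn as nn

layer = nn.Linear(128, 64)
# Overwrite the default initialization in place.
nn.init.xavier_uniform_(layer.weight)
nn.init.zeros_(layer.bias)
```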

Loss Functions

Training objectives like CrossEntropyLoss, MSELoss
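Loss modules are called like any other module; for instance, CrossEntropyLoss takes raw logits and integer class targets and returns a scalar (the batch and class sizes below are arbitrary):

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()
# Logits for a batch of 3 samples over 5 classes, plus integer targets.
logits = torch.randn(3, 5)
targets = torch.tensor([1, 0, 4])
loss = criterion(logits, targets)
print(loss.shape)  # scalar: torch.Size([])
```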

Core Classes

Parameter

torch.nn.Parameter(data, requires_grad=True)
A kind of Tensor that is automatically registered as a parameter when assigned as a Module attribute.
  • data (Tensor, required): Tensor data for the parameter
  • requires_grad (bool, default: True): Whether the parameter requires gradient computation
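This automatic registration can be seen in a small sketch (the module name is illustrative):

```python
import torch
import torch.nn as nn

class Scaler(nn.Module):
    def __init__(self):
        super().__init__()
        # Assigning a Parameter as an attribute registers it automatically.
        self.scale = nn.Parameter(torch.ones(3))

    def forward(self, x):
        return x * self.scale

m = Scaler()
print([name for name, _ in m.named_parameters()])  # ['scale']
```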

Buffer

torch.nn.Buffer(tensor)
Registers a buffer that won’t be considered a model parameter but will be saved in state_dict.
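The same effect is available via Module.register_buffer, the long-standing equivalent of nn.Buffer (which was added in more recent PyTorch releases). A minimal sketch:

```python
import torch
import torch.nn as nn

class Norm(nn.Module):
    def __init__(self):
        super().__init__()
        # Buffers are saved in state_dict but excluded from parameters().
        self.register_buffer('running_mean', torch.zeros(3))

m = Norm()
print('running_mean' in m.state_dict())  # True
print(list(m.parameters()))              # [] (buffers are not parameters)
```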

Usage Example

import torch
import torch.nn as nn

class SimpleNet(nn.Module):
    def __init__(self):
        super(SimpleNet, self).__init__()
        self.fc1 = nn.Linear(784, 128)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(128, 10)
    
    def forward(self, x):
        x = self.fc1(x)
        x = self.relu(x)
        x = self.fc2(x)
        return x

# Create model
model = SimpleNet()

# Forward pass
input_data = torch.randn(32, 784)
output = model(input_data)
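Building on the forward pass above, a single training step typically adds a loss function and an optimizer; this is a minimal sketch with synthetic data:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()

# One training step on a synthetic batch.
inputs = torch.randn(32, 784)
labels = torch.randint(0, 10, (32,))

optimizer.zero_grad()            # clear gradients from the previous step
loss = criterion(model(inputs), labels)
loss.backward()                  # populate .grad on every parameter
optimizer.step()                 # apply the gradient update
```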

Factory Kwargs

torch.nn.factory_kwargs(kwargs)
Returns a canonicalized dict of factory kwargs (device, dtype, memory_format) that can be passed to factory functions.
  • kwargs (dict): Dictionary that may contain device, dtype, memory_format, or factory_kwargs keys
Returns (dict): Canonicalized dictionary with factory kwargs
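Assuming a recent PyTorch release that exposes this helper, the canonicalized dict can be splatted into a factory function such as torch.empty:

```python
import torch
import torch.nn as nn

# Canonicalize device/dtype keyword arguments before forwarding them
# to a factory function.
fk = nn.factory_kwargs({'device': 'cpu', 'dtype': torch.float64})
t = torch.empty(2, 2, **fk)
print(t.dtype)  # torch.float64
```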

Common Patterns

Model Definition

class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        # Define layers
        self.layer1 = nn.Linear(10, 20)
        self.layer2 = nn.Linear(20, 1)
    
    def forward(self, x):
        # Define forward pass
        x = self.layer1(x)
        x = torch.relu(x)
        x = self.layer2(x)
        return x

Training Mode

# Set model to training mode
model.train()

# Set model to evaluation mode
model.eval()
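The mode switch matters for layers like Dropout and BatchNorm, which behave differently during training and inference; a small sketch with Dropout:

```python
import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)
x = torch.ones(1000)

drop.train()
print(drop(x).eq(0).any())      # typically True: elements are zeroed randomly

drop.eval()
print(torch.equal(drop(x), x))  # True: dropout is a no-op in eval mode
```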

Device Management

# Move model to GPU
model = model.to('cuda')

# Or with device object
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = model.to(device)

Best Practices

When creating custom modules, always call the parent class constructor:
class MyModule(nn.Module):
    def __init__(self):
        super().__init__()  # or super(MyModule, self).__init__()
        # Your initialization
Tensors assigned as attributes won’t be registered as parameters unless wrapped:
# Correct
self.weight = nn.Parameter(torch.randn(10, 20))

# Wrong - won't be trained
self.weight = torch.randn(10, 20)
The forward method defines the computation performed at every call:
def forward(self, x):
    # Define your forward pass
    return self.layer(x)

See Also