Overview

The torch.nn module provides the building blocks for creating neural networks in PyTorch. It contains layers, activation functions, loss functions, and utilities for building and training deep learning models.

Key Components

Module Base Class

All neural network modules inherit from torch.nn.Module, which provides:
  • Parameter management
  • Forward pass computation
  • Training/evaluation mode switching
  • Device and dtype management
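The capabilities above can be sketched with a minimal module (the class and layer sizes here are illustrative, not part of the API):

```python
import torch
import torch.nn as nn

# Minimal module to illustrate what nn.Module provides out of the box.
class Tiny(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)

    def forward(self, x):
        return self.fc(x)

m = Tiny()

# Parameter management: weights and biases are tracked automatically.
print(sum(p.numel() for p in m.parameters()))  # 4*2 + 2 = 10

# Training/evaluation mode switching.
m.eval()
print(m.training)  # False

# Device and dtype management: .to() moves/converts all parameters.
m = m.to(torch.float64)
print(m.fc.weight.dtype)  # torch.float64
```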

Main Submodules

Modules

Neural network layers and containers (Linear, Conv2d, Sequential, etc.)
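As a quick sketch, a container such as Sequential chains layers so each output feeds the next input (the layer sizes below are arbitrary):

```python
import torch
import torch.nn as nn

# Sequential applies its child modules in order.
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
out = model(torch.randn(4, 8))
print(out.shape)  # torch.Size([4, 1])
```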

Functional

Functional API for operations without learnable parameters
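For example, stateless operations like ReLU can be called directly from torch.nn.functional without constructing a module:

```python
import torch
import torch.nn.functional as F

x = torch.tensor([-1.0, 0.0, 2.0])
# F.relu has no learnable parameters, so no module object is needed.
print(F.relu(x))  # tensor([0., 0., 2.])
```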

Initialization

Weight initialization methods for neural networks
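A typical pattern is to re-initialize a layer's tensors in place; the in-place init functions end in a trailing underscore:

```python
import torch
import torch.nn as nn

layer = nn.Linear(128, 64)
# Overwrite the default initialization in place.
nn.init.xavier_uniform_(layer.weight)
nn.init.zeros_(layer.bias)
```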

Loss Functions

Training objectives like CrossEntropyLoss, MSELoss
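Loss modules are called like any other module; for instance, CrossEntropyLoss takes raw logits and integer class targets and returns a scalar (the batch and class sizes below are arbitrary):

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()
# Logits for a batch of 3 samples over 5 classes, plus integer targets.
logits = torch.randn(3, 5)
targets = torch.tensor([1, 0, 4])
loss = criterion(logits, targets)
print(loss.shape)  # scalar: torch.Size([])
```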

Core Classes

Parameter

torch.nn.Parameter(data, requires_grad=True)
A kind of Tensor that is automatically registered as a parameter when assigned as a Module attribute.
  • data (Tensor, required): Tensor data for the parameter
  • requires_grad (bool, default: True): Whether the parameter requires gradient computation
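This automatic registration can be seen in a small sketch (the module name is illustrative):

```python
import torch
import torch.nn as nn

class Scaler(nn.Module):
    def __init__(self):
        super().__init__()
        # Assigning a Parameter as an attribute registers it automatically.
        self.scale = nn.Parameter(torch.ones(3))

    def forward(self, x):
        return x * self.scale

m = Scaler()
print([name for name, _ in m.named_parameters()])  # ['scale']
```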

Buffer

torch.nn.Buffer(tensor)
Registers a buffer that won’t be considered a model parameter but will be saved in state_dict.
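The same effect is available via Module.register_buffer, the long-standing equivalent of nn.Buffer (which was added in more recent PyTorch releases). A minimal sketch:

```python
import torch
import torch.nn as nn

class Norm(nn.Module):
    def __init__(self):
        super().__init__()
        # Buffers are saved in state_dict but excluded from parameters().
        self.register_buffer('running_mean', torch.zeros(3))

m = Norm()
print('running_mean' in m.state_dict())  # True
print(list(m.parameters()))              # [] (buffers are not parameters)
```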

Usage Example

import torch
import torch.nn as nn

class SimpleNet(nn.Module):
    def __init__(self):
        super(SimpleNet, self).__init__()
        self.fc1 = nn.Linear(784, 128)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(128, 10)
    
    def forward(self, x):
        x = self.fc1(x)
        x = self.relu(x)
        x = self.fc2(x)
        return x

# Create model
model = SimpleNet()

# Forward pass
input_data = torch.randn(32, 784)
output = model(input_data)
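Building on the forward pass above, a single training step typically adds a loss function and an optimizer; this is a minimal sketch with synthetic data:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()

# One training step on a synthetic batch.
inputs = torch.randn(32, 784)
labels = torch.randint(0, 10, (32,))

optimizer.zero_grad()            # clear gradients from the previous step
loss = criterion(model(inputs), labels)
loss.backward()                  # populate .grad on every parameter
optimizer.step()                 # apply the gradient update
```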

Factory Kwargs

torch.nn.factory_kwargs(kwargs)
Returns a canonicalized dict of factory kwargs (device, dtype, memory_format) that can be passed to factory functions.
  • kwargs (dict): Dictionary that may contain device, dtype, memory_format, or factory_kwargs keys
Returns (dict): Canonicalized dictionary with factory kwargs
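Assuming a recent PyTorch release that exposes this helper, the canonicalized dict can be splatted into a factory function such as torch.empty:

```python
import torch
import torch.nn as nn

# Canonicalize device/dtype keyword arguments before forwarding them
# to a factory function.
fk = nn.factory_kwargs({'device': 'cpu', 'dtype': torch.float64})
t = torch.empty(2, 2, **fk)
print(t.dtype)  # torch.float64
```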

Common Patterns

Model Definition

class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        # Define layers
        self.layer1 = nn.Linear(10, 20)
        self.layer2 = nn.Linear(20, 1)
    
    def forward(self, x):
        # Define forward pass
        x = self.layer1(x)
        x = torch.relu(x)
        x = self.layer2(x)
        return x

Training Mode

# Set model to training mode
model.train()

# Set model to evaluation mode
model.eval()
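The mode switch matters for layers like Dropout and BatchNorm, which behave differently during training and inference; a small sketch with Dropout:

```python
import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)
x = torch.ones(1000)

drop.train()
print(drop(x).eq(0).any())      # typically True: elements are zeroed randomly

drop.eval()
print(torch.equal(drop(x), x))  # True: dropout is a no-op in eval mode
```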

Device Management

# Move model to GPU
model = model.to('cuda')

# Or with device object
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = model.to(device)

Best Practices

When creating custom modules, always call the parent class constructor:
class MyModule(nn.Module):
    def __init__(self):
        super().__init__()  # or super(MyModule, self).__init__()
        # Your initialization
Tensors assigned as attributes won’t be registered as parameters unless wrapped:
# Correct
self.weight = nn.Parameter(torch.randn(10, 20))

# Wrong - won't be trained
self.weight = torch.randn(10, 20)
The forward method defines the computation performed at every call:
def forward(self, x):
    # Define your forward pass
    return self.layer(x)

See Also