
Quick Start Guide

This guide will walk you through PyTorch basics with practical examples. You’ll learn about tensors, automatic differentiation, and building a simple neural network.
Make sure you have installed PyTorch before following this guide.

Working with Tensors

Tensors are the fundamental data structure in PyTorch, similar to NumPy arrays but with GPU acceleration support.
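Since tensors mirror NumPy arrays, PyTorch also provides a direct bridge between the two. A minimal sketch (on CPU, `torch.from_numpy` and `Tensor.numpy` share the same underlying memory rather than copying):

```python
import numpy as np
import torch

# NumPy array -> tensor (shares memory on CPU, no copy)
arr = np.array([[1.0, 2.0], [3.0, 4.0]])
t = torch.from_numpy(arr)

# Tensor -> NumPy array (also shares memory)
back = t.numpy()

# Mutating the tensor is visible through the array, since no copy was made
t[0, 0] = 10.0
print(arr[0, 0])  # 10.0
```

Because memory is shared, use `.clone()` first if you need an independent copy.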

Creating Tensors

Step 1: Import PyTorch

First, import the torch package:
import torch
import torch.nn as nn
import torch.optim as optim
Step 2: Create basic tensors

You can create tensors in multiple ways:
# Create a tensor from data
data = [[1, 2], [3, 4]]
x_data = torch.tensor(data)
print(f"Tensor from data:\n{x_data}")
# Output:
# tensor([[1, 2],
#         [3, 4]])

# Create random tensors
x_rand = torch.rand(2, 3)  # Uniform distribution [0, 1)
print(f"Random tensor:\n{x_rand}")

x_randn = torch.randn(2, 3)  # Normal distribution (mean=0, std=1)
print(f"Normal tensor:\n{x_randn}")

# Create tensors with specific values
x_zeros = torch.zeros(2, 3)
x_ones = torch.ones(2, 3)
print(f"Zeros:\n{x_zeros}")
print(f"Ones:\n{x_ones}")
Step 3: Tensor properties

Tensors have attributes describing their shape, datatype, and device:
x = torch.rand(3, 4)

print(f"Shape: {x.shape}")           # Shape: torch.Size([3, 4])
print(f"Datatype: {x.dtype}")        # Datatype: torch.float32
print(f"Device: {x.device}")         # Device: cpu
print(f"Requires grad: {x.requires_grad}")  # Requires grad: False
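These attributes can be changed with `Tensor.to`, which returns a new tensor rather than modifying in place; a short sketch:

```python
import torch

x = torch.rand(3, 4)

# Cast to a different dtype
x64 = x.to(torch.float64)
print(x64.dtype)  # torch.float64

# .to can also change device; this is a no-op if already there
x_cpu = x.to("cpu")
print(x_cpu.device)  # cpu

# requires_grad is toggled separately, in place
x.requires_grad_(True)
print(x.requires_grad)  # True
```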

Tensor Operations

PyTorch supports a wide variety of tensor operations:
# Arithmetic operations
a = torch.tensor([1.0, 2.0, 3.0])
b = torch.tensor([4.0, 5.0, 6.0])

print(f"Addition: {a + b}")        # tensor([5., 7., 9.])
print(f"Multiplication: {a * b}")  # tensor([4., 10., 18.])
print(f"Matrix multiply: {a @ b}") # tensor(32.) (dot product)

# Element-wise operations
print(f"Square: {torch.square(a)}")
print(f"Sqrt: {torch.sqrt(a)}")
print(f"Exp: {torch.exp(a)}")
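Beyond arithmetic, indexing, reshaping, and reductions are everyday tensor operations; a brief sketch:

```python
import torch

m = torch.arange(12, dtype=torch.float32).reshape(3, 4)

# Indexing and slicing work as in NumPy
print(m[0])      # first row: tensor([0., 1., 2., 3.])
print(m[:, 1])   # second column: tensor([1., 5., 9.])

# Reductions, over all elements or along a dimension
print(m.sum())        # tensor(66.)
print(m.mean(dim=0))  # column means: tensor([4., 5., 6., 7.])

# Reshaping (reshape copies only when the memory layout requires it)
print(m.reshape(4, 3).shape)  # torch.Size([4, 3])
```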

GPU Acceleration

PyTorch tensors can be moved to the GPU for faster computation. Operations run on the device where their inputs live, so once a tensor is on the GPU, subsequent work on it is accelerated.
# Check if CUDA is available
if torch.cuda.is_available():
    device = torch.device("cuda")
    print(f"Using GPU: {torch.cuda.get_device_name(0)}")
else:
    device = torch.device("cpu")
    print("Using CPU")

# Create tensor directly on GPU
x = torch.rand(3, 3, device=device)

# Or move existing tensor to GPU
y = torch.rand(3, 3)
y = y.to(device)

# Operations are performed on GPU
z = x + y

# Move back to CPU for numpy conversion
z_cpu = z.to("cpu")
z_numpy = z_cpu.numpy()
print(f"Result on CPU:\n{z_numpy}")

Automatic Differentiation

PyTorch’s autograd system automatically computes gradients for tensor operations, which is essential for training neural networks.
Step 1: Enable gradient tracking

Set requires_grad=True to track computations:
# Create tensors that require gradients
x = torch.tensor([2.0, 3.0], requires_grad=True)
w = torch.tensor([1.0, 2.0], requires_grad=True)
b = torch.tensor([0.5], requires_grad=True)

print(f"x requires grad: {x.requires_grad}")
print(f"w requires grad: {w.requires_grad}")
Step 2: Perform operations

All operations are recorded in the computational graph:
# Forward pass: compute y = w·x + b
y = torch.dot(w, x) + b
print(f"Output y: {y}")  # tensor([8.5000], grad_fn=<AddBackward0>)

# Apply non-linearity
z = torch.relu(y)
print(f"After ReLU z: {z}")
Step 3: Compute gradients

Call .backward() to compute gradients:
# Compute gradients
z.backward()

# Access gradients
print(f"Gradient dz/dx: {x.grad}")  # tensor([1., 2.])
print(f"Gradient dz/dw: {w.grad}")  # tensor([2., 3.])
print(f"Gradient dz/db: {b.grad}")  # tensor([1.])
Gradients accumulate by default. Always call .zero_grad() before computing new gradients in training loops.
# Example of gradient accumulation
x = torch.tensor([1.0], requires_grad=True)

for i in range(3):
    y = x ** 2
    y.backward()
    print(f"Iteration {i+1}, gradient: {x.grad}")  # 2., then 4., then 6. -- accumulates!

# Clear the accumulated gradients before the next backward pass
x.grad.zero_()
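Conversely, when gradients are not needed (e.g. during evaluation), tracking can be switched off entirely; a small sketch using `torch.no_grad` and `detach`:

```python
import torch

x = torch.tensor([2.0], requires_grad=True)

# Inside no_grad, operations are not recorded in the graph
with torch.no_grad():
    y = x * 3
print(y.requires_grad)  # False

# detach() returns a tensor cut off from the graph
z = (x * 3).detach()
print(z.requires_grad)  # False
```

Skipping graph construction saves memory and time whenever you only need forward results.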

Building a Neural Network

Let’s build a simple feedforward neural network for classification.

Define the Model

Step 1: Create a neural network class

Define your model by subclassing nn.Module:
import torch.nn as nn
import torch.nn.functional as F

class SimpleNet(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes):
        super(SimpleNet, self).__init__()
        # Define layers
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.fc2 = nn.Linear(hidden_size, hidden_size)
        self.fc3 = nn.Linear(hidden_size, num_classes)
        
    def forward(self, x):
        # Forward pass through the network
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)  # No activation on final layer for logits
        return x

# Create model instance
input_size = 784  # e.g., 28x28 images flattened
hidden_size = 128
num_classes = 10  # e.g., digits 0-9

model = SimpleNet(input_size, hidden_size, num_classes)
print(model)
Output:
SimpleNet(
  (fc1): Linear(in_features=784, out_features=128, bias=True)
  (fc2): Linear(in_features=128, out_features=128, bias=True)
  (fc3): Linear(in_features=128, out_features=10, bias=True)
)
Step 2: Inspect model parameters

View the learnable parameters:
# Count parameters
total_params = sum(p.numel() for p in model.parameters())
trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)

print(f"Total parameters: {total_params:,}")
print(f"Trainable parameters: {trainable_params:,}")

# View layer weights and biases
for name, param in model.named_parameters():
    print(f"{name}: {param.shape}")
Output:
Total parameters: 118,282
Trainable parameters: 118,282
fc1.weight: torch.Size([128, 784])
fc1.bias: torch.Size([128])
fc2.weight: torch.Size([128, 128])
fc2.bias: torch.Size([128])
fc3.weight: torch.Size([10, 128])
fc3.bias: torch.Size([10])
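For a plain stack of layers like this one, `nn.Sequential` is an equivalent, more compact alternative to subclassing `nn.Module`; a sketch (not part of the original example, but it builds the same architecture):

```python
import torch
import torch.nn as nn

# Same layers and activations as SimpleNet, without a custom class
model = nn.Sequential(
    nn.Linear(784, 128),
    nn.ReLU(),
    nn.Linear(128, 128),
    nn.ReLU(),
    nn.Linear(128, 10),
)

x = torch.randn(2, 784)
print(model(x).shape)  # torch.Size([2, 10])
```

Subclassing remains the better choice once `forward` needs branching, skip connections, or multiple inputs.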

Training Loop

Here’s a complete training example:
import torch
import torch.nn as nn
import torch.optim as optim

# Create synthetic dataset
batch_size = 32
num_samples = 1000

X_train = torch.randn(num_samples, input_size)
y_train = torch.randint(0, num_classes, (num_samples,))

# Initialize model, loss function, and optimizer
model = SimpleNet(input_size, hidden_size, num_classes)
criterion = nn.CrossEntropyLoss()  # For classification
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Training loop
num_epochs = 10

for epoch in range(num_epochs):
    # Set model to training mode
    model.train()
    
    # Mini-batch training
    for i in range(0, num_samples, batch_size):
        # Get batch
        batch_X = X_train[i:i+batch_size]
        batch_y = y_train[i:i+batch_size]
        
        # Forward pass
        outputs = model(batch_X)
        loss = criterion(outputs, batch_y)
        
        # Backward pass and optimization
        optimizer.zero_grad()  # Clear gradients
        loss.backward()        # Compute gradients
        optimizer.step()       # Update weights
    
    # Print progress (loss of the last mini-batch in the epoch)
    print(f"Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}")

print("Training complete!")
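Besides the loss, classification accuracy can be computed directly from the logits. A sketch of the pattern (the small model and data here are placeholders standing in for the trained model above):

```python
import torch
import torch.nn as nn

# Placeholder model and data for illustration
model = nn.Linear(20, 10)
X = torch.randn(100, 20)
y = torch.randint(0, 10, (100,))

model.eval()
with torch.no_grad():
    logits = model(X)
    predicted = logits.argmax(dim=1)  # index of the highest logit per sample
    accuracy = (predicted == y).float().mean().item()

print(f"Accuracy: {accuracy:.2%}")
```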

Making Predictions

# Set model to evaluation mode
model.eval()

# Create test data
X_test = torch.randn(5, input_size)

# Make predictions (no gradient computation needed)
with torch.no_grad():
    logits = model(X_test)
    probabilities = F.softmax(logits, dim=1)
    predictions = torch.argmax(probabilities, dim=1)

print(f"Predictions: {predictions}")
print(f"Probabilities:\n{probabilities}")

Model Architectures

PyTorch provides common layers for building different types of networks:

  • Fully Connected: nn.Linear(in_features, out_features), the basic feedforward layer for MLPs
  • Convolutional: nn.Conv2d(in_channels, out_channels, kernel_size), for image processing and CNNs
  • Recurrent: nn.LSTM(input_size, hidden_size) and nn.GRU(input_size, hidden_size), for sequential data and time series
  • Attention: nn.MultiheadAttention(embed_dim, num_heads), for transformers and attention mechanisms
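A quick way to get a feel for these layers is to check their expected input and output shapes; a sketch:

```python
import torch
import torch.nn as nn

# Conv2d expects (batch, channels, height, width)
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3)
img = torch.randn(1, 3, 32, 32)
print(conv(img).shape)  # torch.Size([1, 16, 30, 30])

# LSTM expects (batch, seq_len, input_size) with batch_first=True
lstm = nn.LSTM(input_size=8, hidden_size=32, batch_first=True)
seq = torch.randn(4, 10, 8)
out, (h, c) = lstm(seq)  # out holds the hidden state at every timestep
print(out.shape)  # torch.Size([4, 10, 32])
```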

Common Activation Functions

Available in torch.nn and torch.nn.functional:
# Activation functions
F.relu(x)        # Rectified Linear Unit
F.gelu(x)        # Gaussian Error Linear Unit
torch.sigmoid(x)  # Sigmoid (F.sigmoid is deprecated)
torch.tanh(x)     # Hyperbolic Tangent (F.tanh is deprecated)
F.softmax(x, dim=1)  # Softmax (for probabilities)
F.log_softmax(x, dim=1)  # Log Softmax (numerically stable)

# As layers
nn.ReLU()
nn.GELU()
nn.Sigmoid()
nn.Tanh()
nn.Softmax(dim=1)
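The functional and module forms compute the same thing: the module form slots into `nn.Sequential`, while the functional form suits `forward` methods. A quick check:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.randn(2, 5)

# Functional and module forms give identical results
assert torch.equal(F.relu(x), nn.ReLU()(x))
assert torch.equal(F.softmax(x, dim=1), nn.Softmax(dim=1)(x))

# Softmax rows sum to 1, as expected for probabilities
print(F.softmax(x, dim=1).sum(dim=1))  # each row sums to 1
```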

Next Steps

Congratulations! You’ve learned the basics of PyTorch. Here’s what to explore next:

  • Tutorials: Official PyTorch tutorials covering computer vision, NLP, and more
  • API Reference: Complete API documentation for all PyTorch modules
  • Examples: Production-ready example implementations
  • Forums: Get help from the PyTorch community
Performance Tips:
  • Use GPU acceleration with .to(device) or .cuda()
  • Enable mixed precision training with torch.cuda.amp for faster training
  • Use torch.jit.script() or torch.jit.trace() to compile models for production
  • Profile your code with torch.profiler to identify bottlenecks
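As an illustration of the mixed precision tip, here is a minimal device-agnostic sketch of a training step with `torch.autocast` and a gradient scaler (the tiny model and data are placeholders; on a CPU-only machine both features simply disable themselves):

```python
import torch
import torch.nn as nn

use_cuda = torch.cuda.is_available()
device = torch.device("cuda" if use_cuda else "cpu")

# Placeholder model and batch for illustration
model = nn.Linear(10, 2).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()
scaler = torch.cuda.amp.GradScaler(enabled=use_cuda)

x = torch.randn(4, 10, device=device)
y = torch.randint(0, 2, (4,), device=device)

for _ in range(3):
    optimizer.zero_grad()
    # Run the forward pass in reduced precision where it is safe to do so
    with torch.autocast(device_type=device.type, enabled=use_cuda):
        loss = criterion(model(x), y)
    # Scale the loss so float16 gradients do not underflow
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()

print(f"Final loss: {loss.item():.4f}")
```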