Quick Start Guide
This guide will walk you through PyTorch basics with practical examples. You’ll learn about tensors, automatic differentiation, and building a simple neural network.
Working with Tensors
Tensors are the fundamental data structure in PyTorch, similar to NumPy arrays but with GPU acceleration support.
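The NumPy analogy is direct: tensors convert to and from NumPy arrays, and on CPU the two share the same underlying memory. A quick sketch:

```python
import numpy as np
import torch

# NumPy array -> tensor (shares memory on CPU)
arr = np.array([1.0, 2.0, 3.0])
t = torch.from_numpy(arr)

# Tensor -> NumPy array (also shares memory on CPU)
back = t.numpy()

# Mutating the tensor is visible through the shared array
t[0] = 10.0
print(arr[0])  # 10.0
```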
Creating Tensors
Import PyTorch
First, import the torch package along with the commonly used `torch.nn` and `torch.optim` modules:

```python
import torch
import torch.nn as nn
import torch.optim as optim
```
Create basic tensors
You can create tensors in multiple ways:

```python
# Create a tensor from data
data = [[1, 2], [3, 4]]
x_data = torch.tensor(data)
print(f"Tensor from data:\n{x_data}")
# Output:
# tensor([[1, 2],
#         [3, 4]])

# Create random tensors
x_rand = torch.rand(2, 3)    # Uniform distribution [0, 1)
print(f"Random tensor:\n{x_rand}")
x_randn = torch.randn(2, 3)  # Normal distribution (mean=0, std=1)
print(f"Normal tensor:\n{x_randn}")

# Create tensors with specific values
x_zeros = torch.zeros(2, 3)
x_ones = torch.ones(2, 3)
print(f"Zeros:\n{x_zeros}")
print(f"Ones:\n{x_ones}")
```
Tensor properties
Tensors have attributes describing their shape, datatype, and device:

```python
x = torch.rand(3, 4)
print(f"Shape: {x.shape}")                  # Shape: torch.Size([3, 4])
print(f"Datatype: {x.dtype}")               # Datatype: torch.float32
print(f"Device: {x.device}")                # Device: cpu
print(f"Requires grad: {x.requires_grad}")  # Requires grad: False
```
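These attributes can be changed after creation: `.to()` accepts a dtype (or a device, covered below), and casting returns a new tensor. A small sketch:

```python
import torch

x = torch.rand(3, 4)        # float32 by default
x64 = x.to(torch.float64)   # cast to double precision
xi = x64.long()             # cast to int64 (truncates toward zero)

print(x.dtype, x64.dtype, xi.dtype)
# torch.float32 torch.float64 torch.int64
```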
Tensor Operations
PyTorch supports a wide variety of tensor operations:
Basic Operations
```python
# Arithmetic operations
a = torch.tensor([1.0, 2.0, 3.0])
b = torch.tensor([4.0, 5.0, 6.0])
print(f"Addition: {a + b}")         # tensor([5., 7., 9.])
print(f"Multiplication: {a * b}")   # tensor([4., 10., 18.])
print(f"Matrix multiply: {a @ b}")  # tensor(32.) (dot product)

# Element-wise operations
print(f"Square: {torch.square(a)}")
print(f"Sqrt: {torch.sqrt(a)}")
print(f"Exp: {torch.exp(a)}")
```
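Beyond arithmetic, a few other everyday operations are worth knowing: matrix multiplication, NumPy-style indexing and slicing, and reshaping. A quick sketch:

```python
import torch

m = torch.arange(6.0).reshape(2, 3)  # [[0., 1., 2.], [3., 4., 5.]]

# Matrix multiplication: (2x3) @ (3x2) -> (2x2)
prod = m @ m.T

# Indexing and slicing work like NumPy
row = m[0]     # first row: tensor([0., 1., 2.])
col = m[:, 1]  # second column: tensor([1., 4.])

# Reshaping changes shape without copying data (when possible)
flat = m.reshape(-1)   # tensor([0., 1., 2., 3., 4., 5.])
unsq = m.unsqueeze(0)  # adds a leading dimension: shape (1, 2, 3)
print(prod.shape, flat.shape, unsq.shape)
```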
GPU Acceleration
PyTorch tensors can be moved to the GPU for faster computation; operations on GPU tensors then run on the GPU automatically.
```python
# Check if CUDA is available
if torch.cuda.is_available():
    device = torch.device("cuda")
    print(f"Using GPU: {torch.cuda.get_device_name(0)}")
else:
    device = torch.device("cpu")
    print("Using CPU")

# Create a tensor directly on the GPU
x = torch.rand(3, 3, device=device)

# Or move an existing tensor to the GPU
y = torch.rand(3, 3)
y = y.to(device)

# Operations run on whichever device the tensors live on
z = x + y

# Move back to CPU for NumPy conversion
z_cpu = z.to("cpu")
z_numpy = z_cpu.numpy()
print(f"Result on CPU:\n{z_numpy}")
```
Automatic Differentiation
PyTorch’s autograd system automatically computes gradients for tensor operations, which is essential for training neural networks.
Enable gradient tracking
Set `requires_grad=True` to track computations:

```python
# Create tensors that require gradients
x = torch.tensor([2.0, 3.0], requires_grad=True)
w = torch.tensor([1.0, 2.0], requires_grad=True)
b = torch.tensor([0.5], requires_grad=True)
print(f"x requires grad: {x.requires_grad}")
print(f"w requires grad: {w.requires_grad}")
```
Perform operations
All operations are recorded in the computational graph:

```python
# Forward pass: compute y = w·x + b
y = torch.dot(w, x) + b
print(f"Output y: {y}")  # tensor([8.5000], grad_fn=<AddBackward0>)

# Apply a non-linearity
z = torch.relu(y)
print(f"After ReLU z: {z}")
```
Compute gradients
Call `.backward()` to compute gradients:

```python
# Compute gradients
z.backward()

# Access gradients
print(f"Gradient dz/dx: {x.grad}")  # tensor([1., 2.])
print(f"Gradient dz/dw: {w.grad}")  # tensor([2., 3.])
print(f"Gradient dz/db: {b.grad}")  # tensor([1.])
```
Gradients accumulate by default. Always clear them before computing new gradients: call `optimizer.zero_grad()` in training loops, or `tensor.grad.zero_()` on a raw tensor.
```python
# Example of gradient accumulation
x = torch.tensor([1.0], requires_grad=True)
for i in range(3):
    y = x ** 2
    y.backward()
    print(f"Iteration {i + 1}, gradient: {x.grad}")  # Accumulates: 2., 4., 6.

# Clear the accumulated gradients before the next computation
x.grad.zero_()
```
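Conversely, when you do not want operations tracked at all, for example when logging intermediate values, wrap them in `torch.no_grad()` or cut a tensor out of the graph with `.detach()`. A small sketch:

```python
import torch

x = torch.tensor([1.0], requires_grad=True)

# Operations inside no_grad() are not recorded in the graph
with torch.no_grad():
    y = x * 2
print(y.requires_grad)  # False

# detach() returns the same values, disconnected from the graph
z = (x * 2).detach()
print(z.requires_grad)  # False
```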
Building a Neural Network
Let’s build a simple feedforward neural network for classification.
Define the Model
Create a neural network class
Define your model by subclassing `nn.Module`:

```python
import torch.nn as nn
import torch.nn.functional as F

class SimpleNet(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes):
        super(SimpleNet, self).__init__()
        # Define layers
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.fc2 = nn.Linear(hidden_size, hidden_size)
        self.fc3 = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        # Forward pass through the network
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)  # No activation on the final layer: return logits
        return x

# Create a model instance
input_size = 784   # e.g., 28x28 images, flattened
hidden_size = 128
num_classes = 10   # e.g., digits 0-9
model = SimpleNet(input_size, hidden_size, num_classes)
print(model)
```
Output:

```
SimpleNet(
  (fc1): Linear(in_features=784, out_features=128, bias=True)
  (fc2): Linear(in_features=128, out_features=128, bias=True)
  (fc3): Linear(in_features=128, out_features=10, bias=True)
)
```
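For a plain stack of layers like this, the same architecture can also be written more compactly with `nn.Sequential` (an equivalent sketch; subclassing `nn.Module` remains the more flexible option once the forward pass needs branching or reuse):

```python
import torch.nn as nn

# Same architecture as SimpleNet(784, 128, 10), expressed as a layer stack
model = nn.Sequential(
    nn.Linear(784, 128),
    nn.ReLU(),
    nn.Linear(128, 128),
    nn.ReLU(),
    nn.Linear(128, 10),  # logits, no final activation
)
print(model)
```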
Inspect model parameters
View the learnable parameters:

```python
# Count parameters
total_params = sum(p.numel() for p in model.parameters())
trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Total parameters: {total_params:,}")
print(f"Trainable parameters: {trainable_params:,}")

# View layer weights and biases
for name, param in model.named_parameters():
    print(f"{name}: {param.shape}")
```
Output:

```
Total parameters: 118,282
Trainable parameters: 118,282
fc1.weight: torch.Size([128, 784])
fc1.bias: torch.Size([128])
fc2.weight: torch.Size([128, 128])
fc2.bias: torch.Size([128])
fc3.weight: torch.Size([10, 128])
fc3.bias: torch.Size([10])
```
Training Loop
Here’s a complete training example:
```python
import torch
import torch.nn as nn
import torch.optim as optim

# Create a synthetic dataset
batch_size = 32
num_samples = 1000
X_train = torch.randn(num_samples, input_size)
y_train = torch.randint(0, num_classes, (num_samples,))

# Initialize the model, loss function, and optimizer
model = SimpleNet(input_size, hidden_size, num_classes)
criterion = nn.CrossEntropyLoss()  # For classification
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Training loop
num_epochs = 10
for epoch in range(num_epochs):
    # Set the model to training mode
    model.train()

    # Mini-batch training
    for i in range(0, num_samples, batch_size):
        # Get a batch
        batch_X = X_train[i:i + batch_size]
        batch_y = y_train[i:i + batch_size]

        # Forward pass
        outputs = model(batch_X)
        loss = criterion(outputs, batch_y)

        # Backward pass and optimization
        optimizer.zero_grad()  # Clear gradients
        loss.backward()        # Compute gradients
        optimizer.step()       # Update weights

    # Print progress
    print(f"Epoch [{epoch + 1}/{num_epochs}], Loss: {loss.item():.4f}")

print("Training complete!")
```
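After training, you would typically measure accuracy on held-out data. A minimal sketch (using an untrained stand-in model and synthetic data, so the number itself is meaningless):

```python
import torch
import torch.nn as nn

model = nn.Linear(784, 10)  # stand-in for a trained model
X_val = torch.randn(200, 784)
y_val = torch.randint(0, 10, (200,))

model.eval()            # disable train-time behavior (dropout, batchnorm updates)
with torch.no_grad():   # no computational graph needed for evaluation
    preds = model(X_val).argmax(dim=1)
    accuracy = (preds == y_val).float().mean().item()

print(f"Validation accuracy: {accuracy:.2%}")
```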
Making Predictions
Inference
```python
# Set the model to evaluation mode
model.eval()

# Create test data
X_test = torch.randn(5, input_size)

# Make predictions (no gradient computation needed)
with torch.no_grad():
    logits = model(X_test)
    probabilities = F.softmax(logits, dim=1)
    predictions = torch.argmax(probabilities, dim=1)

print(f"Predictions: {predictions}")
print(f"Probabilities:\n{probabilities}")
```
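To persist a trained model, the usual pattern is to save its `state_dict` (the learned parameters) rather than the whole model object. A sketch, writing to a hypothetical `model.pt` path:

```python
import torch
import torch.nn as nn

model = nn.Linear(784, 10)  # stand-in for a trained model

# Save only the learned parameters (recommended)
torch.save(model.state_dict(), "model.pt")

# To load, rebuild the same architecture, then restore the weights
restored = nn.Linear(784, 10)
restored.load_state_dict(torch.load("model.pt"))
restored.eval()  # remember to switch to eval mode for inference
```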
Model Architectures
PyTorch provides common layers for building different types of networks:
- Fully Connected: `nn.Linear(in_features, out_features)`, the basic feedforward layer for MLPs
- Convolutional: `nn.Conv2d(in_channels, out_channels, kernel_size)`, for image processing and CNNs
- Recurrent: `nn.LSTM(input_size, hidden_size)` and `nn.GRU(input_size, hidden_size)`, for sequential data and time series
- Attention: `nn.MultiheadAttention(embed_dim, num_heads)`, for transformers and attention mechanisms
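As a quick illustration of how these layers compose, a minimal (untrained) convolutional block for 28x28 grayscale images might look like:

```python
import torch
import torch.nn as nn

cnn = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1),  # 1x28x28 -> 16x28x28
    nn.ReLU(),
    nn.MaxPool2d(2),                             # 16x28x28 -> 16x14x14
    nn.Flatten(),
    nn.Linear(16 * 14 * 14, 10),                 # logits for 10 classes
)

x = torch.randn(8, 1, 28, 28)  # batch of 8 fake images
print(cnn(x).shape)            # torch.Size([8, 10])
```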
Common Activation Functions
Available in torch.nn and torch.nn.functional:
```python
# As functions
F.relu(x)                # Rectified Linear Unit
F.gelu(x)                # Gaussian Error Linear Unit
torch.sigmoid(x)         # Sigmoid (F.sigmoid is deprecated)
torch.tanh(x)            # Hyperbolic tangent (F.tanh is deprecated)
F.softmax(x, dim=1)      # Softmax (for probabilities)
F.log_softmax(x, dim=1)  # Log-softmax (more numerically stable)

# As layers
nn.ReLU()
nn.GELU()
nn.Sigmoid()
nn.Tanh()
nn.Softmax(dim=1)
```
Next Steps
Congratulations! You’ve learned the basics of PyTorch. Here’s what to explore next:
- Tutorials: official PyTorch tutorials covering computer vision, NLP, and more
- API Reference: complete API documentation for all PyTorch modules
- Examples: production-ready example implementations
- Forums: get help from the PyTorch community
Performance Tips:
- Use GPU acceleration with `.to(device)` or `.cuda()`
- Enable mixed-precision training with `torch.cuda.amp` for faster training
- Use `torch.jit.script()` or `torch.jit.trace()` to compile models for production
- Profile your code with `torch.profiler` to identify bottlenecks
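A minimal mixed-precision training step might look like the sketch below. The `enabled` flag ties autocast and the gradient scaler to CUDA availability, so the code falls back to ordinary full precision on CPU (the model, data, and hyperparameters here are placeholders):

```python
import torch
import torch.nn as nn

use_cuda = torch.cuda.is_available()
device = torch.device("cuda" if use_cuda else "cpu")

model = nn.Linear(10, 2).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
scaler = torch.cuda.amp.GradScaler(enabled=use_cuda)

x = torch.randn(4, 10, device=device)
target = torch.randint(0, 2, (4,), device=device)

optimizer.zero_grad()
with torch.cuda.amp.autocast(enabled=use_cuda):
    # Forward pass runs in reduced precision where safe
    loss = nn.functional.cross_entropy(model(x), target)
scaler.scale(loss).backward()  # scale the loss to avoid fp16 underflow
scaler.step(optimizer)         # unscale gradients, then update weights
scaler.update()
```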