Tensors are the fundamental data structure in PyTorch: multi-dimensional arrays that generalize vectors and matrices to higher dimensions. They are similar to NumPy ndarrays, but they can also run on GPUs to accelerate computation and support automatic differentiation for building neural networks. In PyTorch, tensors encode a model's inputs, outputs, and parameters.
In deep learning, understanding tensor shapes is crucial. A batch of 32 RGB images of size 224x224 has shape (32, 3, 224, 224): batch_size × channels × height × width.
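As a quick sanity check, a small sketch with a hypothetical batch tensor confirms this layout:

```python
import torch

# A hypothetical batch of 32 RGB images, each 224x224 pixels
images = torch.randn(32, 3, 224, 224)

print(images.shape)     # torch.Size([32, 3, 224, 224])
print(images.shape[0])  # 32 -- batch size
print(images.shape[1])  # 3  -- color channels
```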
```python
import torch

x = torch.randn(3, 4)

# Sum
total = x.sum()          # Sum all elements
row_sum = x.sum(dim=1)   # Sum along columns (per row), shape: (3,)
col_sum = x.sum(dim=0)   # Sum along rows (per column), shape: (4,)

# Mean, max, min
mean_val = x.mean()
max_val = x.max()
min_val = x.min()

# Along specific dimensions
max_per_row, indices = x.max(dim=1)  # Returns values and indices
```
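One detail worth noting: reductions drop the reduced dimension by default. Passing keepdim=True preserves it, which makes broadcasting the result back against the original tensor straightforward. A small sketch:

```python
import torch

torch.manual_seed(0)
x = torch.randn(3, 4)

row_sum = x.sum(dim=1)                      # shape: (3,) -- dimension 1 dropped
row_sum_kept = x.sum(dim=1, keepdim=True)   # shape: (3, 1) -- dimension kept

# keepdim simplifies broadcasting, e.g. dividing each row by its own sum:
normalized = x / x.sum(dim=1, keepdim=True)
```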
One of PyTorch’s key features is seamless GPU support for accelerating tensor computations.
GPU Memory Management
When working with GPUs, be mindful of memory usage. Large tensors can quickly consume GPU memory. Use torch.cuda.empty_cache() to free unused cached memory, and consider using mixed precision training with torch.cuda.amp for memory efficiency.
```python
import torch

# Check number of GPUs
num_gpus = torch.cuda.device_count()
print(f"Available GPUs: {num_gpus}")

# Use specific GPU
if num_gpus > 1:
    x = torch.randn(100, 100, device="cuda:0")  # GPU 0
    y = torch.randn(100, 100, device="cuda:1")  # GPU 1

    # Move to same device for operations
    y = y.to("cuda:0")
    z = x @ y
```
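The mixed-precision workflow mentioned above can be sketched as a minimal training step. This is an illustration only, assuming a CUDA device is available; the model, optimizer, and data here are placeholders:

```python
import torch

# Placeholder model and optimizer for illustration
model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

if torch.cuda.is_available():
    model = model.cuda()
    scaler = torch.cuda.amp.GradScaler()
    inputs = torch.randn(8, 10, device="cuda")
    targets = torch.randn(8, 1, device="cuda")

    optimizer.zero_grad()
    with torch.cuda.amp.autocast():   # run the forward pass in mixed precision
        loss = torch.nn.functional.mse_loss(model(inputs), targets)
    scaler.scale(loss).backward()     # scale the loss to avoid fp16 underflow
    scaler.step(optimizer)            # unscale gradients, then optimizer step
    scaler.update()
```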
Operations with a trailing underscore (such as add_) modify tensors in-place:
In-place operations save memory but can cause issues with autograd. Avoid using them on tensors that require gradients unless you know what you’re doing.
```python
import torch

x = torch.tensor([1.0, 2.0, 3.0])

# Regular operation (creates new tensor)
y = x.add(5)     # x unchanged, y = [6., 7., 8.]

# In-place operation (modifies x)
x.add_(5)        # x = [6., 7., 8.]

# Other in-place operations
x.mul_(2)        # Multiply in-place
x.zero_()        # Fill with zeros
x.normal_(0, 1)  # Fill with random normal values
```
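The autograd pitfall mentioned above is easy to reproduce: modifying a leaf tensor that requires gradients in place raises a RuntimeError.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

try:
    x.add_(5)  # in-place op on a leaf tensor that requires grad
except RuntimeError as e:
    print(f"Autograd error: {e}")  # x is left unchanged
```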
```python
import torch

x = torch.randn(3, 4, 5)

# Shape
print(x.shape)            # torch.Size([3, 4, 5])
print(x.size())           # Same as x.shape

# Data type
print(x.dtype)            # torch.float32

# Device
print(x.device)           # cpu or cuda:0, etc.

# Number of dimensions
print(x.ndim)             # 3

# Total number of elements
print(x.numel())          # 60 (3 * 4 * 5)

# Memory layout
print(x.is_contiguous())  # True or False
```
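The is_contiguous() flag matters in practice: operations like transpose() return a view over the same memory rather than a copy, and the result is no longer contiguous. view() requires contiguous memory, so such a tensor must first be made contiguous (or reshaped with reshape()). A small sketch:

```python
import torch

x = torch.randn(3, 4)
t = x.t()                 # transpose returns a view, not a copy

print(t.is_contiguous())  # False -- memory is no longer row-major

# view() needs contiguous memory; .contiguous() copies into a fresh layout
flat = t.contiguous().view(12)
print(flat.shape)         # torch.Size([12])
```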