
Overview

A torch.Tensor is a multi-dimensional matrix containing elements of a single data type. PyTorch defines 10 tensor types with CPU and GPU variants.

Tensor Creation

torch.tensor(data, dtype=None, device=None, requires_grad=False)

Creates a tensor from data.
  • data (array_like, required): Initial tensor data.
  • dtype (torch.dtype, optional): Desired data type. If None, inferred from data.
  • device (torch.device, optional): Desired device (cpu, cuda, mps, etc.).
  • requires_grad (bool, default False): If True, gradients will be computed for this tensor.
import torch

# From Python list
x = torch.tensor([[1, 2], [3, 4]])

# With gradient tracking (note: torch.tensor, not the torch.Tensor
# constructor, which does not accept requires_grad)
x = torch.tensor([1.0, 2.0], requires_grad=True)
Tensor.new_tensor(data, dtype=None, device=None, requires_grad=False)

Returns a new Tensor with data as the tensor data. The returned Tensor always copies data.
  • data (array_like, required): Initial data for the returned tensor.
  • dtype (torch.dtype, optional): The desired data type. Default: same as this tensor.
  • device (torch.device, optional): The desired device. Default: same as this tensor.
  • requires_grad (bool, default False): If autograd should record operations on the returned tensor.
>>> tensor = torch.ones((2,), dtype=torch.int8)
>>> data = [[0, 1], [2, 3]]
>>> tensor.new_tensor(data)
tensor([[0, 1],
        [2, 3]], dtype=torch.int8)
Tensor.new_zeros(size, dtype=None)

Returns a Tensor of shape size filled with 0. By default, the result has the same dtype and device as this tensor.
  • size (int..., required): Shape of the output tensor.
  • dtype (torch.dtype, optional): Default: same as this tensor.
>>> tensor = torch.tensor((), dtype=torch.float64)
>>> tensor.new_zeros((2, 3))
tensor([[0., 0., 0.],
        [0., 0., 0.]], dtype=torch.float64)
Tensor.new_ones(size, dtype=None)

Returns a Tensor of shape size filled with 1. By default, the result has the same dtype and device as this tensor.
  • size (int..., required): Shape of the output tensor.
>>> tensor = torch.tensor((), dtype=torch.int32)
>>> tensor.new_ones((2, 3))
tensor([[1, 1, 1],
        [1, 1, 1]], dtype=torch.int32)
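The point of the new_* family is that the result matches the source tensor's dtype and device unless you override them. A minimal sketch:

```python
import torch

# Sketch: new_zeros/new_ones inherit dtype (and device) from the source
# tensor unless explicitly overridden.
base = torch.ones((2,), dtype=torch.float64)

a = base.new_zeros((2, 2))                    # inherits float64
b = base.new_ones((2, 2), dtype=torch.int32)  # explicit override

print(a.dtype)  # torch.float64
print(b.dtype)  # torch.int32
```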

Tensor Properties

Tensor.shape

Returns the size of the tensor.
  • shape (torch.Size): A tuple-like object of integers representing the tensor's dimensions.
>>> x = torch.randn(3, 4, 5)
>>> x.shape
torch.Size([3, 4, 5])
Tensor.dtype

Returns the data type of the tensor.
  • dtype (torch.dtype): Data type (torch.float32, torch.int64, etc.).
Available dtypes:
  • Float: torch.float16, torch.float32, torch.float64, torch.bfloat16
  • Integer: torch.int8, torch.int16, torch.int32, torch.int64
  • Unsigned: torch.uint8, torch.uint16, torch.uint32, torch.uint64
  • Complex: torch.complex64, torch.complex128
  • Boolean: torch.bool
>>> x = torch.randn(3, 4)
>>> x.dtype
torch.float32
Tensor.device

Returns the device where the tensor is located.
  • device (torch.device): Device object (cpu, cuda:0, cuda:1, etc.).
>>> x = torch.randn(3, 4, device='cuda:0')
>>> x.device
device(type='cuda', index=0)
Tensor.ndim

Returns the number of dimensions.
  • ndim (int): Number of tensor dimensions.
>>> x = torch.randn(3, 4, 5)
>>> x.ndim
3
Tensor.numel()

Returns the total number of elements in the tensor.
  • returns (int): Total number of elements.
>>> x = torch.randn(3, 4, 5)
>>> x.numel()
60
Tensor.requires_grad

Returns True if gradients need to be computed for this Tensor.
  • requires_grad (bool): Whether gradient tracking is enabled.
>>> x = torch.randn(3, 4, requires_grad=True)
>>> x.requires_grad
True

Mathematical Operations

Tensor.abs()

Computes the absolute value of each element.
  • returns (Tensor): Tensor with absolute values.
Formula: out_i = |input_i|
>>> x = torch.tensor([-1, -2, 3])
>>> x.abs()
tensor([1, 2, 3])
Tensor.add(other, alpha=1)

Adds other to self, scaled by alpha.
  • other (Tensor or Number, required): The tensor or number to add.
  • alpha (Number, default 1): Multiplier for other.
Formula: out = self + alpha × other
>>> x = torch.tensor([1, 2, 3])
>>> x.add(10)
tensor([11, 12, 13])

>>> y = torch.tensor([4, 5, 6])
>>> x.add(y, alpha=2)
tensor([9, 12, 15])
Tensor.matmul(other)

Matrix multiplication of self with other.
  • other (Tensor, required): The tensor to multiply with.
  • returns (Tensor): Result of matrix multiplication.
Shapes:
  • 1D × 1D → scalar
  • 2D × 2D → 2D matrix
  • (batch, n, m) × (batch, m, p) → (batch, n, p)
>>> a = torch.randn(2, 3)
>>> b = torch.randn(3, 4)
>>> a.matmul(b).shape
torch.Size([2, 4])
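The shape rules above can be checked on deterministic inputs; the @ operator is equivalent to matmul. A quick sketch:

```python
import torch

# Sketch of matmul shape behavior on all-ones tensors.
v = torch.ones(3)
m = torch.ones(3, 4)
batch_a = torch.ones(5, 2, 3)
batch_b = torch.ones(5, 3, 4)

dot = v.matmul(v)        # 1D x 1D -> 0-dim scalar tensor
mat = m.t() @ m          # (4, 3) x (3, 4) -> (4, 4)
bat = batch_a @ batch_b  # batched: (5, 2, 3) x (5, 3, 4) -> (5, 2, 4)

print(dot.item())   # 3.0
print(mat.shape)    # torch.Size([4, 4])
print(bat.shape)    # torch.Size([5, 2, 4])
```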
Tensor.sum(dim=None, keepdim=False, dtype=None)

Returns the sum of all elements, or along the given dimension(s).
  • dim (int or tuple of ints, optional): Dimension(s) to reduce. If None, reduces all dimensions.
  • keepdim (bool, default False): Whether the output retains dim with size 1.
  • dtype (torch.dtype, optional): Desired data type of the output.
>>> x = torch.randn(4, 4)
>>> x.sum()
tensor(-1.2345)

>>> x.sum(dim=0, keepdim=True)
tensor([[-0.5, 1.2, -0.8, 0.3]])
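The dim/keepdim interaction is easiest to see on a tensor with known values:

```python
import torch

# Deterministic sketch of dim/keepdim behavior.
x = torch.arange(6).reshape(2, 3)  # [[0, 1, 2], [3, 4, 5]]

total = x.sum()                          # tensor(15)
cols = x.sum(dim=0)                      # reduces rows -> shape (3,)
cols_kept = x.sum(dim=0, keepdim=True)   # same values, shape (1, 3)

print(total.item())     # 15
print(cols.tolist())    # [3, 5, 7]
print(cols_kept.shape)  # torch.Size([1, 3])
```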
Tensor.mean(dim=None, keepdim=False)

Returns the mean of all elements, or along the given dimension(s).
  • dim (int or tuple of ints, optional): Dimension(s) to reduce.
  • keepdim (bool, default False): Whether the output retains dim with size 1.
>>> x = torch.randn(4, 4)
>>> x.mean()
tensor(0.1234)

>>> x.mean(dim=1)
tensor([0.2, -0.5, 0.8, -0.1])
Tensor.max(dim=None, keepdim=False)

Returns the maximum of all elements, or along a dimension.
  • dim (int, optional): The dimension to reduce.
  • keepdim (bool, default False): Whether the output retains dim with size 1.
  • values (Tensor): Maximum values.
  • indices (LongTensor): Indices of the maximum values (only returned when dim is specified).
>>> x = torch.randn(4, 4)
>>> x.max()
tensor(2.4567)

>>> values, indices = x.max(dim=1)
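With dim specified, max returns a (values, indices) named tuple; a deterministic sketch:

```python
import torch

# Sketch: per-row maximum with the index of each maximum.
x = torch.tensor([[1, 9, 3],
                  [7, 2, 8]])

values, indices = x.max(dim=1)
print(values.tolist())   # [9, 8]
print(indices.tolist())  # [1, 2]
```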

Shape Manipulation

Tensor.reshape(*shape)

Returns a tensor with the same data but a different shape.
  • *shape (int, required): The desired shape.
  • returns (Tensor): Reshaped tensor (a view if possible, otherwise a copy).
>>> x = torch.randn(4, 6)
>>> x.reshape(2, 12).shape
torch.Size([2, 12])

>>> x.reshape(-1, 3).shape  # -1 inferred
torch.Size([8, 3])
Tensor.view(*shape)

Returns a new tensor with the same data but a different shape.
  • *shape (int, required): The desired shape.
  • returns (Tensor): Tensor view (shares storage with the original).
Note: the new shape must be compatible with the tensor's memory layout; a contiguous tensor always qualifies. Otherwise call contiguous() first or use reshape().
>>> x = torch.randn(4, 6)
>>> x.view(2, 12).shape
torch.Size([2, 12])
Tensor.transpose(dim0, dim1)

Returns a tensor with two dimensions swapped.
  • dim0 (int, required): First dimension to transpose.
  • dim1 (int, required): Second dimension to transpose.
>>> x = torch.randn(2, 3, 4)
>>> x.transpose(0, 2).shape
torch.Size([4, 3, 2])
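Note that transpose returns a view with permuted strides, which is usually non-contiguous, so view() will not work on it directly. A sketch of the interaction:

```python
import torch

# Sketch: transpose produces a non-contiguous view; call contiguous()
# before view(), or use reshape(), which copies only when it must.
x = torch.randn(2, 3)
t = x.transpose(0, 1)          # shape (3, 2), non-contiguous

print(t.is_contiguous())       # False
flat = t.contiguous().view(6)  # works after contiguous()
same = t.reshape(6)            # reshape handles this internally

print(flat.shape)  # torch.Size([6])
```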
Tensor.permute(*dims)

Returns a view with the dimensions permuted.
  • *dims (int, required): Desired ordering of dimensions.
>>> x = torch.randn(2, 3, 4)
>>> x.permute(2, 0, 1).shape
torch.Size([4, 2, 3])
Tensor.squeeze(dim=None)

Returns a tensor with all dimensions of size 1 removed.
  • dim (int, optional): If given, squeeze only this dimension, and only if its size is 1.
>>> x = torch.randn(2, 1, 3, 1)
>>> x.squeeze().shape
torch.Size([2, 3])

>>> x.squeeze(1).shape
torch.Size([2, 3, 1])
Tensor.unsqueeze(dim)

Returns a tensor with a dimension of size 1 inserted at the specified position.
  • dim (int, required): Index at which to insert the dimension.
>>> x = torch.randn(2, 3)
>>> x.unsqueeze(0).shape
torch.Size([1, 2, 3])

>>> x.unsqueeze(1).shape
torch.Size([2, 1, 3])
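A common use of unsqueeze is adding singleton dimensions so broadcasting can align two shapes, for example building an outer product from two 1D tensors. A sketch:

```python
import torch

# Sketch: (2, 1) * (1, 3) broadcasts to (2, 3), giving the outer product.
row = torch.tensor([1, 2, 3])
col = torch.tensor([10, 20])

outer = col.unsqueeze(1) * row.unsqueeze(0)
print(outer.tolist())  # [[10, 20, 30], [20, 40, 60]]
```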

Device Transfer

Tensor.to(device=None, dtype=None, non_blocking=False)

Performs Tensor dtype and/or device conversion.
  • device (torch.device or str, optional): Target device.
  • dtype (torch.dtype, optional): Target data type.
  • non_blocking (bool, default False): If True, tries to convert asynchronously with respect to the host.
# Move to GPU
>>> x_gpu = x.to('cuda')

# Change dtype
>>> x_fp16 = x.to(torch.float16)

# Both
>>> x_gpu_fp16 = x.to(device='cuda', dtype=torch.float16)
Tensor.cuda(device=None, non_blocking=False)

Returns a copy of this tensor in CUDA memory.
  • device (int, optional): The destination GPU device. Defaults to the current CUDA device.
  • non_blocking (bool, default False): If True, tries to convert asynchronously.
>>> x_cpu = torch.randn(3, 4)
>>> x_gpu = x_cpu.cuda()
>>> x_gpu.device
device(type='cuda', index=0)
Tensor.cpu()

Returns a copy of this tensor in CPU memory.
>>> x_gpu = torch.randn(3, 4, device='cuda')
>>> x_cpu = x_gpu.cpu()
>>> x_cpu.device
device(type='cpu')
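A common device-agnostic pattern (one sketch among several options): pick the best available device once, then create tensors there directly or move them with .to().

```python
import torch

# Sketch: select the device once, use it everywhere.
device = "cuda" if torch.cuda.is_available() else "cpu"

x = torch.randn(3, 4, device=device)  # create directly on the target device
y = torch.ones(3, 4).to(device)       # or move an existing tensor

z = x + y  # both operands live on the same device
print(z.device.type)  # 'cuda' or 'cpu', depending on availability
```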

Gradient Operations

Tensor.requires_grad_(requires_grad=True)

Change if autograd should record operations on this tensor.
  • requires_grad (bool, default True): Whether to record operations.
  • returns (Tensor): self, modified in-place.
>>> x = torch.randn(3, 4)
>>> x.requires_grad_(True)
>>> x.requires_grad
True
Tensor.detach()

Returns a new Tensor detached from the current graph.
  • returns (Tensor): Detached tensor (shares storage, but no gradient tracking).
>>> x = torch.randn(3, 4, requires_grad=True)
>>> y = x.detach()
>>> y.requires_grad
False
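Because the detached tensor shares storage with the original, in-place edits to one are visible through the other. A deterministic sketch:

```python
import torch

# Sketch: detach() does not copy, so storage is shared.
x = torch.zeros(3, requires_grad=True)
y = x.detach()

y[0] = 5.0              # modifies the shared storage
print(x.tolist())       # [5.0, 0.0, 0.0]
print(y.requires_grad)  # False
```

In practice, use detach().clone() if you need an independent copy.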
Tensor.backward(gradient=None, retain_graph=None, create_graph=False)

Computes the gradient of the current tensor w.r.t. graph leaves.
  • gradient (Tensor, optional): Gradient w.r.t. the tensor. If None and the tensor is a scalar, torch.ones_like(tensor) is used.
  • retain_graph (bool, optional): If False, the graph used to compute the grads is freed.
  • create_graph (bool, default False): If True, the graph of the derivative is constructed, allowing higher-order derivatives.
>>> x = torch.randn(3, 4, requires_grad=True)
>>> y = x.sum()
>>> y.backward()
>>> x.grad  # d(sum)/dx is 1 everywhere
tensor([[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]])
Tensor.grad

This attribute is None by default. It accumulates gradients during backward().
  • grad (Tensor or None): Accumulated gradients.
>>> x = torch.tensor([1., 2., 3.], requires_grad=True)
>>> y = x.sum()
>>> y.backward()
>>> x.grad
tensor([1., 1., 1.])
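The word "accumulates" matters: repeated backward() calls add into .grad rather than replacing it, which is why training loops clear gradients every step. A sketch:

```python
import torch

# Sketch: .grad accumulates across backward() calls until cleared.
x = torch.tensor([1.0, 2.0], requires_grad=True)

(x * 3).sum().backward()
first = x.grad.tolist()   # [3.0, 3.0]

(x * 3).sum().backward()  # second backward adds to the existing grad
second = x.grad.tolist()  # [6.0, 6.0]

x.grad = None             # clear before the next step

print(first, second)
```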

Indexing and Slicing

import torch

x = torch.randn(3, 4, 5)

# Single element
val = x[0, 1, 2]

# Slicing
slice1 = x[0, :, :]  # first 4x5 matrix
slice2 = x[:, 1, :]  # index 1 along dim 1, shape (3, 5)
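Beyond basic slicing, tensors also support boolean masks and integer-index tensors (advanced indexing). A deterministic sketch:

```python
import torch

# Sketch: boolean-mask and integer-tensor indexing.
x = torch.tensor([[1, -2, 3],
                  [-4, 5, -6]])

positives = x[x > 0]            # boolean mask -> 1D tensor of matches
rows = x[torch.tensor([1, 0])]  # integer indexing reorders rows

print(positives.tolist())  # [1, 3, 5]
print(rows.tolist())       # [[-4, 5, -6], [1, -2, 3]]
```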

In-place Operations

Operations with a trailing _ modify the tensor in-place:
x = torch.randn(3, 4)

# Add in-place
x.add_(5)

# Multiply in-place
x.mul_(2)

# Clamp in-place
x.clamp_(min=0, max=1)
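One caveat worth knowing: in-place operations on a leaf tensor that requires grad raise a RuntimeError, because autograd needs the original values; wrapping the operation in torch.no_grad() is one way around it. A sketch:

```python
import torch

# Sketch: in-place ops interact with autograd.
x = torch.ones(3, requires_grad=True)

failed = False
try:
    x.add_(1.0)  # RuntimeError: leaf Variable that requires grad ...
except RuntimeError:
    failed = True

with torch.no_grad():
    x.add_(1.0)  # fine: not recorded by autograd

print(failed)      # True
print(x.tolist())  # [2.0, 2.0, 2.0]
```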

Best Practices

  • Use in-place operations (add_, mul_, etc.) when safe to reduce memory
  • Prefer reshape() over view() for robust code: reshape() returns a view when possible and copies only when it must
  • If you do use view() on a non-contiguous tensor, call contiguous() first
  • Use detach() when you don't need gradients
  • Clear gradients with tensor.grad = None instead of tensor.grad.zero_() for better memory behavior
  • Use view(-1) or reshape(-1) to flatten tensors
  • Use unsqueeze() to add broadcasting dimensions
  • Create tensors on the target device directly: torch.randn(3, 4, device='cuda')
  • Use .to(device) for device-agnostic code
  • Set non_blocking=True for async CPU→GPU transfers when possible (the source should be in pinned memory for the transfer to actually overlap)
