
Overview

A torch.Tensor is a multi-dimensional matrix containing elements of a single data type. PyTorch defines 10 tensor types with CPU and GPU variants.

Tensor Creation

torch.tensor(data, dtype=None, device=None, requires_grad=False)

Creates a tensor from data.
  • data (array_like, required): Initial tensor data.
  • dtype (torch.dtype, optional): Desired data type. If None, inferred from data.
  • device (torch.device, optional): Desired device (cpu, cuda, mps, etc.).
  • requires_grad (bool, default False): If True, gradients will be computed for this tensor.
import torch

# From Python list
x = torch.tensor([[1, 2], [3, 4]])

# With gradient tracking (note: torch.tensor, not the torch.Tensor
# constructor, which does not accept requires_grad)
x = torch.tensor([1.0, 2.0], requires_grad=True)
Tensor.new_tensor(data, dtype=None, device=None, requires_grad=False)

Returns a new Tensor with data as the tensor data. The returned Tensor always copies data.
  • data (array_like, required): Initial data for the returned tensor.
  • dtype (torch.dtype, optional): The desired data type. Default: same as this tensor.
  • device (torch.device, optional): The desired device. Default: same as this tensor.
  • requires_grad (bool, default False): If autograd should record operations on the returned tensor.
>>> tensor = torch.ones((2,), dtype=torch.int8)
>>> data = [[0, 1], [2, 3]]
>>> tensor.new_tensor(data)
tensor([[0, 1],
        [2, 3]], dtype=torch.int8)
Tensor.new_zeros(size, dtype=None)

Returns a Tensor of shape size filled with 0. By default, the result has the same dtype and device as this tensor.
  • size (int..., required): Shape of the output tensor.
  • dtype (torch.dtype, optional): Default: same as this tensor.
>>> tensor = torch.tensor((), dtype=torch.float64)
>>> tensor.new_zeros((2, 3))
tensor([[0., 0., 0.],
        [0., 0., 0.]], dtype=torch.float64)
Tensor.new_ones(size, dtype=None)

Returns a Tensor of shape size filled with 1. By default, the result has the same dtype and device as this tensor.
  • size (int..., required): Shape of the output tensor.
>>> tensor = torch.tensor((), dtype=torch.int32)
>>> tensor.new_ones((2, 3))
tensor([[1, 1, 1],
        [1, 1, 1]], dtype=torch.int32)
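The point of the new_* family is that the result matches the source tensor's dtype and device unless you override them. A minimal sketch:

```python
import torch

# Sketch: new_zeros/new_ones inherit dtype (and device) from the source
# tensor unless explicitly overridden.
base = torch.ones((2,), dtype=torch.float64)

a = base.new_zeros((2, 2))                    # inherits float64
b = base.new_ones((2, 2), dtype=torch.int32)  # explicit override

print(a.dtype)  # torch.float64
print(b.dtype)  # torch.int32
```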

Tensor Properties

Tensor.shape

Returns the size of the tensor.
  • shape (torch.Size): A tuple-like object of integers representing the tensor's dimensions.
>>> x = torch.randn(3, 4, 5)
>>> x.shape
torch.Size([3, 4, 5])
Tensor.dtype

Returns the data type of the tensor.
  • dtype (torch.dtype): Data type (torch.float32, torch.int64, etc.).
Available dtypes:
  • Float: torch.float16, torch.float32, torch.float64, torch.bfloat16
  • Integer: torch.int8, torch.int16, torch.int32, torch.int64
  • Unsigned: torch.uint8, torch.uint16, torch.uint32, torch.uint64
  • Complex: torch.complex64, torch.complex128
  • Boolean: torch.bool
>>> x = torch.randn(3, 4)
>>> x.dtype
torch.float32
Tensor.device

Returns the device where the tensor is located.
  • device (torch.device): Device object (cpu, cuda:0, cuda:1, etc.).
>>> x = torch.randn(3, 4, device='cuda:0')
>>> x.device
device(type='cuda', index=0)
Tensor.ndim

Returns the number of dimensions.
  • ndim (int): Number of tensor dimensions.
>>> x = torch.randn(3, 4, 5)
>>> x.ndim
3
Tensor.numel()

Returns the total number of elements in the tensor.
  • returns (int): Total number of elements.
>>> x = torch.randn(3, 4, 5)
>>> x.numel()
60
Tensor.requires_grad

Returns True if gradients need to be computed for this Tensor.
  • requires_grad (bool): Whether gradient tracking is enabled.
>>> x = torch.randn(3, 4, requires_grad=True)
>>> x.requires_grad
True

Mathematical Operations

Tensor.abs()

Computes the absolute value of each element.
  • returns (Tensor): Tensor with absolute values.
Formula: out_i = |input_i|
>>> x = torch.tensor([-1, -2, 3])
>>> x.abs()
tensor([1, 2, 3])
Tensor.add(other, alpha=1)

Adds other to self, scaled by alpha.
  • other (Tensor or Number, required): The tensor or number to add.
  • alpha (Number, default 1): Multiplier for other.
Formula: out = self + alpha × other
>>> x = torch.tensor([1, 2, 3])
>>> x.add(10)
tensor([11, 12, 13])

>>> y = torch.tensor([4, 5, 6])
>>> x.add(y, alpha=2)
tensor([9, 12, 15])
Tensor.matmul(other)

Matrix multiplication of self with other.
  • other (Tensor, required): The tensor to multiply with.
  • returns (Tensor): Result of matrix multiplication.
Shapes:
  • 1D × 1D → scalar
  • 2D × 2D → 2D matrix
  • (batch, n, m) × (batch, m, p) → (batch, n, p)
>>> a = torch.randn(2, 3)
>>> b = torch.randn(3, 4)
>>> a.matmul(b).shape
torch.Size([2, 4])
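The shape rules above can be checked on deterministic inputs; the @ operator is equivalent to matmul. A quick sketch:

```python
import torch

# Sketch of matmul shape behavior on all-ones tensors.
v = torch.ones(3)
m = torch.ones(3, 4)
batch_a = torch.ones(5, 2, 3)
batch_b = torch.ones(5, 3, 4)

dot = v.matmul(v)        # 1D x 1D -> 0-dim scalar tensor
mat = m.t() @ m          # (4, 3) x (3, 4) -> (4, 4)
bat = batch_a @ batch_b  # batched: (5, 2, 3) x (5, 3, 4) -> (5, 2, 4)

print(dot.item())   # 3.0
print(mat.shape)    # torch.Size([4, 4])
print(bat.shape)    # torch.Size([5, 2, 4])
```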
Tensor.sum(dim=None, keepdim=False, dtype=None)

Returns the sum of all elements, or along the given dimension(s).
  • dim (int or tuple of ints, optional): Dimension(s) to reduce. If None, reduces all dimensions.
  • keepdim (bool, default False): Whether the output retains dim with size 1.
  • dtype (torch.dtype, optional): Desired data type of the output.
>>> x = torch.randn(4, 4)
>>> x.sum()
tensor(-1.2345)

>>> x.sum(dim=0, keepdim=True)
tensor([[-0.5, 1.2, -0.8, 0.3]])
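The dim/keepdim interaction is easiest to see on a tensor with known values:

```python
import torch

# Deterministic sketch of dim/keepdim behavior.
x = torch.arange(6).reshape(2, 3)  # [[0, 1, 2], [3, 4, 5]]

total = x.sum()                          # tensor(15)
cols = x.sum(dim=0)                      # reduces rows -> shape (3,)
cols_kept = x.sum(dim=0, keepdim=True)   # same values, shape (1, 3)

print(total.item())     # 15
print(cols.tolist())    # [3, 5, 7]
print(cols_kept.shape)  # torch.Size([1, 3])
```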
Tensor.mean(dim=None, keepdim=False)

Returns the mean of all elements, or along the given dimension(s).
  • dim (int or tuple of ints, optional): Dimension(s) to reduce.
  • keepdim (bool, default False): Whether the output retains dim with size 1.
>>> x = torch.randn(4, 4)
>>> x.mean()
tensor(0.1234)

>>> x.mean(dim=1)
tensor([0.2, -0.5, 0.8, -0.1])
Tensor.max(dim=None, keepdim=False)

Returns the maximum of all elements, or along a dimension.
  • dim (int, optional): The dimension to reduce.
  • keepdim (bool, default False): Whether the output retains dim with size 1.
  • values (Tensor): Maximum values.
  • indices (LongTensor): Indices of the maximum values (only returned when dim is specified).
>>> x = torch.randn(4, 4)
>>> x.max()
tensor(2.4567)

>>> values, indices = x.max(dim=1)
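With dim specified, max returns a (values, indices) named tuple; a deterministic sketch:

```python
import torch

# Sketch: per-row maximum with the index of each maximum.
x = torch.tensor([[1, 9, 3],
                  [7, 2, 8]])

values, indices = x.max(dim=1)
print(values.tolist())   # [9, 8]
print(indices.tolist())  # [1, 2]
```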

Shape Manipulation

Tensor.reshape(*shape)

Returns a tensor with the same data but a different shape.
  • *shape (int, required): The desired shape.
  • returns (Tensor): Reshaped tensor (a view if possible, otherwise a copy).
>>> x = torch.randn(4, 6)
>>> x.reshape(2, 12).shape
torch.Size([2, 12])

>>> x.reshape(-1, 3).shape  # -1 inferred
torch.Size([8, 3])
Tensor.view(*shape)

Returns a new tensor with the same data but a different shape.
  • *shape (int, required): The desired shape.
  • returns (Tensor): Tensor view (shares storage with the original).
Note: the new shape must be compatible with the tensor's memory layout; a contiguous tensor always qualifies. Otherwise call contiguous() first or use reshape().
>>> x = torch.randn(4, 6)
>>> x.view(2, 12).shape
torch.Size([2, 12])
Tensor.transpose(dim0, dim1)

Returns a tensor with two dimensions swapped.
  • dim0 (int, required): First dimension to transpose.
  • dim1 (int, required): Second dimension to transpose.
>>> x = torch.randn(2, 3, 4)
>>> x.transpose(0, 2).shape
torch.Size([4, 3, 2])
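Note that transpose returns a view with permuted strides, which is usually non-contiguous, so view() will not work on it directly. A sketch of the interaction:

```python
import torch

# Sketch: transpose produces a non-contiguous view; call contiguous()
# before view(), or use reshape(), which copies only when it must.
x = torch.randn(2, 3)
t = x.transpose(0, 1)          # shape (3, 2), non-contiguous

print(t.is_contiguous())       # False
flat = t.contiguous().view(6)  # works after contiguous()
same = t.reshape(6)            # reshape handles this internally

print(flat.shape)  # torch.Size([6])
```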
Tensor.permute(*dims)

Returns a view with the dimensions permuted.
  • *dims (int, required): Desired ordering of dimensions.
>>> x = torch.randn(2, 3, 4)
>>> x.permute(2, 0, 1).shape
torch.Size([4, 2, 3])
Tensor.squeeze(dim=None)

Returns a tensor with all dimensions of size 1 removed.
  • dim (int, optional): If given, squeeze only this dimension, and only if its size is 1.
>>> x = torch.randn(2, 1, 3, 1)
>>> x.squeeze().shape
torch.Size([2, 3])

>>> x.squeeze(1).shape
torch.Size([2, 3, 1])
Tensor.unsqueeze(dim)

Returns a tensor with a dimension of size 1 inserted at the specified position.
  • dim (int, required): Index at which to insert the dimension.
>>> x = torch.randn(2, 3)
>>> x.unsqueeze(0).shape
torch.Size([1, 2, 3])

>>> x.unsqueeze(1).shape
torch.Size([2, 1, 3])
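A common use of unsqueeze is adding singleton dimensions so broadcasting can align two shapes, for example building an outer product from two 1D tensors. A sketch:

```python
import torch

# Sketch: (2, 1) * (1, 3) broadcasts to (2, 3), giving the outer product.
row = torch.tensor([1, 2, 3])
col = torch.tensor([10, 20])

outer = col.unsqueeze(1) * row.unsqueeze(0)
print(outer.tolist())  # [[10, 20, 30], [20, 40, 60]]
```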

Device Transfer

Tensor.to(device=None, dtype=None, non_blocking=False)

Performs Tensor dtype and/or device conversion.
  • device (torch.device or str, optional): Target device.
  • dtype (torch.dtype, optional): Target data type.
  • non_blocking (bool, default False): If True, tries to convert asynchronously with respect to the host.
# Move to GPU
>>> x_gpu = x.to('cuda')

# Change dtype
>>> x_fp16 = x.to(torch.float16)

# Both
>>> x_gpu_fp16 = x.to(device='cuda', dtype=torch.float16)
Tensor.cuda(device=None, non_blocking=False)

Returns a copy of this tensor in CUDA memory.
  • device (int, optional): The destination GPU device. Defaults to the current CUDA device.
  • non_blocking (bool, default False): If True, tries to convert asynchronously.
>>> x_cpu = torch.randn(3, 4)
>>> x_gpu = x_cpu.cuda()
>>> x_gpu.device
device(type='cuda', index=0)
Tensor.cpu()

Returns a copy of this tensor in CPU memory.
>>> x_gpu = torch.randn(3, 4, device='cuda')
>>> x_cpu = x_gpu.cpu()
>>> x_cpu.device
device(type='cpu')
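A common device-agnostic pattern (one sketch among several options): pick the best available device once, then create tensors there directly or move them with .to().

```python
import torch

# Sketch: select the device once, use it everywhere.
device = "cuda" if torch.cuda.is_available() else "cpu"

x = torch.randn(3, 4, device=device)  # create directly on the target device
y = torch.ones(3, 4).to(device)       # or move an existing tensor

z = x + y  # both operands live on the same device
print(z.device.type)  # 'cuda' or 'cpu', depending on availability
```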

Gradient Operations

Tensor.requires_grad_(requires_grad=True)

Change if autograd should record operations on this tensor.
  • requires_grad (bool, default True): Whether to record operations.
  • returns (Tensor): self, modified in-place.
>>> x = torch.randn(3, 4)
>>> x.requires_grad_(True)
>>> x.requires_grad
True
Tensor.detach()

Returns a new Tensor detached from the current graph.
  • returns (Tensor): Detached tensor (shares storage, but no gradient tracking).
>>> x = torch.randn(3, 4, requires_grad=True)
>>> y = x.detach()
>>> y.requires_grad
False
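Because the detached tensor shares storage with the original, in-place edits to one are visible through the other. A deterministic sketch:

```python
import torch

# Sketch: detach() does not copy, so storage is shared.
x = torch.zeros(3, requires_grad=True)
y = x.detach()

y[0] = 5.0              # modifies the shared storage
print(x.tolist())       # [5.0, 0.0, 0.0]
print(y.requires_grad)  # False
```

In practice, use detach().clone() if you need an independent copy.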
Tensor.backward(gradient=None, retain_graph=None, create_graph=False)

Computes the gradient of the current tensor w.r.t. graph leaves.
  • gradient (Tensor, optional): Gradient w.r.t. the tensor. If None and the tensor is a scalar, torch.ones_like(tensor) is used.
  • retain_graph (bool, optional): If False, the graph used to compute the grads is freed.
  • create_graph (bool, default False): If True, the graph of the derivative is constructed, allowing higher-order derivatives.
>>> x = torch.randn(3, 4, requires_grad=True)
>>> y = x.sum()
>>> y.backward()
>>> x.grad  # d(sum)/dx is 1 everywhere
tensor([[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]])
Tensor.grad

This attribute is None by default. It accumulates gradients during backward().
  • grad (Tensor or None): Accumulated gradients.
>>> x = torch.tensor([1., 2., 3.], requires_grad=True)
>>> y = x.sum()
>>> y.backward()
>>> x.grad
tensor([1., 1., 1.])
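The word "accumulates" matters: repeated backward() calls add into .grad rather than replacing it, which is why training loops clear gradients every step. A sketch:

```python
import torch

# Sketch: .grad accumulates across backward() calls until cleared.
x = torch.tensor([1.0, 2.0], requires_grad=True)

(x * 3).sum().backward()
first = x.grad.tolist()   # [3.0, 3.0]

(x * 3).sum().backward()  # second backward adds to the existing grad
second = x.grad.tolist()  # [6.0, 6.0]

x.grad = None             # clear before the next step

print(first, second)
```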

Indexing and Slicing

import torch

x = torch.randn(3, 4, 5)

# Single element
val = x[0, 1, 2]

# Slicing
slice1 = x[0, :, :]  # first 4x5 matrix
slice2 = x[:, 1, :]  # index 1 along dim 1, shape (3, 5)
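Beyond basic slicing, tensors also support boolean masks and integer-index tensors (advanced indexing). A deterministic sketch:

```python
import torch

# Sketch: boolean-mask and integer-tensor indexing.
x = torch.tensor([[1, -2, 3],
                  [-4, 5, -6]])

positives = x[x > 0]            # boolean mask -> 1D tensor of matches
rows = x[torch.tensor([1, 0])]  # integer indexing reorders rows

print(positives.tolist())  # [1, 3, 5]
print(rows.tolist())       # [[-4, 5, -6], [1, -2, 3]]
```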

In-place Operations

Operations with a trailing _ modify the tensor in-place:
x = torch.randn(3, 4)

# Add in-place
x.add_(5)

# Multiply in-place
x.mul_(2)

# Clamp in-place
x.clamp_(min=0, max=1)
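One caveat worth knowing: in-place operations on a leaf tensor that requires grad raise a RuntimeError, because autograd needs the original values; wrapping the operation in torch.no_grad() is one way around it. A sketch:

```python
import torch

# Sketch: in-place ops interact with autograd.
x = torch.ones(3, requires_grad=True)

failed = False
try:
    x.add_(1.0)  # RuntimeError: leaf Variable that requires grad ...
except RuntimeError:
    failed = True

with torch.no_grad():
    x.add_(1.0)  # fine: not recorded by autograd

print(failed)      # True
print(x.tolist())  # [2.0, 2.0, 2.0]
```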

Best Practices

  • Use in-place operations (add_, mul_, etc.) when safe to reduce memory
  • Prefer reshape() over view() for robust code: reshape() returns a view when possible and copies only when it must
  • If you do use view() on a non-contiguous tensor, call contiguous() first
  • Use detach() when you don't need gradients
  • Clear gradients with tensor.grad = None instead of tensor.grad.zero_() for better memory behavior
  • Use view(-1) or reshape(-1) to flatten tensors
  • Use unsqueeze() to add broadcasting dimensions
  • Create tensors on the target device directly: torch.randn(3, 4, device='cuda')
  • Use .to(device) for device-agnostic code
  • Set non_blocking=True for async CPU→GPU transfers when possible (the source should be in pinned memory for the transfer to actually overlap)
