Overview
The torch.cuda package adds support for CUDA tensor types, which implement the same interface as CPU tensors but use GPUs for computation.
The package is lazily initialized, so you can always import it and call is_available() to determine whether your system supports CUDA.
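A common device-selection idiom built on this lazy behavior (a minimal sketch; the variable names are illustrative):

```python
import torch

# Importing torch.cuda is always safe: CUDA is only initialized on first use.
# Fall back to the CPU when no CUDA device is present.
if torch.cuda.is_available():
    device = torch.device("cuda")
else:
    device = torch.device("cpu")

x = torch.ones(3, 3, device=device)  # lives on the GPU if one is present
print(device)
```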
Device Management
torch.cuda.is_available(): Returns True if a CUDA-capable device is available.
torch.cuda.device_count(): Returns the number of GPUs available.
torch.cuda.current_device(): Returns the index of the currently selected device.
torch.cuda.set_device(): Sets the current device.
torch.cuda.device(): Context manager that changes the selected device.
torch.cuda.get_device_name(): Returns the name of a device.
torch.cuda.get_device_capability(): Returns the major and minor compute capability of a device.
torch.cuda.get_device_properties(): Returns the properties of a device, including:
- name: Device name
- major: Major compute capability
- minor: Minor compute capability
- total_memory: Total memory in bytes
- multi_processor_count: Number of multiprocessors
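The device-query functions above compose naturally into an inventory loop; a minimal sketch:

```python
import torch

if torch.cuda.is_available():
    n = torch.cuda.device_count()
    print(f"{n} CUDA device(s) visible")
    for i in range(n):
        props = torch.cuda.get_device_properties(i)
        # props exposes name, major, minor, total_memory, multi_processor_count
        print(f"[{i}] {props.name}  cc {props.major}.{props.minor}  "
              f"{props.total_memory / 1024**3:.1f} GiB  "
              f"{props.multi_processor_count} SMs")
else:
    print("CUDA is not available")
```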
Memory Management
torch.cuda.memory_allocated(): Returns the current GPU memory occupied by tensors, in bytes.
torch.cuda.memory_reserved(): Returns the current GPU memory managed by the caching allocator, in bytes.
torch.cuda.max_memory_allocated(): Returns the peak GPU memory occupied by tensors since the start of the program or the last reset.
torch.cuda.reset_peak_memory_stats(): Resets the peak memory statistics.
torch.cuda.empty_cache(): Releases all unoccupied cached memory so other GPU applications can use it.
torch.cuda.memory_summary(): Returns a human-readable printout of the memory allocator statistics.
torch.cuda.memory_stats(): Returns a dictionary of memory allocator statistics, including:
- allocated_bytes.all.current: Current allocated memory
- reserved_bytes.all.current: Current reserved memory
- active_bytes.all.current: Current active memory
- and many more detailed metrics
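These counters are easiest to read side by side around an allocation. A minimal sketch (the `report` helper is illustrative, not part of the API):

```python
import torch

def report(tag):
    # All allocator statistics are per-device and reported in bytes.
    alloc = torch.cuda.memory_allocated()
    reserved = torch.cuda.memory_reserved()
    peak = torch.cuda.max_memory_allocated()
    print(f"{tag}: allocated={alloc} reserved={reserved} peak={peak}")

if torch.cuda.is_available():
    torch.cuda.reset_peak_memory_stats()
    report("before")
    x = torch.empty(1024, 1024, device="cuda")  # ~4 MiB of float32
    report("after alloc")
    del x
    torch.cuda.empty_cache()  # return unoccupied cached blocks to the driver
    report("after free")
```

Note that `memory_reserved` typically stays above `memory_allocated`: the caching allocator holds freed blocks for reuse until `empty_cache()` is called.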
Stream and Event Management
torch.cuda.Stream: A wrapper around a CUDA stream, a linear sequence of kernels that execute in order on a device.
torch.cuda.stream(): Context manager that makes the given stream the current stream.
torch.cuda.Event: A wrapper around a CUDA event, used for timing GPU work and for synchronizing streams.
torch.cuda.synchronize(): Waits for all kernels in all streams on a device to complete.
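The three primitives are commonly combined to time a kernel on a side stream; a minimal sketch:

```python
import torch

if torch.cuda.is_available():
    a = torch.randn(1024, 1024, device="cuda")

    s = torch.cuda.Stream()
    s.wait_stream(torch.cuda.current_stream())  # don't start before `a` is ready
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)

    with torch.cuda.stream(s):  # kernels issued here run on stream s
        start.record()
        b = a @ a
        end.record()

    torch.cuda.synchronize()  # block the host until all queued GPU work finishes
    print(f"matmul took {start.elapsed_time(end):.3f} ms")
```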
Random Number Generation
torch.cuda.manual_seed(): Sets the random seed for the current GPU.
torch.cuda.manual_seed_all(): Sets the random seed for all GPUs.
torch.cuda.seed(): Sets the seed for the current GPU to a random number.
torch.cuda.seed_all(): Sets the seed for all GPUs to a random number.
torch.cuda.initial_seed(): Returns the current random seed of the current GPU.
CUDA Graphs
torch.cuda.CUDAGraph: A wrapper around a CUDA graph, a recorded sequence of kernels that can be relaunched with a single replay call.
torch.cuda.graph(): Context manager that captures the CUDA work issued inside it into a CUDAGraph object.
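The usual capture-and-replay pattern works on static input and output tensors: you fill the input, replay the graph, and read the output. A minimal sketch under that pattern:

```python
import torch

if torch.cuda.is_available():
    static_input = torch.randn(64, 64, device="cuda")

    # Warm up on a side stream before capture, as the PyTorch docs recommend.
    s = torch.cuda.Stream()
    s.wait_stream(torch.cuda.current_stream())
    with torch.cuda.stream(s):
        static_output = static_input * 2
    torch.cuda.current_stream().wait_stream(s)

    # Capture the multiply into a graph.
    g = torch.cuda.CUDAGraph()
    with torch.cuda.graph(g):
        static_output = static_input * 2

    # Replay with new data: copy into the static input, then relaunch
    # the captured kernels with minimal CPU overhead.
    static_input.copy_(torch.randn(64, 64, device="cuda"))
    g.replay()
    torch.cuda.synchronize()
```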
Capability Checks
torch.cuda.is_bf16_supported(): Returns True if the current device supports bfloat16.
torch.cuda.is_tf32_supported(): Returns True if the current device supports TF32.
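These checks let you pick a reduced-precision dtype at runtime rather than hard-coding one. A minimal sketch (note that `is_tf32_supported()` only exists in recent PyTorch releases, so the example uses the bf16 check only):

```python
import torch

if torch.cuda.is_available():
    # bfloat16 needs Ampere-class hardware (compute capability >= 8.0);
    # fall back to float16 on older GPUs.
    dtype = torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float16
    x = torch.randn(8, 8, device="cuda", dtype=dtype)
    print(f"using {dtype}")
```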
Performance Optimization
Best Practices
Memory Management
- Pin memory for faster transfers
- Clear cache when needed
- Monitor memory usage
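The three practices above can be sketched together in one loop (the dataset and shapes are illustrative):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

data = TensorDataset(torch.randn(256, 16), torch.randn(256, 1))
# pin_memory=True allocates batches in page-locked host memory,
# which enables fast asynchronous host-to-device copies.
loader = DataLoader(data, batch_size=32, pin_memory=torch.cuda.is_available())

if torch.cuda.is_available():
    for xb, yb in loader:
        # non_blocking=True overlaps the copy with compute when memory is pinned
        xb = xb.to("cuda", non_blocking=True)
        yb = yb.to("cuda", non_blocking=True)
        break
    print(torch.cuda.memory_allocated())  # monitor usage
    torch.cuda.empty_cache()              # release cached blocks when needed
```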
Multi-GPU Training
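A minimal single-process sketch of spreading a model across visible GPUs. The model here is a placeholder; for serious multi-GPU training, DistributedDataParallel is generally preferred over DataParallel:

```python
import torch
import torch.nn as nn

model = nn.Linear(16, 4)  # illustrative stand-in for a real model

if torch.cuda.device_count() > 1:
    # Replicates the model on each GPU and splits each batch across them.
    model = nn.DataParallel(model)

if torch.cuda.is_available():
    model = model.to("cuda")
    out = model(torch.randn(8, 16, device="cuda"))
```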
Error Handling
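Out-of-memory errors are the most common CUDA failure in practice. A minimal sketch of catching them and falling back (the `try_allocate` helper is illustrative; `torch.cuda.OutOfMemoryError` is available in recent PyTorch releases):

```python
import torch

def try_allocate(shape):
    """Attempt a GPU allocation, backing off to the CPU on out-of-memory."""
    try:
        return torch.empty(shape, device="cuda")
    except torch.cuda.OutOfMemoryError:
        # Free cached blocks and fall back rather than crashing.
        torch.cuda.empty_cache()
        return torch.empty(shape, device="cpu")

if torch.cuda.is_available():
    t = try_allocate((1024, 1024))
    print(t.device)
```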