Introduction to torch.compile — PyTorch Tutorials 2.12.0+cu130 documentation