Once CUPTI Python is installed, the CUPTI samples are located under the site-packages/cupti-python-samples directory. You can determine the location of your site-packages directory by executing the following command:
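The same lookup can be done from Python itself. The following is a minimal sketch, assuming only the standard library `site` and `sysconfig` modules; the helper names `candidate_site_packages` and `find_samples_dir` are hypothetical and are not part of CUPTI Python.

```python
import site
import sysconfig
from pathlib import Path


def candidate_site_packages():
    """Return the site-packages directories known to this interpreter, without duplicates."""
    candidates = [sysconfig.get_paths()["purelib"]]
    # site.getsitepackages() is missing in some older virtualenv setups.
    if hasattr(site, "getsitepackages"):
        candidates.extend(site.getsitepackages())
    if hasattr(site, "getusersitepackages"):
        candidates.append(site.getusersitepackages())
    # dict.fromkeys() removes duplicates while preserving order.
    return list(dict.fromkeys(candidates))


def find_samples_dir(dirname="cupti-python-samples"):
    """Return the first existing <site-packages>/<dirname> directory, or None."""
    for root in candidate_site_packages():
        samples = Path(root) / dirname
        if samples.is_dir():
            return samples
    return None


if __name__ == "__main__":
    for root in candidate_site_packages():
        print(root)
    print("samples:", find_samples_dir())
```

If `find_samples_dir()` returns `None`, the package may be installed in a different environment than the interpreter running the script.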
The CUPTI Python Numba samples require the numba-cuda package along with the dependencies for CUDA 13.0. You can install numba-cuda using the following command:
The CuptiVectorAdd* samples contain simple code that performs element-by-element vector addition.
CUPTI Python sample which shows use of CUPTI Activity APIs. This sample uses numba-cuda.
Enable CUPTI based profiling. Default: OFF
--output, -o OUTPUT_TYPE: Select the profiler output format. OUTPUT_TYPE can be: brief, detailed, or none. Default: brief
--help, -h: Shows the usage.
CUPTI Python sample which shows use of CUPTI Callback APIs. This sample uses numba-cuda.
Enable CUPTI based profiling. Default: OFF
--output, -o OUTPUT_TYPE: Select the profiler output format. OUTPUT_TYPE can be: brief, detailed, or none. Default: brief
--help, -h: Shows the usage.
CUPTI Python sample which shows use of CUPTI Activity APIs. This sample uses CUDA Python Driver APIs from cuda-bindings. It also shows how to use CUDA profiler start and stop APIs to define the range of code to be profiled.
This sample uses NVRTC (NVIDIA Runtime Compilation) to compile CUDA kernel code to PTX at runtime. The sample demonstrates:
Using cuda.bindings.nvrtc to compile CUDA kernel source code to PTX
Using cuda.bindings.driver APIs to load the PTX module and launch kernels
Using CUPTI Activity APIs to profile the CUDA operations
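The first of these steps, compiling kernel source to PTX with `cuda.bindings.nvrtc`, can be sketched as follows. This is an illustrative sketch and not the sample's actual code: the kernel source, the helper name `compile_to_ptx`, the file name `vector_add.cu`, and the `compute_75` architecture option are assumptions chosen for the example. It requires cuda-bindings and NVRTC to be installed when the function is called.

```python
# Hypothetical kernel source for illustration; the sample's own kernel may differ.
KERNEL_SOURCE = r"""
extern "C" __global__
void vector_add(const float *a, const float *b, float *c, size_t n)
{
    size_t i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        c[i] = a[i] + b[i];
    }
}
"""


def compile_to_ptx(source, arch=b"--gpu-architecture=compute_75"):
    """Compile CUDA source to PTX bytes with NVRTC (arch is an assumed default)."""
    # Imported lazily so this module can be loaded without cuda-bindings present.
    from cuda.bindings import nvrtc

    def check(result):
        # Every binding returns a tuple whose first element is the status code.
        err, *values = result
        if err != nvrtc.nvrtcResult.NVRTC_SUCCESS:
            raise RuntimeError(f"NVRTC error: {err}")
        return values[0] if values else None

    prog = check(nvrtc.nvrtcCreateProgram(source.encode(), b"vector_add.cu", 0, [], []))
    options = [arch]
    (err,) = nvrtc.nvrtcCompileProgram(prog, len(options), options)
    if err != nvrtc.nvrtcResult.NVRTC_SUCCESS:
        # Surface the compiler log so that source errors are readable.
        log_size = check(nvrtc.nvrtcGetProgramLogSize(prog))
        log = b" " * log_size
        check(nvrtc.nvrtcGetProgramLog(prog, log))
        raise RuntimeError(log.decode(errors="replace"))

    ptx_size = check(nvrtc.nvrtcGetPTXSize(prog))
    ptx = b" " * ptx_size
    check(nvrtc.nvrtcGetPTX(prog, ptx))
    return ptx
```

The returned PTX would then be loaded and launched through the `cuda.bindings.driver` APIs, which is the part the CUPTI Activity APIs observe.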
To ensure that cuda-bindings is set up correctly, along with the necessary CUDA Toolkit (CTK) components (including NVRTC), refer to the cuda-bindings runtime requirements documentation.
Enable CUPTI based profiling. Default: OFF
--define-profile-range, -r: Include CUDA profiler start and stop APIs to define the range of code to be profiled. Default: OFF
--output, -o OUTPUT_TYPE: Select the profiler output format. OUTPUT_TYPE can be: brief, detailed, or none. Default: brief
--help, -h: Shows the usage.
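A profiling range of the kind that --define-profile-range refers to is delimited by the CUDA profiler start and stop APIs. The following is a minimal sketch, assuming the `cuProfilerStart` and `cuProfilerStop` functions from `cuda.bindings.driver`, and assuming that the driver is already initialized with a current context. The `profiler_range` helper is hypothetical and is not part of the sample.

```python
from contextlib import contextmanager


@contextmanager
def profiler_range(driver):
    """Bracket a block of code with cuProfilerStart/cuProfilerStop.

    `driver` is expected to be the cuda.bindings.driver module, passed in
    explicitly so the helper has no import-time dependency on cuda-bindings.
    """
    (err,) = driver.cuProfilerStart()
    if err != driver.CUresult.CUDA_SUCCESS:
        raise RuntimeError(f"cuProfilerStart failed: {err}")
    try:
        yield
    finally:
        # Always close the range, even if the profiled code raises.
        driver.cuProfilerStop()


# Intended use (requires cuda-bindings and an active CUDA context):
#
#     from cuda.bindings import driver
#     with profiler_range(driver):
#         ...  # kernel launches and memory copies to be profiled
```

Only the work issued inside the `with` block falls within the range; work outside it is left out when range-based profiling is selected.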
CUPTI Python sample which shows how to profile a CUDA Python application using the CUPTI Python APIs without having to modify the CUDA Python application code. This sample shows use of CUPTI Activity APIs and Callback APIs. It also shows how to profile a range of code for a CUDA Python application which uses CUDA profiler start and stop APIs.
usage: cupyprof.py [-h] [-p {from_start|range}] [-a <activities>] [-o {brief|detailed|none}] <python_file_path> [args]
--help, -h: Shows the usage.
--profile, -p PROFILING_TYPE: Enable profiling for the entire CUDA Python program, or only for the subset between cuProfilerStart and cuProfilerStop. PROFILING_TYPE can be: from_start or range. Default: from_start
--activity, -a <comma separated list of activities>: Use --help to view the list of supported activities. To see which activities are enabled by default, refer to default_activity_choices in cupyprof.py.
--output, -o OUTPUT_TYPE: Select the profiler output format. OUTPUT_TYPE can be: brief, detailed, or none. Default: brief
python_file_path is the path to the CUDA Python application, and args are the arguments for the Python application.
Run the sample without profiling:
Run the sample with profiling enabled and use default output:
Use the cupyprof.py sample to profile a CUDA Python application with a profiling range defined and with detailed output:
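Following the usage line for cupyprof.py, such an invocation can also be assembled from Python. This is a sketch under stated assumptions: the helper `build_cupyprof_command` and the application name `my_app.py` are hypothetical, and only the -p, -a, and -o options documented above are used.

```python
import sys


def build_cupyprof_command(app_path, app_args=(), profile="from_start",
                           output="brief", activities=None,
                           cupyprof_path="cupyprof.py"):
    """Build an argv list matching the documented cupyprof.py usage line."""
    if profile not in ("from_start", "range"):
        raise ValueError(f"unsupported profiling type: {profile}")
    if output not in ("brief", "detailed", "none"):
        raise ValueError(f"unsupported output type: {output}")

    cmd = [sys.executable, cupyprof_path, "-p", profile, "-o", output]
    if activities:
        # The -a option takes a single comma-separated list.
        cmd += ["-a", ",".join(activities)]
    # The application path and its own arguments come last.
    cmd.append(str(app_path))
    cmd.extend(str(arg) for arg in app_args)
    return cmd


if __name__ == "__main__":
    # "my_app.py" is a placeholder for a CUDA Python application.
    print(build_cupyprof_command("my_app.py", profile="range", output="detailed"))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the directory that contains cupyprof.py.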