Occupancy
hipError_t hipModuleOccupancyMaxPotentialBlockSize(int *gridSize,
int *blockSize,
hipFunction_t f,
size_t dynSharedMemPerBlk,
int blockSizeLimit)
Determines the grid and block sizes that achieve maximum occupancy for a kernel.
Note: HIP does not support kernel launches in which the total number of work items in any dimension (gridDim x blockDim) is 2^32 or greater.
Parameters
:
gridSize – [out] minimum grid size for maximum potential occupancy
blockSize – [out] block size for maximum potential occupancy
f – [in] kernel function for which occupancy is calculated
dynSharedMemPerBlk – [in] dynamic shared memory usage (in bytes) intended for each block
blockSizeLimit – [in] the maximum block size for the kernel, use 0 for no limit
Returns
:
hipSuccess, hipErrorInvalidValue
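The 2^32 work-item limit noted above can be checked on the host before launching. The following is a minimal sketch, not part of the HIP API: `fitsLaunchLimit` is a hypothetical helper name, and it assumes the grid and block sizes for one dimension are held in `int`, as the occupancy functions return them.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not a HIP function): returns true when the total
// number of work items in one dimension, gridSize * blockSize, is below 2^32.
bool fitsLaunchLimit(int gridSize, int blockSize) {
    assert(gridSize >= 0 && blockSize >= 0);
    // Widen to 64 bits before multiplying so the product cannot overflow.
    const std::uint64_t total =
        static_cast<std::uint64_t>(gridSize) * static_cast<std::uint64_t>(blockSize);
    return total < (std::uint64_t{1} << 32);
}
```

A caller would typically evaluate this once per launch dimension and fall back to a smaller grid (for example, with a grid-stride loop in the kernel) when it returns false.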
hipError_t hipModuleOccupancyMaxPotentialBlockSizeWithFlags(int *gridSize,
int *blockSize,
hipFunction_t f,
size_t dynSharedMemPerBlk,
int blockSizeLimit,
unsigned int flags)
Determines the grid and block sizes that achieve maximum occupancy for a kernel.
Note: HIP does not support kernel launches in which the total number of work items in any dimension (gridDim x blockDim) is 2^32 or greater.
Parameters
:
gridSize – [out] minimum grid size for maximum potential occupancy
blockSize – [out] block size for maximum potential occupancy
f – [in] kernel function for which occupancy is calculated
dynSharedMemPerBlk – [in] dynamic shared memory usage (in bytes) intended for each block
blockSizeLimit – [in] the maximum block size for the kernel, use 0 for no limit
flags – [in] Extra flags for occupancy calculation (only default supported)
Returns
:
hipSuccess, hipErrorInvalidValue
hipError_t hipModuleOccupancyMaxActiveBlocksPerMultiprocessor(int *numBlocks,
hipFunction_t f,
int blockSize,
size_t dynSharedMemPerBlk)
Returns occupancy for a device function.
Parameters
:
numBlocks – [out] Returned occupancy
f – [in] Kernel function (hipFunction_t) for which occupancy is calculated
blockSize – [in] Block size the kernel is intended to be launched with
dynSharedMemPerBlk – [in] Dynamic shared memory usage (in bytes) intended for each block
Returns
:
hipSuccess, hipErrorInvalidValue
hipError_t hipModuleOccupancyMaxActiveBlocksPerMultiprocessorWithFlags(int *numBlocks,
hipFunction_t f,
int blockSize,
size_t dynSharedMemPerBlk,
unsigned int flags)
Returns occupancy for a device function.
Parameters
:
numBlocks – [out] Returned occupancy
f – [in] Kernel function (hipFunction_t) for which occupancy is calculated
blockSize – [in] Block size the kernel is intended to be launched with
dynSharedMemPerBlk – [in] Dynamic shared memory usage (in bytes) intended for each block
flags – [in] Extra flags for occupancy calculation (only default supported)
Returns
:
hipSuccess, hipErrorInvalidValue
hipError_t hipOccupancyMaxActiveBlocksPerMultiprocessor(int *numBlocks,
const void *f,
int blockSize,
size_t dynSharedMemPerBlk)
Returns occupancy for a device function.
Parameters
:
numBlocks – [out] Returned occupancy
f – [in] Kernel function for which occupancy is calculated
blockSize – [in] Block size the kernel is intended to be launched with
dynSharedMemPerBlk – [in] Dynamic shared memory usage (in bytes) intended for each block
Returns
:
hipSuccess, hipErrorInvalidDeviceFunction, hipErrorInvalidValue
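The value returned in *numBlocks is a count of resident blocks per multiprocessor, not a percentage. One common way to turn it into a theoretical occupancy fraction is sketched below. `occupancyFraction` is a hypothetical helper, and the third argument is assumed to come from the `maxThreadsPerMultiProcessor` field of the device properties; neither is prescribed by this reference.

```cpp
#include <cassert>

// Hypothetical helper (not a HIP function): fraction of a multiprocessor's
// thread slots that are filled when numBlocks blocks of blockSize threads
// are resident at once.
double occupancyFraction(int numBlocks, int blockSize, int maxThreadsPerMultiProcessor) {
    assert(maxThreadsPerMultiProcessor > 0);
    return static_cast<double>(numBlocks) * blockSize / maxThreadsPerMultiProcessor;
}
```

For example, 4 resident blocks of 256 threads on a multiprocessor that can hold 2048 threads give a fraction of 0.5.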
hipError_t hipOccupancyMaxActiveBlocksPerMultiprocessorWithFlags(int *numBlocks,
const void *f,
int blockSize,
size_t dynSharedMemPerBlk,
unsigned int flags)
Returns occupancy for a device function.
Parameters
:
numBlocks – [out] Returned occupancy
f – [in] Kernel function for which occupancy is calculated
blockSize – [in] Block size the kernel is intended to be launched with
dynSharedMemPerBlk – [in] Dynamic shared memory usage (in bytes) intended for each block
flags – [in] Extra flags for occupancy calculation (currently ignored)
Returns
:
hipSuccess, hipErrorInvalidDeviceFunction, hipErrorInvalidValue
hipError_t hipOccupancyMaxPotentialBlockSize(int *gridSize,
int *blockSize,
const void *f,
size_t dynSharedMemPerBlk,
int blockSizeLimit)
Determines the grid and block sizes that achieve maximum occupancy for a kernel.
Note: HIP does not support kernel launches in which the total number of work items in any dimension (gridDim x blockDim) is 2^32 or greater.
Parameters
:
gridSize – [out] minimum grid size for maximum potential occupancy
blockSize – [out] block size for maximum potential occupancy
f – [in] kernel function for which occupancy is calculated
dynSharedMemPerBlk – [in] dynamic shared memory usage (in bytes) intended for each block
blockSizeLimit – [in] the maximum block size for the kernel, use 0 for no limit
Returns
:
hipSuccess, hipErrorInvalidValue
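The returned gridSize is the minimum grid needed to reach maximum potential occupancy; it is not derived from the problem size. A frequent pattern, sketched here as an assumption rather than a documented recommendation, is to keep the suggested blockSize and compute the number of blocks from the element count with a ceiling division. `blocksToCover` is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not a HIP function): number of blocks needed so that
// every one of numElements items is assigned to a thread, given blockSize
// threads per block.
std::size_t blocksToCover(std::size_t numElements, int blockSize) {
    assert(blockSize > 0);
    const std::size_t bs = static_cast<std::size_t>(blockSize);
    // Ceiling division: rounds up when numElements is not a multiple of bs.
    return (numElements + bs - 1) / bs;
}
```

With a suggested block size of 256, 1000 elements need 4 blocks and 1025 elements need 5. The result still has to respect the gridDim x blockDim < 2^32 limit stated above.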
hipError_t hipOccupancyAvailableDynamicSMemPerBlock(size_t *dynamicSmemSize,
const void *f,
int numBlocks,
int blockSize)
Returns the dynamic shared memory available per block when launching numBlocks blocks on an SM.
Returns in *dynamicSmemSize the maximum size of dynamic shared memory that allows numBlocks blocks per SM.
Parameters
:
dynamicSmemSize – [out] Returned maximum dynamic shared memory.
f – [in] Kernel function for which occupancy is calculated.
numBlocks – [in] Number of blocks to fit on SM
blockSize – [in] Size of the block
Returns
:
hipSuccess, hipErrorInvalidDevice, hipErrorInvalidDeviceFunction, hipErrorInvalidValue, hipErrorUnknown
hipError_t hipOccupancyMaxActiveClusters(int *numClusters,
const void *f,
const hipLaunchConfig_t *config)
Determines the maximum number of kernel clusters that can be active on a device at the same time.
Parameters
:
numClusters – [out] the number of clusters
f – [in] kernel function for which occupancy is calculated
config – [in] pointer to the kernel launch configuration structure
Returns
:
hipSuccess, hipErrorInvalidDeviceFunction, hipErrorInvalidClusterSize, hipErrorInvalidValue
hipError_t hipOccupancyMaxPotentialClusterSize(int *clusterSize,
const void *f,
const hipLaunchConfig_t *config)
Returns the maximum cluster size (in number of blocks) that can run on the device.
Parameters
:
clusterSize – [out] the maximum cluster size
f – [in] kernel function for which occupancy is calculated
config – [in] pointer to the kernel launch configuration structure
Returns
:
hipSuccess, hipErrorInvalidDeviceFunction, hipErrorInvalidClusterSize, hipErrorInvalidValue
template<class T>inline hipError_t hipOccupancyMaxActiveBlocksPerMultiprocessor(int *numBlocks,
T f,
int blockSize,
size_t dynSharedMemPerBlk)
Returns occupancy for a kernel function.
Parameters
:
numBlocks – [out] Pointer to the returned occupancy, in number of blocks.
f – [in] The kernel function to launch on the device.
blockSize – [in] The block size the kernel is launched with.
dynSharedMemPerBlk – [in] Dynamic shared memory per block, in bytes.
Returns
:
hipSuccess, hipErrorInvalidValue
template<class T>inline hipError_t hipOccupancyMaxActiveBlocksPerMultiprocessorWithFlags(int *numBlocks,
T f,
int blockSize,
size_t dynSharedMemPerBlk,
unsigned int flags)
Returns occupancy for a device function with the specified flags.
Parameters
:
numBlocks – [out] Pointer to the returned occupancy, in number of blocks.
f – [in] The kernel function to launch on the device.
blockSize – [in] The block size the kernel is launched with.
dynSharedMemPerBlk – [in] Dynamic shared memory per block, in bytes.
flags – [in] Flags that control the behavior of the occupancy calculator.
Returns
:
hipSuccess, hipErrorInvalidValue
template<typename UnaryFunction, class T>static inline hipError_t hipOccupancyMaxPotentialBlockSizeVariableSMemWithFlags(int *min_grid_size,
int *block_size,
T func,
UnaryFunction block_size_to_dynamic_smem_size,
int block_size_limit = 0,
unsigned int flags = 0)
Returns the grid and block sizes that achieve maximum potential occupancy for a device function.
Returns in *min_grid_size and *block_size a suggested grid / block size pair that achieves the best potential occupancy (i.e. the maximum number of active warps on the current device with the smallest number of blocks for a particular function).
Parameters
:
min_grid_size – [out] minimum grid size needed to achieve the best potential occupancy
block_size – [out] block size required for the best potential occupancy
func – [in] device function symbol
block_size_to_dynamic_smem_size – [in] a unary function or functor that takes a block size and returns the size, in bytes, of the dynamic shared memory needed for a block of that size
block_size_limit – [in] the maximum block size func is designed to work with. 0 means no limit.
flags – [in] reserved
Returns
:
hipSuccess, hipErrorInvalidDevice, hipErrorInvalidDeviceFunction, hipErrorInvalidValue, hipErrorUnknown
template<typename UnaryFunction, class T>static inline hipError_t hipOccupancyMaxPotentialBlockSizeVariableSMem(int *min_grid_size,
int *block_size,
T func,
UnaryFunction block_size_to_dynamic_smem_size,
int block_size_limit = 0)
Returns the grid and block sizes that achieve maximum potential occupancy for a device function.
Returns in *min_grid_size and *block_size a suggested grid / block size pair that achieves the best potential occupancy (i.e. the maximum number of active warps on the current device with the smallest number of blocks for a particular function).
Parameters
:
min_grid_size – [out] minimum grid size needed to achieve the best potential occupancy
block_size – [out] block size required for the best potential occupancy
func – [in] device function symbol
block_size_to_dynamic_smem_size – [in] a unary function or functor that takes a block size and returns the size, in bytes, of the dynamic shared memory needed for a block of that size
block_size_limit – [in] the maximum block size func is designed to work with. 0 means no limit.
Returns
:
hipSuccess, hipErrorInvalidDevice, hipErrorInvalidDeviceFunction, hipErrorInvalidValue, hipErrorUnknown
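The block_size_to_dynamic_smem_size argument can be any callable that maps a candidate block size to a byte count. As an illustration only, the sketch below assumes a kernel that needs one float of dynamic shared memory per thread; `OneFloatPerThread` is a hypothetical name, not something provided by HIP.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical functor (not part of HIP): models a kernel whose dynamic
// shared memory requirement is one float per thread in the block.
struct OneFloatPerThread {
    std::size_t operator()(int blockSize) const {
        assert(blockSize >= 0);
        return static_cast<std::size_t>(blockSize) * sizeof(float);
    }
};
```

An instance such as `OneFloatPerThread{}` would be passed as the fourth argument of the VariableSMem functions, which evaluate it for each block size they consider. A lambda with the same signature should work equally well, since the parameter is a template type.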
template<typename F>inline hipError_t hipOccupancyMaxPotentialBlockSize(int *gridSize,
int *blockSize,
F kernel,
size_t dynSharedMemPerBlk,
uint32_t blockSizeLimit)
Returns the grid and block sizes that achieve maximum potential occupancy for a device function.
Returns in *gridSize and *blockSize a suggested grid / block size pair that achieves the best potential occupancy (i.e. the maximum number of active warps on the current device with the smallest number of blocks for a particular function).
Returns
:
hipSuccess, hipErrorInvalidDevice, hipErrorInvalidValue
template<typename F>inline hipError_t hipOccupancyAvailableDynamicSMemPerBlock(size_t *dynamicSmemSize,
F f,
int numBlocks,
int blockSize)
Returns the dynamic shared memory available per block when launching numBlocks blocks on an SM.
Returns in *dynamicSmemSize the maximum size of dynamic shared memory that allows numBlocks blocks per SM.
Parameters
:
dynamicSmemSize – [out] Returned maximum dynamic shared memory.
f – [in] Kernel function for which occupancy is calculated.
numBlocks – [in] Number of blocks to fit on SM
blockSize – [in] Size of the block
Returns
:
hipSuccess, hipErrorInvalidDevice, hipErrorInvalidDeviceFunction, hipErrorInvalidValue, hipErrorUnknown