cuda::kernel::locate_features is the CUDA kernel that uses the fast
lookup table (LUT). The profiles below compare the kernel with the LUT in
constant memory against the LUT in texture memory. The difference between
the two versions is negligible, so the LUT has been moved to texture
memory to reduce constant memory usage.
Performance using constant memory LUT
-------------------------------------
Time(%)  Time      Calls  Avg       Min       Max       Name
1.48%    101.09us  3      33.696us  32.385us  34.976us  void cuda::kernel::locate_features<float, int=9>
1.34%    91.713us  2      45.856us  45.792us  45.921us  void cuda::kernel::locate_features<double, int=9>
1.02%    69.505us  2      34.752us  34.400us  35.105us  void cuda::kernel::locate_features<unsigned int, int=9>
0.99%    67.456us  2      33.728us  32.768us  34.688us  void cuda::kernel::locate_features<int, int=9>
0.95%    65.186us  2      32.593us  31.201us  33.985us  void cuda::kernel::locate_features<short, int=9>
0.93%    63.874us  2      31.937us  30.817us  33.057us  void cuda::kernel::locate_features<unsigned short, int=9>
Performance using texture LUT
-----------------------------
Time(%)  Time      Calls  Avg       Min       Max       Name
1.45%    99.776us  3      33.258us  32.896us  33.504us  void cuda::kernel::locate_features<float, int=9>
1.33%    91.105us  2      45.552us  44.961us  46.144us  void cuda::kernel::locate_features<double, int=9>
1.02%    70.017us  2      35.008us  34.273us  35.744us  void cuda::kernel::locate_features<unsigned int, int=9>
0.97%    66.689us  2      33.344us  32.065us  34.624us  void cuda::kernel::locate_features<int, int=9>
0.95%    65.249us  2      32.624us  31.585us  33.664us  void cuda::kernel::locate_features<short, int=9>
0.95%    65.025us  2      32.512us  30.945us  34.080us  void cuda::kernel::locate_features<unsigned short, int=9>
Across all six instantiations, the average kernel time differs by less than 2% between the constant memory and texture look-up tables (for example, 33.696us vs 33.258us for float). The look-up table was therefore moved to texture memory to reduce constant memory usage.
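
For readers unfamiliar with the two placements, the following is a minimal sketch of the same table lookup done once through `__constant__` memory and once through a texture object. It is not the locate_features kernel: the table size, element type, contents, and all names (`c_lut`, `lookup_constant`, `lookup_texture`, `count_lut_mismatches`) are hypothetical, since the source does not show how the real LUT is declared or bound.

```cuda
#include <cuda_runtime.h>
#include <cstring>
#include <vector>

constexpr int LUT_SIZE = 256;  // hypothetical size, not taken from the source

// Variant 1: table lives in __constant__ memory (a 64 KB resource per module).
__constant__ unsigned char c_lut[LUT_SIZE];

__global__ void lookup_constant(const unsigned char* in, unsigned char* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = c_lut[in[i]];
}

// Variant 2: table lives in ordinary device memory, read via a texture object.
__global__ void lookup_texture(cudaTextureObject_t lut, const unsigned char* in,
                               unsigned char* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = tex1Dfetch<unsigned char>(lut, in[i]);
}

// Runs both variants on the same input.
// Returns the number of differing outputs, or -1 if any CUDA call failed.
int count_lut_mismatches()
{
    const int n = 1 << 16;
    std::vector<unsigned char> h_lut(LUT_SIZE), h_in(n), h_a(n), h_b(n);
    for (int i = 0; i < LUT_SIZE; ++i) h_lut[i] = (unsigned char)((i * 7 + 3) & 0xFF);
    for (int i = 0; i < n; ++i) h_in[i] = (unsigned char)(i & 0xFF);

    unsigned char *d_lut = nullptr, *d_in = nullptr, *d_a = nullptr, *d_b = nullptr;
    bool ok = true;
    ok = ok && cudaMalloc(&d_lut, LUT_SIZE) == cudaSuccess;
    ok = ok && cudaMalloc(&d_in, n) == cudaSuccess;
    ok = ok && cudaMalloc(&d_a, n) == cudaSuccess;
    ok = ok && cudaMalloc(&d_b, n) == cudaSuccess;
    ok = ok && cudaMemcpy(d_in, h_in.data(), n, cudaMemcpyHostToDevice) == cudaSuccess;

    // Constant-memory copy of the table.
    ok = ok && cudaMemcpyToSymbol(c_lut, h_lut.data(), LUT_SIZE) == cudaSuccess;

    // Device-memory copy of the table, wrapped in a texture object.
    ok = ok && cudaMemcpy(d_lut, h_lut.data(), LUT_SIZE, cudaMemcpyHostToDevice) == cudaSuccess;
    cudaResourceDesc res;
    std::memset(&res, 0, sizeof(res));
    res.resType = cudaResourceTypeLinear;
    res.res.linear.devPtr = d_lut;
    res.res.linear.desc = cudaCreateChannelDesc<unsigned char>();
    res.res.linear.sizeInBytes = LUT_SIZE;
    cudaTextureDesc tex_desc;
    std::memset(&tex_desc, 0, sizeof(tex_desc));
    tex_desc.readMode = cudaReadModeElementType;  // return raw table values
    cudaTextureObject_t tex = 0;
    ok = ok && cudaCreateTextureObject(&tex, &res, &tex_desc, nullptr) == cudaSuccess;

    int mismatches = -1;
    if (ok) {
        const int threads = 256, blocks = (n + threads - 1) / threads;
        lookup_constant<<<blocks, threads>>>(d_in, d_a, n);
        lookup_texture<<<blocks, threads>>>(tex, d_in, d_b, n);
        ok = cudaDeviceSynchronize() == cudaSuccess;
        ok = ok && cudaMemcpy(h_a.data(), d_a, n, cudaMemcpyDeviceToHost) == cudaSuccess;
        ok = ok && cudaMemcpy(h_b.data(), d_b, n, cudaMemcpyDeviceToHost) == cudaSuccess;
        if (ok) {
            mismatches = 0;
            for (int i = 0; i < n; ++i) mismatches += (h_a[i] != h_b[i]);
        }
    }

    if (tex) cudaDestroyTextureObject(tex);
    cudaFree(d_lut); cudaFree(d_in); cudaFree(d_a); cudaFree(d_b);
    return mismatches;
}
```

Calling `count_lut_mismatches()` from a host program on a machine with a CUDA device should return 0, since both kernels read the same table values. In the texture variant the table no longer counts against the constant memory budget, which is the motivation given above.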