Move Lookup tables in CUDA backend to texture memory to reduce global constant memory usage by 9prady9 · Pull Request #2791 · arrayfire/arrayfire · GitHub
Merged
9prady9 merged 2 commits into arrayfire:master from 9prady9:lut_move on Mar 14, 2020

Conversation

Member

9prady9 commented Mar 12, 2020

There is negligible to no performance difference between the texture-based lookup table and the constant-memory lookup table. Hence, the lookup tables were moved to texture memory to reduce global constant memory usage.

9prady9 added this to the v3.7.1 milestone Mar 12, 2020
9prady9 requested a review from umar456 March 12, 2020 10:14
9prady9 force-pushed the lut_move branch 2 times, most recently from afc0ff8 to e0a0307 March 13, 2020 16:11
Comment thread src/backend/cuda/texture.hpp
class LookupTable1D {
public:
LookupTable1D() = delete;
LookupTable1D(const LookupTable1D& arg) = delete;
Member

It makes sense to have a copy constructor for this, right? You can copy a texture if you need it.

Member Author

Then we also have to take care of how the texture object is copied; just copying the handle won't work. A move operation won't have issues, though. We don't need either for these use cases.

pradeep added 2 commits March 13, 2020 23:30
cuda::kernel::locate_features is the CUDA kernel that uses the fast lookup table. Shared below is the performance of the kernel using constant memory vs. texture memory. There is negligible to no difference between the two versions. Hence, shifted to a texture memory LUT to reduce global constant memory usage.

Performance using constant memory LUT
-------------------------------------
Time(%)  Time      Calls  Avg       Min       Max       Name
1.48%    101.09us  3      33.696us  32.385us  34.976us  void cuda::kernel::locate_features<float, int=9>
1.34%    91.713us  2      45.856us  45.792us  45.921us  void cuda::kernel::locate_features<double, int=9>
1.02%    69.505us  2      34.752us  34.400us  35.105us  void cuda::kernel::locate_features<unsigned int, int=9>
0.99%    67.456us  2      33.728us  32.768us  34.688us  void cuda::kernel::locate_features<int, int=9>
0.95%    65.186us  2      32.593us  31.201us  33.985us  void cuda::kernel::locate_features<short, int=9>
0.93%    63.874us  2      31.937us  30.817us  33.057us  void cuda::kernel::locate_features<unsigned short, int=9>

Performance using texture LUT
-----------------------------
Time(%)  Time      Calls  Avg       Min       Max       Name
1.45%    99.776us  3      33.258us  32.896us  33.504us  void cuda::kernel::locate_features<float, int=9>
1.33%    91.105us  2      45.552us  44.961us  46.144us  void cuda::kernel::locate_features<double, int=9>
1.02%    70.017us  2      35.008us  34.273us  35.744us  void cuda::kernel::locate_features<unsigned int, int=9>
0.97%    66.689us  2      33.344us  32.065us  34.624us  void cuda::kernel::locate_features<int, int=9>
0.95%    65.249us  2      32.624us  31.585us  33.664us  void cuda::kernel::locate_features<short, int=9>
0.95%    65.025us  2      32.512us  30.945us  34.080us  void cuda::kernel::locate_features<unsigned short, int=9>
cuda::kernel::extract_orb is the CUDA kernel that uses the orb lookup table. Shared below is the performance of the kernel using constant memory vs. texture memory. There is negligible to no difference between the two versions. Hence, shifted to a texture memory LUT to reduce global constant memory usage.

Performance using constant memory LUT
-------------------------------------
Time(%)  Time      Calls  Avg       Min       Max       Name
3.02%    292.26us  24     12.177us  11.360us  14.528us  void cuda::kernel::extract_orb<float>
2.16%    209.00us  16     13.062us  11.616us  16.033us  void cuda::kernel::extract_orb<double>

Performance using texture LUT
-----------------------------
Time(%)  Time      Calls  Avg       Min       Max       Name
2.84%    270.63us  24     11.276us  9.6970us  15.040us  void cuda::kernel::extract_orb<float>
2.20%    209.28us  16     13.080us  10.688us  16.960us  void cuda::kernel::extract_orb<double>
9prady9 merged commit 0d61c6f into arrayfire:master Mar 14, 2020
9prady9 deleted the lut_move branch March 14, 2020 04:54
2 participants
