← 返回首页
OPT: Optimize indexing using dynamic thread block sizes. by umar456 · Pull Request #3111 · arrayfire/arrayfire · GitHub
Skip to content

Navigation Menu

Toggle navigation
Sign in
Appearance settings
Search or jump to...

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Include my email address so I can be contacted

Saved searches

Use saved searches to filter your results more quickly

Appearance settings
Resetting focus

OPT: Optimize indexing using dynamic thread block sizes.#3111

Merged
9prady9 merged 1 commit into
arrayfire:masterfrom
umar456:index_opt
Mar 23, 2021
Merged

OPT: Optimize indexing using dynamic thread block sizes.#3111
9prady9 merged 1 commit into
arrayfire:masterfrom
umar456:index_opt

Conversation

Copy link
Copy Markdown
Member

umar456 commented Mar 20, 2021

This optimization dynamically sets the block size based on the output array
dimension. Originally we had a block size of 32x8 threads per block. This
configuration was not ideal when indexing into a long array where you
had few columns and many rows. The current approach creates blocks of
256x1, 128x2, 64x4 and 32,8 to better accommodate smaller dimensions.

Description

This optimization dynamically sets the block size based on the output array
dimension. Originally we had a block size of 32x8 threads per block. This
configuration was not ideal when indexing into a long array where you
had few columns and many rows. The current approach creates blocks of
256x1, 128x2, 64x4 and 32,8 to better accommodate smaller dimensions.

Changes to Users

None

Checklist

  • Rebased on latest master
  • Code compiles
  • Tests pass
  • [ ] Functions added to unified API
  • [ ] Functions documented

umar456 added the perf label Mar 20, 2021
umar456 requested a review from 9prady9 March 20, 2021 04:09
Copy link
Copy Markdown
Member

9prady9 commented Mar 20, 2021

What is the speedup range ?

Copy link
Copy Markdown
Member Author

umar456 commented Mar 20, 2021
edited
Loading

TEST(Index, blah) { array a = randu(20000000, 10); array idx = seq(20000000); array b = a(idx); b.eval(); af::sync(); }

Master:

Time(%) Total Time (ns) Instances Average Minimum Maximum Name ------- --------------- --------- ------------ ---------- ---------- ---------------------------------------------------------------------------------------------------- 49.7 10,107,916 1 10,107,916.0 10,107,916 10,107,916 void cuda::index<float>(cuda::Param<float>, cuda::CParam<float>, cuda::AssignKernelParam, int, int) 38.4 7,811,232 1 7,811,232.0 7,811,232 7,811,232 void cuda::kernel::uniformPhilox<float>(float*, unsigned int, unsigned int, unsigned int, unsigned … 8.1 1,645,554 1 1,645,554.0 1,645,554 1,645,554 KER9745534647381087054 3.9 791,546 1 791,546.0 791,546 791,546 void cuda::range<float>(cuda::Param<float>, int, int, int)

This PR:

Time(%) Total Time (ns) Instances Average Minimum Maximum Name ------- --------------- --------- ----------- --------- --------- ---------------------------------------------------------------------------------------------------- 60.6 7,809,308 1 7,809,308.0 7,809,308 7,809,308 void cuda::kernel::uniformPhilox<float>(float*, unsigned int, unsigned int, unsigned int, unsigned … 20.3 2,615,049 1 2,615,049.0 2,615,049 2,615,049 void cuda::index<float>(cuda::Param<float>, cuda::CParam<float>, cuda::AssignKernelParam, int, int) 12.8 1,646,066 1 1,646,066.0 1,646,066 1,646,066 KER9745534647381087054 6.3 812,761 1 812,761.0 812,761 812,761 void cuda::range<float>(cuda::Param<float>, int, int, int)

3.8x faster

This optimization dynamically sets the block size based on the output array dimension. Originally we had a block size of 32x8 threads per block. This configuration was not ideal when indexing into a long array where you had few columns and many rows. The current approach creates blocks of 256x1, 128x2, 64x4 and 32x8 to better accommodate smaller dimensions.
9prady9 merged commit d56c3bc into arrayfire:master Mar 23, 2021
9prady9 deleted the index_opt branch March 23, 2021 02:59
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters. Learn more about bidirectional Unicode characters
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants

Footer

© 2026 GitHub, Inc.