JIT optimization for sequential casts that are idempotent by jacobkahn · Pull Request #3031 · arrayfire/arrayfire · GitHub

Merged
umar456 merged 4 commits into
arrayfire:masterfrom
jacobkahn:as_jit_optimization
Apr 6, 2022
Merged

jacobkahn commented Oct 23, 2020 (edited)

Adds a JIT optimization that emits a no-op for sequential casts that do not change the type of the final result.

Description

The following code is technically a noop:

af::array a = af::randu(10, 1, 1, 1, af::dtype::f32);
af::array b = a.as(af::dtype::f64);
af::array c = b.as(af::dtype::f32);

No cast kernels should be generated for any of these operations, least of all for c, but currently they are. The fix happens when the CastOp/CastWrapper for c is created: it checks whether the previous operation was a cast and, if so, whether that cast's own input (the operation two steps back) already has the same output type as the current cast. If both conditions hold, it creates a __noop node between that earlier operation and the current one.

This approach also copes with trickier cases like:

af::array a = af::randu(10, 1, 1, 1, af::dtype::f32);
af::array b = a.as(af::dtype::f64);
af::array d = b + 2;
af::array c = b.as(af::dtype::f32);
c.eval();
d.eval();

where b is still used elsewhere (by d), so the intermediate cast cannot be discarded entirely; it can only be bypassed on the path that produces c.
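A minimal sketch of the idea, assuming a toy expression graph: the `Node` struct, `DType` enum, `makeBuffer`, and `makeCast` below are hypothetical illustrations, not ArrayFire's Node/NaryNode/CastWrapper classes. It shows the peephole check from the description and why the shared case is safe: the check never modifies its input node, so a second consumer of `b` still sees the f64 cast.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical, simplified JIT graph node for illustration only;
// this is NOT ArrayFire's Node/NaryNode class hierarchy.
enum class DType { f32, f64 };

struct Node {
    std::string op;  // "buffer", "cast", ...
    DType type;      // output type of this node
    std::vector<std::shared_ptr<Node>> children;
};

using NodePtr = std::shared_ptr<Node>;

NodePtr makeBuffer(DType t) {
    return std::make_shared<Node>(Node{"buffer", t, {}});
}

// Builds a cast of `input` to `target`. If `input` is itself a cast whose
// own input already has type `target`, the two casts cancel out and the
// earlier node is reused directly. `input` is never modified, so any other
// consumer of it (like d = b + 2) still sees the intermediate cast.
NodePtr makeCast(const NodePtr& input, DType target) {
    assert(input != nullptr);
    if (input->op == "cast" && input->children.at(0)->type == target) {
        return input->children.at(0);
    }
    return std::make_shared<Node>(Node{"cast", target, {input}});
}
```

With this sketch, `makeCast(makeCast(a, DType::f64), DType::f32)` hands back `a` itself. For brevity it returns the earlier node directly instead of inserting a `__noop` node as the PR describes, and it applies the rule unconditionally; the discussion further down narrows which type pairs are safe to collapse.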

With the change, running AF_JIT_KERNEL_TRACE=stderr ./test/cast_cuda --gtest_filter="*Test_JIT_DuplicateCastNoop" produces the following kernels:

Before the change, the generated kernel used wasteful casts:

This PR also adds op, type, and children accessors to Node/NaryNode to facilitate inspecting the JIT tree for optimization.

Further optimization would be possible by recursively walking back through previous operations until reaching one that is not a cast. This would collapse arbitrarily long chains of casts that are no-ops on a particular subtree of JIT operations.
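A sketch of that recursive extension, again on a hypothetical toy graph (none of these names are ArrayFire APIs): it walks back through consecutive casts and reuses the nearest cast input that already has the requested type. It deliberately ignores whether the skipped casts were lossy, which is exactly the concern raised later in this thread.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical, simplified JIT node (not ArrayFire's classes).
enum class DType { f16, f32, f64 };

struct Node {
    std::string op;  // "buffer" or "cast"
    DType type;      // output type of this node
    std::vector<std::shared_ptr<Node>> children;
};

using NodePtr = std::shared_ptr<Node>;

NodePtr makeBuffer(DType t) {
    return std::make_shared<Node>(Node{"buffer", t, {}});
}

// Walks back through the chain of consecutive casts feeding `input`. If the
// input of any cast along that chain already has type `target`, the nearest
// such node is reused and every cast after it is skipped. Lossiness of the
// skipped casts is NOT checked; the merged PR only removes specific pairs.
NodePtr makeCastCollapsing(const NodePtr& input, DType target) {
    assert(input != nullptr);
    NodePtr cur = input;
    while (cur->op == "cast") {
        NodePtr src = cur->children.at(0);
        if (src->type == target) {
            return src;
        }
        cur = src;
    }
    return std::make_shared<Node>(Node{"cast", target, {input}});
}
```

For a chain f32 -> f64 -> f16 -> f32, the final call walks past the f16 and f64 casts and returns the original f32 buffer.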

Changes to Users

No changes to user behavior.

Checklist

  • Rebased on latest master
  • Code compiles
  • Tests pass
  • Functions added to unified API
  • Functions documented


umar456 left a comment


Thanks for sending this in. I made a couple of suggestions which are required to get the OpenCL backend working again.

Review comment threads (outdated, resolved) on src/backend/cuda/cast.hpp (two threads) and src/backend/opencl/cast.hpp.
jacobkahn force-pushed the as_jit_optimization branch from 654f335 to 7509994 on October 28, 2020 01:02
jacobkahn requested a review from umar456 October 28, 2020 01:03

umar456 commented Oct 29, 2020

There is an invalid read access error with the sparse test in the CUDA backend. I am able to reproduce it using the following code:

TEST(Cast, abs) {
    using namespace af;
    array a = randu(100, 100, f64);
    array b = a.as(f32);
    array c = max<double>(abs(a - b));
}

There is something odd going on with the implicit casts in the subtraction operation. It looks like the buffer object's shape is not set during the conversion. I am not sure where this is happening and I am investigating it.

jacobkahn (author) commented:

@umar456 any update on this? Can I help in any way?

umar456 force-pushed the as_jit_optimization branch 2 times, most recently from a9171ff to 3663038, on June 8, 2021 18:13

umar456 commented Jun 8, 2021

@jacobkahn
I fixed the issues I referenced in my previous comment. I can think of a couple of scenarios where this optimization could be a problem. For example, what if we cast a floating point type to an integer type and back to float? This should floor all the values in the array, but that may not happen with this change. Do you think it would be better to apply this optimization only to non-destructive casts?

We can limit this optimization to casts between integer types or floating point types. This way it behaves like C++ types.

Alternatively, we could keep the current behavior and allow destructive casts and expect the user to use functions like floor or ceil to get the same behavior.
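To make the concern concrete, here is a plain C++ analogy (scalar conversions, not ArrayFire's cast kernels): an f32 -> s32 -> f32 round trip truncates toward zero, which matches floor only for non-negative values such as `randu` output. Collapsing those two casts would therefore change results. The function name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Plain C++ analogue of f32 -> s32 -> f32. The float-to-integer conversion
// truncates toward zero (and is undefined if the value does not fit in
// int32_t), so the round trip is not a no-op for non-integral inputs.
float roundTripThroughInt(float x) {
    return static_cast<float>(static_cast<std::int32_t>(x));
}
```

For example, 2.75f comes back as 2.0f and -2.75f as -2.0f, while integral values such as 3.0f survive unchanged.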

jacobkahn (author) commented:

@umar456, revisiting this after some time: I think for now, destructive casts that emulate floor/ceiling operations probably shouldn't be implemented implicitly. The casts I find more interesting to optimize away are casts between similar types with different precisions, such as f16 <> f32 <> f64 or u32 <> u64. While some of these casts are destructive, there isn't really an operation that emulates them. A user who casts f32 --> f16 --> f32 almost certainly isn't doing so to lose precision intentionally.
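The distinction can be illustrated with plain C++ scalars (an analogy, not ArrayFire code; standard C++ before C++23 has no portable f16 type, so f64 -> f32 -> f64 stands in for f32 -> f16 -> f32). Widening first and narrowing back is exact, while narrowing first can lose precision. Both function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// u32 -> u64 -> u32: the wider type holds every u32 value, so this is exact.
std::uint32_t roundTripU32ViaU64(std::uint32_t x) {
    return static_cast<std::uint32_t>(static_cast<std::uint64_t>(x));
}

// f64 -> f32 -> f64: the narrowing step rounds to the nearest float, so
// values that are not exactly representable in f32 come back changed.
double roundTripF64ViaF32(double x) {
    return static_cast<double>(static_cast<float>(x));
}
```

0.5 survives the floating point round trip because it is exactly representable in f32, but 0.1 does not.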

Thoughts?

jacobkahn and others added 3 commits March 31, 2022 01:59
The cast optimization removes the previous node in the AST but doesn't update the returned Array's ready flag in case it is being replaced by a buffer. This caused an error in the CUDA backend in certain scenarios. Added additional tests and logging for testing and debugging.
umar456 force-pushed the as_jit_optimization branch from 3663038 to fde4d44 on March 31, 2022 05:59
umar456 force-pushed the as_jit_optimization branch from 8f93e72 to 6f76038 on April 4, 2022 23:02

umar456 commented Apr 5, 2022

I modified the PR to remove casts only in limited scenarios. Currently I remove intermediate casts between floating point types. Floating point to integer casts are not removed, and integer casts from larger to smaller types are not removed. Here are all the combinations of casts that will be removed. An x means the round trip outer -> inner -> outer is removed, where the row gives the outer type and the column gives the inner type.

| outer ↓ / inner → | f32 | f64 | c32 | c64 | s32 | u32 | u8 | b8 | s64 | u64 | s16 | u16 | f16 |
|-------------------|-----|-----|-----|-----|-----|-----|----|----|-----|-----|-----|-----|-----|
| f32               | x   | x   | x   | x   |     |     |    |    |     |     |     |     | x   |
| f64               | x   | x   | x   | x   |     |     |    |    |     |     |     |     | x   |
| c32               | x   | x   | x   | x   |     |     |    |    |     |     |     |     | x   |
| c64               | x   | x   | x   | x   |     |     |    |    |     |     |     |     | x   |
| s32               | x   | x   | x   | x   | x   | x   |    |    | x   | x   |     |     | x   |
| u32               | x   | x   | x   | x   | x   | x   |    |    | x   | x   |     |     | x   |
| u8                | x   | x   | x   | x   | x   | x   | x  | x  | x   | x   | x   | x   | x   |
| b8                | x   | x   | x   | x   | x   | x   | x  | x  | x   | x   | x   | x   | x   |
| s64               | x   | x   | x   | x   |     |     |    |    | x   | x   |     |     | x   |
| u64               | x   | x   | x   | x   |     |     |    |    | x   | x   |     |     | x   |
| s16               | x   | x   | x   | x   | x   | x   |    |    | x   | x   | x   | x   | x   |
| u16               | x   | x   | x   | x   | x   | x   |    |    | x   | x   | x   | x   | x   |
| f16               | x   | x   | x   | x   |     |     |    |    |     |     |     |     | x   |
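One compact reading of the table: a round trip is removed whenever the inner type is floating point (real or complex); otherwise it is kept when the outer type is floating point; and between integer types (counting b8 as a one-byte integer) it is removed only if the inner type is at least as wide as the outer type. The sketch below encodes that reading. The enum and function names are hypothetical, and the rule was derived from the table above, not from ArrayFire's source.

```cpp
#include <cassert>

// Hypothetical enum mirroring the table's type names (not af::dtype).
enum class DType { f32, f64, c32, c64, s32, u32, u8, b8, s64, u64, s16, u16, f16 };

// Floating point types, including the complex ones.
bool isFloating(DType t) {
    switch (t) {
        case DType::f32: case DType::f64: case DType::c32:
        case DType::c64: case DType::f16:
            return true;
        default:
            return false;
    }
}

// Width in bytes of the integer types; b8 is counted as one byte.
int integerWidth(DType t) {
    switch (t) {
        case DType::u8:  case DType::b8:  return 1;
        case DType::s16: case DType::u16: return 2;
        case DType::s32: case DType::u32: return 4;
        case DType::s64: case DType::u64: return 8;
        default: return 0;  // floating point types are not integers
    }
}

// True if outer -> inner -> outer is collapsed according to the table.
bool castRoundTripRemovable(DType outer, DType inner) {
    if (isFloating(inner)) return true;   // every row marks the floating columns
    if (isFloating(outer)) return false;  // floating -> integer -> floating is kept
    return integerWidth(inner) >= integerWidth(outer);  // no narrowing in between
}
```

Checking a few cells: s16 -> s32 -> s16 is removed, s32 -> s16 -> s32 is kept, and f32 -> s32 -> f32 is kept, matching the corresponding table entries.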

umar456 merged commit 5b2e8ea into arrayfire:master Apr 6, 2022
jacobkahn deleted the as_jit_optimization branch April 6, 2022 18:36
umar456 pushed a commit to umar456/arrayfire that referenced this pull request Apr 21, 2022
…3031) Adds a JIT optimization which removes sequential casts in cases that don't result in a differently-typed result. This commit removes the following casts:
* Casts between any floating point types.
* Casts from smaller integer types to larger integer types and back.
The following casts are NOT removed:
* Floating point to integer types and back.
* Integer types from larger types to smaller types and back.
Casts can be forced by calling eval on the cast intermediate array.
umar456 pushed a commit that referenced this pull request Apr 22, 2022