Took me a bit to figure out the problem but I see the issue now. We can ignore the errors in the CI because they are not related. I will test it on a couple of other systems before merging this PR. Thank you for your contribution!
Description
Without the barrier at the end of barrierOR, it is possible for work-item 0 to start the next loop iteration and update predicates[0] while other work-items are still inside barrierOR reading predicates, meaning they read the next loop iteration's exit condition. This results in a divergent loop, where not all work-items reach the same barriers.
A previous fix identified this as a problem only on NVIDIA platforms, but strictly speaking a barrier is required in all cases to avoid a spec violation and undefined behaviour.
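To illustrate the race described above, here is a minimal host-side sketch in Python that models a work-group with threads and `threading.Barrier`. It is not the kernel code from this PR: `barrier_or`, `run_work_group` and the `rounds_needed` input are hypothetical names chosen for illustration, and the reduction is simplified to `any()` over a shared list. The point is only the second `barrier.wait()`, which stops a fast work-item from overwriting `predicates` for the next iteration while slower work-items are still reading the current values.

```python
import threading


def barrier_or(barrier, predicates, local_id, predicate):
    """Work-group-wide OR: True if any work-item passed a true predicate."""
    predicates[local_id] = predicate
    barrier.wait()  # every work-item has written its slot before anyone reads
    result = any(predicates)
    barrier.wait()  # trailing barrier: no slot is overwritten until all have read
    return result


def run_work_group(rounds_needed):
    """Simulate one work-group; returns the iteration count seen by each work-item.

    rounds_needed[i] is a hypothetical amount of work for work-item i.
    """
    num_items = len(rounds_needed)
    barrier = threading.Barrier(num_items)
    predicates = [False] * num_items  # stands in for the shared local-memory array
    iterations = [0] * num_items

    def work_item(local_id):
        step = 0
        while True:
            has_work = step < rounds_needed[local_id]
            # All work-items must agree on the exit condition, otherwise the
            # loop diverges and they stop reaching the same barriers.
            if not barrier_or(barrier, predicates, local_id, has_work):
                break
            step += 1
        iterations[local_id] = step

    threads = [threading.Thread(target=work_item, args=(i,)) for i in range(num_items)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return iterations


if __name__ == "__main__":
    print(run_work_group([3, 1, 0, 2]))  # every work-item leaves after 3 iterations
```

With the trailing barrier in place, every simulated work-item sees the same exit condition in each iteration, so all of them leave the loop together. Removing the second `barrier.wait()` would allow a work-item to write the next iteration's predicate before the others have finished reading, which is the divergence this PR addresses.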
Changes to Users
The kernel should produce correct results on more OpenCL implementations.
Locally I tested both Intel(R) FPGA Emulation Device and various oneAPI Construction Kit devices, which all previously failed the confidence_connected_opencl --gtest_filter="SingleSeed/ConfidenceConnectedDataTest.SegmentARegion/_prefix_background_radius_0_multiplier_1_iterations_5_replace_255" unit test.
I'm unable to test other OpenCL implementations, sorry.
Checklist