|
|
||
| \note NaN values are ignored | ||
| */ | ||
| AFAPI void max(array &val, array &idx, const array &in, const int dim, const array &ragged_len); |
Opinion: I think ragged_len should come right after the in array.
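To make the suggested ordering concrete, here is a minimal CPU reference sketch. It is not ArrayFire code: `ragged_max_ref`, the column-major layout, the reduction along dim 0, and the handling of empty or all-NaN columns (value stays NaN, index stays 0) are all assumptions made for illustration. Only the NaN-skipping behaviour comes from the doc comment in the diff.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical reference implementation, not part of ArrayFire.
// `in` holds a rows x cols matrix in column-major order and the
// reduction runs along dim 0. ragged_len follows `in` directly,
// as suggested in the comment above.
void ragged_max_ref(std::vector<float> &val, std::vector<unsigned> &idx,
                    const std::vector<float> &in,
                    const std::vector<unsigned> &ragged_len,
                    std::size_t rows, std::size_t cols) {
    assert(in.size() == rows * cols);
    assert(ragged_len.size() == cols);

    // Assumption: columns with no usable element report NaN at index 0.
    val.assign(cols, std::nanf(""));
    idx.assign(cols, 0u);

    for (std::size_t c = 0; c < cols; ++c) {
        // Only the first ragged_len[c] elements of the column take part.
        const std::size_t len = std::min<std::size_t>(ragged_len[c], rows);
        bool found = false;
        for (std::size_t r = 0; r < len; ++r) {
            const float x = in[c * rows + r];
            if (std::isnan(x)) continue;  // NaN values are ignored
            if (!found || x > val[c]) {
                val[c] = x;
                idx[c] = static_cast<unsigned>(r);
                found  = true;
            }
        }
    }
}
```

With this order the two inputs that describe the data (in, ragged_len) sit together and dim stays a trailing option, which is the motivation as I read it.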
| } | ||
|
|
||
| template<af_op_t op> | ||
| static af_err rreduce_common(af_array *val, af_array *idx, const af_array in, |
Will an overload to reduce_common not work?
|
|
||
| template<af_op_t op, typename T> | ||
| void rreduce(Array<T> &out, Array<uint> &loc, const Array<T> &in, | ||
| const int dim, const Array<uint> &rlen) { |
Is this function still needed? It looks like you can combine this with ireduce.
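A rough sketch of what combining them could look like, assuming the only difference between the two paths is a per-column length cap. Everything here (`ireduce_ref`, `MaxOp`, `MinOp`, the optional `rlen` pointer) is hypothetical and CPU-only. It is not the actual backend signature, and NaN handling is left out to keep it short.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical comparison functors standing in for af_op_t.
struct MaxOp { static bool better(float a, float b) { return a > b; } };
struct MinOp { static bool better(float a, float b) { return a < b; } };

// One indexed reduction for both cases. `in` is rows x cols, column-major,
// reduced along dim 0. Passing rlen == nullptr gives the plain ireduce
// behaviour; passing lengths gives the ragged variant.
template<typename Op>
void ireduce_ref(std::vector<float> &out, std::vector<unsigned> &loc,
                 const std::vector<float> &in, std::size_t rows,
                 std::size_t cols,
                 const std::vector<unsigned> *rlen = nullptr) {
    assert(in.size() == rows * cols);
    assert(rlen == nullptr || rlen->size() == cols);

    // Assumption: empty columns keep the value-initialised outputs (0, 0).
    out.assign(cols, 0.f);
    loc.assign(cols, 0u);

    for (std::size_t c = 0; c < cols; ++c) {
        const std::size_t limit =
            rlen ? std::min<std::size_t>((*rlen)[c], rows) : rows;
        if (limit == 0) continue;

        out[c] = in[c * rows];  // seed with the first element
        for (std::size_t r = 1; r < limit; ++r) {
            const float x = in[c * rows + r];
            if (Op::better(x, out[c])) {
                out[c] = x;
                loc[c] = static_cast<unsigned>(r);
            }
        }
    }
}
```

If the real kernels can take the length array as an optional parameter in the same way, the separate rreduce entry point would reduce to a thin call into ireduce.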
|
@syurkevi sounds good, though f16 is not really faster than f32. Could you please add a minimal benchmark like this one:
|
@syurkevi could you please at least rebase, so that I can see what still needs to be finished and advise accordingly?
|
@syurkevi I took care of the rebase onto the latest master. If you are adding more ragged functions and need to touch the ireduce kernel, you can find the kernels in src/backend/cuda/kernel/ireduce.cuh and the kernel wrappers in src/backend/cuda/kernel/ireduce.hpp. The backend source file is src/backend/cuda/ireduce.cpp, not ireduce.cu. If you run into any issues while editing or adding kernels, please ping me and I can guide you through the nvrtc-related changes.
Addresses #2782.
TODO: