[RNG] Follow-up improvements for Philox RNG engine#2619
[RNG] Follow-up improvements for Philox RNG engine#2619ElenaTyuleneva wants to merge 11 commits intomainfrom
Conversation
* Merged odd and even constants arrays generation into one method with parity parameter
There was a problem hiding this comment.
Pull request overview
Follow-up refinement to oneDPL RNG engines (notably philox_engine) to address post-merge feedback from #2603, focusing on constant handling and deterministic initialization for vector return paths.
Changes:
- Refactors
philox_engineconstant unpacking into a singleget_consts_by_indices()helper and cleans up some internal naming. - Ensures deterministic initialization of partially-filled
sycl::vecresults by explicitly constructing result vectors from0in portion-generation paths. - Applies the same explicit-zero construction pattern across several RNG engines’ internal generation helpers.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| include/oneapi/dpl/internal/random_impl/subtract_with_carry_engine.h | Explicitly initializes vector result buffers to zero in internal generation paths. |
| include/oneapi/dpl/internal/random_impl/philox_engine.h | Unifies constant extraction, renames internal masks, and zero-initializes vector result buffers. |
| include/oneapi/dpl/internal/random_impl/linear_congruential_engine.h | Zero-initializes partial vector results returned by result_portion_internal(). |
| include/oneapi/dpl/internal/random_impl/discard_block_engine.h | Zero-initializes vector result buffers in internal generation paths. |
Comments suppressed due to low confidence (3)
include/oneapi/dpl/internal/random_impl/discard_block_engine.h:231
- generate_internal() assigns to every lane of _res in all control-flow paths (either by whole-vector assignment from engine() or by filling each element in the loop). Initializing __res with result_type __res(0) is therefore unnecessary and can be a small performance regression. Consider default-constructing here and reserving explicit zero-initialization for the partial-result overload that doesn’t write all lanes.
result_type __res(0);
if (static_cast<::std::size_t>(_N) < (used_block - n_))
{
__res = engine_();
n_ += static_cast<::std::size_t>(_N);
}
else
{
for (int __i = 0; __i < _N; ++__i)
{
__res[__i] = generate_internal_scalar<internal::type_traits_t<result_type>::num_elems>();
}
}
return __res;
include/oneapi/dpl/internal/random_impl/philox_engine.h:334
- This generate_internal() overload writes every element of __loc_result in the loop, so initializing it with result_type __loc_result(0) is redundant and adds extra per-call work for sycl::vec outputs. Consider default-constructing here, and keep explicit zero-initialization only in the overload that returns a portion (where unwritten lanes must be defined).
result_type __loc_result(0);
for (int __elm_count = 0; __elm_count < _N; ++__elm_count)
{
++state_.idx;
// check if buffer is empty
if (state_.idx == word_count)
{
philox_kernel();
increase_counter_internal();
state_.idx = 0;
}
__loc_result[__elm_count] = state_.Y[state_.idx];
}
return __loc_result;
include/oneapi/dpl/internal/random_impl/subtract_with_carry_engine.h:220
- In generate_internal() for vector outputs, __res is fully assigned in the following loop, so zero-initializing it with result_type __res(0) is redundant and adds extra work (sycl::vec scalar-ctor will write all lanes). Consider reverting this to default construction here and keep explicit zero-initialization only in the “portion” generation path where not all lanes are written.
result_type __res(0);
for (int __i = 0; __i < _N; ++__i)
{
__res[__i] = generate_internal_scalar();
}
return __res;
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
| __even = 0, | ||
| __odd = 1 |
There was a problem hiding this comment.
I would suggest a small renaming to
| __even = 0, | |
| __odd = 1 | |
| __even_indices = 0, | |
| __odd_indices = 1 |
together with renaming below
There was a problem hiding this comment.
Done by the latest changes.
| template <std::size_t _Offset, std::size_t... _Is> | ||
| static constexpr auto | ||
| get_odd_element_array(std::array<scalar_type, _n> __input_array, std::index_sequence<_Is...>) | ||
| get_consts_by_indices(std::index_sequence<_Is...>) |
There was a problem hiding this comment.
| get_consts_by_indices(std::index_sequence<_Is...>) | |
| get_consts_by(std::index_sequence<_Is...>) |
There was a problem hiding this comment.
Thanks for the proposal, applied!
| static constexpr auto | ||
| get_even_element_array(std::array<scalar_type, _n> __input_array, std::index_sequence<_Is...>) | ||
| /* Method for unpacking variadic of constants into two arrays - with odd and even elements */ | ||
| enum |
There was a problem hiding this comment.
| enum | |
| enum class __indices_offset : std::size_t |
Or whatever syntax is correct.
There was a problem hiding this comment.
Thanks, the enum approach was sligthly changed.
There was a problem hiding this comment.
Spec says:
template<class UIntType, size_t w, size_t n, size_t r, UIntType... consts>Please add uglification
template <typename _UIntType, std::size_t _W, std::size_t _N, std::size_t _R,
oneapi::dpl::internal::element_type_t<_UIntType>... _Consts>There was a problem hiding this comment.
Thanks! The uglification was fixed in 7c2e0cd.
There was a problem hiding this comment.
Uglify this sth like
template <typename _CharT, typename _Traits, typename _UIntType, std::size_t _W, std::size_t _N,
std::size_t _R, oneapi::dpl::internal::element_type_t<_UIntType>... _Consts>| std::array<scalar_type, word_count> __X; // counters | ||
| std::array<scalar_type, word_count / 2> __K; // keys | ||
| std::array<scalar_type, word_count> __Y; // results | ||
| scalar_type __idx; // index |
There was a problem hiding this comment.
| std::array<scalar_type, word_count> __X; // counters | |
| std::array<scalar_type, word_count / 2> __K; // keys | |
| std::array<scalar_type, word_count> __Y; // results | |
| scalar_type __idx; // index | |
| std::array<scalar_type, word_count> __x; // counters | |
| std::array<scalar_type, word_count / 2> __k; // keys | |
| std::array<scalar_type, word_count> __y; // results | |
| scalar_type __idx; // index |
| scalar_type __V0 = __state.__X[0]; | ||
| scalar_type __V1 = __state.__X[1]; | ||
| scalar_type __K0 = __state.__K[0]; |
There was a problem hiding this comment.
| scalar_type __V0 = __state.__X[0]; | |
| scalar_type __V1 = __state.__X[1]; | |
| scalar_type __K0 = __state.__K[0]; | |
| scalar_type __v0 = __state.__x[0]; | |
| scalar_type __v1 = __state.__k[1]; | |
| scalar_type __k0 = __state.__k[0]; |
| scalar_type __V1 = state_.X[1]; | ||
| scalar_type __K0 = state_.K[0]; | ||
| scalar_type __V0 = __state.__X[0]; | ||
| scalar_type __V1 = __state.__X[1]; |
There was a problem hiding this comment.
Minor: Consider using aliases to x[0], x[1]
There was a problem hiding this comment.
Implemented, thanks for the idea!
| scalar_type __V0 = __state.__X[2]; | ||
| scalar_type __V3 = __state.__X[3]; | ||
| scalar_type __K0 = __state.__K[0]; | ||
| scalar_type __K1 = __state.__K[1]; |
There was a problem hiding this comment.
if y1, x1, y0 are not used below it can be reused (with aliases if it's better). It can affect performance due to high registers' pressure
There was a problem hiding this comment.
Reduced the amount of temporary registers from 11 to 7 in this function (ca2ff49).
There was a problem hiding this comment.
Pull request overview
This PR is a follow-up cleanup for the Philox RNG engine implementation, addressing review feedback from #2603 by tightening internal naming/uglification, simplifying constants unpacking, and adjusting internal kernels to reduce register pressure.
Changes:
- Uglified Philox private members/methods and updated related friend/operator code accordingly.
- Unified even/odd constants extraction into a single
__get_consts_by()helper. - Refactored
__philox_kernel()/__mulhilo()internals to reduce temporary variables/register pressure.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
andreyfe1
left a comment
There was a problem hiding this comment.
lgtm. Make sure that tests are passed
rarutyun
left a comment
There was a problem hiding this comment.
Honestly, I would drop all the uglification from the PR. As I mentioned before, I would not "pollute" commits history with meaningless (from the code perspective) changes.
If uglification matters, I would introduce it only for the lines that have meaningful changes.
As an example, we had ::std namespace all over the place. Over time, the design was settled and we found out that we don't need :: before std. Instead of changing everything at once and create big "pollution" in our code base, we change namespace :: prefix only in the lines that have other meaningful changes.
This reverts commit 7c2e0cd.
I agree that mixing large style-only changes with functional updates makes the history harder to follow. At the same time, this part of the code doesn’t change very often, so opportunities to clean it up incrementally are relatively rare and I’m a bit concerned it may take a while to converge to a consistent style. To keep this PR focused, I’ve moved all uglification-related changes into a separate branch and will proceed with just the functional/refactoring changes here. We can then handle the naming cleanup independently (either incrementally or all at once - let's discuss offline what works best). |
Description
The PR addresses some comments received after #2603 was merged.
Done:
get_even_element_array()andget_odd_element_array()functions in the Philox class into a singleget_consts_by().__philox_kernel()and__mulhilo()internal functions.