Skip to content

[RNG] Follow-up improvements for Philox RNG engine#2619

Open
ElenaTyuleneva wants to merge 11 commits intomainfrom
dev/etyulene/philox_follow_up
Open

[RNG] Follow-up improvements for Philox RNG engine#2619
ElenaTyuleneva wants to merge 11 commits intomainfrom
dev/etyulene/philox_follow_up

Conversation

@ElenaTyuleneva
Copy link
Copy Markdown
Contributor

@ElenaTyuleneva ElenaTyuleneva commented Mar 16, 2026

Description

The PR addresses some comments received after #2603 was merged.

Done:

  • Unified get_even_element_array() and get_odd_element_array() functions in the Philox class into a single get_consts_by().
  • Reduced the register's pressure in the __philox_kernel() and __mulhilo() internal functions.

* Merged odd and even constants arrays generation into one method with parity parameter
@ElenaTyuleneva ElenaTyuleneva changed the title [RNG] Follow-up improvements in the Philox engine [RNG] Follow-up improvements in the Philox and other random engines Mar 16, 2026
@ElenaTyuleneva ElenaTyuleneva changed the title [RNG] Follow-up improvements in the Philox and other random engines [RNG] Follow-up improvements for Philox and other random number engines Mar 16, 2026
@ElenaTyuleneva ElenaTyuleneva marked this pull request as ready for review March 16, 2026 18:49
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Follow-up refinement to oneDPL RNG engines (notably philox_engine) to address post-merge feedback from #2603, focusing on constant handling and deterministic initialization for vector return paths.

Changes:

  • Refactors philox_engine constant unpacking into a single get_consts_by_indices() helper and cleans up some internal naming.
  • Ensures deterministic initialization of partially-filled sycl::vec results by explicitly constructing result vectors from 0 in portion-generation paths.
  • Applies the same explicit-zero construction pattern across several RNG engines’ internal generation helpers.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

File Description
include/oneapi/dpl/internal/random_impl/subtract_with_carry_engine.h Explicitly initializes vector result buffers to zero in internal generation paths.
include/oneapi/dpl/internal/random_impl/philox_engine.h Unifies constant extraction, renames internal masks, and zero-initializes vector result buffers.
include/oneapi/dpl/internal/random_impl/linear_congruential_engine.h Zero-initializes partial vector results returned by result_portion_internal().
include/oneapi/dpl/internal/random_impl/discard_block_engine.h Zero-initializes vector result buffers in internal generation paths.
Comments suppressed due to low confidence (3)

include/oneapi/dpl/internal/random_impl/discard_block_engine.h:231

  • generate_internal() assigns to every lane of _res in all control-flow paths (either by whole-vector assignment from engine() or by filling each element in the loop). Initializing __res with result_type __res(0) is therefore unnecessary and can be a small performance regression. Consider default-constructing here and reserving explicit zero-initialization for the partial-result overload that doesn’t write all lanes.
        result_type __res(0);
        if (static_cast<::std::size_t>(_N) < (used_block - n_))
        {
            __res = engine_();
            n_ += static_cast<::std::size_t>(_N);
        }
        else
        {
            for (int __i = 0; __i < _N; ++__i)
            {
                __res[__i] = generate_internal_scalar<internal::type_traits_t<result_type>::num_elems>();
            }
        }
        return __res;

include/oneapi/dpl/internal/random_impl/philox_engine.h:334

  • This generate_internal() overload writes every element of __loc_result in the loop, so initializing it with result_type __loc_result(0) is redundant and adds extra per-call work for sycl::vec outputs. Consider default-constructing here, and keep explicit zero-initialization only in the overload that returns a portion (where unwritten lanes must be defined).
        result_type __loc_result(0);
        for (int __elm_count = 0; __elm_count < _N; ++__elm_count)
        {
            ++state_.idx;

            // check if buffer is empty
            if (state_.idx == word_count)
            {
                philox_kernel();
                increase_counter_internal();
                state_.idx = 0;
            }
            __loc_result[__elm_count] = state_.Y[state_.idx];
        }

        return __loc_result;

include/oneapi/dpl/internal/random_impl/subtract_with_carry_engine.h:220

  • In generate_internal() for vector outputs, __res is fully assigned in the following loop, so zero-initializing it with result_type __res(0) is redundant and adds extra work (sycl::vec scalar-ctor will write all lanes). Consider reverting this to default construction here and keep explicit zero-initialization only in the “portion” generation path where not all lanes are written.
        result_type __res(0);
        for (int __i = 0; __i < _N; ++__i)
        {
            __res[__i] = generate_internal_scalar();
        }
        return __res;

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

@timmiesmith timmiesmith requested a review from rarutyun March 16, 2026 19:09
Comment on lines +71 to +72
__even = 0,
__odd = 1
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest a small renaming to

Suggested change
__even = 0,
__odd = 1
__even_indices = 0,
__odd_indices = 1

together with renaming below

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done by the latest changes.

template <std::size_t _Offset, std::size_t... _Is>
static constexpr auto
get_odd_element_array(std::array<scalar_type, _n> __input_array, std::index_sequence<_Is...>)
get_consts_by_indices(std::index_sequence<_Is...>)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
get_consts_by_indices(std::index_sequence<_Is...>)
get_consts_by(std::index_sequence<_Is...>)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the proposal, applied!

static constexpr auto
get_even_element_array(std::array<scalar_type, _n> __input_array, std::index_sequence<_Is...>)
/* Method for unpacking variadic of constants into two arrays - with odd and even elements */
enum
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
enum
enum class __indices_offset : std::size_t

Or whatever syntax is correct.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, the enum approach was sligthly changed.

Copy link
Copy Markdown
Contributor

@andreyfe1 andreyfe1 Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spec says:

template<class UIntType, size_t w, size_t n, size_t r, UIntType... consts>

Please add uglification

template <typename _UIntType, std::size_t _W, std::size_t _N, std::size_t _R,
          oneapi::dpl::internal::element_type_t<_UIntType>... _Consts>

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! The uglification was fixed in 7c2e0cd.

Comment on lines 179 to 180
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uglify this sth like

template <typename _CharT, typename _Traits, typename _UIntType, std::size_t _W, std::size_t _N,
std::size_t _R, oneapi::dpl::internal::element_type_t<_UIntType>... _Consts>

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment on lines +198 to +201
std::array<scalar_type, word_count> __X; // counters
std::array<scalar_type, word_count / 2> __K; // keys
std::array<scalar_type, word_count> __Y; // results
scalar_type __idx; // index
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
std::array<scalar_type, word_count> __X; // counters
std::array<scalar_type, word_count / 2> __K; // keys
std::array<scalar_type, word_count> __Y; // results
scalar_type __idx; // index
std::array<scalar_type, word_count> __x; // counters
std::array<scalar_type, word_count / 2> __k; // keys
std::array<scalar_type, word_count> __y; // results
scalar_type __idx; // index

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment on lines +389 to +391
scalar_type __V0 = __state.__X[0];
scalar_type __V1 = __state.__X[1];
scalar_type __K0 = __state.__K[0];
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
scalar_type __V0 = __state.__X[0];
scalar_type __V1 = __state.__X[1];
scalar_type __K0 = __state.__K[0];
scalar_type __v0 = __state.__x[0];
scalar_type __v1 = __state.__k[1];
scalar_type __k0 = __state.__k[0];

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

scalar_type __V1 = state_.X[1];
scalar_type __K0 = state_.K[0];
scalar_type __V0 = __state.__X[0];
scalar_type __V1 = __state.__X[1];
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: Consider using aliases to x[0], x[1]

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implemented, thanks for the idea!

scalar_type __V0 = __state.__X[2];
scalar_type __V3 = __state.__X[3];
scalar_type __K0 = __state.__K[0];
scalar_type __K1 = __state.__K[1];
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implemented as part of 94c9da5.

Comment on lines 455 to 457
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if y1, x1, y0 are not used below it can be reused (with aliases if it's better). It can affect performance due to high registers' pressure

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reduced the amount of temporary registers from 11 to 7 in this function (ca2ff49).

@ElenaTyuleneva ElenaTyuleneva changed the title [RNG] Follow-up improvements for Philox and other random number engines [RNG] Follow-up improvements for Philox RNG engine Mar 27, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR is a follow-up cleanup for the Philox RNG engine implementation, addressing review feedback from #2603 by tightening internal naming/uglification, simplifying constants unpacking, and adjusting internal kernels to reduce register pressure.

Changes:

  • Uglified Philox private members/methods and updated related friend/operator code accordingly.
  • Unified even/odd constants extraction into a single __get_consts_by() helper.
  • Refactored __philox_kernel() / __mulhilo() internals to reduce temporary variables/register pressure.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

andreyfe1
andreyfe1 previously approved these changes Apr 8, 2026
Copy link
Copy Markdown
Contributor

@andreyfe1 andreyfe1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. Make sure that tests are passed

Copy link
Copy Markdown
Contributor

@rarutyun rarutyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly, I would drop all the uglification from the PR. As I mentioned before, I would not "pollute" commits history with meaningless (from the code perspective) changes.

If uglification matters, I would introduce it only for the lines that have meaningful changes.

As an example, we had ::std namespace all over the place. Over time, the design was settled and we found out that we don't need :: before std. Instead of changing everything at once and create big "pollution" in our code base, we change namespace :: prefix only in the lines that have other meaningful changes.

@ElenaTyuleneva
Copy link
Copy Markdown
Contributor Author

Honestly, I would drop all the uglification from the PR. As I mentioned before, I would not "pollute" commits history with meaningless (from the code perspective) changes.

If uglification matters, I would introduce it only for the lines that have meaningful changes.

As an example, we had ::std namespace all over the place. Over time, the design was settled and we found out that we >don't need :: before std. Instead of changing everything at once and create big "pollution" in our code base, we change >namespace :: prefix only in the lines that have other meaningful changes.

I agree that mixing large style-only changes with functional updates makes the history harder to follow. At the same time, this part of the code doesn’t change very often, so opportunities to clean it up incrementally are relatively rare and I’m a bit concerned it may take a while to converge to a consistent style.

To keep this PR focused, I’ve moved all uglification-related changes into a separate branch and will proceed with just the functional/refactoring changes here. We can then handle the naming cleanup independently (either incrementally or all at once - let's discuss offline what works best).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants