Cursor::populate_key #725

Merged
frankmcsherry merged 3 commits into TimelyDataflow:master-next from frankmcsherry:cursor_populate
Apr 26, 2026
Conversation

@frankmcsherry
Member

This PR starts to investigate a "bulk" data load, which I think is probably the right direction: a C: Cursor populates an EditList behind its implementation abstraction, and users then interact with the edit list without returning to the cursor's iterators. This is meant to let cursors be more thoughtful about data loading, using "internal iteration" idioms rather than exposing their iterators outwards and relying on callers to drive them.

At the moment, this gives a modest reduction in binary size, just from removing the closures that were passed around. The intent is that, more fully developed, it would let the stack of cursors move larger collections of updates around rather than bouncing in and out of cursor navigation. Eventually there would be Cursor::populate_keys, in the plural, and general bulk loading for supplied sets of keys. For the moment, this is either mergeable as is, or we can wait for a bit of EditList evolution that should be coming down the pipe (trait simplification, but also a columnar backbone).
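
A minimal sketch of the external versus internal iteration contrast described above, assuming a toy cursor over a sorted Vec. `ToyCursor`, `ToyEditList`, `iter_key`, and `sample_cursor` are hypothetical names for illustration, and the `populate_key` signature here is an assumption, not the one this PR adds.

```rust
// Toy model only: hypothetical types, not differential-dataflow's actual API.

/// The (value, time, diff) edits for one key, loaded in bulk.
#[derive(Debug, Default, PartialEq)]
struct ToyEditList {
    edits: Vec<(&'static str, u64, i64)>,
}

/// Stand-in for a batch: (key, value, time, diff) updates sorted by key.
struct ToyCursor {
    updates: Vec<(u32, &'static str, u64, i64)>,
}

impl ToyCursor {
    /// External iteration: expose an iterator and let the caller drive it.
    fn iter_key(&self, key: u32) -> impl Iterator<Item = &(u32, &'static str, u64, i64)> + '_ {
        self.updates.iter().filter(move |u| u.0 == key)
    }

    /// Internal iteration: the cursor walks its own storage and fills `target`.
    fn populate_key(&self, key: u32, target: &mut ToyEditList) {
        target.edits.clear();
        // Updates are sorted by key, so a binary search finds the start of the run.
        let start = self.updates.partition_point(|u| u.0 < key);
        for &(k, v, t, d) in &self.updates[start..] {
            if k != key {
                break;
            }
            target.edits.push((v, t, d));
        }
    }
}

/// Small sample input (assumed data, for illustration only).
fn sample_cursor() -> ToyCursor {
    ToyCursor {
        updates: vec![(1, "a", 0, 1), (2, "b", 0, 1), (2, "c", 1, -1), (3, "d", 0, 1)],
    }
}
```

With this sample, `sample_cursor().populate_key(2, &mut list)` leaves the two edits for key 2 in `list`, and the caller never touches an iterator. How the cursor finds and copies them stays its own business.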

@frankmcsherry frankmcsherry marked this pull request as ready for review April 26, 2026 00:09
@frankmcsherry frankmcsherry merged commit 5062a54 into TimelyDataflow:master-next Apr 26, 2026
6 checks passed
@frankmcsherry frankmcsherry deleted the cursor_populate branch April 26, 2026 00:21
frankmcsherry added a commit that referenced this pull request Apr 29, 2026
* Restore pre-#725 spines.rs and inline EditList::load

Brings back the spines arrangement bake-off (deleted in #724 Spring
cleaning, then RHH-dependent) with three modes: `key` (OrdKeySpine),
`val` (OrdValSpine with Val=()), and `col` (columnar ValSpine via the
columnar module added in #730). All three feed the same Vec-shaped
input collections through one driver loop; `col` repacks via a small
in-dataflow `unary` (`ToRecorded`) that builds `RecordedUpdates`
containers before `arrange_core`.

Bisecting against the example exposed a regression introduced in #725:
EditList::load now delegates to populate_key, which does a seek_key, a
check, and a vals rewind on every call. In the merge-join inner loop (join.rs
Ordering::Equal arm), the cursor is already positioned by the upstream
`match trace_key.cmp(&batch_key)` work, so the seek is redundant.
Repeated 1M times in the spines query phase, this added ~3s (+40%
queries time vs pre-#725 baseline).

Restoring EditList::load to its pre-#725 division of labor — assume
the cursor is positioned, walk vals inline — recovers performance.
populate_key and replay_key keep the seek for callers that legitimately
need it (reduce, ValueHistory). The Option-based meet API from #725
stays.
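
A rough sketch of why the seek is redundant in a merge join, using a toy cursor that counts its seeks. All names here (`CountingCursor`, `load_positioned`, `merge_join`, `sample_trace`) are hypothetical, and the toy `populate_key` only imitates the seek-then-load shape described above; it is not the code in join.rs.

```rust
use std::cmp::Ordering;

// Toy model only: hypothetical names, not the differential-dataflow API.
struct CountingCursor {
    keys: Vec<u32>,      // sorted, distinct keys
    vals: Vec<Vec<i64>>, // vals[i] belongs to keys[i]
    pos: usize,          // index of the current key
    seeks: usize,        // number of seek_key calls so far
}

impl CountingCursor {
    fn key(&self) -> Option<u32> {
        self.keys.get(self.pos).copied()
    }

    fn step_key(&mut self) {
        self.pos += 1;
    }

    /// Forward-only seek to the first key >= `key`.
    fn seek_key(&mut self, key: u32) {
        self.seeks += 1;
        while self.pos < self.keys.len() && self.keys[self.pos] < key {
            self.pos += 1;
        }
    }

    /// Seek, check the key, then load: safe for callers that are not positioned.
    fn populate_key(&mut self, key: u32, out: &mut Vec<i64>) {
        self.seek_key(key);
        out.clear();
        if self.key() == Some(key) {
            out.extend_from_slice(&self.vals[self.pos]);
        }
    }

    /// Assume the cursor is already on the right key and just walk its vals.
    fn load_positioned(&self, out: &mut Vec<i64>) {
        out.clear();
        if let Some(vals) = self.vals.get(self.pos) {
            out.extend_from_slice(vals);
        }
    }
}

/// Merge-join driver: returns (values matched, seeks performed).
fn merge_join(trace: &mut CountingCursor, batch_keys: &[u32], seek_on_load: bool) -> (usize, usize) {
    let mut out = Vec::new();
    let (mut matched, mut i) = (0, 0);
    while i < batch_keys.len() {
        let trace_key = match trace.key() {
            Some(k) => k,
            None => break,
        };
        match trace_key.cmp(&batch_keys[i]) {
            Ordering::Less => trace.step_key(),
            Ordering::Greater => i += 1,
            Ordering::Equal => {
                // The comparison above already positioned the cursor on this key.
                if seek_on_load {
                    trace.populate_key(trace_key, &mut out);
                } else {
                    trace.load_positioned(&mut out);
                }
                matched += out.len();
                trace.step_key();
                i += 1;
            }
        }
    }
    (matched, trace.seeks)
}

/// Small sample trace (assumed data, for illustration only).
fn sample_trace() -> CountingCursor {
    CountingCursor {
        keys: vec![1, 2, 3, 5],
        vals: vec![vec![10], vec![20, 21], vec![30], vec![50]],
        pos: 0,
        seeks: 0,
    }
}
```

Both modes match the same values; the only difference is one extra seek per matching key when `seek_on_load` is true. The toy seek is cheap, so this shows the shape of the redundancy and says nothing about the size of the real cost.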

Measurements (1M keys, 1000 size, key mode):
- v0.23.0 baseline: 6.56s queries
- pre-#725 (f4e7550): 7.16s queries
- master HEAD before this commit: 10.12s queries
- this commit: 7.00s queries
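
The "~3s (+40%)" figure can be checked against the measurements above: 10.12s - 7.16s = 2.96s, which is about 41% over the pre-#725 run. A small sketch of that arithmetic follows; the helper `percent_change` and the constant names are made up for illustration.

```rust
/// Percentage change from `before` to `after`; positive means slower.
fn percent_change(before: f64, after: f64) -> f64 {
    (after - before) / before * 100.0
}

// Query times in seconds, copied from the measurements above.
const V0_23_0: f64 = 6.56;
const PRE_725: f64 = 7.16;
const HEAD_BEFORE: f64 = 10.12;
const THIS_COMMIT: f64 = 7.00;

/// Slowdown of master HEAD before this commit relative to pre-#725, in percent.
fn regression_percent() -> f64 {
    percent_change(PRE_725, HEAD_BEFORE)
}

/// Remaining gap between this commit and the v0.23.0 baseline, in percent.
fn gap_to_baseline_percent() -> f64 {
    percent_change(V0_23_0, THIS_COMMIT)
}
```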

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Tighten up spines examples

* Extract common target columnar size

* TrieChunker work

* De-penalize col in spines.rs

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>