(improvement)perf: Optimize DCAware/RackAware/TokenAware/HostFilter policies with host distance caching and overall perf. improvements (100's to 1000's of ns reduction, x1.1-2.9 improvement!) by mykaul · Pull Request #651 · scylladb/python-driver

mykaul · 2026-01-22T18:14:21Z

Refactor DCAwareRoundRobinPolicy to use a Copy-On-Write (COW) strategy for managing host distances.

Results (5 DCs × 3 racks × 3 nodes = 45 nodes, 100K queries, median of 5 iterations):

Policy                         | Kops/s | master Kops/s | Speedup
---------------------------------------------------------------
DCAware                        |    209 |            88 |   2.4x
RackAware                      |    173 |            59 |   2.9x
TokenAware(DCAware)            |     62 |            15 |   4.1x
TokenAware(RackAware)          |     60 |            14 |   4.3x
Default(DCAware)               |    132 |            73 |   1.8x
HostFilter(DCAware)            |     63 |            44 |   1.4x

Key changes:

Introduce _remote_hosts to cache REMOTE hosts, enabling O(1) distance lookups during query planning. IGNORED hosts do not need to be stored in the cache.
For LOCAL, a simple comparison of the host's DC against the local DC is enough.
Add _refresh_remote_hosts to handle node changes.
LRU cache for token-to-replicas lookup in TokenAwarePolicy (default 1024 entries, auto-invalidated on topology change).
TokenAwarePolicy skips distance re-sorting for DCAware/RackAware child policies (they already yield in distance order), with a fallback re-sort for other child policies.
TokenAwarePolicy no longer uses __slots__ to avoid breaking downstream subclasses.
LWT queries skip replica shuffling for deterministic plans.
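
The caching idea in the first three items can be sketched roughly as follows. This is a minimal illustration and not the driver's code: the class name `DCDistanceCache`, the string distance constants, the `on_up(host, dc)` signature and the plain-string hosts are all hypothetical stand-ins. Only the names `_remote_hosts` and `_refresh_remote_hosts` come from the description above.

```python
import threading

# Stand-ins for the driver's HostDistance values (assumption for this sketch).
LOCAL, REMOTE, IGNORED = "LOCAL", "REMOTE", "IGNORED"


class DCDistanceCache:
    """Hypothetical sketch of a copy-on-write cache of REMOTE hosts."""

    def __init__(self, local_dc, used_hosts_per_remote_dc=1):
        self.local_dc = local_dc
        self.used_hosts_per_remote_dc = used_hosts_per_remote_dc
        self._dc_live_hosts = {}       # dc name -> tuple of hosts
        self._remote_hosts = {}        # host -> dc name; replaced, never mutated
        self._lock = threading.Lock()  # serializes writers only

    def _refresh_remote_hosts(self):
        # Build a fresh dict and swap the reference in a single assignment, so
        # readers see either the old or the new complete snapshot.
        remote = {}
        for dc, hosts in self._dc_live_hosts.items():
            if dc != self.local_dc:
                for host in hosts[: self.used_hosts_per_remote_dc]:
                    remote[host] = dc
        self._remote_hosts = remote

    def on_up(self, host, dc):
        with self._lock:
            current = self._dc_live_hosts.get(dc, ())
            if host not in current:
                self._dc_live_hosts = {**self._dc_live_hosts, dc: current + (host,)}
            self._refresh_remote_hosts()

    def distance(self, host, dc):
        if dc == self.local_dc:         # LOCAL: a simple comparison
            return LOCAL
        if host in self._remote_hosts:  # REMOTE: one dict lookup, no lock taken
            return REMOTE
        return IGNORED                  # IGNORED hosts are never stored
```

Readers never take the lock in this sketch; they only dereference whichever dict is current, which is the property a copy-on-write scheme is meant to provide.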

This is a different attempt from #650 to add caching to host distance to make query planning faster.

Pre-review checklist

I have split my patch into logically separate commits.
All commit messages clearly explain what they change and why.
I added relevant tests for new features and bug fixes.
All commits compile, pass static checks and pass test.
PR description sums up the changes and reasons why they should be introduced.
I have provided docstrings for the public items that I want to introduce.
I have adjusted the documentation in ./docs/source/.
I added appropriate Fixes: annotations to PR description.

mykaul · 2026-01-23T16:44:59Z

This is interesting, my change has exposed this -

2026-01-23 18:26:45.488 DEBUG [libevreactor:376]: Message pushed from server: <EventMessage(event_type='STATUS_CHANGE', event_args={'change_type': 'DOWN', 'address': ('127.0.0.3', 9042)}, stream_id=-1, trace_id=None)>

2026-01-23 18:26:45.489 WARNING [libevreactor:376]: Host 127.0.0.3:9042 has been marked down                      <--- host .3 is marked as DOWN

2026-01-23 18:26:45.489 DEBUG [thread:73]: First connection created to 127.0.0.2:9042 for shard_id=0
2026-01-23 18:26:45.489 DEBUG [thread:73]: Finished initializing connection for host 127.0.0.2:9042
2026-01-23 18:26:45.489 DEBUG [thread:73]: Added pool for host 127.0.0.2:9042 to session
2026-01-23 18:26:45.489 DEBUG [thread:73]: Removed connection pool for <Host: 127.0.0.3:9042 dc1>
2026-01-23 18:26:45.490 DEBUG [thread:73]: Shutting down connections to 127.0.0.3:9042
2026-01-23 18:26:45.490 DEBUG [thread:73]: Closing connection (139753730215760) to 127.0.0.3:9042
2026-01-23 18:26:48.496 DEBUG [test_ip_change:35]: Change IP address for node3
2026-01-23 18:26:48.534 DEBUG [test_ip_change:40]: Start node3 again with ip address 127.0.0.33
2026-01-23 18:26:48.551 DEBUG [cluster:772]: node3: Starting scylla: args=['/home/ykaul/github/python-driver/tests/integration/ccm/test_ip_change/node3/bin/scylla', '--options-file', '/home/ykaul/github/python-driver/tests/integration/ccm/test_ip_change/node3/conf/scylla.yaml', '--log-to-stdout', '1', '--api-address', '127.0.0.33', '--smp', '1', '--memory', '512M', '--developer-mode', 'true', '--default-log-level', 'info', '--overprovisioned', '--prometheus-address', '127.0.0.33', '--unsafe-bypass-fsync', '1', '--kernel-page-cache', '1', '--commitlog-use-o-dsync', '0', '--max-networking-io-control-blocks', '1000'] wait_other_notice=False wait_for_binary_proto=True
2026-01-23 18:26:49.947 INFO [cluster:775]: node3: Started scylla: pid: 186960
2026-01-23 18:26:49.947 DEBUG [test_ip_change:45]: ['127.0.0.1', '127.0.0.3', '127.0.0.2']
2026-01-23 18:26:50.164 DEBUG [libevreactor:376]: Message pushed from server: <EventMessage(event_type='TOPOLOGY_CHANGE', event_args={'change_type': 'NEW_NODE', 'address': ('127.0.0.33', 9042)}, stream_id=-1, trace_id=None)>
2026-01-23 18:26:50.165 DEBUG [libevreactor:376]: Message pushed from server: <EventMessage(event_type='STATUS_CHANGE', event_args={'change_type': 'UP', 'address': ('127.0.0.33', 9042)}, stream_id=-1, trace_id=None)>
2026-01-23 18:26:50.448 DEBUG [test_ip_change:45]: ['127.0.0.1', '127.0.0.3', '127.0.0.2']
2026-01-23 18:26:50.948 DEBUG [test_ip_change:45]: ['127.0.0.1', '127.0.0.3', '127.0.0.2']
2026-01-23 18:26:51.449 DEBUG [test_ip_change:45]: ['127.0.0.1', '127.0.0.3', '127.0.0.2']
2026-01-23 18:26:51.569 DEBUG [thread:73]: [control connection] Refreshing node list and token map
2026-01-23 18:26:51.570 DEBUG [thread:73]: [control connection] Updating host ip from 127.0.0.3:9042 to 127.0.0.33:9042 for (c989a851-2dcb-4b05-8a0c-fb1658a32e21)

2026-01-23 18:26:51.570 WARNING [thread:73]: Host 127.0.0.33:9042 has been marked down            <-- due to an IP change, the host is marked as down!?!

2026-01-23 18:26:51.571 DEBUG [thread:73]: [control connection] Finished fetching ring info
2026-01-23 18:26:51.949 DEBUG [test_ip_change:45]: ['127.0.0.1', '127.0.0.33', '127.0.0.2']

Need to understand this better :-/

mykaul · 2026-01-23T16:50:37Z

            if host is None:
                host = self._cluster.metadata.get_host_by_host_id(host_id)
                if host and host.endpoint != endpoint:
                    log.debug("[control connection] Updating host ip from %s to %s for (%s)", host.endpoint, endpoint, host_id)
                    old_endpoint = host.endpoint
                    host.endpoint = endpoint
                    self._cluster.metadata.update_host(host, old_endpoint)
                    reconnector = host.get_and_set_reconnection_handler(None)
                    if reconnector:
                        reconnector.cancel()
                    self._cluster.on_down(host, is_host_addition=False, expect_host_to_be_down=True)

So first we update the host with the new endpoint, then mark it as down?

mykaul · 2026-01-23T17:39:47Z

This fixes it for me:

diff --git a/cassandra/cluster.py b/cassandra/cluster.py
index a9c1d00e..099043ea 100644
--- a/cassandra/cluster.py
+++ b/cassandra/cluster.py
@@ -3831,14 +3831,16 @@ class ControlConnection(object):
                 host = self._cluster.metadata.get_host_by_host_id(host_id)
                 if host and host.endpoint != endpoint:
                     log.debug("[control connection] Updating host ip from %s to %s for (%s)", host.endpoint, endpoint, host_id)
-                    old_endpoint = host.endpoint
-                    host.endpoint = endpoint
-                    self._cluster.metadata.update_host(host, old_endpoint)
                     reconnector = host.get_and_set_reconnection_handler(None)
                     if reconnector:
                         reconnector.cancel()
                     self._cluster.on_down(host, is_host_addition=False, expect_host_to_be_down=True)
 
+                    old_endpoint = host.endpoint
+                    host.endpoint = endpoint
+                    self._cluster.metadata.update_host(host, old_endpoint)
+                    self._cluster.on_up(host)
+
             if host is None:
                 log.debug("[control connection] Found new host to connect to: %s", endpoint)
                 host, _ = self._cluster.add_host(endpoint, datacenter=datacenter, rack=rack, signal=True, refresh_nodes=False, host_id=host_id)

which also makes sense to me.
@dkropachev - I think this fix should go in a separate issue and PR, no? (Context: start with #651 (comment). My changes here failed because a host that changed its IP was updated in the wrong order.)

mykaul · 2026-01-23T20:14:13Z

I think the CI failure is unrelated; it looks like #359.

mykaul · 2026-01-24T11:49:05Z

By using the (not amazing) benchmark from #653, I got the following results:

For master branch as a baseline:

Policy                         | Ops        | Time (s)   | Kops/s    
----------------------------------------------------------------------
DCAware                        | 100000     | 0.2309     | 433       
RackAware                      | 100000     | 0.3607     | 277       
TokenAware(DCAware)            | 100000     | 1.3262     | 75        
TokenAware(RackAware)          | 100000     | 1.4343     | 69

This branch (with just DC aware improvements):

Policy                         | Ops        | Time (s)   | Kops/s    
----------------------------------------------------------------------
DCAware                        | 100000     | 0.1280     | 781       
RackAware                      | 100000     | 0.3572     | 279       
TokenAware(DCAware)            | 100000     | 1.1620     | 86        
TokenAware(RackAware)          | 100000     | 1.4435     | 69

** 433 -> 781 Kops/sec improvement **

With improvement to rack aware (on top of master), I got:

=== Performance Benchmarks ===
Policy                         | Ops        | Time (s)   | Kops/s    
----------------------------------------------------------------------
DCAware                        | 100000     | 0.2306     | 433       
RackAware                      | 100000     | 0.3084     | 324       
TokenAware(DCAware)            | 100000     | 1.3031     | 76        
TokenAware(RackAware)          | 100000     | 1.3440     | 74

** 277 -> 324 Kops/sec improvement **

And on top of this branch:

Policy                         | Ops        | Time (s)   | Kops/s    
----------------------------------------------------------------------
DCAware                        | 100000     | 0.1283     | 779       
RackAware                      | 100000     | 0.2905     | 344       
TokenAware(DCAware)            | 100000     | 1.1454     | 87        
TokenAware(RackAware)          | 100000     | 1.3293     | 75

** 277 -> 344 Kops/sec improvement **

And finally, for #650:

Policy                         | Ops        | Time (s)   | Kops/s    
----------------------------------------------------------------------
DCAware                        | 100000     | 0.2325     | 430       
RackAware                      | 100000     | 0.3611     | 276       
TokenAware(DCAware)            | 100000     | 1.5826     | 63        
TokenAware(RackAware)          | 100000     | 1.6927     | 59

which kinda makes me suspect that branch is no good :-/

mykaul · 2026-01-24T11:57:18Z

Sent the IP-change ordering fix as a separate PR: #654

mykaul · 2026-01-24T15:27:45Z

With rack aware added (3rd commit), these are the current numbers:

Policy                         | Ops        | Time (s)   | Kops/s    
----------------------------------------------------------------------
DCAware                        | 100000     | 0.1235     | 809       
RackAware                      | 100000     | 0.2934     | 340       
TokenAware(DCAware)            | 100000     | 1.1371     | 87        
TokenAware(RackAware)          | 100000     | 1.3291     | 75

mykaul · 2026-01-24T15:43:46Z

Now that I also cache non-local-rack hosts, not just remote ones (duh!), RackAware perf. is better:

Policy                         | Ops        | Time (s)   | Kops/s    
----------------------------------------------------------------------
DCAware                        | 100000     | 0.1247     | 802       
RackAware                      | 100000     | 0.1624     | 615       
TokenAware(DCAware)            | 100000     | 1.2408     | 80        
TokenAware(RackAware)          | 100000     | 1.3087     | 76

mykaul · 2026-01-24T20:44:07Z

Added some optimization for TokenAware as well (the commit message still needs improvement).
Current results:

Policy                         | Ops        | Time (s)   | Kops/s | master | (improv from master)
------------------------------------------------------------------------------------------------
DCAware                        | 100000     | 0.1266     | 790    | 433    | (x1.8)
RackAware                      | 100000     | 0.1670     | 598    | 277    | (x2.1)
TokenAware(DCAware)            | 100000     | 0.2663     | 375    | 75     | (x5)
TokenAware(RackAware)          | 100000     | 0.3009     | 332    | 69     | (x4.8)

So reasonable improvement, at least in this micro-benchmark.

mykaul · 2026-01-25T16:51:15Z

Last push, I think I'm done:

    Policy                         | Ops        | Time (s)   | Kops/s | (master)
    ----------------------------------------------------------------------
    DCAware                        | 100000     | 0.0989     | 1010 | 433
    Default(DCAware)               | 100000     | 0.1532     | 652  | ?
    HostFilter(DCAware)            | 100000     | 0.3303     | 302  | ?
    RackAware                      | 100000     | 0.1149     | 870  | 277 
    TokenAware(DCAware)            | 100000     | 0.2112     | 473  | 75
    TokenAware(RackAware)          | 100000     | 0.2249     | 444  | 69

mykaul · 2026-03-12T20:47:21Z

Latest numbers:
Cumulative results (master -> final branch):

Policy                         | Master     | Branch     | Improvement
----------------------------------------------------------------------
DCAware                        | 106 Kops/s | 204 Kops/s | +92%
RackAware                      |  68 Kops/s | 180 Kops/s | +165%
TokenAware(DCAware)            |  18 Kops/s |  60 Kops/s | +233%
TokenAware(RackAware)          |  17 Kops/s |  57 Kops/s | +235%
Default(DCAware)               |  91 Kops/s | 132 Kops/s | +45%
HostFilter(DCAware)            |  53 Kops/s |  66 Kops/s | +25%

Copilot

Pull request overview

This PR optimizes load balancing policies (DCAwareRoundRobinPolicy, RackAwareRoundRobinPolicy, TokenAwarePolicy, HostFilterPolicy) with host distance caching and general performance improvements. The key insight is caching computed host distance data (remote hosts, non-local-rack hosts) and replica lookups to avoid repeated computation in the hot query-planning path.

Changes:

Introduce _remote_hosts (COW dict) on DCAwareRoundRobinPolicy and RackAwareRoundRobinPolicy for O(1) distance lookups, plus _non_local_rack_hosts for rack-aware iteration; both refreshed on topology changes.
Add an LRU replica cache to TokenAwarePolicy (keyed by (keyspace, routing_key), invalidated on token map changes) and restructure make_query_plan to use direct distance bucketing instead of repeated yield_in_order scans.
Add make_query_plan_with_exclusion API to LoadBalancingPolicy and all subclasses, enabling TokenAwarePolicy to skip already-yielded replicas when querying the child policy for remaining hosts.
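
The "direct distance bucketing instead of repeated yield_in_order scans" point can be illustrated with a small sketch. It is not taken from the PR: the function names, the string distance constants and the `distance` callable are hypothetical, and the three-pass version is only an approximation of what a repeated-scan approach looks like.

```python
# Stand-ins for HostDistance values (assumption for this sketch).
LOCAL_RACK, LOCAL, REMOTE, IGNORED = "LOCAL_RACK", "LOCAL", "REMOTE", "IGNORED"
ORDER = (LOCAL_RACK, LOCAL, REMOTE)


def order_by_three_passes(replicas, distance):
    """Scan the replica list once per distance level."""
    for level in ORDER:
        for replica in replicas:
            if distance(replica) == level:
                yield replica


def order_by_single_pass(replicas, distance):
    """Compute each replica's distance once and append it to its bucket."""
    buckets = {level: [] for level in ORDER}
    for replica in replicas:
        bucket = buckets.get(distance(replica))
        if bucket is not None:  # IGNORED replicas fall through
            bucket.append(replica)
    return buckets[LOCAL_RACK] + buckets[LOCAL] + buckets[REMOTE]
```

Both functions produce the same ordering; the second calls `distance` once per replica instead of up to three times.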

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File	Description
`cassandra/policies.py`	Core optimization: distance caching in DC/RackAware policies, LRU replica cache in TokenAwarePolicy, new `make_query_plan_with_exclusion` method across policies, formatting cleanup
`tests/unit/test_policies.py`	New tests for `make_query_plan_with_exclusion`, replica cache (hit/miss/eviction/invalidation/disabled), LWT determinism, tablet bypass; test mocks updated for new `token_map`-based replica resolution; formatting cleanup

Copilot

Pull request overview

This PR optimizes the load balancing policies in the ScyllaDB Python driver by introducing distance caching via Copy-On-Write (COW) strategy, an LRU cache for token-to-replica lookups in TokenAwarePolicy, and a new make_query_plan_with_exclusion API to avoid redundant iteration.

Changes:

Introduces _remote_hosts and _non_local_rack_hosts cached dictionaries/lists in DCAwareRoundRobinPolicy and RackAwareRoundRobinPolicy for O(1) distance lookups, rebuilt on topology changes.
Adds an LRU cache (OrderedDict) in TokenAwarePolicy for token-to-replica lookups, invalidated by token_map object identity change; includes LWT deterministic ordering (no shuffle).
Adds make_query_plan_with_exclusion() to LoadBalancingPolicy and its subclasses to efficiently skip already-yielded replicas in TokenAwarePolicy.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File	Description
`cassandra/policies.py`	Core performance optimizations: COW distance caching, LRU replica cache, `make_query_plan_with_exclusion` API, LWT shuffle skip, code formatting
`tests/unit/test_policies.py`	New tests for exclusion-based query planning, LRU cache behavior, LWT determinism, cache invalidation, and test infrastructure updates for new mock requirements

Add micro-benchmarks measuring query plan generation throughput for DCAwareRoundRobinPolicy, RackAwareRoundRobinPolicy, TokenAwarePolicy, DefaultLoadBalancingPolicy, and HostFilterPolicy. Uses pytest-benchmark for accurate timing and statistical reporting with a simulated 45-node cluster topology (5 DCs x 3 racks x 3 nodes) and 100,000 deterministic queries.
Also rename tests/integration/standard/column_encryption/test_policies.py to test_encrypted_policies.py to avoid module name conflicts when running the full test suite.
Run with: pytest -m benchmark tests/performance/
Benchmark results comparing master vs PR scylladb#651 optimizations (Python 3.14.3, pytest-benchmark 5.2.3, GC disabled, median):
Policy                    | master (Kops/s) | PR#651 (Kops/s) | Speedup
--------------------------|-----------------|-----------------|--------
DCAware                   |             833 |            1898 | 2.3x
RackAware                 |             542 |            1589 | 2.9x
TokenAware(DCAware)       |             135 |             572 | 4.2x
TokenAware(RackAware)     |             123 |             539 | 4.4x
Default(DCAware)          |             674 |            1257 | 1.9x
HostFilter(DCAware)       |             394 |             579 | 1.5x

mykaul · 2026-04-07T17:32:17Z

Benchmark Results — Load Balancing Policy Optimization

Setup: 5 DCs × 3 racks × 3 nodes = 45 nodes, 50K make_query_plan calls per iteration, min of 10 iterations, pinned to single CPU core (taskset -c 0), machine at 99% idle / load ~1.3.

Policy                         | master (ns/op) | PR (ns/op) | Speedup
----------------------------------------------------------------------
DCAwareRoundRobin              |          1,145 |        509 | 2.2×
RackAwareRoundRobin            |          1,720 |        591 | 2.9×
TokenAware(DCAware)            |         28,116 |     16,764 | 1.7×
TokenAware(RackAware)          |         31,626 |     17,281 | 1.8×
DefaultLoadBalancing(DCAware)  |         14,905 |     13,872 | 1.1×
HostFilterPolicy(DCAware)      |          2,019 |      1,283 | 1.6×
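
For readers who want to reproduce numbers of this shape, a "min of N iterations, ns/op" measurement could look roughly like the sketch below. The helper name `ns_per_op` and its parameters are hypothetical and are not the benchmark script used for the table. CPU pinning (`taskset -c 0`) happens outside Python and is not shown.

```python
import time


def ns_per_op(make_plan, calls=50_000, iterations=10):
    """Return the best (minimum) nanoseconds per call over several iterations.

    `make_plan` is any zero-argument callable returning an iterable query plan;
    the plan is fully consumed so that generator work is included in the timing.
    """
    best = None
    for _ in range(iterations):
        start = time.perf_counter_ns()
        for _ in range(calls):
            for _host in make_plan():
                pass
        elapsed = time.perf_counter_ns() - start
        best = elapsed if best is None else min(best, elapsed)
    return best / calls
```

Taking the minimum rather than the mean is a common way to reduce the influence of scheduler noise on a micro-benchmark.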

Key changes

Host distance caching via LRU (default 256 entries) avoids recomputing distance() on every make_query_plan call
Pre-bucketed host lists (_local_hosts, _remote_hosts, _non_local_rack_hosts) maintained on membership changes instead of computed per query
Code deduplication: make_query_plan now delegates to make_query_plan_with_exclusion(set()), eliminating duplicated generator logic

Fixes applied on top of original PR

_non_local_rack_hosts stored as tuple (not list) for thread-safe snapshot reads, matching codebase convention
Restored host.is_up filter for non-replica hosts in TokenAwarePolicy.make_query_plan (parity with master's yield_in_order)
LRU cache default reduced from 1024 → 256 entries
Deduplicated make_query_plan / make_query_plan_with_exclusion with lazy _remote_hosts reads to preserve concurrent modification visibility

All 103 unit tests pass.

Introduce _remote_hosts dict to cache REMOTE hosts, enabling O(1) distance lookups instead of scanning per-DC host lists. Replace islice(cycle(...)) with index arithmetic in make_query_plan. Call _refresh_remote_hosts() on topology changes.
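
The islice(cycle(...)) replacement mentioned above can be illustrated as follows. Both helper names are hypothetical and the snippet is only a sketch of the general technique, not the code in the commit.

```python
from itertools import cycle, islice


def rotate_with_cycle(hosts, pos):
    """Round-robin rotation using the islice(cycle(...)) idiom."""
    return list(islice(cycle(hosts), pos, pos + len(hosts)))


def rotate_with_index(hosts, pos):
    """Same rotation via modular index arithmetic, with no iterator chain."""
    n = len(hosts)
    if n == 0:
        return []
    start = pos % n
    return [hosts[(start + i) % n] for i in range(n)]
```

The indexed version avoids building two iterator objects per call and never has to skip `pos` leading items, which is where the saving in a hot path would come from.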

…dRobin, and DCAware Add a new make_query_plan_with_exclusion() method that skips hosts in an exclusion set. The base class provides a default filtering implementation; RoundRobin and DCAware override for efficiency.
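
A default filtering implementation of the kind described here might look like the sketch below. The class names and the exact signature of `make_query_plan_with_exclusion` (in particular the `excluded` parameter name) are assumptions for illustration; the PR's real signature is not shown in this thread.

```python
class BasePolicy:
    """Hypothetical stand-in for a load balancing policy base class."""

    def make_query_plan(self, working_keyspace=None, query=None):
        raise NotImplementedError

    def make_query_plan_with_exclusion(self, working_keyspace=None, query=None,
                                       excluded=()):
        # Default behavior: generate the normal plan and drop excluded hosts.
        for host in self.make_query_plan(working_keyspace, query):
            if host not in excluded:
                yield host


class FixedOrderPolicy(BasePolicy):
    """Toy child policy that always yields hosts in the same order."""

    def __init__(self, hosts):
        self._hosts = tuple(hosts)

    def make_query_plan(self, working_keyspace=None, query=None):
        return iter(self._hosts)


def token_aware_plan(replicas, child):
    """Yield replicas first, then the child's remaining hosts exactly once."""
    yield from replicas
    yield from child.make_query_plan_with_exclusion(excluded=set(replicas))
```

A subclass that already keeps pre-bucketed host lists can override the method and skip excluded hosts while iterating, instead of filtering a fully generated plan.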

Cache remote hosts and non-local-rack hosts to enable O(1) distance lookups. Replace islice(cycle(...)) with index arithmetic. Reorder on_up/on_down to update DC-level hosts before rack-level for correct cache invalidation.

Optimized exclusion-aware query plan that avoids re-computing non-local-rack and remote host lists.

- Add LRU cache (default 1024 entries) for token-to-replicas lookups, auto-invalidated on topology changes (token_map identity check).
- Sort replicas by distance (LOCAL_RACK > LOCAL > REMOTE) in a single pass instead of iterating three times.
- Skip distance re-sorting for DCAware/RackAware child policies since they already yield in distance order; fallback re-sort for others.
- LWT queries skip replica shuffling for deterministic plans.
- Use make_query_plan_with_exclusion to avoid re-yielding replicas.
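
An LRU cache with an identity-based invalidation check, as described in the first item, could be sketched like this. The class `ReplicaCache`, its `get` signature and the `compute` callback are hypothetical; the key shape `(keyspace, routing_key)` follows the review summary earlier in this thread and may differ from the final code.

```python
import threading
from collections import OrderedDict


class ReplicaCache:
    """Hypothetical LRU cache keyed by (keyspace, routing_key)."""

    def __init__(self, max_size=1024):
        self._max_size = max_size
        self._entries = OrderedDict()
        self._token_map = None         # the map object the entries were built from
        self._lock = threading.Lock()

    def get(self, token_map, keyspace, routing_key, compute):
        key = (keyspace, routing_key)
        with self._lock:
            if token_map is not self._token_map:
                # A different token map object means the topology changed.
                self._entries.clear()
                self._token_map = token_map
            elif key in self._entries:
                self._entries.move_to_end(key)  # mark as most recently used
                return self._entries[key]
        replicas = compute(keyspace, routing_key)  # computed outside the lock
        with self._lock:
            if token_map is self._token_map:
                self._entries[key] = replicas
                self._entries.move_to_end(key)
                while len(self._entries) > self._max_size:
                    self._entries.popitem(last=False)  # evict least recently used
        return replicas
```

Comparing with `is` rather than `==` keeps the invalidation check at the cost of a single pointer comparison per lookup.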

…ultLoadBalancingPolicy Both delegate to their child policy's exclusion-aware query plan while preserving their specific filtering/targeting behavior.

…erminism Add tests for make_query_plan_with_exclusion in RoundRobin, DCAware, and RackAware policies. Add cache tests (hit, miss, eviction, topology invalidation, disabled) and LWT determinism tests for TokenAwarePolicy. Update existing tests to set up token_map mocks and shuffle_replicas=False to match the new TokenAwarePolicy implementation.

…ace-aware invalidation
- Move LRU cache lookup before token_class.from_key() so cache hits skip the murmur3 hash computation and Token object allocation entirely.
- Add keyspace-aware cache invalidation: track the per-keyspace replica map object identity so ALTER KEYSPACE / replication changes are detected even when the TokenMap object itself is reused (in-place rebuild).
- Remove unused 'token' from cache entries (was never read after storage).
- Add test_cache_invalidation_on_keyspace_replication_change.
TODO: The tablet path still does two full child-policy traversals per query. Metadata.get_host_by_host_id() is O(1) and could resolve tablet replicas in O(rf) instead. Deferred to minimize behavioral change.

mykaul · 2026-04-20T05:07:37Z

Updated the branch after the latest fix/review pass.

Recent changes:

fixed TokenAwarePolicy cache key collision by including query.table
fixed cache invalidation for ALTER KEYSPACE by tracking per-keyspace replica-map object identity instead of id(...)
moved cache lookup ahead of token_class.from_key(...) so cache hits avoid the hash/token allocation path
changed RackAwareRoundRobinPolicy._non_local_rack_hosts to a tuple for consistency/immutability
fixed DefaultLoadBalancingPolicy.make_query_plan_with_exclusion() so target_host is included in the child exclusion set

Validation:

source .venv/bin/activate && pytest tests/unit/test_policies.py -> 104/104 passing
second self-review pass found no remaining medium+ correctness issues

Most recent benchmark (best of 3 runs, branch vs origin/master baseline):

DCAware: 490 ns/op vs 1167 ns/op (-58%)
RackAware: 573 ns/op vs 1767 ns/op (-68%)
TokenAware(DCAware): 27408 ns/op vs 26526 ns/op (+3%, benchmark noise; random routing keys mean the LRU cache does not help here)
TokenAware(RackAware): 27790 ns/op vs 31703 ns/op (-12%)
Default(DCAware): 13229 ns/op vs 14008 ns/op (-6%)
HostFilter(DCAware): 1303 ns/op vs 1964 ns/op (-34%)

Deferred TODO:

tablet replica resolution can still be optimized further by resolving hosts via get_host_by_host_id in O(rf) instead of traversing the child plan twice; intentionally left out of this PR

…emote hosts
- Convert used_hosts_per_remote_dc to a property in DCAwareRoundRobinPolicy and RackAwareRoundRobinPolicy so that runtime changes immediately refresh the cached _remote_hosts dict (restores origin/master behavior).
- Defer reading self._remote_hosts until the remote iteration phase in make_query_plan() and make_query_plan_with_exclusion() so topology changes during local iteration are visible (restores origin/master late-binding behavior).
- Add tests for runtime used_hosts_per_remote_dc changes and for modification-during-generation on the exclusion path.
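
The two behaviors restored by this commit (a property setter that refreshes the cache, and a late read of the cache inside the generator) can be sketched as below. The class `RemoteHostCachePolicy` and its constructor are hypothetical simplifications; only `used_hosts_per_remote_dc`, `_remote_hosts` and `_refresh_remote_hosts` are names taken from the commit messages.

```python
class RemoteHostCachePolicy:
    """Hypothetical sketch: cached remote hosts refreshed by a property setter."""

    def __init__(self, local_dc, hosts_by_dc, used_hosts_per_remote_dc=0):
        self.local_dc = local_dc
        self._hosts_by_dc = {dc: tuple(hosts) for dc, hosts in hosts_by_dc.items()}
        self._used_hosts_per_remote_dc = used_hosts_per_remote_dc
        self._refresh_remote_hosts()

    @property
    def used_hosts_per_remote_dc(self):
        return self._used_hosts_per_remote_dc

    @used_hosts_per_remote_dc.setter
    def used_hosts_per_remote_dc(self, value):
        self._used_hosts_per_remote_dc = value
        self._refresh_remote_hosts()  # runtime changes take effect immediately

    def _refresh_remote_hosts(self):
        remote = {}
        for dc, hosts in self._hosts_by_dc.items():
            if dc != self.local_dc:
                for host in hosts[: self._used_hosts_per_remote_dc]:
                    remote[host] = dc
        self._remote_hosts = remote   # swap in a new dict, never mutate in place

    def make_query_plan(self):
        for host in self._hosts_by_dc.get(self.local_dc, ()):
            yield host
        # Late binding: the cache is read only when the remote phase starts, so
        # a refresh that happened during local iteration is still observed.
        for host in self._remote_hosts:
            yield host
```

Because `_refresh_remote_hosts` replaces the dict rather than mutating it, a generator that has already started iterating an older dict is not disturbed by a concurrent refresh.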

- Add Tablets.__bool__() so 'if tablets:' is False when no tablets are registered, avoiding the get_tablet_for_key() method call + dict lookup on every cache miss in the non-tablet path.
- Restructure TokenAwarePolicy.make_query_plan() to nest the tablet handling inside 'if cluster_metadata._tablets:' guard.
- Fix benchmark to use real Tablets({}) instead of Mock (Mock is always truthy, hiding the tablet-skip optimization).
- Add zipfian workload to benchmark (500 distinct keys, Zipf a=1.2) to measure cache-hit performance alongside cache-miss.

mykaul · 2026-04-20T08:04:40Z

Latest changes (`92a0318`)

Code changes

Skip tablet lookup when no tablets exist: Added Tablets.__bool__() so if cluster_metadata._tablets: is False when no tablets are registered. The TokenAwarePolicy.make_query_plan() cache-miss path now guards the get_tablet_for_key() call behind this check, avoiding a method call + dict lookup on every query in the common non-tablet case.
Restructured tablet handling: Nested tablet replica resolution inside the if cluster_metadata._tablets: guard -- cleaner structure, no sentinel variables.
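
The truthiness guard described in these two items can be sketched as follows. The `Tablets` class here is a simplified, hypothetical stand-in (tablets are modeled as `(first_token, last_token)` tuples), and the free function `lookup` is invented for the example; only the idea of `__bool__` plus an `if tablets:` guard comes from the text above.

```python
class Tablets:
    """Simplified, hypothetical stand-in for the driver's Tablets container."""

    def __init__(self, tablets):
        # Maps (keyspace, table) -> list of (first_token, last_token) ranges.
        self._tablets = tablets

    def __bool__(self):
        # An empty container is falsy, so `if tablets:` skips the lookup path.
        return bool(self._tablets)

    def get_tablet_for_key(self, keyspace, table, token):
        for first, last in self._tablets.get((keyspace, table), ()):
            if first < token <= last:
                return (first, last)
        return None


def lookup(tablets, keyspace, table, token):
    if tablets:  # cheap truthiness check; False when nothing is registered
        return tablets.get_tablet_for_key(keyspace, table, token)
    return None  # non-tablet path: no method call, no dict lookup
```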

Benchmark fixes

Replaced Mock(spec=Tablets) with real Tablets({}) in the benchmark. Mock objects are always truthy, which was hiding the tablet-skip optimization entirely.
Added a zipfian workload (500 distinct keys, Zipf a=1.2) to measure cache-hit performance alongside the existing unique-key (cache-miss) workload.
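
Both benchmark fixes can be illustrated with a short sketch. The `EmptyContainer` class and the `zipf_keys` helper are hypothetical; in particular, sampling a bounded Zipf distribution with `random.choices` is an assumption about how such a workload could be generated, not a description of the actual benchmark code.

```python
import random
from unittest.mock import Mock


class EmptyContainer:
    """Toy class whose instances are falsy, like an empty tablet container."""

    def __bool__(self):
        return False


# A plain Mock stays truthy even when its spec defines __bool__, so a guard
# such as `if tablets:` would always take the tablet branch in a benchmark.
mock_is_truthy = bool(Mock(spec=EmptyContainer))
real_is_truthy = bool(EmptyContainer())


def zipf_keys(n_queries, n_keys=500, a=1.2, seed=0):
    """Draw routing keys from a bounded Zipf distribution over n_keys ranks."""
    rng = random.Random(seed)  # seeded so the workload is deterministic
    weights = [1.0 / (rank ** a) for rank in range(1, n_keys + 1)]
    ranks = rng.choices(range(n_keys), weights=weights, k=n_queries)
    return [b"key-%d" % rank for rank in ranks]
```

With a skewed distribution like this, a small number of hot keys account for a large share of the queries, which is the access pattern an LRU cache is expected to benefit from.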

Results (best of 3 runs, apples-to-apples with real `Tablets({})`)

Policy                         | origin/master (ns/op) | Branch (ns/op) | Speedup | Delta
-----------------------------------------------------------------------------------------
DCAware                        |                  1141 |            492 | 2.32x   | -57%
RackAware                      |                  1768 |            569 | 3.11x   | -68%
TokenAware(DCAware) miss       |                 18467 |          19508 | 0.95x   | +6%
TokenAware(DCAware) zipf       |                 19082 |          15271 | 1.25x   | -20%
TokenAware(RackAware) miss     |                 23096 |          20195 | 1.14x   | -13%
TokenAware(RackAware) zipf     |                 24972 |          16241 | 1.54x   | -35%
Default(DCAware)               |                 14836 |          13690 | 1.08x   | -8%
HostFilter(DCAware)            |                  1953 |           1301 | 1.50x   | -33%

Notes:

Speedup is computed as origin/master ns/op / branch ns/op, so values below 1.0x indicate a slowdown.
origin/master has no cache, so its "miss" and "zipf" numbers are essentially the same.
The +6% on TokenAware(DCAware) miss is the cache overhead (lock + OrderedDict) when the cache never hits. A lock-free RCU-style cache would eliminate this; deferred for now.
The zipfian workload (realistic hot-partition access pattern) shows the cache payoff: -20% DCAware, -35% RackAware.
All 118 tests pass (108 policy + 10 tablet).
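
The Speedup and Delta columns follow from the ns/op values by simple arithmetic, sketched here with two hypothetical helper functions. The Delta formula is inferred from the table values rather than stated in the notes.

```python
def speedup(master_ns, branch_ns):
    """origin/master ns/op divided by branch ns/op; below 1.0 is a slowdown."""
    return master_ns / branch_ns


def delta_percent(master_ns, branch_ns):
    """Relative change in ns/op versus origin/master; negative means faster."""
    return (branch_ns - master_ns) / master_ns * 100.0


# DCAware row: 1141 ns/op on origin/master, 492 ns/op on the branch.
dcaware_speedup = round(speedup(1141, 492), 2)     # 2.32
dcaware_delta = round(delta_percent(1141, 492))    # -57
```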

mykaul marked this pull request as draft January 22, 2026 18:14

mykaul force-pushed the query_plan_opt_2 branch from 76ee195 to edab823 Compare January 23, 2026 19:04

mykaul force-pushed the query_plan_opt_2 branch from edab823 to 1884f59 Compare January 23, 2026 20:14

mykaul changed the title ~~(improvement)Optimize DCAwareRoundRobinPolicy with host distance caching~~ (improvement)Optimize DCAware/RackAware RoundRobinPolicy with host distance caching Jan 24, 2026

mykaul force-pushed the query_plan_opt_2 branch from cc0204d to 6282e6f Compare January 24, 2026 15:43

mykaul force-pushed the query_plan_opt_2 branch from 8f96d39 to 87c6a01 Compare January 25, 2026 08:32

mykaul mentioned this pull request Jan 25, 2026

feat: Optimize TokenAwarePolicy with thread-safe distance caching #650

Closed

8 tasks

mykaul changed the title ~~(improvement)Optimize DCAware/RackAware RoundRobinPolicy with host distance caching~~ (improvement)Optimize DCAware/RackAware/TokenAware RoundRobinPolicy with host distance caching Jan 25, 2026

mykaul mentioned this pull request Jan 25, 2026

Call child.make_query_plan in TokenAwarePolicy.make_query_plan only once #358

Open

mykaul changed the title ~~(improvement)Optimize DCAware/RackAware/TokenAware RoundRobinPolicy with host distance caching~~ (improvement)Optimize DCAware/RackAware/TokenAware/HostFilter policies with host distance caching and overall perf. improvements Jan 26, 2026

mykaul marked this pull request as ready for review January 27, 2026 17:40

mykaul force-pushed the query_plan_opt_2 branch 2 times, most recently from 5f283d1 to bd6a9c5 Compare March 12, 2026 20:46

mykaul mentioned this pull request Mar 14, 2026

Tracking: General (non-vector) performance improvement PRs #747

Open

mykaul requested a review from Copilot March 14, 2026 10:58

Copilot started reviewing on behalf of mykaul March 14, 2026 10:58

Copilot AI reviewed Mar 14, 2026

Comment thread cassandra/policies.py Outdated

mykaul requested a review from Copilot March 16, 2026 17:21

Copilot started reviewing on behalf of mykaul March 16, 2026 17:21

Copilot AI reviewed Mar 16, 2026

Comment thread tests/unit/test_policies.py Outdated

This was referenced Apr 1, 2026

LWT routing: Tablet path loses natural token-ring order (Paxos leader not prioritized) #781

Open

Fix LWT routing: preserve Paxos leader order in TokenAwarePolicy #782

Draft

mykaul force-pushed the query_plan_opt_2 branch 2 times, most recently from c689f0d to 6a4878d Compare April 7, 2026 16:50

mykaul added 8 commits April 19, 2026 15:51

feat: add make_query_plan_with_exclusion to RackAwareRoundRobinPolicy

dcafd63

feat: add make_query_plan_with_exclusion to HostFilterPolicy and Defa…

04f541a

mykaul force-pushed the query_plan_opt_2 branch from 6a4878d to ee04aa9 Compare April 20, 2026 05:06

mykaul added 2 commits April 20, 2026 09:07

mykaul mentioned this pull request Apr 20, 2026

Flaky test: test_times_from_uuid1 fails on Windows CI due to clock drift #826

Open
