(improvement)perf: Optimize DCAware/RackAware/TokenAware/HostFilter policies with host distance caching and overall perf. improvements (100's to 1000's of ns reduction, x1.1-2.9 improvement!)#651
Conversation
|
This is interesting, my change has exposed this - Need to understand this better :-/ |
if host is None:
host = self._cluster.metadata.get_host_by_host_id(host_id)
if host and host.endpoint != endpoint:
log.debug("[control connection] Updating host ip from %s to %s for (%s)", host.endpoint, endpoint, host_id)
old_endpoint = host.endpoint
host.endpoint = endpoint
self._cluster.metadata.update_host(host, old_endpoint)
reconnector = host.get_and_set_reconnection_handler(None)
if reconnector:
reconnector.cancel()
self._cluster.on_down(host, is_host_addition=False, expect_host_to_be_down=True)So first we update the host with the new endpoint, then mark it as down? |
|
This fixes it for me: which also makes sense to me. |
76ee195 to
edab823
Compare
|
I think CI failure is unrelated and is #359 |
edab823 to
1884f59
Compare
|
By using the (not amazing) benchmark from #653 , I got the following results: For master branch as a baseline: This branch (with just DC aware improvements): ** 433 -> 781 Kops/sec improvement ** With improvement to rack aware (on top of master), I got: ** 277 -> 324 Kops/sec improvement ** And on top of this branch: ** 277 -> 344 Kops/sec improvement ** And finally, for #650 : which kinda makes me suspect that branch is no good :-/ |
Sent separate PR - #654 |
|
With rack aware added (3rd commit), these are the current numbers: |
cc0204d to
6282e6f
Compare
Now that I also cache non-local hosts, not just remote (duh!), perf. is better: |
|
Added for TokenAware as well some optimization (need to improve commit message). So reasonable improvement, at least in this micro-benchmark. |
8f96d39 to
87c6a01
Compare
|
Last push, I think I'm done: |
5f283d1 to
bd6a9c5
Compare
|
Latest numbers: |
There was a problem hiding this comment.
Pull request overview
This PR optimizes load balancing policies (DCAwareRoundRobinPolicy, RackAwareRoundRobinPolicy, TokenAwarePolicy, HostFilterPolicy) with host distance caching and general performance improvements. The key insight is caching computed host distance data (remote hosts, non-local-rack hosts) and replica lookups to avoid repeated computation in the hot query-planning path.
Changes:
- Introduce
_remote_hosts(COW dict) onDCAwareRoundRobinPolicyandRackAwareRoundRobinPolicyfor O(1) distance lookups, plus_non_local_rack_hostsfor rack-aware iteration; both refreshed on topology changes. - Add an LRU replica cache to
TokenAwarePolicy(keyed by(keyspace, routing_key), invalidated on token map changes) and restructuremake_query_planto use direct distance bucketing instead of repeatedyield_in_orderscans. - Add
make_query_plan_with_exclusionAPI toLoadBalancingPolicyand all subclasses, enablingTokenAwarePolicyto skip already-yielded replicas when querying the child policy for remaining hosts.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
cassandra/policies.py |
Core optimization: distance caching in DC/RackAware policies, LRU replica cache in TokenAwarePolicy, new make_query_plan_with_exclusion method across policies, formatting cleanup |
tests/unit/test_policies.py |
New tests for make_query_plan_with_exclusion, replica cache (hit/miss/eviction/invalidation/disabled), LWT determinism, tablet bypass; test mocks updated for new token_map-based replica resolution; formatting cleanup |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
This PR optimizes the load balancing policies in the ScyllaDB Python driver by introducing distance caching via Copy-On-Write (COW) strategy, an LRU cache for token-to-replica lookups in TokenAwarePolicy, and a new make_query_plan_with_exclusion API to avoid redundant iteration.
Changes:
- Introduces
_remote_hostsand_non_local_rack_hostscached dictionaries/lists inDCAwareRoundRobinPolicyandRackAwareRoundRobinPolicyfor O(1) distance lookups, rebuilt on topology changes. - Adds an LRU cache (
OrderedDict) inTokenAwarePolicyfor token-to-replica lookups, invalidated by token_map object identity change; includes LWT deterministic ordering (no shuffle). - Adds
make_query_plan_with_exclusion()toLoadBalancingPolicyand its subclasses to efficiently skip already-yielded replicas inTokenAwarePolicy.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
cassandra/policies.py |
Core performance optimizations: COW distance caching, LRU replica cache, make_query_plan_with_exclusion API, LWT shuffle skip, code formatting |
tests/unit/test_policies.py |
New tests for exclusion-based query planning, LRU cache behavior, LWT determinism, cache invalidation, and test infrastructure updates for new mock requirements |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Add micro-benchmarks measuring query plan generation throughput for DCAwareRoundRobinPolicy, RackAwareRoundRobinPolicy, TokenAwarePolicy, DefaultLoadBalancingPolicy, and HostFilterPolicy. Uses pytest-benchmark for accurate timing and statistical reporting with a simulated 45-node cluster topology (5 DCs x 3 racks x 3 nodes) and 100,000 deterministic queries. Also rename tests/integration/standard/column_encryption/test_policies.py to test_encrypted_policies.py to avoid module name conflicts when running the full test suite. Run with: pytest -m benchmark tests/performance/ Benchmark results comparing master vs PR scylladb#651 optimizations (Python 3.14.3, pytest-benchmark 5.2.3, GC disabled, median): Policy | master (Kops/s) | PR#651 (Kops/s) | Speedup --------------------------|-----------------|-----------------|-------- DCAware | 833 | 1898 | 2.3x RackAware | 542 | 1589 | 2.9x TokenAware(DCAware) | 135 | 572 | 4.2x TokenAware(RackAware) | 123 | 539 | 4.4x Default(DCAware) | 674 | 1257 | 1.9x HostFilter(DCAware) | 394 | 579 | 1.5x
c689f0d to
6a4878d
Compare
Benchmark Results — Load Balancing Policy OptimizationSetup: 5 DCs × 3 racks × 3 nodes = 45 nodes, 50K
Key changes
Fixes applied on top of original PR
All 103 unit tests pass. |
Add micro-benchmarks measuring query plan generation throughput for DCAwareRoundRobinPolicy, RackAwareRoundRobinPolicy, TokenAwarePolicy, DefaultLoadBalancingPolicy, and HostFilterPolicy. Uses pytest-benchmark for accurate timing and statistical reporting with a simulated 45-node cluster topology (5 DCs x 3 racks x 3 nodes) and 100,000 deterministic queries. Also rename tests/integration/standard/column_encryption/test_policies.py to test_encrypted_policies.py to avoid module name conflicts when running the full test suite. Run with: pytest -m benchmark tests/performance/ Benchmark results comparing master vs PR scylladb#651 optimizations (Python 3.14.3, pytest-benchmark 5.2.3, GC disabled, median): Policy | master (Kops/s) | PR#651 (Kops/s) | Speedup --------------------------|-----------------|-----------------|-------- DCAware | 833 | 1898 | 2.3x RackAware | 542 | 1589 | 2.9x TokenAware(DCAware) | 135 | 572 | 4.2x TokenAware(RackAware) | 123 | 539 | 4.4x Default(DCAware) | 674 | 1257 | 1.9x HostFilter(DCAware) | 394 | 579 | 1.5x
Introduce _remote_hosts dict to cache REMOTE hosts, enabling O(1) distance lookups instead of scanning per-DC host lists. Replace islice(cycle(...)) with index arithmetic in make_query_plan. Call _refresh_remote_hosts() on topology changes.
…dRobin, and DCAware Add a new make_query_plan_with_exclusion() method that skips hosts in an exclusion set. The base class provides a default filtering implementation; RoundRobin and DCAware override for efficiency.
Cache remote hosts and non-local-rack hosts to enable O(1) distance lookups. Replace islice(cycle(...)) with index arithmetic. Reorder on_up/on_down to update DC-level hosts before rack-level for correct cache invalidation.
Optimized exclusion-aware query plan that avoids re-computing non-local-rack and remote host lists.
- Add LRU cache (default 1024 entries) for token-to-replicas lookups, auto-invalidated on topology changes (token_map identity check). - Sort replicas by distance (LOCAL_RACK > LOCAL > REMOTE) in a single pass instead of iterating three times. - Skip distance re-sorting for DCAware/RackAware child policies since they already yield in distance order; fallback re-sort for others. - LWT queries skip replica shuffling for deterministic plans. - Use make_query_plan_with_exclusion to avoid re-yielding replicas.
…ultLoadBalancingPolicy Both delegate to their child policy's exclusion-aware query plan while preserving their specific filtering/targeting behavior.
…erminism Add tests for make_query_plan_with_exclusion in RoundRobin, DCAware, and RackAware policies. Add cache tests (hit, miss, eviction, topology invalidation, disabled) and LWT determinism tests for TokenAwarePolicy. Update existing tests to set up token_map mocks and shuffle_replicas=False to match the new TokenAwarePolicy implementation.
…ace-aware invalidation - Move LRU cache lookup before token_class.from_key() so cache hits skip the murmur3 hash computation and Token object allocation entirely. - Add keyspace-aware cache invalidation: track the per-keyspace replica map object identity so ALTER KEYSPACE / replication changes are detected even when the TokenMap object itself is reused (in-place rebuild). - Remove unused 'token' from cache entries (was never read after storage). - Add test_cache_invalidation_on_keyspace_replication_change. TODO: The tablet path still does two full child-policy traversals per query. Metadata.get_host_by_host_id() is O(1) and could resolve tablet replicas in O(rf) instead. Deferred to minimize behavioral change.
6a4878d to
ee04aa9
Compare
|
Updated the branch after the latest fix/review pass. Recent changes:
Validation:
Most recent benchmark (best of 3 runs, branch vs origin/master baseline):
Deferred TODO:
|
…emote hosts - Convert used_hosts_per_remote_dc to a property in DCAwareRoundRobinPolicy and RackAwareRoundRobinPolicy so that runtime changes immediately refresh the cached _remote_hosts dict (restores origin/master behavior). - Defer reading self._remote_hosts until the remote iteration phase in make_query_plan() and make_query_plan_with_exclusion() so topology changes during local iteration are visible (restores origin/master late-binding behavior). - Add tests for runtime used_hosts_per_remote_dc changes and for modification-during-generation on the exclusion path.
- Add Tablets.__bool__() so 'if tablets:' is False when no tablets are
registered, avoiding the get_tablet_for_key() method call + dict
lookup on every cache miss in the non-tablet path.
- Restructure TokenAwarePolicy.make_query_plan() to nest the tablet
handling inside 'if cluster_metadata._tablets:' guard.
- Fix benchmark to use real Tablets({}) instead of Mock (Mock is always
truthy, hiding the tablet-skip optimization).
- Add zipfian workload to benchmark (500 distinct keys, Zipf a=1.2)
to measure cache-hit performance alongside cache-miss.
Latest changes (92a0318)Code changes
Benchmark fixes
Results (best of 3 runs, apples-to-apples with real
|
| Policy | origin/master (ns/op) | Branch (ns/op) | Speedup | Delta |
|---|---|---|---|---|
| DCAware | 1141 | 492 | 2.32x | -57% |
| RackAware | 1768 | 569 | 3.11x | -68% |
| TokenAware(DCAware) miss | 18467 | 19508 | 0.95x | +6% |
| TokenAware(DCAware) zipf | 19082 | 15271 | 1.25x | -20% |
| TokenAware(RackAware) miss | 23096 | 20195 | 1.14x | -13% |
| TokenAware(RackAware) zipf | 24972 | 16241 | 1.54x | -35% |
| Default(DCAware) | 14836 | 13690 | 1.08x | -8% |
| HostFilter(DCAware) | 1953 | 1301 | 1.50x | -33% |
Notes:
- Speedup is computed as
origin/master ns/op / branch ns/op, so values below1.0xindicate a slowdown. - origin/master has no cache, so its "miss" and "zipf" numbers are essentially the same.
- The +6% on TokenAware(DCAware) miss is the cache overhead (lock + OrderedDict) when the cache never hits. A lock-free RCU-style cache would eliminate this; deferred for now.
- The zipfian workload (realistic hot-partition access pattern) shows the cache payoff: -20% DCAware, -35% RackAware.
- All 118 tests pass (108 policy + 10 tablet).
Refactor
DCAwareRoundRobinPolicyto use a Copy-On-Write (COW) strategy for managing host distances.Results (5 DCs × 3 racks × 3 nodes = 45 nodes, 100K queries, median of 5 iterations):
Key changes:
_remote_hoststo cacheREMOTEhosts, enabling O(1) distance lookups during query planning for distance.IGNOREDhosts do not need to be stored in the cache.For 'LOCAL' we do a simple comparison.
_refresh_remote_hoststo handle node changes.TokenAwarePolicy(default 1024 entries, auto-invalidated on topology change).TokenAwarePolicyskips distance re-sorting for DCAware/RackAware child policies (they already yield in distance order), with a fallback re-sort for other child policies.TokenAwarePolicyno longer uses__slots__to avoid breaking downstream subclasses.This is a different attempt from #650 to add caching to host distance to make query planning faster.
Pre-review checklist
./docs/source/.Fixes:annotations to PR description.