fix: prevent rate limiter bypass via Lua atomic script #593
YashIIT0909 wants to merge 7 commits into cameri:main
Conversation
🦋 Changeset detected — Latest commit: 4ba022b. The changes in this PR will be included in the next version bump. This PR includes changesets to release 1 package.
Pull request overview
This PR hardens the SlidingWindowRateLimiter against concurrent-burst bypasses by moving its Redis sliding-window logic into an atomic Lua script (and switching expiry to millisecond precision), addressing the race/collision behavior described in #467.
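The concurrent-burst bypass being fixed can be illustrated with a minimal, self-contained sketch (no Redis involved; names and numbers are illustrative, not the project's code): every concurrent caller reads the counter before any caller has written, so a burst sails past the limit.

```typescript
// Minimal illustration of the read-then-write race: all concurrent callers
// observe the same pre-write count, so the limit check passes for every one.
let count = 0

async function racyHit(limit: number): Promise<boolean> {
  const current = count      // read happens before any concurrent write lands
  await Promise.resolve()    // yield, standing in for a Redis round-trip
  if (current >= limit) return true  // true = rate limited
  count += 1                 // write happens only after the check
  return false
}

async function demo(): Promise<void> {
  const results = await Promise.all(Array.from({ length: 100 }, () => racyHit(5)))
  const accepted = results.filter((limited) => !limited).length
  console.log(`accepted ${accepted} of 100 with a limit of 5`) // prints 100, not 5
}

demo()
```

Moving the whole read-check-write sequence into a single Lua `EVAL` removes this interleaving, because Redis executes a script atomically with respect to other commands.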
Changes:
- Replaces multi-call Redis logic in `SlidingWindowRateLimiter.hit()` with a single Redis-evaluated Lua script (atomic ZSET maintenance + unique member generation).
- Adds a raw Lua eval helper to the Redis cache adapter and updates the cache adapter typing/tests to stub `eval`.
- Adds a changeset documenting the patch release.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| `src/utils/sliding-window-rate-limiter.ts` | Migrates sliding-window operations into an atomic Lua script and uses `PEXPIRE` for ms-accurate expiry. |
| `src/adapters/redis-adapter.ts` | Adds `evalRaw` alongside the existing SHA-cached `eval` implementation. |
| `src/@types/adapters.ts` | Updates `ICacheAdapter` typing context around `eval` (formatting/spacing). |
| `test/unit/utils/sliding-window-rate-limiter.spec.ts` | Updates unit tests to stub `cache.eval` and assert boolean outcomes. |
| `.changeset/metal-snails-prove.md` | Patch changeset for the rate-limiter bypass fix. |
```ts
it('returns true if rate limited', async () => {
  const now = Date.now()
  getRangeFromSortedSetStub.resolves([`${now}:6`, `${now}:4`, `${now}:1`])
  evalStub.resolves(1)

  const actualResult = await rateLimiter.hit('key', 1, { period: 60000, rate: 10 })

  expect(actualResult).to.be.true
})

it('returns false if not rate limited', async () => {
  const now = Date.now()
  getRangeFromSortedSetStub.resolves([`${now}:10`])
  evalStub.resolves(0)

  const actualResult = await rateLimiter.hit('key', 1, { period: 60000, rate: 10 })
```
These tests now only assert the boolean mapping of the Redis result. Given the security impact of this change, it would be better to also assert that cache.eval is invoked with the expected keys/args (period/step/rate) and add a case where the stub resolves to a string (e.g. '1') to ensure the production code handles Redis return types robustly.
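The stronger assertions the review asks for could look roughly like this. This is an illustrative sketch only — the repo's tests use sinon/chai stubs, and the limiter shape below is a stand-in with assumed names, not the actual implementation.

```typescript
// Sketch: assert that hit() forwards the expected key/args to cache.eval, and
// that a string reply of '1' from Redis is still treated as rate-limited.
type EvalFn = (script: string, keys: string[], args: string[]) => Promise<unknown>

// Stand-in for SlidingWindowRateLimiter wired to a stubbed cache adapter.
function makeLimiter(evalImpl: EvalFn) {
  return {
    async hit(key: string, step: number, opts: { period: number; rate: number }): Promise<boolean> {
      const result = await evalImpl(
        'local key = KEYS[1] --[[ ... ]]',
        [key],
        [String(Date.now()), String(step), String(opts.period), String(opts.rate)],
      )
      return Number(result) === 1 // robust to both number and string replies
    },
  }
}

async function check(): Promise<void> {
  const seen: { keys: string[]; args: string[] }[] = []
  const limiter = makeLimiter(async (_script, keys, args) => {
    seen.push({ keys, args })
    return '1' // some Redis clients reply with a string, not a number
  })

  const limited = await limiter.hit('pubkey', 2, { period: 60000, rate: 10 })
  console.assert(limited === true, 'string "1" should map to rate-limited')
  console.assert(seen[0].keys[0] === 'pubkey', 'key should be forwarded as KEYS[1]')
  console.assert(seen[0].args[1] === '2', 'step should be forwarded in ARGV')
}

check()
```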
```ts
public async evalRaw(script: string, keys: string[], args: string[]): Promise<unknown> {
  await this.connection
  logger('eval script with keys %o and args %o', keys, args)
  return this.client.eval(script, { keys, arguments: args })
}
```
evalRaw is introduced as a new public method but is not referenced anywhere in the repo and isn’t part of ICacheAdapter, so it can’t be used from most call sites (which type the adapter as the interface). Either remove it to avoid dead API surface, or add it to ICacheAdapter and update call sites to use it intentionally.
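The reviewer's second option could look roughly like this. The interface shape below is assumed from the PR's description, not copied from `src/@types/adapters.ts`, and the fake adapter is purely illustrative.

```typescript
// Hypothetical sketch: surface evalRaw on the cache adapter interface so call
// sites typed as ICacheAdapter can reach it.
interface ICacheAdapter {
  eval(script: string, keys: string[], args: string[]): Promise<unknown>
  evalRaw(script: string, keys: string[], args: string[]): Promise<unknown>
}

// A trivial in-memory implementation showing both members are satisfied.
class FakeCacheAdapter implements ICacheAdapter {
  async eval(_script: string, _keys: string[], _args: string[]): Promise<unknown> {
    return 0
  }
  async evalRaw(_script: string, _keys: string[], _args: string[]): Promise<unknown> {
    return 0
  }
}

const adapter: ICacheAdapter = new FakeCacheAdapter()
adapter.evalRaw('return 0', [], []).then((reply) => console.log(reply)) // prints 0
```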
```lua
end
end

if hits >= max_rate then
```
Rate-limit decision ignores the incoming step value. Right now the script blocks only when hits >= max_rate, but if step can be > 1, a request could be accepted even when hits + step would exceed the limit. Update the Lua condition to account for the pending increment (and keep parity with the previous hitsAfterAdd > rate behavior).
Suggested change:
```diff
-if hits >= max_rate then
+if hits + step > max_rate then
```
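The suggested condition can be restated in TypeScript terms (a sketch of the reviewer's point, with assumed names):

```typescript
// Admit a request only if the pending increment still fits under the limit.
// This matches the previous hitsAfterAdd > rate behavior, since
// hitsAfterAdd === hits + step.
function isRateLimited(hits: number, step: number, maxRate: number): boolean {
  return hits + step > maxRate
}

console.log(isRateLimited(9, 1, 10)) // false: 10 hits total is still allowed
console.log(isRateLimited(9, 2, 10)) // true: admitting step 2 would exceed 10
```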
```lua
local step_str = string.match(entries[i], "^[^:]+:(%d+)")
if step_str then
    hits = hits + tonumber(step_str)
```
The Lua parsing only matches integer steps (%d+). Since step is typed as number, non-integer step values (e.g. 0.5) would be miscounted or skipped entirely, which can undercount hits and weaken enforcement. Consider extracting the full step token between the first and second : and passing it through tonumber, or otherwise enforce/validate that step is always an integer before calling the script.
Suggested change:
```diff
-local step_str = string.match(entries[i], "^[^:]+:(%d+)")
-if step_str then
-    hits = hits + tonumber(step_str)
+local step_str = string.match(entries[i], "^[^:]+:([^:]+)")
+if step_str then
+    local entry_step = tonumber(step_str)
+    if entry_step then
+        hits = hits + entry_step
+    end
```
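The "enforce/validate" alternative could be a small guard on the TypeScript side before the script is invoked (a hypothetical helper, not part of the PR):

```typescript
// Reject non-integer steps before they reach the Lua script, where the
// "%d+" pattern would silently miscount them.
function assertIntegerStep(step: number): void {
  if (!Number.isInteger(step) || step < 1) {
    throw new RangeError(`step must be a positive integer, got ${step}`)
  }
}

assertIntegerStep(3) // ok
try {
  assertIntegerStep(0.5)
} catch (err) {
  console.log((err as Error).message) // prints: step must be a positive integer, got 0.5
}
```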
```ts
const script = `
local key = KEYS[1]
local timestamp = tonumber(ARGV[1])
```
`script` is constructed inside `hit()` on every call. Since this is a hot path (rate limiting), consider hoisting the Lua script string to a module-level const so it is allocated once; that also makes the caching-by-script-string intent clearer.
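Hoisting could look roughly like this (class and argument names are assumed for illustration; the Lua body is elided):

```typescript
// Module-level constant: the script string is built once at load time,
// not re-allocated inside every hit() call on the hot path.
const SLIDING_WINDOW_SCRIPT = `
local key = KEYS[1]
local timestamp = tonumber(ARGV[1])
-- ... rest of the sliding-window logic ...
`

class SlidingWindowRateLimiter {
  constructor(
    private readonly cache: {
      eval(script: string, keys: string[], args: string[]): Promise<unknown>
    },
  ) {}

  async hit(key: string, step: number, options: { period: number; rate: number }): Promise<boolean> {
    const reply = await this.cache.eval(SLIDING_WINDOW_SCRIPT, [key], [
      String(Date.now()),
      String(step),
      String(options.period),
      String(options.rate),
    ])
    return Number(reply) === 1 // 1 from the script means rate limited
  }
}
```

A stable, identical string on every call also helps a SHA-caching eval wrapper: hashing the same script text consistently hits Redis's script cache instead of re-sending the source.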
Description
This PR resolves a critical rate-limiter bypass vulnerability in the `SlidingWindowRateLimiter` by migrating the sliding-window operations from Node.js `Promise.all` logic into an atomic Lua script executed natively on the Redis client. It accomplishes three crucial fixes:
- Executes the check-and-increment sequence as a single atomic script, so concurrent requests can no longer race past the limit check.
- Generates unique ZSET members with a `while redis.call('ZSCORE', key, member)` loop, entirely avoiding data loss for exact-millisecond bursts without relying on heavy external UUID libraries for uniqueness.
- Switches key expiry from the seconds-based `EXPIRE` command to the precisely accurate milliseconds-based `PEXPIRE` command.
Related Issue
Fixes #467
Motivation and Context
Previously, when hundreds of WebSocket connections or REST payloads arrived in the exact same millisecond, Redis answered `0` to the initial grouped read checks before any write actually completed, due to Node's asynchronous event loop. This allowed massive concurrent bursts to completely bypass the configured limits. Furthermore, because concurrent writes were formatted strictly as `timestamp:step`, they overwrote each other as exact-duplicate members in the ZSET, effectively erasing accurate metrics from the cache. The atomic eval wrapper serializes the check-and-write inside Redis, closing the race-condition exploit.
How Has This Been Tested?
- All unit tests pass (`npm run test:unit`).
- Configured a `5 / minute` threshold locally via Docker Compose.
- Fired `1,000` concurrent payloads inside the exact same millisecond using a parallel WebSocket `Promise.all()` PoC script.
- Observed `995` requests returning `["NOTICE", "rate limited"]` while exactly `5` were accepted, demonstrating robustness against intentional bypass bursts.
Screenshots (if appropriate):
The screenshot shows the checking script being run against a limit of 5.
Types of changes
Checklist: