Datadome server-side validation#654
Datadome server-side validation#654ChristianPavilonis wants to merge 2 commits intosplit/multi-backend-routingfrom
Conversation
7139cdb to
b66b1d2
Compare
aram356
left a comment
There was a problem hiding this comment.
Summary
Adds validate_request_server_side() for DataDome edge bot validation. Three core problems block merge: a critical correctness bug (HTTP header names containing : panic at runtime, taking down 100% of sampled traffic when validation is enabled), a sensitive customer config file checked in (already deleted in the rebased base branch), and per-request rebuilds of stateful objects on the hot path. Tests pass locally and in CI — but coverage misses the buggy code path entirely.
Blocking
🔧 wrench
- Header names with
:panic at runtime —Request::set_header("x-datadome-params:key", ...)panics via Fastly'sconvert_stringy!macro. (crates/trusted-server-core/src/integrations/datadome.rs:563-572) - Customer config with 170+ real publisher domains committed —
crates/trusted-server-adapter-fastly/backends.tomllistsnytimes.com,washingtonpost.com,forbes.com,cbsnews.com, etc. as backends forhttps://raven-public.prod.saymedia.com. The base branch's follow-up commita53f6df6("Remove customer backend config and sensitive routing logs") deletes this exact file and sanitizes alog::info!("Backend routing: host=… → …")line inpublisher.rsthat prints the customer origin URL on every request. The PR will pick those changes up automatically on rebase, but right now the public diff still contains both. Rebase onto the currentsplit/multi-backend-routingtip before merge. validate_request_server_side()rebuilds the integration on every request — re-parses JSON config, re-validates, re-allocatesArc, re-logs atINFO. (datadome.rs:786-794+714-745)BackendRouterrebuilt on every request — recompiles everypath_regexper request. (publisher.rs:277-290)- Invalid backend regex silently swallowed —
.ok()drops the error and falls back to the default origin, contradicting CLAUDE.md's startup-error policy. (publisher.rs:289)
❓ question
- Is the
x-datadome-params:keyheader format derived from a working reference? Beyond the panic, the parameter naming (requestmodulename,moduleversion,clientid,ip,method) does not match the documented DataDome Fastly Compute Rust example, which uses a singleX-DataDome-Paramsheader with;-separatedKey=Valuepairs (capitalized). What was the source for this format, and has any version of this code been exercised against a liveapi-fastly.datadome.coendpoint?
Non-blocking
♻️ refactor
- Hardcoded
moduleversion = "1.0"— useenv!("CARGO_PKG_VERSION"). (datadome.rs:565) enabled = false+server_side_enabled = trueonly warns — should fail config validation per CLAUDE.md. (datadome.rs:721-727)- Hand-rolled
extract_host— useurl::Url::parse. (datadome.rs:302-309) - Validation
Errarm loses cause chain — usechange_context. (datadome.rs:614-631)
🤔 thinking
503cacheability — setCache-Control: no-storeon validation errors. (publisher.rs:341-348)validation_timeout_msis first-byte only — slow body responses still stall. (datadome.rs:466-468,555-558)- Cookie double-encoding risk — raw cookie value passed to
urlencoding::encode. (datadome.rs:497-501,569-574) 0.0.0.0fallback IP buckets — all no-IP traffic in/out together. (datadome.rs:448-452)- Path patterns dead when same backend has
domains— document this footgun inmulti-backend-routing.md. (backend_router.rs:157-177) - Routing tests bind fixed ports 19090–19093 — collisions between concurrent runs. (
crates/integration-tests/tests/routing.rs:101-104)
🌱 seedling
- No body-size limit on
tags.jsproxy —take_body_str()atdatadome.rs:344reads the full upstream body into memory. DataDome's tags.js is typically <100 KB so this is fine today, but a malicious upstream or misconfig could push more. Consider using the streaming pipeline for consistency with other proxied content. (Outside this PR's diff, but adjacent to changes here.)
⛏ nitpick
- Polynomial hash for sampling —
DefaultHasherwould be more idiomatic. (datadome.rs:487-491) build()called from two paths —register()callsbuild()internally;validate_request_server_side()also callsbuild(). After the registry fix, only the registry path should remain. (datadome.rs:798-807)
CI Status
- fmt: PASS (CI)
- clippy: PASS (CI)
- rust tests: PASS — 731/731 locally + CI integration tests passing
- js tests: PASS (CI)
- prepare integration artifacts: PASS (CI)
| validation_req.set_header("x-datadome-params:requestmodulename", "TrustedServerRust"); | ||
| validation_req.set_header("x-datadome-params:moduleversion", "1.0"); | ||
| validation_req.set_header("x-datadome-params:ip", client_ip); | ||
| validation_req.set_header("x-datadome-params:method", req.get_method().as_str()); |
There was a problem hiding this comment.
🔧 wrench — HTTP header names containing : panic at runtime, crashing every sampled request.
Request::set_header(name: impl ToHeaderName, ...) panics when name is invalid. RFC 7230 disallows : in token characters, so HeaderName::from_str("x-datadome-params:key") returns Err(InvalidHeaderName). Verified against fastly-0.11.12/src/convert/macros.rs:100:
fn into_borrowable(self) -> Self::Borrowable {
<$type>::from_str(self).unwrap_or_else(|_| panic!(concat!($fail_msg, ": {}"), self))
}The $fail_msg here is "invalid HTTP header name". Confirmed independently with http::HeaderName::from_bytes(b"x-datadome-params:key") → Err(InvalidHeaderName).
Impact: As soon as server_side_enabled = true, a valid server_side_key is configured, and a request is sampled (default sample_rate = 100, so every request), the WASM guest panics. This is a 100%-traffic outage the moment validation is enabled.
Why no test caught it: All five validate_request_inner tests early-return before reaching set_header (validation disabled / missing key / sample_rate = 0 / server_side_key = None). The full-validation code path is never exercised.
Fix: Use a single combined header per DataDome's documented Fastly Compute integration:
let mut params = format!(
"Key={};RequestModuleName=TrustedServerRust;ModuleVersion=1.0;ServerName=trusted-server;IP={};Method={}",
api_key, client_ip, req.get_method()
);
if !datadome_cookie.is_empty() {
params.push_str(&format!(";ClientID={}", urlencoding::encode(datadome_cookie)));
}
validation_req.set_header("x-datadome-params", params);Confirm the exact wire format against DataDome's current docs/example, and add a test that builds the validation request and validates each header name with HeaderName::from_str so this class of bug fails fast.
| Some(integration) => integration.validate_request(req), | ||
| None => Ok(true), | ||
| } | ||
| } |
There was a problem hiding this comment.
🔧 wrench — validate_request_server_side() rebuilds the integration on every request.
Every request that reaches handle_publisher_request pays for:
- Re-parsing the DataDome JSON config out of
settings.integrationsand runningvalidatorover it. - Allocating a new
Arc<DataDomeIntegration>. - Logging
"[datadome] Registering integration (sdk_origin: …, rewrite_sdk: …, server_side: …)"atINFO(line 737) — 1 line of log spam per request.
The disabled fast path (server_side_enabled = false) is also subject to this — every request pays for parse + validate + Arc + log even when the feature is off.
Fix: The IntegrationRegistry already holds an Arc<DataDomeIntegration> constructed at startup via register(). Expose a public accessor on the registry (or pass the integration through the request handler) and call integration.validate_request(req) directly. At minimum, demote the registration log to debug! and ensure it only fires once.
| e | ||
| }) | ||
| .ok() | ||
| }; |
There was a problem hiding this comment.
🔧 wrench — BackendRouter is rebuilt on every request.
BackendRouter::new compiles every backend's path_regex (via Regex::new) on every call. A typical multi-publisher deployment may have dozens of regex patterns. Regex compilation on the request hot path is a real cost, and BackendRoutingConfig::path_patterns is already validated at startup.
Fix: Build the router once during settings initialization (or memoize via OnceLock<BackendRouter> keyed on &Settings), and pass &BackendRouter into handle_publisher_request.
| log::error!("Failed to build backend router: {:?}", e); | ||
| e | ||
| }) | ||
| .ok() |
There was a problem hiding this comment.
🔧 wrench — Invalid backend regex is silently swallowed at runtime.
.map_err(|e| { log::error!("Failed to build backend router: {:?}", e); e })
.ok()If BackendRouter::new fails (e.g. bad path_regex injected via env var), the error is logged and dropped, then routing falls back to the default origin. CLAUDE.md is explicit:
Config-derived regex/pattern compilation must not use panic-prone
expect()/unwrap(); invalid enabled config should surface as startup/config errors.
Fix: Validate path regexes at config-load time (in Settings::from_toml* via the existing validator derive — currently path_regex is just Option<String> with no validation). Then construct the router infallibly, never at request time.
| // Set DataDome validation headers | ||
| validation_req.set_header("x-datadome-params:key", api_key); | ||
| validation_req.set_header("x-datadome-params:requestmodulename", "TrustedServerRust"); | ||
| validation_req.set_header("x-datadome-params:moduleversion", "1.0"); |
There was a problem hiding this comment.
♻️ refactor — Hardcoded moduleversion = "1.0". Use env!("CARGO_PKG_VERSION") so DataDome's analytics reflect the deployed version.
| "x-datadome-params:clientid", | ||
| urlencoding::encode(datadome_cookie).as_ref(), | ||
| ); | ||
| } |
There was a problem hiding this comment.
🤔 thinking — Cookie value double-encoding risk. extract_datadome_cookie returns the raw value, then it's passed through urlencoding::encode. Browsers send cookies URL-encoded (or often do), so encoding again can produce e.g. %2520 from %20. Confirm with DataDome whether this header should be raw or percent-encoded.
| req.get_client_ip_addr() | ||
| .map(|ip| ip.to_string()) | ||
| .unwrap_or_else(|| "0.0.0.0".to_string()) | ||
| } |
There was a problem hiding this comment.
🤔 thinking — All requests with no client IP hash into the same sampling bucket — when sample_rate is e.g. 10%, the entire "no IP" bucket is in or out depending on hash("0.0.0.0") % 100. Probably fine, but document it.
| return (&route.origin_url, route.certificate_check); | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
🤔 thinking — Domain matches on backend A short-circuit before path patterns on backend B can run, even when A has no path pattern. An entry with both domains and path_patterns always reaches via domains; the path_patterns are dead for that hostname. The test test_domain_and_path_pattern_precedence documents this, but multi-backend-routing.md should call it out more clearly — it's a footgun.
| MockOrigin::start(19091, "site-a"), | ||
| MockOrigin::start(19092, "site-b"), | ||
| MockOrigin::start(19093, "api"), | ||
| ]; |
There was a problem hiding this comment.
🤔 thinking — Hardcoded ports 19090–19093 collide between concurrent local runs. Lower-priority since these tests skip when ROUTING_WASM_PATH is unset, but consider ephemeral-port allocation + writing the chosen ports into the WASM build env.
| ip.bytes().fold(0u32, |acc, b| { | ||
| acc.wrapping_mul(31).wrapping_add(u32::from(b)) | ||
| }) | ||
| } |
There was a problem hiding this comment.
⛏ nitpick — DefaultHasher would be more idiomatic and have better distribution than this Java-style polynomial hash. Functional difference is negligible for % 100 sampling, so purely style.
Summary
What changed
validate_request_server_side()and supporting DataDome config/logicVerification
cargo fmt --all -- --checkcargo test --workspacecargo clippy --workspace --all-targets --all-features -- -D warningsStack