Datadome server-side validation#654

Open

ChristianPavilonis wants to merge 2 commits into split/multi-backend-routing from split/datadome-server-side-validation

Conversation

@ChristianPavilonis
Collaborator

Summary

  • split the DataDome server-side validation work into a stacked follow-up PR
  • add server-side request validation and related sampling/timeout behavior
  • add DataDome docs and publisher request handling updates

What changed

  • add validate_request_server_side() and supporting DataDome config/logic
  • add DataDome validation tests and deterministic sampling behavior
  • return an edge-generated blocked response when validation denies the request
  • document DataDome server-side validation in the integration guide

Verification

  • cargo fmt --all -- --check
  • cargo test --workspace
  • cargo clippy --workspace --all-targets --all-features -- -D warnings

Stack

@ChristianPavilonis ChristianPavilonis changed the title Split DataDome server-side validation into a follow-up PR Datadome server-side validation Apr 23, 2026
@ChristianPavilonis ChristianPavilonis self-assigned this Apr 23, 2026
@ChristianPavilonis ChristianPavilonis force-pushed the split/multi-backend-routing branch from 7139cdb to b66b1d2 on April 23, 2026 18:52
Collaborator

@aram356 aram356 left a comment


Summary

Adds validate_request_server_side() for DataDome edge bot validation. Three core problems block merge: a critical correctness bug (HTTP header names containing : panic at runtime, taking down 100% of sampled traffic when validation is enabled), a sensitive customer config file checked in (already deleted in the rebased base branch), and per-request rebuilds of stateful objects on the hot path. Tests pass locally and in CI — but coverage misses the buggy code path entirely.

Blocking

🔧 wrench

  • Header names with : panic at runtime — Request::set_header("x-datadome-params:key", ...) panics via Fastly's convert_stringy! macro. (crates/trusted-server-core/src/integrations/datadome.rs:563-572)
  • Customer config with 170+ real publisher domains committed — crates/trusted-server-adapter-fastly/backends.toml lists nytimes.com, washingtonpost.com, forbes.com, cbsnews.com, etc. as backends for https://raven-public.prod.saymedia.com. The base branch's follow-up commit a53f6df6 ("Remove customer backend config and sensitive routing logs") deletes this exact file and sanitizes a log::info!("Backend routing: host=… → …") line in publisher.rs that prints the customer origin URL on every request. The PR will pick those changes up automatically on rebase, but right now the public diff still contains both. Rebase onto the current split/multi-backend-routing tip before merge.
  • validate_request_server_side() rebuilds the integration on every request — re-parses JSON config, re-validates, re-allocates Arc, re-logs at INFO. (datadome.rs:786-794 + 714-745)
  • BackendRouter rebuilt on every request — recompiles every path_regex per request. (publisher.rs:277-290)
  • Invalid backend regex silently swallowed — .ok() drops the error and falls back to the default origin, contradicting CLAUDE.md's startup-error policy. (publisher.rs:289)

❓ question

  • Is the x-datadome-params:key header format derived from a working reference? Beyond the panic, the parameter naming (requestmodulename, moduleversion, clientid, ip, method) does not match the documented DataDome Fastly Compute Rust example, which uses a single X-DataDome-Params header with ;-separated Key=Value pairs (capitalized). What was the source for this format, and has any version of this code been exercised against a live api-fastly.datadome.co endpoint?

Non-blocking

♻️ refactor

  • Hardcoded moduleversion = "1.0" — use env!("CARGO_PKG_VERSION"). (datadome.rs:565)
  • enabled = false + server_side_enabled = true only warns — should fail config validation per CLAUDE.md. (datadome.rs:721-727)
  • Hand-rolled extract_host — use url::Url::parse. (datadome.rs:302-309)
  • Validation Err arm loses cause chain — use change_context. (datadome.rs:614-631)

🤔 thinking

  • 503 cacheability — set Cache-Control: no-store on validation errors. (publisher.rs:341-348)
  • validation_timeout_ms is first-byte only — slow body responses still stall. (datadome.rs:466-468, 555-558)
  • Cookie double-encoding risk — raw cookie value passed to urlencoding::encode. (datadome.rs:497-501, 569-574)
  • 0.0.0.0 fallback IP buckets — all no-IP traffic in/out together. (datadome.rs:448-452)
  • Path patterns dead when same backend has domains — document this footgun in multi-backend-routing.md. (backend_router.rs:157-177)
  • Routing tests bind fixed ports 19090–19093 — collisions between concurrent runs. (crates/integration-tests/tests/routing.rs:101-104)

🌱 seedling

  • No body-size limit on tags.js proxy — take_body_str() at datadome.rs:344 reads the full upstream body into memory. DataDome's tags.js is typically <100 KB so this is fine today, but a malicious upstream or misconfig could push more. Consider using the streaming pipeline for consistency with other proxied content. (Outside this PR's diff, but adjacent to changes here.)

⛏ nitpick

  • Polynomial hash for sampling — DefaultHasher would be more idiomatic. (datadome.rs:487-491)
  • build() called from two paths — register() calls build() internally; validate_request_server_side() also calls build(). After the registry fix, only the registry path should remain. (datadome.rs:798-807)

CI Status

  • fmt: PASS (CI)
  • clippy: PASS (CI)
  • rust tests: PASS — 731/731 locally + CI integration tests passing
  • js tests: PASS (CI)
  • prepare integration artifacts: PASS (CI)

validation_req.set_header("x-datadome-params:requestmodulename", "TrustedServerRust");
validation_req.set_header("x-datadome-params:moduleversion", "1.0");
validation_req.set_header("x-datadome-params:ip", client_ip);
validation_req.set_header("x-datadome-params:method", req.get_method().as_str());

🔧 wrench — HTTP header names containing : panic at runtime, crashing every sampled request.

Request::set_header(name: impl ToHeaderName, ...) panics when name is invalid. RFC 7230 disallows : in token characters, so HeaderName::from_str("x-datadome-params:key") returns Err(InvalidHeaderName). Verified against fastly-0.11.12/src/convert/macros.rs:100:

fn into_borrowable(self) -> Self::Borrowable {
    <$type>::from_str(self).unwrap_or_else(|_| panic!(concat!($fail_msg, ": {}"), self))
}

The $fail_msg here is "invalid HTTP header name". Confirmed independently with http::HeaderName::from_bytes(b"x-datadome-params:key"), which returns Err(InvalidHeaderName).

Impact: As soon as server_side_enabled = true, a valid server_side_key is configured, and a request is sampled (default sample_rate = 100, so every request), the WASM guest panics. This is a 100%-traffic outage the moment validation is enabled.

Why no test caught it: All five validate_request_inner tests early-return before reaching set_header (validation disabled / missing key / sample_rate = 0 / server_side_key = None). The full-validation code path is never exercised.

Fix: Use a single combined header per DataDome's documented Fastly Compute integration:

let mut params = format!(
    "Key={};RequestModuleName=TrustedServerRust;ModuleVersion=1.0;ServerName=trusted-server;IP={};Method={}",
    api_key, client_ip, req.get_method()
);
if !datadome_cookie.is_empty() {
    params.push_str(&format!(";ClientID={}", urlencoding::encode(datadome_cookie)));
}
validation_req.set_header("x-datadome-params", params);

Confirm the exact wire format against DataDome's current docs/example, and add a test that builds the validation request and validates each header name with HeaderName::from_str so this class of bug fails fast.
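The header-name check asked for above can be approximated with the stdlib alone. This is a minimal sketch, assuming a hypothetical helper `is_valid_header_name` that mirrors the RFC 7230 tchar rule enforced by http::HeaderName (the real test should call HeaderName::from_str directly):

```rust
// Hypothetical stand-in for http::HeaderName's token check (RFC 7230 tchar set):
// tchar = "!" / "#" / "$" / "%" / "&" / "'" / "*" / "+" / "-" / "." /
//         "^" / "_" / "`" / "|" / "~" / DIGIT / ALPHA
fn is_valid_header_name(name: &str) -> bool {
    !name.is_empty()
        && name.bytes().all(|b| {
            b.is_ascii_alphanumeric()
                || matches!(b, b'!' | b'#' | b'$' | b'%' | b'&' | b'\'' | b'*'
                    | b'+' | b'-' | b'.' | b'^' | b'_' | b'`' | b'|' | b'~')
        })
}

fn main() {
    // ':' is not a tchar, so the composite per-parameter name is rejected...
    assert!(!is_valid_header_name("x-datadome-params:key"));
    // ...while the single combined header name is fine.
    assert!(is_valid_header_name("x-datadome-params"));
}
```

A test shaped like this (on every header name the validation request sets) would have caught the panic before it reached a sampled request.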

Some(integration) => integration.validate_request(req),
None => Ok(true),
}
}

🔧 wrench — validate_request_server_side() rebuilds the integration on every request.

Every request that reaches handle_publisher_request pays for:

  1. Re-parsing the DataDome JSON config out of settings.integrations and running validator over it.
  2. Allocating a new Arc<DataDomeIntegration>.
  3. Logging "[datadome] Registering integration (sdk_origin: …, rewrite_sdk: …, server_side: …)" at INFO (line 737) — 1 line of log spam per request.

The disabled fast path (server_side_enabled = false) is also subject to this — every request pays for parse + validate + Arc + log even when the feature is off.

Fix: The IntegrationRegistry already holds an Arc<DataDomeIntegration> constructed at startup via register(). Expose a public accessor on the registry (or pass the integration through the request handler) and call integration.validate_request(req) directly. At minimum, demote the registration log to debug! and ensure it only fires once.
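One way to get the build-once semantics (the real fix may instead expose the registry's existing Arc; type and field names here are illustrative stand-ins, not the PR's actual API):

```rust
use std::sync::{Arc, OnceLock};

// Hypothetical stand-in for the real integration type.
struct DataDomeIntegration {
    server_side_enabled: bool,
}

static INTEGRATION: OnceLock<Arc<DataDomeIntegration>> = OnceLock::new();

// Parse/validate config once; every subsequent request reuses the same Arc.
fn integration() -> &'static Arc<DataDomeIntegration> {
    INTEGRATION.get_or_init(|| {
        // Imagine JSON config parsing + validation + a single debug! log here.
        Arc::new(DataDomeIntegration { server_side_enabled: true })
    })
}

fn main() {
    let a = Arc::clone(integration());
    let b = Arc::clone(integration());
    assert!(Arc::ptr_eq(&a, &b)); // built once, shared thereafter
    assert!(a.server_side_enabled);
}
```

The same pattern also guarantees the registration log fires exactly once.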

e
})
.ok()
};

🔧 wrench — BackendRouter is rebuilt on every request.

BackendRouter::new compiles every backend's path_regex (via Regex::new) on every call. A typical multi-publisher deployment may have dozens of regex patterns. Regex compilation on the request hot path is a real cost, and BackendRoutingConfig::path_patterns is already validated at startup.

Fix: Build the router once during settings initialization (or memoize via OnceLock<BackendRouter> keyed on &Settings), and pass &BackendRouter into handle_publisher_request.
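Sketch of the suggested shape, with a stand-in for the compiled-regex table (the real BackendRouter::new runs Regex::new; names and matching logic here are illustrative only):

```rust
// Stand-in router: patterns are "compiled" once in new(), then only borrowed.
struct BackendRouter {
    patterns: Vec<String>, // stands in for Vec<Regex>
}

impl BackendRouter {
    fn new(patterns: &[&str]) -> Self {
        // In the real code this is where Regex::new runs -- once at startup,
        // never on the request hot path.
        Self { patterns: patterns.iter().map(|p| p.to_string()).collect() }
    }

    fn route(&self, path: &str) -> Option<&str> {
        self.patterns
            .iter()
            .find(|p| path.starts_with(p.as_str()))
            .map(|s| s.as_str())
    }
}

// Handler borrows the router instead of rebuilding it per request.
fn handle_publisher_request(router: &BackendRouter, path: &str) -> &'static str {
    match router.route(path) {
        Some(_) => "routed",
        None => "default-origin",
    }
}

fn main() {
    let router = BackendRouter::new(&["/api/", "/static/"]); // startup, once
    assert_eq!(handle_publisher_request(&router, "/api/v1"), "routed");
    assert_eq!(handle_publisher_request(&router, "/other"), "default-origin");
}
```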

log::error!("Failed to build backend router: {:?}", e);
e
})
.ok()

🔧 wrench — Invalid backend regex is silently swallowed at runtime.

.map_err(|e| { log::error!("Failed to build backend router: {:?}", e); e })
.ok()

If BackendRouter::new fails (e.g. bad path_regex injected via env var), the error is logged and dropped, then routing falls back to the default origin. CLAUDE.md is explicit:

Config-derived regex/pattern compilation must not use panic-prone expect()/unwrap(); invalid enabled config should surface as startup/config errors.

Fix: Validate path regexes at config-load time (in Settings::from_toml* via the existing validator derive — currently path_regex is just Option<String> with no validation). Then construct the router infallibly, never at request time.
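A minimal sketch of fail-fast config loading, assuming a hypothetical `validate_pattern` in place of the real Regex::new compile check — the point is that errors propagate with `?` at startup instead of being dropped with .ok() at request time:

```rust
#[derive(Debug)]
struct ConfigError(String);

// Hypothetical stand-in for Regex::new's compile check.
fn validate_pattern(p: &str) -> Result<(), ConfigError> {
    let opens = p.chars().filter(|c| *c == '(').count();
    let closes = p.chars().filter(|c| *c == ')').count();
    if opens != closes {
        return Err(ConfigError(format!("unbalanced parens in path_regex {p:?}")));
    }
    Ok(())
}

// Startup-time loader: any bad pattern aborts config load, no silent fallback.
fn load_routing_config(patterns: &[&str]) -> Result<Vec<String>, ConfigError> {
    for p in patterns {
        validate_pattern(p)?;
    }
    Ok(patterns.iter().map(|p| p.to_string()).collect())
}

fn main() {
    assert!(load_routing_config(&["^/api/.*$"]).is_ok());
    assert!(load_routing_config(&["^/broken(/$"]).is_err());
}
```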

// Set DataDome validation headers
validation_req.set_header("x-datadome-params:key", api_key);
validation_req.set_header("x-datadome-params:requestmodulename", "TrustedServerRust");
validation_req.set_header("x-datadome-params:moduleversion", "1.0");

♻️ refactor — Hardcoded moduleversion = "1.0". Use env!("CARGO_PKG_VERSION") so DataDome's analytics reflect the deployed version.

"x-datadome-params:clientid",
urlencoding::encode(datadome_cookie).as_ref(),
);
}

🤔 thinking — Cookie value double-encoding risk. extract_datadome_cookie returns the raw value, then it's passed through urlencoding::encode. Browsers often send cookie values already URL-encoded, so encoding again can turn %20 into %2520. Confirm with DataDome whether this header should be raw or percent-encoded.
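To make the risk concrete, here is a minimal percent-encoder (illustration only, not the urlencoding crate itself) showing what happens when an already-encoded value is encoded again:

```rust
// Minimal percent-encoder: keeps RFC 3986 unreserved characters, escapes the rest.
fn percent_encode(s: &str) -> String {
    s.bytes()
        .map(|b| match b {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'_' | b'.' | b'~' => {
                (b as char).to_string()
            }
            _ => format!("%{b:02X}"),
        })
        .collect()
}

fn main() {
    let raw = "abc 123";               // what the browser conceptually stored
    let on_wire = percent_encode(raw); // as it may arrive in the Cookie header
    let double = percent_encode(&on_wire);
    assert_eq!(on_wire, "abc%20123");
    assert_eq!(double, "abc%2520123"); // '%' re-encoded: value is now corrupted
}
```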

req.get_client_ip_addr()
.map(|ip| ip.to_string())
.unwrap_or_else(|| "0.0.0.0".to_string())
}

🤔 thinking — All requests with no client IP hash into the same sampling bucket — when sample_rate is e.g. 10%, the entire "no IP" bucket is in or out depending on hash("0.0.0.0") % 100. Probably fine, but document it.

return (&route.origin_url, route.certificate_check);
}
}
}

🤔 thinking — Domain matches on backend A short-circuit before path patterns on backend B can run, even when A has no path pattern. An entry with both domains and path_patterns is always reached via its domains; the path_patterns are dead for that hostname. The test test_domain_and_path_pattern_precedence documents this, but multi-backend-routing.md should call it out more clearly — it's a footgun.

MockOrigin::start(19091, "site-a"),
MockOrigin::start(19092, "site-b"),
MockOrigin::start(19093, "api"),
];

🤔 thinking — Hardcoded ports 19090–19093 collide between concurrent local runs. Lower-priority since these tests skip when ROUTING_WASM_PATH is unset, but consider ephemeral-port allocation + writing the chosen ports into the WASM build env.
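A sketch of the ephemeral-port approach: bind port 0 and let the OS choose, then hand the chosen port to the mock origin / WASM build env:

```rust
use std::net::TcpListener;

// Bind "port 0" so the OS picks a free ephemeral port; no fixed 19090-19093.
fn ephemeral_listener() -> (TcpListener, u16) {
    let listener = TcpListener::bind("127.0.0.1:0").expect("bind ephemeral port");
    let port = listener.local_addr().expect("local addr").port();
    (listener, port)
}

fn main() {
    let (_a, port_a) = ephemeral_listener();
    let (_b, port_b) = ephemeral_listener();
    assert_ne!(port_a, 0);
    assert_ne!(port_b, 0);
    assert_ne!(port_a, port_b); // two live listeners never share a port
}
```

Concurrent test runs can then never collide, at the cost of plumbing the chosen ports into the WASM build.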

ip.bytes().fold(0u32, |acc, b| {
acc.wrapping_mul(31).wrapping_add(u32::from(b))
})
}

⛏ nitpick — DefaultHasher would be more idiomatic and have better distribution than this Java-style polynomial hash. Functional difference is negligible for % 100 sampling, so purely style.
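For reference, the DefaultHasher version is equally short. One caveat worth weighing: DefaultHasher's algorithm is unspecified and may change between Rust releases, so buckets are stable within a build but not across toolchains — the hand-rolled hash is at least version-stable.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Idiomatic stdlib bucketing: hash the IP string, reduce to a 0..100 bucket.
fn bucket(ip: &str) -> u64 {
    let mut h = DefaultHasher::new();
    ip.hash(&mut h);
    h.finish() % 100
}

fn main() {
    // Deterministic within a single build/process:
    assert_eq!(bucket("203.0.113.7"), bucket("203.0.113.7"));
    assert!(bucket("203.0.113.7") < 100);
}
```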
