feat(rest/auth): SigV4 authentication for AWS Glue Iceberg REST #4

Merged
mmaslankaprv merged 11 commits into main from feat/rest-sigv4-auth
Apr 22, 2026

@mmaslankaprv
Member

Summary

Adds AWS Signature V4 (SigV4) authentication to the REST catalog client so it can talk to AWS Glue's Iceberg REST endpoint (https://glue.<region>.amazonaws.com/iceberg). Glue rejects OAuth bearer tokens and requires SigV4 on every request.

Uses aws-crt-cpp (AWS's idiomatic C++ wrapper over aws-c-auth) so we don't hand-roll crypto. The dep is optional — when aws-crt-cpp isn't found via Conan/vcpkg the REST client still builds and the sigv4 auth type returns NotImplemented.

Commit series (single-concern)

  1. build(conan) — adds aws-crt-cpp and a QUIET CMake resolver that exports ICEBERG_REST_HAVE_SIGV4 when the dep is available.
  2. feat(rest/auth) — extends AuthSession with a SignableRequest overload (method/URL/query/body). Default forwards to the existing headers-only method, so None/Basic/OAuth2 sessions need no changes.
  3. feat(rest/http) — plumbs request context through HttpClient::BuildHeaders and all five call sites; PostForm pre-encodes the body so signing sees the exact bytes libcurl sends.
  4. feat(rest/auth) — adds SigV4 property keys (static access-key-id / secret-access-key / session-token, a credentials-provider selector, and the execute-api signing-service default that matches the Java Iceberg client).
  5. feat(rest/auth) — implements SigV4Signer + SigV4AuthSession + SigV4Manager on top of aws-crt-cpp, registers the factory when ICEBERG_REST_WITH_SIGV4 is defined, wires the Conan target into iceberg_rest.
  6. test(rest/auth) — 12 manager tests + 4 canonical-vector signer tests (signing timestamp pinned, expected Authorization headers cross-checked with Python botocore).
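The optional-dependency behaviour in steps 1 and 5 can be sketched as a registry whose sigv4 entry depends on a compile-time define. This is a minimal illustration only: `ManagerResult`, `Registry`, `RegisterBuiltins`, and `MakeManager` are hypothetical names, and a plain struct stands in for the project's real status type, which this PR description does not show. Only the `ICEBERG_REST_WITH_SIGV4` define and the NotImplemented fallback come from the description above.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>

// Hypothetical stand-in for the project's result/status type.
struct ManagerResult {
  bool ok;
  std::string message;
};

using ManagerFactory = std::function<ManagerResult()>;

// Hypothetical registry keyed by auth type.
inline std::map<std::string, ManagerFactory>& Registry() {
  static std::map<std::string, ManagerFactory> registry;
  return registry;
}

inline void RegisterBuiltins() {
#ifdef ICEBERG_REST_WITH_SIGV4
  // Built with aws-crt-cpp: the real factory would construct the SigV4 manager.
  Registry()["sigv4"] = [] { return ManagerResult{true, "sigv4 manager"}; };
#else
  // Built without aws-crt-cpp: the type is known but reports NotImplemented.
  Registry()["sigv4"] = [] {
    return ManagerResult{false, "NotImplemented: built without aws-crt-cpp"};
  };
#endif
}

inline ManagerResult MakeManager(const std::string& type) {
  const auto it = Registry().find(type);
  if (it == Registry().end()) {
    return {false, "unknown auth type: " + type};
  }
  return it->second();
}
```

Keeping the sigv4 key registered in both builds lets callers distinguish "unsupported in this build" from "unknown auth type".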

Configuration

| Property | Required | Default | Notes |
|---|---|---|---|
| rest.auth.type | yes | | Must be sigv4. |
| rest.auth.sigv4.region | yes | | e.g. us-east-1. |
| rest.auth.sigv4.service | no | execute-api | Set to glue for AWS Glue. |
| rest.auth.sigv4.credentials-provider | no | auto | static or default. Auto-detects: static when access-key-id is set, else default (env → profile → STS Web Identity → IMDS/ECS). |
| rest.auth.sigv4.access-key-id | when provider=static | | |
| rest.auth.sigv4.secret-access-key | when provider=static | | |
| rest.auth.sigv4.session-token | no | | For temporary STS creds. |
| rest.auth.sigv4.delegate-auth-type | no | | Wraps the inner auth type (e.g. oauth2) so its headers are covered by the signature. |

For Glue on EKS (IRSA): rest.auth.type=sigv4, rest.auth.sigv4.service=glue, rest.auth.sigv4.region=<region> — no keys needed, the pod's service-account token resolves via the default chain.
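The provider auto-detection described in the table can be sketched as follows. This is an illustration under assumptions, not the merged code: `Provider`, `Properties`, and `ResolveProvider` are hypothetical names, an unset selector is treated as "auto", and `std::nullopt` stands in for whatever validation error the real manager returns.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <unordered_map>

enum class Provider { kStatic, kDefault };

using Properties = std::unordered_map<std::string, std::string>;

// Returns the provider to use, or std::nullopt when the selector is
// unrecognized or static credentials are requested but incomplete.
std::optional<Provider> ResolveProvider(const Properties& props) {
  const auto get = [&props](const std::string& key) -> std::string {
    const auto it = props.find(key);
    return it == props.end() ? std::string{} : it->second;
  };

  const std::string selector = get("rest.auth.sigv4.credentials-provider");
  const bool has_key_id = !get("rest.auth.sigv4.access-key-id").empty();
  const bool has_secret = !get("rest.auth.sigv4.secret-access-key").empty();

  Provider provider;
  if (selector.empty()) {
    // Auto-detect: static when an access key id is configured, else default.
    provider = has_key_id ? Provider::kStatic : Provider::kDefault;
  } else if (selector == "static") {
    provider = Provider::kStatic;
  } else if (selector == "default") {
    provider = Provider::kDefault;
  } else {
    return std::nullopt;
  }

  // Static credentials need both halves of the key pair.
  if (provider == Provider::kStatic && !(has_key_id && has_secret)) {
    return std::nullopt;
  }
  return provider;
}
```

With the IRSA properties above (type, service, region, and no keys), this sketch resolves to the default chain.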

Design notes

  • Double URI encoding is enabled, as SigV4 requires for every service except S3, and should_normalize_uri_path=true.
  • Payload hash is computed in-house and fed via AwsSigningConfig::SetSignedBodyValue, avoiding aws-c-auth's empty-body handling bug that bit the Java Iceberg impl (apache/iceberg#6951).
  • Delegate auth (e.g. OAuth2 wrapped by SigV4): the delegate's Authorization: Bearer header would collide with the Authorization header SigV4 emits and would also trigger aws-c-auth's AWS_AUTH_SIGNING_ILLEGAL_REQUEST_HEADER. We rename it to X-Iceberg-Access-Delegation before signing, matching the Java RESTSigV4AuthSession.
  • ApiHandle is a function-local static so aws-c-* is initialized once per process.
  • The credentials provider is passed straight to AwsSigningConfig::SetCredentialsProvider, so caching and refresh are handled inside aws-c-auth: static keys are frozen for the life of the signer, while the default chain refreshes.
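The delegate-header rename from the notes above can be sketched like this. `Headers` and `RelocateDelegateAuthorization` are hypothetical names, and the sketch assumes the header key is spelled exactly `Authorization`; the real session may match header names case-insensitively.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

using Headers = std::map<std::string, std::string>;

// Moves the delegate's Authorization value to X-Iceberg-Access-Delegation so
// the SigV4 signer can emit its own Authorization header afterwards.
void RelocateDelegateAuthorization(Headers& headers) {
  const auto it = headers.find("Authorization");
  if (it == headers.end()) {
    return;  // No delegate header to relocate.
  }
  // Inserting into a std::map does not invalidate `it`.
  headers["X-Iceberg-Access-Delegation"] = std::move(it->second);
  headers.erase(it);
}
```

Because the rename happens before signing, the relocated value is still part of the header set that the signature covers.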

Test plan

  • conan install . -s build_type=Release --build=missing
  • cmake --preset conan-release && cmake --build build/Release --target rest_catalog_test
  • ctest --test-dir build/Release -R rest_catalog --output-on-failure → 199/199 pass locally, including 12 new manager cases + 4 canonical-vector signer cases.
  • Manual end-to-end against a real AWS Glue catalog (off-CI, requires IAM credentials). An automated local-mock integration test is sketched in the follow-up plan but not in this PR.

Needed for AWS SigV4 request signing in the REST catalog client (AWS
Glue's Iceberg REST endpoint rejects OAuth bearer tokens and requires
SigV4). The CMake resolver is QUIET so the dep stays optional: when
found, ICEBERG_REST_HAVE_SIGV4 is exported for downstream targets to
gate the SigV4 code path; when absent, the REST client still builds.
Adds a SignableRequest struct (method, URL, query params, body) and a
virtual Authenticate(SignableRequest, headers) overload that by default
forwards to the existing headers-only method. Needed for SigV4, which
must hash the body and include the method / URL / query in its
canonical request. Existing None/Basic/OAuth2 sessions ignore the
extra context and need no changes.
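A minimal sketch of the forwarding overload described in this commit, assuming simplified signatures. The real methods presumably return the project's status type and use its own header container; `Headers` and `BearerSession` here are illustrative stand-ins.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

using Headers = std::map<std::string, std::string>;

// Request context a signing scheme needs beyond the headers.
struct SignableRequest {
  std::string method;
  std::string url;
  std::map<std::string, std::string> query;
  std::string body;
};

class AuthSession {
 public:
  virtual ~AuthSession() = default;

  // Existing headers-only entry point.
  virtual void Authenticate(Headers& headers) = 0;

  // New overload: by default the request context is ignored and the call
  // forwards to the headers-only method.
  virtual void Authenticate(const SignableRequest& /*request*/, Headers& headers) {
    Authenticate(headers);
  }
};

// Stand-in for an existing scheme that needs no changes.
class BearerSession : public AuthSession {
 public:
  explicit BearerSession(std::string token) : token_(std::move(token)) {}

  // Keep the two-argument overload visible despite the override below.
  using AuthSession::Authenticate;

  void Authenticate(Headers& headers) override {
    headers["Authorization"] = "Bearer " + token_;
  }

 private:
  std::string token_;
};
```

A SigV4 session would override the two-argument overload instead, since it has to hash the body and canonicalize the method, URL, and query.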
Every HttpClient method (Get/Post/PostForm/Head/Delete) now forwards
method, URL, query params, and body to BuildHeaders so the
AuthSession's richer Authenticate overload can see what's being sent.
PostForm pre-encodes the form body once via UrlEncoder so the bytes
that get signed match what libcurl puts on the wire.

No-op for existing auth schemes (they don't override the new
overload); unlocks SigV4 signing in a follow-up.
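The pre-encoding idea can be sketched as below: the form body is built once as a string, and that same string is what gets both signed and sent. `PercentEncode` and `EncodeForm` are hypothetical helpers, not the project's UrlEncoder, and the sketch encodes a space as `%20`; whether the real encoder uses `%20` or `+` is not stated in this PR.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Percent-encodes everything except RFC 3986 unreserved characters.
std::string PercentEncode(std::string_view in) {
  constexpr std::string_view kHex = "0123456789ABCDEF";
  std::string out;
  out.reserve(in.size());
  for (const char ch : in) {
    const auto c = static_cast<unsigned char>(ch);
    const bool unreserved = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
                            (c >= '0' && c <= '9') || c == '-' || c == '_' ||
                            c == '.' || c == '~';
    if (unreserved) {
      out.push_back(ch);
    } else {
      out.push_back('%');
      out.push_back(kHex[c >> 4]);
      out.push_back(kHex[c & 0x0F]);
    }
  }
  return out;
}

// Builds the form body exactly once, preserving field order.
std::string EncodeForm(
    const std::vector<std::pair<std::string, std::string>>& fields) {
  std::string body;
  for (const auto& [key, value] : fields) {
    if (!body.empty()) {
      body.push_back('&');
    }
    body += PercentEncode(key);
    body.push_back('=');
    body += PercentEncode(value);
  }
  return body;
}
```

Encoding once and reusing the result avoids any second encoding pass that could make the signed payload hash differ from the bytes on the wire.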
Declares property keys for SigV4 static credentials (access-key-id,
secret-access-key, session-token) and a credentials-provider selector
that toggles between static keys and the aws-crt-cpp default
credential chain. Also adds the execute-api signing-service default
that matches the Java Iceberg client. Consumed by the SigV4 manager
in a follow-up; no behaviour change on its own.
Wraps aws-crt-cpp's Sigv4HttpRequestSigner + AwsSigningConfig to sign
outgoing REST requests for AWS Glue's Iceberg REST endpoint.

- SigV4Signer holds a shared ICredentialsProvider built from config:
  either static keys (rest.auth.sigv4.access-key-id/...) or the
  aws-crt-cpp default chain (Environment -> Profile -> STS Web
  Identity -> IMDS/ECS), with auto-detection when the provider type
  is unset.
- Signing options mirror the SigV4 spec for non-S3 services:
  double URI encoding, path normalization, explicit sha256 payload
  hash via x-amz-content-sha256 (works around the empty-body bug
  that affected the Java Iceberg implementation, apache/iceberg#6951).
- SigV4AuthSession runs an optional delegate session first so its
  Authorization header is covered by the signature; the delegate's
  Authorization is renamed to X-Iceberg-Access-Delegation before
  signing to avoid aws-c-auth's reserved-header rejection.
- MakeSigV4Manager is registered in the auth-manager registry when
  ICEBERG_REST_WITH_SIGV4 is defined; otherwise the registry keeps
  returning NotImplemented.
- CMakeLists gates the sigv4_signer.cc source + AWS::aws-crt-cpp
  link on the cache variable set by the toolchain resolver.
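To illustrate what "double URI encoding" means for the canonical path, here is a small standalone sketch. In the PR the encoding is done by aws-c-auth via the signing config, not by hand-written code; `EncodePathOnce` and `CanonicalPathForNonS3` are hypothetical helpers, and path normalization is left out.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Percent-encodes a path while keeping '/' separators and unreserved characters.
std::string EncodePathOnce(std::string_view path) {
  constexpr std::string_view kHex = "0123456789ABCDEF";
  std::string out;
  out.reserve(path.size());
  for (const char ch : path) {
    const auto c = static_cast<unsigned char>(ch);
    const bool keep = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
                      (c >= '0' && c <= '9') || c == '-' || c == '_' ||
                      c == '.' || c == '~' || c == '/';
    if (keep) {
      out.push_back(ch);
    } else {
      out.push_back('%');
      out.push_back(kHex[c >> 4]);
      out.push_back(kHex[c & 0x0F]);
    }
  }
  return out;
}

// Non-S3 services sign the path encoded twice: the '%' produced by the first
// pass is itself encoded as "%25" in the second pass.
std::string CanonicalPathForNonS3(std::string_view raw_path) {
  return EncodePathOnce(EncodePathOnce(raw_path));
}
```

For a raw path `/v1/namespaces/a b`, one pass gives `/v1/namespaces/a%20b` and the canonical form is `/v1/namespaces/a%2520b`.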
- auth_manager_test.cc: 12 SigV4 manager cases covering case-insensitive
  auth-type, required-field validation, header emission, session token
  pass-through, OAuth2-delegate wrapping, recursion rejection, and
  the default / static / unknown credentials-provider paths.
- sigv4_signer_test.cc (new): canonical SigV4 vectors with the signing
  time pinned via SigV4Signer::MakeForTests. Expected Authorization
  headers were generated with Python botocore (SigV4Auth with the
  same parameters as our signer) and are asserted bytewise so any
  regression in our aws-crt-cpp wrapping is caught.
- CMakeLists gates sigv4_signer_test.cc on ICEBERG_REST_HAVE_SIGV4
  and defines ICEBERG_REST_WITH_SIGV4 for the whole rest_catalog_test
  target so conditional tests compile.
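Pinning the signing time is what makes bytewise assertions possible, because the timestamp feeds both the X-Amz-Date header and the signature. The sketch below shows the idea with a hypothetical injectable clock. `TimestampSource`, `PinnedAt`, and `FormatAmzDate` are illustrative names; the actual hook in this PR is SigV4Signer::MakeForTests, whose signature is not shown here.

```cpp
#include <cassert>
#include <ctime>
#include <functional>
#include <string>

// Formats a UTC instant in the ISO 8601 basic form SigV4 uses for X-Amz-Date.
std::string FormatAmzDate(std::time_t t) {
  std::tm utc{};
  // std::gmtime returns a pointer to shared storage, so copy it right away.
  if (const std::tm* p = std::gmtime(&t)) {
    utc = *p;
  }
  char buf[17];  // "YYYYMMDDTHHMMSSZ" plus the terminating NUL.
  const std::size_t n = std::strftime(buf, sizeof(buf), "%Y%m%dT%H%M%SZ", &utc);
  return std::string(buf, n);
}

// Hypothetical clock seam: production reads the wall clock, tests pin it.
struct TimestampSource {
  std::function<std::time_t()> now = [] { return std::time(nullptr); };
};

TimestampSource PinnedAt(std::time_t t) {
  return TimestampSource{[t] { return t; }};
}
```

With the clock pinned at 1440938160 (2015-08-30 12:36:00 UTC), the formatted value is `20150830T123600Z` on every run, so the expected Authorization header can be stored as a literal.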
Whitespace-only reformats across the SigV4 additions; no behaviour
change. Tests still pass.
- Replace the C-style ``unsigned char digest[]`` scratch buffer with
  ``std::array`` and feed ``digest.data()`` to ``SHA256`` (modernize-
  avoid-c-arrays).
- Swap the hex lookup table to a ``constexpr std::string_view`` instead
  of a C-string array (modernize-avoid-c-arrays).
- Use braced-init return in ``ExtractPath`` (modernize-return-braced-
  init-list).

No behaviour change; the canonical-vector tests still produce the same
Authorization headers.
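The shape of that clang-tidy cleanup can be sketched as follows, with the digest held in a `std::array` and the hex table in a `constexpr std::string_view`. The SHA-256 call itself is omitted, and `HexEncode` and `kSha256Len` are hypothetical names rather than the identifiers used in sigv4_signer.cc.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>

constexpr std::size_t kSha256Len = 32;

// Lowercase hex encoding of a SHA-256 digest, as SigV4 expects for the
// payload hash.
std::string HexEncode(const std::array<std::uint8_t, kSha256Len>& digest) {
  constexpr std::string_view kHex = "0123456789abcdef";
  std::string out;
  out.reserve(digest.size() * 2);
  for (const std::uint8_t byte : digest) {
    out.push_back(kHex[byte >> 4]);
    out.push_back(kHex[byte & 0x0F]);
  }
  return out;
}
```

A hashing function that writes through a raw pointer can still be fed `digest.data()`, so the array type does not change the call into the crypto library.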
Without aws-crt-cpp on the lint runner, clang-tidy reports
sigv4_signer.cc's ``#include <aws/crt/Api.h>`` as file-not-found and
fails the job even though the rest of the file is otherwise clean.

Switch the lint job to resolve dependencies via Conan (``conan install
. -s build_type=Release --build=missing``) and configure with the
resulting ``conan-release`` preset. aws-crt-cpp comes in as a Conan
package and the generated compile_commands.json now carries the right
include paths for the SigV4 source.

Updated the cpp-linter ``database`` and ``extra-args`` inputs to point
at the new ``build/Release`` layout.
@mmaslankaprv mmaslankaprv requested a review from bharathv April 21, 2026 08:12
- sigv4_signer.cc: ByteCursorToStdString now uses braced-init return to
  satisfy modernize-return-braced-init-list (missed in the previous
  sweep).
- cpp-linter.yml: override ICEBERG_BUILD_TESTS=ON in the lint build so
  the gtest/gmock test targets land in compile_commands.json. Without
  this, clang-tidy can't resolve <gmock/gmock.h> in auth_manager_test
  or sigv4_signer_test because the conanfile disables tests for the
  package build by default.
Enabling ICEBERG_BUILD_TESTS with the default bundle ON was pulling in
arrow-backed test targets (avro/arrow/parquet/scan), which Conan had to
compile from source for gcc-14 — pushing the lint job past 30 minutes.

Since the lint only needs compile_commands.json entries for our
sigv4_signer / rest auth / rest test files, configure with BUNDLE=OFF
and build only the ``rest_catalog_test`` target. cpp-linter still gets
full coverage of every file touched in this PR.
Comment on lines +591 to +593
    set(ICEBERG_REST_HAVE_SIGV4
        FALSE
        PARENT_SCOPE)

can we just fail? This seems unlikely?

Member Author

This is for completeness, as the Meson build doesn't require the AWS CRT.

Comment on lines +54 to +70
    struct SigV4Config {
      /// AWS region (e.g., "us-east-1"). Required.
      std::string region;
      /// AWS signing service name. "glue" for AWS Glue's Iceberg REST endpoint;
      /// "execute-api" for API Gateway; "s3tables" for S3 Tables. Required.
      std::string service;
      /// Which credential source the signer should consult.
      SigV4CredentialsProvider provider = SigV4CredentialsProvider::kStatic;
      /// Static AWS access key ID. Required when ``provider == kStatic``.
      std::string access_key_id;
      /// Static AWS secret access key. Required when ``provider == kStatic``.
      std::string secret_access_key;
      /// Optional STS session token. When present, X-Amz-Security-Token is
      /// added to the request and included in the signed header set.
      /// Only consulted when ``provider == kStatic``.
      std::string session_token;
    };

A variant would be nicer?

Member Author

Hmm, I do not see any place in which we conditionally check the config type.
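For context, the variant suggestion might look roughly like the sketch below. This is not what was merged; `StaticCredentials`, `DefaultChain`, `SigV4ConfigAlt`, and `DescribeProvider` are hypothetical names used only to show the alternative shape being discussed.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <variant>

// Fields that only make sense for static credentials live in their own type.
struct StaticCredentials {
  std::string access_key_id;
  std::string secret_access_key;
  std::string session_token;  // Optional; empty when unused.
};

// Marker type: credentials come from the default provider chain.
struct DefaultChain {};

struct SigV4ConfigAlt {
  std::string region;
  std::string service;
  std::variant<StaticCredentials, DefaultChain> credentials;
};

// Example of dispatching on the active alternative.
std::string DescribeProvider(const SigV4ConfigAlt& config) {
  return std::visit(
      [](const auto& creds) -> std::string {
        using T = std::decay_t<decltype(creds)>;
        if constexpr (std::is_same_v<T, StaticCredentials>) {
          return "static";
        } else {
          return "default";
        }
      },
      config.credentials);
}
```

The variant makes "keys present only for static" a property of the type, at the cost of a visit at each use site. The flat struct stays simpler when, as noted in the reply, nothing branches on the config type.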

@mmaslankaprv mmaslankaprv merged commit d4932de into main Apr 22, 2026
24 checks passed
@mmaslankaprv mmaslankaprv deleted the feat/rest-sigv4-auth branch April 22, 2026 07:26

2 participants