Use SystemCertPool for OpenShift OAuth token exchange #900

Open
rubenvp8510 wants to merge 1 commit into observatorium:main from rubenvp8510:fix/use-system-cert-pool-for-openshift-oauth

@rubenvp8510
Contributor

The OAuth HTTP client used x509.NewCertPool(), which creates an empty certificate pool, and then added only the Kubernetes service account CA. As a result, the pool excludes all system-trusted CAs (Let's Encrypt, DigiCert, etc.).

On regular OpenShift this works by coincidence because the ingress CA is part of the service account CA bundle. On HyperShift/HostedCluster environments (e.g. ROSA) the OAuth endpoint uses a publicly-trusted certificate (Let's Encrypt) that is not in the SA CA bundle, causing "x509: certificate signed by unknown authority" errors during the OAuth callback token exchange.

Switch to x509.SystemCertPool() so that system-trusted CAs are included alongside the service account CA. If SystemCertPool() is unavailable, the client falls back to an empty pool.

@rubenvp8510 force-pushed the fix/use-system-cert-pool-for-openshift-oauth branch from 8261ae9 to faee945 on April 13, 2026 18:50
@pavolloffay
Member

@rubenvp8510 the e2e test failed. Could you please rebase or amend the commit to trigger it again? Once everything is green, ping me for a review.

The OAuth HTTP client used x509.NewCertPool() which creates an empty
certificate pool, then only adds the Kubernetes service account CA.
This excludes all system-trusted CAs (Let's Encrypt, DigiCert, etc.).

On regular OpenShift this works by coincidence because the ingress CA
is part of the service account CA bundle. On HyperShift/HostedCluster
environments (e.g. ROSA) the OAuth endpoint uses a publicly-trusted
certificate (Let's Encrypt) that is not in the SA CA bundle, causing
"x509: certificate signed by unknown authority" errors during the
OAuth callback token exchange.

Switch to x509.SystemCertPool() so that system-trusted CAs are
included alongside the service account CA. Falls back to an empty
pool if SystemCertPool() is unavailable.

Signed-off-by: Ruben Vargas <ruben.vp8510@gmail.com>
@rubenvp8510 force-pushed the fix/use-system-cert-pool-for-openshift-oauth branch from faee945 to aad6e73 on April 21, 2026 03:55
@rubenvp8510
Contributor Author

@pavolloffay it seems the e2e tests are broken:

Error: No such object: metrics-up-oidc-redirect-protection

=== NAME  TestMetricsReadAndWrite/OIDC_redirect_protection
    metrics_test.go:135: metrics_test.go:135: ""
        
         unexpected error: docker container up-oidc-redirect-protection failed to start: exit status 1
        
=== RUN   TestMetricsReadAndWrite/metrics-tenant-isolation
=== RUN   TestMetricsReadAndWrite/metrics-tenant-isolation/query
    metrics_test.go:183: metrics_test.go:183: ""
        
         unexpected error: Post "https://127.0.0.1:32893/api/metrics/v1/test-oidc/api/v1/query": dial tcp 127.0.0.1:32893: connect: connection refused
        
=== RUN   TestMetricsReadAndWrite/metrics-tenant-isolation/query_range
    metrics_test.go:201: metrics_test.go:201: ""
        
         unexpected error: Post "https://127.0.0.1:32893/api/metrics/v1/test-oidc/api/v1/query_range": dial tcp 127.0.0.1:32893: connect: connection refused
        
=== RUN   TestMetricsReadAndWrite/metrics-tenant-isolation/series
    metrics_test.go:219: metrics_test.go:219: ""
        
         unexpected error: Post "https://127.0.0.1:32893/api/metrics/v1/test-oidc/api/v1/series": dial tcp 127.0.0.1:32893: connect: connection refused
        
=== RUN   TestMetricsReadAndWrite/metrics-tenant-isolation/label_names
    metrics_test.go:232: metrics_test.go:232: ""
        
         unexpected error: Post "https://127.0.0.1:32893/api/metrics/v1/test-oidc/api/v1/labels": dial tcp 127.0.0.1:32893: connect: connection refused
        
=== RUN   TestMetricsReadAndWrite/metrics-tenant-isolation/labels_values
    metrics_test.go:244: metrics_test.go:244: ""
        
         unexpected error: Get "https://127.0.0.1:32893/api/metrics/v1/test-oidc/api/v1/label/__name__/values?end=1776744255.016&start=1776743955.016": dial tcp 127.0.0.1:32893: connect: connection refused
        
04:04:15 Killing up2-metrics-read-write
04:04:15 Error response from daemon: cannot kill container: metrics-up2-metrics-read-write: No such container: metrics-up2-metrics-read-write

04:04:15 Unable to kill service up2-metrics-read-write : exit status 1
04:04:15 Killing up-metrics-read-write
04:04:15 Error response from daemon: cannot kill container: metrics-up-metrics-read-write: No such container: metrics-up-metrics-read-write

04:04:15 Unable to kill service up-metrics-read-write : exit status 1
04:04:15 Killing observatorium-api
04:04:15 Error response from daemon: cannot kill container: metrics-observatorium-api: No such container: metrics-observatorium-api

04:04:15 Unable to kill service observatorium-api : exit status 1
04:04:15 Killing thanos-query
04:04:15 Error response from daemon: cannot kill container: metrics-thanos-query: No such container: metrics-thanos-query

04:04:15 Unable to kill service thanos-query : exit status 1
04:04:15 Killing thanos-receive
04:04:15 Error response from daemon: cannot kill container: metrics-thanos-receive: No such container: metrics-thanos-receive

04:04:15 Unable to kill service thanos-receive : exit status 1
04:04:15 Killing opa
04:04:15 Error response from daemon: cannot kill container: metrics-opa: No such container: metrics-opa

04:04:15 Unable to kill service opa : exit status 1
04:04:15 Killing gubernator
04:04:15 Error response from daemon: cannot kill container: metrics-gubernator: No such container: metrics-gubernator

04:04:15 Unable to kill service gubernator : exit status 1
04:04:15 Killing dex
--- PASS: TestRedisRateLimiter_GetRateLimits (3.60s)
    --- PASS: TestRedisRateLimiter_GetRateLimits/At_the_edge_of_the_limit (2.23s)
    --- PASS: TestRedisRateLimiter_GetRateLimits/Wait_for_1_leak (3.74s)
    --- PASS: TestRedisRateLimiter_GetRateLimits/Beyond_the_limit (1.94s)
    --- PASS: TestRedisRateLimiter_GetRateLimits/Single_hit,_far_from_limit (2.15s)
04:04:15 Error response from daemon: cannot kill container: metrics-dex: No such container: metrics-dex

@rubenvp8510
Contributor Author

I don't think this failure is related to this PR.

@JoaoBraveCoding
Contributor

Tried reducing the flakiness of the tests with
