Replies: 2 comments
- cc @mimowo Anyone you can loop in on this?
- I can't answer 1-3. But for 4:
Hi Team,
Firstly, thank you for maintaining and developing such a great product.
We started to use MultiKueue and want to expand across multiple clouds for GPU resource availability.
Setup
We have a MultiKueue setup with:
Deployments are submitted on the manager and distributed to workers via MultiKueue.
Problem
Pods created on GKE manager contain GKE-specific fields that don't work on non-GKE workers:
What We Tried
We deployed a MutatingWebhook on worker clusters to:
Result
| Mutation Type | Behavior |
| --- | --- |
| schedulerName patch | ✅ Works - pods progress normally |
| readinessGates patch | ✅ Works - pods progress normally |
| WIF injection (volumes, env, volumeMounts) | ❌ Pods terminate in 1-2 seconds |
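For concreteness, the three mutation types can be sketched as the JSON patches a mutating webhook would return in its AdmissionReview response. The field paths follow the Pod API; the volume name, mount path, and env values are illustrative placeholders, not our exact configuration:

```python
import json

# Sketch of the per-mutation JSON patches (RFC 6902) our worker-side
# MutatingWebhook applies. Values below are illustrative placeholders.
scheduler_patch = [
    {"op": "replace", "path": "/spec/schedulerName", "value": "default-scheduler"},
]

readiness_patch = [
    {"op": "add", "path": "/spec/readinessGates/-",
     "value": {"conditionType": "example.com/custom-gate"}},
]

# WIF injection: the combination that triggers the immediate termination.
wif_patch = [
    {"op": "add", "path": "/spec/volumes/-",
     "value": {"name": "wif-token", "projected": {"sources": []}}},
    {"op": "add", "path": "/spec/containers/0/volumeMounts/-",
     "value": {"name": "wif-token", "mountPath": "/var/run/secrets/wif"}},
    {"op": "add", "path": "/spec/containers/0/env/-",
     "value": {"name": "GOOGLE_APPLICATION_CREDENTIALS",
               "value": "/var/run/secrets/wif/credentials.json"}},
]

print(json.dumps(wif_patch, indent=2))
```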
Error Details
When WIF injection is enabled:
- Workload transitions: Admitted → Finished in ~34ms
- Pod terminates immediately after creation
- Kueue controller logs show: "prebuilt workload not found"
Root Cause Analysis
We believe Kueue creates a spec hash when the Workload is created on the manager. When our webhook on the worker modifies the Pod spec (adding volumes/env/volumeMounts), the spec no longer matches the prebuilt workload hash, causing Kueue to reject it.
Interestingly, the schedulerName and readinessGates patches don't cause this issue, suggesting that Kueue may exclude certain fields from the hash comparison.
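Our mental model of the rejection looks roughly like this. This is only a sketch of the behavior we observe, not Kueue's actual implementation, and the excluded-field list is our guess from the table above:

```python
import copy
import hashlib
import json

# Fields we suspect are excluded from the comparison, based on observation.
EXCLUDED_FIELDS = ("schedulerName", "readinessGates")

def spec_hash(pod_spec: dict) -> str:
    """Hash a pod spec after dropping fields excluded from comparison."""
    normalized = copy.deepcopy(pod_spec)
    for field in EXCLUDED_FIELDS:
        normalized.pop(field, None)
    return hashlib.sha256(
        json.dumps(normalized, sort_keys=True).encode()
    ).hexdigest()

original = {"containers": [{"name": "main", "image": "app:v1"}]}

# Excluded-field mutation: hash unchanged, prebuilt workload still matches.
patched = {**original, "schedulerName": "custom-scheduler"}
assert spec_hash(patched) == spec_hash(original)

# Volume injection: hash changes, so the prebuilt workload no longer matches
# and the pod is rejected ("prebuilt workload not found").
mutated = copy.deepcopy(original)
mutated["volumes"] = [{"name": "wif-token"}]
assert spec_hash(mutated) != spec_hash(original)
```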
Questions
Context:
- Our workers have different capabilities (GKE-native Workload Identity vs. WIF for non-GKE)
- We need different volume mounts depending on the target cluster

Workarounds we are considering:
- Manager-side webhook: inject the cluster-specific spec before Kueue processes the Job (based on the target queue label)
- Separate Jobs: maintain a different Job spec per cluster type
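The manager-side webhook workaround could be sketched like this. The label key is Kueue's standard queue-name label; the queue names and patch contents are hypothetical:

```python
# Sketch: on the manager, pick the cluster-specific injection from the Job's
# queue label before Kueue computes the workload spec. Queue names and patch
# bodies below are placeholders for illustration.
INJECTIONS = {
    "gke-queue": [],  # GKE-native Workload Identity: nothing to add
    "non-gke-queue": [
        {"op": "add", "path": "/spec/template/spec/volumes/-",
         "value": {"name": "wif-token"}},
    ],
}

def patches_for(job: dict) -> list:
    """Return the JSON patches to apply for the Job's target queue."""
    queue = job.get("metadata", {}).get("labels", {}).get(
        "kueue.x-k8s.io/queue-name", "")
    return INJECTIONS.get(queue, [])
```

Since the injection happens before the Workload (and its spec hash) is created, the worker-side pod should match the prebuilt workload without further mutation.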
Is there a better/native way to handle this?
Thanks again!