Skip to content

auth: wire secure-storage cache into CLI#5013

Merged
simonfaltum merged 13 commits intomainfrom
simonfaltum/cli-ga-secure-storage-wiring
Apr 22, 2026
Merged

auth: wire secure-storage cache into CLI#5013
simonfaltum merged 13 commits intomainfrom
simonfaltum/cli-ga-secure-storage-wiring

Conversation

@simonfaltum
Copy link
Copy Markdown
Member

@simonfaltum simonfaltum commented Apr 17, 2026

Stack

Final PR in the opt-in secure token storage stack. Review and merge top-to-bottom:

  1. auth: import FileTokenCache into CLI and wire DualWrite #5056 auth: import FileTokenCache into CLI and wire DualWrite
  2. libs/auth/storage: add dormant secure-storage foundation #5008 libs/auth/storage: add dormant secure-storage foundation
  3. auth: wire secure-storage cache into CLI #5013 auth: wire secure-storage cache into CLI (this PR)

Depends on #5008 for the StorageMode resolver and KeyringCache, and on #5056 for the CLI-owned FileTokenCache that ResolveCache now returns in legacy mode.

Why

Opt-in OS-native secure token storage for every CLI command that authenticates via the databricks-cli strategy. Users turn it on by setting DATABRICKS_AUTH_STORAGE=secure (per-shell) or [__settings__].auth_storage = secure in .databrickscfg (per-machine). Everyone else stays on the legacy file-backed cache. The default does not change.

Storage backend is a per-machine setting, not a per-invocation choice. This PR deliberately ships no --secure-storage flag so that login, token, logout, and every other command can never drift apart on the same machine.

Changes

Before: auth login, auth token, auth logout, and every workspace client built through CLICredentials always went through the SDK's default file-backed TokenCache.

Now: every path that talks to OAuth runs through a single resolver.

  • New helper libs/auth/storage/cache.go:ResolveCache(ctx, override) resolves the mode via ResolveStorageMode and returns the corresponding cache.TokenCache. Split into a public form and an injectable core (resolveCacheWith) so unit tests exercise the secure path with a fake cache. Production always passes override = "" and relies on env -> config -> default. Legacy and plaintext modes return the CLI-owned storage.FileTokenCache from auth: import FileTokenCache into CLI and wire DualWrite #5056, not the SDK's file cache.
  • auth login, auth token (via newTokenCommand + runInlineLogin), and auth logout call storage.ResolveCache(ctx, "") and plumb the resolved cache into u2m.WithTokenCache(...) at every NewPersistentAuth call site.
  • libs/auth.CLICredentials.Configure also routes through ResolveCache so every workspace client built through the CLI credentials strategy reads from the same backend login writes to. This covers auth profiles, jobs list, clusters list, bundle deploy, and every other non-auth command. Without this, secure-storage users would hit "cache: token not found" on the first non-auth command.
  • dualWriteLegacyHostKey helper centralises the post-Challenge mirror to the legacy host-based key. Runs only when mode is legacy so secure-mode users do not end up with duplicate keyring entries.
  • discoveryLogin takes a discoveryLoginInputs struct after review feedback (8 positional args was over the threshold).
  • CLICredentials.persistentAuth no longer opens its own file cache; it relies entirely on the caller to pass WithTokenCache. Previously it opened a file cache and prepended a WithTokenCache option, only to have it immediately overridden by the keyring one appended by Configure (last-one-wins). Single source of truth now.
  • Acceptance tests under acceptance/cmd/auth/storage-modes/ cover the two CLI-visible behaviors that do not require an OS keyring: invalid env var surfaces a clear error, and explicit legacy mode behaves identically to the default path.

Out of scope:

  • Dedicated plaintext storage implementation. The resolver accepts plaintext; a follow-up will route it to a non-dual-write file backend.
  • Write path for [__settings__].auth_storage. Users hand-edit the config for now.
  • Automatic migration of existing token-cache.json entries. Users re-login after upgrading.
  • Flipping the default to secure.

Known limitation: duplicate keyring entries until SDK bump

The CLI currently pins databricks-sdk-go v0.128.0, which still contains PersistentAuth.dualWrite and calls it from both Challenge() and refresh(). dualWrite stores the token under GetCacheKey() (profile name) AND hostCacheKey() (the host URL). In legacy mode this mirrors into the file cache, which is the desired backward-compat behavior. In secure mode it mirrors into the keyring, producing two entries per login:

  • databricks-cli / <profile> -> token
  • databricks-cli / https://<host> -> same token

databricks/databricks-sdk-go#1646 removes the SDK-side dualWrite and the host-key fallback in loadToken. It is already merged to SDK main but blocked on #5056 shipping in a CLI release before it can be released itself. Once the SDK ships a new version and the CLI bumps to it, secure mode will produce a single keyring entry per login and the dualWriteLegacyHostKey helper in this PR remains the only source of the legacy host-key mirror (file cache only, by design).

No code change needed in this PR; the resolver, helper gating, and CLICredentials wiring are already correct for the post-bump world.

Test plan

  • Unit tests for ResolveCache covering default legacy, explicit override, env-var selection, plaintext fallback, invalid override, invalid env, and file-factory error propagation. Secure path uses a fake cache.TokenCache so CI never touches the OS keyring.
  • Unit tests for CLICredentials.Configure confirming ResolveCache is invoked and the resulting cache is passed to NewPersistentAuth via WithTokenCache.
  • Acceptance tests under acceptance/cmd/auth/storage-modes/: invalid-env (bogus DATABRICKS_AUTH_STORAGE surfaces a clear error via auth token) and legacy-env-default (explicit legacy mode clears the file cache via auth logout, matching default behavior).
  • make checks, make test, make lint pass.
  • Existing cmd/auth/login, cmd/auth/token, cmd/auth/logout, cmd/auth/profiles acceptance test suites still pass (regression).
  • Manual smoke: with DATABRICKS_AUTH_STORAGE=secure, run auth login, then auth profiles, auth token, jobs list, auth logout; confirm the token goes to the keyring and reads from there.

@simonfaltum simonfaltum marked this pull request as ready for review April 17, 2026 13:36
@simonfaltum simonfaltum changed the title auth: wire secure-storage cache into login/token/logout (CLI GA MS1, 2/2) auth: wire secure-storage cache into login/token/logout Apr 17, 2026
Comment thread cmd/auth/cache.go Outdated
Comment thread cmd/auth/login.go Outdated
@simonfaltum simonfaltum force-pushed the simonfaltum/cli-ga-secure-storage-wiring branch from 661420f to 1fc0c22 Compare April 20, 2026 07:14
@simonfaltum simonfaltum marked this pull request as draft April 21, 2026 15:56
@simonfaltum simonfaltum self-assigned this Apr 21, 2026
@simonfaltum simonfaltum force-pushed the simonfaltum/cli-ga-secure-storage-foundation branch from ea3802a to 4dd3b58 Compare April 22, 2026 09:36
@simonfaltum simonfaltum force-pushed the simonfaltum/cli-ga-secure-storage-wiring branch from b04e515 to 0bd9c0e Compare April 22, 2026 09:36
@simonfaltum simonfaltum marked this pull request as ready for review April 22, 2026 09:43
@simonfaltum simonfaltum changed the title auth: wire secure-storage cache into login/token/logout auth: wire secure-storage cache into CLI Apr 22, 2026
@simonfaltum simonfaltum force-pushed the simonfaltum/cli-ga-secure-storage-foundation branch from 4dd3b58 to 014a6a6 Compare April 22, 2026 11:27
@simonfaltum simonfaltum force-pushed the simonfaltum/cli-ga-secure-storage-wiring branch 2 times, most recently from a8c822e to 4d6f006 Compare April 22, 2026 12:05
@simonfaltum simonfaltum force-pushed the simonfaltum/cli-ga-secure-storage-foundation branch from 7fc4937 to f50a274 Compare April 22, 2026 15:43
@simonfaltum simonfaltum force-pushed the simonfaltum/cli-ga-secure-storage-wiring branch from 4d6f006 to ca2c821 Compare April 22, 2026 15:44
jamesbroadhead pushed a commit to jamesbroadhead/cli that referenced this pull request Apr 22, 2026
)

## Stack

Entry point for the opt-in secure token storage work. Review and merge
top-to-bottom:

1. **databricks#5056 auth: import FileTokenCache into CLI and wrap cache for
dual-write (this PR)**
2. databricks#5008 libs/auth/storage: add dormant secure-storage foundation
3. databricks#5013 auth: wire secure-storage cache into CLI

This PR is also the first of a separate 3-PR sequence that moves file
token cache ownership from the SDK to the CLI. That sequence (SDK PR
removing the internal dual-write, then SDK bump in the CLI) can proceed
in parallel with databricks#5008 and databricks#5013.

## Why

First of 3 PRs moving file-based OAuth token cache ownership from the Go
SDK to the CLI. Today the SDK owns both the cache interface and the
file-backed implementation, including the dual-write-under-host-key
convention. Long-term we want the SDK to stop owning persistence: the
OAuth flow and cache interface stay, but file format and host-key
conventions move to the CLI. This unblocks secure storage backends and
Renaud's Session model on a cleaner foundation.

This PR imports the cache into the CLI and wires it everywhere. Nothing
is deleted from the SDK yet. PRs 2 and 3 (SDK PR, then SDK bump) finish
the move.

## Changes

**Before:** CLI relied on the SDK's default `FileTokenCache`. Dual-write
to the legacy host-based key happened inside
`PersistentAuth.Challenge()` and `refresh()` via the SDK's internal
`dualWrite`.

**Now:** CLI owns its own `FileTokenCache` at
`libs/auth/storage/file_cache.go`, a near-verbatim copy from the SDK
(same JSON schema, same path `~/.databricks/token-cache.json`, same
permissions). A new `storage.DualWritingTokenCache` wraps the file cache
so that every write through it under the primary key is also mirrored
under the legacy host key. Every `u2m.NewPersistentAuth` call site in
the CLI now passes
`u2m.WithTokenCache(storage.NewDualWritingTokenCache(fileCache, arg))`.

Because mirroring happens inside the cache's `Store` method, every
SDK-internal write (Challenge, refresh, discovery) dual-writes
automatically. No call site needs to remember to invoke a helper, so
refresh paths (`Token()`, `ForceRefreshToken()`) preserve cross-SDK
compatibility just like login paths do.

The SDK is unchanged. It still dual-writes internally, so the two writes
hit the same file with the same keys and bytes, i.e. idempotent. Zero
user-visible behavior change.

Files touched:
- `libs/auth/storage/file_cache.go` + test (new)
- `libs/auth/storage/dual_writing_cache.go` + test (new)
- `cmd/auth/login.go`, `cmd/auth/token.go`, `cmd/auth/logout.go`
- `libs/auth/credentials.go`
- `NEXT_CHANGELOG.md`

**Lint-driven deltas from SDK:**
- `os.UserHomeDir()` is forbidden in the CLI, uses
`env.UserHomeDir(ctx)` instead. Required threading `ctx` through
`NewFileTokenCache`.
- `os.IsNotExist(err)` is forbidden, uses `errors.Is(err,
fs.ErrNotExist)`.

**Known edge case:** Tokens that exist only under the legacy host key
(users who logged in before profile-keyed writes existed and never
re-ran `auth login --profile`) keep working for now because the SDK's
internal dualWrite still runs. After PR 2 (SDK stops dual-writing),
re-login will be required for those users. Minimal impact.

## Test plan

- [x] Unit tests for `file_cache_test.go` (port of SDK tests)
- [x] Unit tests for `dual_writing_cache_test.go` covering primary-key
mirroring, non-primary passthrough, no host-key,
host-key-equals-primary, discovery with populated/empty
`GetDiscoveredHost`, and Lookup delegation
- [x] `make checks` and `make test` pass
- [ ] Manual smoke test of `databricks auth login`, `auth token`, `auth
logout` on a live profile before merging
Base automatically changed from simonfaltum/cli-ga-secure-storage-foundation to main April 22, 2026 18:26
Introduces a thin helper that resolves the CLI's token storage mode via
libs/auth/storage.ResolveStorageMode and returns a cache.TokenCache. The
helper is split into a public convenience form and an injectable core
(newAuthCacheWith) so unit tests exercise the secure path with a fake
cache without touching the real OS keyring. Follow-up commits wire this
into the auth command call sites.
cache.NewFileTokenCache has a variadic signature (opts ...FileTokenCacheOption),
which cannot be assigned directly to a non-variadic field. Adds a one-line
explanation so future readers do not 'simplify' the closure back into a direct
function reference and break the build.
Adds a hidden --secure-storage BoolVar to auth login and routes the
resolved token cache into both NewPersistentAuth call sites (main flow
and discoveryLogin). Users opt into OS-native secure storage either per
invocation via the flag or for the whole shell via
DATABRICKS_AUTH_STORAGE=secure. The default storage mode remains legacy.

Flag is MarkHidden because MS1 is experimental; release notes document
the env var for discovery.
Resolves the token cache via newAuthCache at the two NewPersistentAuth
call sites inside cmd/auth/token.go (loadToken via newTokenCommand and
runInlineLogin). No flag is added: per the rollout contract, auth token
reads from whichever mode the resolver selects via DATABRICKS_AUTH_STORAGE
or [__settings__].auth_storage.
Replaces the direct cache.NewFileTokenCache() call with newAuthCache so
that logout targets whichever backend ResolveStorageMode selects. The
resolver still defaults to legacy, so existing flows are unchanged; once
a user opts into secure storage, logout clears the keyring entry instead
of the file cache.
Adds three acceptance tests under acceptance/cmd/auth/storage-modes/:

  - invalid-env:    DATABRICKS_AUTH_STORAGE=bogus surfaces a clear error
  - legacy-env-default: explicit legacy mode behaves like the default path
  - hidden-flag:    --secure-storage is hidden from auth login --help

Secure-mode end-to-end behavior is covered by unit tests in
cmd/auth/cache_test.go because acceptance runners cannot safely exercise
the real OS keyring on macOS (would write to Keychain) or Linux (no
D-Bus session in CI).
Storage backend is a per-machine setting, not a per-invocation choice.
Keeping a write-time flag on login while token/logout rely on the
resolver creates an asymmetry: login --secure-storage writes to the
keyring, but a later auth token without the env var defaults to the
legacy file cache and can't find the token.

Resolution is now always env -> [__settings__].auth_storage -> default.
login.go matches token.go and logout.go: newAuthCache(ctx, "") with no
per-call override from a flag.

Also drops the hidden-flag acceptance test (no flag to hide) and updates
NEXT_CHANGELOG to advertise the env var and config key instead of the
flag.
Drop MS1/MS3 comment markers from cache.go and convert discoveryLogin
to take a discoveryLoginInputs struct for readability (8 positional
args was over the threshold). No behavior change.

Co-authored-by: Isaac
The file factory was calling the SDK's cache.NewFileTokenCache(), which
is being phased out in favor of the CLI's own storage.NewFileTokenCache(ctx)
imported in #5056. Route ResolveCache through it so that legacy and
plaintext modes share a single file cache implementation owned by the CLI.

Co-authored-by: Isaac
Configure already resolves the correct cache via storage.ResolveCache and
passes it via u2m.WithTokenCache. The persistentAuth helper opened a second
storage.FileTokenCache and prepended its own WithTokenCache to opts. Because
u2m options are last-one-wins, the keyring still won, but we were opening a
file cache on every workspace client just to have it immediately overridden.

Remove the hardcoded file cache and rely on the caller to supply one. Now
there is a single place that manages the token cache (ResolveCache in
Configure) and the helper is a thin wrapper around u2m.NewPersistentAuth.

Co-authored-by: Isaac
@simonfaltum simonfaltum force-pushed the simonfaltum/cli-ga-secure-storage-wiring branch from ca2c821 to e5d1e3c Compare April 22, 2026 18:43
@simonfaltum simonfaltum added this pull request to the merge queue Apr 22, 2026
Merged via the queue into main with commit da7fc67 Apr 22, 2026
22 checks passed
@simonfaltum simonfaltum deleted the simonfaltum/cli-ga-secure-storage-wiring branch April 22, 2026 19:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants