
[ES-1804970] Fix CloudFetch returning stale column names from cached results #351

Merged
sreekanth-db merged 5 commits into main from fix/ES-1804970-cloudfetch-stale-column-names
Apr 21, 2026

Conversation

@sreekanth-db
Collaborator

Summary

Fixes a bug where arrow.Record.Schema() returns stale column aliases when CloudFetch serves cached Arrow IPC files from a structurally identical prior query with different AS aliases.

  • Root cause: NewCloudBatchIterator did not receive the authoritative schema bytes from GetResultSetMetadata, whereas the local batch path already did. CloudFetch Arrow IPC files have column names baked in from the original query, and the driver read them as-is.
  • Fix: Pass arrowSchemaBytes (the authoritative schema from GetResultSetMetadata) into NewCloudBatchIterator. After records are deserialized from the IPC stream, replace the stale schema with the authoritative one using array.NewRecord(). This is zero-copy: the new record shares the underlying column data and only the metadata is swapped.
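
The zero-copy swap described in the fix can be illustrated with a minimal sketch. It uses only the standard library: the `schema` and `record` types and the `withSchema` helper are hypothetical stand-ins for `arrow.Schema`, `arrow.Record` and the `array.NewRecord()` call, not the driver's actual code or the Arrow API. The point it shows is that the rebuilt record carries the new column names while pointing at the same column data.

```go
package main

import "fmt"

// schema is a simplified stand-in for arrow.Schema (assumption: names only).
type schema struct {
	names []string
}

// record is a simplified stand-in for arrow.Record: metadata plus column data.
type record struct {
	schema  *schema
	columns [][]string
}

// withSchema is a hypothetical helper that models the schema swap. It returns
// a new record that reuses rec's column slices and only replaces the schema
// pointer, so no column values are copied.
func withSchema(rec record, s *schema) record {
	return record{schema: s, columns: rec.columns}
}

func main() {
	stale := &schema{names: []string{"id", "name"}}
	authoritative := &schema{names: []string{"x", "y"}}

	// A record as it might come out of a cached IPC file, carrying old aliases.
	cached := record{schema: stale, columns: [][]string{{"1", "2"}, {"a", "b"}}}
	fixed := withSchema(cached, authoritative)

	fmt.Println(fixed.schema.names)
	// Both records point at the same backing array for column 0.
	fmt.Println(&fixed.columns[0][0] == &cached.columns[0][0])
}
```

In this model the original record is left untouched, so a caller that still holds it keeps seeing the old names; only the record returned to the reader carries the authoritative ones.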

Changes

  • arrowRecordIterator.go — Pass ri.arrowSchemaBytes to NewCloudBatchIterator in newBatchIterator()
  • arrowRows.go — Pass schemaBytes to NewCloudBatchIterator in NewArrowRowScanner()
  • batchloader.go — Core fix:
    • NewCloudBatchIterator accepts arrowSchemaBytes, parses into *arrow.Schema, stores on batchIterator
    • batchIterator.Next() applies the override schema to CloudFetch records only (the local path is untouched because its overrideSchema is nil)
    • Added schemaFromIPCBytes() helper
    • Field count validation guard to prevent panics on schema mismatch
    • Schema parse failure logged at Warn level
  • batchloader_test.go — Added TestCloudFetchSchemaOverride with two subtests:
    • Verifies stale column names ["id","name"] are overridden to ["x","y"]
    • Verifies nil schema bytes pass through original names unchanged
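
The override logic listed above can be sketched roughly as follows. This is a simplified model using only the standard library, not the code in batchloader.go: the `schema`, `record` and `batchIterator` types here are hypothetical stand-ins, and `errDone` is an invented sentinel. The PR does not state what happens when the field-count guard trips, so this sketch assumes the record is returned with its original schema.

```go
package main

import (
	"errors"
	"fmt"
)

// schema and record are simplified stand-ins for arrow.Schema and arrow.Record.
type schema struct{ names []string }

type record struct {
	schema  *schema
	columns [][]string
}

// errDone is a hypothetical sentinel for an exhausted iterator.
var errDone = errors.New("no more records")

// batchIterator models an iterator with an optional override schema.
// overrideSchema stays nil when no authoritative schema bytes were supplied.
type batchIterator struct {
	records        []record
	pos            int
	overrideSchema *schema
}

// Next returns the next record, swapping in the override schema when one is
// set and its field count matches the record's column count.
func (bi *batchIterator) Next() (record, error) {
	if bi.pos >= len(bi.records) {
		return record{}, errDone
	}
	rec := bi.records[bi.pos]
	bi.pos++

	// No override configured: pass the record through unchanged.
	if bi.overrideSchema == nil {
		return rec, nil
	}
	// Guard: rebuilding a record with a mismatched field count would be
	// invalid. Assumption: fall back to the original record in that case.
	if len(bi.overrideSchema.names) != len(rec.columns) {
		return rec, nil
	}
	// Share the column data, replace only the schema.
	return record{schema: bi.overrideSchema, columns: rec.columns}, nil
}

func main() {
	cached := []record{{
		schema:  &schema{names: []string{"id", "name"}},
		columns: [][]string{{"1"}, {"a"}},
	}}

	withOverride := &batchIterator{records: cached, overrideSchema: &schema{names: []string{"x", "y"}}}
	rec, _ := withOverride.Next()
	fmt.Println(rec.schema.names)

	withoutOverride := &batchIterator{records: cached}
	rec, _ = withoutOverride.Next()
	fmt.Println(rec.schema.names)
}
```

The two calls in `main` mirror the two subtests of TestCloudFetchSchemaOverride: stale names replaced when an override is present, and original names preserved when it is nil.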

Who is affected

Go driver users with CloudFetch enabled (WithCloudFetch(true)) who read arrow.Record.Schema() directly. Python, ODBC, and JDBC drivers are not affected.

Test plan

  • All existing unit tests pass (37 tests in internal/rows/arrowbased/)
  • New unit test TestCloudFetchSchemaOverride covers the override and no-override paths
  • Verified end-to-end against a real Databricks warehouse using samples.tpch.lineitem (~30M rows) with two queries differing only in column aliases — confirmed arrow.Record.Schema() now returns correct aliases

This pull request was AI-assisted by Isaac.

…results

When the server result cache serves Arrow IPC files from a prior query,
the embedded schema contains stale column aliases. The Go driver's
CloudFetch path read these stale names directly, while the local path
already used the authoritative schema from GetResultSetMetadata.

Pass the authoritative schema bytes into NewCloudBatchIterator and
replace stale column names on deserialized records using
array.NewRecord, which is zero-copy (shares underlying column data).

Co-authored-by: Isaac
Signed-off-by: Sreekanth Vadigi <sreekanth.vadigi@databricks.com>
…dfetch-stale-column-names

Signed-off-by: Sreekanth Vadigi <sreekanth.vadigi@databricks.com>

# Conflicts:
#	internal/rows/arrowbased/arrowRecordIterator.go
#	internal/rows/arrowbased/arrowRows.go
#	internal/rows/arrowbased/batchloader_test.go
@sreekanth-db sreekanth-db enabled auto-merge (squash) April 21, 2026 09:24
@sreekanth-db sreekanth-db merged commit 3c0f7e4 into main Apr 21, 2026
3 checks passed
@sreekanth-db sreekanth-db deleted the fix/ES-1804970-cloudfetch-stale-column-names branch April 21, 2026 09:29
@vikrantpuppala vikrantpuppala mentioned this pull request Apr 21, 2026
2 tasks
vikrantpuppala added a commit that referenced this pull request Apr 21, 2026
## Summary
Bump `DriverVersion` to `1.11.0` and add the v1.11.0 section to
`CHANGELOG.md`.

### Changes since v1.10.0
- Enable telemetry by default with DSN-controlled priority (#320, #321,
#322, #349)
- Add SPOG (Custom URL) routing support via `x-databricks-org-id` header
(#347)
- Add statement-level query tag support (#341)
- Add AI coding agent detection to User-Agent header (#326)
- Fix CloudFetch returning stale column names from cached results (#351)
- Fix resource leak: close staging Rows in execStagingOperation (#325)

Internal/infra-only changes are omitted from the user-facing notes (CI
hardening, dependabot bumps, CODEOWNERS).

## Test plan
- [x] `go build ./...` clean
- [x] `go test ./... -count=1 -short` passes locally

## Next steps after merge
1. Tag the merge commit as `v1.11.0` and push the tag
2. Trigger `peco-databricks-sql-go` in
secure-public-registry-releases-eng with `ref=v1.11.0`, `dry-run=true`
to verify
3. Re-run with `dry-run=false` for the actual release

NO_CHANGELOG=true

This pull request was AI-assisted by Isaac.

Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com>