Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…t spec)
…unions
…hBytesConfig, TimeConfig)
…om zarr-metadata
…data
…ed NumcodecsConfig
Spec-defined metadata fields with fixed length and no mutation semantics are typed as tuples, not Sequence. Applies to:
- v2 ArrayMetadataV2.shape, .chunks
- v2 DataTypeV2Structured.shape
- v2 ArrayMetadataV2.filters (tuple of codec configs)
- v3 RegularChunkGridConfig.chunk_shape
- v3 RectilinearChunkGridConfig.chunk_shapes
Adds zarr_metadata.v2.codec.NumcodecsConfig, a TypedDict modeling the v2 spec shape for compressors and filters: a required 'id' field plus arbitrary codec-specific extras. ArrayMetadataV2.compressor and .filters now reference this type instead of an untyped Mapping[str, JSON].
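A practical consequence of the tuple annotations: `json.loads` always decodes JSON arrays as lists, so code that reads a v2 `.zarray` document and wants values that match tuple-typed fields has to convert them. The stdlib-only sketch below illustrates that. `NumcodecsConfigSketch`, `ArrayMetadataV2Sketch`, and `load_array_metadata_v2` are hypothetical stand-ins, not the package's definitions, and they model only the fields needed here. A plain `typing.TypedDict` cannot declare the arbitrary codec-specific extras to a type checker, so in this sketch those extras are only preserved at runtime.

```python
from __future__ import annotations

import json
from typing import Any, TypedDict


# Hypothetical stand-ins for the zarr_metadata.v2 types (fields abridged).
class NumcodecsConfigSketch(TypedDict):
    id: str  # required; codec-specific extras ride along untyped at runtime


class ArrayMetadataV2Sketch(TypedDict):
    shape: tuple[int, ...]
    chunks: tuple[int, ...]
    compressor: NumcodecsConfigSketch | None
    filters: tuple[NumcodecsConfigSketch, ...] | None


def load_array_metadata_v2(text: str) -> ArrayMetadataV2Sketch:
    """Parse a v2 `.zarray` document, converting JSON lists to tuples."""
    raw: dict[str, Any] = json.loads(text)
    filters = raw.get("filters")
    return ArrayMetadataV2Sketch(
        shape=tuple(raw["shape"]),
        chunks=tuple(raw["chunks"]),
        compressor=raw.get("compressor"),
        filters=None if filters is None else tuple(filters),
    )


doc = '{"shape": [100, 100], "chunks": [10, 10], "compressor": {"id": "zlib", "level": 1}, "filters": null}'
meta = load_array_metadata_v2(doc)
print(meta["shape"], meta["compressor"])
```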
… migration
Three fixes:
1. Add missing "μs" unit to zarr_metadata.dtype.time.DateTimeUnit so it
matches zarr-python's DateTimeUnit. zarr.core.dtype.npy.common.DateTimeUnit
now re-exports from zarr-metadata (downstream consumers like
zarr.core.dtype.npy.time pick it up transitively).
2. Replace `from X import Y as LegacyName` with `from X import Y` followed
by a module-level `LegacyName: TypeAlias = Y` binding. mypy under
`strict = true` rejected the renamed-import form under the explicit-
re-export check ("Module 'X' does not explicitly export attribute 'Y'"),
affecting 13 call sites across the codebase. The TypeAlias form makes
the alias a proper type (mypy uses it in annotations) while preserving
runtime introspection (`.__annotations__` access on the aliased TypedDict).
Affects:
- src/zarr/core/dtype/common.py (DTypeJSON)
- src/zarr/core/metadata/v2.py (ArrayV2MetadataDict)
- src/zarr/core/metadata/v3.py (ArrayMetadataJSON_V3 + 5 others)
3. noqa: UP040 on the TypeAlias bindings. ruff prefers the `type` keyword
(PEP 695), but that wraps the alias in a TypeAliasType which breaks
`.__annotations__` lookup used by tests.
The 12 remaining "unused type: ignore" mypy errors in v3.py are
pre-existing (same count on the pre-refactor state) and unrelated to this
work.
…ycle
Moves JSON, NamedConfig, NamedRequiredConfig out of zarr_metadata/__init__.py
into zarr_metadata/common.py. Submodules (v2/*, v3/*) now import from
zarr_metadata.common directly, avoiding the circular import that occurred
when v2.codec was loaded during __init__.py execution.
Also:
- v3.array declares RegularChunkGrid/RectilinearChunkGrid as direct
TypedDict classes instead of NamedRequiredConfig aliases, simplifying
the types and enabling more precise chunk-grid annotations downstream.
- v2.consolidated.ConsolidatedMetadataV2.metadata value type widened to
GroupMetadataV2 | ArrayMetadataV2 | JSON.
- Added spec links to v2/{array,codec} docstrings.
zarr_metadata/__init__.py continues to re-export JSON, NamedConfig,
NamedRequiredConfig at the top level so zarr.core.common keeps resolving.
Three issues surfaced by final code review:
1. Add py.typed marker to zarr-metadata. Without it, PEP 561 makes type checkers treat zarr-metadata as untyped, cascading into ~44 spurious mypy errors in zarr (subclassing Any, unused type: ignore, etc.).
2. RegularChunkGrid.configuration was accidentally typed NotRequired when converted from NamedRequiredConfig to a direct TypedDict class. Per spec, chunk_shape is mandatory. Make configuration required.
3. RectilinearDimSpec was declared as tuples but zarr's compress_rle returned lists, and the to_dict producer built lists. Align producers with the declared type: compress_rle now returns list[int | tuple[int, int]], expand_rle accepts both list and tuple RLE pairs, to_dict builds tuples. The tuple shape is correct per spec: each RLE pair is a JSON array of exactly two elements (size, count), a fixed-cardinality structure that tuple models more faithfully than a mutable list.
Mypy error count now matches main (32) with these fixes in place.
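The RLE pairs described in the third fix can be sketched as below. These are hypothetical stand-ins, not zarr-python's `compress_rle`/`expand_rle`. The sketch assumes an entry is either a bare chunk size or a `(size, count)` pair, and that only runs longer than one get paired, which may differ from zarr-python's exact choice. The expander accepts lists as well as tuples, since that is what `json.loads` yields for the pairs.

```python
from itertools import groupby
from typing import Sequence, Union

# One entry of a rectilinear dimension spec: a bare chunk size, or a
# (size, count) pair meaning "count consecutive chunks of this size".
RLEItem = Union[int, tuple[int, int]]


def compress_rle_sketch(sizes: Sequence[int]) -> list[RLEItem]:
    """Hypothetical encoder: collapse runs of equal chunk sizes."""
    out: list[RLEItem] = []
    for size, run in groupby(sizes):
        count = sum(1 for _ in run)
        # Runs longer than one become pairs; singletons stay bare ints.
        out.append((size, count) if count > 1 else size)
    return out


def expand_rle_sketch(items: Sequence[Union[int, Sequence[int]]]) -> list[int]:
    """Hypothetical decoder: expand pairs back into per-chunk sizes."""
    out: list[int] = []
    for item in items:
        if isinstance(item, int):
            out.append(item)
        else:
            # Accept tuples (as produced above) and lists (as decoded from JSON).
            size, count = item
            out.extend([size] * count)
    return out


print(compress_rle_sketch([10, 10, 10, 4]))
print(expand_rle_sketch([[10, 3], 4]))
```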
consolidated_metadata is not in the core Zarr v3 spec as a field on group metadata. It has an (unmerged) extension spec and is implemented by zarr-python, but keeping it out of GroupMetadataV3 is the spec-faithful move. The extra_items=AllowedExtraField on GroupMetadataV3 already permits it to appear at runtime as an extension.
ConsolidatedMetadataV3 remains available at zarr_metadata.v3.consolidated for consumers that want to type the extension shape.
Also fix two stray lint issues (missing trailing newline in common.py, unused Mapping import in v2/array.py).
zarr-metadata is a library, not an application: its lockfile pins transitive dev versions that shouldn't be fixed in source. Untrack and gitignore it.
…nspose, sharding
Adds per-codec TypedDict configurations + name literals + full envelope
types for every core v3 codec besides blosc (which is extended in the
same style for consistency):
- {Codec}CodecName : Literal["<name>"] — the spec "name" value
- {Codec}CodecConfiguration : TypedDict — the "configuration" body
- {Codec}Codec : NamedRequiredConfig — the full envelope
crc32c has no configuration fields, so Crc32cCodec uses NamedConfig
(configuration optional) and no Configuration TypedDict is exported.
The `V1` suffix is dropped from the Configuration types (except blosc,
where V1 + Numcodecs disambiguate two concrete shapes). The other v3
codec specs aren't versioned at the codec level; there's only one shape
per codec today, and an incompatible future change would land under a
new codec name rather than a v2 of the same name.
Also fixes pre-existing v2 test fixtures to include the now-required
compressor/fill_value/order/filters fields on ArrayMetadataV2.
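At this commit the envelopes are built from the generic `NamedConfig` (configuration optional) and `NamedRequiredConfig` (configuration required) helpers, whose definitions are not shown here. The stdlib-only sketch below spells out the same required/optional distinction with plain `TypedDict` classes, so it runs on Python 3.9+. The `gzip` `level` field follows the v3 gzip codec spec. The `Crc32cCodec` configuration value type is a placeholder assumption, since crc32c defines no configuration fields.

```python
from typing import Literal, TypedDict


# Required-configuration envelope (what NamedRequiredConfig expresses).
class GzipCodecConfiguration(TypedDict):
    level: int


class GzipCodec(TypedDict):
    name: Literal["gzip"]
    configuration: GzipCodecConfiguration


# Optional-configuration envelope (what NamedConfig expresses): required keys
# go in a total base class, optional ones in a total=False subclass.
class _Crc32cCodecRequired(TypedDict):
    name: Literal["crc32c"]


class Crc32cCodec(_Crc32cCodecRequired, total=False):
    configuration: dict[str, object]  # placeholder; crc32c defines no fields


gzip_codec: GzipCodec = {"name": "gzip", "configuration": {"level": 5}}
crc_codec: Crc32cCodec = {"name": "crc32c"}  # valid without "configuration"

print(sorted(GzipCodec.__required_keys__))
print(sorted(Crc32cCodec.__required_keys__), sorted(Crc32cCodec.__optional_keys__))
```

On Python 3.11+, `typing.NotRequired` expresses the optional key directly, without the two-class split.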
Each {Codec}Codec envelope type is now an explicit TypedDict class with
`name` and `configuration` fields, rather than a NamedRequiredConfig[...]
generic alias. Readable at the call site, surfaces the spec structure
directly, and allows a real class-level docstring.
Also:
- Drop BloscCodecConfigurationNumcodecs from zarr-metadata. numcodecs-
shape modeling belongs in zarr-python (which implements that shape),
not in zarr-metadata (which is spec-only).
- Rename BloscCodecConfigurationV1 to BloscCodecConfiguration, matching
the unversioned naming used for the other codecs.
- Restore BloscConfigV2 locally in zarr-python for the numcodecs shape.
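As an illustration of the explicit-class style, here is a sketch for the `transpose` codec, whose spec configuration holds an `order` permutation. The class names follow the {Codec}Codec / {Codec}CodecConfiguration scheme above. The bodies are an assumption: in particular, `tuple[int, ...]` for `order` is a guess, not the package's definition. The point is that a class states `name` and `configuration` at the definition site and carries a real docstring.

```python
from typing import Literal, TypedDict


class TransposeCodecConfiguration(TypedDict):
    """Configuration body of the `transpose` codec."""

    order: tuple[int, ...]  # assumed: a permutation of the dimension indices


class TransposeCodec(TypedDict):
    """JSON metadata for the `transpose` codec.

    Declaring the envelope as a class (rather than a generic alias) shows the
    spec's `name` / `configuration` structure directly and gives the type a
    docstring.
    """

    name: Literal["transpose"]
    configuration: TransposeCodecConfiguration


codec: TransposeCodec = {"name": "transpose", "configuration": {"order": (1, 0)}}
print(TransposeCodec.__doc__.splitlines()[0])
print(codec["configuration"]["order"])
```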
…alued fields
Each codec now exports SCREAMING_CASE Final constants alongside the
Literal types. Downstream packages can reference the spec-defined
strings without retyping magic strings.
Codec names: BLOSC_CODEC_NAME, BYTES_CODEC_NAME, CRC32C_CODEC_NAME,
GZIP_CODEC_NAME, SHARDING_CODEC_NAME, TRANSPOSE_CODEC_NAME, ZSTD_CODEC_NAME.
Enum-valued field values:
- Blosc: BLOSC_SHUFFLE_{NOSHUFFLE,SHUFFLE,BITSHUFFLE},
BLOSC_CNAME_{LZ4,LZ4HC,BLOSCLZ,SNAPPY,ZLIB,ZSTD}
- Bytes: BYTES_ENDIAN_{LITTLE,BIG} (also extracts the existing
Literal into a new `Endian` alias)
- Sharding: SHARDING_INDEX_LOCATION_{START,END} (and `IndexLocation`
Literal alias)
Rewords docstrings and test names throughout the package: the {Codec}Codec
TypedDict describes a codec's JSON metadata, not a "named-config envelope."
Less jargon, consistent with the package name. Identifier names are
unchanged (still BloscCodec, GzipCodec, etc.).
Also renames v3/array.py chunk-grid docstrings for consistency
(Regular/Rectilinear ChunkGrid "metadata" rather than "named-config
container"), and updates the README.
Convert all double-backtick RST-style inline code in zarr-metadata docstrings to single-backtick markdown style. The package's documentation will be rendered by mkdocs, which expects markdown, so single backticks render correctly as inline code.
Models the spec-defined v3 data types from zarr-specs core and
zarr-extensions:
* `dtype/primitive.py` (NEW) - Final constants and `PrimitiveDTypeName`
Literal union for all 14 core v3 primitives (bool, int8..int64,
uint8..uint64, float16..float64, complex64, complex128).
* `dtype/bytes.py` - adds `BYTES_DTYPE_NAME` and `BytesDTypeName` for
the variable-length `bytes` extension; adds `NullTerminatedBytes`
envelope TypedDict for `null_terminated_bytes` (zarr-extensions).
Retains `FixedLengthBytesConfig` (re-exported by zarr-python).
* `dtype/string.py` - adds `STRING_DTYPE_NAME`/`StringDTypeName` for
the `string` extension; adds `FixedLengthUtf32` envelope. Retains
`LengthBytesConfig`.
* `dtype/time.py` - adds `NumpyDatetime64` and `NumpyTimedelta64`
envelopes plus name constants/literals. The shared `TimeConfig` body
is preserved.
* `dtype/struct.py` (NEW) - the `struct` extension type, with
`StructField`, `StructConfig`, and `Struct` envelope. Fields hold
recursive `DType` values, supporting nested structs.
The `r<N>` raw-bytes type from the core spec is parameterised on bit
count, not a single literal name, so it isn't given a TypedDict; consumers
match it against the wider `DType` alias.
Tests updated and extended for the new types and constants.
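The split between a closed Literal of primitive names and the open-ended `r<N>` family can be seen in a small classifier. `PrimitiveDTypeName` lists the 14 core primitives named above. `classify_dtype_name` is hypothetical, and it assumes that `N` in `r<N>` is a positive bit count that must be a multiple of 8. That family cannot be a Literal, so a pattern check is used instead.

```python
import re
from typing import Literal, get_args

# The 14 core v3 primitive data type names.
PrimitiveDTypeName = Literal[
    "bool",
    "int8", "int16", "int32", "int64",
    "uint8", "uint16", "uint32", "uint64",
    "float16", "float32", "float64",
    "complex64", "complex128",
]

# Assumption: raw types are spelled r<N>, N a positive multiple of 8 bits.
_RAW = re.compile(r"r([1-9][0-9]*)")


def classify_dtype_name(name: str) -> str:
    """Hypothetical helper: label a v3 `data_type` name string."""
    if name in get_args(PrimitiveDTypeName):
        return "primitive"
    match = _RAW.fullmatch(name)
    if match is not None and int(match.group(1)) % 8 == 0:
        return "raw"
    return "other"  # e.g. extension names such as "string"


print([classify_dtype_name(n) for n in ("float32", "r16", "r12", "string")])
```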
…ators
Restructure `zarr_metadata.dtype.*` so each spec data type lives in its
own module, mirroring the per-codec layout in `zarr_metadata.codec.*`
and the per-dtype directory layout in zarr-extensions.
New per-type modules (one per spec data type):
bool.py, int8/16/32/64.py, uint8/16/32/64.py,
float16/32/64.py, complex64/128.py,
bytes.py, string.py, numpy_datetime64.py, numpy_timedelta64.py,
struct.py, raw.py
Each module exports:
- {DTYPE}_DTYPE_NAME (Final str)
- {DType}DTypeName (Literal)
- For envelope types: a {DType} TypedDict + a {DType}Configuration
- {DType}FillValue alias for the JSON shape of `fill_value`
Removed `null_terminated_bytes` and `fixed_length_utf32` from
zarr-metadata: they are not in zarr-specs or zarr-extensions; they are
zarr-python-specific. Their `LengthBytesConfig` and
`FixedLengthBytesConfig` TypedDicts now live locally in zarr-python at
src/zarr/core/dtype/npy/{string,bytes}.py.
zarr.core.dtype.npy.common now imports `DateTimeUnit` from
`zarr_metadata.dtype.numpy_datetime64`. zarr.core.dtype.npy.time imports
`TimeConfig` (aliased from `NumpyDatetime64Configuration`).
NewType + validating-constructor pattern for non-literal spec strings:
- HexFloat{16,32,64} for the float hex-string fill values
- Base64Bytes for the `bytes` base64 fill value
- RawBytesDTypeName for the `r<N>` parameterised name
These make spec-format constraints visible to the type system; the
matching validating constructors (e.g. `hex_float32`) are the only
runtime logic in the package and are minimal regex checks.
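The NewType + validating-constructor pattern can be sketched as follows. `HexFloat32` is named in the commit message. `hex_float32_sketch` is a hypothetical stand-in for `hex_float32`, whose real signature and pattern are not shown here. The sketch assumes the accepted form is `0x` followed by exactly 8 hex digits, i.e. the 4 raw bytes of a float32. The spec also allows other float fill values, such as plain numbers or `"NaN"`, which this sketch does not cover. Static checkers treat `HexFloat32` as distinct from `str`, so the only sanctioned way to obtain one is through the validator.

```python
import re
from typing import NewType

# A str that type checkers treat as a distinct type.
HexFloat32 = NewType("HexFloat32", str)

# Assumption: "0x" plus exactly 8 hex digits (the 4 raw bytes of a float32).
_HEX_FLOAT32 = re.compile(r"0x[0-9a-fA-F]{8}")


def hex_float32_sketch(value: str) -> HexFloat32:
    """Validate `value` with a regex and brand it as a HexFloat32."""
    if _HEX_FLOAT32.fullmatch(value) is None:
        raise ValueError(f"not a float32 hex string: {value!r}")
    return HexFloat32(value)


nan_fill = hex_float32_sketch("0x7fc00000")
print(nan_fill)
```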
…+ chunk_key_encoding
Move chunk-grid TypedDicts out of `v3/array.py` into per-type modules,
mirroring the per-codec and per-dtype layouts:
packages/zarr-metadata/src/zarr_metadata/v3/
├── chunk_grid/
│ ├── __init__.py
│ ├── regular.py # core spec
│ └── rectilinear.py # zarr-extensions
└── chunk_key_encoding/
├── __init__.py # ChunkKeySeparator alias
├── default.py # core spec
└── v2.py # core spec
Each module exports:
- {NAME}_NAME (Final str)
- {Name} (TypedDict envelope)
- {Name}Configuration (TypedDict body)
- {Name}Name (Literal type of the `name` field)
`v3/array.py` shrinks to just `AllowedExtraField`, `MetadataField`, and
`ArrayMetadataV3`. `chunk_grid` and `chunk_key_encoding` fields stay
typed as `MetadataField` (str | NamedConfig) -- narrowing them to a
specific union belongs in a future validation layer, not in the
spec-faithful types layer.
Configuration TypedDicts renamed from `*Config` to `*Configuration`
to match the dtype/codec naming. zarr.core.metadata.v3 re-exports
preserve the legacy `*Config` aliases via `as` imports.
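To see what the two core-spec chunk key encodings do with their metadata, here is a sketch. `ChunkKeySeparator` is the alias name listed for `chunk_key_encoding/__init__.py`. The three functions are hypothetical, written from the v3 spec: `default` prefixes `c` and defaults to `/`, while `v2` joins the bare indices, defaults to `.`, and uses `0` for a 0-d array.

```python
from typing import Any, Literal, Mapping, Sequence

ChunkKeySeparator = Literal["/", "."]


def default_chunk_key(coords: Sequence[int], separator: ChunkKeySeparator = "/") -> str:
    """`default` encoding: "c", then the chunk indices, joined by separator."""
    return separator.join(["c", *map(str, coords)])


def v2_chunk_key(coords: Sequence[int], separator: ChunkKeySeparator = ".") -> str:
    """`v2` encoding: the chunk indices joined by separator; "0" for 0-d."""
    return separator.join(map(str, coords)) if coords else "0"


def chunk_key(encoding: Mapping[str, Any], coords: Sequence[int]) -> str:
    """Dispatch on the `name` field of chunk_key_encoding metadata."""
    separator = encoding.get("configuration", {}).get("separator")
    if encoding["name"] == "default":
        return default_chunk_key(coords, separator or "/")
    if encoding["name"] == "v2":
        return v2_chunk_key(coords, separator or ".")
    raise ValueError(f"unsupported chunk key encoding: {encoding['name']!r}")


print(chunk_key({"name": "default", "configuration": {"separator": "/"}}, (0, 3)))
print(chunk_key({"name": "v2"}, (0, 3)))
```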
Both directories model v3-spec artifacts, so they belong under the v3/ subpackage alongside v3/array, v3/group, v3/consolidated, v3/chunk_grid, and v3/chunk_key_encoding.
The principle is now: anything imported from `zarr_metadata.v3.X` is a v3-spec artifact; anything from `zarr_metadata.v2.X` is a v2-spec artifact; only true cross-version primitives sit at the top level (`zarr_metadata.JSON`, `NamedConfig`, `NamedRequiredConfig`, and the `ArrayMetadata`/`GroupMetadata` unions).
Path moves:
- zarr_metadata.codec.* -> zarr_metadata.v3.codec.*
- zarr_metadata.dtype.* -> zarr_metadata.v3.dtype.*
Internal imports inside the moved modules and zarr-python re-export sites updated accordingly.
zarr.abc.codec imports the zarr-metadata Codec alias with a private name to avoid colliding with its own runtime `Codec` union (`ArrayArrayCodec | ArrayBytesCodec | BytesBytesCodec`), then re-exports it as `CodecJSON`.
Matches the v3 spec field name `data_type` exactly. All imports inside the package and in zarr-python re-export sites updated accordingly. The `DType` type alias keeps its short name (it's the widely understood abbreviation for "data type JSON shape"); only the module path changes.
This PR creates a new subpackage called `zarr-metadata` just for JSON metadata. It's stored in `packages/zarr-metadata`. It contains `TypedDict` classes that model the spec-defined JSON forms for v2 array and group metadata and v3 array and group metadata, including data types, codecs, chunk key encodings, and chunk grids. I only included type definitions for metadata that has an external spec, so zarr-python will need to define some types internally, e.g. for data types or codecs that have no spec.
I would like to publish this subpackage to PyPI. These types are useful to any Python tool that works with Zarr data, even if that tool doesn't use zarr-python. They are also useful to zarr-python, because publishing them lets us resolve some lingering questions about publishing types.
If we adopted the changes here, adding a new data type, codec, chunk grid, etc. would require adding types to `zarr-metadata`, then adding the implementation in `zarr-python` that works with those types. We wouldn't need to do these two operations in the same PR, but I expect that would be the normal practice.
This change does add complexity to our publishing workflow: we need to ensure that `zarr-metadata` changes are published at or before new `zarr-python` releases. We should add some checks to ensure that this happens.
Docs for the new package are missing from this PR. I will handle that in a follow-up.
I would appreciate feedback at all levels, including the following topics:
closes #3355 and #3795