Experimental Support for Subarray DTypes by sehoffmann · Pull Request #3587 · zarr-developers/zarr-python

sehoffmann · 2025-11-20T11:36:14Z

This PR adds experimental support for subarray dtypes (https://numpy.org/doc/stable/glossary.html#term-subarray-data-type, https://numpy.org/doc/stable/user/basics.rec.html#structured-datatype-creation) and closes #3582 and #3583.

It also fixes support for nested (and subarray-containing) Structured dtypes in Zarr v2, which worked in 2.18.* but no longer works in 3.1.*. In particular, the buggy implementation forgot that a nested structured dtype is itself a list of lists, not just a single flat list.
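To illustrate the "list of lists" point: Zarr v2 metadata stores a structured dtype as the JSON form of numpy's `dtype.descr`, so every field spec, nested field list and subarray shape comes back as a JSON list. The helpers below (`encode_descr` and `decode_descr` are hypothetical names, not zarr-python's API) are a minimal sketch, assuming a packed dtype without padding fields, of a decoder that recurses into nested fields instead of treating the descr as one flat list.

```python
import json

import numpy as np


def encode_descr(dtype: np.dtype) -> str:
    # Hypothetical helper: json.dumps turns every tuple in `descr`
    # (field specs and subarray shapes) into a JSON list.
    return json.dumps(dtype.descr)


def decode_descr(d):
    # Hypothetical helper: a plain string such as "<f4" is a leaf type;
    # a list is a structured dtype whose entries are [name, type] or
    # [name, type, shape], where `type` may itself be a nested list.
    if isinstance(d, str):
        return d
    fields = []
    for entry in d:
        name, inner = entry[0], decode_descr(entry[1])  # recurse into nested structs
        if len(entry) > 2:
            fields.append((name, inner, tuple(entry[2])))  # subarray field
        else:
            fields.append((name, inner))
    return fields


# A nested structured dtype whose inner struct contains a subarray field.
nested = np.dtype([("a", "f4"), ("b", [("c", "i2"), ("d", "f8", (3,))])])
stored = json.loads(encode_descr(nested))
restored = np.dtype(decode_descr(stored))
```

A decoder that only converts the top-level entries to tuples would leave the inner `["c", "<i2"]` entries as lists, which is the flat-list assumption described above.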

Note 1:
Subarray dtypes are in a very weird spot. They are proper np.dtype instances, in particular np.dtypes.VoidDType instances whose fields attribute is unset (None) but whose subdtype attribute is set. Hence, it makes sense to map them one-to-one to a ZDType. This also makes sense from an implementation standpoint with respect to serialization.

On the other hand, they do not have a proper scalar value: one cannot create an np.void scalar for a subarray dtype (it raises). Conceptually, a scalar value of a subarray dtype would be an np.ndarray. This, however, is not a subtype of np.generic, despite sharing much of its interface. When one creates an np.ndarray with a subarray dtype directly, the result is a "flat" np.ndarray with shape array_shape + subarray_shape and the subarray's base dtype.
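A short sketch of this behaviour in plain numpy (no zarr involved): the subarray dtype reports no fields but a `(base, shape)` subdtype, and an array created with it absorbs the subarray shape into its own shape and takes on the base dtype.

```python
import numpy as np

# A standalone subarray dtype: a 2x2 block of float32.
sub = np.dtype(("f4", (2, 2)))

# Void-kind dtype with no fields, but with a (base dtype, shape) subdtype.
kind, fields, subdtype = sub.kind, sub.fields, sub.subdtype

# Creating an array with it "flattens" the subarray into trailing dimensions,
# and the resulting array carries the base dtype, not the subarray dtype.
arr = np.zeros(3, dtype=sub)
```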

I've decided to still implement them as a separate Subarray ZDType rather than conflating them with the Structured class. This works flawlessly when they are used within a structured dtype (the intended use case), but using them directly is not fully supported. Specifically, Zarr V2 has no specification for standalone subarray dtypes, which makes many test cases fail. Apart from that, some tests in test_array.py do not expect an array as a scalar and hence fail. I want to stress, though, that I was able to successfully create and read a Subarray zarr array with V3.

Solving this conundrum adequately is beyond what I can do here and might require significant conceptual changes in Zarr. I did not add the dtype directly to test_dtype/conftest.py, but instead added a new test case for Structured that uses a Subarray inside, which passes.

Note 2: I've also added a test case for an invalid float value string which fails due to #3584. Since that test case highlights an existing bug, I've decided to leave it there.

TODO:

Add unit tests and/or doctests in docstrings
Add docstrings and API docs for any new/modified user-facing classes and functions
New/modified features documented in docs/user-guide/*.md
Changes documented as a new file in changes/
GitHub Actions have all passed
Test coverage is 100% (Codecov passes)


codecov · 2025-11-21T09:09:01Z

Codecov Report

❌ Patch coverage is 93.42105% with 10 lines in your changes missing coverage. Please review.
✅ Project coverage is 93.12%. Comparing base (dd5a321) to head (d4cb2d8).
⚠️ Report is 3 commits behind head on main.

Files with missing lines	Patch %	Lines
src/zarr/core/dtype/npy/structured.py	84.00%	4 Missing ⚠️
src/zarr/core/dtype/common.py	78.57%	3 Missing ⚠️
src/zarr/core/dtype/npy/subarray.py	97.29%	3 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #3587      +/-   ##
==========================================
+ Coverage   93.10%   93.12%   +0.02%     
==========================================
  Files          85       86       +1     
  Lines       11193    11334     +141     
==========================================
+ Hits        10421    10555     +134     
- Misses        772      779       +7

Files with missing lines	Coverage Δ
src/zarr/core/dtype/__init__.py	`100.00% <100.00%> (ø)`
src/zarr/core/dtype/npy/bytes.py	`99.50% <100.00%> (ø)`
src/zarr/core/dtype/common.py	`86.36% <78.57%> (+1.17%)`	⬆️
src/zarr/core/dtype/npy/subarray.py	`97.29% <97.29%> (ø)`
src/zarr/core/dtype/npy/structured.py	`95.68% <84.00%> (-3.25%)`	⬇️


sehoffmann · 2025-12-18T12:28:17Z

@d-v-b Don't want to be pushy here, but did you manage to have a look at this PR yet? Do you have any feedback or is there anything that needs to be changed or addressed?

d-v-b · 2025-12-19T14:09:39Z

> @d-v-b Don't want to be pushy here, but did you manage to have a look at this PR yet? Do you have any feedback or is there anything that needs to be changed or addressed?

Hi @sehoffmann sorry for the long silence. I think there are 2 distinct elements in this PR: first is improving how we handle numpy structured dtypes, and the second is including sub-array data types.

The first element looks great, but I have some concerns about the second element. So far we have tried to keep the set of supported data types as close as possible to the union of the data types zarr python v2 supported, plus the data types supported by other zarr v3 implementations (namely, zarrs and tensorstore).

This means that when we add a new data type, there are two questions to answer: is this dtype something people used in zarr python 2 (and if so, does adding it resolve a feature regression)? Or is this dtype something the other zarr v3 implementations support? If the answer to both of those is "no", then the maintenance burden for zarr-python might not be worth it, compared to the alternative of users registering this data type themselves via the registry. And I think sub-arrays are not something people used heavily in zarr python 2.x, nor are they supported by other zarr v3 implementations (please correct me if I'm wrong on either of these points).

How important is it for your application that this data type is bundled with Zarr python? And if that outcome is very important, would you be willing to work on a data type spec in the zarr-extensions repo? I'd support adding the new subarray data type unreservedly if there was buy-in from other zarr implementers. Without that buy-in, I'm pretty skeptical about the addition, and I would encourage using the data type registry to register the data type instead of relying on it being shipped with zarr-python.

these are just my thoughts though, it would be good to hear from the other devs @zarr-developers/python-core-devs

vitusbenson · 2026-03-26T21:41:25Z

Following the implementation of the zarr-extension for structured dtypes, I've rebased this branch to the latest main and kept only the changes related to subarrays inside structured dtypes.

See sehoffmann#1
And https://github.com/vitusbenson/zarr-python/tree/subarray_dtypes

sehoffmann · 2026-04-13T12:41:53Z

Hey @d-v-b,

Just to follow up on this: would this PR be ready to merge if it added support for subarray dtypes as part of Structured (not standalone), following the spec from zarr-developers/zarr-extensions#45?

d-v-b · 2026-04-21T19:19:45Z

hi @sehoffmann we are still missing a spec for an "array of scalars" dtype. This should probably be a generic fixed-length data type that's configured with its length and the type of its contents. Once we have a spec for that, then it composes with the new struct dtype.
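One way to picture the composition described here is a recursive description in which a struct is generic over its field types and a fixed-size "array of scalars" type carries only a shape and an element type. The `describe` function and its dictionary keys below are purely hypothetical and are not the zarr-extensions format (no subarray spec exists yet, per this thread); the sketch only shows how numpy's `subdtype` and `fields` attributes decompose a structured dtype that contains a subarray.

```python
import numpy as np


def describe(dt: np.dtype) -> dict:
    """Hypothetical, illustrative description; NOT the zarr-extensions format."""
    if dt.subdtype is not None:
        # Fixed-size "array of scalars": only a shape and an element type.
        element, shape = dt.subdtype
        return {"name": "subarray", "shape": list(shape), "element": describe(element)}
    if dt.names is not None:
        # Struct: generic over its field types, recursing into each one.
        return {
            "name": "struct",
            "fields": [[name, describe(dt.fields[name][0])] for name in dt.names],
        }
    # Leaf scalar type, e.g. "<f4" on a little-endian machine.
    return {"name": dt.str}


example = describe(np.dtype([("x", "f4"), ("z", "f4", (2, 2))]))
```

The struct branch only recurses into its field types and needs no knowledge of subarrays, which is the sense in which a separate subarray spec would compose with the struct dtype.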

sehoffmann · 2026-04-22T15:50:38Z

Hey @d-v-b,

Just supporting subarrays as part of Structured would also be completely fine from my side. I see this as a matter of taste. Given that standalone subarray dtypes behave rather unexpectedly and oddly compared with other dtypes, there might actually be a good reason not to allow standalone subarrays.

That is, np.dtype([('x', 'f4'), ('y', np.float32), ('z', 'f4', (2, 2))]) should still be supported by zarr as part of Structured (as it already was in v2), but it is completely ok with me if np.dtype(('f4', (2,2))) doesn't have a corresponding zarr data type. Maybe this was a misunderstanding before.
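The distinction drawn here can be checked directly in numpy: inside a structured dtype, the subarray is just the type of one field, so the array keeps its record dtype and only field access expands the subarray shape (unlike the standalone case, where the shape is absorbed immediately).

```python
import numpy as np

struct = np.dtype([("x", "f4"), ("y", np.float32), ("z", "f4", (2, 2))])

# The subarray is the type of field "z"; fields[name] is (dtype, byte offset).
z_dtype, z_offset = struct.fields["z"]

# An array of this dtype keeps one record per element...
arr = np.zeros(5, dtype=struct)

# ...and only selecting the field expands the (2, 2) subarray into trailing dims.
z_values = arr["z"]
```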

If that is fine with you, I could proceed by adapting this PR such that subarrays are only supported as part of Structured.

d-v-b · 2026-04-23T07:29:46Z

hi @sehoffmann, we now have a spec for a language-agnostic struct dtype, and this was added to zarr-python recently. The spec defines the struct dtype as generic over other dtypes.

So if we want the struct dtype to support subarrays in a way that could work for implementations outside of zarr-python, the best way forward would be a subarray dtype spec in zarr-extensions, then the struct dtype would "just work" without any changes.

I opened an issue about this in zarr-extensions: zarr-developers/zarr-extensions#57. I don't think the spec for a subarray dtype would be too much work (but I don't have time for it at the moment).

sehoffmann · 2026-04-23T16:07:41Z

I see, thanks for the clarification. How do you want to handle backward compatibility with zarr v2? If I am not mistaken, subarrays were an integral part of Structured there, not standalone. Is this an issue, and does the new subarray dtype need to concern itself with this?

sehoffmann added 2 commits November 19, 2025 17:54

fix: Metadata (v2) for nested Structured dtypes

bff778b

feat: experimental support for Subarray dtypes and backported support for nested Structured dtypes in V2 (zarr-developers#3582, zarr-developers#3583)

1d13dae

github-actions bot added the "needs release notes" label (automatically applied to PRs which haven't added release notes) Nov 20, 2025

fix: np.bool -> bool

c21fcc6

sehoffmann force-pushed the subarray_dtypes branch from 221cc75 to c21fcc6 November 21, 2025 08:58

Merge branch 'main' into subarray_dtypes

0df5510

Merge branch 'main' into subarray_dtypes

c168f9f

d-v-b mentioned this pull request Feb 7, 2026

[v3] Structured dtype support #2134

Open

d-v-b mentioned this pull request Mar 2, 2026

Add a formal definition for structured data in zarr3 zarr-developers/zarr-extensions#45

Merged

Merge branch 'main' into subarray_dtypes

d4cb2d8

d-v-b mentioned this pull request Apr 22, 2026

generic container data types zarr-developers/zarr-extensions#57

Open

sehoffmann wants to merge 6 commits into zarr-developers:main from sehoffmann:subarray_dtypes


3 participants
