This Epic is for adding robust support for extension types in Vortex. An extension type is a logical type that is not built into the DType enum. For example, a Vector type would be an extension type, whereas Primitive is a built-in variant of the DType enum.
This is a follow-up to several older issues and documents:
Status
Active.
We are adding several extension types (such as UUID, Vectors, Tensors, and potentially Refinement types) to the main Vortex repository. Building these helps us learn how best to support the extension types that people might want to add in the future.
Goal
Users should be able to easily extend the Vortex type system with their own type constructs that nicely fit their application model and use case. Note that this Epic will link to different first-class extension type implementations for visibility, but the purpose of this Epic is to track the underlying systems that support creating and utilizing extension types in Vortex.
Motivation
A limitation of the current Vortex type system is that new logical types are hard to add. For example, the efforts to add FixedSizeList (vortex#4372) and to change List to ListView (vortex#4699) were very intrusive. It is much easier to wrap a canonical type (treating the canonical dtype as a "storage type") and implement some additional logic on top than to add a new variant to the DType enum.
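To illustrate the "storage type" idea, here is a minimal Rust sketch. Every name in it (DType, PType, ExtDType, storage(), and the "example.vector" id) is a hypothetical simplification, not Vortex's actual API. It shows how a single Extension variant can host any number of logical types, each pointing at a canonical storage dtype that existing machinery already understands, so adding a type does not require a new enum variant.

```rust
// Hypothetical, simplified model of a logical type enum; NOT Vortex's real DType.
#[derive(Debug, Clone, PartialEq)]
enum DType {
    Primitive(PType),
    FixedSizeList(Box<DType>, u32),
    // A single variant that can host any number of extension (logical) types.
    Extension(ExtDType),
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum PType {
    U8,
    F32,
}

// Hypothetical extension dtype: an identifier plus the canonical dtype it wraps.
#[derive(Debug, Clone, PartialEq)]
struct ExtDType {
    id: String,
    storage: Box<DType>,
}

impl DType {
    // Returns the canonical storage dtype, unwrapping nested extensions, so that
    // code written for canonical dtypes can operate on extension values unchanged.
    fn storage(&self) -> &DType {
        match self {
            DType::Extension(ext) => ext.storage.storage(),
            other => other,
        }
    }
}
```

In this sketch a Vector is just an Extension whose storage is a FixedSizeList of F32; compression or compute code that only understands canonical dtypes can call storage() and proceed as usual.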
We would like to add many more extension types. Some notable extension types include:
- Time: Vortex already has basic support for several notions of time, but this can be improved.
- Vector / Matrix / FixedShapeTensor: An extension over FixedSizeList, where each dimension corresponds to one level of nesting.
- VariableShapeTensor: This would help support storing variable-size and variable-length videos in Vortex.
- Uuid: Since a UUID is a 128-bit value, we likely want to add FixedSizeBinary.
- Geospatial: This would be similar to the GeoArrow extension types for Arrow: https://github.com/geoarrow/geoarrow.
- Json: TODO
- Images: Are PNG and JPG separate logical types?
- Video: Are different codecs separate logical types?
- ???
- Refinement Types: More on this in the Type System RFC (TODO).
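Two of the entries above describe concrete storage layouts: FixedShapeTensor-style types use one FixedSizeList nesting level per dimension, and Uuid needs 128 bits of fixed-size binary. The Rust sketch below shows both mappings under stated assumptions: DType, tensor_storage, and uuid_storage are hypothetical names, and FixedSizeBinary appears only because the list proposes adding it.

```rust
// Hypothetical storage dtypes; NOT Vortex's real DType.
#[derive(Debug, Clone, PartialEq)]
enum DType {
    F32,
    FixedSizeList(Box<DType>, u32),
    // Proposed above for Uuid; assumed here, not an existing Vortex type.
    FixedSizeBinary(u32),
}

// Storage for a fixed-shape tensor: one FixedSizeList level per dimension.
// The first dimension becomes the outermost list, so shape [2, 3] over F32
// yields FixedSizeList(FixedSizeList(F32, 3), 2).
fn tensor_storage(element: DType, shape: &[u32]) -> DType {
    shape
        .iter()
        .rev()
        .fold(element, |inner, &dim| DType::FixedSizeList(Box::new(inner), dim))
}

// A UUID is 128 bits, i.e. 16 bytes of fixed-size binary.
fn uuid_storage() -> DType {
    DType::FixedSizeBinary(128 / 8)
}
```

A Vector is the one-dimensional case (shape [n]) and a Matrix the two-dimensional case, so all three types can share one extension implementation parameterized by shape.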
We would like the Vortex extension type system to be expressive enough that anyone can add their own extension type by simply providing an implementation of the ExtVTable.
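The ExtVTable interface is not spelled out in this Epic, so the following is only a sketch of what an extension-type implementation might need to provide. It is written against a hypothetical ExtensionType trait (not Vortex's real ExtVTable): a stable identifier, validation of the storage dtype, and formatting of a single value. The Uuid implementation and the "example.uuid" id are illustrative.

```rust
// Hypothetical storage dtypes; NOT Vortex's real DType.
#[derive(Debug, Clone, PartialEq)]
enum StorageDType {
    FixedSizeBinary(u32),
    Utf8,
}

// Hypothetical trait; Vortex's actual ExtVTable may look quite different.
trait ExtensionType {
    // Stable identifier, persisted alongside the storage dtype.
    fn id(&self) -> &'static str;
    // Reject storage dtypes this logical type cannot be built on.
    fn validate_storage(&self, storage: &StorageDType) -> Result<(), String>;
    // Render one value from its raw storage bytes.
    fn format_value(&self, bytes: &[u8]) -> String;
}

struct Uuid;

impl ExtensionType for Uuid {
    fn id(&self) -> &'static str {
        "example.uuid"
    }

    fn validate_storage(&self, storage: &StorageDType) -> Result<(), String> {
        match storage {
            StorageDType::FixedSizeBinary(16) => Ok(()),
            other => Err(format!("uuid requires FixedSizeBinary(16), got {other:?}")),
        }
    }

    fn format_value(&self, bytes: &[u8]) -> String {
        // Assumes storage was validated, so every value is exactly 16 bytes.
        assert_eq!(bytes.len(), 16, "uuid values must be 16 bytes");
        let hex: String = bytes.iter().map(|b| format!("{b:02x}")).collect();
        // Canonical 8-4-4-4-12 grouping of the 32 hex digits.
        format!("{}-{}-{}-{}-{}", &hex[0..8], &hex[8..12], &hex[12..16], &hex[16..20], &hex[20..32])
    }
}
```

Because this trait is object-safe, a registry could hold Box<dyn ExtensionType> values keyed by id() and look them up when reading a file; whether Vortex should resolve extensions this way is one of the design questions this Epic tracks.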
Additionally, we want extension types to be as performant as Vortex's built-in logical types (such as Primitive or List) and to have access to all Vortex features, including expression evaluation, late materialization, and compression.
Unresolved questions
… ExtensionArray it is a no-op.
Notes
… Vector as a new extension type (see Tracking Issue: Vector Extension Type #7297), but we have made that a sub-issue of Epic: Vector Similarity Search #7704.