diff --git a/format_spec/array_file_hierarchy.md b/format_spec/array_file_hierarchy.md
index d4ee0e3da66..37fe863f30c 100644
--- a/format_spec/array_file_hierarchy.md
+++ b/format_spec/array_file_hierarchy.md
@@ -57,3 +57,6 @@ Inside the array folder, you can find the following:
 
 > [!NOTE]
 > Prior to version 10, the array schema was stored in a single `__array_schema.tdb` file in the array folder. Implementations must support arrays that contain both `__array_schema.tdb` and schemas in the `__schema` folder at the same time. For the purpose of array schema evolution, the timestamp of `__array_schema.tdb` must be considered to be earlier than any schema in the `__schema` folder.
+
+> [!NOTE]
+> Prior to version 5, commit files were not written. Fragments of these versions are considered to be committed if their corresponding fragment metadata file exists.
diff --git a/format_spec/array_schema.md b/format_spec/array_schema.md
index d13b573b195..2b3b13eacc2 100644
--- a/format_spec/array_schema.md
+++ b/format_spec/array_schema.md
@@ -9,7 +9,7 @@ The array schema file consists of a single [generic tile](./generic_tile.md), wi
 | **Field** | **Type** | **Description** |
 | :--- | :--- | :--- |
 | Array version | `uint32_t` | Format version number of the array schema |
-| Allows dups | `bool` | Whether or not the array allows duplicate cells |
+| Allows dups | `bool` | _New in version 5_ Whether or not the array allows duplicate cells |
 | Array type | `uint8_t` | Dense or sparse |
 | Tile order | `uint8_t` | Row or column major |
 | Cell order | `uint8_t` | Row or column major |
@@ -43,6 +43,7 @@ The domain has internal format:
 
 | **Field** | **Type** | **Description** |
 | :--- | :--- | :--- |
+| Domain datatype | `uint8_t` | _Removed in version 5_ Datatype of all dimensions |
 | Num dimensions | `uint32_t` | Dimensionality/rank of the domain |
 | Dimension 1 | [Dimension](#dimension) | First dimension |
 | … | … | … |
@@ -56,14 +57,17 @@ The dimension has internal format:
 | :--- | :--- | :--- |
 | Dimension name length | `uint32_t` | Number of characters in dimension name |
 | Dimension name | `uint8_t[]` | Dimension name character array |
-| Dimension datatype | `uint8_t` | Datatype of the coordinate values |
-| Cell val num | `uint32_t` | Number of coordinate values per cell. For variable-length dimensions, this is `std::numeric_limits<uint32_t>::max()` |
-| Filters | [Filter Pipeline](./filter_pipeline.md) | The filter pipeline used on coordinate value tiles |
-| Domain size | `uint64_t` | The domain size in bytes |
+| Dimension datatype | `uint8_t` | _New in version 5_ Datatype of the coordinate values |
+| Cell val num | `uint32_t` | _New in version 5_ Number of coordinate values per cell. For variable-length dimensions, this is `std::numeric_limits<uint32_t>::max()` |
+| Filters | [Filter Pipeline](./filter_pipeline.md) | _New in version 5_ The filter pipeline used on coordinate value tiles |
+| Domain size | `uint64_t` | _New in version 5_ The domain size in bytes |
 | Domain | `uint8_t[]` | Byte array of length equal to domain size above, storing the min, max values of the dimension. |
 | Null tile extent | `uint8_t` | `1` if the dimension has a null tile extent, else `0`. |
 | Tile extent | `uint8_t[]` | Byte array of length equal to the dimension datatype size, storing the space tile extent of this dimension. |
+
+> [!NOTE]
+> Prior to version 5, the size of the _Domain_ field was always equal to twice the size of the dimension's data type (which is stored in the [domain](#domain) in these versions).
 
 ## Attribute
 
 The attribute has internal format:
diff --git a/format_spec/fragment.md b/format_spec/fragment.md
index 733e2352be4..644d0955558 100644
--- a/format_spec/fragment.md
+++ b/format_spec/fragment.md
@@ -28,7 +28,8 @@ my_array # array folder
  | |_ ...
  | |_ dci.tdb # delete condition index attribute
  | |_ ...
- |_ ...
+ | |_ __coords.tdb # legacy coordinates
+ |_ ...
 ```
 
 There can be any number of fragments in an array. The fragment folder contains:
@@ -283,3 +284,12 @@ The on-disk format of each data file is:
 | Tile 1 | [Tile](./tile.md#tile) | The data of tile 1 |
 | … | … | … |
 | Tile N | [Tile](./tile.md#tile) | The data of tile N |
+
+## Legacy coordinates file
+
+Prior to version 5, dimension data for sparse cells are combined in a single tile that is stored in the `__coords.tdb` file. The tile is filtered with the filters specified in the _Coords filters_ field of the [array schema](./array_schema.md).
+
+Coordinates of a multi-dimensional array are placed in either zipped or unzipped order. In zipped order, the coordinates of each cell are placed next to each other and cells are ordered by cell index, while in unzipped order, all coordinate values of a dimension are placed next to each other and ordered by dimension index.
+
+* Since version 2, coordinates are always stored unzipped.
+* In version 1, coordinates are stored unzipped if a [compression filter](./tile.md#compression-filters) exists in the filter list, otherwise they are stored zipped.
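To make the zipped versus unzipped distinction in the legacy coordinates file concrete, here is a minimal C++ sketch. It assumes the decoded (already unfiltered) `__coords.tdb` tile has been read into a flat buffer, and relies on the fact that before version 5 all dimensions share the single _Domain datatype_, so one element type `T` covers every dimension. The names `unzip_coords` and `coords_are_unzipped` are hypothetical illustrations, not part of any TileDB API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: converts a legacy coordinates buffer from zipped order
// (x0 y0 x1 y1 ...) to unzipped order (x0 x1 ... y0 y1 ...).
// Assumes every dimension has the same datatype T (true before version 5).
template <typename T>
std::vector<T> unzip_coords(const std::vector<T>& zipped, std::size_t dim_num) {
  assert(dim_num > 0 && zipped.size() % dim_num == 0);
  const std::size_t cell_num = zipped.size() / dim_num;
  std::vector<T> unzipped(zipped.size());
  for (std::size_t c = 0; c < cell_num; ++c) {
    for (std::size_t d = 0; d < dim_num; ++d) {
      // Value of dimension d for cell c moves into dimension d's contiguous run.
      unzipped[d * cell_num + c] = zipped[c * dim_num + d];
    }
  }
  return unzipped;
}

// Hypothetical helper encoding the layout rule above. Only meaningful for
// format versions 1 to 4, since `__coords.tdb` is not written from version 5 on.
bool coords_are_unzipped(std::uint32_t version, bool has_compression_filter) {
  if (version >= 2) {
    return true;  // Since version 2, coordinates are always unzipped.
  }
  return has_compression_filter;  // Version 1: unzipped only if compressed.
}
```

For example, a 2D version 1 array without a compression filter storing the cells `(1, 10)`, `(2, 20)`, `(3, 30)` would hold `1 10 2 20 3 30` in zipped order, and `unzip_coords` turns that into `1 2 3 10 20 30`.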