Commit

Cover changes in the storage format specification in versions 5, 4 (no changes) and 2.
teo-tsirpanis committed Oct 2, 2024
1 parent 6e5abe1 commit 21b77df
Showing 3 changed files with 23 additions and 6 deletions.
3 changes: 3 additions & 0 deletions format_spec/array_file_hierarchy.md
@@ -57,3 +57,6 @@ Inside the array folder, you can find the following:
> [!NOTE]
> Prior to version 10, the array schema was stored in a single `__array_schema.tdb` file in the array folder. Implementations must support arrays that contain both `__array_schema.tdb` and schemas in the `__schema` folder at the same time. For the purpose of array schema evolution, the timestamp of `__array_schema.tdb` must be considered to be earlier than any schema in the `__schema` folder.

> [!NOTE]
> Prior to version 5, commit files were not written. Fragments of these versions are considered to be committed if their corresponding fragment metadata file exists.
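
The rule in the note above can be sketched as a small check. This is only an illustration under stated assumptions: the function name, the `commit_file_exists` flag (standing in for whatever commit lookup applies from version 5 on) and the default `metadata_name` are hypothetical and not taken from the specification text shown here.

```python
from pathlib import Path


def is_fragment_committed(fragment_dir, format_version, commit_file_exists,
                          metadata_name="__fragment_metadata.tdb"):
    """Return True if the fragment should be treated as committed.

    metadata_name is an assumed file name; pass the real one if it differs.
    commit_file_exists is the caller's answer for versions 5 and later.
    """
    if format_version < 5:
        # No commit files were written before version 5, so the presence of
        # the fragment metadata file is the only available signal.
        return (Path(fragment_dir) / metadata_name).is_file()
    # From version 5 on, defer to the commit file lookup done by the caller.
    return bool(commit_file_exists)
```

Keeping the version check in one place means readers of old arrays never have to look for a commit file that cannot exist.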
14 changes: 9 additions & 5 deletions format_spec/array_schema.md
@@ -9,7 +9,7 @@ The array schema file consists of a single [generic tile](./generic_tile.md), wi
| **Field** | **Type** | **Description** |
| :--- | :--- | :--- |
| Array version | `uint32_t` | Format version number of the array schema |
- | Allows dups | `bool` | Whether or not the array allows duplicate cells |
+ | Allows dups | `bool` | _New in version 5_ Whether or not the array allows duplicate cells |
| Array type | `uint8_t` | Dense or sparse |
| Tile order | `uint8_t` | Row or column major |
| Cell order | `uint8_t` | Row or column major |
@@ -43,6 +43,7 @@ The domain has internal format:

| **Field** | **Type** | **Description** |
| :--- | :--- | :--- |
| Domain datatype | `uint8_t` | _Removed in version 5_ Datatype of all dimensions |
| Num dimensions | `uint32_t` | Dimensionality/rank of the domain |
| Dimension 1 | [Dimension](#dimension) | First dimension |
||||
@@ -56,14 +57,17 @@ The dimension has internal format:
| :--- | :--- | :--- |
| Dimension name length | `uint32_t` | Number of characters in dimension name |
| Dimension name | `uint8_t[]` | Dimension name character array |
- | Dimension datatype | `uint8_t` | Datatype of the coordinate values |
- | Cell val num | `uint32_t` | Number of coordinate values per cell. For variable-length dimensions, this is `std::numeric_limits<uint32_t>::max()` |
- | Filters | [Filter Pipeline](./filter_pipeline.md) | The filter pipeline used on coordinate value tiles |
- | Domain size | `uint64_t` | The domain size in bytes |
+ | Dimension datatype | `uint8_t` | _New in version 5_ Datatype of the coordinate values |
+ | Cell val num | `uint32_t` | _New in version 5_ Number of coordinate values per cell. For variable-length dimensions, this is `std::numeric_limits<uint32_t>::max()` |
+ | Filters | [Filter Pipeline](./filter_pipeline.md) | _New in version 5_ The filter pipeline used on coordinate value tiles |
+ | Domain size | `uint64_t` | _New in version 5_ The domain size in bytes |
| Domain | `uint8_t[]` | Byte array of length equal to domain size above, storing the min, max values of the dimension. |
| Null tile extent | `uint8_t` | `1` if the dimension has a null tile extent, else `0`. |
| Tile extent | `uint8_t[]` | Byte array of length equal to the dimension datatype size, storing the space tile extent of this dimension. |

> [!NOTE]
> Prior to version 5, the size of the _Domain_ field was always equal to twice the size of the dimension's data type (which is stored in the [domain](#domain) in these versions).
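
A minimal sketch of how a reader might size and decode the _Domain_ field for a fixed-size datatype, following the table and note above. The function names are hypothetical, the caller is assumed to already know the datatype size in bytes, and little-endian byte order is an assumption of this example rather than something stated in this section.

```python
import struct


def domain_field_size(format_version, datatype_size, stored_domain_size=None):
    """Size in bytes of a dimension's Domain field."""
    if format_version < 5:
        # Before version 5 there is no Domain size field: the domain is always
        # a [min, max] pair of the datatype stored at the domain level.
        return 2 * datatype_size
    if stored_domain_size is None:
        raise ValueError("version 5+ requires the stored Domain size field")
    return stored_domain_size


def decode_fixed_domain(raw, struct_code):
    """Split a fixed-size Domain byte array into (min, max).

    struct_code is a `struct` format character, e.g. "i" for int32 or
    "d" for float64. Little-endian order is assumed here.
    """
    lo, hi = struct.unpack("<" + 2 * struct_code, raw)
    return lo, hi
```

For example, `decode_fixed_domain(struct.pack("<ii", 1, 100), "i")` returns `(1, 100)`, and `domain_field_size(4, 4)` returns `8`.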
## Attribute

The attribute has internal format:
12 changes: 11 additions & 1 deletion format_spec/fragment.md
@@ -28,7 +28,8 @@ my_array # array folder
| |_ ...
| |_ dci.tdb # delete condition index attribute
| |_ ...
- |_ ...
+ | |_ __coords.tdb # legacy coordinates
+ |_ ...
```

There can be any number of fragments in an array. The fragment folder contains:
@@ -283,3 +284,12 @@ The on-disk format of each data file is:
| Tile 1 | [Tile](./tile.md#tile) | The data of tile 1 |
||||
| Tile N | [Tile](./tile.md#tile) | The data of tile N |

## Legacy coordinates file

Prior to version 5, dimension data for sparse cells were combined in a single tile stored in the `__coords.tdb` file. The tile is filtered with the filters specified in the _Coords filters_ field of the [array schema](./array_schema.md).

Coordinates of a multi-dimensional array are placed in either zipped or unzipped order. In zipped order, the coordinates of a cell are placed next to each other and ordered by the cell index, while in unzipped order, all coordinate values of a dimension are placed next to each other and ordered by the dimension index.

* Since version 2, coordinates are always stored unzipped.
* In version 1, coordinates are stored unzipped if a [compression filter](./tile.md#compression-filters) exists in the filter list, otherwise they are stored zipped.
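
The two layouts and the version rule above can be illustrated as follows. This is a sketch on plain Python lists rather than on raw tile bytes, and all function names are hypothetical.

```python
def coords_are_unzipped(format_version, has_compression_filter):
    """Apply the rule above for the legacy coordinates file."""
    if format_version >= 2:
        return True
    # Version 1: unzipped only when a compression filter is in the filter list.
    return has_compression_filter


def zipped_to_unzipped(values, num_dims):
    """[x0, y0, x1, y1, ...] -> [x0, x1, ..., y0, y1, ...]"""
    out = []
    for d in range(num_dims):
        # Every num_dims-th value, starting at d, belongs to dimension d.
        out.extend(values[d::num_dims])
    return out


def unzipped_to_zipped(values, num_dims):
    """[x0, x1, ..., y0, y1, ...] -> [x0, y0, x1, y1, ...]"""
    num_cells = len(values) // num_dims
    return [values[d * num_cells + c]
            for c in range(num_cells)
            for d in range(num_dims)]
```

For three 2D cells `(1, 10)`, `(2, 20)`, `(3, 30)`, the zipped order is `[1, 10, 2, 20, 3, 30]` and the unzipped order is `[1, 2, 3, 10, 20, 30]`.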