Storage format specification improvements 2/N #5329

Merged
merged 15 commits into from
Oct 25, 2024
Changes from 14 commits
7 changes: 4 additions & 3 deletions format_spec/FORMAT_SPEC.md
Original file line number Diff line number Diff line change
@@ -4,17 +4,18 @@ title: Format Specification

**Notes:**

* The current TileDB format version number is **22** (`uint32_t`).
* The current TileDB array format version number is **22** (`uint32_t`).
* Other structures might be versioned separately.
* Data written by TileDB and referenced in this document is **little-endian**
with the following exceptions:

- [Dictionary filter](filters/dictionary_encoding.md)
- [Dictionary encoding filter](filters/dictionary_encoding.md)
- RLE filter

## Table of Contents

* **Array**
* [Format Version History](./history.md)
* [Format Version History](./array_format_history.md)
* [File hierarchy](./array_file_hierarchy.md)
* [Array Schema](./array_schema.md)
* [Fragment](./fragment.md)
44 changes: 31 additions & 13 deletions format_spec/array_file_hierarchy.md
@@ -7,6 +7,9 @@ An array is a folder with the following structure:
```
my_array # array folder
|_ __schema # array schema folder
|_ <timestamp_name> # array schema files
|_ ...
|_ __enumerations # array enumerations folder
|_ __fragments # array fragments folder
|_ <timestamped_name> # fragment folder
|_ ...
@@ -22,23 +25,38 @@ my_array # array folder
|_ <timestamped_name>.con # consolidated commits file
|_ ...
|_ <timestamped_name>.ign # ignore file for consolidated commits file
|_ __fragment_meta
|_ <timestamped_name>.meta # consol. fragment meta file
|_ ...
|_ __fragment_meta # consolidated fragment metadata folder
|_ <timestamped_name>.meta # consolidated fragment meta file
|_ ...
|_ __meta # array metadata folder
|_ __labels # dimension label folder

|_ <timestamped_name> # legacy fragment folder
|_ ...
|_ <timestamped_name>.ok # legacy fragment write file
|_ <timestamped_name>.meta # legacy consolidated fragment meta file
|_ __array_schema.tdb # legacy array schema file
```

Inside the array folder, you can find the following:

* [Array schema](./array_schema.md) folder `__schema`.
* Inside of a fragments folder, any number of [fragment folders](./fragment.md) [`<timestamped_name>`](./timestamped_name.md).
* Inside of a commit folder, an empty file [`<timestamped_name>`](./timestamped_name.md)`.wrt` associated with every fragment folder [`<timestamped_name>`](./timestamped_name.md), where [`<timestamped_name>`](./timestamped_name.md) is common for the folder and the WRT file. This is used to indicate that fragment [`<timestamped_name>`](./timestamped_name.md) has been *committed* (i.e., its write process finished successfully) and it is ready for use by TileDB. If the WRT file does not exist, the corresponding fragment folder is ignored by TileDB during the reads.
* Inside the same commit folder, any number of [delete commit files](./delete_commit_file.md) of the form [`<timestamped_name>`](./timestamped_name.md)`.del`.
* Inside the same commit folder, any number of [update commit files](./update_commit_file.md) of the form [`<timestamped_name>`](./timestamped_name.md)`.upd`.
* Inside the same commit folder, any number of [consolidated commits files](./consolidated_commits_file.md) of the form [`<timestamped_name>`](./timestamped_name.md)`.con`.
* Inside the same commit folder, any number of [ignore files](./ignore_file.md) of the form [`<timestamped_name>`](./timestamped_name.md)`.ign`.
* Inside of a fragment metadata folder, any number of [consolidated fragment metadata files](./consolidated_fragment_metadata_file.md) of the form [`<timestamped_name>`](./timestamped_name.md)`.meta`.
* Inside of a `__schema` folder, any number of [array schema files](./array_schema.md) [`<timestamped_name>`](./timestamped_name.md).
Contributor

Suggested change
* Inside of a `__schema` folder, any number of [array schema files](./array_schema.md) [`<timestamped_name>`](./timestamped_name.md).
* Inside the `__schema` folder, any number of [array schema files](./array_schema.md) [`<timestamped_name>`](./timestamped_name.md).

There's only one __schema folder per array, as far as I know.

* Note: the name does _not_ include the format version.
* _New in version 20_ Inside of the schema folder, an enumerations folder `__enumerations`.
* Inside of a `__fragments` folder, any number of [fragment folders](./fragment.md) [`<timestamped_name>`](./timestamped_name.md).
Contributor

Suggested change
* Inside of a `__fragments` folder, any number of [fragment folders](./fragment.md) [`<timestamped_name>`](./timestamped_name.md).
* Inside the `__fragments` folder, any number of [fragment folders](./fragment.md) [`<timestamped_name>`](./timestamped_name.md).

Member Author

Using "the" might imply that the folder is required to exist. Empty folders cannot exist in cloud object storage, which means that an array with no fragments written yet will not have a `__fragments` folder.

* _New in version 12_ Inside of a `__commits` folder, an empty file [`<timestamped_name>`](./timestamped_name.md)`.wrt` associated with every fragment folder [`<timestamped_name>`](./timestamped_name.md), where [`<timestamped_name>`](./timestamped_name.md) is common for the folder and the WRT file. This is used to indicate that fragment [`<timestamped_name>`](./timestamped_name.md) has been *committed* (i.e., its write process finished successfully) and it is ready for use by TileDB. If the WRT file does not exist, the corresponding fragment folder is ignored by TileDB during the reads.
Contributor

Suggested change
* _New in version 12_ Inside of a `__commits` folder, an empty file [`<timestamped_name>`](./timestamped_name.md)`.wrt` associated with every fragment folder [`<timestamped_name>`](./timestamped_name.md), where [`<timestamped_name>`](./timestamped_name.md) is common for the folder and the WRT file. This is used to indicate that fragment [`<timestamped_name>`](./timestamped_name.md) has been *committed* (i.e., its write process finished successfully) and it is ready for use by TileDB. If the WRT file does not exist, the corresponding fragment folder is ignored by TileDB during the reads.
* _New in version 12_ Inside the `__commits` folder lives an empty file [`<timestamped_name>`](./timestamped_name.md)`.wrt` associated with every fragment folder [`<timestamped_name>`](./timestamped_name.md), where [`<timestamped_name>`](./timestamped_name.md) is common for the folder and the WRT file. This is used to indicate that fragment [`<timestamped_name>`](./timestamped_name.md) has been *committed* (i.e., its write process finished successfully) and it is ready for use by TileDB. If the WRT file does not exist, the corresponding fragment folder is ignored by TileDB during the reads.

* _New in version 16_ Inside the same commits folder, any number of [delete commit files](./delete_commit_file.md) of the form [`<timestamped_name>`](./timestamped_name.md)`.del`.
* _New in version 16_ Inside the same commits folder, any number of [update commit files](./update_commit_file.md) of the form [`<timestamped_name>`](./timestamped_name.md)`.upd`.
* Inside the same commits folder, any number of [consolidated commits files](./consolidated_commits_file.md) of the form [`<timestamped_name>`](./timestamped_name.md)`.con`.
* Inside the same commits folder, any number of [ignore files](./ignore_file.md) of the form [`<timestamped_name>`](./timestamped_name.md)`.ign`.
Member Author

Consolidated commit files should have been gated behind a format version but they aren't.

* _New in version 12_ Inside of a fragment metadata folder, any number of [consolidated fragment metadata files](./consolidated_fragment_metadata_file.md) of the form [`<timestamped_name>`](./timestamped_name.md)`.meta`.
* [Array metadata](./metadata.md) folder `__meta`.
* Inside of a labels folder, additional TileDB arrays storing dimension label data.
* _New in version 18_ Inside of a `__labels` folder, additional TileDB arrays storing dimension label data.

> [!NOTE]
> Prior to version 12, fragments, commit files, and consolidated fragment metadata were stored directly in the array folder and the extension of commit files was `.ok` instead of `.wrt`. Implementations must support arrays that contain data in both the old and the new hierarchy at the same time.

> [!NOTE]
> Prior to version 10, the array schema was stored in a single `__array_schema.tdb` file in the array folder. Implementations must support arrays that contain both `__array_schema.tdb` and schemas in the `__schema` folder at the same time. For the purpose of array schema evolution, the timestamp of `__array_schema.tdb` must be considered to be earlier than any schema in the `__schema` folder.
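
The ordering rule in the note above can be sketched as follows. This is a minimal illustration, not TileDB's implementation: `ordered_schema_paths` is a hypothetical helper, and it assumes that schema file names look like `__<t1>_<t2>_<uuid>`, so that the first numeric component can serve as the sort key. The authoritative naming rules are in `timestamped_name.md` and are not reproduced here.

```python
import os

LEGACY_SCHEMA = "__array_schema.tdb"  # pre-version-10 schema file (from the note)
SCHEMA_DIR = "__schema"               # schema folder used from version 10 onward


def ordered_schema_paths(array_dir):
    """Return the paths of all schema files of an array, oldest first.

    Assumption: names inside __schema look like `__<t1>_<t2>_<uuid>`;
    the first numeric component is used as the sort key.
    """
    ordered = []

    legacy = os.path.join(array_dir, LEGACY_SCHEMA)
    if os.path.isfile(legacy):
        # The legacy file is always treated as earlier than any __schema entry.
        ordered.append(legacy)

    schema_dir = os.path.join(array_dir, SCHEMA_DIR)
    if os.path.isdir(schema_dir):
        def sort_key(name):
            start = int(name.lstrip("_").split("_")[0])
            return (start, name)

        # Keep regular files only; this skips the __enumerations subfolder.
        names = [
            n for n in os.listdir(schema_dir)
            if os.path.isfile(os.path.join(schema_dir, n))
        ]
        ordered.extend(os.path.join(schema_dir, n) for n in sorted(names, key=sort_key))

    return ordered
```

The last element of the returned list is then the candidate for the latest schema, under the naming assumption stated above.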

> [!NOTE]
> Prior to version 5, commit files were not written. Fragments of these versions are considered to be committed if their corresponding fragment metadata file exists.
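
Taken together, the three notes imply a layered check for whether a fragment is committed. The sketch below is one way to express it, under several assumptions: `is_fragment_committed` is a hypothetical helper, the fragment metadata file is assumed to be named `__fragment_metadata.tdb` (the name is not stated in this diff), and consolidated commits files (`.con`) and ignore files (`.ign`) are deliberately left out.

```python
import os


def is_fragment_committed(array_dir, fragment_name):
    """Best-effort commit check across the three layouts described in the notes."""
    # Version 12 and later: empty marker file in the __commits folder.
    if os.path.isfile(os.path.join(array_dir, "__commits", fragment_name + ".wrt")):
        return True

    # Versions 5 to 11: `.ok` marker stored directly in the array folder.
    if os.path.isfile(os.path.join(array_dir, fragment_name + ".ok")):
        return True

    # Before version 5: no commit files were written, so fall back to the
    # presence of the fragment metadata file in the legacy fragment folder.
    # Assumption: that file is named `__fragment_metadata.tdb`.
    legacy_fragment_dir = os.path.join(array_dir, fragment_name)
    return os.path.isfile(os.path.join(legacy_fragment_dir, "__fragment_metadata.tdb"))
```

A real reader would also need the fragment's format version: a version 5 to 11 fragment whose metadata was written but whose `.ok` file is missing would pass the last check here even though it is not committed.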
14 changes: 7 additions & 7 deletions format_spec/history.md → format_spec/array_format_history.md
@@ -1,8 +1,8 @@
---
title: Format version history
title: Array format version history
---

# Format Version History
# Array Format Version History

## Version 22

@@ -24,7 +24,7 @@ Introduced in TileDB 2.19
Introduced in TileDB 2.17

* Arrays can have [enumerations](./enumeration.md).
* The bit-width reduction and positive delta filters are supported on data of date or time types.
* The bit-width reduction and positive delta encoding filters are supported on data of date or time types.
* The [filter pipeline options](./filter_pipeline.md#filter-options) for the double-delta filter contain the _Reinterpret datatype_ field.

## Version 19
@@ -45,7 +45,7 @@ Introduced in TileDB 2.15
Introduced in TileDB 2.14

* The _Order_ field was added to [attributes](./array_schema.md#attribute).
* Cell offsets in dimensions or attributes of UTF-8 string type are not written in the offset tiles, if the RLE or dictionary filter exists in the filter pipeline. They are instead encoded as part of the data tile.
* Cell offsets in dimensions or attributes of UTF-8 string type are not written in the offset tiles, if the RLE or dictionary encoding filter exists in the filter pipeline. They are instead encoded as part of the data tile.

## Version 16

@@ -72,7 +72,7 @@ Introduced in TileDB 2.10

Introduced in TileDB 2.9

* The [dictionary filter](./filters/dictionary_encoding.md) was added.
Member Author

Filters are not "added" in a version, so we have to reword this.

* Cell offsets in dimensions or attributes of ASCII string type are not written in the offset tiles, if the dictionary encoding filter exists in the filter pipeline. They are instead encoded as part of the data tile.

## Version 12

@@ -86,7 +86,7 @@ Introduced in TileDB 2.8

Introduced in TileDB 2.7

* Fragment metadata contain [metadata](./fragment.md#tile-mins-maxes) (min/max value, sum, null count) for each tile.
* Fragment metadata contain [metadata](./fragment.md#tile-mins-maxes) (min/max value, sum, null count) for data in the whole fragment and each tile.
* The TileDB implementation has been updated to never split cells when storing them in chunks.

## Version 10
@@ -154,7 +154,7 @@ Introduced in TileDB 1.6
* The [footer](./fragment.md#footer) and [R-Tree](./fragment.md#r-tree) structures were added.
* The _Bounding coords_ field was removed.
* The _MBRs_ field was removed. MBRs are now stored in the R-Tree.
* Structures other than the footer like tile offsets, sizes and metadata are wrapped in their own generic tiles. This allows loading them lazily and in parallel.
* Tile offsets and sizes are wrapped in their own generic tiles. This allows loading them lazily and in parallel.
Member Author

Tile metadata did not exist back then.


## Version 2

97 changes: 43 additions & 54 deletions format_spec/array_schema.md
@@ -2,75 +2,48 @@
title: Array Schema
---

## Current Array Schema Version

The current array schema version(`>=10`) is a folder called `__schema` located here:

```
my_array # array folder
| ...
|_ __schema # array schema folder
|_ <timestamped_name> # array schema file
|_ ...
```

The array schema folder can contain:

* Any number of [array schema files](#array-schema-file) with name [`<timestamped_name>`](./timestamped_name.md).
* Note: the name does _not_ include the format version.

## Previous Array Schema Version

The previous array schema version(`<=9`) has a file named `__array_schema.tdb` and is located here:

```
my_array # array folder
|_ ....
|_ __array_schema.tdb # array schema file
|_ ...
```

## Array Schema File

The array schema file consists of a single [generic tile](./generic_tile.md), with the following data:

| **Field** | **Type** | **Description** |
| :--- | :--- | :--- |
| Array version | `uint32_t` | Format version number of the array schema |
| Allows dups | `bool` | Whether or not the array allows duplicate cells |
| Array version | `uint32_t` | [Format version](./array_format_history.md) number of the array schema |
| Allows dups | `bool` | _New in version 5_ Whether or not the array allows duplicate cells |
| Array type | `uint8_t` | Dense or sparse |
| Tile order | `uint8_t` | Row or column major |
| Cell order | `uint8_t` | Row or column major |
| Capacity | `uint64_t` | For sparse fragments, the data tile capacity |
| Coords filters | [Filter Pipeline](./filter_pipeline.md) | The filter pipeline used as default for coordinate tiles |
| Offsets filters | [Filter Pipeline](./filter_pipeline.md) | The filter pipeline used for cell var-len offset tiles |
| Validity filters | [Filter Pipeline](./filter_pipeline.md) | The filter pipeline used for cell validity tiles |
| Validity filters | [Filter Pipeline](./filter_pipeline.md) | _New in version 7_ The filter pipeline used for cell validity tiles |
| Domain | [Domain](#domain) | The array domain |
| Num attributes | `uint32_t` | Number of attributes in the array |
| Attribute 1 | [Attribute](#attribute) | First attribute |
| … | … | … |
| Attribute N | [Attribute](#attribute) | Nth attribute |
| Num labels | `uint32_t` | Number of dimension labels in the array |
| Label 1 | [Dimension Label](#dimension_label) | First dimension label |
| Num labels | `uint32_t` | _New in version 18_ Number of dimension labels in the array |
| Label 1 | [Dimension Label](#dimension_label) | _New in version 18_ First dimension label |
| … | … | … |
| Label N | [Dimension Label](#dimension_label) | Nth dimension label |
| Num enumerations | `uint32_t` | Number of [enumerations](./enumeration.md) in the array |
| Enumeration name length 1 | `uint32_t` | The number of characters in the enumeration 1 name |
| Enumeration name 1 | `uint8_t[]` | The name of enumeration 1 |
| Enumeration filename length 1 | `uint32_t` | The number of characters in the enumeration 1 file |
| Enumeration filename 1 | `uint8_t[]` | The name of the file in the `__enumerations` subdirectory that conatins enumeration 1's data |
| Enumeration name length N | `uint32_t` | The number of characters in the enumeration N name |
| Enumeration name N | `uint8_t[]` | The name of enumeration N |
| Enumeration filename length N | `uint32_t` | The number of characters in the enumeration N file |
| Enumeration filename N | `uint8_t[]` | The name of the file in the `__enumerations` subdirectory that conatins enumeration N's data |
| CurrentDomain | [CurrentDomain](./current_domain.md) | The array current domain |
| Label N | [Dimension Label](#dimension_label) | _New in version 18_ Nth dimension label |
| Num enumerations | `uint32_t` | _New in version 20_ Number of [enumerations](./enumeration.md) in the array |
| Enumeration name length 1 | `uint32_t` | _New in version 20_ The number of characters in the enumeration 1 name |
| Enumeration name 1 | `uint8_t[]` | _New in version 20_ The name of enumeration 1 |
| Enumeration filename length 1 | `uint32_t` | _New in version 20_ The number of characters in the enumeration 1 file |
| Enumeration filename 1 | `uint8_t[]` | _New in version 20_ The name of the file in the `__enumerations` subdirectory that contains enumeration 1's data |
| Enumeration name length N | `uint32_t` | _New in version 20_ The number of characters in the enumeration N name |
| Enumeration name N | `uint8_t[]` | _New in version 20_ The name of enumeration N |
| Enumeration filename length N | `uint32_t` | _New in version 20_ The number of characters in the enumeration N file |
| Enumeration filename N | `uint8_t[]` | _New in version 20_ The name of the file in the `__enumerations` subdirectory that contains enumeration N's data |
| Current domain | [Current Domain](#current-domain) | _New in version 22_ The array's current domain |
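
As an illustration of how the version annotations in the table affect decoding, the following sketch reads only the fixed-size fields at the start of the schema, up to _Capacity_. It is not the library's reader: `parse_schema_prefix` is a hypothetical name, the input is assumed to be the already-decoded payload of the generic tile, values are assumed to be little-endian as stated in the format notes, and `bool` is assumed to occupy one byte.

```python
import struct


def parse_schema_prefix(data):
    """Decode the leading fixed-size fields of an array schema payload.

    Returns (fields, offset), where offset is the position of the next
    field in the table (the Coords filters pipeline).
    """
    offset = 0
    (version,) = struct.unpack_from("<I", data, offset)
    offset += 4
    fields = {"version": version}

    if version >= 5:
        # "Allows dups" is marked as new in version 5, so older schemas omit it.
        # Assumption: the bool is serialized as a single byte.
        (allows_dups,) = struct.unpack_from("<B", data, offset)
        offset += 1
        fields["allows_dups"] = bool(allows_dups)

    # Array type, tile order, cell order (uint8_t each) and capacity (uint64_t).
    array_type, tile_order, cell_order, capacity = struct.unpack_from("<BBBQ", data, offset)
    offset += struct.calcsize("<BBBQ")
    fields.update(
        array_type=array_type,
        tile_order=tile_order,
        cell_order=cell_order,
        capacity=capacity,
    )
    return fields, offset
```

The numeric meaning of the array type and order values is not given in this table, so the sketch returns them as raw integers.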

## Domain

The domain has internal format:

| **Field** | **Type** | **Description** |
| :--- | :--- | :--- |
| Domain datatype | `uint8_t` | _Removed in version 5_ Datatype of all dimensions |
| Num dimensions | `uint32_t` | Dimensionality/rank of the domain |
| Dimension 1 | [Dimension](#dimension) | First dimension |
| … | … | … |
@@ -84,14 +57,17 @@ The dimension has internal format:
| :--- | :--- | :--- |
| Dimension name length | `uint32_t` | Number of characters in dimension name |
| Dimension name | `uint8_t[]` | Dimension name character array |
| Dimension datatype | `uint8_t` | Datatype of the coordinate values |
| Cell val num | `uint32_t` | Number of coordinate values per cell. For variable-length dimensions, this is `std::numeric_limits<uint32_t>::max()` |
| Filters | [Filter Pipeline](./filter_pipeline.md) | The filter pipeline used on coordinate value tiles |
| Domain size | `uint64_t[]` | The domain size in bytes |
| Dimension datatype | `uint8_t` | _New in version 5_ Datatype of the coordinate values |
| Cell val num | `uint32_t` | _New in version 5_ Number of coordinate values per cell. For variable-length dimensions, this is `std::numeric_limits<uint32_t>::max()` |
| Filters | [Filter Pipeline](./filter_pipeline.md) | _New in version 5_ The filter pipeline used on coordinate value tiles |
| Domain size | `uint64_t` | _New in version 5_ The domain size in bytes |
| Domain | `uint8_t[]` | Byte array of length equal to domain size above, storing the min, max values of the dimension. |
| Null tile extent | `uint8_t` | `1` if the dimension has a null tile extent, else `0`. |
| Tile extent | `uint8_t[]` | Byte array of length equal to the dimension datatype size, storing the space tile extent of this dimension. |

> [!NOTE]
> Prior to version 5, the size of the _Domain_ field was always equal to twice the size of the dimension's data type (which is stored in the [domain](#domain) in these versions).
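
The version-dependent size of the _Domain_ field could be handled along these lines. This is a sketch under assumptions: `read_dimension_domain` is a hypothetical helper, the caller is assumed to have already consumed every field before _Domain size_, and for versions before 5 the caller supplies the datatype size taken from the domain-level datatype.

```python
import struct


def read_dimension_domain(data, offset, version, datatype_size):
    """Read the Domain bytes of a dimension starting at `offset`.

    Returns (domain_bytes, new_offset).
    """
    if version >= 5:
        # The size is stored explicitly as a uint64_t.
        (size,) = struct.unpack_from("<Q", data, offset)
        offset += 8
    else:
        # No size field: the domain holds a min and a max value,
        # each as wide as the datatype.
        size = 2 * datatype_size

    domain = bytes(data[offset:offset + size])
    if len(domain) != size:
        raise ValueError("truncated dimension domain")
    return domain, offset + size
```

For a dimension of a 32-bit signed integer type, the returned bytes could then be decoded with `struct.unpack("<ii", domain)` to obtain the min and max values.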

## Attribute

The attribute has internal format:
@@ -103,11 +79,11 @@ The attribute has internal format:
| Attribute datatype | `uint8_t` | Datatype of the attribute values |
| Cell val num | `uint32_t` | Number of attribute values per cell. For variable-length attributes, this is `std::numeric_limits<uint32_t>::max()` |
| Filters | [Filter Pipeline](./filter_pipeline.md) | The filter pipeline used on attribute value tiles |
| Fill value size | `uint64_t` | The size in bytes of the fill value |
| Fill value | `uint8_t[]` | The fill value |
| Nullable | `bool` | Whether or not the attribute can be null |
| Fill value validity | `uint8_t` | The validity fill value |
| Order | `uint8_t` | Order of the data stored in the attribute. This may be unordered, increasing or decreasing |
| Fill value size | `uint64_t` | _New in version 6_ The size in bytes of the fill value |
| Fill value | `uint8_t[]` | _New in version 6_ The fill value |
| Nullable | `bool` | _New in version 7_ Whether or not the attribute can be null |
| Fill value validity | `uint8_t` | _New in version 7_ The validity fill value |
| Order | `uint8_t` | _New in version 17_ Order of the data stored in the attribute. This may be unordered, increasing or decreasing |
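
A reader that supports older arrays needs to know which of the annotated fields are present for a given format version. One simple way to encode the annotations from the rows above is a lookup table. The names `GATED_ATTRIBUTE_FIELDS` and `gated_attribute_fields` are hypothetical, and only the fields carrying a _New in version_ marker in this hunk are listed.

```python
# (field name, first format version in which the field is written),
# in the order the fields appear in the attribute table.
GATED_ATTRIBUTE_FIELDS = [
    ("Fill value size", 6),
    ("Fill value", 6),
    ("Nullable", 7),
    ("Fill value validity", 7),
    ("Order", 17),
]


def gated_attribute_fields(version):
    """Return the version-gated attribute fields present in `version`, in order."""
    return [name for name, since in GATED_ATTRIBUTE_FIELDS if version >= since]
```

For example, `gated_attribute_fields(6)` yields only the two fill value fields, while any version from 17 onward yields all five.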

## Dimension Label

@@ -127,6 +103,19 @@ The dimension label has internal format:
| Label datatype | `uint8_t` | The datatype of the label data |
| Label cell_val_num | `uint32_t` | The number of values per cell of the label data. For variable-length labels, this is `std::numeric_limits<uint32_t>::max()` |
| Label domain size | `uint64_t` | The size of the label domain |
| Label domain start size | `uint64_t` | The size of the first value of the domain for variable-lenght datatypes. For fixed-lenght labels, this is 0|
| Label domain start size | `uint64_t` | The size of the first value of the domain for variable-length datatypes. For fixed-length labels, this is 0|
| Label domain data | `uint8_t[]`| Byte array of length equal to domain size above, storing the min, max values of the dimension |
| Is external | `uint8_t` | If the URI is not stored as part of this array |

## Current Domain

If a current domain is empty, only the version number and the empty flag are serialized to storage.

The current domain format is versioned separately from arrays. The current version is `1`.

| **Field** | **Type** | **Description** |
| :--- | :--- | :--- |
| Version number | `uint32_t` | Current domain version number |
| Empty | `uint8_t` | Whether the current domain has a representation(e.g. NDRectangle) set or not |
| Type | `uint8_t` | The type of current domain stored in this file |
| NDRectangle | [MBR](./fragment.md#mbr) | A hyperrectangle defined using [1DRange](./fragment.md#mbr) items for each dimension |
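
The rule that an empty current domain serializes only the version number and the empty flag can be sketched as below. `parse_current_domain_header` is a hypothetical helper; it assumes little-endian encoding and that a nonzero _Empty_ byte means the current domain is empty. The NDRectangle payload is left undecoded because its layout is defined in `fragment.md`.

```python
import struct


def parse_current_domain_header(data, offset=0):
    """Decode the leading fields of a serialized current domain.

    Returns (header, offset); when the current domain is not empty,
    offset points at the start of the NDRectangle data.
    """
    version, empty = struct.unpack_from("<IB", data, offset)
    offset += struct.calcsize("<IB")
    header = {"version": version, "empty": bool(empty), "type": None}

    if not header["empty"]:
        # The Type byte (and the NDRectangle after it) is only present
        # when the current domain is not empty.
        (header["type"],) = struct.unpack_from("<B", data, offset)
        offset += 1

    return header, offset
```

The mapping from the raw type value to a named kind is not given in the table, so the sketch returns it as an integer.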