More fixes.
teo-tsirpanis committed Oct 9, 2024
1 parent ae4fd0c commit b45e03f
Showing 5 changed files with 13 additions and 12 deletions.
2 changes: 1 addition & 1 deletion format_spec/FORMAT_SPEC.md
@@ -9,7 +9,7 @@ title: Format Specification
* Data written by TileDB and referenced in this document is **little-endian**
with the following exceptions:

- [Dictionary filter](filters/dictionary_encoding.md)
- [Dictionary encoding filter](filters/dictionary_encoding.md)
- RLE filter

## Table of Contents
5 changes: 3 additions & 2 deletions format_spec/array_format_history.md
@@ -32,6 +32,7 @@ Introduced in TileDB 2.17
Introduced in TileDB 2.16

* [Vacuum files](./vacuum_file.md) contain relative paths to the location of the array.
* The [filter pipeline options](./filter_pipeline.md#filter-options) for the delta filter contain the _Reinterpret datatype_ field.

## Version 18

@@ -44,7 +45,7 @@ Introduced in TileDB 2.15
Introduced in TileDB 2.14

* The _Order_ field was added to [attributes](./array_schema.md#attribute).
* Cell offsets in dimensions or attributes of UTF-8 string type are not written in the offset tiles, if the RLE or dictionary filter exists in the filter pipeline. They are instead encoded as part of the data tile.
* Cell offsets in dimensions or attributes of UTF-8 string type are not written in the offset tiles, if the RLE or dictionary encoding filter exists in the filter pipeline. They are instead encoded as part of the data tile.

## Version 16

@@ -71,7 +72,7 @@ Introduced in TileDB 2.10

Introduced in TileDB 2.9

* The [dictionary filter](./filters/dictionary_encoding.md) was added.
* Cell offsets in dimensions or attributes of ASCII string type are not written in the offset tiles, if the dictionary encoding filter exists in the filter pipeline. They are instead encoded as part of the data tile.

## Version 12

2 changes: 1 addition & 1 deletion format_spec/filter_pipeline.md
@@ -81,7 +81,7 @@ For the `TILEDB_FILTER_DELTA` and `TILEDB_FILTER_DOUBLE_DELTA` compression filte
| Reinterpret datatype | `uint8_t` | Type to reinterpret data prior to compression. |

> [!NOTE]
> Prior to version 20, the `Reinterpret datatype` field was not present for the double delta filter.
> Prior to version 20, the _Reinterpret datatype_ field was not present for the double delta filter. Also prior to version 19, the same field was not present for the delta filter.
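
The version rule in the note above can be sketched as a small lookup. This is a minimal illustration and not TileDB code: the function name `has_reinterpret_datatype` and the use of plain strings for filter types are assumptions made for the example; only the filter names and the version numbers 19 and 20 come from the text.

```python
# Hypothetical helper (not part of TileDB): first format version in which
# the "Reinterpret datatype" field is present, per the note above.
FIRST_VERSION_WITH_REINTERPRET = {
    "TILEDB_FILTER_DELTA": 19,
    "TILEDB_FILTER_DOUBLE_DELTA": 20,
}


def has_reinterpret_datatype(filter_type: str, format_version: int) -> bool:
    """Return True if the field is expected for this filter and version."""
    minimum = FIRST_VERSION_WITH_REINTERPRET.get(filter_type)
    # Filters not listed above are assumed never to carry the field.
    return minimum is not None and format_version >= minimum
```

A reader of the filter options would consult such a check before attempting to parse the field, so that older arrays are not misread.
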
### Bit-width Reduction Options

4 changes: 2 additions & 2 deletions format_spec/filters/dictionary_encoding.md
@@ -12,7 +12,7 @@ As an example in pseudocode:
output_data = [0, 0, 0, 1, 1, 2, 0, 1]
```

The dictionary filter is supported only for variable-sized strings and must be the first filter in the filter pipeline.
The dictionary encoding filter is supported only for variable-sized strings and must be the first filter in the filter pipeline.

# Filter Enum Value

@@ -26,7 +26,7 @@ The filter enum value for the Dictionary Encoding filter is `14` (`TILEDB_FILTER

All the above integers are stored in big-endian format.

Because the dictionary filter works on variable-sized cells of data, it filters the cell data and offsets combined and its output gets stored in the variable-sized data file, after applying any subsequent filters. The fixed-sized data file does not contain any data.
Because the dictionary encoding filter works on variable-sized cells of data, it filters the cell data and offsets combined and its output gets stored in the variable-sized data file, after applying any subsequent filters. The fixed-sized data file does not contain any data.

> [!NOTE]
> Prior to version 13 for ASCII strings and version 17 for UTF-8 strings, the offsets buffers was separately filtered as well due to an oversight. Accessing the cell offsets only is generally not useful to implementations.
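
The idea behind the dictionary encoding filter can be sketched in a few lines of Python. This is an illustration only and does not reproduce the on-disk layout, which is not shown in this diff: the input strings, the function names, and the 4-byte width used in `pack_big_endian_u32` are all assumptions. Only the output IDs `[0, 0, 0, 1, 1, 2, 0, 1]` and the big-endian requirement come from the text.

```python
import struct


def dictionary_encode(cells):
    """Replace each string with the index of its first occurrence."""
    dictionary = []   # unique strings, in order of first appearance
    index_of = {}     # string -> position in `dictionary`
    ids = []
    for cell in cells:
        if cell not in index_of:
            index_of[cell] = len(dictionary)
            dictionary.append(cell)
        ids.append(index_of[cell])
    return dictionary, ids


def dictionary_decode(dictionary, ids):
    """Inverse of dictionary_encode."""
    return [dictionary[i] for i in ids]


def pack_big_endian_u32(value):
    """Serialize an integer big-endian; the 4-byte width is an assumption."""
    return struct.pack(">I", value)


# Hypothetical input chosen to reproduce the IDs in the pseudocode example.
input_data = ["a", "a", "a", "bb", "bb", "ccc", "a", "bb"]
dictionary, output_data = dictionary_encode(input_data)
```

Because each variable-sized string is replaced by a fixed-width ID, the cell boundaries no longer need to be stored separately, which is consistent with the offsets being encoded as part of the data.
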
12 changes: 6 additions & 6 deletions format_spec/vacuum_file.md
@@ -5,20 +5,20 @@ title: Vacuum File
A vacuum file has name `[`<timestamped_name>`](./timestamped_name.md)`.vac` and can be located either in the array commit folder:

```
my_array # array folder
my_array # array folder
|_ ....
|_ __commits # array commit folder
|_ <timestamped_name>.vac # vacuum file
|_ __commits # array commit folder
|_ <timestamped_name>.vac # vacuum file
```

or in the array or group metadata folder:

```
my_obj # array/group folder
my_obj # array/group folder
| ...
| __meta # array metadata folder
| __meta # metadata folder
| ...
| <timestamped_name>.vac # vacuum file
| <timestamped_name>.vac # vacuum file
| ...
```
