Add (more) Parquet Metadata Documentation (#6184)
* Minor: Add (more) Parquet Metadata Documentation

* fix clippy
alamb authored Aug 6, 2024
1 parent d5ed6b9 commit db239e5
Showing 1 changed file with 61 additions and 0 deletions.
61 changes: 61 additions & 0 deletions parquet/src/file/metadata/mod.rs
@@ -33,6 +33,67 @@
//! * [`ColumnChunkMetaData`]: Metadata for each column chunk (primitive leaf)
//! within a Row Group including encoding and compression information,
//! number of values, statistics, etc.
//!
//! # APIs for working with Parquet Metadata
//!
//! The Parquet readers and writers in this crate handle reading and writing
//! metadata in Parquet files. To work with metadata directly,
//! the following APIs are available.
//!
//! Reading:
//! * Read from bytes to `ParquetMetaData`: [`decode_footer`]
//! and [`decode_metadata`]
//! * Read from an `async` source to `ParquetMetaData`: [`MetadataLoader`]
//!
//! [`MetadataLoader`]: https://docs.rs/parquet/latest/parquet/arrow/async_reader/struct.MetadataLoader.html
//! [`decode_footer`]: crate::file::footer::decode_footer
//! [`decode_metadata`]: crate::file::footer::decode_metadata
//!
//! Writing:
//! * Write `ParquetMetaData` to bytes in memory: Not yet supported (see [#6002])
//! * Write `ParquetMetaData` to an `async` target: Not yet supported
//!
//! [#6002]: https://github.com/apache/arrow-rs/issues/6002
//!
//! # Metadata Encodings and Structures
//!
//! There are three different encodings of Parquet Metadata in this crate:
//!
//! 1. `bytes`: encoded with the Thrift TCompactProtocol as defined in
//! [parquet.thrift]
//!
//! 2. [`format`]: Rust structures automatically generated by the thrift compiler
//! from [parquet.thrift]. These structures are low level and mirror
//! the thrift definitions.
//!
//! 3. [`file::metadata`] (this module): Easier-to-use Rust structures
//! with a more idiomatic API. Note that, confusingly, some but not all
//! of these structures have the same names as the [`format`] structures.
//!
//! [`format`]: crate::format
//! [`file::metadata`]: crate::file::metadata
//! [parquet.thrift]: https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift
//!
//! Graphically, this is how the different structures relate to each other:
//!
//! ```text
//!                          ┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─          ┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─
//!                            ┌──────────────┐     │          ┌───────────────────────┐ │
//!                          │ │ ColumnIndex  │               ││    ParquetMetaData    │
//!                            └──────────────┘     │          └───────────────────────┘ │
//! ┌──────────────┐         │ ┌────────────────┐             │┌───────────────────────┐
//! │   ..0x24..   │ ◀────▶    │  OffsetIndex   │   │ ◀────▶   │    ParquetMetaData    │ │
//! └──────────────┘         │ └────────────────┘             │└───────────────────────┘
//!       ...                                       │                     ...            │
//!                          │ ┌──────────────────┐           │ ┌──────────────────┐
//!      bytes                 │  FileMetaData*   │ │           │  FileMetaData*   │     │
//! (thrift encoded)         │ └──────────────────┘           │ └──────────────────┘
//!                           ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┘          ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┘
//!
//!                          format::meta structures           file::metadata structures
//!
//!                          * Same name, different struct
//! ```
mod memory;

use std::ops::Range;
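
Note on the "Reading" APIs documented above: `decode_footer` operates on the fixed 8-byte footer that ends a Parquet file, which holds a 4-byte little-endian length of the Thrift-encoded metadata followed by the magic bytes `PAR1`. The sketch below illustrates that layout using only the standard library. `footer_metadata_len` is a hypothetical helper written for this note; it is not the crate's implementation or API, and it ignores encrypted files, which may use a different magic.

```rust
/// Size in bytes of the Parquet footer: a 4-byte length plus the 4-byte magic.
const FOOTER_SIZE: usize = 8;
/// Magic bytes that end an unencrypted Parquet file.
const PARQUET_MAGIC: [u8; 4] = *b"PAR1";

/// Hypothetical helper (NOT part of the `parquet` crate): returns the length
/// of the Thrift-encoded metadata, or `None` if the input is malformed.
fn footer_metadata_len(file_tail: &[u8]) -> Option<usize> {
    // The footer is the last 8 bytes of the file.
    let start = file_tail.len().checked_sub(FOOTER_SIZE)?;
    let footer = &file_tail[start..];
    // The last 4 bytes must be the magic.
    if footer[4..] != PARQUET_MAGIC[..] {
        return None;
    }
    // The first 4 bytes are the metadata length, little-endian.
    let len = u32::from_le_bytes([footer[0], footer[1], footer[2], footer[3]]);
    Some(len as usize)
}

fn main() {
    // Fake file tail: 3 placeholder metadata bytes, then length = 3, then magic.
    let mut tail = vec![0xAA, 0xBB, 0xCC];
    tail.extend_from_slice(&3u32.to_le_bytes());
    tail.extend_from_slice(b"PAR1");

    let len = footer_metadata_len(&tail).expect("valid footer");
    // The metadata bytes sit immediately before the footer.
    let end = tail.len() - FOOTER_SIZE;
    let metadata = &tail[end - len..end];
    println!("metadata length = {len}, bytes = {metadata:?}");
}
```

In the crate itself, the bytes located this way would then be passed to `decode_metadata` to obtain a `ParquetMetaData`; the placeholder bytes here are not valid Thrift and are only used to show the offsets.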