docs(common): add more docs for DataChunk #8736

Merged: 5 commits on Mar 23, 2023
Changes from 3 commits
14 changes: 12 additions & 2 deletions src/common/src/array/column.rs
@@ -20,8 +20,18 @@ use risingwave_pb::data::PbColumn;
use super::{Array, ArrayError, ArrayResult, I64Array};
use crate::array::{ArrayImpl, ArrayRef};

/// Column is owned by `DataChunk`. It consists of logic data type and physical array
/// implementation.
/// A [`Column`] consists of its logical data type
/// and its corresponding physical array implementation.
/// The array contains all the datums bound to this [`Column`].
/// [`Column`] is owned by [`DataChunk`].
///
/// For instance, in the following [`DataChunk`],
/// the [`ArrayRef`] of column `v1` contains `[1, 1, 1]`:
///
/// | v1 | v2 |
/// |----|----|
/// | 1 | a |
/// | 1 | b |
/// | 1 | c |
#[derive(Clone, Debug, PartialEq)]
pub struct Column {
array: ArrayRef,
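To make the `Column` doc comment above concrete, here is a minimal sketch of the same idea. `ToyArray`, `ToyArrayRef`, `ToyColumn`, and `example_columns` are hypothetical stand-ins invented for illustration; they are not RisingWave's `ArrayImpl`, `ArrayRef`, or `Column`, and the real types carry far more than this.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for a physical array: one variant per type.
#[derive(Debug, PartialEq)]
enum ToyArray {
    Int64(Vec<Option<i64>>),
    Utf8(Vec<Option<String>>),
}

/// Hypothetical stand-in for a shared array reference.
type ToyArrayRef = Arc<ToyArray>;

/// A column that holds all datums of one field of the chunk.
#[derive(Clone, Debug, PartialEq)]
struct ToyColumn {
    array: ToyArrayRef,
}

impl ToyColumn {
    fn new(array: ToyArray) -> Self {
        Self { array: Arc::new(array) }
    }

    /// Number of datums stored in this column.
    fn len(&self) -> usize {
        match &*self.array {
            ToyArray::Int64(values) => values.len(),
            ToyArray::Utf8(values) => values.len(),
        }
    }

    /// In this sketch the logical type is implied by the array variant.
    fn data_type_name(&self) -> &'static str {
        match &*self.array {
            ToyArray::Int64(_) => "int64",
            ToyArray::Utf8(_) => "utf8",
        }
    }
}

/// Builds the two columns from the table in the doc comment.
fn example_columns() -> (ToyColumn, ToyColumn) {
    let v1 = ToyColumn::new(ToyArray::Int64(vec![Some(1), Some(1), Some(1)]));
    let v2 = ToyColumn::new(ToyArray::Utf8(vec![
        Some("a".to_string()),
        Some("b".to_string()),
        Some("c".to_string()),
    ]));
    (v1, v2)
}
```

Cloning a `ToyColumn` only bumps the reference count of the shared array, which is one plausible reason to hold the array behind a reference-counted pointer.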
31 changes: 29 additions & 2 deletions src/common/src/array/data_chunk.rs
@@ -34,7 +34,23 @@ use crate::util::hash_util::finalize_hashers;
use crate::util::iter_util::{ZipEqDebug, ZipEqFast};
use crate::util::value_encoding::{serialize_datum_into, ValueRowSerializer};

/// `DataChunk` is a collection of arrays with visibility mask.
/// [`DataChunk`] is a collection of columns,
/// with a visibility mask for each row.
/// For instance, we could have a [`DataChunk`] of this format:
///
/// | v1 | v2 | v3 |
/// |----|----|----|
/// | 1 | a | t |
/// | 2 | b | f |
/// | 3 | c | t |
/// | 4 | d | f |
///
/// Our columns are `v1`, `v2`, `v3`.
/// Then, if the visibility mask hides rows 2 and 4,
/// we will only have these rows visible:
///
/// | v1 | v2 | v3 |
/// |----|----|----|
/// | 1 | a | t |
/// | 3 | c | t |
#[derive(Clone, PartialEq)]
#[must_use]
pub struct DataChunk {
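The visibility example in the doc comment above can be sketched in a few lines. `ToyChunk`, `visible_rows`, and `example_chunk` are hypothetical names used only for illustration, and the mask is modelled as a plain `Vec<bool>` rather than the bitmap type the real `DataChunk` uses.

```rust
/// Hypothetical chunk with three fixed columns and a per-row visibility mask.
struct ToyChunk {
    v1: Vec<i64>,
    v2: Vec<char>,
    v3: Vec<bool>,
    /// `visibility[i] == false` means row `i` is hidden, not deleted.
    visibility: Vec<bool>,
}

impl ToyChunk {
    /// Collects only the rows whose visibility bit is set.
    fn visible_rows(&self) -> Vec<(i64, char, bool)> {
        (0..self.visibility.len())
            .filter(|&i| self.visibility[i])
            .map(|i| (self.v1[i], self.v2[i], self.v3[i]))
            .collect()
    }
}

/// Builds the four-row chunk from the doc comment, hiding rows 2 and 4.
fn example_chunk() -> ToyChunk {
    ToyChunk {
        v1: vec![1, 2, 3, 4],
        v2: vec!['a', 'b', 'c', 'd'],
        v3: vec![true, false, true, false],
        visibility: vec![true, false, true, false],
    }
}
```

Note that the hidden rows still occupy space in every column; only the mask says they should be skipped.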
@@ -170,7 +186,18 @@ impl DataChunk {
}

/// `compact` will convert the chunk to compact format.
/// Compact format means that `visibility == None`.
/// Compacting removes the hidden rows and returns a chunk whose
/// visibility mask marks every remaining row as visible.
///
/// `compact` has trade-offs:
///
/// Cost:
/// It has to rebuild each column, which incurs the cost
/// of copying bytes from the original column array to the new one.
///
/// Benefit:
/// The main benefit is that the data chunk is smaller, taking up less memory.
/// We can also save the cost of iterating over many hidden rows.
pub fn compact(self) -> Self {
match &self.vis2 {
Vis::Compact(_) => self,
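The trade-off described for `compact` can be illustrated with a simplified model. This is a sketch under assumptions, not the actual implementation: `ToyVis` and `ToyChunk` are hypothetical types, columns are plain `Vec<i64>`, and the mask is a `Vec<bool>`.

```rust
/// Hypothetical visibility: either a per-row mask or "all `n` rows visible".
#[derive(Debug, PartialEq, Clone)]
enum ToyVis {
    Bitmap(Vec<bool>),
    Compact(usize),
}

/// Hypothetical chunk: every inner `Vec` is one column.
#[derive(Debug, PartialEq, Clone)]
struct ToyChunk {
    columns: Vec<Vec<i64>>,
    vis: ToyVis,
}

impl ToyChunk {
    /// Drops hidden rows. An already compact chunk is returned unchanged.
    fn compact(self) -> Self {
        match &self.vis {
            ToyVis::Compact(_) => self,
            ToyVis::Bitmap(bits) => {
                // Cost: every column is rebuilt by copying its visible values.
                let columns: Vec<Vec<i64>> = self
                    .columns
                    .iter()
                    .map(|col| {
                        col.iter()
                            .zip(bits.iter())
                            .filter(|(_, visible)| **visible)
                            .map(|(value, _)| *value)
                            .collect()
                    })
                    .collect();
                // Benefit: the result is smaller and has no hidden rows to skip.
                let cardinality = bits.iter().filter(|b| **b).count();
                ToyChunk {
                    columns,
                    vis: ToyVis::Compact(cardinality),
                }
            }
        }
    }
}
```

Whether compacting pays off depends on how many rows are hidden: copying is wasted work when almost every row is visible, and worthwhile when later operators would otherwise iterate over mostly hidden rows.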
3 changes: 2 additions & 1 deletion src/common/src/array/mod.rs
@@ -496,7 +496,8 @@ macro_rules! impl_array_builder {
}
}

/// Append a [`Datum`] or [`DatumRef`] multiple times, return error while type not match.
/// Append a [`Datum`] or [`DatumRef`] multiple times,
/// panicking if the datum's type does not match the array builder's type.
pub fn append_datum_n(&mut self, n: usize, datum: impl ToDatumRef) {
match datum.to_datum_ref() {
None => match self {
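The corrected comment says a type mismatch panics instead of returning an error. A minimal sketch of that contract follows; `ToyScalar` and `ToyBuilder` are hypothetical types made up for this example, and the real method is generated by the `impl_array_builder` macro over many more variants.

```rust
/// Hypothetical scalar value; `None` at the call site stands for NULL.
#[derive(Debug, Clone, PartialEq)]
enum ToyScalar {
    Int64(i64),
    Utf8(String),
}

/// Hypothetical builder with one variant per physical array type.
#[derive(Debug, PartialEq)]
enum ToyBuilder {
    Int64(Vec<Option<i64>>),
    Utf8(Vec<Option<String>>),
}

impl ToyBuilder {
    /// Appends `datum` `n` times, panicking if its type does not match the builder.
    fn append_datum_n(&mut self, n: usize, datum: Option<ToyScalar>) {
        match (self, datum) {
            // NULL is accepted by every builder variant.
            (ToyBuilder::Int64(values), None) => {
                values.extend(std::iter::repeat(None).take(n))
            }
            (ToyBuilder::Utf8(values), None) => {
                values.extend(std::iter::repeat(None).take(n))
            }
            // Matching types: append the value `n` times.
            (ToyBuilder::Int64(values), Some(ToyScalar::Int64(x))) => {
                values.extend(std::iter::repeat(Some(x)).take(n))
            }
            (ToyBuilder::Utf8(values), Some(ToyScalar::Utf8(s))) => {
                values.extend(std::iter::repeat(Some(s)).take(n))
            }
            // Any other combination is a caller bug, so panic rather than return an error.
            (builder, Some(other)) => {
                panic!("type mismatch: builder {:?}, datum {:?}", builder, other)
            }
        }
    }
}
```

Because the function returns `()`, callers have no error to handle; a mismatch is treated as a programming mistake.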
8 changes: 6 additions & 2 deletions src/common/src/array/vis.rs
@@ -17,11 +17,15 @@ use itertools::repeat_n;

use crate::buffer::{Bitmap, BitmapBuilder};

/// `Vis` is a visibility bitmap of rows. When all rows are visible, it is considered compact and
/// is represented by a single cardinality number rather than that many of ones.
/// `Vis` is a visibility bitmap of rows.
#[derive(Clone, PartialEq, Debug)]
pub enum Vis {
/// Non-compact variant.
/// Certain rows are excluded using this bitmap.
Bitmap(Bitmap),

/// Compact variant which just stores cardinality of rows.
/// This is used when all rows are visible.
Compact(usize), // equivalent to all ones of this size
}

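The two `Vis` variants can be read as two encodings of the same information. The sketch below assumes a `Vec<bool>` in place of `Bitmap`; `ToyVis` and its methods `len`, `is_visible`, and `visible_count` are hypothetical names for illustration and are not claimed to match the methods on the real `Vis`.

```rust
/// Hypothetical mirror of the two variants described above.
#[derive(Clone, PartialEq, Debug)]
enum ToyVis {
    /// Non-compact: some rows may be hidden.
    Bitmap(Vec<bool>),
    /// Compact: all rows visible, so only the row count is stored.
    Compact(usize),
}

impl ToyVis {
    /// Total number of rows covered, hidden or not.
    fn len(&self) -> usize {
        match self {
            ToyVis::Bitmap(bits) => bits.len(),
            ToyVis::Compact(n) => *n,
        }
    }

    /// Whether row `idx` is visible; out-of-range indices count as not visible.
    fn is_visible(&self, idx: usize) -> bool {
        match self {
            ToyVis::Bitmap(bits) => bits.get(idx).copied().unwrap_or(false),
            ToyVis::Compact(n) => idx < *n,
        }
    }

    /// Number of visible rows.
    fn visible_count(&self) -> usize {
        match self {
            ToyVis::Bitmap(bits) => bits.iter().filter(|b| **b).count(),
            ToyVis::Compact(n) => *n,
        }
    }
}
```

The compact variant answers every query in constant time and needs no per-row storage, which is the point of representing "all ones" as a single number.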