Skip to content

Commit

Permalink
Improve default implementation of Array::is_nullable (#6721)
Browse files Browse the repository at this point in the history
* Improve default implementation of Array::is_nullable

Since is_nullable returns `false` if the array is guaranteed to not
contain any logical nulls, the correct default implementation could
leverage `self.logical_null_count` more appropriately than
`self.null_count`.

To reduce chance of negative surprises in downstream projects, we could
introduce the new behavior under `is_logically_nullable` name and
deprecate `is_nullable` without changing it.

* Implement efficient DictionaryArray::logical_null_count
  • Loading branch information
findepi authored Nov 20, 2024
1 parent 70f415d commit 7c50fa8
Show file tree
Hide file tree
Showing 2 changed files with 21 additions and 2 deletions.
20 changes: 20 additions & 0 deletions arrow-array/src/array/dictionary_array.rs
Original file line number Diff line number Diff line change
Expand Up @@ -749,6 +749,26 @@ impl<T: ArrowDictionaryKeyType> Array for DictionaryArray<T> {
}
}

fn logical_null_count(&self) -> usize {
match (self.keys.nulls(), self.values.logical_nulls()) {
(None, None) => 0,
(Some(key_nulls), None) => key_nulls.null_count(),
(None, Some(value_nulls)) => self
.keys
.values()
.iter()
.filter(|k| value_nulls.is_null(k.as_usize()))
.count(),
(Some(key_nulls), Some(value_nulls)) => self
.keys
.values()
.iter()
.enumerate()
.filter(|(idx, k)| key_nulls.is_null(*idx) || value_nulls.is_null(k.as_usize()))
.count(),
}
}

fn is_nullable(&self) -> bool {
!self.is_empty() && (self.nulls().is_some() || self.values.is_nullable())
}
Expand Down
3 changes: 1 addition & 2 deletions arrow-array/src/array/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -317,8 +317,7 @@ pub trait Array: std::fmt::Debug + Send + Sync {
/// even if the nulls present in [`DictionaryArray::values`] are not referenced by any key,
/// and therefore would not appear in [`Array::logical_nulls`].
fn is_nullable(&self) -> bool {
// TODO this is not necessarily perfect default implementation, since null_count() and logical_null_count() are not always equivalent
self.null_count() != 0
self.logical_null_count() != 0
}

/// Returns the total number of bytes of memory pointed to by this array.
Expand Down

0 comments on commit 7c50fa8

Please sign in to comment.