Describe the bug
DataFusion joins seem to produce incorrect results when there is a collision in the hash function. This is very rare, but it can happen.
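To illustrate why a collision can corrupt join output, here is a minimal sketch of a hash join probe. It is not DataFusion's implementation: `colliding_hash`, `build_table_index`, `inner_join` and the `check_keys` flag are hypothetical names invented for this example, and the keys are plain `i32` values rather than Arrow arrays. The point is only that rows sharing a hash bucket must still be compared by key before they are emitted.

```rust
use std::collections::HashMap;

/// Hypothetical hash function that maps every key to the same value,
/// mimicking what a "force collisions" test mode would do.
fn colliding_hash(_key: i32) -> u64 {
    0
}

/// Build side: map each hash value to the left row indices that produced it.
fn build_table_index(left: &[i32], hash: fn(i32) -> u64) -> HashMap<u64, Vec<usize>> {
    let mut table: HashMap<u64, Vec<usize>> = HashMap::new();
    for (i, &key) in left.iter().enumerate() {
        table.entry(hash(key)).or_default().push(i);
    }
    table
}

/// Inner join returning (left_index, right_index) pairs.
/// With `check_keys == false` the hash match alone is trusted,
/// which is only correct if the hash function never collides.
fn inner_join(
    left: &[i32],
    right: &[i32],
    hash: fn(i32) -> u64,
    check_keys: bool,
) -> Vec<(usize, usize)> {
    let table = build_table_index(left, hash);
    let mut out = Vec::new();
    for (r, &right_key) in right.iter().enumerate() {
        if let Some(candidates) = table.get(&hash(right_key)) {
            for &l in candidates {
                // Equal hashes do not imply equal keys, so compare the keys too.
                if !check_keys || left[l] == right_key {
                    out.push((l, r));
                }
            }
        }
    }
    out
}
```

With left keys `[1, 2, 3]` and right keys `[2, 4]`, the checked variant yields the single pair `(1, 0)`, while the unchecked variant pairs every left row with every right row once all hashes collide.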
To Reproduce
After #842 is merged, remove the #[cfg(not(feature = "force_hash_collisions"))] gate from the join tests and run:
cd datafusion
cargo test --features=force_hash_collisions
Here is a diff that does so:
diff --git a/datafusion/src/physical_plan/hash_join.rs b/datafusion/src/physical_plan/hash_join.rs
index fa75437e3..1a57c404e 100644
--- a/datafusion/src/physical_plan/hash_join.rs
+++ b/datafusion/src/physical_plan/hash_join.rs
@@ -1372,8 +1372,6 @@ mod tests {
}
#[tokio::test]
- // Disable until https://github.com/apache/arrow-datafusion/issues/843 fixed
- #[cfg(not(feature = "force_hash_collisions"))]
async fn join_full_multi_batch() {
let left = build_table(
("a1", &vec![1, 2, 3]),
@@ -1639,8 +1637,6 @@ mod tests {
}
#[tokio::test]
- // Disable until https://github.com/apache/arrow-datafusion/issues/843 fixed
- #[cfg(not(feature = "force_hash_collisions"))]
async fn join_right_one() -> Result<()> {
let left = build_table(
("a1", &vec![1, 2, 3]),
@@ -1677,8 +1673,6 @@ mod tests {
}
#[tokio::test]
- // Disable until https://github.com/apache/arrow-datafusion/issues/843 fixed
- #[cfg(not(feature = "force_hash_collisions"))]
async fn partitioned_join_right_one() -> Result<()> {
let left = build_table(
("a1", &vec![1, 2, 3]),
@@ -1716,8 +1710,6 @@ mod tests {
}
#[tokio::test]
- // Disable until https://github.com/apache/arrow-datafusion/issues/843 fixed
- #[cfg(not(feature = "force_hash_collisions"))]
async fn join_full_one() -> Result<()> {
let left = build_table(
("a1", &vec![1, 2, 3]),
diff --git a/datafusion/tests/sql.rs b/datafusion/tests/sql.rs
index 046e4f28e..0c33bd477 100644
--- a/datafusion/tests/sql.rs
+++ b/datafusion/tests/sql.rs
@@ -1797,8 +1797,6 @@ async fn equijoin_left_and_condition_from_right() -> Result<()> {
}
#[tokio::test]
-// Disable until https://github.com/apache/arrow-datafusion/issues/843 fixed
-#[cfg(not(feature = "force_hash_collisions"))]
async fn equijoin_right_and_condition_from_left() -> Result<()> {
let mut ctx = create_join_context("t1_id", "t2_id")?;
let sql =
@@ -1852,8 +1850,6 @@ async fn left_join() -> Result<()> {
}
#[tokio::test]
-// Disable until https://github.com/apache/arrow-datafusion/issues/843 fixed
-#[cfg(not(feature = "force_hash_collisions"))]
async fn right_join() -> Result<()> {
let mut ctx = create_join_context("t1_id", "t2_id")?;
let equivalent_sql = [
@@ -1874,8 +1870,6 @@ async fn right_join() -> Result<()> {
}
#[tokio::test]
-// Disable until https://github.com/apache/arrow-datafusion/issues/843 fixed
-#[cfg(not(feature = "force_hash_collisions"))]
async fn full_join() -> Result<()> {
let mut ctx = create_join_context("t1_id", "t2_id")?;
let equivalent_sql = [
I think there is something else happening in the FULL/LEFT/RIGHT join implementations (besides the error above, which doesn't seem to be the cause):
If a row doesn't match the constraint, it currently emits a "null row" for each value it didn't match. This should only be done if the row doesn't match any value within the checked indices.
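The behavior described above can be sketched as follows. This is an illustrative model under stated assumptions, not the code in hash_join.rs: `always_zero_hash` and `probe_outer_join` are hypothetical names, keys are plain `i32` values, and only the probe side is treated as the preserved (outer) side. A probe row gets a null row only after all candidate indices in its bucket have been checked and none matched.

```rust
use std::collections::HashMap;

/// Hypothetical hash function that forces every key into one bucket.
fn always_zero_hash(_key: i32) -> u64 {
    0
}

/// Outer join that preserves every probe row.
/// Returns (Some(build_index), probe_index) for matches and
/// (None, probe_index) for probe rows without any match.
fn probe_outer_join(
    build: &[i32],
    probe: &[i32],
    hash: fn(i32) -> u64,
) -> Vec<(Option<usize>, usize)> {
    // Build phase: hash value -> build row indices.
    let mut table: HashMap<u64, Vec<usize>> = HashMap::new();
    for (i, &key) in build.iter().enumerate() {
        table.entry(hash(key)).or_default().push(i);
    }

    let mut out = Vec::new();
    for (p, &probe_key) in probe.iter().enumerate() {
        let mut matched = false;
        if let Some(candidates) = table.get(&hash(probe_key)) {
            for &b in candidates {
                if build[b] == probe_key {
                    out.push((Some(b), p));
                    matched = true;
                }
                // Emitting (None, p) here for each non-matching candidate
                // would reproduce the bug: one null row per collision.
            }
        }
        // Emit a single null row only if no candidate matched at all.
        if !matched {
            out.push((None, p));
        }
    }
    out
}
```

With build keys `[1, 2, 3]` and probe keys `[2, 4]`, all hashes collide, yet the result contains one matched row for probe key 2 and exactly one null row for probe key 4.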
Running the tests with the gate removed results in the following failures:
Expected behavior
Tests should pass