Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect results for joins on hash collisions #843

Closed
alamb opened this issue Aug 9, 2021 · 3 comments · Fixed by #845
Closed

Incorrect results for joins on hash collisions #843

alamb opened this issue Aug 9, 2021 · 3 comments · Fixed by #845
Labels
bug Something isn't working

Comments

@alamb
Copy link
Contributor

alamb commented Aug 9, 2021

Describe the bug
DataFusion joins seem to produce incorrect results when there is a collision in the hash function. This is very rare, but it can happen.

To Reproduce
After #842 is merged, remove the #[cfg(not(feature = "force_hash_collisions"))] gate from the join tests, and run

cd datafusion
cargo test --features=force_hash_collisions

Here is a diff that does so:

diff --git a/datafusion/src/physical_plan/hash_join.rs b/datafusion/src/physical_plan/hash_join.rs
index fa75437e3..1a57c404e 100644
--- a/datafusion/src/physical_plan/hash_join.rs
+++ b/datafusion/src/physical_plan/hash_join.rs
@@ -1372,8 +1372,6 @@ mod tests {
     }
 
     #[tokio::test]
-    // Disable until https://github.com/apache/arrow-datafusion/issues/843 fixed
-    #[cfg(not(feature = "force_hash_collisions"))]
     async fn join_full_multi_batch() {
         let left = build_table(
             ("a1", &vec![1, 2, 3]),
@@ -1639,8 +1637,6 @@ mod tests {
     }
 
     #[tokio::test]
-    // Disable until https://github.com/apache/arrow-datafusion/issues/843 fixed
-    #[cfg(not(feature = "force_hash_collisions"))]
     async fn join_right_one() -> Result<()> {
         let left = build_table(
             ("a1", &vec![1, 2, 3]),
@@ -1677,8 +1673,6 @@ mod tests {
     }
 
     #[tokio::test]
-    // Disable until https://github.com/apache/arrow-datafusion/issues/843 fixed
-    #[cfg(not(feature = "force_hash_collisions"))]
     async fn partitioned_join_right_one() -> Result<()> {
         let left = build_table(
             ("a1", &vec![1, 2, 3]),
@@ -1716,8 +1710,6 @@ mod tests {
     }
 
     #[tokio::test]
-    // Disable until https://github.com/apache/arrow-datafusion/issues/843 fixed
-    #[cfg(not(feature = "force_hash_collisions"))]
     async fn join_full_one() -> Result<()> {
         let left = build_table(
             ("a1", &vec![1, 2, 3]),
diff --git a/datafusion/tests/sql.rs b/datafusion/tests/sql.rs
index 046e4f28e..0c33bd477 100644
--- a/datafusion/tests/sql.rs
+++ b/datafusion/tests/sql.rs
@@ -1797,8 +1797,6 @@ async fn equijoin_left_and_condition_from_right() -> Result<()> {
 }
 
 #[tokio::test]
-// Disable until https://github.com/apache/arrow-datafusion/issues/843 fixed
-#[cfg(not(feature = "force_hash_collisions"))]
 async fn equijoin_right_and_condition_from_left() -> Result<()> {
     let mut ctx = create_join_context("t1_id", "t2_id")?;
     let sql =
@@ -1852,8 +1850,6 @@ async fn left_join() -> Result<()> {
 }
 
 #[tokio::test]
-// Disable until https://github.com/apache/arrow-datafusion/issues/843 fixed
-#[cfg(not(feature = "force_hash_collisions"))]
 async fn right_join() -> Result<()> {
     let mut ctx = create_join_context("t1_id", "t2_id")?;
     let equivalent_sql = [
@@ -1874,8 +1870,6 @@ async fn right_join() -> Result<()> {
 }
 
 #[tokio::test]
-// Disable until https://github.com/apache/arrow-datafusion/issues/843 fixed
-#[cfg(not(feature = "force_hash_collisions"))]
 async fn full_join() -> Result<()> {
     let mut ctx = create_join_context("t1_id", "t2_id")?;
     let equivalent_sql = [

This results in the following failures:

failures:

---- physical_plan::hash_join::tests::join_full_one stdout ----
thread 'physical_plan::hash_join::tests::join_full_one' panicked at 'assertion failed: `(left == right)`
  left: `["+----+----+----+----+----+----+", "| a1 | b1 | c1 | a2 | b2 | c2 |", "+----+----+----+----+----+----+", "|    |    |    | 30 | 6  | 90 |", "| 1  | 4  | 7  | 10 | 4  | 70 |", "| 2  | 5  | 8  | 20 | 5  | 80 |", "| 3  | 7  | 9  |    |    |    |", "+----+----+----+----+----+----+"]`,
 right: `["+----+----+----+----+----+----+", "| a1 | b1 | c1 | a2 | b2 | c2 |", "+----+----+----+----+----+----+", "|    |    |    | 10 | 4  | 70 |", "|    |    |    | 10 | 4  | 70 |", "|    |    |    | 20 | 5  | 80 |", "|    |    |    | 20 | 5  | 80 |", "|    |    |    | 30 | 6  | 90 |", "|    |    |    | 30 | 6  | 90 |", "|    |    |    | 30 | 6  | 90 |", "| 1  | 4  | 7  | 10 | 4  | 70 |", "| 2  | 5  | 8  | 20 | 5  | 80 |", "| 3  | 7  | 9  |    |    |    |", "+----+----+----+----+----+----+"]`: 

expected:

[
    "+----+----+----+----+----+----+",
    "| a1 | b1 | c1 | a2 | b2 | c2 |",
    "+----+----+----+----+----+----+",
    "|    |    |    | 30 | 6  | 90 |",
    "| 1  | 4  | 7  | 10 | 4  | 70 |",
    "| 2  | 5  | 8  | 20 | 5  | 80 |",
    "| 3  | 7  | 9  |    |    |    |",
    "+----+----+----+----+----+----+",
]
actual:

[
    "+----+----+----+----+----+----+",
    "| a1 | b1 | c1 | a2 | b2 | c2 |",
    "+----+----+----+----+----+----+",
    "|    |    |    | 10 | 4  | 70 |",
    "|    |    |    | 10 | 4  | 70 |",
    "|    |    |    | 20 | 5  | 80 |",
    "|    |    |    | 20 | 5  | 80 |",
    "|    |    |    | 30 | 6  | 90 |",
    "|    |    |    | 30 | 6  | 90 |",
    "|    |    |    | 30 | 6  | 90 |",
    "| 1  | 4  | 7  | 10 | 4  | 70 |",
    "| 2  | 5  | 8  | 20 | 5  | 80 |",
    "| 3  | 7  | 9  |    |    |    |",
    "+----+----+----+----+----+----+",
]

', datafusion/src/physical_plan/hash_join.rs:1747:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

---- physical_plan::hash_join::tests::join_right_one stdout ----
thread 'physical_plan::hash_join::tests::join_right_one' panicked at 'assertion failed: `(left == right)`
  left: `["+----+----+----+----+----+----+", "| a1 | b1 | c1 | a2 | b1 | c2 |", "+----+----+----+----+----+----+", "|    | 6  |    | 30 | 6  | 90 |", "| 1  | 4  | 7  | 10 | 4  | 70 |", "| 2  | 5  | 8  | 20 | 5  | 80 |", "+----+----+----+----+----+----+"]`,
 right: `["+----+----+----+----+----+----+", "| a1 | b1 | c1 | a2 | b1 | c2 |", "+----+----+----+----+----+----+", "|    | 4  |    | 10 | 4  | 70 |", "|    | 4  |    | 10 | 4  | 70 |", "|    | 5  |    | 20 | 5  | 80 |", "|    | 5  |    | 20 | 5  | 80 |", "|    | 6  |    | 30 | 6  | 90 |", "|    | 6  |    | 30 | 6  | 90 |", "|    | 6  |    | 30 | 6  | 90 |", "| 1  | 4  | 7  | 10 | 4  | 70 |", "| 2  | 5  | 8  | 20 | 5  | 80 |", "+----+----+----+----+----+----+"]`: 

expected:

[
    "+----+----+----+----+----+----+",
    "| a1 | b1 | c1 | a2 | b1 | c2 |",
    "+----+----+----+----+----+----+",
    "|    | 6  |    | 30 | 6  | 90 |",
    "| 1  | 4  | 7  | 10 | 4  | 70 |",
    "| 2  | 5  | 8  | 20 | 5  | 80 |",
    "+----+----+----+----+----+----+",
]
actual:

[
    "+----+----+----+----+----+----+",
    "| a1 | b1 | c1 | a2 | b1 | c2 |",
    "+----+----+----+----+----+----+",
    "|    | 4  |    | 10 | 4  | 70 |",
    "|    | 4  |    | 10 | 4  | 70 |",
    "|    | 5  |    | 20 | 5  | 80 |",
    "|    | 5  |    | 20 | 5  | 80 |",
    "|    | 6  |    | 30 | 6  | 90 |",
    "|    | 6  |    | 30 | 6  | 90 |",
    "|    | 6  |    | 30 | 6  | 90 |",
    "| 1  | 4  | 7  | 10 | 4  | 70 |",
    "| 2  | 5  | 8  | 20 | 5  | 80 |",
    "+----+----+----+----+----+----+",
]

', datafusion/src/physical_plan/hash_join.rs:1670:9

---- physical_plan::hash_join::tests::join_full_multi_batch stdout ----
thread 'physical_plan::hash_join::tests::join_full_multi_batch' panicked at 'assertion failed: `(left == right)`
  left: `["+----+----+----+----+----+----+", "| a1 | b1 | c1 | a2 | b2 | c2 |", "+----+----+----+----+----+----+", "|    |    |    | 30 | 6  | 90 |", "|    |    |    | 30 | 6  | 90 |", "| 1  | 4  | 7  | 10 | 4  | 70 |", "| 1  | 4  | 7  | 10 | 4  | 70 |", "| 2  | 5  | 8  | 20 | 5  | 80 |", "| 2  | 5  | 8  | 20 | 5  | 80 |", "| 3  | 7  | 9  |    |    |    |", "+----+----+----+----+----+----+"]`,
 right: `["+----+----+----+----+----+----+", "| a1 | b1 | c1 | a2 | b2 | c2 |", "+----+----+----+----+----+----+", "|    |    |    | 10 | 4  | 70 |", "|    |    |    | 10 | 4  | 70 |", "|    |    |    | 10 | 4  | 70 |", "|    |    |    | 10 | 4  | 70 |", "|    |    |    | 20 | 5  | 80 |", "|    |    |    | 20 | 5  | 80 |", "|    |    |    | 20 | 5  | 80 |", "|    |    |    | 20 | 5  | 80 |", "|    |    |    | 30 | 6  | 90 |", "|    |    |    | 30 | 6  | 90 |", "|    |    |    | 30 | 6  | 90 |", "|    |    |    | 30 | 6  | 90 |", "|    |    |    | 30 | 6  | 90 |", "|    |    |    | 30 | 6  | 90 |", "| 1  | 4  | 7  | 10 | 4  | 70 |", "| 1  | 4  | 7  | 10 | 4  | 70 |", "| 2  | 5  | 8  | 20 | 5  | 80 |", "| 2  | 5  | 8  | 20 | 5  | 80 |", "| 3  | 7  | 9  |    |    |    |", "+----+----+----+----+----+----+"]`: 

expected:

[
    "+----+----+----+----+----+----+",
    "| a1 | b1 | c1 | a2 | b2 | c2 |",
    "+----+----+----+----+----+----+",
    "|    |    |    | 30 | 6  | 90 |",
    "|    |    |    | 30 | 6  | 90 |",
    "| 1  | 4  | 7  | 10 | 4  | 70 |",
    "| 1  | 4  | 7  | 10 | 4  | 70 |",
    "| 2  | 5  | 8  | 20 | 5  | 80 |",
    "| 2  | 5  | 8  | 20 | 5  | 80 |",
    "| 3  | 7  | 9  |    |    |    |",
    "+----+----+----+----+----+----+",
]
actual:

[
    "+----+----+----+----+----+----+",
    "| a1 | b1 | c1 | a2 | b2 | c2 |",
    "+----+----+----+----+----+----+",
    "|    |    |    | 10 | 4  | 70 |",
    "|    |    |    | 10 | 4  | 70 |",
    "|    |    |    | 10 | 4  | 70 |",
    "|    |    |    | 10 | 4  | 70 |",
    "|    |    |    | 20 | 5  | 80 |",
    "|    |    |    | 20 | 5  | 80 |",
    "|    |    |    | 20 | 5  | 80 |",
    "|    |    |    | 20 | 5  | 80 |",
    "|    |    |    | 30 | 6  | 90 |",
    "|    |    |    | 30 | 6  | 90 |",
    "|    |    |    | 30 | 6  | 90 |",
    "|    |    |    | 30 | 6  | 90 |",
    "|    |    |    | 30 | 6  | 90 |",
    "|    |    |    | 30 | 6  | 90 |",
    "| 1  | 4  | 7  | 10 | 4  | 70 |",
    "| 1  | 4  | 7  | 10 | 4  | 70 |",
    "| 2  | 5  | 8  | 20 | 5  | 80 |",
    "| 2  | 5  | 8  | 20 | 5  | 80 |",
    "| 3  | 7  | 9  |    |    |    |",
    "+----+----+----+----+----+----+",
]

', datafusion/src/physical_plan/hash_join.rs:1414:9

---- physical_plan::hash_join::tests::partitioned_join_right_one stdout ----
thread 'physical_plan::hash_join::tests::partitioned_join_right_one' panicked at 'assertion failed: `(left == right)`
  left: `["+----+----+----+----+----+----+", "| a1 | b1 | c1 | a2 | b1 | c2 |", "+----+----+----+----+----+----+", "|    | 6  |    | 30 | 6  | 90 |", "| 1  | 4  | 7  | 10 | 4  | 70 |", "| 2  | 5  | 8  | 20 | 5  | 80 |", "+----+----+----+----+----+----+"]`,
 right: `["+----+----+----+----+----+----+", "| a1 | b1 | c1 | a2 | b1 | c2 |", "+----+----+----+----+----+----+", "|    | 4  |    | 10 | 4  | 70 |", "|    | 4  |    | 10 | 4  | 70 |", "|    | 5  |    | 20 | 5  | 80 |", "|    | 5  |    | 20 | 5  | 80 |", "|    | 6  |    | 30 | 6  | 90 |", "|    | 6  |    | 30 | 6  | 90 |", "|    | 6  |    | 30 | 6  | 90 |", "| 1  | 4  | 7  | 10 | 4  | 70 |", "| 2  | 5  | 8  | 20 | 5  | 80 |", "+----+----+----+----+----+----+"]`: 

expected:

[
    "+----+----+----+----+----+----+",
    "| a1 | b1 | c1 | a2 | b1 | c2 |",
    "+----+----+----+----+----+----+",
    "|    | 6  |    | 30 | 6  | 90 |",
    "| 1  | 4  | 7  | 10 | 4  | 70 |",
    "| 2  | 5  | 8  | 20 | 5  | 80 |",
    "+----+----+----+----+----+----+",
]
actual:

[
    "+----+----+----+----+----+----+",
    "| a1 | b1 | c1 | a2 | b1 | c2 |",
    "+----+----+----+----+----+----+",
    "|    | 4  |    | 10 | 4  | 70 |",
    "|    | 4  |    | 10 | 4  | 70 |",
    "|    | 5  |    | 20 | 5  | 80 |",
    "|    | 5  |    | 20 | 5  | 80 |",
    "|    | 6  |    | 30 | 6  | 90 |",
    "|    | 6  |    | 30 | 6  | 90 |",
    "|    | 6  |    | 30 | 6  | 90 |",
    "| 1  | 4  | 7  | 10 | 4  | 70 |",
    "| 2  | 5  | 8  | 20 | 5  | 80 |",
    "+----+----+----+----+----+----+",
]

', datafusion/src/physical_plan/hash_join.rs:1707:9


failures:
    physical_plan::hash_join::tests::join_full_multi_batch
    physical_plan::hash_join::tests::join_full_one
    physical_plan::hash_join::tests::join_right_one
    physical_plan::hash_join::tests::partitioned_join_right_one

test result: FAILED. 640 passed; 4 failed; 0 ignored; 0 measured; 0 filtered out; finished in 4.65s

error: test failed, to rerun pass '--lib'

Compilation exited abnormally with code 101 at Mon Aug  9 11:02:00

Expected behavior
Tests should pass

@alamb alamb added the bug Something isn't working label Aug 9, 2021
@alamb
Copy link
Contributor Author

alamb commented Aug 9, 2021

FYI @Dandandan

@Dandandan
Copy link
Contributor

Dandandan commented Aug 9, 2021

Interesting approach! I will look at this later to solve it if you don't beat me to it.

Based on the test output I think a bug triggers with handling of NULL values.

        match (left_array.is_null($left), left_array.is_null($right)) {
            (false, false) => left_array.value($left) == right_array.value($right),
            _ => false,
        }

This looks suspicious, at least this should be right_array for the second value.

@Dandandan
Copy link
Contributor

I think (besides above error, but that doesn't seem to be the problem) there is something else happening in the FULL/LEFT/RIGHT implementations:

if a row doesn't match the constraint, currently it emits a "null-row" for each value it didn't match with. This should only be done if it doesn't match any value within the checked indices.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
2 participants