Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Count_Distinct on a Boolean column in Snowflake #10611

Closed
radeusgd opened this issue Jul 19, 2024 · 2 comments · Fixed by #10818
Closed

Fix Count_Distinct on a Boolean column in Snowflake #10611

radeusgd opened this issue Jul 19, 2024 · 2 comments · Fixed by #10818
Assignees
Labels
-libs Libraries: New libraries to be implemented l-db-connector Libraries: database connectors

Comments

@radeusgd
Copy link
Member

Currently, when trying to perform Count_Distinct aggregate on a Boolean column in the Snowflake backend, it fails with:

An unexpected dataflow error (There was an SQL error: Failed to cast variant value {"enso-null-replacement-marker":"084954a3-9f58-4a2d-aa63-3f56bc270800"} to BOOLEAN. [Query was: SELECT COUNT("Tinfo-31365771-714d-4377-bb92-da160dccbf19"."Concatenate strs") AS "Concatenate strs", COUNT("Tinfo-31365771-714d-4377-bb92-da160dccbf19"."Sum ints") AS "Sum ints", COUNT("Tinfo-31365771-714d-4377-bb92-da160dccbf19"."Count Distinct bools") AS "Count Distinct bools" FROM (SELECT "Tinfo-31365771-714d-4377-bb92-da160dccbf19"."Concatenate strs" AS "Concatenate strs", "Tinfo-31365771-714d-4377-bb92-da160dccbf19"."Sum ints" AS "Sum ints", "Tinfo-31365771-714d-4377-bb92-da160dccbf19"."Cou (...) "strs") END, ?) || (?) AS "Concatenate strs", SUM("Tinfo-31365771-714d-4377-bb92-da160dccbf19"."ints") AS "Sum ints", COUNT(DISTINCT COALESCE("Tinfo-31365771-714d-4377-bb92-da160dccbf19"."bools", {'enso-null-replacement-marker':'084954a3-9f58-4a2d-aa63-3f56bc270800'}::variant)) AS "Count Distinct bools" FROM "Tinfo-31365771-714d-4377-bb92-da160dccbf19" AS "Tinfo-31365771-714d-4377-bb92-da160dccbf19") AS "Tinfo-31365771-714d-4377-bb92-da160dccbf19") AS "Tinfo-31365771-714d-4377-bb92-da160dccbf19"]) has been matched (at C:\NBO\enso\test\Snowflake_Tests\src\Snowflake_Spec.enso:189:13-109).

Apparently the 'special null replacement value' that we use does not play well with the Boolean type.

@radeusgd radeusgd self-assigned this Jul 19, 2024
@radeusgd radeusgd added -libs Libraries: New libraries to be implemented l-db-connector Libraries: database connectors labels Jul 19, 2024
@radeusgd
Copy link
Member Author

radeusgd commented Jul 19, 2024

Adding a special test to Aggregate_Spec for this case: should work correctly with Boolean columns:

        group_builder.specify "should work correctly with Boolean columns" <|
            table = table_builder [["A", [True, True, True]], ["B", [False, False, False]], ["C", [True, False, True]], ["D", [Nothing, False, True]]]

            t_with_nulls = table.aggregate columns=[..Count_Distinct "A", ..Count_Distinct "B", ..Count_Distinct "C", ..Count_Distinct "D"]
            m1 = materialize t_with_nulls
            m1.column_count . should_equal 4
            m1.at "Count Distinct A" . to_vector . should_equal [1]
            m1.at "Count Distinct B" . to_vector . should_equal [1]
            m1.at "Count Distinct C" . to_vector . should_equal [2]
            m1.at "Count Distinct D" . to_vector . should_equal [3]

            t_without_nulls = table.aggregate columns=[..Count_Distinct "A" ignore_nothing=True, ..Count_Distinct "B" ignore_nothing=True, ..Count_Distinct "C" ignore_nothing=True, ..Count_Distinct "D" ignore_nothing=True]
            m2 = materialize t_without_nulls
            m2.column_count . should_equal 4
            m2.at "Count Distinct A" . to_vector . should_equal [1]
            m2.at "Count Distinct B" . to_vector . should_equal [1]
            m2.at "Count Distinct C" . to_vector . should_equal [2]
            # The NULL is ignored, and not counted towards the total
            m2.at "Count Distinct D" . to_vector . should_equal [2]

radeusgd added a commit that referenced this issue Jul 19, 2024
@radeusgd
Copy link
Member Author

Additionally a test in Snowflake_Spec is marked as pending, so it should be re-enabled once this is fixed.

@enso-bot enso-bot bot mentioned this issue Jul 19, 2024
radeusgd added a commit that referenced this issue Jul 22, 2024
mergify bot pushed a commit that referenced this issue Jul 23, 2024
- Closes #9486
- All tests are succeeding or marked pending
- Created follow up tickets for things that still need to be addressed, including:
- Fixing upload / table update #10609
- Fixing `Count_Distinct` on Boolean columns #10611
- Running the tests on CI is not part of this PR - to be addressed separately
@GregoryTravis GregoryTravis moved this from ❓New to 📤 Backlog in Issues Board Jul 23, 2024
@jdunkerley jdunkerley added this to the 2024-08 Release milestone Jul 29, 2024
@mergify mergify bot closed this as completed in #10818 Aug 16, 2024
@mergify mergify bot closed this as completed in 09137f7 Aug 16, 2024
@github-project-automation github-project-automation bot moved this from 👁️ Code review to 🟢 Accepted in Issues Board Aug 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
-libs Libraries: New libraries to be implemented l-db-connector Libraries: database connectors
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

2 participants