Added full outer join #822
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #822 +/- ##
=======================================
Coverage 87.44% 87.45%
=======================================
Files 128 128
Lines 11368 11383 +15
Branches 1545 1550 +5
=======================================
+ Hits 9941 9955 +14
Misses 1038 1038
- Partials 389 390 +1
else:
    ch = ch1.merge(ch2, "emp.person.name", "team.player", full=True)

str_default = String.default_value(test_session.catalog.warehouse.db.dialect)
Do those values differ across implementations?
What happens then if I pull a dataset from Studio to SQLite?
Yes, they are different. Actually, I'm currently working on a task to fix this, as the issue is on the CH side: https://github.com/iterative/studio/issues/11161
If you pull a dataset from Studio, instead of NULL (or None in the code) it will have "" for strings, 0 for ints, etc. As we discussed, non-nullable columns were added in CH for performance reasons from the beginning of the CH implementation.
I see, yes, it's related to that CH nulls issue.
Besides that ... I wonder how it will work for cases where there is a column with some default values defined via a Pydantic model 🤔. Probably we'll need a way to specify default values in the operation itself.
* added main logic for outer join
* fixing filters
* removing datasetquery tests and added more datachain unit tests
Adds full outer join support to our interface method DataChain.merge(...) and to the lower-level DatasetQuery.join(...). Up until now we only had left outer join and inner join.
Note that since older SQLite versions (< 3.39.0) don't support full outer joins out of the box, there is a workaround that uses 2 left joins + union.
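To make the workaround concrete, here is a minimal sketch of how a full outer join can be emulated with two left joins and a union in plain SQLite, using Python's standard sqlite3 module. This is not the code from this PR: the emp and team tables, their name and player columns, and the emulated_full_outer_join helper are hypothetical names chosen for illustration, and the sketch assumes the join key itself is never NULL in emp.

```python
import sqlite3


def emulated_full_outer_join(conn: sqlite3.Connection) -> list[tuple]:
    """Emulate FULL OUTER JOIN with two LEFT JOINs combined by UNION ALL.

    Hypothetical schema: emp(name), team(player), joined on name = player.
    """
    query = """
        -- 1) every emp row, with its matching team row if there is one
        SELECT e.name, t.player
        FROM emp AS e
        LEFT JOIN team AS t ON e.name = t.player
        UNION ALL
        -- 2) only the team rows that have no matching emp row
        SELECT e.name, t.player
        FROM team AS t
        LEFT JOIN emp AS e ON e.name = t.player
        WHERE e.name IS NULL
    """
    return conn.execute(query).fetchall()


# Small in-memory dataset: Bob is on both sides, Alice and Carol on one side each.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE emp (name TEXT);
    CREATE TABLE team (player TEXT);
    INSERT INTO emp VALUES ('Alice'), ('Bob');
    INSERT INTO team VALUES ('Bob'), ('Carol');
    """
)
rows = emulated_full_outer_join(conn)
```

The second branch filters on e.name IS NULL so that rows already matched by the first branch are not emitted twice. Unmatched sides come back as None in Python, which is the NULL that the review discussion above contrasts with backend default values such as "" or 0.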
Example: