dplyr now has optional immediate error on multiple-match and unmatched-key joins #1246

Open
bpbond opened this issue Jul 9, 2023 · 0 comments

FYI, dplyr 1.1.0 provides a way to error immediately if a row of x matches more than one row of y, or if a key has no match:

Multiple matches in equality joins like this one are typically unexpected (even though they are baked in to SQL) so we’ve also added a new warning to alert you when this happens. If multiple matches are expected, you can explicitly set multiple = "all" to silence this warning. This also serves as a code “sign post” for future readers of your code to let them know that this is a join that is expected to increase the number of rows in the data. If multiple matches aren’t expected, you can also set multiple = "error" to immediately halt the analysis.

https://www.tidyverse.org/blog/2023/01/dplyr-1-1-0-joins/#inequality-joins
Update: https://www.tidyverse.org/blog/2023/03/dplyr-1-1-1/

multiple: Handling of rows in x with multiple matches in y. For each row of x:
"all", the default, returns every match detected in y. This is the same behavior as SQL.
"any" returns one match detected in y, with no guarantees on which match will be returned. It is often faster than "first" and "last" if you just need to detect if there is at least one match.
"first" returns the first match detected in y.
"last" returns the last match detected in y.
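
A minimal sketch of how the `multiple` options above behave, assuming dplyr 1.1.0 or later is installed. The tibbles `x` and `y` and their column names are made up for illustration.

```r
library(dplyr)

# Hypothetical data: key 1 appears twice in y, so it is a multiple match.
x <- tibble(key = c(1, 2, 3), x_val = c("a", "b", "c"))
y <- tibble(key = c(1, 1, 2, 3), y_val = c("p", "q", "r", "s"))

# "all" returns every match, so the result has more rows than x.
all_matches <- left_join(x, y, by = "key", multiple = "all")

# "first" and "last" return exactly one row of y per row of x.
first_match <- left_join(x, y, by = "key", multiple = "first")
last_match  <- left_join(x, y, by = "key", multiple = "last")

nrow(all_matches)   # 4 rows: key 1 is duplicated
first_match$y_val   # "p" "r" "s"
last_match$y_val    # "q" "r" "s"
```

Spelling out `multiple` at the call site also documents whether the join is expected to add rows.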

unmatched: How should unmatched keys that would result in dropped rows be handled?
"drop" drops unmatched keys from the result.
"error" throws an error if unmatched keys are detected.
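
A short sketch of `unmatched`, again assuming dplyr 1.1.0 or later and made-up data. One point worth checking against the dplyr documentation before relying on it: as I understand it, `unmatched` only guards against rows that would be dropped, so in a left join it checks the keys of y, not x. A row of x with no match is kept and filled with NA, so it does not trigger the error.

```r
library(dplyr)

# Hypothetical data: key 4 exists only in y.
x <- tibble(key = c(1, 2, 3), x_val = c("a", "b", "c"))
y <- tibble(key = c(1, 2, 3, 4), y_val = c("p", "q", "r", "s"))

# Default unmatched = "drop": key 4 from y silently disappears.
dropped <- left_join(x, y, by = "key")

# unmatched = "error": the unmatched key 4 in y aborts the join.
strict <- tryCatch(
  left_join(x, y, by = "key", unmatched = "error"),
  error = function(e) "join aborted"
)

# Key 5 exists only in x. A left join keeps that row (y_val is NA),
# every key of y is matched, and so no error is raised.
x_extra <- tibble(key = c(1, 2, 3, 4, 5), x_val = letters[1:5])
kept <- left_join(x_extra, y, by = "key", unmatched = "error")
```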

When gcamdata is ready to move to dplyr 1.1, this should allow for the removal of both left_join_keep_first_only and left_join_error_no_match, I think?
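
As a rough sketch of what such a replacement might look like, assuming dplyr 1.1.0 or later: the wrapper names `join_first_match` and `join_require_match` below are hypothetical and are not gcamdata functions, and they have not been checked against the exact semantics of the existing helpers. Because a left join's `unmatched = "error"` checks y and not x, the second wrapper uses an inner join with a length-2 `unmatched`, which errors on unmatched rows of x while still allowing unused rows of y.

```r
library(dplyr)

# Hypothetical wrapper: keep only the first match in y for each row of x.
join_first_match <- function(x, y, by) {
  left_join(x, y, by = by, multiple = "first")
}

# Hypothetical wrapper: error if any row of x has no match in y.
# unmatched = c("error", "drop") applies "error" to x and "drop" to y.
join_require_match <- function(x, y, by) {
  inner_join(x, y, by = by, unmatched = c("error", "drop"))
}

# Made-up data for illustration.
x <- tibble(key = c(1, 2))
y_dup <- tibble(key = c(1, 1, 2, 3), val = c("a", "b", "c", "d"))
y_unique <- tibble(key = c(1, 2, 3), val = c("a", "c", "d"))

first_res <- join_first_match(x, y_dup, by = "key")
ok_res <- join_require_match(x, y_unique, by = "key")

# Key 9 has no match in y_unique, so this call aborts.
x_missing <- tibble(key = c(1, 9))
missing_res <- tryCatch(
  join_require_match(x_missing, y_unique, by = "key"),
  error = function(e) "join aborted"
)
```

Whether these cover every case the current helpers handle (for example, NA values already present in y) would need to be confirmed against the gcamdata test suite.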
