dplyr now has optional immediate error on multiple-match and unmatched-key joins #1246

bpbond · 2023-07-09T21:58:24Z

FYI dplyr 1.1.0 provides a way to immediately error if a join returns more than one row from y, or if there's no match:

Multiple matches in equality joins like this one are typically unexpected (even though they are baked in to SQL) so we’ve also added a new warning to alert you when this happens. If multiple matches are expected, you can explicitly set multiple = "all" to silence this warning. This also serves as a code “sign post” for future readers of your code to let them know that this is a join that is expected to increase the number of rows in the data. If multiple matches aren’t expected, you can also set multiple = "error" to immediately halt the analysis.

https://www.tidyverse.org/blog/2023/01/dplyr-1-1-0-joins/#inequality-joins
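Update: dplyr 1.1.1 revised this. The multiple-match warning is now only thrown for many-to-many joins, multiple defaults to "all", and multiple = "error" is deprecated in favor of the new relationship argument: https://www.tidyverse.org/blog/2023/03/dplyr-1-1-1/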

multiple: Handling of rows in x with multiple matches in y. For each row of x:
"all", the default, returns every match detected in y. This is the same behavior as SQL.
"any" returns one match detected in y, with no guarantees on which match will be returned. It is often faster than "first" and "last" if you just need to detect if there is at least one match.
"first" returns the first match detected in y.
"last" returns the last match detected in y.
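
For illustration, a minimal sketch of how the `multiple` options above change the row count. It assumes dplyr >= 1.1.0; the tibbles `x` and `y` and their column names are made up for this example and are not from gcamdata.

```r
library(dplyr)

# Hypothetical toy data: key 1 appears twice in y
x <- tibble(key = c(1, 2), x_val = c("a", "b"))
y <- tibble(key = c(1, 1, 2), y_val = c("p", "q", "r"))

# multiple = "all" returns every match, so key 1 contributes two rows
all_matches <- left_join(x, y, by = "key", multiple = "all")

# multiple = "first" keeps only the first match in y for each row of x
first_match <- left_join(x, y, by = "key", multiple = "first")

nrow(all_matches)  # 3
nrow(first_match)  # 2
first_match$y_val  # "p" "r"
```

Setting `multiple = "first"` looks like the closest built-in analogue to a keep-first-only join, though whether it matches the gcamdata helper in every edge case would need checking.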

unmatched: How should unmatched keys that would result in dropped rows be handled?
"drop" drops unmatched keys from the result.
"error" throws an error if unmatched keys are detected.
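
A small sketch of `unmatched = "error"`, again assuming dplyr >= 1.1.0 and made-up tibbles. As far as I can tell from the dplyr documentation, `unmatched` only checks the input whose rows could be dropped: `y` for a left join, both `x` and `y` for an inner join. So an `x` key with no match in a left join still produces `NA` rather than an error, which may matter when comparing against a helper that is meant to error on missing matches.

```r
library(dplyr)

# Hypothetical toy data: key 3 in x has no match in y
x <- tibble(key = c(1, 2, 3), x_val = c("a", "b", "c"))
y <- tibble(key = c(1, 2), y_val = c("p", "q"))

# Left join: rows of x are never dropped, so the unmatched x key is not
# an error here; it is kept with NA in y_val
lj <- left_join(x, y, by = "key", unmatched = "error")

# Inner join: the unmatched x key would be dropped, so this errors.
# tryCatch records whether an error was raised.
inner_errored <- tryCatch(
  {
    inner_join(x, y, by = "key", unmatched = "error")
    FALSE
  },
  error = function(e) TRUE
)

nrow(lj)       # 3
lj$y_val[3]    # NA
inner_errored  # TRUE
```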

When gcamdata is ready to move to dplyr 1.1, this should allow for the removal of both left_join_keep_first_only and left_join_error_no_match I think?

bpbond added the enhancement label Jul 9, 2023
