Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-39191: [R] throw error when string_replace is passed vector of values in pattern #39219

Merged
merged 7 commits into from
Dec 19, 2023
Merged
7 changes: 6 additions & 1 deletion r/R/dplyr-funcs-string.R
Original file line number Diff line number Diff line change
Expand Up @@ -58,7 +58,6 @@ get_stringr_pattern_options <- function(pattern) {
}

ensure_opts <- function(opts) {

# default options for the simple cases
if (is.character(opts)) {
opts <- list(pattern = opts, fixed = FALSE, ignore_case = FALSE)
Expand Down Expand Up @@ -352,6 +351,12 @@ register_bindings_string_regex <- function() {
# Encapsulate some common logic for sub/gsub/str_replace/str_replace_all
arrow_r_string_replace_function <- function(max_replacements) {
function(pattern, replacement, x, ignore.case = FALSE, fixed = FALSE) {
if (length(pattern) != 1) {
stop("`pattern` must be a length 1 character vector")
}
if (length(replacement) != 1) {
stop("`replacement` must be a length 1 character vector")
}
Expression$create(
ifelse(fixed && !ignore.case, "replace_substring", "replace_substring_regex"),
x,
Expand Down
19 changes: 18 additions & 1 deletion r/tests/testthat/test-dplyr-funcs-string.R
Original file line number Diff line number Diff line change
Expand Up @@ -433,7 +433,24 @@ test_that("str_replace and str_replace_all", {
collect(),
df
)

expect_error(
arrow_table(df) %>%
transmute(x = call_binding("str_replace_all", x, c("F" = "_", "b" = ""))) %>%
collect(),
regexp = "`pattern` must be a length 1 character vector",
)
expect_error(
arrow_table(df) %>%
transmute(x = call_binding("str_replace_all", x, c("F", "b"), c("_", ""))) %>%
collect(),
regexp = "`pattern` must be a length 1 character vector",
)
expect_error(
arrow_table(df) %>%
transmute(x = call_binding("str_replace_all", x, c("F"), c("_", ""))) %>%
collect(),
regexp = "`replacement` must be a length 1 character vector",
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One tiny last change to suggest, and then this will be ready to merge! You don't need the arrow_table() etc stuff here as call_binding() can be called on its own. What is needed instead is to create a field reference so call_binding() has something to refer to in the expression it creates, and then expect_error() can just wrap call_binding().

For example, in that final test, you'll need something a bit shorter, like this:

  x <- Expression$field_ref("x")
  expect_error(
    call_binding("str_replace_all", x, c("F"), c("_", ""))),
    regexp = "`replacement` must be a length 1 character vector"
  )

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool. I'll be honest. I really am lost in this whole call_bindings/Expression$field_ref("x") stuff. I guess thats what I get for being a field biologist dabbling in computer science. Thanks for walking me through this stuff!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You've dived into a really tricky bit of the codebase, but you've done a great job! A lot of my own early PRs to arrow involved drawing analogies between different bits of the codebase, but not understanding exactly what was going on (and this is still the case for any of my PRs which involve any C++).

Congratulations on your first PR to Arrow! Once the CI passes I'll merge it. Welcome to the project :)

compare_dplyr_binding(
.input %>%
transmute(x = str_replace_all(x, regex("^F"), "baz")) %>%
Expand Down
Loading