Warnings in data_summary() with by? #583

Open
strengejacke opened this issue Jan 12, 2025 · 5 comments · May be fixed by #585

Comments

@strengejacke
Member

Not sure if grouping is the problem here, but there shouldn't be warnings.

library(datawizard)
library(rtdists)
data <- rtdists::speed_acc |>
  data_filter(rt < 1.5 & stim_cat == "word" & frequency == "low")

data <- data_modify(
  data,
  error = ifelse(as.character(response) != as.character(stim_cat), 1, 0)
)
data_rt <- data_filter(data, error == 0)
data_sub <- aggregate(rt ~ id + condition, data_rt, mean)
data_summary(data_rt, rt = mean(rt), by = c("id", "condition"))
#> Warning in mean.default(rt): argument is not numeric or logical: returning NA
#> Warning in mean.default(rt): argument is not numeric or logical: returning NA
#> Warning in mean.default(rt): argument is not numeric or logical: returning NA
#> Warning in mean.default(rt): argument is not numeric or logical: returning NA
#> Warning in mean.default(rt): argument is not numeric or logical: returning NA
#> Warning in mean.default(rt): argument is not numeric or logical: returning NA
#> Warning in mean.default(rt): argument is not numeric or logical: returning NA
#> Warning in mean.default(rt): argument is not numeric or logical: returning NA
#> Warning in mean.default(rt): argument is not numeric or logical: returning NA
#> Warning in mean.default(rt): argument is not numeric or logical: returning NA
#> Warning in mean.default(rt): argument is not numeric or logical: returning NA
#> Warning in mean.default(rt): argument is not numeric or logical: returning NA
#> Warning in mean.default(rt): argument is not numeric or logical: returning NA
#> Warning in mean.default(rt): argument is not numeric or logical: returning NA
#> Warning in mean.default(rt): argument is not numeric or logical: returning NA
#> Warning in mean.default(rt): argument is not numeric or logical: returning NA
#> Warning in mean.default(rt): argument is not numeric or logical: returning NA
#> Warning in mean.default(rt): argument is not numeric or logical: returning NA
#> Warning in mean.default(rt): argument is not numeric or logical: returning NA
#> Warning in mean.default(rt): argument is not numeric or logical: returning NA
#> Warning in mean.default(rt): argument is not numeric or logical: returning NA
#> Warning in mean.default(rt): argument is not numeric or logical: returning NA
#> Warning in mean.default(rt): argument is not numeric or logical: returning NA
#> Warning in mean.default(rt): argument is not numeric or logical: returning NA
#> Warning in mean.default(rt): argument is not numeric or logical: returning NA
#> Warning in mean.default(rt): argument is not numeric or logical: returning NA
#> Warning in mean.default(rt): argument is not numeric or logical: returning NA
#> Warning in mean.default(rt): argument is not numeric or logical: returning NA
#> Warning in mean.default(rt): argument is not numeric or logical: returning NA
#> Warning in mean.default(rt): argument is not numeric or logical: returning NA
#> Warning in mean.default(rt): argument is not numeric or logical: returning NA
#> Warning in mean.default(rt): argument is not numeric or logical: returning NA
#> Warning in mean.default(rt): argument is not numeric or logical: returning NA
#> Warning in mean.default(rt): argument is not numeric or logical: returning NA
#> id | condition |   rt
#> ---------------------
#> 1  |  accuracy | 0.59
#> 1  |     speed | 0.54
#> 2  |  accuracy | 0.77
#> 2  |     speed | 0.60
#> 3  |  accuracy | 0.72
#> 3  |     speed | 0.54
#> 4  |  accuracy | 0.56
#> 4  |     speed | 0.48
#> 5  |  accuracy | 0.64
#> 5  |     speed | 0.48
#> 6  |  accuracy | 0.62
#> 6  |     speed | 0.48
#> 7  |  accuracy | 0.59
#> 7  |     speed | 0.55
#> 8  |  accuracy | 0.91
#> 8  |     speed | 0.56
#> 9  |  accuracy | 0.72
#> 9  |     speed | 0.54
#> 10 |  accuracy | 0.59
#> 10 |     speed | 0.50
#> 11 |  accuracy | 0.60
#> 11 |     speed | 0.52
#> 12 |  accuracy | 0.69
#> 12 |     speed | 0.54
#> 13 |  accuracy | 0.77
#> 13 |     speed | 0.56
#> 14 |  accuracy | 0.72
#> 14 |     speed | 0.53
#> 15 |  accuracy | 0.78
#> 15 |     speed | 0.56
#> 16 |  accuracy | 0.73
#> 16 |     speed | 0.57
#> 17 |  accuracy | 0.75
#> 17 |     speed | 0.54

warnings()

Created on 2025-01-12 with reprex v2.1.1

@mattansb
Member

Looks like it's confused by the rt() function - confusing the global environment with the data environment.

@etiennebacher
Member

etiennebacher commented Feb 4, 2025

I don't think there's a good solution for this. The problem comes from those lines:

datawizard/R/data_modify.R

Lines 341 to 363 in 3f23020

# expression is given as character string in a variable, but named, e.g.
# a <- "2 * Sepal.Width"
# data_modify(iris, double_SepWidth = a)
# we reconstruct the symbol as if it were provided as literal expression.
# However, we need to check that we don't have a character vector,
# like: data_modify(iris, new_var = "a")
# this one should be recycled instead.
if (!is.character(symbol)) {
  eval_symbol <- .dynEval(symbol, ifnotfound = NULL)
  if (is.character(eval_symbol)) {
    symbol <- try(str2lang(paste0(names(dots)[i], " = ", eval_symbol)), silent = TRUE)
    # we may have the edge-case of having a function that returns a character
    # vector, like "new_var = sample(letters[1:3])". In this case, "eval_symbol"
    # is of type character, but no symbol, thus str2lang() above creates a
    # wrong pattern. We then take "eval_symbol" as character input.
    if (inherits(symbol, "try-error")) {
      symbol <- str2lang(paste0(
        names(dots)[i],
        " = c(", paste0("\"", eval_symbol, "\"", collapse = ","), ")"
      ))
    }
  }
}

In this example,

a <- "2 * Sepal.Width"
data_modify(iris, double_SepWidth = a)

we try to evaluate a, and if it's not found we return NULL (eval_symbol <- .dynEval(symbol, ifnotfound = NULL)).

With your example, as @mattansb pointed out, evaluating mean(rt) doesn't lead to NULL because there's an rt() function in stats, so the symbol rt resolves to that function instead of the column. mean(rt) doesn't make any sense but does run: mean() receives a function rather than a numeric vector, so it warns and returns NA. I think we're just hitting the limits of our dynamic evaluation code.
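To make the clash concrete, here is a minimal base-R sketch of what happens when rt is looked up outside the data versus inside it. It does not use datawizard or .dynEval(); the data frame df and the variable names are made up for illustration.

```r
# Outside any data frame, the name `rt` resolves to stats::rt(), the random
# generator for the Student t distribution.
found_function <- is.function(rt)

# mean() on a function warns ("argument is not numeric or logical:
# returning NA") and returns NA. The warning is captured here so the
# example runs quietly.
warning_text <- NULL
res_fn <- withCallingHandlers(
  mean(rt),
  warning = function(w) {
    warning_text <<- conditionMessage(w)
    invokeRestart("muffleWarning")
  }
)

# Illustrative data only: inside the data frame, the column `rt` masks
# the function, so the same expression gives a sensible result.
df <- data.frame(rt = c(0.5, 0.7))
res_col <- with(df, mean(rt))

print(found_function)
print(warning_text)
print(res_fn)
print(res_col)
```

The first evaluation is the one that produces the warning; the second shows that the expression itself is fine once the column takes precedence.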

@strengejacke
Member Author

I think we're just hitting the limits of our dynamic evaluation code.

Yes, agree. Not sure if it works when we check if the symbol is a function? Or do we accept functions in other contexts?

@mattansb
Member

mattansb commented Feb 4, 2025

We should be able to pass the data frame to .dynEval(), no? And have that be the first environment to try the evaluation in?
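A rough sketch of that idea in base R, assuming nothing about how .dynEval() works internally or what the actual patch will look like. The helper eval_data_first() is hypothetical and not part of datawizard; it only illustrates looking names up in the data frame first and in the calling environment second.

```r
# Hypothetical helper (not datawizard API): evaluate `expr` with the columns
# of `data` taking precedence over objects in the calling environment.
eval_data_first <- function(expr, data, env = parent.frame()) {
  # With a data frame as `envir`, eval() looks up names among its columns
  # first and falls back to `enclos` for everything else (e.g. mean()).
  eval(expr, envir = data, enclos = env)
}

# Illustrative data only
df <- data.frame(rt = c(0.4, 0.6, 0.8))

# `rt` is now the column, not stats::rt(), so there is no warning
res_mean <- eval_data_first(quote(mean(rt)), df)

# A variable that is not a column is still found in the calling environment,
# so the "expression stored as a string" case keeps working
a <- "2 * rt"
res_string <- eval_data_first(quote(a), df)

print(res_mean)
print(res_string)
```

One trade-off of putting the data first is that a column named like a user variable (for example a column called a) would now shadow that variable, which is the kind of side effect that would need checking.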

@etiennebacher
Member

Yes, we could. I'm preparing a patch for this, but I'd like to explore the side effects more before merging it.

@etiennebacher etiennebacher linked a pull request Feb 4, 2025 that will close this issue