Warnings in `data_summary()` with `by`? #583

strengejacke · 2025-01-12T14:43:14Z

Not sure if grouping is the problem here, but there shouldn't be warnings.

library(datawizard)
library(rtdists)
data <- rtdists::speed_acc |>
  data_filter(rt < 1.5 & stim_cat == "word" & frequency == "low")

data <- data_modify(
  data,
  error = ifelse(as.character(response) != as.character(stim_cat), 1, 0)
)
data_rt <- data_filter(data, error == 0)
data_sub <- aggregate(rt ~ id + condition, data_rt, mean)
data_summary(data_rt, rt = mean(rt), by = c("id", "condition"))
#> Warning in mean.default(rt): argument is not numeric or logical: returning NA
#> Warning in mean.default(rt): argument is not numeric or logical: returning NA
#> Warning in mean.default(rt): argument is not numeric or logical: returning NA
#> Warning in mean.default(rt): argument is not numeric or logical: returning NA
#> Warning in mean.default(rt): argument is not numeric or logical: returning NA
#> Warning in mean.default(rt): argument is not numeric or logical: returning NA
#> Warning in mean.default(rt): argument is not numeric or logical: returning NA
#> Warning in mean.default(rt): argument is not numeric or logical: returning NA
#> Warning in mean.default(rt): argument is not numeric or logical: returning NA
#> Warning in mean.default(rt): argument is not numeric or logical: returning NA
#> Warning in mean.default(rt): argument is not numeric or logical: returning NA
#> Warning in mean.default(rt): argument is not numeric or logical: returning NA
#> Warning in mean.default(rt): argument is not numeric or logical: returning NA
#> Warning in mean.default(rt): argument is not numeric or logical: returning NA
#> Warning in mean.default(rt): argument is not numeric or logical: returning NA
#> Warning in mean.default(rt): argument is not numeric or logical: returning NA
#> Warning in mean.default(rt): argument is not numeric or logical: returning NA
#> Warning in mean.default(rt): argument is not numeric or logical: returning NA
#> Warning in mean.default(rt): argument is not numeric or logical: returning NA
#> Warning in mean.default(rt): argument is not numeric or logical: returning NA
#> Warning in mean.default(rt): argument is not numeric or logical: returning NA
#> Warning in mean.default(rt): argument is not numeric or logical: returning NA
#> Warning in mean.default(rt): argument is not numeric or logical: returning NA
#> Warning in mean.default(rt): argument is not numeric or logical: returning NA
#> Warning in mean.default(rt): argument is not numeric or logical: returning NA
#> Warning in mean.default(rt): argument is not numeric or logical: returning NA
#> Warning in mean.default(rt): argument is not numeric or logical: returning NA
#> Warning in mean.default(rt): argument is not numeric or logical: returning NA
#> Warning in mean.default(rt): argument is not numeric or logical: returning NA
#> Warning in mean.default(rt): argument is not numeric or logical: returning NA
#> Warning in mean.default(rt): argument is not numeric or logical: returning NA
#> Warning in mean.default(rt): argument is not numeric or logical: returning NA
#> Warning in mean.default(rt): argument is not numeric or logical: returning NA
#> Warning in mean.default(rt): argument is not numeric or logical: returning NA
#> id | condition |   rt
#> ---------------------
#> 1  |  accuracy | 0.59
#> 1  |     speed | 0.54
#> 2  |  accuracy | 0.77
#> 2  |     speed | 0.60
#> 3  |  accuracy | 0.72
#> 3  |     speed | 0.54
#> 4  |  accuracy | 0.56
#> 4  |     speed | 0.48
#> 5  |  accuracy | 0.64
#> 5  |     speed | 0.48
#> 6  |  accuracy | 0.62
#> 6  |     speed | 0.48
#> 7  |  accuracy | 0.59
#> 7  |     speed | 0.55
#> 8  |  accuracy | 0.91
#> 8  |     speed | 0.56
#> 9  |  accuracy | 0.72
#> 9  |     speed | 0.54
#> 10 |  accuracy | 0.59
#> 10 |     speed | 0.50
#> 11 |  accuracy | 0.60
#> 11 |     speed | 0.52
#> 12 |  accuracy | 0.69
#> 12 |     speed | 0.54
#> 13 |  accuracy | 0.77
#> 13 |     speed | 0.56
#> 14 |  accuracy | 0.72
#> 14 |     speed | 0.53
#> 15 |  accuracy | 0.78
#> 15 |     speed | 0.56
#> 16 |  accuracy | 0.73
#> 16 |     speed | 0.57
#> 17 |  accuracy | 0.75
#> 17 |     speed | 0.54

warnings()

Created on 2025-01-12 with reprex v2.1.1

mattansb · 2025-01-12T15:49:29Z

Looks like it's confused by the rt() function, mixing up the global environment with the data environment.
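The mix-up can be reproduced without datawizard. A minimal sketch in base R, assuming a fresh session in which no object named rt has been defined, so that the bare name rt resolves to stats::rt(); the data frame df is made up for illustration:

```r
# In a fresh session, the bare name `rt` resolves to stats::rt(),
# the random generator for the Student t distribution.
stopifnot(is.function(rt))

# Capture the warning message instead of printing it.
msg <- NULL
res <- withCallingHandlers(
  mean(rt),
  warning = function(w) {
    msg <<- conditionMessage(w)
    invokeRestart("muffleWarning")
  }
)

msg # the message raised by mean.default()
res # NA, because a function is neither numeric nor logical

# With the data frame as the first lookup scope, `rt` is the column.
df <- data.frame(rt = c(0.5, 0.7))
col_mean <- with(df, mean(rt))
col_mean
```

The same call, mean(rt), gives a warning plus NA or a proper mean depending only on where rt is looked up first.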

etiennebacher · 2025-02-04T13:27:53Z

I don't think there's a good solution for this. The problem comes from these lines:

datawizard/R/data_modify.R

Lines 341 to 363 in 3f23020

    
           # expression is given as character string in a variable, but named, e.g. 
        
           # a <- "2 * Sepal.Width" 
        
           # data_modify(iris, double_SepWidth = a) 
        
           # we reconstruct the symbol as if it were provided as literal expression. 
        
           # However, we need to check that we don't have a character vector, 
        
           # like: data_modify(iris, new_var = "a") 
        
           # this one should be recycled instead. 
        
           if (!is.character(symbol)) { 
        
             eval_symbol <- .dynEval(symbol, ifnotfound = NULL) 
        
             if (is.character(eval_symbol)) { 
        
               symbol <- try(str2lang(paste0(names(dots)[i], " = ", eval_symbol)), silent = TRUE) 
        
               # we may have the edge-case of having a function that returns a character 
        
               # vector, like "new_var = sample(letters[1:3])". In this case, "eval_symbol" 
        
               # is of type character, but no symbol, thus str2lang() above creates a 
        
               # wrong pattern. We then take "eval_symbol" as character input. 
        
               if (inherits(symbol, "try-error")) { 
        
                 symbol <- str2lang(paste0( 
        
                   names(dots)[i], 
        
                   " = c(", paste0("\"", eval_symbol, "\"", collapse = ","), ")" 
        
                 )) 
        
               } 
        
             } 
        
           }

In this example,

a <- "2 * Sepal.Width"
data_modify(iris, double_SepWidth = a)

we try to evaluate a, and if it's not found we return NULL (eval_symbol <- .dynEval(symbol, ifnotfound = NULL)). Here a is found and holds a character string, so that string is parsed into an expression.
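To make the string-to-expression step concrete, here is a simplified sketch in base R. It is not datawizard's implementation: it skips .dynEval() and the name = value reconstruction shown in the snippet, and only illustrates how str2lang() turns the string stored in a into a call that can be evaluated against the data.

```r
# An expression stored as a character string in a variable
a <- "2 * Sepal.Width"

# str2lang() parses the string into an unevaluated call
expr <- str2lang(a)
is.call(expr)

# Evaluate the call with the columns of `iris` in scope
double_SepWidth <- eval(expr, envir = iris)
head(double_SepWidth, 3)
```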

With your example, as @mattansb pointed out, evaluating mean(rt) doesn't lead to NULL because there's an rt() function in stats. Taking the mean of a function doesn't make any sense, but the call does run: mean.default() warns and returns NA, which is where the warnings come from. I think we're just hitting the limits of our dynamic evaluation code.

strengejacke · 2025-02-04T13:32:12Z

> I think we're just hitting the limits of our dynamic evaluation code.

Yes, agreed. Would it work to check whether the symbol is a function? Or do we accept functions in other contexts?
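One way such a check could look, as a rough sketch only: the helper vars_resolving_to_functions() is hypothetical and not part of datawizard. It lists the variable names in an expression that resolve to functions in the calling environment, assuming a session in which rt has not been redefined.

```r
# Hypothetical helper: which variable names used in `expr` resolve to
# functions when looked up from `env`?
vars_resolving_to_functions <- function(expr, env = parent.frame()) {
  # all.vars() returns names used as variables; called functions such as
  # `mean` are excluded by default
  vars <- all.vars(expr)
  Filter(function(v) is.function(get0(v, envir = env, inherits = TRUE)), vars)
}

flagged <- vars_resolving_to_functions(quote(mean(rt)))
flagged
```

For mean(rt) this flags rt, which could be a signal to skip the dynamic evaluation and treat the argument as a literal expression instead.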

mattansb · 2025-02-04T13:33:12Z

We should be able to pass the data frame to .dynEval(), no? And have that be the first environment to try the evaluation in?
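A minimal sketch of that lookup order in base R, assuming nothing about how .dynEval() works internally: eval_in_data() is a hypothetical helper, and df and scale_factor are made up for illustration. eval() accepts a data frame as envir and falls back to enclos for names that are not columns.

```r
# Hypothetical helper, not part of datawizard: evaluate an expression with
# the data frame as the first scope and the caller's environment as fallback.
eval_in_data <- function(expr, data, env = parent.frame()) {
  eval(expr, envir = data, enclos = env)
}

df <- data.frame(rt = c(0.4, 0.6, 0.8))

# `rt` is found as a column before stats::rt() is ever considered
m <- eval_in_data(quote(mean(rt)), df)
m

# Names that are not columns are still found in the enclosing environment
scale_factor <- 10
scaled <- eval_in_data(quote(mean(rt) * scale_factor), df)
scaled
```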

etiennebacher · 2025-02-04T14:09:58Z

Yes, we could. I'm preparing a patch for this, but I'd like to explore the side effects further before merging it.

etiennebacher linked a pull request Feb 4, 2025 that will close this issue

Start dynamic evaluation in provided dataframe #585

Draft
