`draw` forgets order of ordered factor #284

mhpob · 2024-05-06T19:22:42Z

Everything evaluates fine, but the terms are not ordered after parametric_effects and plotted in alphabetical order.

The factor and its ordering look to become purposefully uncoupled into row order, an "ordered" type, and an unordered character representation of the levels in parametric_effects. The row ordering is not passed on to the base plotting step of draw_parametric_effect.

> library(mgcv)
Loading required package: nlme
This is mgcv 1.9-1. For overview type 'help("mgcv-package")'.
> library(gratia)
> 
> df <- data_sim("eg1", seed = 42)
> df$month <- factor(
+   rep(month.abb[1:10], times = 40),
+   levels = month.abb[1:10],
+   ordered = T
+ )
> 
> class(df$month)
[1] "ordered" "factor" 
> levels(df$month)
 [1] "Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep" "Oct"
> 
> m <- gam(y ~ month + s(x0) + s(x1) + s(x2) + s(x3), data = df, method = "REML")
> summary(m)

Family: gaussian 
Link function: identity 

Formula:
y ~ month + s(x0) + s(x1) + s(x2) + s(x3)

Parametric coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  7.49514    0.10571  70.901   <2e-16 ***
month.L     -0.10830    0.33948  -0.319    0.750    
month.Q      0.37636    0.34039   1.106    0.270    
month.C      0.42859    0.33954   1.262    0.208    
month^4     -0.17750    0.34422  -0.516    0.606    
month^5      0.33674    0.34115   0.987    0.324    
month^6      0.12809    0.34837   0.368    0.713    
month^7     -0.07840    0.33997  -0.231    0.818    
month^8     -0.09706    0.33962  -0.286    0.775    
month^9      0.03407    0.34168   0.100    0.921    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Approximate significance of smooth terms:
        edf Ref.df      F  p-value    
s(x0) 3.358  4.165  8.438 1.73e-06 ***
s(x1) 3.112  3.871 66.289  < 2e-16 ***
s(x2) 7.889  8.676 66.775  < 2e-16 ***
s(x3) 1.902  2.382  2.810    0.054 .  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

R-sq.(adj) =  0.682   Deviance explained = 70.2%
-REML = 886.14  Scale est. = 4.47      n = 400
> 
> parametric_effects(m, terms ='month')
# A tibble: 10 × 5
   term  type    .level .partial   .se
   <chr> <chr>   <chr>     <dbl> <dbl>
 1 month ordered Jan    -0.0591  0.322
 2 month ordered Feb     0.352   0.326
 3 month ordered Mar     0.251   0.325
 4 month ordered Apr    -0.00511 0.324
 5 month ordered May    -0.312   0.322
 6 month ordered Jun    -0.200   0.329
 7 month ordered Jul    -0.0499  0.324
 8 month ordered Aug    -0.176   0.326
 9 month ordered Sep    -0.159   0.321
10 month ordered Oct     0.359   0.322
> 
> parametric_effects(m, terms ='month') |> 
+   draw()

The text was updated successfully, but these errors were encountered:

gavinsimpson · 2024-05-19T08:35:07Z

Yeah; this is a known infelicity of the current way I return everything as a single tibble; we can't merge factors (ordered or otherwise) with different levels and wouldn't want to anyway.

I think I really want to return a nested tibble with 1 row per effect (1 row for each smooth or parametric effect), which will allow for storage of the levels as we won't need to convert them to character to bind all the effects together. But that would be quite a change to the UI and would fail as soon as someone unnested the tibble...

Right now, the best I can think of is to attach another attribute the carries the factor levels in a list, and make sure I preserve that attribute through any subsetting operations.

… missed adding a period to the names of objects returned by parametric_effects

…c_effects objects that got missed in 0.9.0; bump patch version

gavinsimpson · 2024-05-19T10:52:45Z

This is now fixed in the devel version on GitHub. I'll make a release to CRAN shortly after June 10th.

tamas-ferenci · 2025-01-14T21:04:58Z

I am sorry to reopen this issue @gavinsimpson , but I think there is still a problem: the solution breaks down if there are more than one factor explanatory variables and the variable in question is not the first in draw.

Continuing the original example:

df$var <- as.factor(rep(letters[1:2], 400/2))
m <- gam(y ~ month + var + s(x0) + s(x1) + s(x2) + s(x3), data = df, method = "REML")

Now, draw(parametric_effects(m, c("month"))) works perfectly, draw(parametric_effects(m, c("month", "var"))) works perfectly, but draw(parametric_effects(m, c("var", "month"))) produces the original error.

gavinsimpson · 2025-01-15T15:36:26Z

Thanks; I'll take a look

gavinsimpson · 2025-01-17T15:47:24Z

Hmm, I can't reproduce this @tamas-ferenci, at least not with the latest CRAN release.

> m |> parametric_effects(select = c("var", "month"))
# A tibble: 12 × 5
   .term .type   .level .partial   .se
   <chr> <chr>   <chr>     <dbl> <dbl>
 1 month ordered Jan     -0.0444 0.352
 2 month ordered Feb      0.337  0.351
 3 month ordered Mar      0.266  0.331
 4 month ordered Apr     -0.0198 0.298
 5 month ordered May     -0.297  0.263
 6 month ordered Jun     -0.215  0.266
 7 month ordered Jul     -0.0351 0.297
 8 month ordered Aug     -0.191  0.336
 9 month ordered Sep     -0.144  0.348
10 month ordered Oct      0.344  0.354
11 var   factor  a        0      0
12 var   factor  b        0.0294 0.294

tamas-ferenci · 2025-01-17T17:12:52Z

That's crazy! I think I know what happened: you used m |> parametric_effects(select = c("var", "month")) |> draw(), and that indeed works, I used draw(parametric_effects(m, c("var", "month"))) and that produces the error. But are these two not exactly the same?? I just embedded the function instead of using a pipe.

gavinsimpson · 2025-01-20T09:27:47Z

@tamas-ferenci Yes, they are essentially identical, although I see at least two problems and I can now reproduce your findings:

the argument to select terms is terms not select, but it should be the latter.
when you specify terms the original issue that you report raises its head again.

(My example works, because I sued select and that is being silently ignored. So, I guess that the code that does the filtering of terms is not preserving the information about the ordering, while the default terms = NULL is.

I'll fix both of these issues this week.

gavinsimpson · 2025-01-20T10:19:17Z

The problem is happening in this code

  effects <- map_df(valid_terms,
    .f = fun, data = data, pred = pred,
    vars = vars
  )

  if (unnest) {
    f_levels <- attr(effects, "factor_levels")
    effects <- unnest(effects, cols = "data") |>
      relocate(c(".partial", ".se"), .after = last_col())
    attr(effects, "factor_levels") <- f_levels
  }

from parametric_effects.gam.

Only one set of levels is preserved in the factor_levels attribute, presumably when the coercion to a data frame happens in map_df.

If so, perhaps best to not coerce to a data frame at the stage that effects is created, but instead to just use map() to leave this as a list. Then pull off the factor levels attribute for each term, and then go into the unnest clause. There will need to be a bind_rows() step immediately before the if (unnest) { clause, to get effects as a data frame, regardless of whether we unnest or not.

tamas-ferenci · 2025-01-20T10:34:21Z

Aaaah, I see! (I automatically assumed that the second argument is called select, I haven't even bothered to check it.) As a sidenote, the documentation probably also needs some improvement, currently it doesn't even mention select. Thank you!

gavinsimpson · 2025-01-20T11:37:25Z

@tamas-ferenci Sorry if I was unclear; your code is correct, mine example wasn't. The argument is called terms which is why the documentation doesn't mention select. I'm not sure why I left this as terms, as everything else in the package change to use the argument name select.

The bug is real; I'm testing a fix at the moment and should have it on GitHub in a couple of hours.

tamas-ferenci · 2025-01-20T11:47:19Z

OK, I understand! Great, thank you!

gavinsimpson · 2025-01-20T13:05:02Z

This seems to be fixed again; thanks for the report @tamas-ferenci

tamas-ferenci · 2025-01-22T20:49:46Z

I checked it @gavinsimpson , and it seems to be working perfectly, thank you very much!

gavinsimpson · 2025-01-23T09:00:38Z

Thanks @tamas-ferenci

gavinsimpson closed this as completed in 3651ff4 May 19, 2024

gavinsimpson added a commit that referenced this issue May 19, 2024

Merge branch 'issue-284': closes #284

e2c56ec

gavinsimpson added a commit that referenced this issue May 19, 2024

update tests and snapshots; add test for #284; update tests because I…

6fd5b5b

… missed adding a period to the names of objects returned by parametric_effects

gavinsimpson added a commit that referenced this issue May 19, 2024

document fix for #284; document breaking change to names of parametri…

0e4da89

…c_effects objects that got missed in 0.9.0; bump patch version

gavinsimpson added this to the 0.9.1 milestone May 19, 2024

gavinsimpson reopened this Jan 17, 2025

gavinsimpson closed this as completed in df44219 Jan 20, 2025

gavinsimpson added a commit that referenced this issue Jan 20, 2025

fix #284 properly

aebe165

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`draw` forgets order of ordered factor #284

`draw` forgets order of ordered factor #284

mhpob commented May 6, 2024 •

edited

Loading

gavinsimpson commented May 19, 2024

gavinsimpson commented May 19, 2024

tamas-ferenci commented Jan 14, 2025 •

edited

Loading

gavinsimpson commented Jan 15, 2025

gavinsimpson commented Jan 17, 2025

tamas-ferenci commented Jan 17, 2025

gavinsimpson commented Jan 20, 2025

gavinsimpson commented Jan 20, 2025

tamas-ferenci commented Jan 20, 2025

gavinsimpson commented Jan 20, 2025

tamas-ferenci commented Jan 20, 2025

gavinsimpson commented Jan 20, 2025

tamas-ferenci commented Jan 22, 2025

gavinsimpson commented Jan 23, 2025

draw forgets order of ordered factor #284

draw forgets order of ordered factor #284

Comments

mhpob commented May 6, 2024 • edited Loading

gavinsimpson commented May 19, 2024

gavinsimpson commented May 19, 2024

tamas-ferenci commented Jan 14, 2025 • edited Loading

gavinsimpson commented Jan 15, 2025

gavinsimpson commented Jan 17, 2025

tamas-ferenci commented Jan 17, 2025

gavinsimpson commented Jan 20, 2025

gavinsimpson commented Jan 20, 2025

tamas-ferenci commented Jan 20, 2025

gavinsimpson commented Jan 20, 2025

tamas-ferenci commented Jan 20, 2025

gavinsimpson commented Jan 20, 2025

tamas-ferenci commented Jan 22, 2025

gavinsimpson commented Jan 23, 2025

`draw` forgets order of ordered factor #284

`draw` forgets order of ordered factor #284

mhpob commented May 6, 2024 •

edited

Loading

tamas-ferenci commented Jan 14, 2025 •

edited

Loading