Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

draw forgets order of ordered factor #284

Closed
mhpob opened this issue May 6, 2024 · 14 comments
Closed

draw forgets order of ordered factor #284

mhpob opened this issue May 6, 2024 · 14 comments
Milestone

Comments

@mhpob
Copy link

mhpob commented May 6, 2024

Everything evaluates fine, but the terms are not ordered after parametric_effects and plotted in alphabetical order.

The factor and its ordering look to become purposefully uncoupled into row order, an "ordered" type, and an unordered character representation of the levels in parametric_effects. The row ordering is not passed on to the base plotting step of draw_parametric_effect.

> library(mgcv)
Loading required package: nlme
This is mgcv 1.9-1. For overview type 'help("mgcv-package")'.
> library(gratia)
> 
> df <- data_sim("eg1", seed = 42)
> df$month <- factor(
+   rep(month.abb[1:10], times = 40),
+   levels = month.abb[1:10],
+   ordered = T
+ )
> 
> class(df$month)
[1] "ordered" "factor" 
> levels(df$month)
 [1] "Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep" "Oct"
> 
> m <- gam(y ~ month + s(x0) + s(x1) + s(x2) + s(x3), data = df, method = "REML")
> summary(m)

Family: gaussian 
Link function: identity 

Formula:
y ~ month + s(x0) + s(x1) + s(x2) + s(x3)

Parametric coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  7.49514    0.10571  70.901   <2e-16 ***
month.L     -0.10830    0.33948  -0.319    0.750    
month.Q      0.37636    0.34039   1.106    0.270    
month.C      0.42859    0.33954   1.262    0.208    
month^4     -0.17750    0.34422  -0.516    0.606    
month^5      0.33674    0.34115   0.987    0.324    
month^6      0.12809    0.34837   0.368    0.713    
month^7     -0.07840    0.33997  -0.231    0.818    
month^8     -0.09706    0.33962  -0.286    0.775    
month^9      0.03407    0.34168   0.100    0.921    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Approximate significance of smooth terms:
        edf Ref.df      F  p-value    
s(x0) 3.358  4.165  8.438 1.73e-06 ***
s(x1) 3.112  3.871 66.289  < 2e-16 ***
s(x2) 7.889  8.676 66.775  < 2e-16 ***
s(x3) 1.902  2.382  2.810    0.054 .  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

R-sq.(adj) =  0.682   Deviance explained = 70.2%
-REML = 886.14  Scale est. = 4.47      n = 400
> 
> parametric_effects(m, terms ='month')
# A tibble: 10 × 5
   term  type    .level .partial   .se
   <chr> <chr>   <chr>     <dbl> <dbl>
 1 month ordered Jan    -0.0591  0.322
 2 month ordered Feb     0.352   0.326
 3 month ordered Mar     0.251   0.325
 4 month ordered Apr    -0.00511 0.324
 5 month ordered May    -0.312   0.322
 6 month ordered Jun    -0.200   0.329
 7 month ordered Jul    -0.0499  0.324
 8 month ordered Aug    -0.176   0.326
 9 month ordered Sep    -0.159   0.321
10 month ordered Oct     0.359   0.322
> 
> parametric_effects(m, terms ='month') |> 
+   draw()

image

@gavinsimpson
Copy link
Owner

Yeah; this is a known infelicity of the current way I return everything as a single tibble; we can't merge factors (ordered or otherwise) with different levels and wouldn't want to anyway.

I think I really want to return a nested tibble with 1 row per effect (1 row for each smooth or parametric effect), which will allow for storage of the levels as we won't need to convert them to character to bind all the effects together. But that would be quite a change to the UI and would fail as soon as someone unnested the tibble...

Right now, the best I can think of is to attach another attribute the carries the factor levels in a list, and make sure I preserve that attribute through any subsetting operations.

gavinsimpson added a commit that referenced this issue May 19, 2024
gavinsimpson added a commit that referenced this issue May 19, 2024
… missed adding a period to the names of objects returned by parametric_effects
gavinsimpson added a commit that referenced this issue May 19, 2024
…c_effects objects that got missed in 0.9.0; bump patch version
@gavinsimpson
Copy link
Owner

This is now fixed in the devel version on GitHub. I'll make a release to CRAN shortly after June 10th.

issue-284-plot

@gavinsimpson gavinsimpson added this to the 0.9.1 milestone May 19, 2024
@tamas-ferenci
Copy link

tamas-ferenci commented Jan 14, 2025

I am sorry to reopen this issue @gavinsimpson , but I think there is still a problem: the solution breaks down if there are more than one factor explanatory variables and the variable in question is not the first in draw.

Continuing the original example:

df$var <- as.factor(rep(letters[1:2], 400/2))
m <- gam(y ~ month + var + s(x0) + s(x1) + s(x2) + s(x3), data = df, method = "REML")

Now, draw(parametric_effects(m, c("month"))) works perfectly, draw(parametric_effects(m, c("month", "var"))) works perfectly, but draw(parametric_effects(m, c("var", "month"))) produces the original error.

@gavinsimpson
Copy link
Owner

Thanks; I'll take a look

@gavinsimpson gavinsimpson reopened this Jan 17, 2025
@gavinsimpson
Copy link
Owner

Hmm, I can't reproduce this @tamas-ferenci, at least not with the latest CRAN release.

> m |> parametric_effects(select = c("var", "month"))
# A tibble: 12 × 5
   .term .type   .level .partial   .se
   <chr> <chr>   <chr>     <dbl> <dbl>
 1 month ordered Jan     -0.0444 0.352
 2 month ordered Feb      0.337  0.351
 3 month ordered Mar      0.266  0.331
 4 month ordered Apr     -0.0198 0.298
 5 month ordered May     -0.297  0.263
 6 month ordered Jun     -0.215  0.266
 7 month ordered Jul     -0.0351 0.297
 8 month ordered Aug     -0.191  0.336
 9 month ordered Sep     -0.144  0.348
10 month ordered Oct      0.344  0.354
11 var   factor  a        0      0
12 var   factor  b        0.0294 0.294

Image

@tamas-ferenci
Copy link

That's crazy! I think I know what happened: you used m |> parametric_effects(select = c("var", "month")) |> draw(), and that indeed works, I used draw(parametric_effects(m, c("var", "month"))) and that produces the error. But are these two not exactly the same?? I just embedded the function instead of using a pipe.

@gavinsimpson
Copy link
Owner

@tamas-ferenci Yes, they are essentially identical, although I see at least two problems and I can now reproduce your findings:

  1. the argument to select terms is terms not select, but it should be the latter.
  2. when you specify terms the original issue that you report raises its head again.

(My example works, because I sued select and that is being silently ignored. So, I guess that the code that does the filtering of terms is not preserving the information about the ordering, while the default terms = NULL is.

I'll fix both of these issues this week.

@gavinsimpson
Copy link
Owner

The problem is happening in this code

  effects <- map_df(valid_terms,
    .f = fun, data = data, pred = pred,
    vars = vars
  )

  if (unnest) {
    f_levels <- attr(effects, "factor_levels")
    effects <- unnest(effects, cols = "data") |>
      relocate(c(".partial", ".se"), .after = last_col())
    attr(effects, "factor_levels") <- f_levels
  }

from parametric_effects.gam.

Only one set of levels is preserved in the factor_levels attribute, presumably when the coercion to a data frame happens in map_df.

If so, perhaps best to not coerce to a data frame at the stage that effects is created, but instead to just use map() to leave this as a list. Then pull off the factor levels attribute for each term, and then go into the unnest clause. There will need to be a bind_rows() step immediately before the if (unnest) { clause, to get effects as a data frame, regardless of whether we unnest or not.

@tamas-ferenci
Copy link

Aaaah, I see! (I automatically assumed that the second argument is called select, I haven't even bothered to check it.) As a sidenote, the documentation probably also needs some improvement, currently it doesn't even mention select. Thank you!

@gavinsimpson
Copy link
Owner

@tamas-ferenci Sorry if I was unclear; your code is correct, mine example wasn't. The argument is called terms which is why the documentation doesn't mention select. I'm not sure why I left this as terms, as everything else in the package change to use the argument name select.

The bug is real; I'm testing a fix at the moment and should have it on GitHub in a couple of hours.

@tamas-ferenci
Copy link

OK, I understand! Great, thank you!

gavinsimpson added a commit that referenced this issue Jan 20, 2025
@gavinsimpson
Copy link
Owner

This seems to be fixed again; thanks for the report @tamas-ferenci

@tamas-ferenci
Copy link

I checked it @gavinsimpson , and it seems to be working perfectly, thank you very much!

@gavinsimpson
Copy link
Owner

Thanks @tamas-ferenci

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants