Feature Request: function to add "totals" as an indexed group for summarizing #3463

sfirke · 2018-03-26T16:38:16Z

(this is a generalization and extension of #1253)

This function would add a group to a grouped_df, called "Total" (or maybe a user-specified name), that has all row numbers in its index. Once this group is added to the tibble, summarise functions would treat it like any other group.

Hypothetical example:

mtcars %>%
  group_by(cyl) %>%
  add_totals_group() %>%
  summarise(avg_mpg = mean(mpg),
            n = n())

# A tibble: 4 x 3
  cyl   avg_mpg     n
  <chr>   <dbl> <int>
1 4        26.7    11
2 6        19.7     7
3 8        15.1    14
4 Total    20.1    32

The actual code to produce this hypothetical result is:

rbind(
  mtcars %>%
    group_by(cyl) %>%
    summarise(avg_mpg = mean(mpg),
              n = n()),
  mtcars %>%
    summarise(avg_mpg = mean(mpg),
              n = n()) %>%
    mutate(cyl = "Total")
)

I started working on it but ran into internal dplyr validation checks that don't like my "corrupt" grouped_df. Here's my non-working attempt to tinker with the group indices:

add_totals_group <- function(dat){
  cur_indices <- attr(dat, "indices")
  attr(dat, "indices")[[length(cur_indices) + 1]] <- seq(nrow(dat))
  attr(dat, "group_sizes") <- c(attr(dat, "group_sizes"), nrow(dat))
  attr(dat, "biggest_group_size") <- nrow(dat)
  attr(dat, "labels") <- rbind(rep("Total", ncol(dat)))
  dat
}

The text was updated successfully, but these errors were encountered:

romainfrancois · 2018-03-26T17:19:35Z

That's the kind of things tidy evaluation can help you build.

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

yada <- function(data, ... ){
  groups <- quos(...)

  structure(
    list(
      with = group_by(data, !!!groups),
      without = data
    ),
    class = "grouped_and_not_grouped"
  )

}

summarise.grouped_and_not_grouped <- function(data, ...){
  exprs <- quos(...)

  with    <- summarise( data$with, !!!exprs )
  without <- summarise( data$without, !!!exprs )

  bind_rows( with, without )

}

mtcars %>%
  yada(cyl) %>%
  summarise( avg_mpg = mean(mpg), n = n())
#> # A tibble: 4 x 3
#>     cyl avg_mpg     n
#>   <dbl>   <dbl> <int>
#> 1    4.    26.7    11
#> 2    6.    19.7     7
#> 3    8.    15.1    14
#> 4   NA     20.1    32

Might need some extra help with the NA, but I don't think this should be in dplyr.

sfirke · 2018-03-28T13:10:43Z

I appreciate the example with tidyeval. But compared to using the grouping internals of dplyr, this feels heavy and brittle. Adding an index group would allow for a versatile, natural user experience, compared to another package taking this approach. For instance this fails:

mtcars %>%
  yada(cyl) %>%
  filter(mpg < 30) %>% 
  summarise( avg_mpg = mean(mpg), n = n())

Because we're now relying on another class. But would work if done through the grouping indices. And something simple like this would work:

mtcars %>%
  group_by(cyl) %>%
  add_total_grp() %>%
  tally()

Without needing to define a new class in a new package and write S3 methods for all the dplyr verbs. Not to mention other tidyverse functions that take advantage of grouping.

As an outsider, the approach of using the grouping indices is really appealing for the user experience it would provide. But I am ignorant of how feasible that is on the back end. How difficult would it be to accommodate a total group? If the answer is that it would require a massive rewrite, or it's impossible, or "tricky and simply not a priority", I can accept that.

romainfrancois · 2018-03-28T13:23:18Z

I am also concerned about the change of class in your original example where the cyl column became a character to accommodate for Total.

sfirke · 2018-03-28T13:30:28Z

Yep. To me that seems inevitable with introducing an all group. It can't be NA because there could be cyl values of NA already. But maybe someone else has an idea for how to address that.

romainfrancois · 2018-03-28T13:33:59Z

About your initial add_totals_group, mind that dplyr uses 0 based indices, so I guess you'd need to use something like seq(nrow(dat)) - 1L

add_totals_group <- function(dat){
  cur_indices <- attr(dat, "indices")
  attr(dat, "indices")[[length(cur_indices) + 1]] <- seq(nrow(dat)) - 1L
  attr(dat, "group_sizes") <- c(attr(dat, "group_sizes"), nrow(dat))
  attr(dat, "biggest_group_size") <- nrow(dat)
  attr(dat, "labels") <- rbind(rep("Total", ncol(dat)))
  dat
}

not sure that's enough to fool it.

krlmlr · 2018-04-01T10:07:30Z

I'd like to treat attributes, in particular for grouped df-s, as private to dplyr.

As Romain has shown, it is now easy to build a custom class that will forward the same operation to both a grouped and an ungrouped version of the same table, and produce a nice output. I agree with Romain's and Hadley's original comment in #1253 that this feels out of scope for dplyr.

sfirke · 2018-04-02T14:02:54Z

👍 Thanks Roman and Kirill for considering and Romain for the proof of concept example to get something started. To anyone who sees this issue in the future: if you draft a package with these totals functions I'll user-test it 😀

krlmlr · 2018-04-09T23:58:03Z

Thanks!

lock · 2018-10-07T00:34:10Z

This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/

krlmlr added feature a feature request or enhancement data frame labels Apr 9, 2018

krlmlr closed this as completed Apr 9, 2018

lock bot locked and limited conversation to collaborators Oct 7, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Feature Request: function to add "totals" as an indexed group for summarizing #3463

Feature Request: function to add "totals" as an indexed group for summarizing #3463

sfirke commented Mar 26, 2018

romainfrancois commented Mar 26, 2018

sfirke commented Mar 28, 2018 •

edited

Loading

romainfrancois commented Mar 28, 2018

sfirke commented Mar 28, 2018

romainfrancois commented Mar 28, 2018

krlmlr commented Apr 1, 2018

sfirke commented Apr 2, 2018

krlmlr commented Apr 9, 2018

lock bot commented Oct 7, 2018

Feature Request: function to add "totals" as an indexed group for summarizing #3463

Feature Request: function to add "totals" as an indexed group for summarizing #3463

Comments

sfirke commented Mar 26, 2018

romainfrancois commented Mar 26, 2018

sfirke commented Mar 28, 2018 • edited Loading

romainfrancois commented Mar 28, 2018

sfirke commented Mar 28, 2018

romainfrancois commented Mar 28, 2018

krlmlr commented Apr 1, 2018

sfirke commented Apr 2, 2018

krlmlr commented Apr 9, 2018

lock bot commented Oct 7, 2018

sfirke commented Mar 28, 2018 •

edited

Loading