-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature Request: function to add "totals" as an indexed group for summarizing #3463
Comments
That's the kind of things tidy evaluation can help you build. library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
yada <- function(data, ... ){
groups <- quos(...)
structure(
list(
with = group_by(data, !!!groups),
without = data
),
class = "grouped_and_not_grouped"
)
}
summarise.grouped_and_not_grouped <- function(data, ...){
exprs <- quos(...)
with <- summarise( data$with, !!!exprs )
without <- summarise( data$without, !!!exprs )
bind_rows( with, without )
}
mtcars %>%
yada(cyl) %>%
summarise( avg_mpg = mean(mpg), n = n())
#> # A tibble: 4 x 3
#> cyl avg_mpg n
#> <dbl> <dbl> <int>
#> 1 4. 26.7 11
#> 2 6. 19.7 7
#> 3 8. 15.1 14
#> 4 NA 20.1 32 Might need some extra help with the NA, but I don't think this should be in |
I appreciate the example with tidyeval. But compared to using the grouping internals of dplyr, this feels heavy and brittle. Adding an index group would allow for a versatile, natural user experience, compared to another package taking this approach. For instance this fails:
Because we're now relying on another class. But would work if done through the grouping indices. And something simple like this would work:
Without needing to define a new class in a new package and write S3 methods for all the dplyr verbs. Not to mention other tidyverse functions that take advantage of grouping. As an outsider, the approach of using the grouping indices is really appealing for the user experience it would provide. But I am ignorant of how feasible that is on the back end. How difficult would it be to accommodate a total group? If the answer is that it would require a massive rewrite, or it's impossible, or "tricky and simply not a priority", I can accept that. |
I am also concerned about the change of class in your original example where the |
Yep. To me that seems inevitable with introducing an |
About your initial
not sure that's enough to fool it. |
I'd like to treat attributes, in particular for grouped df-s, as private to dplyr. As Romain has shown, it is now easy to build a custom class that will forward the same operation to both a grouped and an ungrouped version of the same table, and produce a nice output. I agree with Romain's and Hadley's original comment in #1253 that this feels out of scope for dplyr. |
👍 Thanks Roman and Kirill for considering and Romain for the proof of concept example to get something started. To anyone who sees this issue in the future: if you draft a package with these totals functions I'll user-test it 😀 |
Thanks! |
This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/ |
(this is a generalization and extension of #1253)
This function would add a group to a grouped_df, called "Total" (or maybe a user-specified name), that has all row numbers in its index. Once this group is added to the tibble, summarise functions would treat it like any other group.
Hypothetical example:
The actual code to produce this hypothetical result is:
I started working on it but ran into internal dplyr validation checks that don't like my "corrupt" grouped_df. Here's my non-working attempt to tinker with the group indices:
The text was updated successfully, but these errors were encountered: