[BREAKING] new design of select, transform and combine #2214

Merged (32 commits, May 5, 2020)
Commits
55031d7 implement AbstractDataFrame functionality (bkamins, Apr 27, 2020)
55bde12 preparation in grouping, rename to _mutate in non-grouping (bkamins, Apr 27, 2020)
2f81c63 tentative rework of _combine that should be able to support select an… (bkamins, Apr 27, 2020)
fd951c5 continue grouping (bkamins, Apr 28, 2020)
eb9ace9 implement select, transform, select! and transform! for GroupedDataFr… (bkamins, Apr 28, 2020)
6908ee8 update DataFrame constructor (bkamins, Apr 28, 2020)
7b644dd fix handling of aggregates (bkamins, Apr 28, 2020)
2753235 code cleanup (bkamins, Apr 28, 2020)
2a03190 improve canonical check + start rewriting tests (bkamins, Apr 28, 2020)
7b86eb8 allow changing sort order of groups in cannonical test (bkamins, Apr 28, 2020)
cb94903 make old tests pass (bkamins, Apr 29, 2020)
908d489 Merge branch 'master' into improve_selection (bkamins, Apr 29, 2020)
384c0b1 change error thrown on Julia 1.0 (bkamins, Apr 29, 2020)
ea574c4 done tests of combine (bkamins, Apr 29, 2020)
8977017 finish tests and documentation (bkamins, Apr 29, 2020)
d51f3f8 updates after review comments (bkamins, Apr 30, 2020)
ef461e6 Apply suggestions from code review (bkamins, May 1, 2020)
245714d fixes after code review (bkamins, May 1, 2020)
2bd31ff add deprecated map tests (bkamins, May 1, 2020)
9d1b20d fix error types in select (bkamins, May 1, 2020)
0f3d309 avoid computing idx, starts and ends in combine if regroup=true (bkamins, May 1, 2020)
1d69fa3 performance improvements (bkamins, May 1, 2020)
5713194 @simd did not improve the performance here (bkamins, May 1, 2020)
1f34d55 Update docs/src/man/split_apply_combine.md (bkamins, May 1, 2020)
2201789 add an example of passing function as a first argument to combine (bkamins, May 1, 2020)
2aa9170 change regroup to ungroup (bkamins, May 2, 2020)
cf4736c Merge branch 'master' into improve_selection (bkamins, May 5, 2020)
333cca2 Apply suggestions from code review (bkamins, May 5, 2020)
334aba0 Merge remote-tracking branch 'origin/improve_selection' into improve_… (bkamins, May 5, 2020)
10b9474 update docs (bkamins, May 5, 2020)
792b57d improve description of what gets returned in combine and select (bkamins, May 5, 2020)
f34873c fix repeated code (bkamins, May 5, 2020)
src/groupeddataframe/splitapplycombine.jl: 20 changes (14 additions, 6 deletions)
@@ -258,9 +258,11 @@ const KWARG_PROCESSING_RULES =
 combine(fun::Union{Function, Type}, df::AbstractDataFrame, ungroup::Bool=true)
 combine(pair::Pair, df::AbstractDataFrame, ungroup::Bool=true)

-Apply operations to each group in a [`GroupedDataFrame`](@ref) and return
-the combined result as a `DataFrame`.
-If an `AbstractDataFrame` is passed, apply operations to the data frame as a whole.
+Apply operations to each group in a [`GroupedDataFrame`](@ref) and return the combined
+result as a `DataFrame` if `ungroup=true` or `GroupedDataFrame` if `ungroup=false`.
+
+If an `AbstractDataFrame` is passed, apply operations to the data frame as a whole
+and a `DataFrame` is always returned.

 $F_ARGUMENT_RULES

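A minimal usage sketch of the `ungroup` behaviour described in the `combine` docstring above. It assumes a current DataFrames.jl release (1.x), where `ungroup` is passed as a keyword argument (the signatures in this diff may differ); the data frame and the names `res`, `gres` and `whole` are illustrative only.

```julia
using DataFrames

# Illustrative data: column :g defines two groups.
df = DataFrame(g = [1, 2, 1, 2], x = [10, 20, 30, 40])
gd = groupby(df, :g; sort = true)  # sort = true makes the group order deterministic

# ungroup = true (the default): one row per group, returned as a DataFrame
res = combine(gd, :x => sum => :x_sum)   # res.x_sum == [40, 60]

# ungroup = false: the same rows, but regrouped by the key column :g
gres = combine(gd, :x => sum => :x_sum; ungroup = false)

# An AbstractDataFrame is processed as a whole and a DataFrame is always returned
whole = combine(df, :x => sum => :x_sum) # one row, whole.x_sum == [100]

println(res)
println(typeof(gres))
println(whole)
```

Passing `ungroup = false` is convenient when the aggregated result is itself going to be processed group by group in a later step.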
@@ -1423,9 +1425,15 @@ end
 select(gd::GroupedDataFrame, args...;
 copycols::Bool=true, keepkeys::Bool=true, ungroup::Bool=true)

-Apply `args` to `gd` following the rules described in [`combine`](@ref).
-The returned object has as many rows as `parent(gd)`.
-If an operation returns a single value it is always broadcasted to have this number of rows.
+Apply `args` to `gd` following the rules described in [`combine`](@ref) and return the
+result as a `DataFrame` if `ungroup=true` or `GroupedDataFrame` if `ungroup=false`.
+
+The `parent` of the returned value has as many rows as `parent(gd)`. If an operation
+in `args` returns a single value it is always broadcasted to have this number of rows.
+
+Apply operations to each group in a [`GroupedDataFrame`](@ref) and return the combined
+result as a `DataFrame` if `ungroup=true` or `GroupedDataFrame` if `ungroup=false`.
+

 If `copycols=false` then do not perform copying of columns that are not transformed.

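The row-count and broadcasting rules in the `select` docstring can be illustrated with a short sketch. It assumes DataFrames.jl 1.x, where `select` on a `GroupedDataFrame` keeps the grouping columns by default (`keepkeys=true`) and returns rows in the same order as `parent(gd)`; the data and variable names are illustrative only.

```julia
using DataFrames

# Illustrative data: column :g defines two groups.
df = DataFrame(g = [1, 2, 1, 2], x = [10, 20, 30, 40])
gd = groupby(df, :g)

# `sum` returns a single value per group; select broadcasts it to every row
# of that group, so the result has nrow(df) rows in the order of df
sel = select(gd, :x => sum => :x_sum)    # sel.x_sum == [40, 60, 40, 60]

# ungroup = false: a GroupedDataFrame whose parent has nrow(df) rows
sgd = select(gd, :x => sum => :x_sum; ungroup = false)

println(sel)
println(nrow(parent(sgd)))
```

This is the main difference from `combine`, which returns one row per group for the same reduction.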