-
Notifications
You must be signed in to change notification settings - Fork 11
Stack should use similar_nullable, not NullableArray #54
Conversation
cjprybol
commented
Apr 13, 2017
•
edited
Loading
edited
- Adds a test set to trigger multiple entries in unstack warning
- Adds a similar_nullable function for NullableCategoricalArrays
- Tests to make sure column types are converted to their appropriate nullable type.
src/abstractdatatable/reshape.jl
Outdated
@@ -98,8 +98,8 @@ function stack(dt::AbstractDataTable, measure_vars::Vector{Int}, id_var::Int; | |||
end | |||
function stack(dt::AbstractDataTable, measure_var::Int, id_vars::Vector{Int}; | |||
variable_name::Symbol=:variable, value_name::Symbol=:value) | |||
stackdt(dt, [measure_var], id_vars; | |||
variable_name=variable_name, value_name=value_name) | |||
stack(dt, [measure_var], id_vars; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this change related?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Local tests say no, so it must be something I hadn't yet reverted. I'm not sure why this is the only stack function calling stackdt
, but I'll investigate that when I update the indexing of stackdt's custom array views.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The commit message shouldn't mention this change now that it's not included.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Could you add a test for the incorrect type that you fixed?
test/datatable.jl
Outdated
@@ -341,4 +341,21 @@ module TestDataTable | |||
@test find(c -> isa(c, NullableCategoricalArray), categorical!(DataTable(A=1:3, B=4:6), :A).columns) == [1] | |||
@test find(c -> isa(c, NullableCategoricalArray), categorical!(DataTable(A=1:3, B=4:6), [1]).columns) == [1] | |||
@test find(c -> isa(c, NullableCategoricalArray), categorical!(DataTable(A=1:3, B=4:6), 1).columns) == [1] | |||
|
|||
@testset "multiple entries in unstack warnings" begin |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great but should go to a separate commit since it's not related to the above changes AFAICT.
Updated. I tagged on a commit at the end to fix the deprecation warnings for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be clear, this exercizes the code path, but doesn't test that warnings are thrown, right? The commit message could explain this. I don't know how to catch warnings, but ideally we would check that they are thrown as expected.
@@ -15,6 +15,9 @@ similar_nullable{T,R}(dv::CategoricalArray{T,R}, dims::@compat(Union{Int, Tuple{ | |||
similar_nullable(dt::AbstractDataTable, dims::Int) = | |||
DataTable(Any[similar_nullable(x, dims) for x in columns(dt)], copy(index(dt))) | |||
|
|||
similar_nullable{T,R}(dv::NullableCategoricalArray{T,R}, dims::Union{Int, Tuple{Vararg{Int}}}) = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be more logical to add this definition right after that for CategoricalArray
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The AbstractDataTable function that separates the two will disappear in #44. I thought adding the new function to the end of the set like this would minimize the risk of a merge conflict, although that's purely speculative. I can reverse them if you'd like, but I'm not sure it will matter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a big deal, but it's always better to have commits be correct when taken in isolation.
test/datatable.jl
Outdated
udt = unstack(long, :variable, :value) | ||
@test udt == DataTable(Any[collect(1:12), repeat(1:3, inner = 4), | ||
repeat(1:4, inner = 3), | ||
Nullable{Float64}.(linspace(0, 1, 12)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just Nullable.
is enough. Or even NullableArray
since that's a more common structure to use.
test/datatable.jl
Outdated
@testset "unstack nullable promotion" begin | ||
long = DataTable(Any[repeat([:c, :d], inner=12), | ||
vcat(linspace(0, 1, 12), linspace(2, 1, 12)), | ||
repeat(1:12, outer=2), repmat(repeat(1:3, inner=4), 2), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Only applies if you keep this.) Simply repeat(1:3, inner=4, outer=2)
. Same below.
test/datatable.jl
Outdated
@@ -342,6 +342,40 @@ module TestDataTable | |||
@test find(c -> isa(c, NullableCategoricalArray), categorical!(DataTable(A=1:3, B=4:6), [1]).columns) == [1] | |||
@test find(c -> isa(c, NullableCategoricalArray), categorical!(DataTable(A=1:3, B=4:6), 1).columns) == [1] | |||
|
|||
@testset "unstack nullable promotion" begin | |||
long = DataTable(Any[repeat([:c, :d], inner=12), | |||
vcat(linspace(0, 1, 12), linspace(2, 1, 12)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't you find a simpler example which doesn't involve repeat
nor linspace
? Since the goal here is to test promotion, focus on testing types rather than contents. BTW, types are not tested.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took this example from the unstack docstring, replacing rand with linspace to avoid srand and random number issues. I've updated it a bit to make it a little simpler. The two tested types here are converting standard arrays to NullableArrays and CategoricalArrays to NullableCategoricalArrays, with the latter being the primary issue of the current master implementation. Other types are tested in assorted places where other calls to unstack are made.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took this example from the unstack docstring, replacing rand with linspace to avoid srand and random number issues.
Yes, but there's no reason to use such a complicated example here. Just a handful a values are enough to test that promotion works.
The two tested types here are converting standard arrays to NullableArrays and CategoricalArrays to NullableCategoricalArrays,
But you should explicitly test for column types of the result. The current code only checks that it compares equal, which can happen with any nullable vector, not just NullableCategoricalArray
(and in the future, maybe even with a non-nullable vector with the new Nullable
implementation).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated!
test/datatable.jl
Outdated
NullableCategoricalArray(linspace(2, 1, 12))], | ||
[:id, :a, :b, :c, :d]) | ||
@test unstack(long) == | ||
unstack(long, :id, :variable, :value) == |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Incorrect indentation.
test/datatable.jl
Outdated
dt = DataTable(id=[1, 2, 1, 2], variable=["a", "b", "a", "b"], value=[3, 4, 5, 6]) | ||
a = unstack(dt, :id, :variable, :value) | ||
b = unstack(dt, :variable, :value) | ||
@test a == b == DataTable(id = Nullable[1, 2], a = Nullable[5, Nullable()], b = Nullable[Nullable(), 6]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nullable[
isn't needed for the two last uses and is actually even counter-productive here since the array will have an abstract element type. Elsewhere, use Nullable{Int}
.
On 0.6 you can test warnings with |
Thanks for the suggestion, Alex! Didn't know about I couldn't find anything in BaseTestNext or compat to extend supporting those macros into v0.5 |
test/datatable.jl
Outdated
|
||
@testset "duplicate entries in unstack warnings" begin | ||
dt = DataTable(id=NullableArray([1, 2, 1, 2]), variable=["a", "b", "a", "b"], value=[3, 4, 5, 6]) | ||
if isdefined(Symbol(:@test_warn)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be isdefined(Base.Test, Symbol("@test_warn"))
. This is why the tests failed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I couldn't get isdefined
to work because Julia will still try and expand the @test_warn
macro during parsing. I think it needs a @static
check to avoid the if block entirely during parsing in v0.5. A static check against the first tagged version with the @test_warn
macro works, although I'm not sure how to reference the exact commit. JuliaLang/julia@28a11ff
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Run ./contrib/commit-name.sh 28a11ffd4d2e8451492013f18dad5e9a576233f6
from the root directory of a clone of Julia to get the exact version number for a particular commit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To get around the eager macro expansion,
if isdefined(Base.Test, Symbol("@test_warn"))
include_string("""
@test_warn ...
""")
end
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just bookmarked both of those comments.
In v0.6, where the macros test_warn and test_nowarn are defined, tests will actually check the warning messages. In v0.5 warnings are not checked but can be manually confirmed by reviewing the output of running Pkg.test("DataTables").
@@ -235,15 +232,10 @@ function unstack(dt::AbstractDataTable, colkey::Int, value::Int) | |||
end | |||
keycol = NullableCategoricalArray(dt[colkey]) | |||
valuecol = dt[value] | |||
dt1 = dt[g.idx[g.starts], g.cols] | |||
dt1 = nullable!(dt[g.idx[g.starts], g.cols], g.cols) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to specify g.cols
the second time AFAICT.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The nullify!
function I had written for the pre-refactor PR didn't need any specification of columns when you want to nullify all of the columns, but this existing nullable!
function does require columns to be specified.
julia> nullable!(DataTable(a = 1:4))
ERROR: MethodError: no method matching nullable!(::DataTables.DataTable)
Closest candidates are:
nullable!(::DataTables.DataTable, ::Union{Real, Symbol}) at /Users/Cameron/.julia/v0.6/DataTables/src/datatable/datatable.jl:779
nullable!(::DataTables.DataTable, ::Array{T<:Union{Real, Symbol},1}) where T<:Union{Real, Symbol} at /Users/Cameron/.julia/v0.6/DataTables/src/datatable/datatable.jl:783
nullable!(::Array{Symbol,1}, ::DataTables.AbstractDataTable) at deprecated.jl:50
...
julia> nullable!(DataTable(a = 1:4), :a)
4×1 DataTables.DataTable
│ Row │ a │
├─────┼───┤
│ 1 │ 1 │
│ 2 │ 2 │
│ 3 │ 3 │
│ 4 │ 4 │
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, so that's fine for now.
test/datatable.jl
Outdated
[:id, :variable, :value]) | ||
udt = unstack(dt) | ||
@test udt == unstack(dt, :variable, :value) == unstack(dt, :id, :variable, :value) | ||
@test udt == DataTable(Any[NullableCategoricalArray([1, 2]), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nullable[1, 2]
and so on is enough here, NullableCategoricalArray
is only needed in the type check.
Stack was using NullableArray, which would incorrectly cast CategoricalArrays to NullableArrays{CategoricalValue}. Adds a new similar_nullable function for NullableCategoricalArrays.