-
Notifications
You must be signed in to change notification settings - Fork 11
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still need to review the rest of the tests, but this should cover the most substantive parts.
if length(uniqueheaders) == 0 | ||
return DataTable() | ||
elseif length(unique(map(length, uniqueheaders))) > 1 | ||
throw(ArgumentError("not all DataTables have the same number of columns. Resolve column(s): $(setdiff(union(allheaders...), intersect(allheaders...)))")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Resolve" might not be completely clear. Maybe "unmatched" (sorry, not a native speaker)?
Also it would still be interesting to print the number of columns of each table, since that's what the message says.
elseif length(unique(map(length, uniqueheaders))) > 1 | ||
throw(ArgumentError("not all DataTables have the same number of columns. Resolve column(s): $(setdiff(union(allheaders...), intersect(allheaders...)))")) | ||
elseif length(uniqueheaders) > 1 | ||
throw(ArgumentError("Column names do not match. Use `rename!` or `names!` to adjust columns names. Resolve column(s): $(setdiff(union(allheaders...), intersect(allheaders...)))")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think errors should start with a lowercase letter.
Int64 | ||
``` | ||
|
||
See also [`denullify`](@ref) & [`nullify!`](@ref). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better use "and" than "&".
function denullify!(dt::AbstractDataTable) | ||
for i in 1:size(dt,2) | ||
if !anynull(dt[i]) | ||
dt[i] = dropnull(dt[i]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use dropnull!
? That will avoid a copy for NullableArray
, and it won't make a difference for other arrays. Since there a no nulls, the original arrays won't be modified.
denullify!(dt::AbstractDataTable) | ||
|
||
Convert columns with a `Nullable` element type without any null values | ||
to a non-`Nullable` equivalent array type. The table `dt` is modified in place. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mention that new columns may alias the old ones, even when they were converted. Same for denullify
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should denullify
switch to deepcopy
? I'm not sure I like this alias behavior for denullify
.
julia> using DataTables
julia> dt = DataTable(A = 1:3, B = NullableArray(1:3))
3×2 DataTables.DataTable
│ Row │ A │ B │
├─────┼───┼───┤
│ 1 │ 1 │ 1 │
│ 2 │ 2 │ 2 │
│ 3 │ 3 │ 3 │
julia> ddt = denullify(dt)
3×2 DataTables.DataTable
│ Row │ A │ B │
├─────┼───┼───┤
│ 1 │ 1 │ 1 │
│ 2 │ 2 │ 2 │
│ 3 │ 3 │ 3 │
julia> dt[:A] === ddt[:A]
true
julia> ddt[:A] = 1
1
julia> ddt
3×2 DataTables.DataTable
│ Row │ A │ B │
├─────┼───┼───┤
│ 1 │ 1 │ 1 │
│ 2 │ 1 │ 2 │
│ 3 │ 1 │ 3 │
julia> dt
3×2 DataTables.DataTable
│ Row │ A │ B │
├─────┼───┼───┤
│ 1 │ 1 │ 1 │
│ 2 │ 1 │ 2 │
│ 3 │ 1 │ 3 │
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and nullify...
julia> dt = DataTable(A = 1:3, B = NullableArray(1:3))
3×2 DataTables.DataTable
│ Row │ A │ B │
├─────┼───┼───┤
│ 1 │ 1 │ 1 │
│ 2 │ 2 │ 2 │
│ 3 │ 3 │ 3 │
julia> ndt = nullify(dt)
3×2 DataTables.DataTable
│ Row │ A │ B │
├─────┼───┼───┤
│ 1 │ 1 │ 1 │
│ 2 │ 2 │ 2 │
│ 3 │ 3 │ 3 │
julia> dt[:B] === ndt[:B]
true
julia> ndt[:B] = 3
3
julia> dt
3×2 DataTables.DataTable
│ Row │ A │ B │
├─────┼───┼───┤
│ 1 │ 1 │ 3 │
│ 2 │ 2 │ 3 │
│ 3 │ 3 │ 3 │
julia> ndt
3×2 DataTables.DataTable
│ Row │ A │ B │
├─────┼───┼───┤
│ 1 │ 1 │ 3 │
│ 2 │ 2 │ 3 │
│ 3 │ 3 │ 3 │
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure, we would need concrete use cases to decide. In both cases people can easily make a copy manually.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've changed these back to using copy as you suggested and added a note for nullify
and denullify
that if users want fully alias-free copies, they should use nullify!(deepcopy(dt))
and denullify!(deepcopy(dt))
. Hopefully if anyone hits this issue and doesn't know why columns are changed across multiple tables, they'll open the help docstrings and see the note.
test/constructors.jl
Outdated
x2 = [1.0, 1.0, 1.0])) | ||
@test isequal(dt, DataTable(x1 = [0.0, 0.0, 0.0], | ||
x2 = [1.0, 1.0, 1.0], | ||
@test isequal(dt, DataTable(x1 = NullableCategoricalVector([0.0, 0.0, 0.0]), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NullableVector
rather.
test/constructors.jl
Outdated
|
||
dt = DataTable([Int, Float64], [:x1, :x2], 2) | ||
@test size(dt) == (2, 2) | ||
@test eltypes(dt) == [Nullable{Int}, Nullable{Float64}] | ||
|
||
@test isequal(dt, DataTable([Int, Float64], 2)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This constructor still seems legitimate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The constructor is legitimate but the test is asking if two datatables with uninitialized columns contain the same values.
by the same logic these two vectors should be equal
julia> Vector{Float64}(10)
10-element Array{Float64,1}:
2.29759e-314
2.29759e-314
2.29759e-314
2.29759e-314
2.29759e-314
2.29759e-314
2.29759e-314
2.29759e-314
2.29759e-314
2.29759e-314
julia> Vector{Float64}(10)
10-element Array{Float64,1}:
2.29514e-314
9.88131e-324
2.25711e-314
2.29432e-314
0.0
2.29469e-314
2.29469e-314
2.29514e-314
2.29469e-314
5.45361e-312
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. So use Nullable
eltypes, which should compare as isequal
even with nulls values.
test/constructors.jl
Outdated
@test DataTable(a=1, b=1:2) == DataTable(a=[1,1], b=[1,2]) | ||
|
||
@testset "associative" begin | ||
dt = DataTable(Dict(k => v for (k,v) in zip([:A, :B], [1:3, 4:6]))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just Dict(A => 1:3, B => 4:6)
.
test/constructors.jl
Outdated
|
||
@testset "associative" begin | ||
dt = DataTable(Dict(k => v for (k,v) in zip([:A, :B], [1:3, 4:6]))) | ||
@test dt == DataTable(A = 1:3, B = 4:6) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also test the type of the columns (would maybe make sense elsewhere too).
src/datatable/datatable.jl
Outdated
if isa(columns[i], Range) | ||
columns[i] = collect(columns[i]) | ||
end | ||
repeats = div(maxlen, length(columns[i])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like this recycling behavior which feels too magical. JuliaData/DataFrames.jl#882 only recycles single-element arrays, and that feels safer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The constructors seem like they are already pretty forgiving given that recycling is attempted at all. It would be an interesting duality of stances: forgiving and friendly during the construction of a DataTable but enforce strict behavior rules during use. We've taken a pretty clear stance on the latter by deciding no to allowing dt[row, :col] = Nullabe()
to autopromote to a NullableArray and accept the assignment. I think there's precedent for the recycling behavior too. I'd like some additional perspectives here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought of this mainly because I saw it in the tests
dt = DataTable(a = repeat([1, 2, 3, 4], outer=[2]), |
d1 = DataTable(a = repeat([1:3;], inner = [4]), |
dt = DataTable(a = repeat([1, 2, 3, 4], outer=[2]), |
DataTables.jl/test/grouping.jl
Line 6 in 5f611bf
dt = DataTable(a = repeat([1, 2, 3, 4], outer=[2]), |
Line 146 in 20c71d6
d1 = DataTable(a = repeat([1:3;], inner = [4]), |
DataTables.jl/src/abstractdatatable/join.jl
Line 293 in 20c71d6
cols = Any[[repeat(c, inner=r2) for c in columns(dt1)]; |
Of course, the shortcoming here is that we decide for the user which is the default, inner or outer repeat. But at least one of these DataTable constructor behaviors will have a smaller declaration
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The constructors seem like they are already pretty forgiving given that recycling is attempted at all. It would be an interesting duality of stances: forgiving and friendly during the construction of a DataTable but enforce strict behavior rules during use.
What do you mean? Constructors do not support any recycling currently. I'm just suggesting we extend to constructors the current behavior of setindex!
, i.e. recycle scalars (and maybe on-element arrays, but that's up for discussion).
The tests/examples are not really representative of work with real data, where you more rarely need to generate vectors by recycling.
New, potentially disruptive (but probably not?) change I have added to this. stack/unstack & meltdt/unstackdt were all performing lots of Array type conversions, the first to NullableCategoricalArrays/NullableArrays and the later to the internal AbstractVector types of RepeatedArray/StackedArray. Removing the automatic array type promotion of stack/unstack wasn't too bad, but removing it from meltdt/unstackdt is essentially impossible without a rewrite of how those work (and the custom backend array types). RepeatedArrays and StackedArrays don't support all indexing types (Bool/BitArray were the two I cared about) and also, they are both another case of auto promotion of Array type that we're trying to get rid of. They don't appear to be commonly used (and are definitely not well tested), otherwise I'm sure someone would have caught these basic usability issues already. |
|
||
This saves memory. To create the view, several AbstractVectors are defined: | ||
|
||
`:variable` column -- `EachRepeatedVector` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There aren't any EachRepeatedVector
s left in the code so this is probably overdue for removal
I think that behavior is fine. That's the only way to avoid unnecessary
copies. People can just make a copy if needed.
|
Probably not worth it in practice, but who knows...
|
Ah, too bad. I'd rather keep the old form then. Anyway iterating over
keyword arguments should be quite cheap.
|
Ah, that's really annoying. This could be a sufficient motivation to
add temporary hacks to support concatenation up to 3 or 4 arguments as
in JuliaStats/NullableArrays.jl#187.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the delay!
src/abstractdatatable/reshape.jl
Outdated
@@ -76,9 +76,9 @@ function stack(dt::AbstractDataTable, measure_vars::Vector{Int}, | |||
cnames = names(dt)[id_vars] | |||
insert!(cnames, 1, value_name) | |||
insert!(cnames, 1, variable_name) | |||
DataTable(Any[Compat.repeat(_names(dt)[measure_vars], inner=nrow(dt)), # variable | |||
DataTable(Any[repeat(_names(dt)[measure_vars], inner=nrow(dt)), # variable | |||
vcat([dt[c] for c in measure_vars]...), # value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should align this comment too.
src/abstractdatatable/join.jl
Outdated
NullableArray(eltype(T), dims) | ||
|
||
similar_nullable{T,R}(dv::CategoricalArray{T,R}, dims::@compat(Union{Int, Tuple{Vararg{Int}}})) = | ||
similar_nullable{T,R}(dv::CategoricalArray{T,R}, dims::Union{Int, Tuple{Vararg{Int}}}) = | ||
NullableCategoricalArray(T, dims) | ||
|
||
similar_nullable(dt::AbstractDataTable, dims::Int) = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this definition still used somewhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no, only the Array ones are still used. I'll take this one out.
src/abstractdatatable/io.jl
Outdated
@@ -45,7 +45,7 @@ function printtable(io::IO, | |||
if !isnull(dt[j][i]) | |||
if ! (etypes[j] <: Real) | |||
print(io, quotemark) | |||
x = isa(dt[i, j], Nullable) ? _unsafe_get(dt[i, j]) : dt[i, j] | |||
x = isa(dt[i, j], Nullable) ? NullableArrays.unsafe_get(dt[i, j]) : dt[i, j] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rather not rely on NullableArrays internals for this. Also in Julia 0.6 `unsafe_get exists in Julia Base, so better not make it look like a NullableArrays-specific function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I was working on 0.5 when I wrote this and couldn't find the Base.unsafe_get. Will use @compat(unsafeget)
clarify error messages, fix spacing, address case of length(::Symbol) and length(::String) not giving the desired output (Vector length), and unify constructors in accepting AbstractVector{Symbol} for colnames
src/abstractdatatable/join.jl
Outdated
similar_nullable{T}(dv::AbstractVector{T}, dims::Union{Int, Tuple{Vararg{Int}}}) = | ||
NullableVector{T}(dims) | ||
similar_nullable{T}(dv::AbstractArray{T}, dims::Union{Int, Tuple{Vararg{Int}}}) = | ||
NullableArray(T, dims) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do NullableArray
s not support the proper type parameter convention? I thought that was just DataArray
s.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, they do. This is reverting back to the current master syntax. Now that I've added tests for CategoricalArrays in unstack (or maybe just ran a Pkg.update?) I caught a new deprecation warning.
julia> DataTables.similar_nullable(CategoricalArray([1, 1, 1]), 2)
WARNING: NullableCategoricalArray{T}(::Type{T}, dims::Int...; ordered=false) is deprecated, use NullableCategoricalArray{T}(dims, ordered=ordered) instead.
Stacktrace:
[1] depwarn(::String, ::Symbol) at ./deprecated.jl:64
[2] #NullableCategoricalArray#137(::Bool, ::Type{T} where T, ::Type{Int64}, ::Int64, ::Vararg{Int64,N} where N) at ./deprecated.jl:51
[3] CategoricalArrays.NullableCategoricalArray(::Type{Int64}, ::Int64) at ./deprecated.jl:51
[4] similar_nullable(::CategoricalArrays.CategoricalArray{Int64,1,UInt32}, ::Int64) at /Users/Cameron/.julia/v0.6/DataTables/src/abstractdatatable/join.jl:12
[5] eval(::Module, ::Any) at ./boot.jl:235
[6] eval_user_input(::Any, ::Base.REPL.REPLBackend) at ./REPL.jl:66
[7] macro expansion at ./REPL.jl:97 [inlined]
[8] (::Base.REPL.##1#2{Base.REPL.REPLBackend})() at ./event.jl:73
while loading no file, in expression starting on line 0
2-element CategoricalArrays.NullableCategoricalArray{Int64,1,UInt32}:
#NULL
#NULL
julia> DataTables.similar_nullable(NullableCategoricalArray([1, 1, 1]), 2)
WARNING: NullableCategoricalArray{T}(::Type{T}, dims::Int...; ordered=false) is deprecated, use NullableCategoricalArray{T}(dims, ordered=ordered) instead.
Stacktrace:
[1] depwarn(::String, ::Symbol) at ./deprecated.jl:64
[2] #NullableCategoricalArray#137(::Bool, ::Type{T} where T, ::Type{Int64}, ::Int64, ::Vararg{Int64,N} where N) at ./deprecated.jl:51
[3] CategoricalArrays.NullableCategoricalArray(::Type{Int64}, ::Int64) at ./deprecated.jl:51
[4] similar_nullable(::CategoricalArrays.NullableCategoricalArray{Int64,1,UInt32}, ::Int64) at /Users/Cameron/.julia/v0.6/DataTables/src/abstractdatatable/join.jl:15
[5] eval(::Module, ::Any) at ./boot.jl:235
[6] eval_user_input(::Any, ::Base.REPL.REPLBackend) at ./REPL.jl:66
[7] macro expansion at ./REPL.jl:97 [inlined]
[8] (::Base.REPL.##1#2{Base.REPL.REPLBackend})() at ./event.jl:73
while loading no file, in expression starting on line 0
2-element CategoricalArrays.NullableCategoricalArray{Int64,1,UInt32}:
#NULL
#NULL
I changed all of these to the new syntax but @nalimilan asked if I would revert and save the changes for another PR, too many changes at once.
hcat!(dt::DataTable, x::NullableVector) = hcat!(dt, DataTable(Any[x])) | ||
hcat!(dt::DataTable, x::Vector) = hcat!(dt, DataTable(Any[NullableArray(x)])) | ||
hcat!(dt::DataTable, x) = hcat!(dt, DataTable(Any[NullableArray([x])])) | ||
hcat!(dt::DataTable, x) = hcat!(dt, DataTable(Any[x])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
with the scalar recycling in this PR these can all be consolidated into a single function
unionunique = union(uniqueheaders...) | ||
coldiff = setdiff(unionunique, intersect(uniqueheaders...)) | ||
if !isempty(coldiff) | ||
# if any datatables are a full superset of names, skip them |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for each unique set of column names I'm throwing the error to tell which columns are missing from each of the sets. If any of the inputs to vcat have all of the column names then we can't show which are missing, so they're dropped from the error output
pending no major issues during reviews, I think this is ready to merge. |
src/deprecated.jl
Outdated
@@ -18,5 +18,5 @@ import Base: keys, values, insert! | |||
|
|||
@deprecate sub(dt::AbstractDataTable, rows) view(dt, rows) | |||
|
|||
@deprecate stackdf stackdt | |||
@deprecate meltdf meltdt | |||
@deprecate stackdf stack |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be considered an unrelated change but given the issues with the stackdt/meltdt maybe we should be steering people towards the non-view versions? I'm happy to revert this if others would like it reverted
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd revert it for now, as it's unrelated. We can discuss steering people toward stack
separately.
@@ -61,13 +61,13 @@ d = stackdt(iris) | |||
|
|||
This saves memory. To create the view, several AbstractVectors are defined: | |||
|
|||
`:variable` column -- `EachRepeatedVector` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, these probably are important. 2 spaces == newline. I forgot this is markdown interpreter dependant
dt2 = DataTable(A = 1:3) | ||
# right missing 1 column | ||
err = @test_throws ArgumentError vcat(dt1, dt2) | ||
@test err.value.msg == "column(s) B are missing from argument(s) 2" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really like that this is possible (catching the error output in a variable to test the string). Thanks for the suggestion here
src/abstractdatatable/reshape.jl
Outdated
valuecol = dt[value] | ||
keycol = NullableCategoricalArray(dt[colkey]) | ||
levels!(keycol, unique(dt[colkey])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
src/abstractdatatable/reshape.jl
Outdated
groupidxs = [g.idx[g.starts[i]:g.ends[i]] for i in 1:length(g.starts)] | ||
rowkey = zeros(Int, size(dt, 1)) | ||
for i in 1:length(groupidxs) | ||
rowkey[groupidxs[i]] = i | ||
end | ||
keycol = NullableCategoricalArray(dt[colkey]) | ||
levels!(keycol, unique(dt[colkey])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the delay, couldn't find the time to review the new changes.
After discussing this with others, I think we should adopt the following strategy to merge this:
-
I hadn't really thought about the
stackdt
/meltdt
issue. I turns out it's not OK to remove these. The fact that nobody noticed they were semi-broken doesn't mean they aren't useful, since DataFrames master hasn't been used widely before forking to DataTables. So these need to be restored and adapted like you did forstack
andmelt
. -
The changes to tests implied by the removal of automatic promotion to
NullableArray
are somewhat problematic. Indeed, that means by default data tables won't be usingNullableArray
, so we won't test that they are still supported by all functions. Ideally, we would test bothArray
andNullableArray
, and their combination, but for now to ensure we don't regress (now or in the future) I'm afraid all tests will have to be converted to useNullableArray
columns, except of course for specific cases where it's justified.
In order to make this less messy, could you refactor this PR in (at least) two parts? The first part can contain most changes to the code (but not to tests), except that the promotion to NullableArray
would still happen. That way, tests do not need to be adapted too much. The second part can then change the default, and adapt tests to use NullableArray
columns. Since that's going to touch a lot of code, it's better to have it separate so that more fundamental changes do not get lost in the noise.
Thanks!
matchingloci = find(h -> u == h, allheaders) | ||
headerdiff = filter(x -> !in(x, u), coldiff) | ||
headerdiff = length(headerdiff) > 1 ? | ||
join(string.(headerdiff[1:end-1]), ", ") * " and " * string(headerdiff[end]) : |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rather align everything on length
, without further indentation. Anyway if you keep the indentation, it should be four spaces. Same below.
Anyway, no need for this length check nor to handle the last element manually: as I said, just use join
's last
argument.
# multiple missing 1 column | ||
err = @test_throws ArgumentError vcat(dt1, dt2, dt2, dt2, dt2, dt2) | ||
@test err.value.msg == "column(s) B are missing from argument(s) 2, 3, 4, 5 and 6" | ||
# argument missing >1columns |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing space.
string(matchingloci[end]) | ||
estrings[i] = "column(s) $headerdiff are missing from argument(s) $matchingloci" | ||
end | ||
throw(ArgumentError(join(estrings, ", and "))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also use last
here for "and".
test/io.jl
Outdated
@@ -49,6 +49,6 @@ module TestIO | |||
F = NullableArray(fill(Nullable(), 3)), | |||
G = fill(Nullable(), 3)) | |||
|
|||
answer = Sys.WORD_SIZE == 64 ? 0xd4b5a035796ad770 : 0x1950ccd7 | |||
answer = Sys.WORD_SIZE == 64 ? 0x937e94e70d642cce : 0x2b8864d8 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually while this trick saves some space, I'd prefer copying the full output (using triple quotes). That way the logs show the actual and expected output on failure, which makes it clearer what doesn't work.
@testset "null conversions" begin | ||
dt = DataTable(A = 1:3, B = 2:4, C = 3:5) | ||
nullfree = Any[Array{Int,1},Array{Int,1},Array{Int,1}] | ||
nullified = convert(Vector{Any}, fill(NullableArray{Int,1}, 3)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this conversion? Why use a different approach from the line above? Also I don't think you need the Any
above, and Vector{Int}
is shorter.
@@ -271,13 +272,6 @@ module TestData | |||
v1 = randn(10) | |||
) | |||
|
|||
dt2 = DataTable( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unrelated, please keep this for now.
@@ -35,8 +35,6 @@ module TestConversions | |||
@test isa(ai, Matrix{Int}) | |||
@test ai == convert(Matrix{Int}, dt) | |||
|
|||
dt[1,1] = Nullable() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this tested somewhere else? If not, should be kept (after adapting of course).
@testset "associative" begin | ||
dt = DataTable(Dict(:A => 1:3, :B => 4:6)) | ||
@test dt == DataTable(A = 1:3, B = 4:6) | ||
@test all(e -> e <: Int, eltypes(dt)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just eltypes(dt) == [Int, Int]
.
|
||
@test isequal(dt, DataTable([Int, Float64], 2)) | ||
@test isequal(dt, DataTable([Nullable{Int}, Nullable{Float64}], 2)) | ||
@test all(isnull, (dt[:x1], dt[:x2])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just @test isnull(dt[:x1]); @test isnull(dt[:x2])
(on a new line). That also makes the tests more precise about what's wrong if they fail.
@@ -18,8 +18,6 @@ module TestConstructors | |||
|
|||
@test isequal(dt, DataTable(Any[NullableCategoricalVector(zeros(3)), | |||
NullableCategoricalVector(ones(3))])) | |||
@test isequal(dt, DataTable(x1 = [0.0, 0.0, 0.0], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test still applies: it checks that comparisons work across types. You can convert it to NullableArray
like below.
That sounds like a great plan, @nalimilan, I can definitely do that! I think the logical units present in this PR that are necessary to stop autopromotion or otherwise assist with the transition away from autopromotion are:
I agree with all of your specific code comments/reviews as well, so I'll incorporate those where appropriate. Thanks for the feedback, and please don't feel the need to apologize for any delays! I'm sure you're busy enough before you start volunteering your time to maintain and develop this package, and DataTables isn't even the only package you maintain and develop. @nalimilan and @ararslan both deserve 🥇🥇s, as do the maintainers of many other packages and the language itself. I'll leave this PR open as a reference, and now that we're going to refactor this, I'll update #31 to not be dependant on this PR and try and get those changes merged soon too. |
I was hitting issues during the rebase and thought it would be smarter to just branch from master, dump the edits in, and push to overwrite the old branch. Looks like that was a bad idea and the prior PR was automatically closed.
To summarize some of the features discussed earlier and still present in this PR:
nullify!
,nullify
,denullify!
, anddenullify
have been added for converting DataTables back and forth from nullable and nullfree states.#24