adds table transforms #45

Merged
merged 18 commits into from
Aug 10, 2021
Changes from 10 commits
2 changes: 2 additions & 0 deletions Project.toml
@@ -4,8 +4,10 @@ authors = ["lorenzoh <[email protected]>"]
version = "0.2.3"

[deps]
CategoricalArrays = "324d7699-5711-5eae-9e2f-1d82baa6b597"
ColorBlendModes = "60508b50-96e1-4007-9d6c-f475c410f16b"
CoordinateTransformations = "150eb455-5306-5404-9cee-2592286d6298"
DataStructures = "864edb3b-99cc-5e75-8d2d-829cb0a9cfe8"
Distributions = "31c24e10-a181-5473-b8eb-7969acd0382f"
ImageDraw = "4381153b-2b60-58ae-a1ba-fd683676385f"
ImageTransformations = "02fcd773-0e25-5acc-982a-7f6622650795"
8 changes: 7 additions & 1 deletion src/DataAugmentation.jl
@@ -2,6 +2,7 @@ module DataAugmentation

using ColorBlendModes
using CoordinateTransformations
using CategoricalArrays
using Distributions: Sampleable, Uniform, Categorical
using ImageDraw
using Images
@@ -18,6 +19,7 @@ using Rotations
using Setfield
using StaticArrays
using Statistics
using DataStructures
using Test: @test, @test_nowarn


@@ -28,6 +30,7 @@ include("./sequence.jl")
include("./items/arrayitem.jl")
include("./projective/base.jl")
include("./items/image.jl")
include("./items/table.jl")
include("./items/keypoints.jl")
include("./items/mask.jl")
include("./projective/compose.jl")
@@ -36,6 +39,7 @@ include("./projective/affine.jl")
include("./projective/warp.jl")
include("./oneof.jl")
include("./preprocessing.jl")
include("./rowtransforms.jl")
include("./colortransforms.jl")
include("testing.jl")
include("./visualization.jl")
@@ -49,6 +53,7 @@ export Item,
Sequence,
Project,
Image,
TabularItem,
Keypoints,
Polygon,
ToEltype,
@@ -88,7 +93,8 @@ export Item,
onehot,
showitems,
showgrid,
-    Bounds
+    Bounds,
+    getcategorypools


end # module
4 changes: 4 additions & 0 deletions src/items/table.jl
@@ -0,0 +1,4 @@
struct TabularItem{T} <: Item
    data::T
    columns
end
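
A minimal sketch of how a single row might be wrapped in a `TabularItem`, assuming the row is stored as a `NamedTuple` and `columns` holds the column names as symbols. The abstract `Item` type is a stand-in for the package's own so the snippet runs by itself, and the example row is made up.

```julia
# Stand-in for the package's abstract item type (assumption, for a standalone snippet).
abstract type Item end

struct TabularItem{T} <: Item
    data::T
    columns
end

# Hypothetical row with one continuous and one categorical column.
row = (age = 42.0, city = "Pune")
item = TabularItem(row, keys(row))

println(item.columns)   # column names as a tuple of symbols
println(item.data.age)  # value stored under :age
```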
56 changes: 56 additions & 0 deletions src/rowtransforms.jl
@@ -0,0 +1,56 @@
struct NormalizeRow{T, S} <: Transform
    dict::T
    cols::S
end

struct FillMissing{T, S} <: Transform
    dict::T
    cols::S
end

struct Categorify{T, S}
    dict::T
    cols::S
    function Categorify{T, S}(dict::T, cols::S) where {T, S}
        for (col, vals) in dict
            if any(ismissing.(vals))
                dict[col] = collect(skipmissing(vals))
                @warn "There is a missing value present for category '$col' which will be removed from Categorify dict"
            end
        end
        new{T, S}(dict, cols)
    end
end

Categorify(dict::T, cols::S) where {T, S} = Categorify{T, S}(dict, cols)
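
As a rough illustration of what the inner constructor does, the sketch below repeats the `Categorify` definition from this diff and builds it from a made-up category pool that contains `missing`. The `pools` dictionary and its values are hypothetical. One thing it makes visible is that the constructor rewrites the dictionary it is given, so the caller's copy is changed as well.

```julia
# Definition repeated from the diff so the snippet is self-contained.
struct Categorify{T, S}
    dict::T
    cols::S
    function Categorify{T, S}(dict::T, cols::S) where {T, S}
        for (col, vals) in dict
            if any(ismissing.(vals))
                # Overwrite the entry with a copy that has no missing values.
                dict[col] = collect(skipmissing(vals))
                @warn "There is a missing value present for category '$col' which will be removed from Categorify dict"
            end
        end
        new{T, S}(dict, cols)
    end
end

Categorify(dict::T, cols::S) where {T, S} = Categorify{T, S}(dict, cols)

# Hypothetical category pools; :city contains a missing entry.
pools = Dict(:city => ["Pune", missing, "Oslo"], :size => ["S", "M"])
tfm = Categorify(pools, (:city, :size))

println(tfm.dict[:city])  # pool for :city after filtering
println(pools[:city])     # the caller's dictionary was filtered in place too
```

If mutating the caller's dictionary is not wanted, passing `copy(pools)` is one way to avoid it.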

function apply(tfm::NormalizeRow, item::TabularItem; randstate=nothing)
    x = NamedTuple(Iterators.map(item.columns, item.data) do col, val
        if col in tfm.cols
            colmean, colstd = tfm.dict[col]
            val = (val - colmean)/colstd
        end
        (col, val)
    end)

Member: Suggested change
end)
end)
return TabularItem(x, item.columns)

Member: And all the other transforms too

    TabularItem(x, item.columns)
end
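
The row-wise normalization above can be tried on a plain `NamedTuple`. This is a sketch only: `stats`, `cols`, and `row` are made-up stand-ins for `tfm.dict`, `tfm.cols`, and `item.data`, and `Iterators.map` assumes Julia 1.6 or later.

```julia
# Hypothetical per-column statistics: column => (mean, std).
stats = Dict(:age => (40.0, 10.0))
cols = (:age,)
row = (age = 55.0, city = "Oslo")

# Same pattern as in the diff: walk names and values together, rescale the
# listed columns, and rebuild a NamedTuple from the (name, value) tuples.
normalized = NamedTuple(Iterators.map(keys(row), values(row)) do col, val
    if col in cols
        colmean, colstd = stats[col]
        val = (val - colmean) / colstd
    end
    (col, val)
end)

println(normalized)  # :age is rescaled, :city passes through unchanged
```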

function apply(tfm::FillMissing, item::TabularItem; randstate=nothing)
    x = NamedTuple(Iterators.map(item.columns, item.data) do col, val
        if col in tfm.cols && ismissing(val)
            val = tfm.dict[col]
        end
        (col, val)
    end)
    TabularItem(x, item.columns)
end
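
A small sketch of the fill rule in isolation, again on a plain `NamedTuple`. The names `fills`, `cols`, and `row` are hypothetical stand-ins for `tfm.dict`, `tfm.cols`, and `item.data`. It shows that only columns listed in `cols` are filled; a `missing` in any other column is left as it is.

```julia
# Hypothetical fill values: column => replacement for missing.
fills = Dict(:age => 40.0)
cols = (:age,)
row = (age = missing, city = missing)

filled = NamedTuple(Iterators.map(keys(row), values(row)) do col, val
    # Replace only when the column is listed AND the value is missing.
    if col in cols && ismissing(val)
        val = fills[col]
    end
    (col, val)
end)

println(filled.age)             # replaced by the fill value
println(ismissing(filled.city)) # :city is not in cols, so it stays missing
```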

function apply(tfm::Categorify, item::TabularItem; randstate=nothing)
    x = NamedTuple(Iterators.map(item.columns, item.data) do col, val
        if col in tfm.cols
            val = ismissing(val) ? 1 : findfirst(val .== tfm.dict[col]) + 1

Member: This can just be findfirst if we use SortedSet.

Contributor Author: Would findfirst work when the input function involves comparing with missing?

Member (@darsnack, Jun 30, 2021): Yes, because tfm.dict[col] always contains missing, and missing is treated as any other element in the set.

Member: Ah I see what you mean. An equality comparison with missing is missing.

Member: I think as a result of this, the whole storing missing in the tfm.dict[col] is not going to work. We'll have to revert to the old filtering way + the conditional shown here.

Member: In that case, there's no need to store missing in the dict values either then right? The conditional is required either way.

Member (@darsnack, Jun 30, 2021): Yeah that's what I meant. Filter the missing out of the dict, and don't add it if it isn't there.

Contributor Author: Alright, I have updated the constructor to use skipmissing and collect for the values containing missing.

        end
        (col, val)
    end)
    TabularItem(x, item.columns)
end
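
The point raised in the review thread above, that an equality comparison with `missing` yields `missing`, can be checked directly. The sketch below also isolates the index encoding from the diff in a helper named `encode`, which is a hypothetical name used only here; `pool` is a made-up category list that is assumed to have had `missing` filtered out already.

```julia
# `==` propagates missing, while `isequal` treats missing as an ordinary value.
eq = missing == missing            # evaluates to `missing`, not `true`
same = isequal(missing, missing)   # evaluates to `true`

# Hypothetical category pool without missing values.
pool = ["Oslo", "Pune"]

# Encoding as in the diff: index 1 is reserved for missing,
# known categories are shifted up by one.
encode(val, pool) = ismissing(val) ? 1 : findfirst(val .== pool) + 1

println(encode(missing, pool))
println(encode("Oslo", pool))
println(encode("Pune", pool))
```

Because `val .== pool` would contain `missing` entries if either side held a `missing`, the explicit `ismissing` branch and the filtered pool are what keep `findfirst` working on a plain Boolean vector. A value that is absent from the pool is not handled here: `findfirst` returns `nothing` and the addition fails.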