Commit
Efficiently finds the unique columns, rows, etc. of an array. The algorithm first hashes each row, then finds the unique hashes, and finally checks that the hashes don't collide. It is roughly O(n) in the number of elements in the matrix. This is my first time using Cartesian. Without it, this code is presently about 10% faster for finding unique rows of a matrix, but the overhead is probably worth it for the generality.
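The three steps described above (hash each row, find the unique hashes, then verify against collisions) can be sketched roughly as follows. This is a minimal illustration for the two-dimensional, unique-rows case only, not the code in this commit; the name `unique_rows_sketch` is hypothetical, and it assumes a recent Julia version where `axes` and `view` are available.

```julia
# Hypothetical helper for illustration; not the committed implementation.
function unique_rows_sketch(A::AbstractMatrix)
    # Map from row hash to the indices of kept rows that share that hash.
    seen = Dict{UInt, Vector{Int}}()
    keep = Int[]
    for i in axes(A, 1)
        row = view(A, i, :)
        # Step 1: hash the row. Step 2: look up rows with the same hash.
        candidates = get!(seen, hash(row), Int[])
        # Step 3: compare actual contents so a hash collision
        # cannot merge two different rows.
        if !any(j -> isequal(view(A, j, :), row), candidates)
            push!(candidates, i)
            push!(keep, i)
        end
    end
    return A[keep, :]
end
```

Each row is hashed once and, barring collisions, compared only against rows with equal contents, which is where the roughly linear cost comes from.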
3 comments
on commit db3b28d
I was kind of surprised to discover this method and wonder how it can possibly work. What happens if the rows/columns don't have the same number of unique elements? This method needs documentation at least to explain what it does.
If I understand correctly, this function compares each whole row or column against the other rows or columns.
For example:
    julia> a = [ones(5) ones(5) zeros(5)]
    5x3 Array{Float64,2}:
     1.0  1.0  0.0
     1.0  1.0  0.0
     1.0  1.0  0.0
     1.0  1.0  0.0
     1.0  1.0  0.0

    julia> unique(a, 1)
    1x3 Array{Float64,2}:
     1.0  1.0  0.0

    julia> unique(a, 2)
    5x2 Array{Float64,2}:
     1.0  0.0
     1.0  0.0
     1.0  0.0
     1.0  0.0
     1.0  0.0
Where does the number of unique elements in the row / column become a problem?
Oh, that was totally unclear to me – that makes sense. I thought it was doing the unique operation in each column/row.
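To make the two readings concrete, here is a small sketch contrasting whole-slice uniqueness with uniqueness applied inside each column. It assumes a recent Julia version, where the dimension is passed as the `dims` keyword (rather than the positional argument used in this thread) and `eachcol` is available.

```julia
a = [ones(5) ones(5) zeros(5)]

# Whole-slice uniqueness: duplicate rows (dims=1) or columns (dims=2) are dropped.
unique_rows = unique(a; dims=1)
unique_cols = unique(a; dims=2)

# The other reading: unique elements within each column separately.
# The results can have different lengths, so they cannot form a matrix in general.
per_column = [unique(col) for col in eachcol(a)]
```

The per-column version returns a vector of vectors precisely because, as the first comment points out, the slices need not have the same number of unique elements.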
It's possible this can be sped up by doing more of your indexing in the pre-expression. For two dimensions, this inner loop generates code like this:
Despite appearances, this might be fast because branch prediction should be 100% effective (`dim` isn't changing). But if you want that 10% back (relative to non-Cartesian), you may want to evaluate something like this:
which moves one of your `if` statements out of the inner loop.
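As a generic illustration of that kind of hoisting (not the generated code discussed above, which is not reproduced here), the sketch below accumulates a hash per row or per column of a matrix in two ways. Both function names are hypothetical, and plain nested loops stand in for the Cartesian macros.

```julia
# Branch inside the inner loop: `dim` is tested for every element.
function hash_slices_branchy(A::AbstractMatrix, dim::Int)
    h = zeros(UInt, size(A, dim))
    for j in axes(A, 2), i in axes(A, 1)
        if dim == 1
            h[i] = hash(A[i, j], h[i])
        else
            h[j] = hash(A[i, j], h[j])
        end
    end
    return h
end

# Branch hoisted: `dim` is tested once per outer iteration, and for
# dim == 2 the running hash is kept in a local instead of re-indexing `h`.
function hash_slices_hoisted(A::AbstractMatrix, dim::Int)
    h = zeros(UInt, size(A, dim))
    for j in axes(A, 2)
        if dim == 1
            for i in axes(A, 1)
                h[i] = hash(A[i, j], h[i])
            end
        else
            hj = h[j]
            for i in axes(A, 1)
                hj = hash(A[i, j], hj)
            end
            h[j] = hj
        end
    end
    return h
end
```

Both versions visit the elements in the same order, so they return identical hashes; whether the hoisted form is measurably faster would need benchmarking on the actual code.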