-
Notifications
You must be signed in to change notification settings - Fork 36
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
index_labels
method and IndexLabel
type indexing
#328
Conversation
Codecov Report
@@ Coverage Diff @@
## master #328 +/- ##
==========================================
- Coverage 90.27% 90.00% -0.27%
==========================================
Files 9 9
Lines 1337 1361 +24
==========================================
+ Hits 1207 1225 +18
- Misses 130 136 +6
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
axes_keys
method and Key
type indexingaxislabels
method and Label
type indexing
Discourse conversation pointed out there may be some confusion with plotting libraries that often use axis tick labels. Switch to index should be more specific what is being labeled.
After a month of dedicated bike-shedding on discourse, I think this is ready for review. It would be good to have some relevant package maintainers okay the name of the function @rafaqz (DimensionalData.jl), @mcabbott (AxisKeys.jl), @davidavdav (NamedArrays.jl), and someone from AxisArrays (maybe @johnnychen94). I'd also like an okay on the syntax from @ChrisRackauckas because we're using the term "label" here and I want to ensure that this isn't problematic for LabelledArrays.jl. |
I don't care about the name |
src/axes.jl
Outdated
index_labels(x, dim) | ||
|
||
Returns a tuple of labels assigned to each axis or a collection of labels corresponding to | ||
axis `dim` of `x`. Default is to simply return `map(keys, axes(x))`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it really a good idea to have the default return keys
? I'd say better return nothing
so that the caller knows that no labels are set (it's easy to call keys
if needed). Otherwise there's no way to know whether labels are present or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I should probably have changed that back to just the axis after all the discussion around naming this. My thinking was that it would be odd if we returned something that didn't "label" the indexes at all because we are assuming that is what's returned.
Returning nothing
would require that every method that acts on index labels would require and extra step to check if it can actually look up values in it. Not an insurmountable issue, but it would certainly be tedious.
Throwing an error when this isn't defined also isn't an option because things like a vector transpose will inevitably have a dimension without labels.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we don't return nothing
when no labels are set, then we should add a separate function called e.g. has_index_labels
, a bit like we did in the Metadata PR with hasmetadata
and metadata
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's definitely doable and is probably a good method to have irregardless of what the default we choose.
I'm not against returning something that clearly shows no labels are defined. Perhaps I'm being too strict with the rule that the return has the same indices as their corresponding axis, even for default values. I assume people will at least expect to be able to do things like findfirst(==(label), index_labels(A, dim))
, in which case we would probably still want a more informative error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would also prefer nothing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it really worth defining a special type just for that? What would be the advantage over Nothing
?
If in doubt, we could say that an error is thrown if has_index_labels
returns false
, as it allows doing anything we want later without breaking code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it really worth defining a special type just for that? What would be the advantage over Nothing?
I'm just trying to figure out some generic representation of labels not being defined that can pass between methods. If we do vcat(a, b)
but one doesn't have index labels, then we can't do vcat(index_labels(a, 1), index_labels(b, 1))
. Even if we define something that doesn't error in this situation (e.g., vcat_index_labels(::AbstractVector, ::Nothing)
), how does this produce a new vector of index labels for the newly generated array? I figured UndefLabel
could represent every index without a label.
@rafaqz how are you handling this situation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rafaqz, it looks like NoLookup
is basically the same thing as what I'm describing. It wraps the non-labeled indices.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, do we want to have has_index_labels(T::Type)
or has_index_labels(x) = any(are_labels, index_labels)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I missed this earlier. If your UndefLabels
is essentially my NoLookup
, that should be fine.
Returns a tuple of labels assigned to each axis or a collection of labels corresponding to | ||
axis `dim` of `x`. Default is to simply return `map(keys, axes(x))`. | ||
""" | ||
index_labels(x, dim) = index_labels(x, to_dims(x, dim)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't the function live in ArrayInterfaceCore so that existing "named array" packages can overload it? BTW, it would be good to ping their authors to ensure they would all be OK with the API, otherwise it won't make a lot of sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Existing packages can overload it from ArrayInterface
. If you're referring to supporting named dimensions like NamedDims.jl, then they define ArrayInterface.dimnames
and to_dims
maps to the appropriate dimension so they don't have to overload every method with a dim argument.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ArrayInterface is relatively heavy, which is why ArrayInterfaceCore was created IIUC. I guess it's up to package authors to say whether a dependency on ArrayInterface is acceptable for them or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was more of an issue before we went through all the trouble to fix invalidations due to StaticInt
earlier this year. That doesn't mean we can't improve the situation. We are actively trying to move matured functionality into base where appropriate (see #340). I regularly review the code here in an effort to eliminate problematic code that still exists (e.g., redundancies, generated functions, etc,). For example, once we know how this PR is going to look I can finally finish an effort to consolidate a lot of "indexing.jl" by overloading base methods instead of reimplementing a lot of what's in base already.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, it's a bit of a chicken and egg issue. We use StaticSymbol
for dimension names known at compile time so that we can use them as a point of reference in an inferrible way. It's pretty difficult to do this only relying on constant propagation (demonstrated that with static sizes here JuliaLang/julia#44538 (comment)).
If someone has a reliable solution I'm open to it. I've been trying to actively address this and related issues for years
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@oxinabox is it still a concern depending on ArrayInterface at this point?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that ArrayInterface doesn't like Requires.jl a billion packages, I am much more comfortable depending upon it for NamedDims.jl
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My suggestion was just to put empty function definitions in ArrayInterfaceCore (like we do with DataAPI and StatsAPI). That doesn't prevent keeping fallback method definitions in ArrayInterface as this PR does. But packages that don't want to use fallback definitions are still able to overload the functions by depending only on ArrayInterfaceCore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We've been trying to avoid situations where the behavior of a method is different if ArrayInterface is loaded vs just ArrayInterfaceCore.
axislabels
method and Label
type indexingindex_labels
method and IndexLabel
type indexing
@mcabbott, would you mind providing some input if you have time? |
* removed support for nested traversal of `index_labels`. Support for this can be added later but it required making a lot of decisions about managing labels that up front that would be better addressed through iterative PRs * removed `has_index_labels`. There are some odd corner cases for this one. Particularly for `SubArrays` where the presence of labels in the parent don't always have a clear way of propagating forward. Again, we can address this one but it will take some decisions about how labels are propagated. * `UnlabelledIndices` and `LabelledIndices` are types that provide a more clear structure to what a label is and how they are accessed.
* removed support for nested traversal of `index_labels`. Support for this can be added later but it required making a lot of decisions about managing labels that up front that would be better addressed through iterative PRs * removed `has_index_labels`. There are some odd corner cases for this one. Particularly for `SubArrays` where the presence of labels in the parent don't always have a clear way of propagating forward. Again, we can address this one but it will take some decisions about how labels are propagated. * `UnlabelledIndices` and `LabelledIndices` are types that provide a more clear structure to what a label is and how they are accessed. For now access only is supported through `getlabels` and `setlabels`
* removed support for nested traversal of `index_labels`. Support for this can be added later but it required making a lot of decisions about managing labels that up front that would be better addressed through iterative PRs * removed `has_index_labels`. There are some odd corner cases for this one. Particularly for `SubArrays` where the presence of labels in the parent don't always have a clear way of propagating forward. Again, we can address this one but it will take some decisions about how labels are propagated. * `UnlabelledIndices` and `LabelledIndices` are types that provide a more clear structure to what a label is and how they are accessed. For now access only is supported through `getlabels` and `setlabels`
Anything static should go to https://github.com/JuliaArrays/StaticArrayInterface.jl |
Successor to #260. Implements basic tools for accessing multidimensional keys and fancy indexing.
The goal is to have a common backend for some of the clearly overlapping traits/method that go into
AxisArray
,DimensionalData
, andAxisKeys
. This should also help us get closer to helping other packages provide support for multiple fancy array types without explicitly depending on them.@johnnychen94 , @rafaqz , @mcabbott may be interested.