Skip to content
This repository has been archived by the owner on Jan 27, 2023. It is now read-only.

Avoid dynamic creation of ArrayMethods classes and new wrapjaggedmethods decorator #48

Merged
merged 3 commits into from
Apr 11, 2019
Merged

Avoid dynamic creation of ArrayMethods classes and new wrapjaggedmethods decorator #48

merged 3 commits into from
Apr 11, 2019

Conversation

guitargeek
Copy link
Contributor

Hi, I suggest here a fix for the issue I explained in scikit-hep/awkward-0.x#120.

The problem is the dynamic creation of classes in the _unwrapped_jagged function, which causes classes which you would expect to have the same type to actually not the same type.

I suggest two solution approaches which both work for my use case.

  1. create the new types ass class members (not instance members) of the classes that use them
  2. create the new types outside of the class, just in the scope of the file

I applied solution 2) to the from_ptetaphim function. Creating the types outside the class actually has one more cool benefit: you can use the types with a new decorator I called for now wrapjaggedmethod, which takes care of the _normalize_arrays, _unwrapped_jagged and re-wrapping into a jagged array for you.

I hope these Ideas could be useful to reduce code duplication as well. What to you think? Sorry again for initially implementing a lousy unit test which didn't catch this before...

Copy link
Member

@jpivarski jpivarski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a very good point—dynamically creating the classes makes them fail the type(self) == type(other) check in concatenation. Your scikit-hep/awkward-0.x#120 PR won't be testable until this is fully deployed, so I added a new version number to work through that.

However, the same issue is going to affect jagged arrays of Lorentz vectors from uproot, which are also based on a dynamically created type and doesn't go through this code. (A) Maybe it should go through this mechanism, but that would be adding a special case to uproot: it's different because uproot doesn't know about TLorentzVectors; it constructs them according to a recipe derived from streamers. That generality is a good thing. (B) Maybe the test in concatenate can be loosened to a general awkward.util.equivalent_class function that only ensures that two classes have the same set of __bases__.

If concatenate is generalized like that, then this fix wouldn't be strictly necessary. However, I see it as an improvement even if it's not motivated by the concatenate issue.

@jpivarski jpivarski merged commit 21e7d59 into scikit-hep:master Apr 11, 2019
@guitargeek guitargeek deleted the concatenate_mixin_bug branch April 11, 2019 18:04
@nsmith-
Copy link
Member

nsmith- commented Apr 11, 2019

If I understand correctly, this wrapjagged is exactly what I was looking for in scikit-hep/coffea#60 (comment)
This is a procedure we use often in that package, it would be good for the facility to be available in awkward itself.

@jpivarski
Copy link
Member

I was just about to give awkward a new semimajor version, which is a thing that would be required if we want to introduce a cross-repo dependency like this.

The whole file uproot_methods/wrapjagged.py could be moved into awkward/util.py. Do you want me to do that?

@guitargeek
Copy link
Contributor Author

To me that sounds like a good idea! I would also suggest to decorate all the other from_xyz methods with wrapjaggedmethod, to close the circle. I would volunteer to do that quickly.

The name wrapjaggedmethod is probably not optimal by the way, because it suggest it can only be with classmethods and not with regular functions.

@jpivarski
Copy link
Member

@guitargeek Could you create a PR of how you'd like it to look in awkward and then in uproot-methods? I understand that the uproot-methods PR won't succeed until the awkward PR is deployed, at least to a prerelease. I'm going to start a prerelease cycle for all three to get the machinery started.

If it's a large project, please let me know and we'll put it off for later. I want to have a pip-installable version for a talk on Wednesday. Thanks!

@nsmith-
Copy link
Member

nsmith- commented Apr 11, 2019

This makes implementing a few non-ufunc numpy ops trivial, so would be very useful I think!
Perhaps the broader idea is a decorator that wraps a function that normally accepts numpy arrays and allows it to be used with awkward (i.e. not just jagged) arrays?

@jpivarski
Copy link
Member

@nsmith- If you mean "any function," then that's too broad. I don't see a way to do it in general.

The important thing right now—at the threshold of awkward version 0.9.0, is to get in the API dependencies between awkward and uproot-methods to get the functionality above to work. We need general names, because we can't change names until the next semimajor version, but we can leave holes for additional functionality. For instance, the wrapper might only work on JaggedArrays right now, but we name it something general and let it raise NotImplementedError on non-JaggedArray cases. As long as the non-JaggedArray cases can be implemented in only one repo—awkward and not uproot-methods, for instance—then we can do that at any time and it wouldn't require a new version. It would be a future awkward feature, not a combined awkward/uproot-methods feature, like this one.

@guitargeek
Copy link
Contributor Author

Alright, is it ok if I quickly come up with two PRs each in awkward and uproot-methods and then we can continue discussing? For now I will rename the decorator to just awkwardfunction I think, but maybe you will find a better name.

@jpivarski
Copy link
Member

Yes, focus on the two PRs—no new capabilities, just organizing this capability better. The name will have to explain what it does in contrast to other awkward.util functions, "awkwardfunction" doesn't tell me anything. It would still be correct to call this "wrap" and "unwrap", right? How about unwrap_awkward and wrap_awkward?

I used the word "normalize" in a CS way while that function was private (with an underscore). If it's public and physicists will see it, there's a chance of confusion (making the norm 1). Maybe it should be called uniformly_awkward? It makes all the given arrays have the same structure.

@guitargeek
Copy link
Contributor Author

guitargeek commented Apr 11, 2019

Haha right awkwarfunction is a terribly generic name!

Not all functions have to be public, the underscore could still exist for everything else than the decorator.
It could be:
_normalize_arrays -> _wrap_awkward
_unwrap_jagged -> _unwrap_awkward
wrapjaggedmethod -> ?? (the decorator)

edit: got the mapping wrong and corrected. Anyway I will not worry about the names now, I cant seem to come up with something good.

@jpivarski
Copy link
Member

If uproot-methods is going to call it, the methods must be public. (Underscore means we're free to change it, but if another library, like uproot-methods, depends on it, then changing it would break that without us knowing. That happened once—it's a disaster I'd like to avoid.)

I'm confused by your mapping. Why does "unwrap" become "wrap"? That's the opposite. What _normalize_arrays did wasn't wrapping or unwrapping—it was getting all the arrays in a set to the same level of jaggedness, even if some were scalars or flat arrays. (E.g. allows us to set jagged arrays on pt, eta, phi but a scalar on mass.)

@jpivarski
Copy link
Member

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants