Add hooks for before/after instantiating objects #170
Comments
Thank you very much for the detailed proposal! I will first comment about a potential I am not sure what you mean by
def get_data_content(data):
    data.setup()
    return data.content
...
parser.link_arguments('data', 'model.content', apply_on='instantiate', compute_fn=get_data_content)
With this, debugging is easy because a break point can be set inside However, for this specific case I would recommend something a bit different. First, it might be convenient that calling
class Data:
    def __init__(self, fpath=None):
        ...
        self._already_setup = False

    def setup(self):
        if self._already_setup:
            return
        self._content = ...
        ...
        self._already_setup = True

    @property
    def content(self):
        self.setup()
        return self._content
Then the link is created as
Regarding a potential |
I call it an abuse because it's linking "data" to "model.content", even though the goal is to link "data.content" to "model.content". The function isn't just returning a modified value, it's determining the attribute that needs to be linked as well. I think this is conceptually inelegant, but that's just an opinion, I think the measurable objective criteria are that using link arguments like this:
Because I perceive a purpose of jsonargparse to be reducing boilerplate (otherwise why not just use stdlib argparse?) and because it's non-obvious that link_arguments is the way to accomplish this sort of behavior, I feel it's accurate to call the workaround an abuse.
Debugging might be easier, but if you have to do this for more than one property it becomes cumbersome. In the MWE I only included one property, but in my system I'm doing this for multiple properties. The workaround solution I came up with is:
def data_value_getter(key):
    # Hack to call setup on the datamodule before linking args
    def get_value(data):
        if not data.did_setup:
            data.setup('fit')
        return getattr(data, key)
    return get_value
which naively makes a help message like:
And yes, I could modify the name of the returned function to make it clearer, and I probably will in the short term, but I want to stress that this is more boilerplate on the user side, which makes the definition of the CLI less clean. And yes, I have augmented my data setup to include logic to be lazy on secondary calls, but again, that adds boilerplate on the user side.
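The renaming workaround mentioned above could look roughly like the following. This is a minimal sketch, not part of jsonargparse: the `Data` stand-in, its `setup_calls` counter and the `get_<key>` naming scheme are assumptions made for illustration.

```python
class Data:
    """Stand-in for a datamodule; all names here are illustrative only."""

    def __init__(self):
        self.did_setup = False
        self.setup_calls = 0
        self.content = None
        self.num_classes = None

    def setup(self, stage):
        # Lazy on secondary calls: do the expensive work only once.
        if self.did_setup:
            return
        self.setup_calls += 1
        self.content = "file contents"
        self.num_classes = 3
        self.did_setup = True


def data_value_getter(key):
    """Build a function that sets up `data` and then returns `data.<key>`."""

    def get_value(data):
        data.setup("fit")
        return getattr(data, key)

    # Give each closure a distinct name so it is not displayed as the
    # generic "get_value" wherever the function name is shown.
    get_value.__name__ = f"get_{key}"
    get_value.__qualname__ = get_value.__name__
    return get_value


data = Data()
get_content = data_value_getter("content")
get_num_classes = data_value_getter("num_classes")
print(get_content.__name__, get_content(data))
print(get_num_classes.__name__, get_num_classes(data))
```

Each getter could then be passed as a `compute_fn`, one per linked property. The extra lines are exactly the user-side boilerplate being objected to here.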
The note on the propriety that computes Consider the use case (which happens to be similar to the situation I find myself in) where a new engineer (not-me) on a team wants to move an existing system to jsonargparse. Demonstrating the benefits of jsonargparse to other team members (me) is much easier when it doesn't need to impact the way other code is written in order to slot itself in.
Yes, it's more for completion with the original idea of a before / after hook. Really either one of them would work for my use case. I'm less concerned on the exact details of the hook, and more about adding some conceptually clear method for handling the details of object instantiation, and one of my goals with this discussion is to determine what the best way to do that is, because I don't think my specific proposal is the best way to do it. Something like
I feel like a full-blown hook system is probably "the right way ™" to accomplish my use case and potentially many others, but without motivation for those other use cases I realize that's a hard sell. Then again, that's a lot of what argparse is doing under the hood, so maybe it isn't that hard of a sell. It is not the quickest path to addressing my particular problem, but it might be the most sustainable path for jsonargparse in general. Now that I'm thinking about it, adding a hook system is about as much work as adding any of the more specific alternatives. Assuming my arguments as to why jsonargparse would benefit from such a feature have been convincing, perhaps the best way forward would be to implement a hook system with exactly one event, after_instantiate, and then add more events if use cases arise?
Sorry, but I am not convinced. In the use case that you described, the best solution is to add a property to the jsonargparse has plenty of features that makes it worth using instead of argparse or even other parsing libraries. It is not just about reduction of boilerplate. Certainly jsonargparse can be used without impacting how the other code is written. But if it makes sense to change the other code, then it should be recommended to do so. In the Also as I mentioned before, I do know of a use case that would require a hook before instantiation, and currently simply there is no way to do it. But that would not help in the use case you mentioned. Unless in that hook the class is instantiated, which would be useless since jsonargparse would instantiate again afterwards. Registering a custom instantiator, like the |
Fair points, but the thing I'm getting hung up on is that the destination seems superfluous: I just want to post-process an argument without worrying about what else it might be linked to. Yet the maintenance cost is real, and I respect that. Framing it like this suggests an alternative. What if jsonargparse is modified to allow the source and destination of
parser.link_arguments('data', 'data', compute_fn=...)
I half expected this to just work when I tried it, but it failed due to cycles in the graph. However, modifying this check to ignore self-loops, and then respecting those self-loops in further processing, would be a much smaller change than adding a second way to do something. Does this sound any better?
There is a missing feature in
parser.link_arguments(None, 'some.nested.key.when', compute_fn=two_days_from_now)
And in the help the link would be shown as:
It does sound better. If source can be
parser.link_arguments('data', None, compute_fn=call_setup)
And in the help the link would be shown as:
It is a bit strange that the method is called |
I was thinking about having target as However, when target is
The way I would think about it is that you are linking something to the target argument. It could be from a secondary source argument, or via a argumentless I'll see if I can code up this feature and submit a PR. |
Would be great if you contribute. But maybe don't implement just yet. Now I have some doubts. Your plan is to change the I fear that even having this in |
There is another reason not to rush into implementation. As I mentioned, I have a use case that requires manipulating arguments before instantiation. It is better to first have clarity on how that would be implemented, to avoid redundancy of features.
The topological sort would be run on the graph where self-loops are removed. The algorithm is just a composition of primitive operations: remove self-loops, then run a standard topological sort. That produces a node order, and then the original graph, where the self-loops still exist, is iterated over in that order. (I have a good deal of experience working with graph algorithms, networkx specifically. I'm confident in this idea, so let me know if there is anything else that needs further explanation.)
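The composition described above can be sketched with only the standard library. This is an illustration under assumptions, not jsonargparse's internals: the edge list is hypothetical, and `graphlib` stands in for networkx to keep the example dependency-free.

```python
from graphlib import TopologicalSorter

# Hypothetical link graph: each edge is (source, target).
# ("data", "data") is a self-loop, i.e. a post-processing step on "data".
edges = [
    ("data", "model"),
    ("model", "trainer"),
    ("data", "data"),
]

# Steps 1 and 2: build the predecessor mapping without self-loops, then sort.
# TopologicalSorter expects a mapping of node -> set of predecessors.
predecessors = {}
for src, dst in edges:
    predecessors.setdefault(src, set())
    predecessors.setdefault(dst, set())
    if src != dst:
        predecessors[dst].add(src)
order = list(TopologicalSorter(predecessors).static_order())

# Step 3: walk the ORIGINAL edges in node order, handling a node's
# self-loop before any of its outgoing edges.
position = {node: i for i, node in enumerate(order)}
schedule = sorted(edges, key=lambda e: (position[e[0]], e[0] != e[1]))

for src, dst in schedule:
    print(f"{src} -> {dst}")
```

Because a self-loop sorts just before its node's outgoing edges, "data" is post-processed after everything that feeds it and before anything that reads from it.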
The proposed feature is a natural extension of the current way that link_arguments works, and it results in a conceptually easy way to accomplish the task I'm trying to do. If I were teaching this to someone I would not want to tell them: well, you have to make a nested closure that returns a function to do what you want, and you have to connect it to something, even if you don't directly mean to use that. I care a lot about making my CLI definition clear, and the combination of requiring that I connect Your use case would also be supported in the PR I have in mind. Consider a link-arguments graph where a special node is marked as None. That will represent the None source, and if it is linked to a target node, then it will populate that node with None. However, if a compute_fn is specified, then you can return whatever you want (e.g. The following rules capture all use cases discussed here:
Good to hear that you have experience with graphs. Just note, a directed graph can have many topological orders. Right now any topological order will do. But with the possible The use case that I was saying is not the one from the
If you were teaching this to someone, then teach them that the class should implement a property like I was saying before. The ease of use of a class such as |
Yes, because the topological sort guarantees you see all ancestors of a node before you see a node. I can't think of any reason why non-uniqueness would matter. There's no way self-loops would mess up instantiation order.
Ah, so you want to have access to all parameters before instantiating the object, and then effectively construct and return the object yourself? That does go beyond the current feature I had in mind. But such a feature would also work for what I want to do. I don't think it fits neatly into link arguments, which is effectively adding special edges in a graph where the nodes have already been determined. From a graph level, this feature would need to add in a new node to handle the arguments before passing them to the intended target.
I'm feeling like you're focusing on the minimal working example a bit too much. I have to maintain a large system that requires backwards compatibility, and sometimes changing other code for the better is not an option. I have to work with code from a lot of different people of various Python skill levels. Getting them to change the way they are doing things is not always easy. Instead, it's much easier if I can just add in some post-processing hooks and not have to redesign every class I'm given. Furthermore, I think the idea stands on its own merits as a natural extension of what currently exists. Link arguments defines an attribute-setting graph with an optional function on each edge. It is natural for a graph like that to have a null source / sink and to allow self-loops. If I haven't satisfied you with my use case by now, then I don't think I will be able to. If you want to reject the idea, that's fine. If I'm the only person who would ever use this, then I understand it's not worth the maintenance cost.
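To make the graph view concrete, here is a toy model of links as (source, target, compute_fn) edges over a dictionary of values. It is a sketch under assumptions: `apply_links`, `default_seed` and the sample values are hypothetical, the links are assumed to be listed in a valid order already, and only the null source and self-loop cases are covered (not the null sink).

```python
def call_setup(data):
    """Self-loop function: return a post-processed copy of the value."""
    return {**data, "content": "abc", "ready": True}


def default_seed():
    """Argument-less compute function for a link with a null source."""
    return 1234


def apply_links(values, links):
    """Apply each edge in order and return a new mapping of values."""
    result = dict(values)
    for source, target, compute_fn in links:
        if source is None:
            # Null source: the target gets None unless compute_fn supplies a value.
            result[target] = compute_fn() if compute_fn else None
        else:
            value = result[source]
            result[target] = compute_fn(value) if compute_fn else value
    return result


values = {"data": {"ready": False}, "model.content": None, "seed": None}
links = [
    (None, "seed", default_seed),                             # null source
    ("data", "data", call_setup),                             # self-loop
    ("data", "model.content", lambda data: data["content"]),  # ordinary link
]
linked = apply_links(values, links)
print(linked)
```

In this model a self-loop needs no special case: it is an edge whose source and target happen to be the same key.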
I am not rejecting the idea. I just don't want to rush into any decision. In fact, I am very grateful for this discussion. When I have more time I will describe the other use case here. It would be great to hear your thoughts on it.
The use case that I was mentioning is the following. In One known use case for this would be a link from |
Another use case can be Lightning-AI/pytorch-lightning#15427. The |
…opers to implement custom instantiation (#170).
I created pull request #326 implementing an
The implemented @Erotemic would this cover your use case? |
Yes, I think it would. I know which specific class I'm interested in, so I'd be able to provide that a priori.
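As a rough illustration of a per-class custom instantiator that is known ahead of time, consider the sketch below. The `instantiators` registry and the `instantiate` helper are hypothetical stand-ins rather than jsonargparse's API, and the assumption that the callable receives the class followed by its init arguments is mine.

```python
class Data:
    """Stand-in for the class whose instantiation needs an extra step."""

    def __init__(self, fpath=None):
        self.fpath = fpath
        self.content = None

    def setup(self):
        self.content = f"content of {self.fpath}"


def instantiate_and_setup(class_type, *args, **kwargs):
    """Custom instantiator: build the object, then run its setup step."""
    obj = class_type(*args, **kwargs)
    obj.setup()
    return obj


# Hypothetical registry mapping a class to its custom instantiator.
instantiators = {Data: instantiate_and_setup}


def instantiate(class_type, **init_args):
    """Use the registered instantiator if there is one, else call the class."""
    custom = instantiators.get(class_type)
    if custom is None:
        return class_type(**init_args)
    return custom(class_type, **init_args)


data = instantiate(Data, fpath="train.txt")
print(data.content)
```

The class itself is left unchanged, which matters when the code being wrapped cannot be redesigned.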
🚀 Feature request
Add methods to jsonargparse.ArgumentParser that let the user specify custom code that can be executed before / after an object is instantiated.
Motivation
I'm porting the ML system I work on to LightningCLI with jsonargparse.
Some of the design goals of this system are:
So far jsonargparse is helping a lot and I'm very impressed with the library. It's by far the best argparse alternative I've come across. However, I'm having an issue slotting it into my system, because I require that my data module introspects my underlying dataset and passes relevant parameters to the model. Unfortunately these are not available at initialization time. I have a MWE illustrating the issue:
In the above example, the model cannot be instantiated because the content of the file hasn't been read by the Data class yet.
Pitch
Perhaps adding methods like:
parser.add_before_instantiate(key, fn): "key specifies the argument, and fn is passed the key, cls, and init_args"
parser.add_after_instantiate(key, fn): "key specifies the argument, and fn is passed the instantiated object"
But idk if these are the best options. This could also be done by adding before_instantiate and after_instantiate arguments to add_class_arguments. Or perhaps the before / after is shirked entirely and instantiate could be given as a function where the user is responsible for actually creating the object. I'm not super concerned about the details, I just want a way to control the creation of objects with more fidelity than is currently possible.
I would like to be able to do something like:
Where I can control some pre/post processing step after instantiation happens. That way, in this instance, I can ensure that the correct value is populated in data before link_arguments connects data.content to model.content.
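A toy model of the pitched before / after hooks is sketched below. Everything in it is hypothetical: `HookRegistry` is not a jsonargparse class, and having the "before" hook return the (possibly modified) init_args is an assumption the pitch does not specify.

```python
class HookRegistry:
    """Toy stand-in for a parser that supports instantiation hooks."""

    def __init__(self):
        self._before = {}
        self._after = {}

    def add_before_instantiate(self, key, fn):
        # fn is passed the key, cls and init_args; assumed to return init_args.
        self._before.setdefault(key, []).append(fn)

    def add_after_instantiate(self, key, fn):
        # fn is passed the instantiated object.
        self._after.setdefault(key, []).append(fn)

    def instantiate(self, key, cls, init_args):
        for fn in self._before.get(key, []):
            init_args = fn(key, cls, init_args)
        obj = cls(**init_args)
        for fn in self._after.get(key, []):
            fn(obj)
        return obj


class Data:
    def __init__(self, fpath=None):
        self.fpath = fpath
        self.content = None

    def setup(self):
        self.content = f"content of {self.fpath}"


class Model:
    def __init__(self, content=None):
        self.content = content


hooks = HookRegistry()
hooks.add_after_instantiate("data", lambda data: data.setup())

data = hooks.instantiate("data", Data, {"fpath": "train.txt"})
# By the time the model is built, data.content has been populated.
model = hooks.instantiate("model", Model, {"content": data.content})
print(model.content)
```

The hook keeps the setup() call out of Data.__init__ while still guaranteeing it runs before any value is read from the object.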
Alternatives
The natural question is then: why don't I just call setup() in Data.__init__? The reason is that jsonargparse may not be the only object that needs to use Data. And in general I think it's a good design practice to keep as much work out of __init__ methods as possible (it makes debugging objects easier if you can always make an instance).
But what about a wrapper class that only jsonargparse gets to see? Well, now we are adding more boilerplate that jsonargparse is designed to avoid.
There is an alternative that I'm currently using, although I really don't like it. I'm effectively hacking link_arguments to do something similar enough to what I want. Namely:
This abuses link arguments to get an instance of "data" before an instance of "model" is created and then applies the custom logic.
I think this is a clear anti-pattern, but unfortunately I don't see a viable alternative to it.
Discussion
I'm open to being the person who implements this PR, but I'd like to get feedback on the issue first. I'm pretty sure I'm not missing anything existing in jsonargparse that could solve my issue, but if I am, I'd love to be pointed to it and just close the issue. However, if this feature is not supported and is deemed useful enough to add, I think there should be discussion about how it is implemented.
Adding two methods add_after_instantiate and add_before_instantiate seems like it adds too much clutter to the top-level API. I'm wondering if this should be implemented as a more general hook system:
e.g. add_hook(hook_name, hook_fn)
where the first hooks added are for the instantiate cases I've mentioned.
Although I'd accept adding hook args to add_class_arguments as a solution to my issue, it isn't ideal in my case because I'd have to modify LightningCLI to gain access to that feature. Not the end of the world, but I think a method attached to the parser itself that lets you add these hooks would be more valuable.
The other option I'd be completely happy with is an add_instantiator that allows the user to override exactly how instantiation happens for an object.
Summary
I'd like to customize instantiated objects before and/or after they are created. Is this in the scope of jsonargparse? What are the thoughts of other users / maintainers?