-
Notifications
You must be signed in to change notification settings - Fork 608
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Run HfApi
methods in the background (run_as_future
)
#1458
Conversation
API-wise, this makes sense to me. We'd need to adapt the internals of the |
Technical details:
EDIT: PR completely refactored since this comment. |
I like the design! We pretty much only use |
The documentation is not available anymore as the PR was closed or merged. |
HfApi
methods in the backgroundHfApi
methods in the background (upload_file_threaded
)
Codecov ReportPatch coverage:
Additional details and impacted files@@ Coverage Diff @@
## main #1458 +/- ##
===========================================
+ Coverage 53.08% 82.13% +29.04%
===========================================
Files 53 54 +1
Lines 5638 5759 +121
===========================================
+ Hits 2993 4730 +1737
+ Misses 2645 1029 -1616
☔ View full report in Codecov by Sentry. |
Thank you for your PR! I can't find the docs for the I wanted to check the docs specifically as I'm wondering if we need to 2x the API + docs of the
WDYT? |
I liked the API of:
more than the current API because the current API has become pretty nested/hard to read: HfAPI inherits from _ThreadedHFApi which inherits from _HfApi. +1 to @LysandreJik that also it's not great if the API docs are fully duplicated. Think I also like the idea of adding a boolean flag instead. BTW why the name Also just an idea, could a context manager make sense here mabye? >>> import time
>>> from huggingface_hub import whoami
>>> whoami() # takes 1s
"Wauplin"
>>> with huggingface_hub.as_future():
>>> future = whoami() # instant
>>> time.sleep(1)
>>> future.result()
"Wauplin" Problem here though is that not all methods can be instantiated as a future and it's also not like the future closes when the context manager closes, so maybe not super intuitive. Overall to me the most natural design would indeed be a flag, like @LysandreJik proposed: >>> import time
>>> from huggingface_hub import whoami
>>> whoami() # takes 1s
"Wauplin"
>>> future = whoami(return_as_future=True) # instant
>>> time.sleep(1)
>>> future.result()
"Wauplin" |
Thanks a lot @LysandreJik @patrickvonplaten for your feedback. To be honest I have been very indecisive on how to design the API. A large part of this indecisiveness is that I don't like having a return value that is not typed correctly. It might seem like nitpicking but in my opinion type annotations really helps UX-wise and especially to help IDEs with auto-completion. I'm not too concerned when dealing with internal stuff but here we are talking about an exposed public API. That been said, I agree with both of you that duplicating all methods with a At first I suggested to do something like I did not investigate the idea of a context manager but I think we would have the same kind of problem. When doing >>> with huggingface_hub.as_future():
>>> future = whoami() # instant
>>> value = future.result() # takes 1s the IDE would not understand that "future" is a Future object (and not a dict). In the end, I agree with having a flag class Api:
@overload
def do_something(self, a: int, b: bool = ..., as_future: Literal[False] = ...) -> str: # type: ignore
...
@overload
def do_something(self, a: int, b: bool = ..., as_future: Literal[True] = ...) -> Future[str]:
...
@future_compatible
def do_something(self, a: int, b: bool = True, as_future: bool = False) -> Union[str, Future[str]]:
return f"{a} {b} {threading.current_thread().name}" Unless you disagree, I think I'll go with this solution. It's unfortunately quite verbose to write but UX-wise it should be nice to use for the user. I investigated a bit and the
(IMO, it's not so much needed for read-only methods like |
TL;DR:
And thanks again for the feedback! |
This looks like a good solution to me! |
HfApi
methods in the background (upload_file_threaded
)HfApi
methods in the background (run_as_future
)
@patrickvonplaten @LysandreJik Finally I implemented I did not want to implement it for all methods as it's quite verbose and it doesn't make sense in many cases. If a power user really want to run other methods in the background, they can use Example: >>> from huggingface_hub import HfApi
>>> api = HfApi()
>>> api.run_as_future(api.create_repo, "username/my-model", exists_ok=True)
Future(...)
>>> api.upload_file(
... repo_id="username/my-model",
... path_in_repo="file.txt",
... path_or_fileobj=b"file content",
... run_as_future=True,
... )
Future(...) I have updated the PR description, added some tests and added a section in the docs. I think the PR is finally in a final state for review (:crossed_fingers:) :) |
I like it! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! Love the API, and the implementation looks correct.
Thanks for iterating!
@validate_hf_hub_args | ||
@future_compatible |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Love how simple this becomes!
Youhou! Finally merging it :) Thanks for the reviews and feedback ❤️ |
Related to #939 (cc @LysandreJik @osanseviero @julien-c)
EDIT: PR completely refactored (so is this description)
EDIT 2: PR completely refactored (again)
This PR makes is possible to run
HfApi
methods in the background. I'm using aThreadPoolExecutor
with only 1 worker. This means that we are preserving the order in which the jobs are submitted. The returned value is aFuture
with methods like.is_done()
,.result()
and.exception()
already implemented so we don't have to support weird stuff inhuggingface_hub
.The goal of this PR is:
HfApi
in training scripts instead of the git-based methods. Similar toRepository.push(blocking=False)
but no need to clone the repo.UX-wise, we did several iterations. We came to the conclusion that having a
run_as_future:bool = False
argument in future-compatible methods was the way to go. It is not perfect but from a developer perspective, that should be the most practicle.Since the chosen solution is quite verbose in
hf_api.py
to get the correct type annotations/IDE autocompletion, only the most useful methods have arun_as_future: bool
argument :upload_file
,upload_folder
andcreate_commit
. For other methods, it is still possible to run them in the background usingHfApi.run_as_future
:TODO: