Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Run HfApi methods in the background (run_as_future) #1458

Merged
merged 19 commits into from
May 17, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
48 changes: 48 additions & 0 deletions docs/source/guides/upload.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -108,6 +108,54 @@ but before that, all previous logs on the repo on deleted. All of this in a sing
... )
```

### Non-blocking upload

In some cases, you want to push data without blocking your main thread. This is particularly useful to upload logs and
artifacts while continuing a training. To do so, you can use the `run_as_future` argument in both [`upload_file] and
[`upload_folder`]. This will return a [`concurrent.futures.Future`](https://docs.python.org/3/library/concurrent.futures.html#future-objects)
object that you can use to check the status of the upload.

```py
>>> from huggingface_hub import HfApi
>>> api = HfApi()
>>> future = api.upload_folder( # Upload in the background (non-blocking action)
... repo_id="username/my-model",
... folder_path="checkpoints-001",
... run_as_future=True,
... )
>>> future
Future(...)
>>> future.done()
False
>>> future.result() # Wait for the upload to complete (blocking action)
...
```

<Tip>

Background jobs are queued when using `run_as_future=True`. This means that you are guaranteed that the jobs will be
executed in the correct order.

</Tip>

Even though background jobs are mostly useful to upload data/create commits, you can queue any method you like using
[`run_as_future`]. For instance, you can use it to create a repo and then upload data to it in the background. The
built-in `run_as_future` argument in upload methods is just an alias around it.

```py
>>> from huggingface_hub import HfApi
>>> api = HfApi()
>>> api.run_as_future(api.create_repo, "username/my-model", exists_ok=True)
Future(...)
>>> api.upload_file(
... repo_id="username/my-model",
... path_in_repo="file.txt",
... path_or_fileobj=b"file content",
... run_as_future=True,
... )
Future(...)
```

### Upload a folder by chunks

[`upload_folder`] makes it easy to upload an entire folder to the Hub. However, for large folders (thousands of files or
Expand Down
2 changes: 2 additions & 0 deletions src/huggingface_hub/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -176,6 +176,7 @@
"repo_type_and_id_from_hf_id",
"request_space_hardware",
"restart_space",
"run_as_future",
"set_space_sleep_time",
"space_info",
"unlike",
Expand Down Expand Up @@ -462,6 +463,7 @@ def __dir__():
repo_type_and_id_from_hf_id, # noqa: F401
request_space_hardware, # noqa: F401
restart_space, # noqa: F401
run_as_future, # noqa: F401
set_space_sleep_time, # noqa: F401
space_info, # noqa: F401
unlike, # noqa: F401
Expand Down
Loading