Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add hf_transfer upload #1395

Merged
merged 24 commits into from
Mar 24, 2023
Merged

feat: add hf_transfer upload #1395

merged 24 commits into from
Mar 24, 2023

Conversation

McPatate
Copy link
Member

@McPatate McPatate commented Mar 16, 2023

Add hf_transfer.upload

Happy to discuss if there is a more appropriate way to do things, this seems to me like a good implementation.

Linked to huggingface/hf_transfer#7 & huggingface/moon-landing#5722

Here's the result of my test:

$> export HF_HUB_ENABLE_HF_TRANSFER=1
$> time python3 test_hf_transfer_upload.py
python3 test_hf_transfer_upload.py  0.78s user 0.21s system 28% cpu 3.446 total

$> unset HF_HUB_ENABLE_HF_TRANSFER
$> time python3 test_hf_transfer_upload.py
python3 test_hf_transfer_upload.py  0.47s user 0.08s system 4% cpu 10.996 total

The file I tested for is 50MiB and I tested on my personal laptop on my home network, so it may even be more than x3 improvement !

@McPatate McPatate requested review from Narsil and Wauplin March 16, 2023 16:49
@HuggingFaceDocBuilderDev
Copy link

HuggingFaceDocBuilderDev commented Mar 16, 2023

The documentation is not available anymore as the PR was closed or merged.

@McPatate McPatate marked this pull request as ready for review March 16, 2023 20:45
@codecov
Copy link

codecov bot commented Mar 16, 2023

Codecov Report

Patch coverage: 48.78% and project coverage change: -0.33 ⚠️

Comparison is base (294636b) 84.30% compared to head (60f01bc) 83.97%.

❗ Current head 60f01bc differs from pull request most recent head 6c0ad0a. Consider uploading reports for the commit 6c0ad0a to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1395      +/-   ##
==========================================
- Coverage   84.30%   83.97%   -0.33%     
==========================================
  Files          48       48              
  Lines        4956     4980      +24     
==========================================
+ Hits         4178     4182       +4     
- Misses        778      798      +20     
Impacted Files Coverage Δ
src/huggingface_hub/file_download.py 84.72% <ø> (-0.41%) ⬇️
src/huggingface_hub/lfs.py 80.00% <ø> (+0.64%) ⬆️
src/huggingface_hub/_commit_api.py 84.03% <48.78%> (-8.79%) ⬇️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

Copy link
Contributor

@Wauplin Wauplin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this PR @McPatate! (and the one in hf_transfer). I left a few comments but yes, I definitely think it's a good integration. I've looked at huggingface/hf_transfer#7 but honestly don't have much feedback on it. I don't think it's worth to make parallel_failures configurable if it's to set it to max_files-1 but that's all.

Out of curiosity, is there a concept of threads in Rust i.e. is it using os-based threads or is it clever than that? And in the implementation, do you control the number of parallel upload at the same time or not?

src/huggingface_hub/_commit_api.py Show resolved Hide resolved
src/huggingface_hub/_commit_api.py Outdated Show resolved Hide resolved
src/huggingface_hub/_commit_api.py Outdated Show resolved Hide resolved
@McPatate
Copy link
Member Author

Out of curiosity, is there a concept of threads in Rust i.e. is it using os-based threads or is it clever than that? And in the implementation, do you control the number of parallel upload at the same time or not?

Here you go: https://docs.rs/tokio/latest/tokio/task/#what-are-tasks 😄
You also can spawn OS threads with spawn_blocking. But basically, IIUC, tokio has by default a multithreaded runtime and has its own task scheduler on top (we use the mt runtime in hf transfer).

Copy link
Contributor

@Wauplin Wauplin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving it but let's wait for @Narsil's review + merge and deploy huggingface/hf_transfer#7 first

src/huggingface_hub/_commit_api.py Outdated Show resolved Hide resolved
Copy link
Contributor

@Wauplin Wauplin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

small nit, good to me otherwise

src/huggingface_hub/_commit_api.py Outdated Show resolved Hide resolved
@McPatate McPatate requested a review from Wauplin March 22, 2023 09:51
Copy link
Member Author

@McPatate McPatate left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm ! thanks for cleaning the code up, looks better like this

Copy link
Contributor

@Wauplin Wauplin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Always weird to review stuff that we just changed 😄 But PR looks good to me now :)

Thanks for adding this @McPatate! Making sure that everything's aligned in 3 repos is never an easy task 🥳

@Wauplin Wauplin changed the title feat: add hf_transfer upload [PAUSED] feat: add hf_transfer upload Mar 24, 2023
@Wauplin Wauplin changed the title [PAUSED] feat: add hf_transfer upload feat: add hf_transfer upload Mar 24, 2023
@Wauplin
Copy link
Contributor

Wauplin commented Mar 24, 2023

🎉 🎉 🎉

@Wauplin Wauplin merged commit 129994a into main Mar 24, 2023
@Wauplin Wauplin deleted the feat/add_hf_transfer_upload branch March 24, 2023 09:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants