-
Notifications
You must be signed in to change notification settings - Fork 27.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[cli] Uploads: add progress bar #2078
Conversation
see #2044 (comment) for context
Codecov Report
@@ Coverage Diff @@
## master #2078 +/- ##
==========================================
+ Coverage 83.16% 83.18% +0.01%
==========================================
Files 109 109
Lines 15858 15874 +16
==========================================
+ Hits 13188 13204 +16
Misses 2670 2670
Continue to review full report at Codecov.
|
transformers/hf_api.py
Outdated
@@ -148,6 +152,27 @@ def list_objs(self, token): | |||
|
|||
|
|||
|
|||
class ProgressFileReader: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely non-standard, but definitely clever :)
I don't think many people would understand this behavior, as you have to know values are passed by reference in Python, and be familiar with io.BufferedReader
's API. I would document the ProgressFileReader
class to explain why we are doing this and why it works.
A more pythonic API would be to promote ProgressFileReader
to a context manager and open the file in a context so the uploading code would look like:
with stream_with_progressbar(filepath, "rb") as f:
r = requests.put(urls.write, data=f, headers={
"content-type": urls.type,
})
r.raise_for_status()
Also has the merit of being self-documenting and hiding the complex logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could do a context manager, but I can't encapsulate f
completely like in your code sample, (passing f
to requests.put
) as I don't really know which methods of the io.BufferedReader requests
(and its underlying libraries) is going to call.
The cleanest way would be to subclass io.BufferedReader
and instantiate an instance of the child class with the parent instance, but I don't know how to do that in Python.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like it, I think it's fine.
Might even be reused in other locations of the library and examples later so maybe put it in file_utils.py
and add it to __init__
?
transformers/hf_api.py
Outdated
@@ -129,10 +130,13 @@ def presign_and_upload(self, token, filename, filepath): | |||
# Even though we presign with the correct content-type, | |||
# the client still has to specify it when uploading the file. | |||
with open(filepath, "rb") as f: | |||
pf = ProgressFileReader(f) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe call it TqdmProgressFileReader
if it's included in file_utils
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
renamed it, let's move it to file_utils if/when we need it somewhere else.
see #2044 (comment) for context
There might be a more pythonic way (to do a "simple" method overriding) but I couldn't find it.