Progress bars #261
)
with lfs_log_progress():
    subprocess.run(
        f"git lfs clone {repo_url} .".split(),
Switching from `git clone` to `git lfs clone`. They're similar in terms of speed, but the smudge happens at different times.
With `git clone`, the process does not log any progress at first, while all the files are being downloaded; it does not log any "clean" operation either. For at least half of the time, no progress is shown to the user.
With `git lfs clone`, the process downloads the files before applying the clean filter. Both of these steps are logged to the file, which can be shown to the user for feedback.
@@ -546,6 +644,8 @@ def auto_track_large_files(self, pattern="."):
        # Cleanup the .gitattributes if files were deleted
        self.lfs_untrack(deleted_files)

        return files_to_be_staged
Returning the files to be staged to the user so as to warn them that the `add` operation might take a bit of time.
This is great, thanks for it!
I took a couple of passes but this looks good to me.
Looking nice! This is going to be a great addition :-)
subprocess.run(
    args,
    check=True,
    encoding="utf-8",
    cwd=self.local_dir,
)
This really can't fit in one line (at least all the args on a new line?). Not used to 80 chars max.
I'm contemplating switching the default to be the same as `transformers`' on a daily basis :)
You definitely should!
@contextmanager
def lfs_log_progress():
Nice API!
src/huggingface_hub/repository.py
Outdated
current_lfs_progress_value = os.environ.get("GIT_LFS_PROGRESS", "")

with tempfile.TemporaryDirectory() as tmpdir:
    os.environ["GIT_LFS_PROGRESS"] = tmpdir + "/lfs_progress"
Will the "/" work on any OS (Windows)?
Correct! I'll update to `os.path.join`. Thank you!
Tested on Windows, it works! Working on a PR enabling Windows tests.
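For reference, a minimal sketch of the `os.path.join` variant discussed in this thread. The helper name `lfs_progress_path` is hypothetical and only serves the illustration; it is not taken from the PR.

```python
import os
import tempfile


def lfs_progress_path(directory, filename="lfs_progress"):
    # Hypothetical helper: os.path.join inserts the separator that is
    # native to the running OS instead of a hard-coded "/".
    return os.path.join(directory, filename)


with tempfile.TemporaryDirectory() as tmpdir:
    progress_path = lfs_progress_path(tmpdir)
    # This is the value that would be assigned to GIT_LFS_PROGRESS.
    print(progress_path)
```

Building the path this way avoids mixing `\` and `/` in the same path on Windows, even where the mixed form happens to work.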
This PR adds progress bars to the `Repository` utility. The progress bars are visible for large files (>10MB) and any other file tracked with git-lfs.

It adds a context manager to be used internally as a wrapper for subprocess calls. The context manager uses the `GIT_LFS_PROGRESS` environment variable to make git-lfs write its progress to a file, tails that file and parses each line. It relies on temporary directories so that it can be used in parallel across the system. It is used by wrapping the subprocess call in a `with lfs_log_progress():` block.

It works both in notebooks and scripts. Example with a clone:
(Image: cloning in a script)
(Image: cloning in a notebook)

It is visible for the `clone`, `pull` and `push` git commands.

This is the first part of a logging overhaul; the second part will be to have a logging manager similar to `transformers`' and `datasets`'.
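The mechanism described above (set `GIT_LFS_PROGRESS` to a file in a temporary directory, tail the file, parse each line) could be sketched roughly as follows. This is not the PR's implementation: the names `lfs_log_progress_sketch`, `parse_lfs_progress_line`, `on_update` and `poll_interval` are hypothetical, and the line format `<direction> <current>/<total files> <downloaded>/<total> <name>` is an assumption based on the Git LFS documentation for this variable.

```python
import os
import tempfile
import threading
import time
from contextlib import contextmanager


def parse_lfs_progress_line(line):
    # Assumed format: "<direction> <current>/<total files> <downloaded>/<total> <name>"
    direction, files, size, name = line.strip().split(" ", 3)
    current_file, total_files = (int(part) for part in files.split("/"))
    done_bytes, total_bytes = (int(part) for part in size.split("/"))
    return {
        "direction": direction,
        "current_file": current_file,
        "total_files": total_files,
        "done_bytes": done_bytes,
        "total_bytes": total_bytes,
        "name": name,
    }


@contextmanager
def lfs_log_progress_sketch(on_update=print, poll_interval=0.1):
    previous = os.environ.get("GIT_LFS_PROGRESS")
    stop = threading.Event()

    def tail(path):
        handle, buffer = None, ""
        try:
            while True:
                # Read the flag before polling so one full pass always
                # happens after the caller asks us to stop.
                stopping = stop.is_set()
                if handle is None and os.path.exists(path):
                    handle = open(path, encoding="utf-8")
                if handle is not None:
                    for chunk in iter(handle.readline, ""):
                        buffer += chunk
                        if buffer.endswith("\n"):
                            if buffer.strip():
                                on_update(parse_lfs_progress_line(buffer))
                            buffer = ""
                if stopping:
                    return
                time.sleep(poll_interval)
        finally:
            if handle is not None:
                handle.close()

    # A fresh temporary directory per call keeps parallel runs separate.
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "lfs_progress")
        os.environ["GIT_LFS_PROGRESS"] = path
        thread = threading.Thread(target=tail, args=(path,), daemon=True)
        thread.start()
        try:
            yield path
        finally:
            stop.set()
            thread.join()
            # Restore whatever value the variable had before.
            if previous is None:
                os.environ.pop("GIT_LFS_PROGRESS", None)
            else:
                os.environ["GIT_LFS_PROGRESS"] = previous
```

In this sketch a git subprocess started inside the `with` block inherits the environment variable, and `on_update` receives one parsed dictionary per progress line; a real progress bar would update itself from `done_bytes` and `total_bytes` instead of printing.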