Keep lock files in a /locks folder to prevent rare concurrency issue
#1659
Hi @beeender thanks for raising the question here. I think the reason why we want to delete the lock file is to avoid polluting the cache system. Filelock has different ways of handling the file lock depending on the platform:
The documentation is not available anymore as the PR was closed or merged. |
@beeender after thinking about it and some discussions with @julien-c, we ended up with the conclusion:
That being said what we can do is to add a new flag ( (edit: another interesting read) |
This sounds good to me! Let me add that in the PR. |
Great, thanks! To do so, I would:
And that should be it. Please let me know if you need any help. Thanks in advance! |
+    def test_keep_lock_file(self):
+        """Lock files should be kept if HF_HUB_KEEP_LOCK_FILES is True"""
+        with SoftTemporaryDirectory() as tmpdir:
+            with patch("huggingface_hub.constants.HF_HUB_KEEP_LOCK_FILES", False):
+                hf_hub_download(DUMMY_MODEL_ID, filename=CONFIG_NAME, cache_dir=tmpdir)
+                for subdir, dirs, files in os.walk(tmpdir):
+                    for file in files:
+                        # No lock files should exist
+                        self.assertNotRegex(file, r".*\.lock")
+
+        with SoftTemporaryDirectory() as tmpdir:
+            with patch("huggingface_hub.constants.HF_HUB_KEEP_LOCK_FILES", True):
+                hf_hub_download(DUMMY_MODEL_ID, filename=CONFIG_NAME, cache_dir=tmpdir)
+                lock_file_exist = False
+                for subdir, dirs, files in os.walk(tmpdir):
+                    for file in files:
+                        if file.endswith(".lock"):
+                            lock_file_exist = True
+                            break
+                self.assertTrue(lock_file_exist)
@Wauplin My constants mocking doesn't work, do you have any idea? |
Since you are importing the constant directly in
and it should work. |
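For readers following along, here is a minimal, self-contained sketch of one common reason a patch like the one in the test has no effect: when a module imports a constant by name, it holds its own binding, so patching the defining module does not change what the importing module sees. The module names `demo_constants` and `demo_consumer`, the constant `KEEP_LOCK_FILES`, and the function `keep_locks` are hypothetical stand-ins built in memory; they are not part of `huggingface_hub`.

```python
import sys
import types
from unittest.mock import patch

# Hypothetical "constants" module that defines the flag.
constants = types.ModuleType("demo_constants")
constants.KEEP_LOCK_FILES = False
sys.modules["demo_constants"] = constants

# Hypothetical "consumer" module that imports the flag by name,
# which copies the binding into its own namespace.
consumer = types.ModuleType("demo_consumer")
exec(
    "from demo_constants import KEEP_LOCK_FILES\n"
    "def keep_locks():\n"
    "    return KEEP_LOCK_FILES\n",
    consumer.__dict__,
)

# Patching the defining module does not reach the consumer's own binding.
with patch.object(constants, "KEEP_LOCK_FILES", True):
    seen_when_patching_definition = consumer.keep_locks()

# Patching the name where it is looked up does.
with patch.object(consumer, "KEEP_LOCK_FILES", True):
    seen_when_patching_consumer = consumer.keep_locks()

print(seen_when_patching_definition, seen_when_patching_consumer)
```

The usual remedy is to patch the name in the module that reads it, or to have that module access the value as an attribute of the constants module at call time.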
Magic! And tests have been added. |
Thanks for the PR @beeender! Everything looks good to me. I'll update the branch and run the CI. We are good to merge it once it's ✔️ ! 🎉 Second thoughts: we will remove the env variable. See #1659 (comment).
@beeender it seems the lock file is deleted on the CI Windows machine even when from logs (see https://github.com/huggingface/huggingface_hub/actions/runs/6238074721/job/16933112571?pr=1659):
EDIT: problem fixed (see #1659 (comment)). |
On second thought, shall we just create the lock file in the temp folder, so that it does not have to be deleted?
This may not work if the cache folder is a shared folder (NFS/Samba) and downloading happens on multiple hosts. But I doubt flock ever works with NFS/Samba. |
I like the idea! What do you think about leaving the lock files in a separate directory but still in the
with That way we still have lock files, but they are not scattered all over the cache. We could have a utility to clean the locks folder (either after snapshot_download or when scanning/deleting the cache).
EDIT: etags are not unique across repos. The Etag for an LFS file is its sha256 checksum. This means that if the same file is uploaded to several repos, it shares the same Etag. And if a user wants to download it from different repos, the blob files must be re-downloaded (that's the current behavior and we won't change that). Meaning that the
lock_path = os.path.join(locks_dir, repo_folder_name(repo_id=repo_id, repo_type=repo_type), f"{etag}.lock")
# e.g.: ~/.cache/huggingface/hub/.locks/models--julien-c--EsperBERTo-small/46a9d58622b4675b29564da2d9ba73e702241c5fa969f12c387cad4aa984276a.lock |
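As a rough illustration of the layout proposed above, the sketch below builds a lock path under a `.locks` directory and creates the parent folder. It is a standalone approximation, not the library's code: `repo_folder_name` here is a stand-in whose behavior is assumed from the example path in the comment, and `build_lock_path` and the short etag are hypothetical.

```python
import os
import tempfile
from pathlib import Path


def repo_folder_name(*, repo_id: str, repo_type: str) -> str:
    # Stand-in (assumed behavior): "julien-c/EsperBERTo-small" with type
    # "model" becomes "models--julien-c--EsperBERTo-small".
    return "--".join([f"{repo_type}s", *repo_id.split("/")])


def build_lock_path(cache_dir: str, repo_id: str, repo_type: str, etag: str) -> str:
    # Hypothetical helper: one lock file per (repo, etag) under <cache_dir>/.locks.
    locks_dir = os.path.join(cache_dir, ".locks")
    lock_path = os.path.join(
        locks_dir,
        repo_folder_name(repo_id=repo_id, repo_type=repo_type),
        f"{etag}.lock",
    )
    # The parent directory has to exist before a lock file can be created in it.
    Path(lock_path).parent.mkdir(parents=True, exist_ok=True)
    return lock_path


with tempfile.TemporaryDirectory() as cache_dir:
    # Same (shortened, made-up) etag in two repos yields two distinct lock files.
    path_a = build_lock_path(cache_dir, "julien-c/EsperBERTo-small", "model", "46a9d5")
    path_b = build_lock_path(cache_dir, "some-user/some-repo", "model", "46a9d5")
    relative_a = os.path.relpath(path_a, cache_dir)
    parent_exists = os.path.isdir(os.path.dirname(path_a))
```

Including the repo folder in the path is what keeps identical etags in different repos from sharing one lock file.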
yes, great idea! and a CLI script could take care of deleting .locks from time to time if needed. |
I'm doing some cleanup on the repo, so I renamed this PR in line with what we discussed in #1659 (comment). @beeender are you still interested in implementing it? :)
Yes! I was on vacation for the last week. Will create a PR soon. |
The lock file could be removed when the 2nd process gets the lock. The 3rd process will then lock on a different lock file handle, although the lock path stays the same.
1st proc gets the lock
2nd proc waits for the lock
1st proc releases the lock
1st proc removes the lock file
2nd proc gets the lock
3rd proc creates a new lock file and gets the lock
Windows doesn't have this problem.
This commit moves the lock files to a subdirectory of the hub cache and doesn't remove them after downloading. The lock files are named with their 'etag' and a '.lock' extension, and placed in the 'HUGGINGFACE_HUB_CACHE/.locks/repo_folder_name' directory. The repo_folder_name is generated from 'repo_id' and 'repo_type', so that the same 'etag' in different repos does not collide.
Co-authored-by: Lucain <[email protected]>
@Wauplin PR has been updated. 🆙 |
Very clean PR! Thanks for making the change @beeender. I'm glad that we won't introduce a new environment variable like HF_HUB_KEEP_LOCK_FILES and still be robust across processes! 🔥
@@ -1420,6 +1422,7 @@ def hf_hub_download(
    if os.name == "nt" and len(os.path.abspath(blob_path)) > 255:
        blob_path = "\\\\?\\" + os.path.abspath(blob_path)

    Path(lock_path).parent.mkdir(parents=True, exist_ok=True)
👍
(fix tests in e670797. Scan-cache was failing to scan the |
The lock file could be removed when the 2nd process gets the lock.
And then the 3rd process will lock on a different lock file handle.
Although the lock path stays the same.
1st proc gets the lock
2nd proc waits for the lock
1st proc releases the lock
1st proc removes the lock file
2nd proc gets the lock
3rd proc creates a new lock file and gets the lock
Demo code to show the problem:
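The sequence above can be reproduced on a POSIX system with a sketch like the one below. It is an illustrative reconstruction, not the PR's own demo: it assumes `fcntl.flock`-style locking, simulates the three processes with three separate file descriptors inside one process, and the names `try_lock` and `demo_race` are hypothetical.

```python
import fcntl
import os
import tempfile


def try_lock(fd: int) -> bool:
    """Try to take an exclusive flock without blocking; True on success."""
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return True
    except BlockingIOError:
        return False


def demo_race() -> dict:
    with tempfile.TemporaryDirectory() as tmpdir:
        lock_path = os.path.join(tmpdir, "blob.lock")

        # 1st "proc" creates the lock file and gets the lock.
        fd1 = os.open(lock_path, os.O_RDWR | os.O_CREAT)
        first_locked = try_lock(fd1)

        # 2nd "proc" opens the same path and would have to wait.
        fd2 = os.open(lock_path, os.O_RDWR | os.O_CREAT)
        second_blocked = not try_lock(fd2)

        # 1st "proc" releases the lock, then removes the lock file.
        fcntl.flock(fd1, fcntl.LOCK_UN)
        os.unlink(lock_path)

        # 2nd "proc" gets the lock, but on the now-unlinked file.
        second_locked = try_lock(fd2)

        # 3rd "proc" creates a new file at the same path and also gets a lock.
        fd3 = os.open(lock_path, os.O_RDWR | os.O_CREAT)
        third_locked = try_lock(fd3)

        result = {
            "first_locked": first_locked,
            "second_blocked": second_blocked,
            "second_locked": second_locked,
            "third_locked": third_locked,
            "same_file": os.fstat(fd2).st_ino == os.fstat(fd3).st_ino,
        }
        for fd in (fd1, fd2, fd3):
            os.close(fd)
        return result


result = demo_race()
print(result)
```

If the sketch behaves as intended, the 2nd and 3rd descriptors both hold an exclusive lock at the same time because they refer to different files that happen to share one path.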
Add the ENV variable HF_HUB_KEEP_LOCK_FILES to give users the option to keep the lock files and prevent concurrency issues.
EDIT on 2023/1014
Windows doesn't have this problem.
This commit moves the lock files to a subdirectory of the hub cache and doesn't remove them after downloading.
The lock files are named with their etag and a .lock extension, and placed in the HUGGINGFACE_HUB_CACHE/.locks/repo_folder_name directory. The repo_folder_name is generated from repo_id and repo_type, so that the same etag in different repos does not collide.