We have a recent issue at JeanZay: downloaded models are not getting the correct group perms despite the umask setting.
It probably has to do with the tempfile facility not respecting the umask when files are moved from temp to their final destination outside of /tmp.
I think it was introduced when the new cache structure with blobs was added.
Reproduction
$ umask
0007
# let's download and cache a tokenizer
$ python -c "from transformers import AutoTokenizer; AutoTokenizer.from_pretrained('gpt2')"
$ ls -l ../path/models/models--gpt2/blobs/
total 2.8M
-rw------- 1 unj46ad six 665 Sep 14 19:08 10c66461e4c109db5a2196bff4bb59be30396ed8
-rw------- 1 unj46ad six 1018K Sep 14 19:08 1f1d9aaca301414e7f6c9396df506798ff4eb9a6
-rw------- 1 unj46ad six 446K Sep 14 19:08 226b0752cac7789c48f0cb3ec53eda48b7be36cc
-rw------- 1 unj46ad six 1.3M Sep 14 19:08 4b988bccc9dc5adacd403c00b4704976196548f8
# these all miss group perms! note that other files have correct perms:
$ ls -l ../path/models/models--gpt2/refs/
total 512
-rw-rw---- 1 unj46ad six 40 Nov 22 03:25 main
So only the `blobs` sub-dir has this problem.
# have to manually fix with:
$ chmod g+rw ../path/models/models--gpt2/blobs/*
# but most users are unaware of this problem resulting in downloads that nobody else in the group can use
This is a big issue in a shared environment when the creator is not around to fix it manually and others can't load the model.
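For anyone who has to repair an existing cache, the manual `chmod g+rw` step from the reproduction can be scripted. This is only a sketch: `add_group_rw` is a hypothetical helper name, not part of huggingface_hub or transformers, and it only works when run by the owner of the files (or root).

```python
import os
import stat
from pathlib import Path


def add_group_rw(blobs_dir: str) -> list:
    """Add group read/write to every regular file in blobs_dir.

    Hypothetical helper mirroring `chmod g+rw blobs/*`.
    Returns the sorted names of the files whose mode was changed.
    """
    fixed = []
    for path in Path(blobs_dir).iterdir():
        # Skip symlinks and sub-directories; blobs are plain files.
        if path.is_symlink() or not path.is_file():
            continue
        mode = stat.S_IMODE(path.stat().st_mode)
        new_mode = mode | stat.S_IRGRP | stat.S_IWGRP
        if new_mode != mode:
            os.chmod(path, new_mode)
            fixed.append(path.name)
    return sorted(fixed)
```

It would be called with the `blobs` directory of one cached model, e.g. `add_group_rw("../path/models/models--gpt2/blobs")`.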
I'm guessing that this comes from the hub API and not from transformers directly, but both probably need to be aware of this if it's not this project.
If I remember correctly, we had a similar issue with the datasets API, and it was modified to check the umask and manually adjust the file perms when files went through a tempfile download. As you know, tempfile sets perms to user-only. That was a long time ago; I wonder if this was the right PR: huggingface/datasets#2157
Thank you!
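To illustrate the kind of adjustment described above (check the umask, then fix the perms after the file leaves tempfile), here is a minimal sketch. It is not the code from the datasets PR or from huggingface_hub; `current_umask`, `move_with_umask` and `demo` are hypothetical names, and the target mode `0o666 & ~umask` is an assumption that matches what a plain `open()` would create.

```python
import os
import shutil
import stat
import tempfile


def current_umask() -> int:
    """Return the process umask.

    os.umask() can only set the mask, so set a dummy value and restore it.
    Note: this is not thread-safe.
    """
    mask = os.umask(0o666)
    os.umask(mask)
    return mask


def move_with_umask(src: str, dst: str) -> None:
    """Move src to dst, then apply the mode a regular open() would give."""
    shutil.move(src, dst)
    os.chmod(dst, 0o666 & ~current_umask())


def demo(target_dir: str) -> tuple:
    """Return the file mode before and after the umask-aware move."""
    fd, tmp_path = tempfile.mkstemp()  # created user-only (0600)
    os.close(fd)
    before = stat.S_IMODE(os.stat(tmp_path).st_mode)
    dst = os.path.join(target_dir, "blob")
    move_with_umask(tmp_path, dst)
    after = stat.S_IMODE(os.stat(dst).st_mode)
    return before, after
```

A plain `shutil.move` keeps the user-only mode from the temp file, which is consistent with the `-rw-------` entries seen in `blobs`; the extra `os.chmod` is what brings the final file in line with the umask.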
System info
- huggingface_hub version: 0.11.0
- Platform: Linux-4.18.0-305.65.1.el8_4.x86_64-x86_64-with-glibc2.17
- Python version: 3.8.12
- Running in iPython ?: No
- Running in notebook ?: No
- Running in Google Colab ?: No
- Token path ?: /linkhome/rech/genhug01/ura81os/.huggingface/token
- Has saved token ?: True
- Who am I ?: stas
- Configured git credential helpers: store
- FastAI: N/A
- Tensorflow: 2.8.0
- Torch: 1.11.0+cu115
- Jinja2: N/A
- Graphviz: N/A
- Pydot: N/A