from_pretrained download doesn't respect umask #1215

Closed
stas00 opened this issue Nov 22, 2022 · 1 comment · Fixed by #1220
Labels
bug Something isn't working

Comments

Contributor

stas00 commented Nov 22, 2022

Describe the bug

We recently hit an issue at JeanZay: downloaded models don't get the correct group perms despite the umask.

It probably has to do with the tempfile facility not respecting umask settings when the files are moved from temp to their final destination outside of /tmp.

I think it got introduced when this whole new structure with blobs was added.
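To make the suspected mechanism concrete, here is a minimal sketch (not code from huggingface_hub) that compares a file created with plain `open()` against one created through Python's `tempfile` module and then moved. The helper names `mode_of` and `compare_modes` are made up for illustration, and the sketch assumes a POSIX filesystem without default ACLs.

```python
import os
import shutil
import stat
import tempfile


def mode_of(path):
    """Return only the permission bits of a file, e.g. 0o600."""
    return stat.S_IMODE(os.stat(path).st_mode)


def compare_modes(umask=0o007):
    """Create one file via open() and one via tempfile, then move the latter.

    Returns (mode of the plain file, mode of the moved temp file).
    """
    old_umask = os.umask(umask)
    try:
        with tempfile.TemporaryDirectory() as workdir:
            # A file created with open() is requested as 0o666 and masked by umask.
            plain = os.path.join(workdir, "plain")
            with open(plain, "w") as f:
                f.write("x")

            # tempfile creates its file as 0o600, whatever the umask is.
            with tempfile.NamedTemporaryFile(dir=workdir, delete=False) as tmp:
                tmp.write(b"x")

            # Moving (renaming) the file keeps the mode it was created with.
            moved = os.path.join(workdir, "moved")
            shutil.move(tmp.name, moved)

            return mode_of(plain), mode_of(moved)
    finally:
        # Always restore the caller's umask.
        os.umask(old_umask)


plain_mode, moved_mode = compare_modes()
print(oct(plain_mode), oct(moved_mode))
```

With a umask of 0007, the plain file should come out group-readable and group-writable while the moved temp file stays user-only, which is the same split seen between `refs/` and `blobs/` in the reproduction.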

Reproduction

$ umask
0007

# let's download and cache a tokenizer
$ python -c "from transformers import AutoTokenizer; AutoTokenizer.from_pretrained('gpt2')"

$ ls -l ../path/models/models--gpt2/blobs/
total 2.8M
-rw------- 1 unj46ad six   665 Sep 14 19:08 10c66461e4c109db5a2196bff4bb59be30396ed8
-rw------- 1 unj46ad six 1018K Sep 14 19:08 1f1d9aaca301414e7f6c9396df506798ff4eb9a6
-rw------- 1 unj46ad six  446K Sep 14 19:08 226b0752cac7789c48f0cb3ec53eda48b7be36cc
-rw------- 1 unj46ad six  1.3M Sep 14 19:08 4b988bccc9dc5adacd403c00b4704976196548f8

# these all miss group perms! note that other files have correct perms:
$ ls -l ../path/models/models--gpt2/refs/
total 512
-rw-rw---- 1 unj46ad six 40 Nov 22 03:25 main

so only the `blobs` sub-dir has this problem.

# have to manually fix with:
$ chmod g+rw ../path/models/models--gpt2/blobs/*
# but most users are unaware of this problem, resulting in downloads that nobody else in the group can use

This is a big issue in a shared environment when the creator is not around to fix it manually and others can't load the model.

I'm guessing that the cause is in the hub API and not in transformers directly, but both projects probably need to be aware of this if the fix doesn't belong in this one.

If I remember correctly, we had a similar issue with the datasets API, and it was modified to check the umask and manually adjust the file perms when files went through a tempfile download (as you know, tempfile sets perms to user-only). That was a long time ago; I wonder if this was the right PR: huggingface/datasets#2157
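The "check umask and manually adjust the file perms" approach could look roughly like the sketch below. This is an illustration under stated assumptions, not the code from the datasets PR or from the eventual fix here; `current_umask` and `chmod_respecting_umask` are hypothetical helper names.

```python
import os
import stat
import tempfile


def current_umask():
    """Read the process umask.

    os.umask() can only read the value by setting a new one, so set a
    placeholder and restore the original right away. This is not thread-safe.
    """
    mask = os.umask(0)
    os.umask(mask)
    return mask


def chmod_respecting_umask(path):
    """Give `path` the mode a plain open() would have produced: 0o666 & ~umask."""
    os.chmod(path, 0o666 & ~current_umask())


# Small demo: a tempfile-created file before and after the adjustment.
old = os.umask(0o007)
try:
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"x")
    before = stat.S_IMODE(os.stat(tmp.name).st_mode)
    chmod_respecting_umask(tmp.name)
    after = stat.S_IMODE(os.stat(tmp.name).st_mode)
    os.remove(tmp.name)
finally:
    # Always restore the caller's umask.
    os.umask(old)

print(oct(before), oct(after))
```

Calling such a helper on the downloaded file right after it is moved into the cache would presumably make the `chmod g+rw` step above unnecessary, at the cost of one extra `chmod` per download.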

Thank you!

System info

Copy-and-paste the text below in your GitHub issue.

- huggingface_hub version: 0.11.0
- Platform: Linux-4.18.0-305.65.1.el8_4.x86_64-x86_64-with-glibc2.17
- Python version: 3.8.12
- Running in iPython ?: No
- Running in notebook ?: No
- Running in Google Colab ?: No
- Token path ?: /linkhome/rech/genhug01/ura81os/.huggingface/token
- Has saved token ?: True
- Who am I ?: stas
- Configured git credential helpers: store
- FastAI: N/A
- Tensorflow: 2.8.0
- Torch: 1.11.0+cu115
- Jinja2: N/A
- Graphviz: N/A
- Pydot: N/A
stas00 added the bug label Nov 22, 2022
Contributor

Wauplin commented Nov 22, 2022

Hi @stas00, thanks for reporting. This is a duplicate of #1141, which we have not fixed yet. Will prioritize it more :)
