Git: find a "better" way to handle tokens than git credential store #1051
Comments
For context, do you think |
Based on @LysandreJik's comment #939 (comment), |
kinda related, @julien-c do you know if we track |
(lysandre will know best but i think it might be the support for asynchronous aka. non-blocking pushes i.e. decoupling committing and pushing, which is useful during training as you need to push large files and don't want that to be blocking) (Also it's quite natural, indeed, to have a local git repo for a model you are currently training) |
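The decoupling of committing and pushing described above can be sketched with the standard library alone. This is only an illustration, not how huggingface_hub implements it: `start_in_background` and `commit_then_push_async` are hypothetical names, and the repository path and commit message are supplied by the caller.

```python
import subprocess
from typing import List, Optional


def start_in_background(cmd: List[str], cwd: Optional[str] = None) -> subprocess.Popen:
    """Start a command and return immediately, without waiting for it to finish."""
    # stdout is discarded so that an unread pipe can never stall the child process.
    return subprocess.Popen(cmd, cwd=cwd, stdout=subprocess.DEVNULL)


def commit_then_push_async(repo_dir: str, message: str) -> subprocess.Popen:
    """Commit synchronously (fast), then push in the background (possibly slow)."""
    subprocess.run(["git", "add", "--all"], cwd=repo_dir, check=True)
    subprocess.run(["git", "commit", "-m", message], cwd=repo_dir, check=True)
    # The push may upload large files, so the caller is not blocked on it.
    return start_in_background(["git", "push"], cwd=repo_dir)
```

A training loop could call `process.poll()` between steps to see whether the push has finished, and `process.wait()` once at the end to check the exit code.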
About async commit/push, I totally get it (we can give priority to #939). About having a local git repo for a model you are training: that's a case where the git-based approach cannot be replaced, right? So we should not expect |
That's an open question. I think it would be interesting to poll the community on this (in particular maintainers of downstream libraries, both HF and third party ones) |
About the issue in general: An important aspect that we would want to keep were we to move away from using The token stored in
This is only possible if git has access to the credential helper. About the This can be the case when:
and additionally, but this could arguably be added to the HTTP methods as well if we decide it's worth it, but the fact that it can cleanly handle async stuff by using git/git-lfs under the hood makes it reliable. I would also recommend reading #321 where considering non-git approaches wasn't 100% accepted. |
Thanks for the reminder @LysandreJik! TBH I completely forgot about this aspect. Dunno yet how to tackle the issue, but good to know that the solution has to go through git credential.
I would challenge the fact that
should work. (As a matter of fact, it doesn't work currently (i.e. with the latest release, before #1053), both in Colab and anywhere a user doesn't have helper=store.) If no one has complained about it not working, I think we should just drop it, but still support If we insist on supporting the first use case, I would only do it on Colab, as was done in #1053 |
I just made some tests with the abstract level that git uses ( If I run:
It will store the credentials in the default helper defined on the user's machine. So what we could do is to simply change the current command (from
with its abstract version. Same for deleting the token (currently using Using the default git helper on the machine instead of
And for users without any git helper configured (e.g. me until 1 week ago 👋), we either show a warning or help them configure a git helper through a CLI. This is already the case anyway.

EDIT: the implemented workflow differs slightly from the one below (see description in #1138 (comment))

Side note from @julien-c: we need to think about the case where a token is already stored in the user's helper (as we do not want to overwrite an existing value).

To summarize, here is a workflow I see:
Does that cover every possible use case? |
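The idea of delegating to whichever helper the user has configured can be sketched as follows. `git credential approve` and `git credential reject` read a `key=value` description on stdin, terminated by a blank line. The function names below are hypothetical, and this is not the code that was eventually merged.

```python
import subprocess
from typing import Optional


def credential_description(
    url: str, username: Optional[str] = None, password: Optional[str] = None
) -> str:
    """Build the stdin payload expected by `git credential <action>`."""
    lines = [f"url={url}"]
    if username is not None:
        lines.append(f"username={username}")
    if password is not None:
        lines.append(f"password={password}")
    # A blank line marks the end of the description.
    return "\n".join(lines) + "\n\n"


def _run_git_credential(action: str, payload: str) -> None:
    """Send the payload to `git credential approve` or `git credential reject`."""
    subprocess.run(
        ["git", "credential", action], input=payload, encoding="utf-8", check=True
    )


def store_token(url: str, username: str, token: str) -> None:
    """Ask git to save the token in the user's configured helper(s)."""
    _run_git_credential("approve", credential_description(url, username, token))


def erase_token(url: str, username: str) -> None:
    """Ask git to remove the stored credential from the configured helper(s)."""
    _run_git_credential("reject", credential_description(url, username))
```

With no helper configured, `approve` has nowhere to store the value, which is why a warning or a guided setup is still needed for those users.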
I would be in favor of making For being a user that almost never used the (edit: my point is that not having complaints doesn't mean some users don't struggle with it) |
Your proposed workflow looks generally good to me @Wauplin, I would tweak it at the margins with:
WDYT? |
Ok in general. For I just wonder if a user already has a helper with a value and does |
But I'm not too opinionated on those 2 corner cases if you think no warning is fine as well. |
Yes, up to you. A prompt might be ok! |
I started to work on this issue and it was nearly good until I found a quite annoying issue. The plan now is to use My problem is when I try to read the credential when they are not set. What I'd like is to return Here is a standalone script to reproduce the problem. Any idea for a workaround for this ?

import subprocess

ENDPOINT = "https://random_endpoint.hf"

# Store credential for "toto"
with subprocess.Popen(
    ["git", "credential", "approve"],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    encoding="utf-8",
) as process:
    process.stdin.write(f"url={ENDPOINT}\nusername=toto\npassword=secret\n\n")
    process.stdin.flush()

# Retrieve credentials for "toto" -> works
print("Get credentials when stored:")
with subprocess.Popen(
    ["git", "credential", "fill"],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    encoding="utf-8",
) as process:
    process.stdin.write(f"url={ENDPOINT}\n\n")
    process.stdin.flush()
    print(process.stdout.read())

# Delete credentials for "toto"
with subprocess.Popen(
    ["git", "credential", "reject"],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    encoding="utf-8",
) as process:
    process.stdin.write(f"url={ENDPOINT}\nusername=toto\n\n")
    process.stdin.flush()

# Retrieve credentials for "toto" -> prompt user !
print("Get credentials when not stored:")
with subprocess.Popen(
    ["git", "credential", "fill"],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    encoding="utf-8",
) as process:
    process.stdin.write(f"url={ENDPOINT}\n\n")
    process.stdin.flush()
    print(process.stdout.read())

FYI, |
Context: I try to adapt the existing @julien-c @LysandreJik any idea ? |
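One possible workaround for the prompt, offered as a sketch rather than a verified fix: git honors the `GIT_TERMINAL_PROMPT` environment variable, and when it is set to `0`, git exits with an error instead of asking on the terminal. A non-zero exit code can then be mapped to `None`. The function names are hypothetical, and a configured askpass program or a helper that opens its own dialog may still prompt.

```python
import os
import subprocess
from typing import Dict, Optional


def parse_credential_output(text: str) -> Dict[str, str]:
    """Parse the `key=value` lines printed by `git credential fill`."""
    result = {}
    for line in text.splitlines():
        # Split on the first "=" only, since values may contain "=" themselves.
        key, sep, value = line.partition("=")
        if sep:
            result[key] = value
    return result


def read_credentials(url: str) -> Optional[Dict[str, str]]:
    """Return the stored credentials for `url`, or None if git would have to prompt."""
    # Assumption: disabling terminal prompts is enough to make `fill` fail fast.
    env = dict(os.environ, GIT_TERMINAL_PROMPT="0")
    completed = subprocess.run(
        ["git", "credential", "fill"],
        input=f"url={url}\n\n",
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        encoding="utf-8",
        env=env,
    )
    if completed.returncode != 0:
        return None
    return parse_credential_output(completed.stdout)
```

Used with the reproduction script above, `read_credentials(ENDPOINT)` would be expected to return a dictionary after `approve` and `None` after `reject`, under the assumptions stated.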
disclaimer: 😈 Should we just remove this feature and see what happens? I suspect not a lot of stuff would break, and for the stuff that breaks, we can override the remote to include the token, or something
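Overriding the remote to include the token could look like the sketch below. `remote_url_with_token` is a hypothetical helper and the default username `user` is a placeholder assumption. Note that a URL built this way ends up in plain text in the repository's `.git/config`.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def remote_url_with_token(url: str, token: str, username: str = "user") -> str:
    """Return `url` with `username:token@` embedded, replacing any existing userinfo."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    # Percent-encode both fields so characters such as "/" or "@" stay unambiguous.
    netloc = f"{quote(username, safe='')}:{quote(token, safe='')}@{host}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))
```

The result could then be applied with `git remote set-url origin <new url>`.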
So we are back to the question "do we want to support Because otherwise we can also have a checkbox/prompt "Save token as git credential ?" and let the user decide. No need to know if we are overwriting something or not, it is up to the user. I feel that this checkbox is mainly designed for your use case @julien-c (:roll_eyes:) but at least we would still support git credential and in a better way than what's done with the current |
I don't like the idea of removing the feature completely, since @LysandreJik specifically pointed out that it caused issues in the past (#1051 (comment)).
I just created a draft PR and proposed an adapted workflow in #1138 (comment). |
Haven't read the other PR yet, but yeah asking |
Mentioned in #1043 (comment).
Currently we store the user token for git commands in the git-credential-store. This is the default git storage, which stores credentials in plain text in a file. huggingface_hub warns the user to use it by default to avoid problems (by running git config --global credential.helper store). In a perfect world, it would be good to use the user's default credential helper. In particular, macOS users have the osxkeychain helper by default to handle credentials securely.

Another possibility is to not store the credential in git and to automatically fill in the values (from Python) when git requires them (in the Repository module).

Note: I am no expert on that topic so any addition is welcome here :)
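To respect the user's default helper, one first needs to know which helpers are configured. A minimal sketch, assuming the output of `git config --list` is parsed directly; both function names are hypothetical, and the sketch ignores git's rule that an empty helper value resets the list.

```python
import subprocess
from typing import List


def parse_credential_helpers(config_listing: str) -> List[str]:
    """Extract credential helper values from `git config --list` output."""
    helpers = []
    for line in config_listing.splitlines():
        key, sep, value = line.partition("=")
        # Keys look like `credential.helper` or `credential.<url>.helper`.
        if sep and key.startswith("credential.") and key.endswith(".helper") and value:
            helpers.append(value)
    return helpers


def list_credential_helpers() -> List[str]:
    """Return the credential helpers visible to git in the current directory."""
    completed = subprocess.run(
        ["git", "config", "--list"],
        stdout=subprocess.PIPE,
        encoding="utf-8",
        check=True,
    )
    return parse_credential_helpers(completed.stdout)
```

An empty result would correspond to the "no helper configured" case, where a warning or a guided setup is appropriate.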
Useful links:
(Edit: also worth mentioning that when a user runs huggingface-cli login or notebook_login(), the token is also stored locally in plain text in the home directory, at ~/.huggingface/token, to be reused in API calls. Changing this is out of scope for this issue.)