~/.huggingface/token is loaded with EOL #1634

Closed

GBR-613 opened this issue Sep 4, 2023 · 5 comments
Labels
bug Something isn't working

Comments

@GBR-613
Contributor

GBR-613 commented Sep 4, 2023

Describe the bug

I copy the authentication token from the site and put it into ~/.huggingface/token using vim, nano, or some other editor.
When I run the code, load_dataset() fails because the token is read with a trailing newline character.

There is a workaround: truncate -s -1 ~/.huggingface/token
But I believe the correct solution would be to strip the trailing "\n" in the library code, for example in HfFolder().get_token(), if it is present, and to strip any trailing whitespace as well.
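For illustration, a minimal sketch of the failure mode (the path handling below is only an example, not the library's internal code): a token file saved with a text editor usually ends with "\n", so reading it verbatim yields an invalid bearer token unless trailing whitespace is stripped.

from pathlib import Path

# Example only: read the token the same naive way a loader might.
token_path = Path.home() / ".huggingface" / "token"
raw = token_path.read_text()   # editors such as vim or nano leave a trailing "\n"
print(repr(raw))               # e.g. 'hf_xxx...\n'

token = raw.strip()            # stripping trailing whitespace restores a usable token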

Reproduction

# "my-company/my-ds" stands in for a private dataset that requires authentication;
# token=True makes load_dataset() use the token saved on disk.
from datasets import load_dataset
ds = load_dataset("my-company/my-ds", verification_mode="no_checks", token=True)

Logs

N/A

System info

- huggingface_hub version: 0.16.4
- Platform: Linux-5.15.0-1036-aws-x86_64-with-glibc2.31
- Python version: 3.9.16
- Running in iPython ?: No
- Running in notebook ?: No
- Running in Google Colab ?: No
- Token path ?: /home/ubuntu/.cache/huggingface/token
- Has saved token ?: True
- Who am I ?: gbr-bst
- Configured git credential helpers:
- FastAI: 2.1.10
- Tensorflow: N/A
- Torch: 1.13.1
- Jinja2: 3.1.2
- Graphviz: N/A
- Pydot: N/A
- Pillow: 9.5.0
- hf_transfer: N/A
- gradio: N/A
- tensorboard: N/A
- numpy: 1.24.3
- pydantic: 1.10.8
- aiohttp: 3.8.5
- ENDPOINT: https://huggingface.co
- HUGGINGFACE_HUB_CACHE: /home/ubuntu/.cache/huggingface/hub
- HUGGINGFACE_ASSETS_CACHE: /home/ubuntu/.cache/huggingface/assets
- HF_TOKEN_PATH: /home/ubuntu/.cache/huggingface/token
- HF_HUB_OFFLINE: False
- HF_HUB_DISABLE_TELEMETRY: False
- HF_HUB_DISABLE_PROGRESS_BARS: None
- HF_HUB_DISABLE_SYMLINKS_WARNING: False
- HF_HUB_DISABLE_EXPERIMENTAL_WARNING: False
- HF_HUB_DISABLE_IMPLICIT_TOKEN: False
- HF_HUB_ENABLE_HF_TRANSFER: False
@GBR-613 GBR-613 added the bug Something isn't working label Sep 4, 2023
@Wauplin
Contributor

Wauplin commented Sep 4, 2023

But I believe the correct solution would be to strip the trailing "\n" in the library code, for example in HfFolder().get_token(), if it is present, and to strip any trailing whitespace as well.

@GBR-613 yes indeed, we could add a .strip() in .get_token() which would solve your issue. Do you want to open a PR to fix this and add a quick test in test_utils_hf_folder.py?
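A minimal sketch of what such a change could look like (simplified; the actual HfFolder.get_token() in huggingface_hub resolves the token path from the configured cache location and handles more cases):

import os

class HfFolder:
    # Simplified: the real class derives this path from the configured token path.
    path_token = os.path.expanduser("~/.cache/huggingface/token")

    @classmethod
    def get_token(cls):
        """Return the saved token, or None if no token file exists."""
        try:
            with open(cls.path_token, "r") as f:
                # .strip() removes the trailing newline (and any other surrounding
                # whitespace) left behind by editors such as vim or nano.
                return f.read().strip()
        except FileNotFoundError:
            return None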

Regardless of this specific issue, do you know that you can save your token using huggingface-cli login instead? This command takes care of saving the token for you, without having to struggle with saving and truncating the file. Any specific reason for not using it?
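For completeness, a rough sketch of the programmatic equivalent (assuming a recent huggingface_hub; the login() helper prompts for the token and writes it to the configured token path, so the token file never needs to be edited by hand):

from huggingface_hub import login

# Prompts for the token interactively and saves it to the configured token path.
login()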

@julien-c
Member

julien-c commented Sep 4, 2023

In fact, I'm wondering whether it didn't use to be the case that we stripped any newlines in get_token() (I might be mistaken, though).

@GBR-613
Contributor Author

GBR-613 commented Sep 4, 2023

@Wauplin
> do you know that you can save your token using huggingface-cli login instead?
That's what I finally did. But I learned about the existence of this option much later than about load_dataset(). (Until now I had mostly used the latter with public datasets.)

BTW, the documentation says several times:

token (str or bool, optional) — The token to use as HTTP bearer authorization for remote files. If True, or not specified, will use the token generated when running huggingface-cli login (stored in ~/.huggingface).

That gave me the impression that ~/.huggingface is the name of a file where the token may be stored.
Is it possible to make it clearer? According to what my coworkers and I see on our workstations, it should say:

token (str or bool, optional) — The token to use as HTTP bearer authorization for remote files. If True, or not specified, will use the token generated when running huggingface-cli login (stored in the ~/.cache/huggingface/token file) or load it from the ~/.huggingface/token file.

> Do you want to open a PR to fix this and add a quick test in [test_utils_hf_folder.py](https://github.com/huggingface/huggingface_hub/blob/main/tests/test_utils_hf_folder.py)?

OK, will do.

GBR-613 added a commit to GBR-613/huggingface_hub that referenced this issue Sep 5, 2023
@GBR-613
Contributor Author

GBR-613 commented Sep 5, 2023

@Wauplin PR is created: #1638

@Wauplin
Contributor

Wauplin commented Sep 5, 2023

Thanks for the quick PR @GBR-613 :) I've approved it and will merge once the CI is green.

Wauplin added a commit that referenced this issue Sep 5, 2023
* Fix bug #1634 (drop finishing spaces and EOL)

* Update tests/test_utils_hf_folder.py

* Update tests/test_utils_hf_folder.py

* make style

---------

Co-authored-by: Lucain <[email protected]>
@Wauplin Wauplin closed this as completed Sep 5, 2023