~/.huggingface/token is loaded with EOL #1634

Closed

GBR-613 opened this issue Sep 4, 2023 · 5 comments
Labels
bug Something isn't working

Comments

@GBR-613
Contributor

GBR-613 commented Sep 4, 2023

Describe the bug

I copy the authentication token from the site and put it into ~/.huggingface/token using vim, nano, or some other editor.
When I run the code, load_dataset() fails because the token is read with a trailing newline character.

There is a workaround: truncate -s -1 ~/.huggingface/token
But I believe the correct solution would be to strip the trailing "\n" in the library code, for example in HfFolder().get_token(), if it is present, and to strip any trailing whitespace as well.
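For illustration, a minimal sketch of the failure mode (the path handling below is only an example, not the library's internal code): a token file saved with a text editor usually ends with "\n", so reading it verbatim yields an invalid bearer token unless trailing whitespace is stripped.

from pathlib import Path

# Example only: read the token the same naive way a loader might.
token_path = Path.home() / ".huggingface" / "token"
raw = token_path.read_text()   # editors such as vim or nano leave a trailing "\n"
print(repr(raw))               # e.g. 'hf_xxx...\n'

token = raw.strip()            # stripping trailing whitespace restores a usable token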

Reproduction

# "my-company/my-ds" stands in for a private dataset that requires authentication;
# token=True makes load_dataset() use the token saved on disk.
from datasets import load_dataset
ds = load_dataset("my-company/my-ds", verification_mode="no_checks", token=True)

Logs

N/A

System info

- huggingface_hub version: 0.16.4
- Platform: Linux-5.15.0-1036-aws-x86_64-with-glibc2.31
- Python version: 3.9.16
- Running in iPython ?: No
- Running in notebook ?: No
- Running in Google Colab ?: No
- Token path ?: /home/ubuntu/.cache/huggingface/token
- Has saved token ?: True
- Who am I ?: gbr-bst
- Configured git credential helpers:
- FastAI: 2.1.10
- Tensorflow: N/A
- Torch: 1.13.1
- Jinja2: 3.1.2
- Graphviz: N/A
- Pydot: N/A
- Pillow: 9.5.0
- hf_transfer: N/A
- gradio: N/A
- tensorboard: N/A
- numpy: 1.24.3
- pydantic: 1.10.8
- aiohttp: 3.8.5
- ENDPOINT: https://huggingface.co
- HUGGINGFACE_HUB_CACHE: /home/ubuntu/.cache/huggingface/hub
- HUGGINGFACE_ASSETS_CACHE: /home/ubuntu/.cache/huggingface/assets
- HF_TOKEN_PATH: /home/ubuntu/.cache/huggingface/token
- HF_HUB_OFFLINE: False
- HF_HUB_DISABLE_TELEMETRY: False
- HF_HUB_DISABLE_PROGRESS_BARS: None
- HF_HUB_DISABLE_SYMLINKS_WARNING: False
- HF_HUB_DISABLE_EXPERIMENTAL_WARNING: False
- HF_HUB_DISABLE_IMPLICIT_TOKEN: False
- HF_HUB_ENABLE_HF_TRANSFER: False
@GBR-613 GBR-613 added the bug Something isn't working label Sep 4, 2023
@Wauplin
Contributor

Wauplin commented Sep 4, 2023

But I believe the correct solution would be to strip the trailing "\n" in the library code, for example in HfFolder().get_token(), if it is present, and to strip any trailing whitespace as well.

@GBR-613 yes indeed, we could add a .strip() in .get_token() which would solve your issue. Do you want to open a PR to fix this and add a quick test in test_utils_hf_folder.py?
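A minimal sketch of what such a change could look like (simplified; the actual HfFolder.get_token() in huggingface_hub resolves the token path from the configured cache location and handles more cases):

import os

class HfFolder:
    # Simplified: the real class derives this path from the configured token path.
    path_token = os.path.expanduser("~/.cache/huggingface/token")

    @classmethod
    def get_token(cls):
        """Return the saved token, or None if no token file exists."""
        try:
            with open(cls.path_token, "r") as f:
                # .strip() removes the trailing newline (and any other surrounding
                # whitespace) left behind by editors such as vim or nano.
                return f.read().strip()
        except FileNotFoundError:
            return None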

Regardless of this specific issue, do you know that you can save your token using huggingface-cli login instead? This command takes care of saving the token for you, without having to struggle with saving and truncating the file. Any specific reason for not using it?
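For completeness, a rough sketch of the programmatic equivalent (assuming a recent huggingface_hub; the login() helper prompts for the token and writes it to the configured token path, so the token file never needs to be edited by hand):

from huggingface_hub import login

# Prompts for the token interactively and saves it to the configured token path.
login()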

@julien-c
Member

julien-c commented Sep 4, 2023

In fact, I'm wondering whether it didn't use to be the case that we stripped any newlines in get_token() (I might be mistaken, though).

@GBR-613
Contributor Author

GBR-613 commented Sep 4, 2023

@Wauplin
> do you know that you can save your token using huggingface-cli login instead?
That's what I finally did. But I learned about the existence of this option much later than about load_dataset(). (Until now I had mostly used the latter with public datasets.)

BTW, the documentation says several times:

token (str or bool, optional) — The token to use as HTTP bearer authorization for remote files. If True, or not specified, will use the token generated when running huggingface-cli login (stored in ~/.huggingface).

That gave me the impression that ~/.huggingface is the name of a file where the token may be stored.
Is it possible to make it clearer? According to what my coworkers and I see on our workstations, it should say:

token (str or bool, optional) — The token to use as HTTP bearer authorization for remote files. If True, or not specified, will use the token generated when running huggingface-cli login (stored in the ~/.cache/huggingface/token file) or load it from the ~/.huggingface/token file.

> Do you want to open a PR to fix this and add a quick test in [test_utils_hf_folder.py](https://github.com/huggingface/huggingface_hub/blob/main/tests/test_utils_hf_folder.py)?

OK, will do.

GBR-613 added a commit to GBR-613/huggingface_hub that referenced this issue Sep 5, 2023
@GBR-613
Contributor Author

GBR-613 commented Sep 5, 2023

@Wauplin PR is created: #1638

@Wauplin
Contributor

Wauplin commented Sep 5, 2023

Thanks for the quick PR @GBR-613 :) I've approved it and will merge once the CI is green.

Wauplin added a commit that referenced this issue Sep 5, 2023
* Fix bug #1634 (drop finishing spaces and EOL)

* Update tests/test_utils_hf_folder.py

* Update tests/test_utils_hf_folder.py

* make style

---------

Co-authored-by: Lucain <[email protected]>
@Wauplin Wauplin closed this as completed Sep 5, 2023