
Unexpected warning of SlowTokenizer #29237

Closed
2 of 4 tasks
hiyouga opened this issue Feb 23, 2024 · 5 comments

hiyouga commented Feb 23, 2024

System Info

  • transformers version: 4.38.1
  • Platform: Linux-5.15.0-91-generic-x86_64-with-glibc2.35
  • Python version: 3.10.10
  • Huggingface_hub version: 0.19.4
  • Safetensors version: 0.4.1
  • Accelerate version: 0.26.1
  • Tokenizers version: 0.15.2
  • PyTorch version (GPU?): 2.1.1+cu121 (True)

Who can help?

@ArthurZucker @younesbelkada

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

import transformers
from transformers import AutoTokenizer
transformers.utils.logging.set_verbosity(transformers.logging.INFO)
transformers.utils.logging.enable_default_handler()
transformers.utils.logging.enable_explicit_format()
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf", use_fast=False)
# [INFO|tokenization_utils_base.py:2044] 2024-02-23 17:26:31,129 >> loading file tokenizer.model
# [INFO|tokenization_utils_base.py:2044] 2024-02-23 17:26:31,130 >> loading file added_tokens.json
# [INFO|tokenization_utils_base.py:2044] 2024-02-23 17:26:31,130 >> loading file special_tokens_map.json
# [INFO|tokenization_utils_base.py:2044] 2024-02-23 17:26:31,130 >> loading file tokenizer_config.json
# [INFO|tokenization_utils_base.py:2044] 2024-02-23 17:26:31,130 >> loading file tokenizer.json
tokenizer.encode("hello")
# [WARNING|tokenization_utils.py:562] 2024-02-23 17:26:41,845 >> Keyword arguments {'add_special_tokens': False} not recognized.
# [1, 22172]
tokenizer.encode("hello", add_special_tokens=False)
# [WARNING|tokenization_utils.py:562] 2024-02-23 17:26:58,903 >> Keyword arguments {'add_special_tokens': False} not recognized.
# [22172]
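For illustration, here is a minimal, self-contained sketch of the kwargs-forwarding pattern that can produce this kind of spurious warning. The `tokenize` function and its kwargs below are hypothetical stand-ins, not the actual transformers implementation: the idea is that a step checks for leftover keyword arguments before a later step (such as `encode`) has had the chance to consume `add_special_tokens`, so the warning fires even though the argument is valid.

```python
import warnings

def tokenize(text, **kwargs):
    # Consume only the kwargs this step actually understands
    # (hypothetical example kwarg).
    kwargs.pop("split_special_tokens", None)
    # Anything left over is reported -- including add_special_tokens,
    # which a later stage would have handled.
    if kwargs:
        warnings.warn(f"Keyword arguments {kwargs} not recognized.")
    return text.split()

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    tokens = tokenize("hello world", add_special_tokens=False)

print(tokens)       # ['hello', 'world']
print(len(caught))  # 1 -- the spurious warning fired anyway
```

The argument is handled correctly downstream; only the early leftover-kwargs check is wrong, which matches the observed behavior where `encode` still returns the right token ids.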

Expected behavior

The slow tokenizer should not emit these redundant warnings, matching the behavior of the fast tokenizer.
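Until the fix lands, one possible stopgap is to filter the message at the logging layer. This is a hedged workaround sketch, assuming transformers routes its warnings through the standard `logging` module under the `transformers` logger namespace; the filter drops only records containing "not recognized" and keeps everything else.

```python
import logging

class DropKwargsWarning(logging.Filter):
    def filter(self, record):
        # Return False to suppress the record, True to keep it.
        return "not recognized" not in record.getMessage()

# Attach to the transformers logger namespace so other INFO/WARNING
# messages (e.g. the "loading file ..." lines) still come through.
logging.getLogger("transformers").addFilter(DropKwargsWarning())
```

This silences the specific spurious message without lowering the overall verbosity the way `set_verbosity_error()` would.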

@ArthurZucker
Collaborator

Thanks! That is indeed an issue and should be fixed! I tracked it down but could not reproduce it 🤗 Thanks

@StevenTang1998
Contributor

Same issue here.

@hiyouga
Contributor Author

hiyouga commented Feb 25, 2024

@ArthurZucker This issue is a duplicate of #29237 and can be fixed in #29278


This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

hiyouga closed this as completed Mar 25, 2024
@ArthurZucker
Collaborator

ArthurZucker commented Mar 25, 2024

For context, #29346 fixed this, thanks @hiyouga for your initial PR
