
ValueError: Tokenizer class T5Tokenizer does not exist or is not currently imported. #9250

Closed

nsankar opened this issue Dec 22, 2020 · 10 comments

@nsankar commented Dec 22, 2020

@mfuntowicz

Environment info

  • transformers version: 4.2.0.dev0 (latest at the time)
  • Platform: Colab
  • Python version: Python 3.6.9
  • PyTorch version (GPU?): torch==1.7.0+cu101
  • Tensorflow version (GPU?):
  • Using GPU in script?: No
  • Using distributed or parallel set-up in script?: No

Who can help

@mfuntowicz

Information

The following code, featured in the latest HF newsletter, seems to have issues when I try it. I get the tokenizer error under both fast and slow conditions (use_fast=True and use_fast=False).

The problem arises when using:

  • the official example scripts (details below):
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("mrm8488/mT5-small-finetuned-tydiqa-for-xqa", use_fast=False)

model = AutoModelForSeq2SeqLM.from_pretrained("mrm8488/mT5-small-finetuned-tydiqa-for-xqa")

context = "HuggingFace won the best Demo paper at EMNLP2020."
question = "What won HuggingFace?"
input_text = 'question: %s context: %s' % (question, context)
features = tokenizer([input_text], return_tensors='pt')
output = model.generate(**features)
tokenizer.decode(output[0])

To reproduce

Steps to reproduce the behavior:

  1. Run the above code on Google Colab

Error reported:

ValueError                                Traceback (most recent call last)
<ipython-input> in <module>()
     10 from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
     11
---> 12 tokenizer = AutoTokenizer.from_pretrained("mrm8488/mT5-small-finetuned-tydiqa-for-xqa", use_fast=False)
     13
     14 model = AutoModelForSeq2SeqLM.from_pretrained("mrm8488/mT5-small-finetuned-tydiqa-for-xqa")

/usr/local/lib/python3.6/dist-packages/transformers/models/auto/tokenization_auto.py in from_pretrained(cls, pretrained_model_name_or_path, *inputs, **kwargs)
    358         if tokenizer_class is None:
    359             raise ValueError(
--> 360                 "Tokenizer class {} does not exist or is not currently imported.".format(tokenizer_class_candidate)
    361             )
    362         return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)

ValueError: Tokenizer class T5Tokenizer does not exist or is not currently imported.

@patrickvonplaten (Contributor)

Hey @nsankar,

I cannot reproduce the above error concerning the tokenizer; it loads correctly on my machine.
However, it seems the model weights are not 100% correct.

@mrm8488 when I load the model via:

model = AutoModelForSeq2SeqLM.from_pretrained("mrm8488/mT5-small-finetuned-tydiqa-for-xqa")

I get the following warning:

2020-12-22 11:59:05.111580: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory
2020-12-22 11:59:05.111618: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
Some weights of the model checkpoint at mrm8488/mT5-small-finetuned-tydiqa-for-xqa were not used when initializing T5ForConditionalGeneration: ['encoder.block.0.layer.1.DenseReluDense.wi.weight', 'encoder.block.1.layer.1.DenseReluDense.wi.weight', 'encoder.block.2.layer.1.DenseReluDense.wi.weight', 'encoder.block.3.layer.1.DenseReluDense.wi.weight', 'encoder.block.4.layer.1.DenseReluDense.wi.weight', 'encoder.block.5.layer.1.DenseReluDense.wi.weight', 'encoder.block.6.layer.1.DenseReluDense.wi.weight', 'encoder.block.7.layer.1.DenseReluDense.wi.weight', 'decoder.block.0.layer.1.EncDecAttention.relative_attention_bias.weight', 'decoder.block.0.layer.2.DenseReluDense.wi.weight', 'decoder.block.1.layer.2.DenseReluDense.wi.weight', 'decoder.block.2.layer.2.DenseReluDense.wi.weight', 'decoder.block.3.layer.2.DenseReluDense.wi.weight', 'decoder.block.4.layer.2.DenseReluDense.wi.weight', 'decoder.block.5.layer.2.DenseReluDense.wi.weight', 'decoder.block.6.layer.2.DenseReluDense.wi.weight', 'decoder.block.7.layer.2.DenseReluDense.wi.weight']
- This IS expected if you are initializing T5ForConditionalGeneration from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing T5ForConditionalGeneration from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of T5ForConditionalGeneration were not initialized from the model checkpoint at mrm8488/mT5-small-finetuned-tydiqa-for-xqa and are newly initialized: ['encoder.block.0.layer.1.DenseReluDense.wi_0.weight', 'encoder.block.0.layer.1.DenseReluDense.wi_1.weight', 'encoder.block.1.layer.1.DenseReluDense.wi_0.weight', 'encoder.block.1.layer.1.DenseReluDense.wi_1.weight', 'encoder.block.2.layer.1.DenseReluDense.wi_0.weight', 'encoder.block.2.layer.1.DenseReluDense.wi_1.weight', 'encoder.block.3.layer.1.DenseReluDense.wi_0.weight', 'encoder.block.3.layer.1.DenseReluDense.wi_1.weight', 'encoder.block.4.layer.1.DenseReluDense.wi_0.weight', 'encoder.block.4.layer.1.DenseReluDense.wi_1.weight', 'encoder.block.5.layer.1.DenseReluDense.wi_0.weight', 'encoder.block.5.layer.1.DenseReluDense.wi_1.weight', 'encoder.block.6.layer.1.DenseReluDense.wi_0.weight', 'encoder.block.6.layer.1.DenseReluDense.wi_1.weight', 'encoder.block.7.layer.1.DenseReluDense.wi_0.weight', 'encoder.block.7.layer.1.DenseReluDense.wi_1.weight', 'decoder.block.0.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.0.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.1.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.1.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.2.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.2.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.3.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.3.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.4.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.4.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.5.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.5.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.6.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.6.layer.2.DenseReluDense.wi_1.weight', 'decoder.block.7.layer.2.DenseReluDense.wi_0.weight', 'decoder.block.7.layer.2.DenseReluDense.wi_1.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

-> I think the weights uploaded here correspond to the "old" T5 version. It would be awesome if you could check the weights :-)
Also, in the config (https://huggingface.co/mrm8488/mT5-small-finetuned-tydiqa-for-xqa/blob/main/config.json), the architecture is set to "T5ForConditionalGeneration" and the model type to "t5", but I think they should be "MT5ForConditionalGeneration" and "mt5" :-)
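
For concreteness, a minimal sketch of the suggested fix, assuming the checkpoint is otherwise fine (MT5ForConditionalGeneration and T5Tokenizer are real transformers classes; that they resolve this particular checkpoint is an assumption):

# Sketch (assumption, not a verified fix): load via the mT5-specific classes
# instead of plain T5, matching the proposed config change above.
from transformers import MT5ForConditionalGeneration, T5Tokenizer

model = MT5ForConditionalGeneration.from_pretrained("mrm8488/mT5-small-finetuned-tydiqa-for-xqa")
tokenizer = T5Tokenizer.from_pretrained("mrm8488/mT5-small-finetuned-tydiqa-for-xqa")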

@mrm8488 (Contributor) commented Dec 22, 2020

Thanks @patrickvonplaten. I will check it out ASAP.

@pommedeterresautee (Contributor) commented Dec 25, 2020

It seems to happen with other models:

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
tokenizer = AutoTokenizer.from_pretrained("moussaKam/mbarthez")

Traceback (most recent call last):
  File "/home/user/.local/share/virtualenvs/project/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 3418, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-4-028660e65504>", line 3, in <module>
    tokenizer = AutoTokenizer.from_pretrained("moussaKam/mbarthez")
  File "/home/user/.local/share/virtualenvs/project/lib/python3.8/site-packages/transformers/models/auto/tokenization_auto.py", line 359, in from_pretrained
    raise ValueError(
ValueError: Tokenizer class BarthezTokenizer does not exist or is not currently imported.

And:

(project) user@ubuntu:/mnt/workspace/project$ pip list | grep transformers
transformers             4.1.1
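
My guess (an assumption, not confirmed in the thread) is that this is the same class-resolution failure: either the installed 4.1.1 release predates BarthezTokenizer, or sentencepiece is missing. A quick sketch to check whether the running build actually exposes the class named in the error:

# Sketch: check whether the installed transformers build exposes the
# tokenizer class that the error message names.
import transformers

print(transformers.__version__)
print(hasattr(transformers, "BarthezTokenizer"))  # False when the class cannot be resolved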

@github-actions bot commented Mar 6, 2021

This issue has been automatically marked as stale and closed because it has not had recent activity. Thank you for your contributions.

If you think this still needs to be addressed, please comment on this thread.

@simon-ging

I had a similar problem, ValueError: Tokenizer class M2M100Tokenizer does not exist or is not currently imported., and solved it by running pip install sentencepiece.

It seems that when the sentencepiece package is missing, AutoTokenizer.from_pretrained silently fails to load the tokenizer and then crashes later.
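
A minimal sketch (not from the original comment) of checking for this failure mode up front; the model id below is just an example of a sentencepiece-based checkpoint:

# Sketch: fail fast if sentencepiece is missing, since its absence can
# otherwise surface later as the misleading ValueError above.
import importlib.util

if importlib.util.find_spec("sentencepiece") is None:
    raise RuntimeError("sentencepiece is not installed; run `pip install sentencepiece`")

from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("facebook/m2m100_418M", use_fast=False)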

@johnpaulbin

> I had a similar problem, ValueError: Tokenizer class M2M100Tokenizer does not exist or is not currently imported., and solved it by running pip install sentencepiece.
>
> It seems that when the sentencepiece package is missing, AutoTokenizer.from_pretrained silently fails to load the tokenizer and then crashes later.

This works fabulously with DeBERTa models as well; it seems the error isn't very descriptive.

@patrickvonplaten (Contributor)

I think on current master a better error message is given when from_pretrained(...) is called on a dummy object. cc @sgugger :-)

@480284856 commented Aug 19, 2022

> I had a similar problem, ValueError: Tokenizer class M2M100Tokenizer does not exist or is not currently imported., and solved it by running pip install sentencepiece.
>
> It seems that when the sentencepiece package is missing, AutoTokenizer.from_pretrained silently fails to load the tokenizer and then crashes later.

This doesn't work for me, though. :-(

tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom-560m")

ValueError: Tokenizer class BloomTokenizerFast does not exist or is not currently imported.

@480284856

> I had a similar problem, ValueError: Tokenizer class M2M100Tokenizer does not exist or is not currently imported., and solved it by running pip install sentencepiece.
>
> It seems that when the sentencepiece package is missing, AutoTokenizer.from_pretrained silently fails to load the tokenizer and then crashes later.
>
> This doesn't work for me, though. :-(
>
> tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom-560m")
>
> ValueError: Tokenizer class BloomTokenizerFast does not exist or is not currently imported.

Well, the newest version of transformers works for me.
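
For the Bloom variant of the error, checking the installed version first seems sensible; a sketch, assuming (to my knowledge) that Bloom support landed in transformers v4.20:

# Sketch: BloomTokenizerFast only exists in sufficiently new releases, so an
# older install raises the "does not exist" ValueError.
import transformers
from packaging import version  # packaging is already a transformers dependency

print(transformers.__version__)
if version.parse(transformers.__version__) < version.parse("4.20.0"):
    print("Upgrade first: pip install -U transformers")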

@MoritzLaurer commented Feb 11, 2023

I'm getting the same error with transformers==4.26 when trying to load ernie-m-base with:

MODEL_NAME = "PaddlePaddle/ernie-m-base"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, use_fast=True, model_max_length=max_length)  # model_max_length=512
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, label2id=label2id, id2label=id2label).to(device)

Traceback (most recent call last):
  File "/gpfs/home5/laurerm/nli-scratch/nli_training.py", line 41, in <module>
    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, use_fast=True, model_max_length=max_length)  # model_max_length=512
  File "/home/laurerm/.local/lib/python3.9/site-packages/transformers/models/auto/tokenization_auto.py", line 655, in from_pretrained
    raise ValueError(
ValueError: Tokenizer class ErnieMTokenizer does not exist or is not currently imported.

The exact same code worked two days ago with XLM-V. I've made sure that sentencepiece is installed.

Edit: Ah, I think the error comes up because ernie-m is on the Hub but not yet merged into transformers master, #21349 (?)
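
If that diagnosis is right, a hedged workaround (an assumption, not a confirmed fix) is installing transformers from source so that classes merged on main but not yet released become importable:

# Sketch: after installing from source, e.g.
#   pip install git+https://github.com/huggingface/transformers
# the tokenizer class should resolve normally:
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("PaddlePaddle/ernie-m-base", use_fast=True)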
