The current workaround is to use BertTokenizer.from_pretrained(bert_version, local_files_only=True), but this does not allow the same code to be used with and without Internet access.
To reproduce
Steps to reproduce the behavior:
Run
from transformers import BertTokenizer
BertTokenizer.from_pretrained("bert-large-uncased-whole-word-masking")
from an environment without Internet access but with all the required cache files pre-downloaded.
Expected behavior
Works exactly as
from transformers import BertTokenizer
BertTokenizer.from_pretrained("bert-large-uncased-whole-word-masking", local_files_only=True)
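Until the default call behaves this way offline, one way to keep a single code path is to retry with local_files_only=True when the first attempt fails. The following is a minimal sketch, not part of transformers: the helper name load_with_offline_fallback is hypothetical, and because the exception type is not named here it catches Exception broadly, which may hide unrelated errors.

```python
from typing import Any, Callable


def load_with_offline_fallback(loader: Callable[..., Any], name: str, **kwargs: Any) -> Any:
    """Call loader(name); if that fails, retry once using only cached files.

    Hypothetical helper. Catching Exception is an assumption made for this
    sketch, since the exact exception type is not stated in the report.
    """
    try:
        return loader(name, **kwargs)
    except Exception:
        # Second attempt: force local_files_only=True so the loader skips the network.
        offline_kwargs = {**kwargs, "local_files_only": True}
        return loader(name, **offline_kwargs)
```

With transformers installed, this would be called as load_with_offline_fallback(BertTokenizer.from_pretrained, "bert-large-uncased-whole-word-masking"). The cost of this design is one failed network attempt, including its timeout, on every offline run.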
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Environment info
transformers version: 4.5.0.dev0
Who can help
@LysandreJik (related to #10235 and #10067)
Information
I'm trying to run
BertTokenizer.from_pretrained("bert-large-uncased-whole-word-masking")
from an environment without Internet access. It crashes even though all the files are downloaded and cached. The uncaught exception is raised at:
transformers/src/transformers/file_utils.py, lines 1347 to 1350 in 5f1491d
When file_id == 'added_tokens_file', file_path equals https://huggingface.co/bert-large-uncased-whole-word-masking/resolve/main/added_tokens.json, which does not exist (see transformers/src/transformers/tokenization_utils_base.py, line 1653 in 1a3e0c4).
Without Internet access, this results in the line at
transformers/src/transformers/file_utils.py, line 1294 in 1a3e0c4
raising ConnectTimeout, which is caught in
transformers/src/transformers/file_utils.py, line 1313 in 1a3e0c4
and then ignored until another exception is raised in
transformers/src/transformers/tokenization_utils_base.py, line 1672 in 1a3e0c4
which is not caught anywhere.
When fetching the same file with Internet access, the code behaves differently: the line at
transformers/src/transformers/file_utils.py, line 1295 in 1a3e0c4
raises requests.exceptions.HTTPError, which is caught and handled here:
transformers/src/transformers/tokenization_utils_base.py, lines 1674 to 1677 in 1a3e0c4
The rest of the code works just fine after
resolved_vocab_files[file_id] = None
Using
BertTokenizer.from_pretrained(bert_version, local_files_only=True)
works just fine because of this condition:
transformers/src/transformers/tokenization_utils_base.py, lines 1668 to 1672 in 1a3e0c4
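The three code paths described above can be condensed into a toy model. This is a simplified sketch, not the library's code: the exception classes are local stand-ins for the requests ones, and fetch, CacheMiss, get_from_cache_sketch and resolve_optional_file are hypothetical names chosen for illustration. Returning None in explicit offline mode is also a simplification of whatever the real condition does.

```python
class HTTPError(Exception):
    """Local stand-in for requests.exceptions.HTTPError."""


class ConnectTimeout(Exception):
    """Local stand-in for requests.exceptions.ConnectTimeout."""


class CacheMiss(Exception):
    """Hypothetical error: the file is neither reachable nor cached."""


def get_from_cache_sketch(url, cache, fetch, local_files_only=False):
    """Toy download step: try the network first, then fall back to the cache."""
    if not local_files_only:
        try:
            return fetch(url)  # online: may raise HTTPError for a missing file
        except ConnectTimeout:
            pass  # offline: the timeout is swallowed and the cache is tried next
    if url in cache:
        return cache[url]
    raise CacheMiss(url)


def resolve_optional_file(url, cache, fetch, local_files_only=False):
    """Toy tokenizer step: decide what a missing optional file means."""
    try:
        return get_from_cache_sketch(url, cache, fetch, local_files_only)
    except HTTPError:
        return None  # online: the missing file is recorded as absent
    except CacheMiss:
        if local_files_only:
            return None  # explicit offline mode: the miss is tolerated
        raise  # implicit offline mode: the error propagates to the caller
```

In this model, the behavior requested in this issue would roughly correspond to tolerating the cache miss in the last branch as well when the file is optional.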