
How to find the corresponding download models from Amazon? #2157

Closed
PantherYan opened this issue Dec 13, 2019 · 9 comments

@PantherYan

❓ Questions & Help

As we know, Transformers can automatically download pretrained models via the from_pretrained() function, and the pretrained BERT/RoBERTa models are stored under a path like
./cach/.pytorch/.transformer/....

But the downloaded files all have names like this:

d9fc1956a01fe24af529f239031a439661e7634e6e931eaad2393db3ae1eff03.70bec105b4158ed9a1747fea67a43f5dee97855c64d62b6ec3742f4cfdb5feda.json

These names are not readable, and it is hard to tell which file belongs to which model.

In other words, if I want to find the pretrained model 'uncased_L-12_H-768_A-12', how can I tell which file it is?

Thanks for your answer.

@LysandreJik
Member

Hi, they are named that way because it is a clean way to make sure the model on S3 is the same as the model in the cache. The name is derived from the ETag of the file hosted on S3.

If you want to save it with a given name, you can save it as such:

from transformers import BertModel

model = BertModel.from_pretrained("bert-base-cased")
model.save_pretrained("cased_L-12_H-768_A-12")

@rnyak

rnyak commented Dec 26, 2019

@LysandreJik, following up on the question above and your answer, I ran this command first:

from transformers import RobertaModel
model = RobertaModel.from_pretrained("roberta-large")
model.save_pretrained("./roberta-large-355M")

I would expect config.json, the vocab files, and all the other necessary files to be saved in the roberta-large-355M directory.

Then I ran:

python ./examples/run_glue.py   --model_type roberta   --model_name_or_path ./roberta-large-355M --task_name MRPC --do_train  --do_eval --do_lower_case --data_dir $GLUE_DIR/$TASK_NAME --max_seq_length 128 --per_gpu_train_batch_size 32 --learning_rate 2e-5 --num_train_epochs 2.0 --output_dir ./results/mrpc/

and I am getting:

OSError: Model name './roberta-large-355M' was not found in tokenizers model name list (roberta-base, roberta-large, roberta-large-mnli, distilroberta-base, roberta-base-openai-detector, roberta-large-openai-detector). We assumed './roberta-large-355M' was a path or url to a directory containing vocabulary files named ['vocab.json', 'merges.txt'] but couldn't find such vocabulary files at this path or url

I checked roberta-large-355M and it contains only config.json and pytorch_model.bin; the files ['vocab.json', 'merges.txt'] are missing.

The same issue occurs with XLNet:

../workspace/transformers/xlnet_base# ls
config.json  pytorch_model.bin

What am I missing here? Why aren't all the files downloaded properly?

Thanks.

@julien-c
Member

You also have to save the tokenizer into the same directory:

from transformers import RobertaTokenizer

tokenizer = RobertaTokenizer.from_pretrained("roberta-large")
tokenizer.save_pretrained("./roberta-large-355M")

Let me know if this solves your issue.
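Once both the model and the tokenizer are saved, a quick sanity check can confirm the directory has everything the script expects. This is a minimal sketch; the file list is an assumption based on the error message above, and the directory name is just an example:

```python
import os

def missing_files(model_dir, required=("config.json", "pytorch_model.bin",
                                       "vocab.json", "merges.txt")):
    """Return the subset of `required` that is absent from `model_dir`."""
    present = set(os.listdir(model_dir))
    return sorted(f for f in required if f not in present)

# Example: print whatever is still missing from the saved directory.
# print(missing_files("./roberta-large-355M"))
```

An empty result means the directory should load as both a model path and a tokenizer path.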

@stale

stale bot commented Feb 26, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label Feb 26, 2020
@stale stale bot closed this as completed Mar 4, 2020
@bhamz-1

bhamz-1 commented May 31, 2020

OSError: Model name 'roberta-base' was not found in tokenizers model name list (roberta-base, roberta-large, roberta-large-mnli, distilroberta-base, roberta-base-openai-detector, roberta-large-openai-detector). We assumed 'roberta-base' was a path, a model identifier, or url to a directory containing vocabulary files named ['vocab.json', 'merges.txt'] but couldn't find such vocabulary files at this path or url.
I got the error above even after saving the tokenizer, config, and model in the same directory.

@applemorshed

The problem for me: when I load the model with Wi-Fi off (no internet connection), it fails to run, but when I turn the connection back on, it runs again. How can I run it offline?
I also set the environment variables like this:

import os
from transformers import pipeline

os.environ['HF_DATASETS_OFFLINE'] = '1'
os.environ['TRANSFORMERS_OFFLINE'] = '1'
generator = pipeline('text-generation', model='EleutherAI/gpt-neo-1.3B')
generator(text, do_sample=True, min_length=5)

Result:
"Connection error, and we cannot find the requested files in the cached path."
ValueError: Connection error, and we cannot find the requested files in the cached path. Please try again or make sure your Internet connection is on.
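One thing worth checking: offline mode only works if the model was already downloaded into the local cache during an earlier online run, and the offline flags must be set before the library reads them. A minimal sketch of the flag setup (the cache location mentioned in the comment is an assumption based on the library's defaults):

```python
import os

# Set these before `import transformers` so the library sees them
# when it initializes.
os.environ["HF_DATASETS_OFFLINE"] = "1"
os.environ["TRANSFORMERS_OFFLINE"] = "1"

def offline_mode_enabled():
    """Check that both offline flags are active for the current process."""
    return (os.environ.get("HF_DATASETS_OFFLINE") == "1"
            and os.environ.get("TRANSFORMERS_OFFLINE") == "1")

# With the flags set and the model already cached (e.g. under
# ~/.cache/huggingface), pipeline(...) should read from disk
# instead of contacting the Hub.
```

If the model was never fetched while online, the ValueError above is expected, since there is nothing in the cache to fall back on.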

@applemorshed

import os
from transformers import pipeline

os.environ["HF_DATASETS_OFFLINE"] = "1"
os.environ["TRANSFORMERS_OFFLINE"] = "1"

cache_dir = '/Users/hossain/Desktop/gpt2/gpt-neo-1.3/model/'
generator = pipeline('text-generation', model='EleutherAI/gpt-neo-1.3B')

text = 'i am fine. what about you?'
generator(text, do_sample=True, min_length=5)

Result: it throws an error:
"Connection error, and we cannot find the requested files in the cached path."
ValueError: Connection error, and we cannot find the requested files in the cached path. Please try again or make sure your Internet connection is on.

@Hyprnx

Hyprnx commented Dec 19, 2022

I dug into the sentence_transformers lib to see which folder contains the files after download, and came up with this script to see where sentence_transformers keeps its files:

import os

torch_home = os.path.expanduser(
    os.getenv("TORCH_HOME",
              os.path.join(os.getenv("XDG_CACHE_HOME",
                                     "~/.cache"), 'torch')))

print(torch_home)

I hope it helps.
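For models downloaded through transformers itself (rather than sentence_transformers), the default cache lives under a Hugging Face directory instead. A similar sketch, mirroring the TORCH_HOME-style fallback above; the HF_HOME/XDG_CACHE_HOME chain here is an assumption based on the library's documented defaults, not authoritative:

```python
import os

def hf_cache_home(environ=os.environ):
    """Best-guess location of the Hugging Face cache directory,
    following the same env-var fallback pattern as the torch snippet."""
    return os.path.expanduser(
        environ.get("HF_HOME",
                    os.path.join(environ.get("XDG_CACHE_HOME",
                                             "~/.cache"), "huggingface")))

print(hf_cache_home())
```

On a default setup this prints something like ~/.cache/huggingface (expanded), which is where the hashed model files from the original question end up.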

@jieunboy0516

> (quoting @Hyprnx's script above)

Thanks, the code works on Windows too.
