
How to find the corresponding download models from Amazon? #2157

Closed
PantherYan opened this issue Dec 13, 2019 · 9 comments

@PantherYan

❓ Questions & Help

As we know, Transformers can automatically download pretrained models via the from_pretrained() function, and the pretrained BERT/RoBERTa models are stored under a path like
./cach/.pytorch/.transformer/....

But the downloaded files all have names like this:

d9fc1956a01fe24af529f239031a439661e7634e6e931eaad2393db3ae1eff03.70bec105b4158ed9a1747fea67a43f5dee97855c64d62b6ec3742f4cfdb5feda.json

These names are not readable, and it is hard to tell which file belongs to which model.

In other words, if I want to find the pretrained model 'uncased_L-12_H-768_A-12', how can I tell which file it is?

Thanks for your answer.

@LysandreJik
Member

Hi, they are named that way because it is a clean way to make sure the model on S3 is the same as the model in the cache. The name is derived from the ETag of the file hosted on S3.

If you want to save it with a given name, you can save it as such:

from transformers import BertModel

model = BertModel.from_pretrained("bert-base-cased")
model.save_pretrained("cased_L-12_H-768_A-12")

@rnyak

rnyak commented Dec 26, 2019

@LysandreJik, following up on the question above and your answer, I ran this command first:

from transformers import RobertaModel
model = RobertaModel.from_pretrained("roberta-large")
model.save_pretrained("./roberta-large-355M")

I would expect config.json, the vocab files, and all the other necessary files to be saved in the roberta-large-355M directory.

Then I ran:

python ./examples/run_glue.py   --model_type roberta   --model_name_or_path ./roberta-large-355M --task_name MRPC --do_train  --do_eval --do_lower_case --data_dir $GLUE_DIR/$TASK_NAME --max_seq_length 128 --per_gpu_train_batch_size 32 --learning_rate 2e-5 --num_train_epochs 2.0 --output_dir ./results/mrpc/

and I am getting:

OSError: Model name './roberta-large-355M' was not found in tokenizers model name list (roberta-base, roberta-large, roberta-large-mnli, distilroberta-base, roberta-base-openai-detector, roberta-large-openai-detector). We assumed './roberta-large-355M' was a path or url to a directory containing vocabulary files named ['vocab.json', 'merges.txt'] but couldn't find such vocabulary files at this path or url

I checked roberta-large-355M and it contains only config.json and pytorch_model.bin; the files ['vocab.json', 'merges.txt'] are missing.

The same issue occurs with XLNet:

../workspace/transformers/xlnet_base# ls
config.json  pytorch_model.bin

What am I missing here? Why aren't all the files downloaded properly?

Thanks.

@julien-c
Member

You also have to save the tokenizer into the same directory:

from transformers import RobertaTokenizer

tokenizer = RobertaTokenizer.from_pretrained("roberta-large")
tokenizer.save_pretrained("./roberta-large-355M")

Let me know if this solves your issue.
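Once both the model and the tokenizer are saved, a quick sanity check can confirm the directory has everything the script expects. This is a minimal sketch; the file list is an assumption based on the error message above, and the directory name is just an example:

```python
import os

def missing_files(model_dir, required=("config.json", "pytorch_model.bin",
                                       "vocab.json", "merges.txt")):
    """Return the subset of `required` that is absent from `model_dir`."""
    present = set(os.listdir(model_dir))
    return sorted(f for f in required if f not in present)

# Example: print whatever is still missing from the saved directory.
# print(missing_files("./roberta-large-355M"))
```

An empty result means the directory should load as both a model path and a tokenizer path.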

@stale

stale bot commented Feb 26, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label Feb 26, 2020
@stale stale bot closed this as completed Mar 4, 2020
@bhamz-1

bhamz-1 commented May 31, 2020

OSError: Model name 'roberta-base' was not found in tokenizers model name list (roberta-base, roberta-large, roberta-large-mnli, distilroberta-base, roberta-base-openai-detector, roberta-large-openai-detector). We assumed 'roberta-base' was a path, a model identifier, or url to a directory containing vocabulary files named ['vocab.json', 'merges.txt'] but couldn't find such vocabulary files at this path or url.
I got the error above even after saving the tokenizer, config, and model in the same directory.

@applemorshed

The problem for me: when I load the model with Wi-Fi off (no internet connection), it fails to run, but when I turn the connection back on, it runs again. How can I run it offline?
I also set the environment variables like this:

import os
from transformers import pipeline

os.environ['HF_DATASETS_OFFLINE'] = '1'
os.environ['TRANSFORMERS_OFFLINE'] = '1'
generator = pipeline('text-generation', model='EleutherAI/gpt-neo-1.3B')
generator(text, do_sample=True, min_length=5)

Result:
"Connection error, and we cannot find the requested files in the cached path."
ValueError: Connection error, and we cannot find the requested files in the cached path. Please try again or make sure your Internet connection is on.
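One thing worth checking: offline mode only works if the model was already downloaded into the local cache during an earlier online run, and the offline flags must be set before the library reads them. A minimal sketch of the flag setup (the cache location mentioned in the comment is an assumption based on the library's defaults):

```python
import os

# Set these before `import transformers` so the library sees them
# when it initializes.
os.environ["HF_DATASETS_OFFLINE"] = "1"
os.environ["TRANSFORMERS_OFFLINE"] = "1"

def offline_mode_enabled():
    """Check that both offline flags are active for the current process."""
    return (os.environ.get("HF_DATASETS_OFFLINE") == "1"
            and os.environ.get("TRANSFORMERS_OFFLINE") == "1")

# With the flags set and the model already cached (e.g. under
# ~/.cache/huggingface), pipeline(...) should read from disk
# instead of contacting the Hub.
```

If the model was never fetched while online, the ValueError above is expected, since there is nothing in the cache to fall back on.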

@applemorshed

import os
from transformers import pipeline

os.environ["HF_DATASETS_OFFLINE"] = "1"
os.environ["TRANSFORMERS_OFFLINE"] = "1"

cache_dir = '/Users/hossain/Desktop/gpt2/gpt-neo-1.3/model/'
generator = pipeline('text-generation', model='EleutherAI/gpt-neo-1.3B')

text = 'i am fine. what about you?'
generator(text, do_sample=True, min_length=5)

Result: it throws an error:
"Connection error, and we cannot find the requested files in the cached path."
ValueError: Connection error, and we cannot find the requested files in the cached path. Please try again or make sure your Internet connection is on.

@Hyprnx

Hyprnx commented Dec 19, 2022

I dug into the sentence_transformers lib to see which folder contains the files after download, and came up with this script to see where sentence_transformers keeps its files:

import os

torch_home = os.path.expanduser(
    os.getenv("TORCH_HOME",
              os.path.join(os.getenv("XDG_CACHE_HOME",
                                     "~/.cache"), 'torch')))

print(torch_home)

I hope it helps.
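For models downloaded through transformers itself (rather than sentence_transformers), the default cache lives under a Hugging Face directory instead. A similar sketch, mirroring the TORCH_HOME-style fallback above; the HF_HOME/XDG_CACHE_HOME chain here is an assumption based on the library's documented defaults, not authoritative:

```python
import os

def hf_cache_home(environ=os.environ):
    """Best-guess location of the Hugging Face cache directory,
    following the same env-var fallback pattern as the torch snippet."""
    return os.path.expanduser(
        environ.get("HF_HOME",
                    os.path.join(environ.get("XDG_CACHE_HOME",
                                             "~/.cache"), "huggingface")))

print(hf_cache_home())
```

On a default setup this prints something like ~/.cache/huggingface (expanded), which is where the hashed model files from the original question end up.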

@jieunboy0516

> (quoting @Hyprnx's script above)

Thanks, the code works on Windows too.
