Error when creating embeddings - HEAD request to S3 bucket returns 404 #1831

Closed
cibeah opened this issue Aug 27, 2020 · 30 comments
Labels
bug Something isn't working

Comments

@cibeah

cibeah commented Aug 27, 2020

Hello, I have a problem loading WordEmbeddings and FlairEmbeddings for English and German, which are located at URLs of the form "https://s3.eu-central-1.amazonaws.com/alan-nlp/resources/embeddings/xxxxxxxxx.pt".

When following Tutorial 3, trying to create these embeddings gives the following error:

OSError: HEAD request failed for url https://s3.eu-central-1.amazonaws.com/alan-nlp/resources/embeddings/glove.gensim.vectors.npy with status code 404

Making a simple HEAD request to that URL outside of Flair also returns 404, so it looks like the embeddings are no longer located there?
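You can repeat the check described above outside Flair with the helper below. It is a sketch that uses only the Python standard library. The function name head_status is hypothetical and not part of Flair.

The helper deliberately does not follow redirects. A 301 therefore shows up as 301, which matches the `if response.status_code != 200` check visible in the tracebacks later in this thread.

```python
import urllib.error
import urllib.request


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    # Returning None stops urllib from following 3xx responses; the default error
    # handler then raises HTTPError, so the original redirect status is reported.
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


def head_status(url, timeout=10.0):
    """Send a HEAD request (without following redirects) and return the HTTP status code."""
    opener = urllib.request.build_opener(_NoRedirect)
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener.open(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses (and unfollowed 3xx ones) arrive as HTTPError.
        return err.code
```

For example, head_status("https://s3.eu-central-1.amazonaws.com/alan-nlp/resources/embeddings/glove.gensim.vectors.npy") returns whatever status the server currently sends for that file.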

To Reproduce

from flair.embeddings import WordEmbeddings
glove_embedding = WordEmbeddings('glove')

I would appreciate your help,
Thank you !

@cibeah added the bug (Something isn't working) label Aug 27, 2020
@PeadarOhAodha

Seeing the same behaviour for the Flair forward and backward news embeddings ('news-forward' and 'news-backward'):

HEAD request failed for url https://s3.eu-central-1.amazonaws.com/alan-nlp/resources/embeddings-v0.4.1/big-news-forward--h2048-l1-d0.05-lr30-0.25-20/news-forward-0.4.1.pt with status code 404.

@alanakbik
Collaborator

Hm, it looks like the AWS bucket is down. I have to check what's going on!

@tjaffri

tjaffri commented Aug 28, 2020

Thanks @alanakbik... I think all of us who use this on different machines or in CI, where these downloads need to work, are blocked by this. Your help is much appreciated!!

@alanakbik
Collaborator

Quick update: Unfortunately the entire AWS account was deleted yesterday by an internal process since I no longer work at Zalando (I'm now at university). That means that most models are currently not accessible. To fix this, I have to set up another file hosting solution and do a hotfix of Flair.

If anybody knows a good way to host a large amount of large files (with high amounts of download traffic by a big community of users), please let me know!

@ramnique

If anybody knows a good way to host a large amount of large files (with high amounts of download traffic by a big community of users), please let me know!

S3 is still your best bet.

@kavyasoni

Is it resolved? We are blocked by this.

@jordn

jordn commented Aug 28, 2020

@alanakbik, is this repo still being supported by Zalando, financially or otherwise? My startup uses this package, and we could see whether we can help with the hosting.

@cariad

cariad commented Aug 28, 2020

Folks, I just nabbed the alan-nlp bucket name while it's available, to prevent any bad actors taking it. I'll happily release it back to the project when you have an AWS account. 👍

@alanakbik
Collaborator

There is currently no financial support; the project is maintained only through code contributions by the open source community and members of my group. We're now tentatively thinking of setting up a system for donations to cover costs (and maybe even to hire people to maintain the code).

@cariad

cariad commented Aug 28, 2020

If a developer on my team ran our app locally <20 hours ago, will the files from alan-nlp/* be cached locally anywhere? Is there any chance we could extract those files from their hard drive and put them into our deployment pipeline?

Thanks for any guidance!

@cariad

cariad commented Aug 28, 2020

Folks, to clarify #1831 (comment): I've reserved the bucket name but I do not have the files. The emails are flooding in, and I can't help you.

@alanakbik
Collaborator

Yes, all files are cached in the .flair folder in your home directory.
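The sketch below lists what is in that cache, for example before copying the files into a deployment pipeline as asked above. It uses only the standard library and prints every file under the folder with its size.

Two parts are assumptions:
- The default location ~/.flair is taken from the comment above.
- The function name list_cached_files is hypothetical and not a Flair API.

```python
from pathlib import Path
from typing import List, Optional, Tuple


def list_cached_files(cache_root: Optional[Path] = None) -> List[Tuple[str, float]]:
    """Return (relative path, size in MB) for every file under a Flair cache folder."""
    # Assumption: the cache lives in ~/.flair unless another root is passed in.
    root = Path(cache_root) if cache_root is not None else Path.home() / ".flair"
    if not root.is_dir():
        return []
    results = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            size_mb = path.stat().st_size / 2 ** 20
            results.append((path.relative_to(root).as_posix(), size_mb))
    return results


if __name__ == "__main__":
    for name, size in list_cached_files():
        print(f"{size:10.1f} MB  {name}")
```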

@alanakbik
Collaborator

Quick update: we are working on a fix. Will keep you posted.

@severinsimmler
Contributor

What about Zenodo as a host? You also get a DOI for each model, which would make it easier to cite models in papers.

You can download models from Zenodo, e.g. like this:

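import requests
import wget


def download(doi, filepath):
    """Resolve a Zenodo DOI and download every file of the record into filepath."""
    # doi.org redirects to the Zenodo record page; its last path segment is the record ID.
    r = requests.get(f"https://doi.org/{doi}")
    record = r.url.rstrip("/").split("/")[-1].strip()
    r = requests.get(f"https://zenodo.org/api/records/{record}")
    if not r.ok:
        raise RuntimeError(f"Unable to download model from Zenodo (status code {r.status_code}).")
    print("Downloading model from Zenodo...")
    print(f"Target directory: {filepath}")
    files = r.json()["files"]
    # Report the size of the whole record once, not once per file.
    total = sum(file["size"] for file in files) / 2 ** 20
    print(f"Total size: {total:.1f} MB")
    filenames = []
    for file in files:
        # wget.download saves the file into filepath and returns the local file name.
        filenames.append(wget.download(file["links"]["self"], filepath))
    # Return after the loop so that records with several files are downloaded completely.
    return filenames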
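The Zenodo record metadata also lists a checksum for each file in its "checksum" field. The sketch below verifies a downloaded file against that value.

It rests on a few assumptions:
- The checksum string has the form md5:<hexdigest>.
- A bare digest is treated as MD5.
- The helper name verify_checksum is hypothetical.

```python
import hashlib
from pathlib import Path


def verify_checksum(filename, checksum):
    """Return True if the file's digest matches a Zenodo-style 'algorithm:hexdigest' string."""
    # Assumption: checksums look like "md5:<hexdigest>"; a bare digest is treated as MD5.
    algorithm, _, expected = checksum.partition(":")
    if not expected:
        algorithm, expected = "md5", algorithm
    digest = hashlib.new(algorithm)
    with Path(filename).open("rb") as fh:
        # Read in 1 MiB chunks so multi-gigabyte model files need not fit in memory.
        for chunk in iter(lambda: fh.read(2 ** 20), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected.lower()
```

Run it on each downloaded file with the file["checksum"] value from the same API response. A failed match usually means an interrupted download.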

@ajaypr55

I need the en-ner-conll03-v0.4.pt file. It's stored in the .flair/ directory by default.

@RXminuS

RXminuS commented Aug 29, 2020

What are the approximate hosting costs per month or per year? We and a few others rely heavily on this project, so we can discuss a meaningful donation with our investors.

@alanakbik
Collaborator

@RXminuS that would be great and very much appreciated! I am currently looking into options for hosting and donation models!

@severinsimmler thanks for the pointer to Zenodo. Looks interesting. Do you have experience with regard to download speeds? Some of the models are pretty big and there is a good amount of traffic.

As a first fix, I've moved all models to our university server. Download speeds are slower than before and I worry that the server will have problems with the traffic (so please don't all try at the same time ;)), but at least everything should run again. Will be merged and released soon.

@abeermohamed1

at least everything should run again. Will be merged

Thank you @alanakbik, so how do I use it? I am getting the error below:

OSError                                   Traceback (most recent call last)
in ()
     58     BertEmbeddings(bert_model_or_path = data_folder),
     59     #flair fast embedding
---> 60     WordEmbeddings('ar'),
     61
     62

2 frames
/usr/local/lib/python3.6/dist-packages/flair/file_utils.py in get_from_cache(url, cache_dir)
    216     if response.status_code != 200:
    217         raise IOError(
--> 218             f"HEAD request failed for url {url} with status code {response.status_code}."
    219         )
    220

OSError: HEAD request failed for url https://s3.eu-central-1.amazonaws.com/alan-nlp/resources/embeddings-v0.4/ar-wiki-fasttext-300d-1M.vectors.npy with status code 301.

alanakbik added a commit that referenced this issue Aug 30, 2020
@alanakbik
Collaborator

The fix is merged to master and can be used by installing Flair with:

pip install --upgrade git+https://github.com/flairNLP/flair.git

I'll also push this to pip later, but first let's test if this works.

@Masum06

Masum06 commented Aug 30, 2020

Thanks a bunch, @alanakbik. My stack uses Flair 0.4.2. How do I install that specific version?

@alanakbik
Collaborator

@Masum06 I'm afraid the model download in older versions will remain broken. If updating is not possible, you could use the new version to download the models manually and then run the old version.
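The sketch below covers the manual route suggested here. After a newer Flair version has downloaded the models, you copy its cache folder to the machine or image that runs the old version.

It rests on two assumptions:
- Both versions use the same cache layout under ~/.flair. This is not verified here.
- copy_cache is a hypothetical helper, not a Flair API.

```python
import shutil
from pathlib import Path
from typing import List


def copy_cache(src_root, dst_root) -> List[str]:
    """Copy files from one Flair cache folder to another, skipping files already present.

    Returns the relative paths of the files that were copied.
    """
    src, dst = Path(src_root), Path(dst_root)
    copied = []
    for path in sorted(src.rglob("*")):
        if not path.is_file():
            continue
        relative = path.relative_to(src)
        target = dst / relative
        # Leave existing files untouched; they may already be complete downloads.
        if target.exists():
            continue
        target.parent.mkdir(parents=True, exist_ok=True)
        # copy2 also preserves file timestamps.
        shutil.copy2(path, target)
        copied.append(relative.as_posix())
    return copied
```

A call such as copy_cache(Path.home() / ".flair", "build/flair-cache") fills the hypothetical directory build/flair-cache. A Docker build or CI job can then put that directory in place as ~/.flair.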

@RXminuS

RXminuS commented Aug 30, 2020

Just wanted to say fantastic work @alanakbik. Not just for the library but keeping your cool during a situation like this. And your dedication to finding a way around so quickly. I have the deepest respect for open-source maintainers like yourself 🙇‍♂️🙇‍♂️🙇‍♂️

@VigneshBaskar

@alanakbik You are the best! Thank you very much!

@alanakbik
Collaborator

@RXminuS @VigneshBaskar Thanks a lot!

We just pushed the new version to pip so you can do a regular update with:

pip install --upgrade flair 

@cibeah
Author

cibeah commented Aug 31, 2020

Thank you so much for your quick response @alanakbik! It all works fine for us.

@cibeah closed this as completed Sep 3, 2020
@djstrong
Contributor

@alanakbik Dictionary.load("common-chars") in CharacterEmbeddings does not work.

@alanakbik
Collaborator

@djstrong can you try again? It should work now.

@djstrong
Contributor

Thank you. It works.

@truptikirve26

truptikirve26 commented Oct 7, 2020

Hello,
I am working with WordEmbeddings using GloVe and encounter the error below:
HEAD request failed for url https://s3.eu-central-1.amazonaws.com/alan-nlp/resources/embeddings/glove.gensim.vectors.npy with status code 301

I also ran the Flair upgrade command mentioned in earlier posts.

Code to reproduce the error:

from flair.embeddings import WordEmbeddings, FlairEmbeddings
word_embeddings = [
    WordEmbeddings('glove'),
]

@code-crusher

@truptikirve26 I was able to solve the issue with v0.6.1. It looks like an older version is still installed (or cached) in your IDE/venv.
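The sketch below confirms which Flair release an environment actually uses. It reads the installed version with importlib.metadata (Python 3.8+) and compares it with the release mentioned above.

Run it with the same interpreter your IDE or venv uses. A system-wide upgrade does not change a separate venv.

It rests on a few assumptions:
- The 0.6.1 threshold is taken from this comment. An earlier post-release may already contain the fix.
- The helper names are hypothetical.
- The version parsing is deliberately rough. packaging.version is the usual choice for strict comparisons.

```python
import re
from importlib import metadata
from typing import Optional, Tuple


def version_tuple(version: str) -> Tuple[int, ...]:
    """Turn '0.6.1' or '0.6.0.post1' into a tuple of its leading integer components."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # e.g. 'post1': stop at the first non-numeric component
        parts.append(int(match.group()))
        if match.group() != piece:
            break  # e.g. '0rc1': keep the 0, ignore the pre-release suffix
    return tuple(parts)


def installed_version(package: str = "flair") -> Optional[str]:
    """Return the version of `package` installed for this interpreter, or None."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def has_fixed_downloads(package: str = "flair", minimum: Tuple[int, ...] = (0, 6, 1)) -> bool:
    """True if the installed release is at least `minimum` (an assumed threshold)."""
    version = installed_version(package)
    return version is not None and version_tuple(version) >= minimum


if __name__ == "__main__":
    print("flair", installed_version(), "fixed downloads:", has_fixed_downloads())
```

The versions are compared as integer tuples rather than strings, so 0.12.2 correctly counts as newer than 0.6.1.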
