GH-243: dataset downloader #246

Closed
wants to merge 27 commits into from

alanakbik
Collaborator

addresses #243

  • DataFetcher can now download universal dependencies corpora for 30 languages, WikiNER corpora for 8 languages and some CoNLL tasks.
  • If only a training data file is available, both dev and test data are now sampled from it (Train NER for Swedish #3)
  • All fetch_* methods are renamed to load_* methods
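
To illustrate the second bullet, here is a minimal sketch of sampling dev and test splits out of a single training set. It is not the code from this PR: the function name `sample_dev_and_test`, the 10% split fractions and the fixed seed are all assumptions made for the example.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def sample_dev_and_test(
    items: Sequence[T],
    dev_fraction: float = 0.1,   # assumed value, not taken from the PR
    test_fraction: float = 0.1,  # assumed value, not taken from the PR
    seed: int = 0,
) -> Tuple[List[T], List[T], List[T]]:
    """Split items into (train, dev, test) by sampling without replacement."""
    if dev_fraction < 0 or test_fraction < 0 or dev_fraction + test_fraction >= 1:
        raise ValueError("fractions must be non-negative and sum to less than 1")

    # Shuffle indices with a local RNG so the split is reproducible.
    indices = list(range(len(items)))
    random.Random(seed).shuffle(indices)

    n_dev = int(len(items) * dev_fraction)
    n_test = int(len(items) * test_fraction)
    dev_idx = set(indices[:n_dev])
    test_idx = set(indices[n_dev:n_dev + n_test])

    # Keep the original order inside each split.
    train = [x for i, x in enumerate(items) if i not in dev_idx and i not in test_idx]
    dev = [x for i, x in enumerate(items) if i in dev_idx]
    test = [x for i, x in enumerate(items) if i in test_idx]
    return train, dev, test
```

With a list of sentences as `items`, the three returned lists are disjoint and together cover the original training data.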

So now you can load a dataset like this:

from flair.data_fetcher import NLPTaskDataFetcher, NLPTask

# load one corpus
corpus = NLPTaskDataFetcher.load_corpus(NLPTask.UD_ENGLISH)
print(corpus)

# load a MultiCorpus of two UD corpora
corpus = NLPTaskDataFetcher.load_corpora([NLPTask.UD_ENGLISH, NLPTask.UD_GERMAN])
print(corpus)

You no longer need to download the UD corpus yourself. The method checks whether the corpus is already there and downloads it if not.
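
The check-then-download behaviour can be sketched roughly as below. This is an illustration only, not the DataFetcher implementation: `ensure_file`, `_http_fetch`, the cache directory layout and the injectable `fetch` argument are hypothetical names chosen for the example.

```python
import urllib.request
from pathlib import Path
from typing import Callable


def _http_fetch(url: str, target: Path) -> None:
    """Default downloader: save the resource at url to target."""
    urllib.request.urlretrieve(url, str(target))


def ensure_file(
    url: str,
    cache_dir: Path,
    fetch: Callable[[str, Path], None] = _http_fetch,
) -> Path:
    """Return the local path for url, downloading it only if it is missing."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Use the last URL path segment as the local file name.
    target = cache_dir / url.rsplit("/", 1)[-1]
    if not target.exists():
        fetch(url, target)
    return target
```

A second call with the same URL and cache directory finds the file on disk and skips the download. Passing the downloader in as `fetch` also makes the logic testable without network access.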

alanakbik closed this on Nov 26, 2018