GH-243: dataset downloader #246

alanakbik · 2018-11-26T16:08:27Z

addresses #243

DataFetcher can now download Universal Dependencies corpora for 30 languages, WikiNER corpora for 8 languages, and some CoNLL tasks.
If a corpus has only a training data file, both dev and test data are now sampled from it (Train NER for Swedish #3).
All fetch_* methods are renamed to load_* methods.
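
The dev/test sampling mentioned above could look roughly like the sketch below. This is not the code from this PR: the helper name `sample_dev_and_test`, the 10% hold-out fraction, and the fixed seed are all assumptions chosen for illustration.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def sample_dev_and_test(
    train: Sequence[T], fraction: float = 0.1, seed: int = 0
) -> Tuple[List[T], List[T], List[T]]:
    """Split dev and test sets off a training set.

    Hypothetical helper: the fraction and the seed are assumptions,
    not values taken from flair.
    """
    items = list(train)
    # Shuffle a copy so the caller's data is left untouched.
    random.Random(seed).shuffle(items)
    # Hold out the same number of items for test and for dev.
    n_held_out = max(1, round(len(items) * fraction))
    test = items[:n_held_out]
    dev = items[n_held_out:2 * n_held_out]
    remaining_train = items[2 * n_held_out:]
    return remaining_train, dev, test


sentences = [f"sentence {i}" for i in range(100)]
train, dev, test = sample_dev_and_test(sentences)
print(len(train), len(dev), len(test))
```

The three returned lists are disjoint, so no sentence used for evaluation stays in the training split.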

So now you can load a dataset like this:

from flair.data_fetcher import NLPTaskDataFetcher, NLPTask

# load one corpus
corpus = NLPTaskDataFetcher.load_corpus(NLPTask.UD_ENGLISH)
print(corpus)

# load a MultiCorpus of two UD corpora
corpus = NLPTaskDataFetcher.load_corpora([NLPTask.UD_ENGLISH, NLPTask.UD_GERMAN])
print(corpus)

You no longer need to download the UD corpus yourself. The method checks whether the corpus is already present and, if it is not, downloads it.
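
The check-then-download behaviour described above follows a common caching pattern. The sketch below illustrates it with the standard library only; `cached_download` and its parameters are hypothetical names, and the actual logic inside NLPTaskDataFetcher may differ.

```python
from pathlib import Path
from urllib.request import urlretrieve


def cached_download(url: str, cache_dir: Path) -> Path:
    """Return the local copy of `url`, downloading it only if it is missing.

    Hypothetical helper for illustration; not flair's implementation.
    """
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Name the local file after the last path segment of the URL.
    target = cache_dir / url.rsplit("/", 1)[-1]
    if not target.exists():
        # First request: fetch the file into the cache directory.
        urlretrieve(url, str(target))
    # Later requests reuse the cached file without touching the network.
    return target
```

Calling the function twice with the same URL and cache directory triggers at most one download; the second call returns the cached path immediately.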

aakbik and others added 27 commits November 26, 2018 14:51

GH-243: added dataset downloader for UD and CoNLL corpora

efc5bb7

GH-243: fixed test

8c0ad4a

GH-243: added wikiner reader

98febfd

GH-242: Add hyperopt requirement

82b4195

GH-242: Add hyperopt wrapper class.

7f91127

GH-242: Rename package + classes.

242c82b

GH-242: model trainer returns dict of values

eb5b885

GH-242: Update parameter names.

c3351b9

GH-242: Rename dropout parameter.

057d3d2

GH-242: Clean up.

d2560f1

GH-3: data fetcher samples test data from train if no test file exists

0de8c0b

GH-242: Add tests.

233c96e

GH-242: Improve logging.

53c96d8

GH-243: added WikiNER downloader for all languages

0dcc7de

GH-243: added test for dataset downloader

ce62212

GH-242: Add parameter for new optimizers

832093a

GH-243: clean up data after completing test

015b82e

Merge pull request #245 from zalandoresearch/GH-242-hyperopt

69dbad3

GH-242: Wrapper for hyperopt

GH-243: added dataset downloader for UD and CoNLL corpora

879482b

GH-243: fixed test

3ccd5ba

GH-243: added wikiner reader

56054b5

GH-3: data fetcher samples test data from train if no test file exists

dbf593f

GH-243: added WikiNER downloader for all languages

057962c

GH-243: added test for dataset downloader

bb31364

GH-243: clean up data after completing test

b288dec

GH-243: fix test

c6b0447

Merge branch 'GH-243-dataset-downloader' of github.com:zalandoresearc…

2d819e4

…h/flair into GH-243-dataset-downloader

alanakbik closed this Nov 26, 2018
