IndexError while making NER using flair example with custom dataset #1131

Closed
MohamedLotfyElrefai opened this issue Sep 19, 2019 · 0 comments · Fixed by #1135
Labels
bug Something isn't working

MohamedLotfyElrefai commented Sep 19, 2019

Describe the bug
I passed a training data file and empty files for the dev and test data sets. I get the same error on my custom data set.
Link to the data set:

https://drive.google.com/open?id=1ZiWTVuvcm5r0kRM50P8xV2trbKyjVU7-

To Reproduce

from flair.data import Corpus
from flair.datasets import ColumnCorpus

# define columns
columns = {0: 'text', 1: 'ner'}

# this is the folder in which train, test and dev files reside

# init a corpus using column format, data folder and the names of the train, dev and test files
corpus: Corpus = ColumnCorpus('', columns,
                              train_file='train.txt',
                              dev_file='test.txt',
                              test_file='test.txt')

from flair.data import Corpus
from flair.embeddings import TokenEmbeddings, WordEmbeddings, StackedEmbeddings, PooledFlairEmbeddings
from typing import List

# 1. get the corpus

# 2. what tag do we want to predict?
tag_type = 'ner'

# 3. make the tag dictionary from the corpus
tag_dictionary = corpus.make_tag_dictionary(tag_type=tag_type)

# initialize embeddings
embedding_types: List[TokenEmbeddings] = [

    # GloVe embeddings
    WordEmbeddings('glove'),

    # contextual string embeddings, forward
    PooledFlairEmbeddings('news-forward', pooling='min'),

    # contextual string embeddings, backward
    PooledFlairEmbeddings('news-backward', pooling='min'),
]

embeddings: StackedEmbeddings = StackedEmbeddings(embeddings=embedding_types)

# initialize sequence tagger
from flair.models import SequenceTagger

tagger: SequenceTagger = SequenceTagger(hidden_size=256,
                                        embeddings=embeddings,
                                        tag_dictionary=tag_dictionary,
                                        tag_type=tag_type)

# initialize trainer
from flair.trainers import ModelTrainer

trainer: ModelTrainer = ModelTrainer(tagger, corpus)

trainer.train('resources/taggers/example-ner', train_with_dev=True, max_epochs=150)

Expected behavior
Training should complete without errors. Instead it fails with:
IndexError: string index out of range
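
Because the dev and test files in this report are empty, it may help to inspect the column-format files before building the corpus. The helper below is a hypothetical sketch and not part of flair: `check_column_file` and its `min_columns` parameter are names invented for illustration. It assumes whitespace-separated columns with blank lines between sentences. Whether empty or malformed lines are what actually triggers the IndexError is an assumption, not something confirmed here.

```python
from pathlib import Path
from typing import List, Tuple


def check_column_file(path: str, min_columns: int = 2) -> Tuple[int, List[int]]:
    """Scan a column-format file (hypothetical helper, not a flair API).

    Returns a pair: the number of well-formed token lines, and the
    1-based line numbers of non-blank lines with fewer than
    `min_columns` whitespace-separated columns.
    """
    valid_token_lines = 0
    short_lines: List[int] = []
    text = Path(path).read_text(encoding="utf-8")
    for line_number, line in enumerate(text.splitlines(), start=1):
        # Blank lines are assumed to be sentence separators; skip them.
        if not line.strip():
            continue
        if len(line.split()) < min_columns:
            short_lines.append(line_number)
        else:
            valid_token_lines += 1
    return valid_token_lines, short_lines
```

A result of `(0, [])` means the file holds no tokens at all, as would be the case for the empty dev and test files described above. A non-empty second element lists lines that are missing a tag column.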

Environment (please complete the following information):

  • OS: Linux 18.04; Version: flair-0.4.3; CUDA 10 (GPU)
  • OS: Windows 10; Version: flair-0.4.3; CPU