Format of input gold_label_dictionary for dependency parser #2575

Closed
FredericBlum opened this issue Dec 28, 2021 · 7 comments · Fixed by #2579
Labels
question Further information is requested

Comments


FredericBlum commented Dec 28, 2021

Hello,
I am currently trying to train the DependencyParser on a corpus in CoNLL-U format. Training runs smoothly until it reaches the evaluation function, where I receive the following error:
TypeError: unsupported format string passed to Tensor.__format__

This happens both when I leave gold_label_dictionary empty (it is marked as "optional" in the class) and when I pass it a label dictionary. What input does the parser need in order to run?

I am also a little unsure about the format of the input corpus. Did I understand correctly that the parser only takes the token and the deprel feature as input, leaving aside upos and head?

Looking forward to your help and exploring more about the dependency parser, thanks for implementing it!

@alanakbik
Collaborator

@Tarotis can you post your training script?

@FredericBlum
Author

from flair.embeddings import StackedEmbeddings, FlairEmbeddings
from flair.models import DependencyParser
from flair.trainers import ModelTrainer

# conllu_to_flair (helper not shown here) creates a ColumnCorpus with
# 0:form, 1:upos, 2:head, 3:deprel, filtering out multi-word tokens
corpus, gold_dict = conllu_to_flair('converted.conllu')

label_type = 'deprel'
dependency_dictionary = corpus.make_label_dictionary(label_type=label_type)
flair_embedding_forward = FlairEmbeddings('models/resources/embeddings/sk_forward/best-lm.pt')
flair_embedding_backward = FlairEmbeddings('models/resources/embeddings/sk_backward/best-lm.pt')
embeddings = StackedEmbeddings(embeddings=[flair_embedding_forward, flair_embedding_backward])

tagger = DependencyParser(lstm_hidden_size=512,
                          token_embeddings=embeddings,
                          relations_dictionary=dependency_dictionary,
                          tag_type=label_type)

trainer = ModelTrainer(tagger, corpus)

trainer.train('models/resources/taggers/example-dependency',
              use_final_model_for_eval=True,
              learning_rate=0.1,
              mini_batch_size=8,
              max_epochs=20)
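
For readers wondering what the column extraction described in the script's comment might look like: the helper itself is not shown in this thread, so the following is only a minimal sketch under assumptions. The function name conllu_to_columns and the sample sentence are made up for illustration. It relies on the standard CoNLL-U layout (ten tab-separated fields: ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC), in which multi-word tokens carry a range ID such as "2-3".

```python
def conllu_to_columns(conllu_text):
    """Return sentences as lists of (form, upos, head, deprel) tuples.

    Hypothetical helper for illustration; not the helper used in this thread.
    """
    sentences, current = [], []
    for line in conllu_text.splitlines():
        if not line.strip():
            # A blank line ends the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            # Sentence-level comments such as "# text = ..." are skipped.
            continue
        fields = line.split("\t")
        token_id = fields[0]
        # Skip multi-word token ranges ("2-3") and empty nodes ("1.1").
        if "-" in token_id or "." in token_id:
            continue
        # CoNLL-U columns (0-based): 1=FORM, 3=UPOS, 6=HEAD, 7=DEPREL.
        current.append((fields[1], fields[3], fields[6], fields[7]))
    if current:
        sentences.append(current)
    return sentences


# Made-up sample sentence containing one multi-word token ("al" = "a" + "el").
sample = (
    "# text = Vamos al cine\n"
    "1\tVamos\tir\tVERB\t_\t_\t0\troot\t_\t_\n"
    "2-3\tal\t_\t_\t_\t_\t_\t_\t_\t_\n"
    "2\ta\ta\tADP\t_\t_\t4\tcase\t_\t_\n"
    "3\tel\tel\tDET\t_\t_\t4\tdet\t_\t_\n"
    "4\tcine\tcine\tNOUN\t_\t_\t1\tobl\t_\t_\n"
    "\n"
)

rows = conllu_to_columns(sample)
```

Each tuple could then be written out as one line of a four-column file, which matches the 0:form, 1:upos, 2:head, 3:deprel layout mentioned in the comment.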

Some outputs from training:

2021-12-28 18:24:30,040 Corpus contains the labels: upos (#3716), head (#3716), deprel (#3716)
2021-12-28 18:59:06,761 Created (for label 'deprel') Dictionary with 31 tags: <unk>, nsubj, cop, root, punct, case, obj, aux:val, advmod, aux, nmod, amod, cc, obl, conj, xcomp, det, advcl, Lfcl, compound, nummod, x, ccomp, appos, iobj, discourse, vocative, acl, flat, marker, parataxis

Detailed error:

gold_label_dictionary=gold_label_dictionary_for_eval,
  File "anonymized/.environments/nlp/lib/python3.6/site-packages/flair/models/dependency_parser_model.py", line 334, in evaluate
    f"\nUAS : {parsing_metric.get_uas():.4f} - LAS : {parsing_metric.get_las():.4f}"
  File "anonymized/.environments/nlp/lib/python3.6/site-packages/torch/_tensor.py", line 572, in __format__
    return object.__format__(self, format_spec)
TypeError: unsupported format string passed to Tensor.__format__
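
The last frame of the traceback shows the value falling through to object.__format__, which rejects any non-empty format spec such as .4f. The mechanism can be reproduced without torch. The sketch below uses a made-up stand-in class (FakeTensor) and a hypothetical helper (to_scalar); neither is part of torch or Flair, and it is an assumption here that the metric value exposes an item() method the way a one-element tensor does.

```python
class FakeTensor:
    """Hypothetical stand-in for a one-element tensor; not a torch class."""

    def __init__(self, value):
        self._value = value

    def item(self):
        # Return the wrapped plain Python number.
        return self._value

    def __format__(self, format_spec):
        # Same fallback as the last frame of the traceback:
        # object.__format__ raises TypeError for a non-empty format spec.
        return object.__format__(self, format_spec)


def to_scalar(value):
    """Return a plain float, unwrapping objects that expose .item()."""
    if hasattr(value, "item"):
        return float(value.item())
    return float(value)


uas = FakeTensor(0.87654)

try:
    message = f"UAS : {uas:.4f}"
except TypeError as err:
    # The failure mode reported in this issue.
    message = str(err)

# Converting to a plain float first makes the format spec valid.
fixed = f"UAS : {to_scalar(uas):.4f}"
```

Converting the metric to a plain float before it reaches the f-string is one way to keep the log line, rather than removing it.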

@alanakbik
Collaborator

Can you try using this corpus instead:

corpus = UD_ENGLISH()

dictionary = corpus.make_label_dictionary("dependency")

Does it work then?

@FredericBlum
Author

No, I receive the same error as with my own data.

@alanakbik
Collaborator

I just tested this script on the current master branch and it runs:

from flair.datasets import UD_ENGLISH
from flair.embeddings import StackedEmbeddings, FlairEmbeddings
from flair.models import DependencyParser
from flair.trainers import ModelTrainer

corpus = UD_ENGLISH()

dependency_dictionary = corpus.make_label_dictionary("dependency")

embeddings = StackedEmbeddings(embeddings=[FlairEmbeddings('news-forward-fast'),
                                           FlairEmbeddings('news-backward-fast')])

tagger = DependencyParser(lstm_hidden_size=512,
                          token_embeddings=embeddings,
                          relations_dictionary=dependency_dictionary,
                          tag_type="dependency")

trainer = ModelTrainer(tagger, corpus)

trainer.train('models/resources/taggers/example-dependency',
              use_final_model_for_eval=True,
              learning_rate=0.1,
              mini_batch_size=8,
              max_epochs=20,
              )

@alanakbik
Collaborator

Ah wait, I get this error during the evaluation. I'll check.

@FredericBlum
Author

First I reinstalled, but neither the old nor the new script worked.
Then I commented out the following two lines (334 and 335):

            f"\nUAS : {parsing_metric.get_uas():.4f} - LAS : {parsing_metric.get_las():.4f}"
            f"\neval loss rel : {eval_loss_rel:.4f} - eval loss arc : {eval_loss_arc:.4f}"

Now all the models run smoothly and the predictions work as well. I still think there could be a bug within those functions, but I wouldn't know why it appears only on my side.
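
For context on what the two commented-out lines report: UAS (unlabeled attachment score) is the share of tokens whose predicted head is correct, and LAS (labeled attachment score) is the share whose head and relation label are both correct. The sketch below computes both as plain Python floats, so the .4f format spec applies cleanly. The function name attachment_scores and the sample data are made up for illustration; this is not Flair's implementation.

```python
def attachment_scores(gold, predicted):
    """Return (uas, las) as plain Python floats.

    gold and predicted are equal-length lists of (head, deprel) pairs.
    Hypothetical helper for illustration only.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        return 0.0, 0.0
    # UAS counts tokens with the right head; LAS also requires the right label.
    correct_heads = sum(g[0] == p[0] for g, p in zip(gold, predicted))
    correct_both = sum(g == p for g, p in zip(gold, predicted))
    return correct_heads / len(gold), correct_both / len(gold)


# Made-up four-token sentence: one wrong label, one wrong head.
gold = [(2, "nsubj"), (0, "root"), (2, "obj"), (2, "punct")]
pred = [(2, "nsubj"), (0, "root"), (2, "obl"), (3, "punct")]

uas, las = attachment_scores(gold, pred)
report = f"UAS : {uas:.4f} - LAS : {las:.4f}"
```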
