
Add Fine-Tunable Transformers to Flair #1492

Closed
2 tasks done
alanakbik opened this issue Mar 25, 2020 · 6 comments
Labels
feature A new feature

Comments

@alanakbik
Collaborator

alanakbik commented Mar 25, 2020

We currently support word embeddings from Huggingface's various transformer models (BERT, XLM, etc.), but two important features are missing: (1) we don't yet support sentence embeddings extracted directly from the transformer model using the [CLS] token, and (2) the transformers are currently not fine-tunable via Flair. This is a shame, since transformers really shine when sentence embeddings are extracted directly from a fine-tuned transformer.

So with this issue, we want to add

  • The ability to get sentence embeddings directly from transformers, by adding new DocumentEmbeddings classes
  • The ability to fine-tune all transformer word and document embeddings classes (a rough usage sketch of what this could look like follows below)
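
For illustration only, here is a minimal sketch of how such a document embeddings class might be used once it exists; the constructor arguments and the `fine_tune` flag are assumptions about the eventual API, not a final design:

```python
from flair.data import Sentence
from flair.embeddings import TransformerDocumentEmbeddings  # proposed class

# Hypothetical: a document embedding backed by a transformer, using the
# [CLS] token as the sentence representation, with gradients enabled so
# the transformer weights can be fine-tuned during downstream training.
document_embeddings = TransformerDocumentEmbeddings('bert-base-uncased', fine_tune=True)

sentence = Sentence('Transformers shine when fine-tuned .')
document_embeddings.embed(sentence)

# a single embedding for the whole sentence, taken from the [CLS] position
print(sentence.get_embedding().shape)
```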
@djstrong
Contributor

Supporting longer texts (more than 512 subtokens) would be helpful, at least for prediction. My research shows that processing paragraphs rather than sentences decreases error by 10%.

@alanakbik
Collaborator Author

Yes, good point - what is the 'standard' way of working around the 512-subtoken limitation of transformers? I guess the easiest would be to truncate the text to a max length of 512, but maybe there is a better way?

@djstrong
Contributor

djstrong commented Mar 29, 2020

I have sequence tagging in mind, so truncating in prediction mode is unacceptable. The text should be divided into splits with some overlapping context and then reconstructed.

For text classification there are some truncation strategies. However, in simple-transformers the text is divided and each part is predicted separately; the mode of the per-part predictions is then the final result.
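
To make the splitting idea concrete, here is a rough sketch; the helper name, window size, and stride are made up for illustration and are not taken from any existing library:

```python
def split_with_overlap(subtokens, max_len=512, stride=384):
    """Split a long subtoken sequence into overlapping windows.

    Each window is at most `max_len` subtokens long; consecutive windows
    advance by `stride`, so neighbouring windows share `max_len - stride`
    subtokens of context. The values here are illustrative, not tuned.
    """
    windows = []
    start = 0
    while start < len(subtokens):
        windows.append(subtokens[start:start + max_len])
        if start + max_len >= len(subtokens):
            break
        start += stride
    return windows

# When reconstructing tag predictions, each position in an overlapping region
# can take its label from the window where it lies furthest from a boundary.
```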

@alanakbik
Collaborator Author

Thanks - yes, for TransformerWordEmbeddings an overlapping-segment strategy should be doable and sounds like the best approach. For TransformerDocumentEmbeddings we need a strategy that outputs a single embedding for a text of arbitrary length, so truncation may be the way to go there.
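
For the document-embedding case, the truncation itself could simply be delegated to the Hugging Face tokenizer; a minimal sketch (model name and limit are just examples, not a design decision for Flair):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')

long_text = "A very long paragraph about something. " * 500  # well beyond 512 subtokens

# Truncate to the 512-subtoken limit; everything beyond it is dropped,
# so only the beginning of the document contributes to the embedding.
encoded = tokenizer(long_text, truncation=True, max_length=512, return_tensors='pt')
print(encoded['input_ids'].shape)  # torch.Size([1, 512])
```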

@alanakbik
Collaborator Author

Just for reference, some truncation strategies are evaluated in this paper.

@alanakbik
Collaborator Author

Fine-tuning is now part of Flair 0.5.
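
For anyone landing here later, a minimal fine-tuning sketch roughly along the lines of the 0.5 API; the dataset, model name, and hyperparameters are illustrative only, so check the release documentation for the exact interface:

```python
import torch
from flair.datasets import TREC_6
from flair.embeddings import TransformerDocumentEmbeddings
from flair.models import TextClassifier
from flair.trainers import ModelTrainer

# corpus and label dictionary (TREC_6 is just an example dataset)
corpus = TREC_6()
label_dict = corpus.make_label_dictionary()

# transformer document embeddings with fine-tuning enabled
document_embeddings = TransformerDocumentEmbeddings('bert-base-uncased', fine_tune=True)
classifier = TextClassifier(document_embeddings, label_dictionary=label_dict)

# AdamW and a small learning rate, as is typical for transformer fine-tuning
trainer = ModelTrainer(classifier, corpus, optimizer=torch.optim.AdamW)
trainer.train('resources/taggers/trec',
              learning_rate=3e-5,
              mini_batch_size=16,
              max_epochs=5)
```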
