
NER task using Flair BertEmbeddings VS HuggingFace scripts #1508

Closed
ChessMateK opened this issue Apr 3, 2020 · 3 comments
Labels: question (Further information is requested)

Comments

@ChessMateK commented Apr 3, 2020

Hi everyone!

I am new to NLP and NER so I'm still trying to understand how exactly different architectures work.

My question is the following: is the architecture used for NER with Flair BertEmbeddings within the Flair SequenceTagger the same as the one implemented by the HuggingFace team in the PyTorch/TF example scripts here?

In particular, my doubt comes from the fact that the Flair SequenceTagger is based on a BiLSTM(-CRF), whose layers I can still see when running it, while the HuggingFace scripts are based purely on the Transformer architecture.

I am running this tutorial in Google Colab.
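For reference, the setup from the tutorial looks roughly like this (a minimal sketch following the standard Flair NER tutorial; the CoNLL-2003 corpus and the `bert-base-cased` model name are just placeholders for whatever I actually use):

```python
from flair.datasets import CONLL_03
from flair.embeddings import BertEmbeddings
from flair.models import SequenceTagger
from flair.trainers import ModelTrainer

# CoNLL-2003 is just an example corpus; any column-formatted NER corpus works
corpus = CONLL_03()
tag_dictionary = corpus.make_tag_dictionary(tag_type='ner')

# BERT used as a feature extractor that feeds the BiLSTM-CRF tagger
embeddings = BertEmbeddings('bert-base-cased')
tagger = SequenceTagger(hidden_size=256,
                        embeddings=embeddings,
                        tag_dictionary=tag_dictionary,
                        tag_type='ner',
                        use_crf=True)

trainer = ModelTrainer(tagger, corpus)
trainer.train('resources/taggers/ner-bert',
              learning_rate=0.1,
              mini_batch_size=32,
              max_epochs=150)
```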

I'd really appreciate any clarification. Thank you all in advance.

Best regards.

ChessMateK added the question label on Apr 3, 2020
@ChessMateK (Author)

Maybe I have to ask the master :-) @alanakbik

@alanakbik (Collaborator)

For the Huggingface scripts @stefan-it is the person to ask :)

Both implementations are very different: in Flair, our default sequence labeling architecture is a BiLSTM-CRF with a feature-based approach (i.e. no fine-tuning of the transformer), trained with many epochs of SGD and learning-rate annealing. HuggingFace, I believe, fine-tunes the transformer as in the BERT paper (a few epochs, a very small learning rate, the Adam optimizer), which is a very different approach.
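For contrast, the fine-tuning recipe looks roughly as below (a stripped-down sketch with illustrative hyperparameters and a hypothetical `train_dataloader`; it is not the actual HuggingFace example script):

```python
from transformers import BertForTokenClassification, AdamW

# Token-level classification head on top of BERT; all weights are updated during training
model = BertForTokenClassification.from_pretrained('bert-base-cased', num_labels=9)
optimizer = AdamW(model.parameters(), lr=3e-5)  # very small learning rate

model.train()
for epoch in range(3):  # only a few epochs
    for batch in train_dataloader:  # hypothetical DataLoader of tokenized, label-aligned batches
        outputs = model(input_ids=batch['input_ids'],
                        attention_mask=batch['attention_mask'],
                        labels=batch['labels'])
        loss = outputs[0]  # the loss is the first element of the model output
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```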

We are just now adding this transformer fine-tuning approach to Flair as well - it's on the master branch and undergoing testing (see #1494), so it will be part of the next release. It should allow the community to directly compare both approaches.
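Once that lands, the fine-tuning route in Flair would look roughly like the sketch below (an assumption based on the work in progress, reusing the corpus and tag dictionary from the earlier sketch; the embeddings class name and hyperparameters are illustrative, not final API):

```python
import torch
from flair.embeddings import TransformerWordEmbeddings
from flair.models import SequenceTagger
from flair.trainers import ModelTrainer

# Transformer weights are updated during training instead of being frozen
embeddings = TransformerWordEmbeddings('bert-base-cased', fine_tune=True)
tagger = SequenceTagger(hidden_size=256,
                        embeddings=embeddings,
                        tag_dictionary=tag_dictionary,  # as built from the corpus above
                        tag_type='ner',
                        use_crf=True)

# Fine-tuning style schedule: Adam-family optimizer, tiny learning rate, few epochs
trainer = ModelTrainer(tagger, corpus, optimizer=torch.optim.AdamW)
trainer.train('resources/taggers/ner-bert-finetuned',
              learning_rate=5e-6,
              mini_batch_size=16,
              max_epochs=3)
```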

@ChessMateK (Author) commented Apr 15, 2020

Thank you @alanakbik :)
