Why are SentenceTransformerDocumentEmbeddings not fine-tunable? #1769
Comments
Hello @ilya-palachev, I am not sure whether sentence transformers can be further fine-tuned, or whether this would make sense. @nreimers can you comment?
Yes, sentence transformers can be further fine-tuned. A SentenceTransformer is basically a PyTorch Sequential model (https://pytorch.org/docs/master/generated/torch.nn.Sequential.html) that first calls a BERT (etc.) model and then performs a mean pooling operation. If the forward function of SentenceTransformers is used, you get gradients for the weights in BERT, and BERT would be updated.

Would it make sense to fine-tune them? If you have enough training data, I think it would.

By the way, most of our models are available in the Hugging Face repository. I see that flair has a TransformerDocumentEmbeddings class, so you could try this:
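The code snippet from the original comment was not preserved in this thread; as a sketch, loading the sentence-transformers weights through Flair's TransformerDocumentEmbeddings might look roughly like this (the fine_tune flag is what keeps the transformer weights trainable):

```python
from flair.data import Sentence
from flair.embeddings import TransformerDocumentEmbeddings

# load the sentence-transformers BERT weights through the standard transformer embedding class
embedding = TransformerDocumentEmbeddings(
    "sentence-transformers/bert-base-nli-mean-tokens",
    fine_tune=True,  # keep gradients flowing into BERT so it is updated during training
)

sentence = Sentence("The grass is green.")
embedding.embed(sentence)
print(sentence.get_embedding().shape)
```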
This would load our sentence-transformers bert-base-nli-mean-tokens model. It loads this model without any pooling layer. I am not sure what pooling strategy TransformerDocumentEmbeddings uses. Does it use mean pooling, or does it use the CLS token as the embedding? If it uses the CLS token as the embedding, then this would be the right model:

Best
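The model link at the end of that comment was dropped in this thread; assuming it pointed to the CLS-pooled counterpart of the same model (for example sentence-transformers/bert-base-nli-cls-token, an assumption here), the corresponding call would be:

```python
from flair.embeddings import TransformerDocumentEmbeddings

# assumption: the CLS-token variant of the NLI BERT model from sentence-transformers
embedding = TransformerDocumentEmbeddings("sentence-transformers/bert-base-nli-cls-token", fine_tune=True)
```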
Ah interesting, thanks! Yes, the TransformerDocumentEmbeddings currently use the CLS token as the embedding. So for the CLS sentence transformers, I guess we actually don't need a separate class. For the others, we would need to add a mean pooling layer, and then we could have all transformers in one class, right?
Hi @alanakbik, yes, adding mean pooling to the TransformerDocumentEmbeddings class would be quite nice. The sketch below shows how to do this with minimal code on top of the HF AutoModel. Sometimes max pooling is also quite nice, so the same sketch includes max pooling over the HF AutoModel output. The pooling mechanism could be added as a parameter to the TransformerDocumentEmbeddings class.

Best
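Since the links from the original comment were not preserved here, this is a minimal sketch of both pooling strategies over the HF AutoModel output (the model name is just an example); padding tokens are excluded via the attention mask:

```python
import torch
from transformers import AutoTokenizer, AutoModel

def mean_pooling(model_output, attention_mask):
    # average the token embeddings, ignoring padding positions via the attention mask
    token_embeddings = model_output[0]  # (batch, seq_len, hidden)
    mask = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * mask, dim=1) / torch.clamp(mask.sum(dim=1), min=1e-9)

def max_pooling(model_output, attention_mask):
    # element-wise maximum over tokens, with padding positions masked out
    token_embeddings = model_output[0].clone()
    mask = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    token_embeddings[mask == 0] = -1e9
    return torch.max(token_embeddings, dim=1).values

tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/bert-base-nli-mean-tokens")
model = AutoModel.from_pretrained("sentence-transformers/bert-base-nli-mean-tokens")

encoded = tokenizer(["This is an example sentence."], padding=True, truncation=True, return_tensors="pt")
model_output = model(**encoded)
sentence_embedding = mean_pooling(model_output, encoded["attention_mask"])  # shape: (1, 768)
```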
@nreimers thanks for the info - we'll get right on it :)
Hi! Was mean pooling ever added in the recent releases? I was curious to try it 😃
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Hello! First of all, thanks for this awesome package!
I'm training a text classifier as described in the tutorial. Since my texts are quite short, I'm using the sentence transformer for embedding:
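A minimal sketch of the setup described above, loosely following the Flair text-classification tutorial (the corpus path and output directory are placeholders):

```python
from flair.datasets import ClassificationCorpus
from flair.embeddings import SentenceTransformerDocumentEmbeddings
from flair.models import TextClassifier
from flair.trainers import ModelTrainer

# corpus in FastText classification format with train/dev/test splits (placeholder path)
corpus = ClassificationCorpus("path/to/data")
label_dict = corpus.make_label_dictionary()

# embed each (short) text with a pre-trained sentence transformer
document_embeddings = SentenceTransformerDocumentEmbeddings("bert-base-nli-mean-tokens")

classifier = TextClassifier(document_embeddings, label_dictionary=label_dict)
trainer = ModelTrainer(classifier, corpus)
trainer.train("resources/classifiers/example", max_epochs=10)
```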
The model shows good performance on our data. However, I realized that only the classifier layer is trained, and the transformer stays untouched during training:
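One hypothetical way to confirm this (not taken from the original post) is to snapshot the transformer weights before training and compare them afterwards; the attribute names below assume the setup from the sketch above:

```python
import copy
import torch

# snapshot the sentence-transformer weights before training (attribute names are assumptions)
before = copy.deepcopy(classifier.document_embeddings.model.state_dict())

# ... trainer.train(...) runs here ...

after = classifier.document_embeddings.model.state_dict()
unchanged = all(torch.equal(before[k], after[k]) for k in before)
print(unchanged)  # True: the transformer weights were never updated
```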
After looking at the implementation, I see that the sentence transformer embedding doesn't even have a fine_tune parameter (flair/flair/embeddings/document.py, lines 520 to 530 in 17fa344),
and its static_embeddings parameter is explicitly set to True (flair/flair/embeddings/document.py, line 550 in 17fa344).
So why are SentenceTransformerDocumentEmbeddings not fine-tunable? Is it because they are already fine-tuned for sentence embedding on data such as the STS dataset (i.e. as described in their paper), and it is known to be a bad idea to fine-tune already fine-tuned transformers? Or do you have some other specific reason for keeping this kind of embedding static only?
As far as I can see, #1492 announced that all transformers are now tunable in this library, but SentenceTransformerDocumentEmbeddings alone are not. Thanks in advance!