
[Bug]: Shared layers in multi-task model are no longer shared after loading the model from a checkpoint #3446

Open
chelseagzr opened this issue Apr 22, 2024 · 5 comments
Labels: bug (Something isn't working)

Comments

@chelseagzr

Describe the bug

Thank you for developing and maintaining this invaluable module!

We would like to train a multi-task model on two NER tasks that share a transformer word embedding.
We fine-tuned the model for several epochs and saved a checkpoint after every epoch by specifying save_model_each_k_epochs=1 when calling fine_tune.
Now suppose we want to continue fine-tuning from a previously saved checkpoint.
We loaded the model by calling MultitaskModel.load. However, the transformer word embedding is no longer shared between the two tasks.

To Reproduce

%pip install scipy==1.10.1 transformers torch==2.0 flair==0.13.1 

from flair.datasets import NER_CHINESE_WEIBO, NER_ENGLISH_PERSON
from flair.embeddings import TransformerWordEmbeddings
from flair.models import SequenceTagger, MultitaskModel 
from flair.trainers import ModelTrainer
from flair.nn.multitask import make_multitask_model_and_corpus

# 1. get the corpus
corpus_1 = NER_CHINESE_WEIBO()
print(corpus_1)
corpus_2 = NER_ENGLISH_PERSON()
print(corpus_2)

# 2. what label do we want to predict?
label_type = 'ner'

# 3. make the label dictionary from the corpus
label_dict_1 = corpus_1.make_label_dictionary(label_type=label_type, add_unk=False)
print(label_dict_1)
label_dict_2 = corpus_2.make_label_dictionary(label_type=label_type, add_unk=False)
print(label_dict_2)

# 4. initialize fine-tuneable transformer embeddings WITH document context
shared_embeddings = TransformerWordEmbeddings(
    model='xlm-roberta-base',
    layers="-1",
    subtoken_pooling="first",
    fine_tune=True,
    use_context=True,
)

# 5. initialize bare-bones sequence taggers (no CRF, no RNN, no reprojection)
tagger_1 = SequenceTagger(
    hidden_size=256,
    embeddings=shared_embeddings,
    tag_dictionary=label_dict_1,
    tag_type=label_type,
    use_crf=False,
    use_rnn=False,
    reproject_embeddings=False,
)
tagger_2 = SequenceTagger(
    hidden_size=256,
    embeddings=shared_embeddings,
    tag_dictionary=label_dict_2,
    tag_type=label_type,
    use_crf=False,
    use_rnn=False,
    reproject_embeddings=False,
)

# 6. initialize trainer
multitask_model, multicorpus = make_multitask_model_and_corpus(
    [
        (tagger_1, corpus_1),
        (tagger_2, corpus_2),
    ]
)
# the embedding layers of tagger_1 and tagger_2 are shared (a single copy of the embedding layer)
trainer = ModelTrainer(multitask_model, multicorpus)

# 7. run fine-tuning
trainer.fine_tune('resources/taggers/sota-ner-flert',
                  learning_rate=5.0e-6,
                  max_epochs=1,
                  mini_batch_size=4,
                  save_model_each_k_epochs=1
)

# 8. load from saved checkpoint
multitask_model = MultitaskModel.load('resources/taggers/sota-ner-flert/model_epoch_1.pt')
# the embedding layers of tagger_1 and tagger_2 are NOT shared anymore (two copies of the embedding layer).
# The two copies hold identical values right after loading from the checkpoint, but they will diverge if we continue fine-tuning.

# 9. continue fine-tuning
trainer = ModelTrainer(multitask_model, multicorpus)
trainer.fine_tune('resources/taggers/sota-ner-flert',
                  learning_rate=5.0e-6,
                  epoch=1,
                  max_epochs=2,
                  mini_batch_size=4,
                  save_model_each_k_epochs=1
)

Expected behavior

Shared layers between tasks are still shared after loading from a checkpoint.

Logs and Stack traces

No response

Screenshots

No response

Additional Context

No response

Environment

Versions:

Flair: 0.13.1
Pytorch: 2.0.0+cu117
Transformers: 4.40.0
GPU: True

chelseagzr added the bug label Apr 22, 2024

@chelseagzr (Author) commented May 6, 2024

For this specific example, I think the following method works. (Please let me know if you see any problem with it.) I was wondering whether the bug could instead be fixed inside MultitaskModel.load, so that loading works correctly for any multitask model. Thank you!

Assign the embedding layers of one task to the other tasks:

# 8. load from saved checkpoint
multitask_model = MultitaskModel.load('resources/taggers/sota-ner-flert/model_epoch_1.pt')
# the embedding layers of tagger_1 and tagger_2 are NOT shared anymore (two copies of the embedding layer).
# The two copies hold identical values right after loading from the checkpoint, but they will diverge if we continue fine-tuning.

# 9. assign the embedding layers of Task_0 to the embedding layers of Task_1
multitask_model.tasks['Task_1'].embeddings = multitask_model.tasks['Task_0'].embeddings

# 10. continue fine-tuning
trainer = ModelTrainer(multitask_model, multicorpus)
trainer.fine_tune('resources/taggers/sota-ner-flert',
                  learning_rate=5.0e-6,
                  epoch=1,
                  max_epochs=2,
                  mini_batch_size=4,
                  save_model_each_k_epochs=1
)
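
To double-check that the workaround took effect, the storage-pointer comparison used later in this thread can be applied right after step 9 (an optional sanity check, not part of the original workaround):

# optional sanity check: both tasks should now point at the same weight tensor
ptr_0 = multitask_model.tasks['Task_0'].embeddings.model.embeddings.word_embeddings.weight.data_ptr()
ptr_1 = multitask_model.tasks['Task_1'].embeddings.model.embeddings.word_embeddings.weight.data_ptr()
assert ptr_0 == ptr_1  # equal pointers mean the embedding layer is shared again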

@chelseagzr (Author)

Do you have any idea how this can be fixed for a general MultitaskModel? If so, I can work on it and submit a PR.
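
One possible direction (a rough sketch only, not a verified fix; the helper name retie_shared_modules and the default attribute list below are assumptions for illustration) would be to generalize the assignment workaround above and re-point the relevant attributes of every task at a single task's copy right after loading:

def retie_shared_modules(multitask_model, attribute_names=("embeddings", "rnn")):
    # hypothetical helper: re-establish sharing for attributes that were shared
    # across ALL tasks before saving; the checkpoint itself no longer records
    # which submodules were shared, so the caller has to know this
    task_ids = list(multitask_model.tasks.keys())
    reference_task = multitask_model.tasks[task_ids[0]]
    for task_id in task_ids[1:]:
        task = multitask_model.tasks[task_id]
        for name in attribute_names:
            if hasattr(reference_task, name) and hasattr(task, name):
                setattr(task, name, getattr(reference_task, name))

A fix inside MultitaskModel.load itself would presumably have to record at save time which submodules are shared, since after loading the separate copies merely happen to hold equal values.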

@chelseagzr (Author)

Besides embedding layers, this issue also exists for RNN layers.
Since the issue only involves the save and load methods, I wrote some new code that reproduces it without calling fine_tune:

%pip install scipy==1.10.1 transformers torch==2.0 flair==0.14.0

import torch
from flair.datasets import NER_CHINESE_WEIBO, NER_ENGLISH_PERSON
from flair.embeddings import TransformerWordEmbeddings
from flair.models import SequenceTagger, MultitaskModel 
from flair.nn.multitask import make_multitask_model_and_corpus


def create_multitask_model(corpus_1, corpus_2, label_type_1, label_type_2):
    label_dict_1 = corpus_1.make_label_dictionary(label_type=label_type_1, add_unk=False)
    label_dict_2 = corpus_2.make_label_dictionary(label_type=label_type_2, add_unk=False)
    shared_embeddings = TransformerWordEmbeddings(
        model='xlm-roberta-base',
        layers="-1",
        subtoken_pooling="first",
        fine_tune=True,
        use_context=True
    )
    shared_rnn = torch.nn.GRU(
        input_size=shared_embeddings.embedding_length,
        hidden_size=256,
        num_layers=2,
        bidirectional=True,
        batch_first=True,
    )
    tagger_1 = SequenceTagger(
        hidden_size=256,
        embeddings=shared_embeddings,
        tag_dictionary=label_dict_1,
        tag_type=label_type_1,
        use_crf=False,
        rnn=shared_rnn,
        reproject_embeddings=False
    )
    tagger_2 = SequenceTagger(
        hidden_size=256,
        embeddings=shared_embeddings,
        tag_dictionary=label_dict_2,
        tag_type=label_type_2,
        use_crf=False,
        rnn=shared_rnn,
        reproject_embeddings=False
    )
    multitask_model, multicorpus = make_multitask_model_and_corpus(
        [
            (tagger_1, corpus_1),
            (tagger_2, corpus_2),
        ]
    )
    return multitask_model, multicorpus


corpus_1 = NER_CHINESE_WEIBO()
corpus_2 = NER_ENGLISH_PERSON()

multitask_model, _ = create_multitask_model(corpus_1, corpus_2, 'ner', 'ner')

# confirm embeddings are shared between two tasks in multitask_model
print(multitask_model.tasks['Task_0'].embeddings.model.embeddings.word_embeddings.weight.data_ptr())
print(multitask_model.tasks['Task_1'].embeddings.model.embeddings.word_embeddings.weight.data_ptr())
# confirm rnn layers are shared between two tasks in multitask_model
print(multitask_model.tasks['Task_0'].rnn.weight_ih_l0.data_ptr())
print(multitask_model.tasks['Task_1'].rnn.weight_ih_l0.data_ptr())
# output:
# 139968856981504
# 139968856981504
# 139972491894784
# 139972491894784

# save multitask_model to disk using "save" method
multitask_model.save("saved_model_using_flair.pt")
# create new multitask_model using "load" method
multitask_model_using_flair = MultitaskModel.load("saved_model_using_flair.pt")

# confirm embeddings are not shared between two tasks in multitask_model_using_flair
print(multitask_model_using_flair.tasks['Task_0'].embeddings.model.embeddings.word_embeddings.weight.data_ptr())
print(multitask_model_using_flair.tasks['Task_1'].embeddings.model.embeddings.word_embeddings.weight.data_ptr())
# confirm rnn layers are not shared between two tasks in multitask_model_using_flair
print(multitask_model_using_flair.tasks['Task_0'].rnn.weight_ih_l0.data_ptr())
print(multitask_model_using_flair.tasks['Task_1'].rnn.weight_ih_l0.data_ptr())
# output:
# 139971709108224
# 139966139072512
# 139973196636160
# 139973185601536

@chelseagzr (Author)

I think this issue does not occur when the plain torch functions (torch.save, model.state_dict, torch.load, model.load_state_dict) are used to save the model parameters to disk and load them back:

# save multitask_model to disk using torch methods
torch.save(multitask_model.state_dict(), "saved_model_using_torch.pt")
# create new multitask_model using torch methods
multitask_model_using_torch, _ = create_multitask_model(corpus_1, corpus_2, 'ner', 'ner')
multitask_model_using_torch.load_state_dict(torch.load("saved_model_using_torch.pt"))

# confirm embeddings are shared between two tasks in multitask_model_using_torch
print(multitask_model_using_torch.tasks['Task_0'].embeddings.model.embeddings.word_embeddings.weight.data_ptr())
print(multitask_model_using_torch.tasks['Task_1'].embeddings.model.embeddings.word_embeddings.weight.data_ptr())
# confirm rnn layers are shared between two tasks in multitask_model_using_torch
print(multitask_model_using_torch.tasks['Task_0'].rnn.weight_ih_l0.data_ptr())
print(multitask_model_using_torch.tasks['Task_1'].rnn.weight_ih_l0.data_ptr())
# output:
# 139965367320576
# 139965367320576
# 139967313477632
# 139967313477632

@shigapov commented Aug 6, 2024

Hi all, I have the same problem with a multitask model for NER and NEL, using code similar to the NER/NEL tutorial, i.e. a SequenceTagger + SpanClassifier with shared embeddings.

After fine-tuning, it predicts both NER and NEL tags. But if I load the multitask model via MultitaskModel.load(), the embeddings are no longer shared and the model predicts only NER tags.
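
Until the loader is fixed, the reassignment workaround from earlier in this thread may serve as a stopgap here as well (a sketch under assumptions: the tasks are registered as 'Task_0' and 'Task_1' by make_multitask_model_and_corpus, the SpanClassifier stores its embeddings in an .embeddings attribute, and the checkpoint path below is only a placeholder):

# after loading, re-point the NEL task's embeddings at the NER task's copy
multitask_model = MultitaskModel.load('path/to/ner-nel-checkpoint.pt')  # placeholder path
multitask_model.tasks['Task_1'].embeddings = multitask_model.tasks['Task_0'].embeddings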
