No improvement in training loss using ReFT Methods #739

Closed
2 of 4 tasks
julian-fong opened this issue Sep 3, 2024 · 2 comments
Labels
question Further information is requested

Comments

@julian-fong
Contributor

Environment info

  • adapters version: latest (the output below is from Google Colab)
  • transformers version: 4.43.4
  • Platform: Linux-6.1.85+-x86_64-with-glibc2.35
  • Python version: 3.10.12
  • Huggingface_hub version: 0.23.5
  • Safetensors version: 0.4.4
  • Accelerate version: 0.34.0
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.4.0+cu121 (True)
  • Tensorflow version (GPU?): 2.17.0 (True)
  • Flax version (CPU?/GPU?/TPU?): 0.8.4 (gpu)
  • Jax version: 0.4.26
  • JaxLib version: 0.4.26
  • Using distributed or parallel set-up in script?: No
  • Using GPU in script?: True
  • GPU type: Tesla T4

Information

Model I am using (Bert, XLNet ...): roberta-base (unsure whether this is specific to RoBERTa or applies to other models as well)

Language I am using the model on (English, Chinese ...): English

Adapter setup I am using (if any): LoReft (and LoRA for comparison), as set up in the scripts below

The problem arises when using:

  • [ ] the official example scripts
  • [x] my own modified scripts (details below)

The task I am working on is:

  • [ ] an official GLUE/SQuAD task
  • [x] my own task or dataset (details below)

Question answering on the BoolQ dataset: binary true/false classification given a question/passage pair.

To reproduce

The training loss barely decreases after training for 5 epochs:

from datasets import load_dataset, DatasetDict
from transformers import AutoTokenizer, TrainingArguments, Trainer, default_data_collator
from adapters import AutoAdapterModel, LoReftConfig

# Load the BoolQ train/validation splits
boolq = DatasetDict()
boolq["train"] = load_dataset("google/boolq", split="train")
boolq["val"] = load_dataset("google/boolq", split="validation")

model_name_or_path = "roberta-base"
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)

# Tokenize passage/question pairs to a fixed length
def preprocess_function(examples):
    return tokenizer(examples["passage"], examples["question"], truncation=True, padding="max_length")

tokenized_datasets = boolq.map(preprocess_function, batched=True)

# Keep only the model inputs and the label column
tokenized_datasets = tokenized_datasets.remove_columns(["question", "passage"])
tokenized_datasets = tokenized_datasets.rename_column("answer", "label")

data_collator = default_data_collator

# Attach a LoReft adapter with default settings plus a classification head,
# and freeze everything except the adapter weights
model = AutoAdapterModel.from_pretrained(model_name_or_path)
config = LoReftConfig()
model.add_adapter("loreft_adapter", config=config)
model.add_classification_head("loreft_adapter", num_labels=2)
model.train_adapter("loreft_adapter")

training_args = TrainingArguments(
    output_dir="./results",
    eval_strategy="epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=5,
    weight_decay=0.01,
)

trainer = Trainer(
    model=model,
    args=training_args,
    data_collator=data_collator,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["val"],
    tokenizer=tokenizer,
)

trainer.train()

Training logs:
[screenshot: training loss stays roughly flat over the 5 epochs]

I tried the same code with LoRA, and there the training loss did decrease over the 5 epochs. The only differences from the script above are the adapter config and name:

from adapters import AutoAdapterModel, LoRAConfig

model = AutoAdapterModel.from_pretrained(model_name_or_path)

# LoRA on all self-attention matrices plus the intermediate and output projections
config = LoRAConfig(
    selfattn_lora=True, intermediate_lora=True, output_lora=True,
    attn_matrices=["q", "k", "v"],
    alpha=16, r=64, dropout=0.1,
)

model.add_adapter("assistant_adapter", config=config)
model.add_classification_head("assistant_adapter", num_labels=2)
model.train_adapter("assistant_adapter")

# Data loading, preprocessing, TrainingArguments, and Trainer are
# identical to the LoReft script above.

Training logs:

[screenshot: training loss decreases over the 5 epochs]

Am I doing something incorrect in my script? Any feedback would be appreciated.

Thanks!

@julian-fong julian-fong added the bug Something isn't working label Sep 3, 2024
@calpt
Member

calpt commented Sep 15, 2024

Hey, from a short investigation of this, I believe these observations are due to the capacity/configuration of the adapters rather than an issue in the implementation:

Looking at the parameter counts in adapter_summary(), the LoRA adapter has far more parameters/capacity than the ReFT config, so the ReFT config might be too limited to adequately learn the task. To get better performance, it might help to increase the ReFT capacity, e.g. via a larger rank r or more prefix_positions/suffix_positions (e.g. LoReftConfig(r=32, prefix_positions=10)); see the sketch below. Alternatively, using a larger base model (e.g. roberta-large) might help.
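For illustration, a minimal sketch of that capacity increase (values taken from the suggestion above, not tuned):

from adapters import AutoAdapterModel, LoReftConfig

model = AutoAdapterModel.from_pretrained("roberta-base")

# Larger rank and more intervention positions than the LoReftConfig() defaults
config = LoReftConfig(r=32, prefix_positions=10, suffix_positions=10)
model.add_adapter("loreft_adapter", config=config)
model.add_classification_head("loreft_adapter", num_labels=2)
model.train_adapter("loreft_adapter")

# Compare trainable parameter counts with the LoRA setup
print(model.adapter_summary())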

As an additional check, you might try switching the task: on tasks from the GLUE benchmark, our ReFT implementation did get solid results; see the table in #705. You might check whether you can reproduce those in your setup (data from here).

(Side notes: ideally, always use AdapterTrainer (from adapters import AdapterTrainer) for adapter training. Also, increasing the learning rate to e.g. 1e-4 is usually beneficial. A rough sketch of both changes follows.)
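Applied to the script above, those two side notes would look roughly like this (same arguments otherwise; only the trainer class and learning rate change):

from adapters import AdapterTrainer
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./results",
    eval_strategy="epoch",
    learning_rate=1e-4,  # higher learning rate, as suggested above
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=5,
    weight_decay=0.01,
)

# AdapterTrainer takes the same arguments as Trainer but is tailored to
# adapter training (e.g. it saves only the adapter weights in checkpoints)
trainer = AdapterTrainer(
    model=model,
    args=training_args,
    data_collator=data_collator,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["val"],
    tokenizer=tokenizer,
)
trainer.train()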

@calpt calpt added question Further information is requested and removed bug Something isn't working labels Sep 15, 2024
@julian-fong
Contributor Author

Thank you for the informative response!
