No improvement in training loss using ReFT Methods #739

Closed
2 of 4 tasks
julian-fong opened this issue Sep 3, 2024 · 2 comments
Labels
question Further information is requested

Comments

@julian-fong
Contributor

Environment info

  • adapters version: latest (the output below is from Google Colab)
  • transformers version: 4.43.4
  • Platform: Linux-6.1.85+-x86_64-with-glibc2.35
  • Python version: 3.10.12
  • Huggingface_hub version: 0.23.5
  • Safetensors version: 0.4.4
  • Accelerate version: 0.34.0
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.4.0+cu121 (True)
  • Tensorflow version (GPU?): 2.17.0 (True)
  • Flax version (CPU?/GPU?/TPU?): 0.8.4 (gpu)
  • Jax version: 0.4.26
  • JaxLib version: 0.4.26
  • Using distributed or parallel set-up in script?: No
  • Using GPU in script?: True
  • GPU type: Tesla T4

Information

Model I am using (Bert, XLNet ...): roberta-base (unsure whether this is specific to RoBERTa or applies to other models as well)

Language I am using the model on (English, Chinese ...): English

Adapter setup I am using (if any): LoReft (and LoRA for comparison), as set up in the scripts below

The problem arises when using:

  • [ ] the official example scripts
  • [x] my own modified scripts (details below)

The task I am working on is:

  • [ ] an official GLUE/SQuAD task
  • [x] my own task or dataset (details below)

Question answering on the BoolQ dataset: binary true/false classification given a question/passage pair.

To reproduce

The training loss barely decreases after training for 5 epochs:

from datasets import load_dataset, DatasetDict
from transformers import AutoTokenizer, TrainingArguments, Trainer, default_data_collator
from adapters import AutoAdapterModel, LoReftConfig

# Load the BoolQ train/validation splits
boolq = DatasetDict()
boolq["train"] = load_dataset("google/boolq", split="train")
boolq["val"] = load_dataset("google/boolq", split="validation")

model_name_or_path = "roberta-base"
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)

# Tokenize passage/question pairs to a fixed length
def preprocess_function(examples):
    return tokenizer(examples["passage"], examples["question"], truncation=True, padding="max_length")

tokenized_datasets = boolq.map(preprocess_function, batched=True)

# Keep only the model inputs and the label column
tokenized_datasets = tokenized_datasets.remove_columns(["question", "passage"])
tokenized_datasets = tokenized_datasets.rename_column("answer", "label")

data_collator = default_data_collator

# Attach a LoReft adapter with default settings plus a classification head,
# and freeze everything except the adapter weights
model = AutoAdapterModel.from_pretrained(model_name_or_path)
config = LoReftConfig()
model.add_adapter("loreft_adapter", config=config)
model.add_classification_head("loreft_adapter", num_labels=2)
model.train_adapter("loreft_adapter")

training_args = TrainingArguments(
    output_dir="./results",
    eval_strategy="epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=5,
    weight_decay=0.01,
)

trainer = Trainer(
    model=model,
    args=training_args,
    data_collator=data_collator,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["val"],
    tokenizer=tokenizer,
)

trainer.train()

Training logs:
[screenshot: training loss stays roughly flat over the 5 epochs]

I tried the same code with LoRA, and there the training loss did decrease over the 5 epochs. The only differences from the script above are the adapter config and name:

from adapters import AutoAdapterModel, LoRAConfig

model = AutoAdapterModel.from_pretrained(model_name_or_path)

# LoRA on all self-attention matrices plus the intermediate and output projections
config = LoRAConfig(
    selfattn_lora=True, intermediate_lora=True, output_lora=True,
    attn_matrices=["q", "k", "v"],
    alpha=16, r=64, dropout=0.1,
)

model.add_adapter("assistant_adapter", config=config)
model.add_classification_head("assistant_adapter", num_labels=2)
model.train_adapter("assistant_adapter")

# Data loading, preprocessing, TrainingArguments, and Trainer are
# identical to the LoReft script above.

Training logs:

[screenshot: training loss decreases over the 5 epochs]

Am I doing something incorrect in my script? Any feedback would be appreciated.

Thanks!

@julian-fong julian-fong added the bug Something isn't working label Sep 3, 2024
@calpt
Member

calpt commented Sep 15, 2024

Hey, from a short investigation of this, I believe these observations are due to the capacity/configuration of the adapters rather than an issue in the implementation:

Looking at the parameter counts in adapter_summary(), the LoRA adapter has far more parameters/capacity than the ReFT config, so the ReFT config might be too limited to adequately learn the task. To get better performance, it might help to increase the ReFT capacity, e.g. via a larger rank r or more prefix_positions/suffix_positions (e.g. LoReftConfig(r=32, prefix_positions=10)); see the sketch below. Alternatively, using a larger base model (e.g. roberta-large) might help.
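For illustration, a minimal sketch of that capacity increase (values taken from the suggestion above, not tuned):

from adapters import AutoAdapterModel, LoReftConfig

model = AutoAdapterModel.from_pretrained("roberta-base")

# Larger rank and more intervention positions than the LoReftConfig() defaults
config = LoReftConfig(r=32, prefix_positions=10, suffix_positions=10)
model.add_adapter("loreft_adapter", config=config)
model.add_classification_head("loreft_adapter", num_labels=2)
model.train_adapter("loreft_adapter")

# Compare trainable parameter counts with the LoRA setup
print(model.adapter_summary())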

As an additional check, you might try switching the task: on tasks from the GLUE benchmark, our ReFT implementation did get solid results; see the table in #705. You might check whether you can reproduce those in your setup (data from here).

(Side notes: ideally, always use AdapterTrainer (from adapters import AdapterTrainer) for adapter training. Also, increasing the learning rate to e.g. 1e-4 is usually beneficial. A rough sketch of both changes follows.)
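Applied to the script above, those two side notes would look roughly like this (same arguments otherwise; only the trainer class and learning rate change):

from adapters import AdapterTrainer
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./results",
    eval_strategy="epoch",
    learning_rate=1e-4,  # higher learning rate, as suggested above
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=5,
    weight_decay=0.01,
)

# AdapterTrainer takes the same arguments as Trainer but is tailored to
# adapter training (e.g. it saves only the adapter weights in checkpoints)
trainer = AdapterTrainer(
    model=model,
    args=training_args,
    data_collator=data_collator,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["val"],
    tokenizer=tokenizer,
)
trainer.train()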

@calpt calpt added question Further information is requested and removed bug Something isn't working labels Sep 15, 2024
@julian-fong
Contributor Author

Thank you for the informative response!
