
AttributeError: property 'tokenizer' of 'DPOTrainer' object has no setter #2161

Closed
2 of 4 tasks
qgallouedec opened this issue Oct 3, 2024 · 4 comments · Fixed by #2162 or #2163
Labels
🐛 bug Something isn't working

Comments

@qgallouedec
Member

System Info

  • Platform: Linux-5.15.0-1048-aws-x86_64-with-glibc2.31
  • Python version: 3.11.9
  • PyTorch version: 2.4.1
  • CUDA device: NVIDIA H100 80GB HBM3
  • Transformers version: 4.46.0.dev0
  • Accelerate version: 0.34.2
  • Accelerate config: not found
  • Datasets version: 3.0.0
  • HF Hub version: 0.24.7
  • TRL version: 0.12.0.dev0+07cebf3
  • bitsandbytes version: 0.41.1
  • DeepSpeed version: 0.15.1
  • Diffusers version: 0.30.3
  • Liger-Kernel version: 0.3.0
  • LLM-Blender version: 0.0.2
  • OpenAI version: 1.46.0
  • PEFT version: 0.12.0

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder
  • My own task or dataset (give details below)

Reproduction

from transformers import AutoTokenizer, AutoModelForCausalLM
from datasets import load_dataset
import tempfile
from trl import DPOTrainer, DPOConfig

model_id = "HuggingFaceM4/tiny-random-LlamaForCausalLM"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

with tempfile.TemporaryDirectory() as tmp_dir:
    training_args = DPOConfig(output_dir=tmp_dir)
    dummy_dataset = load_dataset("trl-internal-testing/zen", "standard_preference")
    trainer = DPOTrainer(model=model, args=training_args, tokenizer=tokenizer, train_dataset=dummy_dataset["train"])
[2024-10-03 09:03:00,224] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama_fast.LlamaTokenizerFast'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565 - if you loaded a llama tokenizer from a GGUF file you can ignore this message.
/fsx/qgallouedec/transformers/src/transformers/generation/configuration_utils.py:579: UserWarning: `pad_token_id` should be positive but got -1. This will cause errors when batch generating, if there is padding. Please set `pad_token_id` explicitly as `model.generation_config.pad_token_id=PAD_TOKEN_ID` to avoid errors in generation
  warnings.warn(
Traceback (most recent call last):
  File "/fsx/qgallouedec/transformers/../trl/dfg.py", line 13, in <module>
    trainer = DPOTrainer(model=model, args=training_args, tokenizer=tokenizer, train_dataset=dummy_dataset["train"])
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.11/site-packages/huggingface_hub/utils/_deprecation.py", line 101, in inner_f
    return f(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^
  File "/fsx/qgallouedec/trl/trl/trainer/dpo_trainer.py", line 635, in __init__
    self.tokenizer = tokenizer
    ^^^^^^^^^^^^^^
AttributeError: property 'tokenizer' of 'DPOTrainer' object has no setter

Expected behavior

To work, as it did before.

@qgallouedec added the 🐛 bug (Something isn't working) label Oct 3, 2024
@qgallouedec
Member Author

qgallouedec commented Oct 3, 2024

Origin of the error is this change: huggingface/transformers#32385
git bisect is a wonderful tool.
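
For context, here is a minimal self-contained sketch of the failure mode (not the actual transformers code; the class bodies are simplified for illustration): after huggingface/transformers#32385, the base Trainer stores the tokenizer as `processing_class` and exposes `tokenizer` only as a read-only property, so a subclass that still assigns `self.tokenizer = ...` in its `__init__` raises exactly this AttributeError.

class Trainer:
    # Simplified stand-in for transformers.Trainer after the rename.
    def __init__(self, processing_class=None):
        self.processing_class = processing_class

    @property
    def tokenizer(self):
        # Read-only alias; no setter is defined, so assignment fails.
        return self.processing_class

class DPOTrainer(Trainer):
    def __init__(self, tokenizer=None):
        super().__init__()
        self.tokenizer = tokenizer  # fails: the inherited property has no setter

DPOTrainer(tokenizer="dummy")
# AttributeError: property 'tokenizer' of 'DPOTrainer' object has no setter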

@qgallouedec
Member Author

This bug stems from the fact that `tokenizer` will no longer be an argument of `Trainer`; it is replaced by `processing_class`.

Suggested migration plan:

  • Do the same change, e.g. (a full call sketch follows this list):
      trainer = RewardTrainer(
          model=model,
          args=training_args,
    -     tokenizer=tokenizer,
    +     processing_class=tokenizer,
          train_dataset=dataset,
          peft_config=peft_config,
      )
  • Ensure backward compatibility only for SFTTrainer and DPOTrainer via:
def __init__(
    ...
    tokenizer: Optional[PreTrainedTokenizerBase] = None,
    processing_class: Optional[
        Union[PreTrainedTokenizerBase, BaseImageProcessor, FeatureExtractionMixin, ProcessorMixin]
    ] = None,
    ...
):
    if tokenizer is not None:
        if processing_class is not None:
            raise ValueError(
                "You cannot specify both `tokenizer` and `processing_class` at the same time. Please use `processing_class`."
            )
        warnings.warn(
            "`tokenizer` is now deprecated and will be removed in the future, please use `processing_class` instead.",
            FutureWarning,
        )
        processing_class = tokenizer
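
For reference, a sketch of how the reproduction script above would look once migrated to the new argument (with the shim above, the old `tokenizer=` keyword would keep working but emit a FutureWarning):

from transformers import AutoTokenizer, AutoModelForCausalLM
from datasets import load_dataset
import tempfile
from trl import DPOTrainer, DPOConfig

model_id = "HuggingFaceM4/tiny-random-LlamaForCausalLM"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

with tempfile.TemporaryDirectory() as tmp_dir:
    training_args = DPOConfig(output_dir=tmp_dir)
    dummy_dataset = load_dataset("trl-internal-testing/zen", "standard_preference")
    # Pass the tokenizer as `processing_class` instead of the deprecated `tokenizer` argument.
    trainer = DPOTrainer(
        model=model,
        args=training_args,
        processing_class=tokenizer,
        train_dataset=dummy_dataset["train"],
    )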

@kashif
Collaborator

kashif commented Oct 3, 2024

Yes, looks like a good solution.

@edbeeching
Collaborator

Yes, seems good to me. It is a shame that these lines are just duplicated from the Trainer class and there is no way to simply inherit them.

    if tokenizer is not None:
        if processing_class is not None:
            raise ValueError(
                "You cannot specify both `tokenizer` and `processing_class` at the same time. Please use `processing_class`."
            )
        warnings.warn(
            "`tokenizer` is now deprecated and will be removed in the future, please use `processing_class` instead.",
            FutureWarning,
        )
        processing_class = tokenizer
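
One way to reduce that duplication (a sketch only; the helper name and its placement are hypothetical, not existing TRL code) would be to factor the shim into a small shared helper that each trainer's `__init__` calls before delegating to the parent class:

import warnings
from typing import Optional, Union

from transformers import BaseImageProcessor, FeatureExtractionMixin, PreTrainedTokenizerBase, ProcessorMixin

ProcessingClass = Union[PreTrainedTokenizerBase, BaseImageProcessor, FeatureExtractionMixin, ProcessorMixin]


def _resolve_processing_class(
    tokenizer: Optional[PreTrainedTokenizerBase],
    processing_class: Optional[ProcessingClass],
) -> Optional[ProcessingClass]:
    """Hypothetical shared helper: map the deprecated `tokenizer` argument onto `processing_class`."""
    if tokenizer is not None:
        if processing_class is not None:
            raise ValueError(
                "You cannot specify both `tokenizer` and `processing_class` at the same time. "
                "Please use `processing_class`."
            )
        warnings.warn(
            "`tokenizer` is now deprecated and will be removed in the future, "
            "please use `processing_class` instead.",
            FutureWarning,
        )
        processing_class = tokenizer
    return processing_class


# Each trainer's __init__ would then contain a single line:
# processing_class = _resolve_processing_class(tokenizer, processing_class)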
