
Issue on page /tune/examples/pbt_transformers.html #41266

Closed
shamik-biswas-rft opened this issue Nov 20, 2023 · 3 comments

Comments

@shamik-biswas-rft

Hello,

The version of Ray is 2.8.0 and the version of Transformers is 4.28.1; I have also tested with Transformers 4.35.2.

I am trying to run the hyperparameter search from the example script and I get the following error:

---------------------------------------------------------------------------
Traceback (most recent call last):
  File "/root/Repos/amplify-ml-training-pipeline/a250588-model-build/pipelines/ESG/extraction/numeric_attributes/research/raytune.py", line 212, in <module>
    tune_transformer(num_samples=1, gpus_per_trial=0, smoke_test=True)
  File "/root/Repos/amplify-ml-training-pipeline/a250588-model-build/pipelines/ESG/extraction/numeric_attributes/research/raytune.py", line 181, in tune_transformer
    trainer.hyperparameter_search(
  File "/opt/conda/envs/ml/lib/python3.9/site-packages/transformers/trainer.py", line 2592, in hyperparameter_search
    best_run = backend_dict[backend](self, n_trials, direction, **kwargs)
  File "/opt/conda/envs/ml/lib/python3.9/site-packages/transformers/integrations.py", line 327, in run_hp_search_ray
    trainable = ray.tune.with_parameters(_objective, local_trainer=trainer)
  File "/opt/conda/envs/ml/lib/python3.9/site-packages/ray/tune/trainable/util.py", line 315, in with_parameters
    raise DeprecationWarning(_CHECKPOINT_DIR_ARG_DEPRECATION_MSG)
DeprecationWarning: Accepting a `checkpoint_dir` argument in your training function is deprecated.
Please use `ray.train.get_checkpoint()` to access your checkpoint as a
`ray.train.Checkpoint` object instead. See below for an example:

Before
------

from ray import tune

def train_fn(config, checkpoint_dir=None):
    if checkpoint_dir:
        torch.load(os.path.join(checkpoint_dir, "checkpoint.pt"))
    ...

tuner = tune.Tuner(train_fn)
tuner.fit()

After
-----

from ray import train, tune

def train_fn(config):
    checkpoint: train.Checkpoint = train.get_checkpoint()
    if checkpoint:
        with checkpoint.as_directory() as checkpoint_dir:
            torch.load(os.path.join(checkpoint_dir, "checkpoint.pt"))
    ...

tuner = tune.Tuner(train_fn)
tuner.fit()

I don’t know whether this is a Ray Tune issue or a Hugging Face Trainer issue. If someone can point me in the right direction, I will open an issue or a PR in the appropriate repository.

Thank you
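For readers hitting the same error: the migration pattern described in Ray's deprecation message can be illustrated with a self-contained sketch. Note that `StubCheckpoint` below is a hypothetical stand-in that only mimics the `as_directory()` context-manager behavior of `ray.train.Checkpoint`; it is not the real Ray API, and `get_checkpoint` here is an injected callable rather than `ray.train.get_checkpoint()`.

```python
import os
import tempfile
from contextlib import contextmanager

# Hypothetical stand-in for ray.train.Checkpoint -- illustration only.
class StubCheckpoint:
    def __init__(self, directory):
        self._directory = directory

    @contextmanager
    def as_directory(self):
        # The real Checkpoint may materialize files into a temp dir;
        # here we simply hand back the local path.
        yield self._directory

def train_fn(config, get_checkpoint):
    """New-style training function: no `checkpoint_dir` parameter.

    The checkpoint is fetched inside the function instead of being
    passed in as an argument (the pattern the deprecation asks for).
    """
    checkpoint = get_checkpoint()
    if checkpoint:
        with checkpoint.as_directory() as checkpoint_dir:
            path = os.path.join(checkpoint_dir, "checkpoint.pt")
            with open(path) as f:
                return f.read()
    return None

# Demo: pretend a previous trial saved a checkpoint file.
with tempfile.TemporaryDirectory() as d:
    with open(os.path.join(d, "checkpoint.pt"), "w") as f:
        f.write("weights")
    restored = train_fn({}, lambda: StubCheckpoint(d))

print(restored)  # -> weights
```

With the real API, `train_fn` would take only `config` and call `ray.train.get_checkpoint()` directly, as shown in the "After" snippet of the error message above.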


ddelange commented Dec 1, 2023

potential duplicate of #39763


ddelange commented Dec 1, 2023

fixed by #40125 ?

@shamik-biswas-rft
Author

> fixed by #40125 ?

Yes, this will be fixed by that PR.
