
Issue on page /tune/examples/pbt_transformers.html #41266

Closed
shamik-biswas-rft opened this issue Nov 20, 2023 · 3 comments

Comments

@shamik-biswas-rft

Hello,

The version of Ray is 2.8.0 and the version of Transformers is 4.28.1; I have also tested with Transformers 4.35.2.

I am trying to run the hyperparameter search from the example script and I get the following error:

---------------------------------------------------------------------------
Traceback (most recent call last):
  File "/root/Repos/amplify-ml-training-pipeline/a250588-model-build/pipelines/ESG/extraction/numeric_attributes/research/raytune.py", line 212, in <module>
    tune_transformer(num_samples=1, gpus_per_trial=0, smoke_test=True)
  File "/root/Repos/amplify-ml-training-pipeline/a250588-model-build/pipelines/ESG/extraction/numeric_attributes/research/raytune.py", line 181, in tune_transformer
    trainer.hyperparameter_search(
  File "/opt/conda/envs/ml/lib/python3.9/site-packages/transformers/trainer.py", line 2592, in hyperparameter_search
    best_run = backend_dict[backend](self, n_trials, direction, **kwargs)
  File "/opt/conda/envs/ml/lib/python3.9/site-packages/transformers/integrations.py", line 327, in run_hp_search_ray
    trainable = ray.tune.with_parameters(_objective, local_trainer=trainer)
  File "/opt/conda/envs/ml/lib/python3.9/site-packages/ray/tune/trainable/util.py", line 315, in with_parameters
    raise DeprecationWarning(_CHECKPOINT_DIR_ARG_DEPRECATION_MSG)
DeprecationWarning: Accepting a `checkpoint_dir` argument in your training function is deprecated.
Please use `ray.train.get_checkpoint()` to access your checkpoint as a
`ray.train.Checkpoint` object instead. See below for an example:

Before
------

from ray import tune

def train_fn(config, checkpoint_dir=None):
    if checkpoint_dir:
        torch.load(os.path.join(checkpoint_dir, "checkpoint.pt"))
    ...

tuner = tune.Tuner(train_fn)
tuner.fit()

After
-----

from ray import train, tune

def train_fn(config):
    checkpoint: train.Checkpoint = train.get_checkpoint()
    if checkpoint:
        with checkpoint.as_directory() as checkpoint_dir:
            torch.load(os.path.join(checkpoint_dir, "checkpoint.pt"))
    ...

tuner = tune.Tuner(train_fn)
tuner.fit()

I don’t know whether this is a Ray Tune issue or a Hugging Face Trainer issue. If someone can point me in the right direction, I will open an issue or a PR in the appropriate repository.

Thank you
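For readers hitting the same error: the migration pattern described in Ray's deprecation message can be illustrated with a self-contained sketch. Note that `StubCheckpoint` below is a hypothetical stand-in that only mimics the `as_directory()` context-manager behavior of `ray.train.Checkpoint`; it is not the real Ray API, and `get_checkpoint` here is an injected callable rather than `ray.train.get_checkpoint()`.

```python
import os
import tempfile
from contextlib import contextmanager

# Hypothetical stand-in for ray.train.Checkpoint -- illustration only.
class StubCheckpoint:
    def __init__(self, directory):
        self._directory = directory

    @contextmanager
    def as_directory(self):
        # The real Checkpoint may materialize files into a temp dir;
        # here we simply hand back the local path.
        yield self._directory

def train_fn(config, get_checkpoint):
    """New-style training function: no `checkpoint_dir` parameter.

    The checkpoint is fetched inside the function instead of being
    passed in as an argument (the pattern the deprecation asks for).
    """
    checkpoint = get_checkpoint()
    if checkpoint:
        with checkpoint.as_directory() as checkpoint_dir:
            path = os.path.join(checkpoint_dir, "checkpoint.pt")
            with open(path) as f:
                return f.read()
    return None

# Demo: pretend a previous trial saved a checkpoint file.
with tempfile.TemporaryDirectory() as d:
    with open(os.path.join(d, "checkpoint.pt"), "w") as f:
        f.write("weights")
    restored = train_fn({}, lambda: StubCheckpoint(d))

print(restored)  # -> weights
```

With the real API, `train_fn` would take only `config` and call `ray.train.get_checkpoint()` directly, as shown in the "After" snippet of the error message above.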


ddelange commented Dec 1, 2023

potential duplicate of #39763


ddelange commented Dec 1, 2023

fixed by #40125 ?

@shamik-biswas-rft
Author

> fixed by #40125 ?

Yes, this will be fixed by that PR.
