trl CLI doesn't work #1716

Closed
nikhil-tensorwave opened this issue Jun 7, 2024 · 6 comments


@nikhil-tensorwave

When I try to run the trl CLI, I keep getting "No such file or directory" errors. It seems the CLI commands do not point to the correct location of the example scripts.

@younesbelkada
Contributor

Hi @nikhil-tensorwave
Thanks for the issue. I just tried the CLI in a fresh Google Colab environment with trl==0.9.4 and it worked. Can you make sure you have the latest version of TRL installed? Run: pip install -U trl

@nikhil-tensorwave
Author

I do have the latest version of TRL installed. Which directory am I supposed to run the CLI from? When I run it from the main directory, I get this error:
/usr/bin/python3: can't open file '/home/root/build/trl/trl/commands/scripts/sft.py': [Errno 2] No such file or directory


This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

@qgallouedec
Member

qgallouedec commented Jul 22, 2024

This issue occurs only when installing from source
EDIT: This issue occurs only when installing from source in editable mode (pip install -e)

(python 3.10.14)

$ python -m venv .env
$ source .env/bin/activate
$ git clone https://github.com/huggingface/trl
Cloning into 'trl'...
remote: Enumerating objects: 8237, done.
remote: Counting objects: 100% (1283/1283), done.
remote: Compressing objects: 100% (212/212), done.
remote: Total 8237 (delta 1193), reused 1083 (delta 1065), pack-reused 6954
Receiving objects: 100% (8237/8237), 6.78 MiB | 37.72 MiB/s, done.
Resolving deltas: 100% (5701/5701), done.
$ cd trl
$ pip install -e .
[...]
$ trl sft --config examples/cli_configs/example_config.yaml --output_dir test-trl-cli --lr_scheduler_type cosine_with_restarts
The following values were not passed to `accelerate launch` and had defaults used instead:
        `--num_processes` was set to a value of `1`
        `--num_machines` was set to a value of `1`
        `--mixed_precision` was set to a value of `'no'`
        `--dynamo_backend` was set to a value of `'no'`
To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
/fsx/qgallouedec/tmp/.env/bin/python3.12: can't open file '/fsx/qgallouedec/tmp/trl/trl/commands/scripts/sft.py': [Errno 2] No such file or directory
Traceback (most recent call last):
  File "/fsx/qgallouedec/tmp/.env/bin/accelerate", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/fsx/qgallouedec/tmp/.env/lib/python3.12/site-packages/accelerate/commands/accelerate_cli.py", line 48, in main
    args.func(args)
  File "/fsx/qgallouedec/tmp/.env/lib/python3.12/site-packages/accelerate/commands/launch.py", line 1097, in launch_command
    simple_launcher(args)
  File "/fsx/qgallouedec/tmp/.env/lib/python3.12/site-packages/accelerate/commands/launch.py", line 703, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/fsx/qgallouedec/tmp/.env/bin/python3.12', '/fsx/qgallouedec/tmp/trl/trl/commands/scripts/sft.py', '--config', 'examples/cli_configs/example_config.yaml', '--output_dir', 'test-trl-cli', '--lr_scheduler_type', 'cosine_with_restarts']' returned non-zero exit status 2.
[10:09:51] TRL - SFT failed on ! See the logs above for further details.    cli.py:67
Traceback (most recent call last):
  File "/fsx/qgallouedec/tmp/trl/trl/commands/cli.py", line 58, in main
    subprocess.run(
  File "/fsx/qgallouedec/miniconda3/lib/python3.12/subprocess.py", line 571, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['accelerate', 'launch', '/fsx/qgallouedec/tmp/trl/trl/commands/scripts/sft.py', '--config', 'examples/cli_configs/example_config.yaml', '--output_dir', 'test-trl-cli', '--lr_scheduler_type', 'cosine_with_restarts']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/fsx/qgallouedec/tmp/.env/bin/trl", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/fsx/qgallouedec/tmp/trl/trl/commands/cli.py", line 68, in main
    raise ValueError("TRL CLI failed! Check the traceback above..") from exc
ValueError: TRL CLI failed! Check the traceback above..
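
The traceback shows the CLI passing `accelerate launch` a script path inside the package directory (`trl/commands/scripts/sft.py`), and that file is missing. A minimal Python sketch for checking whether the file exists in a given install follows. The helper names `expected_script_path` and `script_is_available` are hypothetical, and the `commands/scripts/<command>.py` layout is inferred from the traceback above only, not from TRL's source.

```python
from pathlib import Path


def expected_script_path(package_dir: str, command: str) -> Path:
    """Build <package_dir>/commands/scripts/<command>.py.

    Assumption: this mirrors the path seen in the traceback; it is not
    taken from TRL's own code.
    """
    return Path(package_dir) / "commands" / "scripts" / f"{command}.py"


def script_is_available(package_dir: str, command: str) -> bool:
    """Return True if the script for `command` exists under `package_dir`."""
    return expected_script_path(package_dir, command).is_file()


if __name__ == "__main__":
    import importlib.util

    # Locate the installed (or editable) trl package without importing it.
    spec = importlib.util.find_spec("trl")
    if spec is None or not spec.submodule_search_locations:
        print("trl is not importable in this environment")
    else:
        package_dir = list(spec.submodule_search_locations)[0]
        path = expected_script_path(package_dir, "sft")
        status = "found" if path.is_file() else "missing"
        print(f"{path}: {status}")
```

If the script is reported as missing, the `trl sft` command would be expected to fail in the way shown above.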

@qgallouedec qgallouedec reopened this Jul 22, 2024

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

@qgallouedec
Member

Instead of

git clone https://github.com/huggingface/trl
cd trl
pip install -e .

use

git clone https://github.com/huggingface/trl
cd trl
make dev
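
Since the failure is tied to editable installs, it can help to confirm which kind of install is active. One option is to read the `direct_url.json` metadata that recent pip versions record for installs from a local directory or URL (PEP 610). This is a sketch under that assumption. The helper names `is_editable_record` and `is_editable_install` are hypothetical, and the code only reports whether the install is editable; it does not check that the CLI scripts are present.

```python
import json
from importlib import metadata
from typing import Optional


def is_editable_record(direct_url_text: Optional[str]) -> bool:
    """Interpret the contents of a PEP 610 direct_url.json file.

    Returns True only if the record marks the install as editable.
    """
    if not direct_url_text:
        return False
    record = json.loads(direct_url_text)
    return bool(record.get("dir_info", {}).get("editable", False))


def is_editable_install(dist_name: str) -> bool:
    """Return True if `dist_name` is installed in editable mode.

    Assumption: the installer wrote direct_url.json; if the file is
    absent (e.g. a plain install from PyPI), this returns False.
    """
    try:
        dist = metadata.distribution(dist_name)
    except metadata.PackageNotFoundError:
        return False
    return is_editable_record(dist.read_text("direct_url.json"))


if __name__ == "__main__":
    print("trl editable install:", is_editable_install("trl"))
```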
