trl CLI doesn't work #1716
Comments
Hi @nikhil-tensorwave |
I do have the latest version of TRL installed. Where am I supposed to run the CLI? When I run it in the main directory, I get this error: |
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. |
(python 3.10.14) $ python -m venv .env
$ source .env/bin/activate
$ git clone https://github.com/huggingface/trl
Cloning into 'trl'...
remote: Enumerating objects: 8237, done.
remote: Counting objects: 100% (1283/1283), done.
remote: Compressing objects: 100% (212/212), done.
remote: Total 8237 (delta 1193), reused 1083 (delta 1065), pack-reused 6954
Receiving objects: 100% (8237/8237), 6.78 MiB | 37.72 MiB/s, done.
Resolving deltas: 100% (5701/5701), done.
$ cd trl
$ pip install -e .
[...]
$ trl sft --config examples/cli_configs/example_config.yaml --output_dir test-trl-cli --lr_scheduler_type cosine_with_restarts
The following values were not passed to `accelerate launch` and had defaults used instead:
`--num_processes` was set to a value of `1`
`--num_machines` was set to a value of `1`
`--mixed_precision` was set to a value of `'no'`
`--dynamo_backend` was set to a value of `'no'`
To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
/fsx/qgallouedec/tmp/.env/bin/python3.12: can't open file '/fsx/qgallouedec/tmp/trl/trl/commands/scripts/sft.py': [Errno 2] No such file or directory
Traceback (most recent call last):
File "/fsx/qgallouedec/tmp/.env/bin/accelerate", line 8, in <module>
sys.exit(main())
^^^^^^
File "/fsx/qgallouedec/tmp/.env/lib/python3.12/site-packages/accelerate/commands/accelerate_cli.py", line 48, in main
args.func(args)
File "/fsx/qgallouedec/tmp/.env/lib/python3.12/site-packages/accelerate/commands/launch.py", line 1097, in launch_command
simple_launcher(args)
File "/fsx/qgallouedec/tmp/.env/lib/python3.12/site-packages/accelerate/commands/launch.py", line 703, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/fsx/qgallouedec/tmp/.env/bin/python3.12', '/fsx/qgallouedec/tmp/trl/trl/commands/scripts/sft.py', '--config', 'examples/cli_configs/example_config.yaml', '--output_dir', 'test-trl-cli', '--lr_scheduler_type', 'cosine_with_restarts']' returned non-zero exit status 2.
[10:09:51] TRL - SFT failed on ! See the logs above for further details. cli.py:67
Traceback (most recent call last):
File "/fsx/qgallouedec/tmp/trl/trl/commands/cli.py", line 58, in main
subprocess.run(
File "/fsx/qgallouedec/miniconda3/lib/python3.12/subprocess.py", line 571, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['accelerate', 'launch', '/fsx/qgallouedec/tmp/trl/trl/commands/scripts/sft.py', '--config', 'examples/cli_configs/example_config.yaml', '--output_dir', 'test-trl-cli', '--lr_scheduler_type', 'cosine_with_restarts']' returned non-zero exit status 1.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/fsx/qgallouedec/tmp/.env/bin/trl", line 8, in <module>
sys.exit(main())
^^^^^^
File "/fsx/qgallouedec/tmp/trl/trl/commands/cli.py", line 68, in main
raise ValueError("TRL CLI failed! Check the traceback above..") from exc
ValueError: TRL CLI failed! Check the traceback above.. |
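A quick way to check what the traceback above points at: the launcher is handed a script path under the installed package (`.../trl/commands/scripts/sft.py`) and that file is not on disk. Below is a minimal sketch that locates the importable `trl` package and reports whether that script exists. The `commands/scripts/<command>.py` layout is inferred only from the paths in this log, not from TRL's documentation, and the helper names `package_root` and `expected_script` are hypothetical.

```python
from importlib.util import find_spec
from pathlib import Path
from typing import Optional


def package_root(name: str = "trl") -> Optional[Path]:
    """Return the directory of an importable package, or None if it cannot be found."""
    try:
        spec = find_spec(name)
    except (ImportError, ValueError):
        return None
    if spec is None or spec.origin is None:
        return None
    # For a regular package, spec.origin is the path to its __init__.py.
    return Path(spec.origin).parent


def expected_script(root: Path, command: str) -> Path:
    # ASSUMPTION: this layout is taken from the traceback in this thread
    # (.../trl/commands/scripts/sft.py); it may differ between TRL versions.
    return root / "commands" / "scripts" / f"{command}.py"


if __name__ == "__main__":
    root = package_root("trl")
    if root is None:
        print("trl is not importable in this environment")
    else:
        script = expected_script(root, "sft")
        status = "found" if script.is_file() else "MISSING"
        print(f"{script}: {status}")
```

If this reports the script as missing, the failure comes from the install layout rather than from the directory the command was run in.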
Instead of
use
|
When I try to run the trl CLI, I keep getting "File or directory not found" errors. It seems the CLI commands are not correctly pointing to the example scripts.