Unexpected policy behavior in halfcheetah ARS example #74
Comments
The example was removed in the latest version, but the mechanism still exists for people to create their own version. The ant ARS example should work, but might require some tuning of hyperparameters to make it walk properly.
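For readers unsure which hyperparameters are meant here, the following is a minimal sketch of one basic ARS update (random directions, top-b selection, reward-scaled step) in plain Python. It is not Dojo.jl's implementation; `ars_step`, `toy_rollout`, `TARGET`, and every default value are hypothetical names and numbers chosen for illustration, and the quadratic toy reward only stands in for a real environment rollout.

```python
import random
import statistics

def ars_step(theta, rollout, rng, n_dirs=8, top_b=4, step_size=0.02, noise=0.03):
    """One basic ARS update on a flat parameter list (no observation normalization)."""
    # Sample random search directions, one per pair of rollouts.
    deltas = [[rng.gauss(0.0, 1.0) for _ in theta] for _ in range(n_dirs)]
    results = []
    for d in deltas:
        r_plus = rollout([t + noise * di for t, di in zip(theta, d)])
        r_minus = rollout([t - noise * di for t, di in zip(theta, d)])
        results.append((r_plus, r_minus, d))
    # Keep only the top_b directions, ranked by their better rollout.
    results.sort(key=lambda item: max(item[0], item[1]), reverse=True)
    top = results[:top_b]
    # Scale the step by the standard deviation of the rewards that were kept.
    sigma_r = statistics.pstdev([r for rp, rm, _ in top for r in (rp, rm)]) or 1.0
    scale = step_size / (top_b * sigma_r)
    return [
        t + scale * sum((rp - rm) * d[i] for rp, rm, d in top)
        for i, t in enumerate(theta)
    ]

# Hypothetical stand-in for an environment rollout: reward peaks at TARGET.
TARGET = [1.0, -2.0, 0.5]

def toy_rollout(theta):
    return -sum((t - g) ** 2 for t, g in zip(theta, TARGET))

rng = random.Random(0)
theta = [0.0, 0.0, 0.0]
initial_reward = toy_rollout(theta)
for _ in range(200):
    theta = ars_step(theta, toy_rollout, rng)
final_reward = toy_rollout(theta)
print(initial_reward, final_reward)
```

The knobs that usually need tuning in a setup like this are the number of directions, how many of them are kept, the step size, and the exploration noise; suitable values for the ant are not given in this thread.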
Thanks so much! Do you know why the previous version using halfcheetah didn't work? Are there any specific mechanism parameters or functions I would need to implement to create an environment for halfcheetah similar to the new ant ARS?
Not exactly sure what the issue was before, but there were a lot of changes to the simulation and contact behavior. I believe training success is rather sensitive to these parameters and to the reward function, so that could be what broke it. As a rough starting point:
If you get something to work, or have improvements to the ant, please open a pull request and we can integrate it.
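To illustrate why the reward function matters so much for a halfcheetah-style task, here is a small sketch of a common shaping: forward velocity minus a quadratic control cost. This is an assumption modeled on Gym-style locomotion rewards, not the reward Dojo.jl used; the function name, the weights, and `dt` are all hypothetical.

```python
def halfcheetah_style_reward(x_before, x_after, action, dt=0.05,
                             forward_weight=1.0, ctrl_weight=0.1):
    """Forward velocity minus a quadratic control cost (assumed shaping)."""
    # Velocity along the axis that is treated as "forward".
    x_velocity = (x_after - x_before) / dt
    # Penalize large actuator commands.
    ctrl_cost = ctrl_weight * sum(a * a for a in action)
    return forward_weight * x_velocity - ctrl_cost

no_torque = [0.0] * 6
forward = halfcheetah_style_reward(0.0, 0.05, no_torque)
backward = halfcheetah_style_reward(0.0, -0.05, no_torque)
print(forward, backward)
```

With a reward of this shape, a policy that drifts backward scores negative, so a persistently negative mean reward together with backward motion suggests the policy is not being pushed toward forward progress. It may be worth checking the sign and axis of the velocity term and the relative size of the control cost when building a custom environment.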
Running the halfcheetah_ars.jl example, I expected to see policy behavior similar to what is shown in the docs. Instead, I see that ARS gets a mean reward of around -23 and the resulting policy tends to move backward. Is this the expected behavior? I'm using Julia 1.8, Ubuntu 20.04, and the main branch of Dojo.jl.