timestep_limit of MountainCar-v0 #336
Comments
I don't see -195 for a threshold anywhere: I believe it's -110. Yes, the environment is hard and the timestep limit makes it harder. It's supposed to be challenging. Algorithms like https://gym.openai.com/evaluations/eval_DAj7EdpYTiO7m0H1f6xWw show that learning in this environment is possible. You can enforce the timestep limit in your agent, or not if you want to experiment with longer trials. Most agents (such as the one linked above) do.
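For illustration, a minimal sketch of enforcing the timestep limit on the agent's side, assuming the classic gym API where step() returns (observation, reward, done, info); the random policy is only a placeholder:

```python
import gym

env = gym.make('MountainCar-v0')
MAX_STEPS = 200  # the per-episode budget the agent chooses to respect

for episode in range(10):
    obs = env.reset()
    total_reward = 0.0
    for t in range(MAX_STEPS):
        action = env.action_space.sample()  # placeholder for a learned policy
        obs, reward, done, info = env.step(action)
        total_reward += reward
        if done:  # the environment also signals done at its own limit
            break
    print(episode, total_reward)
```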
Thanks for the response. Ah, you are right, the reward threshold is -110. Hmm, interesting example submission. But the visualization on that submission seems off (strangely, the plotted line didn't pass the threshold).
Hi, I create the environment with env = gym.make('MountainCar-v0'), but it still remains capped at 200 steps.
You might notice that, unlike many other environments, this environment
Well, it is not allowing me to continue calling the 'step' function after the episode has taken 200 steps. It gives me the following error:
So, it is forcing me to call the 'reset' function. My problem is that I start off taking random actions to explore the environment. However, 200 steps are turning out to be too few for it to reach the goal and hence learn anything.
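A quick sketch of working around this inside the agent loop, assuming the standard gym API: reset whenever done is reported instead of stepping past the end of the episode (the random policy is just for illustration):

```python
import gym

env = gym.make('MountainCar-v0')
obs = env.reset()
for t in range(10000):  # total exploration budget spread over many episodes
    action = env.action_space.sample()
    obs, reward, done, info = env.step(action)
    if done:  # either the goal was reached or the 200-step limit was hit
        obs = env.reset()
```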
I checked on the master branch (
Thanks for your replies. One of the ways worked. I edited '__init__.py' under 'gym/envs/' to increase the maximum allowed steps in an episode. It takes effect immediately.
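As an alternative to editing gym/envs/__init__.py, one can register a renamed variant with a larger step budget, which also sidesteps the scoreboard concern below. A sketch, assuming gym's registration API (the id 'MountainCarLong-v0' is made up for illustration; very old gym versions used a timestep_limit keyword instead of max_episode_steps):

```python
import gym
from gym.envs.registration import register

register(
    id='MountainCarLong-v0',                                 # illustrative name, not an official env
    entry_point='gym.envs.classic_control:MountainCarEnv',   # same underlying dynamics
    max_episode_steps=1000,                                  # larger budget than the default 200
    reward_threshold=-110.0,                                 # keep the original solve threshold
)

env = gym.make('MountainCarLong-v0')
```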
@sanjaythakur I would recommend consulting Example 8.2 in Reinforcement Learning: An Introduction by Sutton and Barto for a principled treatment.
Yeah, I too feel making an informed decision based on planning would help more. Thanks, will do that. |
If you modify the gym environment without changing the name, please don't submit any results to the scoreboard, as they are not comparable with other people's scores.
@tlbtlbtlb , I'll keep that in mind. |
@tlbtlbtlb Hi, can you help me with this? I am new to OpenAI Gym and have to create a new environment for an autonomous drone, so I am defining the _step() and _reset() functions in my env class. Please help me with these errors. Also, can you explain the action argument of the step function? Since we provide the action and the function returns the observation, reward, and done, why are we passing action as an argument?
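For what it's worth, a rough sketch of an old-style gym environment that shows the role of the action argument: the agent picks the action, and step(action) applies it to the environment's internal state and reports the resulting observation, reward, and done flag. DroneEnv, its spaces, and its toy dynamics below are made-up placeholders, not a real drone API:

```python
import gym
import numpy as np
from gym import spaces

class DroneEnv(gym.Env):
    def __init__(self):
        self.action_space = spaces.Discrete(4)                       # e.g. four thrust commands
        self.observation_space = spaces.Box(-1.0, 1.0, shape=(3,))
        self.state = np.zeros(3)

    def _step(self, action):
        # apply the chosen action to the simulated state
        self.state = np.clip(self.state + 0.01 * (action - 1.5), -1.0, 1.0)
        reward = -np.abs(self.state).sum()                           # toy shaping reward
        done = bool(np.abs(self.state).max() >= 1.0)                 # toy termination condition
        return self.state.copy(), reward, done, {}

    def _reset(self):
        self.state = np.zeros(3)
        return self.state.copy()
```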
@shristi945 for basic questions and discussion, you might want to consult https://discuss.openai.com/ first and reserve issues for more technical, implementation-oriented things.
@falcondai Thanks for informing me about where to discuss basic things. I have resolved my problem now.
The problem is specifically designed to be hard for policies that try to hit the answer randomly, and it rewards methods that explore properly. If you increase the time limit, you are changing the environment and thus solving a different problem. The same can be said of modifying the reward function to make the problem solvable.
Try this to initialize your environment: it increases the upper bound on the number of trials. When you visualize your learnt policy, initialize your environment normally. I don't know the reason yet, but my learnt policy works correctly only if I initialize the environment the normal way.
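A hedged sketch of one way such an initialization might look, assuming gym.make wraps the environment in a TimeLimit wrapper that .env strips off:

```python
import gym

train_env = gym.make('MountainCar-v0').env   # unwrapped: episodes are not cut off at 200 steps
eval_env = gym.make('MountainCar-v0')        # normal initialization for visualization/scoring
```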
If anyone needs any help, here's how you fix the TimeLimit error: |
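One commonly used workaround looks like the sketch below; note that _max_episode_steps is a private attribute of the TimeLimit wrapper and may differ or disappear across gym versions:

```python
import gym

env = gym.make('MountainCar-v0')
env._max_episode_steps = 1000   # default is 200; step() no longer ends the episode early
```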
Currently in the MountainCar-v0 environment, the timestep_limit is 200, which makes learning very difficult: most initial policies will run out of time before reaching the goal and end up receiving the same reward (-200). Note that the solution threshold is -110, i.e. reaching the goal in 110 timesteps. I would suggest increasing this limit.
I notice that this time limit is only enforced when monitoring is on. I wonder why such a limit is put into the monitoring, since it creates a difference between a monitored and a non-monitored environment. For performance comparison's sake, timestep counts might be a better measure.
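A quick way to inspect the registered limit the issue refers to (the attribute name varies with the gym version, so this is a best-effort sketch):

```python
import gym

env = gym.make('MountainCar-v0')
# Older gym releases exposed spec.timestep_limit, later ones spec.max_episode_steps.
limit = getattr(env.spec, 'max_episode_steps', getattr(env.spec, 'timestep_limit', None))
print(limit)  # 200 for MountainCar-v0
```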