
timestep_limit of MountainCar-v0 #336

Closed
falcondai opened this issue Sep 8, 2016 · 17 comments

@falcondai
Contributor

falcondai commented Sep 8, 2016

Currently in the MountainCar-v0 environment, the timestep_limit is 200, which makes learning very difficult: most initial policies will run out of time before reaching the goal and end up receiving the same reward (-200). Note that the solution threshold is -110 (edited; originally I wrote -195), i.e. reaching the goal within 110 timesteps. I would suggest increasing this limit.

I notice that this time limit is only enforced when monitoring is on. I wonder why such a limit is put into the monitoring, since it creates a difference between a monitored and a non-monitored environment. For performance comparison's sake, timestep counts might be a better measure.
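For illustration, a minimal sketch (assuming the classic 4-tuple step API, with the 200-step cap enforced in the agent loop) of how a random policy's returns collapse to -200:

import gym

# Sketch: under a 200-step cap, a random policy essentially never
# reaches the flag, so every episode return is -200 (-1 per step).
env = gym.make('MountainCar-v0')
returns = []
for episode in range(100):
    env.reset()
    total = 0.0
    for t in range(200):
        _, reward, done, _ = env.step(env.action_space.sample())
        total += reward
        if done:  # reached the goal before the cap (rare for random actions)
            break
    returns.append(total)
print(min(returns), max(returns))  # typically -200.0 -200.0
env.close()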

@tlbtlbtlb
Contributor

I don't see -195 for a threshold anywhere: I believe it's -110.

Yes, the environment is hard and the timestep limit makes it harder. It's supposed to be challenging. Algorithms like https://gym.openai.com/evaluations/eval_DAj7EdpYTiO7m0H1f6xWw show that learning in this environment is possible.

You can enforce the timestep limit in your agent yourself, or skip it if you want to experiment with longer trials. Most agents (such as the one linked above) enforce it.

@falcondai
Contributor Author

Thanks for the response. Ah, you are right, the threshold is -110. Interesting example submission, but the visualization on that submission seems off (strangely, the plotted line never passes the threshold).

@sanjaythakur

Hi,
I was trying to raise the maximum number of steps per episode on the MountainCar environment.
I used this

env = gym.make('MountainCar-v0')
env.max_episode_steps = 500

But it still remains capped at 200.
I also tried creating a new register entry, but it gave me an 'UnregisteredEnv' error.
Can anyone give me some idea on how I should go about increasing the upper bound on episode size?
Thanks!!

@falcondai
Contributor Author

You might notice that, unlike many other environments, MountainCar-v0 allows you to continue to step even after an episode has ended: just ignore the returned done value.
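For example (a sketch assuming no monitor is attached and the classic 4-tuple step API; exact behavior can vary across gym versions):

import gym

env = gym.make('MountainCar-v0')
obs = env.reset()
for t in range(1000):  # deliberately step past the nominal 200-step limit
    obs, reward, done, info = env.step(env.action_space.sample())
    # Ignore `done`: without an active monitor, the dynamics keep
    # evolving, so stepping beyond the limit is harmless here.
env.close()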

@sanjaythakur

Well, it is not allowing me to continue calling the 'step' function after the episode has taken 200 steps. It gives me the following error:

raise error.ResetNeeded("Trying to step environment which is currently done. While the monitor is active for {}, you cannot step beyond the end of an episode. Call 'env.reset()' to start the next episode.".format(self.env_id))
gym.error.ResetNeeded: Trying to step environment which is currently done. While the monitor is active for MountainCar-v0, you cannot step beyond the end of an episode. Call 'env.reset()' to start the next episode.

So it is forcing me to call the 'reset' function. My problem is that I am starting off taking random actions so as to explore the environment. However, 200 steps are turning out to be too few to reach the goal and hence learn anything.

@falcondai
Contributor Author

falcondai commented Mar 9, 2017

I checked on the master branch (gym.__version__ = '0.7.4-dev') and it works fine without reset. As noted in the earlier discussion, it is possible to learn in this strict setting. MountainCar is a classic task for investigating the problem of exploration in RL. You are right that if an agent explores only by random actions, it is very unlikely to reach the goal in time, since it would often undo its gained momentum; but that is exactly the issue with so-called naive exploration.

@sanjaythakur

Thanks for your replies. One of the ways worked: I edited '__init__.py' under 'gym/envs/' to increase the maximum allowed steps in an episode. It takes effect immediately.
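For reference, a less invasive alternative to editing gym's source is to register a variant under a new id (a sketch using the gym.envs.registration API; 'MountainCarLong-v0' is a made-up name), which also leaves the standard benchmark untouched:

import gym
from gym.envs.registration import register

# Register a variant under a new id so MountainCar-v0 itself is unchanged;
# max_episode_steps is the registry field behind the 200-step cap.
register(
    id='MountainCarLong-v0',  # hypothetical id
    entry_point='gym.envs.classic_control:MountainCarEnv',
    max_episode_steps=500,
)

env = gym.make('MountainCarLong-v0')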

@falcondai
Contributor Author

falcondai commented Mar 9, 2017

@sanjaythakur I would recommend consulting Example 8.2 in Reinforcement Learning: An Introduction by Sutton and Barto for a principled treatment.

@sanjaythakur

Yeah, I too feel that making an informed decision based on planning would help more. Thanks, will do that.

@tlbtlbtlb
Contributor

If you modify the gym environment without changing the name, please don't submit any results to the scoreboard, as they are not comparable with other people's scores.

@sanjaythakur

@tlbtlbtlb , I'll keep that in mind.

@shristi945

@tlbtlbtlb Hi, can you help me with this? I am new to OpenAI Gym and have to create a new environment for an autonomous drone, hence I am defining the _step() and _reset() functions in my env class.
This is the code for my environment:
[env_code screenshot]
And I am getting these errors:
[env_error screenshot]

Please help me with these errors. Also, can you explain the action argument of the step function? Since the environment returns the observation, reward, and done flag, why do we pass action as an argument?
It would be helpful if I could get a quick reply.
Thanks in advance.

@falcondai
Contributor Author

@shristi945 For basic questions/discussion, you might want to consult https://discuss.openai.com/ first and reserve issues for more technical, implementation-oriented things. The action is the action chosen by your agent operating in the Env, and the environment changes depending on the action taken; that is why Env.step takes action as an argument. You can read more here and in various tutorials.
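For reference, a bare-bones skeleton of the kind of Env subclass being described (a sketch only: the spaces, dynamics, and names here are made up, and gym versions of that era expected the underscore-prefixed _step/_reset to be overridden):

import numpy as np
import gym
from gym import spaces

class MyDroneEnv(gym.Env):
    # Hypothetical minimal environment skeleton.

    def __init__(self):
        self.action_space = spaces.Discrete(4)  # e.g. four movement commands
        self.observation_space = spaces.Box(low=-1.0, high=1.0, shape=(3,))
        self.state = np.zeros(3)

    def _step(self, action):
        # `action` is whatever the agent chose; the environment needs it
        # to compute the next state, which is why step takes it as input.
        self.state = np.clip(self.state + 0.01 * (action - 1.5), -1.0, 1.0)
        reward = -float(np.linalg.norm(self.state))  # closer to origin is better
        done = bool(np.linalg.norm(self.state) < 0.05)
        return self.state, reward, done, {}

    def _reset(self):
        self.state = np.random.uniform(-1.0, 1.0, size=3)
        return self.state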

@shristi945

@falcondai Thanks for informing me about where to discuss basic things. I have resolved my problem now.

@raul-mdelfin

The problem is specifically designed to be hard for policies that try to find the answer randomly, and it rewards methods that explore systematically. If you increase the time limit, you are changing the environment and thus solving a different problem. The same can be said for those who modify the reward function to solve the problem.

@ZainBashir

ZainBashir commented Jun 24, 2019

> Hi,
> I was trying to raise the maximum number of steps per episode on the MountainCar environment.
> I used this
>
> env = gym.make('MountainCar-v0')
> env.max_episode_steps = 500
>
> But it still remains capped at 200.
> I also tried creating a new register entry, but it gave me an 'UnregisteredEnv' error.
> Can anyone give me some idea on how I should go about increasing the upper bound on episode size?
> Thanks!!

Try this to initialize your environment:
env = gym.make('MountainCar-v0').env

This removes the 200-step upper bound on episode length (the .env attribute gives you the raw environment underneath the TimeLimit wrapper).

When you visualize your learnt policy, initialize your environment normally:
env = gym.make('MountainCar-v0')

I don't know the reason yet, but my learnt policy works correctly only if I initialize my environment the normal way.
Hope it works.
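One pattern that matches this observation (a sketch; the .env attribute bypasses just the TimeLimit wrapper) is to train on the uncapped environment but evaluate on the standard one, so reported scores stay comparable:

import gym

train_env = gym.make('MountainCar-v0').env  # raw env, no TimeLimit wrapper
eval_env = gym.make('MountainCar-v0')       # standard 200-step cap for scoring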

@QasimWani

If anyone needs any help, here's how you fix the TimeLimit error:

env_name = "Taxi-v3"
env = gym.make(env_name)
env = env.unwrapped  # gets rid of TimeLimit
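(With a single TimeLimit wrapper this is equivalent to the .env trick above: .env peels off one wrapper, while unwrapped strips all wrappers down to the raw environment.)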
