timestep_limit of MountainCar-v0 #336
Comments
I don't see -195 for a threshold anywhere: I believe it's -110. Yes, the environment is hard and the timestep limit makes it harder. It's supposed to be challenging. Algorithms like https://gym.openai.com/evaluations/eval_DAj7EdpYTiO7m0H1f6xWw show that learning in this environment is possible. You can enforce the timestep limit in your agent, or not if you want to experiment with longer trials. Most agents (such as the one linked above) do.
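For illustration, a minimal sketch of enforcing the timestep limit on the agent's side, assuming the classic gym API where step() returns (observation, reward, done, info); the random policy is only a placeholder:

```python
import gym

env = gym.make('MountainCar-v0')
MAX_STEPS = 200  # the per-episode budget the agent chooses to respect

for episode in range(10):
    obs = env.reset()
    total_reward = 0.0
    for t in range(MAX_STEPS):
        action = env.action_space.sample()  # placeholder for a learned policy
        obs, reward, done, info = env.step(action)
        total_reward += reward
        if done:  # the environment also signals done at its own limit
            break
    print(episode, total_reward)
```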
Thanks for the response. Ah, you are right, the reward threshold is -110. Hmm, interesting example submission. But the visualization on that submission seems off (strangely, the plotted line didn't pass the threshold).
Hi, I create the environment with env = gym.make('MountainCar-v0'), but it still remains capped at 200 steps.
You might notice that, unlike many other environments, this environment
Well, it is not allowing me to continue calling the 'step' function after the episode has taken 200 steps. It gives me the following error:
So, it is forcing me to call the 'reset' function. My problem is that I start off taking random actions to explore the environment. However, 200 steps are turning out to be too few for it to reach the goal and hence learn anything.
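A quick sketch of working around this inside the agent loop, assuming the standard gym API: reset whenever done is reported instead of stepping past the end of the episode (the random policy is just for illustration):

```python
import gym

env = gym.make('MountainCar-v0')
obs = env.reset()
for t in range(10000):  # total exploration budget spread over many episodes
    action = env.action_space.sample()
    obs, reward, done, info = env.step(action)
    if done:  # either the goal was reached or the 200-step limit was hit
        obs = env.reset()
```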
I checked on the master branch (
Thanks for your replies. One of the ways worked. I edited '__init__.py' under 'gym/envs/' to increase the maximum allowed steps in an episode. It takes effect immediately.
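As an alternative to editing gym/envs/__init__.py, one can register a renamed variant with a larger step budget, which also sidesteps the scoreboard concern below. A sketch, assuming gym's registration API (the id 'MountainCarLong-v0' is made up for illustration; very old gym versions used a timestep_limit keyword instead of max_episode_steps):

```python
import gym
from gym.envs.registration import register

register(
    id='MountainCarLong-v0',                                 # illustrative name, not an official env
    entry_point='gym.envs.classic_control:MountainCarEnv',   # same underlying dynamics
    max_episode_steps=1000,                                  # larger budget than the default 200
    reward_threshold=-110.0,                                 # keep the original solve threshold
)

env = gym.make('MountainCarLong-v0')
```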
@sanjaythakur I would recommend consulting Example 8.2 in Reinforcement Learning: An Introduction by Sutton and Barto for a principled treatment.
Yeah, I too feel making an informed decision based on planning would help more. Thanks, will do that. |
If you modify the gym environment without changing the name, please don't submit any results to the scoreboard, as they are not comparable with other people's scores.
@tlbtlbtlb , I'll keep that in mind. |
@tlbtlbtlb Hi, can you help me with this? I am new to OpenAI Gym and have to create a new environment for an autonomous drone, so I am defining the _step() and _reset() functions in my env class. Please help me with these errors. Also, can you explain the action argument of the step function? Since we provide the action and the function returns the observation, reward, and done, why are we passing action as an argument?
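For what it's worth, a rough sketch of an old-style gym environment that shows the role of the action argument: the agent picks the action, and step(action) applies it to the environment's internal state and reports the resulting observation, reward, and done flag. DroneEnv, its spaces, and its toy dynamics below are made-up placeholders, not a real drone API:

```python
import gym
import numpy as np
from gym import spaces

class DroneEnv(gym.Env):
    def __init__(self):
        self.action_space = spaces.Discrete(4)                       # e.g. four thrust commands
        self.observation_space = spaces.Box(-1.0, 1.0, shape=(3,))
        self.state = np.zeros(3)

    def _step(self, action):
        # apply the chosen action to the simulated state
        self.state = np.clip(self.state + 0.01 * (action - 1.5), -1.0, 1.0)
        reward = -np.abs(self.state).sum()                           # toy shaping reward
        done = bool(np.abs(self.state).max() >= 1.0)                 # toy termination condition
        return self.state.copy(), reward, done, {}

    def _reset(self):
        self.state = np.zeros(3)
        return self.state.copy()
```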
@shristi945 for basic questions and discussion, you might want to consult https://discuss.openai.com/ first and reserve issues for more technical, implementation-oriented things.
@falcondai Thanks for informing me about where to discuss basic things. I have resolved my problem now.
The problem is specifically designed to be hard for policies that try to hit the answer randomly, and it rewards methods that explore properly. If you increase the time limit, you are changing the environment and thus solving a different problem. The same can be said of modifying the reward function to make the problem solvable.
Try this to initialize your environment: it increases the upper bound on the number of trials. When you visualize your learnt policy, initialize your environment normally. I don't know the reason yet, but my learnt policy works correctly only if I initialize the environment the normal way.
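A hedged sketch of one way such an initialization might look, assuming gym.make wraps the environment in a TimeLimit wrapper that .env strips off:

```python
import gym

train_env = gym.make('MountainCar-v0').env   # unwrapped: episodes are not cut off at 200 steps
eval_env = gym.make('MountainCar-v0')        # normal initialization for visualization/scoring
```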
If anyone needs any help, here's how you fix the TimeLimit error: |
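One commonly used workaround looks like the sketch below; note that _max_episode_steps is a private attribute of the TimeLimit wrapper and may differ or disappear across gym versions:

```python
import gym

env = gym.make('MountainCar-v0')
env._max_episode_steps = 1000   # default is 200; step() no longer ends the episode early
```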
Currently in the MountainCar-v0 environment, the timestep_limit is 200, which makes learning very difficult: most initial policies will run out of time before reaching the goal and end up receiving the same reward (-200). Note that the solution threshold is -110, i.e. reaching the goal in 110 timesteps. I would suggest increasing this limit.
I notice that this time limit is only enforced when monitoring is on. I wonder why such a limit is put into the monitoring, since it creates a difference between a monitored and a non-monitored environment. For performance comparison's sake, timestep counts might be a better measure.
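A quick way to inspect the registered limit the issue refers to (the attribute name varies with the gym version, so this is a best-effort sketch):

```python
import gym

env = gym.make('MountainCar-v0')
# Older gym releases exposed spec.timestep_limit, later ones spec.max_episode_steps.
limit = getattr(env.spec, 'max_episode_steps', getattr(env.spec, 'timestep_limit', None))
print(limit)  # 200 for MountainCar-v0
```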