Dev nogym (#160)
* Starting removal of gym, testing of Gymnasium

- PyBullet is not working; added more guards to fix the issue
- refactored seed in environment class
- Gymnasium class now seems to work
- Atari class still seems to have some issues

* Minor fixes in style

* [WIP] Support for Gymnasium Atari

* Add full Gymnasium support for Atari

* Update continuous_integration.yml

* Update test_pr.yml

---------

Co-authored-by: boris-il-forte <[email protected]>
Co-authored-by: Davide Tateo <[email protected]>
3 people authored Feb 5, 2025
1 parent d567c45 commit acef38e
Showing 31 changed files with 223 additions and 566 deletions.
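For readers skimming the diff below, the user-facing effect of this PR is mainly the wrapper rename. A minimal migration sketch, assuming the new `Gymnasium` class keeps the constructor signature of the old `Gym` wrapper (as the updated example scripts below suggest):

```python
import numpy as np

# Before this PR (OpenAI Gym wrapper, now removed):
#   from mushroom_rl.environments import Gym
#   mdp = Gym(name='MountainCar-v0', horizon=np.inf, gamma=1.)

# After this PR (Gymnasium wrapper, names taken from the updated examples below):
from mushroom_rl.environments import Gymnasium

mdp = Gymnasium(name='MountainCar-v0', horizon=np.inf, gamma=1.)
print(mdp.info.observation_space, mdp.info.action_space)
```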
7 changes: 4 additions & 3 deletions .github/workflows/continuous_integration.yml
@@ -25,9 +25,10 @@ jobs:
sudo apt install unrar
- name: Install Atari ROMs
run: |
wget http://www.atarimania.com/roms/Roms.rar
unrar x -o+ Roms.rar
ale-import-roms ./ROMS
#wget http://www.atarimania.com/roms/Roms.rar
#unrar x -o+ Roms.rar
#ale-import-roms ./ROMS
pip install gymnasium[accept-rom-license]
- name: Lint with flake8
run: |
# stop the build if there are Python syntax errors or undefined names
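Instead of downloading ROMs from atarimania.com, the workflow now relies on the `accept-rom-license` extra. As a hedged sanity check (not part of the workflow itself), something like the following should confirm that ALE environments resolve after that install step; depending on the installed gymnasium/ale-py versions, an explicit registration call may be needed:

```python
# Sanity-check sketch: assumes `pip install gymnasium[accept-rom-license]` has run.
import gymnasium as gym

try:
    env = gym.make("ALE/Breakout-v5")  # id used by examples/atari_dqn.py below
except gym.error.NamespaceNotFound:
    # Newer gymnasium/ale-py combinations require explicit registration.
    import ale_py
    gym.register_envs(ale_py)
    env = gym.make("ALE/Breakout-v5")

obs, info = env.reset(seed=0)
print(env.action_space, obs.shape)
env.close()
```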
7 changes: 4 additions & 3 deletions .github/workflows/test_pr.yml
@@ -26,9 +26,10 @@ jobs:
sudo apt install unrar
- name: Install Atari ROMs
run: |
wget http://www.atarimania.com/roms/Roms.rar
unrar x -o+ Roms.rar
ale-import-roms ./ROMS
# wget http://www.atarimania.com/roms/Roms.rar
# unrar x -o+ Roms.rar
# ale-import-roms ./ROMS
pip install gymnasium[accept-rom-license]
- name: Lint with flake8
run: |
# stop the build if there are Python syntax errors or undefined names
3 changes: 1 addition & 2 deletions Makefile
@@ -1,6 +1,5 @@
package:
python3 setup.py sdist

python3 -m build

install:
pip install $(shell ls dist/*.tar.gz)
4 changes: 2 additions & 2 deletions README.rst
@@ -27,7 +27,7 @@ What is MushroomRL
==================
MushroomRL is a Python Reinforcement Learning (RL) library whose modularity allows
to easily use well-known Python libraries for tensor computation (e.g. PyTorch,
Tensorflow) and RL benchmarks (e.g. OpenAI Gym, PyBullet, Deepmind Control Suite).
Tensorflow) and RL benchmarks (e.g. Gymnasium, PyBullet, Deepmind Control Suite).
It allows to perform RL experiments in a simple way providing classical RL algorithms
(e.g. Q-Learning, SARSA, FQI), and deep RL algorithms (e.g. DQN, DDPG, SAC, TD3,
TRPO, PPO).
@@ -45,7 +45,7 @@ You can do a minimal installation of ``MushroomRL`` with:
Installing everything
---------------------
``MushroomRL`` contains also some optional components e.g., support for ``OpenAI Gym``
``MushroomRL`` contains also some optional components e.g., support for ``Gymnasium``
environments, Atari 2600 games from the ``Arcade Learning Environment``, and the support
for physics simulators such as ``Pybullet`` and ``MuJoCo``.
Support for these classes is not enabled by default.
15 changes: 6 additions & 9 deletions docs/index.rst
@@ -22,7 +22,7 @@ in order to run them without excessive effort. Moreover, it is designed in such
a way that new algorithms and other stuff can be added transparently,
without the need of editing other parts of the code. MushroomRL is compatible with RL
libraries like
`OpenAI Gym <https://gym.openai.com/>`_,
`Gymnasium <https://gymnasium.farama.org/>`_,
`DeepMind Control Suite <https://github.com/deepmind/dm_control>`_,
`Pybullet <https://pybullet.org/wordpress/>`_, and
`MuJoCo <http://www.mujoco.org/>`_, and
@@ -150,14 +150,11 @@ Installing with all the dependencies takes approximately 5 minutes using a fast
internet connection may increase the installation time significantly.

If installing all the dependencies, ensure that the swig library is installed, as it is used
by some Gym environments and the installation may fail otherwise. For Atari, you might need to install the ROM separately, otherwise
the creation of Atari environments may fail. Opencv should be installed too. For MuJoCo, ensure that the path of your MuJoCo folder is included
in the environment variable ``LD_LIBRARY_PATH`` and that ``mujoco_py`` is correctly installed.
Installing MushroomRL in a Conda environment is generally
safe. However, we are aware that when installing with the option
``plots``, some errors may arise due to incompatibility issues between
``pyqtgraph`` and Conda. We recommend not using Conda when installing using ``plots``.
Finally, ensure that C/C++ compilers and Cython are working as expected.
by some Gymnasium environments and the installation may fail otherwise. For Atari, you might need to install the ROM
separately, otherwise the creation of Atari environments may fail. Opencv should be installed too.
Installing MushroomRL in a Conda environment is generally safe. However, we are aware that when installing with the
option ``plots``, some errors may arise due to incompatibility issues between ``pyqtgraph`` and Conda. We recommend not
using Conda when installing using ``plots``.

To check if the installation has been successful, you can try to run the basic example above.

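The docs refer to "the basic example above" as an installation check; that example is outside this diff, but an illustrative smoke test (the environment id and parameters here are placeholders, not taken from the docs) could look like:

```python
# Illustrative smoke test -- not the docs' "basic example".
from mushroom_rl.environments import Gymnasium

mdp = Gymnasium(name='CartPole-v1', horizon=500, gamma=0.99)
print('observation space:', mdp.info.observation_space)
print('action space:', mdp.info.action_space)
```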
6 changes: 3 additions & 3 deletions docs/source/mushroom_rl.environments.rst
@@ -60,10 +60,10 @@ Grid World
:inherited-members:
:show-inheritance:

Gym
~~~
Gymnasium
~~~~~~~~~

.. automodule:: mushroom_rl.environments.gym_env
.. automodule:: mushroom_rl.environments.gymnasium_env
:members:
:private-members:
:inherited-members:
4 changes: 2 additions & 2 deletions docs/source/tutorials/code/advanced_experiment.py
@@ -8,10 +8,10 @@
from mushroom_rl.policy import EpsGreedy
from mushroom_rl.utils.callbacks import CollectDataset
from mushroom_rl.rl_utils.parameters import Parameter
from mushroom_rl.environments import Gym
from mushroom_rl.environments import Gymnasium

# MDP
mdp = Gym(name='MountainCar-v0', horizon=np.inf, gamma=1.)
mdp = Gymnasium(name='MountainCar-v0', horizon=np.inf, gamma=1.)

# Policy
epsilon = Parameter(value=0.)
4 changes: 2 additions & 2 deletions docs/source/tutorials/code/approximator.py
@@ -12,7 +12,7 @@


# MDP
mdp = Gym(name='MountainCar-v0', horizon=np.inf, gamma=1.)
mdp = Gymnasium(name='MountainCar-v0', horizon=np.inf, gamma=1.)

# Policy
epsilon = Parameter(value=0.)
@@ -33,7 +33,7 @@
agent = SARSALambdaContinuous(mdp.info, pi, LinearApproximator,
approximator_params=approximator_params,
learning_rate=learning_rate,
lambda_coeff= .9, features=features)
lambda_coeff=.9, features=features)

# Algorithm
collect_dataset = CollectDataset()
6 changes: 3 additions & 3 deletions docs/source/tutorials/tutorials.1_advanced.rst
@@ -19,8 +19,8 @@ Initially, the MDP and the policy are created:
.. literalinclude:: code/advanced_experiment.py
:lines: 1-19

This is an environment created with the MushroomRL interface to the OpenAI Gym
library. Each environment offered by OpenAI Gym can be created this way simply
This is an environment created with the MushroomRL interface to the Gymnasium
library. Each environment offered by Gymnasium can be created this way simply
providing the corresponding id in the ``name`` parameter, except for the Atari
that are managed by a separate class.
After the creation of the MDP, the tiles features are created:
@@ -46,7 +46,7 @@ Sutton & Barto, 1998* for details). After that, the learning is run as usual:
.. literalinclude:: code/advanced_experiment.py
:lines: 32-46

To visualize the learned policy the rendering method of OpenAI Gym is used. To
To visualize the learned policy the rendering method of Gymnasium is used. To
activate the rendering in the environments that supports it, it is necessary to
set ``render=True``.

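The tutorial only says that `render=True` must be set; a short sketch of the evaluation call with rendering enabled, following the `core.evaluate` signature used elsewhere in this PR (`core` is the Core object built in the tutorial, and the episode count is arbitrary):

```python
# Visualize the learned policy (sketch; assumes `core` from the tutorial code).
dataset = core.evaluate(n_episodes=1, render=True)
```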
10 changes: 5 additions & 5 deletions docs/source/tutorials/tutorials.5_environments.rst
@@ -59,21 +59,21 @@ For example, to create the ShipSteering environment you can use:
env = Environment.make('ShipSteering')
To build environments, you may need to pass additional parameters.
An example of this is the ``Gym`` environment which wraps most OpenAI Gym environments, except the Atari ones, which
uses the ``Atari`` environment to implement proper preprocessing.
An example of this is the ``Gymnasium`` environment which wraps most Gymnasium environments, except the Atari ones,
which uses the ``Atari`` environment to implement proper preprocessing.

If you want to build the ``Pendulum-v1`` gym environment you need to pass the environment name as a parameter:

.. code-block:: python
env = Environment.make('Gym', 'Pendulum-v1')
env = Environment.make('Gymnasium', 'Pendulum-v1')
However, for environments that are interfaces to other libraries such as ``Gym``, ``Atari`` or ``DMControl`` a notation
However, for environments that are interfaces to other libraries such as ``Gymnasium``, ``Atari`` or ``DMControl`` a notation
with a dot separator is supported. For example to create the pendulum you can also use:

.. code-block:: python
env = Environment.make('Gym.Pendulum-v1')
env = Environment.make('Gymnasium.Pendulum-v1')
Or, to create the ``hopper`` environment with ``hop`` task from DeepMind control suite you can use:

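The DMControl snippet is cut off in this view; as a consolidated sketch of the two equivalent Gymnasium forms documented above (not the DMControl call):

```python
from mushroom_rl.core import Environment

# Wrapper name plus Gymnasium id as separate arguments...
env_a = Environment.make('Gymnasium', 'Pendulum-v1')

# ...or the dot-separated shorthand.
env_b = Environment.make('Gymnasium.Pendulum-v1')
```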
66 changes: 35 additions & 31 deletions examples/atari_dqn.py
@@ -10,12 +10,13 @@

from mushroom_rl.algorithms.value import AveragedDQN, CategoricalDQN, DQN,\
DoubleDQN, MaxminDQN, DuelingDQN, NoisyDQN, QuantileDQN, Rainbow
from mushroom_rl.approximators.parametric import NumpyTorchApproximator
from mushroom_rl.approximators.parametric import NumpyTorchApproximator, TorchApproximator
from mushroom_rl.core import Core, Logger
from mushroom_rl.environments import *
from mushroom_rl.policy import EpsGreedy
from mushroom_rl.rl_utils.parameters import LinearParameter, Parameter
from mushroom_rl.rl_utils.replay_memory import PrioritizedReplayMemory
from mushroom_rl.utils.torch import TorchUtils

"""
This script runs Atari experiments with DQN, and some of its variants, as
@@ -118,17 +119,17 @@ def experiment():
arg_game = parser.add_argument_group('Game')
arg_game.add_argument("--name",
type=str,
default='BreakoutDeterministic-v4',
help='Gym ID of the Atari game.')
arg_game.add_argument("--screen-width", type=int, default=84,
help='Width of the game screen.')
arg_game.add_argument("--screen-height", type=int, default=84,
help='Height of the game screen.')
default='ALE/Breakout-v5',
help='Gymnasium ID of the Atari game.')
# arg_game.add_argument("--screen-width", type=int, default=84,
# help='Width of the game screen.')
# arg_game.add_argument("--screen-height", type=int, default=84,
# help='Height of the game screen.')

arg_mem = parser.add_argument_group('Replay Memory')
arg_mem.add_argument("--initial-replay-size", type=int, default=50000,
arg_mem.add_argument("--initial-replay-size", type=int, default=20_000,
help='Initial size of the replay memory.')
arg_mem.add_argument("--max-replay-size", type=int, default=500000,
arg_mem.add_argument("--max-replay-size", type=int, default=100000, #changed to 100k instead of 500k because of memory restrictions
help='Max size of the replay memory.')
arg_mem.add_argument("--prioritized", action='store_true',
help='Whether to use prioritized memory or not.')
@@ -141,13 +142,13 @@ def experiment():
'rmspropcentered'],
default='adam',
help='Name of the optimizer to use.')
arg_net.add_argument("--learning-rate", type=float, default=.0001,
arg_net.add_argument("--learning-rate", type=float, default=6.25e-5,
help='Learning rate value of the optimizer.')
arg_net.add_argument("--decay", type=float, default=.95,
help='Discount factor for the history coming from the'
'gradient momentum in rmspropcentered and'
'rmsprop')
arg_net.add_argument("--epsilon", type=float, default=1e-8,
arg_net.add_argument("--epsilon", type=float, default=1.5e-4,
help='Epsilon term used in rmspropcentered and'
'rmsprop')

@@ -163,36 +164,36 @@ def experiment():
"AveragedDQN or MaxminDQN.")
arg_alg.add_argument("--batch-size", type=int, default=32,
help='Batch size for each fit of the network.')
arg_alg.add_argument("--history-length", type=int, default=4,
help='Number of frames composing a state.')
arg_alg.add_argument("--target-update-frequency", type=int, default=10000,
# arg_alg.add_argument("--history-length", type=int, default=4,
# help='Number of frames composing a state.')
arg_alg.add_argument("--target-update-frequency", type=int, default=8_000,
help='Number of collected samples before each update'
'of the target network.')
arg_alg.add_argument("--evaluation-frequency", type=int, default=250000,
arg_alg.add_argument("--evaluation-frequency", type=int, default=250_000,
help='Number of collected samples before each'
'evaluation. An epoch ends after this number of'
'steps')
arg_alg.add_argument("--train-frequency", type=int, default=4,
help='Number of collected samples before each fit of'
'the neural network.')
arg_alg.add_argument("--max-steps", type=int, default=50000000,
arg_alg.add_argument("--max-steps", type=int, default=50_000_000,
help='Total number of collected samples.')
arg_alg.add_argument("--final-exploration-frame", type=int, default=1000000,
arg_alg.add_argument("--final-exploration-frame", type=int, default=250_000,
help='Number of collected samples until the exploration'
'rate stops decreasing.')
arg_alg.add_argument("--initial-exploration-rate", type=float, default=1.,
help='Initial value of the exploration rate.')
arg_alg.add_argument("--final-exploration-rate", type=float, default=.1,
arg_alg.add_argument("--final-exploration-rate", type=float, default=.01,
help='Final value of the exploration rate. When it'
'reaches this values, it stays constant.')
arg_alg.add_argument("--test-exploration-rate", type=float, default=.05,
help='Exploration rate used during evaluation.')
arg_alg.add_argument("--test-samples", type=int, default=125000,
arg_alg.add_argument("--test-samples", type=int, default=125_000,
help='Number of collected samples for each'
'evaluation.')
arg_alg.add_argument("--max-no-op-actions", type=int, default=30,
help='Maximum number of no-op actions performed at the'
'beginning of the episodes.')
# arg_alg.add_argument("--max-no-op-actions", type=int, default=30,
# help='Maximum number of no-op actions performed at the'
# 'beginning of the episodes.')
arg_alg.add_argument("--alpha-coeff", type=float, default=.6,
help='Prioritization exponent for prioritized experience replay.')
arg_alg.add_argument("--n-atoms", type=int, default=51,
@@ -218,6 +219,9 @@
help='Path of the model to be loaded.')
arg_utils.add_argument('--render', action='store_true',
help='Flag specifying whether to render the game.')
arg_utils.add_argument('--record', action='store_true',
help='Flag specifying whether to record the game.'
'The render flag should be set to True')
arg_utils.add_argument('--quiet', action='store_true',
help='Flag specifying whether to hide the progress'
'bar.')
@@ -276,9 +280,10 @@ def experiment():
max_steps = args.max_steps

# MDP
mdp = GymnasiumAtari(args.name, args.screen_width, args.screen_height,
ends_at_life=True, history_length=args.history_length,
max_no_op_actions=args.max_no_op_actions, headless=False)
mdp = Atari(args.name, headless=False)
# args.screen_width, args.screen_height,
# ends_at_life=True, history_length=args.history_length,
# max_no_op_actions=args.max_no_op_actions,

if args.load_path:
logger = Logger(DQN.__name__, results_dir=None)
@@ -314,12 +319,14 @@ def experiment():
output_shape=(mdp.info.action_space.n,),
n_actions=mdp.info.action_space.n,
n_features=Network.n_features,
optimizer=optimizer
optimizer=optimizer,
)
if args.algorithm not in ['cdqn', 'qdqn', 'rainbow']:
approximator_params['loss'] = F.smooth_l1_loss

approximator = NumpyTorchApproximator
approximator = TorchApproximator if args.use_cuda else NumpyTorchApproximator

TorchUtils.set_default_device('cuda:0' if torch.cuda.is_available() and args.use_cuda else 'cpu')

if args.prioritized:
replay_memory = PrioritizedReplayMemory(
@@ -406,9 +413,8 @@ def experiment():

# Evaluate initial policy
pi.set_epsilon(epsilon_test)
mdp.set_episode_end(False)
dataset = core.evaluate(n_steps=test_samples, render=args.render,
quiet=args.quiet, record=True)
quiet=args.quiet, record=args.record)
scores.append(get_stats(dataset, logger))

np.save(folder_name + '/scores.npy', scores)
@@ -417,7 +423,6 @@ def experiment():
logger.info('- Learning:')
# learning step
pi.set_epsilon(epsilon)
mdp.set_episode_end(True)
core.learn(n_steps=evaluation_frequency,
n_steps_per_fit=train_frequency, quiet=args.quiet)

@@ -427,7 +432,6 @@ def experiment():
logger.info('- Evaluation:')
# evaluation step
pi.set_epsilon(epsilon_test)
mdp.set_episode_end(False)
dataset = core.evaluate(n_steps=test_samples, render=args.render,
quiet=args.quiet)
scores.append(get_stats(dataset, logger))
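For reference, a hedged sketch of constructing the refactored Atari wrapper on its own, mirroring `mdp = Atari(args.name, headless=False)` above; any constructor arguments beyond the name and `headless` are commented out in the diff and are not assumed here:

```python
# Stand-alone construction of the Gymnasium-backed Atari wrapper (sketch).
from mushroom_rl.environments import Atari

mdp = Atari('ALE/Breakout-v5', headless=True)
print(mdp.info.observation_space.shape, mdp.info.action_space.n)
```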
4 changes: 2 additions & 2 deletions examples/gym_recurrent_ppo.py
@@ -6,7 +6,7 @@
import torch.optim as optim

from mushroom_rl.core import Logger, Core
from mushroom_rl.environments import Gym
from mushroom_rl.environments import Gymnasium

from mushroom_rl.algorithms.actor_critic import PPO_BPTT
from mushroom_rl.policy import RecurrentGaussianTorchPolicy
@@ -201,7 +201,7 @@ def experiment(
logger = Logger(results_dir=results_dir, log_name="stochastic_logging", seed=seed)

# MDP
mdp = Gym(env, horizon=horizon, gamma=gamma)
mdp = Gymnasium(env, horizon=horizon, gamma=gamma)

# create the policy
dim_env_state = mdp.info.observation_space.shape[0]
2 changes: 1 addition & 1 deletion examples/minigrid_dqn.py
@@ -123,7 +123,7 @@ def experiment():
arg_env.add_argument("--name",
type=str,
default='MiniGrid-Unlock-v0',
help='Gym ID of the MiniGrid environment.')
help='Gymnasium ID of the MiniGrid environment.')

arg_mem = parser.add_argument_group('Replay Memory')
arg_mem.add_argument("--initial-replay-size", type=int, default=50000,
4 changes: 2 additions & 2 deletions examples/vectorized_core/pendulum_trust_region.py
@@ -7,7 +7,7 @@
from tqdm import trange

from mushroom_rl.core import VectorCore, Logger, MultiprocessEnvironment
from mushroom_rl.environments import Gym
from mushroom_rl.environments import Gymnasium
from mushroom_rl.algorithms.actor_critic import PPO, TRPO

from mushroom_rl.policy import GaussianTorchPolicy
@@ -46,7 +46,7 @@ def experiment(alg, env_id, horizon, gamma, n_epochs, n_steps, n_steps_per_fit,
logger.strong_line()
logger.info('Experiment Algorithm: ' + alg.__name__)

mdp = MultiprocessEnvironment(Gym, env_id, horizon, gamma, n_envs=15)
mdp = MultiprocessEnvironment(Gymnasium, env_id, horizon, gamma, n_envs=15)

critic_params = dict(network=Network,
optimizer={'class': optim.Adam,