Dev nogym (#160)
* Starting removal of gym, testing of Gymnasium

- PyBullet is not working; added more guards to fix the issue
- refactored seed in environment class
- Gymnasium class now seems to work
- Atari class still seems to have some issues

* Minor fixes in style

* [WIP] Support for Gymnasium Atari

* Add full Gymnasium support for Atari

* Update continuous_integration.yml

* Update test_pr.yml

---------

Co-authored-by: boris-il-forte <[email protected]>
Co-authored-by: Davide Tateo <[email protected]>
3 people authored Feb 5, 2025
1 parent d567c45 commit acef38e
Showing 31 changed files with 223 additions and 566 deletions.
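For readers skimming the diff below, the user-facing effect of this PR is mainly the wrapper rename. A minimal migration sketch, assuming the new `Gymnasium` class keeps the constructor signature of the old `Gym` wrapper (as the updated example scripts below suggest):

```python
import numpy as np

# Before this PR (OpenAI Gym wrapper, now removed):
#   from mushroom_rl.environments import Gym
#   mdp = Gym(name='MountainCar-v0', horizon=np.inf, gamma=1.)

# After this PR (Gymnasium wrapper, names taken from the updated examples below):
from mushroom_rl.environments import Gymnasium

mdp = Gymnasium(name='MountainCar-v0', horizon=np.inf, gamma=1.)
print(mdp.info.observation_space, mdp.info.action_space)
```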
7 changes: 4 additions & 3 deletions .github/workflows/continuous_integration.yml
@@ -25,9 +25,10 @@ jobs:
sudo apt install unrar
- name: Install Atari ROMs
run: |
wget http://www.atarimania.com/roms/Roms.rar
unrar x -o+ Roms.rar
ale-import-roms ./ROMS
#wget http://www.atarimania.com/roms/Roms.rar
#unrar x -o+ Roms.rar
#ale-import-roms ./ROMS
pip install gymnasium[accept-rom-license]
- name: Lint with flake8
run: |
# stop the build if there are Python syntax errors or undefined names
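Instead of downloading ROMs from atarimania.com, the workflow now relies on the `accept-rom-license` extra. As a hedged sanity check (not part of the workflow itself), something like the following should confirm that ALE environments resolve after that install step; depending on the installed gymnasium/ale-py versions, an explicit registration call may be needed:

```python
# Sanity-check sketch: assumes `pip install gymnasium[accept-rom-license]` has run.
import gymnasium as gym

try:
    env = gym.make("ALE/Breakout-v5")  # id used by examples/atari_dqn.py below
except gym.error.NamespaceNotFound:
    # Newer gymnasium/ale-py combinations require explicit registration.
    import ale_py
    gym.register_envs(ale_py)
    env = gym.make("ALE/Breakout-v5")

obs, info = env.reset(seed=0)
print(env.action_space, obs.shape)
env.close()
```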
7 changes: 4 additions & 3 deletions .github/workflows/test_pr.yml
@@ -26,9 +26,10 @@ jobs:
sudo apt install unrar
- name: Install Atari ROMs
run: |
wget http://www.atarimania.com/roms/Roms.rar
unrar x -o+ Roms.rar
ale-import-roms ./ROMS
# wget http://www.atarimania.com/roms/Roms.rar
# unrar x -o+ Roms.rar
# ale-import-roms ./ROMS
pip install gymnasium[accept-rom-license]
- name: Lint with flake8
run: |
# stop the build if there are Python syntax errors or undefined names
3 changes: 1 addition & 2 deletions Makefile
@@ -1,6 +1,5 @@
package:
python3 setup.py sdist

python3 -m build

install:
pip install $(shell ls dist/*.tar.gz)
4 changes: 2 additions & 2 deletions README.rst
@@ -27,7 +27,7 @@ What is MushroomRL
==================
MushroomRL is a Python Reinforcement Learning (RL) library whose modularity allows
to easily use well-known Python libraries for tensor computation (e.g. PyTorch,
Tensorflow) and RL benchmarks (e.g. OpenAI Gym, PyBullet, Deepmind Control Suite).
Tensorflow) and RL benchmarks (e.g. Gymnasium, PyBullet, Deepmind Control Suite).
It allows to perform RL experiments in a simple way providing classical RL algorithms
(e.g. Q-Learning, SARSA, FQI), and deep RL algorithms (e.g. DQN, DDPG, SAC, TD3,
TRPO, PPO).
@@ -45,7 +45,7 @@ You can do a minimal installation of ``MushroomRL`` with:
Installing everything
---------------------
``MushroomRL`` contains also some optional components e.g., support for ``OpenAI Gym``
``MushroomRL`` contains also some optional components e.g., support for ``Gymnasium``
environments, Atari 2600 games from the ``Arcade Learning Environment``, and the support
for physics simulators such as ``Pybullet`` and ``MuJoCo``.
Support for these classes is not enabled by default.
15 changes: 6 additions & 9 deletions docs/index.rst
@@ -22,7 +22,7 @@ in order to run them without excessive effort. Moreover, it is designed in such
a way that new algorithms and other stuff can be added transparently,
without the need of editing other parts of the code. MushroomRL is compatible with RL
libraries like
`OpenAI Gym <https://gym.openai.com/>`_,
`Gymnasium <https://gymnasium.farama.org/>`_,
`DeepMind Control Suite <https://github.com/deepmind/dm_control>`_,
`Pybullet <https://pybullet.org/wordpress/>`_, and
`MuJoCo <http://www.mujoco.org/>`_, and
@@ -150,14 +150,11 @@ Installing with all the dependencies takes approximately 5 minutes using a fast
internet connection may increase the installation time significantly.

If installing all the dependencies, ensure that the swig library is installed, as it is used
by some Gym environments and the installation may fail otherwise. For Atari, you might need to install the ROM separately, otherwise
the creation of Atari environments may fail. Opencv should be installed too. For MuJoCo, ensure that the path of your MuJoCo folder is included
in the environment variable ``LD_LIBRARY_PATH`` and that ``mujoco_py`` is correctly installed.
Installing MushroomRL in a Conda environment is generally
safe. However, we are aware that when installing with the option
``plots``, some errors may arise due to incompatibility issues between
``pyqtgraph`` and Conda. We recommend not using Conda when installing using ``plots``.
Finally, ensure that C/C++ compilers and Cython are working as expected.
by some Gymnasium environments and the installation may fail otherwise. For Atari, you might need to install the ROM
separately, otherwise the creation of Atari environments may fail. Opencv should be installed too.
Installing MushroomRL in a Conda environment is generally safe. However, we are aware that when installing with the
option ``plots``, some errors may arise due to incompatibility issues between ``pyqtgraph`` and Conda. We recommend not
using Conda when installing using ``plots``.

To check if the installation has been successful, you can try to run the basic example above.

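The docs refer to "the basic example above" as an installation check; that example is outside this diff, but an illustrative smoke test (the environment id and parameters here are placeholders, not taken from the docs) could look like:

```python
# Illustrative smoke test -- not the docs' "basic example".
from mushroom_rl.environments import Gymnasium

mdp = Gymnasium(name='CartPole-v1', horizon=500, gamma=0.99)
print('observation space:', mdp.info.observation_space)
print('action space:', mdp.info.action_space)
```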
6 changes: 3 additions & 3 deletions docs/source/mushroom_rl.environments.rst
@@ -60,10 +60,10 @@ Grid World
:inherited-members:
:show-inheritance:

Gym
~~~
Gymnasium
~~~~~~~~~

.. automodule:: mushroom_rl.environments.gym_env
.. automodule:: mushroom_rl.environments.gymnasium_env
:members:
:private-members:
:inherited-members:
4 changes: 2 additions & 2 deletions docs/source/tutorials/code/advanced_experiment.py
@@ -8,10 +8,10 @@
from mushroom_rl.policy import EpsGreedy
from mushroom_rl.utils.callbacks import CollectDataset
from mushroom_rl.rl_utils.parameters import Parameter
from mushroom_rl.environments import Gym
from mushroom_rl.environments import Gymnasium

# MDP
mdp = Gym(name='MountainCar-v0', horizon=np.inf, gamma=1.)
mdp = Gymnasium(name='MountainCar-v0', horizon=np.inf, gamma=1.)

# Policy
epsilon = Parameter(value=0.)
4 changes: 2 additions & 2 deletions docs/source/tutorials/code/approximator.py
@@ -12,7 +12,7 @@


# MDP
mdp = Gym(name='MountainCar-v0', horizon=np.inf, gamma=1.)
mdp = Gymnasium(name='MountainCar-v0', horizon=np.inf, gamma=1.)

# Policy
epsilon = Parameter(value=0.)
@@ -33,7 +33,7 @@
agent = SARSALambdaContinuous(mdp.info, pi, LinearApproximator,
approximator_params=approximator_params,
learning_rate=learning_rate,
lambda_coeff= .9, features=features)
lambda_coeff=.9, features=features)

# Algorithm
collect_dataset = CollectDataset()
6 changes: 3 additions & 3 deletions docs/source/tutorials/tutorials.1_advanced.rst
@@ -19,8 +19,8 @@ Initially, the MDP and the policy are created:
.. literalinclude:: code/advanced_experiment.py
:lines: 1-19

This is an environment created with the MushroomRL interface to the OpenAI Gym
library. Each environment offered by OpenAI Gym can be created this way simply
This is an environment created with the MushroomRL interface to the Gymnasium
library. Each environment offered by Gymnasium can be created this way simply
providing the corresponding id in the ``name`` parameter, except for the Atari
that are managed by a separate class.
After the creation of the MDP, the tiles features are created:
@@ -46,7 +46,7 @@ Sutton & Barto, 1998* for details). After that, the learning is run as usual:
.. literalinclude:: code/advanced_experiment.py
:lines: 32-46

To visualize the learned policy the rendering method of OpenAI Gym is used. To
To visualize the learned policy the rendering method of Gymnasium is used. To
activate the rendering in the environments that supports it, it is necessary to
set ``render=True``.

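The tutorial only says that `render=True` must be set; a short sketch of the evaluation call with rendering enabled, following the `core.evaluate` signature used elsewhere in this PR (`core` is the Core object built in the tutorial, and the episode count is arbitrary):

```python
# Visualize the learned policy (sketch; assumes `core` from the tutorial code).
dataset = core.evaluate(n_episodes=1, render=True)
```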
10 changes: 5 additions & 5 deletions docs/source/tutorials/tutorials.5_environments.rst
@@ -59,21 +59,21 @@ For example, to create the ShipSteering environment you can use:
env = Environment.make('ShipSteering')
To build environments, you may need to pass additional parameters.
An example of this is the ``Gym`` environment which wraps most OpenAI Gym environments, except the Atari ones, which
uses the ``Atari`` environment to implement proper preprocessing.
An example of this is the ``Gymnasium`` environment which wraps most Gymnasium environments, except the Atari ones,
which uses the ``Atari`` environment to implement proper preprocessing.

If you want to build the ``Pendulum-v1`` gym environment you need to pass the environment name as a parameter:

.. code-block:: python
env = Environment.make('Gym', 'Pendulum-v1')
env = Environment.make('Gymnasium', 'Pendulum-v1')
However, for environments that are interfaces to other libraries such as ``Gym``, ``Atari`` or ``DMControl`` a notation
However, for environments that are interfaces to other libraries such as ``Gymnasium``, ``Atari`` or ``DMControl`` a notation
with a dot separator is supported. For example to create the pendulum you can also use:

.. code-block:: python
env = Environment.make('Gym.Pendulum-v1')
env = Environment.make('Gymnasium.Pendulum-v1')
Or, to create the ``hopper`` environment with ``hop`` task from DeepMind control suite you can use:

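The DMControl snippet is cut off in this view; as a consolidated sketch of the two equivalent Gymnasium forms documented above (not the DMControl call):

```python
from mushroom_rl.core import Environment

# Wrapper name plus Gymnasium id as separate arguments...
env_a = Environment.make('Gymnasium', 'Pendulum-v1')

# ...or the dot-separated shorthand.
env_b = Environment.make('Gymnasium.Pendulum-v1')
```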
66 changes: 35 additions & 31 deletions examples/atari_dqn.py
@@ -10,12 +10,13 @@

from mushroom_rl.algorithms.value import AveragedDQN, CategoricalDQN, DQN,\
DoubleDQN, MaxminDQN, DuelingDQN, NoisyDQN, QuantileDQN, Rainbow
from mushroom_rl.approximators.parametric import NumpyTorchApproximator
from mushroom_rl.approximators.parametric import NumpyTorchApproximator, TorchApproximator
from mushroom_rl.core import Core, Logger
from mushroom_rl.environments import *
from mushroom_rl.policy import EpsGreedy
from mushroom_rl.rl_utils.parameters import LinearParameter, Parameter
from mushroom_rl.rl_utils.replay_memory import PrioritizedReplayMemory
from mushroom_rl.utils.torch import TorchUtils

"""
This script runs Atari experiments with DQN, and some of its variants, as
@@ -118,17 +119,17 @@ def experiment():
arg_game = parser.add_argument_group('Game')
arg_game.add_argument("--name",
type=str,
default='BreakoutDeterministic-v4',
help='Gym ID of the Atari game.')
arg_game.add_argument("--screen-width", type=int, default=84,
help='Width of the game screen.')
arg_game.add_argument("--screen-height", type=int, default=84,
help='Height of the game screen.')
default='ALE/Breakout-v5',
help='Gymnasium ID of the Atari game.')
# arg_game.add_argument("--screen-width", type=int, default=84,
# help='Width of the game screen.')
# arg_game.add_argument("--screen-height", type=int, default=84,
# help='Height of the game screen.')

arg_mem = parser.add_argument_group('Replay Memory')
arg_mem.add_argument("--initial-replay-size", type=int, default=50000,
arg_mem.add_argument("--initial-replay-size", type=int, default=20_000,
help='Initial size of the replay memory.')
arg_mem.add_argument("--max-replay-size", type=int, default=500000,
arg_mem.add_argument("--max-replay-size", type=int, default=100000, #changed to 100k instead of 500k because of memory restrictions
help='Max size of the replay memory.')
arg_mem.add_argument("--prioritized", action='store_true',
help='Whether to use prioritized memory or not.')
@@ -141,13 +142,13 @@ def experiment():
'rmspropcentered'],
default='adam',
help='Name of the optimizer to use.')
arg_net.add_argument("--learning-rate", type=float, default=.0001,
arg_net.add_argument("--learning-rate", type=float, default=6.25e-5,
help='Learning rate value of the optimizer.')
arg_net.add_argument("--decay", type=float, default=.95,
help='Discount factor for the history coming from the'
'gradient momentum in rmspropcentered and'
'rmsprop')
arg_net.add_argument("--epsilon", type=float, default=1e-8,
arg_net.add_argument("--epsilon", type=float, default=1.5e-4,
help='Epsilon term used in rmspropcentered and'
'rmsprop')

@@ -163,36 +164,36 @@ def experiment():
"AveragedDQN or MaxminDQN.")
arg_alg.add_argument("--batch-size", type=int, default=32,
help='Batch size for each fit of the network.')
arg_alg.add_argument("--history-length", type=int, default=4,
help='Number of frames composing a state.')
arg_alg.add_argument("--target-update-frequency", type=int, default=10000,
# arg_alg.add_argument("--history-length", type=int, default=4,
# help='Number of frames composing a state.')
arg_alg.add_argument("--target-update-frequency", type=int, default=8_000,
help='Number of collected samples before each update'
'of the target network.')
arg_alg.add_argument("--evaluation-frequency", type=int, default=250000,
arg_alg.add_argument("--evaluation-frequency", type=int, default=250_000,
help='Number of collected samples before each'
'evaluation. An epoch ends after this number of'
'steps')
arg_alg.add_argument("--train-frequency", type=int, default=4,
help='Number of collected samples before each fit of'
'the neural network.')
arg_alg.add_argument("--max-steps", type=int, default=50000000,
arg_alg.add_argument("--max-steps", type=int, default=50_000_000,
help='Total number of collected samples.')
arg_alg.add_argument("--final-exploration-frame", type=int, default=1000000,
arg_alg.add_argument("--final-exploration-frame", type=int, default=250_000,
help='Number of collected samples until the exploration'
'rate stops decreasing.')
arg_alg.add_argument("--initial-exploration-rate", type=float, default=1.,
help='Initial value of the exploration rate.')
arg_alg.add_argument("--final-exploration-rate", type=float, default=.1,
arg_alg.add_argument("--final-exploration-rate", type=float, default=.01,
help='Final value of the exploration rate. When it'
'reaches this values, it stays constant.')
arg_alg.add_argument("--test-exploration-rate", type=float, default=.05,
help='Exploration rate used during evaluation.')
arg_alg.add_argument("--test-samples", type=int, default=125000,
arg_alg.add_argument("--test-samples", type=int, default=125_000,
help='Number of collected samples for each'
'evaluation.')
arg_alg.add_argument("--max-no-op-actions", type=int, default=30,
help='Maximum number of no-op actions performed at the'
'beginning of the episodes.')
# arg_alg.add_argument("--max-no-op-actions", type=int, default=30,
# help='Maximum number of no-op actions performed at the'
# 'beginning of the episodes.')
arg_alg.add_argument("--alpha-coeff", type=float, default=.6,
help='Prioritization exponent for prioritized experience replay.')
arg_alg.add_argument("--n-atoms", type=int, default=51,
@@ -218,6 +219,9 @@
help='Path of the model to be loaded.')
arg_utils.add_argument('--render', action='store_true',
help='Flag specifying whether to render the game.')
arg_utils.add_argument('--record', action='store_true',
help='Flag specifying whether to record the game.'
'The render flag should be set to True')
arg_utils.add_argument('--quiet', action='store_true',
help='Flag specifying whether to hide the progress'
'bar.')
@@ -276,9 +280,10 @@ def experiment():
max_steps = args.max_steps

# MDP
mdp = GymnasiumAtari(args.name, args.screen_width, args.screen_height,
ends_at_life=True, history_length=args.history_length,
max_no_op_actions=args.max_no_op_actions, headless=False)
mdp = Atari(args.name, headless=False)
# args.screen_width, args.screen_height,
# ends_at_life=True, history_length=args.history_length,
# max_no_op_actions=args.max_no_op_actions,

if args.load_path:
logger = Logger(DQN.__name__, results_dir=None)
@@ -314,12 +319,14 @@ def experiment():
output_shape=(mdp.info.action_space.n,),
n_actions=mdp.info.action_space.n,
n_features=Network.n_features,
optimizer=optimizer
optimizer=optimizer,
)
if args.algorithm not in ['cdqn', 'qdqn', 'rainbow']:
approximator_params['loss'] = F.smooth_l1_loss

approximator = NumpyTorchApproximator
approximator = TorchApproximator if args.use_cuda else NumpyTorchApproximator

TorchUtils.set_default_device('cuda:0' if torch.cuda.is_available() and args.use_cuda else 'cpu')

if args.prioritized:
replay_memory = PrioritizedReplayMemory(
@@ -406,9 +413,8 @@ def experiment():

# Evaluate initial policy
pi.set_epsilon(epsilon_test)
mdp.set_episode_end(False)
dataset = core.evaluate(n_steps=test_samples, render=args.render,
quiet=args.quiet, record=True)
quiet=args.quiet, record=args.record)
scores.append(get_stats(dataset, logger))

np.save(folder_name + '/scores.npy', scores)
@@ -417,7 +423,6 @@ def experiment():
logger.info('- Learning:')
# learning step
pi.set_epsilon(epsilon)
mdp.set_episode_end(True)
core.learn(n_steps=evaluation_frequency,
n_steps_per_fit=train_frequency, quiet=args.quiet)

@@ -427,7 +432,6 @@ def experiment():
logger.info('- Evaluation:')
# evaluation step
pi.set_epsilon(epsilon_test)
mdp.set_episode_end(False)
dataset = core.evaluate(n_steps=test_samples, render=args.render,
quiet=args.quiet)
scores.append(get_stats(dataset, logger))
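For reference, a hedged sketch of constructing the refactored Atari wrapper on its own, mirroring `mdp = Atari(args.name, headless=False)` above; any constructor arguments beyond the name and `headless` are commented out in the diff and are not assumed here:

```python
# Stand-alone construction of the Gymnasium-backed Atari wrapper (sketch).
from mushroom_rl.environments import Atari

mdp = Atari('ALE/Breakout-v5', headless=True)
print(mdp.info.observation_space.shape, mdp.info.action_space.n)
```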
4 changes: 2 additions & 2 deletions examples/gym_recurrent_ppo.py
@@ -6,7 +6,7 @@
import torch.optim as optim

from mushroom_rl.core import Logger, Core
from mushroom_rl.environments import Gym
from mushroom_rl.environments import Gymnasium

from mushroom_rl.algorithms.actor_critic import PPO_BPTT
from mushroom_rl.policy import RecurrentGaussianTorchPolicy
@@ -201,7 +201,7 @@ def experiment(
logger = Logger(results_dir=results_dir, log_name="stochastic_logging", seed=seed)

# MDP
mdp = Gym(env, horizon=horizon, gamma=gamma)
mdp = Gymnasium(env, horizon=horizon, gamma=gamma)

# create the policy
dim_env_state = mdp.info.observation_space.shape[0]
2 changes: 1 addition & 1 deletion examples/minigrid_dqn.py
@@ -123,7 +123,7 @@ def experiment():
arg_env.add_argument("--name",
type=str,
default='MiniGrid-Unlock-v0',
help='Gym ID of the MiniGrid environment.')
help='Gymnasium ID of the MiniGrid environment.')

arg_mem = parser.add_argument_group('Replay Memory')
arg_mem.add_argument("--initial-replay-size", type=int, default=50000,
4 changes: 2 additions & 2 deletions examples/vectorized_core/pendulum_trust_region.py
@@ -7,7 +7,7 @@
from tqdm import trange

from mushroom_rl.core import VectorCore, Logger, MultiprocessEnvironment
from mushroom_rl.environments import Gym
from mushroom_rl.environments import Gymnasium
from mushroom_rl.algorithms.actor_critic import PPO, TRPO

from mushroom_rl.policy import GaussianTorchPolicy
@@ -46,7 +46,7 @@ def experiment(alg, env_id, horizon, gamma, n_epochs, n_steps, n_steps_per_fit,
logger.strong_line()
logger.info('Experiment Algorithm: ' + alg.__name__)

mdp = MultiprocessEnvironment(Gym, env_id, horizon, gamma, n_envs=15)
mdp = MultiprocessEnvironment(Gymnasium, env_id, horizon, gamma, n_envs=15)

critic_params = dict(network=Network,
optimizer={'class': optim.Adam,