
Gymnasium Integration #789

Merged Feb 3, 2023 · 29 commits

Commits
10cc606
Replaced Gym with Gymnasium
Markus28 Dec 29, 2022
4e827c1
Updated docs
Markus28 Dec 29, 2022
eaeb4cc
Added type ignores for gym wrappers
Markus28 Jan 3, 2023
125069a
Changed envpool from gym to gymnasium
Markus28 Jan 3, 2023
201f0cd
Require envpool>=0.7.0
Markus28 Jan 3, 2023
e1d0c0e
Fixed MyTestEnv, removed __init__ that is unnecessary due to Gymnasium
Markus28 Jan 3, 2023
5f0833a
Removed dead code that was necessary for old step API
Markus28 Jan 3, 2023
bea1559
Removed type hints about old step API
Markus28 Jan 3, 2023
3fc8c78
Removed check for whether environments return info upon reset
Markus28 Jan 3, 2023
9f2f104
Increase required version of PZ, fix some CI issues
Markus28 Jan 4, 2023
1ea1b90
Added dummy info to reset of FiniteVectorEnv
Markus28 Jan 4, 2023
9c9a70b
Fix log method to take terminated and truncated, update PettingZooEnv…
Markus28 Jan 4, 2023
28e5cb1
Made some code more explicit (removed hack for compatibility with old…
Markus28 Jan 4, 2023
6be6f41
Fix FiniteVectorEnv
Markus28 Jan 4, 2023
803f543
Fixed some type hints
Markus28 Jan 4, 2023
54e922f
Try to fix FiniteVectorEnv
Markus28 Jan 5, 2023
d9a1feb
Skip NNI test, remove commented out code
Markus28 Jan 5, 2023
54515f7
Fixed type errors
Markus28 Jan 5, 2023
bfca5a9
Disclaimer in README
Markus28 Jan 5, 2023
bc379b0
Put type ignore in the right places
Markus28 Jan 5, 2023
266c64a
Also allow OpenAI gym environments, fixed documentation
Jan 20, 2023
565c2e9
Also allow PettingZooEnv in vector environment, fixed type hint
Markus28 Jan 20, 2023
1402f1e
Fixed import of PettingZooEnv
Markus28 Jan 21, 2023
2a5ff31
Fixed type hinting, updated README
Markus28 Jan 21, 2023
040d2af
Updated documentation about ReplayBuffer
Markus28 Jan 21, 2023
28475f4
Fixed gymnasium version, added shimmy to dev requirements
Jan 23, 2023
2be8591
Added test for conversion of OpenAI Gym environments
Jan 23, 2023
654f9d8
fix spelling
Trinkle23897 Jan 26, 2023
363250d
Merge branch 'master' into gymnasium_integration
Trinkle23897 Feb 3, 2023
40 changes: 24 additions & 16 deletions README.md
Original file line number Diff line number Diff line change
@@ -6,6 +6,13 @@

[![PyPI](https://img.shields.io/pypi/v/tianshou)](https://pypi.org/project/tianshou/) [![Conda](https://img.shields.io/conda/vn/conda-forge/tianshou)](https://github.com/conda-forge/tianshou-feedstock) [![Read the Docs](https://img.shields.io/readthedocs/tianshou)](https://tianshou.readthedocs.io/en/master) [![Read the Docs](https://img.shields.io/readthedocs/tianshou-docs-zh-cn?label=%E4%B8%AD%E6%96%87%E6%96%87%E6%A1%A3)](https://tianshou.readthedocs.io/zh/master/) [![Unittest](https://github.com/thu-ml/tianshou/workflows/Unittest/badge.svg?branch=master)](https://github.com/thu-ml/tianshou/actions) [![codecov](https://img.shields.io/codecov/c/gh/thu-ml/tianshou)](https://codecov.io/gh/thu-ml/tianshou) [![GitHub issues](https://img.shields.io/github/issues/thu-ml/tianshou)](https://github.com/thu-ml/tianshou/issues) [![GitHub stars](https://img.shields.io/github/stars/thu-ml/tianshou)](https://github.com/thu-ml/tianshou/stargazers) [![GitHub forks](https://img.shields.io/github/forks/thu-ml/tianshou)](https://github.com/thu-ml/tianshou/network) [![GitHub license](https://img.shields.io/github/license/thu-ml/tianshou)](https://github.com/thu-ml/tianshou/blob/master/LICENSE)

> ⚠️️ **Transition to Gymnasium**: The maintainers of OpenAI Gym have recently released [Gymnasium](http://github.com/Farama-Foundation/Gymnasium),
> which is where future maintenance of OpenAI Gym will be taking place.
> Tianshou has transitioned to internally using Gymnasium environments. You can still use OpenAI Gym environments with
> Tianshou vector environments, but they will be wrapped in a compatibility layer, which could be a source of issues.
> We recommend that you update your environment code to Gymnasium. If you want to continue using OpenAI Gym with
> Tianshou, you need to manually install Gym and [Shimmy](https://github.com/Farama-Foundation/Shimmy) (the compatibility layer).
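
The compatibility layer essentially adapts the legacy 4-tuple `step` API to the new 5-tuple one. Below is a minimal pure-Python sketch of that idea — illustrative only: the class and helper names are made up, and real converters such as Shimmy handle many more details (rendering, seeding, spaces, etc.).

```python
class OldToNewStepAdapter:
    """Illustrative adapter: old (obs, rew, done, info) -> new 5-tuple."""

    def __init__(self, old_env):
        self.old_env = old_env

    def step(self, action):
        obs, rew, done, info = self.old_env.step(action)
        # The old API conflated the two end-of-episode reasons; recover
        # time-limit truncation from the conventional info key if present.
        truncated = bool(info.get("TimeLimit.truncated", False))
        terminated = done and not truncated
        return obs, rew, terminated, truncated, info


class FakeOldEnv:
    """Hypothetical old-API environment, used only to exercise the adapter."""

    def step(self, action):
        return 0, 1.0, True, {"TimeLimit.truncated": True}


obs, rew, terminated, truncated, info = OldToNewStepAdapter(FakeOldEnv()).step(0)
```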

**Tianshou** ([天授](https://baike.baidu.com/item/%E5%A4%A9%E6%8E%88)) is a reinforcement learning platform based on pure PyTorch. Unlike existing reinforcement learning libraries, which are mainly based on TensorFlow, have many nested classes, unfriendly APIs, or run slowly, Tianshou provides a fast, modularized framework and a pythonic API for building deep reinforcement learning agents with the fewest lines of code. The supported interface algorithms currently include:

- [Deep Q-Network (DQN)](https://storage.googleapis.com/deepmind-media/dqn/DQNNaturePaper.pdf)
@@ -105,21 +112,21 @@ The example scripts are under [test/](https://github.com/thu-ml/tianshou/blob/ma

### Comprehensive Functionality

| RL Platform | GitHub Stars | # of Alg. <sup>(1)</sup> | Custom Env | Batch Training | RNN Support | Nested Observation | Backend |
| ------------------------------------------------------------------ | ----------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------ |--------------------------------| --------------------------------- | ------------------ | ------------------ | ---------- |
| [Baselines](https://github.com/openai/baselines) | [![GitHub stars](https://img.shields.io/github/stars/openai/baselines)](https://github.com/openai/baselines/stargazers) | 9 | :heavy_check_mark: (gym) | :heavy_minus_sign: <sup>(2)</sup> | :heavy_check_mark: | :x: | TF1 |
| [Stable-Baselines](https://github.com/hill-a/stable-baselines) | [![GitHub stars](https://img.shields.io/github/stars/hill-a/stable-baselines)](https://github.com/hill-a/stable-baselines/stargazers) | 11 | :heavy_check_mark: (gym) | :heavy_minus_sign: <sup>(2)</sup> | :heavy_check_mark: | :x: | TF1 |
| [Stable-Baselines3](https://github.com/DLR-RM/stable-baselines3) | [![GitHub stars](https://img.shields.io/github/stars/DLR-RM/stable-baselines3)](https://github.com/DLR-RM/stable-baselines3/stargazers) | 7<sup> (3)</sup> | :heavy_check_mark: (gym) | :heavy_minus_sign: <sup>(2)</sup> | :x: | :heavy_check_mark: | PyTorch |
| [Ray/RLlib](https://github.com/ray-project/ray/tree/master/rllib/) | [![GitHub stars](https://img.shields.io/github/stars/ray-project/ray)](https://github.com/ray-project/ray/stargazers) | 16 | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | TF/PyTorch |
| [SpinningUp](https://github.com/openai/spinningup) | [![GitHub stars](https://img.shields.io/github/stars/openai/spinningup)](https://github.com/openai/spinningupstargazers) | 6 | :heavy_check_mark: (gym) | :heavy_minus_sign: <sup>(2)</sup> | :x: | :x: | PyTorch |
| [Dopamine](https://github.com/google/dopamine) | [![GitHub stars](https://img.shields.io/github/stars/google/dopamine)](https://github.com/google/dopamine/stargazers) | 7 | :x: | :x: | :x: | :x: | TF/JAX |
| [ACME](https://github.com/deepmind/acme) | [![GitHub stars](https://img.shields.io/github/stars/deepmind/acme)](https://github.com/deepmind/acme/stargazers) | 14 | :heavy_check_mark: (dm_env) | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | TF/JAX |
| [keras-rl](https://github.com/keras-rl/keras-rl) | [![GitHub stars](https://img.shields.io/github/stars/keras-rl/keras-rl)](https://github.com/keras-rl/keras-rlstargazers) | 7 | :heavy_check_mark: (gym) | :x: | :x: | :x: | Keras |
| [rlpyt](https://github.com/astooke/rlpyt) | [![GitHub stars](https://img.shields.io/github/stars/astooke/rlpyt)](https://github.com/astooke/rlpyt/stargazers) | 11 | :x: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | PyTorch |
| [ChainerRL](https://github.com/chainer/chainerrl) | [![GitHub stars](https://img.shields.io/github/stars/chainer/chainerrl)](https://github.com/chainer/chainerrl/stargazers) | 18 | :heavy_check_mark: (gym) | :heavy_check_mark: | :heavy_check_mark: | :x: | Chainer |
| [Sample Factory](https://github.com/alex-petrenko/sample-factory) | [![GitHub stars](https://img.shields.io/github/stars/alex-petrenko/sample-factory)](https://github.com/alex-petrenko/sample-factory/stargazers) | 1<sup> (4)</sup> | :heavy_check_mark: (gym) | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | PyTorch |
| | | | | | | | |
| [Tianshou](https://github.com/thu-ml/tianshou) | [![GitHub stars](https://img.shields.io/github/stars/thu-ml/tianshou)](https://github.com/thu-ml/tianshou/stargazers) | 20 | :heavy_check_mark: (Gymnasium) | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | PyTorch |

<sup>(1): access date: 2021-08-08</sup>

@@ -175,7 +182,8 @@ This is an example of Deep Q Network. You can also run the full script at [test/
First, import some relevant packages:

```python
import gymnasium as gym
import torch, numpy as np, torch.nn as nn
from torch.utils.tensorboard import SummaryWriter
import tianshou as ts
```
1 change: 1 addition & 0 deletions docs/spelling_wordlist.txt
@@ -165,3 +165,4 @@ subprocesses
isort
yapf
pydocstyle
Args
4 changes: 2 additions & 2 deletions docs/tutorials/cheatsheet.rst
@@ -4,7 +4,7 @@ Cheat Sheet
This page shows some code snippets of how to use Tianshou to develop new
algorithms / apply algorithms to new scenarios.

By the way, some of these issues can be resolved by using a ``gymnasium.Wrapper``.
It can serve as a universal solution for policy-environment interaction. But
you can also use the batch processor :ref:`preprocess_fn` or vectorized
environment wrapper :class:`~tianshou.env.VectorEnvWrapper`.
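
As a rough illustration of the wrapper pattern, here is a pure-Python sketch with stand-in classes — these are hypothetical analogues, not the real ``gymnasium`` classes:

```python
class BaseEnv:
    """Stand-in environment using the 5-tuple step API."""

    def step(self, action):
        return 0, 1.0, False, False, {}


class ScaleRewardWrapper:
    """Toy analogue of a gymnasium.Wrapper that rescales rewards."""

    def __init__(self, env, scale=0.1):
        self.env = env
        self.scale = scale

    def step(self, action):
        # Delegate to the wrapped env, then transform the reward in transit.
        obs, rew, terminated, truncated, info = self.env.step(action)
        return obs, rew * self.scale, terminated, truncated, info


obs, rew, terminated, truncated, info = ScaleRewardWrapper(BaseEnv()).step(0)
```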
@@ -159,7 +159,7 @@ toy_text and classic_control environments. For more information, please refer to
# install envpool: pip3 install envpool

import envpool
envs = envpool.make_gymnasium("CartPole-v0", num_envs=10)
collector = Collector(policy, envs, buffer)

Here are some other `examples <https://github.com/sail-sg/envpool/tree/master/examples/tianshou_examples>`_.
20 changes: 12 additions & 8 deletions docs/tutorials/concepts.rst
@@ -55,18 +55,22 @@ Buffer

:class:`~tianshou.data.ReplayBuffer` stores data generated from interaction between the policy and environment. ReplayBuffer can be considered a specialized form (or management) of :class:`~tianshou.data.Batch`: it stores all the data in a batch in circular-queue style.

The current implementation of Tianshou typically uses the following reserved keys in
:class:`~tianshou.data.Batch`:

* ``obs`` the observation of step :math:`t` ;
* ``act`` the action of step :math:`t` ;
* ``rew`` the reward of step :math:`t` ;
* ``terminated`` the terminated flag of step :math:`t` ;
* ``truncated`` the truncated flag of step :math:`t` ;
* ``done`` the done flag of step :math:`t` (can be inferred as ``terminated or truncated``);
* ``obs_next`` the observation of step :math:`t+1` ;
* ``info`` the info of step :math:`t` (in ``gymnasium.Env``, the ``env.step()`` function returns 5 values, and the last one is ``info``);
* ``policy`` the data computed by policy in step :math:`t`;

When adding data to a replay buffer, the done flag will be inferred automatically from ``terminated`` and ``truncated``.
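
Purely for illustration, this inference amounts to a logical OR (the function name below is made up, not Tianshou API):

```python
def infer_done(terminated: bool, truncated: bool) -> bool:
    """An episode ends if the task terminated or was cut short (truncated)."""
    return terminated or truncated


# Evaluate the three interesting cases.
done_flags = [
    infer_done(True, False),
    infer_done(False, True),
    infer_done(False, False),
]
```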

The following code snippet illustrates the usage, including:

- the basic data storage: ``add()``;
- get attribute, get slicing data, ...;
@@ -80,7 +84,7 @@ The following code snippet illustrates its usage, including:
>>> from tianshou.data import Batch, ReplayBuffer
>>> buf = ReplayBuffer(size=20)
>>> for i in range(3):
... buf.add(Batch(obs=i, act=i, rew=i, terminated=0, truncated=0, obs_next=i + 1, info={}))

>>> buf.obs
# since we set size = 20, len(buf.obs) == 20.
@@ -95,8 +99,8 @@ The following code snippet illustrates its usage, including:

>>> buf2 = ReplayBuffer(size=10)
>>> for i in range(15):
... terminated = i % 4 == 0
... buf2.add(Batch(obs=i, act=i, rew=i, terminated=terminated, truncated=False, obs_next=i + 1, info={}))
>>> len(buf2)
10
>>> buf2.obs
@@ -146,10 +150,10 @@ The following code snippet illustrates its usage, including:

>>> buf = ReplayBuffer(size=9, stack_num=4, ignore_obs_next=True)
>>> for i in range(16):
... terminated = i % 5 == 0
... ptr, ep_rew, ep_len, ep_idx = buf.add(
... Batch(obs={'id': i}, act=i, rew=i,
... terminated=terminated, truncated=False, obs_next={'id': i + 1}))
... print(i, ep_len, ep_rew)
0 [1] [0.]
1 [0] [0.]
8 changes: 4 additions & 4 deletions docs/tutorials/dqn.rst
@@ -35,10 +35,10 @@ Here is the overall system:
Make an Environment
-------------------

First of all, you have to make an environment for your agent to interact with. You can use ``gym.make(environment_name)`` to make an environment for your agent. For environment interfaces, we follow the convention of `Gymnasium <https://github.com/Farama-Foundation/Gymnasium>`_. In your Python code, simply import Tianshou and make the environment:
::

import gymnasium as gym
import tianshou as ts

env = gym.make('CartPole-v0')
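
To see the shape of the new API without installing anything, here is a hypothetical, minimal stand-in that mimics the Gymnasium ``reset``/``step`` return signatures (a toy sketch, not a real Gymnasium environment):

```python
class TinyCountEnv:
    """Toy stand-in mimicking Gymnasium's reset/step return signatures."""

    def __init__(self, horizon=3):
        self.horizon = horizon
        self.t = 0

    def reset(self, seed=None):
        self.t = 0
        return self.t, {}  # Gymnasium-style reset returns (obs, info)

    def step(self, action):
        self.t += 1
        terminated = False                  # the task itself never ends here...
        truncated = self.t >= self.horizon  # ...but a time limit cuts it short
        return self.t, 1.0, terminated, truncated, {}


env = TinyCountEnv()
obs, info = env.reset(seed=0)
done, total = False, 0.0
while not done:
    obs, rew, terminated, truncated, info = env.step(0)
    total += rew
    done = terminated or truncated
```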
@@ -84,8 +84,8 @@ You can also try the super-fast vectorized environment `EnvPool <https://github.
::

import envpool
train_envs = envpool.make_gymnasium("CartPole-v0", num_envs=10)
test_envs = envpool.make_gymnasium("CartPole-v0", num_envs=100)

For the demonstration, here we use the second code-block.
