
Gymnasium Integration #789

Merged Feb 3, 2023 · 29 commits

Commits
10cc606
Replaced Gym with Gymnasium
Markus28 Dec 29, 2022
4e827c1
Updated docs
Markus28 Dec 29, 2022
eaeb4cc
Added type ignores for gym wrappers
Markus28 Jan 3, 2023
125069a
Changed envpool from gym to gymnasium
Markus28 Jan 3, 2023
201f0cd
Require envpool>=0.7.0
Markus28 Jan 3, 2023
e1d0c0e
Fixed MyTestEnv, removed __init__ that is unnecessary due to Gymnasium
Markus28 Jan 3, 2023
5f0833a
Removed dead code that was necessary for old step API
Markus28 Jan 3, 2023
bea1559
Removed type hints about old step API
Markus28 Jan 3, 2023
3fc8c78
Removed check for whether environments return info upon reset
Markus28 Jan 3, 2023
9f2f104
Increase required version of PZ, fix some CI issues
Markus28 Jan 4, 2023
1ea1b90
Added dummy info to reset of FiniteVectorEnv
Markus28 Jan 4, 2023
9c9a70b
Fix log method to take terminated and truncated, update PettingZooEnv…
Markus28 Jan 4, 2023
28e5cb1
Made some code more explicit (removed hack for compatibility with old…
Markus28 Jan 4, 2023
6be6f41
Fix FiniteVectorEnv
Markus28 Jan 4, 2023
803f543
Fixed some type hints
Markus28 Jan 4, 2023
54e922f
Try to fix FiniteVectorEnv
Markus28 Jan 5, 2023
d9a1feb
Skip NNI test, remove commented out code
Markus28 Jan 5, 2023
54515f7
Fixed type errors
Markus28 Jan 5, 2023
bfca5a9
Disclaimer in README
Markus28 Jan 5, 2023
bc379b0
Put type ignore in the right places
Markus28 Jan 5, 2023
266c64a
Also allow OpenAI gym environments, fixed documentation
Jan 20, 2023
565c2e9
Also allow PettingZooEnv in vector environment, fixed type hint
Markus28 Jan 20, 2023
1402f1e
Fixed import of PettingZooEnv
Markus28 Jan 21, 2023
2a5ff31
Fixed type hinting, updated README
Markus28 Jan 21, 2023
040d2af
Updated documentation about ReplayBuffer
Markus28 Jan 21, 2023
28475f4
Fixed gymnasium version, added shimmy to dev requirements
Jan 23, 2023
2be8591
Added test for conversion of OpenAI Gym environments
Jan 23, 2023
654f9d8
fix spelling
Trinkle23897 Jan 26, 2023
363250d
Merge branch 'master' into gymnasium_integration
Trinkle23897 Feb 3, 2023
40 changes: 24 additions & 16 deletions README.md
Original file line number Diff line number Diff line change
@@ -6,6 +6,13 @@

[![PyPI](https://img.shields.io/pypi/v/tianshou)](https://pypi.org/project/tianshou/) [![Conda](https://img.shields.io/conda/vn/conda-forge/tianshou)](https://github.com/conda-forge/tianshou-feedstock) [![Read the Docs](https://img.shields.io/readthedocs/tianshou)](https://tianshou.readthedocs.io/en/master) [![Read the Docs](https://img.shields.io/readthedocs/tianshou-docs-zh-cn?label=%E4%B8%AD%E6%96%87%E6%96%87%E6%A1%A3)](https://tianshou.readthedocs.io/zh/master/) [![Unittest](https://github.com/thu-ml/tianshou/workflows/Unittest/badge.svg?branch=master)](https://github.com/thu-ml/tianshou/actions) [![codecov](https://img.shields.io/codecov/c/gh/thu-ml/tianshou)](https://codecov.io/gh/thu-ml/tianshou) [![GitHub issues](https://img.shields.io/github/issues/thu-ml/tianshou)](https://github.com/thu-ml/tianshou/issues) [![GitHub stars](https://img.shields.io/github/stars/thu-ml/tianshou)](https://github.com/thu-ml/tianshou/stargazers) [![GitHub forks](https://img.shields.io/github/forks/thu-ml/tianshou)](https://github.com/thu-ml/tianshou/network) [![GitHub license](https://img.shields.io/github/license/thu-ml/tianshou)](https://github.com/thu-ml/tianshou/blob/master/LICENSE)

> ⚠️️ **Transition to Gymnasium**: The maintainers of OpenAI Gym have recently released [Gymnasium](http://github.com/Farama-Foundation/Gymnasium),
> which is where future maintenance of OpenAI Gym will be taking place.
> Tianshou has transitioned to internally using Gymnasium environments. You can still use OpenAI Gym environments with
> Tianshou vector environments, but they will be wrapped in a compatibility layer, which could be a source of issues.
> We recommend that you update your environment code to Gymnasium. If you want to continue using OpenAI Gym with
> Tianshou, you need to manually install Gym and [Shimmy](https://github.com/Farama-Foundation/Shimmy) (the compatibility layer).
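
The compatibility layer essentially adapts the legacy 4-tuple `step` API to the new 5-tuple one. Below is a minimal pure-Python sketch of that idea — illustrative only: the class and helper names are made up, and real converters such as Shimmy handle many more details (rendering, seeding, spaces, etc.).

```python
class OldToNewStepAdapter:
    """Illustrative adapter: old (obs, rew, done, info) -> new 5-tuple."""

    def __init__(self, old_env):
        self.old_env = old_env

    def step(self, action):
        obs, rew, done, info = self.old_env.step(action)
        # The old API conflated the two end-of-episode reasons; recover
        # time-limit truncation from the conventional info key if present.
        truncated = bool(info.get("TimeLimit.truncated", False))
        terminated = done and not truncated
        return obs, rew, terminated, truncated, info


class FakeOldEnv:
    """Hypothetical old-API environment, used only to exercise the adapter."""

    def step(self, action):
        return 0, 1.0, True, {"TimeLimit.truncated": True}


obs, rew, terminated, truncated, info = OldToNewStepAdapter(FakeOldEnv()).step(0)
```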

**Tianshou** ([天授](https://baike.baidu.com/item/%E5%A4%A9%E6%8E%88)) is a reinforcement learning platform based on pure PyTorch. Unlike existing reinforcement learning libraries, which are mainly based on TensorFlow, have many nested classes, unfriendly APIs, or run slowly, Tianshou provides a fast, modularized framework and a pythonic API for building deep reinforcement learning agents with the fewest lines of code. The supported interface algorithms currently include:

- [Deep Q-Network (DQN)](https://storage.googleapis.com/deepmind-media/dqn/DQNNaturePaper.pdf)
@@ -105,21 +112,21 @@ The example scripts are under [test/](https://github.com/thu-ml/tianshou/blob/ma

### Comprehensive Functionality

| RL Platform | GitHub Stars | # of Alg. <sup>(1)</sup> | Custom Env | Batch Training | RNN Support | Nested Observation | Backend |
| ------------------------------------------------------------------ | ----------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------ |--------------------------------| --------------------------------- | ------------------ | ------------------ | ---------- |
| [Baselines](https://github.com/openai/baselines) | [![GitHub stars](https://img.shields.io/github/stars/openai/baselines)](https://github.com/openai/baselines/stargazers) | 9 | :heavy_check_mark: (gym) | :heavy_minus_sign: <sup>(2)</sup> | :heavy_check_mark: | :x: | TF1 |
| [Stable-Baselines](https://github.com/hill-a/stable-baselines) | [![GitHub stars](https://img.shields.io/github/stars/hill-a/stable-baselines)](https://github.com/hill-a/stable-baselines/stargazers) | 11 | :heavy_check_mark: (gym) | :heavy_minus_sign: <sup>(2)</sup> | :heavy_check_mark: | :x: | TF1 |
| [Stable-Baselines3](https://github.com/DLR-RM/stable-baselines3) | [![GitHub stars](https://img.shields.io/github/stars/DLR-RM/stable-baselines3)](https://github.com/DLR-RM/stable-baselines3/stargazers) | 7<sup> (3)</sup> | :heavy_check_mark: (gym) | :heavy_minus_sign: <sup>(2)</sup> | :x: | :heavy_check_mark: | PyTorch |
| [Ray/RLlib](https://github.com/ray-project/ray/tree/master/rllib/) | [![GitHub stars](https://img.shields.io/github/stars/ray-project/ray)](https://github.com/ray-project/ray/stargazers) | 16 | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | TF/PyTorch |
| [SpinningUp](https://github.com/openai/spinningup) | [![GitHub stars](https://img.shields.io/github/stars/openai/spinningup)](https://github.com/openai/spinningupstargazers) | 6 | :heavy_check_mark: (gym) | :heavy_minus_sign: <sup>(2)</sup> | :x: | :x: | PyTorch |
| [Dopamine](https://github.com/google/dopamine) | [![GitHub stars](https://img.shields.io/github/stars/google/dopamine)](https://github.com/google/dopamine/stargazers) | 7 | :x: | :x: | :x: | :x: | TF/JAX |
| [ACME](https://github.com/deepmind/acme) | [![GitHub stars](https://img.shields.io/github/stars/deepmind/acme)](https://github.com/deepmind/acme/stargazers) | 14 | :heavy_check_mark: (dm_env) | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | TF/JAX |
| [keras-rl](https://github.com/keras-rl/keras-rl) | [![GitHub stars](https://img.shields.io/github/stars/keras-rl/keras-rl)](https://github.com/keras-rl/keras-rlstargazers) | 7 | :heavy_check_mark: (gym) | :x: | :x: | :x: | Keras |
| [rlpyt](https://github.com/astooke/rlpyt) | [![GitHub stars](https://img.shields.io/github/stars/astooke/rlpyt)](https://github.com/astooke/rlpyt/stargazers) | 11 | :x: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | PyTorch |
| [ChainerRL](https://github.com/chainer/chainerrl) | [![GitHub stars](https://img.shields.io/github/stars/chainer/chainerrl)](https://github.com/chainer/chainerrl/stargazers) | 18 | :heavy_check_mark: (gym) | :heavy_check_mark: | :heavy_check_mark: | :x: | Chainer |
| [Sample Factory](https://github.com/alex-petrenko/sample-factory) | [![GitHub stars](https://img.shields.io/github/stars/alex-petrenko/sample-factory)](https://github.com/alex-petrenko/sample-factory/stargazers) | 1<sup> (4)</sup> | :heavy_check_mark: (gym) | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | PyTorch |
| | | | | | | | |
| [Tianshou](https://github.com/thu-ml/tianshou) | [![GitHub stars](https://img.shields.io/github/stars/thu-ml/tianshou)](https://github.com/thu-ml/tianshou/stargazers) | 20 | :heavy_check_mark: (Gymnasium) | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | PyTorch |

<sup>(1): access date: 2021-08-08</sup>

@@ -175,7 +182,8 @@ This is an example of Deep Q Network. You can also run the full script at [test/
First, import some relevant packages:

```python
import gymnasium as gym
import torch, numpy as np, torch.nn as nn
from torch.utils.tensorboard import SummaryWriter
import tianshou as ts
```
1 change: 1 addition & 0 deletions docs/spelling_wordlist.txt
@@ -165,3 +165,4 @@ subprocesses
isort
yapf
pydocstyle
Args
4 changes: 2 additions & 2 deletions docs/tutorials/cheatsheet.rst
@@ -4,7 +4,7 @@ Cheat Sheet
This page shows some code snippets of how to use Tianshou to develop new
algorithms / apply algorithms to new scenarios.

By the way, some of these issues can be resolved by using a ``gymnasium.Wrapper``.
It can serve as a universal solution for policy-environment interaction. But
you can also use the batch processor :ref:`preprocess_fn` or vectorized
environment wrapper :class:`~tianshou.env.VectorEnvWrapper`.
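
As a rough illustration of the wrapper pattern, here is a pure-Python sketch with stand-in classes — these are hypothetical analogues, not the real ``gymnasium`` classes:

```python
class BaseEnv:
    """Stand-in environment using the 5-tuple step API."""

    def step(self, action):
        return 0, 1.0, False, False, {}


class ScaleRewardWrapper:
    """Toy analogue of a gymnasium.Wrapper that rescales rewards."""

    def __init__(self, env, scale=0.1):
        self.env = env
        self.scale = scale

    def step(self, action):
        # Delegate to the wrapped env, then transform the reward in transit.
        obs, rew, terminated, truncated, info = self.env.step(action)
        return obs, rew * self.scale, terminated, truncated, info


obs, rew, terminated, truncated, info = ScaleRewardWrapper(BaseEnv()).step(0)
```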
@@ -159,7 +159,7 @@ toy_text and classic_control environments. For more information, please refer to
# install envpool: pip3 install envpool

import envpool
envs = envpool.make_gymnasium("CartPole-v0", num_envs=10)
collector = Collector(policy, envs, buffer)

Here are some other `examples <https://github.com/sail-sg/envpool/tree/master/examples/tianshou_examples>`_.
20 changes: 12 additions & 8 deletions docs/tutorials/concepts.rst
@@ -55,18 +55,22 @@ Buffer

:class:`~tianshou.data.ReplayBuffer` stores data generated from interaction between the policy and environment. ReplayBuffer can be considered a specialized form (or management) of :class:`~tianshou.data.Batch`: it stores all the data in a batch in circular-queue style.

The current implementation of Tianshou typically uses the following reserved keys in
:class:`~tianshou.data.Batch`:

* ``obs`` the observation of step :math:`t` ;
* ``act`` the action of step :math:`t` ;
* ``rew`` the reward of step :math:`t` ;
* ``terminated`` the terminated flag of step :math:`t` ;
* ``truncated`` the truncated flag of step :math:`t` ;
* ``done`` the done flag of step :math:`t` (can be inferred as ``terminated or truncated``);
* ``obs_next`` the observation of step :math:`t+1` ;
* ``info`` the info of step :math:`t` (in ``gymnasium.Env``, the ``env.step()`` function returns 5 values, and the last one is ``info``);
* ``policy`` the data computed by policy in step :math:`t`;

When adding data to a replay buffer, the done flag will be inferred automatically from ``terminated`` and ``truncated``.
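
Purely for illustration, this inference amounts to a logical OR (the function name below is made up, not Tianshou API):

```python
def infer_done(terminated: bool, truncated: bool) -> bool:
    """An episode ends if the task terminated or was cut short (truncated)."""
    return terminated or truncated


# Evaluate the three interesting cases.
done_flags = [
    infer_done(True, False),
    infer_done(False, True),
    infer_done(False, False),
]
```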

The following code snippet illustrates the usage, including:

- the basic data storage: ``add()``;
- get attribute, get slicing data, ...;
@@ -80,7 +84,7 @@ The following code snippet illustrates its usage, including:
>>> from tianshou.data import Batch, ReplayBuffer
>>> buf = ReplayBuffer(size=20)
>>> for i in range(3):
... buf.add(Batch(obs=i, act=i, rew=i, terminated=0, truncated=0, obs_next=i + 1, info={}))

>>> buf.obs
# since we set size = 20, len(buf.obs) == 20.
@@ -95,8 +99,8 @@ The following code snippet illustrates its usage, including:

>>> buf2 = ReplayBuffer(size=10)
>>> for i in range(15):
... terminated = i % 4 == 0
... buf2.add(Batch(obs=i, act=i, rew=i, terminated=terminated, truncated=False, obs_next=i + 1, info={}))
>>> len(buf2)
10
>>> buf2.obs
@@ -146,10 +150,10 @@ The following code snippet illustrates its usage, including:

>>> buf = ReplayBuffer(size=9, stack_num=4, ignore_obs_next=True)
>>> for i in range(16):
... terminated = i % 5 == 0
... ptr, ep_rew, ep_len, ep_idx = buf.add(
... Batch(obs={'id': i}, act=i, rew=i,
... terminated=terminated, truncated=False, obs_next={'id': i + 1}))
... print(i, ep_len, ep_rew)
0 [1] [0.]
1 [0] [0.]
8 changes: 4 additions & 4 deletions docs/tutorials/dqn.rst
@@ -35,10 +35,10 @@ Here is the overall system:
Make an Environment
-------------------

First of all, you have to make an environment for your agent to interact with. You can use ``gym.make(environment_name)`` to make an environment for your agent. For environment interfaces, we follow the convention of `Gymnasium <https://github.com/Farama-Foundation/Gymnasium>`_. In your Python code, simply import Tianshou and make the environment:
::

import gymnasium as gym
import tianshou as ts

env = gym.make('CartPole-v0')
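
To see the shape of the new API without installing anything, here is a hypothetical, minimal stand-in that mimics the Gymnasium ``reset``/``step`` return signatures (a toy sketch, not a real Gymnasium environment):

```python
class TinyCountEnv:
    """Toy stand-in mimicking Gymnasium's reset/step return signatures."""

    def __init__(self, horizon=3):
        self.horizon = horizon
        self.t = 0

    def reset(self, seed=None):
        self.t = 0
        return self.t, {}  # Gymnasium-style reset returns (obs, info)

    def step(self, action):
        self.t += 1
        terminated = False                  # the task itself never ends here...
        truncated = self.t >= self.horizon  # ...but a time limit cuts it short
        return self.t, 1.0, terminated, truncated, {}


env = TinyCountEnv()
obs, info = env.reset(seed=0)
done, total = False, 0.0
while not done:
    obs, rew, terminated, truncated, info = env.step(0)
    total += rew
    done = terminated or truncated
```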
@@ -84,8 +84,8 @@ You can also try the super-fast vectorized environment `EnvPool <https://github.
::

import envpool
train_envs = envpool.make_gymnasium("CartPole-v0", num_envs=10)
test_envs = envpool.make_gymnasium("CartPole-v0", num_envs=100)

For the demonstration, here we use the second code-block.
