feature(pu): add UniZero algo. and related configs/utils/envs/models #232

puyuan1996 · 2024-06-11T12:59:33Z

Add UniZero algo. and related configs/utils/envs/models
See our arXiv paper, UniZero: Generalized and Efficient Planning with Scalable World Models.

zoo/mujoco/config/mujoco_disc_sampled_efficientzero_config.py

zoo/atari/config/atari_efficientzero_multigpu_ddp_config.py

lzero/worker/muzero_evaluator.py

lzero/policy/utils.py

lzero/model/utils.py

lzero/entry/eval_muzero.py

lzero/entry/train_muzero.py

lzero/entry/train_unizero.py

lzero/model/common.py

lzero/model/efficientzero_model.py

lzero/model/common.py

lzero/model/unizero_model.py

lzero/model/unizero_world_models/transformer.py

lzero/model/unizero_world_models/tokenizer.py

lzero/entry/train_unizero.py

PaParaZz1 · 2024-07-03T11:21:33Z

lzero/entry/train_unizero.py

+                    if cfg.policy.use_priority:
+                        replay_buffer.update_priority(train_data, log_vars[0]['value_priority_orig'])
+
+        # Clear caches and precompute positional embedding matrices


Move this part to the __del__ method of world_model.

This needs to be called once per train epoch, so I added a new recompute_pos_emb_diff_and_clear_cache() method in unizero.
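To illustrate the point of this exchange, here is a minimal sketch of a per-epoch cleanup hook. Only the method name recompute_pos_emb_diff_and_clear_cache() comes from the thread; the ToyWorldModel class, its dict cache, the train() loop, and the method body are hypothetical stand-ins and not the LightZero implementation. A __del__ method would run only when the object is destroyed, whereas an explicit method can be called once per epoch.

```python
class ToyWorldModel:
    """Hypothetical stand-in for a world model; not the LightZero class."""

    def __init__(self):
        self.cache = {}
        self.clear_calls = 0

    def recompute_pos_emb_diff_and_clear_cache(self):
        # Placeholder body (assumption): the method discussed in the thread
        # also precomputes positional embedding matrices. Here we only
        # clear a dict and count how often the hook ran.
        self.cache.clear()
        self.clear_calls += 1


def train(model, num_epochs):
    """Hypothetical training loop that calls the hook once per epoch."""
    for epoch in range(num_epochs):
        model.cache[epoch] = "stale entry"  # pretend training fills the cache
        model.recompute_pos_emb_diff_and_clear_cache()
    return model.clear_calls


model = ToyWorldModel()
calls = train(model, num_epochs=3)
```

In this sketch the cache is emptied after every epoch while the model object stays alive, which a __del__-based cleanup could not do.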

lzero/model/unizero_world_models/kv_caching.py

PaParaZz1 · 2024-07-03T11:25:28Z

lzero/model/unizero_world_models/kv_caching.py

+        """
+        assert embed_dim % num_heads == 0
+        self._n, self._cache, self._size = num_samples, None, None
+        self._reset = lambda n: torch.empty(n, num_heads, max_tokens, embed_dim // num_heads,


Why use a lambda function here rather than writing the implementation directly in the reset method?

Using a lambda function here avoids passing many parameters around; the effect is similar.
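The trade-off discussed here can be sketched as follows. This is a hedged, dependency-free illustration: ToyCache and its shape property are hypothetical, and nested Python lists filled with zeros stand in for the uninitialized tensor that torch.empty returns in the quoted diff. The lambda closes over the constructor arguments, so reset() needs neither extra parameters nor extra instance attributes.

```python
class ToyCache:
    """Hypothetical cache; nested lists stand in for a tensor."""

    def __init__(self, num_samples, num_heads, max_tokens, embed_dim):
        assert embed_dim % num_heads == 0
        head_dim = embed_dim // num_heads
        self._n, self._cache, self._size = num_samples, None, None
        # The lambda captures num_heads, max_tokens and head_dim from the
        # enclosing scope, so they are not stored on the instance.
        self._reset = lambda n: [
            [[[0.0] * head_dim for _ in range(max_tokens)]
             for _ in range(num_heads)]
            for _ in range(n)
        ]
        self.reset()

    def reset(self):
        # Reallocate the buffer using only the stored sample count.
        self._cache = self._reset(self._n)
        self._size = 0

    @property
    def shape(self):
        c = self._cache
        return (len(c), len(c[0]), len(c[0][0]), len(c[0][0][0]))


cache = ToyCache(num_samples=2, num_heads=4, max_tokens=8, embed_dim=32)
```

The alternative the reviewer suggests would store num_heads, max_tokens and embed_dim as attributes and build the buffer inside reset(); both approaches allocate the same shape.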

lzero/model/unizero_world_models/kv_caching.py

lzero/policy/unizero.py

lzero/model/common.py

puyuan1996 added 30 commits November 4, 2023 01:15

feature(pu): add init version of gpt-based muzero

36f69d0

sync code

344f9f6

sync code

f922cfb

sync code

6e436c5

fix(pu): fix reward/value/policy kl loss

e0c2b95

fix(pu): fix kv_cache used in MCTS search method

fc913c9

Merge branch 'dev-transformer' of https://github.com/opendilab/LightZero into dev-transformer

8779039

feature(pu): add muzero_gpt for atari, polish world_model.past_keys_values_cache

a20deba

polish(pu): polish slicer

641680c

polish(pu): polish compute_slice

36adc43

fix(pu): fix init latent state, fix past_keys_values_cache in mcts search

e39d421

fix(pu): fix past_keys_values_cache in mcts search

80f2239

polish(pu): polish world_model for batch processing, work in progress

3cad2f2

feature(pu): add test_slicer and test_slicer_time

12d4174

fix(pu): fix reward/value/policy loss in muzero_gpt

03dc66f

fix(pu): fix state_action_history bug in mcts_ctree for muzero_gpt

26b6255

fix(pu): fix self.past_keys_values_cache bug in mcts_ctree for muzero_gpt

fd85f4d

polish(pu): use precomputed slices

30e7882

feature(pu): add target_policy_entropy log

ff6419f

feature(pu): add train tokenizer related loss

19c673f

polish(pu): polish obs_token_loss_weight

91d2e63

polish(pu): use different sample data for training transformer and tokenizer

18a5e7b

fix(pu): use argmax to select the most likely obs_token in recurrent_inference

8a6da1d

sync code

20f7c4c

fix(pu): tokenizer adam optimizer use weight_decay=0, set lr=1e-4 for all, use batchsize 256 for tokenizer

8a1c418

fix(pu): calculate target value in init_inference using unrolling 5 steps

7be1c32

fix(pu): fix init_inference in mcts root state

fa2263e

polish(pu): polish muzero_gpt config

1c1c745

polish(pu): tokenizer delay update

a22cc88

polish(pu): polish configs

6c21c0f

polish(pu): make muzero variants inherit from muzero

4c90207

puyuan1996 force-pushed the dev-unizero branch from a14a15d to 4c90207 July 1, 2024 07:17

polish(pu): polish unizero comments

d4eab1e

PaParaZz1 requested changes Jul 1, 2024

puyuan1996 added 4 commits July 2, 2024 14:42

polish(pu): polish lunarlander configs, fix muzero_collector env_id operations

6d366e8

fix(pu): fix muzero_evaluator env_id operations, polish _reset_collect and _reset_eval method for unizero

d5e958b

polish(pu): polish configs

428379c

polish(pu): polish reset and del method of muzero

7305526

zjowowen reviewed Jul 2, 2024

lzero/model/common.py (resolved)

polish(pu): polish comments and typelint

1dd01c7

zjowowen reviewed Jul 2, 2024

lzero/model/common.py (outdated, resolved)

puyuan1996 and others added 5 commits July 3, 2024 11:49

polish(pu): add line_profiler requirements

ee905fb

fix(pu): fix muzero target_policy_entropy bug

3ce2fb5

fix(pu): fix device and use gym

d8c5649

Merge branch 'dev-unizero' of https://github.com/opendilab/LightZero into dev-unizero

f908378

polish(pu): move initialize_zeros_batch to reset method of unizero

9813ce4

PaParaZz1 requested changes Jul 3, 2024

puyuan1996 added 5 commits July 3, 2024 13:11

polish(pu): polish model_path in configs

fb9594a

polish(pu): polish code and comments

79a4567

polish(pu): polish unizero utils

d97128f

polish(pu): polish KeysValues to_device method

7b08a5e

polish(pu): delete bsuite_unizero_config.py

9d739e8

PaParaZz1 requested changes Jul 3, 2024

puyuan1996 added 3 commits July 3, 2024 23:54

polish(pu): polish atari_unizero_configs

c957a2d

polish(pu): polish code style and comments

63507b1

polish(pu): polish code style and comments

c1e0228

puyuan1996 added the enhancement and style labels Jul 3, 2024

polish(pu): polish comments in model/common.py

3882296

puyuan1996 merged commit 4e65afa into main Jul 3, 2024
0 of 6 checks passed
