distunroller set last step periodically #1725
Conversation
# One episode finishes; move to the next worker.
# We need to make sure a whole episode is always sent to the same
# worker so that the temporal information is preserved.
exp = alf.nest.set_field(
In the case of a single trainer worker, we don't need to change the step type to LAST.
If there are multiple unrollers, we still need to set LAST. But it is not straightforward for an unroller to know whether any other unroller exists, except via the trainer. So for simplicity, we always set LAST here.
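For reference, a minimal sketch of what "always set LAST" looks like on the unroller side. `alf.nest.set_field` and `StepType` come from ALF (as in the diff above); the field path and the `_current_worker`/`_num_workers` members are illustrative assumptions, not the actual implementation:

```python
import torch
import alf
from alf.data_structures import StepType

def _finish_episode_and_rotate(self, exp):
    # Mark the last experience of the (possibly truncated) episode as LAST
    # so the trainer always sees an explicit episode boundary.
    # NOTE: the field path "time_step.step_type" is assumed for illustration.
    exp = alf.nest.set_field(
        exp, "time_step.step_type",
        torch.full_like(exp.time_step.step_type, StepType.LAST))
    # Round-robin to the next trainer worker; a whole episode always goes
    # to the same worker so that temporal information is preserved.
    self._current_worker = (self._current_worker + 1) % self._num_workers
    return exp
```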
if self._exp_socket is None:
    self._exp_socket, _ = create_zmq_socket(zmq.ROUTER, '*',
                                            self._port, self._id)

try:
We should send only on a LAST step or when the episode length is reached.
Right now we always send on a per-exp basis instead of waiting for a long trajectory; the trainer is responsible for maintaining trajectory integrity. The reason is latency: sending a very long trajectory could take a long time (especially with images) and block the unroller.
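A rough sketch of this per-exp sending path, assuming the `create_zmq_socket` helper from the diff above; the message framing and member names are illustrative, not the actual code:

```python
import pickle
import zmq

def _send_exp(self, exp_params, trainer_id: bytes):
    # Lazily create the ROUTER socket, as in the diff above.
    if self._exp_socket is None:
        self._exp_socket, _ = create_zmq_socket(zmq.ROUTER, '*',
                                                self._port, self._id)
    # Send each experience immediately instead of accumulating a long
    # trajectory; a large payload (e.g. with images) would otherwise block
    # the unroller.  The trainer reassembles the trajectory on its side.
    self._exp_socket.send_multipart(
        [trainer_id, pickle.dumps(exp_params)])
```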
self._num_earliest_frames_ignored = self._core_alg._num_earliest_frames_ignored

# We always test tensor sharing among processes, because
# we rely on undocumented features of PyTorch
Explain what the undocumented feature is.
added explanation
alf/utils/common_test.py
process.join()

# numpy array should not be modified
assert np.allclose(m.y, np.zeros([2]))
Should use exact equality instead of allclose.
updated
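For completeness, a self-contained sketch of this kind of sharing test (not the actual alf/utils/common_test.py code), assuming the behavior being exercised is that CPU tensors shared through torch.multiprocessing are visible across processes, while plain numpy attributes are merely copied:

```python
import numpy as np
import torch
import torch.multiprocessing as mp

class M:
    def __init__(self):
        self.x = torch.zeros([2]).share_memory_()  # placed in shared memory
        self.y = np.zeros([2])                     # copied into the subprocess

def _worker(m):
    m.x.add_(1.)   # visible to the parent via shared memory
    m.y += 1.      # only modifies the subprocess's own copy

if __name__ == "__main__":
    m = M()
    process = mp.Process(target=_worker, args=(m,))
    process.start()
    process.join()
    # shared tensor was modified by the subprocess
    assert torch.equal(m.x, torch.ones([2]))
    # numpy array should not be modified (exact equality, per the review)
    assert np.array_equal(m.y, np.zeros([2]))
```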
# Add the temp exp buffer to the replay buffer
-for exp_params in unroller_exps_buffer[unroller_id]:
+for i, exp_params in enumerate(
+        unroller_exps_buffer[unroller_id]):
If the batch size of the replay buffer is 1, env_id has to be 0 on the next line.
This is true under the current assumption. But since exp_params always contains env_id, we can just use it. Do you mean we should assert that it's equal to 0?
We can just set it to 0 here?
updated
# Add the temp exp buffer to the replay buffer
for exp_params in unroller_exps_buffer[unroller_id]:
-    replay_buffer.add_batch(exp_params, exp_params.env_id)
+    env_id = torch.zeros([1], dtype=torch.int32)
device="cpu"
Should change exp_params.env_id instead, since exp_params.env_id will be stored in the replay buffer and we don't want inconsistency: exp_params.env_id.zero_()
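Put together, the suggested shape of this loop would be roughly as follows (an assumed final form, for illustration only):

```python
# The replay buffer batch size is 1, so the stored env_id must be 0.
# Zero it in place so the value kept inside the replay buffer stays
# consistent with the index passed to add_batch.
for exp_params in unroller_exps_buffer[unroller_id]:
    exp_params.env_id.zero_()
    replay_buffer.add_batch(exp_params, exp_params.env_id)
```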
Force-pushed from 48969cd to 53f6e4d.
This PR lets the DistributedUnroller truncate the experience stream on its own. The stream is truncated when either the predefined max episode length is reached or the env returns a LAST step.
After the stream is truncated, the unroller switches to sending exps to a different trainer worker (if available).
This PR depends on PR #1723, which fixes a ReplayBuffer sharing issue among processes. I've added a minimal test to the init of DistributedTrainer to make sure that a ReplayBuffer can be correctly shared with a subprocess.
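A condensed sketch of the truncation rule described above; `_max_episode_length`, `_episode_length`, and `_finish_episode_and_rotate` are illustrative names, not the actual API:

```python
from alf.data_structures import StepType

def _maybe_truncate(self, exp):
    self._episode_length += 1
    # Truncate if the env itself ended the episode...
    env_says_last = bool((exp.time_step.step_type == StepType.LAST).all())
    # ...or if the predefined max episode length has been reached.
    if env_says_last or self._episode_length >= self._max_episode_length:
        # Force a LAST step and switch to the next trainer worker
        # (see the _finish_episode_and_rotate sketch earlier in the thread).
        exp = self._finish_episode_and_rotate(exp)
        self._episode_length = 0
    return exp
```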