Aloha handover #29

Andrew-Luo1 · 2025-01-21T01:56:27Z

Bi-arm handover task. Original reward design by Guy Lever.

2025-01-20.07-29-41.mp4

As seen in the video (50% speed for easier viewing), the shaping rewards consist of 3 terms that create a mostly monotonic "reward potential field" increasing as the robot progresses through the desired motion.

gripper_box drives the left hand to the box.
box_handover rewards the box for getting to a pre-assigned handover point.
handover_target rewards the right hand for getting the box to the target point.
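
A minimal sketch of how three such terms could sum into one raw shaping reward. The PR text does not give their functional form, so the `1 - tanh(scale * distance)` shape, the `scale` value, the function names, and the choice of which points each term compares are all assumptions made for illustration, not the merged implementation.

```python
import math


def closeness(a, b, scale=5.0):
    """Map the distance between two 3D points to (0, 1]; 1.0 means they coincide.

    The tanh shape and the default scale are assumptions, not taken from the PR.
    """
    return 1.0 - math.tanh(scale * math.dist(a, b))


def raw_shaping_reward(left_gripper, box, handover_point, target_point):
    """Sum of the three shaping terms named in the PR description (hypothetical forms)."""
    gripper_box = closeness(left_gripper, box)       # left hand approaches the box
    box_handover = closeness(box, handover_point)    # box approaches the handover point
    handover_target = closeness(box, target_point)   # box approaches the target point
    return gripper_box + box_handover + handover_target
```

Each term saturates at 1.0, so under these assumptions the raw reward is bounded above by 3.0.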

With this formulation alone, the policy takes 30 min to 1 hour to train and gets stuck in local minima for about half the seeds. The difficulty is in the handover. Because the rewards plummet when the hands fumble during the transfer, the policy gets stuck in a local minimum where both hands clasp the box, unwilling to let go. Two tricks get around this.

First, don't penalize regression during an episode. If $r_{raw}$ is the sum of the above three terms, we use:
$r_{t+1} = \max\left( r_{raw, t+1} - \max_{\tau \in \{0, \dots, t\}} r_{raw, \tau},\ 0 \right)$
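
A minimal sketch of this rule in plain Python, reading the running maximum as taken over the raw reward. The function name and the choice to seed the maximum with the first raw reward are assumptions for illustration; the PR's actual environment code may track this differently.

```python
def progress_rewards(raw_rewards):
    """Convert per-step raw shaping rewards into non-negative progress rewards.

    Each output is the amount by which the raw reward exceeds the best raw
    reward seen earlier in the episode, clipped at zero, so falling back
    below an earlier peak is never penalized.
    """
    best_so_far = raw_rewards[0]  # assumption: running max starts at the first raw reward
    rewards = []
    for raw in raw_rewards[1:]:
        rewards.append(max(raw - best_so_far, 0.0))
        best_so_far = max(best_so_far, raw)
    return rewards


# The raw reward dips from 0.5 to 0.25 (a fumble) and later recovers.
print(progress_rewards([0.25, 0.5, 0.25, 0.5, 1.0]))  # [0.25, 0.0, 0.0, 0.5]
```

The fumble and the climb back to the old peak both earn zero, and only new progress past 0.5 is paid. The rewards sum to the best raw reward reached minus the starting one (0.75 here), so no progress is ever paid twice.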

Second, reset the episode whenever the box is dropped. These tricks drive the robot to get a lot of attempts at the transfer procedure while being unafraid of failure.
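
A minimal sketch of the early reset, assuming "dropped" means the box height falls below a threshold. The threshold value, the function name, and the height-based test are all hypothetical; the PR text does not say how a drop is detected.

```python
def steps_until_reset(box_heights, drop_height=0.05):
    """Return how many steps an episode runs before it is reset.

    box_heights: per-step z-coordinate of the box (an illustrative trajectory).
    drop_height: assumed threshold below which the box counts as dropped.
    """
    for step, z in enumerate(box_heights):
        if z < drop_height:
            return step + 1  # the episode terminates on the step where the drop is seen
    return len(box_heights)  # no drop: the episode runs to its end


print(steps_until_reset([0.3, 0.3, 0.2, 0.01, 0.01]))  # 4
```

Ending the episode at the drop means no further steps are spent with the box on the floor, so the next transfer attempt starts right away.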

On my RTX 4090, this trains stably across seeds in about 10 min.

kevinzakka · 2025-01-21T02:50:47Z

Amazing job @Andrew-Luo1 and excellent PR summary thank you!

Andrew-Luo1 added 5 commits January 20, 2025 07:40

not stable across seeds

1dfa589

reimplemented rewards to match exact specification

969eed4

works on seed 4 but can fail to learn gripping and maybe too complica…

4faa55c

…ted stay-in-place

handover training fast across seeds

e23e9a7

Aloha Handover completed. Credit to Guy Lever for the three shaping r…

fa4e8f1

…ewards. Patch franka randomization bug

kevinzakka approved these changes Jan 21, 2025

copybara-service bot merged commit 54f8081 into google-deepmind:main Jan 21, 2025
5 of 6 checks passed
