Aloha handover #29

Merged (5 commits) on Jan 21, 2025

Andrew-Luo1 (Contributor)

Bi-arm handover task. Original reward design by Guy Lever.

[Video: 2025-01-20.07-29-41.mp4]

As seen in the video (played at 50% speed for easier viewing), the shaping reward consists of three terms that together form a mostly monotonic "reward potential field", which increases as the robot progresses through the desired motion.

  1. `gripper_box` drives the left hand to the box.
  2. `box_handover` rewards the box for reaching a pre-assigned handover point.
  3. `handover_target` rewards the right hand for bringing the box to the target point.
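To make the potential-field idea concrete, here is a minimal sketch that turns each of the three terms into a closeness score in (0, 1] and sums them into one raw shaping reward. The `1 - tanh(scale * distance)` kernel, the default scale of 5, the helper names, and modeling `handover_target` as the box-to-target distance are all assumptions; the PR does not spell out the exact formulas.

```python
import math

Vec3 = tuple[float, float, float]


def closeness(a: Vec3, b: Vec3, scale: float = 5.0) -> float:
    """Hypothetical kernel: 1.0 when a == b, decaying toward 0 with distance."""
    return 1.0 - math.tanh(scale * math.dist(a, b))


def shaping_potential(left_gripper: Vec3, box: Vec3,
                      handover_point: Vec3, target: Vec3) -> float:
    """Sum of the three shaping terms (assumed forms, not the PR's exact code)."""
    gripper_box = closeness(left_gripper, box)      # left hand reaches the box
    box_handover = closeness(box, handover_point)   # box reaches the handover point
    handover_target = closeness(box, target)        # box (held by right hand) reaches the target
    return gripper_box + box_handover + handover_target
```

Each term saturates at 1, so this sum is at most 3. Because the terms here are not gated by phase, carrying the box from the handover point toward the target trades `box_handover` for `handover_target`; the actual design may gate or weight the terms differently to keep the field mostly monotonic.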

With this formulation alone, the policy takes 30 min to 1 hour to train and gets stuck in a local minimum for about half the seeds. The difficulty lies in the handover itself: because the reward plummets whenever the hands fumble the box mid-transfer, the policy settles into a local minimum where both hands clasp the box, unwilling to let go. Two tricks get around this.

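First, don't penalize regression during an episode. If $r_{\text{raw}}$ is the sum of the three terms above, we use

$$r_{t+1} = \max\left(r_{\text{raw},t+1} - \max_{\tau \in \{0, \dots, t\}} r_{\text{raw},\tau},\ 0\right)$$

so the policy is rewarded only when it exceeds the best raw reward reached so far in the episode.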
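The no-regression rule needs only one piece of per-episode state: the running maximum of $r_{\text{raw}}$. The plain-Python sketch below (function names are hypothetical) shows the arithmetic on a scalar trace; a batched JAX environment would presumably carry the same running max in its environment state instead.

```python
def progress_reward(raw: float, best_raw: float) -> tuple[float, float]:
    """One step of r_{t+1} = max(r_raw_{t+1} - max_{tau<=t} r_raw_tau, 0).

    raw:      r_raw at step t+1 (sum of the three shaping terms).
    best_raw: running max of r_raw over steps 0..t of this episode.
    Returns (r_{t+1}, running max over steps 0..t+1).
    """
    reward = max(raw - best_raw, 0.0)  # only improvements are paid out
    new_best = max(best_raw, raw)      # regressions never lower the baseline
    return reward, new_best


def episode_rewards(raw_trace: list[float]) -> list[float]:
    """Apply progress_reward to a trace r_raw_0, r_raw_1, ..., r_raw_T."""
    best = raw_trace[0]  # the baseline starts at the raw reward at t = 0
    rewards = []
    for raw in raw_trace[1:]:
        reward, best = progress_reward(raw, best)
        rewards.append(reward)
    return rewards


# Fumbles at steps 2 and 4 lower r_raw but earn 0 instead of a penalty.
trace = [0.0, 0.5, 0.2, 0.7, 0.1, 1.0]
rewards = episode_rewards(trace)
```

Writing $M_t = \max_{\tau \le t} r_{\text{raw},\tau}$, each step pays exactly $M_{t+1} - M_t$, so an episode's return telescopes to $M_T - r_{\text{raw},0}$: the best raw reward reached minus where the episode started. Fumbling the box lowers $r_{\text{raw}}$ but takes back nothing already earned.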

Second, reset the episode whenever the box is dropped. Together, these tricks give the robot many attempts at the transfer while removing its fear of failure.
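One way to implement the drop reset is to treat the box falling back toward the table after a pickup as a terminal condition. The sketch below is hypothetical: the PR does not say how a drop is detected, so the height thresholds, the "was lifted" flag, and the function names are illustrative assumptions.

```python
from typing import Optional

LIFT_HEIGHT = 0.10  # hypothetical: box counts as picked up above 10 cm
DROP_HEIGHT = 0.05  # hypothetical: after pickup, below 5 cm counts as a drop


def update_drop_state(box_z: float, was_lifted: bool) -> tuple[bool, bool]:
    """Return (was_lifted, dropped) given the box height above the table in metres.

    Requiring a prior lift avoids flagging the box as dropped while it is
    still resting on the table at the start of the episode.
    """
    was_lifted = was_lifted or box_z > LIFT_HEIGHT
    dropped = was_lifted and box_z < DROP_HEIGHT
    return was_lifted, dropped


def first_drop_step(box_heights: list[float]) -> Optional[int]:
    """Index of the step at which the episode would be reset, or None."""
    was_lifted = False
    for t, z in enumerate(box_heights):
        was_lifted, dropped = update_drop_state(z, was_lifted)
        if dropped:
            return t  # signal done here; the training loop then resets the env
    return None
```

With a reset on drop, a fumbled handover ends the rollout and the next one starts from a fresh configuration, so training time goes into new transfer attempts rather than into states where the box lies on the table.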

On my RTX 4090, this trains stably across seeds in about 10 min.

@kevinzakka (Collaborator)

Amazing job @Andrew-Luo1, and excellent PR summary. Thank you!

copybara-service bot merged commit 54f8081 into google-deepmind:main on Jan 21, 2025
5 of 6 checks passed