Add RLHF example #4

younesbelkada · 2023-02-16T16:19:25Z

What does this PR do?

First of all, thanks a lot for creating this repository and making the Lion optimizer more accessible.
I ran some quick experiments in the context of RLHF using the Transformer Reinforcement Learning library (trl) and got interesting convergence with this optimizer (after dividing the learning rate by 3, as suggested). I wanted to share it here, and perhaps merge the example script into this repository and/or add a few instructions about it to the README.
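For readers who want to try the same learning-rate adjustment, here is a minimal sketch of deriving a Lion starting point from an existing AdamW setup. The `OptimConfig` container, the `adamw_to_lion` helper, and the numeric values are hypothetical and are not taken from the example script in this PR. Scaling the weight decay up by the same factor is an assumption that goes beyond what is described above; it keeps the product of learning rate and weight decay unchanged.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OptimConfig:
    """Hypothetical container for two optimizer hyperparameters."""
    lr: float
    weight_decay: float


def adamw_to_lion(adamw: OptimConfig, factor: float = 3.0) -> OptimConfig:
    """Derive a Lion starting point from an AdamW configuration.

    The learning rate is divided by `factor` (3, as in the experiment above).
    Assumption: the weight decay is multiplied by the same factor so that
    lr * weight_decay stays constant.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    return OptimConfig(
        lr=adamw.lr / factor,
        weight_decay=adamw.weight_decay * factor,
    )


# Illustrative values only; not the settings used in the linked run.
adamw_cfg = OptimConfig(lr=3e-5, weight_decay=0.01)
lion_cfg = adamw_to_lion(adamw_cfg)
print(lion_cfg)
```

The resulting values would then be passed to the optimizer constructor, for example `Lion(model.parameters(), lr=lion_cfg.lr, weight_decay=lion_cfg.weight_decay)` with `Lion` imported from `lion_pytorch`. Treat the factor as a starting point to tune rather than a fixed rule.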

Related huggingface/trl#152

lucidrains · 2023-02-16T16:24:33Z

@younesbelkada ah, thanks for the PR, but i feel that's a bit outside the scope

would welcome that you share a few training runs in the discussion though. we don't even know if this optimizer will pass the test of time

younesbelkada · 2023-02-16T16:27:11Z

Sure, no problem, here is the run I made: https://wandb.ai/distill-bloom/trl/runs/slgy199e?workspace=user-younesbelkada

And in comparison to Adam:

younesbelkada added 2 commits February 16, 2023 17:12

Create rlhf_example.py

3a4b811

Update README.md

c4831d5

lucidrains closed this Feb 16, 2023
