Issues: huggingface/trl

Issues list

GRPOTrainer with Deepspeed: Getting device mismatch error (labels: 🐛 bug, 🚀 deepspeed, 🏋 GRPO)
#2745 opened Feb 3, 2025 by 3rdAT
5 tasks done
feat(GRPOTrainer): reward_func return None to skip (labels: ✨ enhancement, 🏋 GRPO)
#2737 opened Feb 2, 2025 by ctjlewis
PLZ make padding_free for DataCollatorForChatML. (labels: ✨ enhancement, 🏋 GKD, 🙋 help from community wanted)
#2736 opened Feb 2, 2025 by YooSungHyun
SFTvsRL SFT Memorizes, RL Generalizes (labels: ✨ enhancement)
#2735 opened Feb 2, 2025 by NickyDark1
GRPO Trainer supports VLMs (labels: ✨ enhancement, 🏋 GRPO)
#2734 opened Feb 2, 2025 by sunildkumar
GKD Example why do not use labels? (labels: 🏋 GKD, ❓ question)
#2732 opened Feb 2, 2025 by YooSungHyun
5 tasks done
Latest TRL code = significantly worse rewards for GRPO training (labels: 🐛 bug, 🏋 GRPO)
#2731 opened Feb 2, 2025 by abacaj
5 tasks done
Training Agents with GRPO (labels: 🏋 GRPO)
#2723 opened Jan 31, 2025 by August-murr
OOM for 7B model on A100 80Gb (labels: 🐛 bug)
#2719 opened Jan 31, 2025 by JohnConnor123
5 tasks done
GRPO for RL on agent trajectories (labels: 🏋 GRPO, 🏋 Reward)
#2715 opened Jan 31, 2025 by korbinian-hoermann
GRPO with tool calling (labels: 🏋 GRPO, 🏋 Reward)
#2712 opened Jan 31, 2025 by accupham
3 tasks
LoRA 'trainable params: 0' (labels: 🐛 bug, ⚡ PEFT)
#2711 opened Jan 31, 2025 by shannonruxin
Examples in training VDPO on llava1.6 (labels: 🏋 DPO, ✨ enhancement)
#2710 opened Jan 31, 2025 by lucasjinreal
GRPO memory bottleneck from num_generations in compute_loss (labels: 🐛 bug, 🏋 GRPO, ⚡ PEFT)
#2709 opened Jan 31, 2025 by willccbb
PPOTrainer + LoRA and Continued Training (labels: ⏳ needs more info, ⚡ PEFT, 🏋 PPO)
#2707 opened Jan 30, 2025 by kooryan
Multi-GPU sampling for vLLM in GRPO Trainer (labels: ✨ enhancement, 🏋 GRPO)
#2706 opened Jan 30, 2025 by nch0w
GRPO: Why does loss start at 0 for first K steps and then increase over time? (labels: 🏋 GRPO, ❓ question)
#2703 opened Jan 30, 2025 by arnavgarg1
5 tasks done
Exposing GenerationConfig in the GRPO Trainer (labels: ✨ enhancement, 🏋 GRPO)
#2702 opened Jan 30, 2025 by Superskyyy
Allow pretokenized dataset in GRPO Trainer (labels: ✨ enhancement, 🏋 GRPO)
#2701 opened Jan 30, 2025 by Superskyyy
GRPO VLLM does not work with Lora (labels: 🏋 GRPO, ⚡ PEFT)
#2698 opened Jan 30, 2025 by gagan3012
5 tasks done
I cannot launch PPOTrainning script with accelerate launch (labels: ⚡ accelerate, ⚡ PEFT, 🏋 PPO)
#2696 opened Jan 30, 2025 by daehuikim
5 tasks done