Fix/agent grpo #3172

tastelikefeet · 2025-02-18T15:59:56Z

PR type

Bug Fix
New Feature
Document Updates
More Models or Datasets Support

PR information

Write the detailed information that belongs to this PR.

Experiment results

Paste your experiment results here (if needed).

…m_ds

* commit '0d3270d5b356a16853a43653cfd54d522445e281':
  Support Agent GRPO (modelscope#3170)
  Fix ovis2 (modelscope#3169)
  support grpo metric_for_best_model (modelscope#3155)
  Support Ovis2 models (modelscope#3163)
  docs: report_to add swanlab (modelscope#3158)

# Conflicts:
#	examples/train/grpo/multi_node/multi_gpu_agent.sh
#	swift/plugin/orm.py

…soth_fast_grpo

* commit '8921d9b98310d93f9f111af8859358ee32dce687': (46 commits)
  Support multiple vllms (modelscope#3202)
  update dataset & fix bugs (modelscope#3203)
  support vllm dp (modelscope#3201)
  fix setup.py (modelscope#3198)
  add links (modelscope#3193)
  Refactor grpo dataset (modelscope#3192)
  support r1 dataset (modelscope#3191)
  compat vllm==0.7.2 (modelscope#3083)
  support Knowledge Distillation sampling (modelscope#3185)
  Support GOT_OCR2_hf (modelscope#3182)
  Fix prm in sampler (modelscope#3184)
  fix sampler reaches max_length (modelscope#3180)
  refactor cosine orm (modelscope#3179)
  fix internvl-4b (modelscope#3178)
  Fix lmdeploy branch (modelscope#3145)
  Fix/agent grpo (modelscope#3172)
  fix streaming (modelscope#3176)
  fix max_length error (modelscope#3173)
  Support Agent GRPO (modelscope#3170)
  Fix ovis2 (modelscope#3169)
  ...

# Conflicts:
#	swift/llm/train/tuner.py

tastelikefeet added 15 commits February 18, 2025 16:05

add new ds

0e0cd61

wip

5ab87cf

fix

8be7711

lint code

8c1e18d

fix

7396c0d

add more scripts

255b7cf

lint

09b3cf6

fix

91b2f6f

fix

4ebde92

fix

741ff42

fix

883b7b5

fix

d90402e

fix

6f3f6a7

fix

b50953a

Jintao-Huang approved these changes Feb 18, 2025

tastelikefeet merged commit d7ec5a2 into modelscope:main Feb 19, 2025
1 of 2 checks passed
