Merge commit 'df8939818d2b3694d14120d8fb07eea96e5b99a8' into feat/unlsoth_fast_grpo

* commit 'df8939818d2b3694d14120d8fb07eea96e5b99a8': (24 commits)
  GRPO+LMDeploy 0.7 (modelscope#3277)
  fix lmdeploy (modelscope#3274)
  compat lmdeploy 0.7 (modelscope#3256)
  Fix typos (modelscope#3266)
  Support the base64 format of generated images for JanusPro (modelscope#3265)
  grpo_countdown & fix format reward (modelscope#3269)
  fix grpo compat transformers==4.47.* (modelscope#3252)
  save val_dataset (modelscope#3248)
  fix  grpo single gpu(modelscope#3246)
  fix grpo npu vllm (modelscope#3242)
  update docs (modelscope#3243)
  support muon optimizer (modelscope#3234)
  support moonlight (modelscope#3232)
  fix deepseek_vl2 (modelscope#3233)
  fix docs zh (modelscope#3231)
  Speed up GRPO (modelscope#3229)
  update docs (modelscope#3230)
  fix load args (modelscope#3226)
  Update the JanusPro-generation (modelscope#3221)
  Support the generation of JanusPro models (modelscope#3218)
  ...
tastelikefeet committed Feb 26, 2025
2 parents 1a06843 + df89398 commit ddedb66
Showing 85 changed files with 2,182 additions and 278 deletions.
18 changes: 10 additions & 8 deletions README.md
@@ -63,7 +63,7 @@ You can contact us and communicate with us by adding our group:

- 🍎 **Model Types**: Supports 450+ pure text large models, **150+ multi-modal large models**, as well as All-to-All multi-modal models, sequence classification models, and embedding models, **covering the entire process from training to deployment**.
- **Dataset Types**: Comes with 150+ pre-training, fine-tuning, human alignment, and multi-modal datasets, and supports custom datasets.
- **Hardware Support**: Compatible with CPU, RTX series, T4/V100, A10/A100/H100, Ascend NPU, etc.
- **Hardware Support**: Compatible with CPU, RTX series, T4/V100, A10/A100/H100, Ascend NPU, MPS, etc.
- 🍊 **Lightweight Training**: Supports lightweight fine-tuning methods like LoRA, QLoRA, DoRA, LoRA+, ReFT, RS-LoRA, LLaMAPro, Adapter, GaLore, Q-Galore, LISA, UnSloth, Liger-Kernel.
- **Distributed Training**: Supports distributed data parallel (DDP), device_map simple model parallelism, DeepSpeed ZeRO2/ZeRO3, FSDP, and other distributed training techniques.
- **Quantization Training**: Supports training quantized models like BNB, AWQ, GPTQ, AQLM, HQQ, EETQ.
@@ -78,6 +78,8 @@ You can contact us and communicate with us by adding our group:


## 🎉 News
- 🎁 2025.02.21: We benchmarked the training speed of GRPO and used a few tricks to [speed it up to 300%](examples/train/grpo/full_lmdeploy.sh). WandB charts can be found [here](https://wandb.ai/tastelikefeet/grpo_perf_test?nw=nwuseryuzezyz)
- 🎁 2025.02.21: Support distillation sampling from LLM APIs. Please check [this example](examples/sampler/distill/distill.sh)
- 🎁 2025.02.17: Support SwanLab; just add [a few arguments](docs/source_en/Instruction/Command-line-parameters.md#swanlab) and you can use SwanLab to analyze your training results
- 🎁 2025.02.16: Support LMDeploy in GRPO with `--use_lmdeploy true`. Please check [this script](examples/train/grpo/full_lmdeploy.sh); a hedged sketch follows this list
- 🔥 2025.02.12: Support the GRPO (Group Relative Policy Optimization) algorithm for LLMs and MLLMs; documentation can be found [here](docs/source_en/Instruction/GRPO.md)
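To make the two GRPO news items above concrete, here is a minimal sketch of a GRPO run with LMDeploy rollouts. Only `--use_lmdeploy true` comes from the news entries; the subcommand, model, dataset, and remaining flags are assumptions modeled on `examples/train/grpo/full_lmdeploy.sh` and may differ from the actual script.

```bash
# Hedged sketch of GRPO training with LMDeploy-accelerated rollouts.
# Only --use_lmdeploy is confirmed above; the other flags, the model,
# and the dataset are assumptions -- compare examples/train/grpo/full_lmdeploy.sh.
swift rlhf \
    --rlhf_type grpo \
    --model Qwen/Qwen2.5-7B-Instruct \
    --dataset AI-MO/NuminaMath-TIR \
    --reward_funcs accuracy format \
    --use_lmdeploy true
```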
@@ -113,13 +115,13 @@ Running Environment:
| python | >=3.9 | 3.10 | |
| cuda | | cuda12 | No need to install if using CPU, NPU, MPS |
| torch | >=2.0 | | |
| transformers | >=4.33 | 4.48.3 | |
| transformers | >=4.33 | 4.49 | |
| modelscope | >=1.19 | | |
| peft | >=0.11.0,<0.15.0 | | |
| trl | >=0.13,<0.16 | 0.15 | RLHF |
| deepspeed | >=0.14 | | Training |
| vllm | >=0.5.1 | 0.7.2 | Inference/Deployment/Evaluation |
| lmdeploy | lmdeploy>=0.5,<0.6.5 | 0.6.4 | Inference/Deployment/Evaluation |
| peft | >=0.11,<0.15 | | |
| trl | >=0.13,<0.17 | 0.15 | RLHF |
| deepspeed | >=0.14 | 0.14.5 | Training |
| vllm | >=0.5.1 | 0.7.3 | Inference/Deployment/Evaluation |
| lmdeploy | lmdeploy>=0.5 | 0.7.0.post3 | Inference/Deployment/Evaluation |
| evalscope | | >=0.11 | Evaluation |

For more optional dependencies, you can refer to [here](https://github.com/modelscope/ms-swift/blob/main/requirements/install_all.sh).
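As a hedged convenience, the "Recommended" column of the table above could be pinned in one command; the pins below simply mirror that column and are an illustrative sketch, not an official install line.

```bash
# Hedged sketch: pin the "Recommended" versions from the table above.
# vllm/lmdeploy/evalscope are optional (inference/deployment/evaluation only).
pip install "transformers==4.49.*" "peft>=0.11,<0.15" "trl==0.15.*" \
    "deepspeed==0.14.5" "vllm==0.7.3" "lmdeploy==0.7.0.post3" \
    "evalscope>=0.11" "modelscope>=1.19" ms-swift
```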
@@ -164,7 +166,7 @@ swift sft \

Tips:

- If you want to train with a custom dataset, you can refer to [this guide](../Customization/Custom-dataset.md) to organize your dataset format and specify `--dataset <dataset_path>`.
- If you want to train with a custom dataset, you can refer to [this guide](https://swift.readthedocs.io/en/latest/Customization/Custom-dataset.html) to organize your dataset format and specify `--dataset <dataset_path>` (a minimal sketch follows this list).
- The `--model_author` and `--model_name` parameters are only effective when the dataset includes `swift/self-cognition`.
- To train with a different model, simply modify `--model <model_id/model_path>`.
- By default, ModelScope is used for downloading models and datasets. If you want to use HuggingFace, simply specify `--use_hf true`.
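To make the custom-dataset tip concrete, here is a minimal sketch: a two-line JSONL file plus a matching `swift sft` call. The `messages`/`role`/`content` schema and the model ID are assumptions for illustration; verify the exact format against the custom-dataset guide linked above.

```bash
# Hedged sketch: create a tiny JSONL dataset and train on it.
# The messages/role/content schema and the model ID are assumptions --
# check the custom-dataset guide linked in the tips above.
cat > my_dataset.jsonl <<'EOF'
{"messages": [{"role": "user", "content": "What is 2+2?"}, {"role": "assistant", "content": "4"}]}
{"messages": [{"role": "user", "content": "Name a prime number."}, {"role": "assistant", "content": "7"}]}
EOF

swift sft \
    --model Qwen/Qwen2.5-0.5B-Instruct \
    --dataset my_dataset.jsonl
```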
16 changes: 9 additions & 7 deletions README_CN.md
@@ -60,7 +60,7 @@
**Why choose ms-swift?**
- 🍎 **Model Types**: Supports 450+ pure text large models and **150+ multi-modal large models**, as well as All-to-All multi-modal models, sequence classification models, and embedding models, **covering the entire process from training to deployment**.
- **Dataset Types**: Comes with 150+ pre-training, fine-tuning, human alignment, and multi-modal datasets, and supports custom datasets.
- **Hardware Support**: Compatible with CPU, RTX series, T4/V100, A10/A100/H100, Ascend NPU, etc.
- **Hardware Support**: Compatible with CPU, RTX series, T4/V100, A10/A100/H100, Ascend NPU, MPS, etc.
- 🍊 **Lightweight Training**: Supports lightweight fine-tuning methods like LoRA, QLoRA, DoRA, LoRA+, ReFT, RS-LoRA, LLaMAPro, Adapter, GaLore, Q-Galore, LISA, UnSloth, and Liger-Kernel.
- **Distributed Training**: Supports distributed data parallel (DDP), device_map simple model parallelism, DeepSpeed ZeRO2/ZeRO3, FSDP, and other distributed training techniques.
- **Quantization Training**: Supports training quantized models like BNB, AWQ, GPTQ, AQLM, HQQ, EETQ.
@@ -74,6 +74,8 @@
- **Model Quantization**: Supports quantized export with AWQ, GPTQ, and BNB; exported models support accelerated inference with vLLM/LMDeploy and can continue to be trained.

## 🎉 News
- 🎁 2025.02.21: We benchmarked the performance of the GRPO algorithm and used a few tricks to [raise the training speed to 300%](examples/train/grpo/full_lmdeploy.sh). WandB charts can be found [here](https://wandb.ai/tastelikefeet/grpo_perf_test?nw=nwuseryuzezyz)
- 🎁 2025.02.21: Support distillation sampling from LLM APIs; see [this example](examples/sampler/distill/distill.sh)
- 🎁 2025.02.17: Support SwanLab; just add [a few new arguments](docs/source/Instruction/命令行参数.md#swanlab) to track your training results on SwanLab
- 🎁 2025.02.16: Support LMDeploy in the GRPO algorithm with `--use_lmdeploy true`; see [this script](examples/train/grpo/full_lmdeploy.sh)
- 🔥 2025.02.12: Support the GRPO (Group Relative Policy Optimization) training algorithm; the training docs can be found [here](docs/source/Instruction/GRPO.md)
@@ -108,13 +110,13 @@ pip install -e .
| python | >=3.9 | 3.10 | |
| cuda | | cuda12 | No need to install if using CPU, NPU, MPS |
| torch | >=2.0 | | |
| transformers | >=4.33 | 4.48.3 | |
| transformers | >=4.33 | 4.49 | |
| modelscope | >=1.19 | | |
| peft | >=0.11.0,<0.15.0 | | |
| trl | >=0.13,<0.16 | 0.15 | RLHF |
| deepspeed | >=0.14 | | Training |
| vllm | >=0.5.1 | 0.7.2 | Inference/Deployment/Evaluation |
| lmdeploy | lmdeploy>=0.5,<0.6.5 | 0.6.4 | Inference/Deployment/Evaluation |
| peft | >=0.11,<0.15 | | |
| trl | >=0.13,<0.17 | 0.15 | RLHF |
| deepspeed | >=0.14 | 0.14.5 | Training |
| vllm | >=0.5.1 | 0.7.3 | Inference/Deployment/Evaluation |
| lmdeploy | lmdeploy>=0.5 | 0.7.0.post3 | Inference/Deployment/Evaluation |
| evalscope | | >=0.11 | Evaluation |

For more optional dependencies, refer to [here](https://github.com/modelscope/ms-swift/blob/main/requirements/install_all.sh)
Binary file added docs/resources/grpo_countdown.png
Binary file added docs/resources/grpo_countdown_1.png
2 changes: 1 addition & 1 deletion docs/source/.readthedocs.yaml
@@ -9,7 +9,7 @@ version: 2
build:
os: ubuntu-22.04
tools:
python: "3.12"
python: "3.10"

# Build documentation in the "docs/" directory with Sphinx
sphinx:
12 changes: 6 additions & 6 deletions docs/source/GetStarted/SWIFT安装.md
@@ -57,13 +57,13 @@ pip install ms-swift==2.*
| python | >=3.9 | 3.10 | |
| cuda | | cuda12 | No need to install if using CPU, NPU, MPS |
| torch | >=2.0 | | |
| transformers | >=4.33 | 4.48.3 | |
| transformers | >=4.33 | 4.49 | |
| modelscope | >=1.19 | | |
| peft | >=0.11.0,<0.15.0 | | |
| trl | >=0.13,<0.16 | 0.15 | RLHF |
| deepspeed | >=0.14 | | Training |
| vllm | >=0.5.1 | 0.7.2 | Inference/Deployment/Evaluation |
| lmdeploy | lmdeploy>=0.5,<0.6.5 | 0.6.4 | Inference/Deployment/Evaluation |
| peft | >=0.11,<0.15 | | |
| trl | >=0.13,<0.17 | 0.15 | RLHF |
| deepspeed | >=0.14 | 0.14.5 | Training |
| vllm | >=0.5.1 | 0.7.3 | Inference/Deployment/Evaluation |
| lmdeploy | lmdeploy>=0.5 | 0.7.0.post3 | Inference/Deployment/Evaluation |
| evalscope | | >=0.11 | Evaluation |

For more optional dependencies, refer to [here](https://github.com/modelscope/ms-swift/blob/main/requirements/install_all.sh)
2 changes: 1 addition & 1 deletion docs/source/GetStarted/快速开始.md
@@ -4,7 +4,7 @@ ms-swift is the large model and multi-modal large model training and deployment framework provided by the ModelScope community

- 🍎 Model Types: Supports 450+ pure text large models and 150+ multi-modal large models, as well as All-to-All multi-modal models, sequence classification models, and embedding models, covering the entire process from training to deployment.
- Dataset Types: Comes with 150+ pre-training, fine-tuning, human alignment, and multi-modal datasets, and supports custom datasets.
- Hardware Support: Compatible with CPU, RTX series, T4/V100, A10/A100/H100, Ascend NPU, etc.
- Hardware Support: Compatible with CPU, RTX series, T4/V100, A10/A100/H100, Ascend NPU, MPS, etc.
- 🍊 Lightweight Training: Supports lightweight fine-tuning methods like LoRA, QLoRA, DoRA, LoRA+, ReFT, RS-LoRA, LLaMAPro, Adapter, GaLore, Q-Galore, LISA, UnSloth, and Liger-Kernel.
- Distributed Training: Supports distributed data parallel (DDP), device_map simple model parallelism, DeepSpeed ZeRO2/ZeRO3, FSDP, and other distributed training techniques.
- Quantization Training: Supports training quantized models like BNB, AWQ, GPTQ, AQLM, HQQ, EETQ.
4 changes: 3 additions & 1 deletion docs/source/Instruction/GRPO.md
@@ -7,7 +7,7 @@
Environment setup
```bash
pip install math_verify # reward function
pip install "trl>=0.15"
pip install git+https://github.com/huggingface/trl.git
```
**Note**: It is normal for the loss to be close to 0 during training; see this [issue](https://github.com/huggingface/open-r1/issues/239#issuecomment-2646297851)
@@ -95,6 +95,8 @@ A conversation between User and Assistant. The user asks a question, and the Ass
- vllm_gpu_memory_utilization: pass-through parameter for vLLM
- vllm_max_model_len: pass-through parameter for vLLM
- reward_model: same format as model; uses a reward model as the reward function. At least one of reward_model and reward_funcs must be specified
- num_iterations: number of update iterations per batch; defaults to 1
- epsilon: clipping coefficient
For reward-function hyperparameters, see [built-in reward functions](#内置奖励函数); a hedged usage sketch of the parameters above follows.
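A hedged sketch combining the parameters above into a single invocation; the flag spellings follow this parameter list, while the subcommand, model, dataset, and concrete values are placeholder assumptions rather than recommendations.

```bash
# Hedged sketch: GRPO with vLLM pass-through plus the new hyperparameters.
# Flag names follow the list above; model, dataset, and values are placeholders.
swift rlhf \
    --rlhf_type grpo \
    --model Qwen/Qwen2.5-7B-Instruct \
    --dataset AI-MO/NuminaMath-TIR \
    --reward_funcs accuracy format \
    --use_vllm true \
    --vllm_gpu_memory_utilization 0.5 \
    --vllm_max_model_len 4096 \
    --num_iterations 1 \
    --epsilon 0.2
```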