DeepSeek #11971
This reverts commit 4e5929b.
# Conflicts: # nemo/collections/llm/gpt/model/deepseek.py
@JRD971000 @yaoyu-33 could you review the changes to ssm.py and mllama.py?
LGTM for llm/api and nemo.lightning related changes
[🤖]: Hi @cuichenx 👋, we wanted to let you know that a CICD pipeline for this PR just finished successfully, so it might be time to merge this PR or get some approvals. I'm just a bot, so I'll leave it to you to decide what to do next. //cc @pablo-garay @ko3n1g
* initial commit
* Apply isort and black reformatting
* clean
* Apply isort and black reformatting
* fix mscale and remove debug code
* Apply isort and black reformatting
* remove MTP to avoid HF warning
* Revert "remove MTP to avoid HF warning" (this reverts commit 4e5929b)
* guard v3 args
* guard one more v3 arg
* add recipes (wip)
* Apply isort and black reformatting
* update recipes
* update recipes
* update to latest mcore
* update to latest mcore
* support lora for TELinear
* Apply isort and black reformatting
* support V2 lite
* Apply isort and black reformatting
* exporter
* Apply isort and black reformatting
* memory-efficient hf export
* Apply isort and black reformatting
* code scanning
* Apply isort and black reformatting
* comment out cli factory for pretraining recipes
* Support non-layernom column parallel layer for LoRA SP in MLA
* add v2-lite recipe
* recipe typos
* Apply isort and black reformatting
* linting
* Apply isort and black reformatting
* update recipes
* Apply isort and black reformatting
* guard packed sequence=false
* unused imports
* fix lora
* address comments
* Apply isort and black reformatting
* fix test
---------
Signed-off-by: Chen Cui <[email protected]>
Signed-off-by: cuichenx <[email protected]>
Co-authored-by: cuichenx <[email protected]>
Signed-off-by: oliver könig <[email protected]>
* DeepSeek (#11971), carrying the same commit list and trailers as above
* missing imports
---------
Signed-off-by: Chen Cui <[email protected]>
Signed-off-by: cuichenx <[email protected]>
Signed-off-by: oliver könig <[email protected]>
Co-authored-by: Chen Cui <[email protected]>
Co-authored-by: cuichenx <[email protected]>
What does this PR do?
Add DeepSeek V2-Lite, V2, and V3 (including R1) models.
Support model import (from HF), model export (to HF), and SFT/LoRA recipes.
Currently not supported: SFT/LoRA with packed sequence, pretraining.
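The support matrix above can be restated as a small pre-flight check. This is only an illustrative sketch: `Request`, `check_supported`, and the workflow labels are hypothetical names invented here, not part of NeMo. The sketch encodes just what the description states (three variants, four supported workflows, no pretraining, no packed sequence for SFT/LoRA).

```python
from dataclasses import dataclass

# Hypothetical names throughout: this is not NeMo's API, only a restatement
# of the support matrix given in the PR description.
VARIANTS = ("v2-lite", "v2", "v3")  # R1 is covered under V3 per the description
SUPPORTED_WORKFLOWS = ("import_hf", "export_hf", "sft", "lora")
UNSUPPORTED_WORKFLOWS = ("pretrain",)


@dataclass(frozen=True)
class Request:
    variant: str
    workflow: str
    packed_sequence: bool = False


def check_supported(req: Request) -> None:
    """Raise ValueError if the request falls outside what the PR says it supports."""
    if req.variant not in VARIANTS:
        raise ValueError(f"unknown DeepSeek variant: {req.variant!r}")
    if req.workflow in UNSUPPORTED_WORKFLOWS:
        raise ValueError(f"{req.workflow!r} is listed as not supported yet")
    if req.workflow not in SUPPORTED_WORKFLOWS:
        raise ValueError(f"unknown workflow: {req.workflow!r}")
    # Packed sequence is only called out as unsupported for SFT/LoRA.
    if req.packed_sequence and req.workflow in ("sft", "lora"):
        raise ValueError("SFT/LoRA with packed sequence is listed as not supported yet")
```

For example, `check_supported(Request("v3", "lora"))` returns without error, while `Request("v2", "sft", packed_sequence=True)` or `Request("v3", "pretrain")` raises `ValueError`.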
Collection: [Note which collection this PR will affect]
Changelog
Usage
# Add a code snippet demonstrating how to use this
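The PR leaves this placeholder unfilled. A hedged sketch of what importing a Hugging Face checkpoint might look like is given below. It assumes that `llm.import_ckpt(model=..., source=...)` is the NeMo 2.0 import entry point and that the classes added by this PR are exposed as `llm.DeepSeekModel` and `llm.DeepSeekV2LiteConfig`. Neither the class names nor the `hf://deepseek-ai/DeepSeek-V2-Lite` source string appear in the PR text, so verify them against nemo/collections/llm/gpt/model/deepseek.py before relying on them. The wrapper function `import_deepseek_from_hf` is a hypothetical helper.

```python
def import_deepseek_from_hf(hf_source: str = "hf://deepseek-ai/DeepSeek-V2-Lite"):
    """Convert a Hugging Face DeepSeek checkpoint to NeMo format (sketch only)."""
    # Imported lazily so this function can be defined without NeMo installed.
    from nemo.collections import llm

    # Assumed class names; the PR text does not spell them out.
    model = llm.DeepSeekModel(llm.DeepSeekV2LiteConfig())

    # Assumed to return the location of the converted checkpoint.
    return llm.import_ckpt(model=model, source=hf_source)
```

Calling `import_deepseek_from_hf()` from a script's `if __name__ == "__main__":` block would run the conversion. For V2 or V3 the config class would need to be swapped for the matching one.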
GitHub Actions CI
The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.
The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR.
To re-run CI remove and add the label again.
To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".
Before your PR is "Ready for review"
Pre checks:
PR Type:
If you haven't finished some of the above items, you can still open a "Draft" PR.
Who can review?
Anyone in the community is free to review the PR once the checks have passed.
The Contributor guidelines list specific people who can review PRs in various areas.
Additional Information