NeMo-Skills is a collection of pipelines to improve the "skills" of large language models. You can use it to generate synthetic data, train and evaluate models, analyze outputs, and more! Here are some of the things we support:
- Flexible inference: Seamlessly switch between API providers, local servers, and large-scale Slurm jobs for LLM inference (see the sketch after this list).
- Multiple formats: Use any of the NeMo, vLLM, SGLang, and TensorRT-LLM servers and easily convert checkpoints from one format to another.
- Model evaluation: Evaluate your models on many popular benchmarks.
  - Math problem solving: math, aime24, aime25, omni-math (and many more)
  - Formal proofs in Lean: minif2f, proofnet
  - Coding skills: human-eval, mbpp
  - Chat/instruction following: ifeval, arena-hard, mt-bench
  - General knowledge: mmlu, mmlu-pro, gpqa
- Model training: Train models at speed-of-light using NeMo-Aligner.
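As a small illustration of the flexible-inference point above, the sketch below switches the same generation job between a self-hosted vLLM server and an OpenAI-compatible API endpoint. It is only a sketch: the flag names, paths, and the `++` override syntax are assumptions that may not match the current release, so check `ns generate --help` for the real interface.

```bash
# Illustrative sketch only -- flag names and paths are assumptions,
# verify the actual interface with `ns generate --help`.

# Generate solutions with a locally hosted model served through vLLM.
ns generate \
    --cluster=local \
    --server_type=vllm \
    --model=/workspace/Llama-3.1-8B-Instruct \
    --server_gpus=1 \
    --output_dir=/workspace/generation \
    ++input_file=/workspace/questions.jsonl

# The same job pointed at an OpenAI-compatible API provider:
# only the server-related arguments change, the rest stays the same.
ns generate \
    --cluster=local \
    --server_type=openai \
    --server_address=https://integrate.api.nvidia.com/v1 \
    --model=meta/llama-3.1-8b-instruct \
    --output_dir=/workspace/generation-api \
    ++input_file=/workspace/questions.jsonl
```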
You can find the full documentation here. To get started, follow this tutorial, browse the available pipelines, or run `ns --help` to see all available commands and their options.
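As a concrete starting point, here is a hedged sketch of an evaluation run on two of the benchmarks listed above. The flag names, checkpoint path, and benchmark identifiers are assumptions rather than the definitive interface; `ns eval --help` and the evaluation docs show the actual options.

```bash
# Sketch: evaluate a local checkpoint on GSM8K and MATH.
# Flag names are assumptions; run `ns eval --help` for the real interface.
ns eval \
    --cluster=local \
    --server_type=vllm \
    --model=/workspace/OpenMath2-Llama3.1-8B \
    --server_gpus=1 \
    --benchmarks=gsm8k,math \
    --output_dir=/workspace/eval-results
```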
Using our pipelines, we created the OpenMathInstruct-2 dataset, which consists of 14M question-solution pairs (600K unique questions), making it nearly eight times larger than the previous largest open-source math reasoning dataset.
The models trained on this dataset achieve strong results on common mathematical benchmarks.
| Model | GSM8K | MATH | AMC 2023 | AIME 2024 | Omni-MATH |
|---|---|---|---|---|---|
| Llama3.1-8B-Instruct | 84.5 | 51.9 | 9/40 | 2/30 | 12.7 |
| OpenMath2-Llama3.1-8B (nemo \| HF) | 91.7 | 67.8 | 16/40 | 3/30 | 22.0 |
| + majority@256 | 94.1 | 76.1 | 23/40 | 3/30 | 24.6 |
| Llama3.1-70B-Instruct | 95.1 | 68.0 | 19/40 | 6/30 | 19.0 |
| OpenMath2-Llama3.1-70B (nemo \| HF) | 94.9 | 71.9 | 20/40 | 4/30 | 23.1 |
| + majority@256 | 96.0 | 79.6 | 24/40 | 6/30 | 27.6 |
We provide all instructions to fully reproduce our results.
See our paper for ablation studies and more details!
We also provide a convenient tool for inference visualization and data analysis.
If you find our work useful, please consider citing us!
@article{toshniwal2024openmathinstruct2,
title = {{OpenMathInstruct-2: Accelerating AI for Math with Massive Open-Source Instruction Data}},
author = {Shubham Toshniwal and Wei Du and Ivan Moshkov and Branislav Kisacanin and Alexan Ayrapetyan and Igor Gitman},
year = {2024},
journal = {arXiv preprint arXiv:2410.01560}
}
@inproceedings{toshniwal2024openmathinstruct1,
title = {{OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset}},
author = {Shubham Toshniwal and Ivan Moshkov and Sean Narenthiran and Daria Gitman and Fei Jia and Igor Gitman},
year = {2024},
booktitle = {Advances in Neural Information Processing Systems},
}
Disclaimer: This project is strictly for research purposes and is not an official product from NVIDIA.