Neural Magic
Neural Magic empowers developers to optimize and deploy LLMs at scale. Our model compression and acceleration techniques enable top performance with vLLM.
Repositories
- flash-attention (Public, forked from vllm-project/flash-attention)
  Fast and memory-efficient exact attention (see the usage sketch after this list).
- compressed-tensors (Public)
  A safetensors extension to efficiently store sparse quantized tensors on disk (see the round-trip sketch after this list).
- upstream-transformers (Public, forked from huggingface/transformers)
  🤗 Transformers: State-of-the-art machine learning for PyTorch, TensorFlow, and JAX.
- lm-evaluation-harness (Public, forked from EleutherAI/lm-evaluation-harness)
  A framework for few-shot evaluation of language models (see the evaluation sketch after this list).
- nm-vllm-certs (Public)
  General information, model certifications, and benchmarks for nm-vllm enterprise distributions.
- evalplus (Public, forked from evalplus/evalplus)
  Neural Magic fork of EvalPlus (rigorous evaluation of LLM-synthesized code, NeurIPS 2023).
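
For compressed-tensors, here is a minimal sketch of the save/load round trip, following the pattern shown in the repo README; treat the exact signatures of `save_compressed`, `load_compressed`, and `BitmaskConfig` as assumptions to check against your installed version.

```python
# Minimal sketch: round-trip a sparse tensor through compressed-tensors.
# save_compressed/load_compressed and BitmaskConfig follow the repo README;
# exact signatures are assumptions against your installed version.
import torch
from compressed_tensors import BitmaskConfig, load_compressed, save_compressed

# Bitmask compression pays off on tensors with many zero entries:
# only the nonzero values plus a bitmask are written to disk.
compression_config = BitmaskConfig()
tensors = {
    "layer.weight": torch.tensor([[0.0, 0.0, 0.0],
                                  [1.0, 2.0, 0.0]])
}

save_compressed(tensors, "model.safetensors",
                compression_format=compression_config.format)

# load_compressed yields (name, tensor) pairs, decompressed on the fly.
decompressed = {
    name: tensor
    for name, tensor in load_compressed("model.safetensors",
                                        compression_config=compression_config)
}
```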
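For the flash-attention fork, a minimal sketch of calling the kernel directly, assuming the `flash_attn_func` entry point from the upstream flash-attn package; the shapes are illustrative, and a CUDA GPU with fp16/bf16 inputs is required.

```python
# Minimal sketch: exact attention via flash-attn's flash_attn_func,
# assuming the upstream flash-attn package API. CUDA + fp16/bf16 required.
import torch
from flash_attn import flash_attn_func

batch, seqlen, nheads, headdim = 2, 1024, 8, 64
q = torch.randn(batch, seqlen, nheads, headdim,
                dtype=torch.float16, device="cuda")
k = torch.randn_like(q)
v = torch.randn_like(q)

# The full (seqlen x seqlen) score matrix is never materialized;
# causal=True applies an autoregressive mask.
out = flash_attn_func(q, k, v, dropout_p=0.0, causal=True)
print(out.shape)  # torch.Size([2, 1024, 8, 64])
```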
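For lm-evaluation-harness, a minimal sketch of a programmatic evaluation, assuming the `simple_evaluate` API of recent upstream releases; the checkpoint and task names are illustrative placeholders.

```python
# Minimal sketch: programmatic few-shot evaluation with lm-evaluation-harness,
# assuming the simple_evaluate API of recent releases. The model checkpoint
# and task are illustrative placeholders.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",                                 # Hugging Face backend
    model_args="pretrained=facebook/opt-125m",  # any HF checkpoint id
    tasks=["hellaswag"],
    num_fewshot=0,
)
print(results["results"]["hellaswag"])
```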