An awesome & curated list of the best LLMOps tools.
Contributions of new projects, in alphabetical order, are more than welcome.
- Agent
- Alignment
- Application Orchestration Framework
- Chat Framework
- Code Assistant
- Database
- Evaluation
- FineTune
- Gateway
- Inference
- MCP
- MLOps
- Observation
- Output
- Training
- Agno: Build Multimodal AI Agents with memory, knowledge and tools. Simple, fast and model-agnostic.
- AutoGPT: AutoGPT is the vision of accessible AI for everyone, to use and to build on. Our mission is to provide the tools, so that you can focus on what matters.
- kagent: kagent is a kubernetes native framework for building AI agents.
- LangGraph: Build resilient language agents as graphs.
- MetaGPT: The Multi-Agent Framework: First AI Software Company, Towards Natural Language Programming.
- OpenAI Agents SDK: A lightweight, powerful framework for multi-agent workflows.
- OpenManus: No fortress, purely open ground. OpenManus is Coming.
- PydanticAI: Agent Framework / shim to use Pydantic with LLMs.
- Swarm: Educational framework exploring ergonomic, lightweight multi-agent orchestration. Managed by OpenAI Solution team.
- Browser Use: Make websites accessible for AI agents.
- Mem0: The Memory layer for AI Agents.
- OpenAI CUA: Computer Using Agent Sample App.
- OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & RingAttention & RFT).
- Self-RLHF: Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback.
- Dify: Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting you quickly go from prototype to production.
- Flowise: Drag & drop UI to build your customized LLM flow.
- Haystack: AI orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.
- Inference: Turn any computer or edge device into a command center for your computer vision projects.
- LangChain: 🦜🔗 Build context-aware reasoning applications.
- LightRAG: Simple and fast retrieval-augmented generation.
- LlamaIndex: LlamaIndex is the leading framework for building LLM-powered agents over your data.
- Semantic Kernel: An open-source integration framework for integrating LLMs into your applications, featuring plugin integration, memory management, planners, and multi-modal capabilities.
- 5ire: 5ire is a cross-platform desktop AI assistant and MCP client. It is compatible with major service providers and supports local knowledge bases and tools via Model Context Protocol servers.
- Chatbot UI: AI chat for any model.
- Cherry Studio: Cherry Studio is a desktop client that supports multiple LLM providers, including DeepSeek-R1.
- FastChat: An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
- Gradio: Build and share delightful machine learning apps, all in Python. Star to support our work!
- Jan: Jan is an open source alternative to ChatGPT that runs 100% offline on your computer.
- Lobe Chat: An open-source, modern-design AI chat framework. Supports multiple AI providers (OpenAI / Claude 3 / Gemini / Ollama / DeepSeek / Qwen), knowledge base (file upload / knowledge management / RAG), multi-modals (Plugins / Artifacts) and thinking. One-click FREE deployment of your private ChatGPT / Claude / DeepSeek application.
- NextChat: Light and fast AI assistant. Supports Web | iOS | macOS | Android | Linux | Windows.
- Open WebUI: User-friendly AI Interface (Supports Ollama, OpenAI API, ...).
- PrivateGPT: Interact with your documents using the power of GPT, 100% privately, no data leaks.
- Auto-dev: AutoDev: the AI-powered coding wizard (AI-driven programming assistant) with multilingual support, auto code generation, and a helpful bug-slaying assistant. Customizable prompts and magic Auto Dev/Testing/Document/Agent features included.
- Codefuse-chatbot: An intelligent assistant serving the entire software development lifecycle, powered by a Multi-Agent Framework, working with DevOps Toolkits, Code&Doc Repo RAG, etc.
- Cody: Type less, code more: Cody is an AI code assistant that uses advanced search and codebase context to help you write and fix code.
- Continue: β© Create, share, and use custom AI code assistants with our open-source IDE extensions and hub of models, rules, prompts, docs, and other building blocks.
- Sweep: AI coding assistant for JetBrains.
- Tabby: Self-hosted AI coding assistant.
- chroma: the AI-native open-source embedding database.
- deeplake: Database for AI. Store Vectors, Images, Texts, Videos, etc. Use with LLMs/LangChain. Store, query, version, & visualize any AI data. Stream data in real-time to PyTorch/TensorFlow.
- Faiss: A library for efficient similarity search and clustering of dense vectors.
- milvus: Milvus is a high-performance, cloud-native vector database built for scalable vector ANN search.
- weaviate: Weaviate is an open-source vector database that stores both objects and vectors, allowing for the combination of vector search with structured filtering with the fault tolerance and scalability of a cloud-native database.
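The vector databases above all build on the same primitive: nearest-neighbor search over embedding vectors. A minimal NumPy-only sketch of that primitive (brute-force L2 search over made-up data; engines like Faiss or Milvus replace this loop with approximate indexes for scale):

```python
import numpy as np

rng = np.random.default_rng(0)
corpus = rng.random((1000, 64), dtype=np.float32)   # 1000 stored embeddings, dim 64
query = rng.random((1, 64), dtype=np.float32)       # one query embedding

# Brute-force L2 distance from the query to every stored vector.
dists = np.linalg.norm(corpus - query, axis=1)

# Indices of the 5 nearest stored vectors, closest first.
top_k = np.argsort(dists)[:5]
print(top_k, dists[top_k])
```

A real vector database adds persistence, metadata filtering, and sublinear-time approximate indexes (HNSW, IVF) on top of this idea.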
- AgentBench: A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24).
- lm-evaluation-harness: A framework for few-shot evaluation of language models.
- LongBench: LongBench v2 and LongBench (ACL 2024).
- OpenCompass: OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMA2, Qwen, GLM, Claude, etc.) over 100+ datasets.
- Axolotl: Go ahead and axolotl questions.
- EasyLM: Large language models (LLMs) made easy, EasyLM is a one stop solution for pre-training, finetuning, evaluating and serving LLMs in JAX/Flax.
- LLaMa-Factory: Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024).
- LMFlow: An Extensible Toolkit for Finetuning and Inference of Large Foundation Models. Large Models for All.
- maestro: streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL.
- MLX-VLM: MLX-VLM is a package for inference and fine-tuning of Vision Language Models (VLMs) on your Mac using MLX.
- Swift: Use PEFT or Full-parameter to finetune 450+ LLMs (Qwen2.5, InternLM3, GLM4, Llama3.3, Mistral, Yi1.5, Baichuan2, DeepSeek-R1, ...) and 150+ MLLMs (Qwen2.5-VL, Qwen2-Audio, Llama3.2-Vision, Llava, InternVL2.5, MiniCPM-V-2.6, GLM4v, Xcomposer2.5, Yi-VL, DeepSeek-VL2, Phi3.5-Vision, GOT-OCR2, ...).
- torchtune: PyTorch native post-training library.
- Transformer Lab: Open Source Application for Advanced LLM Engineering: interact, train, fine-tune, and evaluate large language models on your own computer.
- unsloth: Finetune Llama 3.3, DeepSeek-R1 & reasoning LLMs 2x faster with 70% less memory!
- AI Gateway: A blazing fast AI Gateway with integrated guardrails. Route to 200+ LLMs, 50+ AI Guardrails with 1 fast & friendly API.
- LiteLLM: Python SDK, Proxy Server (LLM Gateway) to call 100+ LLM APIs in OpenAI format - [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, Sagemaker, HuggingFace, Replicate, Groq].
- RouteLLM: A framework for serving and evaluating LLM routers - save LLM costs without compromising quality.
- APISIX: The Cloud-Native API Gateway and AI Gateway with extensive plugin system and AI capabilities.
- Envoy AI Gateway: Envoy AI Gateway is an open source project for using Envoy Gateway to handle request traffic from application clients to Generative AI services.
- Higress: AI Gateway | AI-Native API Gateway.
- kgateway: The Cloud-Native API Gateway and AI Gateway.
- Kong: The Cloud-Native API Gateway and AI Gateway.
- gateway-api-inference-extension: Gateway API Inference Extension.
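Most of the gateways above expose an OpenAI-compatible chat-completions API, so a client typically only changes its base URL to route through them. A sketch of the request shape (the URL and model name are placeholders for your own deployment, not real endpoints):

```python
import json

# Hypothetical gateway address: in practice this is your LiteLLM / Kong / Higress
# deployment exposing the OpenAI-compatible /v1/chat/completions route.
url = "http://localhost:4000/v1/chat/completions"

payload = {
    "model": "gpt-4o-mini",          # the gateway maps this name to a backend provider
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
    "temperature": 0.2,
}

# The gateway receives the standard OpenAI-format JSON body:
print(json.dumps(payload, indent=2))
```

Because the wire format is unchanged, routing, guardrails, and cost controls can be added at the gateway without touching application code.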
- Cortex.cpp: Local AI API Platform.
- DeepSpeed-MII: MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.
- Nvidia Dynamo: A Datacenter Scale Distributed Inference Serving Framework.
- ipex-llm: Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, DeepSeek, Mixtral, Gemma, Phi, MiniCPM, Qwen-VL, MiniCPM-V, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, vLLM, DeepSpeed, Axolotl, etc.
- LMDeploy: LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
- LoRAX: Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs.
- llama.cpp: LLM inference in C/C++.
- Llumnix: Efficient and easy multi-instance LLM serving.
- MInference: [NeurIPS'24 Spotlight, ICLR'25] To speed up Long-context LLMs' inference, approximate and dynamic sparse calculate the attention, which reduces inference latency by up to 10x for pre-filling on an A100 while maintaining accuracy.
- MLC LLM: Universal LLM Deployment Engine with ML Compilation.
- MLServer: An inference server for your machine learning models, including support for multiple frameworks, multi-model serving and more.
- Ollama: Get up and running with Llama 3.3, DeepSeek-R1, Phi-4, Gemma 3, and other large language models.
- OpenVINO: OpenVINO™ is an open source toolkit for optimizing and deploying AI inference.
- Ratchet: A cross-platform browser ML framework.
- SGLang: SGLang is a fast serving framework for large language models and vision language models.
- transformers.js: State-of-the-art machine learning for the web. Run 🤗 Transformers directly in your browser, with no need for a server!
- Triton Inference Server: The Triton Inference Server provides an optimized cloud and edge inferencing solution.
- Text Generation Inference: Large Language Model Text Generation Inference.
- vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs.
- web-llm: High-performance In-browser LLM Inference Engine.
- zml: Any model. Any hardware. Zero compromise. Built with @ziglang / @openxla / MLIR / @bazelbuild.
- AIBrix: Cost-efficient and pluggable Infrastructure components for GenAI inference.
- Kaito: Kubernetes operator for large-model inference and fine-tuning, with GPU auto-provisioning, container-based hosting, and CRD-based orchestration.
- KServe: Standardized Serverless ML Inference Platform on Kubernetes.
- KubeAI: AI Inference Operator for Kubernetes. The easiest way to serve ML models in production. Supports VLMs, LLMs, embeddings, and speech-to-text.
- llmaz: Easy, advanced inference platform for large language models on Kubernetes. Star to support our work!
- LMCache: 10x Faster Long-Context LLM By Smart KV Cache Optimizations.
- Mooncake: Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
- OpenLLM: Run any open-source LLMs, such as DeepSeek and Llama, as OpenAI-compatible API endpoints in the cloud.
- awesome-mcp-servers: A curated list of awesome Model Context Protocol (MCP) servers.
- mcp-directory: A directory for Awesome MCP Servers.
- Smithery: Smithery is a platform to help developers find and ship language model extensions compatible with the Model Context Protocol Specification.
- awesome-mcp-clients: A curated list of awesome Model Context Protocol (MCP) clients.
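The MCP servers and clients listed above communicate over JSON-RPC 2.0; for instance, a client discovers what a server can do with a `tools/list` request. A sketch of that message (the id and transport framing are simplified; see the Model Context Protocol specification for the full initialization handshake):

```python
import json

# JSON-RPC 2.0 request an MCP client sends to enumerate a server's tools.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/list",
    "params": {},
}
print(json.dumps(request))
```

The server replies with a `tools` array describing each tool's name, description, and JSON-Schema input, which the client can then surface to an LLM for tool calling.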
- BentoML: The easiest way to serve AI apps and models - Build Model Inference APIs, Job queues, LLM apps, Multi-model pipelines, and more!
- Flyte: Scalable and flexible workflow orchestration platform that seamlessly unifies data, ML and analytics stacks.
- Kubeflow: Machine Learning Toolkit for Kubernetes.
- Metaflow: Build, Deploy and Manage AI/ML Systems.
- MLflow: Open source platform for the machine learning lifecycle.
- Polyaxon: MLOps Tools For Managing & Orchestrating The Machine Learning LifeCycle.
- Ray: Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
- Seldon-Core: An MLOps framework to package, deploy, monitor and manage thousands of production machine learning models.
- ZenML: ZenML: The bridge between ML and Ops. https://zenml.io.
- OpenLLMetry: Open-source observability for your LLM application, based on OpenTelemetry.
- Helicone: Open source LLM observability platform. One line of code to monitor, evaluate, and experiment. YC W23.
- phoenix: AI Observability & Evaluation.
- wandb: The AI developer platform. Use Weights & Biases to train and fine-tune models, and manage models from experimentation to production.
- Instructor: Structured outputs for LLMs.
- Outlines: Structured Text Generation.
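Libraries like Instructor and Outlines exist because raw LLM text output needs to be parsed and validated into typed structures before application code can trust it. A stdlib-only sketch of the validation step they automate (the example reply string is made up, not a real model response):

```python
import json
from dataclasses import dataclass

@dataclass
class Person:
    name: str
    age: int

def parse_person(llm_output: str) -> Person:
    """Parse and validate a model's JSON reply into a typed object."""
    data = json.loads(llm_output)
    if not isinstance(data.get("name"), str) or not isinstance(data.get("age"), int):
        raise ValueError(f"schema mismatch: {data!r}")
    return Person(name=data["name"], age=data["age"])

# Pretend this string came back from an LLM prompted to emit JSON.
reply = '{"name": "Ada", "age": 36}'
print(parse_person(reply))   # Person(name='Ada', age=36)
```

Instructor layers retries and Pydantic models on top of this pattern, while Outlines goes further and constrains generation itself so invalid output is never produced.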
- Candle: Minimalist ML framework for Rust.
- ColossalAI: Making large AI models cheaper, faster and more accessible.
- Ludwig: Low-code framework for building custom LLMs, neural networks, and other AI models.
- MaxText: A simple, performant and scalable Jax LLM!
- MLX: MLX: An array framework for Apple silicon.