An awesome & curated list of the best LLMOps tools.
Contributions of new projects, in alphabetical order, are more than welcome.
- Agent
- Alignment
- Application Orchestration Framework
- Chat Framework
- Code Assistant
- Database
- Evaluation
- FineTune
- Gateway
- Inference
- MCP
- MLOps
- Observation
- Output
- Training
- Agno: Build Multimodal AI Agents with memory, knowledge and tools. Simple, fast and model-agnostic.
- AutoGPT: AutoGPT is the vision of accessible AI for everyone, to use and to build on. Our mission is to provide the tools, so that you can focus on what matters.
- kagent: kagent is a kubernetes native framework for building AI agents.
- LangGraph: Build resilient language agents as graphs.
- MetaGPT: The Multi-Agent Framework: First AI Software Company, Towards Natural Language Programming.
- OpenAI Agents SDK: A lightweight, powerful framework for multi-agent workflows.
- OpenManus: No fortress, purely open ground. OpenManus is Coming.
- PydanticAI: Agent Framework / shim to use Pydantic with LLMs.
- Swarm: Educational framework exploring ergonomic, lightweight multi-agent orchestration. Managed by OpenAI Solution team.
- Browser Use: Make websites accessible for AI agents.
- Mem0: The Memory layer for AI Agents.
- OpenAI CUA: Computer Using Agent Sample App.
- OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & RingAttention & RFT).
- Self-RLHF: Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback.
- Dify: Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting you quickly go from prototype to production.
- Flowise: Drag & drop UI to build your customized LLM flow.
- Haystack: AI orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.
- Inference: Turn any computer or edge device into a command center for your computer vision projects.
- LangChain: 🦜🔗 Build context-aware reasoning applications.
- LightRAG: Simple and fast retrieval-augmented generation.
- LlamaIndex: LlamaIndex is the leading framework for building LLM-powered agents over your data.
- Semantic Kernel: An open-source integration framework for integrating LLMs into your applications, featuring plugin integration, memory management, planners, and multi-modal capabilities.
- 5ire: 5ire is a cross-platform desktop AI assistant and MCP client. It is compatible with major service providers and supports local knowledge bases and tools via Model Context Protocol servers.
- Chatbot UI: AI chat for any model.
- Cherry Studio: Cherry Studio is a desktop client that supports multiple LLM providers, including DeepSeek-R1.
- FastChat: An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
- Gradio: Build and share delightful machine learning apps, all in Python. Star to support our work!
- Jan: Jan is an open source alternative to ChatGPT that runs 100% offline on your computer.
- Lobe Chat: An open-source, modern-design AI chat framework. Supports multiple AI providers (OpenAI / Claude 3 / Gemini / Ollama / DeepSeek / Qwen), knowledge base (file upload / knowledge management / RAG), multi-modals (Plugins / Artifacts) and thinking. One-click FREE deployment of your private ChatGPT / Claude / DeepSeek application.
- NextChat: Light and fast AI assistant. Supports Web | iOS | macOS | Android | Linux | Windows.
- Open WebUI: User-friendly AI Interface (Supports Ollama, OpenAI API, ...).
- PrivateGPT: Interact with your documents using the power of GPT, 100% privately, no data leaks.
- Auto-dev: AutoDev: the AI-powered coding wizard (AI-driven programming assistant) with multilingual support, auto code generation, and a helpful bug-slaying assistant. Customizable prompts and magic Auto Dev/Testing/Document/Agent features included.
- Codefuse-chatbot: An intelligent assistant serving the entire software development lifecycle, powered by a Multi-Agent Framework, working with DevOps Toolkits, Code&Doc Repo RAG, etc.
- Cody: Type less, code more: Cody is an AI code assistant that uses advanced search and codebase context to help you write and fix code.
- Continue: β© Create, share, and use custom AI code assistants with our open-source IDE extensions and hub of models, rules, prompts, docs, and other building blocks.
- Sweep: AI coding assistant for JetBrains.
- Tabby: Self-hosted AI coding assistant.
- chroma: the AI-native open-source embedding database.
- deeplake: Database for AI. Store Vectors, Images, Texts, Videos, etc. Use with LLMs/LangChain. Store, query, version, & visualize any AI data. Stream data in real-time to PyTorch/TensorFlow.
- Faiss: A library for efficient similarity search and clustering of dense vectors.
- milvus: Milvus is a high-performance, cloud-native vector database built for scalable vector ANN search.
- weaviate: Weaviate is an open-source vector database that stores both objects and vectors, allowing for the combination of vector search with structured filtering with the fault tolerance and scalability of a cloud-native database.
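The vector databases above all build on the same primitive: nearest-neighbor search over embedding vectors. A minimal NumPy-only sketch of that primitive (brute-force L2 search over made-up data; engines like Faiss or Milvus replace this loop with approximate indexes for scale):

```python
import numpy as np

rng = np.random.default_rng(0)
corpus = rng.random((1000, 64), dtype=np.float32)   # 1000 stored embeddings, dim 64
query = rng.random((1, 64), dtype=np.float32)       # one query embedding

# Brute-force L2 distance from the query to every stored vector.
dists = np.linalg.norm(corpus - query, axis=1)

# Indices of the 5 nearest stored vectors, closest first.
top_k = np.argsort(dists)[:5]
print(top_k, dists[top_k])
```

A real vector database adds persistence, metadata filtering, and sublinear-time approximate indexes (HNSW, IVF) on top of this idea.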
- AgentBench: A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24).
- lm-evaluation-harness: A framework for few-shot evaluation of language models.
- LongBench: LongBench v2 and LongBench (ACL 2024).
- OpenCompass: OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMA2, Qwen, GLM, Claude, etc.) over 100+ datasets.
- Axolotl: Go ahead and axolotl questions.
- EasyLM: Large language models (LLMs) made easy, EasyLM is a one stop solution for pre-training, finetuning, evaluating and serving LLMs in JAX/Flax.
- LLaMa-Factory: Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024).
- LMFlow: An Extensible Toolkit for Finetuning and Inference of Large Foundation Models. Large Models for All.
- maestro: streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL.
- MLX-VLM: MLX-VLM is a package for inference and fine-tuning of Vision Language Models (VLMs) on your Mac using MLX.
- Swift: Use PEFT or Full-parameter to finetune 450+ LLMs (Qwen2.5, InternLM3, GLM4, Llama3.3, Mistral, Yi1.5, Baichuan2, DeepSeek-R1, ...) and 150+ MLLMs (Qwen2.5-VL, Qwen2-Audio, Llama3.2-Vision, Llava, InternVL2.5, MiniCPM-V-2.6, GLM4v, Xcomposer2.5, Yi-VL, DeepSeek-VL2, Phi3.5-Vision, GOT-OCR2, ...).
- torchtune: PyTorch native post-training library.
- Transformer Lab: Open Source Application for Advanced LLM Engineering: interact, train, fine-tune, and evaluate large language models on your own computer.
- unsloth: Finetune Llama 3.3, DeepSeek-R1 & reasoning LLMs 2x faster with 70% less memory!
- AI Gateway: A blazing fast AI Gateway with integrated guardrails. Route to 200+ LLMs, 50+ AI Guardrails with 1 fast & friendly API.
- LiteLLM: Python SDK, Proxy Server (LLM Gateway) to call 100+ LLM APIs in OpenAI format - [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, Sagemaker, HuggingFace, Replicate, Groq].
- RouteLLM: A framework for serving and evaluating LLM routers - save LLM costs without compromising quality.
- APISIX: The Cloud-Native API Gateway and AI Gateway with extensive plugin system and AI capabilities.
- Envoy AI Gateway: Envoy AI Gateway is an open source project for using Envoy Gateway to handle request traffic from application clients to Generative AI services.
- Higress: AI Gateway | AI-Native API Gateway.
- kgateway: The Cloud-Native API Gateway and AI Gateway.
- Kong: The Cloud-Native API Gateway and AI Gateway.
- gateway-api-inference-extension: Gateway API Inference Extension.
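Most of the gateways above expose an OpenAI-compatible chat-completions API, so a client typically only changes its base URL to route through them. A sketch of the request shape (the URL and model name are placeholders for your own deployment, not real endpoints):

```python
import json

# Hypothetical gateway address: in practice this is your LiteLLM / Kong / Higress
# deployment exposing the OpenAI-compatible /v1/chat/completions route.
url = "http://localhost:4000/v1/chat/completions"

payload = {
    "model": "gpt-4o-mini",          # the gateway maps this name to a backend provider
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
    "temperature": 0.2,
}

# The gateway receives the standard OpenAI-format JSON body:
print(json.dumps(payload, indent=2))
```

Because the wire format is unchanged, routing, guardrails, and cost controls can be added at the gateway without touching application code.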
- Cortex.cpp: Local AI API Platform.
- DeepSpeed-MII: MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.
- Nvidia Dynamo: A Datacenter Scale Distributed Inference Serving Framework.
- ipex-llm: Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, DeepSeek, Mixtral, Gemma, Phi, MiniCPM, Qwen-VL, MiniCPM-V, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, vLLM, DeepSpeed, Axolotl, etc.
- LMDeploy: LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
- LoRAX: Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs.
- llama.cpp: LLM inference in C/C++.
- Llumnix: Efficient and easy multi-instance LLM serving.
- MInference: [NeurIPS'24 Spotlight, ICLR'25] To speed up Long-context LLMs' inference, approximate and dynamic sparse calculate the attention, which reduces inference latency by up to 10x for pre-filling on an A100 while maintaining accuracy.
- MLC LLM: Universal LLM Deployment Engine with ML Compilation.
- MLServer: An inference server for your machine learning models, including support for multiple frameworks, multi-model serving and more.
- Ollama: Get up and running with Llama 3.3, DeepSeek-R1, Phi-4, Gemma 3, and other large language models.
- OpenVINO: OpenVINO™ is an open source toolkit for optimizing and deploying AI inference.
- Ratchet: A cross-platform browser ML framework.
- SGLang: SGLang is a fast serving framework for large language models and vision language models.
- transformers.js: State-of-the-art machine learning for the web. Run 🤗 Transformers directly in your browser, with no need for a server!
- Triton Inference Server: The Triton Inference Server provides an optimized cloud and edge inferencing solution.
- Text Generation Inference: Large Language Model Text Generation Inference.
- vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs.
- web-llm: High-performance In-browser LLM Inference Engine.
- zml: Any model. Any hardware. Zero compromise. Built with @ziglang / @openxla / MLIR / @bazelbuild.
- AIBrix: Cost-efficient and pluggable Infrastructure components for GenAI inference.
- Kaito: Kubernetes operator for large-model inference and fine-tuning, with GPU auto-provisioning, container-based hosting, and CRD-based orchestration.
- KServe: Standardized Serverless ML Inference Platform on Kubernetes.
- KubeAI: AI Inference Operator for Kubernetes. The easiest way to serve ML models in production. Supports VLMs, LLMs, embeddings, and speech-to-text.
- llmaz: Easy, advanced inference platform for large language models on Kubernetes. Star to support our work!
- LMCache: 10x Faster Long-Context LLM By Smart KV Cache Optimizations.
- Mooncake: Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
- OpenLLM: Run any open-source LLMs, such as DeepSeek and Llama, as OpenAI-compatible API endpoints in the cloud.
- awesome-mcp-servers: A curated list of awesome Model Context Protocol (MCP) servers.
- mcp-directory: A directory for Awesome MCP Servers.
- Smithery: Smithery is a platform to help developers find and ship language model extensions compatible with the Model Context Protocol Specification.
- awesome-mcp-clients: A curated list of awesome Model Context Protocol (MCP) clients.
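The MCP servers and clients listed above communicate over JSON-RPC 2.0; for instance, a client discovers what a server can do with a `tools/list` request. A sketch of that message (the id and transport framing are simplified; see the Model Context Protocol specification for the full initialization handshake):

```python
import json

# JSON-RPC 2.0 request an MCP client sends to enumerate a server's tools.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/list",
    "params": {},
}
print(json.dumps(request))
```

The server replies with a `tools` array describing each tool's name, description, and JSON-Schema input, which the client can then surface to an LLM for tool calling.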
- BentoML: The easiest way to serve AI apps and models - Build Model Inference APIs, Job queues, LLM apps, Multi-model pipelines, and more!
- Flyte: Scalable and flexible workflow orchestration platform that seamlessly unifies data, ML and analytics stacks.
- Kubeflow: Machine Learning Toolkit for Kubernetes.
- Metaflow: Build, Deploy and Manage AI/ML Systems.
- MLflow: Open source platform for the machine learning lifecycle.
- Polyaxon: MLOps Tools For Managing & Orchestrating The Machine Learning LifeCycle.
- Ray: Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
- Seldon-Core: An MLOps framework to package, deploy, monitor and manage thousands of production machine learning models.
- ZenML: ZenML: The bridge between ML and Ops. https://zenml.io.
- OpenLLMetry: Open-source observability for your LLM application, based on OpenTelemetry.
- Helicone: Open source LLM observability platform. One line of code to monitor, evaluate, and experiment. YC W23.
- phoenix: AI Observability & Evaluation.
- wandb: The AI developer platform. Use Weights & Biases to train and fine-tune models, and manage models from experimentation to production.
- Instructor: Structured outputs for LLMs.
- Outlines: Structured Text Generation.
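Libraries like Instructor and Outlines exist because raw LLM text output needs to be parsed and validated into typed structures before application code can trust it. A stdlib-only sketch of the validation step they automate (the example reply string is made up, not a real model response):

```python
import json
from dataclasses import dataclass

@dataclass
class Person:
    name: str
    age: int

def parse_person(llm_output: str) -> Person:
    """Parse and validate a model's JSON reply into a typed object."""
    data = json.loads(llm_output)
    if not isinstance(data.get("name"), str) or not isinstance(data.get("age"), int):
        raise ValueError(f"schema mismatch: {data!r}")
    return Person(name=data["name"], age=data["age"])

# Pretend this string came back from an LLM prompted to emit JSON.
reply = '{"name": "Ada", "age": 36}'
print(parse_person(reply))   # Person(name='Ada', age=36)
```

Instructor layers retries and Pydantic models on top of this pattern, while Outlines goes further and constrains generation itself so invalid output is never produced.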
- Candle: Minimalist ML framework for Rust.
- ColossalAI: Making large AI models cheaper, faster and more accessible.
- Ludwig: Low-code framework for building custom LLMs, neural networks, and other AI models.
- MaxText: A simple, performant and scalable Jax LLM!
- MLX: MLX: An array framework for Apple silicon.