-
Notifications
You must be signed in to change notification settings - Fork 187
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
blog: add apisix-ai-gateway-features (#1862)
* blog: add apisix-ai-gateway-features * add en version * Update apisix-ai-gateway-features.md * Update apisix-ai-gateway-features.md * fix typo * update metadata * update metadata * fix lint in why-reinvent-api-gateways
- Loading branch information
Showing
3 changed files
with
273 additions
and
1 deletion.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,137 @@ | ||
--- | ||
title: "Comprehensive Overview of APISIX AI Gateway Features" | ||
keywords: | ||
- APISIX | ||
- AI Gateway | ||
- LLM Proxy | ||
- API Gateway for AI | ||
- Token Rate Limiting | ||
- AI Security | ||
- AI Traffic Management | ||
- Open-Source API Gateway | ||
- Multi-LLM Load Balancing | ||
- AI API Protection | ||
- AI Request Throttling | ||
description: "Explore the robust features of the APISIX AI Gateway, including LLM proxy, intelligent traffic scheduling, token rate limiting, and security protection. Achieve multi-LLM load balancing, API rate control, and content moderation through open-source plugins to optimize the performance, security, and cost control of AI applications." | ||
tags: [Ecosystem] | ||
--- | ||
|
||
>This article will provide an in-depth look at the AI gateway features of the current and upcoming versions of APISIX. As a multifunctional API and AI gateway, Apache APISIX offers efficient and secure LLM API calls for AI applications. | ||
<!--truncate--> | ||
|
||
## Introduction: The Rise of AI Agents and the Evolution of AI Gateway | ||
|
||
In recent years, AI agents such as AutoGPT, Chatbots, and AI Assistants have seen rapid development. These applications rely heavily on API calls to large language models (LLMs), which has brought about challenges considering high concurrency, cost control, and security. | ||
|
||
Traditional API gateways primarily serve Web APIs and microservices and are not optimized for the unique needs of AI applications. This has led to the emergence of the concept of AI gateway. An AI gateway needs to provide enhanced capabilities in the following areas: | ||
|
||
- **Multi-LLM Proxy**: Support for multiple LLM providers to avoid vendor lock-in. | ||
- **Token Rate Limiting**: Prevent API abuse and optimize cost management. | ||
- **Security Protection**: Including prompt filtering and content moderation to ensure compliance of AI applications. | ||
- **Smart Traffic Management**: Dynamically adjust LLM weights based on cost, latency, and stability. | ||
|
||
Apache APISIX is not only an API gateway but also an AI gateway through its plugins, helping AI applications call LLM APIs more efficiently and securely. | ||
|
||
## LLM Proxy: Efficient Management of Multiple LLM Backends | ||
|
||
AI applications typically do not rely on a single LLM provider but need to dynamically select the best model based on requirements. For example: | ||
|
||
- Using OpenAI GPT-4 for general text generation and Claude for legal document processing. | ||
- Switching between Mistral and Gemini to optimize cost and throughput. | ||
|
||
**Apache APISIX's LLM Proxy offers the following capabilities:** | ||
|
||
✅ Support for Multiple LLM Providers: Including OpenAI, DeepSeek, Claude, Mistral, Gemini, etc., to avoid vendor lock-in. | ||
|
||
✅ LLM Weight and Priority Management: Adjust traffic distribution based on business needs. | ||
|
||
✅ Multi-LLM Load Balancing: Dynamically adjust LLM weights based on latency, cost, and stability. | ||
|
||
✅ Retry and Fallback Mechanisms: Ensure business continuity if an LLM API fails. | ||
|
||
✅ Load Balancing Across Different Providers of the Same LLM: | ||
|
||
For example: | ||
|
||
- Privately deployed DeepSeek. | ||
- Official DeepSeek API. | ||
- DeepSeek API from Volcano Engine | ||
|
||
Users can flexibly allocate traffic weights among different DeepSeek providers based on latency, stability, and price to achieve the best calling strategy. | ||
|
||
These capabilities enable AI applications to adapt flexibly to different LLMs, improve reliability, and reduce API calling costs. | ||
|
||
## AI Security Protection: Ensuring Safe and Compliant Use of AI | ||
|
||
AI APIs may involve sensitive data, misleading information, and potential misuse. Therefore, an AI gateway needs to provide security at multiple levels. | ||
|
||
**The AI security capabilities provided by Apache APISIX include:** | ||
|
||
✅ **AI RAG (Retrieval-Augmented Generation)**: Supports enterprise-owned knowledge bases to reduce LLM hallucinations and improve output reliability. | ||
|
||
✅ **Prompt Guard**: Automatically intercepts sensitive, illegal, and inappropriate prompts to prevent malicious use by users. | ||
|
||
✅ **Prompt Decorator**: Automatically adds content before and after user input to enhance the quality of LLM-generated content. | ||
|
||
✅ **Prompt Template**: Makes it easier for users to reuse standardized prompts and improve interaction experience. | ||
|
||
✅ **Response Filtering & Moderation**: Intercepts sensitive or non-compliant AI-generated content. | ||
|
||
✅ **Logging & Auditing**: Provides complete API request logs for compliance audits. | ||
|
||
These security measures ensure that AI applications meet enterprise-level security requirements and avoid compliance risks due to misleading AI content. | ||
|
||
## Token Observability and Management: Preventing High Bills Due to API Abuse | ||
|
||
Calling LLM APIs consumes tokens, and API abuse can lead to significant costs. Apache APISIX provides fine-grained token monitoring and management mechanisms. | ||
|
||
**The token management capabilities of Apache APISIX include:** | ||
|
||
✅ Token Rate Limiting by Route/Service/Consumer/Consumer Group/Custom Dimension | ||
|
||
✅ Support for Multiple Rate Limiting Modes: | ||
|
||
- Single-machine vs. cluster rate limiting to accommodate different scales of AI API services. | ||
- Fixed time window vs. sliding time window to flexibly control API rates. | ||
|
||
✅ Different Rate Limiting Policies for Different LLMs: Prevent cost overruns. | ||
|
||
Through Apache APISIX, enterprises can achieve fine-grained management of token resources and prevent high bills due to API abuse. | ||
|
||
## Smart Routing: Dynamic Traffic Management for AI APIs | ||
|
||
During AI API calls, different tasks may require different LLMs. For example: | ||
|
||
- Code generation requests → sent to GPT-4 or DeepSeek. | ||
- Long-form summarization tasks → sent to Claude. | ||
- General conversations → sent to GPT-3.5 or Gemini. | ||
|
||
**The smart routing capabilities of Apache APISIX include:** | ||
|
||
✅ Context-Aware Routing Based on Request Content: | ||
|
||
- Select the optimal LLM based on prompt type. | ||
- Allocate different models (GPT-4 Turbo vs. GPT-3.5) based on user level (paid vs. free users). | ||
|
||
✅ Response Caching: Reduce redundant API calls and improve response speed. | ||
|
||
These capabilities help AI APIs run more efficiently, reduce API latency, and increase throughput. | ||
|
||
## Conclusion | ||
|
||
With the rapid development of AI technology, API gateways also need to evolve to meet the unique needs of AI applications. Apache APISIX, with its LLM Proxy, token rate limiting, security protection, and smart routing features, has become the best choice for an AI gateway. | ||
|
||
**The core advantages of Apache APISIX compared to traditional API gateways are:** | ||
|
||
🚀 Support for Multiple LLM Providers: Avoid vendor lock-in. | ||
|
||
⚡️ Smart Traffic Scheduling: Dynamic load balancing to improve API reliability. | ||
|
||
🔒 Built-in Security Capabilities: Including prompt protection and content moderation to ensure secure and compliant AI APIs. | ||
|
||
💰 Token Rate Limiting: Prevent high bills due to API abuse. | ||
|
||
📊 High-performance Architecture: Meet the high concurrency needs of AI applications. | ||
|
||
If you are building AI-related applications and want to have both a powerful API gateway and AI gateway, give Apache APISIX a try! |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,135 @@ | ||
--- | ||
title: "APISIX 的 AI Gateway 功能一览:LLM 代理、Token 限流、安全防护" | ||
keywords: | ||
- APISIX | ||
- AI Gateway | ||
- LLM Proxy | ||
- API Gateway for AI | ||
- Token Rate Limiting | ||
- AI Security | ||
- AI Traffic Management | ||
- Open-Source API Gateway | ||
- Multi-LLM Load Balancing | ||
- AI API Protection | ||
- AI Request Throttling | ||
description: 探索 APISIX AI Gateway 的强大功能,包括 LLM 代理、智能流量调度、Token 限流、安全防护等。通过开源插件实现多 LLM 负载均衡、API 速率控制、内容审核,优化 AI 应用的性能、安全性和成本控制。 | ||
tags: [Ecosystem] | ||
--- | ||
|
||
> 本文将详细介绍当前及未来几个版本 APISIX 的 AI 网关功能。作为一个多功能的 API 和 AI 网关,Apache APISIX 将为 AI 应用提供了高效且安全的 LLM API 调用。 | ||
<!--truncate--> | ||
|
||
## 引言:AI 代理的崛起与 AI Gateway 的演进 | ||
|
||
近年来,AI 代理发展迅猛,如 AutoGPT、Chatbots、AI Assistants 等应用不断涌现。它们依赖于大语言模型(LLM)的 API 调用,而高并发、成本控制、安全等挑战也随之而来。 | ||
|
||
传统的 API 网关主要服务于 Web API 和微服务,并未针对 AI 应用的特殊需求进行优化,因此催生了 AI Gateway 的概念。AI Gateway 需要在以下几个方面提供增强能力: | ||
|
||
- **多 LLM 代理**:支持多个 LLM 供应商,避免供应商锁定 | ||
- **Token 速率限制**:防止 API 滥用,优化成本管理 | ||
- **安全防护**:包括提示词过滤、内容审核等,确保 AI 应用的合规性 | ||
- **智能流量管理**:根据成本、延迟、稳定性,动态调整 LLM 权重 | ||
|
||
Apache APISIX 不仅是 API 网关,也通过插件成为了 AI 网关,帮助 AI 应用更高效、安全地调用 LLM API。 | ||
|
||
## LLM Proxy:高效管理多个 LLM 后端 | ||
|
||
AI 应用通常不会只依赖一个 LLM 供应商,而是需要根据需求动态选择最佳模型。例如: | ||
|
||
- 使用 OpenAI GPT-4 进行通用文本生成,使用 Claude 进行法律文档处理 | ||
- 在 Mistral 和 Gemini 之间切换,以优化成本和吞吐量 | ||
|
||
**Apache APISIX 的 LLM Proxy 提供以下能力:** | ||
|
||
✅ 支持多个 LLM 供应商(OpenAI, DeepSeek, Claude, Mistral, Gemini, etc.),避免供应商锁定 | ||
|
||
✅ LLM 权重和优先级管理,基于业务需求调整流量分配 | ||
|
||
✅ 智能负载均衡,依据延迟、成本、稳定性 动态调整 LLM 权重 | ||
|
||
✅ 重试与备用机制,确保某个 LLM API 发生故障时业务不中断 | ||
|
||
✅ 支持同一 LLM 不同供应商之间的负载均衡,例如: | ||
|
||
- 私有部署的 DeepSeek | ||
- 官方 DeepSeek API | ||
- 火山引擎的 DeepSeek API | ||
|
||
用户可以根据延迟、稳定性、价格等因素,灵活分配不同 DeepSeek 供应商的流量权重,实现最佳调用策略。 | ||
|
||
这些能力让 AI 应用能够灵活适配不同的 LLM,提高可靠性,降低 API 调用成本。 | ||
|
||
## AI 安全防护:确保 AI 使用安全与合规 | ||
|
||
AI API 可能会涉及敏感数据和误导性信息,甚至可能被滥用。因此,AI 网关需要在多个层面提供安全保障。 | ||
|
||
**Apache APISIX 提供的 AI 安全能力包括:** | ||
|
||
✅ AI RAG(Retrieval-Augmented Generation),支持企业自有知识库,降低 LLM 幻觉,提高输出可靠性 | ||
|
||
✅ 提示词防护,自动拦截敏感、违法、不适当的提示词,防止用户恶意使用 | ||
|
||
✅ 提示词装饰器,自动在用户的输入前后添加内容,增强 LLM 生成质量 | ||
|
||
✅ 提示词模版,让用户更方便地复用标准化提示词,提升交互体验 | ||
|
||
✅ 返回内容审核,拦截敏感或违规 的 AI 生成内容 | ||
|
||
✅ 日志与审计,提供完整 API 请求日志,便于合规审计 | ||
|
||
这些安全措施确保 AI 应用符合企业级安全要求,避免因 AI 误导性内容导致合规风险。 | ||
|
||
## Token 可观测性与管理:防止 API 滥用导致高额账单 | ||
|
||
调用 LLM API 需要消耗 Token,滥用 API 可能导致巨额成本。Apache APISIX 提供精细化的 Token 监控和管理机制。 | ||
|
||
**Apache APISIX 的 Token 管理能力:** | ||
|
||
✅ 按 Route/Service/Consumer/Consumer Group/自定义维度进行 Token 限流限速 | ||
|
||
✅ 支持多种限流模式: | ||
|
||
- 单机 vs. 集群限流,适应不同规模的 AI API 服务 | ||
- 固定时间窗口 vs. 滑动时间窗口,灵活控制 API 速率 | ||
|
||
✅ 对不同 LLM 设置不同的限流策略,避免成本失控 | ||
|
||
通过 Apache APISIX,企业可以实现 Token 资源的精细化管理,防止 API 滥用带来的高额账单。 | ||
|
||
## 智能路由:AI API 的动态流量管理 | ||
|
||
在 AI API 调用过程中,不同的任务可能需要不同的 LLM。例如: | ||
|
||
- 代码生成请求 → 发送至 GPT-4 或 DeepSeek | ||
- 长篇摘要任务 → 发送至 Claude | ||
- 普通对话 → 发送至 GPT-3.5 或 Gemini | ||
|
||
**Apache APISIX 的智能路由能力:** | ||
|
||
✅ 基于请求内容的智能路由(Context-Aware Routing) | ||
|
||
- 根据提示词(Prompt)类型 选择最优 LLM | ||
- 按用户级别(付费用户 vs. 免费用户) 分配不同的模型(GPT-4 Turbo vs. GPT-3.5) | ||
|
||
✅ 缓存优化(Response Caching),减少重复 API 调用,提升响应速度 | ||
|
||
这些能力帮助 AI API 运行更加高效,降低 API 延迟,提高吞吐量。 | ||
|
||
## 结语 | ||
|
||
随着 AI 技术的快速发展,API Gateway 也需要不断进化,适应 AI 应用的特殊需求。Apache APISIX 通过 LLM Proxy、Token 速率限制、安全防护和智能路由等功能,成为 AI Gateway 的最佳选择。 | ||
|
||
**Apache APISIX 相较于传统 API Gateway 的核心优势** | ||
|
||
🚀 支持多 LLM 供应商,避免供应商锁定 | ||
|
||
⚡️ 智能流量调度,动态负载均衡,提高 API 可靠性 | ||
|
||
🔒 内置安全能力,包括提示词防护、内容审核,确保 AI API 安全合规 | ||
|
||
💰 Token 限流限速,避免 API 滥用导致高额账单 | ||
|
||
📊 高性能架构,满足 AI 应用的高并发需求 | ||
|
||
如果你正在构建 AI 相关应用,并希望同时拥有强大的 API Gateway 和 AI Gateway,不妨试试 Apache APISIX!🎯 |