- API Overview
- Quick Start
- API Reference
- Integration Patterns
- Security
- Monitoring
- Error Handling
- Best Practices
- Next Steps
This document provides a comprehensive overview of LlamaHome's API, from core concepts and quick-start examples through the full API reference, integration patterns, and operational guidance on security, monitoring, error handling, and best practices.
LlamaHome's API is designed to provide a robust, scalable, and secure interface for interacting with large language models. It supports both synchronous and asynchronous operations, including streaming responses and batch processing.
### Architecture

```mermaid
graph TD
    A[Client] --> B[API]
    B --> C[Model]
```
### Key Features
- RESTful endpoints
- WebSocket support (see the sketch after this list)
- Streaming responses
- Rate limiting
- Authentication
- Monitoring
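
The WebSocket transport is useful when a long-lived, bidirectional connection is preferable to HTTP streaming. The endpoint path, the auth-in-first-message handshake, and the chunk schema below are illustrative assumptions rather than the documented API; a minimal sketch using the third-party `websockets` package:

```python
# Minimal WebSocket streaming sketch. The /api/v1/ws path, the auth handshake,
# and the chunk schema are assumptions for illustration, not the documented API.
import asyncio
import json

import websockets  # pip install websockets


async def stream_over_websocket(api_key: str, prompt: str) -> None:
    uri = "wss://api.llamahome.ai/api/v1/ws"  # assumed endpoint
    async with websockets.connect(uri) as ws:
        # Send the request (including credentials) as a single JSON message.
        await ws.send(json.dumps({
            "api_key": api_key,
            "prompt": prompt,
            "model": "llama3.3",
        }))
        # Read streamed chunks until the server closes the connection.
        async for message in ws:
            chunk = json.loads(message)
            print(chunk.get("chunk", ""), end="", flush=True)


asyncio.run(stream_over_websocket("your_api_key", "Generate a story"))
```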
### Authentication

```python
from llamahome.client import APIClient

client = APIClient(
    api_key="your_api_key",
    endpoint="https://api.llamahome.ai"
)
```
### Simple Request

```python
response = await client.process_prompt(
    prompt="Summarize this text",
    model="llama3.3",
    max_tokens=100
)
print(response.text)
```
### Async Stream

```python
async for chunk in client.stream_response(
    prompt="Generate a story",
    model="llama3.3"
):
    print(chunk.text, end="", flush=True)
```
### Batch Processing

```python
results = await client.process_batch(
    prompts=["Query 1", "Query 2", "Query 3"],
    model="llama3.3",
    batch_size=3
)
```
### Process Prompt

```http
POST /api/v1/process
Content-Type: application/json
Authorization: Bearer <api_token>

{
  "prompt": "string (required)",
  "model": "string (optional, default: llama3.3)",
  "max_tokens": "integer (optional, default: 100)",
  "temperature": "float (optional, default: 0.7)",
  "stream": "boolean (optional, default: false)"
}
```
Response:
```json
{
  "text": "Generated response",
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 50,
    "total_tokens": 60
  },
  "model": "llama3.3",
  "created": "2024-03-15T12:00:00Z"
}
```
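
If you are not using the SDK, the endpoint can also be called directly over HTTP. A minimal sketch with the `requests` package, reusing the base URL from the client examples above:

```python
# Call the process endpoint directly over HTTP with requests.
import requests

resp = requests.post(
    "https://api.llamahome.ai/api/v1/process",
    headers={"Authorization": "Bearer <api_token>"},
    json={
        "prompt": "Summarize this text",
        "model": "llama3.3",
        "max_tokens": 100,
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["text"])
```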
### Stream Response

```http
POST /api/v1/stream
Content-Type: application/json
Authorization: Bearer <api_token>

{
  "prompt": "string (required)",
  "model": "string (optional)",
  "max_tokens": "integer (optional)"
}
```
Response Stream:
```json
{"chunk": "First", "index": 0}
{"chunk": "part", "index": 1}
{"chunk": "of response", "index": 2}
```
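
Assuming the stream is delivered as one JSON object per line, as shown above, it can be consumed without the SDK. A sketch using `requests`:

```python
# Consume the stream endpoint, assuming newline-delimited JSON chunks.
import json

import requests

with requests.post(
    "https://api.llamahome.ai/api/v1/stream",
    headers={"Authorization": "Bearer <api_token>"},
    json={"prompt": "Generate a story", "model": "llama3.3"},
    stream=True,
    timeout=60,
) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if not line:
            continue  # skip keep-alive blank lines
        chunk = json.loads(line)
        print(chunk["chunk"], end="", flush=True)
```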
### List Models

```http
GET /api/v1/models
Authorization: Bearer <api_token>
```
Response:
```json
{
  "models": [
    {
      "id": "llama3.3-7b",
      "name": "Llama 3.3 7B",
      "version": "3.3",
      "parameters": "7B",
      "context_length": 32768
    }
  ]
}
```
### Model Information

```http
GET /api/v1/models/{model_id}
Authorization: Bearer <api_token>
```
Response:
```json
{
  "id": "llama3.3-7b",
  "name": "Llama 3.3 7B",
  "version": "3.3",
  "parameters": "7B",
  "context_length": 32768,
  "capabilities": [
    "text-generation",
    "summarization",
    "translation"
  ],
  "performance": {
    "tokens_per_second": 100,
    "memory_required": "8GB"
  }
}
```
### Update Settings

```http
POST /api/v1/config
Content-Type: application/json
Authorization: Bearer <api_token>

{
  "model_settings": {
    "default_model": "llama3.3",
    "max_tokens": 2000,
    "temperature": 0.7
  },
  "system_settings": {
    "cache_size": "10GB",
    "max_requests_per_minute": 60
  }
}
```
### Get Settings

```http
GET /api/v1/config
Authorization: Bearer <api_token>
```
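
Response (representative example; the exact shape is an assumption based on the settings schema accepted by the update endpoint above):

```json
{
  "model_settings": {
    "default_model": "llama3.3",
    "max_tokens": 2000,
    "temperature": 0.7
  },
  "system_settings": {
    "cache_size": "10GB",
    "max_requests_per_minute": 60
  }
}
```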
### Python Client

```python
import asyncio

from llamahome.client import LlamaClient


class CustomClient:
    def __init__(self, api_key: str):
        self.client = LlamaClient(api_key=api_key)

    async def process_with_retry(
        self,
        prompt: str,
        max_retries: int = 3
    ) -> str:
        """Process prompt with retry logic."""
        for attempt in range(max_retries):
            try:
                response = await self.client.process(prompt)
                return response.text
            except Exception:
                if attempt == max_retries - 1:
                    raise
                # Exponential backoff before the next attempt.
                await asyncio.sleep(2 ** attempt)
```
### JavaScript Client

```javascript
class LlamaClient {
  constructor(apiKey) {
    this.apiKey = apiKey;
    this.baseUrl = 'https://api.llamahome.ai';
  }

  async processPrompt(prompt, options = {}) {
    const response = await fetch(`${this.baseUrl}/api/v1/process`, {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${this.apiKey}`,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({ prompt, ...options })
    });
    return response.json();
  }
}
```
### FastAPI Server

```python
from fastapi import FastAPI, Depends

from llamahome.server import LlamaServer

app = FastAPI()
llama = LlamaServer()


@app.post("/process")
async def process_prompt(
    prompt: str,
    current_user=Depends(get_current_user)  # your authentication dependency
):
    return await llama.process(prompt)
```
### Express Server

```javascript
const express = require('express');
const { LlamaServer } = require('llamahome');

const app = express();
const llama = new LlamaServer();

// Parse JSON request bodies so req.body.prompt is available.
app.use(express.json());

app.post('/process', async (req, res) => {
  const result = await llama.process(req.body.prompt);
  res.json(result);
});
```
### Token Generation

```python
from llamahome.auth import TokenGenerator

generator = TokenGenerator(secret_key="your-secret")
token = generator.create_token(
    user_id="user123",
    expires_in=3600
)
```
### Token Validation

```python
from llamahome.auth import TokenValidator

validator = TokenValidator(secret_key="your-secret")
is_valid = validator.validate_token(token)
```
### Basic Rate Limiting

```python
from llamahome.security import RateLimiter

limiter = RateLimiter(
    requests_per_minute=60,
    burst_size=10
)
```
### Advanced Rate Limiting

```python
from llamahome.security import AdvancedRateLimiter

limiter = AdvancedRateLimiter(
    tiers={
        "basic": {"rpm": 60, "burst": 10},
        "pro": {"rpm": 300, "burst": 50},
        "enterprise": {"rpm": 1000, "burst": 100}
    }
)
```
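
The `rpm`/`burst` pairs describe token-bucket semantics: a sustained rate of `rpm` requests per minute with short bursts of up to `burst` requests. Since the limiter's enforcement interface is not shown in this document, here is a self-contained token-bucket sketch of the same idea (illustrative only, not the `llamahome.security` implementation):

```python
# Self-contained token-bucket sketch illustrating the rpm/burst semantics above.
import time


class TokenBucket:
    def __init__(self, rpm: int, burst: int):
        self.rate = rpm / 60.0          # tokens added per second
        self.capacity = burst           # maximum burst size
        self.tokens = float(burst)
        self.last_refill = time.monotonic()

    def allow(self) -> bool:
        """Return True if a request may proceed, consuming one token."""
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_refill) * self.rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


basic = TokenBucket(rpm=60, burst=10)
print(basic.allow())  # True; the first `burst` calls in quick succession are allowed
```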
### Basic Metrics

```python
from llamahome.monitoring import MetricsCollector

collector = MetricsCollector()
collector.record_request(
    endpoint="/api/v1/process",
    duration=0.123,
    status=200
)
```
### Advanced Metrics

```python
from llamahome.monitoring import AdvancedMetrics

metrics = AdvancedMetrics(
    enable_tracing=True,
    detailed_logging=True
)
```
### Response Time Tracking

```python
from llamahome.monitoring import PerformanceMonitor

monitor = PerformanceMonitor()

with monitor.track_operation("process_prompt"):
    result = await process_prompt()
```
### Resource Usage Tracking

```python
from llamahome.monitoring import ResourceMonitor

monitor = ResourceMonitor()
monitor.track_resources(
    interval=60,
    metrics=["cpu", "memory", "gpu"]
)
```
### API Errors

```python
class APIError(Exception):
    def __init__(self, message: str, code: int):
        super().__init__(message)
        self.message = message
        self.code = code


class RateLimitError(APIError):
    pass


class AuthenticationError(APIError):
    pass
```
### Error Responses

```json
{
  "error": {
    "code": "rate_limit_exceeded",
    "message": "Too many requests",
    "details": {
      "retry_after": 60
    }
  }
}
```
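
Clients should inspect this structure and, for rate-limit errors, wait for the advertised `retry_after` before retrying. A minimal sketch that handles the error body shown above, given a `requests` response object:

```python
# Handle the structured error body, honoring retry_after for rate-limit errors.
import time

import requests


def handle_error(resp: requests.Response) -> None:
    error = resp.json().get("error", {})
    if error.get("code") == "rate_limit_exceeded":
        # Back off for the server-suggested interval, defaulting to 60 seconds.
        time.sleep(error.get("details", {}).get("retry_after", 60))
    else:
        resp.raise_for_status()
```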
### Retry Logic

```python
from llamahome.error import RetryHandler

handler = RetryHandler(
    max_retries=3,
    backoff_factor=2
)
```
### Circuit Breaker

```python
from llamahome.error import CircuitBreaker

breaker = CircuitBreaker(
    failure_threshold=5,
    reset_timeout=300
)
```
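
These two parameters map onto the standard circuit-breaker pattern: the circuit opens after `failure_threshold` consecutive failures and permits a trial call once `reset_timeout` seconds have elapsed. A self-contained sketch of that pattern (illustrative only, not the `llamahome.error` implementation):

```python
# Minimal circuit-breaker sketch: trips open after repeated failures,
# then allows a trial call once the reset timeout has elapsed.
import time


class SimpleCircuitBreaker:
    def __init__(self, failure_threshold: int = 5, reset_timeout: float = 300.0):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None  # timestamp when the circuit tripped open

    def call(self, func, *args, **kwargs):
        """Invoke func, rejecting calls while the circuit is open."""
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_timeout:
                raise RuntimeError("circuit open; request rejected")
            self.opened_at = None  # half-open: allow one trial call
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.monotonic()
            raise
        else:
            self.failures = 0
            return result
```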
### Request Optimization

```python
# Good: batch related requests in a single call
results = await client.process_batch([
    "Query 1",
    "Query 2",
    "Query 3"
])

# Bad: multiple individual requests
result1 = await client.process("Query 1")
result2 = await client.process("Query 2")
result3 = await client.process("Query 3")
```
### Resource Management

```python
# Good: use a context manager to manage the session
async with client.session() as session:
    result = await session.process(prompt)

# Bad: manual resource management
session = await client.create_session()
result = await session.process(prompt)
await session.close()
```
### Connection Pooling

```python
from llamahome.client import PooledClient

client = PooledClient(
    pool_size=10,
    max_retries=3
)
```
### Response Streaming

```python
async for chunk in client.stream_response(
    prompt,
    chunk_size=1000
):
    process_chunk(chunk)
```