COMandA is a command-line tool that enables the composition of Large Language Model (LLM) operations using YAML-based workflows. It simplifies the process of creating and managing agentic workflows.
Think of each step in a YAML file as a Lego block: you can chain these blocks together to build more complex structures that solve real problems.
Create YAML workflow 'recipes' and use `comanda process` to execute the recipe file.
COMandA lets you pick the best provider and model for each step and compose workflows that combine the strengths of different LLMs. It supports multiple LLM providers (Anthropic, Deepseek, Google, local models via Ollama, OpenAI, and X.AI) and chains models together by passing the output of one step to the input of the next.
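For example, a minimal two-step recipe might look like the sketch below (a hypothetical summarize-then-critique chain; the file and model names are illustrative and must match models you have configured):

# recipe.yaml (hypothetical sketch)
step_one:
  input: document.txt
  model: gpt-4o-mini
  action: summarize this document in three bullet points
  output: STDOUT
step_two:
  input: STDIN # consumes the output of step_one
  model: claude-3-5-sonnet-latest
  action: critique this summary for accuracy and tone
  output: STDOUT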
- 🔗 Chain multiple LLM operations together using simple YAML configuration
- 🤖 Support for multiple LLM providers (OpenAI, Anthropic, Google, X.AI, Ollama)
- 📄 File-based operations and transformations
- 🖼️ Support for image analysis with vision models (screenshots and common image formats)
- 🌐 Direct URL input support for web content analysis
- 🕷️ Advanced web scraping capabilities with configurable options
- 🛠️ Extensible YAML configuration for defining workflows
- ⚡ Efficient processing of LLM chains
- 🚀 Parallel processing of independent steps for improved performance
- 🔒 HTTP server mode: use it as a multi-LLM workflow wrapper
- 🔐 Secure configuration encryption for protecting API keys and secrets
- 📁 Multi-file input support with content consolidation
- 📝 Markdown file support for reusable actions (prompts)
- 🗄️ Database integration for read/write operations for inputs and outputs
The easiest way to get started is to download a pre-built binary from the GitHub Releases page. Binaries are available for:
- Windows (386, amd64)
- macOS (amd64, arm64)
- Linux (386, amd64, arm64)
Download the appropriate binary for your system, extract it if needed, and place it somewhere in your system's PATH.
Alternatively, install with Go:
go install github.com/kris-hansen/comanda@latest
Or build from source:
git clone https://github.com/kris-hansen/comanda.git
cd comanda
go build
COMandA uses an environment file to store provider configurations and API keys. By default, it looks for a `.env` file in the current directory. You can specify a custom path using the `COMANDA_ENV` environment variable:
# Use a specific env file
export COMANDA_ENV=/path/to/your/env/file
comanda process your-workflow-file.yaml
# Or specify it inline
COMANDA_ENV=/path/to/your/env/file comanda process your-workflow-file.yaml
COMandA supports encrypting your configuration file to protect sensitive information like API keys. The encryption uses AES-256-GCM with password-derived keys, providing strong security against unauthorized access.
To encrypt your configuration:
comanda configure --encrypt
You'll be prompted to enter and confirm an encryption password. Once encrypted, all commands that need to access the configuration (process, server, configure) will prompt for the password.
Example workflow:
# First, configure your providers and API keys
comanda configure
# Then encrypt the configuration
comanda configure --encrypt
Enter encryption password: ********
Confirm encryption password: ********
Configuration encrypted successfully!
# When running commands, you'll be prompted for the password
comanda process your-workflow-file.yaml
Enter decryption password: ********
The encryption system provides:
- AES-256-GCM encryption (industry standard)
- Password-based key derivation
- Protection against tampering
- Brute-force resistance
You can still view your configuration using:
comanda configure --list
This will prompt for the password if the configuration is encrypted.
Configure your providers and models using the interactive configuration command:
comanda configure
This will prompt you to:
- Select a provider (OpenAI/Anthropic/Google/X.AI/Ollama)
- Enter API key (for OpenAI/Anthropic/Google/X.AI)
- Specify model name
- Select model mode:
  - text: For text-only operations
  - vision: For image analysis capabilities
  - multi: For both text and image operations
You can view your current configuration using:
comanda configure --list
Server Configuration:
  Port: 8080
  Data Directory: data
  Authentication Enabled: true
Configured Providers:
  anthropic:
    - claude-3-5-latest (external)
  google:
    - gemini-pro (external)
  ollama:
    - llama3.2 (local)
  openai:
    - gpt-4o-mini (external)
    - gpt-4o (external)
  xai:
    - grok-beta (external)
To remove a model from the configuration:
comanda configure --remove <model-name>
To update an API key for a provider (e.g., after key rotation):
comanda configure --update-key=<provider-name>
This will prompt you for the new API key and update it in the configuration. For example:
comanda configure --update-key=openai
Enter new API key: sk-...
Successfully updated API key for provider 'openai'
When configuring a model that already exists, you'll be prompted to update its mode. This allows you to change a model's capabilities without removing and re-adding it.
Example configuration output:
Configuration from .env:
Server Configuration:
  Port: 8088
  Data Directory: data
  Authentication Enabled: true
  Bearer Token: <redacted>
Configured Providers:
  ollama:
    - llama2:latest (local)
      Modes: text
  openai:
    - gpt-4-turbo-preview (external)
      Modes: text, vision, multi, file
    - gpt-4-vision-preview (external)
      Modes: vision
    - gpt-4o (external)
      Modes: text, vision, multi, file
    - gpt-4o-mini (external)
      Modes: text, vision, multi, file
    - o1-mini (external)
      Modes: text
    - o1-preview (external)
      Modes: text
  xai:
    - grok-beta (external)
      Modes: text, file
    - grok-vision-beta (external)
      Modes: vision
  anthropic:
    - claude-3-5-sonnet-20241022 (external)
      Modes: text, vision, multi, file
    - claude-3-5-sonnet-latest (external)
      Modes: text, vision, multi, file
    - claude-3-5-haiku-latest (external)
      Modes: text, vision, multi, file
  deepseek:
    - deepseek-chat (external)
      Modes: text, vision, multi, file
  google:
    - gemini-1.5-flash (external)
      Modes: text, vision, multi, file
    - gemini-1.5-flash-8b (external)
      Modes: text, vision, multi, file
    - gemini-1.5-pro (external)
      Modes: text, vision, multi, file
    - gemini-2.0-flash-exp (external)
      Modes: text, vision, multi, file
    - gemini-2.0-flash-001 (external)
      Modes: text, vision, multi, file
    - gemini-2.0-pro-exp-02-05 (external)
      Modes: text, vision, multi, file
    - gemini-2.0-flash-lite-preview-02-05 (external)
      Modes: text, vision, multi, file
    - gemini-2.0-flash-thinking-exp-01-21 (external)
      Modes: text, vision, multi, file
COMandA can run as an HTTP server, allowing you to process chains of models and actions defined in YAML files via HTTP requests. The server is managed using the `server` command:
# Start the server
comanda server
# Configure server settings
comanda server configure # Interactive configuration
comanda server show # Show current configuration
comanda server port 8080 # Set server port
comanda server datadir ./data # Set data directory
comanda server auth on # Enable authentication
comanda server auth off # Disable authentication
comanda server newtoken # Generate new bearer token
comanda server cors # Configure CORS settings
The server provides several configuration commands:
- `configure`: Interactive configuration for all server settings including port, data directory, authentication, and CORS
- `show`: Display current server configuration including CORS settings
- `port`: Set the server port
- `datadir`: Set the data directory for YAML files
- `auth`: Enable/disable authentication
- `newtoken`: Generate a new bearer token
- `cors`: Configure CORS settings interactively
The CORS configuration allows you to:
- Enable/disable CORS headers
- Set allowed origins (use * for all, or specify domains)
- Configure allowed HTTP methods
- Set allowed headers
- Define max age for preflight requests
The server configuration is stored in your `.env` file alongside provider and model settings:
server:
  port: 8080
  data_dir: "examples" # Directory containing YAML files to process
  bearer_token: "your-generated-token"
  enabled: true # Whether authentication is required
  cors:
    enabled: true # Enable/disable CORS
    allowed_origins: ["*"] # List of allowed origins, ["*"] for all
    allowed_methods: ["GET", "POST", "PUT", "DELETE", "OPTIONS"] # List of allowed HTTP methods
    allowed_headers: ["Authorization", "Content-Type"] # List of allowed headers
    max_age: 3600 # Max age for preflight requests in seconds
The CORS configuration allows you to control Cross-Origin Resource Sharing settings:
- `enabled`: Enable or disable CORS headers (default: true)
- `allowed_origins`: List of origins allowed to access the API. Use `["*"]` to allow all origins, or specify domains like `["https://example.com"]`
- `allowed_methods`: List of HTTP methods allowed for cross-origin requests
- `allowed_headers`: List of headers allowed in requests
- `max_age`: How long browsers should cache preflight request results
To start the server:
comanda server
The server provides the following endpoints:
# Get file content as plain text
curl -H "Authorization: Bearer your-token" \
-H "Accept: text/plain" \
"http://localhost:8080/files/content?path=example.txt"
# Download binary file
curl -H "Authorization: Bearer your-token" \
-H "Accept: application/octet-stream" \
"http://localhost:8080/files/download?path=example.pdf" \
--output downloaded_file.pdf
# Upload a file
curl -X POST \
-H "Authorization: Bearer your-token" \
-F "file=@/path/to/local/file.txt" \
-F "path=destination/file.txt" \
"http://localhost:8080/files/upload"
Using JavaScript:
// Get file content
async function getFileContent(path) {
  const response = await fetch(`http://localhost:8080/files/content?path=${encodeURIComponent(path)}`, {
    headers: {
      'Authorization': 'Bearer your-token',
      'Accept': 'text/plain'
    }
  });
  return await response.text();
}

// Download file
async function downloadFile(path) {
  const response = await fetch(`http://localhost:8080/files/download?path=${encodeURIComponent(path)}`, {
    headers: {
      'Authorization': 'Bearer your-token',
      'Accept': 'application/octet-stream'
    }
  });
  const blob = await response.blob();
  // Create download link
  const url = window.URL.createObjectURL(blob);
  const a = document.createElement('a');
  a.href = url;
  a.download = path.split('/').pop(); // Use filename from path
  document.body.appendChild(a);
  a.click();
  window.URL.revokeObjectURL(url);
  document.body.removeChild(a);
}

// Upload file
async function uploadFile(file, path) {
  const formData = new FormData();
  formData.append('file', file);
  formData.append('path', path);
  const response = await fetch('http://localhost:8080/files/upload', {
    method: 'POST',
    headers: {
      'Authorization': 'Bearer your-token'
    },
    body: formData
  });
  return await response.json();
}
`GET /process` processes a YAML file from the configured data directory. For YAML files that use STDIN as their first input, `POST /process` is also supported. Both endpoints support real-time output streaming using Server-Sent Events.
# Regular processing (JSON response)
curl "http://localhost:8080/process?filename=openai-example.yaml"
# Streaming processing (Server-Sent Events)
curl -H "Accept: text/event-stream" \
"http://localhost:8080/process?filename=openai-example.yaml&streaming=true"
# With authentication (when enabled)
curl -H "Authorization: Bearer your-token" \
-H "Accept: text/event-stream" \
"http://localhost:8080/process?filename=openai-example.yaml&streaming=true"
You can provide input either through a query parameter or JSON body:
# Regular processing with query parameter
curl -X POST "http://localhost:8080/process?filename=stdin-example.yaml&input=your text here"
# Regular processing with JSON body
curl -X POST \
-H "Content-Type: application/json" \
-d '{"input":"your text here", "streaming": false}' \
"http://localhost:8080/process?filename=stdin-example.yaml"
# Streaming processing with JSON body
curl -X POST \
-H "Content-Type: application/json" \
-H "Accept: text/event-stream" \
-d '{"input":"your text here", "streaming": true}' \
"http://localhost:8080/process?filename=stdin-example.yaml"
Note: POST requests are only allowed for YAML files where the first step uses "STDIN" as input. The /list endpoint shows which methods (GET or GET,POST) are supported for each YAML file.
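For reference, a workflow qualifies for POST when its first step reads from STDIN. A hedged sketch of what such a stdin-example.yaml might contain (the model name is illustrative):

# stdin-example.yaml (sketch)
step_one:
  input: STDIN # first-step STDIN input is what enables POST
  model: gpt-4o-mini
  action: summarize the provided text in one paragraph
  output: STDOUT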
Response format (non-streaming):
{
  "success": true,
  "message": "Successfully processed openai-example.yaml",
  "output": "Response from gpt-4o-mini:\n..."
}
Response format (streaming):
data: Processing step 1...
data: Model response: ...
data: Processing step 2...
data: Processing complete
Error response (non-streaming):
{
  "success": false,
  "error": "Error message here",
  "output": "Any output generated before the error"
}
Using JavaScript:
// Regular processing
async function processFile(filename, input = null) {
  const url = `http://localhost:8080/process?filename=${encodeURIComponent(filename)}`;
  const options = {
    method: input ? 'POST' : 'GET',
    headers: {
      'Authorization': 'Bearer your-token',
      'Content-Type': 'application/json'
    }
  };
  if (input) {
    options.body = JSON.stringify({ input, streaming: false });
  }
  const response = await fetch(url, options);
  return await response.json();
}

// Streaming processing
async function processFileStreaming(filename, input = null) {
  const url = `http://localhost:8080/process?filename=${encodeURIComponent(filename)}`;
  const options = {
    method: input ? 'POST' : 'GET',
    headers: {
      'Authorization': 'Bearer your-token',
      'Content-Type': 'application/json',
      'Accept': 'text/event-stream'
    }
  };
  if (input) {
    options.body = JSON.stringify({ input, streaming: true });
  }
  const response = await fetch(url, options);
  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  while (true) {
    const { value, done } = await reader.read();
    if (done) break;
    const text = decoder.decode(value);
    // Handle each SSE message
    console.log(text);
  }
}
`GET /list` returns a list of YAML files in the configured data directory, along with their supported HTTP methods:
curl -H "Authorization: Bearer your-token" "http://localhost:8080/list"
Response format:
{
  "success": true,
  "files": [
    {
      "name": "openai-example.yaml",
      "methods": "GET"
    },
    {
      "name": "stdin-example.yaml",
      "methods": "GET,POST"
    }
  ]
}
The `methods` field indicates which HTTP methods are supported:
- `GET`: The YAML file can be processed normally
- `GET,POST`: The YAML file accepts STDIN string input via a POST request
`GET /health` returns the server's current status:
curl -H "Authorization: Bearer your-token" "http://localhost:8080/health"
Response format:
{
  "success": true,
  "message": "Server is healthy",
  "statusCode": 200,
  "response": "OK"
}
`POST /yaml/upload` uploads a YAML file for processing:
curl -X POST \
-H "Authorization: Bearer your-token" \
-H "Content-Type: application/json" \
-d '{"content": "your yaml content here"}' \
"http://localhost:8080/yaml/upload"
Response format:
{
  "success": true,
  "message": "YAML file uploaded successfully"
}
`POST /yaml/process` processes a YAML file with optional real-time output streaming:
# Regular processing (JSON response)
curl -X POST \
-H "Authorization: Bearer your-token" \
-H "Content-Type: application/json" \
-d '{"content": "your yaml content here", "streaming": false}' \
"http://localhost:8080/yaml/process"
# Streaming processing (Server-Sent Events)
curl -X POST \
-H "Authorization: Bearer your-token" \
-H "Content-Type: application/json" \
-H "Accept: text/event-stream" \
-d '{"content": "your yaml content here", "streaming": true}' \
"http://localhost:8080/yaml/process"
Response format (non-streaming):
{
  "success": true,
  "yaml": "processed yaml content"
}
Response format (streaming):
data: Processing step 1...
data: Model response: ...
data: Processing step 2...
data: Processing complete
Using JavaScript:
// Upload YAML
async function uploadYaml(content) {
  const response = await fetch('http://localhost:8080/yaml/upload', {
    method: 'POST',
    headers: {
      'Authorization': 'Bearer your-token',
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({ content })
  });
  return await response.json();
}

// Process YAML (non-streaming)
async function processYaml(content) {
  const response = await fetch('http://localhost:8080/yaml/process', {
    method: 'POST',
    headers: {
      'Authorization': 'Bearer your-token',
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({ content, streaming: false })
  });
  return await response.json();
}

// Process YAML (streaming)
async function processYamlStreaming(content) {
  const response = await fetch('http://localhost:8080/yaml/process', {
    method: 'POST',
    headers: {
      'Authorization': 'Bearer your-token',
      'Content-Type': 'application/json',
      'Accept': 'text/event-stream'
    },
    body: JSON.stringify({ content, streaming: true })
  });
  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  while (true) {
    const { value, done } = await reader.read();
    if (done) break;
    const text = decoder.decode(value);
    // Handle each SSE message
    console.log(text);
  }
}
The server logs all requests to the console, including:
- Timestamp
- Request method and path
- Query parameters
- Authorization header (token masked)
- Response status code
- Request duration
Example server log:
2024/11/02 21:06:33 Request: method=GET path=/health query= auth=Bearer ******** status=200 duration=875µs
2024/11/02 21:06:37 Request: method=GET path=/list query= auth=Bearer ******** status=200 duration=812.208µs
2024/11/02 21:06:45 Request: method=GET path=/process query=filename=examples/openai-example.yaml auth=Bearer ******** status=200 duration=3.360269792s
COMandA supports various file types for input:
- Text files: `.txt`, `.md`, `.yml`, `.yaml`
- Image files: `.png`, `.jpg`, `.jpeg`, `.gif`, `.bmp`
- Web content: direct URLs to web pages, JSON APIs, or other web resources
- Special inputs: `screenshot` (captures the current screen)
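Multi-file input (content consolidation) works by listing several files under a step's input; their contents are combined before being passed to the model. A minimal sketch, assuming the filenames below exist:

# multi-file-example.yaml (sketch; filenames are illustrative)
combine:
  input:
    - notes.txt
    - report.md
  model: gpt-4o
  action: merge these documents into a single executive summary
  output: STDOUT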
When using vision-capable models (like gpt-4o), you can analyze both images and screenshots alongside text content.
Images are automatically optimized for processing:
- Large images are automatically resized to a maximum dimension of 1024px while preserving aspect ratio
- PNG compression is applied to reduce token usage while maintaining quality
- These optimizations help prevent rate limit errors and ensure efficient processing
The screenshot feature allows you to capture the current screen state for analysis. When you specify `screenshot` as the input in your workflow file, COMandA will automatically capture the entire screen and pass it to the specified model for analysis. This is particularly useful for UI analysis, bug reports, or any scenario where you need to analyze the current screen state.
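A minimal sketch of a screenshot step (the step name and prompt are illustrative; any vision-capable model should work):

# screenshot-example.yaml (sketch)
analyze_screen:
  input: screenshot # captures the entire current screen
  model: gpt-4o
  action: describe this screen and call out any visible error dialogs
  output: STDOUT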
For URL inputs, COMandA automatically:
- Detects and validates URLs in input fields
- Fetches content with appropriate error handling
- Handles different content types (HTML, JSON, plain text)
- Stores content in temporary files with appropriate extensions
- Cleans up temporary files after processing
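Because of this handling, a step can take a URL directly as its input. A hedged sketch (the URL is illustrative):

# url-example.yaml (sketch)
summarize_page:
  input: https://example.com/article # fetched and cleaned up automatically
  model: gpt-4o-mini
  action: summarize the key points of this page
  output: STDOUT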
Create a YAML file defining your chain of operations:
# example.yaml
summarize:
  model: "gpt-4"
  provider: "openai"
  input:
    file: "input.txt"
  prompt: "Summarize the following content:"
  output:
    file: "summary.txt"

analyze:
  model: "claude-2"
  provider: "anthropic"
  input:
    file: "summary.txt"
  prompt: "Analyze the key points in this summary:"
  output:
    file: "analysis.txt"
For image analysis:
# image-analysis.yaml
analyze:
  input: "image.png" # Can be any supported image format
  model: "gpt-4o"
  action: "Analyze this image and describe what you see in detail."
  output: "STDOUT"
COMandA supports parallel processing of independent steps to improve performance. This is particularly useful for tasks that don't depend on one another, such as:
- Running the same prompt against multiple models for comparison
- Processing multiple files independently
- Performing different analyses on the same input
To use parallel processing, define steps under a `parallel-process` block in your YAML file:
# parallel-model-comparison.yaml
parallel-process:
  gpt4o_step:
    input:
      - NA
    model: gpt-4o
    action:
      - write a short story about a robot that discovers it has emotions
    output:
      - examples/parallel-processing/gpt4o-story.txt
  claude_step:
    input:
      - NA
    model: claude-3-5-sonnet-latest
    action:
      - write a short story about a robot that discovers it has emotions
    output:
      - examples/parallel-processing/claude-story.txt
compare_step:
  input:
    - examples/parallel-processing/gpt4o-story.txt
    - examples/parallel-processing/claude-story.txt
  model: gpt-4o
  action:
    - compare these two short stories about robots discovering emotions
    - which one is more creative and has better narrative structure?
  output:
    - STDOUT
In this example:
- The `gpt4o_step` and `claude_step` steps run in parallel
- The `compare_step` runs after both parallel steps complete, as it depends on their outputs
The system automatically validates dependencies between steps to ensure:
- No circular dependencies exist
- Steps that depend on outputs from other steps run after those steps complete
- Parallel steps are truly independent of each other
Parallel processing leverages Go's concurrency features (goroutines and channels) for efficient execution.
Run your YAML workflow file:
comanda process your-workflow-file.yaml
For example:
Processing Workflow file: examples/openai-example.yaml
Configuration:
Step: step_one
- Input: [examples/example_filename.txt]
- Model: [gpt-4o-mini]
- Action: [look through these company names and identify the top five which seem most likely in the HVAC business]
- Output: [STDOUT]
Step: step_two
- Input: [STDIN]
- Model: [gpt-4o]
- Action: [for each of these company names provide a snappy tagline that would make them stand out]
- Output: [STDOUT]
Response from gpt-4o-mini:
Based on the company names provided, the following five seem most likely to be in the HVAC (Heating, Ventilation, and Air Conditioning) business:
1. **Evergreen Industries** - The name suggests a focus on sustainability, which is often associated with HVAC systems that promote energy efficiency.
2. **Mountain Peak Investments** - While not directly indicative of HVAC, the name suggests a focus on construction or infrastructure, which often involves HVAC installations.
3. **Cascade Technologies** - The term "cascade" could relate to water systems or cooling technologies, which are relevant in HVAC.
4. **Summit Technologies** - Similar to Mountain Peak, "Summit" may imply involvement in high-quality or advanced systems, possibly including HVAC solutions.
5. **Zenith Industries** - The term "zenith" suggests reaching the highest point, which can be associated with premium or top-tier HVAC products or services.
These names suggest a connection to industries related to heating, cooling, or building systems, which are integral to HVAC.
Response from gpt-4o:
Certainly! Here are some snappy taglines for each of the company names that could help them stand out in the HVAC industry:
1. **Evergreen Industries**: "Sustainability in Every Breath."
2. **Mountain Peak Investments**: "Building Comfort from the Ground Up."
3. **Cascade Technologies**: "Cooling Solutions That Flow."
4. **Summit Technologies**: "Reaching New Heights in HVAC Innovation."
5. **Zenith Industries**: "At the Pinnacle of Climate Control Excellence."
COMandA supports database operations as inputs and outputs in YAML workflows. Currently, PostgreSQL is supported.
Before using database operations, configure your database connection:
comanda configure --database
This will prompt for:
- Database configuration name (used in YAML files)
- Database type (postgres)
- Host, port, username, password, database name
Reading from a database:
input:
  database: mydb # Database configuration name
  sql: SELECT * FROM customers LIMIT 5 # Must be a SELECT statement
Writing to a database:
output:
  database: mydb
  sql: INSERT INTO customers (first_name, last_name, email) VALUES ('John', 'Doe', '[email protected]')
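Putting these together, a hedged end-to-end sketch that reads rows, asks a model to analyze them, and prints the result (the table and column names are illustrative):

# database-example.yaml (sketch)
analyze_customers:
  input:
    database: mydb
    sql: SELECT first_name, last_name, email FROM customers LIMIT 5
  model: gpt-4o-mini
  action: summarize any notable patterns in these customer records
  output: STDOUT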
Examples can be found in the `examples/` directory. See examples/README.md for a guide to each one.
comanda/
├── cmd/             # Command line interface
├── utils/
│   ├── config/      # Configuration handling
│   ├── input/       # Input validation and processing
│   ├── models/      # LLM provider implementations
│   ├── scraper/     # Web scraping functionality
│   └── processor/   # DSL processing logic
├── go.mod
├── go.sum
└── main.go
The following features are being considered:
- More providers:
  - Hugging Face inference API?
  - Image generation providers?
  - Others?
- URL output support (post data to a URL):
  - Need to add credential support
  - Need to solve for local secrets encryption
- Branching and basic if/or logic
- Routing logic, i.e., use this model if the output is x and that model if y
Contributions are welcome! Here's how you can help:
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
Please ensure your PR:
- Includes tests for new functionality
- Updates documentation as needed
- Follows the existing code style
- Includes a clear description of the changes
This project is licensed under the MIT License - see the LICENSE file for details.
If you use COMandA in your research or academic work, please cite it as follows:
@software{comanda2024,
  author = {Hansen, Kris},
  title = {COMandA: Chain of Models and Actions},
  year = {2024},
  publisher = {GitHub},
  url = {https://github.com/kris-hansen/comanda},
  description = {A command-line tool for composing Large Language Model operations using YAML-based workflows}
}
- OpenAI and Anthropic for their LLM APIs
- The Ollama project for local LLM support
- The Go community for excellent libraries and tools
- The Colly framework for web scraping capabilities