A powerful framework for migrating Excel data using configurable rules, multimodal analysis, and LLM integration. This framework allows you to define complex migration rules, learn from examples, and leverage visual analysis of Excel sheets.
- ๐ฏ Task-centric approach for Excel migrations
- ๐ค Support for multiple LLM providers through LangChain
- ๐๏ธ Multimodal analysis capabilities:
- ๐ Direct Excel file processing
- ๐ธ Screenshot analysis and data extraction
- ๐ Visual structure recognition
- ๐ OCR for text extraction
- ๐ง Rule generation from example files
- ๐ ๏ธ Flexible rule types:
- ๐ Direct copy
- ๐ Value transformation
- ๐งฎ Computed fields
- ๐ Aggregations
- โ Validation rules
- ๐ Plugin-based rule execution:
- ๐งฉ Extensible formula executors
- ๐ Custom transformations
- ๐จ Modular design
- ๐ก๏ธ SOLID principles
- ๐ค LLM-powered transformations
- โ๏ธ Configurable via JSON rules
- ๐ Comprehensive logging with loguru
- ๐๏ธ SOLID principles and clean architecture
# Using pip
pip install excel-migration-framework
# Using poetry
poetry add excel-migration-framework
# Simple migration with rules
excel-migrate source.xlsx target.xlsx --rules rules.json
# Process specific sheets
excel-migrate source.xlsx target.xlsx \
--source-sheets "Sheet1" "Sheet2" \
--target-sheets "Output1" "Output2"
# Generate rules from example files with sheet selection
excel-migrate source.xlsx target.xlsx \
--example-source example_source.xlsx \
--example-target example_target.xlsx \
--example-source-sheets "Template1" \
--example-target-sheets "Result1"
# Include screenshots with sheet mapping
excel-migrate source.xlsx target.xlsx \
--screenshots sheet1.png sheet2.png \
--screenshot-sheet-mapping "sheet1.png:Sheet1" "sheet2.png:Sheet2"
from excel_migration.tasks.base import MigrationTask
from excel_migration.core.processor import TaskBasedProcessor
from pathlib import Path
# Create a migration task with sheet selection
task = MigrationTask(
source_file=Path("source.xlsx"),
target_file=Path("target.xlsx"),
task_type="migrate",
description="Migrate customer data",
context={
"sheet_mapping": {
"CustomerData": "Processed_Customers",
"Transactions": "Processed_Transactions"
}
},
screenshots=[Path("sheet1.png")]
)
# Process the task
processor = TaskBasedProcessor(...)
success = await processor.process(task)
The framework uses a flexible plugin system for formula execution and value transformations, following SOLID principles:
from excel_migration.plugins.interfaces import FormulaExecutor
from typing import Any, Dict
class CustomFormulaExecutor(FormulaExecutor):
"""Custom formula executor plugin."""
formula_type = "CUSTOM"
def can_execute(self, formula: str) -> bool:
"""Check if this executor can handle the formula."""
return formula.startswith("CUSTOM(")
def execute(self, formula: str, values: Dict[str, Any]) -> Any:
"""Execute the custom formula."""
# Implement custom formula logic
pass
# Register the plugin
registry = PluginRegistry()
registry.register_formula_executor(CustomFormulaExecutor())
from excel_migration.plugins.interfaces import TransformationHandler
from typing import Any, Dict
class CustomTransformer(TransformationHandler):
"""Custom transformation plugin."""
transformation_type = "custom_format"
def can_transform(self, transformation: Dict[str, Any]) -> bool:
"""Check if this handler can process the transformation."""
return transformation.get("type") == self.transformation_type
def transform(self, value: Any, params: Dict[str, Any]) -> Any:
"""Transform the value according to parameters."""
# Implement custom transformation logic
pass
# Register the plugin
registry.register_transformation_handler(CustomTransformer())
The framework includes several built-in plugins:
- ๐
DateDiffExecutor
: Calculate date differences - ๐ข
CountExecutor
: Count values or records - ๐ฏ
CountIfExecutor
: Conditional counting - โ
SumExecutor
: Sum numeric values - ๐
AverageExecutor
: Calculate averages
- ๐
DateTimeTransformer
: Format dates and times - ๐ข
NumericTransformer
: Format numbers - โ
BooleanTransformer
: Convert to boolean values - ๐ค
ConcatenateTransformer
: Join multiple values
Migrates data from source to target Excel files.
excel-migrate source.xlsx target.xlsx \
--task-type migrate \
--source-sheets "Data" \
--target-sheets "Processed"
Analyzes Excel files and provides insights.
excel-migrate source.xlsx target.xlsx \
--task-type analyze \
--source-sheets "Financial" "Metrics"
Validates data against rules.
excel-migrate source.xlsx target.xlsx \
--task-type validate \
--source-sheets "Input" \
--rules validation_rules.json
The framework can analyze Excel sheets through multiple approaches:
-
๐ Direct File Analysis
- ๐ Structure analysis
- ๐ Formula parsing
- ๐ท๏ธ Data type detection
-
๐๏ธ Visual Analysis (from screenshots)
- ๐ Table structure detection
- ๐ฒ Cell boundary recognition
- ๐ Text extraction (OCR)
- ๐จ Layout analysis
-
๐ง LLM Integration
- ๐ญ Natural language understanding
- ๐ Complex pattern recognition
- ๐ Context-aware transformations
Rules can be generated automatically by analyzing example files:
# Generate rules from specific sheets in examples
excel-migrate source.xlsx target.xlsx \
--example-source example_source.xlsx \
--example-target example_target.xlsx \
--example-source-sheets "Template" \
--example-target-sheets "Final" \
--output-rules rules.json
The framework will:
- ๐ Analyze source and target examples
- ๐งฎ Identify patterns and transformations
- โจ Generate appropriate rules
- ๐พ Save rules for future use
# Use OpenAI
excel-migrate source.xlsx target.xlsx \
--llm-provider openai \
--model gpt-4
# Use Anthropic
excel-migrate source.xlsx target.xlsx \
--llm-provider anthropic \
--model claude-2
# Set log level
excel-migrate source.xlsx target.xlsx --log-level DEBUG
# Log to file
excel-migrate source.xlsx target.xlsx --log-file migration.log
Create custom rule types by implementing the Rule interface:
from excel_migration.core.interfaces import Rule
class CustomRule(Rule):
async def apply(self, data: Any, context: Dict[str, Any]) -> Any:
# Implement custom logic
pass
Subscribe to migration events:
from excel_migration.core.interfaces import EventEmitter
def on_cell_processed(data: Dict[str, Any]):
print(f"Processed cell: {data}")
emitter = EventEmitter()
emitter.on("cell_processed", on_cell_processed)
Enable caching for better performance:
excel-migrate source.xlsx target.xlsx --cache-dir ./cache
Contributions are welcome! Please feel free to submit a Pull Request.
# Clone repository
git clone https://github.com/yourusername/excel-migration-framework.git
# Install dependencies
poetry install
# Run tests
poetry run pytest
This project is licensed under the MIT License - see the LICENSE file for details.