This repository contains custom pipelines for Open WebUI, designed to extend its capabilities and unlock new, powerful use cases.

This pipeline extracts text from images and generates a response based on the recognized text. It uses separate models and prompts for vision-to-text and text-to-text generation.
- **Image Text Recognition**: Uses a vision model to accurately recognize visible text within images. Supports multiple image inputs and analyzes the images attached to the last user message.
- **Text Analysis**: Passes the recognized text to a general-purpose language model for detailed analysis and response generation based on the provided prompt or user query.
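The two-stage flow can be sketched roughly as follows. This is a minimal illustration, not the pipeline's actual code: it assumes Ollama's `/api/chat` endpoint, and the function names (`build_chat_payload`, `chat`, `run_pipeline`) and prompt strings are invented for the example.

```python
import json
import urllib.request

OLLAMA_BASE_URL = "http://OPEN_WEBUI_HOST/ollama"  # placeholder host from this README


def build_chat_payload(model: str, prompt: str, images=None) -> dict:
    """Build a single-turn request body for Ollama's /api/chat endpoint."""
    message = {"role": "user", "content": prompt}
    if images:
        message["images"] = images  # base64-encoded images, per the Ollama chat API
    return {"model": model, "messages": [message], "stream": False}


def chat(model: str, prompt: str, images=None) -> str:
    """Send the request and return the assistant message content."""
    req = urllib.request.Request(
        f"{OLLAMA_BASE_URL}/api/chat",
        data=json.dumps(build_chat_payload(model, prompt, images)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["message"]["content"]


def run_pipeline(images_b64, user_query: str) -> str:
    # Stage 1: the vision model extracts visible text from the images.
    extracted = chat("minicpm-v:latest",
                     "Extract all visible text from the image(s).",
                     images=images_b64)
    # Stage 2: the general-purpose model responds based on the recognized text.
    return chat("llama3.1:latest",
                f"{user_query}\n\nRecognized text:\n{extracted}")
```

Stage 2 receives only the extracted text, not the original images, which is why the vision prompt and the general-purpose prompt are configured independently.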
- `OLLAMA_BASE_URL`: URL pointing to your Ollama server API or Open WebUI proxy endpoint. Default: `http://OPEN_WEBUI_HOST/ollama`
- `VISION_MODEL_ID`: ID of the vision model used for text extraction. Default: `minicpm-v:latest`
- `VISION_PROMPT`: The prompt provided to the vision model for extracting text.
- `GENERAL_PURPOSE_MODEL_ID`: ID of the general-purpose language model used for analysis. Default: `llama3.1:latest`
- `GENERAL_PURPOSE_PROMPT`: The default prompt provided to the general-purpose model.
- `USER_PROMPT_TO_USE_DEFAULT_PROMPT`: Marker that, when sent as the user message, makes the pipeline fall back to the default prompt for the general-purpose model. Default: `_`

For more details on these parameters, see the `Valve` class within the code.
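For illustration, the parameters above might be modeled as follows. This is a sketch using a plain dataclass; the names and defaults are taken from this README, the prompt defaults marked as assumptions are invented, and the real `Valve` class in the pipeline source may be structured differently.

```python
import os
from dataclasses import dataclass


@dataclass
class Valve:
    """Illustrative holder for the valve parameters listed above."""

    # Placeholder host from this README; override via environment or assignment.
    OLLAMA_BASE_URL: str = os.getenv("OLLAMA_BASE_URL", "http://OPEN_WEBUI_HOST/ollama")
    VISION_MODEL_ID: str = "minicpm-v:latest"
    VISION_PROMPT: str = "Extract all visible text from the image."  # assumed default
    GENERAL_PURPOSE_MODEL_ID: str = "llama3.1:latest"
    GENERAL_PURPOSE_PROMPT: str = "Analyze the recognized text."  # assumed default
    USER_PROMPT_TO_USE_DEFAULT_PROMPT: str = "_"

    def resolve_prompt(self, user_message: str) -> str:
        """Fall back to the default prompt when the user sends only the marker."""
        if user_message.strip() == self.USER_PROMPT_TO_USE_DEFAULT_PROMPT:
            return self.GENERAL_PURPOSE_PROMPT
        return user_message
```

With these defaults, sending just `_` as the message would make the general-purpose model run with `GENERAL_PURPOSE_PROMPT`, while any other message is used as the prompt directly.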
- **Answering Questions from Images**: Extract text from images containing questions and provide answers based on the image content.
- **Analyzing Text-Based Images**: Process images with textual information (e.g., signs, documents) to derive meaning and extract insights.
- **Document Summarization**: Summarize the main points of documents captured as images, turning them into concise text.
Contributions to enhance this pipeline are welcome! Feel free to submit pull requests with suggestions and improvements.