DocumentAgent (Phase 1) #438

Closed
marklysze opened this issue Jan 10, 2025 · 2 comments
Labels
agents:docagent Document Agent enhancement New feature or request RAG roadmap

Comments

@marklysze
Collaborator

marklysze commented Jan 10, 2025

--FEEDBACK WELCOME--

DocumentAgent will be a document-based agent, able to ingest documents/sources of information and have that knowledge accessible to achieve its given task.

Examples of use-cases:

  • Document classification
  • Document/Page summarisation
  • Question Answering
  • Identify missing information
  • Invoice handling

The objective for this Phase is to provide a quick-start agent that developers can incorporate easily.

DocumentAgent will include RAG capabilities and will be built progressively. This Phase 1 implementation contains basic RAG capabilities, such as ingesting documents and embedding them into a vector database. Future implementations will include more advanced RAG capabilities and engines, as well as additional capabilities for document transformation.

Capabilities include:

  • Input: Read one or more TXT, CSV, PDF, HTML, Markdown, PPTX, JSON
  • Extract and store data, including into an intermediate format (such as Docling's DoclingDocument format)
  • Developer-determined handling (put in prompt, use vector database, use third-party query engine)
  • Query data, including support for 3rd party querying
  • Support for Structured Outputs to control output format
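As a rough illustration of the input capability above, the sketch below classifies a source string as a URL or as one of the listed file types. `SourceType` and `detect_source_type` are hypothetical names invented for this example and are not part of the proposed API; matching on file extension is an assumption, not a stated design decision.

```python
from enum import Enum
from pathlib import PurePosixPath
from urllib.parse import urlparse


class SourceType(Enum):
    """Hypothetical source categories mirroring the input formats listed above."""
    TXT = "txt"
    CSV = "csv"
    PDF = "pdf"
    HTML = "html"
    MARKDOWN = "md"
    PPTX = "pptx"
    JSON = "json"
    URL = "url"


# Assumed mapping from file extension to source type.
_EXTENSIONS = {
    ".txt": SourceType.TXT,
    ".csv": SourceType.CSV,
    ".pdf": SourceType.PDF,
    ".html": SourceType.HTML,
    ".htm": SourceType.HTML,
    ".md": SourceType.MARKDOWN,
    ".markdown": SourceType.MARKDOWN,
    ".pptx": SourceType.PPTX,
    ".json": SourceType.JSON,
}


def detect_source_type(source: str) -> SourceType:
    """Classify a source as a URL or as a file type based on its extension."""
    if urlparse(source).scheme in ("http", "https"):
        return SourceType.URL
    suffix = PurePosixPath(source).suffix.lower()
    try:
        return _EXTENSIONS[suffix]
    except KeyError:
        raise ValueError(f"Unsupported source: {source!r}") from None


if __name__ == "__main__":
    for src in ["my_file.txt", "https://my.url.com", "slides.pptx"]:
        print(src, "->", detect_source_type(src).name)
```

A dispatcher like this would let a single `sources` argument accept mixed files and URLs, with each type routed to its own loader.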

Example code (not final API):

# Most basic
my_document_agent = DocumentAgent(
    name="docagent",
    llm_config=...,
    sources="my_file.txt")

# Multiple sources, supporting different types
my_document_agent = DocumentAgent(
    name="docagent",
    llm_config=...,
    sources=[my_file_name_with_path, "https://my.url.com"])

# Storage and Retrieval from a Vector database
my_document_agent = DocumentAgent(
    name="docagent",
    llm_config=...,
    sources=[my_file_name_with_path, my_file_name_with_path],
    handling_config=DocumentHandlingConfig(document_types=[DocType.Text, DocType.XLSX], storage=DocumentStore.Weaviate, settings={...}))

# 3rd-party query engine (or this could be an agent built on DocumentAgent, e.g. DocumentAgentAgentQL)
my_document_agent = DocumentAgent(
    name="docagent",
    llm_config=None,
    sources="https://my.url.com",
    handling_config=None,
    query_config=DocumentQueryConfig(document_types=[DocType.URL], provider=DocQueryProvider.AgentQL, settings={...}))

Internal agent workflow:

  1. Load/convert the document through the handling configuration (defaulted for ease of use)
  2. Use the query configuration to respond to queries (e.g. inject the full source into the system message, query a vector store and inject the results into the system message, or run an external provider)
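The two steps above could be sketched roughly as follows. This is only an illustration under stated assumptions: `SketchDocumentAgent`, its fixed-size chunking, and the word-overlap ranking (a stand-in for a real vector store query) are all hypothetical and not the actual DocumentAgent design.

```python
from dataclasses import dataclass, field


@dataclass
class SketchDocumentAgent:
    """Hypothetical two-step agent: ingest a source, then build a system message."""
    base_system_message: str = "Answer using only the provided context."
    chunk_size: int = 200
    chunks: list[str] = field(default_factory=list)

    def ingest(self, text: str) -> None:
        """Step 1: split the already-converted text into fixed-size chunks."""
        for start in range(0, len(text), self.chunk_size):
            self.chunks.append(text[start:start + self.chunk_size])

    def _retrieve(self, query: str, top_k: int) -> list[str]:
        """Rank chunks by how many query words they contain (not a real vector search)."""
        words = set(query.lower().split())
        ranked = sorted(
            self.chunks,
            key=lambda chunk: -sum(word in chunk.lower() for word in words),
        )
        return ranked[:top_k]

    def build_system_message(self, query: str, strategy: str = "full", top_k: int = 1) -> str:
        """Step 2: inject the full source or only the retrieved chunks."""
        if strategy == "full":
            context = "".join(self.chunks)
        elif strategy == "retrieve":
            context = "\n".join(self._retrieve(query, top_k))
        else:
            raise ValueError(f"Unknown strategy: {strategy!r}")
        return f"{self.base_system_message}\n\nContext:\n{context}"


if __name__ == "__main__":
    agent = SketchDocumentAgent(chunk_size=20)
    agent.ingest("Invoices are due in 30 days. Refunds take a week.")
    print(agent.build_system_message("refunds", strategy="retrieve"))
```

Keeping ingestion and query handling as separate steps is what would allow the handling and query configurations to vary independently.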

Notes:

  • The use of a common intermediate format may be important, such as using Docling for document parsing and its DoclingDocument format for local storage. This could provide a good basis for standardised tools for this agent.
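To make the idea of a common intermediate format concrete, here is a minimal sketch of a JSON-serialisable document record. `IntermediateDocument` and its fields are hypothetical and deliberately simple; this is not Docling's DoclingDocument schema, only an illustration of the round trip that parsers and downstream tools would share.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class IntermediateDocument:
    """Hypothetical parser-independent representation of an ingested source."""
    source: str
    doc_type: str
    sections: list[dict] = field(default_factory=list)

    def add_section(self, heading: str, text: str) -> None:
        """Append one section as a plain dict so the record stays JSON-friendly."""
        self.sections.append({"heading": heading, "text": text})

    def to_json(self) -> str:
        """Serialise the document for local storage."""
        return json.dumps(asdict(self), indent=2)

    @classmethod
    def from_json(cls, payload: str) -> "IntermediateDocument":
        """Rebuild the document from its stored JSON form."""
        return cls(**json.loads(payload))


if __name__ == "__main__":
    doc = IntermediateDocument(source="my_file.txt", doc_type="txt")
    doc.add_section("Introduction", "DocumentAgent ingests documents.")
    restored = IntermediateDocument.from_json(doc.to_json())
    print(restored == doc)
```

Any parsing framework could then be integrated by writing a converter into this one shape, rather than teaching every tool about every parser.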

Deliverables:

  • DocumentAgent code
  • Documentation
  • Blog
  • Notebook
  • Video script
@marklysze marklysze added this to ag2 Jan 10, 2025
@marklysze marklysze converted this from a draft issue Jan 10, 2025
@marklysze marklysze added enhancement New feature or request RAG labels Jan 10, 2025
@marklysze marklysze moved this to Todo in ag2 Jan 10, 2025
@AgentGenie
Collaborator

  1. Use JSON for the intermediate data format and use a converter to integrate with parsing frameworks.
  2. This is going to replace RetrieveUserProxyAgent.
  3. Clear interfaces for both user-facing and internal use, e.g. a structured user config class, and internal interfaces such as a query tool to integrate with 3rd party query frameworks.
  4. Refactor the Vector DB interface into a query engine.
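One way to read points 3 and 4 is as a small query engine interface that the agent depends on, with vector databases or third-party frameworks plugged in behind it. The sketch below is an assumption about what that could look like: `QueryEngine`, `KeywordQueryEngine`, and `answer` are hypothetical names, and the keyword matcher merely stands in for a real vector store.

```python
from typing import Protocol


class QueryEngine(Protocol):
    """Hypothetical internal interface that any query backend would satisfy."""

    def add_documents(self, texts: list[str]) -> None: ...

    def query(self, question: str) -> str: ...


class KeywordQueryEngine:
    """In-memory stand-in for a vector database, using word overlap."""

    def __init__(self) -> None:
        self._texts: list[str] = []

    def add_documents(self, texts: list[str]) -> None:
        self._texts.extend(texts)

    def query(self, question: str) -> str:
        """Return the stored text that shares the most words with the question."""
        if not self._texts:
            return ""
        words = set(question.lower().split())
        return max(
            self._texts,
            key=lambda text: sum(word in text.lower() for word in words),
        )


def answer(engine: QueryEngine, question: str) -> str:
    """Agent-side code depends only on the interface, not on a specific backend."""
    return engine.query(question)


if __name__ == "__main__":
    engine = KeywordQueryEngine()
    engine.add_documents([
        "Paris is the capital of France.",
        "Berlin is the capital of Germany.",
    ])
    print(answer(engine, "capital of germany"))
```

Because `QueryEngine` is a structural `Protocol`, a backend would not need to inherit from it; it only needs matching method signatures.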

@marklysze
Collaborator Author

Thanks everyone! First phase in 0.7.5!

@github-project-automation github-project-automation bot moved this from In Progress to Done in ag2 Feb 20, 2025