AnDB: AI-Native Database

AnDB (AI-Native DataBase) is an experimental database designed to bridge the gap between structured and unstructured data by leveraging cutting-edge AI technologies. It supports traditional relational database operations while enabling AI-driven tasks through intuitive SQL-like statements. AnDB is built to handle semantic queries, automate query optimization, and provide seamless integration of AI models, making it a powerful tool for universal semantic analysis.

Key Features

AI-Native Design: AnDB integrates AI technologies, such as Large Language Models (only DeepSeek is supported), to enable semantic queries and automate complex tasks like schema inference, semantic joins, and clustering.
Unified Data Analysis: Supports both structured (relational) and unstructured (text, images, etc.) data, allowing users to perform unified semantic analysis across diverse data types.
SQL-Like Interface: Users can execute AI-driven tasks using intuitive SQL-like statements without requiring deep AI expertise.
Cost-Aware Optimization: AnDB’s query optimizer balances accuracy, execution time, and financial cost, generating multiple execution plans and selecting the optimal one.
Multiple Storage Backends: Supports various storage engines and data types (relational, time-series, vector).
DB4AI Integration: Seamlessly integrates with machine learning libraries for AI-driven analytics.
Experimental Prototype: Currently implemented in Python for research and experimentation.
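
The cost-aware optimization described above can be illustrated with a small sketch. This is not AnDB's optimizer: the `Plan` class, the `choose_plan` function, the weights, and all metric values are hypothetical, and a simple weighted score stands in for whatever cost model AnDB actually uses. It only shows the general idea of generating several candidate plans and picking one that trades off accuracy, execution time, and financial cost.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    """A hypothetical candidate execution plan with estimated metrics."""
    name: str
    accuracy: float  # estimated result quality in [0, 1]; higher is better
    seconds: float   # estimated execution time; lower is better
    dollars: float   # estimated financial cost, e.g. LLM calls; lower is better


def _normalize(value: float, low: float, high: float) -> float:
    """Scale value into [0, 1]; return 0.0 when all candidates are equal."""
    return 0.0 if high == low else (value - low) / (high - low)


def choose_plan(plans, w_accuracy=0.5, w_time=0.25, w_cost=0.25):
    """Return the plan with the best weighted score (illustrative only)."""
    if not plans:
        raise ValueError("at least one candidate plan is required")
    t_lo, t_hi = min(p.seconds for p in plans), max(p.seconds for p in plans)
    c_lo, c_hi = min(p.dollars for p in plans), max(p.dollars for p in plans)

    def score(p: Plan) -> float:
        # Reward accuracy, penalize normalized time and cost.
        return (w_accuracy * p.accuracy
                - w_time * _normalize(p.seconds, t_lo, t_hi)
                - w_cost * _normalize(p.dollars, c_lo, c_hi))

    return max(plans, key=score)


# Made-up candidates for one semantic query; the numbers are placeholders.
candidates = [
    Plan("llm_per_row", accuracy=0.95, seconds=120.0, dollars=4.00),
    Plan("embed_then_llm", accuracy=0.90, seconds=30.0, dollars=0.50),
    Plan("keyword_only", accuracy=0.60, seconds=2.0, dollars=0.00),
]
best = choose_plan(candidates)
print(best.name)
```

Changing the weights shifts the choice: putting all weight on accuracy selects the most expensive plan, while putting all weight on cost selects the cheapest one.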

Getting Started

Prerequisites

Python 3.13 or higher.
Dependencies: Install required libraries using pip install -r requirements.txt.

Installation

Clone the repository:

git clone https://github.com/wotchin/AnDB.git
cd AnDB

Install dependencies:
```
pip install -r requirements.txt
```

Run the AnDB server, or start the local client:

python andb_server.py         # server speaking a naive PostgreSQL wire protocol
python tools/local_client.py  # local client, used like SQLite

Example Queries

Simple Semantic Query:

SELECT PROMPT("Analyze technical areas and count publications per area")
  FROM FILE("neurips_2024.txt"); -- RAG-like query
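
For readers unfamiliar with the term, "RAG-like" generally means that relevant parts of the file are retrieved first and then handed to the language model together with the prompt. The sketch below illustrates that general pattern only. How AnDB chunks, retrieves, and prompts internally is not described here, so every function name is hypothetical and plain word overlap stands in for a real retriever such as embedding similarity.

```python
import re


def split_chunks(text: str, size: int = 40) -> list[str]:
    """Split text into chunks of at most `size` whitespace-separated words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def _tokens(text: str) -> set[str]:
    """Lowercased alphanumeric tokens, used for the toy overlap score."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks sharing the most distinct words with the question."""
    q = _tokens(question)
    ranked = sorted(chunks, key=lambda c: len(q & _tokens(c)), reverse=True)
    return ranked[:k]


def build_llm_input(question: str, chunks: list[str]) -> str:
    """Assemble the text that would be sent to the language model."""
    context = "\n".join(f"- {c}" for c in chunks)
    return f"Context:\n{context}\n\nTask: {question}"
```

In this picture, the string passed to PROMPT plays the role of `question` and the contents of the file play the role of the text being chunked.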

Schema Definition:

SELECT SEM_CLUSTER(title, PROMPT('Area of publication of the paper'), 5) AS area, COUNT(title) 
  FROM TABULAR(PROMPT('Authors of the paper') AS author text, 
    PROMPT('Title of the paper') AS title text FROM File('neurips_2024.txt')) neurips2024 
  GROUP BY area;
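
Conceptually, the query above has three stages: TABULAR extracts rows with `author` and `title` columns from the file, SEM_CLUSTER assigns each title to an area, and GROUP BY counts the titles per area. The Python sketch below mirrors those stages on toy data to make the data flow concrete. It is not AnDB code: the rows are placeholders rather than real NeurIPS 2024 papers, and a fixed keyword table (a hypothetical `AREA_KEYWORDS`) replaces the LLM-driven clustering.

```python
from collections import Counter

# Rows as the TABULAR(...) stage might produce them. Placeholder data only.
rows = [
    {"author": "Author A", "title": "Faster attention for language models"},
    {"author": "Author B", "title": "Reward shaping in reinforcement learning"},
    {"author": "Author C", "title": "Scaling language models with sparse experts"},
]

# Stand-in for SEM_CLUSTER: keyword matching instead of semantic clustering.
AREA_KEYWORDS = {
    "language models": ("language model",),
    "reinforcement learning": ("reinforcement", "reward"),
}


def assign_area(title: str) -> str:
    """Map a title to the first area whose keywords it contains."""
    lowered = title.lower()
    for area, keywords in AREA_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return area
    return "other"


def count_by_area(table) -> Counter:
    """Counterpart of GROUP BY area with COUNT(title)."""
    return Counter(assign_area(row["title"]) for row in table)


print(count_by_area(rows))
```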

Contributing

We welcome contributions! If you’re interested in improving AnDB, please follow these steps:

Fork the repository.
Create a new branch for your feature or bugfix.
Submit a pull request with a detailed description of your changes.
Most of AnDB's functionality is a work in progress and is still being polished, so code contributions are very welcome!

License

AnDB is released under the Apache-2.0 license.

Citation

@article{wang2025andb,
  title={AnDB: Breaking Boundaries with an AI-Native Database for Universal Semantic Analysis},
  author={Wang, Tianqing and Xue, Xun and Li, Guoliang and Wang, Yong},
  journal={arXiv preprint arXiv:2502.13805},
  year={2025}
}
