…4059) * Fixes #3971: Check how to integrate vector databases via rest APIs * fixed CI errors and removed unused imports * Changes review: added weaviate db, removed vector idx autocreation and vector as a default result * code clean * Changes review: added systemdb store, removed constraint creation * code clean * 2nd changes review * fixed qdrant filename typo and removed info procs from docs
Showing 31 changed files with 4,081 additions and 2 deletions.
157 changes: 157 additions & 0 deletions
docs/asciidoc/modules/ROOT/pages/database-integration/vectordb/chroma.adoc
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,157 @@ | ||
|
||
== ChromaDB | ||
|
||
Here is a list of all available ChromaDB procedures. | ||
Note that the list and the procedure signatures are consistent with the others, such as the Qdrant ones: | ||
|
||
[opts=header, cols="1, 3"] | ||
|=== | ||
| name | description | ||
| apoc.vectordb.chroma.createCollection(hostOrKey, collection, similarity, size, $config) | | ||
Creates a collection, with the name specified in the 2nd parameter, and with the specified `similarity` and `size`. | ||
The default endpoint is `<hostOrKey param>/api/v1/collections`. | ||
| apoc.vectordb.chroma.deleteCollection(hostOrKey, collection, $config) | | ||
Deletes a collection with the name specified in the 2nd parameter. | ||
The default endpoint is `<hostOrKey param>/api/v1/collections/<collection param>`. | ||
| apoc.vectordb.chroma.upsert(hostOrKey, collection, vectors, $config) | | ||
Upserts, in the collection with the name specified in the 2nd parameter, the vectors [{id: 'id', vector: '<vectorDb>', metadata: '<metadata>'}]. | ||
The default endpoint is `<hostOrKey param>/api/v1/collections/<collection param>/upsert`. | ||
| apoc.vectordb.chroma.delete(hostOrKey, collection, ids, $config) | | ||
Deletes the vectors with the specified `ids`. | ||
The default endpoint is `<hostOrKey param>/api/v1/collections/<collection param>/delete`. | ||
| apoc.vectordb.chroma.get(hostOrKey, collection, ids, $config) | | ||
Gets the vectors with the specified `ids`. | ||
The default endpoint is `<hostOrKey param>/api/v1/collections/<collection param>/get`. | ||
| apoc.vectordb.chroma.query(hostOrKey, collection, vector, filter, limit, $config) | | ||
Retrieves the vectors closest to the defined `vector`, returning at most `limit` results, from the collection with the name specified in the 2nd parameter. | ||
The default endpoint is `<hostOrKey param>/api/v1/collections/<collection param>/query`. | ||
| apoc.vectordb.chroma.getAndUpdate(hostOrKey, collection, ids, $config) | | ||
Gets the vectors with the specified `ids`, and optionally creates/updates Neo4j entities. | ||
The default endpoint is `<hostOrKey param>/api/v1/collections/<collection param>/get`. | ||
| apoc.vectordb.chroma.queryAndUpdate(hostOrKey, collection, vector, filter, limit, $config) | | ||
Retrieves the vectors closest to the defined `vector`, returning at most `limit` results, from the collection with the name specified in the 2nd parameter, and optionally creates/updates Neo4j entities. | ||
The default endpoint is `<hostOrKey param>/api/v1/collections/<collection param>/query`. | ||
|=== | ||
|
||
where the 1st parameter can be a key defined by the apoc config `apoc.chroma.<key>.host=myHost`. | ||
With hostOrKey=null, the default is 'http://localhost:8000'. | ||
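
The way the default endpoints in the table are composed can be sketched in Python. This is only an illustration of the rules stated above, not APOC's implementation: the function names, the `apoc_config` dictionary and the `chroma.example` host are hypothetical, and the rule used to tell a key from a literal host (a lookup of `apoc.chroma.<key>.host`) is an assumption.

```python
from typing import Mapping, Optional

# Default stated above for hostOrKey=null.
DEFAULT_HOST = "http://localhost:8000"


def resolve_host(host_or_key: Optional[str], apoc_config: Mapping[str, str]) -> str:
    """Return the base URL for a hostOrKey value.

    Assumption: the value is treated as a key when `apoc.chroma.<value>.host`
    exists in the configuration, and as a literal host otherwise.
    """
    if host_or_key is None:
        return DEFAULT_HOST
    return apoc_config.get(f"apoc.chroma.{host_or_key}.host", host_or_key)


def default_endpoint(host: str,
                     collection: Optional[str] = None,
                     action: Optional[str] = None) -> str:
    """Compose `<host>/api/v1/collections[/<collection>[/<action>]]`."""
    parts = [host.rstrip("/"), "api/v1/collections"]
    if collection is not None:
        parts.append(collection)
        if action is not None:
            parts.append(action)
    return "/".join(parts)


# Hypothetical configuration entry, for illustration only.
config = {"apoc.chroma.myKey.host": "http://chroma.example:8000"}
print(default_endpoint(resolve_host(None, config), "test_collection", "query"))
print(default_endpoint(resolve_host("myKey", config), "test_collection", "upsert"))
```

The `endpoint` config parameter, when provided, would replace the composed URL entirely.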
|
||
=== Examples | ||
|
||
.Create a collection (it leverages https://docs.trychroma.com/usage-guide#creating-inspecting-and-deleting-collections[this API]) | ||
[source,cypher] | ||
---- | ||
CALL apoc.vectordb.chroma.createCollection($host, 'test_collection', 'Cosine', 4, {<optional config>}) | ||
---- | ||
|
||
|
||
.Delete a collection (it leverages https://docs.trychroma.com/usage-guide#creating-inspecting-and-deleting-collections[this API]) | ||
[source,cypher] | ||
---- | ||
CALL apoc.vectordb.chroma.deleteCollection($host, '<collection_id>', {<optional config>}) | ||
---- | ||
|
||
|
||
.Upsert vectors (it leverages https://docs.trychroma.com/usage-guide#adding-data-to-a-collection[this API]) | ||
[source,cypher] | ||
---- | ||
CALL apoc.vectordb.chroma.upsert($host, '<collection_id>', | ||
[ | ||
{id: 1, vector: [0.05, 0.61, 0.76, 0.74], metadata: {city: "Berlin", foo: "one"}, text: 'ajeje'}, | ||
{id: 2, vector: [0.19, 0.81, 0.75, 0.11], metadata: {city: "London", foo: "two"}, text: 'brazorf'} | ||
], | ||
{<optional config>}) | ||
---- | ||
|
||
|
||
.Get vectors (it leverages https://docs.trychroma.com/usage-guide#querying-a-collection[this API]) | ||
[source,cypher] | ||
---- | ||
CALL apoc.vectordb.chroma.get($host, '<collection_id>', ['1','2'], {<optional config>}) | ||
---- | ||
|
||
|
||
.Example results | ||
[opts="header"] | ||
|=== | ||
| score | metadata | id | vector | text | entity | ||
| null | {city: "Berlin", foo: "one"} | null | null | null | null | ||
| null | {city: "London", foo: "two"} | null | null | null | null | ||
| ... | ||
|=== | ||
|
||
|
||
.Get vectors with `{allResults: true}` | ||
[source,cypher] | ||
---- | ||
CALL apoc.vectordb.chroma.get($host, '<collection_id>', ['1','2'], {allResults: true, <optional config>}) | ||
---- | ||
|
||
|
||
.Example results | ||
[opts="header"] | ||
|=== | ||
| score | metadata | id | vector | text | entity | ||
| null | {city: "Berlin", foo: "one"} | 1 | [...] | ajeje | null | ||
| null | {city: "London", foo: "two"} | 2 | [...] | brazorf | null | ||
| ... | ||
|=== | ||
|
||
|
||
.Query vectors (it leverages https://docs.trychroma.com/usage-guide#querying-a-collection[this API]) | ||
[source,cypher] | ||
---- | ||
CALL apoc.vectordb.chroma.query($host, | ||
'<collection_id>', | ||
[0.2, 0.1, 0.9, 0.7], | ||
{city: 'London'}, | ||
5, | ||
{allResults: true, <optional config>}) | ||
---- | ||
|
||
|
||
.Example results | ||
[opts="header"] | ||
|=== | ||
| score | metadata | id | vector | text | ||
| 1 | {city: "Berlin", foo: "one"} | 1 | [...] | ajeje | ||
| 0.1 | {city: "London", foo: "two"} | 2 | [...] | brazorf | ||
| ... | ||
|=== | ||
|
||
|
||
[NOTE] | ||
==== | ||
To optimize performance, we can choose what to `YIELD` with the `apoc.vectordb.chroma.query` and the `apoc.vectordb.chroma.get` procedures. | ||
For example, by executing a `CALL apoc.vectordb.chroma.query(...) YIELD metadata, score, id`, the REST API request will have an {"include": ["metadatas", "documents", "distances"]}, | ||
so that we do not return the other values that we do not need. | ||
==== | ||
|
||
|
||
In the same way as other procedures, we can define a mapping, to fetch the associated nodes and relationships and optionally create them, | ||
by leveraging the vector metadata. For example: | ||
|
||
.Query vectors | ||
[source,cypher] | ||
---- | ||
CALL apoc.vectordb.chroma.query($host, '<collection_id>', | ||
[0.2, 0.1, 0.9, 0.7], | ||
{}, | ||
5, | ||
{ mapping: { | ||
embeddingKey: "vect", | ||
nodeLabel: "Test", | ||
entityKey: "myId", | ||
metadataKey: "foo" | ||
} | ||
}) | ||
---- | ||
|
||
|
||
|
||
.Delete vectors (it leverages https://docs.trychroma.com/usage-guide#deleting-data-from-a-collection[this API]) | ||
[source,cypher] | ||
---- | ||
CALL apoc.vectordb.chroma.delete($host, '<collection_id>', [1,2], {<optional config>}) | ||
---- | ||
|
99 changes: 99 additions & 0 deletions
docs/asciidoc/modules/ROOT/pages/database-integration/vectordb/custom.adoc
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,99 @@ | ||
|
||
== Custom (i.e. other vector databases) | ||
|
||
We can also interface with other vector databases that do not (yet) have dedicated procedures. | ||
For example, with https://docs.pinecone.io/guides/getting-started/overview[Pinecone], as we will see later. | ||
|
||
Here is a list of all available custom procedures: | ||
|
||
[opts=header, cols="1, 3"] | ||
|=== | ||
| name | description | ||
| apoc.vectordb.custom.get(host, $embeddingConfig) | Customizable get / query procedure, | ||
returning a result like the other `apoc.vectordb.*.get` procedures | ||
| apoc.vectordb.custom(host, $config) | Fully customizable procedure, returns generic object results. | ||
|=== | ||
|
||
|
||
=== Examples | ||
|
||
|
||
The `apoc.vectordb.custom.get` procedure can be used with every API that returns something like this | ||
(note that the call does not need to return all keys): | ||
|
||
``` | ||
[ | ||
{ | ||
"<idKey>": "value", | ||
"<scoreKey>": scoreValue, | ||
"<vectorKey>": [ ... ], | ||
"<metadataKey>": { .. }, | ||
"<textKey>": "..." | ||
}, | ||
{ | ||
... | ||
} | ||
] | ||
``` | ||
|
||
where we can customize idKey, scoreKey, vectorKey, metadataKey and textKey via the config parameters of the same name. | ||
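
As a rough illustration of this key mapping, the following Python sketch normalizes a REST response into the `id` / `score` / `vector` / `metadata` / `text` columns. It is not APOC's implementation: the `normalize` function is hypothetical, the `"id"` default for `idKey` is an assumption (the other defaults are the documented ones), and `jsonPath` is simplified to a single top-level key rather than a real JSONPath expression.

```python
from typing import Any, Dict, List, Optional

# Defaults for vector/metadata/score/text are the documented ones;
# the "id" default is an assumption made for this sketch.
DEFAULT_KEYS = {
    "idKey": "id",
    "scoreKey": "score",
    "vectorKey": "vector",
    "metadataKey": "metadata",
    "textKey": "text",
}


def normalize(response: Any, config: Optional[Dict[str, Any]] = None) -> List[Dict[str, Any]]:
    """Map each item of a REST response onto the common result columns."""
    config = config or {}
    # Config entries override the default key names.
    keys = {**DEFAULT_KEYS, **{k: v for k, v in config.items() if k in DEFAULT_KEYS}}
    # Simplification: jsonPath is treated as one top-level key, not real JSONPath.
    json_path = config.get("jsonPath")
    items = response[json_path] if json_path else response
    return [
        {
            "id": item.get(keys["idKey"]),
            "score": item.get(keys["scoreKey"]),
            "vector": item.get(keys["vectorKey"]),
            "metadata": item.get(keys["metadataKey"]),
            "text": item.get(keys["textKey"]),  # None when the API returns no text
        }
        for item in items
    ]


# Hypothetical response shaped like a "matches" list with vectors under "values".
response = {"matches": [{"id": "1", "score": 1.0, "values": [1, 2, 3, 4], "metadata": {"a": 1}}]}
print(normalize(response, {"jsonPath": "matches", "vectorKey": "values"}))
```

Keys missing from the response simply produce null (`None`) columns, which matches the remark that the call does not need to return all keys.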
|
||
|
||
Let's look at some examples using https://docs.pinecone.io/guides/getting-started/overview[Pinecone]. | ||
|
||
|
||
.apoc.vectordb.custom.get example | ||
[source,cypher] | ||
---- | ||
// $namespace, $vector and $apiKey are assumed to be query parameters | ||
CALL apoc.vectordb.custom.get('https://<INDEX-ID>.svc.gcp-starter.pinecone.io/query', { | ||
body: { | ||
namespace: $namespace, | ||
vector: $vector, | ||
topK: 3, | ||
includeValues: true, | ||
includeMetadata: true | ||
}, | ||
headers: {`Api-Key`: $apiKey}, | ||
method: null, | ||
jsonPath: "matches", | ||
// the REST API returns the vectors under the `values` key | ||
vectorKey: 'values' | ||
}) | ||
---- | ||
|
||
|
||
.Example results | ||
[opts="header"] | ||
|=== | ||
| score | metadata | id | vector | text | ||
| 1 | {a: 1} | 1 | [1,2,3,4] | null | ||
| 0.1 | {a: 2} | 2 | [1,2,3,4] | null | ||
| ... | ||
|=== | ||
|
||
|
||
|
||
.apoc.vectordb.custom example | ||
[source,cypher] | ||
---- | ||
// $namespace, $vector and $apiKey are assumed to be query parameters | ||
CALL apoc.vectordb.custom('https://<INDEX-ID>.svc.gcp-starter.pinecone.io/query', { | ||
body: { | ||
namespace: $namespace, | ||
vector: $vector, | ||
topK: 3, | ||
includeValues: true, | ||
includeMetadata: true | ||
}, | ||
headers: {`Api-Key`: $apiKey}, | ||
method: null, | ||
jsonPath: "matches" | ||
}) | ||
---- | ||
|
||
|
||
.Example results | ||
[opts="header"] | ||
|=== | ||
| value | ||
| {score: <score>, metadata: <metadata>, id: <id>, vector: <vector>} | ||
| {score: <score>, metadata: <metadata>, id: <id>, vector: <vector>} | ||
| ... | ||
|=== |
108 changes: 108 additions & 0 deletions
docs/asciidoc/modules/ROOT/pages/database-integration/vectordb/index.adoc
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,108 @@ | ||
[[vectordb]] | ||
= Vector Databases | ||
:description: This section describes procedures that can be used to interact with Vector Databases. | ||
|
||
APOC provides the following set of procedures, which leverage REST APIs to interact with vector databases: | ||
|
||
- `apoc.vectordb.qdrant.*` (to interact with https://qdrant.tech/documentation/overview/[Qdrant]) | ||
- `apoc.vectordb.chroma.*` (to interact with https://docs.trychroma.com/getting-started[Chroma]) | ||
- `apoc.vectordb.weaviate.*` (to interact with https://weaviate.io/developers/weaviate[Weaviate]) | ||
- `apoc.vectordb.custom.*` (to interact with other vector databases). | ||
- `apoc.vectordb.configure` (to store host, credentials and mapping into the system database) | ||
|
||
All the procedures, except the `apoc.vectordb.configure` one, can have, as a final parameter, | ||
a configuration map with these optional parameters: | ||
|
||
.config parameters | ||
|
||
|=== | ||
| key | description | ||
| headers | additional HTTP headers | ||
| method | HTTP method | ||
| endpoint | endpoint key, | ||
can be used to override the default endpoint created via the 1st parameter of the procedures, | ||
to handle potential endpoint changes. | ||
| body | HTTP request body | ||
| jsonPath | To customize https://github.com/json-path/JsonPath[JSONPath] parsing of the response. The default is `null`. | ||
|=== | ||
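
To make the role of these parameters concrete, here is a minimal Python sketch that turns such a config map into an HTTP request object without sending it. The `build_request` helper is hypothetical and only mirrors the table above; in particular, the JSON content type and the fallback method (POST when a body is present, GET otherwise, which is `urllib`'s own behaviour) are assumptions, not documented APOC defaults.

```python
import json
import urllib.request
from typing import Any, Dict, Optional


def build_request(default_endpoint: str,
                  config: Optional[Dict[str, Any]] = None) -> urllib.request.Request:
    """Build (but do not send) an HTTP request from a config map."""
    config = config or {}
    # `endpoint` overrides the default endpoint derived from the 1st parameter.
    url = config.get("endpoint") or default_endpoint
    # Additional headers are merged over an assumed JSON content type.
    headers = {"Content-Type": "application/json", **(config.get("headers") or {})}
    body = config.get("body")
    data = json.dumps(body).encode("utf-8") if body is not None else None
    # With method=None, urllib uses POST when data is present and GET otherwise.
    return urllib.request.Request(url, data=data, headers=headers,
                                  method=config.get("method"))


req = build_request(
    "http://localhost:8000/api/v1/collections/test_collection/query",
    {"headers": {"Api-Key": "<apiKey>"}, "body": {"limit": 5}},
)
print(req.get_method(), req.full_url)
```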
|
||
|
||
Besides the above config, the `apoc.vectordb.<type>.get` and the `apoc.vectordb.<type>.query` procedures can have these additional parameters: | ||
|
||
.embeddingConfig parameters | ||
|
||
|=== | ||
| key | description | ||
| mapping | to fetch the associated entities and optionally create them. See examples below. | ||
| allResults | if true, returns the vector, metadata and text (if present), otherwise returns null values for those columns. | ||
| vectorKey, metadataKey, scoreKey, textKey | used with the `apoc.vectordb.custom.get` procedure. | ||
They tell the procedure which keys of the REST API response (if present) should populate the vector, metadata, score and text results, respectively. | ||
Defaults are "vector", "metadata", "score", "text". | ||
See examples below. | ||
|=== | ||
|
||
|
||
== Ad-hoc procedures | ||
|
||
See the following pages for more details on the procedures for each specific vector database: | ||
|
||
- xref:./qdrant.adoc[Qdrant] | ||
- xref:./chroma.adoc[ChromaDB] | ||
- xref:./weaviate.adoc[Weaviate] | ||
|
||
|
||
== Store Vector db info (i.e. `apoc.vectordb.configure`) | ||
|
||
We can save some info in the system database to be reused later, namely the host, login credentials, and mapping, | ||
to be used in the `*.get` and `*.query` procedures, except for `apoc.vectordb.custom.get`. | ||
|
||
Therefore, to store the vector info, we can execute `CALL apoc.vectordb.configure(vectorName, keyConfig, databaseName, $configMap)`, | ||
where `vectorName` can be "QDRANT", "CHROMA" or "WEAVIATE", | ||
indicating that the info will be reused by `apoc.vectordb.qdrant.*`, `apoc.vectordb.chroma.*` or `apoc.vectordb.weaviate.*`, respectively. | ||
|
||
Then, `keyConfig` is the configuration name, `databaseName` is the database where the config will be set, | ||
and finally `configMap` is a map that can have: | ||
|
||
- `host` is the host base name | ||
- `credentialsValue` is the API key | ||
- `mapping` is a map that can be used by the `apoc.vectordb.\*.getAndUpdate` and `apoc.vectordb.*.queryAndUpdate` procedures | ||
|
||
NOTE: This procedure can only be executed by a user with admin permissions, and against the system database. | ||
|
||
For example: | ||
[source,cypher] | ||
---- | ||
// -- within the system database or using the Cypher clause `USE SYSTEM ..` as a prefix | ||
CALL apoc.vectordb.configure('QDRANT', 'qdrant-config-test', 'neo4j', | ||
{ | ||
mapping: { embeddingKey: "vect", nodeLabel: "Test", entityKey: "myId", metadataKey: "foo" }, | ||
host: 'custom-host-name', | ||
credentials: '<apiKey>' | ||
} | ||
) | ||
---- | ||
|
||
and then we can execute e.g. the following procedure (within the `neo4j` database): | ||
|
||
[source,cypher] | ||
---- | ||
CALL apoc.vectordb.qdrant.query('qdrant-config-test', 'test_collection', [0.2, 0.1, 0.9, 0.7], {}, 5) | ||
---- | ||
|
||
instead of: | ||
|
||
[source,cypher] | ||
---- | ||
CALL apoc.vectordb.qdrant.query($host, 'test_collection', [0.2, 0.1, 0.9, 0.7], {}, 5, | ||
{ mapping: { | ||
embeddingKey: "vect", | ||
nodeLabel: "Test", | ||
entityKey: "myId", | ||
metadataKey: "foo" | ||
}, | ||
headers: {Authorization: 'Bearer <apiKey>'}, | ||
endpoint: 'custom-host-name' | ||
}) | ||
---- | ||
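
The equivalence between the two calls above can be summarized with a small Python sketch. It only restates the example: the `expand_stored_config` helper is hypothetical, the key names (`host`, `credentials`, `mapping`) are the ones used in the `apoc.vectordb.configure` example, and the way APOC actually resolves a stored configuration is an assumption here.

```python
from typing import Any, Dict


def expand_stored_config(stored: Dict[str, Any]) -> Dict[str, Any]:
    """Translate a stored configuration into the explicit per-call config map."""
    explicit: Dict[str, Any] = {}
    if "mapping" in stored:
        explicit["mapping"] = stored["mapping"]
    if "credentials" in stored:
        # The API key becomes a Bearer token, as in the explicit call above.
        explicit["headers"] = {"Authorization": f"Bearer {stored['credentials']}"}
    if "host" in stored:
        # The stored host takes the place of the `endpoint` override.
        explicit["endpoint"] = stored["host"]
    return explicit


stored = {
    "mapping": {"embeddingKey": "vect", "nodeLabel": "Test",
                "entityKey": "myId", "metadataKey": "foo"},
    "host": "custom-host-name",
    "credentials": "<apiKey>",
}
print(expand_stored_config(stored))
```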
|