[NOID] Fixes #3971: Check how to integrate vector databases via rest APIs (#4059) #4237

Merged
merged 6 commits into from
Nov 27, 2024
12 changes: 8 additions & 4 deletions LICENSES.txt
@@ -48,9 +48,9 @@ Apache-2.0
 curator-client-5.2.0.jar
 curator-framework-5.2.0.jar
 curator-recipes-5.2.0.jar
-docker-java-api-3.2.13.jar
-docker-java-transport-3.2.13.jar
-docker-java-transport-zerodep-3.2.13.jar
+docker-java-api-3.3.6.jar
+docker-java-transport-3.3.6.jar
+docker-java-transport-zerodep-3.3.6.jar
 ehcache-3.3.1.jar
 error_prone_annotations-2.18.0.jar
 failureaccess-1.0.1.jar
@@ -133,6 +133,7 @@ Apache-2.0
 jffi-1.2.16-native.jar
 jffi-1.2.16.jar
 jmespath-java-1.12.770.jar
+jna-5.13.0.jar
 jna-5.9.0.jar
 jnr-constants-0.9.9.jar
 jnr-ffi-2.1.7.jar
@@ -3045,6 +3046,7 @@ MIT
 bcutil-jdk18on-1.78.jar
 cassandra-1.17.6.jar
 checker-qual-3.42.0.jar
+chromadb-1.19.7.jar
 couchbase-1.17.6.jar
 database-commons-1.17.6.jar
 duct-tape-1.0.8.jar
@@ -3062,12 +3064,14 @@ MIT
 mysql-1.17.6.jar
 neo4j-1.17.6.jar
 postgresql-1.17.6.jar
+qdrant-1.19.7.jar
 reactive-streams-1.0.4.jar
 slf4j-api-1.7.36.jar
 slf4j-api-2.0.11.jar
 slf4j-nop-1.7.30.jar
 slf4j-reload4j-1.7.36.jar
-testcontainers-1.17.6.jar
+testcontainers-1.19.7.jar
+weaviate-1.19.7.jar
 ------------------------------------------------------------------------------

 The MIT License
13 changes: 9 additions & 4 deletions NOTICE.txt
@@ -78,9 +78,9 @@ Apache-2.0
 curator-client-5.2.0.jar
 curator-framework-5.2.0.jar
 curator-recipes-5.2.0.jar
-docker-java-api-3.2.13.jar
-docker-java-transport-3.2.13.jar
-docker-java-transport-zerodep-3.2.13.jar
+docker-java-api-3.3.6.jar
+docker-java-transport-3.3.6.jar
+docker-java-transport-zerodep-3.3.6.jar
 ehcache-3.3.1.jar
 error_prone_annotations-2.18.0.jar
 failureaccess-1.0.1.jar
@@ -163,6 +163,7 @@ Apache-2.0
 jffi-1.2.16-native.jar
 jffi-1.2.16.jar
 jmespath-java-1.12.770.jar
+jna-5.13.0.jar
 jna-5.9.0.jar
 jnr-constants-0.9.9.jar
 jnr-ffi-2.1.7.jar
@@ -434,6 +435,7 @@ LGPL 2.1
 javassist-3.25.0-GA.jar

 LGPL-2.1-or-later
+jna-5.13.0.jar
 jna-5.9.0.jar

 MIT
@@ -445,6 +447,7 @@ MIT
 bcutil-jdk18on-1.78.jar
 cassandra-1.17.6.jar
 checker-qual-3.42.0.jar
+chromadb-1.19.7.jar
 couchbase-1.17.6.jar
 database-commons-1.17.6.jar
 duct-tape-1.0.8.jar
@@ -462,12 +465,14 @@ MIT
 mysql-1.17.6.jar
 neo4j-1.17.6.jar
 postgresql-1.17.6.jar
+qdrant-1.19.7.jar
 reactive-streams-1.0.4.jar
 slf4j-api-1.7.36.jar
 slf4j-api-2.0.11.jar
 slf4j-nop-1.7.30.jar
 slf4j-reload4j-1.7.36.jar
-testcontainers-1.17.6.jar
+testcontainers-1.19.7.jar
+weaviate-1.19.7.jar

 MPL 1.1
 javassist-3.25.0-GA.jar
3 changes: 2 additions & 1 deletion core/src/main/java/apoc/SystemLabels.java
@@ -29,5 +29,6 @@ public enum SystemLabels implements Label {
     ApocUuidMeta,
     ApocTriggerMeta,
     ApocTrigger,
-    DataVirtualizationCatalog
+    DataVirtualizationCatalog,
+    VectorDb
 }
6 changes: 5 additions & 1 deletion core/src/main/java/apoc/SystemPropertyKeys.java
@@ -46,5 +46,9 @@ public enum SystemPropertyKeys {
     label,
     addToSetLabel,
     addToExistingNodes,
-    propertyName;
+    propertyName,
+
+    // vector db
+    host,
+    credentials
 }
49 changes: 49 additions & 0 deletions core/src/main/java/apoc/util/Util.java
@@ -1314,4 +1314,53 @@ public static ConstraintCategory getConstraintCategory(ConstraintType type) {
             return ConstraintCategory.NODE;
         }
     }
+
+    public static void setProperties(Entity entity, Map<String, Object> props) {
+        for (var entry : props.entrySet()) {
+            entity.setProperty(entry.getKey(), entry.getValue());
+        }
+    }
+    /**
+     * Transform a list like: [ {key1: valueFoo1, key2: valueFoo2}, {key1: valueBar1, key2: valueBar2} ]
+     * to a map like: { keyNew1: [valueFoo1, valueBar1], keyNew2: [valueFoo2, valueBar2] },
+     *
+     * where mapKeys is e.g. {key1: keyNew1, key2: keyNew2}
+     */
+    public static Map<Object, List> listOfMapToMapOfLists(Map mapKeys, List<Map<String, Object>> vectors) {
+        Map<Object, List> additionalBodies = new HashMap();
+        for (var vector : vectors) {
+            mapKeys.forEach((from, to) -> {
+                mapEntryToList(additionalBodies, vector, from, to);
+            });
+        }
+        return additionalBodies;
+    }
+
+    private static void mapEntryToList(
+            Map<Object, List> map, Map<String, Object> vector, Object keyFrom, Object keyTo) {
+        Object item = vector.get(keyFrom);
+        if (item == null) {
+            return;
+        }
+
+        map.compute(keyTo, (k, v) -> {
+            if (v == null) {
+                List<Object> list = new ArrayList<>();
+                list.add(item);
+                return list;
+            }
+            v.add(item);
+            return v;
+        });
+    }
+
+    public static float[] listOfNumbersToFloatArray(List<? extends Number> embedding) {
+        float[] floats = new float[embedding.size()];
+        int i = 0;
+        for (var item : embedding) {
+            floats[i] = item.floatValue();
+            i++;
+        }
+        return floats;
+    }
 }
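The transformation described in the Javadoc of `listOfMapToMapOfLists` can be tried outside of APOC with a minimal stand-alone sketch. The class name `VectorUtilSketch` and the typed signatures below are assumptions made for illustration; they are not the PR's code, which uses raw `Map` and `List` types.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative stand-in for the helpers added to apoc.util.Util; not the PR's code.
class VectorUtilSketch {

    // Pivots a list of row maps into a map of column lists, renaming keys via mapKeys.
    // As in the PR, rows that lack a key contribute nothing to that key's list.
    static Map<String, List<Object>> listOfMapToMapOfLists(
            Map<String, String> mapKeys, List<Map<String, Object>> vectors) {
        Map<String, List<Object>> result = new LinkedHashMap<>();
        for (Map<String, Object> vector : vectors) {
            for (Map.Entry<String, String> key : mapKeys.entrySet()) {
                Object item = vector.get(key.getKey());
                if (item != null) {
                    result.computeIfAbsent(key.getValue(), k -> new ArrayList<>()).add(item);
                }
            }
        }
        return result;
    }

    // Converts any list of numbers (Double, Long, ...) to a primitive float array.
    static float[] listOfNumbersToFloatArray(List<? extends Number> embedding) {
        float[] floats = new float[embedding.size()];
        for (int i = 0; i < floats.length; i++) {
            floats[i] = embedding.get(i).floatValue();
        }
        return floats;
    }

    public static void main(String[] args) {
        // Same shape as the Javadoc example: {key1: keyNew1, key2: keyNew2}
        Map<String, String> mapKeys = new LinkedHashMap<>();
        mapKeys.put("key1", "keyNew1");
        mapKeys.put("key2", "keyNew2");

        List<Map<String, Object>> rows = List.of(
                Map.of("key1", "valueFoo1", "key2", "valueFoo2"),
                Map.of("key1", "valueBar1", "key2", "valueBar2"));

        System.out.println(listOfMapToMapOfLists(mapKeys, rows));
        System.out.println(listOfNumbersToFloatArray(List.of(0.05, 0.61, 0.76, 0.74)).length);
    }
}
```

Because null entries are skipped rather than padded, the resulting lists can have different lengths when some rows lack a key, so positions are only aligned across lists when every row carries every mapped key.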
1 change: 1 addition & 0 deletions docs/asciidoc/modules/ROOT/nav.adoc
@@ -50,6 +50,7 @@ include::partial$generated-documentation/nav.adoc[]
 ** xref::database-integration/bolt-neo4j.adoc[]
 ** xref::database-integration/load-ldap.adoc[]
 ** xref::database-integration/redis.adoc[]
+** xref:database-integration/vectordb/index.adoc[]

 * xref:graph-updates/index.adoc[]
 ** xref::graph-updates/data-creation.adoc[]
@@ -18,4 +18,5 @@ For more information on how to use these procedures, see:
 * xref::database-integration/bolt-neo4j.adoc[]
 * xref::database-integration/load-ldap.adoc[]
 * xref::database-integration/redis.adoc[]
+* xref:database-integration/vectordb/index.adoc[]

@@ -0,0 +1,157 @@

== ChromaDB

Here is a list of all available ChromaDB procedures.
Note that the list and the procedure signatures are consistent with those of the other vector databases, such as Qdrant:

[opts=header, cols="1, 3"]
|===
| name | description
| apoc.vectordb.chroma.createCollection(hostOrKey, collection, similarity, size, $config) |
Creates a collection, with the name specified in the 2nd parameter, and with the specified `similarity` and `size`.
The default endpoint is `<hostOrKey param>/api/v1/collections`.
| apoc.vectordb.chroma.deleteCollection(hostOrKey, collection, $config) |
Deletes a collection with the name specified in the 2nd parameter.
The default endpoint is `<hostOrKey param>/api/v1/collections/<collection param>`.
| apoc.vectordb.chroma.upsert(hostOrKey, collection, vectors, $config) |
Upserts, in the collection with the name specified in the 2nd parameter, the vectors [{id: 'id', vector: '<vectorDb>', metadata: '<metadata>'}].
The default endpoint is `<hostOrKey param>/api/v1/collections/<collection param>/upsert`.
| apoc.vectordb.chroma.delete(hostOrKey, collection, ids, $config) |
Deletes the vectors with the specified `ids`.
The default endpoint is `<hostOrKey param>/api/v1/collections/<collection param>/delete`.
| apoc.vectordb.chroma.get(hostOrKey, collection, ids, $config) |
Gets the vectors with the specified `ids`.
The default endpoint is `<hostOrKey param>/api/v1/collections/<collection param>/get`.
| apoc.vectordb.chroma.query(hostOrKey, collection, vector, filter, limit, $config) |
Retrieves the vectors closest to the given `vector`, returning at most `limit` results, from the collection with the name specified in the 2nd parameter.
The default endpoint is `<hostOrKey param>/api/v1/collections/<collection param>/query`.
| apoc.vectordb.chroma.getAndUpdate(hostOrKey, collection, ids, $config) |
Gets the vectors with the specified `ids`, and optionally creates/updates Neo4j entities.
The default endpoint is `<hostOrKey param>/api/v1/collections/<collection param>/get`.
| apoc.vectordb.chroma.queryAndUpdate(hostOrKey, collection, vector, filter, limit, $config) |
Retrieves the vectors closest to the given `vector`, returning at most `limit` results, from the collection with the name specified in the 2nd parameter, and optionally creates/updates Neo4j entities.
The default endpoint is `<hostOrKey param>/api/v1/collections/<collection param>/query`.
|===

The 1st parameter can be either a host or a key defined in the APOC config as `apoc.chroma.<key>.host=myHost`.
With `hostOrKey=null`, the default host is `http://localhost:8000`.
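
The lookup described above, together with the default endpoints listed in the table, can be sketched in plain Java. This is a minimal illustration under stated assumptions: the class `ChromaEndpointSketch`, its method names, the use of a plain `Map` in place of the APOC configuration, the fallback to treating an unknown value as a literal host, and the trailing-slash handling are all hypothetical and are not taken from the PR.

```java
import java.util.Map;

// Hypothetical sketch of hostOrKey resolution; the real procedures read the APOC configuration.
class ChromaEndpointSketch {
    static final String DEFAULT_HOST = "http://localhost:8000";

    // null -> default host; known key -> configured host; anything else is treated as a host.
    static String resolveHost(String hostOrKey, Map<String, String> apocConfig) {
        if (hostOrKey == null) {
            return DEFAULT_HOST;
        }
        String configured = apocConfig.get("apoc.chroma." + hostOrKey + ".host");
        return configured != null ? configured : hostOrKey;
    }

    // Builds <host>/api/v1/collections[/<collection>[/<action>]].
    static String endpoint(String host, String collection, String action) {
        // Assumption: a trailing slash on the host is dropped to avoid "//" in the URL.
        String base = host.endsWith("/") ? host.substring(0, host.length() - 1) : host;
        StringBuilder url = new StringBuilder(base).append("/api/v1/collections");
        if (collection != null) {
            url.append('/').append(collection);
            if (action != null) {
                url.append('/').append(action);
            }
        }
        return url.toString();
    }

    public static void main(String[] args) {
        Map<String, String> config = Map.of("apoc.chroma.myKey.host", "http://myHost:8000");
        String host = resolveHost("myKey", config);
        System.out.println(endpoint(host, "test_collection", "upsert"));
        System.out.println(endpoint(resolveHost(null, config), null, null));
    }
}
```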

=== Examples

.Create a collection (it leverages https://docs.trychroma.com/usage-guide#creating-inspecting-and-deleting-collections[this API])
[source,cypher]
----
CALL apoc.vectordb.chroma.createCollection($host, 'test_collection', 'Cosine', 4, {<optional config>})
----


.Delete a collection (it leverages https://docs.trychroma.com/usage-guide#creating-inspecting-and-deleting-collections[this API])
[source,cypher]
----
CALL apoc.vectordb.chroma.deleteCollection($host, '<collection_id>', {<optional config>})
----


.Upsert vectors (it leverages https://docs.trychroma.com/usage-guide#adding-data-to-a-collection[this API])
[source,cypher]
----
CALL apoc.vectordb.chroma.upsert($host, '<collection_id>',
[
{id: 1, vector: [0.05, 0.61, 0.76, 0.74], metadata: {city: "Berlin", foo: "one"}, text: 'ajeje'},
{id: 2, vector: [0.19, 0.81, 0.75, 0.11], metadata: {city: "London", foo: "two"}, text: 'brazorf'}
],
{<optional config>})
----
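
To make the request shape more concrete, here is a hedged Java sketch of how the rows passed to the upsert procedure could be pivoted into the column-oriented body that the Chroma upsert endpoint accepts (`ids`, `embeddings`, `metadatas`, `documents`). The class `ChromaUpsertBodySketch` and the conversion of ids to strings are assumptions for illustration; the PR's actual request building may differ.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch: pivots APOC-style vector rows into a Chroma-style upsert body.
class ChromaUpsertBodySketch {

    static Map<String, List<Object>> toUpsertBody(List<Map<String, Object>> vectors) {
        List<Object> ids = new ArrayList<>();
        List<Object> embeddings = new ArrayList<>();
        List<Object> metadatas = new ArrayList<>();
        List<Object> documents = new ArrayList<>();
        for (Map<String, Object> vector : vectors) {
            // Assumption: ids are sent as strings, so numeric ids are converted.
            ids.add(String.valueOf(Objects.requireNonNull(vector.get("id"), "id is required")));
            embeddings.add(vector.get("vector"));
            metadatas.add(vector.get("metadata")); // may be null when a row has no metadata
            documents.add(vector.get("text"));     // may be null when a row has no text
        }
        Map<String, List<Object>> body = new LinkedHashMap<>();
        body.put("ids", ids);
        body.put("embeddings", embeddings);
        body.put("metadatas", metadatas);
        body.put("documents", documents);
        return body;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> vectors = List.of(
                Map.of("id", 1, "vector", List.of(0.05, 0.61, 0.76, 0.74),
                        "metadata", Map.of("city", "Berlin", "foo", "one"), "text", "ajeje"),
                Map.of("id", 2, "vector", List.of(0.19, 0.81, 0.75, 0.11),
                        "metadata", Map.of("city", "London", "foo", "two"), "text", "brazorf"));
        System.out.println(toUpsertBody(vectors));
    }
}
```

Unlike a helper that skips missing values, this sketch keeps one entry per row in every list, so the positions in `ids`, `embeddings`, `metadatas` and `documents` stay aligned.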


.Get vectors (it leverages https://docs.trychroma.com/usage-guide#querying-a-collection[this API])
[source,cypher]
----
CALL apoc.vectordb.chroma.get($host, '<collection_id>', ['1','2'], {<optional config>})
----


.Example results
[opts="header"]
|===
| score | metadata | id | vector | text | entity
| null | {city: "Berlin", foo: "one"} | null | null | null | null
| null | {city: "London", foo: "two"} | null | null | null | null
| ...
|===


.Get vectors with `{allResults: true}`
[source,cypher]
----
CALL apoc.vectordb.chroma.get($host, '<collection_id>', ['1','2'], {allResults: true, <optional config>})
----


.Example results
[opts="header"]
|===
| score | metadata | id | vector | text | entity
| null | {city: "Berlin", foo: "one"} | 1 | [...] | ajeje | null
| null | {city: "London", foo: "two"} | 2 | [...] | brazorf | null
| ...
|===


.Query vectors (it leverages https://docs.trychroma.com/usage-guide#querying-a-collection[this API])
[source,cypher]
----
CALL apoc.vectordb.chroma.query($host,
'<collection_id>',
[0.2, 0.1, 0.9, 0.7],
{city: 'London'},
5,
{allResults: true, <optional config>})
----


.Example results
[opts="header"]
|===
| score | metadata | id | vector | text
| 1 | {city: "Berlin", foo: "one"} | 1 | [...] | ajeje
| 0.1 | {city: "London", foo: "two"} | 2 | [...] | brazorf
| ...
|===


[NOTE]
====
To optimize performance, we can choose what to `YIELD` with the `apoc.vectordb.chroma.query` and the `apoc.vectordb.chroma.get` procedures.
For example, by executing `CALL apoc.vectordb.chroma.query(...) YIELD metadata, score, id`, the REST API request will contain `{"include": ["metadatas", "documents", "distances"]}`,
so that the values we do not need are not returned.
====


As with the other vector database procedures, we can define a `mapping` in the config to fetch the associated nodes and relationships, and optionally create them,
by leveraging the vector metadata. For example:

.Query vectors
[source,cypher]
----
CALL apoc.vectordb.chroma.query($host, '<collection_id>',
[0.2, 0.1, 0.9, 0.7],
{},
5,
{ mapping: {
embeddingKey: "vect",
nodeLabel: "Test",
entityKey: "myId",
metadataKey: "foo"
}
})
----



.Delete vectors (it leverages https://docs.trychroma.com/usage-guide#deleting-data-from-a-collection[this API])
[source,cypher]
----
CALL apoc.vectordb.chroma.delete($host, '<collection_id>', [1,2], {<optional config>})
----
