[NOID] Fixes #3971: Check how to integrate vector databases via rest APIs (#4059) (#4237)

* [NOID] Fixes #3971: Check how to integrate vector databases via rest APIs (#4059)

* Fixes #3971: Check how to integrate vector databases via rest APIs

* fixed CI errors and removed unused imports

* Changes review: added weaviate db, removed vector idx autocreation and vector as a default result

* code clean

* Changes review: added systemdb store, removed constraint creation

* code clean

* 2nd changes review

* fixed qdrant filename typo and removed info procs from docs

* [NOID] solved compile error

* [NOID] spotless and licence changes

* [NOID] updated license files

* [NOID] various changes for 4.4

* [NOID] fixes tests and format-checks
vga91 authored Nov 27, 2024
1 parent b753ce6 commit e571e6f
Showing 47 changed files with 4,644 additions and 646 deletions.
12 changes: 8 additions & 4 deletions LICENSES.txt
@@ -48,9 +48,9 @@ Apache-2.0
curator-client-5.2.0.jar
curator-framework-5.2.0.jar
curator-recipes-5.2.0.jar
docker-java-api-3.2.13.jar
docker-java-transport-3.2.13.jar
docker-java-transport-zerodep-3.2.13.jar
docker-java-api-3.3.6.jar
docker-java-transport-3.3.6.jar
docker-java-transport-zerodep-3.3.6.jar
ehcache-3.3.1.jar
error_prone_annotations-2.18.0.jar
failureaccess-1.0.1.jar
@@ -133,6 +133,7 @@ Apache-2.0
jffi-1.2.16-native.jar
jffi-1.2.16.jar
jmespath-java-1.12.770.jar
jna-5.13.0.jar
jna-5.9.0.jar
jnr-constants-0.9.9.jar
jnr-ffi-2.1.7.jar
@@ -3045,6 +3046,7 @@ MIT
bcutil-jdk18on-1.78.jar
cassandra-1.17.6.jar
checker-qual-3.42.0.jar
chromadb-1.19.7.jar
couchbase-1.17.6.jar
database-commons-1.17.6.jar
duct-tape-1.0.8.jar
@@ -3062,12 +3064,14 @@ MIT
mysql-1.17.6.jar
neo4j-1.17.6.jar
postgresql-1.17.6.jar
qdrant-1.19.7.jar
reactive-streams-1.0.4.jar
slf4j-api-1.7.36.jar
slf4j-api-2.0.11.jar
slf4j-nop-1.7.30.jar
slf4j-reload4j-1.7.36.jar
testcontainers-1.17.6.jar
testcontainers-1.19.7.jar
weaviate-1.19.7.jar
------------------------------------------------------------------------------

The MIT License
13 changes: 9 additions & 4 deletions NOTICE.txt
@@ -78,9 +78,9 @@ Apache-2.0
curator-client-5.2.0.jar
curator-framework-5.2.0.jar
curator-recipes-5.2.0.jar
docker-java-api-3.2.13.jar
docker-java-transport-3.2.13.jar
docker-java-transport-zerodep-3.2.13.jar
docker-java-api-3.3.6.jar
docker-java-transport-3.3.6.jar
docker-java-transport-zerodep-3.3.6.jar
ehcache-3.3.1.jar
error_prone_annotations-2.18.0.jar
failureaccess-1.0.1.jar
@@ -163,6 +163,7 @@ Apache-2.0
jffi-1.2.16-native.jar
jffi-1.2.16.jar
jmespath-java-1.12.770.jar
jna-5.13.0.jar
jna-5.9.0.jar
jnr-constants-0.9.9.jar
jnr-ffi-2.1.7.jar
@@ -434,6 +435,7 @@ LGPL 2.1
javassist-3.25.0-GA.jar

LGPL-2.1-or-later
jna-5.13.0.jar
jna-5.9.0.jar

MIT
@@ -445,6 +447,7 @@ MIT
bcutil-jdk18on-1.78.jar
cassandra-1.17.6.jar
checker-qual-3.42.0.jar
chromadb-1.19.7.jar
couchbase-1.17.6.jar
database-commons-1.17.6.jar
duct-tape-1.0.8.jar
@@ -462,12 +465,14 @@ MIT
mysql-1.17.6.jar
neo4j-1.17.6.jar
postgresql-1.17.6.jar
qdrant-1.19.7.jar
reactive-streams-1.0.4.jar
slf4j-api-1.7.36.jar
slf4j-api-2.0.11.jar
slf4j-nop-1.7.30.jar
slf4j-reload4j-1.7.36.jar
testcontainers-1.17.6.jar
testcontainers-1.19.7.jar
weaviate-1.19.7.jar

MPL 1.1
javassist-3.25.0-GA.jar
3 changes: 2 additions & 1 deletion core/src/main/java/apoc/SystemLabels.java
@@ -29,5 +29,6 @@ public enum SystemLabels implements Label {
ApocUuidMeta,
ApocTriggerMeta,
ApocTrigger,
DataVirtualizationCatalog
DataVirtualizationCatalog,
VectorDb
}
6 changes: 5 additions & 1 deletion core/src/main/java/apoc/SystemPropertyKeys.java
@@ -46,5 +46,9 @@ public enum SystemPropertyKeys {
label,
addToSetLabel,
addToExistingNodes,
propertyName;
propertyName,

// vector db
host,
credentials
}
49 changes: 49 additions & 0 deletions core/src/main/java/apoc/util/Util.java
@@ -1314,4 +1314,53 @@ public static ConstraintCategory getConstraintCategory(ConstraintType type) {
            return ConstraintCategory.NODE;
        }
    }

    public static void setProperties(Entity entity, Map<String, Object> props) {
        for (var entry : props.entrySet()) {
            entity.setProperty(entry.getKey(), entry.getValue());
        }
    }
    /**
     * Transform a list like: [ {key1: valueFoo1, key2: valueFoo2}, {key1: valueBar1, key2: valueBar2} ]
     * to a map like: { keyNew1: [valueFoo1, valueBar1], keyNew2: [valueFoo2, valueBar2] },
     *
     * where mapKeys is e.g. {key1: keyNew1, key2: keyNew2}
     */
    public static Map<Object, List> listOfMapToMapOfLists(Map mapKeys, List<Map<String, Object>> vectors) {
        Map<Object, List> additionalBodies = new HashMap();
        for (var vector : vectors) {
            mapKeys.forEach((from, to) -> {
                mapEntryToList(additionalBodies, vector, from, to);
            });
        }
        return additionalBodies;
    }

    private static void mapEntryToList(
            Map<Object, List> map, Map<String, Object> vector, Object keyFrom, Object keyTo) {
        Object item = vector.get(keyFrom);
        if (item == null) {
            return;
        }

        map.compute(keyTo, (k, v) -> {
            if (v == null) {
                List<Object> list = new ArrayList<>();
                list.add(item);
                return list;
            }
            v.add(item);
            return v;
        });
    }

    public static float[] listOfNumbersToFloatArray(List<? extends Number> embedding) {
        float[] floats = new float[embedding.size()];
        int i = 0;
        for (var item : embedding) {
            floats[i] = item.floatValue();
            i++;
        }
        return floats;
    }
}
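
Reviewer note: to make the behaviour of the two new `Util` helpers easier to check, here is a minimal, self-contained sketch that re-implements them outside APOC and runs the example from the Javadoc. The class name `UtilHelpersDemo` is hypothetical, the generics are tightened for the demo, and a `LinkedHashMap` is used only to keep the printed order stable. Note that a key missing from one input map is skipped rather than padded with `null`, so the resulting lists can have different lengths.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo class, not part of the commit.
final class UtilHelpersDemo {

    // Same idea as Util.listOfMapToMapOfLists: transpose a list of maps into a map of lists,
    // renaming each key according to mapKeys and skipping entries whose value is null/missing.
    static Map<String, List<Object>> listOfMapToMapOfLists(
            Map<String, String> mapKeys, List<Map<String, Object>> vectors) {
        Map<String, List<Object>> result = new LinkedHashMap<>();
        for (Map<String, Object> vector : vectors) {
            mapKeys.forEach((from, to) -> {
                Object item = vector.get(from);
                if (item != null) {
                    result.computeIfAbsent(to, k -> new ArrayList<>()).add(item);
                }
            });
        }
        return result;
    }

    // Same idea as Util.listOfNumbersToFloatArray: narrow every number to a float.
    static float[] listOfNumbersToFloatArray(List<? extends Number> embedding) {
        float[] floats = new float[embedding.size()];
        int i = 0;
        for (Number item : embedding) {
            floats[i++] = item.floatValue();
        }
        return floats;
    }

    public static void main(String[] args) {
        Map<String, String> mapKeys = new LinkedHashMap<>();
        mapKeys.put("key1", "keyNew1");
        mapKeys.put("key2", "keyNew2");

        List<Map<String, Object>> vectors = List.of(
                Map.of("key1", "valueFoo1", "key2", "valueFoo2"),
                Map.of("key1", "valueBar1")); // key2 is deliberately missing here

        System.out.println(listOfMapToMapOfLists(mapKeys, vectors));
        System.out.println(Arrays.toString(listOfNumbersToFloatArray(List.<Number>of(0.05, 0.61, 1))));
    }
}
```

With the input above, the first call yields `keyNew1` mapped to both values and `keyNew2` mapped to the single value that was present.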
1 change: 1 addition & 0 deletions docs/asciidoc/modules/ROOT/nav.adoc
@@ -50,6 +50,7 @@ include::partial$generated-documentation/nav.adoc[]
** xref::database-integration/bolt-neo4j.adoc[]
** xref::database-integration/load-ldap.adoc[]
** xref::database-integration/redis.adoc[]
** xref:database-integration/vectordb/index.adoc[]
* xref:graph-updates/index.adoc[]
** xref::graph-updates/data-creation.adoc[]
@@ -18,4 +18,5 @@ For more information on how to use these procedures, see:
* xref::database-integration/bolt-neo4j.adoc[]
* xref::database-integration/load-ldap.adoc[]
* xref::database-integration/redis.adoc[]
* xref:database-integration/vectordb/index.adoc[]

@@ -0,0 +1,157 @@

== ChromaDB

The table below lists the available ChromaDB procedures.
Their names and signatures are consistent with those of the other vector databases, such as Qdrant:

[opts=header, cols="1, 3"]
|===
| name | description
| apoc.vectordb.chroma.createCollection(hostOrKey, collection, similarity, size, $config) |
Creates a collection, with the name specified in the 2nd parameter, and with the specified `similarity` and `size`.
The default endpoint is `<hostOrKey param>/api/v1/collections`.
| apoc.vectordb.chroma.deleteCollection(hostOrKey, collection, $config) |
Deletes a collection with the name specified in the 2nd parameter.
The default endpoint is `<hostOrKey param>/api/v1/collections/<collection param>`.
| apoc.vectordb.chroma.upsert(hostOrKey, collection, vectors, $config) |
Upserts the vectors [{id: 'id', vector: '<vectorDb>', metadata: '<metadata>'}] into the collection with the name specified in the 2nd parameter.
The default endpoint is `<hostOrKey param>/api/v1/collections/<collection param>/upsert`.
| apoc.vectordb.chroma.delete(hostOrKey, collection, ids, $config) |
Deletes the vectors with the specified `ids`.
The default endpoint is `<hostOrKey param>/api/v1/collections/<collection param>/delete`.
| apoc.vectordb.chroma.get(hostOrKey, collection, ids, $config) |
Gets the vectors with the specified `ids`.
The default endpoint is `<hostOrKey param>/api/v1/collections/<collection param>/get`.
| apoc.vectordb.chroma.query(hostOrKey, collection, vector, filter, limit, $config) |
Retrieves the vectors closest to the given `vector`, up to `limit` results, from the collection with the name specified in the 2nd parameter.
The default endpoint is `<hostOrKey param>/api/v1/collections/<collection param>/query`.
| apoc.vectordb.chroma.getAndUpdate(hostOrKey, collection, ids, $config) |
Gets the vectors with the specified `ids`, and optionally creates/updates Neo4j entities.
The default endpoint is `<hostOrKey param>/api/v1/collections/<collection param>/get`.
| apoc.vectordb.chroma.queryAndUpdate(hostOrKey, collection, vector, filter, limit, $config) |
Retrieves the vectors closest to the given `vector`, up to `limit` results, from the collection with the name specified in the 2nd parameter, and optionally creates/updates Neo4j entities.
The default endpoint is `<hostOrKey param>/api/v1/collections/<collection param>/query`.
|===

The 1st parameter can be either a host or a key defined in the APOC config as `apoc.chroma.<key>.host=myHost`.
With `hostOrKey=null`, the default host is `http://localhost:8000`.
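
The host-or-key lookup described above can be pictured with the following minimal sketch. It is an illustration only, not the procedure's actual implementation: the class `ChromaHostSketch`, the method names, and the use of a plain `Map` as a stand-in for the APOC configuration are all assumptions, as is the fallback of treating an unknown key as a literal host.

```java
import java.util.Map;

// Hypothetical sketch, not APOC code.
final class ChromaHostSketch {
    static final String DEFAULT_HOST = "http://localhost:8000";

    // apocConfig stands in for the APOC configuration (assumption).
    static String resolveHost(String hostOrKey, Map<String, String> apocConfig) {
        if (hostOrKey == null) {
            return DEFAULT_HOST; // documented default when hostOrKey is null
        }
        // A key is looked up as apoc.chroma.<key>.host; otherwise the value is used as the host itself.
        String configured = apocConfig.get("apoc.chroma." + hostOrKey + ".host");
        return configured != null ? configured : hostOrKey;
    }

    // Builds the default collections endpoint listed in the table above.
    static String collectionsEndpoint(String hostOrKey, Map<String, String> apocConfig) {
        return resolveHost(hostOrKey, apocConfig) + "/api/v1/collections";
    }

    public static void main(String[] args) {
        Map<String, String> config = Map.of("apoc.chroma.myKey.host", "myHost");
        System.out.println(collectionsEndpoint(null, config));
        System.out.println(collectionsEndpoint("myKey", config));
    }
}
```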

=== Examples

.Create a collection (it leverages https://docs.trychroma.com/usage-guide#creating-inspecting-and-deleting-collections[this API])
[source,cypher]
----
CALL apoc.vectordb.chroma.createCollection($host, 'test_collection', 'Cosine', 4, {<optional config>})
----


.Delete a collection (it leverages https://docs.trychroma.com/usage-guide#creating-inspecting-and-deleting-collections[this API])
[source,cypher]
----
CALL apoc.vectordb.chroma.deleteCollection($host, '<collection_id>', {<optional config>})
----


.Upsert vectors (it leverages https://docs.trychroma.com/usage-guide#adding-data-to-a-collection[this API])
[source,cypher]
----
CALL apoc.vectordb.chroma.upsert($host, '<collection_id>',
[
{id: 1, vector: [0.05, 0.61, 0.76, 0.74], metadata: {city: "Berlin", foo: "one"}, text: 'ajeje'},
{id: 2, vector: [0.19, 0.81, 0.75, 0.11], metadata: {city: "London", foo: "two"}, text: 'brazorf'}
],
{<optional config>})
----


.Get vectors (it leverages https://docs.trychroma.com/usage-guide#querying-a-collection[this API])
[source,cypher]
----
CALL apoc.vectordb.chroma.get($host, '<collection_id>', ['1','2'], {<optional config>})
----


.Example results
[opts="header"]
|===
| score | metadata | id | vector | text | entity
| null | {city: "Berlin", foo: "one"} | null | null | null | null
| null | {city: "London", foo: "two"} | null | null | null | null
| ...
|===


.Get vectors with `{allResults: true}`
[source,cypher]
----
CALL apoc.vectordb.chroma.get($host, '<collection_id>', ['1','2'], {allResults: true, <optional config>})
----


.Example results
[opts="header"]
|===
| score | metadata | id | vector | text | entity
| null | {city: "Berlin", foo: "one"} | 1 | [...] | ajeje | null
| null | {city: "London", foo: "two"} | 2 | [...] | brazorf | null
| ...
|===


.Query vectors (it leverages https://docs.trychroma.com/usage-guide#querying-a-collection[this API])
[source,cypher]
----
CALL apoc.vectordb.chroma.query($host,
'<collection_id>',
[0.2, 0.1, 0.9, 0.7],
{city: 'London'},
5,
{allResults: true, <optional config>})
----


.Example results
[opts="header"]
|===
| score | metadata | id | vector | text
| 1 | {city: "Berlin", foo: "one"} | 1 | [...] | ajeje
| 0.1 | {city: "London", foo: "two"} | 2 | [...] | brazorf
| ...
|===


[NOTE]
====
To optimize performance, we can choose what to `YIELD` with the `apoc.vectordb.chroma.query` and `apoc.vectordb.chroma.get` procedures.
For example, when executing `CALL apoc.vectordb.chroma.query(...) YIELD metadata, score, id`, the REST API request will contain `{"include": ["metadatas", "documents", "distances"]}`,
so that the values we do not need are not returned.
====


As with the other procedures, we can define a mapping to fetch the associated nodes and relationships, and optionally create them,
by leveraging the vector metadata. For example:

.Query vectors
[source,cypher]
----
CALL apoc.vectordb.chroma.query($host, '<collection_id>',
[0.2, 0.1, 0.9, 0.7],
{},
5,
{ mapping: {
embeddingKey: "vect",
nodeLabel: "Test",
entityKey: "myId",
metadataKey: "foo"
}
})
----
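
To make the mapping keys above more concrete, the sketch below simulates them on plain in-memory maps. It is only an illustration under stated assumptions: that `metadataKey` names the metadata field whose value is compared with the node property named by `entityKey`, and that the vector is then stored in the property named by `embeddingKey`. The class `MappingSketch`, the method `applyMapping`, and the representation of nodes as maps are hypothetical. The real procedure works on Neo4j entities with the `nodeLabel` label.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not APOC code.
final class MappingSketch {

    // nodes: property maps standing in for nodes with the configured nodeLabel (assumption).
    // results: rows shaped like the procedure output, with "metadata" and "vector" entries.
    static void applyMapping(
            Map<String, String> mapping, List<Map<String, Object>> nodes, List<Map<String, Object>> results) {
        String entityKey = mapping.get("entityKey");
        String metadataKey = mapping.get("metadataKey");
        String embeddingKey = mapping.get("embeddingKey");

        for (Map<String, Object> result : results) {
            Object rawMetadata = result.get("metadata");
            if (!(rawMetadata instanceof Map)) {
                continue; // nothing to match on
            }
            Object matchValue = ((Map<?, ?>) rawMetadata).get(metadataKey);
            if (matchValue == null) {
                continue;
            }
            for (Map<String, Object> node : nodes) {
                // Match metadata.<metadataKey> against node.<entityKey>, then store the vector.
                if (matchValue.equals(node.get(entityKey))) {
                    node.put(embeddingKey, result.get("vector"));
                }
            }
        }
    }

    public static void main(String[] args) {
        Map<String, String> mapping = Map.of(
                "embeddingKey", "vect", "nodeLabel", "Test", "entityKey", "myId", "metadataKey", "foo");
        Map<String, Object> node = new HashMap<>(Map.of("myId", "one"));
        List<Map<String, Object>> results = List.of(
                Map.of("metadata", Map.of("city", "Berlin", "foo", "one"),
                        "vector", List.of(0.05, 0.61, 0.76, 0.74)));

        applyMapping(mapping, List.of(node), results);
        System.out.println(node.get("vect"));
    }
}
```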



.Delete vectors (it leverages https://docs.trychroma.com/usage-guide#deleting-data-from-a-collection[this API])
[source,cypher]
----
CALL apoc.vectordb.chroma.delete($host, '<collection_id>', [1,2], {<optional config>})
----
