Empty vectors while scrolling unnamed vector collection #434

Closed
hdmi opened this issue Jan 12, 2024 · 6 comments

Comments

@hdmi

hdmi commented Jan 12, 2024

Hi,

This might be a bug: whenever I scroll a collection with named vectors, the records come back with an empty vector.

In addition, the vector values in the returned Record are completely wrong (see the unnamed collection in the output below).

POC

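from qdrant_client import QdrantClient, models
from qdrant_client.local.persistence import CollectionPersistence
from qdrant_client.models import PointStruct

def testScroll():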

    localClient = QdrantClient(path="asdf")

    localClient.recreate_collection(
        collection_name="namedCol",
        vectors_config={
                    "text": models.VectorParams(size=4, distance=models.Distance.COSINE)
        },
    )
                
    localClient.recreate_collection(
        collection_name="unnamedCol",
        vectors_config=models.VectorParams(size=4, distance=models.Distance.COSINE),
    )

    points = [
        PointStruct(id=1, vector=[0.24, 0.18, 0.22, 0.44], payload={"something": "asdf"}),
        PointStruct(id=2, vector=[0.54, 1.18, -0.42, 0.94], payload={"something": "asdf"}),
    ]

    localClient.upsert(
        collection_name="namedCol",
        points=points
    )


    localClient.upsert(
        collection_name="unnamedCol",
        points=points
    )

    recNamed, nextOff1 = localClient.scroll("namedCol", with_payload=True, with_vectors=True)
    recUnnamed, nextOff2 = localClient.scroll("unnamedCol", with_payload=True, with_vectors=True)

    for r in recNamed:
        print("Scroll named", r.id, r.payload, r.vector)

    colN = CollectionPersistence(f"asdf/collection/namedCol")
    for point in colN.load():
        print("CollectionPersistence named", point.id, point.payload, point.vector)

    for r in recUnnamed:
        print("Scroll unnamed", r.id, r.payload, r.vector)

    colU = CollectionPersistence(f"asdf/collection/unnamedCol")
    for point in colU.load():
        print("CollectionPersistence unnamed", point.id, point.payload, point.vector)

testScroll()

Output

Scroll named 1 {'something': 'asdf'} {}
Scroll named 2 {'something': 'asdf'} {}
CollectionPersistence named 1 {'something': 'asdf'} [0.24, 0.18, 0.22, 0.44]
CollectionPersistence named 2 {'something': 'asdf'} [0.54, 1.18, -0.42, 0.94]

Scroll unnamed 1 {'something': 'asdf'} [0.41652607917785645, 0.31239455938339233, 0.3818155825138092, 0.7636311650276184]
Scroll unnamed 2 {'something': 'asdf'} [0.3259880840778351, 0.7123442888259888, -0.25354626774787903, 0.5674607157707214]
CollectionPersistence unnamed 1 {'something': 'asdf'} [0.24, 0.18, 0.22, 0.44]
CollectionPersistence unnamed 2 {'something': 'asdf'} [0.54, 1.18, -0.42, 0.94]
@joein
Member

joein commented Jan 12, 2024

Hi

The first thing I see is that there should have been an error on

    localClient.upsert(
        collection_name="namedCol",
        points=points
    )

because it should not be possible to upsert vectors that were not described when the collection was created.

Regarding the "wrong" records: they are not actually wrong, the values were normalized.
Qdrant normalizes vectors when the distance is set to models.Distance.COSINE.

You can change it to models.Distance.DOT to compare the results.
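As a quick check of the normalization explanation, the sketch below L2-normalizes the two raw vectors from the POC in plain Python. It assumes that "normalized" here means dividing each vector by its Euclidean length; `l2_normalize` is a hypothetical helper written for this illustration, not part of qdrant-client.

```python
import math

def l2_normalize(vector):
    """Scale a vector to unit Euclidean length (assumed cosine preprocessing)."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        return list(vector)  # leave a zero vector unchanged to avoid dividing by zero
    return [x / norm for x in vector]

# Raw vectors from the POC, keyed by point id.
raw_points = {
    1: [0.24, 0.18, 0.22, 0.44],
    2: [0.54, 1.18, -0.42, 0.94],
}

normalized = {point_id: l2_normalize(vec) for point_id, vec in raw_points.items()}

for point_id, vec in normalized.items():
    print(point_id, vec)
```

Up to float32 rounding, the printed values should line up with the "Scroll unnamed" lines in the output above.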

@joein
Member

joein commented Jan 12, 2024

To insert a named vector, your PointStruct should look like:

PointStruct(id=1, vector={"text": [0.24, 0.18, 0.22, 0.44]}, payload={"something": "asdf"}),
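Until the client raises an error itself, a guard like the following could catch the mismatch before calling upsert. This is only a sketch in plain Python: `check_vector` is a hypothetical helper, and the collection config is simplified to either an int (size of a single unnamed vector) or a dict mapping vector names to sizes. It is not how qdrant-client validates points.

```python
def check_vector(vector, vectors_config):
    """Raise ValueError if `vector` does not fit the (simplified) collection config.

    vectors_config: int for a single unnamed vector, or {name: size} for named vectors.
    vector: list of floats for an unnamed vector, or {name: list of floats}.
    """
    named_collection = isinstance(vectors_config, dict)
    named_vector = isinstance(vector, dict)
    if named_collection != named_vector:
        expected = "named vectors" if named_collection else "a single unnamed vector"
        raise ValueError(f"collection expects {expected}")
    if not named_collection:
        # Treat the unnamed case as one vector stored under an empty name.
        vector, vectors_config = {"": vector}, {"": vectors_config}
    for name, values in vector.items():
        if name not in vectors_config:
            raise ValueError(f"vector name {name!r} is not defined for this collection")
        if len(values) != vectors_config[name]:
            raise ValueError(
                f"vector {name!r} has size {len(values)}, expected {vectors_config[name]}"
            )

named_config = {"text": 4}  # mirrors the "namedCol" collection from the POC

# A named vector passes the check.
check_vector({"text": [0.24, 0.18, 0.22, 0.44]}, named_config)

# A bare list, as in the POC, is rejected for a named-vector collection.
try:
    check_vector([0.24, 0.18, 0.22, 0.44], named_config)
    rejected = False
except ValueError as err:
    rejected = True
    print("rejected:", err)
```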

@joein
Member

joein commented Jan 12, 2024

I have updated #432 to address the absence of an exception

@hdmi
Author

hdmi commented Jan 12, 2024

I see, so the protection should be done on the upsert side.

I was scrolling because I want to migrate a local database to a remote one. For the upload I could use upsert or upload_records, and to get the Records I was using scroll.

Now I fear that if Records are normalised before they are returned to me, then when I upload them the data stored in the remote collection will be normalised too, which is a consequence of the source collection's configuration (models.Distance.COSINE).

Is there a way of getting the raw vectors from the scroll records? Or should I read the points through CollectionPersistence and upsert them to the remote server from there?

I am trying to find the most efficient way of pouring one local database into a remote one.

@joein
Member

joein commented Jan 12, 2024

No, there is currently no way to get the raw vectors.

Why do you need raw vectors?

There is no other way to migrate from local mode to remote.

We've implemented it in the migrate method of QdrantClient:
https://github.com/qdrant/qdrant-client/blob/efb876fe3915dc5e2855f60a5617e940c84591e5/qdrant_client/qdrant_client.py#L2224C6-L2224C6
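On the worry above about re-uploading already-normalized vectors, one way to reason about it (an inference, not something stated in this thread) is that L2 normalization is idempotent and cosine similarity does not depend on vector length. The plain-Python sketch below checks both properties on the POC vectors; `l2_normalize` and `cosine_similarity` are hypothetical helpers, not qdrant-client functions.

```python
import math

def l2_normalize(vector):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        return list(vector)  # leave a zero vector unchanged
    return [x / norm for x in vector]

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors of equal size."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

raw_a = [0.24, 0.18, 0.22, 0.44]
raw_b = [0.54, 1.18, -0.42, 0.94]

# Normalizing a second time (as a cosine destination collection would) changes nothing.
once_a = l2_normalize(raw_a)
twice_a = l2_normalize(once_a)
print("idempotent:", all(math.isclose(x, y, abs_tol=1e-12) for x, y in zip(once_a, twice_a)))

# Cosine similarity is the same for raw and normalized inputs.
sim_raw = cosine_similarity(raw_a, raw_b)
sim_normalized = cosine_similarity(once_a, l2_normalize(raw_b))
print("same similarity:", math.isclose(sim_raw, sim_normalized, abs_tol=1e-12))
```

This only covers a destination collection that also uses cosine distance. If the remote collection were configured with a different metric such as dot product, uploading normalized vectors would give different scores than the raw ones.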

@joein
Member

joein commented Jan 19, 2024

Try with qdrant-client==1.7.1.

joein closed this as completed Jan 19, 2024