vecindex: add support for vector index prefix columns #142050
Your pull request contains more than 1000 changes. It is strongly encouraged to split big PRs into smaller chunks. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.
Force-pushed from 7d0bf2d to 8c8ea92.
Reviewed 24 of 24 files at r1, 1 of 1 files at r2, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @mw5h)
Force-pushed from 89eb0af to 4ddffbc.
Reviewed 13 of 13 files at r3, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @mw5h)
bors r=drewkimball
The CREATE VECTOR INDEX syntax allows indexing over multiple columns, as long as the vector column to be indexed is the last column in the index definition. The other "prefix" columns can be used to partition the index by tenants, regions, users, etc. The execution engine encodes prefix columns as a byte slice and passes it as a parameter to vector index operations like Insert and Search. While the index itself treats these bytes as an opaque "TreeKey", the CRDB Store implementation incorporates these prefix bytes into KV keys.
This prefixing mechanism separates the index into distinct K-means trees, each identified by a unique TreeKey. CRDB partitioning can then control where each tree is located. For example, an app that indexes user photo embeddings can keep each user's tree in a region close to that user.
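To illustrate the prefixing idea, here is a minimal Go sketch of how encoded prefix columns could end up in KV keys so that each TreeKey addresses its own tree. It is not the code from this PR: `encodePrefix`, `makeKVKey`, `treeSpanPrefix`, the length-prefixed encoding, and the `/t1/i2/` index prefix are all hypothetical stand-ins, and the real CRDB key encoding and key layout may differ.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// TreeKey is an opaque byte slice that identifies one K-means tree.
// The index itself never interprets these bytes.
type TreeKey []byte

// encodePrefix is a hypothetical stand-in for the execution engine's
// encoding of prefix column values. Each value is length-prefixed so that
// distinct value tuples always produce distinct byte strings.
func encodePrefix(values ...string) TreeKey {
	var buf []byte
	for _, v := range values {
		var n [4]byte
		binary.BigEndian.PutUint32(n[:], uint32(len(v)))
		buf = append(buf, n[:]...)
		buf = append(buf, v...)
	}
	return TreeKey(buf)
}

// treeSpanPrefix returns the bytes shared by every KV key of one tree
// (assumed layout: index prefix followed by the tree key).
func treeSpanPrefix(indexPrefix []byte, treeKey TreeKey) []byte {
	return append(append([]byte{}, indexPrefix...), treeKey...)
}

// makeKVKey is a hypothetical store-side helper: it appends a partition key
// to the tree's span prefix, so all partitions of a tree sort together.
func makeKVKey(indexPrefix []byte, treeKey TreeKey, partitionKey uint64) []byte {
	var pk [8]byte
	binary.BigEndian.PutUint64(pk[:], partitionKey)
	return append(treeSpanPrefix(indexPrefix, treeKey), pk[:]...)
}

func main() {
	indexPrefix := []byte("/t1/i2/") // made-up table/index prefix

	alice := encodePrefix("us-east", "alice")
	bob := encodePrefix("eu-west", "bob")

	aliceKey := makeKVKey(indexPrefix, alice, 1)
	bobKey := makeKVKey(indexPrefix, bob, 1)

	// Keys from the same tree share that tree's span prefix; keys from
	// another tree do not.
	aliceSpan := treeSpanPrefix(indexPrefix, alice)
	fmt.Println("alice key in alice tree:", bytes.HasPrefix(aliceKey, aliceSpan))
	fmt.Println("bob key in alice tree:  ", bytes.HasPrefix(bobKey, aliceSpan))
}
```

Because every key of a tree shares one contiguous key prefix in this sketch, a tree maps to a contiguous key span, which is the property that would let partitioning place individual trees in different regions.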
Epic: CRDB-42943
Release note: None