
vecindex: add support for vector index prefix columns #142050

Merged: 1 commit, Mar 1, 2025


andy-kimball
Contributor

The CREATE VECTOR INDEX syntax allows indexing over multiple columns, as long as the vector column to be indexed is the last column in the index definition. The other "prefix" columns can be used to partition the index by tenants, regions, users, etc. The execution engine encodes prefix columns as a byte slice and passes it as a parameter to vector index operations like Insert and Search. While the index itself treats these bytes as an opaque "TreeKey", the CRDB Store implementation incorporates these prefix bytes into KV keys.

This prefixing mechanism has the effect of separating the index into distinct K-means trees, each identified by a unique TreeKey. CRDB partitioning can then control where those trees are located; e.g. an app could store each user's indexed photo embeddings in a region close to that user.
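The per-TreeKey separation can be sketched as a map from TreeKey to an independent tree. This is a hypothetical in-memory model: `prefixedIndex` and its `Insert`/`Search` methods are illustrative and are not the vecindex API, and each "tree" is a flat list searched exhaustively rather than a real K-means tree. It only demonstrates the isolation property, namely that a search under one TreeKey never visits vectors stored under another.

```go
package main

import (
	"fmt"
	"math"
)

// TreeKey is a hypothetical stand-in for the encoded prefix column bytes.
type TreeKey []byte

// tree stands in for one K-means tree. For brevity it is a flat list of
// vectors searched exhaustively; a real vector index would partition them.
type tree struct {
	ids  []string
	vecs [][]float32
}

// prefixedIndex keeps one independent tree per distinct TreeKey.
type prefixedIndex struct {
	trees map[string]*tree
}

func newPrefixedIndex() *prefixedIndex {
	return &prefixedIndex{trees: make(map[string]*tree)}
}

// Insert adds a vector to the tree for treeKey, creating the tree on first use.
func (idx *prefixedIndex) Insert(treeKey TreeKey, id string, vec []float32) {
	t, ok := idx.trees[string(treeKey)]
	if !ok {
		t = &tree{}
		idx.trees[string(treeKey)] = t
	}
	t.ids = append(t.ids, id)
	t.vecs = append(t.vecs, vec)
}

// Search returns the ID of the closest vector (squared Euclidean distance),
// looking only inside the tree for treeKey. It assumes the query has the
// same dimension as the stored vectors.
func (idx *prefixedIndex) Search(treeKey TreeKey, query []float32) (string, bool) {
	t, ok := idx.trees[string(treeKey)]
	if !ok || len(t.ids) == 0 {
		return "", false
	}
	best, bestDist := "", math.Inf(1)
	for i, v := range t.vecs {
		var d float64
		for j := range v {
			diff := float64(v[j] - query[j])
			d += diff * diff
		}
		if d < bestDist {
			best, bestDist = t.ids[i], d
		}
	}
	return best, true
}

func main() {
	idx := newPrefixedIndex()
	idx.Insert(TreeKey("user-1"), "photo-a", []float32{0, 0})
	idx.Insert(TreeKey("user-2"), "photo-b", []float32{1, 1})
	// photo-b matches the query exactly, but it lives in user-2's tree.
	id, _ := idx.Search(TreeKey("user-1"), []float32{1, 1})
	fmt.Println(id) // prints photo-a
}
```

Since each tree is addressed only by its TreeKey, placing a tree elsewhere (for example, via partitioning on the prefix columns) does not change how searches within it behave.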

Epic: CRDB-42943

Release note: None


blathers-crl bot commented Feb 26, 2025

Your pull request contains more than 1000 changes. It is strongly encouraged to split big PRs into smaller chunks.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@cockroach-teamcity
Member

This change is Reviewable

@andy-kimball andy-kimball force-pushed the prefix branch 3 times, most recently from 7d0bf2d to 8c8ea92, February 28, 2025 00:23
Collaborator

@DrewKimball DrewKimball left a comment


:lgtm:

Reviewed 24 of 24 files at r1, 1 of 1 files at r2, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @mw5h)

@andy-kimball andy-kimball force-pushed the prefix branch 4 times, most recently from 89eb0af to 4ddffbc, March 1, 2025 07:36
@andy-kimball andy-kimball requested a review from a team as a code owner March 1, 2025 07:36
Collaborator

@DrewKimball DrewKimball left a comment


Reviewed 13 of 13 files at r3, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @mw5h)

@andy-kimball
Contributor Author

bors r=drewkimball

@craig
Contributor

craig bot commented Mar 1, 2025

@craig craig bot merged commit e527f6b into cockroachdb:master Mar 1, 2025
24 checks passed
@andy-kimball andy-kimball deleted the prefix branch March 1, 2025 18:01