Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: add schema support for vector indexing in CREATE TABLE #140090

Merged
merged 1 commit into from
Feb 3, 2025

Conversation

andy-kimball
Copy link
Contributor

Support usage of VECTOR INDEX in a CREATE TABLE statement, e.g.:

CREATE TABLE simple (
a INT PRIMARY KEY,
vec1 VECTOR(3),
VECTOR INDEX (vec1)
)

Create the corresponding table and index schema objects for the vector index. Check various error conditions, e.g. that only a column of type VECTOR can be the last column in the index. Add unit and logic tests. CREATE VECTOR INDEX support will come in a future PR.

Epic: CRDB-42943

Release note: None

@andy-kimball andy-kimball requested review from a team as code owners January 30, 2025 03:57
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Collaborator

@DrewKimball DrewKimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! We probably want Foundations to take a look as well before merging, but :lgtm:

Reviewed 38 of 38 files at r1, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @andy-kimball and @rafiss)


pkg/sql/catalog/catformat/index.go line 263 at r1 (raw file):

		// direction. Since the default direction is ASC, we omit the direction
		// entirely for inverted/vector index columns.
		if i < n-1 || index.Type.AllowExplicitDirection() {

Maybe this method should specify that it has to do with the last column only. LastKeyColIsOrdered maybe?


pkg/sql/sem/idxtype/idxtype.go line 67 at r1 (raw file):

// Panics if this is called for a forward index, as "forward" is not a term we
// should be including in error messages, as most users will not understand it.
func ErrorText(t T) string {

Let's make this return redact.SafeString so that it doesn't get redacted.

Copy link
Contributor Author

@andy-kimball andy-kimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @DrewKimball and @rafiss)


pkg/sql/catalog/catformat/index.go line 263 at r1 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

Maybe this method should specify that it has to do with the last column only. LastKeyColIsOrdered maybe?

I changed it to be HasLinearOrdering:

// HasLinearOrdering is true if this index type does not define a linear
// ordering on the last key column in the index. For example, a vector index
// groups nearby vectors, but does not define a linear ordering among them. As
// another example, an inverted index only defines a linear ordering for tokens,
// not for the original JSONB or ARRAY data type.

pkg/sql/sem/idxtype/idxtype.go line 67 at r1 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

Let's make this return redact.SafeString so that it doesn't get redacted.

Done.

Copy link
Collaborator

@rafiss rafiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm! my comments are pretty minor

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andy-kimball and @DrewKimball)


pkg/sql/sem/idxtype/idxtype.go line 70 at r3 (raw file):

//
// Panics if this is called for a forward index, as "forward" is not a term we
// should be including in error messages, as most users will not understand it.

super nit: how did we determine this should be a function of the package, rather than a method with a T receiver?


pkg/sql/sem/idxtype/idxtype.go line 78 at r3 (raw file):

		return "a vector index"
	default:
		panic(errors.AssertionFailedf("ErrorText is not defined for index type %v", t))

is a panic that might crash the node the best we can do here? what if we had this return something like redact.Sprintf("an unknown index type (%v)", ...)

i noticed that some the other switch statements in this PR did not panic in the default case.


pkg/sql/catalog/tabledesc/structured.go line 1098 at r3 (raw file):

	// cannot be indexed.
	var typInfo string
	msg := "the following columns are not indexable due to their type: "

let's craft the message using redact.StringBuilder, otherwise the entire message will get redacted.


pkg/sql/catalog/catformat/index.go line 252 at r3 (raw file):

		}
		// TODO(drewk): we might need to print something like "vector_l2_ops" for
		// vector indexes.

out of curiosity; how can we determine if we need this?

Copy link
Contributor Author

@andy-kimball andy-kimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @DrewKimball and @rafiss)


pkg/sql/catalog/catformat/index.go line 252 at r3 (raw file):

Previously, rafiss (Rafi Shamim) wrote…

out of curiosity; how can we determine if we need this?

I think we'll know more when we add support for other distance functions like cosine and inner product.


pkg/sql/catalog/tabledesc/structured.go line 1098 at r3 (raw file):

Previously, rafiss (Rafi Shamim) wrote…

let's craft the message using redact.StringBuilder, otherwise the entire message will get redacted.

I just eliminated this code entirely. Yes, the error message is a bit better for this case, but it's not being done the same way in the schema changer code, so it was causing failures when I tried to unify the error checking in colinfo.ValidateColumnForIndex.


pkg/sql/sem/idxtype/idxtype.go line 70 at r3 (raw file):

Previously, rafiss (Rafi Shamim) wrote…

super nit: how did we determine this should be a function of the package, rather than a method with a T receiver?

I feel like this is a special-case utility function rather than a core part of what an index type is.


pkg/sql/sem/idxtype/idxtype.go line 78 at r3 (raw file):

Previously, rafiss (Rafi Shamim) wrote…

is a panic that might crash the node the best we can do here? what if we had this return something like redact.Sprintf("an unknown index type (%v)", ...)

i noticed that some the other switch statements in this PR did not panic in the default case.

I just replaced it with the string "an index" - this should never get called anyways. As an aside, what's the right way to get redact.SafeString from redact.RedactableString, or were you suggesting here that we return redact.RedactableString?

@andy-kimball andy-kimball force-pushed the create-table branch 2 times, most recently from d661c24 to 6c0a17d Compare January 31, 2025 14:58
Support usage of VECTOR INDEX in a CREATE TABLE statement, e.g.:

  CREATE TABLE simple (
    a INT PRIMARY KEY,
    vec1 VECTOR(3),
    VECTOR INDEX (vec1)
  )

Create the corresponding table and index schema objects for the vector
index. Check various error conditions, e.g. that only a column of type
VECTOR can be the last column in the index. Add unit and logic tests.
CREATE VECTOR INDEX support will come in a future PR.

Epic: CRDB-42943

Release note: None

Co-authored-by: Drew Kimball <[email protected]>
Copy link
Collaborator

@rafiss rafiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andy-kimball and @DrewKimball)


pkg/sql/sem/idxtype/idxtype.go line 78 at r3 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

I just replaced it with the string "an index" - this should never get called anyways. As an aside, what's the right way to get redact.SafeString from redact.RedactableString, or were you suggesting here that we return redact.RedactableString?

yeah, i was proposing the usage of RedactableString. though i guess under my suggestion, the format arg is safe to, so would have been safe to cast the whole thing to SafeString.

@andy-kimball
Copy link
Contributor Author

bors r+

@andy-kimball
Copy link
Contributor Author

TFTR, @rafiss, @DrewKimball

@craig craig bot merged commit 95d43db into cockroachdb:master Feb 3, 2025
21 of 22 checks passed
@andy-kimball andy-kimball deleted the create-table branch February 4, 2025 00:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants