Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

opt: inverted-index accelerate filters of the form j->0 @> '{"b": "c"} #96202

Merged
merged 1 commit into from
Feb 2, 2023

Conversation

Shivs11
Copy link
Contributor

@Shivs11 Shivs11 commented Jan 30, 2023

Previously, the optimizer did not plan inverted index scans for filters
having an integer as the index for the fetch value in a filter alongside
the "contains" or the "contained by" operator.

To address this, we now build JSON arrays from fetch value expressions
with integer indexes. From these JSON arrays, inverted spans are built
for constraining scans over inverted indexes. With these changes chains
of both integer and string fetch value operators are now supported
alongside the "contains" and the "contained by" operators.
(e.g., j->0 @> '{"b": "c"}' and j->0 <@ '{"b": "c"}').

Epic: CRDB-3301
Fixes: #94667

Release note (performance improvement): The optimizer now plans
inverted index scans for queries that filter by JSON fetch value
operators (->) with integer indices alongside the "contains" or
the "contained by" operators, e.g, json_col->0 @> '{"b": "c"}'
or json_col->0 <@ '{"b": "c"}'

@Shivs11 Shivs11 requested a review from a team as a code owner January 30, 2023 17:08
@Shivs11 Shivs11 requested a review from yuzefovich January 30, 2023 17:08
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@Shivs11 Shivs11 force-pushed the inverted_index_containment branch from d73363c to a7b8179 Compare January 30, 2023 17:19
@Shivs11 Shivs11 requested review from a team as code owners January 30, 2023 17:19
@Shivs11 Shivs11 requested review from a team, smg260 and renatolabs and removed request for a team January 30, 2023 17:19
@Shivs11 Shivs11 marked this pull request as draft January 30, 2023 18:10
@Shivs11 Shivs11 force-pushed the inverted_index_containment branch from a7b8179 to b3e0323 Compare January 30, 2023 18:17
@Shivs11 Shivs11 marked this pull request as ready for review January 30, 2023 18:18
@Shivs11 Shivs11 requested review from mgartner and rytaft and removed request for a team, smg260, renatolabs and yuzefovich January 30, 2023 18:18
Copy link
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was quick! 🙂

Reviewed 4 of 4 files at r1, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @mgartner and @Shivs11)


-- commits line 20 at r1:
nit: also mention <@ (You've added support for both "contains" and "contained by")


pkg/sql/logictest/testdata/logic_test/inverted_index line 846 at r1 (raw file):

SELECT j FROM f@i WHERE j->0->1 @> '{"a": {"b": []}}'
----
[[{"a": {"b": []}}]]

This doesn't look right, since there is no element at j->0->1. Maybe you need to mark this expression as not tight?


pkg/sql/opt/xform/testdata/rules/select line 2928 at r1 (raw file):


# Generate an inverted scan when the index of the fetch val operator
# is an integer and there is a containment operator with a JSON object

nit: end comment with period (here and below)


pkg/sql/opt/xform/testdata/rules/select line 2972 at r1 (raw file):

# Generate an inverted scan when right side of the equality is an array.
opt expect=GenerateInvertedIndexScans
SELECT k FROM b WHERE j->'a' = '["b"]'

Did you mean to get rid of this test case and the one below?

Copy link
Contributor Author

@Shivs11 Shivs11 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the quick review as well!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @mgartner and @rytaft)


pkg/sql/logictest/testdata/logic_test/inverted_index line 846 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

This doesn't look right, since there is no element at j->0->1. Maybe you need to mark this expression as not tight?

Had a quick question about this - what does it mean by having no element at j->0->1 ? In my understanding, the answer given by the query, [[{"a": {"b": []}}]], satisfies the filter of j->0->1 @> '{"a": {"b": []}}' right?


pkg/sql/opt/xform/testdata/rules/select line 2972 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

Did you mean to get rid of this test case and the one below?

I thought these test cases would be testing filters of the form: j->0 @> '"a"' right?

Copy link
Contributor Author

@Shivs11 Shivs11 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @mgartner and @rytaft)


pkg/sql/opt/xform/testdata/rules/select line 2972 at r1 (raw file):

Previously, Shivs11 (Shivam) wrote…

I thought these test cases would be testing filters of the form: j->0 @> '"a"' right?

Ah, I seem to have misunderstood your question. I noticed that it has been removed, due to some reason.
Shall add them back since I think they are pretty important cases to consider. Thank you for the catch!

Copy link
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @mgartner and @Shivs11)


pkg/sql/logictest/testdata/logic_test/inverted_index line 846 at r1 (raw file):

Previously, Shivs11 (Shivam) wrote…

Had a quick question about this - what does it mean by having no element at j->0->1 ? In my understanding, the answer given by the query, [[{"a": {"b": []}}]], satisfies the filter of j->0->1 @> '{"a": {"b": []}}' right?

I think it satisfies j->0->0 @> '{"a": {"b": []}}'


pkg/sql/opt/xform/testdata/rules/select line 2972 at r1 (raw file):

Previously, Shivs11 (Shivam) wrote…

I thought these test cases would be testing filters of the form: j->0 @> '"a"' right?

The new cases you added are good, but you removed some old ones (this one was j->'a' = '["b"]')

Copy link
Contributor Author

@Shivs11 Shivs11 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @mgartner and @rytaft)


pkg/sql/logictest/testdata/logic_test/inverted_index line 846 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

I think it satisfies j->0->0 @> '{"a": {"b": []}}'

Oh yeah, my apologies it is indeed j->0->0 and not j->0->1. So in an ideal setting, the output of this query should be no rows right? The fact that this outputs an invalid row is concerning....


pkg/sql/opt/xform/testdata/rules/select line 2928 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

nit: end comment with period (here and below)

Done!


pkg/sql/opt/xform/testdata/rules/select line 2972 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

The new cases you added are good, but you removed some old ones (this one was j->'a' = '["b"]')

Yep, I see the issue now. I have added them back and shall be pushing them right now. Thank you and sorry!

@Shivs11 Shivs11 force-pushed the inverted_index_containment branch from b3e0323 to 208fba2 Compare January 30, 2023 20:33
Copy link
Contributor Author

@Shivs11 Shivs11 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @mgartner and @rytaft)


-- commits line 20 at r1:

Previously, rytaft (Rebecca Taft) wrote…

nit: also mention <@ (You've added support for both "contains" and "contained by")

Done!

Copy link
Contributor Author

@Shivs11 Shivs11 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @mgartner and @rytaft)

a discussion (no related file):
I just realized that I have forgotten to add code which shall set the generated inverted_expr to tight if keys are of the form of DInt. Thank you @rytaft for giving me that hint when a test case failed in one of my earlier submissions. Shall be adding code to now account for that!


@Shivs11 Shivs11 force-pushed the inverted_index_containment branch from 208fba2 to 7ce0bca Compare January 30, 2023 22:27
Copy link
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: just one minor nit below. Nice work!

Reviewed 4 of 4 files at r3, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @mgartner, @rytaft, and @Shivs11)


pkg/sql/logictest/testdata/logic_test/inverted_index line 846 at r1 (raw file):

Previously, Shivs11 (Shivam) wrote…

Oh yeah, my apologies it is indeed j->0->0 and not j->0->1. So in an ideal setting, the output of this query should be no rows right? The fact that this outputs an invalid row is concerning....

Looks correct now


pkg/sql/opt/invertedidx/json_array_test.go line 603 at r3 (raw file):

			tight:            false,
			unique:           true,
			remainingFilters: `j->0 @> '[1,2]'`,

This is a duplicate test case

Copy link
Contributor Author

@Shivs11 Shivs11 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @mgartner and @rytaft)


pkg/sql/opt/invertedidx/json_array_test.go line 603 at r3 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

This is a duplicate test case

Removed it in the latest push! Thank you!

@Shivs11 Shivs11 force-pushed the inverted_index_containment branch from 7ce0bca to 6a3c3e3 Compare January 31, 2023 15:17
Copy link
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r4, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @mgartner)

Copy link
Collaborator

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work! :lgtm:

Reviewed 1 of 4 files at r1, 3 of 4 files at r3, 1 of 1 files at r4, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @Shivs11)


pkg/sql/opt/invertedidx/json_array_test.go line 566 at r4 (raw file):

		},
		{
			filters:          "j->0 @> '[1,2]'",

nit: A single element array might be a good test, like `j->0 @> '[1]'.

@Shivs11
Copy link
Contributor Author

Shivs11 commented Jan 31, 2023

bors r+

@craig
Copy link
Contributor

craig bot commented Jan 31, 2023

Build failed (retrying...):

@renatolabs
Copy link
Contributor

bors r-

See failures in Essential CI.

@craig
Copy link
Contributor

craig bot commented Jan 31, 2023

Canceled.

Previously, the optimizer did not plan inverted index scans for filters
having an integer as the index for the fetch value in a filter alongside
the "contains" or the "contained by" operator.

To address this, we now build JSON arrays from fetch value expressions
with integer indexes. From these JSON arrays, inverted spans are built
for constraining scans over inverted indexes. With these changes chains
of both integer and string fetch value operators are now supported
alongside the "contains" and the "contained by" operators.
(e.g., j->0 @> '{"b": "c"}' and j->0 <@ '{"b": "c"}').

Epic: CRDB-3301
Fixes: cockroachdb#94667

Release note (performance improvement): The optimizer now plans
inverted index scans for queries that filter by JSON fetch value
operators (->) with integer indices alongside the "contains" or
the "contained by" operators, e.g, json_col->0 @> '{"b": "c"}'
or json_col->0 <@ '{"b": "c"}'
@Shivs11 Shivs11 force-pushed the inverted_index_containment branch from 6a3c3e3 to 3ed8616 Compare February 1, 2023 17:54
@Shivs11
Copy link
Contributor Author

Shivs11 commented Feb 1, 2023

bors r+

craig bot pushed a commit that referenced this pull request Feb 1, 2023
95622: backupccl,storage: add logs around manifest handling and ExportRequest pagination r=stevendanna a=adityamaru

backupccl: add logging to backup manifest handling

Release note: None

storage: log the ExportRequest pagination reason

Release note: None

Epic: None

95865: cmd/roachtest: adapt disk-stall detection roachtest r=nicktrav,erikgrinaker a=jbowens

Move the existing disk-stall/* roachtests under disk-stall/fuse/* (for the FUSE
filesystem approach to stalling) and skip them for now. Currently, they're not
capable of stalling the disk longer 50us (see #95886), which makes them
unreliable at exercising stalls.

Add two new roachtests, disk-stall/dmsetup and disk-stall/cgroup that use
dmsetup and cgroup bandwidth restrctions respectively to reliably induce a
write stall for an indefinite duration.

Informs #94373.
Epic: None
Release note: None

95999: multitenant: add multitenant/shared-process/basic roachtest r=stevendanna a=msbutler

This patch introduces a simple roachtest that runs in a shared-process tenant.
This test imports a 500 tpcc workload (about 30 GB of replicated data), and
runs the workload for 10 minutes. The test is run on a 4 node, 4vcpu cluster
with local ssds.

A future patch could complicate the test by running schema changes or other
bulk operations.

Fixes #95990

Release note: None

96115: schemachanger: Implement `DROP CONSTRAINT` in declarative schema changer r=Xiang-Gu a=Xiang-Gu

This PR implements `ALTER TABLE t DROP CONSTRAINT cons_name` in declarative schema changer.

Supported constraints include Checks, FK, and UniqueWithoutIndex.

Dropping PK or Unique constraints will fall back to legacy schema changer, which in turn spits out an "not supported yet" error.

Epic: None

96202: opt: inverted-index accelerate filters of the form j->0 @> '{"b": "c"} r=Shivs11 a=Shivs11

Previously, the optimizer did not plan inverted index scans for filters
having an integer as the index for the fetch value in a filter alongside
the "contains" or the "contained by" operator.

To address this, we now build JSON arrays from fetch value expressions
with integer indexes. From these JSON arrays, inverted spans are built
for constraining scans over inverted indexes. With these changes chains
of both integer and string fetch value operators are now supported
alongside the "contains" and the "contained by" operators.
(e.g., j->0 `@>` '{"b": "c"}' and j->0 <@ '{"b": "c"}').

Epic: [CRDB-3301](https://cockroachlabs.atlassian.net/browse/CRDB-3301)
Fixes: #94667

Release note (performance improvement): The optimizer now plans
inverted index scans for queries that filter by JSON fetch value
operators (->) with integer indices alongside the "contains" or
the "contained by" operators, e.g, json_col->0 `@>` '{"b": "c"}'
or json_col->0 <@ '{"b": "c"}'

96235: sem/tree: add support for producing vectorized data from strings r=cucaroach a=cucaroach

tree.ValueHandler exposes raw machine type hooks that are used by
vec_handler to build coldata.Vec's.

Epic: CRDB-18892
Informs: #91831
Release note: None


96328: udf: allow strict UDF with no arguments r=DrewKimball a=DrewKimball

This patch fixes the case when a strict UDF (returns null on null input) has no arguments. Previously, attempting to call such a function would result in `ERROR: reflect: call of reflect.Value.Pointer on zero Value`.

Fixes #96326

Release note: None

96366: release: skip nil GitHub events r=celiala a=rail

Previously, we referenced `*event.Event`, but in some cases the event objects are `nil`.

This PR skips the nil GitHub event objects.

Epic: none
Release note: None

Co-authored-by: adityamaru <[email protected]>
Co-authored-by: Jackson Owens <[email protected]>
Co-authored-by: Michael Butler <[email protected]>
Co-authored-by: Xiang Gu <[email protected]>
Co-authored-by: Shivam Saraf <[email protected]>
Co-authored-by: Tommy Reilly <[email protected]>
Co-authored-by: Drew Kimball <[email protected]>
Co-authored-by: Rail Aliiev <[email protected]>
@craig
Copy link
Contributor

craig bot commented Feb 1, 2023

Build failed (retrying...):

@craig
Copy link
Contributor

craig bot commented Feb 2, 2023

Build succeeded:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

opt: inverted-index accelerate filters of the form j->0 @> '{"b": "c"}
5 participants