Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update columns' order in TokenIndexEntity multi-column index #2729

Merged
merged 3 commits into from
Nov 27, 2024

Conversation

LZRS
Copy link
Collaborator

@LZRS LZRS commented Nov 20, 2024

IMPORTANT: All PRs must be linked to an issue (except for extremely trivial and straightforward changes).

Description
Updates the order of columns in the multi-column index used in the TokenIndexEntity entity table to have index_value as the leftmost column.
Also removes the index_system column from the index since it would never get used because the queries generated use the IFNULL statement that forces evaluation from the actual rows.

With an example query

SELECT a.*
FROM ResourceEntity a
WHERE a.resourceType = 'Task'
  AND a.resourceUuid IN (SELECT resourceUuid
                         FROM TokenIndexEntity
                         WHERE resourceType = 'Task'
                           AND index_name = 'status'
                           AND (
                             (index_value = 'ready' AND IFNULL(index_system, '') = 'http://hl7.org/fhir/task-status') OR
                             (index_value = 'in-progress' AND
                              IFNULL(index_system, '') = 'http://hl7.org/fhir/task-status')));

Previous query plan, was

QUERY PLAN
|--SEARCH a USING INDEX index_ResourceEntity_resourceType_resourceId (resourceType=?)
`--LIST SUBQUERY 1
   `--SEARCH TokenIndexEntity USING COVERING INDEX index_TokenIndexEntity_resourceType_index_name_index_system_index_value_resourceUuid (resourceType=? AND index_name=?)

The updated query plan would be

QUERY PLAN
|--SEARCH a USING INDEX index_ResourceEntity_resourceType_resourceId (resourceType=?)
`--LIST SUBQUERY 1
   `--MULTI-INDEX OR
      |--INDEX 1
      |  `--SEARCH TokenIndexEntity USING INDEX index_TokenIndexEntity_index_value_resourceType_index_name_resourceUuid (index_value=? AND resourceType=? AND index_name=?)
      `--INDEX 2
         `--SEARCH TokenIndexEntity USING INDEX index_TokenIndexEntity_index_value_resourceType_index_name_resourceUuid (index_value=? AND resourceType=? AND index_name=?) 

Type
Feature

Checklist

  • I have read and acknowledged the Code of conduct.
  • I have read the Contributing page.
  • I have signed the Google Individual CLA, or I am covered by my company's Corporate CLA.
  • I have discussed my proposed solution with code owners in the linked issue(s) and we have agreed upon the general approach.
  • I have run ./gradlew spotlessApply and ./gradlew spotlessCheck to check my code follows the style guide of this project.
  • I have run ./gradlew check and ./gradlew connectedCheck to test my changes locally.
  • I have built and run the demo app(s) to verify my change fixes the issue and/or does not break the demo app(s).

@LZRS
Copy link
Collaborator Author

LZRS commented Nov 20, 2024

Running query

SELECT a.*
FROM ResourceEntity a
WHERE a.resourceType = 'Task'
  AND a.resourceUuid IN (SELECT resourceUuid
                         FROM TokenIndexEntity
                         WHERE resourceType = 'Task'
                           AND index_name = 'status'
                           AND (
                             (index_value = 'ready' AND IFNULL(index_system, '') = 'http://hl7.org/fhir/task-status') OR
                             (index_value = 'in-progress' AND
                              IFNULL(index_system, '') = 'http://hl7.org/fhir/task-status')));

in a database with

  • 125206 ResourceEntity rows
  • 109654 Task ResourceEntity rows
  • 1223741 TokenIndexEntity rows
  • 999355 Task TokenIndexEntity rows
  • 54753 Task index_namestatus TokenIndexEntity rows

took 0.749 seconds vs 2.354 seconds from previous index

@LZRS LZRS marked this pull request as ready for review November 20, 2024 18:11
@LZRS LZRS requested a review from a team as a code owner November 20, 2024 18:11
@LZRS LZRS requested a review from ktarasenko November 20, 2024 18:11
@jingtang10
Copy link
Collaborator

hey what if you change this line:

"index_value = ? ${ if (it.uri.isNullOrBlank()) "" else "AND IFNULL(index_system,'') = ?" }",

to:

"${ if (it.uri.isNullOrBlank()) "" else "IFNULL(index_system,'') = ? AND " }index_value = ?",

wouldn't that save you from changing the index?

also - i'm not sure if we should allow empty system at all? shouldn't we always look for a code system? so that even if it's empty we should be checking taht it's an empty code system. that way, even if your query doesn't include a code system we'll still be hitting the index.

@LZRS
Copy link
Collaborator Author

LZRS commented Nov 21, 2024

hey what if you change this line:

"index_value = ? ${ if (it.uri.isNullOrBlank()) "" else "AND IFNULL(index_system,'') = ?" }",

to:

"${ if (it.uri.isNullOrBlank()) "" else "IFNULL(index_system,'') = ? AND " }index_value = ?",

wouldn't that save you from changing the index?

also - i'm not sure if we should allow empty system at all? shouldn't we always look for a code system? so that even if it's empty we should be checking taht it's an empty code system. that way, even if your query doesn't include a code system we'll still be hitting the index.

Changing that line the query, would probably look like this

SELECT a.*
FROM ResourceEntity a
WHERE a.resourceType = 'Task'
  AND a.resourceUuid IN (SELECT resourceUuid
                         FROM TokenIndexEntity
                         WHERE resourceType = 'Task'
                           AND index_name = 'status'
                           AND (
                             (IFNULL(index_system, '') = 'http://hl7.org/fhir/task-status' AND index_value = 'ready') OR
                             (IFNULL(index_system, '') = 'http://hl7.org/fhir/task-status' AND index_value = 'in-progress')));

whereby the query plan still doesn't change

QUERY PLAN
|--SEARCH a USING INDEX index_ResourceEntity_resourceType_resourceId (resourceType=?)
`--LIST SUBQUERY 1
   `--SEARCH TokenIndexEntity USING COVERING INDEX index_TokenIndexEntity_resourceType_index_name_index_system_index_value_resourceUuid (resourceType=? AND index_name=?)

The order of columns in the query doesn't need to match the way they've been defined in the index. I think it checks that leftmost column of an index exists within the WHERE constraints, and if there is an conflict it checks for the next column in the index...that is until, it either encounters a inequality comparison, a missing column constraint, or a column with function evaluation unless the indexed column is a function evaluation. To still have the index_system as part of the index, we can probably define it in the index as IFNULL(index_system, '')

Index(value = ["index_value", "resourceType", "index_name", "IFNULL(index_system, '')" "resourceUuid"]),

For queries that wouldn't have index_system, it would be a little in-efficient since it won't return resourceUuid directly from the covering index

I think it should still be okay to search without the system in the query. Assuming for the above query, it should still be okay to search

SELECT a.*
FROM ResourceEntity a
WHERE a.resourceType = 'Task'
  AND a.resourceUuid IN (SELECT resourceUuid
                         FROM TokenIndexEntity
                         WHERE resourceType = 'Task'
                           AND index_name = 'status'
                           AND (index_value = 'ready'  OR index_value = 'in-progress'));

to cover for usecases that may require searching using index_sytem without index_value
@jingtang10 jingtang10 merged commit f684adb into google:master Nov 27, 2024
6 checks passed
@jingtang10 jingtang10 deleted the optimize-token-index branch November 28, 2024 09:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Complete
Development

Successfully merging this pull request may close these issues.

3 participants