Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Search method takes more time when load 7k household list #2040

Closed
mahmud6390 opened this issue Jun 15, 2023 · 13 comments · Fixed by #2063
Closed

Search method takes more time when load 7k household list #2040

mahmud6390 opened this issue Jun 15, 2023 · 13 comments · Fixed by #2063
Assignees

Comments

@mahmud6390
Copy link

mahmud6390 commented Jun 15, 2023

Uploaded 7k household to FHIR server with basic info like name and id. After initial syncing done it's take more time to load at pagination or search by name. It take 2 minute to load all household at SDK.
Like a query below
SELECT a.serializedResource FROM ResourceEntity a LEFT JOIN DateIndexEntity b ON a.resourceType = b.resourceType AND a.resourceUuid = b.resourceUuid AND b.index_name = '_lastUpdated' LEFT JOIN DateTimeIndexEntity c ON a.resourceType = c.resourceType AND a.resourceUuid = c.resourceUuid AND c.index_name = '_lastUpdated' WHERE a.resourceType = 'Group' AND a.resourceUuid IN ( SELECT resourceUuid FROM TokenIndexEntity WHERE resourceType = 'Group' AND index_name = 'active' AND index_value = 'true' ) AND a.resourceUuid IN ( SELECT resourceUuid FROM TokenIndexEntity WHERE resourceType = 'Group' AND index_name = 'type' AND (index_value = 'person' AND IFNULL(index_system,'') = 'http://hl7.org/fhir/group-type') ) AND a.resourceUuid IN ( SELECT resourceUuid FROM TokenIndexEntity WHERE resourceType = 'Group' AND index_name = 'code' AND (index_value = '35359004' AND IFNULL(index_system,'') = 'https://www.snomed.org') ) ORDER BY b.index_from DESC, c.index_from DESC LIMIT 10 OFFSET 30
when using ORDER BY it's take 50 sec per page or getting search result after search by name.

Device configuration: samsung galaxy tab A 8 enc(SM-T295), ram 2 GB, rom 16 GB.

Would you like to work on the issue?
Yes, we want to work in this issue .We'll need to add order by with household added datetimewise, household serial number,lastvisited timewise. so how can we implement this any tips for this

@mahmud6390 mahmud6390 changed the title Load balancing to optimize query Search method takes more time when load 7k household list Jun 15, 2023
@jingtang10 jingtang10 self-assigned this Jun 15, 2023
@jingtang10
Copy link
Collaborator

can you please add your search api call here as well?

@ellykits
Copy link
Collaborator

@mahmud6390 Is this based on the FHIRCore code? Refer to RegisterRepository#L91 to access the Search API code responsible for generating the query.

@mahmud6390
Copy link
Author

@jingtang10 time consuming whenever we have added sort(DateClientParam("_lastUpdated"), Order.DESCENDING) at fhirEngine .search method
testing code as below to filter household

fhirEngine
     .search<Group> {
       if (nameQuery.isNotEmpty()) {
         filter(
           StringClientParam(SP_NAME),
           {
             modifier = StringFilterModifier.CONTAINS
             value = nameQuery
           }
         )
       }
       filterFamily(this)
       sort(DateClientParam("_lastUpdated"), Order.DESCENDING)
       //count = 100
       from = 0
     }
     .mapIndexed { index, fhirGroup -> fhirGroup.toFamilyItem(index + 1) }
     .let { families.addAll(it) }

@mahmud6390
Copy link
Author

mahmud6390 commented Jun 21, 2023

` private fun filterFamily(search: Search) {
search.filter(
TokenClientParam("type"),
init = arrayOf(
{
val coding = Coding("http://hl7.org/fhir/group-type", "person", "")
value = of(coding)
},
),
)

search.filter(
  TokenClientParam("code"),
  init = arrayOf(
    {
      val coding = Coding("https://www.snomed.org", "35359004", "")
      value = of(coding)
    },
  ),
)

}`

@ellykits
Copy link
Collaborator

@jingtang10 We also shared some feedback on some of the query performance issues on FHIRCore. Refer to the latest comments on this issue for more info opensrp/fhircore#2066

@google google deleted a comment from mahmud6390 Jun 23, 2023
@omarismail94
Copy link
Contributor

@mahmud6390 I deleted your comment as it contained sensitive info. Just so we don't loose what was said, the DB size was 21MB

@google google deleted a comment from mahmud6390 Jun 23, 2023
@google google deleted a comment from aditya-07 Jun 23, 2023
@omarismail94
Copy link
Contributor

omarismail94 commented Jun 23, 2023

I can replicate the issue on my machine and can confirm the query is slow. The issue seems to be from the double left join we do when create the search query here when the sort is done on a DateClientParam type. If we are able to remove one of the left joins, the query runs much faster, e.g.:

SELECT a.serializedResource 
FROM ResourceEntity a 
LEFT JOIN DateIndexEntity b
ON a.resourceType = b.resourceType 
AND a.resourceUuid = b.resourceUuid 
AND b.index_name = '_lastUpdated' 
WHERE a.resourceType = 'Group' 
AND a.resourceUuid IN (
   SELECT resourceUuid 
   FROM TokenIndexEntity 
   WHERE resourceType = 'Group' 
   AND index_name = 'type' 
   AND (index_value = 'person' 
   AND IFNULL(index_system,'') = 'http://hl7.org/fhir/group-type') 
) 
AND a.resourceUuid IN (
   SELECT resourceUuid 
   FROM TokenIndexEntity 
   WHERE resourceType = 'Group' 
   AND index_name = 'code' 
   AND (index_value = '35359004' AND IFNULL(index_system,'') = 'https://www.snomed.org') ) 
ORDER BY b.index_from DESC
LIMIT 10 
OFFSET 30

@omarismail94
Copy link
Contributor

Actually, my previous analysis was wrong. The only reason the left join in the above query runs faster is that the DateIndexEntity is empty! I replaced the DateIndexEntity with DateTimeIndexEntity and the query was indeed slow.

I do have a fix though! Please see PR #2047

@omarismail94
Copy link
Contributor

omarismail94 commented Jun 26, 2023

Adding the query plan from having the new index:

SEARCH TABLE ResourceEntity AS a USING INDEX index_ResourceEntity_resourceType_resourceId (resourceType=?)
LIST SUBQUERY 1
SEARCH TABLE TokenIndexEntity USING INDEX index_TokenIndexEntity_resourceType_index_name_index_system_index_value (resourceType=? AND index_name=?)
LIST SUBQUERY 2
SEARCH TABLE TokenIndexEntity USING INDEX index_TokenIndexEntity_resourceType_index_name_index_system_index_value (resourceType=? AND index_name=?)
SEARCH TABLE DateIndexEntity AS b USING INDEX index_DateIndexEntity_resourceType_resourceUuid_index_name (resourceType=? AND resourceUuid=? AND index_name=?)
SEARCH TABLE DateTimeIndexEntity AS c USING INDEX index_DateTimeIndexEntity_resourceType_resourceUuid_index_name (resourceType=? AND resourceUuid=? AND index_name=?)
USE TEMP B-TREE FOR ORDER BY

we can see it uses our new index to search for the DateTimeIndexEntity table

@omarismail94
Copy link
Contributor

and here is the query plan from having the existing slow way:

SEARCH TABLE ResourceEntity AS a USING INDEX index_ResourceEntity_resourceType_resourceId (resourceType=?)
LIST SUBQUERY 1
SEARCH TABLE TokenIndexEntity USING INDEX index_TokenIndexEntity_resourceType_index_name_index_system_index_value (resourceType=? AND index_name=?)
LIST SUBQUERY 2
SEARCH TABLE TokenIndexEntity USING INDEX index_TokenIndexEntity_resourceType_index_name_index_system_index_value (resourceType=? AND index_name=?)
SEARCH TABLE DateIndexEntity AS b USING INDEX index_DateIndexEntity_resourceType_index_name_index_from_index_to (resourceType=? AND index_name=?)
SEARCH TABLE DateTimeIndexEntity AS c USING INDEX index_DateTimeIndexEntity_resourceType_index_name_index_from_index_to (resourceType=? AND index_name=?)
USE TEMP B-TREE FOR ORDER BY

In the query plan for the faster query, the index used is index_DateTimeIndexEntity_resourceType_resourceUuid_index_name, and in the slower one the index used is index_DateTimeIndexEntity_resourceType_index_name_index_from_index_to.

Looking at the data in DateTimeIndexEntity, I am assuming the faster one is due to the columns, resourceType and index_name, having lower cardinality than index_from and index_to

@omarismail94
Copy link
Contributor

I pushed a new commit to the PR to further optimise the query plan:

SEARCH a USING INDEX index_ResourceEntity_resourceType_resourceId (resourceType=?)
LIST SUBQUERY 1
SEARCH TokenIndexEntity USING INDEX index_TokenIndexEntity_resourceType_index_name_index_system_index_value (resourceType=? AND index_name=?)
LIST SUBQUERY 2
SEARCH TokenIndexEntity USING INDEX index_TokenIndexEntity_resourceType_index_name_index_system_index_value (resourceType=? AND index_name=?)
SEARCH b USING COVERING INDEX index_DateIndexEntity_resourceType_index_name_resourceUuid_index_from (resourceType=? AND index_name=? AND resourceUuid=?) LEFT-JOIN
SEARCH c USING COVERING INDEX index_DateTimeIndexEntity_resourceType_index_name_resourceUuid_index_from (resourceType=? AND index_name=? AND resourceUuid=?) LEFT-JOIN
USE TEMP B-TREE FOR ORDER BY

with these new commit, we are USING COVERING INDEX, which means avoiding having to reference the original table, and instead the DB just uses the indexes table

@mahmud6390
Copy link
Author

after merge PR #2047 now it's working smoothly like 50sec to 2 sec per page or search

@omarismail94
Copy link
Contributor

Thanks @mahmud6390 , great to hear it works!! We will add the PR to our main branch after review

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Complete
5 participants