Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Slow performance adding new Schema. #1577

Closed
whitingjr opened this issue Apr 5, 2024 · 3 comments
Closed

Slow performance adding new Schema. #1577

whitingjr opened this issue Apr 5, 2024 · 3 comments
Assignees
Labels
area/backend priority/hold An issuee that we need to look at - but later

Comments

@whitingjr
Copy link
Collaborator

Describe the bug

UI responsiveness is less than expected when adding a new Schema.

To Reproduce

Add a new Schema. Takes around 60 seconds which is considered too long.

Version

0.12.1

Java

17

PostgreSQL

16
@lampajr
Copy link
Member

lampajr commented Apr 9, 2024

A bit of consideration as a result of several discussions on the subject.

Root cause

The root cause has been identified in the query

protected static final String FIND_RUNS_WITH_URI = """
SELECT id, testid
FROM run
WHERE NOT trashed
AND (data->>'$schema' = ?1
OR (CASE
WHEN jsonb_typeof(data) = 'object' THEN ?1 IN (SELECT values.value->>'$schema' FROM jsonb_each(data) as values)
WHEN jsonb_typeof(data) = 'array' THEN ?1 IN (SELECT jsonb_array_elements(data)->>'$schema')
ELSE false
END)
OR (metadata IS NOT NULL AND ?1 IN (SELECT jsonb_array_elements(metadata)->>'$schema'))
)
""";

Query that is performed during the creation or update of a Schema object. Its goal is to identify all possible runs that are referencing the updated/created schema (it is possible to upload a run referencing a schema that does not exist yet), then for each run (I) update the run_schemas table; (II) delete validation errors and (III) trigger async run recalculation.

The issue causing this slowness is that the query performs a full (sequential) scan over all the runs analyzing every jsonb data object looking for the $schema key.

                                                                                                                                                                       QUERY PLAN                                                                                                                                                                        
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on run  (cost=0.00..103758.07 rows=7746 width=8) (actual time=37775.492..37775.493 rows=0 loops=1)
   Filter: ((NOT trashed) AND (can_view(access, owner, token) OR has_role('horreum.system'::text)) AND (((data ->> '$schema'::text) = 'urn:achme:0.1'::text) OR CASE WHEN (jsonb_typeof(data) = 'object'::text) THEN (SubPlan 1) WHEN (jsonb_typeof(data) = 'array'::text) THEN (SubPlan 2) ELSE false END OR ((metadata IS NOT NULL) AND (SubPlan 3))))
   Rows Removed by Filter: 28414
   SubPlan 1
     ->  Function Scan on jsonb_each "values"  (cost=0.00..1.25 rows=100 width=32) (actual time=0.653..0.785 rows=6 loops=25827)
   SubPlan 2
     ->  Result  (cost=0.00..2.02 rows=100 width=32) (actual time=7.914..8.241 rows=2 loops=95)
           ->  ProjectSet  (cost=0.00..0.52 rows=100 width=32) (actual time=7.912..8.239 rows=2 loops=95)
                 ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.000..0.000 rows=1 loops=95)
   SubPlan 3
     ->  Result  (cost=0.00..2.02 rows=100 width=32) (actual time=0.001..0.001 rows=1 loops=6345)
           ->  ProjectSet  (cost=0.00..0.52 rows=100 width=32) (actual time=0.001..0.001 rows=1 loops=6345)
                 ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.000..0.000 rows=1 loops=6345)
 Planning Time: 0.470 ms
 Execution Time: 37775.565 ms
(15 rows)

Fix proposal

Given that, in order to fix this issue, we might refactor how run_schemas table is populated and then change the query to avoid full scan over all jsonb data of all runs but rather use the run_schemas table as single source of truth for run<-->schemas association, we decided to proceed in two steps:

@whitingjr whitingjr assigned lampajr and unassigned whitingjr Apr 9, 2024
@johnaohara johnaohara added the priority/hold An issuee that we need to look at - but later label May 9, 2024
@johnaohara
Copy link
Member

On Hold due to #1603

@lampajr
Copy link
Member

lampajr commented Oct 14, 2024

Given that #1580 has been closed as not planned and #1579 has been already fixed we can close this one as completed.

@lampajr lampajr closed this as completed Oct 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/backend priority/hold An issuee that we need to look at - but later
Projects
None yet
Development

No branches or pull requests

3 participants