Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Horreum procedure calc_dataset_view computes wrong data #2201

Closed
lampajr opened this issue Dec 5, 2024 · 0 comments · Fixed by #2202
Closed

Horreum procedure calc_dataset_view computes wrong data #2201

lampajr opened this issue Dec 5, 2024 · 0 comments · Fixed by #2202
Assignees
Labels
area/backend priority/high High priority type/bug Something isn't working

Comments

@lampajr
Copy link
Member

lampajr commented Dec 5, 2024

Describe the bug

It turned out that currently the process of recomputing the label values is quite tricky and it involves some procedure and triggers to compute the dataset_view which is the one showed when filtering datasets.

calculateLabelValues -> insert "label_values" -> call calc_dataset_views

                                                -> triggers dsv_after_lv_insert_func -> insert "view_recalc_queue" -> triggers recalc_dataset_view

As you can see we are triggering both calc_dataset_views and recalc_dataset_view in an unpredictable order.

The problem is that calc_dataset_views is wrong and it can compute wrong data.

CREATE OR REPLACE PROCEDURE calc_dataset_view(datasetId bigint) AS $$
BEGIN
WITH view_agg AS (
SELECT
vc.view_id, vc.id as vcid, array_agg(DISTINCT label.id) as label_ids, jsonb_object_agg(label.name, lv.value) as value FROM dataset_schemas ds
JOIN label ON label.schema_id = ds.schema_id
JOIN viewcomponent vc ON vc.labels ? label.name
JOIN label_values lv ON lv.label_id = label.id
WHERE ds.dataset_id = datasetId
GROUP BY vc.view_id, vcid
)
INSERT INTO dataset_view (dataset_id, view_id, label_ids, value)
SELECT datasetId, view_id, array_agg(DISTINCT label_id), jsonb_object_agg(vcid, value) FROM view_agg, unnest(label_ids) as label_id
GROUP BY view_id;

It is joining label_values without adding the on condition with lv.dataset_id = ds.dataset_id as the recalc_dataset_view is doing.

CREATE OR REPLACE FUNCTION recalc_dataset_view() RETURNS TRIGGER AS $$
BEGIN
PERFORM set_config('horreum.userroles', NEW.roles, true);
DELETE FROM dataset_view WHERE dataset_id = NEW.dataset_id AND (NEW.view_id IS NULL OR NEW.view_id = view_id);
WITH view_agg AS (
SELECT vc.view_id, vc.id as vcid, array_agg(DISTINCT label.id) as label_ids, jsonb_object_agg(label.name, lv.value) as value FROM dataset_schemas ds
JOIN label ON label.schema_id = ds.schema_id
JOIN viewcomponent vc ON vc.labels ? label.name
JOIN label_values lv ON lv.label_id = label.id AND lv.dataset_id = ds.dataset_id
WHERE ds.dataset_id = NEW.dataset_id
AND (NEW.view_id IS NULL OR NEW.view_id = vc.view_id)
AND vc.view_id IN (SELECT view.id FROM view JOIN dataset ON view.test_id = dataset.testid WHERE dataset.id = NEW.dataset_id)
GROUP BY vc.view_id, vcid
) INSERT INTO dataset_view (dataset_id, view_id, label_ids, value)
SELECT NEW.dataset_id, view_id, array_agg(DISTINCT label_id), jsonb_object_agg(vcid, value) FROM view_agg, unnest(label_ids) as label_id
GROUP BY view_id;

To Reproduce

  1. Startup Horreum
  2. Create datasets and some views
  3. Drop trigger from db drop trigger recalc_dataset_view on view_recalc_queue
  4. Force datasets recalculation
  5. In the dataset view filter by label value

Additional Notes

  • Why was it working before? My guess is that it was working by luck.. as the calc_dataset_view was inserting wrong data that was later overridden by recalc_dataset_view
  • Starting from 0.16 we removed the recalculation lock and we might have increased the concurrency level that may not gurantee the previous order (just a guess this)
  • The issue was present also before 0.16, we have been just lucky that we did not experience it
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/backend priority/high High priority type/bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant