Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove aggregate tables (fixes #4869) #5407

Draft
wants to merge 45 commits into
base: main
Choose a base branch
from
Draft

Conversation

Nutomic
Copy link
Member

@Nutomic Nutomic commented Feb 7, 2025

Removed the tables comment_aggregates, post_aggregates, community_aggregates and person_aggregates. These columns are merged into the respective main table to reduce complexity and likely improve performance.

@Nutomic Nutomic changed the title Remove aggregate tables Remove aggregate tables (fixes #4869) Feb 7, 2025
@Nutomic Nutomic force-pushed the remove-aggregate-tables branch from 8de26ab to a88845e Compare February 10, 2025 14:21
AND (diff.upvotes, diff.downvotes) != (0, 0)
RETURNING
r.creator_id_from_thing_aggregates (a.*) AS creator_id, diff.upvotes - diff.downvotes AS score)
a.creator_id AS creator_id, diff.upvotes - diff.downvotes AS score)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was quite confused by this trigger because it actually replaces thing in the sql code with post/comment, and so kept referring to post_aggregates when that was already removed. I fixed it and ran into another problem: the trigger for comment updates the comment table, so it fires another trigger, and ultimately throws "stack depth limit exceeded". I fixed this by adding WHEN (pg_trigger_depth( ) = 0 ) to create_triggers()

@Nutomic Nutomic force-pushed the remove-aggregate-tables branch from 62c34e7 to af3035b Compare February 11, 2025 10:19
assert_eq!(0, after_parent_post_delete.posts);
assert_eq!(0, after_parent_post_delete.comments);
Copy link
Member Author

@Nutomic Nutomic Feb 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Theres a test failure on this line but I cant figure out why. The relevant trigger seems to be in r.update_comment_count_from_post (), but there are no changes except for simple renaming. Site and person aggregates also have the same problem with wrong comment count after post deletion.

@dessalines @dullbananas

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At a first glance, that function is doing an inner join from post to post now, where it should only need to do a where filter.

@dullbananas when you get a chance, could you go over this PR and make any corrections you see necessary to the replaceable schema utils and triggers?

Copy link
Member Author

@Nutomic Nutomic Feb 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm I tried to change that but still not working. And yes triggers definitely need a close review before merging. (edit: migrations as well especially the indexes)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anyway I will keep working to remove the remaining aggregate tables (and likely break more stuff). You can use this commit to look into this test failure.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment trigger had infinite recursion because it updated parent comments, and statetement-level triggers always run even if zero rows are updated. The pg_trigger_depth( ) = 0 condition prevents the comment trigger from running if the comment was deleted by the delete_comments_before_post trigger. Remove the depth condition and check for (SELECT count(*) FROM select_old_and_new_rows AS old_and_new_rows) = 0 in the trigger function so it can return early. This is less error prone than increasing the depth limit.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its not clear to me, where exactly should I put the check (SELECT count(*) FROM select_old_and_new_rows AS old_and_new_rows) = 0;?

Copy link
Collaborator

@dullbananas dullbananas Feb 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The condition needs to be checked in the function body just before all of the updates, using an if statement that contains a return statement

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to add that condition directly inside r.create_triggers() to avoid duplication, but for some reason it throws relation "select_old_and_new_rows" does not exist. So I added it manually to every call to create_triggers but there are still various errors eg:

---- impls::local_site::tests::test_aggregates stdout ----
Error: LemmyError { message: Unknown("tuple to be deleted was already modified by an operation triggered by the current command"), inner: tuple to be deleted was already modified by an operation triggered by the current command

---- impls::actor_language::tests::test_user_languages stdout ----
Error: LemmyError { message: Unknown("duplicate key value violates unique constraint \"idx_site_instance_unique\""), inner: duplicate key value violates unique constraint "idx_site_instance_unique"

Its probably easier if you make the changes yourself and then open a PR to this branch.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm trying to fix the triggers right now

@Nutomic
Copy link
Member Author

Nutomic commented Feb 12, 2025

Removed the tables comment_aggregates, post_aggregates, community_aggregates and person_aggregates. site_aggregates table is unchanged because we dont store this data for all instances (only local site). There is also a table person_post_aggregates which cannot be merged. Whats left is the test failure above, and updating lemmy-js-client.

Edit: Also a lot of the comments and some variable names still refer to aggregate tables.

@dessalines
Copy link
Member

For site_aggregates, you could move all those columns to local_site then.

@dessalines
Copy link
Member

dessalines commented Feb 12, 2025

I see the person_post_aggregates is actually only doing work on the post_actions table now. Probably makes sense to move it to crates/db_schema/src/impls/post.rs , or possibly crates/db_schema/src/impls/local_user.rs

@Nutomic
Copy link
Member Author

Nutomic commented Feb 12, 2025

For site_aggregates, you could move all those columns to local_site then.

Right makes sense.

I see the person_post_aggregates is actually only doing work on the post_actions table now. Probably makes sense to move it to crates/db_schema/src/impls/post.rs , or possibly crates/db_schema/src/impls/local_user.rs

Maybe person_post_aggregates can be merged into post_actions then.

Edit: Turns out PersonPostAggregates actually is the post_actions table, so I renamed and moved the structs.

@Nutomic
Copy link
Member Author

Nutomic commented Feb 13, 2025

It would also be possible to merge local_site_rate_limit into local_site. However the combined table would have 54 columns which is close to the 64 limit, and then we would have to split it again, or accept even slower builds with 128 column feature. So Im leaving this one.

I merged local_user_vote_display into local_user as that separation is not necessary.

So whats missing now is some help to get these tests passing and review the triggers. Also the comments in triggers.sql need to be updated, but that can be done in a separate PR. This one should be merged soon to avoid conflicts.

@Nutomic Nutomic force-pushed the remove-aggregate-tables branch from eacb79e to 25aa10a Compare February 13, 2025 10:57
Copy link
Member

@dessalines dessalines left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets prioritize this PR so you don't get stuck in merge hell, and can work on other things.

@@ -51,6 +51,17 @@ pub struct Comment {
/// Whether the comment has been distinguished(speaking officially) by a mod.
pub distinguished: bool,
pub language_id: LanguageId,
pub score: i64,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wondering if we should add any comments to denote that some fields are built by triggers / reports, but I guess its kind of arbitrary.

Comment on lines 93 to 94
--CREATE INDEX idx_post_aggregates_community_published on post USING btree (community_id, featured_local DESC, published DESC);
--CREATE INDEX idx_post_aggregates_community_published_asc on post USING btree (community_id, featured_local DESC, reverse_timestamp_sort (published) DESC);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see these ones being created before.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strange Im sure those were throwing errors before but now they can all be added. Maybe because of the renaming above, but anyway the indices should be dropped when the table is dropped.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants