[Bug]: Consistent High postgres CPU usage from federation worker calling community_follower
#3958
Comments
Is this with debug or release mode? The community_followers_recheck_delay
is once per instance per second in debug mode and once per instance per minute in release mode.
On Sun, Sep 10, 2023, 16:47 Dessalines wrote:
Requirements
- Is this a bug report? For questions or discussions use
https://lemmy.ml/c/lemmy_support
- Did you check to see if this issue already exists?
- Is this only a single bug? Do not put multiple bugs in one issue.
- Is this a backend issue? Use the lemmy-ui
<https://github.com/LemmyNet/lemmy-ui> repo for UI / frontend issues.
Summary
#3605 introduced a consistent, high-CPU Postgres usage bug coming from the
federation worker queue.
The query is: SELECT DISTINCT "community"."id",
coalesce("person"."shared_inbox_url", "person"."inbox_url") FROM
(("community_follower" INNER JOIN "community" ON
("community_follower"."community_id" = "community"."id")) INNER JOIN
"person" ON ("community_follower"."person_id" = "person"."id")) WHERE
(((("person"."instance_id" = $1) AND "community"."local") AND NOT
("person"."local")) AND ("community_follower"."published" > $2))
The source function used by the new federation worker is this one:
https://github.com/LemmyNet/lemmy/blob/main/crates/db_views_actor/src/community_follower_view.rs#L18
It's called about 12 times every second.
cc @phiresky <https://github.com/phiresky>
Steps to Reproduce
See above
Technical Details
See above
Version
main
Lemmy Instance URL
*No response*
It should also be very cheap: the published cutoff is only one minute in
the past, so the query should almost always return 0 rows.
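To illustrate why the query should usually come back empty, here is a minimal sketch that runs an adapted copy of the query against an in-memory SQLite database. It is a stand-in built on assumptions, not Lemmy's schema or code: the tables carry only the columns the query touches, `?` placeholders replace `$1`/`$2`, timestamps are plain integer seconds, and the `remote.example` URLs and the `new_followers` helper are made up for the example. SQLite says nothing about Postgres performance; the sketch only shows which rows the published cutoff lets through.

```python
import sqlite3

# Adapted from the query in this issue: "?" placeholders replace $1/$2 and
# SQLite stands in for Postgres, so only the row-selection logic carries over.
NEW_FOLLOWERS_SQL = """
SELECT DISTINCT community.id,
       coalesce(person.shared_inbox_url, person.inbox_url)
FROM community_follower
JOIN community ON community_follower.community_id = community.id
JOIN person ON community_follower.person_id = person.id
WHERE person.instance_id = ?
  AND community."local"
  AND NOT person."local"
  AND community_follower.published > ?
"""


def make_db() -> sqlite3.Connection:
    """Create a hypothetical, stripped-down schema with only the needed columns."""
    db = sqlite3.connect(":memory:")
    db.executescript(
        """
        CREATE TABLE community (id INTEGER PRIMARY KEY, "local" INTEGER NOT NULL);
        CREATE TABLE person (
            id INTEGER PRIMARY KEY,
            instance_id INTEGER NOT NULL,
            "local" INTEGER NOT NULL,
            inbox_url TEXT NOT NULL,
            shared_inbox_url TEXT
        );
        CREATE TABLE community_follower (
            community_id INTEGER NOT NULL,
            person_id INTEGER NOT NULL,
            published INTEGER NOT NULL  -- seconds, simplified timestamp
        );
        """
    )
    return db


def new_followers(db: sqlite3.Connection, instance_id: int, since: int) -> list:
    """Return (community_id, inbox_url) pairs for follows newer than `since`."""
    return db.execute(NEW_FOLLOWERS_SQL, (instance_id, since)).fetchall()


db = make_db()
db.execute("INSERT INTO community VALUES (1, 1)")  # one local community
db.execute(
    "INSERT INTO person VALUES (10, 2, 0, "
    "'https://remote.example/u/a/inbox', 'https://remote.example/inbox')"
)  # remote person on instance 2
db.execute("INSERT INTO community_follower VALUES (1, 10, 1000)")  # followed at t=1000

print(new_followers(db, 2, 1060))  # cutoff after the follow
print(new_followers(db, 2, 940))   # cutoff before the follow
```

With the cutoff later than the only follow, the first call finds nothing; moving the cutoff before the follow returns the community id together with the shared inbox URL.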
It's executed separately for every federated instance, which is why there are so many calls.
If the tables are indexed correctly, each query should take <1 ms. Maybe an index is missing?
Frequent queries here are kind of expected with the current code. If there are 1000 instances, there are 1000 queues, and each of them loads new followers every 60 s, so that's ~17 queries per second (1000 / 60). Since instance_id is not part of the community_follower table, the query has to join to person, which makes it a bit more expensive. When I tested this it didn't seem to be much of an issue because each query was very cheap. This is also part of what I meant by "increases overhead for small instances".

On an active instance this shouldn't be too relevant, because it replaces the previous follower queries that were always at the top of pg_stat_statements (previously O(n) with respect to the activity count, now O(1)).

Now I'm running the query on a db and I am seeing …

The execution count could also be reduced, both by increasing the recheck delay and by adding more complicated logic or another table like …

The easiest improvement here would be to increase the delay. Something like 5 min might be fine. This delay is only relevant the very first time a person from a different instance subscribes to a community.
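The arithmetic behind those numbers can be sketched as follows. This assumes one recheck query per federated instance per delay interval, evenly spread over time; `queries_per_second` is a hypothetical helper for illustration, not something in the Lemmy codebase.

```python
def queries_per_second(instance_count: int, recheck_delay_s: float) -> float:
    """Estimate the steady-state recheck query rate.

    Assumes each federated instance triggers exactly one follower
    recheck query per delay interval.
    """
    if recheck_delay_s <= 0:
        raise ValueError("recheck_delay_s must be positive")
    return instance_count / recheck_delay_s


# Delays mentioned in this thread: 1 s (debug), 60 s (release), 5 min (proposed).
for label, delay_s in [("debug", 1), ("release", 60), ("proposed", 300)]:
    rate = queries_per_second(1000, delay_s)
    print(f"{label:>8}: {rate:7.2f} queries/s")
```

Under these assumptions, moving from a 60 s to a 5 min delay cuts the query rate by a factor of five, while a debug build with 1000 instances would issue on the order of 1000 queries per second.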
This is the query it replaces from 0.18.x: … That query is (was) executed very frequently on an active instance (called once per user action) and was in the top 3 most expensive queries, if I remember correctly.
I'm not sure what the defaults should be, so I'll let you comment on the PR for those.
Fixing high CPU usage on federation worker recheck + fix federation tests. Fixes #3958