
implement simple moka cache for all dereference() calls #55

Closed
wants to merge 4 commits into from

Conversation

phiresky
Contributor

@phiresky phiresky commented Jul 1, 2023

This PR caches the dereference() function in memory (so it caches both DB fetches and HTTP fetches). The cache is separate per object type and is limited both by a time to live (default 10s) and by an object count (default 1000).

Since dereference() is a core mechanic of lemmy, this should improve the performance of both normal API calls and incoming federation calls.

The main pain point is that both the Object and the Object::Error types now need to be Send + Sync + Clone (since they are stored in the cache and can thus be accessed from multiple places simultaneously). Send + Sync isn't causing any issues, but since anyhow::Error is not Clone, this requires some really annoying changes in the examples and making LemmyError Clone (by storing the inner anyhow::Error in an Arc).

The cache also prevents multiple concurrent DB fetches and multiple concurrent HTTP requests to the same object, as well as race conditions in other code resulting from them.
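The PR itself uses the moka crate for this; as a rough std-only illustration of the two bounds described above (TTL plus object count), here is a hypothetical sketch — the type and its names are made up for this example, not the PR's actual code:

```rust
use std::collections::HashMap;
use std::hash::Hash;
use std::time::{Duration, Instant};

// Hypothetical std-only sketch of a cache bounded by both a TTL and a
// maximum entry count (the PR uses moka, which does this properly).
struct BoundedTtlCache<K, V> {
    entries: HashMap<K, (V, Instant)>,
    ttl: Duration,   // e.g. the 10s default mentioned above
    capacity: usize, // e.g. the 1000-object default mentioned above
}

impl<K: Hash + Eq, V: Clone> BoundedTtlCache<K, V> {
    fn new(ttl: Duration, capacity: usize) -> Self {
        Self { entries: HashMap::new(), ttl, capacity }
    }

    // Expired entries are treated as absent.
    fn get(&self, key: &K) -> Option<V> {
        self.entries
            .get(key)
            .filter(|(_, inserted)| inserted.elapsed() < self.ttl)
            .map(|(v, _)| v.clone())
    }

    fn insert(&mut self, key: K, value: V) {
        // Crude capacity bound for illustration; moka evicts per-entry instead.
        if self.entries.len() >= self.capacity {
            self.entries.clear();
        }
        self.entries.insert(key, (value, Instant::now()));
    }
}
```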

What I'm still planning to do:

  • some testing
  • performance benchmarking (I was planning to do a simple comparison here)

Open questions:

  • Any approval of / concerns with this approach?
  • Should there be defaults for the cache count and duration? Maybe not.
  • What should the cache count and TTL duration values be for lemmy? Are there places where this caching causes issues?
  • There's now a ton of trait bounds that are passed around in 10 different places here and another 10 places in lemmy. Should we just move all of them to the Object trait? They are only required for the dereference function (as before), but I don't know if there's actually any point to not having them on the main trait. Just look at the signature of the receive_activity function.

@Nutomic
Member

Nutomic commented Jul 3, 2023

This looks like it would essentially be a cache for database reads, as network calls are made at most once every 24h due to the last_refreshed_at() check. So I'm skeptical that this will have much of a performance impact, as these db queries are very simple (selecting a single row by an indexed string column, with no joins, filters or anything).

So I think it's better to set the right cache-control headers on lemmy's HTTP endpoints, and allow admins to configure caching in nginx.

@phiresky
Contributor Author

phiresky commented Jul 3, 2023

as network calls are made at most once every 24h due to last_refreshed_at()

It also fixes the race condition when two lookups happen at the same time. That happened a lot for me when perf testing; I'm not sure how much it happens in prod.

I'm working on some benchmarks that I'll post here. I think it could definitely make an impact, considering that when I test by receiving 100k incoming federation events, lemmy does 11,170,454 database queries (>100 per event)!

@Nutomic
Member

Nutomic commented Jul 3, 2023

That sounds really extreme. Can you tell what it's reading from the db so much?

@phiresky
Contributor Author

phiresky commented Jul 3, 2023

Yes. Check the "Top Queries by call count" here: https://github.com/phiresky/lemmy-perf-data/blob/main/comparisons/2023-07-03-pg-synchronous-commit.md. It's a different experiment, but it reflects the state of the lemmy main branch. Both the select * from person where actor_id=X and the select * from post where ap_id=X queries seem to be called around three times for every incoming fed event (vote, comment, post).

It's less than the number I quoted above because previously I uploaded 100k posts/comments; now it's 70k votes and 30k posts/comments (more realistic).

I'll do the experiment to compare the impact of this caching PR later.

@Nutomic
Member

Nutomic commented Jul 4, 2023

Okay, it looks like the votes in particular do a lot of database reads. Also, the inbox needs to read actor keys from the db. I was thinking that it might be possible to optimize these things in the code without relying on caching, but that looks very tricky.

The current cache time of 10s is very short though and won't make any noticeable difference. I think one hour would be totally fine. Cache capacity could also be higher, but it depends on memory usage.

I'm all for simplifying the trait bounds.

Edit: I also added some caching in Lemmy for blocklist and dead instance list reads: LemmyNet/lemmy#3427 (comment)

#[serde(untagged)]
pub enum SearchableObjects {
    Person(Person),
    Note(Note),
}

#[derive(Debug, Clone)]
pub struct Error(pub(crate) Arc<anyhow::Error>);
Member

Leave a comment why this is necessary.

Kind: Object + Send + Sync + Clone + 'static,
<Kind as Object>::Error: Clone + Send + Sync,
{
static CACHES: OnceLock<RwLock<type_map::concurrent::TypeMap>> = OnceLock::new();
Member

@Nutomic Nutomic Jul 4, 2023

Does this really need two different types of lock? Smells like premature optimization.

Contributor Author

The reason is that:

  1. OnceLock::get_or_init() returns an immutable reference to the inner value. It only cares about making sure the initializer is called only once.
  2. TypeMap is basically just a HashMap, which cannot be written to concurrently. (The "concurrent" in the package name just refers to the trait bounds on the value type of the typemap.)
  3. RwLock allows writing to the TypeMap through a shared reference.

Honestly, I just spent a while trying to figure out the best way to do this in general. I'd like it to not need a typemap at all (somehow abuse monomorphization to create one moka cache per type) but couldn't figure out a better solution.
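For reference, the pattern being discussed can be sketched with std types only, with a plain HashMap<TypeId, Box<dyn Any>> standing in for type_map (the function name and the use of Default are hypothetical, for illustration):

```rust
use std::any::{Any, TypeId};
use std::collections::HashMap;
use std::sync::{OnceLock, RwLock};

// One value per type behind a global OnceLock + RwLock, with a
// HashMap<TypeId, _> standing in for type_map (names hypothetical).
fn per_type_value<T: Any + Send + Sync + Clone + Default>() -> T {
    static MAP: OnceLock<RwLock<HashMap<TypeId, Box<dyn Any + Send + Sync>>>> =
        OnceLock::new();
    // OnceLock only guarantees one-time init and hands out a shared reference...
    let map = MAP.get_or_init(|| RwLock::new(HashMap::new()));
    // ...so reads go through the RwLock read guard...
    if let Some(v) = map.read().unwrap().get(&TypeId::of::<T>()) {
        return v.downcast_ref::<T>().unwrap().clone();
    }
    // ...and inserting the per-type value needs the write guard.
    map.write()
        .unwrap()
        .entry(TypeId::of::<T>())
        .or_insert_with(|| Box::new(T::default()))
        .downcast_ref::<T>()
        .unwrap()
        .clone()
}
```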

} else {
// The Cache.get_with method ensures that the dereference_inner method is only called once even if the dereference method is called twice simultaneously.
// From the moka docs: "This method guarantees that concurrent calls on the same not-existing key are coalesced into one evaluation of the init future. Only one of the calls evaluates its future, and other calls wait for that future to resolve."
// Considerations: should an error result be stored in the cache as well? Right now: yes. Otherwise try_get_with should be used.
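The coalescing guarantee quoted from the moka docs can be sketched in std-only form (this type is illustrative, not moka's implementation): concurrent callers for the same missing key share a single cell, so only one init closure runs and the others wait for its result.

```rust
use std::collections::HashMap;
use std::hash::Hash;
use std::sync::{Arc, Mutex, OnceLock};

// Illustrative stand-in for moka's get_with: one OnceLock per key means the
// init closure runs at most once, and concurrent callers block on its result.
struct CoalescingCache<K, V> {
    cells: Mutex<HashMap<K, Arc<OnceLock<V>>>>,
}

impl<K: Hash + Eq, V: Clone> CoalescingCache<K, V> {
    fn new() -> Self {
        Self { cells: Mutex::new(HashMap::new()) }
    }

    fn get_with(&self, key: K, init: impl FnOnce() -> V) -> V {
        // Hold the map lock only long enough to find or create the cell.
        let cell = Arc::clone(
            self.cells
                .lock()
                .unwrap()
                .entry(key)
                .or_insert_with(|| Arc::new(OnceLock::new())),
        );
        // get_or_init runs init only for the first caller; later and
        // concurrent callers just read the stored value.
        cell.get_or_init(init).clone()
    }
}
```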
Member

I don't think it makes sense to cache error values. I would expect errors to be temporary and retried the next time.

Member

Also, nitpick: your comment lines are too long.

Contributor Author

I don't think it makes sense to cache error values. I would expect errors to be temporary and retried the next time.

The main reason I cached them is that failed lookups are very common and "intentional" in lemmy. For example, when inserting a comment it first looks up whether the corresponding parent exists, and if it doesn't, it tries looking that up, which might permanently fail. That failing lookup is a fairly hot path, and spamming outgoing HTTP requests is not great.

Tbh, I can't think of a good example right now where these errors would happen "expectedly" (maybe if some instance is blacklisted so its content is not inserted), but I think such cases do exist.

In general, I think lemmy needs to start distinguishing between ephemeral errors and permanent ones. Right now the code swallows lots of errors that should be passed up or ignored (e.g. parsing error vs. 404 error vs. DNS lookup error).
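The policy difference under discussion can be sketched in a hypothetical single-threaded std-only form (names made up here; moka's get_with vs try_get_with make the analogous distinction): caching the whole Result memoizes failures too, while caching only Ok values retries the lookup on every call.

```rust
use std::cell::RefCell;
use std::collections::HashMap;

// Hypothetical single-threaded sketch contrasting the two error policies.
struct LookupCache {
    // Caching the full Result (as the PR currently does) memoizes errors too.
    results: RefCell<HashMap<String, Result<u64, String>>>,
    // Caching only Ok values means a failed lookup is retried next call.
    oks: RefCell<HashMap<String, u64>>,
}

impl LookupCache {
    fn new() -> Self {
        Self { results: RefCell::new(HashMap::new()), oks: RefCell::new(HashMap::new()) }
    }

    // Analogue of get_with: whatever init returned, Ok or Err, is cached.
    fn get_with(&self, key: &str, init: impl FnOnce() -> Result<u64, String>) -> Result<u64, String> {
        self.results
            .borrow_mut()
            .entry(key.to_string())
            .or_insert_with(init)
            .clone()
    }

    // Analogue of try_get_with: only Ok results are cached, errors rerun init.
    fn try_get_with(&self, key: &str, init: impl FnOnce() -> Result<u64, String>) -> Result<u64, String> {
        if let Some(v) = self.oks.borrow().get(key) {
            return Ok(*v);
        }
        let res = init();
        if let Ok(v) = &res {
            self.oks.borrow_mut().insert(key.to_string(), *v);
        }
        res
    }
}
```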

/// Defines how long objects of this type should live in the in-memory cache
fn cache_time_to_live() -> Duration {
    Duration::from_secs(600)
}
Member

It makes more sense to move this to the FederationConfig. It should also be disabled in debug mode imo.

Contributor Author

I actually did this intentionally in the trait so the configuration can depend on the object type. For example, it makes sense to cache a lot of communities and posts, and to cache them for a long time, while caching Vote objects is probably not very useful.
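A sketch of that per-type configuration, with hypothetical types standing in for real lemmy objects (only the 600s default mirrors the diff above; everything else is made up for illustration):

```rust
use std::time::Duration;

// Hypothetical trait and types illustrating per-object cache configuration.
trait CacheTtl {
    fn cache_time_to_live() -> Duration {
        Duration::from_secs(600) // trait-level default, as in the diff above
    }
}

struct Community;
struct Post;

impl CacheTtl for Community {
    // Few communities, rarely changing: cache them for a long time.
    fn cache_time_to_live() -> Duration {
        Duration::from_secs(3600)
    }
}

// Posts just keep the 600s default from the trait.
impl CacheTtl for Post {}
```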

Member

Votes don't implement the Object trait, so they won't get cached either way.

Contributor Author

Ah, ok. But still, does what I wrote make sense? It probably makes sense to have pretty much all communities in the cache but only the most recent 1000 posts or so, while caching comments is potentially not too useful (since those are mostly queried in batches or together with a post).

I agree that it should probably also somehow be configurable by the user, or somehow dependent on system memory.

@phiresky
Contributor Author

phiresky commented Jul 4, 2023

This looks like it would essentially be a cache for database reads

I think you might be right after all that it makes more sense to do this closer to the database. When benchmarking this I'm seeing around a 30% reduction in DB queries, but a lot of the lemmy code does the exact same lookups without going through the .dereference() function, so those don't benefit.

So it might make more sense to build this whole thing around the Crud trait or similar; that way all the code can benefit. It also means the cache can catch all db updates immediately and update itself. I'll look later and maybe open a PR against lemmy itself.

@Nutomic
Member

Nutomic commented Jul 4, 2023

Yes, a general cache for database reads might be good. I wonder how that could be done.

@phiresky
Contributor Author

phiresky commented Jul 4, 2023

I'll see if I can make a "simple" draft PR against lemmy for that. Probably very similar to this one.

@Nutomic Nutomic closed this Jul 19, 2023