Caching improvements? #2
That minute-level cache TTL was just for initial development purposes. An order of hours indeed makes more sense now that things are getting more concrete.
Recently some back-ends seem to have had issues, and that got fed through to the aggregator, which sometimes responded with errors or timeouts. So in addition to the cron job proposal above, I think it's also a good idea to only clear the cache once a successful request has been made (or the cache is very outdated), so that people can still retrieve metadata even if a back-end is temporarily offline. Metadata requests should always be delivered in a second or less, instead of taking up to a minute like they do right now from time to time.
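A minimal sketch of that stale-on-error idea, assuming a simple in-process cache wrapper (the class name, TTL values and `refresh` callable below are illustrative, not the aggregator's actual code):

```python
import time


class StaleOnErrorCache:
    """Cache that keeps serving stale data until a refresh succeeds."""

    def __init__(self, refresh, soft_ttl=3600, hard_ttl=24 * 3600):
        self._refresh = refresh      # callable that fetches fresh data from the back-end
        self._soft_ttl = soft_ttl    # try to refresh after this many seconds
        self._hard_ttl = hard_ttl    # give up on very outdated stale data after this long
        self._value = None
        self._fetched_at = 0.0

    def get(self):
        age = time.time() - self._fetched_at
        if self._value is None or age > self._soft_ttl:
            try:
                self._value = self._refresh()
                self._fetched_at = time.time()
            except Exception:
                # Refresh failed: keep serving the stale value unless it is very outdated.
                if self._value is None or age > self._hard_ttl:
                    raise
        return self._value


# Hypothetical usage: keep serving collection metadata during back-end hiccups.
# collections_cache = StaleOnErrorCache(refresh=lambda: backend.get("/collections"))
# collections = collections_cache.get()
```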
A couple of ideas to improve the caching in the aggregator:
This is basically what we already do in the openEO Hub, but it's all JS with a MongoDB in the background and daily crawling via a cron job. There we have some logic implemented for such cases: the data is not directly cleared in the DB until new data successfully comes in, but it is cleared after a certain number of failures... So if you need some insights into that, feel free to contact @christophfriedrich.
Note: under #15 (83cfb7b), the number of gunicorn workers was increased to 10, which means there are currently 10 separate processes that each have their own in-memory cache (containing the same things). Sharing the cache would be better for performance (right now it can take a long time to warm up the cache) and for consistency (because different workers might have different cached data).
Another idea to take into account: do the caching at the proxy/load-balancer level instead of in the Flask app itself.
For most unit tests we want no caching or simple dict-based caching.
Start using memoizers in MultiBackendConnection and AggregatorBackendImplementation, refactor the memoizers some more to support this properly, and improve test coverage.
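A minimal sketch of what such memoizers could look like (the class and method names below are illustrative, not necessarily the aggregator's actual API):

```python
class NullMemoizer:
    """No caching at all: always call the underlying function (handy in unit tests)."""

    def get_or_call(self, key, callback):
        return callback()


class DictMemoizer:
    """Simple in-memory dict-based cache, enough for tests and single-process use."""

    def __init__(self):
        self._cache = {}

    def get_or_call(self, key, callback):
        if key not in self._cache:
            self._cache[key] = callback()
        return self._cache[key]


# Hypothetical usage inside a backend wrapper:
# capabilities = memoizer.get_or_call("capabilities", lambda: connection.get("/").json())
```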
Merged initial usage of ZooKeeper-based caching in 99332f7.
The caching JsonSerde needed support for serialization of custom classes (`_InternalCollectionMetadata` in this case).
Had to add additional gzip'ing of the JSON because the process registry payload of process metadata is too large for the default ZooKeeper node size limits.
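A rough sketch of such a serde, assuming `_InternalCollectionMetadata` is a thin wrapper around a dict (the stand-in class, size threshold and type tag below are assumptions, not the aggregator's actual implementation):

```python
import gzip
import json


class _InternalCollectionMetadata:
    """Stand-in for the aggregator's internal metadata container (assumed to wrap a dict)."""

    def __init__(self, data: dict):
        self.data = data


class JsonSerde:
    """Serialize to JSON (with custom-class support) and gzip large payloads."""

    _GZIP_THRESHOLD = 100_000  # bytes; illustrative, not the aggregator's actual setting

    def serialize(self, value) -> bytes:
        raw = json.dumps(value, default=self._encode_custom).encode("utf8")
        # ZooKeeper znodes have a size limit (about 1 MB by default), so compress big payloads.
        return gzip.compress(raw) if len(raw) > self._GZIP_THRESHOLD else raw

    def deserialize(self, data: bytes):
        if data[:2] == b"\x1f\x8b":  # gzip magic bytes
            data = gzip.decompress(data)
        return json.loads(data.decode("utf8"), object_hook=self._decode_custom)

    @staticmethod
    def _encode_custom(obj):
        if isinstance(obj, _InternalCollectionMetadata):
            return {"__type__": "_InternalCollectionMetadata", "data": obj.data}
        raise TypeError(f"Not JSON serializable: {type(obj)}")

    @staticmethod
    def _decode_custom(d):
        if d.get("__type__") == "_InternalCollectionMetadata":
            return _InternalCollectionMetadata(d["data"])
        return d
```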
I think the most important caching issue listed here was the lack of a shared cache between all the workers, which made cache misses very frequent in practice. There are some more ideas left here, but I'd prefer to close this general issue and move the remaining ideas to separate tickets for further discussion.
Right now the caching time seems to be 5 minutes, which means that (due to the low request frequency) almost every request refreshes the cache, which leads to long loading times on the website and in the clients. For example, the Web Editor takes roughly 5 seconds to connect without a (server-side) cache and under a second with one.
A 5-minute cache TTL seems pretty low; do we really expect metadata (collections, processes, file formats, ...) to change that frequently? 60 minutes or even a day could be reasonable, too. And then I'd suggest refreshing the data via a cron job and always returning cached data to the user, so that loading times are consistent.
Origin: https://github.com/openEOPlatform/architecture-docs/issues/22#issuecomment-886254097
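A minimal sketch of that always-serve-from-cache pattern, using a plain background thread in place of an actual cron job (the class name and refresh interval below are illustrative):

```python
import threading
import time


class BackgroundRefreshedCache:
    """Always serve the cached value; a background thread refreshes it periodically."""

    def __init__(self, refresh, interval=3600):
        self._refresh = refresh
        self._interval = interval
        self._value = refresh()  # warm up once at startup
        self._lock = threading.Lock()
        threading.Thread(target=self._loop, daemon=True).start()

    def _loop(self):
        while True:
            time.sleep(self._interval)
            try:
                value = self._refresh()
            except Exception:
                continue  # keep serving the previous value when a refresh fails
            with self._lock:
                self._value = value

    def get(self):
        with self._lock:
            return self._value


# Hypothetical usage: refresh collection metadata hourly, serve all requests from cache.
# collections_cache = BackgroundRefreshedCache(lambda: backend.get("/collections"), interval=3600)
```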