
Deduplicate cache fill requests to avoid cache stampede in proxy #308

Closed

Kunde21 opened this issue Jul 19, 2018 · 6 comments
Labels: enhancement (New feature or request), proxy (Work to do on the module proxy)

Kunde21 commented Jul 19, 2018

Large and heavily-used projects will have users with automated update mechanisms, leading to a flood of requests for the newest tag. Proxy & Olympus need to be able to identify duplicate requests and only fill the cache from VCS once.

Initially, we can simply block all duplicate requests until the module is downloaded (#290), but this will become more complicated when working with streams (#281).

@Kunde21 Kunde21 changed the title Reduplicate cache fill requests to avoid cache stampede in proxy Deduplicate cache fill requests to avoid cache stampede in proxy Jul 19, 2018

arschles commented Jul 19, 2018

Totally agree. Taking inspiration from https://github.com/golang/groupcache, I have a rough proposal (which I'm pretty sure needs work!).

The basic idea is that we build a distributed lease on which any process can watch for release events. Then, when a server receives a go get module@version request:

  1. Check for module@version in storage. If it exists, return it and stop. If not, continue
  2. Attempt to acquire lease for module@version
  3. If it's acquired, fetch module@version, return it to client, fill storage, release lease and return
  4. If it's not acquired, subscribe to the release event for module@version. When it fires, go to 1

These are leases, not locks, so they should have timeouts. A few notes (a rough sketch of the whole flow follows below):

  • If we get to (4) and the release event fires, module@version might not have been stored (if the process that had the lease failed)
  • We should have a maximum number of retries until failure, because (4) -> (1) could otherwise loop forever if things keep failing
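
To make the proposal concrete, here is a rough Go sketch of that request flow. The `Storage` and `LeaseTable` interfaces, `ErrNotFound`, and the `fetch` callback are all hypothetical names for illustration, not anything from the Athens codebase:

```go
package proxy

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// ErrNotFound is what Storage.Get returns for a missing module@version.
var ErrNotFound = errors.New("module@version not found in storage")

// Storage and LeaseTable are hypothetical interfaces, just to make the flow concrete.
type Storage interface {
	Get(ctx context.Context, mod, ver string) ([]byte, error)
	Save(ctx context.Context, mod, ver string, zip []byte) error
}

type LeaseTable interface {
	Acquire(ctx context.Context, key string, ttl time.Duration) (bool, error)
	Release(ctx context.Context, key string) error
	WaitForRelease(ctx context.Context, key string) error // returns on release or lease expiry
}

const maxAttempts = 5 // step (4) -> (1) must not loop forever

func fill(ctx context.Context, st Storage, leases LeaseTable,
	fetch func(context.Context, string, string) ([]byte, error), mod, ver string) ([]byte, error) {
	key := mod + "@" + ver
	for i := 0; i < maxAttempts; i++ {
		// 1. Check storage first; if the module is there, we're done.
		zip, err := st.Get(ctx, mod, ver)
		if err == nil {
			return zip, nil
		}
		if !errors.Is(err, ErrNotFound) {
			return nil, err
		}

		// 2. Try to acquire the lease for module@version.
		acquired, err := leases.Acquire(ctx, key, 2*time.Minute)
		if err != nil {
			return nil, err
		}

		if acquired {
			// 3. We own the lease: fetch, fill storage, release (deferred), return.
			defer leases.Release(ctx, key)
			zip, err := fetch(ctx, mod, ver)
			if err != nil {
				return nil, err
			}
			if err := st.Save(ctx, mod, ver, zip); err != nil {
				return nil, err
			}
			return zip, nil
		}

		// 4. Someone else holds the lease: wait for the release event, then go to 1.
		// The holder may have failed, so the re-check at 1 can still miss.
		if err := leases.WaitForRelease(ctx, key); err != nil {
			return nil, err
		}
	}
	return nil, fmt.Errorf("no fill for %s after %d attempts", key, maxAttempts)
}
```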

Some implementation notes on the distributed lease table:

  • Instead of requiring the table to fire an event, waiter processes can instead poll a timestamp to determine whether the lease has expired
  • The "lease" can be implemented by doing an atomic check-and-set on a key/value store


Kunde21 commented Jul 19, 2018

I'm hesitant to put the fetch logic on the request side of the interaction. If we keep the fetch on the storage side, we can keep the retry logic there. That also allows us to hand failed requests to a background worker that performs the retry with back-off and/or fills from elsewhere in the Athens system.

As a wrapper around the storage backend, the "cache-warmer" would implement the storage.Lister and storage.Getter interfaces with the following flow (where "or timeout" means the incoming request's context was cancelled or timed out):

  1. Check for an in-flight module@version cache fill request. If one exists, return its result when the fill request completes (or timeout).
  2. If no cache fill request exists, check for module@version in the storage backend. If it exists, return the module from storage.
  3. If the module doesn't exist in storage, register a cache fill request and return its result when the fill request completes (or timeout).
  4. The cache-warmer attempts to fetch the module (up to N retries - configurable). If successful, return the module and put it in storage.
  5. On failure, return the error to all requests waiting on that module. The cache fill request can be dropped, added to a back-off list with a deadline, added to an async fill queue, etc.

All of this can live between the storage backend interface and the selected backend, giving us flexibility to change how we fill the cache and return results.
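
A minimal sketch of such a wrapper, with a hypothetical `Backend` interface and `fetch` callback standing in for the real Athens storage interfaces and VCS download path:

```go
package warmer

import (
	"context"
	"sync"
)

// Backend is a hypothetical slice of the storage interface the warmer wraps;
// the real interfaces would be storage.Getter/storage.Lister from Athens.
type Backend interface {
	Get(ctx context.Context, mod, ver string) ([]byte, error) // errors when absent
	Save(ctx context.Context, mod, ver string, zip []byte) error
}

// call tracks one in-flight cache fill; done is closed to broadcast to all waiters.
type call struct {
	done chan struct{}
	zip  []byte
	err  error
}

// CacheWarmer sits between the storage interface and the selected backend,
// deduplicating concurrent fills for the same module@version.
type CacheWarmer struct {
	backend  Backend
	fetch    func(ctx context.Context, mod, ver string) ([]byte, error) // VCS fetch, with its own retry/back-off
	mu       sync.Mutex
	inflight map[string]*call
}

func (c *CacheWarmer) Get(ctx context.Context, mod, ver string) ([]byte, error) {
	key := mod + "@" + ver

	// 1. A fill request is already in flight: wait for it (or the request context).
	c.mu.Lock()
	if cl, ok := c.inflight[key]; ok {
		c.mu.Unlock()
		return await(ctx, cl)
	}
	c.mu.Unlock()

	// 2. No fill in flight: check the storage backend.
	if zip, err := c.backend.Get(ctx, mod, ver); err == nil {
		return zip, nil
	}

	// 3. Not in storage: register a fill request (first caller wins) and wait on it.
	c.mu.Lock()
	cl, ok := c.inflight[key]
	if !ok {
		cl = &call{done: make(chan struct{})}
		c.inflight[key] = cl
		go c.fill(key, mod, ver, cl)
	}
	c.mu.Unlock()
	return await(ctx, cl)
}

// fill performs steps 4 and 5: fetch from VCS, save to storage, notify all waiters.
func (c *CacheWarmer) fill(key, mod, ver string, cl *call) {
	ctx := context.Background() // the fill outlives any single request's context
	cl.zip, cl.err = c.fetch(ctx, mod, ver)
	if cl.err == nil {
		cl.err = c.backend.Save(ctx, mod, ver, cl.zip)
	}
	c.mu.Lock()
	delete(c.inflight, key) // on failure this could instead go to a back-off list or async fill queue
	c.mu.Unlock()
	close(cl.done)
}

func await(ctx context.Context, cl *call) ([]byte, error) {
	select {
	case <-cl.done:
		return cl.zip, cl.err
	case <-ctx.Done():
		return nil, ctx.Err()
	}
}
```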

I think if we focus on deduplication at the instance level, we can scale that out with distributed leases across larger deployments later. We'll already be reducing the traffic on the VCS site by doing so, while getting the single-instance proxy ready for users to test locally and in-org.

@arschles added the enhancement, registry, and proxy labels Jul 20, 2018
@marwan-at-work

I'd like to grab this if no one has worked on it yet. P.S. This is the package you're talking about, @arschles, right? https://github.com/golang/groupcache/tree/master/singleflight
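
For reference, a minimal sketch of how that singleflight package could be applied to the fill path; `fillFromVCS` is a placeholder for the proxy's real download-and-store logic:

```go
package proxy

import "github.com/golang/groupcache/singleflight"

var group singleflight.Group

// getModule deduplicates concurrent fills for the same module@version: all
// callers sharing the key get the result of a single fillFromVCS call.
func getModule(mod, ver string) ([]byte, error) {
	v, err := group.Do(mod+"@"+ver, func() (interface{}, error) {
		return fillFromVCS(mod, ver)
	})
	if err != nil {
		return nil, err
	}
	return v.([]byte), nil
}

// fillFromVCS is a stand-in for downloading module@version and saving it to storage.
func fillFromVCS(mod, ver string) ([]byte, error) { return nil, nil }
```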


michalpristas commented Aug 13, 2018

We should think about whether or not the proxy will be scaled 1:1 with the underlying DB.
If not (adding more processing power - more instances with a shared DB), we should think about some distributed lock here,
something like watches in ZooKeeper or Consul.
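
As one possible shape for that, a sketch using the Consul lock API (github.com/hashicorp/consul/api), which handles the waiting/watching for us; `fillFromVCS` is again a placeholder, and the key layout is made up:

```go
package proxy

import (
	consul "github.com/hashicorp/consul/api"
)

// fillWithLock serializes cache fills for one module@version across proxy
// instances that share a DB, using a Consul session lock.
func fillWithLock(mod, ver string) error {
	client, err := consul.NewClient(consul.DefaultConfig())
	if err != nil {
		return err
	}
	lock, err := client.LockKey("athens/fill/" + mod + "@" + ver)
	if err != nil {
		return err
	}
	if _, err := lock.Lock(nil); err != nil { // blocks until the lock is acquired
		return err
	}
	defer lock.Unlock()

	// Another instance may have finished the fill while we waited, so the real
	// code should re-check storage here before fetching from VCS.
	_, err = fillFromVCS(mod, ver)
	return err
}

// fillFromVCS is a stand-in for downloading module@version and saving it to storage.
func fillFromVCS(mod, ver string) ([]byte, error) { return nil, nil }
```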

@marwan-at-work

@michalpristas does the proxy need to be scalable? It's meant to run internally, so 1 instance of the proxy server seems OK to me for the MVP. But I'm happy to do a distributed lock thing if people wanna scale the proxy; it sounds more fun to implement anyway :)

@michalpristas

That's the question; we should elaborate more on that.
For the MVP I guess it's fine to think of the proxy as a single-instance thing, but I would like to keep the discussion going. As you said, this is where it gets REALLY interesting.
Can't put enough emphasis on "really".
