Add limit for number of concurrent connections to registry #15569

Merged
1 commit merged on Aug 9, 2017

Conversation

dmage
Contributor

@dmage dmage commented Jul 31, 2017

The registry might have excessive resource usage under heavy load. To avoid this, we limit the number of concurrent requests. Requests over the MaxRunning limit are enqueued, and requests are rejected if there are already MaxInQueue requests in the queue. A request may stay in the queue for no more than MaxWaitInQueue.

See also #15448.
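For illustration, a minimal sketch of how such a limiter can be built from two buffered channels used as counting semaphores. The names (Handler, New, running, queue) and the details below are assumptions for this sketch, not necessarily the code in this PR:

```go
package requestlimit

import (
	"net/http"
	"time"
)

// Handler limits how many requests the wrapped handler serves at once.
type Handler struct {
	MaxRunning     int           // requests allowed to run concurrently
	MaxInQueue     int           // requests allowed to wait for a running slot
	MaxWaitInQueue time.Duration // how long a request may wait in the queue
	Overload       http.Handler  // invoked when a request is rejected

	running chan struct{} // capacity MaxRunning; holding a token means "running"
	queue   chan struct{} // capacity MaxInQueue; holding a token means "queued"
	next    http.Handler
}

func New(next http.Handler, maxRunning, maxInQueue int, maxWait time.Duration) *Handler {
	return &Handler{
		MaxRunning:     maxRunning,
		MaxInQueue:     maxInQueue,
		MaxWaitInQueue: maxWait,
		Overload: http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			http.Error(w, "too many requests", http.StatusTooManyRequests)
		}),
		running: make(chan struct{}, maxRunning),
		queue:   make(chan struct{}, maxInQueue),
		next:    next,
	}
}

func (h *Handler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	select {
	case h.running <- struct{}{}: // a running slot is free, serve immediately
	default:
		select {
		case h.queue <- struct{}{}: // otherwise try to wait in the queue
		default: // queue is full, reject
			h.Overload.ServeHTTP(w, r)
			return
		}
		// The PR only arms the timeout when MaxWaitInQueue > 0; that guard is
		// omitted here for brevity.
		select {
		case h.running <- struct{}{}: // a running slot freed up
			<-h.queue
		case <-time.After(h.MaxWaitInQueue): // waited too long, reject
			<-h.queue
			h.Overload.ServeHTTP(w, r)
			return
		case <-r.Context().Done(): // client gave up while queued
			<-h.queue
			return
		}
	}
	defer func() { <-h.running }()
	h.next.ServeHTTP(w, r)
}
```

The buffered running channel is what bounds concurrent work (and therefore memory); the queue channel only bounds how many callers may block waiting for it.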

@openshift-merge-robot openshift-merge-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jul 31, 2017
return true
}

func TestCoutner(t *testing.T) {
Contributor

typo

"time"
)

func defaultOverloadHandler(w http.ResponseWriter, r *http.Request) {
Contributor

We should be consistent with Kubernetes unless we have a reason not to. If we have a reason not to, it should be documented in comments here.

}

// DefaultOverloadHandler is the default OverloadHandler used by New.
var DefaultOverloadHandler http.Handler = http.HandlerFunc(defaultOverloadHandler)
Contributor

Does this need to be public?
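For context, a plausible body for such an overload handler (assumed for illustration; the actual body is not shown in this excerpt) would simply answer 429:

```go
// defaultOverloadHandler rejects the request and hints the client to retry
// later. This is a sketch of what the handler might do, not the PR's code.
func defaultOverloadHandler(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Retry-After", "1")
	http.Error(w, "too many requests to the registry, please retry later", http.StatusTooManyRequests)
}
```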


var timer *time.Timer
var timeout <-chan time.Time
if h.MaxWaitInQueue > 0 {
Contributor

@smarterclayton smarterclayton Jul 31, 2017

If MaxWaitInQueue is zero, why wouldn't we exit earlier in the function (i.e., before the h.queue enqueue above)?


func (h *Handler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
if h.enqueueRunning(r.Context()) {
defer func() {
Contributor

This might be better as a non-closure: defer h.done()

@@ -137,7 +138,15 @@ func Execute(configFile io.Reader) {
app.Config.HTTP.Headers.Set("X-Registry-Supports-Signatures", "1")

app.RegisterHealthChecks()
handler := alive("/", app)
handler := http.Handler(app)
Contributor

The limiter is fine, but this needs to be much more discriminating. Blob HEAD and GET shouldn't be under the same rate limit. This is too broad to solve the existing problem without adding a new one.

This needs to apply only to blob uploads.

Contributor

I'd be OK with a single quota for a set of methods/paths that only includes uploads, as long as we can identify up front (before we merge) that it addresses the issue at hand. Future changes can expand it.

Contributor

@smarterclayton AFAIK HEAD triggers mirroring of blobs, which is equivalent to uploading. Why should that not be rate-limited?

Contributor

It should, but our registry read traffic is an order of magnitude larger than our write load. Setting a read rate limit is going to dwarf the write limit, which means we'll have to set the read limit too low to preserve our current scale.

If pull-through proxying needs to be limited, then it's possible we need another limiter wired in (or to connect this limiter to that). However, if this only addresses non-pullthrough traffic, it gets us closer to not broken. We aren't breaking today because of pullthrough, though.

Contributor

Also, we already have a crude pullthrough limiter that limits max simultaneous writes.

Contributor

@smarterclayton what I was saying is that HEAD will cause a write to the blob store (we start mirroring and uploading to S3 in the background, bypassing the API and the rate limiter).

@dmage or @legionus can explain it better; it seems like I am a proxy here :) maybe you guys should talk.

Contributor Author

@dmage dmage Aug 2, 2017

we already have a crude pullthrough limiter that limits max simultaneous writes.

It's better to think that we don't: it has serious leaking problems (and it's not a pull-through limiter, but a storage-writer limiter).

If pull-through proxying needs to be limited

Pull-through can trigger mirroring, which uses blob writers, which consume memory. So... I don't know, maybe we don't need to. Do we want to have the same limiter for mirroring and for PUT/PATCH requests?

Contributor

Any write (upload) should be part of the same rate limit pool. Otherwise we have to come up with heuristics and split the pool, which then creates more operational overhead.

Contributor

But right now mirroring is not the primary perf problem we have. Any fix we do should be able to be adapted to cover mirroring, but it's not required that we do that right now.

Contributor Author

I've updated this PR. It still has one TODO, but I think the main idea will stay the same: use two limiters, one for GET/HEAD requests and another for PATCH/PUT and mirroring.
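For illustration, a minimal sketch of how the two limiters could be split by HTTP method. The function name is an assumption, and since mirroring bypasses the HTTP path, the write limiter would additionally need to be hooked into the mirroring code:

```go
package requestlimit

import "net/http"

// limitByMethod sends GET/HEAD requests through a read-limited handler and
// everything else (PATCH, PUT, POST, DELETE) through a write-limited one.
func limitByMethod(read, write http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		switch r.Method {
		case http.MethodGet, http.MethodHead:
			read.ServeHTTP(w, r)
		default:
			write.ServeHTTP(w, r)
		}
	})
}
```

With the Handler sketch earlier, the wiring could then look like handler := limitByMethod(New(app, 100, 100, time.Minute), New(app, 50, 24000, 20*time.Minute)), with purely illustrative numbers.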

@stevekuznetsov
Contributor

/test end_to_end

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 3, 2017
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 4, 2017
@dmage
Contributor Author

dmage commented Aug 4, 2017

/retest

@dmage dmage requested a review from miminar August 4, 2017 23:38
@smarterclayton
Contributor

A few minor comments, but looks ok.

}

select {
case l.running <- struct{}{}:

Can this be FIFO instead of random selection from queued requests? Otherwise I wouldn't call them queued requests but waiting requests.

Contributor Author

@dmage dmage Aug 7, 2017

In the current implementation of Go, it is FIFO (golang/go#11506). While this is not mentioned in the language specification, I don't want to increase complexity because of it. The authors of Go try to avoid problems with "tail latency during bursty load", and I'm fine with that even if it turns out not to be genuine FIFO.

@dmage very interesting. In the Go Playground it behaves like FIFO. Locally, with go version go1.8.3 linux/amd64, I get completely different behavior:

go run bufchan.go
reader #0 got message from writer #1
reader #2 got message from writer #3
reader #4 got message from writer #2
reader #5 got message from writer #7
reader #6 got message from writer #4
reader #7 got message from writer #5
reader #8 got message from writer #8
reader #3 got message from writer #6
reader #1 got message from writer #0
reader #9 got message from writer #9

There are clearly conspicuous rising sequences, but it's far from FIFO IMHO.

But I agree, let's put that burden on Go.

Contributor Author

You need to use n on line 17 and to increase the sleep time; 10 µs might be too low.

My bad, now it looks like FIFO indeed 😉.
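For reference, a minimal sketch of this kind of experiment (not the actual bufchan.go; the counts and sleep duration are assumptions). Writers are started one at a time against a full channel; with the gc runtime the blocked senders are released in the order they blocked, so reader #i reports writer #i:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

func main() {
	const n = 10
	ch := make(chan int, 1)
	ch <- -1 // fill the buffer so every writer below blocks on send

	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			ch <- id // blocks until a receive makes room
		}(i)
		time.Sleep(10 * time.Millisecond) // let writer i block before starting i+1
	}

	<-ch // discard the filler value; writer #0 now occupies the buffer
	for i := 0; i < n; i++ {
		fmt.Printf("reader #%d got message from writer #%d\n", i, <-ch)
	}
	wg.Wait()
}
```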

@openshift-merge-robot openshift-merge-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Aug 7, 2017
@dmage dmage removed the do-not-merge label Aug 7, 2017
@miminar

miminar commented Aug 7, 2017

/approve

@smarterclayton
Contributor

smarterclayton commented Aug 7, 2017 via email

@legionus
Contributor

legionus commented Aug 8, 2017

/approve
/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Aug 8, 2017
@miminar

miminar commented Aug 8, 2017

/assign kargakis
as a pkg/cmd approver

@0xmichalis
Contributor

/approve

@miminar can you please add an OWNERS file inside pkg/cmd/dockerregistry with your names as approvers? Thanks.

@openshift-merge-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dmage, kargakis, legionus, miminar

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@openshift-merge-robot openshift-merge-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 8, 2017
@miminar

miminar commented Aug 8, 2017

@miminar can you please add an OWNERS file inside pkg/cmd/dockerregistry with your names as approvers? Thanks.

Your wish is my command: #15670

@openshift-merge-robot
Contributor

/test all [submit-queue is verifying that this PR is safe to merge]

@openshift-merge-robot
Contributor

Automatic merge from submit-queue

@openshift-merge-robot openshift-merge-robot merged commit 423e664 into openshift:master Aug 9, 2017
@smarterclayton
Contributor

smarterclayton commented Aug 9, 2017 via email

@legionus
Contributor

legionus commented Aug 9, 2017

How do we prepare to test this safely in our largest environments? We need
to have a plan for verifying this addresses the issue and can safely enable
it. Perhaps free int or free stg.

@smarterclayton We need monitoring data to pick the correct rate limits. The question is whether we have statistics on the number of requests per unit of time from these clusters.

@smarterclayton
Contributor

smarterclayton commented Aug 9, 2017 via email

@dmage
Contributor Author

dmage commented Aug 9, 2017

Let's assume that we have 8000*3 equally distributed uploads in 5 minutes (a very rough estimate; in practice I guess it will be some kind of Poisson distribution).

5 minutes spread over 8000*3 blobs = a new upload is started every 0.0125 seconds.
Let's assume that each upload takes 10 seconds. 10/0.0125 = 800 concurrent uploads.

OK, let's assume that we have 1 GB of RAM and each upload uses 20 MB.
800*20 MB = 16 GB. Oops.

So we can allow only 1 GB/20 MB = 50 concurrent uploads and enqueue almost all requests.

So our config is:

requests.write.maxrunning: 50
maxinqueue: 24000
maxwaitinqueue: 1200s
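The arithmetic above is an application of Little's law (concurrent uploads = arrival rate × upload duration); a small sketch with the same assumed numbers:

```go
package main

import "fmt"

func main() {
	// Assumed numbers from the estimate above.
	uploads := 8000 * 3   // blobs pushed during the window
	windowSec := 5 * 60.0 // window length in seconds
	uploadSec := 10.0     // duration of a single upload
	perUploadMB := 20.0   // RAM used by one in-flight upload
	budgetMB := 1000.0    // memory we are willing to spend on uploads (~1 GB)

	arrivalRate := float64(uploads) / windowSec // uploads started per second
	concurrent := arrivalRate * uploadSec       // Little's law: L = lambda * W
	fmt.Printf("unthrottled: %.0f concurrent uploads, %.0f MB of RAM\n",
		concurrent, concurrent*perUploadMB) // ~800 uploads, ~16000 MB

	maxRunning := budgetMB / perUploadMB // what the memory budget actually allows
	fmt.Printf("affordable:  %.0f concurrent uploads\n", maxRunning) // ~50
}
```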

@smarterclayton
Contributor

smarterclayton commented Aug 9, 2017 via email

@dmage
Contributor Author

dmage commented Aug 9, 2017

You should think about the channel between EC2 and S3 if you care about tail latency and can scale up. Otherwise your data will be uploaded in a fixed amount of time no matter how it's ordered (the amount of data / egress bandwidth).

If we care about tail latency, then we should set the queue size to zero (clients will be served quicker, but some of them will have to retry). If we don't care, we can decrease the number of 429s by increasing latency.

The value of maxrunning controls the average bandwidth per upload and the point at which we have to scale up if we don't want to return 429 errors. This value depends on the server's capabilities.

But in the end, no matter how the limiter is configured, the cluster will upload all the images in roughly a fixed time*. maxrunning reduces peak memory usage, maxinqueue reduces the overhead of retries.

* if clients always retry the push on failure.
