Add limit for number of concurrent connections to registry #15569

Merged
1 commit merged on Aug 9, 2017

Conversation

dmage
Contributor

@dmage dmage commented Jul 31, 2017

The registry might have excessive resource usage under heavy load. To avoid this, we limit the number of concurrent requests. Requests over the MaxRunning limit are enqueued, and requests are rejected if there are already MaxInQueue requests in the queue. A request may stay in the queue for no more than MaxWaitInQueue.

See also #15448.
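For illustration, a minimal sketch of how such a limiter can be built from two buffered channels used as counting semaphores. The names (Handler, New, running, queue) and the details below are assumptions for this sketch, not necessarily the code in this PR:

```go
package requestlimit

import (
	"net/http"
	"time"
)

// Handler limits how many requests the wrapped handler serves at once.
type Handler struct {
	MaxRunning     int           // requests allowed to run concurrently
	MaxInQueue     int           // requests allowed to wait for a running slot
	MaxWaitInQueue time.Duration // how long a request may wait in the queue
	Overload       http.Handler  // invoked when a request is rejected

	running chan struct{} // capacity MaxRunning; holding a token means "running"
	queue   chan struct{} // capacity MaxInQueue; holding a token means "queued"
	next    http.Handler
}

func New(next http.Handler, maxRunning, maxInQueue int, maxWait time.Duration) *Handler {
	return &Handler{
		MaxRunning:     maxRunning,
		MaxInQueue:     maxInQueue,
		MaxWaitInQueue: maxWait,
		Overload: http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			http.Error(w, "too many requests", http.StatusTooManyRequests)
		}),
		running: make(chan struct{}, maxRunning),
		queue:   make(chan struct{}, maxInQueue),
		next:    next,
	}
}

func (h *Handler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	select {
	case h.running <- struct{}{}: // a running slot is free, serve immediately
	default:
		select {
		case h.queue <- struct{}{}: // otherwise try to wait in the queue
		default: // queue is full, reject
			h.Overload.ServeHTTP(w, r)
			return
		}
		// The PR only arms the timeout when MaxWaitInQueue > 0; that guard is
		// omitted here for brevity.
		select {
		case h.running <- struct{}{}: // a running slot freed up
			<-h.queue
		case <-time.After(h.MaxWaitInQueue): // waited too long, reject
			<-h.queue
			h.Overload.ServeHTTP(w, r)
			return
		case <-r.Context().Done(): // client gave up while queued
			<-h.queue
			return
		}
	}
	defer func() { <-h.running }()
	h.next.ServeHTTP(w, r)
}
```

The buffered running channel is what bounds concurrent work (and therefore memory); the queue channel only bounds how many callers may block waiting for it.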

@openshift-merge-robot openshift-merge-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jul 31, 2017
return true
}

func TestCoutner(t *testing.T) {
Contributor

typo

"time"
)

func defaultOverloadHandler(w http.ResponseWriter, r *http.Request) {
Contributor

We should be consistent with Kubernetes unless we have a reason not to. If we have a reason not to, it should be documented in comments here.

}

// DefaultOverloadHandler is the default OverloadHandler used by New.
var DefaultOverloadHandler http.Handler = http.HandlerFunc(defaultOverloadHandler)
Contributor

Does this need to be public?
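For context, a plausible body for such an overload handler (assumed for illustration; the actual body is not shown in this excerpt) would simply answer 429:

```go
// defaultOverloadHandler rejects the request and hints the client to retry
// later. This is a sketch of what the handler might do, not the PR's code.
func defaultOverloadHandler(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Retry-After", "1")
	http.Error(w, "too many requests to the registry, please retry later", http.StatusTooManyRequests)
}
```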


var timer *time.Timer
var timeout <-chan time.Time
if h.MaxWaitInQueue > 0 {
Contributor

@smarterclayton smarterclayton Jul 31, 2017

If MaxWaitInQueue is zero, why wouldn't we exit earlier in the function (i.e., before the h.queue enqueue above)?


func (h *Handler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
if h.enqueueRunning(r.Context()) {
defer func() {
Contributor

This might be better as a non-closure: defer h.done()

@@ -137,7 +138,15 @@ func Execute(configFile io.Reader) {
app.Config.HTTP.Headers.Set("X-Registry-Supports-Signatures", "1")

app.RegisterHealthChecks()
handler := alive("/", app)
handler := http.Handler(app)
Contributor

The limiter is fine, but this needs to be much more discriminating. Blob HEAD and GET shouldn't be under the same rate limit. This is too broad to solve the existing problem without adding a new one.

This needs to apply only to blob uploads.

Contributor

I'd be OK with a single quota for a set of methods/paths that only includes uploads, as long as we can identify up front (before we merge) that it addresses the issue at hand. Future changes can expand it.

Contributor

@smarterclayton AFAIK HEAD triggers mirroring of blobs, which is equivalent to uploading. Why should that not be rate-limited?

Contributor

It should, but our registry read traffic is an order of magnitude larger than our write load. Setting a read rate limit is going to dwarf the write limit, which means we'll have to set the read limit too low to preserve our current scale.

If pull-through proxying needs to be limited, then it's possible we need another limiter wired in (or to connect this limiter to that). However, if this only addresses non-pullthrough traffic, it gets us closer to not broken. We aren't breaking today because of pullthrough, though.

Contributor

Also, we already have a crude pullthrough limiter that limits max simultaneous writes.

Contributor

@smarterclayton what I was saying is that HEAD will cause a write to the blob store (we start mirroring and uploading to S3 in the background, bypassing the API and the rate limiter).

@dmage or @legionus can explain it better; it seems like I am a proxy here :) maybe you guys should talk.

Contributor Author

@dmage dmage Aug 2, 2017

we already have a crude pullthrough limiter that limits max simultaneous writes.

It's better to think that we don't: it has serious leaking problems (and it's not a pull-through limiter, but a storage-writer limiter).

If pull-through proxying needs to be limited

Pull-through can trigger mirroring, which uses blob writers, which consume memory. So... I don't know, maybe we don't need to. Do we want to have the same limiter for mirroring and for PUT/PATCH requests?

Contributor

Any write (upload) should be part of the same rate limit pool. Otherwise we have to come up with heuristics and split the pool, which then creates more operational overhead.

Contributor

But right now mirroring is not the primary perf problem we have. Any fix we do should be able to be adapted to cover mirroring, but it's not required that we do that right now.

Contributor Author

I've updated this PR. It still has one TODO, but I think the main idea will stay the same: use two limiters, one for GET/HEAD requests and another for PATCH/PUT and mirroring.
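For illustration, a minimal sketch of how the two limiters could be split by HTTP method. The function name is an assumption, and since mirroring bypasses the HTTP path, the write limiter would additionally need to be hooked into the mirroring code:

```go
package requestlimit

import "net/http"

// limitByMethod sends GET/HEAD requests through a read-limited handler and
// everything else (PATCH, PUT, POST, DELETE) through a write-limited one.
func limitByMethod(read, write http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		switch r.Method {
		case http.MethodGet, http.MethodHead:
			read.ServeHTTP(w, r)
		default:
			write.ServeHTTP(w, r)
		}
	})
}
```

With the Handler sketch earlier, the wiring could then look like handler := limitByMethod(New(app, 100, 100, time.Minute), New(app, 50, 24000, 20*time.Minute)), with purely illustrative numbers.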

@stevekuznetsov
Contributor

/test end_to_end

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 3, 2017
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 4, 2017
@dmage
Contributor Author

dmage commented Aug 4, 2017

/retest

@dmage dmage requested a review from miminar August 4, 2017 23:38
@smarterclayton
Contributor

A few minor comments, but looks ok.

}

select {
case l.running <- struct{}{}:

Can this be FIFO instead of random selection from queued requests? Otherwise I wouldn't call them queued requests but waiting requests.

Contributor Author

@dmage dmage Aug 7, 2017

In the current implementation of Go, it is FIFO (golang/go#11506). While this is not mentioned in the language specification, I don't want to increase complexity because of it. The authors of Go try to avoid problems with "tail latency during bursty load", and I'm fine with that even if it turns out not to be genuine FIFO.

@dmage very interesting. In the Go Playground it behaves like FIFO. Locally, with go version go1.8.3 linux/amd64, I get completely different behavior:

go run bufchan.go
reader #0 got message from writer #1
reader #2 got message from writer #3
reader #4 got message from writer #2
reader #5 got message from writer #7
reader #6 got message from writer #4
reader #7 got message from writer #5
reader #8 got message from writer #8
reader #3 got message from writer #6
reader #1 got message from writer #0
reader #9 got message from writer #9

There are clearly conspicuous rising sequences, but it's far from FIFO IMHO.

But I agree, let's put that burden on Go.

Contributor Author

You need to use n on line 17 and to increase the sleep time; 10 µs might be too low.

My bad, now it looks like FIFO indeed 😉.
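For reference, a minimal sketch of this kind of experiment (not the actual bufchan.go; the counts and sleep duration are assumptions). Writers are started one at a time against a full channel; with the gc runtime the blocked senders are released in the order they blocked, so reader #i reports writer #i:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

func main() {
	const n = 10
	ch := make(chan int, 1)
	ch <- -1 // fill the buffer so every writer below blocks on send

	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			ch <- id // blocks until a receive makes room
		}(i)
		time.Sleep(10 * time.Millisecond) // let writer i block before starting i+1
	}

	<-ch // discard the filler value; writer #0 now occupies the buffer
	for i := 0; i < n; i++ {
		fmt.Printf("reader #%d got message from writer #%d\n", i, <-ch)
	}
	wg.Wait()
}
```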

@openshift-merge-robot openshift-merge-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Aug 7, 2017
@dmage dmage removed the do-not-merge label Aug 7, 2017
@miminar

miminar commented Aug 7, 2017

/approve

@smarterclayton
Contributor

smarterclayton commented Aug 7, 2017 via email

@legionus
Contributor

legionus commented Aug 8, 2017

/approve
/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Aug 8, 2017
@miminar

miminar commented Aug 8, 2017

/assign kargakis
as a pkg/cmd approver

@0xmichalis
Contributor

/approve

@miminar can you please add an OWNERS file inside pkg/cmd/dockerregistry with your names as approvers? Thanks.

@openshift-merge-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dmage, kargakis, legionus, miminar

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@openshift-merge-robot openshift-merge-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 8, 2017
@miminar

miminar commented Aug 8, 2017

@miminar can you please add an OWNERS file inside pkg/cmd/dockerregistry with your names as approvers? Thanks.

Your wish is my command: #15670

@openshift-merge-robot
Contributor

/test all [submit-queue is verifying that this PR is safe to merge]

@openshift-merge-robot
Contributor

Automatic merge from submit-queue

@openshift-merge-robot openshift-merge-robot merged commit 423e664 into openshift:master Aug 9, 2017
@smarterclayton
Contributor

smarterclayton commented Aug 9, 2017 via email

@legionus
Contributor

legionus commented Aug 9, 2017

How do we prepare to test this safely in our largest environments? We need
to have a plan for verifying this addresses the issue and can safely enable
it. Perhaps free int or free stg.

@smarterclayton We need monitoring data to pick the correct rate limits. The question is whether we have statistics on the number of requests per unit of time from these clusters.

@smarterclayton
Contributor

smarterclayton commented Aug 9, 2017 via email

@dmage
Contributor Author

dmage commented Aug 9, 2017

Let's assume that we have 8000*3 equally distributed uploads in 5 minutes (a very rough estimate; in practice I guess it will be some kind of Poisson distribution).

5 minutes spread over 8000*3 blobs = a new upload is started every 0.0125 seconds.
Let's assume that each upload takes 10 seconds. 10/0.0125 = 800 concurrent uploads.

OK, let's assume that we have 1 GB of RAM and each upload uses 20 MB.
800*20 MB = 16 GB. Oops.

So we can allow only 1 GB/20 MB = 50 concurrent uploads and enqueue almost all requests.

So our config is:

requests.write.maxrunning: 50
maxinqueue: 24000
maxwaitinqueue: 1200s
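The arithmetic above is an application of Little's law (concurrent uploads = arrival rate × upload duration); a small sketch with the same assumed numbers:

```go
package main

import "fmt"

func main() {
	// Assumed numbers from the estimate above.
	uploads := 8000 * 3   // blobs pushed during the window
	windowSec := 5 * 60.0 // window length in seconds
	uploadSec := 10.0     // duration of a single upload
	perUploadMB := 20.0   // RAM used by one in-flight upload
	budgetMB := 1000.0    // memory we are willing to spend on uploads (~1 GB)

	arrivalRate := float64(uploads) / windowSec // uploads started per second
	concurrent := arrivalRate * uploadSec       // Little's law: L = lambda * W
	fmt.Printf("unthrottled: %.0f concurrent uploads, %.0f MB of RAM\n",
		concurrent, concurrent*perUploadMB) // ~800 uploads, ~16000 MB

	maxRunning := budgetMB / perUploadMB // what the memory budget actually allows
	fmt.Printf("affordable:  %.0f concurrent uploads\n", maxRunning) // ~50
}
```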

@smarterclayton
Contributor

smarterclayton commented Aug 9, 2017 via email

@dmage
Contributor Author

dmage commented Aug 9, 2017

You should think about the channel between EC2 and S3 if you care about tail latency and can scale up. Otherwise your data will be uploaded in a fixed amount of time no matter how it's ordered (the amount of data / egress bandwidth).

If we care about tail latency, then we should set the queue size to zero (clients will be served quicker, but some of them will have to retry). If we don't care, we can decrease the number of 429s by increasing latency.

The value of maxrunning controls the average bandwidth per upload and the point at which we have to scale up if we don't want to return 429 errors. This value depends on the server's capabilities.

But in the end, no matter how the limiter is configured, the cluster will upload all the images in roughly a fixed time*. maxrunning reduces peak memory usage, maxinqueue reduces the overhead of retries.

* if clients always retry the push on failure.
