runtime: make sure blocked channels run operations in FIFO order #11506
Comments
I would make this change. The current behavior sounds unfriendly to me, and in years past I used a similar policy for locks and waits (in Java) and thought it gave good behavior. Starvation is bad.
Language semantics require that the values be FIFO, but the implications of that with multiple goroutines reading the values are unspecified and muddy at best. That said, I believe the claim here that it is more natural for blocked goroutines to go first; it just seems intuitive. I think you could even make a case that the current situation is almost a bug. There may be a noticeable performance hit under heavy contention, though, because I believe this style requires asymptotically more context switches. I still think channels should behave as you suggest.
I'd fully support this. I think this is the more "natural" result I was expecting from one of the discussion threads we had.
I already have a CL for basically exactly this.
From a specification perspective, it's true that our current approach satisfies the happens-before graph, but it does not satisfy liveness. That's a formally defined, reasonable, and desirable property we could specify as a requirement of channels without specifying something as specific as FIFO blocking.

Implementation-wise, I'm concerned about making blocking strictly FIFO order. This seems like the exact same mistake as making map order deterministic or goroutine scheduling deterministic. People can and will come to depend on this and eventually we may want to weaken it. In particular, this is asking for scalability problems.

One of the cardinal rules that came out of my thesis was to not impose strict ordering on operations unless it is a fundamental requirement. It is by no means a fundamental requirement of the blocking order.

I fully support addressing the liveness issue, and I like the approach of eliminating the wake-up race, but if we're going to do that, we should think about whether we can introduce bounded randomness into the blocking order to prevent people from depending on strict FIFO blocking.
@aclements FIFO blocking seems harder to incorrectly rely on than map ordering. Perhaps, like goroutine scheduling, it should only be randomized with
This change will improve tail latency under bursty load, which is what we observed in one of our benchmarks in Vitess.
Flip a coin (linear feedback shift register) and choose one of the top two, until we come up with a better reason for some other policy. Starvation is exponentially rare, service is nearly FIFO, but the order will usually differ if there's more than one waiter.
I think we're overthinking the randomization. With randomized goroutine scheduling (which is already going in, at least for -race testing), we'll get randomized channel ordering for free. The random scheduling should change the order in which waiters get queued and the order in which sends get matched to the waiters.
You seem to assume that the entity that benefits from reduced latency is always a goroutine. This is true when a goroutine services a request and needs to acquire some auxiliary resource (e.g. a database connection). But it is not true for all producer-consumer/pipelining scenarios, where the entity that benefits from reduced latency is a message in a chan. Today messages in chans are serviced strictly FIFO and with minimal latency: the next running goroutine picks up the first message in a chan. What you propose improves the database connection pool scenario but equally worsens the producer-consumer scenario, because under your proposal we hand off the first message in the chan to a goroutine that will run who-knows-when, and a goroutine that drives by the chan the next moment either blocks or services the second message ahead of the first one. If we switch the implementation we will start receiving complaints about the other scenarios.
I like the imagery of a drive-by goroutine a lot; the literature also uses
CL https://golang.org/cl/9345 mentions this issue. |
This change is the same as CL #9345, which was reverted, except for a small bug fix. The only change is to the body of sendDirect and its call site. Also added a test.

The problem was during a channel send operation. The target of the send was a sleeping goroutine waiting to receive. We basically do:

1) Read the destination pointer out of the sudog structure.
2) Copy the value we're sending to that destination pointer.

Unfortunately, the previous change had a goroutine suspend point between 1 & 2 (the call to sendDirect). At that point the destination goroutine's stack could be copied (shrunk), so the pointer read in step 1 is no longer valid for step 2. Fixed by not allowing any suspension points between 1 & 2. I suspect the old code worked correctly basically by accident.

Fixes #13169

The original CL 9345: This change removes the retry mechanism we use for buffered channels. Instead, any sender waking up a receiver (or vice versa) completes the full protocol with its counterpart. This means the counterpart does not need to relock the channel when it wakes up. (Currently buffered channels need to relock on wakeup.)

For sends on a channel with waiting receivers, this change replaces two copies (sender->queue, queue->receiver) with one (sender->receiver). For receives on channels with a waiting sender, two copies are still required. This change unifies to a large degree the algorithm for buffered and unbuffered channels, simplifying the overall implementation.

Fixes #11506

Change-Id: I57dfa3fc219cffa4d48301ee15fe5479299efa09
Reviewed-on: https://go-review.googlesource.com/16740
Reviewed-by: Ian Lance Taylor <[email protected]>
I've had mail conversations with two different projects using Go recently, both centered around the surprise that if many goroutines are blocked on a particular channel, and that channel becomes available, it is still possible for running goroutines who "drive by" at the right time to get their operation in before the goroutines that are blocked, and it is possible for the goroutines who are blocked to be reordered as a result. If this happens repeatedly, the effect is that the blocked goroutines can block arbitrarily long even though the channel is known to be ready at regular intervals. I wonder if we should adjust the channel implementation to ensure that when a channel does become available for sending or receiving, blocked operations take priority over "drive by" operations. It seems to me that this can be done by completing the blocked operation as part of the operation that unblocks it.
Sends and receives on unbuffered channels already behave this way: a send with blocked receivers picks the first receiver goroutine off the queue, delivers the value to it, readies it (puts it on a run queue), and continues execution. That is, the send completes the blocked operation. Receives on unbuffered channels similarly complete blocked sends.
Sends and receives on buffered channels do not behave this way:
A send into a buffered channel with blocked receivers stores the value in the buffer, wakes up a receiver, and continues executing. When the receiver is eventually scheduled, it checks the channel, and maybe it gets lucky and the value is still there. But maybe not, in which case it goes to the end of the queue.
A receive out of a buffered channel copies a value out of the buffer, wakes a blocked sender, and continues. When the sender is eventually scheduled, it checks the channel, and maybe it gets lucky and there is still room in the buffer for the send. But maybe not, in which case it goes to the end of the queue.
It seems to me that it would be easy and symmetric with the unbuffered channel operations for a send with blocked receivers to deliver the value straight to a receiver, completing the first pending receive operation. Similarly a receive with blocked senders could take its own value out of the channel buffer and then complete the send into the newly emptied space. That would be the only operation of the four that needs to transfer a pair of values instead of just one.
It doesn't seem to me that it would hurt performance to do this, and in fact it might help, since there would not be all the unnecessary wakeups that happen now. It would give much more predictable behavior when goroutines queue up on a stuck channel: once queued, that's the order they're going to be served, guaranteed.
From a "happens before" point of view, I'm talking nonsense. There is no "happens before" for two different goroutines blocking on the same channel, and so there is no difference between all the execution orders. I understand that. But from a "what actually happens" point of view, there certainly is a difference: if you know that the goroutines have been stacking up one per minute for an hour before the channel finally gets unblocked and they all complete, you know what order they blocked and might reasonably expect them to unblock in that same order. This remains roughly as true at shorter time scales.
Especially since so many Go programs care about latency, making channels first come, first served seems worth doing. And it looks to me like it can be done trivially and at no performance cost. The current behavior looks more like a bug than an intended result.
I'm not talking about a language change, just a change to our implementation.
Thoughts?
@robpike @dvyukov @randall77 @aclements