Keepalive pings should be sent every [Time] period and not every [Time+Timeout] period #2790
Conversation
internal/transport/http2_client.go
Outdated
timer := time.NewTimer(t.kp.Time)
go t.activityMonitor()
ticker := time.NewTicker(defaultKeepaliveTickerDuration)
Why do we use a ticker rather than a timer?
With this implementation, if timeout is greater than time and the client reads data t seconds after sending a ping, where time < t < timeout, then the connection will be closed. But I think it shouldn't be.
I use a ticker instead of a timer for convenience reasons. Since we have a forever for loop here, if we used a timer we would have to take care of resetting it every time it fires. And since the timer value does not change, a ticker is easier to use.
For your second comment, I think this is what will happen:
- at time t0, let's say we have some read activity (and nothing after that)
- at time t1 = (t0 + keepalive.Time), whenever the next ticker fires, we will start a pinger
- at time t2 = (t1 + delta), where keepalive.Time < delta < keepalive.Timeout, we receive the ping_ack
- the next time the ticker fires, we will detect read activity, and will close the pinger
Now, if keepalive.Timeout expires after t2, but before the ticker fires next, you are right, the connection will be closed. The proposal only asks for second granularity in these events, and since our ticker fires twice every second we should be able to do the right thing, except when the keepalive.Timeout expiration and the read of the ping_ack happen in such close proximity that our ticker granularity can't catch them. Does this make sense?
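For readers following along, here is a condensed sketch of the ticker-driven scheme being described. This is not the code in this PR: the type, field, and helper names (keepaliveTicker, lastReadNano, kpTime, kpTimeout, sendPing, closeConn) are made up for illustration, the separate pinger goroutine is folded into the loop, and the standard time, sync/atomic, and context packages are assumed.

// keepaliveTicker is an illustrative stand-in, not the PR's http2Client.
type keepaliveTicker struct {
	lastReadNano int64 // updated atomically whenever bytes are read
	kpTime       time.Duration
	kpTimeout    time.Duration
	ctx          context.Context
	sendPing     func()
	closeConn    func()
}

func (t *keepaliveTicker) run() {
	// Fires twice per second, matching the granularity discussed above.
	ticker := time.NewTicker(500 * time.Millisecond)
	defer ticker.Stop()
	var pingSentAt time.Time // zero value means no outstanding ping
	for {
		select {
		case <-ticker.C:
			lastRead := time.Unix(0, atomic.LoadInt64(&t.lastReadNano))
			if !pingSentAt.IsZero() && lastRead.After(pingSentAt) {
				// Read activity after the ping went out: the connection is alive.
				pingSentAt = time.Time{}
			}
			if !pingSentAt.IsZero() {
				if time.Since(pingSentAt) >= t.kpTimeout {
					// Outstanding ping not answered within kp.Timeout.
					t.closeConn()
					return
				}
				continue
			}
			if time.Since(lastRead) >= t.kpTime {
				// No reads for kp.Time and no outstanding ping: send one.
				t.sendPing()
				pingSentAt = time.Now()
			}
		case <-t.ctx.Done():
			return
		}
	}
}

Because this sketch checks read activity before the timeout on every tick, an ack that arrives just before kp.Timeout clears the outstanding ping rather than closing the connection, up to the half-second tick granularity discussed above.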
internal/transport/http2_client.go
Outdated
// If the control gets here, it means that a ping was sent right
// after stream creation. We need to make sure we get an ack for it
// before kp.Timeout expires.
sendPing = false
I didn't understand how this can happen. Can you explain this?
In http2Client.NewStream(), in the case where the number of active streams has changed from 0 to 1, we write on the awakenKeepalive channel to awaken the keepalive goroutine. After sending on the awakenKeepalive channel, we also send out a PING. This implementation detail is based on proposal A8. Related excerpt here:
Since keepalive is not occurring on HTTP/2 connections without any streams, there will be a higher chance of failure for new RPCs following a long period of inactivity. To reduce the tail latency for these RPCs, it is important to not reset the keepalive time when a connection becomes active; if a new stream is created and there has been greater than 'keepalive time' since the last read byte, then a keepalive PING should be sent (ideally before the HEADERS frame). Doing so detects the broken connection with a latency of keepalive timeout instead of keepalive time + timeout.
So, in our keepalive routine, when we are awakened we know that a PING has already been sent. So we start the pinger and ask it to not send out a new PING.
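To make that flow concrete, here is a hedged sketch of the NewStream-side behaviour described above. The helper and its parameter names (lastReadNano, awaken, sendPing) are hypothetical, not the PR's actual fields; the kp.Time comparison comes from the A8 excerpt quoted above; the standard time and sync/atomic packages are assumed.

// firstStreamKeepalivePing sketches what happens when the first stream is
// created on an otherwise idle connection.
func firstStreamKeepalivePing(lastReadNano *int64, kpTime time.Duration, awaken chan<- struct{}, sendPing func()) {
	lastRead := time.Unix(0, atomic.LoadInt64(lastReadNano))
	if time.Since(lastRead) <= kpTime {
		return // recent read activity; the regular keepalive cycle is enough
	}
	select {
	case awaken <- struct{}{}:
		// The keepalive goroutine was dormant (no active streams). Wake it and
		// send the PING ourselves, ideally before the HEADERS frame; the
		// goroutine then only waits kp.Timeout for the ack, detecting a broken
		// connection in Timeout rather than Time + Timeout.
		sendPing()
	default:
		// The keepalive goroutine is already awake and will ping on its own.
	}
}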
@@ -462,6 +462,8 @@ func setUpWithNoPingServer(t *testing.T, copts ConnectOptions, done chan net.Con
// TestInflightStreamClosing ensures that closing in-flight stream
// sends status error to concurrent stream reader.
func TestInflightStreamClosing(t *testing.T) {
t.Parallel()
I believe @dfawley has some comments about running parallel tests.
I did see a considerable reduction in test execution time with parallel tests :)
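For reference, opting a test into the parallel pool is a one-line change; a minimal illustration with a made-up test name, assuming the standard testing and time packages (tests that share package-level state would still need to stay serial):

func TestKeepaliveSketch(t *testing.T) {
	t.Parallel() // runs concurrently with other tests that also call t.Parallel()
	// A test that mostly sleeps (e.g. waiting on keepalive timers) now overlaps
	// with other parallel tests instead of serializing with them.
	time.Sleep(50 * time.Millisecond)
}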
internal/transport/http2_client.go
Outdated
for {
	select {
	case <-t.lr.ch:
		atomic.StoreInt64(&t.lr.time, time.Now().Unix())
As we already have an int64, should we store last read time with a higher precision?
I decided to go with second precision since that's what the proposal asks for, but we could as well store time.Now().UnixNano() if that is preferable.
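A small sketch of what the nanosecond variant could look like; the lastRead type and field names here are illustrative rather than the PR's exact code, and the standard time and sync/atomic packages are assumed.

type lastRead struct {
	timeNano int64 // nanoseconds since the Unix epoch, read and written atomically
}

// record is called from the reader side whenever bytes arrive off the connection.
func (lr *lastRead) record() {
	atomic.StoreInt64(&lr.timeNano, time.Now().UnixNano())
}

// idleFor reports how long the connection has gone without a read; the
// keepalive goroutine compares this against kp.Time and kp.Timeout.
func (lr *lastRead) idleFor() time.Duration {
	return time.Since(time.Unix(0, atomic.LoadInt64(&lr.timeNano)))
}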
PTAL.
internal/transport/http2_client.go
Outdated
}

// keepalive running in a separate goroutine makes sure the connection is alive
// by sending pings.
func (t *http2Client) keepalive() {
Hey @easwars,
I think spawning a new goroutine to keep track of the timeout for each ping sent may hurt performance. Can we use a single goroutine to keep track of Time and Timeout at the same time? I have this suggestion:
func minTime(a, b time.Duration) time.Duration {
if a < b {
return a
}
return b
}
func (t *http2Client) keepalive() {
p := &ping{data: [8]byte{}}
// True iff a PING has been sent, no data has been received since then, and it hasn't timed out.
outstandingPing := false
var ackWaitDuration time.Duration
for {
dataRead := atomic.CompareAndSwapUint32(&t.activity, 1, 0)
if dataRead {
outstandingPing = false
}
// If there was an outstanding ping, the last timer was a timeout timer, and the ping timed out.
if outstandingPing && ackWaitDuration == 0 {
t.Close()
return
}
// Last timer was a time timer
// No data read
if !dataRead {
t.mu.Lock()
dormant := len(t.activeStreams) < 1 && !t.kp.PermitWithoutStream
t.mu.Unlock()
// If not dormant and no outstanding ping then send a PING
if !dormant && !outstandingPing {
t.controlBuf.put(p)
ackWaitDuration = t.kp.Timeout
outstandingPing = true
}
}
sleepDuration := t.kp.Time
// If a PING was sent, set up a timer for the minimum of the remaining timeout duration and the time duration.
if outstandingPing {
sleepDuration = minTime(t.kp.Time, ackWaitDuration)
ackWaitDuration -= sleepDuration
}
timer := time.NewTimer(sleepDuration)
select {
case <-timer.C:
case <-t.ctx.Done():
if !timer.Stop() {
<-timer.C
}
return
}
}
}
A couple of notes on this code:
- It uses t.activity rather than your last read time. I think using the last read time is much better, but that requires some changes in the if conditions (comparing the time since the last read against kp.Time or kp.Timeout).
- It is an oversimplification of the dormancy case. I think it is functionally equivalent: we don't send any PINGs if the dormancy conditions are met, but the goroutine keeps running. I think having this goroutine sleep while it is dormant is a good idea.
I modified the code to look like this:
func (t *http2Client) keepalive() {
p := &ping{data: [8]byte{}}
// True iff a PING has been sent, and no data has been received since then
// and the PING hasn't timed out.
var outstandingPing bool
// Amount of time remaining before which we should receive an ACK for the
// last sent PING.
var timeoutLeft time.Duration
// UnixNanos recorded before we block on the timer. This is required to
// check for read activity since then.
var prevNano int64
for {
dataRead := false
if prevNano < atomic.LoadInt64(&t.lr.timeNano) {
// Read activity since the last time we were here.
outstandingPing = false
dataRead = true
}
// Outstanding PING timed out, we are done.
if outstandingPing && timeoutLeft <= 0 {
t.Close()
return
}
if !dataRead {
// Check if keepalive should go dormant.
t.mu.Lock()
if len(t.activeStreams) < 1 && !t.kp.PermitWithoutStream {
// Make awakenKeepalive writable.
<-t.awakenKeepalive
t.mu.Unlock()
select {
case <-t.awakenKeepalive:
// If control gets here, a ping has already been sent; we only
// need to reset the timer with keepalive.Timeout.
timeoutLeft = t.kp.Timeout
outstandingPing = true
case <-t.ctx.Done():
return
}
} else {
t.mu.Unlock()
if !outstandingPing {
if channelz.IsOn() {
atomic.AddInt64(&t.czData.kpCount, 1)
}
t.controlBuf.put(p)
timeoutLeft = t.kp.Timeout
outstandingPing = true
}
}
}
// Amount of kp.Time remaining should be calculated from the time of the
// last read activity.
timeLeft := t.kp.Time
if dataRead {
timeLeft = time.Duration(atomic.LoadInt64(&t.lr.timeNano)) + t.kp.Time - time.Duration(time.Now().UTC().UnixNano())
}
// If a PING is outstanding, the amount of time to sleep here should be the
// minimum of timeoutLeft and timeLeft.
sleepDuration := timeLeft
if outstandingPing {
sleepDuration = minTime(timeLeft, timeoutLeft)
timeoutLeft -= sleepDuration
}
prevNano = time.Now().UTC().UnixNano()
timer := time.NewTimer(sleepDuration)
select {
case <-timer.C:
case <-t.ctx.Done():
if !timer.Stop() {
<-timer.C
}
return
}
}
}
Existing tests seem to pass. I still need to warm up to this code a bit more, though I do feel it is better. I will also try to see if I can add more tests to increase my confidence level.
Thanks for the suggestion.
Apologies for the hibernation on this thread. Finally got around to running some benchmarks with keepalive enabled. Features for the benchmark:
Here
Here
Here
There is negligible performance difference between the two approaches, so I can go with either one. @canguler @dfawley: I couldn't figure out how to update the timestamp directly in some syscall handler that you mentioned. Could you please point me to that, and I can try that out as well.
Hey @easwars, I've talked with @dfawley. The framer reads from the conn, but we are wrapping it with a bufio.Reader. An option would be to implement this wrapper ourselves, just as we did for the writer. Here is the relevant part: grpc-go/internal/transport/http_util.go, lines 661 to 669 in 684ef04.
Once we have an implementation for this wrapper, we can use it to get the timestamp whenever Read is called on it.
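The simplest version of that idea is a reader wrapper whose Read records the current time as bytes come off the connection; a hedged sketch with illustrative names, assuming the standard io, sync/atomic, and time packages.

// timestampingReader wraps the reader handed to the buffering layer so that
// every successful Read also records when it happened.
type timestampingReader struct {
	r            io.Reader // the underlying net.Conn (or buffered reader)
	lastReadNano *int64    // shared with the keepalive goroutine
}

func (tr *timestampingReader) Read(p []byte) (int, error) {
	n, err := tr.r.Read(p)
	if n > 0 {
		// Record the time of this read before returning the data to the caller.
		atomic.StoreInt64(tr.lastReadNano, time.Now().UnixNano())
	}
	return n, err
}

As the next comment notes, the framer actually needs more of bufio.Reader's surface than just Read, which is why a full replacement turned out to be more involved.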
I tried implementing just the parts of bufio.Reader that we need here, and it turns out that we need quite a bit. We not only need to implement the … So, with the new …
Comparison between current code (none of the changes proposed in this PR) and keepalive changes with the new …
Comparison between proposed changes in this PR (writing timestamp on a channel) and the new …
Comparison between writing timestamp directly in …
No significant performance difference among the three approaches, while there is a slight degradation in performance for all three alternatives compared to the existing code (which was expected, I guess).
This commit makes the following changes:
* Keep track of the time of the last read in the transport.
* Use this in the keepalive implementation to decide when to send out keepalives.
* Address the issue of keepalives being sent every [Time+Timeout] period instead of every [Time] period, as mandated by proposal A8.
* Makes many of the transport tests run in parallel (as most of them spend a lot of time just sleeping, waiting for things to happen).
Proposal A8 is here: https://github.com/grpc/proposal/blob/master/A8-client-side-keepalive.md
This commit addresses: grpc#2638
This resolver doesn't do much at this point, except returning an empty address list and a hard-coded service config which picks the xds balancer with a round_robin child policy. Also moved the xdsConfig struct to the xds/internal package and exported it as LBConfig, so that both the resolver and the balancer packages can make use of this.
Total number of Allocs and AllocedBytes retrieved from runtime.Memstats() were not being divided by the number of completed operations during the benchmark run, to get the correct number of Allocs/op and Bytes/op.
The current code was using a buffered channel to signal whether the keepalive goroutine should enter dormancy or be awakened from sleep. Using sync.Cond makes the code much simpler to read, and also improves the performance numbers on all but one front.
This is now tracked in #3102.
This commit makes the following changes:
* Keep track of the time of the last read in the transport.
* Use this in the keepalive implementation to decide when to send out keepalives.
* Address the issue of keepalives being sent every [Time+Timeout] period instead of every [Time] period, as mandated by proposal A8.
* Makes many of the transport tests run in parallel (as most of them spend a lot of time just sleeping, waiting for things to happen).
Proposal A8 is here:
https://github.com/grpc/proposal/blob/master/A8-client-side-keepalive.md
Fixes #2638