stream: allow codecs to reuse output buffers #2816
This would require a codec API change/extension to recycle the memory once grpc is done writing it to the wire (or compressing it).
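As a rough illustration only, here is a minimal sketch of what such a codec extension could look like in Go. The `BufferReturner` interface and its `ReturnBuffer` method are hypothetical names invented for this sketch, not the API that grpc-go ultimately adopted.

```go
// Package codecreuse sketches a hypothetical codec extension that would let
// gRPC hand marshaled buffers back to the codec once they have been written
// to the wire (or compressed). All names here are illustrative only.
package codecreuse

import "google.golang.org/grpc/encoding"

// BufferReturner is an optional interface a Codec could implement to opt in
// to buffer recycling.
type BufferReturner interface {
	encoding.Codec
	// ReturnBuffer gives a buffer produced by Marshal back to the codec
	// after gRPC has finished writing it to the transport.
	ReturnBuffer(buf []byte)
}

// maybeReturn shows how the library side could use the extension: the buffer
// is only recycled when the codec opts in via the type assertion.
func maybeReturn(c encoding.Codec, buf []byte) {
	if br, ok := c.(BufferReturner); ok {
		br.ReturnBuffer(buf)
	}
}
```

Because the extension is detected with a type assertion, existing codecs that only implement `encoding.Codec` keep working unchanged.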
adtac pushed a commit to adtac/grpc-go that referenced this issue on Nov 8, 2019:

Performance benchmarks can be found below. Obviously, a 10KB request and 10KB response is tailored to showcase this improvement, as this is where codec buffer re-use shines, but I've run other benchmarks too (like 1-byte requests and responses) and there's no discernible impact on performance.

To no one's surprise, the number of bytes allocated per operation goes down by almost exactly 10 KB across 60k+ queries, which suggests excellent buffer re-use. The number of allocations per operation increases by about 5, but that's probably because of a few additional slice pointers we need to store; these are 8-byte allocations and should have virtually no impact on the allocator and garbage collector.

streaming-networkMode_none-bufConn_false-keepalive_false-benchTime_10s-trace_false-latency_0s-kbps_0-MTU_0-maxConcurrentCalls_1-reqSize_10240B-respSize_10240B-compressor_off-channelz_false-preloader_false

Title       Before          After           Percentage
TotalOps    61821           65568           6.06%
SendOps     0               0               NaN%
RecvOps     0               0               NaN%
Bytes/op    116033.83       105560.37       -9.03%
Allocs/op   111.79          117.89          5.37%
ReqT/op     506437632.00    537133056.00    6.06%
RespT/op    506437632.00    537133056.00    6.06%
50th-Lat    143.303µs       136.558µs       -4.71%
90th-Lat    197.926µs       188.623µs       -4.70%
99th-Lat    521.575µs       507.591µs       -2.68%
Avg-Lat     161.294µs       152.038µs       -5.74%

Closes grpc#2816
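For context on where the per-operation byte savings come from, the sketch below shows a pooled proto codec built on `sync.Pool` that implements the hypothetical `ReturnBuffer` extension from above. It is only an illustration of the idea being benchmarked, not the implementation from the PR.

```go
// An illustrative proto codec that keeps marshal output buffers in a
// sync.Pool so they can be reused across RPCs.
package codecreuse

import (
	"sync"

	"google.golang.org/protobuf/proto"
)

type pooledCodec struct {
	pool sync.Pool // holds *[]byte scratch buffers
}

func (c *pooledCodec) Marshal(v interface{}) ([]byte, error) {
	msg := v.(proto.Message)
	bufp, _ := c.pool.Get().(*[]byte)
	if bufp == nil {
		bufp = new([]byte)
	}
	// MarshalAppend reuses the pooled buffer's capacity when it is large enough,
	// avoiding a fresh allocation per message.
	out, err := proto.MarshalOptions{}.MarshalAppend((*bufp)[:0], msg)
	*bufp = out
	return out, err
}

func (c *pooledCodec) Unmarshal(data []byte, v interface{}) error {
	return proto.Unmarshal(data, v.(proto.Message))
}

func (c *pooledCodec) Name() string { return "proto" }

// ReturnBuffer implements the hypothetical extension sketched earlier: once
// gRPC is done with the marshaled bytes, the buffer goes back into the pool.
func (c *pooledCodec) ReturnBuffer(buf []byte) {
	c.pool.Put(&buf)
}
```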
adtac pushed a commit to adtac/grpc-go that referenced this issue on Nov 8, 2019:

Performance benchmarks can be found below. Obviously, a 10KB request and 10KB response is tailored to showcase this improvement, as this is where codec buffer re-use shines, but I've run other benchmarks too (like 1-byte requests and responses) and there's no discernible impact on performance.

To no one's surprise, the number of bytes allocated per operation goes down by almost exactly 10 KB across 60k+ queries, which suggests excellent buffer re-use. The number of allocations per operation increases by about 5, but that's probably because of a few additional slice pointers we need to store; these are 8-byte allocations and should have virtually no impact on the allocator and garbage collector.

We do not allow reuse of buffers when stats handlers or binlogs are turned on. This is because those two may need access to the data and payload even after the data has been written to the wire. In such cases, we never return the buffer to the pool.

streaming-networkMode_none-bufConn_false-keepalive_false-benchTime_10s-trace_false-latency_0s-kbps_0-MTU_0-maxConcurrentCalls_1-reqSize_10240B-respSize_10240B-compressor_off-channelz_false-preloader_false

Title       Before          After           Percentage
TotalOps    61821           65568           6.06%
SendOps     0               0               NaN%
RecvOps     0               0               NaN%
Bytes/op    116033.83       105560.37       -9.03%
Allocs/op   111.79          117.89          5.37%
ReqT/op     506437632.00    537133056.00    6.06%
RespT/op    506437632.00    537133056.00    6.06%
50th-Lat    143.303µs       136.558µs       -4.71%
90th-Lat    197.926µs       188.623µs       -4.70%
99th-Lat    521.575µs       507.591µs       -2.68%
Avg-Lat     161.294µs       152.038µs       -5.74%

Closes grpc#2816
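A small sketch of that rule, with hypothetical names (the real grpc-go internals are structured differently): the buffer is only recycled when neither a stats handler nor the binary logger could still need the payload. It builds on the `BufferReturner` sketch above.

```go
package codecreuse

import "google.golang.org/grpc/encoding"

// sendState captures the two conditions from the commit message that block
// buffer reuse; the struct and field names are hypothetical.
type sendState struct {
	statsHandlerSet bool // a stats handler is registered
	binlogEnabled   bool // binary logging is turned on
}

// recycleIfSafe returns the marshaled buffer to the codec only when neither
// a stats handler nor the binary logger may still need the payload.
func recycleIfSafe(s sendState, c encoding.Codec, data []byte) {
	if s.statsHandlerSet || s.binlogEnabled {
		return // the payload may be read after the write; keep the buffer
	}
	if br, ok := c.(BufferReturner); ok {
		br.ReturnBuffer(data)
	}
}
```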
adtac pushed a commit to adtac/grpc-go that referenced this issue on Nov 8, 2019:

Performance benchmarks can be found below. Obviously, a 10KB request and 10KB response is tailored to showcase this improvement, as this is where codec buffer re-use shines, but I've run other benchmarks too (like 1-byte requests and responses) and there's no discernible impact on performance.

To no one's surprise, the number of bytes allocated per operation goes down by almost exactly 10 KB across 60k+ queries, which suggests excellent buffer re-use. The number of allocations per operation increases by about 5, but that's probably because of a few additional slice pointers we need to store; these are 8-byte allocations and should have virtually no impact on the allocator and garbage collector.

We do not allow reuse of buffers when stats handlers or binlogs are turned on. This is because those two may need access to the data and payload even after the data has been written to the wire. In such cases, we never return the buffer to the pool.

streaming-networkMode_none-bufConn_false-keepalive_false-benchTime_1m0s-trace_false-latency_0s-kbps_0-MTU_0-maxConcurrentCalls_1-reqSize_10240B-respSize_10240B-compressor_off-channelz_false-preloader_false

Title       Before          After           Percentage
TotalOps    370480          372395          0.52%
SendOps     0               0               NaN%
RecvOps     0               0               NaN%
Bytes/op    116049.91       105488.90       -9.10%
Allocs/op   111.59          118.27          6.27%
ReqT/op     505828693.33    508443306.67    0.52%
RespT/op    505828693.33    508443306.67    0.52%
50th-Lat    142.553µs       143.951µs       0.98%
90th-Lat    193.714µs       192.51µs        -0.62%
99th-Lat    549.345µs       545.059µs       -0.78%
Avg-Lat     161.506µs       160.635µs       -0.54%

Closes grpc#2816
The PR that implemented this ultimately needed to be rolled back (#3307); this should have been reopened at that time.

There's another attempt to reuse the buffer for reads, but the team didn't have time to review the PR (#3220 (comment)).

Bring tags over to #6619 and close this down.