encoding/json: incorrect usage of sync.Pool #27735
Change https://golang.org/cl/136035 mentions this issue:
It's been nearly a year, and no real attention to this issue. So what are the options for fixing it?

The simplest would be to set an arbitrary size limit for buffers to recycle. Whatever number is selected will hurt performance for someone, I'd wager.

Providing a package-level variable is an option, too, but an ugly one that should absolutely be avoided (I just mention it for completeness' sake).

Eliminating the need for internal buffering in the json encoder could be another option. It wouldn't solve this problem for everyone, but it would provide recourse for those who are affected enough to care. I believe that was the exact issue that brought this up (https://go-review.googlesource.com/c/go/+/135595), which makes it a bit ironic that this issue may be holding up that one :)
@flimzy, one approach would be to keep statistics on the output size, and only return a buffer to the pool if it isn't a statistical outlier. The problem there is that the statistical computations need to be cheap enough that they don't swamp out the benefit of using the pool in the first place.

Another (simpler?) approach would be to pull a buffer from the pool, use it, and abandon it instead of returning it to the pool according to some probability distribution computed from the final high-water mark and the actual buffer size. (One simple distribution would be: 1.0 if the buffer is within 2x of the final high-water mark, or 0.0 otherwise.)
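A minimal sketch of that simple distribution, assuming a plain bytes.Buffer-based pool (getBuffer/putBuffer are illustrative names, not the actual encoding/json internals): if the retained capacity is more than twice what the final encoding actually used, abandon the buffer.

```go
package main

import (
	"bytes"
	"sync"
)

var bufPool = sync.Pool{
	New: func() interface{} { return new(bytes.Buffer) },
}

func getBuffer() *bytes.Buffer {
	return bufPool.Get().(*bytes.Buffer)
}

// putBuffer recycles buf with probability 1.0 if its capacity is
// within 2x of the final high-water mark (the encoded length), and
// probability 0.0 otherwise; over-sized buffers are abandoned to
// the garbage collector.
func putBuffer(buf *bytes.Buffer) {
	if buf.Cap() > 2*buf.Len() {
		return // outlier: capacity far exceeds what this use needed
	}
	buf.Reset()
	bufPool.Put(buf)
}
```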
@flimzy, are you running into an issue related to this in production? Seeing how it performs badly in production may be helpful data in coming up with a reasonable approach for this. When I filed this issue, it was just a theoretical issue I saw. It was not a concrete problem I was experiencing.
@dsnet: No, I'm not. I'm interested in resolving this issue because it seems to be a blocker for making headway on some other json-encoder related issues that are affecting me: https://go-review.googlesource.com/c/go/+/135595, and by extension #33714
Thanks for the feedback, @bcmills. I like the simplicity of your second approach. I think we need to be careful not to discard buffers too aggressively, in the case of applications that naturally have a large disparity in JSON payload sizes. A simple example comes to mind: a REST API that responds with both very small and very large payloads. I can still imagine certain applications where this would be sub-optimal, but maybe this is a reasonable starting point, to be tuned later.
As I consider this further, I have to wonder: Is this really a DoS attack vector? … I don't mean to suggest this doesn't warrant some amount of attention; I just suspect the danger is over-stated.
Another approach would be to remove the package-level pool …
This does seem like the only approach to cover all use cases optimally. Is there any precedent for something similar elsewhere in the stdlib to use for guidance?
Change https://golang.org/cl/195217 mentions this issue:
I have created https://golang.org/cl/195217 as the simplest change I could think of to address the concerns raised in #27735. My main concern with this implementation is that it would become possible for a third-party library to starve the buffer pool by, for instance, never returning anything to the pool.

The other realistic approach, IMO, is to expose the encodeStatePool object to the public API. This will have a larger footprint on the public API, and will require all users of … to update their code. Even with the drawbacks, the latter may be the better approach, but I started with the former because it is so simple, it seemed worth the minor effort to put it in front of reviewers.
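To make the second option concrete, here is a hypothetical sketch of what exposing a pool in the public API might look like. BufferPool and MarshalWithPool are invented names for illustration only (today's Encoder would still use its internal pool under the hood); the sketch only shows the shape of the API and the burden it puts on callers:

```go
package jsonpool // hypothetical; not a real or proposed package

import (
	"bytes"
	"encoding/json"
)

// BufferPool is a hypothetical caller-supplied pool: the application,
// not encoding/json, decides the recycling policy.
type BufferPool interface {
	Get() *bytes.Buffer
	Put(*bytes.Buffer)
}

// MarshalWithPool illustrates the cost of this design: every call
// site must thread a pool through explicitly to benefit from it.
func MarshalWithPool(pool BufferPool, v interface{}) ([]byte, error) {
	buf := pool.Get()
	defer pool.Put(buf)
	buf.Reset()
	if err := json.NewEncoder(buf).Encode(v); err != nil {
		return nil, err
	}
	out := bytes.TrimSuffix(buf.Bytes(), []byte("\n")) // Encode appends a newline
	return append([]byte(nil), out...), nil            // copy out so buf can be reused
}
```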
I've been thinking of ways to guarantee bounds on the worst-case behavior, and this is a prototype idea:

```go
// bufferPool is a pool of buffers.
//
// Example usage:
//
//	p := getBuffer()
//	defer putBuffer(p)
//	p.buf = appendFoo(p.buf, foo) // may resize buffer to arbitrarily large sizes
//	return append([]byte(nil), p.buf...) // single copy of the buffer contents
var bufferPool = sync.Pool{
	New: func() interface{} { return new(pooledBuffer) },
}

type pooledBuffer struct {
	buf     []byte
	strikes int // number of times the buffer was under-utilized
}

func getBuffer() *pooledBuffer {
	return bufferPool.Get().(*pooledBuffer)
}

func putBuffer(p *pooledBuffer) {
	// Recycle large buffers only if sufficiently utilized.
	// If a buffer is under-utilized enough times sequentially,
	// then it is discarded, ensuring that a single large buffer
	// won't be kept alive by a continuous stream of small usages.
	switch {
	case cap(p.buf) <= 1<<16: // always recycle buffers smaller than 64KiB
		p.strikes = 0
	case cap(p.buf)/2 <= len(p.buf): // at least 50% utilization
		p.strikes = 0
	case p.strikes < 4:
		p.strikes++
	default:
		return // discard the buffer; too large and too often under-utilized
	}
	p.buf = p.buf[:0]
	bufferPool.Put(p)
}
```

In the worst-case scenario, we can ensure a bound on the minimal utilization based on the % threshold (e.g., 50% in the example above) and the maximum number of sequential recycles below that threshold (e.g., 4x in the example above). For the constants chosen above, the worst-case sequence of utilization would be: 50%, 0%, 0%, 0%, 0%, at which point the buffer is discarded.

On average, that's a worst-case utilization of 10%, which is far better than the theoretical worst case of 0% if the code naively recycles buffers without any limits.

Advantages of the algorithm:

- Each Put performs only a few integer comparisons and maintains one counter per buffer; there are no global statistics or extra locks.
- Small buffers (64KiB or less) are always recycled, so the common case is unaffected.
- A single large buffer cannot be pinned forever by a continuous stream of small usages; it is discarded after at most 4 consecutive under-utilized uses.

Happy to hear suggestions for improvements.
@dsnet I like this. It's elegant and low cost. What is the reasoning behind dropping <64k buffers?
The idea of keeping statistics locally is a nice one. We do lose some amount of information every time we throw away an over-sized buffer. An alternative, which might be helpful if we kept more sophisticated statistics, would be to nil out buf and return the pooledBuffer object to the pool. Then on Get we could pre-allocate a new buf with a better initial size. Even with these simple stats, it might be worth zeroing buf and returning the pooledBuffer to avoid an allocation. Maybe even include a small byte array struct field in pooledBuffer that we could initialize buf to use, providing an allocation-free path when a small buf is needed.
It's not dropping <64k buffers, but always choosing to recycle them in the pool. That specific code path isn't strictly necessary, but for buffers below a certain size it probably doesn't hurt to always recycle them regardless of the utilization.
Interesting idea. I think a useful statistic to obtain would perhaps be the maximum length encountered over the last N usages. If we evict a buffer due to low utilization, we can allocate a new one based on a recently seen maximum length.
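A rough sketch of that combination, extending the earlier prototype (the maxLen field and the absence of any decay over a window of N usages are illustrative simplifications, not part of the original proposal): an eviction drops only the byte slice, the struct and its statistics go back into the pool, and the next Get pre-sizes a replacement from the recently seen maximum length.

```go
package jsonbuf // illustrative sketch, not actual encoding/json code

import "sync"

type pooledBuffer struct {
	buf     []byte
	strikes int // consecutive under-utilized uses
	maxLen  int // maximum length seen recently (no decay, for brevity)
}

var bufferPool = sync.Pool{
	New: func() interface{} { return new(pooledBuffer) },
}

func getBuffer() *pooledBuffer {
	p := bufferPool.Get().(*pooledBuffer)
	if p.buf == nil {
		// The previous buf was evicted for low utilization;
		// pre-size its replacement from the observed statistic.
		p.buf = make([]byte, 0, p.maxLen)
	}
	return p
}

func putBuffer(p *pooledBuffer) {
	if len(p.buf) > p.maxLen {
		p.maxLen = len(p.buf)
	}
	switch {
	case cap(p.buf) <= 1<<16: // always recycle small buffers
		p.strikes = 0
		p.buf = p.buf[:0]
	case cap(p.buf)/2 <= len(p.buf): // at least 50% utilization
		p.strikes = 0
		p.buf = p.buf[:0]
	case p.strikes < 4:
		p.strikes++
		p.buf = p.buf[:0]
	default:
		// Evict the over-sized slice, but keep the struct (and its
		// stats) in the pool so the next Get can allocate sensibly.
		p.buf = nil
	}
	bufferPool.Put(p)
}
```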
Change https://go.dev/cl/469555 mentions this issue:
Change https://go.dev/cl/469556 mentions this issue:
Change https://go.dev/cl/469558 mentions this issue:
I'm not sold on using AvailableBuffer like that here, because if the thing being written doesn't fit in the buffer there will be an allocation. But there is a fairly easy way to use it that doesn't have that problem, I think.
You're correct that there could be an allocation outside of the segmented buffer, but the number of allocations asymptotically approaches zero so long as the buffer grows exponentially (which it does). For the cases where the appended bytes could be large, we call Grow beforehand.
I don't think it's ideal, but it can be made to work by truncating the existing buffer and allocating a one-off buffer of the requested size or bigger. Likewise, when there are only a few bytes left in the current buffer, I would truncate it and make another when AvailableBuffer is called. I found that the best performance was with many smallish buffers (< 64k IIRC) rather than continuing to double the size. So the allocations are not exponentially rare when the output is big enough (but they can be mostly prevented by wasting some bytes, as described).
OK, I read the above CLs -- is there more coming or shall I rebase on top of those?
Even if the buffer is not exponentially grown, so long as we explicitly call Grow before large writes, I believe the performance will still be fine. Writes of single-byte delimiters, bools, and numbers are sufficiently small that, even with a 4k buffer, the probability of an occasional small allocation is okay.

That's all that's relevant to your change.
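For readers following along, the pattern under discussion looks roughly like this; a sketch using bytes.Buffer's AvailableBuffer and Grow (Go 1.21+), not the CL's actual code:

```go
package main

import (
	"bytes"
	"fmt"
	"strconv"
)

func main() {
	var buf bytes.Buffer

	// Small write (delimiter, bool, number): append into the buffer's
	// spare capacity, so this is usually allocation-free.
	b := buf.AvailableBuffer()
	b = strconv.AppendInt(b, 42, 10)
	buf.Write(b) // Write recognizes b as buf's own spare capacity

	// Large write: Grow first so the append cannot outgrow the spare
	// capacity. Without this, b escapes to a one-off allocation and
	// Write falls back to a copy, which is the concern raised above.
	s := `,"key":"a long string value"`
	buf.Grow(len(s))
	b = buf.AvailableBuffer()
	b = append(b, s...)
	buf.Write(b)

	fmt.Println(buf.String())
}
```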
I've rebased and it's working, but it's much slower than it was, so I've got more work to do before I push a new revision.
I have perf back to baseline, but …
OK, I think I figured out my rebase problem and https://go.dev/cl/455776 is ready for a pass. It is not much smaller because the …
This is part of the effort to reduce direct reliance on bytes.Buffer so that we can use a buffer with better pooling characteristics.

Avoid direct use of bytes.Buffer in Compact and Indent and instead modify the logic to rely only on append. This avoids reliance on the bytes.Buffer.Truncate method, which makes switching to a custom buffer implementation easier.

Performance:

```
name             old time/op    new time/op    delta
EncodeMarshaler  25.5ns ± 8%    25.7ns ± 9%    ~    (p=0.724 n=10+10)

name             old alloc/op   new alloc/op   delta
EncodeMarshaler  4.00B ± 0%     4.00B ± 0%     ~    (all equal)

name             old allocs/op  new allocs/op  delta
EncodeMarshaler  1.00 ± 0%      1.00 ± 0%      ~    (all equal)
```

Updates #27735

Change-Id: I8cded03fab7651d43b5a238ee721f3472530868e
Reviewed-on: https://go-review.googlesource.com/c/go/+/469555
Run-TryBot: Joseph Tsai <[email protected]>
Reviewed-by: Daniel Martí <[email protected]>
TryBot-Result: Gopher Robot <[email protected]>
Reviewed-by: Ian Lance Taylor <[email protected]>
Auto-Submit: Joseph Tsai <[email protected]>
Reviewed-by: Bryan Mills <[email protected]>
This is part of the effort to reduce direct reliance on bytes.Buffer so that we can use a buffer with better pooling characteristics.

Unify these two methods as a single version that uses generics to reduce duplicated logic. Unfortunately, we lack a generic version of utf8.DecodeRune (see #56948), so we cast []byte to string. The []byte variant is slightly slower for multi-byte unicode since casting results in a stack-allocated copy operation. Fortunately, this code path is used only for TextMarshalers. We can also delete TestStringBytes, which exists to ensure that the two duplicate implementations remain in sync.

Performance:

```
name              old time/op    new time/op    delta
CodeEncoder        399µs ± 2%     409µs ± 2%   +2.59%  (p=0.000 n=9+9)
CodeEncoderError   450µs ± 1%     451µs ± 2%     ~     (p=0.684 n=10+10)
CodeMarshal        553µs ± 2%     562µs ± 3%     ~     (p=0.075 n=10+10)
CodeMarshalError   733µs ± 3%     737µs ± 2%     ~     (p=0.400 n=9+10)
EncodeMarshaler   24.9ns ±12%    24.1ns ±13%     ~     (p=0.190 n=10+10)
EncoderEncode     12.3ns ± 3%    14.7ns ±20%     ~     (p=0.315 n=8+10)

name              old speed      new speed      delta
CodeEncoder       4.87GB/s ± 2%  4.74GB/s ± 2%  -2.53%  (p=0.000 n=9+9)
CodeEncoderError  4.31GB/s ± 1%  4.30GB/s ± 2%    ~     (p=0.684 n=10+10)
CodeMarshal       3.51GB/s ± 2%  3.46GB/s ± 3%    ~     (p=0.075 n=10+10)
CodeMarshalError  2.65GB/s ± 3%  2.63GB/s ± 2%    ~     (p=0.400 n=9+10)

name              old alloc/op   new alloc/op   delta
CodeEncoder        327B ±347%     447B ±232%  +36.93%  (p=0.034 n=9+10)
CodeEncoderError   142B ± 1%      143B ± 0%      ~     (p=1.000 n=8+7)
CodeMarshal       1.96MB ± 2%    1.96MB ± 2%     ~     (p=0.468 n=10+10)
CodeMarshalError  2.04MB ± 3%    2.03MB ± 1%     ~     (p=0.971 n=10+10)
EncodeMarshaler   4.00B ± 0%     4.00B ± 0%      ~     (all equal)
EncoderEncode      0.00B          0.00B          ~     (all equal)

name              old allocs/op  new allocs/op  delta
CodeEncoder         0.00           0.00          ~     (all equal)
CodeEncoderError    4.00 ± 0%      4.00 ± 0%     ~     (all equal)
CodeMarshal         1.00 ± 0%      1.00 ± 0%     ~     (all equal)
CodeMarshalError    6.00 ± 0%      6.00 ± 0%     ~     (all equal)
EncodeMarshaler     1.00 ± 0%      1.00 ± 0%     ~     (all equal)
EncoderEncode       0.00           0.00          ~     (all equal)
```

There is a very slight performance degradation for CodeEncoder due to an increase in allocation sizes. However, the number of allocations did not change. This is likely due to remote effects of the growth rate differences between bytes.Buffer and the builtin append function. We shouldn't overly rely on the growth rate of bytes.Buffer anyways since that is subject to possibly change in #51462. As the benchtime increases, the alloc/op goes down, indicating that the amortized memory cost is fixed.

Updates #27735

Change-Id: Ie35e480e292fe082d7986e0a4d81212c1d4202b3
Reviewed-on: https://go-review.googlesource.com/c/go/+/469556
Run-TryBot: Joseph Tsai <[email protected]>
Reviewed-by: Bryan Mills <[email protected]>
Reviewed-by: Ian Lance Taylor <[email protected]>
TryBot-Result: Gopher Robot <[email protected]>
Reviewed-by: Daniel Martí <[email protected]>
Auto-Submit: Joseph Tsai <[email protected]>
As part of the effort to rely less on bytes.Buffer, switch most operations to use more natural append-like operations. This makes it easier to swap bytes.Buffer out with a buffer type that only needs to support a minimal subset of operations.

As a simplification, we can remove the use of the scratch buffer and use the available capacity of the buffer itself as the scratch. Also, declare an inlineable mayAppendQuote function to conditionally append a double-quote if necessary.

Performance:

```
name              old time/op    new time/op    delta
CodeEncoder        405µs ± 2%     397µs ± 2%   -1.94%  (p=0.000 n=20+20)
CodeEncoderError   453µs ± 1%     444µs ± 4%   -1.83%  (p=0.000 n=19+19)
CodeMarshal        559µs ± 4%     548µs ± 2%   -2.02%  (p=0.001 n=19+17)
CodeMarshalError   724µs ± 3%     716µs ± 2%   -1.13%  (p=0.030 n=19+20)
EncodeMarshaler   24.9ns ±15%    22.9ns ± 5%     ~     (p=0.086 n=20+17)
EncoderEncode     14.0ns ±27%    15.0ns ±20%     ~     (p=0.365 n=20+20)
```

There is a slight performance gain across the board due to the elimination of the scratch buffer. Appends are done directly into the unused capacity of the underlying buffer, avoiding an additional copy.

See #53685
Updates #27735

Change-Id: Icf6d612a7f7a51ecd10097af092762dd1225d49e
Reviewed-on: https://go-review.googlesource.com/c/go/+/469558
Reviewed-by: Daniel Martí <[email protected]>
Auto-Submit: Joseph Tsai <[email protected]>
Reviewed-by: Bryan Mills <[email protected]>
TryBot-Result: Gopher Robot <[email protected]>
Reviewed-by: Ian Lance Taylor <[email protected]>
Run-TryBot: Joseph Tsai <[email protected]>
Change https://go.dev/cl/471200 mentions this issue:
I'm still a bit saddened by the amount of code being added. I sent out https://go.dev/cl/471200 as a simpler alternative implementation. Comparing the three versions (v0 being the implementation prior to either CL, v1 being the segmented-buffer implementation in https://go.dev/cl/455776, and v2 being https://go.dev/cl/471200), we see that both v1 and v2 result in low allocation wastage compared to v0. Comparing the performance of v2 relative to v1, there's generally a performance improvement over the more complicated https://go.dev/cl/455776 implementation.

I feel bad sending out an alternative CL after you (@lavalamp) clearly spent a lot of time on your CL. I'm okay with you taking attribution credit for the CL if you want. After all, my CL would not have come about had you not put in the grunt work to prove that segmented buffers were a viable solution. (The benchmarks for v2 were not run with the optimization to ….)
I found this comment curious, so I spent some time investigating why this might be the case. The cause is that as buffers grow larger, they are recycled through the sync.Pool …
I put my benchmark results in a comment on the CL; they are directionally the same. I am happy with any solution as long as the problem gets solved :)
Yeah, the most striking thing was that variance was super high with large buffers, so it makes sense that there are competing effects.
@dsnet Have you seen the approach taken by https://github.com/valyala/bytebufferpool? It also attempts to collect some stats.
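For comparison, its usage shape is append-oriented as well; as I understand it, the pool calibrates a default and maximum size from observed usage and declines to recycle buffers above the calibrated maximum:

```go
package main

import "github.com/valyala/bytebufferpool"

func render() []byte {
	bb := bytebufferpool.Get()
	defer bytebufferpool.Put(bb) // over-sized buffers are dropped by calibration

	bb.WriteString(`{"hello":"world"}`)
	return append([]byte(nil), bb.B...) // copy out before the buffer is reused
}

func main() { _ = render() }
```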
Is there any update on this issue?
Not presently; all current effort is being invested into getting "encoding/json/v2" ready for a formal proposal. Also, the proper fix for this could really benefit from a generic …
https://golang.org/cl/84897 introduces a denial-of-service attack on json.Marshal via a live-lock situation with sync.Pool. Consider this snippet:
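The snippet itself did not survive archiving. The following is a reconstruction from the description below; the 1KiB/256MiB sizes, the 1-to-1000 ratio, and the 300 GC cycles are as described, while the remaining details are illustrative:

```go
package main

import (
	"encoding/json"
	"fmt"
	"runtime"
	"strings"
	"sync"
)

func main() {
	small := strings.Repeat("a", 1<<10) // 1KiB payload
	large := strings.Repeat("a", 1<<28) // 256MiB payload

	for cycle := 0; cycle < 300; cycle++ {
		var wg sync.WaitGroup
		for i := 0; i < 1000; i++ {
			wg.Add(1)
			go func(i int) {
				defer wg.Done()
				v := small
				if i == 0 {
					v = large // 1-in-1000 marshals are huge
				}
				json.Marshal(v) // grows a pooled buffer to fit v
			}(i)
		}
		wg.Wait()

		runtime.GC()
		var ms runtime.MemStats
		runtime.ReadMemStats(&ms)
		fmt.Printf("GC cycle %d: live heap = %d MiB\n", cycle, ms.HeapAlloc>>20)
	}
}
```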
This is a variation of #23199 (comment), a situation suggested by @bcmills.
Essentially, we have a 1-to-1000 ratio of goroutines that use 256MiB or 1KiB buffers, respectively. The occasional insertion of a 256MiB buffer into the sync.Pool gets continually held by the 1KiB routines. On my machine, after 300 GC cycles, the above program occupies 6GiB of my heap, when I expect it to be 256MiB in the worst case.

\cc @jnjackins @bradfitz