-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
How to reuse []byte field when unmarshaling? #1495
Comments
Unfortunately, there is no support for this currently. Generally, the problem you describe is usually solved with areans, but we need to have Go language support for that: golang/go#51317 |
I don't understand how an arena allocator would help here, could you elaborate? I just want a way to re-use the allocated memory to avoid more allocations, not to free a bunch of allocations together to save on GC costs. I think it's a different task. Regardless, hopefully protobuf library doesn't have to wait for a standard library change for this use case. I'm probably not the only user who might benefit from this. Cheers. |
The usage pattern with an arena would look something like: a := arena.New()
for {
var m foopb.MyMessage
proto.UnmarshalOptions{Arena: a}.Unmarshal(&m)
... // make use of m
a.Free()
} When The arena proposal is based on real-world data battle testing an implementation of arenas with protobufs. I don't have hard numbers, but I recall that it does dramatically improve performance in some use cases. |
Ok, I see. Thanks for the explanation. This looks like it would solve my problem indeed. However, this seems to be a bit overengineered if you ask me :) I just want to reuse the byte slice that is already in the message, it shouldn't be that hard to do. On the other hand, for more complex messages arenas are definitely a more comprehensive solution (all those small allocations e.g. for nested messages could also be removed). So, we cannot do anything until there are arenas in Go? |
You are right that it's probably not too hard to do, but it's going to be very hard to use correctly because one would need to ensure that the byte slice referenced in the existing proto message is not used anywhere else. It's not enough to make sure that the proto message is not used anywhere else, because some piece of code might retain just the byte slice. It's a dangerous API and that makes me very reluctant to consider it. Unfortunately, I don't see a simple way to solve your problem with the current API. It's possible that I am missing something though. |
I’ve had to work on over-allocation issues of byte slices in a separate project that also deals with marshalling and unmarshalling results. There are kind of a multitude of issues and gotchas and it’s really not as easy as one might assume. The point mentioned by znkr is but one of the issues you can run into. Reading up on arenas, it definitely seems like the more powerful and correct solution, which will help this case here and in that other project. And hopefully that would avoid the accidental pitfalls and footguns surrounding “but I already have a byte-slice right here already.” |
The (unstated) goal is (likely) to reduce CPU cycles spent. Allocation and garbage collection are often big contributors to CPU usage in Go programs. It makes sense to try to reduce allocations to reduce CPU usage. But:
Implies a different goal. Can you state the desired end result more clearly? About possible solutions: to reduce GC-related CPU usage (and concomittant cache thrashing, write barrier latency et cetera.) arenas have proven quite effective in experiments. Hopefully arenas will become a more widely available thing in the future, but we can also brainstorm other solutions:
For the time being, I think it may be best to wait for arena support, but at any rate I'd be happy to hear your thoughts. |
what about clip? I think that would be better than the current code, as I believe the current code always allocates, and clip would never allocate, at least not inside that function |
I don't understand how |
@znkr OP reported this line as the problem: likely because that line is copying a byte slice. in my view thats not needed, you can just use the byte slice directly. the only issue is that if the returned slice has extra capacity, then running |
That's unfortunately not the only issue. The slice can be modified by other means, in fact it's fairly common to reuse a byte slice. Consider this function: func Parse(in []byte) *MyType If there return value references var b bytes.Buffer
for range n {
Receive(&b)
v := Parse(b.Bytes())
DoSomething(v)
} If We can't change the default, because that would break existing code and we are reluctant to add a feature that has subtle behavior differences like this based on a configuration. |
OK well you might as well close the issue then. you have three options:
currently option 3 is used, which has obvious performance issues because you forcing a copy regardless of the situation. |
As @znkr said, we're currently not considering this due to the potential safety issues. There are two dangerous operations on byte slices that alias the unmarshal input:
So issue #1 would still exist even with The Go proto implementation still has a number of safer optimizations that we could do first, and we're more likely to do those first, stay tuned. |
I have a client/server app that uses gRPC to communicate. Profiler shows that up to 12.6% of memory is allocated in
consumeBytesNoZero()
.The code is unmarshaling a message that has a
bytes
(i.e.[]byte
) field a few levels deep. Then that data is processed and then the message object/struct could be reused to read the next message into if it was possible, but I don't see how. I mean, I can reuse it but looking at the code, a new byte slice will be allocated in any case. Thebytes
field is always up to N KB in size (client ensures that) so perfect for reuse on the server side. Is there any way to reuse the allocated memory that I don't see?I see two ways:
Unmarshal()
to the message so that I could add custom unmarshaling logic or something like that. Ideally not a fixed method, actually, but a way to dynamically inject custom logic for a particular type.UnmarshalOptions
specially for that case.Perhaps combining the two is the best way forward - ability to inject dynamic unmarshaling code for types via
UnmarshalOptions
?The text was updated successfully, but these errors were encountered: