Add and use over-allocation metric to decide when to defragment #996
Conversation
Force-pushed ff3c047 to fbac057
Thanks again for working on this! This one is a bit tricky, I left some feedback below.
@Matthias247, would you like to take a look at this?
```rust
        offset = duplicate.start;
    }
    bytes.advance((duplicate.end - offset) as usize);
    offset = duplicate.end;
}
allocation_size = allocation_size.saturating_sub((offset - start) as usize);
```
I'm not sure about this. As far as I understand it, either `Chunk` could keep the entire allocation alive. It's also already the case that a payload may have multiple STREAM frames (either for the same stream or for different streams), so we're generally somewhat overcounting the overallocations, and we just have to take that into account when deciding to defragment (but to that end we can just fudge the defragmentation trigger heuristic).
I still would like the over-allocation estimate to be correct and precise if the `allocation_size` argument is also correct and precise. So e.g. `insert(1, Buffer of size 1022, 1022)` followed by `insert(0, Buffer of size 1024, 1024)` should increase over-allocation by 1022 and not by 2045. I agree that we will often over-count the over-allocation, but I don't see that as a reason to make the estimate even worse.
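As a quick sanity check of that arithmetic, here is a minimal, self-contained sketch; it only redoes the numbers from the example above and does not call quinn's real `Assembler`:

```rust
fn main() {
    // insert(1, Buffer of size 1022, 1022): data at offsets 1..1023,
    // backed by a 1022-byte allocation, so nothing is wasted yet.
    let first_len: usize = 1022;

    // insert(0, Buffer of size 1024, 1024): data at offsets 0..1024,
    // backed by a fresh 1024-byte allocation.
    let second_alloc: usize = 1024;

    // After de-duplication only 2 bytes of the second insert are new
    // (offsets 0 and 1023); the other 1022 bytes duplicate what is already
    // buffered but still keep the 1024-byte allocation alive.
    let new_bytes = second_alloc - first_len; // 2
    let expected_increase = second_alloc - new_bytes; // 1022

    assert_eq!(expected_increase, 1022);
    println!("over-allocation should grow by {expected_increase}");
}
```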
That's not really my point though. My point is this: shrinking the `Bytes` doesn't shrink the size of the allocation, it simply changes the index pointing to the allocation. So in that sense, the deduplication happening here should have, in my understanding, no effect on the amount of overallocation.
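A standalone illustration of that behaviour of the `bytes` crate (plain `bytes` usage, not quinn code):

```rust
use bytes::{Buf, Bytes};

fn main() {
    // One 1024-byte heap allocation behind a ref-counted Bytes handle.
    let mut payload = Bytes::from(vec![0u8; 1024]);

    // "Shrinking" the view only advances the start index; the full
    // 1024-byte allocation stays alive as long as any handle points at it.
    payload.advance(1000);
    assert_eq!(payload.len(), 24);

    // split_to hands out a second handle into the same allocation; neither
    // handle's length reflects how much memory is actually being held.
    let head = payload.split_to(10);
    assert_eq!(head.len(), 10);
    assert_eq!(payload.len(), 14);
}
```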
You are right that only advancing the `Bytes` should not shrink the `allocation_size`, but pushing a `Buffer` should, because we do not want to account for the same bytes twice. My error was that I thought I could simply adjust the `allocation_size` at the end of the de-duplication loop, but now I see that this does not work when we are not pushing a new `Buffer` at every iteration step. I therefore moved the `allocation_size` adjustment into the loop, only adjusting it when we push a new `Buffer`.
But shrinking the allocation size is only warranted if we're accounting for the same allocations. A different but overlapping `Buffer` ("same bytes twice") is substantially more likely to originate from a different packet, and thus from a different allocation.
I'll try to clarify my thoughts a bit more:
```rust
if duplicate.start > offset {
    let over_allocation = (duplicate.end - duplicate.start) as usize;
    self.data.push(Buffer {
        offset,
        bytes: bytes.split_to((duplicate.start - offset) as usize),
        over_allocation,
    });
    self.over_allocation += over_allocation;
    allocation_size =
        allocation_size.saturating_sub((duplicate.end - offset) as usize);
    offset = duplicate.start;
}
```
Here we are pushing a new `Buffer`, because we need to close the gap to the next duplicate. This new `Buffer` references the same allocation, and I use the size of the duplicate as the over-allocation for the new `Buffer`, because this is the amount of bytes that are unused but are still referenced and kept in memory by the `Buffer`. The reasons I associate an over-allocation with every `Buffer` are:

- It could be the last `Buffer` we push.
- If we only keep track of over-allocation in a single `Buffer`, then when that `Buffer` is later removed, we will underestimate the over-allocation caused by the remaining `Buffer`s pushed by the `insert` call.

Because I account for the over-allocation in the pushed `Buffer`, I reduce the allocation size by the size of the `Buffer` and its over-allocation.

Another sensible approach would be to calculate the over-allocation for every pushed `Buffer` as `allocation_size.saturating_sub(bytes.len())` and never mutate the `allocation_size`. Then the calculated over-allocation would grow much larger if we have to fill many small gaps in the de-duplication loop.
(Also please mention #981 in the commit message!)
Force-pushed cbd7a0b to 8645050
Force-pushed fbac057 to bd074b0
Force-pushed bd074b0 to 9a14853
I feel like any flag like …

I would tackle this problem a bit differently than what is proposed here: I would purely add a …

One important thing: I would not touch the utilization value if a part of the chunk is dequeued by the user. The reason for this is that if the user is already reading data, they will read the next data chunk very soon, and it won't stay around with poor utilization for very long. The danger there only exists for chunks which can't be read by the user yet, because there is a gap in the queue.

With that approach another challenge is how not to iterate the complete list of fragments on each insert. Maybe the old …
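Since parts of that comment were cut off above, the following is only a loose sketch of the kind of utilization-based bookkeeping it seems to describe; all names and thresholds are assumptions, not the actual proposal:

```rust
// Hypothetical utilization tracking: update the counters only on insert and
// leave them untouched when the user dequeues readable data, following the
// reasoning above that readable chunks are drained again quickly anyway.
struct ReassemblyStats {
    buffered_data: usize, // useful payload bytes currently queued
    allocated: usize,     // bytes of backing allocations kept alive
}

impl ReassemblyStats {
    fn on_insert(&mut self, new_data: usize, allocation_size: usize) {
        self.buffered_data += new_data;
        self.allocated += allocation_size;
    }

    fn should_defragment(&self) -> bool {
        // Assumed heuristic: enough absolute waste and utilization below 50%.
        self.allocated > 64 * 1024 && self.buffered_data * 2 < self.allocated
    }
}
```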
Force-pushed 9a14853 to 4a46217
@Matthias247 I pushed a new commit that would make this approach a bit more similar to yours (although it keeps the …).

The defragment process first checks if we can reduce …
That bug was due in part to the duplication of the read APIs, which I'm removing in #991. Yes, keeping these in sync is a bit tricky, but it also has important benefits, like not having to iterate over the chunks when deciding when to defragment, and after the changes in #991 it isn't too hard IMO.
Add a has_pending_retransmits method to quinn_proto::connection::Connection and use it inside of quinn::RecvStream::poll_read_generic to decide if we should wake up the connection driver.
Add an over_allocation field to Buffer and Assembler to keep track of wasted memory per Buffer and per Assembler. Add an allocation_size parameter to Assembler::insert to estimate wasted memory per Buffer. Trigger defragmentation when over-allocation reaches 32k.
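Taking only that commit message at face value, the trigger it describes reduces to something like the following sketch (assumed names, not the actual diff):

```rust
// Sketch: keep a running over_allocation total on the assembler and
// defragment once it reaches 32 KiB, as the commit message describes.
const OVER_ALLOCATION_LIMIT: usize = 32 * 1024;

struct AssemblerSketch {
    over_allocation: usize,
    // fragment storage omitted in this sketch
}

impl AssemblerSketch {
    fn after_insert(&mut self) {
        if self.over_allocation >= OVER_ALLOCATION_LIMIT {
            self.defragment();
        }
    }

    fn defragment(&mut self) {
        // Copy fragments into right-sized allocations, then reset the metric.
        self.over_allocation = 0;
    }
}
```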
Force-pushed 4a46217 to 7591ef0
Force-pushed 7591ef0 to b45354f
Force-pushed 509bc8f to 41b1293
Uh, sorry for the automation closing this. Can you open a new one targeting main?
OK, I opened a new PR here: #1000.
This pull request implements the suggestion made by @djc here: #981 (comment)