Add and use over-allocation metric to decide when to defragment #996

Closed
wants to merge 8 commits into quinn-rs:reading from reading-981

Conversation

@geieredgar (Contributor):

This pull request implements the suggestion made by @djc here: #981 (comment)

@djc (Member) left a comment:

Thanks again for working on this! This one is a bit tricky, I left some feedback below.

@Matthias247, would you like to take a look at this?

Resolved review threads:
  • quinn-proto/src/connection/assembler.rs
  • quinn-proto/src/connection/mod.rs
  • quinn-proto/src/connection/mod.rs (outdated)
Review comment on quinn-proto/src/connection/assembler.rs:

```rust
        offset = duplicate.start;
    }
    bytes.advance((duplicate.end - offset) as usize);
    offset = duplicate.end;
}
allocation_size = allocation_size.saturating_sub((offset - start) as usize);
```
@djc (Member):

I'm not sure about this. As far as I understand it, either Chunk could keep the entire allocation alive. It's also already the case that a payload may have multiple STREAM frames (either for the same stream or for different streams), so we're generally somewhat overcounting the overallocations, and we just have to take that into account when deciding to defragment (but to that end we can just fudge the defragmentation trigger heuristic).
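
(For illustration, a minimal sketch of the shared-allocation situation described here, using the bytes crate; the sizes and offsets are made up:)

```rust
use bytes::Bytes;

// One received packet's payload, carrying two STREAM frames.
let payload = Bytes::from(vec![0u8; 1200]);
let frame_a = payload.slice(10..600);   // data for one stream
let frame_b = payload.slice(610..1190); // data for another stream

// Both handles reference the same 1200-byte allocation; charging the
// full allocation size to each frame counts it twice, which is the
// overcounting mentioned above.
assert_eq!(frame_a.len() + frame_b.len(), 1170);
```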

@geieredgar (Contributor, Author):

I still would like the over-allocation estimate to be correct and precise if the allocation_size argument is also correct and precise.
So e.g. insert(1, Buffer of size 1022, 1022) followed by insert(0, Buffer of size 1024, 1024) should increase over-allocation by 1022 and not by 2045.
I agree that we will often over-count the over-allocation, but I don't see that as a reason to make the estimate even worse.
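
(A hypothetical test sketch of that example; the (offset, bytes, allocation_size) signature of insert follows this PR, while the over_allocation() accessor is assumed purely for illustration:)

```rust
use bytes::Bytes;

let mut assembler = Assembler::new();

// A 1022-byte allocation holding stream bytes [1, 1023); fully used,
// so no over-allocation yet.
assembler.insert(1, Bytes::from(vec![0u8; 1022]), 1022);
assert_eq!(assembler.over_allocation(), 0);

// A 1024-byte allocation holding stream bytes [0, 1024). Only bytes 0
// and 1023 are new; the 1022 bytes in between duplicate data we already
// hold, yet the whole 1024-byte allocation stays alive.
assembler.insert(0, Bytes::from(vec![0u8; 1024]), 1024);
assert_eq!(assembler.over_allocation(), 1022); // not 2045
```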

@djc (Member):

That's not really my point though. My point is this: shrinking the Bytes doesn't shrink the size of the allocation, it simply changes the index pointing to the allocation. So in that sense, the deduplication happening here should have, in my understanding, no effect on the amount of overallocation.
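
(A minimal demonstration of this point with the bytes crate:)

```rust
use bytes::{Buf, Bytes};

// A 1024-byte heap allocation behind a reference-counted Bytes handle.
let mut chunk = Bytes::from(vec![0u8; 1024]);

// advance() only moves the start index within the shared allocation.
chunk.advance(1000);
assert_eq!(chunk.len(), 24); // the view shrank to 24 bytes...

// ...but all 1024 bytes stay resident until every handle into the
// allocation is dropped; advancing freed nothing.
```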

@geieredgar (Contributor, Author):

You are right that only advancing the Bytes should not shrink the allocation_size, but pushing a Buffer should, because we do not want to account for the same bytes twice. My mistake was thinking that I could simply adjust the allocation_size once, at the end of the de-duplication loop; now I see that this does not work when we do not push a new Buffer at every iteration step. I therefore moved the allocation_size adjustment into the loop, adjusting it only when we push a new Buffer.

@djc (Member):

But shrinking the allocation size is only warranted if we're accounting for the same allocations. A different but overlapping Buffer ("same bytes twice") is substantially more likely to originate from a different packet, and thus from a different allocation.

@geieredgar (Contributor, Author) commented Jan 29, 2021:

I'll try to clarify my thoughts a bit more:

```rust
if duplicate.start > offset {
    let over_allocation = (duplicate.end - duplicate.start) as usize;
    self.data.push(Buffer {
        offset,
        bytes: bytes.split_to((duplicate.start - offset) as usize),
        over_allocation,
    });
    self.over_allocation += over_allocation;
    allocation_size =
        allocation_size.saturating_sub((duplicate.end - offset) as usize);
    offset = duplicate.start;
}
```

Here we push a new Buffer because we need to close the gap up to the next duplicate. This new Buffer references the same allocation, and I use the size of the duplicate as the over-allocation of the new Buffer, because that is the number of bytes that are unused but still referenced and kept in memory by the Buffer. The reasons I associate an over-allocation with every Buffer are:

  • It could be the last Buffer we push
  • If we only keep track of over-allocation in a single Buffer, when that Buffer is later removed, we will underestimate the over-allocation caused by the remaining Buffers pushed by the insert call.

Because I account for the over-allocation in the pushed Buffer, I reduce the allocation size by the size of the Buffer and its over-allocation.

Another possible and sensible approach would be to calculate the over-allocation of every pushed Buffer as allocation_size.saturating_sub(bytes.len()) and never mutate allocation_size. The calculated over-allocation would then grow much larger if we have to fill many small gaps in the de-duplication loop.
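
(A sketch with made-up numbers contrasting the two schemes: one 1000-byte allocation fills three 100-byte gaps, each followed by a 200-byte duplicate run:)

```rust
let allocation_size: usize = 1000;
let fills = [(100usize, 200usize); 3]; // (gap length, following duplicate length)

// Alternative: charge allocation_size - bytes.len() per pushed Buffer
// and never mutate allocation_size.
let alternative: usize = fills
    .iter()
    .map(|(gap, _)| allocation_size.saturating_sub(*gap))
    .sum();
assert_eq!(alternative, 2700); // far more than the 1000 bytes actually allocated

// This PR: charge the duplicate run to the pushed Buffer and shrink the
// remaining allocation_size by gap + duplicate.
let mut remaining = allocation_size;
let mut charged = 0;
for (gap, dup) in fills {
    charged += dup;
    remaining = remaining.saturating_sub(gap + dup);
}
assert_eq!(charged, 600); // bounded by the real allocation size
```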

@djc (Member) commented Jan 28, 2021:

(Also please mention #981 in the commit message!)

@Matthias247 (Contributor):

I feel like any flag like defragmented or overallocation is rather challenging to maintain, since it replicates the state of the individual buffers but also needs to be in sync with those. If it isn't, then we get bugs like #982.

I would tackle this problem a bit differently than what is proposed here: I would just add a utilization field to each of the chunks in the queue. When inserting, I would loop through the chunks, check their utilization values, and if I find chunks with bad utilization (e.g. < 50%), merge those into a bigger chunk, but leave the others alone. There might be no need to defragment everything.

One important thing: I would not touch the utilization value if a part of the chunk is dequeued by the user. The reason for this is that if the user is already reading data, they will read the next data chunk very soon, and it won't stay around with poor utilization for very long. The danger there only exists for chunks which can't be read by the user yet, because there is a gap in the queue.

With that approach, another challenge is how to avoid iterating the complete list of fragments on each insert. Maybe the old defragmented field helps for that. Or keep it as an offset to the first gap (non-consecutive data), which would be the starting point for investigating defragmentation?
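
(A minimal sketch of this utilization-based merge; the Chunk layout, the 50% cutoff, and the helper are illustrative, not quinn's actual types:)

```rust
use bytes::{BufMut, Bytes, BytesMut};

struct Chunk {
    offset: u64,
    bytes: Bytes,
    allocation_size: usize, // size of the backing allocation
}

impl Chunk {
    fn utilization(&self) -> f64 {
        self.bytes.len() as f64 / self.allocation_size as f64
    }
}

/// Merge runs of adjacent, poorly utilized chunks into one right-sized
/// allocation; well-utilized chunks are left untouched.
fn merge_low_utilization(chunks: &mut Vec<Chunk>) {
    let flush = |run: &mut Vec<Chunk>, out: &mut Vec<Chunk>| {
        if run.is_empty() {
            return;
        }
        // Copy the run into a single allocation sized exactly to the
        // live data, reclaiming the poorly utilized backing buffers.
        let offset = run[0].offset;
        let total: usize = run.iter().map(|c| c.bytes.len()).sum();
        let mut buf = BytesMut::with_capacity(total);
        for c in run.drain(..) {
            buf.put_slice(&c.bytes);
        }
        out.push(Chunk { offset, bytes: buf.freeze(), allocation_size: total });
    };

    let mut out = Vec::with_capacity(chunks.len());
    let mut run: Vec<Chunk> = Vec::new();
    for chunk in chunks.drain(..) {
        let contiguous = run
            .last()
            .map_or(true, |p| p.offset + p.bytes.len() as u64 == chunk.offset);
        if chunk.utilization() < 0.5 {
            if !contiguous {
                flush(&mut run, &mut out); // can't merge across a gap
            }
            run.push(chunk);
        } else {
            flush(&mut run, &mut out);
            out.push(chunk);
        }
    }
    flush(&mut run, &mut out);
    *chunks = out;
}
```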

@geieredgar (Contributor, Author):

@Matthias247 I pushed a new commit that makes this approach a bit more similar to yours (it keeps the over_allocation field, but as you said, we probably need one field like defragmented or over_allocation to avoid iterating the list of fragments on every insert).

The defragment process first checks whether we can reduce over_allocation below the specified threshold by defragmenting only low-utilization Buffers (currently that translates to over_allocation > bytes.len()). If so, it defragments only those; otherwise it defragments all buffers with over_allocation > 0, which prevents an already defragmented Buffer from being defragmented again.
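
(A self-contained sketch of that selection logic; the over_allocation field and the over_allocation > bytes.len() test follow this PR, the rest is illustrative:)

```rust
struct Buffer {
    bytes: Vec<u8>, // stands in for the actual Bytes handle
    over_allocation: usize,
}

fn is_low_utilized(b: &Buffer) -> bool {
    b.over_allocation > b.bytes.len()
}

/// Pick which Buffers a defragment pass should rewrite.
fn defragment_targets(data: &[Buffer], total: usize, threshold: usize) -> Vec<usize> {
    // How much over-allocation would compacting only the
    // low-utilization Buffers reclaim?
    let reclaimable: usize = data
        .iter()
        .filter(|b| is_low_utilized(b))
        .map(|b| b.over_allocation)
        .sum();

    if total.saturating_sub(reclaimable) < threshold {
        // Compacting the low-utilization Buffers alone is enough.
        data.iter()
            .enumerate()
            .filter(|(_, b)| is_low_utilized(b))
            .map(|(i, _)| i)
            .collect()
    } else {
        // Otherwise rewrite every Buffer that still wastes memory;
        // over_allocation == 0 marks already-defragmented Buffers.
        data.iter()
            .enumerate()
            .filter(|(_, b)| b.over_allocation > 0)
            .map(|(i, _)| i)
            .collect()
    }
}
```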

@djc (Member) commented Jan 29, 2021:

> I feel like any flag like defragmented or overallocation is rather challenging to maintain, since it replicates the state of the individual buffers but also needs to be in sync with those. If it isn't, then we get bugs like #982.

That bug was due in part to the duplication of the read APIs, which I'm removing in #991. Yes, keeping these in sync is a bit tricky, but also has important benefits like not having to iterate over the chunks when deciding when to defragment, and after the changes in #991 isn't too hard IMO.

djc and others added 2 commits on January 29, 2021:

  • Add a has_pending_retransmits method to quinn_proto::connection::Connection and use it inside of quinn::RecvStream::poll_read_generic to decide if we should wake up the connection driver.
  • Add an over_allocation field to Buffer and Assembler to keep track of wasted memory per Buffer and per Assembler. Add an allocation_size parameter to Assembler::insert to estimate wasted memory per Buffer. Trigger defragmentation when over-allocation reaches 32k.
djc force-pushed the reading branch 2 times, most recently from 509bc8f to 41b1293 (January 30, 2021 08:34)
djc closed this on Jan 30, 2021
djc deleted the branch quinn-rs:reading on Jan 30, 2021 09:09
@djc (Member) commented Jan 30, 2021:

Uh, sorry for the automation closing this. Can you open a new one targeting main?

@geieredgar (Contributor, Author):

OK, I opened a new PR here: #1000.

geieredgar deleted the reading-981 branch on January 30, 2021 14:24