
minor perf: reduce fine grained buffer access when encoding HTTP1 headers #19115

Merged Jan 12, 2022 (30 commits)

Conversation

@wbpcode (Member) commented Nov 29, 2021

Commit Message: minor perf: reduce fine grained buffer access when encoding HTTP1 headers

Additional Description:

The HTTP1 codec performs a large number of buffer API calls while encoding HTTP headers, which introduces additional CPU overhead (3-4%). This PR implements a buffer helper as a cache to reduce direct buffer writes and improve overall performance.

Check this #19103 (comment) for some more related info.

This PR reverts #9825. At the same time, I also try my best to make the code simpler and easier to maintain.

Risk Level: High.
Testing: Waiting.
Docs Changes: N/A.
Release Notes: N/A.

Some benchmark results:

The total amount of data written in each test is the same, 64MB. The figure shows the performance of using different APIs and different basic write unit sizes.

In addFrags/{n}, n represents the number of units written by the addFragments API in batches. For example, when the write unit is 16 bytes, addFrags/3 means to use addFragments to write 3 strings of 16 bytes at one time.

Y-axis: nanoseconds to complete the 64MB of data writes; higher is slower.
X-axis: basic write unit size in bytes. add writes only one basic unit per call, while addFragments can write multiple units per call.

[benchmark charts]

From the graph, addFrags/5 gives the best performance in almost all cases.
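The batching effect being measured can be sketched in miniature (a standalone illustration, not Envoy's Buffer API; the function names here are made up for this sketch): writing fragments one call at a time pays per-call bookkeeping, while summing the sizes first and copying into pre-sized storage pays it once.

```cpp
#include <cstring>
#include <string>
#include <vector>

// One call per fragment: every call pays its own bookkeeping cost,
// like repeated Buffer::add() in the benchmark above.
std::string write_one_at_a_time(const std::vector<std::string>& frags) {
  std::string out;
  for (const auto& f : frags) {
    out.append(f); // one bookkeeping pass per fragment
  }
  return out;
}

// Batched: one size pass, one allocation, then raw copies. This is the
// shape of what addFrags/{n} does with n fragments per call.
std::string write_batched(const std::vector<std::string>& frags) {
  size_t total = 0;
  for (const auto& f : frags) {
    total += f.size(); // pass 1: compute exact size
  }
  std::string out;
  out.resize(total); // single allocation for the whole batch
  char* dst = out.data();
  for (const auto& f : frags) {
    std::memcpy(dst, f.data(), f.size()); // pass 2: plain copies
    dst += f.size();
  }
  return out;
}
```

Both produce identical bytes; the second only differs in how many times the buffer's bookkeeping runs, which is what the charts above measure.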

@jmarantz (Contributor) commented Nov 29, 2021

nice! Can you add both unit tests and a microbenchmark test to this PR?

You can see a nearby microbenchmark test in test/common/http/header_map_impl_speed_test.cc , bazel target //test/common/http:header_map_impl_speed_test , which you could copy into the http1 subdir to capture the performance improvement directly.

Thanks!

@wbpcode (Member, author) commented Nov 29, 2021

nice! Can you add both unit tests and a microbenchmark test to this PR?

You can see a nearby microbenchmark test in test/common/http/header_map_impl_speed_test.cc , bazel target //test/common/http:header_map_impl_speed_test , which you could copy into the http1 subdir to capture the performance improvement directly.

Thanks!

Got it. And thanks very much for all your kind and detailed suggestions. It's very helpful. (^_^)

const uint64_t basic_header_size = method->value().size() + (host ? host->value().size() : 0) +
(path ? path->value().size() : 0);

conn_buffer_helper.reserveBuffer(4096 + basic_header_size);
Review comment (Member):

Is there somewhere that ensures the size of the headers is less than 4096 bytes?

@wbpcode (Member, author) replied Nov 30, 2021:

We don't need to ensure that; we always reserve enough space for encoding:

void StreamEncoderImpl::encodeHeader(absl::string_view key, absl::string_view value) {
  ASSERT(!key.empty());

  const uint64_t header_size = key.size() + value.size() + 4;

  auto& conn_buffer_helper = connection_.bufferHelper();
  conn_buffer_helper.reserveBuffer(header_size);  // <-- reserve enough space for every header.

  conn_buffer_helper.writeToBuffer(key);
  conn_buffer_helper.writeToBuffer(':');
  conn_buffer_helper.writeToBuffer(' ');
  conn_buffer_helper.writeToBuffer(value);
  conn_buffer_helper.writeToBuffer(CRLF);

  bytes_meter_->addHeaderBytesSent(header_size);
}

@ggreenway (Contributor)

Reading through the history of this PR and the one you linked to, it sounds like the biggest contributor to the performance degradation is the repeated checking of watermarks, correct?

If that's the case, an alternative that would keep the code-cleanup of #9825 would be to add an API for temporarily disabling those checks (startTransaction()/stopTransaction()?).

If we choose to keep going down this path, we'll need to validate that this use of the buffer reservation API isn't resulting in a worse set of Slices in the buffer, which could also negatively affect performance. The call to reserve 4096 bytes will most likely always create a new Slice, and I'm not sure what the state of the buffer likely is before that call.

@alyssawilk (Contributor)

I hadn't been checking the flame graphs; I assumed it was fragmented allocations, not checking watermark bounds. That should be pretty cheap, or something we can address without adding back all the complexity. @wbpcode can you clue me in on what causes the gains?

@jmarantz (Contributor)

another thought is to add an API to add an ordered collection of string_views to a Buffer Instance.

That could iterate through the collection twice: once to get the total size so we can reserve enough contiguous space for it, and once to append all the chunks without intermediate checking. I would use some sort of templated method if possible (annoying because it's a virtual class...) so that the same code can work whether it is supplied a C array, std::vector, std::array, or whatever, meaning at the call site you could call buffer_->appendStrings({header, ": ", value, "\r"}); without having to allocate a temp vector. You might need a non-virtual wrapper with templated helpers.
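The two-pass idea above can be sketched roughly as follows (a hypothetical stand-in class, not Envoy's actual Buffer::Instance; BufferSketch and appendStrings are illustrative names):

```cpp
#include <initializer_list>
#include <string>
#include <string_view>

// Sketch of the proposed API: iterate the fragments once to compute the
// total size, reserve contiguous space once, then append each chunk
// without per-chunk capacity checks.
class BufferSketch {
public:
  void appendStrings(std::initializer_list<std::string_view> frags) {
    size_t total = 0;
    for (std::string_view f : frags) {
      total += f.size(); // pass 1: total size
    }
    data_.reserve(data_.size() + total); // one reservation for the batch
    for (std::string_view f : frags) {
      data_.append(f); // pass 2: append, no reallocation needed
    }
  }

  const std::string& data() const { return data_; }

private:
  std::string data_;
};
```

At a call site this would look like buffer.appendStrings({header, ": ", value, "\r\n"}); the braced list builds a small stack-backed array, with no heap allocation for a temporary vector.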

@jmarantz (Contributor)

/wait

@jmarantz (Contributor)

also I want to re-emphasize that it would be really helpful to get a repro of this issue using Nighthawk or even just a microbenchmark so we could make sure Envoy is operating in its sweet-spot, and that the proposed changes are making a realistic case go faster. I think this change probably is worth it but it'd be better to repro it.

@wbpcode (Member, author) commented Dec 1, 2021

@ggreenway Thanks, and here are some replies.

Reading through the history of this PR and the one you linked to, it sounds like the biggest contributor to the performance degradation is the repeated checking of watermarks, correct?

Nope. Although the repeated checking of watermarks brings some overhead, the callgrind data shows that the repeated buffer Add calls bring more.

[callgrind profile screenshot]

If we choose to keep going down this path, we'll need to validate that this use of the buffer reservation API isn't resulting in a worse set of Slices in the buffer, which could also negatively affect performance. The call to reserve 4096 bytes will most likely always create a new Slice, and I'm not sure what the state of the buffer likely is before that call.

We always maintain sequential writing. In other words, in the worst case, a slice is created at the end of the buffer and submitted. We only submit and create a new slice when the existing slice does not have enough space.

Judging from the test results, this PR's effect is good: the throughput has increased by about 4% (same config and conditions as in #19103).

@wbpcode (Member, author) commented Dec 1, 2021

@alyssawilk Thanks, and here are some replies.

Yes, a single watermark check is very cheap. But for 100,000 requests, that would be 6,200,000 watermark checks. And the repeated (6,200,000) buffer Add calls introduce the more obvious overhead. Add has fairly complicated logic, and it is not cost-effective to call it once just to write a character or a very short string.

[callgrind profile screenshot]

@wbpcode (Member, author) commented Dec 1, 2021

also I want to re-emphasize that it would be really helpful to get a repro of this issue using Nighthawk or even just a microbenchmark so we could make sure Envoy is operating in its sweet-spot, and that the proposed changes are making a realistic case go faster. I think this change probably is worth it but it'd be better to repro it.

I have tested the latest binary of this PR with wrk, and the throughput increased by about 4%. I can run some tests with Nighthawk this weekend, and I will also update this PR by then.

@jmarantz (Contributor) commented Dec 1, 2021

This makes sense; there's a lot of per-string-segment gook even in simple OwnedImpl::add().

@alyssawilk (Contributor)

hm yeah. While I wouldn't block this change, I'd be more inclined to address this by improving the buffer add functionality to be more efficient for small writes than by wrapping the HTTP/1.1 code this way. I feel like if we improve add we'll improve perf for the cases discussed, but also for things like a trickle attack, where I think the current fix wouldn't improve things?
cc @antoniovicente because I think he's looked into perf optimization here and may have thoughts as well.

@mattklein123 (Member)

FWIW, the reason the code was originally the way it was is exactly because of this type of benchmark. Way back in the day I measured the same thing and noticed the same type of perf delta by doing this type of batching at the codec level. Whether it's actually worth it or not for real world use cases to go back to some variant of what we had before is a different question.

@wbpcode (Member, author) commented Dec 2, 2021

@alyssawilk

I'd be more inclined to address by improving the buffer add functionality to be more efficient for small writes.

That would be great work. If necessary, I can also help at that time; I am now interested in most performance optimization issues. 😄

But having read the source code of OwnedImpl, I think this job is not easy. The current OwnedImpl::addImpl is implemented very well, and it will be very difficult to obtain a larger performance improvement. This is not because OwnedImpl is poorly written, but because the work OwnedImpl itself undertakes is very complicated.

In addition, this PR also alleviates the overhead introduced by the watermark check. Although that overhead is not huge, it shouldn't be ignored when the check is executed so frequently.

@antoniovicente antoniovicente self-assigned this Dec 2, 2021
@antoniovicente (Contributor)

I need to look at this more closely, but it seems that the story is that high-watermark checks add measurable overhead. I wonder if a mitigation would be to perform header writes to a new buffer and then move its contents to the output buffer. This way we eliminate all the high watermark checks in the critical section.
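The scratch-buffer mitigation described here could look roughly like this (toy types for illustration only; Envoy's real WatermarkBuffer and OwnedImpl differ): serialize all headers into a scratch buffer that does no watermark logic, then hand the bytes to the watermarked output buffer in a single move, so the high-watermark check runs once per message rather than once per write.

```cpp
#include <string>
#include <string_view>

// Scratch buffer: plain appends, no watermark bookkeeping at all.
struct ScratchBuffer {
  std::string bytes;
  void add(std::string_view s) { bytes.append(s); }
};

// Watermarked output buffer: the check runs only when data is moved in,
// i.e. once for the whole header block.
struct OutputBuffer {
  std::string bytes;
  int watermark_checks = 0;

  void move(ScratchBuffer& src) {
    bytes.append(src.bytes);
    src.bytes.clear();
    ++watermark_checks; // single check for the entire batch of writes
  }
};
```

As the later callgrind discussion notes, this removes the per-write watermark checks but still pays the per-fragment add cost on the scratch buffer, which is why it was not pursued in this PR.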

The change as is does appear very risky as indicated by the risk assessment in the PR description. #9825 attempted to eliminate crashes due to incorrect reservation sizes and this PR re-introduces some of those human error risks. As mentioned in earlier comments, it would be good to have a continuous loadtest that shows the performance benefits of a change like this one and allows us to compare those benefits with alternate implementations like the write to temporary OwnedImpl buffer and then move its contents to the output buffer.

@wbpcode (Member, author) commented Dec 2, 2021

@mattklein123

This PR tries to alleviate the fine-grained buffer write problem introduced by HTTP header encoding. I think this problem exists in all HTTP1 requests.

Of course, in different scenarios, the effect of this PR may be different. For example, in some very complex scenarios, the effect of this PR will not be obvious, because other modules may become a bottleneck. But this does not mean that this PR is useless, and it will at least not make things worse.

real world use cases

Regarding the meaning of this sentence, I am a little confused. 🤔
If "real world use cases" refers to a more realistic load, I can try to use Nighthawk to generate some different types of load to observe the effect of this PR.
If you are talking about some complex scenarios, such as enabling a large number of extensions, then I think that is another problem. For most extensions, users can at least choose to disable them. But the codec is a critical path that no user can avoid.

@wbpcode (Member, author) commented Dec 2, 2021

@antoniovicente Thanks for this comment.

... compare those benefits with alternate implementations like the write to temporary OwnedImpl buffer and then move its contents to the output buffer.

I think this won't help, because the frequent fine-grained OwnedImpl::add calls introduce more overhead.

[callgrind profile screenshot]

#9825 attempted to eliminate crashes due to incorrect reservation sizes and this PR re-introduces some of those human error risks.

🤔 Can you give some more info about this problem? Then we can try to solve it in this PR. #9825 has no detailed description of the problem.

@antoniovicente (Contributor), replying to the question above about #9825:

PR #9825 is a defense-in-depth fix for GHSA-gxvv-x4p2-rppp, which was previously fixed by b3f42a4.

I can try to implement the idea that I proposed above, but I'll need some help on the profiles front. I assume we don't have a micro benchmark that covers serialization of HTTP/1 request/response to output buffers in cases with a small/medium/large number of headers. At least part of the problem is the WatermarkBuffer implementation of checkHighAndOverflowWatermarks being called on every add operation, your benchmark shows this.

@wbpcode (Member, author) commented Dec 2, 2021

@antoniovicente

At least part of the problem is the WatermarkBuffer implementation of checkHighAndOverflowWatermarks being called on every add operation, your benchmark shows this.

Yes, but this part is not as significant as the problems caused by frequent fine-grained OwnedImpl::add calls. And using a new buffer and finally submitting its content to the output buffer would not make the code much simpler than this PR. 🤔

However, if we can optimize the entire OwnedImpl, the value should be far greater than this PR's, because all modules that use OwnedImpl could then benefit from it, not just the codec.

I assume we don't have a micro benchmark that covers serialization of HTTP/1 request/response to output buffers in cases with a small/medium/large number of headers.

I want to create a simple test set for continuous tracking of #19103, but it needs to wait until this weekend. At that time, I can share it with you. 😄

@antoniovicente (Contributor)

I look forward to more data. I have some optimizations in mind, but work on them has been blocked on there being more repeatable Envoy OSS loadtests.

@antoniovicente (Contributor) left a comment

Sorry for the delayed response, my holiday vacation was busier than expected.

*dst += v.size();
};

total_size_to_write = (absl::string_view(args).size() + ...);
Review comment (Contributor):

I think that the multiple conversions of args to string_view on lines 478 and 482 may have some performance impact in the case where args contains "const char*". I think that it may be possible to address this issue by changing the addFragments signature to:

template size_t addFragments(absl::string_view&&... args) {

@wbpcode (Member, author) replied:

Variadic template parameters do not seem to support this. Otherwise we would need to modify our interface to accept a container, but that may involve additional memory allocation and copying. 🤔

@jmarantz (Contributor) commented Jan 6, 2022

Yeah that might make sense. It would be a little less syntactically convenient, but you don't need to use dynamic arrays. You can just use a begin/end ptr/size at the interface, and write things like:

const absl::string_view args[] = {name, ": ", value, "\r\n"};
addFragments(args, args + arraysize(args));

or

addFragments(args, arraysize(args));

this would also work for std::vector if that's more convenient, at the cost of some allocs:

std::vector<absl::string_view> args{name, ": ", value, "\r\n"};
addFragments(&args[0], &args[0] + args.size());

or

addFragments(&args[0], args.size());

@wbpcode (Member, author) replied Jan 6, 2022:

I can do a quick test for this, because we need to consider the cost of building the container itself. 🤔

Or we can just ensure that all args passed to the original addFragments have data() and size() methods. That would be a little less syntactically convenient because the caller would need to convert char* to absl::string_view itself.

Review comment (Contributor):

I don't think you'll need to allocate any memory for the first option, a C Array like that will just extend the stack.

@wbpcode (Member, author) replied Jan 6, 2022:

WDYT of having the model for HTTP1 where you collect a single vector of Spans for all the fragments of all the headers, and use a single addSpans call that does the watermark check on the aggregated size? Does that simplify things?

Then we need to create a vector to store all the key/value string views. Although frequent watermark checks bring some overhead, I think copying all the string views to the heap (64 bytes for a single header) may bring more.

In addition, I haven't figured out how to deal with the situation where a large number of fragments are written at the same time in a single addFragments call. If the back slice overflows, we downgrade to using addImpl.

Although our performance is definitely not worse than repeated add(), there is not much performance gain when overflow occurs.

Review comment (Contributor):

RE copying string-views; that's pretty small (16 bytes per).
RE heap: we have a limited number of headers, right? You could put a string_view[MAX_HEADERS] on the stack; no need to heap it.

RE add vs addFragments: as discussed in previous comments I'm suggesting we not have the policy of bailing into add mode when you spill out of the back slice. Instead, with the giant array of fragments, you can figure out before adding anything exactly how many bytes are needed, set up the slices as required, and do the watermarking once. I think it will be a cleaner/faster model.
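The single-pass model proposed here can be sketched with a standalone stand-in (not Envoy's OwnedImpl; SpanBufferSketch and its members are illustrative names): compute the exact byte count from the whole fragment batch before writing anything, size the storage once, copy, and run the watermark check a single time.

```cpp
#include <string>
#include <string_view>
#include <vector>

// Stand-in for the span-based addFragments model: one size computation,
// one storage setup, one watermark check per batch.
class SpanBufferSketch {
public:
  void addFragments(const std::vector<std::string_view>& frags) {
    size_t total = 0;
    for (auto f : frags) {
      total += f.size(); // exact bytes needed, known before any write
    }
    data_.reserve(data_.size() + total); // set up storage as required, once
    for (auto f : frags) {
      data_.append(f); // append without intermediate checks
    }
    checkWatermarks(); // single check on the aggregated size
  }

  const std::string& data() const { return data_; }
  size_t watermarkChecks() const { return checks_; }

private:
  void checkWatermarks() { ++checks_; } // counts checks for illustration
  std::string data_;
  size_t checks_ = 0;
};
```

Encoding a whole request line or header this way costs one watermark check per batch, regardless of how many fragments it contains.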

@wbpcode (Member, author) replied Jan 6, 2022:

For HTTP1, if we modify the codec to the mode you mentioned, the benefits may not be particularly significant. The overhead of watermark check is not very large, and in the current implementation, we have reduced a lot (current overhead is about 1/4 of the original). And for HTTP1, back slice overflow does not happen frequently.

Instead, with the giant array of fragments, you can figure out before adding anything exactly how many bytes are needed, set up the slices as required, and do the watermarking once. I think it will be a cleaner/faster model.

This suggestion is valuable. When the back slice does not have enough contiguous memory, the remaining memory in the back slice can be filled first. After that, for a large amount of unwritten data, contiguous memory can be allocated. In this way addFragments can handle a large amount of fragmented data. 🤔

I think we can add a TODO for this. Then I will optimize it in subsequent PRs. This PR has really taken too long. orz

Review comment (Contributor):

It's not just an optimization tweak -- it's a much simpler model, isn't it?

With the API that you have you have to have the caller do addFragments calls, and then end with an add(). That's potentially error-prone and I believe @ggreenway has commented about this.

Moreover the current API you have, as @antoniovicente observed, may have excess conversions from const char* to string_view which would result in extra calls to strlen.

With the Span API you can make the interface much simpler and more performant. So although I know this PR has gone on for a while, we iterated quite a bit and learned a lot and got some really good data. We are trying to make Envoy fast and safe, and it's very useful to do this well.

@wbpcode (Member, author) replied Jan 6, 2022:

@jmarantz I am a little confused. 🤣 I have updated this PR to use the new absl::Span<absl::string_view> API, because, as you said, it's cleaner, simpler, and even faster. So the problems raised by @ggreenway and @antoniovicente have been solved. 🤣 Did you notice the latest commit?

The optimization I mentioned above refers to the special case where the back slice has no contiguous memory. We can still make some more perf improvements in that case.

if (slices_.empty()) {
slices_.emplace_back(Slice(size, account_));
length_ += size;
slices_.back().reservable_ += size;
Review comment (Contributor):

The "length_ += size;" and "slices_.back().reservable_ += size;" in here are a bit odd. inlineReserve is not only reserving, but also committing bytes to the buffer before the caller of inlineReserve has written the bytes to the appropriate buffer.

I take it that this is an optimization. I think that the method name should be changed to reflect this reserve+commit semantic that your implementation depends on.

@wbpcode (Member, author) replied:

get it.

Signed-off-by: wbpcode <[email protected]>
jmarantz previously approved these changes Jan 6, 2022

@jmarantz (Contributor) left a comment

this looks great! Sorry I missed the last awesome commit.

Just some small nits but I think this is ready for Alyssa.

slices_.emplace_back(Slice(total_size_to_write, account_));
}

Slice& back = slices_.back();
Review comment (Contributor):

Add TODO to improve performance as discussed in the comments, rather than the downgrade strategy we have here? You can add here the detail about contiguous memory also, that you provided in that comment thread.

@wbpcode (Member, author) replied:

get it.

source/common/buffer/buffer_impl.cc (outdated review thread, resolved)
@alyssawilk (Contributor) left a comment

This looks absolutely fantastic! Thanks for all your patience and iteration @wbpcode and reviewers!

couple of questions from me below, and could you take a look at the PR description and see if it's still up to date?
@jmarantz : as you've done most of the reviewing I'd appreciate you weighing in on your comfort level landing this before we cut the release next week

envoy/buffer/buffer.h (outdated review thread, resolved)
source/common/buffer/watermark_buffer.h (outdated review thread, resolved)
@@ -231,7 +227,7 @@ void StreamEncoderImpl::encodeHeadersBase(const RequestOrResponseHeaderMap& head
}
}

connection_.addToBuffer(CRLF);
connection_.buffer().add(CRLF);
Review comment (Contributor):

Is this newly added comment still true? Reading the code it looks like watermarks are applied now

source/common/http/http1/codec_impl.cc (review thread, resolved)
@jmarantz (Contributor) left a comment

I think this deserves an approval from @antoniovicente &/or @KBaichoo as they have worked more in the buffer code than I do. At a high level this direction makes sense to me but this code is at the heart of things and deserves another pass from them.

envoy/buffer/buffer.h (outdated review thread, resolved)
@KBaichoo (Contributor) commented Jan 7, 2022

Thanks for the patience @wbpcode


class OwnedImpl;
private:
friend OwnedImpl;
Review comment (Contributor):

Do we need to make OwnedImpl a friend? I think we can get away with using the public interface via reservation, commit to the slice in https://github.com/envoyproxy/envoy/pull/19115/files#diff-0fdef7bd42dfa48ed5a015ddb85be158a514d61a82a5a17c1cfa34c6d744e5f8R604-R610

Review comment (Contributor):

Can this be removed now that we use the public interface? Thanks!

Comment on lines 604 to 610
uint8_t* mem = back.base_ + back.reservable_;
for (const auto& fragment : fragments) {
  memcpy(mem, fragment.data(), fragment.size()); // NOLINT(safe-memcpy)
  mem += fragment.size();
}
length_ += total_size_to_write;
back.reservable_ += total_size_to_write;
Review comment (Contributor):

Can we just use the existing API on Slice to create a reservation via reserve() memcpy the fragments into that and then do back.commit(reservation)?

@wbpcode (Member, author) replied Jan 9, 2022:

Then there would be some more unnecessary checks. 🤔 It's a minor update; I can try it and check the benchmark result.

@wbpcode (Member, author):

Looks like it is OK.

@@ -231,7 +227,7 @@ void StreamEncoderImpl::encodeHeadersBase(const RequestOrResponseHeaderMap& head
}
}

connection_.addToBuffer(CRLF);
connection_.buffer().add(CRLF);
Review comment (Contributor):

This is no longer true and should be reverted

test/common/buffer/buffer_test.cc (review thread, resolved)
source/common/buffer/buffer_impl.h (outdated review thread, resolved)
@wbpcode (Member, author) commented Jan 10, 2022

/retest

@repokitteh-read-only

Retrying Azure Pipelines: retried failed jobs in envoy-presubmit. (Caused by a #19115 comment from @wbpcode.)

@wbpcode (Member, author) commented Jan 10, 2022

@KBaichoo Hi, I have updated this PR; can you take another look when you have free time? 😄

@KBaichoo (Contributor) left a comment

Looks great otherwise :)


class OwnedImpl;
private:
friend OwnedImpl;
Review comment (Contributor):

Can this be removed now that we use the public interface? Thanks!

back.commit<false>(reservation);
length_ += total_size_to_copy;
} else {
// Downgrade to using `addImpl` for not enough memory in the back slice.
Review comment (Contributor):

s/for/if


auto slice_vec = buffer.getRawSlices();

EXPECT_EQ(slice_vec.size(), 5);
Review comment (Contributor):

@wbpcode (Member, author) replied:

get it.

@@ -17,6 +17,8 @@
namespace Envoy {
namespace Buffer {

class OwnedImpl;
Review comment (Contributor):

Can remove the forward declaration since we no longer need to friend the class.

@wbpcode (Member, author) replied:

right.

Signed-off-by: wbpcode <[email protected]>
@KBaichoo (Contributor)

/retest

@repokitteh-read-only

Retrying Azure Pipelines: retried failed jobs in envoy-presubmit. (Caused by a #19115 comment from @KBaichoo.)

@wbpcode (Member, author) commented Jan 11, 2022

/retest

@repokitteh-read-only

Retrying Azure Pipelines: retried failed jobs in envoy-presubmit. (Caused by a #19115 comment from @wbpcode.)

@KBaichoo (Contributor) left a comment

LGTM

@antoniovicente (Contributor) left a comment

Thanks for the optimization.

@alyssawilk (Contributor) left a comment

Awesome! I'm inclined to wait until after the release tomorrow to merge unless someone objects, as it's a pretty major change to some very core code.

@alyssawilk alyssawilk dismissed ggreenway’s stale review January 12, 2022 19:52

from months ago, stale I think

@alyssawilk alyssawilk merged commit 02dff6e into envoyproxy:main Jan 12, 2022
joshperry pushed a commit to joshperry/envoy that referenced this pull request Feb 13, 2022
…ders (envoyproxy#19115)

Commit Message: minor perf: reduce fine grained buffer access when encoding HTTP1 headers

Additional Description:
The HTTP1 codec performs a large number of buffer API calls during the encoding of HTTP headers, which introduces additional CPU overhead (3-4%). This PR implements a buffer helper as a cache to reduce direct buffer writes and improve overall performance.
Check this envoyproxy#19103 (comment) for some more related info.
This PR reverts envoyproxy#9825. At the same time, I also try my best to make the code simpler and easier to maintain.

Risk Level: High.
Testing: Waiting.
Docs Changes: N/A.
Release Notes: N/A.

Signed-off-by: wbpcode <[email protected]>
Signed-off-by: Josh Perry <[email protected]>