[persist] refactor Blob impl for Azure for higher performance #31127
* move fetching each chunk of a Part into a tokio::task
* reduce copying in the case we get an invalid content-length header
* add metrics for tracking the number of responses missing content-length
0f4186d to 8cc2778
// valuable.
let mut stream = blob.get().into_stream();

while let Some(value) = stream.next().await {
Could we `map`/`buffered` here instead of spinning up individual tasks? A bit closer to the S3 impl (which does not fork off individual tasks) and makes it easier to cap the concurrency per fetch...
Good call! Refactored to use `FuturesOrdered`, like the S3 blob impl.
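To make the trade-off in this thread concrete, here is a minimal, std-only sketch of "cap the concurrency, consume results in request order". It is not the PR's code: the real implementation uses async streams, while this sketch swaps in scoped threads so it compiles without external crates. `fetch_chunk`, `fetch_ordered`, and `max_in_flight` are hypothetical names, and the fixed-size batching here is simpler than the sliding window an ordered async buffer would give.

```rust
use std::thread;

/// Hypothetical stand-in for one ranged GET: returns the bytes of chunk `idx`.
fn fetch_chunk(idx: usize) -> Vec<u8> {
    vec![idx as u8; 4]
}

/// Fetch `num_chunks` chunks with at most `max_in_flight` running at once,
/// returning the concatenated bytes in request order.
fn fetch_ordered(num_chunks: usize, max_in_flight: usize) -> Vec<u8> {
    assert!(max_in_flight > 0, "need at least one fetch in flight");
    let indices: Vec<usize> = (0..num_chunks).collect();
    let mut out = Vec::new();

    // Process the chunk indices in batches of at most `max_in_flight`.
    for batch in indices.chunks(max_in_flight) {
        thread::scope(|s| {
            // Start every fetch in the batch before waiting on any of them.
            let handles: Vec<_> = batch
                .iter()
                .map(|&idx| s.spawn(move || fetch_chunk(idx)))
                .collect();
            // Join in spawn order, so the output order matches request order
            // regardless of which fetch finishes first.
            for handle in handles {
                out.extend(handle.join().expect("fetch thread panicked"));
            }
        });
    }
    out
}

fn main() {
    let bytes = fetch_ordered(5, 2);
    println!("fetched {} bytes", bytes.len());
}
```

The point of the design is that the cap lives in one place (the batch size) rather than being implied by how many tasks happen to be spawned.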
src/persist/src/azure.rs
Outdated
    .lgbytes
    .persist_azure
    .new_region(usize::cast_from(content_length));
PreSizedBuffer::Sized(region)
}
0 => PreSizedBuffer::Unknown(Vec::new()),
0 => {
    metrics.get_invalid_resp.inc();
The S3 metrics say that a "content-length of 0 isn't necessarily invalid", which makes sense to me. Could we inc this only if the size turns out to not match the header?
(Generally I'm not convinced of the need to have this defensive coding here, so it'd be handy if this metric fired only in the cases that it was actually load-bearing!)
Ahhh yeah you're totally right, updated!
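A rough sketch of the behavior suggested in this thread, under stated assumptions: the buffer is pre-sized from the header when it is non-zero, and the counter is bumped only when the bytes actually received disagree with the header. This `PreSizedBuffer` is a simplified, hypothetical stand-in backed by a `Vec` (the real code allocates a region through `lgbytes`), and `read_body` and `mismatch_count` are illustrative names rather than the PR's API.

```rust
/// Simplified stand-in for the PR's `PreSizedBuffer` (assumption: backed by
/// `Vec<u8>` here instead of an lgalloc region).
enum PreSizedBuffer {
    /// The header gave a non-zero length, so allocate up front.
    Sized(Vec<u8>),
    /// The header was 0 or missing, so grow on demand.
    Unknown(Vec<u8>),
}

impl PreSizedBuffer {
    fn for_content_length(content_length: usize) -> Self {
        match content_length {
            // A content-length of 0 isn't necessarily invalid, so no metric here.
            0 => PreSizedBuffer::Unknown(Vec::new()),
            n => PreSizedBuffer::Sized(Vec::with_capacity(n)),
        }
    }

    fn push(&mut self, chunk: &[u8]) {
        match self {
            PreSizedBuffer::Sized(buf) | PreSizedBuffer::Unknown(buf) => {
                buf.extend_from_slice(chunk)
            }
        }
    }

    fn into_bytes(self) -> Vec<u8> {
        match self {
            PreSizedBuffer::Sized(buf) | PreSizedBuffer::Unknown(buf) => buf,
        }
    }
}

/// Collect a response body, counting a mismatch only when the number of bytes
/// received differs from what the header claimed.
fn read_body(content_length: usize, chunks: &[&[u8]], mismatch_count: &mut u64) -> Vec<u8> {
    let mut buffer = PreSizedBuffer::for_content_length(content_length);
    for &chunk in chunks {
        buffer.push(chunk);
    }
    let bytes = buffer.into_bytes();
    if bytes.len() != content_length {
        *mismatch_count += 1;
    }
    bytes
}

fn main() {
    let mut mismatches = 0u64;
    let chunks: [&[u8]; 2] = [b"ab", b"cd"];
    let body = read_body(0, &chunks, &mut mismatches);
    println!("read {} bytes, {} mismatches", body.len(), mismatches);
}
```

With this shape, an empty blob with a header of 0 does not fire the metric, while a header of 0 followed by a non-empty body does.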
* remove metrics counting
* use FuturesOrdered instead of tokio::task
…es not match content-length
This refactors the impl of `Blob` for Azure in a way that should be faster. The `BlobClient` we use from the `azure_storage_blob` crate returns a `Stream` that when `await`-ed sends a ranged GET request for a chunk of a blob. This PR refactors our impl so we await each ranged request in a `tokio::task`, which increases the concurrency at which we fetch chunks of a `Part`.

It also refactors how we handle the case when the `content-length` header is missing, and adds metrics so we can track how often this occurs.

Motivation
Maybe progress against https://github.com/MaterializeInc/database-issues/issues/8892
Checklist
`$T ⇔ Proto$T` mapping (possibly in a backwards-incompatible way), then it is tagged with a `T-proto` label.