Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[persist] refactor Blob impl for Azure for higher performance #31127

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

ParkMyCar
Copy link
Member

@ParkMyCar ParkMyCar commented Jan 21, 2025

This refactors the impl of Blob for Azure in a way that should be faster. The BlobClient we use from the azure_storage_blob crate returns a Stream that when await-ed sends a ranged GET request for a chunk of a blob. This PR refactors our impl so we await each ranged request in a tokio::task which increases the concurrency at which we fetch chunks of a Part.

It also refactors how we handle the case when the content-length header is missing, and adds metrics so we can track how often this occurs.

Motivation

Maybe progress against https://github.com/MaterializeInc/database-issues/issues/8892

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
  • If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.

@ParkMyCar ParkMyCar changed the title [persist] slightly different impl for Azure blob store [persist] less mem-copying in impl for Azure blob store Jan 21, 2025
@ParkMyCar ParkMyCar changed the title [persist] less mem-copying in impl for Azure blob store [persist] less mem-copying in impl for Azure blob store, in the worst case Jan 21, 2025
* move fetching each chunk of a Part into a tokio::task
* reduce copying in the case we get an invalid content-length header
* add metrics for tracking the number of responses missing content-length
@ParkMyCar ParkMyCar changed the title [persist] less mem-copying in impl for Azure blob store, in the worst case [persist] refactor Blob impl for Azure for higher performance Jan 21, 2025
@ParkMyCar ParkMyCar marked this pull request as ready for review January 21, 2025 18:04
@ParkMyCar ParkMyCar requested a review from a team as a code owner January 21, 2025 18:04
@ParkMyCar ParkMyCar requested review from pH14 and bkirwi January 21, 2025 18:08
// valuable.
let mut stream = blob.get().into_stream();

while let Some(value) = stream.next().await {
Copy link
Contributor

@bkirwi bkirwi Jan 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we map/buffered here instead of spinning up individual tasks? A bit closer to the S3 impl (which does not fork off individual tasks) and makes it easier to cap the concurrency per fetch...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good call! refactored to use FuturesOrdered like the S3 blob impl

.lgbytes
.persist_azure
.new_region(usize::cast_from(content_length));
PreSizedBuffer::Sized(region)
}
0 => PreSizedBuffer::Unknown(Vec::new()),
0 => {
metrics.get_invalid_resp.inc();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The S3 metrics say that a "content-length of 0 isn't necessarily invalid", which makes sense to me. Could we inc this only if the size turns out to not match the header?

(Generally I'm not convinced of the need to have this defensive coding here, so it'd be handy if this metric fired only in the cases that it was actually load-bearing!)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahhh yeah you're totally right, updated!

* remove metrics counting
* use FuturesOrdered instead of tokio::task
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants