Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[persist] Structured file format #31080

Open
wants to merge 11 commits into
base: main
Choose a base branch
from

Conversation

bkirwi
Copy link
Contributor

@bkirwi bkirwi commented Jan 16, 2025

Motivation

Adds a new on-disk file format for Persist - the non-dual-write structured-data-only version.

We'd like this format to be supported in our on-prem releases, so folks aren't stuck with the migration midpoint.

Tips for reviewer

I've intentionally left this off in CI, out of the parallel workload. For our existing cloud envs I think it's most important to verify that the (substantial) refactorings here don't cause any regression for existing flag settings. However, it is enabled in parallel workload so we should be able to spot any bugs. (And empirically those tests are good at finding bugs in this corner of Persist.)

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
  • If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.

@bkirwi bkirwi force-pushed the structured-on-disk-2 branch 2 times, most recently from ef767c6 to 60af8bb Compare January 17, 2025 19:25
@bkirwi bkirwi marked this pull request as ready for review January 17, 2025 19:47
@bkirwi bkirwi requested review from a team as code owners January 17, 2025 19:47
@def-
Copy link
Contributor

def- commented Jan 17, 2025

I've intentionally left this off in CI, out of the parallel workload. For or existing cloud envs I think it's most important to verify that the (substantial) refactorings here don't cause any regression for existing flag settings. However, it is enabled in parallel workload so we should be able to spot any bugs. (And empirically those tests are good at finding bugs in this corner of Persist.)

One risk here is that what parallel workloads finds is usually harder to reproduce than other tests. I'll trigger a one-off run at least to see what would happen if we default all of CI to structured. Edit: Done: #31101

@bkirwi
Copy link
Contributor Author

bkirwi commented Jan 17, 2025

Aight! Note that feature benchmarks and upgrade tests are very likely to fail; unsure about the rest of them.

@def-
Copy link
Contributor

def- commented Jan 17, 2025

That kind of failure is ok. I'm more hoping to catch some wrong result or panic.

Copy link
Contributor

@def- def- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No wrong results, no panics, just some timeouts, and the expected upgrade errors. So all good from my side

@bkirwi bkirwi force-pushed the structured-on-disk-2 branch from 43bc2c2 to e60157d Compare January 21, 2025 15:03
@bkirwi
Copy link
Contributor Author

bkirwi commented Jan 21, 2025

Found and fixed the performance regression - this should be ready for review.

(Nightly run with clean benchmarks. The only failure is in data ingest, which I think is the recent Azure issue?)

@def-
Copy link
Contributor

def- commented Jan 21, 2025

(Nightly run with clean benchmarks. The only failure is in data ingest, which I think is the recent Azure issue?)

Correct, please ignore.

@bkirwi bkirwi requested a review from ParkMyCar January 21, 2025 20:48
Copy link
Member

@ParkMyCar ParkMyCar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WOOHOO!

src/persist-client/src/fetch.rs Show resolved Hide resolved
let next = if self.part_cursor < self.timestamps.len() {
let next_idx = self.part_cursor;
self.part_cursor += 1;
let mut t = T::decode(self.timestamps.values()[next_idx].to_le_bytes());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probably fine, but I'm a bit surprised to see a .to_le_bytes() here? Feels like a bit of an abstraction leakage

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a comment about this... it's an artifact of not always reading data from ColumnarRecords, where this code used to live. I've left a comment that we should re-encapsulate it once we're all in on structured data, if that feels like enough to cover it?

src/persist-client/src/fetch.rs Show resolved Hide resolved
src/persist-client/src/fetch.rs Outdated Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants