Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add no_sync write_lp param for fast writes #25902

Merged
merged 1 commit into from
Jan 24, 2025

Conversation

mgattozzi
Copy link
Contributor

In cases where a user does not need the guarantees that data is persisted to the WAL on write and needs faster ingest speed then the no_sync param added in this commit are what they need. Rather than waiting on a sync to the WAL a task to do so is spawned and the code continues executing to return a successful HTTP code to the user.

The upside to this is that they can ingest data faster, but there is a small risk that between writing the data and it eventually being written to object storage, that the server crashes and it's irrevocably lost. Also if the write to the WAL fails, then at most the user will get an error printed in the logs rather than a failed response code they can handle. The data will still be in the buffer, but will not be durable until persisted as a parquet file in this case.

However, in many cases that might be acceptable. This commit expands on what's possible so that the user can use InfluxDB Core the way that works best for their workload.

Note that this feature is only added for the /api/v3/write_lp endpoint. The legacy endpoints for writing can take the parameter, but won't do anything with it at all.

Closes #25597

self.wal.write_ops(ops).await?;
if no_sync {
let wal = Arc::clone(&self.wal);
tokio::spawn(async move {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't require a spawned task here. There's already a function on the WAL called buffer_op_unconfirmed (see https://github.com/influxdata/influxdb/blob/main/influxdb3_wal/src/lib.rs#L70). Just update this to instead take a vec and you'll be able to do this without a spawned task. It puts it into a mutex'd vec that gets used when the flush happens.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For additional context, this was described in the original issue as well: #25597 (comment)

@mgattozzi mgattozzi force-pushed the mgattozzi/to-sync-or-not-sync branch 2 times, most recently from b1dfdc3 to 9645fd9 Compare January 24, 2025 15:52
@mgattozzi mgattozzi requested a review from pauldix January 24, 2025 15:52
@mgattozzi mgattozzi force-pushed the mgattozzi/to-sync-or-not-sync branch 2 times, most recently from b6b35e2 to 8324f56 Compare January 24, 2025 17:15
Copy link
Member

@pauldix pauldix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one question, otherwise LGTM

@@ -905,419 +904,6 @@ mod tests {
use std::any::Any;
use tokio::sync::oneshot::Receiver;

#[tokio::test(flavor = "multi_thread", worker_threads = 2)]
async fn write_flush_delete_and_load() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why remove this test? If covered elsewhere, probably better to have it gone as it's likely to be brittle given it's down in the weeds of creating the contents manually. Would definitely be better to have a refactored one that uses test helpers that build LP into batches, etc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't intentionally remove this test. In fact I don't even remember doing so. I'll put that back in. It should not have been removed

In cases where a user does not need the guarantees that data is
persisted to the WAL on write and needs faster ingest speed then the
no_sync param added in this commit are what they need. Rather
than waiting on a sync to the WAL we write to the buffer without
confirming that writes have made it to the WAL.

The upside to this is that they can ingest data faster, but there is a
small risk that between writing the data and it eventually being written
to object storage, that the server crashes and it's irrevocably lost.
Also if the write to the WAL fails, then at most the user will not get a
failed response code they can handle. The data will still be in the
buffer, but will not be durable until persisted as a parquet file in
this case.

However, in many cases that might be acceptable. This commit expands on
what's possible so that the user can use InfluxDB Core the way that
works best for their workload.

Note that this feature is only added for the /api/v3/write_lp endpoint.
The legacy endpoints for writing can take the parameter, but won't do
anything with it at all.

Closes #25597
@mgattozzi mgattozzi force-pushed the mgattozzi/to-sync-or-not-sync branch from 8324f56 to 32f027a Compare January 24, 2025 18:18
@mgattozzi mgattozzi merged commit 43e186d into main Jan 24, 2025
13 checks passed
@mgattozzi mgattozzi deleted the mgattozzi/to-sync-or-not-sync branch January 24, 2025 18:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add write API endpoint to write data unsync'd
3 participants