feat: loosen 72 hour query/write restriction #25890

mgattozzi · 2025-01-21T17:47:51Z

This commit does a few key things:

- Removes the 72 hour query and write restrictions in Core
- Limits the queries to a default number of parquet files. We chose 432 as this is about 72 hours using default settings for the gen1 timeblock
- The file limit can be increased, but the help text and error message when exceeded note that query performance will likely be degraded as a result.
- We warn users to use smaller time ranges if possible if they hit this query error

With this we eliminate the hard restriction we have in place and instead create a soft one, which users can raise if they choose to take the performance hit. If they can't take that hit, then it's recommended that they upgrade to Enterprise, which has the compactor built in to make historical queries performant.
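The arithmetic behind 432 and the shape of the soft limit can be sketched as follows. This is a minimal illustration, not the code from this PR: `DEFAULT_QUERY_FILE_LIMIT`, `TooManyFiles`, `files_for_window`, and `check_file_limit` are hypothetical names, and the 10 minute gen1 duration is assumed to be the default setting.

```rust
// Hypothetical sketch; names are illustrative and not taken from the PR.

/// Assumed default: 72 hours of 10 minute gen1 blocks, one parquet file each.
const DEFAULT_QUERY_FILE_LIMIT: usize = 432;

/// Error returned when a query would touch more files than the limit allows.
#[derive(Debug, PartialEq)]
struct TooManyFiles {
    found: usize,
    limit: usize,
}

/// Number of gen1 blocks (and so parquet files) needed to cover a time window.
fn files_for_window(window_minutes: usize, gen1_minutes: usize) -> usize {
    // Round up so a partially covered block still counts as one file.
    (window_minutes + gen1_minutes - 1) / gen1_minutes
}

/// Apply the soft limit: `None` falls back to the single default definition.
fn check_file_limit(file_count: usize, configured: Option<usize>) -> Result<(), TooManyFiles> {
    let limit = configured.unwrap_or(DEFAULT_QUERY_FILE_LIMIT);
    if file_count > limit {
        Err(TooManyFiles { found: file_count, limit })
    } else {
        Ok(())
    }
}
```

Under these assumptions, `files_for_window(72 * 60, 10)` evaluates to 432, `check_file_limit(433, None)` returns an error, and `check_file_limit(433, Some(1000))` passes because the user has raised the limit.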

praveen-influx

Looks good - just a minor comment/suggestion.

praveen-influx · 2025-01-21T17:58:22Z

influxdb3/src/commands/serve.rs

+    /// query performance will likely suffer as a result. It would be better to specify smaller time
+    /// ranges if possible in a query
+    #[clap(long = "file-limit", env = "INFLUXDB3_FILE_LIMIT", action)]
+    pub file_limit: Option<usize>,


Just a minor thing: could setting 432 directly as the default here have worked?

I'd still have to pass it in for testing purposes so I kept it as an Option so that there would only be one place that the default would need to be defined!

Can we rename to query-file-limit? I think it's better to be more descriptive about what this file limit is about.

hiltontj

Seems good - just commented on the dbg! macro invocations. Does the compiler make those a no-op in release build? I thought it still results in stdout.

hiltontj · 2025-01-21T18:26:00Z

influxdb3_write/src/lib.rs

@@ -570,6 +567,7 @@ impl BufferFilter {
                    < analysis.boundaries.len())
                .then_some(analysis.boundaries.remove(time_col_index))
                {
+                    dbg!(&interval);


Should these dbg!s be left in?

No, I thought I had taken them out!
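On the release-build question: per the standard library documentation, `dbg!` behaves the same in release builds as in debug builds, and it writes to stderr rather than stdout, so leftover calls still produce output. A small sketch (the `widen` helper is hypothetical and not code from this PR):

```rust
/// Hypothetical helper, for illustration only.
fn widen(interval: (i64, i64), by: i64) -> (i64, i64) {
    let widened = (interval.0 - by, interval.1 + by);
    // `dbg!` prints the file, line, expression, and value to stderr,
    // then returns the value, in both debug and release builds.
    dbg!(widened)
}
```

Because `dbg!` hands its argument back, it can be dropped into an expression without changing the result, which is also why stray calls are easy to miss before merging.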

hiltontj · 2025-01-21T18:26:07Z

influxdb3_write/src/lib.rs

@@ -578,6 +576,7 @@ impl BufferFilter {
                    } else {
                        time_interval.replace(interval);
                    }
+                    dbg!(&time_interval);


waynr · 2025-01-21T18:34:18Z

influxdb3_server/src/query_executor/mod.rs

+        let db_name = "test_db";
+        // perform writes over time to generate WAL files and some snapshots
+        // the time provider is bumped to trick the system into persisting files:
+        for i in 0..1298 {


I see in the comment that the time provider is bumped to trick the system into persisting files, but just so I understand correctly: is that because the system is configured at some level to persist to the object store at some interval, i.e. every 30 seconds or so?

Yes, exactly! By default I think this would normally happen every 10 minutes or so.

Snapshot happens based on the number of WAL files. Incrementing the time makes it so that the writes fall into different gen1 blocks of time, each of which will result in a single parquet file.
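The relationship between write timestamps and gen1 blocks can be sketched as below. This is a simplified model rather than the write buffer's actual code: `GEN1_SECONDS`, `gen1_block_start`, and `parquet_file_count` are hypothetical names, and the 10 minute block duration is an assumption.

```rust
use std::collections::BTreeSet;

/// Assumed gen1 block duration: 10 minutes, expressed in seconds.
const GEN1_SECONDS: i64 = 600;

/// Start of the gen1 block that a timestamp (in seconds) falls into.
fn gen1_block_start(timestamp_secs: i64) -> i64 {
    timestamp_secs.div_euclid(GEN1_SECONDS) * GEN1_SECONDS
}

/// Count distinct gen1 blocks, which in this model equals the parquet file count.
fn parquet_file_count(timestamps_secs: &[i64]) -> usize {
    timestamps_secs
        .iter()
        .map(|&ts| gen1_block_start(ts))
        .collect::<BTreeSet<_>>()
        .len()
}
```

In this model, writes that all land inside one block produce a single file, while writes whose timestamps are each bumped by a full block duration produce one file per write.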

pauldix

Left some comments. Also, since we're no longer limiting the time, we should load up all persisted snapshots on startup. If we only load the last 1,000 we could be missing a bunch of historical data. Can do in a follow up PR or as part of this.

pauldix · 2025-01-22T04:33:10Z

influxdb3/src/commands/serve.rs

+    /// query performance will likely suffer as a result. It would be better to specify smaller time
+    /// ranges if possible in a query
+    #[clap(long = "file-limit", env = "INFLUXDB3_FILE_LIMIT", action)]
+    pub file_limit: Option<usize>,


Can we rename to query-file-limit? I think it's better to be more descriptive about what this file limit is about.

pauldix · 2025-01-22T04:34:33Z

influxdb3_server/src/query_executor/mod.rs

+        let db_name = "test_db";
+        // perform writes over time to generate WAL files and some snapshots
+        // the time provider is bumped to trick the system into persisting files:
+        for i in 0..1298 {


Snapshot happens based on the number of WAL files. Incrementing the time makes it so that the writes fall into different gen1 blocks of time, each of which will result in a single parquet file.

influxdb3_write/src/write_buffer/mod.rs

mgattozzi · 2025-01-22T19:19:57Z

Updated wording and moved us to using --query-file-limit

mgattozzi requested review from pauldix, waynr, hiltontj, jacksonrnewhouse and praveen-influx January 21, 2025 17:47

praveen-influx approved these changes Jan 21, 2025

hiltontj reviewed Jan 21, 2025

waynr reviewed Jan 21, 2025

mgattozzi force-pushed the mgattozzi/remove-72-hour-limit branch from 7c3317e to 218efa0 Compare January 21, 2025 21:07

pauldix reviewed Jan 22, 2025

mgattozzi force-pushed the mgattozzi/remove-72-hour-limit branch from 218efa0 to 7a9ade9 Compare January 22, 2025 19:19

mgattozzi requested a review from pauldix January 22, 2025 19:19

pauldix approved these changes Jan 23, 2025

mgattozzi merged commit 63bd509 into main Jan 23, 2025
13 checks passed

mgattozzi deleted the mgattozzi/remove-72-hour-limit branch January 23, 2025 15:02
