feat: loosen 72 hour query/write restriction #25890
Conversation
Looks good - just a minor comment/suggestion.
influxdb3/src/commands/serve.rs
Outdated
/// query performance will likely suffer as a result. It would be better to specify smaller time
/// ranges if possible in a query
#[clap(long = "file-limit", env = "INFLUXDB3_FILE_LIMIT", action)]
pub file_limit: Option<usize>,
Just a minor thing: would setting 432 directly as the default here have worked?
I'd still have to pass it in for testing purposes, so I kept it as an `Option` so that there would only be one place where the default needs to be defined!
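For illustration, a minimal sketch of the pattern being described; the `DEFAULT_FILE_LIMIT` constant and `Config` struct are hypothetical stand-ins, not the actual serve.rs code. Keeping the field as an `Option` means the default lives in exactly one place, while tests can still construct the config with an explicit value.

```rust
use clap::Parser;

/// Hypothetical constant: the single place the default is defined.
const DEFAULT_FILE_LIMIT: usize = 432;

#[derive(Debug, Parser)]
struct Config {
    /// Limit on the number of parquet files a query may touch.
    #[clap(long = "file-limit", env = "INFLUXDB3_FILE_LIMIT")]
    file_limit: Option<usize>,
}

fn main() {
    let config = Config::parse();
    // Resolve the default at the point of use; tests can instead build
    // `Config { file_limit: Some(10) }` directly without going through clap.
    let limit = config.file_limit.unwrap_or(DEFAULT_FILE_LIMIT);
    println!("file limit: {limit}");
}
```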
Can we rename to `query-file-limit`? I think it's better to be more descriptive about what this file limit is about.
Seems good - just commented on the `dbg!` macro invocations. Does the compiler make those a no-op in release builds? I thought they still result in output.
influxdb3_write/src/lib.rs
Outdated
@@ -570,6 +567,7 @@ impl BufferFilter {
    < analysis.boundaries.len())
    .then_some(analysis.boundaries.remove(time_col_index))
{
    dbg!(&interval);
Should these `dbg!`s be left in?
No, I thought I had taken them out!
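As an aside on the release-build question above: `dbg!` is not optimized away; it writes to stderr in every build profile. A small illustrative sketch (not code from this PR):

```rust
fn main() {
    let interval = 42;

    // dbg! writes to stderr in both debug and release builds; the compiler
    // does not turn it into a no-op when optimizations are enabled.
    dbg!(&interval);

    // If debug-only output were wanted, it would need an explicit guard:
    if cfg!(debug_assertions) {
        eprintln!("interval = {interval}");
    }
}
```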
influxdb3_write/src/lib.rs
Outdated
@@ -578,6 +576,7 @@ impl BufferFilter {
} else {
    time_interval.replace(interval);
}
dbg!(&time_interval);
Same
let db_name = "test_db";
// perform writes over time to generate WAL files and some snapshots
// the time provider is bumped to trick the system into persisting files:
for i in 0..1298 {
I see in the comment that the time provider is bumped to trick the system into persisting files, but just so I understand correctly: is that because the system is configured at some level to persist to the object store at some interval, i.e. 30 seconds or so?
Yes, exactly! Normally this would be, by default, every 10 minutes or so I think.
Snapshots happen based on the number of WAL files. Incrementing the time makes it so that the writes fall into different gen1 blocks of time, each of which will result in a single parquet file.
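A rough sketch of the test pattern being described, using purely hypothetical stand-ins (`MockTimeProvider`, `write_lp`) rather than the real influxdb3 test harness: the clock is advanced between writes so successive writes land in different gen1 time blocks, and the accumulating WAL files eventually trigger snapshots to parquet.

```rust
use std::time::Duration;

// Hypothetical stand-in for the mock time provider used by the real test.
struct MockTimeProvider {
    now_ns: i64,
}

impl MockTimeProvider {
    fn advance(&mut self, d: Duration) {
        self.now_ns += d.as_nanos() as i64;
    }
}

// Hypothetical stand-in for a line-protocol write into the write buffer.
fn write_lp(_db: &str, _lp: &str) {
    // In the real test this produces WAL files, which are snapshotted into
    // parquet once enough of them accumulate.
}

fn main() {
    let db_name = "test_db";
    let mut time = MockTimeProvider { now_ns: 0 };

    // Perform writes over time to generate WAL files and some snapshots.
    // Bumping the clock makes successive writes fall into different gen1
    // blocks of time, each of which results in a single parquet file.
    for i in 0..1298 {
        let lp = format!("cpu,host=a usage={i} {}", time.now_ns);
        write_lp(db_name, &lp);
        time.advance(Duration::from_secs(60));
    }
}
```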
7c3317e to 218efa0
Left some comments. Also, since we're no longer limiting the time, we should load up all persisted snapshots on startup. If we only load the last 1,000 we could be missing a bunch of historical data. Can do in a follow-up PR or as part of this.
This commit does a few key things:

- Removes the 72 hour query and write restrictions in Core
- Limits queries to a default number of parquet files. We chose 432 as this is about 72 hours using default settings for the gen1 time block
- The file limit can be increased, but the help text and the error message when it is exceeded note that query performance will likely be degraded as a result
- We warn users to use smaller time ranges if possible when they hit this query error

With this we eliminate the hard restriction we have in place, and instead create a soft one that users can choose to take the performance hit with. If they can't take that hit, it's recommended that they upgrade to Enterprise, which has the compactor built in to make historical queries performant.
218efa0 to 7a9ade9
Updated wording and moved us to using `--query-file-limit`
This commit does a few key things:

- Removes the 72 hour query and write restrictions in Core
- Limits queries to a default number of parquet files. We chose 432 as this is about 72 hours using default settings for the gen1 time block
- The file limit can be increased, but the help text and the error message when it is exceeded note that query performance will likely be degraded as a result
- We warn users to use smaller time ranges if possible when they hit this query error

With this we eliminate the hard restriction we have in place, but instead create a soft one that users can choose to take the performance hit with. If they can't take that hit then it's recommended that they upgrade to Enterprise which has the compactor built in to make performant historical queries.
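As a back-of-the-envelope check on the 432 default (assuming a 10 minute gen1 time block, which matches the earlier comment about the default persist interval): 72 hours × 60 minutes per hour ÷ 10 minutes per gen1 block = 432 parquet files.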