Refactor SqlToRel::sql_expr_to_logical_expr_internal to reduce stack size #12384

Merged 5 commits on Sep 9, 2024

Conversation

Jefffrey
Contributor

@Jefffrey Jefffrey commented Sep 8, 2024

Which issue does this PR close?

Closes #11499

Rationale for this change

The cause of the stack overflow was similar to #9962. After some LLDB debugging, I identified that SqlToRel::sql_expr_to_logical_expr_internal was being called recursively with a chunky frame size (approx. 70,000 bytes); it was both the most frequent frame in the backtrace and the largest. I applied a fix similar to #9962, where large match arms are split into separate functions to reduce the frame size of the main function.

What changes are included in this PR?

Refactor SqlToRel::sql_expr_to_logical_expr_internal to split larger arms into separate functions.
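
For readers unfamiliar with the pattern, here is a minimal sketch of what "splitting larger arms into separate functions" looks like. The `Expr` enum, `eval`, `eval_sum`, and `sample` below are hypothetical stand-ins made up for illustration; they are not DataFusion's types or the functions touched by this PR. The idea is that the recursive function's frame no longer has to reserve space for the locals of every large arm; each helper's locals are on the stack only while that arm is being evaluated.

```rust
// Hypothetical miniature expression tree; these are not DataFusion's types.
enum Expr {
    Literal(i64),
    Negate(Box<Expr>),
    Sum(Vec<Expr>),
}

// Recursive entry point. Each arm is either trivial or a single call,
// so the arms contribute few locals to this function's stack frame.
fn eval(expr: &Expr) -> i64 {
    match expr {
        Expr::Literal(value) => *value,
        Expr::Negate(inner) => -eval(inner),
        Expr::Sum(items) => eval_sum(items),
    }
}

// The larger arm body lives in its own function, so its locals are only
// on the stack while a Sum node is being evaluated. `#[inline(never)]`
// asks the compiler to keep it out of `eval` in optimized builds too.
#[inline(never)]
fn eval_sum(items: &[Expr]) -> i64 {
    let mut total = 0;
    for item in items {
        total += eval(item);
    }
    total
}

// Builds -(1 + 2 + (-4)), which evaluates to 1.
fn sample() -> Expr {
    Expr::Negate(Box::new(Expr::Sum(vec![
        Expr::Literal(1),
        Expr::Literal(2),
        Expr::Negate(Box::new(Expr::Literal(4))),
    ])))
}
```

Calling `eval(&sample())` walks the tree through both the trivial arms and the extracted helper. How much stack this saves in practice depends on the compiler and build profile, so treat it as an illustration of the shape of the refactor rather than a measurement.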

Are these changes tested?

Yes, by the existing sqllogictest suite, with the stack size override removed so that it runs with the default 2 MB stack again.
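
As a rough sketch of what a stack size override of this kind can look like, the snippet below runs a closure on a thread with an explicit stack size using `std::thread::Builder::stack_size`. The helper names `run_with_stack` and `nested_depth` are hypothetical and chosen for illustration; this is not the sqllogictest runner's actual code.

```rust
use std::thread;

// Hypothetical helper: run `f` on a new thread with the given stack size
// (in bytes) and return its result. A panic in `f` surfaces as `Err`;
// a real stack overflow aborts the process instead of returning.
fn run_with_stack<T, F>(stack_size: usize, f: F) -> thread::Result<T>
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    thread::Builder::new()
        .stack_size(stack_size)
        .spawn(f)
        .expect("failed to spawn thread")
        .join()
}

// Simple recursive function standing in for a recursive planner call.
fn nested_depth(n: u32) -> u32 {
    if n == 0 {
        0
    } else {
        1 + nested_depth(n - 1)
    }
}
```

For example, `run_with_stack(2 * 1024 * 1024, || nested_depth(1_000))` runs the recursion on a 2 MB stack. Dropping an override like this and relying on the default is a useful regression check, because the test then fails again if the recursive frames grow.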

Are there any user-facing changes?

No

@github-actions bot added the sql (SQL Planner) and sqllogictest (SQL Logic Tests (.slt)) labels Sep 8, 2024
Contributor Author

@Jefffrey Jefffrey left a comment

I did the minimum needed to get the test passing, instead of refactoring each arm that doesn't already delegate to a function. Wouldn't be surprised if this crops up again, so have left a comment to serve as a reminder in the future.

Not sure of feasibility of doing something like #10023 to refactor this recursion to not be prone to issues like this in the future, but worth considering.

Contributor

@alamb alamb left a comment

Looks great to me -- thank you @Jefffrey 🙏

cc @comphead

@@ -30,11 +30,9 @@ use datafusion_common_runtime::SpawnedTask;

const TEST_DIRECTORY: &str = "test_files/";
const PG_COMPAT_FILE_PREFIX: &str = "pg_compat_";
const STACK_SIZE: usize = 2 * 1024 * 1024 + 512 * 1024; // 2.5 MBs, the default 2 MBs is currently too small
Contributor

❤️

@@ -174,6 +174,10 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> {
schema: &DFSchema,
planner_context: &mut PlannerContext,
) -> Result<Expr> {
// NOTE: This function is called recursively, so each match arm body should be as
// small as possible to avoid stack overflows in debug builds. Follow the
Contributor

@comphead comphead Sep 8, 2024

Oh, did it happen for debug builds only?

Please clarify what you mean by "as small as possible": small in terms of code, or small in terms of the bytes allocated for the stack frame? I feel we could put a clarification into the comment to avoid potential confusion.

Contributor Author

Both, I guess? I admit I'm not 100% sure of the mechanics at play here; I just eyeballed the larger arms that were likely to be causing this overflow, based on previous experience. I assume there's a general correlation between lines of code and bytes allocated for the stack frame anyway 🤔

Contributor

I wrote up the technical backstory for stack frame usage, many moons ago, here: #1047

I think this "stack overflow on debug builds" problem happens frequently enough that it would be cool to write it up more formally on a blog or something

Contributor

@comphead comphead left a comment

lgtm thanks @Jefffrey

@comphead comphead merged commit 79b3433 into apache:main Sep 9, 2024
24 checks passed
@Jefffrey Jefffrey deleted the array-expr-stack-overflow branch September 9, 2024 21:27

Successfully merging this pull request may close these issues.

Investigate memory use in debug builds for deeply nested array constants
3 participants