Spark-3.5: make where sql case sensitive setting alterable in rewrite data files procedure #11439

Merged
merged 22 commits into from
Dec 2, 2024

@ludlows ludlows commented Oct 31, 2024

This PR makes the rewriteDataFile action respect the user's SQL case-sensitivity setting when evaluating the where clause.
The implementation is simple.
We first read the case-sensitivity setting and store it in a field in the constructor of rewriteDataFileAction.
Then we pass that value to the tableScan.
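
The two steps described above can be sketched without any Spark or Iceberg dependency. This is only an illustration under stated assumptions: `SketchScan` and `SketchRewriteAction` are hypothetical stand-ins, not the classes changed in this PR, and the boolean passed to the constructor models the value of spark.sql.caseSensitive read from the session. The scan is assumed to default to case-sensitive matching, and an unresolved column raises an IllegalArgumentException.

```java
import java.util.List;

// Hypothetical stand-in for a table scan; this is NOT the Iceberg API.
final class SketchScan {
  private final List<String> columns;
  private final boolean caseSensitive;

  SketchScan(List<String> columns, boolean caseSensitive) {
    this.columns = columns;
    this.caseSensitive = caseSensitive;
  }

  // Returns a new scan that uses the given case-sensitivity setting.
  SketchScan caseSensitive(boolean newValue) {
    return new SketchScan(columns, newValue);
  }

  // Resolves a column name used in a filter, or throws if it cannot be bound.
  String bind(String name) {
    for (String column : columns) {
      boolean matches = caseSensitive ? column.equals(name) : column.equalsIgnoreCase(name);
      if (matches) {
        return column;
      }
    }
    throw new IllegalArgumentException("Cannot find field '" + name + "'");
  }
}

// Hypothetical stand-in for the rewrite action.
final class SketchRewriteAction {
  private final List<String> columns;
  private final boolean caseSensitive;

  // Step 1: capture the session setting once, in the constructor.
  SketchRewriteAction(List<String> columns, boolean sessionCaseSensitive) {
    this.columns = columns;
    this.caseSensitive = sessionCaseSensitive;
  }

  // Step 2: pass the captured value to the scan before binding the filter column.
  String resolveFilterColumn(String name) {
    SketchScan defaultScan = new SketchScan(columns, true); // assumed default: case sensitive
    return defaultScan.caseSensitive(caseSensitive).bind(name);
  }
}
```

With the setting captured as false, `new SketchRewriteAction(List.of("c1"), false).resolveFilterColumn("C1")` resolves to `c1`. With true, the same call throws, which mirrors the behavior reported in the related issue.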

related issue: #11438

@github-actions github-actions bot added the spark label Oct 31, 2024
Contributor

@singhpk234 singhpk234 left a comment

@ludlows can you please also add a unit test (UT) for it, to future-proof it?

Collaborator

@szehon-ho szehon-ho left a comment

Yes, this looks reasonable to me as well. I agree with @singhpk234 about the UT.

@ludlows
Contributor Author

ludlows commented Nov 5, 2024

Hi @szehon-ho,
please review the test cases when you have time.
One possible problem: the exception thrown here is an IllegalArgumentException rather than the ValidationException mentioned in the issue.

@ludlows
Contributor Author

ludlows commented Nov 5, 2024

I think I entered the wrong Iceberg version in issue #11438.

@huaxingao
Contributor

@ludlows
Thanks for the PR. I have a couple of questions:

  1. It seems to me that the tests don't really test the changes in this PR; they would pass even without the fix. I think we should add some tests that would fail without the fix but can pass with it.
  2. Do we want to make the Spark SQL configuration spark.sql.caseSensitive apply to Iceberg stored procedure parameters? If so, we probably should apply spark.sql.caseSensitive to all Iceberg stored procedure parameters. Are there other Iceberg stored procedure parameters that should also honor spark.sql.caseSensitive?

@ludlows
Contributor Author

ludlows commented Nov 6, 2024

  1. It seems to me that the tests don't really test the changes in this PR; they would pass even without the fix. I think we should add some tests that would fail without the fix but can pass with it.
  2. Do we want to make the Spark SQL configuration spark.sql.caseSensitive apply to Iceberg stored procedure parameters? If so, we probably should apply spark.sql.caseSensitive to all Iceberg stored procedure parameters. Are there other Iceberg stored procedure parameters that should also honor spark.sql.caseSensitive?

Hi @huaxingao,
thanks for the questions above.
About the 2nd one: as a data engineer, I have had users ask me why not all parts of the procedure are case insensitive even though they have set spark.sql.caseSensitive to false. The procedure is invoked at the SQL level, so why are its parameters not affected?

I think it is reasonable to apply the SQL case-sensitivity setting to all procedures, but we could take this PR as the starting point.

@huaxingao
Contributor

@ludlows Thanks for the quick fix. Can we have a test that fails without the fix but passes with it? It seems that all your current tests pass even without the fix.

@ludlows
Contributor Author

ludlows commented Nov 30, 2024

@huaxingao I think the test method testFilterCaseSensitivityBeforeChange() (which leads to a validation exception) shows that the bug existed before this PR.

@huaxingao
Contributor

@ludlows I think you can simply reproduce the problem with something like:

    createTable();
    insertData(10);
    sql("SET %s=false", SQLConf.CASE_SENSITIVE().key());
    sql("CALL %s.system.rewrite_data_files(table=>'%s', where=>'C1 > 0')", catalogName, tableIdent);

@ludlows
Contributor Author

ludlows commented Dec 1, 2024

@huaxingao thanks for the comment.
But I don't think the problem will occur, since the bug has been fixed by this PR.
Please check the test code below:

    @TestTemplate
    public void testFilterCaseSensitivityAfterChange() {
      createTable();
      insertData(10);
      sql("set spark.sql.caseSensitive=false");
      assertEquals(
          "Should have done nothing but passed the schema validation, since no files are present",
          ImmutableList.of(row(0, 0, 0L, 0)),
          sql(
              "CALL %s.system.rewrite_data_files(table=>'%s', where=>'C1 > 90000000')",
              catalogName, tableIdent));
    }

The test case above passes.

@huaxingao
Contributor

@ludlows Thanks for the quick reply. I know my example will pass with the PR's fix. However, the problem will arise without the fix. We need a simple test that fails without the fix and passes with it. A straightforward test like my example should suffice, with minimal changes. My goal is to keep the test as simple as possible.

Contributor

@huaxingao huaxingao left a comment

LGTM

@szehon-ho szehon-ho merged commit d8326d8 into apache:main Dec 2, 2024
31 checks passed
@szehon-ho
Collaborator

Thanks @ludlows , and also @huaxingao, @anuragmantri @singhpk234 for reviews
