
Spark: Bump Spark minor versions for 3.3 and 3.4 #9187

Merged: 2 commits into apache:main (Dec 6, 2023)

Conversation

ajantha-bhat
Member

Spark 3.4.2 was released yesterday with security and correctness fixes: https://spark.apache.org/news/spark-3-4-2-released.html

Spark 3.3.3 was released two months ago: https://spark.apache.org/news/spark-3-3-3-released.html
@ajantha-bhat changed the title from "Bump Spark minor versions for 3.3 and 3.4" to "Spark: Bump Spark minor versions for 3.3 and 3.4" (Dec 1, 2023)
```diff
@@ -77,8 +77,8 @@ scala-collection-compat = "2.11.0"
 slf4j = "1.7.36"
 snowflake-jdbc = "3.14.3"
 spark-hive32 = "3.2.2"
-spark-hive33 = "3.3.2"
-spark-hive34 = "3.4.1"
+spark-hive33 = "3.3.3"
+spark-hive34 = "3.4.2"
```
Member Author


Not sure why dependabot didn't update it.

@RussellSpitzer
Member

```
org.apache.spark.sql.AnalysisException: Cannot write incompatible data to table '`spark_catalog`.`default`.`source_table`':
- Cannot safely cast 'id': string to int.
```

@ajantha-bhat
Member Author

```
org.apache.spark.sql.AnalysisException: Cannot write incompatible data to table '`spark_catalog`.`default`.`source_table`':
- Cannot safely cast 'id': string to int.
```

I saw that, but I didn't know which exact change in Spark 3.4.2 caused it. I was occupied with other work; I will analyze this today.

@github-actions bot added the spark label (Dec 2, 2023)
@ajantha-bhat
Member Author

@RussellSpitzer and @aokolnychyi: I have spent some time on this.

The issue occurs only with the SparkCatalogConfig.SPARK catalog type, and only for tables created with the 'location' option.

I suspect a cleanup issue for tables created with 'location': after adding PURGE to dropTables() in these test cases, the tests pass.

I don't work closely with Spark, so I am not sure which change caused it (was it https://issues.apache.org/jira/browse/SPARK-43203?). Feel free to add your analysis.

```diff
@@ -77,7 +77,7 @@ public void setupTempDirs() {

   @After
   public void dropTables() {
-    sql("DROP TABLE IF EXISTS %s", sourceTableName);
+    sql("DROP TABLE IF EXISTS %s PURGE", sourceTableName);
```
Member Author


Many test cases were failing with the 3.4.2 version bump.

More details: #9187 (comment)

Member


Yes, the issue here is that non-Iceberg tables are being created in order to test conversion and addition of files from non-Iceberg tables into Iceberg tables. The Spark session catalog is probably now (correctly) treating them as external tables (not Hive-managed tables) and only dropping metadata instead of clearing out the whole table.

This is solely a Spark-side change and has no impact on any Iceberg code that I know of.
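A minimal sketch of that drop behavior, assuming a local Hive-enabled SparkSession (the table name and path are illustrative):

```java
import org.apache.spark.sql.SparkSession;

public class ExternalDropSketch {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .appName("external-drop-sketch")
        .master("local[1]")
        .enableHiveSupport()
        .getOrCreate();

    // An explicit LOCATION makes this an external (not Hive-managed) table.
    spark.sql("CREATE TABLE source_table (id INT) USING parquet LOCATION '/tmp/source_table'");
    spark.sql("INSERT INTO source_table VALUES (1), (2)");

    // On Spark 3.4.2 this removes only the catalog entry; the files under
    // /tmp/source_table stay on disk.
    spark.sql("DROP TABLE source_table");

    // The fix in this PR drops with PURGE instead, which removes the data
    // files as well:
    // spark.sql("DROP TABLE source_table PURGE");

    spark.stop();
  }
}
```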

@ajantha-bhat
Member Author

ping

@RussellSpitzer merged commit 70ec4e5 into apache:main (Dec 6, 2023)
45 checks passed
@RussellSpitzer
Member

Thanks @ajantha-bhat for the PR and @Fokko for review

lisirrx pushed a commit to lisirrx/iceberg that referenced this pull request Jan 4, 2024
@manuzhang
Collaborator

@RussellSpitzer @ajantha-bhat Will this break Iceberg applications running on Spark 3.4.1 and Spark 3.4.0 due to https://issues.apache.org/jira/browse/SPARK-43203?

@ajantha-bhat
Member Author

ajantha-bhat commented Jan 22, 2024

I think the test case was failing for non-Iceberg tables managed by the Spark session catalog, so there should be no impact for Iceberg tables/users?

@chinnaraolalam

@ajantha-bhat @RussellSpitzer The test case was failing for non-Iceberg tables, but this should not affect non-Iceberg tables. This might be an issue.

I can see 2 cases:

CASE 1: Launching a spark-sql session with the default Spark session catalog (Iceberg tables will not work here) can manage non-Iceberg tables. When a non-Iceberg table such as a Parquet table is dropped, the data is purged and nothing is left on disk.

CASE 2: Launching a spark-sql session with the Iceberg-provided SparkSessionCatalog (where both Iceberg and non-Iceberg tables can be managed) works fine for Iceberg tables, but when operating on non-Iceberg tables such as a Parquet table, dropping the table does not purge the data, and the data is leaked until manual cleanup.

So the CASE 1 and CASE 2 behaviour is different; moreover, launching spark-sql with the Iceberg session catalog brings a behavioural change for non-Iceberg tables (this is an issue).

In CASE 2:

```sql
CREATE TABLE parquettable (id bigint, data string) USING parquet;
INSERT INTO parquettable VALUES (1, 'A'), (2, 'B'), (3, 'C');
SELECT id, data FROM parquettable WHERE length(data) = 1;
DROP TABLE parquettable;
-- This re-create fails with [LOCATION_ALREADY_EXISTS],
-- because the drop did not purge the table location:
CREATE TABLE parquettable (id bigint, data string) USING parquet;
```

Whereas in CASE 1 it passes without any error.

Please add your thoughts.
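A minimal sketch of the CASE 2 setup and failing sequence above; the catalog configuration follows Iceberg's documented SparkSessionCatalog usage, the app name and local master are illustrative, and the iceberg-spark-runtime jar matching your Spark/Scala version is assumed to be on the classpath:

```java
import org.apache.spark.sql.SparkSession;

public class SessionCatalogSketch {
  public static void main(String[] args) {
    // CASE 1 is this same builder without the two catalog configs:
    // the default Spark session catalog then handles all tables.
    SparkSession spark = SparkSession.builder()
        .appName("session-catalog-sketch")
        .master("local[1]")
        .config("spark.sql.catalog.spark_catalog", "org.apache.iceberg.spark.SparkSessionCatalog")
        .config("spark.sql.catalog.spark_catalog.type", "hive")
        .enableHiveSupport()
        .getOrCreate();

    // The CASE 2 sequence from the comment above:
    spark.sql("CREATE TABLE parquettable (id bigint, data string) USING parquet");
    spark.sql("INSERT INTO parquettable VALUES (1, 'A'), (2, 'B'), (3, 'C')");
    spark.sql("DROP TABLE parquettable");

    // If the drop left the table directory behind, this re-create fails
    // with LOCATION_ALREADY_EXISTS:
    spark.sql("CREATE TABLE parquettable (id bigint, data string) USING parquet");

    spark.stop();
  }
}
```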

@manuzhang
Collaborator

@chinnaraolalam which versions of Iceberg and Spark are you using in test cases?

@RussellSpitzer
Member

In our internal builds we have always defaulted to purge off for Iceberg (even before there was an option), for safety. I prefer that behavior and don't really mind that it's different for non-Iceberg tables.

What is the suggestion here on what we should do? Also, should we start a new issue?
