[SPARK-37734][SQL][TESTS] Upgrade h2 from 1.4.195 to 2.0.204 #35013

Closed · wants to merge 8 commits
2 changes: 1 addition & 1 deletion sql/core/pom.xml
@@ -153,7 +153,7 @@
<dependency>
<groupId>com.h2database</groupId>
<artifactId>h2</artifactId>
- <version>1.4.195</version>
+ <version>2.0.204</version>
<scope>test</scope>
</dependency>
<dependency>
31 changes: 8 additions & 23 deletions sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala
@@ -24,7 +24,6 @@ import java.util.{Calendar, GregorianCalendar, Properties, TimeZone}

import scala.collection.JavaConverters._

- import org.h2.jdbc.JdbcSQLException
import org.mockito.ArgumentMatchers._
import org.mockito.Mockito._
import org.scalatest.{BeforeAndAfter, PrivateMethodTester}
@@ -54,7 +53,8 @@ class JDBCSuite extends QueryTest
val urlWithUserAndPass = "jdbc:h2:mem:testdb0;user=testUser;password=testPass"
var conn: java.sql.Connection = null

- val testBytes = Array[Byte](99.toByte, 134.toByte, 135.toByte, 200.toByte, 205.toByte)
+ val testBytes = Array[Byte](99.toByte, 134.toByte, 135.toByte, 200.toByte, 205.toByte) ++
+   Array.fill(15)(0.toByte)

val testH2Dialect = new JdbcDialect {
override def canHandle(url: String): Boolean = url.startsWith("jdbc:h2")
@@ -87,7 +87,6 @@ class JDBCSuite extends QueryTest
val properties = new Properties()
properties.setProperty("user", "testUser")
properties.setProperty("password", "testPass")
properties.setProperty("rowId", "false")

conn = DriverManager.getConnection(url, properties)
conn.prepareStatement("create schema test").executeUpdate()
@@ -162,7 +161,7 @@ class JDBCSuite extends QueryTest
|OPTIONS (url '$url', dbtable 'TEST.STRTYPES', user 'testUser', password 'testPass')
""".stripMargin.replaceAll("\n", " "))

conn.prepareStatement("create table test.timetypes (a TIME, b DATE, c TIMESTAMP)"
conn.prepareStatement("create table test.timetypes (a TIME, b DATE, c TIMESTAMP(7))"
).executeUpdate()
conn.prepareStatement("insert into test.timetypes values ('12:34:56', "
+ "'1996-01-01', '2002-02-20 11:22:33.543543543')").executeUpdate()
@@ -177,12 +176,12 @@
""".stripMargin.replaceAll("\n", " "))

conn.prepareStatement("CREATE TABLE test.timezone (tz TIMESTAMP WITH TIME ZONE) " +
"AS SELECT '1999-01-08 04:05:06.543543543 GMT-08:00'")
"AS SELECT '1999-01-08 04:05:06.543543543-08:00'")
.executeUpdate()
conn.commit()

conn.prepareStatement("CREATE TABLE test.array (ar ARRAY) " +
"AS SELECT '(1, 2, 3)'")
conn.prepareStatement("CREATE TABLE test.array_table (ar Integer ARRAY) " +
"AS SELECT ARRAY[1, 2, 3]")
Contributor Author:

1. Why rename test.array to test.array_table? Because ARRAY is a standard data type (and reserved word) in newer H2, so H2 throws:
   An exception or error caused a run to abort: Syntax error in SQL statement "CREATE TABLE TEST.ARRAY[*] (AR INTEGER ARRAY) AS SELECT ARRAY[1, 2, 3]"; expected "identifier"; SQL statement:
   CREATE TABLE test.array (ar Integer ARRAY) AS SELECT ARRAY[1, 2, 3] [42001-202]
2. Why add INTEGER? Because ARRAY columns need an element data type in newer H2; otherwise:
   An exception or error caused a run to abort: Syntax error in SQL statement "CREATE TABLE TEST.ARRAY_TABLE (AR ARRAY[*]) AS SELECT ARRAY[1, 2, 3]"; expected "IDENTITY, data type"; SQL statement:
   CREATE TABLE test.array_table (ar ARRAY) AS SELECT ARRAY[1, 2, 3] [42001-202]
3. Why change '(1, 2, 3)' to ARRAY[1, 2, 3]? Because newer H2 cannot convert that string to an ARRAY:
   An exception or error caused a run to abort: Data conversion error converting "'(1, 2, 3)' (ARRAY_TABLE: ""AR"" INTEGER ARRAY)"; SQL statement:
   CREATE TABLE test.array_table (ar Integer ARRAY) AS SELECT '(1, 2, 3)' [22018-202]

.executeUpdate()
conn.commit()
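A minimal standalone sketch of the ARRAY rules described in the review comment above, assuming only an H2 2.x driver on the classpath (object, schema, and database names here are illustrative, not part of the PR):

import java.sql.DriverManager

object ArrayDdlDemo extends App {
  val conn = DriverManager.getConnection("jdbc:h2:mem:arraydemo")
  try {
    conn.prepareStatement("CREATE SCHEMA test").executeUpdate()
    // Valid in H2 2.x: non-reserved table name, declared element type,
    // and a real array literal instead of a string.
    conn.prepareStatement(
      "CREATE TABLE test.array_table (ar INTEGER ARRAY) AS SELECT ARRAY[1, 2, 3]")
      .executeUpdate()
    // The older forms now fail:
    //   CREATE TABLE test.array (...)   -- ARRAY is a reserved word
    //   ... (ar ARRAY) ...              -- element type is required
    //   ... AS SELECT '(1, 2, 3)'       -- string is not convertible to ARRAY
  } finally conn.close()
}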

@@ -638,7 +637,7 @@ class JDBCSuite extends QueryTest
assert(rows(0).getAs[Array[Byte]](0).sameElements(testBytes))
assert(rows(0).getString(1).equals("Sensitive"))
assert(rows(0).getString(2).equals("Insensitive"))
assert(rows(0).getString(3).equals("Twenty-byte CHAR"))
assert(rows(0).getString(3).equals("Twenty-byte CHAR "))
Contributor: what happens here?

Contributor Author: The column is declared CHAR(20) and the test inserts stmt.setString(4, "Twenty-byte CHAR"), a 16-character value. H2 2.x right-pads CHAR values with spaces to the declared length, so the value now reads back padded to 20 characters.

Member: Yes, the new result is correct. It's nice to have this update.

assert(rows(0).getAs[Array[Byte]](4).sameElements(testBytes))
assert(rows(0).getString(5).equals("I am a clob!"))
}
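A small sketch of the padding behavior behind the updated assertion, assuming an H2 2.x in-memory database (names are illustrative). H2 2.x returns CHAR(n) values right-padded to n characters, per the SQL standard, where H2 1.4 returned them trimmed:

import java.sql.DriverManager

object CharPaddingDemo extends App {
  val conn = DriverManager.getConnection("jdbc:h2:mem:chardemo")
  try {
    val rs = conn
      .prepareStatement("SELECT CAST('Twenty-byte CHAR' AS CHAR(20))")
      .executeQuery()
    rs.next()
    // 16-character input read back padded to 20 characters.
    assert(rs.getString(1) == "Twenty-byte CHAR    ")
  } finally conn.close()
}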
@@ -729,20 +728,6 @@ class JDBCSuite extends QueryTest
assert(math.abs(rows(0).getDouble(1) - 1.00000023841859331) < 1e-12)
}

test("Pass extra properties via OPTIONS") {
// We set rowId to false during setup, which means that _ROWID_ column should be absent from
// all tables. If rowId is true (default), the query below doesn't throw an exception.
intercept[JdbcSQLException] {
Contributor: why did we fail before?

Contributor Author: The older H2 version throws:

org.h2.jdbc.JdbcSQLException: Column "_ROWID_" not found; SQL statement:
CREATE FORCE VIEW PUBLIC._0 AS
SELECT
    _ROWID_
FROM TEST.PEOPLE [42122-195]

But it's OK in the newer version.

Contributor: Can you look into the commit history and see the original purpose of adding this test?

Contributor (@cloud-fan, Dec 29, 2021): OK, seems properties.setProperty("rowId", "false") doesn't work for h2 2.0 anymore. I think we can remove this test now, as well as properties.setProperty("rowId", "false") in the test setup.

Contributor Author: OK

-     sql(
-       s"""
-         |CREATE OR REPLACE TEMPORARY VIEW abc
-         |USING org.apache.spark.sql.jdbc
-         |OPTIONS (url '$url', dbtable '(SELECT _ROWID_ FROM test.people)',
-         | user 'testUser', password 'testPass')
-       """.stripMargin.replaceAll("\n", " "))
-   }
- }
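A sketch of why the removed test no longer applies, assuming an H2 2.x in-memory database (table and object names are illustrative): the rowId=false connection property is no longer honored, so the _ROWID_ pseudo-column is always available and the query that used to throw now succeeds:

import java.sql.DriverManager

object RowIdDemo extends App {
  val conn = DriverManager.getConnection("jdbc:h2:mem:rowiddemo")
  try {
    conn.prepareStatement("CREATE TABLE people (name VARCHAR(32))").executeUpdate()
    conn.prepareStatement("INSERT INTO people VALUES ('fred')").executeUpdate()
    // H2 1.4 with rowId=false threw: Column "_ROWID_" not found.
    // H2 2.x ignores the property, so this succeeds.
    val rs = conn.prepareStatement("SELECT _ROWID_ FROM people").executeQuery()
    assert(rs.next())
  } finally conn.close()
}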

test("Remap types via JdbcDialects") {
JdbcDialects.registerDialect(testH2Dialect)
val df = spark.read.jdbc(urlWithUserAndPass, "TEST.PEOPLE", new Properties())
@@ -1375,7 +1360,7 @@ class JDBCSuite extends QueryTest
}.getMessage
assert(e.contains("Unsupported type TIMESTAMP_WITH_TIMEZONE"))
e = intercept[SQLException] {
spark.read.jdbc(urlWithUserAndPass, "TEST.ARRAY", new Properties()).collect()
spark.read.jdbc(urlWithUserAndPass, "TEST.ARRAY_TABLE", new Properties()).collect()
}.getMessage
assert(e.contains("Unsupported type ARRAY"))
}
sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCV2Suite.scala
@@ -421,7 +421,7 @@ class JDBCV2Suite extends QueryTest with SharedSparkSession with ExplainSuiteHelper
"PushedGroupByColumns: []"
checkKeywordsExistsInExplain(df, expected_plan_fragment)
}
- checkAnswer(df, Seq(Row(2, 1.0)))
+ checkAnswer(df, Seq(Row(2, 1.5)))
}

test("partitioned scan with aggregate push-down: complete push-down only") {
sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCWriteSuite.scala
@@ -227,7 +227,7 @@ class JDBCWriteSuite extends SharedSparkSession with BeforeAndAfter {
JdbcDialects.registerDialect(testH2Dialect)
val df = spark.createDataFrame(sparkContext.parallelize(arr2x2), schema2)

- val m = intercept[org.h2.jdbc.JdbcSQLException] {
+ val m = intercept[org.h2.jdbc.JdbcSQLSyntaxErrorException] {
df.write.option("createTableOptions", "ENGINE tableEngineName")
.jdbc(url1, "TEST.CREATETBLOPTS", properties)
}.getMessage
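A short sketch of the exception hierarchy change that drives this hunk and the next one, assuming an H2 2.x driver (object and database names are illustrative): H2 2.x replaces the single org.h2.jdbc.JdbcSQLException with per-category subclasses such as JdbcSQLSyntaxErrorException and JdbcSQLInvalidAuthorizationSpecException, all of which still extend java.sql.SQLException, so generic handlers keep working:

import java.sql.{DriverManager, SQLException}

object H2ExceptionDemo extends App {
  val conn = DriverManager.getConnection("jdbc:h2:mem:errdemo")
  try {
    try {
      conn.prepareStatement("SELECT * FORM t").executeQuery() // deliberate typo
    } catch {
      // Under H2 2.x this prints org.h2.jdbc.JdbcSQLSyntaxErrorException.
      case e: SQLException => println(e.getClass.getName)
    }
  } finally conn.close()
}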
@@ -326,7 +326,7 @@ class JDBCWriteSuite extends SharedSparkSession with BeforeAndAfter {
test("save errors if wrong user/password combination") {
val df = spark.createDataFrame(sparkContext.parallelize(arr2x2), schema2)

- val e = intercept[org.h2.jdbc.JdbcSQLException] {
+ val e = intercept[org.h2.jdbc.JdbcSQLInvalidAuthorizationSpecException] {
df.write.format("jdbc")
.option("dbtable", "TEST.SAVETEST")
.option("url", url1)
@@ -427,7 +427,7 @@ class JDBCWriteSuite extends SharedSparkSession with BeforeAndAfter {
// verify the data types of the created table by reading the database catalog of H2
val query =
"""
- |(SELECT column_name, type_name, character_maximum_length
+ |(SELECT column_name, data_type, character_maximum_length
| FROM information_schema.columns WHERE table_name = 'DBCOLTYPETEST')
""".stripMargin
val rows = spark.read.jdbc(url1, query, properties).collect()
@@ -436,7 +436,7 @@
val typeName = row.getString(1)
// For CHAR and VARCHAR, we also compare the max length
if (typeName.contains("CHAR")) {
- val charMaxLength = row.getInt(2)
+ val charMaxLength = row.getLong(2)
assert(expectedTypes(row.getString(0)) == s"$typeName($charMaxLength)")
} else {
assert(expectedTypes(row.getString(0)) == typeName)
@@ -452,15 +452,18 @@
val df = spark.createDataFrame(sparkContext.parallelize(data), schema)

// out-of-order
val expected1 = Map("id" -> "BIGINT", "first#name" -> "VARCHAR(123)", "city" -> "CHAR(20)")
val expected1 =
Map("id" -> "BIGINT", "first#name" -> "CHARACTER VARYING(123)", "city" -> "CHARACTER(20)")
Contributor: interesting. varchar is not a valid type name in h2 any more?

Contributor Author: Yes. H2 changed the type name.
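A sketch of the renamed catalog output, assuming an H2 2.x in-memory database (names are illustrative). It also shows why an earlier hunk switches the catalog query to data_type and reads character_maximum_length with getLong:

import java.sql.DriverManager

object TypeNameDemo extends App {
  val conn = DriverManager.getConnection("jdbc:h2:mem:typedemo")
  try {
    conn.prepareStatement("CREATE TABLE t (name VARCHAR(123), city CHAR(20))")
      .executeUpdate()
    // DATA_TYPE replaces the old TYPE_NAME column, and
    // CHARACTER_MAXIMUM_LENGTH is now a BIGINT (hence getLong below).
    val rs = conn.prepareStatement(
      "SELECT column_name, data_type, character_maximum_length " +
        "FROM information_schema.columns WHERE table_name = 'T'").executeQuery()
    while (rs.next()) {
      // Expected: NAME -> CHARACTER VARYING(123), CITY -> CHARACTER(20)
      println(s"${rs.getString(1)} -> ${rs.getString(2)}(${rs.getLong(3)})")
    }
  } finally conn.close()
}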

testUserSpecifiedColTypes(df, "`first#name` VARCHAR(123), id BIGINT, city CHAR(20)", expected1)
// partial schema
val expected2 = Map("id" -> "INTEGER", "first#name" -> "VARCHAR(123)", "city" -> "CHAR(20)")
val expected2 =
Map("id" -> "INTEGER", "first#name" -> "CHARACTER VARYING(123)", "city" -> "CHARACTER(20)")
testUserSpecifiedColTypes(df, "`first#name` VARCHAR(123), city CHAR(20)", expected2)

withSQLConf(SQLConf.CASE_SENSITIVE.key -> "false") {
// should still respect the original column names
val expected = Map("id" -> "INTEGER", "first#name" -> "VARCHAR(123)", "city" -> "CLOB")
val expected = Map("id" -> "INTEGER", "first#name" -> "CHARACTER VARYING(123)",
"city" -> "CHARACTER LARGE OBJECT(9223372036854775807)")
testUserSpecifiedColTypes(df, "`FiRsT#NaMe` VARCHAR(123)", expected)
}

@@ -470,7 +473,9 @@ class JDBCWriteSuite extends SharedSparkSession with BeforeAndAfter {
StructField("First#Name", StringType) ::
StructField("city", StringType) :: Nil)
val df = spark.createDataFrame(sparkContext.parallelize(data), schema)
val expected = Map("id" -> "INTEGER", "First#Name" -> "VARCHAR(123)", "city" -> "CLOB")
val expected =
Map("id" -> "INTEGER", "First#Name" -> "CHARACTER VARYING(123)",
"city" -> "CHARACTER LARGE OBJECT(9223372036854775807)")
testUserSpecifiedColTypes(df, "`First#Name` VARCHAR(123)", expected)
}
}