ORC-1697: Fix IllegalArgumentException when reading json timestamp type in benchmark

### What changes were proposed in this pull request?
This PR fixes the `IllegalArgumentException` thrown when the benchmark reads JSON timestamp values.

When writing and reading JSON, timestamps are now converted to a long value instead of a string.

### Why are the changes needed?
ORC-1191 switched the taxi source data from CSV to Parquet. The benchmark now reads the Parquet timestamps, which are stored in microseconds, whereas `java.sql.Timestamp` works with milliseconds.

Parquet metadata of the taxi source:
```bash
  optional int64 tpep_pickup_datetime (TIMESTAMP(MICROS,false));
  optional int64 tpep_dropoff_datetime (TIMESTAMP(MICROS,false));
```
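To illustrate the unit mismatch, here is a minimal sketch of converting a microsecond value into a `java.sql.Timestamp`. It is not part of this patch, which only changes how the JSON value is serialized. The class and method names (`MicrosToTimestamp`, `fromMicros`) are hypothetical.

```java
import java.sql.Timestamp;

public class MicrosToTimestamp {

    // Hypothetical helper: interprets the argument as microseconds since the epoch.
    static Timestamp fromMicros(long micros) {
        // floorDiv/floorMod keep the fractional part non-negative for pre-epoch values.
        long seconds = Math.floorDiv(micros, 1_000_000L);
        int nanos = (int) Math.floorMod(micros, 1_000_000L) * 1_000;
        // The constructor expects milliseconds; the sub-second part is set separately.
        Timestamp ts = new Timestamp(seconds * 1_000L);
        ts.setNanos(nanos);
        return ts;
    }

    public static void main(String[] args) {
        // Same value as in the PR description, read as microseconds.
        Timestamp ts = fromMicros(1446341079000000L);
        System.out.println(ts.getTime());
    }
}
```

Passing the same raw value directly to `new Timestamp(long)` would treat it as milliseconds and land roughly a thousand times further from the epoch.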

Writing the data to JSON and then running the scan command produces the following error.
```bash
java -jar core/target/orc-benchmarks-core-*-uber.jar scan data -format json
```

```
Exception in thread "main" java.lang.IllegalArgumentException: Timestamp format must be yyyy-mm-dd hh:mm:ss[.fffffffff]
	at java.sql/java.sql.Timestamp.valueOf(Timestamp.java:224)
	at org.apache.orc.bench.core.convert.json.JsonReader$TimestampColumnConverter.convert(JsonReader.java:175)
	at org.apache.orc.bench.core.convert.json.JsonReader.nextBatch(JsonReader.java:86)
	at org.apache.orc.bench.core.convert.ScanVariants.run(ScanVariants.java:92)
	at org.apache.orc.bench.core.Driver.main(Driver.java:64)
```

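The cause is that JSON timestamp values are written via `java.sql.Timestamp#toString` and read back via `java.sql.Timestamp#valueOf`. Because the microsecond value is interpreted as milliseconds, the written string has a five-digit year, which does not match the `yyyy-mm-dd` format that `valueOf` requires.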

```java
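    // A microsecond value passed to a constructor that expects milliseconds.
    Timestamp ts = new Timestamp(1446341079000000L);
    System.out.println(ts);
    // Parsing the printed string back fails because of the five-digit year.
    System.out.println(Timestamp.valueOf(ts.toString()));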
```
```
47802-09-23 02:50:00.0
Exception in thread "main" java.lang.IllegalArgumentException: Timestamp format must be yyyy-mm-dd hh:mm:ss[.fffffffff]
	at java.sql.Timestamp.valueOf(Timestamp.java:237)
```
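The difference between the two serialization strategies can be sketched with plain `java.sql.Timestamp`, without the benchmark's JSON classes. This is only an illustration under that simplification; the class and method names (`TimestampRoundTrip`, `stringRoundTrips`, `longRoundTrips`) are hypothetical and do not appear in the patch.

```java
import java.sql.Timestamp;

public class TimestampRoundTrip {

    // Round trip through the textual form, as the old JSON code path did.
    static boolean stringRoundTrips(Timestamp ts) {
        try {
            return Timestamp.valueOf(ts.toString()).equals(ts);
        } catch (IllegalArgumentException e) {
            // Thrown when the printed form is not yyyy-mm-dd hh:mm:ss[.fffffffff].
            return false;
        }
    }

    // Round trip through a numeric value, which involves no text parsing.
    static boolean longRoundTrips(Timestamp ts) {
        return new Timestamp(ts.getTime()).equals(ts);
    }

    public static void main(String[] args) {
        // Microsecond value treated as milliseconds, as in the snippet above.
        Timestamp ts = new Timestamp(1446341079000000L);
        System.out.println("string round trip ok: " + stringRoundTrips(ts));
        System.out.println("long round trip ok:   " + longRoundTrips(ts));
    }
}
```

A numeric representation sidesteps the format restriction entirely, which is why storing a long is robust even when the value falls far outside the four-digit-year range.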

### How was this patch tested?
Tested locally by generating the taxi data in JSON format and then scanning it:

```bash
java -jar core/target/orc-benchmarks-core-*-uber.jar generate data -format json -data taxi -compress snappy
```

```bash
java -jar core/target/orc-benchmarks-core-*-uber.jar scan data -format json -data taxi -compress snappy
```

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #1902

Closes #1930 from cxzl25/ORC-1697_v2.

Authored-by: sychen <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
cxzl25 authored and dongjoon-hyun committed Aug 5, 2024
1 parent 21a6380 commit d09dbf3
Showing 2 changed files with 2 additions and 4 deletions.
@@ -172,8 +172,7 @@ public void convert(JsonElement value, ColumnVector vect, int row) {
         vect.isNull[row] = true;
       } else {
         TimestampColumnVector vector = (TimestampColumnVector) vect;
-        vector.set(row, Timestamp.valueOf(value.getAsString()
-            .replaceAll("[TZ]", " ")));
+        vector.set(row, new Timestamp(value.getAsLong()));
       }
     }
   }
@@ -160,8 +160,7 @@ static void printValue(com.google.gson.stream.JsonWriter writer, ColumnVector ve
             (int) ((LongColumnVector) vector).vector[row]).toString());
         break;
       case TIMESTAMP:
-        writer.value(((TimestampColumnVector) vector)
-            .asScratchTimestamp(row).toString());
+        writer.value(((TimestampColumnVector) vector).getTimestampAsLong(row));
         break;
       case LIST:
         printList(writer, (ListColumnVector) vector, schema, row);
