[SPARK-34615][SQL] Support java.time.Period as an external type of the year-month interval type #31765
Kubernetes integration test starting
Kubernetes integration test status success
@cloud-fan @yaooqinn @srielau @HyukjinKwon Could you review this PR, please?
   * @return The total number of months in the period, may be negative
   * @throws ArithmeticException If numeric overflow occurs
   */
  def periodToMonths(period: Period): Int = {
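The contract in the doc comment above (total months, possibly negative, ArithmeticException on overflow, days ignored) can be illustrated with a minimal Scala sketch. It assumes the conversion can be expressed with java.time.Period.toTotalMonths (years * 12 + months as a Long, days not included) plus Math.toIntExact for the range check; the object name PeriodToMonthsSketch is hypothetical and this is not Spark's actual IntervalUtils code.

```scala
import java.time.Period

// Hypothetical object; illustrates the documented contract only.
object PeriodToMonthsSketch {
  /**
   * Total number of months in `period`, possibly negative.
   * The days field is ignored.
   * Throws ArithmeticException if the result does not fit into an Int.
   */
  def periodToMonths(period: Period): Int =
    // toTotalMonths computes years * 12 + months as a Long, so the
    // multiplication cannot overflow; toIntExact performs the Int range check.
    Math.toIntExact(period.toTotalMonths)

  def main(args: Array[String]): Unit = {
    println(periodToMonths(Period.of(1, 2, 15)))  // 14: the 15 days are dropped
    println(periodToMonths(Period.of(-1, -2, 0))) // -14
    val tooLarge = Period.ofYears(Int.MaxValue)
    try {
      periodToMonths(tooLarge)
      println("no overflow")
    } catch {
      case e: ArithmeticException => println(s"overflow: ${e.getMessage}")
    }
  }
}
```

Delegating to toTotalMonths keeps the intermediate arithmetic in Long, so the only failure point is the final narrowing to Int.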
Shall we fail if the day field is not 0? Or at least give a warning?
We don't fail when we convert:
- java.sql.Date, which has a time component with millisecond precision that we ignore when converting to days, at spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala, line 93 in 56e664c:
  val julianDays = Math.toIntExact(Math.floorDiv(millisLocal, MILLIS_PER_DAY))
- java.sql.Timestamp, which has nanosecond precision, at spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala, line 166 in 56e664c:
  val micros = millisToMicros(t.getTime) + (t.getNanos / NANOS_PER_MICROS) % MICROS_PER_MILLIS
- java.time.Instant, which contains nanoseconds, when we convert it to microseconds, at spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala, line 383 in 56e664c:
  val result = Math.addExact(us, NANOSECONDS.toMicros(instant.getNano))
To be consistent with the current implementation for other types, I believe we should not fail.
Regarding "Or at least give a warning?": this would just fill the logs with useless records, and it would again be inconsistent with the current implementation.
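The precedent cited in this reply is that existing conversions drop sub-unit precision silently rather than failing. A minimal Scala sketch of that behavior, assuming simplified UTC-only versions of the two conversions (no time-zone handling, unlike the real DateTimeUtils code); the object and method names are hypothetical.

```scala
import java.time.Instant
import java.util.concurrent.TimeUnit.NANOSECONDS

// Hypothetical helpers that mimic the kind of conversions cited in the review.
object SilentTruncationSketch {
  private val MicrosPerSecond = 1000000L
  private val MillisPerDay = 24L * 60 * 60 * 1000

  // Instant -> microseconds since the epoch; nanoseconds below one
  // microsecond are dropped without any error or warning.
  def instantToMicros(instant: Instant): Long = {
    val us = Math.multiplyExact(instant.getEpochSecond, MicrosPerSecond)
    Math.addExact(us, NANOSECONDS.toMicros(instant.getNano.toLong))
  }

  // Milliseconds since the epoch (UTC assumed) -> days since the epoch;
  // the time of day is dropped, and floorDiv rounds toward negative infinity.
  def millisToDays(millis: Long): Int =
    Math.toIntExact(Math.floorDiv(millis, MillisPerDay))

  def main(args: Array[String]): Unit = {
    println(instantToMicros(Instant.ofEpochSecond(1L, 1999L))) // 1000001
    println(millisToDays(MillisPerDay + 1234L))                 // 1
    println(millisToDays(-1L))                                  // -1
  }
}
```

In both cases only a genuine range overflow raises ArithmeticException; lost precision does not, which is the same policy the PR applies to the days field of java.time.Period.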
Thanks, merging to master!
Late LGTM!
What changes were proposed in this pull request?
In the PR, I propose to extend the Spark SQL API to accept java.time.Period as an external type of the recently added Catalyst type YearMonthIntervalType (see #31614). The Java class java.time.Period has semantics similar to the ANSI SQL year-month interval type, and it is the most suitable external type for YearMonthIntervalType. In more detail:
1. PeriodConverter, which converts java.time.Period instances to/from the internal representation of the Catalyst type YearMonthIntervalType (the Int type). The PeriodConverter object uses new methods of IntervalUtils:
   - periodToMonths() converts the input period to its total length in months. If the period is too large to fit into an Int, the method throws an ArithmeticException. Note: the input period has "days" precision, and the method just ignores the days unit.
   - monthToPeriod() obtains a java.time.Period representing a number of months.
2. Support for YearMonthIntervalType in RowEncoder via the methods createDeserializerForPeriod() and createSerializerForJavaPeriod().
3. … java.time.Period instances.
Why are the changes needed?
To allow users to parallelize java.time.Period collections and construct year-month interval columns, and also to collect such columns back to the driver side.
Does this PR introduce any user-facing change?
The PR extends existing functionality, so users can parallelize instances of the java.time.Period class and collect them back.
How was this patch tested?
- Tests in CatalystTypeConvertersSuite to check conversion from/to java.time.Period.
- Tests in RowEncoderSuite.
- Literals of YearMonthIntervalType are tested in LiteralExpressionSuite.
- DatasetSuite and JavaDatasetSuite.
- Tests in IntervalUtilsSuites to check conversions java.time.Period <-> months.
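The java.time.Period <-> months round trip that these tests exercise can be illustrated with a small converter pair. This is a sketch under assumptions: PeriodConverterSketch, toCatalyst, toScala and roundTrips are hypothetical stand-ins for the PR's PeriodConverter and IntervalUtils methods, and the reverse direction is assumed to build the period with Period.ofMonths(...).normalized(), so 14 months come back as P1Y2M.

```scala
import java.time.Period

// Hypothetical stand-in for a Period <-> Int converter. The internal value is
// the total number of months, as described for YearMonthIntervalType.
object PeriodConverterSketch {
  // External -> internal: days are ignored; overflow raises ArithmeticException.
  def toCatalyst(period: Period): Int = Math.toIntExact(period.toTotalMonths)

  // Internal -> external: split the month count into years and months
  // (normalized() gives both units the same sign).
  def toScala(months: Int): Period = Period.ofMonths(months).normalized()

  // Round-trip property: converting back yields the normalized period
  // with the days field cleared.
  def roundTrips(period: Period): Boolean =
    toScala(toCatalyst(period)) == period.withDays(0).normalized()

  def main(args: Array[String]): Unit = {
    println(toScala(14))                     // P1Y2M
    println(toScala(-14))                    // P-1Y-2M
    println(roundTrips(Period.of(3, 15, 7))) // true
  }
}
```

Comparing against period.withDays(0).normalized() rather than the original period reflects that the days unit is dropped and that P3Y15M and P4Y3M map to the same month count.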