Implement Sedate IXDTF string parsing #4646

nekevss · 2024-03-01T23:53:50Z

Hi! This PR is intended to close #2127.

It would add the IxdtfParser and IsoDurationParser.

I'm opening this as a draft for a bit early feedback and comments.

This is essentially a proposal to pull in the core of the parser from over in temporal_rs along with some updates I've added to allow the parser to function as a standalone as well as some general clean up.

There's still a couple things I have left to do:

ALOT more documentation is needed. (More docs added, can be built out further if wanted)
Clean up the tests / potentially add more tests.

The one question I have is that the previous experimental version of the parser had a bit more precision on the errors when it came to parsing digits. Currently, this really only throws an ExpectedDigitChar error from Cursor::next_digit. If that is too general, this could be caught/mapped in the individual functions or next_digit could return an Option<u8> to allow more exact error handling.

EDIT: I did end up removing ExpectedDigitChar as an error.

CLAassistant · 2024-03-01T23:53:55Z

All committers have signed the CLA.

nekevss · 2024-03-03T19:32:29Z

Marking as ready for review 😄

A few general questions:

What would be the preferred way of resolving an ambiguity with UTC offsets and Time Zone Annotations? Currently, this defaults to the tz annotation if provided, but maybe this isn't the best approach. It could be better to provide the UTC offset value separate from the tz annotation and allow the caller to handle any ambiguous cases.
Would it be preferred for IsoParseRecord to not have as many nested records as it currently does? (primary cases: DateRecord and TimeRecord).
IxdtfParser has a lot of parsing options beyond just extended date time (admittedly, a decent portion targeting some Temporal functionality). Would it be preferred to split some of the parsing like time zone and time into a TemporalParser over IxdtfParser?

sffc

Good work! Leaving some architectural feedback first; I noted some of these things when I took a pass over the code in the temporal_rs repository, but I didn't post them at that time because they seem like incremental changes from the code base.

utils/ixdtf/src/error.rs

utils/ixdtf/src/lib.rs

utils/ixdtf/src/parsers/mod.rs

utils/ixdtf/src/parsers/records.rs

nekevss · 2024-03-14T17:34:21Z

General updates were made concerning the above review. There are a couple more points that need to still be addressed, but I had a question that I thought I'd pose early.

Is there any preference when handling parsing MonthDay, YearMonth, and annotated Time strings?

I'm tempted to keep those but include them as part of a separate parser from the current IxdtfParser, but I thought I'd float the question in case there was any feedback regarding go forward on those.

sffc

Nice work! I did a more thorough review this time. Just some questions mostly about data types. Feel free to convert comments into follow-up issues as you see fit. I would like to land this soon and favor follow-up PRs.

utils/ixdtf/src/lib.rs

utils/ixdtf/src/parsers/mod.rs

utils/ixdtf/src/parsers/datetime.rs

utils/ixdtf/src/parsers/time_zone.rs

utils/ixdtf/src/parsers/records.rs

utils/ixdtf/src/parsers/mod.rs

utils/ixdtf/src/parsers/datetime.rs

utils/ixdtf/src/parsers/records.rs

1. Add Sign enum and adjust IsoDurationRecord and UTCOffsetRecord 2. Remove graceful failing of IxdtfOptions 3. General doc updates

1. Add parse_annotated_year_month 2. Add test to check YearMonth isn't parsing a time

sffc

Praise: This is the best version yet! Thanks for all the work you've been putting into this. All my comments can be done later or filed as a follow-up issue. I think we can land the first version now!

sffc · 2024-03-26T23:44:49Z

utils/ixdtf/src/parsers/annotations.rs

+        if kv.key == "u-ca" {
+            if calendar.is_none() {
+                calendar = Some(kv.value);
+                calendar_crit = kv.critical;
+                continue;
+            }
+
+            if calendar_crit || kv.critical {
+                return Err(ParserError::CriticalDuplicateCalendar);
+            }
+        }
+
+        annotations.push(kv);


Thought: This logic seems a little odd to me. I think what it does is:

[u-ca=hebrew][!u-ca=buddhist] = error

[!u-ca=hebrew][u-ca=buddhist] = error

[!u-ca=hebrew] = OK

[u-ca=hebrew][u-ca=buddhist] = OK with main calendar hebrew and then buddhist going into annotations

It also doesn't return whether the main calendar is critical or not, which seems like information that is being lost.

I think I lean toward the following behavior for simplicity:

Change the calendar return type to be an Annotation, keeping the critical flag

Use some proper algorithm for selecting which calendar is the main calendar: the first, the last, the first critical, or the last critical. i.e., which calendar wins among:

[u-ca=aaa][!u-ca=bbb][!u-ca=ccc][u-ca=ddd]

Put all other calendars in order into the annotations list

That's probably true. I almost switched it (like with TimeZoneAnnotation), but given that u-ca is a registered key, I was viewing it as the parser being the primary consumer, so it was fine to leave it without the critical flag included. After looking back at the spec draft, I'm conflicted but leaning a bit more towards include the annotation.

For clarity on the point in the specification I'm thinking about mostly:

An application that encounters duplicate use of a suffix key in elective suffixes and does not want to perform additional processing on this inconsistency MUST choose the first suffix that has that key, i.e., 2022-07-08T00:14:07Z[u-ca=chinese][u-ca=japanese] 2022-07-08T00:14:07Z[u-ca=chinese]

All that considered and given the current approach to this parser, I think it makes sense to set calendar to the Annotation for the first calendar and all the rest are moved to annotations to be handled by the user, i.e. 1 & 3, but default to the first calendar over selecting calendars

Secondary Idea:
Would it make sense to add an opt-in strict parsing option to IxdtfParser for calendar / critical flag handling? When strict the parser would through an error on any strict calendar ambiguity or unknown critical annotations, and when not strict the calendar ambiguity and unknown critical annotations are provided to the user to handle?

sffc · 2024-03-27T00:02:04Z

utils/ixdtf/src/parsers/datetime.rs

+
+/// Utility that returns whether a year is a leap year.
+#[inline]
+fn in_leap_year(year: i32) -> i8 {


Suggestion: return a bool instead of an int?

sffc · 2024-03-27T00:03:07Z

utils/ixdtf/src/parsers/mod.rs

+/// Parsing options that can be provided to `IxdtfParser` to change parsing behavior.
+#[non_exhaustive]
+#[derive(Debug, Clone, Copy, PartialEq)]
+pub enum IxdtfOptions {


Suggestion: We should bikeshed the name of this enum.

sffc · 2024-03-27T00:04:47Z

utils/ixdtf/src/parsers/mod.rs

+        match self.options {
+            IxdtfOptions::None => datetime::parse_annotated_date_time(&mut self.cursor),
+            IxdtfOptions::YearMonth => datetime::parse_annotated_year_month(&mut self.cursor),
+            IxdtfOptions::MonthDay => datetime::parse_annotated_month_day(&mut self.cursor),
+            IxdtfOptions::Time => time::parse_annotated_time_record(&mut self.cursor),
+        }


Thought: With big match statements like this, it is better for DCE if we instead had four functions like parse, parse_year_month, parse_month_day, and parse_time. Then we wouldn't need the enum.

sffc · 2024-03-27T00:07:33Z

utils/ixdtf/src/parsers/records.rs

+
+/// The parsed sign value, representing whether its struct is positive or negative.
+#[repr(i8)]
+#[allow(clippy::exhaustive_enums)] // Sign can only be positive or negative.


Question: there are often 3 types of signs, known as Signum, which are "explicit minus", "explicit plus", and "no sign", as in "-1", "+1" and "1". Does IXDTF make a distinction?

IXDTF does not to my knowledge and from what I can find. RFC3339 does layout that the offset must be "+" or "-". So Duration is the only one that can be "no sign" as it is considered optional.

sffc · 2024-03-27T00:10:53Z

utils/ixdtf/src/parsers/records.rs

+pub enum DurationFraction {
+    /// The fraction value applied to an hour.
+    Hours(u64),
+    /// The fraction value applied to the minutes field.
+    Minutes(u64),
+    /// The fraction value applied to the seconds field.
+    Seconds(u32),
+}


Thought: another way to model this, which may or may not be better, would be

pub enum TimeDuration { None, Hours { hours: u32, fraction: u64, }, Minutes { hours: u32, minutes: u32, fraction: u64, }, Seconds { hours: u32, minutes: u32, seconds: u32, fraction: u32, } }

This more properly encodes the invariant that if hours have fractions, then minutes and seconds don't exist, for example.

This PR adjusts TimeDurationRecord to align with some feedback from #4646 regarding the representation of TimeDurationRecord

nekevss marked this pull request as ready for review March 3, 2024 19:20

nekevss requested a review from a team as a code owner March 3, 2024 19:20

sffc requested changes Mar 4, 2024

View reviewed changes

sffc self-requested a review March 17, 2024 15:36

sffc reviewed Mar 17, 2024

View reviewed changes

utils/ixdtf/src/parsers/records.rs Outdated Show resolved Hide resolved

utils/ixdtf/src/parsers/records.rs Outdated Show resolved Hide resolved

sffc self-requested a review March 19, 2024 09:32

sffc reviewed Mar 19, 2024

View reviewed changes

nekevss requested a review from sffc March 20, 2024 04:30

nekevss and others added 18 commits March 25, 2024 17:18

Pull in temporal_rs parser with some adaptations

2578ad5

Build out docs and some general updates

5397c5d

Update README.md and small changes to intro docs

2b7127d

Impl ZonedDateTime parsing with tests and some fixes

496db84

Build out and add links to documentation

105c982

Refactor to remove external errors

02921bf

Add Time string parsing and adjust module names and visibility

3b4d29e

Touch lockfile

aa0907c

Make license check pass

e2f1999

Update implementation according to review

55c14cf

Update and fix lib docs

c8483b4

Update lockfile

03c5bd8

Add TemporalParser experiment with features

4dc8c67

Update README and fix feature flags

269e25b

Remove empty line from end of code block

b913f1a

Regen README and fix msrv fail

6bf905e

Apply review feedback

8a4d697

Remove now valid year range test from tests and hang cfg flags

7393032

nekevss added 8 commits March 25, 2024 17:18

General cleanup in tests and etc.

178874a

IsoParseRecord -> IxdtfParseRecord

314ec87

Apply review feedback p3

438fec4

1. Add Sign enum and adjust IsoDurationRecord and UTCOffsetRecord 2. Remove graceful failing of IxdtfOptions 3. General doc updates

Adjust module headers

0e5799e

Update to checked_* in duration and add a couple tests

ecedbe4

DurationDigitExceededRange -> DurationValueExceededRange

38dd2f6

Separate YearMonth option code path from DateTime

0c39267

1. Add parse_annotated_year_month 2. Add test to check YearMonth isn't parsing a time

Added a doc comment to parse_annotated_month_day

739c08d

nekevss force-pushed the ixdtf-parser branch from 7d44d75 to 739c08d Compare March 25, 2024 21:20

sffc approved these changes Mar 27, 2024

View reviewed changes

sffc merged commit 7e151d4 into unicode-org:main Mar 27, 2024
30 checks passed

sffc mentioned this pull request Mar 27, 2024

Add Sedate IXDTF (extended ISO) string parsing function #2127

Open

nekevss deleted the ixdtf-parser branch March 27, 2024 01:39

nekevss mentioned this pull request Mar 28, 2024

Add IXDTF annotation handler and update critical flag handling #4751

Merged

jedel1043 mentioned this pull request Apr 14, 2024

Migrate to the ixdtf crate for datetime parsing boa-dev/temporal#39

Closed

nekevss mentioned this pull request Apr 18, 2024

Adjust ixdtf's TimeDurationRecord representation #4819

Merged

sffc pushed a commit that referenced this pull request Apr 25, 2024

Adjust ixdtf's TimeDurationRecord representation (#4819)

9f23363

This PR adjusts TimeDurationRecord to align with some feedback from #4646 regarding the representation of TimeDurationRecord

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implement Sedate IXDTF string parsing #4646

Implement Sedate IXDTF string parsing #4646

nekevss commented Mar 1, 2024 •

edited

Loading

CLAassistant commented Mar 1, 2024 •

edited

Loading

nekevss commented Mar 3, 2024

sffc left a comment

nekevss commented Mar 14, 2024

sffc left a comment

sffc left a comment

sffc Mar 26, 2024

nekevss Mar 27, 2024

sffc Mar 27, 2024

sffc Mar 27, 2024

sffc Mar 27, 2024

sffc Mar 27, 2024

nekevss Mar 27, 2024

sffc Mar 27, 2024

Implement Sedate IXDTF string parsing #4646

Implement Sedate IXDTF string parsing #4646

Conversation

nekevss commented Mar 1, 2024 • edited Loading

CLAassistant commented Mar 1, 2024 • edited Loading

nekevss commented Mar 3, 2024

sffc left a comment

Choose a reason for hiding this comment

nekevss commented Mar 14, 2024

sffc left a comment

Choose a reason for hiding this comment

sffc left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

nekevss commented Mar 1, 2024 •

edited

Loading

CLAassistant commented Mar 1, 2024 •

edited

Loading