-
Notifications
You must be signed in to change notification settings - Fork 185
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement Sedate IXDTF string parsing #4646
Conversation
Marking as ready for review 😄 A few general questions:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good work! Leaving some architectural feedback first; I noted some of these things when I took a pass over the code in the temporal_rs
repository, but I didn't post them at that time because they seem like incremental changes from the code base.
General updates were made concerning the above review. There are a couple more points that need to still be addressed, but I had a question that I thought I'd pose early. Is there any preference when handling parsing MonthDay, YearMonth, and annotated Time strings? I'm tempted to keep those but include them as part of a separate parser from the current |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work! I did a more thorough review this time. Just some questions mostly about data types. Feel free to convert comments into follow-up issues as you see fit. I would like to land this soon and favor follow-up PRs.
1. Add Sign enum and adjust IsoDurationRecord and UTCOffsetRecord 2. Remove graceful failing of IxdtfOptions 3. General doc updates
1. Add parse_annotated_year_month 2. Add test to check YearMonth isn't parsing a time
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Praise: This is the best version yet! Thanks for all the work you've been putting into this. All my comments can be done later or filed as a follow-up issue. I think we can land the first version now!
if kv.key == "u-ca" { | ||
if calendar.is_none() { | ||
calendar = Some(kv.value); | ||
calendar_crit = kv.critical; | ||
continue; | ||
} | ||
|
||
if calendar_crit || kv.critical { | ||
return Err(ParserError::CriticalDuplicateCalendar); | ||
} | ||
} | ||
|
||
annotations.push(kv); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thought: This logic seems a little odd to me. I think what it does is:
[u-ca=hebrew][!u-ca=buddhist]
= error[!u-ca=hebrew][u-ca=buddhist]
= error[!u-ca=hebrew]
= OK[u-ca=hebrew][u-ca=buddhist]
= OK with main calendarhebrew
and thenbuddhist
going intoannotations
It also doesn't return whether the main calendar is critical or not, which seems like information that is being lost.
I think I lean toward the following behavior for simplicity:
- Change the
calendar
return type to be anAnnotation
, keeping the critical flag - Use some proper algorithm for selecting which calendar is the main calendar: the first, the last, the first critical, or the last critical. i.e., which calendar wins among:
[u-ca=aaa][!u-ca=bbb][!u-ca=ccc][u-ca=ddd]
- Put all other calendars in order into the
annotations
list
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's probably true. I almost switched it (like with TimeZoneAnnotation
), but given that u-ca
is a registered key, I was viewing it as the parser being the primary consumer, so it was fine to leave it without the critical flag included. After looking back at the spec draft, I'm conflicted but leaning a bit more towards include the annotation.
For clarity on the point in the specification I'm thinking about mostly:
An application that encounters duplicate use of a suffix key in
elective suffixes and does not want to perform additional processing
on this inconsistency MUST choose the first suffix that has that key,
i.e.,
2022-07-08T00:14:07Z[u-ca=chinese][u-ca=japanese]
2022-07-08T00:14:07Z[u-ca=chinese]
All that considered and given the current approach to this parser, I think it makes sense to set calendar to the Annotation
for the first calendar and all the rest are moved to annotations
to be handled by the user, i.e. 1 & 3, but default to the first calendar over selecting calendars
Secondary Idea:
Would it make sense to add an opt-in strict
parsing option to IxdtfParser
for calendar / critical flag handling? When strict
the parser would through an error on any strict calendar ambiguity or unknown critical annotations, and when not strict the calendar ambiguity and unknown critical annotations are provided to the user to handle?
|
||
/// Utility that returns whether a year is a leap year. | ||
#[inline] | ||
fn in_leap_year(year: i32) -> i8 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion: return a bool
instead of an int?
/// Parsing options that can be provided to `IxdtfParser` to change parsing behavior. | ||
#[non_exhaustive] | ||
#[derive(Debug, Clone, Copy, PartialEq)] | ||
pub enum IxdtfOptions { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion: We should bikeshed the name of this enum.
match self.options { | ||
IxdtfOptions::None => datetime::parse_annotated_date_time(&mut self.cursor), | ||
IxdtfOptions::YearMonth => datetime::parse_annotated_year_month(&mut self.cursor), | ||
IxdtfOptions::MonthDay => datetime::parse_annotated_month_day(&mut self.cursor), | ||
IxdtfOptions::Time => time::parse_annotated_time_record(&mut self.cursor), | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thought: With big match statements like this, it is better for DCE if we instead had four functions like parse
, parse_year_month
, parse_month_day
, and parse_time
. Then we wouldn't need the enum.
|
||
/// The parsed sign value, representing whether its struct is positive or negative. | ||
#[repr(i8)] | ||
#[allow(clippy::exhaustive_enums)] // Sign can only be positive or negative. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: there are often 3 types of signs, known as Signum
, which are "explicit minus", "explicit plus", and "no sign", as in "-1", "+1" and "1". Does IXDTF make a distinction?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IXDTF does not to my knowledge and from what I can find. RFC3339 does layout that the offset must be "+" or "-". So Duration is the only one that can be "no sign" as it is considered optional.
pub enum DurationFraction { | ||
/// The fraction value applied to an hour. | ||
Hours(u64), | ||
/// The fraction value applied to the minutes field. | ||
Minutes(u64), | ||
/// The fraction value applied to the seconds field. | ||
Seconds(u32), | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thought: another way to model this, which may or may not be better, would be
pub enum TimeDuration {
None,
Hours {
hours: u32,
fraction: u64,
},
Minutes {
hours: u32,
minutes: u32,
fraction: u64,
},
Seconds {
hours: u32,
minutes: u32,
seconds: u32,
fraction: u32,
}
}
This more properly encodes the invariant that if hours have fractions, then minutes and seconds don't exist, for example.
Hi! This PR is intended to close #2127.
It would add the
IxdtfParser
andIsoDurationParser
.I'm opening this as a draft for a bit early feedback and comments.
This is essentially a proposal to pull in the core of the parser from over in
temporal_rs
along with some updates I've added to allow the parser to function as a standalone as well as some general clean up.There's still a couple things I have left to do:
The one question I have is that the previous experimental version of the parser had a bit more precision on the errors when it came to parsing digits.
Currently, this really only throws anExpectedDigitChar
error fromCursor::next_digit
. If that is too general, this could be caught/mapped in the individual functions ornext_digit
could return anOption<u8>
to allow more exact error handling.EDIT: I did end up removing
ExpectedDigitChar
as an error.