-
Notifications
You must be signed in to change notification settings - Fork 184
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement data struct PluralRangesV1
#4077
Conversation
36763de
to
cbaa89e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Praise: Nice job!
provider/datagen/src/transform/cldr/cldr_serde/plural_ranges.rs
Outdated
Show resolved
Hide resolved
pub end: String, | ||
} | ||
|
||
impl<'de> Deserialize<'de> for PluralRange { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Praise: Really nice use of a custom Deserialize impl
components/plurals/src/lib.rs
Outdated
@@ -155,7 +161,7 @@ pub enum PluralCategory { | |||
/// | |||
/// - 0 in Arabic (ar), Latvian (lv) | |||
/// - 10~20, 30, 40, 50, ... in Latvian (lv) | |||
Zero, | |||
Zero = 0, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thought: For dataful enums I wonder if in general we should reserve 0 for the value "null"; or, in the case of the PluralCategory enum, null is "other" so should "other" be 0?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thought about it, but I didn't know if reordering the variants would be a breaking change or not (because of the PartialOrd
derive), so I opted for following the specified order.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Though, we could also implement PartialOrd
by hand, but it'll be really weird to have Zero < Other
and Zero as u8 > Other as u8
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I see, this enum has derive(Ord)
which somewhat limits our options
My fully detailed thoughts: I think what you did is fine, but I don't see the harm in making a new enum whose purpose is serialization. We then leave this enum as the public-facing Rust enum. Another thing to consider is, the keys in the ZeroMap do not need to be validated; if you have an out-of-range plural category, you just won't ever find that entry in the map, so we can save a little bit of deserialization code and speed by using a plain u8
as the key. However, that key type does not serialize nicely in JSON, so what we've done in other places is make yet another enum which is a newtype wrapper over the primitive but has nice serialization (usually called UnvalidatedX: see UnvalidatedTinyAsciiStr, UnvalidatedStr, etc). However, the nice part about your proposed solution is that it is the least amount of additional boilerplate code. It just might be tying too many functional use cases together.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, then it would be:
- A new
RawPluralCategory
enum insideprovider
withRawPluralCategory::Other = 0
and a conversion fromPluralCategory
toRawPluralCategory
. This will be the new value type of the map. - A new
UnvalidatedPluralCategory(u8)
insideprovider
that de/serializes from/to a string if the de/serializer is human readable, which will be the key type of the map. Also a conversion fromRawPluralCategory
toUnvalidatedPluralCategory
.
Is that correct?
Also, should both types be #[doc(hidden]
? Or should I just document them as unstable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, naming aside, that's basically the other model for this.
They should be public in the provider
module with the yellow box warning people that the APIs may be unstable.
I would make the second one pub struct UnvalidatedPluralCategory(pub u8)
that is a u8 for all purposes except that is_human_readable impl Serialize
makes a nice string, returning an error if out of range, and is_human_readable impl Deserialize
knows how to handle the string.
However I'd like to see what @robertbastian or @Manishearth think about these two approaches (re-use the existing enum and make it ZeroVec compatible, or introduce two new types which are data provider specific)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, naming aside, that's basically the other model for this.
They should be public in the
provider
module with the yellow box warning people that the APIs may be unstable.I would make the second one
pub struct UnvalidatedPluralCategory(pub u8)
that is a u8 for all purposes except thatis_human_readable impl Serialize
makes a nice string, returning an error if out of range, andis_human_readable impl Deserialize
knows how to handle the string.However I'd like to see what @robertbastian or @Manishearth think about these two approaches (re-use the existing enum and make it ZeroVec compatible, or introduce two new types which are data provider specific)
Sounds good! I've already finished implementing the API in another branch, so I'll wait for your consensus in the meantime.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that a new type is the best solution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same
3b9734c
to
2ac056b
Compare
Applied the review. Let me know if those names are okay. |
components/plurals/src/provider.rs
Outdated
/// - `key1` corresponds to the end category of the range. | ||
#[cfg_attr(feature = "serde", serde(borrow))] | ||
pub ranges: | ||
ZeroMap2d<'data, UnvalidatedPluralCategory, UnvalidatedPluralCategory, RawPluralCategory>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thought: ZeroMap2d
might be overkill. I think we don't need functionality to iterate over second-order keys, and we can fit both keys in a single byte.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it, I'll use a simple ZeroMap then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replaced it with a ZeroMap
. It makes the human-readable de/serialization slightly worse but non-human-readable de/serialization is a lot smaller and faster.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Waiting for @sffc to take a look before merging.
components/plurals/src/provider.rs
Outdated
.ok_or_else(|| S::Error::custom("invalid tag in UnvalidatedPluralRange"))?; | ||
let end = RawPluralCategory::new_from_u8(self.0 & 0x0F) | ||
.ok_or_else(|| S::Error::custom("invalid tag in UnvalidatedPluralRange"))?; | ||
(start, end).serialize(serializer) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thought: if we wanted to make this human-readable we could serialize to a separated string like "One/Other", but I think what you have here is good
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be a pretty quick fix. Maybe something like One-Other
to use a range-like syntax?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I like that!
(This is a totally optional suggestion; this is mergeable as-is)
databake(path = icu_plurals::provider), | ||
)] | ||
#[zerovec::make_ule(UnvalidatedPluralRangeULE)] | ||
pub struct UnvalidatedPluralRange(pub u8); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Praise: so compact!
7d74cae
to
b1e132a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm!
Related to #3012.
This is still missing some details and documentation, but I wanted to put it out for you to evaluate the data model.Should be feature-complete now.One of the biggest space optimizations was obviating the need to store ranges where the resulting category of the range is the same as the end category, since TR35 specifies: