-
Notifications
You must be signed in to change notification settings - Fork 184
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Split date skeleton key in two, with datagen #4778
Conversation
Sizes (all locales, maximal deduplication, for each key):
So it looks like splitting does make things smaller. |
type DateDaySkeletonPatternsV1Marker: KeyedDataMarker<Yokeable = PackedSkeletonDataV1<'static>>; | ||
|
||
#[cfg(any(feature = "datagen", feature = "experimental"))] | ||
/// The data marker for loading date-ext pattern skeletons for this calendar. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should have more comments explaining what ext is
I did a little experiment to find the most efficient split. For splitting into 2 keys, we're near the optimum with the split proposed in this PR. For splitting into 3 keys, we should add a split between YearMonthDay and EraYearMonthDay. For splitting into 4 keys, we should add a split between Year and EraYear. It's also possible that reordering the skeleta within the list would yield smaller blocks. I haven't tested this. |
Splitting the skeleta each into a separate key makes things the smallest, even with the lookup tables. The lookup tables get smaller when the keys are smaller because there is more deduplication. Even in hybrid mode, the wins on data size are great enough that it looks like it would more than offset the increase in lookup table sizes. |
Here's a hypothesis worth testing. I expect that making the keys shared across calendars but unique on components would yield better results than the other way around. Patterns don't differ as much across calendars as display names, but they differ a lot from skeleton to skeleton. |
Moving this to draft because I want to compare it with #4779 |
Wait, really? How would we handle cases where calendars differ, an "extra values" table? Also won't that deduplicate anyway? |
Need to see if this helps with data size