Baked data is big, and compiles slowly, for finely sliced data markers #5230
2 sounds compelling if we can make it work cleanly in our baking infra.
Potentially have a flag for data keys that marks them as "VZV packable".
Discussed this briefly with @robertbastian. Some points:
To illustrate that last point:

```rust
// Data struct type
#[derive(Clone)]
pub struct ThingV1<'data> {
    pub a: VarZeroVec<'data, str>,
    pub b: VarZeroVec<'data, str>,
}

// Borrowed type
#[derive(Copy, Clone)]
pub(crate) struct ThingV1Borrowed<'data> {
    pub a: &'data VarZeroSlice<str>,
    pub b: &'data VarZeroSlice<str>,
}

// Example top-level owned type
pub struct ThingFormatter {
    payload: DataPayload<ThingV1Marker>,
}

// Example top-level borrowed type
pub struct ThingFormatterBorrowed<'data> {
    payload: ThingV1Borrowed<'data>,
}

// To get from one to the other
impl ThingFormatter {
    pub fn as_borrowed(&self) -> ThingFormatterBorrowed {
        ThingFormatterBorrowed {
            payload: self.payload.get().as_borrowed(),
        }
    }
}
```

In the above example, …
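For reference, the owned/borrowed pair pattern sketched above can be written in a self-contained, compilable form, with `String`/`&str` standing in for the zerovec types and the `DataPayload` wrapper elided (all of these simplifications are mine, not the real ICU4X types):

```rust
// Self-contained sketch of the owned/borrowed pair pattern, with plain
// `String`/`&str` standing in for `VarZeroVec`/`VarZeroSlice`.
#[derive(Clone)]
pub struct ThingV1 {
    pub a: String,
    pub b: String,
}

// The borrowed view is `Copy`: just two pointers, no owned data.
#[derive(Copy, Clone)]
pub struct ThingV1Borrowed<'data> {
    pub a: &'data str,
    pub b: &'data str,
}

pub struct ThingFormatter {
    pub payload: ThingV1,
}

pub struct ThingFormatterBorrowed<'data> {
    pub payload: ThingV1Borrowed<'data>,
}

impl ThingFormatter {
    // Project the owned formatter onto its cheap, Copy-able borrowed view.
    pub fn as_borrowed(&self) -> ThingFormatterBorrowed<'_> {
        ThingFormatterBorrowed {
            payload: ThingV1Borrowed {
                a: &self.payload.a,
                b: &self.payload.b,
            },
        }
    }
}

fn main() {
    let f = ThingFormatter {
        payload: ThingV1 { a: "hello".into(), b: "world".into() },
    };
    let borrowed = f.as_borrowed();
    let copied = borrowed.payload; // Copy: no clone needed
    assert_eq!(copied.a, "hello");
    assert_eq!(copied.b, "world");
}
```

The point of the pattern is that the borrowed type is `Copy` and involves no `DataPayload`/`Yoke` machinery on the hot path.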
You're right about …
Maybe 1, 2, 3 are better illustrated with examples:

```rust
// Option 0, Current (not exactly, but equivalent):
payloads: [
    PackedSkeletonDataV1 {
        index_info: SkeletonDataIndex::from_raw(...),
        patterns: VarZeroVec::from_bytes_unchecked(...)
    },
    // ...
]

// Option 1:
payloads: [
    (SkeletonDataIndex::from_raw(...), &'static VarZeroSlice::from_bytes_unchecked(...)),
    // ...
]
// at runtime, use ZeroFrom to get the PackedSkeletonDataV1,
// or put it directly into the borrowed struct

// Option 2:
payloads: VarZeroSlice::from_bytes_unchecked(
    // entries are the VarULE repr of PackedSkeletonDataV1
)
// at runtime, use ZeroFrom to get the PackedSkeletonDataV1,
// or put it directly into the borrowed struct

// Option 3:
impl PackedSkeletonDataV1 {
    pub const unsafe fn from_parts(raw_index_info, raw_patterns) -> Self {
        Self {
            index_info: SkeletonDataIndex::from_raw(raw_index_info),
            patterns: VarZeroVec::from_bytes_unchecked(raw_patterns),
        }
    }
}
payloads: [
    PackedSkeletonDataV1::from_parts(..., ...)
]
// identical runtime characteristics to the current implementation
```
The baked provider used to use …

```rust
// correction: currently we have a slice of structs, not an array of struct refs
payloads: &'static [
    PackedSkeletonDataV1 {
        index_info: SkeletonDataIndex::from_raw(...),
        patterns: VarZeroVec::from_bytes_unchecked(...)
    },
    // ...
]
```

Option 3 is equivalent to option 1, because …
Yes, I briefly forgot about …
Option 3 is equivalent to the current solution, option 0 (not option 1), except for file size being slightly smaller.
A more radical solution (bigger change but maybe better outcome) would be to add it to `DynamicDataMarker`:

```rust
pub trait DynamicDataMarker {
    type Borrowed<'a>;
    // not sure if this is the right syntax but you get the idea:
    type Yokeable: for<'a> Yokeable<'a> + ZeroFrom<Self::Borrowed<'a>>;
}
```
This^ is the only implementation path I see for option 1; what alternative were you thinking of?
The other way to implement option 1 would be to have a databake derive attribute that defines how to get from the static representation to the struct, which we could do as part of #2452.
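To make the derive-attribute idea concrete, here is a hedged sketch of what such a mapping could desugar to. The trait name `FromBakedRepr` and everything else below is invented for illustration; it is not databake's actual API:

```rust
// Hypothetical trait that a databake derive attribute could generate an
// impl for: it maps a compact static representation to the runtime struct.
trait FromBakedRepr: Sized {
    type Repr: 'static;
    fn from_baked(repr: &'static Self::Repr) -> Self;
}

// Simplified stand-in for PackedSkeletonDataV1.
pub struct PackedData {
    pub index_info: u32,
    pub patterns: &'static [u8],
}

impl FromBakedRepr for PackedData {
    // The baked file stores only (raw index, raw bytes) tuples...
    type Repr = (u32, &'static [u8]);
    // ...and the struct stack is rebuilt at data-load time.
    fn from_baked(repr: &'static Self::Repr) -> Self {
        PackedData { index_info: repr.0, patterns: repr.1 }
    }
}

// What the baked data file would contain: no struct stacks, just tuples.
static PAYLOADS: &[(u32, &[u8])] = &[(7, b"abc"), (9, b"de")];

fn main() {
    let d = PackedData::from_baked(&PAYLOADS[1]);
    assert_eq!(d.index_info, 9);
    assert_eq!(d.patterns, &b"de"[..]);
}
```

This is "option 1 shaped": the static side stays small and flat, and the full struct only exists transiently at load time.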
One thing databake could potentially do is have a way for the CrateEnv to collect auxiliary codegen, which would work by:
Something like:

```rust
struct PackedDataSkeletonV1PatternsAux {
    patterns: Vec<_>,
}

fn bake(&self, env: &CrateEnv) -> TokenStream2 {
    // This uses an anymap or something
    let map = env.get_or_insert::<PackedDataSkeletonV1PatternsAux>(
        PackedDataSkeletonV1PatternsAux::default(),
        // The flush function. This is just an example.
        |aux| quote!( const ALL_THE_PATTERNS = #patterns ),
    );
    let start = map.patterns.len();
    map.patterns.extend(self.patterns);
    let end = map.patterns.len();
    quote!(PackedSkeletonDataV1 {
        // ...
        patterns: ALL_THE_PATTERNS[#start..#end],
    })
}
```

This still needs some way to make the types work; the example above doesn't attempt to address that, but this could help for tricks like "store all of the data in a big …
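As a self-contained illustration of the anymap idea in the sketch above, here is a toy `CrateEnv` built on std's `Any`. The method names (`with_aux`, `bake_one`) and everything else here are invented; the real databake API would differ:

```rust
use std::any::{Any, TypeId};
use std::cell::RefCell;
use std::collections::HashMap;

// Toy stand-in for CrateEnv: a type-keyed map of auxiliary state that
// individual bake() calls can append to, flushed once at the end.
#[derive(Default)]
struct CrateEnv {
    aux: RefCell<HashMap<TypeId, Box<dyn Any>>>,
}

#[derive(Default)]
struct PatternsAux {
    patterns: Vec<String>,
}

impl CrateEnv {
    // get-or-insert the auxiliary state for type T and mutate it.
    fn with_aux<T: Any + Default, R>(&self, f: impl FnOnce(&mut T) -> R) -> R {
        let mut map = self.aux.borrow_mut();
        let entry = map
            .entry(TypeId::of::<T>())
            .or_insert_with(|| Box::new(T::default()));
        f(entry.downcast_mut::<T>().expect("type-keyed"))
    }
}

// A bake() analog: append this struct's patterns to the shared pool and
// emit (start, end) indices into it instead of inline data.
fn bake_one(env: &CrateEnv, patterns: &[&str]) -> (usize, usize) {
    env.with_aux::<PatternsAux, _>(|aux| {
        let start = aux.patterns.len();
        aux.patterns.extend(patterns.iter().map(|s| s.to_string()));
        (start, aux.patterns.len())
    })
}

fn main() {
    let env = CrateEnv::default();
    assert_eq!(bake_one(&env, &["a", "b"]), (0, 2));
    assert_eq!(bake_one(&env, &["c"]), (2, 3));
    // "Flush": in the real design this is where the single shared
    // `const ALL_THE_PATTERNS` constant would be emitted.
    let total = env.with_aux::<PatternsAux, _>(|aux| aux.patterns.len());
    assert_eq!(total, 3);
}
```

The type-keyed map is why distinct aux types (one per data struct family) can coexist in one `CrateEnv` without knowing about each other.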
This is not 2.0 blocking unless we implement a breaking solution for #5187.
(additional discussion not recorded) Current thinking:
LGTM: @sffc @robertbastian
LGTM as well
To add some urgency to this issue, @kartva says:
Building the latest icu_experimental data:
Here are the figures for …
That's a big share, so I'll focus on reproducing just this in isolation.
Here's with changing the paths to be imported instead of absolute:
And here's with using a const constructor:
And a helper function:
And `include_bytes!` for the trie data:
In tabular form (each row builds on the one above it):
So there seems to be a pretty strong correlation between file size, compile time, and memory usage, except for the last few rows with wrapper functions, which reduce file size but don't seem to impact the other metrics.
Do we know which compiler pass is actually slow (…)? It would be useful to compare that output to that of a normal utils crate and see where the big differences are.
Revisiting this since we're getting 143 errors again in CI. Here's where we're currently at on main:
I ran it with …
Relative to what other crates do, the following steps are the biggest:
Of these, `MIR_borrow_checking` is by far the slowest relative to other crates; `expand_crate` and `macro_expand_crate` add the most to the memory usage. Reminder from my previous exploration: adding a const constructor instead of doing raw struct construction did reduce compile times and peak memory usage (this was before I was measuring the rustc phase breakdown). This is consistent with @Manishearth's interpretation of the above figures.
I made a reproducible, standalone case here: https://github.com/sffc/icu4x_compile_sample/tree/standalone
You can also see some of my previous work on making changes in the July 30 commits on the main branch: https://github.com/sffc/icu4x_compile_sample/commits/main/
I added const constructors where they were most impactful in #5541. I don't consider this a long-term solution because:
But, for the short term, it should make CI less flaky.
Filed rust-lang/rust#134404, with reduced testcase https://github.com/Manishearth/icu4x_compile_sample. Further reduced testcase in `rm-macro` that shows the difference when you avoid the macro step.
I still think options 1 and 2 are on the table, at least for keys that would benefit from it.
Note: option 1 could store the data in a …
In case it was lost earlier, I want to emphasize that options 1 and 2 have the added benefit of reducing compiled baked data size (#5429) by not putting the struct stacks into the binary.
I agree that they are still on the table. Option 2 doesn't sound too bad. We could have toggles for data loading in the baked system that can produce baked data via alternate means (not just calling Bake per-entry). The idea of baked was to avoid deserialization as much as possible; this does not mean we should avoid it completely. A couple of VarULEs here and there that get poked during data loading isn't terrible, and the format is good at random access.
Thought: if we want to avoid the yoke, we could add a third `DataPayload` variant:

```rust
enum DataPayloadInner<M> {
    Yoke(Yoke<M::Yokeable, Option<Rc<[u8]>>>),
    StaticRef(&'static M::Yokeable),
    StaticOwned(M::Yokeable), // NEW
}
```

StaticOwned can be constructed from static borrowed data. It doesn't have to own any heap memory. Not sure if this is better than just using the …
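A simplified, yoke-free sketch of how such a `StaticOwned` variant could be used, with a plain type parameter standing in for `M::Yokeable` and the yoked case reduced to owned bytes (all simplifications are mine, for illustration only):

```rust
use std::rc::Rc;

// Simplified stand-in: the real enum holds Yoke<M::Yokeable, Option<Rc<[u8]>>>
// in its first variant; here the "yoked" case is just owned bytes.
#[allow(dead_code)]
enum DataPayloadInner<T: 'static> {
    Yoked(Rc<[u8]>),       // owns heap memory backing a deserialized T (elided here)
    StaticRef(&'static T), // points at a baked, fully formed struct
    StaticOwned(T),        // NEW: built at load time from static parts, owns no heap
}

struct Message {
    a: &'static str,
    b: &'static str,
}

// Baked data stored as flat static parts (option 1 style)...
static PARTS: (&str, &str) = ("hello", "world");

// ...from which a StaticOwned payload is assembled without any allocation.
fn load() -> DataPayloadInner<Message> {
    DataPayloadInner::StaticOwned(Message { a: PARTS.0, b: PARTS.1 })
}

fn main() {
    match load() {
        DataPayloadInner::StaticOwned(m) => {
            assert_eq!(m.a, "hello");
            assert_eq!(m.b, "world");
        }
        _ => unreachable!(),
    }
}
```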
Another idea I had was that the Baked context could grow a helper method that shortens imports. Basically, instead of generating

```rust
// thousands of times
icu::experimental::dimension::provider::units::UnitsDisplayNameV1 { patterns: icu::experimental::relativetime::provider::PluralPatterns { ... } }
```

we generate

```rust
use icu::experimental::dimension::provider::units::UnitsDisplayNameV1;
use icu::experimental::relativetime::provider::PluralPatterns;
// thousands of times
UnitsDisplayNameV1 { patterns: PluralPatterns { ... } }
```

when generating a path, instead of doing …
A wrinkle is that rust doesn't like having multiple imports in the same file, and these are macro generated, so we have to either:
What I've noticed so far is that:
It's interesting, and somewhat accurate but with caveats, to consider options 1 and 2 to be deserialization. I generally think of deserialization as a fallible operation that checks a bunch of invariants and unpacks a structure with lots of hopping around. Zero-copy deserialization does this but without allocating memory. What I'm talking about here is basically:

```rust
struct Message<'data> { a: &'data str, b: &'data str }

// Current:
const BAKED_MESSAGE: &'static Message<'static> = &Message { a: "a", b: "b" };

// Proposed, option 1 with tuple slice:
const MESSAGE_STRS: &'static [(&'static str, &'static str)] = &[("a", "b")];
const fn get_message(i: usize) -> Message<'static> {
    let (a, b) = MESSAGE_STRS[i];
    Message { a, b }
}

// Proposed, option 1 with VZV:
// not sure if this code works; could also be written as a byte string literal
const MESSAGE_STRS: &'static VarZeroSlice<VarTuple2ULE<str, str>> = var_zero_slice![("a", "b")];
fn get_message(i: usize) -> Message<'static> {
    let tpl = MESSAGE_STRS.get(i).unwrap();
    Message { a: tpl.get_0(), b: tpl.get_1() }
}
```

Do we consider …
(To be clear, my note about "avoid deserialization" was not a disagreement)
Yep, though even with fully trusted data a thing that zero-copy deserialization has to do (that baked typically does not) is figuring out where different pieces of data start and end, and that's the real benefit of baked over an unsafe deserialization framework.
I think `get_message` is doing the same kind of work that deserialization does, regardless of what we call it. I think it's fine for us to do this judiciously in baked data. Having a mode for the baked provider to bake a data blob as a single vec rather than as multiple pieces of data would be quite neat.
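To make the "figuring out where pieces start and end" point concrete, here is a std-only sketch of option 1 with a single byte blob plus an offsets table (no zerovec; the names and the three-field offset layout are mine, purely for illustration):

```rust
// All string data lives in one static blob; per-message (start, mid, end)
// offsets say where field `a` and field `b` begin and end.
static BLOB: &[u8] = b"helloworldfoobar";
static OFFSETS: &[(usize, usize, usize)] = &[(0, 5, 10), (10, 13, 16)];

struct Message<'data> {
    a: &'data str,
    b: &'data str,
}

// The runtime "deserialization": slice the blob at precomputed offsets.
// With trusted baked data the utf-8 checks could in principle be skipped;
// here we keep them and unwrap, since the offsets are known-good.
fn get_message(i: usize) -> Message<'static> {
    let (start, mid, end) = OFFSETS[i];
    Message {
        a: core::str::from_utf8(&BLOB[start..mid]).unwrap(),
        b: core::str::from_utf8(&BLOB[mid..end]).unwrap(),
    }
}

fn main() {
    let m0 = get_message(0);
    assert_eq!(m0.a, "hello");
    assert_eq!(m0.b, "world");
    let m1 = get_message(1);
    assert_eq!(m1.a, "foo");
    assert_eq!(m1.b, "bar");
}
```

Note that only the offsets table and the blob end up in the binary; the `Message` struct stack exists only transiently at load time, which is exactly the file-size win being discussed.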
Or, the context can always just generate unique names, like a JS minifier.
I got more like 5-10% on shortening the imports in #5230 (comment) (measured in a slightly different context)
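The minifier-style alternative mentioned above can be sketched with a toy path interner (std-only; `PathInterner` and its methods are invented names, not part of databake):

```rust
use std::collections::HashMap;

// Toy stand-in for the baked codegen context: deduplicates full paths into
// short, unique aliases, the way a JS minifier shortens identifiers.
#[derive(Default)]
struct PathInterner {
    aliases: HashMap<String, String>,
}

impl PathInterner {
    // Returns the alias for `path`, minting a fresh one on first use.
    fn alias(&mut self, path: &str) -> String {
        let next = format!("A{}", self.aliases.len());
        self.aliases.entry(path.to_string()).or_insert(next).clone()
    }

    // The `use #path as #alias;` items to emit once at the top of the file.
    fn imports(&self) -> Vec<String> {
        let mut v: Vec<String> = self
            .aliases
            .iter()
            .map(|(path, alias)| format!("use {path} as {alias};"))
            .collect();
        v.sort();
        v
    }
}

fn main() {
    let mut ctx = PathInterner::default();
    let a = ctx.alias("icu::experimental::dimension::provider::units::UnitsDisplayNameV1");
    let b = ctx.alias("icu::experimental::relativetime::provider::PluralPatterns");
    // Repeated paths reuse the same alias instead of re-emitting the full path.
    let a2 = ctx.alias("icu::experimental::dimension::provider::units::UnitsDisplayNameV1");
    assert_eq!(a, a2);
    assert_ne!(a, b);
    assert_eq!(ctx.imports().len(), 2);
}
```

Because the aliases are generated, readability of the baked source suffers; that trade-off is what the next comments discuss.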
Oh, good point. I'd like it to be readable, though, so perhaps doing stuff like …
Sounds good. I didn't look too closely at the numbers; the …
See additional discussion in #3260 (comment)
Follow-up tasks from #6133:
Both of these are best done after we migrate clients away from …
Is that not just `encode_var_ule_as_box`? I don't see anything wrong with …
That's a compelling argument for not using `EncodeAsVarULE` here and instead using a different trait, though (in opposition to my previous comment): it's nice to be able to customize what it encodes to. So I retract my objection to `MaybeEncodeAsVar`, though I'd still like to keep `ZeroFrom` around. (I would still prefer the trait to be named so that it is clear that it is about this optimization, but that's not so important to figure out now.)
`icu_datetime` compile times have regressed a lot since I added neo datetime data, and #5221 appears to be choking in CI. The finely sliced data markers (small data structs designed to work with many data marker attributes) give compelling data sizes and stack sizes in Postcard (#4818, #4779). However, in Baked, they significantly increase file size, and the numbers for data size are also not as compelling because baked data includes a lot more pointers (for example, at least 24 bytes for a ZeroVec) which are duplicated for each and every instance of the data struct.
Example data struct that is used in a finely sliced data marker: …
Some ideas:
1. Rather than `PackedSkeletonDataV1<'static>`, we could instead store many static instances of `(SkeletonDataIndex, &[u8])`, and build an instance of `PackedSkeletonDataV1<'static>` at runtime. This is "free", and it should significantly reduce file size, but it causes us to use a Yoke code path.
2. Store the data in a `VarZeroVec<PackedSkeletonDataV1ULE>`, and build an instance of `PackedSkeletonDataV1<'static>` at runtime. This should result in the smallest file size and data size, in line with postcard sizes, but is a bit more of a runtime cost since we need to do a VZV lookup. However, it's only one lookup and only when the locale was found, so I don't think we should try to avoid this cost for the sake of avoiding this cost.
3. Add a const constructor `pub fn PackedSkeletonDataV1::new_unchecked(SkeletonDataIndex, &[u8])`, reducing file size and therefore probably compile times without changing any runtime characteristics. See DataBake: split serialized form from runtime form #2452.

@robertbastian @Manishearth @younies