Use record IDs to index proof batches #1350
Conversation
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
##             main    #1350      +/-   ##
==========================================
- Coverage   93.51%   93.49%   -0.03%
==========================================
  Files         210      210
  Lines       35032    35211     +179
==========================================
+ Hits        32759    32919     +160
- Misses       2273     2292      +19

☔ View full report in Codecov by Sentry.
Draft, 5M: https://draft-mpc.vercel.app/query/view/sharp-mover2024-10-15T2018 (failed)
// $$ depth >= log_8 target_proof_size $$
// Because multiplication intermediate storage gets rounded up to blocks of 256, leaving
// some margin is advised.
pub const MAX_PROOF_RECURSION: usize = 9;
Recursion depth of 9 corresponds to TARGET_PROOF_SIZE of about 117M. Without quicksort batching, this is not quite enough. So either we can wait for quicksort batching, or we can temporarily increase MAX_PROOF_RECURSION slightly. I lean towards the former unless we identify an urgency to take this change.
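As a back-of-the-envelope check of that correspondence (assuming the recursion factor is 8, as the `log_8` in the code comment suggests):

$$ \log_8(1.17 \times 10^8) \approx 8.94 $$

so a target of roughly 117M sits right at the limit allowed by `MAX_PROOF_RECURSION = 9`, which is consistent with it being "not quite enough" once margin for the rounding-to-256 behavior is taken into account.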
Force-pushed 55baedb to 2eac4f2.
Force-pushed 2eac4f2 to b3bd7eb.
// * We need (SRF - 1) at the last level to have room for the masks.
let max_uv_values: usize =
    (SRF - 1) * SRF.pow(u32::try_from(MAX_PROOF_RECURSION - 2).unwrap());
tracing::info!("uv_values.len() = {}", uv_values.len());
trace?
I just wanted to see the values in the draft run; I am going to remove this.
.map(|buf| Fp61BitPrime::deserialize(buf.try_into().unwrap()))
.collect::<Result<Vec<_>, _>>()?
.try_into()
.unwrap())
Oh, this is neat, I didn't know you could go from `Result<Vec<_>, _>` to `Result<Box<_>, _>`. I guess it works because there is a blanket `From<Vec<T>>` impl for `Box<[T]>`.
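For reference, a minimal standalone sketch of the conversions under discussion (identifiers here are illustrative, not taken from the PR): `Vec<T>` converts into `Box<[T]>` via the standard blanket impl, and a `Vec<T>` can be fallibly converted into a fixed-size array with `TryInto`.

```rust
fn main() {
    // Vec<T> -> Box<[T]> via the std blanket impl `From<Vec<T>> for Box<[T]>`.
    let v: Vec<u64> = vec![1, 2, 3];
    let boxed: Box<[u64]> = v.into();
    assert_eq!(boxed.len(), 3);

    // Vec<T> -> [T; N] via `TryInto`; this fails (returning the Vec) if the length doesn't match.
    let v: Vec<u64> = vec![1, 2, 3];
    let arr: [u64; 3] = v.try_into().unwrap();
    assert_eq!(arr, [1, 2, 3]);
}
```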
ipa-core/src/protocol/mod.rs (Outdated)
pub struct RecordIdRange(Range<RecordId>);

impl RecordIdRange {
    pub const ALL: RecordIdRange = RecordIdRange(RecordId(0)..RecordId(u32::MAX));
Would it be better to add `LAST` to `RecordId` and use `FIRST..LAST` here?
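A minimal sketch of what that suggestion might look like (hypothetical: `RecordId` below is a stand-in for the real type, and the actual crate may already define `FIRST`):

```rust
use std::ops::Range;

// Stand-in for the real RecordId, which wraps a u32 in the PR's code.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub struct RecordId(u32);

impl RecordId {
    pub const FIRST: RecordId = RecordId(0);
    // Hypothetical addition suggested in the review: a named upper bound.
    pub const LAST: RecordId = RecordId(u32::MAX);
}

pub struct RecordIdRange(Range<RecordId>);

impl RecordIdRange {
    // `FIRST..LAST` instead of spelling out the raw constants inline.
    pub const ALL: RecordIdRange = RecordIdRange(RecordId::FIRST..RecordId::LAST);
}
```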
(0..length)
    .map(|i| async move { receive_channel_right.receive(RecordId::from(i)).await }),
)
.set_total_records(TotalRecords::Indeterminate)
Oh, this is much better than the parallel join; can't wait to see the impact of this change.
I don't expect this to be a big enough contributor for the difference to show up in the overall runtime, but I do think that getting rid of the parallel join more than offsets any cost of space overhead in the fixed-size structures.
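To illustrate the trade-off being discussed, here is a generic sketch (using the `futures` crate directly, not this crate's own join helpers): a one-shot join polls every future at once, while an ordered stream with bounded concurrency keeps only a fixed number in flight and still yields results in record order.

```rust
use futures::{future, stream, StreamExt, TryStreamExt};

async fn demo() -> Result<(), std::io::Error> {
    let work = |i: u32| async move { Ok::<u32, std::io::Error>(i * 2) };

    // "Parallel join": all 1000 futures are constructed and polled concurrently,
    // so peak memory grows with the total number of records.
    let all: Vec<u32> = future::try_join_all((0..1000).map(work)).await?;

    // Ordered stream with bounded concurrency: at most 16 futures in flight at a time,
    // and results still come back in the original record order.
    let bounded: Vec<u32> = stream::iter((0..1000).map(work))
        .buffered(16)
        .try_collect()
        .await?;

    assert_eq!(all, bounded);
    Ok(())
}
```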
pub async fn receive_from_right<C>(
    ctx: &C,
    record_id: RecordId,
    length: usize,
How often do we need to shrink?
Since many of our batches end up being smaller than the target, reasonably often... but since we do this only once per proof, it doesn't seem worth a lot of effort to optimize.
// Prover `P_i` and verifier `P_{i+1}` both compute q(x)
// therefore the "left" share computed by this verifier corresponds to that which
// was used by the prover to the left.
let (q_mask_from_left_prover, my_q_mask) = ctx.prss().generate_fields(record_counter);
record_counter += 1;
let (q_mask_from_left_prover, my_q_mask) =
`record_id_range.next().unwrap()` is dominantly used everywhere, it seems. I think we should implement a helper function to make it look less ugly at the callsite, and potentially not implement the iterator trait for it, since we don't have (unless I missed it) a `for x in range` use case.
Thoughts on what to call the helper? Maybe `expect_next()`?
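A hedged sketch of the helper being discussed (the name `expect_next` is the one floated above; the types are stand-ins so the example is self-contained):

```rust
use std::ops::Range;

// Stand-ins for the PR's RecordId / RecordIdRange, just to show the helper's shape.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct RecordId(u32);

pub struct RecordIdRange(Range<u32>);

impl RecordIdRange {
    /// Equivalent to `next().unwrap()` at the callsite, but shorter and with a
    /// clearer panic message if more record IDs are requested than were allocated.
    pub fn expect_next(&mut self) -> RecordId {
        self.0
            .next()
            .map(RecordId)
            .expect("RecordIdRange exhausted")
    }
}

fn main() {
    let mut prss_ids = RecordIdRange(0..4);
    let first = prss_ids.expect_next();
    assert_eq!(first, RecordId(0));
}
```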
Force-pushed b3bd7eb to 22a1c22.
Draft run, 25M: https://draft-mpc.vercel.app/query/view/flaky-stern2024-10-17T2136. 3 hr 16 min, slightly slower than the run in #1355, which was 3 hr 4 min. Not sure if the difference is real or just variation.
Seems like more than just variation to me, but we could try a few runs.
Yeah, I think it is real. From some smaller runs, the difference appears to be in the share conversion step. I suspect it has to do with the way that OrderingSender sequences communications (record n+1 is not accepted into the channel until after record n), somehow reducing the amount of validation work that can be done in parallel, but I haven't completely worked out how.
Since we are not guaranteed that all proofs have the same size, I serialize the proof values into fixed-size structures and use one record ID per proof, rather than doing record ID arithmetic, except for PRSS.
Fixes #1269
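A conceptual sketch of that scheme (all names and sizes below are illustrative, not the PR's actual types): each proof is serialized into a fixed-size buffer, the buffer travels under a single record ID, and the receiver deserializes the whole buffer back, so no per-element record ID arithmetic is needed.

```rust
// Illustrative only: a fixed-size wire format so that one proof maps to one record ID.
const MAX_PROOF_LEN: usize = 8; // stand-in for the real maximum proof length
const FIELD_SIZE: usize = 8;    // stand-in for the serialized size of one field element

/// Serialize up to MAX_PROOF_LEN u64 "field elements" into a fixed-size buffer,
/// padding unused slots with zero. The whole buffer is sent under one record ID.
fn serialize_proof(values: &[u64]) -> [u8; MAX_PROOF_LEN * FIELD_SIZE] {
    assert!(values.len() <= MAX_PROOF_LEN, "proof too long for fixed-size buffer");
    let mut buf = [0u8; MAX_PROOF_LEN * FIELD_SIZE];
    for (slot, value) in buf.chunks_exact_mut(FIELD_SIZE).zip(values) {
        slot.copy_from_slice(&value.to_le_bytes());
    }
    buf
}

fn deserialize_proof(buf: &[u8; MAX_PROOF_LEN * FIELD_SIZE]) -> Vec<u64> {
    buf.chunks_exact(FIELD_SIZE)
        .map(|chunk| u64::from_le_bytes(chunk.try_into().unwrap()))
        .collect()
}

fn main() {
    let proof = vec![3, 1, 4, 1, 5]; // a proof shorter than the fixed capacity
    let wire = serialize_proof(&proof);      // sent as a single record
    let received = deserialize_proof(&wire); // receiver gets the padded buffer back
    assert_eq!(&received[..proof.len()], &proof[..]);
}
```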