Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

argon2: Add parallel lane processing #149

Merged
merged 5 commits into from
Apr 18, 2021

Conversation

aumetra
Copy link
Contributor

@aumetra aumetra commented Mar 26, 2021

Adds parallel processing for lanes
Closes #103

The parallelism is gated behind the feature parallel

When the feature is activated, the #![forbid(unsafe_code)] gets downgraded to #![deny(unsafe_code)] due to unsafe usage (here)
The #![no_std] flag is being disabled as well

@tarcieri
Copy link
Member

Thanks for implementing this.

I think it might make sense to use rayon to manage the thread pool, similar to what we have in the pbkdf2 crate:

/// Generic implementation of PBKDF2 algorithm.
#[cfg(feature = "parallel")]
#[inline]
pub fn pbkdf2<F>(password: &[u8], salt: &[u8], rounds: u32, res: &mut [u8])
where
F: Mac + NewMac + Clone + Sync,
{
let n = F::OutputSize::to_usize();
let prf = F::new_varkey(password).expect("HMAC accepts all key sizes");
res.par_chunks_mut(n).enumerate().for_each(|(i, chunk)| {
pbkdf2_body(i as u32, chunk, &prf, salt, rounds);
});
}

argon2/src/instance.rs Outdated Show resolved Hide resolved
argon2/src/lib.rs Outdated Show resolved Hide resolved
argon2/src/lib.rs Outdated Show resolved Hide resolved
Copy link
Member

@tarcieri tarcieri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, looks good to me.

I'll give @newpavlov a few days to comment in case he can think of a better solution to the mutable aliasing problem.

@tarcieri tarcieri requested a review from newpavlov March 27, 2021 19:07
Copy link

@nikomatsakis nikomatsakis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got intrigued by the twitter thread, as @Bascule hoped, but I don't quite understand what this code is trying to do. :( More pointers would be helpful! I do see some problems with it as written, though.

argon2/src/instance.rs Outdated Show resolved Hide resolved
argon2/src/instance.rs Show resolved Hide resolved
@tarcieri
Copy link
Member

tarcieri commented Mar 28, 2021

I can try to write a short synopsis of how the Argon2 paper describes parallel operation of the algorithm (mostly summarizing sections 3.2 and 3.3, see also section 6.2 Implementing parallelism):

https://www.password-hashing.net/argon2-specs.pdf

The algorithm operates over a matrix of "blocks" consisting of:

  • 𝒑 rows ("lanes"), where 𝒑 is the desired number of worker threads
  • 𝒒 columns

Each lane is further subdivided into 𝑺 = 4 "slices" (in Argon2 terminology, and referred to as SYNC_POINTS in the code), and the intersection of a slice and a lane forms a segment of length 𝒒 / 𝑺.

Segments of the same slice are computed in parallel and therefore cannot reference each other. So from a memory model perspective, what we'd really like is to mutably borrow the values of a particular slice, partition them into segments, and give each worker thread access to a particular segment.

However, we also need to allow all of the worker threads to simultaneously borrow all of the other blocks which do not belong to the "slice" being operated on to reference as inputs.

This is the tricky part: the fill_segment operation can reference blocks from the current lane, or other lanes, but will not reference blocks from the same "slice" being operated on.

Having just written all of that down (thanks for rubber ducking if nothing else), I think I have a better idea of how to model this problem safely in Rust: "slices" (in the Argon2 sense) should be the core level of granularity in which the working "memory" is organized.

The main loop of the algorithm iterates over the slices. Provided I'm actually understanding this correctly, we can borrow one slice mutably at the time and the others immutably. The mutably borrowed slice can then be subdivided into a segment for each lane, given to the worker threads along with immutable references to all of the other slices.

I think a big part of what's making this so tricky right now is the memory consists of a contiguous Vec<Block>. I think that might still be fine for the backing storage, but perhaps we could mediate access to segments through another type that splits the borrows by Argon2 "slice", allowing one "slice" to be acted on mutably and the others referenced as inputs.

@tarcieri
Copy link
Member

I think a next step which might be helpful in general is to extract a struct Memory which borrows from a backing [Block] slice, pass that to Instance::new instead of the raw &'a mut [Block], and provide a method on Memory for accessing blocks by their Position.

From there we can look at borrow splitting the backing buffer so Memory only holds 3 of the 4 "slices" at a time, and can be used to look up the "reference" blocks being used to fill a segment, but it would not have access to the slice being operated over (which would be borrowed mutably and split up into segments among the worker threads).

Copy link
Member

@tarcieri tarcieri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also based on @nikomatsakis's comments I'm going to unapprove this for now

@aumetra aumetra marked this pull request as draft March 29, 2021 13:38
@aumetra
Copy link
Contributor Author

aumetra commented Apr 18, 2021

This should fix the UB as every thread now dereferences the pointer itself

@tarcieri
Copy link
Member

@smallglitch nice! I think that's a start.

Do you want to mark the PR as ready for review?

@aumetra aumetra marked this pull request as ready for review April 18, 2021 18:31
@aumetra
Copy link
Contributor Author

aumetra commented Apr 18, 2021

Sure, I wasn't sure if you'd be ok with an unsafe implementation
If an unsafe implementation is ok, I think this should be fine

Copy link
Member

@tarcieri tarcieri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using unsafe is fine for now. I can circle back on a safe implementation.

Thanks for extracting a Memory type.

@tarcieri tarcieri merged commit dd8d13b into RustCrypto:master Apr 18, 2021
@tarcieri
Copy link
Member

I opened #154 to track making the implementation safe

@tarcieri tarcieri mentioned this pull request Apr 18, 2021
@tarcieri tarcieri mentioned this pull request Oct 2, 2021
dns2utf8 pushed a commit to dns2utf8/password-hashes that referenced this pull request Jan 24, 2023
…k-dev-macro

digest: fix benchmark dev macro
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

argon2: parallel implementation
3 participants