Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Breaking: sort std/crypto functions into categories #6095

Merged
merged 7 commits into from
Aug 20, 2020

Conversation

jedisct1
Copy link
Contributor

Instead of having all primitives and constructions share the same namespace, they are now organized by category and function family.

Types within the same category are expected to share the exact same API.

The AEAD API needs to expose encryption/decryption functions with an attached tag, and we may want to define a common Error Set, but that will be part of a dedicated PR.

@kubkon kubkon added standard library This issue involves writing Zig code for the standard library. breaking Implementing this issue could cause existing code to no longer compile or have different behavior. labels Aug 19, 2020
Copy link
Member

@andrewrk andrewrk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the patch. I have a couple of pieces of guidance here.

.data = [_]u32{0} ** (State.BLOCKBYTES / 4),
},
.buf_off = 0,
pub fn reset(self: *Self) void {
Copy link
Member

@andrewrk andrewrk Aug 20, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The previous way of doing this is more idiomatic. reset in zig is understood to be equivalent to the following pattern:

var x = init();
// reset is just deinit() and then reassign it
x.deinit();
x.* = init();

If a reset function would just be this, better to not define it. Similarly, if deinit would just do self.* = undefined then it's understood that deinit is simply x = undefined and should not be implemented.

And finally, if init is simply returning a struct of comptime-known values, then it would be better to either make init simply a const rather than a function, or populate the fields with default values.

However if you are trying to design a consistent interface across many APIs, which it appears you are doing, then it's OK to have an init function that simply returns a comptime value.

I recognize that what I am saying contradicts a lot of existing standard library code. However I'm confident in the direction of what I am saying and I'd like to see contributions go towards this goal rather than away from it.

If it were up to me, I'd have all the crypto APIs omit init, deinit, and reset. That's the minimalistic beauty of things when you don't need any heap allocations or any syscalls, and you're doing pure data manipulation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Andrew!

reset() is identical to init() only for functions whose initial state is constant. All recent hash functions, including blake2/blake3/gimli that we already have in std, can accept a key, a context and other optional parameters. Being able to reset the state to what it was after these parameters have been processed can save a few cycles in that case.

That being said, besides a category of functions that we don't have yet (designed for hashing passwords), heap allocations are unlikely to ever be necessary for hash functions, so update() is not necessary; applications can simply copy a state if they need to reuse it.

init() functions are currently returning a struct with comptime-know values. But, taking the example of blake3, as an alternative to init(), we also currently have init_keyed() to specify a key and init_derive_key to specify a context. So we already have 3 init functions. And this is insufficient, since applications should also be able to optionally set the parallelism. And even though the current implementation is minimal, BLAKE2 is specified with more optional parameters than BLAKE3.

So, maybe the cleanest approach would be for a hash structure to have an initial state computed for some default parameters, making init*() functions optional, but also have an (unique) init(options: struct { ... }) function allowing the creation of the initial state according to a key, context, salt, parallelism level, etc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing I am worried about if we use default values (or if init is a constant) is the fact that it will prevent us from selecting an optimized implementation at runtime.

There has been discussions about having multiple variants of a function. Dynamic dispatch according to CPU features is also doable right now, but in both cases, how the state is encoded will depend on the implementation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dynamic dispatch according to CPU features

Related: #1018


Now that you've taken in the information I gave in the above comment, and you've been poking around the std lib crypto code, I trust your judgement, and so you can make the API choices that you feel are appropriate here. 👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Andrew!

Maybe keeping init as a function would be the safest option in order to reduce future API breakage, then.

reset() has been removed.

  • reset() is unnecessary; states that have to be reused can be copied
  • reset() is error-prone. Copying a previous state prevents forgetting struct members.
  • reset() forces implementation to store sensitive data (key, initial state) in memory even when they are not needed.
  • reset() is confusing as it has a different meaning elsewhere in Zig.

I'm going to make a review pass on init() functions to ensure that they include parameters that each primitive is designed to support and that we will eventually implement, and then this PR may then be good to go.

Copy link
Member

@andrewrk andrewrk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW I would prefer rebase against master than merges, if you don't mind. Otherwise I have to do it locally in a branch instead of slapping that big fat green button

Instead of having all primitives and constructions share the same namespace,
they are now organized by category and function family.

Types within the same category are expected to share the exact same API.
Some MACs have a 64-bit output
Justification:
- reset() is unnecessary; states that have to be reused can be copied
- reset() is error-prone. Copying a previous state prevents forgetting
  struct members.
- reset() forces implementation to store sensitive data (key, initial state)
  in memory even when they are not needed.
- reset() is confusing as it has a different meaning elsewhere in Zig.
Copy link
Member

@andrewrk andrewrk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, ready for merge?

- This avoids having multiple `init()` functions for every combination
of optional parameters
- The API is consistent across all hash functions
- New options can be added later without breaking existing applications.
  For example, this is going to come in handy if we implement parallelization
  for BLAKE2 and BLAKE3.
- We don't have a mix of snake_case and camelCase functions any more, at
least in the public crypto API

Support for BLAKE2 salt and personalization (more commonly called context)
parameters have been implemented by the way to illustrate this.
@jedisct1
Copy link
Contributor Author

Now ready to merge :)

@andrewrk
Copy link
Member

andrewrk commented Aug 20, 2020

Nice, fc55cd4 looks 👍

If you tell me that you ran the std lib tests locally (./zig build test-std) I'll merge without waiting for CI

@jedisct1
Copy link
Contributor Author

I ran the entire zig build test, which just completed fine :)

@andrewrk andrewrk merged commit 21106b9 into ziglang:master Aug 20, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking Implementing this issue could cause existing code to no longer compile or have different behavior. standard library This issue involves writing Zig code for the standard library.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants