Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use a simpler PAE #27

Closed
MarkLodato opened this issue Apr 16, 2021 · 11 comments
Closed

Use a simpler PAE #27

MarkLodato opened this issue Apr 16, 2021 · 11 comments

Comments

@MarkLodato
Copy link
Collaborator

MarkLodato commented Apr 16, 2021

In 0.1, we choose PASETO's PAE because it was already written. But it is does require doing some integer-to-bytes conversion and a need to be careful about sizes and endianness. Here is an example where a potential customer was scared off by the perceived complexity of PAE: sigstore/cosign#214 (comment)

For 1.0, perhaps we should choose a simpler function that is easier to describe and implement. Here is my proposal: concatenate the following, separated by a single space:

  • "DSSEv1" (or whatever we choose in Choose a more specific name than "signing-spec" #16)
  • byte length of UTF8-encoded type ID, encoded in ASCII decimal with no leading zeros
  • UTF8-encoded type ID
  • byte length of payload, encoded in ASCII decimal with no leading zeros
  • payload

Example: DSSEv1 31 https://in-toto.io/Statement/v1 434 {"subject": {...}, ...}

This is basically the same as the current PAE except it is simpler to implement. Note that this is a byte stream that happens to be start with unicode characters, and we never need to decode it.

Edit 1: reorder the fields to match the existing PAE.
Edit 2: clarify that it is a byte stream, and mention version 0.1.

@MarkLodato
Copy link
Collaborator Author

In fact, here is a bash implementation of the suggested simpler PAE:

local payloadTypeLength=$(LC_ALL=C; echo "${#payloadType}")
local payloadSize=$(stat -c %s "$payloadFile")
printf "%s" "DSSEv1 $payloadTypeLength $payloadType $payloadSize "
cat "$payloadFile"

Doing this with the current PAE would be more difficult.

@kommendorkapten
Copy link

Wouldn't it be beneficial to base64 encode the payload first?

@MarkLodato
Copy link
Collaborator Author

@kommendorkapten No, I don't think so. Base64 is a transport detail that should get stripped off before it gets to the PAE stage. For example, if you parse the JSON with the protobuf library, it does the base64-decoding for you. Plus the protocol is encoding-agnostic, and base64 isn't used at all for CBOR, Avro, Protobuf, or any other binary encoding. Heck, you might have an "ASCII Armor" encoding like PGP where you wrap at 72 columns, in which case you can't feed it directly into PAE anyway.

@kommendorkapten
Copy link

kommendorkapten commented Apr 27, 2021

Makes as PAE is used to canonicalize the data prior to signature generation.

Heck, you might have an "ASCII Armor" encoding like PGP where you wrap at 72 columns, in which case you can't feed it directly into PAE anyway.

This I didn't get. An ASCII Armor encoding is a regular string (with newlines), so why couldn't that be fed into PAE? The result of PAE is an octet stream that won't be decoded, only used to generate a signature if I got it right.

@MarkLodato
Copy link
Collaborator Author

There are two steps to signing:

  1. Protocol: Given a payload (bytes) and a payloadType (string), create a signature. PAE is used here to convert the 2-tuple into a byte stream. Base64 doesn't make sense here: payload is already bytes, so it serves no purpose to re-encode it.
  2. Encoding: Given payload (bytes), payloadType (string), and signature (bytes), serialize it into an envelope that the verifier can decode. This is currently JSON. Base64 is needed for payload and signature because those are bytes but JSON only supports strings, and base64 converts bytes to strings.

The protocol is agnostic to the encoding. If you decode a JSON envelope and re-encode as CBOR, the signature should stay valid.

Now consider the proposal where PAE uses a base64-encoded payload:

  • If the envelope is CBOR, you'd need to temporarily base64-encode just for PAE. That is unnecessary processing and room for error. Do you use standard or URL-safe encoding? Do you include newlines (and if so, wrap at what width)?
  • If the envelope is ASCII Armor standard base64 wrapped at 72 columns, you'd need to decode it and re-encode as the canonical URL-safe, no-newline base64 encoding. So here, having base64 in PAE does you no good.
  • If the envelope is JSON but uses standard base64 encoding, again you're going to have to decode and re-encode before PAE.

You could say, "You sign whatever encoding the sender used." OK, but that would break the notion of being able to re-encode the envelope, and what would it buy you? AFAICT, the only difference is that you do the PAE before base64-decoding instead of after. But why is that worth the added complexity?

@kommendorkapten
Copy link

I think we are talking about different things :)
I'm with you on not base64 encode prior to PAE, as that's how the current spec is written. As PAE in the current spec exactly what you wrote: convert the 2-tuple into a byte stream that is signed.

I looked at your example

Example: DSSEv1 31 https://in-toto.io/Statement/v1 434 {"subject": {...}, ...}

And I jumped the gun and thought that you meant for this PAE function to convert a 2-tuple to a string (not a byte stream).
So this proposal is still for PAE to create a byte stream, where the first few bytes just happens to be ASCII characters?

@MarkLodato
Copy link
Collaborator Author

Ah, so sorry for the confusion! 😄

Yes, the proposal is for it to still be a byte stream (to be fed to a cryptographic hash function), but the first few bytes just happen to be ASCII because it's easier to deal with ASCII. In particular, printf("%d", length) is easier than little_endian_64bit(length) in most languages.

@kommendorkapten
Copy link

Got it, thanks for clarifying 👍

This was referenced Jun 4, 2021
MarkLodato added a commit to MarkLodato/dsse that referenced this issue Jun 4, 2021
The previous PAE required binary encoding, which can be tricky in some
langauges. The new PAE is pure ASCII so it can be implemented very
easily in almost any language.
@wietse-gmail
Copy link

Just a minor nit: when you write "concatenate the following elements, separated by spaces" you probably mean "separated by a single space character".

Otherwise, there can be ambiguity, for example when a length is followed by multiple spaces.

@MarkLodato
Copy link
Collaborator Author

Fixed, though the real text is at #37. Please take a look at that to see if it looks good to you.

@MarkLodato
Copy link
Collaborator Author

Done!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants