Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Logging, shortcuts and other improvements #34

Merged
merged 4 commits into from
Mar 11, 2021

Conversation

lgoldstein
Copy link

  • Logging - IMO it is more intuitive to explicitly enable logging rather than do it via filtering of the logging configuration.
  • Shortcuts - Added encryptor shortcut to facilitate encrypting in-memory data
  • Added a few improvements to FileMetadata

@justinludwig
Copy link
Owner

Thanks for all your work on this. I don't understand the change to Encryptor.getLiteralBuffer(), however.

The encryption, compression, and literal buffers were all sized so that bouncy castle wouldn't split up the encryption, compression, or literal packets with additional partial-packet headers for small files; and that for large files, it wouldn't insert a ton of partial-packet headers (just one every 64k).

With this change, it looks like the literal packet will always be split up into a bunch of 512-byte partial packets for files of any size (unless the file name is really large, in which case it could be split into 1024-byte partial packets or larger).

So my questions are:

  1. Why vary the partial-packet size based on the size of the file name?
  2. Why not use the same sizing for encryption and compression partial-packets?
  3. If you're finding it useful to tune the partial-packet size, how about instead change the Encryptor.bestPacketSize() method to allow the max size to be configured? -- if the max size was say a settable property on the Encryptor, it could be set really small for use-cases where that would be optimal, or really big for use-cases where a really big partial-packet size would be better

@lgoldstein
Copy link
Author

Why vary the partial-packet size based on the size of the file name?
Why not use the same sizing for encryption and compression partial-packets?

The original code was doing

return new byte[bestPacketSize(meta.getLength())];

This is obviously wrong - the literal buffer size for a 1MB file named "X" is the same size as the one for a 1KB file named "X". This is due to the fact that the PGPLiteralDataGenerator encodes the name + format + length + last-modified-time - which means that the required buffer size depends on the file name size and not the file content size. In other words, the old code was extremely wasteful since it would allocate a huge buffer to encode just a few bytes.

I think this 100% aligned with the principle that you mentioned (and with which I agree wholeheartedly):

The encryption, compression, and literal buffers were all sized so that bouncy castle wouldn't split up the encryption, compression, or literal packets with additional partial-packet headers for small files; and that for large files, it wouldn't insert a ton of partial-packet headers (just one every 64k).

As far as

If you're finding it useful to tune the partial-packet size, how about instead change the Encryptor.bestPacketSize() method to allow the max size to be configured? -- if the max size was say a settable property on the Encryptor, it could be set really small for use-cases where that would be optimal, or really big for use-cases where a really big partial-packet size would be better

While it may provide extra optimization I don't think the cost in code complexity warrants it - at least not for the time being.

@justinludwig
Copy link
Owner

Ah -- yes, that's the first thing that the PGPLiteralDataGenerator writes to the buffer (format + name + last modified etc). But then the full plaintext of the file is also written through the buffer.

The bouncy-castle code is hard to follow, but it's trying to encapsulate the logic from here:

https://tools.ietf.org/html/rfc4880#section-5.9

It's straightforward until you get to the last bulleted item in section 5.9 (the literal data itself -- aka the plaintext). With the literal data, you either need to know its full exact length ahead of time, or you need to buffer the data, write a partial-packet header with the length of the partial data, write the partial data, buffer more data, write the next partial-packet header, write the partial data, and so on, until you get to the end of the data. The way this is output is described earlier in the spec, in section 4 -- with the partial-packet structure in particular described here:

https://tools.ietf.org/html/rfc4880#section-4.2.2.4

The (wrapped) BCPGOutputStream object that the PGPLiteralDataGenerator.open() method returns tries to encapsulate this partial-packet sizing mechanism (and also tries to encapsulate all the other variations of packet sizing from section 4, which is why it's so convoluted). It will hold onto the buffer we pass into PGPLiteralDataGenerator.open(), and use it to buffer and then write each partial chunk of plaintext as we push it through through the encryption pipeline. Not until the BCPGOutputStream's close() method is called, when it finally knows for sure the full length of the literal data, will it write the final length header and remaining buffered data.

So that's why, even though it initially doesn't look like it's doing anything, the buffer for the literal-data generator is actually providing the internal buffering (and corresponding partial-packet sizing) for the entire original plaintext content.

@lgoldstein
Copy link
Author

Good to know - I will revert the change and update this PR

@lgoldstein lgoldstein force-pushed the logging-and-shortcuts branch from 680eae2 to 50ce266 Compare March 10, 2021 19:29
@lgoldstein
Copy link
Author

Done...

@justinludwig justinludwig merged commit 50ce266 into justinludwig:master Mar 11, 2021
@lgoldstein lgoldstein deleted the logging-and-shortcuts branch March 11, 2021 17:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants