Logging, shortcuts and other improvements #34

lgoldstein · 2021-03-06T08:00:15Z

Logging - IMO it is more intuitive to explicitly enable logging rather than do it via filtering of the logging configuration.
Shortcuts - Added encryptor shortcut to facilitate encrypting in-memory data
Added a few improvements to FileMetadata

justinludwig · 2021-03-08T22:18:00Z

Thanks for all your work on this. I don't understand the change to Encryptor.getLiteralBuffer(), however.

The encryption, compression, and literal buffers were all sized so that bouncy castle wouldn't split up the encryption, compression, or literal packets with additional partial-packet headers for small files; and that for large files, it wouldn't insert a ton of partial-packet headers (just one every 64k).

With this change, it looks like the literal packet will always be split up into a bunch of 512-byte partial packets for files of any size (unless the file name is really large, in which case it could be split into 1024-byte partial packets or larger).

So my questions are:

Why vary the partial-packet size based on the size of the file name?
Why not use the same sizing for encryption and compression partial-packets?
If you're finding it useful to tune the partial-packet size, how about instead change the Encryptor.bestPacketSize() method to allow the max size to be configured? -- if the max size was say a settable property on the Encryptor, it could be set really small for use-cases where that would be optimal, or really big for use-cases where a really big partial-packet size would be better

lgoldstein · 2021-03-09T18:25:15Z

Why vary the partial-packet size based on the size of the file name?
Why not use the same sizing for encryption and compression partial-packets?

The original code was doing

return new byte[bestPacketSize(meta.getLength())];

This is obviously wrong - the literal buffer size for a 1MB file named "X" is the same size as the one for a 1KB file named "X". This is due to the fact that the PGPLiteralDataGenerator encodes the name + format + length + last-modified-time - which means that the required buffer size depends on the file name size and not the file content size. In other words, the old code was extremely wasteful since it would allocate a huge buffer to encode just a few bytes.

I think this 100% aligned with the principle that you mentioned (and with which I agree wholeheartedly):

The encryption, compression, and literal buffers were all sized so that bouncy castle wouldn't split up the encryption, compression, or literal packets with additional partial-packet headers for small files; and that for large files, it wouldn't insert a ton of partial-packet headers (just one every 64k).

As far as

If you're finding it useful to tune the partial-packet size, how about instead change the Encryptor.bestPacketSize() method to allow the max size to be configured? -- if the max size was say a settable property on the Encryptor, it could be set really small for use-cases where that would be optimal, or really big for use-cases where a really big partial-packet size would be better

While it may provide extra optimization I don't think the cost in code complexity warrants it - at least not for the time being.

justinludwig · 2021-03-10T06:11:02Z

Ah -- yes, that's the first thing that the PGPLiteralDataGenerator writes to the buffer (format + name + last modified etc). But then the full plaintext of the file is also written through the buffer.

The bouncy-castle code is hard to follow, but it's trying to encapsulate the logic from here:

https://tools.ietf.org/html/rfc4880#section-5.9

It's straightforward until you get to the last bulleted item in section 5.9 (the literal data itself -- aka the plaintext). With the literal data, you either need to know its full exact length ahead of time, or you need to buffer the data, write a partial-packet header with the length of the partial data, write the partial data, buffer more data, write the next partial-packet header, write the partial data, and so on, until you get to the end of the data. The way this is output is described earlier in the spec, in section 4 -- with the partial-packet structure in particular described here:

https://tools.ietf.org/html/rfc4880#section-4.2.2.4

The (wrapped) BCPGOutputStream object that the PGPLiteralDataGenerator.open() method returns tries to encapsulate this partial-packet sizing mechanism (and also tries to encapsulate all the other variations of packet sizing from section 4, which is why it's so convoluted). It will hold onto the buffer we pass into PGPLiteralDataGenerator.open(), and use it to buffer and then write each partial chunk of plaintext as we push it through through the encryption pipeline. Not until the BCPGOutputStream's close() method is called, when it finally knows for sure the full length of the literal data, will it write the final length header and remaining buffered data.

So that's why, even though it initially doesn't look like it's doing anything, the buffer for the literal-data generator is actually providing the internal buffering (and corresponding partial-packet sizing) for the entire original plaintext content.

lgoldstein · 2021-03-10T19:25:45Z

Good to know - I will revert the change and update this PR

lgoldstein · 2021-03-10T19:29:37Z

Done...

Added encryptor/decryptor flag to control non-essential logging

40cce5b

Lyor Goldstein added 3 commits March 10, 2021 21:27

Added shortcut for encrypting bytes data array

d7c826c

Added FileMetadata#equals/hashCode implementation

f1a1070

Stricter visibility for FileMetadata related fields

50ce266

lgoldstein force-pushed the logging-and-shortcuts branch from 680eae2 to 50ce266 Compare March 10, 2021 19:29

justinludwig merged commit 50ce266 into justinludwig:master Mar 11, 2021

lgoldstein deleted the logging-and-shortcuts branch March 11, 2021 17:06

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Logging, shortcuts and other improvements #34

Logging, shortcuts and other improvements #34

lgoldstein commented Mar 6, 2021

justinludwig commented Mar 8, 2021

lgoldstein commented Mar 9, 2021

justinludwig commented Mar 10, 2021

lgoldstein commented Mar 10, 2021

lgoldstein commented Mar 10, 2021

Logging, shortcuts and other improvements #34

Logging, shortcuts and other improvements #34

Conversation

lgoldstein commented Mar 6, 2021

justinludwig commented Mar 8, 2021

lgoldstein commented Mar 9, 2021

justinludwig commented Mar 10, 2021

lgoldstein commented Mar 10, 2021

lgoldstein commented Mar 10, 2021