-
-
Notifications
You must be signed in to change notification settings - Fork 760
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP Add file authentication #1688
Conversation
Current coverage is 84.58% (diff: 95.45%)@@ master #1688 diff @@
==========================================
Files 20 21 +1
Lines 6599 6706 +107
Methods 0 0
Messages 0 0
Branches 1121 1140 +19
==========================================
+ Hits 5587 5672 +85
- Misses 740 759 +19
- Partials 272 275 +3
|
src/borg/crypto.pyx
Outdated
|
||
def __exit__(self, exc_type, exc_val, exc_tb): | ||
# Sign length of file as well to avoid problems if only a prefix is read. | ||
self.seek(0, io.SEEK_END) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
check for exception here? guess you only want to sign if there was no exception that triggered leaving the scope.
src/borg/crypto.pyx
Outdated
def signature(self): | ||
""" | ||
Return final signature bytestring; further read()s or write()s are illegal. | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
raise NotImplemented ?
src/borg/key.py
Outdated
# Changing the name however imbues a change of context that is not permissible. | ||
filename = os.path.basename(self.path) | ||
self.signer.update(b'%d' % len(filename)) | ||
self.signer.update(filename.encode()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well, we don't need it for borg filenames, but it does no harm either: .encode('utf-8')
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"utf-8" is the default value for encode.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, rly? I thought it was ascii.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC the default was adjustable on Python 2 (derived by locale, even?), but in 3 it's always utf-8. Sane defaults are nice :)
(btw we can clean those encode(utf8), decode(utf8) up some time)
src/borg/key.py
Outdated
with open(signature_path, 'r') as fd: | ||
signature = json.load(fd) | ||
# Provisions for agility now, implementation later, but make sure the on-disk joint is oiled. | ||
algorithm = signature['algorithm'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if the json does not have this key, you'll get a KeyError here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are caught below (OSError, ValueError, TypeError, LookupError); I can change the latter into KeyError (instead of LookupError - I thought indexing a list with a string would be IndexError, but it's actually TypeError), so this is more visible.
src/borg/testsuite/crypto.py
Outdated
with ss: | ||
assert ss.read() == data | ||
assert ss.signature() == unhexlify('9f22d4ccc869fc0e697bcd8780ec7e24efa8d5c862367be393204bca6f2f0e0b' | ||
'8992ff67d54e789f9e658bef65e0bcdf10afeaaec86a5c96a54f3a6cde083532') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use same key and data for read/write tests to show roundtrip?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure
src/borg/key.py
Outdated
|
||
|
||
class SignedFile(FileLikeWrapper): | ||
# generic enough that it can be used without a KeyBase |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
put it into helpers.py? I'ld expect some trouble with circular dependencies when put here.
about *.bsi: sounds like one can not trust it. :-D other ideas: .check, .sign(ature), .mac |
.cyber? (Or would that be .cyberprotect?) Edit: Oh God, there's actually something with that name around!
Cybering intensifies Anyhow. .signature should work ok - this stuff is nowhere near 8.3 compatible anyway. |
src/borg/crypto.pyx
Outdated
|
||
WARNING: Seeks should only be used to query the size of the file, not | ||
to skip data, because skipped data isn't read and not signed. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe one should note that reading and writing should be in same order (usually writing the whole file in a linear manner) - otherwise the signature won't match. Also, seeks to create holes are not allowed.
src/borg/testsuite/signature.py
Outdated
class TestSignedFile: | ||
@pytest.fixture | ||
def key(self): | ||
return bytes(1234) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any special reason to use 1234 zero bytes? Usually a hmac-sha512 key is 512bits (64 bytes).
src/borg/testsuite/signature.py
Outdated
class TestSignedFileParts: | ||
@pytest.fixture | ||
def key(self): | ||
return bytes(1234) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here
@@ -0,0 +1,84 @@ | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove empty line
src/borg/signature.py
Outdated
# Provisions for agility now, implementation later, but make sure the on-disk joint is oiled. | ||
algorithm = signature['algorithm'] | ||
if algorithm != signer.NAME: | ||
logger.info('Cannot verify signature for %s: Unknown algorithm %r', path, algorithm) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warning?
src/borg/testsuite/crypto.py
Outdated
data = unhexlify('12345678') | ||
fd = io.BytesIO(data) | ||
key = b'\x0b' * 20 | ||
ss = StreamSigner_HMAC_SHA512(key, fd, write=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
write=True?
I'm going to remove the keying part for now completely, because that requires changes to borg.key I don't want to do at all right here / right now. |
@@ -13,6 +13,8 @@ | |||
import msgpack | |||
|
|||
import logging | |||
|
|||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove empty lines?
@@ -1,3 +1,4 @@ | |||
#include <Python.h> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
somehow all that Py* stuff isn't pretty. can we avoid by moving this to cython?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moving it to Cython means exposing most C structures to Cython and staring at tons of generated C code to see that it actually does the right thing (like in #891). I wouldn't do it bundled together with a functional change.
isn't it sufficient to collect an hash and sign the results, that way the hash the user decided on would also be obtainable at zero cost |
I don't understand what you're trying to say... can you explain? What do you mean by the user deciding on a hash? How is it zero cost? |
Current state: Some minor issues to work out in the tests, seems to work well in real life. In some spots (esp. Repository.write_index) this made some duplication more visible, I'm considering to replace these with SaveFile maybe. (On the other hand the file emplacement done in write_index doesn't do much either). Same goes for Cache. Needs some additional tests to actually check correct handling of signatures from the other code (cache, repo). The HashIndex tests could now avoid writing to temporary files, which would be nice, because they're used in the self test. |
@enkore instead of making a signature for files, just make a supported crypto hash of the content, then sign the hash/collection of all hashes, that way obtaining that particular crypto hash is zero cost |
The latter part has always to happen, and takes most of the time: eg. hash a 200 MB index file @ 350 MB/s = 0.6 s CPU time. Making a public key signature (which we don't right now and probably won't for some time?) is comparatively little work; the same CPU does ~160 RSA 4096 bit signatures/s and around 10000 verifications/s. RSA is pretty much the slowest system anyone would consider (DSA is slower, but it's DSA), and others are an order of magnitude faster or so (Ed25519). Besides having to make fewer signatures I don't really see any advantage, but another increase in complexity. |
-> not locked -> index broken, replay needed -> tries to check self.lock -> but it's None - Repository wasn't locked in the first place
@@ -89,7 +90,8 @@ def __enter__(self): | |||
return self | |||
|
|||
def __exit__(self, exc_type, exc_val, exc_tb): | |||
self.close() | |||
self.fd.__exit__(exc_type, exc_val, exc_tb) | |||
#self.close() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
XXX I really have to dive into this and see what really is correct
|
||
def __exit__(self, exc_type, exc_val, exc_tb): | ||
from .. import platform | ||
self.fd.close() | ||
self.fd.__exit__(exc_type, exc_val, exc_tb) | ||
#self.fd.close() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
XXX I really have to dive into this and see what really is correct
- Preparation for borgbackup#1688 / borgbackup#1101 - Support hash indices >2 GB - Better error reporting
- Preparation for borgbackup#1688 / borgbackup#1101 - Support hash indices >2 GB - Better error reporting
- Preparation for borgbackup#1688 / borgbackup#1101 - Support hash indices >2 GB - Better error reporting
- Preparation for borgbackup#1688 / borgbackup#1101 - Support hash indices >2 GB - Better error reporting
This is based on much earlier work from October 2016 by me, but is overall simplified and changed terminology (from "signing" to hashing and integrity checking). See borgbackup#1688 for the full history.
It's time to close this one. |
Implement IntegrityCheckedFile This is based on much earlier work from October 2016 by me, but is overall simplified and changed terminology (from "signing" to hashing and integrity checking). See #1688 for the full history.
Fixes #1101
First the implementation only; I'll have another idea or two here to make it safer in the ways we'll be using it, especially in the hashindex (C) code. Then I'll integrate it in the various spots that need it:
Signing key generation for the cache (this will change the key to store the newly generated key; log message saying so)Gist:
Create an accompanying .bsi (_b_org _si_gnature, didn't want to use .sig due to PGP use). Contains signature algo + value in JSON.
When reading, signature state is continuously updated. When done, signatures are compared and appropiate errors are raised.
This has the advantage that the file doesn't have to be read twice and has no races that would be associated with reading twice.
The disadvantage is that invalid data may be processed (but I have an idea about that).[1]When writing, the same thing happens
Both the file name (but not the path) and the file length are encoded into the signature
Implementation is HMAC-SHA512 for now. Since most of these places we'll use it (indices, caches) will be rather large objects I didn't bother to add low-level OpenSSL wrappers for now.
[1] d46f3be solves this. By allowing us to sign parts of a file independently as we go (header and data in hashindex for example) this allows us to sanitize-by-signing parts of the file we need to work with later parts of the file.
If eg. the header were corrupted and says we got a gazillion entries, we can detect this now immediately, because the header can be signed independently and thus verified immediately.