Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP Add file authentication #1688

Closed
wants to merge 9 commits into from
Closed

Conversation

enkore
Copy link
Contributor

@enkore enkore commented Oct 6, 2016

Fixes #1101

First the implementation only; I'll have another idea or two here to make it safer in the ways we'll be using it, especially in the hashindex (C) code. Then I'll integrate it in the various spots that need it:

  • Hashindex (this could be a bit awkward talking a lot to Python from C, but well, let's do it -- at least we get better error handling now than PRINTF_ERROR; RETURN ABORT)
  • Cache
    • chunks cache
    • archive caches
    • files cache
  • Repository (index, hints... config?)
  • Signing key generation for the cache (this will change the key to store the newly generated key; log message saying so)
  • Work out test issue (good news: no bug in the signing code)
  • optimize: SignedFile in read mode doesn't need to do anything if there's no signature attached
  • check/review/refactor nested context wrappers

Gist:

  • Create an accompanying .bsi (_b_org _si_gnature, didn't want to use .sig due to PGP use). Contains signature algo + value in JSON.

  • When reading, signature state is continuously updated. When done, signatures are compared and appropiate errors are raised.

    This has the advantage that the file doesn't have to be read twice and has no races that would be associated with reading twice.

    The disadvantage is that invalid data may be processed (but I have an idea about that). [1]

  • When writing, the same thing happens

  • Both the file name (but not the path) and the file length are encoded into the signature

Implementation is HMAC-SHA512 for now. Since most of these places we'll use it (indices, caches) will be rather large objects I didn't bother to add low-level OpenSSL wrappers for now.

[1] d46f3be solves this. By allowing us to sign parts of a file independently as we go (header and data in hashindex for example) this allows us to sanitize-by-signing parts of the file we need to work with later parts of the file.

If eg. the header were corrupted and says we got a gazillion entries, we can detect this now immediately, because the header can be signed independently and thus verified immediately.

@codecov-io
Copy link

codecov-io commented Oct 6, 2016

Current coverage is 84.58% (diff: 95.45%)

Merging #1688 into master will decrease coverage by 0.08%

@@             master      #1688   diff @@
==========================================
  Files            20         21     +1   
  Lines          6599       6706   +107   
  Methods           0          0          
  Messages          0          0          
  Branches       1121       1140    +19   
==========================================
+ Hits           5587       5672    +85   
- Misses          740        759    +19   
- Partials        272        275     +3   

Powered by Codecov. Last update 8567fcf...e61d8c1


def __exit__(self, exc_type, exc_val, exc_tb):
# Sign length of file as well to avoid problems if only a prefix is read.
self.seek(0, io.SEEK_END)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check for exception here? guess you only want to sign if there was no exception that triggered leaving the scope.

def signature(self):
"""
Return final signature bytestring; further read()s or write()s are illegal.
"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

raise NotImplemented ?

src/borg/key.py Outdated
# Changing the name however imbues a change of context that is not permissible.
filename = os.path.basename(self.path)
self.signer.update(b'%d' % len(filename))
self.signer.update(filename.encode())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well, we don't need it for borg filenames, but it does no harm either: .encode('utf-8')

Copy link
Contributor Author

@enkore enkore Oct 6, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"utf-8" is the default value for encode.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, rly? I thought it was ascii.

Copy link
Contributor Author

@enkore enkore Oct 6, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC the default was adjustable on Python 2 (derived by locale, even?), but in 3 it's always utf-8. Sane defaults are nice :)

(btw we can clean those encode(utf8), decode(utf8) up some time)

src/borg/key.py Outdated
with open(signature_path, 'r') as fd:
signature = json.load(fd)
# Provisions for agility now, implementation later, but make sure the on-disk joint is oiled.
algorithm = signature['algorithm']
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the json does not have this key, you'll get a KeyError here.

Copy link
Contributor Author

@enkore enkore Oct 6, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are caught below (OSError, ValueError, TypeError, LookupError); I can change the latter into KeyError (instead of LookupError - I thought indexing a list with a string would be IndexError, but it's actually TypeError), so this is more visible.

with ss:
assert ss.read() == data
assert ss.signature() == unhexlify('9f22d4ccc869fc0e697bcd8780ec7e24efa8d5c862367be393204bca6f2f0e0b'
'8992ff67d54e789f9e658bef65e0bcdf10afeaaec86a5c96a54f3a6cde083532')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use same key and data for read/write tests to show roundtrip?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure

src/borg/key.py Outdated


class SignedFile(FileLikeWrapper):
# generic enough that it can be used without a KeyBase
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

put it into helpers.py? I'ld expect some trouble with circular dependencies when put here.

@ThomasWaldmann
Copy link
Member

about *.bsi: sounds like one can not trust it. :-D

other ideas: .check, .sign(ature), .mac

@enkore
Copy link
Contributor Author

enkore commented Oct 6, 2016

.cyber? (Or would that be .cyberprotect?) Edit: Oh God, there's actually something with that name around!

Go safely into the digital corporate future with Allianz Cyber Protect Premium
...
You are vulnerable like never before to Cyber attacks, identity theft, Internet sabotage or data loss
...
"Hacker attacks become more directed, lasting and fatal", says Jens Krickhahn, AGCS expert for Cyberinsurance

Cybering intensifies


Anyhow.

.signature should work ok - this stuff is nowhere near 8.3 compatible anyway.


WARNING: Seeks should only be used to query the size of the file, not
to skip data, because skipped data isn't read and not signed.

Copy link
Member

@ThomasWaldmann ThomasWaldmann Oct 6, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe one should note that reading and writing should be in same order (usually writing the whole file in a linear manner) - otherwise the signature won't match. Also, seeks to create holes are not allowed.

class TestSignedFile:
@pytest.fixture
def key(self):
return bytes(1234)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any special reason to use 1234 zero bytes? Usually a hmac-sha512 key is 512bits (64 bytes).

class TestSignedFileParts:
@pytest.fixture
def key(self):
return bytes(1234)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here

@@ -0,0 +1,84 @@

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove empty line

# Provisions for agility now, implementation later, but make sure the on-disk joint is oiled.
algorithm = signature['algorithm']
if algorithm != signer.NAME:
logger.info('Cannot verify signature for %s: Unknown algorithm %r', path, algorithm)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning?

data = unhexlify('12345678')
fd = io.BytesIO(data)
key = b'\x0b' * 20
ss = StreamSigner_HMAC_SHA512(key, fd, write=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

write=True?

@enkore
Copy link
Contributor Author

enkore commented Oct 6, 2016

I'm going to remove the keying part for now completely, because that requires changes to borg.key I don't want to do at all right here / right now.

@@ -13,6 +13,8 @@
import msgpack

import logging


Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove empty lines?

@@ -1,3 +1,4 @@
#include <Python.h>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

somehow all that Py* stuff isn't pretty. can we avoid by moving this to cython?

Copy link
Contributor Author

@enkore enkore Oct 7, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moving it to Cython means exposing most C structures to Cython and staring at tons of generated C code to see that it actually does the right thing (like in #891). I wouldn't do it bundled together with a functional change.

@RonnyPfannschmidt
Copy link
Contributor

isn't it sufficient to collect an hash and sign the results, that way the hash the user decided on would also be obtainable at zero cost

@enkore
Copy link
Contributor Author

enkore commented Oct 7, 2016

I don't understand what you're trying to say... can you explain? What do you mean by the user deciding on a hash? How is it zero cost?

@enkore
Copy link
Contributor Author

enkore commented Oct 7, 2016

Current state: Some minor issues to work out in the tests, seems to work well in real life. In some spots (esp. Repository.write_index) this made some duplication more visible, I'm considering to replace these with SaveFile maybe. (On the other hand the file emplacement done in write_index doesn't do much either). Same goes for Cache.

Needs some additional tests to actually check correct handling of signatures from the other code (cache, repo).

The HashIndex tests could now avoid writing to temporary files, which would be nice, because they're used in the self test.

@RonnyPfannschmidt
Copy link
Contributor

@enkore instead of making a signature for files, just make a supported crypto hash of the content,

then sign the hash/collection of all hashes, that way obtaining that particular crypto hash is zero cost

@enkore
Copy link
Contributor Author

enkore commented Oct 7, 2016

@enkore instead of making a signature for files, just make a supported crypto hash of the content,

The latter part has always to happen, and takes most of the time: eg. hash a 200 MB index file @ 350 MB/s = 0.6 s CPU time. Making a public key signature (which we don't right now and probably won't for some time?) is comparatively little work; the same CPU does ~160 RSA 4096 bit signatures/s and around 10000 verifications/s. RSA is pretty much the slowest system anyone would consider (DSA is slower, but it's DSA), and others are an order of magnitude faster or so (Ed25519).

Besides having to make fewer signatures I don't really see any advantage, but another increase in complexity.

@enkore enkore changed the title (WIP) Add file signing ⚠️ WIP Add file signing Oct 9, 2016
@enkore enkore changed the title ⚠️ WIP Add file signing WIP Add file signing Oct 9, 2016
@enkore enkore self-assigned this Oct 10, 2016
@@ -89,7 +90,8 @@ def __enter__(self):
return self

def __exit__(self, exc_type, exc_val, exc_tb):
self.close()
self.fd.__exit__(exc_type, exc_val, exc_tb)
#self.close()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

XXX I really have to dive into this and see what really is correct


def __exit__(self, exc_type, exc_val, exc_tb):
from .. import platform
self.fd.close()
self.fd.__exit__(exc_type, exc_val, exc_tb)
#self.fd.close()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

XXX I really have to dive into this and see what really is correct

@ThomasWaldmann ThomasWaldmann added this to the 1.1.0rc1 milestone Nov 24, 2016
@textshell textshell changed the title WIP Add file signing WIP Add file authentication Nov 28, 2016
@ThomasWaldmann ThomasWaldmann modified the milestones: 1.1.0b4, 1.1.0rc1 Jan 12, 2017
@ThomasWaldmann ThomasWaldmann modified the milestones: 1.1.0b5, 1.1.0b4 Mar 4, 2017
@ThomasWaldmann ThomasWaldmann modified the milestones: 1.1.0b6, 1.1.0b5 Apr 24, 2017
enkore added a commit to enkore/borg that referenced this pull request May 9, 2017
- Preparation for borgbackup#1688 / borgbackup#1101
- Support hash indices >2 GB
- Better error reporting
@enkore enkore mentioned this pull request May 9, 2017
enkore added a commit to enkore/borg that referenced this pull request May 9, 2017
- Preparation for borgbackup#1688 / borgbackup#1101
- Support hash indices >2 GB
- Better error reporting
enkore added a commit to enkore/borg that referenced this pull request May 9, 2017
- Preparation for borgbackup#1688 / borgbackup#1101
- Support hash indices >2 GB
- Better error reporting
enkore added a commit to enkore/borg that referenced this pull request May 9, 2017
- Preparation for borgbackup#1688 / borgbackup#1101
- Support hash indices >2 GB
- Better error reporting
enkore added a commit that referenced this pull request May 9, 2017
- Preparation for #1688 / #1101
- Support hash indices >2 GB
- Better error reporting
enkore added a commit to enkore/borg that referenced this pull request May 11, 2017
This is based on much earlier work from October 2016 by me, but is
overall simplified and changed terminology (from "signing" to
hashing and integrity checking).

See borgbackup#1688 for the full history.
@enkore
Copy link
Contributor Author

enkore commented May 11, 2017

It's time to close this one.

@enkore enkore closed this May 11, 2017
enkore added a commit that referenced this pull request May 12, 2017
Implement IntegrityCheckedFile

This is based on much earlier work from October 2016 by me, but is
overall simplified and changed terminology (from "signing" to
hashing and integrity checking).

See #1688 for the full history.
@enkore enkore mentioned this pull request May 25, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants