make builds reproducible #399

Closed
anarcat opened this issue Nov 11, 2015 · 19 comments

@anarcat
Contributor

anarcat commented Nov 11, 2015

reproducible builds are interesting because they allow users to audit the binaries shipped by the borg collective to ensure that they match the source code that supposedly implements borg. for example, right now, building the binaries twice generates different results:

(borg)[1002]anarcat@angela:borg$ pyinstaller -F -n borg-linux-64 --hidden-import=logging.config borg/__main__.py
[...]
(borg)[1003]anarcat@angela:borg$ pyinstaller -F -n borg-linux-64-new --hidden-import=logging.config borg/__main__.py
[...]
(borg)[1004]anarcat@angela:borg$ diff -u dist/*
Binary files dist/borg-linux-64 and dist/borg-linux-64-new differ

same for tarballs and debian packages... see also https://wiki.debian.org/ReproducibleBuilds
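A quick way to check whether two build outputs are byte-for-byte identical, without relying on localized `diff` messages, is to compare their digests. This is only a minimal sketch: the helper names `sha256_of` and `same_build` are made up for illustration and are not part of borg or pyinstaller.

```python
import hashlib


def sha256_of(path):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_build(path_a, path_b):
    """Return True if both build artifacts have the same SHA-256 digest."""
    return sha256_of(path_a) == sha256_of(path_b)
```

For example, `same_build("dist/borg-linux-64", "dist/borg-linux-64-new")` would return `False` for the two builds above. Publishing such digests alongside a release is what lets others verify a rebuild.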

it seems that pyinstaller can build reproducibly, but i'm not sure how: see pyinstaller/pyinstaller#1590. i tried to inspect the difference between the two binaries using vbindiff but, well... it's a binary, so i can't make sense of it (there is nothing obvious like date strings that i can see). radiff isn't helpful either...

pip install is somewhat reproducible. it will generate the same borg "binary" (it's a python text file after all) on multiple installs. it remains to be seen how reproducible the borg debian package will be once it enters debian. the results will show up here: https://reproducible.debian.net/rb-pkg/unstable/amd64/borgbackup.html

@ThomasWaldmann
Member

you want to use pyinstaller -D ... for such experiments - then you do not need to dissect the file.

@anarcat
Contributor Author

anarcat commented Nov 12, 2015

interesting... a bit better, but still cryptic:

$ diff -rq *
Files borg-linux-64/base_library.zip and borg-linux-64-new/base_library.zip differ
Only in borg-linux-64: borg-linux-64
Only in borg-linux-64-new: borg-linux-64-new

The zip files differ only in metadata: once extracted, the contents are the same. The difference comes from the timestamps, of course, but I don't know if that's included directly in the binary as is. If so, then we'd need to fix up the mtime of the files because they are zipped, as per: https://wiki.debian.org/ReproducibleBuilds/TimestampsInZip

I guess this goes back to pyinstaller's yard. I'll look over there more for how to make it behave.
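To illustrate the zip timestamp problem, here is a minimal sketch of rewriting an archive so that every entry carries the same fixed mtime and entries appear in sorted order. The function name `normalize_zip` and the constant `FIXED_DATE` are hypothetical, and the sketch assumes an archive that contains only regular files; it is not how pyinstaller or strip-nondeterminism actually do it.

```python
import io
import zipfile

# 1980-01-01 is the earliest timestamp the zip format can store.
FIXED_DATE = (1980, 1, 1, 0, 0, 0)


def normalize_zip(data: bytes) -> bytes:
    """Rewrite a zip archive with sorted entries and one fixed mtime per entry."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(data)) as src:
        with zipfile.ZipFile(out, "w") as dst:
            for name in sorted(src.namelist()):
                # A fresh ZipInfo drops the original timestamp and attributes.
                info = zipfile.ZipInfo(name, date_time=FIXED_DATE)
                info.compress_type = zipfile.ZIP_DEFLATED
                dst.writestr(info, src.read(name))
    return out.getvalue()
```

Two archives with identical contents but different timestamps or entry order should then normalize to the same bytes, at least when the same zlib version does the compression.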

@anarcat
Contributor Author

anarcat commented Nov 12, 2015

Note that the strip-nondeterminism package fixes the problems with the zip file, but not the executables.

@anarcat
Contributor Author

anarcat commented Nov 12, 2015

so i filed this as #1668 upstream. but even with that out of the way, it's going to be difficult to tell people how exactly we built the binaries unless we use a very standard environment. i would suggest we build in a clean "Debian jessie" chroot with only the minimal dependencies. that way we can deduce, from the state of "jessie + security" at the date of the build, which exact versions of all the dependent packages were shipped with the build.

For OSX, i have no idea how we would proceed.

Phew, that is harder than I thought it would be. The .deb will be easy in comparison. ;)

@lfam
Contributor

lfam commented Nov 12, 2015

Hi,

I have been building Borg with Guix [1], which, along with Nix, is really the way to go if you want to start building reproducibly. The whole build environment is declared and controlled in a very transparent way. There are still sources of non-determinism in some software packages, but at least with Guix / Nix you have a chance to declare the environment and hunt down the non-determinism. I haven't yet tested if Borg builds reproducibly across systems, architectures, etc, but it is a start.

I have a Guix package definition for Borg 0.28 available on the master branch of this repo:
https://github.com/lfam/pkgs/blob/master/leo/packages/borg.scm

As further enticement, check out this Borg 0.28 dependency graph that I generated with the simple Guix command guix graph borg | dot -Tpng > borg-dep-graph.png:
http://imgur.com/DeKut40

It's really quite easy to get started with Guix on a Linux distro. It doesn't conflict with your system's package manager at all.

[1]
http://www.gnu.org/software/guix/

@anarcat
Contributor Author

anarcat commented Nov 12, 2015

@lfam i'd love to hear how we can make borg reproducible more easily in Nix/Guix as well! i have added Guix to the list of distro packages in #105, is that correct?

@lfam
Contributor

lfam commented Nov 12, 2015

@anarcat I haven't submitted Borg for inclusion in the official GNU Guix package repos, yet. Borg is in Nix, however.

But users can trivially use my package definition by cloning the repo to, for example, ~/pkgs, and then setting their environment while invoking Guix like this:

GUIX_PACKAGE_PATH=~/pkgs guix package -i borg

That will build (or substitute binaries) for Borg and all of its dependencies and install Borg into the user's environment.

I'm somewhat conservative about what I use for backups, and even more conservative about what I'm willing to suggest as backup software for an entire package system / OS.

I'd like to spend some more time exploring the limits of Borg. For example, questions like this about the limits of Python's stack recursion:
#380

And I remember recently, I think it was @ThomasWaldmann who called for testing of very large backups. Unfortunately, I don't have the hardware or resources to test that.

@anarcat
Contributor Author

anarcat commented Nov 12, 2015

@lfam let's follow up on the distro discussion in #105.

@anarcat
Contributor Author

anarcat commented Nov 12, 2015

@lfam as for large backups, it's considered tested up to the multi-terabyte range, see #216.

@ThomasWaldmann
Member

@anarcat building binaries on jessie is out of the question, as that would mean that every binary user would need the same glibc version as jessie or a newer one. So, e.g., wheezy users (and likely also users of some other non-Debian distributions) could not run the binary.

@anarcat
Contributor Author

anarcat commented Nov 12, 2015

interesting... make it wheezy then... :) can pyinstaller ship libc? that would resolve it...

@ThomasWaldmann
Member

No, the glibc must match the system.

@ThomasWaldmann
Member

related: pyinstaller/pyinstaller#1714

in short: export PYTHONHASHSEED=1
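The likely reason this matters is that string hashes are randomized per interpreter run unless `PYTHONHASHSEED` is pinned, and anything whose ordering depends on those hashes can then vary between builds. The sketch below only demonstrates the first half of that: with a fixed seed, a fresh interpreter computes the same string hash every time. The helper name `str_hash_in_fresh_interpreter` is made up for illustration.

```python
import os
import subprocess
import sys


def str_hash_in_fresh_interpreter(seed):
    """Start a new interpreter with PYTHONHASHSEED set and return hash('borg')."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", "print(hash('borg'))"],
        env=env,
        stdout=subprocess.PIPE,
        check=True,
        universal_newlines=True,
    )
    return int(result.stdout)
```

Calling it twice with seed `1` should give the same value, whereas running without a pinned seed normally gives a different value on each run.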

@ThomasWaldmann
Member

also, .pyc files have a timestamp inside (32bit at offset 4) so python can detect whether they are newer than the corresponding .py.
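A minimal sketch of reading that field, assuming the header layout used by Python 3.3 to 3.6: 4 bytes of magic, a 32-bit little-endian source mtime at offset 4, then a 32-bit source size. On Python 3.7 and later (PEP 552) offset 4 holds a flags field instead and the mtime moves to offset 8, so this parser does not apply there. The function name `legacy_pyc_header` is hypothetical.

```python
import struct


def legacy_pyc_header(data: bytes):
    """Split a Python 3.3-3.6 .pyc header into (magic, source_mtime, source_size)."""
    # "<4sII": 4 raw magic bytes, then two little-endian unsigned 32-bit ints.
    magic, mtime, size = struct.unpack("<4sII", data[:12])
    return magic, mtime, size
```

Two builds of the same source can therefore produce different .pyc files simply because the source files were checked out or generated at different times.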

@unode

unode commented Feb 12, 2016

Hi guys,

I've also gone down the road of building a binary out of the 1.0.0rc1 code, but for a system using an older libc. The released binaries fail there, complaining:

Error loading Python lib '/tmp/_MEIVTlqOR/libpython3.5m.so.1.0': /lib64/libc.so.6: version `GLIBC_2.7' not found (required by /tmp/_MEIVTlqOR/libpython3.5m.so.1.0)

However I'm having problems packaging it with pyinstaller. I always get errors referring to six or packaging not being in the final package.

Are there instructions on building a binary using pyinstaller somewhere? I didn't get far looking for workarounds...

@ThomasWaldmann
Member

Hmm, check what the python requirements for glibc are. If somehow python 3.5 wants a rather new one, you can also build the whole thing with 3.4.x.

We don't use six, so I have no idea why you have trouble there.

Some information is in the development section of our docs; the Vagrantfile might also be interesting.

@unode

unode commented Feb 15, 2016

@ThomasWaldmann thanks for the Vagrant info. Managed to compile it but python 3.5 was still an issue. Ended up using another system which is more up to date and works fine. Cheers.

@dannyedel
Contributor

Regarding reproducible (re-)builds: The Debian borgbackup binary package is 100% reproducible, so I guess it's just an issue of reproducing the exact set of dependencies used in the build, and the borg source code itself is fine. However, I cannot tell to what extent the Python packages in Debian are patched relative to upstream.

Maybe the diffoscope program can help you analyze the exact differences between the various builds; it's used to generate the HTML output on the Debian reproducible builds project.


The only remaining issue Debian has is reproducing the documentation (package borgbackup-doc), specifically api.html, reported as debian bug 816788. @anarcat pointed me to this discussion, so I'm copying this here, maybe it will help in tracking down the last remaining reproducibility issues.

In essence, the only problematic code pieces are initializers such as the one in archive.py

    def __init__(self, repository, key, manifest, name, cache=None, create=False,
                 checkpoint_interval=300, numeric_owner=False, progress=False,
                 chunker_params=CHUNKER_PARAMS,
                 start=datetime.utcnow(), end=datetime.utcnow()):

When compiling the documentation, the module will get imported. This will evaluate the expression datetime.utcnow() and then, when writing the html doc, it will output a serialisation of that object into html.
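The effect can be shown in isolation. In this minimal sketch, `eager` mimics the signature above and `lazy` shows one common way to avoid the problem with a `None` sentinel; both function names are made up here, and this is not necessarily the fix adopted in borg.

```python
import inspect
from datetime import datetime


def eager(start=datetime.utcnow()):
    """The default is computed once, when the def statement runs (import time)."""
    return start


def lazy(start=None):
    """The default is a constant; the timestamp is computed on each call."""
    if start is None:
        start = datetime.utcnow()
    return start


# A documentation tool that introspects signatures sees the evaluated object,
# so the rendered text changes on every build:
print(inspect.signature(eager))
# With the sentinel, the rendered signature is stable:
print(inspect.signature(lazy))
```

The first signature prints with a concrete `datetime.datetime(...)` value baked in, which is what ends up in api.html. The second always prints `(start=None)`.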

I'm not sure if we're calling sphinx wrong, and/or if there is a switch to make it embed the literal string =datetime.utcnow() into the html docs, rather than trying to evaluate and re-serialize. Currently the commands we use to build the cython modules, and then the docs, are essentially:

python3 setup.py build_ext --inplace
make -C docs html

If anyone has an idea how to fix this reproducibility issue, I'd be thankful for a pointer.


Edit: Look at diffoscope's analysis of borgbackup in debian/testing to see an example of the output, and to confirm the diff is only in the api.html of the borgbackup-doc package (the searchindex.js is dependent on api.html, so it can be ignored for now)

@ThomasWaldmann ThomasWaldmann modified the milestones: 1.1 - near future goals, 1.0.1 - fixes Mar 11, 2016
@ThomasWaldmann ThomasWaldmann self-assigned this Mar 11, 2016
@ThomasWaldmann
Member

#746 another one was needed for docs.
