Allow alternate binaries download urls generation and convert GoDistribution and LLVM subsystems to use it #5780
Conversation
CI is failing because Travis doesn't have […]
So I was initially thinking we'd do this by leveraging […]. However this means that some binary tools use […]. The underlying design flaw, I think, is that […]
```diff
 downloaded_successfully = False
 accumulated_errors = []
-for baseurl in OrderedSet(self._baseurls):  # De-dup URLS: we only want to try each URL once.
-  url = posixpath.join(baseurl, binary_path)
+for url in urls:  # De-dup URLS: we only want to try each URL once.
```
The comment references the fact that we put the urls into a set. You've removed that for whatever reason (why?) so the comment is no longer relevant.
Wow, that's a good catch. I was thinking that we might want to allow intentional retrying of urls, but realized after your comment that we should be doing that in the Fetcher, if anywhere, so I added the `OrderedSet()` call in b29bca3. Thanks!
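The de-dup the comment refers to can be illustrated with a small stand-alone sketch. This is a plain-Python stand-in, not the pants code: pants uses its own `OrderedSet` class, while `dict.fromkeys` gives the same first-seen ordering in Python 3.7+.

```python
def dedup_urls(urls):
    # Keep only the first occurrence of each URL, preserving order,
    # so that each URL is tried at most once during fetching.
    return list(dict.fromkeys(urls))

urls = [
    "https://mirror-a/bin/go.tar.gz",
    "https://mirror-b/bin/go.tar.gz",
    "https://mirror-a/bin/go.tar.gz",  # duplicate: dropped
]
print(dedup_urls(urls))
```

Retrying a URL that failed transiently would then be the Fetcher's job, as the comment suggests, rather than something achieved by listing the URL twice.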
@benjyw there are 3 or 4 comments describing a few more test cases I should add, but having added docstrings, this is in a reviewable state.

btw, splitting up BinaryUtil into fetching and url generation was a fantastic idea, imo. I'm totally fine with iterating on it more if the […]
This came out really nicely! Just one comment that I think you can satisfy by renaming some things.
```diff
          "(sysname, id) -> (os, arch), e.g. {('darwin', '15'): ('mac', '10.11'), "
          "('linux', 'arm32'): ('linux', 'arm32')}."))
 # BinaryTool options.
+register('--force-baseurls', type=bool, default=False, advanced=True,
```
I don't love this. It's weird to provide a really neat API, and then have this out-of-band way of overriding it, that requires an elaborate comment in the interface, and that will be completely non-obvious to the reader/debugger of this code.
I don't immediately have a better idea. But at the very least, let's name this something more obvious, like "--allow-external-binary-tool-downloads" (with a default of True, of course).
And then you can change the name of `url_generator()` to `get_external_url_generator()`, to clarify that it is affected by this option, and that if it returns None then "internal" (i.e., pants-hosted, or org-hosted) urls are used.
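The dispatch benjyw describes could look roughly like the following sketch. The function and class names here are illustrative assumptions, not the actual pants API; the point is only the fallback behavior: a tool-specific external generator is consulted first (when allowed), otherwise "internal" urls are built from the base urls.

```python
import posixpath


class FixedUrlGenerator(object):
    """Hypothetical stand-in for a tool-specific external url generator."""

    def __init__(self, urls):
        self._urls = urls

    def generate_urls(self, binary_path):
        return list(self._urls)


def urls_to_try(baseurls, binary_path, external_generator=None,
                allow_external_downloads=True):
    # Prefer the tool's own generator, unless external downloads are
    # disabled (the --allow-external-binary-tool-downloads=False case).
    if external_generator is not None and allow_external_downloads:
        return external_generator.generate_urls(binary_path)
    # "Internal" fallback: join the standard path onto each base url.
    return [posixpath.join(b, binary_path) for b in baseurls]
```

With this shape, a `get_external_url_generator()` returning None simply means the first branch is never taken.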
Yeah, the only reason for this was for migration purposes, which isn't great. I would much prefer / tried adding an option to do this for individual `BinaryTool`s, but then you would just have to override every one of them (if running pants on an internal network). As a pants.ini option, I might not use it, but as a command-line option I can see it being useful. I agree on not liking it and there is probably a better way.
I'll implement the option name change and the method name change you describe.
c8bb85d has the changes. I think I got all the references to the old option, but we will see what CI has to say.
I did some work on deepening the testing for […]
```
rest of their code can handle the result of downloading from different urls if url_generator()
is overridden yet again by a subclass, or if it is ignored with --force-baseurls.
If the bootstrap option --allow-external-binary-tool-downloads is False, the result of this
method will be ignored. In general, implementors of BinaryTool subclasses should try to ensure
```
I'm not sure what this means? In what way would a BinaryTool subclass be affected by the download location, and how would an author mitigate that in code?
Nit: Also, I think the word is spelled "implementer".
Yeah, I was thinking about how to make that more clear here. I was thinking specifically about what is done in the `LLVM` subsystem by overriding the `select()` method (and was going to explicitly call out that example here, but stopped short of doing that). Should I add in a comment describing that a bit (basically: we check if there's a single top-level directory and go down a level if so, or just return the result of `select()` if the extraction doesn't put everything into a single top-level directory inside our specified extraction directory), or take this clause out of the docstring? Let me know if that made sense.
The internet confirms that "implementer" is the correct spelling.
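The single-top-level-directory behavior described above can be sketched as follows. This is illustrative only, not the exact pants implementation of the `LLVM` subsystem's `select()` override; the function name is an assumption.

```python
import os


def descend_if_single_dir(extracted_dir):
    # If extraction produced exactly one top-level directory (as the
    # official LLVM release archives do), go down one level; otherwise
    # use the extraction directory as-is.
    entries = os.listdir(extracted_dir)
    if len(entries) == 1:
        single = os.path.join(extracted_dir, entries[0])
        if os.path.isdir(single):
            return single
    return extracted_dir
```

This is exactly the kind of difference between external and internal archive layouts that the docstring under discussion is warning implementers about.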
```diff
-         help="Force BinaryTool subclasses to download from urls generated from "
-              "--binaries-baseurls, even if the tool has a custom url generator. "
-              "This can be necessary if using Pants in an environment which cannot "
+register('--allow-external-binary-tool-downloads', type=bool, default=True, advanced=True,
```
Now the name and the help text mesh together nicely!
Totally agree!
Great! Thanks for the thoughtful changes.
Ah, I see. So the issue is that the internal and external artifacts may have different internal structure? How about saying something more along the lines of "Implementations of BinaryTool must be aware of differences (e.g., in archive structure) between the external and internal versions of the downloaded tool, if any."

That sounds perfect and I tried changing that paragraph to the exact text of your comment in 55e6fb3 -- the bit about being further overridden by a subclass in particular was highly extraneous, so this makes it a lot clearer. I think this is a clear description of the concern I wanted to state there.
… native backend subsystem tests (#5943)

### Problem

The native backend subsystems tests introduced in #5490 are still skipped, complaining about `crti.o` on linux, which is part of libc. See #5662 -- `clang`'s driver will find the directory containing that file on travis, but `gcc` won't. We should make a way to find this file (which is necessary for creating executables) so we can unskip the native backend testing.

### Solution

- Fix a mistake in #5780 -- we did not check the correct directory with `os.path.isdir()`, so we never found the `LLVM` BinaryTool when downloading it from the LLVM download page.
- Find libc using the new `LibcDev` subsystem. This uses the option `--libc-dir`, if provided, or finds an installation of libc with `crti.o` by invoking `--host-compiler` on the host once to get its search directories (this is necessary on travis, due to ubuntu's nonstandard installation location).
- Expand the definition of executables, compilers, and linkers in `src/python/pants/backend/native/config/environment.py` to cover everything needed to actually compile, and give them the ability to generate an environment suitable for passing into `subprocess.Popen()`.
- Introduce `GCCCCompiler`, `GCCCppCompiler`, `LLVMCCompiler`, and `LLVMCppCompiler` to differentiate between the two different compilers we have available for each language.
- Expose the libc lib directory to the compilers we create in `native_toolchain.py`.
- Unskip tests in `test_native_toolchain.py` from #5490.
- Make the default `CCompiler` and `CppCompiler` into clang, for no particular reason (it will pass CI with gcc as well).

The different compilers we can use will likely be denoted using variants in the future, but this solution allows us to separate the resources generated from each subsystem (`GCC`, `LLVM`, `Binutils`, `XCodeCLITools`) from a fully-functioning compiler that can be invoked (because actually invoking either clang or gcc requires some resources from the other, e.g. headers and libraries). Now, each binary tool only does the job of wrapping the resources it contains, while `native_toolchain.py` does the job of creating either a clang or a gcc compiler that we can invoke on the current host (with all necessary headers, libs, etc).

### Result

The native toolchain tests from #5490 can be unskipped, and we can invoke our toolchain on almost any linux installation without further setup. The tests in `test_native_toolchain.py` now pass in CI, and as we iterate on the native backend, we will continue to have end-to-end testing for both of our compilers, on osx as well, regardless of whichever one we choose to use for `python_dist()` compilation.
### Problem

Attempting to use the latest node LTS (10.15.3, the latest LTS released March 5th) fails to install in pants, as it seems the binary is not in pants' s3, as seen in the error returned by pants:

Fetch of https://binaries.pantsbuild.org/bin/node/mac/10.13/v10.15.3/node.tar.gz failed with status code 404

The issue of binaries in s3 has come up before in another context (the motivation there was related to bandwidth costs, not unavailable binaries), and a solution was implemented for go and llvm in #5780.

### Solution

This continues the work done in #5780, leveraging the `BinaryToolUrlGenerator` to allow node binaries to be fetched from nodejs.org (see the full list of binaries for the latest LTS release here).

### Result

Users can now use any supported node release from nodejs.org, verified locally (on a Mac using the latest LTS).
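A url generator for node in the spirit of the above might look like this sketch. The class name mirrors the `BinaryToolUrlGenerator` concept, but the method signature and the platform mapping here are assumptions (and abbreviated); the tarball naming follows nodejs.org's published `node-<version>-<platform>-<arch>.tar.gz` convention.

```python
class NodeUrlGenerator(object):
    _DIST_ROOT = 'https://nodejs.org/dist'

    # (os, arch) -> nodejs.org release suffix (abbreviated, illustrative).
    _PLATFORM_SUFFIX = {
        ('mac', 'x86_64'): 'darwin-x64',
        ('linux', 'x86_64'): 'linux-x64',
    }

    def generate_urls(self, version, host_platform):
        # Build the official download url for this version and platform,
        # e.g. https://nodejs.org/dist/v10.15.3/node-v10.15.3-darwin-x64.tar.gz
        suffix = self._PLATFORM_SUFFIX[host_platform]
        return ['{root}/{v}/node-{v}-{suffix}.tar.gz'.format(
            root=self._DIST_ROOT, v=version, suffix=suffix)]
```

Because the generator owns the whole url, the layout of nodejs.org's hosting no longer has to mirror the layout of pants' own s3 hierarchy.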
Problem

`BinaryTool` is a great recent development which makes using binaries downloaded lazily from a specified place much more declarative and much more extensible. However, it's still only able to download from either our S3 hosting, or a mirror. The previous structure requires the urls provided to the global option `--binaries-baseurls` to point to an exact mirror of the hierarchy we provide in our S3 hosting, but that can change at any time. It's not incredibly difficult to write a script to mirror our hosting into an internal network, but in general there's no reason the layout of binaries in `~/.cache/pants/bin/` needs to determine where those binaries are downloaded from.

Our bandwidth costs in S3 have recently increased due to the introduction of clang and gcc in #5490. See #5777 and #5779 for further context on S3 hosting. There are reliable binary downloads for some of these tools, which we would be remiss not to use if we can do it in a structured way.
Solution

- Add a `urls=` argument to multiple methods of `BinaryUtil` for `BinaryTool`s that don't download from our s3.
- Support `.tar.xz` archives by adding the `xz` BinaryTool (see add xz package binaries #66) and integrating it into BinaryTool's `archive_type` selection mechanism.
- Download `go` and `llvm` binaries from their official download urls.
- Rename the `Clang` subsystem to `LLVM`, as the binary download we use now (for ubuntu 16.04, currently) also contains many other LLVM tools, including e.g. `lld`.
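The `archive_type` selection that the new `.tar.xz` support plugs into can be illustrated with a simple extension mapping. The names and table below are assumptions for illustration, not the actual pants API.

```python
# Ordered so that compound extensions are checked before shorter ones.
ARCHIVE_TYPES = [
    ('.tar.gz', 'tgz'),
    ('.tar.xz', 'txz'),  # newly supported via the xz BinaryTool
    ('.tgz', 'tgz'),
    ('.zip', 'zip'),
]


def archive_type_for(filename):
    for ext, archive_type in ARCHIVE_TYPES:
        if filename.endswith(ext):
            return archive_type
    return None  # not an archive type we know how to extract
```

Mapping `.tar.xz` to an extraction strategy is what lets pants consume the official LLVM release tarballs directly, since those are only published as `.tar.xz`.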
Result

Urls for binary downloads can now be created in a structured way for external downloads, with the `--force-baseurls` option as an escape hatch. Some binaries now default to external urls provided for public use by the maintainers of the software, thanks to the introduction of the `xz` binary tool. Two out of the three largest bandwidth users among our provided binaries have been switched to use the download urls provided by the maintainers of each project (LLVM and Go). gcc still needs to be fixed, which will happen in a separate PR.