Buildpack Dependency Management Improvements #8
Comments
+1 - I was just talking to @ForestEckhardt about how we can do some consolidation in pipelines. I'd definitely be interested in arriving at a singular way to handle dependencies across all the Paketo buildpacks. I'd also be happy to share our experiences using a more federated approach and pulling dependencies directly from upstream locations. There are some good parts and some challenges as well.
@dmikusa-pivotal I'd like to hear about your experience with the federated approach, for sure.
I have a couple of questions/comments:
Overall I am excited by the prospect of being able to eliminate some of the magic and some of the back-breaking work, and I think that it would make it easier for us to get new languages from community members. I think that it will also ultimately make our buildpacks make more sense, because we are downloading and installing the same dependency that your average developer is using. I think this will also make it easier for users that need to replace the dependency with one hosted on a mirror, because the artifact in question will not be any different from the publicly available one.
Some notes off the top of my head:
On the Java buildpacks, we have this information in buildpack.toml and update it when we update releases. The base information tends not to change, but we have to keep the versions in these fields all in sync. How is this being sourced with deps-server?
It is being generated when it is input into the system, so it is possible. I was just curious if this also meant we would be stripping down the data we are providing for each dependency. For the most part, many of our buildpacks use this to construct the old SBOM format, and it may not make sense to have it there in the long run if we are using Syft.
@ForestEckhardt, yes. We would still want this information. So, a solution would need to take this into account.
It is also used to generate the new SBOM format: https://github.com/paketo-buildpacks/packit/blob/2247967a3f873b178f6fb16c5e6411646ca0882a/sbom/sbom.go#L72-L97
I stand corrected.
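For anyone skimming the thread, a rough sketch of the flow being discussed: the per-dependency metadata kept in buildpack.toml ends up on a postal.Dependency value, and the linked packit code turns that into the newer SBOM formats. Package paths, field names, and constants below are quoted from memory of the packit v2 API and the layer path is purely illustrative, so treat the exact identifiers as assumptions rather than a definitive implementation.

```go
package main

import (
	"fmt"
	"os"

	"github.com/paketo-buildpacks/packit/v2/postal"
	"github.com/paketo-buildpacks/packit/v2/sbom"
)

func main() {
	// The buildpack.toml metadata fields (id, name, version, URI, checksum,
	// CPE, PURL, stacks, ...) are the same data the SBOM generation consumes.
	dep := postal.Dependency{
		ID:      "go",
		Name:    "Go",
		Version: "1.20.1",
		URI:     "https://go.dev/dl/go1.20.1.linux-amd64.tar.gz",
		SHA256:  "000caf...", // illustrative value, not a real checksum
		CPE:     "cpe:2.3:a:golang:go:1.20.1:*:*:*:*:*:*:*",
		PURL:    "pkg:generic/[email protected]",
		Stacks:  []string{"io.buildpacks.stacks.jammy"},
	}

	// Generate an SBOM describing the dependency as installed into a layer path.
	bom, err := sbom.GenerateFromDependency(dep, "/layers/go-dist/go")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}

	// Render it in the newer formats; in a real buildpack this would be
	// assigned to layer.SBOM so the lifecycle writes the documents out.
	formatter, err := bom.InFormats(sbom.CycloneDXFormat, sbom.SPDXFormat)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	_ = formatter
}
```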
@dmikusa-pivotal thanks for outlining those cases, it's super helpful. Out of the items you mentioned, number 6 around hashes changing is the most concerning to me. Whatever process we implement, I think it'll be really important to have a way to reconcile mismatched SHAs or detect changes. Numbers 7 and 8 around modifications to the dependency are also pretty complicated, but I think moving to the federated approach will be a big help with this, since we can potentially delegate those types of decisions to language family maintainers.
👍 One other thought that's been a hindrance for us. Github Actions are not well suited to checking for dependency updates. There is no trigger or event, even if the 3rd party is using Github to release code, so you end up having to poll for updates. Presently, we're polling daily, because if we do it more often we'll blow past the limits Github Actions puts on the execution of our jobs. In some cases, this means we have to manually trigger the job, like if we need to get an urgent update released. It's not a big deal and it's easy to do, but it's manual work.

Also, if you have a buildpack that has many dependencies then you run into an issue with how to organize them. The Liberty buildpack, for example, has quite a few dependencies that we monitor. We presently have them set up such that each dependency has its own workflow. The workflows are largely the same but just check for different resources. This has some advantages in that it's easy to have them all run in parallel, if one fails it doesn't impact others, and it's easy to trigger just a single resource if you need to force an update or re-run a failed update. It's not nice in that the parallelization makes us hit Github Action limits faster, there's lots of duplication across workflows, and it's extremely inefficient (Github Actions spins up a new VM for each workflow & job).

Personally, I'd like them to be more efficient. I've thought about how we could merge them all into a single workflow and job with multiple steps, but then you don't get the same parallelization and it's not easy to run/re-run a specific update. It's also not clear if that would help reduce duplication in the workflow; possibly, I haven't looked from that angle. I've also thought about moving this type of fetch outside of Github Actions, somewhere it can be done more efficiently, and then using hooks/API to trigger Github Actions or submitting a PR directly. That's a big step though and we haven't had time to investigate it further.
We don't have the luxury of using GitHub Actions internally, so we have built out a dependency update system in Concourse with a few custom Concourse Resources.

Re Point 5: There are a lot of dependencies that don't provide sha256sums but do provide others. If the sha256 exists, we download and use that; if it doesn't, we calculate a new one against the downloaded binary, but we also verify the downloaded binary against some of the other shas that are available. This gives us an extra bit of confidence that the binary is what we were expecting it to be.

Re Semver: It would be great if everything supported semver-compatible version numbers - but they don't. We've added quite a bit of logic around handling non-semver-compatible version numbers.
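A minimal sketch of that checksum idea, using only the Go standard library (the helper name and the choice of sha512 as the upstream-published digest are illustrative, not how the Concourse resources are actually written): verify the download against whatever digest the upstream does publish, then compute the sha256 that gets recorded for the dependency.

```go
package main

import (
	"crypto/sha256"
	"crypto/sha512"
	"encoding/hex"
	"fmt"
	"io"
	"net/http"
	"os"
)

// fetchAndChecksum downloads an upstream artifact, verifies it against an
// upstream-published sha512 (for upstreams that don't publish a sha256), and
// returns the sha256 we would record for the dependency. Hypothetical helper
// for illustration only.
func fetchAndChecksum(url, upstreamSHA512 string) (string, error) {
	resp, err := http.Get(url)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()

	s256 := sha256.New()
	s512 := sha512.New()

	// Hash the download once, feeding both digests.
	if _, err := io.Copy(io.MultiWriter(s256, s512), resp.Body); err != nil {
		return "", err
	}

	// Cross-check against the checksum the upstream does publish.
	if got := hex.EncodeToString(s512.Sum(nil)); got != upstreamSHA512 {
		return "", fmt.Errorf("sha512 mismatch: got %s, want %s", got, upstreamSHA512)
	}

	// Record our own sha256 for the dependency metadata.
	return hex.EncodeToString(s256.Sum(nil)), nil
}

func main() {
	sum, err := fetchAndChecksum(os.Args[1], os.Args[2])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("sha256:", sum)
}
```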
One possible additional goal that we could consider is making the dep server itself a system that others can re-use - either in part or as a whole. Although this would potentially increase the support burden on the dependencies team, I think it could also result in a significant value add for other buildpack authors outside of the paketo core team. I have three cases in mind where buildpack authors outside of the paketo team could benefit from a reusable (and federated) dep-server:
I also wanted to add this proposal: https://docs.google.com/document/d/1g5rRW-oE_v8Gdvz-CiCOK9z2rxg6L5XniKI25Zq2j6M/edit. I believe it to be complementary to what's been discussed already. You can take a look at the Google doc for now, but I hope to get this into an RFC format in the near future.
As the RFCs for dependency management have been approved and merged, and work is already underway to implement them, I will close this issue. |
Summary
Many of the Paketo buildpacks contain references to dependencies that they will install during their build phase. These dependencies are often language runtimes like Ruby MRI or package managers like Poetry. The dependencies are tracked and built from their upstream source (dep-server) and updated in buildpacks (jam update-dependencies and the dependency/update action) through a considerable amount of automation. This architecture has outlived its utility and will likely present a significant technical headwind as we attempt to move buildpacks to new stacks.
Outcome
This exploration should provide direction for a future effort to modernize the dependency-building infrastructure that Paketo Buildpacks depends upon. It should weigh the following goals, along with any others discovered along the way, with the result being an RFC outlining a future direction for dependencies.
Goals
Remove Cloud Foundry Dependency
The dependency-building automation is tightly coupled to the legacy dependency-building infrastructure inherited from Cloud Foundry (buildpacks-ci and binary-builder). Making changes to these codebases to support new Paketo use cases and features is a convoluted and difficult process. Ideally, we could move, refactor, or rewrite this code into codebases that we maintain within the Paketo Buildpacks project.
Use Upstream References
Many of our dependencies may already be built in a form that is usable on top of our stacks. In these cases, we shouldn't be rebuilding them for no real benefit. Instead, we should reference the upstream artifact download location directly. An example of this might be Go: the Go downloads page serves pre-built tarballs for Linux on a number of architectures. Ideally, we would be able to reference these download URLs in our go-dist buildpack.toml file.
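To make the go-dist example concrete: the Go project already publishes machine-readable download metadata at https://go.dev/dl/?mode=json, listing the filename, os, arch, and sha256 for each artifact. The sketch below (field names based on that endpoint as I understand it; this is not part of any existing Paketo tooling) pulls out the linux/amd64 archive URL and checksum that a buildpack.toml entry could reference directly.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"os"
)

// goRelease is a minimal view of the go.dev download metadata we care about.
type goRelease struct {
	Version string `json:"version"`
	Stable  bool   `json:"stable"`
	Files   []struct {
		Filename string `json:"filename"`
		OS       string `json:"os"`
		Arch     string `json:"arch"`
		SHA256   string `json:"sha256"`
		Kind     string `json:"kind"`
	} `json:"files"`
}

func main() {
	// mode=json returns metadata for the current stable releases.
	resp, err := http.Get("https://go.dev/dl/?mode=json")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	defer resp.Body.Close()

	var releases []goRelease
	if err := json.NewDecoder(resp.Body).Decode(&releases); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}

	// Pick the linux/amd64 archive for each stable release; its URL and sha256
	// are exactly what an upstream-referencing buildpack.toml entry would carry.
	for _, r := range releases {
		if !r.Stable {
			continue
		}
		for _, f := range r.Files {
			if f.OS == "linux" && f.Arch == "amd64" && f.Kind == "archive" {
				fmt.Printf("%s\nuri    = https://go.dev/dl/%s\nsha256 = %s\n\n", r.Version, f.Filename, f.SHA256)
			}
		}
	}
}
```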
Adopt Federated Model
The current dependency-building automation is mostly centralized in the dep-server repository. While this worked well when we had a dedicated team of folks with a strong working knowledge of these components, it has become more difficult to maintain under a more diffuse ownership model. It would likely be more advantageous to move much of the monitoring and building infrastructure into the repositories where these dependencies will ultimately reside. For instance, it would make sense to have the dependency-building infrastructure for the Node.js runtime within the node-engine repository and directly under the responsibility of the Node.js Maintainers team. It would still make sense for a Dependencies team to help maintain expertise and tooling for the workflows involved in dependency-building generally, but the particulars of each dependency could be distributed to their respective buildpack repositories.
Consolidate with Java Workflows
The dependency-building infrastructure described above does not encompass any of the dependencies that contribute to the Java buildpacks. The Java buildpacks have their own system for managing dependencies. It is worth considering what a consolidation of these systems might look like.
Enable Multi-stack / Multi-architecture Support
The dependency-building infrastructure is tightly coupled to an Ubuntu Bionic-derived stack on a Linux AMD64 architecture. Ideally, we would propose a solution that would enable us to deliver dependencies on a more diverse set of operating system / architecture pairings.
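To illustrate what that could mean for dependency metadata, here is a hypothetical sketch (the types, field names, and stack IDs are illustrative, not an existing Paketo API) of per-platform artifact entries and the lookup a buildpack would need to do for a given os/arch/stack pairing:

```go
package main

import "fmt"

// dependencyArtifact is a hypothetical per-platform variant of a dependency;
// today's metadata effectively assumes a single linux/amd64, Bionic-based entry.
type dependencyArtifact struct {
	OS     string
	Arch   string
	Stacks []string
	URI    string
	SHA256 string
}

// selectArtifact picks the artifact matching the target platform and stack,
// which is the kind of lookup a multi-stack / multi-arch buildpack would need.
func selectArtifact(artifacts []dependencyArtifact, goos, goarch, stack string) (dependencyArtifact, error) {
	for _, a := range artifacts {
		if a.OS != goos || a.Arch != goarch {
			continue
		}
		for _, s := range a.Stacks {
			if s == stack || s == "*" {
				return a, nil
			}
		}
	}
	return dependencyArtifact{}, fmt.Errorf("no artifact for %s/%s on stack %q", goos, goarch, stack)
}

func main() {
	artifacts := []dependencyArtifact{
		{OS: "linux", Arch: "amd64", Stacks: []string{"io.buildpacks.stacks.jammy"}, URI: "https://go.dev/dl/go1.21.0.linux-amd64.tar.gz"},
		{OS: "linux", Arch: "arm64", Stacks: []string{"io.buildpacks.stacks.jammy"}, URI: "https://go.dev/dl/go1.21.0.linux-arm64.tar.gz"},
	}

	a, err := selectArtifact(artifacts, "linux", "arm64", "io.buildpacks.stacks.jammy")
	if err != nil {
		panic(err)
	}
	fmt.Println(a.URI)
}
```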