From 4bad7c099024b01569ad7eb0eae70e94b80d2c23 Mon Sep 17 00:00:00 2001 From: Justin Chadwell Date: Mon, 8 Aug 2022 15:45:23 +0100 Subject: [PATCH 1/7] docs: move developer-specific docs to docs/dev/ folder Signed-off-by: Justin Chadwell --- docs/{merge+diff.md => dev/merge-diff.md} | 0 docs/{ => dev}/solver.md | 0 examples/README.md | 2 +- 3 files changed, 1 insertion(+), 1 deletion(-) rename docs/{merge+diff.md => dev/merge-diff.md} (100%) rename docs/{ => dev}/solver.md (100%) diff --git a/docs/merge+diff.md b/docs/dev/merge-diff.md similarity index 100% rename from docs/merge+diff.md rename to docs/dev/merge-diff.md diff --git a/docs/solver.md b/docs/dev/solver.md similarity index 100% rename from docs/solver.md rename to docs/dev/solver.md diff --git a/examples/README.md b/examples/README.md index e43ccda44654..0e0c13c17410 100644 --- a/examples/README.md +++ b/examples/README.md @@ -34,7 +34,7 @@ Different versions of the example scripts show different ways of describing the - `./buildkit1` - cloning git repositories has been separated for extra concurrency. - `./buildkit2` - uses git sources directly instead of running `git clone`, allowing better performance and much safer caching. - `./buildkit3` - allows using local source files for separate components eg. 
`./buildkit3 --runc=local | buildctl build --local runc-src=some/local/path` -- `./buildkit4` - uses MergeOp to optimize copy chains for better caching behavior (see `docs/merge+diff.md` for more details) +- `./buildkit4` - uses MergeOp to optimize copy chains for better caching behavior (see `docs/dev/merge-diff.md` for more details) - `./dockerfile2llb` - can be used to convert a Dockerfile to LLB for debugging purposes - `./nested-llb` - shows how to use nested invocation to generate LLB - `./gobuild` - shows how to use nested invocation to generate LLB for Go package internal dependencies From 2e5a01aa962a6a9a6b3b35c2d7e08c6bd62da311 Mon Sep 17 00:00:00 2001 From: Justin Chadwell Date: Mon, 8 Aug 2022 15:54:20 +0100 Subject: [PATCH 2/7] docs: reformat dev docs Signed-off-by: Justin Chadwell --- docs/dev/merge-diff.md | 420 +++++++++++++++++++++++++++++++++-------- docs/dev/solver.md | 340 ++++++++++++++++++++++++--------- 2 files changed, 595 insertions(+), 165 deletions(-) diff --git a/docs/dev/merge-diff.md b/docs/dev/merge-diff.md index 09322f332b58..ba0bf19849d1 100644 --- a/docs/dev/merge-diff.md +++ b/docs/dev/merge-diff.md @@ -1,29 +1,56 @@ # Merge and Diff Ops -MergeOp and DiffOp are two interrelated LLB operations that enable the rebasing of LLB results onto other results and the separation of LLB results from their base, respectively. Underneath the hood, these ops enable fine grain manipulation of container layer chains that can result in highly efficient operations for many use cases. -This doc assumes some familiarity with LLB and ops like ExecOp and FileOp. More background on LLB can be obtained from the README.md in Buildkit's git repository. This doc also uses the Go LLB client for examples, though MergeOp and DiffOp are not in any way language specific. +MergeOp and DiffOp are two interrelated LLB operations that enable the rebasing +of LLB results onto other results and the separation of LLB results from their +base, respectively. 
Under the hood, these ops enable fine-grained
+manipulation of container layer chains that can result in highly efficient
+operations for many use cases.
+
+This doc assumes some familiarity with LLB and ops like ExecOp and FileOp. More
+background on LLB can be obtained from the README.md in Buildkit's git
+repository. This doc also uses the Go LLB client for examples, though MergeOp
+and DiffOp are not in any way language specific.

## MergeOp
+
MergeOp has a very simple interface:
+
```go
func Merge(inputs []llb.State) llb.State
```
-The intuition is that it merges the contents of the provided states together into one state (hence the name), with files from later states taking precedence over those from earlier ones.
+The intuition is that it merges the contents of the provided states together
+into one state (hence the name), with files from later states taking precedence
+over those from earlier ones.
+
+To be more concrete, MergeOp returns a state where the input states are
+rebased on top of one another in the order provided. "Rebasing" a state `B` onto
+another state `A` creates a state that:

-* Has all the contents of `B`
-* Has all the contents of `A` except when a path exists in both `B` and `A`. In this case:
- * If both paths are directories, their contents are merged. Metadata (such as permissions) on the directory from `B` take precedence.
- * If one of the paths is not a directory, whatever is present in `B` takes precedence. This also means that if a file in `B` overwrites a dir in `A`, then all files/dirs in the tree under at that path in `A` are also removed.
+- Has all the contents of `B`
+- Has all the contents of `A` except when a path exists in both `B` and `A`. In this case:
+  - If both paths are directories, their contents are merged. 
Metadata (such
+    as permissions) on the directory from `B` takes precedence.
+  - If one of the paths is not a directory, whatever is present in `B` takes
+    precedence. This also means that if a file in `B` overwrites a dir in `A`,
+    then all files/dirs in the tree under that path in `A` are also
+    removed.

-MergeOp is associative, i.e. using shorthand notation: `Merge(A, B, C) == Merge(Merge(A, B), C) == Merge(A, Merge(B, C))`. Buildkit knows this and internally optimizes LLB merges that are equivalent in this way to re-use the same cache entries.
+MergeOp is associative, i.e. using shorthand notation: `Merge(A, B, C) ==
+Merge(Merge(A, B), C) == Merge(A, Merge(B, C))`. Buildkit knows this and
+internally optimizes LLB merges that are equivalent in this way to re-use the
+same cache entries.

-There are more subtleties to the behavior of MergeOp, such as when deletions are present in a layer making up a state, discussed in the "Advanced Details" section of this doc.
+There are more subtleties to the behavior of MergeOp, such as when deletions
+are present in a layer making up a state, discussed in the "Advanced Details"
+section of this doc.

-States created by MergeOp are the same as any other LLB states in that they can be used as the base for exec, be mounted to arbitrary paths in execs, be plugged into other merges and diffs, be exported, etc.
+States created by MergeOp are the same as any other LLB states in that they can
+be used as the base for exec, be mounted to arbitrary paths in execs, be
+plugged into other merges and diffs, be exported, etc.

As a very simple example:
+
```go
// a has /dir/a
a := llb.Scratch(). 
@@ -53,28 +80,52 @@ mergedPlusMore := merged.File(llb.Mkdir("/yetanotherdir", 0755)) mergedPlusMore = llb.Merge([]llb.State{merged, llb.Scratch().File(llb.Mkdir("/yetanotherdir", 0755))}) ``` -### Container Image Export -When the result of a MergeOp is exported as a container image, the image will consist of the layers making up each input joined together in the order of the MergeOp. If Buildkit has cached any one of these layers already, they will not need to be re-exported (i.e. re-packaged into compressed tarballs). Additionally, if the image is being pushed to a registry and the registry indicates it already has any of the layers, then Buildkit can skip pushing those layers entirely. +### MergeOp Container Image Export + +When the result of a MergeOp is exported as a container image, the image will +consist of the layers making up each input joined together in the order of the +MergeOp. If Buildkit has cached any one of these layers already, they will not +need to be re-exported (i.e. re-packaged into compressed tarballs). +Additionally, if the image is being pushed to a registry and the registry +indicates it already has any of the layers, then Buildkit can skip pushing +those layers entirely. -Layers joined together by MergeOp do not have dependencies on each other, so a cache invalidation of the layers of one input doesn't cascade to the layers of the other inputs. +Layers joined together by MergeOp do not have dependencies on each other, so a +cache invalidation of the layers of one input doesn't cascade to the layers of +the other inputs. ## DiffOp + DiffOp also has a very simple interface: + ```go func Diff(lower llb.State, upper llb.State) llb.State ``` -The intuition is that it returns a state whose contents are the difference between `lower` and `upper`. It can be viewed as something like the inverse of MergeOp; whereas MergeOp "adds" states together, DiffOp "subtracts" `lower` from `upper` (in a manner of speaking). 
+The intuition is that it returns a state whose contents are the difference +between `lower` and `upper`. It can be viewed as something like the inverse of +MergeOp; whereas MergeOp "adds" states together, DiffOp "subtracts" `lower` +from `upper` (in a manner of speaking). -More specifically, DiffOp returns a state that has the contents present in `upper` that either aren't present in `lower` or have changed from `lower` to `upper`. Another way of thinking about it is that if you start at `A` and apply `Diff(A, B)`, you will end up at `B`. Or, even more succinctly, `Merge(A, Diff(A, B)) == B`. +More specifically, DiffOp returns a state that has the contents present in +`upper` that either aren't present in `lower` or have changed from `lower` to +`upper`. Another way of thinking about it is that if you start at `A` and apply +`Diff(A, B)`, you will end up at `B`. Or, even more succinctly, `Merge(A, +Diff(A, B)) == B`. -Files and dirs are considered to have changed between `lower` and `upper` if their contents are unequal or if metadata like permissions and `mtime` have changed. Unequal `atime` or `ctime` values are not considered to be a change. +Files and dirs are considered to have changed between `lower` and `upper` if +their contents are unequal or if metadata like permissions and `mtime` have +changed. Unequal `atime` or `ctime` values are not considered to be a change. -There are more subtleties to the behavior of DiffOp discussed in the "Advanced Details" section of this doc. +There are more subtleties to the behavior of DiffOp discussed in the "Advanced +Details" section of this doc. -States created by DiffOp are the same as any other LLB states in that they can be used as the base for exec, be mounted to arbitrary paths in execs, be plugged into merges and other diffs, be exported, etc. 
+States created by DiffOp are the same as any other LLB states in that they can +be used as the base for exec, be mounted to arbitrary paths in execs, be +plugged into merges and other diffs, be exported, etc. As a very simple example: + ```go base := llb.Image("alpine") basePlusBuilt := base.Run(llb.Shlex("touch /foo")).Root() @@ -82,8 +133,11 @@ basePlusBuilt := base.Run(llb.Shlex("touch /foo")).Root() diffed := llb.Diff(base, basePlusBuilt) ``` -### Container Image Export -When the result of a DiffOp is exported as a container image, layers will be re-used as much as possible. To explain, consider this case: +### DiffOp Container Image Export + +When the result of a DiffOp is exported as a container image, layers will be +re-used as much as possible. To explain, consider this case: + ```go lower := llb.Image("alpine") middle := lower.Run(llb.Shlex("touch /foo")).Root() @@ -91,26 +145,44 @@ upper := middle.Run(llb.Shlex("touch /bar")).Root() diff := llb.Diff(lower, upper) ``` -In this case, there is a "known chain" from `lower` to `upper` because `lower` is a state in `upper`'s history. This means that when the DiffOp is exported as a container image, it can just consist of the container layers for `middle` joined with the container layers for `upper`. +In this case, there is a "known chain" from `lower` to `upper` because `lower` +is a state in `upper`'s history. This means that when the DiffOp is exported as +a container image, it can just consist of the container layers for `middle` +joined with the container layers for `upper`. + +Another way of thinking about this is that when `lower` is a state in `upper`'s +history, the diff between the two is equivalent to a merge of the states +between them. So, using the example above: -Another way of thinking about this is that when `lower` is a state in `upper`'s history, the diff between the two is equivalent to a merge of the states between them. 
So, using the example above: ```go llb.Diff(lower, upper) == llb.Merge([]llb.State{ llb.Diff(lower, middle), llb.Diff(middle, upper), }) -```` +``` + This behavior extends to arbitrary numbers of states separating `lower` and `upper`. -In the case where there is not a chain between `lower` and `upper` that Buildkit can determine, DiffOp still works consistently but, when exported, will always result in a single layer that is not re-used from its inputs. +In the case where there is not a chain between `lower` and `upper` that +Buildkit can determine, DiffOp still works consistently but, when exported, +will always result in a single layer that is not re-used from its inputs. ## Example Use Case: Better "Copy Chains" with MergeOp + ### The Problem -A common pattern when building container images is to independently assemble components of the image and then combine those components together into a final image using a chain of Copy FileOps. For example, when using the Dockerfile frontend, this is the multi-stage build pattern and a chain of `COPY --from=...` statements. -One issue with this type of pattern is that if any of the inputs to the copy chain change, that doesn't just invalidate Buildkit's cache for that input, it also invalidates Buildkit's cache for any copied layers after that one. +A common pattern when building container images is to independently assemble +components of the image and then combine those components together into a final +image using a chain of Copy FileOps. For example, when using the Dockerfile +frontend, this is the multi-stage build pattern and a chain of `COPY +--from=...` statements. + +One issue with this type of pattern is that if any of the inputs to the copy +chain change, that doesn't just invalidate Buildkit's cache for that input, it +also invalidates Buildkit's cache for any copied layers after that one. 
To be a bit more concrete, consider the following LLB as specified with the Go client:
+
```go
// stage a
a := llb.Image("alpine").Run("build a").Root()
@@ -127,6 +199,7 @@ combined := llb.Image("alpine").
```

Note that this is basically the equivalent of the following Dockerfile:
+
```dockerfile
FROM alpine as a
RUN build a
@@ -143,9 +216,20 @@ COPY --from=b /bin/b /usr/local/bin/b
COPY --from=c /bin/c /usr/local/bin/c
```

-Now, say you do a build of this LLB and export the `combined` stage as a container image to a registry. If you were to then repeat the same build with the same instance of Buildkit, each part of the build should be cached, resulting in no work needing to be done and no layers needing to be exported or pushed to the registry.
+Now, say you do a build of this LLB and export the `combined` stage as a
+container image to a registry. If you were to then repeat the same build with
+the same instance of Buildkit, each part of the build should be cached,
+resulting in no work needing to be done and no layers needing to be exported or
+pushed to the registry.
+
+Then, say you later do the build again but this time with a change to `a`. The
+build for `a` is thus not cached, which means that the copy of `/bin/a` into
+`/usr/local/bin/a` of `combined` is also not cached and has to be re-run. The
+problem is that because each copy into `combined` is chained together, the
+invalidation of the copy from `a` also cascades to its descendants, namely the
+copies from `b` and `c`. This is despite the fact that `b` and `c` are
+independent of `a` and thus don't need to be invalidated. In graphical form:

-Then, say you later do the build again but this time with a change to `a`. The build for `a` is thus not cached, which means that the copy of `/bin/a` into `/usr/local/bin/a` of `combined` is also not cached and has to be re-run. 
The problem is that because each copy in to `combined` is chained together, the invalidation of the copy from `a` also cascades to its descendants, namely the copies from `b` and `c`. This is despite the fact that `b` and `c` are independent of `a` and thus don't need to be invalidated. In graphical form: ```mermaid graph TD alpine("alpine") --> |CACHE HIT fa:fa-check| A("build a2.0") @@ -165,10 +249,16 @@ graph TD class A,ACopy,BCopy,CCopy red ``` -As a result, not only do the copies from `b` and `c` to create `/usr/local/bin/b` and `/usr/local/bin/c` need to run again, they also result in new layers needing to be exported and then pushed to a registry. For many use cases, this becomes a significant source of overhead in terms of build times and the amount of data that needs to be stored and transferred. +As a result, not only do the copies from `b` and `c` to create +`/usr/local/bin/b` and `/usr/local/bin/c` need to run again, they also result +in new layers needing to be exported and then pushed to a registry. For many +use cases, this becomes a significant source of overhead in terms of build +times and the amount of data that needs to be stored and transferred. ### The Solution + MergeOp can be used to fix the problem of cascading invalidation in copy chains: + ```go a := llb.Scratch().File(llb.Copy(llb.Image("alpine").Run("build a").Root(), "/bin/a", "/usr/local/bin/a")) b := llb.Scratch().File(llb.Copy(llb.Image("alpine").Run("build b").Root(), "/bin/b", "/usr/local/bin/b")) @@ -181,15 +271,32 @@ combined := llb.Merge([]llb.State{ }) ``` -(*Note that newer versions of Dockerfiles support a `--link` flag when using `COPY`, which results in basically this same pattern*) +(*Note that newer versions of Dockerfiles support a `--link` flag when using +`COPY`, which results in basically this same pattern*) Two changes have been made from the previous version: -1. `a`, `b`, and `c` have been updated to copy their desired contents to `Scratch` (a new, empty state). 
+ +1. `a`, `b`, and `c` have been updated to copy their desired contents to + `Scratch` (a new, empty state). 1. `combined` is defined as a MergeOp of the states desired in the final image. -Say you're doing this build for the first time. The build will first create states `a`, `b`, and `c`, resulting in each being a single layer consisting only of contents `/usr/local/bin/a`, `/usr/local/bin/b`, and `/usr/local/bin/c` respectively. Then, the MergeOp rebases each of those states on to the base `busybox` image. As discussed earlier, the container image export of a MergeOp will consist of the layers of the merge inputs joined together, so the final image looks mostly the same as before. +Say you're doing this build for the first time. The build will first create +states `a`, `b`, and `c`, resulting in each being a single layer consisting +only of contents `/usr/local/bin/a`, `/usr/local/bin/b`, and `/usr/local/bin/c` +respectively. Then, the MergeOp rebases each of those states on to the base +`busybox` image. As discussed earlier, the container image export of a MergeOp +will consist of the layers of the merge inputs joined together, so the final +image looks mostly the same as before. + +The benefits of MergeOp become apparent when considering what happens if the +build of `a` is modified. Whereas before this led to invalidation of the copy +of `b` and `c`, now those merge inputs are completely unaffected; no new cache +entries or new container layers need to be created for them. So, the end result +is that the only work Buildkit does when `a` changes is re-build `a` and then +push the new layers for `/usr/local/bin/a` (plus a new image manifest). +`/usr/local/bin/b` and `/usr/local/bin/c` do not need to be re-exported and do +not need to be re-pushed to the registry. In graphical form: -The benefits of MergeOp become apparent when considering what happens if the build of `a` is modified. 
Whereas before this led to invalidation of the copy of `b` and `c`, now those merge inputs are completely unaffected; no new cache entries or new container layers need to be created for them. So, the end result is that the only work Buildkit does when `a` changes is re-build `a` and then push the new layers for `/usr/local/bin/a` (plus a new image manifest). `/usr/local/bin/b` and `/usr/local/bin/c` do not need to be re-exported and do not need to be re-pushed to the registry. In graphical form: ```mermaid graph TD alpine("alpine") --> |CACHE HIT fa:fa-check| A("build a2.0") @@ -210,27 +317,51 @@ graph TD class A,ACopy red ``` -An important aspect of this behavior is that MergeOp is implemented lazily, which means that its on-disk filesystem representation is only created locally when strictly required. This means that even though a change to `a` invalidates the MergeOp as a whole, no work needs to be done to create the merged state on-disk when it's only being exported as a container image. This laziness behavior is discussed more in the "Performance Considerations" section of the doc. +An important aspect of this behavior is that MergeOp is implemented lazily, +which means that its on-disk filesystem representation is only created locally +when strictly required. This means that even though a change to `a` invalidates +the MergeOp as a whole, no work needs to be done to create the merged state +on-disk when it's only being exported as a container image. This laziness +behavior is discussed more in the "Performance Considerations" section of the +doc. -You can see a working-code example of this by comparing `examples/buildkit3` with `examples/buildkit4` in the Buildkit git repo. +You can see a working-code example of this by comparing `examples/buildkit3` +with `examples/buildkit4` in the Buildkit git repo. 
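The contrast between the two invalidation behaviors can be sketched with a toy cache-key model. The key scheme here is invented purely for illustration and is not BuildKit's actual cache-key algorithm; it only captures the structural point that chained copies feed each step's key into the next, while merge inputs are keyed independently.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// Toy cache-key model (illustrative only; BuildKit's real cache keys are
// more involved) showing why a chain of copies cascades invalidation while
// MergeOp inputs are invalidated independently.

func key(parts ...string) string {
	sum := sha256.Sum256([]byte(fmt.Sprint(parts)))
	return fmt.Sprintf("%x", sum[:4])
}

// chainKeys models copies layered one on top of another: each step's key
// depends on the previous step's key, so a change cascades downward.
func chainKeys(base string, srcs ...string) []string {
	keys, prev := []string{}, base
	for _, s := range srcs {
		prev = key("copy", prev, s)
		keys = append(keys, prev)
	}
	return keys
}

// mergeKeys models MergeOp inputs: each input's key depends only on that
// input, not on its siblings.
func mergeKeys(srcs ...string) []string {
	keys := []string{}
	for _, s := range srcs {
		keys = append(keys, key("merge-input", s))
	}
	return keys
}

// invalidated counts positions whose key changed between two builds.
func invalidated(before, after []string) int {
	n := 0
	for i := range before {
		if before[i] != after[i] {
			n++
		}
	}
	return n
}

func main() {
	// Rebuild with only `a` changed (a1 -> a2); `b` and `c` are untouched.
	fmt.Println(invalidated(chainKeys("base", "a1", "b", "c"), chainKeys("base", "a2", "b", "c"))) // 3: cascade
	fmt.Println(invalidated(mergeKeys("a1", "b", "c"), mergeKeys("a2", "b", "c")))                 // 1: only a
}
```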
## Example Use Case: Remote-only Image Append with MergeOp -If you have some layers already pushed to a remote registry, MergeOp allows you to create new images that combine those layers in arbitrary ways without having to actually pull any layers down first. For example: + +If you have some layers already pushed to a remote registry, MergeOp allows you +to create new images that combine those layers in arbitrary ways without having +to actually pull any layers down first. For example: + ```go foo := llb.Image("fooApp:v0.1") bar := llb.Image("barApp:v0.3") qaz := llb.Image("qazApp:v1.2") merged := llb.Merge([]llb.State{foo, bar, qaz}) ``` -If `merged` is being exported to the same registry that already has the layers for `fooApp`, `barApp` and `qazApp`, then the only thing Buildkit does during the export is create an image manifest (just some metadata) and push it to the registry. No layers need to be pushed (they are already there) and they don't even need to be pulled locally to Buildkit either. + +If `merged` is being exported to the same registry that already has the layers +for `fooApp`, `barApp` and `qazApp`, then the only thing Buildkit does during +the export is create an image manifest (just some metadata) and push it to the +registry. No layers need to be pushed (they are already there) and they don't +even need to be pulled locally to Buildkit either. Note that if you were to instead do this: + ```go merged := llb.Merge([]llb.State{foo, bar, qaz}).Run(llb.Shlex("extra command")).Root() ``` -Then `fooApp`, `barApp` and `qazApp` will need to be pulled, though they will usually be merged together more efficiently than the naive solution of just unpacking the layers on top of each other. See the "Performance Details" section for more info. -Additionally, if you export your Buildkit cache to a registry, this same idea can be extended to any LLB types, not just `llb.Image`. 
So, using the same example as the previous use case: +Then `fooApp`, `barApp` and `qazApp` will need to be pulled, though they will +usually be merged together more efficiently than the naive solution of just +unpacking the layers on top of each other. See the "Performance Details" +section for more info. + +Additionally, if you export your Buildkit cache to a registry, this same idea +can be extended to any LLB types, not just `llb.Image`. So, using the same +example as the previous use case: + ```go a := llb.Scratch().File(llb.Copy(llb.Image("alpine").Run("build a").Root(), "/bin/a", "/usr/bin/a")) b := llb.Scratch().File(llb.Copy(llb.Image("alpine").Run("build b").Root(), "/bin/b", "/usr/bin/b")) @@ -243,24 +374,44 @@ combined := llb.Merge([]llb.State{ }) ``` -If you do a build that includes a remote cache export to a registry, then any Buildkit worker importing that cache can run builds that do different merges of those layers without having to pull anything down. For instance, if a separate Buildkit worker imported that remote cache and then built this: +If you do a build that includes a remote cache export to a registry, then any +Buildkit worker importing that cache can run builds that do different merges of +those layers without having to pull anything down. For instance, if a separate +Buildkit worker imported that remote cache and then built this: + ```go combined2 := llb.Merge([]llb.State{ c, a }) ``` -An export of `combined2` would not need to pull any layers down because it's just a merge of `c` and `a`, which already have layers in the registry thanks to the remote cache. This works because a remote cache import is actually just a metadata download; layers are only pulled locally once needed and they aren't needed for this MergeOp. + +An export of `combined2` would not need to pull any layers down because it's +just a merge of `c` and `a`, which already have layers in the registry thanks +to the remote cache. 
This works because a remote cache import is actually just +a metadata download; layers are only pulled locally once needed and they aren't +needed for this MergeOp. ## Example Use Case: Modeling Package Builds with MergeOp+DiffOp -Merge and Diff have many potential use cases, but one primary one is to assist higher level tooling that's using LLB to model "dependency-based builds", such as what's found in many package managers and other build systems. + +Merge and Diff have many potential use cases, but one primary one is to assist +higher level tooling that's using LLB to model "dependency-based builds", such +as what's found in many package managers and other build systems. More specifically, the following is a common pattern used to model the build of a "package" (or equivalent concept) in such systems: -1. The build-time dependencies of the package are combined into a filesystem. The dependencies are themselves just already-built packages. -1. A build is run by executing some commands that have access to the combined dependencies, producing new build artifacts that are somehow isolated from the dependencies. These isolated build artifacts become the new package's contents. -1. The new package can then be used as a dependency of other packages and/or served directly to end users, while being careful to ensure that any runtime dependencies are also present when the package needs to be utilized. + +1. The build-time dependencies of the package are combined into a filesystem. + The dependencies are themselves just already-built packages. +1. A build is run by executing some commands that have access to the combined + dependencies, producing new build artifacts that are somehow isolated from + the dependencies. These isolated build artifacts become the new package's + contents. +1. 
The new package can then be used as a dependency of other packages and/or + served directly to end users, while being careful to ensure that any runtime + dependencies are also present when the package needs to be utilized. One way to adapt the above model to LLB might be like this: + ```go // "Packages" are just LLB states. Build-time dependencies are combined // together into a filesystem using MergeOp. @@ -286,9 +437,23 @@ builtPackage := buildDeps.Run( llb.Merge([]llb.State{runtimeDeps, builtPackage}) ``` -While the above is a bit of an over-simplification (it, for instance, ignores the need to topologically sort dependency DAGs before merging them together), the important point is that it only needs MergeOp and ExecOp; DiffOp is left out entirely. For many use cases, this is completely fine and DiffOp is not needed. +While the above is a bit of an over-simplification (it, for instance, ignores +the need to topologically sort dependency DAGs before merging them together), +the important point is that it only needs MergeOp and ExecOp; DiffOp is left +out entirely. For many use cases, this is completely fine and DiffOp is not +needed. + +Some use cases can run into issues though, specifically with the part where +build artifacts need to be isolated from their dependencies. The above example +uses the convention of setting `DESTDIR`, an environment variable that +specifies a directory that `make install` should place artifacts under. Most +build systems support either `DESTDIR` or some type of equivalent mechanism for +isolating installed build artifacts. However, there are times when this +convention is either not available or not desired, in which case DiffOp can +come to the rescue as a generic, tool-agnostic way of separating states out +from their original dependency base. 
The modification from the previous example +is quite small: -Some use cases can run into issues though, specifically with the part where build artifacts need to be isolated from their dependencies. The above example uses the convention of setting `DESTDIR`, an environment variable that specifies a directory that `make install` should place artifacts under. Most build systems support either `DESTDIR` or some type of equivalent mechanism for isolating installed build artifacts. However, there are times when this convention is either not available or not desired, in which case DiffOp can come to the rescue as a generic, tool-agnostic way of separating states out from their original dependency base. The modification from the previous example is quite small: ```go // Same `make` command as before buildBase := buildDeps.Run( @@ -305,26 +470,56 @@ builtPackage := llb.Diff(buildBase, buildBase.Run( ).Root()) ``` -This approach using DiffOp should achieve the same end result as the previous version but without having to rely on `DESTDIR` support being present in the `make install` step. - -The fact that DiffOp is more generic and arguably simpler than setting `DESTDIR` or equivalents doesn't mean it's strictly better for every case. The following should be kept in mind when dealing with use cases where both approaches are viable: -1. The version that uses `DESTDIR` will likely have *slightly* better performance than the version using DiffOp for many use cases. This is because it's faster for Buildkit to merge in a state that is just a single layer on top of scratch (i.e. the first version of `builtPackage` that used `DESTDIR`) than it is to merge in a state whose diff is between two non-empty states (i.e. the DiffOp version). Whether the performance difference actually matters needs to be evaluated on a case-by-case basis. -1. 
DiffOp has some subtle behavior discussed in the "Advanced Details" section that, while irrelevant to most use cases, can occasionally distinguish it from the `DESTDIR` approach. +This approach using DiffOp should achieve the same end result as the previous +version but without having to rely on `DESTDIR` support being present in the +`make install` step. + +The fact that DiffOp is more generic and arguably simpler than setting +`DESTDIR` or equivalents doesn't mean it's strictly better for every case. The +following should be kept in mind when dealing with use cases where both +approaches are viable: + +1. The version that uses `DESTDIR` will likely have *slightly* better + performance than the version using DiffOp for many use cases. This is because + it's faster for Buildkit to merge in a state that is just a single layer on top + of scratch (i.e. the first version of `builtPackage` that used `DESTDIR`) than + it is to merge in a state whose diff is between two non-empty states (i.e. the + DiffOp version). Whether the performance difference actually matters needs to + be evaluated on a case-by-case basis. +1. DiffOp has some subtle behavior discussed in the "Advanced Details" section + that, while irrelevant to most use cases, can occasionally distinguish it from + the `DESTDIR` approach. ## Performance Considerations + ### Laziness -MergeOp and DiffOp are both implemented lazily in that their on-disk filesystem representations will only be created when absolutely necessary. -The most common situation in which a Merge/Diff result will need to be "unlazied" (created on disk) is when it is used as the input to an Exec or File op. For example: +MergeOp and DiffOp are both implemented lazily in that their on-disk filesystem +representations will only be created when absolutely necessary. + +The most common situation in which a Merge/Diff result will need to be +"unlazied" (created on disk) is when it is used as the input to an Exec or File +op. 
For example: + ```go rootfs := llb.Merge([]llb.State{A, B}) extraLayer := rootfs.Run(llb.Shlex("some command")).Root() ``` -In this case, if `extraLayer` is not already cached, `extraLayer` will need `rootfs` to exist on disk in order to run, so `rootfs` will have to be unlazied. The same idea applies if `extraLayer` was defined as a FileOp or if `rootfs` was defined using a `DiffOp`. -What's perhaps more interesting are cases in which merge/diff results *don't* need to be unlazied. One such situation is when they are exported as a container image. As discussed previously, layers from the inputs of merge/diff are re-used as much as possible during image exports, so that means that the final merged/diffed result is not needed, only the inputs. +In this case, if `extraLayer` is not already cached, `extraLayer` will need +`rootfs` to exist on disk in order to run, so `rootfs` will have to be +unlazied. The same idea applies if `extraLayer` was defined as a FileOp or if +`rootfs` was defined using a `DiffOp`. + +What's perhaps more interesting are cases in which merge/diff results *don't* +need to be unlazied. One such situation is when they are exported as a +container image. As discussed previously, layers from the inputs of merge/diff +are re-used as much as possible during image exports, so that means that the +final merged/diffed result is not needed, only the inputs. + +Another situation that doesn't require unlazying is when a merge/diff is used +as an input to another merge/diff. For example: -Another situation that doesn't require unlazying is when a merge/diff is used as an input to another merge/diff. For example: ```go diff1 := llb.Diff(A, B) diff2 := llb.Diff(C, D) @@ -334,22 +529,49 @@ merge := llb.Merge([]llb.State{diff1, diff2}) In this case, even though `diff1` and `diff2` are used as an input to `merge`, they do not need to be unlazied because `merge` is also lazy. If `A`, `B`, `C` or `D` are lazy LLB states, they also do not need to be unlazied. 
Laziness is transitive in this respect. ### Snapshotter-dependent Optimizations -There are some optimizations in the implementation of Merge and Diff op that are relevant to users concerned with scaling large builds involving many different merges and/or diffs. These optimizations are ultimately implementation details though and don't have any impact on the actual contents of merge/diff results. -When a merge or diff result needs to be unlazied, the "universal" fallback implementation that works for all snapshotter backends is to create them by copying files from the inputs as needed into a new filesystem. This works but it can become costly in terms of disk space and CPU time at a certain scale. +There are some optimizations in the implementation of Merge and Diff op that +are relevant to users concerned with scaling large builds involving many +different merges and/or diffs. These optimizations are ultimately +implementation details though and don't have any impact on the actual contents +of merge/diff results. + +When a merge or diff result needs to be unlazied, the "universal" fallback +implementation that works for all snapshotter backends is to create them by +copying files from the inputs as needed into a new filesystem. This works but +it can become costly in terms of disk space and CPU time at a certain scale. -However, for two of the default snapshotters (overlay and native), there is an optimization in place to avoid copying files and instead hardlink them from the inputs into the merged/diffed filesystem. This is at least as fast as copying the files and often significantly faster for inputs with large file sizes. +However, for two of the default snapshotters (overlay and native), there is an +optimization in place to avoid copying files and instead hardlink them from the +inputs into the merged/diffed filesystem. This is at least as fast as copying +the files and often significantly faster for inputs with large file sizes. 
## Advanced Details
-These details are not expected to impact many use cases, but are worth reviewing if you are experiencing surprising behavior while using Merge and Diff op or otherwise want to understand them at a deeper level.
+
+These details are not expected to impact many use cases, but are worth
+reviewing if you are experiencing surprising behavior while using Merge and
+Diff op or otherwise want to understand them at a deeper level.

### Layer-like Behavior of Merge and Diff
-One important principal of LLB results is that when they are exported as container images, an external runtime besides Buildkit that pulls and unpacks the image must see the same filesystem that is seen during build time.
-That may seem a bit obvious, but it has important implications for Merge and Diff, which are ops that are designed to re-use container layers from their inputs as much as possible in order to maximize cache re-use and efficiency. Many of the more surprising aspects of the behavior discussed in the rest of this doc are a result of needing to ensure that Merge+Diff results look the same before and after export as container layers.
+One important principle of LLB results is that when they are exported as
+container images, an external runtime besides Buildkit that pulls and unpacks
+the image must see the same filesystem that is seen during build time.
+
+That may seem a bit obvious, but it has important implications for Merge and
+Diff, which are ops that are designed to re-use container layers from their
+inputs as much as possible in order to maximize cache re-use and efficiency.
+Many of the more surprising aspects of the behavior discussed in the rest of
+this doc are a result of needing to ensure that Merge+Diff results look the
+same before and after export as container layers.
### Deletions -When either 1) an LLB state deletes a file present in its parent chain or 2) `upper` lacks a path that is present in `lower` while using DiffOp, that deletion is considered an "entity" in the same way that a directory or file is and can have an effect when using that state as a merge input. For example: + +When either 1) an LLB state deletes a file present in its parent chain or 2) +`upper` lacks a path that is present in `lower` while using DiffOp, that +deletion is considered an "entity" in the same way that a directory or file is +and can have an effect when using that state as a merge input. For example: + ```go // create a state that only has /foo foo := llb.Scratch().File(llb.Mkfile("/foo", 0644, nil)) @@ -362,9 +584,16 @@ bar := rmFoo.File(llb.Mkfile("/bar", 0644, nil)) merged := llb.Merge([]llb.State{foo, bar}) ``` -You might assume that `merged` would consist of the files `/foo` and `/bar`, but it will actually just consist of `/bar`. This is because the state `bar` also includes a deletion of the file `/foo` in its chain and thus a part of its definition. -One way of understanding this is that when you merge `foo` and `bar`, you are actually merging the diffs making up each state in the chain that created `foo` and `bar`, i.e.: +You might assume that `merged` would consist of the files `/foo` and `/bar`, +but it will actually just consist of `/bar`. This is because the state `bar` +also includes a deletion of the file `/foo` in its chain and thus a part of its +definition. 
+
+One way of understanding this is that when you merge `foo` and `bar`, you are
+actually merging the diffs making up each state in the chain that created `foo`
+and `bar`, i.e.:
+
```go
llb.Merge([]llb.State{foo, bar}) == llb.Merge([]llb.State{
  // foo's chain (only 1 layer)
@@ -375,33 +604,66 @@ llb.Merge([]llb.State{foo, bar}) == llb.Merge([]llb.State{
  llb.Diff(rmFoo, bar), // create /bar
})
```
-As you can see, `Diff(foo, rmFoo)` is included there and its only "content" is a deletion of `/foo`. Therefore, when `merged` is being constructed, it will apply that deletion and `/foo` will not exist in the final `merged` result.
-Also note that if the order of the merge was reversed to be `Merge([]State{bar, foo})`, then `/foo` will actually exist in `merged` alongside `/bar` because then the contents of `foo` take precedent over the contents of `bar`, and then create of `/foo` therefore "overwrites" the previous deletion of it.
+As you can see, `Diff(foo, rmFoo)` is included there and its only "content" is
+a deletion of `/foo`. Therefore, when `merged` is being constructed, it will
+apply that deletion and `/foo` will not exist in the final `merged` result.
+
+Also note that if the order of the merge was reversed to be `Merge([]State{bar,
+foo})`, then `/foo` will actually exist in `merged` alongside `/bar` because
+then the contents of `foo` take precedence over the contents of `bar`, and the
+creation of `/foo` therefore "overwrites" the previous deletion of it.

-One final detail to note is that even though deletions are entities in the same way files/dirs are, they do not show up when mounted.
+One final detail to note is that even though deletions are entities in the same
+way files/dirs are, they do not show up when mounted.
For example, if you were +to mount `llb.Diff(foo, rmFoo)` during a build, you would just see an empty +directory. Deletions only have an impact when used as an input to MergeOp. #### Workarounds -For use cases that are experiencing this behavior and do not want it, the best option is to find a way to avoid including the problematic deletion in your build definition. This can be very use-case specific, but using the previous example one option might be this: + +For use cases that are experiencing this behavior and do not want it, the best +option is to find a way to avoid including the problematic deletion in your +build definition. This can be very use-case specific, but using the previous +example one option might be this: + ```go justBar := llb.Diff(rmFoo, bar) merged := llb.Merge([]llb.State{foo, justBar}) ``` -Now, `merged` consists of both `/foo` and `/bar` because `justBar` has "diffed out" its parent `rmFoo` and consists only of the final layer that creates `/bar`. Other use cases may require different approaches like changing build commands to avoid unneeded deletions of files and directories. -For use cases that can't avoid the deletion for whatever reason, the fallback option is to use a Copy op to squash the merge input and discard any deletions. So, building off the previous example: +Now, `merged` consists of both `/foo` and `/bar` because `justBar` has "diffed +out" its parent `rmFoo` and consists only of the final layer that creates +`/bar`. Other use cases may require different approaches like changing build +commands to avoid unneeded deletions of files and directories. + +For use cases that can't avoid the deletion for whatever reason, the fallback +option is to use a Copy op to squash the merge input and discard any deletions. 
+
+So, building off the previous example:
+
```go
squashedBar := llb.Scratch().File(llb.Copy(bar, "/", "/"))
merged := llb.Merge([]llb.State{foo, squashedBar})
```
-This results in `merged` consisting of both `/foo` and `/bar`. This is because `squashedBar` is a single layer that only consists of the file+directories that existed in `bar`, not any of its deletions.
-Note that there are currently performance tradeoffs to this copy approach in that it will actually result in a copy on disk (i.e. no hardlink optimizations), the copy will not be lazy and `squashedBar` will be a distinct layer from its inputs as far as the Buildkit cache and any remote registries are concerned, which may or may not matter depending on the use-case.
+This results in `merged` consisting of both `/foo` and `/bar`. This is because
+`squashedBar` is a single layer that only consists of the files and directories
+that existed in `bar`, not any of its deletions.
+
+Note that there are currently performance tradeoffs to this copy approach in
+that it will actually result in a copy on disk (i.e. no hardlink
+optimizations), the copy will not be lazy, and `squashedBar` will be a distinct
+layer from its inputs as far as the Buildkit cache and any remote registries
+are concerned, which may or may not matter depending on the use-case.

### Diff Corner Cases
-There are some cases where it's ambiguous what the right behavior should be when merging diffs together. As stated before, Merge+Diff resolve these ambiguities by following the same behavior as container image import/export implementations in order to maintain consistency.
+
+There are some cases where it's ambiguous what the right behavior should be
+when merging diffs together. As stated before, Merge+Diff resolve these
+ambiguities by following the same behavior as container image import/export
+implementations in order to maintain consistency.
One example: + ```go dir := llb.Scratch().File(llb.Mkdir("/dir", 0755)) dirFoo := dir.File(llb.Mkfile("/dir/foo", 0755, nil)) @@ -415,4 +677,8 @@ otherdir := llb.Scratch().File(llb.Mkdir("/otherdir", 0755)) merged := llb.Merge([]llb.State{otherdir, rmFoo}) ``` -In this case, you start with just `/otherdir` and apply `rmFoo`, which is a deletion of `/dir/foo`. But `/dir/foo` doesn't exist, so it may be reasonable to expect that it just has no effect. However, image import/export code will actually create `/dir` even though it only exists in order to hold an inapplicable delete. As a result, Merge+Diff also have this same behavior. +In this case, you start with just `/otherdir` and apply `rmFoo`, which is a +deletion of `/dir/foo`. But `/dir/foo` doesn't exist, so it may be reasonable +to expect that it just has no effect. However, image import/export code will +actually create `/dir` even though it only exists in order to hold an +inapplicable delete. As a result, Merge+Diff also have this same behavior. diff --git a/docs/dev/solver.md b/docs/dev/solver.md index 45b81c5cb078..db8b9e146d13 100644 --- a/docs/dev/solver.md +++ b/docs/dev/solver.md @@ -1,16 +1,37 @@ -## Buildkit solver design - -The solver is a component in BuildKit responsible for parsing the build definition and scheduling the operations to the workers for execution. - -Solver package is heavily optimized for deduplication of work, concurrent requests, remote and local caching and different per-vertex caching modes. It also allows operations and frontends to call back to itself with new definition that they have generated. - -The implementation of the solver is quite complicated, mostly because it is supposed to be performant with snapshot-based storage layer and distribution model using layer tarballs. 
It is expected that calculating the content based checksum of snapshots between every operation or after every command execution is too slow for common use cases and needs to be postponed to when it is likely to have a meaningful impact. Ideally, the user shouldn't realize that these optimizations are taking place and just get intuitive caching. It is also hoped that if some implementations can provide better cache capabilities, the solver would take advantage of that without requiring significant modification.
-In addition to avoiding content checksum scanning the implementation is also designed to make decisions with minimum available data. For example, for remote caching sources to be effective the solver will not require the cache to be loaded or exists for all the vertexes in the graph but will only load it for the final node that is determined to match cache. As another example, if one of the inputs (for example image) can produce a definition based cache match for a vertex, and another (for example local source files) can only produce a content-based(slower) cache match, the solver is designed to detect it and skip content-based check for the first input(that would cause a pull to happen).
-### Build definition
-The solver takes in a build definition in the form of a content addressable operation definition that forms a graph.
+# Buildkit solver design
+
+The solver is a component in BuildKit responsible for parsing the build
+definition and scheduling the operations to the workers for execution.
+
+The solver package is heavily optimized for deduplication of work, concurrent
+requests, remote and local caching, and different per-vertex caching modes. It
+also allows operations and frontends to call back into the solver with new
+definitions that they have generated.
+
+The implementation of the solver is quite complicated, mostly because it is
+supposed to be performant with a snapshot-based storage layer and a
+distribution model using layer tarballs.
It is expected that calculating the content-based
+checksum of snapshots between every operation or after every command execution
+is too slow for common use cases and needs to be postponed to when it is likely
+to have a meaningful impact. Ideally, the user shouldn't realize that these
+optimizations are taking place and just get intuitive caching. It is also hoped
+that if some implementations can provide better cache capabilities, the solver
+would take advantage of that without requiring significant modification.
+
+In addition to avoiding content checksum scanning, the implementation is also
+designed to make decisions with the minimum available data. For example, for
+remote caching sources to be effective, the solver will not require the cache
+to be loaded or exist for all the vertexes in the graph, but will only load it
+for the final node that is determined to match the cache. As another example,
+if one of the inputs (for example an image) can produce a definition-based
+cache match for a vertex, and another (for example local source files) can only
+produce a content-based (slower) cache match, the solver is designed to detect
+it and skip the content-based check for the first input (that would cause a
+pull to happen).
+
+## Build definition
+
+The solver takes in a build definition in the form of a content-addressable
+operation definition that forms a graph.

A vertex in this graph is defined by these properties:

@@ -31,20 +52,49 @@ type Edge struct {
type Index int
```
-Every vertex has a content-addressable digest that represents a checksum of the definition graph up to that vertex including all of its inputs. If two vertexes have the same checksum, they are considered identical when they are executing concurrently. That means that if two other vertexes request a vertex with the same digest as an input, they will wait for the same operation to finish.
-The vertex digest can only be used for comparison while the solver is running and not between different invocations.
For example, if parallel builds require using `docker.io/library/alpine:latest` image as one of the operations, it is pulled only once. But if a build using `docker.io/library/alpine:latest` was built earlier, the checksum based on that name can't be used for finding if the vertex was already built because the image might have changed in the registry and "latest" tag might be pointing to another image. - -`Sys()` method returns an object that is used to resolve the executor for the operation. This is how a definition can pass logic to the worker that will execute the task associated with the vertex, without the solver needing to know anything about the implementation. When the solver needs to execute a vertex, it will send this object to a worker, so the worker needs to be configured to understand the object returned by `Sys()`. The solver itself doesn't care how the operations are implemented and therefore doesn't define a type for this value. In LLB solver this value would be with type `llb.Op`. - -`Inputs()` returns an array of other vertexes the current vertex depends on. A vertex may have zero inputs. After an operation has executed, it returns an array of return references. If another operation wants to depend on any of these references they would define an input with that vertex and an index of the reference from the return array(starting from zero). Inputs need to be contained in the `Digest()` of the vertex - two vertexes with different inputs should never have the same digest. - -Options contain extra information that can be associated with the vertex but what doesn't change the definition(or equality check) of it. Normally this is either a hint to the solver, for example, to ignore cache when executing. It can also be used for associating messages with the vertex that can be helpful for tracing purposes. - - -### Operation interface - -Operation interface is how the solver can evaluate the properties of the actual vertex operation. 
These methods run on the worker, and their implementation is determined by the value of `vertex.Sys()`. The solver is configured with a "resolve" function that can convert a `vertex.Sys()` into an `Op`.
+Every vertex has a content-addressable digest that represents a checksum of the
+definition graph up to that vertex including all of its inputs. If two vertexes
+have the same checksum, they are considered identical when they are executing
+concurrently. That means that if two other vertexes request a vertex with the
+same digest as an input, they will wait for the same operation to finish.
+
+The vertex digest can only be used for comparison while the solver is running
+and not between different invocations. For example, if parallel builds require
+using the `docker.io/library/alpine:latest` image as one of the operations, it
+is pulled only once. But if a build using `docker.io/library/alpine:latest` was
+built earlier, the checksum based on that name can't be used for finding if the
+vertex was already built, because the image might have changed in the registry
+and the "latest" tag might be pointing to another image.
+
+The `Sys()` method returns an object that is used to resolve the executor for
+the operation. This is how a definition can pass logic to the worker that will
+execute the task associated with the vertex, without the solver needing to know
+anything about the implementation. When the solver needs to execute a vertex,
+it will send this object to a worker, so the worker needs to be configured to
+understand the object returned by `Sys()`. The solver itself doesn't care how
+the operations are implemented and therefore doesn't define a type for this
+value. In the LLB solver, this value has type `llb.Op`.
+
+`Inputs()` returns an array of other vertexes the current vertex depends on. A
+vertex may have zero inputs. After an operation has executed, it returns an
+array of return references.
If another operation wants to depend on any of
+these references, they would define an input with that vertex and an index of
+the reference from the return array (starting from zero). Inputs need to be
+contained in the `Digest()` of the vertex - two vertexes with different inputs
+should never have the same digest.
+
+Options contain extra information that can be associated with the vertex but
+that doesn't change the definition (or equality check) of it. Normally this is
+a hint to the solver, for example, to ignore the cache when executing. It can
+also be used for associating messages with the vertex that can be helpful for
+tracing purposes.
+
+## Operation interface
+
+The operation interface is how the solver can evaluate the properties of the
+actual vertex operation. These methods run on the worker, and their
+implementation is determined by the value of `vertex.Sys()`. The solver is
+configured with a "resolve" function that can convert a `vertex.Sys()` into an
+`Op`.

```go
// Op is an implementation for running a vertex
@@ -81,59 +131,155 @@ type Result interface {
}
```
-There are two functions that every operation defines. One describes how to calculate a cache key for a vertex and another how to execute it.
-`CacheMap` is a description for calculating the cache key. It contains a digest that is combined with the cache keys of the inputs to determine the stable checksum that can be used to cache the operation result. For the vertexes that don't have inputs(roots), it is important that this digest is a stable secure checksum. For example, in LLB this digest is a manifest digest for container images or a commit SHA for git sources.
-`CacheMap` may also define optional selectors or content-based cache functions for its inputs. A selector is combined with the input cache key and useful for describing when different parts of an input are being used, and inputs cache key needs to be customized.
Content-based cache function allows computing a new cache key for an input after it has completed. In LLB this is used for calculating cache key based on the checksum of file contents of the input snapshots. - -`Exec` executes the operation defined by a vertex by passing in the results of the inputs. - - -### Shared graph - -After new build request is sent to the solver, it first loads all the vertexes to the shared graph structure. For status tracking, a job instance needs to be created, and vertexes are loaded through jobs. A job ID is assigned to every vertex. If vertex with the same digest has already been loaded to the shared graph, a new job ID is appended to the existing record. When the job finishes, it removes all of its references from the loaded vertex. The resources are released if no more references remain. - -Loading a vertex also creates a progress writer associated with it and sets up the cache sources associated with the specific vertex. - -After vertexes have been loaded to the job, it is safe to request a result from an edge pointing to a previously loaded vertex. To do this `build(ctx, Edge) (CachedResult, error)` method is called on the static scheduler instance associated with the solver. - -### Scheduler - -The scheduler is a component responsible for invoking the individual operations needed to find the result for the graph. While the build definition is defined with vertexes, the scheduler is solving edges. In the case of LLB solver, a result of a solved edge is associated with a snapshot. Usually, to solve an edge, the input edges need to be solved first and this can be done concurrently, but there are many exceptions like edge may be cached but its input might be not, or solving one input might cause a cache hit while solving others would just be wasteful. Scheduler tries do handle all these cases. - -The scheduler is implemented as a single threaded non-blocking event loop. 
The single threaded constraint is for simplicity and might be removed in the future - currently, it is not known if this would have any performance impact. All the events in the scheduler have one fixed sender and receiver. The interface for interacting with the scheduler is to create a "pipe" between a sender and a receiver. One or both sides of the pipe may be an edge instance of the graph. If a pipe is added it to the scheduler and an edge receives an event from the pipe, the scheduler will "unpark" that edge so it can process all the events it had received. - -The unpark handler for an edge needs to be non-blocking and execute quickly. The edge will process the data from the incoming events and update its internal state. When calling unpark, the scheduler has already separated out the sender and receiver sides of the pipes that in the code are referred as incoming and outgoing requests. The incoming requests are usually requests to retrieve a result or a cache key from an edge. If it appears that an edge doesn't have enough internal state to satisfy the requests, it can make new pipes and register them with the scheduler. These new pipes are generally of two types: ones asking for some async function to be completed and others that request an input edge to reach a specific state first. - -To avoid bugs and deadlocks in this logic, the unpark method needs to follow the following rules. If unpark has finished without completing all incoming requests it needs to create outgoing requests. Similarly, if an incoming request remains pending, at least one outgoing request needs to exist as well. Failing to comply with this rule will cause the scheduler to panic as a precaution to avoid leaks and hiding errors. - -### Edge state - -During unpark, edge state is incremented until it can fulfill the incoming requests. - -An edge can be in the following states: initial, cache-fast, cache-slow, completed. 
Completed edge contains a reference to the final result, in-progress edge may have zero or more cache keys. - -The initial state is the starting state for any edge. If a state has reached a cache-fast state, it means that all the definition based cache key lookups have been performed. Cache-slow means that content-based cache lookup has been performed as well. If possible, the scheduler will avoid looking up the slow keys of inputs if they are unnecessary for solving current edge. - -The unpark method is split into four phases. The first phase processes all incoming events (responses from outgoing requests or new incoming requests) that caused the unpark to be called. These contain responses from async functions like calls to get the cachemap, execution result or content-based checksum for an input, or responses from input edges when their state or number of cache keys has changed. All the results are stored in edge's internal state. For the new cache keys, a query is performed to determine if any of them can create potential matches to the current edge. - -After that, if any of the updates caused changes to edge's properties, a new state is calculated for the current vertex. In this step, all potential cache keys from inputs can cause new cache keys for the edge to be created and the status of an edge might be updated. - -Third, the edge will go over all of its incoming requests, to determine if the current internal state is sufficient for satisfying them all. There are a couple of possibilities how this check may end up. If all requests can be completed and there are no outgoing requests the requests finish and unpark method returns. If there are outgoing requests but the edge has reached the completed state or all incoming requests have been canceled, the outgoing requests are canceled. This is an async operation as well and will cause unpark to be called again after completion. 
If this condition didn't apply but requests could be completed and there are outgoing requests, then the incoming request is answered but not completed. The receiver can then decide to cancel this request if needed. If no new data has appeared to answer the incoming requests, the desired state for an edge is determined for an edge from the incoming requests, and we continue to the next step. - -The fourth step sets up outgoing requests based on the desired state determined in the third step. If the current state requires calling any async functions to move forward then it is done here. We will also loop through all the inputs to determine if it is important to raise their desired state. Depending on what inputs can produce content based cache keys and what inputs have already returned possible cache matches, the desired state for inputs may be raised at different times. - -When an edge needs to resolve an operation to call the async `CacheMap` and `Exec` methods, it does so by calling back to the shared graph. This makes sure that two different edges pointing to the same vertex do not execute twice. The result values for the operation that is shared by the edges is also cached until the vertex is cleaned up. Progress reporting is also handled and forwarded to the job through this shared vertex instance. - -Edge state is cleaned up when a final job that loaded the vertexes that they are connected to is discarded. - - -### Cache providers - -Cache providers determine if there is a result that matches the cache keys generated during the build that could be reused instead of fully reevaluating the vertex and its inputs. There can be multiple cache providers, and specific providers can be defined per vertex using the vertex options. - -There are multiple backend implementations for cache providers, in-memory one used in unit tests, the default local one using bbolt and one based on cache manifests in a remote registry. 
+There are two functions that every operation defines. One describes how to
+calculate a cache key for a vertex and another how to execute it.
+
+`CacheMap` is a description for calculating the cache key. It contains a digest
+that is combined with the cache keys of the inputs to determine the stable
+checksum that can be used to cache the operation result. For the vertexes that
+don't have inputs (roots), it is important that this digest is a stable, secure
+checksum. For example, in LLB this digest is a manifest digest for container
+images or a commit SHA for git sources.
+
+`CacheMap` may also define optional selectors or content-based cache functions
+for its inputs. A selector is combined with the input cache key and is useful
+for describing cases where different parts of an input are being used and the
+input's cache key needs to be customized. A content-based cache function allows
+computing a new cache key for an input after it has completed. In LLB this is
+used for calculating a cache key based on the checksum of file contents of the
+input snapshots.
+
+`Exec` executes the operation defined by a vertex by passing in the results of
+the inputs.
+
+## Shared graph
+
+After a new build request is sent to the solver, it first loads all the
+vertexes into the shared graph structure. For status tracking, a job instance
+needs to be created, and vertexes are loaded through jobs. A job ID is assigned
+to every vertex. If a vertex with the same digest has already been loaded to
+the shared graph, a new job ID is appended to the existing record. When the job
+finishes, it removes all of its references from the loaded vertex. The
+resources are released if no more references remain.
+
+Loading a vertex also creates a progress writer associated with it and sets up
+the cache sources associated with the specific vertex.
+
+After vertexes have been loaded to the job, it is safe to request a result from
+an edge pointing to a previously loaded vertex. 
To do this, the `build(ctx, Edge)
+(CachedResult, error)` method is called on the static scheduler instance
+associated with the solver.
+
+## Scheduler
+
+The scheduler is a component responsible for invoking the individual operations
+needed to find the result for the graph. While the build definition is defined
+with vertexes, the scheduler solves edges. In the case of the LLB solver, the
+result of a solved edge is associated with a snapshot. Usually, to solve an
+edge, the input edges need to be solved first, and this can be done
+concurrently, but there are many exceptions: an edge may be cached even though
+its inputs are not, or solving one input might cause a cache hit while solving
+the others would just be wasteful. The scheduler tries to handle all these
+cases.
+
+The scheduler is implemented as a single-threaded, non-blocking event loop. The
+single-threaded constraint is for simplicity and might be removed in the future -
+currently, it is not known if this would have any performance impact. All the
+events in the scheduler have one fixed sender and receiver. The interface for
+interacting with the scheduler is to create a "pipe" between a sender and a
+receiver. One or both sides of the pipe may be an edge instance of the graph.
+If a pipe is added to the scheduler and an edge receives an event from the
+pipe, the scheduler will "unpark" that edge so it can process all the events it
+has received.
+
+The unpark handler for an edge needs to be non-blocking and execute quickly.
+The edge will process the data from the incoming events and update its internal
+state. When calling unpark, the scheduler has already separated out the sender
+and receiver sides of the pipes, which in the code are referred to as incoming
+and outgoing requests. The incoming requests are usually requests to retrieve a
+result or a cache key from an edge. 
If it appears that an edge doesn't have
+enough internal state to satisfy the requests, it can make new pipes and
+register them with the scheduler. These new pipes are generally of two types:
+ones asking for some async function to be completed and others that request an
+input edge to reach a specific state first.
+
+To avoid bugs and deadlocks in this logic, the unpark method needs to follow
+two rules. If unpark has finished without completing all incoming requests, it
+needs to create outgoing requests. Similarly, if an incoming request remains
+pending, at least one outgoing request needs to exist as well. Failing to
+comply with these rules will cause the scheduler to panic as a precaution to
+avoid leaks and hidden errors.
+
+## Edge state
+
+During unpark, the edge state is incremented until it can fulfill the incoming
+requests.
+
+An edge can be in the following states: initial, cache-fast, cache-slow,
+completed. A completed edge contains a reference to the final result, while an
+in-progress edge may have zero or more cache keys.
+
+The initial state is the starting state for any edge. If an edge has reached
+the cache-fast state, it means that all the definition-based cache key lookups
+have been performed. Cache-slow means that content-based cache lookup has been
+performed as well. If possible, the scheduler will avoid looking up the slow
+keys of inputs if they are unnecessary for solving the current edge.
+
+The unpark method is split into four phases. The first phase processes all
+incoming events (responses from outgoing requests or new incoming requests)
+that caused the unpark to be called. These contain responses from async
+functions like calls to get the cachemap, execution result or content-based
+checksum for an input, or responses from input edges when their state or number
+of cache keys has changed. All the results are stored in the edge's internal
+state. 
+For the new cache keys, a query is performed to determine if any of them can
+create potential matches to the current edge.
+
+After that, if any of the updates caused changes to the edge's properties, a
+new state is calculated for the current vertex. In this step, all potential
+cache keys from inputs can cause new cache keys for the edge to be created and
+the status of an edge might be updated.
+
+Third, the edge will go over all of its incoming requests to determine if the
+current internal state is sufficient for satisfying them all. There are a
+couple of possibilities for how this check may end up. If all requests can be
+completed and there are no outgoing requests, the requests finish and the
+unpark method returns. If there are outgoing requests but the edge has reached
+the completed state or all incoming requests have been canceled, the outgoing
+requests are canceled. This is an async operation as well and will cause unpark
+to be called again after completion. If this condition doesn't apply but
+requests can be completed and there are outgoing requests, then the incoming
+request is answered but not completed. The receiver can then decide to cancel
+this request if needed. If no new data has appeared to answer the incoming
+requests, the desired state for the edge is determined from the incoming
+requests, and we continue to the next step.
+
+The fourth step sets up outgoing requests based on the desired state determined
+in the third step. If the current state requires calling any async functions to
+move forward, then it is done here. We will also loop through all the inputs to
+determine if it is important to raise their desired state. Depending on which
+inputs can produce content-based cache keys and which inputs have already
+returned possible cache matches, the desired state for inputs may be raised at
+different times. 
+
+When an edge needs to resolve an operation to call the async `CacheMap` and
+`Exec` methods, it does so by calling back to the shared graph. This makes sure
+that two different edges pointing to the same vertex do not execute twice. The
+result values for the operation shared by the edges are also cached until the
+vertex is cleaned up. Progress reporting is also handled and forwarded to the
+job through this shared vertex instance.
+
+Edge state is cleaned up when the final job that loaded the vertexes it is
+connected to is discarded.
+
+## Cache providers
+
+Cache providers determine if there is a result that matches the cache keys
+generated during the build that could be reused instead of fully reevaluating
+the vertex and its inputs. There can be multiple cache providers, and specific
+providers can be defined per vertex using the vertex options.
+
+There are multiple backend implementations for cache providers: an in-memory
+one used in unit tests, the default local one using bbolt, and one based on
+cache manifests in a remote registry.

Simplified cache provider has following methods:

@@ -144,18 +290,36 @@ Load(ctx context.Context, rec *CacheRecord) (Result, error)
 Save(key *CacheKey, s Result) (*ExportableCacheKey, error)
 ```
 
-Query method is used to determine if there exist a possible cache link between the input and a vertex. It takes parameters provided by `op.CacheMap` and cache keys returned by the calling the same method on its inputs.
+The Query method is used to determine if there exists a possible cache link
+between the input and a vertex. It takes parameters provided by `op.CacheMap`
+and cache keys returned by calling the same method on its inputs.
 
-If a cache key has been found, the matching records can be asked for them. A cache key can have zero or more records. Having a record means that a cached result can be loaded for a specific vertex. 
The solver supports partial cache chains, meaning that not all inputs need to have a cache record to match cache for a vertex.
+If a cache key has been found, the matching records can be requested for it. A
+cache key can have zero or more records. Having a record means that a cached
+result can be loaded for a specific vertex. The solver supports partial cache
+chains, meaning that not all inputs need to have a cache record to match cache
+for a vertex.
 
-Load method is used to load a specific record into a result reference. This value is the same type as the one returned by the `op.Exec` method.
+The Load method is used to load a specific record into a result reference. This
+value is the same type as the one returned by the `op.Exec` method.
 
-Save allows adding more records to the cache.
+The Save method allows adding more records to the cache.
 
-### Merging edges
+## Merging edges
 
-One final piece of solver logic allows merging two edges into one when they have both returned the same cache key. In practice, this appears for example when a build uses image references `alpine:latest` and `alpine@sha256:abcabc` in its definition and they actually point to the same image. Another case where this appears is when same source files from different sources are being used as part of the build.
+One final piece of solver logic allows merging two edges into one when they
+have both returned the same cache key. In practice, this appears for example
+when a build uses image references `alpine:latest` and `alpine@sha256:abcabc`
+in its definition and they actually point to the same image. Another case where
+this appears is when the same source files from different sources are being
+used as part of the build.
 
-After scheduler has called `unpark()` on an edge it checks it the method added any new cache keys to its state. If it did it will check its internal index if another active edge already exists with the same cache key. 
If it does it performs some basic validation, for example checking that the new edge has not explicitly asked cache to be ignored, and if it passes, merges the states of two edges.
+After the scheduler has called `unpark()` on an edge, it checks whether the
+method added any new cache keys to its state. If it did, it checks its internal
+index for another active edge that already exists with the same cache key. If
+one does, it performs some basic validation, for example checking that the new
+edge has not explicitly asked for the cache to be ignored, and if this passes,
+it merges the states of the two edges.
 
-In the result of the merge, the edge that was checked is deleted, its ongoing requests are canceled and the incoming ones are added to the original edge.
\ No newline at end of file
+As a result of the merge, the edge that was checked is deleted, its ongoing
+requests are canceled, and the incoming ones are added to the original edge.
From d119adb274eb50d97df4f8abb6a6e7d8bbb04282 Mon Sep 17 00:00:00 2001
From: Justin Chadwell
Date: Mon, 8 Aug 2022 15:55:26 +0100
Subject: [PATCH 3/7] docs: add dev doc on dockerfile->llb conversion

Co-authored-by: Edgar Lee
Co-authored-by: coryb
Signed-off-by: Justin Chadwell
---
 docs/dev/dockerfile-llb.md | 208 +++++++++++++++++++++++++++++++++++++
 1 file changed, 208 insertions(+)
 create mode 100644 docs/dev/dockerfile-llb.md

diff --git a/docs/dev/dockerfile-llb.md b/docs/dev/dockerfile-llb.md
new file mode 100644
index 000000000000..00ba3890837b
--- /dev/null
+++ b/docs/dev/dockerfile-llb.md
@@ -0,0 +1,208 @@
+# Dockerfile conversion to LLB
+
+If you want to understand how Buildkit translates Dockerfile instructions into
+LLB, or you want to write your own frontend, then seeing how a Dockerfile maps
+to the Buildkit LLB package will give you a jump start.
+
+The `llb` package from Buildkit provides a chainable state object to help
+construct an LLB. 
Then you can marshal the state object into a definition using +protocol buffers, and send it off in a solve request over gRPC. + +In code, these transformations are performed by the [`Dockerfile2LLB()`](../../frontend/dockerfile/dockerfile2llb/convert.go) +function, which takes a raw `Dockerfile`'s contents and converts it to an LLB +state, and associated image config, which are then both assembled in the +[`Build()`](../../frontend/dockerfile/builder/build.go) function. + +## Basic examples + +Here are a few Dockerfile instructions you should be familiar with: + +- Base image + + ```dockerfile + FROM golang:1.12 + ``` + + ```golang + st := llb.Image("golang:1.12") + ``` + +- Scratch image + + ```dockerfile + FROM scratch + ``` + + ```golang + st := llb.Scratch() + ``` + +- Environment variables + + ```dockerfile + ENV DEBIAN_FRONTEND=noninteractive + ``` + + ```golang + st = st.AddEnv("DEBIAN_FRONTEND", "noninteractive") + ``` + +- Running programs + + ```dockerfile + RUN echo hello + ``` + + ```golang + st = st.Run( + llb.Shlex("echo hello"), + ).Root() + ``` + +- Working directory + + ```dockerfile + WORKDIR /path + ``` + + ```golang + st = st.Dir("/path") + ``` + +## File operations + +This is where LLB starts to deviate from Dockerfile in features. In +Dockerfiles, the run command is completely opaque to the builder and just +executes the command. 
But in LLB, there are file operations that have better
+caching semantics and understanding of the command:
+
+- Copying files
+
+  ```dockerfile
+  COPY --from=builder /files/* /files
+  ```
+
+  ```golang
+  var CopyOptions = &llb.CopyInfo{
+    FollowSymlinks: true,
+    CopyDirContentsOnly: true,
+    AttemptUnpack: false,
+    CreateDestPath: true,
+    AllowWildcard: true,
+    AllowEmptyWildcard: true,
+  }
+  st = st.File(
+    llb.Copy(builder, "/files/*", "/files", CopyOptions),
+  )
+  ```
+
+- Adding files
+
+  ```dockerfile
+  ADD --from=builder /files.tgz /files
+  ```
+
+  ```golang
+  var AddOptions = &llb.CopyInfo{
+    FollowSymlinks: true,
+    CopyDirContentsOnly: true,
+    AttemptUnpack: true,
+    CreateDestPath: true,
+    AllowWildcard: true,
+    AllowEmptyWildcard: true,
+  }
+  st = st.File(
+    llb.Copy(builder, "/files.tgz", "/files", AddOptions),
+  )
+  ```
+
+- Chaining file commands
+
+  ```dockerfile
+  # not possible without RUN in Dockerfile
+  RUN mkdir -p /some && echo hello > /some/file
+  ```
+
+  ```golang
+  st = st.File(
+    llb.Mkdir("/some", 0755),
+  ).File(
+    llb.Mkfile("/some/file", 0644, []byte("hello")),
+  )
+  ```
+
+## Bind mounts
+
+Bind mounts allow unidirectional syncing of the host's local file system into
+the build environment.
+
+Bind mounts in Buildkit should not be confused with bind mounts in the Linux
+kernel - they do not sync bidirectionally. Bind mounts are only a snapshot of
+your local state, which is specified through the `llb.Local` state object:
+
+- Using bind mounts
+
+  ```dockerfile
+  WORKDIR /builder
+  RUN --mount=type=bind,target=/builder \
+    PIP_INDEX_URL=https://my-proxy.com/pypi \
+    pip install . 
+  ```
+
+  ```golang
+  localState := llb.Local(
+    "context",
+    llb.SessionID(client.BuildOpts().SessionID),
+    llb.WithCustomName("loading ."),
+    llb.FollowPaths([]string{"."}),
+  )
+
+  st = st.Dir("/builder").Run(
+    llb.Shlex("pip install ."),
+    llb.AddEnv(
+      "PIP_INDEX_URL",
+      "https://my-proxy.com/pypi",
+    ),
+    llb.AddMount("/builder", localState),
+  ).Root()
+  ```
+
+## Cache mounts
+
+Cache mounts allow for a shared file cache location between build invocations,
+which allows manually caching expensive operations, such as package downloads.
+Mounts have options to persist between builds with different sharing modes.
+
+- Using cache mounts
+
+  ```dockerfile
+  RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
+    --mount=type=cache,target=/var/lib/apt \
+    apt-get update
+  ```
+
+  ```golang
+  var VarCacheAptMount = llb.AddMount(
+    "/var/cache/apt",
+    llb.Scratch(),
+    llb.AsPersistentCacheDir(
+      "some-cache-id",
+      llb.CacheMountLocked,
+    ),
+  )
+
+  var VarLibAptMount = llb.AddMount(
+    "/var/lib/apt",
+    llb.Scratch(),
+    llb.AsPersistentCacheDir(
+      "another-cache-id",
+      llb.CacheMountShared,
+    ),
+  )
+
+  st = st.Run(
+    llb.Shlex("apt-get update"),
+    VarCacheAptMount,
+    VarLibAptMount,
+  ).Root()
+  ```
From df55fa5df8f41df6f3d9619738767fbff8684dea Mon Sep 17 00:00:00 2001
From: Justin Chadwell
Date: Mon, 8 Aug 2022 15:55:55 +0100
Subject: [PATCH 4/7] docs: add dev doc on the solve request lifecycle

Co-authored-by: Edgar Lee
Co-authored-by: coryb
Signed-off-by: Justin Chadwell
---
 docs/dev/request-lifecycle.md | 251 ++++++++++++++++++++++++++++++++++
 1 file changed, 251 insertions(+)
 create mode 100644 docs/dev/request-lifecycle.md

diff --git a/docs/dev/request-lifecycle.md b/docs/dev/request-lifecycle.md
new file mode 100644
index 000000000000..1b0a23e584de
--- /dev/null
+++ b/docs/dev/request-lifecycle.md
@@ -0,0 +1,251 @@
+# Solve Request Lifecycle
+
+Buildkit solves build graphs to find the final result. 
By default, nothing will
+be exported to the client, but requests can be made after solving the graph to
+export results to external destinations (like the client’s filesystem).
+
+A solve request goes through the following:
+
+1. Client makes a solve request and sends it to buildkitd over gRPC. The
+   request may either include an LLB definition or the name of a frontend
+   (must be `dockerfile.v0` or `gateway.v0`), but it must not include both.
+2. Buildkitd receives the solve request with the Controller. The controller is
+   registered as the ControlServer gRPC service.
+3. The controller passes it down to the LLB solver, which will create a job for
+   this request. It will also create a FrontendLLBBridge, which provides a
+   solving interface over the job object.
+4. The request is processed:
+   - If the request is definition-based, it will simply build the definition.
+   - If the request is frontend-based, it will run the frontend over the
+     gateway while passing it a reference to the FrontendLLBBridge. Frontends
+     must return a result for the solve request, but they may also issue solve
+     requests themselves to the bridge.
+5. The results are plumbed back to the client, and the temporary job and bridge
+   are discarded.
+
+```plantuml
+@startuml
+ControlClient -> ControlServer : Solve
+ControlServer -> Solver : Solve
+
+Solver -> Job : Create job
+activate Job
+
+Solver -> FrontendLLBBridge : Create bridge over Job
+activate FrontendLLBBridge
+
+
+Solver -> FrontendLLBBridge : Solve
+
+alt definition-based solve
+    FrontendLLBBridge -> Job : Build
+    activate Job #FFBBBB
+    Job --> FrontendLLBBridge : Result
+    deactivate Job
+else frontend-based solve
+    FrontendLLBBridge -> Frontend : Solve
+    activate Frontend #FFBBBB
+    note over FrontendLLBBridge, Frontend : Frontend must be either \ndockerfile.v0 or gateway.v0. 
+ + loop + Frontend -[#SeaGreen]> FrontendLLBBridge : Solve + FrontendLLBBridge -[#SeaGreen]> Job : Build + activate Job #SeaGreen + note right FrontendLLBBridge : Implementations may also call\nFrontendLLBBridge to solve graphs\nbefore returning the result. + Job -[#SeaGreen]-> FrontendLLBBridge : Result + deactivate Job + FrontendLLBBridge -[#SeaGreen]-> Frontend : Result + end + + Frontend --> FrontendLLBBridge : Result + deactivate Frontend +end + +FrontendLLBBridge --> Solver : Result +Solver -> FrontendLLBBridge : Discard +deactivate FrontendLLBBridge + +Solver -> Job : Discard +deactivate Job + +Solver --> ControlServer : Result +ControlServer --> ControlClient : Result +@enduml +``` + +> Diagram from + +An important detail is that frontends may also issue solve requests, which are +often definition-based solves, but can also be frontend-based solves, allowing +for composability of frontends. Note that if a frontend makes a frontend-based +solve request, they will share the same FrontendLLBBridge and underlying job. + +## Dockerfile frontend (`dockerfile.v0`) + +Buildkit comes with a Dockerfile frontend which essentially is a parser that +translates Dockerfile instructions into a LLB definition. In order to introduce +new features into the Dockerfile DSL without breaking backwards compatibility, +Dockerfiles can include a syntax directive at the top of the file to indicate a +frontend image to use. + +For example, users can include a syntax directive to use +`docker/dockerfile:1-labs` to opt-in for an extended Dockerfile DSL that +takes advantage of Buildkit features. However, the frontend image doesn’t have +to be Dockerfile-specific. One can write a frontend that reads a YAML file, and +using the syntax directive, issue the build request using `docker build -f +my-config.yaml`. + +The lifecycle of a `dockerfile.v0` frontend-based solve request goes through +the following: + +1. 
Starting from the "frontend-based solve" path, the bridge looks up the + Dockerfile frontend if the frontend key is `dockerfile.v0`, and requests a + solve to the frontend. The gateway forwarder implements the frontend + interface and wraps over a BuildFunc that builds Dockerfiles. +2. The BuildFunc issues a solve request to read the Dockerfile from a source + (local context, git, or HTTP), and parses it to find a syntax directive. + - If a syntax directive is found, it delegates the solve to the `gateway.v0` + frontend. + - If a syntax directive is not found, then it parses the Dockerfile + instructions and builds an LLB. The LLB is marshaled into a definition and + sent in a solve request. + +```plantuml +@startuml +participant Job +participant FrontendLLBBridge + +box "Dockerfile frontend" + participant "Gateway Forwarder" as Frontend + participant BuildFunc +end box + +[-> FrontendLLBBridge : Solve +FrontendLLBBridge -> Frontend : Solve + +Frontend -> BuildFunc : Call +activate BuildFunc + +BuildFunc -[#SeaGreen]> FrontendLLBBridge : Solve +FrontendLLBBridge -[#SeaGreen]> Job : Build +activate Job #SeaGreen +note over Frontend : Solve to read Dockerfile from\nlocal context, git, or HTTP. +Job -[#SeaGreen]-> FrontendLLBBridge : Result +deactivate Job +FrontendLLBBridge -[#SeaGreen]-> BuildFunc : Result + +alt Dockerfile has syntax directive + BuildFunc -> FrontendLLBBridge : Solve + activate FrontendLLBBridge #FFBBBB + note over Frontend : Dockerfile delegates solve to\ngateway.v0 frontend. + FrontendLLBBridge --> BuildFunc : Result + deactivate FrontendLLBBridge +else Dockerfile has no syntax directive + BuildFunc -> FrontendLLBBridge : Solve + FrontendLLBBridge -> Job : Build + activate Job #FFBBBB + note over Frontend : Solve graph generated by\nDockerfile2LLB. 
+    Job --> FrontendLLBBridge : Result
+    deactivate Job
+    FrontendLLBBridge --> BuildFunc : Result
+end
+
+BuildFunc --> Frontend : Return
+deactivate BuildFunc
+
+Frontend --> FrontendLLBBridge : Result
+FrontendLLBBridge -->[ : Result
+@enduml
+```
+
+> Diagram from
+
+## Gateway frontend (`gateway.v0`)
+
+The gateway frontend allows external frontends to be implemented as container
+images, allowing for a pluggable architecture. The container images have access
+to the gRPC service through stdin/stdout. The easiest way to implement a
+frontend image is to create a golang binary that vendors buildkit, because it
+provides convenient LLB builders and utilities.
+
+The lifecycle of a `gateway.v0` frontend-based solve request goes through the
+following:
+
+1. Starting from the "frontend-based solve" path, the bridge looks up the
+   Gateway frontend if the frontend key is `gateway.v0`, and requests a solve
+   to the frontend.
+2. The gateway frontend resolves a frontend image from the `source` key
+   and solves the request to retrieve the rootfs for the image.
+3. A temporary gRPC server is created that forwards requests to the LLB bridge.
+4. A container using the frontend image rootfs is created, and a gRPC
+   connection is established from a process inside the container to the
+   temporary bridge forwarder.
+5. The frontend image is then able to build LLBs and send solve requests
+   through the forwarder.
+6. The container exits, and then the results are plumbed back to the LLB
+   bridge, which plumbs them back to the client. 
+ +```plantuml +@startuml +participant Job +participant FrontendLLBBridge +participant "Gateway frontend" as Frontend +participant Worker +participant LLBBridgeForwarder +participant "Executor" as Executor +participant "Frontend Container" as Container + +[-> FrontendLLBBridge : Solve +FrontendLLBBridge -> Frontend : Solve +Frontend -> Worker : ResolveImageConfig +activate Worker #FFBBBB +Worker --> Frontend : Digest +deactivate Worker +Frontend -[#SeaGreen]> FrontendLLBBridge : Solve + +FrontendLLBBridge -[#SeaGreen]> Job : Build +activate Job #SeaGreen +note right of FrontendLLBBridge : The frontend image specified\nby build option "source" is solved\nand the rootfs of that image\nis then used to run the container. +Job -[#SeaGreen]-> FrontendLLBBridge : Result +deactivate Job + +FrontendLLBBridge -[#SeaGreen]-> Frontend : Result + +note over LLBBridgeForwarder : A temporary gRPC server is created\n that listens on the stdio of the\nfrontend container. The requests are\nthen forwarded to LLB bridge. +Frontend -> LLBBridgeForwarder : Create forwarder +activate LLBBridgeForwarder + +Frontend -[#MediumSlateBlue]> FrontendLLBBridge : Exec +FrontendLLBBridge -[#MediumSlateBlue]> Worker : Exec +Worker -[#MediumSlateBlue]> Executor : Exec + +Executor -[#MediumSlateBlue]> Container : Create container task +activate Container #MediumSlateBlue + +group container-based solve + note left of Container : Frontend images may request\ndefinition/frontend-based solves\nlike any other client. 
+ loop + Container -> LLBBridgeForwarder : Solve + LLBBridgeForwarder -> FrontendLLBBridge : Solve + activate FrontendLLBBridge #FFBBBB + FrontendLLBBridge --> LLBBridgeForwarder : Result + deactivate FrontendLLBBridge + LLBBridgeForwarder --> Container : Result + end +end + +Container -[#MediumSlateBlue]-> Executor : Exit +deactivate Container + +Executor -[#MediumSlateBlue]-> Worker : Exit +Worker -[#MediumSlateBlue]-> FrontendLLBBridge : Exit +FrontendLLBBridge -[#MediumSlateBlue]-> Frontend : Exit +Frontend -> LLBBridgeForwarder : Discard +deactivate LLBBridgeForwarder + +Frontend --> FrontendLLBBridge : Result +FrontendLLBBridge -->[ : Result +@enduml +``` + +> Diagram from From a6727811584207541ab8e266588c1e5a2e2bc2ff Mon Sep 17 00:00:00 2001 From: Justin Chadwell Date: Mon, 8 Aug 2022 15:56:11 +0100 Subject: [PATCH 5/7] docs: add dev doc README as index page Signed-off-by: Justin Chadwell --- docs/dev/README.md | 48 ++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 48 insertions(+) create mode 100644 docs/dev/README.md diff --git a/docs/dev/README.md b/docs/dev/README.md new file mode 100644 index 000000000000..f144842c0219 --- /dev/null +++ b/docs/dev/README.md @@ -0,0 +1,48 @@ +# BuildKit Developer Docs + +These are the BuildKit developer docs, designed to be read by technical users +interested in contributing to or integrating with BuildKit. + +> **Warning** +> +> While these docs attempt to keep up with the current state of our `master` +> development branch, the code is constantly changing and updating, as bugs are +> fixed, and features are added. Remember, the ultimate source of truth is +> always the code base. + +## Jargon + +The following terms are often used throughout the codebase and the developer +documentation to describe different components and processes in the image build +process. 
+
+| Name | Description |
+| :--- | :---------- |
+| **LLB** | LLB stands for low-level build definition, which is a binary intermediate format used for defining the dependency graph of processes running as part of your build. |
+| **Definition** | Definition is the LLB serialized using protocol buffers. This is the protobuf type that is transported over the gRPC interfaces. |
+| **Frontend** | Frontends are builders of LLB and may issue requests to Buildkit’s gRPC server like solving graphs. Currently only `dockerfile.v0` and `gateway.v0` are implemented, but the gateway frontend allows running container images that function as frontends. |
+| **State** | State is a helper object to build LLBs from higher-level concepts like images, shell executions, mounts, etc. Frontends use the state API in order to build LLBs and marshal them into the definition. |
+| **Solver** | Solver is an abstract interface to solve a graph of vertices and edges to find the final result. An LLB solver is a solver that understands that vertices are implemented by container-based operations, and that edges map to container-snapshot results. |
+| **Vertex** | Vertex is a node in a build graph. It defines an interface for a content-addressable operation and its inputs. |
+| **Op** | Op defines how the solver can evaluate the properties of a vertex operation. An op is retrieved from a vertex and executed in the worker. For example, there are op implementations for image sources, git sources, exec processes, etc. |
+| **Edge** | Edge is a connection point between vertices. An edge references a specific output of a vertex’s operation. Edges are used as inputs to other vertices. |
+| **Result** | Result is an abstract interface for the return value of a solve. In LLB, the result is a generic interface over a container snapshot. |
+| **Worker** | Worker is a backend that can run OCI images. Currently, Buildkit can run with workers using either runc or containerd. 
| + +## Table of Contents + +The developer documentation is split across various files. + +For an overview of the process of building images: + +- [Request lifecycle](./request-lifecycle.md) - observe how incoming requests + are solved to produce a final artifact. +- [Dockerfile to LLB](./dockerfile-llb.md) - understand how `Dockerfile` + instructions are converted to the LLB format. +- [Solver](./solver.md) - understand how LLB is evaluated by the solver to + produce the solve graph. + +We also have a number of more specific guides: + +- [MergeOp and DiffOp](./merge-diff.md) - learn how MergeOp and DiffOp are + implemented, and how to program with them in LLB. From 2f2faaff31b5413378838d83152860dfa8444d8f Mon Sep 17 00:00:00 2001 From: Justin Chadwell Date: Wed, 10 Aug 2022 10:37:29 +0100 Subject: [PATCH 6/7] docs: reword request lifecycle diagrams to mermaid Signed-off-by: Justin Chadwell --- docs/dev/request-lifecycle.md | 289 +++++++++++++++++----------------- 1 file changed, 142 insertions(+), 147 deletions(-) diff --git a/docs/dev/request-lifecycle.md b/docs/dev/request-lifecycle.md index 1b0a23e584de..92ded05c874e 100644 --- a/docs/dev/request-lifecycle.md +++ b/docs/dev/request-lifecycle.md @@ -23,58 +23,55 @@ A solve request goes through the following: 5. The results are plumbed back to the client, and the temporary job and bridge are discarded. 
-```plantuml
-@startuml
-ControlClient -> ControlServer : Solve
-ControlServer -> Solver : Solve
+
+```mermaid
+sequenceDiagram
+    ControlClient ->> ControlServer : Solve
+    ControlServer ->> Solver : Solve
 
-Solver -> Job : Create job
-activate Job
+    Solver ->> Job : Create job
+    activate Job
 
-Solver -> FrontendLLBBridge : Create bridge over Job
-activate FrontendLLBBridge
+    Solver ->> FrontendLLBBridge : Create bridge over Job
+    activate FrontendLLBBridge
+    Solver ->> FrontendLLBBridge : Solve
 
-Solver -> FrontendLLBBridge : Solve
-
-alt definition-based solve
-  FrontendLLBBridge -> Job : Build
-  activate Job #FFBBBB
-  Job --> FrontendLLBBridge : Result
-  deactivate Job
-else frontend-based solve
-  FrontendLLBBridge -> Frontend : Solve
-  activate Frontend #FFBBBB
-  note over FrontendLLBBridge, Frontend : Frontend must be either \ndockerfile.v0 or gateway.v0.
-
-  loop
-    Frontend -[#SeaGreen]> FrontendLLBBridge : Solve
-    FrontendLLBBridge -[#SeaGreen]> Job : Build
-    activate Job #SeaGreen
-    note right FrontendLLBBridge : Implementations may also call\nFrontendLLBBridge to solve graphs\nbefore returning the result.
-    Job -[#SeaGreen]-> FrontendLLBBridge : Result
+    alt definition-based solve
+        FrontendLLBBridge ->> Job : Build
+        activate Job
+        Job -->> FrontendLLBBridge : Result
         deactivate Job
-    FrontendLLBBridge -[#SeaGreen]-> Frontend : Result
+    else frontend-based solve
+        FrontendLLBBridge ->> Frontend : Solve
+        activate Frontend
+        note over FrontendLLBBridge, Frontend : Frontend must be either<br>dockerfile.v0 or gateway.v0.
+
+        loop
+            Frontend ->> FrontendLLBBridge : Solve
+            FrontendLLBBridge ->> Job : Build
+            activate Job
+            note over FrontendLLBBridge, Frontend : Implementations may also<br>call FrontendLLBBridge to<br>solve graphs before<br>returning the result.
+            Job -->> FrontendLLBBridge : Result
+            deactivate Job
+            FrontendLLBBridge -->> Frontend : Result
+        end
+
+        Frontend -->> FrontendLLBBridge : Result
+        deactivate Frontend
     end
 
-  Frontend --> FrontendLLBBridge : Result
-  deactivate Frontend
-end
-
-FrontendLLBBridge --> Solver : Result
-Solver -> FrontendLLBBridge : Discard
-deactivate FrontendLLBBridge
+    FrontendLLBBridge -->> Solver : Result
+    Solver ->> FrontendLLBBridge : Discard
+    deactivate FrontendLLBBridge
 
-Solver -> Job : Discard
-deactivate Job
+    Solver ->> Job : Discard
+    deactivate Job
 
-Solver --> ControlServer : Result
-ControlServer --> ControlClient : Result
-@enduml
+    Solver -->> ControlServer : Result
+    ControlServer -->> ControlClient : Result
 ```
 
-> Diagram from
-
 An important detail is that frontends may also issue solve requests, which
 are often definition-based solves, but can also be frontend-based solves,
 allowing for composability of frontends. Note that if a frontend makes a
 frontend-based
@@ -110,56 +107,56 @@ the following:
   instructions and builds an LLB. The LLB is marshaled into a definition and
   sent in a solve request.
 
-```plantuml
-@startuml
-participant Job
-participant FrontendLLBBridge
+
+```mermaid
+sequenceDiagram
+    participant Job
+    participant FrontendLLBBridge
 
-box "Dockerfile frontend"
-  participant "Gateway Forwarder" as Frontend
+    # FIXME: use boxes with https://github.com/mermaid-js/mermaid/issues/1505
+    # box "Dockerfile frontend"
+    participant Frontend as Gateway Forwarder
     participant BuildFunc
-end box
-
-[-> FrontendLLBBridge : Solve
-FrontendLLBBridge -> Frontend : Solve
-
-Frontend -> BuildFunc : Call
-activate BuildFunc
-
-BuildFunc -[#SeaGreen]> FrontendLLBBridge : Solve
-FrontendLLBBridge -[#SeaGreen]> Job : Build
-activate Job #SeaGreen
-note over Frontend : Solve to read Dockerfile from\nlocal context, git, or HTTP.
-Job -[#SeaGreen]-> FrontendLLBBridge : Result
-deactivate Job
-FrontendLLBBridge -[#SeaGreen]-> BuildFunc : Result
-
-alt Dockerfile has syntax directive
-  BuildFunc -> FrontendLLBBridge : Solve
-  activate FrontendLLBBridge #FFBBBB
-  note over Frontend : Dockerfile delegates solve to\ngateway.v0 frontend.
-  FrontendLLBBridge --> BuildFunc : Result
-  deactivate FrontendLLBBridge
-else Dockerfile has no syntax directive
-  BuildFunc -> FrontendLLBBridge : Solve
-  FrontendLLBBridge -> Job : Build
-  activate Job #FFBBBB
-  note over Frontend : Solve graph generated by\nDockerfile2LLB.
-  Job --> FrontendLLBBridge : Result
+    # end box
+
+    # FIXME: use incoming messages with https://github.com/mermaid-js/mermaid/issues/1357
+    Job ->> FrontendLLBBridge : Solve
+    FrontendLLBBridge ->> Frontend : Solve
+
+    Frontend ->> BuildFunc : Call
+    activate BuildFunc
+
+    BuildFunc ->> FrontendLLBBridge : Solve
+    FrontendLLBBridge ->> Job : Build
+    activate Job
+    note over Frontend : Solve to read<br>Dockerfile
+    Job -->> FrontendLLBBridge : Result
     deactivate Job
-  FrontendLLBBridge --> BuildFunc : Result
-end
+    FrontendLLBBridge -->> BuildFunc : Result
+
+    alt Dockerfile has syntax directive
+        BuildFunc ->> FrontendLLBBridge : Solve
+        activate FrontendLLBBridge #FFBBBB
+        note over Frontend : Dockerfile delegates<br>to gateway.v0
+        FrontendLLBBridge -->> BuildFunc : Result
+        deactivate FrontendLLBBridge
+    else Dockerfile has no syntax directive
+        BuildFunc ->> FrontendLLBBridge : Solve
+        FrontendLLBBridge ->> Job : Build
+        activate Job
+        note over Frontend : Solved by<br>Dockerfile2LLB
+        Job -->> FrontendLLBBridge : Result
+        deactivate Job
+        FrontendLLBBridge -->> BuildFunc : Result
+    end
 
-BuildFunc --> Frontend : Return
-deactivate BuildFunc
+    BuildFunc -->> Frontend : Return
+    deactivate BuildFunc
 
-Frontend --> FrontendLLBBridge : Result
-FrontendLLBBridge -->[ : Result
-@enduml
+    Frontend -->> FrontendLLBBridge : Result
+    FrontendLLBBridge -->> Job : Result
 ```
 
-> Diagram from
-
 ## Gateway frontend (`gateway.v0`)
 
 The gateway frontend allows external frontends to be implemented as container
@@ -185,67 +182,65 @@ following:
 6. The container exits, and then the results are plumbed back to the LLB
    bridge, which plumbs them back to the client.
 
-```plantuml
-@startuml
-participant Job
-participant FrontendLLBBridge
-participant "Gateway frontend" as Frontend
-participant Worker
-participant LLBBridgeForwarder
-participant "Executor" as Executor
-participant "Frontend Container" as Container
-
-[-> FrontendLLBBridge : Solve
-FrontendLLBBridge -> Frontend : Solve
-Frontend -> Worker : ResolveImageConfig
-activate Worker #FFBBBB
-Worker --> Frontend : Digest
-deactivate Worker
-Frontend -[#SeaGreen]> FrontendLLBBridge : Solve
-
-FrontendLLBBridge -[#SeaGreen]> Job : Build
-activate Job #SeaGreen
-note right of FrontendLLBBridge : The frontend image specified\nby build option "source" is solved\nand the rootfs of that image\nis then used to run the container.
-Job -[#SeaGreen]-> FrontendLLBBridge : Result
-deactivate Job
-
-FrontendLLBBridge -[#SeaGreen]-> Frontend : Result
-
-note over LLBBridgeForwarder : A temporary gRPC server is created\n that listens on the stdio of the\nfrontend container. The requests are\nthen forwarded to LLB bridge.
-Frontend -> LLBBridgeForwarder : Create forwarder
-activate LLBBridgeForwarder
-
-Frontend -[#MediumSlateBlue]> FrontendLLBBridge : Exec
-FrontendLLBBridge -[#MediumSlateBlue]> Worker : Exec
-Worker -[#MediumSlateBlue]> Executor : Exec
-
-Executor -[#MediumSlateBlue]> Container : Create container task
-activate Container #MediumSlateBlue
-
-group container-based solve
-  note left of Container : Frontend images may request\ndefinition/frontend-based solves\nlike any other client.
-  loop
-    Container -> LLBBridgeForwarder : Solve
-    LLBBridgeForwarder -> FrontendLLBBridge : Solve
-    activate FrontendLLBBridge #FFBBBB
-    FrontendLLBBridge --> LLBBridgeForwarder : Result
-    deactivate FrontendLLBBridge
-    LLBBridgeForwarder --> Container : Result
-  end
-end
-
-Container -[#MediumSlateBlue]-> Executor : Exit
-deactivate Container
-
-Executor -[#MediumSlateBlue]-> Worker : Exit
-Worker -[#MediumSlateBlue]-> FrontendLLBBridge : Exit
-FrontendLLBBridge -[#MediumSlateBlue]-> Frontend : Exit
-Frontend -> LLBBridgeForwarder : Discard
-deactivate LLBBridgeForwarder
-
-Frontend --> FrontendLLBBridge : Result
-FrontendLLBBridge -->[ : Result
-@enduml
+
+```mermaid
+sequenceDiagram
+    participant Job
+    participant FrontendLLBBridge
+    participant Frontend as Gateway frontend
+    participant Worker
+    participant LLBBridgeForwarder
+    participant Executor
+    participant Container as Frontend Container
+
+    Job ->> FrontendLLBBridge : Solve
+    FrontendLLBBridge ->> Frontend : Solve
+    Frontend ->> Worker : ResolveImageConfig
+    activate Worker
+    Worker -->> Frontend : Digest
+    deactivate Worker
+    Frontend ->> FrontendLLBBridge : Solve
+
+    FrontendLLBBridge ->> Job : Build
+    activate Job
+    note over FrontendLLBBridge, Frontend : The frontend image specified<br>by build option "source" is solved<br>and the rootfs of that image<br>is then used to run the container.
+    Job -->> FrontendLLBBridge : Result
+    deactivate Job
+
+    FrontendLLBBridge -->> Frontend : Result
+
+    note over LLBBridgeForwarder, Executor : A temporary gRPC server is created<br>that listens on stdio of frontend<br>container. Requests are then<br>forwarded to LLB bridge.
+    Frontend ->> LLBBridgeForwarder : Create forwarder
+    activate LLBBridgeForwarder
+
+    Frontend ->> FrontendLLBBridge : Exec
+    FrontendLLBBridge ->> Worker : Exec
+    Worker ->> Executor : Exec
+
+    Executor ->> Container : Create container task
+    activate Container #MediumSlateBlue
+
+    rect rgba(100, 100, 100, .1)
+        note over Executor, Container : Frontend images may request<br>definition/frontend-based solves<br>like any other client.
+        loop
+            Container ->> LLBBridgeForwarder : Solve
+            LLBBridgeForwarder ->> FrontendLLBBridge : Solve
+            activate FrontendLLBBridge #FFBBBB
+            FrontendLLBBridge -->> LLBBridgeForwarder : Result
+            deactivate FrontendLLBBridge
+            LLBBridgeForwarder -->> Container : Result
+        end
+    end
+
+    Container -->> Executor : Exit
+    deactivate Container
+
+    Executor -->> Worker : Exit
+    Worker -->> FrontendLLBBridge : Exit
+    FrontendLLBBridge -->> Frontend : Exit
+    Frontend ->> LLBBridgeForwarder : Discard
+    deactivate LLBBridgeForwarder
+
+    Frontend -->> FrontendLLBBridge : Result
+    FrontendLLBBridge -->> Job : Result
 ```
 
-> Diagram from

From 07357d69d7a6e841def10a2bbf516003ab93c8bc Mon Sep 17 00:00:00 2001
From: Justin Chadwell
Date: Tue, 16 Aug 2022 12:48:26 +0100
Subject: [PATCH 7/7] docs: add bind mount output info to dockerfile->llb docs

Signed-off-by: Justin Chadwell
---
 docs/dev/dockerfile-llb.md | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/docs/dev/dockerfile-llb.md b/docs/dev/dockerfile-llb.md
index 00ba3890837b..1df0c853fc3d 100644
--- a/docs/dev/dockerfile-llb.md
+++ b/docs/dev/dockerfile-llb.md
@@ -157,14 +157,18 @@ your local state, which is specified through the `llb.Local` state object:
 	llb.FollowPaths([]string{"."}),
 )
 
-	st = st.Dir("/builder").Run(
+	execState := st.Dir("/builder").Run(
 		llb.Shlex("pip install ."),
 		llb.AddEnv(
 			"PIP_INDEX_URL",
 			"https://my-proxy.com/pypi",
 		),
-		llb.AddMount("/builder", localState)
-	).Root()
+	)
+	_ = execState.AddMount("/builder", localState)
+	// the return value of AddMount captures the resulting state of the mount
+	// after the exec operation has completed
+
+	st = execState.Root()
 
 ## Cache mounts
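The final hunk hinges on the LLB client's immutable-state style: `Run` returns an exec state whose `Root()` and `AddMount(...)` each expose a distinct result state, so capturing `AddMount`'s return value lets later steps consume the mount's contents rather than the rootfs. As a hedged sketch of that shape — minimal, stdlib-only, with hypothetical `State`/`ExecState` types standing in for the real `llb` package:

```go
package main

import "fmt"

// State is a hypothetical stand-in for llb.State: an immutable description
// of a filesystem snapshot. Methods return new values rather than mutating
// the receiver, mirroring the client's chaining style.
type State struct {
	desc string
}

// ExecState stands in for llb.ExecState: the outcome of a Run step, exposing
// the root filesystem and each mount as separate States.
type ExecState struct {
	root   State
	mounts map[string]State
}

// Run returns an ExecState describing a command executed on top of s.
func (s State) Run(cmd string) ExecState {
	return ExecState{
		root:   State{desc: s.desc + " + run(" + cmd + ")"},
		mounts: map[string]State{},
	}
}

// AddMount registers a mount and returns the State representing the mount
// point's contents after the exec finishes -- the value the patched example
// captures with `_ =`.
func (e ExecState) AddMount(target string, src State) State {
	mounted := State{desc: src.desc + " mounted at " + target}
	e.mounts[target] = mounted
	return mounted
}

// Root returns the State of the root filesystem after the exec.
func (e ExecState) Root() State {
	return e.root
}

func main() {
	local := State{desc: "local(.)"}
	base := State{desc: "image(python:3)"}

	exec := base.Run("pip install .")
	builder := exec.AddMount("/builder", local) // state of /builder after the run
	st := exec.Root()                           // state of the rootfs after the run

	fmt.Println(builder.desc) // local(.) mounted at /builder
	fmt.Println(st.desc)      // image(python:3) + run(pip install .)
}
```

The design choice this illustrates: because the mount's result state is distinct from the root's, discarding `AddMount`'s return value (as the pre-patch example implicitly did) loses the only handle on what the command wrote under the mount point.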