-
Notifications
You must be signed in to change notification settings - Fork 195
CCv0: Discussion options for removing the CC branch and running confidential-containers out of kata-containers main
#4441
Comments
Hi @stevenhorsman , just brainstorming here...maybe we don't need to duplicate the CI jobs. Keep the current Kata jobs building and testing the "normal" Kata Containers; then add just one new job for CC specific tests. Suppose that the new job configuration is called
The makefile target |
Sorry, I'm probably missing something obvious, but how do we test all ~12 os/cri/hypervisor combinations that Kata supports in this way if we only add 1 new job config? |
It was just an example. The way that Kata CI is designed, one need to created entries on |
Update: on the Architecture Committee meeting today (15th Feb) we brought up the issue of getting confidential containers code merged into |
Wainer has also suggested that go plugins might be able to help as an alternative to build tags |
I think the single release option is no longer being considered, so I've removed it from the description for clarity, but I'll paste it here for reference: Single releaseDescription
Pros:
Cons:
Common issues:Other
Questions
|
Thanks for raising this @stevenhorsman. A few thoughts and questions. Note: these are not directed at you, but at the entire community... Assumptions
ifdef build solutionsIf any sort of Containerd forkIt would be useful if we could summarise:
Avoid confusionHow is all this going to be documented? It's seems odd to document CC functionality in Kata repos, but if that's where the code lives, we will have to atleast create a set of links back to the actual docs in an appropriate CC repo. Unless handled carefully, it's going to be rather unclear to users and contributors why there are two separate GitHub organisations imho. Merge periodicityIs this a "one off" merge or is there a possibility of further CC code syncs into Kata? If there is a possibility of this happening (and I can imagine there is), we need to consider some sort of merge window rules now. For example, we don't want to land a lot of CC changes in Kata close to the date of new Kata releases (even if the CC code isn't being actively used by Kata). Maybe something as simple as this would suffice?:
We could also consider something like this:
/cc @kata-containers/architecture-committee for further input. |
Thanks so much for the feedback @jodh-intel - I really appreciate the feedback and this is something that we all have to do together and be happy with if we are going to be successful. In the Kata CC meeting, I took an action to break this issue down into separate issues for different components, but I think it is worth having the discussion here first as it might get some of the higher-level topics dealt with first before we worry about the specific code merges. I definitely don't have all the answers and as you've said it's a discussion for the whole community, but I'll give my initial reaction to some of the questions raised.
I agree. I think we'd prefer to minimise the '#ifdef'-ing of code (apart from in the containerd section which we can't currently avoid). This would mean that we'd just have a single version of the agent which would pull into the extra crates required for image offload. I hope that the footprint of these won't be too large, but it's something we'll want to work with some performance specialists to check before we merged anything.
Yep - if we can restrict the
I'm not an expert on all the details, so @wllenyj will be able to give more details, but here is my basic understanding:
I think the simple answer here is that we need to modify the containerd code to enable us to pass the pull_image request through the shim and to the agent rather than being run on the host, in order to enable image offload.
I think the approach that the Ali team have taken is to create a new image service and pull image command in the shim and implemented a CC CRI service that handles the routing of the commands. I'm sure this is probably not completely accurate. For reference the full changes can be seen at: confidential-containers/containerd@main...confidential-containers:CC-main
My current thinking is that we keep the CC fork of containerd tightly aligned with the version that 'vanilla Kata' is using, so when Kata updates we will rebase the CC fork on that version. That minimises the code differences in Kata and tries to ensure we don't get left too far behind.
That's the $64,000 question. We initially hoped that containerd would have accepted our changes, or at least ideas by now and avoid this problem, but from the discussions that Samuel et al. have had with Derek and co. it seems like they have a grander to re-architect containerd to provide the ability to offload most of the containerd services when it starts. This is way beyond the scope of what we need, but means that they are not going to merge our fork and their new architecture won't be ready in the short term (my impression is we are looking at closer to 12 months than 1, or 2, but I'm definitely not 'in the know'). That's how long we are might have to maintain the fork, which we are pretty much committed to for CC, as we don't have a solution without it, but it's up to the wider Kata community to decide if they are comfortable with this.
At the moment, some of the limited doc we have is in the kata
This depends on the Kata Architecture committee and wider community, but the way I would like to see it is that we'd have a few smaller scoped PRs to get the existing As I said - this is my own thoughts and probably not representative of the CC community and people who are much smarter and more experienced in this than me. Maybe @sameo @fitzthum @magowan, or @ariel-adam can elaborate further and correct me? |
Thanks @stevenhorsman. A few more questions: containerd forkfwics the ideal would be for upstream containerd to allow us to configure it to:
Although these features wouldn't generally be useful to most, they don't look that contentious. Is the problem then that containerd don't want us to spend time on a containerd feature that would (have to) be dropped when their new architecture lands? Is there a middle ground I wonder where we could create a new containerd plugin for the features we'd like? It looks like containerd already supports these but maybe the scope of that interface might need to be expanded to support the features we need? Short-term alternatives?Random thought: containerd is but one container manager. If we are going to have to wait for the upstream grand redesign work to be finished, to avoid having to maintain a fork of containerd ourselves (and all the costs and complications that brings with it), could we use an alternative such as podman or even Docker (once we have a viable shimv2 "wrapper") as a stop-gap solution? Or are there other specific containerd features we need for Kata+CC? |
@stevenhorsman, I'd explicitly add the need of having a |
@jodh-intel For them, there is also the problem of having to maintain a specific feature they can't really test, and that also conflicts with their changes in progress. |
+1 to @fidencio's idea of having separate If we end up having to use the "ifdef CC" build approach, we'll also need a similar set of checks for the CI ( MetricsAs discussed in the AC meeting earlier today, we can "suck it and see" by raising a PR and just reviewing what the metrics CI thinks of it. If we have a single image that includes the CC elements, we may need to adjust the thresholds but if we are talking about separate images, we might need a separate set of CC metrics thresholds. /cc @GabyCT who might have thoughts on this. |
@jodh-intel currently for CC there is not metrics CI, in case that you are testing with CC elements you will need a new baremetal to test and enable that CI as probably the thresholds will be different |
Whilst the conversation is on-going I've started the process of breaking down this into separate issues for different areas that we might need to separate and merge. This a still WIP and likely to change before we can run with them:
|
Hi @stevenhorsman,
Am I wrong assuming that kata-containers/kata-containers#3971 can be the first task we should focus on? IIUC merging the CC agent/guest into main doesn't break the non-CC use case of Kata, although CC specific features (e.g. pull image inside the guest) won't be working until the runtime and containerd are changed. The current chain of dependencies seems to be: confidential-containers/guest-components#5 -> (release image-rs x.y so we stop pulling from main?) -> kata-containers/kata-containers#3970 -> kata-containers/kata-containers#3971 -> #4626 -> #4627 -> #4628
|
@wainersm - Firstly, thanks for taking a look at the issues and giving feedback - it's really appreciated. I also have a confession that once I read your flow I realised that I should have split out merging the runtime and integration tests into separate issues for better clarity, so have both kata-containers/kata-containers#3996 and #4628 now. Sorry if that makes you comments make less sense
Sort of. I agree on the dependencies: confidential-containers/image-rs#5 -> (release image-rs x.y so we stop pulling from main?) -> kata-containers/kata-containers#3970 -> kata-containers/kata-containers#3971 -> kata-containers/kata-containers#3996 -> #4628 (also dependant on #4627) However my thinking (which might be flawed) is that we can make progress on #4626, then #4627 before the agent and runtime changes are merged. If we did this then they would be testing regression scenarios from 'vanilla' Kata with the forked containerd installed, rather than new 'confidential-containers' integration tests, but I think there is still value in this. Also have #4626 might be useful for us if we want to try out only adding the attesation-agent to the guest image if we are creating the 'CC' build (which might be required depending on how much it bloats the image.) I hope that splitting the flow into two slightly parallel streams helps speed up our progress, but there might well be something I've failed to take into account? |
@stevenhorsman I think your comments on #4441 (comment) made the plan very clear to me. IMO that's detailed enough to allow us kick off some changes on |
Linking this issue with kata-deploy: Provide a kata-deploy image to be used with the Confidential Containers operator |
@wainersm - I was thinking of taking #4627 as it's mostly just a port of the code I wrote in CCv0, so would simplify the DCO sign-off. I think #4628 is obviously a lot larger and will require us to sync up depending on when it is unblocked, but I'm (or someone on my team) happy to be involved in that and #4626 if there is anything we can help with! |
I just reassigned #4627 to you as it makes complete sense. |
So just an update to the merge of |
Thanks for the details update @stevenhorsman ! I wasn't able to join that meeting but I think the decision the community took was the best one. |
So to rehydrate this discussion 7 months on with a few thing changing I think we have the following high-level items to resolve/split up and merge before we can completely be free of the
I'm sure there are some other items, so keep me honest and I'll try and update this list. As we get more uniform agreement we can covert them into issues. |
Hi @stevenhorsman , so minor changes that came to my mind while in the meeting today:
|
Thanks, I'll update the list above to try and reflect this |
@stevenhorsman is there an issue in the kata repo where the work to migrate from the forked containerd to upstream containerd is being tracked? |
Hey Suraj, not that I'm aware of. I can start one tomorrow if that's helpful, or anyone else can for that matter and I'll link to it in the above list. |
@stevenhorsman kata-containers/kata-containers#6067 created an issue. I don't have a lot of context on what needs to be done there other than just switching from the forked version to the latest beta. But I have created the tracking issue. Also that issue needs to have the label |
For the confidential containers project our end goal was always to merge code into the
main
branches of kata-containers and tests, but whilst we were doing PoCs in CCv0 we decided the simplest approach was to work on a separate branch. As we entered the 'CCv1' roadmap we thought that that fact we were using a containerd fork blocked this, but as the burden of managing two development branches is continuing we want to give it some attention and work out what the options are and what is viable.Here is the updated plan from my perspective, but it might well be wrong, or incomplete, so other input is welcome:
Our favourite suggestion so far is to have all the content for 'normal' kata and 'CC' kata coming from the same
main
branch, but provide a build config to create different binaries for the different versions.Description
tests/.ci/install_kata_image.sh
Lines 52 to 54 in d70430e
version.yaml
, similar to https://github.com/kata-containers/kata-containers/blob/46522a3e46fcb44ce8bb22c35da31f7ff86329fd/versions.yaml#L187-L198 which points to the fork and use the logic fromtests/.ci/install_cri_containerd.sh
Lines 53 to 68 in d70430e
containerd-cc
CCv0
changes into manageable chunks, ensuring that they have the appropriate level of testing required and creating PRs to go intomain
(carefully ensuring we keep the DCO history and don't break theCCv0
branch).tests
repo, and/or the 'guest'/agent side onceimage-rs
has been added without the containerd runtime changes. At this point it won't be much use (though we can add agent component tests using agent-ctl maybe?), but will close the gap at least and get us starting the conversation with the wider kata communityThe text was updated successfully, but these errors were encountered: