attestation-agent-config: generate attestation-agent config when createVM instance #1868

Merged (6 commits) on Jun 26, 2024

@huoqifeng commented Jun 17, 2024

Fixes: #1852

  • Generate the attestation-agent TOML file aa.toml when aaKBCParams is provided

  • Use the cfg file to start the attestation agent service when it exists

  • Start the attestation agent service directly when no cfg file exists

  • Remove aa_kbc_params from agent-config so that CDH won't read it (this is a tricky issue in CDH)

  • Rename package agent to aa to reflect that it is for the attestation agent config

  • The certs field will be added to the AA config in "enable kbs cert for cdh" #1875, after we have tried it out

    Signed-off-by: Qi Feng Huo [email protected]

@huoqifeng (author)

@mkulke @stevenhorsman @liudalibj @bpradipt This is still a draft; I need some time to verify the code. The idea is to generate agent-config.toml similarly to cdh.toml, as described in #1852.

@mkulke (collaborator) left a comment

Looks good in general. However, we were discussing in Slack that we probably don't need to template the agent-config file anymore once we have a config file for AA: all fields except AAKBCParams (which we wouldn't need anymore) are static and can be set at build time. What do you think?

src/cloud-api-adaptor/pkg/adaptor/cloud/cloud.go (outdated, resolved)
huoqifeng force-pushed the agent-config branch 3 times, most recently from ae922b5 to 321b2a1 (June 18, 2024 01:51)
@huoqifeng (author)

@mkulke AAKBCParams is a dynamic endpoint that might change per cluster, which means we would need a separate PeerPod VM image per cluster if we bake it into the PeerPod VM image. Shall we make AAKBCParams configurable in the configmap?

@mkulke (collaborator) commented Jun 18, 2024

AAKBCParams is already a parameter in the configmap (and a CLI parameter for the CAA binary), so we don't need to do anything here.

My reasoning was:

  • If we have a CDH and an AA config file, we no longer need AAKBCParams in the agent config, because it's not used.
  • If we don't have AAKBCParams in the agent config file, the file is completely static and the same in every deployment.
  • If the file is static, we don't need a cloud-config directive for it but can include it at build time.

@huoqifeng (author)

@mkulke That's reasonable to me; I like this approach. How do CDH and AA use the AAKBCParams that is set in the k8s configmap? Are there changes happening in guest-components?

@mkulke (collaborator) commented Jun 18, 2024

We already do that for cdh.toml. For AA we currently use agent-config.toml, which is a workaround from when there was no AA config file yet. Now AA supports a config file, and we also want to use it because the ${kbc}::${uri} schema of AAKBCParams is no longer sufficient: we also need to include a KBS certificate in the config.
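To make the limitation concrete, here is a minimal Go sketch of how a ${kbc}::${uri} value splits into its two parts. The helper name parseAAKBCParams is hypothetical and not taken from cloud-api-adaptor; the point is only that the format carries a KBC name and a URI, with no slot for a KBS certificate.

```go
package main

import (
	"fmt"
	"strings"
)

// parseAAKBCParams splits an AAKBCParams value of the form "<kbc>::<uri>",
// e.g. "cc_kbc::http://10.0.0.1:8080" or "offline_fs_kbc::null".
// Hypothetical helper for illustration; not the cloud-api-adaptor code.
func parseAAKBCParams(params string) (string, string, error) {
	kbc, uri, found := strings.Cut(params, "::")
	if !found || kbc == "" {
		return "", "", fmt.Errorf("invalid aa-kbc-params %q: expected <kbc>::<uri>", params)
	}
	// Only two fields exist: there is nowhere to carry a KBS certificate.
	return kbc, uri, nil
}

func main() {
	for _, p := range []string{"cc_kbc::http://10.0.0.1:8080", "offline_fs_kbc::null", "bogus"} {
		kbc, uri, err := parseAAKBCParams(p)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("kbc=%s uri=%s\n", kbc, uri)
	}
}
```

Anything beyond these two fields (such as a certificate) needs a richer format, which is what the AA config file provides.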

@huoqifeng (author)

OK, my understanding is that we need to generate an attestation-agent.toml rather than a kata-agent.toml (named agent-config.toml today) on the cloud-api-adaptor side.

@mkulke (collaborator) commented Jun 18, 2024

yup, exactly

Note: there is currently a nuisance when using an aa.toml file for CAA: an AA config file is only valid if it has a token_config entry, which at the moment only makes sense for cc_kbc. For offline_fs_kbc you cannot specify --config, which means we need some machinery to pass it conditionally (the expansion is left unquoted so that --config and the path become two separate arguments, and nothing at all is passed when AA_CONFIG_FILE is unset):

/usr/bin/attestation-agent ${AA_CONFIG_FILE+--config "$AA_CONFIG_FILE"}

huoqifeng force-pushed the agent-config branch 4 times, most recently from fe61999 to 09fae96 (June 20, 2024 05:03)
@huoqifeng (author) commented Jun 20, 2024

@mkulke I found that we still need to set cc_kbc::http://IP:Port in /etc/agent-config.toml because confidential-data-hub still uses it. If aa_kbc_params = "offline_fs_kbc::null" is set in /etc/agent-config.toml, CDH reports an error like:

Jun 20 07:52:24 podvm-nginx-nydus-54c6f8c6cc-nm6td-e21b40ee confidential-data-hub[2825]: [2024-06-20T07:52:24Z ERROR ttrpc_cdh::ttrpc_server] [ttRPC CDH] GetResource :
Jun 20 07:52:24 podvm-nginx-nydus-54c6f8c6cc-nm6td-e21b40ee confidential-data-hub[2825]:     get resource failed
Jun 20 07:52:24 podvm-nginx-nydus-54c6f8c6cc-nm6td-e21b40ee confidential-data-hub[2825]:     Caused by: Kbs client error: offline-fs-kbc: resource not found one/two/key

It works if /etc/agent-config.toml does not exist at all.
Is there anything we haven't completed yet in the confidential-data-hub configuration?

@mkulke (collaborator) left a comment

Looks good! A couple of nits.

src/cloud-api-adaptor/pkg/agent/config.go (outdated, resolved)
src/cloud-api-adaptor/pkg/agent/config.go (outdated, resolved)
@huoqifeng (author)

Sounds like CDH lost some behavior: it should never use /etc/agent-config.toml when a config file is provided with -c, e.g. /usr/local/bin/confidential-data-hub -c /run/confidential-containers/cdh.toml

@mkulke (collaborator) commented Jun 20, 2024

Does CDH use the config file? In CAA we've been using a config file for CDH for a while now, and it shouldn't rely on agent-config.toml anymore. This is the respective logic in CDH.

@mkulke (collaborator) commented Jun 20, 2024

Where do you see that CDH is falling back to kata-agent-config?

@huoqifeng (author)

I haven't found the code yet. It's confusing, because I saw CDH logs like the following, and it does not work if /etc/agent-config.toml exists.

[2024-06-20T09:08:13Z INFO  ttrpc_cdh] Use configuration file /run/confidential-containers/cdh.toml
[2024-06-20T09:08:13Z DEBUG attestation_agent::config::aa_kbc_params] get aa_kbc_params from file
[2024-06-20T09:08:13Z DEBUG attestation_agent::config::aa_kbc_params] reading agent config from /etc/agent-config.toml
[2024-06-20T09:08:13Z DEBUG attestation_agent::config::aa_kbc_params] get aa_kbc_params from file
[2024-06-20T09:08:13Z DEBUG attestation_agent::config::aa_kbc_params] reading agent config from /etc/agent-config.toml
[2024-06-20T09:08:13Z INFO  ttrpc_cdh] [ttRPC] Confidential Data Hub starts to listen to request: unix:///run/confidential-containers/cdh.sock

It seems to come from:
https://github.com/confidential-containers/guest-components/blob/main/confidential-data-hub/hub/src/bin/config/mod.rs#L130
https://github.com/confidential-containers/guest-components/blob/main/attestation-agent/attestation-agent/src/config/aa_kbc_params.rs#L57

@mkulke (collaborator) commented Jun 20, 2024

Thanks for searching; I didn't spot it myself. I suspect this function is dead code, but I'm not 100% sure.

This fn sets the AA_KBC_PARAMS and KBS_CERT env vars as a side effect and does nothing else:

pub fn set_configuration_envs(&self) {...}

I suspected that image-rs uses it for encrypted images, but it doesn't seem to be used anywhere in guest-components:

/dev/guest-components (main)$ git grep AA_KBC_PARAMS
attestation-agent/attestation-agent/src/config/aa_kbc_params.rs:    if let Ok(params) = env::var("AA_KBC_PARAMS") {
confidential-data-hub/README.md:* **AA_KBC_PARAMS** environment variable
confidential-data-hub/hub/src/bin/config/mod.rs:                "AA_KBC_PARAMS",

Note: the AA_KBC_PARAMS env var does seem to be required in the constructor of CDH's KBS KMS plugin.

@huoqifeng (author)

Yeah, I think it comes from the CDH config if agent-config.toml does not provide it (https://github.com/confidential-containers/guest-components/blob/main/confidential-data-hub/hub/src/bin/config/mod.rs#L133-L136); that's why I dropped it in commit fc0a04e. @mkulke
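The failure mode discussed here can be modeled as a simple precedence rule: an aa_kbc_params value in agent-config.toml wins, and only otherwise is the value taken from the CDH config. The Go sketch below encodes that reading of the linked mod.rs lines as described in this comment. It is a simplified model for illustration (the types and function name are hypothetical), not CDH's Rust implementation. It shows why a leftover offline_fs_kbc::null in agent-config.toml breaks resource retrieval, and why dropping the key fixes it.

```go
package main

import "fmt"

// agentConfig models aa_kbc_params in /etc/agent-config.toml;
// an empty string stands for "key absent".
type agentConfig struct {
	AAKBCParams string
}

// cdhKBCConfig models the KBC settings CDH reads from cdh.toml.
type cdhKBCConfig struct {
	Name string // e.g. "cc_kbc"
	URL  string // e.g. "http://10.0.0.1:8080"
}

// effectiveKBCParams: agent-config.toml takes precedence and cdh.toml is the
// fallback. Simplified model of the behavior described in this thread.
func effectiveKBCParams(agent *agentConfig, cdh cdhKBCConfig) string {
	if agent != nil && agent.AAKBCParams != "" {
		return agent.AAKBCParams
	}
	return cdh.Name + "::" + cdh.URL
}

func main() {
	cdh := cdhKBCConfig{Name: "cc_kbc", URL: "http://10.0.0.1:8080"}

	// Stale value in agent-config.toml shadows cdh.toml.
	fmt.Println(effectiveKBCParams(&agentConfig{AAKBCParams: "offline_fs_kbc::null"}, cdh))
	// aa_kbc_params removed from agent-config.toml: cdh.toml is used.
	fmt.Println(effectiveKBCParams(&agentConfig{}, cdh))
	// No agent-config.toml at all (nil): cdh.toml is used as well.
	fmt.Println(effectiveKBCParams(nil, cdh))
}
```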

huoqifeng marked this pull request as ready for review on June 20, 2024 13:13
@huoqifeng (author)

I tried it on s390x, building the images as follows:

  1. Build the CAA image:
     cd src/cloud-api-adaptor
     ARCHES=linux/s390x make image
  2. Build the PodVM image:
     cd podvm-mkosi
     make fedora-binaries-builder
     ATTESTER=se-attester make binaries
     make image-debug
  3. Deploy a KBS service following https://github.com/confidential-containers/trustee/blob/main/attestation-service/verifier/src/se/README.md
  4. Try it in a k8s cluster using the CAA and PodVM images.
  5. Create a PeerPod instance and retrieve a secret:
     # kubectl exec -it nginx-nydus-54c6f8c6cc-xzxzg -- /bin/sh
     # curl http://127.0.0.1:8006/cdh/resource/one/two/key
     some random characters....
     Sat Jun 15 03:06:01 UTC 2024
     # exit

huoqifeng requested a review from liudalibj on June 20, 2024 13:19
@mkulke (collaborator) left a comment

process-user-data currently does not start on mkosi x86 images (and hence no daemon.json is provisioned), because this overlay issues an update-agent-config command that is no longer available:

mkosi.skeleton/usr/lib/systemd/system/process-user-data.service.d/10-override.conf

@huoqifeng (author)

My fault: update-agent-config is no longer used, because agent-config.toml is static and aa.toml and cdh.toml can be handled by provision-files. I pushed a fix.

@mkulke (collaborator) left a comment

This is what we provision when we specify --aa-kbc-params offline_fs_kbc::null:

cat /run/peerpod/aa.toml
[token_configs]
[token_configs.coco_as]
url = ''

[token_configs.kbs]
url = 'null'

Not sure if that makes sense; it probably doesn't, but it might also not hurt at the moment. I think we could provision that file from the CAA side only if aa-kbc-params is cc_kbc::*, because the config file doesn't cover anything else yet.

@huoqifeng (author)

Indeed, that applies to aa.toml. Not sure whether cdh.toml should follow the same pattern?

@huoqifeng (author) commented Jun 25, 2024

With the fix in confidential-containers/guest-components#599, I think we can leave the code as is. I will add a log for s.aaKBCParams and pick up the new guest-components commit df60725afe0ba452a25a740cf460c2855442c49a.

@mkulke (collaborator) left a comment

lgtm. thanks!

Tested the following combinations on azure SNP & TDX cvms:

  • empty aa_kbc_params
  • aa_kbc_params=offline_fs_kbc
  • aa_kbc_params=cc_kbc::$some_kbs

Logged into the podvm to verify that the AA, CDH and ASR services run properly, and retrieved a confidential resource from KBS.

@stevenhorsman (member) left a comment

The code is generally fine, but I'd like to see the libvirt e2e tests run before merging.

huoqifeng changed the title from "agent-config: generate attestation-agent config when createVM instance" to "attestation-agent-config: generate attestation-agent config when createVM instance" on Jun 25, 2024
Qi Feng Huo added 5 commits June 25, 2024 16:31
attestation-agent-config: generate and use attestation-agent config toml
    - Generate the attestation-agent toml file aa.toml when aaKBCParams provided
    - Use the cfg file to start attestation agent service when it exists
    - Start attestation agent service directly when no cfg file exists
    - remove aa_kbc_params in agent-config so that cdh won't read from it
    - rename package agent to aa to reflect the real config

Signed-off-by: Qi Feng Huo <[email protected]>
- remove process-user-data.service.d in mkosi because update-agent-config is not needed after we generated aa.toml and cdh.toml dynamically
- kata-agent-config.toml is a static config file now

Signed-off-by: Qi Feng Huo <[email protected]>
- start cdh using config conditionally in cdh service

Signed-off-by: Qi Feng Huo <[email protected]>
- add aa.toml to the process-user-data provision list

Signed-off-by: Magnus Kulke <[email protected]>
Signed-off-by: Qi Feng Huo <[email protected]>
- Use same config.TokenCfg.CocoAs.URL as config.TokenCfg.Kbs.URL
- Added log for aaKBCParams
- Updated guest-components to df60725afe0ba452a25a740cf460c2855442c49a to pick cc_kbs cfg fix

Signed-off-by: Qi Feng Huo <[email protected]>

@liudalibj (member) left a comment

LGTM, thanks @huoqifeng
Tested this pr with s390x fedora image.
Both sample and ibm.se attestation drivers work as expected.

@liudalibj (member) commented Jun 25, 2024

I tried to build a podvm Ubuntu amd64 image from this PR:

  • quay.io/liudalibj/podvm-generic-ubuntu-amd64:61a7406743c9ad6915b7dfc1fd3217cb5e581235b8791d791aaf9f669255fd52
    make podvm-builder
    make podvm-binaries
    make podvm-image

With the resulting podvm image I can create a peerpod successfully.
But when I use this podvm image with the sample KBS (cc_kbc::http://ip:8080), attestation-agent.service fails to start in the created peerpod VM.

The related log from journalctl:

...
Jun 25 08:40:28 ubuntu systemd[1]: Installed transient /etc/machine-id file.
Jun 25 08:40:28 ubuntu systemd[1]: multi-user.target: Found ordering cycle on attestation-agent.service/start
Jun 25 08:40:28 ubuntu systemd[1]: multi-user.target: Found dependency on cloud-final.service/start
Jun 25 08:40:28 ubuntu systemd[1]: multi-user.target: Found dependency on multi-user.target/start
Jun 25 08:40:28 ubuntu systemd[1]: multi-user.target: Job attestation-agent.service/start deleted to break ordering cycle starting with multi-user.target/start
Jun 25 08:40:28 ubuntu systemd[1]: multi-user.target: Found ordering cycle on kata-agent.service/start
Jun 25 08:40:28 ubuntu systemd[1]: multi-user.target: Found dependency on cloud-final.service/start
Jun 25 08:40:28 ubuntu systemd[1]: multi-user.target: Found dependency on multi-user.target/start
Jun 25 08:40:28 ubuntu systemd[1]: multi-user.target: Job kata-agent.service/start deleted to break ordering cycle starting with multi-user.target/start
Jun 25 08:40:28 ubuntu systemd[1]: Created slice system-modprobe.slice.
...
Jun 25 08:40:33 podvm-nginx-nydus-54c6f8c6cc-dz74n-6f4c65e4 systemd-timesyncd[561]: Network configuration changed, trying to establish connection.
Jun 25 08:40:33 podvm-nginx-nydus-54c6f8c6cc-dz74n-6f4c65e4 systemd[1]: multi-user.target: Found ordering cycle on kata-agent.service/start
Jun 25 08:40:33 podvm-nginx-nydus-54c6f8c6cc-dz74n-6f4c65e4 systemd[1]: multi-user.target: Found dependency on cloud-final.service/start
Jun 25 08:40:33 podvm-nginx-nydus-54c6f8c6cc-dz74n-6f4c65e4 systemd[1]: multi-user.target: Found dependency on multi-user.target/start
Jun 25 08:40:33 podvm-nginx-nydus-54c6f8c6cc-dz74n-6f4c65e4 systemd[1]: multi-user.target: Job kata-agent.service/start deleted to break ordering cycle starting with multi-user.target/start
Jun 25 08:40:33 podvm-nginx-nydus-54c6f8c6cc-dz74n-6f4c65e4 systemd[1]: agent-protocol-forwarder.service: Scheduled restart job, restart counter is at 1.
...

@huoqifeng it seems there is a cyclic dependency between multi-user.target, attestation-agent.service and kata-agent.service (via cloud-final.service, per the log above); maybe we need a fix. The e2e test case TestLibvirtKbsKeyRelease fails because of this.

@mkulke (collaborator) commented Jun 25, 2024

oh, I only tested mkosi on x86_64 (i.e. without cloud-config)

So assuming you tested mkosi + cloud-config, and that worked, we probably should look into the delta of packer vs mkosi + cloud-config

@liudalibj (member)

Verified: replacing cloud-final.service with cloud-init.service fixes the cyclic dependency issue. @huoqifeng @mkulke FYI.

With this fix, packer Ubuntu amd64 peerpod + cloud-init + libvirt + sample KBC works fine.
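Why the swap helps can be seen by treating the journal messages above as an ordering graph. Assuming attestation-agent.service and kata-agent.service are WantedBy=multi-user.target, systemd's default dependencies for target units make multi-user.target order itself after them; the services order themselves after cloud-final.service; and the journal shows cloud-final.service ordered after multi-user.target, which closes the loop. The Go sketch below runs a depth-first search for such a cycle over a simplified, hand-written edge list (an assumption reconstructed from the log, not read from the actual unit files). With cloud-init.service in place of cloud-final.service, the cycle disappears.

```go
package main

import "fmt"

// findCycle does a depth-first search over "X starts after Y" edges
// (after[X] = [Y, ...]) and returns one ordering cycle, or nil if none.
func findCycle(after map[string][]string) []string {
	const (
		unvisited = iota
		onStack
		finished
	)
	state := map[string]int{} // missing keys read as unvisited
	var stack, cycle []string

	var visit func(n string) bool
	visit = func(n string) bool {
		state[n] = onStack
		stack = append(stack, n)
		for _, m := range after[n] {
			switch state[m] {
			case onStack:
				// m is on the current path: extract m -> ... -> n -> m.
				for i, s := range stack {
					if s == m {
						cycle = append(append([]string{}, stack[i:]...), m)
						return true
					}
				}
			case unvisited:
				if visit(m) {
					return true
				}
			}
		}
		stack = stack[:len(stack)-1]
		state[n] = finished
		return false
	}

	for n := range after {
		if state[n] == unvisited && visit(n) {
			return cycle
		}
	}
	return nil
}

func main() {
	// Simplified ordering edges reconstructed from the journal (assumption).
	withCloudFinal := map[string][]string{
		"multi-user.target":         {"attestation-agent.service", "kata-agent.service"},
		"attestation-agent.service": {"cloud-final.service"},
		"kata-agent.service":        {"cloud-final.service"},
		"cloud-final.service":       {"multi-user.target"},
	}
	withCloudInit := map[string][]string{
		"multi-user.target":         {"attestation-agent.service", "kata-agent.service"},
		"attestation-agent.service": {"cloud-init.service"},
		"kata-agent.service":        {"cloud-init.service"},
		// cloud-init.service is not ordered after multi-user.target here.
	}
	fmt.Println("cloud-final cycle:", findCycle(withCloudFinal))
	fmt.Println("cloud-init cycle:", findCycle(withCloudInit))
}
```

systemd resolves such a cycle by dropping one of the start jobs, which matches the "Job attestation-agent.service/start deleted to break ordering cycle" lines in the log.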

fix cyclic dependency between systemd services by using cloud-init instead of cloud-final in the service files

Signed-off-by: Qi Feng Huo <[email protected]>

@huoqifeng (author) commented Jun 25, 2024

@mkulke not sure whether it's worth another try on your side with the latest fix (hopefully the last one :-) ): 1fbc636.

@mkulke (collaborator) commented Jun 25, 2024

I think that shouldn't impact mkosi on x86_64; let's wait for the libvirt test.

huoqifeng merged commit 0403602 into confidential-containers:main on Jun 26, 2024
27 checks passed
Labels: test_e2e_libvirt (Run Libvirt e2e tests)

Successfully merging this pull request may close these issues:
aa-kbc-params is not customized in agent-config.toml for libvirt provider in fedora

4 participants