Updating the threat list

Signed-off-by: ytimocin <[email protected]>
radius-project · Sep 27, 2024 · 3498c5b
1 parent 60e63e4
commit 3498c5b

Showing 4 changed files with 117 additions and 79 deletions.
diff --git a/.github/config/en-custom.txt b/.github/config/en-custom.txt
@@ -816,4 +816,11 @@ CustomResourceDefinitions
 UCPD
 interactor
 Reconcilers
-reconcilations
+reconcilers
+reconcilations
+Kubebuilder
+crypto
+cryptographically
+SHA
+sha
+rfc
diff --git a/architecture/2024-08-controller-component-threat-model.md b/architecture/2024-08-controller-component-threat-model.md
@@ -50,13 +50,13 @@ The Controller component consists of several key parts:
 
    1. **Purpose**: The purpose of computing the hash of the configuration of the deployment resource is to compare and determine if the deployment is up-to-date or needs an update.
    2. **Library**: The library used to calculate the hash of the deployment configuration is the crypto library, which is one of the standard libraries of Go: [Link to library](https://pkg.go.dev/[email protected]).
-   3. **Type**: [SHA1](https://www.rfc-editor.org/rfc/rfc3174.html). Note: "SHA-1 is cryptographically broken and should not be used for secure applications." [Link to warning](https://pkg.go.dev/crypto/[email protected]).
+   3. **Type**: [SHA1](https://www.rfc-editor.org/rfc/rfc3174.html). Note: "SHA-1 is cryptographically broken and should not be used for secure applications." [Link to warning](https://pkg.go.dev/crypto/[email protected]). The hash is used as an optimization for detecting changes, not as a security protection.
 
 2. **Hashing the Secret Data**: [Link to code](https://github.com/radius-project/radius/blob/8151a96665b7f5bcd6474f5e33aff35d01adfa5a/pkg/controller/reconciler/deployment_reconciler.go#L580).
 
    1. **Purpose**: We hash the secret data and add it to the Pod definition to determine if the secret has changed in an update.
    2. **Library**: The library used to calculate the hash of the secret is the crypto library, which is one of the standard libraries of Go: [Link to library](https://pkg.go.dev/[email protected]).
-   3. **Type**: [SHA1](https://www.rfc-editor.org/rfc/rfc3174.html). Note: "SHA-1 is cryptographically broken and should not be used for secure applications." [Link to warning](https://pkg.go.dev/crypto/[email protected]).
+   3. **Type**: [SHA1](https://www.rfc-editor.org/rfc/rfc3174.html). Note: "SHA-1 is cryptographically broken and should not be used for secure applications." [Link to warning](https://pkg.go.dev/crypto/[email protected]). The hash is used as an optimization for detecting changes, not as a security protection.
 
 #### Storage of secrets
 
@@ -67,7 +67,7 @@ Below you will find where and how Radius stores secrets. We create Kubernetes Se
 
 #### Data Serialization / Formats
 
-We do not use custom parsers and instead rely on Kubernetes built-in parsers. Therefore, we trust Kubernetes security measures to handle data serialization and formats securely.
+We use a custom parser only for Radius resource IDs; for everything else we rely on Kubernetes built-in parsers. Therefore, we trust Kubernetes security measures to handle data serialization and formats securely. The resource ID parser has its own validation and rejects any input that is not a Radius resource ID.
 
 ### Clients
 
@@ -131,53 +131,107 @@ This threat model assumes that:
 
 ### Threats
 
-#### Threat: Users with access to the webhook server modifying the behavior of the webhook server
+#### Spoofing UCP API Server Could Cause Information Disclosure and Denial of Service
 
-A user with access to the webhook server can modify its behavior to approve malicious requests, which can be reconciled in the next step of the data flow. Reconciliation loops can also trigger resource updates or even deletions by calling UCPD with the malicious requests.
+**Description:** If a malicious actor can spoof the UCP API Server by tampering with the configuration in the Controller, the Controller will start sending requests to the malicious server. The malicious server can capture the traffic, leading to information disclosure. This would effectively disable the Controller, causing a Denial of Service.
 
-**Impact**:
+**Impact:** All data sent to UCP by the Controller will be available to the malicious actor, including payloads of resources in the applications. The functionality of the Controller for managing resources will be disabled. Users will not be able to deploy updates to their applications.
 
-1. **Unauthorized Operations**: With the approval of malicious requests, unauthorized operations could be performed on user resources, including the creation, update, or deletion of resources.
-2. **Resource Deletion**: Malicious modifications could lead to the deletion of existing resources, causing potential data loss and service disruption.
+**Mitigations:**
 
-**Mitigation**:
+1. Tampering with the controller code, configuration, or certificates would require access to modify the `radius-system` namespace. Our threat model assumes that the operator has limited access to the `radius-system` namespace using Kubernetes' existing RBAC mechanism.
+2. The resource payloads sent to UCP by the Controller do not contain sensitive operational information (e.g., passwords).
 
-1. **Audit Logs**:
+**Status:** All mitigations listed are currently active. Operators are expected to secure their cluster and limit access to the `radius-system` namespace.
 
-   - **Description**: Implement detailed audit logging to track which user performs which operation on the webhook server. Regularly review these logs to detect any unauthorized or suspicious activities.
-   - **Status**:
+#### Spoofing the Kubernetes API Server Leading to Escalation of Privilege
 
-2. **RBAC (Role-Based Access Control)**:
+**Description:** If a malicious actor could hijack communication between the controller and the Kubernetes API Server, the actor could send requests to the controller. At that point, the controller would be processing illegitimate data.
 
-   - **Description**: Implement strict RBAC policies to ensure that only authorized users have the necessary permissions to access and modify the webhook server. This minimizes the risk of unauthorized access and modifications.
-   - **Status**:
+**Impact:** A malicious actor could use the controllers (Recipe and/or Deployment) to escalate privileges and perform arbitrary operations against Radius/UCP.
 
-#### Threat: Webhook server being unavailable or slow to respond
+**Mitigations:**
 
-If the webhook server becomes unavailable or slow to respond, it can lead to delays or failures in processing requests. This may not be a direct security issue but it can affect the overall reliability of the system.
+1. The controllers authenticate requests to the Kubernetes API Server using credentials managed and rotated by Kubernetes. Our threat model assumes that the API Server and mechanisms like Kubernetes-managed authentication are not compromised.
+2. The webhook follows a known Kubernetes implementation pattern and uses widely supported libraries to communicate (client-go, Kubebuilder).
+3. Tampering with the controller code, configuration, or authentication tokens would require access to modify the `radius-system` namespace. Our threat model assumes that the operator has limited access to the `radius-system` namespace using Kubernetes' existing RBAC mechanism.
 
-**Impact**:
+**Status:** All mitigations listed are currently active. Operators are expected to secure their cluster and limit access to the `radius-system` namespace.
 
-**Mitigation**:
+#### Spoofing Requests to the Validating Webhook
+
+**Description:** If a malicious actor could circumvent webhook authentication, they could send unauthorized requests to the webhook.
+
+**Impact:** The webhook performs validation only and does not mutate any state. The security impact of spoofing is unclear, but it could potentially lead to unauthorized actions being validated.
+
+**Mitigations:**
+
+1. The webhook authenticates requests (mTLS) from the Kubernetes API Server using a certificate managed and rotated by Kubernetes. Our threat model assumes that the API Server and mechanisms like Kubernetes-managed certificates are not compromised.
+2. The webhook follows a known Kubernetes implementation pattern and uses widely supported libraries to implement mTLS (Kubebuilder).
+3. Tampering with the webhook code, configuration, or certificates would require access to modify the `radius-system` namespace. Our threat model assumes that the operator has limited access to the `radius-system` namespace using Kubernetes' existing RBAC mechanism.
+
+**Status:** All mitigations listed are currently active. Operators are expected to secure their cluster and limit access to the `radius-system` namespace.
+
+#### Denial of Service Caused by Invalid Request Data
+
+**Description:** A malicious actor could send a malformed request that triggers unbounded execution on the server.
+
+**Impact:** A malicious actor could cause a denial of service or waste compute resources.
+
+**Mitigations:**
+
+1. The controllers and webhooks use widely supported libraries for all parsing of untrusted data in standard formats.
+   1. The Go standard libraries are used for HTTP.
+   2. The Kubernetes YAML libraries are used for YAML parsing.
+2. Radius/UCP implements a custom parser for resource IDs, a custom string format. This parser requires fuzz testing.
+
+**Status:** All mitigations listed are currently active. Operators are expected to secure their cluster and limit access to the `radius-system` namespace.
+
+#### Information Disclosure by Unauthorized Access to Secrets
+
+**Description:** A malicious actor could circumvent Kubernetes RBAC controls and gain unauthorized access to Kubernetes secrets managed by Radius. These secrets may contain sensitive information, such as credentials intended for use by applications.
+
+**Impact:** A malicious actor could gain access to sensitive information.
+
+**Mitigations:**
+
+1. Secret data managed by the controllers is stored at rest in Kubernetes secrets. Our threat model assumes that the API server and mechanisms like Kubernetes authentication/RBAC are not compromised.
+2. Secrets managed by Radius are always placed in the same namespace as the object that "owns" them. This is a requirement of the Kubernetes RBAC model.
+3. Secrets managed by Radius are subject to the Kubernetes RBAC model for controlling access. Operators are expected to limit access for users using existing tools.
+
+**Status:** All mitigations listed are currently active. Operators are expected to secure their cluster and limit access for users.
+
+#### Escalation of Privilege by Using Radius to Circumvent Kubernetes RBAC Controls
+
+**Description:** A malicious actor could circumvent Kubernetes RBAC controls and create arbitrary resources in Kubernetes by using the `Recipe` custom resource.
+
+The `Recipe` controller has limited permissions, so it cannot be used directly to escalate privileges in Kubernetes. However, it calls into UCP/Radius, which operates with a wide scope of permissions in Kubernetes and the cloud.
+
+Authorized users with access to create a `Recipe` resource in Kubernetes can execute any Recipe in any Environment registered with Radius.
+
+At the time of writing, Radius does not provide granular authorization controls. Any authenticated client can create any Radius resource and execute any action Radius is capable of taking. This is not limited to the Kubernetes controllers.
+
+**Impact:** An authorized user of the Kubernetes cluster with permission to create a `Recipe` resource can execute any Recipe in any Environment registered with Radius.
+
+**Mitigations:**
+
+1. Operators should limit access to the `Recipe` resource using Kubernetes RBAC.
+2. Operators should limit direct access to the Radius API using Kubernetes RBAC.
+3. We should revisit the threat model and provide a more robust set of authorization controls when granular authorization policies are added to Radius.
+
+**Status:** These mitigations are partial and require configuration by the operator. We will revisit and improve this area in the future.
 
 ## Open Questions
 
 ## Action Items
 
 1. Use a hashing algorithm other than SHA-1 while computing the hash of the configuration of a Deployment object. This is a breaking change because deployments that are already hashed with SHA1 should be redeployed so that reconciler can work as expected.
-2. Check if TLS is enabled for every component to ensure secure communication. Make changes to the necessary components if required.
-3. Ensure that all communication uses mTLS (Mutual TLS) to authenticate both the client and server, providing an additional layer of security. Verify that mTLS is correctly configured for all components and endpoints. Make changes to the necessary components if required.
-4. Check if RBAC with Least Privilege is configured for every component to ensure that each component has only the permissions it needs to function. Make changes to the necessary components if required.
-5. Define and implement necessary Network Policies to ensure that communication is accepted only from expected and authorized components. Regularly review and update these policies to maintain security.
-6. Separate and firewall the etcd cluster to ensure the safety of the datastore. Implement network segmentation to isolate the etcd cluster from other components. Configure firewall rules to restrict access to the etcd cluster, allowing only authorized components and administrators to communicate with it. Regularly review and update firewall rules and network policies to maintain security.
-7. Containers should run as a non-root user wherever possible to minimize the risks. Check if we can run any of the Radius containers as non-root. Do the necessary updates.
+2. Check if RBAC with Least Privilege is configured for every component to ensure that each component has only the permissions it needs to function. Make changes to the necessary components if required.
+3. Define and implement necessary Network Policies to ensure that communication is accepted only from expected and authorized components. Regularly review and update these policies to maintain security.
+4. Containers should run as a non-root user wherever possible to minimize the risks. Check if we can run any of the Radius containers as non-root. Do the necessary updates.
 
 ## Review Notes
 
-<!--
-Update this section with the decisions and feedback from the threat model review meeting. Document any changes made to the model based on the review.
--->
-
 ## References
 
 1. <https://kubernetes.io/blog/2018/07/18/11-ways-not-to-get-hacked>
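
Action item 1 above proposes moving configuration hashing off SHA-1. The following is a minimal Go sketch of what change detection with SHA-256 could look like. The names `deploymentConfig`, `configHash`, and `needsUpdate` are hypothetical and are not taken from the Radius codebase, and serializing the configuration with `encoding/json` is an assumption made for illustration, not a description of how the reconciler builds its input.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// deploymentConfig is a hypothetical stand-in for the configuration
// that a reconciler would hash to detect changes.
type deploymentConfig struct {
	Image    string            `json:"image"`
	Replicas int               `json:"replicas"`
	Env      map[string]string `json:"env,omitempty"`
}

// configHash returns the hex-encoded SHA-256 digest of the JSON encoding of cfg.
// encoding/json sorts map keys, so equal configurations yield equal digests.
func configHash(cfg deploymentConfig) (string, error) {
	data, err := json.Marshal(cfg)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:]), nil
}

// needsUpdate reports whether the hash of the desired configuration
// differs from the previously stored hash.
func needsUpdate(storedHash string, desired deploymentConfig) (bool, error) {
	current, err := configHash(desired)
	if err != nil {
		return false, err
	}
	return current != storedHash, nil
}

func main() {
	cfg := deploymentConfig{Image: "demo:1", Replicas: 1}
	stored, err := configHash(cfg)
	if err != nil {
		panic(err)
	}

	// Change the desired state and compare against the stored hash.
	cfg.Replicas = 2
	changed, err := needsUpdate(stored, cfg)
	if err != nil {
		panic(err)
	}
	fmt.Println("stored hash:", stored)
	fmt.Println("needs update:", changed)
}
```

A hex-encoded SHA-256 digest is 64 characters long, while a SHA-1 digest is 40, so a hash stored by the old scheme can never match one computed by the new scheme. This is consistent with the action item's note that switching algorithms is a breaking change for deployments that were already hashed.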