-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: add health probes and ingress #1408
Conversation
📝 WalkthroughWalkthroughThe pull request introduces a new Changes
Suggested reviewers
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
Documentation and Community
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🔭 Outside diff range comments (1)
infrastructure/adminservices-test/altinn-apim-test-rg/backend.tf (1)
95-100
: 💡 Verification agent❓ Verification inconclusive
Role Assignment and Secret Retrieval: Address Authorization Failure
The pipeline failure indicates an authorization error when attempting to retrieve secrets from the Container App: the client does not have permission to perform
Microsoft.App/containerApps/listSecrets/action
. The current role assignment (AcrPull
role) might not include the necessary permissions for listing secrets. Please review and update the identity’s permissions to cover this action. This might involve assigning an additional role or adjusting the current role assignment.
Action: Revisit Role Assignment Permissions for Secret Retrieval
The pipeline failure indicates that the current role assignment using the "AcrPull" role (configured in infrastructure/adminservices-test/altinn-apim-test-rg/backend.tf, lines 95–100) does not grant permission for the
Microsoft.App/containerApps/listSecrets/action
. Since the AcrPull role is designed for pulling container images from ACR, it typically lacks the secret listing permission needed by Container Apps.Key points to address:
- Review the required permissions: Confirm that the action
Microsoft.App/containerApps/listSecrets/action
is not included in the AcrPull role.- Adjust role assignments: Consider assigning an additional or alternative role (for example, a role that includes secret retrieval permissions on the Container App’s scope) or creating a custom role to cover this action.
- Verify the scope: Ensure that the role assignment’s scope aligns with the Container App resource if secret retrieval is needed there rather than on the container registry.
Please update the permissions accordingly and verify that the identity has the necessary actions allowed.
🧹 Nitpick comments (3)
infrastructure/adminservices-test/altinn-apim-test-rg/backend.tf (3)
56-64
: Startup Probe: Consider Adjusting the Initial DelayThe startup probe is configured with an
initial_delay
of 0 seconds and an interval of 1 second. While this may help detect startup issues quickly, it could also lead to premature failure detections if the application inherently requires more time to initialize. Please verify that your application can reliably start within 0 seconds; if not, consider increasing the initial delay. For example:- initial_delay = 0 + initial_delay = 5 # Adjust based on observed application startup timing
65-74
: Readiness Probe: Verify Health Check ParametersThe readiness probe is similarly configured with an
initial_delay
of 0 seconds and a 1‑second interval. Although the thresholds (failure count of 3 and success count of 1) may be sufficient, ensure that these values are appropriate for your application's normal startup and steady-state behavior. If your app needs a short warm-up period before it can reliably answer readiness checks, consider adjustinginitial_delay
accordingly.
75-83
: Liveness Probe: Confirm Adequate Delay SettingsThe liveness probe also starts checking immediately with an
initial_delay
of 0 seconds. This configuration is quite aggressive and might lead to unintended container restarts if the application needs a few seconds to stabilize after startup. Validate these values against your application’s behavior, and if necessary, adjust theinitial_delay
and related thresholds to prevent false positives.
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
infrastructure/adminservices-test/altinn-apim-test-rg/backend.tf
(2 hunks)
🧰 Additional context used
🪛 GitHub Actions: altinn-apim-test-rg
infrastructure/adminservices-test/altinn-apim-test-rg/backend.tf
[error] 22-22: Error: retrieving secrets for Container App: unexpected status 403 (403 Forbidden) with error: AuthorizationFailed: The client '02b17d28-7ec8-4084-92ce-d26051fa4998' does not have authorization to perform action 'Microsoft.App/containerApps/listSecrets/action'.
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: Analyze (go)
🔇 Additional comments (1)
infrastructure/adminservices-test/altinn-apim-test-rg/backend.tf (1)
37-45
: Ingress Block: Validate Configuration and Future FlexibilityThe new
ingress
block is configured with explicit parameters (e.g.,allow_insecure_connections = false
,target_port = 8080
,transport = "http"
, andexternal_enabled = true
) along with a 100% traffic weight. This looks correct for exposing the application on the specified port. However, please ensure that these static values align with your deployment and security requirements. In a future revision, you might consider parameterizing some of these settings (such astraffic_weight
) if dynamic traffic routing or blue/green deployments become necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (3)
infrastructure/adminservices-test/altinn-apim-test-rg/backend.tf (3)
37-45
: Ingress Block Configuration ReviewThe newly added ingress block correctly configures external access with a target port of 8080 and specifies a full (100%) traffic weight. One minor suggestion is to verify consistent casing for the
transport
property. Here, it's defined as"http"
while health probes use"HTTP"
, which might lead to confusion or unintended behavior if the provider is case-sensitive.
56-64
: Startup Probe ConfigurationThe startup_probe is set to access the
/swagger/swagger.json
endpoint on port 8080. While the configuration is syntactically correct, using aninitial_delay
of 0 and a very shortinterval_seconds
of 1 might be too aggressive depending on the actual startup time of the application. Please verify that the application can reliably serve the endpoint immediately after container launch.
75-83
: Liveness Probe ConfigurationThe liveness_probe uses an aggressive schedule – a 1-second interval with a failure threshold of 3 – which will quickly mark the container as unhealthy if issues occur. This approach may lead to frequent restarts if transient delays occur. It may be beneficial to use a dedicated health endpoint instead of the
/swagger/swagger.json
used across all probes, to better isolate health-check concerns from API documentation availability.
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
infrastructure/adminservices-test/altinn-apim-test-rg/backend.tf
(2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: Analyze (go)
- GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (1)
infrastructure/adminservices-test/altinn-apim-test-rg/backend.tf (1)
65-74
: Readiness Probe ConfigurationThe readiness_probe is configured similarly to the startup probe, with parameters ensuring checks occur very quickly after startup. It's important to confirm that invoking the
/swagger/swagger.json
endpoint immediately (with an initial delay of 0) does not lead to false negatives before the app is fully ready. Additionally, consider whether using the Swagger endpoint is the most reliable indicator of application readiness.
Terraform environment testFormat and Style 🖌
|
Context | Values |
---|---|
Pusher | @tjololo |
Action | workflow_dispatch |
Working Directory | ./infrastructure/adminservices-test/altinn-apim-test-rg |
State File | github.com/altinn/altinn-platform/environments/test/altinn-apim-test-rg.tfstate |
Plan File | github.com_altinn_altinn-platform_environments_test_altinn-apim-test-rg.tfstate.tfplan |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
infrastructure/adminservices-test/altinn-apim-test-rg/backend.tf (1)
76-84
: Liveness Probe: Evaluate Endpoint Choice and RobustnessThe liveness probe also targets
/swagger/swagger.json
. While this may work for verifying service availability, consider whether using a dedicated health or heartbeat endpoint might provide a more reliable indication of the application's true liveness, especially under load or during partial failures.
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
infrastructure/adminservices-test/altinn-apim-test-rg/backend.tf
(2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: Analyze (go)
🔇 Additional comments (3)
infrastructure/adminservices-test/altinn-apim-test-rg/backend.tf (3)
37-46
: Ingress Block: Validate Consistency and CorrectnessThe new
ingress
block is correctly configured to expose the container app on port 8080 with secure settings. One item to verify is the value of thetransport
attribute which is set to"http"
here—while in the probe blocks the transport is denoted as"HTTP"
. Ensure that this discrepancy is intentional and conforms to the expected enumeration in the provider documentation.
57-65
: Startup Probe: Revisit Timing ParametersThe startup probe is configured to target
/swagger/swagger.json
with aninitial_delay
of 0 seconds, an interval of 1 second, and a timeout of 1 second. This aggressive timing might trigger false negatives if the application takes slightly longer to start up. Please verify that these values match your application's startup characteristics.
66-75
: Readiness Probe: Confirm Threshold SettingsThe readiness probe settings (using an interval of 1 second, a failure threshold of 3, and a success threshold of 1) are designed for rapid feedback. Confirm that these thresholds are appropriate for your application's typical response times so that temporary delays or minor hiccups do not result in the app being marked as unready.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (4)
infrastructure/adminservices-test/altinn-apim-test-rg/backend.tf (4)
58-66
: Startup Probe Configuration VerificationThe startup probe block is syntactically correct and appears to target the intended endpoint. However, an
initial_delay
of 0 and aninterval_seconds
of 1 combined with afailure_count_threshold
of 10 might be a bit aggressive depending on your container's startup time. Consider verifying these values in a staging environment to ensure they do not lead to premature failures.
67-76
: Readiness Probe Configuration EvaluationThe readiness probe uses the same endpoint and aggressive timing (0-second delay, 1-second interval). While this configuration can work for a very responsive service, it’s worth confirming that the application is ready to handle requests immediately at startup. If there’s any delay in becoming fully ready, you might see false negatives.
77-85
: Liveness Probe Configuration EvaluationThe liveness probe is consistently configured with the same endpoint and similar aggressive timings (0-second initial delay, 1-second interval). Ensure that these settings do not inadvertently cause the container to restart during transient issues. Consider whether a slightly longer initial delay or timeout would provide a more stable assessment of container health.
37-85
: Overall Health Probes and Ingress Configuration ConsiderationAll added configurations integrate well with the existing resource definition and align with the PR objectives. One point to confirm: all probes target
/swagger/swagger.json
. This endpoint is typically used for API documentation rather than a dedicated health check. Verify that this is intentional and adequately reflects your application's health status.
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
infrastructure/adminservices-test/altinn-apim-test-rg/backend.tf
(2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: Analyze (go)
🔇 Additional comments (1)
infrastructure/adminservices-test/altinn-apim-test-rg/backend.tf (1)
37-47
: Ingress Block Configuration ReviewThe new ingress block correctly sets key parameters—such as disabling insecure connections, specifying the target port (8080), and enabling external traffic—with a nested
traffic_weight
that routes 100% of traffic to the latest revision. Please double-check that these values meet the security and traffic distribution requirements for your deployment.
LGTM |
Description
Related Issue(s)
Verification
Documentation
Summary by CodeRabbit