
azurerm_kubernetes_cluster_node_pool: Adds support for temporary_name_for_rotation #27791

Merged
merged 20 commits into main from feature/22265 on Jan 16, 2025

Conversation


@CorrenSoft CorrenSoft commented Oct 28, 2024

Community Note

  • Please vote on this PR by adding a 👍 reaction to the original PR to help the community and maintainers prioritize for review
  • Please do not leave comments along the lines of "+1", "me too" or "any updates", they generate extra noise for PR followers and do not help prioritize for review

Description

  • Added the temporary_name_for_rotation property (optional, not persisted).
  • Added the functionality to rotate the node pool instead of recreating it when any of the following properties change (a rough sketch of the rotation sequence follows this list):
    • fips_enabled
    • host_encryption_enabled
    • kubelet_config
    • linux_os_config
    • max_pods
    • node_public_ip_enabled
    • os_disk_size_gb
    • os_disk_type
    • pod_subnet_id
    • snapshot_id
    • ultra_ssd_enabled
    • vm_size
    • vnet_subnet_id
    • zones
  • Removed the ForceNew flag and added the code to update the values of the properties listed above.
  • Updated TestAccKubernetesClusterNodePool_manualScaleVMSku and TestAccKubernetesClusterNodePool_ultraSSD test cases.
  • Removed schemaNodePoolSysctlConfigForceNew, schemaNodePoolKubeletConfigForceNew and schemaNodePoolLinuxOSConfigForceNew as they are no longer used.
  • Renamed retrySystemNodePoolCreation to retryNodePoolCreation, as it is now used in both cases.
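
For context, the snippet below is a minimal sketch of the rotation sequence described in the list above, not the PR's exact code. It reuses the agentpools SDK calls that also appear in the tests later in this thread; the package, function and variable names are illustrative, and the API version in the import path is an assumption.

package nodepoolrotation

import (
    "context"
    "fmt"

    "github.com/hashicorp/go-azure-helpers/lang/pointer"
    // NOTE: the API version segment below is an assumption for illustration only.
    "github.com/hashicorp/go-azure-sdk/resource-manager/containerservice/2024-05-01/agentpools"
)

// rotateNodePool is an illustrative sketch of cycling a node pool through a
// temporary pool (temporary_name_for_rotation) instead of deleting and
// recreating the resource outright.
func rotateNodePool(ctx context.Context, client *agentpools.AgentPoolsClient, id agentpools.AgentPoolId, tempName string, newProfile agentpools.AgentPool) error {
    // 1. Read the existing pool so the temporary pool mirrors its current settings.
    existing, err := client.Get(ctx, id)
    if err != nil {
        return fmt.Errorf("retrieving %s: %+v", id, err)
    }
    if existing.Model == nil {
        return fmt.Errorf("retrieving %s: model was nil", id)
    }

    // 2. Create the temporary pool as a copy of the existing one.
    tempId := agentpools.NewAgentPoolID(id.SubscriptionId, id.ResourceGroupName, id.ManagedClusterName, tempName)
    tempProfile := *existing.Model
    tempProfile.Name = pointer.To(tempName)
    if err := client.CreateOrUpdateThenPoll(ctx, tempId, tempProfile); err != nil {
        return fmt.Errorf("creating temporary %s: %+v", tempId, err)
    }

    // 3. Delete the original pool, then recreate it with the new configuration.
    if err := client.DeleteThenPoll(ctx, id); err != nil {
        return fmt.Errorf("deleting %s: %+v", id, err)
    }
    if err := client.CreateOrUpdateThenPoll(ctx, id, newProfile); err != nil {
        return fmt.Errorf("recreating %s: %+v", id, err)
    }

    // 4. Remove the temporary pool now that the rotated pool is up.
    if err := client.DeleteThenPoll(ctx, tempId); err != nil {
        return fmt.Errorf("deleting temporary %s: %+v", tempId, err)
    }
    return nil
}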

PR Checklist

  • I have followed the guidelines in our Contributing Documentation.
  • I have checked to ensure there aren't other open Pull Requests for the same update/change.
  • I have checked if my changes close any open issues. If so please include appropriate closing keywords below.
  • I have updated/added Documentation as required written in a helpful and kind way to assist users that may be unfamiliar with the resource / data source.
  • I have used a meaningful PR title to help maintainers and other users understand this change and help prevent duplicate work.

Changes to existing Resource / Data Source

  • I have added an explanation of what my changes do and why I'd like you to include them (This may be covered by linking to an issue above, but may benefit from additional explanation).
  • I have written new tests for my resource or datasource changes & updated any relevant documentation.
  • I have successfully run tests with my changes locally. If not, please provide details on testing challenges that prevented you running the tests.

Testing

  • My submission includes Test coverage as described in the Contribution Guide and the tests pass. (if this is not possible for any reason, please include details of why you did or could not add test coverage)

Change Log

Below please provide what should go into the changelog (if anything) conforming to the Changelog Format documented here.

This is a (please select all that apply):

  • Bug Fix
  • New Feature (ie adding a service, resource, or data source)
  • Enhancement
  • Breaking Change

Related Issue(s)

Fixes #22265

Note

If this PR changes meaningfully during the course of review please update the title and description as required.

CorrenSoft commented Oct 28, 2024

Test Logs

TF_ACC=1 go test -v ./internal/services/containers -run=TestAccKubernetesClusterNodePool_ -timeout 180m -ldflags="-X=github.com/hashicorp/terraform-provider-azurerm/version.ProviderVersion=acc"
=== RUN TestAccKubernetesClusterNodePool_autoScale
=== PAUSE TestAccKubernetesClusterNodePool_autoScale
=== RUN TestAccKubernetesClusterNodePool_autoScaleUpdate
=== PAUSE TestAccKubernetesClusterNodePool_autoScaleUpdate
=== RUN TestAccKubernetesClusterNodePool_availabilityZones
=== PAUSE TestAccKubernetesClusterNodePool_availabilityZones
=== RUN TestAccKubernetesClusterNodePool_capacityReservationGroup
=== PAUSE TestAccKubernetesClusterNodePool_capacityReservationGroup
=== RUN TestAccKubernetesClusterNodePool_errorForAvailabilitySet
kubernetes_cluster_node_pool_resource_test.go:126: AvailabilitySet not supported as an option for default_node_pool in 4.0
--- SKIP: TestAccKubernetesClusterNodePool_errorForAvailabilitySet (0.00s)
=== RUN TestAccKubernetesClusterNodePool_kubeletAndLinuxOSConfig
=== PAUSE TestAccKubernetesClusterNodePool_kubeletAndLinuxOSConfig
=== RUN TestAccKubernetesClusterNodePool_kubeletAndLinuxOSConfigPartial
=== PAUSE TestAccKubernetesClusterNodePool_kubeletAndLinuxOSConfigPartial
=== RUN TestAccKubernetesClusterNodePool_other
=== PAUSE TestAccKubernetesClusterNodePool_other
=== RUN TestAccKubernetesClusterNodePool_multiplePools
=== PAUSE TestAccKubernetesClusterNodePool_multiplePools
=== RUN TestAccKubernetesClusterNodePool_manualScale
=== PAUSE TestAccKubernetesClusterNodePool_manualScale
=== RUN TestAccKubernetesClusterNodePool_manualScaleMultiplePools
=== PAUSE TestAccKubernetesClusterNodePool_manualScaleMultiplePools
=== RUN TestAccKubernetesClusterNodePool_manualScaleMultiplePoolsUpdate
=== PAUSE TestAccKubernetesClusterNodePool_manualScaleMultiplePoolsUpdate
=== RUN TestAccKubernetesClusterNodePool_manualScaleIgnoreChanges
=== PAUSE TestAccKubernetesClusterNodePool_manualScaleIgnoreChanges
=== RUN TestAccKubernetesClusterNodePool_manualScaleUpdate
=== PAUSE TestAccKubernetesClusterNodePool_manualScaleUpdate
=== RUN TestAccKubernetesClusterNodePool_manualScaleVMSku
=== PAUSE TestAccKubernetesClusterNodePool_manualScaleVMSku
=== RUN TestAccKubernetesClusterNodePool_modeSystem
=== PAUSE TestAccKubernetesClusterNodePool_modeSystem
=== RUN TestAccKubernetesClusterNodePool_modeUpdate
=== PAUSE TestAccKubernetesClusterNodePool_modeUpdate
=== RUN TestAccKubernetesClusterNodePool_nodeTaints
=== PAUSE TestAccKubernetesClusterNodePool_nodeTaints
=== RUN TestAccKubernetesClusterNodePool_nodeLabels
=== PAUSE TestAccKubernetesClusterNodePool_nodeLabels
=== RUN TestAccKubernetesClusterNodePool_nodePublicIP
=== PAUSE TestAccKubernetesClusterNodePool_nodePublicIP
=== RUN TestAccKubernetesClusterNodePool_podSubnet
=== PAUSE TestAccKubernetesClusterNodePool_podSubnet
=== RUN TestAccKubernetesClusterNodePool_osDiskSizeGB
=== PAUSE TestAccKubernetesClusterNodePool_osDiskSizeGB
=== RUN TestAccKubernetesClusterNodePool_proximityPlacementGroupId
=== PAUSE TestAccKubernetesClusterNodePool_proximityPlacementGroupId
=== RUN TestAccKubernetesClusterNodePool_osDiskType
=== PAUSE TestAccKubernetesClusterNodePool_osDiskType
=== RUN TestAccKubernetesClusterNodePool_requiresImport
=== PAUSE TestAccKubernetesClusterNodePool_requiresImport
=== RUN TestAccKubernetesClusterNodePool_spot
=== PAUSE TestAccKubernetesClusterNodePool_spot
=== RUN TestAccKubernetesClusterNodePool_upgradeSettings
=== PAUSE TestAccKubernetesClusterNodePool_upgradeSettings
=== RUN TestAccKubernetesClusterNodePool_virtualNetworkAutomatic
=== PAUSE TestAccKubernetesClusterNodePool_virtualNetworkAutomatic
=== RUN TestAccKubernetesClusterNodePool_virtualNetworkManual
=== PAUSE TestAccKubernetesClusterNodePool_virtualNetworkManual
=== RUN TestAccKubernetesClusterNodePool_virtualNetworkMultipleSubnet
=== PAUSE TestAccKubernetesClusterNodePool_virtualNetworkMultipleSubnet
=== RUN TestAccKubernetesClusterNodePool_windows
=== PAUSE TestAccKubernetesClusterNodePool_windows
=== RUN TestAccKubernetesClusterNodePool_windows2019
=== PAUSE TestAccKubernetesClusterNodePool_windows2019
=== RUN TestAccKubernetesClusterNodePool_windows2022
=== PAUSE TestAccKubernetesClusterNodePool_windows2022
=== RUN TestAccKubernetesClusterNodePool_windowsAndLinux
=== PAUSE TestAccKubernetesClusterNodePool_windowsAndLinux
=== RUN TestAccKubernetesClusterNodePool_zeroSize
=== PAUSE TestAccKubernetesClusterNodePool_zeroSize
=== RUN TestAccKubernetesClusterNodePool_hostEncryption
=== PAUSE TestAccKubernetesClusterNodePool_hostEncryption
=== RUN TestAccKubernetesClusterNodePool_maxSize
=== PAUSE TestAccKubernetesClusterNodePool_maxSize
=== RUN TestAccKubernetesClusterNodePool_sameSize
=== PAUSE TestAccKubernetesClusterNodePool_sameSize
=== RUN TestAccKubernetesClusterNodePool_ultraSSD
=== PAUSE TestAccKubernetesClusterNodePool_ultraSSD
=== RUN TestAccKubernetesClusterNodePool_osSkuUbuntu
=== PAUSE TestAccKubernetesClusterNodePool_osSkuUbuntu
=== RUN TestAccKubernetesClusterNodePool_osSkuAzureLinux
=== PAUSE TestAccKubernetesClusterNodePool_osSkuAzureLinux
=== RUN TestAccKubernetesClusterNodePool_osSkuCBLMariner
kubernetes_cluster_node_pool_resource_test.go:821: CBLMariner is an invalid os_sku in 4.0
--- SKIP: TestAccKubernetesClusterNodePool_osSkuCBLMariner (0.00s)
=== RUN TestAccKubernetesClusterNodePool_osSkuMariner
kubernetes_cluster_node_pool_resource_test.go:839: Mariner is an invalid os_sku in 4.0
--- SKIP: TestAccKubernetesClusterNodePool_osSkuMariner (0.00s)
=== RUN TestAccKubernetesClusterNodePool_osSkuMigration
=== PAUSE TestAccKubernetesClusterNodePool_osSkuMigration
=== RUN TestAccKubernetesClusterNodePool_dedicatedHost
=== PAUSE TestAccKubernetesClusterNodePool_dedicatedHost
=== RUN TestAccKubernetesClusterNodePool_turnOnEnableAutoScalingWithDefaultMaxMinCountSettings
=== PAUSE TestAccKubernetesClusterNodePool_turnOnEnableAutoScalingWithDefaultMaxMinCountSettings
=== RUN TestAccKubernetesClusterNodePool_scaleDownMode
=== PAUSE TestAccKubernetesClusterNodePool_scaleDownMode
=== RUN TestAccKubernetesClusterNodePool_workloadRuntime
=== PAUSE TestAccKubernetesClusterNodePool_workloadRuntime
=== RUN TestAccKubernetesClusterNodePool_customCATrustEnabled
kubernetes_cluster_node_pool_resource_test.go:981: Skipping this test in 4.0 beta as it is not supported
--- SKIP: TestAccKubernetesClusterNodePool_customCATrustEnabled (0.00s)
=== RUN TestAccKubernetesClusterNodePool_windowsProfileOutboundNatEnabled
=== PAUSE TestAccKubernetesClusterNodePool_windowsProfileOutboundNatEnabled
=== RUN TestAccKubernetesClusterNodePool_nodeIPTags
=== PAUSE TestAccKubernetesClusterNodePool_nodeIPTags
=== RUN TestAccKubernetesClusterNodePool_networkProfileComplete
=== PAUSE TestAccKubernetesClusterNodePool_networkProfileComplete
=== RUN TestAccKubernetesClusterNodePool_networkProfileUpdate
=== PAUSE TestAccKubernetesClusterNodePool_networkProfileUpdate
=== RUN TestAccKubernetesClusterNodePool_snapshotId
=== PAUSE TestAccKubernetesClusterNodePool_snapshotId
=== RUN TestAccKubernetesClusterNodePool_gpuInstance
=== PAUSE TestAccKubernetesClusterNodePool_gpuInstance
=== RUN TestAccKubernetesClusterNodePool_virtualNetworkOwnershipRaceCondition
=== PAUSE TestAccKubernetesClusterNodePool_virtualNetworkOwnershipRaceCondition
=== RUN TestAccKubernetesClusterNodePool_updateVmSizeAfterFailureWithTempAndOriginal
=== PAUSE TestAccKubernetesClusterNodePool_updateVmSizeAfterFailureWithTempAndOriginal
=== RUN TestAccKubernetesClusterNodePool_updateVmSizeAfterFailureWithTempWithoutOriginal
=== PAUSE TestAccKubernetesClusterNodePool_updateVmSizeAfterFailureWithTempWithoutOriginal
=== CONT TestAccKubernetesClusterNodePool_autoScale
=== CONT TestAccKubernetesClusterNodePool_virtualNetworkManual
=== CONT TestAccKubernetesClusterNodePool_modeSystem
=== CONT TestAccKubernetesClusterNodePool_multiplePools
=== CONT TestAccKubernetesClusterNodePool_dedicatedHost
=== CONT TestAccKubernetesClusterNodePool_virtualNetworkAutomatic
=== CONT TestAccKubernetesClusterNodePool_manualScaleVMSku
=== CONT TestAccKubernetesClusterNodePool_manualScaleUpdate
--- PASS: TestAccKubernetesClusterNodePool_modeSystem (975.49s)
=== CONT TestAccKubernetesClusterNodePool_manualScaleIgnoreChanges
--- PASS: TestAccKubernetesClusterNodePool_virtualNetworkAutomatic (1104.13s)
=== CONT TestAccKubernetesClusterNodePool_manualScaleMultiplePoolsUpdate
--- PASS: TestAccKubernetesClusterNodePool_virtualNetworkManual (1111.55s)
=== CONT TestAccKubernetesClusterNodePool_manualScaleMultiplePools
--- PASS: TestAccKubernetesClusterNodePool_multiplePools (1143.18s)
=== CONT TestAccKubernetesClusterNodePool_manualScale
--- PASS: TestAccKubernetesClusterNodePool_dedicatedHost (1204.76s)
=== CONT TestAccKubernetesClusterNodePool_other
--- PASS: TestAccKubernetesClusterNodePool_manualScaleUpdate (1318.20s)
=== CONT TestAccKubernetesClusterNodePool_osDiskSizeGB
--- PASS: TestAccKubernetesClusterNodePool_autoScale (1416.60s)
=== CONT TestAccKubernetesClusterNodePool_kubeletAndLinuxOSConfigPartial
--- PASS: TestAccKubernetesClusterNodePool_manualScaleVMSku (1433.36s)
=== CONT TestAccKubernetesClusterNodePool_upgradeSettings
--- PASS: TestAccKubernetesClusterNodePool_manualScaleIgnoreChanges (877.97s)
=== CONT TestAccKubernetesClusterNodePool_spot
--- PASS: TestAccKubernetesClusterNodePool_manualScale (846.83s)
=== CONT TestAccKubernetesClusterNodePool_kubeletAndLinuxOSConfig
--- PASS: TestAccKubernetesClusterNodePool_manualScaleMultiplePools (1017.96s)
=== CONT TestAccKubernetesClusterNodePool_requiresImport
--- PASS: TestAccKubernetesClusterNodePool_osDiskSizeGB (812.83s)
=== CONT TestAccKubernetesClusterNodePool_capacityReservationGroup
--- PASS: TestAccKubernetesClusterNodePool_other (945.73s)
=== CONT TestAccKubernetesClusterNodePool_osDiskType
--- PASS: TestAccKubernetesClusterNodePool_kubeletAndLinuxOSConfigPartial (814.91s)
=== CONT TestAccKubernetesClusterNodePool_availabilityZones
--- PASS: TestAccKubernetesClusterNodePool_manualScaleMultiplePoolsUpdate (1173.20s)
=== CONT TestAccKubernetesClusterNodePool_proximityPlacementGroupId
--- PASS: TestAccKubernetesClusterNodePool_upgradeSettings (1076.84s)
=== CONT TestAccKubernetesClusterNodePool_autoScaleUpdate
--- PASS: TestAccKubernetesClusterNodePool_spot (789.22s)
=== CONT TestAccKubernetesClusterNodePool_hostEncryption
--- PASS: TestAccKubernetesClusterNodePool_kubeletAndLinuxOSConfig (722.50s)
=== CONT TestAccKubernetesClusterNodePool_osSkuMigration
--- PASS: TestAccKubernetesClusterNodePool_requiresImport (739.65s)
=== CONT TestAccKubernetesClusterNodePool_osSkuAzureLinux
--- PASS: TestAccKubernetesClusterNodePool_capacityReservationGroup (869.46s)
=== CONT TestAccKubernetesClusterNodePool_osSkuUbuntu
--- PASS: TestAccKubernetesClusterNodePool_availabilityZones (798.06s)
=== CONT TestAccKubernetesClusterNodePool_networkProfileUpdate
--- PASS: TestAccKubernetesClusterNodePool_osDiskType (891.25s)
=== CONT TestAccKubernetesClusterNodePool_ultraSSD
=== NAME TestAccKubernetesClusterNodePool_proximityPlacementGroupId
--- PASS: TestAccKubernetesClusterNodePool_proximityPlacementGroupId (977.11s)
=== CONT TestAccKubernetesClusterNodePool_updateVmSizeAfterFailureWithTempWithoutOriginal
--- PASS: TestAccKubernetesClusterNodePool_hostEncryption (772.17s)
=== CONT TestAccKubernetesClusterNodePool_updateVmSizeAfterFailureWithTempAndOriginal
--- PASS: TestAccKubernetesClusterNodePool_osSkuAzureLinux (725.02s)
=== CONT TestAccKubernetesClusterNodePool_sameSize
--- PASS: TestAccKubernetesClusterNodePool_osSkuMigration (1054.91s)
=== CONT TestAccKubernetesClusterNodePool_virtualNetworkOwnershipRaceCondition
--- PASS: TestAccKubernetesClusterNodePool_autoScaleUpdate (1259.98s)
=== CONT TestAccKubernetesClusterNodePool_maxSize
--- PASS: TestAccKubernetesClusterNodePool_osSkuUbuntu (794.91s)
=== CONT TestAccKubernetesClusterNodePool_windowsProfileOutboundNatEnabled
--- PASS: TestAccKubernetesClusterNodePool_networkProfileUpdate (992.36s)
=== CONT TestAccKubernetesClusterNodePool_networkProfileComplete
=== NAME TestAccKubernetesClusterNodePool_maxSize
testcase.go:173: Step 1/3 error: Error running apply: exit status 1

    Error: creating Agent Pool (Subscription: "a7c38d21-c587-4bd0-9913-67218cfdc5bf"
    Resource Group Name: "acctestRG-aks-250107173646112854"
    Managed Cluster Name: "acctestaks250107173646112854"
    Agent Pool Name: "internal"): performing CreateOrUpdate: unexpected status 400 (400 Bad Request) with response: {
      "code": "InsufficientSubnetSize",
      "details": null,
      "message": "Pre-allocated IPs 102400 exceeds IPs available 65536 in Subnet Cidr 10.244.0.0/16, Subnet Name networkProfile.podCIDR. If Autoscaler is enabled, the max-count from each nodepool is counted towards this total (which means that pre-allocated IPs count represents a theoretical max value, not the actual number of IPs requested). http://aka.ms/aks/insufficientsubnetsize",
      "subcode": "",
      "target": "networkProfile.podCIDR"
     }

      with azurerm_kubernetes_cluster_node_pool.test,
      on terraform_plugin_test.tf line 58, in resource "azurerm_kubernetes_cluster_node_pool" "test":
      58: resource "azurerm_kubernetes_cluster_node_pool" "test" {

--- PASS: TestAccKubernetesClusterNodePool_ultraSSD (1048.43s)
=== CONT TestAccKubernetesClusterNodePool_gpuInstance
--- FAIL: TestAccKubernetesClusterNodePool_maxSize (572.87s)
=== CONT TestAccKubernetesClusterNodePool_snapshotId
--- PASS: TestAccKubernetesClusterNodePool_sameSize (801.68s)
=== CONT TestAccKubernetesClusterNodePool_nodeLabels
=== NAME TestAccKubernetesClusterNodePool_snapshotId
testcase.go:173: Step 1/5 error: Pre-apply plan check(s) failed:
azurerm_kubernetes_cluster_node_pool.test - Resource not found in plan ResourceChanges
--- PASS: TestAccKubernetesClusterNodePool_windowsProfileOutboundNatEnabled (644.14s)
=== CONT TestAccKubernetesClusterNodePool_nodeIPTags
--- FAIL: TestAccKubernetesClusterNodePool_snapshotId (100.25s)
=== CONT TestAccKubernetesClusterNodePool_scaleDownMode
--- PASS: TestAccKubernetesClusterNodePool_updateVmSizeAfterFailureWithTempWithoutOriginal (1206.57s)
=== CONT TestAccKubernetesClusterNodePool_workloadRuntime
--- PASS: TestAccKubernetesClusterNodePool_updateVmSizeAfterFailureWithTempAndOriginal (1455.30s)
=== CONT TestAccKubernetesClusterNodePool_podSubnet
--- PASS: TestAccKubernetesClusterNodePool_gpuInstance (822.10s)
=== CONT TestAccKubernetesClusterNodePool_nodePublicIP
--- PASS: TestAccKubernetesClusterNodePool_networkProfileComplete (918.18s)
=== CONT TestAccKubernetesClusterNodePool_nodeTaints
--- PASS: TestAccKubernetesClusterNodePool_virtualNetworkOwnershipRaceCondition (1315.09s)
=== CONT TestAccKubernetesClusterNodePool_windows2022
--- PASS: TestAccKubernetesClusterNodePool_nodeIPTags (906.00s)
=== CONT TestAccKubernetesClusterNodePool_zeroSize
--- PASS: TestAccKubernetesClusterNodePool_nodeLabels (1076.94s)
=== CONT TestAccKubernetesClusterNodePool_turnOnEnableAutoScalingWithDefaultMaxMinCountSettings
--- PASS: TestAccKubernetesClusterNodePool_workloadRuntime (1017.44s)
=== CONT TestAccKubernetesClusterNodePool_windowsAndLinux
--- PASS: TestAccKubernetesClusterNodePool_scaleDownMode (1089.34s)
=== CONT TestAccKubernetesClusterNodePool_modeUpdate
=== NAME TestAccKubernetesClusterNodePool_windowsAndLinux
testcase.go:173: Step 1/5 error: Pre-apply plan check(s) failed:
azurerm_kubernetes_cluster_node_pool.test - Resource not found in plan ResourceChanges
--- FAIL: TestAccKubernetesClusterNodePool_windowsAndLinux (139.88s)
=== CONT TestAccKubernetesClusterNodePool_windows
--- PASS: TestAccKubernetesClusterNodePool_nodePublicIP (1075.35s)
=== CONT TestAccKubernetesClusterNodePool_virtualNetworkMultipleSubnet
--- PASS: TestAccKubernetesClusterNodePool_nodeTaints (1071.18s)
=== CONT TestAccKubernetesClusterNodePool_windows2019
--- PASS: TestAccKubernetesClusterNodePool_windows2022 (950.24s)
--- PASS: TestAccKubernetesClusterNodePool_podSubnet (1238.30s)
--- PASS: TestAccKubernetesClusterNodePool_zeroSize (798.43s)
--- PASS: TestAccKubernetesClusterNodePool_turnOnEnableAutoScalingWithDefaultMaxMinCountSettings (998.29s)
--- PASS: TestAccKubernetesClusterNodePool_windows (962.61s)
--- PASS: TestAccKubernetesClusterNodePool_modeUpdate (1191.22s)
--- PASS: TestAccKubernetesClusterNodePool_virtualNetworkMultipleSubnet (894.44s)
--- PASS: TestAccKubernetesClusterNodePool_windows2019 (926.59s)
FAIL
FAIL github.com/hashicorp/terraform-provider-azurerm/internal/services/containers 6938.076s
FAIL
make: *** [GNUmakefile:103: acctests] Error 1


CorrenSoft commented Oct 28, 2024

Note on test cases

TestAccKubernetesClusterNodePool_manualScaleVMSku and TestAccKubernetesClusterNodePool_ultraSSD were previously failing, stating that the resource couldn't be replaced. After adding a temporary name for the rotation, the tests pass, but I am not sure whether they are still worth keeping after this change (plus, I saw another PR removing at least one of them), or whether they should be rewritten as a single consolidated test for this specific use case.
Edit: Tests removed on merge from main.

TestAccKubernetesClusterNodePool_windowsAndLinux and TestAccKubernetesClusterNodePool_maxSize are failing, but the errors are pre-existing.

@CorrenSoft CorrenSoft changed the title from "azurerm_kubernetes_cluster_node_pool: Adds support for temporary_name_for_rotation`" to "azurerm_kubernetes_cluster_node_pool: Adds support for temporary_name_for_rotation" Oct 28, 2024

@stephybun stephybun left a comment


Thanks for this PR @CorrenSoft!

In addition to the comments and suggestions in-line, we also need to consider the behaviour when rotation of the node pool fails e.g. what happens if we fail to spin up the node pool with the new configuration and are left with the temporary node pool? It would be good if failures here are recoverable for the user.

The azurerm_kubernetes_cluster handles this by falling back on the temporary_name_for_rotation system node pool when we perform a read.
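
Roughly, that read-time fallback looks something like the sketch below (illustrative only, with assumed names and an assumed API version in the import path, not the actual azurerm_kubernetes_cluster code): if the pool can't be found under its configured name, try the temporary_name_for_rotation name before concluding it is gone.

package nodepoolrotation

import (
    "context"
    "fmt"

    "github.com/hashicorp/go-azure-helpers/lang/response"
    // NOTE: the API version segment below is an assumption for illustration only.
    "github.com/hashicorp/go-azure-sdk/resource-manager/containerservice/2024-05-01/agentpools"
)

// findNodePoolWithRotationFallback is an illustrative sketch: read the pool by
// its configured name and, if it is not found (e.g. a rotation failed part-way),
// fall back to looking it up under temporary_name_for_rotation.
func findNodePoolWithRotationFallback(ctx context.Context, client *agentpools.AgentPoolsClient, id agentpools.AgentPoolId, tempName string) (*agentpools.AgentPool, error) {
    resp, err := client.Get(ctx, id)
    if err == nil {
        return resp.Model, nil
    }
    if !response.WasNotFound(resp.HttpResponse) {
        return nil, fmt.Errorf("retrieving %s: %+v", id, err)
    }

    // Not found under the configured name: check the temporary pool instead.
    tempId := agentpools.NewAgentPoolID(id.SubscriptionId, id.ResourceGroupName, id.ManagedClusterName, tempName)
    tempResp, err := client.Get(ctx, tempId)
    if err != nil {
        return nil, fmt.Errorf("retrieving temporary %s: %+v", tempId, err)
    }
    return tempResp.Model, nil
}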

We then have special logic in the CustomizeDiff to prevent the node pool name from triggering a ForceNew and to allow the rotation logic to continue on from where it failed.
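
Conceptually, the CustomizeDiff guard works along these lines (a sketch using the upstream terraform-plugin-sdk customdiff helper rather than the provider's pluginsdk wrappers; the function name and wiring are illustrative): only treat a name change as requiring replacement when the old name is not the temporary_name_for_rotation value, so recovering from a failed rotation doesn't force a new resource.

package nodepoolrotation

import (
    "context"

    "github.com/hashicorp/terraform-plugin-sdk/v2/helper/customdiff"
    "github.com/hashicorp/terraform-plugin-sdk/v2/helper/schema"
)

// nameRotationCustomizeDiff is an illustrative sketch: a rename only forces
// replacement when the old name is not the temporary_name_for_rotation value,
// which lets the rotation logic continue from where a failed rotation left off.
func nameRotationCustomizeDiff() schema.CustomizeDiffFunc {
    return customdiff.ForceNewIf("name", func(ctx context.Context, d *schema.ResourceDiff, meta interface{}) bool {
        if !d.HasChange("name") {
            return false
        }
        oldName, _ := d.GetChange("name")
        tempName, _ := d.Get("temporary_name_for_rotation").(string)
        return tempName == "" || oldName.(string) != tempName
    })
}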

Test cases that simulate these failure scenarios should be added for this as well, I've linked two tests we wrote for the AKS resource below that can help you with those:

func TestAccKubernetesCluster_updateVmSizeAfterFailureWithTempWithoutDefault(t *testing.T) {
    data := acceptance.BuildTestData(t, "azurerm_kubernetes_cluster", "test")
    r := KubernetesClusterResource{}
    data.ResourceTest(t, r, []acceptance.TestStep{
        {
            Config: r.basicWithTempName(data),
            Check: acceptance.ComposeTestCheckFunc(
                check.That(data.ResourceName).ExistsInAzure(r),
                // create the temporary node pool and delete the old default node pool to simulate the case where resizing fails when trying to bring up the new node pool
                data.CheckWithClientForResource(func(ctx context.Context, clients *clients.Client, state *terraform.InstanceState) error {
                    if _, ok := ctx.Deadline(); !ok {
                        var cancel context.CancelFunc
                        ctx, cancel = context.WithTimeout(ctx, 1*time.Hour)
                        defer cancel()
                    }
                    client := clients.Containers.AgentPoolsClient
                    id, err := commonids.ParseKubernetesClusterID(state.Attributes["id"])
                    if err != nil {
                        return err
                    }
                    defaultNodePoolId := agentpools.NewAgentPoolID(id.SubscriptionId, id.ResourceGroupName, id.ManagedClusterName, state.Attributes["default_node_pool.0.name"])
                    resp, err := client.Get(ctx, defaultNodePoolId)
                    if err != nil {
                        return fmt.Errorf("retrieving %s: %+v", defaultNodePoolId, err)
                    }
                    if resp.Model == nil {
                        return fmt.Errorf("retrieving %s: model was nil", defaultNodePoolId)
                    }
                    tempNodePoolName := "temp"
                    profile := resp.Model
                    profile.Name = &tempNodePoolName
                    profile.Properties.VMSize = pointer.To("Standard_DS3_v2")
                    tempNodePoolId := agentpools.NewAgentPoolID(id.SubscriptionId, id.ResourceGroupName, id.ManagedClusterName, tempNodePoolName)
                    if err := client.CreateOrUpdateThenPoll(ctx, tempNodePoolId, *profile); err != nil {
                        return fmt.Errorf("creating %s: %+v", tempNodePoolId, err)
                    }
                    if err := client.DeleteThenPoll(ctx, defaultNodePoolId); err != nil {
                        return fmt.Errorf("deleting default %s: %+v", defaultNodePoolId, err)
                    }
                    return nil
                }, data.ResourceName),
            ),
            // the plan will show that the default node pool name has been set to "temp" and we're trying to set it back to "default"
            ExpectNonEmptyPlan: true,
        },
        {
            Config: r.updateVmSize(data, "Standard_DS3_v2"),
            Check: acceptance.ComposeTestCheckFunc(
                check.That(data.ResourceName).ExistsInAzure(r),
            ),
        },
        data.ImportStep("default_node_pool.0.temporary_name_for_rotation"),
    })
}

func TestAccKubernetesCluster_updateVmSizeAfterFailureWithTempAndDefault(t *testing.T) {
    data := acceptance.BuildTestData(t, "azurerm_kubernetes_cluster", "test")
    r := KubernetesClusterResource{}
    data.ResourceTest(t, r, []acceptance.TestStep{
        {
            Config: r.basic(data),
            Check: acceptance.ComposeTestCheckFunc(
                check.That(data.ResourceName).ExistsInAzure(r),
                // create the temporary node pool to simulate the case where both old default node pool and temp node pool exist
                data.CheckWithClientForResource(func(ctx context.Context, clients *clients.Client, state *terraform.InstanceState) error {
                    if _, ok := ctx.Deadline(); !ok {
                        var cancel context.CancelFunc
                        ctx, cancel = context.WithTimeout(ctx, 1*time.Hour)
                        defer cancel()
                    }
                    client := clients.Containers.AgentPoolsClient
                    id, err := commonids.ParseKubernetesClusterID(state.Attributes["id"])
                    if err != nil {
                        return err
                    }
                    defaultNodePoolId := agentpools.NewAgentPoolID(id.SubscriptionId, id.ResourceGroupName, id.ManagedClusterName, state.Attributes["default_node_pool.0.name"])
                    resp, err := client.Get(ctx, defaultNodePoolId)
                    if err != nil {
                        return fmt.Errorf("retrieving %s: %+v", defaultNodePoolId, err)
                    }
                    if resp.Model == nil {
                        return fmt.Errorf("retrieving %s: model was nil", defaultNodePoolId)
                    }
                    tempNodePoolName := "temp"
                    profile := resp.Model
                    profile.Name = &tempNodePoolName
                    profile.Properties.VMSize = pointer.To("Standard_DS3_v2")
                    tempNodePoolId := agentpools.NewAgentPoolID(id.SubscriptionId, id.ResourceGroupName, id.ManagedClusterName, tempNodePoolName)
                    if err := client.CreateOrUpdateThenPoll(ctx, tempNodePoolId, *profile); err != nil {
                        return fmt.Errorf("creating %s: %+v", tempNodePoolId, err)
                    }
                    return nil
                }, data.ResourceName),
            ),
        },
        {
            Config: r.updateVmSize(data, "Standard_DS3_v2"),
            Check: acceptance.ComposeTestCheckFunc(
                check.That(data.ResourceName).ExistsInAzure(r),
            ),
        },
        data.ImportStep("default_node_pool.0.temporary_name_for_rotation"),
    })
}

I hope that makes sense, let me know if you have any questions!

@stephybun
Member

Hey @CorrenSoft we have some customers requesting this feature. Would you be able to let me know whether you're planning to work through the review feedback that was left? I'm happy to take this over to get it in if you don't find yourself with time/energy at the moment to get back to this, just checking before I step on your toes here 🙂

@CorrenSoft
Contributor Author

Thanks for this PR @CorrenSoft!

In addition to the comments and suggestions in-line, we also need to consider the behaviour when rotation of the node pool fails e.g. what happens if we fail to spin up the node pool with the new configuration and are left with the temporary node pool? It would be good if failures here are recoverable for the user.

[…]

I hope that makes sense, let me know if you have any questions!

I see the point. I will do further testing to evaluate how it behaves and what can be done about it.

@CorrenSoft
Contributor Author

@stephybun,
I have included the indicated tests (with the corresponding updates) and they ran successfully. I have also performed some tests with a compiled version of the provider, forcing a failure by interrupting the update, and Terraform was able to recover from the inconsistent state.

Regarding the fallback in case of failure, it is not necessary here, since these additional node pools do not need to be (nor should they be) marked as default.


@stephybun stephybun left a comment


Thanks for adding those test cases @CorrenSoft! I think some old tests that were removed because they're no longer relevant post 4.0 crept their way into your PR. Could you please remove those and take a look at the additional comments I left in-line?

@CorrenSoft CorrenSoft requested a review from stephybun January 13, 2025 20:12

@stephybun stephybun left a comment


Thanks for taking the time to work on this @CorrenSoft! I know this will be much appreciated by the community and our customers.

The tests look good, I think this is ready to go in. LGTM 🚀

@stephybun stephybun merged commit 5a1f433 into hashicorp:main Jan 16, 2025
33 checks passed
@github-actions github-actions bot added this to the v4.16.0 milestone Jan 16, 2025
stephybun added a commit that referenced this pull request Jan 16, 2025
@CorrenSoft CorrenSoft deleted the feature/22265 branch January 16, 2025 13:29
jackofallops added a commit that referenced this pull request Jan 16, 2025
* update for #27680

* Update CHANGELOG.md for #28465

* Update CHANGELOG.md #27932

* Update CHANGELOG.md for #28505

* Update CHANGELOG.md for #28474

* Update CHANGELOG.md #28516

* Update CHANGELOG for #28456

* Update CHANGELOG.md for #28472

* Update CHANGELOG.md #28307

* Update CHANGELOG.md for #27859

* Update for #28519

* Update for #27791 #27528

* Update CHANGELOG.md for #28527

* update changelog links and generate provider schema

---------

Co-authored-by: jackofallops <[email protected]>
Co-authored-by: catriona-m <[email protected]>
Co-authored-by: sreallymatt <[email protected]>
Co-authored-by: Matthew Frahry <[email protected]>
Co-authored-by: jackofallops <[email protected]>
NotTheEvilOne pushed a commit to b1-systems/terraform-provider-azurerm that referenced this pull request Jan 20, 2025
azurerm_kubernetes_cluster_node_pool: Adds support for `temporary_name_for_rotation` (hashicorp#27791)

* Adds property.
Updates the NodePoolUpdate function to rotate the node pool.
Removes the ForceNew flag on properties.

* Updating tests.
Restoring name as ForceNew.

* Updating Docs.

* Fixing value assignment.
Deleting obsolete methods.
Renaming `retrySystemNodePoolCreation` to `retryNodePoolCreation`.

* Updating properties values from HCL definition.

* Remove unused function (schemaNodePoolSysctlConfigForceNew)

* Fixing docs

* Update pointer's function.

* Improving subnet assignment

* Fixing zones not being updated when value was set to null.

* Fixing assignment when value is null

* Restoring files lost on merge.

* Linting

* Adds 'TestAccKubernetesClusterNodePool_updateVmSizeAfterFailureWithTempAndOriginal'

* Adds TestAccKubernetesCluster_updateVmSizeAfterFailureWithTempWithoutOriginal

* Fix test's name.

* Removing deprecated test and applying feedback.

* Applying feedback.

* Removing obsolete code.
NotTheEvilOne pushed a commit to b1-systems/terraform-provider-azurerm that referenced this pull request Jan 20, 2025
* update for hashicorp#27680

* Update CHANGELOG.md for hashicorp#28465

* Update CHANGELOG.md hashicorp#27932

* Update CHANGELOG.md for hashicorp#28505

* Update CHANGELOG.md for hashicorp#28474

* Update CHANGELOG.md hashicorp#28516

* Update CHANGELOG for hashicorp#28456

* Update CHANGELOG.md for hashicorp#28472

* Update CHANGELOG.md hashicorp#28307

* Update CHANGELOG.md for hashicorp#27859

* Update for hashicorp#28519

* Update for hashicorp#27791 hashicorp#27528

* Update CHANGELOG.md for hashicorp#28527

* update changelog links and generate provider schema

---------

Co-authored-by: jackofallops <[email protected]>
Co-authored-by: catriona-m <[email protected]>
Co-authored-by: sreallymatt <[email protected]>
Co-authored-by: Matthew Frahry <[email protected]>
Co-authored-by: jackofallops <[email protected]>

Successfully merging this pull request may close these issues.

Support for temporary_name_for_rotation for other nodepools