Add support disable-default-snat for GKE #6537

Closed
tvvignesh opened this issue Jun 5, 2020 · 12 comments

@tvvignesh

tvvignesh commented Jun 5, 2020

Hi. I tried setting up Private GKE clusters using Terraform. While I was able to successfully set it up for 3 clusters, one of the clusters keeps failing with this error:

[Screenshot of the error, taken 2020-06-05 16:47:14]

I found information about the error here (disabling SNAT): https://cloud.google.com/kubernetes-engine/docs/how-to/alias-ips#cannot_use_--disable-default-snat_without_--enable-ip-alias

May I know how to solve this using Terraform?

My Sample config:

module "tc-dev-ops-1" {
  source                    = "../modules/gke-cluster"
  project_id                = var.core_project_id
  name                      = "myapp-1"
  description               = "Cluster 1 (asia-south1-a)"
  location                  = "asia-south1-a"
  network                   = module.core-vpc-host-1.self_link
  subnetwork                = module.core-vpc-host-1.subnet_self_links["asia-south1/app-subnet-as1-1"]
  secondary_range_pods      = "pods"
  secondary_range_services  = "services"
  default_max_pods_per_node = 100
  master_authorized_ranges = {
    internal-vms = "10.0.0.0/8"
  }
  private_cluster_config = {
    enable_private_nodes    = true
    enable_private_endpoint = true
    master_ipv4_cidr_block  = "192.168.70.0/28"
  }
  labels = {
    app         = "myapp"
    environment = "dev"
    location    = "asia-south1-a"
    module      = "ops"
  }
  addons = {
    horizontal_pod_autoscaling            = true
    http_load_balancing                   = true
    network_policy_config                 = true
    gce_persistent_disk_csi_driver_config = true
    cloudrun_config                       = false
    dns_cache_config                      = false
    istio_config = {
      enabled = false
      tls     = false
    }
  }

  cluster_autoscaling = {
    enabled    = true
    cpu_min    = 5
    cpu_max    = 40
    memory_min = 10
    memory_max = 50
  }

  enable_intranode_visibility = true
  enable_shielded_nodes       = true
  enable_tpu                  = false
  enable_binary_authorization = false
  pod_security_policy         = true
  release_channel             = "RAPID"
  vertical_pod_autoscaling    = true
  workload_identity           = true
  maintenance_start_time      = "21:30"
}
@ghost ghost added bug labels Jun 5, 2020
@edwardmedia edwardmedia self-assigned this Jun 5, 2020
@edwardmedia edwardmedia added question and removed bug labels Jun 5, 2020
@edwardmedia
Contributor

@tvvignesh can you share your module code so I can repro the issue? In the meantime, can you post your debug log? Thank you

@tvvignesh
Author

tvvignesh commented Jun 5, 2020

@edwardmedia Thanks for your quick reply. I am using this as the module for the GKE Cluster without any changes: https://github.com/terraform-google-modules/cloud-foundation-fabric/tree/master/modules/gke-cluster

Is that what you were asking for?

Here are the last few lines of the TRACE log. The full log is large and I am not sure whether it contains credentials or other sensitive info, so I am posting only the tail:

2020-06-05T22:35:15.202+0530 [DEBUG] plugin.terraform-provider-google-beta_v3.23.0_x5: -----------------------------------------------------
2020-06-05T22:35:15.202+0530 [DEBUG] plugin.terraform-provider-google-beta_v3.23.0_x5: 2020/06/05 22:35:15 [DEBUG] Retry Transport: Stopping retries, last request was successful
2020-06-05T22:35:15.202+0530 [DEBUG] plugin.terraform-provider-google-beta_v3.23.0_x5: 2020/06/05 22:35:15 [DEBUG] Retry Transport: Returning after 1 attempts
2020-06-05T22:35:15.202+0530 [DEBUG] plugin.terraform-provider-google-beta_v3.23.0_x5: 2020/06/05 22:35:15 [DEBUG] Got DONE while polling for operation operation-1591376704443-6deccea5's status
2020-06-05T22:35:15.202+0530 [DEBUG] plugin.terraform-provider-google-beta_v3.23.0_x5: 2020/06/05 22:35:15 [INFO] GKE cluster projects/timecampus-dev-core/locations/asia-south1-a/clusters/tc-dev-ops-1 has been deleted
2020-06-05T22:35:15.202+0530 [DEBUG] plugin.terraform-provider-google-beta_v3.23.0_x5: 2020/06/05 22:35:15 [WARN] Verified failed creation of cluster  was cleaned up
2020-06-05T22:35:15.202+0530 [DEBUG] plugin.terraform-provider-google-beta_v3.23.0_x5: 2020/06/05 22:35:15 [DEBUG] Unlocking "google-container-cluster/timecampus-dev-core/asia-south1-a/tc-dev-ops-1"
2020-06-05T22:35:15.202+0530 [DEBUG] plugin.terraform-provider-google-beta_v3.23.0_x5: 2020/06/05 22:35:15 [DEBUG] Unlocked "google-container-cluster/timecampus-dev-core/asia-south1-a/tc-dev-ops-1"
2020/06/05 22:35:15 [DEBUG] module.tc-dev-ops-1.google_container_cluster.cluster: apply errored, but we're indicating that via the Error pointer rather than returning it: Error waiting for creating GKE cluster: Must disable default sNAT (--disable-default-snat) before using public IP privately in the cluster.
2020/06/05 22:35:15 [TRACE] module.tc-dev-ops-1: eval: *terraform.EvalMaybeTainted
2020/06/05 22:35:15 [TRACE] EvalMaybeTainted: module.tc-dev-ops-1.google_container_cluster.cluster encountered an error during creation, so it is now marked as tainted
2020/06/05 22:35:15 [TRACE] module.tc-dev-ops-1: eval: *terraform.EvalWriteState
2020/06/05 22:35:15 [TRACE] EvalWriteState: removing state object for module.tc-dev-ops-1.google_container_cluster.cluster
2020/06/05 22:35:15 [TRACE] module.tc-dev-ops-1: eval: *terraform.EvalApplyProvisioners
2020/06/05 22:35:15 [TRACE] EvalApplyProvisioners: google_container_cluster.cluster has no state, so skipping provisioners
2020/06/05 22:35:15 [TRACE] module.tc-dev-ops-1: eval: *terraform.EvalMaybeTainted
2020/06/05 22:35:15 [TRACE] EvalMaybeTainted: module.tc-dev-ops-1.google_container_cluster.cluster encountered an error during creation, so it is now marked as tainted
2020/06/05 22:35:15 [TRACE] module.tc-dev-ops-1: eval: *terraform.EvalWriteState
2020/06/05 22:35:15 [TRACE] EvalWriteState: removing state object for module.tc-dev-ops-1.google_container_cluster.cluster
2020/06/05 22:35:15 [TRACE] module.tc-dev-ops-1: eval: *terraform.EvalIf
2020/06/05 22:35:15 [TRACE] module.tc-dev-ops-1: eval: *terraform.EvalIf
2020/06/05 22:35:15 [TRACE] module.tc-dev-ops-1: eval: *terraform.EvalWriteDiff
2020/06/05 22:35:15 [TRACE] module.tc-dev-ops-1: eval: *terraform.EvalApplyPost
2020/06/05 22:35:15 [ERROR] module.tc-dev-ops-1: eval: *terraform.EvalApplyPost, err: Error waiting for creating GKE cluster: Must disable default sNAT (--disable-default-snat) before using public IP privately in the cluster.
2020/06/05 22:35:15 [ERROR] module.tc-dev-ops-1: eval: *terraform.EvalSequence, err: Error waiting for creating GKE cluster: Must disable default sNAT (--disable-default-snat) before using public IP privately in the cluster.
2020/06/05 22:35:15 [TRACE] [walkApply] Exiting eval tree: module.tc-dev-ops-1.google_container_cluster.cluster
2020/06/05 22:35:15 [TRACE] vertex "module.tc-dev-ops-1.google_container_cluster.cluster": visit complete
2020/06/05 22:35:15 [TRACE] dag/walk: upstream of "module.tc-dev-ops-1.output.name" errored, so skipping
2020/06/05 22:35:15 [TRACE] dag/walk: upstream of "module.tc-dev-ops-1.output.location" errored, so skipping
2020/06/05 22:35:15 [TRACE] dag/walk: upstream of "module.tc-dev-ops-1-np-1.var.cluster_name" errored, so skipping
2020/06/05 22:35:15 [TRACE] dag/walk: upstream of "module.tc-dev-ops-1-np-1.var.location" errored, so skipping
2020/06/05 22:35:15 [TRACE] dag/walk: upstream of "module.tc-dev-ops-1-np-1.google_container_node_pool.nodepool" errored, so skipping
2020/06/05 22:35:15 [TRACE] dag/walk: upstream of "provider.google-beta (close)" errored, so skipping
2020/06/05 22:35:15 [TRACE] dag/walk: upstream of "meta.count-boundary (EachMode fixup)" errored, so skipping
2020/06/05 22:35:15 [TRACE] dag/walk: upstream of "root" errored, so skipping
2020/06/05 22:35:15 [TRACE] statemgr.Filesystem: creating backup snapshot at terraform.tfstate.backup
2020/06/05 22:35:15 [TRACE] statemgr.Filesystem: state has changed since last snapshot, so incrementing serial to 136
2020/06/05 22:35:15 [TRACE] statemgr.Filesystem: writing snapshot at terraform.tfstate

Error: Error waiting for creating GKE cluster: Must disable default sNAT (--disable-default-snat) before using public IP privately in the cluster.

  on ../modules/gke-cluster/main.tf line 12, in resource "google_container_cluster" "cluster":
  12: resource "google_container_cluster" "cluster" {



2020/06/05 22:35:15 [TRACE] statemgr.Filesystem: removing lock metadata file .terraform.tfstate.lock.info
2020/06/05 22:35:15 [TRACE] statemgr.Filesystem: unlocking terraform.tfstate using fcntl flock
Error: error creating NodePool: googleapi: Error 400: Node version "1.17.5-gke.0" is unsupported., badRequest

  on ../modules/gke-nodepool/main.tf line 2, in resource "google_container_node_pool" "nodepool":
   2: resource "google_container_node_pool" "nodepool" {


2020-06-05T22:35:15.322+0530 [DEBUG] plugin: plugin process exited: path=/opt/timecampus/ops/timecampus-cloud/gcp/dev/.terraform/plugins/linux_amd64/terraform-provider-google-beta_v3.23.0_x5 pid=76662
2020-06-05T22:35:15.322+0530 [DEBUG] plugin: plugin exited

@tvvignesh
Author

Also wondering why it's picking up 1.17.5-gke.0 and saying it is unsupported when gke.9 has been released. I have not specified a version anywhere; I have only set the release channel to RAPID.

2020/06/05 23:12:50 [TRACE] module.tc-dev-core-2-np-1: eval: *terraform.EvalWriteState
2020/06/05 23:12:50 [TRACE] EvalWriteState: recording 2 dependencies for module.tc-dev-core-2-np-1.google_container_node_pool.nodepool
2020/06/05 23:12:50 [TRACE] EvalWriteState: writing current state object for module.tc-dev-core-2-np-1.google_container_node_pool.nodepool
2020/06/05 23:12:50 [TRACE] module.tc-dev-core-2-np-1: eval: *terraform.EvalApplyProvisioners
2020/06/05 23:12:50 [TRACE] EvalApplyProvisioners: google_container_node_pool.nodepool is tainted, so skipping provisioning
2020/06/05 23:12:50 [TRACE] module.tc-dev-core-2-np-1: eval: *terraform.EvalMaybeTainted
2020/06/05 23:12:50 [TRACE] EvalMaybeTainted: module.tc-dev-core-2-np-1.google_container_node_pool.nodepool was already tainted, so nothing to do
2020/06/05 23:12:50 [TRACE] module.tc-dev-core-2-np-1: eval: *terraform.EvalWriteState
2020/06/05 23:12:50 [TRACE] EvalWriteState: recording 2 dependencies for module.tc-dev-core-2-np-1.google_container_node_pool.nodepool
2020/06/05 23:12:50 [TRACE] EvalWriteState: writing current state object for module.tc-dev-core-2-np-1.google_container_node_pool.nodepool
2020/06/05 23:12:50 [TRACE] module.tc-dev-core-2-np-1: eval: *terraform.EvalIf
2020/06/05 23:12:50 [TRACE] module.tc-dev-core-2-np-1: eval: *terraform.EvalIf
2020/06/05 23:12:50 [TRACE] module.tc-dev-core-2-np-1: eval: *terraform.EvalWriteDiff
2020/06/05 23:12:50 [TRACE] module.tc-dev-core-2-np-1: eval: *terraform.EvalApplyPost
2020/06/05 23:12:50 [ERROR] module.tc-dev-core-2-np-1: eval: *terraform.EvalApplyPost, err: error creating NodePool: googleapi: Error 400: Node version "1.17.5-gke.0" is unsupported., badRequest
2020/06/05 23:12:50 [ERROR] module.tc-dev-core-2-np-1: eval: *terraform.EvalSequence, err: error creating NodePool: googleapi: Error 400: Node version "1.17.5-gke.0" is unsupported., badRequest

@tvvignesh
Author

@edwardmedia I have searched the Terraform documentation thoroughly and cannot find a way to do this currently. Puzzled 😕

@edwardmedia
Contributor

edwardmedia commented Jun 8, 2020

@tvvignesh can you post the request and response sections of the log? Without the full log, it is difficult to tell why 1.17.5-gke.0 was picked. Can you try setting node_version explicitly? Your module uses a lot of dynamic blocks. Can you repro the issue with hard-coded configs?
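
For anyone trying the explicit-version suggestion, here is a minimal sketch. It assumes the node pool is declared directly rather than through the module, and that the `google_container_engine_versions` data source is used to look up a version the zone actually offers. The resource names, node pool name, and the "1.17." prefix are placeholders for illustration, not values confirmed in this thread. Note that on the node pool resource the argument is `version`, while `node_version` belongs to `google_container_cluster`.

```hcl
# Hypothetical names throughout; adjust to your own project and cluster.

# Look up the versions currently offered in the target zone.
data "google_container_engine_versions" "zone" {
  provider       = google-beta
  location       = "asia-south1-a"
  version_prefix = "1.17."
}

resource "google_container_node_pool" "nodepool" {
  provider   = google-beta
  name       = "example-np-1"
  cluster    = "tc-dev-ops-1"
  location   = "asia-south1-a"
  node_count = 1

  # Pin the node version explicitly instead of relying on the default.
  version = data.google_container_engine_versions.zone.latest_node_version
}
```

Pinning a version on a cluster enrolled in a release channel may produce plan diffs once nodes are auto-upgraded, so this is probably best treated as a debugging aid rather than a permanent setting.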

@mkushakov

Hello, I have the same problem, but I am using the terraform-google-modules/kubernetes-engine/google//modules/beta-private-cluster module.

I am trying to create a private cluster on the REGULAR channel with access to the master over the internet.

Here is my TF config:

module "gke" {
  source          = "terraform-google-modules/kubernetes-engine/google//modules/beta-private-cluster"
  project_id      = local.project_id
  name            = "gke-${local.region}"
  release_channel = "REGULAR"
  regional        = false
  region          = local.region
  zones = [
    "${local.region}-c",
    "${local.region}-d"
  ]

  network           = module.vpc.network_name
  subnetwork        = module.vpc.subnets_names[0]
  ip_range_pods     = "gke-${local.region}-pods"
  ip_range_services = "gke-${local.region}-services"

  horizontal_pod_autoscaling = true
  remove_default_node_pool   = true

  enable_private_nodes   = true
  master_ipv4_cidr_block = "172.116.0.0/28"
  master_authorized_networks = [
    {
      cidr_block   = "0.0.0.0/0"
      display_name = "All"
    }
  ]
}

And here is the error message I get:

2020/06/09 10:20:06 [ERROR] module.gke: eval: *terraform.EvalSequence, err: Error waiting for creating GKE cluster: Must disable default sNAT (--disable-default-snat) before using public IP privately in the cluster.

Error: Error waiting for creating GKE cluster: Must disable default sNAT (--disable-default-snat) before using public IP privately in the cluster.

  on .terraform/modules/gke/terraform-google-kubernetes-engine-9.2.0/modules/beta-private-cluster/cluster.tf line 22, in resource "google_container_cluster" "primary":
  22: resource "google_container_cluster" "primary" {

@ghost ghost removed the waiting-response label Jun 9, 2020
@edwardmedia
Contributor

edwardmedia commented Jun 9, 2020

--disable-default-snat has not been implemented in the provider yet. Changing this to an enhancement for backlog planning.
https://cloud.google.com/kubernetes-engine/docs/reference/rest/v1beta1/projects.locations.clusters#Cluster.DefaultSnatStatus
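
For reference, the REST field linked above maps naturally onto a nested block on the cluster resource. The following is a minimal sketch, assuming a later provider release that exposes it as a `default_snat_status` block on `google_container_cluster`; it is not available in the v3.23.0 provider shown in the logs in this thread. The resource name, cluster name, network, subnetwork, and secondary range names are placeholders.

```hcl
# Sketch only: requires a provider release that supports default_snat_status.
resource "google_container_cluster" "example" {
  provider = google-beta
  name     = "example-private-cluster"
  location = "asia-south1-a"

  network    = "example-vpc"
  subnetwork = "example-subnet"

  initial_node_count = 1

  # Disabling default SNAT requires a VPC-native (alias IP) cluster,
  # per the GKE documentation linked earlier in this thread.
  ip_allocation_policy {
    cluster_secondary_range_name  = "pods"
    services_secondary_range_name = "services"
  }

  private_cluster_config {
    enable_private_nodes    = true
    enable_private_endpoint = false
    master_ipv4_cidr_block  = "192.168.70.0/28"
  }

  # Terraform counterpart of the gcloud flag --disable-default-snat.
  default_snat_status {
    disabled = true
  }
}
```

Users of wrapper modules would need the module itself to pass this setting through to the underlying resource.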

@edwardmedia edwardmedia changed the title from "Disabling snat using Terraform - GKE" to "Add support disable-default-snat for GKE" Jun 9, 2020
@ghost ghost added bug labels Jun 9, 2020
@edwardmedia edwardmedia removed their assignment Jun 9, 2020
@tvvignesh
Author

CC: @danawillow

@edwardmedia
Contributor

Duplicate of #6465

@abhianand09

We have a similar requirement to create a private GKE cluster, but because the "default sNAT" feature is missing in Terraform, we are blocked for now.

Error: Error waiting for creating GKE cluster: Must disable default sNAT (--disable-default-snat) before using public IP privately in the cluster.

Can you let me know when this support will be available in Terraform for GKE?

@danawillow
Contributor

@abhianand09 we're tracking it in #6465, which this issue was marked as a duplicate of. I'd expect a few weeks max.

@ghost

ghost commented Jul 11, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. If you feel I made an error 🤖 🙉 , please reach out to my human friends 👉 [email protected]. Thanks!

@ghost ghost locked and limited conversation to collaborators Jul 11, 2020