
GKE autopilot is always created with default service account #8918

Closed
tSte opened this issue Apr 15, 2021 · 19 comments · Fixed by GoogleCloudPlatform/magic-modules#4894, #9399 or hashicorp/terraform-provider-google-beta#3361
Assignees: venkykuberan
Labels: bug, forward/review (In review; remove label to forward), service/container

Comments


tSte commented Apr 15, 2021

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request.
  • Please do not leave +1 or me too comments, they generate extra noise for issue followers and do not help prioritize the request.
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment.
  • If an issue is assigned to the modular-magician user, it is either in the process of being autogenerated, or is planned to be autogenerated soon. If an issue is assigned to a user, that user is claiming responsibility for the issue. If an issue is assigned to hashibot, a community member has claimed the issue already.

Terraform Version

Terraform v0.14.10

Affected Resource(s)

  • google_container_cluster

Terraform Configuration Files

terraform {
  required_version = "0.14.10"
  backend "gcs" { bucket = "hm-playground-terraform-state" }

  required_providers {
    google = {
      version = "3.64.0"
      source  = "hashicorp/google"
    }
  }
}

locals { project_id = "hmplayground" }

provider "google" {
  project = local.project_id
  region  = "europe-west3"
  zone    = "europe-west3-a"
}

resource "google_compute_network" "private_vpc" {
  provider    = google
  name        = "my-vpc"
  description = "My VPC private network"
  auto_create_subnetworks = false
}

locals {
  subnet_pod_range_name     = "my-pods"
  subnet_service_range_name = "my-services"
}

resource "google_compute_subnetwork" "cluster" {
  provider = google
  name = "my-subnet"
  network                  = google_compute_network.private_vpc.self_link
  private_ip_google_access = true

  ip_cidr_range = "10.5.16.64/26"

  secondary_ip_range {
    range_name    = local.subnet_pod_range_name
    ip_cidr_range = "10.0.64.0/18"
  }

  secondary_ip_range {
    range_name    = local.subnet_service_range_name
    ip_cidr_range = "10.4.16.0/20"
  }
}

resource "google_service_account" "gke" {
  account_id   = "my-gke-sa"
  display_name = "my-gke-sa"
}

resource "google_container_cluster" "main" {
  provider = google

  name        = "my-cluster"
  description = "My autocluster"
  location    = "europe-west3"

  network    = google_compute_network.private_vpc.self_link
  subnetwork = google_compute_subnetwork.cluster.self_link

  ip_allocation_policy {
    cluster_secondary_range_name  = local.subnet_pod_range_name
    services_secondary_range_name = local.subnet_service_range_name
  }

  private_cluster_config {
    enable_private_endpoint = false
    enable_private_nodes    = true
    master_ipv4_cidr_block  = "172.16.0.0/28"
  }

  master_auth {
    username = ""
    password = ""

    client_certificate_config {
      issue_client_certificate = false
    }
  }

  addons_config {
    cloudrun_config { disabled = true }
    http_load_balancing { disabled = false }
    horizontal_pod_autoscaling { disabled = false }
  }

  release_channel { channel = "REGULAR" }
  vertical_pod_autoscaling { enabled = true }

  enable_tpu              = false
  enable_legacy_abac      = false
  enable_kubernetes_alpha = false

  logging_service    = "logging.googleapis.com/kubernetes"
  monitoring_service = "monitoring.googleapis.com/kubernetes"

  enable_autopilot = true

  initial_node_count = 1
  node_config {
    service_account = google_service_account.gke.email

    oauth_scopes = [
      "https://www.googleapis.com/auth/service.management.readonly",
      "https://www.googleapis.com/auth/logging.write",
      "https://www.googleapis.com/auth/monitoring",
      "https://www.googleapis.com/auth/devstorage.read_only",
      "https://www.googleapis.com/auth/trace.append",
    ]
  }

  timeouts {
    create = "10m"
    update = "10m"
    delete = "10m"
  }
}

Debug Output

https://gist.github.com/tSte/16b3bfc369242c1e0e88878d869f6f83

Panic Output

Expected Behavior

GKE autopilot cluster is created with non-default service_account and oauth_scopes.

Actual Behavior

GKE autopilot cluster is created with the default service account and oauth_scopes.

Steps to Reproduce

  1. terraform apply

Important Factoids

gcloud CLI enables service account configuration.

I tried to use cluster_autoscaling.auto_provisioning_defaults.service_account, but cluster_autoscaling.enabled conflicts with enable_autopilot (see the sketch below).
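
For reference (a sketch, not part of the original report), the attempted workaround looked roughly like this; cluster_autoscaling and auto_provisioning_defaults are documented fields of google_container_cluster:

# Rejected at plan time: cluster_autoscaling conflicts with enable_autopilot.
resource "google_container_cluster" "main" {
  name             = "my-cluster"
  location         = "europe-west3"
  enable_autopilot = true

  cluster_autoscaling {
    enabled = true

    auto_provisioning_defaults {
      service_account = google_service_account.gke.email
      oauth_scopes    = ["https://www.googleapis.com/auth/cloud-platform"]
    }
  }
}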

References

  • #0000
tSte added the bug label Apr 15, 2021
venkykuberan self-assigned this Apr 15, 2021
venkykuberan (Contributor) commented:

Can you please attach the debug log? I want to look at the request/response.


tSte commented Apr 16, 2021

@venkykuberan sure, added.

ghost removed the waiting-response label Apr 16, 2021
venkykuberan (Contributor) commented:

@tSte The service account in your config is passed to the API in the Create request, but the API ignores it and associates the default service account with the cluster. I believe this is in line with Autopilot's other restrictions, so the provider is working as expected. That said, we will add a conflicts_with constraint between the service_account and enable_autopilot fields as an enhancement.


tSte commented Apr 21, 2021

@venkykuberan I'm not sure this is the case, because always using the default service account would conflict with the GKE hardening guide. The same applies to access scopes: if this assumption is correct, there is no way to create a GKE cluster with anything other than the default access scopes. I can confirm that gcloud container clusters create-auto with --service-account and --scopes has the same behaviour. Let me check with Google's support on this...


tSte commented Apr 27, 2021

@venkykuberan this came from GCP support:

To update you on the status of your request, it is in fact an issue with the GKE Autopilot cluster creation process where it always uses the default Service Account instead of taking the SA defined on the “--service-account” flag. As you correctly pointed out, not allowing the usage of a custom SA is not one of the best practices according to the GKE hardening guide.

venkykuberan (Contributor) commented:

@tSte we have added a fix for this condition here. Can you please try with the latest version of the provider?


tSte commented May 4, 2021

There is an open issue on this matter...

ghost removed the waiting-response label May 4, 2021
slevenick (Collaborator) commented:

Great find @tSte. We will likely need to track that issue to see what the resolution looks like before we can fix this in the provider. Until then, the restriction on specifying these fields at the same time seems to be the best solution.


tSte commented May 10, 2021

To be honest, I'm not sure what should be passed to the GCP REST API in order to create a GKE Autopilot cluster with a custom service account and access scopes (e.g. node_config.service_account vs. auto_provisioning_defaults.service_account). I hope the issue will be resolved soon...
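
For illustration (not part of the original comment), the two candidate locations in a clusters.create request body would look roughly like this; the field names are taken from the public container.googleapis.com v1 API reference, and the service account value is a placeholder:

{
  "cluster": {
    "name": "my-cluster",
    "autopilot": { "enabled": true },
    "nodeConfig": {
      "serviceAccount": "MY_SA@MY_PROJECT.iam.gserviceaccount.com"
    },
    "autoscaling": {
      "autoprovisioningNodePoolDefaults": {
        "serviceAccount": "MY_SA@MY_PROJECT.iam.gserviceaccount.com"
      }
    }
  }
}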


Kampe commented May 14, 2021

We see the same issue while attempting to create clusters even without Autopilot. The node_config block does not seem to respect the service account definition whatsoever. This causes problems when you do not use the default compute service account and have disabled it entirely. This seems very much like a regression.

This is using:

* hashicorp/google: version = "~> 3.67.0"
* hashicorp/google-beta: version = "~> 3.67.0"

slevenick (Collaborator) commented:

@Kampe that is definitely worrying. Can you provide a config that specifies node_config with a service account, where the resulting cluster doesn't use that service account?


Kampe commented May 14, 2021

node_config {
  service_account = data.terraform_remote_state.per_env.outputs.sa_email
  oauth_scopes    = local.scopes

  metadata = {
    disable-legacy-endpoints = "true"
  }

  disk_size_gb = local.disk_size_gb
  machine_type = local.machine_type
  preemptible  = false

  tags = [
    "platform-allow-all-internal",
    "platform-allow-htttp-external",
    "gke-platform-node",
  ]

  shielded_instance_config {
    enable_integrity_monitoring = true
    enable_secure_boot          = false
  }
}

Using this, with a pre-created service account supplied, the "default cluster node pool" is created with the default compute service account instead.

We use this service account for the "real" node pools we stand up after the default pool is removed at the end of cluster creation. The problem shows itself when creating clusters: Google never actually ties the nodes from the default pool to the cluster, so we can never create our "real" node pool, because the preceding operation times out after 30 minutes of waiting for health checks that are never satisfied.
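
Not part of Kampe's comment: a minimal sketch of that default-pool-sunset pattern, reusing the remote-state reference from the snippet above; remove_default_node_pool and google_container_node_pool are documented provider features, and the names/locations are hypothetical:

resource "google_container_cluster" "platform" {
  name     = "platform-cluster" # hypothetical name
  location = "us-central1"      # hypothetical location

  # Create the smallest allowed default pool, then delete it at the end
  # of cluster creation so only the separately managed pool remains.
  remove_default_node_pool = true
  initial_node_count       = 1
}

resource "google_container_node_pool" "real" {
  name       = "real-pool" # hypothetical name
  location   = "us-central1"
  cluster    = google_container_cluster.platform.name
  node_count = 3

  node_config {
    service_account = data.terraform_remote_state.per_env.outputs.sa_email
    oauth_scopes    = local.scopes
  }
}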

slevenick (Collaborator) commented:

Huh, how are you checking the service account that is set on the default node pool?

I'm setting up a cluster with the following config:

 

  resource "google_service_account" "default" {
  account_id   = "service-account-id"
  display_name = "Service Account"
}

resource "google_container_cluster" "primary" {
  name               = "my-cluster"
  location           = "us-central1-a"
  initial_node_count = 3
  node_config {
    service_account  = google_service_account.default.email
    oauth_scopes = [
      "https://www.googleapis.com/auth/cloud-platform"
    ]

    metadata         = {
      disable-legacy-endpoints = "true"
    }

    preemptible      = false
    tags             = [
      "platform-allow-all-internal", 
      "platform-allow-htttp-external", 
      "gke-platform-node"
      ]

    shielded_instance_config {
      enable_integrity_monitoring = true
      enable_secure_boot          = false
    }
  }
}

And I'm getting the following response from the API after creating it:

 "nodePools": [
  {
   "name": "default-pool",
   "config": {
    "machineType": "e2-medium",
    "diskSizeGb": 100,
    "oauthScopes": [
     "https://www.googleapis.com/auth/cloud-platform"
    ],
    "metadata": {
     "disable-legacy-endpoints": "true"
    },
    "imageType": "COS",
    "tags": [
     "platform-allow-all-internal",
     "platform-allow-htttp-external",
     "gke-platform-node"
    ],
    "serviceAccount": "service-account-id@$MY_PROJECT.iam.gserviceaccount.com",
    "diskType": "pd-standard",
    "shieldedInstanceConfig": {
     "enableIntegrityMonitoring": true
    }
   },

This seems to point to the service account being set correctly.


tSte commented May 19, 2021

There's an update re. the Autopilot issue, so I'll check it out with both the gcloud CLI and TF later today...
[EDIT]
...aaaand nope. They haven't fixed it yet.


tSte commented Jun 14, 2021

@venkykuberan, @slevenick an update:
I've managed to create an Autopilot cluster with a non-default service account as well as custom access scopes (even though the upstream issue wasn't updated):

gcloud container clusters create-auto "hm-test-eu" \
  --region "europe-west3" \
  --release-channel "regular" \
  --network "projects/myplayground/global/networks/my-vpc" \
  --subnetwork "projects/myplayground/regions/europe-west3/subnetworks/my-subnet" \
  --cluster-secondary-range-name="my-pods" \
  --services-secondary-range-name="my-services" \
  --enable-master-authorized-networks \
  --enable-private-nodes \
  --master-ipv4-cidr="172.16.0.16/28" \
  --service-account="[email protected]" \
  --scopes="logging-write,monitoring,storage-ro"

gcloud container clusters describe hm-test-eu --region europe-west3 | grep serviceAccount
    serviceAccount: [email protected]
  serviceAccount: [email protected]
    serviceAccount: [email protected]

However, the Terraform configuration does not allow me to create such a cluster; it still creates one with the default service account and default access scopes.

What is more, now that I've upgraded to the latest provider, enable_autopilot conflicts with node_config. So how do I create an Autopilot cluster with a custom service account and access scopes?

Not sure if relevant, but it is possible to create a GKE Autopilot cluster via the CLI and then import it into TF without changes.
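
Not part of the original comment: assuming the documented import ID format for google_container_cluster and the names from the gcloud command above, that import would look something like:

# Hypothetical resource address; the
# projects/{project}/locations/{location}/clusters/{name} ID format is documented.
terraform import google_container_cluster.main \
  projects/myplayground/locations/europe-west3/clusters/hm-test-eu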

slevenick (Collaborator) commented:

Hmmm, ok it looks like there has been some confusion around how this should work in the API and in Terraform.

Reading some internal docs it appears that we should now be able to set custom service accounts when using autopilot. I was under the impression that this was not a supported operation, so I marked the node_config field to conflict with enable_autopilot.

I'll need to do some testing on exactly how we are expected to send the custom service account when autopilot is enabled, but it should be possible now!

slevenick (Collaborator) commented:

Okay, I think the best solution in the short term is to remove the conflicts restriction on enable_autopilot + node_config. It seems like setting the service account via node_config now works when the cluster is in autopilot mode.
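
Not part of the original comment: a minimal sketch of that combination, reusing the service account resource from the original report; whether the API actually honours it is exactly what this issue tracks:

# Sketch only: assumes the conflicts_with restriction between
# enable_autopilot and node_config has been removed from the provider.
resource "google_container_cluster" "main" {
  name     = "my-cluster"
  location = "europe-west3"

  enable_autopilot = true

  # With the restriction lifted, the custom service account and scopes
  # are passed through node_config as for a standard cluster.
  node_config {
    service_account = google_service_account.gke.email
    oauth_scopes = [
      "https://www.googleapis.com/auth/logging.write",
      "https://www.googleapis.com/auth/monitoring",
      "https://www.googleapis.com/auth/devstorage.read_only",
    ]
  }
}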


tSte commented Jun 28, 2021

@slevenick may I ask how you tested this? When I create the cluster, it is still created with the default service account. And when I run terraform apply again, it prompts me to re-create the cluster with the change:

~ service_account   = "default" -> "[email protected]" # forces replacement

I'm still able to create a GKE cluster with a custom service account via the gcloud CLI.

github-actions bot commented:

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

github-actions bot locked as resolved and limited conversation to collaborators Jul 29, 2021
github-actions bot added the service/container and forward/review (In review; remove label to forward) labels Jan 14, 2025