BigQuery external table requires replacement even though no changes have been made #10919
Comments
Hi @jamiet-msm! I'm sorry you're running into this issue. Would you mind letting me know what version of the provider you are using?
Hi @megan07 Sorry, should have included that, shouldn't I. We're on 4.5.0:
terraform {
backend "gcs" {}
required_providers {
google = {
source = "hashicorp/google"
version = "4.5.0"
}
google-beta = {
source = "hashicorp/google"
version = "4.5.0"
}
}
}
Oh! No worries! I see now that you did, I wasn't looking in your configuration - sorry about that! Thanks!
@jamiet-msm Very odd, I was able to repro this issue, however, I made the exact same changes (added the extra columns in the config) and I didn't have any problem. Can you tell me the order of operations when you tried that?
@megan07 did you issue … (I think that's correct. Give it a try.)
I've updated a few typos in the original comment on this issue.
@jamiet-msm Hope I'm not adding too much noise to the original issue here, but I was successful in creating the table, with Terraform only detecting changes when the actual schema changes, using the following configuration:
resource "google_bigquery_table" "events" {
# Your regular configuration here.
# Terraform will detect changes to this property made by BigQuery, but we'll ignore them using the `lifecycle` block.
schema = var.schema
external_data_configuration {
autodetect = false
source_format = "NEWLINE_DELIMITED_JSON"
source_uris = ["..."]
# Use the exact same schema here. This one won't be changed by BigQuery, however Terraform will still detect the changes you make on purpose to this field.
schema = var.schema
hive_partitioning_options {
mode = "CUSTOM"
source_uri_prefix = "..."
}
}
# If the schema does change on purpose, you will need to be able to delete the table.
# In this case, as the table is only a view on Google Storage data, it should be safe to delete it.
deletion_protection = false
lifecycle {
ignore_changes = [
# BigQuery will return the effective schema, which contains differences (e.g. the partition column(s) is added to
# it). Recreation of the table should only be based on `external_data_configuration.schema`, which is only stored
# in the Terraform state, not BigQuery. This field contains exactly the input schema and can be used for diffs.
schema,
]
}
}
However I'm curious about your … I know that I'm a bit off-topic here, but as the solution could also imply improving schema management in Terraform, I thought I'd chime in.
To unblock your environment you can use ignore_changes: https://www.terraform.io/language/meta-arguments/lifecycle#ignore_changes
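A minimal sketch of that workaround, assuming the table resource from the configuration above (the resource name is illustrative; the exact attributes you ignore depend on your setup):
resource "google_bigquery_table" "events" {
  # ... regular table configuration ...

  lifecycle {
    # Ignore drift on the top-level schema, which BigQuery rewrites to
    # include the hive partition columns.
    ignore_changes = [
      schema,
    ]
  }
}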
Hi @ScottSuarez @flovouin, …
Hi @jamiet-msm, could you do one quick thing for me and completely move the schema into the external_data_configuration block?
Hi @megan07 …
@megan07 apologies, I missed this somehow, so I didn't attempt it. Agree with @EladDolev that this should be in the docs.
Why:
* I was recommended to do this by @megan07 at hashicorp/terraform-provider-google#10919 (comment) and it works!
This change addresses the need by:
* Moving the schema definition into external_data_configuration did the trick. After doing so I was able to apply and then a subsequent apply did not cause any changes.
@megan07 happy to report that your fix worked. Here is the commit with the fix: jamiet-msm/hive_partitioned_table_issue@f7f8e76 and here is the proof; note 0 changes on the subsequent apply:
terraform apply -var project=project-name -var region=europe-west2
An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
+ create
Terraform will perform the following actions:
# google_bigquery_dataset.hive_store will be created
+ resource "google_bigquery_dataset" "hive_store" {
+ creation_time = (known after apply)
+ dataset_id = "hive_store"
+ delete_contents_on_destroy = false
+ etag = (known after apply)
+ id = (known after apply)
+ last_modified_time = (known after apply)
+ location = "EU"
+ project = "project-name"
+ self_link = (known after apply)
+ access {
+ domain = (known after apply)
+ group_by_email = (known after apply)
+ role = (known after apply)
+ special_group = (known after apply)
+ user_by_email = (known after apply)
+ view {
+ dataset_id = (known after apply)
+ project_id = (known after apply)
+ table_id = (known after apply)
}
}
}
# google_bigquery_table.hive_table will be created
+ resource "google_bigquery_table" "hive_table" {
+ creation_time = (known after apply)
+ dataset_id = "hive_store"
+ deletion_protection = false
+ etag = (known after apply)
+ expiration_time = (known after apply)
+ id = (known after apply)
+ last_modified_time = (known after apply)
+ location = (known after apply)
+ num_bytes = (known after apply)
+ num_long_term_bytes = (known after apply)
+ num_rows = (known after apply)
+ project = "project-name"
+ schema = (known after apply)
+ self_link = (known after apply)
+ table_id = "messages"
+ type = (known after apply)
+ external_data_configuration {
+ autodetect = false
+ compression = "NONE"
+ schema = jsonencode(
[
+ {
+ mode = "NULLABLE"
+ name = "column1"
+ type = "STRING"
},
]
)
+ source_format = "NEWLINE_DELIMITED_JSON"
+ source_uris = [
+ "gs://project-name-bucket/publish/*",
]
+ hive_partitioning_options {
+ mode = "CUSTOM"
+ require_partition_filter = false
+ source_uri_prefix = "gs://project-name-bucket/publish/{dt:STRING}/{hr:STRING}/{min:STRING}"
}
}
}
# google_storage_bucket.bucket will be created
+ resource "google_storage_bucket" "bucket" {
+ force_destroy = false
+ id = (known after apply)
+ location = "EU"
+ name = "project-name-bucket"
+ project = "project-name"
+ self_link = (known after apply)
+ storage_class = "STANDARD"
+ uniform_bucket_level_access = (known after apply)
+ url = (known after apply)
}
# google_storage_bucket_object.fake_message will be created
+ resource "google_storage_bucket_object" "fake_message" {
+ bucket = "project-name-bucket"
+ content = (sensitive value)
+ content_type = (known after apply)
+ crc32c = (known after apply)
+ detect_md5hash = "different hash"
+ id = (known after apply)
+ kms_key_name = (known after apply)
+ md5hash = (known after apply)
+ media_link = (known after apply)
+ name = "publish/dt=2000-01-01/hr=00/min=00/fake_message.json"
+ output_name = (known after apply)
+ self_link = (known after apply)
+ storage_class = (known after apply)
}
Plan: 4 to add, 0 to change, 0 to destroy.
Do you want to perform these actions?
Terraform will perform the actions described above.
Only 'yes' will be accepted to approve.
Enter a value: yes
google_bigquery_dataset.hive_store: Creating...
google_storage_bucket.bucket: Creating...
google_bigquery_dataset.hive_store: Creation complete after 2s [id=projects/project-name/datasets/hive_store]
google_storage_bucket.bucket: Creation complete after 2s [id=project-name-bucket]
google_storage_bucket_object.fake_message: Creating...
google_storage_bucket_object.fake_message: Creation complete after 0s [id=project-name-bucket-publish/dt=2000-01-01/hr=00/min=00/fake_message.json]
google_bigquery_table.hive_table: Creating...
google_bigquery_table.hive_table: Creation complete after 1s [id=projects/project-name/datasets/hive_store/tables/messages]
Apply complete! Resources: 4 added, 0 changed, 0 destroyed.
➜ hive_partitioned_table_issue git:(master) ✗ terraform apply -var project=project-name -var region=europe-west2
google_storage_bucket.bucket: Refreshing state... [id=project-name-bucket]
google_bigquery_dataset.hive_store: Refreshing state... [id=projects/project-name/datasets/hive_store]
google_storage_bucket_object.fake_message: Refreshing state... [id=project-name-bucket-publish/dt=2000-01-01/hr=00/min=00/fake_message.json]
google_bigquery_table.hive_table: Refreshing state... [id=projects/project-name/datasets/hive_store/tables/messages]
Apply complete! Resources: 0 added, 0 changed, 0 destroyed.
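For anyone landing here later, a minimal sketch of the arrangement that produced the clean second apply above, based on the plan output (URIs, schema contents, and the dataset reference are taken from the repro and are illustrative): there is no top-level schema argument, and the schema is declared only inside external_data_configuration.
resource "google_bigquery_table" "hive_table" {
  dataset_id          = google_bigquery_dataset.hive_store.dataset_id
  table_id            = "messages"
  deletion_protection = false

  external_data_configuration {
    autodetect    = false
    source_format = "NEWLINE_DELIMITED_JSON"
    source_uris   = ["gs://project-name-bucket/publish/*"]

    # The schema lives here, not at the top level, so BigQuery's computed
    # table schema (which gains the hive partition columns) never diffs
    # against the configuration.
    schema = jsonencode([
      { name = "column1", type = "STRING", mode = "NULLABLE" },
    ])

    hive_partitioning_options {
      mode              = "CUSTOM"
      source_uri_prefix = "gs://project-name-bucket/publish/{dt:STRING}/{hr:STRING}/{min:STRING}"
    }
  }
}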
I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
Community Note
If an issue is assigned to the modular-magician user, it is either in the process of being autogenerated, or is planned to be autogenerated soon. If an issue is assigned to a user, that user is claiming responsibility for the issue. If an issue is assigned to hashibot, a community member has claimed the issue already.
Terraform Version
Terraform v0.14.9
Affected Resource(s)
google_bigquery_table
Terraform Configuration Files
Debug Output
https://github.com/jamiet-msm/hive_partitioned_table_issue/blob/master/debug_output
Expected Behavior
Apply this configuration multiple times without terraform attempting to destroy anything or make changes.
In other words, if I apply and then make zero changes to the code then terraform should attempt to make zero changes to the deployed infrastructure.
Actual Behavior
On subsequent applies terraform destroys the columns that have been defined as hive partitioning columns:
Steps to Reproduce
1. Run terraform apply -var project=projectname -var region=europe-west2; everything gets deployed.
2. Run bq query --nouse_legacy_sql "select * from projectname.hive_store.messages"
3. Run terraform apply -var project=projectname -var region=europe-west2 again; terraform determines some columns need to be removed from the table. The debug output from this operation is linked to above.
Important Factoids
I am attempting to create a BigQuery external table that follows a hive partitioning layout, see Querying externally partitioned data for more information.
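For reference, the repro's bucket layout follows the hive key=value path convention, and the table's source_uri_prefix declares the matching partition keys (both values are taken from the plan output earlier in this thread):
gs://project-name-bucket/publish/dt=2000-01-01/hr=00/min=00/fake_message.json
source_uri_prefix = "gs://project-name-bucket/publish/{dt:STRING}/{hr:STRING}/{min:STRING}"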
As far as I know I have defined the table in my terraform configuration correctly (I admit it's possible that I have not; however, I have tried different combinations and can't get the desired behaviour).
The problem here is that the partitioning columns, which are not originally defined in the table schema, seemingly get added to the table schema and then when I come to apply again terraform observes that those columns are not part of the schema as defined in the terraform configuration and attempts to remove them.
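To illustrate (the column name comes from the repro; the exact shape and modes of what BigQuery returns are an assumption based on the behaviour described here): the configured external schema declares only the data columns, while the schema read back from BigQuery also contains the partition keys taken from the source URI prefix.
Configured:
[
  { "name": "column1", "type": "STRING", "mode": "NULLABLE" }
]
Read back from BigQuery:
[
  { "name": "column1", "type": "STRING", "mode": "NULLABLE" },
  { "name": "dt",      "type": "STRING", "mode": "NULLABLE" },
  { "name": "hr",      "type": "STRING", "mode": "NULLABLE" },
  { "name": "min",     "type": "STRING", "mode": "NULLABLE" }
]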
This is a problem because we have an automated deployment pipeline that deploys changes to our environments, but it only does so if terraform doesn't do anything destructive. If terraform reports that it WILL do something destructive then we intentionally halt the deployment, because we want a human being to decide whether the destruction is valid or not. With the behaviour described here, every single apply causes a destruction, so we've lost our ability to deploy automatically. This is a major problem for us: automation is the bedrock of our team's successes, and if we have to deploy everything manually we lose the ability to deploy many times a day, which is what we have been doing up to now.
I acknowledge that it is BigQuery's API that is reporting that these columns are now part of the schema; however, we need the terraform provider to recognise that the added columns are the partitioning columns and therefore not attempt to make any changes.
I did try to circumvent the problem by defining those columns as part of the schema, thus my resource changed to:
however attempting to apply the configuration with those changes caused error:
which makes sense of course.
Quite simply, I need to be able to apply this table without causing changes on subsequent applies.