-
Notifications
You must be signed in to change notification settings - Fork 9.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Prevent overwriting states with other unrelated states #7026
Conversation
Yes!! |
This LGTM on initial review - would love to get @mitchellh's thoughts on overall history and UX here as well as @jen20's eyes since he's been spending a lot of time upgrading state for 0.7! |
f.Close() | ||
if err != nil { | ||
t.Fatalf("err: %s", err) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The helper testStateFile
should be used here to save you a lot of lines and cleanup. :)
terraform/command/command_test.go
Line 136 in 48286ca
func testStateFile(t *testing.T, s *terraform.State) string { |
I made a few comments that need to be addressed. One other thing I'd recommend is actually documenting some of this behavior in perhaps a "state" internals page for: https://www.terraform.io/docs/internals/index.html That's probably outside the scope of this PR but its something that I think that should be documented soon since the state is slowly becoming more complex [to protect users] |
The lineage of a state is an identifier shared by a set of states whose serials are meaningfully comparable because they are produced by progressive Refresh/Apply operations from the same initial empty state. This is initialized as a type-4 (random) UUID when a new state is initialized and then preserved on all other changes. Since states before this change will not have lineage but users may wish to set a lineage for an existing state in order to get the safety benefits it will grow to imply, an empty lineage is considered to be compatible with all lineages.
Accidentally losing a state with resources in it can be anywhere from annoying to catastrophic. A common cause of such a problem is accidentally clobbering a remote state with an unrelated local state or vice-versa. Here we introduce safety checks that exploit the new "lineage" concept to ensure that once a state location is established it can only be updated with new states from the same lineage as the initial state. We also make the remote state caching mechanism treat a lineage mismatch as a confict, thus ensuring that automatic state syncing will stop if the user manages to get into a broken, mismatched state. As a special exception, we allow *empty* states to be "clobbered" regardless of lineage. This enables remote state to be configured easily in the common case where earlier user actions have implicitly caused an empty local state to be created, and also allows the remote state left behind from a config that has been destroyed to be overwritten by a new, unrelated state from a different config. The priority here is preventing actions that should never occur, so the UX is not polished. Later we may wish to make changes at higher levels of abstraction to either prevent these situations from arising in the first place (e.g. making remote configuration automatic based on config) or giving the user more guidance on resolution.
017d733
to
b34ff5f
Compare
Thanks for all the feedback! It looks like the feedback was on the remote state checking part, and I probably won't have time to iterate on this one more time before 0.7 final is cut, so I split the lineage part off into #7109 in the hope that it can get included in 0.7 in isolation. That will have no effect on the UX but will get this in the state format so that the UX change(s) can follow in a subsequent releases. |
I overwrote a remote state earlier today by mistake When trying to understand when a bogus local state would overwrite the remote one (my understanding is that today it is only based on serial number) I discovered the new lineage parameter in the state file which lead me to this PR I must say that this will be really helpful! |
I believe there are some more fundamental changes to remote state coming in the near future, and this is a pretty old patch anyway, so I'm going to close this out for now and maybe revisit this later once it's clearer how state management is changing. |
I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further. |
Through real-world use of Terraform in production I have encountered a variety of "gotchas" relating to state management. This PR is an attempt to tackle one of these where I think I have a reasonable path forward: protecting against overwriting an remote states.
One common way this occurs is through clumsy use of
terraform remote config
. One variant of this is to forget to enable remote state and then try to enable it afterwards:terraform apply
terraform destroy
(deletes all of the stuff in the local state, leaving an empty state)terraform remote config ...
Another variant is when switching from one remote to another:
terraform remote config ...
terraform apply
terraform remote config ...
This PR first introduces the idea of "state lineage", which allows us to determine when one state is an earlier or later version of another state vs. when the two states are entirely unrelated. Comparing
Serial
is only expected to produce meaningful results whenLineage
matches.Building on this, when syncing local and remote state, fail with an error if the two states are of different lineage. This includes running
terraform remote config
to enable or re-configure remote state when a local state is already present.As a special case, if local cache has different lineage but it is entirely empty of resources then just silently drop it and replace it with the remote state, since the loss of an empty state is trivial to recover from and this will reduce friction when users are bootstrapping a new config.
The primary goal here is to return an error when the user seems to be accidentally doing something dangerous, with as little impact as possible to "legitimate" workflows. In future version of Terraform we may make more fundamental changes to these features to help the user not make these mistakes in the first place, but this is intended as a short-term fix to reduce the risk of a state-related catastrophe.
This is the same set of changes as from #6540 except that the change to prevent applying a stale plan is removed, since it no longer works in light of the
plan
behavior change introduced in #6811.