-
Notifications
You must be signed in to change notification settings - Fork 103
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add safe-delete workspace API, and org setting to restrict force-delete #539
Conversation
organization.go
Outdated
@@ -78,6 +78,7 @@ type Organization struct { | |||
TrialExpiresAt time.Time `jsonapi:"attr,trial-expires-at,iso8601"` | |||
TwoFactorConformant bool `jsonapi:"attr,two-factor-conformant"` | |||
SendPassingStatusesForUntriggeredSpeculativePlans bool `jsonapi:"attr,send-passing-statuses-for-untriggered-speculative-plans"` | |||
AllowForceDeleteWorkspaces bool `jsonapi:"attr,allow-force-delete-workspaces"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this property is missing for older TFE versions, the default will be false. And then, of course, all deletes will be force deletes because safe delete isn't implemented. I think we are OK with this contradiction but I wanted to point it out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is a really good point...I am also ok with the contradiction, it seems less confusing than making this a *bool
or something when it isn't strictly necessary and we use regular bool
s are used for everything else in this struct. I also cant think of a less confusing name...
I will add a comment though to make it clear that this will be false for older TFE versions, even though all deletes are force deletes in that case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only thing I saw was that little duplicate assertion, apart from that I'd say this PR looks great 💯
workspace_integration_test.go
Outdated
w, err := client.Workspaces.ReadByID(ctx, wTest.ID) | ||
require.NoError(t, err) | ||
assert.Equal(t, wTest, w) | ||
assert.Equal(t, wTest, w) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
duplicate assertion here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch, thanks!
302c326
to
354728c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks great 👍 . There are a few minor blocking issues concerning your tests. I also wanted to mention the changelog is a bit out of order at the moment and will be resolved in #549
organization_integration_test.go
Outdated
} | ||
|
||
org, err := client.Organizations.Create(ctx, options) | ||
assert.Nil(t, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be require.NoError(t, error)
-- reason being is the assert
package will not halt the execution of the test if it fails, leading to more cascading errors and more noise. Make sure to change all error checks to use require
👍
organization_integration_test.go
Outdated
t.Cleanup(func() { | ||
err := client.Organizations.Delete(ctx, org.Name) | ||
if err != nil { | ||
t.Logf("error deleting organization (%s): %s", org.Name, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Normally we t.Errorf()
when cleanup fails.
@@ -565,7 +565,41 @@ func TestOrganizationsReadRunTasksEntitlement(t *testing.T) { | |||
assert.NotEmpty(t, entitlements.ID) | |||
assert.True(t, entitlements.RunTasks) | |||
}) | |||
} | |||
|
|||
func TestOrganizationsAllowForceDeleteSetting(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Our docs should be much more explicit, but we've recently introduced test splitting in our CI and we're requiring all top-level tests to call skipIfNotCINode(t)
as the first check. This will ensure the test runs only once on the correct test node instead of on every single node. You do not need to do this for subtests.
workspace_integration_test.go
Outdated
orgTest, orgTestCleanup := createOrganization(t, client) | ||
defer orgTestCleanup() | ||
|
||
wTest, _ := createWorkspace(t, client, orgTest) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, Is there a reason we don't clean this workspace up?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe I was copying from the update tests in this file, which also ignore the cleanup
Im not sure why though...I will try to update all the tests to do the cleanup
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I remember. Because the the workspace gets removed in the process of the test, and then the cleanup call throws an errors. Or, in some other tests (and also this one, I would guess) the org also gets cleaned up first, which deletes the workspace and causes a defered workspace cleanup to fail
E.g. if you try to defer the cleanup in the test I linked above you get
=== RUN TestWorkspacesUpdate
=== CONT TestWorkspacesUpdate
helper_test.go:1602: Error destroying workspace! WARNING: Dangling resources
may exist! The full error is shown below.
Workspace: 0874350d-2197-1355-74c7-4ee437bbc010
Error: resource not found
--- FAIL: TestWorkspacesUpdate (5.32s)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That feels like it should be a solve-able problem. I will update the workspace cleanup function not to error on 404s, and then update the other tests in this file to cleanup the workspaces which they create
workspace_integration_test.go
Outdated
defer workspaceCleanup() | ||
_, svTestCleanup := createStateVersion(t, client, 0, wTest) | ||
t.Cleanup(svTestCleanup) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Non-blocker: Not a huge deal to mix the two, but the preference here is for t.Cleanup()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean that it is preferred to do t.Cleanup(workspaceCleanup)
instead of defer workspaceCleanup()
directly?
…e admins can force delete
354728c
to
e8b91ef
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Almost at the finish line 🏃 ⬇️
helper_test.go
Outdated
if err == ErrResourceNotFound { | ||
return | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, a recurring pain point for us. I tried introducing this in #527 but it was ultimately rejected considering it could obfuscate an authorization bug. So unfortunately will have to reject it here -- I think we can omit the cleanup for now if an "already deleted" workspace is causing this error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting...I think I would lean the other direction on that, since a test creating a workspace and then being unable to "see" it when it goes for a delete seems unlikely, but I definitely see the logic
I removed the ErrResourceNotFound
check, and switched a lot of the tests to using t.Cleanup
instead of defer
for the cleanup callbacks, and that does seem to have cleared up most of the cleanup errors~
I also changed this to delete the workspace by ID instead of Name, because we had one test which changes the workspace name (causing this cleanup to fail)
workspace_integration_test.go
Outdated
wTest, workspaceCleanup := createWorkspace(t, client, orgTest) | ||
t.Cleanup(workspaceCleanup) | ||
w, err := client.Workspaces.Lock(ctx, wTest.ID, WorkspaceLockOptions{}) | ||
assert.NoError(t, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missed one require.NoError(t, err)
, I'd even say we should also require.True(t, w.Locked)
since the rest of your test depends on this being true. If this fails, there's no point in continuing to run the rest of the test.
workspace_integration_test.go
Outdated
wTest, workspaceCleanup := createWorkspace(t, client, orgTest) | ||
t.Cleanup(workspaceCleanup) | ||
w, err := client.Workspaces.Lock(ctx, wTest.ID, WorkspaceLockOptions{}) | ||
assert.NoError(t, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here.
workspace_integration_test.go
Outdated
assert.Equal(t, wTest, w) | ||
assert.NotNil(t, w.Permissions.CanForceDelete) | ||
assert.True(t, *w.Permissions.CanForceDelete) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like your test is panicking:
=== FAIL: . TestCanForceDeletePermission/workspace_permission_set_includes_can-force-delete (0.13s)
workspace_integration_test.go:1237:
Error Trace: /home/runner/work/go-tfe/go-tfe/workspace_integration_test.go:1237
Error: Expected value not to be nil.
Test: TestCanForceDeletePermission/workspace_permission_set_includes_can-force-delete
--- FAIL: TestCanForceDeletePermission/workspace_permission_set_includes_can-force-delete (0.13s)
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x7f0a47]
goroutine 2995 [running]:
testing.tRunner.func1.2({0x861640, 0xbf9470})
/opt/hostedtoolcache/go/1.19.2/x64/src/testing/testing.go:1396 +0x24e
testing.tRunner.func1()
/opt/hostedtoolcache/go/1.19.2/x64/src/testing/testing.go:1399 +0x39f
panic({0x861640, 0xbf9470})
/opt/hostedtoolcache/go/1.19.2/x64/src/runtime/panic.go:884 +0x212
github.com/hashicorp/go-tfe.TestCanForceDeletePermission.func1(0x1?)
/home/runner/work/go-tfe/go-tfe/workspace_integration_test.go:1238 +0xe7
testing.tRunner(0xc0004b56c0, 0xc0006b2780)
/opt/hostedtoolcache/go/1.19.2/x64/src/testing/testing.go:1446 +0x10b
created by testing.(*T).Run
/opt/hostedtoolcache/go/1.19.2/x64/src/testing/testing.go:1493 +0x35f
exit status 2
Two things:
- This very well may be an issue with our CI instance not having the latest API changes. In which case we'd need to fix this.
- We tend to require the existence of a relationship struct, i.e
require.NotNil(t, w.Permissions)
, before we check said relationship's fields (otherwise we risk a panic).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This very well may be an issue with our CI instance not having the latest API changes. In which case we'd need to fix this.
I think that this is at least partially it...where can I find the instance that the acceptance tests run against? I can check out if it has all the api changes expected
Edit: Found it, and it looks like it has at least some of the expected changes...I will dig into why the tests are failing like this!
Edit edit: I was wrong in the previous edit. I was looking at tfelocal-cloud
(which has the correct changes) not tflocal-go-tfe
which does not. @sebasslash can you confirm that the integration tests are running against tflocal-go-tfe
, and help me figure out how to update it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚀 Looks like there are some flakes unrelated to this PR but still approving.
Description
Adds the new workspace safe-delete API (and tests)
Adds the can-force-delete permission hash to the workspace permissions hash. This value is a
*bool
which will allow clients to detect if safe vs force delete is availableAdds the allow_force_delete_workspaces setting to organizations
These are all part of a change which adds some guardrails around workspace deletion in TFC. There will be a new safe-delete API to remove a workspace only if it does not have resources under management. The existing delete API will continue to function as a force delete, but may be restricted to org owners based on an org level setting
This change is in draft as we complete the feature in TFC and prepare for it in a TFE release
Testing plan
External links
Output from tests
Including output from tests may require access to a TFE instance. Ignore this section if you have no environment to test against.