Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

support for organization-level default execution mode and agent pool #1137

Merged
merged 19 commits into from
Nov 27, 2023

Conversation

SwiftEngineer
Copy link
Collaborator

@SwiftEngineer SwiftEngineer commented Nov 9, 2023

Description

Adds support for organization-level default execution modes and agent pools 🎉

Remember to:

Testing plan

  1. Verify that workspaces can be created/updated to utilize the organization's default execution mode while the organization default execution mode is set to agent, then again with the default set to remote. Flip back and forth between the execution mode being set at the workspace-level and the organization-level
  2. Verify that the workspace's operations setting still works as expected, and that workspaces created with the operations setting can be migrated to use the organization default execution mode. Do this by creating a workspace with operations set to some value. Then remove the operations setting and apply the configuration.
  3. Using an older version of TFE, verify that the behavior of the provider matches the old behavior (remote is always the default execution mode).

External links

Include any links here that might be helpful for people reviewing your PR. If there are none, feel free to delete this section.

@SwiftEngineer SwiftEngineer requested a review from a team as a code owner November 9, 2023 20:31
@SwiftEngineer SwiftEngineer force-pushed the SwiftEngineer/org-level-agent-pools branch from 342c8c9 to 7d6056e Compare November 9, 2023 20:38
@SwiftEngineer SwiftEngineer changed the title Swift engineer/org level agent pools org level agent pools Nov 9, 2023
…using a version of tfe that supports default execution modes
@SwiftEngineer SwiftEngineer force-pushed the SwiftEngineer/org-level-agent-pools branch from 26264f9 to 7e14b3a Compare November 14, 2023 22:38
@@ -274,7 +275,7 @@ func resourceTFETeamDelete(d *schema.ResourceData, meta interface{}) error {
log.Printf("[DEBUG] Delete team: %s", d.Id())
err := config.Client.Teams.Delete(ctx, d.Id())
if err != nil {
if err == tfe.ErrResourceNotFound {
if errors.Is(err, tfe.ErrResourceNotFound) {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes aren't related to my PR, but they showed up in the lint and I figured I might as well address them.

@@ -725,6 +725,7 @@ resource "tfe_organization" "foobar" {
resource "tfe_workspace" "foobar" {
name = "workspace-test"
organization = tfe_organization.foobar.id
execution_mode = "remote"
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test uses an old provider version for the first step, so applying with the modern provider will produce a diff because the setting_overwrites need to be set in order for the provider to know that it is communicating with a version of TFE that supports setting_overwrites.

Explicitly setting the execution_mode eliminates this diff, without affecting the test's assertions in any way.

@SwiftEngineer SwiftEngineer changed the title org level agent pools support for organization-level default execution mode and agent pool Nov 21, 2023
"tfe_notification_configuration": resourceTFENotificationConfiguration(),
"tfe_oauth_client": resourceTFEOAuthClient(),
"tfe_organization": resourceTFEOrganization(),
"tfe_organization_default_execution_mode": resourceTFEOrganizationDefaultExecutionMode(),
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

^ the only line that really changed

Copy link
Collaborator

@brandonc brandonc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing blocking! I couldn't break it, although I had to be careful about when my workspace was created relative to the new resource.

Comment on lines +27 to +31
resource "tfe_organization_default_execution_mode" "org_default" {
organization = tfe_organization.test.name
default_execution_mode = "agent"
default_agent_pool_id = tfe_agent_pool.my_agents.id
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One interesting consequence of this design is that workspaces have to wait on this in order to assume the default. I wonder if we can add to the docs to demonstrate how to ensure a workspace in the same config uses the org default. I can't really think of a great way to reference the execution mode, so perhaps we can just fallback to depends_on

resource "tfe_workspace" "workspace_with_org_default" {
  organization = tfe_organization.test.name
  name = "workspace-with-org-default"
  # Ensure workspaces are created after the org default execution mode is set
  depends_on = [tfe_organization_default_execution_mode.org_default]
}

It's either that or something like a precondition

resource "tfe_workspace" "workspace_with_org_default" {
  organization = tfe_organization.test.name
  name = "workspace-with-org-default"

  # Ensure workspaces are created after the org default execution mode is set
  lifecycle {
    precondition {
      error_message = "Must use org default execution"
      condition     = tfe_organization_default_execution_mode.name.default_execution_mode != ""
    }
  }
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This wouldn't matter if the execution_mode and agent_pool_id wouldn't be copied from the organization defaults to the state. The actual workspace will anyway follow the upstream settings, right?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True. The workspace's execution mode is evaluated (more or less) in realtime, so no depends_on needed here.

Comment on lines +149 to +151
* `setting_overwrites` - Can be used to check whether a setting is currently inheriting its value from another resource.
- `execution_mode` - Set to `true` if the execution mode of the workspace is being determined by the setting on the workspace itself. It will be `false` if the execution mode is inherited from another resource (e.g. the organization's default execution mode)
- `agent_pool` - Set to `true` if the agent pool of the workspace is being determined by the setting on the workspace itself. It will be `false` if the agent pool is inherited from another resource (e.g. the organization's default agent pool)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even though we could perhaps conceive of a more ergonomic schema, I like that this reflects the API. At least we are only committed to what we're already committed to.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like the setting_overwrites at all. 😂

Although if I get it right, it shouldn't actually be needed to be specified, if the user just doesn't specify execution_mode.

But I still find it confusing to copy the upstream execution mode and agent pool to the state. I think it even causes state drift and a need for terraform apply if the upstream setting is changed (in another module or by web console).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, only now I realized that this is a computed (i.e. read only) attribute. So basically the new feature is that the default execution mode is the "upstream" default. Maybe add a comment in the first example, too?

The last point should still be valid. I don't think it's according to the design principals for resources to have attribute values controlled by other resources. State drifts are annoying. 😅

If someone really needs to know what the default execution mode is, they can use the (still missing) tfw_organization_default_execution_mode data source. And also the tfe_organization should be changed accordingly to set the execution_mode and agent_pool_id to null if setting_overwrites so says. (And here I'm not so sure that the null is the best value to control this feature in the first place.)

Copy link
Collaborator

@brandonc brandonc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And maybe some additional import docs !


## Import

This resource does not manage the creation of an organization and there is no need to import it.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh! I think you might as well include instructions since you would need to import the resource if you did not wish to manipulate config during an import process.

Suggested change
This resource does not manage the creation of an organization and there is no need to import it.
Organization default execution mode can be imported; use `<ORGANIZATION NAME>` as the import ID. For
example:
(backticks)shell
terraform import tfe_organization_default_execution_mode.test my-org-name
(backticks)

@@ -4,6 +4,8 @@

FEATURES:
* `d/tfe_registry_module`: Add `vcs_repo.tags` and `vcs_repo.branch` attributes to allow configuration of `publishing_mechanism`. Add `test_config` to support running tests on `branch`-based registry modules, by @hashimoon [1096](https://github.com/hashicorp/terraform-provider-tfe/pull/1096)
* **New Resource**: `r/tfe_organization_default_execution_mode` is a new resource to set the `default_execution_mode` and `default_agent_pool_id` for an organization, by @SwiftEngineer [1137](https://github.com/hashicorp/terraform-provider-tfe/pull/1137)'
* `r/tfe_workspace`: Now uses the organization's `default_execution_mode` and `default_agent_pool_id` as the default `execution_mode`, by @SwiftEngineer [1137](https://github.com/hashicorp/terraform-provider-tfe/pull/1137)'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a breaking change which should be highlighted

Copy link
Collaborator

@brandonc brandonc Nov 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this breaks the API of the provider or the platform API itself because it's a purely additive change. If you don't change the default at the org level, everything continues to behave as before.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you don't change the default at the org level

Yeah but what if someone does? 🙂
The provider is still pre-1.0 so everything is allowed. I'm just suggesting to add a specific note about this in the changelog.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a fair point. Previously, I was thinking setting the default execution_mode in an admission that you were prepared for that change to affect all the workspaces in your organization, but adding a note here couldn't hurt!

Comment on lines +27 to +31
resource "tfe_organization_default_execution_mode" "org_default" {
organization = tfe_organization.test.name
default_execution_mode = "agent"
default_agent_pool_id = tfe_agent_pool.my_agents.id
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This wouldn't matter if the execution_mode and agent_pool_id wouldn't be copied from the organization defaults to the state. The actual workspace will anyway follow the upstream settings, right?

Comment on lines +41 to +42
to be set to `agent`. This value _must not_ be provided if `default_execution_mode` is set to any other value or if `operations` is
provided.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No operations argument here

Suggested change
to be set to `agent`. This value _must not_ be provided if `default_execution_mode` is set to any other value or if `operations` is
provided.
to be set to `agent`. This value _must not_ be provided if `default_execution_mode` is set to any other value.

Comment on lines +149 to +151
* `setting_overwrites` - Can be used to check whether a setting is currently inheriting its value from another resource.
- `execution_mode` - Set to `true` if the execution mode of the workspace is being determined by the setting on the workspace itself. It will be `false` if the execution mode is inherited from another resource (e.g. the organization's default execution mode)
- `agent_pool` - Set to `true` if the agent pool of the workspace is being determined by the setting on the workspace itself. It will be `false` if the agent pool is inherited from another resource (e.g. the organization's default agent pool)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like the setting_overwrites at all. 😂

Although if I get it right, it shouldn't actually be needed to be specified, if the user just doesn't specify execution_mode.

But I still find it confusing to copy the upstream execution mode and agent pool to the state. I think it even causes state drift and a need for terraform apply if the upstream setting is changed (in another module or by web console).

@brandonc
Copy link
Collaborator

@tmatilai The reason why we store settings_overwrites in state is to simplify reading execution mode from the workspace, even if that value comes from the organization. Since this is a computed field, it is allowed to vary without drift if the org default changes. This means that changing the org default should never result in an unplanned change unless you are reading the execution mode into some other value, like an output, in which case that would be useful to know about! It's often the case that resources will change the value of computed attributes on other resources.

Here's what terraform does when I apply a default change to the org and a workspace "foo" is using the default:

$ terraform apply
tfe_organization_default_execution_mode.name: Refreshing state... [id=hashicorp]
tfe_agent_pool.mypool: Refreshing state... [id=apool-vyFK1NML9LRi2DsG]
tfe_workspace.foo: Refreshing state... [id=ws-J97L1xH8pFzwPp8W]

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
-/+ destroy and then create replacement

Terraform will perform the following actions:

  # tfe_organization_default_execution_mode.name must be replaced
-/+ resource "tfe_organization_default_execution_mode" "name" {
      - default_agent_pool_id  = "apool-vyFK1NML9LRi2DsG" -> null # forces replacement
      ~ default_execution_mode = "agent" -> "remote" # forces replacement
      ~ id                     = "hashicorp" -> (known after apply)
      + organization           = (known after apply)
    }

Plan: 1 to add, 0 to change, 1 to destroy.

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes

tfe_organization_default_execution_mode.name: Destroying... [id=hashicorp]
tfe_organization_default_execution_mode.name: Destruction complete after 1s
tfe_organization_default_execution_mode.name: Creating...
tfe_organization_default_execution_mode.name: Creation complete after 1s [id=hashicorp]

Apply complete! Resources: 1 added, 0 changed, 1 destroyed.

$ terraform apply

tfe_organization_default_execution_mode.name: Refreshing state... [id=hashicorp]
tfe_agent_pool.mypool: Refreshing state... [id=apool-vyFK1NML9LRi2DsG]
tfe_workspace.foo: Refreshing state... [id=ws-J97L1xH8pFzwPp8W]

No changes. Your infrastructure matches the configuration.

Terraform has compared your real infrastructure against your configuration and found no differences, so no changes are needed.

Apply complete! Resources: 0 added, 0 changed, 0 destroyed.

@tmatilai
Copy link
Contributor

tmatilai commented Nov 27, 2023

@brandonc thanks a lot for taking the time to answer! 🤗

The reason why we store settings_overwrites in state is to simplify reading execution mode from the workspace, even if that value comes from the organization.

I can't think why that would be useful, but maybe there is a use case for it? 🤔

And then in the tfe_workspace data source we need to use the settings_overwrites to figure out where the execution mode comes from. Although not sure if that is very interesting either. 😆

I just find it really strange to have a dynamically changing default, and even changing not only on resource creation.

It's often the case that resources will change the value of computed attributes on other resources.

But it's not purely computed, as it's also optional. Not sure if I can agree on "ofter". 😄

# tfe_organization_default_execution_mode.name must be replaced
-/+ resource "tfe_organization_default_execution_mode" "name" {
      - default_agent_pool_id  = "apool-vyFK1NML9LRi2DsG" -> null # forces replacement
      ~ default_execution_mode = "agent" -> "remote" # forces replacement
      ~ id                     = "hashicorp" -> (known after apply)
      + organization           = (known after apply)
    }

Btw, who do those changes force recreation? I wonder if the resourceTFEOrganizationDefaultExecutionModeCreate() would work as is if hooked to the Update in the schema.

@tmatilai
Copy link
Contributor

unless you are reading the execution mode into some other value, like an output

And I admit that I missed this.

@SwiftEngineer
Copy link
Collaborator Author

SwiftEngineer commented Nov 27, 2023

Hiya @tmatilai 👋 I thought I'd try and give some background context on why setting_overwrites exists in the first place.

I can't think why that would be useful, but maybe there is a use case for it?

The original reason for setting_overwrites being added to TFC/TFE in the first place was to keep the API backwards-compatible with previous versions of the CLI. One of the first things the CLI does is check this execution_mode, so we needed to make sure that execution_mode always pointed to the "resolved" execution_mode of the workspace.

A big reason why I believe we needsetting_overwrites now in the provider is backwards-compatibility. When communicating with an older version of TFE, the way we revert back from a "overwritten" execution_mode to a "default" one is quite different. Without storing the version of TFE in the state of the resource, we would have to rely on some other mechanism to identify the version of TFE we're talking to.

Another reason (albeit opinionated) is that matching the underlying API is one of the provider design principles, and one that I personally have felt particularly attached to. I've found time and time again that straying from the underlying API, or creating abstractions in the provider over concepts in the API makes for some very painful and unpredictable design challenges further down the road.

I just find it really strange to have a dynamically changing default, and even changing not only on resource creation.

You and I are totally in agreement there! That being said, we do technically have a "default" that can change. So, I think it's super important the provider reflects this reality, rather than try to abstract over it.

This is why I actually personally don't recommend users to use the organization-level default unless it is something they really truly need. Instead, I recommend users to use this provider to provision their workspaces, and to set the execution_mode explicitly. This change should have ZERO effect on any organizations that do NOT use organization-level defaults.

But it's not purely computed, as it's also optional. Not sure if I can agree on "ofter". 😄

I think maybe what @brandonc meant by "purely computed" was "read only", in which case the field NEEDS to be both computed and optional.

Btw, who do those changes force recreation? I wonder if the resourceTFEOrganizationDefaultExecutionModeCreate() would work as is if hooked to the Update in the schema.

This is totally just a choice in implementation. By making the resource just support CREATE/READ/DELETE, it makes things quite a bit simpler! Less things to implement, less things to test, less things to break 🎉

@@ -2,8 +2,13 @@
<!-- Add CHANGELOG entry to this section for any PR awaiting the next release -->
<!-- Please also include if this is a Bug Fix, Enhancement, or Feature -->

BREAKING CHANGES:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still don't believe this rises to the level of a breaking change, but do think it deserves a special note. I'll think on it some more.

@brandonc brandonc merged commit f217b6e into main Nov 27, 2023
@brandonc brandonc deleted the SwiftEngineer/org-level-agent-pools branch November 27, 2023 22:01
@tmatilai
Copy link
Contributor

Hi @SwiftEngineer, I really appreciate your explanations! 🙌

keep the API backwards-compatible with previous versions of the CLI

Fair point. I was locked in the TF provider context only. These are not easy choices.

Without storing the version of TFE in the state of the resource, we would have to rely on some other mechanism to identify the version of TFE we're talking to.

Sure. But in case needed, maybe look at e.g. the Vault provider which has helpers to enable features based on the Vault version queried over API.

Another reason (albeit opinionated) is that matching the underlying API is one of the provider design principles, and one that I personally have felt particularly attached to. I've found time and time again that straying from the underlying API, or creating abstractions in the provider over concepts in the API makes for some very painful and unpredictable design challenges further down the road.

I indeed don't have the experience of maintaining a provider. But with my long history of Terraform user I tend to like the providers and resources with nice UX, and am not so interested in the underlying API implementation. 😅

This is why I actually personally don't recommend users to use the organization-level default unless it is something they really truly need. Instead, I recommend users to use this provider to provision their workspaces, and to set the execution_mode explicitly.

100% agree. I guess I should not have even got into so attached to the whole implementation because of that. 😄

My current use case for the organization level default is to set it to local when bootstrapping an org from workstation, so that only the state is stored in TFC. But indeed the goal is to then manage all workspaces with Terraform, and explicitly set the execution_mode.

Btw, who do those changes force recreation? I wonder if the resourceTFEOrganizationDefaultExecutionModeCreate() would work as is if hooked to the Update in the schema.

This is totally just a choice in implementation. By making the resource just support CREATE/READ/DELETE, it makes things quite a bit simpler! Less things to implement, less things to test, less things to break 🎉

Maybe doesn't matter with this resource, but in general I've seen people getting scared when resources are destroyed in their plans. And some cases it might cause cascading (known after apply) diffs in the plans if there are references to computed attributes, even though they will finally stay the same. So my gut feeling is that it would be nice to support the update if feasible. Especially if it seems to be "free" like here. 🙂


Anyway, thanks again for all the effort with the provider. It really makes my life easier when I'm trying to automate all the things. 😆

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants