Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Private GKE nodes #30

Merged
merged 7 commits into from
Feb 1, 2022
Merged

Private GKE nodes #30

merged 7 commits into from
Feb 1, 2022

Conversation

cisaacstern
Copy link
Member

@cisaacstern cisaacstern commented Feb 1, 2022

Closes #29

This PR is mostly a direct copy of 2i2c-org/infrastructure#538 which ... totally works! 🎉 Here is a snapshot of the GCS console with three running private nodes, created by calling make deploy from this branch:

Screen Shot 2022-01-31 at 11 32 38 PM

@sgibson, first I cannot thank you enough for this incredibly helpful reference. Second, I wanted to ask, is there a reason why you assigned various values to empty [] or {} as part of the above-linked PR? (e.g., here, here, and here) That is, is this just general good practice, or is it something specifically tied to the private GKE nodes? I haven't included those assignments here, and terraform seems to work without them, but it's quite possible I'm overlooking something.

@rabernat @tracetechnical @sharkinsspatial, we don't have comprehensive tests for this infrastructure (perhaps that's not even feasible), so I'm basing my assessment that this works based on the fact that, by including this additional config, I am now able to run make deploy and create a GKE cluster without erroring out. Beyond that, I did start to explore the make test-flow command described in the README, and it did appear to execute in some form, though evaluating its outcome felt outside the scope of this PR, which is really just about creating the private nodes.

Note also I did also need to change


to my GCS user email address to get make deploy to work. AFAICT, that is not documented in the README, but we can address it in a separate PR, as it's not directly related to the private nodes work.

@@ -78,6 +78,13 @@ else
echo "CLUSTER_REGION is set to ${CLUSTER_REGION}"
fi

if [ -z "${ENABLE_PRIVATE_CLUSTER}" ]; then
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably ought to trap non boolean values near here.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should still do this

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if this is a "do it nowwww" thing? as the instructions in env are pretty clear about it being boolean.
I'm on the fence about the "when" of this one.

@@ -25,6 +25,37 @@ resource "google_container_cluster" "primary" {
ip_allocation_policy
]
}

/**
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To keep the code clean, this should be documented in the README.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments in code tend to go stale or worse not reflect the actual code. Where the code isn't self descriptive, we should fall back to documenting it in the relevant readme section.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trace, really appreciate your review here! 🙏

Just so I understand, are you really advocating that ALL inline code comments are a practice to avoid? You yourself have left inline comments in this same file. I don't see how these are qualitatively different.

Whether code is self-descriptive depends a lot on the audience. Charles and I are completely new to terraform, so some inline comments can be really helpful in understanding what a particular code block is for. I agree there is a maintenance cost to adding comments.

Maybe rather than saying, "let's avoid all inline comments," we could ask, "are these specific notes better placed inline vs. in the README?"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a fair counter-point. I need to look into more around best practises when it comes to Terraform and code comments, but I have always been told "inline comments bad" across the languages I have written in several companies.

Unfortunately, the comment which was in that file was as a result of a shameless copy-pasta on my part, rather than an intentional "This comment needs to be here to inform people". And this particular one should also be migrated to the README, which I will take an action point to do.

I'd prefer to retain the reduction of inline comments where possible, and perhaps flesh out the "per file" sections of the readme with references to the codeblocks where these are needed to help newer contributors.
This has a few benefits:

  • Keeping the code uncluttered
  • Allowing ease of searching (versus GitHubs code search which is buggy at best)
  • Keeping all the information in one place - Making the README the Goto for "How does this thing work"

Copy link
Contributor

@tracetechnical tracetechnical Feb 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WRT the above: it looks like I was thinking of another repo where each file and its purpose was in the README. I will take an action item to get these items added to the README for this repo.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok great, so it sounds like we have a path forward.

  • Let's finish up this review focusing on the actual code and get it merged so Charles can unblock his next steps
  • Next you can make a pass through and migrate inline comments into a Readme to suit your style preferences (which are certainly better-informed than ours)

Does that sound oK?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds spot on.

The main reason for requesting the changes was to prompt the "Is this something we can do very quickly? If not, then when?" discussion. And I agree that "Soon, but not now" is the right answer.

If you are happy with the approach RE: explaining each file and its purpose in the readme (plus a section for gotchas maybe?), then that sounds good to me.


// This range contains all IP addresses that IAP uses for TCP forwarding.
// https://cloud.google.com/iap/docs/using-tcp-forwarding
source_ranges = ["35.235.240.0/20"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be fed in as a variable with a default, rather than hardcoded.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and we should still do this. No idea about IAP, probably have to ask Yuvi or Sarah.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

*Note: probably worth adding to the docs how the interaction with IAP works for this usecase
Also, now on the fence RE: putting this in a variable after pondering it some more.

If this were to be used in another terraform project as a module, it may be more worthwhile, but I think that is unlikely in this case. The only time I can see this being changed is to add other whitelist IPs for on-prem access from static IPs (Non-Columbia usecases)

Copy link
Contributor

@tracetechnical tracetechnical Feb 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the benefit of others,having googled it I found that: IAP is Google's Identity Aware Proxy :)

@tracetechnical tracetechnical self-requested a review February 1, 2022 14:29
Copy link
Contributor

@tracetechnical tracetechnical left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

--

@cisaacstern
Copy link
Member Author

Thanks for the prompt review, @tracetechnical! I started #31 to track the point you raised about trapping non-boolean values of enable_private_cluster. I'll leave it to you to start any other issues you see fit, including for documentation/comments cleanup, and (if it ultimately seems preferable) variablizing IAP source ranges etc.

Since I see your ✅ , going to merge now in interest of moving forward with #19. 🚀

@cisaacstern cisaacstern merged commit 3f51b84 into main Feb 1, 2022
@cisaacstern cisaacstern deleted the private-gke-nodes branch February 1, 2022 16:53
@sgibson91
Copy link

is there a reason why you assigned various values to empty [] or {} as part of the above-linked PR? (e.g., here, here, and here)

Yeah, I think it's just a bit of good housekeeping. If you don't set those values to empty, then GCP will set them to empty automatically when the resources are created. Then the next time you run terraform apply, you'll get a message saying "Resources changed outside of terraform", i.e. something in GCP's state doesn't match terraform's state. While not particularly harmful in the case of tags or labels, it can be distracting and mask changes that you do need to worry about!

@tracetechnical
Copy link
Contributor

is there a reason why you assigned various values to empty [] or {} as part of the above-linked PR? (e.g., here, here, and here)

Yeah, I think it's just a bit of good housekeeping. If you don't set those values to empty, then GCP will set them to empty automatically when the resources are created. Then the next time you run terraform apply, you'll get a message saying "Resources changed outside of terraform", i.e. something in GCP's state doesn't match terraform's state. While not particularly harmful in the case of tags or labels, it can be distracting and mask changes that you do need to worry about!

I can vouch for this approach too from my time at UKHO, The same rule applies in Azure to stop it being a pain when using it with TF.

Also, from a future proofing point of view, it protects you from sudden "surprise defaults" in the providers/resource blocks.

@cisaacstern
Copy link
Member Author

Then the next time you run terraform apply, you'll get a message saying "Resources changed outside of terraform", i.e. something in GCP's state doesn't match terraform's state. While not particularly harmful in the case of tags or labels, it can be distracting and mask changes that you do need to worry about!

Great to know! I assume I'll hit this exact scenario shortly, at which point I'll make another PR to fix this.

I can vouch for this approach too from my time at UKHO, The same rule applies in Azure to stop it being a pain when using it with TF.

Thank you both so much for sharing your expertise in this area 🏆 🙏 Quite literally saving me (and the project) what would otherwise be days upon days of confusion.

@tracetechnical
Copy link
Contributor

(I mention Azure specifically as MS seem to absolutely love surprise changes to their terraform providers (non existent fields, undocumented expectations etc).)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Columbia GKE VMs can't have external IPs. Use NAT instead?
4 participants