Dependencies #791

Open
ZIJ opened this issue Nov 15, 2023 · 4 comments

Comments

ZIJ (Contributor) commented Nov 15, 2023

Our dependencies / include patterns implementation only considers a "static" graph based on folder structure. This leads to the following problems:

  • downstream dependencies are triggered when there is no actual change affecting them, creating noise and wasting extra Actions minutes
  • duplicated effort in the case of module imports (dependencies need to be specified in two places)

We probably want to:

  • calculate dependencies based on the actual import graph (or the terragrunt graph)
  • optionally limit it to inputs, so that even if a dependency changes, the downstream project isn't triggered unless the dependency's outputs change
  • further optionally, limit it to specific outputs (as in the description of the issue)

Considerations for limiting to inputs / outputs

  • Only trigger downstream dependent projects if certain outputs change in the upstream (not all). Similar to stack dependencies in Spacelift. From user feedback, 16.11.2023 (Alexander F).
  • The first step is likely switching the way dependencies work from a simple graph of projects to a graph of outputs, so that downstream projects are triggered by changes in the outputs of their dependencies (all outputs).
  • The second step is to limit the scope to only certain outputs (as the user suggests); see the sketch below the list.
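
As a rough illustration of what the "actual import graph" looks like in Terraform code (the project layout, module, backend details and resource names below are hypothetical), the dependency edges are already expressed by the module source reference and the remote-state lookup, and the only upstream change that matters to proj-b is the value of the vpc_id output:

# proj-a/main.tf
module "network" {
  source = "../modules/network"   # this reference already encodes the dependency on the shared module
  cidr   = "10.0.0.0/16"
}

output "vpc_id" {
  value = module.network.vpc_id   # only a change to this value needs to re-trigger downstream projects
}

# proj-b/main.tf
data "terraform_remote_state" "proj_a" {
  backend = "s3"                  # backend details are illustrative only
  config = {
    bucket = "example-tf-state"
    key    = "proj-a/terraform.tfstate"
    region = "us-east-1"
  }
}

resource "aws_subnet" "app" {
  vpc_id     = data.terraform_remote_state.proj_a.outputs.vpc_id   # the "input" edge from proj-a
  cidr_block = "10.0.1.0/24"
}

Parsing module source and terraform_remote_state references like these would yield the real graph, instead of deriving it from folder structure or manually declared include patterns.
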
ZIJ changed the title from "Dependency scope" to "Dependencies based on outputs" on Nov 17, 2023
ZIJ changed the title from "Dependencies based on outputs" to "Better dependencies handling" on Nov 23, 2023
ZIJ (Contributor, Author) commented Nov 23, 2023

Additional feedback from Francois (NT): "all projects are triggered if modules change, even though changes in certain modules aren't affecting certain projects".

ZIJ (Contributor, Author) commented Dec 11, 2023

Idea for using git as a store for dependencies / TFVars

In digger.yml, each project can define exports and imports lists, like this:

projects:
  - name: projA
    dir: ./proj-a
    exports: ["vpcId", "subnetId"]
  - name: projB
    dir: ./proj-b
    imports: ["projA.vpcId", "projA.subnetId"]

On every successful apply of a project, all its exports are pushed to git as an outputs.tfvars file, into either a separate "infra state" repo or a special folder like .digger. Before applying, Digger copies the outputs.tfvars of each upstream dependency into the project as imports.projectA.tfvars (possibly also prefixing the keys with the project name) so that Terraform picks it up automatically.
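
A minimal sketch of the files this could generate (file and variable names are assumptions, not an existing Digger feature). One caveat: Terraform only auto-loads terraform.tfvars and *.auto.tfvars files, so the copied imports file would either need an .auto.tfvars name or be passed explicitly via -var-file:

# .digger/ProjA/outputs.tfvars -- written by Digger after a successful apply of projA,
# keys taken from projA's "exports" list
vpcId    = "vpc-0abc1234"
subnetId = "subnet-0def5678"

# proj-b/projA.auto.tfvars -- copied in by Digger before planning/applying projB,
# keys prefixed with the upstream project name
projA_vpcId    = "vpc-0abc1234"
projA_subnetId = "subnet-0def5678"

# proj-b/variables.tf -- projB declares matching variables for its imports
variable "projA_vpcId" {
  type = string
}

variable "projA_subnetId" {
  type = string
}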

Folder structure matches digger.yml, like this:

.digger
├── ProjA
│   └── outputs.tfvars
└── ProjB
    └── outputs.tfvars

This way, multi-stage applies for every change can be split into "layers" (effectively levels of the dependency tree):

  • Layer 0: A code change triggers an apply of one or more projects. They run in parallel and each produces outputs stored in git
  • Layer 1: All projects that consume outputs of layer 0 as inputs
  • ...

Making imports and exports an explicit whitelist is important because persisting all outputs could be unsafe: users might output sensitive data such as passwords. We'd only want to export outputs that are used in downstream projects (like VPC IDs), which tend to be safe.

The user may choose not to proceed with Layer 1 after Layer 0 has been applied, or to apply only some of the projects by hand rather than all of them.

Why store in git? Because every apply produces outputs that change the "overall state of infrastructure". Ideally everything is tracked in git so that there is a single source of truth, with rollbacks, an audit trail, etc. But Terraform state itself cannot be stored in git because it might contain sensitive data. Outputs, however, can be stored; the history of applies then becomes clear commit by commit, and it's easy to see what exactly changed and when. An additional git-tracked state of sorts.

motatoes (Contributor) commented:

Good concept, but I don't think outputs or inputs should live in git. They are already stored by Terraform in the state of every project and can be fetched out and passed along on demand. The state is independent of how many jobs or threads exist, and it is also safe to read concurrently. That's how Terragrunt does input/output management, and I think we should follow a similar route if we were to build our own dependency management between modules.

We may, however, wish to cache these dependencies if we are scaling with jobs; for that we would need either a cloud-based cache (which could be part of the orchestrator) or some other secret store for caching these values.
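
For comparison, this is roughly how Terragrunt wires upstream outputs into downstream inputs straight from state, without an extra store (a sketch; project paths and output names are assumptions):

# proj-b/terragrunt.hcl
dependency "proj_a" {
  config_path = "../proj-a"   # Terragrunt reads proj-a's state and exposes its outputs

  # mock outputs allow proj-b to be planned before proj-a has ever been applied
  mock_outputs = {
    vpc_id    = "vpc-mock"
    subnet_id = "subnet-mock"
  }
}

inputs = {
  vpc_id    = dependency.proj_a.outputs.vpc_id
  subnet_id = dependency.proj_a.outputs.subnet_id
}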

ZIJ (Contributor, Author) commented Feb 14, 2024

More user feedback (M. U. 13.02.2024):

Is there a way to handle dependent projects where plans for the second project wait until applies are done on the first project? I just (successfully) tried out project dependencies for the first time in that sample PR I provided previously, and it seems like the plans run right after each other, when I would think (but I might be wrong...) that in most cases the second project's plan depends on resources/outputs from the first project's apply...

What I want to do (going off your example config):
1. have Digger run a plan on core, and wait for an apply request if there are expected changes on core
2. have me, as the user, run digger apply to apply changes on core
3. once successful, have Digger run a plan on platform, now that core is up to date
4. have me run digger apply to apply changes on platform
5. merge

If two projects are sharing data (e.g. in the above PR via remote_state), the current Digger behaviour would result in the plan for platform failing when it tries to reference a value that doesn't yet exist (because so far it has only appeared in a plan on core and is not actually in the state for core yet).

In a sense it kind of is supported, but in a hacky way: you'd just let the second project's plan job fail (or cancel it ahead of time knowing it will fail), run digger apply -p core, then a manual digger plan again, and then finally digger apply (probably also limiting these to -p platform while we're at it).
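
A minimal sketch of the failure mode described above (the backend details and the cluster_endpoint output name are hypothetical; core and platform are the project names from the example config):

# platform/main.tf
data "terraform_remote_state" "core" {
  backend = "s3"                  # backend details are illustrative only
  config = {
    bucket = "example-tf-state"
    key    = "core/terraform.tfstate"
    region = "us-east-1"
  }
}

# Until `digger apply -p core` has written the output into core's state, this
# reference fails platform's plan -- hence the ask to hold platform's plan
# until core's apply has finished.
locals {
  core_endpoint = data.terraform_remote_state.core.outputs.cluster_endpoint
}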
