
After a successful deploy, next run will cause a failed deploy if target version is not updated #82

Closed
dominics opened this issue Dec 10, 2020 · 3 comments · Fixed by #85


@dominics
Contributor

dominics commented Dec 10, 2020

TF: 0.14.2 (but present on 0.13.x too)
Module: 1.31.0

I'm running into an issue where the deploy module fails every second time it deploys. The error is:

The deployment failed because the AppSpec file that specifies the AWS Lambda deployment configuration is missing or has an invalid configuration.

Normally this message ends with an explanation of what is wrong, but not in this case. I've double-checked the content of the AppSpec file, and it looks correct.

TargetVersion must not equal CurrentVersion

I started to look deeper. I've been using this script, which is similar to the guts of the module's deploy script but lets me easily control the target and current versions and muck with the create-deployment call.

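#!/usr/bin/env bash
# Versions written into the AppSpec; make them equal to reproduce the failure
from=123
to=123

# AppSpec (JSON) moving the "deployed" alias from version $from to $to
content="{\"Resources\":[{\"MyFunction\":{\"Properties\":{\"Alias\":\"deployed\",\"CurrentVersion\":\"$from\",\"Name\":\"some_function_name\",\"TargetVersion\":\"$to\"},\"Type\":\"AWS::Lambda::Function\"}}],\"version\":\"0.0\"}"
# SHA-256 of the AppSpec content, sent alongside it in the revision
sum=$(echo -n "$content" | shasum -a 256 | cut -f 1 -d ' ')
# Escape the double quotes so the AppSpec can be embedded as a JSON string
escaped="${content//\"/\\\"}"
revision="{\"revisionType\": \"AppSpecContent\", \"appSpecContent\": {\"content\": \"$escaped\", \"sha256\": \"$sum\"}}"

# --debug prints the raw API request and response
aws deploy --debug create-deployment \
    --application-name SomeAppName \
    --deployment-group-name SomeDeploymentGroupName \
    --revision "$revision"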

What I found is that whenever CurrentVersion and TargetVersion differ, the deploy succeeds, and whenever they are the same, it fails with the error above. It'd be useful to check this hypothesis more widely, but it definitely holds in my experiments with the script above. (Edit: reproducing this also requires a deployment group that is not using an all-at-once deployment config.)

I'm working on getting this confirmed by AWS Support.

Consequences

If triggering a deploy when one isn't needed (because the versions already match) always causes a failed deploy, we should not call CreateDeployment when TargetVersion == CurrentVersion.

It also means that the force_deploy option will cause failed deployments unless the target version changes every time we apply.

Incorrect guesses

I initially speculated that this was caused by an AWS bug, aws/serverless-application-model#291 (more references at https://forums.aws.amazon.com/message.jspa?messageID=846025). On reflection, it doesn't seem to be.

@dominics dominics changed the title Deploy fails every second time it runs, with inaccurate AWS error After a successful deploy, next run will cause a failed deploy if target version is not updated Dec 10, 2020
@dominics
Contributor Author

dominics commented Dec 10, 2020

There is a related problem that I'm tempted to open as another issue: a deploy causes another deploy, meaning you have to apply twice to get the module to show no changes.

That's because, after a deploy, the alias's current version changes to the new version. This in turn changes local.appspec_sha256, which triggers another deploy.

So every change of target version causes two deploys.
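To make the double deploy concrete, here is a small sketch that hashes the AppSpec content the same way as the reproduction script (same placeholder function and alias names) for the two consecutive runs. It assumes bash plus either `sha256sum` or `shasum`, and it only models the trigger input; the module's real `local.appspec_sha256` expression may serialize the AppSpec differently.

```bash
#!/usr/bin/env bash
# Print the hex SHA-256 of stdin, using whichever tool is installed
sha256_hex() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum | cut -f 1 -d ' '
  else
    shasum -a 256 | cut -f 1 -d ' '
  fi
}

# Same AppSpec shape as the reproduction script; names are placeholders
appspec_content() {
  local from="$1" to="$2"
  printf '{"Resources":[{"MyFunction":{"Properties":{"Alias":"deployed","CurrentVersion":"%s","Name":"some_function_name","TargetVersion":"%s"},"Type":"AWS::Lambda::Function"}}],"version":"0.0"}' "$from" "$to"
}

appspec_sha256() {
  appspec_content "$1" "$2" | sha256_hex
}

# Run 1: alias on version 1, target 2 -> a deploy happens and moves the alias to 2
run1=$(appspec_sha256 1 2)
# Run 2: target is still 2, but the alias now reports version 2
run2=$(appspec_sha256 2 2)

if [[ "$run1" != "$run2" ]]; then
  echo "trigger changed although target_version did not: another deploy is scheduled"
fi
```

On the second run both versions are 2, so with a linear or canary config that extra deploy is exactly the same-version deployment that fails.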


Thinking about this out loud, the current triggers for the deploy are:

  triggers = {
    appspec_sha256 = local.appspec_sha256
    force_deploy   = var.force_deploy ? uuid() : false
  }

where local.appspec_sha256 covers the function name, alias name, current version, target version and the hook ARNs (itself an issue: changing the hook ARNs probably shouldn't cause a deploy). If we reduced this to just:

  triggers = {
    function_name  = var.function_name
    alias_name     = var.alias_name
    target_version = var.target_version
    force_deploy   = var.force_deploy ? uuid() : false
    # i.e. removed current_version and the hook ARNs
  }

then I think we fix one problem but cause another. The previously targeted version is stored in TF state, and we only deploy when that targeted version changes, which is good. The downside is that if something changes the alias's current version (perhaps from outside TF), we'd no longer trigger a deploy until the target changes again. (The same happens when a deployment fails, which is very annoying.)

Without something like getting the previous values of the triggers (hashicorp/terraform#20859), I can't see a way to distinguish the two cases:

  • something else (like a rollback) moved our current alias version since the last run, and we want a redeploy
  • we (the deploy module) moved our current alias version in the last run by deploying, no need to redeploy
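A tiny model of the reduced triggers shows why these two cases can't be told apart. `trigger_key` is a hypothetical stand-in for the trigger map (force_deploy left out); it accepts the alias's current version only to ignore it, mirroring the proposed removal of current_version.

```bash
#!/usr/bin/env bash
# Hypothetical model of the reduced triggers: the current version is accepted
# but deliberately not part of the key
trigger_key() {
  local function_name="$1" alias_name="$2" _current_version="$3" target_version="$4"
  printf '%s|%s|%s' "$function_name" "$alias_name" "$target_version"
}

# Key stored in state by the run that deployed version 2 (alias was on 1)
stored=$(trigger_key some_function_name deployed 1 2)
# Case 2: we moved the alias to 2 by deploying
after_our_deploy=$(trigger_key some_function_name deployed 2 2)
# Case 1: something else (e.g. a rollback) moved the alias back to 1
after_rollback=$(trigger_key some_function_name deployed 1 2)

for key in "$after_our_deploy" "$after_rollback"; do
  if [[ "$key" == "$stored" ]]; then echo "no redeploy"; else echo "redeploy"; fi
done
```

Both cases print "no redeploy": correct after our own deploy, but wrong after a rollback.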

So it's probably safest to leave the triggers as-is and concentrate on making the deploy script skip create-deployment when the current and target versions match. We'd still get one unnecessary null_resource update on the run after a deploy (ah well), but it would be a no-op deployment.
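A minimal sketch of that skip, assuming bash and the AWS CLI. `deploy_if_needed` and its parameters are hypothetical names, not the module's actual script, and building the revision JSON (as in the reproduction script) is left to the caller. It reads the version the alias currently points at with `aws lambda get-alias` and only calls `create-deployment` when that differs from the target.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: skip create-deployment when the alias is already on target
deploy_if_needed() {
  local function_name="$1" alias_name="$2" target_version="$3"
  local app_name="$4" group_name="$5" revision="$6"
  local current_version

  # Ask Lambda which version the alias points at right now
  current_version=$(aws lambda get-alias \
    --function-name "$function_name" \
    --name "$alias_name" \
    --query 'FunctionVersion' \
    --output text) || return 1

  if [[ "$current_version" == "$target_version" ]]; then
    echo "Alias $alias_name already at version $target_version; nothing to deploy"
    return 0
  fi

  aws deploy create-deployment \
    --application-name "$app_name" \
    --deployment-group-name "$group_name" \
    --revision "$revision"
}
```

Reading the alias at deploy time, rather than trusting the value Terraform saw at plan time, also covers the case where the alias moved between plan and apply.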

@dominics
Contributor Author

dominics commented Jan 5, 2021

Here is AWS Support's response:

The reason that you are receiving this error is because of the deployment policy that you are using. Your deployment policy shifts 25% of traffic from the current version to the latest version every minute. When the version numbers are the same, the traffic can not be split between the two versions, as there is only one version. This is why the deployment fails with the "The deployment failed because the AppSpec file that specifies the AWS Lambda deployment configuration is missing or has an invalid configuration." error.

To deploy updates, where the version number of the latest and previous versions remain the same, the Deployment Configuration will need to be set to AllAtOnce. Deploying with the 'AllAtOnce' deployment configuration pass the traffic shifting deployment stage without receiving the Lambda Validation Error.

So there we go: reproducing the bug requires a deployment group that uses a linear or canary rollout config. If you're not using CodeDeployDefault.LambdaAllAtOnce, you can't deploy the version that is already deployed.
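Given that answer, a guard could take the deployment config into account. This sketch assumes bash and the AWS CLI; both function names are hypothetical. It reads the deployment group's `deploymentConfigName` and treats a same-version deploy as failing under any config other than CodeDeployDefault.LambdaAllAtOnce. It compares names only, so it does not account for a `--deployment-config-name` override passed to create-deployment or for custom configs.

```bash
#!/usr/bin/env bash
# Look up the deployment config name configured on a deployment group
deployment_config_name() {
  aws deploy get-deployment-group \
    --application-name "$1" \
    --deployment-group-name "$2" \
    --query 'deploymentGroupInfo.deploymentConfigName' \
    --output text
}

# Succeeds (returns 0) when CodeDeploy would reject the deployment:
# same version on both sides and a traffic-shifting (non all-at-once) config
same_version_deploy_fails() {
  local current_version="$1" target_version="$2" config_name="$3"
  [[ "$current_version" == "$target_version" \
     && "$config_name" != "CodeDeployDefault.LambdaAllAtOnce" ]]
}
```

For example, `same_version_deploy_fails 3 3 CodeDeployDefault.LambdaCanary10Percent5Minutes` succeeds, while the same call with CodeDeployDefault.LambdaAllAtOnce does not.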

@github-actions

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 10, 2022