-
Notifications
You must be signed in to change notification settings - Fork 4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(custom-resources): handle Inactive lambda functions #20922
Conversation
packages/@aws-cdk/custom-resources/lib/provider-framework/runtime/framework.ts
Outdated
Show resolved
Hide resolved
packages/@aws-cdk/custom-resources/lib/provider-framework/runtime/framework.ts
Outdated
Show resolved
Hide resolved
|
||
log('user function is still being initialized by Lambda, retrying with delay of: ', newDelay); | ||
|
||
return setTimeout(invokeUserFunction, newDelay, functionArnEnv, payload, newRetryOptions); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not as readable as a for
loop with a sleep
.
function sleep(ms: number): Promise<void> {
return new Promise<void>(ok => window.setTimeout(ok, ms));
}
You can call await sleep(10_000)
on that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still changes requested, getting it out of my queue
@@ -13,6 +13,9 @@ export = { | |||
[consts.FRAMEWORK_ON_TIMEOUT_HANDLER_NAME]: onTimeout, | |||
}; | |||
|
|||
const BASE_SLEEP = 10_000; | |||
const MAX_TOTAL_SLEEP = 620_000; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I apparently already forgot about our conversation on the duplicate PR because I was about to be like, why this number? 😂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rereading that, however, made me realize that the ordering on these checks might be somewhat off. If the lambda. is called while it is in inactive
or pending
state, the invocation will simply fail so a new invocation will need to be called instead of giving time for the current invocation to pass. Basically, I think that invokeUserFunction
may need to have the following workflow:
Within a while loop `await invokeFunction` -> `await getFunctionResponse` ->
if `Configuration. LastUpdateStatus.failure && Configuration.State === inactive or pending` -> retry
else if -> `Configuration. LastUpdateStatus.failure` i.e. it's status is active or failed -> get and throw error error (break)
else -> return payload (break)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The above could be a do-while loop so that you don't have to use breaks but obviously the actual implementation is up to you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, regarding this number, do we know in general how long it usually takes a function to return to active? That data might be good in determining this number.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, we do not at this time. I should have lambdas that will become inactive in the next week or so (someone from lambda told me offline that this is ~30 days), so I can check once I have an inactive function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My original assessment was only partially right. I was under the impression that the Lamba function failed, not errored, but upon further digging, I see that it throws an exception. See outbound.ts
for my revised suggestion here.
}); | ||
|
||
if (!(getFunctionResponse.Configuration?.State === 'Inactive' || getFunctionResponse.Configuration?.State === 'Pending')) { | ||
break; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't this just break the while
loop and not the if
statement here? If so, this will still continue on to log that the error was thrown and then throw the error even though the lambda has now succeeded (presumably).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ahh good catch, you're right
// parse function name from arn | ||
// arn:${Partition}:lambda:${Region}:${Account}:function:${FunctionName} | ||
const arn = functionArn.split(':'); | ||
const functionName = arn[arn.length - 1]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a circumstance where this lambda might have a version added to the end of it or are our custom resources using unqualified ARNs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The user function is under the user's control, so they can create versions as they please. However, getFunction()
will happily accept a name with a :version-string
at the end as well, so we're safe in either case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue here is that if they do provide us with one that has a version at the end, arn[arn.length - 1]
will pick up the version number instead of the function name, since it would be arn:${Partition}:lambda:${Region}:${Account}:function:${FunctionName}:${Version}
while (totalSleep <= MAX_TOTAL_SLEEP) { | ||
// if the user's lambda has become Inactive, we must retry the invocation until Lambda finishes provisioning resources for it. | ||
const getFunctionResponse = await getFunction({ | ||
FunctionName: functionName, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like we can just provide the ARN here instead of trying to parse per GetFunction Documentation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can, but we need the name later anyway for the logs (line 154 in this file with these changes), and imo passing the name is cleaner. If you prefer though we can change this to the ARN
// parse function name from arn | ||
// arn:${Partition}:lambda:${Region}:${Account}:function:${FunctionName} | ||
const arn = functionArn.split(':'); | ||
const functionName = arn[arn.length - 1]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue here is that if they do provide us with one that has a version at the end, arn[arn.length - 1]
will pick up the version number instead of the function name, since it would be arn:${Partition}:lambda:${Region}:${Account}:function:${FunctionName}:${Version}
@@ -13,6 +13,9 @@ export = { | |||
[consts.FRAMEWORK_ON_TIMEOUT_HANDLER_NAME]: onTimeout, | |||
}; | |||
|
|||
const BASE_SLEEP = 10_000; | |||
const MAX_TOTAL_SLEEP = 620_000; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My original assessment was only partially right. I was under the impression that the Lamba function failed, not errored, but upon further digging, I see that it throws an exception. See outbound.ts
for my revised suggestion here.
async function defaultGetFunction(req: AWS.Lambda.GetFunctionRequest): Promise<AWS.Lambda.GetFunctionResponse> { | ||
if (!lambda) { | ||
lambda = new AWS.Lambda(awsSdkConfig); | ||
} | ||
|
||
return lambda.getFunction(req).promise(); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, according to the documentation, we can run into a situation where new lamba functions also haven't finished provisioning before we try to invoke them so I wonder if the right path forward here is to replace `defaultInvokeFunction with the following (roughly, this also contains notes):
async function defaultInvokeFunction(req: AWS.Lambda.InvocationRequest): Promise<AWS.Lambda.InvocationResponse> {
if (!lambda) {
lambda = new AWS.Lambda(awsSdkConfig);
}
let status = await defaultGetFunction(req);
if (status.Configuration?.State !== 'Failed') {
try {
// If it is inactive, pending, or failed it will throw instead of returning
return lambda.invoke(req).promise();
} catch (e) {
// We only care about inactive or pending for retries
while (status.Configuration?.State === 'Inactive' || status.Configuration?.State === 'Pending') {
// Add wait of some sort in here if we want
// Since we don't know how long the reprovisioning of resources will take, we shouldn't add in a max time.
status = await defaultGetFunction(req);
}
}
}
// Now that it's either active or failed, return the terminal response, error or not
return lambda.invoke(req).promise();
}
This would mean that we don't want to export defaultGetFunction
as getFunction
since the scope of its use is only in this file. This edit would require no change (I think) to the logic in invokeUserFunction()
in framework.ts
. Well, except that there's still the issue with the ARN that I commented on. Maybe we just never take in qualified ARNs, though, so maybe that isn't an issue. On the other hand, you could export getFunction
and use that to make sure we're getting the correct function name. It's part of Configuration
.
What do you think?
@Mergifyio update |
✅ Branch has been successfully updated |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
Closing this as the work has been continued in another PR. |
@TheRealAmazonKendra which one? |
all lambda functions can become inactive eventually. This will result in invocations failing. This PR adds logic to retry the invocation up to a maximum of ten minutes and some change. This number was chosen because a lambda only runs for 15 minutes, and 10 seemed like a good number.
Closes #20123.
All Submissions:
Adding new Unconventional Dependencies:
New Features
yarn integ
to deploy the infrastructure and generate the snapshot (i.e.yarn integ
without--dry-run
)?By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license