fix(custom-resources): handle Inactive lambda functions #20922
Changes from 17 commits
a9c4d46
6fbe8eb
d1fd547
dee5710
701306b
5b2a308
11505fd
bc2eef6
830686c
ed3b5a7
4f7ebd1
a34e600
6b0cf52
74f3dc5
928fed7
0257e58
9baa34a
e8f3eac
@@ -3,7 +3,7 @@
 import { IsCompleteResponse, OnEventResponse } from '../types';
 import * as cfnResponse from './cfn-response';
 import * as consts from './consts';
-import { invokeFunction, startExecution } from './outbound';
+import { invokeFunction, startExecution, getFunction } from './outbound';
 import { getEnv, log } from './util';

 // use consts for handler names to compiler-enforce the coupling with construction code.
@@ -13,6 +13,9 @@ export = {
   [consts.FRAMEWORK_ON_TIMEOUT_HANDLER_NAME]: onTimeout,
 };

+const BASE_SLEEP = 10_000;
+const MAX_TOTAL_SLEEP = 620_000;
+
 /**
  * The main runtime entrypoint of the async custom resource lambda function.
  *
@@ -96,7 +99,7 @@ async function onTimeout(timeoutEvent: any) {
   });
 }

-async function invokeUserFunction<A extends { ResponseURL: '...' }>(functionArnEnv: string, sanitizedPayload: A, responseUrl: string) {
+async function invokeUserFunction<A extends { ResponseURL: '...' }>(functionArnEnv: string, sanitizedPayload: A, responseUrl: string, reinvoke?: boolean): Promise<any> {
   const functionArn = getEnv(functionArnEnv);
   log(`executing user function ${functionArn} with payload`, sanitizedPayload);

@@ -112,17 +115,47 @@ async function invokeUserFunction<A extends { ResponseURL: '...' }>(functionArnE

   log('user function response:', resp, typeof(resp));

+  // parse function name from arn
+  // arn:${Partition}:lambda:${Region}:${Account}:function:${FunctionName}
+  const arn = functionArn.split(':');
+  const functionName = arn[arn.length - 1];
Is there a circumstance where this lambda might have a version added to the end of it, or are our custom resources using unqualified ARNs?
The user function is under the user's control, so they can create versions as they please. However,
The issue here is that if they do provide us with one that has a version at the end,
+
   const jsonPayload = parseJsonPayload(resp.Payload);
   if (resp.FunctionError) {
+    let totalSleep = 0, attempt = 0;
+    while (totalSleep <= MAX_TOTAL_SLEEP) {
+      // if the user's lambda has become Inactive, we must retry the invocation until Lambda finishes provisioning resources for it.
+      const getFunctionResponse = await getFunction({
+        FunctionName: functionName,
Looks like we can just provide the ARN here instead of trying to parse it, per the GetFunction documentation.
We can, but we need the name later anyway for the logs (line 154 in this file with these changes), and imo passing the name is cleaner. If you prefer, though, we can change this to the ARN.
+      });
+
+      if ((getFunctionResponse.Configuration?.State === 'Active' || getFunctionResponse.Configuration?.State === 'Failed') && !reinvoke) {
+        if (getFunctionResponse.Configuration?.State === 'Active') {
+          log('user function is in the \'Active\' state, reinvoking it now');
+        } else if (getFunctionResponse.Configuration?.State === 'Failed') {
+          log('user function is in the \'Failed\' state with this reason code: ', getFunctionResponse.Configuration.StateReasonCode);
+          log('user function provided this reason for the error: ', getFunctionResponse.Configuration.StateReason);
+          log('reinvoking user function to get error trace');
+        }
+
+        // do not reinvoke more than once
+        return invokeUserFunction(functionArnEnv, sanitizedPayload, responseUrl, true);
+      }
+
+      const currentSleep = Math.floor(BASE_SLEEP * Math.pow(2, attempt) * Math.random());
+
+      // don't spend more than 10 minutes and some change waiting
+      log(`user function is still being initialized by Lambda, sleeping for: ${currentSleep} ms before retry`);
+      await sleep(currentSleep);
+
+      totalSleep += currentSleep;
+      attempt++;
+    }
+
     log('user function threw an error:', resp.FunctionError);

     const errorMessage = jsonPayload.errorMessage || 'error';

-    // parse function name from arn
-    // arn:${Partition}:lambda:${Region}:${Account}:function:${FunctionName}
-    const arn = functionArn.split(':');
-    const functionName = arn[arn.length - 1];
-
     // append a reference to the log group.
     const message = [
       errorMessage,
@@ -146,6 +179,10 @@ async function invokeUserFunction<A extends { ResponseURL: '...' }>(functionArnE
   return jsonPayload;
 }

+async function sleep(ms: number): Promise<void> {
+  return new Promise<void>(ok => setTimeout(ok, ms));
+}
+
 function parseJsonPayload(payload: any): any {
   if (!payload) { return { }; }
   const text = payload.toString();
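The review thread on ARN parsing above is cut off, so how the PR settled it is not shown here. As a minimal sketch of the concern: a qualified ARN carries a version or alias as an extra trailing segment, so taking the last `:`-separated segment returns the qualifier rather than the name. The helper below, `functionNameFromArn`, is hypothetical (not part of the PR) and reads the name by position instead. The account ID and function name in the usage lines are made up.

```typescript
// Hypothetical helper, not part of the PR: extracts the function name from a
// Lambda function ARN whether or not a version/alias qualifier is present.
// arn:${Partition}:lambda:${Region}:${Account}:function:${FunctionName}[:${Qualifier}]
function functionNameFromArn(functionArn: string): string {
  const parts = functionArn.split(':');
  // The name is always the 7th segment (index 6); a qualifier, if any, is the 8th.
  if (parts.length < 7 || parts[2] !== 'lambda' || parts[5] !== 'function') {
    throw new Error(`not a Lambda function ARN: ${functionArn}`);
  }
  return parts[6];
}

// Unqualified ARN: the last segment happens to be the name.
console.log(functionNameFromArn('arn:aws:lambda:us-east-1:123456789012:function:my-fn')); // prints "my-fn"
// Qualified ARN: the last segment is the version, the name is one segment earlier.
console.log(functionNameFromArn('arn:aws:lambda:us-east-1:123456789012:function:my-fn:3')); // prints "my-fn"
```

Passing the full ARN straight to `getFunction`, as suggested in the thread further down, would sidestep the parsing for that call, but the log message would still need the bare name.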
@@ -46,6 +46,15 @@ async function defaultInvokeFunction(req: AWS.Lambda.InvocationRequest): Promise
   return lambda.invoke(req).promise();
 }

+async function defaultGetFunction(req: AWS.Lambda.GetFunctionRequest): Promise<AWS.Lambda.GetFunctionResponse> {
+  if (!lambda) {
+    lambda = new AWS.Lambda(awsSdkConfig);
+  }
+
+  return lambda.getFunction(req).promise();
+}
+
Comment on lines +49 to +56
So, according to the documentation, we can run into a situation where new Lambda functions also haven't finished provisioning before we try to invoke them, so I wonder if the right path forward here is to replace `defaultInvokeFunction` with the following (roughly, this also contains notes):
This would mean that we don't want to export What do you think?
 export let startExecution = defaultStartExecution;
 export let invokeFunction = defaultInvokeFunction;
+export let getFunction = defaultGetFunction;
 export let httpRequest = defaultHttpRequest;
I apparently already forgot about our conversation on the duplicate PR because I was about to be like, why this number? 😂
Rereading that, however, made me realize that the ordering of these checks might be somewhat off. If the Lambda is called while it is in the `inactive` or `pending` state, the invocation will simply fail, so a new invocation will need to be made instead of giving the current invocation time to pass. Basically, I think that `invokeUserFunction` may need to have the following workflow:
The above could be a do-while loop so that you don't have to use breaks but obviously the actual implementation is up to you.
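The workflow list from the comment above did not survive on this page. One possible reading, as a sketch only: poll the function state, back off while it is `Pending` or `Inactive`, and invoke once it leaves those states or the wait budget runs out. `invokeWhenActive` and the `RetryDeps` shape are assumptions for illustration. The `invoke`, `getState`, and `sleep` callbacks stand in for `invokeFunction`, `getFunction`, and `sleep` from the diff so the loop can run without the AWS SDK.

```typescript
// Sketch only: every name here except the Lambda state strings and the two
// constants copied from the diff is hypothetical.
type FunctionState = 'Pending' | 'Active' | 'Inactive' | 'Failed';

interface RetryDeps<R> {
  invoke: () => Promise<R>;               // stands in for invokeFunction
  getState: () => Promise<FunctionState>; // stands in for getFunction(...).Configuration.State
  sleep: (ms: number) => Promise<void>;   // stands in for sleep
  random?: () => number;                  // injectable so tests can be deterministic
}

const BASE_SLEEP = 10_000;       // ms, copied from the diff
const MAX_TOTAL_SLEEP = 620_000; // ms, copied from the diff

async function invokeWhenActive<R>(deps: RetryDeps<R>): Promise<R> {
  const random = deps.random ?? Math.random;
  let totalSleep = 0;
  let attempt = 0;
  let notReady = true;

  do {
    const state = await deps.getState();
    notReady = state === 'Pending' || state === 'Inactive';
    if (notReady) {
      // exponential backoff with jitter, same arithmetic as the diff
      const currentSleep = Math.floor(BASE_SLEEP * Math.pow(2, attempt) * random());
      await deps.sleep(currentSleep);
      totalSleep += currentSleep;
      attempt++;
    }
  } while (notReady && totalSleep <= MAX_TOTAL_SLEEP);

  // The function is Active or Failed, or the budget is spent: invoke exactly once.
  // A Failed function is still invoked so that the caller gets the real error.
  return deps.invoke();
}
```

The do-while form needs no `break`: the state check sets the loop condition, and the single `invoke()` call after the loop covers both the success path and the out-of-budget path.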
Also, regarding this number, do we know in general how long it usually takes a function to return to active? That data might be good in determining this number.
Unfortunately, we do not at this time. I should have lambdas that will become inactive in the next week or so (someone from Lambda told me offline that this takes ~30 days), so I can check once I have an inactive function.
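Until that measurement exists, the retry budget can at least be reasoned about on paper. The sketch below replays the sleep arithmetic from the diff (`BASE_SLEEP * 2^attempt * Math.random()`, looping while the running total is at most `MAX_TOTAL_SLEEP`) without actually sleeping. `sleepSchedule` is a hypothetical helper, and pinning the jitter to a fixed value is an assumption made only to get deterministic numbers.

```typescript
const BASE_SLEEP = 10_000;       // ms, copied from the diff
const MAX_TOTAL_SLEEP = 620_000; // ms, copied from the diff

// Hypothetical helper: returns the sequence of sleeps the loop would perform.
// `random` stands in for Math.random; `maxAttempts` only guards against a
// random source that keeps returning 0 and never advances the total.
function sleepSchedule(random: () => number, maxAttempts = 32): number[] {
  const sleeps: number[] = [];
  let totalSleep = 0;
  for (let attempt = 0; attempt < maxAttempts && totalSleep <= MAX_TOTAL_SLEEP; attempt++) {
    const currentSleep = Math.floor(BASE_SLEEP * Math.pow(2, attempt) * random());
    sleeps.push(currentSleep);
    totalSleep += currentSleep;
  }
  return sleeps;
}

const total = (xs: number[]): number => xs.reduce((a, b) => a + b, 0);

const upper = sleepSchedule(() => 1);     // jitter pinned at its (exclusive) upper bound
const typical = sleepSchedule(() => 0.5); // jitter pinned at its mean

console.log(upper.length, total(upper));     // prints "6 630000"
console.log(typical.length, total(typical)); // prints "7 635000"
```

Because the budget is checked before each sleep rather than after, the final sleep can push the total past 620 seconds, as both runs show.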
My original assessment was only partially right. I was under the impression that the Lambda function failed, not errored, but upon further digging, I see that it throws an exception. See `outbound.ts` for my revised suggestion here.