Add tests for cron stopping. #15566
Conversation
QA Wolf here! As you write new code it's important that your test coverage is keeping up.
// For the purpose of testing, don't trigger cron jobs immediately.
// Require the test to trigger them manually with timestamps.
if (message.opts?.repeat != null) {
  return
}
It used to be that when you called config.api.application.publish(), all crons were triggered straight away. This was odd behaviour to write tests against, so I decided to defer all cron triggers to be done manually in tests, because it's less spooky.
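For illustration, a minimal sketch of what a test looks like under the new behaviour (the runner helper and the request shape are assumptions based on the other changes in this PR, not the exact API):

// Publish first: crons no longer fire as a side effect of this call.
await config.api.application.publish()

// The test then fires the cron itself, supplying the timestamp it wants the
// cron to run "at". The request shape here is illustrative only.
await runner.trigger({ timestamp: Date.now(), fields: {} })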
async removeRepeatableByKey(id: string) {
  for (const [idx, message] of this._messages.entries()) {
    if (message.opts?.jobId?.toString() === id) {
      this._messages.splice(idx, 1)
      this._emitter.emit("removed", message)
      return
    }
  }
}
This was put in to try and test us removing repeatable jobs from Bull, but in practice it never gets called due to my change in how crons get triggered. Because we trigger them manually, the job message doesn't have a job.opts.repeat anymore. I couldn't figure out a way around this; we can't easily mimic Bull's repeat job functionality in our InMemoryQueue class.
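For context, with real Bull a repeatable job carries repeat options when it's added, which is what removeRepeatableByKey keys off; a manually triggered test job doesn't. Roughly (the payload and jobId below are illustrative):

// A real Bull repeatable job: opts.repeat is present, so it can later be
// removed via removeRepeatableByKey.
await queue.add(payload, { repeat: { cron: "0 * * * *" }, jobId: "cron-1" })

// A manually triggered job in tests: no opts.repeat, so the removal path
// above never gets exercised by the InMemoryQueue.
await queue.add(payload, { jobId: "cron-1" })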
await config.withProdApp(async () => {
  const {
    data: [latest, ..._],
  } = await automations.logs.logSearch({
    automationId: runner.automation._id,
  })
  expect(latest.status).toEqual(AutomationStatus.STOPPED_ERROR)
})
This was the best I could do for now to make sure cron jobs stop when they hit a certain number of errors (5 at present). It doesn't assert that we remove the job from the Bull queue, because in tests I'm not sure how we can do that.
if (!this.config.prodAppId) {
  throw new Error(
    "Automations can only be triggered in a production app context, call config.api.application.publish()"
  )
}
// Because you can only trigger automations in a production app context, we
// wrap the trigger call to make tests a bit cleaner. If you really want to
// test triggering an automation in a dev app context, you can use the
// automation API directly.
return await this.config.withProdApp(async () => {
  try {
    return await this.config.api.automation.trigger(
      this.automation._id!,
      request
    )
  } catch (e: any) {
    if (e.cause.status === 404) {
      throw new Error(
        `Automation with ID ${
          this.automation._id
        } not found in app ${this.config.getAppId()}. You may have forgotten to call config.api.application.publish().`,
        { cause: e }
      )
    } else {
      throw e
    }
  }
})
Decided to flesh this out to make it easier to use in tests, and easier to know what to do in some common failure scenarios.
if (waited > waitMax) {
  // eslint-disable-next-line no-unsafe-finally
  throw new Error(
    `Timed out waiting for automation runs to complete. ${messagesOutstanding} messages waiting for completion.`
  )
}
Previously, tests would hang forever if a message was added to the queue and never removed (as is now the case for cron messages). I put in this failsafe to make sure the tests do eventually fail.
  return metadata || { _id: metadataId, errorCount: 0 }
}

async incrementErrorCount() {
I was finding that the previous method of incrementing the error count would very frequently run into document conflicts. Rewrote it to be more robust to that failure scenario.
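The general shape of a conflict-tolerant increment is to retry on a 409 and re-read the document each time; a rough sketch (the Database interface and the tryGet/put names are assumptions for illustration, not necessarily the real helpers):

interface Database {
  tryGet<T>(id: string): Promise<T | undefined>
  put(doc: any): Promise<void>
}

async function incrementErrorCount(db: Database, metadataId: string) {
  for (let attempt = 0; attempt < 10; attempt++) {
    // Re-read the latest revision on every attempt.
    const metadata = (await db.tryGet<any>(metadataId)) || {
      _id: metadataId,
      errorCount: 0,
    }
    metadata.errorCount++
    try {
      await db.put(metadata)
      return metadata.errorCount
    } catch (err: any) {
      // A 409 means another writer won the race; loop and try again.
      if (err.status !== 409) {
        throw err
      }
    }
  }
  throw new Error(`Failed to increment error count for ${metadataId}`)
}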
// Queue messages tend to be sent asynchronously in API handlers, so there's
// no guarantee that awaiting this function will have queued anything yet.
// We wait here to make sure we're queued _after_ any existing async work.
await helpers.wait(100)
Is there any way to check if something has been queued? Just wondering if in tests this 100ms could add up quite a bit.
Unfortunately it can (and often does) happen async, so by the time we get here the async task that does the queueing hasn't happened yet. I agree with you that this is not ideal.
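One possible alternative, sketch only: poll the queue until something shows up instead of sleeping a fixed 100ms. This assumes the in-memory queue exposes its pending messages somehow (the getMessages() accessor below is hypothetical):

async function waitForQueuedMessage(queue: InMemoryQueue, timeoutMs = 1000) {
  const started = Date.now()
  while (Date.now() - started < timeoutMs) {
    // Stop waiting as soon as anything has been queued.
    if (queue.getMessages().length > 0) {
      return
    }
    await helpers.wait(10)
  }
  throw new Error("Timed out waiting for a message to be queued")
}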
LGTM - nice to have test cases for a lot of this now!
Description
This PR rounds out the testing of cron triggers by adding end-to-end tests for filling in timestamps, and making sure crons stop when they error more than 5 times.
Annoyingly it's hard to test the stopping perfectly, so I'm asserting that the job has a stopped status. I couldn't figure out a nice way to test that we actually send the message deletion request to Bull.