Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add tests for cron stopping. #15566

Merged
merged 1 commit into from
Feb 18, 2025
Merged

Add tests for cron stopping. #15566

merged 1 commit into from
Feb 18, 2025

Conversation

samwho
Copy link
Collaborator

@samwho samwho commented Feb 17, 2025

Description

This PR rounds out the testing of cron triggers by adding end-to-end tests for filling in timestamps, and making sure crons stop when they error more than 5 times.

Annoyingly it's hard to test the stopping perfectly, so I'm asserting that the job has a stopped status. I couldn't figure out a nice way to make a test against us sending the message deletion request to Bull.

Copy link

qa-wolf bot commented Feb 17, 2025

QA Wolf here! As you write new code it's important that your test coverage is keeping up.
Click here to request test coverage for this PR!

@github-actions github-actions bot added firestorm Data/Infra/Revenue Team size/m labels Feb 17, 2025
Comment on lines +90 to +94
// For the purpose of testing, don't trigger cron jobs immediately.
// Require the test to trigger them manually with timestamps.
if (message.opts?.repeat != null) {
return
}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It used to be that when you called config.api.application.publish(), all crons were triggered straight away. This was odd behaviour to write tests against, so I decided to defer all cron triggers to be done manually in tests because it's less spooky.

Comment on lines +177 to +184
async removeRepeatableByKey(id: string) {
for (const [idx, message] of this._messages.entries()) {
if (message.opts?.jobId?.toString() === id) {
this._messages.splice(idx, 1)
this._emitter.emit("removed", message)
return
}
}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was put in to try and test us removing repeatable jobs from Bull but in practice it never gets called, due to my change in how crons get triggered. Because we trigger them manually, the job message doesn't have a job.opts.repeat anymore. I couldn't figure a way around this, we can't easily mimic Bull's repeat job functionality in our InMemoryQueue class.

Comment on lines +89 to +96
await config.withProdApp(async () => {
const {
data: [latest, ..._],
} = await automations.logs.logSearch({
automationId: runner.automation._id,
})
expect(latest.status).toEqual(AutomationStatus.STOPPED_ERROR)
})
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was the best I could do for now to make sure cron jobs stop when they hit a certain number of errors (5 at present). It doesn't assert that we remove the job from the Bull queue, because in tests I'm not sure how we can do that.

Comment on lines +223 to +250
if (!this.config.prodAppId) {
throw new Error(
"Automations can only be triggered in a production app context, call config.api.application.publish()"
)
}
// Because you can only trigger automations in a production app context, we
// wrap the trigger call to make tests a bit cleaner. If you really want to
// test triggering an automation in a dev app context, you can use the
// automation API directly.
return await this.config.withProdApp(async () => {
try {
return await this.config.api.automation.trigger(
this.automation._id!,
request
)
} catch (e: any) {
if (e.cause.status === 404) {
throw new Error(
`Automation with ID ${
this.automation._id
} not found in app ${this.config.getAppId()}. You may have forgotten to call config.api.application.publish().`,
{ cause: e }
)
} else {
throw e
}
}
})
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Decided to flesh this out to make it easier to use in tests, and easier to know what to do in some common failure scenarios.

Comment on lines +110 to +114
if (waited > waitMax) {
// eslint-disable-next-line no-unsafe-finally
throw new Error(
`Timed out waiting for automation runs to complete. ${messagesOutstanding} messages waiting for completion.`
)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously tests would hang forever if a message was added to the queue and not removed (as cron messages now are). Put in this failsafe to make sure the tests do eventually fail.

return metadata || { _id: metadataId, errorCount: 0 }
}

async incrementErrorCount() {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was finding that the previous method of incrementing the error count would very frequently run into document conflicts. Rewrote it to be more robust to that failure scenario.

@samwho samwho marked this pull request as ready for review February 17, 2025 17:55
@samwho samwho requested a review from a team as a code owner February 17, 2025 17:55
@samwho samwho requested review from adrinr and removed request for a team February 17, 2025 17:55
// Queue messages tend to be send asynchronously in API handlers, so there's
// no guarantee that awaiting this function will have queued anything yet.
// We wait here to make sure we're queued _after_ any existing async work.
await helpers.wait(100)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there anyway to check if something has been queued? Just wondering if in tests this 100ms could add up quite a bit

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately it can (and often does) happen async, so by the time we get to here the async task that does the queueing hasn't happened yet. I agree with you that this is not ideal

Copy link
Collaborator

@mike12345567 mike12345567 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - nice to have test cases for a lot of this now!

@samwho samwho merged commit 5e21278 into master Feb 18, 2025
20 checks passed
@samwho samwho deleted the automation-tests-9 branch February 18, 2025 09:28
@github-actions github-actions bot locked and limited conversation to collaborators Feb 18, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
firestorm Data/Infra/Revenue Team size/m
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants