
Test cron removal. #15569

Merged: samwho merged 5 commits into master from automation-tests-9 on Feb 18, 2025
Conversation

@samwho (Collaborator) commented Feb 18, 2025

Description

In #15566 I was annoyed that I wasn't able to find a nice way to test that cron automations get removed correctly from Bull. In this PR, I think I've figured out a nice enough way to do it.

It boils down to capturing the cron message that goes into the queue and exposing a method to re-trigger that message in tests. With this, we ensure that when the job triggers it uses the correct cron message, which in turn correctly triggers removal from the queue when it fails 5 times.

qa-wolf bot commented Feb 18, 2025

QA Wolf here! As you write new code it's important that your test coverage is keeping up.

@github-actions github-actions bot added firestorm Data/Infra/Revenue Team size/m labels Feb 18, 2025
@@ -86,10 +87,13 @@ class InMemoryQueue implements Partial<Queue> {
*/
async process(concurrencyOrFunc: number | any, func?: any) {
func = typeof concurrencyOrFunc === "number" ? func : concurrencyOrFunc
-    this._emitter.on("message", async message => {
+    this._emitter.on("message", async msg => {
+      const message = cloneDeep(msg)
samwho (Collaborator, Author):

Hilariously this is what was causing those loop.spec.ts tests to pass without timing out. The orchestrator removes appId from the job message, which mutates the original message, causing all future invocations of that message to fail. Preserving the appId fixes these, but causes loop.spec.ts to then start timing out because it's doing a lot more work than before.
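The mutation described above is easy to reproduce. A minimal sketch, illustrative only (the `Message` shape and `deliver` helper are invented names, not Budibase code): a handler that deletes a field from the message it receives will, without a per-delivery clone, mutate the stored original and break every later re-delivery.

```typescript
import { EventEmitter } from "events"

interface Message { appId?: string; payload: string }

// deliver() re-emits a stored message, optionally handing out a shallow copy
function deliver(emitter: EventEmitter, stored: Message, clone: boolean) {
  emitter.emit("message", clone ? { ...stored } : stored)
}

const emitter = new EventEmitter()
const seen: (string | undefined)[] = []
emitter.on("message", (msg: Message) => {
  seen.push(msg.appId)
  delete msg.appId // the handler mutates the message it was given
})

const stored: Message = { appId: "app_123", payload: "run" }
deliver(emitter, stored, false) // no clone: the mutation hits the stored original
deliver(emitter, stored, false) // re-delivery now sees appId undefined
```

Passing `clone: true` instead hands each delivery its own copy, which is what the `cloneDeep` above achieves for nested message data.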

this._emitter.on("message", async msg => {
const message = cloneDeep(msg)

const isManualTrigger = (message as any).manualTrigger === true
samwho (Collaborator, Author):

This is part of the mechanism I'm using to trigger cron messages. On app publish they get queued without this manualTrigger property, and in that instance we ignore them. Then later when we trigger them manually, they have this property attached to them to distinguish them from the initial message. It's a bit dirty, but results in a much more realistic test.
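The filtering idea can be sketched as follows (a hedged sketch, not the actual Budibase handler; `CronMessage` and `onMessage` are assumed names): publish-time cron messages carry no `manualTrigger` flag and are skipped, while a re-emitted copy of the captured message carries the flag and runs.

```typescript
interface CronMessage {
  data: { event: string }
  manualTrigger?: boolean
}

const processed: string[] = []

// onMessage mimics the queue handler: skip cron messages queued on app
// publish, run only the copies the test re-triggers with manualTrigger set
function onMessage(message: CronMessage, isCron: boolean) {
  const isManualTrigger = (message as any).manualTrigger === true
  if (isCron && !isManualTrigger) {
    return // initial publish-time message: capture it, but don't run it
  }
  processed.push(message.data.event)
}

const captured: CronMessage = { data: { event: "cron" } }
onMessage(captured, true)                             // ignored
onMessage({ ...captured, manualTrigger: true }, true) // runs
```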

Comment on lines +140 to +145
async add(data: T | string, optsOrT?: JobOptions | T) {
if (typeof data === "string") {
throw new Error("doesn't support named jobs")
}

const opts = optsOrT as JobOptions
samwho (Collaborator, Author):

These changes were required to do the correct generic typing on the class (InMemoryQueue<T> as opposed to just InMemoryQueue).

@@ -176,7 +186,7 @@ class InMemoryQueue implements Partial<Queue> {

async removeRepeatableByKey(id: string) {
for (const [idx, message] of this._messages.entries()) {
-      if (message.opts?.jobId?.toString() === id) {
+      if (message.id === id) {
samwho (Collaborator, Author):

In getRepeatableJobs we transform each message to a JobInformation object. This sets .key to be the same as .id. We're relying on that fact here, as messages are not stored with a .key property.
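The relationship being relied on can be sketched like this (assumed shapes for illustration, not the real `InMemoryQueue` internals): `getRepeatableJobs` mirrors each message's `.id` into the `JobInformation.key`, so matching on `message.id` inside `removeRepeatableByKey` is equivalent to matching on the key.

```typescript
interface StoredMessage { id: string; data: unknown }
interface JobInformation { key: string; id: string }

// key is set to the same value as id, as the comment above describes
function getRepeatableJobs(messages: StoredMessage[]): JobInformation[] {
  return messages.map(m => ({ key: m.id, id: m.id }))
}

// so removing "by key" can safely compare against message.id
function removeRepeatableByKey(messages: StoredMessage[], key: string) {
  const idx = messages.findIndex(m => m.id === key)
  if (idx !== -1) messages.splice(idx, 1)
}

const msgs: StoredMessage[] = [{ id: "cron:1", data: {} }]
const [job] = getRepeatableJobs(msgs)
removeRepeatableByKey(msgs, job.key)
```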

Comment on lines +35 to +44
try {
await automations.logs.storeLog(automation, results)
} catch (e: any) {
if (e.status === 413 && e.request?.data) {
// if content is too large we shouldn't log it
delete e.request.data
e.request.data = { message: "removed due to large size" }
}
logging.logAlert("Error writing automation log", e)
}
samwho (Collaborator, Author):

This handling was moved here from the automation orchestrator because it seems to me that we should always handle it this way whenever writing a log fails due to its size.

Comment on lines +24 to +27
const { automations } = await config.api.automation.fetch()
for (const automation of automations) {
await config.api.automation.delete(automation)
}
samwho (Collaborator, Author):

Fix for the timeouts mentioned earlier in this review. It ensures there are fewer unrelated automations to trigger when working with each test's specific automation.

Comment on lines 83 to 93
let results: Job<AutomationData>[] = []
const removed = await captureAutomationRemovals(automation, async () => {
results = await captureAutomationResults(automation, async () => {
for (let i = 0; i < MAX_AUTOMATION_RECURRING_ERRORS; i++) {
triggerCron(message)
}
})
})

expect(removed).toHaveLength(1)
expect(removed[0].id).toEqual(message.id)
samwho (Collaborator, Author):

This is the meat of the PR: the test that makes sure jobs are removed from Bull when they fail more than 5 times. The nesting of capture* functions is a bit ugly, but this is the only place we have to do it, so I'm not super worried.
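A capture helper of the kind used above can be sketched in a few lines (a hypothetical shape; `captureEvents` and the `bus` emitter are invented for illustration, not the real `captureAutomationRemovals`/`captureAutomationResults`): subscribe, run the callback, unsubscribe, and return everything observed.

```typescript
import { EventEmitter } from "events"

const bus = new EventEmitter()

// Collect every payload emitted for `event` while fn() runs
async function captureEvents<T>(
  event: string,
  fn: () => Promise<void>
): Promise<T[]> {
  const seen: T[] = []
  const listener = (payload: T) => { seen.push(payload) }
  bus.on(event, listener)
  try {
    await fn()
  } finally {
    bus.off(event, listener) // always detach, even if fn() throws
  }
  return seen
}
```

Nesting two of these, as in the test above, just means capturing two different event streams over the same callback.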

Comment on lines +24 to +26
export function getTestQueue(): queue.InMemoryQueue<AutomationData> {
return getQueue() as unknown as queue.InMemoryQueue<AutomationData>
}
samwho (Collaborator, Author):

This is what I was trying to avoid in the previous PR, but I bit the bullet and did it. I wanted to just use methods on the Bull.Queue interface, but that was holding me back from being able to manually re-trigger cron messages.

@samwho samwho marked this pull request as ready for review February 18, 2025 11:17
@samwho samwho requested a review from a team as a code owner February 18, 2025 11:17
@samwho samwho requested review from mike12345567 and adrinr and removed request for a team February 18, 2025 11:17
return metadata.errorCount
} catch (error: any) {
err = error
await helpers.wait(1000 + Math.random() * 1000)
A collaborator commented:

Why the randomness?

samwho (Collaborator, Author) replied:

When multiple writers are fighting over the same document, a cheap way to reduce the chance they conflict is to have each one wait a random amount of time before retrying. It's not strictly necessary, but it makes repeated conflicts less likely.
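The retry-with-jitter pattern being discussed looks roughly like this (an assumed shape, not the exact helper under review; `withJitteredRetry` is an invented name): on failure, sleep a base delay plus a random extra so concurrent writers are unlikely to retry at the same instant.

```typescript
const wait = (ms: number) => new Promise<void>(resolve => setTimeout(resolve, ms))

// Retry fn() up to `attempts` times, jittering the delay between attempts.
// baseMs of 1000 matches the 1000 + Math.random() * 1000 in the diff above.
async function withJitteredRetry<T>(
  fn: () => Promise<T>,
  attempts = 3,
  baseMs = 1000
): Promise<T> {
  let err: unknown
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn()
    } catch (error) {
      err = error
      await wait(baseMs + Math.random() * baseMs) // desynchronises retries
    }
  }
  throw err
}
```

Each contender ends up retrying at a slightly different moment, so a document write conflict rarely repeats across all attempts.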

@samwho samwho enabled auto-merge February 18, 2025 14:03
@samwho samwho merged commit 670b409 into master Feb 18, 2025
20 checks passed
@samwho samwho deleted the automation-tests-9 branch February 18, 2025 14:10
@github-actions github-actions bot locked and limited conversation to collaborators Feb 18, 2025