Automated retries #1776
<RetryHandler>
  {children}
</RetryHandler>
`RetryHandler` isn't a context provider, but I think we should use more providers inside context providers nonetheless. There are too many in pages/_app.js.
For example, we could put all providers for the carousel (price, block height, chain fee) into the same component.
api/resolvers/wallet.js
Outdated
SELECT
  'unlockInvoice',
  jsonb_build_object('id', id),
  now() + interval '10 minutes',
  now() + interval '15 minutes'
This is a cool construct.
1. Can you explain your reasoning behind the times for unlocking?
2. As to why it needs the lock, is this correct: it only exists to prevent races creating the invoice (either wrapped or direct to SN) that the race winner will pay?
3. If I have (2) correct, did you consider skip locking in `retryPaidAction` in a separate tx0, before the invoice was created, rather than when it's fetched? Then, it'd be implicitly unlocked in tx1, where the invoice is set to `RETRYING`? I guess it's possible tx1 fails, and doesn't unlock by setting to `RETRYING`, but maybe that can be fixed by having tx0 set an expiring lock like `lockedAt is null OR lockedAt < now() - interval '2 minutes'`
4. I'm just trying to brainstorm ways to avoid the pgboss job (async stuff like this tends to be the most confusing for me) and shorten the time between retries and the notification for retrying
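The expiring lock idea above can be sketched in memory. This is only an illustration under stated assumptions: `tryLock`, `LOCK_EXPIRY_MS` and the `lockedAt` field are hypothetical names, and a plain object stands in for the `Invoice` row. In a database the check and the write would need to happen atomically, for example in a single `UPDATE` with the condition in its `WHERE` clause.

```javascript
// Hypothetical in-memory stand-in for the SQL condition
//   lockedAt IS NULL OR lockedAt < now() - interval '2 minutes'
// The names tryLock, LOCK_EXPIRY_MS and lockedAt are assumptions for illustration.
const LOCK_EXPIRY_MS = 2 * 60 * 1000

// Acquire the lock if it is free or has expired; return whether we got it.
function tryLock (invoice, now = Date.now()) {
  const free = invoice.lockedAt === null || invoice.lockedAt < now - LOCK_EXPIRY_MS
  if (!free) return false
  invoice.lockedAt = now
  return true
}

const invoice = { id: 1, actionState: 'FAILED', lockedAt: null }
const t0 = 0
console.log(tryLock(invoice, t0)) // first client acquires the lock
console.log(tryLock(invoice, t0 + 1000)) // second client is rejected while the lock is fresh
console.log(tryLock(invoice, t0 + LOCK_EXPIRY_MS + 1)) // lock has expired and can be retaken
```

Because the lock expires on its own, a crashed or failed tx1 does not need a separate unlock job.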
> Can you explain your reasoning behind the times for unlocking?

I didn't think too much about when exactly it should unlock. It should just be high enough that we can reasonably assume the client who received and locked this invoice is not going to retry it. I think we can probably safely decrease the unlock time from 10 minutes to 1 minute.

> As to why it needs the lock, is this correct: it only exists to prevent races creating the invoice (either wrapped or direct to SN) that the race winner will pay?

Mhh, yes, by making sure we only hand out invoices to one client at a time to avoid any such races.
> I guess it's possible tx1 fails, and doesn't unlock by setting to `RETRYING`, but maybe that can be fixed by having tx0 set an expiring lock like `lockedAt is null OR lockedAt < now() - interval '2 minutes'`

Yeah, an expiring lock would be the alternative to the async stuff.
update: changed it to use locks that expire after one minute in 15c799d
I just noticed you didn't address (3) exactly. I might have been confusing.
I don't think we should lock on query - we should lock in `retryPaidAction`, i.e. multiple clients can call `retryPaidAction`, but only one will successfully get an invoice to retry.
Is that not possible or is the current way better for some reason?
bump
whoops, forgot to reply
> I just noticed you didn't address (3) exactly. I might have been confusing.

I didn't address (3) exactly, only the async stuff, because of

> I'm just trying to brainstorm ways to avoid the pgboss job

so I thought that was the main issue and didn't elaborate on the "lock location".

> I don't think we should lock on query - we should lock in retryPaidAction. ie multiple clients can call retryPaidAction, but only one will successfully get an invoice to retry.
> Is that not possible or is the current way better for some reason?

I am also using the lock set on query to know if the payment attempt counter should be increased in the retry mutation.
I could increase the payment attempt counter on the client and have the client include it in the retry mutation, but letting the client pick that number makes this a little harder to reason about, for example wrt trusting inputs 🤔
> Basically, anytime an invoice is transitioned from FAILED to RETRYING, we want this incremented in the new invoice record, right?

No, during sender and receiver fallbacks, we are also retrying but we want to keep the payment attempt counter the same so we can filter receiver wallets based on that. When we retry a locked invoice, we increment the counter to make all sender/receiver wallets available again and start a "new chain of invoices" where all have the same payment attempt counter.

> Anyway, if you're certain that what I'm asking makes that impossible, then I'll accept it is until I've taken a closer look.

Changing the lock+payment attempt counter increment logic is not impossible, but I think it's also good enough as it is now.
I guess I don't see why splitting this up into separate requests makes this possible in a way that separate txs in `retryPaidAction` wouldn't. I'm mostly suggesting the logic/counting stays the same, but we can get rid of the lock on query by locking before the retry and, if we can't acquire the lock because another client retried it already, returning null/error.
Anyway, I haven't read the code so I'm probably missing something subtle.
> I'm mostly suggesting the logic/counting stays the same, but we can get rid of the lock on query by locking before the retry

Maybe I am actually missing something, but I can't tell from this description how you would know during a retry whether to increment the counter if you lock every invoice before retrying. If the logic stays the same, every retry would then increment the counter, which would break receiver fallbacks.
So basically, the lock on query allows us to distinguish "normal retries" (= same payment counter) from "full retries" (= increment payment counter).
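The distinction can be sketched as follows. This is a simplified illustration, not the PR's code: `nextInvoice`, `predecessorId` and `MAX_PAYMENT_ATTEMPTS` are hypothetical names, and numbering attempts from 0 is an assumption. The `paymentAttempt` counter and the `newAttempt` flag come from the PR description, which also says retries stop after three payment attempts.

```javascript
// Assumption: attempts are numbered from 0 and three attempts are allowed in total.
const MAX_PAYMENT_ATTEMPTS = 3

// Hypothetical helper: derive the retry invoice from a failed one.
// A "normal retry" (sender/receiver fallback) keeps the counter,
// a "full retry" (newAttempt: true) increments it and starts a new chain.
function nextInvoice (failed, { newAttempt = false } = {}) {
  const paymentAttempt = newAttempt ? failed.paymentAttempt + 1 : failed.paymentAttempt
  if (paymentAttempt >= MAX_PAYMENT_ATTEMPTS) {
    throw new Error('Payment attempt limit reached')
  }
  return { predecessorId: failed.id, paymentAttempt, actionState: 'RETRYING' }
}

const failed = { id: 7, paymentAttempt: 0, actionState: 'FAILED' }
console.log(nextInvoice(failed).paymentAttempt) // fallback stays in the same chain
console.log(nextInvoice(failed, { newAttempt: true }).paymentAttempt) // full retry starts a new chain
```

Keeping the counter constant within a chain is what allows wallets that were already tried in that chain to be filtered out.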
Without solving this myself, assuming it can be solved, the easiest way for you to understand might be to ask a riddle: how would you do exactly what you do now without acquiring the lock in the query by acquiring the lock only during the mutation? hint: you can run multiple txs, and logic in between them, in the same mutation.
I created an intermediate retry state `RETRY_PENDING` in 89a6b42 to lock during retry:
// make sure only one client at a time can retry by immediately transitioning to an intermediate state
const [invoice] = await models.$queryRaw`
  UPDATE "Invoice"
  SET "actionState" = 'RETRY_PENDING'
  WHERE id in (
    SELECT id FROM "Invoice"
    WHERE id = ${invoiceId} AND "userId" = ${me.id} AND "actionState" = 'FAILED'
    FOR UPDATE
  )
  RETURNING *`
if (!invoice) {
  throw new Error('Invoice not found')
}
This intermediate state prevents concurrent retries from requesting invoices from attached wallets even though their state transition from `FAILED` to `RETRYING` would fail at the end. This makes it a pessimistic lock instead of an optimistic one, though. But I think that's okay: I suspect there will be many cases where the mobile and desktop clients of a stacker retry at the same time, so without it we'd request enough unused invoices from attached wallets, for no apparent reason, that it would be confusing and lead to questions.
I also updated the documentation to say that `FAILED` now always transitions to `RETRY_PENDING` before `RETRYING`.
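To illustrate why the intermediate state acts as a lock, here is a minimal in-memory sketch. The `transition` helper and the `Map` standing in for the `Invoice` table are assumptions for illustration only. In the PR, the equivalent check-and-set is done by the database in the `UPDATE` statement that filters on `"actionState" = 'FAILED'`.

```javascript
// Hypothetical in-memory table; the real code uses an atomic SQL UPDATE.
const invoices = new Map([[1, { id: 1, userId: 42, actionState: 'FAILED' }]])

// Move an invoice to state `to` only if it belongs to the user and is currently in `from`.
// Returns the updated row, or undefined if the precondition did not hold.
function transition (id, userId, from, to) {
  const row = invoices.get(id)
  if (!row || row.userId !== userId || row.actionState !== from) return undefined
  row.actionState = to
  return row
}

const first = transition(1, 42, 'FAILED', 'RETRY_PENDING')
const second = transition(1, 42, 'FAILED', 'RETRY_PENDING')
console.log(first !== undefined) // the first client wins the transition
console.log(second === undefined) // the concurrent client gets no row back
```

Only the client that received a row goes on to request invoices from attached wallets, so the losing client never creates unused invoices.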
if (invoice.actionState !== 'FAILED') {
  if (invoice.actionState === 'PAID') {
    throw new Error('Invoice is already paid')
  }
  throw new Error(`Invoice is not in failed state: ${invoice.actionState}`)
Removing these error messages probably makes errors during bugs less informative, for example if a retry via notifications fails.
I think the thing to do in this case is return the pending/paid invoice rather than an error.
We should do the same thing in #1669 - if an attempt is made to vote again, just return the original vote.
It's kind of painting over the bug - which is some kind of clientside cache issue - but it's way better UX.
wouldn't this cause the client to retry a paid invoice if the retry mutation doesn't throw?
Yeah, we'd have to modify the client logic when retrying, or ... we could just throw a more specific error, catch it, then update the cache appropriately.
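The "more specific error" idea could look roughly like this. Everything here is hypothetical: the `InvoiceNotRetryableError` class, the `assertRetryable` and `retryOrSync` helpers, and the `Map` used as a cache are placeholders for illustration, not names from the codebase.

```javascript
// Hypothetical error type carrying the invoice state that blocked the retry.
class InvoiceNotRetryableError extends Error {
  constructor (actionState) {
    super(`Invoice is not in failed state: ${actionState}`)
    this.name = 'InvoiceNotRetryableError'
    this.actionState = actionState
  }
}

// Hypothetical server-side check: only FAILED invoices may be retried.
function assertRetryable (invoice) {
  if (invoice.actionState !== 'FAILED') {
    throw new InvoiceNotRetryableError(invoice.actionState)
  }
}

// Hypothetical client-side handling: correct the stale cache entry
// instead of surfacing the error or retrying a paid invoice.
function retryOrSync (invoice, cache) {
  try {
    assertRetryable(invoice)
    return 'retry'
  } catch (err) {
    if (err instanceof InvoiceNotRetryableError) {
      cache.set(invoice.id, err.actionState)
      return 'synced'
    }
    throw err
  }
}

const cache = new Map([[5, 'FAILED']])
console.log(retryOrSync({ id: 5, actionState: 'PAID' }, cache), cache.get(5))
```

The client still does not retry a paid invoice, but the user sees an updated state rather than an error.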
I have put this PR back into review after some QA of 9b0bcd3. I think what we discuss here should be done in a separate PR.
Oh for sure. I was just riffing on your comment. I keep encountering that poll bug.
    ...result,
    type: paidActionType(invoice.actionType)
  }
} catch (err) {
What happens if `retryPaidAction` causes the server to crash and the catch block isn't hit?
Won't the invoice be kept in `RETRY_PENDING` limbo?
Wouldn't the expiring lock you had previously still work with this retry flow? Or no?
  SELECT id FROM "Invoice"
  WHERE id = ${invoiceId} AND "userId" = ${me.id} AND "actionState" = 'FAILED'
  FOR UPDATE
)
Why use a set query here? Wouldn't this suffice:
UPDATE "Invoice"
SET "actionState" = 'RETRY_PENDING'
WHERE id = ${invoiceId} AND "userId" = ${me.id} AND "actionState" = 'FAILED'
RETURNING *
`id` is a unique column so there'd only ever be one.
Description
close #1492 based on #1785, #1787
All failed invoices are returned to every client periodically via a new query `failedInvoices`. To make sure an invoice is only retried by one client at a time, the `retryPaidAction` mutation makes sure that only one client is able to transition the invoice from `FAILED` to `RETRY_PENDING`.
To stop after three payment attempts (= two retries), a new integer column `Invoice.paymentAttempt` tracks at which payment attempt we are. This number is increased when we retry an invoice and pass `newAttempt: true`, which we do when we retry these fetched failed invoices. When the number is increased, the payment will start from the beginning with all sender and receiver wallets available.
TODO:
- added `userCancel` column, see #1785
- client only polls when it has send wallets
- added `"cancelledAt" < now() - interval '${WALLET_RETRY_AFTER_MS} milliseconds'` filter
- added `WALLET_RETRY_BEFORE_MS` used in this filter:
Additional Context
see https://github.com/stackernews/stacker.news/pull/1776/files#r1907791409
Checklist
Are your changes backwards compatible? Please answer below:
yes
On a scale of 1-10 how well and how have you QA'd this change and any features it might affect? Please answer below:
`7`. Tested automated retries for posting, replies and zapping with this patch:
Simulate multiple clients with this patch:
Test p2p zaps with this patch that makes forwards fail and disables the fallback to CCs:
For frontend changes: Tested on mobile, light and dark mode? Please answer below:
n/a
Did you introduce any new environment variables? If so, call them out explicitly here:
no