Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: only replace refresh result if successful or current result is invalid #561

Merged
merged 10 commits into from
Jul 26, 2021

Conversation

shubha-rajan
Copy link
Contributor

@shubha-rajan shubha-rajan commented Jul 22, 2021

Change Description

Checklist

  • Make sure to open an issue as a
    bug/issue
    before writing your code! That way we can discuss the change, evaluate
    designs, and agree on the general idea.
  • Ensure the tests and linter pass
  • Appropriate documentation is updated (if necessary)

Relevant issues:

@shubha-rajan shubha-rajan force-pushed the update-refresh-logic branch from ca52213 to a2464fd Compare July 22, 2021 20:43
@shubha-rajan shubha-rajan requested a review from kurtisvg July 22, 2021 20:47
@enocom enocom self-requested a review July 22, 2021 20:48
@shubha-rajan shubha-rajan added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Jul 22, 2021
@kokoro-team kokoro-team removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Jul 22, 2021
@shubha-rajan shubha-rajan force-pushed the update-refresh-logic branch from 96c0082 to 34159ac Compare July 22, 2021 21:58
Comment on lines 385 to 389
nextInstanceData = Futures.immediateFuture(performRefresh());
}
} else {
forceRefresh();
}
Copy link
Contributor

@kurtisvg kurtisvg Jul 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the logic we want here is:

On error:
     if current is invalid:
          replace current
     next = new new refresh

Since we know this refresh failed, we always want to schedule a new one. We just don't want it to replace "current" if we think "current' might still be good.

public void onFailure(Throwable t) {
logger.log(Level.WARNING,
"An error occurred while performing refresh. Retrying immediately.", t);
if (getInstanceData().getExpiration().toInstant().isBefore(Instant.now())) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to be careful here. There are at least two problems I see with the current code.

  1. If "current" is in a bad state (e.i. error) it'll throw an (maybe unrelated) error and nothing past this will happen
  2. if "current" is this thread (because we've already forced refresh), this will - well I'm not sure. Block forever?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we'd ever get into a position where current is still pending, since this callback happens after refreshFuture has completed. The first problem is something to think about though. how can we check the validity of the instance data here, without potentially throwing errors. Would a try/catch work, where we try getInstanceData and if we get an error, we replace current

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, I think I would go with a try/catch around it. Just to clarify - you are sure that the callback will trigger after the future is completed?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing to think about - I think we actually want the replace of current and next to happen before the future is returned. Otherwise you may introduce a race condition where we return, fail to connect, and force refresh before the callback is executed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the callback will definitely trigger after the future is completed. I get what you're saying with the race condition, where a force refresh could be triggered before the callback, but how can we replace current before we know the future's result if our goal is to avoid replacing a valid InstanceData with an invalid one?

I think in the case where a force refresh is triggered before the callback, the worst that would happen is that we get an extra refresh attempt a minute later, since the rate limiter will delay the force refresh anyway. If the refresh attempt is successful, then we'd end up replacing a valid result with another (hopefully) valid result, or wait for another refresh attempt to fix it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To clarify, I think it's preferable if whatever the callback behavior is (replacing current or just rescheduling) before the future counts as "done" .

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shubha-rajan shubha-rajan requested a review from kurtisvg July 22, 2021 23:39
…/cloud-sql-jdbc-socket-factory into update-refresh-logic
@shubha-rajan shubha-rajan force-pushed the update-refresh-logic branch from c2a38b6 to ebe4d3b Compare July 22, 2021 23:45
@shubha-rajan shubha-rajan added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Jul 22, 2021
@kokoro-team kokoro-team removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Jul 22, 2021
@shubha-rajan shubha-rajan added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Jul 23, 2021
@kokoro-team kokoro-team removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Jul 23, 2021
Copy link
Contributor

@kurtisvg kurtisvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, minus my one nit. Can we make sure we test it in Cloud Run for an extended period of time (~24hours?) before merging?

@shubha-rajan shubha-rajan added the do not merge Indicates a pull request not ready for merge, due to either quality or timing. label Jul 23, 2021
@enocom
Copy link
Member

enocom commented Jul 26, 2021

After running this over the weekend, here's what I saw:

image

That's one runtime exception in several days, but that didn't result in a 500. This PR seems solid.

@shubha-rajan
Copy link
Contributor Author

That's one runtime exception in several days, but that didn't result in a 500. This PR seems solid.

Good to know! I'll merge this then. Thanks!

@shubha-rajan shubha-rajan removed the do not merge Indicates a pull request not ready for merge, due to either quality or timing. label Jul 26, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants