
504 Gateway Timeout on badges inside GitHub README #4223

Closed
jcbcn opened this issue Oct 21, 2019 · 18 comments
Labels
operations Hosting, monitoring, and reliability for the production badge servers

Comments

@jcbcn

jcbcn commented Oct 21, 2019

Are you experiencing an issue with...

🪲
I am receiving 504 (Gateway Timeout) when viewing badges inside GitHub READMEs, across a range of badge types. This happens with at least one badge in the README each time I try.

Possibly linked to #1568.

Thanks

@jcbcn jcbcn added the question Support questions, usage questions, unconfirmed bugs, discussions, ideas label Oct 21, 2019
@paulmelnikow
Member

Hi! Thanks for the report.

I haven't been able to reproduce this today on the Shields repos or any other repos that I've noticed.

Is this a public repo where you're seeing this happen?

@jcbcn
Author

jcbcn commented Oct 21, 2019

Yeah, repo is here: https://github.com/pitch-game/pitch-api

I do have 17 badges in my README, which may be the issue (maybe a few too many!). It's usually only happening on the test and coverage ones, rather than the three at the top.

I could move them into individual READMEs, but figured this could be a wider issue.

@paulmelnikow
Member

It should work, even with lots of badges…

I wonder if these are timing out because the upstream requests are slow. GitHub cuts things off at 4 seconds, so if the upstream requests take longer than that, the only thing we can do is put the right answer in the cache and then get it right on the next request (e.g. after refresh).
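For illustration, a minimal sketch of that serve-what-you-can, cache-for-next-time pattern. This is not the actual Shields code: the in-memory Map, the budget constant, and the helper names are all assumptions made up for the example.

```typescript
// Hypothetical sketch of the behaviour described above, not the Shields implementation.
// Assumes Node 18+ (built-in fetch); a Map stands in for the real badge cache.

const cache = new Map<string, string>(); // upstream URL -> last known badge value
const CAMO_BUDGET_MS = 4000;             // GitHub's image proxy gives up after ~4 s

async function badgeValue(upstreamUrl: string): Promise<string> {
  const cached = cache.get(upstreamUrl);

  // Always start the upstream request, so the cache gets refreshed even when
  // this particular response arrives too late to use.
  const refresh = fetch(upstreamUrl)
    .then(res => res.text())
    .then(value => {
      cache.set(upstreamUrl, value);
      return value;
    })
    .catch(() => cached ?? 'inaccessible');

  // Answer within the proxy's budget: use the fresh value if it arrives in
  // time, otherwise fall back to the cached value (or a placeholder) and let
  // the refresh finish in the background for the next request/refresh.
  const fallback = new Promise<string>(resolve =>
    setTimeout(() => resolve(cached ?? 'inaccessible'), CAMO_BUDGET_MS - 500)
  );

  return Promise.race([refresh, fallback]);
}
```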

@paulmelnikow
Member

By the way, these are loading for me:

[screenshot taken 2019-10-21 at 7:28 PM]

@jcbcn
Author

jcbcn commented Oct 22, 2019

I'm starting to see this on first load of my frontend repo as well. I've just loaded up the README again this evening and am seeing the same behaviour:

[screenshot]

It seems like it's slowly creeping back in. After #1568 was fixed I hadn't seen a broken badge for ages, but in the past week they seem to have become more regular. I wonder if the IP ranges the cache was using have changed?

However, you can see each request was taking ~8 seconds, so maybe it's as you said above and the upstream request is timing out.

@calebcartwright
Member

The other thing with both the Azure DevOps test and coverage badges is that we have to make two API calls to AzDO to get the actual data: the first to get the build id of the latest build, and the second to get the respective data for that build.
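For illustration, a rough sketch of that two-hop lookup. This is not the Shields service code; the endpoint paths, api-versions, and response shapes are assumptions based on the public Azure DevOps REST API and should be checked against the docs.

```typescript
// Illustrative sketch of the two-hop AzDO lookup described above.
// Assumes Node 18+ (built-in fetch); endpoints and shapes are assumptions.

const AZDO = 'https://dev.azure.com';

async function latestBuildId(org: string, project: string, definitionId: string): Promise<number> {
  // Call 1: resolve "latest build" to a concrete build id.
  const url = `${AZDO}/${org}/${project}/_apis/build/builds?definitions=${definitionId}&$top=1&api-version=5.1`;
  const res = await fetch(url);
  const body = (await res.json()) as { count: number; value: Array<{ id: number }> };
  if (body.count === 0) throw new Error('pipeline has no builds');
  return body.value[0].id;
}

async function coverageForBuild(org: string, project: string, buildId: number): Promise<number> {
  // Call 2: fetch the data for that specific build (coverage, in this example).
  const url = `${AZDO}/${org}/${project}/_apis/test/codecoverage?buildId=${buildId}&api-version=5.1-preview.1`;
  const res = await fetch(url);
  const body = (await res.json()) as {
    coverageData: Array<{ coverageStats: Array<{ covered: number; total: number }> }>;
  };
  const stats = body.coverageData[0]?.coverageStats[0];
  if (!stats) throw new Error('no coverage reported for this build');
  return (100 * stats.covered) / stats.total;
}

// Both hops must finish before the badge can render, so each one's latency
// counts against the ~4 s proxy budget mentioned earlier in the thread.
async function coverageBadgeValue(org: string, project: string, definitionId: string) {
  const buildId = await latestBuildId(org, project, definitionId);
  return coverageForBuild(org, project, buildId);
}
```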

@jcbcn
Author

jcbcn commented Oct 22, 2019

Ah, I see. The strange thing is that this had been working flawlessly ever since #1568; it's only in the last week that I've noticed it happening again. Obviously I'm not sure what's degraded, but because it had been fine for a while, I assumed it might be the caching again.

@platan
Member

platan commented Oct 22, 2019

@calebcartwright
Member

I believe @paulmelnikow ran a deploy at some point today (not sure when), but if the deployment happened during that same window, that could potentially explain the spikes.

@paulmelnikow paulmelnikow added operations Hosting, monitoring, and reliability for the production badge servers and removed question Support questions, usage questions, unconfirmed bugs, discussions, ideas labels Oct 22, 2019
@jcbcn
Author

jcbcn commented Oct 23, 2019

Seems to be fine since the rollback. Thanks again for looking into this. Much appreciated

EDIT: Spoke too soon. It seems to be happening again right now, if that helps to correlate with your logs

@paulmelnikow
Member

To clarify, I think @jcbcn's initial report may be related to a capacity issue, or else due to upstream services being slow.

While the symptom was the same, the issue that happened yesterday was probably unrelated.

@jcbcn
Author

jcbcn commented Oct 24, 2019

Just to show another repo:
[screenshot]

@paulmelnikow
Member

Have a link?

@jcbcn
Author

jcbcn commented Oct 24, 2019

@paulmelnikow
Member

They loaded for me just now.

[screenshot taken 2019-10-24 at 10:46 PM]

Seems like this is happening intermittently. I'm curious how much of this is related to Azure DevOps and how much is intermittent server issues.

@calebcartwright
Member

Other than that short window after the last deploy, I haven't noticed anything either (including various Azure DevOps badges that are loading fine for me 🤷‍♂)

Either way though, it's probably worth revisiting the experiments we discussed in #3874 to update our prod environment

@jcbcn
Author

jcbcn commented Dec 5, 2019

This looks to be resolved now. Thanks

@paulmelnikow
Member

Thanks! Please open a new issue if this recurs.

@badges badges locked as resolved and limited conversation to collaborators Dec 5, 2019