Hex badges show up as "invalid" #1285

Closed
eproxus opened this issue Nov 27, 2017 · 19 comments
Labels
operations Hosting, monitoring, and reliability for the production badge servers service-badge New or updated service badge

Comments

@eproxus

eproxus commented Nov 27, 2017

See examples on http://shields.io or in https://github.com/eproxus/meck/blob/master/README.md

@paulmelnikow
Member

Confirmed, these are all showing invalid.

  • https://img.shields.io/hexpm/dw/plug.svg
  • https://img.shields.io/hexpm/v/plug.svg
  • https://img.shields.io/hexpm/v/meck.svg
  • https://img.shields.io/hexpm/l/meck.svg

Probably an API change. Would someone like to look into it? Here are the tests.

@paulmelnikow paulmelnikow added service-badge New or updated service badge question Support questions, usage questions, unconfirmed bugs, discussions, ideas labels Nov 27, 2017
@PyvesB
Member

PyvesB commented Nov 28, 2017

I'll have a look into this issue sometime this week.

@platan
Member

platan commented Nov 28, 2017

I ran Shields locally and all Hex.pm badges work correctly. The service tests pass as well.

@paulmelnikow
Member

Could you try the deployed commit too? It's possible, if unlikely, that it's been fixed since the deploy.

@platan
Member

platan commented Nov 28, 2017

It works with master (4b5bf03) and gh-pages (2fd5949).

@PyvesB
Member

PyvesB commented Nov 28, 2017

I checked the API response, and it seems to be consistent with the processing done in the code. Tests are working fine as well.

I also fired a server up with the currently deployed commit on my local machine (Node 8.9.1). Hex badges are generated as expected:
[Image: screenshot from 2017-11-28 19-55-40]

Therefore I'm unsure why we are getting such errors on these badges. After a quick look into the hexpm repository, throttling/address blocking seems to be implemented on their side. Could we be hitting the rate limitations and trying to parse bogus responses, leading to "invalid" badges? Trying to make an API request directly from the production server may help us out here.

@ericmj

ericmj commented Nov 28, 2017

If you provide a list of the IPs that your production servers use, I can check whether they have hit the rate limiting on Hex.pm.

@paulmelnikow
Member

Making a request to the production servers is a good idea. I don't have access yet, however.

Here are the three IPs:

$ host s0.shields-server.com
s0.shields-server.com is an alias for vps71670.vps.ovh.ca.
vps71670.vps.ovh.ca has address 192.99.59.72
$ host s1.shields-server.com
s1.shields-server.com is an alias for vps244529.ovh.net.
vps244529.ovh.net has address 51.254.114.150
$ host s2.shields-server.com
s2.shields-server.com is an alias for vps117870.vps.ovh.ca.
vps117870.vps.ovh.ca has address 149.56.96.133

@paulmelnikow paulmelnikow added operations Hosting, monitoring, and reliability for the production badge servers and removed question Support questions, usage questions, unconfirmed bugs, discussions, ideas labels Nov 29, 2017
@ericmj

ericmj commented Nov 29, 2017

All of these IP addresses have been blocked because they consistently exceeded 100 requests/min to the Hex.pm API. We can unblock them but my guess is that they will hit the rate limiting again.
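One way a client could stay under a limit like this is to throttle itself before calling the API. The following is a minimal sketch of a sliding-window limiter, assuming the limit is counted per rolling 60 seconds; how Hex.pm actually counts requests is not stated in this thread, and the class name is hypothetical.

```python
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `limit` calls in any rolling `window` seconds."""

    def __init__(self, limit=100, window=60.0):
        self.limit = limit
        self.window = window
        self._stamps = deque()  # timestamps of accepted calls, oldest first

    def allow(self, now):
        # Forget accepted calls that have fallen out of the window.
        while self._stamps and now - self._stamps[0] >= self.window:
            self._stamps.popleft()
        if len(self._stamps) < self.limit:
            self._stamps.append(now)
            return True
        return False


limiter = SlidingWindowLimiter(limit=100, window=60.0)
# 150 requests spread evenly over 30 seconds: only the first 100 are accepted.
accepted = sum(limiter.allow(now=i * 0.2) for i in range(150))
```

A caller that gets `False` would serve a cached or placeholder badge instead of making the upstream request.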

I have suggested before that shields.io should make conditional HTTP requests and collapse duplicate requests. As an example, when I load http://shields.io/, 3 individual API requests are made, one for each Hex.pm plug package badge. Why are these requests not collapsed into a single request, and why are the cache-control, etag, and last-modified headers ignored?
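For readers unfamiliar with conditional requests, here is a minimal sketch of the ETag variant. It is not Shields code: `ConditionalClient`, the `fetch` callable, and `fake_hex_api` are hypothetical stand-ins so the example runs without a network, and the response body is made up.

```python
class ConditionalClient:
    """Remember each URL's ETag and revalidate with If-None-Match."""

    def __init__(self, fetch):
        # fetch(url, headers) -> (status, response_headers, body)
        self._fetch = fetch
        self._cache = {}  # url -> (etag, body)

    def get(self, url):
        headers = {}
        cached = self._cache.get(url)
        if cached is not None:
            headers["If-None-Match"] = cached[0]
        status, resp_headers, body = self._fetch(url, headers)
        if status == 304 and cached is not None:
            return cached[1]  # not modified: reuse the stored body
        if status == 200:
            etag = resp_headers.get("ETag")
            if etag is not None:
                self._cache[url] = (etag, body)
        return body


statuses = []  # status codes returned by the fake server, in order


def fake_hex_api(url, headers):
    # Stand-in for the upstream API; the ETag value and body are invented.
    etag = '"v1"'
    if headers.get("If-None-Match") == etag:
        statuses.append(304)
        return 304, {"ETag": etag}, None
    statuses.append(200)
    return 200, {"ETag": etag}, '{"name": "plug"}'


client = ConditionalClient(fake_hex_api)
first = client.get("https://hex.pm/api/packages/plug")
second = client.get("https://hex.pm/api/packages/plug")
```

The second call still reaches the server, but the 304 response carries no body. Whether such a response counts against the Hex.pm rate limit is not stated in this thread.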

And that is ignoring the hundreds of other badges that are loaded from different services. Refreshing http://shields.io/ is a great way to have your own little DoS service.

If shields will not improve its caching I guess we have to build a special endpoint that is cheaper and only returns the data you need and that we don't have to rate limit. If you let us know what endpoints you hit on the Hex.pm API and the fields you need we can build this endpoint for you.

@paulmelnikow
Member

I’m happy to discuss solutions. I joined the project several months ago so I wasn't part of the previous discussion. The Shields servers serve about 10k requests per minute, and while I don’t have per-service stats, I’m not surprised that the servers could at times make more than 100 req/min to Hex.pm.

The caching in Shields is based on the request. That means subsequent requests for the same badge will be cached for a while, though requests for different badges (e.g. license vs. version vs. downloads) will not. So the home page probably is not the problem. Once those badges have generated once, they will not make new requests until they are invalidated.

It would be nice to add caching for the service requests! Since a lot of projects will display multiple badges which pull the same data, we could save a sizable number of requests this way. I could see implementing it as part of the service rewrite.
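A rough sketch of what caching at the service-request level could look like: the cache is keyed by package rather than by badge URL, so the version, license, and downloads badges share one upstream response. Everything here is hypothetical (`UpstreamCache`, the badge helpers, the TTL, and the sample payload); it is not how Shields is implemented.

```python
import time


class UpstreamCache:
    """Cache upstream package data by package name with a time-to-live."""

    def __init__(self, fetch_package, ttl=300.0, clock=time.monotonic):
        self._fetch_package = fetch_package
        self._ttl = ttl
        self._clock = clock
        self._entries = {}  # package name -> (expires_at, data)

    def get(self, name):
        now = self._clock()
        entry = self._entries.get(name)
        if entry is not None and entry[0] > now:
            return entry[1]
        data = self._fetch_package(name)
        self._entries[name] = (now + self._ttl, data)
        return data


def version_badge(cache, name):
    return "v" + cache.get(name)["releases"][0]["version"]


def license_badge(cache, name):
    return ", ".join(cache.get(name)["meta"]["licenses"])


def weekly_downloads_badge(cache, name):
    return "%d/week" % cache.get(name)["downloads"]["week"]


upstream_calls = []


def fake_fetch_package(name):
    # Stand-in for the API call; the values below are made up.
    upstream_calls.append(name)
    return {
        "releases": [{"version": "1.0.0"}],
        "meta": {"licenses": ["Apache 2.0"]},
        "downloads": {"week": 10},
    }


cache = UpstreamCache(fake_fetch_package, ttl=300.0)
badges = [
    version_badge(cache, "plug"),
    license_badge(cache, "plug"),
    weekly_downloads_badge(cache, "plug"),
]
```

Three badges are rendered from a single upstream call; with request-keyed caching alone, each badge would trigger its own.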

Again I don't have exact numbers, but my impression is that Shields gets by on a tiny hosting budget, relying on optimized code, and avoiding any compute-intensive work. See this conversation on Twitter. The in-memory cache is size-limited to avoid OOM conditions, so I’m guessing to add a more sizable cache we’d need to add some hosting budget and back it with Redis or Memcache.
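To illustrate what a size-limited in-memory cache means in practice, here is a small sketch that evicts the least recently used entry once a fixed entry count is exceeded. The eviction policy Shields actually uses is not stated in this thread, so LRU is an assumption and `LRUCache` is a hypothetical name.

```python
from collections import OrderedDict


class LRUCache:
    """Keep at most `max_entries` items, evicting the least recently used."""

    def __init__(self, max_entries):
        self.max_entries = max_entries
        self._items = OrderedDict()

    def get(self, key, default=None):
        if key not in self._items:
            return default
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.max_entries:
            self._items.popitem(last=False)  # drop the oldest entry


badge_cache = LRUCache(max_entries=2)
badge_cache.put("/hexpm/v/plug.svg", "<svg>version</svg>")
badge_cache.put("/hexpm/l/meck.svg", "<svg>license</svg>")
badge_cache.get("/hexpm/v/plug.svg")  # touch: the plug badge is now the newest
badge_cache.put("/hexpm/dw/plug.svg", "<svg>downloads</svg>")  # evicts the meck badge
```

The bound on entries caps memory use, at the cost of evicted badges triggering a fresh upstream request the next time they are asked for.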

The data the current badges use is:

  • downloads.week
  • downloads.day
  • downloads.all
  • releases[0].version
  • meta.licenses
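The field paths above can be read out of a package response along these lines. This is a sketch only: the payload shape is inferred from the paths in the list, the sample values are made up, and `extract_badge_fields` is a hypothetical helper.

```python
def extract_badge_fields(package):
    """Pull out the five fields the badges use, tolerating missing keys."""
    downloads = package.get("downloads", {})
    releases = package.get("releases", [])
    meta = package.get("meta", {})
    return {
        "downloads_week": downloads.get("week"),
        "downloads_day": downloads.get("day"),
        "downloads_all": downloads.get("all"),
        "version": releases[0]["version"] if releases else None,
        "licenses": meta.get("licenses", []),
    }


# Invented sample payload for illustration.
sample = {
    "downloads": {"week": 70, "day": 10, "all": 5000},
    "releases": [{"version": "1.2.3"}, {"version": "1.2.2"}],
    "meta": {"licenses": ["MIT"]},
}
fields = extract_badge_fields(sample)
```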

You could make a batch process that dumps this to a static file, which we could grab once an hour (or once a day) and keep in the cache.

Another thought… would it be possible to add caching behind your endpoint? What makes this request so expensive?

@paulmelnikow
Member

You might also consider issuing an API key for Shields, as some other services have done.

@platan
Member

platan commented Nov 29, 2017

Shields.io webpage currently displays about 320 badges. If it's visited frequently (more frequent requests from shields.io than from other pages) all badges should be in cache.
Referer stats from requests would give us answer about sources of traffic.

Do you know how much RAM the Shields servers have? I would like to compare the in-memory caching done by the Shields code with Varnish Cache https://varnish-cache.org/intro/

@ericmj

ericmj commented Nov 29, 2017

I have whitelisted the shields.io IPs so they should not be blocked in the future.

> The data the current badges use is:

Thanks for this. I will create an optimized endpoint that only returns the data that Shields uses, and I will let you know when it's live.

> Another thought… would it be possible to add caching behind your endpoint? What makes this request so expensive?

It's not super expensive, it's just that if we make a specific endpoint that ignores the rate limiting I feel like it should be optimized.

> You might also consider issuing an API key for Shields, as some other services have done.

Sounds like a good idea, this way we don't have to rely on whitelisting specific IPs.

> Shields.io webpage currently displays about 320 badges. If it's visited frequently (more frequent requests from shields.io than from other pages) all badges should be in cache.
> Referer stats from requests would give us answer about sources of traffic.

This is not what I am seeing, if I grep our access logs and reload http://shields.io I see 3 new requests every time.

@platan
Member

platan commented Nov 29, 2017

> This is not what I am seeing, if I grep our access logs and reload http://shields.io I see 3 new requests every time.

Thanks for this info! So they are not in shields cache (but we have 3 servers - 3 caches).

@paulmelnikow
Member

> This is not what I am seeing, if I grep our access logs and reload http://shields.io I see 3 new requests every time.

Is that still happening after the whitelisting?

As @platan aptly observes, we do have three servers, and they don't share a cache. The requests should trickle to zero, as each of the three accumulates the five badges in its cache.

There are five Hex.pm badges on the page, so if nothing were being cached I'd expect to see five.
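The "trickle to zero" expectation can be checked with a toy simulation. It assumes strict round-robin load balancing across the three servers and caches that never expire; neither assumption is confirmed in this thread, and `misses_per_reload` is a hypothetical helper.

```python
def misses_per_reload(servers=3, badges=5, reloads=5):
    """Count upstream requests per page reload with one cache per server."""
    caches = [set() for _ in range(servers)]
    counter = 0  # round-robin position, carried across reloads
    history = []
    for _ in range(reloads):
        misses = 0
        for badge in range(badges):
            cache = caches[counter % servers]
            counter += 1
            if badge not in cache:
                cache.add(badge)  # first time this server sees this badge
                misses += 1
        history.append(misses)
    return history


history = misses_per_reload()
# history == [5, 5, 5, 0, 0]
```

Under these assumptions the upstream request count is bounded by servers × badges = 15 in total, after which every reload is served from cache. A steady 3 requests per reload would therefore point to something other than cold per-server caches.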

I have whitelisted the shields.io IPs so they should not be blocked in the future.

Thank you!

@eproxus
Author

eproxus commented Jun 26, 2018

@paulmelnikow @platan (cc @ericmj) The badges are still showing up as invalid. I think it hasn't really changed since last year. This affects all Erlang and Elixir projects that are using Shields.io for Hex.pm 🙁

@ericmj

ericmj commented Jun 26, 2018

It was definitely fixed last year, and they work for me when I go to shields.io, although the badges load slowly. Which badges are invalid for you?

@eproxus
Author

eproxus commented Jun 26, 2018

Badges on https://github.com/eproxus/meck/blob/master/README.md

Looking at the requests, it seems that requests to camo.githubusercontent.com time out:

[Error] Failed to load resource: the server responded with a status of 504 (Gateway Timeout) (68747470733a2f2f696d672e736869656c64732e696f2f686578706d2f762f6d65636b2e7376673f7374796c653d666c61742d737175617265, line 0)

Looking further, GitHub generates the following HTML for the README:

<a href="https://hex.pm/packages/meck" rel="nofollow">
    <img src="https://camo.githubusercontent.com/cc57adb0caefa2a016a54b863eded43f96dcd269/68747470733a2f2f696d672e736869656c64732e696f2f686578706d2f762f6d65636b2e7376673f7374796c653d666c61742d737175617265" 
         alt="Hex.pm Version"
         data-canonical-src="https://img.shields.io/hexpm/v/meck.svg?style=flat-square"
         style="max-width:100%;">
</a>
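As a side note, the last path segment of the camo URL in the HTML above is the hex-encoded original image URL, which is why the same long string shows up in the 504 error message. A short sketch of decoding it (the first path segment is left uninterpreted here):

```python
camo_url = (
    "https://camo.githubusercontent.com/"
    "cc57adb0caefa2a016a54b863eded43f96dcd269/"
    "68747470733a2f2f696d672e736869656c64732e696f2f686578706d2f762f6d65636b2e7376673f7374796c653d666c61742d737175617265"
)

# The final path segment is the proxied URL, hex-encoded byte by byte.
hex_part = camo_url.rsplit("/", 1)[1]
original_url = bytes.fromhex(hex_part).decode("utf-8")
# original_url == "https://img.shields.io/hexpm/v/meck.svg?style=flat-square"
```

The decoded value matches the `data-canonical-src` attribute, so the timing-out resource is the proxied Shields badge.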

Accessing https://img.shields.io/hexpm/v/meck.svg?style=flat-square actually works, but I think the issue is that it is too slow (13.69 s) so probably GitHub's in-between caching layer breaks. A request to https://hex.pm/api/packages/meck is very fast, 137 ms (I assume this is the endpoint Shields.io is using).

Looks like the issue is the request speed to img.shields.io.

@eproxus
Author

eproxus commented Jun 26, 2018

Seems to be the case of #1568.

Closing this since it was just related to Hex.pm, which is now no longer the problem.
