Allow configuring caching upon request #534

Closed
roji opened this issue Sep 14, 2015 · 14 comments

@roji

roji commented Sep 14, 2015

It seems that shields.io configures caching headers to cache for very little time, if at all. The result is that if a site has shields on multiple pages, navigating between them provides a poor experience as shields are downloaded again and again.

How about allowing websites to configure how long badges should be cached via a parameter (like link, logoWidth...)?

@espadrine
Member

That is a great idea! And at the right time, too; I am currently pondering a lot of ideas related to caching, none as good.

What should we name the parameter? I would imagine it to directly be how long a Cache-Control max-age should be set to, for browsers to remember. It would then be a number of seconds.

@gdamore

gdamore commented Nov 3, 2015

What about properly using ETags? A simple checksum of the input parameters would suffice, I'd think.

@espadrine
Member

ETags have no say on cache invalidation. The only way for shields.io to know if the ETag has changed is to call the vendor and perform the computation, which is what takes time. The image download in itself is tiny.

@roji
Author

roji commented Nov 10, 2015

Note that what I had in mind wasn't for shields.io to cache anything, via etags or otherwise. It's to have an option for it to return caching headers that would make client browsers cache badges for a configurable amount of time.

The goal is simply to avoid going to shields.io on every page which contains the badges. I, as the maintainer of a site which uses shields.io, could decide to ask shields.io to add HTTP response headers that would make client browsers cache for, say, 2 minutes or whatever.

@espadrine
Member

@roji I think we're on the same page. Would setting a Cache-Control max-age be suitable? Maybe via ?maxAge=…? I'm not sure if maxAge is very explicit in what it does, maybe browserCacheSec=…?
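A minimal sketch of how such a parameter could map onto the header, assuming a Node.js server. Only the `maxAge` parameter name comes from this discussion; the function name `cacheControlFromUrl` and the `no-cache` fallback are hypothetical, not the shields.io implementation.

```javascript
// Hypothetical helper: derive a Cache-Control value from a badge URL.
// Only the `maxAge` parameter name comes from this thread; the rest is assumed.
function cacheControlFromUrl(requestUrl, fallback = 'no-cache') {
  // A base is needed because request URLs are usually relative paths.
  const { searchParams } = new URL(requestUrl, 'http://localhost');
  const raw = searchParams.get('maxAge');

  // Accept only non-negative whole numbers of seconds.
  if (raw !== null && /^\d+$/.test(raw)) {
    return 'max-age=' + Number(raw);
  }
  return fallback;
}

console.log(cacheControlFromUrl('/gitter/room/nwjs/nw.js.svg?maxAge=50'));
// prints "max-age=50"
```

Rejecting anything that is not a plain integer keeps arbitrary query text out of the response header.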

@roji
Author

roji commented Nov 13, 2015

@espadrine, I think setting Cache-Control's max-age would be the perfect solution here. It's well-defined: it allows browsers to cache for the given number of seconds without checking anything with the server.

As you mentioned above, ETags would be useless here because the idea is to eliminate the server request entirely, whereas ETags allow the elimination of the response payload once the cache expires (via max-age). However, since in the shields.io case the payload is negligible this doesn't make much sense.
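To illustrate the difference, here is a simplified model of the decision a browser cache makes. It is a sketch, not real browser code; the `entry` record shape and the function name `nextAction` are made up for this example.

```javascript
// Illustrative model of a browser cache decision (not real browser code).
// `entry` is a hypothetical record: { storedAtMs, maxAgeSec, etag }.
function nextAction(entry, nowMs) {
  if (!entry) {
    return { action: 'fetch' }; // nothing cached: full request
  }
  const ageSec = (nowMs - entry.storedAtMs) / 1000;
  if (ageSec < entry.maxAgeSec) {
    return { action: 'use-cache' }; // fresh: no request is sent at all
  }
  if (entry.etag) {
    // Stale: a request is still sent, but a 304 reply can skip the body.
    return { action: 'revalidate', headers: { 'If-None-Match': entry.etag } };
  }
  return { action: 'fetch' };
}

const entry = { storedAtMs: 0, maxAgeSec: 120, etag: '"abc"' };
console.log(nextAction(entry, 60 * 1000).action);  // prints "use-cache"
console.log(nextAction(entry, 180 * 1000).action); // prints "revalidate"
```

Only the `use-cache` branch avoids the round trip to the server, which is the part that matters for badges since the payload is tiny.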

To make things generic, I'd have shields.io accept a cache-control query parameter; this way users can even use the other features of Cache-Control if it makes sense for them...

@espadrine
Member

@roji Thanks for your suggestion! It should be live now.

@roji
Author

roji commented Apr 10, 2016

Thanks @espadrine, but something a bit odd is going on: no matter what I specify in the maxAge URL parameter I always get max-age=7200 in the response headers (https://img.shields.io/teamcity/http/build.npgsql.org/s/npgsql.svg?label=TeamCity&style=plastic&maxAge=50).

Also don't forget to update the docs at the bottom of shields.io.

Thanks for giving this your attention!

@roji
Author

roji commented May 5, 2016

@espadrine, any response on this bug? Whatever I specify maxAge to be, I always get headers specifying 2 hours...

@espadrine
Member

It seems like a CloudFlare bug. You can set it to be higher than 2 hours; just not lower. On the plus side, it's just browser caching, so Ctrl+Shift+R forces an update.

@roji
Author

roji commented May 5, 2016

OK, thanks for looking into it!

@espadrine
Member

espadrine commented May 5, 2016

Sent to CloudFlare's support:

Hi, and thanks for the amazing work you are doing on all fronts!

My server can send a Cache-Control max-age below two hours, but when it does, the client always receives a max-age of exactly two hours. (When the max-age is above two hours, it is transmitted correctly to the client.)

My Browser Cache Expiration setting is at two hours, but I have a Page Rule set to bypass cache. In theory, I believe it should let my Cache-Control HTTP header through unchanged. Is it a bug?

Here is a comparison:

$ curl -vk 'https://51.254.114.150/gitter/room/nwjs/nw.js.svg?maxAge=50' 2>&1 | grep 'Cache-Control'
< Cache-Control: max-age=50
$ curl -vk 'https://img.shields.io/gitter/room/nwjs/nw.js.svg?maxAge=50' 2>&1 | grep 'Cache-Control'
< Cache-Control: public, max-age=7200

Thanks again!

espadrine added a commit that referenced this issue May 14, 2016
Initially motivated by a CloudFlare cache error (#534).
@espadrine
Member

Hi,

Thank you for contacting CloudFlare support.

What do you currently have set at your origin server for .svg Cache-Control headers? Let me know and we can investigate this behavior together. Thank you.


Hi!

> What do you currently have set at your origin server for .svg Cache-Control headers? Let me know and we can investigate this behavior together.

It depends on the parameter ?maxAge (it is set here:

ask.res.setHeader('Cache-Control', 'max-age=' + data.maxAge);
).

You can detect that the parameter returns the correct value by
executing the following command:

curl -vk 'https://51.254.114.150/gitter/room/nwjs/nw.js.svg?maxAge=50' 2>&1 | grep 'Cache-Control'

(Note that I fixed a small issue that was present in the past few days
which intermittently made it return no Cache-Control headers.)

This command bypasses CloudFlare by addressing the IP directly. You
should notice that the Cache-Control header is set to the maxAge
parameter.

The following command goes through CloudFlare:

curl -vk 'https://img.shields.io/gitter/room/nwjs/nw.js.svg?maxAge=50' 2>&1 | grep 'Cache-Control'

The Cache-Control header, which should be set to the maxAge parameter,
is set to 2h instead. Note that this only happens if the maxAge
parameter is set to below 2h.

Thanks for looking into it!


Hi there,

That's odd! We should respect any cache-control header that is sent to us. I notice that without the query string maxAge=50 in the URL, the max-age=50 isn't set. Is it possible for you to set that within the headers for the asset as a whole? I would be interested to see if that helps at all.


> That's odd! We should respect any cache-control header that is sent to us. I notice that without the query string maxAge=50 in the URL, the max-age=50 isn't set.

That is by design, yes. Instead, it sends the following header:

Cache-Control: no-cache, no-store, must-revalidate

> Is it possible for you to set that within the headers for the asset as a whole? I would be interested to see if that helps at all.

I find what you ask of me unclear. Upon the reception of a request by
the server, a number of rules are followed. How the max-age header is
set is not an on/off switch. Furthermore, the number of assets is not
finite.

Unfortunately, GitHub (where a lot of users put their badges) caches
images in a way that doesn't (or at least, didn't a year ago) respect
the Cache-Control header well. It understood no-cache, but if there
was a max-age, users were fed outdated images long after the cache max
age.

It is not an issue that we currently have with the gitter image, as it
never changes. I changed it to always send a max-age=50 header; is
that satisfactory?

curl -vk 'https://51.254.114.150/gitter/room/nwjs/nw.js.svg' 2>&1 | grep 'Cache-Control'
#> < Cache-Control: max-age=50
curl -vk 'https://51.254.114.150/gitter/room/nwjs/nw.js.svg?maxAge=70' 2>&1 | grep 'Cache-Control'
#> < Cache-Control: max-age=50

Hello there,

Thanks for that information. I have checked again, though, and I see that the Expires header doesn't reflect the extra time either, so the Expires header is the same as the current time. Here's a test I did that shows this:

curl -vso /dev/null 'https://51.254.114.150/gitter/room/nwjs/nw.js.svg?maxAge=70' -k ; date

* Trying 51.254.114.150...
* Connected to 51.254.114.150 (51.254.114.150) port 443 (#0)
* TLS 1.2 connection using TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
* Server certificate: img.shields.io
> GET /gitter/room/nwjs/nw.js.svg?maxAge=70 HTTP/1.1
> Host: 51.254.114.150
> User-Agent: curl/7.43.0
> Accept: */*
>
< HTTP/1.1 200 OK
< Cache-Control: max-age=50
< Expires: Fri, 13 May 2016 14:23:50 GMT
< Date: Fri, 13 May 2016 14:23:50 GMT
< Content-Type: image/svg+xml;charset=utf-8
< Connection: keep-alive
< Transfer-Encoding: chunked
<
{ [3 bytes data]
* Connection #0 to host 51.254.114.150 left intact
Fri May 13 15:23:51 BST 2016

Because of the Expires header there matching the current time, we won't cache the file at all, and apply the default browser Cache-Control header (which is 2 hours). If you can adjust the Expires header to expire in 50 seconds as well, this should work as you need it to. If you have any more questions, simply reply to this email and we will be happy to help.

Hi!

Thanks a lot for your answer.

> Because of the Expires header there matching the current time, we won't cache the file at all

My understanding of HTTP headers leads me to believe that this
behavior goes against RFC 7234 section 4.2.1:

> A cache can calculate the freshness lifetime (denoted as
> freshness_lifetime) of a response by using the first match of the
> following:
>
> o  If the cache is shared and the s-maxage response directive
>    (Section 5.2.2.9) is present, use its value, or
> o  If the max-age response directive (Section 5.2.2.8) is present,
>    use its value, or
> o  If the Expires response header field (Section 5.3) is present, use
>    its value minus the value of the Date response header field, or
> o  Otherwise, no explicit expiration time is present in the response.

https://tools.ietf.org/html/rfc7234#section-4.2.1

Is my understanding inaccurate?
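The precedence quoted from RFC 7234 section 4.2.1 can be sketched as follows. This is a simplified illustration: the function name and the lower-case header object are assumptions, and quoted directive values and malformed dates are not handled the way a full cache would handle them.

```javascript
// Sketch of the RFC 7234 section 4.2.1 precedence quoted above.
// `headers` is a plain object with lower-case header names (an assumption of
// this sketch); returns seconds, or null when there is no explicit lifetime.
function freshnessLifetime(headers, isSharedCache) {
  const directives = {};
  for (const part of (headers['cache-control'] || '').split(',')) {
    const [name, value] = part.trim().split('=');
    if (name) directives[name.toLowerCase()] = value;
  }

  // 1. s-maxage, but only for shared caches such as a CDN.
  if (isSharedCache && /^\d+$/.test(directives['s-maxage'] || '')) {
    return Number(directives['s-maxage']);
  }
  // 2. max-age.
  if (/^\d+$/.test(directives['max-age'] || '')) {
    return Number(directives['max-age']);
  }
  // 3. Expires minus Date.
  if (headers.expires && headers.date) {
    const expires = Date.parse(headers.expires);
    const date = Date.parse(headers.date);
    if (!Number.isNaN(expires) && !Number.isNaN(date)) {
      return (expires - date) / 1000;
    }
  }
  // 4. No explicit expiration time.
  return null;
}

// Headers from the support engineer's test run earlier in this thread.
console.log(freshnessLifetime({
  'cache-control': 'max-age=50',
  expires: 'Fri, 13 May 2016 14:23:50 GMT',
  date: 'Fri, 13 May 2016 14:23:50 GMT',
}, true));
// prints 50
```

Under this reading, max-age=50 wins over an Expires value equal to Date, because max-age is matched first.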

For the purpose of debugging, I followed your advice by creating a
badge that sends Expires headers set 60 seconds into the future,
agreeing with its Cache-Control header. Here is an example run:

# Direct access
curl -vk 'https://51.254.114.150/flip.svg' 2>&1 | grep 'Expires\|Date\|Cache-Control'
# < Cache-Control: max-age=60
# < Expires: Sat, 14 May 2016 13:25:00 GMT
# < Date: Sat, 14 May 2016 13:24:00 GMT

# CloudFlare access
curl -vk 'https://img.shields.io/flip.svg' 2>&1 | grep 'Expires\|Date\|Cache-Control'
# < Date: Sat, 14 May 2016 13:24:00 GMT
# < Cache-Control: public, max-age=7200
# < Expires: Sat, 14 May 2016 15:24:00 GMT

As you can tell, CloudFlare still returns incorrect results, 2 hours
into the future instead of a minute.
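For reference, a minimal sketch of how an endpoint could keep Expires in agreement with max-age, as in the direct-access run above. The helper name `cachingHeaders` is hypothetical and this is not the code behind the debugging badge.

```javascript
// Hypothetical helper: build caching headers whose Expires value agrees with
// max-age. `now` is injectable so the output is reproducible.
function cachingHeaders(maxAgeSec, now = new Date()) {
  const expires = new Date(now.getTime() + maxAgeSec * 1000);
  return {
    'Cache-Control': 'max-age=' + maxAgeSec,
    // toUTCString() yields the HTTP date format used by Expires and Date.
    Expires: expires.toUTCString(),
    Date: now.toUTCString(),
  };
}

const headers = cachingHeaders(60, new Date(Date.UTC(2016, 4, 14, 13, 24, 0)));
console.log(headers.Expires);
// prints "Sat, 14 May 2016 13:25:00 GMT"
```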

That being said, I noticed that the image, which I coded to flip
between two values for debugging purposes, did not do so. I realized
that I could extend the CloudFlare cache bypass rule I had to all my
endpoints with a wildcard, which makes CloudFlare send headers
unchanged (and prevents CloudFlare caching). That workaround fixes the
issue for me. I still believe there is a bug in CloudFlare as I detail
above, but it no longer applies to me.

If you'd like me to remove the wildcard to let you perform further
debugging, I will happily do so.
If that bug gets fixed, I would very much welcome a notification by mail.

@espadrine
Member

Hello there,

That is very odd! I will check what you mention about the order of precedence of caching headers. We should just use the standard nginx behaviour I believe. If you could disable the page rule you have there, that would be great. If there's another page that you would prefer we use for testing, that's good as well!


Hello there,

I suspect there's some confusion here. Your settings have a Browser cache expiry of 2 hours (7200s), so it's normal that we present this value to customers. That's the value you see when you cURL the domain. Anything that I'm missing? Or were you trying to assess how often we refresh our cache?


Hi!

> I suspect there's some confusion here. Your settings have a Browser cache expiry of 2 hours (7200s), so it's normal that we present this value to customers. That's the value you see when you cURL the domain. Anything that I'm missing? Or were you trying to assess how often we refresh our cache?

Clearly, CloudFlare follows the HTTP rules for caching most of the
time. As a result, CloudFlare does not do what you say it does here:

curl -vk 'https://img.shields.io/gitter/room/nwjs/nw.js.svg?maxAge=8000' 2>&1 | grep Cache-Control
# < Cache-Control: public, max-age=8000
# (8000, not 7200, as per the HTTP spec.)

When the Cache-Control HTTP header that my server yields has a max-age
above 7200, CloudFlare follows the HTTP spec. Below that, it no longer
follows the spec, and always sends a Cache-Control header with a
max-age at 7200.

curl -vk 'https://img.shields.io/gitter/room/nwjs/nw.js.svg?maxAge=60' 2>&1 | grep Cache-Control
# < Cache-Control: public, max-age=7200

HTTP asks caching proxies to respect the Cache-Control header.
CloudFlare does so most of the time, but there is an edge-case
(max-age below 7200) where it does not. That is the bug.

Relying on the CloudFlare browser cache expiration setting for
responses without a Cache-Control makes sense, but it is surprising
for cases where the server explicitly has a Cache-Control header for
CloudFlare to see.


Hello there,

Let me follow-up on your previous answer:

> HTTP asks caching proxies to respect the Cache-Control header. CloudFlare does so most of the time, but there is an edge-case (max-age below 7200) where it does not. That is the bug.

It's not a bug, this is how the system is designed:

  • when a Cache-control header instructs us not to cache, we don't cache.
  • when no Cache-control header is returned, or it contains a max-age lower than the browser cache expiry, we return a max-age equal to the browser cache expiry (it doesn't make sense to instruct our edge to query your origin more often than your visitor's browser).
  • when a max-age parameter above the browser expiry is returned, we forward that one.

I hope this clarifies the situation, but we're happy to answer your questions further, should you have any.
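The three rules above can be modelled roughly as follows. This is an illustration of the behaviour as described by support, not CloudFlare's code: the function name is made up, the `public, ` prefix is copied from the curl output earlier in this thread, and passing a no-cache header through unchanged is an assumption.

```javascript
// Model of the three rules described by CloudFlare support above.
// Not CloudFlare's code; names and the pass-through in rule 1 are assumptions.
function edgeCacheControl(originCacheControl, browserCacheTtlSec = 7200) {
  const value = (originCacheControl || '').toLowerCase();

  // Rule 1: an instruction not to cache is respected (assumed pass-through).
  if (/\b(no-cache|no-store)\b/.test(value)) {
    return originCacheControl;
  }
  const match = /(?:^|[\s,])max-age=(\d+)/.exec(value);
  const originMaxAge = match ? Number(match[1]) : null;

  // Rule 2: missing or shorter than the browser cache expiry: use the expiry.
  if (originMaxAge === null || originMaxAge < browserCacheTtlSec) {
    return 'public, max-age=' + browserCacheTtlSec;
  }
  // Rule 3: longer values are forwarded.
  return 'public, max-age=' + originMaxAge;
}

console.log(edgeCacheControl('max-age=50'));   // prints "public, max-age=7200"
console.log(edgeCacheControl('max-age=8000')); // prints "public, max-age=8000"
```

In effect the setting acts as a floor: the client sees the larger of the origin's max-age and the browser cache expiry, which matches both curl results reported in this thread.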
