serve-expired does not adhere to secure-by-default principle #1175
Hi Pim,
When implementing RFC 8767 we purposely left the … Serving expired answers is not ideal, because the upstream explicitly communicates the TTL of the original record. Serving expired answers is mostly for controlled environments or big installations where the expired answers are used to push the QPS up. For regular resolving I would advise turning it off and instead using …
Hey Yorgos, yeah, this was definitely my own mistake for just using 'some' Docker container and trusting that the container was using a sane/secure configuration, without taking a good look at its configuration myself. So the only one to blame for that is me 😿 One other thought I had about …
On the Docker topic, I hear good stories about the one maintained by @madnuttah.
A (semi-)official recommendation would be welcome too; will look at that Docker image, thanks! I'm currently using https://hub.docker.com/r/klutchell/unbound (see also the issue mentioned above) because his build/config seemed OK as well, and that image has 10M+ pulls on Docker Hub as opposed to 50K+ for the image by @madnuttah. I remember looking at https://hub.docker.com/r/mvance/unbound too, but if I recall correctly, klutchell's image is very small/distroless while mvance's isn't.
There are such numbers! Look for … But I think you are holding it wrong :)
If you're going into the config to set `serve-expired` to yes, it doesn't take much to change the `serve-expired-ttl` at the same time whilst in the config? I don't see this as a "real" issue, as such.
@AlexanderBand FWIW, madnuttah also sets …

@gthess Thanks for the tip about …

@Dynamic5912 That's not the point; you make the assumption that people already know that they have to configure …
If people are configuring Unbound they should have the knowledge and know-how to understand what they're configuring and enabling. |
Thank you for the heads-up! I'll fix that ASAP. As you've mentioned, the default config doesn't serve expired entries.
Wouldn't having the cache TTL as 0 and using prefetch negate any issues though? Prefetch would (should) update those records in the cache as they are technically expired?
I am not sure I follow, but prefetch and serve-expired trigger the same fetching logic. Prefetch does it when a currently non-expired reply is in the last 10% of the original TTL; serve-expired does it when the record is expired. Both try to update the cache. So if you only have prefetch enabled, then an expired record will not be used and will instead be resolved normally.
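The distinction described above can be sketched roughly as follows. This is a hypothetical helper for illustration, not Unbound's actual code; the 10% prefetch window is taken from the comment above:

```python
def should_refetch(age, original_ttl, prefetch=False, serve_expired=False):
    """Decide whether a cached record should trigger a new upstream fetch.

    age: seconds since the record was cached.
    original_ttl: TTL the upstream attached to the record.
    Illustrative sketch only, not Unbound's implementation.
    """
    expired = age >= original_ttl
    if serve_expired and expired:
        # Serve the stale record now, refresh in the background.
        return True
    if prefetch and not expired and age >= 0.9 * original_ttl:
        # Prefetch fires in the last 10% of the original TTL, before expiry.
        return True
    return False

# With only prefetch enabled, an expired record triggers no background
# refetch; the query is resolved normally instead.
```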
Is `serve-expired-ttl` still staying with a default value of "0" with this new change?

Good question.
Most people set a time of 1 day if you look at various forums/posts/blogs about setting up Unbound. So I think 1 day would be a good baseline default setting.
Agreed; secure-by-default also means staying on the safe side (of recommendations), which in this case is the lower value. That said, I still think that both adding an upper bound and taking the actual TTL value of the DNS record into account could be worthwhile (i.e. the above suggested …)
Also as mentioned above, the … I just don't think that many people are really aware of RFC-8767 and its implications for scalable cloud solutions. But those same people are probably aware of general potential DNS issues, so they likely already have set a reasonable TTL (but again, without realizing that RFC-8767 recommends a static value to cache for 1-3 days). This means that any scalable cloud-hosted application that wants to provide ultimate safety to its clients (which is basically a GDPR requirement) either needs to buy its own IP ranges, or it needs to block cloud resources for 3 days after scaling down (which seems unreasonable). To be fair, this also seems like a shortcoming/oversight in RFC-8767. So maybe this should be discussed with the authors of RFC-8767? Wdyt @gthess & @AlexanderBand?
I understand your concern but, to be fair, using serve-expired is altogether unreasonable. The upstream sends a TTL with the answer and clearly states when the record should not be used anymore. The upstream cannot account for serve-stale being used, because there lies madness. And if a resolver party points you to stale data, that is the resolver party's fault/optimization.

Both the static value and a dynamic one can lead to failures, as any assumptions on the original TTL are probably wrong; the only thing that the TTL conveys is the expiry time. For your factor proposal I am seeing it from a different angle: I would use a factor of the original TTL, like 80% for example. Then a cached record could be served for 180% of the original TTL, the last 80% being expired. I can't find a good value to suggest for either your proposal or mine because nothing makes sense to me :)

The serve-stale RFC made an effort to standardize the operational practice of serving expired records; these installations have the traffic to keep expired records fairly up-to-date. I am not sure if you are involved with the IETF, but if not, it is easy to start a discussion; the IETF is a pretty open space. You can contact the authors directly or create a new topic on the dnsop mailing list.
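For comparison, the two factor ideas floated in this thread could be sketched numerically. Both helpers are hypothetical; neither corresponds to an existing Unbound option:

```python
def stale_window_factor(original_ttl, factor):
    """Issue author's proposal (a hypothetical serve-expired-ttl-factor):
    serve stale data for factor * original TTL after expiry."""
    return factor * original_ttl

def stale_window_pct(original_ttl, pct=80):
    """The 80%-of-original-TTL variant from the comment above: total
    lifetime is 180% of the TTL, the last 80% served as expired data.
    Integer math keeps the result exact for whole-second TTLs."""
    return original_ttl * pct // 100

# A 60 s TTL: factor 10 gives 600 s of stale service,
# while the 80% variant gives 48 s.
```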
Many TTLs are 1 hour or less; some are even a matter of minutes or seconds. I highly doubt DNS records are changing that frequently, and it just causes unnecessary traffic on connections. Serve-expired is fine when used within the RFC guidelines, and an expiration of 1 day leaves little room for error/compromise compared to the indefinite "0" as currently used by default.
Not sure I read this correctly, but also given what you said before: do you/NLnet (strongly?) recommend disabling …
Although we might word things differently, it seems we've been almost in perfect sync about the issues. E.g. using 80% instead of a factor of 1.8 seems more of an implementation detail to me 😉 The main reason I was thinking in factors was that my starting point was a 1-minute TTL, and I think that caching a 1-minute TTL for 5 minutes (factor 5) should probably be OK (but not for a TTL of 1 day).

No, I'm not involved with the IETF. So if you think some cache timer that takes the original TTL into account is a good idea, then I have no issue if you/NLnet take ownership, as y'all are probably much more into the nitty-gritty details that come with DNS RFCs than I am 🙏
* nlnet/master: (26 commits)
  - For NLnetLabs#1175, update serve-expired tests.
  - Fix NLnetLabs#1175: serve-expired does not adhere to secure-by-default principle. The default value of serve-expired-client-timeout is set to 1800 as suggested by RFC8767.
  - Fix comparison to help static analyzer.
  - Merge NLnetLabs#1169 from Sergey Kacheev: lock-free counters for auth_zone up/down queries.
  - Fix NLnetLabs#1183: the data being used is released in method nsec3_hash_test_entry; release nsec3 hashes per test file.
  - Complete fix for max-global-quota to 200.
  - More descriptive text for 'harden-algo-downgrade'.
  - Increase the default of max-global-quota to 200 from 128 after operational feedback, still keeping the possible amplification factor (CAMP related issues) in the hundreds.
  - Fix SETEX check during Redis (re)initialization.
  - Fix to log redis timeout error string on failure.
  - Fix for the serve-expired DNSSEC information fix: it would not allow current delegation information to be updated in cache. The fix allows current delegation and validation recursion information to be updated, but as a consequence no longer keeps certain expired information around for later DNSSEC-valid expired responses.
  - Merge NLnetLabs#1167: Makefile.in: fix occasional parallel build failures around bison rule.
  - Fix redis so that during a reload it does not fail if the redis server does not connect or does not respond; it still logs the errors and, if the server is up, checks expiration features.
  - Merge NLnetLabs#1157 from Liang Zhu: fix heap corruption when calling ub_ctx_delete in Windows.
  ...
* nlnet/master:
  - For NLnetLabs#1207: [FR] Support for RESINFO RRType 261 (RFC9606); add LDNS_RR_TYPE_RESINFO similar to LDNS_RR_TYPE_TXT.
  - Merge NLnetLabs#1204: ci: set persist-credentials: false for actions/checkout per zizmor suggestion.
  - Fix typo in log_servfail.tdir test.
  - Merge NLnetLabs#1187: create the SSL_CTX for QUIC before chroot and privilege drop.
  - Safeguard alias loop while looking in the cache for expired answers.
  - Merge NLnetLabs#1198: fix log-servfail with serve expired and no useful cache contents.
  - For NLnetLabs#1175, the default value of serve-expired-ttl is set to 86400 (1 day) as suggested by RFC8767.
  - Merge NLnetLabs#1189: fix the dname_str method to cause conversion errors when the domain name length is 255; homogenize the input buffer size for dname_str(); add unit tests and debug-check the input buffer size.
  - Merge NLnetLabs#1197: dname_str() fixes.
  - For NLnetLabs#1193, introduce log-servfail.tdir and clean up the log-servfail setting from other tests.
  - Fix NLnetLabs#1193: log-servfail fails to log host SERVFAIL responses in Unbound 1.19.2 on Ubuntu 24.04.1 LTS, by not considering cached failures when trying to reply with expired data.
Describe the bug
If you just set

```
serve-expired: yes
```

in your config, then unbound could serve stale DNS entries forever, as the default value for `serve-expired-ttl` is 0. This is not a secure configuration.

To reproduce
n.a.
Expected behavior
unbound should also follow the suggestions from RFC-8767, not just its recommendations, and therefore set a non-zero default value for `serve-expired-ttl`. The decision to serve expired data forever should be an explicit decision by the user, as they put themselves at risk by doing so. See below for why.

System:
n.a.
Additional information
1. Security issue
As background, and to explain why this should probably be considered a security issue: I was bitten by this last week while performing an online payment, when my browser stopped the request as I was about to be redirected to an insecure page/domain.

After investigating, I noticed that the SSL certificate of the page I was on did not belong to the PSP that was handling the payment but to some other random third party. After contacting the PSP, they mentioned that they had reason to believe this was caused by DNS caching. Although I didn't believe them at first, after diving into the unbound config more, I'm quite sure they were right (sorry, PSP).

Looking at the unbound logs, I had last visited that PSP checkout page something like a month ago, and I guess that when I first requested the PSP's checkout page last week, unbound served the cached DNS response from last month. But it seems that between last month and last week, the PSP scaled down/changed their capacity, causing the resources they used last month to not be used anymore, and their cloud provider had allocated those resources to another third party.

TL;DR: this is a security issue, as due to this default unbound configuration I almost submitted personal/payment information to some random third party, because unbound served me a month-old DNS record. Luckily my browser stopped me from doing that.
2. RFC
Also, I wonder whether unbound really adheres to RFC-8767. Unbound says about enabling `serve-expired` that it "attempts to serve old responses ... without waiting for the actual resolution to finish". But section 7 of the RFC says that "a good-faith effort has been recently made to refresh the stale data" before it is delivered to any client.

I might misunderstand the RFC here, but it seems the RFC says an attempt to refresh has to be made, while unbound immediately sends stale data without trying to refresh? I guess unbound could better adhere to the RFC by also setting a non-zero value for `serve-expired-client-timeout` by default.
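A refresh-first setup along these lines can be sketched in unbound.conf. This is a hedged illustration, not a statement about Unbound's shipped defaults; the 1800 ms value follows RFC 8767's 1.8-second suggestion:

```
server:
    serve-expired: yes
    # Wait up to 1800 ms for a fresh upstream answer before falling
    # back to expired cache contents (RFC 8767 suggests 1.8 seconds).
    serve-expired-client-timeout: 1800
```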
by default.3. Mitigation
Instead of, or in addition to, setting `serve-expired-ttl`, maybe a setting like `serve-expired-ttl-factor` could be added. This might be outside the scope of RFC-8767, but the party who knows best how long DNS data should be cached is the owner of the DNS recordset, who already sets a TTL themselves. Setting a fixed duration for `serve-expired-ttl` seems wrong, as it doesn't take the intentions of the original DNS owner into account. A setting like `serve-expired-ttl-factor: 10` could be added to work next to `serve-expired-ttl`, where `serve-expired-ttl` is the maximum value and `serve-expired-ttl-factor` means that unbound only serves stale cache data for the configured factor value times the TTL that the owner of the DNS record configured.

E.g. with `serve-expired-ttl-factor: 10` and `serve-expired-ttl: 86400`, a DNS record with a configured TTL of 60 seconds would then only be cached for 10 * 60 = 600 seconds, while a DNS record with a TTL of 10,000 would be cached for 86,400 seconds (and thus not 10 * 10,000 = 100,000).
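The proposed interaction between the two settings can be sketched as follows. Note that `serve-expired-ttl-factor` is the hypothetical option suggested in this issue, not an existing Unbound setting:

```python
def effective_stale_ttl(original_ttl, factor=10, max_stale=86400):
    """Hypothetical combination of serve-expired-ttl-factor (10) and
    serve-expired-ttl (86400) as proposed in this issue: stale data is
    served for factor * original TTL, capped at serve-expired-ttl."""
    return min(factor * original_ttl, max_stale)

# TTL of 60 s     -> 600 s of stale service
# TTL of 10,000 s -> capped at 86,400 s (not 100,000)
```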