Redirects to www only matter if they're within the same hostname #154
This updates the definition of `redirects_immediately_to_www` to only care about redirects to the corresponding `www` endpoint.

Previously, that field was set if the endpoint redirected to any hostname beginning with `www`, including external redirects. However, the only reason we would ever care about redirecting to a hostname with `www` is to evaluate whether the domain appears to be redirecting within itself to favor the `www` hostname as its canonical endpoint.

The canonical endpoint function was adding an additional scoping around that to make sure it wasn't an external redirect, but that still didn't catch the case where it was redirecting to a `www` hostname within the same parent domain but not on the same evaluated hostname (e.g. `search.cio.gov` -> `www.cio.gov`).

This scopes the `redirects_immediately_to_www` field calculation, and drops the additional (now unneeded) scoping during canonical endpoint calculation.

There is an interesting tiny edge case where the `www` endpoint could itself redirect to a hostname with another `www` prefix prepended, such as `https://www.example.gov` -> `https://www.www.example.gov`. This would set the `redirects_immediately_to_www` field for the `www` endpoint. That would be technically correct, and also wouldn't get factored into the calculation of the canonical endpoint for `example.gov` (which only interrogates the `http` and `https` endpoints), so I think that's fine.

Fixes #152.
cc @PaulSD
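
For illustration, here is a minimal sketch of the scoped check described above. It is not the code in this PR: the function name `is_redirect_to_corresponding_www` and its parameters are hypothetical, and it assumes the redirect target is available as a URL string.

```python
from urllib.parse import urlparse


def is_redirect_to_corresponding_www(endpoint_host: str, redirect_url: str) -> bool:
    """Hypothetical helper: True only when redirect_url points at the
    'www.' counterpart of the exact hostname being evaluated."""
    # urlparse(...).hostname is lowercased, and None for relative URLs.
    redirect_host = urlparse(redirect_url).hostname
    if redirect_host is None:
        # A relative redirect stays on the same hostname, so it cannot
        # be a redirect to the www endpoint.
        return False
    return redirect_host == "www." + endpoint_host.lower()
```

Under these assumptions, `example.gov` -> `https://www.example.gov` counts, while `search.cio.gov` -> `https://www.cio.gov` and any external `www` hostname do not. The `www.example.gov` -> `https://www.www.example.gov` edge case also counts, matching the behavior described above.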