Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Redirects to www only matter if they're within the same hostname #154

Merged
merged 2 commits into from
Feb 26, 2018

Conversation

konklone
Copy link
Collaborator

This updates the definition of redirects_immediately_to_www to only care about redirects to the corresponding www endpoint.

Previously, that field was set if it redirected to any hostname beginning with www, including external redirects. However, the only reason we would ever care about redirecting to a hostname with www is to evaluate whether the domain appears to be redirecting within itself to favor the www hostname as its canonical endpoint.

The canonical endpoint function was adding an additional scoping around that to make sure it wasn't an external redirect, but that still didn't catch the case where it was redirecting to a www hostname within the same parent domain but not on the same evaluated hostname (e.g. search.cio.gov -> www.cio.gov).

This scopes the redirects_immediately_to_www field calculation, and drops the additional (now unneeded) scoping during canonical endpoint calculation.

There is an interesting tiny edge case where the www endpoint could itself redirect to another www prefix prepended, such as https://www.example.gov -> https://www.www.example.gov. This would set the redirects_immediately_to_www field for the www endpoint. That would be technically correct, and also wouldn't get factored into the calculation of the canonical endpoint for example.gov (which only interrogates the http and https endpoints), so I think that's fine.

Fixes #152.

cc @PaulSD

@jsf9k
Copy link
Member

jsf9k commented Feb 26, 2018

Nice catch, @konklone and @PaulSD!

@jsf9k jsf9k merged commit e9e212c into develop Feb 26, 2018
@jsf9k jsf9k deleted the tighten-www-redirect-check branch February 26, 2018 15:16
# hostname, even within the same parent domain).
endpoint.redirect_immediately_to_www = (
subdomain_immediate.startswith("www.") and
(re.sub("www\.", "", subdomain_immediate) == subdomain_original)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would just subdomain_intermediate == 'www.'+subdomain_original be easier and more efficient here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, probably! :) Went ahead and made this improvement (using interpolation) in #155.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

pshtt incorrectly labeling www as the canonical_endpoint when root redirects to *any* www URL
3 participants