Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

URL with multiple consecutive dots should not be valid? #25

Closed
dentarg opened this issue Feb 1, 2015 · 19 comments · Fixed by #159
Closed

URL with multiple consecutive dots should not be valid? #25

dentarg opened this issue Feb 1, 2015 · 19 comments · Fixed by #159

Comments

@dentarg
Copy link
Collaborator

dentarg commented Feb 1, 2015

$ ruby -e "require 'twingly/url'; p Twingly::URL.validate('www..hej..se')"
true
$ dig www..hej..se
dig: 'www..hej..se' is not a legal name (empty label)
@dentarg dentarg added the bug label Feb 1, 2015
@dentarg
Copy link
Collaborator Author

dentarg commented Feb 1, 2015

However a domain name with multiple consecutive dots is not valid since the length of each label has to be more than 0.

http://stackoverflow.com/questions/27142359/is-a-url-with-multiple-consecutive-dots-valid

Related: http://stackoverflow.com/questions/1547899/which-characters-make-a-url-invalid

@roback roback added the wontfix label Sep 9, 2015
@roback
Copy link
Member

roback commented Sep 9, 2015

A URI with consecutive dots would be "dropped" by .NET System.Uri, Addressable::URI considers this to be valid though. We wont fix this at the moment, not sure what behaviour is "best". / The Mob (@roback @jage @walro @DevL @dentarg)

roback pushed a commit that referenced this issue Sep 9, 2015
@roback roback changed the title URL with multiple consecutive dots should not be valid URL with multiple consecutive dots should not be valid? Sep 9, 2015
@roback roback added the question label Sep 9, 2015
@walro
Copy link
Contributor

walro commented Oct 30, 2015

Just wanted to check what the status was since 2.0.0, still the same:

irb(main):002:0> Twingly::URL.parse("www..hej..se").valid?
=> true

@dentarg
Copy link
Collaborator Author

dentarg commented Nov 5, 2016

The regular one year check:

$ pry
[1] pry(main)> require "twingly/url"
=> true
[2] pry(main)> Twingly::URL.parse("www..hej..se")
=> #<Twingly::URL:0x3fe1840b4dac http://www..hej..se>
[3] pry(main)> Twingly::URL.parse("www..hej..se").valid?
=> true
[4] pry(main)> Twingly::URL::VERSION
=> "5.0.1"

@dentarg
Copy link
Collaborator Author

dentarg commented May 17, 2018

Someone reported this upstream: weppos/publicsuffix-ruby#150

@dentarg
Copy link
Collaborator Author

dentarg commented May 17, 2018

$ pry -rtwingly/url
[1] pry(main)> Twingly::URL.parse("www..hej..se")
=> #<Twingly::URL:0x3fc97ae4e958 http://www..hej..se>
[2] pry(main)> Twingly::URL.parse("www..hej..se").valid?
=> true
[3] pry(main)> Twingly::URL::VERSION
=> "5.1.1"

@dentarg
Copy link
Collaborator Author

dentarg commented May 17, 2018

[9] pry(main)> (url.methods - Object.methods - [:between?, :clamp]).each { |method| puts "#{method}: #{url.public_send(method).inspect}" }
path: ""
host: "www..hej..se"
password: ""
valid?: true
scheme: "http"
userinfo: ""
user: ""
domain: "hej.se"
origin: "http://www..hej..se"
normalized: #<Twingly::URL:0x3fe6b8d3a804 http://www..hej..se/>
sld: "hej"
trd: "www."
tld: "se"
ttld: "se"
without_scheme: "//www..hej..se"
normalized_scheme: "http"
normalized_host: "www..hej..se"
normalized_path: "/"

@Vigneshbalakrishnan118
Copy link

I am the one who raised the issue there..
Is there any solution or it won't be fixed?

@jage
Copy link
Contributor

jage commented May 21, 2018

I'm not sure what the best response is to the .valid? message, but I think we should normalize the dots, i.e.:

Instead of:

irb(main):003:0> Twingly::URL.parse("www..hej..se").to_s
=> "http://www..hej..se"
irb(main):004:0> Twingly::URL.parse("www..hej..se").normalized.to_s
=> "http://www..hej..se/"

It would do this:

irb(main):003:0> Twingly::URL.parse("www..hej..se").to_s
=> "http://www..hej..se"
irb(main):004:0> Twingly::URL.parse("www..hej..se").normalized.to_s
=> "http://www.hej.se/"

New issue for this?

@dentarg
Copy link
Collaborator Author

dentarg commented May 21, 2018

@jage Sounds good to me, yeah a new issue for that is the best I think

@dentarg
Copy link
Collaborator Author

dentarg commented May 21, 2018

@jage Ah no, I don't think we should do that, because of

#25 (comment)

A URI with consecutive dots would be "dropped" by .NET System.Uri

@dentarg
Copy link
Collaborator Author

dentarg commented May 21, 2018

I think the goal for twingly-url normalize is to do the same as our .NET normalizer, correct?

@dentarg
Copy link
Collaborator Author

dentarg commented May 21, 2018

I don't want #normalized to become a method that you use to try to "fix" broken URLs, because in other cases, calling #normalized will break (change) working URLs.

@jage
Copy link
Contributor

jage commented May 21, 2018

A URI with consecutive dots would be "dropped" by .NET System.Uri

Then I think we have the answer? If we should do the same, it should not be valid (i.e. dropped).

I think the goal for twingly-url is to do the same as our .NET normalizer, correct?

Yes it would be best if it's just "the same" normalisation, but I think this is a change where we could change the .NET normalisation as well.

I'm not trying to fix a URL, I'm trying to create a normalized form so we can compare and see if the URL is blocked, already in the system etc.

@jage
Copy link
Contributor

jage commented May 21, 2018

Let's keep the discussion of the validity here, and normalization in #125

@dentarg
Copy link
Collaborator Author

dentarg commented May 21, 2018

I'm not trying to fix a URL, I'm trying to create a normalized form so we can compare and see if the URL is blocked, already in the system etc.

I see.

I think not valid makes sense because other tools doesn't treat http://www..hej..se as a valid URL. Just tried both http://www..hej..se and www..hej..se in Chrome, and it searches Google for me, doesn't try to visit the site in question. Also see the first post here about dig.

@dentarg
Copy link
Collaborator Author

dentarg commented Sep 3, 2019

$ ruby -e "require 'twingly/url'; url = Twingly::URL.parse('www..hej..se') ; p Twingly::URL::VERSION, url, url.valid?"
"6.0.2"
#<Twingly::URL:0x3fd0578a13c8 http://www..hej..se>
true

@dentarg
Copy link
Collaborator Author

dentarg commented Oct 8, 2022

#158 made these URLs invalid, this issue can be closed!

@DevL
Copy link

DevL commented Oct 8, 2022

It's been a long time. 😄

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants