URL with multiple consecutive dots should not be valid? #25

dentarg · 2015-02-01T16:27:43Z

$ ruby -e "require 'twingly/url'; p Twingly::URL.validate('www..hej..se')"
true

$ dig www..hej..se
dig: 'www..hej..se' is not a legal name (empty label)

dentarg · 2015-02-01T16:31:01Z

However a domain name with multiple consecutive dots is not valid since the length of each label has to be more than 0.

http://stackoverflow.com/questions/27142359/is-a-url-with-multiple-consecutive-dots-valid

Related: http://stackoverflow.com/questions/1547899/which-characters-make-a-url-invalid

roback · 2015-09-09T11:29:10Z

A URI with consecutive dots would be "dropped" by .NET System.Uri, Addressable::URI considers this to be valid though. We wont fix this at the moment, not sure what behaviour is "best". / The Mob (@roback @jage @walro @DevL @dentarg)

walro · 2015-10-30T16:24:50Z

Just wanted to check what the status was since 2.0.0, still the same:

irb(main):002:0> Twingly::URL.parse("www..hej..se").valid?
=> true

dentarg · 2016-11-05T20:21:30Z

The regular one year check:

$ pry
[1] pry(main)> require "twingly/url"
=> true
[2] pry(main)> Twingly::URL.parse("www..hej..se")
=> #<Twingly::URL:0x3fe1840b4dac http://www..hej..se>
[3] pry(main)> Twingly::URL.parse("www..hej..se").valid?
=> true
[4] pry(main)> Twingly::URL::VERSION
=> "5.0.1"

dentarg · 2018-05-17T14:35:18Z

Someone reported this upstream: weppos/publicsuffix-ruby#150

dentarg · 2018-05-17T14:36:53Z

$ pry -rtwingly/url
[1] pry(main)> Twingly::URL.parse("www..hej..se")
=> #<Twingly::URL:0x3fc97ae4e958 http://www..hej..se>
[2] pry(main)> Twingly::URL.parse("www..hej..se").valid?
=> true
[3] pry(main)> Twingly::URL::VERSION
=> "5.1.1"

dentarg · 2018-05-17T14:40:06Z

[9] pry(main)> (url.methods - Object.methods - [:between?, :clamp]).each { |method| puts "#{method}: #{url.public_send(method).inspect}" }
path: ""
host: "www..hej..se"
password: ""
valid?: true
scheme: "http"
userinfo: ""
user: ""
domain: "hej.se"
origin: "http://www..hej..se"
normalized: #<Twingly::URL:0x3fe6b8d3a804 http://www..hej..se/>
sld: "hej"
trd: "www."
tld: "se"
ttld: "se"
without_scheme: "//www..hej..se"
normalized_scheme: "http"
normalized_host: "www..hej..se"
normalized_path: "/"

Vigneshbalakrishnan118 · 2018-05-18T04:07:31Z

I am the one who raised the issue there..
Is there any solution or it won't be fixed?

jage · 2018-05-21T08:16:31Z

I'm not sure what the best response is to the .valid? message, but I think we should normalize the dots, i.e.:

Instead of:

irb(main):003:0> Twingly::URL.parse("www..hej..se").to_s
=> "http://www..hej..se"
irb(main):004:0> Twingly::URL.parse("www..hej..se").normalized.to_s
=> "http://www..hej..se/"

It would do this:

irb(main):003:0> Twingly::URL.parse("www..hej..se").to_s
=> "http://www..hej..se"
irb(main):004:0> Twingly::URL.parse("www..hej..se").normalized.to_s
=> "http://www.hej.se/"

New issue for this?

dentarg · 2018-05-21T08:21:23Z

@jage Sounds good to me, yeah a new issue for that is the best I think

dentarg · 2018-05-21T08:22:51Z

@jage Ah no, I don't think we should do that, because of

#25 (comment)

A URI with consecutive dots would be "dropped" by .NET System.Uri

dentarg · 2018-05-21T08:23:19Z

I think the goal for twingly-url normalize is to do the same as our .NET normalizer, correct?

dentarg · 2018-05-21T08:25:52Z

I don't want #normalized to become a method that you use to try to "fix" broken URLs, because in other cases, calling #normalized will break (change) working URLs.

jage · 2018-05-21T08:28:27Z

A URI with consecutive dots would be "dropped" by .NET System.Uri

Then I think we have the answer? If we should do the same, it should not be valid (i.e. dropped).

I think the goal for twingly-url is to do the same as our .NET normalizer, correct?

Yes it would be best if it's just "the same" normalisation, but I think this is a change where we could change the .NET normalisation as well.

I'm not trying to fix a URL, I'm trying to create a normalized form so we can compare and see if the URL is blocked, already in the system etc.

jage · 2018-05-21T08:29:16Z

Let's keep the discussion of the validity here, and normalization in #125

dentarg · 2018-05-21T08:35:45Z

I'm not trying to fix a URL, I'm trying to create a normalized form so we can compare and see if the URL is blocked, already in the system etc.

I see.

I think not valid makes sense because other tools doesn't treat http://www..hej..se as a valid URL. Just tried both http://www..hej..se and www..hej..se in Chrome, and it searches Google for me, doesn't try to visit the site in question. Also see the first post here about dig.

dentarg · 2019-09-03T10:22:54Z

$ ruby -e "require 'twingly/url'; url = Twingly::URL.parse('www..hej..se') ; p Twingly::URL::VERSION, url, url.valid?"
"6.0.2"
#<Twingly::URL:0x3fd0578a13c8 http://www..hej..se>
true

dentarg · 2022-10-08T10:21:16Z

#158 made these URLs invalid, this issue can be closed!

DevL · 2022-10-08T10:43:43Z

It's been a long time. 😄

dentarg added the bug label Feb 1, 2015

roback added the wontfix label Sep 9, 2015

roback added enhancement and removed bug wontfix labels Sep 9, 2015

roback pushed a commit that referenced this issue Sep 9, 2015

Document behaviour (see #25)

80cba20

roback changed the title ~~URL with multiple consecutive dots should not be valid~~ URL with multiple consecutive dots should not be valid? Sep 9, 2015

roback added the question label Sep 9, 2015

dentarg mentioned this issue Oct 23, 2015

Store exceptions #49

Closed

jage mentioned this issue May 21, 2018

Normalize URL with multiple consecutive dots #125

Closed

dentarg mentioned this issue Oct 8, 2022

Add more invalid URLs to specs #159

Merged

Chrizpy closed this as completed in #159 Oct 10, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

URL with multiple consecutive dots should not be valid? #25

URL with multiple consecutive dots should not be valid? #25

dentarg commented Feb 1, 2015

dentarg commented Feb 1, 2015

roback commented Sep 9, 2015

walro commented Oct 30, 2015

dentarg commented Nov 5, 2016 •

edited

Loading

dentarg commented May 17, 2018

dentarg commented May 17, 2018

dentarg commented May 17, 2018

Vigneshbalakrishnan118 commented May 18, 2018

jage commented May 21, 2018

dentarg commented May 21, 2018

dentarg commented May 21, 2018

dentarg commented May 21, 2018 •

edited

Loading

dentarg commented May 21, 2018 •

edited

Loading

jage commented May 21, 2018

jage commented May 21, 2018

dentarg commented May 21, 2018

dentarg commented Sep 3, 2019

dentarg commented Oct 8, 2022

DevL commented Oct 8, 2022

URL with multiple consecutive dots should not be valid? #25

URL with multiple consecutive dots should not be valid? #25

Comments

dentarg commented Feb 1, 2015

dentarg commented Feb 1, 2015

roback commented Sep 9, 2015

walro commented Oct 30, 2015

dentarg commented Nov 5, 2016 • edited Loading

dentarg commented May 17, 2018

dentarg commented May 17, 2018

dentarg commented May 17, 2018

Vigneshbalakrishnan118 commented May 18, 2018

jage commented May 21, 2018

dentarg commented May 21, 2018

dentarg commented May 21, 2018

dentarg commented May 21, 2018 • edited Loading

dentarg commented May 21, 2018 • edited Loading

jage commented May 21, 2018

jage commented May 21, 2018

dentarg commented May 21, 2018

dentarg commented Sep 3, 2019

dentarg commented Oct 8, 2022

DevL commented Oct 8, 2022

dentarg commented Nov 5, 2016 •

edited

Loading

dentarg commented May 21, 2018 •

edited

Loading

dentarg commented May 21, 2018 •

edited

Loading