Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

improving url parsing: make it more strict #751

Closed
sax opened this issue Feb 11, 2025 · 23 comments
Closed

improving url parsing: make it more strict #751

sax opened this issue Feb 11, 2025 · 23 comments

Comments

@sax
Copy link

sax commented Feb 11, 2025

I have a number of projects that require hackney via transient dependencies. This morning my security audits started flagging hackney with a low severity error for CVE-2025-1211.

I don't see an existing issue for this, so thought I would bring it to your attention here.

I discovered this through the mix_audit library, which I run as a part of my CI pipelines.

@justincy
Copy link

A temporary work around:

mix deps.audit --ignore-advisory-ids GHSA-vq52-99r9-h5pw

@oscar-ospina
Copy link

Wondering... do you guys have plans for patching this?

@benoitc
Copy link
Owner

benoitc commented Feb 11, 2025

wondering how one could open a CVE about "possible" security issue without even contacting the author first. I am tired of this behavior.

Hackney doesn't giive 100% warranty a URL is syntactically correct. This is more a user and conveniencechoice . In the example linked above the host will be "127.0.0.2 " since all before @ will be shown as an auth string . The only "fix" i can see is to add an option to not use that feature and only ensure host and ports are passed there.

@benoitc benoitc changed the title CVE-2025-1211 — Server-side Request Forgery improving url parsing Feb 11, 2025
@benoitc benoitc changed the title improving url parsing improving url parsing: make it more strict Feb 11, 2025
@Trevoke
Copy link

Trevoke commented Feb 11, 2025

It looks like this is sample code that showcases the vulnerability: https://gist.github.com/snoopysecurity/996de09ec0cfd0ebdcfdda8ff515deb1

@kevinkirkup
Copy link

That gist has been open for 11 months.
Thanks snyk.io!

@Trevoke
Copy link

Trevoke commented Feb 11, 2025

@kevinkirkup It has. I'm afraid I don't understand what value this information adds to the conversation, can you help me see it?

@benoitc
Copy link
Owner

benoitc commented Feb 11, 2025

i will send a pr tomorrow to make url parsing stricter. I am debating it should be the default there though. But since mix.audit already flag it, there maybe no choice to make the default strict anyway. any feedback is welcome.

@sax
Copy link
Author

sax commented Feb 12, 2025

Yeah, it must be very frustrating that this was known about for that long and a CVE was issued without (it seems) any attempt to communicate with you. Thank you for the effort to resolve it!

With regards to the default implementation... I'm struggling to find the RFC(s) that covers the use of @ in basic auth passwords, especially regarding using them in URIs. This Stack Overflow answer implies that @ characters are only valid in quoted strings, but while it links to specific sections of RFC2616 I don't personally see how the linked sections tie back to URI parsing.

My personal opinion is that stricter would be preferred. If someone wants to use basic or digest auth with an @ in the password, they could either not, or they could use an Authorization request header instead.

@warmwaffles
Copy link

they could either not

This is the only sane method.

@sax
Copy link
Author

sax commented Feb 12, 2025

Ok, from my reading of RFC3986 section 3.2, an @ is not valid in the userinfo part of a URI. It would need to be encoded as %40.

authority     = [ userinfo "@" ] host [ ":" port ]
userinfo    = *( unreserved / pct-encoded / sub-delims / ":" )

unreserved    = ALPHA / DIGIT / "-" / "." / "_" / "~"
sub-delims     =       "$" / "&" / "'" /                                     "=" / other-delims
other-delims   = "!" /                   "(" / ")" / "*" / "+" / "," / ";"
pct-encoded   = "%" HEXDIG HEXDIG

@warmwaffles
Copy link

warmwaffles commented Feb 12, 2025

This won't mean much here but in Elixir it is parsed out as

%URI{
  scheme: "http",
  authority: "127.0.0.1",
  userinfo: nil,
  host: "127.0.0.1",
  port: 80,
  path: nil,
  query: "@127.2.2.2/",
  fragment: nil
} = URI.parse("http://[email protected]/")

The host is right, the request is supposed to go to 127.0.0.1

Using uri_string seems to be a good move https://www.erlang.org/doc/apps/stdlib/uri_string but I don't know which OTP version it was introduced in 😞

EDIT: OTP 21 is when uri_string was added. Probably safe to use here.

@benoitc
Copy link
Owner

benoitc commented Feb 12, 2025

So the reason we parse the user auth from the uri is because hackney is an http client. Just like curl or your browser does it , it's useful for the user to have the same convenience. Since. It's an HTTP client and as such is not only used for the use case of a server to connect to an api. There is nothing I see really wrong in parsing the auth from the http line there per see. Removing this convenience as an http client lib will break a lot of code around.

Also here I fail to see how it's a security issue, if 127.0.0.2 can be used, what stop the "attacker" to just pass this host? That would mean that no check is done on the uri at first? User can already fix this "security" issue by parsing the uri (using the hackney_url module) they receive or store and ensure that user part is stripped. Then remove it and check if the host is accepted. How lazy should be the validation?

That said, this convenience will be kept around but will now be enabled using a global option. The patch add support of url validation (strict mode enabled) . This will land during the day.

@jewalky
Copy link

jewalky commented Feb 12, 2025

So the reason we parse the user auth from the uri is because hackney is an http client. Just like curl or your browser does it , it's useful for the user to have the same convenience.

Worth noting though that both curl and available browsers (Firefox, Chrome) parse the URL http://[email protected]/ as going to 127.0.0.1 — while they still support being able to provide a username and password. So it's not entirely clear what's wrong with both keeping the convenience and fixing the issue, and why does it have to be mutually exclusive to either support @ or address the CVE 🤔

User can already fix this "security" issue by parsing the uri (using the hackney_url module)

Considering Hackney is the backend for one of the most popular HTTP libraries for Elixir (HTTPoison), most developers who are using it are not even aware of hackney_url or Hackney's direct APIs. While ignorance isn't generally a good thing, I wouldn't call people lazy if they don't dig deep into the dependencies and internals of each library they are using. Especially if it relates to something as common as URL parsing.

I recently had a similar obscure problem in my (much smaller of course) project, where I needed to filter out certain protocols. While investigating a report, I learned that the URL standard or it's common implementation allows having a \n inside the protocol name, where it will be silently removed by browsers and most URL parsing libraries, so java\nscript:alert(1) is perfectly valid (and executable!) URL.

Point being that it's actually critical that all libraries do URL parsing exactly the same way, not deviating from each other, across all programming languages and frameworks — and the less bespoke code, the better — otherwise no one knows at which point it will break and for which case, due to URLs being such a fundamental concept.

(A bit unrelated, btw, but hackney_url does preserve the \n in protocol as-is, Elixir's URI just breaks completely and does not recognize it as URL, but, for example, Python's urllib, NodeJS' new URL() and Rust's url do parse it as expected; and all of these except hackney_url will parse the URL from this topic as having 127.0.0.1 as host)

@Xunjin
Copy link

Xunjin commented Feb 12, 2025

Well I understand your frustration @benoitc that was indeed a shitty move by the CVE report, however @jewalky point is correct in my POV specifically this part:

Considering Hackney is the backend for one of the most popular HTTP libraries for Elixir (HTTPoison), most developers who are using it are not even aware of hackney_url or Hackney's direct APIs. While ignorance isn't generally a good thing, I wouldn't call people lazy if they don't dig deep into the dependencies and internals of each library they are using. Especially if it relates to something as common as URL parsing.

As this is the common use today shouldn't this fix which address be implemented?

@Trevoke
Copy link

Trevoke commented Feb 12, 2025

@Xunjin Unless the author of HTTPoison specifically had a conversation with the author of hackney, I think the amount of responsibility that trickles down to hackney from HTTPoison is limited : HTTPoison could also stop using hackney (and yes, I realize how much work I am suggesting).

@jewalky to this point:

Point being that it's actually critical that all libraries do URL parsing exactly the same way, not deviating from each other, across all programming languages and frameworks — and the less bespoke code, the better — otherwise no one knows at which point it will break and for which case, due to URLs being such a fundamental concept.

I think this is what you meant, but I would correct this to say [...] that all libraries implement the URL spec fully and correctly because I'd rather have one correct library and a thousand wrong ones than a thousand and one wrong ones :)

@kevinkirkup
Copy link

@kevinkirkup It has. I'm afraid I don't understand what value this information adds to the conversation, can you help me see it?

Sorry, wondering if this is really an issue since it's been know for 11 months.

@sax
Copy link
Author

sax commented Feb 12, 2025

So the reason we parse the user auth from the uri is because hackney is an http client.

I don't think anyone is suggesting removing the feature to parse userinfo from URIs, or even making the feature optional (though according to multiple specifications it should be deprecated).

From the RFCs that I read yesterday, though, an unescaped @ character is invalid in the userinfo part of a URI. I would expect HTTP clients to follow the same specs for parsing URIs.

According to the README, hackney supports Erlang 22.3 and later. Since uri_string was added in OTP 21, if hackney_url delegated to that, I wonder if it would solve the issue as well as reduce future maintenance for you.

@Trevoke
Copy link

Trevoke commented Feb 12, 2025

@kevinkirkup It has. I'm afraid I don't understand what value this information adds to the conversation, can you help me see it?

Sorry, wondering if this is really an issue since it's been know for 11 months.

Ah! Thank you for clarifying, I did not understand that from your message. I think the question of whether it's really an issue is a really fair point but unfortunately this is not the place for the conversation - you should send that comment upstream to whoever decided to open this advisory - so maybe one of these two places: https://nvd.nist.gov/vuln/detail/CVE-2025-1211 or https://www.cve.org/CVERecord?id=CVE-2025-1211

[ Edit: in addition, something being known for a while is not the same thing as something not being an issue :) ]

@peaceful-james
Copy link

Sending good vibes to you @benoitc

Thanks for allowing an entire community of developers to stand on your shoulders. 🥇

@peymor19
Copy link

peymor19 commented Feb 14, 2025

(A bit unrelated, btw, but hackney_url does preserve the \n in protocol as-is, Elixir's URI just breaks completely and does not recognize it as URL, but, for example, Python's urllib, NodeJS' new URL() and Rust's url do parse it as expected; and all of these except hackney_url will parse the URL from this topic as having 127.0.0.1 as host)

To add to this, I did find in RFC 2396 that US-ASCII control characters are not to be allowed in the URI syntax https://datatracker.ietf.org/doc/html/rfc2396#section-2.4.3, but I did not find anywhere in the new RFC 3986 that mentions anything about control characters in the URI.

Maybe someone smarter than me might be able to comment on this find.

@sax
Copy link
Author

sax commented Feb 15, 2025

The ABNF rules look to me like they do not include control characters, unless I'm missing something or there's a superseding RFC I can't find.

@peymor19
Copy link

The ABNF rules look to me like they do not include control characters, unless I'm missing something or there's a superseding RFC I can't find.

Yep, you are right. To me it looks like if you pass a control character you should error but if you pass a percent-encoded control character it would be valid according to the ABNF rules. I tested this case on a few different libraries and it seems that some handle this case slightly differently.

@benoitc
Copy link
Owner

benoitc commented Feb 20, 2025

I am closing this issue since hackney 1.21.0 has been released and patch hackney_url to fix this vulnerability. Thanks all for the discussion and encouragements. I appreciate.

Full changelog: https://github.com/benoitc/hackney/releases/tag/1.21.0

@benoitc benoitc closed this as completed Feb 20, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests