Fix 536 3.9 urlparse changes #565
Why is the option "vendor all of urlparse" and not "implement a minimal
Oops, scratch what I said. I missed stuff in the code changes.
6f7a054 to 207603a
OK!
r? @willkg
@g-k Can you rebase this? Then I'll test it out.
Yeah, I need to bump the version again.
207603a to e1a5612
OK, rebased.
This looks good! I had some minor issues, but I like the approach.
@@ -61,6 +61,7 @@ def get_version():
'Programming Language :: Python :: 3.6',
'Programming Language :: Python :: 3.7',
'Programming Language :: Python :: 3.8',
'Programming Language :: Python :: 3.9',
Hooray!
FWIW, Freddy pointed out that Django forked urlsplit: https://github.com/django/django/blob/main/django/utils/http.py#L287
a76def5 to 68ed6ff
@@ -522,6 +544,14 @@ def test_attributes_list():
{"protocols": ["http"]},
'<a href="192.168.100.100:8000">valid</a>',
),
pytest.param(
I'll file an issue and see if anyone runs into this in the wild.
Finally got a chance to spend some time on this. This PR now:
We should probably switch to newer URL parsing code eventually, but this PR keeps the changes minimal to fix the Python 3.9 test failures and avoid breaking changes.
Functional tests (not covered by CI):
r? @willkg when you get some free time
@g-k I need some time to re-load Bleach things into my noggin so as to really go through this. I think I can get to this next week.
Thanks! I'll be away and won't get back to this until the end of the month anyway, so there's no rush. I changed directions a few times on this, so the history is more involved and convoluted than it needs to be.
What a messy situation!
I think this is the right thing to do. It's straightforward, easy to reason about, and similar to what other projects decided to do. If things come up later, this probably won't make it harder to change our minds. Nicely done!
68ed6ff to 40de9cd
Not using pip for all vendored deps
Update test_uri_value_allowed_protocols testcases:
* convert test_invalid_uri_does_not_raise_error into a test case
* add test case for data: scheme
* add test case for implicit http for IP and port with path and fragment
* add test case for relative path URI
* test "is not allowed by default" test cases against default ALLOWED_PROTOCOLS
* change anchor-only test that doesn't include a domain and add a comment to the domain one
refs: https://github.com/mozilla/bleach/pull/565/files#r568229243
40de9cd to 75025e1
75025e1 to 931b24e
@@ -1,6 +1,18 @@
Bleach changes
==============

Version 4.1.0 (August 25th, 2021)
Minor version bump since we're adding a new vendored dep.
Thanks @willkg! I appreciate your feedback on all the various iterations of the PR.
Fixes #536
Currently:
* bleach just uses urlparse to extract a URI scheme/protocol (via urlsplit)
* when http is in allowed_schemes (the default), bleach treats unparseable or empty schemes as implicitly http and safe
* […] base, origin, and scheme.
* CPython <3.9 urlparse treats all digits trailing the first colon in a URL as a port number and puts it in the parsed URL path
* CPython 3.9 urlparse treats everything up to the first colon in the URI as the scheme: https://github.com/python/cpython/pull/16837/files#diff-b3712475a413ec972134c0260c8f1eb1deefb66184f740ef00c37b4487ef873eL448-R433
* […] localhost:8000 and worked around it to handle custom schemes/protocols: bleach/bleach/sanitizer.py, lines 489 to 491 in f5971aa
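To make the behavior difference concrete, here is a minimal sketch using only the stdlib. The helper names scheme_and_path and scheme_allowed are hypothetical and are not bleach's code; the allowlist check only mirrors the "empty scheme is implicitly http" rule described above. The result for localhost:8000 depends on which interpreter runs it, so no expected output is shown.

```python
from urllib.parse import urlsplit


def scheme_and_path(url):
    """Return the (scheme, path) pair reported by the running interpreter's urlsplit."""
    parts = urlsplit(url)
    return parts.scheme, parts.path


def scheme_allowed(url, allowed=("http", "https")):
    """Hypothetical allowlist check (illustration only, not bleach's code)."""
    scheme, _path = scheme_and_path(url)
    if not scheme:
        # An empty scheme is treated as implicit http.
        return "http" in allowed
    return scheme in allowed


if __name__ == "__main__":
    for url in ("localhost:8000", "http://example.com/a", "javascript:alert(1)"):
        # On interpreters that parse "localhost" as a scheme, the first URL
        # fails the allowlist; on older ones it falls through as implicit http.
        print(url, scheme_and_path(url), scheme_allowed(url))
```

Running this under both an older and a newer CPython shows why a check keyed on the parsed scheme started rejecting host:port values.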
Considered a few options:
* […] javascript:, file:, data:) and block them. Requires adding a blocklist, keeping it up to date, and knowing all potentially evil schemes, which isn't recommended
* […] allowed_schemes to behave the same as their stdlib urlparse.
This PR:
Open questions:
* Is it safe to trust the Django URL validator for HTML contexts? I think so, or at least I'm not aware of any net locations that are also XSS vectors.
* Is the Django code compatible? It's BSD 3 licensed in setup.cfg, which should be compatible with MPL-2 (but also I'm not a lawyer).
* If it is compatible, how should we attribute it in bleach?
Note that we ended up vendoring the old urlparse instead; see #565 (comment) for the final changes.