Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix 536 3.9 urlparse changes #565

Merged
merged 9 commits into from
Aug 25, 2021
Merged

Fix 536 3.9 urlparse changes #565

merged 9 commits into from
Aug 25, 2021

Conversation

g-k
Copy link
Collaborator

@g-k g-k commented Nov 9, 2020

Fixes #536

Currently:

Considered a few options:

  • ignore it: mark the failing tests as expected failures on 3.9. This gets tests passing, but introduces a breaking change for 3.9 users and doesn't fix the underlying issue.
  • vendor an old urlparse version: either vendor 2.7.18 and make it compatible with PY3 or drop Python 3 support and vendor a urlparse version <3.9. This wouldn't introduce breaking changes, but we'd vendor more code
  • make the URI sanitizer stricter: stop doing that. This is stricter and more secure, but a breaking change that deviants from the web spec and probably how users expect it to work
  • enumerate bad schemes: check for schemes that can deliver malicious content (e.g. javascript:, file:, data:) and block them. Requires adding a blocklist, keeping it up to date, and knowing all potentially evil schemes, which isn't recommended
  • match schemes that look like network locations: check if schemes look like valid hostnames or IPv4 or v6 addresses with an optional port (using bits of django's URL validator). Potentially breaking change and might confuse users if they expect allowed_schemes to behave the same as their stdlib urlparse.

This PR:

  • implements the last option using the hostname, IP address, and port portions of Django's URL validator
  • updates CI and tox to test against 3.9 and use it as the default for other targets
  • updates the URI validation tests (details in the last commit message)

Open questions:

Is it safe to trust the Django URL validator for HTML contexts? I think so, or at least I'm not aware of any net locations that are also XSS vectors.

Is the Django code compatible? It's BSD 3 licensed in setup.cfg, which should be compatible with MPL-2 (but also I'm not a lawyer).

If it is compatible, how should we attribute it in bleach?

Note that we ended up vendoring the old urlparse instead see #565 (comment) for the final changes.

@g-k g-k requested review from willkg and jdufresne November 9, 2020 18:44
@g-k g-k marked this pull request as ready for review November 9, 2020 18:45
@willkg
Copy link
Member

willkg commented Nov 9, 2020

Why is the option "vendor all of urlparse" and not "implement a minimal get_schema that does what Bleach needs it to"? Is the latter option difficult for some reason I'm not seeing?

@willkg
Copy link
Member

willkg commented Nov 9, 2020

Oops--scratch what I said. I missed stuff in the code changes.

bleach/utils.py Outdated Show resolved Hide resolved
@g-k g-k removed the request for review from jdufresne January 25, 2021 19:54
@g-k g-k force-pushed the fix-536-3.9-urlparse-changes branch 6 times, most recently from 6f7a054 to 207603a Compare January 27, 2021 17:22
@g-k
Copy link
Collaborator Author

g-k commented Jan 27, 2021

OK!

  • moved the django bits to the _vendor namespace
  • updated the vendoring scripts so we don't ship all of django
  • switched to vendoring django 1.11 since it support 2.7+ (confirmed it uses the same regexes)

r? @willkg

@g-k g-k requested a review from willkg January 27, 2021 17:32
@willkg
Copy link
Member

willkg commented Feb 1, 2021

@g-k Can you rebase this? Then I'll test it out.

@g-k
Copy link
Collaborator Author

g-k commented Feb 1, 2021

Yeah I need to bump the version again.

@g-k g-k force-pushed the fix-536-3.9-urlparse-changes branch from 207603a to e1a5612 Compare February 1, 2021 19:59
@g-k
Copy link
Collaborator Author

g-k commented Feb 1, 2021

OK rebased

Copy link
Member

@willkg willkg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good! I had some minor issues, but I like the approach.

tox.ini Show resolved Hide resolved
bleach/_vendor/pip_install_vendor.sh Outdated Show resolved Hide resolved
bleach/_vendor/pip_install_vendor.sh Outdated Show resolved Hide resolved
bleach/_vendor/pip_install_vendor.sh Outdated Show resolved Hide resolved
bleach/_vendor/vendor.txt Outdated Show resolved Hide resolved
bleach/utils.py Outdated Show resolved Hide resolved
bleach/utils.py Outdated Show resolved Hide resolved
bleach/utils.py Outdated Show resolved Hide resolved
@@ -61,6 +61,7 @@ def get_version():
'Programming Language :: Python :: 3.6',
'Programming Language :: Python :: 3.7',
'Programming Language :: Python :: 3.8',
'Programming Language :: Python :: 3.9',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hooray!

tests/test_clean.py Show resolved Hide resolved
@g-k
Copy link
Collaborator Author

g-k commented Mar 22, 2021

FWIW Freddy pointed out that Django forked urlsplit https://github.com/django/django/blob/main/django/utils/http.py#L287

@g-k g-k force-pushed the fix-536-3.9-urlparse-changes branch 5 times, most recently from a76def5 to 68ed6ff Compare August 5, 2021 15:55
@@ -522,6 +544,14 @@ def test_attributes_list():
{"protocols": ["http"]},
'<a href="192.168.100.100:8000">valid</a>',
),
pytest.param(
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll file an issue and see if anyone runs into this in the wild.

@g-k
Copy link
Collaborator Author

g-k commented Aug 5, 2021

Finally got a chance to spend some time on this.

This PR now:

  • vendors the python 3.6.14 urllib.parse, uses it on all python versions (3.9 is the only one using the updated behavior, but it's simpler to not have version specific behavior); so it drops custom URL parsing behavior and django regexes
  • updates vendor install and verify scripts (including the changes from Fix 536 3.9 urlparse changes #565 (comment) and Fix 536 3.9 urlparse changes #565 (comment); I don't think the code the other comments were on)

We probably should switch to newer URL parsing code eventually, but this PR keeps the changes minimal to fix the Python 3.9 test failures and avoid breaking changes.

Functional tests (not covered by CI):

  • run vendor_install.sh and make sure nothing changes
  • run vendor_verify.sh and make sure it passes
  • change parse.py then run vendor_verify.sh and make sure it fails
  • change parse.py shasum file then run vendor_verify.sh and make sure it fails

r? @willkg when you get some free time

@g-k g-k requested a review from willkg August 5, 2021 16:21
@willkg
Copy link
Member

willkg commented Aug 5, 2021

@g-k I need some time to re-load Bleach things into my noggin so as to really go through this. I think I can get to this next week.

@g-k
Copy link
Collaborator Author

g-k commented Aug 6, 2021

@g-k I need some time to re-load Bleach things into my noggin so as to really go through this. I think I can get to this next week.

Thanks! I'll be away and won't get back to this until the end of the month anyway, so there's no rush.

I changed directions a few times on this, so the history is more involved and convoluted than it needs to be.

@g-k g-k mentioned this pull request Aug 10, 2021
requirements-dev.txt Outdated Show resolved Hide resolved
Copy link
Member

@willkg willkg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What a messy situation!

I think this is the right thing to do. It's straight-forward, easy to reason about, and similar to what other projects decided to do. If things come up later, this probably won't make changing our minds later harder. Nicely done!

@g-k g-k force-pushed the fix-536-3.9-urlparse-changes branch from 68ed6ff to 40de9cd Compare August 25, 2021 14:32
Greg Guthe added 8 commits August 25, 2021 10:34
Not using pip for all vendored deps
Update test_uri_value_allowed_protocols testcases:

  * convert test_invalid_uri_does_not_raise_error into a test case
  * add test case for data: scheme
  * add test case for implicit http for IP and port with path and fragment
  * add test case for relative path URI
  * test "is not allowed by default" test cases against default
    ALLOWED_PROTOCOLS
  * change anchor-only test that doesn't include a domain and add a
    comment to the domain one refs:
    https://github.com/mozilla/bleach/pull/565/files#r568229243
@g-k g-k force-pushed the fix-536-3.9-urlparse-changes branch from 40de9cd to 75025e1 Compare August 25, 2021 14:34
@g-k g-k force-pushed the fix-536-3.9-urlparse-changes branch from 75025e1 to 931b24e Compare August 25, 2021 14:39
@@ -1,6 +1,18 @@
Bleach changes
==============

Version 4.1.0 (August 25th, 2021)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor version bump since we're adding a new vendored dep.

@g-k g-k merged commit e4718bd into main Aug 25, 2021
@g-k g-k deleted the fix-536-3.9-urlparse-changes branch August 25, 2021 14:44
@g-k
Copy link
Collaborator Author

g-k commented Aug 25, 2021

Thanks @willkg! I appreciate your feedback on all the various iterations of the PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Test failures with py3.9
2 participants