[bug] libxml 2.9.13 breaks HTML4 parser recovery from ill-formed < character #2461
OK, I have a repro that seems common across the rails-html-sanitizer failures as well as my day-job CI failures:

```ruby
it "handles < character" do
  input = %{<div> this < that </div>}
  expected = %{<div> this &lt; that </div>}
  actual = Loofah.scrub_fragment(input, :escape)
  assert_equal(expected, actual.to_html)
end
```

With nokogiri v1.13.1, this passes. With nokogiri v1.13.2, it fails.
Without Loofah, here's the core problem:
I've opened an issue upstream at https://gitlab.gnome.org/GNOME/libxml2/-/issues/339 and I'm going to explore reverting the related commits in a patch to see if I can get a fast-follow release of Nokogiri out for y'all.
I've updated this issue's description with a punch list of next steps.
v1.13.3 has been released to address this: https://github.com/sparklemotion/nokogiri/releases/tag/v1.13.3
Summary
Nokogiri v1.13.2 shipped libxml2 2.9.13. That version of libxml2 introduced a behavior change in how the HTML4 parser recovers when it sees a bare (ill-formed) < character, that is, one that is not part of a start tag.
I've opened an issue upstream at https://gitlab.gnome.org/GNOME/libxml2/-/issues/339
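To make the recovery that the repro test expects more concrete, here is a simplified, hypothetical Ruby model of it. It is not libxml2's algorithm and not a Nokogiri API. It assumes a rule borrowed from the WHATWG HTML tokenizer's "tag open" state: a < starts markup only when the next character is an ASCII letter, /, ! or ?; any other < is kept as literal text, which a serializer then writes out as &lt;. The helper names starts_markup? and escape_bare_lt are made up for illustration, and the sketch ignores attribute values, comments, and script/style content.

```ruby
# Hypothetical, simplified model of "bare <" recovery; NOT libxml2's code.
# Assumed rule (after the WHATWG tokenizer's tag-open state): "<" begins
# markup only if followed by an ASCII letter, "/", "!" or "?".

# True if +next_char+ (a one-character String, or nil at end of input)
# could begin a tag-like construct after "<".
def starts_markup?(next_char)
  !next_char.nil? && next_char.match?(/[A-Za-z\/!?]/)
end

# Returns a copy of +html+ in which every "<" that does not begin markup
# is treated as text and escaped as "&lt;". Real tags are left untouched.
def escape_bare_lt(html)
  out = +""
  html.each_char.with_index do |ch, i|
    if ch == "<" && !starts_markup?(html[i + 1])
      out << "&lt;"
    else
      out << ch
    end
  end
  out
end

puts escape_bare_lt("<div> this < that </div>")
# prints: <div> this &lt; that </div>
```

The output matches the expected string in the repro test above; a < at the very end of the input is also treated as text, since nothing follows it.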
Immediate next steps
< in an HTML4 document to the Nokogiri test suite
Less-urgent next steps