-
Notifications
You must be signed in to change notification settings - Fork 115
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Invalid Character Error #23
Comments
@miso-belica thanks for sharing this. We'll take a look at it. |
Another URL with the same issue http://hasicilitomysl.hys.cz and the snippet that causes the issue (unclosed <img border="1"
src="http://hasicilitomysl.hys.cz/wp-content/uploads/cesko.jpg"
title="CZ"</a> And with little more context: <div id="sidebar-wrap2">
<ul id="sidelist">
<li><div id="text-4" class="widget widget_text"><h2 class="title">Překladač</h2> <div class="textwidget"><a href="http://hasicilitomysl.hys.cz" target="_blank"><img border="1"
src="http://hasicilitomysl.hys.cz/wp-content/uploads/cesko.jpg"
title="CZ"</a>
<a href="http://translate.google.cz/translate?sl=cs&tl=en&js=n&prev=_t&hl=cs&ie=UTF-8&layout=2&eotf=1&u=hasicilitomysl.hys.cz" target="_blank"><img border="1"
src="http://hasicilitomysl.hys.cz/wp-content/uploads/UK.jpg"
title="Anglie"</a> |
@miso-belica Just wanted to let you know this is still being looked at. We've not forgotten about it. |
Thanks, I also did some investigation and I find out that creating of attribute So maybe the best solution would be just ignore attributes that contain invalid characters (throw |
Okay, added a fix in 77ad931. This will basically treat |
Hi,
<img src="http://www.podnikatele.g6.cz/wp-content/uploads/2013/07/button_zadost.png" width="190 height="150"></a></div>
<meta name=”description” content=”Vaše obľúbené meme v CZ/SK a ENG jazyku.”>
<a class="rss-topnav" rel="nofollow" href="https://www.facebook.com/pages/Mapaj-Os/180837811986505?fref=ts" target="_blank"; ?>Najdete nás i na Facebooku</a>
<a href="http://www.l2top.co/vote/server/1148 target="_blank"> And unfortunately many more. So this fix is not the sufficient solution for me :( |
@miso-belica thanks for giving us more detail. |
We need to make a big decision about how deeply we are going to go when it comes to supporting broken HTML. There are also some really gray areas when it comes to how to deal with divergence from the spec. It's not clear, for example, whether I'm not terribly keen on duplicating HTMLTidy, but I also don't want to have parser errors in cases where we can cleanly work around it. |
Here's what I'm thinking....
Thoughts? |
Plus...
|
I am okay with Pull Request #29. I agree with @miso-belica that we should trap errors before they make their way into the DOM layer. If @mattfarina agrees, I will merge the pull request. |
Hi,
when I'm trying to parse URL http://e107.funsite.cz/ I get
DOMException("Invalid Character Error", 5)
because of one unclosed tag in the markup. The snippet below causes the exception. It is caused by trying to set attribute with name<div
in DOMTreeBuilder.php. As I understand from the doc all errors should be recorded in property$dom->errors
. Can you fix this please?The text was updated successfully, but these errors were encountered: