forked from evilstreak/markdown-js
support for HTML blocks that stay untouched
50e2299
So, while on one hand I really want this feature for my application of markdown-js, on the other hand I really want a way to filter the HTML to keep out things like the following:
<blockquote>
<script>
<a onmouseover>
<a href="jscript:...">
<a href="mocha:...">
<a href=" javascript:...">
<iframe>
<img width=1 height=1 src="http://...">
I think I'd also be a little happier with something other than false as the tag for as-is blocks. It is JSON-serializable, I suppose... I don't suppose there's a JSONML spec for this kind of thing, is there? Last I checked, the JSONML spec wasn't even clear as to whether the contents of JSONML elements were supposed to be CDATA or PCDATA.
I think another thing that we run into trouble with is entity handling. I ought to be able to write
&copy; 2011 Kragen Javier Sitaker
in a Markdown document and have the &copy; entity get passed through to the output (as you can see that it is in this comment). And the list from the spec, "<span>, <cite>, or <del>", is just a list of examples, not a complete list of span-level HTML tags; the intent is that any span-level HTML tag can be used in those contexts.
What this adds up to is that we probably need to run all the strings that are the contents of paragraphs, list items, or headers through a more or less actual HTML parser that can be supplied with whitelists of tags, attributes, and URL schemes, so that it can successfully pass through the subset of well-formed HTML that's right for the application in question. In modern browsers, we could actually use DOMParser, but in Node we might have to use our own. It probably doesn't have to be quite as robust as a browser's parser, since many applications (and basically all applications that use some arbitrary subset of HTML) will give the user a chance to preview and fix their Markdown, so if it barfs on overlapping span-level tags (as GitHub sort of does:
<b>overlapping <i>span-level</b> tags</i>
) or unclosed tags, it's not a big deal.
I'm not proposing that you should do all this work for me; I was just checking out the network to see if someone had already done it. It looks like you're the one that's come closest. Would it be useful to you if I did what I'm describing? Would it remove the need for the code in this commit for you?
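To make the whitelist idea above concrete, here is a rough sketch of the filtering step only. It assumes some parser has already produced a tag name and a map of decoded attribute values; the names ALLOWED_TAGS, ALLOWED_ATTRS, ALLOWED_SCHEMES, isSafeUrl, and filterElement are hypothetical and are not part of markdown-js, and the whitelist contents are placeholders an application would replace with its own.

```javascript
// Hypothetical whitelists; a real application would supply its own.
const ALLOWED_TAGS = new Set(["a", "b", "i", "em", "strong", "span", "cite", "del"]);
const ALLOWED_ATTRS = new Map([["a", new Set(["href", "title"])]]);
const ALLOWED_SCHEMES = new Set(["http", "https", "mailto"]);

// Return true if the URL has no scheme (relative) or a whitelisted scheme.
// Whitespace and control characters are stripped before checking, so that
// values like " javascript:..." are not mistaken for relative URLs.
function isSafeUrl(url) {
  const cleaned = url.replace(/[\u0000-\u0020]+/g, "");
  const match = /^([a-zA-Z][a-zA-Z0-9+.-]*):/.exec(cleaned);
  if (!match) return true; // no scheme found: treat as relative
  return ALLOWED_SCHEMES.has(match[1].toLowerCase());
}

// Return null for a disallowed tag; otherwise return the tag with only
// its whitelisted attributes (and only safe href values) kept.
function filterElement(tagName, attrs) {
  const tag = tagName.toLowerCase();
  if (!ALLOWED_TAGS.has(tag)) return null;
  const allowedAttrs = ALLOWED_ATTRS.get(tag) || new Set();
  const kept = {};
  for (const [name, value] of Object.entries(attrs)) {
    const attr = name.toLowerCase();
    if (!allowedAttrs.has(attr)) continue; // drops onmouseover and friends
    if (attr === "href" && !isSafeUrl(value)) continue;
    kept[attr] = value;
  }
  return { tag, attrs: kept };
}

// Usage, mirroring the examples listed earlier in this comment:
console.log(filterElement("script", {})); // null
console.log(filterElement("a", { href: " javascript:alert(1)", onmouseover: "x()" }));
// { tag: 'a', attrs: {} }
console.log(filterElement("a", { href: "http://example.com/", title: "ok" }));
// { tag: 'a', attrs: { href: 'http://example.com/', title: 'ok' } }
```

Anything not explicitly listed is dropped, so new tags, event-handler attributes, or odd schemes such as mocha: are rejected by default rather than needing to be anticipated one by one.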
Edited to add: I copied the above comment onto the latest issue requesting this feature.
50e2299
See: evilstreak#16