support for HTML blocks that stay untouched
cadorn committed Jun 17, 2011
1 parent a062bd9 commit 50e2299
Showing 1 changed file with 25 additions and 1 deletion.
26 changes: 25 additions & 1 deletion lib/markdown.js
@@ -719,7 +719,22 @@ Markdown.dialects.Gruber = {

     para: function para( block, next ) {
       // everything's a para!
-      return [ ["para"].concat( this.processInline( block ) ) ];
+
+      // For any markup that is not covered by Markdown’s syntax, you simply use HTML itself.
+      // There’s no need to preface it or delimit it to indicate that you’re switching from Markdown to HTML; you just use the tags.
+      // The only restrictions are that block-level HTML elements — e.g. <div>, <table>, <pre>, <p>, etc. — must be separated from
+      // surrounding content by blank lines, and the start and end tags of the block should not be indented with tabs or spaces.
+      // Markdown is smart enough not to add extra (unwanted) <p> tags around HTML block-level tags.
+      if (block[0] == "<" && block[block.length-1] == ">") {
+        var blockStr = block.valueOf();
+        // Span-level HTML tags — e.g. <span>, <cite>, or <del> — can be used anywhere in a Markdown paragraph, list item, or header.
+        if (/^<(span|cite|del)[\s>]/.test(blockStr)) {
+          return [ ["para"].concat( this.processInline( block ) ) ];
+        }
+        return [ ["html"].concat( blockStr ) ];
+      } else {
+        return [ ["para"].concat( this.processInline( block ) ) ];
+      }
     }
   }
 }
@@ -1297,6 +1312,11 @@ function render_tree( jsonml ) {
       attributes = {},
       content = [];

+  // render block as-is
+  if (tag === false) {
+    return jsonml.join( "" );
+  }
+
   if ( jsonml.length && typeof jsonml[ 0 ] === "object" && !( jsonml[ 0 ] instanceof Array ) ) {
     attributes = jsonml.shift();
   }
@@ -1367,6 +1387,10 @@ function convert_tree_to_html( tree, references, options ) {
     case "para":
       jsonml[ 0 ] = "p";
       break;
+    case "html":
+      // handle block as-is
+      jsonml[ 0 ] = false;
+      break;
     case "markdown":
       jsonml[ 0 ] = "html";
       if ( attrs ) delete attrs.references;

2 comments on commit 50e2299


@kragen kragen commented on 50e2299 Jul 28, 2011

So, while on the one hand I really want this feature for my application of markdown-js, on the other hand I really want a way to filter the HTML to keep out things like the following:

  • unclosed <blockquote>
  • <script>
  • <a onmouseover>
  • <a href="jscript:...">
  • <a href="mocha:...">
  • <a href=" javascript:...">
  • <iframe>
  • <img width=1 height=1 src="http://...">
  • other things not mentioned here.
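
Several items in the list above are URL-scheme tricks (`jscript:`, `mocha:`, a leading space before `javascript:`). A minimal sketch of a scheme whitelist check follows. The names `ALLOWED_SCHEMES` and `isSafeUrl` are hypothetical and not part of markdown-js, and the sketch assumes the attribute value has already been entity-decoded by whatever parser extracted it.

```javascript
// Hypothetical helper, not part of markdown-js.
// Assumption: `url` is an attribute value that has already been entity-decoded.
var ALLOWED_SCHEMES = { http: true, https: true, mailto: true };

function isSafeUrl(url) {
  // URL parsers strip leading control characters and spaces and drop tabs and
  // newlines anywhere in the string. Removing every character up to U+0020 is
  // a conservative superset of that behaviour.
  var cleaned = String(url).replace(/[\u0000-\u0020]+/g, "");

  // A scheme is a letter followed by letters, digits, "+", "." or "-", then ":".
  var match = /^([a-zA-Z][a-zA-Z0-9+.\-]*):/.exec(cleaned);
  if (!match) {
    // No scheme at all: treat the value as a relative URL and allow it.
    return true;
  }
  // Whitelist, not blacklist: anything not explicitly allowed is rejected.
  return Object.prototype.hasOwnProperty.call(
    ALLOWED_SCHEMES, match[1].toLowerCase());
}
```

Whitelisting schemes rather than blacklisting them means that obscure or future script-bearing schemes are rejected by default.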

I think I'd also be a little happier with something other than false as the tag for as-is blocks. It is JSON-serializable, I suppose... I don't suppose there's a JSONML spec for this kind of thing, is there? Last I checked, the JSONML spec wasn't even clear as to whether the contents of JSONML elements were supposed to be CDATA or PCDATA.

I think another thing that we run into trouble with is entity handling. I ought to be able to write &copy; 2011 Kragen Javier Sitaker in a Markdown document and have the © entity get passed through to the output (as you can see that it is in this comment). And the list from the spec, "<span>, <cite>, or <del>", is just a list of examples, not a complete list of span-level HTML tags; the intent is that any span-level HTML tag can be used in those contexts.
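
To illustrate the point about span-level tags, here is a sketch of a broader check than the commit's `/^<(span|cite|del)[\s>]/`. The tag list is an illustrative, non-exhaustive selection of inline HTML elements, and `INLINE_TAGS` and `startsWithInlineTag` are hypothetical names, not markdown-js API.

```javascript
// Hypothetical names; the list is illustrative and deliberately incomplete.
var INLINE_TAGS = [
  "a", "abbr", "b", "cite", "code", "del", "em", "i", "img", "ins",
  "kbd", "q", "s", "small", "span", "strong", "sub", "sup", "u"
];

// Match an opening tag whose name is followed by whitespace, "/" or ">",
// so that "<b>" matches but "<blockquote>" does not.
var INLINE_RE = new RegExp("^<(" + INLINE_TAGS.join("|") + ")[\\s/>]", "i");

function startsWithInlineTag(blockStr) {
  return INLINE_RE.test(blockStr);
}
```

A block for which `startsWithInlineTag` returns true would then be sent through `processInline` as a paragraph instead of being emitted as a raw HTML block.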

What this adds up to is that we probably need to run all the strings that are the contents of paragraphs, list items, or headers, through a more or less actual HTML parser that can be supplied with whitelists of tags, attributes, and URL schemes, so that it can successfully pass through the subset of well-formed HTML that's right for the application in question. In modern browsers, we could actually use DOMParser, but in Node we might have to use our own. It probably doesn't have to be quite as robust as a browser's parser, since many applications (and basically all applications that use some arbitrary subset of HTML) will give the user a chance to preview and fix their Markdown, so if it barfs on overlapping span-level tags (as GitHub sort of does: <b>overlapping <i>span-level</b> tags</i>) or unclosed tags, it's not a big deal.
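
A rough sketch of the whitelist-driven pass described above, under loud assumptions: it is a regex tokenizer rather than a real HTML parser, it does not balance or close tags, it has not been security-reviewed, and `POLICY`, `sanitize`, and the helper names are all hypothetical rather than anything markdown-js provides.

```javascript
// Hypothetical policy object: allowed tags, their allowed attributes, URL schemes.
var POLICY = {
  tags: { a: ["href", "title"], b: [], i: [], em: [], strong: [],
          span: [], cite: [], del: [] },
  schemes: ["http", "https", "mailto"]
};

// Escape everything; used for attribute values and for rejected tags.
function escapeHtml(s) {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;")
          .replace(/>/g, "&gt;").replace(/"/g, "&quot;");
}

// Escape text but let entity references such as &copy; pass through.
function escapeText(s) {
  return s.replace(/&(?!#?[a-zA-Z0-9]+;)/g, "&amp;")
          .replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

function schemeAllowed(url, schemes) {
  // Drop whitespace and control characters before looking for a scheme.
  var cleaned = url.replace(/[\u0000-\u0020]+/g, "");
  var m = /^([a-z][a-z0-9+.\-]*):/i.exec(cleaned);
  return !m || schemes.indexOf(m[1].toLowerCase()) !== -1;
}

// Rebuild the attribute list, keeping only whitelisted attributes.
// Values are emitted fully escaped, so entities inside them are not decoded.
function keepAttrs(raw, allowed, schemes) {
  var attrRe = /([a-zA-Z\-]+)\s*=\s*(?:"([^"]*)"|'([^']*)')/g;
  var out = "", m;
  while ((m = attrRe.exec(raw)) !== null) {
    var name = m[1].toLowerCase();
    var value = m[2] !== undefined ? m[2] : m[3];
    if (allowed.indexOf(name) === -1) continue;
    if (name === "href" && !schemeAllowed(value, schemes)) continue;
    out += " " + name + '="' + escapeHtml(value) + '"';
  }
  return out;
}

function sanitize(input, policy) {
  // Matches an opening or closing tag with optional quoted attributes.
  // Anything that does not fit this shape is treated as text and escaped.
  var tagRe = /<(\/?)([a-zA-Z][a-zA-Z0-9]*)((?:\s+[a-zA-Z\-]+(?:\s*=\s*(?:"[^"]*"|'[^']*'))?)*)\s*(\/?)>/g;
  var out = "", last = 0, m;
  while ((m = tagRe.exec(input)) !== null) {
    out += escapeText(input.slice(last, m.index));
    last = tagRe.lastIndex;
    var name = m[2].toLowerCase();
    if (!Object.prototype.hasOwnProperty.call(policy.tags, name)) {
      out += escapeHtml(m[0]);            // tag not whitelisted: show it as text
    } else if (m[1]) {
      out += "</" + name + ">";           // closing tag: attributes are ignored
    } else {
      out += "<" + name + keepAttrs(m[3], policy.tags[name], policy.schemes) + ">";
    }
  }
  return out + escapeText(input.slice(last));
}
```

For example, `sanitize('<a onmouseover="x()" href="http://example.com/">hi</a>', POLICY)` keeps the link but drops the event handler, while a `<script>` element comes out as escaped text. An application would supply its own policy object to choose the subset of HTML it wants to let through.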

I'm not proposing that you should do all this work for me; I was just checking out the network to see if someone had already done it. It looks like you're the one that's come closest. Would it be useful to you if I did what I'm describing? Would it remove the need for the code in this commit for you?

Edited to add: I copied the above comment onto the latest issue requesting this feature.

Owner Author

@cadorn cadorn commented on 50e2299 Jul 28, 2011
