Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix bug when there is "<" in the math formula #514

Merged
merged 7 commits into from
Feb 20, 2017
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
15 changes: 12 additions & 3 deletions nbconvert/filters/markdown_mistune.py
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,7 @@
from __future__ import print_function

import re
import cgi

import mistune

Expand Down Expand Up @@ -104,15 +105,23 @@ def header(self, text, level, raw=None):
html = super(IPythonRenderer, self).header(text, level, raw=raw)
return add_anchor(html)

# Pass math through unaltered - mathjax does the rendering in the browser
# We must be careful here for compatibility
# html.escape() is not availale on python 2.7
# For more details, see:
# https://wiki.python.org/moin/EscapingHtml
def escape_html(self,text):
return cgi.escape(text)

def block_math(self, text):
return '$$%s$$' % text
return '$$%s$$' % self.escape_html(text)

def latex_environment(self, name, text):
name = self.escape_html(name)
text = self.escape_html(text)
return r'\begin{%s}%s\end{%s}' % (name, text, name)

def inline_math(self, text):
return '$%s$' % text
return '$%s$' % self.escape_html(text)

def markdown2html_mistune(source):
"""Convert a markdown string to HTML using mistune"""
Expand Down
52 changes: 46 additions & 6 deletions nbconvert/filters/tests/test_markdown.py
Original file line number Diff line number Diff line change
Expand Up @@ -119,19 +119,47 @@ def test_markdown2html_heading_anchors(self):
self._try_markdown(markdown2html, md, tokens)

def test_markdown2html_math(self):
# Mathematical expressions should be passed through unaltered
# Mathematical expressions not containing <, >, & should be passed through unaltered
# all the "<", ">", "&" must be escaped correctly
cases = [("\\begin{equation*}\n"
"\\left( \\sum_{k=1}^n a_k b_k \\right)^2 \\leq \\left( \\sum_{k=1}^n a_k^2 \\right) \\left( \\sum_{k=1}^n b_k^2 \\right)\n"
"\\end{equation*}"),
("$$\n"
"a = 1 *3* 5\n"
"$$"),
"$ a = 1 *3* 5 $",
"$s_i = s_{i}\n$"
]
"$s_i = s_{i}\n$",
"$a<b&b<lt$",
"$a<b&lt;b>a;a-b<0$",
"$<k'>$",
"$$a<b&b<lt$$",
"$$a<b&lt;b>a;a-b<0$$",
"$$<k'>$$",
"""$
\\begin{tabular}{ l c r }
1 & 2 & 3 \\
4 & 5 & 6 \\
7 & 8 & 9 \\
\\end{tabular}$"""]

for case in cases:
self.assertIn(case, markdown2html(case))

result = markdown2html(case)
# find the equation in the generated texts
search_result = re.search("\$.*\$",result,re.DOTALL)
if search_result is None:
search_result = re.search("\\\\begin\\{equation.*\\}.*\\\\end\\{equation.*\\}",result,re.DOTALL)
math = search_result.group(0)
# the resulting math part can not contain "<", ">" or
# "&" not followed by "lt;", "gt;", or "amp;".
self.assertNotIn("<", math)
self.assertNotIn(">", math)
# python 2.7 has assertNotRegexpMatches instead of assertNotRegex
if not hasattr(self, 'assertNotRegex'):
self.assertNotRegex = self.assertNotRegexpMatches
self.assertNotRegex(math,"&(?![gt;|lt;|amp;])")
# the result should be able to be unescaped correctly
self.assertEquals(case,self._unescape(math))

def test_markdown2html_math_mixed(self):
"""ensure markdown between inline and inline-block math"""
case = """The entries of $C$ are given by the exact formula:
Expand Down Expand Up @@ -171,7 +199,8 @@ def test_markdown2html_math_paragraph(self):
]

for case in cases:
self.assertIn(case, markdown2html(case))
s = markdown2html(case)
self.assertIn(case,self._unescape(s))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we expect markdown2html() anywhere else to be returning unescaped values? Is there any way to systematically detect that expectation (vs. not)?

I ask because it seems a little off to be using a private function in our testing suite just to make tests pass in the way that they were before. It almost seems like we should change the test's expected output (in this case, the case value) rather than to use this _unescape so our tests don't need any special treatment in order to pass. This way we're more explicit about the expected behaviour.

That said, if there's any purpose to having the unescaping function, we should probably surface it in the regular code base and test it separately.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a bit ugly, but it was a bit of an ugly test before - it checks that the Markdown rendering doesn't change these samples. So it only works with samples that don't use any Markdown syntax (other than the math syntax we add). I wouldn't particularly ask @zasdfgbnm to fix this. Obviously we welcome improvements to the test, but that's probably best as a later PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that was the original test before. Currently this transformation does not leave these samples unchanged, which is why the unescaping needs to happen. I get that to make the test pass as it is it's easier to not have to concern yourself with the escaped bits.

Also, if the idea of this test is that the math processing leaves markdown unaffected, we should probably include some <, > and & in the markdown section and not automatically unescape them (because the math processing is leaking out into the markdown processing).

Also, this test should include a

\begin{}
… 
\end{}

(i.e., including the new lines around the declaration) since, not being wrapped in any number of $, there is an absence of the easiest cues to delimiting math vs. markdown.


@dec.onlyif_cmds_exist('pandoc')
def test_markdown2rst(self):
Expand All @@ -193,3 +222,14 @@ def _try_markdown(self, method, test, tokens):
else:
for token in tokens:
self.assertIn(token, results)

def _unescape(self,s):
# undo cgi.escape() manually
# We must be careful here for compatibility
# html.unescape() is not availale on python 2.7
# For more information, see:
# https://wiki.python.org/moin/EscapingHtml
s = s.replace("&lt;", "<")
s = s.replace("&gt;", ">")
s = s.replace("&amp;", "&")
return s