an escaped line break spoils the parsing of a following CodeBlock #3730
Comments
It's not obvious that this is a bug. (Of course, the lack of a spec makes it hard to resolve this definitively; pandoc does not yet follow the CommonMark spec.) Generally when pandoc sees an escaped newline, it assumes that this is a line break inside a block, so it assumes that the next line is not meant to start a new block. Another example:
which doesn't turn into a setext header. On the other hand, CommonMark has a philosophy of discerning block structure independently of inline parsing, so CommonMark would do things the way you suggest. Still, I'm inclined to agree that this should be changed in the way you suggest. The following patch implements your suggestion "do not consume the newline character itself":
diff --git a/src/Text/Pandoc/Readers/Markdown.hs b/src/Text/Pandoc/Readers/Markdown.hs
index 5694c43..8220bac 100644
--- a/src/Text/Pandoc/Readers/Markdown.hs
+++ b/src/Text/Pandoc/Readers/Markdown.hs
@@ -1471,12 +1471,18 @@ escapedChar' = try $ do
 escapedChar :: PandocMonad m => MarkdownParser m (F Inlines)
 escapedChar = do
-  result <- escapedChar'
+  result <- lookAhead escapedChar'
   case result of
-       ' ' -> return $ return $ B.str "\160" -- "\ " is a nonbreaking space
-       '\n' -> guardEnabled Ext_escaped_line_breaks >>
-               return (return B.linebreak) -- "\[newline]" is a linebreak
-       _ -> return $ return $ B.str [result]
+       ' ' -> do
+         void $ count 2 anyChar
+         return $ return $ B.str "\160" -- "\ " is a nonbreaking space
+       '\n' -> do
+         guardEnabled Ext_escaped_line_breaks
+         void $ anyChar -- eat the backslash, leaving the newline (see #3730)
+         return (return B.linebreak) -- "\[newline]" is a linebreak
+       _ -> do
+         void $ count 2 anyChar
+         return $ return $ B.str [result]
The test suite fails because of a test involving this Markdown:
which current pandoc converts to
<h1 id="title-foo">Title<br />
foo</h1>
and the changed code converts to
<h1 id="title">Title<br />
</h1>
<p>foo</p>
So, one effect of making this change is that one can no longer use this trick to get newlines in headers. I have a feeling that some people may be relying on the current behavior, so this change would require some further discussion on pandoc-discuss (though I'm still inclined to make it, to bring pandoc's parsing closer to CommonMark).
Here's a different patch that might make more sense. It improves over current pandoc behavior in disallowing escaped newlines in some contexts where newlines aren't allowed. But it has the same effect as the above patch of disallowing hard breaks in headers.
diff --git a/src/Text/Pandoc/Readers/Markdown.hs b/src/Text/Pandoc/Readers/Markdown.hs
index 5694c43..807b178 100644
--- a/src/Text/Pandoc/Readers/Markdown.hs
+++ b/src/Text/Pandoc/Readers/Markdown.hs
@@ -1450,6 +1450,7 @@ inline = choice [ whitespace
                 , autoLink
                 , spanHtml
                 , rawHtmlInline
+                , escapedNewline
                 , escapedChar
                 , rawLaTeXInline'
                 , exampleRef
@@ -1466,16 +1467,20 @@ escapedChar' = try $ do
      (guardEnabled Ext_all_symbols_escapable >> satisfy (not . isAlphaNum))
      <|> (guardEnabled Ext_angle_brackets_escapable >>
             oneOf "\\`*_{}[]()>#+-.!~\"<>")
-     <|> (guardEnabled Ext_escaped_line_breaks >> char '\n')
      <|> oneOf "\\`*_{}[]()>#+-.!~\""
+escapedNewline :: PandocMonad m => MarkdownParser m (F Inlines)
+escapedNewline = try $ do
+  guardEnabled Ext_escaped_line_breaks
+  char '\\'
+  lookAhead (char '\n') -- don't consume the newline (see #3730)
+  return $ return B.linebreak
+
 escapedChar :: PandocMonad m => MarkdownParser m (F Inlines)
 escapedChar = do
   result <- escapedChar'
   case result of
        ' ' -> return $ return $ B.str "\160" -- "\ " is a nonbreaking space
-       '\n' -> guardEnabled Ext_escaped_line_breaks >>
-               return (return B.linebreak) -- "\[newline]" is a linebreak
        _ -> return $ return $ B.str [result]
 ltSign :: PandocMonad m => MarkdownParser m (F Inlines)
Further note: pandoc is a bit inconsistent, because if you leave two spaces at the end of the line, you don't get a hard break in a header:
Pandoc also disallows hard breaks in setext headers:
doesn't produce a header. This argues for removing the inconsistency by disallowing backslash-newline hard breaks in atx headers. The only reason not to do this is that it may break behavior that people are relying on (and this isn't something we should do lightly). |
The main problem with this bug, for me, is that a new block construct (fenced code) is not recognized after an escaped newline. Perhaps the best behavior is to allow a new block to start after the escaped newline but not terminate a header-line (is that the same kind of block?) if there is not a block-introducing construct after the escaped newline. And here's more: The escaped newline, immediately followed by a fenced code block, was originally produced by pandoc working from an HTML source. It was surprisingly tricky to produce a minimized test case, but here is one (the
This bug could be made moot for me if the markdown unparser put an extra newline between backslash and backtick, in this case. In fact, I work around the bug with a sed script to post-process the possibly-broken markdown:
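The sed script itself is not reproduced here. As a rough illustration of the same workaround, here is a minimal Python sketch. The function name `separate_fences` and the fence pattern are assumptions made for this example, not the script mentioned above. It inserts a blank line whenever a line ending in an unescaped backslash is immediately followed by something that looks like a fenced-code opener.

```python
import re

# Assumption: a fence opener is up to three spaces of indentation followed by
# at least three backticks or tildes. Adjust to match the fences in your input.
FENCE = re.compile(r"^ {0,3}(`{3,}|~{3,})")


def ends_with_escaped_newline(line: str) -> bool:
    """True if the line ends in an odd number of backslashes.

    An even count means the final backslash is itself escaped, so the
    newline that follows is an ordinary one.
    """
    trailing = len(line) - len(line.rstrip("\\"))
    return trailing % 2 == 1


def separate_fences(markdown: str) -> str:
    """Insert a blank line between a trailing backslash and a following fence."""
    lines = markdown.split("\n")
    out = []
    for i, line in enumerate(lines):
        out.append(line)
        has_next = i + 1 < len(lines)
        if has_next and ends_with_escaped_newline(line) and FENCE.match(lines[i + 1]):
            out.append("")  # the blank line ends the paragraph before the fence
    return "\n".join(out)
```

Run over pandoc's markdown output before it is parsed again, this leaves everything else untouched and only separates the backslash from the fence.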
This behaviour was fairly widely documented for use with headings; I see the problem with it, but is there any other way of achieving the same thing? |
@adunning did you try with
root@18cbcfcaf28c:/source# pandoc -t html -f markdown
# Line<br/>break
^D
<h1 id="line-break">Line<br/>break</h1>
If you need LaTeX output, you could use
root@18cbcfcaf28c:/source# pandoc -t latex -f markdown
# Line`\\`{=latex} Break
^D
\hypertarget{line-break}{%
\section{\texorpdfstring{Line\\ Break}{Line Break}}\label{line-break}}
This isn't ideal, since you'd need a specific line break for each desired output format, but it may serve as a workaround. |
+++ Andrew Dunning [Oct 31 17 08:09 ]:
This behaviour was fairly widely documented for use with
headings; I see the problem with it, but is there any other way of
achieving the same thing?
You could have a filter convert `[]{.br}` into a line
break, or something ad hoc like that.
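To make the filter idea concrete, here is a minimal sketch in plain Python that walks pandoc's JSON AST and replaces every empty span carrying the class `br` with a `LineBreak`. The function names are made up for this example, and it assumes the usual JSON shapes (`{"t": "Span", "c": [[id, classes, attrs], inlines]}` and `{"t": "LineBreak"}`); it is not taken from pandoc or from any existing filter.

```python
import json
import sys


def is_br_span(node) -> bool:
    """True for an empty Span whose class list contains "br"."""
    if not (isinstance(node, dict) and node.get("t") == "Span"):
        return False
    (_ident, classes, _attrs), inlines = node["c"]
    return "br" in classes and not inlines


def convert(node):
    """Return a copy of the AST with every `[]{.br}` span turned into a LineBreak."""
    if isinstance(node, list):
        return [convert(child) for child in node]
    if isinstance(node, dict):
        if is_br_span(node):
            return {"t": "LineBreak"}
        return {key: convert(value) for key, value in node.items()}
    return node


def main() -> None:
    """Read a pandoc JSON document on stdin and write the converted one to stdout."""
    json.dump(convert(json.load(sys.stdin)), sys.stdout)
```

Saved as an executable script that calls `main()` at the bottom, this could be passed to pandoc with `--filter`. A heading written as `# Line[]{.br}break` would then get a line break in every output format, rather than needing raw HTML or raw LaTeX per format.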
Many thanks, both, for the ideas! |
A trailing backslash interferes with the parsing of a fenced code block on the next line.
Workaround: Pre-filter the input to insert a blank line after each backslash-newline sequence.
Note: In the next two commands, the backslash appears as a single backslash in the markdown input that comes out of "echo".
Using Code instead of CodeBlock for the preformatted stuff will cause it to lose its format when it passes through HTML. That is how the bug was noticed.
This error suggests that the parsing after a backslash is a little too unforgiving.
Suggested fix: When parsing an escaped newline, do not consume the newline character itself.
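To illustrate what the suggested fix changes, here is a toy model in Python. It is not pandoc's parser: `scan_inline` and its `consume_newline` flag are invented for this sketch. The scanner reads inline content until it reaches a newline that it has not already consumed. If the escaped line break swallows the newline, the scanner runs straight into the fence; if it leaves the newline in place, scanning stops there and a block-level parser gets a chance to recognise the fence.

```python
def scan_inline(text: str, pos: int, consume_newline: bool):
    """Toy inline scanner: returns (tokens, position where scanning stopped).

    A backslash followed by a newline yields a "LineBreak" token. With
    consume_newline=True the newline is eaten along with the backslash;
    with consume_newline=False only the backslash is eaten.
    """
    tokens = []
    while pos < len(text):
        if text.startswith("\\\n", pos):
            tokens.append("LineBreak")
            pos += 2 if consume_newline else 1
        elif text[pos] == "\n":
            break  # an unconsumed newline ends this run of inline content
        else:
            tokens.append(text[pos])
            pos += 1
    return tokens, pos


source = "a\\\n```"  # "a", backslash, newline, then a fence opener

# Newline consumed: the backticks are scanned as inline content.
print(scan_inline(source, 0, consume_newline=True))

# Newline left alone: scanning stops at the newline, just before the fence.
print(scan_inline(source, 0, consume_newline=False))
```

In the second call the scanner stops at the newline, so whatever handles block structure sees the fence at the start of a line.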