an escaped line break spoils the parsing of a following CodeBlock #3730

rose00 · 2017-06-10T01:31:46Z

A trailing backslash interferes with the parsing of a fenced code block on the next line.

Workaround: pre-filter the input to insert a blank line after each escaped line break (a backslash at the end of a line).
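A minimal sketch of such a pre-filter, assuming a POSIX shell and awk. The function name split_escaped_break_before_fence is hypothetical. Instead of adding a blank line after every escaped line break, it adds one only when the next line opens a backtick fence, which is the same narrowing used by the sed one-liner later in this thread. Ordinary hard breaks are therefore left alone. The sketch ignores tilde fences, indented fences, and lines ending in an escaped backslash ("\\"). It also does not track whether it is already inside a code block.

```shell
# Hypothetical helper: read Markdown on stdin, write it to stdout with a blank
# line inserted between a line ending in "\" and a following "```" fence line.
split_escaped_break_before_fence() {
  awk '
    # Previous line ended in a backslash and this line opens a backtick fence:
    # emit a blank line first so the fence can start a new block.
    prev_bs && /^```/ { print "" }
    # Pass every line through unchanged, then remember whether it ends in a backslash.
    { print; prev_bs = /\\$/ }
  '
}
```

Usage: split_escaped_break_before_fence < input.md | pandoc -t native -f markdown+escaped_line_breaks. After this filter runs, the paragraph before the fence ends with the backslash, so check how your pandoc version renders that trailing backslash.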

$ (echo 'nice line @'; echo '``` {style=".impeccable"}'; echo "  preformatted       stuff";
    echo '```') | pandoc -t native -f markdown+escaped_line_breaks
[Para [Str "nice",Space,Str "line",Space,Str "@"]
,CodeBlock ("",[],[("style",".impeccable")]) "  preformatted       stuff"]

Note: in the next two commands, the Markdown input produced by "echo" contains a single backslash at the end of the first line.

$ (echo 'bad line break \'; echo '``` {style=".impeccable"}'; echo "  preformatted       stuff";
    echo '```') | pandoc -t native -f markdown+escaped_line_breaks
[Para [Str "bad",Space,Str "line",Space,Str "break",LineBreak,Code ("",[],[]) "{style=\".impeccable\"}   preformatted       stuff"]]

Because the preformatted text is parsed as an inline Code element instead of a CodeBlock, it loses its formatting (the preserved whitespace) when it passes through HTML. That is how the bug was noticed.

$ (echo 'bad line break \'; echo '``` {style=".impeccable"}'; echo "  preformatted       stuff";
    echo '```') | pandoc -t native -f markdown-escaped_line_breaks
pandoc: 
Error at "source" (line 2, column 1):unknown parse error
``` {style=".impeccable"}
^
CallStack (from HasCallStack):
  error, called at src/Text/Pandoc/Error.hs:66:13 in pandoc-1.19.2.1-J1nmFBg9ln971v0RrPbKLJ:Text.Pandoc.Error

This error suggests that the parsing after a backslash is a little too unforgiving.

Suggested fix: When parsing an escaped newline, do not consume the newline character itself.


jgm · 2017-06-10T08:46:34Z

It's not obvious that this is a bug. (Of course, the lack of a spec makes it hard to resolve this definitively; pandoc does not yet follow the CommonMark spec.)

Generally when pandoc sees an escaped newline, it assumes that this is a line break inside a block, so it assumes that the next line is not meant to start a new block.

Another example:

hi\
---

which doesn't turn into a setext header.

On the other hand, CommonMark has a philosophy of discerning block structure independently of inline parsing, so CommonMark would do things the way you suggest.

Still, I'm inclined to agree that this should be changed in the way you suggest.

The following patch implements your suggestion "do not consume the newline character itself":

diff --git a/src/Text/Pandoc/Readers/Markdown.hs b/src/Text/Pandoc/Readers/Markdown.hs
index 5694c43..8220bac 100644
--- a/src/Text/Pandoc/Readers/Markdown.hs
+++ b/src/Text/Pandoc/Readers/Markdown.hs
@@ -1471,12 +1471,18 @@ escapedChar' = try $ do
 
 escapedChar :: PandocMonad m => MarkdownParser m (F Inlines)
 escapedChar = do
-  result <- escapedChar'
+  result <- lookAhead escapedChar'
   case result of
-       ' '   -> return $ return $ B.str "\160" -- "\ " is a nonbreaking space
-       '\n'  -> guardEnabled Ext_escaped_line_breaks >>
-                return (return B.linebreak)  -- "\[newline]" is a linebreak
-       _     -> return $ return $ B.str [result]
+       ' '   -> do
+         void $ count 2 anyChar
+         return $ return $ B.str "\160" -- "\ " is a nonbreaking space
+       '\n'  -> do
+         guardEnabled Ext_escaped_line_breaks
+         void $ anyChar -- eat the backslash, leaving the newline (see #3730)
+         return (return B.linebreak)  -- "\[newline]" is a linebreak
+       _     -> do
+         void $ count 2 anyChar
+         return $ return $ B.str [result]

The test suite fails because of a test involving this Markdown:

# Title\
foo

which current pandoc converts to

<h1 id="title-foo">Title<br />
foo</h1>

and the changed code converts to

<h1 id="title">Title<br />
</h1>
<p>foo</p>

So, one effect of making this change is that one can no longer use this trick to get newlines in headers. I have a feeling that some people may be relying on the current behavior, so this change would require some further discussion on pandoc-discuss (though I'm still inclined to make it, to bring pandoc's parsing closer to CommonMark).

jgm · 2017-06-10T08:55:36Z

Here's a different patch that might make more sense. It improves over current pandoc behavior in disallowing escaped newlines in some contexts where newlines aren't allowed. But it has the same effect as the above patch of disallowing hard breaks in headers.

diff --git a/src/Text/Pandoc/Readers/Markdown.hs b/src/Text/Pandoc/Readers/Markdown.hs
index 5694c43..807b178 100644
--- a/src/Text/Pandoc/Readers/Markdown.hs
+++ b/src/Text/Pandoc/Readers/Markdown.hs
@@ -1450,6 +1450,7 @@ inline = choice [ whitespace
                 , autoLink
                 , spanHtml
                 , rawHtmlInline
+                , escapedNewline
                 , escapedChar
                 , rawLaTeXInline'
                 , exampleRef
@@ -1466,16 +1467,20 @@ escapedChar' = try $ do
   (guardEnabled Ext_all_symbols_escapable >> satisfy (not . isAlphaNum))
      <|> (guardEnabled Ext_angle_brackets_escapable >>
             oneOf "\\`*_{}[]()>#+-.!~\"<>")
-     <|> (guardEnabled Ext_escaped_line_breaks >> char '\n')
      <|> oneOf "\\`*_{}[]()>#+-.!~\""
 
+escapedNewline :: PandocMonad m => MarkdownParser m (F Inlines)
+escapedNewline = try $ do
+  guardEnabled Ext_escaped_line_breaks
+  char '\\'
+  lookAhead (char '\n') -- don't consume the newline (see #3730)
+  return $ return B.linebreak
+
 escapedChar :: PandocMonad m => MarkdownParser m (F Inlines)
 escapedChar = do
   result <- escapedChar'
   case result of
        ' '   -> return $ return $ B.str "\160" -- "\ " is a nonbreaking space
-       '\n'  -> guardEnabled Ext_escaped_line_breaks >>
-                return (return B.linebreak)  -- "\[newline]" is a linebreak
        _     -> return $ return $ B.str [result]
 
 ltSign :: PandocMonad m => MarkdownParser m (F Inlines)

jgm · 2017-06-10T08:56:12Z

Further note: pandoc is a bit inconsistent, because if you leave two spaces at the end of the line, you don't get a hard break in a header:

# Title followed by two spaces  
content

Pandoc also disallows hard breaks in setext headers:

Title\
second line
----

doesn't produce a header.

This argues for removing the inconsistency by disallowing backslash-newline hard breaks in atx headers. The only reason not to do this is that it may break behavior that people are relying on (and this isn't something we should do lightly).

rose00 · 2017-06-11T19:37:04Z

The main problem with this bug, for me, is that a new block construct (fenced code) is not recognized after an escaped newline. Perhaps the best behavior is to let a new block start after the escaped newline when the next line begins a block construct, and otherwise to keep treating it as a line break, so that a header line (is that the same kind of block?) is not terminated.

And here's more: The escaped newline, immediately followed by a fenced code block, was originally produced by pandoc working from an HTML source. It was surprisingly tricky to produce a minimized test case, but here is one (the <div> is required or the bug will not manifest):

$ echo '<div>nice line<br><pre class="impeccable">nice code</pre></div>' | pandoc -f html -t markdown
<div>

nice line\
``` {.impeccable}
nice code
```

</div>

This bug would be moot for me if the Markdown unparser put an extra newline between the backslash and the backticks in this case. In fact, I work around the bug with a sed script that post-processes the possibly broken Markdown:

$ sed '/[\\]$/{N;s/\([\\]\)\(\n\)\(```\)/\1\2\2\3/;}'

adunning · 2017-10-31T15:09:28Z

This behaviour was fairly widely documented for use with headings; I see the problem with it, but is there any other way of achieving the same thing?

agusmba · 2017-10-31T17:10:32Z

@adunning did you try using <br/> and writing the header on one line, as proposed in the discussion topic you linked?

root@18cbcfcaf28c:/source# pandoc -t html -f markdown
# Line<br/>break

^D

<h1 id="line-break">Line<br/>break</h1>

If you need LaTeX output, you could use raw_attributes:

root@18cbcfcaf28c:/source# pandoc -t latex -f markdown
# Line`\\`{=latex} Break

^D

\hypertarget{line-break}{%
\section{\texorpdfstring{Line\\ Break}{Line Break}}\label{line-break}}

This isn't ideal, since you'd need a specific line break for each desired output format, but it may serve as a workaround.

jgm · 2017-10-31T18:22:09Z

+++ Andrew Dunning [Oct 31 17 08:09 ]:

This behaviour was fairly widely documented for use with headings; I see the problem with it, but is there any other way of achieving the same thing?

You could have a filter convert `[]{.br}` into a line break, or something ad hoc like that.

adunning · 2017-10-31T18:23:11Z

Many thanks, both, for the ideas!

jgm added format:Markdown reader labels Jun 10, 2017

jgm added this to the pandoc 2.0 milestone Jun 10, 2017

jgm closed this as completed in b466152 Jun 11, 2017

dhebbeker mentioned this issue Feb 28, 2018 in "HTML line break / new line (<br>) converted to Markdown breaks following header (<h1>)" #4418 (closed)

jgm mentioned this issue Sep 26, 2018 in "slide tiltes with linebreaks no longer work with recent pandoc (beamer)" #4936 (closed)
