Allow to choose Markdown extensions used to parse metadata-file #6832

naivenaive · 2020-11-12T03:17:59Z

I tried to use pandoc to convert an HTML file to docx, with a YAML file providing the metadata. According to the description of --metadata-file, 'string scalars in the YAML file will always be parsed as Markdown.' However, superscripts and subscripts are not parsed correctly.

Here is an example. (pandoc 2.10.1, windows-x86-64)
metadata.yaml:
`---
title: "I am title"

author:
- Author One^1^
- Author Two~2~
- Author Three*3*
- Author Four**4**
---`

input.html:
'<p>I am input content</p>'

input.md:
'I am input content'

After calling pandoc --metadata-file metadata.yaml input.html -o output_html.docx, I get the following output_html.docx file.
'I am title
Author One^1^ //Incorrect
Author Two~2~ //Incorrect
Author Three3
Author Four4
I am input content
'
After calling pandoc --metadata-file metadata.yaml input.md -o output_md.docx, I get the following output_md.docx file.
'I am title
Author One¹
Author Two₂
Author Three3
Author Four4
I am input content
'
The emphasis markers (* and **) are parsed correctly. However, the superscript and subscript in metadata.yaml are rendered correctly only when the input file is in .md format, not when it is in .html format.
Please help to solve the problem.

jgm · 2020-11-12T05:23:10Z

This is due to #5272 (commit 0216b68).

Yes, it's parsed as markdown -- but there's a question which extensions are enabled on the markdown.

After the fix to #5272, we use the extensions enabled on the reader. This makes sense, in that if you do -f gfm, then gfm extensions are enabled in the metadata file, and if you do -f markdown, then pandoc extensions are enabled, etc. But when your main input is HTML, it means that no markdown extensions will be enabled, because HTML doesn't take markdown extensions.
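To make the effect described above concrete, here is a toy model in Python. It is not pandoc's parser, and the names `render_inline`, `MARKDOWN_READER_EXTS` and `HTML_READER_EXTS` are made up for illustration. It only assumes what the comment states: emphasis is core Markdown, superscript and subscript are extensions, and the HTML reader brings no Markdown extensions with it.

```python
import re

# Toy model only: real pandoc parsing is far more involved than these regexes.
def render_inline(text, extensions):
    """Render a metadata string to HTML-like markup, honoring only enabled extensions."""
    # Strong and emphasis are core Markdown, so they are always recognized.
    text = re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", text)
    text = re.sub(r"\*(.+?)\*", r"<em>\1</em>", text)
    # Superscript and subscript are extensions, so they depend on the reader.
    if "superscript" in extensions:
        text = re.sub(r"\^(.+?)\^", r"<sup>\1</sup>", text)
    if "subscript" in extensions:
        text = re.sub(r"~(.+?)~", r"<sub>\1</sub>", text)
    return text

# Hypothetical extension sets: the markdown reader enables both extensions,
# while the HTML reader contributes no Markdown extensions at all.
MARKDOWN_READER_EXTS = {"superscript", "subscript"}
HTML_READER_EXTS = set()

authors = ["Author One^1^", "Author Two~2~", "Author Three*3*", "Author Four**4**"]
for name in authors:
    print(render_inline(name, HTML_READER_EXTS), "|",
          render_inline(name, MARKDOWN_READER_EXTS))
```

With the empty extension set, `^1^` and `~2~` pass through untouched while `*3*` and `**4**` are still converted, which matches the output_html.docx result reported above.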

jgm · 2020-11-12T05:23:35Z

I'm not really sure what the best solution is, since the fix to #5272 was well motivated and makes sense in most contexts.

naivenaive · 2020-11-12T23:05:14Z

Thanks for your help.

I think metadata.yaml is parsed as gfm by default, so the superscript (^) and subscript (~) symbols cannot be parsed, while the bold (**) and italic (*) symbols can. Is it possible to make the pandoc markdown parser the default for metadata.yaml instead of the gfm parser?

jgm · 2020-11-12T23:31:54Z

It's possible, and in fact it was done like this before the change noted above.
It was thought useful to allow the extensions to be determined by the user, but I don't think we considered the case where the main input format is not markdown.

naivenaive · 2020-11-16T01:24:47Z

Yes, it would be great to let users determine which parser is used for metadata.yaml. Even though the main input format is not markdown, pandoc performs very well when converting an .html file to a .docx file. There is only a small problem with parsing metadata.yaml.

mb21 · 2020-11-16T09:41:19Z

> I don't think we considered the case where the main input format is not markdown.

Nope, we left that case to solve for a later time... from #1960 (comment):

> We can always add more things later, like:
>
> - parsing .. meta:: in RST or <meta> in HTML (which would act analogous to the current YAML metadata blocks in markdown)
> - adding an additional option that specifies the markup language the metadata is interpreted as (overriding the default which would be set to markdown).

tarleb · 2021-05-05T11:38:13Z

Closing in favor of the more general issue #5914.

jgm · 2022-02-17T17:02:01Z

Reopening because #5914 is rather different. It concerns non-markdown content in metadata files; this concerns markdown content in the metadata file combined with -f html.

...so that, when the input format is not markdown or a markdown variant, pandoc's markdown is used. When the input format is a markdown variant, the same format is used. Reason for the change: it doesn't make sense to run the markdown parser with a set of extensions designed for a non-markdown format, and this dramatically limits what people can do in metadata files. Refines #6832. Closes #7926. Perhaps this can be reconsidered if we come up with a way of specifying an arbitrary format for the metadata file (#5914).
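The rule in that commit message can be sketched as a small selection function. This is an illustration in Python, not pandoc's Haskell code; `MARKDOWN_VARIANTS`, `base_format` and `metadata_format` are hypothetical names, and the set of formats counted as "markdown variants" is my assumption rather than something taken from the commit.

```python
# Assumption: these pandoc format names are the ones treated as markdown variants.
MARKDOWN_VARIANTS = {
    "markdown", "markdown_strict", "markdown_phpextra", "markdown_mmd",
    "gfm", "commonmark", "commonmark_x",
}

def base_format(spec):
    """Strip extension modifiers, e.g. 'markdown-smart' -> 'markdown'."""
    for i, ch in enumerate(spec):
        if ch in "+-":
            return spec[:i]
    return spec

def metadata_format(input_format):
    """Pick the format used to parse string scalars in the metadata file."""
    if base_format(input_format) in MARKDOWN_VARIANTS:
        # Markdown variant: reuse the reader's format, including its extensions.
        return input_format
    # Anything else (html, docx, latex, ...): fall back to pandoc's markdown.
    return "markdown"

for fmt in ["html", "gfm", "markdown-smart", "docx"]:
    print(fmt, "->", metadata_format(fmt))
```

Under this sketch, `-f html` would give the metadata file pandoc's markdown, so `^1^` and `~2~` in the original example would be parsed as superscript and subscript, while `-f gfm` keeps gfm behaviour as intended by #5272.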

tarleb changed the title ~~Fail to parse superscript in metadata.yaml~~ Allow to choose Markdown extensions used to parse metadata-file May 5, 2021

tarleb added enhancement format:Markdown status:more-discussion-needed labels May 5, 2021

tarleb closed this as completed May 5, 2021

fedeinthemix mentioned this issue Aug 5, 2021

Listing crossrefs when converting from LaTeX to HTML lierdakil/pandoc-crossref#319

Open

jgm mentioned this issue Feb 17, 2022

Raw header-includes #7926

Closed

jgm reopened this Feb 17, 2022

jgm closed this as completed Feb 18, 2022
