Allow to choose Markdown extensions used to parse metadata-file #6832

Closed
naivenaive opened this issue Nov 12, 2020 · 8 comments

Comments

@naivenaive

I tried to use pandoc to convert an HTML file to docx, with a YAML file providing the metadata. According to the description of --metadata-file, 'string scalars in the YAML file will always be parsed as Markdown.' However, superscripts and subscripts are not parsed correctly.

Here is an example. (pandoc 2.10.1, windows-x86-64)
metadata.yaml:
`---
title: "I am title"

author:
- Author One^1^
- Author Two~2~
- Author Three*3*
- Author Four**4**
---`

input.html:
'<p>I am input content</p>'

input.md:
'I am input content'

After calling pandoc --metadata-file metadata.yaml input.html -o output_html.docx, I get the following output_html.docx file.
'I am title
Author One^1^ //Incorrect
Author Two~2~ //Incorrect
Author Three3
Author Four4
I am input content
'
After calling pandoc --metadata-file metadata.yaml input.md -o output_md.docx, I get the following output_md.docx file.
'I am title
Author One1
Author Two2
Author Three3
Author Four4
I am input content
'
The emphasis markers (*, **) are parsed correctly in both cases. However, the superscript and subscript markup in metadata.yaml is handled and displayed correctly only when the input is a .md file, not when the input is an .html file.
Please help to solve the problem.

@jgm
Owner

jgm commented Nov 12, 2020

This is due to #5272 (commit 0216b68).

Yes, it's parsed as markdown -- but there's a question which extensions are enabled on the markdown.

After the fix to #5272, we use the extensions enabled on the reader. This makes sense, in that if you do -f gfm, then gfm extensions are enabled in the metadata file, and if you do -f markdown, then pandoc extensions are enabled, etc. But when your main input is HTML, it means that no markdown extensions will be enabled, because HTML doesn't take markdown extensions.
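
To make the behaviour described above concrete, here is a minimal toy model in Python. It is not pandoc's code: the `READER_EXTENSIONS` table, `render_inline`, and `render_metadata` are hypothetical names, and the extension sets are an illustrative subset rather than pandoc's real defaults. The only point is that emphasis is core markdown syntax, while superscript and subscript apply only when the corresponding extension is enabled for the reader.

```python
import re

# Illustrative subset only (assumption): which of the two extensions relevant
# to this issue each reader format is taken to enable. Not pandoc's real table.
READER_EXTENSIONS = {
    "markdown": {"superscript", "subscript"},
    "gfm": set(),
    "html": set(),  # HTML takes no markdown extensions
}


def render_inline(text, extensions):
    """Toy inline renderer: emphasis always works, super/subscript are opt-in."""
    if "superscript" in extensions:
        text = re.sub(r"\^([^^\s]+)\^", r"<sup>\1</sup>", text)
    if "subscript" in extensions:
        text = re.sub(r"~([^~\s]+)~", r"<sub>\1</sub>", text)
    # Strong must be handled before plain emphasis so ** is not read as two *.
    text = re.sub(r"\*\*([^*]+)\*\*", r"<strong>\1</strong>", text)
    text = re.sub(r"\*([^*]+)\*", r"<em>\1</em>", text)
    return text


def render_metadata(text, reader_format):
    """Model of the post-#5272 rule: metadata uses the reader's extensions."""
    return render_inline(text, READER_EXTENSIONS[reader_format])
```

With this model, `render_metadata("Author One^1^", "html")` leaves the carets untouched, while the same call with `"markdown"` produces a superscript, matching the two .docx outputs in the report.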

@jgm
Owner

jgm commented Nov 12, 2020

I'm not really sure what the best solution is, since the fix to #5272 was well motivated and makes sense in most contexts.

@naivenaive
Author

Thanks for your help.

I think metadata.yaml is parsed as gfm by default, so the superscript (^) and subscript (~) markers cannot be parsed, while the bold (**) and italic (*) markers can. Is it possible to set the default parser for metadata.yaml to the pandoc markdown parser instead of the gfm parser?

@jgm
Owner

jgm commented Nov 12, 2020

It's possible, and in fact it was done like this before the change noted above.
It was thought useful to allow the extensions to be determined by the user, but I don't think we considered the case where the main input format is not markdown.

@naivenaive
Author

Yes, it would be great to let users determine which parser is used for metadata.yaml. Even though the main input format is not markdown, pandoc performs very well in converting .html files to .docx files. There is only a small problem in parsing metadata.yaml.

@mb21
Collaborator

mb21 commented Nov 16, 2020

> I don't think we considered the case where the main input format is not markdown.

Nope, we left that case to solve for a later time... from #1960 (comment):

We can always add more things later, like:

  • parsing .. meta:: in RST or <meta> in HTML (which would act analogous to the current YAML metadata blocks in markdown)
  • adding an additional option that specifies the markup language the metadata is interpreted as (overriding the default which would be set to markdown).

tarleb changed the title from "Fail to parse superscript in metadata.yaml" to "Allow to choose Markdown extensions used to parse metadata-file" on May 5, 2021
@tarleb
Collaborator

tarleb commented May 5, 2021

Closing in favor of the more general issue #5914.

@jgm
Owner

jgm commented Feb 17, 2022

Reopening because #5914 is rather different. It concerns non-markdown content in metadata files; this concerns markdown content in the metadata file combined with -f html.

jgm closed this as completed Feb 18, 2022
jgm added a commit that referenced this issue Feb 18, 2022
...so that, when the input format is not markdown or a markdown
variant, pandoc's markdown is used.  When the input format is
a markdown variant, the same format is used.  Reason for the change:
it doesn't make sense to run the markdown parser with a set of
extensions designed for a non-markdown format, and this dramatically
limits what people can do in metadata files.

Refines #6832.  Closes #7926.

Perhaps this can be reconsidered if we come up with a way
of specifying an arbitrary format for the metadata file (#5914).
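
The rule in the commit message above can be sketched as follows. This is a hedged illustration in Python, not the actual change: `metadata_format` is a hypothetical helper, and the `MARKDOWN_VARIANTS` set is an assumption about which format names count as "markdown or a markdown variant".

```python
import re

# Assumption: the format names treated as markdown variants. The commit
# message does not list them; these are pandoc format names that exist.
MARKDOWN_VARIANTS = {
    "markdown",
    "markdown_strict",
    "markdown_phpextra",
    "markdown_mmd",
    "gfm",
    "commonmark",
    "commonmark_x",
}


def metadata_format(input_format):
    """Pick the format used to parse string scalars in a metadata file.

    A markdown variant is reused as given, including any +ext/-ext
    modifiers; any other input format falls back to pandoc's markdown.
    """
    # Strip extension modifiers such as "+smart" or "-superscript".
    base = re.split(r"[+-]", input_format, maxsplit=1)[0]
    if base in MARKDOWN_VARIANTS:
        return input_format
    return "markdown"
```

Under this sketch, `-f html` maps to `markdown`, so `^1^` and `~2~` in metadata.yaml would be read as superscript and subscript again, while `-f gfm` keeps gfm's own extensions.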