More robust pandoc conversion #16

Open
marph91 opened this issue Oct 10, 2024 · 1 comment
Labels
enhancement New feature or request

Comments


marph91 commented Oct 10, 2024

This affects all formats that are converted with pandoc in the background. Tables are converted to Markdown pipe tables only if they are simple tables. Lists aren't always converted either.

Examples:

Pandoc supports filters written in Lua, which are fast and manipulate the AST directly. Such a filter would eliminate the need for a separate preprocessing step and would apply to all formats (e.g. HTML and DOCX). The filter should:

  • remove all divs (and other elements) from table cells
  • convert lists to - [ ] ... + newline
  • convert newlines to an intermediate character (sequence) and replace it in postprocessing
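
The third point (an intermediate character sequence that is swapped back in postprocessing) can be sketched in Python without any Lua. This is only an illustration under assumptions: the marker `TEMPNEWLINEMARKER` and both function names are hypothetical and not taken from this repo, and whether a given marker passes through pandoc unescaped would need to be checked.

```python
import re

# Hypothetical marker. It is purely alphanumeric on the assumption that
# such a string is less likely to be escaped or altered during conversion.
NEWLINE_MARKER = "TEMPNEWLINEMARKER"

# Matches <br>, <br/>, <br /> in any letter case.
BR_TAG = re.compile(r"<br\s*/?>", re.IGNORECASE)


def protect_line_breaks(html: str) -> str:
    """Preprocessing: replace HTML line breaks by the marker."""
    return BR_TAG.sub(NEWLINE_MARKER, html)


def restore_line_breaks(markdown: str) -> str:
    """Postprocessing: turn the marker back into <br>, which keeps
    a multi-line cell on a single pipe table row."""
    return markdown.replace(NEWLINE_MARKER, "<br>")


if __name__ == "__main__":
    protected = protect_line_breaks("<td>line 1<br/>line 2</td>")
    print(protected)
    # The pandoc conversion would run here; it is omitted in this sketch.
    print(restore_line_breaks("| line 1" + NEWLINE_MARKER + "line 2 |"))
```

The marker only has to be unlikely to occur in real note content; the same round trip would work for any other construct that breaks pipe tables.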

marph91 commented Jan 4, 2025

A first attempt to migrate to Lua filters wasn't successful; see the linked PR. For now, the extra preprocessing step is still needed.
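
For illustration, a Python preprocessing step of this kind could drop `<div>` wrappers inside table cells before the HTML is handed to pandoc. The following is a minimal sketch using only the standard library; the class and function names are hypothetical and this is not the code used in this repo.

```python
from html.parser import HTMLParser


class CellDivStripper(HTMLParser):
    """Re-emit HTML as is, except that <div> tags inside <td>/<th> are dropped."""

    def __init__(self):
        # Keep entities like &amp; untouched instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.out = []
        self.cell_depth = 0  # > 0 while inside a table cell

    def _is_cell_div(self, tag):
        return tag == "div" and self.cell_depth > 0

    def handle_starttag(self, tag, attrs):
        if tag in ("td", "th"):
            self.cell_depth += 1
        if not self._is_cell_div(tag):
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        if not self._is_cell_div(tag):
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if not self._is_cell_div(tag):
            self.out.append(f"</{tag}>")
        if tag in ("td", "th"):
            self.cell_depth = max(0, self.cell_depth - 1)

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def strip_divs_in_cells(html: str) -> str:
    """Return the HTML with all <div> wrappers removed from table cells."""
    parser = CellDivStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

Note that adjacent divs in one cell are simply merged in this sketch. Keeping them on separate lines would require emitting a line break or an intermediate marker in place of the closing tag.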

Advantages of Lua filters:

  • Fast
  • Only one filter path (it doesn't matter if the source was a file or text and which format)

Disadvantages of Lua filters:

  • Would introduce a second scripting language to this repo
  • Some modifications are needed directly at the source. This would introduce a split between Python and Lua filters
  • Learning curve of Lua
