
LaTeX writer: embedding pdf and video files #7181

Open
MinmoTech opened this issue Mar 28, 2021 · 14 comments

Comments

@MinmoTech

Hey! :)
I'm trying some things out to create a workflow with pandoc markdown that is similar to org-noter.
This means I want to reference locations inside PDFs (and later video files as well) from within my documents.
This is already somewhat possible by using raw HTML and LaTeX directly inside Markdown:

---
header-includes: |
    \usepackage{pdfpages}
---

# Test title

Have some test text

```{=html}
<object data="loremipsum.pdf#page=2" type="application/pdf" width="700px" height="700px">
    <embed src="loremipsum.pdf#page=2">
        <p>Referenced pages 2-3: <a href="loremipsum.pdf">Download PDF</a>.</p>
    </embed>
</object>
```

```{=latex}
\includepdf[pages={2-3}]{loremipsum.pdf}
```

Converting this Markdown document to PDF embeds pages 2-3 of loremipsum.pdf into the generated PDF (attached: output.pdf).

The generated HTML embeds the complete PDF, but opens it at page 2.

output.html:

<h1 id="test-title">Test title</h1>
<p>Have some test text</p>
<object data="loremipsum.pdf#page=2" type="application/pdf" width="700px" height="700px">
    <embed src="loremipsum.pdf#page=2">
        <p>Referenced pages 2-3: <a href="loremipsum.pdf">Download PDF</a>.</p>
    </embed>
</object>

(Attachment: loremipsum.pdf)

The paragraph <p>Referenced pages 2-3: <a href="loremipsum.pdf">Download PDF</a>.</p> is only displayed if the browser does not support embedded PDFs.

Now I would like some syntax in pandoc Markdown that generates the appropriate block for each output format.
I imagine it looking something like this:
!embed[pdf-title](loremipsum.pdf:2-3)
or anything similar.
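Until such syntax exists, one stopgap is to generate the two raw blocks above with a small script. The following is a minimal POSIX shell sketch; the function name `embed_pdf_blocks` and its FILE/FIRST/LAST arguments are hypothetical and not part of pandoc. It simply reproduces the `{=html}` and `{=latex}` blocks from the example (the LaTeX side still needs `\usepackage{pdfpages}` in `header-includes`).

```shell
# Hypothetical helper, not a pandoc feature: print the raw HTML and raw LaTeX
# blocks from the example above for one PDF and one page range.
# Usage: embed_pdf_blocks FILE FIRST LAST
embed_pdf_blocks() {
  file=$1
  first=$2
  last=$3
  fence='```'   # Markdown fence that opens/closes each raw block
  # HTML: open the PDF at the first page; the <p> is only a fallback.
  printf '%s{=html}\n' "$fence"
  printf '<object data="%s#page=%s" type="application/pdf" width="700px" height="700px">\n' "$file" "$first"
  printf '    <embed src="%s#page=%s">\n' "$file" "$first"
  printf '        <p>Referenced pages %s-%s: <a href="%s">Download PDF</a>.</p>\n' "$first" "$last" "$file"
  printf '    </embed>\n'
  printf '</object>\n'
  printf '%s\n\n' "$fence"
  # LaTeX: \includepdf from pdfpages inserts the whole page range.
  printf '%s{=latex}\n' "$fence"
  printf '\\includepdf[pages={%s-%s}]{%s}\n' "$first" "$last" "$file"
  printf '%s\n' "$fence"
}
```

For example, `embed_pdf_blocks loremipsum.pdf 2 3 >> notes.md` would append both blocks for pages 2-3 to a (hypothetical) notes.md, which pandoc then renders with whichever block matches the output format.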

A similar thing might be possible for embedding video files in html:

<object data="project.mp4#t=66,71" type="video/mp4">
    <embed src="project.mp4#t=66,71">
        <p>Referenced time: 66-71<a href="project.mp4">Download mp4</a>.</p>
    </embed>
</object>

And you might even be able to include these in pdf, but that might be undesirable.
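The `#t=66,71` suffix in the HTML above is the temporal syntax from the W3C Media Fragments URI spec (start and end, in seconds). As a minimal sketch, assuming a plain HTML5 `<video>` element is acceptable instead of `<object>`, the hypothetical shell helper `video_clip` below builds such an element from a file name and a time range.

```shell
# Hypothetical helper: print an HTML5 <video> element whose src carries a
# W3C Media Fragments temporal range (#t=START,END, in seconds).
# Usage: video_clip FILE START END
video_clip() {
  file=$1
  start=$2
  end=$3
  src="$file#t=$start,$end"   # '#' inside quotes is literal, not a comment
  # The link is a fallback for browsers that cannot play the file.
  printf '<video src="%s" controls=""><a href="%s">Referenced time: %s-%s</a></video>\n' \
    "$src" "$file" "$start" "$end"
}
```

`video_clip project.mp4 66 71` prints `<video src="project.mp4#t=66,71" controls=""><a href="project.mp4">Referenced time: 66-71</a></video>`.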

If this is a stupid idea, feel free to close this issue 👋

@mb21
Collaborator

mb21 commented Mar 29, 2021

For html output, this already works (perhaps this could be better documented):

~ echo '![](foo.mov)' | pandoc
<p><video src="foo.mov" controls=""><a href="foo.mov">Video</a></video></p>

~ echo '![](foo.pdf)' | pandoc
<p><embed src="foo.pdf" /></p>

But not for LaTeX/PDF output AFAIK...

@mb21 mb21 changed the title Feature request for pandoc markdown: embedding pdf and video files LaTeX writer: embedding pdf and video files Mar 29, 2021
@MinmoTech
Author

Ohh that's awesome, but can you reference the pages/timestamps?

@MinmoTech
Author

After trying it out, it seems like it does!

echo '![](foo.pdf#page=2-3)' | pandoc 
<p><embed src="foo.pdf#page=2-3" /></p>

This still opens the pdf at the correct page.

@Polirecyliente

@juligreen you said:

And you might even be able to include these in pdf, but that might be undesirable.

This is not undesirable; quite the contrary, embedding videos in PDF would be useful, just like embedding images in PDF.

@MinmoTech
Author

MinmoTech commented Apr 9, 2021

Though regarding video files: while it works correctly with ".mov" and ".mp4" files, ".mkv" files get incorrectly recognized as images:

~ echo '![](foo.mkv)' | pandoc
<p><img src="foo.mkv" /></p>

@mb21
Collaborator

mb21 commented Apr 10, 2021

yeah, I guess we could add mkv here... and/or make a pull request to happstack, where we get that list from.

As for adding video support to the default LaTeX template: if it requires uncommon packages, we probably would not want to do that, since you can simply use a custom template if you need it.

@heygent

heygent commented Mar 29, 2022

Right now, I am able to embed the first page of a PDF with the image syntax, but I cannot specify the page. I think an easy way to implement PDF page embedding would be to pass the page attribute as-is to the \includegraphics command, similar to how the width and height attributes are handled.

https://pandoc.org/MANUAL.html#extension-link_attributes

So that a specific pdf page could be embedded this way:

![Global frog population as of 1943.](slides.pdf){ page=13 }

Right now I can achieve what I want by writing the raw LaTeX command directly, but that comes at the cost of simple captioning. Also, the graphicx package isn't loaded unless the ![]() syntax is used somewhere, so for this to work you have to either put an image elsewhere in the document or load the package manually:

\includegraphics[page=6]{../slide/Lecture03ns02.pdf}
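The proposed mapping can be sketched as follows. This is a hypothetical POSIX shell illustration, not pandoc's LaTeX writer: `includegraphics_for` and its positional PAGE/WIDTH arguments are made up for this example. It only shows how a `page` attribute could be passed through as an `\includegraphics` option alongside `width`.

```shell
# Hypothetical sketch of the proposed mapping (not pandoc's actual writer):
# turn a PDF path plus optional page/width attributes into an
# \includegraphics call.
# Usage: includegraphics_for FILE [PAGE] [WIDTH]
includegraphics_for() {
  file=$1
  page=${2:-}
  width=${3:-}
  opts=""
  # Collect key=value options, comma-separated, skipping empty ones.
  if [ -n "$page" ]; then opts="page=$page"; fi
  if [ -n "$width" ]; then opts="${opts:+$opts,}width=$width"; fi
  if [ -n "$opts" ]; then
    printf '\\includegraphics[%s]{%s}\n' "$opts" "$file"
  else
    printf '\\includegraphics{%s}\n' "$file"
  fi
}
```

`includegraphics_for slides.pdf 13` prints `\includegraphics[page=13]{slides.pdf}`. As noted above, if such a line is used as raw LaTeX, graphicx must be loaded (e.g. `\usepackage{graphicx}` in `header-includes`) unless an image elsewhere triggers it.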

@jgm
Owner

jgm commented Mar 29, 2022

So you can specify page=6 and \includegraphics will use that page of the PDF? I didn't know that. Yes, I think it would be straightforward to handle the page attribute this way when it is present.

jgm added a commit that referenced this issue Mar 29, 2022
These are actually supported by `\includegraphics`, though
this is not well documented. See
https://tex.stackexchange.com/questions/7938/pdflatex-includegraphics-and-multi-page-pdf-files

Partially addresses #7181.
@MinmoTech
Author

Just a heads-up: I think the MIME type of mkv is still not set correctly, as mentioned here: #7181 (comment)

@jgm
Copy link
Owner

jgm commented Mar 30, 2022

What should the mime type of .mkv be?

@mb21
Collaborator

mb21 commented Mar 30, 2022

@jgm Wikipedia says video/x-matroska, although browser support for this MIME type used to be suboptimal... can somebody in this thread try again with today's browsers?

@MinmoTech
Author

So the answer to that is not super easy.
This page shows container-browser compatibility: https://jellyfin.org/docs/general/clients/codec-support.html#container-compatibilityhttpsdevelopermozillaorgen-usdocswebmediaformatscontainers

Firefox and Safari are currently the only browsers without mkv support (though I'm not even certain about Safari).
I created an html file with the following contents:

<p><video src="test.mkv" controls=""><a href="test.mkv">Video</a></video></p>

This plays without problems in Chromium-based browsers, but Firefox shows the following:

[Screenshot of the Firefox result]

But it doesn't end there, since mkv can hold any kind of data, which is especially problematic with some video and audio codecs that don't play at all in most browsers. Streams in supported codecs will play and the others won't, so you can end up with audio playing without video, for example.

Btw, explicitly adding type="video/webm" as the MIME type on the HTML element didn't change anything in my testing, but it might in some rare edge cases.

Even with all my ramblings here, I think users should still be given the option to embed mkv files if they know the files are compatible (and I pray for Firefox support in the meantime).

What I actually want is for echo '![](foo.mkv)' | pandoc to output <p><video src="foo.mkv" controls=""><a href="foo.mkv">Video</a></video></p> instead of <p><img src="foo.mkv" /></p>, as it currently does.

@mb21
Collaborator

mb21 commented Mar 30, 2022

Thanks for the investigation! Adding video/x-matroska as the MIME type for .mkv in the pandoc source code will cause pandoc to output <video src="foo.mkv"..., so we should probably just do that.
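A minimal sketch of the extension → MIME type → HTML element dispatch discussed here, written in POSIX shell for illustration only. pandoc itself is written in Haskell and uses its own MIME table; the function names `mime_for` and `html_for` and the short extension list are assumptions. The output strings mirror the ones quoted earlier in this thread.

```shell
# Hypothetical sketch, not pandoc's code: map a file extension to a MIME type.
mime_for() {
  case "$1" in
    *.mp4) echo "video/mp4" ;;
    *.mov) echo "video/quicktime" ;;
    *.mkv) echo "video/x-matroska" ;;   # the entry proposed in this thread
    *.pdf) echo "application/pdf" ;;
    *) echo "" ;;                       # no entry: falls through to <img>
  esac
}

# Choose the HTML element from the MIME type, as in the outputs quoted above.
html_for() {
  src=$1
  mime=$(mime_for "$src")
  case "$mime" in
    video/*) printf '<p><video src="%s" controls=""><a href="%s">Video</a></video></p>\n' "$src" "$src" ;;
    application/pdf) printf '<p><embed src="%s" /></p>\n' "$src" ;;
    *) printf '<p><img src="%s" /></p>\n' "$src" ;;
  esac
}
```

With the `*.mkv` line removed, `html_for foo.mkv` falls through to `<p><img src="foo.mkv" /></p>`, which is the behaviour reported above; with it, the file is routed to `<video>`.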

jgm added a commit that referenced this issue Mar 30, 2022
@marcindulak

Though regarding video files: while it works correctly with ".mov" and ".mp4" files, ".mkv" files get incorrectly recognized as images:

~ echo '![](foo.mkv)' | pandoc
<p><img src="foo.mkv" /></p>

From the discussion above, it appears that including video in PDFs is supported.
Do I understand it correctly that the goal is to insert the video itself, not just a "screenshot" of it?
I'm trying the following, but getting an error:

grep PRETTY /etc/*release
/etc/os-release:PRETTY_NAME="Ubuntu 20.04.5 LTS"

wget https://github.com/jgm/pandoc/releases/download/2.19.2/pandoc-2.19.2-linux-amd64.tar.gz
tar zxf pandoc-2.19.2-linux-amd64.tar.gz

# Download a CC BY licensed video: https://www.eso.org/public/videos/eso2207a/
wget https://cdn.eso.org/videos/medium_podcast/eso2207a.mp4
file eso2207a.mp4 
eso2207a.mp4: ISO Media, Apple iTunes Video (.M4V) Video

echo '![](eso2207a.mp4)' > test.md

./pandoc-2.19.2/bin/pandoc test.md -o test.pdf
[WARNING] Could not convert image /tmp/tex2pdf.-eabaa3f94a509656/eso2207a.mp4: Cannot load file
  Jpeg Invalid marker used
  PNG Invalid PNG file, signature broken
  Bitmap Invalid Bitmap magic identifier
  GIF Invalid Gif signature : ft
  HDR Invalid radiance file signature
  Tiff Invalid endian tag value
  TGA Width is null or negative
Error producing PDF.
! LaTeX Error: Unknown graphics extension: .mp4.

See the LaTeX manual or LaTeX Companion for explanation.
Type  H <return>  for immediate help.
 ...                                              
                                                  
l.68 ...mp/tex2pdf.-eabaa3f94a509656/eso2207a.mp4}
