PDF generation #1

Closed
victorklos opened this issue Jan 18, 2019 · 11 comments · Fixed by #20
Labels
enhancement New feature or request
Milestone

Comments

@victorklos

Expected Behavior

A generated PDF document after running pdoc3 --pdf time.

Actual Behavior

usage: pdoc3 [-h] [--version] [--filter STRING] [--html] [--html-dir DIR]
             [--html-no-source] [--overwrite] [--external-links]
             [--template-dir DIR] [--link-prefix STRING] [--http HOST:PORT]
             MODULE [MODULE ...]
pdoc3: error: unrecognized arguments: --pdf

Steps to Reproduce

  1. Read homepage https://pdoc3.github.io/pdoc/
  2. Execute command above

Additional info

  • pdoc version: 0.5.1
@victorklos
Author

BTW I have some experience with pandoc so if you need help please let me know...

@kernc
Member

kernc commented Jan 18, 2019

Thanks. How would you implement PDF generation using pandoc?

@victorklos
Author

The first step would be to generate single-page output (even HTML would be interesting in itself, e.g. if you want to email the documentation). Pandoc needs an input and a template. The input could be said HTML document, or markdown, or whatever it is you currently generate. The template is in LaTeX. Pandoc would then be required on the PATH.

Creating PDF files through LaTeX is a bit of a pain, though, as it depends on TeX Live, which is over 2 GB when installed fully. Maybe offer the option to compile through Docker? Most developers have that installed nowadays, I guess...

Some alternatives are possible:

  • focus on a single-page HTML output and convert it through a browser or local PDF printer
  • write your own PDF output generator, e.g. based on reportlab or pdfkit (not recommended)
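
As a rough illustration of the pandoc route described above, the sketch below only assembles a pandoc command line from an input file, an output file, and an optional LaTeX template. The function name and file names are hypothetical, and it assumes pandoc 2.x (where the --pdf-engine option exists) plus a working LaTeX installation; it is not how pdoc does or will do it.

```python
import subprocess
from typing import List, Optional


def build_pandoc_command(source: str, output: str,
                         template: Optional[str] = None,
                         pdf_engine: Optional[str] = None) -> List[str]:
    """Assemble (but do not run) a pandoc invocation.

    `source` may be HTML or markdown; pandoc infers the format from the
    file extension. All names here are illustrative assumptions.
    """
    cmd = ["pandoc", source, f"--output={output}"]
    if template is not None:
        # Custom LaTeX template, as suggested in the comment above.
        cmd.append(f"--template={template}")
    if pdf_engine is not None:
        # E.g. "xelatex"; requires pandoc 2.x and the engine on PATH.
        cmd.append(f"--pdf-engine={pdf_engine}")
    return cmd


def run_pandoc(source: str, output: str, **options) -> None:
    """Run pandoc; raises CalledProcessError if the conversion fails."""
    subprocess.run(build_pandoc_command(source, output, **options), check=True)
```

Keeping command construction separate from execution makes it possible to inspect the command without having pandoc or TeX Live installed.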

@kernc
Member

kernc commented Jan 18, 2019

I tried using pandoc to convert pdoc's documentation index.html as well as some other HTML. It didn't work. The LaTeX converters seem picky about everything, including non-ASCII characters and referenced SVG images. There are some indications (jgm/pandoc#1793 (comment)) that alternative engines should be preferred for better results. The html5 engine, though, requires wkhtmltopdf, which is itself a largish binary the user would need to install and which might then just as well be used standalone.

Pdoc indeed already contains some provisions for printing:

<style media="print">${css.print()}</style>

<%def name="print()" filter="minify_css">
  @media print {
    #sidebar h1 {
      page-break-before: always;
    }
    .source {
      display: none;
    }
  }
  @media print {
    * {
      background: transparent !important;
      color: #000 !important; /* Black prints faster: h5bp.com/s */
      box-shadow: none !important;
      text-shadow: none !important;
    }
    a,
    a:visited {
      text-decoration: underline;
    }
    a[href]:after {
      content: " (" attr(href) ")";
    }
    abbr[title]:after {
      content: " (" attr(title) ")";
    }
    /*
     * Don't show links for images, or javascript/internal links
     */
    .ir a:after,
    a[href^="javascript:"]:after,
    a[href^="#"]:after {
      content: "";
    }
    pre,
    blockquote {
      border: 1px solid #999;
      page-break-inside: avoid;
    }
    thead {
      display: table-header-group; /* h5bp.com/t */
    }
    tr,
    img {
      page-break-inside: avoid;
    }
    img {
      max-width: 100% !important;
    }
    @page {
      margin: 0.5cm;
    }
    p,
    h2,
    h3 {
      orphans: 3;
      widows: 3;
    }
    h1,
    h2,
    h3,
    h4,
    h5,
    h6 {
      page-break-after: avoid;
    }
  }
</%def>

If we could somehow leverage the common web browsers, some instance of which exists in almost all environments, that'd be great! I was looking into Selenium / WebDriver API and whether it supports printing to file, but it appears this is not the (common) case.

I found this Reddit thread comparing several possible approaches, and of those listed, I'd most prefer running headless Chrome:

chromium --headless --disable-gpu --print-to-pdf=output.pdf input.html
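
A minimal sketch of how that command could be driven from Python is shown below. The flags are the ones from the command above; the helper names and the list of candidate executable names are assumptions for illustration, since the browser binary is named differently across platforms.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List, Optional, Sequence


def build_print_command(browser: str, html: Path, pdf: Path) -> List[str]:
    """Return the argv for a headless print-to-PDF run (nothing is executed)."""
    return [
        browser,
        "--headless",
        "--disable-gpu",
        f"--print-to-pdf={pdf}",
        str(html),
    ]


def find_browser(
    candidates: Sequence[str] = ("chromium", "chromium-browser", "google-chrome"),
) -> Optional[str]:
    """Return the first candidate found on PATH, or None.

    The candidate names are assumptions; adjust them per platform.
    """
    for name in candidates:
        found = shutil.which(name)
        if found:
            return found
    return None


def html_to_pdf(html: Path, pdf: Path) -> None:
    """Print `html` to `pdf` with whichever Chromium-based browser is found."""
    browser = find_browser()
    if browser is None:
        raise RuntimeError("no Chromium-based browser found on PATH")
    subprocess.run(build_print_command(browser, html, pdf), check=True)
```

With this approach the print stylesheet quoted earlier is what shapes the PDF, so no separate template is needed.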

Second to that, maybe WeasyPrint, but it has a list of dependencies that may not be so easy to support in all environments (e.g. Windows without a C/C++ compiler).

@kernc kernc changed the title from "PDF generation is mentioned in docs but seems unavailable" to "PDF generation" Jan 18, 2019
@victorklos
Author

I like the chromium route best. How much work would it be to generate a single HTML file? The current index file can become the TOC, the rest chapters, all links internal.

@kernc
Member

kernc commented Jan 18, 2019

Currently, every module is rendered and written out separately:

pdoc/pdoc/cli.py

Lines 253 to 275 in e7868e2

def write_html_files(m: pdoc.Module):
    f = module_html_path(m)
    dirpath = path.dirname(f)
    if not os.access(dirpath, os.R_OK):
        os.makedirs(dirpath)
    try:
        with open(f, 'w+', encoding='utf-8') as w:
            w.write(m.html(
                external_links=args.external_links,
                link_prefix=args.link_prefix,
                source=not args.html_no_source,
            ))
    except Exception:
        try:
            os.unlink(f)
        except Exception:
            pass
        raise
    for submodule in m.submodules():
        write_html_files(submodule)

So not too much, but it would certainly require a new Mako template that handles a list of modules and in which all pdoc.Doc.url() calls are trimmed to URL fragments. Then, I guess, a new command line switch can be added.
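
The two ingredients mentioned here, collecting all modules into one list and trimming URLs to fragments, could look roughly like the sketch below. It relies only on the submodules() method seen in the excerpt above; FakeModule, iter_modules, and to_fragment are hypothetical names introduced for illustration and are not part of pdoc.

```python
from typing import Iterable, Iterator


class FakeModule:
    """Stand-in for pdoc.Module; only name and submodules() are modelled."""

    def __init__(self, name: str, children: Iterable["FakeModule"] = ()):
        self.name = name
        self._children = list(children)

    def submodules(self):
        return self._children


def iter_modules(module) -> Iterator:
    """Yield `module`, then all of its submodules, depth-first."""
    yield module
    for sub in module.submodules():
        yield from iter_modules(sub)


def to_fragment(url: str) -> str:
    """Reduce 'pkg/mod.html#pkg.mod.func' to '#pkg.mod.func'.

    Returns '' when the URL has no fragment; a single-page template
    would need to supply its own anchor for such links.
    """
    _, sep, fragment = url.partition("#")
    return "#" + fragment if sep else ""


tree = FakeModule("pkg", [FakeModule("pkg.a", [FakeModule("pkg.a.x")]),
                          FakeModule("pkg.b")])
flat = [m.name for m in iter_modules(tree)]
```

A single-page template could then loop over the flat list instead of writing one file per module, as write_html_files() does.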

Is this something you would care to work on?

@kernc
Member

kernc commented Jan 18, 2019

Or, probably better yet, adapting the existing HTML template. It already handles a list of modules, in a way, when pdoc is run as an --http web server:

<%def name="show_module_list(modules)">
  <h1>Python module list</h1>
  % if not modules:
    <p>No modules found.</p>
  % else:
    <dl id="http-server-module-list">
    % for name, desc in modules:
      <div class="flex">
        <dt><a href="${link_prefix}${name}">${name}</a></dt>
        <dd>${desc | glimpse, to_html}</dd>
      </div>
    % endfor
    </dl>
  % endif
</%def>

@kernc kernc added this to the 0.6.0 milestone Jan 24, 2019
@kernc kernc added the enhancement New feature or request label Jan 27, 2019
@kernc
Member

kernc commented Feb 3, 2019

@victorklos I made some progress in #20 using markdown and pandoc. If you're still interested, please have a look.

@victorklos
Author

Great! I will, but this weekend at the earliest...

@victorklos
Author

The generated PDF already looks great though! Maybe add some pdoc3 styling later on.

@kernc
Member

kernc commented Feb 6, 2019

It now prints straight to markdown, so any styling would need to be overridden at the pandoc/LaTeX level, or a new set of CSS would need to be written and included as raw HTML for conversion through intermediate HTML.

@kernc kernc closed this as completed in #20 Apr 22, 2019

2 participants