AstraBert/PdfItDown

PdfItDown

Convert Everything to PDF


Important

markdown-pdf is now implemented internally in PdfItDown, with support for PyMuPDF v1.25.1. Make sure to install the latest version of the package (0.0.4 or later) to avoid errors such as the one in this issue.

PdfItDown is a Python package that relies on markitdown by Microsoft and (a slightly modified version of) markdown_pdf.

Applicability

PdfItDown is applicable to the following file formats:

  • Markdown
  • PowerPoint
  • Word
  • Excel
  • HTML
  • Text-based formats (CSV, XML)
  • ZIP files (iterates over contents)

How does it work?

PdfItDown works in a very simple way:

  • From markdown to PDF
graph LR
2(Input File) --> 3[Markdown content] 
3[Markdown content] --> 4[markdown-pdf]
4[markdown-pdf] --> 5(PDF file)
  • From other text-based file formats to PDF
graph LR
2(Input File) -->  3[markitdown]
3[markitdown] -->  4[Markdown content]
4[Markdown content] --> 5[markdown-pdf]
5[markdown-pdf] --> 6(PDF file)
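The two flows above can be sketched in Python. This is a hypothetical re-creation, not PdfItDown's actual code: `conversion_steps` and `sketch_convert` are names made up for illustration, and the sketch assumes the upstream `markitdown` and `markdown_pdf` packages are installed (PdfItDown itself bundles a slightly modified markdown_pdf).

```python
from pathlib import Path


def conversion_steps(input_file: str) -> list[str]:
    """Name the stages an input goes through, following the two diagrams."""
    if Path(input_file).suffix.lower() == ".md":
        # Markdown input skips markitdown entirely.
        return ["Markdown content", "markdown-pdf", "PDF file"]
    return ["markitdown", "Markdown content", "markdown-pdf", "PDF file"]


def sketch_convert(input_file: str, output_file: str, title: str = "PDF Title") -> str:
    """Hypothetical version of the flow, built on the upstream libraries."""
    # Imported lazily so conversion_steps works without these packages.
    from markdown_pdf import MarkdownPdf, Section

    if "markitdown" in conversion_steps(input_file):
        from markitdown import MarkItDown

        # markitdown turns the source document into Markdown text.
        markdown_text = MarkItDown().convert(input_file).text_content
    else:
        markdown_text = Path(input_file).read_text(encoding="utf-8")

    # Assumption: toc_level=0 is used here to skip building a PDF outline.
    pdf = MarkdownPdf(toc_level=0)
    pdf.add_section(Section(markdown_text))
    pdf.meta["title"] = title
    pdf.save(output_file)
    return output_file
```

Calling `conversion_steps("README.md")` returns the three-stage Markdown route, while any other extension gets the four-stage route that starts with markitdown.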

Installation and Usage

To install PdfItDown, just run:

python3 -m pip install pdfitdown

You can now use the command line tool:

usage: pdfitdown [-h] -i INPUTFILE -o OUTPUTFILE [-t TITLE]

options:
  -h, --help            show this help message and exit
  -i INPUTFILE, --inputfile INPUTFILE
                        Path to the input file that needs to be converted to PDF
  -o OUTPUTFILE, --outputfile OUTPUTFILE
                        Path to the output PDF file
  -t TITLE, --title TITLE
                        Title to include in the PDF metadata. Default: 'PDF Title'

An example usage can be:

pdfitdown -i README.md -o README.pdf -t "README"
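Since the command takes one input file per invocation, converting a whole folder means calling it in a loop. The sketch below does that from Python with `subprocess`. The helper names `build_command` and `convert_folder` are hypothetical, and the use of `check=True` assumes the tool exits with a non-zero status on failure, which the usage text above does not confirm.

```python
import subprocess
from pathlib import Path


def build_command(input_file: Path, output_dir: Path) -> list[str]:
    """Build one pdfitdown invocation using the -i, -o and -t options."""
    output_file = output_dir / (input_file.stem + ".pdf")
    # The file name without its extension doubles as the PDF title.
    return ["pdfitdown", "-i", str(input_file), "-o", str(output_file), "-t", input_file.stem]


def convert_folder(source_dir: str, output_dir: str) -> None:
    """Run pdfitdown once per Markdown file in source_dir (hypothetical helper)."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    for path in sorted(Path(source_dir).glob("*.md")):
        # Assumption: a failed conversion yields a non-zero exit code.
        subprocess.run(build_command(path, out), check=True)
```

For example, `convert_folder("docs", "pdfs")` would write one PDF per Markdown file found in `docs` into `pdfs`.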

Or you can use it inside your Python scripts:

  • To convert a .pptx/.docx/.xlsx/.csv/.json/.xml/.html/.zip file to PDF:
from pdfitdown.pdfconversion import convert_to_pdf

output_pdf = convert_to_pdf(file_path = "BusinessGrowth.xlsx", output_path = "business_growth.pdf", title = "Business Growth")
  • To convert a .md file to PDF:
from pdfitdown.pdfconversion import convert_markdown_to_pdf

output_pdf = convert_markdown_to_pdf(file_path = "BusinessGrowth.md", output_path = "business_growth.pdf", title = "Business Growth")

In these examples, you will find the output PDF under business_growth.pdf.
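If a script handles mixed inputs, a small wrapper can pick between the two functions by file extension. The following is a minimal sketch: `plan_conversion` and `convert_any` are hypothetical helpers, and deriving the output path and title from the input file name is a choice made here for illustration, not something PdfItDown does on its own.

```python
from pathlib import Path


def plan_conversion(file_path: str) -> tuple[str, str, str]:
    """Return (converter name, output path, title) for an input file."""
    source = Path(file_path)
    is_markdown = source.suffix.lower() == ".md"
    converter = "convert_markdown_to_pdf" if is_markdown else "convert_to_pdf"
    # Output sits next to the input, with the extension swapped for .pdf.
    return converter, str(source.with_suffix(".pdf")), source.stem


def convert_any(file_path: str):
    """Hypothetical wrapper around the two functions shown above."""
    # Lazy import: only this function needs pdfitdown installed.
    from pdfitdown import pdfconversion

    converter_name, output_path, title = plan_conversion(file_path)
    converter = getattr(pdfconversion, converter_name)
    return converter(file_path=file_path, output_path=output_path, title=title)
```

With this, `convert_any("BusinessGrowth.xlsx")` would go through `convert_to_pdf` and `convert_any("BusinessGrowth.md")` through `convert_markdown_to_pdf`, both targeting `BusinessGrowth.pdf`.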

Or you can just launch a Gradio-based user interface:

pdfitdown_ui

You will be able to see the application running on http://localhost:7860 within seconds!

Watch the demo here:

Watch the video demo!

Contributing

Contributions are always welcome!

Find the contribution guidelines in CONTRIBUTING.md.

License and Funding

This project is open-source and is provided under an MIT License.

If you found it useful, please consider funding it.