What is PdfItDown?
PdfItDown is a Python package that relies on markitdown by Microsoft, markdown_pdf and img2pdf to convert text-based files, images and unstructured documents to PDF. PdfItDown supports the following file formats:
- Markdown
- PowerPoint
- Word
- Excel
- HTML
- Text-based formats (CSV, XML, JSON)
- ZIP files (iterates over contents)
- Image files (PNG, JPG)
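When feeding a batch of files into a conversion pipeline, it can help to sort out which inputs fall into the formats listed above before attempting conversion. The snippet below is a minimal sketch of such a pre-check. It is not part of PdfItDown: the helper names `format_family` and `split_supported` are hypothetical, and the mapping from file extensions to formats is an assumption based on the most common extension for each format.

```python
from pathlib import Path
from typing import Iterable, List, Optional, Tuple

# Assumed extension-to-format mapping derived from the list above.
# This is an illustration, not PdfItDown's own lookup table.
SUPPORTED_EXTENSIONS = {
    ".md": "Markdown",
    ".pptx": "PowerPoint",
    ".docx": "Word",
    ".xlsx": "Excel",
    ".html": "HTML",
    ".csv": "Text-based",
    ".xml": "Text-based",
    ".json": "Text-based",
    ".zip": "ZIP",
    ".png": "Image",
    ".jpg": "Image",
}


def format_family(path: str) -> Optional[str]:
    """Return the format family for a path, or None if the extension is not listed."""
    return SUPPORTED_EXTENSIONS.get(Path(path).suffix.lower())


def split_supported(paths: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split paths into (supported, skipped) based on their extension only."""
    supported: List[str] = []
    skipped: List[str] = []
    for path in paths:
        if format_family(path) is not None:
            supported.append(path)
        else:
            skipped.append(path)
    return supported, skipped


if __name__ == "__main__":
    ok, rest = split_supported(["slides.pptx", "photo.JPG", "archive.tar"])
    print("convertible:", ok)
    print("skipped:", rest)
```

The check looks only at the file extension, so it says nothing about whether the file contents are valid for that format.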
Setting up
To set up PdfItDown, it is good practice to create an isolated development environment.
Choose how to use it
Once you have PdfItDown set up, you can choose how to use it:
In Python Scripts
Integrate PdfItDown into your own pipelines by calling it from your code!
In the CLI
If you’re a terminal lover, this is perfect for you :)
As an API endpoint
Deploy PdfItDown as an API endpoint in your Starlette-compatible backend server.