PdfItDown is designed to be flexible and highly customizable, while also providing a ready-to-use solution that you can employ in your code immediately without any modifications.
Additionally, you can easily mount PdfItDown into a Starlette-based server, allowing you to go from local development to deployment in just a few lines of code!
Let’s explore three levels of usage:
Once you are comfortable with the basics, you may want more control within your application. For this, PdfItDown offers custom callbacks for conversion.
Instead of using the default callback, which can convert almost any file format to PDF, you can provide your own logic to the Converter class, focusing on specific document types.
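The conversion_callback must follow a specific function signature. As the examples below show, it takes the input file path, the output file path, an optional title, and an overwrite flag: (input_file: str, output_file: str, title: str | None = None, overwrite: bool = True).
While parameter names can vary, the order must remain the same.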
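As a minimal sketch of a callback with this shape, the snippet below uses only the standard library: it accepts files that are already PDFs and copies them to the output path. The name `passthrough_callback` is hypothetical, and the behaviour for `overwrite=False` (refusing to replace an existing file) is an assumption made for illustration rather than something this page specifies. The parameter order mirrors the examples that follow.

```python
import shutil
from pathlib import Path
from typing import Optional


def passthrough_callback(
    input_file: str,
    output_file: str,
    title: Optional[str] = None,
    overwrite: bool = True,
) -> str:
    """Hypothetical callback: copy an existing PDF to the output path."""
    source = Path(input_file)
    target = Path(output_file)
    if source.suffix.lower() != ".pdf":
        raise ValueError(f"{input_file} is not a PDF file")
    # Assumption: overwrite=False means "never replace an existing output".
    if target.exists() and not overwrite:
        raise FileExistsError(f"{output_file} already exists")
    shutil.copyfile(source, target)
    # `title` is accepted to keep the expected parameter order; a plain copy does not use it.
    return output_file
```

A function like this could then be passed as `Converter(conversion_callback=passthrough_callback)`, in the same way as the callbacks in the examples.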
Here are a couple of examples of what you can do:
```python
# Example 1: summarize every document in a directory with Gemini
# and save each summary as a PDF
from pathlib import Path

from google import genai
from markdown_pdf import MarkdownPdf, Section
from pdfitdown.pdfconversion import Converter

client = genai.Client()


def conversion_callback(
    input_file: str,
    output_file: str,
    title: str | None = None,
    overwrite: bool = True,
):
    # Upload the source document so the model can read it
    uploaded_file = client.files.upload(file=Path(input_file))
    response = client.models.generate_content(
        model="gemini-2.0-flash",
        contents=[
            "Based on the attached documentation piece, please provide a summary "
            "for educational purposes that can be used as material for our "
            "developer community to grow and learn",
            uploaded_file,
        ],
    )
    content = response.text
    # Render the Markdown summary into a single-section PDF
    pdf = MarkdownPdf(toc_level=0)
    pdf.add_section(Section(content))
    pdf.meta["title"] = title or f"{input_file} - Summary"
    pdf.save(output_file)
    return output_file


converter = Converter(conversion_callback=conversion_callback)
converter.convert_directory(directory="docs/", recursive=True)
```
```python
# Example 2: turn an OpenAPI spec into a human-readable PDF description
from pathlib import Path

from google import genai
from markdown_pdf import MarkdownPdf, Section
from pdfitdown.pdfconversion import Converter

client = genai.Client()


def conversion_callback(
    input_file: str,
    output_file: str,
    title: str | None = None,
    overwrite: bool = True,
):
    # Only JSON and YAML files are treated as OpenAPI specs
    if Path(input_file).suffix not in [".json", ".yaml", ".yml"]:
        raise ValueError("File is not an OpenAPI spec document")
    uploaded_file = client.files.upload(file=Path(input_file))
    response = client.models.generate_content(
        model="gemini-2.0-flash",
        contents=[
            "Can you please provide a human-readable and elegant description of "
            "the attached OpenAPI spec, with routes and associated names, paths, "
            "and request/response formats?",
            uploaded_file,
        ],
    )
    content = response.text
    # Render the Markdown description into a single-section PDF
    pdf = MarkdownPdf(toc_level=0)
    pdf.add_section(Section(content))
    pdf.meta["title"] = title or f"{input_file} - OpenAPI Spec"
    pdf.save(output_file)
    return output_file


converter = Converter(conversion_callback=conversion_callback)
converter.convert(file_path="openapi.json", output_path="openapi_spec.pdf")
```
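The extension check in the OpenAPI example accepts any JSON or YAML file. A stricter guard, sketched here with the standard library only, would also look inside the file: OpenAPI 3.x documents carry a top-level `openapi` version field, and Swagger 2.0 documents carry a top-level `swagger` field. The helper name `looks_like_openapi_json` is hypothetical, and the sketch handles JSON only; YAML specs would need a YAML parser.

```python
import json
from pathlib import Path


def looks_like_openapi_json(input_file: str) -> bool:
    """Hypothetical helper: True if a JSON file has a top-level OpenAPI/Swagger version key."""
    path = Path(input_file)
    if path.suffix.lower() != ".json":
        return False
    try:
        document = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        # Unreadable file or invalid JSON: not a spec we can describe
        return False
    return isinstance(document, dict) and (
        "openapi" in document or "swagger" in document
    )
```

A callback could call this before uploading anything and raise the same `ValueError` when it returns `False`, which avoids sending unrelated JSON files to the model.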
For more examples, check out the cookbooks in the GitHub repository!
Now, run your application with uvicorn (assuming you saved it as api.py):
uvicorn api:app --host 0.0.0.0 --port 80
The /conversions/pdf route (mounted with PdfItDown conversion features) accepts one or more files as multipart/form-data.
If the request is successful, the response consists of a stream of the PDF content bytes.
Here is an example of how to call the API endpoint:
```python
import httpx

with open("file.txt", "rb") as f:
    content = f.read()

# The field name must match the `uploaded_file_field` parameter
# used in the `mount` function
files = {"file": ("file.txt", content, "text/plain")}

with httpx.Client() as client:
    response = client.post("http://localhost:80/conversions/pdf", files=files)
    response.raise_for_status()
    with open("file.pdf", "wb") as f:
        f.write(response.content)
```
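Since a successful response body is the raw PDF, a client may want to sanity-check the bytes before writing them to disk. The sketch below relies on the fact that PDF files begin with the `%PDF-` marker; the helper name `save_if_pdf` is hypothetical and is not part of PdfItDown or httpx.

```python
from pathlib import Path

# Every PDF file starts with this marker, followed by the version number
PDF_MAGIC = b"%PDF-"


def save_if_pdf(data: bytes, destination: str) -> bool:
    """Hypothetical helper: write `data` to `destination` only if it starts with the PDF marker."""
    if not data.startswith(PDF_MAGIC):
        return False
    Path(destination).write_bytes(data)
    return True
```

In the client example, this could stand in for the final `open`/`write` pair, for instance `save_if_pdf(response.content, "file.pdf")`, so that an unexpected non-PDF body is not saved under a `.pdf` name.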
With this, you have completed the third and final level of PdfItDown mastery!