ingest-anything is a Python package that provides a smooth way to ingest non-PDF files into vector databases, since most ingestion pipelines focus on PDF/Markdown files. Leveraging Chonkie, PdfItDown, LlamaIndex, Sentence Transformers embeddings and Qdrant, ingest-anything gives you a fully automated document ingestion pipeline in a few lines of code!
Workflow

For text-based files:

- The input files are converted into PDF by PdfItDown
- The text is extracted from the PDF using the LlamaIndex Docling reader
- The text is chunked using Chonkie’s chunkers
- The chunks are embedded with Sentence Transformers models
- The embeddings are loaded into a Qdrant vector database

For code files:

- The text is extracted from the code files using the LlamaIndex SimpleDirectoryReader
- The text is chunked using Chonkie’s CodeChunker
- The chunks are embedded with Sentence Transformers models
- The embeddings are loaded into a Qdrant vector database
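
To make the chunk, embed and store steps above more concrete, here is a minimal, dependency-free sketch of the same flow. It is not the ingest-anything API: `chunk_text`, `embed`, `InMemoryVectorStore` and `ingest` are hypothetical names, the word-count chunker stands in for Chonkie, the hashing "embedding" stands in for a Sentence Transformers model, and the in-memory list stands in for a Qdrant collection.

```python
import hashlib
import math
from dataclasses import dataclass, field


def chunk_text(text: str, max_words: int = 50) -> list[str]:
    """Toy chunker (stand-in for Chonkie): fixed-size groups of words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(chunk: str, dim: int = 8) -> list[float]:
    """Toy embedding (stand-in for a Sentence Transformers model).

    Each word is hashed into one of `dim` buckets, then the count vector
    is L2-normalized so a dot product behaves like cosine similarity.
    """
    vec = [0.0] * dim
    for word in chunk.lower().split():
        bucket = int(hashlib.sha256(word.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


@dataclass
class InMemoryVectorStore:
    """Hypothetical stand-in for a Qdrant collection: (vector, text) pairs."""
    points: list[tuple[list[float], str]] = field(default_factory=list)

    def upsert(self, vector: list[float], payload: str) -> None:
        self.points.append((vector, payload))

    def search(self, query_vector: list[float], top_k: int = 1) -> list[str]:
        # Rank stored chunks by dot product with the query, highest first.
        ranked = sorted(
            self.points,
            key=lambda p: -sum(a * b for a, b in zip(query_vector, p[0])),
        )
        return [payload for _, payload in ranked[:top_k]]


def ingest(text: str, store: InMemoryVectorStore, max_words: int = 50) -> int:
    """Chunk the text, embed each chunk, store it; return the chunk count."""
    chunks = chunk_text(text, max_words)
    for chunk in chunks:
        store.upsert(embed(chunk), chunk)
    return len(chunks)


store = InMemoryVectorStore()
n_chunks = ingest("a b c d e", store, max_words=2)
```

In the real pipeline, the text passed to the chunker comes from the reader step (Docling for converted PDFs, SimpleDirectoryReader for code), and the code-file branch follows the same shape with a code-aware chunker.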
Installation and usage
ingest-anything can be installed using pip: `pip install ingest-anything`
- You can initialize the interface for text-based files like this:
- And ingest your files:
- You can also initialize the interface for code files:
- And then ingest your code files: