Late Interaction and ColPali: The Future of Efficient Semantic Search
Beyond Bi-Encoders: The Rise of Late Interaction
In the world of Information Retrieval (IR), we usually face a trade-off between speed and accuracy.
1. Bi-Encoders (The Speed Kings)
Bi-encoders (for example, Sentence-BERT-style models built on BERT) encode the query and the document independently, each into a single vector. Search is then just a cosine similarity between these two vectors. Because document vectors can be pre-computed and indexed, search is extremely fast (often sub-millisecond per query with an approximate nearest-neighbor index). The cost is lost detail: the entire document is compressed into one fixed-size vector.
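A minimal sketch of bi-encoder scoring, assuming the document vectors were already produced offline by some encoder. Random vectors stand in for real model output, and the 384-dimension size is an arbitrary choice for illustration. Once everything is normalized, search reduces to a single matrix-vector product:

```python
import numpy as np

rng = np.random.default_rng(0)

def normalize(x: np.ndarray) -> np.ndarray:
    # L2-normalize along the last axis so a dot product equals cosine similarity
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# Stand-ins for encoder output: one pre-computed vector per document
num_docs, dim = 1000, 384
doc_vectors = normalize(rng.standard_normal((num_docs, dim)))

def search(query_vector: np.ndarray, doc_vectors: np.ndarray, k: int = 5):
    """Return the indices and cosine scores of the top-k documents."""
    scores = doc_vectors @ normalize(query_vector)  # one score per document
    top_k = np.argsort(-scores)[:k]                 # highest scores first
    return top_k, scores[top_k]

# A query vector close to document 42 should retrieve it first
query = doc_vectors[42] + 0.01 * rng.standard_normal(dim)
ids, top_scores = search(query, doc_vectors)
```

At larger scale the brute-force product would typically be replaced by an approximate nearest-neighbor index, which is where most of the speed comes from.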
2. Cross-Encoders (The Accuracy Masters)
Cross-encoders feed the query and the document into the model together as a single input (Early Interaction), so every query token can attend to every document token. This is highly accurate but computationally expensive: the model must run once for every query-document pair at query time. Because the document representation depends on the query, you can't pre-compute embeddings.
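The cost difference is easiest to see in code. The sketch below uses a hypothetical `toy_joint_score` function as a stand-in for a cross-encoder. A real cross-encoder would run a transformer over the concatenated query and document text rather than this word-overlap heuristic, but the property that matters is the same: the function only ever sees a (query, document) pair together, so every candidate costs one call at query time.

```python
model_calls = 0

def toy_joint_score(query: str, document: str) -> float:
    """Hypothetical stand-in for one cross-encoder forward pass.

    The score depends on both inputs at once, so nothing can be cached per document.
    """
    global model_calls
    model_calls += 1
    q_words = set(query.lower().split())
    d_words = document.lower().split()
    return sum(word in q_words for word in d_words) / max(len(d_words), 1)

def rerank(query: str, candidates: list[str]) -> list[tuple[float, str]]:
    # One "model" call per query-document pair: cost grows linearly with candidates
    scored = [(toy_joint_score(query, doc), doc) for doc in candidates]
    return sorted(scored, reverse=True)

docs = [
    "late interaction stores one vector per token",
    "cross encoders score each pair jointly",
    "bananas are rich in potassium",
]
ranking = rerank("how do cross encoders score a pair", docs)
```

Reranking three candidates took three calls. For a million-document corpus this is why cross-encoders are usually applied only to a short list produced by a faster retriever.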
3. Late Interaction: The Best of Both Worlds
Late Interaction models, pioneered by ColBERT, bridge this gap. Instead of one vector per document, they store a contextualized embedding for every token in the document, computed offline.
When a query comes in:
1. The query is encoded into token-level embeddings.
2. A MaxSim (Maximum Similarity) operation is performed: for each query token, we find the document token whose embedding is most similar to it.
3. We sum these per-token maximum similarities to get the final score.
This allows the model to perform fine-grained matching (like a cross-encoder) while still allowing document embeddings to be pre-computed (like a bi-encoder).
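The three steps fit in a few lines of NumPy. This is a minimal sketch of ColBERT-style MaxSim scoring, score(q, d) = sum over query tokens i of max over document tokens j of (q_i · d_j). It assumes the token embeddings are L2-normalized so each dot product is a cosine similarity. Random matrices stand in for encoder output, and the 128-dimension size follows ColBERT's default but is otherwise arbitrary here.

```python
import numpy as np

rng = np.random.default_rng(0)

def normalize(x: np.ndarray) -> np.ndarray:
    # L2-normalize each token embedding
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def maxsim_score(query_tokens: np.ndarray, doc_tokens: np.ndarray) -> float:
    """Late interaction score for one query/document pair.

    query_tokens: (num_query_tokens, dim); doc_tokens: (num_doc_tokens, dim)
    """
    sim = query_tokens @ doc_tokens.T    # every query token vs every doc token
    return float(sim.max(axis=1).sum())  # best match per query token, then sum

dim = 128
# "Index": per-document token embeddings, computed once offline.
# Documents differ in length, so we keep a list of matrices.
index = [normalize(rng.standard_normal((n, dim))) for n in (40, 120, 75)]

# A query whose 8 tokens are near-copies of tokens from document 1
query = normalize(index[1][:8] + 0.02 * rng.standard_normal((8, dim)))

scores = [maxsim_score(query, doc) for doc in index]
best = int(np.argmax(scores))
```

Because each query token keeps only its single best match, a document is rewarded for covering every part of the query rather than for repeating one matching term many times.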
ColPali: Retrieval Without OCR
One of the most exciting recent developments is ColPali. Traditional PDF retrieval requires a complex pipeline: OCR the text, chunk it, and then embed it. This often fails on tables, charts, and complex layouts.
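ColPali applies the Late Interaction principle to a vision language model (PaliGemma). Each PDF page is processed as an image, and the embeddings of its image patches play the role of document "tokens." Instead of reading extracted text, it "looks" at the page: query token embeddings are matched directly against these visual patch embeddings with the same MaxSim operation.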
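To make "image patches as tokens" concrete, here is a toy sketch. It assumes a page rendered to a 448×448 RGB image and split into 14×14-pixel patches, which gives a 32×32 grid of 1,024 patch "tokens"; treat these PaliGemma-style sizes as an assumption. A fixed random projection is a hypothetical stand-in for the vision encoder and ColPali's projection layer, and random vectors stand in for query token embeddings. Only the shape of the computation is the point: patch embeddings are computed once per page, then scored against query tokens with MaxSim.

```python
import numpy as np

rng = np.random.default_rng(0)

def patchify(page: np.ndarray, patch: int) -> np.ndarray:
    """Split an (H, W, C) image into non-overlapping patches, one flattened row per patch."""
    h, w, c = page.shape
    assert h % patch == 0 and w % patch == 0, "image size must be divisible by patch size"
    grid = page.reshape(h // patch, patch, w // patch, patch, c)
    # Reorder to (grid_rows, grid_cols, patch, patch, C), then flatten each patch
    return grid.transpose(0, 2, 1, 3, 4).reshape(-1, patch * patch * c)

def normalize(x: np.ndarray) -> np.ndarray:
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

page = rng.random((448, 448, 3))    # stand-in for a rendered PDF page
patches = patchify(page, patch=14)  # (1024, 588): 32 x 32 patches, 14*14*3 values each

# Hypothetical stand-in for the vision encoder + projection into a 128-dim space
projection = rng.standard_normal((patches.shape[1], 128))
page_embeddings = normalize(patches @ projection)  # (1024, 128), pre-computed per page

query_embeddings = normalize(rng.standard_normal((6, 128)))  # stand-in query tokens
score = float((query_embeddings @ page_embeddings.T).max(axis=1).sum())
```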
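Key Benefits of ColPali:
- Layout Aware: Because it sees the rendered page, it can use spatial context, such as a caption sitting next to the figure it describes.
- OCR-Free: No more messy text extraction from scanned documents.
- Stronger Retrieval on Visual Documents: In the evaluations reported with ColPali (the ViDoRe benchmark), it outperformed OCR-based text retrieval pipelines on visually rich documents.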
Ecosystem and Tools
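If you want to implement Late Interaction today, these are the projects to watch:
- ColBERTv2: An improved version of the original late interaction model that compresses token embeddings (residual compression) to shrink the index and trains with distillation from a cross-encoder.
- PyLate: A flexible Python library for training and using late interaction models.
- PLAID: A fast search engine for ColBERTv2 that uses centroids to prune candidate documents before full scoring.
- Model2Vec: Not a late interaction model; it distills sentence transformers into static embeddings, but it reflects the same trend towards more efficient representations.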
Late interaction is transforming how we think about retrieval, moving us away from “one vector fits all” towards a more nuanced, token-aware future.