Diversifying Search Results with Pyversity and Qdrant
blogging
minishlab
embedding
qdrant
til
Diversifying search results with Qdrant and Pyversity for a better RAG system, using MMR, DPP, and other algorithms
Author
kareem
Published
December 12, 2025
Why Diversify Search Results?
While building a real-estate search engine using RAG (Retrieval-Augmented Generation) across multiple collections, we hit an interesting problem: our top results were often too similar.
Example scenario:

- User query: “I want a unit in New Cairo”
- Top 5 results: 3 units from Palm Hills, 1 from Sodic, 1 from Radix
The issue? Our agent’s responses became heavily skewed toward Palm Hills properties, leading customers to believe we were manipulating results to favor specific developers.
The solution: Diversification algorithms like MMR (Maximal Marginal Relevance) help balance relevance with variety.
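To make the MMR idea concrete, here is a minimal from-scratch sketch of greedy MMR selection in plain NumPy. This illustrates the algorithm only; it is not Pyversity's or Qdrant's actual implementation, and the cosine-similarity choice and `diversity` weighting are assumptions for the sketch:

```python
import numpy as np

def mmr_select(embeddings, scores, k=5, diversity=0.5):
    """Greedy MMR: each step picks the item with the best blend of
    relevance (its score) and dissimilarity to items already chosen."""
    # Normalize rows so dot products are cosine similarities
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    selected = [int(np.argmax(scores))]  # start with the most relevant item
    candidates = set(range(len(scores))) - set(selected)

    while len(selected) < k and candidates:
        best, best_val = None, -np.inf
        for i in candidates:
            # Similarity to the closest already-selected item
            max_sim = max(float(normed[i] @ normed[j]) for j in selected)
            # Higher diversity weight penalizes similarity more heavily
            val = (1 - diversity) * scores[i] - diversity * max_sim
            if val > best_val:
                best, best_val = i, val
        selected.append(best)
        candidates.remove(best)
    return selected

rng = np.random.default_rng(0)
emb = rng.standard_normal((50, 32))
scores = rng.random(50)
print(mmr_select(emb, scores, k=5, diversity=0.7))
```

With `diversity=0`, this degenerates to plain top-k by score; with `diversity=1`, relevance is ignored entirely after the first pick.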
I found the amazing Pyversity - a lightweight Python library that implements multiple diversification strategies.
In this post, we’ll explore:

1. What Pyversity offers and how it works
2. Qdrant’s built-in MMR capabilities
3. Combining Pyversity with Qdrant for flexible diversification
Meet Pyversity: Your Diversification Toolkit
Pyversity is a lightweight library that solves a common problem: search results that all look the same. It re-ranks your results to surface items that are relevant and different from each other.
What makes it special?

- Multiple strategies: MMR, MSD, DPP, COVER, and SSD - each with different strengths
- Minimal dependencies: just NumPy
- Simple API: one function to rule them all
Let’s see it in action with a quick example.
```python
import numpy as np
from pyversity import diversify, Strategy

# Define embeddings and scores (e.g. cosine similarities of a query result)
embeddings = np.random.randn(100, 256)
scores = np.random.rand(100)

# Diversify the result
diversified_result = diversify(
    embeddings=embeddings,
    scores=scores,
    k=10,
    strategy=Strategy.MMR,
    diversity=0.5,  # Diversity parameter (higher values prioritize diversity)
)

print("Diversified Indices:\n", diversified_result.indices)
print("\nSelection Scores:\n", diversified_result.selection_scores)
print("\nStrategy Used:", diversified_result.strategy)
```
Qdrant’s Built-in MMR
Qdrant has built-in MMR support that helps diversify visual search results.
Let’s test it with a fashion dataset - we’ll search for “black jacket” and compare standard search (which might return very similar items) against MMR search (which balances relevance with variety).
We’ll use the DeepFashion dataset with CLIP embeddings for visual similarity.
Notice something? Most of these black jackets look very similar: same style, similar cuts, nearly identical designs.
While they’re all highly relevant to our query, they don’t give users much variety to choose from. This is exactly the problem diversification solves. All scores sit in a narrow band, roughly 0.278 to 0.288 - very similar results.
MMR Search – Diverse Black Jackets: "black jacket"

| # | Score | Caption | Category |
|---|-------|---------|----------|
| 1 | 0.288 | a young man wearing a black jacket and tie | Jackets & Vests |
| 2 | 0.240 | a man in a black shirt is looking at the camera | Tees & Tanks |
| 3 | 0.275 | a man wearing a black jacket and plaid shirt | Jackets & Vests |
| 4 | 0.275 | a man in a blue jacket is posing for a picture | Jackets & Vests |
| 5 | 0.242 | a man wearing a hat and plaid pants | Jackets & Vests |
| 6 | 0.254 | a man in a black shirt and black pants | Jackets & Vests |
MMR Search - Diversity in Action
Look at the difference! Instead of six nearly-identical black jackets, MMR gives us real variety: formal wear with ties, casual tees, layered looks, even a blue jacket and styled outfits with patterned pants.
Yes, some scores dropped slightly - but the browsing experience? Much better. Users can actually explore different styles instead of scrolling through clones.
Pyversity with Qdrant
Let’s use the Pyversity algorithms with the Qdrant engine.
```python
import numpy as np
from pyversity import diversify, Strategy


def apply_pyversity(qdrant_results, strategy=Strategy.MMR, k=10, **strategy_kwargs):
    """Apply Pyversity diversification to Qdrant search results."""
    embeddings = np.array([point.vector for point in qdrant_results.points])
    scores = np.array([point.score for point in qdrant_results.points])

    diversified = diversify(
        embeddings=embeddings,
        scores=scores,
        k=k,
        strategy=strategy,
        **strategy_kwargs,
    )

    # Return reordered results based on the diversified indices
    return [qdrant_results.points[i] for i in diversified.indices], diversified


def diversified_search(client, collection_name, query_embedding,
                       strategy=Strategy.MMR, k=10, candidates_limit=100,
                       **strategy_kwargs):
    """Search Qdrant, then apply Pyversity diversification to the candidates."""
    results = client.query_points(
        collection_name=collection_name,
        query=query_embedding.tolist(),
        limit=candidates_limit,
        with_payload=True,
        with_vectors=True,  # vectors are required for diversification
    )
    # Returns the (reordered points, diversify result) tuple from apply_pyversity
    return apply_pyversity(results, strategy=strategy, k=k, **strategy_kwargs)
```
After testing all five strategies on our fashion search, here’s what we observed:
MMR & MSD: Both provided good variety while maintaining relevance. MMR tends to be slightly faster and is a solid default choice. MSD pushes for even more spread across different styles.
DPP: Offers probabilistic diversity with a natural balance. Great when you want to eliminate near-duplicates while keeping results feeling “organic.”
COVER: Ensures broad coverage across the dataset. Best when you need to represent different clusters or categories, though it’s slower on large datasets.
SSD: Sequence-aware diversification. Perfect for feeds where users scroll through results over time - it avoids showing similar items close together.
Start with MMR for general use. Experiment with others based on your specific needs.
The Diversity vs. Relevance Trade-off
Diversification isn’t free - there’s always a balance:
Score drops: Notice how diversified results sometimes have lower similarity scores? That’s expected. We’re trading pure relevance for variety.
Computational cost: Fetching 100 candidates and diversifying to 10 is slower than just grabbing the top 10. But for most applications, the added latency (milliseconds) is worth the improved user experience.
Sweet spot: In our tests, fetching 100 candidates and diversifying to 5-10 results gave the best balance. Too few candidates limits diversity options; too many adds unnecessary overhead.
The payoff: Better user engagement, reduced bias, and more satisfied customers who feel they’re seeing real choices.