Diversifying Search Results with Pyversity and Qdrant

blogging
minishlab
embedding
qdrant
til
Diversifying search results with Qdrant and Pyversity for better RAG systems, using MMR, DPP, and other algorithms
Author

kareem

Published

December 12, 2025

Why Diversify Search Results?

While building a real-estate search engine using RAG (Retrieval-Augmented Generation) across multiple collections, we hit an interesting problem: our top results were often too similar.

Example scenario:
- User query: “I want a unit in New Cairo”
- Top 5 results: 3 units from Palm Hills, 1 from Sodic, 1 from Radix

The issue? Our agent’s responses became heavily skewed toward Palm Hills properties, leading customers to believe we were manipulating results to favor specific developers.
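
One way to make this skew concrete is to measure each developer's share of the top results. The sketch below uses the numbers from the scenario; the `developer` payload field and the `developer_share` helper are hypothetical names chosen for illustration, not part of our actual pipeline.

```python
from collections import Counter

def developer_share(results):
    """Return each developer's share of the result list, largest first."""
    counts = Counter(r["developer"] for r in results)
    total = len(results)
    return [(dev, n / total) for dev, n in counts.most_common()]

# The top-5 list from the scenario ("developer" is an assumed payload field).
top5 = [
    {"developer": "Palm Hills"},
    {"developer": "Palm Hills"},
    {"developer": "Palm Hills"},
    {"developer": "Sodic"},
    {"developer": "Radix"},
]

for dev, share in developer_share(top5):
    print(f"{dev}: {share:.0%}")
```

A single developer holding 60% of the visible results is the kind of concentration that diversification is meant to reduce.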

The solution: Diversification algorithms like MMR (Maximal Marginal Relevance) help balance relevance with variety.

I found the amazing Pyversity - a lightweight Python library that implements multiple diversification strategies.

In this post, we’ll explore:
1. What Pyversity offers and how it works
2. Qdrant’s built-in MMR capabilities
3. Combining Pyversity with Qdrant for flexible diversification

Meet Pyversity: Your Diversification Toolkit

Pyversity is a lightweight library that solves a common problem: search results that all look the same. It re-ranks your results to surface items that are relevant and different from each other.

What makes it special?
- Multiple strategies: MMR, MSD, DPP, COVER, and SSD - each with different strengths
- Minimal dependencies: Just NumPy
- Simple API: One function to rule them all

Let’s see it in action with a quick example.

import numpy as np
from pyversity import diversify, Strategy

# Define embeddings and scores (e.g. cosine similarities of a query result)
embeddings = np.random.randn(100, 256)
scores = np.random.rand(100)

# Diversify the result
diversified_result = diversify(
    embeddings=embeddings,
    scores=scores,
    k=10, 
    strategy=Strategy.MMR,
    diversity=0.5 # Diversity parameter (higher values prioritize diversity)
)


print("Diversified Indices:\n", diversified_result.indices)
print("\nSelection Scores:\n", diversified_result.selection_scores)
print("\nStrategy Used:", diversified_result.strategy)
Diversified Indices:
 [66 16 20 34 54 89 53 81 24 42]

Selection Scores:
 [0.4990284  0.49552712 0.49322212 0.4734265  0.44911325 0.44692624
 0.4468549  0.43787128 0.4272018  0.4173287 ]

Strategy Used: Strategy.MMR

Diversification Strategies

| Strategy | What It Does | Time Complexity | Best For |
|----------|--------------|-----------------|----------|
| MMR | Balances relevance with dissimilarity to already-selected items | O(k·n·d) | General purpose - fast and effective |
| MSD | Maximizes distance from all previous selections | O(k·n·d) | Broader topic coverage |
| DPP | Probabilistic sampling with built-in “repulsion” | O(k·n·d + n·k²) | Eliminating redundancy |
| COVER | Ensures selections represent the full dataset structure | O(k·n²) | Topic clustering (slower for large datasets) |
| SSD | Sequence-aware: rewards novelty relative to recent items | O(k·n·d) | Content feeds, infinite scroll, conversational RAG |
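
To show what "balances relevance with dissimilarity to already-selected items" means in practice, here is a minimal pure-Python sketch of greedy MMR. It is not Pyversity's or Qdrant's implementation: the weighting `(1 - diversity) * score - diversity * max_similarity` is the textbook form, and both the function names and the toy data are assumptions made for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def mmr_select(embeddings, scores, k, diversity=0.5):
    """Greedy MMR sketch: return the indices of up to k selected items."""
    selected = []
    remaining = list(range(len(scores)))
    while remaining and len(selected) < k:
        def mmr_value(i):
            # Penalty: similarity to the closest already-selected item (0 if none yet).
            penalty = max(
                (cosine(embeddings[i], embeddings[j]) for j in selected),
                default=0.0,
            )
            return (1 - diversity) * scores[i] - diversity * penalty
        best = max(remaining, key=mmr_value)
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy data: items 0 and 1 are near-duplicates, item 2 points elsewhere.
embeddings = [[1.0, 0.0], [1.0, 0.01], [0.0, 1.0]]
scores = [0.90, 0.89, 0.80]

print(mmr_select(embeddings, scores, k=2, diversity=0.0))  # [0, 1]
print(mmr_select(embeddings, scores, k=2, diversity=0.5))  # [0, 2]
```

With `diversity=0.0` the selection is a plain top-k by score. At `0.5`, the near-duplicate is penalized enough that the lower-scored but different item takes the second slot.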

Qdrant MMR in Action

Qdrant has built-in MMR support that helps diversify visual search results.

Let’s test it with a fashion dataset - we’ll search for “black jacket” and compare standard search (which might return very similar items) against MMR search (which balances relevance with variety).

We’ll use the DeepFashion dataset with CLIP embeddings for visual similarity.

Finally, we will combine Pyversity with Qdrant.

Creating Fashion Embedding

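from fastembed import TextEmbedding
from qdrant_client import models  # used by the MMR search further down

# `client` (a QdrantClient) and `collection_name` are assumed to come from setup code not shown here.

def fashion_search_standard(query_text, limit=5):
    # Embed the text query with the CLIP text encoder
    text_model = TextEmbedding(model_name="Qdrant/clip-ViT-B-32-text")
    query_embedding = list(text_model.embed([query_text]))[0]

    # Plain nearest-neighbour search: the top `limit` points by similarity
    results = client.query_points(
        collection_name=collection_name,
        query=query_embedding.tolist(),
        limit=limit,
        with_payload=True
    )
    return results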


STANDARD FASHION SEARCH: 'black jacket'

Standard Search Results for "black jacket"

#1 • Score: 0.288
a young man wearing a black jacket and tie
Jackets & Vests
#2 • Score: 0.287
a black leather jacket on a white background
Jackets & Vests
#3 • Score: 0.287
a black leather jacket on a white background
Jackets & Vests
#4 • Score: 0.283
a man wearing a black jacket with a hood
Jackets & Vests
#5 • Score: 0.278
a man wearing a black jacket and jeans
Jackets & Vests

Standard Qdrant Search Results

Notice something? Most of these black jackets look very similar.

Same style, similar cuts, nearly identical designs.

While they’re all highly relevant to our query, they don’t give users much variety to choose from. This is exactly the problem diversification solves. The scores are also tightly clustered, ranging only from 0.278 to 0.288, so relevance alone barely separates these results.
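
The clustering can be checked directly from the output above. The following sketch copies the five scores and captions from the standard search and computes the score spread and any repeated captions. Captions are generated descriptions, so a repeated caption suggests, but does not prove, that two images are near-duplicates.

```python
from collections import Counter

# (score, caption) pairs copied from the standard search output above.
standard_results = [
    (0.288, "a young man wearing a black jacket and tie"),
    (0.287, "a black leather jacket on a white background"),
    (0.287, "a black leather jacket on a white background"),
    (0.283, "a man wearing a black jacket with a hood"),
    (0.278, "a man wearing a black jacket and jeans"),
]

scores = [s for s, _ in standard_results]
spread = round(max(scores) - min(scores), 3)

caption_counts = Counter(c for _, c in standard_results)
repeated = [c for c, n in caption_counts.items() if n > 1]

print("score spread:", spread)
print("repeated captions:", repeated)
```

A spread of 0.01 across the whole list means that small changes in ranking cost very little relevance, which is what makes re-ranking for variety cheap here.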

def fashion_search_mmr(query_text, limit=5, diversity=0.5):
    text_model = TextEmbedding(model_name="Qdrant/clip-ViT-B-32-text")
    query_embedding = list(text_model.embed([query_text]))[0]
    
    results = client.query_points(
        collection_name=collection_name,
        query=models.NearestQuery(
            nearest=query_embedding.tolist(),
            mmr=models.Mmr(
                diversity=diversity,  # 0.0 - relevance; 1.0 - diversity
                candidates_limit=100  # num of candidates to preselect
            )
        ),
        limit=limit,
        with_payload=True
    )
    return results

MMR Search – Diverse Black Jackets: "black jacket"

#1 • Score: 0.288
a young man wearing a black jacket and tie
Jackets & Vests
#2 • Score: 0.240
a man in a black shirt is looking at the camera
Tees & Tanks
#3 • Score: 0.275
a man wearing a black jacket and plaid shirt
Jackets & Vests
#4 • Score: 0.275
a man in a blue jacket is posing for a picture
Jackets & Vests
#5 • Score: 0.242
a man wearing a hat and plaid pants
Jackets & Vests
#6 • Score: 0.254
a man in a black shirt and black pants
Jackets & Vests

MMR Search - Diversity in Action

Look at the difference! Instead of a list of nearly identical black jackets, MMR gives us real variety: formal wear with ties, casual tees, layered looks, even a blue jacket and styled outfits with patterned pants.

Yes, some scores dropped slightly - but the browsing experience? Much better. Users can actually explore different styles instead of scrolling through clones.
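
"Much better" can also be given a number. A common choice is intra-list diversity: the mean pairwise cosine distance between the returned vectors. The sketch below is a minimal pure-Python version using toy 2-D vectors in place of real CLIP embeddings; the function name is an assumption, not something provided by Qdrant or Pyversity.

```python
import math
from itertools import combinations

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def intra_list_diversity(vectors):
    """Mean pairwise cosine distance (1 - similarity); higher means more varied."""
    pairs = list(combinations(vectors, 2))
    if not pairs:
        return 0.0
    return sum(1.0 - cosine(a, b) for a, b in pairs) / len(pairs)

# Toy vectors standing in for embeddings of a result list.
near_duplicates = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
varied = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

print(intra_list_diversity(near_duplicates))  # 0.0
print(round(intra_list_diversity(varied), 3))
```

To apply it to real results, pass the vectors of the returned points (fetched with `with_vectors=True`) for both the standard and the MMR search and compare the two values.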

Pyversity with Qdrant

Let’s try the Pyversity algorithms on top of the Qdrant engine.

def apply_pyversity(qdrant_results, strategy=Strategy.MMR, k=10, **strategy_kwargs):
    """Apply Pyversity diversification to Qdrant search results"""
    embeddings = np.array([point.vector for point in qdrant_results.points])
    scores = np.array([point.score for point in qdrant_results.points])
    
    diversified = diversify(
        embeddings=embeddings,
        scores=scores,
        k=k,
        strategy=strategy,
        **strategy_kwargs
    )
    
    # Return reordered results based on diversified indices
    return [qdrant_results.points[i] for i in diversified.indices], diversified


def diversified_search(client, collection_name, query_embedding,
                       strategy=Strategy.MMR, k=10, 
                       candidates_limit=100, **strategy_kwargs):
    """Search Qdrant and apply Pyversity diversification"""
    results = client.query_points(
        collection_name=collection_name,
        query=query_embedding.tolist(),
        limit=candidates_limit,
        with_payload=True,
        with_vectors=True
    )
    
    # Returns the (reordered_points, diversification_result) tuple from apply_pyversity
    return apply_pyversity(results, strategy=strategy, k=k, **strategy_kwargs)
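
One practical pitfall with this pattern: if the query is made without `with_vectors=True`, each returned point has `vector` set to `None`, and building the embeddings array fails in a confusing way. The sketch below shows a defensive pre-check. `points_to_inputs` is a hypothetical helper, the `SimpleNamespace` objects are stand-ins for Qdrant's scored points, and it assumes a single unnamed vector per point. Clamping `k` is a cautious choice on my part; Pyversity may already handle `k` larger than the candidate count.

```python
from types import SimpleNamespace

def points_to_inputs(points, k):
    """Return (embeddings, scores, k) for a diversifier, or raise if vectors are missing."""
    missing = [p.id for p in points if p.vector is None]
    if missing:
        raise ValueError(
            f"{len(missing)} points have no vector; query with with_vectors=True"
        )
    embeddings = [list(p.vector) for p in points]
    scores = [p.score for p in points]
    # Never ask for more results than there are candidates.
    return embeddings, scores, min(k, len(points))

# Stand-in objects with the attributes used above (hypothetical data).
points = [
    SimpleNamespace(id=1, score=0.288, vector=[0.1, 0.2]),
    SimpleNamespace(id=2, score=0.287, vector=[0.1, 0.3]),
]
embeddings, scores, k = points_to_inputs(points, k=10)
print(k)  # 2
```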

============================================================
Testing MMR Strategy
============================================================

MMR Strategy: "black jacket"

#1 • Score: 0.288
a young man wearing a black jacket and tie
Jackets & Vests
#2 • Score: 0.240
a man in a black shirt is looking at the camera
Tees & Tanks
#3 • Score: 0.275
a man wearing a black jacket and plaid shirt
Jackets & Vests
#4 • Score: 0.275
a man in a blue jacket is posing for a picture
Jackets & Vests
#5 • Score: 0.242
a man wearing a hat and plaid pants
Jackets & Vests
#6 • Score: 0.254
a man in a black shirt and black pants
Jackets & Vests

Diversity Stats:
Strategy: Strategy.MMR
Selection Scores: [ 0.1440569  -0.21647353 -0.23149979 -0.23982394 -0.2528901  -0.2622754 ]

============================================================
Testing MSD Strategy
============================================================

MSD Strategy: "black jacket"

#1 • Score: 0.288
a young man wearing a black jacket and tie
Jackets & Vests
#2 • Score: 0.240
a man in a black shirt is looking at the camera
Tees & Tanks
#3 • Score: 0.278
a man wearing a black jacket and jeans
Jackets & Vests
#4 • Score: 0.246
a man sitting on top of a white cube
Jackets & Vests
#5 • Score: 0.242
a man wearing a baseball cap and a plaid shirt
Sweaters
#6 • Score: 0.287
a black leather jacket on a white background
Jackets & Vests

Diversity Stats:
Strategy: Strategy.MSD
Selection Scores: [0.1440569  0.28352648 0.45831633 0.5803725  0.7433619  0.9287865 ]

============================================================
Testing DPP Strategy
============================================================

DPP Strategy: "black jacket"

#1 • Score: 0.288
a young man wearing a black jacket and tie
Jackets & Vests
#2 • Score: 0.287
a black leather jacket on a white background
Jackets & Vests
#3 • Score: 0.275
a man in a blue jacket is posing for a picture
Jackets & Vests
#4 • Score: 0.278
a man wearing a black jacket and jeans
Jackets & Vests
#5 • Score: 0.283
a man wearing a black jacket with a hood
Jackets & Vests
#6 • Score: 0.272
a man in a black jacket and black pants
Jackets & Vests

Diversity Stats:
Strategy: Strategy.DPP
Selection Scores: [11.763905   4.50202    1.778986   1.6289598  1.1995786  1.063326 ]

============================================================
Testing COVER Strategy
============================================================

COVER Strategy: "black jacket"

#1 • Score: 0.254
a man in a black jacket and red pants
Jackets & Vests
#2 • Score: 0.274
a man wearing a black jacket and a beanie
Jackets & Vests
#3 • Score: 0.277
a man in a black jacket and black pants
Jackets & Vests
#4 • Score: 0.266
a man in a white jacket and black pants
Jackets & Vests
#5 • Score: 0.271
a man in a black shirt and grey pants
Sweaters
#6 • Score: 0.244
a man wearing a duffle coat and red pants
Jackets & Vests

Diversity Stats:
Strategy: Strategy.COVER
Selection Scores: [45.971733  19.065933  14.6605215 12.34475   10.894392   9.850458 ]

============================================================
Testing SSD Strategy
============================================================

SSD Strategy: "black jacket"

#1 • Score: 0.288
a young man wearing a black jacket and tie
Jackets & Vests
#2 • Score: 0.287
a black leather jacket on a white background
Jackets & Vests
#3 • Score: 0.287
a black leather jacket on a white background
Jackets & Vests
#4 • Score: 0.283
a man wearing a black jacket with a hood
Jackets & Vests
#5 • Score: 0.278
a man wearing a black jacket and jeans
Jackets & Vests
#6 • Score: 0.275
a man in a blue jacket is posing for a picture
Jackets & Vests

Diversity Stats:
Strategy: Strategy.SSD
Selection Scores: [1.939636  1.524907  1.3047718 1.2576772 1.1047928 1.0550355]

Comparing Diversification Strategies

After testing all five strategies on our fashion search, here’s what we observed:

MMR & MSD: Both provided good variety while maintaining relevance. MMR tends to be slightly faster and is a solid default choice. MSD pushes for even more spread across different styles.

DPP: Offers probabilistic diversity with a natural balance. Great when you want to eliminate near-duplicates while keeping results feeling “organic.”

COVER: Ensures broad coverage across the dataset. Best when you need to represent different clusters or categories, though it’s slower on large datasets.

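SSD: Sequence-aware diversification. Designed for feeds where users scroll through results over time - it avoids showing similar items close together. In this particular test, its top results stayed close to the standard ranking, including both leather-jacket items.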


Start with MMR for general use. Experiment with others based on your specific needs.
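
As a rough cross-check of these observations, we can count how many distinct category labels each strategy surfaced in its six results. The labels below are copied from the outputs above, in rank order. Category is a coarse proxy for variety, and this is a single query, so treat the numbers as an illustration rather than a ranking of the strategies.

```python
# Category labels copied from the six-result outputs above, in rank order.
J, T, S = "Jackets & Vests", "Tees & Tanks", "Sweaters"
categories_by_strategy = {
    "MMR":   [J, T, J, J, J, J],
    "MSD":   [J, T, J, J, S, J],
    "DPP":   [J, J, J, J, J, J],
    "COVER": [J, J, J, J, S, J],
    "SSD":   [J, J, J, J, J, J],
}

# Number of distinct category labels per strategy.
distinct = {name: len(set(cats)) for name, cats in categories_by_strategy.items()}

for name, count in sorted(distinct.items(), key=lambda item: -item[1]):
    print(f"{name}: {count} distinct category label(s)")
```

On this query, MSD reached three categories, MMR and COVER two, and DPP and SSD stayed within a single category.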

The Diversity vs. Relevance Trade-off

Diversification isn’t free - there’s always a balance:

Score drops: Notice how diversified results sometimes have lower similarity scores? That’s expected. We’re trading pure relevance for variety.

Computational cost: Fetching 100 candidates and diversifying to 10 is slower than just grabbing the top 10. But for most applications, the added latency (milliseconds) is worth the improved user experience.

Sweet spot: In our tests, fetching 100 candidates and diversifying to 5-10 results gave the best balance. Too few candidates limits diversity options; too many adds unnecessary overhead.

The payoff: Better user engagement, reduced bias, and more satisfied customers who feel they’re seeing real choices.
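
To get a feel for the computational cost, the O(k·n·d) bound from the strategy table can be turned into a back-of-the-envelope operation count. The sketch below ignores constant factors and the cost of the Qdrant query itself, so it only shows how the work scales with the candidate count n. The value d = 512 is the output dimension of CLIP ViT-B/32 embeddings; `rerank_cost` is a hypothetical helper.

```python
def rerank_cost(k, n, d):
    """Rough multiply-add count for a greedy re-ranker that scales as O(k * n * d)."""
    return k * n * d

d = 512  # embedding dimension of CLIP ViT-B/32
for n in (20, 100, 1000):
    print(f"n={n:>4}: ~{rerank_cost(k=10, n=n, d=d):,} multiply-adds")
```

Going from 100 to 1,000 candidates multiplies the re-ranking work by ten, which is why very large candidate pools add overhead without necessarily adding useful variety.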

References

  1. Qdrant MMR
  2. Pyversity