Diversifying Search Results with Pyversity and Qdrant
blogging
minishlab
embedding
qdrant
til
Diversifying search results with Qdrant and Pyversity for a better RAG system, using MMR, DPP, and other algorithms
Author
kareem
Published
December 12, 2025
Why Diversify Search Results?
While building a real-estate search engine using RAG (Retrieval-Augmented Generation) across multiple collections, we hit an interesting problem: our top results were often too similar.
Example scenario:

- User query: “I want a unit in New Cairo”
- Top 5 results: 3 units from Palm Hills, 1 from Sodic, 1 from Radix
The issue? Our agent’s responses became heavily skewed toward Palm Hills properties, leading customers to believe we were manipulating results to favor specific developers.
The solution: Diversification algorithms like MMR (Maximal Marginal Relevance) help balance relevance with variety.
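To make the MMR idea concrete, here is a minimal from-scratch sketch of greedy MMR selection in plain NumPy. This illustrates the algorithm only; it is not Pyversity's or Qdrant's actual implementation, and the cosine-similarity choice and `diversity` weighting are assumptions for the sketch:

```python
import numpy as np

def mmr_select(embeddings, scores, k=5, diversity=0.5):
    """Greedy MMR: each step picks the item with the best blend of
    relevance (its score) and dissimilarity to items already chosen."""
    # Normalize rows so dot products are cosine similarities
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    selected = [int(np.argmax(scores))]  # start with the most relevant item
    candidates = set(range(len(scores))) - set(selected)

    while len(selected) < k and candidates:
        best, best_val = None, -np.inf
        for i in candidates:
            # Similarity to the closest already-selected item
            max_sim = max(float(normed[i] @ normed[j]) for j in selected)
            # Higher diversity weight penalizes similarity more heavily
            val = (1 - diversity) * scores[i] - diversity * max_sim
            if val > best_val:
                best, best_val = i, val
        selected.append(best)
        candidates.remove(best)
    return selected

rng = np.random.default_rng(0)
emb = rng.standard_normal((50, 32))
scores = rng.random(50)
print(mmr_select(emb, scores, k=5, diversity=0.7))
```

With `diversity=0`, this degenerates to plain top-k by score; with `diversity=1`, relevance is ignored entirely after the first pick.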
I found the amazing Pyversity - a lightweight Python library that implements multiple diversification strategies.
In this post, we’ll explore:

1. What Pyversity offers and how it works
2. Qdrant’s built-in MMR capabilities
3. Combining Pyversity with Qdrant for flexible diversification
Meet Pyversity: Your Diversification Toolkit
Pyversity is a lightweight library that solves a common problem: search results that all look the same. It re-ranks your results to surface items that are relevant and different from each other.
What makes it special?

- Multiple strategies: MMR, MSD, DPP, COVER, and SSD - each with different strengths
- Minimal dependencies: just NumPy
- Simple API: one function to rule them all
Let’s see it in action with a quick example.
```python
import numpy as np
from pyversity import diversify, Strategy

# Define embeddings and scores (e.g. cosine similarities of a query result)
embeddings = np.random.randn(100, 256)
scores = np.random.rand(100)

# Diversify the result
diversified_result = diversify(
    embeddings=embeddings,
    scores=scores,
    k=10,
    strategy=Strategy.MMR,
    diversity=0.5,  # Diversity parameter (higher values prioritize diversity)
)

print("Diversified Indices:\n", diversified_result.indices)
print("\nSelection Scores:\n", diversified_result.selection_scores)
print("\nStrategy Used:", diversified_result.strategy)
```
Qdrant’s Built-in MMR
Qdrant has built-in MMR support that helps diversify visual search results.
Let’s test it with a fashion dataset - we’ll search for “black jacket” and compare standard search (which might return very similar items) against MMR search (which balances relevance with variety).
We’ll use the DeepFashion dataset with CLIP embeddings for visual similarity.
Notice something? Most of these black jackets look very similar: same style, similar cuts, nearly identical designs.
While they’re all highly relevant to our query, they don’t give users much variety to choose from. This is exactly the problem diversification solves. All scores sit in a narrow band, roughly 0.278 to 0.288 - very similar results.
MMR Search – Diverse Black Jackets: "black jacket"

| # | Score | Caption | Category |
|---|-------|---------|----------|
| 1 | 0.288 | a young man wearing a black jacket and tie | Jackets & Vests |
| 2 | 0.240 | a man in a black shirt is looking at the camera | Tees & Tanks |
| 3 | 0.275 | a man wearing a black jacket and plaid shirt | Jackets & Vests |
| 4 | 0.275 | a man in a blue jacket is posing for a picture | Jackets & Vests |
| 5 | 0.242 | a man wearing a hat and plaid pants | Jackets & Vests |
| 6 | 0.254 | a man in a black shirt and black pants | Jackets & Vests |
MMR Search - Diversity in Action
Look at the difference! Instead of six nearly-identical black jackets, MMR gives us real variety: formal wear with ties, casual tees, layered looks, even a blue jacket and styled outfits with patterned pants.
Yes, some scores dropped slightly - but the browsing experience? Much better. Users can actually explore different styles instead of scrolling through clones.
Pyversity with Qdrant
Let’s use the Pyversity algorithms with the Qdrant engine.
```python
import numpy as np
from pyversity import diversify, Strategy


def apply_pyversity(qdrant_results, strategy=Strategy.MMR, k=10, **strategy_kwargs):
    """Apply Pyversity diversification to Qdrant search results."""
    embeddings = np.array([point.vector for point in qdrant_results.points])
    scores = np.array([point.score for point in qdrant_results.points])

    diversified = diversify(
        embeddings=embeddings,
        scores=scores,
        k=k,
        strategy=strategy,
        **strategy_kwargs,
    )

    # Return reordered results based on the diversified indices
    return [qdrant_results.points[i] for i in diversified.indices], diversified


def diversified_search(client, collection_name, query_embedding,
                       strategy=Strategy.MMR, k=10, candidates_limit=100,
                       **strategy_kwargs):
    """Search Qdrant, then apply Pyversity diversification to the candidates."""
    results = client.query_points(
        collection_name=collection_name,
        query=query_embedding.tolist(),
        limit=candidates_limit,
        with_payload=True,
        with_vectors=True,  # vectors are required for diversification
    )
    # Returns the (reordered points, diversify result) tuple from apply_pyversity
    return apply_pyversity(results, strategy=strategy, k=k, **strategy_kwargs)
```
After testing all five strategies on our fashion search, here’s what we observed:
MMR & MSD: Both provided good variety while maintaining relevance. MMR tends to be slightly faster and is a solid default choice. MSD pushes for even more spread across different styles.
DPP: Offers probabilistic diversity with a natural balance. Great when you want to eliminate near-duplicates while keeping results feeling “organic.”
COVER: Ensures broad coverage across the dataset. Best when you need to represent different clusters or categories, though it’s slower on large datasets.
SSD: Sequence-aware diversification. Perfect for feeds where users scroll through results over time - it avoids showing similar items close together.
Start with MMR for general use. Experiment with others based on your specific needs.
The Diversity vs. Relevance Trade-off
Diversification isn’t free - there’s always a balance:
Score drops: Notice how diversified results sometimes have lower similarity scores? That’s expected. We’re trading pure relevance for variety.
Computational cost: Fetching 100 candidates and diversifying to 10 is slower than just grabbing the top 10. But for most applications, the added latency (milliseconds) is worth the improved user experience.
Sweet spot: In our tests, fetching 100 candidates and diversifying to 5-10 results gave the best balance. Too few candidates limits diversity options; too many adds unnecessary overhead.
The payoff: Better user engagement, reduced bias, and more satisfied customers who feel they’re seeing real choices.