Explore the Qdrant Blog core concepts | Part 1

blogging

til

qdrant

This is part 1 of a series where I spend 2 hours a day exploring and summarizing the Qdrant blog

Author

kareem

Published

May 18, 2025

Overview of Qdrant features and Concepts

I will divide the blogs into 3 types:

Startups using Qdrant: why they chose it and how it compares. You get to read nice use cases with some comparison numbers that show how well Qdrant performs.
Startups using Qdrant++: the same, but with code snippets and system design discussion. Very useful for me as a developer.
Qdrant releases: new features in Qdrant and how to use them.

Remember this when I build Agentic RAG in the new job:

Users tend to ask more structured, analytical questions when they know a database is involved, and such queries are better suited to SQL than to vector search. This prompted the team to pair Qdrant with a text-to-SQL system, blending unstructured and structured query capabilities for a more versatile agent.

Hotel Search with Vectors

Superlinked enhances search by embedding each attribute (text, numbers, categories) into specialized spaces, enabling nuanced, multi-attribute queries.

An LLM interprets user intent, assigning weights to preferences (e.g., price, rating), allowing flexible, business-driven ranking without system redesign.

Hard filters narrow results, while weighted nearest neighbor search ranks them by user preferences.

This unified approach supports multimodal search—combining semantic text, scaled numerical, and categorical data—preserving relationships and preference strengths.

Unlike traditional systems that separate or flatten data, Superlinked enables simultaneous, weighted consideration of all attributes, solving challenges like reconciling results across types and capturing nuanced user intent.
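To make the "hard filter, then weighted nearest neighbor" idea concrete, here is a minimal sketch in plain Python. It is not Superlinked's API: the hotel records, the value ranges used for scaling, and the helper names (`embed_hotel`, `embed_query`, `search`) are all invented for illustration. Each attribute gets its own small space, the spaces are concatenated into one vector, and the query vector carries the preference weights.

```python
import math

# Toy data: every record and value here is made up for illustration.
HOTELS = [
    {"name": "Budget Inn", "city": "Paris", "price": 60.0, "rating": 3.5, "desc_vec": [1.0, 0.0]},
    {"name": "Grand Palace", "city": "Paris", "price": 280.0, "rating": 4.9, "desc_vec": [0.8, 0.6]},
    {"name": "Seaside", "city": "Nice", "price": 90.0, "rating": 4.5, "desc_vec": [0.0, 1.0]},
]


def unit(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def scale(value, low, high):
    """Min-max scale a number into [0, 1] for an assumed value range."""
    return (value - low) / (high - low)


def embed_hotel(hotel):
    """Concatenate one sub-vector per attribute: text, price, rating."""
    text_part = unit(hotel["desc_vec"])
    price_part = [1.0 - scale(hotel["price"], 50.0, 300.0)]  # cheaper -> closer to 1
    rating_part = [scale(hotel["rating"], 1.0, 5.0)]         # better -> closer to 1
    return text_part + price_part + rating_part


def embed_query(text_vec, weights):
    """Build the query vector; the weights stretch each attribute's sub-space."""
    return [weights["text"] * x for x in unit(text_vec)] + [weights["price"], weights["rating"]]


def search(text_vec, weights, city, top_k=5):
    """Hard-filter by city, then rank the survivors by weighted dot product."""
    query = embed_query(text_vec, weights)
    candidates = [h for h in HOTELS if h["city"] == city]
    scored = [(sum(q * x for q, x in zip(query, embed_hotel(h))), h["name"]) for h in candidates]
    scored.sort(reverse=True)
    return [name for _, name in scored[:top_k]]


price_first = search([1.0, 0.0], {"text": 0.2, "price": 1.0, "rating": 0.2}, city="Paris")
rating_first = search([1.0, 0.0], {"text": 0.2, "price": 0.1, "rating": 2.0}, city="Paris")
```

Changing only the weights flips the ranking between the two Paris hotels, which is the "business-driven ranking without system redesign" point: in this sketch the LLM's job would be to produce the `weights` dictionary from the user's request.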

Reciprocal Rank Fusion (RRF)

Qdrant’s native support for Reciprocal Rank Fusion (RRF) streamlined their retriever implementations, reducing hybrid search code by 80%. The multi-vector capabilities also enabled more sophisticated retrieval methods that better captured semantic relationships.
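For my own reference, this is roughly what RRF computes, as a small plain-Python sketch. It is not Qdrant's implementation, just the standard formula: each document gets 1 / (k + rank) from every result list it appears in, and the contributions are summed. The constant k = 60 is the commonly used default and is an assumption here, as are the toy document IDs.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    rankings: list of lists, each ordered best-first.
    k: smoothing constant; 60 is a common choice (assumed, not from the source).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


dense_results = ["a", "b", "c"]   # e.g. from a dense vector search
sparse_results = ["b", "d", "a"]  # e.g. from a sparse / keyword search
fused = reciprocal_rank_fusion([dense_results, sparse_results])
print(fused)  # ['b', 'a', 'd', 'c']
```

Documents that appear in both lists ("a" and "b") float to the top, and only ranks are used, so the raw scores of the dense and sparse retrievers never need to be put on the same scale.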

Qdrant 1.13 GPU Indexing

Here is a summary of the new features in this release.

GPU Accelerated Indexing with Qdrant

You can index on all major GPU vendors that support the Vulkan API, including NVIDIA, AMD, and Intel, and get indexing speeds up to 10x faster than CPU-based methods.

As of right now this solution supports only on-premises deployments, but the Qdrant team plans to introduce support for Qdrant Cloud shortly.

Additional benefits:

Multi-GPU support
GPU indexing supports all quantization options and datatypes in Qdrant

Strict Mode for Operational Control

Strict Mode enforces operational controls in distributed Qdrant deployments. It limits resource-intensive operations (like unindexed filtering and large batch sizes), sets boundaries on search parameters, and adds safeguards for payload sizes and timeouts. This prevents system overload, solves the “noisy neighbor” problem, and ensures reliable performance—especially in multi-tenant or serverless environments.
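To picture what these controls do, here is a toy request validator in plain Python. This is only an illustration of the idea and not Qdrant's strict mode API: the `Limits` fields, their default values, and the function names are hypothetical and should not be read as Qdrant's real configuration keys.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limits:
    """Hypothetical per-collection limits (names and values are illustrative)."""
    max_query_limit: int = 100
    max_batch_size: int = 500
    max_timeout_s: int = 10
    allow_unindexed_filter: bool = False


def check_search(limits, limit, timeout_s, filter_fields, indexed_fields):
    """Return a list of reasons to reject a search request (empty means OK)."""
    problems = []
    if limit > limits.max_query_limit:
        problems.append(f"limit {limit} exceeds {limits.max_query_limit}")
    if timeout_s > limits.max_timeout_s:
        problems.append(f"timeout {timeout_s}s exceeds {limits.max_timeout_s}s")
    unindexed = sorted(set(filter_fields) - set(indexed_fields))
    if unindexed and not limits.allow_unindexed_filter:
        problems.append(f"filter on unindexed fields: {unindexed}")
    return problems


def check_upsert(limits, batch_size):
    """Return a list of reasons to reject an upsert request (empty means OK)."""
    if batch_size > limits.max_batch_size:
        return [f"batch of {batch_size} exceeds {limits.max_batch_size}"]
    return []


limits = Limits()
ok = check_search(limits, limit=10, timeout_s=5,
                  filter_fields=["city"], indexed_fields=["city", "price"])
rejected = check_search(limits, limit=10_000, timeout_s=60,
                        filter_fields=["color"], indexed_fields=["city", "price"])
```

The point is that expensive requests are rejected up front with a clear reason, instead of being allowed to slow down every other tenant sharing the cluster.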

HNSW Graph Compression

Make search lighter on memory without sacrificing speed with Delta Encoding.

Delta Encoding is a clever way to compress data by storing only the differences (or “deltas”) between values. It’s commonly used in search engines (for the classical inverted index) to save space and improve performance. I think I have read that ColBERTv2 uses a similar technique, called residual compression, to reduce index size; this needs more searching.

It’s now used for HNSW graph structure that powers Qdrant’s search.
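A small sketch of why this helps for a graph: the neighbor list of a node is a set of point IDs, and if the IDs are sorted, the gaps between them are much smaller than the IDs themselves. The code below is plain Python and only illustrates the principle. The variable-length byte count (`varint_size`) is my assumption for showing the saving; the source does not describe Qdrant's actual storage layout, and the neighbor IDs are made up.

```python
def delta_encode(sorted_ids):
    """Store the first ID, then only the gap to the previous ID."""
    deltas, previous = [], 0
    for point_id in sorted_ids:
        deltas.append(point_id - previous)
        previous = point_id
    return deltas


def delta_decode(deltas):
    """Rebuild the original IDs with a running sum."""
    ids, total = [], 0
    for delta in deltas:
        total += delta
        ids.append(total)
    return ids


def varint_size(n):
    """Bytes needed for n in a 7-bits-per-byte variable-length encoding."""
    size = 1
    while n >= 128:
        n >>= 7
        size += 1
    return size


neighbors = [1_000_000, 1_000_003, 1_000_010, 1_000_200]  # made-up neighbor IDs
deltas = delta_encode(neighbors)                 # [1000000, 3, 7, 190]
raw_bytes = sum(varint_size(i) for i in neighbors)    # 12
packed_bytes = sum(varint_size(d) for d in deltas)    # 7
```

Decoding is just a running sum, so reading a neighbor list stays cheap, which fits the "lighter on memory without sacrificing speed" claim.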

Static Embedding with Qdrant and Model2vec

Static embeddings from MinishLab reduce model size by 15x and give up to a 500x speed increase while maintaining more than 85% of the performance. It’s like our zaraah model for Arabic.

Static embeddings are dense embeddings, so you can use them with Qdrant collections like any other dense vectors. Retrieval itself will not be any faster because of static embeddings; the speedup is in creating the vectors from your data and encoding the queries.
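Here is a toy sketch of why encoding is so cheap. A static embedding model is essentially a lookup table from token to vector plus pooling, with no transformer forward pass. The code is plain Python and does not use the Model2Vec library: the three-dimensional vectors, the tiny vocabulary, and the whitespace tokenizer are all made up for illustration.

```python
# Hypothetical token table; a real model ships tens of thousands of rows.
TOKEN_VECTORS = {
    "vector": [0.9, 0.1, 0.0],
    "search": [0.8, 0.2, 0.1],
    "hotel": [0.0, 0.9, 0.3],
}
UNKNOWN = [0.0, 0.0, 0.0]  # fallback for tokens missing from the table


def embed(text):
    """Embed text by looking up each token and averaging the vectors."""
    tokens = text.lower().split()  # naive whitespace tokenizer (assumption)
    if not tokens:
        return list(UNKNOWN)
    vectors = [TOKEN_VECTORS.get(token, UNKNOWN) for token in tokens]
    dim = len(UNKNOWN)
    return [sum(vec[i] for vec in vectors) / len(vectors) for i in range(dim)]


query_vec = embed("vector search")
same_words_reordered = embed("search vector")
```

The result is an ordinary dense vector, so it goes into a Qdrant collection like any other. The trade-off is visible in the sketch too: word order and context are ignored, so "vector search" and "search vector" get the same embedding.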

If you want to make retrieval faster, use the following:

1. Matryoshka Embeddings
2. Quantization methods (Scalar and Binary Quantization)

When to use Static Embeddings?

Mobile applications - although many smartphones have powerful CPUs or even GPUs, the battery life is still a concern, and the static embeddings might be a good compromise between the quality and the power consumption. Moreover, the static embeddings can be used in the applications that require offline mode.
Web browser extensions - running a transformer-based model in a web browser is usually not quite an option, but static embeddings might be a good choice, as they have fewer parameters and are faster to encode.
Embedded systems - the static embeddings might be a good choice for the devices with limited computational power, such as IoT devices or microcontrollers.

References:

Some of the text here comes from Qdrant, not from me. You can continue reading on the Qdrant blog.