<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Module 3: Scalability and Optimization on Qdrant - Vector Search Engine</title><link>https://deploy-preview-2256--condescending-goldwasser-91acf0.netlify.app/course/multi-vector-search/module-3/</link><description>Recent content in Module 3: Scalability and Optimization on Qdrant - Vector Search Engine</description><generator>Hugo</generator><language>en-us</language><managingEditor>info@qdrant.tech (Andrey Vasnetsov)</managingEditor><webMaster>info@qdrant.tech (Andrey Vasnetsov)</webMaster><atom:link href="https://deploy-preview-2256--condescending-goldwasser-91acf0.netlify.app/course/multi-vector-search/module-3/index.xml" rel="self" type="application/rss+xml"/><item><title>Multi-Stage Retrieval with Universal Query API</title><link>https://deploy-preview-2256--condescending-goldwasser-91acf0.netlify.app/course/multi-vector-search/module-3/multi-stage-retrieval/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://deploy-preview-2256--condescending-goldwasser-91acf0.netlify.app/course/multi-vector-search/module-3/multi-stage-retrieval/</guid><description>&lt;div class="date">
 &lt;img class="date-icon" src="https://deploy-preview-2256--condescending-goldwasser-91acf0.netlify.app/icons/outline/date-blue.svg" alt="Calendar" /> Module 3 
&lt;/div>

&lt;h1 id="multi-stage-retrieval-with-universal-query-api">Multi-Stage Retrieval with Universal Query API&lt;/h1>
&lt;p>The most effective production deployments combine multiple optimization techniques in multi-stage pipelines. Fast approximate methods retrieve candidates, which are then reranked with higher-quality methods.&lt;/p>
&lt;p>Qdrant&amp;rsquo;s Universal Query API makes it easy to build sophisticated multi-stage retrieval systems.&lt;/p>
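&lt;p>A toy sketch of the two-stage idea in plain NumPy (hypothetical data and shapes, not the Qdrant client API itself): a cheap pooled-vector dot product prefetches candidates, and exact MaxSim reranks only those candidates.&lt;/p>

```python
# Toy two-stage pipeline sketch (hypothetical data, not the Qdrant API):
# stage 1 scores documents cheaply with one pooled vector per document,
# stage 2 reranks only the top candidates with the exact (but costly) MaxSim.
import numpy as np

rng = np.random.default_rng(0)
n_docs, n_tokens, dim = 100, 8, 16
doc_tokens = rng.normal(size=(n_docs, n_tokens, dim))  # multi-vector docs
doc_pooled = doc_tokens.mean(axis=1)                   # cheap 1-vector proxy
query_tokens = rng.normal(size=(4, dim))               # multi-vector query
query_pooled = query_tokens.mean(axis=0)

# Stage 1: fast dot-product prefetch over the pooled vectors.
candidates = np.argsort(doc_pooled @ query_pooled)[::-1][:10]

# Stage 2: exact MaxSim rerank, computed only for the 10 candidates.
def maxsim(q, d):
    # for each query token, take its best-matching doc token, then sum
    return (q @ d.T).max(axis=1).sum()

scores = [(i, maxsim(query_tokens, doc_tokens[i])) for i in candidates]
ranked = sorted(scores, key=lambda s: -s[1])
print(ranked[0])  # best candidate id and its MaxSim score
```

&lt;p>Qdrant&amp;rsquo;s Universal Query API expresses the same shape declaratively with a &lt;code>prefetch&lt;/code> clause, as the lesson shows.&lt;/p>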
&lt;hr>
&lt;div class="video">
&lt;iframe
 src="https://www.youtube-nocookie.com/embed/qIjPepsY35E?rel=0"
 frameborder="0"
 allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share"
 referrerpolicy="strict-origin-when-cross-origin"
 allowfullscreen>
&lt;/iframe>
&lt;/div>
&lt;hr>
&lt;p>&lt;strong>Follow along in Colab:&lt;/strong> &lt;a href="https://colab.research.google.com/github/qdrant/examples/blob/master/course-multi-vector-search/module-3/multi-stage-retrieval.ipynb">
&lt;img src="https://colab.research.google.com/assets/colab-badge.svg" style="display:inline; margin:0;" alt="Open In Colab"/>
&lt;/a>&lt;/p>
&lt;hr>
&lt;h2 id="why-multi-stage-retrieval">Why Multi-Stage Retrieval?&lt;/h2>
&lt;p>You&amp;rsquo;ve learned that multi-vector representations like ColBERT provide superior search quality compared to single-vector embeddings. But there&amp;rsquo;s a challenge: &lt;strong>computing MaxSim for every document in a large collection is expensive&lt;/strong>.&lt;/p></description></item><item><title>Vector Quantization Techniques</title><link>https://deploy-preview-2256--condescending-goldwasser-91acf0.netlify.app/course/multi-vector-search/module-3/quantization-techniques/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://deploy-preview-2256--condescending-goldwasser-91acf0.netlify.app/course/multi-vector-search/module-3/quantization-techniques/</guid><description>&lt;div class="date">
 &lt;img class="date-icon" src="https://deploy-preview-2256--condescending-goldwasser-91acf0.netlify.app/icons/outline/date-blue.svg" alt="Calendar" /> Module 3 
&lt;/div>

&lt;h1 id="vector-quantization-techniques">Vector Quantization Techniques&lt;/h1>
&lt;p>Vector quantization compresses vectors by reducing the precision of each component. Qdrant supports several quantization methods that can reduce memory usage by 4x to 64x, often with only minimal quality loss.&lt;/p>
&lt;p>Choosing the right quantization method depends on your quality requirements and memory constraints.&lt;/p>
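&lt;p>A minimal sketch of the simplest variant, scalar (int8) quantization: map float32 components to 256 levels, then reconstruct and measure the error. This mimics the idea, not Qdrant&amp;rsquo;s internal implementation.&lt;/p>

```python
# Scalar quantization sketch: 1 byte per component instead of 4 (a 4x
# memory reduction), at the cost of a small, bounded reconstruction error.
import numpy as np

rng = np.random.default_rng(1)
vec = rng.normal(size=256).astype(np.float32)

lo, hi = vec.min(), vec.max()
step = (hi - lo) / 255.0
codes = np.round((vec - lo) / step).astype(np.uint8)  # 1 byte per component
restored = codes.astype(np.float32) * step + lo       # dequantize

print("bytes before:", vec.nbytes, "after:", codes.nbytes)  # 4x smaller
err = np.abs(vec - restored).max()
print("max abs error:", float(err))
```

&lt;p>Binary quantization pushes the same trade further, keeping one bit per component for a 32x reduction.&lt;/p>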
&lt;hr>
&lt;div class="video">
&lt;iframe
 src="https://www.youtube-nocookie.com/embed/we-AEfiXaow?rel=0"
 frameborder="0"
 allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share"
 referrerpolicy="strict-origin-when-cross-origin"
 allowfullscreen>
&lt;/iframe>
&lt;/div>
&lt;hr>
&lt;p>&lt;strong>Follow along in Colab:&lt;/strong> &lt;a href="https://colab.research.google.com/github/qdrant/examples/blob/master/course-multi-vector-search/module-3/quantization-techniques.ipynb">
&lt;img src="https://colab.research.google.com/assets/colab-badge.svg" style="display:inline; margin:0;" alt="Open In Colab"/>
&lt;/a>&lt;/p>
&lt;hr>
&lt;h2 id="the-memory-challenge-with-multi-vector-models">The Memory Challenge with Multi-Vector Models&lt;/h2>
&lt;p>By default, embedding models produce vectors with &lt;strong>float32 precision&lt;/strong> - each component uses 32 bits (4 bytes) of memory. For single-vector embeddings, this is manageable. But multi-vector models like &lt;strong>ColModernVBERT&lt;/strong> change the equation dramatically.&lt;/p></description></item><item><title>Pooling Techniques</title><link>https://deploy-preview-2256--condescending-goldwasser-91acf0.netlify.app/course/multi-vector-search/module-3/pooling-techniques/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://deploy-preview-2256--condescending-goldwasser-91acf0.netlify.app/course/multi-vector-search/module-3/pooling-techniques/</guid><description>&lt;div class="date">
 &lt;img class="date-icon" src="https://deploy-preview-2256--condescending-goldwasser-91acf0.netlify.app/icons/outline/date-blue.svg" alt="Calendar" /> Module 3 
&lt;/div>

&lt;h1 id="pooling-techniques">Pooling Techniques&lt;/h1>
&lt;p>While quantization reduces the size of each vector, pooling reduces the number of vectors per document. By intelligently combining token embeddings, you can achieve significant memory savings while preserving retrieval quality.&lt;/p>
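&lt;p>As a minimal illustration (hypothetical shapes; real pipelines pool a model&amp;rsquo;s actual token embeddings), mean pooling over fixed-size token windows turns 32 token vectors into 8 pooled vectors, a 4x reduction in vector count:&lt;/p>

```python
# Mean pooling over fixed-size token windows: every 4 consecutive token
# embeddings are averaged into one pooled vector.
import numpy as np

rng = np.random.default_rng(2)
tokens = rng.normal(size=(32, 128))        # 32 token embeddings, dim 128

window = 4
pooled = tokens.reshape(-1, window, 128).mean(axis=1)
print(tokens.shape, "->", pooled.shape)    # 32 vectors become 8
```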
&lt;hr>
&lt;div class="video">
&lt;iframe
 src="https://www.youtube-nocookie.com/embed/idDXBOrIuik?rel=0"
 frameborder="0"
 allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share"
 referrerpolicy="strict-origin-when-cross-origin"
 allowfullscreen>
&lt;/iframe>
&lt;/div>
&lt;hr>
&lt;p>&lt;strong>Follow along in Colab:&lt;/strong> &lt;a href="https://colab.research.google.com/github/qdrant/examples/blob/master/course-multi-vector-search/module-3/pooling-techniques.ipynb">
&lt;img src="https://colab.research.google.com/assets/colab-badge.svg" style="display:inline; margin:0;" alt="Open In Colab"/>
&lt;/a>&lt;/p>
&lt;hr>
&lt;h2 id="pooling-in-embedding-models">Pooling in Embedding Models&lt;/h2>
&lt;p>Pooling isn&amp;rsquo;t new to vector search - it&amp;rsquo;s fundamental to how most embedding models work. When you encode text with models like Sentence Transformers, the model first generates embeddings for each token in your input. But to create a single vector representing the entire text, the model must &lt;strong>pool&lt;/strong> these token embeddings together.&lt;/p></description></item><item><title>MUVERA</title><link>https://deploy-preview-2256--condescending-goldwasser-91acf0.netlify.app/course/multi-vector-search/module-3/muvera/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://deploy-preview-2256--condescending-goldwasser-91acf0.netlify.app/course/multi-vector-search/module-3/muvera/</guid><description>&lt;div class="date">
 &lt;img class="date-icon" src="https://deploy-preview-2256--condescending-goldwasser-91acf0.netlify.app/icons/outline/date-blue.svg" alt="Calendar" /> Module 3 
&lt;/div>

&lt;h1 id="muvera">MUVERA&lt;/h1>
&lt;p>MUVERA (Multi-Vector Retrieval via Fixed Dimensional Encodings) solves a fundamental problem: MaxSim&amp;rsquo;s asymmetry makes traditional indexing methods like HNSW ineffective. MUVERA enables fast approximate search for multi-vector representations.&lt;/p>
&lt;p>Understanding MUVERA is key to scaling multi-vector search to millions of documents.&lt;/p>
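&lt;p>The asymmetry is easy to see on toy vectors: swapping the roles of &amp;ldquo;query&amp;rdquo; and &amp;ldquo;document&amp;rdquo; changes the MaxSim score, so MaxSim is not a symmetric distance the way cosine or Euclidean are.&lt;/p>

```python
# Demonstrating MaxSim's asymmetry on toy token matrices.
import numpy as np

def maxsim(q, d):
    # best-matching document token per query token, summed over query tokens
    return (q @ d.T).max(axis=1).sum()

a = np.array([[1.0, 0.0], [0.0, 1.0]])               # 2 "query" tokens
b = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, -1.0]])  # 3 "document" tokens

print(maxsim(a, b), maxsim(b, a))  # the two directions differ
```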
&lt;hr>
&lt;div class="video">
&lt;iframe
 src="https://www.youtube-nocookie.com/embed/-r0Apuy0c8k?rel=0"
 frameborder="0"
 allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share"
 referrerpolicy="strict-origin-when-cross-origin"
 allowfullscreen>
&lt;/iframe>
&lt;/div>
&lt;hr>
&lt;p>&lt;strong>Follow along in Colab:&lt;/strong> &lt;a href="https://colab.research.google.com/github/qdrant/examples/blob/master/course-multi-vector-search/module-3/muvera.ipynb">
&lt;img src="https://colab.research.google.com/assets/colab-badge.svg" style="display:inline; margin:0;" alt="Open In Colab"/>
&lt;/a>&lt;/p>
&lt;hr>
&lt;h2 id="the-hnsw-incompatibility-problem">The HNSW Incompatibility Problem&lt;/h2>
&lt;p>Traditional vector indexes like HNSW are designed for single-vector search with symmetric distance metrics. Multi-vector representations break this assumption: &lt;strong>MaxSim is inherently asymmetric and non-metric&lt;/strong>.&lt;/p></description></item><item><title>Evaluating Search Pipelines</title><link>https://deploy-preview-2256--condescending-goldwasser-91acf0.netlify.app/course/multi-vector-search/module-3/evaluating-pipelines/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://deploy-preview-2256--condescending-goldwasser-91acf0.netlify.app/course/multi-vector-search/module-3/evaluating-pipelines/</guid><description>&lt;div class="date">
 &lt;img class="date-icon" src="https://deploy-preview-2256--condescending-goldwasser-91acf0.netlify.app/icons/outline/date-blue.svg" alt="Calendar" /> Module 3 
&lt;/div>

&lt;h1 id="evaluating-search-pipelines">Evaluating Search Pipelines&lt;/h1>
&lt;p>Throughout this module, you&amp;rsquo;ve learned many optimization techniques: quantization to reduce memory, pooling to compress representations, MUVERA for efficient indexing, and multi-stage retrieval to balance speed with accuracy. But how do you know which combination is right for &lt;em>your&lt;/em> data?&lt;/p>
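&lt;p>One standard way to quantify whether a pipeline returns the right documents is recall@k. A minimal sketch, using hypothetical relevance judgments for a single query:&lt;/p>

```python
# recall@k: of the documents judged relevant, what fraction appears in the
# system's top-k results? The ids below are hypothetical judgments.
def recall_at_k(relevant, ranked, k):
    hits = len(set(relevant).intersection(ranked[:k]))
    return hits / len(relevant)

relevant = [3, 7, 42]            # judged relevant for this query
ranked = [7, 1, 3, 9, 8, 42]     # system output, best first

print(recall_at_k(relevant, ranked, 3))  # 2 of 3 relevant in top 3
print(recall_at_k(relevant, ranked, 6))  # all 3 recovered at k=6
```

&lt;p>In practice you average such a metric over many queries and compare configurations on the same query set.&lt;/p>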
&lt;p>The answer lies in systematic evaluation across three dimensions: &lt;strong>cost&lt;/strong> (memory and compute resources), &lt;strong>latency&lt;/strong> (query response time), and &lt;strong>quality&lt;/strong> (retrieval accuracy). Cost and latency are straightforward to measure - you can observe memory usage and time queries directly. Quality, however, requires a more principled approach: you need to measure whether your system returns the &lt;em>right&lt;/em> documents.&lt;/p></description></item><item><title>Final Project: Build Your Own Multi-Vector Search System</title><link>https://deploy-preview-2256--condescending-goldwasser-91acf0.netlify.app/course/multi-vector-search/module-3/final-project/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><author>info@qdrant.tech (Andrey Vasnetsov)</author><guid>https://deploy-preview-2256--condescending-goldwasser-91acf0.netlify.app/course/multi-vector-search/module-3/final-project/</guid><description>&lt;div class="date">
 &lt;img class="date-icon" src="https://deploy-preview-2256--condescending-goldwasser-91acf0.netlify.app/icons/outline/date-blue.svg" alt="Calendar" /> Module 3 
&lt;/div>

&lt;h1 id="final-project-build-your-own-multi-vector-search-system">Final Project: Build Your Own Multi-Vector Search System&lt;/h1>
&lt;hr>
&lt;h2 id="your-mission">Your Mission&lt;/h2>
&lt;p>It&amp;rsquo;s time to bring together everything you&amp;rsquo;ve learned about multi-vector search, late interaction models, and production optimization. You&amp;rsquo;ll build a sophisticated document retrieval system that leverages late interaction&amp;rsquo;s token-level matching for superior search quality.&lt;/p>
&lt;p>Your search engine will understand the nuanced relationships between query terms and document content. When someone searches for &amp;ldquo;machine learning applications in healthcare,&amp;rdquo; your system will find documents that discuss relevant concepts even when they use different terminology, thanks to late interaction&amp;rsquo;s fine-grained matching.&lt;/p></description></item></channel></rss>