Operations on Qdrant - Vector Search Engine

Capacity Planning

info@qdrant.tech (Andrey Vasnetsov) — Mon, 01 Jan 0001 00:00:00 +0000

Capacity Planning

When setting up your cluster, you’ll need to figure out the right balance of RAM and disk storage. The best setup depends on a few things:

How many vectors you have and their dimensions.
The amount of payload data you store and the indexes on it.
What data you want to store in memory versus on disk.
Your cluster’s replication settings.
Whether you’re using quantization and how you’ve set it up.

Calculating RAM size

You should store frequently accessed data in RAM for faster retrieval. If you want to keep all vectors in memory for optimal performance, you can use this rough formula for estimation:
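The formula itself is not reproduced in this excerpt. As a minimal sketch of such an estimate, the snippet below assumes float32 components (4 bytes each) and a 1.5x multiplier for index structures and metadata. Both values are assumptions for illustration, and the function name `estimate_ram_bytes` is hypothetical. Payloads, replication, and quantization are not accounted for.

```python
def estimate_ram_bytes(
    num_vectors: int,
    dimensions: int,
    bytes_per_component: int = 4,  # assumption: float32 components
    overhead: float = 1.5,         # assumption: extra room for index and metadata
) -> int:
    """Rough RAM estimate for keeping all vectors in memory."""
    if num_vectors < 0 or dimensions < 0:
        raise ValueError("num_vectors and dimensions must be non-negative")
    return int(num_vectors * dimensions * bytes_per_component * overhead)


# Example: one million 1024-dimensional vectors.
estimate = estimate_ram_bytes(1_000_000, 1024)
print(f"{estimate / 1024**3:.2f} GiB")  # prints "5.72 GiB"
```

Treat the result as a starting point for sizing, then validate it against the memory usage you observe on a real workload.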

Installation

Installation requirements

The following sections describe the requirements for deploying Qdrant.

CPU and memory

The preferred size of your CPU and RAM depends on:

Number of vectors
Vector dimensions
Payloads and their indexes
Storage
Replication
How you configure quantization

Our Cloud Pricing Calculator can help you estimate required resources without payload or index data.

Supported CPU architectures:

64-bit system:

x86_64/amd64
AArch64/arm64

32-bit system:

Not supported

Storage

For persistent storage, Qdrant requires block-level access to storage devices with a POSIX-compatible file system. Network systems such as iSCSI that provide block-level access are also acceptable. Qdrant won’t work with network file systems such as NFS or object storage systems such as S3.

Upgrades

Upgrading Qdrant

If you are several versions behind, you may need multiple upgrades to reach the latest version. Upgrade through each intermediate minor version in turn, using its latest patch release. For example, if you are running version 1.15 and want to upgrade to 1.17, you must first upgrade all cluster nodes to 1.16.3 before upgrading to 1.17. A Qdrant node running version 1.17 is compatible with a node running version 1.16, but not with a node running version 1.15. Even on a single-node cluster you cannot skip versions, so that all data migrations are properly applied. Qdrant Cloud does this automatically for you.
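The stepping rule can be sketched as a small planner. This is an illustration only: the function name `upgrade_path` is hypothetical, and the caller has to supply the map of latest patch releases. In the example, `1.16.3` comes from the text above, while `1.17.0` is a placeholder rather than a statement about actual releases.

```python
def upgrade_path(current: str, target: str, latest_patch: dict[str, str]) -> list[str]:
    """Return the versions to install, in order, to go from `current` to `target`.

    `current` and `target` are "major.minor" strings. `latest_patch` maps each
    "major.minor" to the latest patch release of that minor version.
    """
    cur_major, cur_minor = (int(part) for part in current.split("."))
    tgt_major, tgt_minor = (int(part) for part in target.split("."))
    if cur_major != tgt_major:
        raise ValueError("this sketch only handles upgrades within one major version")
    if tgt_minor < cur_minor:
        raise ValueError("target version is older than current version")

    path = []
    # Visit every minor version after the current one; none may be skipped.
    for minor in range(cur_minor + 1, tgt_minor + 1):
        key = f"{cur_major}.{minor}"
        if key not in latest_patch:
            raise KeyError(f"no latest patch release known for {key}")
        path.append(latest_patch[key])
    return path


# "1.17.0" is a placeholder patch version used only for this example.
releases = {"1.16": "1.16.3", "1.17": "1.17.0"}
print(upgrade_path("1.15", "1.17", releases))
```

In a cluster, each step in the returned list would be rolled out to all nodes before moving on to the next one.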

Snapshots

Available as of v0.8.4

Snapshots are tar archive files that contain data and configuration of a specific collection on a specific node at a specific time. In a distributed setup, when you have multiple nodes in your cluster, you must create snapshots for each node separately when dealing with a single collection.
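Because snapshots are per node, a script has to contact every node in turn. The sketch below assumes each node is reachable over the REST API and that a snapshot is created with `POST /collections/{collection_name}/snapshots`. The node URLs and collection name in the usage note are hypothetical, and authentication is left out.

```python
import json
import urllib.parse
import urllib.request


def snapshot_urls(node_urls: list[str], collection: str) -> list[str]:
    """Build the snapshot endpoint URL for one collection on every node."""
    name = urllib.parse.quote(collection, safe="")
    return [f"{url.rstrip('/')}/collections/{name}/snapshots" for url in node_urls]


def create_snapshots(node_urls: list[str], collection: str, timeout: float = 300.0) -> dict:
    """Request a snapshot on each node and collect the JSON responses by URL."""
    results = {}
    for url in snapshot_urls(node_urls, collection):
        request = urllib.request.Request(url, data=b"", method="POST")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            results[url] = json.load(response)
    return results
```

For a hypothetical two-node cluster, this would be called as `create_snapshots(["http://node-1:6333", "http://node-2:6333"], "my_collection")`.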

This feature can be used to archive data or easily replicate an existing deployment. For disaster recovery, Qdrant Cloud users may prefer to use Backups instead, which are physical disk-level copies of your data.

Usage Statistics

Usage statistics

By default, the Qdrant open-source container image collects anonymized usage statistics to help improve the engine. You can deactivate this at any time, and any data that has already been collected can be deleted on request.

Deactivating this will not affect your ability to monitor the Qdrant database yourself by accessing the /metrics or /telemetry endpoints of your database. It will just stop sending independent, anonymized usage statistics to the Qdrant team.

Monitoring & Telemetry

Qdrant exposes its metrics in Prometheus/OpenMetrics format, so you can integrate them with compatible tools and monitor Qdrant from your own monitoring system. To do so, configure the /metrics endpoint as a scrape target.

Metrics endpoint: http://localhost:6333/metrics

The integration with Qdrant is easy to configure with Prometheus and Grafana.
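For a quick check without a full Prometheus setup, the endpoint can also be read directly. The sketch below is a deliberately simplified parser for the text exposition format: it assumes each sample line ends with its value (no trailing timestamp) and keeps any label set as part of the key. The helper names are hypothetical.

```python
import urllib.request


def parse_metrics(text: str) -> dict[str, float]:
    """Parse Prometheus text format into {sample name (with labels): value}."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # The value is the last space-separated token on the line.
        name, _, value = line.rpartition(" ")
        samples[name] = float(value)
    return samples


def fetch_metrics(base_url: str = "http://localhost:6333") -> dict[str, float]:
    """Fetch and parse the /metrics endpoint of a running instance."""
    with urllib.request.urlopen(f"{base_url}/metrics", timeout=10) as response:
        return parse_metrics(response.read().decode("utf-8"))
```

For anything beyond ad hoc inspection, a Prometheus scrape job is the more robust choice, since it handles the full format, retention, and alerting.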

Metrics

Qdrant exposes various metrics in Prometheus/OpenMetrics format, commonly used together with Grafana for monitoring.

Security

Qdrant supports various security features to help you secure your instance. Most of these must be explicitly configured to make your instance production-ready. Please read the following section carefully.

Secure Your Instance

By default, self-deployed Qdrant instances are not secure: they listen on all network interfaces and have no authentication configured, so they may be reachable by anyone on the internet without restriction. You must therefore take security measures to make your instance production-ready. Please read through this section carefully for instructions on how to secure your instance.

Troubleshooting

Solving common errors

Too many files open (OS error 24)

Each collection segment needs some files to be open. At some point you may encounter the following errors in your server log:

Error: Too many files open (OS error 24)

In that case, you may need to increase the open-file limit. You can do this, for example, when launching the Docker container:

docker run --ulimit nofile=10000:10000 qdrant/qdrant:latest

The command above will set both soft and hard limits to 10000.
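To see which limits actually apply, they can be read from inside the environment in question. The sketch below uses Python's standard `resource` module (Unix only) and reports the limits of the Python process itself, so it only reflects Qdrant's limits when run in the same container or shell session. The helper names are hypothetical.

```python
import resource


def open_file_limits() -> tuple[int, int]:
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def limit_is_at_least(limit: int, wanted: int) -> bool:
    """True if `limit` is unlimited or not below `wanted`."""
    return limit == resource.RLIM_INFINITY or limit >= wanted


soft, hard = open_file_limits()
print(f"soft={soft} hard={hard}")
if not limit_is_at_least(soft, 10000):
    print("soft limit is below 10000; consider raising it")
```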

Configuration

Qdrant ships with sensible defaults for collection and network settings that are suitable for most use cases. You can view these defaults in the Qdrant source. If you need to customize the settings, you can do so using configuration files and environment variables.
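For environment variables, the sketch below assumes the naming convention of a `QDRANT__` prefix followed by the upper-cased configuration path, with nesting levels separated by double underscores. The helper name `env_var_name` is hypothetical, and the key `service.http_port` is used here as an assumed example setting.

```python
def env_var_name(config_path: str) -> str:
    """Map a dotted configuration path to its environment variable name.

    Assumed convention: "QDRANT" prefix, "__" between nesting levels,
    and upper-case key names.
    """
    parts = config_path.split(".")
    if not all(parts):
        raise ValueError(f"invalid configuration path: {config_path!r}")
    return "QDRANT__" + "__".join(part.upper() for part in parts)


print(env_var_name("service.http_port"))  # prints "QDRANT__SERVICE__HTTP_PORT"
```

Environment variables are convenient for container deployments, where overriding a single value is easier than mounting a full configuration file.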

Configuration Files

To customize Qdrant, you can mount your configuration file in any of the following locations. This guide uses .yaml files, but Qdrant also supports other formats such as .toml, .json, and .ini.

Administration

Qdrant exposes administration tools that let you modify the behavior of a Qdrant instance at runtime without manually changing its configuration.

Recovery mode

Available as of v1.2.0

Recovery mode can help in situations where Qdrant repeatedly fails to start. When starting in recovery mode, Qdrant loads only collection metadata so that it does not run out of memory. This allows you to resolve out-of-memory situations, for example, by deleting a collection. After resolving the problem, you can restart Qdrant normally to continue operation.
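Deleting the offending collection can be done over the REST API. The sketch below assumes the instance is reachable at `http://localhost:6333`, that `DELETE /collections/{collection_name}` removes a collection, and that no authentication is required. The collection name `too_big` is hypothetical.

```python
import urllib.parse
import urllib.request


def delete_collection_request(base_url: str, collection: str) -> urllib.request.Request:
    """Build (but do not send) a DELETE request for one collection."""
    name = urllib.parse.quote(collection, safe="")
    url = f"{base_url.rstrip('/')}/collections/{name}"
    return urllib.request.Request(url, method="DELETE")


request = delete_collection_request("http://localhost:6333", "too_big")
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request, timeout=30)
```

Building the request separately from sending it makes it easy to review the target URL before running a destructive operation.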

Distributed Deployment

Distributed deployment

Since v0.8.0, Qdrant supports a distributed deployment mode. In this mode, multiple Qdrant services communicate with each other to distribute data across peers, which extends storage capacity and increases stability.

How many Qdrant nodes should I run?

The ideal number of Qdrant nodes depends on how much you value cost-saving, resilience, and performance/scalability in relation to each other.

Prioritizing cost-saving: If cost is most important to you, run a single Qdrant node. This is not recommended for production environments. Drawbacks:

Running with GPU

Running Qdrant with GPU Support

Starting from version v1.13.0, Qdrant offers support for GPU acceleration.

However, GPU support is not included in the default Qdrant binary due to additional dependencies and libraries. Instead, you will need to use dedicated Docker images with GPU support (NVIDIA, AMD).

Configuration

Qdrant includes a number of configuration options to control GPU usage. The following options are available:

gpu:
 # Enable GPU indexing.
 indexing: false
 # Force half precision for `f32` values while indexing.
 # `f16` conversion will take place
 # only inside GPU memory and won't affect storage type.
 force_half_precision: false
 # Number of Vulkan "groups" used on the GPU.
 # In other words, how many points can be indexed in parallel by the GPU.
 # The optimal value might depend on the GPU model.
 # It is proportional, but not necessarily equal,
 # to the physical number of warps.
 # Do not change this value unless you know what you are doing.
 # Default: 512
 groups_count: 512
 # Filter for GPU devices by hardware name. Case insensitive.
 # Comma-separated list of substrings to match
 # against the GPU device name.
 # Example: "nvidia"
 # Default: "" - all devices are accepted.
 device_filter: ""
 # List of explicit GPU devices to use.
 # If the host has multiple GPUs, this option allows you to select specific devices
 # by their index in the list of found devices.
 # If `device_filter` is set, indexes are applied after filtering.
 # By default, all devices are accepted.
 devices: null
 # How many parallel indexing processes are allowed to run.
 # Default: 1
 parallel_indexes: 1
 # Allow the use of integrated GPUs.
 # Default: false
 allow_integrated: false
 # Allow the use of emulated GPUs like LLVMpipe. Useful for CI.
 # Default: false
 allow_emulated: false

It is not recommended to change these options unless you are familiar with the Qdrant internals and the Vulkan API.
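The interaction between `device_filter` and `devices` described in the comments above can be illustrated with a short sketch. This mirrors the documented semantics only and is not Qdrant's implementation. The function name, the device names, and the choice to silently ignore out-of-range indexes are all assumptions.

```python
from typing import Optional


def select_devices(
    found: list[str],
    device_filter: str = "",
    devices: Optional[list[int]] = None,
) -> list[str]:
    """Pick GPU devices by name filter first, then by index."""
    # Comma-separated, case-insensitive substrings; empty filter accepts all.
    needles = [part.strip().lower() for part in device_filter.split(",") if part.strip()]
    if needles:
        candidates = [name for name in found if any(n in name.lower() for n in needles)]
    else:
        candidates = list(found)
    if devices is None:
        return candidates
    # Indexes refer to positions in the filtered list, not the original one.
    return [candidates[i] for i in devices if 0 <= i < len(candidates)]


found = ["NVIDIA Example A", "AMD Example B", "NVIDIA Example C"]
print(select_devices(found, device_filter="nvidia", devices=[1]))
```

Note that index `1` selects the second NVIDIA device here, because the AMD device has already been filtered out before the indexes are applied.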

Optimize Performance

Optimizing Qdrant Performance: Three Scenarios

Different use cases require different balances between memory usage, search speed, and precision. Qdrant is designed to be flexible and customizable so you can tune it to your specific needs.

This guide walks you through three main optimization strategies:

High Speed Search & Low Memory Usage
High Precision & Low Memory Usage
High Precision & High Speed Search

1. High-Speed Search with Low Memory Usage

To achieve high search speed with minimal memory usage, you can store vectors on disk while minimizing the number of disk reads. Vector quantization is a technique that compresses vectors, allowing more of them to be stored in memory, thus reducing the need to read from disk.
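To make the memory saving concrete, here is a minimal sketch of scalar quantization in general: each float32 component (4 bytes) is mapped to a single byte using the vector's own minimum and maximum. It illustrates the idea only and is not Qdrant's implementation. The function names and the sample vector are hypothetical.

```python
import struct


def quantize(vector: list[float]) -> tuple[bytes, float, float]:
    """Map each component to one byte; return (codes, offset, scale)."""
    lo, hi = min(vector), max(vector)
    scale = (hi - lo) / 255 or 1.0  # avoid a zero scale for constant vectors
    codes = bytes(min(255, round((x - lo) / scale)) for x in vector)
    return codes, lo, scale


def dequantize(codes: bytes, lo: float, scale: float) -> list[float]:
    """Approximately reconstruct the original components."""
    return [lo + code * scale for code in codes]


vector = [0.0, 0.25, 0.5, 1.0]
original = struct.pack(f"{len(vector)}f", *vector)  # float32 storage
codes, lo, scale = quantize(vector)
print(len(original), len(codes))  # prints "16 4"
```

The quantized form takes a quarter of the space at the cost of a small rounding error per component, which is why quantized vectors can stay in RAM while the originals live on disk.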

Optimizer

As in many other databases, it is much more efficient to apply changes in batches than to perform each change individually, and Qdrant is no exception. Since Qdrant operates with data structures that are not always easy to change, it is sometimes necessary to rebuild those structures completely.

Storage optimization in Qdrant occurs at the segment level (see storage). The segment being optimized remains readable for the duration of the rebuild.