Index Types & Selection

Choose the right index for your vector search. HNSW, IVF, DISKANN, and more — with selection guidelines.

35m total (20m reading, 15m lab)

Choosing the right index is the single most important performance decision. It affects:
  • Query latency — How fast searches return
  • Recall accuracy — How many true neighbors are found
  • Memory usage — How much RAM is required
  • Build time — How long index creation takes

Index Comparison

| Index | Best For | Recall | Memory | Build Time | Query Speed |
|---|---|---|---|---|---|
| FLAT | Small datasets, 100% recall | 100% | 1× | None | Slow |
| HNSW | General purpose, high recall | 95-99% | 1.5-2× | Medium | Fast |
| IVF_FLAT | Balanced speed/recall | 90-95% | 1× | Fast | Medium |
| IVF_SQ8 | Memory constrained | 85-90% | 0.25× | Fast | Medium |
| IVF_PQ | Very large datasets | 80-90% | 0.1× | Slow | Medium |
| DISKANN | Billion-scale, SSD-based | 90-95% | 0.3× | Slow | Medium |
HNSW

Best for: Most production use cases

index_params = client.prepare_index_params()  # MilvusClient helper

index_params.add_index(
    field_name="embedding",
    index_type="HNSW",
    metric_type="COSINE",
    params={
        "M": 16,              # Max connections per node (2-64)
        "efConstruction": 200 # Candidate list size at build time (100-800)
    }
)

# Search-time parameter
search_params = {
    "metric_type": "COSINE",
    "params": {"ef": 64}     # Search scope (top_k to 10×top_k)
}

Parameters explained:

| Parameter | Default | Range | Effect |
|---|---|---|---|
| M | 16 | 2-64 | Higher = better recall, more memory |
| efConstruction | 200 | 100-800 | Higher = better recall, slower build |
| ef (search) | 64 | top_k to 10×top_k | Higher = better recall, slower queries |
Tuning guide:

| Dataset Size | M | efConstruction | ef (search) |
|---|---|---|---|
| <1M | 8-16 | 100-200 | 32-64 |
| 1M-10M | 16-32 | 200-400 | 64-128 |
| 10M-100M | 32-48 | 400-600 | 128-256 |
| >100M | 48-64 | 600-800 | 256-512 |
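The tuning table translates directly into a small helper. This is only a sketch: `suggest_hnsw_params` is a hypothetical name, and it simply picks the upper end of each range from the table as a starting point.

```python
def suggest_hnsw_params(num_vectors: int) -> dict:
    """Map dataset size to starting HNSW parameters from the tuning table.

    Uses the upper bound of each recommended range; tune down from there.
    """
    # (max_size, M, efConstruction, ef) tiers; float("inf") covers >100M
    tiers = [
        (1_000_000,    16, 200, 64),
        (10_000_000,   32, 400, 128),
        (100_000_000,  48, 600, 256),
        (float("inf"), 64, 800, 512),
    ]
    for max_size, m, ef_construction, ef in tiers:
        if num_vectors < max_size:
            return {"M": m, "efConstruction": ef_construction, "ef": ef}
```

Treat the output as a baseline to benchmark against, not a final setting.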

IVF Indexes

Best for: Memory-constrained environments, known query distribution

IVF_FLAT

index_params.add_index(
    field_name="embedding",
    index_type="IVF_FLAT",
    metric_type="L2",
    params={"nlist": 1024}  # Number of clusters
)

search_params = {
    "metric_type": "L2",
    "params": {"nprobe": 16}  # Clusters to search
}
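To see how `nlist` and `nprobe` interact, here is a toy, pure-Python IVF_FLAT. It is not the Milvus implementation (it samples random centroids instead of running k-means), but it shows the mechanics: vectors are bucketed by nearest centroid at build time, and only the `nprobe` closest buckets are scanned at query time. With `nprobe` equal to `nlist` it degenerates to exhaustive search.

```python
import math
import random

def l2(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def build_ivf(vectors, nlist):
    """Assign each vector to its nearest centroid (toy: random centroids)."""
    centroids = random.sample(vectors, nlist)
    lists = [[] for _ in range(nlist)]
    for idx, v in enumerate(vectors):
        nearest = min(range(nlist), key=lambda j: l2(v, centroids[j]))
        lists[nearest].append(idx)
    return centroids, lists

def ivf_search(query, vectors, centroids, lists, nprobe, k):
    """Scan only the nprobe clusters whose centroids are closest to the query."""
    order = sorted(range(len(centroids)), key=lambda j: l2(query, centroids[j]))
    candidates = [i for j in order[:nprobe] for i in lists[j]]
    return sorted(candidates, key=lambda i: l2(query, vectors[i]))[:k]
```

Raising `nprobe` scans more of the dataset per query, which is exactly the recall-for-latency trade the table above describes.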

IVF_SQ8 (Quantized)

index_params.add_index(
    field_name="embedding",
    index_type="IVF_SQ8",
    metric_type="L2",
    params={"nlist": 1024}
)

Memory savings: 75% reduction vs IVF_FLAT
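The 75% figure follows from SQ8 storing one byte per dimension instead of a 4-byte float32. A quick back-of-envelope check, counting raw vector storage only (corpus size and dimension here are illustrative; real indexes add structural overhead):

```python
def vector_storage_bytes(num_vectors: int, dim: int, bytes_per_dim: int) -> int:
    """Raw vector storage only; cluster/graph structures add overhead."""
    return num_vectors * dim * bytes_per_dim

# Hypothetical corpus: 1M vectors, 768 dimensions
flat_bytes = vector_storage_bytes(1_000_000, 768, 4)  # float32: 4 bytes/dim
sq8_bytes = vector_storage_bytes(1_000_000, 768, 1)   # SQ8: 1 byte/dim
savings = 1 - sq8_bytes / flat_bytes                  # 0.75
```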

IVF_PQ (Product Quantization)

index_params.add_index(
    field_name="embedding",
    index_type="IVF_PQ",
    metric_type="L2",
    params={
        "nlist": 1024,
        "m": 16,       # Sub-quantizers
        "nbits": 8     # Bits per quantizer
    }
)

Memory savings: 90% reduction, but lower recall
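PQ compresses each vector into `m × nbits` bits of codes. A quick calculation of the per-vector code size (the index's overall footprint is larger than the codes alone, since codebooks and other structures must also be held in memory):

```python
def pq_code_bytes(m: int, nbits: int) -> float:
    """Bytes of PQ codes per vector: m sub-quantizers, nbits bits each."""
    return m * nbits / 8

codes = pq_code_bytes(16, 8)  # 16 bytes per vector
raw = 768 * 4                 # 3072 bytes for a 768-dim float32 vector
```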

DISKANN (Disk-Based)

Best for: Billion-scale datasets that don't fit in memory

index_params.add_index(
    field_name="embedding",
    index_type="DISKANN",
    metric_type="COSINE",
    params={
        "search_list_size": 16,   # Candidates to consider
        "pq_code_budget_gb": 1,   # PQ memory budget
        "build_dram_budget_gb": 16 # Build memory budget
    }
)

Requirements:
  • SSD storage (NVMe preferred)
  • Enable mmap in queryNode config
  • Higher latency than HNSW but much less RAM

Metric Types

| Metric | Use When | Formula |
|---|---|---|
| L2 | Physical distances | Euclidean distance |
| IP | Cosine-like similarity | Inner product |
| COSINE | Direction matters | Normalized IP |
| HAMMING | Binary vectors | Bit differences |
| JACCARD | Binary similarity | Set intersection |

Most common: COSINE for text embeddings, L2 for image embeddings
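COSINE is just the inner product of the two vectors after normalizing both to unit length, which is why IP is described as "cosine-like". A minimal sketch in plain Python:

```python
import math

def inner_product(a, b):
    """IP metric: sum of elementwise products."""
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    """COSINE metric: IP divided by the product of the vectors' L2 norms."""
    return inner_product(a, b) / (
        math.sqrt(inner_product(a, a)) * math.sqrt(inner_product(b, b))
    )
```

If your embeddings are already unit-normalized (many text-embedding models emit normalized vectors), IP and COSINE rank results identically.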

Index Selection Decision Tree

Dataset fits in memory?
├── NO → DISKANN
└── YES → Need 100% recall?
    ├── YES → FLAT
    └── NO → Query latency critical?
        ├── YES → HNSW
        └── NO → Memory constrained?
            ├── YES → IVF_SQ8 or IVF_PQ
            └── NO → IVF_FLAT or HNSW
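The tree above translates directly into code. A sketch (the function name and boolean inputs are illustrative, not part of any API):

```python
def choose_index(fits_in_memory: bool, need_exact_recall: bool,
                 latency_critical: bool, memory_constrained: bool) -> list:
    """Walk the decision tree, returning candidate index types in order."""
    if not fits_in_memory:
        return ["DISKANN"]
    if need_exact_recall:
        return ["FLAT"]
    if latency_critical:
        return ["HNSW"]
    if memory_constrained:
        return ["IVF_SQ8", "IVF_PQ"]
    return ["IVF_FLAT", "HNSW"]
```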

Performance Testing

Before choosing, benchmark on your data:

import time

def benchmark_index(collection_name, search_params, test_vectors, ground_truth):
    """Benchmark index latency and recall against brute-force ground truth."""

    # Warmup
    client.search(collection_name, data=[test_vectors[0]], limit=10)

    # Latency test
    latencies = []
    for vec in test_vectors[:100]:
        start = time.time()
        client.search(collection_name, data=[vec], limit=10, search_params=search_params)
        latencies.append((time.time() - start) * 1000)  # ms
    latencies.sort()

    # Recall test
    recalls = []
    for i, vec in enumerate(test_vectors):
        results = client.search(
            collection_name,
            data=[vec],
            limit=10,
            search_params=search_params
        )
        returned_ids = {r["id"] for r in results[0]}
        true_ids = set(ground_truth[i][:10])
        recalls.append(len(returned_ids & true_ids) / 10)

    # Percentiles indexed safely, even with fewer than 100 test vectors
    n = len(latencies)
    return {
        "p50_latency": latencies[n // 2],
        "p99_latency": latencies[min(int(n * 0.99), n - 1)],
        "avg_recall": sum(recalls) / len(recalls)
    }
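The `ground_truth` argument can be produced with a brute-force scan over the corpus. A minimal L2 version (assumes the corpus fits in memory and that the stored primary keys are the row positions; adapt the ID mapping to your schema):

```python
def exact_top_k(query, corpus, k=10):
    """Brute-force L2 nearest neighbors; O(n * d) per query.

    Returns corpus row positions, assumed here to match stored IDs.
    """
    def sq_dist(v):
        return sum((x - y) ** 2 for x, y in zip(query, v))
    return sorted(range(len(corpus)), key=lambda i: sq_dist(corpus[i]))[:k]

# ground_truth = [exact_top_k(q, corpus_vectors) for q in test_vectors]
```

For COSINE collections, swap the distance for the similarity (and sort descending) so the ground truth matches the index's metric.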

Best Practices

  1. Start with HNSW — Works well for most cases
  2. Benchmark before deciding — Test on your actual data
  3. Match index to metric — Some indexes only support specific metrics
  4. Plan for growth — Choose an index that works at target scale
  5. Monitor build time — Large IVF_PQ builds can take hours

Next Steps

Learn about query optimization:

Query Performance
