Pulsar/Kafka — Message Queue

Milvus uses a message queue for:

Write-ahead logging (WAL) — All mutations are logged before processing
Event streaming — Components communicate via pub/sub
Recovery — Rebuild state from logs after failures

Why Message Queues Matter

┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│    Proxy    │────▶│   Pulsar    │◀────│  Streaming  │
│  (produces  │     │    Topic    │     │    Node     │
│   messages) │     │             │     │  (consumes) │
└─────────────┘     └─────────────┘     └─────────────┘
                            │
                     ┌──────┴──────┐
                     │  Retention  │
                     │  (hours/days│
                     │   of data)  │
                     └─────────────┘

Without the message queue, Milvus would lose data on component crashes.

Pulsar vs Kafka

Feature	Pulsar (Default)	Kafka (Alternative)
Multi-tenancy	Native	Requires setup
Geo-replication	Built-in	MirrorMaker
Storage separation	BookKeeper + Broker	Unified
Scaling	Scale brokers independently	Rebalance required
Milvus default	✅ Yes	❌ Optional
Complexity	Higher	Lower

When to Use Kafka

Already have Kafka expertise
Existing Kafka infrastructure
Simpler operational model preferred
Single-tenant deployment

Pulsar Deployment

Standalone (Development)

services:
  pulsar:
    image: apachepulsar/pulsar:3.0.7
    command: >
      /bin/bash -c "
      bin/apply-config-from-env.py conf/standalone.conf &&
      exec bin/pulsar standalone"
    environment:
      PULSAR_STANDALONE_USE_ZOOKEEPER: "false"
      PULSAR_MEM: " -Xms512m -Xmx512m"
    volumes:
      - pulsar-data:/pulsar/data
    ports:
      - "6650:6650"   # Binary protocol
      - "8080:8080"   # HTTP API

Cluster Mode (Production)

# ZooKeeper (metadata)
zookeeper:
  image: apachepulsar/pulsar:3.0.7
  command: bin/pulsar-daemon start zookeeper
  # 3 nodes recommended

# BookKeeper (storage)
bookie:
  image: apachepulsar/pulsar:3.0.7
  command: bin/pulsar-daemon start bookie
  environment:
    zkServers: zookeeper:2181
  volumes:
    - bookie-data:/pulsar/data/bookkeeper
  # 3+ nodes recommended

# Broker
broker:
  image: apachepulsar/pulsar:3.0.7
  command: bin/pulsar-daemon start broker
  environment:
    zookeeperServers: zookeeper:2181
    configurationStoreServers: zookeeper:2181
  # 2+ nodes recommended

Kafka Deployment

services:
  zookeeper:
    image: confluentinc/cp-zookeeper:7.5.0
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
      ZOOKEEPER_TICK_TIME: 2000

  kafka:
    image: confluentinc/cp-kafka:7.5.0
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:9092
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
      KAFKA_AUTO_CREATE_TOPICS_ENABLE: "true"

Milvus Configuration

Using Pulsar

# milvus.yaml
pulsar:
  address: pulsar://pulsar:6650
  webAddress: http://pulsar:8080
  maxMessageSize: 5242880          # 5MB
  tenant: public
  namespace: default
  
msgChannel:
  chanNamePrefix:
    cluster: by-dev
    clusterHash: ""
  
  # Subname prefixes for different components
  subNamePrefix:
    dataSubNamePrefix: data-node
    dataDml: data-node-dml
    dataDql: data-node-dql

Using Kafka

# milvus.yaml
kafka:
  brokerList: kafka:9092
  saslUsername: ""              # For SASL auth
  saslPassword: ""
  saslMechanisms: PLAIN
  securityProtocol: SASL_SSL
  
# Disable Pulsar when using Kafka
# (remove or comment out pulsar section)

Topic and Retention Configuration

Pulsar Retention Policies

# Set retention (keep messages for 24 hours or 10GB)
pulsar-admin namespaces set-retention public/default \
  --size 10G \
  --time 24h

# View retention
pulsar-admin namespaces get-retention public/default

Key Topics Created by Milvus

Topic Pattern	Purpose	Retention
`by-dev-rootcoord-dml`	DDL operations	24h
`by-dev-rootcoord-delta`	Delete records	24h
`by-dev-data-node-*`	Data node messages	24h
`by-dev-proxy-*`	Proxy requests	6h

Milvus Retention Settings

# milvus.yaml
msgChannel:
  # Retention for consume positions
  seekPosition:
    readTimeout: 10s
    
common:
  retentionDuration: 86400        # 24 hours in seconds
  entityExpiration: -1            # Entity TTL (-1 = disabled)

Sizing Guidelines

Pulsar Resources

Component	CPU	RAM	Disk	Nodes
ZooKeeper	0.5	1 GB	10 GB	3
BookKeeper	2	4 GB	500 GB+ SSD	3+
Broker	2	4 GB	-	2+

Throughput Estimates

Workload	Messages/sec	Storage/day
Light (1K inserts/sec)	~5,000	~10 GB
Medium (10K inserts/sec)	~50,000	~100 GB
Heavy (100K inserts/sec)	~500,000	~1 TB

Monitoring

Pulsar Metrics

# Check cluster health
pulsar-admin brokers list

# Topic stats
pulsar-admin topics stats persistent://public/default/by-dev-rootcoord-dml

# Backlog (unconsumed messages)
pulsar-admin topics stats-internal persistent://public/default/by-dev-rootcoord-dml

Key Metrics to Watch

Metric	Warning	Critical
Storage usage	>70%	>85%
Message backlog	>10K	>100K
Publish latency	>100ms	>500ms
Consumer lag	>5 min	>30 min

Troubleshooting

"Message backlog growing"

Cause: Consumers slower than producers Fix:

# Check consumer rates
pulsar-admin topics stats <topic>

# Scale consumers (add more Data Nodes in Milvus)
# Or increase retention if temporary spike

pulsar-admin namespaces set-retention public/default --size 50G --time 48h

"Storage full"

Check:

# BookKeeper disk usage
df -h /pulsar/data/bookkeeper

# Check if compaction is running
pulsar-admin bookies list

Fix:

# Reduce retention (careful: may lose recovery capability)
pulsar-admin namespaces set-retention public/default --size 10G --time 12h

# Add more BookKeeper nodes

"High publish latency"

Causes:

BookKeeper I/O saturation
Network latency
Too few partitions

Fix:

# Check BookKeeper ledger dirs
pulsar-admin bookies list

# Increase partitions for hot topics
pulsar-admin topics partitioned-lookup <topic>

Best Practices

1 Dedicated Pulsar cluster — Don't share with other applications
2 SSD for BookKeeper — Critical for write latency
3 Monitor backlog — Growing backlog = lost data risk
4 Set alerts — Storage, backlog, latency thresholds
5 Regular compaction — Prevents unbounded growth
6 Backup ZooKeeper — Metadata loss = cluster loss
7 Separate network — Use dedicated NICs if possible

Next Steps

Learn about Milvus configuration:

→ Understanding milvus.yaml

Pulsar/Kafka — Message Queue

Pulsar/Kafka — Message Queue

Why Message Queues Matter

Pulsar vs Kafka

When to Use Kafka

Pulsar Deployment

Standalone (Development)

Cluster Mode (Production)

Kafka Deployment

Milvus Configuration

Using Pulsar

Using Kafka

Topic and Retention Configuration

Pulsar Retention Policies

Key Topics Created by Milvus

Milvus Retention Settings

Sizing Guidelines

Pulsar Resources

Throughput Estimates

Monitoring

Pulsar Metrics

Key Metrics to Watch

Troubleshooting

"Message backlog growing"

"Storage full"

"High publish latency"

Best Practices

Next Steps

Discussion