Helm on Kubernetes
Production-grade Milvus deployment on Kubernetes using Helm charts. Auto-scaling, rolling updates, and enterprise features.
This is the production deployment method for Milvus. Kubernetes provides:
- Horizontal scaling — Add/remove nodes automatically
- Rolling updates — Zero-downtime deployments
- Self-healing — Auto-restart failed components
- Resource management — CPU/memory limits and requests
Prerequisites
- Kubernetes cluster (1.24+) with at least 4 nodes
- kubectl configured
- Helm 3.x installed
- Storage class for PVCs
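Before starting, it can save time to confirm the CLI tools from the prerequisites are actually on your PATH; a minimal stdlib sketch (`missing_tools` is a hypothetical helper for illustration, not part of any Milvus tooling):

```python
import shutil

def missing_tools(tools: list[str]) -> list[str]:
    """Return the subset of tools not found on PATH."""
    return [t for t in tools if shutil.which(t) is None]

# On a prepared workstation this should print []:
# print(missing_tools(["kubectl", "helm"]))
```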
Quick Start
1. Add Milvus Helm Repository
helm repo add milvus https://zilliztech.github.io/milvus-helm/
helm repo update
2. Install Milvus (Minimal)
helm install milvus milvus/milvus \
--set cluster.enabled=true \
--set etcd.replicaCount=3 \
--set pulsar.enabled=true \
--set minio.mode=distributed
Wait for deployment:
kubectl get pods -w
Expected Output:
NAME                           READY   STATUS
milvus-etcd-0                  1/1     Running
milvus-etcd-1                  1/1     Running
milvus-etcd-2                  1/1     Running
milvus-minio-0                 1/1     Running
milvus-proxy-7d9f4b8c5-x2k9p   1/1     Running
milvus-rootcoord-0             1/1     Running
...
3. Access Milvus
# Port-forward to local
kubectl port-forward svc/milvus 19530:19530
# Test connection
python -c "from pymilvus import MilvusClient; c = MilvusClient('http://localhost:19530'); print(c.get_server_version())"
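If the connection test hangs, first check whether the port-forward is actually accepting connections; a stdlib sketch (`port_open` is a hypothetical helper; the host and port match the port-forward above):

```python
import socket

def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# With `kubectl port-forward` running, this should report True:
# print(port_open("localhost", 19530))
```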
Production Values File
Create milvus-production.yaml:
# ============================================
# Cluster Configuration
# ============================================
cluster:
  enabled: true

# ============================================
# Proxy (Access Layer)
# ============================================
proxy:
  replicas: 2
  resources:
    requests:
      cpu: 1
      memory: 2Gi
    limits:
      cpu: 2
      memory: 4Gi
  service:
    type: LoadBalancer
    port: 19530
    annotations:
      # AWS NLB example
      service.beta.kubernetes.io/aws-load-balancer-type: "nlb"

# ============================================
# Coordinators
# ============================================
rootCoordinator:
  replicas: 1
  resources:
    requests:
      cpu: 0.5
      memory: 1Gi
    limits:
      cpu: 1
      memory: 2Gi
  # Enable HA with standby
  enableActiveStandby: true

queryCoordinator:
  replicas: 1
  resources:
    requests:
      cpu: 0.5
      memory: 1Gi
    limits:
      cpu: 1
      memory: 2Gi

dataCoordinator:
  replicas: 1
  resources:
    requests:
      cpu: 0.5
      memory: 1Gi
    limits:
      cpu: 1
      memory: 2Gi

indexCoordinator:
  replicas: 1
  resources:
    requests:
      cpu: 0.5
      memory: 1Gi
    limits:
      cpu: 1
      memory: 2Gi

# ============================================
# Workers (Scalable)
# ============================================
queryNode:
  replicas: 3
  resources:
    requests:
      cpu: 2
      memory: 8Gi
    limits:
      cpu: 4
      memory: 16Gi
  # Enable HPA for auto-scaling
  hpa:
    enabled: true
    minReplicas: 3
    maxReplicas: 10
    metrics:
      - type: Resource
        resource:
          name: cpu
          target:
            type: Utilization
            averageUtilization: 70

dataNode:
  replicas: 2
  resources:
    requests:
      cpu: 1
      memory: 4Gi
    limits:
      cpu: 2
      memory: 8Gi

indexNode:
  replicas: 2
  resources:
    requests:
      cpu: 2
      memory: 4Gi
    limits:
      cpu: 4
      memory: 16Gi

# ============================================
# Dependencies
# ============================================
# etcd Configuration
etcd:
  enabled: true
  replicaCount: 3
  resources:
    requests:
      cpu: 1
      memory: 2Gi
    limits:
      cpu: 2
      memory: 4Gi
  persistence:
    enabled: true
    size: 20Gi
    storageClass: "standard"

# MinIO Configuration
minio:
  enabled: true
  mode: distributed
  drivesPerNode: 1
  replicas: 4
  resources:
    requests:
      cpu: 1
      memory: 2Gi
    limits:
      cpu: 2
      memory: 4Gi
  persistence:
    enabled: true
    size: 500Gi
    storageClass: "standard"

# Pulsar Configuration
pulsar:
  enabled: true
  components:
    zookeeper: true
    bookkeeper: true
    broker: true
    proxy: false
  zookeeper:
    replicaCount: 3
    resources:
      requests:
        cpu: 0.5
        memory: 1Gi
  bookkeeper:
    replicaCount: 3
    resources:
      requests:
        cpu: 1
        memory: 4Gi
    persistence:
      enabled: true
      size: 100Gi
  broker:
    replicaCount: 2
    resources:
      requests:
        cpu: 1
        memory: 4Gi

# ============================================
# External Dependencies (Alternative)
# ============================================
# Uncomment to use external services instead of in-cluster
# externalEtcd:
#   enabled: true
#   endpoints:
#     - etcd-0.etcd:2379
#     - etcd-1.etcd:2379
#     - etcd-2.etcd:2379
# externalS3:
#   enabled: true
#   host: s3.amazonaws.com
#   port: 443
#   bucketName: my-milvus-bucket
#   cloudProvider: aws
#   useSSL: true
#   accessKey: "<access-key>"
#   secretKey: "<secret-key>"
#   region: us-east-1
# externalPulsar:
#   enabled: true
#   host: pulsar.example.com
#   port: 6650
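Before installing, it's worth checking that the cluster can actually schedule all of this; a small sketch that totals the baseline CPU/memory *requests* copied from the values file above (replica counts × per-replica requests; HPA growth and limits not included):

```python
# (replicas, CPU cores, memory GiB) per replica — copied from the values file:
requests = {
    "proxy":             (2, 1.0, 2),
    "rootCoordinator":   (1, 0.5, 1),
    "queryCoordinator":  (1, 0.5, 1),
    "dataCoordinator":   (1, 0.5, 1),
    "indexCoordinator":  (1, 0.5, 1),
    "queryNode":         (3, 2.0, 8),
    "dataNode":          (2, 1.0, 4),
    "indexNode":         (2, 2.0, 4),
    "etcd":              (3, 1.0, 2),
    "minio":             (4, 1.0, 2),
    "pulsar-zookeeper":  (3, 0.5, 1),
    "pulsar-bookkeeper": (3, 1.0, 4),
    "pulsar-broker":     (2, 1.0, 4),
}

total_cpu = sum(n * cpu for n, cpu, _ in requests.values())
total_mem = sum(n * mem for n, _, mem in requests.values())
print(f"Baseline requests: {total_cpu} CPU cores, {total_mem} GiB memory")
```

Your nodes need headroom beyond this baseline, since the HPA can grow query nodes up to its `maxReplicas` and limits are roughly double the requests.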
Install with Custom Values
helm install milvus milvus/milvus -f milvus-production.yaml
Scaling Operations
Horizontal Pod Autoscaling
Query Nodes auto-scale based on CPU:
# Check HPA status
kubectl get hpa
# Manually scale (note: an active HPA on this deployment will override manual changes)
kubectl scale deployment milvus-querynode --replicas=5
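The HPA configured above uses the standard Kubernetes scaling formula, desiredReplicas = ceil(currentReplicas × currentUtilization / targetUtilization), clamped to the min/max bounds. A sketch using the bounds from the values file (`desired_replicas` is an illustrative helper, not an API):

```python
import math

def desired_replicas(current: int, current_util: float, target_util: float,
                     min_r: int = 3, max_r: int = 10) -> int:
    """Standard HPA formula: ceil(current * currentMetric / targetMetric),
    clamped to the minReplicas/maxReplicas bounds from the values file."""
    desired = math.ceil(current * current_util / target_util)
    return max(min_r, min(max_r, desired))

print(desired_replicas(3, 95, 70))  # sustained 95% CPU on 3 query nodes → 5
```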
Vertical Scaling
Update the resource values in the values file, then apply the change; Helm triggers a rolling restart:
# Edit values
helm upgrade milvus milvus/milvus -f milvus-production.yaml
# Watch rolling update
kubectl get pods -w
Upgrading Milvus
Check Current Version
helm list
kubectl get deployment milvus-proxy -o jsonpath='{.spec.template.spec.containers[0].image}'
Upgrade to New Version
# Update chart
helm repo update
# Upgrade with new image tag
helm upgrade milvus milvus/milvus \
--set image.tag=v2.5.5 \
-f milvus-production.yaml
# Verify rollout
kubectl rollout status deployment/milvus-proxy
Rollback if Issues
# Check history
helm history milvus
# Rollback
helm rollback milvus 2
Monitoring Setup
Enable ServiceMonitor (Prometheus Operator)
metrics:
  serviceMonitor:
    enabled: true
    interval: 30s
    namespace: monitoring
Grafana Dashboard
Import the official Milvus dashboard:
kubectl create configmap milvus-grafana-dashboard \
--from-file=milvus-dashboard.json \
-n monitoring
Backup Configuration
etcd Backup CronJob
apiVersion: batch/v1
kind: CronJob
metadata:
  name: etcd-backup
spec:
  schedule: "0 */6 * * *"  # Every 6 hours
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: backup
              image: bitnami/etcd:3.5.16
              command:
                - /bin/sh
                - -c
                - |
                  etcdctl snapshot save /backup/etcd-$(date +%Y%m%d-%H%M).db \
                    --endpoints=milvus-etcd-0:2379
              env:
                - name: ETCDCTL_API
                  value: "3"
              volumeMounts:
                - name: backup
                  mountPath: /backup
          volumes:
            - name: backup
              persistentVolumeClaim:
                claimName: etcd-backup-pvc
          restartPolicy: OnFailure
Troubleshooting
Check Component Logs
# Proxy logs
kubectl logs -l app.kubernetes.io/component=proxy --tail=100 -f
# QueryNode logs
kubectl logs -l app.kubernetes.io/component=querynode --tail=100
# Previous container logs (if crashed)
kubectl logs -l app.kubernetes.io/component=querynode --previous
Common Issues
Pods stuck in Pending:
kubectl describe pod milvus-querynode-xxx
# Check: resource limits, node selectors, PVC binding
etcd connection errors:
# Check etcd health
kubectl exec milvus-etcd-0 -- etcdctl endpoint health
# Check endpoints config
kubectl get cm milvus-config -o yaml | grep ETCD
OOMKilled:
# Check memory usage
kubectl top pod -l app.kubernetes.io/instance=milvus
# Increase limits in values file
Production Checklist
Before going live:
- [ ] Resource limits set for all components
- [ ] HPA configured for Query Nodes
- [ ] PVCs have appropriate storage class
- [ ] External access via LoadBalancer or Ingress
- [ ] Monitoring and alerting configured
- [ ] Backup strategy tested
- [ ] Rolling update procedure tested
- [ ] Disaster recovery runbook written
- [ ] Security (TLS, auth) enabled
Next Steps
Learn about each dependency in detail:
→ etcd — Metadata Store
Or dive into configuration:
→ Understanding milvus.yaml