Troubleshooting

Common Elasticsearch errors and fixes: vm.max_map_count, shard allocation, license issues, analyzer errors, disk watermarks, and cluster recovery.

Duration: 25m (15m reading, 10m lab)

vm.max_map_count Is Too Low

Error

[1]: max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]

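Elasticsearch uses mmap to access index files efficiently, and every mapped region counts against vm.max_map_count. The default Linux limit (65530, as shown in the error) is too low, so the bootstrap check fails until it is raised to at least 262144.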

Fix

# Temporary (resets on reboot)
sudo sysctl -w vm.max_map_count=262144

# Permanent
echo "vm.max_map_count=262144" | sudo tee -a /etc/sysctl.conf
sudo sysctl -p
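
Before changing anything, it can help to confirm what the host currently reports. The following is a minimal sketch, assuming a Linux host that exposes the value at /proc/sys/vm/max_map_count; the function names map_count_ok and check_map_count are made up for this example.

```shell
# Minimum value named in the bootstrap error above.
REQUIRED_MAP_COUNT=262144

# map_count_ok CURRENT [REQUIRED]
# Succeeds when CURRENT meets or exceeds REQUIRED.
map_count_ok() {
  [ "$1" -ge "${2:-$REQUIRED_MAP_COUNT}" ]
}

# check_map_count
# Reads the live kernel value (Linux only) and reports whether it is enough.
check_map_count() {
  current=$(cat /proc/sys/vm/max_map_count)
  if map_count_ok "$current"; then
    echo "OK: vm.max_map_count=$current"
  else
    echo "TOO LOW: vm.max_map_count=$current (need $REQUIRED_MAP_COUNT)"
    return 1
  fi
}
```

Running check_map_count after the sysctl change (and again after a reboot) shows whether the permanent setting took effect.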

For Docker Desktop on macOS/Windows, set it in the Docker VM:
# Docker Desktop
docker run --privileged --pid=host -it alpine nsenter -t 1 -m -u -n -i \
  sysctl -w vm.max_map_count=262144

Unsupported License Type

Error

{
  "error": "unrecognised license type [trail]"
}

Cause

Typo in the license type. The correct value is trial, not trail.

Supported License Types

| License | Features |
|---|---|
| basic | Free tier, core features |
| standard | Basic security features |
| gold | Monitoring, alerting |
| platinum | ML, advanced security |
| enterprise | All features |
| trial | 30-day trial of all features |

Fix

curl -X POST "localhost:9200/_license/start_trial?acknowledge=true&pretty"

Or set the self-generated license type in elasticsearch.yml (basic or trial):
xpack.license.self_generated.type: basic
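
A typo like trail is easy to catch before the value reaches Elasticsearch. The sketch below is a hypothetical pre-flight check that compares a value against the license types listed in the table above; the function name is_supported_license is an assumption for illustration, not part of any Elasticsearch tooling.

```shell
# License types from the table above; "trail" is deliberately absent.
SUPPORTED_LICENSE_TYPES="basic standard gold platinum enterprise trial"

# is_supported_license TYPE
# Succeeds only when TYPE exactly matches one of the supported names.
is_supported_license() {
  for t in $SUPPORTED_LICENSE_TYPES; do
    [ "$t" = "$1" ] && return 0
  done
  return 1
}
```

For example, a deployment script could run `is_supported_license "$LICENSE_TYPE" || echo "unknown license type: $LICENSE_TYPE" >&2` before writing the setting.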

Analyzer Not Found

Error

{
  "type": "mapper_parsing_exception",
  "reason": "analyzer [autocomplete_analyzer] has not been defined in the mapping"
}

Cause

A custom analyzer is referenced in mappings but was never defined in the index settings.

Fix

Custom analyzers must be defined in settings.analysis, normally at index creation time. To add one to an existing index, you must close the index first, update its settings, and then reopen it.

curl -X PUT "localhost:9200/my-index" \
  -H 'Content-Type: application/json' -d'
{
  "settings": {
    "analysis": {
      "analyzer": {
        "autocomplete_analyzer": {
          "type": "custom",
          "tokenizer": "edge_ngram_tokenizer",
          "filter": ["lowercase"]
        }
      },
      "tokenizer": {
        "edge_ngram_tokenizer": {
          "type": "edge_ngram",
          "min_gram": 2,
          "max_gram": 10,
          "token_chars": ["letter", "digit"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "analyzer": "autocomplete_analyzer"
      }
    }
  }
}'
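
For an index that already exists, the close, update, reopen sequence can be sketched as follows. This is an illustration under assumptions: the helper names es and add_analyzer, the ES_URL variable, and the DRY_RUN switch are all invented for this example. Only the _close, _settings, and _open endpoints come from Elasticsearch itself.

```shell
ES_URL="${ES_URL:-localhost:9200}"

# es METHOD PATH [JSON_BODY]
# Sends the request with curl, or only prints it when DRY_RUN=1.
es() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$1 $ES_URL$2"
    return 0
  fi
  if [ -n "${3:-}" ]; then
    curl -s -X "$1" "$ES_URL$2" -H 'Content-Type: application/json' -d "$3"
  else
    curl -s -X "$1" "$ES_URL$2"
  fi
}

# add_analyzer INDEX ANALYSIS_JSON
# Closes the index, adds the analysis settings, then reopens the index.
add_analyzer() {
  es POST "/$1/_close" || return 1
  es PUT "/$1/_settings" "{\"analysis\": $2}"
  status=$?
  # Reopen even if the settings update failed, so the index is not left closed.
  es POST "/$1/_open" || return 1
  return "$status"
}
```

Running `DRY_RUN=1 add_analyzer my-index '{}'` prints the three requests without sending them. Note that the index is unavailable while closed, and documents indexed earlier are not re-analyzed by this change.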

Shard Allocation Failures

Error

{
  "index": "my-index",
  "shard": 0,
  "primary": false,
  "current_state": "unassigned",
  "unassigned.reason": "CLUSTER_RECOVERED"
}

Diagnose

# View unassigned shards
curl -s "localhost:9200/_cat/shards?v&h=index,shard,prirep,state,unassigned.reason" | grep UNASSIGNED

# Get detailed allocation explanation
curl -s "localhost:9200/_cluster/allocation/explain?pretty"
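
On a cluster with many unassigned shards, a per-reason count is often easier to read than the raw list. The sketch below assumes the column order requested in the first command above (index, shard, prirep, state, unassigned.reason) with no header row; the function name count_unassigned_reasons is made up for this example.

```shell
# count_unassigned_reasons
# Reads _cat/shards rows on stdin in the order:
#   index shard prirep state unassigned.reason
# and prints one "REASON COUNT" line per reason, sorted by reason.
count_unassigned_reasons() {
  awk '$4 == "UNASSIGNED" { count[$5]++ }
       END { for (r in count) print r, count[r] }' | sort
}
```

A possible invocation is `curl -s "localhost:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason" | count_unassigned_reasons`, which omits `v` so that no header line is fed to awk.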

Common Causes and Fixes

| Reason | Cause | Fix |
|---|---|---|
| NODE_LEFT | Node went offline | Restart the node or wait for recovery |
| ALLOCATION_FAILED | Corrupted shard data | Delete the index and restore from snapshot |
| INDEX_CREATED | Not enough nodes for replicas | Add nodes or reduce number_of_replicas |
| CLUSTER_RECOVERED | Post-restart rebalancing | Wait; Elasticsearch will auto-recover |
| DISK_THRESHOLD | Disk watermark exceeded | Free disk space or adjust watermarks |

Force Reroute (Last Resort)

curl -X POST "localhost:9200/_cluster/reroute?retry_failed=true&pretty"

Cluster Red Status

Diagnose

# Check which indices are red
curl -s "localhost:9200/_cat/indices?v&health=red"

# Check cluster health details
curl -s "localhost:9200/_cluster/health?level=indices&pretty"

# View unassigned shards
curl -s "localhost:9200/_cat/shards?v" | grep UNASSIGNED

Common Fixes

  1. Missing nodes: Start the offline nodes
  2. Disk full: Free space, then clear read-only blocks
  3. Corrupted index: Delete and restore from snapshot
  4. Not enough nodes for shards: Reduce replica count

# Reduce replicas for one index (replace red-index with the index name)
# Note: this clears unassigned replicas; it does not bring back a missing primary
curl -X PUT "localhost:9200/red-index/_settings" \
  -H 'Content-Type: application/json' -d'
{
  "index.number_of_replicas": 0
}'
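
The command above targets a single index. To apply it to every index in a given health state, one option is to filter the _cat/indices output first. This is a sketch only: the function name indices_with_health is invented here, and it assumes the request uses `h=health,index` so that each row is a health value followed by an index name.

```shell
# indices_with_health HEALTH
# Reads "health index" rows on stdin and prints the names of the
# indices whose health equals HEALTH (for example: red, yellow).
indices_with_health() {
  awk -v want="$1" '$1 == want { print $2 }'
}

# Possible usage (not executed here):
#   curl -s "localhost:9200/_cat/indices?h=health,index" |
#     indices_with_health red |
#     while read -r index; do
#       curl -X PUT "localhost:9200/$index/_settings" \
#         -H 'Content-Type: application/json' \
#         -d '{"index.number_of_replicas": 0}'
#     done
```

Printing the filtered names and reviewing them before running the loop is a sensible precaution, since dropping replicas removes redundancy for those indices.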

Disk Watermark Exceeded (Read-Only Index)

Error

{
  "type": "cluster_block_exception",
  "reason": "index [my-index] blocked by: [FORBIDDEN/12/index read-only / allow delete (api)]"
}

Fix

# 1. Free disk space (delete old indices, logs, etc.)

# 2. Remove the read-only block
curl -X PUT "localhost:9200/my-index/_settings" \
  -H 'Content-Type: application/json' -d'
{
  "index.blocks.read_only_allow_delete": null
}'

# Alternatively, remove the block from ALL indices at once
curl -X PUT "localhost:9200/_all/_settings" \
  -H 'Content-Type: application/json' -d'
{
  "index.blocks.read_only_allow_delete": null
}'
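
To judge how much space needs to be freed, it helps to know which watermark a node has crossed. The sketch below classifies a disk-usage percentage against Elasticsearch's default watermarks (low 85%, high 90%, flood stage 95%). The function name watermark_level is an assumption, the thresholds can be overridden if the cluster uses custom values, and the boundary comparison is approximate.

```shell
# watermark_level PERCENT_USED [LOW] [HIGH] [FLOOD]
# Prints ok, low, high, or flood_stage for an integer disk-usage percentage.
# Thresholds default to the Elasticsearch defaults (85 / 90 / 95).
watermark_level() {
  used=$1
  low=${2:-85}
  high=${3:-90}
  flood=${4:-95}
  # Reject empty or non-numeric input (e.g. a row with no disk data).
  case $used in
    ''|*[!0-9]*) echo unknown; return 1 ;;
  esac
  if [ "$used" -ge "$flood" ]; then
    echo flood_stage
  elif [ "$used" -ge "$high" ]; then
    echo high
  elif [ "$used" -ge "$low" ]; then
    echo low
  else
    echo ok
  fi
}
```

The percentage per node can be read from `curl -s "localhost:9200/_cat/allocation?h=node,disk.percent"`. The read-only block in the error above is applied at the flood stage.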

Connection Refused

Error

curl: (7) Failed to connect to localhost port 9200: Connection refused

Checklist

# 1. Is Elasticsearch running?
docker ps | grep elasticsearch
# or
systemctl status elasticsearch

# 2. Check logs for startup errors
docker logs elasticsearch
# or
tail -100 /var/log/elasticsearch/elasticsearch.log

# 3. Check if the port is bound
ss -tlnp | grep 9200

# 4. Check network.host setting
# If set to 0.0.0.0, ensure bootstrap checks pass
# If set to specific IP, curl that IP instead of localhost
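
Elasticsearch can take a while to bind its port after starting, so a refused connection right after a restart is not always a failure. The following is a small polling sketch; the function name wait_for_es and the ES_PROBE override are invented for this example, and the default probe assumes curl is installed.

```shell
# wait_for_es [URL] [ATTEMPTS] [DELAY_SECONDS]
# Polls URL until the probe command succeeds or ATTEMPTS is exhausted.
# ES_PROBE can replace the default curl probe (useful for testing).
wait_for_es() {
  url=${1:-localhost:9200}
  attempts=${2:-30}
  delay=${3:-2}
  probe=${ES_PROBE:-curl -s -o /dev/null --max-time 2}
  i=1
  while [ "$i" -le "$attempts" ]; do
    # $probe is intentionally unquoted so its options split into words.
    if $probe "$url"; then
      echo "reachable after $i attempt(s)"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "still unreachable after $attempts attempt(s)" >&2
  return 1
}
```

With the default probe, any HTTP response counts as reachable, including an authentication error, because curl exits non-zero only when the connection itself fails. If the function times out, continue with the log and port checks above.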

Out of Memory (OOM)

Symptoms

  • Elasticsearch process killed
  • java.lang.OutOfMemoryError: Java heap space in logs
  • Node disappears from cluster

Fix

# Check current heap settings
curl -s "localhost:9200/_nodes/stats/jvm?pretty" | grep heap

# Set heap size (50% of available RAM, max 31GB)
# In jvm.options:
-Xms4g
-Xmx4g

# For Docker:
environment:
  ES_JAVA_OPTS: "-Xms4g -Xmx4g"

Rules:
  • Always set Xms equal to Xmx
  • Never exceed 50% of available RAM
  • Never exceed 31GB (compressed oops limit)
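
The rules above reduce to a small calculation: take half of the machine's RAM and cap it at 31GB. A minimal sketch, working in megabytes; the function name recommended_heap_mb is made up for this example.

```shell
# recommended_heap_mb TOTAL_RAM_MB
# Prints half of TOTAL_RAM_MB, capped at 31GB (31 * 1024 MB).
recommended_heap_mb() {
  half=$(( $1 / 2 ))
  cap=$(( 31 * 1024 ))
  if [ "$half" -gt "$cap" ]; then
    echo "$cap"
  else
    echo "$half"
  fi
}
```

On Linux, total RAM in MB can be read with `awk '/MemTotal/ {print int($2/1024)}' /proc/meminfo`. The result would then be used for both flags, for example `-Xms4096m -Xmx4096m` on an 8GB machine.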

Circuit Breaker Triggered

Error

{
  "type": "circuit_breaking_exception",
  "reason": "[parent] Data too large"
}

Fix

# Check circuit breaker stats
curl -s "localhost:9200/_nodes/stats/breaker?pretty"

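# Temporarily raise the parent breaker limit (not recommended long-term)
# Note: 95% is already the default when indices.breaker.total.use_real_memory is true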
curl -X PUT "localhost:9200/_cluster/settings" \
  -H 'Content-Type: application/json' -d'
{
  "persistent": {
    "indices.breaker.total.limit": "95%"
  }
}'

Better solutions:
  • Increase JVM heap size
  • Reduce the number of shards
  • Optimize queries (avoid deep aggregations)

Slow Startup After Crash

Cause

Elasticsearch is replaying the transaction log (translog) to recover uncommitted writes.

Monitor Recovery

# Check recovery progress
curl -s "localhost:9200/_cat/recovery?v&active_only=true"

# Check pending tasks
curl -s "localhost:9200/_cat/pending_tasks?v"

Speed Up Recovery

# Increase recovery speed (temporarily)
curl -X PUT "localhost:9200/_cluster/settings" \
  -H 'Content-Type: application/json' -d'
{
  "transient": {
    "cluster.routing.allocation.node_concurrent_recoveries": 4,
    "indices.recovery.max_bytes_per_sec": "200mb"
  }
}'
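
While recovery runs, it can be useful to list only the shards that are still in progress. The sketch below assumes the _cat/recovery request asks for the columns index, shard, stage, and bytes_percent, in that order and without a header; the function name unfinished_recoveries is invented for this example.

```shell
# unfinished_recoveries
# Reads "index shard stage bytes_percent" rows on stdin and prints
# the shards whose recovery stage is anything other than "done".
unfinished_recoveries() {
  awk '$3 != "done" { print $1 "[" $2 "] " $3 " " $4 }'
}
```

A possible invocation is `curl -s "localhost:9200/_cat/recovery?h=index,shard,stage,bytes_percent" | unfinished_recoveries`. Once the output is empty, the recovery settings raised above can be returned to their defaults by setting each of them to null in the same _cluster/settings request.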

Quick Reference

| Error | Quick Fix |
|---|---|
| vm.max_map_count too low | sysctl -w vm.max_map_count=262144 |
| unrecognised license type | Use trial not trail |
| analyzer not defined | Add to settings.analysis at creation |
| Unassigned shards | Check allocation explain API |
| Cluster red | Check _cat/indices?health=red |
| Read-only index | Set read_only_allow_delete to null |
| Connection refused | Check if ES is running, check logs |
| OOM killed | Set heap to 50% RAM, max 31GB |
| Circuit breaker | Increase heap or optimize queries |
| Slow recovery | Increase max_bytes_per_sec |

Lab: Diagnose and Fix Issues

  1. Start a cluster and deliberately fill disk to trigger watermarks
  2. Remove the read-only block after freeing space
  3. Stop a node and observe shard reallocation
  4. Use _cluster/allocation/explain to debug unassigned shards
  5. Check circuit breaker stats under load
