Reindexing Data


Move data between indices while applying mapping changes and transformations

20 min: 10 min reading, 10 min lab

Project Structure

📁elasticsearch-remote-reindexing
├── 📄docker-compose.yml
└── 📄.env

Basic Reindex

curl -X POST "localhost:9200/_reindex?pretty" -H 'Content-Type: application/json' -d'
{
  "source": {
    "index": "old-index"
  },
  "dest": {
    "index": "new-index"
  }
}'

Reindex with Transformation

curl -X POST "localhost:9200/_reindex" -H 'Content-Type: application/json' -d'
{
  "source": {
    "index": "products-v1"
  },
  "dest": {
    "index": "products-v2"
  },
  "script": {
    "source": "ctx._source.price = ctx._source.price * 1.1"
  }
}'

Reindex from Query

curl -X POST "localhost:9200/_reindex" -H 'Content-Type: application/json' -d'
{
  "source": {
    "index": "logs-2024",
    "query": {
      "range": {
        "timestamp": {
          "gte": "2024-01-01",
          "lt": "2024-02-01"
        }
      }
    }
  },
  "dest": {
    "index": "logs-2024-01"
  }
}'

Prepare Destination Index

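Always create the destination index with explicit settings and mappings first. The reindex API does not copy them from the source index, so a missing destination is created with dynamically inferred mappings: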

curl -X PUT "localhost:9200/new-index" -H 'Content-Type: application/json' -d'
{
  "settings": {
    "number_of_shards": 3
  },
  "mappings": {
    "properties": {
      "title": { "type": "text" },
      "created_at": { "type": "date" }
    }
  }
}'

Monitor Reindexing

# Check task status
curl -X GET "localhost:9200/_tasks?detailed=true&actions=*reindex"

# Cancel if needed (replace task_id:12345 with the <node_id>:<task_number> ID reported by _tasks)
curl -X POST "localhost:9200/_tasks/task_id:12345/_cancel"
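
For long-running reindexes, one option is to start the request with wait_for_completion=false, which returns a task ID right away instead of blocking, and then poll that single task. The sketch below assumes the same localhost:9200 cluster used throughout. ES_URL, extract_task_id, start_reindex_async, and task_status are hypothetical names introduced here for illustration, and the sed-based extraction is a simple stand-in for a real JSON parser such as jq.

```shell
# Assumption: the cluster is reachable at localhost:9200 unless ES_URL is set
ES_URL="${ES_URL:-localhost:9200}"

# Pull the "task" value out of the JSON returned by an asynchronous reindex
extract_task_id() {
  sed -n 's/.*"task"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Start a reindex without waiting for it to finish and print its task ID
# $1 = source index, $2 = destination index
start_reindex_async() {
  curl -s -X POST "$ES_URL/_reindex?wait_for_completion=false" \
    -H 'Content-Type: application/json' \
    -d "{\"source\":{\"index\":\"$1\"},\"dest\":{\"index\":\"$2\"}}" | extract_task_id
}

# Fetch the current status of one task by its ID
task_status() {
  curl -s -X GET "$ES_URL/_tasks/$1?pretty"
}
```

A possible usage: task_id=$(start_reindex_async old-index new-index), followed by task_status "$task_id" whenever a progress check is wanted.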

Best Practices

  1. Always test on a subset first
  2. Create destination index before reindexing
  3. Monitor cluster resources during large reindexes
  4. Use slices for parallel processing on large datasets
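
One way to test on a subset is the max_docs field of the reindex request body, which caps how many documents are copied. The sketch below only builds the request body so it can be inspected before anything is sent. subset_body and the new-index-test destination are hypothetical names used for illustration.

```shell
# Print a _reindex request body that copies at most N documents
# $1 = source index, $2 = destination index, $3 = maximum documents to copy
subset_body() {
  printf '{"max_docs":%d,"source":{"index":"%s"},"dest":{"index":"%s"}}' "$3" "$1" "$2"
}

# Send the capped reindex to the cluster (assumes localhost:9200 unless ES_URL is set)
reindex_subset() {
  curl -s -X POST "${ES_URL:-localhost:9200}/_reindex?pretty" \
    -H 'Content-Type: application/json' \
    -d "$(subset_body "$1" "$2" "$3")"
}
```

For example, reindex_subset old-index new-index-test 1000 would copy up to 1000 documents into a throwaway index, which can be checked and deleted before the full run.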

Parallel Reindexing

curl -X POST "localhost:9200/_reindex" -H 'Content-Type: application/json' -d'
{
  "source": {
    "index": "large-index",
    "slice": {
      "id": 0,
      "max": 4
    }
  },
  "dest": {
    "index": "new-large-index"
  }
}'
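
The request above copies only slice 0 of 4. With manual slicing, every slice ID from 0 to max minus 1 needs its own request, otherwise only part of the index is copied. A minimal sketch of that loop follows. slice_body and reindex_all_slices are hypothetical helper names, and the index names are the ones from the example above.

```shell
# Print the _reindex body for one slice
# $1 = slice id, $2 = total number of slices
slice_body() {
  printf '{"source":{"index":"large-index","slice":{"id":%d,"max":%d}},"dest":{"index":"new-large-index"}}' "$1" "$2"
}

# Submit one request per slice ID (0 .. max-1).
# wait_for_completion=false makes each call return immediately,
# so the slices run concurrently instead of one after another.
# $1 = total number of slices
reindex_all_slices() {
  max="$1"
  i=0
  while [ "$i" -lt "$max" ]; do
    curl -s -X POST "${ES_URL:-localhost:9200}/_reindex?wait_for_completion=false" \
      -H 'Content-Type: application/json' \
      -d "$(slice_body "$i" "$max")"
    i=$((i + 1))
  done
}
```

Calling reindex_all_slices 4 would submit slices 0 through 3. As an alternative to manual slicing, the reindex API also accepts a slices query parameter (for example slices=auto), which lets Elasticsearch split the work itself in a single request.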

Lab: Practice Reindexing

  1. Create a source index with sample documents
  2. Reindex all documents to a new index
  3. Reindex with a query filter (only specific documents)
  4. Use _tasks API to monitor reindex progress
  5. Verify document counts match between source and destination
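
For the last step, the _count API reports how many documents an index holds. The sketch below compares two indices after refreshing the destination, since newly written documents are not visible to searches and counts until a refresh happens. extract_count, doc_count, and compare_counts are hypothetical helpers, and the sed expression is a simple stand-in for a JSON parser such as jq.

```shell
# Assumption: the cluster is reachable at localhost:9200 unless ES_URL is set
ES_URL="${ES_URL:-localhost:9200}"

# Pull the numeric "count" value out of a _count response
extract_count() {
  sed -n 's/.*"count"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# Print the number of documents in index $1
doc_count() {
  curl -s -X GET "$ES_URL/$1/_count" | extract_count
}

# Compare document counts of source ($1) and destination ($2)
compare_counts() {
  # Refresh the destination so freshly reindexed documents are counted
  curl -s -X POST "$ES_URL/$2/_refresh" > /dev/null
  src=$(doc_count "$1")
  dst=$(doc_count "$2")
  if [ "$src" = "$dst" ]; then
    echo "match: $src"
  else
    echo "mismatch: source=$src dest=$dst"
    return 1
  fi
}
```

Running compare_counts old-index new-index after a full reindex should report a match. After a filtered reindex the counts are expected to differ, so the comparison is only meaningful against a _count request that uses the same query.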
