Understanding Indices: From Tables to Elasticsearch

How Elasticsearch stores and finds data — explained for anyone who knows SQL.

25 min total: 10 min reading, 15 min lab

If You Know SQL, You Already Know Half of This

Elasticsearch stores data differently than a relational database, but the concepts map cleanly:

SQL	Elasticsearch	What it is
Database	Cluster	The whole system
Table	Index	A named collection of similar documents
Row	Document	A single JSON record
Column	Field	A key in that JSON document
Schema	Mapping	The field types (text, keyword, integer…)
`CREATE TABLE`	`PUT /users`	Create the index (optional: by default Elasticsearch auto-creates an index on its first write)
`INSERT INTO`	`PUT /users/_doc/1`	Index (write) a JSON document
`SELECT … WHERE`	`GET /users/_search`	Find matching documents

The biggest shift: rows follow a fixed schema, while documents can have different shapes. In practice, though, all documents in an index share one mapping, and a document that omits a field simply has no value for it. Think of an index as a table that tolerates NULL columns gracefully.

Three Parts of an Index

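Every Elasticsearch index has three parts, normally defined when you create it. (Shard and replica counts are technically settings, but they get their own section here because they control how data is distributed.)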

1. Settings — Infrastructure

Settings control how the index behaves at the storage layer:

{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  }
}

Setting	Default	Meaning
`number_of_shards`	1	How many primary pieces the index is split into. Fixed at creation; changing it means reindexing or using the split/shrink APIs, which create a new index.
`number_of_replicas`	1	How many copies of each primary shard. Can be changed at any time.

On a single-node cluster, set replicas to 0. Elasticsearch never places a replica on the same node as its primary, so with the default of 1 replica those copies stay UNASSIGNED and the index reports yellow health. With zero replicas, every shard is assigned and the index is green.
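
The green/yellow rule above can be sketched as a tiny shell function. This is a simplified model, not Elasticsearch's allocator: it assumes the only constraint is that a primary and its replicas must sit on different nodes, it ignores other allocation rules (disk watermarks, allocation awareness), and it leaves out red health (an unassigned primary) entirely. The helper name `index_health` is hypothetical.

```shell
#!/bin/sh
# Simplified model of index health (green vs yellow only), assuming the ONLY
# allocation rule is "a primary and its replicas must sit on different nodes".
# Real allocation also considers disk watermarks, awareness attributes, etc.

# index_health NODES REPLICAS -> prints "green" or "yellow" (hypothetical helper)
index_health() {
  nodes=$1
  replicas=$2
  # Each copy of a shard needs its own node, so at most NODES - 1 replicas
  # per shard can ever be assigned.
  max_assignable=$((nodes - 1))
  if [ "$replicas" -gt "$max_assignable" ]; then
    echo yellow   # the extra replicas stay UNASSIGNED
  else
    echo green    # every primary and replica has a node
  fi
}

echo "1 node,  1 replica:  $(index_health 1 1)"   # yellow
echo "1 node,  0 replicas: $(index_health 1 0)"   # green
echo "2 nodes, 1 replica:  $(index_health 2 1)"   # green
```

The same reasoning explains the multi-node lesson: adding a second node is what lets one replica per shard find a home.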

Note: Elasticsearch index settings reference

2. Mappings — Field Types

Mappings tell Elasticsearch how to interpret each field:

{
  "mappings": {
    "properties": {
      "name":       { "type": "text" },
      "city":       { "type": "keyword" },
      "age":        { "type": "integer" },
      "created_at": { "type": "date" }
    }
  }
}

The two most common field types — and why they differ:

Type	What happens to the value	Best for
`text`	Analyzed: split into tokens and lowercased by an analyzer (the standard analyzer by default)	Full-text search (`match` query)
`keyword`	Indexed as a single term, exactly as written	Exact match, aggregations, sorting (`term` query)

Example: "New York" in a text field is indexed as the tokens ["new", "york"], so a match query for "york" (or "NEW YORK") finds it. In a keyword field it is indexed as the single term "New York", so only the exact, case-sensitive value "New York" matches.

Note: Elasticsearch mapping types reference
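
The text/keyword difference can be sketched in POSIX shell. `analyze_text` is a rough stand-in for the standard analyzer (split on anything that isn't a letter or digit, then lowercase); the real analyzer follows Unicode segmentation rules and handles more cases. All function names here are hypothetical.

```shell
#!/bin/sh
# Rough stand-in for the standard analyzer (an assumption, not the real one):
# split on anything that is not a letter or digit, then lowercase.
analyze_text() {
  printf '%s\n' "$1" | tr -cs '[:alnum:]' '\n' | tr '[:upper:]' '[:lower:]' | sed '/^$/d'
}

# A keyword field indexes the whole value as one unchanged term.
analyze_keyword() {
  printf '%s\n' "$1"
}

# match-style lookup on a text field: analyze the query too, and succeed if
# any query token equals any indexed token.
text_field_matches() {  # VALUE QUERY
  for q in $(analyze_text "$2"); do
    if analyze_text "$1" | grep -qxF "$q"; then
      return 0
    fi
  done
  return 1
}

# term-style lookup on a keyword field: exact, case-sensitive comparison.
keyword_field_matches() {  # VALUE QUERY
  [ "$1" = "$2" ]
}

analyze_text "New York"      # prints: new, then york (one token per line)
analyze_keyword "New York"   # prints: New York
text_field_matches "New York" "york" && echo "text field: 'york' matches"
keyword_field_matches "New York" "york" || echo "keyword field: 'york' does not match"
```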

3. Shards and Replicas — Distribution

Primary shards hold the actual data. An index with 3 primary shards splits its documents across 3 buckets: each document lives in exactly one primary, and a search fans out to a copy of every shard and merges the results. Replica shards are exact copies of primaries and serve two purposes:

Redundancy — if a node fails, replicas on other nodes keep the data available
Read scaling — search requests can hit replicas, distributing query load

Index: users  (3 primary shards, 1 replica each = 6 total shards)

Primary  P0  ──replica──  R0
Primary  P1  ──replica──  R1
Primary  P2  ──replica──  R2
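
How does a document end up in one particular primary? By default Elasticsearch hashes the document's routing value (its `_id` unless you supply a custom routing) and maps that hash onto the primary shards. The sketch below assumes the simplified rule `hash % number_of_primary_shards`, with `cksum` standing in for the Murmur3 hash Elasticsearch actually uses; the real formula adds routing-shard bookkeeping, so treat the shard numbers as illustrative. `shard_for` is a hypothetical helper.

```shell
#!/bin/sh
# Simplified routing rule (assumption): shard = hash(_routing) % number_of_primary_shards,
# where _routing defaults to the document _id. cksum's CRC is only a stand-in
# for the Murmur3 hash Elasticsearch really uses.
shard_for() {  # DOC_ID PRIMARY_SHARDS (hypothetical helper)
  key=$1
  primaries=$2
  hash=$(printf '%s' "$key" | cksum | awk '{ print $1 }')
  echo $((hash % primaries))
}

# The same IDs routed with 3 primaries and with 4 primaries.
for id in 1 2 3 4 5 6; do
  echo "doc $id -> shard $(shard_for "$id" 3) of 3, shard $(shard_for "$id" 4) of 4"
done
```

Comparing the two columns hints at why `number_of_shards` is fixed at creation: with a different shard count, many documents would hash to a different shard than the one they were written to.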

On a single node: use number_of_replicas: 0. Replicas need a second node to live on. You'll add replicas in the multi-node lesson.

Note: Elasticsearch shards and replicas

How Data Actually Gets Stored — The Inverted Index

When you index a text field, Elasticsearch keeps the original JSON in `_source` but does not search that raw string. Instead it runs an analyzer on the value, breaks it into tokens, and builds an inverted index: a lookup table mapping every token to its posting list, the list of documents that contain it.

This is what makes full-text search fast: instead of scanning every document for a word, Elasticsearch goes directly to the token in the inverted index and reads the posting list.
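
The build-then-lookup idea can be sketched with POSIX shell and awk over the three sample `name` values used in this lesson. It assumes a simplified analyzer (split on whitespace, lowercase) and prints each token with its posting list; `build_index` and `lookup` are hypothetical names, and a real term dictionary is a sorted, compressed structure rather than lines of text.

```shell
#!/bin/sh
# Minimal inverted index sketch. Assumed analyzer: split on whitespace and lowercase.
# Input lines: "<doc_id> <text...>"; output lines: "<token>: <posting list>".
build_index() {
  awk '
    {
      id = $1
      for (i = 2; i <= NF; i++) {
        token = tolower($i)
        if (!(token in postings)) {
          postings[token] = id                       # first document with this token
        } else if (last[token] != id) {
          postings[token] = postings[token] " " id   # append, skipping repeats
        }
        last[token] = id
      }
    }
    END { for (t in postings) print t ": " postings[t] }
  ' | sort
}

docs='1 Alice Johnson
2 Bob Smith
3 Alice Smith'

inverted=$(printf '%s\n' "$docs" | build_index)
printf '%s\n' "$inverted"
# alice: 1 3
# bob: 2
# johnson: 1
# smith: 2 3

# Search: lowercase the query token, find its entry, return the posting list.
lookup() {
  printf '%s\n' "$inverted" | awk -F ': ' -v t="$1" '$1 == tolower(t) { print $2 }'
}

lookup Alice   # 1 3
```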

The three sample documents below are the ones you'll index in the lab. Each field is shown with its mapping type:

_id	name (text)	city (keyword)	role (keyword)
1	Alice Johnson	New York	engineer
2	Bob Smith	New York	designer
3	Alice Smith	Chicago	engineer

When they are indexed, each text value is analyzed (lowercased and split into tokens), each keyword value is stored as-is for exact matching, and every resulting term points to a posting list: the IDs of the documents that contain it.

Lab: Create an Index and Index Documents

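Your single-node cluster should already be running. Open a terminal and follow along; the commands below talk to it at localhost:9200 over plain HTTP with no credentials.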

1. Create an index with an explicit mapping

curl -X PUT "localhost:9200/users" -H "Content-Type: application/json" -d '{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  },
  "mappings": {
    "properties": {
      "name": { "type": "text" },
      "city": { "type": "keyword" },
      "role": { "type": "keyword" },
      "age":  { "type": "integer" }
    }
  }
}'

Expected: {"acknowledged":true,"shards_acknowledged":true,"index":"users"}

2. Index some documents

curl -X POST "localhost:9200/users/_doc/1" -H "Content-Type: application/json" -d '{
  "name": "Alice Johnson",
  "city": "New York",
  "role": "engineer",
  "age": 31
}'

curl -X POST "localhost:9200/users/_doc/2" -H "Content-Type: application/json" -d '{
  "name": "Bob Smith",
  "city": "New York",
  "role": "designer",
  "age": 27
}'

curl -X POST "localhost:9200/users/_doc/3" -H "Content-Type: application/json" -d '{
  "name": "Alice Smith",
  "city": "Chicago",
  "role": "engineer",
  "age": 35
}'

3. Inspect the mapping Elasticsearch created

curl "localhost:9200/users/_mapping?pretty"

4. Check index health

curl "localhost:9200/_cat/indices/users?v&h=index,health,docs.count,store.size,pri,rep"

Expected:

index  health  docs.count  store.size  pri  rep
users  green            3       9.5kb    1    0

green confirms that every shard is assigned, and rep = 0 shows that no replicas are configured, which is what a single node needs. Your store.size may differ.

5. Run a full-text search

curl "localhost:9200/users/_search?pretty" -H "Content-Type: application/json" -d '{
  "query": { "match": { "name": "alice" } }
}'

Returns both Alice Johnson and Alice Smith: the name field was analyzed into lowercase tokens, so the token "alice" appears in both documents.

6. Search by exact keyword

curl "localhost:9200/users/_search?pretty" -H "Content-Type: application/json" -d '{
  "query": { "term": { "city": "New York" } }
}'

Returns only the two New York documents. A term query is not analyzed, so against a keyword field it matches the stored value exactly and case-sensitively; searching for "new york" would return nothing.
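
Steps 5 and 6 can be mimicked offline to see why the two queries behave differently. This POSIX shell sketch assumes a simplified analyzer as a stand-in for the standard analyzer; `match_name`, `term_city`, and `term_name` are hypothetical helpers, not Elasticsearch APIs. The last call shows a common pitfall: a term query for "Alice" against the text field finds nothing, because only the lowercase token "alice" was indexed.

```shell
#!/bin/sh
# Offline sketch of match vs term on the lab's sample data.
# Assumed analyzer (stand-in for "standard"): split on non-alphanumerics, lowercase.
analyze() {
  printf '%s\n' "$1" | tr -cs '[:alnum:]' '\n' | tr '[:upper:]' '[:lower:]' | sed '/^$/d'
}

# Sample documents: id|name (text)|city (keyword)
docs='1|Alice Johnson|New York
2|Bob Smith|New York
3|Alice Smith|Chicago'

# match on name: the query is analyzed; a doc matches if it shares any token (OR).
match_name() {
  printf '%s\n' "$docs" | while IFS='|' read -r id name city; do
    for q in $(analyze "$1"); do
      if analyze "$name" | grep -qxF "$q"; then
        echo "$id"
        break
      fi
    done
  done
}

# term on city (keyword): the raw query must equal the stored value exactly.
term_city() {
  printf '%s\n' "$docs" | while IFS='|' read -r id name city; do
    if [ "$city" = "$1" ]; then echo "$id"; fi
  done
}

# term on name (text): the raw query is compared with the indexed lowercase tokens.
term_name() {
  printf '%s\n' "$docs" | while IFS='|' read -r id name city; do
    if analyze "$name" | grep -qxF -- "$1"; then echo "$id"; fi
  done
}

echo "match name=alice:    $(match_name alice | tr '\n' ' ')"      # 1 3
echo "term city=New York:  $(term_city 'New York' | tr '\n' ' ')"  # 1 2
echo "term city=new york:  $(term_city 'new york' | tr '\n' ' ')"  # (no hits)
echo "term name=Alice:     $(term_name Alice | tr '\n' ' ')"       # (no hits)
```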

Next Steps

Index Operations — aliases, open/close, delete
Document CRUD — update, delete, bulk indexing
