Understanding Indices: From Tables to Elasticsearch

How Elasticsearch stores and finds data — explained for anyone who knows SQL.

25 min total: 10 min reading, 15 min lab

If You Know SQL, You Already Know Half of This

Elasticsearch stores data differently than a relational database, but the concepts map cleanly:

SQL            Elasticsearch      What it is
Database       Cluster            The whole system
Table          Index              A named collection of similar documents
Row            Document           A single JSON record
Column         Field              A key in that JSON document
Schema         Mapping            The field types (text, keyword, integer…)
CREATE TABLE   PUT index          Define the index before writing
INSERT INTO    Index a document   Write a JSON document
SELECT WHERE   Search query       Find matching documents

The biggest shift: rows have a fixed schema, while documents can have different shapes. In practice, though, all documents in an index share the same mapping, so think of an index as a table that tolerates NULL columns gracefully.
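
To make the row-to-document mapping concrete, here is a minimal Python sketch that turns SQL-style rows into the kind of JSON documents Elasticsearch stores. The helper name row_to_document and the NULL age for Bob are hypothetical, chosen only to show how a missing column simply becomes an absent field.

```python
import json

def row_to_document(columns, row):
    """Turn one SQL-style row into a JSON-ready dict, leaving out NULL (None) columns."""
    return {col: val for col, val in zip(columns, row) if val is not None}

columns = ("name", "city", "role", "age")
rows = [
    ("Alice Johnson", "New York", "engineer", 31),
    ("Bob Smith", "New York", "designer", None),  # NULL age: hypothetical, for illustration
]

for row in rows:
    # Each printed line is one document body, ready to be indexed.
    print(json.dumps(row_to_document(columns, row)))
```

The second document has no age key at all, which is the document-store equivalent of a NULL column.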

Three Parts of an Index

Every Elasticsearch index has three components defined when you create it:

1. Settings — Infrastructure

Settings control how the index behaves at the storage layer:

{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  }
}

Setting              Default   Meaning
number_of_shards     1         How many pieces the index is split into. Set once at creation; cannot change.
number_of_replicas   1         How many copies of each shard. Can change any time.

For a single-node cluster: set replicas to 0. Elasticsearch won't assign replica shards to the same node as the primary — leaving them UNASSIGNED and the index yellow. Zero replicas = green.
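
Because number_of_replicas can change at any time, adding a replica later is a single settings update. The sketch below builds, but does not send, such a request with Python's standard library. It assumes the lab's localhost:9200 address and the users index; the helper name replica_update_request is made up for illustration.

```python
import json
import urllib.request

def replica_update_request(index, replicas, base_url="http://localhost:9200"):
    """Build (but do not send) a PUT /<index>/_settings request."""
    body = json.dumps({"index": {"number_of_replicas": replicas}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index}/_settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )

req = replica_update_request("users", 1)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))

# To apply it against a running cluster, send the request:
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode("utf-8"))
```

On a single node, raising the replica count this way would turn the index yellow, for the reason given above.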

2. Mappings — Field Types

Mappings tell Elasticsearch how to interpret each field:

{
  "mappings": {
    "properties": {
      "name":       { "type": "text" },
      "city":       { "type": "keyword" },
      "age":        { "type": "integer" },
      "created_at": { "type": "date" }
    }
  }
}

The two most common field types — and why they differ:

Type      What happens to the value                                Best for
text      Analyzed: lowercased, split into tokens by an analyzer   Full-text search (match query)
keyword   Stored as-is                                             Exact match, aggregations, sorting (term query)

"New York" stored as text → tokens: ["new", "york"]; matches a search for new or york.
"New York" stored as keyword → stored as "New York"; only matches exact "New York".

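The text/keyword difference can be imitated in a few lines of Python. This is only a rough stand-in: real analyzers do more than lowercase and split, and the function names here are hypothetical.

```python
import re

def analyze_text(value):
    """Rough stand-in for an analyzer: lowercase, then split on non-alphanumerics."""
    return [tok for tok in re.split(r"[^a-z0-9]+", value.lower()) if tok]

def analyze_keyword(value):
    """keyword fields are not analyzed: the whole value is a single term."""
    return [value]

print(analyze_text("New York"))     # ['new', 'york']
print(analyze_keyword("New York"))  # ['New York']

# A lowercase search term finds a token in the text version...
print("york" in analyze_text("New York"))          # True
# ...but the keyword version only matches the exact original string.
print("new york" in analyze_keyword("New York"))   # False
```
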
3. Shards and Replicas — Distribution

Primary shards hold the actual data. An index with 3 shards splits its documents across 3 buckets; each query fans out to all shards and merges the results.

Replica shards are exact copies of primaries. They serve two purposes:
  • Redundancy: if a node fails, replicas on other nodes keep the data available
  • Read scaling: search requests can hit replicas, distributing query load

Index: users  (3 primary shards, 1 replica each = 6 total shards)

Primary  P0  ──replica──  R0
Primary  P1  ──replica──  R1
Primary  P2  ──replica──  R2
On a single node: use number_of_replicas: 0. Replicas need a second node to live on. You'll add replicas in the multi-node lesson.
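
One way to see why the primary shard count is fixed at creation: a document's shard is derived from a hash of its routing value (the _id by default) modulo the number of primary shards. The sketch below is a simplification under stated assumptions; zlib.crc32 stands in for the Murmur3 hash Elasticsearch uses, and shard_for is a hypothetical helper.

```python
import zlib

def shard_for(routing, number_of_shards):
    """Pick a primary shard for a routing value (the document _id by default).

    zlib.crc32 is a stand-in hash for illustration only.
    """
    return zlib.crc32(routing.encode("utf-8")) % number_of_shards

doc_ids = [str(i) for i in range(1, 101)]

# Where each document lives with 3 primaries, and where the same
# formula would look for it if the count were changed to 4.
with_3 = {doc_id: shard_for(doc_id, 3) for doc_id in doc_ids}
with_4 = {doc_id: shard_for(doc_id, 4) for doc_id in doc_ids}

moved = [doc_id for doc_id in doc_ids if with_3[doc_id] != with_4[doc_id]]
print(f"{len(moved)} of {len(doc_ids)} documents would map to a different shard")
```

If the shard count could change in place, lookups by _id would go to the wrong shard for every document counted in moved.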

How Data Actually Gets Stored — The Inverted Index

When you index a text field, Elasticsearch doesn't search the raw string (the original JSON is still kept in _source). It runs an analyzer on the value, breaks it into tokens, and builds an inverted index: a lookup table mapping every token to the list of documents that contain it.

This is what makes full-text search fast: instead of scanning every document for a word, Elasticsearch goes directly to the token in the inverted index and reads the posting list.

The interactive demo below shows exactly how this works. Insert the sample documents and then search for a token — watch the inverted index build and see which documents match.

Inverted Index: Interactive Demo

Documents in the users index:

_id   name (text)     city (keyword)   role (keyword)
1     Alice Johnson   New York         engineer
2     Bob Smith       New York         designer
3     Alice Smith     Chicago          engineer

Legend:
  • text: analyzed (lowercased and split into tokens)
  • keyword: stored as-is, exact match only
  • #id: posting list entry (which docs contain this token)
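
As a static counterpart to the demo, the following Python sketch builds an inverted index for the name field of the three sample documents. The tokenizer is a simplified assumption (lowercase, split on non-alphanumerics), not Elasticsearch's actual analyzer, and build_inverted_index is a hypothetical helper.

```python
import re
from collections import defaultdict

# name field of the three sample documents, keyed by _id
docs = {
    1: "Alice Johnson",
    2: "Bob Smith",
    3: "Alice Smith",
}

def tokenize(value):
    """Simplified analysis: lowercase, then split on non-alphanumerics."""
    return [t for t in re.split(r"[^a-z0-9]+", value.lower()) if t]

def build_inverted_index(docs):
    """Map each token to the sorted list of document ids containing it."""
    index = defaultdict(list)
    for doc_id in sorted(docs):
        for token in tokenize(docs[doc_id]):
            if doc_id not in index[token]:
                index[token].append(doc_id)
    return dict(index)

name_index = build_inverted_index(docs)
for token in sorted(name_index):
    print(token, "->", name_index[token])

# A search is a direct lookup of the token, not a scan of every document.
print(name_index.get("alice", []))
```

Looking up alice yields documents 1 and 3, and smith yields 2 and 3; each list is the posting list for that token.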

Lab: Create an Index and Index Documents

Your single-node cluster is running. Open a terminal and follow along.

1. Create an index with explicit mapping

curl -X PUT "localhost:9200/users" -H "Content-Type: application/json" -d '{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  },
  "mappings": {
    "properties": {
      "name": { "type": "text" },
      "city": { "type": "keyword" },
      "role": { "type": "keyword" },
      "age":  { "type": "integer" }
    }
  }
}'

Expected: {"acknowledged":true,"shards_acknowledged":true,"index":"users"}

2. Index some documents

curl -X POST "localhost:9200/users/_doc/1" -H "Content-Type: application/json" -d '{
  "name": "Alice Johnson",
  "city": "New York",
  "role": "engineer",
  "age": 31
}'

curl -X POST "localhost:9200/users/_doc/2" -H "Content-Type: application/json" -d '{
  "name": "Bob Smith",
  "city": "New York",
  "role": "designer",
  "age": 27
}'

curl -X POST "localhost:9200/users/_doc/3" -H "Content-Type: application/json" -d '{
  "name": "Alice Smith",
  "city": "Chicago",
  "role": "engineer",
  "age": 35
}'

3. Inspect the mapping Elasticsearch stored

curl "localhost:9200/users/_mapping?pretty"

4. Check index health

curl "localhost:9200/_cat/indices/users?v&h=index,health,docs.count,store.size,pri,rep"

Expected:

index  health  docs.count  store.size  pri  rep
users  green            3       9.5kb    1    0

green confirms all shards assigned. rep=0 — no replicas on a single node.

5. Search by full-text

curl "localhost:9200/users/_search?pretty" -H "Content-Type: application/json" -d '{
  "query": { "match": { "name": "alice" } }
}'

Returns both Alice Johnson and Alice Smith — the text field was tokenized, so "alice" matches in both.

6. Search by exact keyword

curl "localhost:9200/users/_search?pretty" -H "Content-Type: application/json" -d '{
  "query": { "term": { "city": "New York" } }
}'

Returns only the two New York documents. A term query compares the search value against the stored keyword exactly and case-sensitively, so searching for "new york" would return nothing.
