MongoDB to Elasticsearch Pipeline

Sync data from MongoDB collections to Elasticsearch indices using the Logstash JDBC input plugin and a MongoDB JDBC driver

Duration: 1h 30m (30m reading, 1h lab)

Project Structure

πŸ“mongo-elasticsearch-logstash
β”œβ”€β”€ πŸ“„docker-compose.yml
β”œβ”€β”€ πŸ“„.env
└── πŸ“pipeline
└── πŸ“„logstash.conf

Architecture

┌──────────┐     ┌──────────┐     ┌──────────┐     ┌──────────┐
│  MongoDB │────▶│ Logstash │────▶│   ES     │────▶│  Kibana  │
│(Source)  │     │(Pipeline)│     │ (Index)  │     │(Explore) │
└──────────┘     └──────────┘     └──────────┘     └──────────┘

Prerequisites

# Create network
docker network create datapipeline
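
Running `docker network create` a second time fails because the network already exists. A minimal sketch of an idempotent variant follows; the helper name `ensure_network` is made up for illustration, and only the `datapipeline` name comes from the step above.

```shell
# Create the network only if it is not already present.
# ensure_network is a hypothetical helper name, not a Docker command.
ensure_network() {
  name="${1:-datapipeline}"
  if docker network inspect "$name" >/dev/null 2>&1; then
    echo "network $name already exists"
  else
    docker network create "$name"
  fi
}

# Usage: ensure_network datapipeline
```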

Step 1: Start MongoDB

# -p keeps the command from failing (and skipping the cd) if the directory already exists
mkdir -p ~/mongo-pipeline && cd ~/mongo-pipeline

# Download MongoDB compose file
wget https://raw.githubusercontent.com/JinnaBalu/infinite-containers/main/mongo-elasticsearch-logstash/mongo/docker-compose.yml

docker compose up -d

Access Mongo Express: http://localhost:8081
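
Before importing data it can help to confirm that MongoDB itself answers, not only Mongo Express. The sketch below polls with a ping command. It assumes the container is named `mongodb` (the name used in Step 2) and that the image ships `mongosh`; older images may only provide the legacy `mongo` shell. `wait_for_mongo` is an illustrative helper name.

```shell
# Poll until MongoDB answers a ping, or give up after a number of attempts.
# Assumptions: container name "mongodb", mongosh available inside the image.
wait_for_mongo() {
  attempts="${1:-30}"
  delay="${2:-2}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    if docker exec mongodb mongosh --quiet --eval 'db.adminCommand({ ping: 1 })' >/dev/null 2>&1; then
      echo "MongoDB is up"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "MongoDB did not respond after $attempts attempts" >&2
  return 1
}

# Usage: wait_for_mongo 30 2
```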

Step 2: Import Sample Data
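
The commands in this step expect a `movies.json` file in the current directory. If no such file is at hand, the sketch below writes a small stand-in. Every value is invented for illustration; the field names follow the "Data Types Covered" table, and the date uses MongoDB Extended JSON (`$date`) so that `mongoimport` stores it as a date. Note that it overwrites any existing `movies.json`.

```shell
# Write a two-document sample array; all values are placeholders.
# The quoted 'EOF' keeps the shell from expanding "$date".
cat > movies.json <<'EOF'
[
  {
    "title": "Example Movie One",
    "rating": 8.1,
    "releaseYear": 2010,
    "budget": 1000000,
    "isAvailable": true,
    "releaseDate": { "$date": "2010-07-16T00:00:00Z" },
    "director": { "name": "Director A" },
    "boxOffice": { "profit": 2500000 },
    "genres": ["Drama", "Thriller"],
    "cast": [
      { "name": "Actor A", "role": "Lead" },
      { "name": "Actor B", "role": "Support" }
    ]
  },
  {
    "title": "Example Movie Two",
    "rating": 6.4,
    "releaseYear": 2018,
    "budget": 500000,
    "isAvailable": false,
    "releaseDate": { "$date": "2018-03-02T00:00:00Z" },
    "director": { "name": "Director B" },
    "boxOffice": { "profit": 120000 },
    "genres": ["Comedy"],
    "cast": [
      { "name": "Actor C", "role": "Lead" }
    ]
  }
]
EOF
```

No `_id` is set, so MongoDB generates an ObjectId for each document on import.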

# Copy sample data to container
docker cp movies.json mongodb:/

# Import to MongoDB
docker exec mongodb mongoimport \
  --db moviedb \
  --collection movies \
  --file /movies.json \
  --jsonArray

Step 3: Start Elasticsearch & Kibana

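# Download the ELK compose file into its own directory so it does not
# collide with the MongoDB docker-compose.yml downloaded in Step 1
mkdir -p elk
wget https://raw.githubusercontent.com/JinnaBalu/infinite-containers/main/mongo-elasticsearch-logstash/elk/docker-compose.yml -O elk/docker-compose.yml

docker compose -f elk/docker-compose.yml up -d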

Access Kibana: http://localhost:5601
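
Elasticsearch can take a while to start, so a readiness check avoids running Logstash against a cluster that is not up yet. This sketch polls the cluster health API until the status is at least yellow. It assumes Elasticsearch is reachable on `localhost:9200` over plain HTTP without authentication, as in the verification command later on; `wait_for_es` is an illustrative helper name.

```shell
# Block until the cluster reports at least "yellow" status.
# Assumption: http://localhost:9200 with no TLS and no authentication.
wait_for_es() {
  url="${1:-http://localhost:9200}"
  attempts="${2:-30}"
  delay="${3:-2}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -f makes curl fail on HTTP errors, including the 408 returned on timeout.
    if curl -fsS "$url/_cluster/health?wait_for_status=yellow&timeout=5s" >/dev/null 2>&1; then
      echo "Elasticsearch is ready"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "Elasticsearch not ready after $attempts attempts" >&2
  return 1
}

# Usage: wait_for_es http://localhost:9200 30 2
```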

Step 4: Setup Logstash

Download MongoDB JDBC Driver

cd ~/mongo-pipeline
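# Create both directories now; the later wget -O fails if logstash/pipeline is missing
mkdir -p logstash/drivers logstash/pipeline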

# Download and extract driver
wget https://dbschema.com/jdbc-drivers/MongoDbJdbcDriver.zip
unzip MongoDbJdbcDriver.zip -d logstash/drivers/
rm MongoDbJdbcDriver.zip

Download Logstash Configuration

# Download pipeline config
wget https://raw.githubusercontent.com/JinnaBalu/infinite-containers/main/mongo-elasticsearch-logstash/elk/logstash/pipeline/logstash.conf -O logstash/pipeline/logstash.conf

# Download docker-compose for logstash
wget https://raw.githubusercontent.com/JinnaBalu/infinite-containers/main/mongo-elasticsearch-logstash/elk/logstash/docker-compose.yml -O logstash/docker-compose.yml
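
The downloaded `logstash.conf` is the authoritative pipeline definition. For orientation only, the sketch below writes a separate file showing the general shape such a pipeline might take. Everything in it is an assumption rather than the repository's actual config: the driver jar path, the driver class name (which depends on the driver version), the host names `mongodb` and `elasticsearch`, and the query.

```shell
# Write an illustrative pipeline to a separate file so the downloaded
# logstash.conf is left untouched. All values are assumptions.
cat > logstash.conf.sketch <<'EOF'
input {
  jdbc {
    # Placeholder path: list the jar(s) actually extracted from the driver zip,
    # comma-separated if there are several.
    jdbc_driver_library => "/usr/share/logstash/drivers/mongojdbc.jar"
    # Assumed class name; check the driver's documentation for your version.
    jdbc_driver_class => "com.wisecoders.dbschema.mongodb.JdbcDriver"
    jdbc_connection_string => "jdbc:mongodb://mongodb:27017/moviedb"
    jdbc_user => ""
    # No schedule is set, so the statement runs once. A scheduled run without
    # a stable document_id in the output would index duplicates.
    statement => "db.movies.find({})"
  }
}

output {
  elasticsearch {
    hosts => ["http://elasticsearch:9200"]
    index => "movies"
  }
}
EOF
```

Comparing this sketch with the downloaded file is a quick way to see which settings the real pipeline adds, for example filters that reshape nested fields or the `_id` value.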

Build Custom Logstash Image

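cd logstash
# docker build needs a Dockerfile in this directory; the downloads above
# do not fetch one, so make sure it is present before building
docker build -t mongologstash .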

Step 5: Run Logstash

docker compose up -d

Data Types Covered

The pipeline handles various MongoDB data types:

Type            Example Fields
Strings         title, name, role
Numbers         rating, releaseYear, budget
Boolean         isAvailable
Dates           releaseDate
Nested Objects  director, boxOffice.profit
Arrays          genres, cast
ObjectId        _id
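
To see how these types arrive in Elasticsearch, the index mapping can be inspected once documents are indexed. A small sketch, assuming the `movies` index and an unauthenticated endpoint on `localhost:9200`; `show_mapping` is an illustrative helper name.

```shell
# Print the mapping Elasticsearch holds for an index (default: movies).
show_mapping() {
  curl -s "http://localhost:9200/${1:-movies}/_mapping?pretty"
}

# Usage: show_mapping movies
```

With dynamic mapping left at its defaults, strings typically appear as `text` with a `keyword` sub-field, whole numbers as `long`, booleans as `boolean`, and nested objects as `properties` blocks.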

Verification

# Check document count
curl -X GET 'localhost:9200/movies/_count?pretty'

# Search in Kibana Dev Tools
GET movies/_search
{
  "query": { "match_all": {} }
}
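
The count check above can be turned into a pass/fail test. This sketch parses the `count` value from the `_count` response with `sed` and compares it to an expected number. It assumes the same unauthenticated `localhost:9200` endpoint; the helper names `es_count` and `assert_es_count` are illustrative.

```shell
# Print the number of documents in an index, parsed from the _count response.
es_count() {
  index="${1:-movies}"
  curl -s "http://localhost:9200/$index/_count" |
    sed -n 's/.*"count":\([0-9][0-9]*\).*/\1/p'
}

# Succeed only when the index holds exactly the expected number of documents.
assert_es_count() {
  expected="$1"
  actual="$(es_count "${2:-movies}")"
  if [ "$actual" = "$expected" ]; then
    echo "OK: $actual documents"
  else
    echo "Mismatch: expected $expected, got ${actual:-none}" >&2
    return 1
  fi
}

# Usage: assert_es_count 2 movies
```

The expected number is the size of the source collection, which `db.movies.countDocuments()` reports when run in a Mongo shell against `moviedb`.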

Cleanup

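# Warning: these two commands stop and remove every container on this host,
# not only the ones started in this tutorial
docker stop $(docker ps -qa)
docker rm $(docker ps -qa)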
docker network rm datapipeline
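
On a machine that also runs unrelated containers, a narrower cleanup may be preferable. The sketch below removes only containers attached to the `datapipeline` network, then the network itself. It assumes every container from this tutorial is attached to that network; `cleanup_pipeline` is an illustrative helper name.

```shell
# Remove only containers attached to the pipeline network, then the network.
cleanup_pipeline() {
  net="${1:-datapipeline}"
  ids="$(docker ps -aq --filter "network=$net")"
  if [ -n "$ids" ]; then
    # $ids is intentionally unquoted so each container ID becomes its own argument.
    docker rm -f $ids
  fi
  docker network rm "$net"
}

# Usage: cleanup_pipeline datapipeline
```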

Lab: Build the MongoDB Pipeline

  1. Create the Docker network and start MongoDB with Mongo Express
  2. Import the sample movies.json data into MongoDB
  3. Start Elasticsearch and Kibana, verify both are healthy
  4. Build the custom Logstash image with the MongoDB JDBC driver
  5. Run Logstash and verify documents appear in Elasticsearch
  6. Search for a movie in Kibana Dev Tools
