Tuesday, October 14, 2025


Why This Matters

ClickHouse powers Brevo’s large-scale analytical workloads — handling billions of events per day.
While it excels at fast reads and inserts, deleting data in ClickHouse is a completely different story.

In this article, we’ll explore:

  • Why traditional deletes slow down ClickHouse

  • What’s happening under the hood

  • Real examples from our production cluster

  • How we made deletes 3× faster and safer using partition-based strategies

How ClickHouse Stores Data

Think of ClickHouse as a container ship warehouse.

Each new insert adds more cartons (parts) inside the containers (partitions).
ClickHouse is designed to append data, not rewrite it — and that’s where delete challenges begin.
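To make the rest of the article concrete, here is a minimal sketch of what a monthly-partitioned MergeTree table might look like. The names are illustrative (modeled on the logs.automation_workflow examples later in the post), not our actual schema, and replicated engines are omitted for brevity:

CREATE TABLE logs.automation_workflow
(
    event_date      Date,
    organization_id UInt64,
    event_type      LowCardinality(String),
    payload         String
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(event_date)   -- one partition per month, e.g. 202012
ORDER BY (organization_id, event_date);

Every insert creates one or more new parts inside the partition matching the row's month; background merges later combine those parts.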

Inserts Are Fast (Append-Only)

ClickHouse shines with bulk inserts:

  • Append-only: Data is always added, never modified in place.

  • Columnar storage: Each column is stored separately for better compression.

  • Batch-friendly: Ideal for large inserts (thousands of rows per batch).

  • Immediate availability: Data becomes queryable as soon as it’s inserted.

Deletes Are Slow — They Trigger Mutations

When you delete in ClickHouse, it doesn’t remove rows directly.
It creates a mutation job, which rewrites the affected parts in the background.

ALTER TABLE events.email_transactional DELETE WHERE organization_id = 140354 AND email = 'heiditanyh@gmail.com';

Each mutation:

  • Rewrites entire parts (even if only one row changes)

  • Blocks merges on affected partitions

  • Consumes heavy CPU, disk I/O, and temporary space
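After issuing a DELETE like the one above, you can watch the resulting mutation from the system tables, and cancel it if it starts hurting the cluster. A sketch against the table from the example (the columns are standard system.mutations columns):

-- Watch the mutation created by the DELETE above
SELECT mutation_id, command, parts_to_do, is_done, latest_fail_reason
FROM system.mutations
WHERE database = 'events' AND table = 'email_transactional' AND is_done = 0;

-- Cancel it if needed (add AND mutation_id = '...' to target a single mutation)
KILL MUTATION WHERE database = 'events' AND table = 'email_transactional';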

Partitions, Merges, and Mutations — Visualized

Merge: Background Optimization

Merges continuously combine small parts into bigger ones — improving query performance.

Mutation: Costly Rewrites

Mutations, on the other hand, must open and repack containers — blocking merges until they finish.
Only one mutation per part runs at a time, so queue buildup is inevitable during heavy delete workloads.

The Problem: Cross-Partition Deletes

Here’s an example of a problematic query we once ran:

ALTER TABLE logs.automation_workflow DELETE WHERE organization_id = 2373101 AND event_date >= '2020-01-20' AND event_date <= '2021-01-31';

Why It’s Bad

  • Affects 13+ partitions (202001–202101)

  • Rewrites 20K+ parts

  • Triggers massive merge backlog

  • Keeps mutation queues busy for hours
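Before running a wide delete like this, it is worth estimating the blast radius. A quick sketch against system.parts (an upper bound, since only parts containing matching rows are actually rewritten):

-- How many active parts (and how much data) sit in the affected table?
SELECT partition, count() AS parts, formatReadableSize(sum(bytes_on_disk)) AS size
FROM system.parts
WHERE database = 'logs' AND table = 'automation_workflow' AND active
GROUP BY partition
ORDER BY partition;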

The Fix: Partition-Scoped Deletes

Instead of deleting across all partitions, we delete one partition at a time:

ALTER TABLE logs.automation_workflow DELETE IN PARTITION '202012' WHERE organization_id = 2373101 AND event_date >= '2020-12-01' AND event_date < '2021-01-01';

We repeat this for each month.
This drastically reduces the number of parts under mutation at once.

One partition at a time keeps merges running, avoids massive lock windows, and isolates failure risk.
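In practice we script this instead of typing thirteen statements by hand. A minimal sketch, assuming clickhouse-client is available and reusing the table and organization from the example (adjust the table name if your mutations run against a local replicated table):

#!/bin/bash
# Issue one partition-scoped delete per month, oldest partition first.
for p in 202001 202002 202003 202004 202005 202006 \
         202007 202008 202009 202010 202011 202012 202101; do
  clickhouse-client --query "ALTER TABLE logs.automation_workflow DELETE IN PARTITION '$p' WHERE organization_id = 2373101"
  # Wait for this partition's mutation to finish before starting the next one.
  while [ "$(clickhouse-client --query "SELECT count() FROM system.mutations WHERE table = 'automation_workflow' AND NOT is_done")" != "0" ]; do
    sleep 30
  done
done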

Real Result: 3× Faster Execution

  • Before: Single wide mutation → 90 mins, heavy cluster load

  • After: Monthly partition deletes → 30 mins total, smooth merges

  • CPU and I/O dropped by ~60%

  • No Keeper overload or replication lag

How Mutations Replicate Across Replicas

ClickHouse-Keeper acts as the control tower for all replicas:

  1. The mutation is recorded as an instruction in Keeper.

  2. Each replica picks it up and executes independently.

  3. Finished replicas can fetch mutated parts instead of reprocessing.

  4. Lagging replicas replay missed mutation IDs later.

Monitor mutation progress on each replica

SELECT database, table, mutation_id, is_done, parts_to_do FROM system.mutations WHERE database='logs' AND table='automation_workflow_local_replicated';

Root Cause Summary – Why Mutations Slow the Cluster

  1. Delayed Execution: Mutations queued per table & replica; depend on background load.

  2. Frozen Parts: Merges paused → part backlog grows.

  3. Resource Contention: Each replica rewrites full parts; heavy CPU & disk I/O.

Together, they cause long mutation queues, merge starvation, and replication lag.

Real Queries for Monitoring

-- Check active mutations
SELECT * FROM system.mutations WHERE is_done = 0;

-- Detect merge backlog
SELECT table, count() AS active_merges FROM system.merges GROUP BY table;

-- Check replication lag
SELECT table, queue_size, last_queue_update FROM system.replicas ORDER BY queue_size DESC;

Best Practices for Safe Deletes

  • Scope by partition: Always delete within a partition (DELETE IN PARTITION).

  • Batch deletes: Delete by org IDs in groups (e.g., 50–200 at a time).

  • Off-peak schedule: Run during low-traffic windows.

  • Monitor progress: Watch system.mutations and system.merges.

  • Optimize afterwards: Run OPTIMIZE TABLE ... PARTITION ... FINAL if needed.
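For the last point above, here is a hedged example of the optional cleanup on the table used throughout this article; run it only when a partition ends up fragmented into many small parts, because FINAL rewrites the whole partition:

-- Re-merge one touched partition into as few parts as possible
OPTIMIZE TABLE logs.automation_workflow PARTITION '202012' FINAL;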

Key Takeaways

  • ClickHouse is append-optimized, not delete-friendly.

  • Mutations rewrite entire parts → slow and costly.

  • Cross-partition deletes cripple merges and replication.

  • Partition-scoped deletes = 3× faster, safer, cleaner.

  • Always monitor system.mutations and plan deletes during off-peak hours.

Saturday, January 9, 2021

What is MongoDB FTDC (Full Time Diagnostic Data Capture)? How does it impact your platform stability?

In simple words, MongoDB captures server behavior and stores it under a separate directory (data directory/diagnostic.data). This data is mainly for MongoDB Inc. engineers, so they can analyze mongod and mongos behavior when troubleshooting is needed. FTDC data files are compressed and not human-readable, and MongoDB Inc. engineers cannot access FTDC data without explicit permission and assistance from system owners or operators.


FTDC captures data every 1 second.


Sometimes this process leads to high resource consumption. In a few places it has even been flagged as MongoDB overhead, for example:

https://bugzilla.redhat.com/show_bug.cgi?id=1834377


How this process contributes to ticket drops:

This process/thread collects:

  • serverStatus
  • replSetGetStatus (mongod only)
  • collStats for the local.oplog.rs collection (mongod only)

Once our servers come under load, capturing all of the above data every second can lead to tickets being held for longer than usual.


The only drawback of disabling this feature is that troubleshooting by MongoDB Inc. may take longer.

By default, this feature is enabled.

How to disable it:
  • Runtime (applies immediately, but only until the next instance restart)
  • Config level (requires an instance restart)

Runtime 
use admin
db.adminCommand( { setParameter: 1 , diagnosticDataCollectionEnabled : false } )


MongoDB config file

setParameter:
  diagnosticDataCollectionEnabled: false
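To verify the change took effect, the standard getParameter command can be used; a quick check (the value in the comment is what you should see after disabling):

use admin
db.adminCommand( { getParameter: 1 , diagnosticDataCollectionEnabled : 1 } )
// should now report "diagnosticDataCollectionEnabled" : false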

After disabling FTDC, there was a huge change on our platform.




Monday, September 2, 2019

Disk reclaim on MongoDB WiredTiger Engine

This example shows that dropping a MongoDB collection releases disk space back to the OS (only applicable to the WiredTiger storage engine).



```
hasitha@DESKTOP / $ sudo du -sh /data/
6.0G    /data/

hasitha@DESKTOP / $ mongo_local
MongoDB shell version v3.4.21
connecting to: mongodb://localhost:27017/
MongoDB server version: 3.4.21
> show dbs
admin           0.000GB
central_db      0.000GB
e_1v6y03dpfs6z  0.001GB
e_2nas8av7dawh  5.552GB
e_da618q7mfzsv  0.001GB
e_p4rxh05m2k4w  0.008GB
e_testdb        0.001GB
e_yrrsst6ncktg  0.004GB
local           0.000GB
m_testdb        0.000GB
m_yrrsst6ncktg  0.000GB
s_testdb        0.000GB

> use e_2nas8av7dawh
switched to db e_2nas8av7dawh

> show collections
users

> db.users.drop()
true

> show collections

> exit
bye
hasitha@DESKTOP / $ sudo du -sh /data/
358M    /data/

```
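If you prefer to check sizes from inside the shell instead of with du, the standard stats helpers report on-disk usage; a small sketch using the database from the session above (the collection-level call only makes sense before the drop):

```
use e_2nas8av7dawh
db.users.stats().storageSize   // on-disk size of the users collection, in bytes
db.stats().storageSize         // on-disk size of the whole database, in bytes
```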


Wednesday, February 6, 2019

Download MongoDB backup files (Encrypted) from S3


After spending a few hours, I wrote a small script to download MongoDB backup files from S3. The specialty of this script is that it can also decrypt backup files that were encrypted with a key (an encryption key passed to openssl).




#!/bin/bash
# Download an (optionally encrypted) MongoDB backup file from S3, decrypt and extract it.

# DO NOT modify the encryption key unless a different key was used for encryption
key="<key value>"
# S3 path for MongoDB backups. You can find it in the MongoDB backup logs
# (the full S3 path may not appear in the backup config file, as the backup job creates an additional backup folder).
s3path="< S3 path >"
# Backup download location
backupLocation="/data/backups/tmp"

echo " ---------------------------------------------------------"
echo " List of backup files stored in $s3path "
echo ""
echo " ---------------------------------------------------------"
echo -e "   Date  |   Time  |   Size  | \tName of the file"
echo " ---------------------------------------------------------"
aws s3 ls "$s3path"
echo " ---------------------------------------------------------"
echo " Which database file do you need to download? < Please type or copy-paste the full name of the backup file here > "
read filename
if [ -z "$filename" ]
 then
  echo " ERROR:: No file name was entered !!! ABORTED"
  exit 1
fi

echo ""
echo " $s3path$filename will be downloaded to $backupLocation "
aws s3 cp "$s3path$filename" "$backupLocation"
echo ""
echo " ---------------------------------------------------------"
echo " Decrypting the file ..... "
# Strip the 4-character extension (e.g. ".enc") to name the decrypted output
openssl enc -d -aes-256-cbc -in "$backupLocation/$filename" -out "$backupLocation/${filename::-4}" -k "$key"
echo " ---------------------------------------------------------"
ls -al "$backupLocation/"
echo ""
echo " ---------------------------------------------------------"
echo ""
echo " Where do you want to extract the backup files? Default location is /data/backups/tmp"
read restoreLocation
if [ -z "$restoreLocation" ]
 then
  restoreLocation="/data/backups/tmp"
fi

echo ""
echo " Extracting db files to $restoreLocation "
tar -xvf "$backupLocation/${filename::-4}" -C "$restoreLocation"

echo " ---------------------------------------------------------"
echo " List of files and directories in $restoreLocation "
ls -al "$restoreLocation/"
echo " ---------------------------------------------------------"
echo " Removing downloaded file : $backupLocation/$filename"
echo " ---------------------------------------------------------"
rm -f "$backupLocation/$filename"
echo ""
echo " #########################################################"
echo " # You have successfully downloaded backup files. Please #"
echo " #  restore your files using mongorestore utility        #"
echo " #########################################################"

Tuesday, February 5, 2019

Delete your Large MongoDB database effectively



You can use the JavaScript code below to delete a large database without holding the global lock for a long duration (which would block all other operations on the database and might even hang your instance).

Initially, the script drops all indexes belonging to each collection and then removes its documents one by one. Once a collection has no documents left, it finally drops the collection. Between these operations, a sleep function is used so the global lock is never held for long stretches.

// Connect to the database that should be deleted
var dbname = "dummy";
var URL = "localhost:27017/" + dbname;
db = connect(URL);

var collections = db.getCollectionNames();

for (var i in collections) {

        print("Deleting collection : " + collections[i] + "...");
        print("Dropping all indexes in " + collections[i] + "...");
        db[collections[i]].dropIndexes();
        sleep(5000);
        // Remove documents one by one, pausing between deletes so the
        // global lock is never held for long stretches
        db[collections[i]].find().forEach(function(mydoc) {
                print("Deleting document : " + mydoc._id);
                db[collections[i]].remove({'_id': mydoc._id});
                sleep(1000);
                }
        );
        // The collection is empty now, so dropping it is cheap
        db[collections[i]].drop();
        sleep(5000);
}

// Busy-wait sleep helper (note: the mongo shell also provides a built-in sleep(ms); this overrides it)
function sleep(milliseconds) {
  var start = new Date().getTime();
  while (new Date().getTime() < start + milliseconds) {
    // Do nothing
  }
}
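To run it, save the script to a file and pass it to the mongo shell; the file name here is just an example:

mongo --quiet delete_large_db.js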