Tuesday, October 14, 2025


Why This Matters

ClickHouse powers Brevo’s large-scale analytical workloads — handling billions of events per day.
While it excels at fast reads and inserts, deleting data in ClickHouse is a completely different story.

In this article, we’ll explore:

  • Why traditional deletes slow down ClickHouse

  • What’s happening under the hood

  • Real examples from our production cluster

  • How we made deletes 3× faster and safer using partition-based strategies

How ClickHouse Stores Data

Think of ClickHouse as a container ship warehouse.

Each new insert adds more cartons (parts) inside the containers (partitions).
ClickHouse is designed to append data, not rewrite it — and that’s where delete challenges begin.
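To make the rest of the article concrete, here is a minimal sketch of what a monthly-partitioned MergeTree table might look like. The names are illustrative (modeled on the logs.automation_workflow examples later in the post), not our actual schema, and replicated engines are omitted for brevity:

CREATE TABLE logs.automation_workflow
(
    event_date      Date,
    organization_id UInt64,
    event_type      LowCardinality(String),
    payload         String
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(event_date)   -- one partition per month, e.g. 202012
ORDER BY (organization_id, event_date);

Every insert creates one or more new parts inside the partition matching the row's month; background merges later combine those parts.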

Inserts Are Fast (Append-Only)

ClickHouse shines with bulk inserts:

  • Append-only: Data is always added, never modified in place.

  • Columnar storage: Each column is stored separately for better compression.

  • Batch-friendly: Ideal for large inserts (thousands of rows per batch).

  • Immediate availability: Data becomes queryable as soon as it’s inserted.

Deletes Are Slow — They Trigger Mutations

When you delete in ClickHouse, it doesn’t remove rows directly.
It creates a mutation job, which rewrites the affected parts in the background.

ALTER TABLE events.email_transactional DELETE WHERE organization_id = 140354 AND email = 'heiditanyh@gmail.com';

Each mutation:

  • Rewrites entire parts (even if only one row changes)

  • Blocks merges on affected partitions

  • Consumes heavy CPU, disk I/O, and temporary space
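After issuing a DELETE like the one above, you can watch the resulting mutation from the system tables, and cancel it if it starts hurting the cluster. A sketch against the table from the example (the columns are standard system.mutations columns):

-- Watch the mutation created by the DELETE above
SELECT mutation_id, command, parts_to_do, is_done, latest_fail_reason
FROM system.mutations
WHERE database = 'events' AND table = 'email_transactional' AND is_done = 0;

-- Cancel it if needed (add AND mutation_id = '...' to target a single mutation)
KILL MUTATION WHERE database = 'events' AND table = 'email_transactional';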

Partitions, Merges, and Mutations — Visualized

Merge: Background Optimization

Merges continuously combine small parts into bigger ones — improving query performance.

Mutation: Costly Rewrites

Mutations, on the other hand, must open and repack containers — blocking merges until they finish.
Only one mutation per part runs at a time, so queue buildup is inevitable during heavy delete workloads.

The Problem: Cross-Partition Deletes

Here’s an example of a problematic query we once ran:

ALTER TABLE logs.automation_workflow DELETE WHERE organization_id = 2373101 AND event_date >= '2020-01-20' AND event_date <= '2021-01-31';

Why It’s Bad

  • Affects 13+ partitions (202001–202101)

  • Rewrites 20K+ parts

  • Triggers massive merge backlog

  • Keeps mutation queues busy for hours
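Before running a wide delete like this, it is worth estimating the blast radius. A quick sketch against system.parts (an upper bound, since only parts containing matching rows are actually rewritten):

-- How many active parts (and how much data) sit in the affected table?
SELECT partition, count() AS parts, formatReadableSize(sum(bytes_on_disk)) AS size
FROM system.parts
WHERE database = 'logs' AND table = 'automation_workflow' AND active
GROUP BY partition
ORDER BY partition;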

The Fix: Partition-Scoped Deletes

Instead of deleting across all partitions, we delete one partition at a time:

ALTER TABLE logs.automation_workflow DELETE IN PARTITION '202012' WHERE organization_id = 2373101 AND event_date >= '2020-12-01' AND event_date < '2021-01-01';

We repeat this for each month.
This drastically reduces the number of parts under mutation at once.

One partition at a time keeps merges running, avoids massive lock windows, and isolates failure risk.
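In practice we script this instead of typing thirteen statements by hand. A minimal sketch, assuming clickhouse-client is available and reusing the table and organization from the example (adjust the table name if your mutations run against a local replicated table):

#!/bin/bash
# Issue one partition-scoped delete per month, oldest partition first.
for p in 202001 202002 202003 202004 202005 202006 \
         202007 202008 202009 202010 202011 202012 202101; do
  clickhouse-client --query "ALTER TABLE logs.automation_workflow DELETE IN PARTITION '$p' WHERE organization_id = 2373101"
  # Wait for this partition's mutation to finish before starting the next one.
  while [ "$(clickhouse-client --query "SELECT count() FROM system.mutations WHERE table = 'automation_workflow' AND NOT is_done")" != "0" ]; do
    sleep 30
  done
done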

Real Result: 3× Faster Execution

  • Before: Single wide mutation → 90 mins, heavy cluster load

  • After: Monthly partition deletes → 30 mins total, smooth merges

  • CPU and I/O dropped by ~60%

  • No Keeper overload or replication lag

How Mutations Replicate Across Replicas

ClickHouse-Keeper acts as the control tower for all replicas:

  1. The mutation is recorded as an instruction in Keeper.

  2. Each replica picks it up and executes independently.

  3. Finished replicas can fetch mutated parts instead of reprocessing.

  4. Lagging replicas replay missed mutation IDs later.

Monitor mutation progress on each replica

SELECT database, table, mutation_id, is_done, parts_to_do FROM system.mutations WHERE database='logs' AND table='automation_workflow_local_replicated';

Root Cause Summary – Why Mutations Slow the Cluster

  1. Delayed Execution: Mutations queued per table & replica; depend on background load.

  2. Frozen Parts: Merges paused → part backlog grows.

  3. Resource Contention: Each replica rewrites full parts; heavy CPU & disk I/O.

Together, they cause long mutation queues, merge starvation, and replication lag.

Real Queries for Monitoring

-- Check active mutations
SELECT * FROM system.mutations WHERE is_done = 0;

-- Detect merge backlog
SELECT table, count() AS active_merges FROM system.merges GROUP BY table;

-- Check replication lag
SELECT table, queue_size, last_queue_update FROM system.replicas ORDER BY queue_size DESC;

Best Practices for Safe Deletes

  • Scope by partition: Always delete within a partition (DELETE IN PARTITION).

  • Batch deletes: Delete by org IDs in groups (e.g., 50–200 at a time).

  • Off-peak schedule: Run during low-traffic windows.

  • Monitor progress: Watch system.mutations and system.merges.

  • Optimize afterwards: Run OPTIMIZE TABLE ... PARTITION ... FINAL if needed.
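For the last point above, here is a hedged example of the optional cleanup on the table used throughout this article; run it only when a partition ends up fragmented into many small parts, because FINAL rewrites the whole partition:

-- Re-merge one touched partition into as few parts as possible
OPTIMIZE TABLE logs.automation_workflow PARTITION '202012' FINAL;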

Key Takeaways

  • ClickHouse is append-optimized, not delete-friendly.

  • Mutations rewrite entire parts → slow and costly.

  • Cross-partition deletes cripple merges and replication.

  • Partition-scoped deletes = 3× faster, safer, cleaner.

  • Always monitor system.mutations and plan deletes during off-peak hours.

Saturday, January 9, 2021

What is MongoDB FTDC (Full Time Diagnostic Data Capture)? How does it impact your platform stability?

In simple words, MongoDB captures server behavior and stores it under a separate directory (data directory/diagnostic.data). This data is mainly for MongoDB Inc. engineers, so they can analyze mongod and mongos behavior when troubleshooting is needed. FTDC data files are compressed and not human-readable, and MongoDB Inc. engineers cannot access FTDC data without explicit permission and assistance from system owners or operators.


FTDC captures data every 1 second.


Sometimes this process leads to high resource consumption. In a few places it has even been flagged as MongoDB overhead, for example:

https://bugzilla.redhat.com/show_bug.cgi?id=1834377


How this process contributes to ticket drops:

This process/thread collects:

  • serverStatus
  • replSetGetStatus (mongod only)
  • collStats for the local.oplog.rs collection (mongod only)

Once our servers come under load, capturing all of the above data every second can lead to tickets being held for longer than usual.


The only drawback of disabling this feature is that troubleshooting by MongoDB Inc. may take longer.

By default, this feature is enabled.

How to disable it:
  • Runtime (applies immediately, but only until the next instance restart)
  • Config level (requires an instance restart)

Runtime 
use admin
db.adminCommand( { setParameter: 1 , diagnosticDataCollectionEnabled : false } )


MongoDB config file

setParameter:
  diagnosticDataCollectionEnabled: false
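To verify the change took effect, the standard getParameter command can be used; a quick check (the value in the comment is what you should see after disabling):

use admin
db.adminCommand( { getParameter: 1 , diagnosticDataCollectionEnabled : 1 } )
// should now report "diagnosticDataCollectionEnabled" : false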

After disabling FTDC, there was a huge change on our platform.




Monday, September 2, 2019

Disk reclaim on MongoDB WiredTiger Engine

This example shows that dropping a MongoDB collection releases disk space back to the OS (only applicable to the WiredTiger storage engine).



```
hasitha@DESKTOP / $ sudo du -sh /data/
6.0G    /data/

hasitha@DESKTOP / $ mongo_local
MongoDB shell version v3.4.21
connecting to: mongodb://localhost:27017/
MongoDB server version: 3.4.21
> show dbs
admin           0.000GB
central_db      0.000GB
e_1v6y03dpfs6z  0.001GB
e_2nas8av7dawh  5.552GB
e_da618q7mfzsv  0.001GB
e_p4rxh05m2k4w  0.008GB
e_testdb        0.001GB
e_yrrsst6ncktg  0.004GB
local           0.000GB
m_testdb        0.000GB
m_yrrsst6ncktg  0.000GB
s_testdb        0.000GB

> use e_2nas8av7dawh
switched to db e_2nas8av7dawh

> show collections
users

> db.users.drop()
true

> show collections

> exit
bye
hasitha@DESKTOP / $ sudo du -sh /data/
358M    /data/

```
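If you prefer to check sizes from inside the shell instead of with du, the standard stats helpers report on-disk usage; a small sketch using the database from the session above (the collection-level call only makes sense before the drop):

```
use e_2nas8av7dawh
db.users.stats().storageSize   // on-disk size of the users collection, in bytes
db.stats().storageSize         // on-disk size of the whole database, in bytes
```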


Wednesday, February 6, 2019

Download MongoDB backup files (Encrypted) from S3


After spending a few hours, I wrote a small script to download MongoDB backup files from S3. The specialty of this script is that it can also decrypt backup files that were encrypted with a key (an encryption key passed to openssl).




#!/bin/bash
# Download an (optionally encrypted) MongoDB backup file from S3, decrypt and extract it.

# DO NOT modify the encryption key unless a different key was used for encryption
key="<key value>"
# S3 path for MongoDB backups. You can find it in the MongoDB backup logs
# (the full S3 path may not appear in the backup config file, as the backup job creates an additional backup folder).
s3path="< S3 path >"
# Backup download location
backupLocation="/data/backups/tmp"

echo " ---------------------------------------------------------"
echo " List of backup files stored in $s3path "
echo ""
echo " ---------------------------------------------------------"
echo -e "   Date  |   Time  |   Size  | \tName of the file"
echo " ---------------------------------------------------------"
aws s3 ls "$s3path"
echo " ---------------------------------------------------------"
echo " Which database file do you need to download? < Please type or copy-paste the full name of the backup file here > "
read filename
if [ -z "$filename" ]
 then
  echo " ERROR:: No file name was entered !!! ABORTED"
  exit 1
fi

echo ""
echo " $s3path$filename will be downloaded to $backupLocation "
aws s3 cp "$s3path$filename" "$backupLocation"
echo ""
echo " ---------------------------------------------------------"
echo " Decrypting the file ..... "
# Strip the 4-character extension (e.g. ".enc") to name the decrypted output
openssl enc -d -aes-256-cbc -in "$backupLocation/$filename" -out "$backupLocation/${filename::-4}" -k "$key"
echo " ---------------------------------------------------------"
ls -al "$backupLocation/"
echo ""
echo " ---------------------------------------------------------"
echo ""
echo " Where do you want to extract the backup files? Default location is /data/backups/tmp"
read restoreLocation
if [ -z "$restoreLocation" ]
 then
  restoreLocation="/data/backups/tmp"
fi

echo ""
echo " Extracting db files to $restoreLocation "
tar -xvf "$backupLocation/${filename::-4}" -C "$restoreLocation"

echo " ---------------------------------------------------------"
echo " List of files and directories in $restoreLocation "
ls -al "$restoreLocation/"
echo " ---------------------------------------------------------"
echo " Removing downloaded file : $backupLocation/$filename"
echo " ---------------------------------------------------------"
rm -f "$backupLocation/$filename"
echo ""
echo " #########################################################"
echo " # You have successfully downloaded backup files. Please #"
echo " #  restore your files using mongorestore utility        #"
echo " #########################################################"

Tuesday, February 5, 2019

Delete your Large MongoDB database effectively



You can use the JavaScript code below to delete a large database without holding the global lock for a long duration (which would block all other operations on the database and might even hang your instance).

Initially, the script drops all indexes belonging to each collection and then removes its documents one by one. Once a collection has no documents left, it finally drops the collection. Between these operations, a sleep function is used so the global lock is never held for long stretches.

// Connect to the database that should be deleted
var dbname = "dummy";
var URL = "localhost:27017/" + dbname;
db = connect(URL);

var collections = db.getCollectionNames();

for (var i in collections) {

        print("Deleting collection : " + collections[i] + "...");
        print("Dropping all indexes in " + collections[i] + "...");
        db[collections[i]].dropIndexes();
        sleep(5000);
        // Remove documents one by one, pausing between deletes so the
        // global lock is never held for long stretches
        db[collections[i]].find().forEach(function(mydoc) {
                print("Deleting document : " + mydoc._id);
                db[collections[i]].remove({'_id': mydoc._id});
                sleep(1000);
                }
        );
        // The collection is empty now, so dropping it is cheap
        db[collections[i]].drop();
        sleep(5000);
}

// Busy-wait sleep helper (note: the mongo shell also provides a built-in sleep(ms); this overrides it)
function sleep(milliseconds) {
  var start = new Date().getTime();
  while (new Date().getTime() < start + milliseconds) {
    // Do nothing
  }
}
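To run it, save the script to a file and pass it to the mongo shell; the file name here is just an example:

mongo --quiet delete_large_db.js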