Why This Matters
ClickHouse powers Brevo’s large-scale analytical workloads — handling billions of events per day.
While it excels at fast reads and inserts, deleting data in ClickHouse is a completely different story.
In this article, we’ll explore:
- Why traditional deletes slow down ClickHouse
- What’s happening under the hood
- Real examples from our production cluster
- How we made deletes 3× faster and safer using partition-based strategies
How ClickHouse Stores Data
Think of a ClickHouse table as a container ship.
ClickHouse Concept | Real-World Analogy |
---|---|
Table | Container ship |
Partition | Container on deck |
Part | Carton inside the container |
Row | Individual item inside a carton |
Each new insert adds more cartons (parts) inside the containers (partitions).
ClickHouse is designed to append data, not rewrite it — and that’s where delete challenges begin.
Inserts Are Fast (Append-Only)
ClickHouse shines with bulk inserts:
- Append-only: Data is always added, never modified in place.
- Columnar storage: Each column is stored separately for better compression.
- Batch-friendly: Ideal for large inserts (thousands of rows per batch).
- Immediate availability: Data becomes queryable as soon as it’s inserted.
Deletes Are Slow — They Trigger Mutations
When you run an ALTER TABLE ... DELETE in ClickHouse, it doesn’t remove rows in place.
It creates a mutation, which rewrites the affected parts in the background.
Each mutation:
- Rewrites entire parts (even if only one row changes)
- Blocks merges on affected partitions
- Consumes heavy CPU, disk I/O, and temporary space
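To make the first point concrete, here is a toy model in Python rather than ClickHouse code. It assumes a part is an immutable list of rows and that only parts containing a matching row get rewritten. The function name and sample data are made up for illustration.

```python
# Toy model (not ClickHouse code): a part is an immutable list of rows,
# so removing any row means writing a whole new part.

def rows_rewritten_by_delete(parts, should_delete):
    """Return (rows_deleted, rows_rewritten) for a mutation-style delete."""
    deleted = 0
    rewritten = 0
    for part in parts:
        matches = sum(1 for row in part if should_delete(row))
        if matches:
            deleted += matches
            # The surviving rows of the part are all written out again.
            rewritten += len(part) - matches
    return deleted, rewritten


# Three hypothetical parts of 1,000 rows each; only row 1500 is deleted.
parts = [list(range(0, 1000)), list(range(1000, 2000)), list(range(2000, 3000))]
deleted, rewritten = rows_rewritten_by_delete(parts, lambda row: row == 1500)
print(deleted, rewritten)  # 1 999
```

Deleting a single row costs 999 row writes in this model, which is why small deletes spread over many parts are so expensive.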
Partitions, Merges, and Mutations — Visualized
Merge: Background Optimization
Merges continuously combine small parts into bigger ones — improving query performance.
Mutation: Costly Rewrites
Mutations, on the other hand, must open and repack containers — blocking merges until they finish.
Only one mutation per part runs at a time, so queue buildup is inevitable during heavy delete workloads.
The Problem: Cross-Partition Deletes
Here’s an example of a problematic query we once ran:
Why It’s Bad
- Affects 13+ partitions (202001–202101)
- Rewrites 20K+ parts
- Triggers massive merge backlog
- Keeps mutation queues busy for hours
The Fix: Partition-Scoped Deletes
Instead of deleting across all partitions, we delete one partition at a time:
We repeat this for each month.
This drastically reduces the number of parts under mutation at once.
One partition at a time keeps merges running, avoids massive lock windows, and isolates failure risk.
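The exact statements we ran are not reproduced here. As a rough sketch of the pattern, the Python helper below builds one ALTER TABLE ... DELETE IN PARTITION statement per month. It assumes the table is partitioned by toYYYYMM, so partition values look like 202001. The table name `events`, the filter `org_id = 42`, and both function names are hypothetical.

```python
def month_partitions(first, last):
    """Yield YYYYMM partition values from first to last, inclusive."""
    year, month = divmod(first, 100)
    while year * 100 + month <= last:
        yield year * 100 + month
        month += 1
        if month > 12:
            year, month = year + 1, 1


def partition_scoped_deletes(table, where, first, last):
    """Build one ALTER TABLE ... DELETE IN PARTITION statement per month."""
    return [
        f"ALTER TABLE {table} DELETE IN PARTITION {p} WHERE {where}"
        for p in month_partitions(first, last)
    ]


# Hypothetical table and filter; 202001 to 202101 is the range from the
# problematic query described earlier.
statements = partition_scoped_deletes("events", "org_id = 42", 202001, 202101)
for sql in statements:
    print(sql)
```

This yields 13 statements, one per partition. Submitting them one at a time and waiting for each mutation to finish before sending the next is an assumption of this sketch, not something the statements enforce.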
Real Result: 3× Faster Execution
-
Before: Single wide mutation → 90 mins, heavy cluster load
-
After: Monthly partition deletes → 30 mins total, smooth merges
-
CPU and I/O dropped by ~60%
-
No Keeper overload or replication lag
How Mutations Replicate Across Replicas
ClickHouse Keeper acts as the control tower for all replicas:
- The mutation is recorded as an instruction in Keeper.
- Each replica picks it up and executes it independently.
- A replica can fetch already-mutated parts from a replica that has finished, instead of rewriting them itself.
- Lagging replicas replay missed mutation IDs later.
Monitor replication status
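The original query is not shown here. One way to check replication status is to read the system.replicas table on each replica. The sketch below holds such a query as a Python string and adds a small helper that flags lagging tables. The 300-second threshold, the helper name, and the sample rows are assumptions for illustration.

```python
# Per-table replication state on the node where the query runs.
# absolute_delay is the replica's lag in seconds.
REPLICA_STATUS_SQL = """
SELECT database, table, is_readonly, queue_size,
       part_mutations_in_queue, absolute_delay
FROM system.replicas
ORDER BY absolute_delay DESC
"""


def lagging_replicas(rows, max_delay_seconds=300):
    """Return (database, table) pairs whose replica delay exceeds the limit."""
    return [
        (row["database"], row["table"])
        for row in rows
        if row["absolute_delay"] > max_delay_seconds
    ]


# Made-up result rows, shaped like the query output above.
sample_rows = [
    {"database": "analytics", "table": "events", "absolute_delay": 0},
    {"database": "analytics", "table": "clicks", "absolute_delay": 900},
]
print(lagging_replicas(sample_rows))  # [('analytics', 'clicks')]
```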
Root Cause Summary – Why Mutations Slow the Cluster
- Delayed Execution: Mutations are queued per table and per replica, so when they run depends on background load.
- Frozen Parts: Merges on affected parts pause, so the part backlog grows.
- Resource Contention: Each replica rewrites full parts, which is heavy on CPU and disk I/O.
Together, they cause long mutation queues, merge starvation, and replication lag.
Real Queries for Monitoring
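The production queries are not reproduced here. As a stand-in, the sketch below keeps two monitoring queries as Python strings, one over system.mutations and one over system.merges. A small helper picks out mutations that report a failure. The helper name and the sample rows are hypothetical.

```python
# Unfinished mutations, oldest first.
PENDING_MUTATIONS_SQL = """
SELECT database, table, mutation_id, command, create_time,
       parts_to_do, latest_fail_reason
FROM system.mutations
WHERE is_done = 0
ORDER BY create_time
"""

# Running merges; is_mutation = 1 marks part rewrites caused by mutations.
ACTIVE_MERGES_SQL = """
SELECT database, table, elapsed, progress, num_parts, is_mutation
FROM system.merges
ORDER BY elapsed DESC
"""


def failing_mutations(rows):
    """Return IDs of pending mutations that report a failure reason."""
    return [row["mutation_id"] for row in rows if row["latest_fail_reason"]]


# Made-up rows, shaped like the output of PENDING_MUTATIONS_SQL.
sample_rows = [
    {"mutation_id": "0000000001", "latest_fail_reason": ""},
    {"mutation_id": "0000000002", "latest_fail_reason": "Memory limit exceeded"},
]
print(failing_mutations(sample_rows))  # ['0000000002']
```

A mutation with a growing age, a parts_to_do count that does not shrink, and a non-empty latest_fail_reason is usually the one to investigate first.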
Best Practices for Safe Deletes
Principle | Recommendation |
---|---|
Scope by Partition | Always delete within a partition (DELETE IN PARTITION) |
Batch Deletes | Delete by org IDs in groups (e.g., 50–200 at a time) |
Off-Peak Schedule | Run during low traffic windows |
Monitor Progress | Watch system.mutations and system.merges |
Optimize Afterwards | OPTIMIZE TABLE ... PARTITION ... FINAL if needed |
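The "Batch Deletes" row can be combined with partition scoping. The sketch below splits a list of org IDs into fixed-size batches and builds one partition-scoped delete per batch. The table name `events`, the column name `org_id`, the batch size of 100, and the function names are assumptions for illustration.

```python
def chunked(items, size):
    """Split a list into consecutive batches of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [items[i:i + size] for i in range(0, len(items), size)]


def batched_deletes(table, partition, org_ids, batch_size=100):
    """Build one partition-scoped delete per batch of org IDs."""
    return [
        f"ALTER TABLE {table} DELETE IN PARTITION {partition} "
        f"WHERE org_id IN ({', '.join(str(i) for i in batch)})"
        for batch in chunked(org_ids, batch_size)
    ]


# 250 hypothetical org IDs become three statements of 100, 100 and 50 IDs.
statements = batched_deletes("events", 202001, list(range(1, 251)))
print(len(statements))  # 3
```

Each statement creates its own mutation, so a larger batch size means fewer mutations that each match more rows. The 50–200 range from the table is the starting point to tune from.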
Key Takeaways
- ClickHouse is append-optimized, not delete-friendly.
- Mutations rewrite entire parts → slow and costly.
- Cross-partition deletes cripple merges and replication.
- Partition-scoped deletes = 3× faster, safer, cleaner.
- Always monitor system.mutations and plan deletes during off-peak hours.