Tuesday, October 14, 2025


Why This Matters

ClickHouse powers Brevo’s large-scale analytical workloads — handling billions of events per day.
While it excels at fast reads and inserts, deleting data in ClickHouse is a completely different story.

In this article, we’ll explore:

  • Why traditional deletes slow down ClickHouse

  • What’s happening under the hood

  • Real examples from our production cluster

  • How we made deletes 3× faster and safer using partition-based strategies

How ClickHouse Stores Data

Think of ClickHouse as a container ship warehouse.

Each new insert adds more cartons (parts) inside the containers (partitions).
ClickHouse is designed to append data, not rewrite it — and that’s where delete challenges begin.

 Inserts Are Fast (Append-Only)

ClickHouse shines with bulk inserts:

  • Append-only: Data is always added, never modified in place.

  • Columnar storage: Each column is stored separately for better compression.

  • Batch-friendly: Ideal for large inserts (thousands of rows per batch).

  • Immediate availability: Data becomes queryable as soon as it’s inserted.

Deletes Are Slow — They Trigger Mutations

When you delete in ClickHouse, it doesn’t remove rows directly.
It creates a mutation job, which rewrites the affected parts in the background.

ALTER TABLE events.email_transactional DELETE WHERE organization_id = 140354 AND email = 'heiditanyh@gmail.com';

Each mutation:

  • Rewrites entire parts (even if only one row changes)

  • Blocks merges on affected partitions

  • Consumes heavy CPU, disk I/O, and temporary space

Partitions, Merges, and Mutations — Visualized

Merge: Background Optimization

Merges continuously combine small parts into bigger ones — improving query performance.

Mutation: Costly Rewrites

Mutations, on the other hand, must open and repack containers — blocking merges until they finish.
Only one mutation per part runs at a time, so queue buildup is inevitable during heavy delete workloads.

The Problem: Cross-Partition Deletes

Here’s an example of a problematic query we once ran:

ALTER TABLE logs.automation_workflow DELETE WHERE organization_id = 2373101 AND event_date >= '2020-01-20' AND event_date <= '2021-01-31';

Why It’s Bad

  • Affects 13+ partitions (202001–202101)

  • Rewrites 20K+ parts

  • Triggers massive merge backlog

  • Keeps mutation queues busy for hours

The Fix: Partition-Scoped Deletes

Instead of deleting across all partitions, we delete one partition at a time:

ALTER TABLE logs.automation_workflow DELETE IN PARTITION '202012' WHERE organization_id = 2373101 AND event_date >= '2020-12-01' AND event_date < '2021-01-01';

We repeat this for each month.
This drastically reduces the number of parts under mutation at once.

One partition at a time keeps merges running, avoids massive lock windows, and isolates failure risk.

Real Result: 3× Faster Execution

  • Before: Single wide mutation → 90 mins, heavy cluster load

  • After: Monthly partition deletes → 30 mins total, smooth merges

  • CPU and I/O dropped by ~60%

  • No Keeper overload or replication lag

How Mutations Replicate Across Replicas

ClickHouse-Keeper acts as the control tower for all replicas:

  1. The mutation is recorded as an instruction in Keeper.

  2. Each replica picks it up and executes independently.

  3. Finished replicas can fetch mutated parts instead of reprocessing.

  4. Lagging replicas replay missed mutation IDs later.

Monitor replication status

SELECT database, table, mutation_id, is_done, parts_to_do FROM system.mutations WHERE database='logs' AND table='automation_workflow_local_replicated';

Root Cause Summary – Why Mutations Slow the Cluster

  1. Delayed Execution: Mutations queued per table & replica; depend on background load.

  2. Frozen Parts: Merges paused → part backlog grows.

  3. Resource Contention: Each replica rewrites full parts; heavy CPU & disk I/O.

Together, they cause long mutation queues, merge starvation, and replication lag.

Real Queries for Monitoring

-- Check active mutations SELECT * FROM system.mutations WHERE is_done = 0; -- Detect merge backlog SELECT table, count() AS merging_parts FROM system.merges GROUP BY table; -- Check replication lag SELECT table, queue_size, last_queue_update FROM system.replication_queue ORDER BY queue_size DESC;

Best Practices for Safe Deletes

PrincipleRecommendation
Scope by PartitionAlways delete within a partition (DELETE IN PARTITION)
Batch DeletesDelete by org IDs in groups (e.g., 50–200 at a time)
Off-Peak ScheduleRun during low traffic windows
Monitor ProgressWatch system.mutations and system.merges
Optimize AfterwardsOPTIMIZE TABLE ... PARTITION ... FINAL if needed

Key Takeaways

  • ClickHouse is append-optimized, not delete-friendly.

  • Mutations rewrite entire parts → slow and costly.

  • Cross-partition deletes cripple merges and replication.

  • Partition-scoped deletes = 3× faster, safer, cleaner.

  • Always monitor system.mutations and plan deletes during off-peak hours.