Thursday, December 6, 2018

How MongoDB's Journaling Works



MongoDB's journal provides durability in the event of a failure: MongoDB uses write-ahead logging to on-disk journal files, so that writes can be recovered and replayed after a crash.

This is explained nicely in the article below. Credit should go to the original author.
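For reference, journaling behaviour is controlled from the storage section of mongod.conf; a minimal sketch with common defaults (the dbPath is illustrative, and the commit interval shown applies to the WiredTiger storage engine):

    # mongod.conf
    storage:
      dbPath: /data/db
      journal:
        enabled: true            # write-ahead journal on disk (default)
        commitIntervalMs: 100    # how often journal writes are flushed to disk (WiredTiger default)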

Friday, August 31, 2018

Cassandra data retrieving issue - Due to huge tombstones

Recently, one of the development teams complained that their Cassandra cluster wasn't returning data from one of the tables, and soon after the issue was reported the cluster went down.

This incident happened over the weekend, and the on-call Cassandra DBA was informed and asked to check the cluster. Initially, the dev team complained about a performance issue with the Cassandra cluster; in other words, the cluster couldn't handle the current load. (Later they identified that a sudden, huge data load had come in directly from one of the subsystems.)
As a first step, the DBA decided to bring the cluster back by starting the Cassandra service, but he observed the Cassandra log reporting an "unable to handle the load" exception (this is not the exact error message).


As usual, the heap size wasn't enough for the existing servers, so new nodes were added with a 12 GB heap out of 16 GB of physical RAM.

In addition to adding the new nodes, the DBA executed a repair, and the problem was solved temporarily.

Later in the day, the dev team reported that their issue still existed and asked for DBA support. This time we observed WARN messages about tombstones, and the list of warnings was huge. (This error had been occurring from the beginning, but it went unnoticed because the team wanted to get the cluster back into a running state somehow.)
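For reference, these tombstone warnings (and the point at which a read is aborted) are driven by two settings in cassandra.yaml; the values below are the usual defaults, but they can differ between Cassandra versions:

    # cassandra.yaml
    tombstone_warn_threshold: 1000       # a WARN is logged when a single read scans more than this many tombstones
    tombstone_failure_threshold: 100000  # the read is aborted beyond this point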

Given the high number of tombstones, we suspected some unusual behaviour at the application level, so we checked with the dev team about how the application uses the table in question.

Finally, the dev team confirmed that they use this table as a temporary location for particular data and delete the data every 15 minutes via a cron job.

Since the default grace period for tombstones (gc_grace_seconds) is 864,000 seconds (10 days), deleted data is not completely removed from the Cassandra cluster until the grace period is over. Due to the repeated delete operations, these tombstones were slowly piling up, and no repair jobs were running to resolve the data scatter.

With all of this, a select query on this table was unable to return the data, as it had to read through the huge list of tombstones to find the live (actual) data, and the heap memory wasn't enough to hold all of the tombstone values.
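One quick way to confirm how bad the tombstone situation is for a single table is nodetool's per-table statistics (called cfstats on older Cassandra versions; the keyspace and table names here match the ones used below):

    nodetool tablestats xxxxxx_prd.gradebeforeitem
    # check "Average tombstones per slice" and "Maximum tombstones per slice" in the output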

As a solution, we changed gc_grace_seconds to 1200 (20 minutes), upgraded the SSTables, and then compacted the table manually on each node.

Connect to the instance and change the gc_grace_seconds value on the gradebeforeitem table:
use xxxxxx_prd;
alter table gradebeforeitem with gc_grace_seconds = 1200;


Each command was executed on each node individually (one after another) to reduce the impact.
nohup nodetool upgradesstables -a xxxxxx_prd   gradebeforeitem &
nohup nodetool compact xxxxxx_prd gradebeforeitem &

Once we received confirmation from the dev team that the issue was resolved, we reverted the gc_grace_seconds value to the default (864000) and set up a weekly repair job on each node. We scheduled the jobs 24 hours apart to avoid clashes between the repair jobs.
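For completeness, reverting the grace period and scheduling the weekly repair could look roughly like the following (the cron timing, log path and primary-range option are illustrative assumptions; shift the schedule by one day per node so the runs stay 24 hours apart):

    alter table gradebeforeitem with gc_grace_seconds = 864000;

    # crontab entry on node 1: primary-range repair of the keyspace every Sunday at 02:00
    0 2 * * 0 /usr/bin/nodetool repair -pr xxxxxx_prd >> /var/log/cassandra/repair.log 2>&1
    # node 2 runs the same entry on Monday (0 2 * * 1), node 3 on Tuesday, and so on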

Sunday, May 20, 2018

Upcoming Features in MongoDB 4.0

As per the MongoDB documentation, MongoDB will introduce "Multi-Document Transactions": MongoDB 4.0 will add multi-document transactions for replica sets.

You can find an example here:
https://docs.mongodb.com/manual/upcoming/
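As a rough illustration of what the new API is expected to look like in the mongo shell (the database and collection names here are made up, and details may change before the 4.0 release):

    // start a client session and a multi-document transaction (replica set only in 4.0)
    var session = db.getMongo().startSession();
    session.startTransaction();
    var orders = session.getDatabase("shop").orders;          // hypothetical database/collections
    var inventory = session.getDatabase("shop").inventory;
    try {
        orders.insertOne({ item: "abc", qty: 1 });
        inventory.updateOne({ item: "abc" }, { $inc: { qty: -1 } });
        session.commitTransaction();                           // both writes become visible together
    } catch (e) {
        session.abortTransaction();                            // neither write is applied
    }
    session.endSession();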

Wednesday, May 9, 2018

MongoDB Cloud Manager vs. Ops Manager

  • Enterprise Advanced comes with both Ops Manager and Cloud Manager, so it is up to you which one you would like to use.
  • Cloud Manager is managed software-as-a-service where you only have to install relevant agents (monitoring, automation, or backup) in your infrastructure. Ops Manager is on-premise software that is part of an Enterprise subscription: you have to install and manage all services on your own infrastructure.
  • They are similar in terms of main feature areas (monitoring, automation, backup) and user interface but not identical. Since Cloud Manager is a managed service it is updated more regularly than Ops Manager releases and currently has some additional features such as integration with cloud provisioning instances via providers like AWS or Azure.
  • Please note that the agent versions also differ between Cloud and Ops Manager, so you should ensure you are running the correct agents and referring to the relevant documentation.

MongoDB Enterprise Licence Pricing

Recently, I got a chance to have a call with one of the sales executives and found the pricing information below for a MongoDB Enterprise node.

Please note that this cost is only for the MongoDB Enterprise binaries; once you purchase the binaries you get many features along with them.

The cost applies only to data nodes. For example, you don't need to pay for config servers or mongos servers in a MongoDB sharded cluster.

As of April 2018:


  • Production node: $12,990 per annum
  • UAT/QA node: $6,495 per annum (50% of the production cost)
  • Development node: free


MongoDB Enterprise comes with a 30-day trial. No features are cut down or stopped after the trial period, but you are not allowed to use Enterprise in a production environment for more than 30 days without purchasing a license.

Cost for a 5-node production replica set running for a year:

5 x $12,990 = $64,950

Since MongoDB doesn't share this pricing information on their website, they could change the pricing from time to time, but you can at least use this to estimate before reaching out to them.




Tuesday, May 8, 2018

About the MongoDB lock file (mongod.lock)

The mongod.lock file is a simple file that holds the process id of the running mongod instance, and it performs some of the most important tasks for MongoDB (a quick check of the file is sketched after the list below).

Importance of the mongod.lock file

1. To detect an unexpected/unclean shutdown

2. To ensure the dbpath is accessed by only ONE mongod process at a time
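A quick way to see the lock file in action (the path assumes the default dbpath; adjust for your setup):

    # while mongod is running, the lock file holds its process id
    cat /data/db/mongod.lock
    # compare against the running process; after a clean shutdown the file is left empty
    ps -ef | grep [m]ongod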


If you have seen an unexpected shutdown or a hung MongoDB server, you have to repair MongoDB before it will start again.

Here are the basic steps:

1. Look at the MongoDB log file and check the last lines of the log; you will see some clue about why mongod went down
    tail -n 20 /var/log/mongodb/mongodb.log
    You may see the error message below:
    "Detected unclean shutdown - mongod.lock is not empty."

2. Create a backup copy of the data files in the --dbpath.
    mongodump --dbpath /data/mongodb/ -o /var

3. Start mongod with the --repair option
    mongod --dbpath /data/db --repair

After the repair, the dbpath should contain the repaired data files and an empty mongod.lock file.