Monday, May 19, 2014

Point in Time Recovery - MongoDB

A production database management system should support point-in-time recovery, since there can be situations where you need to restore a database to the state it was in at a given moment.

This is a well-defined and valuable process in established database management systems. MS SQL Server, Oracle, DB2 and others have documented steps for point-in-time recovery, and you can find a lot of information about it for these RDBMSs.

For NoSQL technology, however, far fewer resources cover this topic, so today I'm going to explain how to do point-in-time recovery in MongoDB.

Important : 

When you run into a problem with your MongoDB databases, it is better to start a new MongoDB instance, do all the restoring on that instance, and test/validate the result there. If everything looks good, you can then transfer the data to the appropriate place, such as the primary server, and allow replication to propagate the corrected records to the secondaries.

Problem : 

The backupDB database has one collection, backupColl. At midnight every night, the system is backed up with a mongodump. 
Your server continued taking writes for a few hours, until 02:46:39. At that point, someone (not you) ran the command:

 db.backupColl.drop()

Your job is to put your database back into the state it was in immediately before the collection was dropped.


Answer :

Step 1 : Restore your latest backup into a new MongoDB instance. This server is going to be your test server.

mongorestore -h <hostname:port> <backup file path>

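Step 2 : Take a backup of the oplog from the existing server (preferably from the primary, or from the member with the largest oplog). The oplog records all write operations, but it is a capped collection, so old entries are eventually overwritten: it must still reach back to the time of the last backup, and it is not a BACKUP in itself.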

mongodump -h <hostname:port> -d local -c oplog.rs -o oplogD

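Step 3 : Move and rename this "oplog.rs.bson" file to "oplog.bson", because mongorestore --oplogReplay looks for a file called oplog.bson at the top level of the directory it is given.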

mkdir oplogR
mv oplogD/local/oplog.rs.bson oplogR/oplog.bson

Step 4: Next you have to find the exact timestamp at which the drop operation happened. To do that, convert your oplog.bson backup file to a human-readable format with the bsondump command.

bsondump oplogR/oplog.bson > oplog.txt
bsondump oplogR/oplog.bson > oplog.json
Then you can use grep or a simple text search to find the "drop" keyword. Alternatively, you can query the oplog on your existing database:

use local
db.oplog.rs.find({ "o.drop" : "backupColl" })

Your goal is to find the "ts" field of the entry that contains the keyword ("drop"):

"ts" : Timestamp( 1398778745, 1 )


Write this value in the form <seconds>:<ordinal>, like this : 1398778745:1
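
The conversion above is easy to get wrong by hand, so here is a minimal sketch of it in shell. The function name to_oplog_limit is hypothetical (it is not part of the MongoDB tools), and the sketch assumes the older bsondump rendering Timestamp( seconds, ordinal ) shown above; other bsondump versions may print the timestamp differently, in which case the pattern has to be adapted. The sample line is a simplified, made-up oplog entry.

```shell
#!/bin/sh
# Hypothetical helper: extract "Timestamp( <seconds>, <ordinal> )" from a
# line of text and print it as "<seconds>:<ordinal>" for --oplogLimit.
# Prints nothing if the line contains no such timestamp.
to_oplog_limit() {
    printf '%s\n' "$1" |
        sed -n 's/.*Timestamp( *\([0-9][0-9]*\), *\([0-9][0-9]*\) *).*/\1:\2/p'
}

# Simplified sample of a drop entry (assumed format, not real output).
sample='{ "ts" : Timestamp( 1398778745, 1 ), "op" : "c", "ns" : "backupDB.$cmd", "o" : { "drop" : "backupColl" } }'

to_oplog_limit "$sample"   # prints 1398778745:1
```

Against a real dump, the argument could come from grep, for example to_oplog_limit "$(grep '"drop"' oplog.json)", assuming exactly one entry matches.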

Step 5: Note that the mongorestore command has two relevant options, one called --oplogReplay and the other called --oplogLimit. You will now replay this oplog on the restored stand-alone server, BUT you will stop just before the offending update / delete operation. (The production server itself is not modified at this stage.)

mongorestore -h <hostname:port> --oplogReplay --oplogLimit 1398778745:1 oplogR

This will replay each operation from the oplog.bson file in the oplogR directory, stopping right before the entry with the ts value Timestamp( 1398778745, 1 ).
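
Steps 1 to 3 and step 5 can be collected into a small script. The following is only a sketch under stated assumptions: the host names, directory names and the functions run, prepare_oplog and replay_oplog are placeholders invented for illustration, and by default the script only prints the commands (DRY_RUN=1) so that you can review them before running anything. Step 4, finding the timestamp, still has to be done in between.

```shell
#!/bin/sh
# All values below are placeholders (assumptions); override them as needed.
SRC_HOST="${SRC_HOST:-localhost:27017}"    # existing server to dump the oplog from
TEST_HOST="${TEST_HOST:-localhost:27018}"  # new stand-alone test instance
BACKUP_DIR="${BACKUP_DIR:-dump}"           # directory written by the midnight mongodump
DRY_RUN="${DRY_RUN:-1}"                    # 1 = print commands only, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Steps 1-3: restore the backup, dump the oplog, rename it to oplog.bson.
prepare_oplog() {
    run mongorestore -h "$TEST_HOST" "$BACKUP_DIR" &&
    run mongodump -h "$SRC_HOST" -d local -c oplog.rs -o oplogD &&
    run mkdir -p oplogR &&
    run mv oplogD/local/oplog.rs.bson oplogR/oplog.bson
}

# Step 5: replay the oplog up to (not including) the given <seconds>:<ordinal>.
replay_oplog() {
    limit="${1:?usage: replay_oplog <seconds>:<ordinal>}"
    run mongorestore -h "$TEST_HOST" --oplogReplay --oplogLimit "$limit" oplogR
}
```

A typical session would call prepare_oplog, work out the timestamp as in step 4, and then call replay_oplog 1398778745:1. Setting DRY_RUN=0 makes the functions actually execute the commands.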

Step 6: Once you have verified the restored data, you can write the restored records to the appropriate place on the real primary (and allow replication to propagate the corrected records to the secondaries).