Tuesday, May 8, 2018

How life works: cells, cell division, DNA, genes, chromosomes, and proteins.

 Nothing related to databases, but it is something worth knowing about life: 

This article explores how life works in terms of basic biological building blocks. I'm going to discuss cells, cell division, DNA, genes, chromosomes, and proteins. We start from the cell, the smallest structural and functional unit of an organism. The human body, for example, contains trillions of cells; in other words, the cell is the structural, functional, and biological unit of all organisms. Cells provide structure for the body, take in nutrients from food, convert those nutrients into energy, and carry out specialized functions. Physically, a cell always has a boundary membrane enclosing biomolecules such as nucleic acids, proteins, and polysaccharides. Based on cell structure, biologists divide organisms into two groups:
        Prokaryotes: prokaryotic cells are surrounded by a membrane and a cell wall; a circular strand of DNA contains their genes, and they do not have a nucleus.
        Eukaryotes: in eukaryotic cells, the DNA is contained within a nuclear envelope and separated from the cytoplasm. These cells also boast their own personal "power plants", called mitochondria. These tiny organelles not only produce chemical energy but also hold the key to understanding the evolution of the eukaryotic cell.

Cell division is the process by which a parent cell divides into two or more daughter cells; it occurs as part of the larger cell cycle.

There are two distinct types of cell division in eukaryotes,

        Vegetative division: the daughter cells are genetically identical to the parent cell (mitosis)

        Reductive cell division: the number of chromosomes in the daughter cells is reduced by half, to produce haploid gametes (meiosis)

By definition, meiosis is "a type of cellular reproduction in which the number of chromosomes is reduced by half through the separation of homologous chromosomes, producing two haploid cells," and mitosis is "a process of asexual reproduction in which the cell divides in two, producing a replica with an equal number of chromosomes in each resulting diploid cell." Prokaryotes are much simpler in their organization than eukaryotes. There are a great many more organelles in eukaryotes, and also more chromosomes. The usual method of prokaryotic cell division is termed binary fission. The prokaryotic chromosome is a single DNA molecule that first replicates, then attaches each copy to a different part of the cell membrane. When the cell begins to pull apart, the replicated and original chromosomes are separated. Following cell splitting (cytokinesis), there are two cells of identical genetic composition (except for the rare chance of a spontaneous mutation).

DNA (Deoxyribonucleic acid)

DNA contains the biological instructions that make each species unique. DNA, along with the instructions it contains, is passed from adult organisms to their offspring during reproduction. In other words, it acts like a blueprint for building the different parts of the cell. Most DNA is found inside a special area of the cell called the nucleus. Apart from the DNA located in the nucleus, humans and other complex organisms also have a small amount of DNA in structures called mitochondria. Nucleotides are the chemical building blocks of DNA, and each contains three parts: a phosphate group, a sugar group, and one of four types of nitrogen bases. The four nitrogen bases found in nucleotides are adenine (A), thymine (T), guanine (G), and cytosine (C). The order, or sequence, of these bases determines what biological instructions are contained in a strand of DNA. For example, the sequence ATCGTT might instruct for blue eyes, while ATCGCT might instruct for brown. Nucleotides are arranged in two long strands that form a spiral called a double helix. The structure of the double helix resembles a twisted ladder. An important property of DNA is that it can replicate itself: each strand of the double helix can serve as a pattern for duplicating the sequence of bases. This is critical when cells divide, because each new cell needs an exact copy of the DNA present in the old cell.


Genes

In terms of biology, "a gene is the basic physical and functional unit of heredity. Genes, which are made up of DNA, act as instructions to make molecules called proteins." In humans, genes vary in size from a few hundred DNA bases to more than 2 million bases. In general, genes carry the information that determines traits, which are features passed from parents to offspring. Genes are found on tiny spaghetti-like structures called chromosomes. DNA also contains large sequences that do not code for any protein, and their function is not known. The coding region of a gene encodes the instructions that allow a cell to produce a specific protein or enzyme. Humans were estimated to have between 50,000 and 100,000 genes, each made up of hundreds or thousands of chemical bases.

Chromosomes

Chromosomes are where DNA is stored. Since the cell is very small and organisms have many DNA molecules per cell, each DNA molecule must be tightly packaged; this packaged form of DNA is called a chromosome. Chromosomes come in matching sets of two (pairs), and there are hundreds or thousands of genes on just one chromosome. Human chromosomes can be divided into two types: autosomes and sex chromosomes. Certain genetic traits are linked to a person's sex and are passed on through the sex chromosomes; the autosomes contain the rest of the genetic hereditary information. All act in the same way during cell division. Human cells have 23 pairs of chromosomes (22 pairs of autosomes and one pair of sex chromosomes), giving a total of 46 per cell. Half of these chromosomes come from one parent and half come from the other parent. In addition to these, human cells have many hundreds of copies of the mitochondrial genome. Sequencing of the human genome has provided a great deal of information about each of the chromosomes.


Proteins

Proteins are large, complex molecules that play critical roles in the body. Proteins are made of hundreds or thousands of smaller units called amino acids, which are attached to one another in long chains. To make a protein, the gene is copied, base by base, from DNA into messenger RNA (ribonucleic acid), or mRNA. The mRNA moves out of the nucleus to cell organelles in the cytoplasm called ribosomes, which assemble the polypeptide (chain of amino acids) that finally folds and configures itself to form the protein.



All of the above components play complex, interconnected roles in making life work, together with many other things.

Thursday, May 3, 2018

Best Practices for Running Cassandra on AWS



This post explains how to run Cassandra effectively on AWS, or in other words, the best approaches to running Cassandra on AWS. As you know, AWS is one of the largest cloud environments available in the world. I recently gave a presentation on this topic, and I would like to outline it here for your reference.


What is Cassandra?
Apache Cassandra is a massively scalable open source NoSQL database
It delivers,
  • Continuous availability,
  • Linear scalability,
  • Operational simplicity across many commodity servers with no single point of failure,
  • A masterless peer-to-peer distributed system where data is distributed among all nodes in the cluster


What is Cassandra Node?

  • A physical server or an EC2 instance
  • Each machine has one installation of Cassandra.
  • A node in a cluster is just a fully functional machine connected to the other nodes in the cluster over a high-speed internal network.
  • All nodes work together so that even if one of them fails due to an unexpected error, the cluster as a whole can still provide service.
  • All nodes in a Cassandra cluster are the same.
  • AWS provides an excellent platform for running a Cassandra cluster.

How does Cassandra support High Availability?
  • Cassandra is designed to be fault-tolerant and highly available during multiple node failures.
  • Amazon Regions and Availability Zones can be used for deployment 
  • Resiliency is ensured through infrastructure automation.
  • Quick replacement of failing nodes
  • In case of a region-wide failure, if we deploy with the multi-region option, traffic can be directed to the other active Region

Deploying Cassandra on AWS
  • Cassandra deployment on Amazon EC2 can be automated.
  • Amazon CloudFormation allows you to describe and provision all your infrastructure resources in AWS. There is no additional charge; you pay only for the AWS resources you create 
  • Cassandra common design patterns on AWS
    • Single AWS Region, 3 Availability Zones
    • Active-Active, Multi-Region
    • Active-Standby, Multi-Region
Single Region, 3 Availability Zones
  • Deploy the Cassandra cluster in one AWS Region and three Availability Zones.
  • There is only one ring in the cluster. 
  • By using EC2 instances in three zones, you ensure that the replicas are distributed uniformly in all zones.
Active-Active, Multi-Region

  • Two rings in two Regions
  • The VPCs in the two Regions are peered 
  • The two Regions should be identical in nature, with the same number of nodes, instance types, and storage configuration.
  • This pattern is most suitable when the applications using the Cassandra cluster are deployed in more than one Region.
  • Read/write traffic can be localized to the closest Region for the user for lower latency and higher performance.
Active-Standby, Multi-region
  • Two rings in two Regions
  • The VPCs in the two Regions are peered 
  • The two Regions should be identical in nature, with the same number of nodes, instance types, and storage configuration.
  • The second Region does not receive traffic from the applications. 
  • It only functions as a secondary location for disaster recovery reasons. If the primary Region is not available, the second Region receives traffic.

Planning High-Performance Storage Options

  • Cassandra writes are sequential, which suits write-heavy workloads, but read-heavy workloads require random access
  • If your working set (data + index) does not fit into memory, you will need more I/O requests to disk
  • It is very important to select the correct storage option
  • Magnetic volume types (HDD) are not recommended, for performance reasons
  • AWS provides two main options,
  • Amazon EC2 instance store 
  • Amazon EBS
Amazon EC2 Instance Store

  • Disk storage located on disks physically attached to the host computer – called the "instance store"
  • If you are using more than a single volume, you can stripe the instance store volumes (RAID 0)
  • Enhanced I/O throughput
  • But if the instance is stopped, fails, or is terminated,
  • you will lose all your data
  • Therefore, we need to replicate data across multiple nodes across Availability Zones, or even across Regions, depending on the requirements
Amazon EBS - Amazon Elastic Block Store

  • It provides persistent block storage 
  • Each Amazon EBS volume is automatically replicated within its Availability Zone to protect against component failure (high availability, durability)
  • By using Amazon CloudWatch with AWS Lambda you can automate volume changes 
  • General Purpose SSD  (gp2)
    • gp2 is designed to offer single-digit millisecond latencies
    • Deliver a consistent baseline performance of 3 IOPS/GB (minimum 100 IOPS) to a maximum of 10,000 IOPS
    • provide up to 160 MB/s of throughput per volume. 
    • Volumes smaller than 1 TB can burst up to 3,000 IOPS
  • Provisioned IOPS SSD (io1)
    • The highest-performance EBS storage option, designed for critical, I/O-intensive databases.
    • 50 IOPS/GB to maximum 32000 IOPS
    • 500 MB/s of throughput per volume.
    • Single-digit millisecond latencies, designed to deliver the provisioned performance 99.9% of the time
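To make the gp2 numbers concrete, the baseline IOPS calculation can be sketched as a small helper (my own function for illustration, not an AWS API, based on the 3 IOPS/GB, 100 IOPS floor, and 10,000 IOPS ceiling figures above):

```javascript
// Sketch: baseline IOPS for a gp2 volume of a given size, using the
// figures quoted above (3 IOPS per GB, minimum 100, maximum 10,000).
function gp2BaselineIops(sizeGiB) {
  return Math.min(Math.max(sizeGiB * 3, 100), 10000);
}

console.log(gp2BaselineIops(20));   // 100 (small volumes get the floor)
console.log(gp2BaselineIops(1000)); // 3000
console.log(gp2BaselineIops(5000)); // 10000 (capped)
```

Note that volumes under 1 TB can additionally burst to 3,000 IOPS, which this simple baseline formula does not model.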
Why EBS Optimized Instance?

  • Usually, network traffic and EBS traffic share bandwidth on an Amazon EC2 instance
  • This means consistent EBS performance depends on the amount of non-EBS network traffic, so we can't guarantee the bandwidth between the instance and the EBS volume
  • The solution: EBS-optimized instances
  • They have additional, dedicated capacity for EBS I/O
  • This optimization minimizes contention between EBS I/O and other traffic from the EC2 instance
  • They have dedicated bandwidth to Amazon EBS, depending on the instance type
  • From a minimum of 425 Mbps up to 14,000 Mbps
Instance Types that support EBS Optimization
  • Current-generation instance types are EBS-optimized by default
  • C5, C4, M5, M4, P3, P2, G3, and D2 instances
  • No need to enable EBS optimization, and disabling it has no effect
  • We can enable EBS optimization on instances that are not EBS-optimized by default
  • It can be enabled when launching the instance, or while the instance is running
  • We pay an additional hourly fee on instances that are not EBS-optimized by default
Available Instance Types for Cassandra

  • Compute Optimized – C4, C5
    • High level of network performance
    • EBS-optimized by default for increased storage performance at no additional cost
  • Storage Optimized – I3
    • SSD-backed instance storage optimized for low latency
    • Very high random I/O performance
    • High sequential read throughput, providing high IOPS at low cost
  • Memory Optimized – x1e, R4
    • Optimized for memory-intensive applications 
    • SSD storage and EBS-optimized by default, at no additional cost
    • The lowest price per GiB of RAM
Planning Instance Types Based on Storage Needs

  • Let's assume we need 600 TB of storage, plus 10% overhead for disk formatting 
  • 95% writes and 5% reads (write-heavy)
  • The most common instance type for Cassandra on AWS is "i3". Why?
    • Designed for I/O-intensive workloads
    • Comes with SSD storage (instance store)
    • Available in On-Demand, Reserved, and Spot forms in 15 Regions
  • Let’s pick i3.2xlarge instance
    • (600 x 1024)/(1900 x 0.9)  = 360 Instances 
    • 360 x $0.624 x 720 = $161,740.8 per month
      • Not including data transferring charges
      • Assumed commit log also stored on the same drive
  • This is quite costly 
    • We could use instances with more local storage, but that might not help much with the cost
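The sizing arithmetic above can be sketched in a few lines of JavaScript (the 1,900 GB drive size, 10% formatting overhead, and $0.624/hr on-demand rate are the figures quoted in the text):

```javascript
// Sketch of the i3.2xlarge sizing arithmetic above.
const dataGB = 600 * 1024;                 // 600 TB requirement, in GB
const usablePerInstanceGB = 1900 * 0.9;    // 1,900 GB NVMe, 10% reserved for formatting
const instances = Math.ceil(dataGB / usablePerInstanceGB);
const monthlyCost = instances * 0.624 * 720; // $0.624/hr, 720 hours per month

console.log(instances);              // 360
console.log(monthlyCost.toFixed(1)); // 161740.8
```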
Decoupling with EBS (instance and storage separately)

  • Cassandra recommends separate drives for data and the commit log (better performance) 
  • We will allocate on each node:
    • 500 GB for the commit log
    • 4 TB for data
  • Compute Optimized instance types are popular for running Cassandra on EBS 
    • The C4 instance type does not have any local storage
    • c4.4xlarge is a good fit for production workloads 
    • 30 GB memory, 16 vCPUs, 2,000 Mbps dedicated EBS throughput
  • Let's see the cost estimation based on the numbers we discussed 

Number of instances = 600 TB / 4 TB per node => 150 instances

Storage requirement = data storage + commit log storage
                    = 600 TB + 75 TB
                    = 675 TB

Calculating EC2 cost per month - US East (N. Virginia):
      150 c4.4xlarge = 150 x $0.796 x 720 = $85,968

Calculating EBS volume cost per month:
      675 TB EBS gp2 = 675 x 1024 x $0.1 = $69,120

Total cost = $85,968 + $69,120 = $155,088

We can consider Reserved Instances for further optimization
  • Let's say we are happy with the c4.4xlarge plus EBS configuration and cluster performance after a couple of months
  • We can make reservations and optimize cost
  • For a 3-year plan with partial upfront:
    • Per-month instance cost = 150 x $0.33 x 720 => $35,640
    • Per-month EBS cost = 675 x 1024 x $0.1 => $69,120
    • Total cost per month = $35,640 + $69,120 => $104,760
  • For the i3.2xlarge instance, on a 3-year partial-upfront plan:
    • Per-month cost = 360 x $0.28 x 720 => $72,576
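The arithmetic above, comparing the three monthly estimates, can be sketched in JavaScript (the hourly rates are the on-demand and 3-year partial-upfront figures quoted in the text; Math.round just tidies floating-point noise):

```javascript
// Sketch: monthly cost of the three configurations discussed above.
const HOURS = 720;                      // hours per month
const ebsCost = 675 * 1024 * 0.1;       // 675 TB of gp2 at $0.10 per GB-month

const onDemandC4 = Math.round(150 * 0.796 * HOURS + ebsCost);
const reservedC4 = Math.round(150 * 0.33 * HOURS + ebsCost);  // 3-yr partial upfront
const reservedI3 = Math.round(360 * 0.28 * HOURS);            // local NVMe, no EBS charge

console.log(onDemandC4); // 155088
console.log(reservedC4); // 104760
console.log(reservedI3); // 72576
```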
Planning Elastic Network Interfaces (ENI)


  • A virtual network interface that can be attached to/detached from an instance in a VPC, within a single Availability Zone.
  • It can be attached to one instance, detached from that instance, and attached to another instance in the same Availability Zone
  • When you move an ENI from one instance to another, network traffic is redirected to the new instance
  • This really helps with seed node configuration (which is hard-coded)
  • On failure of a seed node, you can automate things so that the new seed node takes over the ENI's IP address programmatically

Monitoring by using Amazon CloudWatch

  • Amazon CloudWatch can be used as a resource monitoring service 
  • It collects and tracks metrics,
    • Collect and monitor log files
    • Set alarms
  • We can write custom metrics and submit them to Amazon CloudWatch
  • We can configure alarms to notify us when metrics exceed defined thresholds
Maintenance
  • In terms of Cassandra cluster health,
    • Scaling 
      • Cassandra is horizontally scaled by adding more instances to the ring.
    • Upgrades
      • A rolling upgrade pattern is used for Cassandra upgrades, operating system patching, and instance type changes
    • Backup & Restore 
      • Cassandra supports snapshots and incremental backup
      • When using the instance store, a file-based backup tool works best
      • These backup files are copied to new instances to restore
      • We recommend using S3 to durably store backup files for long-term storage.
Security

  • Ensure that the data is encrypted at rest and in transit. 
  • The second step is to restrict access by unauthorized users.
  • Encryption at rest
    • Encryption at rest can be achieved by using EBS volumes with encryption enabled.
  • Encryption in transit
    • Cassandra uses Transport Layer Security (TLS) for client and internode communications.
References 


https://d1.awsstatic.com/whitepapers/Cassandra_on_AWS.pdf
https://aws.amazon.com/ec2/instance-types/
https://aws.amazon.com/ebs/details/
https://aws.amazon.com/ec2/pricing/reserved-instances/pricing/
https://aws.amazon.com/ec2/pricing/on-demand/
https://aws.amazon.com/ec2/instance-types/x1e/
https://aws.amazon.com/blogs/aws/now-available-new-c4-instances/
https://aws.amazon.com/blogs/aws/now-available-i3-instances-for-demanding-io-intensive-applications/



Friday, July 18, 2014

db.currentOp() - MongoDB in-progress operations for database instance


db.currentOp() returns a document that reports on in-progress operations for the database instance.


For example:

db.currentOp(true)
This will return a more descriptive output, including idle connections and system operations.

Note: db.currentOp() is available only to administrative users.




To list the current clients:

db.currentOp().inprog.forEach(
   function(d){
     if(d.client)
        printjson(d.client)
     })

 In a sharded environment (via mongos), the field is client_s:

db.currentOp().inprog.forEach(
   function(d){
     if(d.client_s)
        printjson(d.client_s)
     })

Return all active write operations 

db.currentOp().inprog.forEach(
   function(d){
     if(d.active && d.lockType == "write")
        printjson(d)
     })



Return all active read operations

db.currentOp().inprog.forEach(
   function(d){
     if(d.active && d.lockType == "read")
        printjson(d)
     })




Wednesday, July 16, 2014

MongoDB Profiler



The database profiler examines activity in existing database(s) and collects statistics about database operations.

Link: http://docs.mongodb.org/manual/tutorial/manage-the-database-profiler/

Profiling levels:

You can retrieve details for update, remove, and query operations; insert operations are captured with fewer details.

The following profiling levels are available:
Level Setting
0       Off. No profiling
1       Only includes "slow" operations (default threshold: 100 ms)
2       Includes all operations


This link describes the profiler output fields:

http://docs.mongodb.org/manual/reference/database-profiler/

Here is the deal

How can we get the latest (or all) operation details from the system.profile collection?

As an example, I'm using the "test" database and the "sample" collection.

limit(number) will limit your output. If you want all results, remove the ".limit()" call from the command.

Enable the profiler on the test database

1) Check the existing profiler status
db.getProfilingLevel()
      You will get 0, 1 or 2

2) Set profiling for all operations
db.setProfilingLevel(2)
      { "was" : 0, "slowms" : 100, "ok" : 1 }

3) Check your profiling state again
db.getProfilingLevel()

Now I'm inserting data.

use test
db.sample.insert({_id:1 ,"text":"Sample Text"})


Here is the record in the system.profile collection for the latest insert operation:

     db.system.profile.find({"op":"insert"}).sort({ts:-1}).limit(1).pretty()

{
        "op" : "insert",
        "ns" : "test.sample",
        "ninserted" : 1,
        "keyUpdates" : 0,
        "numYield" : 0,
        "lockStats" : {
                "timeLockedMicros" : {
                        "r" : NumberLong(0),
                        "w" : NumberLong(1283230)
                },
                "timeAcquiringMicros" : {
                        "r" : NumberLong(0),
                        "w" : NumberLong(94889)
                }
        },
        "millis" : 1291,
        "ts" : ISODate("2014-07-15T05:19:57.958Z"),
        "client" : "127.0.0.1",
        "allUsers" : [ ],
        "user" : ""
}



Now updating

db.sample.update({_id:1},{"Text":"Sample Text New"})

This is what was captured in the system.profile collection:

{
        "op" : "update",
        "ns" : "test.sample",
        "query" : {
                "_id" : 1
        },
        "updateobj" : {
                "Text" : "Sample Text New"
        },
        "idhack" : true,
        "moved" : true,
        "nmoved" : 1,
        "nupdated" : 1,
        "keyUpdates" : 0,
        "numYield" : 0,
        "lockStats" : {
                "timeLockedMicros" : {
                        "r" : NumberLong(0),
                        "w" : NumberLong(28005)
                },
                "timeAcquiringMicros" : {
                        "r" : NumberLong(0),
                        "w" : NumberLong(19)
                }
        },
        "millis" : 28,
        "ts" : ISODate("2014-07-15T05:57:32.157Z"),
        "client" : "127.0.0.1",
        "allUsers" : [ ],
        "user" : ""
}


Remove a document

db.sample.remove({_id:2})

db.system.profile.find({"op":"remove"}).sort({ts:-1}).limit(1).pretty()

{
        "op" : "remove",
        "ns" : "test.sample",
        "query" : {
                "_id" : 2
        },
        "ndeleted" : 1,
        "keyUpdates" : 0,
        "numYield" : 1,
        "lockStats" : {
                "timeLockedMicros" : {
                        "r" : NumberLong(0),
                        "w" : NumberLong(85442)
                },
                "timeAcquiringMicros" : {
                        "r" : NumberLong(0),
                        "w" : NumberLong(43545)
                }
        },
        "millis" : 43,
        "ts" : ISODate("2014-07-15T06:35:24.784Z"),
        "client" : "127.0.0.1",
        "allUsers" : [ ],
        "user" : ""
}


Query the collection

db.sample.find({_id:1})

db.system.profile.find({"op":"query"}).sort({ts:-2}).limit(1).pretty()

{
        "op" : "query",
        "ns" : "test.system.profile",
        "query" : {
                "query" : {
                        "op" : "query"
                },
                "orderby" : {
                        "ts" : -2
                }
        },
        "ntoreturn" : 1,
        "ntoskip" : 0,
        "nscanned" : 114,
        "scanAndOrder" : true,
        "keyUpdates" : 0,
        "numYield" : 0,
        "lockStats" : {
                "timeLockedMicros" : {
                        "r" : NumberLong(931),
                        "w" : NumberLong(0)
                },
                "timeAcquiringMicros" : {
                        "r" : NumberLong(7),
                        "w" : NumberLong(6)
                }
        },
        "nreturned" : 1,
        "responseLength" : 400,
        "millis" : 0,
        "ts" : ISODate("2014-07-15T06:53:58.221Z"),
        "client" : "127.0.0.1",
        "allUsers" : [ ],
        "user" : ""
}
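Profile documents are plain JSON, so their fields (op, ns, millis) can be post-processed with ordinary JavaScript. A minimal sketch against mock documents mirroring the outputs above (not connected to a real mongod):

```javascript
// Sketch: pick the slowest operations out of a set of profile documents.
// The mock documents mimic the fields shown in the outputs above.
const profileDocs = [
  { op: "insert", ns: "test.sample", millis: 1291 },
  { op: "update", ns: "test.sample", millis: 28 },
  { op: "remove", ns: "test.sample", millis: 43 },
  { op: "query",  ns: "test.system.profile", millis: 0 }
];

function slowestOps(docs, n) {
  // Sort a copy by duration, descending, and keep the top n.
  return [...docs].sort((a, b) => b.millis - a.millis).slice(0, n);
}

console.log(slowestOps(profileDocs, 2).map(d => d.op)); // [ 'insert', 'remove' ]
```

Against a live server, the same idea is usually expressed as a find() on system.profile sorted by millis.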









Monday, May 19, 2014

Point in Time Recovery - MongoDB

In a production environment, the database management system should have this feature, since there can be situations where you need to recover your database to a given point in time (so the database looks exactly as it did at that point).

This is a well-defined and valuable process for current database management systems. MS SQL Server, Oracle, DB2, etc. have well-documented steps for point-in-time recovery, and you can find a lot of information for these types of RDBMS.

But with NoSQL technology, far fewer resources are available on this topic, so today I'm going to explain how we can do point-in-time recovery in MongoDB.

Important : 

When you get stuck with a problem in a Mongo database, it is better to start with a new mongod instance, do all the restoring to that instance, and test/validate. If everything looks good, then you can transfer the data to the appropriate place (such as the primary server) and allow replication to propagate the corrected records to the secondaries. 

Problem : 

The backupDB database has one collection, backupColl. At midnight every night, the system is backed up with mongodump. 
Your server continued taking writes for a few hours, until 02:46:39. At that point, someone (not you) ran the command:

 db.backupColl.drop()

Your job is to put your database back into the state it was in immediately before the collection was dropped.


Answer :

Step 1: Restore your latest backup into a new mongod instance. This server is going to be your test server.

mongorestore -h <hostname:port> <backup file path>

Step 2: Take an oplog backup from the existing server (better to take it from the primary, or the member with the largest oplog). The oplog keeps all operations; it is a capped collection, but it is not a backup file.

mongodump -h <hostname:port> -d local -c oplog.rs -o oplogD

Step 3: Move and rename the "oplog.rs.bson" file to "oplog.bson":

mkdir oplogR
mv oplogD/local/oplog.rs.bson oplogR/oplog.bson

Step 4: Then you have to find the exact timestamp of the drop operation. For that, you can convert your backed-up oplog.bson file to a human-readable format using the bsondump command:

bsondump oplog.bson > oplog.txt  
bsondump oplog.bson > oplog.json
Then you can use grep or a simple find to locate the "drop" keyword. Alternatively, you can query your existing database to find it:

db.oplog.rs.find()

Your goal is to find the "ts" field of the entry containing the keyword ("drop"):

"ts" : Timestamp( 1398778745, 1 )


Format this value like this: 1398778745:1
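As a trivial sketch (my own helper for illustration, not a mongo command), the conversion from the two parts of a Timestamp to the format used below is:

```javascript
// Sketch: format the parts of an oplog Timestamp(seconds, increment)
// as the "seconds:increment" string that --oplogLimit expects.
function oplogLimitArg(seconds, increment) {
  return seconds + ":" + increment;
}

console.log(oplogLimitArg(1398778745, 1)); // 1398778745:1
```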

Step 5: Note that the mongorestore command has two options, one called --oplogReplay and the other called --oplogLimit. You will now replay this oplog on the restored stand-alone server, BUT you will stop before the offending drop operation (basically, that modification should not reach the server).

mongorestore -h <hostname:port> --oplogReplay --oplogLimit 1398778745:1 oplogR

This will restore each operation from the oplog.bson file in the oplogR directory, stopping right before the entry with ts value Timestamp( 1398778745, 1 ).

Step 6: Once you have verified it, you can write the restored records to the appropriate place on the real primary (and allow replication to propagate the corrected records to the secondaries).

Wednesday, March 19, 2014

High Availability on MongoDB

This article is entirely a summary of the official MongoDB docs: http://docs.mongodb.org/manual/core/replica-set-high-availability/

Replica Set
  • A replica set provides high availability using automatic failover. Failover allows a secondary member to become primary if the primary is unavailable, and in most situations does not require manual intervention.
  • Replica set members keep the same data set, but they are independent.
  • The primary server is selected by an election among the members. Place a majority of the voting members, and all the members that can become primary, in your main facility.
Replica Set Elections
  • Replica sets use elections to determine which member will become primary. An election occurs after initiating a replica set, and also any time the primary becomes unavailable.
  • The primary is the only member in the set that can accept write operations.
  • Elections are part of the failover process.
  • While an election is in progress, the replica set has no primary and cannot accept writes.

Factors and Conditions that Affect Elections
Heartbeats
  • Replica set members send heartbeats (pings) to each other every two seconds. If a heartbeat does not return within 10 seconds, the other members mark the delinquent member as inaccessible.

Priority Comparisons
  • The priority setting affects elections. Higher priority values make a member more eligible to become primary; lower values make it less eligible.
  • Priority 0 = cannot become primary and does not seek election
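As a sketch of how priorities are expressed (the rs0 name and hostnames are placeholders I made up), a replica set configuration document looks like this:

```javascript
// Hypothetical replica set configuration. Member 2 has priority 0, so it
// can never become primary and never seeks election; higher-priority
// members are more eligible to become primary.
var cfg = {
  _id: "rs0",
  members: [
    { _id: 0, host: "node1.example.com:27017", priority: 2 },
    { _id: 1, host: "node2.example.com:27017", priority: 1 },
    { _id: 2, host: "node3.example.com:27017", priority: 0 }
  ]
};
```

In the mongo shell, such a document is applied with rs.reconfig(cfg).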

Optime
  • The timestamp of the last operation that a member applied from the oplog.

Connections
  • A replica set member cannot become primary unless it can connect to a majority of the members in the replica set.
Election Triggering Events

  • Replica sets hold an election any time there is no primary. Specifically:
  • The initiation of a new replica set.
  • A secondary loses contact with the primary. Secondaries call for elections when they cannot see a primary.
  • A primary steps down:

o   After receiving the replSetStepDown command
o   If one of the current secondaries is eligible for election and has a higher priority
o   If primary cannot contact a majority of the members of the replica set.

Note: When a primary steps down, it closes all open client connections so that clients don't attempt to write data to a secondary. This helps clients maintain an accurate view of the replica set and helps prevent rollbacks.

Participation in Elections

  • By default, all members have a priority of 1 and an equal chance of becoming primary. In the default configuration, all members can also trigger an election.
  • Only members in the following states can vote: PRIMARY, SECONDARY, RECOVERING, ARBITER, and ROLLBACK.
Non-Voting Members
  • Non-voting members hold copies of the replica set’s data and can accept read operations from client applications
  • Non-voting members do not vote in elections, but can “veto” an election and become primary.







Friday, December 13, 2013

It is about MongoDB indexes.



No need to re-write it here. This article is simple to read and understand.

Link : http://docs.mongodb.org/manual/core/indexes-introduction/

Thank you very much 10gen come up with good articles. seems like MongoDB documentation is getting more easy to read and understand.