Database Amigos: Best Practices for Running Cassandra on AWS

This post explains how to run Cassandra on AWS effectively and covers the best approaches for doing so. As you know, AWS is one of the largest cloud environments available in the world. I recently gave a presentation on this topic, and I would like to outline that presentation here for your reference.

What is Cassandra?
Apache Cassandra is a massively scalable, open source NoSQL database.
It delivers:

Continuous availability
Linear scalability
Operational simplicity across many commodity servers, with no single point of failure
A masterless, peer-to-peer distributed architecture in which data is distributed among all nodes in the cluster
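
To make "masterless, peer-to-peer" concrete, here is a minimal Python sketch of a consistent-hashing token ring, the idea Cassandra uses to decide which nodes own a piece of data. It is a simplification under stated assumptions: it hashes with MD5 (as Cassandra's older RandomPartitioner did) rather than the default Murmur3Partitioner, it gives each node a single token instead of virtual nodes, and the node names are hypothetical.

```python
import bisect
import hashlib


def token_for(key: str) -> int:
    """Hash a key onto the ring (MD5 stand-in for Cassandra's partitioner)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")


class TokenRing:
    """No node is special: ownership depends only on token positions."""

    def __init__(self, nodes: list[str]) -> None:
        # Hypothetical simplification: one token per node, derived from its name.
        self._ring = sorted((token_for(name), name) for name in nodes)
        self._tokens = [token for token, _ in self._ring]

    def replicas(self, key: str, rf: int) -> list[str]:
        """First node with token >= the key's token, then the next rf-1 clockwise."""
        start = bisect.bisect_left(self._tokens, token_for(key))
        count = min(rf, len(self._ring))
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(count)]


ring = TokenRing(["node-a", "node-b", "node-c", "node-d"])
print(ring.replicas("user:42", rf=3))
```

Any node can run this computation and reach the same answer, so no master is needed to route requests.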

What is a Cassandra Node?

A physical server or an EC2 instance
Each machine has one installation of Cassandra.
A node in a cluster is a fully functional machine that is connected to the other nodes in the cluster through a high-speed internal network.
All nodes work together so that even if one of them fails due to an unexpected error, the cluster as a whole can continue to provide service.
All nodes in a Cassandra cluster are equal; there is no master node.
AWS provides a solid platform for running a Cassandra cluster.

How Does Cassandra Support High Availability?

Cassandra is designed to be fault-tolerant and to remain highly available even when multiple nodes fail.
AWS Regions and Availability Zones can be used to spread the deployment across isolated locations.
Resiliency is ensured through infrastructure automation.
Failing nodes can be replaced quickly.
In case of a Region-wide failure, if we deploy with the multi-Region option, traffic can be directed to the other active Region.
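
As a rough illustration of how replication relates to node failures, here is a Python sketch of Cassandra's quorum arithmetic (a quorum is floor(RF/2) + 1 replicas). The layout it assumes, RF = 3 with exactly one replica of every row in each of three Availability Zones, is our assumption for the example and not a property of every deployment.

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must acknowledge a QUORUM read or write."""
    return replication_factor // 2 + 1


def tolerated_replica_failures(replication_factor: int) -> int:
    """Replicas of a row that can be down while QUORUM still succeeds."""
    return replication_factor - quorum(replication_factor)


# Assumption: RF = 3 with one replica per Availability Zone, so losing
# one whole zone removes exactly one replica of every row.
rf = 3
print(f"RF={rf}: quorum={quorum(rf)}, "
      f"survives {tolerated_replica_failures(rf)} zone outage(s) at QUORUM")
```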

Deploying Cassandra on AWS

Deploying Cassandra on Amazon EC2 can be automated.
AWS CloudFormation allows you to describe and provision all your infrastructure resources in AWS. There is no additional charge for CloudFormation itself; you pay only for the AWS resources it creates.
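
CloudFormation templates are plain JSON or YAML documents, so they can also be generated programmatically. The Python sketch below emits a minimal template with one `AWS::EC2::Instance` per Availability Zone. The AMI ID, subnet IDs, and tags are hypothetical placeholders; a real template would also need security groups, storage, and Cassandra bootstrap configuration.

```python
import json


def cassandra_template(ami_id: str, instance_type: str, subnets: dict[str, str]) -> dict:
    """Build a minimal CloudFormation template: one Cassandra node per AZ subnet."""
    resources = {}
    for i, (az, subnet_id) in enumerate(sorted(subnets.items()), start=1):
        resources[f"CassandraNode{i}"] = {
            "Type": "AWS::EC2::Instance",
            "Properties": {
                "ImageId": ami_id,
                "InstanceType": instance_type,
                "SubnetId": subnet_id,
                "Tags": [{"Key": "Name", "Value": f"cassandra-{az}"}],
            },
        }
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Minimal Cassandra node sketch (placeholders only)",
        "Resources": resources,
    }


# Hypothetical IDs; replace with real values from your account.
template = cassandra_template(
    ami_id="ami-00000000000000000",
    instance_type="c4.4xlarge",
    subnets={"us-east-1a": "subnet-aaaa", "us-east-1b": "subnet-bbbb", "us-east-1c": "subnet-cccc"},
)
print(json.dumps(template, indent=2))
```

The printed JSON could be saved to a file and deployed with `aws cloudformation create-stack --stack-name <name> --template-body file://template.json`.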

Common Cassandra Design Patterns on AWS

Single AWS Region, 3 Availability Zones
Active-Active, Multi-Region
Active-Standby, Multi-Region

Single Region, 3 Availability Zones

Deploy the Cassandra cluster in one AWS Region across three Availability Zones.
There is only one ring in the cluster.
By running EC2 instances in all three zones, you ensure that replicas are distributed evenly across the zones.
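
The even spread comes from rack awareness. With an AZ-aware snitch, each Availability Zone is treated as a rack, and replica placement prefers nodes in racks that do not hold a replica yet. The Python sketch below is a simplified, single-data-center version of that idea. The tokens, node names, and zones are hypothetical, and the fallback for "more replicas than racks" is simplified.

```python
def place_replicas(ring: list[tuple[int, str, str]], key_token: int, rf: int) -> list[str]:
    """Choose `rf` replicas for a token, preferring nodes in racks (AZs) not yet used.

    `ring` holds (token, node, rack) tuples for a single data center.
    """
    ring = sorted(ring)
    # The walk starts at the first node whose token is >= key_token, wrapping to 0.
    start = next((i for i, (token, _, _) in enumerate(ring) if token >= key_token), 0)
    chosen, used_racks, skipped = [], set(), []
    for step in range(len(ring)):
        _, node, rack = ring[(start + step) % len(ring)]
        if rack not in used_racks:
            chosen.append(node)
            used_racks.add(rack)
        else:
            skipped.append(node)  # same rack as an earlier replica; kept as a fallback
    # Same-rack nodes are used only when there are more replicas than racks.
    return (chosen + skipped)[:rf]


# Hypothetical six-node ring: two nodes per Availability Zone.
RING = [
    (0, "node1", "us-east-1a"), (100, "node2", "us-east-1a"),
    (200, "node3", "us-east-1b"), (300, "node4", "us-east-1b"),
    (400, "node5", "us-east-1c"), (500, "node6", "us-east-1c"),
]
print(place_replicas(RING, key_token=50, rf=3))  # ['node2', 'node3', 'node5']
```

A plain clockwise walk would have picked node2, node3, and node4, which puts two replicas in us-east-1b. Skipping already-used racks is what keeps one replica per zone.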

Active-Active, Multi-Region

Two rings, one in each of two Regions
The VPCs in the two Regions are peered
The two Regions should be identical, with the same number of nodes, instance types, and storage configuration.
This pattern is most suitable when the applications using the Cassandra cluster are deployed in more than one Region.
Read/write traffic can be localized to the closest Region for the user for lower latency and higher performance.
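
Localizing traffic usually means reading and writing at a data-center-local consistency level. The sketch below compares QUORUM, which counts replicas across all data centers, with LOCAL_QUORUM, which counts only replicas in the coordinator's data center. It assumes a hypothetical replication factor of 3 in each of two Regions. The Region names are placeholders.

```python
def quorum(replicas: int) -> int:
    """Majority of the given number of replicas."""
    return replicas // 2 + 1


def min_remote_acks(rf_per_dc: dict[str, int], local_dc: str, level: str) -> int:
    """Fewest acknowledgements that must come from outside `local_dc`."""
    if level == "LOCAL_QUORUM":
        return 0  # counted only among replicas in the local data center
    if level == "QUORUM":
        total = sum(rf_per_dc.values())
        return max(0, quorum(total) - rf_per_dc[local_dc])
    raise ValueError(f"unsupported consistency level: {level}")


# Hypothetical: RF = 3 in each Region, client talks to us-east.
rf = {"us-east": 3, "eu-west": 3}
for level in ("LOCAL_QUORUM", "QUORUM"):
    print(level, "needs at least", min_remote_acks(rf, "us-east", level), "cross-Region ack(s)")
```

With 6 replicas in total, QUORUM needs 4 acknowledgements. The local Region holds only 3, so every QUORUM request waits on at least one cross-Region round trip. LOCAL_QUORUM avoids that wait.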

Active-Standby, Multi-Region

Two rings, one in each of two Regions
The VPCs in the two Regions are peered
The two Regions should be identical, with the same number of nodes, instance types, and storage configuration.
The second Region does not receive traffic from the applications.
It functions only as a secondary location for disaster recovery. If the primary Region is not available, the second Region receives the traffic.

Planning High-Performance Storage Options

Cassandra's disk access is largely sequential for write-heavy workloads, but read-heavy workloads require random access.
If your working set (data + index) does not fit into memory, more reads must be served from disk, which increases disk I/O.
It is very important to select the correct storage option.
We do not recommend magnetic volume types (HDD) because of their low performance.
AWS provides two main options:
Amazon EC2 instance store
Amazon EBS

Amazon EC2 Instance Store

Storage located on disks that are physically attached to the host computer is called an “instance store”.
If you are using more than one volume, you can stripe the instance store volumes (RAID 0)
for enhanced I/O throughput.
But if the instance is stopped, fails, or is terminated,
you will lose all your data.
Therefore, we need to replicate data across multiple nodes in different Availability Zones, or across Regions, depending on the requirements.
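
RAID 0 spreads consecutive chunks of the logical device round-robin across the member volumes. That is why it adds throughput but no redundancy: losing any one volume loses the array. The Python sketch below shows the address mapping. The 256 KiB chunk size and the two-volume layout are assumptions for illustration. In practice the array would be built with a tool such as `mdadm`, not by hand.

```python
def raid0_locate(logical_offset: int, chunk_size: int, num_devices: int) -> tuple[int, int]:
    """Map a byte offset on the striped device to (device index, offset on that device)."""
    chunk_index, within_chunk = divmod(logical_offset, chunk_size)
    device = chunk_index % num_devices
    # Each device holds every num_devices-th chunk, packed contiguously.
    device_offset = (chunk_index // num_devices) * chunk_size + within_chunk
    return device, device_offset


CHUNK = 256 * 1024  # assumed 256 KiB chunk size
for offset in (0, CHUNK, 2 * CHUNK + 10):
    print(offset, raid0_locate(offset, CHUNK, num_devices=2))
```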

Amazon EBS - Amazon Elastic Block Store

It provides persistent block storage
Each Amazon EBS volume is automatically replicated within its Availability Zone to protect against component failure (high availability and durability)
By using Amazon CloudWatch with AWS Lambda, you can automate volume changes

General Purpose SSD (gp2)

gp2 is designed to offer single-digit millisecond latencies
It delivers a consistent baseline performance of 3 IOPS/GB (minimum 100 IOPS), up to a maximum of 10,000 IOPS
It provides up to 160 MB/s of throughput per volume
Volumes smaller than 1 TB can burst up to 3,000 IOPS
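
The gp2 rules above can be turned into a quick calculator. This Python sketch uses the limits quoted in this post: 3 IOPS/GB baseline, 100 IOPS minimum, 10,000 IOPS maximum, and 3,000 IOPS burst. AWS has revised some of these limits over time, so treat the constants as assumptions to check against current EBS documentation.

```python
GP2_IOPS_PER_GB = 3
GP2_MIN_IOPS = 100
GP2_MAX_IOPS = 10_000  # limit quoted in this post; check current EBS docs
GP2_BURST_IOPS = 3_000


def gp2_baseline_iops(size_gb: int) -> int:
    """Sustained IOPS: 3 per GB, clamped to the minimum and maximum."""
    return min(max(GP2_IOPS_PER_GB * size_gb, GP2_MIN_IOPS), GP2_MAX_IOPS)


def gp2_peak_iops(size_gb: int) -> int:
    """Burst only matters when the baseline is below the burst ceiling (< 1 TB)."""
    return max(gp2_baseline_iops(size_gb), GP2_BURST_IOPS)


for size_gb in (500, 4096):  # commit log and data volume sizes used later in this post
    print(size_gb, gp2_baseline_iops(size_gb), gp2_peak_iops(size_gb))
```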

Provisioned IOPS SSD (io1)

The highest-performance EBS storage option, designed for critical, I/O-intensive databases
Up to 50 IOPS per provisioned GB, to a maximum of 32,000 IOPS
Up to 500 MB/s of throughput per volume
Single-digit millisecond latencies, and designed to deliver the provisioned performance 99.9% of the time
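
The io1 ratio limits how many IOPS you can provision for a given volume size. Here is a small sketch that uses the 50 IOPS/GB and 32,000 IOPS figures quoted above. It also applies the 4 GiB minimum io1 volume size. All three are AWS limits that may change, so treat them as assumptions.

```python
import math

IO1_MAX_RATIO = 50      # IOPS per provisioned GB, as quoted above
IO1_MAX_IOPS = 32_000   # per-volume maximum, as quoted above
IO1_MIN_SIZE_GB = 4     # smallest io1 volume


def io1_max_provisionable_iops(size_gb: int) -> int:
    """Most IOPS a volume of this size may be provisioned with."""
    return min(IO1_MAX_RATIO * size_gb, IO1_MAX_IOPS)


def io1_min_size_gb(target_iops: int) -> int:
    """Smallest volume (GB) that allows `target_iops` under the ratio limit."""
    if target_iops > IO1_MAX_IOPS:
        raise ValueError("target exceeds the per-volume maximum")
    return max(IO1_MIN_SIZE_GB, math.ceil(target_iops / IO1_MAX_RATIO))


print(io1_max_provisionable_iops(500), io1_min_size_gb(20_000))
```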

Why EBS-Optimized Instances?

Usually, network traffic and EBS traffic share the same network path on an Amazon EC2 instance
This means consistent EBS performance depends on the amount of non-EBS network traffic, so throughput between the instance and its EBS volumes cannot be guaranteed
The solution: EBS-optimized instances
They provide additional, dedicated capacity for EBS I/O between EC2 and EBS
This optimization minimizes contention between EBS I/O and other traffic from the EC2 instance
The dedicated bandwidth to Amazon EBS depends on the instance type,
ranging from 425 Mbps to 14,000 Mbps
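
Instance bandwidth is quoted in megabits per second, while volume throughput is quoted in megabytes per second, so convert before comparing. This sketch checks whether an instance's dedicated EBS bandwidth can keep up with its attached volumes. The 2,000 Mbps figure is the c4.4xlarge value given later in this post. The two-volume layout at 160 MB/s each is an assumption.

```python
def mbps_to_mb_per_s(megabits_per_second: float) -> float:
    """Convert megabits per second to megabytes per second."""
    return megabits_per_second / 8


def ebs_ceiling(instance_ebs_mbps: float, volume_mb_per_s: list[float]) -> float:
    """Effective ceiling: the smaller of instance EBS bandwidth and summed volume throughput."""
    return min(mbps_to_mb_per_s(instance_ebs_mbps), sum(volume_mb_per_s))


# Assumed layout: one gp2 commit-log volume and one gp2 data volume, 160 MB/s each.
print(ebs_ceiling(2_000, [160, 160]))  # 250.0
```

Here the instance link (250 MB/s), not the volumes (320 MB/s combined), is the bottleneck.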

Instance Types that support EBS Optimization

Current-generation instance types such as C5, C4, M5, M4, P3, P2, G3, and D2 are EBS-optimized by default
For these, there is no need to enable EBS optimization, and disabling it has no effect
For instance types that are not EBS-optimized by default, we can enable EBS optimization
It can be enabled when launching the instance, or later on an existing instance (a running instance must be stopped first)
Enabling EBS optimization on these instance types incurs an additional hourly fee

Available Instance Types for Cassandra

Compute Optimized – C4, C5

High level of network performance
EBS-optimized by default for increased storage performance, at no additional cost

Storage Optimized – I3

SSD-backed instance storage optimized for low latency
Very high random I/O performance
High sequential read throughput and high IOPS at low cost

Memory Optimized – X1e, R4

Optimized for memory-intensive applications
SSD storage, and EBS-optimized by default at no additional cost
The lowest price per GiB of RAM

Planning Instance Types Based on Storage Needs

Let's assume we need 600 TB of data storage, plus 10% overhead for disk formatting,
with 95% writes and 5% reads (write-heavy).
The most common instance type for Cassandra on AWS is i3. Why?

Designed for I/O-intensive workloads
SSD storage (instance store)
Available as On-Demand, Reserved, and Spot Instances in 15 Regions

Let’s pick the i3.2xlarge instance (1,900 GB of instance storage):

(600 x 1024) / (1900 x 0.9) ≈ 359.3, rounded up to 360 instances
360 x $0.624 per hour x 720 hours = $161,740.80 per month
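
The same sizing arithmetic, as a small Python sketch. The 1,900 GB disk, the 10% formatting overhead, the $0.624 hourly On-Demand price, and the 720-hour month all come from this post. Prices change, so treat them as inputs rather than current rates.

```python
import math

HOURS_PER_MONTH = 720  # the post's convention (30 days x 24 hours)


def instances_needed(data_tb: float, disk_gb_per_instance: float, usable_fraction: float) -> int:
    """Instances required when all data lives on local instance storage."""
    usable_gb = disk_gb_per_instance * usable_fraction
    return math.ceil(data_tb * 1024 / usable_gb)


def monthly_cost(instances: int, hourly_price: float) -> float:
    """Instance cost per month at a flat hourly price."""
    return instances * hourly_price * HOURS_PER_MONTH


# i3.2xlarge figures from this post.
n = instances_needed(600, 1900, 0.9)
print(n, round(monthly_cost(n, 0.624), 2))
```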

This does not include data transfer charges.
It assumes the commit log is stored on the same drive.

This is very costly.

We could choose instances with more local storage, but that might not reduce the cost.

Decoupling with EBS (sizing instances and storage separately)

Cassandra recommends separate drives for data and the commit log (better performance).
We will allocate the following to each node:

500GB for commit log
4 TB for data

The compute-optimized instance family is popular for running Cassandra on EBS.

The C4 instance type has no local storage.
c4.4xlarge is a good fit for production workloads:
30 GB memory, 16 vCPU, 2,000 Mbps dedicated EBS throughput

Let’s estimate the cost based on these figures.

Number of instances = 600 TB / 4 TB = 150 instances

Storage requirements = data storage + commit log storage

= 600 TB + 75 TB (150 x 500 GB)

= 675 TB

Calculating EC2 cost per month – US East (N. Virginia)

150 x c4.4xlarge = 150 x $0.796 x 720 = $85,968

Calculating EBS volume costs per month

675 TB of EBS gp2 = 675 x 1024 x $0.10 per GB-month = $69,120

Total cost = $85,968 + $69,120

= $155,088
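
The decoupled estimate can be reproduced the same way. This sketch follows the post's conventions: 500 GB x 150 nodes counted as 75 TB, TB billed as TB x 1024 GB, a gp2 price of $0.10 per GB-month, and a c4.4xlarge price of $0.796 per hour. Those figures are inputs taken from the post, not current prices.

```python
import math

HOURS_PER_MONTH = 720
GP2_USD_PER_GB_MONTH = 0.10  # US East (N. Virginia) gp2 price used in this post


def ebs_cluster_monthly_cost(data_tb: float, data_tb_per_node: float,
                             commit_log_gb_per_node: float, hourly_price: float) -> dict:
    """Reproduce the decoupled (instance + EBS) estimate from this post."""
    nodes = math.ceil(data_tb / data_tb_per_node)
    # The post counts 150 x 500 GB as 75 TB, i.e. decimal GB -> TB.
    storage_tb = data_tb + nodes * commit_log_gb_per_node / 1000
    ec2 = nodes * hourly_price * HOURS_PER_MONTH
    ebs = storage_tb * 1024 * GP2_USD_PER_GB_MONTH  # billed as TB x 1024 GB
    return {"nodes": nodes, "storage_tb": storage_tb, "ec2": ec2, "ebs": ebs, "total": ec2 + ebs}


estimate = ebs_cluster_monthly_cost(600, 4, 500, hourly_price=0.796)
for item, value in estimate.items():
    print(f"{item:>10}: {value:,.2f}")
```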

We can consider Reserved Instances for further optimization.

Let’s say we are happy with the c4.4xlarge plus EBS configuration after observing cluster performance for a couple of months.
We can then make reservations to optimize cost,
for example with a 3-year partial upfront plan:

Per-month instance cost = 150 x $0.33 x 720 = $35,640
Per-month EBS cost = 675 x 1024 x $0.10 = $69,120
Total cost per month = $35,640 + $69,120 = $104,760

For i3.2xlarge instances with a 3-year partial upfront plan:

Per-month cost = 360 x $0.28 x 720 = $72,576
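
To compare all four options side by side, here is a short sketch that reuses the node counts and effective hourly rates from this post. The reserved rates will differ from current AWS pricing. The reservation changes only the instance cost; the EBS cost stays the same.

```python
HOURS_PER_MONTH = 720
EBS_MONTHLY = 675 * 1024 * 0.10  # gp2 cost from the calculation above

options = {
    "c4.4xlarge + EBS, On-Demand": 150 * 0.796 * HOURS_PER_MONTH + EBS_MONTHLY,
    "c4.4xlarge + EBS, 3-yr partial upfront": 150 * 0.33 * HOURS_PER_MONTH + EBS_MONTHLY,
    "i3.2xlarge, On-Demand": 360 * 0.624 * HOURS_PER_MONTH,
    "i3.2xlarge, 3-yr partial upfront": 360 * 0.28 * HOURS_PER_MONTH,
}

# Cheapest first.
for name, cost in sorted(options.items(), key=lambda kv: kv[1]):
    print(f"{name:<40} ${cost:>12,.2f}")
```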

Planning Elastic Network Interfaces (ENI)

A virtual network interface that can be attached to and detached from an instance in a VPC, within a single Availability Zone.
It can be attached to one instance, detached from that instance, and attached to another instance in the same Availability Zone.
When you move an ENI from one instance to another, network traffic is redirected to the new instance.
This really helps with seed node configuration, because seed IP addresses are hard-coded in the configuration.
If a seed node fails, you can automate the recovery so that the replacement seed node takes over the ENI's IP address programmatically.
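
The sketch below models why an ENI helps with seeds. The seed list in every node's configuration holds fixed IP addresses. On failure, the ENI carrying that IP moves to a replacement instance in the same zone, so the seed list never has to change. This is an in-memory simulation with hypothetical IDs and IPs. Real automation would perform the move through the EC2 API, for example boto3's `detach_network_interface` and `attach_network_interface`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Instance:
    instance_id: str
    az: str


@dataclass
class SeedEni:
    eni_id: str
    private_ip: str                    # the IP listed in every node's seed configuration
    az: str
    attached_to: Optional[str] = None  # instance ID currently holding the ENI


def move_seed_eni(eni: SeedEni, failed: Instance, replacement: Instance) -> None:
    """Simulate detaching the seed ENI from a failed node and attaching it to a new one."""
    if eni.attached_to != failed.instance_id:
        raise ValueError(f"{eni.eni_id} is not attached to {failed.instance_id}")
    if replacement.az != eni.az:
        raise ValueError("an ENI can only be attached to instances in its own Availability Zone")
    eni.attached_to = replacement.instance_id  # the seed IP now reaches the new node


seeds = ["10.0.1.10", "10.0.2.10", "10.0.3.10"]  # hypothetical, never edited on failover
eni = SeedEni("eni-seed-1", "10.0.1.10", "us-east-1a", attached_to="i-old")
move_seed_eni(eni, Instance("i-old", "us-east-1a"), Instance("i-new", "us-east-1a"))
print(eni.private_ip in seeds, eni.attached_to)  # True i-new
```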

Monitoring by using Amazon CloudWatch

Amazon CloudWatch can be used as a resource monitoring service
It collects and tracks metrics, and lets you:

Collect and monitor log files
Set alarms

We can publish custom metrics to Amazon CloudWatch.
We can configure alarms to notify us when metrics exceed defined thresholds.
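
Publishing a custom metric is a single API call. The sketch below builds one `MetricData` entry in the shape expected by boto3's CloudWatch `put_metric_data`. It also includes a simple threshold check of the kind an alarm evaluates over consecutive periods. The namespace, metric name, dimensions, and threshold are hypothetical. The actual call is left commented out so the example runs without AWS credentials.

```python
from datetime import datetime, timezone


def pending_compactions_metric(cluster: str, node_ip: str, value: int) -> dict:
    """One MetricData entry for put_metric_data (names here are hypothetical)."""
    return {
        "MetricName": "PendingCompactions",
        "Dimensions": [
            {"Name": "Cluster", "Value": cluster},
            {"Name": "Node", "Value": node_ip},
        ],
        "Timestamp": datetime.now(timezone.utc),
        "Value": float(value),
        "Unit": "Count",
    }


def breaches(values: list[float], threshold: float, periods: int) -> bool:
    """Alarm-style check: the last `periods` datapoints all exceed `threshold`."""
    recent = values[-periods:]
    return len(recent) == periods and all(v > threshold for v in recent)


metric = pending_compactions_metric("prod-cluster", "10.0.1.10", 42)
# import boto3
# boto3.client("cloudwatch").put_metric_data(Namespace="Custom/Cassandra", MetricData=[metric])
print(breaches([10, 120, 150, 180], threshold=100, periods=3))  # True
```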

Maintenance

Keeping a Cassandra cluster healthy involves the following areas:

Scaling

Cassandra is horizontally scaled by adding more instances to the ring.

Upgrades

A rolling upgrade pattern is used for Cassandra upgrades, operating system patching, and instance type changes.
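
A rolling upgrade takes down one node at a time, so the rest of the ring keeps serving. The Python sketch below only generates the ordered plan and does not execute anything. `nodetool drain` and `nodetool status` are real commands. The `systemctl` service commands and the upgrade step are placeholders for whatever your environment uses. Finishing one Availability Zone before moving to the next is our assumption about a sensible order.

```python
def rolling_upgrade_plan(nodes_by_az: dict[str, list[str]], upgrade_cmd: str) -> list[tuple[str, str]]:
    """Return (node, command) steps that take down exactly one node at a time."""
    plan: list[tuple[str, str]] = []
    for az in sorted(nodes_by_az):  # finish one Availability Zone before the next
        for node in nodes_by_az[az]:
            plan += [
                (node, "nodetool drain"),                  # flush memtables, stop accepting writes
                (node, "sudo systemctl stop cassandra"),   # placeholder service command
                (node, upgrade_cmd),                       # Cassandra, OS patch, or instance change
                (node, "sudo systemctl start cassandra"),  # placeholder service command
                (node, "nodetool status"),                 # continue only once the node shows UN
            ]
    return plan


plan = rolling_upgrade_plan(
    {"us-east-1b": ["node3"], "us-east-1a": ["node1", "node2"]},
    upgrade_cmd="sudo yum update -y",  # hypothetical upgrade step
)
for node, command in plan:
    print(f"{node}: {command}")
```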

Backup & Restore

Cassandra supports snapshots and incremental backups.
When using instance store, a file-based backup tool works best.
To restore, these backup files are copied to new instances.
We recommend using S3 to durably store backup files for long-term storage.
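
After `nodetool snapshot -t <tag>`, Cassandra writes snapshot files under each table's data directory in `snapshots/<tag>/`. The Python sketch below collects those files and derives S3 object keys for them. The assumed layout (`<data_dir>/<keyspace>/<table>/snapshots/<tag>/`), the key prefix, and the key scheme are assumptions. The actual upload, for example with boto3's `upload_file`, is left out.

```python
import tempfile
from pathlib import Path


def snapshot_upload_plan(data_dir: Path, tag: str, node_id: str,
                         prefix: str = "cassandra-backups") -> list[tuple[Path, str]]:
    """Map each snapshot file to an S3 key: prefix/node/tag/keyspace/table/file."""
    plan = []
    for f in sorted(data_dir.glob(f"*/*/snapshots/{tag}/*")):
        if not f.is_file():
            continue
        table_dir = f.parents[2]          # <data_dir>/<keyspace>/<table>
        keyspace = table_dir.parent.name
        key = "/".join([prefix, node_id, tag, keyspace, table_dir.name, f.name])
        plan.append((f, key))
    return plan


# Demo against a throwaway directory that mimics the assumed layout.
with tempfile.TemporaryDirectory() as tmp:
    snap_dir = Path(tmp) / "ks1" / "users-abc" / "snapshots" / "nightly"
    snap_dir.mkdir(parents=True)
    (snap_dir / "nb-1-big-Data.db").write_bytes(b"demo")
    for path, key in snapshot_upload_plan(Path(tmp), "nightly", node_id="node1"):
        print(key)
```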

Security

First, ensure that data is encrypted at rest and in transit.
Second, restrict access by unauthorized users.

Encryption at rest

Encryption at rest can be achieved by using EBS volumes with encryption enabled.

Encryption in transit

Cassandra supports Transport Layer Security (TLS) for both client-to-node and internode communication.

Thursday, May 3, 2018