
INTRODUCTION TO APACHE CASSANDRA
Gökhan Atıl
GÖKHAN ATIL
➤ Database Administrator

➤ Oracle ACE Director (2016), Oracle ACE (2011)

➤ 10g/11g and R12 Oracle Certified Professional (OCP)

➤ Co-author of Expert Oracle Enterprise Manager 12c

➤ Founding Member and Vice President of TROUG

➤ Blogger (since 2008) gokhanatil.com

➤ Twitter: @gokhanatil

INTRODUCTION TO APACHE CASSANDRA
➤ What is Apache Cassandra? Why use it?

➤ Cassandra Architecture

➤ Cassandra Query Language (CQL)

➤ Cassandra Data Modeling

➤ How to install and run Cassandra?

➤ Cassandra nodetool

➤ Backup and Recovery

WHAT IS APACHE CASSANDRA? WHY USE IT?
➤ Fast Distributed (Column Family NoSQL) Database
High availability
Linear Scalability
High Performance
➤ Fault tolerant on Commodity Hardware
➤ Multi-Data Center Support
➤ Easy to operate
➤ Proven: CERN, Netflix, eBay, GitHub, Instagram, Reddit

HIGH AVAILABILITY: CAP THEOREM AND CASSANDRA

➤ RDBMS (ACID): Atomicity, Consistency, Isolation, Durability
➤ CAP theorem: Consistency, Availability, Partition Tolerance

HIGH AVAILABILITY: THE RING

➤ No master, no slave: peer-to-peer
➤ Nodes announce their state to each other via gossip ("I'm online!")

LINEAR SCALABILITY

CASSANDRA ARCHITECTURE
CASSANDRA PARTITIONS

EMAIL     NAME     PHONE
gokhan@   Gokhan   542xxxxxxx
aylin@    Aylin    532xxxxxxx
ilayda@   Ilayda   532xxxxxxx

➤ PRIMARY KEY: PARTITION KEY, CLUSTERING KEY
➤ The partitioner hashes the PARTITION KEY to decide which node stores the row

REPLICATION FACTOR

➤ The partition key (EMAIL: gokhan@) is hashed by Murmur3Partitioner to a token (# 60) that places the row on the ring
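The mapping from a partition key to its replica nodes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: MD5 stands in for Murmur3 (which is not in the Python standard library), the ring size and node tokens are made up, and `token_for` and `replicas_for` are hypothetical helper names, not Cassandra APIs. Replica placement here mimics the simple "next nodes clockwise" rule.

```python
import bisect
import hashlib


def token_for(partition_key: str, ring_size: int = 2**32) -> int:
    """Hash a partition key to a token (MD5 is a stand-in for Murmur3)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % ring_size


def replicas_for(partition_key, node_tokens, replication_factor):
    """Walk the ring clockwise from the key's token and pick RF distinct nodes."""
    ring = sorted(node_tokens.items(), key=lambda item: item[1])  # (node, token)
    tokens = [token for _, token in ring]
    # The first node whose token is >= the key's token owns the key;
    # past the last token we wrap around to the first node.
    start = bisect.bisect_left(tokens, token_for(partition_key)) % len(ring)
    count = min(replication_factor, len(ring))
    return [ring[(start + i) % len(ring)][0] for i in range(count)]


# Hypothetical four-node ring with evenly spaced tokens.
node_tokens = {"node1": 0, "node2": 2**30, "node3": 2**31, "node4": 3 * 2**30}
replicas = replicas_for("gokhan@", node_tokens, replication_factor=3)
```

With a replication factor of 3, `replicas` holds three distinct nodes: the owner of the token plus the next two nodes on the ring.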

WRITE PATH (CLUSTER)

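➤ The client sends the write to a coordinator node, which forwards it to the replica nodes
➤ Hinted handoff: if a replica is down, the coordinator keeps a hint and replays the write when the replica is back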
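Hinted handoff can be pictured with a toy coordinator. The sketch below is an assumption-laden model, not Cassandra code: replicas are plain dictionaries, node failure is a set of names, and `Coordinator`, `write`, and `node_up` are hypothetical names chosen for illustration.

```python
class Coordinator:
    """Toy coordinator that stores hints for replicas that are down."""

    def __init__(self, replicas):
        self.replicas = replicas  # replica name -> dict acting as its storage
        self.down = set()         # names of replicas currently unreachable
        self.hints = {}           # replica name -> list of missed (key, value)

    def write(self, key, value):
        """Send the write to every replica; return how many acknowledged it."""
        acks = 0
        for name, store in self.replicas.items():
            if name in self.down:
                # Replica is unreachable: keep a hint instead of failing.
                self.hints.setdefault(name, []).append((key, value))
            else:
                store[key] = value
                acks += 1
        return acks

    def node_up(self, name):
        """Replay the stored hints once the replica is reachable again."""
        self.down.discard(name)
        for key, value in self.hints.pop(name, []):
            self.replicas[name][key] = value


coordinator = Coordinator({"a": {}, "b": {}, "c": {}})
coordinator.down.add("c")
acks = coordinator.write("gokhan@", "542")
coordinator.node_up("c")
```

The write is acknowledged by the two live replicas, and replica "c" receives the missed value only after `node_up` replays the hint.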

WRITE PATH (NODE)

➤ Logging data in the commit log (disk)
➤ Writing data to the memtable (memory)
➤ Flushing the memtable to (immutable) SSTables (Sorted Strings Tables) on disk
➤ Compaction merges SSTables
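The node-level write path can be mimicked with a small in-memory model. This is a simplified sketch, assuming a tiny memtable limit, Python lists and tuples in place of on-disk files, and a commit log that is simply cleared after a flush; `Node` and its methods are hypothetical names, not Cassandra internals.

```python
class Node:
    """Toy model of the write path: commit log -> memtable -> SSTables."""

    def __init__(self, memtable_limit=2):
        self.commit_log = []   # append-only; stands in for the on-disk log
        self.memtable = {}     # mutable, in memory
        self.sstables = []     # each SSTable is an immutable, key-sorted tuple
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        self.commit_log.append((key, value))  # 1. log the write for durability
        self.memtable[key] = value            # 2. apply it to the memtable
        if len(self.memtable) >= self.memtable_limit:
            self.flush()                      # 3. flush when the memtable is full

    def flush(self):
        """Write the memtable out as a new sorted, immutable SSTable."""
        if not self.memtable:
            return
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log.clear()  # simplification: flushed data no longer needs the log

    def compact(self):
        """Merge all SSTables into one; values from newer SSTables win."""
        merged = {}
        for sstable in self.sstables:  # oldest first
            merged.update(sstable)
        self.sstables = [tuple(sorted(merged.items()))] if merged else []


node = Node(memtable_limit=2)
for key, value in [("b", 1), ("a", 2), ("a", 3), ("c", 4)]:
    node.write(key, value)
sstables_before = list(node.sstables)
node.compact()
```

Four writes with a limit of two produce two SSTables; compaction merges them into one and keeps the newer value for key "a".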

READ PATH (CLUSTER)

➤ The client sends the read to a coordinator node, which requests the data from one replica and digests from the others

➤ Read Repair: repair during read path using digest and timestamp
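Read repair can be illustrated with a short model in which the coordinator compares one full response against digests from the other replicas. It is a sketch under assumptions: every replica holds the key, values carry an integer write timestamp, MD5 is used as the digest function for illustration only, and `digest` and `read_with_repair` are hypothetical helper names.

```python
import hashlib


def digest(value, timestamp):
    """Compact fingerprint of a (value, timestamp) pair."""
    return hashlib.md5(f"{value}|{timestamp}".encode("utf-8")).hexdigest()


def read_with_repair(replicas, key):
    """replicas maps a replica name to {key: (value, timestamp)}."""
    names = list(replicas)
    # Full data from the first replica, digests from the rest.
    data_value, data_timestamp = replicas[names[0]][key]
    expected = digest(data_value, data_timestamp)
    digests = [digest(*replicas[name][key]) for name in names[1:]]
    if all(d == expected for d in digests):
        return data_value  # replicas agree, nothing to repair
    # Mismatch: compare full data, let the newest timestamp win,
    # and write the winning version back to every replica.
    newest = max((replicas[name][key] for name in names), key=lambda vt: vt[1])
    for name in names:
        replicas[name][key] = newest
    return newest[0]


replicas = {
    "a": {"phone": ("542", 1)},
    "b": {"phone": ("535", 2)},  # newer write that "a" and "c" missed
    "c": {"phone": ("542", 1)},
}
result = read_with_repair(replicas, "phone")
```

After the read, all three replicas hold the newest version of the value.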

READ PATH (NODE)

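➤ Check the memtable (memory)
➤ Check the row (read) cache: if found, return the row
➤ Check the bloom filter of each SSTable: "no" skips the SSTable, "maybe" continues
➤ Check the partition key cache: if found, read the partition directly from the SSTable
➤ Otherwise, use the partition summary (memory) and the partition index (disk) to locate the partition in the SSTable (disk)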
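The "maybe or no" answer of a bloom filter is easy to see in a minimal implementation. The sketch below is a generic bloom filter, not Cassandra's own; the bit-array size, the number of hash functions, and the use of SHA-256 with a seed prefix are all illustrative assumptions.

```python
import hashlib


class BloomFilter:
    """Minimal bloom filter: answers "definitely not present" or "maybe present"."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, key):
        # Derive num_hashes bit positions by hashing the key with different seeds.
        for seed in range(self.num_hashes):
            raw = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(raw[:8], "big") % self.size_bits

    def add(self, key):
        for position in self._positions(key):
            self.bits |= 1 << position

    def might_contain(self, key):
        # False means the key was never added; True only means "maybe".
        return all((self.bits >> position) & 1 for position in self._positions(key))


sstable_filter = BloomFilter()
sstable_filter.add("gokhan@")
```

A key that was added always yields "maybe", so the SSTable is read; a "no" lets the node skip that SSTable without touching the disk. False positives are possible, false negatives are not.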

CONSISTENCY LEVELS

LEVEL              REQUIRED ACKNOWLEDGEMENT
ANY (write only)   at least one node
ONE, TWO, THREE    at least one/two/three replica node(s)
QUORUM             a quorum (N/2+1) of replica nodes across all datacenters
LOCAL_QUORUM       a quorum (N/2+1) of replica nodes in the same datacenter
ALL                all replica nodes

➤ Formula for Strong Consistency: R + W > N
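The quorum size and the strong-consistency formula can be checked with two small functions. Here R and W are taken to be the number of replicas that must answer a read and acknowledge a write, N is the replication factor, and N/2 is read as integer division; `quorum` and `is_strongly_consistent` are hypothetical helper names used only for this sketch.

```python
def quorum(replication_factor: int) -> int:
    """Quorum size: N/2 + 1, using integer division."""
    return replication_factor // 2 + 1


def is_strongly_consistent(read_replicas: int, write_replicas: int,
                           replication_factor: int) -> bool:
    """R + W > N guarantees that every read overlaps the latest write."""
    return read_replicas + write_replicas > replication_factor


rf = 3
quorum_both = is_strongly_consistent(quorum(rf), quorum(rf), rf)  # QUORUM reads and writes
one_both = is_strongly_consistent(1, 1, rf)                       # ONE reads and writes
```

With a replication factor of 3, QUORUM reads plus QUORUM writes give 2 + 2 > 3, while ONE plus ONE gives 1 + 1, which is not greater than 3.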

CASSANDRA QUERY LANGUAGE (CQL)

➤ Create a Keyspace (Database):



create keyspace demo with replication = { 'class' :
'SimpleStrategy', 'replication_factor' :1 };
➤ Remove a keyspace:

drop keyspace demo;
➤ Select a keyspace to operate:

use demo;

➤ Create a table (EMAIL: PARTITION KEY, NAME: CLUSTERING KEY):

create table demo.democlients ( email text, name text,
phone text, primary key (email, name));
➤ Alter a table:

alter table democlients add money int;
➤ Remove a table:

drop table democlients;
➤ Remove all rows in a table:

truncate table democlients;

➤ Retrieve rows:

select * from democlients where name='Gokhan Atil'
ALLOW FILTERING; -- or create a secondary index
➤ Retrieve distinct values (EMAIL: PARTITION KEY):

select DISTINCT email from democlients;

➤ Limit the number of rows returned:

select * from democlients LIMIT 1;
➤ Sort the result (NAME: CLUSTERING KEY):

select * from democlients
where email='gokhan at gokhanatil.com' ORDER by name DESC;
➤ Retrieve the results in the JSON format:

select JSON * from democlients;
➤ Insert a row:

insert into democlients (email, name, phone)
values ('gokhan at gokhanatil.com','Gokhan Atil','542')
IF NOT EXISTS;
➤ Insert a row with TTL (Time to live - seconds):

insert into democlients (email, name, phone)
values ('info at gokhanatil.com','Information','542')
USING TTL 10;

➤ Update records:

update democlients set phone='535'
where email='gokhan at gokhanatil.com'
and name='Gokhan Atil' IF EXISTS;
➤ Update records with a condition:

update democlients set money=20
where email='gokhan at gokhanatil.com' and name='Gokhan Atil'
IF phone='542';
➤ Delete rows:

delete from democlients
where email='gokhan at gokhanatil.com' and name='Gokhan Atil'
IF EXISTS;

➤ Delete row with a condition:

delete from democlients
where email='gokhan at gokhanatil.com' and name='Gokhan Atil'
IF money > 10;
➤ Delete columns in a row:

delete money from democlients
where email='gokhan at gokhanatil.com' and name='Gokhan Atil';

CASSANDRA DATA MODELING
➤ Query-Driven Data Modeling

➤ Spread data evenly across the cluster

➤ Use Denormalization

➤ Be careful about using secondary indexes

HOW TO INSTALL AND RUN CASSANDRA?
HOW TO INSTALL AND RUN CASSANDRA CLUSTER?
➤ Make sure you have JDK (8u40 or newer) installed
➤ Download apache-cassandra-VERSION-bin.tar.gz
➤ Extract the file to a folder
➤ Create data and logs directories in the Cassandra folder
➤ Run bin/cassandra

➤ Edit the configuration file (conf/cassandra.yaml):
➤ Set the cluster name, listen address, and data and log directory locations; enable authentication and authorization.

➤ Use Docker to pull the latest image:

docker pull cassandra

➤ Run it as standalone:

docker run --name cas1 -p 9042:9042 -e CASSANDRA_CLUSTER_NAME=MyCluster -d cassandra

➤ Connect using cqlsh:

docker exec -it cas1 cqlsh

➤ Run nodetool (e.g. to check the status):

docker exec -it cas1 nodetool status

CASSANDRA NODETOOL
➤ Get a quick summary of the node:

nodetool info

➤ Get version of Cassandra:



nodetool version

➤ Get status of the cluster/keyspace:

nodetool status <keyspace_name>

➤ View the network statistics of the node:



nodetool netstats
➤ Get statistics of a table:

nodetool cfstats <keyspace_name.table_name>

➤ Repair a node (you can run it weekly during off-peak hours):

nodetool repair

➤ Cleanup of keys no longer belonging to a node:



nodetool cleanup

➤ Start a major compaction process:



nodetool compact

➤ Check the compaction process:



nodetool compactionstats

➤ Decommission a node (run on the live node that is leaving the cluster):

nodetool decommission

➤ Remove a dead node from the cluster (Host ID as shown by nodetool status):

nodetool removenode <node_UUID>

➤ Take a snapshot (for backup):



nodetool snapshot

➤ Remove previous snapshots:



nodetool clearsnapshot

BACKUP AND RECOVERY
➤ Back up a cluster:
1. Take a snapshot of each node.
2. Move the snapshots to other storage (for example, an S3 bucket).
3. Clear the snapshots from the nodes.
➤ Restore node(s):
➤ Make sure the schema exists
➤ Truncate the table
➤ Copy the most recent snapshot to a directory whose path ends in "keyspace/tablename", then run:

sstableloader -d <nodeip> keyspace/tablename
BUILD A BACKUP NODE
➤ Use multi-DC replication:

CREATE KEYSPACE "MyKeyspace"

WITH replication = { 

'class' : 'NetworkTopologyStrategy',

'datacenter1' : 3, 'datacenter2' : 1 };

➤ Clients use datacenter1 (RF=3); snapshots are taken on the backup node in datacenter2

QUESTIONS?
Blog: www.gokhanatil.com Twitter: @gokhanatil
