INTRODUCTION TO
APACHE CASSANDRA
Gökhan Atıl
GÖKHAN ATIL
➤ Database Administrator
➤ Oracle ACE Director (2016), ACE (2011)
➤ 10g/11g and R12 Oracle Certified Professional (OCP)
➤ Co-author of Expert Oracle Enterprise Manager 12c
➤ Founding Member and Vice President of TROUG
➤ Blogger (since 2008) gokhanatil.com
➤ Twitter: @gokhanatil
INTRODUCTION TO APACHE CASSANDRA
➤ What is Apache Cassandra? Why use it?
➤ Cassandra Architecture
➤ Cassandra Query Language (CQL)
➤ Cassandra Data Modeling
➤ How to install and run Cassandra?
➤ Cassandra nodetool
➤ Backup and Recovery
WHAT IS APACHE CASSANDRA? WHY USE IT?
➤ Fast Distributed (Column Family NoSQL) Database: High Availability, Linear Scalability, High Performance
➤ Fault tolerant on Commodity Hardware
➤ Multi-Data Center Support
➤ Easy to operate
➤ Proven: CERN, Netflix, eBay, GitHub, Instagram, Reddit
HIGH AVAILABILITY: CAP THEOREM AND CASSANDRA
➤ CAP theorem: Consistency, Availability, Partition Tolerance
➤ RDBMS (ACID): Atomicity, Consistency, Isolation, Durability
HIGH AVAILABILITY: THE RING
➤ No master, no slave: peer to peer
➤ Nodes announce their state through gossip ("I'm online!")
LINEAR SCALABILITY
CASSANDRA ARCHITECTURE
CASSANDRA PARTITIONS
EMAIL     NAME     PHONE
gokhan@   Gokhan   542xxxxxxx
aylin@    Aylin    532xxxxxxx
ilayda@   Ilayda   532xxxxxxx
➤ PRIMARY KEY: PARTITION KEY, CLUSTERING KEY
➤ The partitioner hashes the PARTITION KEY to place the row on the ring
REPLICATION FACTOR
➤ EMAIL (gokhan@) → Murmur3Partitioner → # 60
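To make the partitioner and replication factor concrete, here is a minimal Python sketch. It assumes a toy ring of four nodes with hand-picked tokens, uses md5 as a stand-in for Murmur3, and places replicas on the next nodes clockwise; the names `token`, `replicas` and `ring` are illustrative, not Cassandra APIs.

```python
import bisect
import hashlib


def token(partition_key: str) -> int:
    """Map a partition key to a token in [0, 100).

    Assumption: md5 stands in for Murmur3, and the token space is tiny.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def replicas(partition_key, ring, replication_factor):
    """Walk the ring clockwise from the key's token and pick RF distinct nodes."""
    tokens = sorted(ring)  # ring maps token -> node name
    # The first node whose token is >= the key's token owns the key (wrap around).
    start = bisect.bisect_left(tokens, token(partition_key)) % len(tokens)
    count = min(replication_factor, len(tokens))
    return [ring[tokens[(start + i) % len(tokens)]] for i in range(count)]


# Hypothetical four-node ring.
ring = {0: "node1", 25: "node2", 50: "node3", 75: "node4"}
print(replicas("gokhan@", ring, replication_factor=3))
```

With a replication factor of 3, every partition key ends up on three consecutive nodes of the ring, so losing one node still leaves two copies.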
WRITE PATH (CLUSTER)
Diagram: client → coordinator → nodes (with hinted handoff)
WRITE PATH (NODE)
➤ Logging data in the commit log
➤ Writing data to the memtable
➤ Flushing to (immutable) SSTables (Sorted Strings Table)
Diagram: memtable (memory) → flush → SSTables (disk); the commit log is on disk; SSTables are merged by compaction
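The write path on a node can be sketched in a few lines of Python. This is a simplified model, not Cassandra's implementation: the `Node` class, the `memtable_limit` threshold and the in-memory lists standing in for disk files are all assumptions made for illustration.

```python
class Node:
    """Toy model of a node's write path (all names are hypothetical)."""

    def __init__(self, memtable_limit=2):
        self.commit_log = []   # append-only log (on disk in a real node)
        self.memtable = {}     # in-memory table of recent writes
        self.sstables = []     # immutable, sorted tables (on disk in a real node)
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        self.commit_log.append((key, value))  # 1. log the write for durability
        self.memtable[key] = value            # 2. write to the memtable
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # 3. Flush the sorted memtable contents to a new immutable SSTable.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log.clear()  # flushed data no longer needs to be replayed

    def compact(self):
        # Merge all SSTables into one; tables are oldest first, so newer values win.
        merged = {}
        for table in self.sstables:
            merged.update(table)
        self.sstables = [tuple(sorted(merged.items()))]


node = Node(memtable_limit=2)
for key, value in [("b", "1"), ("a", "2"), ("a", "3"), ("c", "4")]:
    node.write(key, value)
node.compact()
print(node.sstables)
```

Writes never modify an existing SSTable; an updated value lands in a newer SSTable and compaction later discards the older copy.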
READ PATH (CLUSTER)
Diagram: client → coordinator → nodes; one node returns the data, the others return a digest
➤ Read Repair: repair during read path using digest and timestamp
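Read repair can be illustrated with a small Python sketch. It assumes each replica returns a (value, timestamp) pair, that digests are plain md5 hashes of that pair, and that the newest timestamp wins; `digest` and `read_with_repair` are hypothetical helpers, not Cassandra functions.

```python
import hashlib


def digest(row):
    """Hash a (value, timestamp) pair so replicas can be compared cheaply."""
    value, timestamp = row
    return hashlib.md5(f"{value}|{timestamp}".encode("utf-8")).hexdigest()


def read_with_repair(replica_rows):
    """Return the newest row and overwrite stale replicas in place.

    replica_rows maps node name -> (value, timestamp).
    """
    digests = {digest(row) for row in replica_rows.values()}
    newest = max(replica_rows.values(), key=lambda row: row[1])
    if len(digests) > 1:  # the replicas disagree, so repair the stale ones
        for node, row in replica_rows.items():
            if row != newest:
                replica_rows[node] = newest
    return newest


rows = {"node1": ("542", 1), "node2": ("535", 2), "node3": ("542", 1)}
print(read_with_repair(rows))
print(rows)
```

Comparing digests keeps the common case cheap: full rows only need to be exchanged when the digests differ.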
READ PATH (NODE)
➤ Memory: memtable → row (read) cache (found?) → bloom filter (no / maybe) → partition key cache (found?) → partition summary
➤ Disk: partition index → SSTable
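The "no / maybe" answer of the bloom filter is the part of the read path that most benefits from an example. The sketch below is a generic bloom filter in Python, assuming a 64-bit array and three sha256-derived hash positions; it is not Cassandra's implementation, and the class and method names are illustrative.

```python
import hashlib


class BloomFilter:
    """Generic bloom filter: answers "definitely not" or "maybe"."""

    def __init__(self, size=64, hashes=3):
        self.size = size
        self.hashes = hashes
        self.bits = [False] * size

    def _positions(self, key):
        # Derive one bit position per seed from a sha256 hash of the key.
        for seed in range(self.hashes):
            h = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(h[:8], "big") % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = True

    def might_contain(self, key):
        # False means the key is certainly absent; True only means "maybe".
        return all(self.bits[pos] for pos in self._positions(key))


sstable_filter = BloomFilter()
sstable_filter.add("gokhan@")
print(sstable_filter.might_contain("gokhan@"))
```

A "no" lets the node skip an SSTable without touching the disk; a "maybe" means the partition key cache, summary and index still have to be consulted.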
CONSISTENCY LEVELS
ANY (write only)   at least one node
ONE, TWO, THREE    at least one/two/three replica node(s)
QUORUM             a quorum (N/2+1) of replica nodes across all datacenters
LOCAL_QUORUM       a quorum (N/2+1) of replica nodes in the same datacenter
ALL                on all replica nodes
➤ Formula for Strong Consistency: R + W > N
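The quorum size and the strong consistency formula are simple arithmetic, sketched here in Python. The sketch assumes R is the number of replicas that must answer a read, W the number that must acknowledge a write, N the replication factor, and integer division in N/2+1; the function names are illustrative.

```python
def quorum(replication_factor):
    """Quorum size: N/2 + 1, using integer division."""
    return replication_factor // 2 + 1


def is_strongly_consistent(read_replicas, write_replicas, replication_factor):
    """R + W > N: every read overlaps the latest write on at least one replica."""
    return read_replicas + write_replicas > replication_factor


rf = 3
print(quorum(rf))                                          # 2
print(is_strongly_consistent(quorum(rf), quorum(rf), rf))  # True  (2 + 2 > 3)
print(is_strongly_consistent(1, 1, rf))                    # False (1 + 1 <= 3)
```

With a replication factor of 3, reading and writing at QUORUM satisfies the formula, while reading and writing at ONE does not.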
CASSANDRA QUERY LANGUAGE (CQL)
➤ Create a Keyspace (Database):
create keyspace demo with replication = { 'class' :
'SimpleStrategy', 'replication_factor' : 1 };
➤ Remove a keyspace: 
drop keyspace demo;
➤ Select a keyspace to operate on:
use demo;
➤ Create a table (EMAIL: PARTITION KEY, NAME: CLUSTERING KEY):
create table demo.democlients ( email text, name text,
phone text, primary key (email, name));
➤ Alter a table:
alter table democlients add money int;
➤ Remove a table: 
drop table democlients;
➤ Remove all rows in a table: 
truncate table democlients;
➤ Retrieve rows: 
select * from democlients where name='Gokhan Atil'
ALLOW FILTERING; -- or create a secondary index
➤ Retrieve distinct values (EMAIL: PARTITION KEY):
select DISTINCT email from democlients;
➤ Limit the number of rows returned:
select * from democlients LIMIT 1;
➤ Sort the result (NAME: CLUSTERING KEY):
select * from democlients where
email='gokhan at gokhanatil.com' ORDER by name DESC;
➤ Retrieve the results in the JSON format: 
select JSON * from democlients;
➤ Insert a row: 
insert into democlients (email, name, phone) values
('gokhan at gokhanatil.com','Gokhan Atil','542') IF NOT EXISTS;
➤ Insert a row with TTL (Time to live - seconds):
insert into democlients (email, name, phone) values
('info at gokhanatil.com','Information','542') USING TTL 10;
➤ Update records: 
update democlients set phone='535' where
email='gokhan at gokhanatil.com' and
name='Gokhan Atil' IF EXISTS;
➤ Update records with a condition:
update democlients set money=20 where
email='gokhan at gokhanatil.com' and name='Gokhan Atil'
IF phone='542';
➤ Delete rows:
delete from democlients where
email='gokhan at gokhanatil.com' IF EXISTS;
➤ Delete row with a condition: 
delete from democlients where
email='gokhan at gokhanatil.com' and name='Gokhan Atil' IF money > 10;
➤ Delete columns in a row:
delete money from democlients where
email='gokhan at gokhanatil.com' and name='Gokhan Atil';
CASSANDRA DATA MODELING
➤ Query-Driven Data Modeling
➤ Spread data evenly across the cluster
➤ Use Denormalization
➤ Be careful about using secondary indexes
HOW TO INSTALL AND RUN CASSANDRA?
HOW TO INSTALL AND RUN CASSANDRA CLUSTER?
➤ Make sure you have JDK (8u40 or newer) installed
➤ Download apache-cassandra-VERSION-bin.tar.gz
➤ Extract the file to a folder
➤ Create data and logs directories in the Cassandra folder
➤ Run bin/cassandra
➤ Edit the configuration file (conf/cassandra.yaml): give the cluster a name,
change the listen address and the data and log directory locations, and
enable authentication and authorization.
HOW TO INSTALL AND RUN CASSANDRA CLUSTER? (DOCKER)
➤ Use Docker to pull the latest image:
docker pull cassandra
➤ Run it as standalone:
docker run --name cas1 -p 9042:9042 -e CASSANDRA_CLUSTER_NAME=MyCluster -d cassandra
➤ Connect using cqlsh:
docker exec -it cas1 cqlsh
➤ Run nodetool (e.g. to check status):
docker exec -it cas1 nodetool status
CASSANDRA NODETOOL
➤ Get a quick summary of the node: 
nodetool info
➤ Get version of Cassandra: 

nodetool version
➤ Get status of the cluster/keyspace: 
nodetool status <keyspace_name>
➤ View the network statistics of the node: 

nodetool netstats
➤ Get information of a table: 
nodetool cfstats <keyspace_name.table_name>
➤ Repair a node (you can run it weekly during off-peak hours):
nodetool repair
➤ Cleanup of keys no longer belonging to a node: 

nodetool cleanup
➤ Start a major compaction process: 

nodetool compact
➤ Check the compaction process: 

nodetool compactionstats
➤ Decommission a node (run it on the node you want to remove):
nodetool decommission
➤ Remove a dead or decommissioned node from the cluster:

nodetool removenode <node_UUID>
➤ Take a snapshot (for backup): 

nodetool snapshot
➤ Remove previous snapshots: 

nodetool clearsnapshot
BACKUP AND RECOVERY
➤ Back up a cluster:
1. Take a snapshot of each node.
2. Move the snapshots to other storage (e.g. an S3 bucket).
3. Clear all the snapshots.
➤ Restore node(s):
1. Make sure the schema exists.
2. Truncate the table.
3. Copy the most recent snapshots to a directory named "keyspace/tablename", then run:
sstableloader -d <nodeip> keyspace/tablename
BUILD A BACKUP NODE
➤ Use multi-DC replication: 
CREATE KEYSPACE "MyKeyspace" 
WITH replication = {  
'class' : 'NetworkTopologyStrategy', 
'datacenter1' : 3, 'datacenter2' : 1 };
Diagram labels: client, RF=3, snapshots
QUESTIONS?
Blog: www.gokhanatil.com Twitter: @gokhanatil
