
APACHE CASSANDRA
SUMMARY OF CONTENTS
What is Apache Cassandra?
Evolution of Cassandra
Why Cassandra for Big Data?
Apache Cassandra Data Types
Data Distribution in Apache Cassandra
How to Add Data in Cassandra
How to Read Data
How to Delete Data
Use Cases
Advantages and Limitations
WHAT IS APACHE CASSANDRA?

Apache Cassandra is an open-source, NoSQL, wide-column data store that can quickly ingest and process huge amounts of data.

It is decentralized, distributed, scalable, highly available, and fault-tolerant, with identical nodes clustered together to eliminate single points of failure.
EVOLUTION OF CASSANDRA
WHY CASSANDRA FOR BIG DATA?

1. Handles high-velocity data with ease.
2. Uses a schema that supports a broad variety of data.
3. Is designed for continuous availability.
4. Offers quick installation and configuration for multi-node clusters.
5. Is open source and can reduce cost compared with an RDBMS.
DATA TYPES IN CASSANDRA
1. It supports the most common data types, including ascii, bigint, blob, boolean, counter, decimal, double, float, int, text, timestamp, uuid, and varchar.

2. Its data model offers the convenience of column indexes with the performance of log-structured updates, strong support for denormalization and materialized views, and built-in caching.

3. Data access is performed using CQL (Cassandra Query Language), which resembles SQL (Structured Query Language).
DATA DISTRIBUTION IN CASSANDRA

Cassandra uses a peer-to-peer model for distributing data, which enables it to fully distribute data in the form of variable-length rows, stored by partition key. Cassandra is built for scalability and continuous availability, with no single point of failure.

Many other databases, such as PostgreSQL, use a master-slave replication model, in which writes go to a master node and reads are executed on slaves. To provide high availability, fault tolerance, and scalability, Cassandra's peer-to-peer distribution model instead gives every node open channels of communication with the others. Cassandra uses tokens (64-bit integers) to determine which node holds which data.
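
The token lookup described above can be sketched in Python. This is a simplified illustration rather than Cassandra's actual implementation: the node names, the token assignments, and the MD5-based stand-in hash are all assumptions made for the example (Cassandra's default partitioner uses Murmur3), and replication is ignored.

```python
import bisect
import hashlib
import struct


def token_for(partition_key):
    """Map a partition key to a signed 64-bit token (MD5 is a stand-in hash)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    # Interpret the first 8 bytes as a signed 64-bit big-endian integer.
    return struct.unpack(">q", digest[:8])[0]


class TokenRing:
    """Hypothetical ring in which each node owns the range ending at its token."""

    def __init__(self, node_tokens):
        # Sort (token, node) pairs so the ring can be searched with bisect.
        self._ring = sorted((tok, node) for node, tok in node_tokens.items())
        self._tokens = [tok for tok, _ in self._ring]

    def owner(self, partition_key):
        """Return the node with the first token >= the key's token."""
        tok = token_for(partition_key)
        idx = bisect.bisect_left(self._tokens, tok)
        # Wrap around to the first node when the token is past the last one.
        return self._ring[idx % len(self._ring)][1]


# Three made-up nodes spread across the signed 64-bit token space.
ring = TokenRing({"node-a": -(2**62), "node-b": 0, "node-c": 2**62})
print(ring.owner("1"))
```

Because every node can compute the same mapping from key to token to owner, any node can route a request without consulting a master.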
ADDING DATA IN CASSANDRA

You can insert data into the columns of a row in a table using the INSERT command. The syntax for inserting data into a table is shown below.

INSERT INTO <tablename>
(<column1 name>, <column2 name>, ...)
VALUES (<value1>, <value2>, ...)
USING <option>;

Let us assume there is a table called std with columns (std_id, std_name, std_city, std_phone, std_fee), and you want to insert three student records into it.

Use the commands given below to fill the table with the required data.

cqlsh:project1> INSERT INTO std (std_id, std_name, std_city, std_phone, std_fee) VALUES (1, 'Ramesh', 'Hyderabad', 9191234567, 55000);

cqlsh:project1> INSERT INTO std (std_id, std_name, std_city, std_phone, std_fee) VALUES (2, 'Pavan', 'Visakhapatnam', 9191234567, 45000);

cqlsh:project1> INSERT INTO std (std_id, std_name, std_city, std_phone, std_fee) VALUES (3, 'Gayatri', 'Vizianagaram', 9191234567, 47000);
READING DATA IN CASSANDRA
The SELECT clause is used to read data from a table in Cassandra. Using this clause, you can read a whole table, a single column, or a particular cell.

The syntax of the SELECT clause is given below.

SELECT <column list> FROM <tablename>;

Assume there is a table named std in the keyspace. The following statement reads the whole table:

cqlsh:project1> SELECT * FROM std;

The following statement reads only the std_name and std_fee columns:

cqlsh:project1> SELECT std_name, std_fee FROM std;


DELETING DATA IN CASSANDRA
You can delete data from a table using the DELETE command. The syntax is given below.

DELETE FROM <identifier> WHERE <condition>;

The following statement deletes the std_fee column of the last row (std_id 3):

cqlsh:project1> DELETE std_fee FROM std WHERE std_id=3;

Deleting an entire row:
The following command deletes an entire row from the table.

cqlsh:project1> DELETE FROM std WHERE std_id=3;
USE CASES FOR CASSANDRA

1. Mobility
2. Security and Fraud Detection
3. Personalization and Recommendation
4. IoT
5. Cloud Operations
ADVANTAGES
1. Open source
2. Peer-to-peer architecture
3. Elastic scalability
4. High availability and fault tolerance
5. High performance
6. Column oriented
7. Tunable consistency
8. Schema-free
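
Tunable consistency (item 7) means that each read or write can choose how many replicas must respond. A small Python sketch of the usual arithmetic follows; the function names are hypothetical and the replication factor of 3 is an assumed example value. A read is guaranteed to overlap the most recent acknowledged write when the number of replicas read plus the number written exceeds the replication factor.

```python
def quorum(replication_factor):
    """Smallest number of replicas that forms a majority."""
    return replication_factor // 2 + 1


def read_overlaps_write(replication_factor, write_replicas, read_replicas):
    """True when every read set must share at least one replica with the write set."""
    return write_replicas + read_replicas > replication_factor


rf = 3  # assumed replication factor for the example
q = quorum(rf)
print(q)                               # 2
print(read_overlaps_write(rf, q, q))   # True: quorum writes + quorum reads
print(read_overlaps_write(rf, 1, 1))   # False: one replica each way may miss
```

Lower replica counts trade consistency for latency and availability; higher counts do the reverse.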
LIMITATIONS
1. A single column value may not be larger than 2 GB.
2. The maximum number of columns per row is 2 billion.
3. All data read must fit in memory, because the Thrift interface lacks streaming support.
4. The key must be smaller than 64 KB.
THANK YOU
