
NOSQL

Background……
⬥ Relational databases have long been the mainstay of business.

⬥ Web-based applications caused spikes in data volume and traffic:
⬦ explosion of social media sites (Facebook, Twitter) with large data needs
⬦ rise of cloud-based solutions such as Amazon S3 (Simple Storage Service)

⬥ Scaling an RDBMS to serve web-based applications becomes troublesome.

Issues with scaling up….
⬥ The best way to provide ACID guarantees and a rich query model is to keep the dataset on a single machine
⬥ Scaling up (a bigger machine) has limits: eventually the dataset is just too big!
⬥ Scaling out (adding more machines) is then the better choice
⬥ Different approaches for horizontal scaling:
⬦ Master/Slave
⬦ Sharding (partitioning)

Scaling out RDBMS……
⬥ Master/Slave
⬦ All writes go to the master
⬦ All reads are performed against the replicated slave databases
⬦ Critical reads may be incorrect, as writes may not yet have been propagated down to the slaves
⬦ Large datasets can pose problems, as the master needs to duplicate data to every slave
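
The stale-read problem above can be illustrated with a minimal sketch. The `Master` and `Replica` classes and the explicit `replicate()` call are hypothetical stand-ins for asynchronous replication; a real RDBMS ships its log over the network in the background.

```python
class Master:
    """Accepts all writes and keeps an ordered log for replicas to replay."""

    def __init__(self):
        self.data = {}
        self.log = []  # ordered list of (key, value) writes

    def write(self, key, value):
        self.data[key] = value
        self.log.append((key, value))


class Replica:
    """Serves reads; only catches up with the master when replicate() runs."""

    def __init__(self, master):
        self.master = master
        self.data = {}
        self.applied = 0  # number of log entries already applied

    def read(self, key):
        return self.data.get(key)

    def replicate(self):
        # Replay every log entry that has not been applied yet.
        for key, value in self.master.log[self.applied:]:
            self.data[key] = value
        self.applied = len(self.master.log)


master = Master()
replica = Replica(master)

master.write("balance", 100)
stale = replica.read("balance")  # replication has not run yet
replica.replicate()
fresh = replica.read("balance")  # the write has now been propagated
print(stale, fresh)
```

Until `replicate()` runs, the replica has no value for the key, which is the "critical reads may be incorrect" case.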

Scaling out RDBMS……
⬥ Sharding (Partitioning)
⬦ Scales well for both reads and writes
⬦ Not transparent, application needs to be partition-aware
⬦ Can no longer have relationships/joins across partitions
⬦ Loss of referential integrity across shards
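
A minimal sketch of hash-based sharding, assuming four in-memory shards and a hypothetical `shard_for` routing function. It shows why the application must be partition-aware: single-key access touches one shard, while any other query has to visit all of them.

```python
import hashlib

NUM_SHARDS = 4  # assumed shard count, for illustration only
shards = [dict() for _ in range(NUM_SHARDS)]


def shard_for(key: str) -> int:
    """Map a key to a shard index using a stable hash of the key."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_SHARDS


def put(key, row):
    shards[shard_for(key)][key] = row


def get(key):
    return shards[shard_for(key)].get(key)


def find_all(predicate):
    """Cross-shard query: the application has to scan every shard itself."""
    return [row for shard in shards for row in shard.values() if predicate(row)]


put("user:1", {"name": "Asha", "country": "IN"})
put("user:2", {"name": "Ben", "country": "UK"})
put("user:3", {"name": "Chitra", "country": "IN"})

print(get("user:2"))
print(find_all(lambda row: row["country"] == "IN"))
```

A join between rows on different shards would need the same scatter-and-gather logic in the application, which is why joins and referential integrity are lost across partitions.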

What is NoSQL?

⬥ It stands for Not Only SQL.

⬥ NoSQL refers to a class of non-relational data storage systems.
⬥ These systems usually do not require a fixed table schema, nor do they use the concept of joins.

Three major aspects of NoSQL
⬥ Three major papers were the “seeds” of the NoSQL movement:
⬦ BigTable (Google)
⬦ Dynamo (Amazon)
⬩ Ring partition and replication
⬩ Gossip protocol (discovery and error detection)
⬩ Distributed key-value data stores
⬦ CAP Theorem
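
The ring partitioning and replication listed for the Amazon paper can be sketched with consistent hashing. This is a simplified illustration, not the paper's implementation: the `Ring` class and node names are hypothetical, and refinements such as virtual nodes are omitted.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Place a node name or a key at a position on the ring."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    def __init__(self, nodes, replicas=2):
        self.replicas = replicas
        # Each node owns one point on the ring, sorted by hash position.
        self.points = sorted((_hash(node), node) for node in nodes)
        self.positions = [position for position, _ in self.points]

    def nodes_for(self, key):
        """Return the first `replicas` nodes found clockwise from the key."""
        start = bisect.bisect_right(self.positions, _hash(key)) % len(self.points)
        count = min(self.replicas, len(self.points))
        return [self.points[(start + i) % len(self.points)][1] for i in range(count)]


ring = Ring(["node-a", "node-b", "node-c"])
print(ring.nodes_for("user:42"))
```

The first node in the returned list acts as the primary owner and the following ones hold replicas. Removing a node only moves the keys that node owned, which is the main advantage over `hash(key) % N` partitioning.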

CAP Theorem…….
⬥ Consider three properties of a distributed system (sharing data):
⬦ Consistency (C):
⬩ all copies have the same value
⬦ Availability (A):
⬩ reads and writes always succeed
⬦ Partition-tolerance (P):
⬩ system properties (consistency and/or availability) hold even when network failures prevent some machines from communicating with others
⬥ The theorem states that a distributed system can guarantee at most two of these three properties at once.
CAP Theorem……
⬥ The CAP theorem sorts systems into three categories:
⬦ CP (Consistent and Partition Tolerant): systems that sacrifice availability only in the case of a network partition.

⬦ CA (Consistent and Available): systems that are consistent and available in the absence of any network partition. Single-node DB servers are often categorized as CA systems, because they do not need to deal with partition tolerance.

⬦ AP (Available and Partition Tolerant): systems that are available and partition tolerant but cannot guarantee consistency.
CAP Theorem…..
⬥ Consistency
⬥ Two types of consistency:

1. Strong consistency: ACID (Atomicity, Consistency, Isolation, Durability)

2. Weak consistency: BASE (Basically Available, Soft state, Eventual consistency)

ACID Paradigm
⬥ Atomic means that either all operations of a transaction are executed or none of them is.

⬥ Consistent means that the data must be in a consistent state at the end of each transaction.

⬥ Isolated means that modifications made by one transaction must be independent of those made by other transactions.

⬥ Durable means that once the user has been notified of success, the transaction cannot be undone.
BASE system
⬥ A BASE system gives up strong consistency in exchange for greater availability and partition tolerance.

⬥ BASE can be defined as:

1. Basically Available means the system stays available for reads and writes, even when some nodes fail.
2. Soft state indicates that the state of the system may change over time, even without input. This is a consequence of the CAP trade-off: replicas keep synchronizing in the background.
3. Eventual consistency indicates that the system will become consistent over time, provided it receives no new input during that time.
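
Eventual consistency can be sketched with two replicas that accept writes independently and later exchange state. The `LwwReplica` class and its last-write-wins rule on caller-supplied timestamps are assumptions chosen for brevity; real systems may use vector clocks or other conflict-resolution schemes.

```python
class LwwReplica:
    """A replica that resolves conflicts by keeping the newest timestamp."""

    def __init__(self, name):
        self.name = name
        self.store = {}  # key -> (timestamp, value)

    def write(self, key, value, timestamp):
        current = self.store.get(key)
        if current is None or timestamp > current[0]:
            self.store[key] = (timestamp, value)

    def read(self, key):
        entry = self.store.get(key)
        return entry[1] if entry else None

    def sync_from(self, other):
        """Background synchronization: pull every entry the other replica has."""
        for key, (timestamp, value) in other.store.items():
            self.write(key, value, timestamp)


a, b = LwwReplica("a"), LwwReplica("b")
a.write("cart", ["book"], timestamp=1)
b.write("cart", ["book", "pen"], timestamp=2)

before = (a.read("cart"), b.read("cart"))  # replicas disagree
a.sync_from(b)
b.sync_from(a)
after = (a.read("cart"), b.read("cart"))   # replicas have converged
print(before, after)
```

Between the writes and the sync, a client may read either value depending on which replica it contacts. Once no new writes arrive and the sync has run, both replicas return the same value.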
Types of NoSQL data stores…..
⬥ Key-value store

⬥ Column store

⬥ Document store

⬥ Graph store

Key Value…..
⬥ Clients read and write values using a key:

1. To fetch the value associated with a key, use Get(key)
2. To associate a value with a key, use Put(key, value)
3. To fetch the values associated with a list of keys, use MultiGet(key1, key2, …, keyN)
4. To remove the entry for a key from the data store, use Delete(key)

⬥ Key-value stores follow two rules:

1. All keys in a key-value store are unique.
2. No queries can be performed on the values; data is looked up only by key.
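
The four operations above map directly onto a hash table. A minimal in-memory sketch, where the `KeyValueStore` class and its method names are illustrative and not the API of any particular product:

```python
class KeyValueStore:
    """In-memory key-value store backed by a Python dict (a hash table)."""

    def __init__(self):
        self._table = {}

    def get(self, key):
        """Fetch the value for a key, or None if the key is absent."""
        return self._table.get(key)

    def put(self, key, value):
        """Associate a value with a key; keys are unique, so this overwrites."""
        self._table[key] = value

    def multi_get(self, *keys):
        """Fetch the values for several keys, in the order requested."""
        return [self._table.get(key) for key in keys]

    def delete(self, key):
        """Remove the entry for a key; deleting a missing key is a no-op."""
        self._table.pop(key, None)


store = KeyValueStore()
store.put("session:1", {"user": "asha"})
store.put("session:2", "opaque bytes or any other value")
store.delete("session:2")
print(store.multi_get("session:1", "session:2"))
```

The store never inspects the values, so there is no way to ask for "all sessions belonging to asha" without knowing the keys.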
Continued……
⬥ The simplest kind of NoSQL database.

⬥ The main idea is to use a hash table.

⬥ Data values are accessed using keys.

⬥ Data has no required format.

⬥ Basic operations: Insert, Fetch, Update, Delete

Continued……
Pros:
⬥ Very fast
⬥ Very scalable (horizontally distributed to nodes based on key)
⬥ Simple data model
⬥ Eventual consistency
⬥ Fault tolerance

Cons:
⬥ Cannot model more complex data structures such as objects
Column Value....

Document based data model…..
⬥ Pairs each key with a complex data structure known as a document.

⬥ Indexes are implemented with B-trees.

⬥ Documents can contain many different key-value pairs, key-array pairs, or nested documents.

⬥ The database is organized as a collection of documents.
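
A minimal sketch of the document model using plain Python dictionaries. The sample documents, the `get_path` helper and the dotted-path convention are all hypothetical. The lookup here is a linear scan, where a real document store would use an index.

```python
# A collection is a list of documents; documents need not share a schema.
collection = [
    {"_id": 1, "name": "Asha", "tags": ["admin", "dev"], "address": {"city": "Pune"}},
    {"_id": 2, "name": "Ben", "tags": ["dev"], "address": {"city": "Leeds"}, "age": 41},
]


def get_path(document, path):
    """Follow a dotted path such as 'address.city' into nested documents."""
    current = document
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def find(documents, path, value):
    """Return every document whose field at `path` equals `value`."""
    return [doc for doc in documents if get_path(doc, path) == value]


print(find(collection, "address.city", "Pune"))
print(get_path(collection[0], "age"))  # this document has no 'age' field
```

Unlike a key-value store, the store can look inside the values, so queries on fields such as `address.city` become possible.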

Graph data model
⬥ Based on graph theory.

⬥ Scales vertically; no clustering.

⬥ Transactions are easy to understand.

⬥ Supports ACID transactions.
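
A minimal sketch of the kind of query a graph model makes natural: finding everything within a few hops of a node. The adjacency-list representation, the `within_hops` function and the sample names are assumptions for illustration and do not reflect how any particular graph database stores its data.

```python
from collections import deque

# Directed edges stored as an adjacency list: node -> list of neighbors.
edges = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["dave"],
    "dave": [],
}


def within_hops(graph, start, max_hops):
    """Breadth-first search returning nodes reachable in at most max_hops edges."""
    seen = {start}
    queue = deque([(start, 0)])
    reached = []
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                reached.append(neighbor)
                queue.append((neighbor, hops + 1))
    return reached


print(within_hops(edges, "alice", 1))
print(within_hops(edges, "alice", 2))
```

In a relational schema the same question needs one self-join per hop, whereas here it is a single traversal over the stored relationships.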

SQL Vs NOSQL
