
NoSQL Databases

What is NoSQL?

- Non SQL/ Not Only SQL


- Non-relational database, i.e., no fixed relational tables
- Stores data differently
- Types: key-value, column-based, document-oriented and graph databases
- Useful to store unstructured data
- Emerged in the 2000s as the cost of storage dramatically decreased
- Examples: MongoDB, CouchDB, Redis, DynamoDB
Features

➔ Flexible Schema:
– No specific schema
– Easy to make changes
➔ Non-relational database:
– No relations and tables
– So the structure of the data is not fixed
➔ Faster queries:
– Related data can be stored together instead of being split across tables
– So many lookups need no joins and can be faster
➔ Horizontal scaling:
– Also called scale out
– Bring additional nodes to share load
– Better distribution of data and load
➔ Distributed:
– Works in a distributed manner
– Allows auto scaling
➔ Simple API
Types of NoSQL Databases
KEY-VALUE:

- Data is stored in key value pairs


- KEY → VALUE
- Every item has a key which is a unique identifier
- Key maps to a specific value
- Similar to associative arrays or dictionaries
- All operations (search, retrieve, delete, update and replace) are
performed using the key
- Lookups by key are simple
- Examples: Redis, Dynamo, Riak
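
The key-based operations listed above can be sketched with a plain Python dictionary. This is a minimal in-memory illustration, not how Redis or any real store is implemented; the class name `KeyValueStore` and the key `user:42` are made up for the example.

```python
class KeyValueStore:
    """Toy in-memory key-value store (illustrative only)."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        # Insert a new pair, or replace the value under an existing key.
        self._data[key] = value

    def get(self, key, default=None):
        # Retrieve the value for a key, or a default if the key is absent.
        return self._data.get(key, default)

    def delete(self, key):
        # Remove a key; report whether it was present.
        if key in self._data:
            del self._data[key]
            return True
        return False


store = KeyValueStore()
store.put("user:42", {"name": "Asha"})          # create
store.put("user:42", {"name": "Asha Kulkarni"})  # replace via the same key
current = store.get("user:42")
```

Every operation goes through the key, which is why lookups stay simple but queries on the value itself are not directly supported in this model.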
COLUMN BASED/COLUMN ORIENTED:

- Data is stored in columns


- Columns grouped as column families
- Concept of keyspace: contains all column
families
- Each column family can then be broken down into
multiple rows
- Each row has multiple columns
- We can add columns as required
- Each column has a name, a value and a timestamp
- Perform analytics efficiently
- Examples: HBase, Cassandra, Hypertable
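
The keyspace, column family, row and column hierarchy described above can be pictured with nested dictionaries. This is a rough sketch only: the `users` column family, the row keys and the rule that the newest timestamp wins are assumptions made for illustration, not the behaviour of a specific product.

```python
import time

# keyspace -> column family -> row key -> column name -> (value, timestamp)
keyspace = {"users": {}}


def put_column(family, row_key, name, value, timestamp=None):
    """Write one column; keep the write with the newest timestamp (assumed rule)."""
    ts = time.time() if timestamp is None else timestamp
    row = keyspace[family].setdefault(row_key, {})
    current = row.get(name)
    if current is None or ts >= current[1]:
        row[name] = (value, ts)


# Rows in the same column family may have different columns.
put_column("users", "u1", "name", "Asha", timestamp=1)
put_column("users", "u1", "email", "asha@example.com", timestamp=2)
put_column("users", "u2", "name", "Ravi", timestamp=3)

# A newer write replaces the column; an older (stale) write is ignored.
put_column("users", "u1", "name", "Asha K.", timestamp=5)
put_column("users", "u1", "name", "stale value", timestamp=4)
```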
GRAPH BASED:

- Focuses on relationships between data elements
- Nodes: Entities or instances of data. They have
properties. Eg: Person
- Edges: Relationships between two entities
- Better understanding of how data is related
- Closer view of the real world
- Flexible
- Examples: Neo4j, InfiniteGraph, OrientDB,
FlockDB
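
Nodes and edges can be sketched with basic Python structures. The node ids, the `KNOWS` and `WORKS_AT` relationship names and the helper functions below are hypothetical; a real graph database would expose this through its own query language.

```python
# Nodes: entities with properties.
nodes = {
    "alice": {"label": "Person", "age": 30},
    "bob": {"label": "Person", "age": 25},
    "acme": {"label": "Company"},
}

# Edges: (source, relationship, destination).
edges = [
    ("alice", "KNOWS", "bob"),
    ("alice", "WORKS_AT", "acme"),
    ("bob", "WORKS_AT", "acme"),
]


def neighbors(node_id, relation):
    """Follow outgoing edges of one relationship type."""
    return [dst for src, rel, dst in edges if src == node_id and rel == relation]


def colleagues(node_id):
    """People who share at least one employer with the given person."""
    employers = set(neighbors(node_id, "WORKS_AT"))
    return sorted(
        src
        for src, rel, dst in edges
        if rel == "WORKS_AT" and dst in employers and src != node_id
    )
```

For example, `colleagues("alice")` walks two edges (person to company, company back to person), the kind of relationship question this model is designed around.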
Document based databases
- Stores data in the form of flexible documents
- A document is a record in the database
- Stores data in a format like JSON or BSON
- Each document is made of field: value pairs
- Advantages
1. Flexible and schema less
2. Manage unstructured data
3. Horizontal scalability
4. Distributed
- Examples: MongoDB, CouchDB, Amazon SimpleDB
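
As a small illustration of field: value documents, the sketch below builds two differently shaped documents and round-trips one through JSON using Python's standard `json` module. The field names (`status`, `items`, `rating`) are invented for the example.

```python
import json

# Two documents in the same collection need not share the same fields.
order = {
    "_id": "order-1001",
    "status": "shipped",
    "items": [{"sku": "A1", "qty": 2}, {"sku": "B7", "qty": 1}],
}
review = {"_id": "review-7", "rating": 5, "tags": ["fast", "reliable"]}
collection = [order, review]

# Documents serialize naturally to JSON text and back.
encoded = json.dumps(order)
decoded = json.loads(encoded)

# Nested fields are read directly from the document, without a join.
total_qty = sum(item["qty"] for item in decoded["items"])
```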
CAP Theorem
The CAP theorem states that it is not possible to guarantee all three of the desirable properties –
consistency, availability, and partition tolerance at the same time in a distributed system with
data replication.

Consistency:
The data should remain consistent even after the execution of an operation. This means that once data is written, any
future read request should return that data. For example, after updating the order status, all the clients should be
able to see the same data.

Availability:
The database should always be available and responsive: every request should receive a response. It should not have any downtime.

Partition Tolerance:
Partition Tolerance means that the system should continue to function even if the communication among the
servers is not stable. For example, the servers can be partitioned into multiple groups which may not communicate
with each other. Here, the system should keep serving requests even though some servers cannot reach the others.
Sharding
- Shard means “a small part of a whole”
- Sharding involves breaking up one’s data into two or
more smaller chunks, called logical shards.
- The logical shards are then distributed across separate
database nodes, referred to as physical shards, which
can hold multiple logical shards.
- Shard key determines how to partition the dataset.
- Benefits
- Smaller, faster shards
- Easier to manage
- Reduces the cost of each transaction
- Failure of one shard doesn’t affect the others
- Scale efficiently
- Improve response time
Sharding architectures
KEY BASED SHARDING:

- Also called hash based sharding


- Involves taking a value from newly
written data and passing it to a hash
function to determine which shard the
data should go to
- The values entered into the hash
function should all come from the same
column. This column is known as the
shard key
- Tends to distribute data evenly
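
Hash-based sharding can be sketched in a few lines. This assumes four shards and uses SHA-256 from the standard library as the hash function; both choices, along with the `customer-N` shard key values, are illustrative.

```python
import hashlib

NUM_SHARDS = 4  # assumed shard count for the example


def shard_for(shard_key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a shard key value to a shard number using a stable hash."""
    digest = hashlib.sha256(shard_key.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer, then reduce modulo shard count.
    return int.from_bytes(digest[:8], "big") % num_shards


# Route 1000 hypothetical customer ids to shards.
shards = {i: [] for i in range(NUM_SHARDS)}
for customer_id in (f"customer-{n}" for n in range(1000)):
    shards[shard_for(customer_id)].append(customer_id)

sizes = {shard: len(keys) for shard, keys in shards.items()}
```

A stable hash is used rather than Python's built-in `hash()`, which is randomized per process for strings. Note that with simple modulo placement, changing `NUM_SHARDS` moves most keys to a different shard.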

RANGE BASED SHARDING:

- Sharding data based on ranges of a
given value
- Relatively simple to implement
- Data might be unevenly distributed
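
A range-based router can be sketched with the standard `bisect` module. The price boundaries and shard names below are assumptions chosen for the example.

```python
import bisect

# Exclusive upper bounds of the first three ranges; the last shard takes the rest.
BOUNDARIES = [100, 500, 1000]
SHARD_NAMES = ["shard_0_99", "shard_100_499", "shard_500_999", "shard_1000_plus"]


def shard_for_price(price):
    """Pick the shard whose range contains the given price."""
    return SHARD_NAMES[bisect.bisect_right(BOUNDARIES, price)]
```

If most products cost less than 100, almost all of the data lands in the first shard, which is the uneven distribution mentioned above.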

DIRECTORY BASED SHARDING:

- Lookup table that uses a shard key to
keep track of which shard holds which
data
- Flexible: allows us to use whatever
system or algorithm to assign data
entries to shards
- More flexible than key and range based
sharding
- Requires a connection to the lookup table
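
A directory-based scheme boils down to a lookup table that is consulted before touching a shard. In this sketch the table is an in-memory dictionary and the delivery zones and shard names are hypothetical; in practice the table would live in its own store.

```python
# Lookup table: shard key value -> shard that holds its data.
directory = {
    "zone-north": "shard_a",
    "zone-south": "shard_b",
    "zone-east": "shard_b",
}


def shard_for_zone(zone):
    """Consult the lookup table; unknown zones raise KeyError."""
    return directory[zone]


def reassign(zone, shard):
    """Move a zone to another shard by editing only the table entry."""
    directory[zone] = shard


before = shard_for_zone("zone-east")
reassign("zone-east", "shard_c")
after = shard_for_zone("zone-east")
```

Any assignment rule can be used because the table, not a formula, decides placement. The cost is that every request depends on the table being reachable and correct.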

GEO BASED SHARDING:

- Splits and stores database information
according to geographical location
- Retrieves information faster when data is
stored close to the users requesting it
- Might result in uneven distribution of
data
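
The sketch below routes users to a regional shard by country and counts how many land on each, to show how uneven the result can be. The country-to-region mapping and the sample users are invented for illustration.

```python
from collections import Counter

# Assumed mapping from country code to the regional shard that stores it.
REGION_OF_COUNTRY = {
    "DE": "eu",
    "FR": "eu",
    "IN": "asia",
    "JP": "asia",
    "US": "americas",
}


def shard_for_user(user):
    """Choose the shard from the user's country."""
    return REGION_OF_COUNTRY[user["country"]]


users = [
    {"id": 1, "country": "IN"},
    {"id": 2, "country": "IN"},
    {"id": 3, "country": "JP"},
    {"id": 4, "country": "DE"},
    {"id": 5, "country": "US"},
]

# Count users per regional shard.
load = Counter(shard_for_user(user) for user in users)
```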
MongoDB
- Document based NoSQL database
- Provides high performance, high availability
and easy scalability
- Database: Physical container for collections
- Collection:
- Group of MongoDB documents
- Equivalent to an RDBMS table
- Documents in a collection are related to each other
- Document
- Set of key-value pairs
- Dynamic schema
Document
MongoDB stores data as BSON records. BSON is a binary representation of JSON documents,
though it contains more data types than JSON.
Data model design

Embedded Data Model

In this model, you can have (embed) all
the related data in a single document. It
is also known as the de-normalized data
model.
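
An embedded document might look like the following, written here as a plain Python dictionary rather than through a database driver. The field names (`contact`, `addresses`) are made up for the example.

```python
# One document holds the user and all related data.
user_embedded = {
    "_id": "u1",
    "name": "Asha",
    "contact": {"email": "asha@example.com", "phone": "555-0100"},
    "addresses": [
        {"type": "home", "city": "Pune"},
        {"type": "work", "city": "Mumbai"},
    ],
}

# A single read of the document gives access to everything related to the user.
cities = [address["city"] for address in user_embedded["addresses"]]
email = user_embedded["contact"]["email"]
```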

Normalized Data Model

In this model, you can refer to the sub
documents from the original document,
using references.
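
The same kind of data in normalized form keeps the sub documents in their own collection and stores only their ids in the original document. The two dictionaries stand in for collections, and `resolve_addresses` is a hypothetical helper that follows the references by hand.

```python
# "users" collection: the original document stores references (ids) only.
users = {
    "u1": {"_id": "u1", "name": "Asha", "address_ids": ["a1", "a2"]},
}

# "addresses" collection: the sub documents live separately.
addresses = {
    "a1": {"_id": "a1", "type": "home", "city": "Pune"},
    "a2": {"_id": "a2", "type": "work", "city": "Mumbai"},
}


def resolve_addresses(user_id):
    """Follow the stored references to fetch the referenced documents."""
    user = users[user_id]
    return [addresses[address_id] for address_id in user["address_ids"]]


resolved_cities = [address["city"] for address in resolve_addresses("u1")]
```

Reading the full picture takes an extra lookup per reference, but each address is stored once and can be updated in one place.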
