
NoSQL Databases

Why Do We Require NoSQL?

 Multi-structured and heterogeneous
data from front-end applications
 High-volume data
 Need for scalability
 Reliable and available database

Taxonomy of NoSQL
• Key-value

• Graph database

• Document-oriented

• Column family
NoSQL = “Not Only SQL”
(Non-Relational)
New ways of querying and architecting
your dynamic data store…
…beyond RDBMS rows & columns…
…for a different breed of scale problems…
…and more specialized and simpler
development techniques.
NoSQL Continued..
 It’s more than rows in tables
 It’s free of joins
 It’s schema-free
 It works on many processors
 It uses shared-nothing commodity
computers
 It supports linear scalability
 It’s innovative
Key drivers for NoSQL

What Do the Giants Use?
 Google – Bigtable
 Proprietary, available through Google Cloud
Platform
 Amazon – DynamoDB, a proprietary
NoSQL database
 Offered as part of AWS
 It is a fully managed cloud database and
supports both document and key-value
store models.
Case Study: Google Bigtable
 Google needed to store results from the web crawlers
that extract HTML pages, images, sounds, videos, and
other media from the internet.
 The resulting dataset was so large that it couldn’t fit
into a single relational database, so Google built its
own storage system.
 The fundamental goal was to build a system that
would scale easily as the data grew, without
forcing Google to purchase expensive hardware.
 The solution was neither a full relational database nor
a filesystem, but what Google called a “distributed
storage system” that worked with structured data.
Google Bigtable
 It gave Google developers a single
tabular view of the data by creating one
large table that stored all the data they
needed.
 In addition, Google created a system that
allowed the hardware to be located in
any data center, anywhere in the world.
 This created an environment where
developers didn’t need to worry about
the physical location of the data they
manipulated.
Amazon’s Motivation
 Traditional brick-and-mortar retailers
operate in a few locations, and only during
business hours.
 When not open for business, they run daily
reports, and perform backups and software
upgrades.
 The Amazon model: (1) customers from all
corners of the world (2) shop at all hours of the
day, every day.
 Any downtime in the purchasing cycle could
result in the loss of millions of dollars. Amazon’s
systems need to be iron-clad reliable and
scalable without a loss in service.
Amazon’s Dynamo: accept an order
24 hours a day, 7 days a week
 Amazon needed to create a highly reliable web storefront
 that supported transactions from around
the world
 24 hours a day, 7 days a week, without
interruption
 Traditional RDBMSs were not
able to support this business need
The Databases So Far
 Flat files: no structure, no standard
 RDBMS: relational tables
 OLAP / DWH: cubes
 NoSQL: collections
NoSQL
 A database management system focused on
 Scalability
 Performance
 High availability
NoSQL Continued..
 No joins
 No complex transactions
 Complexity has to be handled by the
application
 Less functionality but more
performance (compared with an RDBMS)
Sharding of data
 Distributes a single logical database system
across a cluster of machines
 Uses range-based partitioning to distribute
documents based on a specific shard key
 Automatically balances the data associated
with each shard
 Can be turned on and off per collection
(table)
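
Range-based partitioning on a shard key can be sketched in a few lines. This is a toy illustration in plain JavaScript, not MongoDB itself: the shard names, the split points (1000 and 2000) and the numeric shard key `userId` are all made up for the example, and a real cluster picks and rebalances its ranges automatically.

```javascript
// Toy illustration of range-based partitioning on a shard key.
// Shard names and split points are hypothetical.
const shardRanges = [
  { shard: "shardA", min: -Infinity, max: 1000 }, // keys below 1000
  { shard: "shardB", min: 1000, max: 2000 },      // 1000 <= key < 2000
  { shard: "shardC", min: 2000, max: Infinity },  // keys from 2000 up
];

// Return the name of the shard whose [min, max) range contains the key.
function shardFor(key) {
  const range = shardRanges.find((r) => key >= r.min && key < r.max);
  return range.shard;
}

// Group documents by the shard that owns their shard key (userId here).
function distribute(docs) {
  const placement = {};
  for (const doc of docs) {
    const shard = shardFor(doc.userId);
    (placement[shard] = placement[shard] || []).push(doc);
  }
  return placement;
}

const placement = distribute([
  { userId: 42, name: "will" },
  { userId: 1500, name: "jeff" },
  { userId: 2500, name: "matt" },
  { userId: 999, name: "ben" },
]);
console.log(placement);
```

Because neighbouring key values land on the same shard, range queries on the shard key touch few machines; the price is that a poorly chosen key can overload one range.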
MongoDB

MongoDB
 Document-oriented NoSQL database
 Open source
 Developed and supported by 10gen (now MongoDB, Inc.),
founded in 2007
 General Public Licence (free)
 Commercial licence
 Scalable, open-source, high-performance,
document-oriented database (10gen)
MongoDB Continued..
 Schema-less database
 Written in C++
 Supports APIs (drivers) in many
computer languages
 JavaScript, Python, Ruby, Perl, Java,
Scala, C#, C++, Haskell, Erlang
MongoDB
 Table – Collection
 Row – Document
 Documents in a collection may have different fields
 Every row in a relational table must have the same fields
 Compare with Flipkart page visits
per user
Schema Free
• MongoDB does not need any pre-defined data schema
• Every document in a collection could have different data

{name: "will",
 eyes: "blue",
 birthplace: "NY",
 aliases: ["bill", "la ciacco"],
 loc: [32.7, 63.4],
 boss: "ben"}

{name: "jeff",
 eyes: "blue",
 loc: [40.7, 73.4],
 boss: "ben"}

{name: "brendan",
 aliases: ["eldiablo"]}

{name: "matt",
 pizza: "DiGiorno",
 height: 72,
 loc: [44.6, 71.3]}

{name: "ben",
 hat: "yes"}
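
To make "every document could have different data" concrete, here is a small sketch in plain JavaScript. A plain array stands in for the collection (an assumption for illustration; no MongoDB server is involved), and the helper names `namesWithField` and `allFields` are hypothetical.

```javascript
// A plain array stands in for a MongoDB collection in this sketch.
const people = [
  { name: "will", eyes: "blue", birthplace: "NY",
    aliases: ["bill", "la ciacco"], loc: [32.7, 63.4], boss: "ben" },
  { name: "jeff", eyes: "blue", loc: [40.7, 73.4], boss: "ben" },
  { name: "brendan", aliases: ["eldiablo"] },
  { name: "matt", pizza: "DiGiorno", height: 72, loc: [44.6, 71.3] },
  { name: "ben", hat: "yes" },
];

// Names of the documents that carry a given field at all.
function namesWithField(docs, field) {
  return docs.filter((d) => field in d).map((d) => d.name);
}

// Union of every field name used anywhere in the collection, sorted.
function allFields(docs) {
  return [...new Set(docs.flatMap((d) => Object.keys(d)))].sort();
}

console.log(namesWithField(people, "boss"));
console.log(allFields(people));
```

No schema had to be declared up front: documents that lack a field are simply skipped when that field is queried.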
Flipkart Collection
[
  { id: 1,
    page1: "page1",
    page2: "page2",
    page3: "page3"
  },
  { id: 2,
    page1: "page1",
    page2: "page2"
  }
]
Flipkart Collection (another way)
[
  { id: 1,
    page: ["page1", "page2", "page3"]
  },
  { id: 2,
    page: ["page1", "page2"]
  }
]
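
The two layouts hold the same information, and one can be derived from the other. A minimal sketch in plain JavaScript, assuming the per-page fields are named `page1`, `page2`, and so on, and that the array field is called `page`; the helper `toArrayLayout` is hypothetical.

```javascript
// First layout: one numbered field per visited page.
const visitsAsFields = [
  { id: 1, page1: "page1", page2: "page2", page3: "page3" },
  { id: 2, page1: "page1", page2: "page2" },
];

// Convert one document to the second layout: a single array field `page`.
function toArrayLayout(doc) {
  const page = Object.keys(doc)
    .filter((k) => /^page\d+$/.test(k))                      // keep page1, page2, ...
    .sort((a, b) => Number(a.slice(4)) - Number(b.slice(4))) // order by page number
    .map((k) => doc[k]);
  return { id: doc.id, page };
}

const visitsAsArrays = visitsAsFields.map(toArrayLayout);
console.log(JSON.stringify(visitsAsArrays));
```

The array layout is usually easier to work with, since a user's visit count is just the array length and new visits do not require new field names.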
CRUD operations
 Create
 Read
 Update
 Delete
 Done on the collections
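
The four operations can be sketched against a toy in-memory collection. This is plain JavaScript, not the MongoDB driver: the class `ToyCollection` is hypothetical, and its method names only mirror the shell's `insertOne`, `find`, `updateOne` and `deleteOne`. One simplification to note: the real `updateOne` takes update operators such as `$set`, whereas this sketch merges plain fields.

```javascript
// Toy in-memory stand-in for a collection; NOT the MongoDB driver.
class ToyCollection {
  constructor() {
    this.docs = [];
    this.nextId = 1;
  }

  // True when every key in `filter` equals the same key in `doc`.
  static matches(doc, filter) {
    return Object.entries(filter).every(([k, v]) => doc[k] === v);
  }

  // Create: store a copy of the document with a generated _id.
  insertOne(doc) {
    const stored = { _id: this.nextId++, ...doc };
    this.docs.push(stored);
    return stored._id;
  }

  // Read: return all documents that match the filter.
  find(filter = {}) {
    return this.docs.filter((d) => ToyCollection.matches(d, filter));
  }

  // Update: merge `changes` into the first matching document.
  updateOne(filter, changes) {
    const doc = this.docs.find((d) => ToyCollection.matches(d, filter));
    if (doc) Object.assign(doc, changes);
    return doc ? 1 : 0;
  }

  // Delete: remove the first matching document.
  deleteOne(filter) {
    const i = this.docs.findIndex((d) => ToyCollection.matches(d, filter));
    if (i === -1) return 0;
    this.docs.splice(i, 1);
    return 1;
  }
}

const users = new ToyCollection();
users.insertOne({ name: "will", eyes: "blue" }); // Create
users.insertOne({ name: "ben", hat: "yes" });
users.updateOne({ name: "ben" }, { hat: "no" }); // Update
users.deleteOne({ name: "will" });               // Delete
console.log(users.find());                       // Read
```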

MongoDB: Use Cases
(Project / Company Specific)
Aadhaar
 Aadhaar is an excellent example of a real-world use
case of MongoDB.
 Aadhaar is the world’s biggest biometrics
database. It contains biometric data of over 1.2
billion residents.
 Aadhaar uses MongoDB as one of its
databases to store this huge amount of data;
it was originally procured for running the database
search.
 MySQL is used for storing demographic data
and MongoDB is used to store images.
Shutterfly
 Internet-based photo sharing and
personal publishing company
 Manages a store of more than 6
billion images with a transaction rate
of up to 10,000 operations per
second.
 One of the companies that
transitioned from Oracle to MongoDB.

MetLife
 MetLife is a leading global provider of
insurance, annuities and employee
benefit programs.
 They serve about 90 million customers
and hold leading market positions in the
United States, Japan, Latin America,
Asia, Europe and the Middle East.

MetLife Continued..
 MetLife uses MongoDB for “The Wall”,
an innovative customer service application
that provides a consolidated view of MetLife
customers, including policy details and
transactions.
 The Wall is designed to look and
function like Facebook and has improved
customer satisfaction and call centre
productivity.
eBay
 eBay has a number of projects
running on MongoDB for search
suggestions, metadata storage, cloud
management and merchandizing
categorization.

MongoDB: Use Cases
(Application Specific)

Source: MongoDB
High Volume Data Feeds

Machine Generated Data
• More machine forms, sensors & data
• Variably structured

Securities Data
• High frequency trading
• Daily closing price

Social Media / General Public
• Multiple data sources
• Each changes their format constantly
• Student scores, ISP logs
High Volume Data Feeds
[Diagram: four “Data Sources” boxes]
• Flexible document model can adapt to changes in sensor format
• Asynchronous writes
• Write to memory with periodic disk flush
• Scale writes over multiple shards
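
"Write to memory with periodic disk flush" can be sketched as a small buffer. This is an illustrative assumption about the general technique, not MongoDB's actual storage engine: the class `BufferedWriter` is hypothetical, `disk` is just an array, and `flush()` is called by hand where a real system would use a timer.

```javascript
// Sketch of "write to memory, flush to disk periodically".
// `disk` is only an array here; a real system would append to a file or journal.
class BufferedWriter {
  constructor() {
    this.buffer = []; // writes acknowledged but not yet durable
    this.disk = [];   // stand-in for durable storage
  }

  // Acknowledge as soon as the write lands in memory.
  write(doc) {
    this.buffer.push(doc);
    return { acknowledged: true };
  }

  // Move everything buffered so far to "disk" in one batch.
  flush() {
    const flushed = this.buffer.length;
    this.disk.push(...this.buffer);
    this.buffer = [];
    return flushed;
  }
}

const writer = new BufferedWriter();
writer.write({ sensor: "s1", temp: 21.5 });
writer.write({ sensor: "s2", humidity: 40 }); // a differently shaped reading is fine
const durableBeforeFlush = writer.disk.length; // nothing is durable yet
const flushedCount = writer.flush();           // a timer would normally trigger this
console.log(durableBeforeFlush, flushedCount, writer.disk.length);
```

The trade-off is the usual one: callers get fast acknowledgements, but anything still in the buffer is lost if the process dies before the next flush.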
Operational Intelligence

Ad Targeting
• Large volume of users
• Very strict latency requirements
• Sentiment analysis

Real-time dashboards
• Expose data to millions of customers
• Reports on large volumes of data
• Reports that update in real time

Social Media Monitoring
• Join the conversation
• Catered games
• Customized surveys
Operational Intelligence
[Diagram labels: API, Dashboards]
• Parallelize queries across replicas and shards
• Low latency reads
• In-database aggregation
• Flexible schema adapts to changing input data
• Can use same cluster to collect, store and report on data
Behavioural Profiles
[Diagram: 1 See Ad, 2 See Ad, 3 Click, 4 Convert]
• Rich profiles collecting multiple complex actions
• Scale out to support high throughput of activities tracked
• Dynamic schemas make it easy to …

{ cookie_id: "1234512413243",
  advertiser: {
    apple: {
      actions: [
        { impression: 'ad1', time: 123 },
        { impression: 'ad2', time: 232 },
        { click: 'ad2', time: 235 },
        { add_to_cart: 'laptop',
          sku: 'asdf23f',
          time: 254 },
        { purchase: 'laptop', time: 354 }
      ] …
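
A profile like this grows one action at a time as the user moves from seeing an ad to converting. The sketch below shows that in plain JavaScript; the helper `track` is hypothetical, and an in-memory object stands in for the stored document (no MongoDB server is involved).

```javascript
// In-memory stand-in for one stored profile document.
const profile = {
  cookie_id: "1234512413243",
  advertiser: { apple: { actions: [] } },
};

// Append one tracked action to an advertiser's list, creating the list if needed.
function track(target, advertiser, action) {
  const entry = (target.advertiser[advertiser] =
    target.advertiser[advertiser] || { actions: [] });
  entry.actions.push(action);
}

track(profile, "apple", { impression: "ad1", time: 123 });
track(profile, "apple", { impression: "ad2", time: 232 });
track(profile, "apple", { click: "ad2", time: 235 });
track(profile, "apple", { add_to_cart: "laptop", sku: "asdf23f", time: 254 });
track(profile, "apple", { purchase: "laptop", time: 354 });

// The kind of each action, in order: the first key that is not bookkeeping.
const funnel = profile.advertiser.apple.actions.map((a) =>
  Object.keys(a).find((k) => k !== "time" && k !== "sku")
);
console.log(funnel);
```

Each action carries only the fields it needs (the cart event has a `sku`, the others do not), which is the point of the dynamic schema.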
Metadata

Product Catalogue
• Diverse product portfolio
• Complex querying and filtering
• Multi-faceted product attributes

Data analysis
• Data mining
• Call records
• Insurance claims

Biometric
• Retina scans
• Fingerprints
Metadata
• Indexing and rich query API for easy searching and sorting
• Indexing techniques that fit your data modeling
• Flexible data model for similar but different objects

db.archives.find({ "country": "Egypt" });

db.archives.find({ "type": "Artifact" });

{ type: "Artifact",
  medium: "Ceramic",
  country: "Egypt",
  year: "3000 BC"
}

{ ISBN: "00e8da9b",
  type: "Book",
  country: "Egypt",
  title: "Ancient Egypt"
}
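
Why an index makes such lookups cheap can be shown with a toy single-field index. This is a simplified sketch in plain JavaScript: the helper `buildIndex` is hypothetical and uses a `Map`, whereas real MongoDB indexes are tree-based and maintained by the server.

```javascript
// The two differently shaped documents from the slide.
const archives = [
  { type: "Artifact", medium: "Ceramic", country: "Egypt", year: "3000 BC" },
  { ISBN: "00e8da9b", type: "Book", country: "Egypt", title: "Ancient Egypt" },
];

// Build a lookup table from field value to documents: a toy single-field index.
function buildIndex(docs, field) {
  const index = new Map();
  for (const doc of docs) {
    if (!(field in doc)) continue; // documents without the field are not indexed
    const key = doc[field];
    if (!index.has(key)) index.set(key, []);
    index.get(key).push(doc);
  }
  return index;
}

const byCountry = buildIndex(archives, "country");
const byType = buildIndex(archives, "type");

// Lookups no longer scan the collection; they go straight to the matching bucket.
const fromEgypt = byCountry.get("Egypt");
const artifactsOnly = byType.get("Artifact");
console.log(fromEgypt.length, artifactsOnly.length);
```

The same index serves both the artifact and the book even though they share only some fields, which is what lets one collection hold similar but different objects.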
Content Management

News Site
• Comments and user-generated content
• Personalization of content and layout

Multi-device rendering
• Generate layout on the fly
• No need to cache static pages

Sharing
• Store large objects
• Simpler modeling of metadata
Content Management
• Geospatial indexing for location-based searches
• Flexible data model for similar but different objects
• GridFS for large object storage
• Horizontal scalability for large data sets

{ camera: "Nikon d4",
  location: [ -122.418333, 37.775 ]
}

{ camera: "Canon 5d mkII",
  people: [ "Jim", "Carol" ],
  taken_on: ISODate("2012-03-07T18:32:35.002Z")
}

{ origin: "facebook.com/photos/xwdf23fsdf",
  license: "Creative Commons CC0",
  size: {
    dimensions: [ 124, 52 ],
    units: "pixels"
  }
}
Application: why MongoDB might be a good fit

Large number of objects to store
• Sharding lets you split objects across multiple servers

High write / read throughput and data distribution
• Sharding + replication lets you scale read and write traffic across multiple servers, multiple tenants, or data centers

Low latency access
• Memory-mapped storage engine caches documents in RAM, enabling in-memory operations. Data locality of documents significantly improves latency over join-based approaches

Variable data in objects
• Dynamic schema and JSON data model enable flexible data storage without sparse tables or complex joins, and provide for an intuitive query language

Cloud-based deployment
• Sharding and replication let you work around hardware limitations in the cloud.
