
Advanced Database Systems

NoSQL
Instructor: Eric Umuhoza, PhD
eumuhoza@andrew.cmu.edu
@EricUmuhoza
Signing of the agreement between Politecnico di Milano and Veneranda Fabbrica del Duomo di Milano
Aula Magna, Rettorato
Wednesday, 27 May 2015
NoSQL History

1980  • Relational Databases
1990
2000
2010
Relational Databases - Advantages

• Relational databases have been successful because they provide:
o Persistence
o Concurrency control
  • Different users access the same information at the same time
  • Transactions are used to ensure consistent interaction
  • ACID properties
o Integration mechanism
  • Many applications need to share information
  • By getting all applications to use the database, we ensure that all of them see consistent, up-to-date data
o SQL, a quasi standard
o Reporting
Relational Databases - Disadvantages

Impedance mismatch between the relational model and the in-memory data structures
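To make the mismatch concrete, here is a minimal sketch in plain JavaScript. The order object, its field names, and the `toRows` helper are hypothetical and chosen only for illustration: a single nested in-memory object has to be split into rows for several flat tables before a relational database can store it.

```javascript
// Hypothetical in-memory aggregate: one nested object.
const order = {
  id: 99,
  customer: "Ann",
  lines: [
    { product: "book", qty: 2, price: 10 },
    { product: "pen", qty: 5, price: 1 },
  ],
};

// Flatten the object into rows for two assumed tables: orders and order_lines.
function toRows(o) {
  const orders = [{ id: o.id, customer: o.customer }];
  const orderLines = o.lines.map((line, i) => ({
    order_id: o.id, // foreign key back to the orders table
    line_no: i + 1,
    product: line.product,
    qty: line.qty,
    price: line.price,
  }));
  return { orders, orderLines };
}

const rows = toRows(order);
console.log(rows.orders);
console.log(rows.orderLines);
```

Reading the order back requires the reverse step (a join plus regrouping), which is the translation work the term "impedance mismatch" refers to.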
NoSQL History

1980
1990  • Object Databases
2000
2010
NoSQL History

1980
1990  • Relational DBMS continued to dominate over OODB
2000  • Integration DB
o Multiple applications storing their data in a shared DB
o Improves communication
2010
NoSQL History

• How to cope with the heavy traffic generated by websites and social media applications?
o Scale up (bigger machines with more processors, disk storage, and memory)
o Scale out (use lots of small machines in a cluster)
• Relational databases are not designed to run efficiently on clusters
o This motivated an alternative route to data storage
• 2007:
o The research paper on Amazon Dynamo is released
o The document database MongoDB is started
• 2008:
o Cassandra project
o Voldemort
The #NoSQL Story

• A meetup
o Johan Oskarsson in SF, CA, June 2009
• A hashtag: #nosql
• BUT
Carlo Strozzi used the term NoSQL in 1998 to name his lightweight, open-source relational database that did not expose the standard SQL interface.
Definition

• There is no standard definition of what NoSQL means
• The term began with a workshop organized in 2009
• But there is much argument about what databases can truly be called NoSQL
Common Characteristics

• Non-relational
o They don’t use the relational data model and thus don’t use the SQL language
• Cluster-friendly
o They tend to be designed to run on a cluster
• Open Source
• Schema-less
o They don’t have a fixed schema
o They allow you to store any data in any record
NoSQL Originators

• Google (BigTable, LevelDB)
• LinkedIn (Voldemort)
• Facebook (Cassandra)
• Twitter (Hadoop/HBase, FlockDB, Cassandra)
• Netflix (SimpleDB, Hadoop/HBase, Cassandra)
Different Types of NoSQL

• Key-Value Store
o A key that refers to a payload (the actual content / data)
o MemcacheDB, Azure Table Storage, Redis

• Column Store
o Column data is saved together, as opposed to row data
o Very useful for data analytics
o Hadoop/HBase, Cassandra, Hypertable
Different Types of NoSQL

• Document / XML / Object Store
o A key (and possibly other indexes) points at a serialized object
o The DB can operate against values in the document
o MongoDB, CouchDB, RavenDB

• Graph Store
o Nodes are stored independently, and the relationships between nodes (edges) are stored with the data
o Neo4j
Key-Value

Search by ID is usually built on top of a key-value store

[Figure: keys 100, 215, and 325, each pointing to a value]
Key-Value

Business        Key              Value
twitter.com     Tweet ID         Information about the tweet
kayak.com       Flight number    Information about the flight
yourbank.com    Account number   Information about the account
amazon.com      Item number      Information about the item
Column Storage

Row-store
+ Easy to add and modify a record
- Might read in unnecessary data

Column-store
+ Only need to read in relevant data
- Tuple writes require multiple accesses

Column stores are suitable for read-mostly, read-intensive, large data repositories
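The trade-off above can be sketched in plain JavaScript. The table contents and helper names below are hypothetical; the point is only that an aggregate over one attribute touches a single array in the column layout, while appending a record needs one write per column.

```javascript
// The same hypothetical table in two layouts.
const rowStore = [
  { id: 1, name: "Ann", age: 30 },
  { id: 2, name: "Bob", age: 25 },
  { id: 3, name: "Cy", age: 41 },
];

const columnStore = {
  id: [1, 2, 3],
  name: ["Ann", "Bob", "Cy"],
  age: [30, 25, 41],
};

// Average age, row layout: every whole record is visited.
const avgFromRows =
  rowStore.reduce((sum, record) => sum + record.age, 0) / rowStore.length;

// Average age, column layout: only the "age" array is read.
const avgFromColumns =
  columnStore.age.reduce((sum, age) => sum + age, 0) / columnStore.age.length;

// Appending a record: a single write in the row layout...
function appendRow(store, record) {
  store.push(record);
}

// ...but one write per column in the column layout.
function appendColumns(store, record) {
  for (const column of Object.keys(store)) {
    store[column].push(record[column]);
  }
}

appendRow(rowStore, { id: 4, name: "Dee", age: 36 });
appendColumns(columnStore, { id: 4, name: "Dee", age: 36 });
console.log(avgFromRows, avgFromColumns);
```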
Why Document-based?

• Handles schema changes well (easy development)
• Solves the impedance mismatch problem
• Usually in JSON
• Not really schema-less
o There is an implicit schema needed to retrieve specific values
o E.g., to read the price of an order, the application must know which field holds it
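A minimal sketch of the implicit schema, in plain JavaScript. The two order documents and the `priceOf` helper are hypothetical: the store accepts both documents, but the application code that reads a price silently assumes a `price` field exists.

```javascript
// Two hypothetical order documents stored in the same collection.
const orders = [
  { _id: 1, item: "book", price: 12.5 },
  { _id: 2, item: "pen", cost: 1.2 }, // same idea, different field name
];

// The application assumes every order has a numeric "price": an implicit schema.
function priceOf(doc) {
  if (typeof doc.price !== "number") {
    throw new Error(`order ${doc._id} has no numeric "price" field`);
  }
  return doc.price;
}

console.log(priceOf(orders[0]));

try {
  priceOf(orders[1]);
} catch (err) {
  // The database stored the document without complaint; the reader fails.
  console.log(err.message);
}
```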
An Example with Relations

[Marco Brambilla, Data Design and Modeling]


Key-value Approach
Document-based Approach
Column-based Approach
From Key-based to Column/Document
Aggregate-oriented Database

• An aggregate is a collection of data that we interact with as a unit
• Aggregates form the boundaries for ACID operations with the database
• Key-value, document, and column-family databases can be seen as forms of aggregate-oriented DB
• Aggregates make it easier for the database to manage data storage over clusters
• Aggregate-oriented databases work best when most data interaction is done with the same aggregate
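One way to picture why aggregates suit clusters is the sketch below, in plain JavaScript. The three-node cluster, the hash function, and the helper names are assumptions made for illustration and do not describe any particular product: each aggregate is stored whole on the node chosen by hashing its key, so reading or writing it involves a single node.

```javascript
// Hypothetical cluster of three nodes; each node holds whole aggregates by key.
const nodes = [new Map(), new Map(), new Map()];

// Simple deterministic string hash (illustrative only).
function hash(key) {
  let h = 0;
  for (const ch of key) {
    h = (h * 31 + ch.codePointAt(0)) >>> 0; // keep the value in unsigned 32-bit range
  }
  return h;
}

// The aggregate key alone decides which node is responsible.
function nodeFor(key) {
  return nodes[hash(key) % nodes.length];
}

function putAggregate(key, aggregate) {
  nodeFor(key).set(key, aggregate);
}

function getAggregate(key) {
  return nodeFor(key).get(key);
}

// An order and its lines travel together as one unit.
putAggregate("order:99", { customer: "Ann", lines: [{ product: "book", qty: 2 }] });
putAggregate("order:100", { customer: "Bob", lines: [{ product: "pen", qty: 5 }] });

console.log(getAggregate("order:99"));
```

A query that spans many aggregates (for example, all orders containing a given product) would have to visit every node, which is why these databases work best when access stays within one aggregate.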
Aggregate-oriented Database

• Graph databases organize data into node and edge graphs
• They work best for data that has complex relationship structures!
o How about relational databases?
o "Relation" doesn’t mean "relationship": a relation is a table, i.e., a set of tuples
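As a rough illustration of the node-and-edge model, the plain JavaScript sketch below stores people and FRIEND edges separately and answers a "friends of friends" question by following edges. The data and function names are hypothetical, and a real graph database would use indexed adjacency rather than scanning an edge list.

```javascript
// Hypothetical social graph: nodes and directed edges are stored separately.
const people = new Map([
  ["ann", { name: "Ann" }],
  ["bob", { name: "Bob" }],
  ["cy", { name: "Cy" }],
  ["dee", { name: "Dee" }],
]);

const edges = [
  { from: "ann", to: "bob", type: "FRIEND" },
  { from: "bob", to: "cy", type: "FRIEND" },
  { from: "cy", to: "dee", type: "FRIEND" },
];

// Direct neighbours: follow outgoing FRIEND edges one hop.
function neighbours(id) {
  return edges
    .filter((e) => e.type === "FRIEND" && e.from === id)
    .map((e) => e.to);
}

// Friends of friends: two hops out, excluding the start node and direct friends.
function friendsOfFriends(id) {
  const direct = new Set(neighbours(id));
  const result = new Set();
  for (const friend of direct) {
    for (const candidate of neighbours(friend)) {
      if (candidate !== id && !direct.has(candidate)) result.add(candidate);
    }
  }
  return [...result];
}

console.log(friendsOfFriends("ann").map((id) => people.get(id).name));
```

In a relational schema the same question needs a self-join of a friendship table for every hop, which is where graph stores tend to be more convenient.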
Classification of NoSQL
Key Value CRUD Operations

• Query operations are limited to
o put(key, value)
o get(key)
o delete(key)
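The three operations can be sketched with an in-memory store in plain JavaScript. The `KeyValueStore` class and the sample keys are hypothetical and serve only to show the shape of the interface; real key-value stores add persistence, replication, and distribution.

```javascript
// Minimal in-memory key-value store exposing only put, get, and delete.
class KeyValueStore {
  constructor() {
    this.data = new Map();
  }

  // Store (or overwrite) the value under the key.
  put(key, value) {
    this.data.set(key, value);
  }

  // Return the value, or undefined when the key is absent.
  get(key) {
    return this.data.get(key);
  }

  // Remove the key; returns true if something was deleted.
  delete(key) {
    return this.data.delete(key);
  }
}

const store = new KeyValueStore();
store.put("tweet:100", { user: "ann", text: "hello" });
console.log(store.get("tweet:100"));
store.delete("tweet:100");
console.log(store.get("tweet:100"));
```

Note that the value is opaque to the store: there is no operation for "find all tweets by ann" unless the application maintains its own extra keys.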
Example of NoSQL System: MongoDB

• An open-source, document-oriented database
• Data is stored in JSON-like documents
• Designed for both scalability and developer agility
• Dynamic schemas
Terminology: SQL vs MongoDB
MongoDB Data Model
MongoDB Queries: Create

• CRUD (Create, Read, Update, Delete)
o Create a database: use database_name
(the database is actually created when data is first stored in it)
o Create a collection:
db.createCollection(name, options)
options: e.g., whether the collection is capped and the maximum number of documents it may hold
o Insert a document:
db.<collection_name>.insert({"name": "nguyen", "age": 24, "gender": "male"})
MongoDB Queries: Read

• CRUD (Create, Read, Update, Delete)
o Query [e.g. select all]
db.<collection_name>.find().pretty()
o Query with conditions
db.<collection_name>.find({"gender": "female", "age": {$lte: 20}}).pretty()
o It’s pattern matching!
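To show what "pattern matching" means here, the sketch below implements a toy matcher in plain JavaScript. It is not MongoDB's implementation: the `matches` function is hypothetical and supports only equality plus the `$lte` and `$gte` operators, and the sample students other than "nguyen" are invented. A document is selected when every field in the query pattern is satisfied.

```javascript
// Toy matcher: equality plus the $lte / $gte operators only.
function matches(doc, query) {
  return Object.entries(query).every(([field, condition]) => {
    const value = doc[field];
    if (condition !== null && typeof condition === "object") {
      // Operator form, e.g. { $lte: 20 }: every operator must hold.
      return Object.entries(condition).every(([op, bound]) => {
        if (op === "$lte") return value <= bound;
        if (op === "$gte") return value >= bound;
        throw new Error(`unsupported operator ${op}`);
      });
    }
    // Plain form, e.g. { gender: "female" }: exact equality.
    return value === condition;
  });
}

const students = [
  { name: "nguyen", age: 24, gender: "male" },
  { name: "lan", age: 19, gender: "female" },
  { name: "mai", age: 22, gender: "female" },
];

// Same pattern as the find() call above.
const found = students.filter((doc) =>
  matches(doc, { gender: "female", age: { $lte: 20 } })
);
console.log(found);
```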
Read – Mapping to SQL
Comparison Operators
MongoDB Queries: Update

• CRUD (Create, Read, Update, Delete)
o Update a document:
db.<collection_name>.update(<select_criteria>, <updated_data>)
o Example:
db.students.update({"name": "nguyen"}, {$set: {"age": 20}})
o Replace the existing document with a new one using the save method:
db.students.save({_id: ObjectId("string_id"), "name": "ben", "age": 23, "gender": "male"})
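The difference between a `$set` update and a full replacement can be illustrated with plain JavaScript objects. The helpers `applySet` and `replaceDoc` are hypothetical stand-ins for the two behaviours, not MongoDB functions: `$set` changes only the listed fields, while a replacement keeps the `_id` and discards every field that the new document does not mention.

```javascript
const student = { _id: 1, name: "nguyen", age: 24, gender: "male" };

// $set-style update: change the listed fields and keep the rest.
function applySet(doc, fields) {
  return { ...doc, ...fields };
}

// Replacement: keep only _id and take everything else from the new document.
function replaceDoc(doc, replacement) {
  return { ...replacement, _id: doc._id };
}

const afterSet = applySet(student, { age: 20 });
const afterReplace = replaceDoc(student, { name: "ben", age: 23 });

console.log(afterSet);     // gender is still present
console.log(afterReplace); // gender is gone
```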
MongoDB Queries: Delete

• CRUD (Create, Read, Update, Delete)
o Drop a database:
db.dropDatabase()
o Drop a collection:
db.<collection_name>.drop()
o Delete the documents matching a condition:
db.<collection_name>.remove({"gender": "male"})
Conclusions

• NoSQL pros
o Data modeling can be an iterative process
o Linear scaling occurs as nodes are added to the cluster
o Native integration of Map/Reduce frameworks and full-text search engines
o Easy and efficient storage of highly variable data
Conclusions

• NoSQL “cons”
o Implicit schema at the application level
o Applications need to check for consistency and integrity constraints
o No transactions across multiple objects; conflict resolution must be done by the client application
o ACID transactions are limited to a single element (row, document, entity, etc.), in contrast with RDBMS
o Second-generation NoSQL or NewSQL databases try to cope with this problem
o Data models and query languages are proprietary and create vendor lock-in
o Data structure is chosen upfront, based on the queries that will be expressed
o If the queries change, the data structure also needs to change
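One common way for a client to handle conflicts itself is a version check (optimistic concurrency). The plain JavaScript sketch below uses a hypothetical in-memory store with a `compareAndSet` helper; the names and the account data are assumptions for illustration, and real systems expose this idea through their own mechanisms.

```javascript
// Hypothetical store that keeps a version number next to each value.
const db = new Map([["account:1", { version: 1, value: { balance: 100 } }]]);

// Write only if nobody changed the record since it was read.
function compareAndSet(key, expectedVersion, newValue) {
  const current = db.get(key);
  if (current.version !== expectedVersion) return false; // conflict detected
  db.set(key, { version: expectedVersion + 1, value: newValue });
  return true;
}

// Client-side resolution: re-read and retry until the write goes through.
function deposit(key, amount) {
  for (;;) {
    const { version, value } = db.get(key);
    if (compareAndSet(key, version, { balance: value.balance + amount })) return;
  }
}

const stale = db.get("account:1"); // client A reads version 1
deposit("account:1", 50);           // client B writes; version becomes 2
const accepted = compareAndSet("account:1", stale.version, { balance: 0 });

console.log(accepted);            // client A's stale write is rejected
console.log(db.get("account:1")); // client B's update is preserved
```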
