
Database Systems https://omaps.bitbucket.io/tutoria/jkuat/ics2206/ppt-10.html#1

Database Systems

1 of 44 1/31/2022, 7:38 AM

Lecture 10: NoSQL Databases
Database Systems

J Mwaura


Brief History of Databases


Benefits of Relational Databases

Designed for all purposes

ACID

Strong consistency, concurrency, recovery

Mathematical background - set theory

Standard query language (SQL)

Many tools to use with them, e.g. reporting services and entity frameworks


NoSQL why, what and when?


Era of distributed computing

...but relational databases were not built for distributed applications

Because...

Joins are expensive

Hard to scale horizontally

Impedance mismatch occurs

Expensive (product cost, hardware, maintenance)

And...

Relational databases are also weak in:

1. Speed (performance)
2. High availability
3. Partition tolerance


NoSQL why, what and when?


The spread of web applications and services handling Big Data

Big data is high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making


Sources of Big Data


Mobile use of internet

Cloud computing

Collaboration

IP-based communication

Social media

Video streaming & media distribution


3 V's of Big Data


Usage of big data


Visibility - making big data accessible to relevant stakeholders in a timely fashion

Discovery and analysis of information - data fusion to generate new information and patterns

Segmentation and customization - creating highly specific segments and tailoring products and services to them, e.g. segmentation of customers

Aiding decision making - improving decisions, minimizing risks, and unearthing valuable insights, e.g. automated fraud alert systems in credit card processing

Innovation - new ideas in the form of products and services


Big Data Challenges


Policies and procedures - compliance with data privacy, security, intellectual property and data protection requirements

Access to data - access to third-party data can pose legal and contractual challenges

Technology and techniques - legacy systems are inadequate for Big Data, and people experienced in the newer technologies are scarce

Structure of Big Data - unstructured data such as images, videos, logs, etc.

Data storage & processing - greater memory needs, plus new analysis algorithms and analytics software


Big Data Technologies


1. New storage and processing technologies designed specifically for large unstructured data, e.g. MongoDB
2. Parallel processing
3. Clustering
4. Large grid environments
5. High connectivity and high throughput
6. Cloud computing and scale-out architectures


RDBMS Performance


Properties of NoSQL Databases

Provides:

1. Easy and frequent changes to the DB
2. Fast development and easy data replication
3. Large data volumes - the focus is on distribution and horizontal scalability
4. Schema-less design - weak or no schema restrictions
5. Easy access via an API
6. A consistency model that is not ACID (instead, e.g., BASE)
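
To make "horizontal scalability" concrete, here is a minimal sketch of hash-based sharding, one common way to spread keys over several nodes. The three-node cluster, the function names and the second sample record are assumptions made for illustration; real systems typically use more elaborate schemes.

```python
import hashlib

def shard_for(key: str, num_nodes: int) -> int:
    """Map a key to a node index with a stable hash (illustrative scheme)."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_nodes

# Hypothetical cluster of three nodes, each modelled as a plain dict.
nodes = [dict() for _ in range(3)]

def put(key: str, value) -> None:
    # The key alone decides which node stores the value.
    nodes[shard_for(key, len(nodes))][key] = value

def get(key: str):
    # The same hash leads back to the node that holds the key.
    return nodes[shard_for(key, len(nodes))].get(key)

put("User:U17547:email", "john.nzue@jkuat.net")
put("User:U20001:email", "someone@example.org")  # made-up sample record
print(get("User:U17547:email"))
```

Adding capacity then means adding nodes rather than buying a bigger machine, although with this naive modulo scheme changing the node count would relocate most keys.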


Properties of NoSQL Databases

NoSQL avoids:

· Overhead of ACID transactions
· Complexity of SQL queries
· Burden of up-front schema design
· The need for a DBA
· Transactions (these should be handled at the application layer)


NoSQL why/when to use?


Data which requires a flexible schema

When ACID support is not really necessary

Object-relational impedance mismatch - conceptual and technical difficulties

Need for a distributed or scalable application

Logging data from distributed sources

Storing events/temporal data - shopping carts, wish lists, etc.

Polyglot persistence, i.e. choosing the best data store for the nature of the data


NoSQL why/when not to use?


Financial data

Data requiring strict ACID compliance

Business critical data


Schema-less Data Model


In NoSQL Databases

· No schema to consider
· No unused cells
· No explicit data types (types are implicit)
· Most considerations are handled in the application layer
· Data is gathered in an aggregate - a document


Aggregate Data Models


NoSQL database models:
1. Key-value
2. Document
3. Column family
4. Graph

Each database has its own query language


BASE & CAP Theorem


BASE - Basically Available, Soft state, Eventually consistent - allows replicated computer nodes to temporarily hold diverging data versions and be updated only after a delay

CAP Theorem by Eric Brewer - states that in any massive distributed data management system, only two of the three properties consistency, availability, and partition tolerance can be ensured

But we need a distributed database system with features such as fault tolerance, high availability, consistency, and scalability
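
The BASE idea of replicas that temporarily diverge can be sketched in a few lines. This is a toy model under stated assumptions: the class names, the two-replica setup and the explicit `sync()` step are all invented for illustration and do not describe any particular product.

```python
class Replica:
    """One node holding its own copy of the data (illustrative only)."""
    def __init__(self):
        self.data = {}

class EventuallyConsistentStore:
    def __init__(self, n_replicas=2):
        self.replicas = [Replica() for _ in range(n_replicas)]
        self.pending = []  # queued updates: (replica index, key, value)

    def write(self, key, value):
        # Replica 0 accepts the write at once; the others are updated later.
        self.replicas[0].data[key] = value
        for i in range(1, len(self.replicas)):
            self.pending.append((i, key, value))

    def read(self, key, replica=0):
        return self.replicas[replica].data.get(key)

    def sync(self):
        # Delayed propagation: apply the queued updates in order.
        for i, key, value in self.pending:
            self.replicas[i].data[key] = value
        self.pending.clear()

store = EventuallyConsistentStore()
store.write("stock", 5)
stale = store.read("stock", replica=1)  # replica 1 has not seen the write yet
store.sync()
fresh = store.read("stock", replica=1)  # replica 1 has caught up
print(stale, fresh)
```

Between `write` and `sync` the two replicas hold diverging versions, which is the "soft state"; after `sync` they agree again, which is the "eventually consistent" part.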


Comparing ACID & BASE

ACID | BASE
Consistency is the top priority (strong consistency) | Consistency is ensured only eventually (weak consistency)
Mostly pessimistic concurrency control methods with locking protocols | Mostly optimistic concurrency control methods with nuanced setting options
Availability is ensured for moderate volumes of data | High availability and partition tolerance for massive distributed data storage
Some integrity restraints (e.g., referential integrity) are ensured by the database schema | Some integrity restraints (e.g., referential integrity) are ensured by the database schema
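
As an illustration of the "optimistic concurrency control" entry in the comparison, the sketch below uses version numbers instead of locks: a write succeeds only if nobody else has changed the item since it was read. The class, the exception name and the sample values are assumptions chosen for this example.

```python
class VersionConflict(Exception):
    """Raised when a write is based on a stale version."""

class VersionedStore:
    def __init__(self):
        self._items = {}  # key -> (version, value)

    def read(self, key):
        # Unknown keys are reported as version 0 with no value.
        return self._items.get(key, (0, None))

    def write(self, key, value, expected_version):
        current_version, _ = self._items.get(key, (0, None))
        if current_version != expected_version:
            raise VersionConflict(
                f"{key!r}: expected v{expected_version}, found v{current_version}"
            )
        self._items[key] = (current_version + 1, value)
        return current_version + 1

store = VersionedStore()
store.write("balance", 100, expected_version=0)

# Two clients read the same version without taking any lock.
version_a, value_a = store.read("balance")
version_b, value_b = store.read("balance")

store.write("balance", value_a - 30, expected_version=version_a)  # first writer wins
try:
    store.write("balance", value_b - 50, expected_version=version_b)
    conflict_detected = False
except VersionConflict:
    conflict_detected = True  # second writer must re-read and retry

print(conflict_detected, store.read("balance"))
```

A pessimistic scheme would instead lock "balance" while the first client works, making the second client wait rather than retry.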


Key-value data model


At the hardware level, CPUs work with registers based on this model

Programming languages use the same concept in associative arrays

The simplest possible database model - a data store that saves a data object as the value for another data object, the key

Uses simple commands, e.g., SET and GET

· SET User:U17547:firstname John
· SET User:U17547:lastname Nzue
· SET User:U17547:email john.nzue@jkuat.net
· GET User:U17547:email >>john.nzue@jkuat.net

Key-value stores do not support any kind of structure, neither nesting nor references
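
The SET and GET commands above map directly onto an associative array. The following sketch mimics them with a Python dict; the helper names `set_value`, `get_value` and `keys_with_prefix` are made up for this example and are not the API of any real key-value store.

```python
store = {}  # the whole "database" is one flat associative array

def set_value(key: str, value: str) -> None:
    store[key] = value

def get_value(key: str):
    return store.get(key)  # None if the key is unknown

set_value("User:U17547:firstname", "John")
set_value("User:U17547:lastname", "Nzue")
set_value("User:U17547:email", "john.nzue@jkuat.net")

print(get_value("User:U17547:email"))

def keys_with_prefix(prefix: str):
    # Any grouping exists only in the key naming convention, so finding
    # "all attributes of one user" means scanning the keys by prefix.
    return sorted(k for k in store if k.startswith(prefix))

print(keys_with_prefix("User:U17547:"))
```

The store itself sees three unrelated entries; that they belong to one user is known only to the application that chose the key names.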


Key-value data model


Keys can be structured by convention, using special characters such as colons or slashes

Key-value store properties

· There is a set of identifying data objects, the keys
· For each key, there is exactly one associated descriptive data object, the value for that key
· Specifying a key allows querying the associated value in the database

Examples: Amazon DynamoDB, Redis


Key-value data model


Data has no required format; it may have any format

Data model: (key, value) pairs

Basic operations: Insert(key, value), Fetch(key), Update(key), Delete(key)
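
The four basic operations can be sketched as a small class. This is a minimal illustration, not a real product's interface: the method names follow the slide, while the error handling and the extra `value` argument of `update` are assumptions.

```python
class KeyValueStore:
    def __init__(self):
        self._data = {}

    def insert(self, key, value):
        if key in self._data:
            raise KeyError(f"key already exists: {key!r}")
        self._data[key] = value

    def fetch(self, key):
        return self._data[key]  # raises KeyError for unknown keys

    def update(self, key, value):
        # Assumption: an update supplies the new value for an existing key.
        if key not in self._data:
            raise KeyError(f"no such key: {key!r}")
        self._data[key] = value

    def delete(self, key):
        del self._data[key]

kv = KeyValueStore()
# Values may have any format: text, bytes, or a nested structure.
kv.insert("greeting", "hello")
kv.insert("thumbnail", b"\x89PNG")
kv.insert("cart:42", {"items": ["pen", "book"]})

kv.update("greeting", "hello again")
kv.delete("thumbnail")
print(kv.fetch("greeting"), kv.fetch("cart:42"))
```

The store never inspects the values, which is why the data needs no required format.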


Column family data model


Often, the data matrix needs to be structured with a schema

Column-family stores enhance the key-value concept by providing additional structure

The column is the smallest unit of data

It is a tuple that contains a name, a value and a timestamp

Data is stored in enhanced, structured multidimensional key spaces - column families

Examples: Google Bigtable, Cassandra


Column family data model


Column-family store properties

· Data is stored in multidimensional tables
· Data objects are addressed with row keys
· Object properties are addressed with column keys
· Columns of the tables are grouped into column families
· A table's schema only refers to the column families; within one column family, arbitrary column keys can be used
· In distributed, fragmented architectures, the data of a column family is preferably physically stored at one place (co-location) in order to optimize response times
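
The properties above can be pictured as nested maps: row key, then column family, then column key, ending in a (value, timestamp) pair. The sketch below is an in-memory illustration only; the class name, the family names and the sample phone number are made up, and it does not model physical co-location.

```python
import time
from collections import defaultdict

class ColumnFamilyTable:
    def __init__(self, column_families):
        # The schema consists of the column family names and nothing else.
        self.column_families = set(column_families)
        # rows[row_key][family][column_key] = (value, timestamp)
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row_key, family, column, value, timestamp=None):
        if family not in self.column_families:
            raise KeyError(f"unknown column family: {family!r}")
        ts = time.time() if timestamp is None else timestamp
        self.rows[row_key][family][column] = (value, ts)

    def get(self, row_key, family, column):
        value, _timestamp = self.rows[row_key][family][column]
        return value

users = ColumnFamilyTable(["contact", "profile"])
users.put("U17547", "contact", "email", "john.nzue@jkuat.net", timestamp=1)
users.put("U17547", "profile", "firstname", "John", timestamp=1)
# A different row may use a different column key within the same family.
users.put("U20001", "contact", "phone", "0700-000000", timestamp=2)

print(users.get("U17547", "contact", "email"))
```

Only the family names are fixed up front; the column keys "email" and "phone" were never declared anywhere.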


Document store model


Pairs each key (document ID) with a complex data structure known as a document, e.g. in JSON/BSON format, such as {"hello":"world"}

Indexing is done via B-trees

Document stores are completely schema-free - the drawback of having no schema is the lack of referential integrity and normalization

Documents can contain many different key-value pairs, key-array pairs, or even nested documents


Document store
Document store properties

· It is a key-value store
· The data objects stored as values for keys are called documents; the keys are used for identification
· The documents contain data structures in the form of recursively nested attribute-value pairs without referential integrity
· These data structures are schema-free, i.e., arbitrary attributes can be used in every document without defining a schema first
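
A small sketch of these properties, using a Python dict as the collection and the standard `json` module for serialization. The document IDs, attribute names and sample values are invented for illustration, and `find` is a hypothetical helper rather than any real database call.

```python
import json

collection = {}  # document ID -> document

# Two documents in the same collection with entirely different attributes.
collection["doc1"] = {"hello": "world"}
collection["doc2"] = {
    "author": "mike",
    "tags": ["nosql", "documents"],                      # key-array pair
    "address": {"city": "Nairobi", "country": "Kenya"},  # nested document
}

def find(docs, attribute, value):
    """Return the IDs of documents whose top-level attribute equals value."""
    return [doc_id for doc_id, doc in docs.items() if doc.get(attribute) == value]

print(find(collection, "author", "mike"))

# Documents round-trip through JSON text without any schema definition.
serialized = json.dumps(collection["doc2"])
restored = json.loads(serialized)
print(restored["address"]["city"])
```

Nothing stops a third document from storing "author" as a list or omitting it altogether, which is the flexibility and also the missing integrity check.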


Document store

Examples: MongoDB, CouchDB, JSON stores


Graph data model


The graph model has a structuring schema, as opposed to the first three models, which forgo database schemas and referential integrity for the sake of easier fragmentation (sharding)

Data is stored as nodes and edges, which belong to a node type or edge type, respectively, and contain data in the form of attribute-value pairs

The relationships between data objects are explicitly present as edges, and referential integrity is ensured by the DBMS


Graph data model


Based on Graph Theory

Scales vertically, no clustering

You can use graph algorithms easily

Supports transactions

Observes ACID


Graph data model


Graph data model properties

· The data and/or the schema are shown as graphs or graph-like structures, which generalize the concept of graphs (e.g., hypergraphs)
· Data manipulations are expressed as graph transformations, or operations which directly address typical properties of graphs (e.g., paths, adjacency, subgraphs, connections, etc.)
· The database supports the checking of integrity constraints to ensure data consistency. The definition of consistency is directly related to graph structures (e.g., node and edge types, attribute domains, and referential integrity of the edges)
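
The integrity checks named in the last property can be sketched as follows. This is an illustrative in-memory model: the class name, the node and edge types and the sample data are all assumptions, and a real graph DBMS enforces such rules internally.

```python
class PropertyGraph:
    def __init__(self, node_types, edge_types):
        self.node_types = set(node_types)
        self.edge_types = set(edge_types)
        self.nodes = {}  # node id -> (node type, attributes)
        self.edges = []  # (source id, edge type, target id, attributes)

    def add_node(self, node_id, node_type, **attributes):
        if node_type not in self.node_types:
            raise ValueError(f"unknown node type: {node_type!r}")
        self.nodes[node_id] = (node_type, attributes)

    def add_edge(self, source, edge_type, target, **attributes):
        if edge_type not in self.edge_types:
            raise ValueError(f"unknown edge type: {edge_type!r}")
        # Referential integrity: both endpoints must already exist.
        if source not in self.nodes or target not in self.nodes:
            raise ValueError("edge refers to a missing node")
        self.edges.append((source, edge_type, target, attributes))

g = PropertyGraph(node_types=["Student", "Course"], edge_types=["ENROLLED_IN"])
g.add_node("s1", "Student", name="John")
g.add_node("c1", "Course", code="ICS2206")
g.add_edge("s1", "ENROLLED_IN", "c1", year=2022)

try:
    g.add_edge("s1", "ENROLLED_IN", "c99")  # c99 does not exist
    dangling_rejected = False
except ValueError:
    dangling_rejected = True
print(dangling_rejected, len(g.edges))
```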


Graph data model


An advantage of graph databases is the index-free adjacency property: for every node, the database system can find the direct neighbors without having to consider all edges
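
The difference can be sketched by comparing two ways of finding a node's neighbors. In the first, each node keeps direct references to its neighbors; in the second, a flat edge list must be scanned from start to end. The names and the tiny three-node graph are made up for illustration.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.neighbors = []  # direct references to adjacent Node objects

    def connect(self, other):
        # Undirected edge: each endpoint records the other.
        self.neighbors.append(other)
        other.neighbors.append(self)

def neighbors_by_edge_scan(name, edge_list):
    """Contrast case: every edge has to be examined."""
    found = []
    for a, b in edge_list:
        if a == name:
            found.append(b)
        elif b == name:
            found.append(a)
    return found

alice, bob, carol = Node("alice"), Node("bob"), Node("carol")
alice.connect(bob)
alice.connect(carol)
edge_list = [("alice", "bob"), ("alice", "carol")]

direct = [n.name for n in alice.neighbors]            # follows stored references
scanned = neighbors_by_edge_scan("alice", edge_list)  # walks the whole edge list
print(direct, scanned)
```

Both give the same answer, but the cost of the first depends only on how many neighbors the node has, while the cost of the second grows with the total number of edges in the graph.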


Data Warehouse
Business intelligence - decision making based on facts gathered from the analysis of the available data

Cross-application data analysis is often complex due to the heterogeneity, volatility, and fragmentation of the data

Business intelligence makes 3 demands on the data to be analyzed

1. Integration of heterogeneous data
2. Historicization of current and volatile data
3. Complete availability of data on certain subject areas


Data Warehouse
Data warehouse - a data warehouse (DWH) is a distributed information system with the following properties

· Integrated - data from various sources and applications (source systems) is periodically integrated and filed in a uniform schema
· Read only - data in the data warehouse is not changed once it is written
· Historicized - thanks to a time axis, data can be evaluated for different points in time
· Analysis-oriented - all data on different subject areas like customers, contracts, or products is fully available in one place
· Decision support - the information in data cubes serves as a basis for management decisions
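
The "read only" and "historicized" properties can be illustrated with an append-only store that carries a validity date on every record. This is a deliberately tiny sketch: the class, the method names and the sample customer data are assumptions, and real warehouses use dimensional schemas and data cubes rather than a Python list.

```python
from datetime import date

class Warehouse:
    def __init__(self):
        self._facts = []  # append-only: (valid_from, subject, key, value)

    def load(self, valid_from, subject, key, value):
        """Integrate a record from a source system; existing rows are never changed."""
        self._facts.append((valid_from, subject, key, value))

    def as_of(self, subject, key, when):
        """Return the latest value valid on or before `when`, or None."""
        candidates = [
            (valid_from, value)
            for valid_from, s, k, value in self._facts
            if s == subject and k == key and valid_from <= when
        ]
        if not candidates:
            return None
        return max(candidates, key=lambda c: c[0])[1]

wh = Warehouse()
wh.load(date(2021, 1, 1), "customers", "C1:city", "Nairobi")
wh.load(date(2021, 6, 1), "customers", "C1:city", "Mombasa")  # a change adds a new row

print(wh.as_of("customers", "C1:city", date(2021, 3, 1)))
print(wh.as_of("customers", "C1:city", date(2021, 12, 31)))
```

Because the older row is kept rather than overwritten, the same question can be answered for any point on the time axis.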


Steps of Data Warehousing


Data Mining Tools


Classification

Selection

Prognosis

Knowledge acquisition


MongoDB
MongoDB properties:

· It stores JSON-style documents
· Provides flexible "schemas", e.g. {"author":"mike","text":"..."} can change to {"author":"eliot","text":"...","tags":["mongodb"]}
· Dynamic indexing & querying
· Atomic update modifiers
· Focus on performance
· Replication
· Auto-sharding
· Many supported platforms/languages


MongoDB
MongoDB is less good at:

· Highly transactional needs
· Ad-hoc business intelligence
· Problems that require SQL


End of Course
Database Systems


That's it!
Please send queries about this lesson to:
jmwaura.uni@gmail.com


“ Courtesy of … ”
