
TOPIC 2: UNSTRUCTURED DATA

OUTLINE

 NoSQL database vs. SQL database.
 Modelling unstructured data for NoSQL.
 NoSQL data manipulation.
INTRODUCTION

 Databases are created to handle large quantities of information by inputting, storing, retrieving, and managing that information.
 Database: an organized collection of data.
 Database Management System (DBMS): a software package of computer programs that controls the creation, maintenance and use of a database.

Brief history of databases
RELATIONAL DATABASE
 Designed for all purposes.
 ACID (Atomicity, Consistency, Isolation, Durability).
 Relational model.
 Strong consistency, concurrency, recovery.
 Lots of supporting tools, e.g. reporting services, entity frameworks.
 Uses Structured Query Language (SQL).
 Stores data in tables.
 Adding a new property may require altering schemas.
 Good for structured data.
 Relationships are captured in a normalised model, using joins to resolve references across tables.
 Strict schema.
[Figure: examples of SQL databases]
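The normalised model, the join and the schema change listed above can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table names, column names and rows are hypothetical and do not come from the slides.

```python
import sqlite3

# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(id),
        item TEXT NOT NULL
    );
    INSERT INTO customer VALUES (1, 'Aisha'), (2, 'Ben');
    INSERT INTO orders VALUES (10, 1, 'laptop'), (11, 1, 'mouse'), (12, 2, 'desk');
""")

# The join resolves each customer_id reference back to its customer row.
rows = conn.execute("""
    SELECT customer.name, orders.item
    FROM orders JOIN customer ON customer.id = orders.customer_id
    ORDER BY orders.id
""").fetchall()

# Adding a new property (email) means altering the table schema.
conn.execute("ALTER TABLE customer ADD COLUMN email TEXT")
columns = [info[1] for info in conn.execute("PRAGMA table_info(customer)")]

print(rows)
print(columns)
```

Each order stores only a reference to its customer, so reading the customer's name next to the item needs the join.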
RDBMS

 Relational databases were not built for distributed applications, because:
   Joins are expensive
   Hard to scale horizontally
   Impedance mismatch occurs
   Expensive (product cost, hardware, maintenance)
 They are weak in:
   Speed (performance)
   High availability
   Partition tolerance
WHY NoSQL

 Web-based applications caused spikes in load:
   Explosion of social media sites (Facebook, Twitter) with large data needs
   Rise of cloud-based solutions such as Amazon S3 (Simple Storage Service)
 Hooking an RDBMS up to a web-based application becomes troublesome.
 The name:
   Stands for “Not Only SQL”, “Non-SQL” or “non-relational”.
   The term NoSQL was introduced by Carlo Strozzi in 1998 to name his file-based database.
   It was reintroduced by Eric Evans when an event was organized to discuss open-source distributed databases.
   Eric states that “… but the whole point of seeking alternatives is that you need to solve a problem that relational databases are a bad fit for. …”
 Stores and retrieves data that is not modelled in rows and columns.
 “Not only SQL”: may support SQL-like query languages.
WHAT IS NoSQL?

 A NoSQL database provides a mechanism for storage and retrieval of data that employs less constrained consistency models than traditional relational databases.
 Key features (advantages):
   non-relational
   don’t require a schema
   data are replicated to multiple nodes (so, identical and fault-tolerant) and can be partitioned:
     down nodes easily replaced
     no single point of failure
   horizontally scalable
   cheap, easy to implement (open-source)
   massive write performance
   fast key-value access
   reduced complexity of SQL queries
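Partitioning and replication can be sketched as below. This is a minimal illustration only: the node names, the replication factor of 3 and the simple hash-modulo placement are all assumptions, and real systems typically use more elaborate schemes such as consistent hashing.

```python
import hashlib

NODES = ["node-a", "node-b", "node-c", "node-d"]  # hypothetical node names
REPLICAS = 3  # assumed replication factor


def replica_nodes(key, nodes=NODES, replicas=REPLICAS):
    """Return the nodes that hold a copy of `key` (hash-modulo placement)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    start = int.from_bytes(digest[:8], "big") % len(nodes)
    count = min(replicas, len(nodes))
    # The key is stored on `count` consecutive nodes, starting at `start`.
    return [nodes[(start + i) % len(nodes)] for i in range(count)]


def readable_nodes(key, down):
    """Nodes that can still serve `key` when the nodes in `down` have failed."""
    return [node for node in replica_nodes(key) if node not in down]


owners = replica_nodes("user:42")
print(owners)
print(readable_nodes("user:42", {owners[0]}))
```

Because every key lives on three different nodes, losing any single node still leaves at least two copies readable, which is the "no single point of failure" property.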
WHAT IS SCHEMA-LESS DATA MODEL: SQL vs NoSQL
 In relational databases:
   You can’t add a record which does not fit the schema.
   You need to add NULLs to unused items in a row.
   We should consider the datatypes, e.g. you can’t add a string to an integer field.
   You can’t add multiple items in a field (you should create another table: primary key, foreign key, joins, normalization, ...).
 In NoSQL databases:
   There is no schema to consider.
   There is no unused cell.
   There is no datatype (implicit).
   Most considerations are done in the application layer.
   We gather all items in an aggregate (document).
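The contrast can be sketched with Python's built-in sqlite3 module next to a plain dictionary standing in for a schema-less store. The table, field names and records are hypothetical. SQLite is itself loosely typed, so the datatype point is not demonstrated here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, phone TEXT)")

# Relational: the unused item in the row is stored as NULL.
conn.execute("INSERT INTO person (id, name) VALUES (1, 'Aisha')")
stored = conn.execute("SELECT id, name, phone FROM person").fetchone()

# Relational: a record that does not fit the schema is rejected.
try:
    conn.execute("INSERT INTO person (id, name, hobbies) VALUES (2, 'Ben', 'chess')")
    rejected = False
except sqlite3.OperationalError:
    rejected = True

# Schema-less (a dict used as a stand-in): each aggregate carries only the
# fields it needs, and one field may hold multiple items.
people = {}
people[1] = {"name": "Aisha"}
people[2] = {"name": "Ben", "hobbies": ["chess", "cycling"]}

print(stored, rejected)
print(people)
```

In the schema-less case nothing stops two records from disagreeing on field names or types, so those checks move into the application layer.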
NoSQL DATA MODEL

 NoSQL databases are classified in four major data models:
 Key-value
 Document
 Column family
 Graph
NoSQL DATA MODEL : KEY VALUE
 The most basic NoSQL model and the backbone of other implementations.
 The underlying structure is a hash table, in which a unique key points to a specific item of data.
 Works by matching keys with values, like a dictionary.
 Give a key (e.g. the_answer_to_life) and receive the matching value (e.g. 24).
 The database is a global collection of key-value pairs.
 As the volume of data increases, maintaining unique values as keys may become more difficult.
 Examples: Riak, Amazon S3 (Dynamo), Oracle NoSQL.
 Data model: (key, value) pairs
 Basic Operations:
 Get(key), returns the value associated with the provided key.
 Put(key, value), associates the value with the key.
 Multi-get(key1, key2, .., keyN), returns the list of values associated with the list of keys.
 Delete(key), removes the entry for the key from the data store
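The four basic operations can be sketched as an in-memory store backed by a Python dictionary (a hash table). The class name, method names and sample keys are hypothetical, not the API of any particular product.

```python
class KeyValueStore:
    """A toy in-memory key-value store; not a real database client."""

    def __init__(self):
        self._data = {}  # Python dicts are hash tables

    def get(self, key):
        """Return the value for `key`, or None if the key is absent."""
        return self._data.get(key)

    def put(self, key, value):
        """Associate `value` with `key`, replacing any earlier value."""
        self._data[key] = value

    def multi_get(self, *keys):
        """Return the values for `keys`, in the same order (None if absent)."""
        return [self._data.get(key) for key in keys]

    def delete(self, key):
        """Remove the entry for `key`; do nothing if it is absent."""
        self._data.pop(key, None)


store = KeyValueStore()
store.put("user:1", "Aisha")
store.put("user:2", "Ben")
print(store.get("user:1"))
print(store.multi_get("user:1", "user:2", "user:3"))
store.delete("user:1")
print(store.get("user:1"))
```

The store knows nothing about what a value contains: every lookup goes through the key.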
NoSQL DATA MODEL : COLUMN FAMILY

 Extends the simple nature of key-value stores.
 Does not require a pre-structured table to work with the data.
 Works by creating collections of one or more key-value pairs.
 Two-dimensional arrays in which each key has one or more key-value pairs attached to it.
 Two groups: column-store and column-family store.
 Column-family store: Bigtable, HBase, Hypertable, and Cassandra.
 Column-store: Sybase IQ, C-Store, Vertica, VectorWise, MonetDB, ParAccel, and Infobright.
NoSQL DATA MODEL : COLUMN FAMILY

 In a column store database, data is stored in columns, as opposed to being stored in rows as is done in most relational database management systems.
 A column store comprises one or more column families that logically group certain columns in the database.
 A key is used to identify and point to a number of columns in the database, with a keyspace attribute that defines the scope of this key.
 Each column contains tuples of names and values, ordered and comma-separated.
NoSQL DATA MODEL : COLUMN FAMILY
HOW TO WRITE IT?
 Row-oriented: each row is an aggregate (for example, the customer with the ID Row1), with column families representing useful chunks of data (country, product, sales) within that aggregate.
 Column-oriented: each column family defines a record type (e.g. country) with rows for each of the records. You then think of a row as the join of records in all column families.
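The two views can be sketched with nested dictionaries: row key, then column family, then column name and value. The class, its method names and the sample values are hypothetical. Only the row key Row1 and the family names country and sales are taken from the example above.

```python
from collections import defaultdict


class ColumnFamilyStore:
    """A toy layout: row key -> column family -> {column name: value}."""

    def __init__(self):
        self._rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row_key, family, column, value):
        self._rows[row_key][family][column] = value

    def get_row(self, row_key):
        """Row-oriented view: the whole aggregate stored under one row key."""
        families = self._rows.get(row_key, {})
        return {family: dict(cols) for family, cols in families.items()}

    def get_family(self, family):
        """Column-oriented view: one record per row for a single family."""
        return {
            row_key: dict(families[family])
            for row_key, families in self._rows.items()
            if family in families
        }


store = ColumnFamilyStore()
store.put("Row1", "country", "name", "Malaysia")  # sample values are made up
store.put("Row1", "sales", "2023", 1200)
store.put("Row2", "country", "name", "Japan")

print(store.get_row("Row1"))
print(store.get_family("country"))
```

Row2 has no sales family at all, and nothing has to be declared up front for that to be valid.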
NoSQL DATA MODEL : GRAPH

 Uses graph structures with nodes, edges and properties.
 Nodes are organised based on their relationships with one another.
 These relationships are represented by edges between the nodes.
 A relationship can define, for example, social connectivity.
 Both nodes and relationships have defined properties.
 Example: Neo4j.
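A property graph can be sketched in a few lines. The class, the FOLLOWS relationship and the people in it are hypothetical. This is not how Neo4j stores or queries data, only an illustration of nodes, edges and properties.

```python
class Graph:
    """A toy property graph: nodes and directed edges both carry properties."""

    def __init__(self):
        self.nodes = {}  # node id -> properties
        self.edges = []  # (source, relationship, target, properties)

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = properties

    def add_edge(self, source, relationship, target, **properties):
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must be existing nodes")
        self.edges.append((source, relationship, target, properties))

    def neighbours(self, node_id, relationship):
        """Targets reachable from `node_id` over one edge of the given type."""
        return [
            target
            for source, rel, target, _ in self.edges
            if source == node_id and rel == relationship
        ]


graph = Graph()
graph.add_node("alice", age=30)
graph.add_node("bob", age=25)
graph.add_node("carol", age=41)
graph.add_edge("alice", "FOLLOWS", "bob", since=2020)
graph.add_edge("bob", "FOLLOWS", "carol", since=2022)

# Two-hop traversal: whom do the people Alice follows follow?
second_hop = [
    person
    for friend in graph.neighbours("alice", "FOLLOWS")
    for person in graph.neighbours(friend, "FOLLOWS")
]
print(second_hop)
```

The two-hop question is answered by walking edges, where a relational model would need a self-join per hop.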
NoSQL DATA MODEL : DOCUMENT

 A collection of key-value pairs in which the stored values (referred to as “documents”) provide some structure and encoding of the managed data, e.g. XML, JSON, BSON. The unique key is a simple identifier (string, URI, path).
 Instead of storing each attribute of an entity under a separate key, document databases store multiple attributes in a single document.
 While key-value stores require the key to access a data value, a document store has metadata that allows direct access to an attribute instead of going through a key.
 Examples: CouchDB, MongoDB.
NoSQL DATA MODEL : DOCUMENT

 Pairs each key with a complex data structure known as a document.
 Indexes are implemented via B-trees.
 Documents can contain many different key-value pairs, key-array pairs, or even nested documents.
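A document with key-value pairs, a key-array pair and a nested document can be sketched as below, together with a lookup by attribute rather than by key. The identifiers, field names and the get_path and find helpers are hypothetical. The scan over every document is a simplification: a real document store would normally answer such a query from an index.

```python
import json

# Hypothetical documents, each stored under a simple identifier.
collection = {
    "customer:1": {
        "name": "Aisha",
        "address": {"city": "Penang", "postcode": "10200"},  # nested document
        "orders": [{"item": "laptop", "qty": 1}, {"item": "mouse", "qty": 2}],
    },
    "customer:2": {
        "name": "Ben",
        "address": {"city": "Johor Bahru"},  # no postcode, and no orders
    },
}


def get_path(document, path):
    """Follow a dotted path such as 'address.city'; None if it is missing."""
    current = document
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def find(docs, path, value):
    """Return the keys of all documents whose attribute at `path` equals `value`."""
    return sorted(key for key, doc in docs.items() if get_path(doc, path) == value)


print(find(collection, "address.city", "Penang"))

# Documents are typically encoded as JSON (or BSON, XML) for storage.
encoded = json.dumps(collection["customer:1"])
decoded = json.loads(encoded)
```

The two documents have different shapes, yet both can be queried through the same attribute path.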
EXERCISES
