
The Data Model for Big Data
Hello!
I am Minakshi Gogoi
You can find me at
minakshi_cse@gimt-guwahati.ac.in
Contents
● Data model for Big Data.

● The properties of data.


Let’s start with the first set of slides
Data Model
Figure 1: Data model: Master dataset in the Lambda architecture
[Diagram labels: New data (011010010..); Batch layer: Master dataset; Speed layer: Realtime views; Serving layer: Batch views; Query: "How many…?"]
● (i) “The master dataset is the source of truth in your system and cannot withstand corruption.”
● (ii) The data in the speed layer realtime views has a high turnover rate, so any errors are quickly expelled.
● (iii) Any errors introduced into the serving layer batch views are overwritten because they are continually rebuilt from the master dataset.
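The three points above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the pageview records, the function names, and the use of in-memory lists and counters are hypothetical stand-ins, not part of the slides or of any particular framework.

```python
from collections import Counter

# Hypothetical master dataset: an append-only list of raw pageview records.
master_dataset = [
    {"url": "/home", "ts": 1},
    {"url": "/about", "ts": 2},
    {"url": "/home", "ts": 3},
]


def build_batch_view(records):
    """Recompute the entire batch view from the master dataset."""
    return Counter(r["url"] for r in records)


def update_realtime_view(view, record):
    """Fold one newly arrived record into the realtime view."""
    view[record["url"]] += 1


def query_pageviews(url, batch_view, realtime_view):
    """Answer 'How many...?' by merging the batch and realtime views."""
    return batch_view[url] + realtime_view[url]


batch_view = build_batch_view(master_dataset)
realtime_view = Counter()

# New data that the batch layer has not absorbed yet (illustrative only).
update_realtime_view(realtime_view, {"url": "/home", "ts": 4})

# Simulate an error in the serving layer: the batch view is corrupted...
batch_view["/home"] = 999
# ...and the error disappears at the next rebuild from the master dataset.
batch_view = build_batch_view(master_dataset)
```

The rebuild only repairs the view because the master dataset itself is intact, which is why it is the one component that cannot withstand corruption.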
The properties of data
Some terminology…
● Information is the general collection of knowledge relevant to your Big Data system.
● Data refers to the information that can’t be derived from anything else. Data serves as the axioms from which everything else derives.
● Queries are questions you ask of your data.
● Views are information that has been derived from your base data. They are built to assist with answering specific types of queries.
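A small Python sketch may help separate these four terms. The example is hypothetical: the records, names, and the choice of "age" as the derived quantity are assumptions made for illustration. A birth date cannot be derived from anything else, so it is data; an age can be derived from it, so it belongs in a view.

```python
from datetime import date

# Data: raw facts that cannot be derived from anything else (hypothetical records).
people = [
    {"name": "Asha", "born": date(1990, 5, 17)},
    {"name": "Ravi", "born": date(2001, 12, 1)},
]


def build_age_view(records, today):
    """View: ages derived from the base data, built to serve age queries."""
    view = {}
    for r in records:
        born = r["born"]
        # Subtract one year if this year's birthday has not happened yet.
        not_yet = (today.month, today.day) < (born.month, born.day)
        view[r["name"]] = today.year - born.year - int(not_yet)
    return view


def query_age(view, name):
    """Query: a question asked of the data, answered from the view."""
    return view[name]


age_view = build_age_view(people, today=date(2024, 5, 16))
```

Storing the age instead of the birth date would make the stored value go stale and would make the birth date unrecoverable, which is the sense in which data "serves as the axioms".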
Data is raw
● A data system answers questions about information you’ve acquired in the past.
○ When designing your Big Data system, you want to be able to answer as many questions as possible.
UNSTRUCTURED DATA IS RAWER THAN NORMALIZED DATA
● When deciding what raw data to store, a common hazy area is the line between parsing and semantic normalization. Semantic normalization is the process of reshaping free-form information into a structured form of data.
● MORE INFORMATION DOESN’T NECESSARILY MEAN RAWER DATA
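As a rough illustration of semantic normalization, the sketch below keeps the free-form text as the stored data and treats normalization as a function applied on top of it. The profile records, the alias table, and the function name are all hypothetical; a real normalizer would be far more involved.

```python
# Hypothetical raw data: free-form location strings exactly as users typed them.
raw_profiles = [
    {"user": "u1", "location": "SF"},
    {"user": "u2", "location": "San Francisco, CA"},
    {"user": "u3", "location": "north beach"},
]

# Hypothetical alias table standing in for a real normalization algorithm.
aliases = {
    "sf": "San Francisco, CA, USA",
    "san francisco, ca": "San Francisco, CA, USA",
}


def normalize_location(free_form):
    """Reshape a free-form string into a structured place name, or None."""
    return aliases.get(free_form.strip().lower())


first_pass = [normalize_location(p["location"]) for p in raw_profiles]

# The normalizer improves later; because the raw strings were stored,
# the improved version can simply be re-run over the same data.
aliases["north beach"] = "San Francisco, CA, USA"
second_pass = [normalize_location(p["location"]) for p in raw_profiles]
```

Had only the first-pass output been stored, the unrecognized "north beach" entry would have been lost as `None`, with nothing left to re-normalize.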
Data is immutable
● Human-fault tolerance
○ With a mutable data model, a mistake can cause data to be lost, because values are actually overridden in the database. With an immutable data model, no data can be lost.
● Simplicity
○ Mutable data models imply that the data must be indexed in some way so that specific data objects can be retrieved and updated. In contrast, with an immutable data model you only need the ability to append new data units to the master dataset.
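The contrast between the two models can be sketched as follows. This is a minimal illustration, assuming a hypothetical "user location" dataset held in memory; the names and the tuple layout are not from the slides.

```python
# Mutable model: one value per key, updated in place.
mutable_store = {"tom": "San Francisco"}
mutable_store["tom"] = "Los Angeles"  # the previous value is overwritten and gone

# Immutable model: the only write operation is an append.
location_log = []


def record_location(log, user, city, ts):
    """Append a new data unit; existing entries are never touched."""
    log.append((user, city, ts))


def current_location(log, user):
    """Derive the latest known city for a user from the full history."""
    entries = [e for e in log if e[0] == user]
    if not entries:
        return None
    return max(entries, key=lambda e: e[2])[1]


record_location(location_log, "tom", "San Francisco", ts=1)
record_location(location_log, "tom", "Los Angeles", ts=2)
```

In the mutable store a mistaken write destroys the old value, whereas in the log a mistaken entry sits alongside the good ones and the history before it is still available. The write path also needs no index, since it never has to locate an existing record.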
Data is eternally true
● The key consequence of immutability is that each piece of data is true in perpetuity.
● That is, a piece of data, once true, must always be true.
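One way to make a fact "eternally true" is to tag it with the moment at which it was true. The slides do not spell this out, so treat the timestamp field below, along with the class and function names, as assumptions of this sketch.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class LocationFact:
    """A fact of the form 'user lived in city as of time as_of' (hypothetical)."""
    user: str
    city: str
    as_of: int


facts = [
    LocationFact("tom", "San Francisco", as_of=1),
    LocationFact("tom", "Los Angeles", as_of=2),
]


def location_as_of(all_facts, user, ts):
    """Return the city that was true for the user at time ts, if known."""
    known = [f for f in all_facts if f.user == user and f.as_of <= ts]
    if not known:
        return None
    return max(known, key=lambda f: f.as_of).city


# Frozen dataclasses reject in-place changes, mirroring immutability.
try:
    facts[0].city = "Boston"
    mutation_rejected = False
except FrozenInstanceError:
    mutation_rejected = True
```

The second fact does not invalidate the first: "Tom lived in San Francisco as of time 1" stays true after he moves, and the move is recorded as a new fact rather than as an update.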
Thanks !
