Professional Documents
Culture Documents
(EMTE1012)
Chapter 2
• BY: Zerihun T.
• E-mail: zerish201@gmail.com
• 2021/22
Outlines of Discussion
What is Data Science
What are Data and Information
Data processing cycle
Data Science and their representation
Data Value Chain
Basic concept of Big Data
Cluster Computing
Hadoop Ecosystem (High Availability Distributed Object Oriented
Platform)
Big Data life Cycle with Hadoop
Input
Processing
Output
Integers(int)-
Booleans(bool)- Metadata
Characters(char)-) Structured data
Floating-point Semi-structured
numbers(float)- data, and
Alphanumeric Unstructured data
strings(string)
Information that either does not have a predefined data model or is not organized in a pre-
defined manner.
Typically text-heavy but may contain data such as dates, numbers, and facts as well.
This results in irregularities and ambiguities that make it difficult to understand using
Common examples of unstructured data include Word, PDF, Text ,audio, video files or No-
SQL databases.
O U
Y
N K
H A
T