BIG DATA TECHNOLOGY
SAHANA SHETTY, DEPT. OF CSE, FET
CONTENTS
• Big Data Technology-I: The elephant in the room: Hadoop's parallel world, old vs. new approaches; Data discovery: work the way people's minds work; Open-source technology for big data analytics; The cloud and big data; Predictive analytics moves into the limelight; Software-as-a-service BI; Mobile business intelligence is going mainstream; Ease of mobile application deployment; Crowdsourcing analytics; Inter- and trans-firewall analytics.
[Timeline figure: 2003, 2004, 2006]
• Hadoop: an open-source software framework that supports data-intensive distributed applications, licensed under the Apache v2 license. Equivalently: an open-source platform for the storage and processing of diverse data types that enables data-driven enterprises to rapidly derive the complete value from all their data.
• Goals / Requirements:
• Fault-tolerance
The two critical components of Hadoop are:
1. The Hadoop Distributed File System (HDFS):
HDFS is the primary data storage system used by Hadoop applications. It employs a NameNode and DataNode architecture to implement a distributed file system that provides high-performance access to data across highly scalable Hadoop clusters.
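The NameNode/DataNode split described above can be sketched in plain Python. This is an illustrative toy model, not the real HDFS API: the class names and methods here are invented for the sketch. The key idea is that the NameNode holds only metadata (which blocks make up a file, and where they live), while DataNodes hold the actual bytes.

```python
# Toy sketch of the HDFS NameNode/DataNode architecture (illustrative only;
# these classes and methods are invented for the example, not the HDFS API).

class NameNode:
    """Holds metadata: filename -> list of (block_id, [datanode ids])."""
    def __init__(self):
        self.block_map = {}

    def add_file(self, filename, blocks):
        self.block_map[filename] = blocks

    def locate(self, filename):
        # The client asks here first, then reads bytes from DataNodes.
        return self.block_map[filename]

class DataNode:
    """Stores the actual block bytes: block_id -> data."""
    def __init__(self, node_id):
        self.node_id = node_id
        self.storage = {}

    def write_block(self, block_id, data):
        self.storage[block_id] = data

    def read_block(self, block_id):
        return self.storage[block_id]

# Write path: block bytes go to DataNodes, metadata goes to the NameNode.
nn = NameNode()
dns = {i: DataNode(i) for i in range(3)}
dns[0].write_block("blk_1", b"hello ")
dns[1].write_block("blk_2", b"hdfs")
nn.add_file("/user/demo.txt", [("blk_1", [0]), ("blk_2", [1])])

# Read path: ask the NameNode for block locations, then fetch each block
# directly from the DataNode that stores it.
data = b"".join(dns[locs[0]].read_block(bid)
                for bid, locs in nn.locate("/user/demo.txt"))
```

Note that file data never flows through the NameNode; it only answers "where are the blocks?", which is why a single NameNode can coordinate a very large cluster.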
2. MapReduce
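The MapReduce programming model can be sketched in plain Python with the classic word-count example: a map phase emits (key, value) pairs, a shuffle groups values by key (the framework does this in real Hadoop), and a reduce phase aggregates each group. This is a single-machine sketch of the model, not Hadoop's actual API.

```python
# Single-machine sketch of the MapReduce model (word count).
from collections import defaultdict

def map_phase(document):
    # Emit (word, 1) for every word, as a Hadoop mapper would.
    for word in document.split():
        yield (word, 1)

def shuffle(pairs):
    # Group values by key; in real Hadoop the framework does this
    # between the map and reduce phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Sum the counts for each word, as a Hadoop reducer would.
    return {word: sum(counts) for word, counts in groups.items()}

docs = ["big data big ideas", "big clusters"]
pairs = [pair for doc in docs for pair in map_phase(doc)]
counts = reduce_phase(shuffle(pairs))
# counts -> {"big": 3, "data": 1, "ideas": 1, "clusters": 1}
```

Because each map call touches only one document and each reduce call only one key's group, the framework can run many mappers and reducers in parallel across a cluster, which is the whole point of the model.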
Datanode
A datanode is commodity hardware running the GNU/Linux operating system and the datanode software. For every node (commodity hardware/system) in a cluster, there will be a datanode. These nodes manage the data storage of their system.
• Datanodes perform read-write operations on the file systems, as per client requests.
• They also perform operations such as block creation, deletion, and replication according to the instructions of the namenode.
Block
Generally, user data is stored in the files of HDFS. A file in the file system is divided into one or more segments, which are stored in individual datanodes. These file segments are called blocks. In other words, the minimum amount of data that HDFS can read or write is called a block. The default block size is 64 MB, but it can be changed as needed in the HDFS configuration.
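The block-splitting rule above is simple enough to sketch directly. This toy uses an 8-byte block size so the result is easy to follow; real HDFS defaults to 64 MB per the text, and the mechanism is the same: every block is full-size except possibly the last.

```python
# Sketch of HDFS-style block splitting. BLOCK_SIZE is shrunk to 8 bytes
# for readability; the slide's HDFS default would be 64 * 1024 * 1024.
BLOCK_SIZE = 8

def split_into_blocks(data, block_size=BLOCK_SIZE):
    """Divide a byte string into block-size segments; the last may be smaller."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

blocks = split_into_blocks(b"a" * 20)
# 20 bytes with an 8-byte block size -> segments of 8, 8, and 4 bytes.
```

A large block size keeps the NameNode's metadata small (fewer blocks to track per file) and amortizes disk-seek cost, which is why HDFS blocks are so much larger than ordinary filesystem blocks.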
Data discovery: the term used to describe the new wave of business intelligence that enables users to explore data, make discoveries, and uncover insights in a dynamic and intuitive way, versus predefined queries and preconfigured drill-down dashboards.
• This approach has resonated with many business users who are looking for the freedom and flexibility to view Big Data.
• Tableau Software and QlikTech International are two examples. These companies' approach to the market is much different from that of traditional BI software vendors. They grew through a sales model that many refer to as "land and expand."
Open-Source Technology for Big Data Analytics