
Module 2

Introduction to Big Data Frameworks:


Hadoop, NoSQL

Big Data Analytics


BEITC801 Prof. Priyanka Bandagale
FAMT, Ratnagiri
Learning Objectives
• In this lesson you will learn about:
• Distributed system and challenges
• Hadoop Introduction
• History
• The Hadoop Ecosystem
Big Data Challenges
Solution: Distributed system
New Approach to Distributed Computing
Hadoop:
A scalable fault-tolerant distributed system for data storage and
processing
• Data is distributed across the cluster as it is stored
• Processing runs where the data is located
• Data is replicated for fault tolerance
Hadoop Introduction
• Apache Hadoop is an open-source software framework for
storage and large-scale processing of data-sets on clusters of
commodity hardware.
• Some of the characteristics:
- Open source
- Distributed processing
- Distributed storage
- Scalable
- Reliable
- Fault-tolerant
- Economical
- Flexible
History
• Originally built as an infrastructure for the “Nutch” project.
• Based on Google’s MapReduce and Google File System papers.
• Created by Doug Cutting in 2005 at Yahoo
• Named after his son’s toy yellow elephant.
Core Hadoop Components
1. Hadoop common package
2. Hadoop Distributed File System (HDFS)
3. Hadoop MapReduce
4. Hadoop Yet Another Resource Negotiator (YARN)
HDFS
• Hadoop Distributed File System (HDFS) is designed to reliably
store very large files across machines in a large cluster. It is
inspired by the Google File System.
• Distributes a large data file into blocks
• Blocks are managed by different nodes in the cluster
• Each block is replicated on multiple nodes
• The NameNode stores metadata about the files and their blocks
Nodes
• NameNode:
• Master of the system
• Maintains and manages the blocks which are present
on the DataNodes

• DataNodes:
• Slaves which are deployed on each machine and
provide the actual storage
• Responsible for serving read and write requests for
the clients
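
To make the master/slave split concrete, the toy Python sketch below (illustrative only; the names and structures are invented and are not Hadoop's actual API) shows the division of labour: the NameNode holds only metadata, i.e. which blocks make up a file and where their replicas live, while the DataNodes hold the block contents.

# Toy model of the NameNode/DataNode split -- illustrative only, not Hadoop code.

# NameNode side: metadata only (file -> blocks -> replica locations).
namenode_metadata = {
    "/logs/access.log": {
        "blk_0001": ["datanode1", "datanode3", "datanode4"],  # 3 replicas
        "blk_0002": ["datanode2", "datanode3", "datanode5"],
    }
}

# DataNode side: the actual block bytes stored on each machine's local disks.
datanode_storage = {
    "datanode1": {"blk_0001": b"...block bytes..."},
    "datanode2": {"blk_0002": b"...block bytes..."},
    # ...the other replicas live on datanode3/4/5
}

def read_file(path):
    # A client first asks the NameNode for the block locations,
    # then reads each block directly from one of its DataNodes.
    for block_id, replicas in namenode_metadata[path].items():
        node = replicas[0]  # real clients pick the closest healthy replica
        yield datanode_storage.get(node, {}).get(block_id)

Because the NameNode never stores file contents, only this kind of mapping, it can act as the single master of the system while the DataNodes do the heavy lifting of storing and serving blocks.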
NameNode and DataNode Block Replication
How is a 400 MB file saved on HDFS with an HDFS block size of 100 MB?
Example
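
A worked answer, assuming HDFS's default replication factor of 3 (the slide does not state the replication factor, so treat it as an assumption): the 400 MB file is split into 4 blocks of 100 MB each, and every block is written to 3 different DataNodes, so 12 block replicas (1200 MB of raw storage) end up on the cluster.

import math

file_size_mb = 400
block_size_mb = 100      # block size given in the example
replication_factor = 3   # assumed HDFS default (configurable via dfs.replication)

num_blocks = math.ceil(file_size_mb / block_size_mb)   # 4 blocks
total_replicas = num_blocks * replication_factor       # 12 block replicas on the cluster
raw_storage_mb = file_size_mb * replication_factor     # 1200 MB of raw disk used

print(num_blocks, total_replicas, raw_storage_mb)      # 4 12 1200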
Map Reduce
• The input data set is split into independent chunks.
• The Mapper:
- Each block is processed in isolation by a map task called the mapper
- The map task runs on the node where the block is stored
• The Reducer:
- Consolidates the results from the different mappers
- Produces the final output
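
As a concrete illustration (word count is the usual teaching example; it is not taken from these slides), the sketch below imitates the two phases in plain Python: the mapper turns every input line into (word, 1) key-value pairs, and the reducer consolidates the pairs that share a key into a final count. In a real Hadoop job, the same two functions would be written against the MapReduce or Streaming API and run on the nodes holding the data blocks.

# wordcount.py -- minimal MapReduce-style word count (runs locally, no Hadoop needed).
from itertools import groupby
from operator import itemgetter

def mapper(lines):
    # Map phase: emit a (word, 1) key-value pair for every word in the input.
    for line in lines:
        for word in line.strip().split():
            yield (word.lower(), 1)

def reducer(pairs):
    # Reduce phase: sum the counts for each word (pairs must be grouped by key).
    for word, group in groupby(sorted(pairs), key=itemgetter(0)):
        yield (word, sum(count for _, count in group))

if __name__ == "__main__":
    data = ["the quick brown fox", "the lazy dog", "the fox"]
    print(dict(reducer(mapper(data))))
    # {'brown': 1, 'dog': 1, 'fox': 2, 'lazy': 1, 'quick': 1, 'the': 3}

The sorted() call stands in for Hadoop's shuffle phase, which groups all pairs with the same key together before they reach a reducer.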
MapReduce Programming Phases and Daemons

Map Reduce Framework

Phases:
• Map: converts the input into key-value pairs
• Reduce: combines the output of the mappers and produces the result set

Daemons:
• JobTracker (Master): schedules tasks
• TaskTracker (Slave): executes tasks
• JobTracker:
- Takes care of all job scheduling and assigns tasks to TaskTrackers.
• TaskTracker:
- A node in the cluster that accepts tasks (Map, Reduce, and Shuffle operations) from the JobTracker.
Map Reduce Architecture
Versions of Hadoop
YARN
Hadoop Ecosystem
• HDFS -> Hadoop Distributed File System
• YARN -> Yet Another Resource Negotiator
• MapReduce -> Data processing using programming
• Spark -> In-memory Data Processing
• Pig, Hive -> Data processing services using SQL-like queries
• HBase -> NoSQL Database
• Mahout, Spark MLlib -> Machine Learning
• Apache Drill -> SQL on Hadoop
• Zookeeper -> Managing Cluster
• Oozie -> Job Scheduling
• Flume, Sqoop -> Data Ingestion Services
• Solr & Lucene -> Searching & Indexing
• Ambari -> Provision, Monitor and Maintain cluster
