Welcome to Scribd!

Experiment No.:4: Aim: Word Count Using Mapreduce Theory

Uploaded by

0% found this document useful (0 votes)

8 views6 pages

The document describes an experiment on word count using MapReduce. It explains that the mapper maps keys to values by counting word frequencies, outputting tuples of words and counts. The reducer then combines the tuples to produce final word counts. It outlines the MapReduce workflow including splitting text, mapping words to counts, intermediate splitting of tuples across clusters, reducing by combining counts, and combining results. The conclusion states that word count was successfully implemented using MapReduce techniques to process large datasets in a scalable way.

Original Description:

Original Title

BDA_EXP-4_B-2_47

Copyright

Available Formats

DOCX, PDF, TXT or read online from Scribd

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Report this Document

Copyright:

Available Formats

Download as DOCX, PDF, TXT or read online from Scribd

Flag for inappropriate content

0% found this document useful (0 votes)

8 views6 pages

Experiment No.:4: Aim: Word Count Using Mapreduce Theory

Uploaded by

Harshita Mandloi

Copyright:

Available Formats

Download as DOCX, PDF, TXT or read online from Scribd

Flag for inappropriate content

Jump to Page

You are on page 1of 6

Search inside document

Name- Harshita Mandloi

BE IT-1

BATCH- B-2

Roll no- 47

Experiment No.:4
Aim: Word Count Using MapReduce
Theory:
In MapReduce word count example, we find out the frequency of each word.
Here, the role of Mapper is to map the keys to the existing values and the role of
Reducer is to aggregate the keys of common values. So, everything is
represented in the form of Key-value pair.

In Hadoop, MapReduce is a computation that decomposes large manipulation

jobs into individual tasks that can be executed in parallel across a cluster of
servers. The results of tasks can be joined together to compute final results.

MapReduce consists of 2 steps:

● Map Function – It takes a set of data and converts it into another set of
data, where individual elements are broken down into tuples (Key-Value
pair).
● Example – (Map function in Word Count)

I Set of data Bus, Car, bus, car, train, car, bus, car, train, bus,
TRAIN,BUS, buS, caR, CAR, car, BUS, TRAIN
Name- Harshita Mandloi

BE IT-1

BATCH- B-2

Roll no- 47

O Convert Into (Bus,1), (Car,1), (bus,1), (car,1), (train,1),

Another set of
data (car,1), (bus,1), (car,1), (train,1), (bus,1),

(Key,Value) (TRAIN,1),(BUS,1), (buS,1), (caR,1), (CAR,1),

(car,1), (BUS,1), (TRAIN,1)

● Reduce Function – Takes the output from Map as an input and combines
those data tuples into a smaller set of tuples.
● Example – (Reduce function in Word Count)

Input Set of Tuples (Bus,1), (Car,1), (bus,1),

(car,1), (train,1),

(output of
Map (car,1), (bus,1), (car,1),
function) (train,1), (bus,1),

(TRAIN,1),(BUS,1), (buS,1),
(caR,1), (CAR,1),

(car,1), (BUS,1), (TRAIN,1)

Name- Harshita Mandloi

BE IT-1

BATCH- B-2

Roll no- 47

Output Converts into smaller (BUS,7),

set of tuples
(CAR,7),

(TRAIN,4)

Work Flow of the Program

Workflow of MapReduce consists of 5 steps:

1. Splitting – The splitting parameter can be anything, e.g. splitting by

space, comma, semicolon, or even by a new line (‘\n’).
2. Mapping – as explained above.
Name- Harshita Mandloi

BE IT-1

BATCH- B-2

Roll no- 47

3. Intermediate splitting – the entire process in parallel on different

clusters. In order to group them in “Reduce Phase” the similar KEY data
should be on the same cluster.
4. Reduce – it is nothing but mostly group by phase.
5. Combining – The last phase where all the data (individual result set from
each cluster) is combined together to form a result.

Output:
Name- Harshita Mandloi

BE IT-1

BATCH- B-2

Roll no- 47
Name- Harshita Mandloi

BE IT-1

BATCH- B-2

Roll no- 47

Lab Outcome: Construct scalable algorithms for large Datasets using

Map Reduce techniques
Conclusion: Hence we successfully executed the Word Count using
MapReduce

CICS Cheat Sheet
Document8 pages
CICS Cheat Sheet
Abhimanyu Hooda
No ratings yet
Lisp Programming Language
From Everand
Lisp Programming Language
Faiz ul haque Zeya
No ratings yet
SLT in SAP HANA SAP SOURCE SYSTEM
Document7 pages
SLT in SAP HANA SAP SOURCE SYSTEM
amcreddy
No ratings yet
Computer Vision in Vehicle Technology: Land, Sea, and Air
From Everand
Computer Vision in Vehicle Technology: Land, Sea, and Air
Antonio M. López
No ratings yet
Example - (Map Function in Word Count)
Document6 pages
Example - (Map Function in Word Count)
Dileep Reddy
No ratings yet
Map Reduce
Document57 pages
Map Reduce
Shreya Sonar
No ratings yet
Word Count Program With MapReduce and Java
Document6 pages
Word Count Program With MapReduce and Java
Ria Khare
No ratings yet
Word Count Program With MapReduce and Java
Document6 pages
Word Count Program With MapReduce and Java
sheenu george
No ratings yet
Fundamentals of MapReduce With Example
Document2 pages
Fundamentals of MapReduce With Example
Sanidhya Singh Rajawat
No ratings yet
TSP
Document22 pages
TSP
Horacio Martin Sandoval Claeyssen
No ratings yet
2011 Hyunjung Park VLDB Map-Reduce, RAMP, Big Data
Document4 pages
2011 Hyunjung Park VLDB Map-Reduce, RAMP, Big Data
Sunil Thalia
No ratings yet
New Operators of Genetic Algorithms For Traveling Salesman Problem
Document4 pages
New Operators of Genetic Algorithms For Traveling Salesman Problem
Sam Davis
No ratings yet
Mapreduce Design Patterns: Barry Brumitt
Document33 pages
Mapreduce Design Patterns: Barry Brumitt
debojyotis
No ratings yet
Research Paper - Map Reduce - CSC3323
Document16 pages
Research Paper - Map Reduce - CSC3323
AliElabridi
No ratings yet
Mapreduce: Theory and Implementation: Cse 490H - Intro To Distributed Computing, Modified by George Lee
Document33 pages
Mapreduce: Theory and Implementation: Cse 490H - Intro To Distributed Computing, Modified by George Lee
c1099775
No ratings yet
Distributed Computing Seminar: Lecture 5: Graph Algorithms & Pagerank
Document33 pages
Distributed Computing Seminar: Lecture 5: Graph Algorithms & Pagerank
Anuroopa Pai M
No ratings yet
Distributed Arithmetic (Da)
Document13 pages
Distributed Arithmetic (Da)
Anonymous 1kEh0GvTgi
No ratings yet
Hadoop Interview Questions Faq
Document14 pages
Hadoop Interview Questions Faq
mihirhota
No ratings yet
UNIT 2-tt1
Document7 pages
UNIT 2-tt1
avm5439
No ratings yet
Join Algorithms Using MapReduce
Document22 pages
Join Algorithms Using MapReduce
Marwa Hussien
No ratings yet
Unit 3 MapReduce Part 1
Document12 pages
Unit 3 MapReduce Part 1
Ruparel Education Pvt. Ltd.
No ratings yet
Paper Dvi
Document7 pages
Paper Dvi
jefferyleclerc
No ratings yet
Unit 3 - Big Data Technologies
Document42 pages
Unit 3 - Big Data Technologies
prakash N
No ratings yet
Plotting On Google Static Maps in R: Markus Loecher Sense Networks December 13, 2010
Document7 pages
Plotting On Google Static Maps in R: Markus Loecher Sense Networks December 13, 2010
Phellipe Cruz
No ratings yet
Mapreduce Tutorial
Document1 page
Mapreduce Tutorial
p001
No ratings yet
BDA Assignment No-2 B-2 47
Document14 pages
BDA Assignment No-2 B-2 47
Harshita Mandloi
No ratings yet
B3 - 322070 - Sai Kamble - Assig - 4
Document6 pages
B3 - 322070 - Sai Kamble - Assig - 4
sai.22220063
No ratings yet
Understanding Inputs and Outputs of Mapreduce
Document13 pages
Understanding Inputs and Outputs of Mapreduce
Divya Panta
No ratings yet
HPC Project: :travelling Salesman Problem : Devarsh Sheth (201301423) Jaimin Khanderia (201301424)
Document11 pages
HPC Project: :travelling Salesman Problem : Devarsh Sheth (201301423) Jaimin Khanderia (201301424)
Amber Jordan
No ratings yet
Map Reduce Tutorial-1
Document7 pages
Map Reduce Tutorial-1
jefferyleclerc
No ratings yet
BDA Unit 3 Notes
Document11 pages
BDA Unit 3 Notes
duraisamy
No ratings yet
Full Paper For The Conference
Document4 pages
Full Paper For The Conference
suresh_me2004
No ratings yet
Understanding MapReduce
Document15 pages
Understanding MapReduce
gopikrishna
No ratings yet
Mapreduce Class Notes
Document43 pages
Mapreduce Class Notes
kowsalya
No ratings yet
Unit 4 2 - CC
Document6 pages
Unit 4 2 - CC
teja v
No ratings yet
Tufte in R
Document29 pages
Tufte in R
mr blah
No ratings yet
Ai PRA7
Document2 pages
Ai PRA7
Joshi Labhesh
No ratings yet
Hadoop Python MapReduce Tutorial For Beginners
Document15 pages
Hadoop Python MapReduce Tutorial For Beginners
Ahmed Mohamed
No ratings yet
Hierarchical Mapreduce Programming Model and Scheduling Algorithms
Document6 pages
Hierarchical Mapreduce Programming Model and Scheduling Algorithms
Sudhansu Shekhar Patra
No ratings yet
BDA-MapReduce (1) 5rfgy656yhgvcft6
Document60 pages
BDA-MapReduce (1) 5rfgy656yhgvcft6
Ayush Jha
No ratings yet
Mapreduce Algorithms: Cse 490H
Document49 pages
Mapreduce Algorithms: Cse 490H
priya_z3z
No ratings yet
Major Project
Document78 pages
Major Project
Manas Mehrotra
No ratings yet
22MSM40206 Data Visualisation
Document13 pages
22MSM40206 Data Visualisation
hejoj76652
No ratings yet
Hadoop Wordcount Program
Document20 pages
Hadoop Wordcount Program
Arjun S
No ratings yet
Unit 3 MapReduce Part 2
Document12 pages
Unit 3 MapReduce Part 2
Ruparel Education Pvt. Ltd.
No ratings yet
Activity-1: Student Name USN
Document16 pages
Activity-1: Student Name USN
Trishala Kumari
No ratings yet
Ramp: A System For Capturing and Tracing Provenance in Mapreduce Workflows
Document4 pages
Ramp: A System For Capturing and Tracing Provenance in Mapreduce Workflows
lswjx
No ratings yet
Hadoop Karunesh
Document14 pages
Hadoop Karunesh
Mukul Mishra
No ratings yet
The Mapreduce Programming Model
Document64 pages
The Mapreduce Programming Model
Imran Khan
No ratings yet
Robust Bi-Objective Shortest Path Problem in Real Road Networks
Document9 pages
Robust Bi-Objective Shortest Path Problem in Real Road Networks
mrjockey
No ratings yet
A Parallel Ant Colony Optimization Algorithm With GPU-Acceleration Based On All-In-Roulette Selection
Document5 pages
A Parallel Ant Colony Optimization Algorithm With GPU-Acceleration Based On All-In-Roulette Selection
Jie Fu
No ratings yet
CE 3025: Homework On Transporation Planning: Instructions
Document4 pages
CE 3025: Homework On Transporation Planning: Instructions
SUBHAM SAGAR
No ratings yet
Extended K-Map For Minimizing Multiple Output Logic Circuits
Document8 pages
Extended K-Map For Minimizing Multiple Output Logic Circuits
Anonymous e4UpOQEP
No ratings yet
T Drive
Document28 pages
T Drive
Meenachi Sundaram
No ratings yet
Natural Computing - Practical Assignment: Solving Traveling Salesman Problems With Nature-Inspired Algorithms
Document4 pages
Natural Computing - Practical Assignment: Solving Traveling Salesman Problems With Nature-Inspired Algorithms
Hotland Sitorus
No ratings yet
Lecture Notes Map Reduce
Document24 pages
Lecture Notes Map Reduce
Yuvaraj V, Assistant Professor, BCA
No ratings yet
Map Reduce Algorithm
Document4 pages
Map Reduce Algorithm
Leela Rallapudi
No ratings yet
Mapreduce Types and Formats
Document65 pages
Mapreduce Types and Formats
Tejaswini Karmakonda
No ratings yet
June 19th 2009
Document71 pages
June 19th 2009
api-24647689
No ratings yet
A Star: Fundamentals and Applications
From Everand
A Star: Fundamentals and Applications
Fouad Sabry
No ratings yet
Scanline Rendering: Exploring Visual Realism Through Scanline Rendering Techniques
From Everand
Scanline Rendering: Exploring Visual Realism Through Scanline Rendering Techniques
Fouad Sabry
No ratings yet
Schedule of Experiment: Date
Document4 pages
Schedule of Experiment: Date
Pratiksha Kamble
No ratings yet
Whitepaper - Upgrading Elasticsearch With Zero Downtime
Document20 pages
Whitepaper - Upgrading Elasticsearch With Zero Downtime
amolpol98
No ratings yet
This Folder Contains Tcexam Installation Files. Please Delete This Directory After Use. Read Install - HTM File For Installation Instructions
Document1 page
This Folder Contains Tcexam Installation Files. Please Delete This Directory After Use. Read Install - HTM File For Installation Instructions
g1no_caem
No ratings yet
Installing SQL Server 2012 Express
Document12 pages
Installing SQL Server 2012 Express
Ans Boim
No ratings yet
Automated Students Admission System
Document47 pages
Automated Students Admission System
David Amada
No ratings yet
Student - Details (Register - No, Student - Name, DOB, Address, City) Mark - Details (Register - No, Mark1, Mark2, Mark3, Total)
Document6 pages
Student - Details (Register - No, Student - Name, DOB, Address, City) Mark - Details (Register - No, Mark1, Mark2, Mark3, Total)
Kaushik C
No ratings yet
Vendor: Oracle
Document7 pages
Vendor: Oracle
Benbase Salim Abdelkhader
100% (1)
Homework No. 3
Document4 pages
Homework No. 3
Sheenu Thakur
No ratings yet
h17356 Storage Config Sap Hana Tdi SC VG
Document38 pages
h17356 Storage Config Sap Hana Tdi SC VG
marcoaddario
No ratings yet
Add New Field On The Dynamic Selection Using Logical Database - SAP Blogs
Document9 pages
Add New Field On The Dynamic Selection Using Logical Database - SAP Blogs
ANDRE DWI CAHYA
No ratings yet
AzureMl Intro
Document28 pages
AzureMl Intro
balaji shanmugam
No ratings yet
Payment Gateway Integration
Document3 pages
Payment Gateway Integration
Janaka Vidurinda
No ratings yet
Section 7 Quiz
Document8 pages
Section 7 Quiz
Dorin Chioibas
0% (1)
MIS As A Communication Process
Document12 pages
MIS As A Communication Process
Vinutha Santhosh
No ratings yet
Current Log
Document48 pages
Current Log
Badia Charef Bent Mohammed
No ratings yet
Difference Between Dynamic Cache and Static
Document1 page
Difference Between Dynamic Cache and Static
camjole
No ratings yet
Hibernate ntcc2k19
Document21 pages
Hibernate ntcc2k19
Rishavrenold Baptist
No ratings yet
Bba Second Semester
Document21 pages
Bba Second Semester
Pragati Aryal
No ratings yet
Object Oriented Programming Lab 8-1
Document13 pages
Object Oriented Programming Lab 8-1
Haristic 01
No ratings yet
Analysis and Design Accounting Information System PDF
Document5 pages
Analysis and Design Accounting Information System PDF
Heilen Pratiwi
No ratings yet
Virtual University of Pakistan: Assignment No. 02 Student Id: Bc210407489
Document3 pages
Virtual University of Pakistan: Assignment No. 02 Student Id: Bc210407489
Ammara Shahzadi
No ratings yet
9515-185-51-ENG - REV - B1 Scribe Data Exchange Admin Manual
Document62 pages
9515-185-51-ENG - REV - B1 Scribe Data Exchange Admin Manual
Rodrigo Gutierrez
No ratings yet
HRIS-Ch2 (3 Files Merged)
Document62 pages
HRIS-Ch2 (3 Files Merged)
Abdullah Alvi
No ratings yet
Upgarede 12c - Arg8prd - Upgrade
Document43 pages
Upgarede 12c - Arg8prd - Upgrade
Vinu3012
No ratings yet
ISHAN SAXENA Ism File
Document24 pages
ISHAN SAXENA Ism File
Ishan Saxena
No ratings yet
JDBC Lecture Notes
Document14 pages
JDBC Lecture Notes
minni
No ratings yet
Peoplesoft HCM Fluid Benefits Workcenter: Implementation
Document39 pages
Peoplesoft HCM Fluid Benefits Workcenter: Implementation
Sai Shravan 2 I
No ratings yet
Whatsapp Sentiment Analysis Using R
Document4 pages
Whatsapp Sentiment Analysis Using R
DEADSHOT GAMING
No ratings yet
Procedure To Create A SQL Authenticator Using Weblogic Server Console
Document2 pages
Procedure To Create A SQL Authenticator Using Weblogic Server Console
Areeba
No ratings yet