• Please log in 10 minutes before the class starts and check your internet connection to avoid any network issues during the LIVE session
• All participants will be on mute by default to avoid background noise. However, the instructor will unmute you if required. Please use the “Questions” tab on your webinar tool to interact with the instructor at any point during the class
• Feel free to ask and answer questions to make your learning interactive. The instructor will address your queries at the end of the ongoing topic
• If you want to connect to your Personal Learning Manager (PLM), dial +917618772501
• We have a dedicated support team to assist with all your queries. You can reach us anytime on the numbers below:
US: 1855 818 0063 (Toll-Free) | India: +91 9019117772
• Your feedback is very much appreciated. Please share feedback after each class, which will help us enhance your learning experience
Configuration files:
HDFS: hdfs-site.xml
YARN: yarn-site.xml
MapReduce (Map/Reduce): mapred-site.xml
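Each of these files holds property entries, each with a name and a value, under a configuration root element. As a rough illustration of that layout, the sketch below reads such a file with the JDK's built-in DOM parser. It is not Hadoop's own Configuration API; the class name SiteXmlReader and the sample property values are assumptions for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class SiteXmlReader {

    // Reads <property><name>..</name><value>..</value></property> entries from a
    // Hadoop-style *-site.xml document into a map that keeps the file order.
    static Map<String, String> readProperties(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, String> props = new LinkedHashMap<>();
            NodeList properties = doc.getElementsByTagName("property");
            for (int i = 0; i < properties.getLength(); i++) {
                Element p = (Element) properties.item(i);
                String name = p.getElementsByTagName("name").item(0).getTextContent().trim();
                String value = p.getElementsByTagName("value").item(0).getTextContent().trim();
                props.put(name, value);
            }
            return props;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse configuration XML", e);
        }
    }

    public static void main(String[] args) {
        // Sample hdfs-site.xml content; the values are illustrative only.
        String hdfsSite = "<?xml version=\"1.0\"?>\n"
                + "<configuration>\n"
                + "  <property><name>dfs.replication</name><value>1</value></property>\n"
                + "  <property><name>dfs.blocksize</name><value>134217728</value></property>\n"
                + "</configuration>\n";
        System.out.println(readProperties(hdfsSite)); // {dfs.replication=1, dfs.blocksize=134217728}
    }
}
```

The same reader works for yarn-site.xml and mapred-site.xml, since all three files share this property layout.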
Data Loading: loading data into HDFS using Sqoop
Problem Statement: example domains Weather Forecasting and HealthCare
[Figure: Very Big Data is divided into several splits (Split Data); each split is processed with MAP : REDUCE, and the results are combined into All matches]
• Processing data in parallel
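The split-then-process-in-parallel idea can be sketched on a single machine, with threads from java.util.concurrent standing in for cluster nodes. This is an analogy only, not Hadoop code; the keyword search, the class name ParallelMatches, and the sample data are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelMatches {

    // "Split Data": cut the input into roughly equal, contiguous chunks.
    static List<List<String>> split(List<String> lines, int numSplits) {
        List<List<String>> splits = new ArrayList<>();
        int size = (lines.size() + numSplits - 1) / numSplits;
        for (int start = 0; start < lines.size(); start += size) {
            splits.add(lines.subList(start, Math.min(start + size, lines.size())));
        }
        return splits;
    }

    // Each split is searched on its own thread; partial results are combined into "All matches".
    static List<String> findAllMatches(List<String> lines, String keyword, int numSplits) {
        ExecutorService pool = Executors.newFixedThreadPool(numSplits);
        try {
            List<Future<List<String>>> partials = new ArrayList<>();
            for (List<String> chunk : split(lines, numSplits)) {
                partials.add(pool.submit(() -> {
                    List<String> found = new ArrayList<>();
                    for (String line : chunk) {
                        if (line.contains(keyword)) found.add(line);
                    }
                    return found;
                }));
            }
            List<String> all = new ArrayList<>();
            for (Future<List<String>> f : partials) {
                all.addAll(f.get()); // waits for each split; keeps split order
            }
            return all;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> data = List.of("error: disk", "ok", "error: net", "ok", "warn", "error: cpu");
        System.out.println(findAllMatches(data, "error", 3)); // [error: disk, error: net, error: cpu]
    }
}
```

Unlike this sketch, Hadoop moves the computation to the nodes that already store each block instead of sharing one in-memory list.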
[Figure: Data Center made up of Racks; each Map Task applies the Map logic to the HDFS Block stored on its node, and Reduce combines the matches]
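[Figure: YARN components: Client, Node Managers and ApplicationMaster, exchanging Job Submission, Node Status, Resource Request and MapReduce Status messages]
[Figure: MapReduce job flow on YARN across Data Nodes (Datanode 1, Datanode 2) running Node Managers, with HDFS storage. Steps shown: 6. Create container, 7. Get Input Splits (from HDFS), 9. Start container. The MR AppMaster runs in a container; each Map/Reduce Task runs as a YarnChild in its own Task JVM and sends status updates, and the Client polls for status]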
ApplicationMaster
» Yes
» No
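Word Count example: MapReduce data flow
Input: Deer Bear River / Car Car River / Deer Car Bear
Splitting (K1,V1): "Deer Bear River" | "Car Car River" | "Deer Car Bear"
Mapping, List(K2,V2): (Deer,1) (Bear,1) (River,1) | (Car,1) (Car,1) (River,1) | (Deer,1) (Car,1) (Bear,1)
Shuffling, (K2,List(V2)): (Bear,(1,1)) (Car,(1,1,1)) (Deer,(1,1)) (River,(1,1))
Reducing: (Bear,2) (Car,3) (Deer,2) (River,2)
Final result, List(K3,V3): Bear,2; Car,3; Deer,2; River,2
Map: (K1,V1) → List(K2,V2)
Reduce: (K2,List(V2)) → List(K3,V3)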
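The word count flow above can be simulated in plain Java to make each phase's input and output types concrete. This is a single-process sketch, not the Hadoop Mapper/Reducer API; the class and method names are assumptions, and the K1 line offset is ignored.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class WordCountFlow {

    // Map: V1 (one line; the K1 offset is not used here) -> List(K2 = word, V2 = 1)
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) out.add(Map.entry(word, 1));
        }
        return out;
    }

    // Shuffle: group every V2 by its K2, sorted by key -> (K2, List(V2))
    static TreeMap<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        TreeMap<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return grouped;
    }

    // Reduce: List(V2) for one key -> V3, the total count
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int v : values) sum += v;
        return sum;
    }

    // Runs map over every split, then shuffle, then reduce -> List(K3, V3)
    static Map<String, Integer> run(List<String> splits) {
        List<Map.Entry<String, Integer>> mapped = new ArrayList<>();
        for (String split : splits) mapped.addAll(map(split));
        Map<String, Integer> result = new TreeMap<>();
        shuffle(mapped).forEach((word, ones) -> result.put(word, reduce(ones)));
        return result;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("Deer Bear River", "Car Car River", "Deer Car Bear")));
        // {Bear=2, Car=3, Deer=2, River=2}
    }
}
```

The TreeMap in the shuffle step mirrors the fact that each reducer sees its keys in sorted order.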
MapReduce: Input Splits
[Figure: INPUT DATA is divided physically into HDFS Blocks and logically into Input Splits]
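▪ HDFS blocks are a physical division of the data, while input splits are a logical one: a logical record (a line, for text input) can cross the boundary between two blocks, and MapReduce still reads each such record whole, in exactly one input split.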
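A simplified sketch of how a reader can handle records that cross a split boundary: a line belongs to the split in which it starts and is read to its end even past the split boundary, while a split that begins mid-line skips that partial line. This captures the idea behind Hadoop's line-based record reading but is not its actual code; the class name, the character offsets, and the 20-character block size are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitRecords {

    // Returns the lines belonging to the split [start, end) of text.
    static List<String> readSplit(String text, int start, int end) {
        int pos = start;
        // A split that begins in the middle of a line leaves that partial line
        // to the previous split, so skip past the next newline.
        if (pos > 0 && text.charAt(pos - 1) != '\n') {
            int nl = text.indexOf('\n', pos);
            pos = (nl < 0) ? text.length() : nl + 1;
        }
        List<String> records = new ArrayList<>();
        // Read every line that starts inside the split, even if it ends past 'end'.
        while (pos < end && pos < text.length()) {
            int nl = text.indexOf('\n', pos);
            int lineEnd = (nl < 0) ? text.length() : nl;
            records.add(text.substring(pos, lineEnd));
            pos = lineEnd + 1;
        }
        return records;
    }

    public static void main(String[] args) {
        String text = "Deer Bear River\nCar Car River\nDeer Car Bear\n";
        int blockSize = 20; // this boundary falls inside "Car Car River"
        for (int start = 0; start < text.length(); start += blockSize) {
            int end = Math.min(start + blockSize, text.length());
            System.out.println("split [" + start + "," + end + "): " + readSplit(text, start, end));
        }
        // split [0,20): [Deer Bear River, Car Car River]
        // split [20,40): [Deer Car Bear]
        // split [40,44): []
    }
}
```

Every line is returned by exactly one split, even though "Car Car River" and "Deer Car Bear" both straddle a block boundary.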
[Figure: Data flow across Node 1 and Node 2, each running Map and Reduce]
▪ Input data is distributed to nodes
▪ The shuffle process sorts and merges the data for each key, so that one Reduce receives all the values for a particular key
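The shuffle step can be sketched as two parts: choosing a reducer for each key, then sorting and merging the values per key inside that reducer. The partition formula below is the one used by Hadoop's default HashPartitioner, but it is applied here to Java String keys; Hadoop applies it to the key object's own hashCode (for example, Text's), so real assignments can differ. Class and method names are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ShuffleSketch {

    // Default HashPartitioner formula; the mask keeps the result non-negative.
    static int partition(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Routes each (key, value) from every node's map output to one reducer,
    // where values for the same key are merged under a sorted key.
    static List<TreeMap<String, List<Integer>>> shuffle(
            List<List<Map.Entry<String, Integer>>> mapOutputs, int numReduceTasks) {
        List<TreeMap<String, List<Integer>>> reducers = new ArrayList<>();
        for (int r = 0; r < numReduceTasks; r++) reducers.add(new TreeMap<>());
        for (List<Map.Entry<String, Integer>> output : mapOutputs) {
            for (Map.Entry<String, Integer> kv : output) {
                reducers.get(partition(kv.getKey(), numReduceTasks))
                        .computeIfAbsent(kv.getKey(), k -> new ArrayList<>())
                        .add(kv.getValue());
            }
        }
        return reducers;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> node1 = List.of(Map.entry("Deer", 1), Map.entry("Bear", 1));
        List<Map.Entry<String, Integer>> node2 = List.of(Map.entry("Bear", 1), Map.entry("Car", 1));
        List<TreeMap<String, List<Integer>>> reducers = shuffle(List.of(node1, node2), 2);
        for (int r = 0; r < reducers.size(); r++) {
            System.out.println("Reduce " + r + ": " + reducers.get(r));
        }
    }
}
```

Whichever reducer "Bear" hashes to, both of its values from Node 1 and Node 2 end up together in that one reducer.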
Ans. The client has to submit the input split information by specifying the start and end points in the InputFormat configuration.
MapReduce: Combiners and Partitioners
▪ Combiners are mini-reducers that perform a “local reduce”
[Figure: Combiner example]
Block 1: B C D E D B
  Mapper: (B,1) (C,1) (D,1) (E,1) (D,1) (B,1)
  Combiner: (B,2) (C,1) (D,2) (E,1)
Block 2: D A A C B D
  Mapper: (D,1) (A,1) (A,1) (C,1) (B,1) (D,1)
  Combiner: (D,2) (A,2) (C,1) (B,1)
Shuffle: (A,[2]) (B,[2,1]) (C,[1,1]) (D,[2,2]) (E,[1])
Reducer: (A,2) (B,3) (C,2) (D,4) (E,1)
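The combiner example above can be reproduced in plain Java: each block's mapper output is summed locally before the shuffle, so the reducer receives partial counts instead of every single 1. This is an illustrative simulation, not the Hadoop Combiner API; the class and method names are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CombinerSketch {

    // Mapper: emits (token, 1) for every token in the block.
    static List<Map.Entry<String, Integer>> mapper(String block) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String token : block.trim().split("\\s+")) out.add(Map.entry(token, 1));
        return out;
    }

    // Combiner: a "local reduce" that sums counts per key within one mapper's output.
    static Map<String, Integer> combine(List<Map.Entry<String, Integer>> mapOutput) {
        Map<String, Integer> local = new TreeMap<>();
        for (Map.Entry<String, Integer> kv : mapOutput) local.merge(kv.getKey(), kv.getValue(), Integer::sum);
        return local;
    }

    // Shuffle: groups the combined outputs of all blocks by key -> (key, [partial counts]).
    static Map<String, List<Integer>> shuffle(List<Map<String, Integer>> combined) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map<String, Integer> c : combined) {
            c.forEach((k, v) -> grouped.computeIfAbsent(k, x -> new ArrayList<>()).add(v));
        }
        return grouped;
    }

    // Reducer: sums the partial counts for each key.
    static Map<String, Integer> reduce(Map<String, List<Integer>> grouped) {
        Map<String, Integer> result = new TreeMap<>();
        grouped.forEach((k, vs) -> result.put(k, vs.stream().mapToInt(Integer::intValue).sum()));
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> c1 = combine(mapper("B C D E D B")); // {B=2, C=1, D=2, E=1}
        Map<String, Integer> c2 = combine(mapper("D A A C B D")); // {A=2, B=1, C=1, D=2}
        Map<String, List<Integer>> grouped = shuffle(List.of(c1, c2));
        System.out.println(grouped);         // {A=[2], B=[2, 1], C=[1, 1], D=[2, 2], E=[1]}
        System.out.println(reduce(grouped)); // {A=2, B=3, C=2, D=4, E=1}
    }
}
```

Reusing summation as the combiner is safe here because addition is commutative and associative, so summing partial sums gives the same totals as summing the raw 1s.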
Ans. At the Mapper level, as the Combiner works on the output data from the Mapper.
Ans. Semi-Reducer. The Combiner works on the Mapper output and lessens the burden on the Reducer.
» TRUE
» FALSE
Ans. TRUE
[Figure: De-identification job: Map Task 1, Map Task 2, ... de-identify columns based on configurations, followed by Reduce Task 1 and Reduce Task 2]
Copyright © edureka and/or its affiliates. All rights reserved.
DeIdentify MapReduce Code
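import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.logging.Level;
import java.util.logging.Logger;
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;

public class DeIdentifyData // enclosing class; name assumed
{
    private static final Logger logger = Logger.getLogger(DeIdentifyData.class.getName());

    // Encrypts one column value with AES; the key must be 16, 24 or 32 bytes long
    public static String encrypt(String strToEncrypt, byte[] key)
    {
        try
        {
            Cipher cipher = Cipher.getInstance("AES/ECB/PKCS5Padding");
            SecretKeySpec secretKey = new SecretKeySpec(key, "AES");
            cipher.init(Cipher.ENCRYPT_MODE, secretKey);
            // Encrypt the UTF-8 bytes, then Base64-encode them so the result is printable text
            byte[] encryptedBytes = cipher.doFinal(strToEncrypt.getBytes(StandardCharsets.UTF_8));
            String encryptedString = Base64.getEncoder().encodeToString(encryptedBytes);
            return encryptedString.trim();
        }
        catch (Exception e)
        {
            logger.log(Level.SEVERE, "Error while encrypting", e);
        }
        return null;
    }
}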
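To show how an encrypt method like this could be used on the map side, here is a self-contained sketch that encrypts only the configured columns of one comma-separated record. The record layout, the column indices {1, 2}, the demo key, and the class name DeIdentifySketch are hypothetical; it repeats the AES/ECB/PKCS5Padding transformation from the code above and adds a decrypt helper only to check the round trip.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Base64;
import java.util.Set;
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;

public class DeIdentifySketch {

    private static Cipher cipher(int mode, byte[] key) throws GeneralSecurityException {
        Cipher c = Cipher.getInstance("AES/ECB/PKCS5Padding");
        c.init(mode, new SecretKeySpec(key, "AES"));
        return c;
    }

    // ECB is deterministic: the same value and key always give the same ciphertext.
    static String encrypt(String plain, byte[] key) {
        try {
            byte[] out = cipher(Cipher.ENCRYPT_MODE, key).doFinal(plain.getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(out);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    static String decrypt(String encoded, byte[] key) {
        try {
            byte[] out = cipher(Cipher.DECRYPT_MODE, key).doFinal(Base64.getDecoder().decode(encoded));
            return new String(out, StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Map-side logic for one record: encrypt only the configured column indices.
    static String deIdentify(String record, Set<Integer> columnsToEncrypt, byte[] key) {
        String[] fields = record.split(",", -1);
        for (int i = 0; i < fields.length; i++) {
            if (columnsToEncrypt.contains(i)) fields[i] = encrypt(fields[i], key);
        }
        return String.join(",", fields); // Base64 output contains no commas
    }

    public static void main(String[] args) {
        byte[] key = "0123456789abcdef".getBytes(StandardCharsets.UTF_8); // 16 bytes: AES-128, demo key only
        String record = "11111,John Smith,1980-01-01,diabetes";          // hypothetical record
        System.out.println(deIdentify(record, Set.of(1, 2), key));       // columns 1 and 2 encrypted
    }
}
```

Because the encryption is deterministic, the same patient name maps to the same token in every record, so de-identified records can still be grouped by that column.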
Download all the MapReduce code from the LMS, import it into your Eclipse IDE, and execute it
Attempt the Word Count, Patents, and Alphabets assignments using the items in the LMS under the Module 3 tab
http://www.edureka.in/blog/hadoop-interview-questions-mapreduce/
http://www.edureka.in/blog/apache-hadoop-2-0-and-yarn/
http://www.edureka.in/blog/hadoop-2-0-setting-up-a-single-node-cluster-in-15-minutes/
Set up the CDH4 Hadoop development environment using the documents in the LMS
http://blog.cloudera.com/blog/2013/08/how-to-use-eclipse-with-mapreduce-in-clouderas-quickstart-vm/
▪ Counters
▪ Distributed Cache
▪ MRUnit