
Exercises

1) Wordcount (a mapper/reducer sketch follows the exercise list)
2) Distributed Grep
3) Distributed Sed
4) Return a maximum-length word
5) Count the number of lines in a file
6) Given a huge collection of HTML files that contain links to other pages, write a MapReduce program that returns a data structure holding all unique URLs.
7) Implement inverted indexing (see the inverted-index sketch after the list)
8) The input is a list of URLs, each with the number of times it was viewed. Find the average view count for each URL (see the averaging sketch after the list).
Example input:
A1.html 10
A2.html 20
A3.html 5
A1.html 20
A4.html 60
A1.html 30
Output:
A1.html 20
A2.html 20
A3.html 5
A4.html 60
9) Moving average

10) Count the total number of words that start with each of the letters a, b, c, ..., z
11) Given an incoming stream of data, compute aggregates on an hourly basis.
12) [Distributed Cache] Using a lookup file, filter out all the stop words and give the count of the remaining words (see the distributed-cache sketch after the list).
13) [Distributed Cache] Using a lookup file that contains user names and phone numbers, and an input file that contains only user names, extract the corresponding phone numbers.
14) Secondary Sort
15) Configurations
16) Example of Key-Value Input Format
17) Example of Sequence File Input Format
18) MRUnit test case (see the MRUnit sketch at the end of this page)
19) http://www.apache.org/dyn/closer.cgi
20) http://wiki.apache.org/hadoop/HowToContribute
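
For exercise 1, here is a minimal wordcount sketch against the org.apache.hadoop.mapreduce API; the class names (WordCount, TokenizerMapper, IntSumReducer) are illustrative, and the driver wiring (job setup, input/output paths) is omitted:

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCount {

    // Mapper: emit (word, 1) for every token in the input line.
    public static class TokenizerMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reducer: sum the 1s emitted for each word.
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }
}
```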
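For exercise 7, a sketch of an inverted index that maps each word to the files it appears in. The FileSplit cast assumes plain text files read with the default input format, and the class names are again only illustrative:

```java
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;
import java.util.StringTokenizer;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

public class InvertedIndex {

    // Mapper: emit (word, source file name) for every token.
    public static class IndexMapper
            extends Mapper<LongWritable, Text, Text, Text> {
        private final Text word = new Text();
        private final Text docId = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Name of the file currently being processed, taken from the input split.
            String fileName = ((FileSplit) context.getInputSplit()).getPath().getName();
            docId.set(fileName);
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken().toLowerCase());
                context.write(word, docId);
            }
        }
    }

    // Reducer: collapse the file names seen for each word into a unique, comma-separated list.
    public static class IndexReducer extends Reducer<Text, Text, Text, Text> {
        @Override
        protected void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            Set<String> docs = new HashSet<String>();
            for (Text v : values) {
                docs.add(v.toString());
            }
            StringBuilder sb = new StringBuilder();
            for (String doc : docs) {
                if (sb.length() > 0) {
                    sb.append(",");
                }
                sb.append(doc);
            }
            context.write(key, new Text(sb.toString()));
        }
    }
}
```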
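For exercise 8, a sketch under the assumption that each input record is a single line of the form "<url> <count>" separated by whitespace (in the example above, A1.html averages to (10+20+30)/3 = 20):

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class UrlAverageViews {

    // Mapper: assumes each input line is "<url> <count>"; emit (url, count).
    public static class ViewMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private final Text url = new Text();
        private final IntWritable views = new IntWritable();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] parts = value.toString().trim().split("\\s+");
            if (parts.length == 2) {
                url.set(parts[0]);
                views.set(Integer.parseInt(parts[1]));
                context.write(url, views);
            }
        }
    }

    // Reducer: average the view counts seen for each URL, e.g. A1.html -> 20.
    public static class AvgReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            long sum = 0;
            int n = 0;
            for (IntWritable v : values) {
                sum += v.get();
                n++;
            }
            context.write(key, new IntWritable((int) (sum / n)));
        }
    }
}
```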
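For exercise 12, a mapper sketch that loads the stop-word lookup file from the distributed cache in setup(). It uses the classic DistributedCache API (deprecated in newer Hadoop releases); the lookup-file path in the comment is a made-up example, and the summing reducer from the wordcount sketch above can be reused unchanged:

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;
import java.util.StringTokenizer;

import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Mapper for exercise 12: drop the stop words listed in the cached lookup file,
// emit (word, 1) for everything else.
public class StopWordFilterMapper
        extends Mapper<LongWritable, Text, Text, IntWritable> {

    private static final IntWritable ONE = new IntWritable(1);
    private final Set<String> stopWords = new HashSet<String>();
    private final Text word = new Text();

    @Override
    protected void setup(Context context) throws IOException {
        // The driver is assumed to have registered the lookup file, e.g.:
        // DistributedCache.addCacheFile(new URI("/lookup/stopwords.txt"), job.getConfiguration());
        Path[] cached = DistributedCache.getLocalCacheFiles(context.getConfiguration());
        if (cached != null && cached.length > 0) {
            BufferedReader in = new BufferedReader(new FileReader(cached[0].toString()));
            try {
                String line;
                while ((line = in.readLine()) != null) {
                    stopWords.add(line.trim().toLowerCase());
                }
            } finally {
                in.close();
            }
        }
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
            String token = itr.nextToken().toLowerCase();
            if (!stopWords.contains(token)) {
                word.set(token);
                context.write(word, ONE);
            }
        }
    }
}
```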
http://www.slideshare.net/cwsteinbach/hive-quick-start-tutorial

https://cwiki.apache.org/Hive/languagemanual-udf.html
https://cwiki.apache.org/Hive/languagemanual-explain.html

http://www.antlr.org/wiki/display/ANTLR3/Interfacing+AST+with+Java
http://www.riccomini.name/Topics/DistributedComputing/Hadoop/SortByValue/
http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-nodecluster/

http://hbase.apache.org/book/standalone_dist.html#confirm
http://apache.techartifact.com/mirror/mrunit/mrunit-0.9.0-incubating/ and mockito-all-1.8.5 (put these jars on the classpath)
https://cwiki.apache.org/MRUNIT/mrunit-tutorial.html
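
For exercise 18, a minimal MRUnit test of the wordcount mapper sketched above, assuming the mrunit-0.9.0-incubating and mockito-all jars from the links above are on the classpath along with JUnit 4:

```java
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.junit.Before;
import org.junit.Test;

// Unit test for the wordcount mapper, run entirely in memory by MRUnit.
public class WordCountMapperTest {

    private MapDriver<LongWritable, Text, Text, IntWritable> mapDriver;

    @Before
    public void setUp() {
        mapDriver = MapDriver.newMapDriver(new WordCount.TokenizerMapper());
    }

    @Test
    public void mapsLineToWordCounts() throws Exception {
        // Expected outputs must appear in the same order the mapper emits them.
        mapDriver.withInput(new LongWritable(0), new Text("hadoop map hadoop"))
                 .withOutput(new Text("hadoop"), new IntWritable(1))
                 .withOutput(new Text("map"), new IntWritable(1))
                 .withOutput(new Text("hadoop"), new IntWritable(1))
                 .runTest();
    }
}
```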
