Exercises
1) Wordcount
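A minimal local sketch of the wordcount pattern in Python, simulating the map, shuffle, and reduce phases in memory. The function names are illustrative; a real solution would be a Hadoop job.

```python
from collections import defaultdict

def mapper(line):
    # emit (word, 1) for every word in the line
    for word in line.split():
        yield word, 1

def reducer(word, counts):
    # sum all partial counts for one word
    yield word, sum(counts)

def run_local(lines):
    # in-memory stand-in for Hadoop's shuffle/sort phase:
    # group mapper output by key, then call the reducer per key
    groups = defaultdict(list)
    for line in lines:
        for k, v in mapper(line):
            groups[k].append(v)
    return dict(kv for k, vs in sorted(groups.items()) for kv in reducer(k, vs))
```

On a cluster the framework performs the grouping; here a plain dict stands in for it.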
2) Distributed Grep
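Distributed grep is mapper-heavy: each mapper emits the lines that match a pattern, and the reducer is the identity. A local Python sketch; the hard-coded pattern is just an example stand-in for what would normally arrive via the job configuration.

```python
import re

PATTERN = re.compile(r"error")  # example pattern; really a job parameter

def mapper(filename, line):
    # emit every matching line, keyed by the file it came from
    if PATTERN.search(line):
        yield filename, line

def grep(files):
    # files: {filename: [lines]}; the reducer is the identity,
    # so collecting mapper output is the whole job
    matches = []
    for name, lines in files.items():
        for pair in mapper_lines(name, lines):
            matches.append(pair)
    return matches

def mapper_lines(name, lines):
    for line in lines:
        yield from mapper(name, line)
```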
3) Distributed Sed
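Distributed sed can be a map-only job (zero reducers): each mapper rewrites its lines and emits them directly. A sketch, with the substitution pattern again an example stand-in for a job parameter.

```python
import re

def mapper(line, pattern=r"colour", repl="color"):
    # map-only job: apply the substitution and emit the rewritten line
    yield re.sub(pattern, repl, line)

def sed(lines):
    # with zero reducers, mapper output is the job output
    return [out for line in lines for out in mapper(line)]
```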
4) Return the maximum-length word
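One way to sketch this: each mapper emits its local longest word under a single shared key, so one reducer can pick the global maximum. A local Python simulation (names are illustrative; ties keep the first word seen).

```python
def mapper(line):
    # emit this line's longest word under one shared key,
    # funnelling all candidates to a single reducer
    words = line.split()
    if words:
        yield "max", max(words, key=len)

def reducer(candidates):
    # pick the global maximum among the per-mapper maxima
    return max(candidates, key=len)

def longest_word(lines):
    candidates = [w for line in lines for _, w in mapper(line)]
    return reducer(candidates)
```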
5) Number of lines in a file
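Line counting is wordcount with a constant key: every mapper emits ("lines", 1) per input line and a single reducer sums. A minimal local sketch:

```python
def mapper(line):
    # one record per input line, all under the same key
    yield "lines", 1

def count_lines(lines):
    # single reducer: sum every partial count
    return sum(v for line in lines for _, v in mapper(line))
```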
6) Given a huge number of HTML files that contain links to other pages, write a MapReduce program that returns a data structure holding all unique URLs
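A possible shape for this job: mappers extract link targets and emit each URL as a key; because the shuffle groups by key, the reducer sees each distinct URL exactly once and can emit the deduplicated set. A local sketch with a deliberately naive href regex (real HTML extraction would need a proper parser).

```python
import re

HREF = re.compile(r'href="([^"]+)"')  # naive extraction, illustration only

def mapper(html):
    # emit (url, None); the value is unused, the key does the work
    for url in HREF.findall(html):
        yield url, None

def unique_urls(pages):
    # shuffle-by-key dedupes; the reducer just emits each key once
    seen = {}
    for page in pages:
        for url, _ in mapper(page):
            seen.setdefault(url, True)
    return sorted(seen)
```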
7) Implement inverted indexing
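An inverted index maps each word to the list of documents containing it: mappers emit (word, doc_id) pairs and reducers dedupe and sort each posting list. A local Python sketch:

```python
from collections import defaultdict

def mapper(doc_id, text):
    # emit (word, doc_id) for every word occurrence
    for word in text.split():
        yield word, doc_id

def inverted_index(docs):
    # docs: {doc_id: text}; the shuffle groups postings by word,
    # the reducer dedupes and sorts each posting list
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word, d in mapper(doc_id, text):
            index[word].add(d)
    return {w: sorted(ds) for w, ds in index.items()}
```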
8) Suppose the input is given as URLs, each with the number of times it was viewed.
Find the average view count of each URL.
Example input:
A1.html 10
A2.html 20
A3.html 5
A1.html 20
A4.html 60
A1.html 30
Output:
A1.html 20
A2.html 20
A3.html 5
A4.html 60
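A local sketch of the averaging job, run against the example input above. Note that the mean is not associative, so a combiner that averaged partial groups would be wrong; a combiner for this job should carry (sum, count) pairs instead.

```python
from collections import defaultdict

def mapper(record):
    # "A1.html 10" -> ("A1.html", 10)
    url, count = record.split()
    yield url, int(count)

def reducer(url, counts):
    # average all counts seen for one URL
    yield url, sum(counts) / len(counts)

def avg_views(records):
    groups = defaultdict(list)
    for rec in records:
        for url, c in mapper(rec):
            groups[url].append(c)
    return dict(kv for url, cs in sorted(groups.items()) for kv in reducer(url, cs))

records = ["A1.html 10", "A2.html 20", "A3.html 5",
           "A1.html 20", "A4.html 60", "A1.html 30"]
print(avg_views(records))
# A1.html -> 20.0, A2.html -> 20.0, A3.html -> 5.0, A4.html -> 60.0
```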
9) Moving average
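On Hadoop a per-key moving average usually requires a secondary sort so each reducer receives a key's values in time order; the windowing itself is simple. A sketch of the reducer-side window logic, assuming the values already arrive sorted:

```python
from collections import deque

def moving_average(values, window):
    # sliding-window mean over an already-ordered value stream;
    # in Hadoop, a secondary sort would guarantee this ordering
    buf, out = deque(maxlen=window), []
    for v in values:
        buf.append(v)                # deque drops the oldest value itself
        out.append(sum(buf) / len(buf))
    return out
```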
https://cwiki.apache.org/Hive/languagemanual-udf.html
https://cwiki.apache.org/Hive/languagemanual-explain.html
http://www.antlr.org/wiki/display/ANTLR3/Interfacing+AST+with+Java
http://www.riccomini.name/Topics/DistributedComputing/Hadoop/SortByValue/
http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/
http://hbase.apache.org/book/standalone_dist.html#confirm
http://apache.techartifact.com/mirror/mrunit/mrunit-0.9.0-incubating/ (download the MRUnit 0.9.0-incubating and mockito-all-1.8.5 jars and put them on the classpath)
https://cwiki.apache.org/MRUNIT/mrunit-tutorial.html