Best way to learn Hadoop!!!

Paritosh Agarwal Business Consultant at JDA Software


I am new to Hadoop, and in a week I have to give a presentation on it. Can I install Hadoop
on my personal PC? Secondly, what are the best sources (sites, etc.) for learning Hadoop? Thanks in
advance.

August 13, 2012

Comments

Jiri Kaplan
Software Development Engineer at Dell
Hi,
Some useful links:
http://www.cloudera.com/resources/training/ (basics of hadoop - video presentations)
http://hadoop.apache.org/common/docs/r1.0.3/
If you prefer videos covering use cases, hints, news and general talk about Hadoop, some
presentations could be useful:
http://www.hadoopworld.com/agenda/
http://hadoopsummit.org/
Best way to learn it? Download it and run it in local mode or pseudo-distributed mode, imo.
See http://hadoop.apache.org/common/docs/r0.20.2/quickstart.html or
https://ccp.cloudera.com/display/DOC/Documentation (look for the quick start guide)
Learn Hadoop by examples:

The hadoop-examples and hadoop-test jars are good for a first touch with Hadoop jobs.
The sources of the examples are under $HADOOP_HOME/src/test* (package
org.apache.hadoop.examples in the API).
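To give a feel for what one of those example jobs does, here is a minimal sketch in plain Python (not Hadoop code). It is modelled on the grep job that, as far as I remember, the quick start runs from the examples jar: the map step emits every regex match with a count of 1, and the reduce step sums the counts per match. The function names (map_matches, reduce_counts, grep_job) and the sample lines are made up for illustration, and sorting the result by count is only my choice for readable output.

```python
import re
from collections import defaultdict


def map_matches(line, pattern):
    """Map step: emit (match, 1) for every regex match in one input line."""
    for match in pattern.findall(line):
        yield match, 1


def reduce_counts(pairs):
    """Reduce step: sum the counts emitted for each distinct match."""
    totals = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return dict(totals)


def grep_job(lines, regex):
    """Run the map and reduce steps in-process over a list of lines."""
    pattern = re.compile(regex)
    pairs = (pair for line in lines for pair in map_matches(line, pattern))
    totals = reduce_counts(pairs)
    # Highest count first, ties broken by key so the order is deterministic.
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Hypothetical sample input standing in for a directory of config files.
    sample = ["dfs.replication=1", "dfs.name.dir dfs.replication"]
    for match, count in grep_job(sample, r"dfs[a-z.]+"):
        print(count, match)
```

On a real cluster the input lines would come from files in HDFS and the two steps would run as separate tasks; here everything runs in one process so you can follow the data flow.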
Good luck.
August 15, 2012

Lou Dasaro
Software Developer and QA Analyst/Engineer
Here are some videos I collected. The last one does a pretty good job of explaining the
hadoop ecosystem. See
http://www.youtube.com/playlist?list=PLF82F6499E89E1BAE&feature=mh_lolz
August 15, 2012

Tudor Lapusan
BigData/Hadoop DevOps
Hi,
I think the best way to start learning Hadoop is to:
1. Set up a single-node or multi-node cluster and run the wordcount example.
http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/
2. Read more about Hadoop's core functionality.
http://developer.yahoo.com/hadoop/tutorial/
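The wordcount example mentioned above boils down to three phases: map, shuffle and reduce. Below is a rough sketch of those phases in plain Python, with no Hadoop involved. All function names are hypothetical; in a real job the map and reduce steps would typically be Java Mapper and Reducer classes, and the framework would do the shuffle for you.

```python
from collections import defaultdict


def map_phase(line):
    """Map: split one line into words and emit (word, 1) for each."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse the list of counts for one word into a total."""
    return key, sum(values)


def word_count(lines):
    """Chain the three phases over an in-memory list of lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in sorted(grouped.items()))


if __name__ == "__main__":
    print(word_count(["Hello Hadoop", "hello world"]))
```

Walking through this on a couple of sentences first makes the output of the real wordcount job much easier to explain in a presentation.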
bye!
August 16, 2012

Varad Meru
engg@orzota: recsys, data science, big data
Hi Paritosh,
For a complete round-up of study materials for Hadoop MapReduce and HDFS, please go
to http://lnkd.in/jycmQd
Copying the content from the above link:
Studying Hadoop or MapReduce can be a daunting task if you get your hands dirty right at the
start.
One prerequisite for learning Hadoop is good experience with Java.
Good analytical skills help a lot as well, and the final secret sauce for being successful is
that you need to be motivated to learn a lot of things on your own in the big data arena.
For learning Hadoop, I followed this schedule:
Start with the very basics of MR with
http://code.google.com/edu/parallel/dsdtutorial.html
http://code.google.com/edu/parallel/mapreduce-tutorial.html
Then go for the first two lectures in
http://www.cs.washington.edu/education/courses/cse490h/08au/lectures.htm
(a very good introductory course on MapReduce and Hadoop).
Read the seminal paper http://labs.google.com/papers/mapreduce.html and its
improvements in the updated version
http://www.cs.washington.edu/education/courses/cse490h/08au/readings/communications200801-dl.pdf
Then go for all the other videos in the U. Washington link given above.
Try searching YouTube for the terms MapReduce and Hadoop to find videos by O'Reilly and Google
RoundTable for a good overview of the future of Hadoop and MapReduce.
Then off to the most important videos, the Cloudera videos
http://www.cloudera.com/resources/?media=Video
and the
Google MiniLecture Series
http://code.google.com/edu/submissions/mapreduce-minilecture/listing.html
Along with all the multimedia above, we need good written material.
Documents:
Architecture diagrams at http://hadooper.blogspot.com are good to have on your wall.
Hadoop: The Definitive Guide goes more into the nuts and bolts of the whole system,
whereas Hadoop in Action is a good read with lots of teaching examples to learn the
concepts of Hadoop. Pro Hadoop is not for beginners.
PDFs of the documentation from the Apache Foundation
http://hadoop.apache.org/common/docs/current/
and http://hadoop.apache.org/common/docs/stable/
will help you learn how to model your problem as an MR solution in order to gain the
full advantages of Hadoop.
The HDFS paper by Yahoo! Research is also a good read for gaining in-depth knowledge
of Hadoop.
Subscribe to the user mailing lists of Common, MapReduce and HDFS in order to know
problems, solutions and future solutions.
Try the http://developer.yahoo.com/hadoop/tutorial/module1.html link for a path
from beginner to expert in Hadoop.
In addition, the following 3 books are good resources:
Hadoop: The Definitive Guide: good info on internals
Hadoop in Action: good programming guide
Pro Hadoop: overall good book with very good explanations of the advanced concepts.
For any queries,
contact Apache, Google, Bing, Yahoo!
And for setting up a single-node Hadoop installation and setting up Eclipse for Hadoop
programming, please go to http://orzota.com/blog/single-node-hadoop-setup-2/ and
http://orzota.com/blog/eclipse-setup-for-hadoop-development/
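On the point above about modelling a problem as an MR solution, one small example that I find instructive is averaging values per key. An average of averages is generally wrong, so the usual trick is to pass (sum, count) pairs around and divide only at the very end. That keeps the intermediate step safe to use as a combiner, which the framework may run any number of times on the map side. The sketch below is plain Python under my own assumptions: the record format ("key,value" strings), the function names and the sample data are all hypothetical.

```python
from collections import defaultdict


def map_record(record):
    """Map: turn one 'key,value' text record into (key, (sum, count))."""
    key, value = record.split(",")
    return key.strip(), (float(value), 1)


def combine(partials):
    """Combine: add up (sum, count) pairs; safe to apply repeatedly."""
    total, count = 0.0, 0
    for partial_sum, partial_count in partials:
        total += partial_sum
        count += partial_count
    return total, count


def average_per_key(splits):
    """Each split stands in for one input block handled by one map task."""
    shuffled = defaultdict(list)
    for split in splits:
        local = defaultdict(list)
        for record in split:
            key, partial = map_record(record)
            local[key].append(partial)
        # Combiner step: pre-aggregate on the map side before the shuffle.
        for key, partials in local.items():
            shuffled[key].append(combine(partials))
    # Reduce step: merge the partial results, then divide exactly once.
    averages = {}
    for key, partials in shuffled.items():
        total, count = combine(partials)
        averages[key] = total / count
    return averages


if __name__ == "__main__":
    splits = [["a,1", "a,2", "b,10"], ["a,6", "b,20"]]
    print(average_per_key(splits))
```

For key "a" the two splits have local averages 1.5 and 6, and averaging those would give 3.75, while the true average of 1, 2 and 6 is 3. Carrying the count along is what makes the result correct.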
August 17, 2012

Paritosh Agarwal
Business Consultant at JDA Software
Thanks to all of you for responding. Really appreciate it!!!
August 20, 2012

Yehia Zakaria
Graduate Research Assistant at Queen's University
Jiri Kaplan has summarized very good references. In addition, if you are a new Hadoop
user and you want to install it, you can follow my blog
http://www.mysolvedproblem.blogspot.com
August 20, 2012

Joey Calca
President at Cloud Creative Group

I gave a speech at DefCon 17 that was 40 minutes long about how Hadoop works as well
as a look at some sample code and what it does.
You can find the video of the speech in this playlist (you can skip the first video, it's just a
teaser from the end of the speech)
https://www.youtube.com/watch?v=JYACdhxsUNs&list=PL64E697915C5C24FB&feature=plcp
The Source Code from the presentation and a simple breakdown of what it does can be
found on the project page here:
http://hackedexistence.com/project-netflix.html
Hope this helps (:
August 21, 2012

Gelesh G Omathil
IT Analyst at Tata Consultancy Services
Hadoop: The Definitive Guide, search for it on the net.
As for installation, it's easy: you need to download the jar files and have a Linux PC (Ubuntu is
what I am using at home).
Have Java installed as well,
and read the installation procedure.
I will be happy to assist if you still have any confusion.
Best of luck, Paritosh
August 21, 2012

AVINAV CHANDRA MISHRA
Junior Software Engineer at Indegene Lifesystems Pvt. Ltd.
Hadoop: The Definitive Guide is a good book. I am also new to Hadoop and
running into problems, but it helped me a lot.
August 24, 2012

Haribalan Raghupathy
BigData Consultant at DivIHN Integration Inc
I have personally installed Hadoop on systems and used them, and I have documented the
steps at the link below. This is not a production-ready setup, but it works for prototypes and small
programs. I have analysed 10 GB of data on this setup (it took 1 hr). You can follow this
to install, but use the latest Hadoop.
http://www.applams.com/2013/07/install-hadoop-inwindows-7-write-and.html
How to learn Hadoop depends on your background, but I would definitely recommend
Tom White's Hadoop: The Definitive Guide. It's great. But, as always, start with a hello world.
3 months ago


Rishabh aggarwal
Business Developer at authorGEN Technologies
You can enroll in the following course, which offers complete Hadoop big data training and
will help you with your presentation too.
http://www.wiziq.com/course/23733-hadoop-big-data-training
You can start the course as soon as possible.
Best of luck!
3 months ago

vijay kadel
Hadoop Developer at GuruCul Solutions
To install Hadoop on your system you need:
1) any Linux OS (e.g. CentOS)
2) a number of other things that have to be installed.
If you want to install Hadoop, the options are:
1) Apache
2) Cloudera
3) Hortonworks
The second and third ones are the easiest... so try Apache.
3 months ago

Ran Locar
Big Data Performance and Ops Lead

If you are more interested in working with Hadoop than in going through the installation
process (and, if you want to give a presentation, working with it is probably more interesting
to you), you can try downloading the Cloudera virtual machine
http://www.cloudera.com/content/support/en/downloads.html
3 months ago

Elliott Cordo
Principal Consultant at Caserta Concepts
These are all good recommendations. You can also check out our Debian-based Apache Hadoop
VM. It has Hive, Pig, and Mahout installed and configured, and comes with a set of instructions for
building your own:
http://www.casertaconcepts.com/resources/downloads/
3 months ago

Richard Raposa
Senior Curriculum Developer at Hortonworks
If you only have a week, I would not bother trying to install Hadoop. Hortonworks has a
Sandbox VM file for Hadoop 2.0 that is ready to go...and any demo on Hadoop nowadays
should include some discussion of YARN and HDFS Federation.
3 months ago

Nurudeen Kolade Abdulsalam, MSc., MCP, CCNA, ACT,
Manager, IT Systems Research and Development at Skangix Development Limited
Wonderful recommendations! Now you have many suggested guides for learning
Hadoop. I think you should not rush into installing Hadoop without a clear
understanding of what it's all about. I suggest you check the link below to start
with. It's a wonderful post by Manuel Sevilla, and I found it useful. Then you can move on
to installing, using any of the guides posted so far, starting with a single-node
installation and gradually graduating to a 3-node cluster, etc.
http://www.capgemini.com/blog/capping-it-off/2012/01/what-is-hadoop
3 months ago

Leon Katsnelson
Director, Cloud & Mobile at IBM Information Management
I agree with the other comments that spending time installing Hadoop is not the best use of
limited time. There are excellent resources available from Hortonworks, Cloudera and
others. I would recommend the free courses on http://bigdatauniversity.com. We have had many
people in the same situation, i.e. needing to understand Hadoop well enough to bring
colleagues up to speed. Courses and hands-on exercises help a lot. You can pick up an
environment on a VM or use it on the cloud. No mess, no fuss. BigDataUniversity.com
has over 100K registered participants.
3 months ago

Suman Patra
Hello sir, Leon.
I agree with you. BigDataUniversity really provides a great platform to start with
Big Data/Hadoop, but I have a complaint: you should improve the quality of the videos and
audio, and focus on improving the website's user experience.
3 months ago

Rex Hu
Graduate Research Associate at Ohio State University
Why do you need to give a presentation about Hadoop when you are a novice?
3 months ago

Ransome Williams

Enterprise Software Sales, Top Performer


Hortonworks has a great free learning tool, it's called the Hortonworks Sandbox:
http://hortonworks.com/products/hortonworks-sandbox/
It's a free download of a full Hadoop implementation, single node cluster. It's a 15 minute
install which comes complete with tutorials. It also comes with sample data, but the better
way to use it is to ingest your own data, so it's more real world.
Full disclosure, I'm a Hortonworks employee.
3 months ago

Alpesh Pandya
Strategist, Leader, Technocrat
1. The introductory videos from Cloudera are very useful for beginners.
2. I liked the training provided by Cloudera, but it is heavily focused on certification.
3. The Definitive Guide, 3rd edition, is a great source of information for administrators and
developers alike.
3 months ago

Anuja Garde

Senior Java/JEE techie


A self-paced video tutorial is the best way to start learning Hadoop: you
can learn how to set up and install Hadoop, ask questions, solve exercises and prepare yourself
for Cloudera or Hortonworks certification. "Become a certified hadoop developer"
http://goo.gl/DSdqfz is a good start.
1 month ago

AVINAV CHANDRA MISHRA
Junior Software Engineer at Indegene Lifesystems Pvt. Ltd.
Yes, you can install Hadoop on your PC.
At present Hadoop is available for Linux, so you have to install VMware and then Eclipse
with Hadoop. The best way to learn Hadoop is to read the book on big data and Hadoop
from Apache.
28 days ago

Norman Johnson Jr
Senior Systems Analyst at Anthelio Healthcare Solutions
try Hortonworks HDP... http://hortonworks.com/products/hdp-2/#documentation
27 days ago

Victor Borges
IT Director at Sociedade Racionalista
Take a look at Big Data University (https://bigdatauniversity.com/). There you
can find several free Hadoop courses with certificates.
27 days ago