You are on page 1of 2

1.

Install Java 8: Download Java 8 from the link:


a. Set environmental variables:
i. User variable:
Variable: JAVA_HOME user@ubuntu:~$ sudo apt-get install sun-java6-jdk
Value: C:\java
ii. System variable:
Variable: PATH
Value: C:\java\bin

b. Check on cmd, see below: user@ubuntu:~$ java -version


java -version

2. Install Eclipse Adding a dedicated Hadoop system user


a. Set environmental variables: We will use a dedicated Hadoop user account for
i. User variable: running Hadoop.
Variable: ECLIPSE_HOME user@ubuntu:~$ sudo addgroup hadoop_group
Value: C:\eclipse user@ubuntu:~$ sudo adduser --ingroup
ii. System variable: hadoop_group hduser1
Variable: PATH
Value: C:\eclipse \bin
b. Download hadoop2x-eclipse-plugin-master. : Copy these three jar files and pate them into
C:\eclipse\dropins.
c. Download slf4j-1.7.21. : Copy Jar files from this folder and paste them to C:\eclipse\plugins.

3. Download Apache-ant-1.9.6: (optional step)


4. Download Hadoop-2.8.x: extracted Hadoop-2.8.x files
a. Download hadoop-common-2.6.0-bin-master : Paste all files into the bin folder of Hadoop-2.8.x.
b. Create a data folder inside Hadoop-2.6.x, and also create two more folders in the data folder as
data and name.
c. Create a folder to store temporary data during execution of a project, such as D:\hadoop\temp.
d. Create a log folder, such as D:\hadoop\userlog
e. Go to Hadoop-2.6.x etc Hadoop and edit four files:
i. core-site.xml
ii. hdfs-site.xml Configuring SSH
The hadoop control scripts rely on SSH to peform cluster-
iii. mapred.xml
wide operations.
iv. yarn.xml SSH needs to be setup to allow password-less login
core-site.xml for the hadoop user from machines in the cluster.
<property> We have to generate an SSH key for the hduser user.
<name>hadoop.tmp.dir</name> user@ubuntu:~$ su - hduser1
<value>D:\hadoop\temp</value> hduser1@ubuntu:~$
</property>
<property> You have to enable SSH access to your local machine with
<name>fs.default.name</name> this newly created key which is done by the following
command.
<value>hdfs://localhost:50071</value>
hduser1@ubuntu:~$ cat $HOME/.ssh/id_rsa.pub >>
</property> $HOME/.ssh/authorized_keysssh-keygen -t rsa -P ""

hdfs-site.xml connecting to the local machine


with the hduser1 user.
<property><name>dfs.replication</name><value>1</value></property> to save your local machines host
<property> <name>dfs.namenode.name.dir</name><value>/hadoop- key fingerprint to the hduser
2.8.0/data/name</value><final>true</final></property> users known hosts file.
<property><name>dfs.datanode.data.dir</name><value>/hadoop- hduser@ubuntu:~$ ssh localhost
2.8.0/data/data</value><final>true</final> </property>
Main Installation
mapred.xml Now, I will start by switching to hduser
<property> hduser@ubuntu:~$ su - hduser1
Now, download and extract Hadoop 1.2.0
<name>mapred.job.tracker</name>
6 Chapter 1. HADOOP INSTALLATION
<value>localhost:9001</value> Installation and Configuration
</property> Documentation, Release 1.0.1
<property> Setup Environment Variables for Hadoop
<name>mapreduce.application.classpath</name> Add the following entries to .bashrc file
# Set Hadoop-related environment variables
<value>/hadoop-2.6.0/share/hadoop/mapreduce/*, export HADOOP_HOME=/usr/local/hadoop
/hadoop-2.6.0/share/hadoop/mapreduce/lib/*, # Add Hadoop bin/ directory to PATH
/hadoop-2.6.0/share/hadoop/common/*, export PATH= $PATH:$HADOOP_HOME/bin

/hadoop-2.6.0/share/hadoop/common/lib/*,
/hadoop-2.6.0/share/hadoop/yarn/*,
/hadoop-2.6.0/share/hadoop/yarn/lib/*,
Configuration
/hadoop-2.6.0/share/hadoop/hdfs/*,
/hadoop-2.6.0/share/hadoop/hdfs/lib/*,
hadoop-env.sh
</value> conf/*-site.xml
</property> conf/core-site.xml
yarn-site.xml conf/mapred-site.xml
<property> conf/hdfs-site.xml
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
f. Go to the location: Hadoop-2.6.0etchadoop, and edit hadoop-env.cmd with
set JAVA_HOME=C:\java\jdk1.8.0_91
g. Set environmental variables: Do: My computer Properties Advance system settings Advanced
Environmental variables
i. User variables:
Variable: HADOOP_HOME Value: D:\hadoop-2.6.0
ii. System variable
Variable: Path Value: D:\hadoop-2.6.0\bin,D:\hadoop-2.6.0\sbin,D:\hadoop-
2.6.0\share\hadoop\common\* , D:\hadoop-2.6.0\share\hadoop\hdfs,D:\hadoop-
2.6.0\share\hadoop\hdfs\lib\* , D:\hadoop-2.6.0\share\hadoop\hdfs\* ,D:\hadoop-
2.6.0\share\hadoop\yarn\lib\* ,D:\hadoop-2.6.0\share\hadoop\yarn\*, D:\hadoop-
2.6.0\share\hadoop\mapreduce\lib\* , D:\hadoop-2.6.0\share\hadoop\mapreduce\* , D:\hadoop-
2.6.0\share\hadoop\common\lib\*
h. Check on cmd; hadoop -version
i. Format name-node: On cmd go to the location Hadoop-2.6.0bin by writing on cmd cd hadoop-2.6.0.\bin
and then hdfs namenode format
j. Start Hadoop. Go to the location: D:\hadoop-2.6.0\sbin. Run the following files as administrator start-
dfs.cmd and start-yarn.cmd
5. Execute a project
You can also progress on http://localhost:8088 and http://localhost:50070
Starting your single-node cluster
Before starting the cluster, we need to give the required permissions to the directory with the following command
hduser@ubuntu:~$ sudo chmod -R 777 /usr/local/hadoop
Run the command
hduser@ubuntu:~$ /usr/local/hadoop/bin/start-all.sh
This will startup a Namenode, Datanode, Jobtracker and a Tasktracker on the machine.
hduser@ubuntu:/usr/local/hadoop$ jps