
RicMa.co – Install Apache Hadoop 2.7 (on *buntu ... https://ricma.co/install-apache-hadoop-27-on-bunt...


Riccardo Macoratti
Geeky developer · Linux lover


Install Apache Hadoop 2.7 (on *buntu 16.04)
Posted on Tue 27 September 2016 in tutorials

If you are interested in Hadoop, read more here.

For this tutorial, I'll use a VM with Ubuntu Server 16.04 (64-bit), running
on VirtualBox 5.1.4.

This is the setup I used:

All 2 cores of my i5-6200U
4096 MB of RAM (although 1024 MB should be enough)
A dynamically allocated 10 GB VDI hard disk (5 GB is the minimum)
Ubuntu Server 16.04 x64 ISO file (any *buntu flavour should be fine)

Notes

When you read a line like this:

jdoe@farlands ~ $ echo "Hello, world!"

I mean a bash prompt without root privileges, where jdoe is the username and
farlands is the hostname.

On the other hand, when the line is like this:

farlands % echo "Hello, world!"

I mean a bash prompt with root privileges.

OK, let's start: run the guest OS installation with default values, then jump to
the Hadoop headaches.

Update the guest system


Open a terminal and run these commands to update the repositories and upgrade
the guest system.

farlands % apt update


farlands % apt upgrade -y

Java 8

Open up the usual terminal and input:

farlands % apt purge 'openjdk*'


farlands % add-apt-repository -y ppa:webupd8team/java
farlands % apt update
farlands % apt install -y oracle-java8-installer

You can verify the Java version by typing:

farlands % java -version


java version "1.8.0_60"
Java(TM) SE Runtime Environment (build 1.8.0_60-b27)
Java HotSpot(TM) 64-Bit Server VM (build 25.60-b23, mixed mode)

If you see similar output, this step is complete.

Next, we need to create the JAVA_HOME environment variable, so that Hadoop can
find the Java executables.

farlands % echo "export JAVA_HOME=/usr" >> /etc/profile


farlands % source /etc/profile
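
Hadoop looks for the java binary under $JAVA_HOME/bin, so it can be worth confirming that the value you exported really points at one. Here is a minimal sketch of such a check; check_java_home is a hypothetical helper name of mine, not something Hadoop or Ubuntu provides.

```shell
#!/usr/bin/env bash
# Sketch: confirm that a candidate JAVA_HOME really contains bin/java.
# check_java_home is a hypothetical helper, not part of Hadoop.
check_java_home() {
    # Use the first argument if given, otherwise fall back to $JAVA_HOME.
    local home="${1:-${JAVA_HOME:-}}"
    if [ -n "$home" ] && [ -x "$home/bin/java" ]; then
        echo "ok: $home/bin/java"
    else
        echo "missing: ${home:-<unset>}/bin/java"
        return 1
    fi
}

# Usage (after sourcing this file):
#   check_java_home        # checks the current $JAVA_HOME
#   check_java_home /usr   # checks an explicit directory
```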

Disable IPv6
Apache Hadoop supports only IPv4, so let's disable IPv6 in the kernel
parameters.

Open the file /etc/sysctl.conf:

farlands % editor /etc/sysctl.conf

And append to the end:

# Disable IPv6
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1

Then reboot to apply the changes:

farlands % reboot
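
After the reboot you may want to confirm that the kernel picked up the setting. The sketch below reads the flag the kernel exposes under /proc for net.ipv6.conf.all.disable_ipv6; ipv6_disabled is a hypothetical helper name, and the optional argument exists only so the function can be tried against another file.

```shell
#!/usr/bin/env bash
# Sketch: report whether IPv6 is disabled, based on the kernel's /proc view.
# ipv6_disabled is a hypothetical helper; the optional argument lets you
# point it at a different file for experimentation.
ipv6_disabled() {
    local flag_file="${1:-/proc/sys/net/ipv6/conf/all/disable_ipv6}"
    # The flag file contains "1" when IPv6 is disabled on all interfaces.
    [ -r "$flag_file" ] && [ "$(cat "$flag_file")" = "1" ]
}

# Usage (after sourcing this file):
#   if ipv6_disabled; then echo "IPv6 is off"; else echo "IPv6 is still on"; fi
```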

Configure SSH keys


We want to run our setup as a separate, general-purpose user, so we will create a
hadoopuser user and a hadoopgroup group.

farlands % addgroup hadoopgroup


farlands % adduser --ingroup hadoopgroup hadoopuser

We need ssh access to our machine, so let's install and start an OpenSSH server.

farlands % apt install ssh


farlands % systemctl enable ssh
farlands % systemctl start ssh

Now we need to set up passwordless ssh by means of crypto keys. First, we
switch to the hadoopuser account, then we create an RSA key pair, and
finally we authorize the key for the current user.

farlands % su - hadoopuser
hadoopuser@farlands ~ $ ssh-keygen -t rsa -P ""
hadoopuser@farlands ~ $ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
hadoopuser@farlands ~ $ chmod 600 ~/.ssh/authorized_keys
hadoopuser@farlands ~ $ ssh-copy-id -i ~/.ssh/id_rsa.pub localhost
hadoopuser@farlands ~ $ ssh localhost

If no password was requested at ssh login, you have successfully configured
passwordless ssh for user hadoopuser.
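
If ssh still asks for a password, file permissions are a common culprit: with its default StrictModes setting, sshd ignores an authorized_keys file that other users could write to. Below is a small sketch that checks for the commonly recommended modes (700 on the directory, 600 on the file). check_ssh_perms is a hypothetical helper name, and it assumes GNU stat, which is what Ubuntu ships.

```shell
#!/usr/bin/env bash
# Sketch: check the permissions of an SSH directory and its authorized_keys.
# check_ssh_perms is a hypothetical helper; it relies on GNU stat (-c).
check_ssh_perms() {
    local dir="${1:-$HOME/.ssh}"
    local dir_mode file_mode
    # stat -c '%a' prints the permission bits in octal, e.g. 700.
    dir_mode=$(stat -c '%a' "$dir") || return 2
    file_mode=$(stat -c '%a' "$dir/authorized_keys") || return 2
    if [ "$dir_mode" = "700" ] && [ "$file_mode" = "600" ]; then
        echo "ok"
    else
        echo "fix needed: $dir is $dir_mode, authorized_keys is $file_mode"
        return 1
    fi
}

# Usage (after sourcing this file, as hadoopuser):
#   check_ssh_perms
```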

Install Hadoop
We are ready to install Hadoop. Unfortunately, it does not come prepackaged, so
we have to download it, extract it and move it to /usr/local.

farlands % wget http://it.apache.contactlab.it/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.

farlands % mv hadoop-2.7.3 /usr/local
farlands % ln -sf /usr/local/hadoop-2.7.3/ /usr/local/hadoop
farlands % chown -R hadoopuser:hadoopgroup /usr/local/hadoop-2.7.3/

Now we need to configure some environment variables for the hadoopuser
account. Switch to that account and edit ~/.bashrc:

hadoopuser@farlands ~ $ editor ~/.bashrc

Append at the end:

# Hadoop config
export HADOOP_PREFIX=/usr/local/hadoop
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_MAPRED_HOME=${HADOOP_HOME}
export HADOOP_COMMON_HOME=${HADOOP_HOME}
export HADOOP_HDFS_HOME=${HADOOP_HOME}
export YARN_HOME=${HADOOP_HOME}
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
# Native path
export HADOOP_COMMON_LIB_NATIVE_DIR=${HADOOP_PREFIX}/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_PREFIX/lib/native"
# Java path
export JAVA_HOME="/usr"
# OS path
export PATH=$PATH:$HADOOP_HOME/bin:$JAVA_HOME/bin:$HADOOP_HOME/sbin

Next, source ~/.bashrc to apply changes.

hadoopuser@farlands ~ $ source ~/.bashrc
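
A typo in ~/.bashrc is easy to miss, so here is a minimal sketch that lists any of the variables above that did not end up in the environment. unset_hadoop_vars is a hypothetical helper name; it only checks that each variable is exported, not that its value is a valid path.

```shell
#!/usr/bin/env bash
# Sketch: print the name of every expected variable that is not exported.
# unset_hadoop_vars is a hypothetical helper, not part of Hadoop.
unset_hadoop_vars() {
    local name
    for name in HADOOP_PREFIX HADOOP_HOME HADOOP_MAPRED_HOME \
                HADOOP_COMMON_HOME HADOOP_HDFS_HOME YARN_HOME \
                HADOOP_CONF_DIR JAVA_HOME; do
        # printenv exits non-zero when the variable is not in the environment.
        if ! printenv "$name" >/dev/null; then
            echo "$name"
        fi
    done
}

# Usage (after sourcing this file and ~/.bashrc):
#   unset_hadoop_vars   # prints nothing when everything is set
```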

Now we need to edit /usr/local/hadoop/etc/hadoop/hadoop-env.sh:

farlands % editor /usr/local/hadoop/etc/hadoop/hadoop-env.sh

And add this at the end:

export JAVA_HOME="/usr"


Hadoop configuration is quite hard, because it has a lot of config files. We need
to navigate to /usr/local/hadoop/etc/hadoop and edit these files:

core-site.xml
hdfs-site.xml
mapred-site.xml (needs to be copied from mapred-site.xml.template)
yarn-site.xml

They are all XML files with a top-level <configuration> node. For clarity, only
the configuration node is shown.
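
Since mapred-site.xml has to be copied from its template first, this is a small sketch of one way to do it without clobbering a file you have already edited. ensure_mapred_site is a hypothetical helper name; the default directory is the one used in this tutorial.

```shell
#!/usr/bin/env bash
# Sketch: create mapred-site.xml from its template, but never overwrite an
# existing file. ensure_mapred_site is a hypothetical helper.
ensure_mapred_site() {
    local conf_dir="${1:-/usr/local/hadoop/etc/hadoop}"
    if [ -e "$conf_dir/mapred-site.xml" ]; then
        echo "mapred-site.xml already exists, leaving it alone"
    else
        cp "$conf_dir/mapred-site.xml.template" "$conf_dir/mapred-site.xml" &&
            echo "created mapred-site.xml from the template"
    fi
}

# Usage (after sourcing this file, as hadoopuser):
#   ensure_mapred_site
```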

core-site.xml

<configuration>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>

hdfs-site.xml

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>

    <property>
        <name>dfs.name.dir</name>
        <value>file:/usr/local/hadoop/hadoopdata/hdfs/namenode</value>
    </property>

    <property>
        <name>dfs.data.dir</name>
        <value>file:/usr/local/hadoop/hadoopdata/hdfs/datanode</value>
    </property>
</configuration>

mapred-site.xml

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>

yarn-site.xml

<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>

Format namenode
Next, we need to format the namenode filesystem with the following command:

hadoopuser@farlands ~ $ hdfs namenode -format

Look through the output for a line like this:

INFO common.Storage: Storage directory /usr/local/hadoop/hadoopdata/hdfs/namenode has b

If you find it, the format is done.

Start and stop services


Now, the last thing to do is to start the Hadoop services:

hadoopuser@farlands ~ $ start-dfs.sh
hadoopuser@farlands ~ $ start-yarn.sh

To check the status of the services use the jps command:

hadoopuser@farlands ~ $ jps
26899 Jps
26216 SecondaryNameNode
25912 NameNode
26041 DataNode
26378 ResourceManager
26494 NodeManager
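
Rather than eyeballing the list, you can let a few lines of shell compare it against the five daemons shown above. This is only a sketch: missing_daemons is a hypothetical helper name, and it assumes the "<pid> <name>" layout that jps prints in the listing above.

```shell
#!/usr/bin/env bash
# Sketch: read `jps` output on stdin and print every expected daemon that is
# not listed. missing_daemons is a hypothetical helper, not part of Hadoop.
missing_daemons() {
    local listing daemon
    listing=$(cat)
    for daemon in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
        # Compare against the whole second field, so that "NameNode" does not
        # accidentally match the "SecondaryNameNode" line.
        if ! printf '%s\n' "$listing" |
            awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
            echo "$daemon"
        fi
    done
}

# Usage (after sourcing this file, as hadoopuser):
#   jps | missing_daemons   # prints nothing when all five daemons are up
```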

To stop the services, run these commands:

hadoopuser@farlands ~ $ stop-dfs.sh
hadoopuser@farlands ~ $ stop-yarn.sh

Congratulations, you made it!

hadoop apache ubuntu tutorial hdfs


© Riccardo Macoratti 2016 - This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike
4.0 International License
