
Commands in Hadoop

1. ls:

This command is used to list the files and directories at a given path. Use -ls -R (the older -lsr) for a recursive listing; it is useful when we
want the hierarchy of a folder.

Syntax:

hadoop fs -ls <path>

Example:

hadoop fs -ls /user

It will print all the files and directories present in the /user directory.
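
If a recursive listing is needed, newer Hadoop releases use the -R flag of ls. A minimal sketch, assuming the /user directory from the example above exists:

hadoop fs -ls -R /user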

2. mkdir:

To create a directory. In HDFS there is no home directory by default, so let's first create it.

Syntax:

hadoop fs -mkdir <folder name>

creating home directory:

hadoop fs -mkdir /user
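
If the parent directories do not exist yet, the -p flag creates them as well. A small sketch; /user/data is just an illustrative path:

hadoop fs -mkdir -p /user/data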

3. touchz:

It creates an empty file.

Syntax:

hadoop fs -touchz <file_path>

Example:

hadoop fs -touchz /user/myfile.txt

4. copyFromLocal (or) put:

To copy files/folders from the local file system to HDFS. This is the most important command.
The local file system means the files present on the OS, outside HDFS.

Syntax:

hadoop fs -copyFromLocal <local file path> <dest(present on hdfs)>

Example: Suppose we have a file AI.txt on the Desktop which we want to copy to the folder /user
present on HDFS.

hadoop fs -copyFromLocal ../Desktop/AI.txt /user

OR

hadoop fs -put ../Desktop/AI.txt /user

5. cat:

To print file contents.

Syntax:

hadoop fs -cat <path>

Example:

// print the content of AI.txt present

// inside the /user folder.

hadoop fs -cat /user/AI.txt

6. copyToLocal (or) get:

To copy files/folders from hdfs store to local file system.

Syntax:

hadoop fs -copyToLocal <src file (on hdfs)> <local file dest>

Example:

hadoop fs -copyToLocal /user/data.txt ../Desktop

OR

hadoop fs -get /user/data.txt ../Desktop

7. cp:

This command is used to copy files within HDFS. Let's copy the folder /user to /user_copied.

Syntax:

hadoop fs -cp <src(on hdfs)> <dest(on hdfs)>

Example:

hadoop fs -cp /user /user_copied

8. mv:

This command is used to move files within HDFS. Let's cut-paste the file myfile.txt from the /user folder
to /user_copied.

Syntax:

hadoop fs -mv <src(on hdfs)> <dest(on hdfs)>

Example:

hadoop fs -mv /user/myfile.txt /user_copied

9. du:

It will give the size of each file in the directory.

Syntax:

hadoop fs -du <dirName>

Example:

hadoop fs -du /user

10. dus:

This command will give the total size of a directory/file.

Syntax:
hadoop fs -dus <dirName>

Example:

hadoop fs -dus /user
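
On newer Hadoop releases -dus is deprecated; the same summary is produced with the -s flag of du, for example:

hadoop fs -du -s /user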


MapReduce word count program: steps

Step 1: Create a file with the name word_count_data.txt and add some data to it.
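
For example, the file can be created from the shell; the sentences below are only placeholder data, any text will do:

echo "hello world hello hadoop" > word_count_data.txt
echo "hadoop streaming word count example" >> word_count_data.txt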

Step 2: Create a mapper.py file that implements the mapper logic. It will read data from
STDIN, split each line into words, and emit each word together with a count of 1.
#!/usr/bin/env python
# import sys because we need to read and write data to STDIN and STDOUT
import sys

# reading entire line from STDIN (standard input)
for line in sys.stdin:
    # to remove leading and trailing whitespace
    line = line.strip()
    # split the line into words
    words = line.split()
    for word in words:
        print('%s\t%s' % (word, 1))
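
The mapper can be sanity-checked locally before involving Hadoop by piping a line of text into it. This is only a quick local test and assumes python is available on the PATH:

echo "hello world hello" | python mapper.py
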
Step 3: Create a reducer.py file that implements the reducer logic. It will read the output of
mapper.py from STDIN (standard input), aggregate the occurrences of each word, and
write the final output to STDOUT. Hadoop sorts the mapper output by key before it reaches the
reducer, which is why comparing each word with the previous one is enough to aggregate the counts.

#!/usr/bin/env python
from operator import itemgetter
import sys

current_word = None
current_count = 0

# the mapper output arrives sorted by word, so equal words are adjacent
for line in sys.stdin:
    line = line.strip()
    word, count = line.split('\t')
    count = int(count)
    if current_word == word:
        current_count += count
    else:
        if current_word:
            # a new word has started; emit the count of the previous one
            print('%s\t%s' % (current_word, current_count))
        current_count = count
        current_word = word

# emit the count for the last word
if current_word == word:
    print('%s\t%s' % (current_word, current_count))
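
The full map-sort-reduce pipeline can also be simulated locally. The sort step between the two scripts mimics Hadoop's shuffle phase, which is what the reducer relies on. Again, this assumes python is on the PATH and is only a local check:

echo "hello world hello" | python mapper.py | sort -k1,1 | python reducer.py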

Step 4: Now let's start all our Hadoop daemons with the below command (on Windows, use start-all.cmd instead).

start-all.sh
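
One way to confirm the daemons are up (assuming a JDK is installed so that jps is available) is to list the running Java processes; NameNode, DataNode, ResourceManager, and NodeManager should appear:

jps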

Now make a directory word_count_in_python in the HDFS root directory that will
store our word_count_data.txt file, with the below command.
hdfs dfs -mkdir /word_count_in_python

Copy word_count_data.txt to this folder in our HDFS with the help of the copyFromLocal command.

The syntax to copy a file from your local file system to HDFS is given below:

hdfs dfs -copyFromLocal /path1 /path2 .... /pathn /destination
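
For instance, assuming word_count_data.txt was saved under /home/dikshant/Documents (an illustrative path, adjust it to wherever the file actually lives):

hdfs dfs -copyFromLocal /home/dikshant/Documents/word_count_data.txt /word_count_in_python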

Let's give executable permission to our mapper.py and reducer.py with the help of the below commands.

cd Documents/

chmod 777 mapper.py reducer.py  # read, write, and execute permission for user, group, and others

Step 5: Now download the latest hadoop-streaming jar file. Then place this hadoop-streaming
jar in a location from which you can easily access it.

Now let's run our Python files with the help of the Hadoop streaming utility as shown below.

Run command:

hadoop jar /home/dikshant/Documents/hadoop-streaming-2.7.3.jar \
    -input /word_count_in_python/word_count_data.txt \
    -output /word_count_in_python/output \
    -mapper mapper.py \
    -reducer reducer.py
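
Once the job finishes, the result can be inspected directly from HDFS. The part-00000 file name is the usual default for a single reducer and may differ if the job is configured otherwise:

hdfs dfs -cat /word_count_in_python/output/part-00000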
