
HDFS JAVA API ON AMAZON EC2

PREREQUISITES
● Download the ‘FileSystemOperationsTest.java’ file.
● Please ensure that you have installed the following on your Windows machine:
1. WinSCP​ tool.
2. Notepad++​.

IMPORTANT INSTRUCTIONS
● The following notations have been used while running the Java API code.
[ec2-user@ip-10-0-0-14 ~]$ ​hadoop command

Output of the command


As shown above, the command to be run is written in bold. The output of the
command is written in italics. The prompt [ec2-user@ip-10-0-0-14 ~] tells us the
user through which the command is to be executed.
● Please be careful with the spaces in the commands.
● If a series of commands is given in a particular order, make sure that you run them in
the same order.

NOTE: Before starting with the document below, it is necessary to have created the
EC2 instance with Cloudera installed on it and to have connected to it as well. If not,
kindly go through Video 1 and Video 2 before getting started with this document.
STEPS TO ACCESS HDFS USING THE JAVA API ON AMAZON EC2

● To check whether Java is available, do the following:

1. Switch to the root user using sudo -i

2. Now, run the following command


[root@ip-10-0-0-14 ~]# ​ls /usr/java/jdk1.7.0_67-cloudera/

● Set the JAVA_HOME and JRE_HOME in the /etc/profile location.

1. Open the file using the command given below.

vi /etc/profile
2. Add the following lines at the end of the file, as shown below. Switch to the
insert mode by pressing i before pasting them.
export JAVA_HOME=/usr/java/jdk1.7.0_67-cloudera/
export JRE_HOME=/usr/java/jdk1.7.0_67-cloudera/jre/
export PATH=$JAVA_HOME/bin:$PATH

3. Now, save and exit the file. Exit the insert mode by pressing Esc, then
enter the following in the command mode of the vi editor:
:wq!
● Now run the following commands as shown below:
[root@ip-10-0-0-14 ~]# ​source /etc/profile

[root@ip-10-0-0-14 ~]# ​echo $JAVA_HOME

/usr/java/jdk1.7.0_67-cloudera/
[root@ip-10-0-0-14 ~]# ​java -version

java version "1.7.0_67"


Java(TM) SE Runtime Environment (build 1.7.0_67-b01)
Java HotSpot(TM) 64-Bit Server VM (build 24.65-b04, mixed mode)
[root@ip-10-0-0-14 ~]# ​javac -version

javac 1.7.0_67

● Open the ‘FileSystemOperationsTest.java’ file on your machine using Notepad++,
and edit the lines as shown below:

1. Replace quickstart.cloudera:8020 with the Private DNS of your instance, as
shown below; a sketch of how the code uses this value follows these steps. Your
Private DNS can be found in your EC2 instance details.

2. Then, save the file and close it.
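
For reference, the line being edited is the HDFS URI that the Java code passes to the
Hadoop client. Below is a minimal sketch of how such a program typically builds its
connection; the class and variable names are illustrative assumptions, not the actual
contents of ‘FileSystemOperationsTest.java’:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class HdfsConnectionSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // The value you are replacing: the host is the Private DNS of your
        // instance; 8020 is the default NameNode RPC port.
        conf.set("fs.default.name", "hdfs://<your-private-dns>:8020");
        // fs.default.name is deprecated in favour of fs.defaultFS, which
        // is why the runs later in this document log a deprecation warning.
        FileSystem fs = FileSystem.get(conf);
        System.out.println("Connected to " + fs.getUri());
        fs.close();
    }
}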


● WinSCP is a tool used to transfer files from a Windows machine to a Linux machine
(an EC2 instance, in our case).

1. Open WinSCP.

2. Enter the following credentials:

Hostname​: ​Provide the public IP from the EC2 dashboard.


Username​: ​ec2-user
Then, click on ‘Advanced’.

3. After clicking on ‘Authentication’​,​ enter the path of your PPK file.


4. Click on ‘OK’ followed by ‘Login’, after which the following screen will appear:
the left pane shows your local machine (Windows, in our case), and
the right pane shows your Linux machine (the AWS EC2 instance).

5. Browse to ‘FileSystemOperationsTest.java’ on the left side and drag
and drop it to the right.
6. Click on ‘OK’ on the prompt which appears as shown below.

7. We have now successfully copied ‘FileSystemOperationsTest.java’ from
our local machine to our EC2 instance.
● Now, go to your AWS EC2 instance and verify whether
‘FileSystemOperationsTest.java’ is present in the instance. To do so, first, switch to
the ec2-user using the ‘su ec2-user’ command. Then, move to the home directory using
the ‘cd /home’ command followed by the ‘cd ec2-user’ command, and then use the ‘ls’
command to verify whether the file is present or not.

● Copy ‘FileSystemOperationsTest.java’ to /root (the root user’s home directory).


[ec2-user@ip-10-0-0-14 ~]$ ​cp FileSystemOperationsTest.java /root/

cp: cannot stat ‘/root/FileSystemOperationsTest.java’: Permission denied


Now use the following command to overcome this error:
[ec2-user@ip-10-0-0-14 ~]$ ​sudo cp FileSystemOperationsTest.java /root

● To verify whether the file has been copied, use the following commands:
[ec2-user@ip-10-0-0-14 ~]$ ​sudo -i ​(helps shift from the ec2-user to the root user)

[root@ip-10-0-0-14 ~]# ​ls

FileSystemOperationsTest.java test.txt

You can see that the file has been copied to the root user home directory.
● Now, let us create a directory ‘testapi’ using the ‘mkdir’ command and copy the
file ‘FileSystemOperationsTest.java’ into it.
[root@ip-10-0-0-14 ~]# ​mkdir testapi

[root@ip-10-0-0-14 ~]# ​cp FileSystemOperationsTest.java testapi/

● Verify the same as shown below:


[root@ip-10-0-0-14 ~]# ​cd testapi/

[root@ip-10-0-0-14 testapi]# ls

FileSystemOperationsTest.java
You can see that the file is present.

● Now, create a new directory ‘testapi_classes’ using the ‘mkdir’ command to store
the class files after the compilation of the Java code.
[root@ip-10-0-0-14 testapi]# ​mkdir testapi_classes

● Set the HADOOP_CLASSPATH environment variable and compile the Java code using the
commands below:
[root@ip-10-0-0-14 testapi]# export HADOOP_CLASSPATH=$(hadoop classpath)

[root@ip-10-0-0-14 testapi]# javac -classpath ${HADOOP_CLASSPATH} -d /root/testapi/testapi_classes FileSystemOperationsTest.java

Note: FileSystemOperationsTest.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
● Verify whether the class files were created after the compilation of the Java code.
[root@ip-10-0-0-14 testapi]# ​cd testapi_classes/

[root@ip-10-0-0-14 testapi_classes]# ​ls

FileSystemOperationsDemo.class FileSystemOperationsTest.class
You can see that the class files, including FileSystemOperationsDemo.class, have been created.

● Navigate back to the testapi folder using the ‘cd ..’ command.
[root@ip-10-0-0-14 testapi_classes]# ​cd ..

Creating a JAR file


● To create a JAR file, run the command shown below.
The syntax for creating a JAR file is: jar -cvf jarname.jar -C classfolder_name/ .
[root@ip-10-0-0-14 testapi]# ​jar -cvf test.jar -C testapi_classes/ .

added manifest
adding: FileSystemOperationsDemo.class(in = 4394) (out= 2062)(deflated 53%)
adding: FileSystemOperationsTest.class(in = 2655) (out= 1332)(deflated 49%)

Creating a text file


● Create a new file using the ‘cat’ command as shown below and add your text. Save
and exit the file by pressing ‘Ctrl+D’ (which marks the end of input).
[root@ip-10-0-0-14 testapi]# ​cat > file1.txt

qwe
qwer
Running the Java API Code
● Run the Java API code using the command shown below:
[root@ip-10-0-0-14 testapi]# ​hadoop jar test.jar FileSystemOperationsTest

Enter 1 for local to HDFS.


Enter 2 for HDFS to local.
Enter 3 for HDFS to HDFS.
Enter 4 to split a file.
Enter 5 for deletion from HDFS.
Enter 6 to create a directory.
Enter 7 to exit.
● If the user enters 1
Enter the local file system path and the HDFS location path.
file1.txt
/user/root/
18/02/12 10:28:48 INFO Configuration.deprecation: fs.default.name is deprecated.
Instead, use fs.defaultFS.
Enter 1 for local to HDFS.
Enter 2 for HDFS to local.
Enter 3 for HDFS to HDFS.
Enter 4 to split a file.
Enter 5 for deletion from the HDFS.
Enter 6 to create a directory.
Enter 7 to exit.
Enter​ 7
After you exit, you can verify whether the file has been copied from local to HDFS
using the below command:
[root@ip-10-0-0-14 testapi]# ​hadoop fs -ls /user/root/

Found 2 items


-rw-r--r-- 3 root supergroup 9 2018-02-12 10:28 /user/root/file1.txt
-rw-r--r-- 6 root supergroup 27 2018-02-12 06:14 /user/root/test.txt
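
Internally, option 1 most likely calls the FileSystem API’s copyFromLocalFile method.
A minimal sketch under that assumption (the paths mirror the run above; the actual code
in ‘FileSystemOperationsTest.java’ may differ):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class LocalToHdfsSketch {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        // Copies file1.txt from the local working directory into
        // the HDFS directory /user/root/.
        fs.copyFromLocalFile(new Path("file1.txt"), new Path("/user/root/"));
        fs.close();
    }
}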

● Now, let us remove the file file1.txt from our local directory using the ‘rm -rf file1.txt’
command.
[root@ip-10-0-0-14 testapi]#​ rm -rf file1.txt
[root@ip-10-0-0-14 testapi]#​ ls
FileSystemOperationsTest.java testapi_classes test.jar

● Run the Java API code


[root@ip-10-0-0-14 testapi]#​ hadoop jar test.jar FileSystemOperationsTest
Enter 1 for Local to HDFS
Enter 2 for HDFS to local
Enter 3 for HDFS to HDFS
Enter 4 for splitting a file
Enter 5 for deletion from HDFS
Enter 6 for making a directory
Enter 7 for exit..
● If the user enters 2
Enter HDFS source…
/user/root/file1.txt
18/02/13 11:12:55 INFO Configuration.deprecation: fs.default.name is deprecated.
Instead, use fs.defaultFS
Enter 1 for Local to HDFS
Enter 2 for HDFS to local
Enter 3 for HDFS to HDFS
Enter 4 for splitting a file
Enter 5 for deletion from HDFS
Enter 6 for making a directory
Enter 7 for exit…
Enter​ 7

● We can verify this by checking whether the file is back in our local directory using the
‘ls’ command.
[root@ip-10-0-0-14 testapi]#​ ls
file1.txt FileSystemOperationsTest.java testapi_classes test.jar
You can see that file1.txt exists.
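
Option 2 presumably uses the complementary copyToLocalFile method. A minimal sketch,
again with illustrative paths:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsToLocalSketch {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        // Copies /user/root/file1.txt out of HDFS into the current
        // local working directory.
        fs.copyToLocalFile(new Path("/user/root/file1.txt"), new Path("."));
        fs.close();
    }
}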
● Run the Java API code
[root@ip-10-0-0-14 testapi]#​ hadoop jar test.jar FileSystemOperationsTest
Enter 1 for Local to HDFS
Enter 2 for HDFS to local
Enter 3 for HDFS to HDFS
Enter 4 for splitting a file
Enter 5 for deletion from HDFS
Enter 6 for making a directory
Enter 7 for exit…

● If the user enters 3
Enter HDFS source and destination…
/user/root/file1.txt
/user/root/av.txt
18/02/13 11:37:08 INFO Configuration.deprecation: fs.default.name is deprecated.
Instead, use fs.defaultFS

● We can verify whether the file has been copied from HDFS to HDFS using the
command shown below. As you can see, the file av.txt is present, and it has the
same contents as file1.txt.
[root@ip-10-0-0-14 testapi]# ​hadoop fs -ls /user/root/
Found 3 items
drwxr-xr-x - root supergroup 0 2018-02-13 11:37 /user/root/av.txt
-rw-r--r-- 3 root supergroup 9 2018-02-13 10:42 /user/root/file1.txt
-rw-r--r-- 6 root supergroup 27 2018-02-13 07:50 /user/root/test.txt
● You can also verify the same using ‘Hue’.
To access Hue, type your public IP followed by :8888 in your browser:
<Public IP>:8888

Click on the icon in the top-left corner followed by ‘Files’, then click on ‘hdfs’
followed by ‘root’, and verify whether the file av.txt exists.
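
An HDFS-to-HDFS copy is commonly implemented with FileUtil.copy, which streams data
between two (here, identical) file systems. A sketch under that assumption:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

public class HdfsToHdfsSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // Source and destination live on the same HDFS here; the
        // 'false' flag means the source is kept after the copy.
        FileUtil.copy(fs, new Path("/user/root/file1.txt"),
                      fs, new Path("/user/root/av.txt"), false, conf);
        fs.close();
    }
}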
● Run the Java API code
[root@ip-10-0-0-14 testapi]#​ hadoop jar test.jar FileSystemOperationsTest
Enter 1 for Local to HDFS
Enter 2 for HDFS to local
Enter 3 for HDFS to HDFS
Enter 4 for splitting a file
Enter 5 for deletion from HDFS
Enter 6 for making a directory
Enter 7 for exit…

● If the user enters 4
Enter HDFS source…
/user/root/file1.txt
18/02/13 11:40:41 INFO Configuration.deprecation: fs.default.name is deprecated.
Instead, use fs.defaultFS
File :- file1_1.txt created!!!!

● You can verify the same using the command shown below. As we can see, the file has
been split into only a single part file. This is because the default block size in HDFS is
128 MB; since the file is smaller than 128 MB, it fits into a single part file.
[root@ip-10-0-0-14 testapi]# ​hadoop fs -ls /user/root/
Found 3 items
drwxr-xr-x - root supergroup 0 2018-02-13 11:37 /user/root/av.txt
-rw-r--r-- 3 root supergroup 9 2018-02-13 11:40 /user/root/file1_1.txt
-rw-r--r-- 6 root supergroup 27 2018-02-13 07:50 /user/root/test.txt
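
The split option likely streams the source file and starts a new part file whenever a
full block’s worth of data has been written; since file1.txt is only a few bytes,
everything lands in the single part file file1_1.txt. A rough sketch of that logic (the
part-file naming and buffer size are assumptions):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsSplitSketch {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path src = new Path("/user/root/file1.txt");
        long blockSize = fs.getDefaultBlockSize(src); // 128 MB by default
        byte[] buf = new byte[8192];
        int part = 1;
        long written = 0;
        FSDataInputStream in = fs.open(src);
        FSDataOutputStream out = fs.create(new Path("/user/root/file1_" + part + ".txt"));
        int n;
        while ((n = in.read(buf)) > 0) {
            // Roll over to a new part file once a full block has been written.
            if (written + n > blockSize) {
                out.close();
                part++;
                written = 0;
                out = fs.create(new Path("/user/root/file1_" + part + ".txt"));
            }
            out.write(buf, 0, n);
            written += n;
        }
        out.close();
        in.close();
        fs.close();
    }
}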
● Run the Java API code
[root@ip-10-0-0-14 testapi]# ​hadoop jar test.jar FileSystemOperationsTest
Enter 1 for Local to HDFS
Enter 2 for HDFS to local
Enter 3 for HDFS to HDFS
Enter 4 for splitting a file
Enter 5 for deletion from HDFS
Enter 6 for making a directory
Enter 7 for exit…

● If the user enters 5
Enter HDFS source to be deleted…
/user/root/file1_1.txt
18/02/13 11:45:44 INFO Configuration.deprecation: fs.default.name is deprecated.
Instead, use fs.defaultFS
Enter 1 for Local to HDFS
Enter 2 for HDFS to local
Enter 3 for HDFS to HDFS
Enter 4 for splitting a file
Enter 5 for deletion from HDFS
Enter 6 for making a directory
Enter 7 for exit…

Enter​ 7
● You can verify the same using the command shown below. You can see that the file
file1_1.txt has been deleted.
[root@ip-10-0-0-14 testapi]# ​hadoop fs -ls /user/root/
Found 2 items
drwxr-xr-x - root supergroup 0 2018-02-13 11:37 /user/root/av.txt
-rw-r--r-- 6 root supergroup 27 2018-02-13 07:50 /user/root/test.txt
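
Deletion is a single call in the FileSystem API. A minimal sketch:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsDeleteSketch {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        // The second argument enables recursive deletion, which is
        // needed when the path is a non-empty directory.
        fs.delete(new Path("/user/root/file1_1.txt"), true);
        fs.close();
    }
}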

● Run the Java API code


[root@ip-10-0-0-14 testapi]#​ hadoop jar test.jar FileSystemOperationsTest
Enter 1 for Local to HDFS
Enter 2 for HDFS to local
Enter 3 for HDFS to HDFS
Enter 4 for splitting a file
Enter 5 for deletion from HDFS
Enter 6 for making a directory
Enter 7 for exit​…
● If the user enters 6
Enter HDFS directory path to be created…
/user/root/aa
18/02/13 11:54:32 INFO Configuration.deprecation: fs.default.name is deprecated.
Instead, use fs.defaultFS
Enter 1 for Local to HDFS
Enter 2 for HDFS to local
Enter 3 for HDFS to HDFS
Enter 4 for splitting a file
Enter 5 for deletion from HDFS
Enter 6 for making a directory
Enter 7 for exit…

Enter​ 7

● You can verify whether the directory has been created or not using the following
command. It can be seen that the directory ‘aa’ has now been created.
[root@ip-10-0-0-14 testapi]#​ hadoop fs -ls /user/root/
Found 3 items
drwxr-xr-x - root supergroup 0 2018-02-13 11:54 /user/root/aa
drwxr-xr-x - root supergroup 0 2018-02-13 11:37 /user/root/av.txt
-rw-r--r-- 6 root supergroup 27 2018-02-13 07:50 /user/root/test.txt
You can verify the same using Hue as described above.
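
Directory creation presumably maps to FileSystem.mkdirs, which, like ‘mkdir -p’, also
creates any missing parent directories. A minimal sketch:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsMkdirSketch {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        fs.mkdirs(new Path("/user/root/aa")); // returns true on success
        fs.close();
    }
}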
