Hadoop Fundamentals
Unit 2: Hadoop Architecture
© Copyright IBM Corporation, 2015
US Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
IBM Software
Contents
Lab 1 Hadoop Architecture
1.1 Getting Started
1.2 Login to the VM
1.3 Start All Services with the Ambari Web Console
1.4 Basic HDFS Interactions Using the Command Line
1.5 Summary
                        Username    Password
VM image setup screen   root        password
Linux                   virtuser    password
When you connect, log in as root with the password password at the following screen to complete the setup required to run the virtual machine (VM) on your system.
Further setup is required to get your VM working. Select English (USA) (or another language, if your keyboard is different), hit Tab, and then Enter.
Choose a password for the user ID that you will be working with (virtuser), e.g., password (you may select any other password, but you must remember it for later use). Hit Tab and Enter.
Enter the password a second time when requested, as confirmation.
Once the password has been entered, you can use the mouse to select Log In.
__2. You need to start the BigInsights components. With this VM, you have two options. One option is
to use the icon that was placed on the desktop.
But that icon is unique to this VM and may not be available on other systems / clusters in
the future.
__3. The other approach is to use the Ambari Web Console, which may already be started (you would
see a web page in front of you). If you need to start the Ambari Web Console, use the Firefox icon
on the top-left window border (indicated here by the arrow):
The URL needed is http://localhost:8080 (or substitute your hostname for localhost).
__4. The following screen will be shown. This is the Ambari Web Console, where you configure the
IBM Open Platform for Apache Hadoop v4 software. The default user ID and password for
Ambari are admin / admin.
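The same Ambari server can also be queried from the command line through its REST API. The following is a sketch only: the endpoints are standard Ambari API paths, but the cluster name on your VM is an assumption, so list the clusters first to find the real one.

```shell
# List the clusters this Ambari server manages (default credentials admin/admin).
curl -s -u admin:admin http://localhost:8080/api/v1/clusters

# List the services of a cluster; "MyCluster" is a placeholder -- substitute
# the cluster name returned by the previous call.
curl -s -u admin:admin http://localhost:8080/api/v1/clusters/MyCluster/services
```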
__6. To start all services, click Actions at the bottom of the left-hand side, and then Start All:
__8. Once all components have started successfully as shown on the Ambari Web Console, you can
minimize this webpage as you will not need it further during this Lab Exercise.
If you later close down this VM and restart it, you will need to Start All services again.
__9. Open a terminal window by right-clicking on your desktop and selecting Open in Terminal. This leaves
you at /home/virtuser/Desktop. Type cd and press Enter to move to your home directory
in Linux, /home/virtuser (designated by “~” in the prompt):
__10. Start with the ls command to list files and directories. In your terminal window, type the
following three commands and hit Enter after each. Pause after each to review your results.
hadoop fs -ls
hadoop fs -ls .
hadoop fs -ls /
The first of these lists the files in the current directory; there are none. The second is a little
more explicit, since it asks for the files in dot (“.”), a synonym for “here” (again, the current directory).
The third lists the files at the root level within HDFS (and there are eight directories).
__11. Look at the directory /user; this is where all “home” directories are kept in HDFS. The
Linux equivalent is /home. Note the spelling “/user”, which distinguishes this
directory from the /usr directory in Linux that is used for executable binary programs.
hadoop fs -ls /user
__12. Create a directory named test in your HDFS home directory:
hadoop fs -mkdir test
Check the contents of your home directory before and after the command to see that the directory is created.
You can do this by listing the contents of the home directory simply (hadoop fs -ls) or relative to
the root directory of HDFS (hadoop fs -ls /user/virtuser):
__13. Create a file in your Linux home directory for virtuser (i.e., /home/virtuser): from a command line,
execute the following commands. (Ctrl-c means to press-and-hold the Ctrl key and then press the c key.)
cd ~
cat > myfile.txt
this is some data
in my file
Ctrl-c
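The same file can also be created non-interactively, which is handy if you later want to script this step; here is a sketch using printf instead of an interactive cat session:

```shell
# Create myfile.txt without an interactive cat session.
printf 'this is some data\nin my file\n' > myfile.txt

# Verify the contents and the line count before uploading to HDFS.
cat myfile.txt
wc -l < myfile.txt    # -> 2
```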
__14. Next upload this newly created file to the test directory that you just created.
hadoop fs -put *.txt test
__15. Now list your test directory in HDFS. You can use either of the following commands:
hadoop fs -ls test
hadoop fs -ls -R .
Note the number 3 that follows the permissions. This is the replication factor for that data file.
Normally, in a cluster this is 3, but in a single-node cluster such as the one that you are
running, there might be only one copy of each block (“split”) of this file.
The value 3 (or 1, or something else) comes from an HDFS configuration setting that controls the
default number of replicas.
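You can inspect and change the replication factor from the command line as well. As a sketch (dfs.replication is the standard HDFS configuration key for the default replication factor):

```shell
# Read the configured default replication factor (typically 3; often 1 on a
# single-node cluster like this VM).
hdfs getconf -confKey dfs.replication

# Change the replication factor of an already-uploaded file to 1.
hadoop fs -setrep 1 test/myfile.txt
```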
__16. To view the contents of the uploaded file, execute
hadoop fs -cat test/myfile.txt
__17. You can pipe (using the “|” character) the output of any HDFS command to any
Linux command in the Linux shell. For example, you can easily use grep with HDFS by doing
the following.
hadoop fs -cat test/myfile.txt | grep my
Or,
hadoop fs -ls -R . | grep test
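The piping itself is ordinary shell plumbing, so you can experiment with the pattern on a local file before running it against HDFS. This sketch uses the local myfile.txt as a stand-in for hadoop fs -cat:

```shell
# Recreate the sample file locally so the pipe can be tried without a cluster.
printf 'this is some data\nin my file\n' > myfile.txt

# Same pipe shape as "hadoop fs -cat test/myfile.txt | grep my",
# but fed by a local cat.
cat myfile.txt | grep my
# -> in my file

# Count matching lines instead of printing them.
grep -c my myfile.txt
# -> 1
```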
__18. To find the size of a particular file, like myfile.txt, execute the following:
hadoop fs -du /user/virtuser/test/myfile.txt
__19. Or, get the size of all files in a directory by using a directory name rather than a file name:
hadoop fs -du /user/virtuser
__20. Or get a total file size value for all files in a directory:
hadoop fs -du -s /user/virtuser
__21. Remember that you can always use the -help parameter to get more help:
hadoop fs -help
hadoop fs -help du
1.5 Summary
Congratulations! You are now familiar with the Hadoop Distributed File System (HDFS). You now know
how to manipulate files within HDFS by using the command line.
Remember that the commands that have been illustrated here with hadoop fs can also be executed with
hdfs dfs (or other combinations of hadoop/hdfs and fs/dfs).
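For example, the following two commands are interchangeable ways to list your HDFS home directory (hdfs dfs is the form generally preferred in newer Hadoop releases):

```shell
# Both list /user/virtuser in HDFS; only the front-end command differs.
hadoop fs -ls /user/virtuser
hdfs dfs -ls /user/virtuser
```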
You may move on to the next unit.