
Apache Hadoop

Module 2
Agenda
• Installation
• Administering Hadoop
• Hive
• Pig
• Sqoop
• HBase
Installation
Administrative tool
After Hadoop has been installed successfully on the system:
Open cmd, change directory to "C:\Hadoop*\sbin", and type "start-all.cmd" to start all the Hadoop daemons.

Make sure these daemons are running (a quick check follows the list):


• Hadoop NameNode
• Hadoop DataNode
• YARN Resource Manager
• YARN Node Manager
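
A quick way to verify is the jps tool that ships with the JDK; it lists the running Java processes. On a healthy single-node setup the output looks roughly like this (the process IDs here are illustrative):
jps
1234 NameNode
2345 DataNode
3456 ResourceManager
4567 NodeManager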

SAFE MODE
Safe mode gives the datanodes time to check in with the namenode and report their blocks, so that the
filesystem can operate effectively.
When you start a newly formatted HDFS cluster, the namenode does not go into safe mode, since there are
no blocks in the system yet.
Safe Mode Commands:
• hdfs dfsadmin -safemode get: Checks whether the namenode is in safe mode or not.
• hdfs dfsadmin -safemode wait: Blocks until the namenode exits safe mode; useful when you want to wait
for safe mode to end before carrying out a command (see the sketch below).
• hdfs dfsadmin -safemode enter: Enters safe mode.
• hdfs dfsadmin -safemode leave: Leaves safe mode.
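
For example, a startup script can block until safe mode ends before loading data; a minimal sketch (the paths are assumed for illustration):
hdfs dfsadmin -safemode wait
hdfs dfs -put /local/data/input.txt /user/hadoop/input/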

A list of other commands:
https://hadoop.apache.org/docs/r2.4.1/hadoop-project-dist/hadoop-common/CommandsManual.html
HDFS Commands
▪ cat: Copies source paths to stdout.
hdfs dfs -cat hdfs://<path>/file1
▪ chmod: Changes the permissions of files. With -R, makes the change recursively through the directory
structure. The user must be the file owner or the superuser.
hdfs dfs -chmod [-R] <MODE[,MODE]… | OCTALMODE> URI [URI …]
Example: hdfs dfs -chmod 777 test/data1.txt
▪ copyFromLocal: Works similarly to the put command, except that the source is restricted to a local file
reference.
hdfs dfs -copyFromLocal input/docs/data2.txt hdfs://localhost/user/rosemary/data2.txt
▪ copyToLocal: Works similarly to the get command, except that the destination is restricted to a local file
reference.
hdfs dfs -copyToLocal data2.txt data2.copy.txt
▪ cp: Copies one or more files from a specified source to a specified destination. If you specify multiple
sources, the specified destination must be a directory.
hdfs dfs -cp /user/hadoop/file1 /user/hadoop/file2 /user/hadoop/dir
▪ du: Displays the size of the specified file, or the sizes of files and directories that are contained in the
specified directory. If you specify the -s option, displays an aggregate summary of file sizes rather than
individual file sizes. If you specify the -h option, formats the file sizes in a “human-readable” way.
hdfs dfs -du /user/hadoop/dir1 /user/hadoop/file1
▪ get: Copies files to the local file system. Files that fail a cyclic redundancy check (CRC) can still be
copied if you specify the -ignorecrc option. The CRC is a common technique for detecting data
transmission errors. CRC checksum files have the .crc extension and are used to verify the data integrity
of another file. These files are copied if you specify the -crc option.
hdfs dfs -get /user/hadoop/file3 localfile
▪ ls: Returns statistics for the specified files or directories.
hdfs dfs -ls /user/hadoop/file1
▪ mkdir: Creates directories on one or more specified paths. Its behavior is similar to the Unix mkdir -p
command, which creates all directories that lead up to the specified directory if they don’t exist
already.
hdfs dfs -mkdir /user/hadoop/dir5/temp
▪ moveFromLocal: Works similarly to the put command, except that the source is deleted after it is
copied.
Example: hdfs dfs -moveFromLocal localfile1 localfile2 /user/hadoop/hadoopdir
▪ mv: Moves one or more files from a specified source to a specified destination. If you specify multiple
sources, the specified destination must be a directory. Moving files across file systems isn’t permitted.
Example: hdfs dfs -mv /user/hadoop/file1 /user/hadoop/file2
▪ put: Copies files from the local file system to the destination file system. This command can also read
input from stdin and write to the destination file system.
Example: hdfs dfs -put localfile1 localfile2 /user/hadoop/hadoopdir; hdfs dfs -put - /user/hadoop/hadoopdir
(reads input from stdin)
▪ rm: Deletes one or more specified files. This command doesn't delete directories; to remove a directory
and its contents, use rm -r. To bypass the trash (if it's enabled) and delete the specified files
immediately, specify the -skipTrash option.
Example: hdfs dfs -rm hdfs://nn.example.com/file9
▪ test: Returns attributes of the specified file or directory. Specify -e to determine whether the file or
directory exists; -z to determine whether the file is zero length; and -d to determine whether the
URI is a directory.
hdfs dfs -test -e /user/hadoop/dir1
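
Taken together, a typical first session might look like this (the local file name and HDFS paths are assumed for illustration):
hdfs dfs -mkdir /user/hadoop/demo
hdfs dfs -put sample.txt /user/hadoop/demo
hdfs dfs -ls /user/hadoop/demo
hdfs dfs -cat /user/hadoop/demo/sample.txt
hdfs dfs -get /user/hadoop/demo/sample.txt sample.copy.txt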
Hive
Hive
▪ Hive provides a Structured Query Language (SQL)-like interface in the form of HiveQL, the Hive Query
Language.
▪ This interface translates a given query into MapReduce code.
▪ HiveQL enables users to perform tasks with the MapReduce model without explicitly writing code in
terms of map and reduce functions.
▪ The data stored in HDFS can be accessed through HiveQL, which offers the features of SQL but runs
on the MapReduce framework.
▪ It is mostly used in data-warehousing applications, where you need to perform batch processing
on huge amounts of data. Examples of this kind of data include web logs, call data records, and weather
data.
▪ Hive can be accessed in the following ways (a CLI example follows this list):
▪ Hive Command Line Interface
▪ Hive Web Interface
▪ Hive Server
▪ JDBC/ODBC
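
For instance, the Hive CLI can run a statement straight from the shell; a minimal sketch:
hive -e 'SHOW DATABASES;'
The -e flag executes a single HiveQL statement; launching hive with no arguments opens an interactive session instead.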
Hive Architecture
Data Types and Built-in Functions in Hive
Data Definition Language (DDL) Functions

DDL Commands:
1. Create
2. Alter
3. Drop
4. Show
5. Truncate
6. Delete
DDL Command:
CREATE:
CREATE DATABASE DigitalVidhya; OR
CREATE DATABASE IF NOT EXISTS DigitalVidhya;

CUSTOM LOCATION
CREATE DATABASE DigitalVidhya
LOCATION '/your/preferred/path/in/HDFS';

DATABASE PROPERTIES:
CREATE DATABASE DigitalVidhya
COMMENT 'This comment is added for Hadoop Tutorial!'
WITH DBPROPERTIES ('owner'='ManishBhagchandani');

DESCRIBE:
DESCRIBE DATABASE DigitalVidhya;
DDL Commands..
SHOW:
Ex.
SHOW DATABASES;
SHOW TABLES;

Practical Aspects:
LOAD DATA LOCAL INPATH '/home/hive/emp.csv' INTO TABLE employee;
SELECT * FROM employee;

ALTERING:
Renaming tables
Modifying columns
Deleting some columns
Changing table properties
Altering tables to add partitions
Altering storage properties
Altering database properties
Ex.
ALTER TABLE stud RENAME TO student;
ALTER TABLE student CHANGE COLUMN sname student_name STRING;
ALTER TABLE student REPLACE COLUMNS (sname STRING, grade STRING, city STRING);
ALTER TABLE student REPLACE COLUMNS (sname STRING, grade STRING);
ALTER TABLE student ADD COLUMNS (city STRING);

DROP:
Ex.
DROP TABLE IF EXISTS student;
DROP DATABASE IF EXISTS DigitalVidhya;
Select Statements
• LOAD DATA LOCAL INPATH '/home/hive/emp-gujarat.csv'
INTO TABLE employee;
• select name, salary from employee;
• select e.name, e.salary from employee e;
• select name, technology from employee;
• select symbol, `price.*` from employee; (the backticked pattern selects every column whose name matches price.*)
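
The LOAD and SELECT examples above assume an employee table already exists. A minimal sketch of one, with the column names taken from the queries above and the delimited-text layout assumed (the symbol/price query would additionally need those columns):
CREATE TABLE IF NOT EXISTS employee (
  name STRING,
  salary FLOAT,
  technology STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE;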
Pig
Pig
• Pig raises the level of abstraction for processing large datasets.
• Pig is made up of two pieces:
–The language used to express data flows, called Pig Latin
–The execution environment that runs Pig Latin programs
• A Pig Latin program is made up of a series of operations, or
transformations, that are applied to the input data to produce
output.
• Pig is a scripting language for exploring large datasets.
• Pig was designed to be extensible.
Data Types
Query Operators (combined in the sketch below):
• LOAD
• FOREACH
• FILTER
• DUMP
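
A minimal Pig Latin sketch combining these four operators (the input path and field names are assumed for illustration):
-- load comma-separated employee records from HDFS
emp = LOAD '/user/hadoop/emp.csv' USING PigStorage(',')
      AS (name:chararray, salary:float, city:chararray);
-- keep only the well-paid employees
high_paid = FILTER emp BY salary > 50000.0;
-- project just the columns of interest
names = FOREACH high_paid GENERATE name, city;
-- print the result to the console
DUMP names;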
Sqoop
Sqoop
• Sqoop is a command-line tool that runs on bash or zsh. Let us walk through the step-by-step
procedure for importing data from MySQL into HDFS via Sqoop.
• MariaDB, an open-source sister branch of MySQL, can be used instead of MySQL; the
commands are the same in MariaDB as in MySQL.
• Practical Aspects (sketched after this list):
– Create database
– Use database
– Create table
– Insert rows
– View table
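
A minimal sketch of these steps; every name and value below (the company database, the employee table, the sample row, the credentials, and the target directory) is assumed for illustration:
-- in the MySQL/MariaDB client
CREATE DATABASE company;
USE company;
CREATE TABLE employee (id INT PRIMARY KEY, name VARCHAR(50), salary FLOAT);
INSERT INTO employee VALUES (1, 'Asha', 50000.0);
SELECT * FROM employee;
Back in the shell, the table can then be imported into HDFS:
sqoop import --connect jdbc:mysql://localhost/company \
  --username root -P \
  --table employee --target-dir /user/hadoop/employee -m 1
The -P flag prompts for the password, and -m 1 uses a single mapper since the sample table is tiny.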
HBase
Thank you
