Aim: To execute basic commands of Hadoop ecosystem components like Hive, Hbase and Sqoop.
Lab Outcome : To program applications using tools like Hive, Pig, NoSQL databases, and MongoDB for Big Data applications.
Program formation / Execution / Ethical practices (07) | Documentation (02) | Timely Submission (03) | Viva Answer (03) | Experiment Marks (15) | Teacher Signature with date
EXPERIMENT NO : 2
AIM : To execute basic commands of Hadoop ecosystem components like Hive, Hbase and Sqoop.
THEORY :
1) HIVE : The Hadoop ecosystem component, Apache Hive, is an open source data warehouse system for querying and analyzing large datasets stored in Hadoop
files. Hive performs three main functions: data summarization, query, and analysis. Hive uses a language called HiveQL (HQL), which is similar to SQL. HiveQL automatically
translates SQL-like queries into MapReduce jobs which execute on Hadoop.
Create Database Statement - Create Database is a statement used to create a database in Hive. A database in Hive is a namespace or a collection of tables.
Drop Database Statement - Drop Database is a statement that drops all the tables and deletes the database.
Create Table Statement - Create Table is a statement used to create a table in Hive. Eg: CREATE TABLE IF NOT EXISTS BE5students (NAME STRING, ROLLNO INT);
Alter Table Statement - It is used to alter a table in Hive. Eg: ALTER TABLE BE5students ADD COLUMNS (REGNO INT);
Drop Table Statement - It is used to drop the table.
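Putting the statements above together, a minimal HiveQL session might look like the following sketch (the database name 'college' is an illustrative assumption; the table follows the BE5students example):

```sql
-- Create and switch to a database ('college' is an illustrative name)
CREATE DATABASE IF NOT EXISTS college;
USE college;

-- Create a table with two columns
CREATE TABLE IF NOT EXISTS BE5students (NAME STRING, ROLLNO INT);

-- Add a new column to the existing table
ALTER TABLE BE5students ADD COLUMNS (REGNO INT);

-- Drop the table, then the database
DROP TABLE IF EXISTS BE5students;
DROP DATABASE IF EXISTS college;
```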
Data can be inserted into Hive tables in multiple ways, as follows:
a) Load data from a file/directory:
LOAD DATA INPATH '/home/hadoop/marks.csv' INTO TABLE BE5students; (It will append)
LOAD DATA INPATH '/home/hadoop/marks.csv' OVERWRITE INTO TABLE BE5students; (It will overwrite)
b) Insert the result of a query on another table:
INSERT INTO TABLE BE5students SELECT id, name, age, salary FROM BE5students_old;
c) Directly insert values.
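The three insert methods above can be sketched in one session; the direct-insert example in c) is an illustrative assumption (literal values chosen to match the BE5students schema; INSERT ... VALUES requires Hive 0.14 or later):

```sql
-- a) Load from a file in HDFS (appends; add OVERWRITE to replace existing data)
LOAD DATA INPATH '/home/hadoop/marks.csv' INTO TABLE BE5students;

-- b) Insert the result of a query on another table
INSERT INTO TABLE BE5students SELECT name, rollno FROM BE5students_old;

-- c) Insert literal values directly (values are illustrative)
INSERT INTO TABLE BE5students VALUES ('raju', 101);
```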
2) HBASE : Apache HBase is a Hadoop ecosystem component which is a distributed database that was designed to store structured data in tables that could have
billions of rows and millions of columns. HBase is a scalable, distributed, and NoSQL database that is built on top of HDFS. HBase provides real-time access to read or write
data in HDFS.
version - This command returns the version of HBase used in your system. Its syntax is as follows :
hbase(main):010:0> version
table_help - This command provides guidance on what the table-referenced commands are and how to use them. Given below is the syntax to use this command :
hbase(main):002:0> table_help
whoami - This command returns the user details of HBase. Executing it returns the current HBase user as shown below.
hbase(main):008:0>whoami
Creating a Table using HBase Shell - You can create a table using the create command, here you must specify the table name and the Column Family name.
Inserting Data using HBase Shell - Using the put command, you can insert rows into a table.
hbase(main):005:0> put 'emp','1','personal data:name','raju'
hbase(main):006:0> put 'emp','1','personal data:city','hyderabad'
Reading Data using HBase Shell - The get command and the get() method of the HTable class are used to read data from a table in HBase. Using the get command, you can read a single row of data at a time.
Deleting a Specific Cell in a Table - Using the delete command, you can delete a specific cell in a table.
Dropping a Table using HBase Shell - Using the drop command, you can delete a table. Before dropping a table, you have to disable it.
hbase(main):019:0> drop 'emp'
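The HBase shell commands above can be chained into one session on the 'emp' example table (the column family 'personal data' follows the put examples; prompt numbers are illustrative):

```
hbase(main):001:0> create 'emp', 'personal data'
hbase(main):002:0> put 'emp', '1', 'personal data:name', 'raju'
hbase(main):003:0> put 'emp', '1', 'personal data:city', 'hyderabad'
hbase(main):004:0> get 'emp', '1'
hbase(main):005:0> scan 'emp'
hbase(main):006:0> delete 'emp', '1', 'personal data:city'
hbase(main):007:0> disable 'emp'
hbase(main):008:0> drop 'emp'
```

Note that a table must be disabled before it can be dropped, as stated above.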
3) SQOOP : Apache Sqoop imports data from external sources into related Hadoop ecosystem components like HDFS, HBase or Hive. It also exports data from
Hadoop to other external sources. Sqoop works with relational databases such as Teradata, Netezza, Oracle, and MySQL.
a) IMPORT - It is used to import individual tables from an RDBMS into HDFS. Each row in a table is treated as a record in HDFS, stored either as text data in text files or as binary data in Avro and Sequence files.
$ sqoop-import (generic-args)(import-args)
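Filling in the generic and import arguments, an import might look like the following sketch (the JDBC URL, username, database, and directory are illustrative assumptions):

```shell
# Import the BE5students table from a MySQL database into HDFS
# (connection string, credentials, and paths are assumed for illustration)
$ sqoop import \
    --connect jdbc:mysql://localhost/collegedb \
    --username root \
    --table BE5students \
    --target-dir /user/hadoop/BE5students \
    -m 1
```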
b) EXPORT - It is used to export data back from HDFS to the RDBMS database. The target table must already exist in the target database. The files given as
input to Sqoop contain records, which become rows in the table. These are read and parsed into a set of records, delimited with a user-specified delimiter. The
default operation is to insert all the records from the input files into the database table using the INSERT statement. In update mode, Sqoop generates UPDATE statements that replace existing records in the database.
$ sqoop-export (generic-args)(export-args)
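A concrete export command might look like the sketch below (the JDBC URL, username, and export directory are illustrative assumptions; the field delimiter matches a comma-separated input file):

```shell
# Export records from HDFS back into an existing MySQL table
# (connection string, credentials, and paths are assumed for illustration)
$ sqoop export \
    --connect jdbc:mysql://localhost/collegedb \
    --username root \
    --table BE5students \
    --export-dir /user/hadoop/BE5students \
    --input-fields-terminated-by ','
```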
CONCLUSION: The Hadoop ecosystem is a platform or suite which provides various services to solve big data problems. In this experiment we studied
Hive, HBase, and Sqoop, and performed some basic operations on each of them.
OUTPUT :