
Hive Query Language

DATABASE COMMANDS
Create Database
Syntax:
CREATE DATABASE IF NOT EXISTS STUDENTS COMMENT 'student details' WITH DBPROPERTIES ('creator'='JOHN');

Objective: Creates a database, which acts as a namespace for a collection of tables


SHOW DATABASES
Syntax:
• SHOW DATABASES;

• Objective: To display a list of all databases


DESCRIBE DATABASE
Syntax:
• DESCRIBE DATABASE STUDENTS;

• Objective: Shows the database name, comment, and database directory

ALTER DATABASE
Syntax:
• ALTER DATABASE STUDENTS SET DBPROPERTIES ('edited by'='JAMES');

• Objective : To alter the database properties


USE
Syntax:
• USE STUDENTS;

• Objective: To make STUDENTS the current working database


DROP DATABASE
Syntax:
• DROP DATABASE STUDENTS;

• Objective : To destroy the database


COMMANDS FOR TABLES
• Hive provides two kinds of tables
• Managed Table
• External Table

• Managed tables are Hive-owned tables where the entire lifecycle of the table's data is managed
and controlled by Hive.
• External tables are tables where Hive has loose coupling with the data.
• All the write operations to the Managed tables are performed using Hive SQL commands.
• The writes on External tables can be performed using Hive SQL commands but data files can also be
accessed and managed by processes outside of Hive.
• If an External table or partition is dropped, only the metadata associated with the table or partition
is deleted; the underlying data files stay intact. A typical use of an External table is to run
analytical queries on HBase via Hive, where the data files are written by HBase and Hive reads them
for analytics.
CREATE TABLE
• SYNTAX
• CREATE TABLE IF NOT EXISTS STUD(rollno INT, name STRING, gpa FLOAT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

• Objective : to create a managed table


DESCRIBE
SYNTAX
• DESCRIBE STUD;

• Objective: Gives details such as column names and data types


Creating External Table
• CREATE EXTERNAL TABLE IF NOT EXISTS EXT_STUD(rollno INT, name
STRING, gpa FLOAT) ROW FORMAT DELIMITED FIELDS TERMINATED
BY '\t' LOCATION '/STUDENT_INFO';
LOADING DATA INTO TABLE FROM FILE
• LOAD DATA LOCAL INPATH '/root/hivedemos/student.tsv' OVERWRITE
INTO TABLE EXT_STUD;

• OBJECTIVE:
• The file at the local path is written into the table
• Omit the LOCAL keyword if the input file is to be fetched from HDFS
Working with Collection datatypes
CREATE TABLE STUDENT_INFO (rollno INT, name STRING, subject ARRAY<STRING>,
marks MAP<STRING, INT>)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
COLLECTION ITEMS TERMINATED BY ':'
MAP KEYS TERMINATED BY '!';

LOAD DATA LOCAL INPATH '/root/hivedemos/studentinfo.csv' INTO TABLE STUDENT_INFO;

Input format:
1001,John,Smith:Jones,Mark1!45:Mark2!46:Mark3!43
1002,Aby,Smith:Jones,Mark1!65:Mark2!96:Mark3!93
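
The way the three delimiters carve up one input line can be sketched in Python. This is only an illustration of the splitting order (fields, then collection items, then map keys); Hive does this internally, and the function name `parse_student_line` is hypothetical. The sample line is assumed to have no stray spaces around the delimiters.

```python
# Illustrative only: mimics how the delimiters declared for STUDENT_INFO
# split one input line. This is not Hive code.
FIELD_DELIM = ","       # FIELDS TERMINATED BY ','
ITEM_DELIM = ":"        # COLLECTION ITEMS TERMINATED BY ':'
MAP_KEY_DELIM = "!"     # MAP KEYS TERMINATED BY '!'


def parse_student_line(line):
    """Split one text line into (rollno, name, subject, marks)."""
    rollno, name, subject_raw, marks_raw = line.split(FIELD_DELIM)
    subject = subject_raw.split(ITEM_DELIM)            # ARRAY<STRING>
    marks = {}
    for entry in marks_raw.split(ITEM_DELIM):          # MAP<STRING, INT>
        key, value = entry.split(MAP_KEY_DELIM)
        marks[key] = int(value)
    return int(rollno), name, subject, marks


row = parse_student_line("1001,John,Smith:Jones,Mark1!45:Mark2!46:Mark3!43")
print(row[2][0])         # first array element, like indexing the array with [0]
print(row[3]["Mark1"])   # map lookup by key, like indexing the map with 'Mark1'
```

The array is indexed by position and the map by key, which is the same access pattern the collection queries use.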
QUERY TABLES
• SELECT * FROM EXT_STUD;
• SELECT NAME, GPA FROM EXT_STUD;
• SELECT NAME, SUBJECT FROM STUDENT_INFO;
• SELECT NAME, MARKS['Mark1'] FROM STUDENT_INFO;
• SELECT NAME, SUBJECT[0] FROM STUDENT_INFO;
Partitions
• Without partitions, Hive reads the entire dataset even when a WHERE clause is specified
• This makes queries I/O-heavy, so partitions are used to limit the data scanned
• Partitions split data into meaningful chunks
• Static partitions: the partition column values are known at compile time
• Dynamic partitions: the partition column values are known only at execution time
Static Partitions
• CREATE TABLE IF NOT EXISTS STATIC_PART_STUDENT(rollno INT, name
STRING) PARTITIONED BY (gpa FLOAT) ROW FORMAT DELIMITED FIELDS
TERMINATED BY '\t';
• (created a partitioned table)

• INSERT OVERWRITE TABLE STATIC_PART_STUDENT
PARTITION(gpa=4.0) SELECT rollno, name FROM STUDENT WHERE
gpa=4.0;
• (loaded data into the partition from the original table)
Static Partitions continued….
• ALTER TABLE STATIC_PART_STUDENT ADD PARTITION(gpa=3.5);
• (added another partition to the same table)

• INSERT OVERWRITE TABLE STATIC_PART_STUDENT
PARTITION(gpa=3.5) SELECT rollno, name FROM STUDENT WHERE
gpa=3.5;
• (loaded data into the new partition from the original table)
Dynamic Partitions
• CREATE TABLE DYNAMIC_PART_STUD(ROLLNO INT, NAME STRING)
PARTITIONED BY (GPA FLOAT) ROW FORMAT DELIMITED FIELDS
TERMINATED BY '\t';
• (created a partitioned table for dynamic use)
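• SET hive.exec.dynamic.partition=true;
• SET hive.exec.dynamic.partition.mode=nonstrict;
• (nonstrict mode is needed where strict mode is in effect, since every partition column here is dynamic)

• INSERT OVERWRITE TABLE DYNAMIC_PART_STUD PARTITION (gpa)
SELECT rollno, name, gpa FROM STUDENT;
• (will create a partition for every distinct gpa value in STUDENT)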
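
What a dynamic-partition insert does to the data layout can be sketched in Python: rows are grouped by the value of the partition column, and each distinct value gets its own directory. The rows and the function name `dynamic_partition` below are hypothetical, and the directory names only imitate the `column=value` convention.

```python
from collections import defaultdict

# Hypothetical rows standing in for the STUDENT table: (rollno, name, gpa).
students = [
    (1001, "John", 4.0),
    (1002, "Aby", 3.5),
    (1003, "Mary", 4.0),
]


def dynamic_partition(rows):
    """Group rows by gpa; each key models one partition directory."""
    partitions = defaultdict(list)
    for rollno, name, gpa in rows:
        # The partition column is encoded in the directory name,
        # so only the remaining columns are kept in the data rows.
        partitions[f"gpa={gpa}"].append((rollno, name))
    return dict(partitions)


layout = dynamic_partition(students)
for directory, rows in sorted(layout.items()):
    print(directory, rows)
```

A query that filters on gpa then only needs to read the matching directory instead of the whole dataset.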
Disadvantages of Partitions
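• A partition is created for each distinct value of the partition column, so a column with many distinct values produces a very large number of partitions
• Hence bucketing is preferred in such cases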
Bucketing
• You can limit the number of buckets
• A bucket is stored as a file in Hive whereas a partition is a directory

Assuming table STUDENT already exists...

SET hive.enforce.bucketing=true;
CREATE TABLE IF NOT EXISTS STUD_BUCKET(rollno INT, name STRING, grade
FLOAT) CLUSTERED BY (grade) INTO 3 BUCKETS;
FROM STUDENT INSERT OVERWRITE TABLE STUD_BUCKET SELECT ROLLNO,
NAME, GRADE;
Bucketing continues…
• Display content of first bucket
• SELECT DISTINCT GRADE FROM STUD_BUCKET TABLESAMPLE(BUCKET 1
OUT OF 3 ON GRADE);
• Outputs the unique values for GRADE in bucket1
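
The idea behind bucketing, a hash of the clustering column taken modulo a fixed bucket count, can be sketched in Python. The hash used here (the 32-bit float bit pattern) is a stand-in chosen for illustration and is not claimed to be Hive's exact function; `bucket_for` and the sample grades are hypothetical.

```python
import struct

NUM_BUCKETS = 3  # matches the three buckets in the example table


def bucket_for(grade, num_buckets=NUM_BUCKETS):
    """Map a grade to a bucket index in [0, num_buckets).

    Stand-in hash: the 32-bit IEEE 754 bit pattern of the value,
    masked to be non-negative, then reduced modulo the bucket count.
    """
    (bits,) = struct.unpack(">I", struct.pack(">f", grade))
    return (bits & 0x7FFFFFFF) % num_buckets


grades = [4.0, 3.5, 4.0, 2.0, 3.5]
buckets = {}
for g in grades:
    buckets.setdefault(bucket_for(g), set()).add(g)
print(buckets)
```

However many distinct grades appear, the number of buckets stays fixed, and equal grades always land in the same bucket. That is the contrast with partitions, where each distinct value adds a directory.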
VIEWS
• Purely logical objects
• CREATE VIEW V1 AS SELECT rollno, name FROM STUDENT;

• SELECT * FROM V1;


• SELECT * FROM V1 LIMIT 4;

• DROP VIEW V1;


AGGREGATION
• SELECT AVG(GPA) FROM STUD;
• SELECT COUNT(*) FROM STUD;
• SELECT COUNT(DISTINCT GPA) FROM STUD;
GROUP BY & HAVING
• SELECT COUNT(*),GPA FROM STUDENT GROUP BY GPA;

• SELECT COUNT(*),GPA FROM STUDENT GROUP BY GPA HAVING GPA>6.0;
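
The effect of GROUP BY with and without HAVING can be tried locally with a small sketch. Python's built-in `sqlite3` stands in for Hive here because these two statements are plain SQL; the table contents are made up, and an ORDER BY is added only to make the printed order predictable.

```python
import sqlite3

# sqlite3 stands in for Hive here; the rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE STUDENT (rollno INTEGER, name TEXT, gpa REAL)")
conn.executemany(
    "INSERT INTO STUDENT VALUES (?, ?, ?)",
    [(1, "John", 7.5), (2, "Aby", 7.5), (3, "Mary", 5.0), (4, "Anu", 9.0)],
)

# One output row per distinct gpa, with the number of students in each group.
grouped = conn.execute(
    "SELECT COUNT(*), gpa FROM STUDENT GROUP BY gpa ORDER BY gpa"
).fetchall()

# HAVING filters the groups after they are formed.
filtered = conn.execute(
    "SELECT COUNT(*), gpa FROM STUDENT GROUP BY gpa HAVING gpa > 6.0 ORDER BY gpa"
).fetchall()

print(grouped)
print(filtered)
```

WHERE would filter individual rows before grouping, whereas HAVING filters the groups themselves.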
Online SQL editor for practice
• Online SQL Editor (programiz.com)
JOINS
• STUD(ROLLNO int, NAME string, GPA float)
• DEPT(ROLLNO int, DEPTNAME string)

• SELECT STUD.rollno, STUD.name, STUD.gpa, DEPT.deptname FROM STUD JOIN
DEPT ON STUD.ROLLNO=DEPT.ROLLNO;

• SELECT a.rollno, a.name, a.gpa, b.deptname FROM STUD a JOIN DEPT b ON
a.ROLLNO=b.ROLLNO;
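
A minimal sketch of the inner join on the common ROLLNO column, again using Python's `sqlite3` as a stand-in for Hive. The rows are hypothetical, and the department column is taken to be DEPTNAME, as in the DEPT schema listed above.

```python
import sqlite3

# sqlite3 stands in for Hive here; the rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE STUD (rollno INTEGER, name TEXT, gpa REAL)")
conn.execute("CREATE TABLE DEPT (rollno INTEGER, deptname TEXT)")
conn.executemany(
    "INSERT INTO STUD VALUES (?, ?, ?)",
    [(1001, "John", 3.5), (1002, "Aby", 4.0), (1003, "Mary", 3.0)],
)
conn.executemany(
    "INSERT INTO DEPT VALUES (?, ?)",
    [(1001, "CSE"), (1002, "ECE")],
)

# Inner join: only rollno values present in both tables appear.
joined = conn.execute(
    "SELECT a.rollno, a.name, a.gpa, b.deptname "
    "FROM STUD a JOIN DEPT b ON a.rollno = b.rollno "
    "ORDER BY a.rollno"
).fetchall()
print(joined)
```

Roll number 1003 has no matching DEPT row, so it does not appear in the result.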
SERDE
• SerDe stands for Serializer/Deserializer

1. Contains the logic to convert unstructured data into records
2. Implemented using Java
3. Serializers are used at the time of writing
4. Deserializers are used at query time (SELECT statement)
Manipulate XML data
<employee><empid>1001</empid><name>John</name><designation>Team Lead</designation></employee>
<employee><empid>1002</empid><name>Anu</name><designation>Developer</designation></employee>
CREATE TABLE XMLSAMPLE(xmldata string);
LOAD DATA LOCAL INPATH '/root/hivedemos/input.xml' INTO TABLE XMLSAMPLE;
CREATE TABLE xpath_table AS SELECT xpath_int(xmldata, 'employee/empid'),
xpath_string(xmldata, 'employee/name'),
xpath_string(xmldata, 'employee/designation') FROM xmlsample;
SELECT * FROM xpath_table;
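
What the XPath extraction produces can be sketched with Python's standard `xml.etree.ElementTree`. This is an analogy, not Hive code: each list entry models one row of the single-column XMLSAMPLE table, and the function name `extract` is hypothetical.

```python
import xml.etree.ElementTree as ET

# One <employee> record per row, as in the single-column XMLSAMPLE table.
xml_rows = [
    "<employee><empid>1001</empid><name>John</name>"
    "<designation>Team Lead</designation></employee>",
    "<employee><empid>1002</empid><name>Anu</name>"
    "<designation>Developer</designation></employee>",
]


def extract(xmldata):
    """Return (empid, name, designation) from one employee record."""
    root = ET.fromstring(xmldata)      # root element is <employee>
    return (
        int(root.findtext("empid")),   # integer value of employee/empid
        root.findtext("name"),         # text of employee/name
        root.findtext("designation"),  # text of employee/designation
    )


xpath_table = [extract(row) for row in xml_rows]
print(xpath_table)
```

Each XML string becomes one structured row with three columns, which is the shape of the table built by the CREATE TABLE ... AS SELECT statement.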
Match the following
1. HQL                   A. Serializer/Deserializer
2. Database              B. Logical splits
3. Complex Data Types    C. Combine data from multiple tables based on common column
4. Table                 D. Set of records
5. Joins                 E. Namespace
6. SERDE                 F. Hive Query Language
7. Bucketing             G. struct, map
8. Partition             H. Physical splits based on indexing or hashing
Short Answer Questions
1. What does the metastore contain?
2. __________ is responsible for compilation, optimization and execution
of Hive queries
3. Hive is a NoSQL language. Comment
4. The results of a Hive query can be stored as
a. Local file
b. HDFS file
c. Both
d. No such options
5. What is the disadvantage of using too many partitions in Hive tables?
a. It slows down the namenode
b. Storage space is wasted
c. Joins become slow
d. All the above

6. Main advantage of creating a table partition
a. Effective storage memory utilization
b. Faster query performance
c. Less RAM required by namenode
d. Simpler query syntax
7. By default when a database is dropped in Hive
a. All tables in it are destroyed along with the database
b. The database is deleted only if there are no tables
c. None of the above
