Kashif Saeed
Lecture Outline
RDBMS vs. Hadoop
• RDBMS
  - Fast query response time
  - Support for transactions
  - Capability to modify existing records
  - Capability to serve thousands of simultaneous clients
  - RDBMS requires ‘schema on write’: the data structure has to be defined before data is written
• Hadoop is NOT an RDBMS
  - Query response time is generally slower than an RDBMS
  - No support for transactions
  - Cannot modify existing records; can only append or overwrite
  - Does not enforce referential integrity
  - Hadoop uses ‘schema on read’: data can be written before its structure is defined
Hadoop is used for building a data lake, so it does not have to support transaction processing or the ability to modify records.
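The schema-on-write versus schema-on-read contrast can be sketched in a few lines of Python. This is a toy illustration, not how either system is implemented: the sample rows and the function names load_schema_on_write and read_schema_on_read are hypothetical. The write-side function type-checks every row before keeping anything, so one bad value rejects the whole load; the read-side function keeps the raw text and applies the types only when it is read, turning a value that does not fit into None, much as Hive and Impala return NULL for such a field in a text table.

```python
import csv
import io

# Hypothetical raw file contents: the second row has a malformed amount.
RAW_SALES = "1001,2024-01-05,19.99\n1002,2024-01-06,not-a-number\n"


def load_schema_on_write(raw_text):
    """RDBMS-style (toy): parse and type-check every row before anything is stored.

    A single bad value raises ValueError and nothing is loaded.
    """
    return [
        (int(order_id), order_date, float(amount))
        for order_id, order_date, amount in csv.reader(io.StringIO(raw_text))
    ]


def read_schema_on_read(raw_text):
    """Hadoop-style (toy): the text was stored as-is; types are applied at read time.

    A field that does not fit its declared type becomes None (NULL in Hive/Impala).
    """
    rows = []
    for order_id, order_date, amount in csv.reader(io.StringIO(raw_text)):
        try:
            typed_amount = float(amount)
        except ValueError:
            typed_amount = None
        rows.append((int(order_id), order_date, typed_amount))
    return rows


if __name__ == "__main__":
    print(read_schema_on_read(RAW_SALES))
    try:
        load_schema_on_write(RAW_SALES)
    except ValueError as err:
        print("rejected on write:", err)
```

The same bytes are accepted or rejected depending on when the schema is enforced, which is why Hadoop can ingest data before anyone has decided how to query it.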
Introduction to Impala and Hive
Difference between Impala and Hive
Apache has another open-source query engine called Drill, which was inspired by Google's Dremel project. Drill is not part of the Cloudera distribution, nor is it supported by Cloudera.
How Impala/Hive Load & Store Data
Impala and Hive share the same Metastore and the same data files.
Impala and Hive Commands
Creating a Database
• Adds the database definition to the Metastore
• Creates a storage directory for the database in HDFS
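A minimal sketch of the statement involved, written as a small Python helper so it can be checked; the helper names and the database name sales_db are hypothetical. The statement text itself is ordinary HiveQL that also runs in impala-shell. The directory helper assumes the default warehouse location /user/hive/warehouse (the default value of hive.metastore.warehouse.dir); many clusters configure a different path.

```python
# Assumption: default warehouse directory (hive.metastore.warehouse.dir).
WAREHOUSE_DIR = "/user/hive/warehouse"


def create_database_sql(name, if_not_exists=True):
    """Hypothetical helper: return the CREATE DATABASE statement for `name`.

    IF NOT EXISTS makes the statement a no-op when the database is already defined.
    """
    clause = "IF NOT EXISTS " if if_not_exists else ""
    return f"CREATE DATABASE {clause}{name};"


def default_database_dir(name, warehouse=WAREHOUSE_DIR):
    """Hypothetical helper: HDFS directory created for a database when no LOCATION is given."""
    return f"{warehouse}/{name}.db"


if __name__ == "__main__":
    print(create_database_sql("sales_db"))
    print(default_database_dir("sales_db"))
```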
Removing a Database
• The DROP DATABASE command is used to remove a database
• Only works if the database does not contain any tables
• Use CASCADE to remove a database that still contains tables
  - CASCADE works in Hive; Impala supports it only in release 2.3 and higher
  - Removes the data of the database's managed tables from HDFS (data of EXTERNAL tables is left in place)
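The rules above can be modelled with a toy, in-memory stand-in for the Metastore. ToyMetastore and its methods are hypothetical and only mimic the behaviour described on the slide: a plain DROP DATABASE fails while tables exist, and adding CASCADE drops the tables too. The returned strings are the HiveQL you would actually run.

```python
class ToyMetastore:
    """Hypothetical in-memory model for illustration: database name -> set of table names."""

    def __init__(self):
        self.databases = {}

    def create_database(self, name):
        self.databases.setdefault(name, set())

    def create_table(self, database, table):
        self.databases[database].add(table)

    def drop_database(self, name, cascade=False):
        """Mimic DROP DATABASE: refuse a non-empty database unless cascade=True."""
        if self.databases[name] and not cascade:
            raise RuntimeError(f"database {name} is not empty; use CASCADE")
        # With CASCADE the tables are dropped as well; in a real cluster the
        # data of managed tables would also be deleted from HDFS.
        del self.databases[name]
        return f"DROP DATABASE {name}{' CASCADE' if cascade else ''};"


if __name__ == "__main__":
    store = ToyMetastore()
    store.create_database("sales_db")
    store.create_table("sales_db", "sales")
    print(store.drop_database("sales_db", cascade=True))
```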
Data Types
Creating a Table
Creating a Table (continued)
Example: Table Creation
• Notice that STORED AS TEXTFILE is omitted in this example because text is the default format
• This table is created in the database currently in use
• To create the table in a specific database, either run the USE databasename; command first OR fully qualify the table name at creation time as databasename.tablename
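A sketch of such a statement, produced by a hypothetical Python helper; the table, column, and database names are made up. It leaves out STORED AS TEXTFILE (the default) and shows both placement options: an unqualified name lands in the database currently in use, a qualified name in the named database.

```python
def create_table_sql(table, columns, database=None, delimiter=","):
    """Hypothetical helper: build CREATE TABLE for a delimited text table.

    STORED AS TEXTFILE is left out because text is the default format.
    If `database` is given, the name is qualified as database.table;
    otherwise the table goes into the database currently in use.
    """
    name = f"{database}.{table}" if database else table
    cols = ",\n".join(f"  {col} {sql_type}" for col, sql_type in columns)
    return (
        f"CREATE TABLE {name} (\n{cols}\n)\n"
        f"ROW FORMAT DELIMITED FIELDS TERMINATED BY '{delimiter}';"
    )


if __name__ == "__main__":
    print(create_table_sql("sales", [("order_id", "INT"), ("amount", "DOUBLE")],
                           database="sales_db"))
```

Qualifying the name avoids depending on whichever database a previous USE statement left active.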
Creating Tables based on Existing Table
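CREATE TABLE ... LIKE copies an existing table's definition without copying any of its rows. The helper below is hypothetical and only assembles the statement; the table names are illustrative.

```python
def create_table_like_sql(new_table, existing_table, if_not_exists=False):
    """Hypothetical helper: copy the definition (not the data) of existing_table."""
    clause = "IF NOT EXISTS " if if_not_exists else ""
    return f"CREATE TABLE {clause}{new_table} LIKE {existing_table};"


if __name__ == "__main__":
    print(create_table_like_sql("sales_archive", "sales"))
```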
Creating Tables based on SELECT
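CREATE TABLE ... AS SELECT (often called CTAS) creates a table and fills it with a query's result in one statement; the new table's columns come from the SELECT list. A minimal sketch with a hypothetical helper and made-up table names:

```python
def create_table_as_select_sql(new_table, source_table, columns=("*",), where=None):
    """Hypothetical helper: build a CTAS statement, optionally filtered by `where`."""
    select_list = ", ".join(columns)
    sql = f"CREATE TABLE {new_table} AS SELECT {select_list} FROM {source_table}"
    if where:
        sql += f" WHERE {where}"
    return sql + ";"


if __name__ == "__main__":
    print(create_table_as_select_sql("big_sales", "sales",
                                     ("order_id", "amount"), "amount > 100"))
```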
Controlling Table Data Location
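By default a table's files live under the warehouse directory; a LOCATION clause points the table at another HDFS directory instead. A hedged sketch with a hypothetical helper, table name, and path (LOCATION comes after the ROW FORMAT clause):

```python
def create_table_with_location_sql(table, columns, hdfs_dir, delimiter=","):
    """Hypothetical helper: CREATE TABLE whose data lives in `hdfs_dir`."""
    cols = ", ".join(f"{name} {sql_type}" for name, sql_type in columns)
    return (
        f"CREATE TABLE {table} ({cols}) "
        f"ROW FORMAT DELIMITED FIELDS TERMINATED BY '{delimiter}' "
        f"LOCATION '{hdfs_dir}';"
    )


if __name__ == "__main__":
    print(create_table_with_location_sql("sales", [("order_id", "INT")], "/data/sales"))
```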
EXTERNAL Tables
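A sketch of the difference between managed and EXTERNAL tables: for an EXTERNAL table, DROP TABLE removes only the Metastore definition and leaves the files in place, whereas dropping a managed table also deletes its data directory. The helper names, table name, and path are hypothetical, and the second function just encodes that rule rather than calling HDFS.

```python
def location_table_sql(table, columns, location, external=False):
    """Hypothetical helper: CREATE [EXTERNAL] TABLE ... LOCATION for a comma-delimited table."""
    kind = "EXTERNAL TABLE" if external else "TABLE"
    cols = ", ".join(f"{name} {sql_type}" for name, sql_type in columns)
    return (
        f"CREATE {kind} {table} ({cols}) "
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' "
        f"LOCATION '{location}';"
    )


def hdfs_paths_deleted_by_drop(location, external):
    """Toy rule: DROP TABLE deletes nothing in HDFS for EXTERNAL tables,
    and the data directory for managed tables."""
    return [] if external else [location]


if __name__ == "__main__":
    print(location_table_sql("raw_sales", [("order_id", "INT")], "/data/raw_sales", external=True))
```

EXTERNAL tables suit data that other tools also read or that must survive a dropped table definition.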
Exploring Tables
• DESCRIBE FORMATTED - shows the table properties
• SHOW CREATE TABLE - displays the SQL command used to create the table
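The inspection statements side by side, gathered by a hypothetical helper; the table name is illustrative. Plain DESCRIBE is included for comparison because it lists only the columns.

```python
def explore_table_sql(table):
    """Hypothetical helper: statements for inspecting a table, in increasing detail."""
    return [
        f"DESCRIBE {table};",            # column names and types only
        f"DESCRIBE FORMATTED {table};",  # adds location, table type and other properties
        f"SHOW CREATE TABLE {table};",   # the DDL that would recreate the table
    ]


if __name__ == "__main__":
    for statement in explore_table_sql("sales"):
        print(statement)
```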
Loading Data into Tables
Loading Data from HDFS files
• You can add a file to the table's directory using the hdfs dfs command
• The example below loads data into the sales table
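A sketch of the two usual ways to get a file into a table's directory, assuming the default warehouse path /user/hive/warehouse/sales for the sales table (DESCRIBE FORMATTED shows the real one). The helper names and file paths are hypothetical; the functions only build the command and statements and do not run them. LOAD DATA INPATH moves a file that is already in HDFS into the table directory rather than copying it. Impala caches table metadata, so after files are added outside Impala (with hdfs dfs, or by a LOAD DATA run in Hive) a REFRESH of the table is needed before Impala sees them.

```python
def hdfs_put_command(local_file, table_dir):
    """Hypothetical helper: command that copies a local file into the table's HDFS directory."""
    return ["hdfs", "dfs", "-put", local_file, table_dir]


def load_data_sql(hdfs_file, table):
    """Hypothetical helper: statement that moves an HDFS file into the table directory."""
    return f"LOAD DATA INPATH '{hdfs_file}' INTO TABLE {table};"


def impala_refresh_sql(table):
    """Hypothetical helper: make Impala pick up files added outside Impala."""
    return f"REFRESH {table};"


if __name__ == "__main__":
    print(" ".join(hdfs_put_command("sales.txt", "/user/hive/warehouse/sales/")))
    print(load_data_sql("/tmp/sales.txt", "sales"))
    print(impala_refresh_sql("sales"))
```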
Overwriting Data from Files
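Adding the OVERWRITE keyword to LOAD DATA replaces the files already in the table directory instead of adding to them. A minimal sketch with a hypothetical helper and made-up paths:

```python
def load_data_sql(hdfs_file, table, overwrite=False):
    """Hypothetical helper: LOAD DATA statement; OVERWRITE replaces the table's existing files."""
    mode = "OVERWRITE " if overwrite else ""
    return f"LOAD DATA INPATH '{hdfs_file}' {mode}INTO TABLE {table};"


if __name__ == "__main__":
    print(load_data_sql("/tmp/sales_2024.txt", "sales", overwrite=True))
```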
Appending to a Table
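Appending the result of a query uses INSERT INTO, while INSERT OVERWRITE replaces the table's contents; both forms accept the TABLE keyword in Hive and Impala. A sketch with a hypothetical helper and table names:

```python
def insert_select_sql(target_table, select_query, overwrite=False):
    """Hypothetical helper: INSERT INTO appends the query's rows; INSERT OVERWRITE replaces them."""
    verb = "OVERWRITE" if overwrite else "INTO"
    return f"INSERT {verb} TABLE {target_table} {select_query};"


if __name__ == "__main__":
    print(insert_select_sql("sales", "SELECT * FROM new_sales"))
```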
Loading data from a Relational Database
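The slide does not name a tool; assuming Apache Sqoop, the import tool commonly used with CDH, a command that copies one relational table into HDFS can be assembled as below. The JDBC URL, user, table, and directory are hypothetical, and the helper only builds the argument list without running it. -P makes Sqoop prompt for the password, and --hive-import additionally creates and loads a matching Hive table.

```python
def sqoop_import_command(jdbc_url, username, table, target_dir=None, hive_import=False):
    """Hypothetical helper: build (not run) a Sqoop import command line."""
    cmd = [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", username, "-P",  # -P: prompt for the password interactively
        "--table", table,
    ]
    if target_dir:
        cmd += ["--target-dir", target_dir]
    if hive_import:
        # Also create/populate a matching Hive table after the HDFS import.
        cmd.append("--hive-import")
    return cmd


if __name__ == "__main__":
    print(" ".join(sqoop_import_command("jdbc:mysql://dbhost/shop", "analyst",
                                        "customers", hive_import=True)))
```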
Hands-on
• Hive – Activity 1
• Hive – Activity 2
Why do we need Hive or Impala?
Interacting with Hive & Impala
Using the Impala Shell
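A non-interactive impala-shell call can be sketched with a hypothetical helper: -i names the impalad host:port to connect to, -d sets the starting database, and -q runs a single query and exits. The host, port, and database are placeholders; the correct port depends on the Impala version and protocol in use.

```python
def impala_shell_command(query, impalad, database=None):
    """Hypothetical helper: build (not run) an impala-shell command line."""
    cmd = ["impala-shell", "-i", impalad]  # -i: impalad to connect to (host:port)
    if database:
        cmd += ["-d", database]            # -d: database to USE at startup
    cmd += ["-q", query]                   # -q: run one query, then exit
    return cmd


if __name__ == "__main__":
    print(impala_shell_command("SELECT COUNT(*) FROM sales", "impalad-host:21000", "sales_db"))
```

Running impala-shell without -q opens the interactive prompt instead.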