
N01346254 Inderjit Singh

1. Load sales data


a. First upload the product.csv file to the /user/maria_dev/ directory in Hadoop (HDFS)

2. Then create a Hive external table and load the CSV file
a. CREATE EXTERNAL TABLE IF NOT EXISTS product_external(id int, item string, fullname string,
   quantity int, price int, item_type string)
   ROW FORMAT DELIMITED
   FIELDS TERMINATED BY ','
   STORED AS TEXTFILE
   LOCATION '/tmp/product';
3. Then load the data into the external Hive table
a. LOAD DATA INPATH '/tmp/lab/product.csv' OVERWRITE INTO TABLE product_external;
b. If you get an error saying the MoveTask failed, try selecting from product_external
to see whether the data has loaded. If it has, move on, as there could be an issue with the
sandbox.
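
The check described in step 3b can be sketched in HiveQL as follows. This is a minimal example that assumes the external table from step 2 exists. The LIMIT value is an arbitrary choice for illustration and is not part of the lab. The second query is an extra sanity check: rows with a NULL id usually point to a header line in the CSV or a delimiter mismatch, although the lab does not say whether product.csv has a header.

```sql
-- Step 3b check: show a small sample of rows from the external table.
-- LIMIT 10 is an arbitrary sample size chosen for illustration.
SELECT * FROM product_external LIMIT 10;

-- Count rows whose id could not be parsed as an int.
-- A non-zero result may indicate a header line or a delimiter problem.
SELECT COUNT(*) AS unparsed_rows
FROM product_external
WHERE id IS NULL;
```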
4. Then create a Hive internal table
a. CREATE TABLE IF NOT EXISTS EXISTS product_ORC(id int, item string,fullname string,
quantity int, price int,item_type string) STORED AS ORC;
b. If you get an error – remove the redundant word 😊

5. Then load data from the external table into the internal ORC table

a. INSERT INTO TABLE product_orc SELECT * FROM product_external;
(screen print results)
6. Select from both tables to see the data - screen print results

select * from product_external;


select * from product_ORC;
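
Besides eyeballing the two result sets, one way to confirm that step 5 copied every row is to compare row counts. The sketch below assumes Hive 0.13 or later, where a top-level UNION ALL is accepted. The source_table and row_count aliases are illustrative names only.

```sql
-- Compare the row counts of the external and ORC tables in one result set.
-- The two counts should match if the INSERT in step 5 copied every row.
SELECT 'product_external' AS source_table, COUNT(*) AS row_count
FROM product_external
UNION ALL
SELECT 'product_orc' AS source_table, COUNT(*) AS row_count
FROM product_orc;
```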

7. Log in to HBase and create an HBase table

a. create 'Product', 'details'
8. Create a table in Hive that maps directly to the HBase table
a. Please review the previous class notes to do this.
Create the table with the name ext_hbase_product and print its definition.

Table definition:
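
A Hive table is commonly mapped onto an existing HBase table through the HBase storage handler. The sketch below is one possible definition, not necessarily the one from the class notes. It assumes the 'Product' table and 'details' column family from step 7, uses id as the HBase row key, and assumes each remaining Hive column is stored under a qualifier of the same name. The qualifier names are an assumption. The mapping string is written without spaces because whitespace inside it is not ignored.

```sql
-- Sketch: Hive external table backed by the existing HBase table 'Product'.
-- Assumptions: id maps to the row key (:key); the other columns map to
-- qualifiers of the same name in the 'details' column family.
CREATE EXTERNAL TABLE IF NOT EXISTS ext_hbase_product(
  id int, item string, fullname string,
  quantity int, price int, item_type string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
  "hbase.columns.mapping" = ":key,details:item,details:fullname,details:quantity,details:price,details:item_type"
)
TBLPROPERTIES ("hbase.table.name" = "Product");

-- Print the definition that Hive stored for the table.
DESCRIBE FORMATTED ext_hbase_product;
```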
9. From Hive, insert records into the HBase table
a. INSERT INTO TABLE ext_hbase_product SELECT * FROM product_orc;
b. SELECT * FROM ext_hbase_product; - screen print first page only
10. From HBase
a. scan 'Product' - screen print the last page from HBase

Last page of screen

b. get 'Product', '1' - screen print


c. Write a filter command to show just the fullname

d. Write a filter command where item_type = 'paper' - screen print
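
The filter commands in 10c and 10d are meant to be written in the HBase shell. As a cross-check only, the same rows can be read from the Hive side through the mapped table. The queries below are a sketch that assumes ext_hbase_product from step 8 exists and is populated. They are not the HBase filter commands the step asks for.

```sql
-- Cross-check for 10c: only the fullname values, with the row key for reference.
SELECT id, fullname FROM ext_hbase_product;

-- Cross-check for 10d: rows whose item_type is 'paper'.
SELECT * FROM ext_hbase_product WHERE item_type = 'paper';
```

The HBase shell output for 10c and 10d should show the same row keys as these two queries.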
