Today's Agenda
-----------------------
00) Hive partitions overview
0) Insert static partitions
1) Dynamic partitions
2) Subpartitions
3) Bucketing
4) Partitions with bucketing
5) Bucketing logic for INT
6) Bucketing logic for Date
7) Bucketing logic for String
1) Dynamic partitions
=====================
Note:-
----------
The last column of the select query in the insert statement becomes the partition
column, so the partition column must always come last in the select list.
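A minimal sketch of this rule, contrasting a static insert with a dynamic one. The table names txns_src and txns_part and their columns are assumptions for illustration only, not tables from the class.

```sql
-- Hypothetical tables: txns_src (txnno, txndate, country) is an unpartitioned source;
-- txns_part (txnno, txndate) is partitioned by (country STRING).

-- Static partition: the partition value is written in the statement,
-- so country is NOT part of the select list.
insert into table txns_part partition (country = 'USA')
select txnno, txndate from txns_src where country = 'USA';

-- Dynamic partition: enable it for the session first.
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

-- Hive matches by position, not by name: the last select column
-- (country) supplies the partition value for each row.
insert into table txns_part partition (country)
select txnno, txndate, country from txns_src;
```

If country were placed anywhere but last in the select list, Hive would use whatever column is last as the partition value.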
Class assignment
-----------
Create a directory in HDFS --- /user/cloudera/zeyo_dynamic_dir
Create a partitioned table on top of that directory --- zeyo_dyn_table
Insert the data into the partitioned table with the partition specified dynamically
Note:-
3 country partitions (one per country value) get created in the path /user/cloudera/zeyo_dynamic_dir_country/
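The three assignment steps could look roughly like the sketch below, run from the Hive shell. The column list and the staging table txns_raw are assumptions; only the directory and the table name zeyo_dyn_table come from the assignment.

```sql
-- Step 1: create the HDFS directory (dfs commands can be run inside the Hive shell).
dfs -mkdir -p /user/cloudera/zeyo_dynamic_dir;

-- Step 2: partitioned table on top of that directory.
-- Assumed columns; country is the partition column, so it is not in the column list.
create table zeyo_dyn_table (txnno INT, txndate STRING, custno INT)
partitioned by (country STRING)
row format delimited fields terminated by ','
stored as textfile
location '/user/cloudera/zeyo_dynamic_dir';

-- Step 3: dynamic insert from the assumed staging table txns_raw.
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
insert into table zeyo_dyn_table partition (country)
select txnno, txndate, custno, country from txns_raw;

-- Check the result: one partition per distinct country value.
show partitions zeyo_dyn_table;
dfs -ls /user/cloudera/zeyo_dynamic_dir;
```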
2) Subpartitions
---------------
Class assignment
-------------------
Note:- For each spendby value, 3 country-wise partitions get created under it in the
HDFS directory.
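A sketch of a subpartitioned table that would give this layout. The table name txns_subpart, its columns, and the staging table txns_raw are assumptions for illustration.

```sql
-- Hypothetical table: two partition columns give a two-level directory tree.
create table txns_subpart (txnno INT, txndate STRING, custno INT)
partitioned by (spendby STRING, country STRING)
row format delimited fields terminated by ','
stored as textfile;

set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

-- The last two select columns map, in order, to (spendby, country).
insert into table txns_subpart partition (spendby, country)
select txnno, txndate, custno, spendby, country from txns_raw;

-- Resulting layout under the table directory:
--   spendby=<value>/country=<value>/<data files>
show partitions txns_subpart;
```

The order in the partitioned by clause decides the nesting: spendby is the outer level and country the inner level.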
>> If cardinality (the number of distinct values) is high, as in the first 3
columns of the txns data, go for bucketing.
>> If cardinality is low, as in the spendby and country columns of the txns text
file, go for partitioning.
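One way to check cardinality before deciding is to count the distinct values per column. This is only a sketch; txns_raw and its column names are assumed names for the staged txns data.

```sql
-- txns_raw and its columns are assumptions, not names from the class.
select count(distinct txnno)   as txnno_values,
       count(distinct custno)  as custno_values,
       count(distinct spendby) as spendby_values,
       count(distinct country) as country_values
from txns_raw;
```

Columns with only a handful of distinct values are partition candidates. Columns with very many distinct values would create one directory per value, so a fixed number of buckets suits them better.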
>>>
create table txns_bucket (txnno INT, txndate STRING, custno INT)
clustered by (txnno) into 10 buckets
row format delimited fields terminated by ',' lines terminated by '\n'
stored as textfile
location '/user/cloudera/txsn_bucket';
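A sketch of how the bucketed table might be loaded and inspected. The staging table txns_raw is an assumption, and the first setting applies only to older Hive releases.

```sql
-- Only for Hive releases before 2.0, so that inserts honor the bucket count.
-- Skip this line on newer releases, where the property no longer exists.
set hive.enforce.bucketing=true;

-- txns_raw is an assumed staging table holding the raw txns rows.
insert overwrite table txns_bucket
select txnno, txndate, custno from txns_raw;

-- The table directory should now hold one file per bucket.
dfs -ls /user/cloudera/txsn_bucket;

-- Read back a single bucket by sampling on the clustering column.
select * from txns_bucket tablesample (bucket 1 out of 10 on txnno);
```

Loading the files directly with load data would copy them as-is without splitting rows into buckets, which is why the sketch goes through insert ... select.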