
Date: 02-Nov-2019

=======================
Today's agenda
--------------------
1) Few export commands
2) HIVE
1) Few export commands
========================

Import vs EXPORT
Incremental append (import) = allow insert (export)
Incremental lastmodified (import) = update only (export)

> merge-key in lastmodified (import) is similar to update-key in export
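A sketch of the two incremental import modes being compared (the source table, check columns, and last values are assumptions, not from the notes; the connect string and path reuse the export example further below):

sqoop import --connect jdbc:mysql://localhost/zeyo22 --username root --password cloudera \
  --table source_table --target-dir /user/achievesai8637/incre2/ \
  --incremental append --check-column id --last-value 5

sqoop import --connect jdbc:mysql://localhost/zeyo22 --username root --password cloudera \
  --table source_table --target-dir /user/achievesai8637/incre2/ \
  --incremental lastmodified --check-column ts --last-value '2019-11-01 00:00:00' \
  --merge-key id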


> update-key in export defaults to updateonly mode

Update only
---------------------
If there are any duplicate records in the part file, the export will update
them in sequential order in the RDBMS, but it will not insert the duplicate
record into the RDBMS.
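A minimal sketch of an update-only export (the update key column id is an assumption; connect string and paths reuse the staging example further below):

sqoop export --connect jdbc:mysql://localhost/zeyo22 --username root --password cloudera \
  --table target_table --export-dir /user/achievesai8637/incre2/ \
  --update-key id --update-mode updateonly -m 1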
allow Insert
--------------
Task 1
-------------
Have 2 part files in HDFS:
part-m-00000 --- 5 records (1-5)
part-m-00001 --- 2 records (6-7)
Do the export to the RDBMS.
Then add another part file, part-m-00002 (8-9).
Only the 8th and 9th records should land in the RDBMS (allow insert) -- see
the sketch below.
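A sketch of the allow-insert export for Task 1 (again assuming id as the update key):

sqoop export --connect jdbc:mysql://localhost/zeyo22 --username root --password cloudera \
  --table target_table --export-dir /user/achievesai8637/incre2/ \
  --update-key id --update-mode allowinsert -m 1

Records 1-7 already present are matched and updated on id; 8 and 9 get inserted.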

Task 2
-------------
Have two part files in HDFS:
part-m-00000 ---- 5 records (1-5)
part-m-00001 ---- 2 records (5-7) -- a different 5th record
Do a sqoop export and check in the RDBMS ---- 1,2,3,4,5,5,6,7
Change the last 5th record's firstname,lastname in HDFS, do the sqoop export
again, and post the observation.
staging table in Export
---------------------------
create table sqlexpren (id int(10), name varchar(10), ts timestamp);  -- column name ts assumed; the note omitted it
sqoop export --connect jdbc:mysql://localhost/zeyo22 --username root --password cloudera \
  --table target_table -m 1 --staging-table target_table_staging \
  --export-dir /user/achievesai8637/incre2/
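Sqoop loads the staging table first and then moves the rows into the target in a single transaction, so the staging table must mirror the target's structure. A sketch of creating it in MySQL (names taken from the command above):

create table target_table_staging like target_table;

Adding --clear-staging-table to the export command empties the staging table before a rerun.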
2) HIVE
=======================
What is Hive?
An SQL layer on top of HDFS which is used to process data and provide
data analytics.
'''''''''''''''''''''Hive is not a DATABASE'''''''''''''''''''''''''
It is a kind of data warehouse.
> Hive is not an underlying row-and-column system; it purely works on top of
the file system (HDFS) -- an ecosystem component of Hadoop.
> But it is similar to SQL.

Switch to Hive on the edge node
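(Typically just the hive command at the shell prompt, assuming the Hive CLI client is installed on the edge node:)

$ hive
hive>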


create database zeyobron22;
use zeyobron22;
show tables;
-------------------------------------------------------
SQL falls under ETL (extract the XML data, do the transformation, and load into the RDBMS).
HQL falls under ELT (extract the XML data, load it into Hive, and then do the transformation).

steps
-----
create an hdfs directory
create an xml hive table on top of that directory
move the xml file into the location
query the hive table
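A minimal sketch of that flow (directory, table, and file names are assumptions; a real XML table would also need an XML SerDe, which this sketch skips by reading each line as a plain string):

hdfs dfs -mkdir -p /user/achievesai8637/xmldir
hive> create external table xml_raw (line string) location '/user/achievesai8637/xmldir';
hdfs dfs -put sample.xml /user/achievesai8637/xmldir/
hive> select * from xml_raw limit 5;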
>> We can connect Hive with any tool that supports JDBC.
Why Hive?
-----------------
1) HQL is very flexible (in the case of SQL, the focus is performance for less data)
2) High latency -- the time taken to handle even small data (in the case of SQL, low latency)
3) No transformation/cleansing (in the case of SQL, cleaning/reformatting)
4) Experimentation -- we can do anything since it is raw data (in the case of SQL, prod workloads)
5) Query-time parsing (in the case of SQL, load-time parsing) -- this is an important topic:
   query-time parsing is Schema on Read
   load-time parsing is Schema on Write
--------------------------------------------------------------------
create an sql table and load string data into an integer column
create the table
create a directory in hdfs
create a hive table and load string data into an integer column
query the table and check the data in the back-end hdfs directory
-----------------------------------------------------------------------
>>> In HQL, the int will be replaced with null if varchar data is in place of the int.
>>>> The advantage of query-time parsing is that loading will not stop at any cost
(even if varchar data is in place of an int).
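A sketch of that experiment in Hive (table and file names are assumptions):

hive> create table numtest (id int) row format delimited fields terminated by ',';
hive> load data local inpath 'nums.txt' into table numtest;  -- succeeds even if a row holds 'abc'
hive> select * from numtest;                                 -- the 'abc' row comes back as NULL

In SQL (schema on write) the same bad row would be caught at load time instead.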
6) Structured and semi-structured data (in the case of SQL, only structured)
7) Easily scalable (in the case of SQL, hard to scale)
>> Hive runs on MapReduce
>> Whatever is processed from the front end runs as a MapReduce job in the back end
