Each one of you has to prepare a presentation on the topics below.

In your presentation, please add examples from the Banking or Retail domain.

Data Warehouse (not for 0 to 2 years of experience)


Q1. What is a partition-based load? Explain it with examples.
Q2. Name a few HIVE Metastore commands.
Q3. How do you build a snapshot fact? Explain with examples.
Q4. What is data ingestion, and where do your raw data files land in the cluster?
Q5. How are raw data files loaded into stage tables?
Q6. What is the NameNode, and what is its role in the Hadoop cluster?
Q7. How is a file stored in HDFS? Explain the distributed storage architecture.
Q8. What is the difference between block size and input splits?
Q9. If the block size of a Hadoop cluster is 128 MB and it has 6 DataNodes, how will a
1 GB file be stored when I copy it to the cluster? Explain in detail.
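
As a starting point for Q9, here is a minimal sketch of the arithmetic. It assumes the HDFS default replication factor of 3, which the question does not state, and the function name `hdfs_block_layout` is hypothetical. The NameNode decides where each replica goes, so the per-DataNode figure is only an average, not a guaranteed placement.

```python
import math

def hdfs_block_layout(file_size_mb, block_size_mb=128, replication=3, datanodes=6):
    """Return block count, last block size, total replicas and average replicas per DataNode."""
    # A file is cut into fixed-size blocks; only the last block may be smaller.
    num_blocks = math.ceil(file_size_mb / block_size_mb)
    last_block_mb = file_size_mb - (num_blocks - 1) * block_size_mb
    # Each block is stored `replication` times (assumed default: 3).
    total_replicas = num_blocks * replication
    return {
        "num_blocks": num_blocks,
        "last_block_mb": last_block_mb,
        "total_replicas": total_replicas,
        "avg_replicas_per_datanode": total_replicas / datanodes,
    }

layout = hdfs_block_layout(1024)  # 1 GB = 1024 MB
print(layout)
```

Under these assumptions the 1 GB file becomes 8 blocks of 128 MB, stored as 24 block replicas, or about 4 per DataNode on average.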

HIVE/Hadoop
Q1. What are the different file formats available in Hadoop, and why would you choose the
ORC format for HIVE? A fully clear explanation is required.
Q2. What are the similarities and differences between RANK, ROW_NUMBER and
DENSE_RANK? Prepare clear examples.
Refer to the link below:
https://codingsight.com/similarities-and-differences-among-rank-dense_rank-and-row_number-functions/
Q3. How do PARTITION BY and ORDER BY work in RANK, ROW_NUMBER and DENSE_RANK?
Explain with examples.
Q4. Give an example of how to read an array and extract individual members of the array in
HIVE. Explain with examples.
Q5. Give 2 examples of how to use the HIVE split function and the HIVE ARRAY data type.
Q6. In which scenario would you use HIVE LATERAL VIEW explode? What is the main purpose of
LATERAL VIEW explode?
Q7. What is the difference between HIVE collect_set and collect_list? Give examples and
an explanation.
Q8. What is the difference between UNION and UNION ALL?
Q9. What is the difference between a Mapper and a Reducer? Explain with 1 example.
Q10. What is the difference between an EXTERNAL and an INTERNAL table?
Q11. Why should you avoid analytical functions, i.e. RANK, ROW_NUMBER and
DENSE_RANK, when you need to process large files with 50 million to 100 million rows?
Q12. What is the TO_DATE function in HIVE/SQL? Give an example.
Q13. If you have two dates, i.e. DATE_1=2020-06-20 11:45:58 and DATE_2=2020-04-20
11:45:58, how will you subtract DATE_1 - DATE_2? I need the difference in number of
days.
Q14. What is SORT_ARRAY in HIVE? Give an example.
Q15. Please write a query to retrieve the employees with the second highest salary from each
department.
Q16. What is the difference between INSERT INTO and INSERT OVERWRITE in HIVE?
Q17. You have two tables as below:

Q18. You have two tables as below:

Q20. Write HIVE SQL code to select the non-null value for each user_id from msg_id1 and
msg_id2. If msg_id1 is NULL, take the value from msg_id2. If both are NULL, the
output will be NULL.
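
One possible approach to Q20 is COALESCE, which returns its first non-NULL argument and is available in HIVE as well. The sketch below runs the query on SQLite through Python as a stand-in for HIVE. The table name `usr_msg` and the three sample rows are hypothetical, since the question gives no table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and rows, made up only to exercise the three cases in Q20.
conn.execute("CREATE TABLE usr_msg (user_id INTEGER, msg_id1 TEXT, msg_id2 TEXT)")
conn.executemany("INSERT INTO usr_msg VALUES (?, ?, ?)", [
    (1, "a1", "b1"),   # msg_id1 present: msg_id1 wins
    (2, None, "b2"),   # msg_id1 NULL: fall back to msg_id2
    (3, None, None),   # both NULL: result stays NULL
])

coalesce_rows = conn.execute(
    "SELECT user_id, COALESCE(msg_id1, msg_id2) AS msg_id "
    "FROM usr_msg ORDER BY user_id"
).fetchall()
print(coalesce_rows)
```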

Q21. Below is the data in the table dim_usr_device_detaill:
t.user_id  t.country_cd  t.last_login_time  t.device_id  t.brand
9812304    IND           2020-05-03         7734504      Samsung S+
9812305    IND           2020-05-03         7734505      Samsung S+
9812307    IND           2020-05-04         7734507      Samsung S+
9812308    LON           2020-05-04         7734508      Samsung S+
9812310    LON           2020-05-04         7734510      Samsung S+
9812311    LON           2020-05-04         7734511      Samsung S+
9812301    MX            2020-05-04         7734566      IPhone 8+
9812302    MX            2020-05-05         7734502      IPhone 8+
9812303    RU            2020-05-05         7734503      Huawei Honor
9812306    RU            2020-05-04         7734506      Huawei Honor
9812309    RU            2020-05-04         7734509      Samsung S+

We received a small data set in the table below. The business wants to update the brand in
dim_usr_device_detaill to the one given in the file below. Please join dim_usr_device_detaill
and update_usr_brand; if there is a match, update the brand column of
dim_usr_device_detaill with the brand from update_usr_brand, otherwise keep it the same.

update_usr_brand
user_id brand
9812309 RIM
9812306 RIM
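
A common way to approach Q21 is a LEFT JOIN with COALESCE, so that unmatched rows keep their existing brand. The sketch below runs that query on SQLite through Python as a stand-in for HIVE, using the rows from the two tables above. In HIVE the result would typically be written back with INSERT OVERWRITE; that write-back step is an assumption about the target setup and is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dim_usr_device_detaill ("
    "user_id INTEGER, country_cd TEXT, last_login_time TEXT, "
    "device_id INTEGER, brand TEXT)"
)
conn.execute("CREATE TABLE update_usr_brand (user_id INTEGER, brand TEXT)")

# Rows copied from the two tables in Q21.
conn.executemany("INSERT INTO dim_usr_device_detaill VALUES (?, ?, ?, ?, ?)", [
    (9812304, "IND", "2020-05-03", 7734504, "Samsung S+"),
    (9812305, "IND", "2020-05-03", 7734505, "Samsung S+"),
    (9812307, "IND", "2020-05-04", 7734507, "Samsung S+"),
    (9812308, "LON", "2020-05-04", 7734508, "Samsung S+"),
    (9812310, "LON", "2020-05-04", 7734510, "Samsung S+"),
    (9812311, "LON", "2020-05-04", 7734511, "Samsung S+"),
    (9812301, "MX", "2020-05-04", 7734566, "IPhone 8+"),
    (9812302, "MX", "2020-05-05", 7734502, "IPhone 8+"),
    (9812303, "RU", "2020-05-05", 7734503, "Huawei Honor"),
    (9812306, "RU", "2020-05-04", 7734506, "Huawei Honor"),
    (9812309, "RU", "2020-05-04", 7734509, "Samsung S+"),
])
conn.executemany("INSERT INTO update_usr_brand VALUES (?, ?)",
                 [(9812309, "RIM"), (9812306, "RIM")])

# LEFT JOIN keeps every dimension row; COALESCE takes the new brand only on a match.
updated_rows = conn.execute("""
    SELECT d.user_id, d.country_cd, d.last_login_time, d.device_id,
           COALESCE(u.brand, d.brand) AS brand
    FROM dim_usr_device_detaill d
    LEFT JOIN update_usr_brand u ON d.user_id = u.user_id
    ORDER BY d.user_id
""").fetchall()

for row in updated_rows:
    print(row)
```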

Q22. Write the SQL to select the latest row (by activity_date) for each user_id.
user_id  activity_date        msg_id  device
110      2020-01-20 13:12:20  NULL    x1234
110      2020-01-19 13:12:20  fte345  x9999
110      2020-01-18 13:12:20  NULL    x8888
110      2020-01-17 13:12:20  AFG777  x9999
111      2020-01-20 13:12:20  NULL    x1234
111      2020-01-19 13:12:20  NULL    x9999
111      2020-01-18 13:12:20  i568zs  x8888
112      2020-01-20 13:12:20  nq68zs  x1234
112      2020-01-19 13:12:20  NULL    x9999
112      2020-01-18 13:12:20  985zse  x8888
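
One way to approach Q22 without an analytical function is to join the table to its own per-user MAX(activity_date). The sketch below runs that query on SQLite through Python as a stand-in for HIVE. The table name `usr_activity` is hypothetical, since the question does not name the table, and the timestamps are stored as text, which sorts correctly in this `YYYY-MM-DD HH:MM:SS` format.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# `usr_activity` is a hypothetical name; the rows are the ones listed in Q22.
conn.execute(
    "CREATE TABLE usr_activity "
    "(user_id INTEGER, activity_date TEXT, msg_id TEXT, device TEXT)"
)
conn.executemany("INSERT INTO usr_activity VALUES (?, ?, ?, ?)", [
    (110, "2020-01-20 13:12:20", None, "x1234"),
    (110, "2020-01-19 13:12:20", "fte345", "x9999"),
    (110, "2020-01-18 13:12:20", None, "x8888"),
    (110, "2020-01-17 13:12:20", "AFG777", "x9999"),
    (111, "2020-01-20 13:12:20", None, "x1234"),
    (111, "2020-01-19 13:12:20", None, "x9999"),
    (111, "2020-01-18 13:12:20", "i568zs", "x8888"),
    (112, "2020-01-20 13:12:20", "nq68zs", "x1234"),
    (112, "2020-01-19 13:12:20", None, "x9999"),
    (112, "2020-01-18 13:12:20", "985zse", "x8888"),
])

# Find each user's latest activity_date, then join back to fetch the full row.
latest_rows = conn.execute("""
    SELECT a.user_id, a.activity_date, a.msg_id, a.device
    FROM usr_activity a
    JOIN (
        SELECT user_id, MAX(activity_date) AS max_date
        FROM usr_activity
        GROUP BY user_id
    ) m ON a.user_id = m.user_id AND a.activity_date = m.max_date
    ORDER BY a.user_id
""").fetchall()

for row in latest_rows:
    print(row)
```

If two rows of one user share the same latest activity_date, this join returns both; ROW_NUMBER would return exactly one.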

Q23. Given the table below, how will you select the latest row for each user_id
that contains a non-null msg_id?
user_id  activity_date        msg_id  device
110      2020-01-20 13:12:20  NULL    x1234
110      2020-01-19 13:12:20  fte345  x9999
110      2020-01-18 13:12:20  NULL    x8888
110      2020-01-17 13:12:20  AFG777  x9999
111      2020-01-20 13:12:20  NULL    x1234
111      2020-01-19 13:12:20  NULL    x9999
111      2020-01-18 13:12:20  i568zs  x8888
112      2020-01-20 13:12:20  nq68zs  x1234
112      2020-01-19 13:12:20  NULL    x9999
112      2020-01-18 13:12:20  985zse  x8888

Expected output:
user_id  activity_date        msg_id  device
110      2020-01-19 13:12:20  fte345  x9999
111      2020-01-18 13:12:20  i568zs  x8888
112      2020-01-20 13:12:20  nq68zs  x1234
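
One way to reach this expected output is to filter out NULL msg_id rows first and then keep the row numbered 1 by ROW_NUMBER within each user_id. The sketch below runs that query on SQLite through Python as a stand-in for HIVE; it needs SQLite 3.25 or later for window functions. The table name `usr_activity` is hypothetical, since the question does not name the table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# `usr_activity` is a hypothetical name; the rows are the ones listed in Q23.
conn.execute(
    "CREATE TABLE usr_activity "
    "(user_id INTEGER, activity_date TEXT, msg_id TEXT, device TEXT)"
)
conn.executemany("INSERT INTO usr_activity VALUES (?, ?, ?, ?)", [
    (110, "2020-01-20 13:12:20", None, "x1234"),
    (110, "2020-01-19 13:12:20", "fte345", "x9999"),
    (110, "2020-01-18 13:12:20", None, "x8888"),
    (110, "2020-01-17 13:12:20", "AFG777", "x9999"),
    (111, "2020-01-20 13:12:20", None, "x1234"),
    (111, "2020-01-19 13:12:20", None, "x9999"),
    (111, "2020-01-18 13:12:20", "i568zs", "x8888"),
    (112, "2020-01-20 13:12:20", "nq68zs", "x1234"),
    (112, "2020-01-19 13:12:20", None, "x9999"),
    (112, "2020-01-18 13:12:20", "985zse", "x8888"),
])

# Drop NULL msg_id rows before ranking, so rn = 1 is the latest non-null row per user.
latest_non_null = conn.execute("""
    SELECT user_id, activity_date, msg_id, device
    FROM (
        SELECT user_id, activity_date, msg_id, device,
               ROW_NUMBER() OVER (
                   PARTITION BY user_id ORDER BY activity_date DESC
               ) AS rn
        FROM usr_activity
        WHERE msg_id IS NOT NULL
    ) ranked
    WHERE rn = 1
    ORDER BY user_id
""").fetchall()

for row in latest_non_null:
    print(row)
```

The NULL filter has to sit inside the subquery: applied after ranking, it would discard users whose latest row has a NULL msg_id instead of falling back to their latest non-null row.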
