In your presentation, please add examples from the Banking or Retail domain.
HIVE/Hadoop
Q1. What are the different file formats available in Hadoop, and why would you choose the ORC format for Hive? A complete, clear explanation is required.
Q2. What are the similarities and differences between RANK, ROW_NUMBER and DENSE_RANK? Prepare clear examples.
Refer to the link below:
https://codingsight.com/similarities-and-differences-among-rank-dense_rank-and-row_number-functions/
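All three functions number rows in the order set by the window's ORDER BY; they differ only on ties. The Python sketch below is not HiveQL: it emulates `ROW_NUMBER()`, `RANK()` and `DENSE_RANK()` `OVER (ORDER BY amount DESC)` on a hypothetical list of retail order amounts (`rank_functions` and the data are illustrative assumptions). Tied amounts get distinct numbers from ROW_NUMBER, a shared number followed by a gap from RANK, and a shared number with no gap from DENSE_RANK.

```python
def rank_functions(values):
    """Return (value, row_number, rank, dense_rank) for values sorted descending.

    Emulates ROW_NUMBER/RANK/DENSE_RANK OVER (ORDER BY value DESC); not HiveQL.
    """
    ordered = sorted(values, reverse=True)
    result = []
    rank = dense_rank = 0
    previous = object()  # sentinel that never equals a real value
    for row_number, value in enumerate(ordered, start=1):
        if value != previous:
            rank = row_number  # RANK jumps to the current row position (gap after ties)
            dense_rank += 1    # DENSE_RANK moves up by exactly one (no gap)
            previous = value
        result.append((value, row_number, rank, dense_rank))
    return result


# Hypothetical retail order amounts with a tie at 500.
orders = [300, 500, 500, 200]
for row in rank_functions(orders):
    print(row)
```

In Hive, the order in which ROW_NUMBER numbers tied rows is not guaranteed, so add a tiebreaker column to ORDER BY when repeatable results matter.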
Q3. How do PARTITION BY and ORDER BY work in RANK, ROW_NUMBER and DENSE_RANK? Explain with examples.
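PARTITION BY splits the rows into independent groups and the numbering restarts at 1 in each group; ORDER BY decides the sequence within a group. Below is a minimal Python emulation of `RANK() OVER (PARTITION BY branch ORDER BY balance DESC)`, assuming a hypothetical list of bank accounts; `rank_by_branch` is an illustrative helper, not a Hive API. Replacing the RANK rule with a plain counter or a dense counter would give ROW_NUMBER or DENSE_RANK.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical bank accounts: (branch, account_id, balance).
accounts = [
    ("NYC", "A1", 9000),
    ("NYC", "A2", 7000),
    ("NYC", "A3", 9000),
    ("LON", "B1", 4000),
    ("LON", "B2", 6000),
]


def rank_by_branch(rows):
    """Emulate RANK() OVER (PARTITION BY branch ORDER BY balance DESC)."""
    ranked = []
    # PARTITION BY: group rows by branch (groupby needs input sorted by the key).
    for branch, group in groupby(sorted(rows, key=itemgetter(0)), key=itemgetter(0)):
        # ORDER BY: sort each partition independently by balance, descending.
        ordered = sorted(group, key=itemgetter(2), reverse=True)
        rank = 0
        for position, (_, account_id, balance) in enumerate(ordered, start=1):
            # A new balance value takes the current position; ties keep the previous rank.
            if position == 1 or balance != ordered[position - 2][2]:
                rank = position  # numbering restarts at 1 in every partition
            ranked.append((branch, account_id, balance, rank))
    return ranked


for row in rank_by_branch(accounts):
    print(row)
```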
Q4. Show how to read an array in Hive and extract its individual elements. Explain with examples.
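Hive arrays are indexed from 0 with the `arr[n]` syntax, and `size(arr)` returns the number of elements. The Python sketch below mimics that lookup on a hypothetical retail basket column; returning None for an out-of-range index is intended to mirror Hive returning NULL, which is worth confirming on your Hive version. `element_at` is a hypothetical helper, not a Hive function.

```python
from typing import List, Optional


def element_at(values: List[str], index: int) -> Optional[str]:
    """Mimic Hive's arr[index]: zero-based, None (NULL) when out of range (assumed)."""
    if 0 <= index < len(values):
        return values[index]
    return None


# Hypothetical retail basket stored as array<string> in one column.
basket = ["milk", "bread", "eggs"]

first_item = element_at(basket, 0)               # Hive: basket[0]
last_item = element_at(basket, len(basket) - 1)  # Hive: basket[size(basket) - 1]
missing = element_at(basket, 5)                  # Hive: basket[5] -> NULL
print(first_item, last_item, missing)
```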
Q5. Give two examples of how to use the Hive split function and the Hive ARRAY data type.
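In Hive, `split(str, pat)` returns an ARRAY<STRING>, and `pat` is a regular expression, so metacharacters such as `|` must be escaped (`'\\|'` in HiveQL). The sketch below approximates split with Python's `re.split` as a stand-in, not Hive itself (edge cases such as trailing empty fields may differ), on two hypothetical banking strings: a comma-separated list of product codes and a pipe-delimited card record whose third element is split again to get the expiry year.

```python
import re
from typing import List


def hive_split(text: str, pattern: str) -> List[str]:
    """Approximate Hive split(str, pat): pat is treated as a regular expression."""
    return re.split(pattern, text)


# Example 1: comma-separated product codes held by a bank customer.
products = hive_split("SAV,CUR,FD", ",")  # Hive: split(products, ',')

# Example 2: pipe-delimited card record; '|' is a regex metacharacter,
# so it is escaped (Hive: split(card_record, '\\|')).
card = hive_split("4111|VISA|2025-12", r"\|")
# Index into the resulting array, then split again (Hive: split(split(card_record, '\\|')[2], '-')[0]).
expiry_year = hive_split(card[2], "-")[0]

print(products, card, expiry_year)
```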
Q6. In which scenario would you use Hive LATERAL VIEW explode? What is the main purpose of LATERAL VIEW explode?
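Use LATERAL VIEW explode when a column holds an array (or map) and you need one row per element, for example to count how many customers hold each banking product. Its main purpose is to join the rows produced by explode() back to the other columns of the source row, because Hive does not allow a UDTF such as explode() to be mixed with other columns in the same SELECT on its own. The Python function below is a hypothetical emulation, not Hive code; with `outer=True` it imitates LATERAL VIEW OUTER, which keeps a row with NULL when the array is empty.

```python
from typing import List, Optional, Tuple

# Hypothetical customer table: one row per customer with an array<string> column.
customers: List[Tuple[str, List[str]]] = [
    ("C001", ["savings", "credit_card"]),
    ("C002", ["home_loan"]),
    ("C003", []),  # customer with an empty product array
]


def lateral_view_explode(rows, outer: bool = False) -> List[Tuple[str, Optional[str]]]:
    """Emulate: SELECT customer_id, product FROM customers
    LATERAL VIEW [OUTER] explode(products) p AS product."""
    out: List[Tuple[str, Optional[str]]] = []
    for customer_id, products in rows:
        if products:
            # One output row per array element, repeating the source column.
            for product in products:
                out.append((customer_id, product))
        elif outer:
            # OUTER keeps a customer with an empty array, with NULL (None) for product.
            out.append((customer_id, None))
    return out


print(lateral_view_explode(customers))
print(lateral_view_explode(customers, outer=True))
```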
Q7. What is the difference between Hive collect_set and collect_list? Give examples and an explanation.
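Both are aggregate functions that gather a column's values for each group into an array: collect_list keeps duplicates, while collect_set removes them. Hive does not guarantee the element order of either result. The sketch emulates `GROUP BY customer_id` with each function on hypothetical retail purchase rows; `collect` is an illustrative helper, not a Hive API, and its first-seen ordering is just a convenience of the sketch.

```python
from collections import defaultdict

# Hypothetical retail purchases: (customer_id, category).
purchases = [
    ("C1", "grocery"),
    ("C1", "grocery"),
    ("C1", "electronics"),
    ("C2", "apparel"),
]


def collect(rows, distinct):
    """Emulate collect_list (distinct=False) or collect_set (distinct=True) per customer_id."""
    grouped = defaultdict(list)
    for customer_id, category in rows:
        if distinct and category in grouped[customer_id]:
            continue  # collect_set drops duplicate values within a group
        grouped[customer_id].append(category)
    return dict(grouped)


print(collect(purchases, distinct=False))  # collect_list
print(collect(purchases, distinct=True))   # collect_set
```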
Q8. What is the difference between UNION and UNION ALL?
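UNION ALL returns every row from both queries; UNION additionally removes duplicate rows, which needs an extra de-duplication step, so UNION ALL is usually preferred when duplicates are impossible or acceptable. In Hive, UNION (DISTINCT) is supported only from Hive 1.2.0 onward. The Python lists below are a hypothetical stand-in for two branch extracts of account IDs; row order is not guaranteed by either operator in Hive.

```python
# Hypothetical account IDs from two branches' daily extracts.
branch_a = ["AC1", "AC2", "AC3"]
branch_b = ["AC3", "AC4"]

# UNION ALL: keeps every row, including the AC3 that appears in both extracts.
union_all = branch_a + branch_b

# UNION: removes duplicate rows (dict.fromkeys keeps the first occurrence of each).
union = list(dict.fromkeys(branch_a + branch_b))

print(union_all)
print(union)
```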
Q9. What is the difference between a Mapper and a Reducer? Explain with one example.
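A Mapper reads input records and emits (key, value) pairs; the framework then shuffles and sorts those pairs so that all values for one key reach the same Reducer, which aggregates them. The pure-Python simulation below totals hypothetical transaction amounts per account; it only models the data flow and is not the Hadoop MapReduce API (`mapper`, `shuffle` and `reducer` are illustrative names).

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical input lines: "account_id,amount" from a bank transaction file.
lines = ["AC1,100", "AC2,50", "AC1,25", "AC3,10", "AC2,5"]


def mapper(line: str) -> Iterator[Tuple[str, int]]:
    """Map phase: parse one record and emit a (key, value) pair."""
    account_id, amount = line.split(",")
    yield account_id, int(amount)


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle phase (done by the framework in Hadoop): group values by key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reducer(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce phase: aggregate all values that share one key."""
    return key, sum(values)


mapped = (pair for line in lines for pair in mapper(line))
totals = dict(reducer(k, v) for k, v in sorted(shuffle(mapped).items()))
print(totals)
```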
Q10. What is the difference between an EXTERNAL and an INTERNAL (managed) table?
Q11. Why should you avoid analytic functions, i.e. RANK, ROW_NUMBER and DENSE_RANK, when you need to process large files with 50 million to 100 million rows?
Q12. What is the TO_DATE function in Hive/SQL? Give an example.
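In Hive, `to_date(timestamp)` returns the date part of a timestamp string, for example `to_date('2020-06-20 11:45:58')` gives `2020-06-20`. In other SQL dialects such as Oracle, `TO_DATE(char, format)` instead parses a string into a date using a format mask. The Python function below approximates the Hive behaviour, assuming input in `yyyy-MM-dd HH:mm:ss` form and a hypothetical card transaction timestamp.

```python
from datetime import date, datetime


def to_date(timestamp_str: str) -> date:
    """Approximate Hive to_date(): keep only the date part of a 'yyyy-MM-dd HH:mm:ss' string."""
    return datetime.strptime(timestamp_str, "%Y-%m-%d %H:%M:%S").date()


# Hypothetical card transaction timestamp.
txn_ts = "2020-06-20 11:45:58"
print(to_date(txn_ts))
```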
Q13. Given two timestamps, DATE_1 = 2020-06-20 11:45:58 and DATE_2 = 2020-04-20 11:45:58, how would you compute DATE_1 - DATE_2? The result should be the difference in days.
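In Hive, `datediff(enddate, startdate)` returns the number of days from startdate to enddate and ignores the time part, so `datediff('2020-06-20 11:45:58', '2020-04-20 11:45:58')` should return 61 (the remaining 10 days of April, 31 days of May and 20 days of June). The Python check below reproduces that arithmetic with the standard `datetime` module.

```python
from datetime import datetime

DATE_1 = datetime(2020, 6, 20, 11, 45, 58)
DATE_2 = datetime(2020, 4, 20, 11, 45, 58)

# Subtracting two datetimes gives a timedelta; .days is the whole-day difference.
days_diff = (DATE_1 - DATE_2).days
print(days_diff)  # 61
```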
Q14. What is SORT_ARRAY in Hive? Give an example.
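In Hive, `sort_array(arr)` returns the array sorted in ascending order according to the natural ordering of its elements. The Python sketch mimics it with `sorted` on a hypothetical list of ATM withdrawal amounts; duplicates are kept, and `sort_array` here is only an illustrative stand-in.

```python
from typing import List


def sort_array(values: List[int]) -> List[int]:
    """Approximate Hive sort_array(): ascending natural order, returns a new array."""
    return sorted(values)


# Hypothetical daily ATM withdrawal amounts for one customer.
withdrawals = [500, 100, 300, 100]
print(sort_array(withdrawals))  # [100, 100, 300, 500]
```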
Q15. Write a query to retrieve the employees with the second-highest salary in each department.
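One common approach is to compute `DENSE_RANK() OVER (PARTITION BY department ORDER BY salary DESC)` in a subquery and keep the rows where that rank equals 2. DENSE_RANK suits this better than ROW_NUMBER or RANK: all employees tied on the second-highest salary are returned, and a tie on the highest salary does not skip rank 2. The Python sketch emulates that filter on hypothetical banking employee rows (`second_highest_by_dept` is an illustrative helper); a department with only one distinct salary produces no output.

```python
from collections import defaultdict

# Hypothetical employees: (emp_name, department, salary).
employees = [
    ("Asha", "Retail Banking", 90000),
    ("Ben", "Retail Banking", 80000),
    ("Chen", "Retail Banking", 80000),
    ("Dev", "Retail Banking", 70000),
    ("Eva", "Cards", 60000),
    ("Farid", "Cards", 55000),
    ("Gita", "Loans", 50000),
]


def second_highest_by_dept(rows):
    """Emulate: WHERE DENSE_RANK() OVER (PARTITION BY department ORDER BY salary DESC) = 2."""
    by_dept = defaultdict(list)
    for name, dept, salary in rows:
        by_dept[dept].append((name, salary))
    result = []
    for dept, staff in by_dept.items():
        # Distinct salaries in descending order: index 1 is dense rank 2.
        distinct_salaries = sorted({s for _, s in staff}, reverse=True)
        if len(distinct_salaries) < 2:
            continue  # no second-highest salary in this department
        target = distinct_salaries[1]
        result.extend((name, dept, s) for name, s in staff if s == target)
    return result


print(second_highest_by_dept(employees))
```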
Q16. What is the difference between INSERT INTO and INSERT OVERWRITE in Hive?
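INSERT INTO appends rows to the table (or partition) and keeps the existing data; INSERT OVERWRITE replaces the existing data, and with a PARTITION clause only that partition is replaced. As a result, rerunning an INSERT OVERWRITE load gives the same result, while rerunning an INSERT INTO load duplicates rows. The sketch models a table as a Python list; `insert_into` and `insert_overwrite` are hypothetical helpers, not Hive APIs.

```python
from typing import List, Tuple

# A hypothetical table modelled as a list of rows: (sale_date, total_amount).
sales_table: List[Tuple[str, int]] = [("2020-05-03", 1200)]


def insert_into(table: List[Tuple[str, int]], rows: List[Tuple[str, int]]) -> None:
    """INSERT INTO: append new rows; existing data is kept."""
    table.extend(rows)


def insert_overwrite(table: List[Tuple[str, int]], rows: List[Tuple[str, int]]) -> None:
    """INSERT OVERWRITE: existing data is replaced by the new rows."""
    table[:] = rows


insert_into(sales_table, [("2020-05-04", 900)])
print(sales_table)  # [('2020-05-03', 1200), ('2020-05-04', 900)]
insert_overwrite(sales_table, [("2020-05-05", 700)])
print(sales_table)  # [('2020-05-05', 700)]
```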
Q17. You have two tables as below:
Q20. Write HiveQL to select the non-null value for each user_id from msg_id1 and msg_id2: if msg_id1 is NULL, take the value from msg_id2; if both are NULL, the output is NULL.
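In HiveQL this rule is `COALESCE(msg_id1, msg_id2)` (or `NVL(msg_id1, msg_id2)` for exactly two arguments), which returns the first non-NULL argument and NULL when all arguments are NULL. The Python helper below mimics that rule on hypothetical rows, using None for NULL.

```python
from typing import Optional


def coalesce(*values: Optional[str]) -> Optional[str]:
    """Mimic COALESCE: return the first non-None argument, else None."""
    for value in values:
        if value is not None:
            return value
    return None


# Hypothetical rows: (user_id, msg_id1, msg_id2).
rows = [
    (110, "fte345", "zz001"),
    (111, None, "i568zs"),
    (112, None, None),
]
for user_id, msg_id1, msg_id2 in rows:
    print(user_id, coalesce(msg_id1, msg_id2))
```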
Q21. Below is the data in the table dim_usr_device_detaill:
user_id  country_cd  last_login_time  device_id  brand
9812304  IND         2020-05-03       7734504    Samsung S+
9812305  IND         2020-05-03       7734505    Samsung S+
9812307  IND         2020-05-04       7734507    Samsung S+
9812308  LON         2020-05-04       7734508    Samsung S+
9812310  LON         2020-05-04       7734510    Samsung S+
9812311  LON         2020-05-04       7734511    Samsung S+
9812301  MX          2020-05-04       7734566    IPhone 8+
9812302  MX          2020-05-05       7734502    IPhone 8+
9812303  RU          2020-05-05       7734503    Huawei Honor
9812306  RU          2020-05-04       7734506    Huawei Honor
9812309  RU          2020-05-04       7734509    Samsung S+
Below is the data in the table update_usr_brand:
user_id  brand
9812309  RIM
9812306  RIM
Q22. Write SQL to select the latest row (by activity_date) for each user_id.
user_id  activity_date        msg_id  device
110      2020-01-20 13:12:20  NULL    x1234
110      2020-01-19 13:12:20  fte345  x9999
110      2020-01-18 13:12:20  NULL    x8888
110      2020-01-17 13:12:20  AFG777  x9999
111      2020-01-20 13:12:20  NULL    x1234
111      2020-01-19 13:12:20  NULL    x9999
111      2020-01-18 13:12:20  i568zs  x8888
112      2020-01-20 13:12:20  nq68zs  x1234
112      2020-01-19 13:12:20  NULL    x9999
112      2020-01-18 13:12:20  985zse  x8888
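A typical HiveQL approach assigns `ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY activity_date DESC)` in a subquery and keeps the rows numbered 1. The Python sketch below emulates that on the Q22 table (None stands for NULL); `latest_per_user` is a hypothetical helper, not Hive code. For users 110 and 111 the latest row happens to have a NULL msg_id.

```python
from datetime import datetime

# Rows from the Q22 table: (user_id, activity_date, msg_id, device); None stands for NULL.
activity = [
    (110, "2020-01-20 13:12:20", None, "x1234"),
    (110, "2020-01-19 13:12:20", "fte345", "x9999"),
    (110, "2020-01-18 13:12:20", None, "x8888"),
    (110, "2020-01-17 13:12:20", "AFG777", "x9999"),
    (111, "2020-01-20 13:12:20", None, "x1234"),
    (111, "2020-01-19 13:12:20", None, "x9999"),
    (111, "2020-01-18 13:12:20", "i568zs", "x8888"),
    (112, "2020-01-20 13:12:20", "nq68zs", "x1234"),
    (112, "2020-01-19 13:12:20", None, "x9999"),
    (112, "2020-01-18 13:12:20", "985zse", "x8888"),
]


def latest_per_user(rows):
    """Emulate ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY activity_date DESC) = 1."""
    latest = {}
    for row in rows:
        user_id = row[0]
        ts = datetime.strptime(row[1], "%Y-%m-%d %H:%M:%S")
        # Keep the row with the greatest activity_date seen so far for this user.
        if user_id not in latest or ts > latest[user_id][0]:
            latest[user_id] = (ts, row)
    return [latest[user_id][1] for user_id in sorted(latest)]


for row in latest_per_user(activity):
    print(row)
```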
Q23. Given the table below, how would you select the latest row for each user_id that has a non-null msg_id?
user_id  activity_date        msg_id  device
110      2020-01-20 13:12:20  NULL    x1234
110      2020-01-19 13:12:20  fte345  x9999
110      2020-01-18 13:12:20  NULL    x8888
110      2020-01-17 13:12:20  AFG777  x9999
111      2020-01-20 13:12:20  NULL    x1234
111      2020-01-19 13:12:20  NULL    x9999
111      2020-01-18 13:12:20  i568zs  x8888
112      2020-01-20 13:12:20  nq68zs  x1234
112      2020-01-19 13:12:20  NULL    x9999
112      2020-01-18 13:12:20  985zse  x8888
Expected output:
user_id  activity_date        msg_id  device
110      2020-01-19 13:12:20  fte345  x9999
111      2020-01-18 13:12:20  i568zs  x8888
112      2020-01-20 13:12:20  nq68zs  x1234
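The key point is that `msg_id IS NOT NULL` must be applied in the WHERE clause of the subquery, before `ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY activity_date DESC)` is computed. Filtering after ranking would drop users 110 and 111 entirely, because their latest rows have a NULL msg_id. The Python sketch below emulates the correct order on the Q23 table and reproduces the expected output; `latest_non_null_msg` is a hypothetical helper, not Hive code.

```python
from datetime import datetime

# Rows from the Q23 table: (user_id, activity_date, msg_id, device); None stands for NULL.
activity = [
    (110, "2020-01-20 13:12:20", None, "x1234"),
    (110, "2020-01-19 13:12:20", "fte345", "x9999"),
    (110, "2020-01-18 13:12:20", None, "x8888"),
    (110, "2020-01-17 13:12:20", "AFG777", "x9999"),
    (111, "2020-01-20 13:12:20", None, "x1234"),
    (111, "2020-01-19 13:12:20", None, "x9999"),
    (111, "2020-01-18 13:12:20", "i568zs", "x8888"),
    (112, "2020-01-20 13:12:20", "nq68zs", "x1234"),
    (112, "2020-01-19 13:12:20", None, "x9999"),
    (112, "2020-01-18 13:12:20", "985zse", "x8888"),
]


def latest_non_null_msg(rows):
    """Emulate: WHERE msg_id IS NOT NULL, then ROW_NUMBER() ... = 1 per user_id."""
    latest = {}
    for user_id, activity_date, msg_id, device in rows:
        if msg_id is None:
            continue  # the filter runs before the window function
        ts = datetime.strptime(activity_date, "%Y-%m-%d %H:%M:%S")
        if user_id not in latest or ts > latest[user_id][0]:
            latest[user_id] = (ts, (user_id, activity_date, msg_id, device))
    return [latest[user_id][1] for user_id in sorted(latest)]


for row in latest_non_null_msg(activity):
    print(row)
```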