Gurvinder Singh
________________________________________________________________________________________
Student’s Name
5345308
________________________________________________________________________________________
Student’s ID Number
2022-01-17
_______________________________________________________________________________________
Date
This exam paper should be uploaded to Omnivox via Lea (not Mio).
Exercise 1 (30%):
a- What is Hive?
Ans- Hive is a data warehouse and ETL tool that stores its data on the Hadoop
Distributed File System (HDFS). Hive makes it easier to perform tasks such as:
Data encapsulation
Ad-hoc queries
Analysis of large datasets
c- What is SQOOP?
Ans- Sqoop is a command-line tool for transferring data between Hadoop and
relational databases.
It supports loading a single table or the results of a free-form SQL query, and
saved jobs that can be run repeatedly to import only the updates made to the
database since the last import. Using Sqoop, data can be transferred into
HDFS, Hive, or HBase from MySQL, PostgreSQL, Oracle, SQL Server, or DB2, and
vice versa.
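The "updates made since the last import" behaviour described above is Sqoop's incremental import mode. A minimal sketch, reusing the connection details given in Exercise 3; the check column and target directory are assumptions for illustration:

```shell
# Incremental import: only rows whose order_id exceeds --last-value are
# fetched; Sqoop prints the new last-value to reuse on the next run.
# Check column and target directory are illustrative assumptions.
sqoop import \
  --connect jdbc:mysql://localhost/retail_db \
  --username root \
  --password cloudera \
  --table orders \
  --target-dir /user/cloudera/orders_incr \
  --incremental append \
  --check-column order_id \
  --last-value 0
```

Wrapping this in a saved job (`sqoop job --create ...`) lets Sqoop remember the last imported value between runs automatically.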
Exercise 2 (35%):
Assume that a Cloudera VM is installed on your PC and that Apache Hive is
installed on this VM. We consider a dataset saved in a text file named
employee.txt, located at: /home/cloudera/hive_exam
The employee.txt file content is:
1- Using Apache Hive, give all the commands needed to create an external table
named employee from the employee.txt data.
Big Data
420-BZ2-GX
Ans-
Tip: the table columns are name, location, extension, and job.
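A possible answer, as a sketch. Since the employee.txt sample is not reproduced above, the '|' field delimiter and the STRING column types are assumptions; adjust them to match the actual file:

```shell
# 1) Copy the local file into an HDFS directory for the external table
hdfs dfs -mkdir -p /user/cloudera/hive_exam
hdfs dfs -put /home/cloudera/hive_exam/employee.txt /user/cloudera/hive_exam/

# 2) Create the external table over that directory
#    (the '|' delimiter and STRING types are assumptions, since the
#    employee.txt sample data is not shown)
hive -e "
CREATE EXTERNAL TABLE IF NOT EXISTS employee (
  name      STRING,
  location  STRING,
  extension STRING,
  job       STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
STORED AS TEXTFILE
LOCATION '/user/cloudera/hive_exam';"

# 3) Verify the table reads the file
hive -e "SELECT * FROM employee LIMIT 5;"
```

Because the table is EXTERNAL, dropping it later removes only the metadata; the employee.txt data stays in HDFS.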
Exercise 3 (35%):
Assume that a Cloudera VM is installed on your PC and that a MySQL server is
installed on this VM. What are the correct Linux commands to import data using
Apache Sqoop?
Instructions:
Connect to the MySQL database using Sqoop and import all orders that have an
order_status of COMPLETE.
Data Description:
A MySQL instance is running on localhost. In that instance, you will find an
orders table that contains order data.
> Installation: localhost
> Database name: retail_db
> Table name: Orders
> Username: root
> Password: cloudera
Output Requirement:
Place the imported order files in the HDFS directory
"/user/cloudera/problem1/orders/parquetdata".
Use Parquet format with a tab delimiter and Snappy compression.
Null values are represented as -1 for numbers and "NA" for strings.
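Ans- One possible command, as a sketch built from the connection details given above. Note that, strictly speaking, the delimiter and null-representation options apply to delimited text output, so with --as-parquetfile they have little effect; they are included here because the exercise asks for them:

```shell
# Import COMPLETE orders from MySQL into HDFS as Snappy-compressed Parquet.
# The exam lists the table as "Orders"; the retail_db table is usually
# lowercase "orders" on Linux, which is what is used here.
sqoop import \
  --connect jdbc:mysql://localhost/retail_db \
  --username root \
  --password cloudera \
  --table orders \
  --where "order_status = 'COMPLETE'" \
  --target-dir /user/cloudera/problem1/orders/parquetdata \
  --as-parquetfile \
  --compress \
  --compression-codec org.apache.hadoop.io.compress.SnappyCodec \
  --fields-terminated-by '\t' \
  --null-non-string "-1" \
  --null-string "NA"

# Check that the Parquet files landed in the target directory
hdfs dfs -ls /user/cloudera/problem1/orders/parquetdata
```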