
FILLED BY THE STUDENT:

Student's Name: Gurvinder Singh
Student's ID Number: 5345308
Date: 2022-01-17

Final Exam (40%)

PROFESSOR: Oussama Derbel
SECTION: 11112

EXAM RULES:

- All students must have an ID to confirm their identity.
- No student will be allowed to enter the evaluation room 20 minutes after the evaluation has started.
- Students may not leave the evaluation room during the exam period for any reason.
- Any student who arrives late will not be given any extra time to complete his or her evaluation.
- Students may be assigned a specific desk/location by the teacher.
- Students may not bring any food or drink other than water into the evaluation room.
- All communication devices, including but not limited to cell phones, smart phones, smart watches, iPods, pagers, and Web-accessible electronic devices, must be turned off and left at a place designated by the teacher. Failure to do so may lead to the removal of the evaluation.
- Cheating attempts or any assistance offered to others will merit a mark of zero on the evaluation. This includes, but is not limited to, speaking or looking around the evaluation room. In this case, the teacher will seize the evaluation documents and submit a written report to the Program Coordinator.

FILLED BY THE PROFESSOR:

Evaluated Competencies: Create and use Databases
Time Allowed: 2 hours
Materials Allowed: Yes
Total Mark: 100
Mark Obtained:

Big Data
420-BZ2-GX
STUDENT'S NAME: _____________

This exam paper should be uploaded to Omnivox via Léa (no MIO).

Exercise 1 (30%):

a- What is Hive?
Ans- Hive is an ETL and data warehouse tool built on top of the Hadoop Distributed File System (HDFS). Hive makes it easier to perform tasks such as:

- Data encapsulation
- Ad-hoc queries
- Analysis of large datasets
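As an illustration of the ad-hoc query use case, a short sketch using the employee table from Exercise 2 (the aggregation itself is only an example, not part of the exam data):

```shell
# Run an ad-hoc HiveQL aggregation from the Linux shell:
# count employees per job title
hive -e "SELECT job, COUNT(*) AS headcount FROM employee GROUP BY job;"
```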

b- List the different components of Hive architecture

Ans- The major components of Apache Hive are:

- Hive clients
- Hive services
- Processing framework
- Resource management
- Distributed storage

c- What is SQOOP?
Ans- Sqoop is a command-line tool for transferring data between Hadoop and relational databases.

It supports loading a single table or the result of a free-form SQL query, as well as saved jobs that can be run repeatedly to import only the rows added to a source table since the last import. Using Sqoop, data can be transferred to HDFS / Hive / HBase from MySQL / PostgreSQL / Oracle / SQL Server / DB2 and vice versa.
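As a sketch of what Sqoop's command-line interface looks like in practice (the connection details are the ones given in Exercise 3):

```shell
# Verify JDBC connectivity by listing the tables in the retail_db database
sqoop list-tables \
  --connect jdbc:mysql://localhost/retail_db \
  --username root \
  --password cloudera
```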

d- List the different components of SQOOP architecture

Ans- Sqoop's architecture is built around the Sqoop client, its database connectors, and the map tasks that perform the parallel transfer. Sqoop ships with connectors for databases such as:

- MySQL
- Netezza
- Oracle (via JDBC)
- PostgreSQL
- Teradata
- SQL Server R2

e- What are log files?

Ans- Log files are a main data source for network visibility. A log is a computer-generated data file that contains information about usage patterns, activities, and performance within an operating system, application, server, or other device. IT organizations can use security event management (SEM), security information management (SIM), security information and event management (SIEM), or other analytics tools to aggregate and analyze log files across cloud computing environments.

Exercise 2 (35%):
Assume that the Cloudera VM is installed on your PC and Apache Hive is installed on this VM. We consider a dataset saved in a text file named employee.txt and located at: /home/cloudera/hive_exam
The employee.txt file content is:

1- Using Apache Hive, give all the commands needed to create an external table named employee from the employee.txt data.


Ans-
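A minimal sketch of an answer, assuming employee.txt is comma-delimited with the columns name, location, extension, and job (the HDFS target directory is an assumption; adjust the delimiter to the actual file):

```shell
# Copy the local file into HDFS so an external table can point at it
hdfs dfs -mkdir -p /user/cloudera/hive_exam
hdfs dfs -put /home/cloudera/hive_exam/employee.txt /user/cloudera/hive_exam/

# Create the external table over that HDFS directory
hive -e "
CREATE EXTERNAL TABLE employee (
  name      STRING,
  location  STRING,
  extension STRING,
  job       STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/user/cloudera/hive_exam';"
```

Because the table is EXTERNAL, dropping it later removes only the metadata, not the file in HDFS.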

2- Display the row where the name is ‘Paul’.


Ans-

Tips: the table columns are name, location, extension, and job
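A sketch of one possible answer, assuming an employee Hive table whose columns include name:

```shell
# Return the row(s) whose name column equals 'Paul'
hive -e "SELECT * FROM employee WHERE name = 'Paul';"
```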

Exercise 3 (35%):
Assume that the Cloudera VM is installed on your PC and MySQL server is installed on this VM. What are the correct Linux commands to import data using Apache Sqoop?


Instructions:
Connect to the MySQL database using Sqoop and import all orders that have order_status COMPLETE.

Data Description:
A MySQL instance is running on the localhost. In that instance, you will find an orders table that contains order data.
> Installation: localhost
> Database name: retail_db
> Table name: Orders
> Username: root
> Password: cloudera

Output Requirement:
Place the imported files in the HDFS directory
"/user/cloudera/problem1/orders/parquetdata".
Use Parquet format with tab delimiter and Snappy compression.
Null values are represented as -1 for numbers and "NA" for strings.

Tips: “Sqoop Import”.
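A minimal sketch of an answer, built only from the connection details and output requirements stated above. Note that the delimiter and null-substitution options are defined for text imports; with --as-parquetfile, Parquet's own encoding applies and Sqoop may ignore them, so they appear here only because the statement asks for them:

```shell
sqoop import \
  --connect jdbc:mysql://localhost/retail_db \
  --username root \
  --password cloudera \
  --table orders \
  --where "order_status = 'COMPLETE'" \
  --target-dir /user/cloudera/problem1/orders/parquetdata \
  --as-parquetfile \
  --compress \
  --compression-codec org.apache.hadoop.io.compress.SnappyCodec \
  --fields-terminated-by '\t' \
  --null-non-string '-1' \
  --null-string 'NA'
```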


