Gurvinder Singh
________________________________________________________________________________________
Student’s Name
5345308
________________________________________________________________________________________
Student’s ID Number
2022-01-17
_______________________________________________________________________________________
Date
This exam paper should be uploaded to Omnivox via Lea (not Mio).
Exercise 1 (30%):
a- What is Hive?
Ans- Hive is a data warehouse and ETL tool that stores its data on the Hadoop
Distributed File System (HDFS). Hive makes it easier to perform tasks such as:
Data encapsulation
Ad-hoc queries
Analysis of large datasets
c- What is SQOOP?
Ans- Sqoop is a command-line tool for transferring data between Hadoop and
relational databases.
It supports loading a single table or the results of a free-form SQL query, and
saved jobs that can be run repeatedly to import only the updates made to the
database since the last import. Using Sqoop, data can be transferred into
HDFS, Hive, or HBase from MySQL, PostgreSQL, Oracle, SQL Server, or DB2, and
vice versa.
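The "updates made since the last import" behaviour described above is Sqoop's incremental import mode. A minimal sketch, reusing the connection details given in Exercise 3; the check column and target directory are assumptions for illustration:

```shell
# Incremental import: only rows whose order_id exceeds --last-value are
# fetched; Sqoop prints the new last-value to reuse on the next run.
# Check column and target directory are illustrative assumptions.
sqoop import \
  --connect jdbc:mysql://localhost/retail_db \
  --username root \
  --password cloudera \
  --table orders \
  --target-dir /user/cloudera/orders_incr \
  --incremental append \
  --check-column order_id \
  --last-value 0
```

Wrapping this in a saved job (`sqoop job --create ...`) lets Sqoop remember the last imported value between runs automatically.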
Exercise 2 (35%):
Assume that a Cloudera VM is installed on your PC and that Apache Hive is
installed on this VM. We consider a dataset saved in a text file named
employee.txt, located at: /home/cloudera/hive_exam
The employee.txt file content is:
1- Using Apache Hive, give all the commands needed to create an external table
named employee from the employee.txt data.
Big Data
420-BZ2-GX
Ans-
Tip: the table columns are name, location, extension, and job.
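A possible answer, as a sketch. Since the employee.txt sample is not reproduced above, the '|' field delimiter and the STRING column types are assumptions; adjust them to match the actual file:

```shell
# 1) Copy the local file into an HDFS directory for the external table
hdfs dfs -mkdir -p /user/cloudera/hive_exam
hdfs dfs -put /home/cloudera/hive_exam/employee.txt /user/cloudera/hive_exam/

# 2) Create the external table over that directory
#    (the '|' delimiter and STRING types are assumptions, since the
#    employee.txt sample data is not shown)
hive -e "
CREATE EXTERNAL TABLE IF NOT EXISTS employee (
  name      STRING,
  location  STRING,
  extension STRING,
  job       STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
STORED AS TEXTFILE
LOCATION '/user/cloudera/hive_exam';"

# 3) Verify the table reads the file
hive -e "SELECT * FROM employee LIMIT 5;"
```

Because the table is EXTERNAL, dropping it later removes only the metadata; the employee.txt data stays in HDFS.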
Exercise 3 (35%):
Assume that a Cloudera VM is installed on your PC and that a MySQL server is
installed on this VM. What are the correct Linux commands to import data using
Apache Sqoop?
Instructions:
Connect to the MySQL database using Sqoop and import all orders that have an
order_status of COMPLETE.
Data Description:
A MySQL instance is running on localhost. In that instance, you will find an
orders table that contains order data.
> Installation: localhost
> Database name: retail_db
> Table name: Orders
> Username: root
> Password: cloudera
Output Requirement:
Place the imported order files in the HDFS directory
"/user/cloudera/problem1/orders/parquetdata".
Use Parquet format with a tab delimiter and Snappy compression.
Null values are represented as -1 for numbers and "NA" for strings.
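Ans- One possible command, as a sketch built from the connection details given above. Note that, strictly speaking, the delimiter and null-representation options apply to delimited text output, so with --as-parquetfile they have little effect; they are included here because the exercise asks for them:

```shell
# Import COMPLETE orders from MySQL into HDFS as Snappy-compressed Parquet.
# The exam lists the table as "Orders"; the retail_db table is usually
# lowercase "orders" on Linux, which is what is used here.
sqoop import \
  --connect jdbc:mysql://localhost/retail_db \
  --username root \
  --password cloudera \
  --table orders \
  --where "order_status = 'COMPLETE'" \
  --target-dir /user/cloudera/problem1/orders/parquetdata \
  --as-parquetfile \
  --compress \
  --compression-codec org.apache.hadoop.io.compress.SnappyCodec \
  --fields-terminated-by '\t' \
  --null-non-string "-1" \
  --null-string "NA"

# Check that the Parquet files landed in the target directory
hdfs dfs -ls /user/cloudera/problem1/orders/parquetdata
```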