LOVELY PROFESSIONAL UNIVERSITY

Assignment-1 SET-A

Deadline of Submission: 07-Sep-2020        Course Code: INT576

Q1: An organization X has 100 TB of data, and the available HDD has an access rate of 100 MB/s.
The HDD has two channels for data storage and retrieval. As a Big Data Engineer, calculate the
time required to retrieve this data with the given features. Also, propose how organization X
can retrieve this data in minimum time, and state the requirements for implementing your
idea. (10)

(Hint: Data is read from the two channels in parallel, so you only need to calculate the time to read
half of the data; the other half is read at the same time via the second channel.)
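The arithmetic behind the hint can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not a model answer: it assumes decimal units (1 TB = 10^6 MB), assumes each channel sustains the full 100 MB/s, and assumes the data splits evenly across channels. The function name read_time_seconds and the constant MB_PER_TB are hypothetical names chosen for this example.

```python
def read_time_seconds(data_mb: float, rate_mb_per_s: float, channels: int = 1) -> float:
    """Time to read data_mb when it is split evenly across parallel channels.

    Assumes every channel sustains rate_mb_per_s on its own share of the data.
    """
    return (data_mb / channels) / rate_mb_per_s


MB_PER_TB = 10**6            # assumption: decimal units (1 TB = 10^6 MB)
data_mb = 100 * MB_PER_TB    # 100 TB from the question
rate = 100                   # 100 MB/s from the question

# Two channels in parallel: each one reads half of the data.
two_channel_s = read_time_seconds(data_mb, rate, channels=2)

print(f"{two_channel_s:.0f} s")              # 500000 s
print(f"{two_channel_s / 3600:.2f} hours")   # 138.89 hours
print(f"{two_channel_s / 86400:.2f} days")   # 5.79 days
```

Under the same assumptions, calling read_time_seconds(data_mb, rate, channels=100) gives 10000 s, which suggests why spreading the data over many drives that are read in parallel is the usual direction for the "minimum time" part of the question.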

Q2: Show the steps and commands used to install Apache Hadoop as a single-node cluster on
your machine. Each step must be supported with screenshots from your machine showing your name in
the terminal. Explain the function of each file used in configuring Apache Hadoop. (10)

Q3: i) Create a text file named temp.txt and save it in the local file system. Write a Hadoop command
to copy this file into HDFS, and then display the file from HDFS only. Support your answer with a
screenshot of the CLI fetching the text file from HDFS.

Q3: ii) Create a text file named test.txt and save it in the local file system. Write a Hadoop command
to display the contents of this file. Support your answer with a screenshot of the CLI fetching the text file.
