Mid-Semester Test
(EC-2 Regular)
Set-(A)
1. a) Label each of the following as structured or unstructured data. [1]
i) Images uploaded by users on a social media platform.
ii) Passenger details recorded by a railway reservation website.
b) Consider a 2-level memory hierarchy built from a cache and main memory. It is observed
that 15 out of 50 times the data/instructions required by the program are not found in the cache.
i) Find the cache hit ratio. [1]
ii) If the cache access time is 2 ms and the main memory access time is 10 ms, find the average
memory access time. [2]
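The arithmetic behind part b) can be sketched as follows. This is only an illustration, not a marking scheme: the function names are hypothetical, and because the question does not say which access model is intended, both common conventions are shown. The hierarchical model assumes every access pays the cache time and a miss additionally pays the main memory time; the weighted model assumes a miss pays only the main memory time.

```python
def hit_ratio(misses, accesses):
    """Fraction of accesses that are served by the cache."""
    return (accesses - misses) / accesses


def amat_hierarchical(h, t_cache, t_main):
    """Assumption: the cache is always probed first, so a miss costs t_cache + t_main."""
    return t_cache + (1 - h) * t_main


def amat_weighted(h, t_cache, t_main):
    """Assumption: a miss costs only t_main (cache and memory accessed simultaneously)."""
    return h * t_cache + (1 - h) * t_main


# Figures from the question: 15 misses in 50 accesses, 2 ms cache, 10 ms main memory.
h = hit_ratio(15, 50)
print(f"hit ratio          = {h:.2f}")                                # 0.70
print(f"AMAT, hierarchical = {amat_hierarchical(h, 2, 10):.2f} ms")   # 5.00 ms
print(f"AMAT, weighted     = {amat_weighted(h, 2, 10):.2f} ms")       # 4.40 ms
```

The same helpers accept the figures from the other variants of this question.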
c) You have a 700 MB file stored on HDFS as part of a Hadoop 1.x distribution. A data analytics
program uses this file and runs in parallel across the cluster nodes. The default block size and
replication factor are used in the configuration. How many total blocks, including replicas, will be
stored in the cluster? What are the unique HDFS block sizes you will find for this specific file? [2]
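A minimal sketch of the block arithmetic for part c) is given below. It assumes the commonly cited defaults of a 64 MB block size and a replication factor of 3 for Hadoop 1.x (these values are not stated in the question), and the function name hdfs_blocks is hypothetical. HDFS does not pad the final block, so the last block holds only the remainder of the file.

```python
def hdfs_blocks(file_mb, block_mb, replication):
    """Return (total stored blocks including replicas, unique block sizes in MB)."""
    full_blocks, remainder = divmod(file_mb, block_mb)
    sizes = [block_mb] * full_blocks
    if remainder:
        sizes.append(remainder)  # the last block is only as large as the leftover data
    return len(sizes) * replication, sorted(set(sizes), reverse=True)


# Assumed Hadoop 1.x defaults: 64 MB blocks, replication factor 3.
total, unique_sizes = hdfs_blocks(700, 64, 3)
print(total, unique_sizes)  # 33 [64, 60]
```

For a Hadoop 2.x variant, the usual assumption would be a 128 MB default block size, passed as block_mb.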
b) Consider a 2-level memory hierarchy built from a cache and main memory. It is observed
that 20 out of 80 times the data/instructions required by the program are not found in the cache.
i) Find the cache hit ratio. [1]
ii) If the cache access time is 3 ms and the main memory access time is 12 ms, find the average
memory access time. [2]
c) You have a 942 MB file stored on HDFS as part of a Hadoop 2.x distribution. A data analytics
program uses this file and runs in parallel across the cluster nodes. The default block size and
replication factor are used in the configuration. How many total blocks, including replicas, will be
stored in the cluster? What are the unique HDFS block sizes you will find for this specific file? [2]
b) Consider a 2-level memory hierarchy built from a cache and main memory. It is observed
that 10 out of 50 times the data/instructions required by the program are not found in the cache.
i) Find the cache hit ratio. [1]
ii) If the cache access time is 3 ms and the main memory access time is 15 ms, find the average
memory access time. [2]
c) You have an 812 MB file stored on HDFS as part of a Hadoop 1.x distribution. A data analytics
program uses this file and runs in parallel across the cluster nodes. The default block size and
a replication factor of 4 are used in the configuration. How many total blocks, including replicas, will
be stored in the cluster? What are the unique HDFS block sizes you will find for this specific file? [2]
b) Ravi wants to run a simulation for his research and his supervisor advised him to run it for a fixed
problem size. Ravi is successful in achieving 88% parallelism of the code, with 12% of it being
sequential.
i) If the time taken to run the problem on a single processor is 10 seconds, what will be the time
taken to execute the same problem on 11 processors? [3]
ii) What is the maximum speedup Ravi can achieve when executing the same problem if there
is no constraint on the number of processors available? [2]
Note: Show detailed calculations. Ignore other overheads such as communication.
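Since the problem size is fixed, this question fits Amdahl's law: only the parallel fraction of the run time shrinks as processors are added. The sketch below works through the 88% / 12% figures; the function names are hypothetical and, as the note says, communication overhead is ignored.

```python
def amdahl_time(t_single, parallel_fraction, processors):
    """Run time on `processors` CPUs when only the parallel fraction scales."""
    serial_fraction = 1 - parallel_fraction
    return t_single * (serial_fraction + parallel_fraction / processors)


def amdahl_max_speedup(parallel_fraction):
    """Limit of the speedup as the processor count grows without bound."""
    return 1 / (1 - parallel_fraction)


t1 = 10.0                          # seconds on a single processor
t11 = amdahl_time(t1, 0.88, 11)    # 10 * (0.12 + 0.88 / 11) = 10 * 0.20
print(f"time on 11 processors = {t11:.2f} s")                      # 2.00 s
print(f"speedup on 11         = {t1 / t11:.2f}")                   # 5.00
print(f"maximum speedup       = {amdahl_max_speedup(0.88):.2f}")   # 8.33
```

The other variants of this question differ only in the parallel fraction passed in.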
Set-(B)
2. a) Consider the following use cases and suggest a design choice based on the CAP theorem,
i.e., whether the system should be of type CA, CP, or AP. Justify your design choice in each case.
i) A large-scale event reservation system that has fewer than 40% of its seats booked. The system should
continue to accept bookings during network disruptions. [2]
ii) ABC.com is an online retailer. The organization gathers product reviews from its customers
and displays them on its website. Customers can provide ratings with feedback comments. The
organization wants to ensure that a review written by a customer is never lost, although it may not
be immediately visible to others. [2]
b) Ravi wants to run a simulation for his research and his supervisor advised him to run it for a fixed
problem size. Ravi is successful in achieving 66% parallelism of the code, with 34% of it being
sequential.
i) If the time taken to run the problem on a single processor is 10 seconds, what will be the time
taken to execute the same problem on 11 processors? [3]
ii) What is the maximum speedup Ravi can achieve when executing the same problem if there
is no constraint on the number of processors available? [2]
Note: Show detailed calculations. Ignore other overheads such as communication.
Set-(C)
2. a) Consider the following use cases and suggest a design choice based on the CAP theorem,
i.e., whether the system should be of type CA, CP, or AP. Justify your design choice in each case.
i) A large-scale event reservation system with more than 95% of its seats booked. The system
should continue to work during network disruptions. [2]
ii) A large-scale banking application facilitating credit and debit in customer accounts. This
application is expected to handle millions of transactions daily and maintain the financial
integrity of customer accounts. The application must ensure the security and accuracy of
financial data while providing a responsive and reliable user experience. [2]
b) Ravi wants to run a simulation for his research and his supervisor advised him to run it for a fixed
problem size. Ravi is successful in achieving 99% parallelism of the code, with 1% of it being
sequential.
i) If the time taken to run the problem on a single processor is 10 seconds, what will be the time
taken to execute the same problem on 11 processors? [3]
ii) What is the maximum speedup Ravi can achieve when executing the same problem if there
is no constraint on the number of processors available? [2]
Note: Show detailed calculations. Ignore other overheads such as communication.
Set-(A)
3. Consider the Student file below.
a) How can you use the MapReduce programming model to find the average marks of all the students in each
course? Explain. [1]
The reference output is given below.
b) Write pseudocode for the map and reduce functions to find the average marks for each student. [4]
c) Write the output after the map phase and the input data that will be given to the reducer. [2]
Set-(B)
3. Consider the database of patient visits below.
a) How can you use the MapReduce programming model to find the average fee paid by a patient across all
his/her visits? Explain. [1]
The reference output is given below.
Patient_id Avg_Fee_paid
101 950
102 1300
103 950
104 1200
105 1500
b) Write pseudocode for the map and reduce functions to find the average fee paid by each patient. [4]
c) Write the output after the map phase and the input data that will be given to the reducer. [2]
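The map, shuffle, and reduce steps for a per-key average can be sketched in plain Python as below. This is an illustration only: the input table is not reproduced in this paper, so the rows in visits are hypothetical, as is the assumed record layout (patient_id, visit label, fee). The function names are also hypothetical, and the shuffle step is simulated locally rather than run on a Hadoop cluster.

```python
from collections import defaultdict

# Hypothetical input rows; the original table is not available here.
visits = [
    (101, "visit-1", 900),
    (101, "visit-2", 1000),
    (102, "visit-1", 1300),
]


def map_visit(record):
    """Map: emit one (patient_id, fee) pair per visit record."""
    patient_id, _visit, fee = record
    yield patient_id, fee


def shuffle(pairs):
    """Shuffle/sort: group all emitted values by key, as the framework would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_fees(patient_id, fees):
    """Reduce: average all fees received for one patient."""
    return patient_id, sum(fees) / len(fees)


mapped = [pair for record in visits for pair in map_visit(record)]
reducer_input = shuffle(mapped)
averages = dict(reduce_fees(k, v) for k, v in reducer_input.items())

print(mapped)         # output of the map phase
print(reducer_input)  # what each reducer receives: key -> list of values
print(averages)
```

The mapper emits raw fees rather than partial averages because an average of averages is not correct when groups differ in size; a combiner would need to carry (sum, count) pairs instead.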
Set-(C)
3. Consider the database of transactions below.
a) How can you use the MapReduce programming model to find the average sales per transaction for each
category? Explain. [1]
b) Write pseudocode for the map and reduce functions to find the average sales per transaction for each
category. [4]
c) Write the output after the map phase and the input data that will be given to the reducer. [2]
Set-(A)
4. Consider the following system, assembled from multiple components.
a) Find the following:
i) Mean time to failure of component A [1]
ii) Mean time to failure of component C [1]
iii) Mean time to failure of the system [2]
b) Assume the mean time to repair each component is 23 hours and the mean time to diagnose any
failed component is 1 hour. Find the following:
i) The availability of component A [1]
ii) The availability of component C [1]
iii) The availability of the system [2]
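The availability calculation can be sketched as below, under several stated assumptions. The system diagram is not reproduced in this paper, so the MTTF value and the series/parallel arrangements shown are hypothetical and serve only to illustrate the formulas. Downtime per failure is assumed to be diagnosis time plus repair time, and availability is taken as MTTF / (MTTF + downtime). All function names are hypothetical.

```python
def availability(mttf_hours, repair_hours, diagnose_hours):
    """Steady-state availability, assuming downtime = diagnose + repair."""
    downtime = repair_hours + diagnose_hours
    return mttf_hours / (mttf_hours + downtime)


def series(*availabilities):
    """Components in series: the system is up only if every component is up."""
    result = 1.0
    for a in availabilities:
        result *= a
    return result


def parallel(*availabilities):
    """Redundant components in parallel: the system is down only if all are down."""
    all_down = 1.0
    for a in availabilities:
        all_down *= 1 - a
    return 1 - all_down


# Hypothetical MTTF of 976 hours; repair 23 h and diagnose 1 h come from the question.
a = availability(976, 23, 1)
print(f"single component  = {a:.4f}")               # 0.9760
print(f"two in series     = {series(a, a):.4f}")
print(f"two in parallel   = {parallel(a, a):.4f}")
```

With the actual diagram, the system availability would be obtained by nesting series and parallel calls to mirror the layout.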
Set-(B)
4. Consider the following system, assembled from multiple components.
a) Find the following:
i) Mean time to failure of component A [1]
ii) Mean time to failure of component C [1]
iii) Mean time to failure of the system [2]
b) Assume the mean time to repair each component is 22 hours and the mean time to diagnose any
failed component is 2 hours. Find the following:
i) The availability of component A [1]
ii) The availability of component C [1]
iii) The availability of the system [2]
Set-(C)
4. Consider the following system, assembled from multiple components.
a) Find the following:
i) Mean time to failure of component A [1]
ii) Mean time to failure of component C [1]
iii) Mean time to failure of the system [2]
b) Assume the mean time to repair each component is 20 hours and the mean time to diagnose any
failed component is 4 hours. Find the following:
i) The availability of component A [1]
ii) The availability of component C [1]
iii) The availability of the system [2]