
MIT ACADEMY OF ENGINEERING

COURSE CODE: CS311T 16 DECEMBER 2019


TY BTECH SEMESTER - V 2019 - 2020 EXAMINATION
DEPARTMENT OF COMPUTER ENGINEERING
END SEMESTER EXAMINATION
DESCRIPTIVE ANALYTICS
TIME : 3 HOURS MAX MARKS : 100 MARKS
TOTAL NO OF QUESTIONS: 5 TOTAL NO OF PRINTED PAGES: 4
INSTRUCTIONS TO CANDIDATES:
1. Assume suitable data wherever necessary
2. Non programmable scientific calculators are allowed
3. Black figures to the right indicate full marks

1 a) Explain one example each for the Inmon and Kimball [8] CO1 L3
approaches.
[Each Suitable Example - 2 Marks and Principle - 2 Marks]
b) In real-world data, tuples with missing values for some [7] CO2 L2
attributes are a common occurrence. Describe various
methods for handling this problem.
[For each step - 1 Mark]

c) Use the three methods to normalize the following [5] CO2 L3
group of data:
200; 300; 400; 600; 1000
(i) min-max normalization by setting
min = 0 and max = 1
[Formula - 1 Mark & Calculation - 1 Mark]
(ii) z-score normalization
[Formula - 1 Mark & Calculation - 1 Mark]
(iii) Normalization by decimal scaling
[Formula - 0.5 Mark & Calculation - 0.5 Mark]
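The three normalizations in 1 c) can be checked with a minimal sketch such as the one below. It assumes the population standard deviation (division by N) for the z-score, which the question does not specify; the variable names are illustrative only.

```python
from statistics import mean, pstdev

data = [200, 300, 400, 600, 1000]

# (i) Min-max normalization onto the target range [0, 1]:
#     v' = (v - min) / (max - min)
lo, hi = min(data), max(data)
min_max = [(v - lo) / (hi - lo) for v in data]

# (ii) z-score normalization: v' = (v - mean) / std.
# Assumption: population standard deviation (pstdev). Use
# statistics.stdev instead if the sample version is intended.
mu, sigma = mean(data), pstdev(data)
z_scores = [(v - mu) / sigma for v in data]

# (iii) Decimal scaling: v' = v / 10**j, where j is the smallest
# integer such that every scaled absolute value is below 1.
j = 0
while max(abs(v) for v in data) / 10 ** j >= 1:
    j += 1
decimal_scaled = [v / 10 ** j for v in data]

print(min_max)
print([round(z, 4) for z in z_scores])
print(j, decimal_scaled)
```

Because the largest value is exactly 1000, dividing by 1000 would give 1.0, which is not below 1, so the loop moves on to the next power of ten.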

2 a) Design a data warehouse for a regional weather [10] CO3 L3
bureau. The weather bureau has about 1,000 probes,
which are scattered throughout various land and
ocean locations in the region to collect basic weather
data, including air pressure,
temperature, and precipitation at each hour. All data
are sent to the central station, which has collected
such data for over 10 years. Your design should
facilitate efficient querying and on-line analytical
processing, and derive general weather patterns in
multidimensional space.
a) Identify Grains [At least 4 Grains - 1 Mark]
b) Identify Dimensions [At least 4 Dimensions -
1 Mark]
c) Identify Measures [At least Measures - 1 Mark]
d) Draw Fact Table. [Correct Fact Table - 1 Mark]
e) Identify Correct data Modelling [Star/Snowflake/
Fact Constellation]. Justify your answer. [Identification -
1 Mark and Justification - 1 Mark]
f) Draw identified schema modelling. [Schema
Modelling - 4 Marks]
b) Enlist three types of Data Warehouse. [10] CO3 L3
(a) Briefly describe each type with Pros and Cons.
(b) Which implementation type do you prefer, and
why?
[Each type - 1 Mark and Pros and Cons 1 Mark,
Best Type of DW - 1 Mark and Proper Explanation- 3
Marks]
3 a) A researcher reports that the average salary of [10] CO4 L3
assistant professors is more than Rs. 42,000/-. A
sample of 30 assistant professors has a mean salary of
Rs. 43,260/-. At the 5% level of significance, test the
claim that assistant professors earn more than Rs.
42,000/- a year. The standard deviation of the
population is Rs. 5,230/-. (Z value at 5% = 1.65)
i) Find out H0 and H1. [1 Mark]
ii) Explain whether problem is one tailed or two tailed.
Draw the curve. [1 Mark]
iii) State which hypothesis test is needed. [1 Mark]

iv) Write correct formula. [1 Mark]
v) Find level of significance. [1 Mark]
vi) Calculate test value by formula. [1 Mark]
viii) Write the rules of rejection / acceptance for
hypothesis. [1 Mark]
ix) Write the result and compare the test statistics with
given value.[1 Mark]
x) Write the inference. [1 Mark]
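The calculation steps in 3 a) can be sketched as below. This is a minimal check rather than a model answer: it assumes a right-tailed one-sample z-test (the population standard deviation is given) and uses the critical value 1.65 supplied in the question. Variable names are illustrative.

```python
from math import sqrt

mu0 = 42000        # hypothesised mean: H0: mu <= 42000, H1: mu > 42000
x_bar = 43260      # sample mean
sigma = 5230       # population standard deviation (given)
n = 30             # sample size
z_critical = 1.65  # right-tail critical value at 5%, as given in the question

# Test statistic for a one-sample z-test: z = (x_bar - mu0) / (sigma / sqrt(n))
z = (x_bar - mu0) / (sigma / sqrt(n))

# Right-tailed rule: reject H0 only if z exceeds the critical value.
reject_h0 = z > z_critical

print(round(z, 2), reject_h0)
```

Under these assumptions the statistic is about 1.32, which is below 1.65, so the sample does not give enough evidence at the 5% level to support the claim.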
b) A super department store, A, has four competitors: [10] CO4 L3
B, C, D, and E. Store A hires a consultant to determine
if the percentage of shoppers who prefer each of the
five stores is the same. A survey of 1100 randomly
selected shoppers is conducted, and the results about
which one of the stores shoppers prefer are below. Is
there enough evidence using a
significance level of 0.05 to conclude that the
proportions are really the same?
Store               A    B    C    D    E
Number of Shoppers  262  234  204  190  210
i) Create a contingency table to organize the
information. [2 Marks]
ii) What is the value of chi-square? [4 Marks]
iii) How many degrees of freedom are there?
[ 2 Marks]
iv) If we plan to test the claim then what are H0 and
H1? [ 2 Marks]
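The chi-square value and degrees of freedom in 3 b) can be verified with a short sketch like the following. It assumes a goodness-of-fit test against equal proportions (H0: all five stores are equally preferred) and hard-codes the critical value 9.488 from a standard chi-square table for 4 degrees of freedom at the 0.05 level.

```python
observed = {"A": 262, "B": 234, "C": 204, "D": 190, "E": 210}

total = sum(observed.values())      # 1100 shoppers surveyed
expected = total / len(observed)    # equal preference under H0

# Chi-square statistic: sum over categories of (O - E)^2 / E
chi_square = sum((o - expected) ** 2 / expected for o in observed.values())

# Degrees of freedom for goodness of fit: number of categories - 1
dof = len(observed) - 1

# Assumption: critical value taken from a chi-square table (alpha = 0.05, df = 4).
CRITICAL_VALUE = 9.488
reject_h0 = chi_square > CRITICAL_VALUE

print(round(chi_square, 3), dof, reject_h0)
```

With these figures the statistic is roughly 14.62 on 4 degrees of freedom, which exceeds the tabulated value, so the equal-proportions hypothesis would be rejected.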
4 a) Differentiate between F-Test and T-Test w.r.t. [5] CO4 L2
principle, formula, application, rejection rule,
condition. [For each- 0.5 Mark]
b) The time in years that an employee spent at a [10] CO4 L3
company and the employee's hourly pay, for 5
employees are listed in the table below. Calculate and
interpret the correlation coefficient r. Conclude from
the calculation.

[For formula - 2 Marks and calculation - 4 Marks,
Conclusion - 2 Marks]

c) Five randomly selected students took a math aptitude [5] CO5 L3
test before they began their statistics course. The
Statistics Department has three questions.
xi yi
95 85
85 95
80 70
70 65
60 70
i) What linear regression equation best predicts
statistics performance, based on math aptitude
scores?
[Linear Regression Formula and Equation - 3 Marks]
ii) If a student made an 80 on the aptitude test, what
grade would we expect her to make in statistics?
[Correct Answer - 2 Marks]
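For 4 c), a minimal least-squares sketch is given below. It assumes xi is the math aptitude score (predictor) and yi is the statistics grade (response), as the wording of part i) suggests; the variable names are illustrative.

```python
x = [95, 85, 80, 70, 60]   # xi: math aptitude scores (assumed predictor)
y = [85, 95, 70, 65, 70]   # yi: statistics grades (assumed response)

n = len(x)
x_mean = sum(x) / n
y_mean = sum(y) / n

# Slope b1 = sum((xi - x_mean)(yi - y_mean)) / sum((xi - x_mean)^2)
s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
s_xx = sum((xi - x_mean) ** 2 for xi in x)
b1 = s_xy / s_xx

# Intercept b0 = y_mean - b1 * x_mean
b0 = y_mean - b1 * x_mean

# Part ii): expected statistics grade for an aptitude score of 80
predicted = b0 + b1 * 80

print(round(b0, 3), round(b1, 3), round(predicted, 2))
```

Under these assumptions the fitted line is approximately y = 26.78 + 0.644x, giving an expected grade of about 78.3 for an aptitude score of 80.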

5 a) What is the condition for multiple regression? Give [5] CO5 L2
one real example where multiple regression is
applicable.
[Condition with equation - 2 Marks and Application - 3
Marks]
b) What are the benefits of Market Basket Analysis? [5] CO6 L2
[For each with explanation - 1 Mark]
c) How do you find the association rules by applying [10] CO6 L3
the Apriori Algorithm to the given example?
Transaction ID Item Purchased
101 {1,2,3,4,5,6}
102 {7,2,3,4,5,6}
103 {1,8,4,5}
104 {1,9,0,4,6}
105 {0,2,4,5}
Minimum Support = 60% and Minimum Confidence =
60%
i) Find Support of Item Set. [2 Marks]
ii) Write principle of Apriori Algorithm. [1 Mark]
iii) Find Frequent Item Set by applying Apriori
Algorithm. [4 Marks]
iv) Find Strong Association Rules with support and
confidence. [3 Marks]
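The Apriori steps asked for in 5 c) can be sketched as follows. This is a small illustrative implementation, not a prescribed solution: it treats the thresholds as inclusive (support and confidence of exactly 60% qualify), and the helper names `count`, `apriori`, and `strong_rules` are hypothetical.

```python
from itertools import combinations

transactions = [
    {1, 2, 3, 4, 5, 6},   # 101
    {7, 2, 3, 4, 5, 6},   # 102
    {1, 8, 4, 5},         # 103
    {1, 9, 0, 4, 6},      # 104
    {0, 2, 4, 5},         # 105
]
MIN_SUPPORT = 0.6      # assumed inclusive threshold
MIN_CONFIDENCE = 0.6   # assumed inclusive threshold


def count(itemset):
    """Number of transactions that contain every item of itemset."""
    return sum(1 for t in transactions if itemset <= t)


def apriori():
    """Return {frequent itemset: support count}, built level by level."""
    n = len(transactions)
    current = [frozenset([i]) for i in sorted(set().union(*transactions))]
    frequent = {}
    k = 1
    while current:
        # Keep only candidates that meet the minimum support.
        level = {c: count(c) for c in current}
        level = {c: cnt for c, cnt in level.items() if cnt / n >= MIN_SUPPORT}
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step (Apriori principle): every (k-1)-subset must be frequent.
        current = [c for c in candidates
                   if all(frozenset(s) in level for s in combinations(c, k - 1))]
    return frequent


def strong_rules(frequent):
    """Return (antecedent, consequent, support, confidence) tuples."""
    n = len(transactions)
    rules = []
    for itemset, cnt in frequent.items():
        for r in range(1, len(itemset)):
            for ante in map(frozenset, combinations(sorted(itemset), r)):
                confidence = cnt / frequent[ante]
                if confidence >= MIN_CONFIDENCE:
                    rules.append((set(ante), set(itemset - ante), cnt / n, confidence))
    return rules


frequent = apriori()
rules = strong_rules(frequent)
for ante, cons, support, confidence in rules:
    print(ante, "->", cons, f"support={support:.2f}", f"confidence={confidence:.2f}")
```

With a 60% threshold on five transactions, an itemset needs at least three occurrences; in this sketch the only frequent 3-itemset is {2, 4, 5}.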
