
5*1

1. With a suitable example, discuss the need for pre-processing data.
2. Give a precise definition of the term “concept hierarchy”.
3. Write the formula for finding outliers using the inter-quartile range (IQR) of a given data set.
4. Differentiate between ordinal and nominal attributes.
5. Compute the Manhattan distance between the two objects represented by the tuples (22, 1, 42,
10) and (20, 0, 36, 8).
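
As a minimal sketch of the arithmetic in question 5: the Manhattan distance is the sum of the absolute differences of corresponding components. The function name `manhattan` is only an illustrative choice, not something given in the paper.

```python
def manhattan(p, q):
    """Sum of absolute coordinate differences between two equal-length tuples."""
    if len(p) != len(q):
        raise ValueError("tuples must have the same number of components")
    return sum(abs(a - b) for a, b in zip(p, q))


x = (22, 1, 42, 10)
y = (20, 0, 36, 8)

# |22-20| + |1-0| + |42-36| + |10-8| = 2 + 1 + 6 + 2
distance = manhattan(x, y)
print(distance)  # 11
```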

3*3
6. Briefly discuss why online transaction processing (OLTP) is not suitable for a data warehouse.
7. Explain the 3-tier architecture with the help of a suitable diagram.
8.
Name Grades
A
B
A
C
B
D
Find the distance between each pair of objects (identified by name) and represent the
distances in a dissimilarity matrix.
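
One possible way to approach question 8 is sketched here, under two assumptions that the paper does not state: that the grades form an ordinal attribute with the ordered states A, B, C, D, and that the usual rank normalisation z = (r - 1) / (M - 1) is intended. The object names are missing from the table above, so the sketch identifies objects only by their position in the list.

```python
# Grades in the order they appear in the table (object names are not available).
grades = ["A", "B", "A", "C", "B", "D"]

# Assumption: the ordinal states are A, B, C, D, ranked 1..M.
order = ["A", "B", "C", "D"]
rank = {g: i + 1 for i, g in enumerate(order)}
M = len(order)

# Map each rank onto [0, 1] so the attribute is comparable with numeric ones.
z = [(rank[g] - 1) / (M - 1) for g in grades]

# Dissimilarity between objects i and j is the absolute difference of z-values.
matrix = [[abs(zi - zj) for zj in z] for zi in z]

# Print the lower triangle, as dissimilarity matrices are usually written.
for i, row in enumerate(matrix):
    print(" ".join(f"{d:.2f}" for d in row[: i + 1]))
```

If the grades were instead meant to be treated as nominal, the dissimilarity would simply be 0 for equal grades and 1 otherwise.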

9. The data (in increasing order) for the attribute age:


13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30, 33, 33, 35, 35, 35, 35, 36, 40, 45, 46,
52, 70.
Use smoothing by bin means and smoothing by bin boundaries to smooth the given data, using a bin
depth of 3. Illustrate your steps.
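
The steps for question 9 can be sketched as follows, assuming equal-depth (equal-frequency) partitioning of the sorted values. The tie-breaking rule in `nearest_boundary` (a value equidistant from both boundaries goes to the lower one) is an assumption of this sketch, since the paper does not specify one.

```python
ages = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30,
        33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70]
depth = 3

# Step 1: partition the sorted data into equal-depth bins of 3 values each.
bins = [ages[i:i + depth] for i in range(0, len(ages), depth)]

# Step 2: smoothing by bin means replaces every value with its bin's mean.
by_means = [[round(sum(b) / len(b), 2)] * len(b) for b in bins]


def nearest_boundary(value, low, high):
    """Return the closer bin boundary; ties go to the lower boundary (assumption)."""
    return low if value - low <= high - value else high


# Step 3: smoothing by bin boundaries replaces every value with the closer
# of its bin's minimum and maximum.
by_boundaries = [[nearest_boundary(v, b[0], b[-1]) for v in b] for b in bins]

for original, means, bounds in zip(bins, by_means, by_boundaries):
    print(original, "->", means, "|", bounds)
```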

10. Suppose that the data for analysis includes the attribute age. The age values for the data
tuples are (in increasing order) 13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30, 33, 33,
35, 35, 35, 35, 36, 40, 45, 46, 52, 70.
(a) What is the mean of the data? What is the median?
(b) What is the mode of the data? Comment on the data’s modality (i.e., bimodal, trimodal, etc.).
(c) What is the midrange of the data?
(d) Can you find (roughly) the first quartile (Q1) and the third quartile (Q3) of the data?
(e) Give the five-number summary of the data.
(f) Show a boxplot of the data.
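
The summary statistics asked for in question 10 can be checked with a short sketch using Python's standard `statistics` module. Quartile values depend on the convention used; this sketch relies on the module's default "exclusive" method, which may differ slightly from a rough hand estimate. The 1.5 × IQR outlier fences are the common convention and are included here as an assumption, since the paper does not state which rule it expects.

```python
import statistics

ages = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30,
        33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70]

mean = statistics.mean(ages)
median = statistics.median(ages)
modes = statistics.multimode(ages)          # every value with the highest count
midrange = (min(ages) + max(ages)) / 2

# Quartiles with the default "exclusive" method of statistics.quantiles.
q1, q2, q3 = statistics.quantiles(ages, n=4)

# Five-number summary: minimum, Q1, median, Q3, maximum.
five_number = (min(ages), q1, median, q3, max(ages))

# Assumed convention: values beyond 1.5 * IQR from the quartiles are outliers.
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [a for a in ages if a < lower_fence or a > upper_fence]

print(f"mean={mean:.2f} median={median} modes={modes} midrange={midrange}")
print("five-number summary:", five_number)
print("outliers:", outliers)
```

The five-number summary supplies the values a boxplot is drawn from: the box spans Q1 to Q3 with a line at the median, and under the assumed 1.5 × IQR rule any value beyond the fences is plotted individually.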

11. Suppose that a data warehouse for Big University consists of the four dimensions student,
course, semester, and instructor, and two measures count and avg_grade. At the lowest
conceptual level (e.g., for a given student, course, semester, and instructor combination),
the avg_grade measure stores the actual course grade of the student. At higher conceptual
levels, avg_grade stores the average grade for the given combination.
a) Draw star and snowflake schema diagrams for the data warehouse.
b) Starting with the base cuboid [student, course, semester, instructor], what specific
OLAP operations (e.g., roll-up from semester to year) should you perform in
order to list the average grade of CS courses for each Big University student?
c) If each dimension has five levels (including all), such as “student < major < status
< university < all”, how many cuboids will this cube contain (including the base
and apex cuboids)?
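
For part (c), a brief sketch of the counting argument: each cuboid picks one level per dimension, so the total is the product of the number of levels over all dimensions. Because the question states that the five levels already include "all", no extra level is added per dimension here. The dictionary name `levels_per_dimension` is only illustrative.

```python
from math import prod

# Five levels per dimension, with the virtual top level "all" already counted.
levels_per_dimension = {
    "student": 5,
    "course": 5,
    "semester": 5,
    "instructor": 5,
}

# One cuboid for every combination of levels across the four dimensions.
total_cuboids = prod(levels_per_dimension.values())
print(total_cuboids)  # 5 ** 4 = 625
```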
