INSTRUCTIONS TO CANDIDATES:
2) Suppose that the data for analysis includes the attribute age. The age values for the data
tuples are (in increasing order) 13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30, 33,
33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70. (5 marks)
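The sub-parts of this question are not reproduced above, so the following is only a minimal sketch of the descriptive summaries usually asked of such a list (mean, median, mode, midrange, quartiles). It uses Python's standard `statistics` module; the choice of the default "exclusive" quartile method is an assumption, and other textbooks use slightly different quartile conventions.

```python
import statistics

# Age values exactly as listed in the question (already sorted).
ages = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30, 33,
        33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70]

mean_age = statistics.mean(ages)        # arithmetic mean
median_age = statistics.median(ages)    # middle value of the 27 observations
modes = statistics.multimode(ages)      # every value tied for highest frequency
midrange = (min(ages) + max(ages)) / 2  # average of the smallest and largest value

# Quartiles with the library's default "exclusive" method (an assumed convention).
q1, q2, q3 = statistics.quantiles(ages, n=4)

print(f"n={len(ages)} mean={mean_age:.2f} median={median_age} modes={modes}")
print(f"midrange={midrange} Q1={q1} Q3={q3}")
```

Because two values share the highest frequency, the data is bimodal, which `multimode` reports as a list rather than a single number.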
3) Suppose you have the following four dimension tables: Time, Customer, Employee
and Product. Construct a snowflake schema by developing a "Sales" fact table. The linkage
attribute in a dimension table can be used to split that table to form the snowflake schema.
The aggregate variable of the fact table can be the "quantity" of products.
Time                               Customer
---------------------------------  ---------------------------------
OrderID (primary key)              CustID (primary key)
Order Date                         Name
Year                               Address
Quarter                            CityID (linkage attribute)
Month                              City Name
                                   Zip Code
                                   State
                                   Country

Employee                           Product
---------------------------------  ---------------------------------
EmpID (primary key)                ProductID
Employee Name                      Product Name
DepartmentID (linkage attribute)   Product Category
Region                             Product Description
Territory
(5 marks)
4) Suppose you have the following transactional database. Construct an FP (frequent pattern)
tree from this transaction database.
(5 marks)
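The transaction table for this question is not reproduced here, so the sketch below illustrates FP-tree construction on a small hypothetical database. The transactions, the `min_support` value, and the names `FPNode` and `build_fp_tree` are all assumptions made for illustration; ties in item frequency are broken alphabetically, which is one common convention rather than a requirement of the method.

```python
from collections import Counter


class FPNode:
    """One node of an FP-tree: an item, its count, and its children."""

    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}  # item -> FPNode


def build_fp_tree(transactions, min_support):
    # Pass 1: count in how many transactions each item occurs.
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in counts.items() if c >= min_support}

    # Pass 2: insert each transaction, keeping only frequent items,
    # ordered by descending support (alphabetical order breaks ties).
    root = FPNode(None)
    for t in transactions:
        items = sorted((i for i in set(t) if i in frequent),
                       key=lambda i: (-frequent[i], i))
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
            child.count += 1  # shared prefixes accumulate counts
            node = child
    return root, frequent


def show(node, depth=0):
    """Print the tree with indentation, one node per line."""
    for child in sorted(node.children.values(), key=lambda c: c.item):
        print("  " * depth + f"{child.item}:{child.count}")
        show(child, depth + 1)


# Hypothetical transactions, NOT the ones from the exam paper.
transactions = [["a", "b"], ["b", "c", "d"], ["a", "c", "d", "e"],
                ["a", "d", "e"], ["a", "b", "c"]]
root, frequent = build_fp_tree(transactions, min_support=2)
show(root)
```

A full FP-growth implementation would also keep a header table linking nodes of the same item; that is left out here to keep the tree-building step visible.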
WQD7005
5) Consider the dataset of sales related to computer systems (e.g. hardware and software)
shown below. We are required to learn a decision tree that predicts whether the profit goes up
or down based on three features: condition, upgradable and type.
(5 marks)
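The sales dataset for this question is not reproduced here. As a hedged illustration of how a split attribute could be chosen, the sketch below computes entropy-based information gain (the ID3 criterion, which is an assumption since the question does not name an algorithm) on four hypothetical rows. The rows and the helper names `entropy` and `information_gain` are illustrative only.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(rows, feature, target):
    """Entropy of the target minus the weighted entropy after splitting on feature."""
    base = entropy([r[target] for r in rows])
    n = len(rows)
    remainder = 0.0
    for value in {r[feature] for r in rows}:
        subset = [r[target] for r in rows if r[feature] == value]
        remainder += len(subset) / n * entropy(subset)
    return base - remainder


# Hypothetical rows, NOT the dataset from the exam paper.
rows = [
    {"condition": "old", "upgradable": "no",  "type": "software", "profit": "down"},
    {"condition": "old", "upgradable": "yes", "type": "hardware", "profit": "down"},
    {"condition": "new", "upgradable": "yes", "type": "software", "profit": "up"},
    {"condition": "new", "upgradable": "no",  "type": "hardware", "profit": "up"},
]

gains = {f: information_gain(rows, f, "profit")
         for f in ("condition", "upgradable", "type")}
best = max(gains, key=gains.get)  # feature to place at the root of the tree
print(gains, best)
```

The feature with the highest gain becomes the root; the same calculation is then repeated on each branch's subset until the leaves are pure or no features remain.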
1) Select the best non-target features using one of the statistical methods "correlation",
"Chi-square", or "ANOVA". Your solution should describe the relevant statistical findings.
(5 marks)
3) Discuss the performance of all three algorithms in terms of the Receiver Operating
Characteristic (ROC) curve.
(5 marks)
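For reference, an ROC curve plots the true positive rate against the false positive rate as the decision threshold is lowered, and the area under it (AUC) summarizes the curve in one number. The sketch below shows one way to compute both in plain Python; the labels, scores, and function names `roc_points` and `auc` are hypothetical and are not tied to the three algorithms the question refers to.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) points obtained by sweeping the threshold downwards."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    pairs = sorted(zip(scores, y_true), key=lambda p: -p[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Consume all examples sharing this score so ties yield a single point.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the piecewise-linear ROC curve (trapezoidal rule)."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical true labels (1 = positive) and classifier scores.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

points = roc_points(y_true, scores)
area = auc(points)
print(points, round(area, 3))
```

Running the same two functions on each algorithm's scores gives curves and AUC values that can be compared directly: a curve closer to the top-left corner, or a larger AUC, indicates better ranking of positives over negatives.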
END