Assignment 1

1. What are the various strategies and techniques used in data mining?
2. What is data mining? Differentiate between data mining techniques and data mining strategies.
3. What is a data warehouse? How is it different from a database?
4. What do you mean by granularity? What is partitioning?
5. Explain the life cycle of a data warehouse.
6. Is data consolidation a data modeling activity? Justify your answer.
7. What is data mining? Define the major issues in data mining.
8. Explain data, information, knowledge and intelligence.
9. What are different forms of data processing?
10. Explain data cleaning, data transformation, and data integration.
11. Distinguish between dimensionality reduction and numerosity reduction.
12. Explain concept hierarchy generation for categorical data.
13. Define KDD. Identify and describe the phases of KDD.
14. Explain attribute subset selection methods for data reduction with an example.
15. Describe the differences between the following approaches for integrating a data mining system
with a database or data warehouse system: no coupling, loose coupling, semi-tight coupling, and tight
coupling.
16. Explain principal component analysis (PCA) in detail.
17. What are outliers? How can outlier analysis be done?
18. Describe in brief the important steps of data mining and the data mining functionalities.
19. Describe the important types of difficulties in the data mining process.
20. Describe the process of data integration and transformation.
21. Explain the characteristics of operational data.

Assignment 2

1. What are the properties of standard deviation? Write its formula.
2. Distinguish between the data cube approach and the attribute-oriented approach.
3. What are the roles of statistics in data mining?
4. Why do we perform attribute relevance analysis?
5. What is association rule mining? Explain the Apriori algorithm for finding frequent itemsets.
6. Explain multi-level association rules for transactional database.
7. Write short notes on: (i) Quartile (ii) Histogram (iii) Scatter plot
8. Why are analytical characterization and attribute relevance analysis needed?
9. Describe statistical measures in large datasets.
10. Explain the basic concept of association rule mining. Also explain market basket analysis.
11. Why is the task of mining frequent itemsets difficult? Explain the reason.
12. A transaction database from a shop is as follows:
Transaction no.   Items
T1                1,2,3,4,5,6
T2                2,3,4,5,6,7
T3                1,8,4,5
T4                1,9,0,4,6
T5                0,2,2,4,5

Assume a minimum support count of 3 and a minimum confidence of 50%.

a) Calculate the frequent 1-itemset
b) Calculate the frequent k-itemsets for k=2, 3, 4,…
c) Generate any 4 possible association rules and calculate their level of confidence.
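
A hand-worked answer to question 12 could be checked with a level-wise (Apriori-style) search along the following lines. This is a minimal sketch, not a reference implementation: the names `transactions`, `support`, `apriori`, and `confidence` are illustrative, the minimum support is read as an absolute count of 3, and the repeated item 2 in T5 is assumed to count once.

```python
from itertools import combinations

# Transactions from question 12, stored as sets.
# Assumption: the duplicate "2" in T5 is counted once.
transactions = {
    "T1": {1, 2, 3, 4, 5, 6},
    "T2": {2, 3, 4, 5, 6, 7},
    "T3": {1, 8, 4, 5},
    "T4": {1, 9, 0, 4, 6},
    "T5": {0, 2, 4, 5},
}
MIN_SUPPORT = 3  # assumed to be an absolute transaction count


def support(itemset):
    """Number of transactions that contain every item in itemset."""
    return sum(1 for items in transactions.values() if itemset <= items)


def apriori(min_support):
    """Return {k: sorted list of frequent k-itemsets}, built level by level."""
    all_items = sorted(set().union(*transactions.values()))
    level = [frozenset([i]) for i in all_items
             if support(frozenset([i])) >= min_support]
    frequent = {}
    k = 1
    while level:
        frequent[k] = sorted(sorted(s) for s in level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset must be frequent, then count support.
        previous = set(level)
        level = [c for c in candidates
                 if all(frozenset(sub) in previous
                        for sub in combinations(c, k - 1))
                 and support(c) >= min_support]
    return frequent


def confidence(antecedent, consequent):
    """Confidence of antecedent -> consequent: support(A and C) / support(A)."""
    a = frozenset(antecedent)
    return support(a | frozenset(consequent)) / support(a)


frequent = apriori(MIN_SUPPORT)
for k, itemsets in frequent.items():
    print(f"frequent {k}-itemsets:", itemsets)
print("confidence of 2 -> 4:", confidence({2}, {4}))
```

Any rule for part (c) can be scored by calling `confidence` with its antecedent and consequent and comparing the result against the 50% threshold.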

Assignment 3

1. Briefly explain the density-based approaches to cluster formation.
2. Explain the density-based clustering method based on connected regions with sufficiently high density
(DBSCAN).
3. (i) Explain the different data types used in cluster analysis.
(ii) Describe the role of genetic algorithms in data mining.
4. Explain the multi-layer feedback neural network. Differentiate between feed-forward and feedback systems.
5. Describe classification and prediction. Also discuss the methods of classification.
6. Write short notes on: (i) CLIQUE (ii) STING
7. What is classification? Describe the Bayesian classifier in detail.
8. What is clustering? How is it distinguished from classification? Describe any one clustering algorithm
in detail.
9. Explain the types of data that often occur in cluster analysis, and also explain the preprocessing
methods on that data to prepare for cluster analysis.
10. Explain the back-propagation algorithm for training a multilayer neural network with an example.
11. Consider the dataset given below:
Point   X   Y
1       1   1
2       1   2
3       3   6
4       5   7
5       8   5

Cluster this dataset non-hierarchically and answer the following:
i) Compute the matrix of Manhattan distances.
ii) Which two cases are closest together?
iii) Which are the two clusters?
12. i) For clustering, the similarity measure between data points is used. List all the measures used to cluster the
points.
ii) Explain different kinds of non-hierarchical clustering based on density and probability algorithms.
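
For parts (i) and (ii) of question 11, the Manhattan distance matrix and the closest pair could be computed as in the sketch below. The names `points`, `manhattan`, `matrix`, and `closest` are illustrative. The sketch stops before part (iii), because the question does not fix a particular non-hierarchical clustering method.

```python
# Points from question 11, keyed by point number: (X, Y).
points = {1: (1, 1), 2: (1, 2), 3: (3, 6), 4: (5, 7), 5: (8, 5)}


def manhattan(p, q):
    """Manhattan (city-block) distance: |x1 - x2| + |y1 - y2|."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


labels = sorted(points)

# Symmetric distance matrix; row i and column j follow the order in labels.
matrix = [[manhattan(points[a], points[b]) for b in labels] for a in labels]

# Closest pair: scan each unordered pair once and keep the smallest distance.
closest = min(
    ((a, b) for i, a in enumerate(labels) for b in labels[i + 1:]),
    key=lambda pair: manhattan(points[pair[0]], points[pair[1]]),
)

for label, row in zip(labels, matrix):
    print(label, row)
print("closest pair:", closest)
```

The same `matrix` can then serve as input to whichever non-hierarchical method is chosen for part (iii), for example a two-cluster k-means or k-medoids run.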

Assignment 4

1. Explain the three-tier architecture of a data warehouse. Also distinguish a data warehouse from a data
mart.
2. Explain all the steps and guidelines for implementing a data warehouse.
3. What is multidimensional modeling? Explain the star, snowflake, and fact constellation schemas
for multidimensional databases. Also write their advantages and disadvantages.
4. Why is a data warehouse maintained separately from the database? Differentiate between OLTP and OLAP.
5. Write short notes on: (i) Concept hierarchy (ii) Data mart
6. Explain the important approaches to building a data warehouse system.
7. Explain the various schemas for multidimensional modeling.
8. What are the differences between information processing, analytical processing, and data mining?

Assignment 5

1. Explain the back-up and recovery models in a data warehouse.
2. Explain the basic similarities and differences between ROLAP, MOLAP, and HOLAP.
3. Explain in detail: (i) Data mining interface (ii) Testing of a data warehouse
4. Describe the various types of OLAP servers.
5. Explain in detail: (i) Back-up and recovery in a data warehouse (ii) Security issues in a data
warehouse
6. Discuss the various OLAP operations. Explain how query performance can be improved by cascading
these operations.