
DMBI All PYQs

The document outlines various theory and numerical questions related to Data Warehousing, Data Mining, Classification, Association Rule Mining, Clustering, and Business Intelligence across six modules. It includes repeated questions, short notes, and practical applications such as algorithms and classification tasks. The focus is on understanding concepts, processes, and techniques used in data analysis and mining.

Uploaded by xie.himanshu29
Copyright © All Rights Reserved

DMBI PYQs

Module 1: Introduction to Data Warehousing and Data Mining

Theory Questions:

1. What is Data Mining? Explain the KDD process with a diagram.
2. What do you mean by data mining? Explain the KDD process with the help of a suitable diagram. (REPEATED)
3. What is data warehouse architecture?
4. Draw and list the components of a typical data warehouse architecture. (REPEATED)
5. Compare and contrast OLTP and OLAP.
6. Give the differences between OLAP and OLTP. (REPEATED)
7. What is OLAP? Explain the various OLAP operations with a neat labelled diagram.
8. List the stages in data mining with a neat labelled diagram.
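The OLAP operations in question 7 can be illustrated on a tiny in-memory cube. This is a minimal sketch, assuming a hypothetical three-dimensional fact table (city, quarter, item) with a sales measure; the data and the names `roll_up`, `slice_cube` and `dice_cube` are made up for this example. Roll-up is shown here as aggregating away a dimension; textbook roll-up can also climb a concept hierarchy (for example city to state), which this sketch does not model.

```python
from collections import defaultdict

# Hypothetical toy fact table: (city, quarter, item, sales). Not from any exam paper.
DIMS = ("city", "quarter", "item")
FACTS = [
    ("Mumbai", "Q1", "Phone", 100),
    ("Mumbai", "Q2", "Phone", 120),
    ("Mumbai", "Q1", "Laptop", 80),
    ("Pune", "Q1", "Phone", 60),
    ("Pune", "Q2", "Laptop", 90),
]

def roll_up(facts, keep):
    """Roll-up: sum the sales measure over every dimension not listed in `keep`."""
    idx = [DIMS.index(d) for d in keep]
    totals = defaultdict(int)
    for *dims, sales in facts:
        totals[tuple(dims[i] for i in idx)] += sales
    return dict(totals)

def slice_cube(facts, dim, value):
    """Slice: fix one dimension to a single value."""
    i = DIMS.index(dim)
    return [f for f in facts if f[i] == value]

def dice_cube(facts, **allowed):
    """Dice: restrict several dimensions to sets of allowed values."""
    return [f for f in facts
            if all(f[DIMS.index(d)] in vals for d, vals in allowed.items())]

print(roll_up(FACTS, ["city"]))                # {('Mumbai',): 300, ('Pune',): 150}
print(roll_up(FACTS, ["city", "quarter"]))     # finer grain, i.e. drill-down
print(len(slice_cube(FACTS, "quarter", "Q1")))  # 3
print(len(dice_cube(FACTS, city={"Mumbai"}, item={"Phone"})))  # 2
```

Keeping more dimensions in `roll_up` gives the finer-grained view, which corresponds to drill-down; pivot (rotating the axes of a report) is not shown.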

Numerical Questions: None

Module 2: Data Preprocessing and Data Exploration

Theory Questions:

1. What is noisy data? How can it be handled?
2. Describe the different steps involved in data preprocessing.
3. Explain the different types of attributes used in data exploration, with examples.
4. What are the different types of summary data?
5. Explain the concepts of information gain and Gini index used in decision trees.
6. What is an outlier? Explain various methods for performing outlier analysis. (REPEATED x3)
7. What is an outlier? List the types of outliers. Describe the methods used for outlier analysis.
8. Types of attributes – (Short note)
9. Bootstrapping – (Short note)
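For question 1 (handling noisy data), binning is a standard smoothing approach, and numerical question 1 of this module asks for it directly. The sketch below uses that question's data and assumes equal-depth (equal-frequency) binning with three bins, since the question does not fix the number of bins. When the data does not divide evenly, the first bins get one extra value; in boundary smoothing, a value exactly halfway between the bin edges is mapped to the lower edge. Both are choices made for this example, not rules from the question.

```python
def equal_depth_bins(values, n_bins):
    """Sort the data and split it into n_bins bins of (nearly) equal size."""
    data = sorted(values)
    size, extra = divmod(len(data), n_bins)
    bins, start = [], 0
    for i in range(n_bins):
        end = start + size + (1 if i < extra else 0)  # earlier bins absorb the remainder
        bins.append(data[start:end])
        start = end
    return bins

def smooth_by_means(bins):
    """Replace every value in a bin by the bin mean."""
    return [[round(sum(b) / len(b), 2)] * len(b) for b in bins]

def smooth_by_boundaries(bins):
    """Replace every value by the closer of its bin's min and max (ties go to the min)."""
    return [[b[0] if x - b[0] <= b[-1] - x else b[-1] for x in b] for b in bins]

data = [12, 15, 14, 10, 8, 13, 11, 9, 20, 18, 19, 25, 30]
bins = equal_depth_bins(data, 3)
print(bins)                        # [[8, 9, 10, 11, 12], [13, 14, 15, 18], [19, 20, 25, 30]]
print(smooth_by_means(bins))       # bin means 10.0, 15.0 and 23.5
print(smooth_by_boundaries(bins))  # [[8, 8, 8, 12, 12], [13, 13, 13, 18], [19, 19, 30, 30]]
```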

Numerical Questions:

1) Apply binning and smoothing techniques to the following data:
Data = {12, 15, 14, 10, 8, 13, 11, 9, 20, 18, 19, 25, 30}
- Use smoothing by bin means and smoothing by bin boundaries.
2) Given the following data:
Marks = {45, 56, 47, 55, 62, 53, 59, 61, 52}
- Calculate the mean, median, standard deviation, variance and skewness, and draw a box plot.
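A sketch for numerical question 2 using Python's standard `statistics` module. Which formulas an examiner expects is an assumption here, so the code reports both population (divide by n) and sample (divide by n - 1) variance, and both the moment coefficient and Pearson's second coefficient of skewness. Quartiles come from `statistics.quantiles` with its default "exclusive" method; other textbook quartile rules can give slightly different box plot edges.

```python
import statistics as st

marks = [45, 56, 47, 55, 62, 53, 59, 61, 52]

mean = st.mean(marks)
median = st.median(marks)
pvar = st.pvariance(marks)  # population variance (divide by n)
svar = st.variance(marks)   # sample variance (divide by n - 1)
psd = st.pstdev(marks)      # population standard deviation

# Moment coefficient of skewness (population form): m3 / m2 ** 1.5
m2 = sum((x - mean) ** 2 for x in marks) / len(marks)
m3 = sum((x - mean) ** 3 for x in marks) / len(marks)
skew = m3 / m2 ** 1.5
# Pearson's second coefficient: 3 * (mean - median) / standard deviation
pearson_skew = 3 * (mean - median) / psd

# Five-number summary for the box plot, plus the 1.5 * IQR outlier fences
q1, q2, q3 = st.quantiles(marks, n=4)
iqr = q3 - q1
low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [x for x in marks if not low_fence <= x <= high_fence]

print(f"mean={mean:.2f} median={median} pvar={pvar:.2f} svar={svar:.2f} sd={psd:.2f}")
print(f"moment skew={skew:.3f} pearson skew={pearson_skew:.3f}")
print("box plot:", min(marks), q1, q2, q3, max(marks), "outliers:", outliers)
```

The box plot is drawn from the five-number summary (minimum, Q1, median, Q3, maximum); any value outside the fences would be plotted as an individual outlier point. Both skewness measures come out negative for these marks, because the mean lies below the median.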

Module 3: Classification & Prediction

Theory Questions:

1. Explain the working of a decision-tree-based classifier (ID3 algorithm).
2. Explain the concepts of information gain and Gini index used in decision trees. (Also listed in Module 2)
3. Decision Tree Induction – (Short note)
4. Explain the Naive Bayes classifier.
5. Random Forest Algorithm – (Short note)
6. Cross Validation – (Short note)
7. BI Architecture – (Short note)
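The entropy, information gain and Gini formulas behind questions 1 and 2 fit in a few lines. This is a minimal sketch; the class counts used below (a node with 9 "yes" and 5 "no" tuples, split into [6, 2] and [3, 3]) are hypothetical numbers chosen only to exercise the functions. ID3 picks, at each node, the attribute with the highest information gain; the Gini index is the impurity measure used by CART-style trees, where a lower value means a purer node.

```python
from math import log2

def entropy(counts):
    """Entropy of a class distribution given as counts, e.g. [yes, no]."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c)

def gini(counts):
    """Gini index: 1 minus the sum of squared class proportions."""
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)

def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child partitions."""
    n = sum(parent)
    weighted = sum(sum(ch) / n * entropy(ch) for ch in children)
    return entropy(parent) - weighted

# Hypothetical node: 9 "yes" / 5 "no", split by some attribute into [6, 2] and [3, 3]
print(round(entropy([9, 5]), 3))                             # 0.94
print(round(gini([9, 5]), 3))                                # 0.459
print(round(information_gain([9, 5], [[6, 2], [3, 3]]), 3))  # 0.048
```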

Numerical Questions:

1) Use the Naive Bayes classifier to classify a tuple using the Play Tennis dataset:
Attributes: Outlook, Temperature, Humidity, Wind
Class: Play (Yes/No)
- Classify the tuple (Sunny, Cool, High, Strong).
2) Use the Naive Bayes algorithm to classify a tuple using the following dataset:
Attributes: Fever, Cough, Headache → Disease (Yes/No)
- Classify the tuple (Fever = Yes, Cough = No, Headache = Yes).
3) Construct a decision tree using the ID3 algorithm on the following dataset:
Attributes: Weather, Temp, Humidity, Wind
Class: Play (Yes/No)
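The exam datasets for these questions are not reproduced in this list, so the sketch below uses a made-up six-row table with the Fever, Cough and Headache attributes only to show the mechanics of a categorical Naive Bayes classifier: multiply the class prior P(C) by P(value | C) for each attribute and pick the class with the largest product. The helper names `train` and `classify` are hypothetical. No Laplace smoothing is applied, mirroring the usual hand calculation, so an attribute value never seen with a class drives that class's score to zero.

```python
from collections import Counter

def train(rows, labels):
    """Count class frequencies and (attribute index, value, class) frequencies."""
    class_counts = Counter(labels)
    cond_counts = Counter()
    for row, cls in zip(rows, labels):
        for i, value in enumerate(row):
            cond_counts[(i, value, cls)] += 1
    return class_counts, cond_counts

def classify(x, class_counts, cond_counts):
    """Return the class maximising P(C) * prod_i P(x_i | C), plus every class score."""
    total = sum(class_counts.values())
    scores = {}
    for cls, n_cls in class_counts.items():
        score = n_cls / total                               # prior P(C)
        for i, value in enumerate(x):
            score *= cond_counts[(i, value, cls)] / n_cls  # likelihood P(x_i | C)
        scores[cls] = score
    return max(scores, key=scores.get), scores

# Hypothetical toy data (Fever, Cough, Headache) -> Disease; NOT the exam's dataset
rows = [("Yes", "Yes", "Yes"), ("Yes", "No", "Yes"), ("Yes", "Yes", "No"),
        ("No", "Yes", "No"), ("Yes", "No", "No"), ("No", "No", "Yes")]
labels = ["Yes", "Yes", "Yes", "No", "No", "No"]

model = train(rows, labels)
label, scores = classify(("Yes", "No", "Yes"), *model)
print(label)   # Yes
print(scores)
```

On this toy table the "Yes" score is 1/2 × 3/3 × 1/3 × 2/3 = 1/9 and the "No" score is 1/2 × 1/3 × 2/3 × 1/3 = 1/27, so the tuple is labelled "Yes".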

Module 4: Association Rule Mining

Theory Questions:

1. What is market basket analysis? Explain with a use case. (REPEATED x2)
2. Explain the mining of multilevel and multidimensional association rules. (REPEATED x2)
3. FP-Growth Algorithm – (Short note)
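The support, confidence and lift measures used throughout this module, written as formulas. Here count(X) is the number of transactions that contain every item in X, and N is the total number of transactions. The example values are computed from the five transactions in numerical question 1 of this module.

```latex
\text{support}(A \Rightarrow B) = \frac{\text{count}(A \cup B)}{N}
\qquad
\text{confidence}(A \Rightarrow B) = \frac{\text{count}(A \cup B)}{\text{count}(A)}
\qquad
\text{lift}(A \Rightarrow B) = \frac{\text{confidence}(A \Rightarrow B)}{\text{support}(B)}

% Example with the five transactions of numerical question 1 (N = 5):
\text{support}(\{\text{Diaper}, \text{Beer}\}) = 3/5 = 60\%, \quad
\text{confidence}(\text{Beer} \Rightarrow \text{Diaper}) = 3/3 = 100\%, \quad
\text{lift}(\text{Beer} \Rightarrow \text{Diaper}) = 1.0 / 0.8 = 1.25
```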

Numerical Questions:

1) Use the Apriori algorithm to find the frequent itemsets and strong association rules:
Transactions:
T1: {Bread, Milk}
T2: {Bread, Diaper, Beer, Eggs}
T3: {Milk, Diaper, Beer, Coke}
T4: {Bread, Milk, Diaper, Beer}
T5: {Bread, Milk, Diaper, Coke}
- Use min_sup = 2 (support count), min_conf = 60%
2) Use the Apriori algorithm to find the frequent itemsets:
T1: {A, B, D}
T2: {B, C, E}
T3: {A, B, D, E}
T4: {A, B, C, E}
3) Generate strong association rules from the frequent itemset:
Frequent Itemset: {Milk, Diaper, Beer}
Support: 50%, Confidence: 75%
- Generate all rules that meet the thresholds.
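A compact Apriori sketch applied to the transactions of numerical question 1 (min_sup = 2 as an absolute count, min_conf = 60%). Candidate k-itemsets are formed by joining frequent (k-1)-itemsets, pruned when any (k-1)-subset is infrequent, and then counted with one pass over the transactions. The function names `apriori` and `strong_rules` are illustrative, and the code favours readability over efficiency.

```python
from itertools import combinations

def apriori(transactions, min_sup):
    """Return {frozenset(itemset): support_count} for every frequent itemset.
    min_sup is an absolute support count."""
    transactions = [frozenset(t) for t in transactions]
    counts = {}
    for t in transactions:                      # level 1: count single items
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_sup}
    frequent = dict(current)
    k = 2
    while current:
        prev = list(current)
        # Join step: unions of two frequent (k-1)-itemsets with exactly k items
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset must itself be frequent
        candidates = {c for c in candidates
                      if all(frozenset(s) in current for s in combinations(c, k - 1))}
        # Database scan: count the surviving candidates
        current = {}
        for c in candidates:
            n = sum(1 for t in transactions if c <= t)
            if n >= min_sup:
                current[c] = n
        frequent.update(current)
        k += 1
    return frequent

def strong_rules(frequent, min_conf):
    """All rules lhs -> rhs from frequent itemsets with confidence >= min_conf."""
    rules = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                conf = sup / frequent[lhs]  # lhs is frequent by the Apriori property
                if conf >= min_conf:
                    rules.append((lhs, itemset - lhs, conf))
    return rules

transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]
freq = apriori(transactions, min_sup=2)
rules = strong_rules(freq, min_conf=0.6)
for lhs, rhs, conf in rules:
    print(sorted(lhs), "->", sorted(rhs), f"{conf:.0%}")
```

With this data the sketch finds 17 frequent itemsets; the largest are the four 3-itemsets {Bread, Milk, Diaper}, {Bread, Diaper, Beer}, {Milk, Diaper, Beer} and {Milk, Diaper, Coke}, and no 4-itemset survives pruning. The same `strong_rules` function applies to numerical question 3 once the supports of {Milk, Diaper, Beer} and its subsets are known.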

Module 5: Clustering and Outlier Detection

Theory Questions:

1. Explain the steps in the hierarchical clustering algorithm.
2. Compare Star Schema and Snowflake Schema (REPEATED x2)
3. Explain the K-means algorithm with an example.
4. DBSCAN algorithm – explain with an example
5. BIRCH Algorithm – (Short note)
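The steps of agglomerative hierarchical clustering (start with every point as its own cluster, repeatedly merge the closest pair, record the merge distance) are sketched below with single linkage, using the points from numerical question 3 of this module. Euclidean distance is an assumption, since the question does not name a metric. The naive loop recomputes all pairwise cluster distances at every step, which is fine for ten points but not for large data.

```python
from math import dist  # Euclidean distance, Python 3.8+

def single_linkage(points):
    """Agglomerative clustering with single linkage.
    Returns the merge history as (cluster_a, cluster_b, distance) tuples."""
    clusters = [[p] for p in points]  # step 1: every point starts as its own cluster
    merges = []
    while len(clusters) > 1:
        best = None
        # step 2: find the closest pair; single-link distance = closest pair of members
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(dist(p, q) for p in clusters[i] for q in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        # step 3: merge the pair and record the merge height (dendrogram level)
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges

points = [(1, 1), (2, 1), (4, 3), (5, 4), (3, 3), (2, 5), (3, 4), (6, 8), (7, 9), (8, 10)]
for a, b, d in single_linkage(points):
    print(f"merge {a} + {b} at {d:.3f}")
```

The recorded distances are the dendrogram heights; for these points the last merge joins {(6,8), (7,9), (8,10)} to the remaining seven points at √17 ≈ 4.123, the distance between (5,4) and (6,8).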

Numerical Questions:

1) Apply the K-means algorithm using Manhattan distance:
Points: {(2,10), (2,5), (8,4), (5,8), (7,5), (6,4)}
k = 2, initial centroids: (2,10) and (5,8)
2) Apply the K-means algorithm to the dataset:
Values = {2, 4, 10, 12, 3, 20, 30, 11, 25}
- Cluster into 3 clusters.
3) Perform hierarchical clustering using single linkage for:
Points: {(1,1), (2,1), (4,3), (5,4), (3,3), (2,5), (3,4), (6,8), (7,9), (8,10)}
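A sketch of the first numerical question: K-means with Manhattan (L1) distance in the assignment step. Updating each centroid as the mean of its members is an assumption that follows the usual exam working; a strict L1 formulation (k-medians) would use the coordinate-wise median instead. Ties, if any, go to the lower-numbered centroid.

```python
def manhattan(p, q):
    """L1 distance: sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))

def kmeans_manhattan(points, centroids, max_iter=100):
    """K-means using Manhattan distance for assignment and the mean for the update."""
    centroids = [tuple(c) for c in centroids]
    assignment = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid under L1 (min() keeps the first on ties)
        new_assignment = [
            min(range(len(centroids)), key=lambda j: manhattan(p, centroids[j]))
            for p in points
        ]
        if new_assignment == assignment:  # no point changed cluster: converged
            break
        assignment = new_assignment
        # Update step: each centroid becomes the mean of its members
        updated = []
        for j, old in enumerate(centroids):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                updated.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                updated.append(old)  # keep an empty cluster's centroid unchanged
        centroids = updated
    return assignment, centroids

points = [(2, 10), (2, 5), (8, 4), (5, 8), (7, 5), (6, 4)]
assignment, centroids = kmeans_manhattan(points, [(2, 10), (5, 8)])
print(assignment)  # [0, 0, 1, 0, 1, 1]
print(centroids)
```

For this data the sketch converges after three assignment passes: (5,8) moves to the first cluster in the second pass, giving clusters {(2,10), (2,5), (5,8)} and {(8,4), (7,5), (6,4)} with centroids (3, 7.67) and (7, 4.33).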

Module 6: Business Intelligence and Applications

Theory Questions:

1. Design a BI system for fraud detection.
2. Dimensional Modelling – (Short note)
3. BI Architectures – (Short note)
4. Compare Star vs Snowflake Schema (also listed in Module 5)
5. Short Notes (any 4):
- FP-Growth
- Dimensional Modelling
- Cross Validation
- BI Architectures
- Random Forest
- BIRCH

Numerical Questions: None
