
Ahmedabad Institute of Technology

CE & IT Department (Sem VII)

Data Mining and Business Intelligence (2170715)


Question Bank
Year: 2020-21

Prepared By: Prof. Shital V. Patel

Department Vision:
“To produce technically sound and ethically responsible Computer Engineers to the society by providing Quality Education.”

Department Mission:

1. To provide healthy Learning Environment based on current and future Industrial demands

2. To promote curricular, co-curricular and extra-curricular activities for overall personality development of the students.

3. To build technically and ethically strong mind having real life problem solving capabilities.

4. To provide platform for Effective Teaching Learning.


AHMEDABAD INSTITUTE OF TECHNOLOGY

1. Overview and concepts Data Warehousing and Business Intelligence [CO-1]

Sr No / Question / Marks

1 Explain why data warehouses are needed for developing business solutions from today’s perspective. Discuss the role of data marts. (NOV 2016) 07
2 Define the following terms and differentiate them: Data Mart, Enterprise Warehouse and Virtual Warehouse. (MAY 2017)(DEC-2018) 07/03
3 Explain the role of Business Intelligence in any one of the following domains: Fraud Detection, Market Segmentation, retail industry, telecommunications industry. Explain how data mining can be helpful in any of these cases. (MAY 2017) 07
4 Define the following terms: Business Intelligence, Data Mart, Closed frequent itemset, Outlier Analysis. (NOV 2017) 04
5 Do feature-wise comparison between BI and DW. (MAY-2018) 04
6 Explain various features of a Data Warehouse. (DEC-2018) 04
7 Differentiate between Operational Database System and Data Warehouse. (DEC-2018) 07
8 Discuss the applications of data warehousing and data mining. (DEC-2018) 04
9 “A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data.” Justify. (MAY-2019) 04
10 Can BI be used for DM, or vice versa? Justify. (DEC-2019) 03
11 Explain why data warehouses are needed for developing business solutions from today’s perspective. Discuss the role of data marts. (DEC-2019) 03
2. The Architecture of BI and DW [CO-1]

1 Give differences between OLAP and OLTP.(NOV 2016)(NOV 2017) 07/04


2 With the help of a suitable example, illustrate the OLAP operations ‘drill-down’, ‘roll-up’, ‘slice’ and ‘dice’. (NOV 2016) 07
3 What is a Cuboid? Explain any three OLAP operations on a Data Cube with example. (MAY 2017) 04
4 Compare OLTP and OLAP systems. (Do feature-wise comparison between OLAP and OLTP.) (MAY-2018) 03
5 Explain Star, Snowflake, and Fact Constellation Schema for Multidimensional Database. (MAY 2017)(MAY-2018) 07/03
6 Explain Star schema and Snowflake schema with example.(NOV 2017) 04

7 Explain various OLAP operations.(NOV 2017) 07

8 Explain Data warehouse architecture.(NOV 2017) 07

Prepared by: Prof. Shital V. Patel DMBI 2170715 Page 2


9 Do feature wise comparison between ROLAP and MOLAP.(MAY-2018) 04

10 Define data cube and explain 3 operations on it.(MAY-2018)(DEC-2018) 04

11 Differentiate Fact table vs. Dimension table(DEC-2018),(DEC-2019) 03

12 Explain three-tier data warehouse architecture in detail. (DEC-2018)(MAY-2019) 07

13 Discuss possible ways for integration of a Data Mining system with a Database or Data Warehouse system. (MAY-2019) 04
14 Compare data mart and data warehouse.(MAY-2019) 03

15 Discuss star schema and fact constellation schema with diagram. (MAY-2019) 04

16 Compare OLAP and OLTP in detail. (MAY-2019)(NOV-2019) 07

17 Explain in detail the extract/transform/load (ETL) design of an automated warehouse. (DEC-2019) 04
18 Draw and explain Snowflake and Fact Constellation schemas. (DEC-2019) 04

19 Discuss Following: (i) Meta Data (ii) Virtual Warehouse(DEC-2019) 03

3. Introduction to data mining (DM) [CO-1]

1 Define the term “data mining”. Discuss the major issues in data mining(NOV 2016) 07/03

2 What is Data Mining? Why is it called data mining rather than knowledge mining? Explain the KDD process. (MAY 2017)(DEC-2018) 07
3 Explain mining in the following databases with example: 07
1. Temporal Databases
2. Sequence Databases
3. Spatial Databases
4. Spatiotemporal Databases (MAY 2017)(DEC-2018)
4 What is the importance of visualization of discovered patterns? Explain the role of presentation in pattern visualization. Discuss various visualization techniques in KDD. (MAY 2017) 07
5 What is Data Mining? Write down short note on KDD process(NOV 2017) 07

6 What are the major issues in Data Mining?(NOV 2017) 07

7 Explain KDD process using figure.(MAY-2018) 03

8 Explain research issues in Data Mining.(MAY-2018) 07

9 What is the difference between KDD and Data Mining?(DEC-2018) 03

10 Explain the major issues in data mining (DEC-2018) 07

11 Explain cluster analysis and outlier analysis with example (MAY-2019) 03

12 What is apex cuboid? Discuss drill down and roll up operation with diagram.(MAY-2019) 04
13 Define data mining and list its features.(DEC-2019) 03
14 Describe the steps involved in data mining when viewed as a process of knowledge 07
discovery. (DEC-2019)
15 Define outlier analysis. Why is outlier mining important? Briefly describe the different approaches: statistical-based outlier detection, distance-based outlier detection and deviation-based outlier detection. (DEC-2019) 07

4. Data Pre-processing [CO-1]

1 In real-world data, tuples with missing values for some attributes are a common occurrence. Describe various methods for handling this problem. (NOV 2016) 07
2 Explain the need for data smoothing during pre-processing and discuss data smoothing by Binning. (NOV 2016) 07
3 What is Concept Hierarchy? List and explain types of Concept Hierarchy. (MAY 2017)(DEC-2018) 07/04
4 List and describe methods for handling missing values in data cleaning. (MAY 2017)(DEC-2018) 03
5 What is noise? Explain data smoothing methods as a noise removal technique to divide the given data into bins of size 3 by bin partition (equal frequency), by bin means, by bin medians and by bin boundaries. Consider the data: 10, 2, 19, 18, 20, 18, 25, 28, 22 (MAY 2017)(DEC-2018) 07
6 Minimum salary is 20,000 Rs and maximum salary is 1,70,000 Rs. Map the salary 1,00,000 Rs to the new range (60,000, 2,60,000) Rs using the min-max normalization method. (MAY 2017) 03
7 If the mean salary is 54,000 Rs and the standard deviation is 16,000 Rs, find the z-score value of a salary of 73,600 Rs. (MAY 2017) 07
8 Explain Mean, Median, Mode, Variance, Standard Deviation and five-number summary with suitable database example. (MAY 2017)(DEC-2018) 07
9 Explain the following data normalization techniques: (i) min-max normalization and (ii) decimal scaling. (NOV 2016) 07
10 Use the min-max normalization method to normalize the following group of data by setting min = 0 and max = 1: 200, 300, 400, 600, 1000 (NOV 2017) 04
11 Describe various methods for handling missing data values. (NOV 2017) 07
12 Suppose a group of sales price records has been sorted as follows: 6, 9, 12, 13, 15, 25, 50, 70, 72, 92, 204, 232. Partition them into three bins by the equal-frequency (equi-depth) partitioning method. Perform data smoothing by bin mean. (NOV 2017) 03

13 Explain the following terms: Numerosity reduction, Data Integration, Data transformation. (NOV 2017) 03
14 Explain sampling methods for data reduction. (NOV 2017) 07
15 Enlist the preprocessing steps with example. Explain the procedure of any one preprocessing technique. (MAY-2018) 07
16 Suppose that the data for analysis includes the attribute age. The age values for the data tuples are (in increasing order): 13, 15, 16, 16, 19, 20, 23, 29, 35, 41, 44, 53, 62, 69, 72. Use min-max normalization to transform the value 45 for age onto the range [0.0, 1.0]. (DEC-2018) 04
17 What is noise? Explain binning methods for data smoothing. (MAY-2019) 04

18 Explain various data normalization techniques. (MAY-2019) 07

19 Enlist data reduction strategies and explain any two. (MAY-2019) 07

20 Discuss attribute subset selection. (MAY-2019) 04

21 Explain Mean, Median, Mode, Variance, Standard Deviation and five-number summary with suitable database example. (DEC-2019) 07
22 Is graphical visualization better than text data? Justify your answer and explain different data visualization techniques. (DEC-2019) 07
23 In data pre-processing, why do we need data smoothing? Discuss data smoothing by Binning. (DEC-2019) 07
24 Describe Concept Hierarchy. List and briefly explain types of Concept Hierarchy. (DEC-2019) 04
25 In real-world data, tuples with missing values for some attributes are a common occurrence. Describe various methods for handling this problem. (DEC-2019) 07
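
The numerical questions in this unit (min-max normalization, z-score and equal-frequency binning with smoothing by bin means) can be checked with a short script. The following is a minimal Python sketch, not a model answer: the function names are hypothetical, and the binning helper assumes the number of values divides evenly into the requested number of bins, as it does for the sales price data in the NOV 2017 question.

```python
def min_max(value, old_min, old_max, new_min=0.0, new_max=1.0):
    """Linearly map value from [old_min, old_max] onto [new_min, new_max]."""
    scale = (value - old_min) / (old_max - old_min)
    return scale * (new_max - new_min) + new_min


def z_score(value, mean, std_dev):
    """Number of standard deviations by which value differs from the mean."""
    return (value - mean) / std_dev


def equal_frequency_bins(values, n_bins):
    """Sort the values and split them into n_bins bins of equal size."""
    ordered = sorted(values)
    if len(ordered) % n_bins != 0:
        # Assumption of this sketch: no leftover values to distribute.
        raise ValueError("len(values) must be divisible by n_bins")
    size = len(ordered) // n_bins
    return [ordered[i * size:(i + 1) * size] for i in range(n_bins)]


def smooth_by_bin_means(bins):
    """Replace every value in a bin by the mean of that bin."""
    return [[sum(b) / len(b)] * len(b) for b in bins]


# Salary 1,00,000 Rs mapped from [20,000, 1,70,000] to [60,000, 2,60,000]
salary_mapped = round(min_max(100_000, 20_000, 170_000, 60_000, 260_000), 2)

# z-score of 73,600 Rs with mean 54,000 Rs and standard deviation 16,000 Rs
salary_z = round(z_score(73_600, 54_000, 16_000), 3)

# Min-max normalization of 200, 300, 400, 600, 1000 onto [0, 1]
data = [200, 300, 400, 600, 1000]
normalized = [min_max(v, min(data), max(data)) for v in data]

# Sales prices partitioned into three equal-frequency bins, then smoothed
prices = [6, 9, 12, 13, 15, 25, 50, 70, 72, 92, 204, 232]
price_bins = equal_frequency_bins(prices, 3)
price_means = smooth_by_bin_means(price_bins)

print(salary_mapped)  # 166666.67
print(salary_z)       # 1.225
print(normalized)     # [0.0, 0.125, 0.25, 0.5, 1.0]
print(price_bins)     # [[6, 9, 12, 13], [15, 25, 50, 70], [72, 92, 204, 232]]
print(price_means)    # [[10.0, ...], [40.0, ...], [150.0, ...]]
```

Smoothing by bin medians or bin boundaries follows the same pattern with a different replacement rule for each bin; those variants are left out of this sketch.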

5. Concept Description and Association Rule Mining [CO-2]

1 What are the limitations of the Apriori approach for mining? Briefly describe the techniques to improve the efficiency of the Apriori algorithm. (NOV 2016) 07
2 What is market basket analysis? Explain the two measures of rule interestingness: support and confidence with suitable example. (NOV 2016)(MAY 2017)(DEC-2018) 07/04
3 State the Apriori Property. Generate candidate itemsets, frequent itemsets and association rules using the Apriori algorithm on the following data set with a minimum support count of 2. (MAY 2017) 07
Sr  TID   List of item_IDs
1   T100  I1, I2, I5
2   T200  I2, I4
3   T300  I2, I3
4   T400  I1, I2, I4
5   T500  I1, I3
6   T600  I2, I3
7   T700  I1, I3
8   T800  I1, I2, I3, I5
9   T900  I1, I2, I3
4 Explain measures for finding rule interestingness (support, confidence) with example. (NOV 2017)(MAY-2019) 03
5 Using the Apriori algorithm, generate frequent itemsets (min_sup >= 33.3%) for the following transaction database. (NOV 2017) 07
Trans_id  Itemlist
T1        {A, B, D, K}
T2        {A, B, C, D, E}
T3        {A, B, C, E}
T4        {B, D}
T5        {A, C}
T6        {B, D}

6 Compare association and classification. Briefly explain associative classification with suitable example. (NOV 2017) 07
7 What is an attribute selection measure? Explain different attribute selection measures with example. (NOV 2017) 07
8 What is concept description? Explain data generalization and summarization-based characterization using example. (MAY-2018) 07
9 Write a note on incremental Association Rule Mining. (DEC-2019) 04
10 Generate frequent itemsets and association rules from them using the Apriori algorithm. Minimum support is 50% and minimum confidence is 70%. (MAY-2018) 07
11 Explain Mining Multiple-Level Association Rules using example. (MAY-2018) 04
12 State the Apriori Property. Generate large itemsets and association rules using the Apriori algorithm on the following data set with minimum support value and minimum confidence value set as 50% and 75% respectively. (DEC-2018) 07


13 Consider the following database of ten transactions. Let min_sup = 30% and min_confidence = 60%. (MAY-2019) 07
A) Find all frequent itemsets using the Apriori algorithm.
B) Generate strong association rules.
14 Write and discuss the algorithm which is used to generate frequent itemsets using an iterative level-wise approach based on candidate generation. (MAY-2019) 07
15 Discuss the hash-based technique to improve efficiency of the Apriori algorithm. (MAY-2019) 04
16 Consider the following dataset and find frequent itemsets and generate association rules for them using the Apriori algorithm. Minimum support count is 2 and minimum confidence is 60%. (DEC-2019) 07

17 What is market basket analysis? Explain the two measures of rule interestingness: support and confidence. (DEC-2019)
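
The level-wise Apriori procedure asked for in this unit can be traced on the nine-transaction data set (T100 to T900) with a minimum support count of 2. The following is a minimal Python sketch under stated assumptions: the function names `apriori` and `association_rules` are hypothetical, and the 70% minimum confidence used for rule generation is an illustrative value, since the MAY 2017 question does not specify one.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Return {frequent itemset: support count} using level-wise search."""
    transactions = [frozenset(t) for t in transactions]

    def support_count(itemset):
        return sum(1 for t in transactions if itemset <= t)

    # Level 1: frequent single items
    items = {item for t in transactions for item in t}
    level = {frozenset([i]): support_count(frozenset([i])) for i in items}
    level = {s: c for s, c in level.items() if c >= min_count}

    frequent = {}
    k = 1
    while level:
        frequent.update(level)
        k += 1
        previous = list(level)
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items
        candidates = {a | b for a in previous for b in previous if len(a | b) == k}
        # Prune step (Apriori property): every (k-1)-subset must be frequent
        candidates = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
        counted = {c: support_count(c) for c in candidates}
        level = {c: n for c, n in counted.items() if n >= min_count}
    return frequent


def association_rules(frequent, min_conf):
    """Return (antecedent, consequent, confidence) for rules meeting min_conf."""
    rules = []
    for itemset, count in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                lhs = frozenset(lhs)
                # confidence(A -> B) = support(A u B) / support(A)
                confidence = count / frequent[lhs]
                if confidence >= min_conf:
                    rules.append((lhs, itemset - lhs, confidence))
    return rules


dataset = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]

frequent = apriori(dataset, min_count=2)
rules = association_rules(frequent, min_conf=0.7)  # 0.7 is an assumed threshold

for itemset, count in sorted(frequent.items(), key=lambda x: (len(x[0]), sorted(x[0]))):
    print(sorted(itemset), count)
for lhs, rhs, conf in rules:
    print(sorted(lhs), "->", sorted(rhs), f"{conf:.0%}")
```

On this data set the sketch finds the two frequent 3-itemsets {I1, I2, I3} and {I1, I2, I5}, each with support count 2. The candidate {I1, I3, I5} is pruned because its subset {I3, I5} is not frequent. Changing `min_count` or `min_conf` lets the same script be reused for the other Apriori questions once their transaction tables are entered.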

6. Classification and Prediction [CO-2]

1 With the help of a neat diagram explain the topology of a multilayer, feed-forward Neural Network. Also explain the terms “activation function” and “epoch”. (NOV 2016) 07
2 Briefly explain Linear and Non-linear regression(NOV 2016) 07

3 Explain the steps of the ID3 algorithm for generating Decision trees (NOV 2016) 07

4 Explain the methods to measure the accuracy of a classifier or predictor(NOV 2016) 07

5 What is a Decision Tree? Explain how classification is done using decision tree induction. (MAY 2017) 07
6 Explain Prepruning and Postpruning with an example. (NOV 2017) 03
7 Why is naïve Bayesian classification called “naïve”? Describe naïve Bayesian classification with example. (NOV 2017) 07
8 What is classification and prediction? List out issues regarding classification and prediction. (NOV 2017) 03
9 Do feature-wise comparison between classification and prediction. (MAY-2018) 03
10 Differentiate between Overfitting and Tree Pruning w.r.t. the following parameters: i) definition figure ii) use in particular situation iii) limitation. (MAY-2018) 03
11 Generate decision tree using CART algorithm for the following dataset.(MAY-2018) 07


12 Define linear and nonlinear regression using figures. Calculate the value of Y for X = 100 based on the linear regression prediction method. (MAY-2018) 07
13 Calculate the weights using a single-layer perceptron neural network model. The three inputs are x0, x1, x2; the bias and weights are as follows: 04
w1(0) = 30, w2(0) = 300
b(0) = 50, η = 0.01, x0 = +1
Activation function:
sgn(x) = +1, if x >= 0
sgn(x) = -1, if x < 0
(a) Calculate x2 for x1 = 100 and 200.
(b) For bias b(0) = -1230, recalculate the weights w1 and w2. (MAY-2018)
14 Draw the topology of a multilayer, feed-forward Neural Network.(DEC-2018) 03

15 Explain Linear regression with example.(DEC-2018) 04

16 Discuss the following terms: 1) Supervised learning 2) Correlation analysis 3) Tree pruning (MAY-2019) 03
17 Discuss various layers of multilayer feed-forward neural network with diagram. (MAY-2019) 03
18 Using the Naive Bayesian classification method, predict the class label of X = (age = youth, income = medium, student = yes, credit_rating = fair) using the following training dataset. (MAY-2019) 07

19 Explain various conflict resolution strategies in rule based classification.(MAY-2019) 03


20 What is classification? Explain classification as a two-step process with diagram. (MAY-2019) 04
21 What do you mean by learning-by-observation? Explain the k-Means clustering algorithm in detail. (MAY-2019) 07
22 Briefly outline the major steps of decision tree classification. Why is tree pruning useful in decision tree induction? (DEC-2019) 04
23 Draw the topology of a multilayer, feed-forward Neural Network. (DEC-2019) 03
24 Briefly explain Linear and Non-linear regression. (DEC-2019) 04

7. Data Mining for Business Intelligence Applications [CO-2]

1 Briefly explain the life-cycle of Data Analytics and discuss the role of data scientists. (NOV 2016) 07
2 Discuss applications of data mining in Banking and Finance.(NOV 2016) 07

3 Explain data mining application for fraud detection.(NOV 2017) 04

4 How is data mining useful for Business Intelligence applications, viz. Balanced Scorecard, Fraud Detection, Clickstream Mining, Market Segmentation, retail industry, telecommunications industry, banking & finance and CRM? (MAY-2018) 07
5 Discuss fraud detection and click-stream analysis using data mining. (MAY-2019) 07

6 Explain text mining using example. (DEC-2019) 03

7 Explain data mining application for fraud detection. (DEC-2019) 04

8. Advanced topics [CO-2]

1 What is meant by “clustering”? Explain why clustering is called unsupervised learning. Mention any two applications of clustering. (NOV 2016) 07
2 Discuss the main features of Hadoop Distributed File System.(NOV 2016) 07

3 Explain Hadoop Architecture.(MAY 2017)(MAY-2018) 07/04

4 What is Big Data? What is big data analytics? Explain the big data distributed file system. (MAY 2017)(DEC-2018) 07
5 What is outlier analysis? Why is outlier mining important? Briefly describe the different approaches: statistical-based outlier detection, distance-based outlier detection and deviation-based outlier detection. (MAY 2017) 07
6 Define Big Data. Discuss various applications of Big Data(NOV 2017) 04

7 Explain Hadoop storage – HDFS.(NOV 2017) 07

8 Explain basic concepts of text mining and web mining.(NOV 2017) 07

9 Explain Spatial mining using example.(MAY-2018) 03

10 Explain text mining using example.(MAY-2018)(DEC-2018) 03

11 Explain big data and big data analytics. Explain key roles and their responsibilities for a successful analytics project. (MAY-2018) 04
12 Calculate 2 clusters using the k-means clustering algorithm. Use Euclidean distance for finding the distance. Assume mean1 as subject1 and mean2 as subject4. (MAY-2018) 07


13 Explain web mining using example.(MAY-2018) 03

14 Explain mapreduce. Explain any example using mapreduce.(MAY-2018) 07

15 How K-Mean clustering method differs from K-Medoid clustering method?(DEC-2018) 03

16 What is web log? Explain web structure mining and web usage mining in detail(DEC-2018) 07

17 Discuss Big Data.(MAY-2019) 03

18 Discuss the following terms: 1) DataNode 2) NameNode 3) Text mining (MAY-2019) 03
19 Define “clustering”. Mention any two applications of clustering. (DEC-2019) 03
20 Briefly explain the life-cycle of Data Analytics and discuss the role of data scientists. (DEC-2019) 07
21 Discuss the main features of Hadoop Distributed File System. (DEC-2019) 07
