This Is the Final Year Question Bank

Attribution Non-Commercial (BY-NC)


(AN AUTONOMOUS INSTITUTION)

AICTE & GOVT.OF KARNATAKA)

Sem: 7th Credits: 3

Dept: ISE

UNIT-I

1. What is Data Mining? Explain the process of Knowledge Discovery in Databases (KDD) with a diagram

2. What are the different motivating challenges faced by Data Mining Algorithms? Explain each of them

7. In case of record data, what are transaction / market basket data, Data Matrix and Sparse Data Matrix? Explain with examples.

8. In case of ordered data, explain Sequential Data, Sequence Data, Time Series Data and Spatial Data with examples

9. What do you mean by Data Preprocessing? Explain Aggregation and Sampling in this respect

11. What are the different variations of Graph Data? Explain with diagrams

12. What is Feature Subset Selection? What are the different approaches for doing this? Explain the architecture of Feature Subset Selection with a diagram

i) Feature Extraction


NMIT, Bangalore Data Mining Question Bank Dept of ISE

iii) Feature Construction

14. What do you mean by Binarization? Explain the conversion of a Categorical Attribute to 3 binary attributes. What is its drawback? How is it overcome?

15. How is Discretization of Continuous Attributes done? In this regard, explain unsupervised and supervised Discretization.

ii) Normalization/Standardization

20. What are Discrete and Continuous Attributes? Explain the term resolution.

21. What is the curse of Dimensionality? Explain Data Quality issues related to applications

UNIT-II

1. Give the formal definition of classification. What is a classification model? Explain with a diagram

2. With a diagram, explain the general approach for building a classification model

3. For the Nodes N1 & N2 given below, calculate the Gini Index, Entropy and Classification Error.

Node N1    Count
Class=0    0
Class=1    6

Node N2    Count
Class=0    1
Class=1    5
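As a rough check for question 3, the three impurity measures can be computed directly from the class counts of each node. This is a minimal Python sketch; the function name `impurity` is illustrative and not part of the question bank, and the term 0 * log2(0) is treated as 0 by convention.

```python
from math import log2

def impurity(counts):
    """Return (gini, entropy, classification_error) for a list of class counts."""
    total = sum(counts)
    probs = [c / total for c in counts]
    gini = 1.0 - sum(p * p for p in probs)
    # Skip empty classes: 0 * log2(0) is taken to be 0.
    entropy = sum(-p * log2(p) for p in probs if p > 0)
    error = 1.0 - max(probs)
    return gini, entropy, error

n1 = impurity([0, 6])  # Node N1: Class=0 has 0 records, Class=1 has 6
n2 = impurity([1, 5])  # Node N2: Class=0 has 1 record, Class=1 has 5
print("N1:", n1)
print("N2:", n2)
```

N1 is a pure node, so all three measures are zero; N2 gives a Gini index of 10/36, an entropy of about 0.65 and a classification error of 1/6.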

4. What is a confusion matrix? Explain the confusion matrix for a 2-class problem with an example. In this regard, explain Accuracy and error rate of prediction with appropriate formulae

11. Calculate the Gini Index for Attributes A and B given below and specify which attribute is better for splitting.

Attribute A            Attribute B
C0: 4    C0: 2         C0: 1    C0: 5
C1: 3    C1: 3         C1: 4    C1: 2
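A possible way to check question 11 is to compute the weighted Gini index of each split. The sketch below assumes that each attribute produces two child nodes, read column by column from the table (A: (C0=4, C1=3) and (C0=2, C1=3); B: (C0=1, C1=4) and (C0=5, C1=2)); the function names are illustrative.

```python
def gini(counts):
    """Gini index of one node given its class counts."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

def split_gini(children):
    """Weighted average Gini index over the child nodes of a split."""
    n = sum(sum(child) for child in children)
    return sum(sum(child) / n * gini(child) for child in children)

split_a = split_gini([(4, 3), (2, 3)])  # children of attribute A (assumed layout)
split_b = split_gini([(1, 4), (5, 2)])  # children of attribute B (assumed layout)
best = "A" if split_a < split_b else "B"
print(round(split_a, 4), round(split_b, 4), best)
```

Under that reading, the split with the lower weighted Gini index is the better one.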

12. What is a rule-based classifier? Explain how it works with an example. In this regard, also define accuracy and coverage

13. Consider a training set that contains 60 positive examples and 100 negative examples. Suppose two rules are given:

For the above two rules, calculate Laplace, accuracy, coverage and likelihood ratio.


14. Explain the characteristics of a rule-based classifier

15. How can a decision tree be converted into classification rules? Explain with example.

19. Consider a training set that contains 100 positive examples and 400 negative examples. For each of the following candidate rules

Determine which are the best and worst candidate rules according to:

20. Consider a training set that contains 29 positive examples and 21 negative examples. For each of the

Determine which are the best and worst candidate rules according to:

21. For the following confusion matrix, calculate the Accuracy and Error rate:

                      Predicted Class
                      Class=1    Class=0
Actual    Class=1     15         10
Class     Class=0     20         11
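For question 21, accuracy and error rate follow from the four cells of the matrix. The sketch below is only an illustration; it assumes Class=1 is the positive class, so that the cells map to TP=15, FN=10, FP=20 and TN=11, and the function name is hypothetical.

```python
def accuracy_and_error(tp, fn, fp, tn):
    """Accuracy = correct predictions / all predictions; error rate = 1 - accuracy."""
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total
    return accuracy, 1.0 - accuracy

# Cells taken from the confusion matrix, with Class=1 assumed to be positive.
acc, err = accuracy_and_error(tp=15, fn=10, fp=20, tn=11)
print(round(acc, 4), round(err, 4))
```

With these counts, 26 of the 56 records are classified correctly.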

22. Consider the following table with attributes A, B, C and two class labels +, -

A  B  C    Number of Instances
           +      -
T  T  T    5      0
F  T  T    0      20
T  F  T    20     0
F  F  T    0      5
T  T  F    0      0
F  T  F    25     0
T  F  F    0      0
F  F  F    0      25

According to the classification error rate, which attribute would be chosen as the best splitting attribute?
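One way to approach question 22 is to group the instances by the value of each attribute and count how many records the majority class of each group would misclassify. The sketch below is illustrative only; the dictionary `rows` copies the table, and the helper name `split_error` is an assumption.

```python
# "ABC" value string -> (number of '+', number of '-'), copied from the table
rows = {
    "TTT": (5, 0), "FTT": (0, 20), "TFT": (20, 0), "FFT": (0, 5),
    "TTF": (0, 0), "FTF": (25, 0), "TFF": (0, 0), "FFF": (0, 25),
}

def split_error(attr_index):
    """Classification error rate after a two-way split on one attribute."""
    groups = {}
    for key, (pos, neg) in rows.items():
        p, n = groups.get(key[attr_index], (0, 0))
        groups[key[attr_index]] = (p + pos, n + neg)
    total = sum(p + n for p, n in groups.values())
    # Each child predicts its majority class, so the minority count is the error.
    misclassified = sum(min(p, n) for p, n in groups.values())
    return misclassified / total

errors = {name: split_error(i) for i, name in enumerate("ABC")}
best = min(errors, key=errors.get)
print(errors, best)
```

The attribute with the lowest error rate after splitting is the one chosen under this criterion.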

UNIT-III

1. How is market basket data represented in a binary format? Explain with example. In this case explain the terms itemset, association rule, support count, support and confidence

4. What is frequent itemset generation? Generate candidate 3-itemsets for the following data by applying the Apriori principle, taking a minimum support threshold of 60%

TID    Items
1      {Bread, Milk}
2      {Bread, A, B, C}
3      {Milk, A, B, D}
4      {Bread, Milk, A, B}
5      {Bread, Milk, A, D}
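The level-wise generation asked for in question 4 can be sketched as below. This is a simplified illustration rather than the textbook pseudocode: candidates are formed by taking unions of two frequent (k-1)-itemsets and then pruned with the Apriori principle. The function names are hypothetical, and a 60% threshold on 5 transactions is taken as a minimum support count of 3.

```python
from itertools import combinations

transactions = [
    {"Bread", "Milk"},
    {"Bread", "A", "B", "C"},
    {"Milk", "A", "B", "D"},
    {"Bread", "Milk", "A", "B"},
    {"Bread", "Milk", "A", "D"},
]
min_count = 3  # 60% of 5 transactions

def support_count(itemset):
    return sum(1 for t in transactions if itemset <= t)

def apriori_levels():
    """Return a list of (candidates, frequent) pairs, one per itemset size."""
    items = sorted({i for t in transactions for i in t})
    candidates = [frozenset([i]) for i in items]
    levels = []
    k = 1
    while candidates:
        frequent = [c for c in candidates if support_count(c) >= min_count]
        levels.append((candidates, frequent))
        k += 1
        freq_set = set(frequent)
        # Join: unions of two frequent (k-1)-itemsets that contain exactly k items.
        joined = {a | b for a in frequent for b in frequent if len(a | b) == k}
        # Prune: keep a candidate only if every (k-1)-subset is frequent.
        candidates = sorted(
            (c for c in joined
             if all(frozenset(s) in freq_set for s in combinations(c, k - 1))),
            key=sorted,
        )
    return levels

levels = apriori_levels()
for size, (cands, freq) in enumerate(levels, start=1):
    print(size, [sorted(c) for c in cands], [sorted(f) for f in freq])
```

On this data the only candidate 3-itemset that survives pruning is {A, Bread, Milk}, and its support count of 2 falls below the threshold.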


5. Write the algorithm for Frequent itemset generation of the Apriori algorithm

6. How is support counting done using a Hash tree? Explain with example

7. How are candidates generated using Lexicographic ordering? Explain with example.

8. What is candidate generation and pruning? Explain Fk-1 x F1 and Fk-1 x Fk-1 methods of candidate generation with examples.

9. What are the factors that affect the computation complexity of the Apriori algorithm? Explain

14. What are the alternative methods for generating frequent itemsets? Explain

15. Explain relationships among frequent, maximal frequent and closed frequent itemsets with diagram

16. Explain the DFS and BFS methods of generating frequent itemsets with examples.

17. What are the two ways in which a transaction data set can be represented? Explain with example

0001 {a,d,e}

0024 {a,b,c,e}

0012 {a,b,d,e}

0031 {a,c,d,e}

0015 {b,c,e}

0022 {b,d,e}

0029 {c,d}

0040 {a,b,c}

0030 {a,d,e}

0038 {a,b,e}

i) Compute the support count for itemsets {e}, {b,d} and {b,d,e}
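The support counts in part i) can be checked with a few lines of Python. The sketch assumes that each transaction ID in the listing above is treated as one market basket (the heading of the question is missing from this copy); the helper name `support_count` is illustrative.

```python
# Transaction ID -> items, copied from the listing above
transactions = {
    "0001": {"a", "d", "e"},
    "0024": {"a", "b", "c", "e"},
    "0012": {"a", "b", "d", "e"},
    "0031": {"a", "c", "d", "e"},
    "0015": {"b", "c", "e"},
    "0022": {"b", "d", "e"},
    "0029": {"c", "d"},
    "0040": {"a", "b", "c"},
    "0030": {"a", "d", "e"},
    "0038": {"a", "b", "e"},
}

def support_count(itemset):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for items in transactions.values() if itemset <= items)

counts = [support_count(s) for s in ({"e"}, {"b", "d"}, {"b", "d", "e"})]
print(counts)
```

Dividing each count by the 10 transactions gives the corresponding support.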


1 {a,b,d,e}

2 {b,c,d}

3 {a,b,d,e}

4 {a,c,d,e}

5 {b,c,d,e}

6 {b,d,e}

7 {c,d}

8 {a,b,c}

9 {a,d,e}

10 {b,d}

i) What is the maximum number of association rules that can be extracted from this data?

ii) What is the maximum number of frequent itemsets that can be extracted (including null set)?

iii) Generate candidate 1-itemsets, 2-itemsets and 3-itemsets assuming a support threshold of 60% using Apriori algorithm

i. Equivalence classes

UNIT-IV

2. How are frequent itemsets generated using FP-Tree Algorithm? Explain with example.

5. For the following tables, Calculate the Interest Factor, ø-correlation coefficient and IS Measure

         p      not p
q        880    50       930
not q    50     20       70
         930    70       1000

         r      not r
s        20     50       70
not s    50     880      930
         70     930      1000

6. How can Objective Measures be extended beyond pairs of Binary Variables? Explain with contingency table
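For the two tables of question 5, the three measures can be computed from the four cell counts. The sketch below is illustrative; it uses the usual 2x2 notation f11, f10, f01, f00, and it assumes the lower-right cell of the p/q table is 20, the value implied by the row and column totals of 70.

```python
from math import sqrt

def measures(f11, f10, f01, f00):
    """Return (interest factor, phi coefficient, IS measure) for a 2x2 table."""
    n = f11 + f10 + f01 + f00
    row1, col1 = f11 + f10, f11 + f01  # marginals of the "present" row/column
    row0, col0 = f01 + f00, f10 + f00  # marginals of the "absent" row/column
    interest = n * f11 / (row1 * col1)
    phi = (f11 * f00 - f10 * f01) / sqrt(row1 * col1 * row0 * col0)
    is_measure = f11 / sqrt(row1 * col1)
    return interest, phi, is_measure

pq = measures(880, 50, 50, 20)  # table for p and q (cell 20 assumed from totals)
rs = measures(20, 50, 50, 880)  # table for r and s
print([round(v, 4) for v in pq])
print([round(v, 4) for v in rs])
```

The two tables share the same ø-coefficient while their interest factor and IS measure differ, which is the point the comparison is meant to bring out.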

8. Calculate ø-correlation coefficient, IS Measure, Interest Factor and Confidence for the rule

{Tea} -> {Coffee} for the following table

       Coffee    not Coffee

Tea 50 30 800

11. For the following contingency tables compute support, the interest measure, and the ø-correlation coefficient, for the association patterns {A,B}. Also compute the confidence of rules A -> B and B -> A. Is confidence a symmetric measure?


         B     not B                B     not B
A        9     1           A        89    1
not A    1     89          not A    1     9

        Exercise Machine
HDTV    Yes    No     Total
Yes     99     81     180
No      54     66     120

Calculate:


c) Explain the inversion and scaling properties of Objective Measures with examples (6)

Customer Group      HDTV    Exercise Machine
                            Yes    No     Total
College Students    Yes     1      9      10
                    No      4      30     34
Working Adult       Yes     98     72     170
                    No      50     36     86

Compute:

Students

ii) ø-correlation coefficient, IS Measure, Interest Factor when Customer Group=Working Adult

iii) Calculate Confidence for the rules {HDTV=Yes} -> {Exercise Machine=Yes} and {HDTV=No} -> {Exercise Machine=Yes} when Customer Group=College Students and when Customer Group=Working Adult
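The confidences in part iii) can be computed per customer group and, for comparison, on the pooled counts. The sketch below is illustrative; the dictionary layout and the helper name `confidence` are assumptions, and the counts are copied from the stratified table.

```python
# (HDTV, Exercise Machine) -> count, per customer group
groups = {
    "College Students": {("Yes", "Yes"): 1, ("Yes", "No"): 9,
                         ("No", "Yes"): 4, ("No", "No"): 30},
    "Working Adult": {("Yes", "Yes"): 98, ("Yes", "No"): 72,
                      ("No", "Yes"): 50, ("No", "No"): 36},
}

def confidence(table, hdtv):
    """Confidence of {HDTV=hdtv} -> {Exercise Machine=Yes}."""
    yes, no = table[(hdtv, "Yes")], table[(hdtv, "No")]
    return yes / (yes + no)

# Pool both groups to get the combined (unstratified) table.
combined = {k: sum(g[k] for g in groups.values()) for k in groups["College Students"]}

results = {name: (confidence(t, "Yes"), confidence(t, "No"))
           for name, t in {**groups, "Combined": combined}.items()}
for name, (c_yes, c_no) in results.items():
    print(name, round(c_yes, 4), round(c_no, 4))
```

Within each group the rule with HDTV=No has the higher confidence, while on the pooled counts the order reverses; this reversal is usually discussed as Simpson's paradox.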

1 {a,b,d,e}

2 {b,c,d}

3 {a,b,d,e}

4 {a,c,d,e}

5 {b,c,d,e}

6 {b,d,e}

7 {c,d}

8 {a,b,c}

9 {a,d,e}

10 {b,d}

17. Identify the frequent itemsets in the above transactions using FP-Tree Algorithm


                  A
               1      0
C=0    B   1   0      15
           0   15     30
C=1    B   1   5      0
           0   0      15

20. What do you mean by Timing Constraints with regard to Sequential Patterns?

21. Draw contingency tables for the rules {b} -> {c} and {a} -> {d} using the transactions shown below:

1 {a,b,d,e}

2 {b,c,d}

3 {a,b,d,e}

4 {a,c,d,e}

5 {b,c,d,e}

6 {b,d,e}

7 {c,d}

8 {a,b,c}

9 {a,d,e}

10 {b,d}

Using these contingency tables, compute the ø-correlation coefficient, IS Measure, Interest Factor and Confidence for the two rules
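The contingency tables for question 21 can be tallied directly from the ten transactions. The sketch below is illustrative; the helper name `contingency` is an assumption, and the cells are returned in the order f11, f10, f01, f00 (antecedent present or absent, then consequent present or absent).

```python
# Transactions 1 to 10 from the listing above
transactions = [
    set("abde"), set("bcd"), set("abde"), set("acde"), set("bcde"),
    set("bde"), set("cd"), set("abc"), set("ade"), set("bd"),
]

def contingency(x, y):
    """Return (f11, f10, f01, f00) for the rule {x} -> {y}."""
    f11 = f10 = f01 = f00 = 0
    for t in transactions:
        if x in t and y in t:
            f11 += 1
        elif x in t:
            f10 += 1
        elif y in t:
            f01 += 1
        else:
            f00 += 1
    return f11, f10, f01, f00

b_c = contingency("b", "c")
a_d = contingency("a", "d")
conf_b_c = b_c[0] / (b_c[0] + b_c[1])  # confidence of {b} -> {c}
conf_a_d = a_d[0] / (a_d[0] + a_d[1])  # confidence of {a} -> {d}
print(b_c, a_d, conf_b_c, conf_a_d)
```

The remaining measures follow from the same four counts using their standard 2x2 formulas.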

UNIT-V

5. With respect to K-Means algorithm, explain how points are assigned to closest centroid using SSE for centroid calculation
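One K-Means iteration (assign each point to its closest centroid, recompute centroids as cluster means, then evaluate the SSE) can be sketched as follows. The points and initial centroids are hypothetical values chosen for illustration, squared Euclidean distance is assumed as the proximity measure, and the sketch assumes no cluster ends up empty.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def assign(points, centroids):
    """Index of the closest centroid for every point."""
    return [min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in points]

def update(points, labels, k):
    """New centroid of each cluster = mean of its points (minimises the SSE)."""
    centroids = []
    for j in range(k):
        members = [p for p, label in zip(points, labels) if label == j]
        centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
    return centroids

def sse(points, labels, centroids):
    """Sum of squared distances of points to their assigned centroid."""
    return sum(sq_dist(p, centroids[label]) for p, label in zip(points, labels))

points = [(1.0, 1.0), (1.0, 2.0), (5.0, 5.0), (6.0, 5.0)]  # hypothetical data
centroids = [(0.0, 0.0), (6.0, 6.0)]                       # hypothetical start
labels = assign(points, centroids)
centroids = update(points, labels, k=2)
total_sse = sse(points, labels, centroids)
print(labels, centroids, total_sse)
```

Repeating the assign and update steps until the labels stop changing gives the basic algorithm.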

6. In K-Means Algorithm, How are initial centroids chosen? Explain with diagram

7. Give a table listing common choices for Proximity, Centroids and Objective Functions with respect to K-Means Algorithm


8. Comment on Time and Space Complexity of K-Means Algorithm

12. Write and explain Basic Agglomerative Hierarchical Clustering Algorithm. How is proximity between clusters defined?

13. Comment on the Time and Space Complexity of Agglomerative Hierarchical Clustering algorithm.

16. Explain the Single Link or MIN method of Hierarchical Clustering with example

17. Explain Complete Link or MAX method of Hierarchical Clustering with example
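The difference between the MIN and MAX definitions of cluster proximity in questions 16 and 17 can be illustrated with a short sketch. The two clusters below are hypothetical, Euclidean distance is assumed, and `math.dist` requires Python 3.8 or later.

```python
from math import dist

def single_link(c1, c2):
    """MIN: distance between the two closest points of the clusters."""
    return min(dist(p, q) for p in c1 for q in c2)

def complete_link(c1, c2):
    """MAX: distance between the two farthest points of the clusters."""
    return max(dist(p, q) for p in c1 for q in c2)

c1 = [(0.0, 0.0), (1.0, 0.0)]  # hypothetical cluster
c2 = [(4.0, 0.0), (6.0, 0.0)]  # hypothetical cluster
d_min = single_link(c1, c2)
d_max = complete_link(c1, c2)
print(d_min, d_max)
```

In agglomerative clustering, the pair of clusters with the smallest such proximity is merged at each step.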

19. How are points classified according to Center-Based Density in the DBSCAN algorithm? Explain with diagrams and example
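The classification of points into core, border and noise points can be sketched as below. The data set, `eps` and `min_pts` are hypothetical, and the sketch assumes that a point counts itself in its own neighbourhood and is a core point when that count is at least `min_pts`.

```python
from math import dist

def classify(points, eps, min_pts):
    """Label every point as 'core', 'border' or 'noise'."""
    # Neighbourhood of a point: all points within eps, including itself.
    neighbors = [[j for j, q in enumerate(points) if dist(p, q) <= eps]
                 for p in points]
    core = {i for i, nb in enumerate(neighbors) if len(nb) >= min_pts}
    labels = []
    for i, nb in enumerate(neighbors):
        if i in core:
            labels.append("core")
        elif any(j in core for j in nb):
            labels.append("border")  # not dense itself, but near a core point
        else:
            labels.append("noise")
    return labels

points = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 1), (8, 8)]  # hypothetical data
labels = classify(points, eps=1.0, min_pts=3)
print(labels)
```

Here the four points of the unit square are core points, (2, 1) is a border point because it lies within `eps` of a core point, and (8, 8) is noise.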
