M.E. DEGREE EXAMINATION, NOVEMBER/DECEMBER 2016.
Elective
Biometrics and Cyber Security
CP 7025 - DATA MINING TECHNIQUES
(Common to M.E. Computer Science and Engineering)
(Regulations 2013)
Time : Three hours    Maximum : 100 marks
Answer ALL questions.

PART A (10 x 2 = 20 marks)

1. What is interactive mining?
2. How is the quality of data defined?
3. Differentiate models and patterns.
4. Define probability distribution function.
5. Why are decision tree classifiers so popular?
6. What is back propagation?
7. What is the divisive hierarchical clustering method?
8. Define geodesic distance.
9. What is information retrieval?
10. What is ubiquitous data mining?

PART B (5 x 13 = 65 marks)

11. (a) (i) Write short notes on mining methodology. (6)
        (ii) Explain why data preprocessing is necessary and write about the major tasks in data preprocessing. (7)
    Or
    (b) (i) What are the major challenges of mining a huge amount of data in comparison with mining a small amount of data? (6)
        (ii) Discuss the issues to be considered in data integration. (7)

12. (a) (i) Write short notes on score functions for data mining algorithms. (6)
        (ii) Explain models for probability density functions. (7)
    Or
    (b) (i) What is modeling? Explain the model structure for prediction. (6)
        (ii) Differentiate predictive and descriptive score functions. (7)

13. (a) Explain the working of the Naive Bayesian classifier, along with an example. (13)
    Or
    (b) Explain support vector machines in detail. (13)

14. (a) (i) Write and explain a density-based clustering algorithm. (6)
        (ii) List the major tasks involved in clustering evaluation. (7)
    Or
    (b) (i) Briefly describe and give examples of various clustering methods. (6)
        (ii) Write about the problems, challenges and major methodologies of clustering high-dimensional data. (7)

15. (a) Write about data mining for intrusion detection and prevention. (13)
    Or
    (b) (i) Discuss the kinds of associations that can be mined in multimedia data. (6)
        (ii) Explain the measures used for assessing the quality of text retrieval. (7)

17301
PART C (1 x 15 = 15 marks)

16. (a) The following table consists of training data from matches played by Novak and Rafael.

        [Table garbled in source. Surviving entries, in order: Court Surface, Grass, Clay, Hard, Friendly, Mixed, Master, Clay, Grass, Master, Grand slam, Best effort.]

        Best effort has value 1 if Novak used full strength in the match and 0 otherwise. In Outcome, N indicates that Novak won and R indicates that Rafael won.
        Build a decision tree using the data. Extract classification rules from the decision tree.
    Or
    (b) Cluster the following eight points (with (x, y) representing locations) into three clusters:
        A1(2,10), A2(2,5), A3(8,4), A4(6,8), A6(6,4), A7(1,2), A8(4,0).
        Initial cluster centers are A1(2,10), A4(6,8) and A7(1,2). The distance function between two points a = (x1, y1) and b = (x2, y2) is defined as ρ(a, b) = |x2 - x1| + |y2 - y1|.
        (i) Use the k-means algorithm to find the three cluster centers after the [illegible] iterations. (10)
        (ii) Develop an algorithm for the k-means clustering technique. (6)
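
The k-means question above can be explored with a short sketch. This is not a model answer, only one possible reading of the task, written in Python. It assumes the seven points that are legible in the question (the eighth is missing from this copy), the stated initial centers A1, A4 and A7, the distance |x2 - x1| + |y2 - y1| for the assignment step, and the ordinary coordinate-wise mean for the update step. The function names are illustrative and do not come from the paper.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]

# Points as legible in the question; the eighth point is missing in this copy.
POINTS: Dict[str, Point] = {
    "A1": (2, 10), "A2": (2, 5), "A3": (8, 4), "A4": (6, 8),
    "A6": (6, 4), "A7": (1, 2), "A8": (4, 0),
}
INITIAL_CENTERS: List[Point] = [POINTS["A1"], POINTS["A4"], POINTS["A7"]]


def manhattan(a: Point, b: Point) -> float:
    """Distance from the question: |x2 - x1| + |y2 - y1|."""
    return abs(b[0] - a[0]) + abs(b[1] - a[1])


def assign(points: List[Point], centers: List[Point]) -> List[int]:
    """Index of the nearest center for every point (ties go to the lowest index)."""
    return [
        min(range(len(centers)), key=lambda k: manhattan(p, centers[k]))
        for p in points
    ]


def update(points: List[Point], labels: List[int], centers: List[Point]) -> List[Point]:
    """New center = mean of the assigned points (assumed update rule).

    A cluster that received no points keeps its previous center.
    """
    new_centers: List[Point] = []
    for k, old in enumerate(centers):
        members = [p for p, lab in zip(points, labels) if lab == k]
        if members:
            xs = sum(p[0] for p in members) / len(members)
            ys = sum(p[1] for p in members) / len(members)
            new_centers.append((xs, ys))
        else:
            new_centers.append(old)
    return new_centers


def kmeans_step(points: List[Point], centers: List[Point]):
    """One assignment + update pass."""
    labels = assign(points, centers)
    return labels, update(points, labels, centers)


def kmeans(points: List[Point], centers: List[Point], max_iter: int = 100):
    """Repeat steps until the assignment stops changing or max_iter is reached."""
    labels: List[int] = []
    for _ in range(max_iter):
        new_labels, centers = kmeans_step(points, centers)
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centers


if __name__ == "__main__":
    pts = list(POINTS.values())
    labels, centers = kmeans(pts, INITIAL_CENTERS)
    for name, lab in zip(POINTS, labels):
        print(name, "-> cluster", lab)
    print("centers:", centers)
```

Calling `kmeans_step` repeatedly gives the centers after any chosen number of iterations, while `kmeans` runs until the assignments stop changing. Because the update uses the mean rather than the median, this is standard k-means with a Manhattan assignment step, which is how the question appears to be posed.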
