
IMPORTANT QUESTIONS OF UNIT-I TO UNIT-IV

Prepare the three-star (***) questions for the external lab exam, and prepare the remaining questions for the end external exams.

***1. Define the terms data mining and data warehousing?
***2. Explain the fundamentals of data mining?
***3. Explain data mining functionalities?
***4. Explain the major issues of data mining?
***5. Explain the classification of data mining systems?
***6. Explain data preprocessing techniques?
***7. Explain data cleaning and data integration?
***8. Explain data transformation and data reduction?
***9. Explain data mining task primitives?
10. Explain discretization and concept hierarchy generation?
11. Explain the multidimensional data model?
12. Define the terms lattice of cuboids, OLAM, MOLAP, HOLAP, and ROLAP?
13. Explain the slice, dice, and pivot operations?
***14. Compare OLAP vs. OLTP?
15. Or differentiate an operational database from a data warehouse?
***16. Explain the importance of a Data Mining Query Language?
***17. Explain star schema, snowflake schema, and fact constellation schema?
18. Explain the major types of concept hierarchies?
19. Explain central tendency? (not in the syllabus, but a previous JNTU question)
20. Explain dispersion of data? (not in the syllabus, but a previous JNTU question)
***21. Explain data warehouse architecture?
***22. Explain data warehouse implementation?
23. Explain further development of data cube technology?

24. Explain the attribute-oriented induction technique?
25. Explain mining frequent patterns?
25. Define association rule and explain the types of association rules?
26. Explain constraint-based association mining?
***27. Explain the Apriori algorithm?
***28. Explain the FP-growth algorithm?
29. Explain classification and prediction techniques?
30. Explain the Bayesian classification technique?
31. Explain support vector machines?
***32. Explain the backpropagation algorithm?
33. Explain support vector machines?
34. Explain other classification methods?
***35. Explain decision tree and rule-based classification?
***36. Explain the steps of KDD?

Viva questions

1. Data mining
2. Data warehousing
3. KDD
4. KDD steps
5. Preprocessing
6. Data cleaning
7. Data integration
8. Data transformation
9. Data reduction
10. Cluster
11. Principle of clustering
12. Regression
13. Classification
14. Prediction
15. OLTP
16. OLAP
17. Data mining applications
18. WEKA
19. ARFF
20. CSV
21. Cross-validation
22. Decision tree algorithms
23. Visualization tools
24. Data mining tools
25. Data warehousing tools
26. Categorical attribute
27. DMQL
28. What is accuracy?
29. Redundancy
30. Types of databases
31. Types of data mining
32. Graph mining
33. Spatial and multimedia data mining
34. Sequence data mining and time-series data mining
35. Ranking mechanism

36. Process of creating an ARFF file?

1. a. List all the categorical (or nominal) attributes and the real-valued attributes separately from the credit risk assessment ARFF file.
b. *** Explain the fundamentals of data mining?

2. a. What attributes do you think might be crucial in making the credit assessment? Come up with some simple rules in plain English using your selected attributes.
b. *** Explain data mining functionalities?
3. a. One type of model that you can create is a decision tree. Train a decision tree using the complete dataset as the training data. Report the model obtained after training.
b. *** Explain the major issues of data mining?
4. a. Create an ARFF file for credit risk assessment, perform classification, and display the decision tree?
b. *** Explain the classification of data mining systems?
5. a. Suppose you use your above model, trained on the complete dataset, and classify credit as good/bad for each of the examples in the dataset. What percentage of examples can you classify correctly? (This is also called testing on the training set.) Why do you think you cannot get 100% training accuracy?
b. Explain data preprocessing techniques?
6. a. One approach for solving the problem encountered in the previous question is to use cross-validation. Describe briefly what cross-validation is. Train a decision tree again using cross-validation and report your results. Does your accuracy increase or decrease? Why?
b. Explain data mining task primitives?
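
Question 6a asks for a brief description of cross-validation. As a rough illustration only (this is not WEKA's implementation, and the folds here are not stratified), the sketch below splits the record indices into k folds so that every record is used for testing exactly once and for training in the other k-1 rounds. The function names, the majority-class stand-in learner, and the toy labels are all assumptions made for this example.

```python
import random
from collections import Counter


def k_fold_indices(n_records, k, seed=0):
    """Yield (train, test) index lists for plain k-fold cross-validation."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)       # shuffle once, reproducibly
    folds = [indices[i::k] for i in range(k)]  # k folds of nearly equal size
    for i, test in enumerate(folds):
        # Every fold except the i-th one is used for training.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def cross_validated_accuracy(records, labels, k, train_fn, predict_fn):
    """Train on k-1 folds, test on the held-out fold, and pool the results."""
    correct = 0
    for train, test in k_fold_indices(len(records), k):
        model = train_fn([records[i] for i in train], [labels[i] for i in train])
        correct += sum(predict_fn(model, records[i]) == labels[i] for i in test)
    return correct / len(records)


# Hypothetical stand-in learner: always predicts the most common training label.
def train_majority(records, labels):
    return Counter(labels).most_common(1)[0][0]


def predict_majority(model, record):
    return model


# Toy data (made up): 7 "good" and 3 "bad" applicants, attributes omitted.
toy_records = list(range(10))
toy_labels = ["good"] * 7 + ["bad"] * 3
accuracy = cross_validated_accuracy(
    toy_records, toy_labels, 5, train_majority, predict_majority
)
print(f"5-fold cross-validated accuracy: {accuracy:.2f}")
```

Because each record is scored by a model that never saw it during training, the pooled accuracy is usually a more honest estimate than testing on the training set, which is why it is often lower than the figure obtained in question 5a.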

7. a. Check to see if the data shows a bias against "foreign workers" (attribute 20) or "personal-status" (attribute 9). One way to do this (perhaps rather simple-minded) is to remove these attributes from the dataset and see if the decision tree created in those cases is significantly different from the full dataset case, which you have already done. To remove an attribute, you can use the Preprocess tab in WEKA's GUI Explorer. Did removing these attributes have any significant effect? Discuss.
b. Compare OLAP vs. OLTP?
Or differentiate an operational database from a data warehouse?

8. a. Another question might be: do you really need to input so many attributes to get good results? Maybe only a few would do. For example, you could try just having attributes 2, 3, 5, 7, 10, 17 (or consider your own attributes), and 21, the class attribute (naturally). Try out some combinations. (You had removed two attributes in problem 7. Remember to reload the ARFF data file to get all the attributes initially before you start selecting the ones you want.)
b. Explain the importance of a Data Mining Query Language?
9. a. Sometimes, the cost of rejecting an applicant who actually has good credit (case 1) might be higher than that of accepting an applicant who has bad credit (case 2). Instead of counting the misclassifications equally in both cases, give a higher cost to the first case (say, cost 5) and a lower cost to the second case. You can do this by using a cost matrix in WEKA. Train your decision tree again and report the decision tree and cross-validation results. Are they significantly different from the results obtained in problem 6 (using equal cost)?
b. Explain data warehouse architecture?
10. a. Do you think it is a good idea to prefer simple decision trees instead of having long, complex decision trees? How does the complexity of a decision tree relate to the bias of the model?
b. Explain the Apriori algorithm?
Explain the FP-growth algorithm?
11. a. List all the categorical (or nominal) attributes and the real-valued attributes separately from the weather data set ARFF file?

b. *** Explain the backpropagation algorithm?
12. a. What attributes do you think might be crucial in the student ARFF file? Come up with some simple rules in plain English using your selected attributes in a student ARFF file?
b. Explain decision tree and rule-based classification?
13. a. One type of model that you can create is a decision tree. Train a decision tree using the complete dataset as the training data. Report the model obtained after training from the student data set ARFF file.
b. Define the terms data mining and data warehousing?
14. a. Create an ARFF file for the employee data set, perform classification, and display the decision tree?
b. *** Explain star schema, snowflake schema, and fact constellation schema?
15. a. Differentiate an ARFF file and a CSV file with one example each, and execute them in the WEKA tool?
b. Explain the steps of knowledge discovery from data (KDD)?
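
For question 15a and the viva question on creating an ARFF file, the following is a minimal sketch of how the same small table looks in CSV and in ARFF. A CSV file carries only a header row and data rows, while an ARFF file adds a `@relation` name, one typed `@attribute` declaration per column, and a `@data` section. The helper `csv_to_arff`, the three sample rows, and the rule "a column is numeric if every value parses as a number" are assumptions made for this example; it does not handle missing values or attribute names that need quoting.

```python
import csv
import io


def _is_number(text):
    """Return True if the text parses as a number."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def csv_to_arff(csv_text, relation):
    """Convert small, clean CSV text (with a header row) to ARFF text."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    lines = ["@relation " + relation, ""]
    for col, name in enumerate(header):
        values = [row[col] for row in data]
        if all(_is_number(v) for v in values):
            # Real-valued attribute.
            lines.append("@attribute " + name + " numeric")
        else:
            # Nominal attribute: list its distinct values in braces.
            distinct = sorted(set(values))
            lines.append("@attribute " + name + " {" + ",".join(distinct) + "}")
    lines += ["", "@data"]
    lines += [",".join(row) for row in data]
    return "\n".join(lines)


# Made-up sample rows in the style of a weather data set.
SAMPLE_CSV = (
    "outlook,temperature,play\n"
    "sunny,85,no\n"
    "overcast,83,yes\n"
    "rainy,70,yes\n"
)

arff_text = csv_to_arff(SAMPLE_CSV, "weather")
print(arff_text)
```

The printed text can be saved with an `.arff` extension and opened in WEKA; the explicit attribute declarations are what let a tool distinguish nominal from numeric columns without guessing from the data.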