
1. What is a binary variable?

2. ______________ is the most popular and commonly used partitioning method.


3. The data matrix is often called ________________.
4. Which method is used to determine clusters with arbitrary shape?
5. The agglomerative approach is also called _________________________.
6. Trend analysis consists of the following components or movements: ______________.
7. A social network is a ____________________________________.
8. The ________________ method is used to determine clusters with arbitrary shape.
9. ________________ is the combination of both visualization and data mining.
10. WaveCluster uses the ________________ algorithm.
11. ____________ can be mined using graph mining techniques.
12. Huge amounts of space-related data are in ____________ forms.
13. GSP stands for _______________________________.
14. PAM stands for _______________________________.
15. DENCLUE stands for _____________________________.
16. Web linkage structure, web contents, etc. are included in _______________.
17. DBSCAN stands for ________________________.
18. WWW stands for _________________________.
19. DOM means ______________________________.
20. Write any two data mining applications: _________.
21. The process of grouping a set of objects into classes of similar objects is called ______.
22. The dissimilarity matrix is also called _________.
23. The Euclidean distance between the objects (22, 1, 8) and (20, 0, 6) is ___________.
24. The Manhattan distance between the objects (22, 1, 8) and (20, 0, 6) is ___________ (a short computation sketch for questions 23-24 appears after this question list).
25. The divisive approach is also called _________.
26. The agglomerative approach is also called _________.
27. _______ is one of the most popular neural network methods in cluster analysis.
28. VFDT stands for __________.
29. CVFDT is the abbreviation of ____________.
30. Densification follows the densification power law, which states that _______.
31. In the ______ technique, a new vertex is added by connecting it to a vertex on the right-most path.
32. In the ______ technique, an existing vertex is connected to a vertex on the right-most path.

33. Social network analysis is also called _____.


34. Inductive logic programming is used by __________.
35. _____ is the task of mining significant patterns from a plan base.
36. _____ contains complex data types such as graphics, images, maps, videos, etc.
37. Web linkage structure and web contents are included in _____.
38. _______ is a collection of pointers to spatial objects.
39. MBR stands for ________________.
40. ARIMA is used for ______________.
41. The variance analysis problem is also called ______.
42. Proteomics is the field of _________________________.
43. A gene is defined as _________________.
44. Visual data mining can be viewed as ________ & ________________.
45. Multimedia indexing and retrieval include ________________ & ____________.
46. DDL stands for _________.
47. The basic measures of text retrieval are ____________.
48. What is HITS?
49. What are the trends in data mining?
50. The EM algorithm belongs to __________ clustering methods.
51. Define Clustering Feature: __________.
52. The Jaccard coefficient is defined as _________________.
53. What is an influence function?
54. COBWEB has two additional operators: _______ & __________.
55. Define visibility graph.
56. An outlier is defined as ______________.
57. What are the hypotheses in outlier detection?
58. ______ & _________ are the procedures for detecting outliers.
59. __________ & ________ are the critical layers.
60. The __________ algorithm is the first algorithm for the classification of stream data.
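Illustrative sketch for questions 23-24: a minimal Java snippet that computes the Euclidean and Manhattan distances between the two objects given in those questions (class and variable names are arbitrary).

```java
public class DistanceSketch {
    public static void main(String[] args) {
        double[] x = { 22, 1, 8 };
        double[] y = { 20, 0, 6 };

        double sumSquares = 0, manhattan = 0;
        for (int i = 0; i < x.length; i++) {
            double diff = x[i] - y[i];
            sumSquares += diff * diff;     // squared differences: 4 + 1 + 4
            manhattan += Math.abs(diff);   // absolute differences: 2 + 1 + 2
        }
        double euclidean = Math.sqrt(sumSquares);

        System.out.println("Euclidean distance: " + euclidean);  // sqrt(9) = 3.0
        System.out.println("Manhattan distance: " + manhattan);  // 5.0
    }
}
```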

Subtasks: (khajamainuddin.sm@gmail.com)

1. List all the categorical (or nominal) attributes and the real-valued attributes separately.
2. What attributes do you think might be crucial in making the credit assessment? Come up
with some simple rules in plain English using your selected attributes.
3. One type of model that you can create is a decision tree. Train a decision tree using the
complete data set as the training data and report the model obtained after training (a
Weka API sketch of tasks 3-8 follows after task 8).
4. Suppose you use the above model trained on the complete dataset and classify credit
good/bad for each of the examples in the dataset. What % of examples can you classify
correctly? (This is also called testing on the training set.) Why do you think you cannot get
100% training accuracy?
5. Is testing on the training set, as you did above, a good idea? Why or why not?
6. One approach to solving the problem encountered in the previous question is to use
cross-validation. Briefly describe what cross-validation is. Train a decision tree again using
cross-validation and report your results. Does accuracy increase or decrease? Why?
7. Check whether the data shows a bias against foreign workers or personal status. One
way to do this is to remove these attributes from the data set and see if the decision tree
created in those cases is significantly different from the one built on the full dataset, which
you have already done. Did removing these attributes have any significant effect? Discuss.
8. Another question might be: do you really need to input so many attributes to get good
results? Maybe only a few would do. For example, you could try using just attributes
2, 3, 5, 7, 10, 17 and 21. Try out some combinations. (You removed two attributes in
problem 7; remember to reload the ARFF data file to get all the attributes back before you
start selecting the ones you want.)
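A rough sketch, in Java, of how tasks 3-8 could be carried out with Weka's programmatic API rather than the Explorer GUI. The ARFF file name and the attribute indices passed to the Remove filter are placeholders; adjust them to your copy of the credit data set.

```java
import java.util.Random;

import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Remove;

public class CreditTreeSketch {
    public static void main(String[] args) throws Exception {
        // Load the data set (placeholder path) and mark the last attribute as the class.
        Instances data = new DataSource("credit-g.arff").getDataSet();
        data.setClassIndex(data.numAttributes() - 1);

        // Task 3: train a J48 decision tree on the complete data set and print the model.
        J48 tree = new J48();
        tree.buildClassifier(data);
        System.out.println(tree);

        // Task 4: "testing on the training set" - evaluate the tree on the same data.
        Evaluation trainEval = new Evaluation(data);
        trainEval.evaluateModel(tree, data);
        System.out.println("Training-set accuracy: " + trainEval.pctCorrect() + " %");

        // Task 6: 10-fold cross-validation with a fresh classifier.
        Evaluation cvEval = new Evaluation(data);
        cvEval.crossValidateModel(new J48(), data, 10, new Random(1));
        System.out.println("Cross-validated accuracy: " + cvEval.pctCorrect() + " %");

        // Tasks 7-8: drop selected attributes (indices are placeholders, e.g. for
        // personal_status / foreign_worker) and repeat the steps above on 'reduced'.
        Remove remove = new Remove();
        remove.setAttributeIndices("9,20");
        remove.setInputFormat(data);
        Instances reduced = Filter.useFilter(data, remove);
        System.out.println("Reduced data has " + reduced.numAttributes() + " attributes");
    }
}
```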

9. Sometimes the cost of rejecting an applicant who actually has good credit might be
higher than that of accepting an applicant who has bad credit. Instead of counting the
misclassifications equally in both cases, give a higher cost to the first case (say, cost 5)
and a lower cost to the second case by using a cost matrix in Weka. Train your decision
tree and report the decision tree and cross-validation results. Are they significantly
different from the results obtained in problem 6? (A cost-matrix sketch follows below.)
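For task 9, one way to apply the 5-to-1 cost matrix is Weka's CostSensitiveClassifier wrapped around J48. The sketch below assumes the class attribute lists "good" before "bad" (as in the standard credit-g.arff), so row 0 corresponds to an actually good applicant; the file path is a placeholder.

```java
import java.util.Random;

import weka.classifiers.CostMatrix;
import weka.classifiers.Evaluation;
import weka.classifiers.meta.CostSensitiveClassifier;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class CostSensitiveSketch {
    public static void main(String[] args) throws Exception {
        Instances data = new DataSource("credit-g.arff").getDataSet(); // placeholder path
        data.setClassIndex(data.numAttributes() - 1);

        // 2x2 cost matrix (rows = actual class, columns = predicted class):
        // rejecting a good applicant costs 5, accepting a bad one costs 1.
        CostMatrix costs = new CostMatrix(2);
        costs.setElement(0, 1, 5.0); // actual good, predicted bad
        costs.setElement(1, 0, 1.0); // actual bad, predicted good

        CostSensitiveClassifier csc = new CostSensitiveClassifier();
        csc.setClassifier(new J48());
        csc.setCostMatrix(costs);

        // Cross-validate the cost-sensitive tree and report accuracy and total cost.
        Evaluation eval = new Evaluation(data, costs);
        eval.crossValidateModel(csc, data, 10, new Random(1));
        System.out.println(eval.toSummaryString());
        System.out.println("Total cost: " + eval.totalCost());
    }
}
```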

10. Do you think it is a good idea to prefer simple decision trees instead of long,
complex decision trees? How does the complexity of a decision tree relate to the bias of
the model?

11. You can make your decision trees simpler by pruning the nodes. One approach is to use
reduced-error pruning. Explain this idea briefly. Try reduced-error pruning for training
your decision trees using cross-validation and report the decision trees you obtain. Also
report your accuracy using the pruned model. Does your accuracy increase?
12. How can you convert a decision tree into if-then-else rules? Make up your own small
decision tree consisting of 2-3 levels and convert it into a set of rules. There also exist
classifiers that output the model in the form of rules. One such classifier in Weka
is rules.PART; train this model and report the set of rules obtained. Sometimes just one
attribute can be good enough to make the decision, yes, just one! Can you predict which
attribute that might be in this data set? The OneR classifier uses a single attribute to make
decisions (it chooses the attribute based on minimum error). Report the rule obtained by
training a OneR classifier. Rank the performance of J48, PART and OneR. (A sketch
covering tasks 11-12 follows below.)
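A possible sketch for tasks 11 and 12: J48 with reduced-error pruning switched on, plus the PART and OneR rule learners, each cross-validated so their accuracies can be ranked. The file path is again a placeholder.

```java
import java.util.Random;

import weka.classifiers.Classifier;
import weka.classifiers.Evaluation;
import weka.classifiers.rules.OneR;
import weka.classifiers.rules.PART;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class PruningAndRulesSketch {
    public static void main(String[] args) throws Exception {
        Instances data = new DataSource("credit-g.arff").getDataSet(); // placeholder path
        data.setClassIndex(data.numAttributes() - 1);

        // Task 11: J48 with reduced-error pruning enabled.
        J48 prunedTree = new J48();
        prunedTree.setReducedErrorPruning(true);

        // Task 12: rule learners whose models are printed as sets of rules.
        Classifier[] models = { prunedTree, new PART(), new OneR() };

        for (Classifier model : models) {
            model.buildClassifier(data);   // the pruned tree / rule set to report
            System.out.println(model);

            Evaluation eval = new Evaluation(data);
            eval.crossValidateModel(model, data, 10, new Random(1));
            System.out.println(model.getClass().getSimpleName()
                    + " cross-validated accuracy: " + eval.pctCorrect() + " %");
        }
    }
}
```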
