
ADITYA COLLEGE OF ENGINEERING

PUNGANUR ROAD, MADANAPALLE-517325


III-B.Tech(R13) II-Sem II-Internal Examinations May-2017 (Descriptive)
A
13A05603 Datamining (Computer Science & Engineering)
Time: 90 min Max Marks: 30

Part A
(Compulsory)
1. a. Write a note on attribute oriented induction.
Attribute-Oriented Induction (AOI) is a data mining technique that produces simplified
descriptive patterns. Classical AOI uses a predictive strategy to determine the distinct values of
an attribute but generalises attributes indiscriminately, i.e. the value 'ANY' is treated like
any other value, without restrictions. AOI produces interesting rules only by using interior
concepts of attribute hierarchies. Attribute-Oriented Induction is confined neither to categorical
data nor to particular measures.
Procedure:
Collect the task-relevant data (initial relation) using a relational database query.
Perform generalization by attribute removal or attribute generalization (see the sketch after the key points below).
Apply aggregation by merging identical, generalized tuples and accumulating their respective counts.
Present the results interactively to the user.
Key points in Attribute-Oriented Induction
Data focusing: task-relevant data, including dimensions; the result is the initial relation.
Attribute removal: remove attribute A if there is a large set of distinct values for A but (1)
there is no generalization operator on A, or (2) A's higher-level concepts are expressed in
terms of other attributes.
Attribute generalization: if there is a large set of distinct values for A, and there exists a set
of generalization operators on A, then select an operator and generalize A.
Attribute-threshold control: typically 2-8, specified by the user or by default.
Generalized relation threshold control: controls the final relation/rule size.
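To make the generalization and aggregation steps concrete, here is a minimal Python sketch. The concept hierarchy (city -> country) and the initial relation are purely hypothetical values chosen for illustration; they are not taken from any dataset in the syllabus.

from collections import Counter

# Hypothetical concept hierarchy on the 'city' attribute: city -> country (illustrative values only)
city_to_country = {"Madanapalle": "India", "Chennai": "India", "Paris": "France"}

# Task-relevant initial relation collected by a query: (city, degree) tuples
initial_relation = [("Madanapalle", "B.Tech"), ("Chennai", "B.Tech"),
                    ("Paris", "M.Tech"), ("Chennai", "M.Tech")]

# Attribute generalization: climb the concept hierarchy on 'city'
generalized = [(city_to_country[city], degree) for city, degree in initial_relation]

# Aggregation: merge identical generalized tuples and accumulate their counts
prime_relation = Counter(generalized)
print(prime_relation)   # e.g. Counter({('India', 'B.Tech'): 2, ('India', 'M.Tech'): 1, ('France', 'M.Tech'): 1})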
b. List out the characteristics of kNN classifier.
The k-nearest-neighbor method was first described in the early 1950s. The method is labor
intensive when given large training sets, and did not gain popularity until the 1960s when
increased computing power became available. It has since been widely used in the area of
pattern recognition. Nearest-neighbor classifiers are based on learning by analogy, that is, by
comparing a given test tuple with training tuples that are similar to it. The training tuples are
described by n attributes. Each tuple represents a point in an n-dimensional space. In this
way, all of the training tuples are stored in an n-dimensional pattern space. When given an
unknown tuple, a k-nearest-neighbor classifier searches the pattern space for the k training
tuples that are closest to the unknown tuple. These k training tuples are the k nearest
neighbors of the unknown tuple. Closeness is defined in terms of a distance metric, such
as Euclidean distance.
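As an illustration (not part of the original answer), the following Python sketch classifies an unknown tuple by finding its k nearest neighbours under Euclidean distance and taking a majority vote; the training points are made up for the example.

import math
from collections import Counter

def euclidean(p, q):
    # Closeness measured with Euclidean distance in the n-dimensional pattern space
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def knn_predict(training, query, k=3):
    # training: list of (attribute-vector, class-label) pairs; query: an unknown tuple
    neighbors = sorted(training, key=lambda t: euclidean(t[0], query))[:k]
    # Majority vote among the k nearest neighbours decides the class
    return Counter(label for _, label in neighbors).most_common(1)[0][0]

training = [((1.0, 1.0), "yes"), ((1.2, 0.8), "yes"), ((5.0, 5.0), "no"), ((4.8, 5.2), "no")]
print(knn_predict(training, (1.1, 0.9), k=3))   # -> "yes"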
c. What are Random Forests?
Random forest (or random forests) is an ensemble classifier that consists of many decision
trees and outputs the class that is the mode of the classes output by the individual trees. The term
came from random decision forests, which was first proposed by Tin Kam Ho of Bell Labs in
1995. The method combines Breiman's "bagging" idea and the random selection of features.
The advantages of random forest are:
It is one of the most accurate learning algorithms available. For many data sets, it
produces a highly accurate classifier.
It runs efficiently on large databases.
It can handle thousands of input variables without variable deletion.
It gives estimates of what variables are important in the classification.
It generates an internal unbiased estimate of the generalization error as the forest
building progresses.
It has an effective method for estimating missing data and maintains accuracy when a
large proportion of the data are missing.
It has methods for balancing error in class population unbalanced data sets.
Generated forests can be saved for future use on other data.
Prototypes are computed that give information about the relation between the variables
and the classification.
It computes proximities between pairs of cases that can be used in clustering, locating
outliers, or (by scaling) give interesting views of the data.
The capabilities of the above can be extended to unlabeled data, leading to
unsupervised clustering, data views and outlier detection.
It offers an experimental method for detecting variable interactions.

Disadvantages
Random forests have been observed to overfit on some datasets with noisy
classification/regression tasks.
For data including categorical variables with different numbers of levels, random
forests are biased in favor of those attributes with more levels. Therefore, the variable
importance scores from a random forest are not reliable for this type of data.
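For illustration, here is a hedged sketch using scikit-learn's RandomForestClassifier (assuming scikit-learn is installed); it shows the bagging-based ensemble, an accuracy estimate and the variable-importance scores mentioned in the advantages above.

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Each tree is grown on a bootstrap sample (bagging) with a random subset of features per split
forest = RandomForestClassifier(n_estimators=100, random_state=42)
forest.fit(X_train, y_train)

print("accuracy:", forest.score(X_test, y_test))
print("variable importances:", forest.feature_importances_)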
d. What is Apriori Principle?
Apriori principle (Main observation):
If an itemset is frequent, then all of its subsets must also be frequent
Apriori principle holds due to the following property of the support measure:

∀ X, Y : (X ⊆ Y) ⇒ s(X) ≥ s(Y)

The support of an itemset never exceeds the support of its subsets.

This is known as the anti-monotone property of support.
[Figure: the itemset lattice over items A, B, C, D, E, from the null itemset at the top down to ABCDE at the bottom. Once an itemset is found to be infrequent, all of its supersets in the lattice can be pruned.]
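A small Python sketch of the pruning step implied by the lattice: a candidate k-itemset can be frequent only if every one of its (k-1)-subsets is frequent. The frequent 2-itemsets below are assumed for the example, not read off the figure.

from itertools import combinations

def has_infrequent_subset(candidate, frequent_k_minus_1):
    # Apriori pruning: a k-itemset can be frequent only if every (k-1)-subset is frequent
    return any(frozenset(s) not in frequent_k_minus_1
               for s in combinations(candidate, len(candidate) - 1))

# Illustrative frequent 2-itemsets (assumed for the example)
frequent_2 = {frozenset(p) for p in [("A", "C"), ("A", "D"), ("C", "D"), ("B", "C")]}

print(has_infrequent_subset(frozenset(("A", "C", "D")), frequent_2))   # False: keep the candidate
print(has_infrequent_subset(frozenset(("B", "C", "D")), frequent_2))   # True: {B, D} is infrequent, prune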

e. Explain different types of clustering.


Clustering is the task of dividing the population or data points into a number of groups such
that data points in the same group are more similar to each other than to data points in other
groups. In simple words, the aim is to segregate groups with similar traits and assign them
into clusters.

Various types of clusterings:
hierarchical (nested) versus partitional (unnested),
exclusive versus overlapping versus fuzzy, and
complete versus partial.
Hierarchical versus Partitional: The most commonly discussed distinction among different
types of clusterings is whether the set of clusters is nested or unnested, or in more traditional
terminology, hierarchical or partitional. A partitional clustering is simply a division of the set
of data objects into non-overlapping subsets (clusters) such that each data object is in exactly
one subset. A hierarchical clustering can be viewed as a sequence of partitional clusterings,
and a partitional clustering can be obtained by taking any member of that sequence, i.e., by
cutting the hierarchical tree at a particular level.
Exclusive versus Overlapping versus Fuzzy: In the most general sense, an overlapping or
non-exclusive clustering is used to reflect the fact that an object can simultaneously belong
to more than one group (class). In a fuzzy clustering, every object belongs to every cluster
with a membership weight that is between 0 (absolutely doesn't belong) and 1 (absolutely
belongs). In other words, clusters are treated as fuzzy sets. (Mathematically, a fuzzy set is
one in which an object belongs to any set with a weight that is between 0 and 1. In fuzzy
clustering, we often impose the additional constraint that the sum of the weights for each
object must equal 1.)
Complete versus Partial: A complete clustering assigns every object to a cluster, whereas a
partial clustering does not. The motivation for a partial clustering is that some objects in a
data set may not belong to well-defined groups. Many times objects in the data set may
represent noise, outliers, or "uninteresting background." Thus, to find the important topics in
last month's stories, we may want to search only for clusters of documents that are tightly
related by a common theme. In other cases, a complete clustering of the objects is desired.

Part-B
2. a. Explain Decision Tree Induction. Describe its implementation using Hunt's algorithm.
b. What are the measures for selecting the splitting attribute? Explain.
The major steps are as follows:
The tree starts as a single root node containing all of the training tuples.
If the tuples are all from the same class, then the node becomes a leaf, labeled with that class.
Else, an attribute selection method is called to determine the splitting criterion. Such a method may
use a heuristic or statistical measure (e.g., information gain or Gini index) to select the "best" way
to separate the tuples into individual classes. The splitting criterion consists of a splitting attribute
and may also indicate either a split-point or a splitting subset, as described below.
Next, the node is labeled with the splitting criterion, which serves as a test at the node. A branch is
grown from the node to each of the outcomes of the splitting criterion and the tuples are partitioned
accordingly. There are three possible scenarios for such partitioning.
1. If the splitting attribute is discrete-valued, then a branch is grown for each possible value of the attribute.
2. If the splitting attribute, A, is continuous-valued, then two branches are grown, corresponding to the
conditions A <= split point and A > split point.
3. If the splitting attribute is discrete-valued and a binary tree must be produced (e.g., if the Gini index was used
as the selection measure), then the test at the node is "A ∈ SA?", where SA is the splitting subset for A. It is a
subset of the known values of A. If a given tuple has value aj of A and if aj ∈ SA, then the test at the node is
satisfied.
The algorithm recurses to create a decision tree for the tuples at each partition.
The stopping conditions are:
If all tuples at a given node belong to the same class, then transform that node into a leaf, labeled
with that class.
If there are no more attributes left to create more partitions, then majority voting can be used to
convert the given node into a leaf, labeled with the most common class among the tuples.
If there are no tuples for a given branch, a leaf is created with the majority class from the parent node.
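The following Python sketch illustrates one of the selection measures named above (the Gini index) for a hypothetical binary split; the class labels and the split itself are made up for the example.

def gini(labels):
    # Gini index of a set of class labels: 1 - sum over classes of p_c^2
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def gini_of_split(partitions):
    # Weighted Gini index of a candidate split (each partition is a list of class labels)
    total = sum(len(p) for p in partitions)
    return sum(len(p) / total * gini(p) for p in partitions)

# Hypothetical binary split of 10 tuples on some attribute A <= split_point
left = ["yes", "yes", "yes", "no"]
right = ["no", "no", "no", "no", "yes", "yes"]
print(gini_of_split([left, right]))   # the attribute and split point giving the lowest value are chosen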
(or)
3. a. Explain in detail neural network learning for classification using back propagation
algorithm.
Classification by Backpropagation
Backpropagation is a neural network learning algorithm. The field was started by psychologists and neurobiologists seeking to develop and test
computational analogues of neurons. A neural network is a set of connected input/output units in which each connection has a weight
associated with it. During the learning phase, the network learns by adjusting the weights so as to be able to predict the correct class label
of the input tuples. It is also referred to as connectionist learning, due to the connections between units.

Neural Network as a Classifier : Strengths & Weakness


Weakness:
o Long training time
o Requires a number of parameters that are typically best determined empirically, e.g., the network topology or "structure"
o Poor interpretability: difficult to interpret the symbolic meaning behind the learned weights and of the "hidden units" in the network

Strength:
o High tolerance to noisy data
o Ability to classify untrained patterns
o Well-suited for continuous-valued inputs and outputs
o Successful on a wide array of real-world data
o Algorithms are inherently parallel
o Techniques have recently been developed for the extraction of rules from trained neural networks

A Neuron (= a perceptron)

A Multi-Layer Feed-Forward Neural Network

o The inputs to the network correspond to the attributes measured for each training tuple
o Inputs are fed simultaneously into the units making up the input layer
o They are then weighted and fed simultaneously to a hidden layer
o The number of hidden layers is arbitrary, although usually only one
o The weighted outputs of the last hidden layer are input to units making up the output layer, which emits the network's prediction
o The network is feed-forward in that none of the weights cycles back to an input unit or to an output unit of a previous layer

o From a statistical point of view, networks perform nonlinear regression: Given enough hidden units and enough training samples, they can closely
approximate any function
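As a rough illustration of the weight-adjustment idea (not the full multi-layer algorithm), here is a Python sketch that trains a single sigmoid unit with the gradient-descent update used at the output layer of backpropagation. The toy data, initial weights and learning rate are assumptions made for the example.

import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

random.seed(0)
weights = [random.uniform(-0.5, 0.5) for _ in range(2)]   # small random initial weights
bias, lr = 0.1, 0.5                                       # assumed bias and learning rate

# Tiny made-up training set: two attribute values per tuple, class label 0 or 1
data = [((0.0, 0.0), 0), ((0.0, 1.0), 1), ((1.0, 0.0), 1), ((1.0, 1.0), 1)]

for epoch in range(1000):
    for x, target in data:
        out = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
        # Error term of an output unit: Err = out * (1 - out) * (target - out)
        err = out * (1 - out) * (target - out)
        # Weight and bias update: w := w + learning_rate * Err * input
        weights = [w + lr * err * xi for w, xi in zip(weights, x)]
        bias += lr * err

# After training, the outputs approach the targets 0, 1, 1, 1
print([round(sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias), 2) for x, _ in data])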

b. Write a short note on rule pruning.


Rule Pruning
A rule is pruned for the following reason:
The assessment of quality is made on the original set of training data. The rule may perform
well on the training data but less well on subsequent data; that is why rule pruning is
required.
The rule is pruned by removing a conjunct. A rule R is pruned if the pruned version of R has
greater quality, as assessed on an independent set of tuples (the pruning set).
FOIL is one simple and effective method for rule pruning. For a given rule R,
FOIL_Prune(R) = (pos - neg) / (pos + neg)
where pos and neg are the numbers of positive and negative tuples covered by R, respectively.
Note: this value will increase with the accuracy of R on the pruning set. Hence, if the
FOIL_Prune value is higher for the pruned version of R, then we prune R.
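A tiny Python sketch of the FOIL_Prune computation, using hypothetical pos/neg counts on the pruning set:

def foil_prune(pos, neg):
    # FOIL_Prune(R) = (pos - neg) / (pos + neg), evaluated on an independent pruning set
    return (pos - neg) / (pos + neg)

# Hypothetical counts of covered tuples on the pruning set
print(foil_prune(pos=40, neg=10))   # original rule R             -> 0.60
print(foil_prune(pos=38, neg=4))    # R with one conjunct removed -> ~0.81, so keep the pruned rule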
4. a. Explain the FP-growth algorithm in detail.
The FP-tree algorithm is used to identify frequent patterns in the area of data mining.
To see how frequent patterns are identified from an FP tree, consider the following example:
Find all frequent itemsets or frequent patterns in the following database using the FP-growth
algorithm. Take minimum support as 30%.

Table 1 - Snapshot of the Database

Step 1: Calculate Minimum Support

First, calculate the minimum support count. The question says the minimum support should
be 30%, and there are 8 transactions. It is calculated as follows:
Minimum support count = 30/100 * 8 = 2.4
The result is 2.4, but for easier calculation it is rounded up to the ceiling value. So,
Minimum support count = ceiling(30/100 * 8) = 3
Step 2: Find the Frequency of Occurrence
Now find the frequency of occurrence of each item in the database table. For example, item
A occurs in row 1, row 2, row 3, row 4 and row 7, i.e., 5 times in total. You can see the
counted frequency of occurrence of each item in Table 2.

Table 2 - Frequency of Occurrence

Step 3: Prioritize the Items

In Table 2 you can see numbers written in red. These are the priorities of each item
according to its frequency of occurrence. Item B gets the highest priority (1) due to its
highest number of occurrences. At this point you also have the opportunity to drop items
which do not fulfill the minimum support requirement. For instance, if the database
contained an item F with frequency 1, it could be dropped.
*Some people display the frequent items using a list instead of a table. The frequent item
list for the above table is B:6, D:6, A:5, E:4, C:3.
Step 4: Order the Items According to Priority
As you can see in Table 3, a new column has been added to Table 1. In the Ordered Items
column all the items are listed according to their priority, as marked in red in Table 2. For
example, in the case of ordering row 1, the highest-priority item is B, followed by D, A and
E respectively.

Table 3 - New version of Table 1


Step 5: Draw the FP Tree
As a result of the previous steps we have an ordered items table (Table 3). Now it is time to
draw the FP tree, row by row.
Row 1:
Note that every FP tree has a 'null' node as the root. So draw the root node first and attach
the items of row 1 to it one by one, in order (see Figure 1), and write their occurrence counts
next to them. (Write the counts in pencil, because they will be updated as later rows are
added.)

Figure 1- FP tree for Row 1


Row 2:
Next, update the above tree (Figure 1) by entering the items of row 2. The items of row 2
are B, D, A, E, C. Without creating another branch, you can follow the existing branch up to
E, and then you have to create a new node after it for C. This is like traveling along a road:
to reach a new town near an existing one, you travel along the same road and then continue
a little further.
When you go through a branch a second time, erase the count of one and write two to
indicate that the node has been visited twice; if you visit it a third time, erase two and write
three. Figure 2 shows the FP tree after adding rows 1 and 2.
Note the red underlines, which indicate the number of traversals through each node.

Figure 2- FP tree for Row 1,2


Row 3:
In row 3 you have to visit B, A, E and C respectively. You might think you can follow the
same branch again and simply increase the counts of B, A, E and C, but you cannot: you
can reuse the path through B, but you cannot connect B to the existing A by skipping over
D. As a result, you should draw another A node and connect it to B, then connect a new E
to that A and a new C to the new E. See Figure 3.

Row 4:
Row 4 contains B, D, A. Here we can simply update the counts along the existing branch:
B:4, D:3, A:3.

Row 5:

Figure 3 - After adding third row

The fifth row contains only item D. Now we draw a new branch from the 'null' node. See
Figure 4.

Figure 4- Connect D to null node

Row 6:
B and D appear in row 6, so just change B:4 to B:5 and D:3 to D:4.

Row 7:
Attach two new nodes A and E to the D node that hangs off the null node. Then mark D, A,
E as D:2, A:1 and E:1.

Row 8 (the last row):

Attach a new node C to B and update the counts (B:6, C:1).

Figure 5 - Final FP tree

Step 6: Validation
After the previous steps, the final FP tree is as shown in Figure 5.
How do we know it is correct? Count the frequency of occurrence of each item in the FP
tree and compare it with Table 2. If the counts are equal, that is a good indication that the
tree is correct.
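For completeness, here is a minimal Python sketch of the tree-building steps above. The eight transactions are reconstructed from the row-by-row description (this reading of Table 1 is an assumption); the code orders items by frequency and shares prefixes while incrementing counts, mirroring Figures 1-5.

from collections import Counter

# Transactions as reconstructed from the row-by-row description above (an assumption about Table 1)
transactions = [list("BDAE"), list("BDAEC"), list("BAEC"), list("BDA"),
                ["D"], list("BD"), list("DAE"), list("BC")]
min_support = 3

# Steps 2-3: count item frequencies and keep the frequent items, ordered by descending count
counts = Counter(item for t in transactions for item in t)
priority = {item: rank for rank, (item, c) in enumerate(counts.most_common()) if c >= min_support}

class Node:
    def __init__(self, item, parent):
        self.item, self.parent, self.count, self.children = item, parent, 1, {}

root = Node(None, None)
root.count = 0

# Steps 4-5: insert each transaction, re-ordered by priority, sharing prefixes and incrementing counts
for t in transactions:
    node = root
    for item in sorted((i for i in t if i in priority), key=priority.get):
        if item in node.children:
            node.children[item].count += 1
        else:
            node.children[item] = Node(item, node)
        node = node.children[item]

# Step 6 (validation idea): the children of the root carry the counts B:6 and D:2, as in Figure 5
print({item: child.count for item, child in root.children.items()})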
b. Explain k-means clustering with a suitable example. Also write a short note on BIRCH
and CURE clustering techniques.

K-Means Clustering

K-means is one of the simplest unsupervised learning algorithms that solves the well-known clustering
problem. The procedure follows a simple and easy way to classify a given data set through a certain number
of clusters (assume k clusters) fixed a priori. The main idea is to define k centroids, one for each cluster.
These centroids should be placed carefully, because different locations cause different results. So the better
choice is to place them as far away from each other as possible. The next step is to take each point belonging
to the given data set and associate it with the nearest centroid. When no point is pending, the first step is
completed and an early grouping is done. At this point we need to re-calculate k new centroids as barycenters
of the clusters resulting from the previous step. After we have these k new centroids, a new binding has to be
done between the same data set points and the nearest new centroid. A loop has been generated. As a result
of this loop we may notice that the k centroids change their location step by step until no more changes are
made; in other words, the centroids do not move any more.
Finally, this algorithm aims at minimizing an objective function, in this case a squared error function. The
objective function is

J = Σ (j=1..k) Σ (i=1..n) || xi(j) - cj ||²

where || xi(j) - cj ||² is a chosen distance measure between a data point xi(j) and the cluster centre cj, and J is an
indicator of the distance of the n data points from their respective cluster centres.

The algorithm is composed of the following steps:


1. Place K points into the space represented by the objects that are being
clustered. These points represent initial group centroids.
2. Assign each object to the group that has the closest centroid.
3. When all objects have been assigned, recalculate the positions of the K
centroids.
4. Repeat Steps 2 and 3 until the centroids no longer move. This produces a
separation of the objects into groups from which the metric to be minimized
can be calculated.

Although it can be proved that the procedure will always terminate, the k-means algorithm does not
necessarily find the optimal configuration, corresponding to the global objective function minimum.
The algorithm is also significantly sensitive to the initially selected cluster centres. The k-means
algorithm can be run multiple times to reduce this effect.

K-means is a simple algorithm that has been adapted to many problem domains. As we are going to see, it is
a good candidate for extension to work with fuzzy feature vectors.

An example
Suppose that we have n sample feature vectors x1, x2, ..., xn all from the same class, and we know that they
fall into k compact clusters, k < n. Let mi be the mean of the vectors in cluster i. If the clusters are well
separated, we can use a minimum-distance classifier to separate them. That is, we can say that x is in cluster
i if || x - mi || is the minimum of all the k distances. This suggests the following procedure for finding the k
means:

Make initial guesses for the means m1, m2, ..., mk


Until there are no changes in any mean
o Use the estimated means to classify the samples into clusters
o For i from 1 to k
Replace mi with the mean of all of the samples for cluster i
o end_for
end_until

Here is an example showing how the means m1 and m2 move into the centers of two clusters.
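Below is a minimal Python sketch of the four steps listed above, run on made-up two-dimensional points; it is an illustration under these assumptions, not an optimized implementation.

import random

def kmeans(points, k, iterations=100):
    # Step 1: place k initial centroids (here: k random points from the data)
    centroids = random.sample(points, k)
    for _ in range(iterations):
        # Step 2: assign each (two-dimensional) point to the group with the closest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda i: (p[0] - centroids[i][0]) ** 2 + (p[1] - centroids[i][1]) ** 2)
            clusters[j].append(p)
        # Step 3: recompute each centroid as the mean (barycenter) of its cluster
        new_centroids = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c)) if c else centroids[j]
            for j, c in enumerate(clusters)
        ]
        # Step 4: stop when the centroids no longer move
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters

random.seed(1)
pts = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
print(kmeans(pts, k=2)[0])   # two centroids, one near each group of points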

BIRCH (Balanced Iterative Reducing and Clustering Using Hierarchies)

Begins by partitioning objects hierarchically using tree structures, and then applies other clustering
algorithms to refine the clusters.

BIRCH is local (instead of global). Each clustering decision is made without scanning all data points or
currently existing clusters.

BIRCH exploits the observation that the data space is usually not uniformly occupied, and therefore not
every data point is equally important for clustering purposes.
BIRCH makes full use of available memory to derive the finest possible subclusters while minimizing I/O
costs.

The BIRCH Clustering Algorithm


Phase 1: Load the data into memory by building a CF tree (result: initial CF tree).

Phase 2 (optional): Condense into a desirable range by building a smaller CF tree.

Phase 3: Global clustering (result: good clusters).

Phase 4 (optional and offline): Cluster refining (result: better clusters).

CURE: Clustering Using Representatives


CURE: proposed by Guha, Rastogi & Shim, 1998

A new hierarchical clustering algorithm that uses a fixed number of points as representatives
(partition)

Centroid-based approach: uses 1 point to represent a cluster => too little information; sensitive
to data shapes

All-points-based approach: uses all points of a cluster => too much information; sensitive to
outliers

A constant number c of well scattered points in a cluster are chosen, and then shrunk toward the
center of the cluster by a specified fraction alpha

The clusters with the closest pair of representative points are merged at each step

Stops when there are only k clusters left, where k can be specified

Six Steps in CURE Algorithm

Data -> Draw random sample -> Partition sample -> Partially cluster partitions -> Eliminate outliers -> Cluster partial clusters -> Label data on disk

CURE's Advantages

More accurate:

Adjusts well to geometry of non-spherical shapes.

Scales to large datasets

Less sensitive to outliers

More efficient:

Space complexity: O(n)

Time complexity: O(n² log n) (O(n²) if the dimensionality of the data points is small)

CURE vs. BIRCH: quality of clustering

BIRCH cannot distinguish between the big and small clusters.

MST (all-point approach) merges the two ellipsoids.

CURE and BIRCH are two hierarchical clustering algorithms

CURE adjusts well to clusters having non-spherical shapes and wide variances in size.

CURE can handle large databases efficiently.

(or)
5. a. Explain Association Rule Mining.
Association Rules Mining
Association rule learning is a popular and well researched method for discovering interesting relations
between variables in large databases. It describes analyzing and presenting strong rules discovered in
databases using different measures of interestingness. Based on the concept of strong rules, association
rules were introduced for discovering regularities between products in large-scale transaction data recorded
by point-of-sale (POS) systems in supermarkets. For example, the rule {onions, potatoes} => {hamburger meat}
found in the sales data of a supermarket would indicate that if a
customer buys onions and potatoes together, he or she is likely to also buy hamburger meat. Such
information can be used as the basis for decisions about marketing activities such as, e.g., promotional
pricing or product placements. In addition to the above example from market basket analysis association
rules are employed today in many application areas including Web usage mining, intrusion detection and
bioinformatics. As opposed to sequence mining, association rule learning typically does not consider the
order of items either within a transaction or across transactions.

Formally, given a set of transactions, find rules that will predict the occurrence of an item based
on the occurrences of other items in the transaction.
Given a set of transactions T, the goal of association rule mining is to find all rules having

support ≥ minsup threshold

confidence ≥ minconf threshold

Brute-force approach:

List all possible association rules


Compute the support and confidence for each rule

Prune rules that fail the minsup and minconf thresholds


Mining Association Rules

Two-step approach:

1. Frequent Itemset Generation

Generate all itemsets whose support ≥ minsup

2. Rule Generation

Generate high confidence rules from each frequent itemset, where each rule is a binary partitioning of a frequent
itemset

Frequent itemset generation is still computationally expensive.
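To illustrate the two-step approach, here is a short Python sketch that takes one frequent itemset, generates rules by binary partitioning, and computes their support and confidence; the market-basket transactions are illustrative values, not data from the paper.

from itertools import combinations

# Illustrative market-basket transactions (made up for the example)
transactions = [{"bread", "milk"},
                {"bread", "diaper", "beer", "eggs"},
                {"milk", "diaper", "beer", "cola"},
                {"bread", "milk", "diaper", "beer"},
                {"bread", "milk", "diaper", "cola"}]

def support(itemset):
    # Fraction of transactions containing the itemset
    return sum(itemset <= t for t in transactions) / len(transactions)

# Step 2 (rule generation): each rule X -> Y is a binary partitioning of a frequent itemset
frequent = frozenset({"milk", "diaper", "beer"})
for size in range(1, len(frequent)):
    for lhs in combinations(frequent, size):
        X = frozenset(lhs)
        Y = frequent - X
        confidence = support(frequent) / support(X)
        print(f"{set(X)} -> {set(Y)}  support={support(frequent):.2f}  confidence={confidence:.2f}")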

b. Write a short note on the following:


1. SOM

Self-Organizing Map (SOM)


The Self-Organizing Map is one of the most popular neural network models. It belongs to the category of
competitive learning networks. The Self-Organizing Map is based on unsupervised learning, which means
that no human intervention is needed during the learning and that little needs to be known about the
characteristics of the input data. We could, for example, use the SOM for clustering data without knowing
the class memberships of the input data. The SOM can be used to detect features inherent to the problem and
thus has also been called SOFM, the Self-Organizing Feature Map.

The Self-Organizing Map was developed by professor Kohonen. The SOM has been proven useful in many
applications. The SOM algorithm is based on unsupervised, competitive learning. It provides a topology
preserving mapping from the high dimensional space to map units. Map units, or neurons, usually form a
two-dimensional lattice and thus the mapping is a mapping from high dimensional space onto a plane. The
property of topology preserving means that the mapping preserves the relative distance between the points.
Points that are near each other in the input space are mapped to nearby map units in the SOM. The SOM can
thus serve as a cluster analyzing tool of high-dimensional data. Also, the SOM has the capability to
generalize. Generalization capability means that the network can recognize or characterize inputs it has
never encountered before. A new input is assimilated with the map unit it is mapped to.

In the basic SOM algorithm, the topological relations and the number of neurons are fixed from the
beginning. This number of neurons determines the scale or the granularity of the resulting model. Scale
selection affects the accuracy and the generalization capability of the model. It must be taken into account
that the generalization and accuracy are contradictory goals. By improving the first, we lose on the second,
and vice versa.
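As a rough illustration of competitive, topology-preserving learning, the following Python sketch performs SOM training steps on a 5x5 lattice: it finds the best-matching unit and pulls it and its lattice neighbours toward the input. The grid size, learning rate, neighbourhood radius and random inputs are arbitrary assumptions for the example.

import math
import random

random.seed(0)
grid_w, grid_h, dim = 5, 5, 3      # assumed lattice size and input dimensionality
lr, radius = 0.1, 1.5              # assumed learning rate and neighbourhood radius

# Map units (neurons) on a two-dimensional lattice, each with a weight vector in the input space
weights = {(i, j): [random.random() for _ in range(dim)]
           for i in range(grid_w) for j in range(grid_h)}

def train_step(x):
    # Competition: find the best-matching unit (weight vector closest to the input)
    bmu = min(weights, key=lambda u: sum((w - xi) ** 2 for w, xi in zip(weights[u], x)))
    # Cooperation: pull the BMU and its lattice neighbours toward the input, weighted by a
    # Gaussian neighbourhood function -- this is what makes the mapping topology-preserving
    for u, w in weights.items():
        d2 = (u[0] - bmu[0]) ** 2 + (u[1] - bmu[1]) ** 2
        h = math.exp(-d2 / (2 * radius ** 2))
        weights[u] = [wi + lr * h * (xi - wi) for wi, xi in zip(w, x)]

for _ in range(100):
    train_step([random.random() for _ in range(dim)])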

2. SVM
Support Vector Machines (SVMs) find an optimal solution: they maximize the distance between the hyperplane and
the difficult points close to the decision boundary. If there are no points near the decision surface, then there are no
very uncertain classification decisions. SVMs maximize the margin around the separating hyperplane. The decision
function is fully specified by a subset of the training samples, the support vectors. Solving an SVM is a quadratic
programming problem. SVMs are seen by many as the most successful current text classification method. The
classifier is a separating hyperplane. The most important training points are the support vectors; they define the
hyperplane. Quadratic optimization algorithms can identify which training points xi are support vectors with
non-zero Lagrangian multipliers αi. Both in the dual formulation of the problem and in the solution, the training
points appear only inside inner products:
Find α1, ..., αN such that
Q(α) = Σi αi − ½ Σi Σj αi αj yi yj xiᵀxj is maximized and
(1) Σi αi yi = 0
(2) 0 ≤ αi ≤ C for all αi

The resulting classifier is
f(x) = Σi αi yi xiᵀx + b

Non-Linear SVM

An SVM locates a separating hyperplane in the feature space and classifies points in that space.

It does not need to represent the space explicitly; it simply defines a kernel function.

The kernel function plays the role of the dot product in the feature space.

Properties of SVM
Flexibility in choosing a similarity function
Sparseness of solution when dealing with large data sets
- only support vectors are used to specify the separating hyperplane
Ability to handle large feature spaces
- complexity does not depend on the dimensionality of the feature space
Overfitting can be controlled by soft margin approach
Nice math property: a simple convex optimization problem which is guaranteed to converge
to a single global solution
Feature Selection
SVM Applications
SVM has been used successfully in many real-world problems
- text (and hypertext) categorization
- image classification
- bioinformatics (Protein classification,
Cancer classification)
- hand-written character recognition

Weakness of SVM
It is sensitive to noise
- A relatively small number of mislabeled examples can dramatically decrease the performance
It only considers two classes
- how to do multi-class classification with SVM?
- Answer:
1) with output arity m, learn m SVMs
SVM 1 learns Output==1 vs Output != 1
SVM 2 learns Output==2 vs Output != 2
:
SVM m learns Output==m vs Output != m
2) To predict the output for a new input, just predict with each SVM and find out which one puts
the prediction the furthest into the positive region.
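A hedged sketch of scheme 1)-2) using scikit-learn (assuming it is available): one binary SVM is learned per class, and prediction picks the class whose SVM gives the largest decision-function value.

import numpy as np
from sklearn.datasets import load_iris
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
classes = np.unique(y)

# Learn one SVM per class: "Output == c" vs "Output != c"
models = {c: SVC(kernel="linear").fit(X, (y == c).astype(int)) for c in classes}

def predict(x):
    # Predict with each SVM and pick the class whose SVM puts x furthest into its positive region
    scores = {c: model.decision_function([x])[0] for c, model in models.items()}
    return max(scores, key=scores.get)

print(predict(X[0]), "true label:", y[0])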

3. Multi-class Problem and Class Imbalance Problem.


The class imbalance problem
The class imbalance problem states that if the class we are interested in is very rare,
then the classifier will tend to ignore it.
Solutions for this are (see the sketch after this list):
a. We can modify the optimization criterion by using a cost-sensitive metric
b. We can balance the class distribution
i. Sample from the larger class so that the sizes of the two classes are the same
ii. Replicate the data of the class of interest so that the classes are balanced (note the
over-fitting issues this can cause)
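A small Python sketch of the two balancing options (sampling down the larger class, and replicating the class of interest), run on a made-up imbalanced dataset:

import random
from collections import Counter

random.seed(0)
# Hypothetical imbalanced data: 95 "normal" records and 5 "fraud" records
data = [("normal", i) for i in range(95)] + [("fraud", i) for i in range(5)]
minority = [d for d in data if d[0] == "fraud"]
majority = [d for d in data if d[0] == "normal"]

# (i) Sample from the larger class so that the two classes have the same size
undersampled = random.sample(majority, len(minority)) + minority

# (ii) Replicate the data of the class of interest (risk: over-fitting to the replicated tuples)
oversampled = majority + random.choices(minority, k=len(majority))

print(Counter(label for label, _ in undersampled))   # balanced: 5 vs 5
print(Counter(label for label, _ in oversampled))    # balanced: 95 vs 95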
Multiclass Problem
Multiclass classification is the problem of classifying instances into one of more than two
classes. Multiclass classification is used to predict:
one of three or more possible outcomes, and
the likelihood of each one.
Generally, there is no notion of closeness because the target class is nominal (see nominal
measurement).
Examples:
Is this product a book, a movie, or an article of clothing?
Is this movie a comedy, a documentary, or a thriller?
Which category of products is of most interest to this customer?
4. Describe Partition based clustering methods in detail.

BIRCH
Refer 4.b.
ADITYA COLLEGE OF ENGINEERING
PUNGANUR ROAD, MADANAPALLE-517325
III-B.Tech(R13) II-Sem II-Internal Examinations May-2017 (Objective)
A
13A05603 Datamining (Computer Science & Engineering)

Name : Roll No. :


Time: 20 min Max Marks: 10

Answer all the questions. 5 × 1 = 5 M


1. List any two applications of classification.
A bank loan officer wants to analyze the data in order to know which customers (loan
applicants) are risky and which are safe.
A marketing manager at a company needs to analyze a customer with a given profile to
predict whether he or she will buy a new computer.
2. Define outliers and state their role in clustering.
The data objects that do not comply with the general behavior or model of the data and are
grossly different from or inconsistent with the remaining set of data are called outliers.
The outliers may be of particular interest, such as in the case of fraud detection, where
outliers may indicate fraudulent activity.
There are two basic types of procedures for detecting outliers namely block procedures and
sequential procedures.
3. Define support and confidence.

1. Tuples are transactions; attribute-value pairs are items.


2. Association rule: {A,B,C,D,...} => {E,F,G,...}, where A,B,C,D,E,F,G,... are items.
3. Confidence (accuracy) of A => B : P(B|A) = (# of transactions containing both A and B) /
(# of transactions containing A).
4. Support (coverage) of A => B : P(A,B) = (# of transactions containing both A and B) /
(total # of transactions)
5. We look for rules that exceed a pre-defined support (minimum support) and have high
confidence.
4. What are the characteristics of SVM?

Support vector machines choose the hyperplane based on support vectors. A support vector is a
critical point close to the decision boundary. (Degree-1) SVMs are linear classifiers.
Kernels are a powerful and elegant way to define a similarity metric. SVMs are perhaps the best
performing text classifier.
5. Name some graph based clustering techniques.
Chameleon
CURE
BIRCH

Choose the correct answer from the following questions. 10 × 1/2 = 5 M


1. SOM stands for _ _ _ _ _ _ _ _ _ _ [ A ]
a. Self Organizing maps b. Self Originating maps c. self outlier maps d.self online map
2. Naïve Bayesian classifier is also called a _ _ _ _ _ _ _ _ _ _ classifier. [ B ]
a. computational Bayesian b. simple Bayesian c. non-computational Bayesian d. complex Bayesian
3. Decision trees can easily be converted to _ _ _ _ _ _ _ rules. [ C ]
a. IF b. Nested IF c. IF-THEN d. GROUP BY
4. While _ _ _ _ predicts class, _ _ _ _ models continuous-valued functions. [ B ]
a. prediction-classification b. classification-prediction c. speed-scalability d. scalability-speed
5. Bayes theorem provides a way of calculating which probability? [ A ]
a. posterior b. Prior c. Stable d. Ideal
6. The _ _ _ _ _ _ algorithm where each cluster is represented by one of the objects located
near the center of cluster. [ A ]
a. k-means b. DENCLUE c. DBSCAN d.None
7. K-nearest neighbor is one of the _______. [ C ]
a. Poorest Search Technique b. Prototype based c. Partition method d. All the above
8. Intelligent miner is a mining tool from _______. [ C ]
a. Infosys b. Wipro c. IBM d. SAS
9. _____________is an example of hierarchical clustering. [ C ]
a. BIRCH b. CURE c. DIANA d. None
10.______algorithms is an example of decision tree induction algorithm. [ A ]
a.ID3 b. CLIQUE c. QUEST d. None
