NAME:
QUESTION 1:- Create your own data set and find out how to convert it to an ARFF file
Then enter a name for the file and click the ‘Save’ button. Dismiss any messages that appear
by clicking ‘OK’.
Machine Learning Individual Assignment on Weka
Step 3: Open the file with Notepad and change the first line, which holds the attribute
names, into the header structure that begins an ARFF file. Add an @relation tag with the
dataset’s name, an @attribute tag for each attribute with its type information, and a
@data tag, as shown below.
Step 4: Save the file with the .arff file extension.
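The manual edit described in Steps 3 and 4 can also be scripted. The following is a minimal sketch, not the Weka converter itself: the function name csv_to_arff, the relation name and the toy attribute names are all hypothetical, and it assumes every column is numeric unless it is listed as nominal.

```python
import csv
import io

def csv_to_arff(csv_text, relation, nominal=()):
    """Convert CSV text (first row = attribute names) into ARFF text.

    Columns named in `nominal` are declared with their observed values;
    every other column is declared numeric (an assumption of this sketch).
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    lines = [f"@relation {relation}", ""]
    for i, name in enumerate(header):
        if name in nominal:
            values = sorted({row[i] for row in data})
            lines.append(f"@attribute {name} {{{','.join(values)}}}")
        else:
            lines.append(f"@attribute {name} numeric")
    lines += ["", "@data"]
    lines += [",".join(row) for row in data]
    return "\n".join(lines) + "\n"

# Hypothetical toy data; the attribute names are illustrative only.
sample = "temperature,humidity,play\n30,85,no\n22,70,yes\n25,90,no\n"
arff = csv_to_arff(sample, "weather", nominal={"play"})
print(arff)
```

Writing the returned text to a file whose name ends in .arff gives a file with the same three-part layout (@relation, @attribute lines, @data) that the steps above produce by hand.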
It brings up a dialog box allowing you to browse for the data file on the local file system,
QUESTION 2:- Know about the metrics ROC, MCC, Kappa Statistics
|MCC| = SQRT(χ²/n)
where χ² is the chi-square statistic of the 2×2 confusion matrix and n is the total number of
observations.
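As a rough numerical check of this relation, the sketch below computes the Pearson chi-square statistic of a 2×2 table from its expected counts and takes SQRT(χ²/n). The counts are illustrative and the helper name chi_square_2x2 is hypothetical.

```python
import math

def chi_square_2x2(tp, fp, fn, tn):
    """Pearson chi-square statistic of a 2x2 table, from expected counts."""
    n = tp + fp + fn + tn
    observed = [[tp, fp], [fn, tn]]
    row_totals = [tp + fp, fn + tn]
    col_totals = [tp + fn, fp + tn]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2

tp, fp, fn, tn = 5, 2, 3, 3          # illustrative counts
n = tp + fp + fn + tn
abs_mcc = math.sqrt(chi_square_2x2(tp, fp, fn, tn) / n)
print(round(abs_mcc, 4))  # 0.2196
```

The square root only recovers the magnitude of the coefficient; the sign has to come from the confusion-matrix form of the formula.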
While there is no perfect way of describing the confusion matrix of true and false positives
and negatives by a single number, the Matthews correlation coefficient is generally regarded
as being one of the best such measures. Other measures, such as the proportion of correct
predictions (also termed accuracy), are not useful when the two classes are of very different
sizes. For example, assigning every object to the larger set achieves a high proportion of
correct predictions, but is not generally a useful classification.
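The point about unequal class sizes can be seen with a small made-up sample. This sketch assumes a hypothetical data set of 95 objects in one class and 5 in the other, and a trivial rule that assigns everything to the larger class.

```python
# Hypothetical imbalanced sample: 95 objects of class 0 and 5 of class 1.
actual = [0] * 95 + [1] * 5
# A trivial "classifier" that assigns every object to the larger class.
predicted = [0] * len(actual)

correct = sum(a == p for a, p in zip(actual, predicted))
accuracy = correct / len(actual)
# How many of the 5 minority-class objects were actually found.
minority_found = sum(a == 1 and p == 1 for a, p in zip(actual, predicted))

print(accuracy)        # 0.95
print(minority_found)  # 0
```

The accuracy looks high even though not a single object of the smaller class is recognised.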
The MCC can be calculated directly from the confusion matrix using the formula:
MCC = (TP*TN - FP*FN) / SQRT((TP+FP)(TP+FN)(TN+FP)(TN+FN))
Example
Given a sample of 13 pictures, 8 of cats and 5 of dogs, where cats belong to class 1 and dogs
belong to class 0:
actual = [1,1,1,1,1,1,1,1,0,0,0,0,0]
Assume that a classifier that distinguishes between cats and dogs has been trained. We run
the 13 pictures through the classifier, and it makes 8 correct predictions and 5 errors:
3 cats wrongly predicted as dogs (the first 3 predictions) and 2 dogs wrongly predicted as
cats (the last 2 predictions).
prediction = [0,0,0,1,1,1,1,1,0,0,0,1,1]
With these two labelled sets (actual and predictions) we can create a confusion matrix that
will summarize the results of testing the classifier:
                     Actual class
                     Cat    Dog
Predicted    Cat      5      2
class        Dog      3      3
In this confusion matrix, of the 8 cat pictures, the system judged that 3 were dogs, and of the
5 dog pictures, it predicted that 2 were cats. All correct predictions are located in the diagonal
of the table (highlighted in bold), so it is easy to visually inspect the table for prediction
errors, as they will be represented by values outside the diagonal.
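The counts in this example can be reproduced from the two label lists. The sketch below tallies the four cells and then evaluates the MCC in its standard confusion-matrix form, with (TP+FN) as the second factor of the denominator. The helper name mcc and the choice to return 0.0 when the denominator is zero are assumptions of this sketch.

```python
import math

actual     = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
prediction = [0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1]

# Cats are the positive class (1), dogs the negative class (0).
pairs = list(zip(actual, prediction))
tp = sum(a == 1 and p == 1 for a, p in pairs)  # cats predicted as cats
fp = sum(a == 0 and p == 1 for a, p in pairs)  # dogs predicted as cats
fn = sum(a == 1 and p == 0 for a, p in pairs)  # cats predicted as dogs
tn = sum(a == 0 and p == 0 for a, p in pairs)  # dogs predicted as dogs

def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient from the four confusion-matrix cells."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Returning 0.0 for an empty row or column is a convention, not a rule.
    return (tp * tn - fp * fn) / denom if denom else 0.0

print(tp, fp, fn, tn)                  # 5 2 3 3
print(round(mcc(tp, fp, fn, tn), 4))   # 0.2196
```

A value near 0.22 indicates only a weak positive association between predicted and actual classes, even though 8 of the 13 predictions are correct.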
In abstract terms, the confusion matrix is as follows:
                     Actual class
                     P      N
Predicted    P       TP     FP
class        N       FN     TN
        Cats  Dogs
Cats     10     7
Dogs      5     8
Assume that a model was built using supervised machine learning on labelled
data. This doesn't always have to be the case; the kappa
statistic is often used as a measure of reliability between two human raters. Regardless,
columns correspond to one "rater" while rows correspond to another "rater". In supervised
machine learning, one "rater" reflects ground truth (the actual values of each instance to be
classified), obtained from labeled data, and the other "rater" is the machine learning
Example 2: Here is a less balanced confusion matrix and the corresponding calculations:
        Cats  Dogs
Cats     22     9
Dogs      7    13
Ground truth: Cats (29), Dogs (22)
Machine Learning Classifier: Cats (31), Dogs (20)
Total: (51)
Observed Accuracy: ((22 + 13) / 51) = 0.69
Expected Accuracy: ((29 * 31 / 51) + (22 * 20 / 51)) / 51 = 0.51
Kappa: (0.69 - 0.51) / (1 - 0.51) = 0.37
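The same calculation can be written as a short function. This is a minimal sketch: the name cohen_kappa is hypothetical, and it assumes the matrix is given as a list of rows with one "rater" on the rows and the other on the columns.

```python
def cohen_kappa(matrix):
    """Return (observed accuracy, expected accuracy, kappa) for a square matrix."""
    total = sum(sum(row) for row in matrix)
    # Observed accuracy: share of instances on the diagonal.
    observed = sum(matrix[i][i] for i in range(len(matrix))) / total
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    # Expected accuracy: agreement expected by chance from the marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    return observed, expected, (observed - expected) / (1 - expected)

observed, expected, kappa = cohen_kappa([[22, 9], [7, 13]])
print(round(observed, 2), round(expected, 2), round(kappa, 2))  # 0.69 0.51 0.35
```

Carrying the unrounded accuracies through gives a kappa of about 0.35; the 0.37 in the hand calculation comes from rounding the two accuracies to 0.69 and 0.51 before dividing.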
QUESTION 3:-Try to perform the hierarchical clustering on your own data set
Clustering is a data mining (machine learning) technique that finds similarities between data
according to the characteristics found in the data and groups similar data objects into one
cluster.
Step 3: Open the file named MyCluster.arff, then click the Cluster tab and choose the
hierarchical clusterer using the Choose button.
Here you can see the number of clusters, the link type and other settings by clicking on the
name of the clusterer.
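To show what the number of clusters and the link type control, here is a minimal sketch of agglomerative (bottom-up) hierarchical clustering in plain Python. It is not Weka's implementation: the function name single_linkage is hypothetical, the points are made up rather than taken from MyCluster.arff, and single link is used only as one example of a link type.

```python
import math

def single_linkage(points, num_clusters):
    """Agglomerative clustering: repeatedly merge the two closest clusters."""
    clusters = [[p] for p in points]          # start with one cluster per point
    while len(clusters) > num_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single link: distance between the closest pair of members.
                d = min(math.dist(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters.pop(j))   # merge cluster j into cluster i
    return clusters

# Hypothetical 2-D points used only for illustration.
points = [(1.0, 1.0), (1.2, 0.9), (5.0, 5.1), (5.2, 4.9), (9.0, 9.0)]
result = single_linkage(points, 3)
for cluster in result:
    print(cluster)
```

Changing the min in the distance line to max would turn this into complete link, which tends to produce more compact clusters.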
Search Method.
Each section has multiple techniques from which to choose.
The attribute evaluator is the technique by which each attribute in your dataset (also called a
column or feature) is evaluated in the context of the output variable (e.g. the class). The
search method is the technique used to try or navigate different combinations of
attributes in the dataset in order to arrive at a short list of chosen features.
Some Attribute Evaluator techniques require the use of specific Search Methods. For
example, the CorrelationAttributeEval technique can only be used with a Ranker Search
Method, which evaluates each attribute and lists the results in rank order. When selecting
different Attribute Evaluators, the interface may ask you to change the Search Method to
something compatible with the chosen technique.
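The pairing of a per-attribute evaluator with a ranking search can be sketched outside Weka as follows. This is only an illustration of the idea: it assumes a plain Pearson correlation with a 0/1 class stands in for the evaluator, the helper name pearson is hypothetical, and the toy values are invented rather than taken from any data set in this assignment.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical toy data: attribute name -> values, plus a 0/1 class column.
attributes = {
    "temperature": [30, 22, 25, 18, 33, 20],
    "humidity":    [85, 70, 90, 80, 65, 75],
    "wind":        [5, 7, 6, 5, 7, 6],
}
target = [0, 1, 0, 1, 0, 1]

# Evaluator: score every attribute on its own against the class.
# Ranker: sort the attributes by that score, best first.
ranking = sorted(
    ((abs(pearson(values, target)), name) for name, values in attributes.items()),
    reverse=True,
)
for score, name in ranking:
    print(f"{score:.3f}  {name}")
```

Because each attribute is scored independently, a ranking search is all that is needed; evaluators that score subsets of attributes need a search method that explores combinations instead.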
Both the Attribute Evaluator and Search Method techniques can be configured. Once chosen,
click on the name of the technique to get access to its configuration details.
Click the “More” button to get more documentation on the feature selection technique and
configuration parameters. Hover your mouse cursor over a configuration parameter to get a
tooltip containing more details.
Running this feature selection technique on the LidiyaAssign dataset selects 1 of the 3 input
variables: Temperature.