induction, Naïve Bayes and k-NN are employed in seizure prediction modeling.
Support Vector Machine
The machine is presented with a set of training examples (x_i, y_i), where the x_i are the real-world data instances and the y_i are labels indicating the class to which each instance belongs. For the two-class pattern recognition problem, y_i = +1 or y_i = -1. A training example (x_i, y_i) is called positive if y_i = +1 and negative otherwise. SVMs construct a hyperplane that separates the two classes and tries to achieve maximum separation between them. Separating the classes with a large margin minimizes a bound on the expected generalization error.

The simplest SVM model, called the Maximal Margin classifier, constructs a linear separator (an optimal hyperplane) given by w \cdot x - \gamma = 0 between the two classes of examples. The free parameters are a vector of weights w, which is orthogonal to the hyperplane, and a threshold value \gamma. These parameters are obtained by solving the following optimization problem using Lagrangian duality:

    \min_{w,\gamma}\ \tfrac{1}{2}\|w\|^2 \quad \text{s.t.}\quad D_{ii}(x_i \cdot w - \gamma) \ge 1,\ i = 1, \dots, l    (1)

where D_{ii} corresponds to the class labels +1 and -1. The instances with non-null weights are called support vectors. In the presence of outliers and wrongly classified training examples it may be useful to allow some training errors in order to avoid overfitting. A vector of slack variables \xi_i that measure the amount of violation of the constraints is introduced, and the resulting optimization problem is referred to as the soft margin formulation. In this formulation the contributions to the objective function of margin maximization and of training errors can be balanced through the regularization parameter C. The following decision rule is used to predict the class of a new instance with minimum error:

    f(x) = \mathrm{sgn}(w \cdot x - \gamma)    (2)
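As an illustrative sketch of formulations (1) and (2), the following fragment (an assumption on our part: scikit-learn as the toolkit, with toy data standing in for EEG feature vectors) fits a soft-margin linear SVM and inspects its weight vector, threshold and support vectors:

    # Minimal soft-margin linear SVM sketch (toy data; scikit-learn assumed).
    import numpy as np
    from sklearn.svm import SVC

    X = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1]])  # instances x_i
    y = np.array([-1, -1, +1, +1])                                  # labels y_i

    clf = SVC(kernel="linear", C=1.0)   # C is the soft-margin parameter of (1)
    clf.fit(X, y)

    w, gamma = clf.coef_[0], -clf.intercept_[0]   # hyperplane w.x - gamma = 0
    print(np.sign(X @ w - gamma))                 # decision rule (2)
    print(clf.support_)                           # indices of the support vectors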
The advantage of the dual formulation is that it permits efficient learning of non-linear SVM separators by introducing kernel functions. Technically, a kernel function calculates a dot product between two vectors that have been (non-linearly) mapped into a high-dimensional feature space. Since there is no need to perform this mapping explicitly, training remains feasible even though the dimension of the real feature space can be very high or even infinite. The parameters are obtained by solving the following non-linear SVM formulation (in matrix form):

    \min_u\ L_D(u) = \tfrac{1}{2} u^T Q u - e^T u \quad \text{s.t.}\quad d^T u = 0,\ 0 \le u \le C e    (3)

where K is the kernel matrix and Q = DKD. The kernel function K(A, A^T) (polynomial or Gaussian) is used to construct a hyperplane in the feature space that separates the two classes linearly, while performing the computations in the input space:

    f(x) = \mathrm{sgn}(K(x, x_i^T)\, u - \gamma)    (4)

where u denotes the Lagrange multipliers. In general, larger margins lower the generalization error of the classifier.
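A comparable sketch of the kernelized formulation (3)-(4), again assuming scikit-learn and hypothetical one-dimensional toy data; the fitted model's dual_coef_ attribute exposes the products D_ii u_i of the Lagrange multipliers with the class labels:

    # Minimal kernel (RBF/Gaussian) SVM sketch; toy data, scikit-learn assumed.
    import numpy as np
    from sklearn.svm import SVC

    X = np.array([[0.0], [1.0], [2.0], [3.0]])  # not linearly separable as labeled
    y = np.array([+1, -1, -1, +1])

    clf = SVC(kernel="rbf", gamma=1.0, C=10.0)  # Gaussian kernel, 0 <= u_i <= C
    clf.fit(X, y)

    print(clf.dual_coef_)   # D_ii * u_i for the support vectors, cf. (3)
    print(clf.predict(X))   # sign of the kernel expansion, cf. (4)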
Naïve Bayes and k-Nearest Neighbor
Naïve Bayes is one of the simplest probabilistic classifiers. The model constructed by this algorithm is a set of probabilities, each corresponding to the probability P(f_i | c) that a specific feature f_i appears in the instances of class c. These probabilities are estimated by counting the frequency of each feature value in the instances of a class in the training set. Given a new instance, the classifier estimates the probability that the instance belongs to a specific class, based on the product of the individual conditional probabilities for the feature values in the instance. The exact calculation uses Bayes' theorem, which is why the algorithm is called a Bayes classifier.
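The frequency-counting estimate described above can be sketched in a few lines of Python; the feature values and class labels below are hypothetical and stand in for real EEG-derived attributes:

    # Minimal Naive Bayes sketch: score = P(c) * product of P(f_i | c),
    # with every probability estimated by counting in the training set.
    from collections import Counter, defaultdict

    # Hypothetical training set: (feature values, class label)
    train = [(("high", "yes"), "seizure"), (("high", "no"), "seizure"),
             (("low", "yes"), "normal"), (("low", "no"), "normal"),
             (("high", "yes"), "seizure")]

    prior = Counter(c for _, c in train)    # class frequencies
    counts = defaultdict(Counter)           # counts[(i, c)][value]
    for feats, c in train:
        for i, v in enumerate(feats):
            counts[(i, c)][v] += 1

    def posterior(feats, c):
        p = prior[c] / len(train)                # P(c)
        for i, v in enumerate(feats):
            p *= counts[(i, c)][v] / prior[c]    # P(f_i | c) by frequency
        return p

    print(max(prior, key=lambda c: posterior(("high", "yes"), c)))  # -> seizure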
k-nearest neighbor algorithms are only slightly more complex. The k nearest neighbors of the new instance are retrieved, and whichever class is predominant among them is given as the new instance's classification. k-NN is thus a supervised learning algorithm in which a new instance query is classified based on the majority category of its k nearest neighbors. The purpose of the algorithm is to classify a new object based on attributes and training samples. The classifier does not fit any explicit model; it is based purely on memory.
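A minimal sketch of this majority-vote scheme, again assuming scikit-learn and toy data; note that fitting merely stores the training instances, reflecting the memory-based nature of the method:

    # Minimal k-NN sketch (k = 3 is an arbitrary illustrative choice).
    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier

    X = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.2]])
    y = np.array([0, 0, 1, 1])

    knn = KNeighborsClassifier(n_neighbors=3)
    knn.fit(X, y)                      # no model is built; the data are stored
    print(knn.predict([[0.2, 0.1]]))   # majority class of the 3 nearest -> [0]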
J48 Decision Tree Induction
The J48 algorithm is an implementation of the C4.5 decision tree learner. This implementation produces decision tree models, using a greedy technique to induce decision trees for classification. A decision tree model is built by analyzing training data, and the model is then used to classify unseen data. J48 generates decision trees in which each node evaluates the existence or significance of individual features.
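J48 itself ships with the Weka toolkit; as a hedged stand-in, scikit-learn's DecisionTreeClassifier (a CART variant, not C4.5) illustrates the same greedy, top-down induction in which each internal node tests a single feature:

    # Greedy decision tree induction sketch (CART stand-in for J48/C4.5).
    import numpy as np
    from sklearn.tree import DecisionTreeClassifier, export_text

    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])  # toy binary features
    y = np.array([0, 1, 1, 1])                      # logical OR of the features

    tree = DecisionTreeClassifier(criterion="entropy")  # information-based splits
    tree.fit(X, y)
    print(export_text(tree, feature_names=["f0", "f1"]))  # each node tests one feature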