set. In the end, the overall prediction is made by majority voting. The paper concludes with two classifier families, meta classifiers and decision tree classifiers, giving an idea of their accuracy and precision.
Meta Classifier: AdaBoost Algorithm
Adaptive boosting (AdaBoost) is a popular and powerful meta-ensemble algorithm. Boosting is an effective method for improving the performance of any learning algorithm; it is also referred to as stagewise additive modeling. The model is comparatively easy to use and is relatively resistant to overfitting. It solves both binary classification problems and multiclass problems in the machine learning community, and AdaBoost also extends to regression problems. Boosting algorithms are stronger than bagging on noise-free data. The algorithm depends more on the data set than on the type of base classifier. It combines many weak classifiers to create one strong classifier, producing the classifiers sequentially. To construct a classifier:

1. A training set is taken as input.

2. A set of weak or base learning algorithms is called repeatedly in a series of rounds, maintaining a set of weights over the training set. Initially, all weights are set equally, but on each round the weights of incorrectly classified examples are increased so that the weak learner is forced to focus on the hard examples in the training data.

3. Boosting can be applied through two frameworks: i) boosting by weighting and ii) boosting by sampling. In boosting by weighting, the base learning algorithm accepts a weighted training set directly, so the entire training set is given to the base learner. In boosting by sampling, examples are drawn with replacement from the training set with probability proportional to their weights.

4. The stopping iteration is determined by cross-validation.

The algorithm does not require prior knowledge about the weak learner and so can be flexibly combined with any method for finding weak hypotheses. Finally, it comes with a set of theoretical guarantees given sufficient data and a weak learner that can reliably provide only moderately accurate weak hypotheses. The algorithm is used on learning problems having either of the following two properties. The first property is that the observed examples tend to have varying degrees of hardness; the boosting algorithm tends to generate distributions that concentrate on the harder examples, thus challenging the weak learning algorithm to perform well on these harder parts of the sample space. The second property is that the algorithm is sensitive to changes in the training examples, so that significantly different hypotheses are generated for different training sets.
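To make the reweighting loop concrete, the following minimal Python sketch implements discrete AdaBoost with decision stumps. It is illustrative only, not the implementation evaluated in this paper: labels are assumed to be coded as -1/+1, the feature matrix is a NumPy array, and the names fit_stump, adaboost, and predict are hypothetical.

    import numpy as np

    def fit_stump(X, y, w):
        # Find the one-feature threshold rule minimizing weighted error.
        best = (np.inf, 0, 0.0, 1)          # (error, feature, threshold, polarity)
        for j in range(X.shape[1]):
            for thr in np.unique(X[:, j]):
                for pol in (1, -1):
                    pred = np.where(pol * (X[:, j] - thr) > 0, 1, -1)
                    err = w[pred != y].sum()
                    if err < best[0]:
                        best = (err, j, thr, pol)
        return best

    def adaboost(X, y, rounds=20):
        n = len(y)
        w = np.full(n, 1.0 / n)             # step 2: equal initial weights
        stumps, alphas = [], []
        for _ in range(rounds):
            err, j, thr, pol = fit_stump(X, y, w)   # boosting by weighting
            err = max(err, 1e-10)                   # avoid division by zero
            alpha = 0.5 * np.log((1 - err) / err)   # weight of this round's classifier
            pred = np.where(pol * (X[:, j] - thr) > 0, 1, -1)
            w *= np.exp(-alpha * y * pred)          # raise weights of mistakes
            w /= w.sum()                            # renormalize the distribution
            stumps.append((j, thr, pol))
            alphas.append(alpha)
        return stumps, alphas

    def predict(X, stumps, alphas):
        # Final strong classifier: weighted majority vote of the weak stumps.
        votes = sum(a * np.where(p * (X[:, j] - t) > 0, 1, -1)
                    for (j, t, p), a in zip(stumps, alphas))
        return np.sign(votes)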
Meta Classifier: Bagging Algorithm
Bagging is a machine learning method of combining multiple predictors; it is a model averaging approach. Bagging is a technique that generates multiple training sets by sampling with replacement from the available training data, and it is also known as bootstrap aggregating. Bootstrap aggregating improves classification and regression models in terms of stability and accuracy. It also reduces variance and helps to avoid overfitting. It can be applied to any type of classifier. Bagging is also a popular method for estimating bias and standard errors and for constructing confidence intervals for parameters. To build a model:

i) Split the data set into a training set and a test set.

ii) Draw a bootstrap sample from the training data and train a predictor using the sample.

These steps are repeated a number of times, and the models built from the samples are combined by averaging the outputs for regression or by voting for classification. Bagging automatically yields an estimate of the out-of-sample error, also referred to as the generalization error. Bagging works well for unstable learning algorithms such as neural networks, decision trees and regression trees, but it performs poorly with stable classifiers such as k-nearest neighbors. The lack of interpretability is the main disadvantage of bagging. The bagging method is also used in the unsupervised context of cluster analysis.
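The two steps above can be sketched in a few lines of Python. The following is a minimal illustration, not the paper's code: it assumes scikit-learn's DecisionTreeClassifier as the base predictor, integer class labels 0..k-1, and the hypothetical function names bagging_fit and bagging_predict.

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def bagging_fit(X, y, n_models=25, seed=0):
        rng = np.random.default_rng(seed)
        models = []
        for _ in range(n_models):
            # Bootstrap sample: draw n rows with replacement.
            idx = rng.integers(0, len(y), size=len(y))
            models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
        return models

    def bagging_predict(models, X):
        # Majority vote across the bootstrap models (classification case);
        # for regression, np.mean over the predictions would be used instead.
        votes = np.stack([m.predict(X) for m in models]).astype(int)
        return np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, votes)

Because each bootstrap sample leaves out roughly a third of the training examples, those held-out examples can be used to estimate the generalization (out-of-sample) error mentioned above.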
Decision Tree Classifier: ADTree Algorithm
The Alternating Decision Tree (ADTree) is a successful machine learning classification technique that combines many decision trees. It uses boosting as a meta-algorithm to gain accuracy. The induction algorithm is used to solve binary classification problems. Alternating decision trees provide a mechanism for generating a strong classifier out of a set of weak classifiers. At each boosting iteration, a splitter node and two prediction nodes are added to the tree to generate a decision tree. The algorithm determines a place for the splitter node by analyzing all prediction nodes in accordance with the improvement in purity. The algorithm then takes the sum of all prediction nodes traversed by an instance to obtain the overall prediction value; in two-class data sets, a positive sum represents one class and a negative sum represents the other. A special feature of ADTrees is that the trees can be merged together. In multiclass problems, the alternating decision tree can make use of all the weak hypotheses in boosting to arrive at a single interpretable tree instead of a large number of trees.
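The additive scoring rule can be illustrated with a small hand-built example. The sketch below is hypothetical and deliberately flattened (in a full ADTree, splitter nodes may be nested under prediction nodes, so an instance accumulates values only along the paths it satisfies); the features age and income and all numeric values are invented for illustration.

    # Illustrative hand-built alternating decision tree: a root prediction
    # value plus two boosting iterations, each contributing a splitter node
    # with two prediction nodes.
    def adtree_score(x):
        score = 0.5                       # root prediction node
        if x["age"] < 40:                 # splitter node, iteration 1
            score += -0.8                 # prediction node (condition true)
        else:
            score += 0.3                  # prediction node (condition false)
        if x["income"] > 50000:           # splitter node, iteration 2
            score += 0.6
        else:
            score += -0.2
        return score

    def adtree_classify(x):
        # The sign of the summed prediction values decides the class.
        return "positive" if adtree_score(x) > 0 else "negative"

    # Example: 0.5 - 0.8 + 0.6 = 0.3 > 0, so this instance is "positive".
    print(adtree_classify({"age": 35, "income": 60000}))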
Decision Tree Classifier: Random Forest Algorithm
A random forest is a refinement of bagged trees that constructs a collection of decision trees with controlled variation. The method combines Breiman's bagging and Ho's random subspace method. The algorithm improves on bagging by de-correlating the trees, and it grows the trees in parallel, independently of one another. Random forests are often used on very large data sets.
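As an illustration of the de-correlation idea, the sketch below extends the bagging sketch by restricting each tree to a random subset of features (Ho's random subspace method). It is an assumption-laden toy, not the paper's implementation: Breiman's canonical random forest re-samples the candidate features at every split inside each tree rather than once per tree, and the names random_forest_fit and random_forest_predict are hypothetical.

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def random_forest_fit(X, y, n_trees=50, n_features=None, seed=0):
        rng = np.random.default_rng(seed)
        d = X.shape[1]
        k = n_features or max(1, int(np.sqrt(d)))   # sqrt(d) is a common default
        forest = []
        for _ in range(n_trees):
            rows = rng.integers(0, len(y), size=len(y))  # bootstrap sample (bagging)
            cols = rng.choice(d, size=k, replace=False)  # random feature subspace
            tree = DecisionTreeClassifier().fit(X[rows][:, cols], y[rows])
            forest.append((tree, cols))
        return forest

    def random_forest_predict(forest, X):
        # Each tree votes using only its own feature subset; the de-correlated
        # votes are then combined by majority, as in plain bagging.
        votes = np.stack([t.predict(X[:, cols]) for t, cols in forest]).astype(int)
        return np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, votes)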