
(IJACSA) International Journal of Advanced Computer Science and Applications,

Vol. 7, No. 9, 2016

measured the technique's limitations, which means it failed to classify any attack, as shown in Figure 5. We are also concerned with the computational efficiency of the techniques and how well they learn, because we are dealing with comparatively large data sets. Therefore, we observe the model building and testing times, which are listed in Table 2.

On the basis of classification accuracy, the number of unclassified instances, and computational complexity, we found that the C4.5 decision tree could be a preferred choice for DoS attack classification in the cloud computing area. Classification accuracy and the number of unclassified instances essentially summarize the average performance of the techniques on our attack classification task, so we examined the performance details of the attack classification scenario. To that end, we employed confusion matrix analysis [38] to inspect the detailed performance measures of the techniques.

B. C4.5 Decision Tree Algorithm

Decision trees are tree-shaped structures that represent sets of decisions; these decisions generate rules for the classification of a dataset. The C4.5 algorithm [35, 40] can construct a large tree by taking into account all attribute values, and it finalizes the decision rule by pruning [38]. The algorithm uses a heuristic methodology for pruning, depending on the statistical significance of splits [39]. Tree construction essentially calculates the entropy and the information gain to finalize the decision tree; depending on this gain information, C4.5 can determine the occurrence or non-occurrence of an attack. The expected information, or entropy, of partitioning the set S into subsets is given by equation (1):

E(S) = − ∑_{j=1}^{n} f_S(j) · log₂ f_S(j)        (1)

where:

• E(S) is the information entropy of the subset S;
• n is the number of different values of the attribute in S (the entropy is computed for one selected attribute);
• f_S(j) is the frequency (proportion) of the value j in the subset S; and
• log₂ is the binary logarithm.

An entropy of 0 identifies a perfectly classified subset, whereas 1 indicates a completely random composition. Entropy is used to determine the next node to be split in the algorithm: the higher the entropy, the greater the potential to improve the classification.

The encoding information that would be gained by branching on the attribute A is given by equation (2):

G(S, A) = E(S) − ∑_{i=1}^{M} f_S(A_i) · E(S_{A_i})        (2)

where:

• G(S, A) is the gain of the subset S after a split over the attribute A;
• E(S) is the information entropy of the subset S;
• M is the number of different values of the attribute A in S;
• f_S(A_i) is the frequency (proportion) of the items that have A_i as their value for A in S;
• A_i is the i-th possible value of A; and
• S_{A_i} is the subset of S that contains all items whose value of A is A_i.

The gain quantifies the entropy improvement achieved by splitting over an attribute: higher is better. To construct the final decision tree, the algorithm computes the information gain of each attribute [40].

We build a model based on data mining for evaluating the security state of cloud computing by simulating an attack from a malicious source. This process involves identifying and exploiting vulnerabilities in real-world scenarios, which may occur in the cloud due to improper configuration; known or unknown weaknesses in software systems or hardware; operational weaknesses; or loopholes in deployed safeguards.

Using data mining as the technology tool, we infer and analyze data by searching for it in the cloud. This paper presents a vision of assurance and a general arrangement for extracting the required data through the cloud, enabling the fight against terrorism to limit harm in advance: relief arrangements can be made from the viewpoint of comprehensive security and through analysis of the results of the data survey.

This process of assigning predictions to individual records is known as scoring. By scoring the same records used to estimate the model, we can evaluate how accurately it performs on the training data, the data for which we know the outcome. This example uses a decision tree model, which classifies records (and predicts a response) using a series of decision rules.

C. Testing and Verification of Security and Integrity Using a Simple Decision Tree Model

We use a simple decision tree model built with the CHAID algorithm to assign a security rating, classifying the data according to the input fields (variables). A decision tree is a branching structure that represents sets of decisions, and these decisions generate rules for the classification of the data set. It includes a limited set of branch forms, covering decisions of classification or rejection, and leaves room for the automatic discovery of mistakes.
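As a worked illustration of equations (1) and (2), the following Python sketch computes the entropy E(S) of a label set and the information gain G(S, A) of splitting on an attribute, then picks the attribute with the highest gain, which is the split choice at the heart of C4.5-style tree construction. The toy traffic records and attribute names ("proto", "rate") are invented for illustration and are not taken from the paper's data set.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """E(S) = -sum_j f_S(j) * log2 f_S(j)  -- equation (1)."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(records, labels, attr):
    """G(S, A) = E(S) - sum_i f_S(A_i) * E(S_{A_i})  -- equation (2)."""
    n = len(labels)
    gain = entropy(labels)
    # Partition S into subsets S_{A_i}, one per distinct value A_i of attribute A.
    subsets = {}
    for rec, lab in zip(records, labels):
        subsets.setdefault(rec[attr], []).append(lab)
    # Subtract the frequency-weighted entropy of each subset.
    for subset_labels in subsets.values():
        gain -= (len(subset_labels) / n) * entropy(subset_labels)
    return gain


# Hypothetical traffic records; label 1 = DoS attack, 0 = normal.
records = [
    {"proto": "tcp", "rate": "high"},
    {"proto": "tcp", "rate": "high"},
    {"proto": "udp", "rate": "low"},
    {"proto": "tcp", "rate": "low"},
]
labels = [1, 1, 0, 0]

# The attribute with the highest gain becomes the next split node.
best = max(records[0], key=lambda a: information_gain(records, labels, a))
# "rate" separates the classes perfectly here, so its gain equals E(S) = 1.0.
```

On this toy set, splitting on "rate" yields pure subsets (gain 1.0) while "proto" does not, so the tree would branch on "rate" first; a full C4.5 implementation would recurse on each subset and then prune the finished tree.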
