
A Study of Ensemble Methods for Cyber Security

Nicholas Lower and Felix Zhan

Abstract—Ensemble methods for machine learning increase the predictive power of preexisting models by applying a meta-algorithm on top of one or many underlying prediction models. These ensemble models show promise for anomaly detection over the simpler prediction models they are built on top of, without much additional administrative work or theory, making them well suited to network intrusion detection. This study examines the advantages of these methods when applied to the cybersecurity domain, using the widely used NSL-KDD intrusion detection dataset. The types of ensemble methods studied are voting, bagging, and boosting; specifically, the algorithms experimented with are the voting classifier, boosting, the random forest classifier, and the AdaBoost classifier.

I. INTRODUCTION

It is often useful when making predictions in the real world to consider others' predictions in order to solidify your choice and form a more well-rounded decision. This concept is the foundation of ensemble methods, which seek to strengthen a machine learning model's predictions by incorporating other models' predictions [1]. Many techniques fall under the ensemble-methods umbrella; the ones studied in this paper are voting, bagging, and boosting [2], [3], [4], [5].

Ensemble methods work by utilizing multiple individual models to create a single, truly confident final prediction. These individual models are referred to as the "inner-models" of the ensemble throughout this paper. The inner-models are typically independent of the ensemble algorithm, so any type of inner-model can be used initially and then have an ensemble built on top. This is a valuable observation, because tailoring the inner-models to the specific problem at hand can greatly improve the ensemble's overall prediction performance; in this paper that is done by using decision trees, K-nearest neighbors, and logistic regression, all of which have been shown to perform well on the NSL-KDD dataset [6]. A useful feature of ensembles is their ability to be used for both regression and classification without alterations to the theoretical basis, allowing ease of use for any typical machine learning problem. Many machine learning and deep learning models in wide use across nearly all domains are in fact ensembles that keep the inner-model training and prediction hidden from the user; two examples are random forests and AdaBoost, which are tested in this paper for their effectiveness on the NSL-KDD prediction problem.

An important domain that utilizes machine learning is cybersecurity: the practice of analyzing and studying network activity and architecture to better protect against unauthorized actions and other attacks. This area of research and of consumer products is becoming increasingly important as networking becomes integral to a vast number of users' lives, bringing with it an onslaught of hacks and attacks. Machine learning models are valuable in this domain because packet analysis and the search for signs of dangerous activity map naturally onto learning from past examples and probabilities [7], [8].

Motivation for using ensemble methods in the cybersecurity domain comes from the demonstrated power of such systems for anomaly detection in settings similar to general packet-switched networking [9], and from the increasing utility of such systems in CPU- and memory-rich data centers for filtering incoming traffic for foul play. Because ensembles are built upon multiple smaller models, their time and memory efficiency lags behind simply using those individual models, which long made these methods undesirable for networking tools that prioritize traffic speed; but with the abundance of CPU power and memory in large-scale routers and servers in data centers, the detection benefits of ensembles begin to outweigh the diminishing cost of their use. With an incredible amount of the consumer internet powered by these data centers [10], a single malicious attack on a center can knock an uncomfortable number of users out of service or create a mass risk of theft, so these centers are prime candidates for the potential benefits ensemble methods bring to intrusion detection.

A. Related Work

The specific systems that detect these attacks are Intrusion Detection Systems (IDS) [11]. These systems take in network packets and look for telling signs of an attack, such as flags set, protocol, duration, length, etc.; training a supervised learning model on these telling signs as features produces a prediction model that tells you whether or not a packet is part of an attack, and if it is, the system can take precautions to avoid as much damage as possible and alert the user with full details if requested. Liao et al. give a comprehensive overview of the current state of IDS [12] and classify currently used methods into five classes: statistics-based, pattern-based, rule-based, state-based, and heuristic-based. Using the same classification characteristics, ensemble methods fall mostly under the statistics-based umbrella, with systems setting a predetermined confidence threshold and triggering a warning when

1001

Authorized licensed use limited to: University of Adelaide. Downloaded on February 02,2024 at 15:35:29 UTC from IEEE Xplore. Restrictions apply.
a packet is classified beyond the threshold; but there is some crossover with pattern-based systems, depending on how much rote learning the admin user incorporates into the system. Previous research on applying machine learning to IDS has brought favorable results; as Tsai et al. present in a comprehensive review of research and studies [13], machine learning is applicable to the intrusion detection domain. Single classifiers (including support vector machines, artificial neural networks, decision trees, etc.) have been studied thoroughly, with some studies reaching impressive accuracies of over 80% with intensive feature engineering and hyper-parameter tuning, and hybrid methods (intelligently combining the ideas and methods behind multiple single-classifier algorithms into a single new algorithm) have shown even more promise and have seen a recent explosion in research. Ensemble methods have also been appearing in recent research as worthy investments in cybersecurity, with increased accuracy over single classifiers at minimal additional effort and admin work. Gathering packets to serve as data points is a tricky process, because normal packet traffic for a user is massive and contains a very small percentage of actual attacks to use as labels; to ease the creation of IDS models, datasets containing normal and attack packets, such as NSL-KDD, exist for training and testing. We go into more detail on the NSL-KDD dataset in the Experiment section of the paper. Other works include [14]–[194].

II. ENSEMBLE METHODS BACKGROUND

In this section we dive into each of the ensemble methods to better understand how they take multiple models and return a strong prediction. First we analyze the voting method, then bagging, then boosting, finally ending with stacking.

A. Voting

Voting is the simplest of the ensemble methods to understand because it mirrors our natural instinct when we consider multiple models: we simply tally up the results of the models and output the majority winner. This is exactly how a voting ensemble works; we take a "vote" of all the inner-models' predictions and return the result with the majority of the votes as the final prediction [2]. This only works for classification, as tallying up raw numbers/probabilities will rarely, if ever, lead to a majority winner; in that case the simplest method is returning the average of all the predictions. An add-on to the voting process is the inclusion of weights on the individual votes, skewing the vote to favor models that have shown themselves to be more correct than others.

Typically there are two different methods of voting, so-called "hard" voting and "soft" voting. The previous example of picking the majority winner of the voting process is hard voting: the final prediction of each inner-model is taken at face value and tallied, no matter how confident the prediction is. Soft voting resolves this; in soft voting the actual confidence probabilities of each prediction are taken into account. The voting works by taking the average of the confidence probabilities for each predicted label over every model and choosing the label that has the highest probability. Formally, this final prediction ŷ can be written as:

    ŷ = argmax_i Σ_{j=0}^{m} p_ij    (1)

where m is the number of models and p_ij is the prediction probability of classification label i from model j.

While the advantage of voting is a stronger prediction when utilizing mediocre-to-effective inner-models, voting performs very poorly when one or more inner-models have poor accuracy and precision. These inner-models' defective predictions affect the voting process too much if no weighting is used, so the final prediction can itself be defective. This can be counteracted by utilizing weak models that "focus" on separate aspects of the data; in other words, to get the full strength of a voting ensemble, the inner-models should all be different and have underlying algorithms that use and manipulate the dataset in differing ways. In this sense, voting ensembles can be seen as only as powerful as their participants. Another advantage of voting, more specific to ensemble building, is its complete independence from the inner-models: voting does not require the inner-models to be connected at all, because only the final predictions are used. This makes the ensemble easier to scale as more predictions become available, and it also allows the inner-models themselves to be built from completely different algorithms with no consequence.

B. Bagging

Bagging [3] is based on two ideas: bootstrap sampling and aggregating. First, bootstrap sampling is the process of taking a random sample from a dataset with replacement. This gives us a truly random subset of the dataset on which to train multiple "weak" individual models. Second, aggregating is simply taking the predicted outputs of each weak model and combining them into a strong final prediction, typically using a voting process.

Because bagging aggregates models trained on bootstrap samples rather than the whole "true" dataset, it achieves low variance even if the inner-models all have high variance themselves, and thus has very little problem with overfitting, which is a major problem for single classifiers and voting ensembles. For bias, bagging models tend to follow the bias of their underlying inner-models very closely, so a disadvantage of bagging is its inability to correct for the bias found in simple inner-model algorithms.

A powerful example of a bagging model frequently used in real-world applications is the random forest model [195]. Random forests are a type of bagging model that use decision trees exclusively as their inner-models, plus the added requirement that during the bootstrap sampling phase of training not all features in the data are present in each bootstrap sample; instead, only a subset of a given size is used. This further reduces the variance of the model's final predictions, and because decision trees typically have low bias but high variance, and bagging reduces variance while preserving bias, this method of ensemble greatly extends the power of decision

trees and proves very effective in its real-world predictive power.

A particularly useful feature of random forests is their ability to weigh the importance of individual features, a valuable tool in the feature selection process. This is done using mean decrease in impurity and mean decrease in accuracy, which are the averages over the forest of the impurity measures of each feature, as determined by Gini impurity or information gain/entropy in each of the random forest's inner-model decision trees. Using these calculated importances, we can choose to keep only the k most important features and reduce the dimensionality of the data.

C. Boosting

Boosting is an ensemble technique that focuses on sequentially turning ("boosting") weak models into stronger ones. A weak model in this context is defined as a model that makes its prediction using a single area (or a small subset of areas) of correlation; for example, in a spam filter a single area of correlation might be whether or not an email contains a certain phrase. Boosting is all about taking these weak models and creating a strong model which can successfully predict using all correlations.

A popular ensemble algorithm that uses boosting is Adaptive Boosting, or "AdaBoost". As given by Freund and Schapire [196], AdaBoost turns weak models into a strong model by sequentially training each weak model on a weighted training dataset that puts heavy weights on the data instances that were part of the previous model's error, creating a line of stronger and stronger correlations until finally combining all the weak models into a strong ensemble that can predict based on multiple correlations with low bias.

D. Stacking

Another ensemble worth describing is stacking [197], which works by building a system where first many inner-models are trained on the dataset and each makes a final prediction. These predictions are then fed into a meta-model that combines all these inputs into a final strong prediction. The meta-model is what makes stacking stand out from simple voting over each inner-model's prediction; the meta-model typically involves a complex combining or mixing of all predictions to make the final strong prediction. Stacking will not be experimented on in this paper.

III. EXPERIMENT

A. NSL-KDD Dataset

The dataset used for studying the effectiveness of the various ensemble methods is the NSL-KDD dataset [198]. This dataset is widely used for building and testing IDS models. It consists of 4.9 million packet-analysis instances of various intrusion attacks that fit into the attack categories DoS, probe, R2L, and U2R. Each instance has 41 features, ranging from duration of attack to protocol, flags, etc., which describe the state of each packet, and a final class label which marks each instance as "normal" or with the name of the particular attack. For the sake of experiment, it is often valuable to test both on this multi-nominal target label and on a simpler binary label which marks each instance as either "normal" or "attack".

Because the NSL-KDD features around 5 million instances, a 50% subset of both the test and train sets is used to make computation faster. Also, in order to get the best performance possible, a number of feature engineering techniques are applied, including one-hot encoding and feature selection. One-hot encoding is applied to the whole dataset, and because the number of categorical features in NSL-KDD is quite high, the number of features explodes to around 120. To counteract this unwieldy number of features, feature selection is used to choose only the 20 most important features, as determined by a series of random forest feature-importance results.

B. Experiment

The experiments testing the performance increase of each ensemble method are done on the NSL-KDD dataset with the binary labels described above. The inner-models themselves are in no way the strongest they can be, in that they are not fully built and tuned for the greatest individual performance; instead they are typically simple forms of each, in order to purely study the quick and easy boost an ensemble method can provide. The hyper-parameters used in this experiment are as follows¹: decision trees - min_samples_split = 1000, min_samples_leaf = 1000; K-nearest neighbors - n_neighbors = 1000; MLP classifier - hidden_layer_sizes = (100, 100).

The voting experiment is done in two sections, to evaluate both hard and soft voting; each of these voting schemes uses the metrics:

• IM - the names of the inner-models
• IMAA - the average accuracy of each individual inner-model
• IMAP - the average precision of each individual inner-model
• IMAF - the average F1 score of each individual inner-model
• Acc. - the accuracy of the ensemble
• Prec. - the precision of the voting ensemble
• F1 - the F1 score of the voting ensemble

These metrics are measured in order to showcase the potential elevated results an ensemble can bring compared to the individual inner-models. Likewise, the bagging ensemble experiment is tested using the same metrics, with the difference that each result comes from the number of decision tree inner-models used. The random forest evaluations were measured using only the Acc., Prec., and F1 scores, because it is less valuable to see the improvement that the act of ensembling brings; the outright accuracy, precision, and F1 score of the whole model is better to study. The AdaBoost test follows the same idea; only the performance of the whole model is given.

¹ The names of the hyper-parameters follow those described in the Scikit-Learn docs.
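The ensemble metrics just listed (Acc., Prec., F1) are standard binary-classification quantities. As a minimal sketch (not the authors' code, which would normally use Scikit-Learn's metrics module; the encoding 1 = "attack", 0 = "normal" is an assumption), they can be computed as:

```python
# Minimal sketch of the Acc., Prec., and F1 metrics reported in the
# result tables, for binary labels (assumed: 1 = "attack", 0 = "normal").
def binary_metrics(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, precision, f1

# Toy example: 3 attacks and 2 normals, with one miss and one false alarm.
acc, prec, f1 = binary_metrics([1, 1, 1, 0, 0], [1, 1, 0, 1, 0])
```

Averaging these same quantities over the inner-models individually gives the IMAA, IMAP, and IMAF columns.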

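Before turning to the result tables, the hard- and soft-voting rules of Section II-A can be made concrete. The sketch below is illustrative plain Python, not the experiment code (which presumably uses Scikit-Learn's VotingClassifier); probas[j][i] plays the role of p_ij in Eq. (1):

```python
from collections import Counter

def hard_vote(probas):
    # Each inner-model casts one vote for its most probable label;
    # the majority label wins (ties break by first-seen order).
    votes = [max(range(len(p)), key=p.__getitem__) for p in probas]
    return Counter(votes).most_common(1)[0][0]

def soft_vote(probas):
    # Eq. (1): y_hat = argmax_i sum_j p_ij. Summing and averaging give
    # the same argmax, so the per-label confidences are simply totalled.
    n_labels = len(probas[0])
    totals = [sum(p[i] for p in probas) for i in range(n_labels)]
    return max(range(n_labels), key=totals.__getitem__)

# Three inner-models scoring labels 0 ("normal") and 1 ("attack"):
probas = [[0.9, 0.1], [0.4, 0.6], [0.4, 0.6]]
hard = hard_vote(probas)  # label 1: two of three models lean that way
soft = soft_vote(probas)  # label 0: summed confidence is 1.7 vs 1.3
```

Note how the same probabilities can produce different winners: two models weakly prefer label 1, but one model's strong confidence in label 0 wins the soft vote. This is exactly why, in the results below, soft voting degrades more gracefully as the inner-models weaken.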
IV. RESULTS

The results of the hard voting test on the NSL-KDD dataset are shown in Table I. The inner-models (IM) used in the experiment are denoted as follows:

• D - Decision Tree
• K - K-Nearest Neighbors
• M - MLP Classifier
• L - Logistic Regression

The tests show that for a small voting poll of only two simple models, a decision tree and K-nearest neighbors, the accuracy, precision, and F1 score show a clear increase over the average inner-model's. What can also be seen in the results is that as the inner-models became weaker (the inner-models' average metrics fell), the ensemble's overall performance fell as well, just not as dramatically.

TABLE I
HARD VOTING NSL-KDD RESULTS

IM*         IMAA   IMAP   IMAF   Acc.   Prec.  F1
D, K        0.812  0.725  0.820  0.855  0.778  0.845
D, K, M     0.782  0.704  0.772  0.850  0.767  0.843
D, K, M, L  0.641  0.589  0.721  0.784  0.677  0.795

* See the Experiment subsection for abbreviation meanings.

TABLE II
SOFT VOTING NSL-KDD RESULTS

IM          IMAA   IMAP   IMAF   Acc.   Prec.  F1
D, K        0.786  0.675  0.796  0.801  0.684  0.802
D, K, M     0.689  0.603  0.739  0.792  0.681  0.801
D, K, M, L  0.681  0.551  0.619  0.794  0.683  0.802

The results of the soft voting test on the NSL-KDD dataset are shown in Table II. The inner-models (IM) used in the experiment are denoted as before: D - Decision Tree, K - K-Nearest Neighbors, M - MLP Classifier, and L - Logistic Regression. Like the hard voting results, the ensemble performed better than the individual inner-models; but unlike hard voting, when the inner-models became weaker the overall ensemble's performance stayed relatively stable instead of falling. This is the benefit of soft voting over hard voting: because the voting considers the confidence of each inner-model and not just binary choices, when the inner-models start becoming less and less confident the ensemble intuitively reacts and softens the fall in accuracy by considering the sum of every single choice's confidence.

TABLE III
BAGGING NSL-KDD RESULTS FOR n DECISION TREE CLASSIFIERS

n    IMAA   IMAP   IMAF   Acc.   Prec.  F1
2    0.821  0.720  0.821  0.821  0.720  0.821
5    0.844  0.755  0.840  0.860  0.778  0.853
10   0.840  0.754  0.839  0.862  0.786  0.857
20   0.854  0.771  0.847  0.862  0.784  0.855

Table III gives the results of bagging n decision trees. As the number of decision trees increases, so too does the performance, reaching very impressive numbers for models that are not finely tuned, again showing the potential power of the bagging ensemble. A reason for the impressive performance is bagging's avoidance of overfitting: it uses the random, smaller bootstrap samples of the dataset and then aggregates, rather than using the entire training set to begin with. Overfitting is the major pitfall that single classifiers, and in turn voting classifiers, fall into, stunting their performance; taking steps to minimize overfitting is therefore critical for ensembles and is a primary focus of their algorithm development.

TABLE IV
RANDOM FOREST NSL-KDD RESULTS FOR n TREES

n     Acc.   Prec.  F1
5     0.822  0.723  0.821
10    0.857  0.774  0.849
100   0.863  0.781  0.855
1000  0.863  0.786  0.857

Likewise, the results of the random forest classifier shown in Table IV demonstrate the predictive power of the ensemble, reaching an impressive 86.3% accuracy and 0.857 F1 score. The measures to avoid overfitting are also at play in the random forest's impressive performance: like bagging, it utilizes smaller bootstrap samples of the dataset, on top of using "trimmed" inner-trees that have small depth and thus are unable to overfit to the many specific features in each instance. It can be speculated that with further hyper-parameter tuning and feature engineering this ensemble could become a top contender for predictive power on the NSL-KDD domain and see substantial real-world use.

TABLE V
ADABOOST NSL-KDD RESULTS FOR n TREES

n     Acc.   Prec.  F1
10    0.859  0.780  0.852
50    0.865  0.785  0.858
100   0.864  0.783  0.856
1000  0.864  0.785  0.856

Finally, Table V shows the results of applying AdaBoost classifiers with n decision trees as inner-models to the NSL-KDD dataset. The accuracy, precision, and F1 scores are impressive for an out-of-the-box algorithm that has had no tuning or massaging. It is interesting to observe that the scores seem to stop increasing after an n somewhere between 10 and 50, instead reaching a somewhat constant state; this can be used to advantage with AdaBoost classifiers, as finding that point through a hyper-parameter search can bring the greatest results with improved runtimes.

Comparing all the results, we see random forests and AdaBoost as the two stand-out stars. This can be explained by the special attention given to combating overfitting in their algorithms, which is absent from the basic voting ensembles. Random forest uses explicit bootstrap datasets, while AdaBoost uses more implicit weights on the dataset, but both

serve to lower variance and thus improve performance. The voting ensembles still improved over the performance of their inner-models, and their ease of use and understanding is a plus for teaching and hand-implementing these systems over more complicated algorithms like AdaBoost. Bagging versus random forest is interesting because at their core both algorithms are notably similar, but the extra complexity random forest takes on to build more refined inner-models and then aggregate more smartly takes its performance farther, again at an increase in development and management complexity.

V. CONCLUSION

This paper was a study of the potential effectiveness of ensemble methods when applied to the cybersecurity domain, using the NSL-KDD dataset for experiments on three existing ensemble algorithms. The ensembles studied in this paper, voting, bagging, and boosting, are by no means all-inclusive; many other ensembles tackle the goal of combining results for a smarter prediction, and the ones studied here are simply the most widely used.

It is important to note that these experiments only outline the potential of the ensembles and their ability to elevate individual algorithms; the actual powerhouse models can be created with more time and resources spent devising and refining the ensembles and their inner-models. It has been shown in numerous papers that decision trees or KNN models can achieve accuracy scores of 90% [199], [200], so it can be speculated that by recreating these models inside a random forest or AdaBoost, prediction scores can reach very impressive numbers and give consumers a powerful security tool to fend off the increasing onslaught of intrusion-based attacks.

REFERENCES

[1] G. Seni and J. Elder, Ensemble Methods in Data Mining: Improving Accuracy Through Combining Predictions, 2010.
[2] J. Kittler, M. Hatef, R. P. W. Duin, and J. Matas, “On combining classifiers,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 20, no. 3, pp. 226–239, March 1998.
[3] L. Breiman, “Bagging predictors,” Machine Learning, vol. 24, no. 2, pp. 123–140, 1996.
[4] T. M. Khoshgoftaar, J. V. Hulse, and A. Napolitano, “Comparing boosting and bagging techniques with noisy and imbalanced data,” IEEE Transactions on Systems, Man, and Cybernetics - Part A: Systems and Humans, vol. 41, no. 3, pp. 552–568, May 2011.
[5] R. Polikar, “Ensemble based systems in decision making,” IEEE Circuits and Systems Magazine, vol. 6, no. 3, pp. 21–45, 2006.
[6] Y. Xin, L. Kong, Z. Liu, Y. Chen, Y. Li, H. Zhu, M. Gao, H. Hou, and C. Wang, “Machine learning and deep learning methods for cybersecurity,” IEEE Access, vol. 6, pp. 35365–35381, 2018.
[7] A. L. Buczak and E. Guven, “A survey of data mining and machine learning methods for cyber security intrusion detection,” IEEE Communications Surveys and Tutorials, vol. 18, no. 2, pp. 1153–1176, 2016.
[8] S. Dua and X. Du, Data Mining and Machine Learning in Cybersecurity. Auerbach Publications, 2016.
[9] G. F. Ciocarlie, U. Lindqvist, S. Nováczki, and H. Sanneck, “Detecting anomalies in cellular networks using an ensemble method,” in Proceedings of the 9th International Conference on Network and Service Management, Zurich, 2013, pp. 171–174.
[10] T. Day and N. D. Pham, “Data centers,” U.S. Chamber of Commerce Technology Engagement Center, 2017.
[11] B. Mukherjee, L. T. Heberlein, and K. N. Levitt, “Network intrusion detection,” IEEE Network, vol. 8, no. 3, pp. 26–41, May-June 1994.
[12] H. J. Liao, C. H. R. Lin, Y. C. Lin, and K. Y. Tung, “Intrusion detection system: A comprehensive review,” Journal of Network and Computer Applications, vol. 36, no. 1, pp. 16–24, May-June 2013.
[13] C. Tsai, Y. Hsu, C. Lin, and W. Lin, “Intrusion detection by machine learning: A review,” Expert Systems with Applications, vol. 36, no. 10, pp. 11994–12000, December 2009.
[14] M. Schwob, J. Zhan, and D. A., “Modeling cell communication with time-dependent signaling hypergraphs,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2019.
[15] S. Chobe and J. Zhan, “Advancing community detection using keyword attribute search,” Journal of Big Data, vol. 6, no. 83, 2019.
[16] C. Chiu and J. Zhan, “An evolutionary approach to compact dag neural network optimization,” IEEE Access, vol. 7, no. 1, pp. 178331–178341, 2019.
[17] ——, “Deep learning for link prediction in dynamic networks using weak estimators,” IEEE Access, vol. 6, no. 1, pp. 35937–35945, 2018.
[18] M. Bhaduri and J. Zhan, “Using empirical recurrences rates ratio for time series data similarity,” IEEE Access, vol. 6, no. 1, pp. 30855–30864, 2018.
[19] J. Wu, J. Zhan, and S. Chobe, “Mining association rules for low frequency itemsets,” PLOS ONE, vol. 13, no. 7, 2018.
[20] P. Ezatpoor, J. Zhan, J. Wu, and C. Chiu, “Finding top-k dominance on incomplete big data using mapreduce framework,” IEEE Access, vol. 6, no. 1, pp. 7872–7887, 2018.
[21] P. Chopade and J. Zhan, “Towards a framework for community detection in large networks using game-theoretic modeling,” IEEE Transactions on Big Data, vol. 5, no. 1, pp. 27354–27365, 2017.
[22] M. Bhaduri, J. Zhan, and C. Chiu, “A weak estimator for dynamic systems,” IEEE Access, vol. 5, no. 1, pp. 27354–27365, 2017.
[23] M. Bhaduri, J. Zhan, C. Chiu, and F. Zhan, “A novel online and non-parametric approach for drift detection in big data,” IEEE Access, vol. 5, no. 1, pp. 15883–15892, 2017.
[24] C. Chiu, J. Zhan, and F. Zhan, “Uncovering suspicious activity from partially paired and incomplete multimodal data,” IEEE Access, vol. 5, no. 1, pp. 13689–13698, 2017.
[25] R. Ahn and J. Zhan, “Using proxies for node immunization identification on large graphs,” IEEE Access, vol. 5, no. 1, pp. 13046–13053, 2017.
[26] J. Zhan and B. Dahal, “Using deep learning for short text understanding,” Journal of Big Data, vol. 4, no. 34, pp. 1–15, 2017.
[27] J. Zhan, S. Gurung, and S. P. K. Parsa, “Identification of top-k nodes in large networks using katz centrality,” Journal of Big Data, vol. 4, no. 16, 2017.
[28] J. Zhan, T. Rafalski, G. Stashkevich, and E. Verenich, “Vaccination allocation in large dynamic networks,” Journal of Big Data, vol. 4, no. 2, pp. 161–172, 2017.
[29] J. Zhan, V. Gudibande, and S. P. K. Parsa, “Identification of top-k influential communities in large networks,” Journal of Big Data, vol. 3, no. 16, 2016.
[30] H. Selim and J. Zhan, “Towards shortest path identification on large networks,” Journal of Big Data, vol. 3, no. 10, 2016.
[31] X. Fang and J. Zhan, “Sentiment analysis using product review data,” Journal of Big Data, vol. 2, no. 5, pp. 1–14, 2015.
[32] P. Chopade and J. Zhan, “Structural and functional analytics for community detection in large-scale complex networks,” Journal of Big Data, vol. 2, no. 1, pp. 1–28, 2015.
[33] J. Zhan and X. Fang, “A computational framework for detecting malicious actors in communities,” International Journal of Privacy, Security, and Integrity, vol. 2, no. 1, pp. 1–20, 2014.
[34] A. Doyal and J. Zhan, “Towards ddos defense and traceback,” International Journal of Privacy, Security, and Integrity, vol. 1, no. 4, pp. 299–311, 2013.
[35] J. Zhan, J. Oommen, and J. Crisostomo, “Anomaly detection in dynamic systems using weak estimator,” ACM Transactions on Internet Technology, vol. 11, no. 1, pp. 53–69, 2011.
[36] J. Zhan and X. Fang, “Social computing: The state of the art,” International Journal of Social Computing and Cyber-Physical Systems, vol. 1, no. 1, pp. 1–12, 2011.
[37] N. Mead, M. S., and J. Zhan, “Integrating privacy requirements considerations into a security requirements engineering method and tool,” International Journal of Information Privacy, Security and Integrity, vol. 1, no. 1, pp. 106–126, 2011.

[38] J. Zhan, “Granular computing in privacy-preserving data mining,” International Journal of Granular Computing, Rough Sets and Intelligent Systems, vol. 1, no. 3, pp. 272–288, 2010.
[39] J. Wang, J. Zhang, and J. Zhan, “Towards real-time performance of data privacy protection,” International Journal of Granular Computing, Rough Sets and Intelligent Systems, vol. 1, no. 4, pp. 329–342, 2010.
[40] J. Zhan, “Secure collaborative social networks,” IEEE Transactions on Systems, Man, and Cybernetics, Part C, vol. 40, no. 6, pp. 682–689, 2010.
[41] J. Zhan, H. C., I. Wang, T. Hsu, C. Liau, and W. D., “Privacy-preserving collaborative recommender systems,” IEEE Transactions on Systems, Man, and Cybernetics, Part C, vol. 40, no. 4, pp. 472–476, 2010.
[42] H. Park, J. Hong, J. Park, J. Zhan, and D. Lee, “Attribute-based access control using combined authentication technologies,” IEEE Transactions on Mobile Computing, vol. 9, no. 6, pp. 824–837, 2010.
[43] I. Wang, C. Shen, J. Zhan, T. Hsu, C. Liau, and D. Wang, “Empirical evaluations of secure scalar product,” IEEE Transactions on Systems, Man, and Cybernetics, Part C, vol. 39, no. 4, pp. 440–447, 2009.
[44] N. Mead, V. Viswanathan, and J. Zhan, “Incorporating security requirements engineering into standard lifecycle processes,” International Journal of Security and Its Applications, vol. 2, no. 4, pp. 67–80, 2008.
[45] J. Zhan, L. Chang, and S. Matwin, “Privacy-preserving multi-party decision tree induction,” International Journal of Business Intelligence and Data Mining, vol. 2, no. 2, pp. 197–212, 2007.
[46] ——, “Building k-nearest neighbor classifiers on vertically partitioned private data,” International Journal of Network Security, vol. 1, no. 1, pp. 46–51, 2005.
[47] J. Zhan and S. Matwin, “Privacy preserving support vector machine classification,” International Journal of Intelligent Information and Database Systems, vol. 1, no. 3/4, pp. 356–385, 2005.
[48] F. Zhan, A. Martinez, N. Rai, R. McConnell, M. Swan, M. Bhaduri, J. Zhan, L. Gewali, and P. Oh, “Beyond cumulative sum charting in non-stationarity detection and estimation,” IEEE Access, 2019.
[49] A. Hart, B. Smith, S. Smith, E. Sales, J. Hernandez-Camargo, Y. M. Garcia, F. Zhan, L. Griswold, B. Dunkelberger, M. R. Schwob, S. Chaudhry, J. Zhan, L. Gewali, and P. Oh, “Resolving intravoxel white matter structures in the human brain using regularized regression and clustering,” Journal of Big Data, vol. 6, 2019.
[50] F. Zhan, “Hand gesture recognition with convolution neural networks,” in Proceedings of IEEE 20th International Conference on Information Reuse and Integration for Data Science, Los Angeles, CA, USA, July 31-August 1 2019.
[51] ——, “How to optimize social network influence,” in Proceedings of the IEEE Second International Conference on Artificial Intelligence and Knowledge Engineering (AIKE), Cagliari, Sardinia, Italy, June 3-5 2019.
[52] E. Aguilar, J. Dancel, D. Mamaud, D. Pirosch, F. Tavacoli, F. Zhan, R. Pearce, M. Novack, H. Keehu, B. Lowe, J. Zhan, L. Gewali, and P. Oh, “Highly parallel seedless random number generation from arbitrary thread schedule reconstruction,” in IEEE International Conference on Big Knowledge, Beijing, China, November 10-11 2019.
[53] E. Hunt, R. Janamsetty, C. Kinares, C. Koh, A. Sanchez, F. Zhan, M. Ozdemir, S. Waseem, O. Yolcu, B. Dahal, J. Zhan, L. Gewali, and P. Oh, “Machine learning models for paraphrase identification and its applications on plagiarism detection,” in IEEE International Conference on Big Knowledge, Beijing, China, November 10-11 2019.
[54] M. Bhaduri, J. Zhan, C. Chiu, and F. Zhan, “A novel online and non-parametric approach for drift detection in big data,” IEEE Access, vol. 5, pp. 15883–15892, 2017.
[55] C. Chiu, J. Zhan, and F. Zhan, “Uncovering suspicious activity from partially paired and incomplete multimodal data,” IEEE Access, vol. 5, pp. 13689–13698, 2017.
[56] D. Pintado, V. Sanchez, E. Adarve, M. Mata, Z. Gogebakan, B. Cabuk, C. Chiu, J. Zhan, L. Gewali, and P. Oh, “Deep learning based shopping assistant for the visually impaired,” in IEEE International Conference on Consumer Electronics, Las Vegas, USA, January 2019.
[57] F. Fessahaye, L. Perez, T. Zhan, R. Zhang, C. Fossier, R. Markarian, C. Chiu, J. Zhan, L. Gewali, and P. Oh, “T-recsys: A novel music recommendation system using deep learning,” in IEEE International Conference on Consumer Electronics, Las Vegas, USA, January 2019.
[58] F. Zhan, G. Laines, S. Deniz, S. Paliskara, I. Ochoa, I. Guerra, M. Pirouz, C. Chiu, S. Tayeb, E. Ploutz, J. Zhan, L. Gewali, and P. Oh, “An efficient alternative to personalized page rank for friend recommendations,” in IEEE Consumer Communications and Networking Conference, Las Vegas, USA, January 2018.
[59] F. Zhan, G. Laines, S. Deniz, S. Paliskara, I. Ochoa, I. Guerra, S. Tayeb, C. Chiu, M. Pirouz, E. Ploutz, L. Gewali, J. Zhan, and P. Oh, “Prediction of online social networks users’ behaviors with a game theoretic approach,” in IEEE Consumer Communications and Networking Conference, Las Vegas, USA, January 2018.
[60] S. Tayeb, M. Pirouz, B. Cozzens, R. Huang, M. Jay, K. Khembunjong, S. Paliskara, F. Zhan, M. Zhang, J. Zhan, and S. Latifi, “Toward data quality analytics in signature verification using a convolutional neural network,” in IEEE International Conference on Big Data, Boston, USA, December 2017, pp. 2644–2651.
[61] P. Chopade and J. Zhan, “Large-scale big data networks analytics and community detection,” in IEEE International Symposium on Technologies for Homeland Security, Waltham, MA, USA, April 2017.
[62] ——, “Efficient detection of communities and prediction of abnormal events, situational awareness in large complex networks,” in IEEE International Symposium on Technologies for Homeland Security, Waltham, MA, USA, April 2017.
[63] P. Chopade, J. Zhan, and M. Bikdash, “Micro-community detection and vulnerability identification for large critical networks,” in IEEE International Symposium on Technologies for Homeland Security, Waltham, Massachusetts, USA, May 2016.
[64] ——, “Node attributes and edge structure for large-scale big data network analytics and community detection,” in IEEE International Symposium on Technologies for Homeland Security, Boston, USA, April 2015.
[65] B. Kaur, M. Blow, and J. Zhan, “A novel approach for authentication using steganography,” in International Conference on Cyber Security, Stanford University, CA, USA, May 2015.
[66] P. Chopade and J. Zhan, “A game theoretic modeling approach to community detection in large-scale networks,” in International Conference on Big Data, Stanford University, CA, USA, May 2015.
[67] P. Chopade, J. Zhan, and C. K., “Smart and effective large-scale system risk analysis,” in Society for Risk Analysis (SRA) Annual Meeting, Crystal Gateway Marriott, Arlington, Virginia, USA, December 2015.
[68] P. Chopade, J. Zhan, K. Roy, and K. Flurchick, “Real-time large-scale big data networks analytics and visualization architecture,” in The 12th International Conference on Emerging Technologies for a Smarter World, Stony Brook University, Long Island, New York, USA, October 2015.
[69] Y. R. Bachupally and J. Zhan, “Towards a recommender system for databridge,” in The Fourth ASE International Conference on Big Data, Harvard University, Cambridge, MA, USA, December 2014.
[70] B. Kaur, M. Blow, and J. Zhan, “Digital image authentication in social media,” in The Sixth ASE International Conference on Privacy, Security, Risk, and Trust, Harvard University, Cambridge, MA, USA, December 2014.
[71] A. Albu-Shamah and J. Zhan, “Discovering hidden networks based on twitter texts,” in The Third ASE International Conference on Social Informatics, Harvard University, Cambridge, MA, USA, December 2014.
[72] H. Selim, P. Chopade, and J. Zhan, “Node degree and edge clustering correlation for community detection in big data and large-scale networks,” in The Fourth ASE International Conference on Big Data, Harvard University, Cambridge, MA, USA, December 2014.
[73] B. Meyer and J. Zhan, “Drowning in opinions: Extracting the pearls,” in The Third ASE International Conference on Social Informatics, Harvard University, Cambridge, MA, USA, December 2014.
[74] P. Chopade, J. Zhan, and M. Bikdash, “Supervized community detection for big data and large-scale complex networks,” in The Fourth ASE International Conference on Big Data, Harvard University, Cambridge, MA, USA, December 2014.
[75] P. Chopade, K. Flurchick, J. Zhan, and M. Bikdash, “Visualization techniques for large-scale big data networks: Smart power grid survivability in a complex operating environment,” in The Fourth ASE International Conference on Big Data, Harvard University, Cambridge, MA, USA, December 2014.
[76] Y.-T. Chiang, T.-S. Hsu, C.-J. Liau, Y.-C. Liu, C.-H. Shen, D.-W. Wang, and J. Zhan, “An information-theoretic approach for secure protocol composition,” in The Tenth International Conference on Security and Privacy in Communication Networks, Beijing, China, September 2014.
[77] Y. Lu, X. Fang, and J. Zhan, “Data readiness level for unstructured data with a focus on unindexed data,” in The Third ASE International Conference on Big Data, Tsinghua University, Beijing, China, August 2014.

[78] ——, “Towards data readiness level for structured data,” in The Second ASE International Conference on Big Data, Stanford University, Stanford, CA, USA, May 2014.
[79] X. Fang and J. Zhan, “A novel approach for evaluating semantic similarity measures,” in The Sixth ASE International Conference on Social Computing, Stanford University, Stanford, CA, USA, May 2014.
[80] P. Chopade, M. Bikdash, and J. Zhan, “Grid reliability: Need of developing self-healing and self-mitigating smart power grid,” in The Second International Symposium on Energy Challenges and Mechanics (ECM2), Aberdeen, Scotland, UK, August 2014.
[81] H. Selim, P. Chopade, and J. Zhan, “Statistical modeling and scalable, interactive visualization of large scale/big data networks,” in ASE International Workshop on Big Data Analytics for Predictive Organization and Big Transformations, Stanford University, Stanford, CA, USA, May 2014.
[82] P. Chopade and J. Zhan, “Community detection in large scale big data networks,” in The Second ASE International Conference on Big Data, Stanford University, Stanford, CA, USA, May 2014.
[83] B. Kaur, M. Blow, and J. Zhan, “Authenticity of images in social media,” in NSF Workshop on Big Data Security and Privacy, University of Texas at Dallas, TX, USA, September 2014.
[84] B. Kaur and J. Zhan, “Malware detection using the weak estimator,” in The Third ASE International Conference on Cyber Security, Stanford University, Stanford, CA, USA, May 2014.
[85] H. Selim, P. Chopade, and J. Zhan, “Structural analysis and interactive visualization of large scale big data networks,” in The IEEE 11th International Conference and Expo on Emerging Technologies for a Smarter World, Stony Brook University, New York, USA, October 2014.
[86] ——, “Interactive visualization of large scale big data networks,” in The Second ASE International Conference on Big Data Science and Computing, Stanford University, Stanford, CA, USA, May 2014.
[87] B. Meyer and J. Zhan, “A look inside the nodes cohesive attribute micro-clustering,” in The Second ASE International Conference on Big Data Science and Computing, Stanford University, Stanford, CA, USA, 2014.
[88] A. Albu-Shamah and J. Zhan, “Obesity in online social networks,” in The Sixth ASE International Conference on Social Computing, Stanford University, Stanford, CA, USA, May 2014.
[89] C. Passmore and J. Zhan, “Determining appropriate staffing adjustments in a call center staff group,” in 2013 ASE/IEEE International Conference on Big Data, Washington D.C., USA, September 2013.
[90] A. Rajasekar et al., “Sociometric methods for relevancy analysis of long tail science data,” in 2013 ASE/IEEE International Conference on Big Data, Washington D.C., USA, September 2013.
[91] X. Fang, J. Zhan, and N. Koceja, “Towards network reduction on big data,” in 2013 ASE/IEEE International Conference on Big Data, Washington D.C., USA, September 2013.
[92] A. Albu-Shamah and J. Zhan, “Towards obesity causes, prevalence and prevention,” in 2013 ASE/IEEE International Conference on Biomedical Computing, Washington D.C., USA, September 2013.
[93] C. Barron, H. Yu, and J. Zhan, “Cloud computing security case studies and research,” in 2013 International Conference of Parallel and Distributed Computing, London, UK, July 2013.
[94] J. Lake, X. Yuan, and J. Zhan, “Towards authentication vulnerabilities in openemr,” in The 2013 Symposium on Computing at Minority Institutions, Virginia Beach, VA, USA, April 2013.
[95] H. Yu, J. Rann, and J. Zhan, “Such: A cloud computing management tool,” in Proceedings of the 5th IFIP International Conference on New Technologies, Mobility and Security. Brisbane, Australia: IEEE CS Press, May 2012.
[96] X. Fang, N. Koceja, J. Zhan, G. Dozier, and D. Dasgupta, “An artificial immune system for phishing detection,” in IEEE Congress on Evolutionary Computation. Brisbane, Australia: IEEE Press, June 2012.
[97] M. Kanampiu, J. Zhan, and J. Baek, “A secure group collaboration protocol for nonverbal human social signals featuring deception detection,” in 2012 International Conference on Privacy, Security, Risk, and Trust. Amsterdam, Netherlands: IEEE CS Press, September 2012.
[98] X. Fang and J. Zhan, “Task-oriented social ego network generation via dynamic collaborator selection,” in 2012 International Conference on Social Computing. Amsterdam, Netherlands: IEEE CS Press, September 2012.
[99] A. Albu-Shamah and J. Zhan, “Towards an optimizing model for older people at risk of falls,” in 2012 International Conference on Biomedical Computing. Washington D.C., USA: IEEE CS Press, December 2012.
[100] A. Doyal, J. Zhan, and A. Yu, “Towards defending ddos attacks,” in 2012 International Conference on Cyber Security. Washington D.C., USA: IEEE CS Press, December 2012.
[101] M. Kanampiu and J. Zhan, “A dynamic covert passive actors detection scheme for a healthy networked community,” in 2012 International Conference on Cyber Security. Washington D.C., USA: IEEE CS Press, December 2012.
[102] X. Fang and J. Zhan, “A computational framework for detecting malicious actors in communities,” in 2012 International Conference on Social Informatics. Washington D.C., USA: IEEE CS Press, December 2012.
[103] J. Zhan and X. Fang, “Location privacy protection on social networks,” in Proceedings of International Conference on Social Computing, Behavioral-Cultural Modeling and Prediction, Maryland, USA, March 2011.
[104] ——, “Trust maximization in social networks,” in Proceedings of International Conference on Social Computing, Behavioral-Cultural Modeling and Prediction, Maryland, USA, March 2011.
[105] ——, “Authentication using multi-level social networks,” in International Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management, Springer Series on Communications in Computer and Information Science, vol. 128, 2011.
[106] J. Zhan, X. Fang, and P. Killion, “Protecting location data on social networks,” in IEEE Symposium Series on Computational Intelligence. Paris, France: IEEE, April 2011.
[107] J. Zhan, X. Fang, and N. Bandaru, “Protecting location data on social networks,” in IEEE Symposium Series on Computational Intelligence. Paris, France: IEEE, April 2011.
[108] J. Zhan et al., “Using gaming strategies for attacker and defender in recommender systems,” in IEEE Symposium Series on Computational Intelligence. Paris, France: IEEE, April 2011.
[109] J. Zhan and L. Thomas, “Phishing detection using stochastic learning-based weak estimators,” in IEEE Symposium Series on Computational Intelligence. Paris, France: IEEE, April 2011.
[110] J. Zhan, J. Oommen, and J. Crisostomo, “System call anomaly detection,” in IEEE International Conference on Intelligence and Security Informatics. Beijing, China: IEEE, July 2011.
[111] X. Fang, J. Zhan, N. Koceja, K. Williams, and J. Brewton, “Integrating online social networks for enhancing reputation systems of e-commerce,” in IEEE International Conference on Intelligence and Security Informatics. Beijing, China: IEEE, July 2011.
[112] J. Zhan, N. Jones, and M. Purnell, “Top-k algorithm for recommendation in social networking kingdoms,” in IEEE International Conference on Social Computing. MIT, Boston, USA: IEEE CS Press, October 2011.
[113] J. Zhan and X. Fang, “Anomaly detection in social-economic computing,” in IEEE International Conference on Social Computing. MIT, Boston, USA: IEEE CS Press, October 2011.
[114] ——, “A novel trust computing system for social networks,” in IEEE International Conference on Privacy, Security, Risk and Trust. MIT, Boston, USA: IEEE CS Press, October 2011.
[115] J. Zhan, L. Cabrera, G. Osman, and R. Shah, “Using private matching for securely querying genomic sequences,” in IEEE International Conference on Privacy, Security, Risk and Trust. MIT, Boston, USA: IEEE CS Press, October 2011.
[116] T. Allan and J. Zhan, “Towards fraud detection methodologies,” in Proceedings of the International Symposium on Financial Security. Busan, Korea: IEEE CS Press, May 2010.
[117] C. Knopik and J. Zhan, “The effects of financial crises on american financial institutions information security,” in Proceedings of the International Symposium on Financial Security. Busan, Korea: IEEE CS Press, May 2010.
[118] J. Henkel and J. Zhan, “Remote deposit capture in the consumer’s hands,” in Proceedings of the International Symposium on Financial Security. Busan, Korea: IEEE CS Press, May 2010.
[119] D. Erhardt and J. Zhan, “A proactive approach to detecting and reducing information security threats in billing systems,” in Proceedings of the International Symposium on Financial Security. Busan, Korea: IEEE CS Press, May 2010.

[120] K. Beck and J. Zhan, “Phishing in finance,” in Proceedings of the International Symposium on Financial Security. Busan, Korea: IEEE CS Press, May 2010.
[121] X. Fang and J. Zhan, “Online banking authentication using mobile phones,” in Proceedings of the International Symposium on Financial Security. Busan, Korea: IEEE CS Press, May 2010.
[122] Y. Duan, J. Canny, and J. Zhan, “Practical large-scale privacy-preserving distributed computation robust against malicious users,” in Proceedings of 19th USENIX Security Symposium, Washington, D.C., USA, August 2010.
[123] T. Yu, D. Lee, and J. Zhan, “Multi-party k-means clustering with privacy consideration,” in Proceedings of IEEE International Symposium on Parallel and Distributed Processing with Applications, Taipei, Taiwan, September 2010.
[124] K. Beck and J. Zhan, “Phishing using a modified bayesian technique,” in Proceedings of International Symposium on Social Computing Applications. Minneapolis, USA: IEEE CS Press, August 2010, pp. 1–5.
[125] J. Zhan and X. Fang, “A computational trust framework for social computing,” in Proceedings of IEEE International Conference on Social Computing. Minneapolis, USA: IEEE CS Press, August 2010, pp. 1–5.
[126] C. Su, J. Zhan, and K. Sakurai, “Importance of data standardization in privacy-preserving k-means clustering,” in Proceedings of the 2009 Symposium on Cryptography and Information Security, Otsu, Japan, January 2009, pp. 1–5.
[127] B. Wasser and J. Zhan, “Practical values for privacy,” in Proceedings of the International Symposium on Privacy and Security Applications. Vancouver, Canada: IEEE CS Press, August 2009.
[128] J. Zhan, J. Oommen, and J. Crisostomo, “Anomaly detection in dynamic social systems,” in Proceedings of the IEEE International Conference on Social Computing. Vancouver, Canada: IEEE CS Press, August 2009, pp. 18–25.
[129] J. Zhan and V. Rajamani, “The economics of privacy: People, policy and technology,” in Proceedings of the International Conference on Information Security and Assurance. IEEE CS Press, April 2008, pp. 579–584.
[130] I. Wang, C. Shen, T. Hsu, C. Liau, D. Wang, and J. Zhan, “Towards empirical aspect of secure scalar product protocol,” in Proceedings of the International Conference on Information Security and Assurance. IEEE CS Press, April 2008, pp. 573–578.
[131] N. Mead, V. Viswanathan, and J. Zhan, “Incorporating security requirements engineering into the rational unified process,” in Proceedings of the International Conference on Information Security and Assurance. IEEE CS Press, April 2008, pp. 537–542.
[132] X. Xu and J. Zhan, “Security applications in dynamics evolution systems,” in Proceedings of the International Conference on Information Security and Assurance. IEEE CS Press, April 2008, pp. 567–572.
[133] C. Hsieh, J. Zhan, D. Zeng, and F. Wang, “Privacy-preserving in joining recommender systems,” in Proceedings of the International Conference on Information Security and Assurance. IEEE CS Press, April 2008, pp. 561–566.
[134] K. Prakobphol and J. Zhan, “A novel outlier detection scheme for network intrusion detection systems,” in Proceedings of the International Conference on Information Security and Assurance. IEEE CS Press, April 2008, pp. 555–560.
[135] H. Park and J. Zhan, “Privacy-preserving sql queries,” in Proceedings of the International Conference on Information Security and Assurance. IEEE CS Press, April 2008, pp. 549–554.
[136] G. Blosser and J. Zhan, “Privacy-preserving collaborative social networks,” in Proceedings of the International Conference on Information Security and Assurance. IEEE CS Press, April 2008, pp. 543–548.
[137] J. Zhan, G. Blosser, C. Yang, and L. Singh, “Privacy-preserving collaborative social networks,” in Lecture Notes in Computer Science, vol. 5075. Springer, 2008, pp. 114–125.
[138] G. Blosser and J. Zhan, “Privacy-preserving collaborative e-voting,” in International Conference on Intelligence and Security Informatics, vol. 5075. Springer, 2008, pp. 508–513.
[139] C. Hsieh and J. Zhan, “Privacy-preserving collaborative recommender systems for promoting sales in e-commerce,” in the Annual RAKUTEN Conference, Japan, 2008.
[140] X. Xu, J. Zhan, and H. Zhu, “Using social networks to organize researcher community,” in International Conference on Intelligence and Security Informatics, vol. 5075. Springer, 2008, pp. 421–427.
[141] H. Park, J. Zhan, and D. Lee, “Privacy-aware access control through negotiation in daily life service,” in International Conference on Intelligence and Security, vol. 5075. Springer, 2008, pp. 514–519.
[142] H. Park, D. Lee, J. Zhan, and G. Blosser, “Efficient keyword index search over encrypted data of groups,” in Proceedings of the IEEE International Conference on Intelligence and Security Informatics, June 2008, pp. 225–229.
[143] C. Hsu, J. Zhan, W. Fang, and J. Ma, “Towards improving qos-guided scheduling in grids,” in Proceedings of the 3rd ChinaGrid Annual Conference, Dunhuang, China, August 2008, pp. 89–95.
[144] J. Zhan and T. Lin, “Granular computing in privacy-preserving data mining,” in Proceedings of the IEEE International Conference on Granular Computing, Hangzhou, China, August 2008, pp. 86–92.
[145] J. Zhan, I. Cheng, C. Hsieh, T. Hsu, C. Liau, and D. Wang, “Towards efficient privacy-preserving collaborative recommender systems,” in Proceedings of the IEEE International Conference on Granular Computing, Hangzhou, China, August 2008, pp. 778–783.
[146] H. Park, D. Lee, and J. Zhan, “Attribute-based access control using combined authentication technologies,” in Proceedings of the IEEE International Conference on Granular Computing, Hangzhou, China, August 2008, pp. 518–523.
[147] J. Wang, J. Zhan, and J. Zhang, “Towards real-time performance of data value hiding for frequent data updates,” in Proceedings of the IEEE International Conference on Granular Computing, Hangzhou, China, August 2008, pp. 606–611.
[148] C. Shen, J. Zhan, T. Hsu, C. Liau, and D. Wang, “Scalar product-based secure two party computation,” in Proceedings of the IEEE International Conference on Granular Computing, Hangzhou, China, August 2008, pp. 600–605.
[149] S. Miyazaki, N. Mead, and J. Zhan, “Computer-aided privacy requirements elicitation techniques,” in Proceedings of the IEEE Asia Pacific International Conference on Services Computing, Yilan, Taiwan, December 2008, pp. 367–372.
[150] Y. Duan, J. Canny, and J. Zhan, “Efficient privacy-preserving association rule mining: P4p style,” in Proceedings of the IEEE Symposium on Computational Intelligence and Data Mining, Honolulu, Hawaii, USA, April 2007, pp. 654–660.
[151] J. Zhan, “Privacy preserving data mining in digital age,” in the Pacific Asia Workshop on Intelligence and Security Informatics, Chengdu, China, April 2007.
[152] ——, “Quantifying privacy for privacy preserving data mining,” in Proceedings of the IEEE Symposium on Computational Intelligence and Data Mining, Honolulu, Hawaii, USA, April 2007, pp. 630–636.
[153] ——, “Using homomorphic encryption and digital envelope techniques for privacy preserving collaborative sequential pattern mining,” in Proceedings of the IEEE International Conference on Intelligence and Security Informatics, New Jersey, USA, May 2007, pp. 331–334.
[154] ——, “Privacy preserving collaborative data mining,” in Proceedings of the IEEE International Conference on Intelligence and Security Informatics, New Jersey, USA, May 2007, pp. 208–218.
[155] ——, “Privacy preserving decision tree classification in horizontal collaboration,” in Proceedings of the International Conference on Security of Information and Networks, North Cyprus, May 2007, pp. 53–58.
[156] C. Purcell and J. Zhan, “Adapting us privacy laws to the internet: Is patching enough?” in Proceedings of the International Conference on Machine Learning and Cybernetics, Hong Kong, August 2007, pp. 3000–3005.
[157] G. Blosser and J. Zhan, “Maintaining k-anonymity on real-time data,” in Proceedings of the International Conference on Machine Learning and Cybernetics, Hong Kong, August 2007, pp. 3012–3015.
[158] R. Wei and J. Zhan, “A proposal for customers’ privacy preference policy,” in Proceedings of the International Conference on Machine Learning and Cybernetics, Hong Kong, August 2007, pp. 3022–3027.
[159] C. Shen, J. Zhan, D. Wang, T. Hsu, and C. Liau, “Information theoretically secure number product protocol,” in Proceedings of the International Conference on Machine Learning and Cybernetics, Hong Kong, August 2007, pp. 3006–3011.
[160] J. Zhan, “Privacy-preserving k-medoids clustering,” in Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics, Montreal, Canada, October 2007, pp. 3323–3326.
[161] ——, “Privacy-preserving k-medoids clustering,” in Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics, Montreal, Canada, October 2007, pp. 3600–3603.

[162] R. Wei and J. Zhan, “Improved privacy preference policy,” in Proceedings of the IEEE International Conference on Granular Computing, Silicon Valley, USA, November 2007, pp. 787–787.
[163] W. Zhang and J. Zhan, “A scientific decoding of yinyang 1-2-4-8-64 for equilibrium-based granular computing,” in Proceedings of the IEEE International Conference on Granular Computing, Silicon Valley, USA, November 2007, pp. 374–380.
[164] K. Shin and J. Zhan, “A verification schemes for data aggregation in data mining,” in Proceedings of the IEEE International Conference on Granular Computing, Silicon Valley, USA, November 2007, pp. 374–380.
[165] L. Singh and J. Zhan, “Measuring topological anonymity in social networks,” in Proceedings of the IEEE International Conference on Granular Computing, Silicon Valley, USA, November 2007, pp. 770–774.
[166] H. Park, B. Kim, D. Lee, Y. Chung, and J. Zhan, “Secure similarity search,” in Proceedings of the IEEE International Conference on Granular Computing, Silicon Valley, USA, November 2007, pp. 598–598.
[167] J. Zhan, “The economic aspects of privacy preserving data mining (poster),” in the IEEE Symposium on Security and Privacy, Oakland, California, USA, May 2006, pp. 5–14.
[168] ——, “Privacy-preserving collaborative sequential pattern mining,” in Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Workshop on Theory and Practice of Temporal Data Mining, Philadelphia, USA, August 2006, pp. 5–14.
[169] ——, “Privately mining applications in bioinformatics,” in the Seventh International Conference on Systems Biology, Yokohama, Japan, October 2006, pp. 4102–4105.
[170] J. Zhan and S. Matwin, “Privacy-oriented learning systems,” in Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics, Taipei, Taiwan, October 2006, pp. 4102–4105.
[171] ——, “A crypto-approach to privacy-preserving data mining,” in Proceedings of the IEEE International Conference on Data Mining Workshop on Privacy Aspect of Data Mining, Hong Kong, December 2006, pp. 546–550.
[172] J. Zhan, L. Chang, and S. Matwin, “Collaborative association rule mining by sharing private data,” in Proceedings of the Montreal Conference On E-Technologies, Montreal, Canada, January 2005, pp. 193–197.
[173] J. Zhan, S. Matwin, and L. Chang, “Private mining of association rules,” in Proceedings of the IEEE International Conference on Intelligence and Security Informatics, Atlanta, Georgia, May 2005, pp. 72–80.
[174] J. Zhan, “Research directions in data mining and privacy,” in International Conference on Services Systems and Services Management, Chongqing, China, June 2005, pp. 49–58.
[175] J. Zhan, L. Chang, and S. Matwin, “How to construct support vector machines without breaching privacy,” in Proceedings of Artificial Intelligence Studies: VII International Conference on Artificial Intelligence AI-20’2005, Poland, June 2005, pp. 49–58.
[176] J. Zhan, S. Matwin, and L. Chang, “Privacy-preserving clustering over horizontally partitioned data,” in Proceedings of Artificial Intelligence Studies: VII International Conference on Artificial Intelligence AI-20’2005, Poland, June 2005, pp. 39–48.
[177] J. Zhan, L. Chang, and S. Matwin, “Building k-nearest neighbor classification on vertically partitioned private data,” in Proceedings of the IEEE International Conference on Granular Computing, Beijing, China, July 2005, pp. 708–711.
[178] J. Zhan, S. Matwin, and L. Chang, “Privacy-preserving decision tree classification over vertically partitioned data,” in Proceedings of the IEEE International Conference on Data Mining Workshop on Multi-Agent Data Warehousing and Multi-Agent Data Mining, Houston, Texas, USA, November 2005, pp. 121–129.
[179] ——, “Multi-party sequential pattern mining over private data,” in Proceedings of the IEEE International Conference on Data Mining Workshop on Multi-Agent Data Warehousing and Multi-Agent Data Mining, Houston, Texas, USA, November 2005, pp. 112–120.
[181] ——, “How to prevent private data from being disclosed to a malicious attacker,” in Proceedings of the IEEE International Conference on Data Mining Workshop on Foundations of Semantic Oriented Data and Web Mining, Houston, Texas, USA, November 2005, pp. 41–46.
[182] J. Zhan and S. Matwin, “Privacy and security issues in medical informatics,” in the Electronic Health Information and Privacy Conference, Ottawa, Canada, November 2005.
[183] J. Zhan, S. Matwin, and L. Chang, “Privacy-preserving naïve bayesian classification over horizontally partitioned data,” in Proceedings of the International Conference on Electronic Business, Hong Kong, December 2005.
[184] J. Zhan, L. Chang, and S. Matwin, “Privacy-preserving sequential pattern mining over vertically partitioned data,” in Proceedings of the International Conference on Electronic Business, Hong Kong, December 2005.
[185] J. Zhan, M. S., and L. Chang, “Privacy-preserving decision tree classification over horizontally partitioned data,” in Proceedings of the International Conference on Electronic Business, Hong Kong, December 2005.
[186] J. Zhan, L. Chang, and S. Matwin, “Privacy-preserving support vector machine learning,” in Proceedings of the International Conference on Electronic Business, Hong Kong, December 2005.
[187] J. Zhan, S. Matwin, and L. Chang, “Privacy-preserving association rule mining,” in Proceedings of the 2005 Annual IFIP WG 11.3 Working Conference on Data and Applications Security, Storrs, Connecticut, USA, August 2005, pp. 153–165.
[188] J. Zhan, L. Chang, and S. Matwin, “Privacy-preserving naive bayesian classification,” in Proceedings of the IASTED International Conference on Artificial Intelligence and Applications, Innsbruck, Austria, February 2004, pp. 141–155.
[189] ——, “Privacy-preserving collaborative sequential pattern mining,” in Proceedings of the SIAM International Conference on Data Mining, Workshop on Link Analysis, Counter-terrorism, and Privacy, Lake Buena Vista, Florida, April 2004, pp. 61–72.
[190] J. Zhan and L. Chang, “Privacy-preserving collaborative sequential pattern mining with horizontally partitioned datasets,” in Proceedings of the International Conference on Data Privacy and Security in A Global Society, Skiathos, Greece, May 2004, pp. 242–252.
[191] J. Zhan, L. Chang, and S. Matwin, “Privacy-preserving multi-party decision tree classification,” in Proceedings of the 2004 Annual IFIP WG 11.3 Working Conference on Data and Application Security, Sitges, Catalonia, Spain, July 2004, pp. 341–355.
[192] ——, “Collaborative data mining and privacy protection,” in Foundation and Novel Approach in Data Mining, Edited by T.Y. Lin, S. Ohsuga, C.J. Liau, and X. Hu. Springer-Verlag, August 2004, pp. 213–227.
[193] J. Zhan and S. Matwin, “Privacy-preserving electronic surveys,” in Proceedings of the International Conference on Electronic Business, Beijing, China, December 2004, pp. 1179–1185.
[194] J. Zhan, S. Matwin, N. Japkowicz, and L. Chang, “Association rule mining and privacy protection,” in Proceedings of the International Conference on Electronic Business, Beijing, China, December 2004, pp. 1172–1178.
[195] L. Breiman, “Random forests,” Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.
[196] Y. Freund and R. E. Schapire, “Experiments with a new boosting algorithm,” ICML, vol. 96, 1996.
[197] D. Wolpert, “Stacked generalization,” Neural Networks, vol. 5, no. 2, pp. 241–259, 1992.
[198] M. Tavallaee, E. Bagheri, W. Lu, and A. A. Ghorbani, “A detailed analysis of the kdd cup 99 data set,” IEEE Symposium on Computational Intelligence for Security and Defense Applications, vol. 5, no. 2, pp. 1–6, 2009.
[199] C. Azad and V. K. Jha, “Genetic algorithm to solve the problem of small disjunct in the decision tree based intrusion detection system,” Int. J. Comput. Netw. Inf. Secur., vol. 7, no. 8, pp. 56–71, 2015.
[200] S. Vishwakarma, V. Sharma, and A. Tiwari, “An intrusion detection system using knn-aco algorithm,” Int. J. Comput. Appl., vol. 171, no. 10, pp. 18–23, 2017.
[180] J. Zhan, L. Chang, and S. Matwin, “Privacy-preserving naı̈ve bayesian
classification over vertically partitioned data,” in Proceedings of the
IEEE International Conference on Data Mining Workshop on Foun-
dations of Semantic Oriented Data and Web Mining, Houston, Texas,
USA, November 2005, pp. 47–53.
