
2017 IEEE International Parallel and Distributed Processing Symposium Workshops

LSTM-based Memory Profiling for Predicting Data Attacks in Distributed Big Data Systems

Santosh Aditham¹, Nagarajan Ranganathan² and Srinivas Katkoori¹

Abstract— With the increase in sensitivity of data, big data clusters are a primary target for attackers. Most attack prevention and detection methods concentrate on network-level analysis. In this paper, we propose a run-time profiling method that predicts the memory usage of datanodes and can help in taking necessary actions in advance when there is a data attack on the big data cluster. Our proposed method relies on the replication property of big data, models memory mappings as four-dimensional time series data, and applies LSTM recurrent neural networks to predict memory usage. There are two steps in our proposed method: (1) local analysis, where an LSTM (long short-term memory) cell at a datanode predicts its memory usage, and (2) consensus among replica datanodes regarding any abnormal behavior. We present the results of our proposed method by testing it on Hadoop MapReduce examples and show that it can successfully predict the memory usage of a datanode with a prediction error in the range 8-20 for inputs in the range 150-8000.
I. INTRODUCTION

Fig. 1: Distribution of Malicious Programs in 2015 [1]

The big data universe is growing at a rapid pace, with billions of dollars in market capital every year. Big data platforms such as Hadoop [2] and Spark [3] are being widely adopted by both academia and industry. At the same time, we have seen an unprecedented rise in cyber attacks all over the world, as shown in Figure 1. Institutions ranging from big tech companies such as LinkedIn and Yahoo to small businesses have all been targeted by attackers. Security in big data platforms is realized by incorporating traditional security measures such as user authentication, access control lists, activity logging and data encryption [2]. Blind trust in the providers of big data platforms has become the norm if the end user wants to store their data in the cloud, because such trust is built on an underlying assumption that the platforms and their security methods will never be compromised. As a result, the providers have de facto control over user privacy at all times. But unexpected issues such as insider attacks or control-flow attacks due to programmer errors can happen in any system at any time. In the past, we proposed various techniques for detecting such attacks in big data platforms [4]. As such, intrusion detection is a research area that is constantly evolving [5], [6], [7]. But for this work, we are interested in developing a novel technique to predict the possibility of an attack. The motivation behind predicting attacks is to limit the impact of an attack and to reduce the turnaround time of detecting attacks. Predicting attacks involves analysis of behaviors and symptoms such as deliberate markers, meaningful errors, preparatory behaviors, correlated usage patterns, verbal behavior and personality traits [8]. From these sets of indicators, clues can be pieced together to predict and/or detect an attack.

Deep learning is a branch of machine learning based on algorithms that attempt to model high-level abstractions in data. Deep learning based neural networks prove to be efficient tools in predictive analytics because they can model data that contains non-linear characteristics. Hence, using deep neural networks to predict attacks in big data systems is a promising direction for security researchers. Long short-term memory (LSTM) is a popular recurrent neural network that has been proven to work very well for time series and other simple sequence predictions. It is widely used across domains such as image & video analysis [9], natural language processing [10], stock markets and anomaly detection [11]. Typically, LSTM networks are built to run on big data clusters, but in this work we will be using LSTM networks to predict attacks within a big data cluster.

The rest of this paper is organized as follows: Section II presents background on the related fields, such as big data security and neural networks, and includes a brief review of past works that are closely related to ours. Section III explains the proposed method in detail. Section IV reports the experimental results and an analysis of those results. Finally, Section V concludes the paper and briefly discusses future work.

¹ Santosh Aditham and Srinivas Katkoori are with the Department of Computer Science & Engineering, University of South Florida, Tampa, FL, USA. saditham, katkoori at mail dot usf dot edu
² Nagarajan Ranganathan is Professor Emeritus with the Department of Computer Science & Engineering, University of South Florida, Tampa, FL, USA. rangatampa at gmail dot com

II. BACKGROUND AND RELATED WORK

The three areas of security research that are closely related to this work are: (a) security in big data, (b) observing patterns in memory accesses and finding anomalies in process behavior, and (c) deep learning and recurrent neural networks for time series prediction.

A. Security in Big Data

Security in big data is gaining tremendous momentum in both research and industry. But big data security is overwhelmingly inclined towards leveraging big data's potential in providing security for other systems. Security within big data systems is still a budding phenomenon [4]. It is ideal to include security as a major component in the holistic view of big data systems. But the requirements of big data applications, such as real-time data processing, fault tolerance and continuous availability, give little scope to employ complex and robust security mechanisms. All existing security techniques implemented within big data frameworks are software based and try to prevent external entities from attacking the system [4]. For example, the main requirements in Hadoop security design focus only on access control [12]. Big data systems encourage software-based fine-grained security mechanisms such as Kerberos [13], access control lists (ACL), log monitoring, etc. Big data security is inclined towards specifying access rights at multiple levels: user level, application level and data level. The advantages of having such simple software-oriented security mechanisms, such as Kerberos, are better performance and simple management. But there are various problems with such policy-enforcing security software, as identified in [14] and [15]. Also, none of these approaches can strongly counter insider attacks.

B. Memory Access Patterns

The need for tools that can diagnose complex, distributed systems is high because the root cause of a problem can be associated with multiple events/components of the system. Recent works in the distributed tracing domain [16], [17], [18], [19], [20], [21], [22], [23] concentrate on providing tracers as an independent service. While some of them focus on reconstructing what happened, others are purely auditing tools. We believe that understanding the memory access patterns of big data applications will help in profiling them from their data usage perspective and in providing task-centric causality, which is important for attack prediction. Patterns in bandwidth usage, read/write ratio, or temporal and spatial locality can be used when observing the memory accesses of a process. For example, Wei et al. [24] observed that memory access patterns of big data workloads are similar to traditional parallel workloads in many ways but tend to have weak temporal and spatial locality. One of the first works in characterizing the memory behavior of big data workloads was done by Dimitrov et al. [25], who observed characteristics such as memory footprint, CPI and bandwidth of big data workloads to understand the impact of optimization techniques such as pre-fetching and caching. Xu et al. [26] used rules and statistics to analyze, predict and optimize memory usage in MapReduce jobs. In distributed compute systems, the nodes of the cluster are typically virtual machines. A Hadoop datanode is a process which dynamically dispatches tasks every time a job is scheduled. So, profiling the sizes of the resident set and the private and shared memory mappings of all threads will give the memory access pattern of the datanode.

C. Deep Learning and LSTM

Neural networks are efficient tools to build predictive models that can learn regularities in simple sequences [27], [28]. A neural network is usually a feed-forward network built using perceptrons that take multiple inputs and use an activation function, weights and biases to compute a single output. It learns to adjust the weights and biases over time using an optimization procedure such as gradient descent. Back propagation [29] is the most popular method for gradient calculation; it gives an insight into how quickly the cost changes when the weights and biases are changed in a neural network. Feed-forward neural networks are the most commonly used form of neural networks, and they are used to solve classification problems. In recent times, a modification of neural networks called recurrent neural networks (RNN) has been gaining popularity because it uses a chain-like structure and can connect the perceptrons in a directed cycle, creating internal memory that stores parts of the input sequence and intermediate results [30]. But back propagation in such networks takes an extremely long time. Also, standard RNNs have to deal with the vanishing or exploding gradient problem, where they eventually fail to learn in the presence of time lags between relevant input events and target signals [31]. To solve this problem, a gradient method called Long Short-Term Memory (LSTM) was introduced that helps preserve the error as it is backpropagated through time and layers [32].

LSTM networks work by using cells that are made up of a set of gates: forget gate (FG), input gate (IG), cell gate (CG) and output gate (OG). These cells and their gates help in retaining, adding or deleting information from one pass to another. A vanilla LSTM implementation uses all gates and relies on activation functions such as sigmoid (σ) and tanh for proportional selection of data. While the sigmoid (σ) function is used to open/close a gate in the LSTM cell, the tanh function is applied to cell state and output selection, where the result can be positive or negative. LSTM also uses stochastic optimization functions such as Adam [33], RMSprop [34] or stochastic gradient descent for measuring the gradient in the output. Since there is no conclusive evidence for which optimization function works best for LSTM, we picked Adam for this work. Figure 2 and Equations 1 give a basic overview of how an LSTM cell works. Here w_{fg}, w_{ig}, w_{og}, w_{cg} are weights and b_{fg}, b_{ig}, b_{og}, b_{cg} are biases associated with the gates, while t−1, t are two consecutive steps in a sequence. Every LSTM cell has three data values at each step: cell state (c), input (i) and output (o). Many variations of the vanilla LSTM have been proposed [35], but researching all such variations is not the main focus of our work.

Fig. 2: Sample LSTM cell design [36]

FG = σ(w_{fg} · [o_{t−1}, i_t] + b_{fg})      (1a)
IG = σ(w_{ig} · [o_{t−1}, i_t] + b_{ig})      (1b)
CG = tanh(w_{cg} · [o_{t−1}, i_t] + b_{cg})   (1c)
c_t = c_{t−1} × FG + IG × CG                  (1d)
OG = σ(w_{og} · [o_{t−1}, i_t] + b_{og})      (1e)
o_t = OG × tanh(c_t)                          (1f)
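To make Equations (1a)-(1f) concrete, the following is a minimal sketch of a single LSTM cell step written with NumPy. The per-gate weight matrices and the dictionary layout are our own illustrative choices, not part of the original design; the arithmetic mirrors the equations above.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(o_prev, i_t, c_prev, w, b):
    """One forward pass of a vanilla LSTM cell (Equations 1a-1f).
    o_prev: previous output o_{t-1}; i_t: current input; c_prev:
    previous cell state c_{t-1}; w, b: per-gate weights/biases
    keyed by 'fg', 'ig', 'cg', 'og'."""
    z = np.concatenate([o_prev, i_t])       # [o_{t-1}, i_t]
    fg = sigmoid(w['fg'] @ z + b['fg'])     # forget gate     (1a)
    ig = sigmoid(w['ig'] @ z + b['ig'])     # input gate      (1b)
    cg = np.tanh(w['cg'] @ z + b['cg'])     # cell gate       (1c)
    c_t = c_prev * fg + ig * cg             # new cell state  (1d)
    og = sigmoid(w['og'] @ z + b['og'])     # output gate     (1e)
    o_t = og * np.tanh(c_t)                 # new output      (1f)
    return o_t, c_t
```

In practice, a library implementation (Keras, in Section IV) encapsulates this cell and learns w and b by backpropagation through time.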
III. PROPOSED METHOD

In this section, first the objectives and assumptions of the threat model are laid out. Then the two steps of our proposed method - local analysis and consensus - are explained in detail. The overall algorithmic flow of the proposed method can be seen in Figure 3.
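In outline, each datanode runs a loop of the following shape - a sketch of the flow in Figure 3. Every function name here is a placeholder for a component described in the remainder of this section, not an API from the implementation.

```python
def datanode_loop(pid, model, T, window):
    """Per-datanode flow: profile memory, predict with the local
    LSTM, and fall back to consensus when deviation exceeds T."""
    history = []
    while True:
        observed = sample_memory_features(pid)   # local profiling (Sec. III-B)
        predicted = model.predict_next(history)  # LSTM one-step forecast
        history.append((observed, predicted))
        if deviation(observed, predicted) > T:   # necessary, not sufficient
            # Share the last `window` observed/predicted pairs with the
            # replica datanodes over the secure channel (Sec. III-C).
            if consensus_with_replicas(history[-window:]):
                notify_master()                  # possible data attack
```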
A. Threat Model

A software-centric threat model, similar to our previous work [4], is used that dissects the big data platform design to understand its operational vulnerabilities. The main threats we focus on in this work are related to data degradation, data theft and configuration manipulation. These threats are a subset of insider attacks that are not typically handled in the big data world. The common link among these threats is that they are directly related to compromising the data hosted by the cluster. Hence, they have an obvious relation to the cluster infrastructure and the memory behavior of datanodes. Our threat model does not consider vulnerabilities originating outside the cluster, such as user and network level vulnerabilities.

Some assumptions about the system were made to fit this threat model:
• All datanodes use the same architecture, operating system and page size.
• Replica nodes host similar data, which is not necessarily the case in a production environment.
• The path to the framework installation is the same on all datanodes.
• The communication network will always be intact.
• The communication cost among replicas (slaves) is at most the communication cost between the namenode (master) and datanodes (slaves).

B. Local Analysis

Proposing the use of machine learning techniques for prediction models is not new. For example, Pellegrini et al. [37] proposed the use of Support Vector Machines for predicting application failure. Murray et al. [38] used naive Bayesian classifiers for predicting failure in hard drives. Recently, researchers from Facebook [28] used stack-augmented recurrent networks to predict algorithm behavior. But we believe that using LSTM to predict the memory behavior of datanodes in order to anticipate attacks has not been done before.

The first step when trying to predict or detect attacks in a system is to understand the way the system works. When a job is submitted to the big data cluster, a resource manager or scheduler such as YARN will schedule that job to run on the datanodes that host the required data. Each replica datanode is supposed to execute the same job on its copy of the data to maintain data consistency. This replication property of a big data cluster helps in predicting an attack. By monitoring the memory behavior of replica datanodes and understanding their correlation, we can predict the memory requirement of a datanode in the near future with some certainty.

The proposed memory behavior prediction method involves running an LSTM network on the memory mapping data obtained locally and independently at each datanode of the big data cluster. For this work, we chose to closely monitor four different features of memory:
• the actual size of the memory mappings;
• the resident set size (RSS), which varies depending on the caching technique;
• private pages, to characterize a particular workload;
• shared pages, to understand the datanode workload in the background, especially when the job is memory intensive.

Hence, each data point in the observed memory access sequence is a vector of size four. Since the readings are taken at regular time intervals, this data resembles a time series. From Equation 1, the input to our LSTM cell at time step t−1 is i_{t−1}, which is a data point with four values <a, b, c, d>, where a is the size of the mapping, b is the size of the pages in the resident set, c is the size of the private pages and d is the size of the shared pages. The recurrent LSTM network in our case is simply a single LSTM cell with a feedback loop. The decision reached at time step t−1 affects the decision it will reach at time step t. After a memory data point is fed to the LSTM cell as input, the output o_t at the next time step t is the predicted output. During the training phase, this output is compared with the actual observed value for correctness, and the weights and biases of the LSTM cell are adjusted accordingly. For this purpose, we used a standard metric called Root Mean Squared Error (RMSE). RMSE, as given in Equation 2, is a general-purpose error metric for numerical predictions. Here, n is the number of observations in the sequence, y_k represents a single observation and ŷ denotes the mean of the sequence so far. It is sensitive to small errors and severely punishes large errors.

RMSE = √( (1/n) · Σ_{k=1}^{n} (y_k − ŷ)² )      (2)

Fig. 3: Algorithmic flow of the proposed method.

When the RMSE falls above/below a predefined threshold T, we can be certain that the datanode behavior is abnormal. But an observed anomaly in the memory behavior at a datanode at this stage is only a necessary, not a sufficient, condition to predict an attack. For this reason, we use the second step in our proposed method - consensus. The data points representing the observed and the expected memory usage for a certain time window τ are shared with the other replica nodes using the secure communication protocol for consensus.
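For illustration, the error metric and the threshold test can be written in a few lines of Python. The threshold value and the sample windows used below are stand-ins; as Section IV discusses, T must be chosen per application.

```python
import numpy as np

def rmse(observed, predicted):
    """Root Mean Squared Error between two equal-length sequences."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return np.sqrt(np.mean((observed - predicted) ** 2))

T = 60.0  # illustrative, application-specific threshold
window_observed = [5120.0, 5188.0, 5203.0]   # e.g., RSS samples in KB
window_predicted = [5115.0, 5190.0, 5320.0]
abnormal = rmse(window_observed, window_predicted) > T
```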
C. Consensus

We use the same framework given in our previous works [4] to host our proposed attack prediction method. This framework facilitates secure communication among datanodes to exchange their local analysis results, share the statistical test results for consensus, and notify the master node if necessary. The framework uses encryption/decryption techniques such as AES along with communication protocols such as SSH, and proposes the use of hardware security such as TPMs and secure co-processors for key generation, storage and hashing.

When a datanode receives memory behavior data from other replica datanodes, it compares that data with its local data. Both datasets are two-dimensional, with predicted values and actual values. Each replica datanode runs a variance test such as ANOVA to measure the correlation between the local data and the received data. If the null hypothesis is rejected, then a multiple comparison test such as the Tukey test is used to find the corrupt datanode. When a consensus is reached among datanodes about a possible attack, necessary prevention steps such as notifying the master node can be taken.
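As a sketch of this statistical step (assuming SciPy and statsmodels in place of the Matlab tests mentioned in Section IV, and a hypothetical per-node deviation series as input), the variance test and the follow-up multiple-comparison test could look like this:

```python
import numpy as np
from scipy.stats import f_oneway
from statsmodels.stats.multicomp import pairwise_tukeyhsd

def consensus_check(deviations_by_node, alpha=0.05):
    """deviations_by_node: dict mapping a datanode id to its
    |observed - predicted| deviations for one time window tau."""
    groups = list(deviations_by_node.values())
    _, p_value = f_oneway(*groups)        # variance test (ANOVA)
    if p_value >= alpha:
        return None                       # null hypothesis holds: replicas agree
    # Null hypothesis rejected: run Tukey's multiple-comparison test
    # to single out the datanode(s) whose behavior deviates.
    values = np.concatenate(groups)
    labels = np.concatenate([[node] * len(v)
                             for node, v in deviations_by_node.items()])
    return pairwise_tukeyhsd(values, labels, alpha=alpha)  # see .summary()
```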

IV. EXPERIMENTAL RESULTS

This section presents our experiments and their results in three parts: the first part describes the setup of the Hadoop cluster, the LSTM network and our choice of programs to test the proposed method; the next part explains the results observed from the conducted experiments; and the final part is an analysis of the results along with some observations about the proposed method.

A. Setup

We used Amazon EC2 to set up a small Hadoop cluster of 3 datanodes, 1 namenode and 1 secondary namenode. All programs used for testing are obtained from the Hadoop MapReduce examples that come with the basic installation of Hadoop and run on this Hadoop cluster. Some details about these MapReduce programs are given in Table I. We used a standard quad-core desktop to run the Keras & TensorFlow frameworks required for implementing the LSTM network. Statistical tests were conducted using Matlab. The configurations of the Hadoop nodes and the LSTM machine are given in Table II.

TABLE I: List of Hadoop MapReduce Examples

E.No.  Name                Description
1      aggregatewordcount  An Aggregate-based map/reduce program that counts the words in the input files.
2      bbp                 A map/reduce program that uses Bailey-Borwein-Plouffe to compute exact digits of Pi.
3      pentomino           A map/reduce tile laying program to find solutions to pentomino problems.
4      pi                  A map/reduce program that estimates Pi using a quasi-Monte Carlo method.
5      randomtextwriter    A map/reduce program that writes 10GB of random textual data per node.
6      randomwriter        A map/reduce program that writes 10GB of random data per node.
7      sort                A map/reduce program that sorts the data written by the random writer.
8      wordcount           A map/reduce program that counts the words in the input files.

TABLE II: Machine configurations used for experiments

Attribute               Hadoop nodes         LSTM machine
EC2 Instance Model      t2.micro             N/A
Processor               Intel Xeon w/ Turbo  AMD A8 Quad Core
Compute Units           1 (Burstable)        4
vCPU                    1                    N/A
Memory (GB)             1                    16
Storage                 Elastic Block Store  SATA
Networking Performance  low                  N/A
Operating System        Linux/UNIX           Ubuntu 16.04
Software distribution   Hadoop 2.7.1         Keras, TensorFlow
TABLE III: Insider Attacks on a Datanode

No.  Title                 Description                                                      Impact
1    Mail Attack           A system admin is sending out user data as attachments from a   Data Privacy
                           datanode to a personal email account.
2    Configuration Attack  A system admin changes the HDFS configuration of a datanode     Datanode Performance
                           he/she has access to.
3    Log Deletion Attack   A system admin modifies user data and deletes the log files on  System Analysis
                           a datanode he/she has access to.

First, each Hadoop MapReduce example is run on the cluster a hundred times. For every iteration, the memory readings are taken periodically, at a time interval of five seconds, from /proc/pid/smaps, the Linux interface for observing the memory usage of processes. Average values of the four memory features - size, resident set, private pages and shared pages - are calculated and stored in an array of four-dimensional vectors. By the end of the hundred iterations, we have a time series history of process behavior where each data point tells us about the memory behavior of the process for a specific run. This is a part of local analysis and is done in parallel on every datanode.
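The sampling step can be sketched as follows. The field names are those of /proc/<pid>/smaps; the aggregation of private and shared pages from their clean/dirty counters is our reading of the description above, and the helper itself is illustrative.

```python
def sample_memory_features(pid):
    """Return the four memory features of a process, in KB, summed
    over all mappings in /proc/<pid>/smaps: total mapping size,
    resident set, private pages and shared pages."""
    size = rss = private = shared = 0
    with open(f"/proc/{pid}/smaps") as smaps:
        for line in smaps:
            field, _, rest = line.partition(":")
            try:
                kb = int(rest.split()[0])
            except (IndexError, ValueError):
                continue                        # skip mapping header lines
            if field == "Size":
                size += kb
            elif field == "Rss":
                rss += kb
            elif field.startswith("Private_"):  # Private_Clean + Private_Dirty
                private += kb
            elif field.startswith("Shared_"):   # Shared_Clean + Shared_Dirty
                shared += kb
    return (size, rss, private, shared)
```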
Next, the time series data of a process, i.e., one hundred time steps of four-dimensional data points, is fed to an LSTM network. Our LSTM network has one hidden layer of four LSTM cells and a dense output layer of four neurons (fully connected) to predict the four memory features. It looks back at one previous time step when predicting a new value, and the batch size is also set to one. We used 80% of this data for training the LSTM network and tested it on the remaining 20%. The number of epochs varied between 50 and 100. RMSE is used to calculate the accuracy of the LSTM network in both the training and testing phases. By the end of this step, the LSTM network of every datanode is trained and tested on the normal behavior of that datanode.
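Under the configuration just described, the network can be sketched in Keras as below. The layer sizes, the look-back of one, the batch size of one and the 80/20 split follow the text; the variable names and the RMSE computation at the end are illustrative.

```python
import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense

def build_and_train(series, epochs=50):
    """series: array of shape (100, 4), one smaps sample per time step."""
    x = series[:-1].reshape(-1, 1, 4)   # look back one time step
    y = series[1:]                      # target: the next 4-feature vector
    split = int(0.8 * len(x))           # 80% train / 20% test
    model = Sequential([
        LSTM(4, input_shape=(1, 4)),    # one hidden layer of four LSTM cells
        Dense(4),                       # fully connected output: four features
    ])
    model.compile(loss="mean_squared_error", optimizer="adam")
    model.fit(x[:split], y[:split], epochs=epochs, batch_size=1, verbose=0)
    predictions = model.predict(x[split:])
    test_rmse = np.sqrt(np.mean((predictions - y[split:]) ** 2))
    return model, test_rmse
```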
Finally, three different kinds of insider attacks are simulated in the Hadoop cluster, and the datanode memory behavior when running the Hadoop MapReduce examples is captured again in the same way as mentioned before. The details about these attacks are given in Table III. For demonstration, we infected the datanodes by increasing the thread count allocated for HDFS, allowing the datanode to cache, and increasing the cache report and cache revocation polling times (making the datanodes faster). Also, the datanodes were running two separate scripts in the background: one for sending emails of data and another to modify logs. This data is then used for testing the LSTM network that has been trained using the same 80% of data from the above experiments. The challenge that these attacks pose is that there is no error or change in program behavior with respect to correctness.

B. Results

There are three sets of results in our work.
• The first set of results shows a comparison between the program behavior when the datanodes are not yet compromised and when they are compromised, as shown in Figure 4. Here, the results are shown for one datanode, and each Hadoop MapReduce job is analyzed using four different features of memory.
• The next set of results shows the accuracy of the LSTM network's prediction of datanode memory behavior. These results can be observed in Figure 5 and Table IV. Since the range of the input data is between (150 − 7500), an RMSE in the range of (1 − 60) can be considered good prediction.
• The final set of results shows the efficiency of the prediction algorithm when tested on the Hadoop cluster with corrupt or compromised datanodes. These results can be observed in Figure 5 and Table IV. Since the range of the input data is between (150 − 20000), an RMSE in the range of (1800 − 6700) can be considered poor prediction.

C. Analysis

The Hadoop MapReduce examples used in this work represent a variety of workloads. Programs such as Pi, BBP and Pentomino use MapReduce jobs that are compute intensive but use very little memory. The execution time of a single run of these programs is less than 5 seconds. Programs such as Random Write, Sort, Random Text Write and Aggregate Word Count are memory intensive and at the same time represent multi-tasking. While the Random Write and Random Text Write programs are limited to writing data to HDFS, the Sort and Aggregate Word Count programs use more memory because they are both memory and compute intensive tasks. Each of these jobs took less than 2 minutes to complete working on their 1GB of input data. The datanode alternates between these two programs for each run.

Fig. 4: Comparison of Datanode 1 Memory Behavior for MapReduce Examples before and after attack (all 3 attacks are simultaneously staged).

TABLE IV: RMSE Results from LSTM while Predicting Datanode Memory Mappings

E.No.  Epochs  Stage         Samples  RMSE (DN1)  RMSE (DN2)  RMSE (DN3)
2      50      TRAIN         78       7.49        8.33        0.95
               TEST(normal)  18       4.92        6.69        1.27
               TEST(attack)  57       5586.74     5496.8      6590.52
3      50      TRAIN         78       7.83        5.77        2.58
               TEST(normal)  18       5.78        5.08        1.19
               TEST(attack)  57       5747.34     5578.56     6684.66
4      50      TRAIN         78       6.85        25.67       24.85
               TEST(normal)  18       6.42        56.28       8.74
               TEST(attack)  37       6007.43     5678.18     5977.24
8      50      TRAIN         78       11.87       17.74       5.7
               TEST(normal)  18       12.44       19.40       9.61
               TEST(attack)  22       2063.7      1870.45     1842.91
1+5    50      TRAIN         158      20.93       16.73       19.38
               TEST(normal)  38       15.33       17.12       9.57
               TEST(attack)  20       6180.51     5328.91     5852.19
6+7    100     TRAIN         238      12.49       12.26       13.37
               TEST(normal)  58       11.46       10.71       14.2
               TEST(attack)  27       5972.78     5742.77     5545.03

Finally, the Word Count program, which is very similar to the Sort and Aggregate Word Count programs, shows that replica datanode behavior is not always similar from one node to another, even when the Hadoop cluster is not under attack. The input for this job was three times that of the other jobs, and each job took almost 8 minutes to complete. This was the most memory intensive job of all the examples used, and the size of its input and output is 6GB combined.

Fig. 5: LSTM Analysis of Datanode 1 Memory Behavior for MapReduce Examples: (a) Pi; (b) BBP; (c) Pentomino; (d) Word Count on 3GB file; (e) Random Text Write + Aggregated Word Count; (f) Random Write + Sort.

Fig. 6: LSTM Root Mean Square Error Loss Convergence

While the LSTM network of every datanode is trained using the data from normal runs of the programs, the testing datasets were different: one testing dataset represents normal behavior and the other represents compromised behavior. As shown in Figure 6, the loss converges and plateaus after 50/100 epochs. This shows the efficiency of our LSTM model during training. The values in the test results differ by a huge margin, as shown in the TEST(normal) and TEST(attack) rows for every MapReduce program in Table IV. It can be observed that the RMSE increased from a small value in the range of 1-60 to a much bigger value in the range of 1800-6700. That huge difference in the RMSE can be attributed to the type of attacks that were introduced. Since the attacks improved the memory performance of the datanodes, they made the datanodes use more memory. Though it is difficult to come up with a fixed number or range of RMSE difference to denote compromised behavior, the idea here is to have an application-specific threshold, T, that represents the acceptable range of deviation between the training and testing phases in order to predict an attack successfully. The attacks used in our experiments are designed to explicitly show such a huge impact.

Since LSTM time-series prediction is statistical in nature, it might be insufficient for certain security techniques that demand 100% accuracy. Also, the proposed technique is susceptible to the base-rate fallacy phenomenon.

V. CONCLUSIONS

In this paper, we proposed a method to predict (and subsequently detect) an internal data attack on a Hadoop cluster by analyzing individual datanode memory behavior using LSTM recurrent neural networks. The core of our idea is to model the memory mappings of a datanode using multiple features and represent that data as a time series, such that a recurrent neural network such as LSTM can predict program-level memory requirements. When the actual memory usage differs from the predicted requirement by a huge margin, the datanodes in a big data cluster share that information and come to a consensus regarding the situation. We demonstrated the efficiency of our method by testing it on a Hadoop cluster using a set of open MapReduce benchmarks and infecting the cluster using predefined attacks.

For future work, we would like to use more variations of LSTM and other recurrent neural networks and test the methods on a broader set of industry datasets.

REFERENCES

[1] Ivanov, Anton, Denis Makrushin, Jornt Van Der Wiel, Maria Garnaeva, and Yuri Namestnikov. "Kaspersky Security Bulletin 2015. Overall statistics for 2015." Securelist - Information about Viruses, Hackers and Spam. AO Kaspersky Lab, 15 Dec. 2015. Web. 22 Jan. 2017.
[2] White, Tom. Hadoop: The Definitive Guide. O'Reilly Media, Inc., 2012.
[3] Zaharia, Matei, et al. "Spark: cluster computing with working sets." Proceedings of the 2nd USENIX Conference on Hot Topics in Cloud Computing. Vol. 10. 2010.

[4] Aditham, Santosh, and Nagarajan Ranganathan. "A novel framework for mitigating insider attacks in big data systems." Big Data (Big Data), 2015 IEEE International Conference on. IEEE, 2015.
[5] Debar, Herve, Monique Becker, and Didier Siboni. "A neural network component for an intrusion detection system." Research in Security and Privacy, 1992. Proceedings., 1992 IEEE Computer Society Symposium on. IEEE, 1992.
[6] Mukkamala, Srinivas, Guadalupe Janoski, and Andrew Sung. "Intrusion detection using neural networks and support vector machines." Neural Networks, 2002. IJCNN'02. Proceedings of the 2002 International Joint Conference on. Vol. 2. IEEE, 2002.
[7] Tuor, Aaron, et al. "Deep Learning for Unsupervised Insider Threat Detection in Structured Cybersecurity Data Streams." (2017).
[8] Schultz, E. Eugene. "A framework for understanding and predicting insider attacks." Computers & Security 21.6 (2002): 526-531.
[9] Srivastava, Nitish, Elman Mansimov, and Ruslan Salakhutdinov. "Unsupervised Learning of Video Representations using LSTMs." ICML. 2015.
[10] Graves, Alex, Abdel-rahman Mohamed, and Geoffrey Hinton. "Speech recognition with deep recurrent neural networks." Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on. IEEE, 2013.
[11] Malhotra, Pankaj, et al. "Long short term memory networks for anomaly detection in time series." Proceedings. Presses universitaires de Louvain, 2015.
[12] O'Malley, Owen, et al. "Hadoop security design." Yahoo, Inc., Tech. Rep (2009).
[13] Neuman, B. Clifford, and Theodore Ts'o. "Kerberos: An authentication service for computer networks." Communications Magazine, IEEE 32.9 (1994): 33-38.
[14] Hu, Daming, et al. "Research on Hadoop Identity Authentication Based on Improved Kerberos Protocol." (2015).
[15] Gaddam, Ajit. "Data Security in Hadoop." 20 Feb. 2015. Web. 10 Jan. 2016.
[16] Barham, Paul, et al. "Magpie: Online Modelling and Performance-aware Systems." HotOS. 2003.
[17] Fonseca, Rodrigo, et al. "X-trace: A pervasive network tracing framework." Proceedings of the 4th USENIX Conference on Networked Systems Design & Implementation. USENIX Association, 2007.
[18] Mace, Jonathan, et al. "Retro: Targeted Resource Management in Multi-tenant Distributed Systems." NSDI. 2015.
[19] "Apache HTrace." The Apache Software Foundation, 13 July 2016. Web. 19 Jan. 2017.
[20] Guo, Zhenyu, et al. "G2: A Graph Processing System for Diagnosing Distributed Systems." USENIX Annual Technical Conference. 2011.
[21] Mace, Jonathan, Ryan Roelke, and Rodrigo Fonseca. "Pivot tracing: Dynamic causal monitoring for distributed systems." Proceedings of the 25th Symposium on Operating Systems Principles. ACM, 2015.
[22] Erlingsson, Úlfar, et al. "Fay: extensible distributed tracing from kernels to clusters." ACM Transactions on Computer Systems (TOCS) 30.4 (2012): 13.
[23] Liao, Cong, and Anna Squicciarini. "Towards provenance-based anomaly detection in MapReduce." Cluster, Cloud and Grid Computing (CCGrid), 2015 15th IEEE/ACM International Symposium on. IEEE, 2015.
[24] Wei, Wei, et al. "Exploring opportunities for non-volatile memories in big data applications." Workshop on Big Data Benchmarks, Performance Optimization, and Emerging Hardware. Springer International Publishing, 2014.
[25] Dimitrov, Martin, et al. "Memory system characterization of big data workloads." Big Data, 2013 IEEE International Conference on. IEEE, 2013.
[26] Xu, Lijie, Jie Liu, and Jun Wei. "Fmem: A fine-grained memory estimator for mapreduce jobs." Proceedings of the 10th International Conference on Autonomic Computing (ICAC 13). 2013.
[27] Hornik, Kurt, Maxwell Stinchcombe, and Halbert White. "Multilayer feedforward networks are universal approximators." Neural Networks 2.5 (1989): 359-366.
[28] Joulin, Armand, and Tomas Mikolov. "Inferring algorithmic patterns with stack-augmented recurrent nets." Advances in Neural Information Processing Systems. 2015.
[29] Hecht-Nielsen, Robert. "Theory of the backpropagation neural network." Neural Networks, 1989. IJCNN., International Joint Conference on. IEEE, 1989.
[30] Schuster, Mike, and Kuldip K. Paliwal. "Bidirectional recurrent neural networks." IEEE Transactions on Signal Processing 45.11 (1997): 2673-2681.
[31] Gers, Felix A., Jürgen Schmidhuber, and Fred Cummins. "Learning to forget: Continual prediction with LSTM." Neural Computation 12.10 (2000): 2451-2471.
[32] Hochreiter, Sepp, and Jürgen Schmidhuber. "Long short-term memory." Neural Computation 9.8 (1997): 1735-1780.
[33] Kingma, Diederik, and Jimmy Ba. "Adam: A method for stochastic optimization." arXiv preprint arXiv:1412.6980 (2014).
[34] Tieleman, Tijmen, and Geoffrey Hinton. "Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude." COURSERA: Neural Networks for Machine Learning 4.2 (2012).
[35] Srivastava, Nitish, et al. "Dropout: a simple way to prevent neural networks from overfitting." Journal of Machine Learning Research 15.1 (2014): 1929-1958.
[36] Olah, Christopher. "Understanding LSTM Networks." Colah's Blog, 27 Aug. 2015. Web. 18 Jan. 2017.
[37] Pellegrini, Alessandro, Pierangelo Di Sanzo, and Dimiter R. Avresky. "A Machine Learning-based Framework for Building Application Failure Prediction Models." Parallel and Distributed Processing Symposium Workshop (IPDPSW), 2015 IEEE International. IEEE, 2015.
[38] Murray, Joseph F., Gordon F. Hughes, and Kenneth Kreutz-Delgado. "Machine learning methods for predicting failures in hard drives: A multiple-instance application." Journal of Machine Learning Research 6.May (2005): 783-816.
