IEEE International Conference on Communications, Control, and Computing Technologies for Smart Grids (SmartGridComm)

threats aimed at the critical systems and processes communi-

vehicle integration, while new variants and/or attacks have emerged targeting zero-day vulnerabilities. The trained model can perform poorly in the new environment against the new threats [7], while in many real-world scenarios it is expensive, if not prohibitive, to recollect new datasets and re-train a deployed model.

In machine learning research, transfer learning has been widely adopted in various image/video applications [8], [9] as a promising solution for transferring a learned model to new domains or tasks. By learning a new feature space in which intra-class similarity and inter-class distance are preserved, TL methods allow machine learning-based classifiers to retain their performance on incoming data with new distributions and/or classes. However, such capacity has not been extended or validated in complex cyber-physical systems like the smart grid, where the common practice is still to manually select and test different feature subsets.

Motivated by these gaps, this paper proposes a domain-adversarial framework leveraging the latest deep adversarial learning techniques [10] to extract novel features and transfer different machine learning classifiers to new operation scenarios and attack threats. A realistic smart grid security dataset, collected over hardware-in-the-loop simulators [11] with multiple attack and normal scenarios as well as cyber and physical system information, has been chosen to evaluate the performance of the proposed framework. The dataset allows the study to leverage deep packet inspection (DPI) of the payloads (measurements), system logs, and Snort alerts, instead of packet headers or traffic statistics, to utilize the cyber-physical information against unknown threats. A similar DPI method has also been adopted in [12], [13] to extract voltages and currents for detecting false data injection and fault localization. Ablation analysis has been performed to provide insights on the choice and influence of critical hyper-parameters. The results show that the domain-adversarial TL framework can significantly improve classification performance against unknown threats of different types and locations. Discussions on hyper-parameters are also provided for practical use.

The rest of this paper is organized as follows. Section II reviews the related work. Section III presents the domain-adversarial transfer learning framework for intrusion detection. Section IV presents the experimental setup, the simulation results on unknown threat types and unseen threat locations, and the discussion on the choice of hyper-parameters. Section V draws conclusions and outlines future work.

II. RELATED WORKS

A. Machine Learning for Smart Grid Security

Following the increasing availability of data and computing power, machine learning (ML) has been extensively investigated as an intrusion detection and classification technique in communication/network security [7] as well as smart grid security [14], [15]. Among them, adversarial machine learning has been introduced in applications like synthesizing normal data for load forecasting [16]. In 2014, Oak Ridge National Laboratory conducted a comprehensive evaluation of several classic machine learning approaches on binary classification tasks [17], which established a benchmark for different machine learning approaches. Subsequent studies based on neural networks and ensemble learning have reached over 95% overall accuracy [18], [19]. However, these existing approaches rest on the default assumption that the training and testing datasets come from the same distribution and that the same attacks appear in both datasets. While traditional machine learning can suffer degrading accuracy when this assumption fails to hold, the proposed framework allows trained classifiers to retain robust performance against unknown threats.

B. Transfer Learning for Knowledge Adaptation

Transfer learning (TL), also known as inductive learning or domain adaptation, aims at applying knowledge or patterns learned in a certain domain (or task) to a different but related domain (or task) [20]. The "transfer" is similar to the process where a graduate tries to apply what he/she learned in class to a practical problem: the key is to find a mapping between the problems so that the learned knowledge can be adapted to the practical situation and resolve the problem. Over the last two years, efforts have started to introduce transfer learning for anomaly detection in Internet [21] and cloud [22] applications. Juan et al. proposed a feature-based transfer learning framework that was able to boost classifier performance on the well-known NSL-KDD dataset of TCP traffic [23], demonstrating its potential merit for cyber-security analysis. Meanwhile, inspired by the highly successful generative adversarial networks (GANs) [24], Ganin et al. proposed domain-adversarial training of neural networks (DANN) [10] as an unsupervised domain adaptation technique for image applications. Seeing both the potential and the gap of TL methods leveraging cyber-physical data for smart grid communication and security, the proposed framework introduces domain-adversarial TL as the core feature extractor for the detection of unknown threats.

III. DOMAIN-ADVERSARIAL TRANSFER LEARNING FRAMEWORK AGAINST UNKNOWN THREATS

The paper proposes a framework that leverages domain-adversarial transfer learning, one of the latest domain adaptation techniques, to extend existing knowledge against unknown threats in the smart grid. The overall architecture, illustrated in Fig. 2, consists of four steps: (1) collect labeled source data and unlabeled target data; (2) train the model over both domains to obtain a new feature space in a domain-adversarial manner; (3) use the new feature representations of the labeled data in the source domain to train classifiers; (4) map the new data to the new feature space and predict their class labels. The detailed problem formulation and methodology are as follows.

A. Problem Formulation

In transfer learning, a domain D consists of two components: a feature space X and a marginal probability distribution P(X). Given a specific domain D = {X, P(X)}, a task
[Fig. 2 diagram: a feature extractor feeds a label predictor (label prediction loss) and a domain classifier (domain adaptation loss). Step 2, transfer learning model (DANN) training, uses labeled source training data (known threats) and unlabeled target training data (unknown threats/variants); Step 3, classifier training, uses the projected labeled source data; model testing predicts labels for unlabeled target testing data with the trained and tuned classifier.]
Fig. 2. The proposed domain-adversarial transfer learning framework for smart grid intrusion detection.
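The four-step flow of Fig. 2 can be sketched end to end. The snippet below is a minimal illustration, not the paper's implementation: a fixed random projection stands in for the trained DANN feature extractor, a nearest-centroid rule stands in for the downstream classifiers, and all data and names here are synthetic placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

# Step 1: labeled source data (known threats) and unlabeled target data.
X_src = rng.normal(size=(200, 8))                    # source samples
y_src = (X_src[:, 0] + X_src[:, 1] > 0).astype(int)  # 0 = natural event, 1 = attack
X_tgt = X_src + rng.normal(scale=0.3, size=X_src.shape)  # shifted target domain

# Step 2: map both domains into a shared feature space.
# A fixed random projection stands in for the trained extractor G_f.
W = rng.normal(size=(8, 4))
def extract_features(X):
    return np.tanh(X @ W)  # placeholder for the domain-adversarial G_f

# Step 3: train a classifier on the projected labeled source data
# (nearest-centroid, standing in for AdB/kNN/SVM/RF/CART/ANN).
F_src = extract_features(X_src)
centroids = np.stack([F_src[y_src == c].mean(axis=0) for c in (0, 1)])

# Step 4: project the new (target) data and predict its class labels.
F_tgt = extract_features(X_tgt)
dists = np.linalg.norm(F_tgt[:, None, :] - centroids[None, :, :], axis=2)
y_pred = dists.argmin(axis=1)  # one predicted label per target sample
```

The only step specific to the framework is Step 2; swapping the random projection for a trained DANN extractor leaves Steps 1, 3, and 4 unchanged.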
consists of two components: a label space Y and an objective predictive function f(·) to be learned from the training data.

We model the attack detection of the smart grid as a binary classification problem, that is, to classify the events as an attack or a natural event. The source domain DS and target domain DT have the same feature space X but different combinations of attack types and normal events. The marginal probability distribution P(X) is thus different. We denote the labeled source domain as DS = {(x_S1, y_S1), ..., (x_SnS, y_SnS)} and the unlabeled target domain as DT = {(x_T1, y_T1), ..., (x_TnT, y_TnT)}. The goal is to learn a predictive function f(·) from the training data that predicts labels in the target domain.

The following problems will be considered in this paper:
• One threat appears only in DS and another new threat appears only in DT;
• The same type of threats appears in both DS and DT, but each domain contains attacks at different locations.

B. The Original Domain-Adversarial Training Design

The DANN architecture consists of three neural networks: a feature extractor, a domain classifier, and a label predictor. Note that each network can be implemented as a single- or multi-layer perceptron; the simplest structure was used in this paper, as by Ganin et al. in [10].

The feature extractor learns a function Gf : X → R^D that maps an m-dimensional input into a D-dimensional feature:

    Gf(x) = sigm(Wx + b)                                    (1)

where (W, b) ∈ R^{D×m} × R^D is a matrix-vector pair of the weights W and the bias b of the feature extractor, and sigm denotes the element-wise sigmoid function. During training, the label prediction and domain classification losses defined below will be back-propagated to adjust the weights of the feature extractor.

Similarly, the label predictor aims to learn a function Gy from the extracted features to the class labels:

    Gy(x) = softmax(V Gf(x) + c)                            (2)

where softmax(a) = [e^{a_i} / Σ_{j=1}^{|a|} e^{a_j}]_{i=1}^{|a|}, and (V, c) ∈ R^{L×D} × R^L is a matrix-vector pair of the weights V and the bias c of the label predictor.

Given a sample-label pair (x, y), the loss function of the binary classification problem is given as:

    Ly(x, y) = −y log[Gy(x)] − (1 − y) log[1 − Gy(x)]       (3)

For the source domain, the domain-adversarial training can thus be formulated as the following optimization problem:

    min_{W,b,V,c} (1/nS) Σ_{x∈DS} Ly(x, y) + λ · R(W, b)    (4)

where R(W, b) is a regularizer weighted by a hyper-parameter λ. The regularizer determines the penalty for the feature extractor, which provides the mapping from the source/target domains to a new feature space in which the domain adaptation loss, i.e., the dissimilarity of samples mapped from the source and target domains, is minimized.

The domain classifier learns a logistic regression Gd : R^D → [0, 1] that predicts whether an input x is from DS or DT:

    Gd(x) = sigm(u · Gf(x) + z)                             (5)

where (u, z) ∈ R^D × R is the weight-bias pair of the domain classifier, with the domain classification loss

    Ld(x, dx) = −dx log[Gd(x)] − (1 − dx) log[1 − Gd(x)]    (6)

where dx = 0 if x ∈ DS and dx = 1 if x ∈ DT. Subsequently, the regularizer R(W, b) can be defined as:

    R(W, b) = −(1/nS) Σ_{x∈DS} Ld(x, dx) − (1/nT) Σ_{x∈DT} Ld(x, dx)    (7)
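The forward passes and losses of the three DANN components can be written out numerically. The sketch below is a minimal NumPy illustration under assumed shapes (m input dimensions, D features, L = 2 classes); the sigmoid forms of the feature extractor and domain classifier follow Ganin et al.'s DANN [10], and all parameter values are random placeholders, not trained weights.

```python
import numpy as np

rng = np.random.default_rng(1)
m, D, L = 8, 4, 2  # assumed input dim, feature dim, and number of classes

sigm = lambda a: 1.0 / (1.0 + np.exp(-a))

# Feature extractor G_f: R^m -> R^D (sigmoid form, as in DANN [10])
W, b = rng.normal(size=(D, m)), rng.normal(size=D)
G_f = lambda x: sigm(W @ x + b)

# Label predictor G_y(x) = softmax(V G_f(x) + c), Eq. (2)
V, c = rng.normal(size=(L, D)), rng.normal(size=L)
def softmax(a):
    e = np.exp(a - a.max())  # shift for numerical stability
    return e / e.sum()
G_y = lambda x: softmax(V @ G_f(x) + c)

# Binary label loss, Eq. (3); with L = 2 we read off the attack probability
def L_y(x, y):
    p = G_y(x)[1]
    return -y * np.log(p) - (1 - y) * np.log(1 - p)

# Domain classifier G_d: R^D -> [0, 1] (logistic regression on the features)
u, z = rng.normal(size=D), rng.normal()
G_d = lambda x: sigm(u @ G_f(x) + z)
def L_d(x, d):  # d = 0 for source samples, d = 1 for target samples
    p = G_d(x)
    return -d * np.log(p) - (1 - d) * np.log(1 - p)

# Regularizer R(W, b) of Eq. (7) over toy source/target batches
X_S, X_T = rng.normal(size=(16, m)), rng.normal(size=(16, m))
R = -np.mean([L_d(x, 0) for x in X_S]) - np.mean([L_d(x, 1) for x in X_T])

# One-sample objective of Eq. (4) with an assumed lambda = 0.1
objective = L_y(X_S[0], 1) + 0.1 * R
```

Note that R enters the objective with a minus sign on the domain losses: minimizing it pushes the feature extractor to *confuse* the domain classifier, which is what makes the source and target features indistinguishable.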
TABLE III
COMPARISON OF ORIGINAL AND DOMAIN-ADVERSARIAL MACHINE LEARNING CLASSIFIERS AGAINST UNKNOWN THREAT VARIANTS

Case  Method              AdB     kNN     SVM     RF      CART    ANN
1     Original            72.0%   77.6%   57.9%   51.4%   53.2%   83.0%
      Domain-Adversarial  86.8%   86.0%   82.9%   88.2%   76.3%   84.5%
      Improvement         +14.8%  +8.5%   +25.0%  +36.8%  +23.1%  +1.5%
2     Original            77.3%   82.7%   71.0%   84.5%   76.7%   86.5%
      Domain-Adversarial  94.2%   90.4%   85.5%   95.2%   79.2%   87.8%
      Improvement         +16.9%  +7.8%   +14.5%  +10.7%  +2.5%   +1.3%
3     Original            73.0%   75.1%   64.8%   76.0%   61.3%   81.7%
      Domain-Adversarial  83.6%   82.2%   80.9%   84.5%   75.7%   83.6%
      Improvement         +10.6%  +7.1%   +16.1%  +8.5%   +14.4%  +1.9%
4     Original            71.2%   80.5%   66.7%   83.0%   69.5%   85.9%
      Domain-Adversarial  89.3%   88.2%   85.3%   90.0%   79.6%   87.6%
      Improvement         +18.1%  +7.7%   +18.6%  +7.0%   +10.1%  +1.6%
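Averaging Table III's gains per case takes only a few lines. The values below are copied from the Improvement rows above; the dictionary layout and variable names are ours, added purely to illustrate reading the table.

```python
# Improvement rows of Table III, in percentage points per classifier
# (AdB, kNN, SVM, RF, CART, ANN), keyed by case number.
improvements = {
    1: [14.8, 8.5, 25.0, 36.8, 23.1, 1.5],
    2: [16.9, 7.8, 14.5, 10.7, 2.5, 1.3],
    3: [10.6, 7.1, 16.1, 8.5, 14.4, 1.9],
    4: [18.1, 7.7, 18.6, 7.0, 10.1, 1.6],
}
avg_gain = {case: sum(v) / len(v) for case, v in improvements.items()}
for case, gain in sorted(avg_gain.items()):
    print(f"Case {case}: +{gain:.1f} points on average")
```

Every entry is positive, and Case 1 (the hardest original case) shows the largest average gain across the six classifiers.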
with increasing customer participation via smart meters, the adaptability provided by the TL framework can allow the learned models to handle known threats occurring on new devices and locations in the network, improving the resiliency

• λ: from 0.01 to 2.00 at the 1st, 2nd, and 5th decile (0.01, 0.02, 0.05, 0.1, ...).

The results are shown in Fig. 5, with two x-axes above and below the figure. The accuracy increases as Nf increases