
A Mini Project Report on

ELECTRICITY THEFT CYBER ATTACK DETECTION AND


PREDICTION FOR FUTURE IOT-BASED SMART ELECTRIC
METERS
Submitted to
JAWAHARLAL NEHRU TECHNOLOGICAL UNIVERSITY, HYDERABAD

In partial fulfilment of the requirements for the award of the degree of


BACHELOR OF TECHNOLOGY
IN
COMPUTER SCIENCE AND ENGINEERING
By

R. VAMSHANK VARDHAN - 20RA1A0599

M. VAMSHI - 20RA1A05A1
K. SANJAY - 20RA1A0574

Under the guidance of


Mr. P. KAMARAJA PANDIAN

Assistant Professor
Department of Computer Science and Engineering

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING


KOMMURI PRATAP REDDY INSTITUTE OF TECHNOLOGY
(Affiliated to JNTUH, Ghanpur(V), Ghatkesar(M), Medchal(D)-500088)

2020- 2024
KOMMURI PRATAP REDDY INSTITUTE OF TECHNOLOGY
(Affiliated to JNTUH, Ghanpur(V), Ghatkesar(M), Medchal(D)-500088)

CERTIFICATE

This is to certify that the project work entitled “ELECTRICITY THEFT CYBER ATTACK DETECTION AND PREDICTION FOR FUTURE IOT-BASED SMART ELECTRIC METERS” is submitted by Mr. R. VAMSHANK VARDHAN, Mr. M. VAMSHI and Mr. K. SANJAY, bonafide students of Kommuri Pratap Reddy Institute of Technology, in partial fulfillment of the requirements for the award of the Degree of Bachelor of Technology in Computer Science and Engineering of Jawaharlal Nehru Technological University Hyderabad, during the year 2023-24.

Internal Examiner HOD


Mr. P. Kamaraja Pandian                    Dr. C. Bagath Basha

External Examiner

DECLARATION

We hereby declare that this project work entitled “ELECTRICITY THEFT CYBER ATTACK DETECTION AND PREDICTION FOR FUTURE IOT-BASED SMART ELECTRIC METERS”, submitted in partial fulfillment of the requirements for the award of the degree of Bachelor of Technology in Computer Science and Engineering, is a bonafide work carried out by us during the academic year 2023-24.

We further declare that this project is a result of our own effort and has not been submitted by us to any institute for the award of any degree.

By

R. VAMSHANK VARDHAN (20RA1A0599)

M. VAMSHI (20RA1A05A1)

K. SANJAY (20RA1A0574)

ACKNOWLEDGEMENT
It gives us immense pleasure to acknowledge with gratitude, the help and support extended
throughout the project report from the following:

We are very grateful to the Almighty and to our parents, who have made us capable of carrying out this work.

We express our profound gratitude to Dr. P. Srinivasa Rao, Professor and Principal of Kommuri Pratap Reddy Institute of Technology, who encouraged us in completing our project report successfully.

We are grateful to Dr. C. Bagath Basha, Head of the Department, CSE, for his amiable, ingenious and adept suggestions and pioneering guidance during the project report.

We express our gratitude and thanks to the Project Coordinator, Mr. K. Srinivasa Rao, Assistant Professor of our department, for his contribution to making it a success within the given time.

We express our deep sense of gratitude and thanks to our Internal Guide, Mr. P. Kamaraja Pandian, Assistant Professor, for his guidance during the project report.

We are also very thankful to our Management, Staff Members and all our Friends for their valuable suggestions and timely guidance, without which we could not have completed it.

By

R. VAMSHANK VARDHAN - 20RA1A0599

M. VAMSHI - 20RA1A05A1
K. SANJAY - 20RA1A0574

Vision of the Institute

To emerge as a premier institute producing high-quality professional graduates who can contribute to the economic and social development of the Nation.

Mission of the Institute

IM1: To have a holistic approach in curriculum and pedagogy through industry interface to meet the needs of global competency.

IM2: To develop students with knowledge, attitude, employability skills, entrepreneurship and research potential, and to mould them into professionally ethical citizens.

IM3: To contribute to the advancement of Engineering & Technology that would help satisfy societal needs.

IM4: To preserve and promote cultural heritage, humanistic values and spiritual values, thus helping peace and harmony in society.

Vision of the Department

To provide quality education in Computer Science for innovative professionals to work for the development of the nation.

Mission of the Department

DM1: Laying the path for rich skills in Computer Science through basic knowledge of mathematics and fundamentals of engineering.

DM2: Providing the latest tools and technology to the students as a part of the learning infrastructure.

DM3: Training the students towards employability and entrepreneurship to meet societal needs.

DM4: Grooming the students with professional and social ethics.

Program Educational Objectives (PEOs)

PEO1: The graduates of Computer Science and Engineering will have a successful career in technology.

PEO2: The graduates of the program will have a solid technical and professional foundation to continue higher studies.

PEO3: The graduates of the program will have the skills to develop products, offer services and innovate.

PEO4: The graduates of the program will have fundamental awareness of industry processes, tools and technologies.

Program Outcomes

PO1 Engineering Knowledge: Apply the knowledge of mathematics, science, engineering fundamentals, and an engineering specialization to the solution of complex engineering problems.

PO2 Problem Analysis: Identify, formulate, review research literature, and analyze complex engineering problems, reaching substantiated conclusions using first principles of mathematics, natural sciences, and engineering sciences.

PO3 Design/Development of Solutions: Design solutions for complex engineering problems and design system components or processes that meet the specified needs with appropriate consideration for public health and safety, and cultural, societal, and environmental considerations.

PO4 Conduct Investigations of Complex Problems: Use research-based knowledge and research methods, including design of experiments, analysis and interpretation of data, and synthesis of the information, to provide valid conclusions.

PO5 Modern Tool Usage: Create, select, and apply appropriate techniques, resources, and modern engineering and IT tools, including prediction and modeling, to complex engineering activities with an understanding of the limitations.

PO6 The Engineer and Society: Apply reasoning informed by the contextual knowledge to assess societal, health, safety, legal and cultural issues and the consequent responsibilities relevant to professional engineering practice.

PO7 Environment and Sustainability: Understand the impact of professional engineering solutions in societal and environmental contexts, and demonstrate the knowledge of, and need for, sustainable development.

PO8 Ethics: Apply ethical principles and commit to professional ethics and responsibilities and norms of engineering practice.

PO9 Individual and Team Work: Function effectively as an individual, and as a member or leader in diverse teams, and in multidisciplinary settings.

PO10 Communication: Communicate effectively on complex engineering activities with the engineering community and with society at large, being able to comprehend and write effective reports and design documentation, make effective presentations, and give and receive clear instructions.

PO11 Project Management and Finance: Demonstrate knowledge and understanding of engineering and management principles and apply these to one's own work, as a member and leader in a team, to manage projects in multidisciplinary environments.

PO12 Life-long Learning: Recognize the need for, and have the preparation and ability to engage in, independent and life-long learning in the broadest context of technological change.

PROGRAM SPECIFIC OUTCOMES

PSO1 Foundation of Mathematical Concepts: To use mathematical methodologies to solve problems using suitable mathematical analysis, data structures and suitable algorithms.

PSO2 Foundation of Computer Science: The ability to interpret the fundamental concepts and methodology of computer systems. Students can understand the functionality of the hardware and software aspects of computer systems.

PSO3 Foundation of Software Development: The ability to grasp the software development lifecycle and methodologies of software systems, and to possess competent skills and knowledge of the software design process.

ABSTRACT

Electricity theft represents a pressing problem that has brought enormous financial losses to electric utility companies worldwide. In the United States alone, $6 billion worth of electricity is stolen annually. Traditionally, electricity theft is committed in the consumption domain via physical attacks that include line tapping or meter tampering. The smart grid paradigm opens the door to new forms of electricity theft attacks. First, electricity theft can be committed in a cyber manner. With the advanced metering infrastructure (AMI), smart meters are installed at the customers’ premises and regularly report the customers’ consumption for monitoring and billing purposes. In this context, malicious customers can launch cyber-attacks on the smart meters to manipulate the readings in a way that reduces their electricity bill. Second, the smart grid paradigm enables customers to install renewable-based distributed generation (DG) units at their premises to generate energy and sell it back to the grid operator, and hence make a profit. Therefore, this project evaluates the performance of various deep learning algorithms, such as the deep feed-forward neural network (DNN) and the recurrent neural network with gated recurrent units (RNN-GRU), for electricity theft cyber-attack detection. Nowadays, in advanced countries, solar panels are used to generate electricity, and such users can sell excess energy to other needy users; they maintain two different meters, which record consumption and production details. While producing, malicious users may tamper with the smart meter to inflate the bill collected for the renewable distributed energy. This attack may cause huge losses to agencies. To detect such attacks, this project employs deep learning models which can detect all possible alterations in order to predict theft.

TABLE OF CONTENTS

ABSTRACT ................................................................................................................................... VI
TABLE OF CONTENTS ........................................................................................................... VII
LIST OF FIGURES ...................................................................................................................... IX
1.INTRODUCTION........................................................................................................................ 1
2.LITERATURE SURVEY ............................................................................................................ 2
3.EXISTING SYSTEM................................................................................................................... 5
3.1 RNN-GRU.............................................................................................................................. 5
3.2 Limitations ............................................................................................................................. 5
4.PROPOSED SYSTEM ................................................................................................................ 7
4.1 Overview ................................................................................................................................ 7
4.2 Data Preprocessing ............................................................................................................. 10
4.3 Dataset Splitting .................................................................................................................. 11
4.4 DNN model .......................................................................................................................... 11
4.5 Advantages........................................................................................................................... 14
5.UML DIAGRAMS ..................................................................................................................... 16
5.1 Goals ..................................................................................................................................... 16
5.1.1 Class Diagram ................................................................................................................ 17
5.1.2 Sequence diagram .......................................................... 17
5.1.3 Activity diagram ............................................................ 18
5.1.4 Collaboration diagram.................................................... 18
5.1.5 Deployment diagram ...................................................... 19
5.1.6 Dataflow diagram........................................................... 20
5.1.7 Component diagram ....................................................... 20
6.SOFTWARE ENVIRONMENT ............................................................................................... 21
6.1 What is Python? .................................................................................................................. 21
6.2 Advantages of Python ......................................................................................................... 21
6.3 Disadvantages of Python .................................................................................................... 23
6.4 History of Python ................................................................................................................ 24
6.5 Python Development Steps ................................................................................................. 25
6.6 Modules Used in Project ..................................................................................................... 26
6.7 How to Install Python on Windows and Mac ................................................................... 28
7.SYSTEM REQUIREMENTS SPECIFICATIONS ................................................................ 34

7.1 Software Requirements ...................................................................................................... 34
7.2 Hardware Requirements .................................................................................................... 34
8.FUNCTIONAL REQUIREMENTS ......................................................................................... 35
8.1 Output Design...................................................................................................................... 35
8.2 Input Design ........................................................................................................................ 35
8.3 Error Avoidance.................................................................................................................. 37
8.4 Data Validation ................................................................................................................... 37
8.5 User Interface Design ......................................................................................................... 37
8.6 Computer-Initiated Interfaces ........................................................................................... 38
8.7 Performance Requirements ............................................................................................... 39
9.SOURCE CODE ........................................................................................................................ 40
10.RESULTS AND DISCUSSION .............................................................................................. 57
10.1 Implementation Description............................................................................................. 57
10.2 Dataset description............................................................................................................ 59
10.3 Results and description..................................................................................................... 59
11.CONCLUSION ........................................................................................................................ 64
11.1 Conclusion ......................................................................................................................... 64
11.2 Future Scope ...................................................................... 64
12.REFERENCES ......................................................................... 65

LIST OF FIGURES

FIGURE NO     FIGURE NAME                                                PAGE NUMBER

FIGURE 3.1    RNN-GRU                                                    5
FIGURE 4.1    Block diagram of proposed system                           9
FIGURE 4.2    Feed-Forward Neural Network                                12
FIGURE 4.3    Hidden layer architecture                                  12
FIGURE 5.1    Class diagram                                              17
FIGURE 5.2    Sequence diagram                                           17
FIGURE 5.3    Activity diagram                                           18
FIGURE 5.4    Collaboration diagram                                      19
FIGURE 5.5    Deployment diagram                                         19
FIGURE 5.6    Dataflow diagram                                           20
FIGURE 5.7    Component diagram                                          20
FIGURE 10.1   User interface of the research                             60
FIGURE 10.2   Sample preprocessed outcome on electricity theft dataset   60
FIGURE 10.3   Proposed DNN performance                                   61
FIGURE 10.4   Proposed DNN ROC curve                                     62
FIGURE 10.5   Existing GRU performance                                   62
FIGURE 10.6   Prediction results from test data                          62

1.INTRODUCTION

Electricity theft is defined as the consumed amount of energy that is not billed to the consumers. This incurs major revenue losses for electric utility companies. All over the world, electric utility companies lose $96 billion every year due to electricity theft. This phenomenon affects all nations, whether rich or poor. For instance, Pakistan suffers 0.89 billion rupees of loss yearly due to non-technical losses (NTLs) [1], and in India, the electricity loss exceeds 4.8 billion rupees annually. Electricity theft is also a threat to countries with strong economies; i.e., in the U.S., the loss due to electricity theft is approximately $6 billion, and in the UK, it is up to £175 million per annum [2]. Moreover, rising electricity prices increase the burden on honest customers when the utility asks them to also pay for the stolen energy. Electricity theft also increases unemployment and the inflation rate, and decreases revenue and energy efficiency, all of which adversely affect a country's economy.

Today, electric power loss has become one of the most conspicuous issues affecting both conventional power grids and smart grids. Statistics show that transmission and distribution losses increased from 11% to 16% between 1980 and 2000. Electricity losses vary from country to country: the losses in the USA, Russia, Brazil, and India were 6%, 10%, 16%, and 18% of their total energy production, respectively [3]. The difference between the energy produced in one system and the metered energy delivered to the users is known as the power loss. In determining the amount of electricity loss, smart meters in smart grids play a prominent role. Advanced energy meters obtain information from the consumers' load devices and measure the consumption of energy at hourly intervals. The energy meter provides additional information to the utility company and the system operator for better monitoring and billing, and provides two-way communication between the utility companies and consumers [4]. It is also possible to limit the maximum amount of electricity consumption, and to terminate as well as re-connect the supply of electricity from any remote place.

2.LITERATURE SURVEY

Hasan et al. [5] implemented a novel data pre-processing algorithm to compute the missing instances in the dataset, based on the local values relative to the missing data point. Furthermore, in this dataset, the count of electricity theft users was relatively low, which could have made the model inefficient at identifying theft users. This class imbalance scenario was addressed through synthetic data generation. Finally, the results obtained indicate that the proposed scheme can classify both the majority class (normal users) and the minority class (electricity theft users) with good accuracy.

Zheng et al. [6] combined two novel data mining techniques to solve the problem. One technique is the maximum information coefficient (MIC), which can find the correlations between the non-technical loss and a certain electricity behavior of the consumer. MIC can be used to precisely detect thefts that appear normal in shape. The other technique is clustering by fast search and find of density peaks (CFSFDP). CFSFDP finds the abnormal users among thousands of load profiles, making it quite suitable for detecting electricity thefts with arbitrary shapes. Next, a framework combining the advantages of the two techniques is proposed. Numerical experiments on the Irish smart meter dataset are conducted to show the good performance of the combined method.

Li et al. [7] presented a novel CNN-RF model to detect electricity theft. In this model, the CNN acts as an automatic feature extractor for smart meter data and the RF is the output classifier. Because a large number of parameters must be optimized, which increases the risk of overfitting, a fully connected layer with a dropout rate of 0.4 is designed during the training phase. In addition, the SMOTE algorithm is adopted to overcome the problem of data imbalance. Some machine learning and deep learning methods such as SVM, RF, GBDT, and LR are applied to the same problem as benchmarks, and all these methods were evaluated on the SEAI and LCL datasets. The results indicate that the proposed CNN-RF model is quite a promising classification method in the electricity theft detection field because of two properties. The first is that features can be automatically extracted by the hybrid model, while the success of most traditional classifiers relies largely on the retrieval of good hand-designed features, which is a laborious and time-consuming task. The second is that the hybrid model combines the advantages of the RF and CNN, as both are among the most popular and successful classifiers in the electricity theft detection field.

Nabil et al. [8] proposed an efficient and privacy-preserving electricity theft detection scheme for the AMI network, referred to as PPETD. The scheme allows system operators to identify electricity thefts, monitor loads, and compute electricity bills efficiently using masked fine-grained meter readings without violating the consumers' privacy. PPETD uses secret sharing to allow the consumers to send masked readings to the system operator such that these readings can be aggregated for the purpose of monitoring and billing. In addition, secure two-party protocols using arithmetic and binary circuits are executed by the system operator and each consumer to evaluate a generalized convolutional neural network model on the reported masked fine-grained power consumption readings for the purpose of electricity theft detection. An extensive analysis of real datasets is performed to evaluate the security and the performance of PPETD.

Khan et al. [9] presented a new model based on supervised machine learning techniques and real electricity consumption data. Initially, the electricity data are pre-processed using interpolation, the three-sigma rule and normalization methods. Since the distribution of labels in the electricity consumption data is imbalanced, an ADASYN algorithm is utilized to address this class imbalance problem. It is used to achieve two objectives. Firstly, it intelligently increases the minority class samples in the data. Secondly, it prevents the model from being biased towards the majority class samples. Afterwards, the balanced data are fed into a Visual Geometry Group (VGG-16) module to detect abnormal patterns in electricity consumption. Finally, a Firefly Algorithm based Extreme Gradient Boosting (FA-XGBoost) technique is exploited for classification. Simulations are conducted to show the performance of the proposed model. Moreover, state-of-the-art methods are also implemented for comparative analysis, i.e., Support Vector Machine (SVM), Convolutional Neural Network (CNN), and Logistic Regression (LR). For validation, precision, recall, F1-score, Matthews Correlation Coefficient (MCC), Receiver Operating Characteristic Area Under Curve (ROC-AUC), and Precision-Recall Area Under Curve (PR-AUC) metrics are used. Firstly, the simulation results show that the proposed ADASYN method improved the performance of the FA-XGBoost classifier, which achieved an F1-score, precision, and recall of 93.7%, 92.6%, and 97%, respectively. Secondly, the VGG-16 module achieved a higher generalized performance by securing accuracies of 87.2% and 83.5% on training and testing data, respectively. Thirdly, the proposed FA-XGBoost correctly identified actual electricity thieves, i.e., a recall of 97%. Moreover, the model is superior to the other state-of-the-art models in terms of handling large time series data and accurate classification. These models can be efficiently applied by utility companies using real electricity consumption data to identify electricity thieves and overcome the major revenue losses in the power sector.

Kocaman et al. [10] developed a model using deep learning methods on real daily electricity consumption data (the electricity consumption dataset of the State Grid Corporation of China). Data reduction was performed by developing a new method to make the dataset more usable and to extract meaningful results. A Long Short-Term Memory (LSTM) based deep learning method was developed to recognize the actual daily electricity consumption data of 2016. To evaluate the performance of the proposed method, the accuracy, precision and recall metrics were used with five-fold cross-validation. The performance of the proposed methods was found to be better than previously reported results.

Li et al. [11] presented a novel approach for automatic detection using a multi-scale densely connected convolutional neural network (multi-scale DenseNet) in order to capture the long-term and short-term periodic features within the sequential data. They compared the proposed approach with classical algorithms, and the experimental results demonstrate that the multi-scale DenseNet approach can significantly improve the accuracy of detection. Moreover, the method is scalable, enabling larger data processing while requiring no handcrafted feature engineering.

Aldegheishem et al. [12] developed two novel electricity theft detection (ETD) models. A hybrid sampling approach, i.e., the synthetic minority oversampling technique with edited nearest neighbor, is introduced in the first model. Furthermore, AlexNet is used for dimensionality reduction and for extracting useful information from electricity consumption data.

3.EXISTING SYSTEM

3.1 RNN-GRU

Recurrent Neural Networks (RNNs) are a type of artificial neural network that is designed
to work with sequential data. RNNs are particularly useful in modeling time series data,
natural language processing, speech recognition, and many other tasks that involve
sequential inputs. One of the variants of RNNs is the Gated Recurrent Unit (GRU)
algorithm.

GRU is a variant of RNN that was introduced in 2014. GRU is a simpler and more efficient
version of the Long Short-Term Memory (LSTM) algorithm, which is another popular
variant of RNN. The GRU algorithm has a gating mechanism that allows it to selectively
update and reset the hidden state of the network at each time step. The gating mechanism
consists of two gates: the reset gate and the update gate. The reset gate determines how
much of the previous hidden state to forget, while the update gate determines how much of
the new information to add to the current hidden state. The GRU algorithm has fewer
parameters than the LSTM algorithm, which makes it faster and easier to train. GRU has
been shown to perform as well as LSTM on a wide range of tasks, including machine
translation, speech recognition, and image captioning.
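
For concreteness, a minimal RNN-GRU classifier of this kind can be sketched in Keras as below. The layer width (64 units), the sequence length (30 readings) and the single consumption feature are illustrative assumptions, not values taken from this report.

# Minimal RNN-GRU sketch (assumed shapes, for illustration only).
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import GRU, Dense

model = Sequential([
    GRU(64, input_shape=(30, 1)),   # reset/update gates maintain the hidden state over the sequence
    Dense(2, activation='softmax')  # output probabilities: normal vs. theft
])
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])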

3.2 Limitations

RNN-GRU models are commonly used for analyzing sequential data like time series or natural language. However, when it comes to detecting and predicting electricity theft cyber-attacks in IoT-based smart electric meters, RNN-GRU models have certain limitations compared to deep neural networks.

⎯ Limited long-term memory: RNN-GRU models struggle with capturing long-term


patterns due to a problem called the vanishing gradient. This means they may not
effectively detect subtle irregularities in electricity consumption over an extended
period, which is crucial for identifying theft activities.
⎯ Fixed input size: RNN-GRU models require fixed-size input sequences. But with
smart electric meters generating data of varying lengths, fitting the data into fixed-
size sequences can be challenging. This may require additional preprocessing steps
or truncating the data, potentially leading to information loss.
⎯ Limited parallelism: RNN-GRU models process data sequentially, which limits
their ability to take advantage of parallel processing. In practical scenarios involving
large datasets or computationally intensive tasks, this can result in slower training
and prediction times compared to deep neural networks that can parallelize
computations.
⎯ Model complexity and interpretability: Deep neural networks offer more flexibility
in terms of model complexity, incorporating multiple layers and various types of
units to capture complex relationships. However, this increased complexity can
make the models harder to interpret, making it challenging to understand the
underlying patterns associated with electricity theft cyber-attacks.

While RNN-GRU models have their limitations, they still have their strengths for certain
tasks. Their ability to capture short-term patterns in electricity consumption data can be
valuable. However, for more advanced detection and prediction tasks, it may be beneficial
to explore other deep neural network architectures, such as attention-based models or
Transformers. These architectures excel in capturing long-term dependencies, handling
variable-length sequences, and can provide better insights into electricity theft activities.

4.PROPOSED SYSTEM

4.1 Overview

Smart electric meters are devices that collect data about electricity usage, such as voltage,
current, power factor, and more. To detect and predict electricity theft or cyber-attacks, a
deep feed-forward neural network can be used. This type of neural network is designed to
process information in one direction, from the input layer to the output layer, without any
feedback connections. It is called "deep" because it has multiple hidden layers, allowing it
to learn complex patterns and representations. To use this neural network for electricity theft
and cyber-attack detection, the first step is to collect the relevant data from smart electric
meters. This data serves as the input for the neural network. Before feeding the data into the
network, preprocessing steps such as normalization, feature scaling, or outlier removal may
be necessary to ensure optimal performance. Next, the architecture of the neural network
needs to be designed. This involves determining the number of hidden layers, the number
of nodes in each layer, and the overall depth of the network. The complexity of the problem
at hand and the available data will guide these design decisions.

The neural network is then trained using a labeled dataset. This dataset should include
instances of normal electricity usage as well as instances where electricity theft or cyber-
attacks occurred. During training, the neural network learns to associate patterns in the input
data with the corresponding labels, enabling it to recognize similar patterns in the future.
The hidden layers of the neural network play a crucial role in feature extraction. They
automatically learn abstract representations of the input data, capturing relevant information
that can help in detecting patterns associated with electricity theft or cyber-attacks. Once
the neural network is trained, it can be used to predict and detect electricity theft or cyber-
attacks in real-time. The data from the smart electric meters is fed into the network, and the
output layer provides a prediction or detection result based on the learned patterns. To
ensure ongoing security, the system continuously monitors the incoming data from smart
electric meters. If the neural network detects any suspicious patterns or anomalies
associated with electricity theft or cyber-attacks, it can trigger an alert for further
investigation. Periodic retraining of the neural network is essential to adapt to evolving

attack techniques. As new data is collected and more instances of electricity theft or cyber-
attacks are detected, the neural network can be updated and improved to enhance its
performance.

Figure 4.1 shows the proposed system model. The detailed operation is illustrated as follows:

Step 1. Dataset Preprocessing:

• Data Collection: Gather historical data from IoT-based smart electric meters,
including electricity consumption patterns, network traffic data, and any available
data related to past cyber attacks or electricity theft incidents.
• Data Integration: Combine data from various sources into a single dataset for
analysis. Ensure that data from different sources are compatible and have a common
time reference.
• Data Cleaning: Identify and handle missing values, outliers, and anomalies in the
dataset. Clean and sanitize the data to remove noise or errors that could affect model
performance.
• Data Transformation: Perform data transformations such as normalization or
standardization to ensure that all features have similar scales. Some data may also
require encoding (e.g., one-hot encoding) for machine learning models.
• Data Splitting: Divide the dataset into training, validation, and test sets. The
training set is used for model training, the validation set for hyperparameter tuning,
and the test set for model evaluation.

Step 2. Deep Neural Network (DNN) Model Building:

• Architecture Selection: Choose an appropriate DNN architecture for the task. In


the context of electricity theft and cyber attack detection, a neural network model,
such as a feedforward neural network or a convolutional neural network (CNN),
may be suitable.
• Model Design: Design the neural network architecture, including the number of
layers, the number of neurons in each layer, and the activation functions. Consider
incorporating techniques like dropout and batch normalization to improve model
generalization.

• Loss Function: Define an appropriate loss function for the binary classification
problem of detecting cyber attacks (1) vs. non-attacks (0).
• Optimization Algorithm: Select an optimization algorithm (e.g., Adam, RMSprop)
to update the neural network's weights during training.
• Training: Train the DNN model on the training dataset using the selected
optimization algorithm and loss function. Monitor training progress and use early
stopping to prevent overfitting.
• Hyperparameter Tuning: Tune hyperparameters, such as learning rate, batch size,
and the number of epochs, on the validation dataset to optimize model performance.
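
As a rough sketch of this training-and-tuning step, early stopping can be wired in as a Keras callback. The data shapes, layer sizes, patience and batch size below are assumptions for illustration, not settings taken from this report.

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.callbacks import EarlyStopping

# Toy stand-in data: 1000 meter records x 20 features (assumed shapes).
X = np.random.rand(1000, 20)
y = np.random.randint(0, 2, size=(1000,))

model = Sequential([
    Dense(32, activation='relu', input_shape=(20,)),
    Dense(1, activation='sigmoid')   # 1 = cyber attack, 0 = non-attack
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Stop once validation loss stops improving; a patience of 5 epochs is an assumed value.
early_stop = EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)
model.fit(X, y, validation_split=0.2, epochs=100, batch_size=32,
          callbacks=[early_stop], verbose=0)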

Step 3. Prediction:

• Model Evaluation: Evaluate the trained DNN model's performance on the test
dataset using appropriate evaluation metrics, such as accuracy, precision, recall, F1-
score, and receiver operating characteristic (ROC) curve.
• Cyber Attack Detection: Use the trained model to make real-time predictions on
incoming data from smart electric meters. The model can classify data points as
either normal or indicative of a cyber attack.
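
A minimal sketch of this evaluation step with scikit-learn is shown below; the variables model, X_test and y_test are assumed to come from the earlier steps, and the two-column softmax output is likewise an assumption.

from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)

# 'model', 'X_test' and 'y_test' are assumed to exist from the steps above.
y_prob = model.predict(X_test)[:, 1]   # probability of the attack class
y_pred = (y_prob >= 0.5).astype(int)   # threshold at 0.5: 1 = attack, 0 = normal

print('Accuracy :', accuracy_score(y_test, y_pred))
print('Precision:', precision_score(y_test, y_pred))
print('Recall   :', recall_score(y_test, y_pred))
print('F1-score :', f1_score(y_test, y_pred))
print('ROC-AUC  :', roc_auc_score(y_test, y_prob))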

Fig. 4.1: Block diagram of proposed system.

4.2 Data Preprocessing

Data pre-processing is the process of preparing raw data and making it suitable for a machine learning model. It is the first and crucial step in creating a machine learning model. When creating a machine learning project, we do not always come across clean and formatted data, and before doing any operation with data, it is mandatory to clean it and put it in a formatted way. For this, we use the data pre-processing task.

Real-world data generally contains noise and missing values, and may be in an unusable format which cannot be directly used for machine learning models. Data pre-processing is a required task for cleaning the data and making it suitable for a machine learning model, which also increases the accuracy and efficiency of the model.

One-Hot Encoding: Categorical variables are one-hot encoded to convert them into a
numerical format suitable for machine learning models. The code uses the
pd.get_dummies() function to create binary columns for each category within categorical
variables. This transformation allows machine learning algorithms to work with categorical
data effectively.

Standardization: The StandardScaler is applied to scale numeric features, ensuring that they have a mean of 0 and a standard deviation of 1. The StandardScaler from scikit-learn is used to standardize specific numeric features. Standardization is a common preprocessing step to bring features to a similar scale, which can improve the performance of some machine learning algorithms. This transformation is important for several reasons (a short code sketch follows this list):

• Equal Scaling: StandardScaler scales each feature to have the same scale. This is
crucial for algorithms that are sensitive to the scale of features, such as gradient-
based optimization algorithms (e.g., in neural networks) and distance-based
algorithms (e.g., k-means clustering).

• Mean Centering: By subtracting the mean from each data point, StandardScaler
centers the data around zero. This can help algorithms converge faster during
training and improve their performance.

• Normalization: Scaling by the standard deviation normalizes the data, ensuring that
features have comparable variances. This can prevent certain features from
dominating others in the modeling process.

• Interpretability: Standardized data is more interpretable because it puts all features


on a common scale, making it easier to compare the relative importance of features.
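
The following minimal sketch illustrates both transformations; the column names and values are hypothetical examples, not fields from the project's dataset.

import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical meter records; columns are illustrative assumptions.
df = pd.DataFrame({
    'meter_type': ['residential', 'commercial', 'residential'],
    'consumption_kwh': [12.5, 48.0, 9.3],
    'voltage': [229.8, 231.2, 228.4],
})

# One-hot encode the categorical column into binary indicator columns.
df = pd.get_dummies(df, columns=['meter_type'])

# Standardize the numeric columns to zero mean and unit variance.
scaler = StandardScaler()
df[['consumption_kwh', 'voltage']] = scaler.fit_transform(df[['consumption_kwh', 'voltage']])
print(df.head())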

4.3 Dataset Splitting

In machine learning data pre-processing, we divide our dataset into a training set and a test set. This is one of the crucial steps of data pre-processing, as by doing this we can enhance the performance of our machine learning model. Suppose we were to train the model on one dataset and test it on data drawn from a completely different source; it would then be difficult for the model to understand the correlations between the datasets. Likewise, if we train our model very well and its training accuracy is very high, but its performance decreases when we provide a new dataset to it, the model has failed to generalize. So we always try to make a machine learning model which performs well with the training set and also with the test dataset.

Training Set: A subset of the dataset used to train the machine learning model, for which we already know the output.

Test Set: A subset of the dataset used to test the machine learning model; using the test set, the model predicts the output.
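
A common way to perform this split is scikit-learn's train_test_split; the 80/20 ratio below is a conventional default and an assumption here, with X (features) and y (labels) assumed to come from the preprocessing step above.

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)  # stratify preserves the class balance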

4.4 DNN model

At its simplest, a neural network with some level of complexity, usually at least two layers, qualifies as a deep neural network (DNN), or deep net for short. Deep nets process data in complex ways by employing sophisticated mathematical modeling. To truly understand deep neural networks, however, it is best to see them as an evolution, since a few things had to be built before deep nets existed. First, machine learning had to be developed. ML is a framework to automate (through algorithms) statistical models, like a linear regression model, to get better at making predictions. A model makes predictions about something with some accuracy. A model that learns, i.e. machine learning, takes all its bad predictions and tweaks the weights inside the model to create a model that makes fewer mistakes. The learning portion of creating models spawned the development of artificial neural networks. ANNs utilize the hidden layer as a place to store and evaluate how significant one of the inputs is to the output.

Deep neural nets, then, capitalize on the ANN component. They say: if that works so well at improving a model, because each node in the hidden layer both makes associations and grades the importance of the input in determining the output, then why not stack more and more of these upon each other and benefit even more from the hidden layers?

Figure 4.3. Hidden layer architecture.

In the context of detecting and predicting electricity theft cyber-attacks in IoT-based smart
electric meters, a deep feedforward neural network can be employed to analyze data
collected from these devices. The objective is to identify abnormal usage patterns or
potential instances of electricity theft, indicating a possible cyber-attack. Below is the
architecture of the neural network:

⎯ Sequential Layer: The sequential layer serves as the foundation of the neural
network model. It allows us to organize and stack other layers in a specific order,
determining how data flows through the network.
⎯ Dense Layers with ReLU Activation Function: The dense layers are fully connected
layers where each neuron is connected to all neurons in the previous and following
layers. ReLU (Rectified Linear Unit) activation function is commonly used to
introduce non-linearity into the network. It helps the network learn complex
relationships between input features and output labels. ReLU is defined as f(x) =
max(0, x), where x represents the input.
⎯ Dense Layer with Softmax Activation Function: The final dense layer in the network
is typically used for classification tasks. The softmax activation function is applied
to this layer, which produces a probability distribution across the different classes
in the problem. In the context of electricity theft cyber-attack detection, this layer
can be utilized to predict the likelihood of an attack or classify instances as normal
or abnormal.
⎯ Adam Optimizer: The Adam optimizer is a popular choice for training neural
networks. It is an enhanced version of stochastic gradient descent (SGD) that adjusts
the learning rate dynamically based on gradient characteristics. The Adam optimizer
facilitates faster convergence during training and often delivers good results.
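
Putting these pieces together, a minimal Keras sketch of such a network might look as follows; the layer widths (128 and 64) and the feature count are illustrative assumptions, not values taken from this report.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

n_features = 20  # assumed number of input features per meter record

# Sequential stack of dense ReLU layers with a softmax output, as described above.
model = Sequential([
    Dense(128, activation='relu', input_shape=(n_features,)),
    Dense(64, activation='relu'),
    Dense(2, activation='softmax')  # probabilities for normal vs. theft/attack
])
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])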

To train a deep feedforward neural network for electricity theft cyber-attack detection and
prediction, we follow these steps:

⎯ Data Preparation: Gather data from IoT-based smart electric meters, including
information such as energy consumption, usage patterns, timestamps, and any other
relevant data. Label the data as normal or potentially indicative of electricity theft.

⎯ Data Preprocessing: Clean the data by handling missing values, normalize the
features, and encode categorical variables if necessary. Split the data into training
and testing sets.
⎯ Model Architecture: Define the architecture of the deep feedforward neural network
with the sequential layer, dense layers with ReLU activation, and the final dense
layer with softmax activation. Determine the number of neurons and layers based
on the complexity of the problem and available computational resources.
⎯ Compile the Model: Configure the model with the Adam optimizer and an appropriate loss function, such as categorical cross-entropy.
⎯ Training: Feed the training data into the model and optimize the network's weights
using the Adam optimizer. Monitor the loss function and evaluation metrics to
assess the model's performance.
⎯ Evaluation: Evaluate the trained model on the testing data to assess its ability to
generalize and perform on unseen instances. Adjust the model architecture or
hyperparameters if necessary.
⎯ Prediction: Once the model is trained and evaluated, it can be used to predict
electricity theft or detect abnormal usage patterns in real-time data from smart
electric meters.
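
Continuing the sketch above, the train-evaluate-predict cycle can be expressed as follows; the epoch count, batch size and the incoming batch X_new are assumptions for illustration, and X_train/X_test, y_train/y_test are assumed to come from the data preparation steps.

from tensorflow.keras.utils import to_categorical

y_train_cat = to_categorical(y_train, num_classes=2)  # one-hot labels for the softmax output
y_test_cat = to_categorical(y_test, num_classes=2)

model.fit(X_train, y_train_cat, epochs=50, batch_size=32,
          validation_split=0.1, verbose=0)
loss, acc = model.evaluate(X_test, y_test_cat, verbose=0)

probs = model.predict(X_new)            # class probabilities for fresh meter readings (X_new is hypothetical)
is_theft = probs.argmax(axis=1) == 1    # class 1 = suspected theft/attack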

4.5 Advantages

The development of an electricity theft cyber attack detection and prediction system for
future IoT-based smart electric meters offers several significant advantages:

• Enhanced Grid Security: One of the primary advantages is the bolstering of grid
security. By proactively detecting and predicting cyber attacks and electricity theft,
the system can prevent unauthorized access and manipulation of electric meters and
associated infrastructure. This safeguarding of the grid's integrity ensures a stable
and reliable power supply for consumers.

• Cost Savings: Detecting and preventing electricity theft has a direct financial impact. Electricity theft is a significant concern for utility providers, as it leads to revenue losses. By identifying and addressing theft incidents promptly, utilities can recover lost revenue and potentially lower electricity rates for law-abiding consumers.

• Improved Billing Accuracy: The system contributes to more accurate billing by monitoring electricity consumption patterns and identifying anomalies associated with theft, so that honest consumers are billed only for what they use. This fairness in billing builds trust with customers and reduces disputes.

• Early Threat Detection: Early detection of cyber attacks on smart electric meters
is crucial for preventing potential widespread disruptions. The system's ability to
recognize suspicious network behavior and anomalous patterns allows utilities to
respond swiftly and implement security measures to mitigate threats, safeguarding
the stability of the electric grid.

• Data-Driven Insights: The system generates valuable data-driven insights. By


analyzing historical data and cyber attack patterns, utilities can gain a deeper
understanding of attack vectors and vulnerabilities. This information informs the
development of more robust security strategies and policies.

5.UML DIAGRAMS

UML stands for Unified Modeling Language. UML is a standardized general-purpose modeling language in the field of object-oriented software engineering. The standard is managed, and was created by, the Object Management Group.

The goal is for UML to become a common language for creating models of object-oriented computer software. In its current form, UML comprises two major components: a meta-model and a notation. In the future, some form of method or process may also be added to, or associated with, UML.

The Unified Modeling Language is a standard language for specifying, visualizing, constructing and documenting the artifacts of a software system, as well as for business modeling and other non-software systems. The UML represents a collection of best engineering practices that have proven successful in the modeling of large and complex systems. The UML is a very important part of developing object-oriented software and the software development process. The UML uses mostly graphical notations to express the design of software projects.

5.1 Goals

The primary goals in the design of the UML are as follows:

• Provide users a ready-to-use, expressive visual modeling language so that they can develop and exchange meaningful models.
• Provide extensibility and specialization mechanisms to extend the core concepts.
• Be independent of particular programming languages and development processes.
• Provide a formal basis for understanding the modeling language.
• Encourage the growth of the OO tools market.
• Support higher-level development concepts such as collaborations, frameworks, patterns and components.
• Integrate best practices.

5.1.1 Class Diagram
In software engineering, a class diagram in the Unified Modeling Language (UML) is a
type of static structure diagram that describes the structure of a system by showing the
system's classes, their attributes, operations (or methods), and the relationships among the
classes. It explains which class contains information.

5.1.2 Sequence diagram


A sequence diagram represents the interaction between different objects in the system. The
important aspect of a sequence diagram is that it is time-ordered. This means that the exact
sequence of the interactions between the objects is represented step by step. Different
objects in the sequence diagram interact with each other by passing "messages".

5.1.3 Activity diagram
The process flows in the system are captured in the activity diagram. Similar to a state diagram, an activity diagram also consists of activities, actions, transitions, initial and final states, and guard conditions.

5.1.4 Collaboration diagram

A collaboration diagram groups together the interactions between different objects. The interactions are listed as numbered interactions that help to trace the sequence of the interactions. The collaboration diagram helps to identify all the possible interactions that each object has with other objects.

5.1.5 Deployment diagram


The deployment diagram visualizes the physical hardware on which the software will be

deployed.

5.1.6 Dataflow diagram
A data flow diagram is a graphical representation of the flow of data in an information system.

5.1.7 Component diagram: describes the organization and wiring of the physical components in a system.

6.SOFTWARE ENVIRONMENT

6.1 What is Python?

Below are some facts about Python.


• Python is currently the most widely used multi-purpose, high-level programming
language.

• Python allows programming in Object-Oriented and Procedural paradigms. Python programs generally are smaller than those written in other programming languages like Java.

• Programmers have to type relatively less, and the indentation requirement of the language makes programs readable all the time.

• The Python language is used by almost all tech-giant companies, such as Google, Amazon, Facebook, Instagram, Dropbox and Uber.

The biggest strength of Python is its huge collection of standard libraries, which can be used for the following:

• Machine Learning

• GUI Applications (like Kivy, Tkinter, PyQt etc.)

• Web frameworks like Django (used by YouTube, Instagram, Dropbox)

• Image processing (like OpenCV, Pillow)

• Web scraping (like Scrapy, BeautifulSoup, Selenium)

• Test frameworks

• Multimedia

6.2 Advantages of Python

Let’s see how Python dominates over other languages.

1. Extensive Libraries

Python ships with an extensive standard library containing code for various purposes like regular expressions, documentation generation, unit testing, web browsers, threading, databases, CGI, email, image manipulation, and more. So, we don't have to write the complete code for that manually.

2. Extensible

As we have seen earlier, Python can be extended to other languages. You can write some of
your code in languages like C++ or C. This comes in handy, especially in projects.

3. Embeddable

Complementary to extensibility, Python is embeddable as well. You can put your Python code in the source code of a different language, like C++. This lets us add scripting capabilities to our code in the other language.

4. Improved Productivity

The language's simplicity and extensive libraries render programmers more productive than languages like Java and C++ do. Also, you need to write less to get more things done.

5. IOT Opportunities

Since Python is widely used on new platforms like the Raspberry Pi, its future in the Internet of Things looks bright. This is a way to connect the language with the real world.

Advantages of Python Over Other Languages

1. Less Coding

Almost all tasks done in Python require less coding than the same tasks in other languages. Python also has awesome standard library support, so you don't have to search for third-party libraries to get your job done. This is the reason many people suggest learning Python to beginners.

2. Affordable

Python is free; therefore, individuals, small companies or big organizations can leverage the freely available resources to build applications. Python is popular and widely used, so it gives you better community support.

The 2019 GitHub annual survey showed that Python has overtaken Java in the most popular programming language category.

3. Python is for Everyone

Python code can run on any machine whether it is Linux, Mac or Windows. Programmers
need to learn different languages for different jobs but with Python, you can professionally
build web apps, perform data analysis and machine learning, automate things, do web
scraping and also build games and powerful visualizations. It is an all-rounder programming
language.

6.3 Disadvantages of Python

So far, we’ve seen why Python is a great choice for your project. But if you choose it, you
should be aware of its consequences as well. Let’s now see the downsides of choosing
Python over another language.

1. Speed Limitations

We have seen that Python code is executed line by line. But since Python is interpreted, it
often results in slow execution. This, however, isn’t a problem unless speed is a focal point
for the project. In other words, unless high speed is a requirement, the benefits offered by
Python are enough to distract us from its speed limitations.

2. Weak in Mobile Computing and Browsers

While it serves as an excellent server-side language, Python is rarely seen on the client side. Besides that, it is rarely used to implement smartphone-based applications. One such application is called Carbonnelle. The reason it is not so famous, despite the existence of Brython, is that it isn't that secure.

3. Design Restrictions

As you know, Python is dynamically-typed. This means that you don’t need to declare the
type of variable while writing the code. It uses duck-typing. But wait, what’s that? Well, it
just means that if it looks like a duck, it must be a duck. While this is easy on the
programmers during coding, it can raise run-time errors.

4. Underdeveloped Database Access Layers

Compared to more widely used technologies like JDBC (Java DataBase Connectivity) and ODBC (Open DataBase Connectivity), Python's database access layers are a bit underdeveloped. Consequently, it is less often applied in huge enterprises.

5. Simple

No, we’re not kidding. Python’s simplicity can indeed be a problem. Take my example. I
don’t do Java, I’m more of a Python person. To me, its syntax is so simple that the verbosity
of Java code seems unnecessary.

This was all about the Advantages and Disadvantages of Python Programming Language.

6.4 History of Python

What do the alphabet and the programming language Python have in common? Right, both start with ABC. If we are talking about ABC in the Python context, it's clear that the programming language ABC is meant. ABC is a general-purpose programming language and programming environment which was developed in Amsterdam, the Netherlands, at the CWI (Centrum Wiskunde & Informatica). In an interview, Guido van Rossum said: "I don't know how well people know ABC's influence on Python. I try to mention ABC's influence because I'm indebted to everything I learned during that project and to the people who worked on it." Later on in the same interview, he continued: "I remembered all my experience and some of my frustration with ABC. I decided to try to design a simple scripting language that possessed some of ABC's better properties, but without its problems. So I started typing. I created a simple virtual machine, a simple parser, and a simple runtime. I made my own version of the various ABC parts that I liked. I created a basic syntax, used indentation for statement grouping instead of curly braces or begin-end blocks, and developed a small number of powerful data types: a hash table (or dictionary, as we call it), a list, strings, and numbers."

6.5 Python Development Steps

Guido van Rossum published the first version of Python code (version 0.9.0) at alt.sources in February 1991. This release already included exception handling, functions, and the core data types of lists, dict, str and others. It was also object-oriented and had a module system. Python version 1.0 was released in January 1994. The major new features included in this release were the functional programming tools lambda, map, filter and reduce, which Guido van Rossum never liked. Six and a half years later, in October 2000, Python 2.0 was introduced. Python 3 is not backwards compatible with Python 2.x. The emphasis in Python 3 had been on the removal of duplicate programming constructs and modules, thus fulfilling or coming close to fulfilling the 13th aphorism of the Zen of Python: "There should be one -- and preferably only one -- obvious way to do it." Some changes in Python 3.0 (illustrated in the snippet below):

• print is now a function.

• Views and iterators instead of lists.

• The rules for ordering comparisons have been simplified. E.g., a heterogeneous list cannot be sorted, because all the elements of a list must be comparable to each other.

• There is only one integer type left, i.e., int; long is int as well.

• The division of two integers returns a float instead of an integer. "//" can be used to get the "old" behaviour.

• Text vs. data instead of Unicode vs. 8-bit.
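A short sketch of these Python 3 behaviours (runnable on any Python 3 interpreter):

print("print is now a function")
print(7 / 2)             # 3.5 -- true division of integers returns a float
print(7 // 2)            # 3   -- "//" keeps the old floor-division behaviour
print(type(2 ** 100))    # <class 'int'> -- a single integer type, no separate long
d = {"a": 1, "b": 2}
print(d.keys())          # a dict_keys view object, not a list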

6.6 Modules Used in Project

TensorFlow

TensorFlow is a free and open-source software library for dataflow and differentiable
programming across a range of tasks. It is a symbolic math library and is also used
for machine learning applications such as neural networks. It is used for both research and
production at Google.

TensorFlow was developed by the Google Brain team for internal Google use. It was
released under the Apache 2.0 open-source license on November 9, 2015.
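As a minimal illustration of TensorFlow's dataflow style (a sketch assuming TensorFlow 2.x with eager execution; not taken from the project code):

import tensorflow as tf

a = tf.constant([[1.0, 2.0]])     # a 1x2 tensor
b = tf.constant([[3.0], [4.0]])   # a 2x1 tensor
print(tf.matmul(a, b))            # tf.Tensor([[11.]], shape=(1, 1), dtype=float32)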

NumPy

NumPy is a general-purpose array-processing package. It provides a high-performance multidimensional array object, and tools for working with these arrays.

It is the fundamental package for scientific computing with Python. It contains various features including these important ones:

• A powerful N-dimensional array object

• Sophisticated (broadcasting) functions

• Tools for integrating C/C++ and Fortran code

• Useful linear algebra, Fourier transform, and random number capabilities

Besides its obvious scientific uses, NumPy can also be used as an efficient multi-dimensional container of generic data. Arbitrary datatypes can be defined using NumPy, which allows NumPy to seamlessly and speedily integrate with a wide variety of databases.
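A brief sketch of these capabilities (a minimal example, not from the project code):

import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])   # an N-dimensional array object
print(a.shape)                          # (2, 3)
print(a * 10)                           # broadcasting applies element-wise
print(a.mean(axis=0))                   # column means: [2.5 3.5 4.5]
print(np.fft.fft([1, 0, 1, 0]))         # Fourier transform capability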

Pandas

Pandas is an open-source Python library providing high-performance data manipulation and analysis tools through its powerful data structures. Before Pandas, Python was mostly used for data munging and preparation; it contributed very little towards data analysis itself. Pandas solved this problem. Using Pandas, we can accomplish five typical steps in the processing and analysis of data, regardless of the origin of the data: load, prepare, manipulate, model, and analyze. Python with Pandas is used in a wide range of fields, including academic and commercial domains such as finance, economics, statistics, and analytics.
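A minimal sketch of the load, prepare, and manipulate steps (the column names below are hypothetical, not the project's dataset):

import pandas as pd

# load
df = pd.DataFrame({"region": [101, 107, 101],
                   "consumption": [120.5, None, 98.0]})
# prepare: fill missing values, as this project's preprocessing also does
df["consumption"] = df["consumption"].fillna(0)
# manipulate: group and aggregate
print(df.groupby("region")["consumption"].mean())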

Matplotlib

Matplotlib is a Python 2D plotting library which produces publication-quality figures in a variety of hardcopy formats and interactive environments across platforms. Matplotlib can be used in Python scripts, the Python and IPython shells, the Jupyter Notebook, web application servers, and four graphical user interface toolkits. Matplotlib tries to make easy things easy and hard things possible. You can generate plots, histograms, power spectra, bar charts, error charts, scatter plots, etc., with just a few lines of code. For examples, see the sample plots and thumbnail gallery.

For simple plotting the pyplot module provides a MATLAB-like interface, particularly when combined with IPython. For the power user, you have full control of line styles, font properties, axes properties, etc., via an object-oriented interface or via a set of functions familiar to MATLAB users.
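A "few lines of code" pyplot sketch of the kind described above (illustrative only):

import matplotlib.pyplot as plt

xs = list(range(10))
plt.plot(xs, [x * x for x in xs], label="x squared")   # a simple line plot
plt.xlabel("x")
plt.ylabel("y")
plt.title("Minimal pyplot example")
plt.legend()
plt.show()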

Scikit-learn

Scikit-learn provides a range of supervised and unsupervised learning algorithms via a consistent interface in Python. It is licensed under a permissive simplified BSD license and is distributed in many Linux distributions, encouraging academic and commercial use.
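The consistent fit/predict interface in a minimal sketch (using a synthetic dataset, not the project's data):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
clf = RandomForestClassifier(n_estimators=50).fit(X_train, y_train)  # same fit() idiom everywhere
print(accuracy_score(y_test, clf.predict(X_test)))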
Python

Python is an interpreted high-level programming language for general-purpose programming. Created by Guido van Rossum and first released in 1991, Python has a design philosophy that emphasizes code readability, notably using significant whitespace.

• Python is Interpreted − Python is processed at runtime by the interpreter. You do not need to compile your program before executing it. This is similar to PERL and PHP.

• Python is Interactive − you can actually sit at a Python prompt and interact with the
interpreter directly to write your programs.

Python also acknowledges that speed of development is important. Readable and terse code is part of this, and so is access to powerful constructs that avoid tedious repetition of code. Maintainability also ties into this: lines of code may be an all but useless metric, but it does say something about how much code you have to scan, read and/or understand to troubleshoot problems or tweak behaviours. This speed of development, the ease with which a programmer of other languages can pick up basic Python skills, and the huge standard library are key to another area where Python excels: its tools are quick to implement, save a lot of time, and several of them can later be patched and updated by people with no Python background, without breaking.

Install Python Step-by-Step in Windows and Mac

Python, a versatile programming language, doesn't come pre-installed on your computer. Python was first released in the year 1991, and to this day it is a very popular high-level programming language. Its design philosophy emphasizes code readability, with its notable use of significant whitespace.

The object-oriented approach and language constructs provided by Python enable programmers to write both clear and logical code for projects. This software does not come pre-packaged with Windows.

6.7 How to Install Python on Windows and Mac

There have been several updates to Python over the years. The question is: how do you install Python? It might be confusing for a beginner who is willing to start learning Python, but this tutorial will solve your query. At the time of writing, the latest version of Python is 3.7.4, i.e., Python 3.

Note: Python 3.7.4 cannot be used on Windows XP or earlier operating systems. So the steps below are to install Python version 3.7.4 (Python 3) on a Windows 7 or later device. Download the Python cheatsheet here. The steps on how to install Python on Windows 10, 8 and 7 are divided into four parts to help understand better.

Download the Correct version into the system

Step 1: Go to the official site to download and install Python using Google Chrome or any other web browser, or click on the following link: https://www.python.org

Now, check for the latest and the correct version for your operating system.

Step 2: Click on the Download Tab.

Step 3: You can either select the yellow "Download Python 3.7.4" button for Windows, or scroll further down and click on the download matching your version. Here, we are downloading the most recent Python version for Windows, 3.7.4.

Step 4: Scroll down the page until you find the Files option.

Step 5: Here you see a different version of python along with the operating system.

• To download Windows 64-bit python, you can select any one from the three options:
Windows x86-64 embeddable zip file, Windows x86-64 executable installer or
Windows x86-64 web-based installer.

Installation of Python

Step 1: Go to Downloads and open the downloaded Python version to carry out the installation process.

Step 2: Before you click on Install Now, make sure to put a tick on Add Python 3.7 to PATH.

Step 3: Click on Install Now. After the installation is successful, click on Close.

With these above three steps on python installation, you have successfully and correctly
installed Python. Now is the time to verify the installation.

Note: The installation process might take a couple of minutes.

Verify the Python Installation

Step 1: Click on Start

Step 2: In the Windows Run Command, type “cmd”.

Step 3: Open the Command prompt option.

Step 4: Let us test whether Python is correctly installed. Type python -V and press Enter.

Step 5: You will get the answer as Python 3.7.4
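The expected prompt interaction looks roughly like this (the user path shown is illustrative):

C:\Users\you> python -V
Python 3.7.4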

Check how the Python IDLE works

Step 1: Click on Start

Step 2: In the Windows Run command, type “python idle”.

Step 3: Click on IDLE (Python 3.7 64-bit) and launch the program

Step 4: To go ahead with working in IDLE you must first save the file. Click on File > Click
on Save

Step 5: Name the file, and the "save as type" should be Python files. Click on SAVE. Here the file is named Hey World.

Step 6: Now, for example, enter print("Hey World") and press Enter.

You will see that the command given is executed. With this, we end our tutorial on how to install Python. You have learned how to download and install Python for Windows on your respective operating system.

Note: Unlike Java, Python does not need semicolons at the end of its statements.

7.SYSTEM REQUIREMENTS SPECIFICATIONS

7.1 Software Requirements

The functional requirements or the overall description documents include the product
perspective and features, operating system and operating environment, graphics
requirements, design constraints and user documentation.

The appropriation of requirements and implementation constraints gives the general overview of the project in regard to what the areas of strength and deficit are and how to tackle them.

• Python IDLE 3.7 version (or)
• Anaconda 3.7 (or)
• Jupyter (or)
• Google Colab

7.2 Hardware Requirements

Minimum hardware requirements are very dependent on the particular software being
developed by a given Enthought Python / Canopy / VS Code user. Applications that need
to store large arrays/objects in memory will require more RAM, whereas applications that
need to perform numerous calculations or tasks more quickly will require a
faster processor.
Operating system : Windows, Linux

Processor : minimum Intel i3

RAM : minimum 4 GB

Hard disk : minimum 250 GB

8.FUNCTIONAL REQUIREMENTS

8.1 Output Design

Outputs from computer systems are required primarily to communicate the results of processing to users. They are also used to provide a permanent copy of the results for later consultation. The various types of outputs in general are:

• External outputs, whose destination is outside the organization.
• Internal outputs, whose destination is within the organization; they are the user's main interface with the computer.
• Operational outputs, whose use is purely within the computer department.
• Interface outputs, which involve the user in communicating directly.

Output Definition

The outputs should be defined in terms of the following points:

• Type of the output


• Content of the output
• Format of the output
• Location of the output
• Frequency of the output
• Volume of the output
• Sequence of the output

It is not always desirable to print or display data as it is held on a computer. It should be decided which form of output is the most suitable.

8.2 Input Design

Input design is a part of overall system design. The main objective during the input design
is as given below:

• To produce a cost-effective method of input.
• To achieve the highest possible level of accuracy.
• To ensure that the input is acceptable and understood by the user.

Input Types

It is necessary to determine the various types of inputs. Inputs can be categorized as follows:

• External inputs, which are prime inputs for the system.
• Internal inputs, which are user communications with the system.
• Operational inputs, which are the computer department's communications to the system.
• Interactive inputs, which are entered during a dialogue.

Input Media

At this stage, a choice has to be made about the input media. To decide on the input media, consideration has to be given to:

• Type of input
• Flexibility of format
• Speed
• Accuracy
• Verification methods
• Rejection rates
• Ease of correction
• Storage and handling requirements
• Security
• Easy to use
• Portability

Keeping in view the above description of the input types and input media, it can be said that most of the inputs are internal and interactive. As input data is directly keyed in by the user, the keyboard can be considered the most suitable input device.

8.3 Error Avoidance

At this stage, care is to be taken to ensure that input data remains accurate from the stage at which it is recorded up to the stage at which the data is accepted by the system. This can be achieved only by means of careful control each time the data is handled.

Error Detection

Even though every effort is made to avoid the occurrence of errors, a small proportion of errors is always likely to occur. These types of errors can be discovered by using validations to check the input data.

8.4 Data Validation

Procedures are designed to detect errors in data at a lower level of detail. Data validations have been included in the system in almost every area where there is a possibility for the user to commit errors. The system will not accept invalid data. Whenever invalid data is keyed in, the system immediately prompts the user, who has to key in the data again; the system accepts the data only if it is correct. Validations have been included where necessary.

The system is designed to be user friendly. In other words, the system has been designed to communicate effectively with the user, and it has been designed with pop-up menus.
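An illustrative sketch of this prompt-until-valid pattern (not the project's actual validation code; the function name is hypothetical):

def read_positive_int(prompt):
    # Keep prompting until the user keys in valid data
    while True:
        raw = input(prompt)
        if raw.isdigit() and int(raw) > 0:
            return int(raw)
        print("Invalid data -- please enter a positive whole number.")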

8.5 User Interface Design

It is essential to consult the system users and discuss their needs while designing the user
interface:

User Interface Systems Can Be Broadly Classified As:

• User-initiated interfaces, where the user is in charge, controlling the progress of the user/computer dialogue.
• Computer-initiated interfaces, where the computer selects the next stage in the interaction.

In the computer-initiated interfaces, the computer guides the progress of the user/computer dialogue. Information is displayed, and based on the user's response the computer takes action or displays further information.

User Initiated Interfaces

User initiated interfaces fall into two approximate classes:

• Command driven interfaces: In this type of interface the user inputs commands or
queries which are interpreted by the computer.
• Forms-oriented interface: The user calls up an image of the form on his/her screen and fills in the form. The forms-oriented interface was chosen for this system because it suits data-entry tasks best.

8.6 Computer-Initiated Interfaces

The following computer-initiated interfaces were used:

• The menu system, where the user is presented with a list of alternatives and chooses one of them.
• Question-answer type dialog system, where the computer asks a question and takes action on the basis of the user's reply.

Right from the start, the system is menu driven; the opening menu displays the available options. Choosing one option gives another popup menu with more options. In this way, every option leads the user to a data entry form where the user can key in the data.

Error Message Design

The design of error messages is an important part of user interface design. As the user is bound to commit some errors while using a system, the system should be designed to be helpful by providing the user with information regarding the error he/she has committed.

This application must be able to produce output at different modules for different inputs.

8.7 Performance Requirements

Performance is measured in terms of the output provided by the application. Requirement specification plays an important part in the analysis of a system. Only when the requirement specifications are properly given is it possible to design a system that will fit into the required environment. It rests largely with the users of the existing system to give the requirement specifications, because they are the people who will finally use the system. The requirements have to be known during the initial stages so that the system can be designed according to those requirements. It is very difficult to change the system once it has been designed; on the other hand, designing a system which does not cater to the requirements of the user is of no use.

The requirement specification for any system can be broadly stated as given below:

• The system should be able to interface with the existing system


• The system should be accurate
• The system should be better than the existing system
• The existing system is completely dependent on the user to perform all the duties.

9.SOURCE CODE

from tkinter import *
import tkinter
from tkinter import filedialog, simpledialog
from tkinter.filedialog import askopenfilename

import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from sklearn import metrics, svm
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score, roc_curve)
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, normalize

from keras.models import Sequential, Model, model_from_json
from keras.layers import (Dense, Dropout, Activation, MaxPooling2D, Flatten,
                          Convolution2D, Bidirectional, GRU)
from keras.utils.np_utils import to_categorical

# Main application window
main = tkinter.Tk()
main.title("Deep Learning Detection of Electricity Theft Cyber-attacks in Renewable Distributed Generation")
main.geometry("1000x650")

# Globals shared between the button callbacks
global filename
global dnn_model
global X, Y
global le
global dataset
global classifier
global cnn_model

# Metric lists appended to by each algorithm in order: DNN, GRU, CNN, CNN+RF
accuracy = []
precision = []
recall = []
fscore = []

def uploadDataset():
    # Ask the user for a CSV file and display its first rows
    global filename
    global dataset
    filename = filedialog.askopenfilename(initialdir="Dataset")
    text.delete('1.0', END)
    text.insert(END, filename + ' Loaded\n')
    dataset = pd.read_csv(filename)
    text.insert(END, str(dataset.head()) + "\n\n")

def preprocessDataset():
    # Fill missing values, encode client_id, drop the date column and
    # split the data into features X and labels Y
    global X, Y
    global le
    global dataset
    le = LabelEncoder()
    text.delete('1.0', END)
    dataset.fillna(0, inplace=True)
    dataset['client_id'] = pd.Series(le.fit_transform(dataset['client_id'].astype(str)))
    dataset['label'] = dataset['label'].astype('uint8')
    print(dataset.info())
    dataset.drop(['creation_date'], axis=1, inplace=True)
    text.insert(END, str(dataset.head()) + "\n\n")
    dataset = dataset.values
    X = dataset[:, 0:dataset.shape[1] - 1]
    Y = dataset[:, dataset.shape[1] - 1]
    Y = Y.astype('uint8')
    # Shuffle records so train/test splits are not ordered by file position
    indices = np.arange(X.shape[0])
    np.random.shuffle(indices)
    X = X[indices]
    Y = Y[indices]
    Y = Y.astype('uint8')
    text.insert(END, "Total records found in dataset to train Deep Learning : " + str(X.shape[0]) + "\n\n")

def rocGraph(testY, predict, algorithm):
    # Plot a no-skill baseline against the model's predictions
    random_probs = [0 for i in range(len(testY))]
    p_fpr, p_tpr, _ = roc_curve(testY, random_probs, pos_label=1)
    plt.plot(p_fpr, p_tpr, linestyle='--', color='orange', label="True classes")
    ns_fpr, ns_tpr, _ = roc_curve(testY, predict, pos_label=1)
    plt.plot(ns_fpr, ns_tpr, linestyle='--', label='Predicted Classes')
    plt.title(algorithm + " ROC Graph")
    plt.xlabel('False Positive Rate')
    plt.ylabel('True Positive Rate')
    plt.show()

def runGRU():
    global X, Y
    Y1 = to_categorical(Y)
    Y1 = Y1.astype('uint8')
    # GRU expects 3-D input: (samples, timesteps, features)
    X1 = np.reshape(X, (X.shape[0], X.shape[1], 1))
    X_train, X_test, y_train, y_test = train_test_split(X1, Y1, test_size=0.2, random_state=0)
    if os.path.exists('model/gru_model.json'):
        # Reload a previously trained model instead of retraining
        with open('model/gru_model.json', "r") as json_file:
            loaded_model_json = json_file.read()
            gru_model = model_from_json(loaded_model_json)
        json_file.close()
        gru_model.load_weights("model/gru_model_weights.h5")
        gru_model._make_predict_function()
    else:
        # Class weights compensate for the imbalance between theft and non-theft records
        counts = np.bincount(Y1[:, 0])
        weight_for_0 = 1.0 / counts[0]
        weight_for_1 = 1.0 / counts[1]
        class_weight = {0: weight_for_0, 1: weight_for_1}
        gru_model = Sequential()  # deep learning sequential object
        # Bidirectional GRU layer with 32 units to select relevant features from the train data
        gru_model.add(Bidirectional(GRU(32, input_shape=(X_train.shape[1], X_train.shape[2]), return_sequences=True)))
        gru_model.add(Dropout(0.2))  # dropout layer to reduce over-fitting
        gru_model.add(Bidirectional(GRU(32)))  # second GRU layer
        gru_model.add(Dropout(0.2))
        gru_model.add(Dense(y_train.shape[1], activation='softmax'))  # output layer for prediction
        gru_model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
        # Train on the train split and validate on the test split
        hist = gru_model.fit(X_train, y_train, epochs=20, batch_size=16,
                             validation_data=(X_test, y_test), class_weight=class_weight)
        # Save model weights for future use
        gru_model.save_weights('model/gru_model_weights.h5')
        model_json = gru_model.to_json()
        with open("model/gru_model.json", "w") as json_file:
            json_file.write(model_json)
        json_file.close()
    y_test = np.argmax(y_test, axis=1)
    predict = gru_model.predict(X_test)
    predict = np.argmax(predict, axis=1)
    p = precision_score(y_test, predict, average='macro') * 100
    r = recall_score(y_test, predict, average='macro') * 100
    f = f1_score(y_test, predict, average='macro') * 100
    a = accuracy_score(y_test, predict) * 100
    accuracy.append(a)
    precision.append(p)
    recall.append(r)
    fscore.append(f)
    text.insert(END, "GRU Precision : " + str(p) + "\n")
    text.insert(END, "GRU Recall : " + str(r) + "\n")
    text.insert(END, "GRU FMeasure : " + str(f) + "\n")
    text.insert(END, "GRU Accuracy : " + str(a) + "\n\n")  # bug fix: report accuracy, not F1
    rocGraph(y_test, predict, "GRU")

def runCNN():
    global X, Y
    global cnn_model
    Y1 = to_categorical(Y)
    Y1 = Y1.astype('uint8')
    # CNN expects 4-D input: (samples, rows, cols, channels)
    X1 = np.reshape(X, (X.shape[0], X.shape[1], 1, 1))
    X_train, X_test, y_train, y_test = train_test_split(X1, Y1, test_size=0.2, random_state=0)
    if os.path.exists('model/cnn_model.json'):
        with open('model/cnn_model.json', "r") as json_file:
            loaded_model_json = json_file.read()
            cnn_model = model_from_json(loaded_model_json)
        json_file.close()
        cnn_model.load_weights("model/cnn_model_weights.h5")
        cnn_model._make_predict_function()
    else:
        counts = np.bincount(Y1[:, 0])
        weight_for_0 = 1.0 / counts[0]
        weight_for_1 = 1.0 / counts[1]
        class_weight = {0: weight_for_0, 1: weight_for_1}
        cnn_model = Sequential()
        cnn_model.add(Convolution2D(32, 1, 1, input_shape=(X_train.shape[1], X_train.shape[2], X_train.shape[3]), activation='relu'))
        cnn_model.add(MaxPooling2D(pool_size=(1, 1)))
        cnn_model.add(Convolution2D(32, 1, 1, activation='relu'))
        cnn_model.add(MaxPooling2D(pool_size=(1, 1)))
        cnn_model.add(Flatten())
        cnn_model.add(Dense(output_dim=256, activation='relu'))
        cnn_model.add(Dense(output_dim=y_train.shape[1], activation='softmax'))
        cnn_model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
        hist = cnn_model.fit(X_train, y_train, batch_size=64, epochs=20, shuffle=True,
                             verbose=2, validation_data=(X_test, y_test), class_weight=class_weight)
        cnn_model.save_weights('model/cnn_model_weights.h5')
        model_json = cnn_model.to_json()
        with open("model/cnn_model.json", "w") as json_file:
            json_file.write(model_json)
        json_file.close()
    y_test = np.argmax(y_test, axis=1)
    predict = cnn_model.predict(X_test)
    predict = np.argmax(predict, axis=1)
    p = precision_score(y_test, predict, average='macro') * 100
    r = recall_score(y_test, predict, average='macro') * 100
    f = f1_score(y_test, predict, average='macro') * 100
    a = accuracy_score(y_test, predict) * 100
    accuracy.append(a)
    precision.append(p)
    recall.append(r)
    fscore.append(f)
    text.insert(END, "CNN Precision : " + str(p) + "\n")
    text.insert(END, "CNN Recall : " + str(r) + "\n")
    text.insert(END, "CNN FMeasure : " + str(f) + "\n")
    text.insert(END, "CNN Accuracy : " + str(a) + "\n\n")  # bug fix: report accuracy, not F1
    rocGraph(y_test, predict, "CNN")

def runDNN():
    text.delete('1.0', END)
    global X, Y
    global dnn_model
    # The DNN runs first, so clear the metric lists before the other models append
    accuracy.clear()
    precision.clear()
    recall.clear()
    fscore.clear()
    Y1 = to_categorical(Y)
    Y1 = Y1.astype('uint8')
    X_train, X_test, y_train, y_test = train_test_split(X, Y1, test_size=0.2, random_state=0)
    if os.path.exists('model/model.json'):
        with open('model/model.json', "r") as json_file:
            loaded_model_json = json_file.read()
            dnn_model = model_from_json(loaded_model_json)
        json_file.close()
        dnn_model.load_weights("model/model_weights.h5")
        dnn_model._make_predict_function()
        print(dnn_model.summary())
    else:
        counts = np.bincount(Y1[:, 0])
        weight_for_0 = 1.0 / counts[0]
        weight_for_1 = 1.0 / counts[1]
        class_weight = {0: weight_for_0, 1: weight_for_1}
        dnn_model = Sequential()  # feed-forward DNN model object
        dnn_model.add(Dense(256, input_dim=X.shape[1], activation='relu',
                            kernel_initializer="uniform"))  # first hidden layer with 256 neurons
        dnn_model.add(Dense(128, activation='relu',
                            kernel_initializer="uniform"))  # second hidden layer with 128 neurons
        dnn_model.add(Dense(y_train.shape[1], activation='softmax',
                            kernel_initializer="uniform"))  # output layer: theft vs normal
        dnn_model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
        print(dnn_model.summary())  # display DNN details
        hist = dnn_model.fit(X_train, y_train, epochs=20, batch_size=64,
                             class_weight=class_weight)  # bug fix: train dnn_model, not cnn_model
        dnn_model.save_weights('model/model_weights.h5')
        model_json = dnn_model.to_json()
        with open("model/model.json", "w") as json_file:
            json_file.write(model_json)
        json_file.close()
    y_test = np.argmax(y_test, axis=1)
    predict = dnn_model.predict(X_test)
    predict = np.argmax(predict, axis=1)
    p = precision_score(y_test, predict, average='macro') * 100
    r = recall_score(y_test, predict, average='macro') * 100
    f = f1_score(y_test, predict, average='macro') * 100
    a = accuracy_score(y_test, predict) * 100
    accuracy.append(a)
    precision.append(p)
    recall.append(r)
    fscore.append(f)
    text.insert(END, "DNN Precision : " + str(p) + "\n")
    text.insert(END, "DNN Recall : " + str(r) + "\n")
    text.insert(END, "DNN FMeasure : " + str(f) + "\n")
    text.insert(END, "DNN Accuracy : " + str(a) + "\n\n")  # bug fix: report accuracy, not F1
    rocGraph(y_test, predict, "DNN")

def runCNNRandomForest():
    global cnn_model
    global classifier
    global X, Y
    print(cnn_model.summary())
    # Use the trained CNN (up to its penultimate layer) as a feature extractor
    X1 = np.reshape(X, (X.shape[0], X.shape[1], 1, 1))
    extract = Model(cnn_model.inputs, cnn_model.layers[-2].output)
    XX = extract.predict(X1)
    print(XX.shape)
    # Train the Random Forest on CNN features and evaluate on a held-out split
    X_train, X_test, y_train, y_test = train_test_split(XX, Y, test_size=0.2, random_state=0)
    rfc = RandomForestClassifier(n_estimators=50)
    rfc.fit(X_train, y_train)  # bug fix: fit on the training split, not on test data
    classifier = rfc
    predict = rfc.predict(X_test)
    p = precision_score(y_test, predict, average='macro') * 100
    r = recall_score(y_test, predict, average='macro') * 100
    f = f1_score(y_test, predict, average='macro') * 100
    a = accuracy_score(y_test, predict) * 100
    accuracy.append(a)
    precision.append(p)
    recall.append(r)
    fscore.append(f)
    text.insert(END, "Extension CNN with Random Forest Precision : " + str(p) + "\n")
    text.insert(END, "Extension CNN with Random Forest Recall : " + str(r) + "\n")
    text.insert(END, "Extension CNN with Random Forest FMeasure : " + str(f) + "\n")
    text.insert(END, "Extension CNN with Random Forest Accuracy : " + str(a) + "\n\n")  # bug fix
    rocGraph(y_test, predict, "Extension CNN + Random Forest")

def predict():
    global classifier
    global cnn_model
    global dnn_model
    text.delete('1.0', END)
    filename = filedialog.askopenfilename(initialdir="Dataset")
    test = pd.read_csv(filename)
    test.fillna(0, inplace=True)
    test = test.values
    data = test
    # Reshape to 4-D, extract CNN features, then classify with the Random Forest
    test = np.reshape(test, (test.shape[0], test.shape[1], 1, 1))
    extract = Model(cnn_model.inputs, cnn_model.layers[-2].output)  # bug fix: the 4-D input matches the CNN, not the DNN
    test = extract.predict(test)
    predict = classifier.predict(test)
    for i in range(len(predict)):
        if predict[i] == 1:
            text.insert(END, str(data[i]) + " ===> record detected as ELECTRICITY THEFT\n\n")
        if predict[i] == 0:
            text.insert(END, str(data[i]) + " ===> record NOT detected as ELECTRICITY THEFT\n\n")

def graph():
    # Build a long-format frame of all metrics and plot a grouped bar chart
    # (pivot uses the positional signature of the pandas version this project targeted)
    df = pd.DataFrame([['DNN', 'Precision', precision[0]], ['DNN', 'Recall', recall[0]],
                       ['DNN', 'F1 Score', fscore[0]], ['DNN', 'Accuracy', accuracy[0]],
                       ['GRU', 'Precision', precision[1]], ['GRU', 'Recall', recall[1]],
                       ['GRU', 'F1 Score', fscore[1]], ['GRU', 'Accuracy', accuracy[1]],
                       ['CNN', 'Precision', precision[2]], ['CNN', 'Recall', recall[2]],
                       ['CNN', 'F1 Score', fscore[2]], ['CNN', 'Accuracy', accuracy[2]],
                       ['Extension CNN+RF', 'Precision', precision[3]], ['Extension CNN+RF', 'Recall', recall[3]],
                       ['Extension CNN+RF', 'F1 Score', fscore[3]], ['Extension CNN+RF', 'Accuracy', accuracy[3]],
                       ], columns=['Parameters', 'Algorithms', 'Value'])
    df.pivot("Parameters", "Algorithms", "Value").plot(kind='bar')
    plt.show()

def close():
    main.destroy()

font = ('times', 16, 'bold')
title = Label(main, text='Deep Learning Detection of Electricity Theft Cyber-attacks in Renewable Distributed Generation', justify=LEFT)
title.config(bg='lavender blush', fg='DarkOrchid1')
title.config(font=font)
title.config(height=3, width=120)
title.place(x=100, y=5)
title.pack()

font1 = ('times', 13, 'bold')

uploadButton = Button(main, text="Upload Electricity Theft Dataset", command=uploadDataset)
uploadButton.place(x=200, y=100)
uploadButton.config(font=font1)

preprocessButton = Button(main, text="Preprocess Dataset", command=preprocessDataset)
preprocessButton.place(x=500, y=100)
preprocessButton.config(font=font1)

cnnButton = Button(main, text="Run Feed Forward Neural Network", command=runDNN)
cnnButton.place(x=200, y=150)
cnnButton.config(font=font1)

cnnrfButton = Button(main, text="Run RNN GRU Algorithm", command=runGRU)
cnnrfButton.place(x=500, y=150)
cnnrfButton.config(font=font1)

cnnsvmButton = Button(main, text="Run Deep Learning CNN Algorithm", command=runCNN)
cnnsvmButton.place(x=200, y=200)
cnnsvmButton.config(font=font1)

'''rfButton = Button(main, text="Run Extension CNN + Random Forest", command=runCNNRandomForest)
rfButton.place(x=500, y=200)
rfButton.config(font=font1)'''

predictButton = Button(main, text="Predict Electricity Theft", command=predict)
predictButton.place(x=200, y=250)
predictButton.config(font=font1)

graphButton = Button(main, text="Comparison Graph", command=graph)
graphButton.place(x=500, y=250)
graphButton.config(font=font1)

exitButton = Button(main, text="Exit", command=close)
exitButton.place(x=800, y=250)
exitButton.config(font=font1)

font1 = ('times', 12, 'bold')
text = Text(main, height=20, width=120)
scroll = Scrollbar(text)
text.configure(yscrollcommand=scroll.set)
text.place(x=10, y=300)
text.config(font=font1)

main.config(bg='light coral')
main.mainloop()

10.RESULTS AND DISCUSSION

10.1 Implementation Description

This Python script creates a graphical user interface (GUI) using the Tkinter library for a deep learning application aimed at detecting electricity theft cyber-attacks in renewable distributed generation.

• Imports:
The code begins by importing necessary libraries/modules like Tkinter, filedialog,
numpy, pandas, and various modules from scikit-learn, Keras, and others. These
libraries are used for GUI creation, file handling, data manipulation, machine
learning, and deep learning operations.
• Creating the Main Window:
main = tkinter.Tk(): This creates the main window of the application.
main.title(): Sets the title of the main window.
main.geometry(): Sets the dimensions (width x height) of the main window.
• Global Variables:
Global variables are declared which will be used throughout the code. These include
variables related to the dataset, models, and evaluation metrics.
• Button Functions:
The code defines several functions which will be executed when corresponding
buttons are clicked. These functions handle tasks like uploading datasets,
preprocessing data, running various models (GRU, CNN, DNN), and displaying
results.
• GUI Elements:
The code creates various GUI elements like buttons, labels, and text boxes using
Tkinter.
• Button Placements and Styling:
Buttons and other GUI elements are placed in specific coordinates within the
window using the place() method. They are also styled with fonts and other
attributes.

• Text Widget for Displaying Output:
A text widget is created to display the output of the operations. It's given a scrollbar
for navigation.
• Main Loop:
main.mainloop(): This starts the main event loop of the application. It keeps the GUI
window open and responsive.
• Button Functions (Detailed Explanation):
uploadDataset(): This function opens a file dialog to select a dataset file. It then
loads and displays the dataset in the text widget.
preprocessDataset(): This function preprocesses the dataset by filling any missing
values with 0, encoding categorical variables, and splitting the data into features (X)
and target labels (Y).
runGRU(): This function trains a GRU (Gated Recurrent Unit) model or loads a pre-
trained one. It computes and displays evaluation metrics like precision, recall, F1-
score, and accuracy. It also displays a ROC graph.
runCNN(): This function trains a Convolutional Neural Network (CNN) model or
loads a pre-trained one. It performs similar tasks as in runGRU().
runDNN(): This function trains a Deep Neural Network (DNN) model or loads a
pre-trained one. It computes and displays evaluation metrics, and also displays a
ROC graph.
runCNNRandomForest(): This function combines a CNN with a Random Forest
classifier for extended functionality. It computes and displays evaluation metrics.
predict(): This function allows the user to select a file for prediction. It loads and
preprocesses the data, extracts features using the pre-trained model, and predicts the
labels. The results are displayed in the text widget.
graph(): This function creates a bar graph comparing different evaluation metrics
for the various algorithms used.
close(): This function closes the main window when the "Exit" button is clicked.
This script creates a GUI for a deep learning application focused on electricity theft detection in renewable distributed generation. It provides buttons for various tasks like dataset handling, model training, prediction, and result display. The user can interact with the application through the GUI.

10.2 Dataset description

The dataset contains the following columns:

• district: This column represents the district where the client is located. It is a
categorical variable with numerical values (e.g., 60, 69, 62, 63).
• client_id: This column is a unique identifier for each client. It is likely a string or
alphanumeric value.
• client_catg: This column represents a category or classification for the client. It is a
categorical variable with numerical values (e.g., 11, 12).
• region: This column indicates the region where the client is located. It is a
categorical variable with numerical values (e.g., 101, 107, 301, 105, 303, 103, 309,
304, 311).
• creation_date: This column contains dates, possibly indicating when the client was
registered or created. It should be treated as a date/time variable.
• label: This column is a binary label indicating whether the client is associated with electricity theft. It takes binary values (0 for no, 1 for yes).
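A hedged sketch of loading and inspecting a dataset with these columns (the file name below is an assumption; in the application itself the file is chosen via a dialog):

import pandas as pd

dataset = pd.read_csv("electricity_theft.csv")   # hypothetical file name
print(dataset[["district", "client_id", "client_catg",
               "region", "creation_date", "label"]].head())
print(dataset["label"].value_counts())           # 0 = no theft, 1 = theft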

10.3 Results and description

Figure 10.1 showcases the user interface (UI) that was integral to the research. It serves as a visual representation of the software used for conducting the experiments and analysis. The UI includes buttons, input fields, data visualization elements, and various interactive components essential for interacting with the research environment. By presenting this UI, the research aims to provide transparency in the research process, giving readers insight into how experiments were conducted and data was analyzed within the given interface.

Figure 10.2 presents a sample of the electricity theft dataset after it has undergone
preprocessing. The numeric values or data displayed in this figure likely represent a subset
of the cleaned and transformed dataset. These values may include features or variables
relevant to the research, which have been processed to a state suitable for analysis. By
showing this sample data, the research offers a glimpse into the characteristics of the dataset
and the initial state from which analyses were performed.

Figure 10.1. User interface of the research.

Figure 10.2. Sample Preprocessed outcome on electricity theft dataset.

Figure 10.3 summarizes the performance of the proposed Deep Neural Network (DNN) model using numeric values. Precision is 95.29%, which indicates the accuracy of the positive predictions made by the DNN model. Recall is 94.37%, which reflects the model's ability to capture the actual positive instances. F1 Score is 94.74%, which is a balanced measure of precision and recall. Accuracy is 94.74%, which represents the overall correctness of the DNN model's predictions.

Figure 10.4 displays the Receiver Operating Characteristic (ROC) curve for the proposed
Deep Neural Network (DNN) model. While not providing numeric values directly, the ROC
curve visualizes how the model performs across various threshold values, illustrating the
trade-off between true positive rate and false positive rate.

Figure 10.5 presents the performance metrics for the existing Gated Recurrent Unit (GRU) model. Precision is 68.86%, which measures the accuracy of the positive predictions made by the GRU model. Recall is 51.58%, which indicates the model's ability to capture actual positive instances. F1 Score is 40.34%, which is a balanced measure of precision and recall. Accuracy is 40.34%, which represents the overall correctness of the GRU model's predictions.

Similar to Figure 10.4, a corresponding Receiver Operating Characteristic (ROC) curve for the existing Gated Recurrent Unit (GRU) model provides a visual representation of the model's performance in distinguishing between positive and negative instances.

Figure 10.6 exhibits the results of predictions made by the models on a test dataset. It includes a comparison of predicted outcomes against actual records, visually illustrating how well the models perform in practical scenarios. This visual representation allows readers to assess the real-world applicability of the models.

Figure 10.3. Proposed DNN performance.

Figure 10.4. Proposed DNN-RoC Curve.

Figure 10.5. Existing GRU performance.

Figure 10.6. Prediction results from test data.


Table 1 serves as a concise summary of the performance of two different models, the "Proposed DNN" (Deep Neural Network) and the "Existing GRU" (Gated Recurrent Unit), on the electricity dataset. The table is designed to help readers quickly understand how these models perform in terms of key metrics.

• Precision (%): Precision is a metric that measures the accuracy of positive
predictions made by a model. In the context of this table, "Precision (%)" represents
the percentage of positive predictions made by each model that were actually
correct. A higher precision indicates that the model makes fewer false positive
errors.

• Recall (%): Recall, also known as sensitivity or true positive rate, measures the
model's ability to capture actual positive instances. It represents the percentage of
actual positive instances that the model correctly identifies. A higher recall value
indicates that the model captures more of the true positive cases.

• F1 Score (%): The F1 score is a balanced metric that combines both precision and
recall into a single value. It is the harmonic mean of precision and recall and
provides an overall assessment of the model's performance. A higher F1 score
suggests that the model achieves a good balance between precision and recall.

• Accuracy (%): Accuracy represents the overall correctness of the model's predictions, including both true positives and true negatives. It is the percentage of all predictions (both positive and negative) that were correct. However, accuracy can be misleading in imbalanced datasets where one class is significantly more prevalent than the other.
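A minimal sketch computing all four metrics with scikit-learn (the labels below are made-up examples, not project results):

from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # actual labels (1 = theft)
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]   # predicted labels

print("Precision :", precision_score(y_true, y_pred) * 100)  # TP / (TP + FP)
print("Recall    :", recall_score(y_true, y_pred) * 100)     # TP / (TP + FN)
print("F1 Score  :", f1_score(y_true, y_pred) * 100)         # harmonic mean of P and R
print("Accuracy  :", accuracy_score(y_true, y_pred) * 100)   # (TP + TN) / total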

11.CONCLUSION

11.1 Conclusion

Global energy crises are increasing every moment. Attention everywhere is on producing more and more energy and on saving it. Electricity can be produced in many ways and is then synchronized onto a main grid for usage. Losses are either technical or non-technical. Technical losses can be calculated easily, as discussed in the section on mathematical modeling, whereas non-technical losses can be evaluated only if the technical losses are known. Electricity theft produces non-technical losses, so reducing or controlling theft saves economic resources. The smart meter can be the best option to minimize electricity theft because of its high security, good efficiency, and excellent resistance to many of the theft techniques used against electromechanical meters. This project has therefore concentrated on theft issues and evaluated the performance of various deep learning algorithms, such as the deep feed-forward neural network (DNN) and the recurrent neural network with gated recurrent unit (RNN-GRU), for electricity theft cyber-attack detection.

11.2 Future Scope

Looking towards the future, there are several exciting avenues for further research and
development in this field. Firstly, refining and optimizing the deep learning algorithms used
for cyber-attack detection in smart meters can lead to even more accurate and responsive
systems. Additionally, the integration of real-time data analytics and anomaly detection
techniques can enhance the ability to identify and respond to potential theft or cyber-attacks
promptly.

Furthermore, the adoption of a broader range of sensors and data sources within the smart
grid infrastructure can provide a more comprehensive view of the grid's health and security.
This may include incorporating data from IoT devices, weather sensors, and predictive
analytics to preemptively identify vulnerabilities and improve resilience against cyber
threats.

12.REFERENCES

[1] A. Das and A. McFarlane, "Non-linear dynamics of electric power losses, electricity consumption, and GDP in Jamaica," Energy Economics, vol. 84, 104530, 2019.
[2] S. Bashkari, A. Sami and M. Rastegar, "Outage Cause Detection in Power Distribution Systems Based on Data Mining," IEEE Transactions on Industrial Informatics, 2020.
[3] The World Bank, Electric Power Transmission and Distribution Losses (% of Output), IEA, Paris, France, 2016.
[4] M. N. Hasan, R. N. Toma, A. A. Nahid, M. M. Islam and J. M. Kim, "Electricity theft detection in smart grid systems: A CNN-LSTM based approach," Energies, vol. 12, no. 17, p. 3310, 2019.
[5] K. Zheng, Q. Chen, Y. Wang, C. Kang and Q. Xia, "A Novel Combined Data-Driven Approach for Electricity Theft Detection," IEEE Transactions on Industrial Informatics, vol. 15, no. 3, pp. 1809-1819, March 2019, doi: 10.1109/TII.2018.2873814.
[6] M. Nabil, M. Ismail, M. M. E. A. Mahmoud, W. Alasmary and E. Serpedin, "PPETD: Privacy-Preserving Electricity Theft Detection Scheme with Load Monitoring and Billing for AMI Networks," IEEE Access, vol. 7, pp. 96334-96348, 2019, doi: 10.1109/ACCESS.2019.2925322.
[7] Z. A. Khan, M. Adil, N. Javaid, M. N. Saqib, M. Shafiq and J. G. Choi, "Electricity theft detection using supervised learning techniques on smart meter data," Sustainability, vol. 12, no. 19, p. 8023, 2020.
[8] B. Kocaman and V. Tümen, "Detection of electricity theft using data processing and LSTM method in distribution systems," Sādhanā, vol. 45, 286, 2020. https://doi.org/10.1007/s12046-020-01512-0
[9] B. Li, K. Xu, X. Cui, Y. Wang, X. Ai and Y. Wang, "Multi-scale DenseNet-Based Electricity Theft Detection," in Intelligent Computing Theories and Application, D.-S. Huang, V. Bevilacqua, P. Premaratne and P. Gupta, Eds., 2018.

Project Details

Academic Year: 2023-2024
Title of the Project: Electricity Theft Cyber Attack Detection and Prediction for Future IOT-Based Smart Electric Meters
Name of the Students and Hall Ticket No: R.VAMSHANK VARDHAN (20RA1A0599), M.VAMSHI (20RA1A05A1), K. SANJAY (20RA1A0574)
Name of the Guide: Mr.P.Kamaraja Pandian

Project PO Mapping

Course(s) Applied (Course Codes) | Course Outcome No. | Description of the Application | Page No. | PO Attained

1. Python Programming, Software Engineering (C413, C313) | C313.1, C413.1 | Students described the basis for their problem statement. | 01 | PO2
2. Machine Learning, Python Programming, Data Mining (C413, C411) | C413.2, C413.3, C411.2 | Students explained predicting and clustering the data from the IoT electric meters with Python programming. | 02-04 | PO1
3. Python Programming, Software Engineering (C313, C413) | C313.3, C413.2 | Students identified the existing system and its drawbacks and proposed a solution to it. | 05-15 | PO2, PO3
4. Design Patterns, Software Engineering, DBMS (C313, C322) | C313.2, C322.3 | Students explained the flow of the project using UML diagrams designed in StarUML and an ER diagram. | 16-20 | PO3, PO5, PO9, PSO3
5. Python Programming, Data Mining (C413, C411) | C413.2, C411.2 | Students explained the Python programming language and designed the modules for the solution of the problem. | 21-30 | PO2, PO3, PO4
6. Software Engineering (C313) | C313.1 | Students identified the hardware and software required for the project. | 34 | PO5
7. Python Programming (C327, C413) | C327.3, C413.2, C413.3 | Students developed code for the problem statement. | 40-58 | PO3, PO4, PO5
8. Software Engineering, Software Testing Methodologies (C313, C324) | C313.2, C324.3 | Students performed various tests on the code they developed. | 57-63 | PO9, PO11, PO2
9. Future Scope | - | Students explained how they would like to further the project and develop it as future scope. | 64 | PO12, PSO2
10. Bibliography | - | Listed the references from which the literature was collected. | 65 | PO8, PO12
11. English | - | Prepared the thesis and intermediate progress reports, explained them to the panel, and continuously interacted with the guide about progress. | - | PO9, PO10
Signature of Internal guide: Signature of student:
