1 Introduction
Power grid enterprises have long worked to investigate and punish electricity theft.
As technology develops, electricity theft has become increasingly high-tech and covert,
which makes it inefficient and inaccurate for power grid enterprises to identify the
customers who steal electricity [1, 2]. Traditional identification methods such as manual
analysis can no longer meet current anti-theft requirements. Modern techniques are
needed to screen the factors associated with electricity theft and to help power grid
enterprises identify it accurately [2, 3].
Because various modern techniques are now widely applied to electricity theft, theft is
becoming better concealed, and the factors that reflect it are becoming more and more
complicated [4–7]. As a result, the demands on the power consumption inspector have
become stricter. Some thieves even use modern equipment such as interference units to
mislead the inspector, which makes it more difficult for the inspector to identify thieves
from the various factors [8].
In 2015, the State Grid proposed a set of models for the on-line monitoring and
intelligent diagnosis of power metering. The electricity theft model contained in these
models could help the power consumption inspector to identify the electricity theft
users.
However, this model is based on rule matching, and the rules depend heavily on
subjective experience. As new methods of electricity theft emerge, it is difficult for this
approach to capture the characteristics of diverse theft behavior accurately. The other
existing methods for identifying electricity theft share the same limitation and cannot
meet the current requirements for recognizing electricity theft users and behavior.
Therefore, it is necessary to optimize the model based on data analysis, in order to
find the characteristic factors of electricity theft quickly and effectively and to predict
accurately the probability that a customer steals electricity. The optimization model
proposed in this paper can reduce the economic losses caused by electricity theft,
improve the accuracy of power consumption inspection, and greatly reduce the workload
of inspectors.
The Chi-square test is a commonly used hypothesis testing method. Its most common use
is to investigate whether the distribution of an unordered categorical variable is
consistent between two or more groups. It can also be used to test the association
between two or more samples and categorical variables. The Chi-square test does not
depend on the distribution of the population, applies widely, and is easy to carry out in
practice [9]. Based on the Chi-square test, this paper identifies the main factors that
indicate whether a customer has been stealing electricity, and thus builds a topic model
of electricity theft.
A regression model is a mathematical model that describes a statistical relationship
quantitatively. Regression analysis studies how a dependent variable depends on
independent variables. Based on a set of sample data, regression analysis determines the
mathematical relationship between variables, tests the reliability of that relationship
statistically, and finally identifies the significant variables among those that affect a
specific variable. With the relationship obtained, the value of a specific variable can be
predicted or controlled according to the values of one or several other variables, and the
accuracy of the prediction or control is also given [10]. Logistic regression is often used
for classification [11]. The predicted result of logistic regression is a probability between
0 and 1, which is easy to use and explain. In practical applications, logistic regression
plays an important role in classification problems such as predicting the probability of a
disease, predicting the probability of a commodity purchase, or judging the sex of a user
[12–14]. However, this analysis method is rarely used in the field of anti-electricity theft.
This paper screens out the factors that are
Optimization Method of Suspected Electricity Theft Topic Model 391
significantly associated with electricity theft and optimizes the factor weights with
logistic regression, in order to obtain a prediction model with high accuracy, precision,
and recall.
The Chi-square test can be used to test the association between two or more samples
and categorical variables. The factors that affect electricity theft and the result (theft or
no theft) can be treated as categorical variables. In this paper, the correlation between
each factor and the result is obtained by the Chi-square test, so as to eliminate irrelevant
factors.
$c_1$ and $c_2$ are the numbers of electricity theft user samples in which factor $q_i$
occurred and did not occur, respectively. $m_1$ and $m_2$ are the numbers of normal user
samples in which factor $q_i$ occurred and did not occur, respectively.
$$c = c_1 + c_2, \quad m = m_1 + m_2$$
First, suppose that whether $q_i$ occurs is independent of whether the user has been
stealing electricity. Select a sample from the user data at random. The probability that
this sample belongs to an electricity theft user is $\mu = \frac{c}{c + m}$.
Then, according to the independence hypothesis, a new fourfold table of theoretical
values is generated as shown in Table 2.
If the two variables are indeed independent, the difference between the theoretical
values and the actual values in the fourfold table is very small.
The formula of Chi-square is
$$\chi^2 = \sum \frac{(A - T)^2}{T} \tag{1}$$
$A$ is the actual value, given in Table 1. $T$ is the theoretical value, given in
Table 2.
After calculating the value of $\chi^2$, determine whether the independence hypothesis
is reliable by looking up the critical value table of the Chi-square distribution. The
fourfold table has 1 degree of freedom (DF). Part of the critical value table of the
Chi-square distribution for this case is shown in Table 3.
By querying the whole table, the probability that each factor is interrelated with
electricity theft can be obtained. Let the threshold be $\varepsilon$; when the correlation
probability $P > \varepsilon$, the factor is identified as an interrelated factor. In this
way, a set of interrelated factors $x_i$, $i \in (1, 2, \ldots, n)$, is screened out.
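The screening step above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the four counts are assumed to be arranged as in the fourfold table described in the text, and the correlation probability $P$ is taken to be one minus the tail probability of the Chi-square distribution with 1 degree of freedom, computed in closed form instead of being read from a critical value table.

```python
import math


def fourfold_chi_square(c1, c2, m1, m2):
    """Chi-square statistic of Eq. (1) for one factor.

    c1, c2: theft-user samples in which the factor occurred / did not occur
    m1, m2: normal-user samples in which the factor occurred / did not occur
    Assumes every theoretical value is non-zero.
    """
    c, m = c1 + c2, m1 + m2
    mu = c / (c + m)                     # probability that a random sample is a theft user
    occurred, absent = c1 + m1, c2 + m2  # totals over both user groups
    actual = [c1, c2, m1, m2]
    # Theoretical values under the independence hypothesis
    theoretical = [mu * occurred, mu * absent,
                   (1 - mu) * occurred, (1 - mu) * absent]
    return sum((a - t) ** 2 / t for a, t in zip(actual, theoretical))


def correlation_probability(chi2):
    """Assumed definition: 1 minus the tail probability for DF = 1."""
    # For one degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    return 1.0 - math.erfc(math.sqrt(chi2 / 2.0))


def is_interrelated(c1, c2, m1, m2, epsilon=0.95):
    """Keep the factor when its correlation probability exceeds the threshold."""
    return correlation_probability(fourfold_chi_square(c1, c2, m1, m2)) > epsilon
```

The default threshold of 0.95 is a placeholder chosen for the sketch, not a value taken from the paper.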
If $\rho_{x_i, x_j} > \varphi$, the factor of the pair that is less associated with the
result is eliminated. Let $y_i$, $i \in (1, 2, \ldots, k)$, be the set of factors that
remains after the highly interrelated factors have been eliminated; then
$y_i \subseteq x_i$.
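A minimal sketch of this pairwise elimination follows. The definition of the coefficient $\rho$ is not given in this part of the text, so the Pearson correlation coefficient is assumed here; the function names, the dictionary layout, and the use of a per-factor relevance score (for example the Chi-square value) to decide which factor of a pair is "less associated with the result" are likewise assumptions made for illustration.

```python
import math


def pearson(u, v):
    """Pearson correlation coefficient of two equally long value lists."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    if var_u == 0 or var_v == 0:
        return 0.0  # a constant column carries no correlation information
    return cov / math.sqrt(var_u * var_v)


def drop_redundant(columns, relevance, phi):
    """Return the factor names that survive the elimination.

    columns:   dict mapping factor name -> list of observed values
    relevance: dict mapping factor name -> association with the result
    phi:       correlation threshold
    """
    # Visit the most relevant factors first, so that in any highly
    # correlated pair the less relevant factor is the one dropped.
    ordered = sorted(columns, key=lambda name: relevance[name], reverse=True)
    kept = []
    for name in ordered:
        if all(pearson(columns[name], columns[other]) <= phi for other in kept):
            kept.append(name)
    return kept
```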
Test the dependence between the unordered categorical factors with the Apriori
algorithm, that is, the nonlinear dependence between any two factors in $y_i$,
$i \in (1, 2, \ldots, k)$.
Calculate the support degree $\sigma_{y_i}$ of each factor in $y_i$,
$i \in (1, 2, \ldots, k)$, based on the electricity theft samples.
Define the threshold of support degree as $\eta$.
If $\sigma_{y_i} \le \eta$, eliminate the $i$-th factor from $y_i$,
$i \in (1, 2, \ldots, k)$, to generate a new set $z_i$, $i \in (1, 2, \ldots, l)$, with
$z_i \subseteq y_i$.
Form the pair $(z_i, z_j)$ by selecting any two factors from $z_i$,
$i \in (1, 2, \ldots, l)$, and calculate the support degree $\sigma_{z_i z_j}$ of each
such pair.
If $\sigma_{z_i z_j} \ge \eta$, factor $z_i$ and factor $z_j$ are judged to be correlated.
Eliminate from $y_i$ the factor of the pair that has the higher entropy after
discretization. Finally, this generates a new set $u_i$, $i \in (1, 2, \ldots, r)$, with
$u_i \subseteq z_i$.
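The two support checks can be illustrated as follows. This is a sketch and not a full Apriori implementation: each electricity theft sample is assumed to be represented as the set of (discretized) factor names that occurred in it, the function names are hypothetical, and the final entropy-based choice of which factor of a correlated pair to drop is left out.

```python
from itertools import combinations


def support(theft_samples, factors):
    """Fraction of theft samples in which all the given factors occur."""
    hits = sum(1 for sample in theft_samples
               if all(f in sample for f in factors))
    return hits / len(theft_samples)


def frequent_factors(theft_samples, candidates, eta):
    """Drop every factor whose support is less than or equal to eta."""
    return [f for f in candidates if support(theft_samples, [f]) > eta]


def correlated_pairs(theft_samples, factors, eta):
    """Pairs of factors whose joint support reaches the threshold eta."""
    return [(a, b) for a, b in combinations(factors, 2)
            if support(theft_samples, [a, b]) >= eta]
```

For example, with the four samples `{"a", "b"}`, `{"a", "b"}`, `{"a", "c"}`, `{"b"}` and a threshold of 0.5, factor `c` is dropped for low support and the pair `("a", "b")` is flagged as correlated.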
Using all the interrelated factors, we construct the loss function with the logistic
regression algorithm, calculate the optimal weight of each factor by the gradient descent
method, and obtain the final prediction function.
$$g(z) = \frac{1}{1 + e^{-z}} \tag{4}$$
$$h_\theta(x) = g(\theta^T x) = \frac{1}{1 + e^{-(\theta_0 + \theta_1 x_1 + \ldots + \theta_k x_k)}}, \quad k \le n \tag{5}$$
394 J. Dou and Y. Aliaosha
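[Figure: curve of the sigmoid function $g(z)$ for $z$ from -6 to 6, with $g(0) = 0.5$]
Here $\theta^T x = \sum_{i=0}^{k} \theta_i x_i$ with $x_0 = 1$, and $\theta_i$ represents
the weight of the interrelated factor $x_i$, $i \in (0, 1, \ldots, k)$.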
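Eqs. (4) and (5) translate directly into code. The sketch below uses hypothetical function names and assumes that the weight list starts with the intercept $\theta_0$, followed by one weight per factor; the sigmoid is written in a branch form that avoids overflow for large negative inputs.

```python
import math


def sigmoid(z):
    """Eq. (4), evaluated without overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_probability(theta, x):
    """Eq. (5): theta = [theta_0, ..., theta_k], x = [x_1, ..., x_k]."""
    z = theta[0] + sum(t * xi for t, xi in zip(theta[1:], x))
    return sigmoid(z)
```

With all weights set to 0.5 and factor values `[1, -2]`, the linear term is 0 and the predicted probability is exactly 0.5, the midpoint of the curve.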
$J(\theta)$ is a convex function in this situation. The concrete steps for updating the
weights by the gradient descent method are as follows.
Initialize: $\theta_0, \theta_1, \ldots, \theta_k$, threshold $\eta$ and learning rate
$\alpha$.
(1) Determine the gradient of the loss function at the current position. The gradient
with respect to $\theta_i$ is $\frac{\partial}{\partial \theta_i} J(\theta)$.
(2) Multiply the gradient of the loss function by the learning rate to get the distance
to move from the current position, called the step:
$\alpha \frac{\partial}{\partial \theta_i} J(\theta)$.
(3) Check whether the gradient descent distance of every $\theta$ value is less than
$\eta$. If so, the algorithm terminates. Otherwise, go to step (4).
(4) Update the $\theta$ values according to the following formula, then return to step (1):
$$\theta_i' = \theta_i - \alpha \frac{\partial}{\partial \theta_i} J(\theta) \tag{10}$$
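Steps (1) to (4) can be sketched as a short training loop. The explicit form of $J(\theta)$ does not appear in this part of the text, so the sketch assumes the usual mean cross-entropy loss of logistic regression, whose gradient with respect to $\theta_j$ is the sample average of $(h_\theta(x) - y)\,x_j$. The function names, the default parameter values and the `max_iter` safeguard are assumptions added for illustration.

```python
import math


def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_probability(theta, x):
    return sigmoid(theta[0] + sum(t * xi for t, xi in zip(theta[1:], x)))


def fit(samples, labels, alpha=0.1, eta=1e-3, theta_init=0.5, max_iter=10000):
    """Gradient descent for logistic regression weights.

    samples: list of factor vectors [x_1, ..., x_k]
    labels:  1 for electricity theft users, 0 for normal users
    """
    theta = [theta_init] * (len(samples[0]) + 1)
    rows = [[1.0] + list(x) for x in samples]  # prepend x_0 = 1 for theta_0
    for _ in range(max_iter):
        preds = [sigmoid(sum(t * v for t, v in zip(theta, row))) for row in rows]
        # Step (1): gradient of the assumed mean cross-entropy loss
        grad = [sum((p - y) * row[j] for p, y, row in zip(preds, labels, rows))
                / len(rows) for j in range(len(theta))]
        # Step (2): step = learning rate * gradient
        steps = [alpha * g for g in grad]
        # Step (3): terminate when every step is smaller than eta
        if all(abs(s) < eta for s in steps):
            break
        # Step (4): update rule of Eq. (10)
        theta = [t - s for t, s in zip(theta, steps)]
    return theta
```

A call such as `fit([[0.0], [1.0], [2.0], [3.0]], [0, 0, 1, 1])` returns an intercept and one weight; the iteration cap only guards against data on which the step size never falls below `eta`.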
If the predicted value lies in $(\xi, 1]$, the user is predicted to be an electricity
theft user. If the predicted value lies in $(0, \xi]$, the user is predicted to be a
normal user.
We then substituted all interrelated factors and interrelated combined factors into the
logistic regression model and followed the steps in Sect. 4 for the calculations. We
initialized all $\theta$ values to 0.5, set $\eta = 0.1$, and ran the procedure with
learning rates $\alpha = 0.1$ and $\alpha = 0.5$ respectively. Each time the $\theta$
values were updated in an iteration, the prediction function was updated and the samples
were substituted into it to make predictions. The error rate on electricity theft users
was calculated as the prediction result.
$$\sigma = \sqrt{(h_\theta(x) - 1)^2} \tag{12}$$
[Figure: error rate versus iteration for $\alpha = 0.5$ and $\alpha = 0.1$]
[Figure: precision versus iteration for $\alpha = 0.5$ and $\alpha = 0.1$]
The result of each iteration was substituted into the model to calculate the prediction
precision on the samples. The precision curve is shown in Fig. 3.
The solution did not change after 152 iterations. The final weights of the model
factors are shown in Table 5.
Finally, we substituted the samples into the prediction function. The average
accuracy was 94.1%, the average precision was 96.6%, and the average recall was
95.8%.
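For reference, the three reported measures can be computed from predicted probabilities as sketched below. The standard definitions of accuracy, precision and recall are used, with electricity theft users as the positive class; the function names and the default decision threshold of 0.5 are assumptions, since the value of $\xi$ used in the experiment is not stated here.

```python
def classify(probabilities, xi=0.5):
    """Label 1 (theft user) when the predicted value lies in (xi, 1], else 0."""
    return [1 if p > xi else 0 for p in probabilities]


def evaluate(predicted, actual):
    """Return (accuracy, precision, recall) with theft users as positives."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p == 1 and a == 1)
    tn = sum(1 for p, a in pairs if p == 0 and a == 0)
    fp = sum(1 for p, a in pairs if p == 1 and a == 0)
    fn = sum(1 for p, a in pairs if p == 0 and a == 1)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0  # share of flagged users who are thieves
    recall = tp / (tp + fn) if tp + fn else 0.0     # share of thieves who are flagged
    return accuracy, precision, recall
```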
6 Conclusion
In view of the difficulties of power grid enterprises in inspecting electricity theft, this
paper proposes an optimization method of suspected electricity theft topic model based
on Chi-square test and logistic regression.
Based on an analysis of the factors that affect suspected electricity theft, the
Chi-square test is used to find the factors and combined factors that are highly
correlated with whether electricity theft occurred. The interrelated factors are screened
out and the irrelevant factors are eliminated. The logistic regression algorithm is then
used to optimize the weights of the interrelated factors iteratively and to obtain the
final prediction function, which predicts from the values of the interrelated factors
whether a user has been stealing electricity.
To verify the optimization model presented in this paper, we chose a number of
electricity theft users and normal users, together with their electricity utilization data,
from a provincial power grid enterprise as the initial data samples. First, we screened out
the interrelated factors. Then, we constructed the logistic regression function to optimize
the weight of each factor and obtained the prediction function. Finally, we substituted
the experimental samples into the prediction function; the prediction results performed
well in accuracy, precision, and recall.
The experimental results show that the Chi-square test and logistic regression are well
suited to selecting the factors interrelated with electricity theft and to predicting
whether users have been stealing electricity. This method can support power grid
enterprises in their anti-theft work, improve the accuracy of power consumption
inspection, promote the stable development of power grid enterprises, and help maintain
orderly electricity use in society.
References
1. Wang, J., Meng, Y., Yin, S., Zhang, Y.: The present situation and development trend of anti
electric stolen function of power demand information acquisition system. Power Syst.
Technol. 12(S2), 177–178 (2008)
2. Cheng, C., Zhang, H., Jing, Z., Chen, M., Jiao, L., Yang, L.: Study on the anti-electricity
stealing based on outlier algorithm and the electricity information acquisition system. Power
Syst. Prot. Control 43(17), 69–74 (2015)
3. Hu, S., Guan, J., Yang, Z., Yu, H.: Research on electricity quantity metrology and
acquisition system based on embedded system. Modern Electron. Tech. 39(22), 163–166
+170 (2016). https://doi.org/10.16652/j.issn.1004-373x.2016.22.040
4. Wang, Q., Li, S.: Technology analysis and preventive measures of electric larceny
prevention technology based on electric energy data acquisition system. Electr. Meas.
Instrum. (2016)
5. Ren, S.: Strengthen the supervision and management of electric power to combat the theft of
electricity. Global Mark. Inf. Guide 45, 156 (2014)
6. Zhuang, C., Zhang, B., Hu, J., Li, Q., Zeng, R.: Anomaly detection for power consumption
patterns based on unsupervised learning. Proc. CSEE 36(2), 379–387 (2016). https://doi.org/
10.13334/j.0258-8013.pcsee.2016.02.008
7. Zhao, L., Luan, W., Wang, Q.: Accurate line loss analysis of LV distribution network using
AMI data. Power Syst. Technol. 39(11), 78–83 (2015). https://doi.org/10.13335/j.1000-
3673.pst.2015.11.026
8. Ma, S.: Supervision and management of electricity and measures for preventing electricity
theft. Theor. Res. Urban Constr. 11, 2440 (2016)
9. Xu, C., Lu, G., Ye, Y., Mi, Y.: Cooperative spectrum sensing using Chi-square test for
multi-antenna cognitive radio. Chin. High Technol. Lett. 26(7), 650–656 (2016). https://doi.
org/10.3772/j.issn.1002-0470.2016.07.005
10. Wu, D.: Electricity theft identification method based on curve similarity. Electr. Power
50(2), 181–184 (2017). https://doi.org/10.11930/j.issn.1004-9649.2017.02.181.04
11. Chen, A., Xia, F., Zhong, Y.: A new independence test of four grid table. Stat. Decis. 13,
85–88 (2017). https://doi.org/10.13546/j.cnki.tjyjc.2017.13.020
12. Xu, J., Su, W., Wu, S., Wu, X.: Modeling user reliability based on logistic regression in
micro-blog. Comput. Eng. Des. 3, 772–777 (2015). https://doi.org/10.16208/j.issn1000-
7024.2015.03.042
13. Guo, J., Sun, J., Liang, T., Tan, R.: Evaluation model of disruptive design scheme based on
logistic regression. Comput. Integr. Manuf. Syst. 21(6), 1405–1416 (2015). https://doi.org/
10.13196/j.cims.2015.06.001
14. Wang, Z., Liu, K., Zheng, Z., Li, C.: Prediction retweeting of microblog based on logistic
regression model. J. Chin. Comput. Syst. 37(8), 1651–1655 (2016)