DDCLS'18
515
acquisition of the thermal power plant is called a transaction, and each parameter acquired each time is called an item.
STEP 1. Extract the pre-processed big data of thermal power from HDFS and create RDDs.
STEP 2. The support of each item is calculated in parallel. The items are sorted in descending order of support, the items that do not satisfy the minimum support are deleted, and the sorted list is recorded as F_list.
STEP 3. Data grouping. For each transaction, the items that are not in F_list are deleted and the remaining items are sorted according to F_list. Then, according to the grouping strategy of the PFP algorithm, F_list is divided into Q groups, and the result is recorded as G_List.
STEP 4. Frequent itemsets are mined in parallel. The Mapper reads G_List and assigns the transactions to the groups. Each work node completes the mining task for its group independently and obtains the frequent patterns of that group.
STEP 5. Aggregating. The frequent patterns of each group obtained in STEP 4 are aggregated to obtain the global result, the strong association rules under each condition, from which the historical working condition H is obtained.
H = [Power, Coal Quality, Economic index weight value, Environmental index weight value, Stable-operation index weight value, Each value of optimal parameter, Evaluation index]

2.5 Optimized Guidance based on Historical Database

index of the current working condition is superior to the evaluation index of the historical working condition, the parameter value of the current working condition replaces the parameter value of the original historical working condition to update the historical knowledge base.

3 The Process of Big Data Mining Method of Thermal Power Based on Spark

The big data of thermal power contains abundant and valuable unit state information and stores a large amount of unit operation knowledge. The big data mining algorithm based on Spark mines this data to obtain historical knowledge, and the historical knowledge base is established. The running state of the unit is recorded for a period and compared with the historical knowledge base to find similar historical conditions. If a similar historical condition is found, the historical knowledge is used to guide the operation of the unit. If the working condition is new, the record of that working condition over a period is mined and the new knowledge is added to the historical database. If the evaluation index of the current working condition is better, the parameter value of the current working condition replaces the parameter value of the original historical working condition to update the historical knowledge base.
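The matching and update loop of the historical knowledge base can be illustrated with a minimal Python sketch. It is not the implementation used in the paper: the names Condition, find_similar and guide_or_update are hypothetical, the evaluation index is assumed to be a numeric score where larger is better, and a new working condition is stored directly from the current parameters instead of being mined over a period.

```python
from dataclasses import dataclass


@dataclass
class Condition:
    """One entry of the historical knowledge base (hypothetical layout)."""
    power_range: tuple   # (low, high) interval of unit power
    coal_quality: str    # coal quality class, e.g. "Excellent"
    targets: dict        # parameter name -> stored parameter value
    score: float         # evaluation index; assumed: larger is better


def find_similar(base, power, coal_quality):
    """Return the stored condition with matching external constraints, or None."""
    for cond in base:
        low, high = cond.power_range
        if low <= power <= high and cond.coal_quality == coal_quality:
            return cond
    return None


def guide_or_update(base, power, coal_quality, params, score, power_range):
    """Match the current condition against the base, update it, return guidance."""
    match = find_similar(base, power, coal_quality)
    if match is None:
        # New working condition: add it to the knowledge base.
        match = Condition(power_range, coal_quality, dict(params), score)
        base.append(match)
    elif score > match.score:
        # The current operation is better: replace the stored parameter values.
        match.targets = dict(params)
        match.score = score
    # Otherwise the stored historical values remain the guidance.
    return match.targets
```

Calling guide_or_update once per recorded period keeps the base current: a worse score leaves the stored values untouched, a better one overwrites them, and an unseen combination of power range and coal quality creates a new entry.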
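STEP 2 and STEP 3 of the parallel mining procedure (building F_list, pruning and sorting each transaction, and dividing F_list into Q groups) can be sketched on a single machine as follows. This is only an illustration in plain Python, not the Spark code of the paper: the function names are hypothetical, ties in support are broken by item name, and splitting F_list into contiguous blocks is an assumed grouping strategy.

```python
from collections import Counter
from math import ceil


def build_f_list(transactions, min_support):
    """STEP 2: count item support and keep frequent items in descending order."""
    counts = Counter(item for t in transactions for item in set(t))
    min_count = min_support * len(transactions)
    frequent = [(item, c) for item, c in counts.items() if c >= min_count]
    # Sort by descending support; ties are broken by item name for determinism.
    frequent.sort(key=lambda pair: (-pair[1], pair[0]))
    return [item for item, _ in frequent]


def prune_and_sort(transaction, f_list):
    """STEP 3, first half: drop infrequent items and order the rest by F_list."""
    rank = {item: i for i, item in enumerate(f_list)}
    return sorted((item for item in set(transaction) if item in rank), key=rank.get)


def build_g_list(f_list, q):
    """STEP 3, second half: map each frequent item to one of at most q groups.

    Assumption: groups are contiguous blocks of F_list of equal size.
    """
    size = ceil(len(f_list) / q) if f_list else 1
    return {item: idx // size for idx, item in enumerate(f_list)}
```

In a Spark job the counting in build_f_list would be distributed over the RDD partitions, and the group identifiers from build_g_list would serve as the keys under which transactions are sent to the work nodes in STEP 4.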
4 Application

4.1 Build Spark Computing Platform Based on Hadoop

The hardware part of the platform uses 4 PCs to build a Spark cluster based on Hadoop. The 4 machines are in one LAN. One machine is the main node: it runs the Master process and also serves as a work node for computing tasks. The other three machines are Slave nodes and act as work nodes. The Hadoop + Spark system is set up on the platform, and the software configuration is shown in Table 1.

Tab.1: Software configuration of Spark computing platform based on Hadoop
Software    Version
Ubuntu      14.04
Java        jdk1.7
Scala       2.10.4
Hadoop      2.6.0
Spark       1.5.2
IDE         Eclipse

The new method uses HDFS for data management, so the user only needs to upload the original dataset to HDFS. The system automatically divides the data into multiple data blocks and stores the blocks in the cluster, which achieves the distributed storage of big data of thermal power.

4.2 Mining Object

This paper uses the big data mining method of thermal power based on Spark on the big data computing platform to analyze 10 days of operating data of a 300MW unit in an electric power plant in Anhui Province.

4.3 Data Preprocessing

The actual operation of the power plant changes dynamically all the time, and there are many unstable operating states. Only stable running data is valuable for data mining and can effectively reflect the actual operating state of the unit. The new method adopts the method of an earlier reference [citation missing] to determine the steady-state condition of thermal power big data.
There are some external conditions in the actual operation of the thermal power unit. Different external conditions cause the working conditions of the unit to differ, and the optimal values of the operating parameters differ greatly between working conditions. Power and coal quality are the important external conditions that affect the operation of the unit [5]. The new method uses power and coal quality as external constraints and divides the working conditions accordingly. The new method defines the relative coal quality factor as power / total fuel quantity; this factor reflects the coal's work capacity to a certain extent [22]. The results of the division of the working conditions are shown in Table 2. The K-means algorithm based on Spark is then used to discretize the data of each parameter.

4.4 Mining Target

In this paper, the economy, environmental protection and stable operation of the unit are considered. The mining target index is L, of which L1 is the economic index, L2 is the environmental protection index, and L3 is the stable operation index.
L = p1 × L1 + p2 × L2 + p3 × L3    (1)
p1 + p2 + p3 = 1    (2)
The weights p1, p2 and p3 are set to determine the optimization goal. This example takes the economy of the operation of the unit as the optimization target and selects the coal consumption rate as the evaluation index.
Considering the operating parameters closely related to the coal consumption rate, parameters such as main steam pressure, initial steam flow rate, total air flow, outlet gas temperature, oxygen content of inlet air of the A-air preheater and oxygen content of inlet air of the B-air preheater are selected, and the operating parameters unrelated to the coal consumption rate are removed to compress the data space.

4.5 Results

On the Spark computing platform based on Hadoop, the minimum support is set to 3% and the minimum confidence to 85%. The FP-growth algorithm based on Spark was used to mine the discrete data for each condition. Some of the resulting strong association rules are shown in Table 3.
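How the two thresholds of Section 4.5 act on a candidate rule can be shown with a small Python sketch. It uses the standard definitions of support and confidence and the 3% and 85% thresholds stated in the text; the function names and the brute-force counting are assumptions for illustration and do not reproduce the FP-growth implementation on Spark.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item of `itemset`."""
    items = set(itemset)
    hits = sum(1 for t in transactions if items <= set(t))
    return hits / len(transactions)


def is_strong_rule(transactions, antecedent, consequent,
                   min_support=0.03, min_confidence=0.85):
    """Check the rule antecedent -> consequent against both thresholds."""
    both = support(transactions, set(antecedent) | set(consequent))
    ante = support(transactions, antecedent)
    if ante == 0:
        # The antecedent never occurs, so the rule cannot be evaluated.
        return False
    confidence = both / ante
    return both >= min_support and confidence >= min_confidence
```

Here a transaction would be the list of discretized parameter intervals of one acquisition plus its economic evaluation, and the consequent would be the evaluation grade such as A+.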
Initial steam flow rate /(t/h)   <514.524, 650.77>     <514.524, 650.77>     <651.094, 727.933>
Total air flow                   <700.723, 775.52>     <700.723, 775.52>     <814.539, 846.718>
Outlet gas temperature /℃        <106.055, 108.711>    <101.554, 106.04>     <110.945, 113.524>
Oxygen content A /%              <4.62987, 6.46174>    <4.62987, 6.46174>    <4.07882, 4.62819>
Oxygen content B /%              <3.6192, 5.47583>     <3.6192, 5.47583>     <3.10999, 3.61691>
Economic evaluation              A+                    A+                    A+
Table 3 shows that the new method can effectively mine the strong association rules between the parameters and the economy under each working condition. The mining result of condition 5 is taken as an example to illustrate how a strong association rule is applied. The external conditions are a power within the range <192.752, 202.739> and excellent coal quality. When the main steam pressure is within <13.6397, 13.899>, the initial steam flow rate is within <514.524, 650.77>, the total air flow is within <700.723, 775.52>, the outlet gas temperature is within <106.055, 108.711>, oxygen content A is within <4.62987, 6.46174> and oxygen content B is within <3.6192, 5.47583>, the economic evaluation of the thermal power unit is A+ with a probability of at least 85%. In this paper, the clustering center of each parameter interval is taken as the optimization target value of that parameter, which gives the operator more direct guidance for optimizing the operation of the unit (see Tab. 4).
H = [<192.752, 202.739>, Excellent, 1, 0, 0, each value of optimal parameter, A+]
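As an illustration of how the rule for condition 5 could be used online, the following Python sketch checks a current reading against the parameter intervals quoted above and reports which parameters lie outside them. The intervals are taken from the text; the dictionary keys and the function name are hypothetical, and the sketch checks interval membership only, since the clustering centers of Tab. 4 are not reproduced here.

```python
# Parameter intervals of the strong association rule for condition 5 (from the text).
CONDITION_5 = {
    "main_steam_pressure": (13.6397, 13.899),
    "initial_steam_flow": (514.524, 650.77),
    "total_air_flow": (700.723, 775.52),
    "outlet_gas_temperature": (106.055, 108.711),
    "oxygen_content_a": (4.62987, 6.46174),
    "oxygen_content_b": (3.6192, 5.47583),
}


def out_of_range(reading, rule):
    """Return the sorted names of parameters whose value is outside its interval."""
    return sorted(
        name
        for name, (low, high) in rule.items()
        if not low <= reading[name] <= high
    )
```

An empty result means the reading satisfies the antecedent of the rule; any listed parameter is a candidate for adjustment by the operator.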
5 Conclusions

In this paper, a new method for mining big data of thermal power is proposed: a big data mining method of thermal power based on Spark. The proposed method is applied to the economic optimization of thermal power units. Compared with traditional optimization methods, the new method has the following advantages:
(1) By judging the steady-state conditions of thermal power data, the new method improves the data quality and eliminates the interference of data from dynamic, unstable working conditions, so that the data effectively reflects the actual running status of the unit. In addition, the steady-state data is divided according to the external constraints to realize a fine division of the actual operating conditions of the unit.
(2) By setting the weights of the economic, environmental and stable operation indicators, the different optimization needs of the user are met and the optimization goal is made explicit. According to the optimization goal, the parameters are filtered to compress the data space.
(3) Distributed storage and computing technology is introduced. The K-means algorithm based on Spark and the FP-growth algorithm based on Spark are used to process big data of thermal power, which improves the capability of processing such data and solves the problem that traditional methods cannot deal with it effectively. The new method breaks through the bottleneck of traditional methods in computing big data of thermal power.

References
[1] YH Huang, ZH Yu, C Xie, et al. Study on the Application of Electric Power Big Data Technology in Power System Simulation. Proceedings of the CSEE, 35(1):13-22, 2015.
[2] DX Liu, HH Hu, J Zhang, et al. Research on key issues of big data lifecycle and its applications. Proceedings of the CSEE, 35(1):23-28, 2015.
[3] Chinese Society for Electrical Engineering Informatization Committee. Chinese electric power big data development white paper (2013). Beijing: Chinese Society for Electrical Engineering, 2013.
[4] JW Han. Data Mining: Concepts and Techniques. Morgan Kaufmann Publishers Inc., 2005.
[5] QP Wang, ZQ Chen, H Wei. The Summary of Optimal Operation Parameters in Power Station Based on the Data Mining. Electric Power Science and Engineering, 7:19-24, 2015.
[6] R Agrawal, T Imieliński, A Swami. Mining association rules between sets of items in large databases. ACM SIGMOD Record, 22(2):207-216, 1993.
[7] JQ Li, JZ Liu, CL Niu, et al. The research and application of data mining in power plant operation optimization. IEEE International Conference on Machine Learning and Cybernetics, 2005:1642-1647, Vol. 3.
[8] JQ Li, CL Niu, JZ Liu. Application of data mining technique in optimizing the operation of power plants. Journal of Power Engineering, 26(6):830-835, 2007.
[9] JQ Li, JZ Liu, LY Zhang. The Research and Application of Fuzzy Association Rule Mining in Power Plant Operation Optimization. Proceedings of the CSEE, 26(20):118-123, 2006.
[10] J Han, J Pei, Y Yin. Mining frequent patterns without candidate generation. ACM SIGMOD International Conference on Management of Data. ACM, 2000:1-12.
[11] F Zhang, M Liu, F Gui, et al. A distributed frequent itemset mining algorithm using Spark for Big Data analytics. Cluster Computing, 18(4):1493-1501, 2015.
[12] W Huang, L Meng, D Zhang, et al. In-Memory Parallel Processing of Massive Remotely Sensed Data Using an Apache Spark on Hadoop YARN Model. IEEE Journal of Selected Topics in Applied Earth Observations & Remote Sensing, 10(1):3-19, 2017.
[13] W Chen, Y Tong, J Zhang, et al. Frequent sequence mining from massive access log for user's behaviour investigation. Proceedings of Science, 2017.
[14] D Zhang, M Xin, L Liu, et al. Research on Development Strategy for Smart Grid Big Data. Proceedings of the CSEE, 35(1):2-12, 2015.
[15] X Wan, N Hu. Research on Application of Big Data Mining Technology in Performance Optimization of Steam Turbines. Proceedings of the CSEE, 36(2):459-467, 2016.
[16] Spark: Apache Spark. https://spark.apache.org/.
[17] H Li, Y Wang, D Zhang, et al. PFP: parallel FP-Growth for query recommendation. Proceedings of the 2008 ACM Conference on Recommender Systems, 2008:107-114.
[18] T White. Hadoop: The Definitive Guide. 2011.
[19] YH Zhang, Feng-Gang Li. Kmeans Algorithm Based on the Spark of Parallel Implementation and Optimization. Journal of Xian University, 2017.
[20] P Liu, J Teng, G Zhang. Study of parallelized k-means algorithm on massive text based on Spark. CCF Big Data, 2014.
[21] HK Wei, WZ Song, Qi Li. A RBF Network Based Online Molding Method For Realtime Cost Model In Power Plant. Proceedings of the CSEE, 24(7):246-252, 2004.
[22] TT Yang, DL Zeng, JZ Liu. Operation optimization rule extraction method for generator unit base on classification of operation condition. Journal of North China Electric Power University (Natural Science Edition), 36(6):64-68, 2009.