Email: wangzhixiang513@gmail.com, wubin@bupt.edu.cn
Abstract—With the rapid development of information technology and the internet, data in all industries has grown explosively, making it difficult to analyze and mine useful information from big data. Traditional analysis systems face performance and scalability bottlenecks in big data processing, so the research and development of novel, efficient big data analysis and mining platforms has become a focus for many organizations. With the development of the smart grid, power data bearing the characteristics of the power industry needs more targeted and efficient data mining analysis. In this paper, addressing the shortcomings of existing work, we propose a distributed big data mining platform built on distributed system infrastructure such as Hadoop and Spark. The platform develops and implements a variety of fast, highly parallel mining algorithms on Spark and Tensorflow, covering machine learning, statistical analysis, deep learning and more. Using OSGI technology to build a loosely coupled component model, the platform improves the reusability of algorithm components, and its workflow engine and user-friendly GUI reduce the complexity of user operations and support user-defined data mining tasks. For the characteristics of smart grid big data, the platform develops and improves dozens of algorithm components for data processing and analysis, and its scalable algorithm and component libraries greatly improve the extensibility of the platform for processing smart grid data. Our platform has been deployed in a state grid company, satisfying the demands of various smart grid data analysis businesses.

Keywords—Parallel; Data Mining; Components; Spark; Workflow

I. INTRODUCTION

With the rapid development of modern information technology and the explosive growth of global data, big data has become an important driving force for the efficient and sustainable development of nations, enterprises and society. The world has entered the era of big data.

Effective data analysis cannot do without the support of data processing tools and machine learning platforms. Traditional analytical systems are based on OLAP (Online Analytical Processing) and OLTP (Online Transaction Processing) systems, which perform quite well for conventional data analysis. Due to their stand-alone operation mode, however, processing data at big data scale reveals defects such as long processing times and insufficient performance.

Since the release of the formal Hadoop 1.0.0 version in 2011, distributed processing systems, represented by Hadoop and Spark, have drawn the attention of researchers and been widely used, and data mining platforms have made the transition from data analysis to big data mining. Given the excellent algorithmic performance deep learning has exhibited in recent years in image analysis, speech recognition and target detection, the new goal of data mining analysis systems is to apply and integrate deep learning algorithms.

At present, various general-purpose big data analysis and mining platforms have been developed on distributed computing frameworks such as Hadoop and Spark and deep learning frameworks such as Tensorflow and Caffe. However, the algorithms implemented in their algorithm libraries are aimed at general-purpose data and lack the ability to make good use of the characteristics of industry data. Faced with professional industry data, such platforms find it difficult to cope with frequently changing, complex application business and decision-making needs. The development of big data mining platforms for domain business data has therefore become the focus of further research.

Power big data resources are similar to, yet different from, other business data. They are large and complex, including internal data such as power grid operation data, equipment inspection data, enterprise marketing data and power enterprise management data, as well as related external data such as weather data and national economic operation data. To maximize the potential value of power data resources and make full use of them, comprehensive information for power resource allocation decisions must be obtained through comprehensive data analysis combined with analysis of industry characteristics. Moreover, power data analysis involves many aspects of power production and operations, and people in different departments often pay attention to specific analysis functions, so a large number of analysis modules need to be generated. Existing Chinese power data analysis platforms handle targeted processing of power business well, but, limited by their technical frameworks, their data mining and processing speed cannot keep pace with the rapid growth of power big data.

This article aims to remedy the deficiencies of existing big data analysis platforms. Based on Spark, Hadoop, YARN and other frameworks, it proposes a distributed data mining platform oriented to power big data. The platform
III. PARALLEL DEEP LEARNING NETWORK

A. Deep Learning Network

The basic unit of a neural network is the neuron model, which includes input, output and a computational function. A neuron model is expressed as (1):

    Y_j = f\left( \sum_{i=1}^{n} X_i W_{ij} + B_j \right)    (1)

where Y_j represents the jth output of the neuron model, X_i represents the ith input element of the neuron model, W_{ij} represents the connection weight between X_i and the jth neuron, and B_j represents the bias of the jth neuron.

Multiple neurons are combined to form one level of a neural network structure, and multiple layers are stacked to form a specific neural network. With the pre-training method proposed by Hinton to alleviate the local-optimum problem in neural networks [21], the hidden layers were deepened to 7 and the neural network was promoted to a true deep neural network.

Deep neural networks often connect lower-layer and upper-layer neurons in a fully connected form, which leads to excessive growth in the number of parameters and makes it easy to fall into local optima. The CNN (Convolutional Neural Network) [22], which reduces the number of free parameters in the network, is a more suitable network structure. CNN is inspired by studies of visual cortex electrophysiology in biology. By introducing the convolutional layer, CNN uses the convolution kernel as an intermediary between neurons and shares parameter weights, which greatly simplifies model complexity and reduces the number of model parameters.

At the same time, a plain deep neural network cannot model changes at the time-series level, so its accuracy in applications on time-series data such as natural language processing and speech recognition is relatively low. The RNN (Recurrent Neural Network) acquires historical temporal information by feeding the output of a neuron back as an additional input signal at the next time step.

LeNet-5 [23] is one of the most classic convolutional neural networks, with 7 layers. Each layer contains trainable connection weights and adopts a strategy of weight sharing between layers. Training requires multiple rounds of iterative calculation, and the sharing of parameter weights between layers provides the basic conditions and optimization space for distributed computing.

This paper therefore adopts the memory-based distributed computing engine Spark together with Tensorflow as the implementation framework of the LeNet-5 and LSTM networks. Parallelization accelerates training with the help of distributed computing, trains the parameters in parallel to improve training precision and reduces the training loss. In each round of training, one node of the Spark platform is selected as the parameter server, and the other computing nodes perform model training on their assigned data fragments to obtain the model parameter variation Δα. The parameter server receives the parameter variations calculated by the computing nodes, updates the model parameters and the copies of the model parameters on each computing node, and a new round of training is performed, until the final training is complete.

The parallelized implementation of deep neural network model training is shown in Alg. 1.

Algorithm 1 Parallel Deep Neural Network Model Training Based on Spark
Input: Training data set
Output: Trained LeNet-5/LSTM network model
1: Obtain the training data set, determine the data fragment size and the number of partitions, and initialize the model training parameters α.
2: Distribute the data fragments and copies of the model training parameters to each computing node.
3: Each computing node extracts one of its allocated data fragments and performs network model training.
4: The parameter server receives the model parameter variation Δα from each computing node.
5: Update the central model parameters and the copies of the model parameters on each computing node, and adjust the network model using the back-propagation mechanism.
6: Repeat from step 2 until training is complete.

Fig. 1 shows a schematic diagram of training the LeNet-5 network model parallelized on Spark.

Fig. 1. Training the LeNet-5 network model in parallel on Spark (figure: the parameter server initializes α and aggregates Σf(Δαi) to update α; each worker holds a data fragment and a copy of the network — convolutional layers C1 to C5, pooling layer S2, fully connected layer F6 — and updates its copy α′).
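The parameter-server scheme of Alg. 1 can be sketched in plain Python. The sketch below is a hypothetical stand-in for the platform's Spark/Tensorflow implementation: a toy two-weight linear model replaces LeNet-5/LSTM, and the server and worker loops run sequentially in one process. All names and numbers are illustrative.

```python
import random

def train_parallel(data, n_partitions=4, rounds=300, lr=0.2):
    # Step 1: determine the data fragments and initialize the model parameters alpha.
    fragments = [data[i::n_partitions] for i in range(n_partitions)]
    alpha = [0.0, 0.0]  # weights of a toy two-feature linear model
    for _ in range(rounds):
        variations = []
        # Steps 2-3: each "computing node" trains on a copy of alpha
        # over its own fragment and produces a parameter variation.
        for frag in fragments:
            g = [0.0, 0.0]
            for (x1, x2), y in frag:
                err = alpha[0] * x1 + alpha[1] * x2 - y
                g[0] += err * x1 / len(frag)
                g[1] += err * x2 / len(frag)
            variations.append([-lr * g[0], -lr * g[1]])  # this node's delta-alpha
        # Steps 4-5: the "parameter server" aggregates the variations and
        # updates the central alpha; fresh copies go out at the next round.
        alpha = [a + sum(v[k] for v in variations) / n_partitions
                 for k, a in enumerate(alpha)]
    return alpha

# Toy usage: recover the weights (2, -1) from noiseless linear data.
random.seed(0)
points = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(200)]
data = [((x1, x2), 2 * x1 - x2) for x1, x2 in points]
w = train_parallel(data)
```

In the real platform the inner loop runs concurrently on Spark executors and the variation is a full gradient tensor; the averaging step, however, is exactly the iterative parameter averaging described above.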
Fig. 2. The overview and architecture of the platform.
IV. PLATFORM OVERVIEW

The distributed parallel data mining platform for power big data proposed in this paper adopts a scalable, easy-to-extend five-layer architecture, as shown in Fig. 2. From the aspects of data visualization, data management and execution monitoring, it deconstructs the characteristics of power big data analysis and provides effective, convenient analysis services.

A. Data Process

The data process layer handles raw power data originating from different data sources, including GIS data and EMS data. To process these varied large data, the parallel platform adopts MapReduce and implements more than 50 parallel distributed ETL algorithms, which are robust and efficient enough to meet various types of data processing demands. At the same time, tools such as Sqoop are provided to transfer the data extracted from the original data systems into the distributed storage system. Storage performance and storage security are further improved through a multiple-backup strategy.

B. Infrastructure

The infrastructure of the platform consists of two modules. One is the data warehouse module, consisting of NoSQL databases, HDFS, Alluxio, etc. The data warehouse module stores structured and unstructured data in data partitions on disk. For the data it stores, the parallel platform defines unified metadata, composed of the data storage type, data storage location, data storage amount and other information. To support the whole data warehouse module, the metadata is also stored in HDFS.

To meet the demand of the various upper-layer algorithms for fast reading and writing of data, the data warehouse module also combines the in-memory distributed file system Alluxio with HDFS to provide data storage in memory or other storage facilities in the form of files. This provides a reliable data-sharing layer for the upper distributed computing frameworks, while reducing redundant storage and resource recovery time.

The other module is the computational processing framework module, which implements a hybrid computing framework composed of Spark, Hadoop, Tensorflow, etc. Each computing framework has its own resource management system. To realize overall management and scheduling of the hybrid computing framework, the parallel platform implements a unified resource management module based on Hadoop's YARN resource management framework. At the same time, the TensorflowOnSpark framework is introduced to assist the docking of the Tensorflow framework with the Spark computing framework. The module provides unified resources to the upper-layer applications and avoids conflicts between resource allocations.

C. Parallel Algorithms

The parallel algorithm layer is the core of the distributed parallel mining platform. The algorithms on the parallel platform are mainly based on Spark, MapReduce and Tensorflow. By improving the parallelism of the calculations in these algorithms and designing new calculation processes, dozens of parallel algorithms with a high degree of parallelism and high computational efficiency have been developed. The layer also implements the commonly used distributed parallel machine learning algorithms and various types of ETL algorithms for the data preprocessing modules. At the same time, commonly used
deep learning algorithms, such as the CNN, RNN, LSTM and Bi-LSTM networks, are also implemented. The coupling between the algorithms is low, and each algorithm calls the others through their reserved interfaces, which makes it convenient to improve the operation of an algorithm.

For the special services involved in power system data and the unique features of power data, general data mining analysis algorithms can hardly do the trick. To address the multi-timing, complexity and specialized power-equipment characteristics of power data, more than 20 special algorithms have been developed and added to the platform, such as chromatographic differential warning and time series prediction. An algorithm expansion interface is also provided, so the algorithm library can append targeted algorithms and evolve existing ones to meet the requirements of the business services.

D. Integrated Components

Industry business applications and decision-making needs change frequently, and different business mining analyses consider different dimensions of the data. The dimensions of power data mining analysis are accompanied by complicated power service characteristics. Take, for example, transformer oil chromatographic data, which contains gas content data such as CH4 and O2: non-power-industry personnel may pay attention to the curve and trend of each gas content, while power-industry personnel focus on analyzing whether the amounts of important gases, such as CH4, CO2 and H2, exceed their thresholds, and on the correlation between each gas content and transformer failure.

For these complex demands, the parallel platform builds a component-based, service-oriented development mechanism and runtime environment, shown in Fig. 3. It decomposes the required development functions into multiple component sets, whose information is transferred in the form of a DAG. The workflow engine in the platform analyzes the DAG and uses the OSGI (Open Service Gateway Initiative) service to execute scheduling operations on the corresponding functional components, such as start/stop, update and uninstallation. This mechanism makes the business function applications highly dynamic.

Fig. 3. The flow chart of scheduling components.

Each component calls the interfaces of the parallel algorithm layer and obtains the relevant algorithms to implement its operating logic. The platform adopts the OSGI framework to wrap the corresponding operating logic and encapsulate it into the most basic module form, named a bundle. Each bundle also provides interfaces for invocation and is integrated into the workflow engine. The flow is shown in Fig. 4.

Fig. 4. The flow chart of integrating components (figure: the parallel algorithms layer — ETL, clustering, classification, regression algorithms and more — exposes interfaces that are called to build algorithm components; these are packaged as bundle modules, registered as functions through OSGI services, and integrated into the workflow engine, which offers component interfaces).

Each integrated component provides interfaces through which the administrator, supported by OSGI technology, can update and expand the functions of the component. The administrator can modify the original components according to the business requirements without affecting other parts, and improve particular operating logic to support the business analysis.

E. Application Service

The parallel distributed power big data mining platform is a cloud computing web service application. The user can access and operate the parallel platform through a browser to execute data analysis, while the computing takes place on the cloud computing service nodes.

Fig. 5. The user interactive interface: Studio.
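The bundle lifecycle operations described above — install, start/stop, update, uninstall — can be mimicked with a small component registry. This is a language-neutral sketch of the idea only: the platform's actual components are OSGI bundles on the JVM, and every name below is hypothetical.

```python
class Bundle:
    """Toy stand-in for an OSGI bundle: a named component whose
    operating logic can be swapped at runtime."""
    def __init__(self, name, logic):
        self.name, self.logic, self.active = name, logic, False

class ComponentRegistry:
    """Sketch of the scheduling operations the workflow engine issues
    via the OSGI service layer (illustrative API, not the real one)."""
    def __init__(self):
        self.bundles = {}
    def install(self, bundle):
        self.bundles[bundle.name] = bundle
    def start(self, name):
        self.bundles[name].active = True
    def stop(self, name):
        self.bundles[name].active = False
    def update(self, name, new_logic):
        # hot-swap the operating logic without touching other components
        self.bundles[name].logic = new_logic
    def uninstall(self, name):
        del self.bundles[name]
    def run(self, name, data):
        b = self.bundles[name]
        if not b.active:
            raise RuntimeError(f"bundle {name} is not started")
        return b.logic(data)

registry = ComponentRegistry()
registry.install(Bundle("etl.clean", lambda rows: [r for r in rows if r is not None]))
registry.start("etl.clean")
cleaned = registry.run("etl.clean", [1, None, 3])
registry.update("etl.clean", lambda rows: [r for r in rows if r])  # new business logic
```

The point of the pattern is the `update` call: one component's logic changes while everything else, including its registration and running state, is untouched — which is what lets the administrator track shifting business requirements.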
The user interacts with the parallel platform through the interactive interface Studio, shown in Fig. 5. User operations are constructed in the form of a workflow. The workflow is built as a data flow diagram in which each component is a node and the data interactions between components are the edges connecting the nodes. The workflow treats the entire data mining analysis process as data flowing and being converted in a data channel. Each time the data flows through a component node, it is converted into the corresponding data. If a node in the data channel fails, or the data fails to be processed, the data flow no longer proceeds to the subsequent nodes, and the cause of the task failure is prompted. The output of each successful node can be viewed. With these features, the user can clearly understand the analysis procedure of the data, any interruption of the process and the cause of the interruption, which helps the user solve the problem and continue the previous data analysis operation.

Fig. 6. The speedup of parallel LeNet-5 with various numbers of partitions.
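The data-channel semantics just described — components as nodes, data converted step by step, flow halting at a failed node with the cause reported — can be sketched as a miniature engine. This is a hypothetical illustration, not the platform's actual workflow engine; for simplicity each node takes a single input.

```python
from collections import deque

def run_workflow(nodes, edges, source_data):
    # Topologically order the component DAG.
    indeg = {n: 0 for n in nodes}
    for u, v in edges:
        indeg[v] += 1
    ready = deque(n for n, d in indeg.items() if d == 0)
    order = []
    while ready:
        n = ready.popleft()
        order.append(n)
        for u, v in edges:
            if u == n:
                indeg[v] -= 1
                if indeg[v] == 0:
                    ready.append(v)
    outputs, status = {}, {}
    for n in order:
        preds = [u for u, v in edges if v == n]
        # single-input simplification: take the first predecessor's output
        inp = outputs[preds[0]] if preds else source_data
        try:
            outputs[n] = nodes[n](inp)        # the node converts the data
            status[n] = "ok"
        except Exception as e:
            status[n] = "failed: %s" % e      # prompt the cause of failure
            break                             # data no longer flows downstream
    return outputs, status

# Toy three-node channel: load -> clean -> stats.
nodes = {
    "load":  lambda d: d,
    "clean": lambda d: [x for x in d if x is not None],
    "stats": lambda d: sum(d) / len(d),
}
edges = [("load", "clean"), ("clean", "stats")]
out, status = run_workflow(nodes, edges, [4, None, 8])
```

If the `clean` node raised, `stats` would never run and `status["clean"]` would carry the exception text — the behavior Studio surfaces to the user.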
Besides, the parallel platform provides a real-time task monitoring component, which can be used to view the running status of the currently submitted data analysis process and to know which component nodes of the current analysis process have been processed, and their states, to assist the user.

In addition to the basic data mining functions, the parallel platform provides various other functional operations, such as scheduling flows, text mining, social network analysis, deep learning and web reporting, letting users process data analysis from a multi-dimensional perspective.

Fig. 7. The speedup of parallel LSTM with various numbers of partitions.
V. EXPERIMENT AND PERFORMANCE OF ALGORITHMS

This section demonstrates the performance of the distributed parallel algorithms of the platform. The experiment environment is a Spark on YARN cluster consisting of 32 nodes, one Master and 31 Workers. The cluster is equipped with Spark 1.5.1, Hadoop 2.6.0, Tensorflow 1.3.0 and TensorflowOnSpark 1.2.1. Each node is created by OpenStack on Dell R720 servers and has 12 CPU cores at 2.10 GHz, 48 GB of memory and 1000 GB of storage.

Fig. 8. The comparison result between stand-alone and parallel LeNet-5.
A. Performance Of Parallel Deep Neural Network

The parallel platform adopts the aforementioned Hadoop and Spark server cluster and applies the parallel optimization of Alg. 1 to the LeNet-5 and LSTM networks based on the Spark and Tensorflow computing frameworks. Two comparative experiments are carried out: 1. comparing the runtime of the stand-alone deep learning networks with the parallel ones; 2. comparing the speedup of the parallel deep neural networks for various numbers of partitions.

Fig. 6 and Fig. 7 show that the parallel networks effectively reduce the runtime as the number of partitions increases. Fig. 8 and Fig. 9 show that the stand-alone networks run faster than the parallel ones on small data sets because of the communication cost of parallelization, while on large data sets the parallel networks are more efficient than the stand-alone ones, as the gain in computational efficiency far outweighs the communication cost.

Fig. 9. The comparison result between stand-alone and parallel LSTM.

The above results show that the runtime of the parallel neural networks is significantly reduced as the number of partitions increases, and that they have a significant advantage over the stand-alone deep neural networks under large data volumes; the method of Alg. 1 is thus shown to be scalable and efficient.
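For reference, the speedup plotted in Figs. 6 and 7 is the usual ratio of stand-alone runtime to parallel runtime. The snippet below computes such a curve from illustrative runtimes — the numbers are invented for the example and are not measurements from the paper's experiments.

```python
def speedup(t_standalone, t_parallel):
    """Speedup as plotted in Figs. 6 and 7: stand-alone runtime / parallel runtime."""
    return t_standalone / t_parallel

# Hypothetical runtimes in seconds for 1/2/4/8 partitions on one data set.
runtimes = {1: 800.0, 2: 430.0, 4: 240.0, 8: 150.0}
curve = {p: round(speedup(runtimes[1], t), 2) for p, t in runtimes.items()}
# sublinear growth (curve[8] < 8) reflects the communication cost discussed above
```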
TABLE I
COMPARISON RESULT BETWEEN SOME ALGORITHMS OF THE PLATFORM AND MLLIB

Data    BP                  NaiveBayes          Linear Regression
Size    Platform   MLlib    Platform   MLlib    Platform   MLlib
0.5G    108s       168s     30s        30s      3501s      3558s
1G      256s       289s     42s        59s      7010s      7204s
5G      781s       1316s    99s        99s      23264s     24009s
10G     1489s      2877s    209s       310s     39657s     42591s

B. Performance Of Parallel Machine Learning Algorithms

In addition to the parallelized deep neural networks, the parallel platform provides parallelized ETL and machine learning algorithms to support a wider range of power data analysis services and to process big data in the power industry efficiently. The platform also introduces part of the MLlib algorithms and improves their operating logic. Experiments are carried out on classical data sets of different scales; the experimental results are shown in Table I.

VI. PLATFORM FEATURES

A. Parallel Algorithms Library

The parallel platform integrates Spark, TensorflowOnSpark and other computing frameworks and develops parallel deep learning algorithms. It realizes data parallelism, parallel model training and iterative averaging of model parameters, forming a central model that runs the algorithm in a distributed CPU environment. This greatly decreases the training runtime of the deep neural networks and improves the ability to process large power data.

The parallel platform also implements and provides dozens of parallel machine learning and ETL algorithms for the different needs of power data analysis services. Based on Spark, Hadoop and other frameworks, the traditional analysis algorithms are reconstructed: the distributed computing system is used to realize distributed parallelized machine learning and data analysis algorithms, which complements the machine learning library MLlib provided by the Spark computing framework. The platform further improves the operating logic of some algorithms in the MLlib library. For iterative calculation algorithms, such as DBSCAN and CLARA, it uses the in-memory computing and in-memory data storage functions provided by Spark; moving the calculations into memory greatly reduces the data access time. For the operations in an algorithm that can be parallelized, new algorithm processes are designed to replace the time-consuming steps, and more reasonable methods, such as dimensionality reduction, are adopted. This reduces the running time of the algorithms and improves their running performance.

B. Hybrid Computing Framework

The most widely used computing frameworks in distributed computing are Spark and Hadoop. Spark focuses on memory computing and uses memory as the intermediate computing storage structure. It exploits the high-speed I/O of memory to achieve fast calculation, improving the performance of the various Spark-based data mining algorithms in large-scale data computing. Hadoop uses file storage for intermediate data files, which is inferior to the Spark computing engine in time performance; but it does not depend on memory capacity, so data analysis tasks do not easily fail from insufficient memory resources under large data volumes, and the data analysis tasks are extremely stable. Tensorflow is a widely used computational engine framework for deep learning, supporting a variety of deep learning algorithms, but it is still used in single-server mode.

Based on the characteristics of the above computing engine frameworks, the parallel platform implements distributed parallel machine learning algorithms, distributed parallel ETL algorithms and deep learning algorithms. It improves the YARN resource management framework in Hadoop and introduces TensorflowOnSpark to enable the combined operation of the Tensorflow and Spark frameworks. At the same time, it abstracts the servers' computing resources and uniformly allocates and manages the resources that jobs apply for. Separating the various computing tasks avoids resource conflicts between the computing engine frameworks and allows multiple computing engine frameworks to run at the same time.

C. High Applicability For Power Data

In view of the complexity, timing and real-time characteristics of power data, general-purpose big data analysis algorithms are often poorly applicable. To this end, the parallel platform has specifically integrated multiple algorithm components for power data analysis business needs: for example, differential early-warning algorithms for chromatographic data, load evaluation algorithms for device load data, text analysis algorithms for device defect data, and grey correlation algorithms for grid data correlation analysis. The parallel platform also provides an extended algorithm interface: when faced with strongly professional business analysis requirements, the corresponding algorithms can be extended for the specific services to complete the analysis task.

D. Simple And Convenient Operations

The parallel platform provides an interactive graphical big data analysis and management interface named Studio. It uses a workflow diagram to construct and compose data analysis tasks, with the arrow marks in the diagram indicating the data flow and dependencies. By clicking and dragging, users can complete the creation, configuration, reuse, submission, operation, task monitoring and visualization of workflow diagrams. The platform's friendly operation lowers the threshold for users to perform big data analysis tasks, and efficiently realizes in-depth analysis and value mining of industry big data.

VII. APPLICATION ON FIELD OF POWER

Thanks to the simplicity and efficiency of the parallel platform, users can realize a variety of power data analysis
business requirements by operating the parallel platform. The platform has been launched in a certain power grid company, where it has been used cooperatively for unstructured text data extraction and for differentiated data analysis of power business data. It enriches the methods of the power data analysis business and improves the efficiency of power-related data analysis.

TABLE II
THE RESULT OF DIFFERENTIATED WARNING ON OIL CHROMATOGRAPHY OF TRANSFORMER

Time          Warning Algorithm   Neural Network   SVM     GM
2017-05-01    0.095               0.122            0.126   0.202
2017-05-20    0.053               -0.115           0.128   0.151
2017-06-20    0.079               -0.126           0.214   0.265
2017-06-30    0.044               -0.07            0.053   0.131
AveError      0.021               0.028            0.036   0.047

Fig. 10. The workflow of differentiated warning on oil chromatography of transformer.
Fig. 11. The workflow of fault text analysis of power transmission and transformation.
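As a toy illustration of the threshold side of such chromatographic analysis — the check, mentioned earlier, of whether key gas contents exceed their attention thresholds — the logic might look like the sketch below. The platform's actual differentiated warning algorithm is far richer, and the thresholds here are invented for the example, not taken from the paper or any standard.

```python
# Hypothetical attention thresholds in uL/L (illustrative values only).
THRESHOLDS = {"CH4": 50.0, "C2H2": 5.0, "H2": 150.0, "CO2": 3000.0}

def gas_warnings(sample):
    """Flag dissolved gases whose measured content exceeds its threshold."""
    return {gas: value for gas, value in sample.items()
            if gas in THRESHOLDS and value > THRESHOLDS[gas]}

# One measurement of a transformer's oil chromatography.
alerts = gas_warnings({"CH4": 72.0, "H2": 40.0, "CO2": 1200.0})
```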
defect data text analysis extraction. Then these have been put
on some power grid company.
In the future work, we will continue to expand the follow-up
work of this paper in terms of deeper development, optimiza-
tion of deep learning algorithms and expansion of other types
of data models in the power field.
ACKNOWLEDGMENT
This work is supported in part by the National Key R&D
Program of China(No.2018YFC0831500).
R EFERENCES
[1] Holmes G, Donkin A, Witten I H. Weka: A machine learning work-
bench[C]//Intelligent Information Systems, 1994. Proceedings of the 1994
Fig. 12. The result of fault text of power transmission and transformation Second Australian and New Zealand Conference on. IEEE, 1994: 357-
analysis. 361.
[2] Hofmann, M., Klinkenberg, R. RapidMiner: Data mining use cases and
business analytics applications[M]. CRC Press, 2013.
[3] Berthold M R, Cebron N, Dill F. KNIME: The Konstanz Information
performance. The experimental results in Fig.12 show that the Miner[J]. Acm Sigkdd Explorations Newsletter, 2006, 11: 26-31.
[4] An ZHUO. Research and Implementation of Big Data Analysis Platform
algorithm is more suitable for power data than the same type Based on P2P Scalable Architecture[D]. Tsinghua University, Beijing,
of algorithm. China, 2012.
[5] Yu L, Zheng J, Shen W C, et al. BC-PDM: data mining, social
network analysis and text mining system based on cloud comput-
VIII. C ONCLUSION ing[C]//Proceedings of the 18th ACM SIGKDD international conference
on Knowledge discovery and data mining. ACM, 2012: 1496-1499.
For the demand of power big data business analysis and [6] De Francisci Morales G. SAMOA: A platform for mining big data
mining along with the development of smart grid, this paper streams[C]//Proceedings of the 22nd International Conference on World
designs and develops a distributed parallel data mining plat- Wide Web. ACM, 2013: 777-778.
[7] Qing HE, Fuzhen ZHUANG, Li ZENG. PDMiner: Parallel Distributed
form for power big data. The parallel platform adopts a highly Data Mining Platform Based on Cloud Computing[J]. Science China,
reusable distributed framework and designs a scalable parallel 2014, 44: 855-871.
algorithm library. It integrates nearly 100 well-running perfor- [8] Li W, Cheng H L, Peng Y, et al. Visualized data mining platform based
on the Spark[J]. Chinese Association of Automation System Simulation
mances and highly parallelized algorithms, which are partially Professional Committee, 2014.
superior to existing open-source algorithm libraries such as MLlib [9] and Hive tools. The algorithm library covers a variety of parallel general-purpose mining and analysis algorithms, including deep neural networks, machine learning and statistical analysis, as well as special algorithms tailored to the business needs of power data analysis. In addition, the graphical interactive interface Studio provided by the parallel platform uses a workflow diagram to indicate the data flow direction and the dependence relationships among components. It also visually displays the analysis process, the current execution status and the analysis results. The Studio further allows users to customize data analysis tasks by clicking and dragging, without writing any programs, which lowers the threshold for users to perform big data analysis tasks.

In order to support the frequently changing business needs in power big data mining, the parallel platform integrates more than 20 special algorithms for power data business analysis and implements pluggable component libraries using OSGI technology. These components not only allow flexible modification, but also decouple the functions of the analysis tasks, so that existing component sets can be logically organized according to business requirements to generate the corresponding data analysis function modules. The parallel platform also realizes various algorithms for grid data analysis business demands, including chromatographic data differentiation warning and
[9] Jun LEI, Hangjun YE, Zesheng WU. Big-Data Platform Based on Open Source Ecosystem[J]. Journal of Computer Research and Development, 2017, 54: 80-93.
[10] Guo T, Xu J, Yan X, et al. Ease the Process of Machine Learning with Dataflow[C]//Proceedings of the 25th ACM International on Conference on Information and Knowledge Management. ACM, 2016: 2437-2440.
[11] Bu Y, Wu B, Chen Y. BDAP: A data mining platform based on Spark[J]. Journal of University of Science and Technology of China, 2017, 47: 358-368.
[12] Yu K. Large-scale deep learning at Baidu[C]//Proceedings of the 22nd ACM International Conference on Information and Knowledge Management. ACM, 2013: 2211-2212.
[13] Abadi M, Barham P, Chen J, et al. TensorFlow: a system for large-scale machine learning[C]//OSDI. 2016, 16: 265-283.
[14] Zou Y, Jin X, Li Y, et al. Mariana: Tencent deep learning platform and its applications[J]. Proceedings of the VLDB Endowment, 2014, 7(13): 1772-1777.
[15] (2016) The large data platform that supports EB level is revealed in depth. [Online]. Available: https://yq.aliyun.com/articles/34246?spm=5176.7965709.247259.6.341b6fdcDH9QQB
[16] ur Rehman M H, Liew C S, Wah T Y. UniMiner: Towards a unified framework for data mining[C]//Information and Communication Technologies (WICT), 2014 Fourth World Congress on. IEEE, 2014: 134-139.
[17] Jungang YANG, Jie ZHANG, Wei QIN. Big data analysis platform for semiconductor manufacturing[J]. Computer Integrated Manufacturing Systems, 2016, 22: 2900-2910.
[18] Ping HU, Zhongqun WANG, Tao LIU. General Electric Data Platform Based on Distributed OSGI[J]. Computer Engineering, 2014, 40: 71-75.
[19] Bengong YU, Tianxiang QIAO, Dong ZHANG. Research on Data Analysis Platform of Power Grid Based on Portlet Component[J]. Computer Technology and Development, 2015, 25: 218-220.
[20] Yun CHEN. The Design and Implementation of the Distributed Computing and Analysis Platform for Power System[D]. University of Electronic Science and Technology of China, Sichuan, China, 2016.
[21] Hinton G E, Salakhutdinov R R. Reducing the dimensionality of data
with neural networks[J]. Science, 2006, 313(5786): 504-507.
[22] Taylor G W, Fergus R, LeCun Y, et al. Convolutional learning of spatio-
temporal features[C]//European conference on computer vision. Springer,
Berlin, Heidelberg, 2010: 140-153.
[23] LeCun Y, Bottou L, Bengio Y, et al. Gradient-based learning applied to
document recognition[J]. Proceedings of the IEEE, 1998, 86(11): 2278-
2324.
[24] Hochreiter S, Schmidhuber J. Long short-term memory[J]. Neural com-
putation, 1997, 9(8): 1735-1780.
[25] Yuzhu Jiang. Research on data processing and analysis for electrical
equipment condition monitoring using Hadoop[D]. North China Electric
Power University, Beijing, China, 2014.
[26] Peng X, Deng D, Cheng S, et al. Key technologies of electric power
big data and its application prospects in smart grid[J]. Proceedings of the
CSEE, 2015, 35(3): 503-511.
[27] Simmhan Y, Aman S, Kumbhare A, et al. Cloud-based software platform
for big data analytics in smart grids[J]. Computing in Science and
Engineering, 2013, 15(4): 38-47.
[28] Yu G, Jin-zhuang L V. Application of big data mining analysis in power
equipment state assessment[J]. Southern Power System Technology, 2014,
8(6): 74-77.
[29] Zhang P, Yang H, Xu Y. Power big data and its application scenarios in
power grid[J]. Proc. CSEE, 2014, 34: 85-92.
[30] Tian L, XIANG M. Abnormal Power Consumption Analysis Based on
Density-based Spatial Clustering of Applications with Noise in Power
Systems[J]. Automation of Electric Power Systems, 2017, 5: 64-70.
[31] Zhu Y, Jia Y, Wang L. Partial discharge pattern recognition method
based on variable predictive model-based class discriminate and partial
least squares regression[J]. IET Science, Measurement and Technology,
2016, 10(7): 737-744.
[32] Kranjc J, Orač R, Podpečan V, et al. ClowdFlows: Online workflows
for distributed big data mining[J]. Future Generation Computer Systems,
2017, 68: 38-58.
[33] Basso T, Moraes R, Antunes N, et al. PRIVAaaS: privacy approach for
a distributed cloud-based data analytics platforms[C]//Proceedings of the
17th IEEE/ACM International Symposium on Cluster, Cloud and Grid
Computing. IEEE Press, 2017: 1108-1116.
[34] Jain V, Seung S. Natural image denoising with convolutional net-
works[C]//Advances in Neural Information Processing Systems. 2009:
769-776.
[35] Ngiam J, Chen Z, Chia D, et al. Tiled convolutional neural net-
works[C]//Advances in neural information processing systems. 2010:
1279-1287.
[36] Dean J, Corrado G, Monga R, et al. Large scale distributed deep
networks[C]//Advances in neural information processing systems. 2012:
1223-1231.