Abstract—In the electricity market, wind power producers face the challenge of maximizing their income under the uncertainty of wind power. This paper proposes an integrated scheduling mode that combines wind power prediction and energy storage system (ESS) decision making, avoiding the loss of decision-making information in the wind power prediction. Secondly, deep Q network, a deep reinforcement learning (DRL) algorithm, is introduced to construct the end-to-end ESS controller. The uncertainty of wind power is automatically considered during the DRL-based optimization, without any distributional assumption. Finally, the superiority of the proposed method is verified through the analysis of a case wind farm located in Jiangsu Province.

Keywords—Deep Q network, deep reinforcement learning, electricity market, energy storage system, wind farm schedule.

I. INTRODUCTION

In recent years, as the most economical form of power generation among non-hydro renewable energy sources, wind power has accounted for an increasing proportion of total power generation. It is an inevitable trend for wind farms to maximize their generation profits as wind power producers in the electricity market [1]. However, the uncertainty of wind power poses challenges for wind farm control.

Integrating an energy storage system (ESS) into the wind farm is an effective way to increase the profits obtained under such uncertainty [2], [3]. There have been many studies on the control of the ESS considering wind power uncertainty [4]-[6]. In [4], a chance-constrained optimization model for pumped storage power station control is proposed to alleviate the fluctuation of the integrated power; in this model, the wind power forecast error is assumed to obey a normal distribution. In [5], based on the Monte Carlo method and scene reduction technology, a wind-storage scheduling model is established on multiple time scales, and the specific outputs of wind power and the ESS are arranged in detail. Reference [6] introduces reinforcement learning (RL) into the decision making of ESS control and establishes a two-stage learning model based on the Q-learning algorithm.

In previous studies, the ESS scheduling in a wind farm follows two separate processes: wind power prediction and ESS decision making. In the wind power prediction, high-dimensional meteorological data from wind farms are compressed into forecasted wind power values, which causes the loss of effective decision-making information contained in the original meteorological data. Meanwhile, in the decision-making process based on mathematical optimization algorithms, the uncertainty of wind power is generally assumed to follow a specific probability distribution [7]. This inaccurate expression of wind power uncertainty also reduces the scheduling benefits of wind farms [8].

To overcome the defects above, a deep reinforcement learning (DRL) based method for ESS control in a wind farm under the integration of prediction and decision is proposed. The integration of prediction and decision means that the ESS control is directly driven by the high-dimensional original data (including meteorological data). Such an end-to-end integrated scheduling mode can effectively utilize the hidden decision-making information in the original data to improve the scheduling profits. Secondly, deep Q network (DQN), a DRL algorithm, is introduced to construct the optimal controller under the integration of prediction and decision. The data-driven optimization algorithm allows the wind power uncertainty laws contained in big data to be automatically captured and utilized by the machine.

II. OVERVIEW OF THE PROPOSED METHOD

A. Integration of Prediction and Decision in Wind Farm

In the traditional scheduling mode, wind power prediction and decision making are independent of each other. In the wind power prediction, the input of the prediction system generally includes the real-time and historical output power of the wind turbines and the real-time, historical and even predicted meteorological data (wind speed, wind direction, temperature, air pressure, etc.). The output is the forecasted wind power value $P_{w,fore,t}$ in the future. In the decision making, the controller determines the charge/discharge power of the ESS based on $P_{w,fore,t}$, the current state of charge of the ESS and the price of electricity.
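To make the distinction between the two modes concrete, here is a minimal, hypothetical sketch of the two information flows. The 15-dimensional state, the stand-in forecast model and the linear controller are illustrative assumptions only, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
raw_state = rng.normal(size=15)  # meteorological and operational measurements

def traditional_pipeline(x):
    """Predict-then-decide: the high-dimensional state is first compressed
    to a single forecast value, so the decision step sees one number only."""
    p_w_fore = float(x[:5].mean())        # stand-in wind power forecast
    return 7.5 if p_w_fore > 0 else -7.5  # stand-in rule-based decision (MW)

def integrated_pipeline(x, w):
    """Integration of prediction and decision: the controller maps the
    full state directly to an ESS instruction, losing no information."""
    return float(w @ x)  # stand-in linear controller (MW)

w = rng.normal(size=15)
p_trad = traditional_pipeline(raw_state)
p_integ = integrated_pipeline(raw_state, w)
```

The point of the sketch is only that the integrated controller conditions its decision on all 15 input dimensions, while the traditional decision step conditions on the single compressed forecast.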
978-1-7281-1675-4/19/$31.00 ©2019 IEEE
In order to effectively deal with the high-dimensional input state and extract the high-order data features for optimization, this paper introduces the DQN algorithm to construct the controller, as shown in Fig. 2. DRL takes the historical scheduling experience as the learning samples and continuously updates the parameters in the controller with the goal of maximizing profit.

Fig. 2 Energy storage system optimization control based on deep reinforcement learning. (Figure: the wind turbines' power $P_{w,t}$ and the meteorology data form the input state, including meteorological data, of the controller (agent); the DRL algorithm optimizes the controller, which issues the ESS instruction, and the combined power $P_{sys,t}$ is delivered to the power grid.)

B. Deep Q Network

The DQN, developed by the DeepMind team in 2015, showed performance beyond the human level in Atari games and is one of the most classic algorithms in DRL [13]. DQN takes advantage of a deep neural network (the evaluation network) to approximate the mapping relationship between the input state and the Q value, enabling DQN to tackle a continuous state space. The Q value is the discounted expected value of the accumulated reward obtained by the action after numerous trials. Secondly, DeepMind established a replay buffer for
DQN to break the correlation among adjacent data samples and realize offline learning. Finally, in addition to the evaluation network, DeepMind also sets up a target network separately to eliminate the correlation of network parameters in the TD-error. The iterative process of the Q value is shown as

$Q(s_t, a_t; \theta_t) \leftarrow Q(s_t, a_t; \theta_t) + \alpha \left[ r_t + \gamma \max_{a_{t+1}} Q(s_{t+1}, a_{t+1}; \theta_t^-) - Q(s_t, a_t; \theta_t) \right]$  (2)

where $Q(s_t, a_t; \theta_t)$ is the Q value of the action $a_t$ in the state $s_t$ obtained through the evaluation network, whose network parameters are represented as $\theta_t$; the parameters of the target network are represented as $\theta_t^-$; $\alpha$ is the learning rate of the evaluation network in supervised learning; $r_t$ is the immediate reward from the external environment; and $\gamma$ is the discount factor that determines the current value of the rewards to be received in the future. The TD-error is the loss function used to train the evaluation network, whose specific form in (2) is $r_t + \gamma \max_{a_{t+1}} Q(s_{t+1}, a_{t+1}; \theta_t^-) - Q(s_t, a_t; \theta_t)$.

C. Action Selection Policy

The mapping relationship from the state space to the action space consists of the evaluation network and an action selection strategy; the $\varepsilon$-greedy policy is used to select an action based on the Q values. The policy defined in (3) calculates the probability $\pi$ with which each action in the action space is selected:

$\pi(a \mid s) = \begin{cases} 1 - \varepsilon + \dfrac{\varepsilon}{|A|}, & a = \arg\max_a Q(s, a) \\ \dfrac{\varepsilon}{|A|}, & a \neq \arg\max_a Q(s, a) \end{cases}$  (3)

where $\varepsilon$ ($\neq 0$) is used to determine the probability of selecting an action randomly.

IV. APPLICATION OF DRL

Before applying the DQN algorithm described in the previous section to the ESS integrated scheduling mode of a wind farm, it is necessary to determine the state space S, the action space A and the reward value r returned by the external environment.

A. State Space

The state space includes the output power of the wind turbine, wind speed, wind direction, air pressure, humidity and other real-time, historical and even predicted meteorological data.

B. Action Space

In the integrated scheduling mode, the controller directly outputs the scheduling instruction of the ESS. Therefore, the action space consists of $n$ discrete quantities of the charge/discharge power $P_{ESS,t}$ of the ESS, as shown in (5):

$A = \{a_1, a_2, \ldots, a_n\}$  (5)

C. Reward

In this paper, the external environment refers to the electricity market environment, and the reward is the dispatch income obtained by the wind farm, calculated as follows:

$r_t = P_{sys,t} \lambda_t \Delta t$  (6)

where $\lambda_t$ is the selling price of the wind farm during period $t$. In the current work, in order to reduce the operating loss, the ESS is scheduled once every hour.

The ESS is limited by its operational constraints. In this paper, a battery pack is chosen as the energy storage component. The charge/discharge power of the ESS can be further expressed as:

$P_{ESS,t} = u_{ESS,t}^{dis} P_{ESS,t}^{dis} - u_{ESS,t}^{ch} P_{ESS,t}^{ch}$  (7)

$u_{ESS,t}^{dis} + u_{ESS,t}^{ch} \le 1$  (8)

where $P_{ESS,t}^{dis}$ / $P_{ESS,t}^{ch}$ are the discharging/charging power of the ESS in period $t$, and $u_{ESS,t}^{dis}$ / $u_{ESS,t}^{ch}$ are the discharging/charging state variables of the ESS, where a value of 0 indicates no and a value of 1 indicates yes. Equation (8) indicates that the charging state and the discharging state cannot exist at the same time.

Battery pack charging/discharging power constraints:

$0 \le P_{ESS,t}^{dis} \le P_{ESS,\max}^{dis} u_{ESS,t}^{dis}$  (9)

$0 \le P_{ESS,t}^{ch} \le P_{ESS,\max}^{ch} u_{ESS,t}^{ch}$  (10)

where $P_{ESS,\max}^{ch}$ / $P_{ESS,\max}^{dis}$ are the maximum charging/discharging power allowed by the ESS.

ESS capacity constraints:

$E_{\min} \le E_t \le E_{\max}$  (11)
The learning continues until the parameters in the controller converge. The implementation process of the DQN algorithm is shown in Fig. 4.

Fig. 4 Implementation process based on DQN. (Flowchart: initialize the controller parameters; in each period t, calculate the Q values of all actions in the current state s_t; choose an action a using the ε-greedy policy; observe the new state s_{t+1} and calculate the reward r_t; update the parameters in the controller with DQN; set t = t + 1 and repeat until the parameters converge.)

The training process of the evaluation network is carried out by the RMSProp optimizer. Whenever the evaluation network has been updated N times, the parameters of the evaluation network are copied to the target network.

V. SIMULATION RESULTS

A. Simulation Data

This paper takes a wind farm with an installed capacity of 50 MW in Jiangsu Province as a case to analyze and verify the proposed method. The battery pack parameters of the wind farm are shown in Table I.

TABLE I. BATTERY PACK PARAMETERS
$P_{ESS,\max}^{ch}$ (MW): 7.5; $P_{ESS,\max}^{dis}$ (MW): 7.5; $E_{\max}$ (MWh): 45; $E_{\min}$ (MWh): 5; $\eta_{ESS}^{ch}$: 0.85; $\eta_{ESS}^{dis}$: 0.95

The wind farm state space consists of the forward-looking electricity price, the electricity stored in the ESS, and the real-time wind farm measurement data. The measurement data include: real-time wind-tower wind speed at 10 m, 30 m, 50 m and 70 m, hub-height wind speed, wind-tower wind direction at 10 m, 30 m, 50 m and 70 m, hub-height wind direction, wind farm air pressure, humidity, and historical wind turbine output power. The entire state space consists of 15 dimensions of data. The electricity price for the electricity sold in each time period is shown in Table II.

In the action space, the charge/discharge power of the ESS is divided into 31 actions: {-7.5, -7.0, ..., 0, ..., 7.0, 7.5}.

TABLE II. ELECTRICITY PRICES AT DIFFERENT TIME INTERVALS
Interval index t     1    2    3    4    5    6
Reserve price (¥)    205  195  185  185  185  190
Interval index t     7    8    9    10   11   12
Reserve price (¥)    195  200  205  210  215  220
Interval index t     13   14   15   16   17   18
Reserve price (¥)    225  230  235  240  245  250
Interval index t     19   20   21   22   23   24
Reserve price (¥)    255  255  245  235  225  245

The evaluation network structure is shown in Fig. 5; it is a fully connected neural network with two hidden layers. In the training process of the evaluation network, the learning rate $\alpha$ is set to 0.001, the memory of the replay buffer is set to 3000 samples, N is set to 300, the update interval of the target network is 300, the reward discount factor $\gamma$ is 0.9, and the $\varepsilon$ in the $\varepsilon$-greedy policy is set to 0.1.

Fig. 5 Evaluation network structure. (A fully connected network: a 15-dimension input layer taking $\lambda_{t+1}$, $E_t$ and the measurements $M_t^{(1)}$ to $M_t^{(13)}$; two hidden layers of 60 neurons each; and an output layer of the 31 Q values $Q_1$ to $Q_{31}$.)

B. Simulation Results

The gains of the wind farm fluctuate with the fluctuations of wind power. Fig. 6 shows the variation of the wind farm's average income with the number of samples experienced by the controller. In this optimization process, the mapping from the state space to the action space is continuously optimized; the income of the wind farm shows a significant rising phase as the number of samples increases and then settles into a stable fluctuation range. When the income curve is stable, the average income is 6216.1 ¥/h. The stabilized incomes indicate that the parameters of the controller have converged.

Fig. 6 Change curve of the average gain of the wind farm. (Average revenue per time interval in ¥ versus the number of samples experienced by the controller, from 0 to 33600.)
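The components described above (a 15-60-60-31 evaluation network, ε-greedy selection over the 31 discrete ESS powers, and a TD target taken from a separate target network) can be sketched in a few lines of numpy. The tanh activations, weight initialization and sample values are illustrative assumptions, and the RMSProp gradient step itself is omitted; this is a sketch of the mechanics, not the paper's code.

```python
import numpy as np

rng = np.random.default_rng(42)
ACTIONS = np.arange(-7.5, 7.6, 0.5)          # 31 actions {-7.5, ..., 7.5} MW
GAMMA, EPS, ALPHA, N_COPY = 0.9, 0.1, 0.001, 300  # hyperparameters from the text

def init_net():
    # 15 -> 60 -> 60 -> 31 fully connected weights (biases omitted)
    return [rng.normal(scale=0.1, size=s) for s in [(15, 60), (60, 60), (60, 31)]]

def q_values(net, s):
    h = np.tanh(s @ net[0])
    h = np.tanh(h @ net[1])
    return h @ net[2]                        # one Q value per discrete action

def epsilon_greedy(net, s):
    if rng.random() < EPS:                   # explore with probability eps
        return int(rng.integers(len(ACTIONS)))
    return int(np.argmax(q_values(net, s)))  # otherwise greedy, as in (3)

eval_net = init_net()
target_net = [w.copy() for w in eval_net]    # copied every N_COPY updates

s, s_next, r = rng.normal(size=15), rng.normal(size=15), 6000.0
a = epsilon_greedy(eval_net, s)
# TD target computed from the *target* network, as in (2)
td_target = r + GAMMA * np.max(q_values(target_net, s_next))
td_error = td_target - q_values(eval_net, s)[a]
print(len(ACTIONS), a, float(td_error))
```

In a full training loop, the TD error would be minimized by RMSProp over mini-batches drawn from the 3000-sample replay buffer, and `target_net` would be refreshed from `eval_net` every 300 updates.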
TABLE III. AVERAGE GAINS OF THE WIND FARM UNDER DIFFERENT CONDITIONS
REFERENCES

[1] Wang Qingran, Xie Guohui, Zhang Lizi. An integrated generation consumption dispatch model with wind power[J]. Automation of Electric Power Systems, 2011, 35(5): 15-18, 30.
[2] Kyung S K, Mckenzie K J, Liu Y L, et al. A study on applications of energy storage for the wind power operation in power systems[C]// IEEE Power Engineering Society General Meeting. IEEE, 2006.
[3] Yan Gangui, Liu Jia, Cui Yang, et al. Economic evaluation of improving the wind power scheduling scale by storage system[J]. Proceedings of the CSEE, 2013, 36(22): 45-52.
[4] M. Young, The Technical Writer's Handbook. Mill Valley, CA: University Science, 1989.
[5] Ding H, Hu Z, Song Y. Stochastic optimization of the daily operation of wind farm and pumped-hydro-storage plant[J]. Renewable Energy, 2012, 48(6): 571-578.
[6] Wu Xiong, Wang Xiuli, Li Jun, et al. A joint operation model and solution for hybrid wind energy storage system[J]. Proceedings of the CSEE, 2013, 33(13): 10-17.
[7] Li J, Wan C, Xu Z. Robust offering strategy for a wind power producer under uncertainties[C]// IEEE International Conference on Smart Grid Communications. IEEE, 2016.
[8] Liu Guojing, Han Xueshan, Wang Shang, et al. Optimal decision-making in the cooperation of wind power and energy storage based on reinforcement learning algorithm[J]. Power System Technology, 2016, 40(9): 2729-2736.
[9] L. Busoniu, R. Babuska, and B. De Schutter, "A comprehensive survey of multiagent reinforcement learning," IEEE Transactions on Systems, Man, and Cybernetics, Part C, vol. 38, no. 2, pp. 156-172, 2008.
[10] R. S. Sutton and A. G. Barto, "Reinforcement Learning: An Introduction," Machine Learning, vol. 8, no. 3-4, pp. 225-227, 1992.
[11] C. Szepesvari, "Algorithms for Reinforcement Learning," vol. 4, no. 1, pp. 632-636, 2009.
[12] Deng Li, Yu Dong. Deep learning: methods and applications[J]. Foundations and Trends in Signal Processing, 2014, 7(3-4): 197-387.
[13] V. Mnih et al., "Human-level control through deep reinforcement learning," Nature, vol. 518, no. 7540, p. 529, 2015.
[14] F. Yao, Z. Y. Dong, K. Meng, Z. Xu, H. C. Iu, and K. P. Wong, "Quantum-inspired particle swarm optimization for power system operations considering wind power uncertainty and carbon tax in Australia," IEEE Transactions on Industrial Informatics, vol. 8, no. 4, pp. 880-888, 2012.