
Optimal battery management strategies in mobile networks powered by a smart grid

Wael Labidi, Tijani Chahed and Salah-Eddine Elayoubi
Institut Mines-Telecom, Telecom SudParis, SAMOVAR, 9 Rue Charles Fourier, 91000 Evry, France
Orange Labs, 140 avenue de la Republique, 92320 Chatillon, France
{wael.labidi, tijani.chahed}@telecom-sudparis.eu, salaheddine.elayoubi@orange.com

Abstract—We focus in this paper on energy management strategies for a mobile network equipped with battery storage capacity as well as local energy production capability, and powered by a smart grid. At each time instant, the mobile network operator has to decide whether to operate its network on its own energy resources or on those of the smart grid, with the possibility of selling energy to the smart grid as well. We formulate our problem as a Markov Decision Process (MDP) and derive an optimal, offline policy which minimizes the operator's energy bill, using a dynamic programming algorithm. We show the optimality of our solution by numerical comparison with the solution obtained through linear programming. Our numerical applications allow us to further understand when the operator has an incentive to buy energy, whether it is beneficial for it to act as an energy seller, the size of the battery to deploy, as well as the robustness of our offline deterministic policy to estimation errors.

Index Terms—Energy expenditure minimization, Optimal offline policy, Markov decision process, Dynamic programming, Linear programming

I. INTRODUCTION

Smart grid networks have gained tremendous attention in the energy-oriented research community over the years. Many works tend to exploit the data collected from the different sensors in the network in order to optimize energy production and distribution, to increase the use of renewable power sources and, more generally, to save energy [1].

In the telecommunications field, resource-hungry applications have been continuously gaining popularity among users, driven by the development of smart devices such as smartphones and tablets. These users request intensive computing resources from the network operator, forcing it to draw large amounts of energy from the electrical grid, which presents, in turn, a challenge for the electricity provider.

One solution to overcome these issues is to equip the network operator, and especially its eNodeBs (eNBs), with batteries that can be charged directly from the smart grid or from the operator's own on-site renewable energy production, and which are able to provide the operator with the energy required for autonomous operation lasting, for instance, a few hours [2].

These batteries allow the operator to set an energy storage strategy that takes advantage of changing electricity prices and communication traffic load to reduce its energy acquisition costs and to lighten its demand on the energy provider during the critical hours of the day characterized by high user energy demand.

Energy storage techniques have been widely studied with the aim of improving the autonomy of systems and optimizing their power consumption.

The authors in [3] proposed an analytical model for optimizing battery storage management. They simplified the solution by setting thresholds on the energy price, allowing the operator to decide online whether to sell or buy energy. The achieved control is independent of the state of charge of the battery; the solution is not optimal and could lead to high energy losses.

In [4], the authors used linear programming to obtain a battery management strategy that maximizes the profit of an energy provider in a micro-grid¹ under the uncertainty of wind turbine production.

¹Micro-grids are modern, small-scale versions of the centralized electrical system. They have their own control capability, which means that they can disconnect from the traditional grid and operate autonomously.

The authors in [5] used dynamic programming at the energy provider side to determine the optimal battery storage management that satisfies a deterministic user load while reducing the energy losses as much as possible, assuming a Markovian distribution for the photovoltaic energy production. The paper also studied the optimal battery size that avoids energy losses.

In our work, we suppose that the serving eNB is equipped with a battery that can be charged using its own renewable energy sources or by buying energy from the energy provider, and discharged to serve its users' traffic requests or when it sells energy back to the smart grid, as illustrated in Fig. 1.

Our goal in this work is to reduce the daily energy expenditure of the network operator while serving the network subscribers' traffic load.

We formulate this problem as a Markov Decision Process (MDP). Then, using dynamic programming tools, we investigate offline solutions to find the optimal battery management policy based on prior knowledge of the environment: the network subscribers' traffic load and the unitary electricity price.

Our contributions are:
• Devising an optimal offline deterministic policy for the operator to decide whether to buy or sell energy, and whether to operate on its own battery or on energy from the smart grid.
• Comparing the above policy with a randomized one obtained via linear programming and showing that both yield the same performance, which proves the optimality of the deterministic one.


• Evaluating the impact of the battery size, the energy selling capability and the renewable energy on the average daily energy expenditure of the operator.
• Testing the robustness of our offline policy to errors in the estimation of user traffic, energy prices and battery state of charge (SOC).

The remainder of the paper is organized as follows. In Section II, we describe our system and model from the energy and network providers' points of view and explain the interactions between them. In Section III, we formulate the optimization problem which minimizes the average operator energy cost and show how to solve it using both dynamic and linear programming. In Section IV, we show and discuss our numerical results and analyze the robustness of the offline policy. Eventually, conclusions are given in Section V.

II. SYSTEM AND MODEL

Our system is composed of two players: a network operator and an energy provider, as shown in Fig. 1. These two players are in constant interaction, as the network operator needs to be constantly powered to be able to operate its network. This power comes from the energy provider, who is the principal actor in the smart grid.

Fig. 1: Interactions between the system players

The energy provider produces electricity to respond to the demands of its customers (houses, factories, etc.). This energy is sold at a price fixed by the grid provider which changes from one hour to another. We consider that electricity spot prices are announced by the smart grid one day in advance. The pricing system in this so-called day-ahead market is, in principle, determined by matching offers from generators to bids from consumers at each node in order to produce a classic supply and demand equilibrium price, typically on an hourly basis. This is the case, for instance, for EPEX SPOT SE, a European electricity trading market which indicates an estimated price of energy one day ahead based on demand and supply, so as to allow market participants to make their bids on electricity.

The network operator, as one of the biggest energy consumers in the grid, is informed in advance of the expected electricity prices so as to be able to set its energy purchase strategy. The real-time price, which we denote by Ft, changes during the day depending on the users' demand for electricity. Its value can differ from the day-ahead price. We assume a simple model in which the real price has a certain probability of being equal to the day-ahead price and other probabilities of being different.

The network provider has an infrastructure which allows it to satisfy its users' traffic demands. This infrastructure (eNBs, backhaul, etc.) needs to be continuously powered to ensure its operation. The operator's eNB consumes a fixed amount of energy for some basic features independent of the user traffic in the cell (cooling, energy storage, etc.) and a variable energy component, dependent on the users' activities (baseband processing, radio frequency components, etc.).

User traffic changes during the day. It depends on the user density in the covered area and the requested services. Let Ut denote the ratio in time slot t between the actual traffic demand and the maximal traffic the cell can handle in the peak hours. We suppose that the operator uses all its available capacity to serve its users when Ut = 1. Ut is a random variable whose values are taken from real statistics in [6]. The network operator has to ensure that all its user requests are satisfied; no request can be delayed or blocked.

The operator sites are equipped in addition with photovoltaic (PV) panels that produce a deterministic amount of renewable power Pt depending on the hour of the day. This energy is stored in the battery to be used later.

The operator equipment can be run using two types of power sources:
• It can be powered directly by the smart grid. The network operator pays for its consumption following the price fixed by the energy provider. We assume that the grid can satisfy all the operator's demands for electricity.
• It can be powered by its own battery. The amount of energy that can be consumed at each time slot is limited by the battery capacity Bmax. We assume that the use of the battery has no financial cost for the operator.

The operator can also store in its own battery the energy bought from the electricity provider. We assume that when the battery is charging, the network operator cannot use it to run its components. Thus, its infrastructure has to be powered in this case directly by the smart grid.

We assume also that the battery energy efficiencies during the charge and discharge processes are equal to 1. Thus, all the energy bought from the smart grid can be fully stored without any losses. We assume also that the network operator can sell energy back to the grid. We assume that the selling and purchase prices are equal and that the energy provider will never decline the offers of the network operator.

Time is divided into equal epochs of one hour each. We suppose that this duration is sufficient to fully charge or discharge the battery.

The battery level is discretized into L finite values in $[0, \frac{B_{max}}{L-1}, \dots, B_{max}]$. The battery state of charge (SOC) evolution over time is expressed as follows:

$$B_{t+1} = B_t + c_t B_{max} + T P_t - d_t \, n_{TX} \, T \, (P_0 + \Delta_p P_{out} U_t) \qquad (1)$$

where:
• ct·Bmax is the amount of energy that the operator buys (resp. sells) from (resp. to) the grid. ct takes its values from a finite set in [−1, ..., 1]. ct is bounded by −1 and 1 due to the fact that the operator cannot sell more than its battery reserves or charge its battery beyond its capacity. If ct is negative, the operator sells energy. If it is positive, it charges its battery and is at the same time powered by the smart grid to satisfy its users' demands.
• T is the observation interval duration.
• Pt is the renewable energy produced by the PV panels.
• dt is the action of switching between the battery and the smart grid. If dt = 0, the operator is powered by the smart grid. If dt = 1, the network infrastructure runs on its own battery.
• nTX is the number of sectors in the eNB.
• P0 is the power consumed by the eNB for traffic-independent features.
• Δp is the slope of the load-dependent power consumption.
• Pout is the power radiated by the eNB's antennas.
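For illustration, the following is a minimal Python sketch of the SOC update of equation (1). Energies are in kWh and powers in kW over a one-hour slot; the function name and default parameter values are our own choices (the power figures are those used later in the numerical section, and the battery size is arbitrary), not part of the model itself.

def next_soc(B_t, c_t, d_t, U_t, P_t,
             B_max=8.0,      # battery capacity (kWh), assumed value
             T=1.0,          # slot duration (hours)
             n_TX=6,         # number of sectors
             P0=0.1187,      # traffic-independent power per sector (kW)
             delta_p=5.32,   # slope of the load-dependent consumption
             P_out=0.020):   # radiated power (kW)
    """One-step battery SOC update following equation (1)."""
    # Energy bought (c_t > 0) or sold (c_t < 0), as a fraction of B_max.
    traded = c_t * B_max
    # Renewable energy harvested during the slot.
    harvested = T * P_t
    # Energy drawn from the battery only when the eNB runs on it (d_t = 1).
    consumed = d_t * n_TX * T * (P0 + delta_p * P_out * U_t)
    B_next = B_t + traded + harvested - consumed
    # The SOC must stay within [0, B_max]; out-of-range actions are infeasible.
    assert 0.0 <= B_next <= B_max, "infeasible action for this state"
    return B_next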


III. PROBLEM FORMULATION AND ANALYSIS

In this section, we formulate the problem of minimizing the operator's energy bill while satisfying the users' traffic requests. An unconstrained Markov Decision Process (MDP) is proposed and solved using dynamic and linear programming to devise, respectively, offline deterministic and randomized policies which optimize the operator's daily expenditure for energy.

We treat both cases: when the operator has PV sources and when it does not. Moreover, the network operator can be:
• An energy seller, which can sell its battery surplus to the smart grid to make a profit, especially when the energy prices are high. We suppose that the energy provider will never decline the offers of the operator.
• An operator without the possibility to sell energy. The quantities bought or produced are exclusively intended for its own consumption.

A. MDP Formulation

An MDP is a discrete-time state transition system. It is used for modeling decision making in situations where outcomes are partly random and partly under the control of a decision maker. It is composed of four components: states, actions, transitions and outcomes. This process has to verify the Markov property: the effects of an action taken in a state depend only on that state and not on the prior history.

Our system can be seen as an ergodic Markov chain leading to an MDP and can be solved using dynamic programming tools.

We define the following components for this MDP:
• State space: It denotes the possible states st the system can be in at any time t. st = (Bt, Ut, Ft), where Bt is the battery state of charge, Ut is the ratio between the actual user traffic load and the maximal one, and Ft is the unitary price of electricity. All these values are discrete and finite, which makes the whole system state space finite.
• Action space: It represents the actions that the operator can perform at the beginning of each time slot t. The operator chooses from a finite space of possible actions at ∈ Ωt(st) depending on the state st. at = (ct, dt), with ct the amount of energy to buy (resp. sell) from (resp. to) the smart grid and dt the decision of switching the power source between the battery and the smart grid. If the operator has the capability to be an energy seller, ct can take negative values; otherwise it can only be positive.
• Transition probabilities: They correspond to the probability of reaching state st+1 knowing that the system was in state st in the previous slot and that the operator performed action at:

$$P(s_{t+1} \mid s_t, a_t) = P(B_{t+1} \mid s_t, a_t) \, P(U_{t+1} \mid s_t, a_t) \, P(F_{t+1} \mid s_t, a_t). \qquad (2)$$

The battery state of charge at the beginning of slot t + 1, denoted by Bt+1, is completely known if we know the state st and the action at taken at time slot t. Thus, the battery transition probability is deterministic and can be expressed as follows:

$$P(B_{t+1} \mid s_t, a_t) = \begin{cases} 1 & \text{if } B_{t+1} = B_t + c_t B_{max} + T P_t - n_{TX} \, T \, d_t \, (P_0 + \Delta_p P_{out} U_t) \\ 0 & \text{otherwise} \end{cases} \qquad (3)$$

The energy prices and the user traffic load are assumed to be independent and uncorrelated in time. Thus:

$$P(F_{t+1} \mid F_t, a_t) = P(F_{t+1}) \quad \text{and} \quad P(U_{t+1} \mid U_t, a_t) = P(U_{t+1})$$

• Instantaneous reward (outcome): It refers to the money that the network operator spends or earns with respect to the current state st and the action at it performs at time slot t (a small code sketch of this cost is given at the end of this subsection):

$$r_t(s_t, a_t) = F_t \left[ c_t B_{max} + T (1 - d_t) \, n_{TX} \, (P_0 + \Delta_p P_{out} U_t) \right] \qquad (4)$$

The algorithm applied to this MDP aims to find an optimal, deterministic, offline policy, denoted by μ̄*, which at each time slot t and for each state s defines a unique action a to be performed by the operator among all possible actions Ωt(s) in that state. To devise this policy, the operator needs perfect knowledge of the distribution of the random parameters in its environment: the energy prices and the user traffic requests in our case. This could be achieved through a learning phase from previous experiments.

The policy is applied at the beginning of each hour over a whole day. We define N = 24 to be the width of the optimization window.

Let μ̄ denote the mapping between the state and action spaces; it can be expressed as μ̄ = (μ0, μ1, ..., μN−1), with μt : S → A and a = μt(s).

The optimal policy μ̄* solves the following finite horizon average cost problem:

$$\bar{\mu}^* = \underset{\bar{\mu}}{\operatorname{argmin}} \; \frac{1}{N} \sum_{t=0}^{N-1} r_t(s_t, \mu_t(s_t)) \qquad \text{s.t.} \;\; 0 \le B_t \le B_{max} \;\; \forall t \in [1, .., N] \qquad (5)$$


The objective is, again, to decide at each time slot the energy source (grid or battery) and the amount of energy to buy or sell in order to minimize the operator's average daily expenditure for energy while satisfying the user traffic demand and the battery level limitations.

B. Offline Dynamic Programming Algorithm

The dynamic programming technique uses stochastic optimization algorithms to restrict the search for the optimal policy that solves the MDP. The aim is to compute a policy which describes how to act optimally in the face of uncertainty. It consists in breaking a complex problem into sequential sub-problems, solving them individually and combining their solutions to obtain the global one [8]. In each stage, dynamic programming makes decisions based on all the decisions made in the previous stages, and may reconsider the previous stage's algorithmic path to the solution.

In our case, the goal of this algorithm is to find, for our ergodic finite horizon Markov chain, an optimal, deterministic offline policy which solves the average cost problem given in equation (5) [8].

The optimization is based on the Bellman equations, which compute the optimal path to follow starting from any initial state distribution at t = 0. The minimization problem of this MDP is handled by solving the dynamic programming equations iteratively.

Specifically, the objective is to find, at each step k ∈ [0, .., N−1], the optimal value function that defines the optimal cost to go from any state s ∈ S at time step k to the final state:

$$V^{(k)}(s) = \min_{\mu_k} \; \mathbb{E}^{\bar{\mu}} \left[ \frac{1}{N-k} \sum_{t=k}^{N-1} r_t(s, \mu_t(s)) \right] \qquad (6)$$

For s0 denoting the initial state at t = 0, J* = V^(0)(s0) is the optimal cost to go from the initial state to the final one in N steps. J* can be found by applying backward recursion, assuming that V^(N)(s) = rN(s), where rN is the end cost obtained at step N. In the algorithm, we took it to be null for all states.

The optimal policy μ̄* is obtained by computing, through backward recursion, the following two relationships (a short code sketch is given at the end of this subsection):
• Policy improvement equation: It consists in finding, at each step k and for each state s ∈ S, the optimal action μk(s) to perform over all possible actions a ∈ Ωk(s) which minimizes the cost to go for the remaining N − k transitions:

$$\mu_k(s) = \underset{a \in \Omega_k(s)}{\operatorname{argmin}} \left[ r_k(s, a) + \sum_{s' \in S} p(s' \mid s, a) \, V^{(k+1)}(s') \right] \qquad (7)$$

• Policy evaluation equation: It consists in evaluating the cost to go at each state s ∈ S from step k to N after choosing the policy μk to perform:

$$V^{(k)}(s) = r_k(s, \mu_k(s)) + \sum_{s' \in S} P\left(s' \mid s, \mu_k(s)\right) V^{(k+1)}(s') \qquad (8)$$

The obtained optimal policy μ̄* = (μ0, ..., μN−1) is deterministic, as at each state s there is one unique action a to be executed. It minimizes V^(0)(s0) regardless of the choice of the initial state s0 [9].
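The backward recursion of equations (7) and (8) can be sketched in a few lines of Python; the helper signatures (actions_of, reward, transition) are placeholders we introduce for this sketch and are not defined in the paper.

def backward_recursion(states, actions_of, reward, transition, N=24):
    """Finite-horizon backward recursion implementing equations (7) and (8).

    states              : iterable of hashable states s
    actions_of(t, s)    : feasible actions at slot t in state s
    reward(t, s, a)     : instantaneous cost r_t(s, a)
    transition(t, s, a) : dict {s': P(s' | s, a)} for slot t
    Returns the policy mu[t][s] and the value function V[t][s].
    """
    V = [{s: 0.0 for s in states} for _ in range(N + 1)]  # V^(N)(s) = 0 (null end cost)
    mu = [dict() for _ in range(N)]
    for k in range(N - 1, -1, -1):               # backward in time
        for s in states:
            best_a, best_q = None, float("inf")
            for a in actions_of(k, s):
                # r_k(s, a) + sum_{s'} P(s' | s, a) V^(k+1)(s')
                q = reward(k, s, a) + sum(p * V[k + 1][sp]
                                          for sp, p in transition(k, s, a).items())
                if q < best_q:
                    best_a, best_q = a, q
            mu[k][s] = best_a        # policy improvement, equation (7)
            V[k][s] = best_q         # policy evaluation, equation (8)
    return mu, V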
C. Linear Programming Algorithm

In this subsection, we convert the problem of equation (5) into a constrained linear program which can be solved by classical linear optimization tools.

The linear programming algorithm aims at finding a randomized policy that solves the MDP given by equation (5). In order to do this, the average cost problem is converted into a linear expression for which we have to find the occupation measures of all the state/action couples (s, a) at each time step t [9]. The occupation measures at time slot t, denoted by βt(s, a), refer to the probabilities that the system reaches state s and performs action a in time slot t.

Having an initial state occupancy denoted by γ at time slot t = 0, we have to find, at each time slot t ∈ [1, .., N], the occupation measures of all the states and actions that minimize the objective function. The solution given by linear programming is optimal [9] but depends closely on the initial state distribution, contrary to the solution given by dynamic programming.

The optimization problem can be expressed as follows:

$$\begin{aligned}
\min_{\beta} \;\; & \frac{1}{N} \sum_{t=0}^{N-1} \sum_{(s,a)} \beta_t(s, a) \, r_t(s, a) \\
\text{s.t.} \;\; & \sum_{(s,a)} \beta_t(s, a) = 1 \quad \forall t \in [0, .., N-1] \\
& \sum_{a'} \beta_{t+1}(s', a') = \sum_{(s,a)} p_t(s' \mid s, a) \, \beta_t(s, a) \quad \forall s' \in S, \; \forall t \in [0, .., N-1] \\
& \beta_t(s, a) = 0 \;\; \text{if} \;\; a \notin \Omega_t(s) \\
& \sum_{a} \beta_0(s, a) = \gamma(s) \quad \forall s \in S
\end{aligned} \qquad (9)$$

The first line above represents the objective of minimizing the daily expenses of the operator for energy. The first constraint ensures that the sum of the occupation measures over all couples (s, a) is equal to 1 at every time slot. The second constraint sets the relationship between the occupation measures and the transition probabilities of the Markov chain. In the third constraint, we eliminate the actions that cannot be performed in state s: charging the battery beyond its capacity, or running on the battery when the energy consumption is higher than the battery's actual state of charge. The last constraint ensures that the system is in state s0 at time slot t = 0 with respect to the initial distribution γ.

Having, at each t ∈ [0, ..., N−1], the optimal occupation measures βt(s, a), we can devise a randomized optimal policy μ̄r. It gives, ∀t ∈ [0, .., N−1] and ∀s ∈ S, the probability ρts(a) of performing action a when the system reaches state s at time slot t. This probability can be expressed as follows:

$$\rho_t^s(a) = \frac{\beta_t(s, a)}{\sum_{a'} \beta_t(s, a')} \qquad (10)$$

Comparing the performances given by the deterministic and randomized policies provides a proof of the optimality of the deterministic offline policy [9]. We show this comparison next, in the numerical results section.
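Once the linear program (9) is solved with any off-the-shelf LP solver, the randomized policy of equation (10) follows by a simple normalization of the occupation measures; a minimal sketch, assuming beta_t is a dictionary mapping (s, a) pairs to their optimal occupation measures (the variable names are ours):

import random
from collections import defaultdict

def randomized_policy(beta_t):
    """Turn occupation measures beta_t[(s, a)] into action probabilities rho[s][a] (equation (10))."""
    totals = defaultdict(float)
    for (s, a), b in beta_t.items():
        totals[s] += b
    rho = defaultdict(dict)
    for (s, a), b in beta_t.items():
        if totals[s] > 0:
            rho[s][a] = b / totals[s]
    return rho

def sample_action(rho, s):
    """Draw an action in state s according to the randomized policy."""
    actions, probs = zip(*rho[s].items())
    return random.choices(actions, weights=probs, k=1)[0]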


IV. NUMERICAL RESULTS

In our simulations, time is divided into slots of 1 hour each and the optimization is done on a daily basis, N = 24. We choose 3 battery sizes: 4, 8 and 12 kWh. We take L states of battery charge depending on the battery capacity Bmax: L = 21 for Bmax = 4 kWh, L = 41 for Bmax = 8 kWh and L = 61 for Bmax = 12 kWh.

At the beginning of each time slot, after calculating the new battery state of charge using equation (1), we project it onto the battery state space following the rule given by equation (11):

$$\begin{aligned}
&\forall \alpha \in [1, .., L-2], \;\; \text{if } B_{t+1} \in \Big[(2\alpha-1)\tfrac{B_{max}}{2(L-1)}, \, (2\alpha+1)\tfrac{B_{max}}{2(L-1)}\Big[ \;\; \text{then } B_{t+1} = \alpha \tfrac{B_{max}}{L-1} \\
&\text{If } \alpha = 0, \;\; B_{t+1} \in \Big[0, \, \tfrac{B_{max}}{2(L-1)}\Big[ \;\; \text{then } B_{t+1} = 0 \\
&\text{If } \alpha = L-1, \;\; B_{t+1} \in \Big[B_{max} - \tfrac{B_{max}}{2(L-1)}, \, B_{max}\Big] \;\; \text{then } B_{t+1} = B_{max}
\end{aligned} \qquad (11)$$
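This projection amounts to rounding Bt+1 to the nearest of the L discrete battery levels; a minimal sketch (the function name and defaults are ours):

def project_soc(B_next, B_max=4.0, L=21):
    """Round a continuous SOC value onto the L-level grid of equation (11)."""
    step = B_max / (L - 1)                # spacing between discrete levels
    alpha = round(B_next / step)          # index of the nearest level
    alpha = min(max(alpha, 0), L - 1)     # clamp to [0, L-1]
    return alpha * step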
The distributions of the user traffic and the electricity prices follow a trinomial distribution with one trial. We have based our model on the European trading market EPEX type of pricing for electricity, which indicates an estimated price of energy one day ahead based on demand and supply, so as to allow energy market participants to make their bids on electricity. As this day-ahead price might differ from the actual one, we added a 0.15 probability for it to be a little higher or a little lower.

The same argument holds for the user traffic. The average user traffic model is taken from the model presented in [6]. The probability that the user traffic is equal to the average value is 0.5, and the probabilities for it to be equal to the value below or the value above are equal to 0.25 each.

Our algorithm is nevertheless optimal for any distribution which respects the Markov criterion and the ergodicity of the Markov chain (no absorbing states, no periodicity in the transitions between the states).
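For illustration, one hourly realization of the price (or of the traffic) can be drawn from such a three-point distribution as follows; the function name and the size of the deviation delta are our own placeholders, while the probability splits are the ones stated above.

import random

def sample_three_point(forecast, delta, p_mid, p_side):
    """Draw from a three-point distribution centered on the forecast value.

    With probability p_mid the forecast is exact; with probability p_side each,
    the realized value is delta above or below it (p_mid + 2 * p_side = 1).
    """
    return random.choices(
        [forecast - delta, forecast, forecast + delta],
        weights=[p_side, p_mid, p_side], k=1)[0]

# Realized hourly price around the day-ahead forecast (probabilities 0.15 / 0.70 / 0.15):
#   price = sample_three_point(day_ahead_price, delta=price_step, p_mid=0.70, p_side=0.15)
# Realized traffic ratio around the average profile (probabilities 0.25 / 0.50 / 0.25):
#   traffic = sample_three_point(avg_traffic, delta=traffic_step, p_mid=0.50, p_side=0.25)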
For the case when the network operator is equipped with renewable resources, we take the model in [7]. The renewable energy production was generated from data collected at the Angstrom Laboratory in Uppsala, Sweden, from January 1 to December 31, 2011. We take Pt as the power produced by photovoltaic panels of 4 m² at each hour of the day, based on the results of [7].

The energy consumption model of the macro base stations is also taken from [6]. We suppose that P0 = 118.7 W, Δp = 5.32, nTX = 6 and that the eNB is transmitting with its maximal power Pout = 20 W. Following this model, the macro base station consumes about 1.3 kWh at maximal capacity and about 0.7 kWh when there is no traffic at all.
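These two figures follow directly from the load-dependent term of equation (1) evaluated over a one-hour slot:

$$n_{TX}\,T\,(P_0 + \Delta_p P_{out} U_t) = \begin{cases} 6 \times 1\,\text{h} \times 118.7\,\text{W} \approx 0.71\,\text{kWh}, & U_t = 0 \\ 6 \times 1\,\text{h} \times (118.7 + 5.32 \times 20)\,\text{W} \approx 1.35\,\text{kWh}, & U_t = 1 \end{cases}$$

i.e., roughly 0.7 kWh per hour with no traffic and about 1.3 kWh per hour at full load.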
The action space is composed of 12 possible actions if the operator is an energy seller, and 7 if not. The operator can buy energy from the grid in predefined portions ct of its maximal battery size. If the operator is an energy seller, then ct ∈ {−1, −0.8, −0.6, −0.4, −0.2, 0, 0.2, 0.4, 0.6, 0.8, 1}; otherwise ct ∈ {0, 0.2, 0.4, 0.6, 0.8, 1}.

As indicated above, if ct ≥ 0 and dt = 0, the operator is powered by the grid and can also buy energy to charge its battery. If ct = 0 and dt = 1, the operator is using its own battery.

A. Comparing Dynamic and Linear Programming Solutions

In the first experiment, the operator has the capability to sell energy to the grid but has no renewable energy sources. We compute the average daily amount of money spent by the operator under the deterministic policy (obtained by dynamic programming) and under the randomized policy (obtained by linear programming).

We compare the cases when the operator has a battery and when it has no battery and is powered exclusively by the smart grid. The results are shown in Fig. 2.

Fig. 2: Average daily money spent by the operator for energy acquisition for different battery sizes under deterministic and randomized policies

From Fig. 2, we observe that the randomized and the deterministic policies achieve the same energy acquisition cost. This confirms that the proposed dynamic programming based policy is optimal and minimizes the average cost problem over the finite horizon.

We notice further that the daily energy expenses are lower when the operator is equipped with a battery. The gap increases with larger battery size, as it gives more flexibility to the operator in buying and/or selling energy.

B. Effect of the battery size

In Figs. 3 and 4, we show the evolution of the battery state of charge (SOC) for the cases when the operator is able to sell energy back to the grid and when it is not. The SOC is averaged over all possible electricity prices and user traffic values that could occur at each hour of the day. The latter are shown above each of these figures.


Fig. 3: SOC evolution for an operator able to sell energy (average battery SOC in kWh over the day hours for Bmax = 4, 8 and 12 kWh; average user traffic load shown above)

In Fig. 3, when the operator is able to sell energy to the grid, we note that the shape of the SOC evolution is the same irrespective of the battery size. We see that the charging and discharging processes happen at exactly the same hours. This comes from the fact that the operator takes the same decisions on buying and selling, but with larger energy portions when its battery size is larger. We notice also that having a battery of 12 kWh gives the operator the opportunity to sell more energy during high price periods.

We notice that the achieved policy, quite naturally, indicates to operate on the battery and sell energy when the prices are at local maxima, and to take advantage of local price minima to charge the battery while staying powered by the grid. The battery of 4 kWh allows the same operation, but in smaller proportions.

Fig. 4: SOC evolution for an operator not able to sell energy (average battery SOC in kWh over the day hours for Bmax = 4, 8 and 12 kWh; average user traffic load shown above)

In Fig. 4, when the operator is not able to sell energy to the grid, we notice a small difference in the shape of the SOC curve depending on the battery size at 7 pm, when the operator with the 4 kWh battery was forced to charge it to be able to sustain its demands during the following hours, while for the other two battery sizes the operator had enough reserves in its battery to use it when the prices become higher between 8 and 10 pm.

In fact, as the operator is not able to sell energy back to the grid, the amounts of energy collected are intended only for its own consumption. Thus, a larger battery size allows the operator to rely more on its battery and avoid being powered by the grid. Conversely, small battery sizes do not allow long periods of self autonomy.

C. Impact of Energy Seller Capability and Renewable Energy

In this subsection, we run our optimal policy under all cases: with and without PV, and with and without energy selling capability. The results are shown in Fig. 5.

Fig. 5: Average money spent per day for different battery sizes under different scenarios (with/without selling capability, with/without PV)

We notice that, for all battery sizes, the operator having the ability to be an energy seller is able to significantly reduce its energy expenses compared to the case when it is powered exclusively by the grid, especially for large battery sizes. When the battery size is low, the operator cannot afford long periods of self autonomy, and so having the capability of selling energy back to the grid does not have a significant impact on its daily energy expenditure.

Having renewable energy resources allows the operator to reduce its energy costs, especially at low battery size, as this additional energy source allows the operator to reduce the amount of energy purchased from the grid.

Fig. 6 details the purchase and, if applicable, selling actions taken by the operator for the cases when it can and cannot sell energy to the grid. We show the average amount of money spent by the operator at each hour of the day. The battery is set to 8 kWh, which allows the operator 6 hours of self autonomy at high user traffic load. We treat the case when the operator is not equipped with PV panels.

Fig. 6: The energy seller capability effect on the optimal dynamic programming policy (hourly expenditure for an operator able and not able to sell, Bmax = 8 kWh; average user traffic load shown above)


We observe that the energy seller buys more energy in the hours when it has to charge its battery, in order to have a surplus allowing it to take advantage of opportunities to sell energy later at higher prices. On the other hand, not being able to sell forces the operator to keep its surplus for future use.

Next, we show in Fig. 7 the impact of the presence of renewable energy sources on the decisions taken by the operator.

Fig. 7: The renewable energy production effect on the optimal dynamic programming policy (hourly actions with and without PV, Bmax = 8 kWh; average user traffic load and renewable energy production in Wh/m² shown above)

We notice that, except for the period when the renewable energy production is high, that is between 10 am and 6 pm, the actions taken in the two cases are quite similar. Between 10 am and 6 pm, when the energy prices are quite high, the operator having photovoltaic panels exploits its production to sell it to the grid while staying powered by the battery, and hence makes more profit. The renewable production in this period of the day is not enough to completely power the operator's infrastructure, which forces it to buy energy from the grid to satisfy its needs.

D. Error robustness

In this section, we study the robustness of our offline deterministic policy in the case where errors can occur in the estimation of the user traffic and the energy prices.

For this, we introduce a white Gaussian disturbance on the traffic and electricity prices while varying the standard deviation of the noise. The estimation errors on the traffic and the energy prices are expressed as follows:

$$e_t = \mathcal{N}(0, 1) \times \bar{E}_t \quad \text{and} \quad e_m = \mathcal{N}(0, 1) \times \bar{E}_m \qquad (12)$$

We vary Ēt and Ēm to study the effect of the estimation error on the money spent by the operator.
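A minimal sketch of this perturbation, assuming that Ēt and Ēm act as the standard deviations applied to the hourly traffic and price forecasts; the function name and the clipping of the perturbed values to their valid ranges are our own additions.

import numpy as np

def perturb_forecasts(traffic, prices, E_t, E_m, rng=None):
    """Add white Gaussian estimation errors (equation (12)) to the hourly forecasts."""
    rng = rng or np.random.default_rng()
    traffic = np.asarray(traffic, dtype=float)
    prices = np.asarray(prices, dtype=float)
    noisy_traffic = traffic + rng.standard_normal(traffic.shape) * E_t
    noisy_prices = prices + rng.standard_normal(prices.shape) * E_m
    # Keep the perturbed values physically meaningful (our addition, not in the paper):
    # traffic ratios in [0, 1], prices non-negative.
    return np.clip(noisy_traffic, 0.0, 1.0), np.maximum(noisy_prices, 0.0)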
We compare in Fig. 8 the average money spent by the operator on an hourly basis for different values of the error in the electricity price estimation, for an operator not able to sell energy back to the grid and equipped with a battery of 8 kWh.

Fig. 8: Electricity price estimation error effect on the average operation costs (hourly costs with no estimation error and with errors of 5, 10, 15 and 20 ¢)

We notice that the biggest gaps in energy expenses are obtained when the operator buys large amounts of energy (at 5 am and 2 pm), which also corresponds to local minima of the electricity prices during the day. These gaps are due to the errors in the estimated price.

In Fig. 9, we vary the error in estimating the user traffic. If the operator has to serve more traffic than it expects and the energy stored in the battery does not allow it to do so, it has to buy the remaining energy from the grid.

Fig. 9: User traffic estimation error effect on the average operation costs (hourly costs with no estimation error and with 10%, 20% and 30% traffic estimation errors)

Fig. 9 shows that modifying the traffic load does not significantly influence the expenditure. The biggest gap, observed at 2 pm, occurs when the operator decides to buy energy. In fact, the battery size is relatively large compared to the energy deviation that can be caused by the error in traffic estimation.

Figs. 10 and 11 show the error in daily energy expenditure, normalized to the no-error case, for different battery sizes as a function of the standard deviation of the estimation error of the energy prices and the user traffic, respectively. We observe that, for the case of error in traffic estimation, the normalized difference remains roughly the same for different values of errors and for all battery sizes.

On the other hand, we observe that the normalized error in the daily expenses is sensitive to the energy price estimation error, especially for larger battery sizes.


Fig. 10: Effect of the electricity price estimation error on the variance of the daily energy expenditure (normalized error on daily energy cost vs. error standard deviation in ¢, for Bmax = 4, 8 and 12 kWh)

Fig. 11: Effect of the user traffic estimation error on the variance of the daily energy expenditure (normalized error on daily energy cost vs. error standard deviation, for Bmax = 4, 8 and 12 kWh)

This normalized error can be in favor of or against the operator. Even when the estimation error leads to an increase in the expenses, the operator can still save money compared to the case when it is powered exclusively by the grid, as shown in Fig. 5.

Fig. 12 shows the effect of the discretization of the battery states on the normalized error in the daily operator energy expenditure. We assume that there are no estimation errors on the user traffic and the energy prices.

Fig. 12: Effect of the number of battery states on the variance of the daily energy expenditure (normalized error on daily energy cost vs. number of battery states, for Bmax = 4, 8 and 12 kWh)

We notice that the efficiency of our policy improves when increasing the number of battery SOC states. We note also that as the battery size increases, our policy needs to consider more states in order to achieve an acceptable error (< 0.01) in the normalized energy expenditure. Considering 21 states can be enough for a battery size of 4 kWh, while for a battery of 12 kWh we need to consider 61 states to achieve a robust policy.

V. CONCLUSION

In this paper, we addressed the issue of energy management strategies for a telecommunication operator equipped with a storage battery as well as energy production capability, and powered by a smart grid. The aim of the operator is to minimize its energy bill while serving its customers' requests.

We modeled the problem using an MDP to which we applied an offline dynamic programming strategy on a finite horizon. Both deterministic and randomized offline policies, based respectively on dynamic and linear programming, have been developed, achieving the same optimal objectives.

These two strategies benefit from prior knowledge of the distributions of the users' traffic requests and of the unitary energy price to make optimal energy storage decisions. The operator can therefore opt to recharge its battery or power its equipment from the smart grid when the energy prices are low. Conversely, the operator might decide to use its own energy reserves when energy prices are high. Eventually, the operator can act as an energy seller in the smart grid and take advantage of high prices during certain hours to sell its energy surplus to the grid.

Our results show that the proposed strategy optimizes the energy expenditure of the operator compared to the case when it is exclusively powered by the grid, and is robust against errors in the estimation of user traffic and energy prices. Furthermore, the robustness can be improved by increasing the state space considered in the dynamic programming algorithm, which does not affect much the running time of the process.

REFERENCES

[1] M. Erol-Kantarci and H. T. Mouftah, "Energy-efficient information and communication infrastructures in the smart grid: A survey on interactions and open issues", IEEE Communications Surveys and Tutorials, vol. 17, pp. 179-197, 2015.
[2] Z. M. Fadlullah, Y. Nozaki, A. Takeuchi, and N. Kato, "A survey of game theoretic approaches in smart grid", IEEE International Conference on Wireless Communications and Signal Processing (WCSP), pp. 1-4, 2011.
[3] J. Qin, R. Sevlian, D. Varodayan, and R. Rajagopal, "Optimal electric energy storage operation", IEEE Power and Energy Society General Meeting, vol. 7, pp. 6-8, 2012.
[4] P. Mahat, J. E. Jimenez, E. R. Moldes, S. I. Haug, I. G. Szczesny, K. E. Pollestad, and L. C. Totu, "A micro-grid battery storage management", IEEE Power and Energy Society General Meeting, pp. 1-5, 2013.
[5] S. Grillo, A. Pievatolo, and E. Tironi, "Optimal storage scheduling using Markov decision processes", IEEE Transactions on Sustainable Energy, vol. 7, pp. 755-764, 2016.
[6] G. Auer, V. Giannini, C. Desset, I. Godor, P. Skillermark, M. Olsson, M. A. Imran, D. Sabella, M. J. Gonzalez, O. Blume, and A. Fehske, "How much energy is needed to run a wireless network?", IEEE Wireless Communications, vol. 18, pp. 40-49, 2011.
[7] J. Munkhammar and J. Widén, "A flexible Markov-chain model for simulating demand side management strategies - applications to distributed photovoltaics", in Proceedings of the World Renewable Energy Forum (WREF), 2012.
[8] D. P. Bertsekas, "Dynamic Programming and Optimal Control", Athena Scientific, 4th edition, 2005.
[9] E. Altman, "Constrained Markov Decision Processes", Chapman and Hall, 1999.
