
A. Introduction

In recent years Wireless Sensor Networks (WSNs) have become an active research area due to the wide range of applications in which they can be used. A WSN consists of so-called “nodes”, as shown in figure 1. Modern WSNs can consist of a few hundred to several thousand nodes. There are two types of nodes: sensor nodes and sink nodes. A sensor node can have one or more sensors connected to it; its purpose is to collect data from the sensors and forward it to the sink node. The sink node is the point in the WSN where data can be accessed by the outside world.
Fig. 1 Representation of a Wireless Sensor Network

Each node in a WSN has a limited battery life, and access to a node for recharging or replacement can be difficult or impossible. Routing algorithms should therefore be aware of energy consumption so as to maximize the lifetime of the network. Reinforcement Learning (RL) is a good approach for adapting to dynamic network conditions, and it can also help in the construction and updating of routing tables, through which the WSN can take into account all the dynamic parameters that define the traffic. This chapter is a survey of some energy-aware routing algorithms for WSNs that use RL.

B. RSSI/Energy-CC Algorithm

In [1] a multi-agent reinforcement learning algorithm for wildfire monitoring is proposed. For this algorithm, the WSN is organized as a multi-hop mesh cooperative structure: multiple nodes are grouped together and work as a single entity (denoted CN) when forwarding data from the source node to the sink. The network structure is shown in figure 2.

Fig. 2 Multi-hop mesh cooperative structure

The main idea of the RSSI/Energy-CC algorithm is that the nodes in a CN group are considered opponents of each other: they compete over who will forward the data next. Only one node is selected to forward the data; the rest of the nodes help with forwarding in case of failure and monitor the arrival of the packet at the next CN group.
Each node maintains a Q-value for itself and for its cooperative neighbors, representing the payoff that would have been received if that node had been selected to forward the data while the other nodes were selected to monitor the packet. The node with the highest payoff is then selected to forward the data to the next CN group.

After the packet arrives at the next CN group, each node in the current group receives an immediate reward from the environment. If the packet transmission was successful, each node in the CN is updated according to its energy consumption compared to its neighbors. In case of a failure, the node chosen to forward the packet receives a negative reward, and the other nodes are updated according to their link quality.
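The forwarder-selection step described above can be sketched as follows; the node identifiers, Q-values, and tie-breaking behavior are illustrative assumptions, not details taken from [1]:

```python
# Illustrative sketch: assigning roles inside a CN group.
# q_values maps node id -> estimated payoff for acting as forwarder.

def assign_roles(q_values):
    """Pick the node with the highest payoff as the forwarder;
    the remaining nodes monitor the transmission."""
    forwarder = max(q_values, key=q_values.get)
    monitors = [n for n in q_values if n != forwarder]
    return forwarder, monitors

# Node n2 has the highest payoff, so it forwards; n1 and n3 monitor.
roles = assign_roles({"n1": 0.42, "n2": 0.57, "n3": 0.31})
```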
State
S_i = {k}, where k = {CN_{n-1}, CN_n, CN_{n+1}}

Action
A_i = {a_f, a_m}

where a_f represents the forwarding of the packet and a_m the monitoring of the forwarded packet.
Reward

r_i = (∑_{j∈CN_n} E_j) / N_{CN_n} − E_i    (1)

r_i = (∑_{j∈CN_n} E_j) / N_{CN_n} − min_{j∈CN_n} E_j − σ · RSSI_{i,CN_{n+1}} / RSSI_{CN_n,CN_{n+1}}    (2)

Equation 1 is used to calculate the reward in the case of successful forwarding, where E_i is the energy consumed by node i of group CN_n and N_{CN_n} is the number of nodes in the group. Equation 2 is used in the case of unsuccessful forwarding; the parameter σ takes the value 1 for the node that failed to forward the data and 0 for the rest of the nodes.
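A minimal sketch of the two reward cases following Equations 1 and 2; the function names and example energy values are illustrative assumptions:

```python
def reward_success(energies, e_i):
    """Eq. 1: mean group energy minus node i's consumed energy."""
    return sum(energies) / len(energies) - e_i

def reward_failure(energies, sigma, rssi_i_next, rssi_group_next):
    """Eq. 2: mean group energy minus the group minimum, minus an
    RSSI-ratio penalty applied only to the failed node (sigma = 1)."""
    return (sum(energies) / len(energies) - min(energies)
            - sigma * rssi_i_next / rssi_group_next)

# A node that consumed exactly the group mean receives zero reward:
r1 = reward_success([2.0, 4.0, 6.0], 4.0)   # 0.0
# The failed node (sigma = 1) is penalized by its link-quality ratio:
r2 = reward_failure([2.0, 4.0, 6.0], 1, 1.0, 2.0)   # 1.5
```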
Q-value update

Q^{t+1}_{i',i}(s^t_i, a^t_f, a^t_m) = (1 − α) Q^t_{i',i}(s^t_i, a^t_f, a^t_m) + α (r^{t+1}_i(s^{t+1}_i) + γ ω^{st}_i CN(s^t_i) + γ ω^{st}_j CN(s^t_j))

where i is the node that was selected to forward the data, i' are the rest of the nodes in the CN group that monitor the forwarding, j is the node from CN_{t+1} selected to continue forwarding the data, ω^{st}_i and ω^{st}_j are factors that weight the total payoff in CN_t and CN_{t+1}, and CN(s^t) is the maximum payoff of the CN group.
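The update rule can be sketched in code; the learning rate, discount factor, and the precomputed CN(s) maximum-payoff terms below are illustrative assumptions:

```python
def q_update(q_old, reward, cn_payoff_current, cn_payoff_next,
             w_i, w_j, alpha=0.1, gamma=0.9):
    """One step of the cooperative Q-value update:
    Q <- (1-alpha)*Q + alpha*(r + gamma*w_i*CN(s_i) + gamma*w_j*CN(s_j)).
    cn_payoff_current/next stand for the maximum payoffs CN(s_i), CN(s_j);
    w_i, w_j are the weighting factors from the equation above."""
    target = (reward
              + gamma * w_i * cn_payoff_current
              + gamma * w_j * cn_payoff_next)
    return (1 - alpha) * q_old + alpha * target
```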
C. Multi-Agent Framework for Packet Routing in WSN

In [4] a multi-agent framework for packet routing in WSNs is proposed. The framework enables each sensor node to create and maintain a list of cooperative neighbors based on the node's past routing experiences. The framework is not an actual routing protocol, but rather a tool that can assist existing routing algorithms.
Table: Comparison of the surveyed algorithms (columns: Algorithm, Article, States, Actions, Reward/Cost)
Fig. 3 The two-layer architecture
The framework uses a two-layer architecture, as shown in figure 3. The first layer is the real physical wireless sensor network: it consists of nodes connected to each other by some sort of wireless medium. In the second layer, each node is modeled as an agent, and the agents are connected to each other by cooperation relations. The second layer is an abstract network that does not physically exist; it is formed on the basis of the nodes' past cooperation. For example, considering figure 3, if many packets sent from node 1 are forwarded through node 7, node 1 may decide to add node 7 to the list of its cooperative neighbors. Then, in the future, when node 1 has packets to send, it will first send them to node 7, and node 7 will forward them onward to the destination on behalf of node 1. In this way, the second layer is used to guide the packet routing process of the first layer.
The reinforcement learning approach is used in two places in this framework. The first is when a node must decide whether or not to add another node to its set of cooperative neighbors. The second is when a node has packets to send and must decide which cooperative partner to choose to forward the data. Each node can add and remove nodes from its cooperative neighbor set autonomously and independently.
If node v_i wants to add another node v_j as a cooperative partner, it should consider several factors that will influence the behavior of the network. Adding v_j increases the storage consumption of node v_i and the energy consumption of node v_j, but at the same time it can reduce the energy consumption of other nodes and the routing delay of the network. All these factors are used to calculate the reward of adding node v_j as v_i's cooperative partner. The learning process is done with the Q-learning algorithm using the ε-greedy exploration method.
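The add-or-not decision can be sketched as ε-greedy selection over the two actions; the reward shaping (combining the storage, energy, and delay changes) and all weights are assumptions for illustration, not the exact formulation of [4]:

```python
import random

def epsilon_greedy(q, epsilon=0.1):
    """Pick 'add' or 'skip' for a candidate neighbor: explore with
    probability epsilon, otherwise exploit the higher Q-value."""
    if random.random() < epsilon:
        return random.choice(["add", "skip"])
    return max(q, key=q.get)

def combined_reward(d_storage, d_energy, d_delay, w=(1.0, 1.0, 1.0)):
    """Illustrative reward: adding a partner costs storage and energy,
    but a reduced network delay (negative d_delay) raises the reward."""
    return -w[0] * d_storage - w[1] * d_energy - w[2] * d_delay
```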
When deciding which cooperative neighbor to choose to forward packets, node v_i should consider similar factors. Generally, v_i prefers to forward packets through cooperative neighbors that have more energy; it also prefers neighbors with a small normalized coverage area and a short distance to the sink. Node v_i uses a direct policy search algorithm to learn a stochastic policy for choosing the cooperative neighbor.
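One way to realize such a stochastic policy is a softmax over per-neighbor preference scores; the unit-weight scoring below (favoring residual energy, penalizing coverage area and distance to sink) is an illustrative assumption, not the policy parameters learned in [4]:

```python
import math

def neighbor_probs(neighbors):
    """neighbors: {id: (energy, coverage, dist_to_sink)}. Higher energy
    raises a neighbor's score; larger coverage and distance lower it.
    The softmax turns the scores into a stochastic forwarding policy."""
    scores = {n: e - c - d for n, (e, c, d) in neighbors.items()}
    z = sum(math.exp(s) for s in scores.values())
    return {n: math.exp(s) / z for n, s in scores.items()}

# The neighbor with more residual energy gets a higher forwarding probability.
probs = neighbor_probs({"a": (3.0, 1.0, 1.0), "b": (1.0, 1.0, 1.0)})
```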
