
Received October 5, 2019, accepted October 18, 2019, date of publication October 24, 2019, date of current version November 7, 2019.
Digital Object Identifier 10.1109/ACCESS.2019.2949465

Social Network-Oriented Learning Agent for Improving Group Intelligence Coordination

JINGTAO QI, LIANG BAI, AND YANDONG XIAO
College of System Engineering, National University of Defense Technology, Changsha 410073, China
Corresponding author: Yandong Xiao (xiaoyandong08@gmail.com)
This work was supported by the National Natural Science Foundation of China under Grant 61902418.

ABSTRACT Group coordination is embedded in social networks and aims to reach a consensus or resolve conflicts among social individuals. Improving coordination performance has recently become a challenging task and has drawn considerable interest in this field. To eliminate barriers to group coordination, we abstract the problem into a networked color coordination game and introduce learning agents, encoded with Q-learning, that collect local information and learn local individual behaviors. We first show that learning agents can effectively improve group coordination performance. By properly selecting parameters, we find that learning agents acting with low greedy-parameter levels and placed in central locations can drastically accelerate group coordination; the greedy parameter and the network position are therefore vital to the problem. Finally, we show that learning agents act on the coordination network through a direct effect on neighbor individuals and an indirect effect on non-neighbor individuals. Moreover, we propose a conflict relationship index, defined as the average number of rounds required to resolve a conflict, and show that learning agents resolve conflicts that cannot be solved by individuals. Hence, learning agents create further benefits for group coordination in these complex social networks. This paper provides a detailed analysis of learning agents in a networked color coordination game and shows that artificial intelligence provides a solution to the group coordination problem.

INDEX TERMS Group coordination, learning agents, reinforcement learning.

The associate editor coordinating the review of this manuscript and approving it for publication was Lin Wang.

I. INTRODUCTION
With the rapid development of network technology, social network platforms are being widely used as the carrier of competition and cooperation [1]. Therefore, improving competition and cooperation among intelligent individuals to satisfy the needs of society represents a challenging and fascinating focus area [2], [3]. Because group cooperation requires the sacrifice of individual interests to achieve a higher overall interest [4]–[7], it is often hard to achieve the maximum collective societal interest [8]. Additionally, individual ''blind'' phenomena, for example, decisions completed with limited information, make it difficult for individuals to weigh the relationship between individual interests and collective interests [7], [9], namely, the social dilemma [10], [11]. However, even when the social dilemma can be solved, there remains another obstacle: group coordination. Groups often have difficulty achieving the maximum collective interest. The possible reasons for this problem include conflicting interests among individuals, conflicting interests between individuals and their group, and the inability of individuals to effectively coordinate their actions globally [12]. Even if all individuals effectively coordinate with their neighbors, this may not result in the maximum collective interest for the group, which is a sub-optimization problem [11], [13]–[15]. However, with the development of technology and the constantly deepening understanding of social systems, the complex group coordination problem can become manageable in society [16], [17].

To solve the group coordination problem and satisfy the needs of society, experts in various fields (including biologists, physicists, and psychologists) are focusing on this problem [18]. There exist many original and influential works in this field at the intersection of statistical physics, network science, game theory, and related disciplines. Perc et al. [19] reviewed the evolutionary dynamics of group interactions on structured populations. Later, Perc et al. [20] reviewed the statistical physics research of human cooperation, focusing on spatial pattern formation, the spatiotemporal dynamics of observed solutions, and self-organization that may either promote or hinder socially favorable states.


Besides, Wang et al. [21] highlighted the importance of pattern formation and collective behavior for promoting cooperation under adverse conditions, as well as the synergy effects. In the application to human lives, Helbing et al. [22] showed that complexity science can mitigate undesirable cascade effects and accelerate the emergence of favorable kinds of self-organization in the system. Within the field of control science and engineering, many works have emerged. Considering network structures, Enemark et al. [23] explained why the negative effect of constraint outweighs the positive effect of communication in complex coordination tasks. Rand et al. [24] provided an experimental demonstration of the power of a static network structure in stabilizing human cooperation, which is consistent with a quantitative evolutionary game theoretical prediction. In a more recent work, Enemark et al. [25] showed that connections can influence coordination in two different ways: connections that reduce the number of solutions to a problem play a negative role in coordination, while other connections facilitate coordination. To improve this problem, Peysakhovich and Lerer [26] constructed consequentialist conditional cooperation in a large class of games, which builds good strategies by conditioning one's behavior solely on outcomes using deep reinforcement learning techniques, and demonstrated that the method is effective in social dilemmas beyond simple matrix games. Shirado and Christakis [12] researched the group coordination problem and discovered a surprising solution, namely, that locally noisy agents improve global human coordination and help the group achieve the maximum collective interests. In the field of computer technology, this problem is often abstracted as a multi-agent problem. Salazar et al. [27] presented a distributed approach to coalition emergence in large-scale multi-agent systems which achieves full cooperation over different complex networks. Peleteiro et al. [28] presented a new mechanism based on three main pillars (indirect reciprocity, coalitions and rewiring) to improve cooperation among self-interested agents placed in a reputation-based game. Shibusawa et al. [29] proposed a behavioral strategy called the expectation of cooperation, which is simple and easy to implement but, nevertheless, can spread over and maintain cooperation in agent networks under certain conditions. However, these studies are not applicable to solving the social dilemma of coordination in some agent networks, or they are not applicable to actual social systems.

According to the above research, if locally noisy agents improve global human coordination and help the group achieve the maximum collective interests, learning agents can bring about an even better outcome. Artificial intelligence has recently arisen as a widespread theory, and it applies to many problems. In this paper, we introduce learning agents based on local information into a networked color coordination game and provide a solution to the group coordination problem. The major contributions of this paper can be summarized as follows.

1) We abstract the group coordination problem into a networked color coordination game [30]. Based on this model, we introduce learning agents (q-bots) encoded with Q-learning based on local information into the network in exchange for the same number of conventional agents. In addition, we design clear action rules for q-bots and conventional agents. We classify the conflicts into resolvable conflicts and unresolvable conflicts according to the action rules of conventional agents.

2) We carry out comparison experiments on the network with and without q-bots. By analyzing the comparison results, we find that the main cause of the group coordination problem is that conventional agents cannot reach the collective objective by eliminating only their local conflicts in the individual perception. In addition, we prove the effectiveness of q-bots in solving the group coordination problem.

3) To evaluate the effect of parameters on q-bots in solving the group coordination problem, we evaluate 10 conditions: 1 control condition without q-bots and 9 conditions that combine the greedy parameters and network locations of q-bots. In this experiment, we find that the greedy parameter and network location play an important role in solving the problem. We show that q-bots acting with low levels of greedy parameters and placed in central locations meaningfully improve the group coordination by more than 80%.

4) To better understand the mechanism of q-bots in solving the group coordination problem, we carry out 2 experiments. We calculate the degree of node color change for every 100 generations in one experiment. Based on this experiment, we show that q-bots act on the coordination network by affecting not only neighbor individuals but also non-neighbor individuals. Then, we propose a conflict relationship index, defined as the average number of rounds required to resolve a conflict, and show that q-bots eliminate conflicts that cannot be solved by individuals.

The remainder of this paper is organized as follows. In Section II, a networked color coordination game model is presented to characterize group coordination. Based on this, in Section III, we introduce q-bots into the model and formulate the action rules of conventional agents and q-bots. To verify and explore the effectiveness of q-bots in solving the group coordination problem, numerical experiments are performed on a networked color coordination game case, and the results are presented in Section IV. Our conclusions and final discussion are provided in Section V.

II. GROUP COORDINATION MODEL
The group coordination problem is complex and unique in social networks. Therefore, to solve this problem, an abstracted and representative model is critical [31]. In this section, a networked color coordination game model is presented to characterize this problem.

Group coordination can be characterized by a networked color coordination game.


The color coordination game is an undirected network G = (V, E, C, S) [32], where V = Vc1(r) ∪ Vc2(r) ∪ Vc3(r) ∪ ··· ∪ Vck(r) = {v1, v2, v3, ..., vz} denotes the node set, E = {e1, e2, e3, ..., em} ⊆ V × V represents information flows between agents, set C = {c1, c2, c3, ..., ck} shows the alternative colors, and the sorted sequence S(r) = (s1(r), s2(r), s3(r), ..., sz(r)) records the colors of the nodes in round r. All agents are divided into k sets Vcj(r) (j = 1, 2, 3, ..., k) according to their color in round r. For example, if the color of one node vi is cj in round r, si(r) = cj, this node belongs to set Vcj(r). Here, z = |V| = |S|, m = |E| and k = |C| express the numbers of nodes, edges, and colors in the networked color coordination game, respectively. In the model, edges that connect nodes with the same color are considered conflicts, while other edges are considered conflict-free. As noted, agents act using local information in the game. Specifically, in this model, agents can see only the colors of neighbors with whom they share the same edge in addition to their own color, which is also called the ''blind'' phenomenon.

Based on social network characteristics, we build the model network according to scale-free networks [33], namely, the network structure is created from the beginning by randomly attaching new nodes (each with l links) to existing nodes based on the degree of existing nodes [12]. The topology of the network is fixed during the whole color coordination game. In our network, the adjacency index is

d(vi, vj) = 1 if vi and vj are connected, and 0 otherwise. (1)

Before the game, we colorize each node by randomly selecting a color with a uniform distribution in the set C. For example, we randomly select one color cx to colorize one node vi, that is,

cx ∈ U(C), si(0) = cx, Vcx(0) = Vcx(0) ∪ {vi}. (2)

During the game process, each agent acts simultaneously according to action rules, which are introduced in Section III, in each round. For example, the color of one node vi is cx in round r, and then this node changes its color to cy based on the action rules in round r + 1. This process is

si(r) = cx, vi ∈ Vcx(r), vi ∉ Vcy(r)  →  si(r + 1) = cy, Vcx(r + 1) = Vcx(r) − {vi}, Vcy(r + 1) = Vcy(r) ∪ {vi}. (3)

When the game is finished, the condition that should be met is the existence of no conflicts in the network. Specifically, any two nodes that share the same edge have different colors. This condition is

∀ cx ∈ C, ∀ vi, vj ∈ Vcx(r), d(vi, vj) ≠ 1. (4)

As the condition of the game finishing, it describes the objective of group coordination, namely achieving the maximum collective interest of the group, which is characterized as the state in which there is no conflict in the game. In addition, the number of rounds of the game is considered as a problem-solving evaluation index, and we set an upper limit u for the rounds of the game in our experiment.
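To make the model concrete, the following minimal sketch builds the scale-free game network, assigns the initial random coloring of (2), and counts conflict edges. It is only an illustration of the setup described above, not the authors' code: the function names (init_game, count_conflicts), the integer encoding of colors, and the use of networkx's Barabási–Albert generator are our assumptions.

```python
import random
import networkx as nx

def init_game(z=20, l=2, k=3, seed=None):
    """Build a scale-free network with z nodes (l links per new node)
    and colorize each node uniformly at random from k colors."""
    rng = random.Random(seed)
    G = nx.barabasi_albert_graph(z, l, seed=seed)    # preferential attachment [33]
    colors = {v: rng.randrange(k) for v in G.nodes}  # s_i(0) ~ U(C), cf. (2)
    return G, colors

def count_conflicts(G, colors):
    """A conflict is an edge whose endpoints share the same color."""
    return sum(1 for u, v in G.edges if colors[u] == colors[v])

G, colors = init_game(z=20, l=2, k=3, seed=1)
print(count_conflicts(G, colors))  # the game ends when this reaches 0, cf. (4)
```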
 10: end if
III. THE ACTION RULES FOR AGENTS
To mimic the individuals in real-life group coordination, we formulate the action rules for conventional agents and q-bots.

A. THE CONVENTIONAL AGENT ACTION RULE
We formulate the rule for conventional agents as follows. For one conventional agent connected with at least one conflict edge in round r, this agent randomly selects from the whole color set if the random number is less than the random parameter of conventional agents or the colors of its neighbors occupy all colors in set C; otherwise, the agent randomly selects from the set of colors not occupied by its neighbors. This agent will not change its color if it is connected with no conflict edges.

Algorithm 1 The Conventional Agent Action Rule
1: if ∃ vj ∈ Nvi, si(r) = sj(r) (j ≠ i) then
2:   if Pc ∈ U[0, 1] ≤ P or ∀ cj ∈ C, V_{cj}^{Nvi}(r) ≠ ∅ then
3:     C′ = C
4:   else
5:     C′ = {cj | V_{cj}^{Nvi}(r) = ∅, cj ∈ C}
6:   end if
7:   si(r + 1) ∈ U(C′)
8: else
9:   si(r + 1) = si(r)
10: end if

Algorithm 1 shows the above-mentioned conventional agent action rule, where set Nvi includes the neighbors of one node vi and vi itself, Pc ∈ U[0, 1] represents a random number Pc generated with a uniform distribution in the interval [0, 1], the constant P represents the random parameter of conventional agents (characterizing the uncertainty of individuals in a practical problem), set V_{cj}^{Nvi}(r) denotes the nodes colorized with cj that belong to set Nvi, and U(C′) denotes a randomly selected color with a uniform distribution in set C′. By establishing this rule, we consider the action of the individual in reality, such as the individual uncertainty and the individual expectation of coordinating with neighbors.
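As an illustration, the sketch below implements the decision of Algorithm 1 for a single node, reusing the G and colors objects from the previous sketch; the function name conventional_step and the integer color encoding are our assumptions rather than the authors' implementation.

```python
import random

def conventional_step(G, colors, v, k, P, rng):
    """One synchronous-round decision for a conventional agent v (Algorithm 1).
    rng is a random.Random instance; colors are integers in range(k)."""
    neighbor_colors = {colors[u] for u in G.neighbors(v)}
    if colors[v] not in neighbor_colors:          # no conflict edge: keep the color
        return colors[v]
    free = [c for c in range(k) if c not in neighbor_colors]
    if rng.random() <= P or not free:             # random move, or neighbors occupy all colors
        return rng.randrange(k)                   # pick uniformly from the whole set C
    return rng.choice(free)                       # otherwise pick a color unused by neighbors

# Propose next-round colors for all agents from the round-r coloring (simultaneous actions).
rng = random.Random(0)
next_colors = {v: conventional_step(G, colors, v, k=3, P=0.1, rng=rng) for v in G.nodes}
```

In a simulation, all agents would compute their round-(r + 1) colors from the round-r coloring before any update is applied, matching the simultaneous-action assumption of the model.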

156528 VOLUME 7, 2019


J. Qi et al.: Social Network-Oriented Learning Agent for Improving Group Intelligence Coordination

When the condition Pc ∈ U[0, 1] ≤ P is not met, Fig. 1(a) and (b) show examples of this action rule. In Fig. 1(a), the center node changes its color to purple for the next round because it meets the conditions that V_{cj}^{Nvi}(r) ≠ ∅ (cj = green, orange) and V_{cj}^{Nvi}(r) = ∅ (cj = purple) for the current round. In Fig. 1(b), the center node changes to another color generated with a uniform distribution over the set C = {purple, green, orange} for the next round because the node meets the condition that V_{cj}^{Nvi}(r) ≠ ∅ (cj = green, orange, purple) for the current round.

FIGURE 1. The examples of action rules for conventional agents and the rule-induced conflicts. In this example, k = 3, C = {purple, green, orange}, and black (or red) edges denote conflict-free (or conflict) cases. (a), (b) Examples for action rules for conventional agents. (c) Example for resolvable conflicts. (d) Example for unresolvable conflicts.

In addition, conflicts are classified as resolvable and unresolvable conflicts based on this rule [12]. When the condition Pc ∈ U[0, 1] ≤ P is not met, the resolvable conflicts can be eliminated when either node selects from the set composed of the colors not occupied by its neighbors, but the unresolvable conflicts cannot. Figure 1(c) and (d) show examples of resolvable conflicts and unresolvable conflicts, respectively. In Fig. 1(c), the 2 center nodes change their colors to purple and orange based on this rule for the next round. By doing so, this conflict is eliminated; therefore, it is considered a resolvable conflict. In Fig. 1(d), the 2 center nodes change their color to purple based on this rule for the next round. However, this conflict sinks into an infinite loop; therefore, it is considered an unresolvable conflict. In the following experiments, conflicts are regarded as unresolvable conflicts when the two agents have only one color to select according to conventional agent rules; otherwise, they are regarded as resolvable conflicts.
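The following sketch illustrates one way this classification could be automated for a single conflict edge; the helper name classify_conflict and the precise criterion (both endpoints forced into the same single admissible color) are our reading of the rule above, not code from the paper.

```python
def classify_conflict(G, colors, u, v, k):
    """Classify the conflict on edge (u, v): it is unresolvable when both agents
    are left with the same single admissible color under the conventional rule,
    so they keep colliding round after round (cf. Fig. 1(d))."""
    def free_colors(node):
        neighbor_colors = {colors[w] for w in G.neighbors(node)}
        return {c for c in range(k) if c not in neighbor_colors}
    fu, fv = free_colors(u), free_colors(v)
    if len(fu) == 1 and fu == fv:
        return "unresolvable"   # both forced into the identical color
    return "resolvable"         # at least one non-colliding choice exists, cf. Fig. 1(c)
```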
B. THE LEARNING AGENT ACTION RULE
To improve the group coordination problem, we introduce q-bots into the color coordination game in exchange for the same number of conventional agents. We encode q-bots with Q-learning, which is a model-free reinforcement learning algorithm [34]. The Q-learning algorithm includes an agent, a set of states W, a set of actions A, a reward b and a table T that records the expected reward of an action taken in a given state. The core algorithm is as follows:

T′(wr, ar) ← (1 − α)T(wr, ar) + α[br + γ max_a T(wr+1, a)], (5)

where T′ denotes the updated table, br is the reward received when moving from state wr to state wr+1, α is the learning rate and γ is the discount factor. To select an optimal action for any given state, (5) is used to iteratively update T during the learning process.
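A minimal tabular sketch of update (5) is shown below, assuming the table is stored as a dictionary keyed by (state, action) pairs; the function name q_update is our own, and the default α and γ are the values used later in Section IV.

```python
from collections import defaultdict

def q_update(T, state, action, reward, next_state, actions, alpha=0.5, gamma=0.9):
    """One application of update rule (5) to the Q-table T."""
    best_next = max(T[(next_state, a)] for a in actions)   # max_a T(w_{r+1}, a)
    T[(state, action)] = (1 - alpha) * T[(state, action)] + alpha * (reward + gamma * best_next)

# One table per q-bot, defaulting unseen (state, action) pairs to an expected reward of 0.
T = defaultdict(float)
```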
In our model, we use a multi-agent Q-learning method [34], [35], which exchanges h q-bots for the same number of conventional agents. One q-bot located at node vi corresponds to one table Tvi. For multi-agent Q-learning, the state of one q-bot is a sorted sequence S^{Nvi}(r) that records the colors of the q-bot and its neighbors. This state records the location information of the neighbors corresponding to the q-bot and the color information of the neighbors. The action set of one q-bot is set C, which denotes the colors that can be selected by the q-bots. When the network composed of one q-bot and its neighbors is conflict-free, the corresponding reward b equals 1; otherwise, it equals 0.

Algorithm 2 The Learning Agent Action Rule
1: if ∃ vj ∈ Nvi, si(r) = sj(r) (j ≠ i) then
2:   if εc ∈ U[0, 1] ≥ ε then
3:     si(r + 1) = argmax_c Tvi(S^{Nvi}(r), c)
4:   else
5:     if Pc ∈ U[0, 1] ≤ P or ∀ cj ∈ C, V_{cj}^{Nvi}(r) ≠ ∅ then
6:       C′ = C
7:     else
8:       C′ = {cj | V_{cj}^{Nvi}(r) = ∅, cj ∈ C}
9:     end if
10:    si(r + 1) ∈ U(C′)
11:  end if
12: else
13:   si(r + 1) = si(r)
14: end if

Based on the multi-agent Q-learning method, q-bots act with a greedy strategy [36], and we set the greedy parameter ε. We clarify the action rules of the q-bots as follows. One q-bot qi connected with at least one conflict edge selects the color with the maximum expected reward in the given state if the random number is greater than ε; otherwise, it selects based on the conventional agent action rule. This q-bot does not change its color if it is not connected with a conflict edge. Algorithm 2 shows these action rules. Because each agent acts simultaneously according to the action rules, we can obtain the next state of one q-bot and update the corresponding table Tvi after all agents choose actions.
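The sketch below combines Algorithm 2 with the earlier helpers into a single decision function for a q-bot; the tuple encoding of the state S^{Nvi}(r) and the names qbot_state and qbot_step are our assumptions consistent with the description above.

```python
def qbot_state(G, colors, v):
    """Sorted record of the q-bot's own color and its neighbors' colors, cf. S^{Nvi}(r)."""
    return (colors[v],) + tuple(sorted((u, colors[u]) for u in G.neighbors(v)))

def qbot_step(G, colors, v, k, P, eps, T, rng):
    """One synchronous-round decision for a q-bot at node v (Algorithm 2)."""
    neighbor_colors = {colors[u] for u in G.neighbors(v)}
    if colors[v] not in neighbor_colors:                 # no conflict edge: keep the color
        return colors[v]
    if rng.random() >= eps:                              # exploit the learned table
        state = qbot_state(G, colors, v)
        return max(range(k), key=lambda c: T[(state, c)])
    return conventional_step(G, colors, v, k, P, rng)    # otherwise act like a conventional agent
```

After all agents have chosen their round-(r + 1) colors, the new state and the reward (1 if the q-bot and its neighbors are conflict-free, else 0) would be fed to q_update to refresh the table Tvi.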
Independently of the above-mentioned ε, we also manipulate the q-bot network location, including its random, central and peripheral attributes.


A peripheral location means that q-bots are assigned to locations that have the lowest network degree. Likewise, a central location means that q-bots are assigned to locations with the largest network degree. A random location means that q-bots are randomly assigned to the network.
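As an illustration of these placement schemes, the helper below selects the h q-bot hosts by network degree (or at random); the function name place_qbots is our own shorthand.

```python
def place_qbots(G, h, location, rng):
    """Choose which h nodes host q-bots: 'central' = highest degree,
    'peripheral' = lowest degree, 'random' = uniform choice."""
    if location == "random":
        return set(rng.sample(list(G.nodes), h))
    by_degree = sorted(G.nodes, key=lambda v: G.degree(v), reverse=(location == "central"))
    return set(by_degree[:h])
```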
reward obtained from iterative learning.
IV. CASE STUDY
To eliminate the barriers to group coordination, we introduce q-bots into the networked color coordination game and perform a simulation experiment. In the experiment, the number of edges added each time when building a network is l = 2, the number of colors is k = 3 (the chromatic number, representing the minimum number of colors necessary to color the networks to meet the conflict-free condition, because there are no four interconnected nodes in the model network constructed with l = 2), the random parameter of conventional agents is P = 0.1, the discount factor is γ = 0.9 and the learning rate is α = 0.5. The networks constructed in the following experiments are globally solvable. First, we analyze the key cause of the group coordination problem and evaluate the effectiveness of q-bots in solving this problem. Then, we analyze the effect of key parameters on q-bots in the solution. Finally, we investigate the mechanism of q-bots in solving this problem in depth, which includes the mode of q-bots acting on coordination networks and on conflict relationships.
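For reference, these settings can be collected in a small configuration block such as the sketch below; the dictionary name PARAMS and the grouping are ours, while the values are those reported in the text.

```python
# Experimental settings used throughout Section IV (values from the text).
PARAMS = {
    "l": 2,        # links added per new node when building the scale-free network
    "k": 3,        # number of colors (chromatic number of the constructed networks)
    "P": 0.1,      # random parameter of conventional agents
    "gamma": 0.9,  # discount factor of Q-learning
    "alpha": 0.5,  # learning rate of Q-learning
}
```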
A. EFFECTIVENESS ANALYSIS OF LEARNING AGENTS
In this experiment, by contrasting the performance with and without q-bots in one game, we find the key cause of the group coordination problem and present a hypothesis that q-bots can effectively solve the group coordination problem. Additionally, we give a simple example to illustrate the advantages of q-bots. Then, by contrasting the performance of the network with and without q-bots in 10 different networks with 1000 nodes, we validate this hypothesis. The q-bots are assigned to central locations, and ε = 0.1 in this section. The experimental results are presented in Fig. 2.

As shown in Fig. 2(a), for the game without q-bots, although conventional agents aim to eliminate all conflicts based on the rule, most of the conflicts remain in the network, and the game rounds are too long. Therefore, we consider the key cause of the group coordination problem to be that conventional agents cannot achieve the collective goal by eliminating only their local conflicts individually, called the ''blind'' phenomenon. In contrast, for the game with q-bots, the game rounds are much shorter, and there are considerably fewer conflicts. Based on the above analysis, we hypothesize that q-bots can effectively solve the group coordination problem.

Compared with conventional agents, the advantage of q-bots is that they can choose the action with the highest reward according to the learned information, so that games with q-bots can achieve a conflict-free state with a higher probability; the disadvantage of learning agents is that iterative learning takes time. To prove this point, we present an example for illustration in Fig. 2(b). Games without q-bots always get stuck in an infinite loop of unresolvable conflicts. By contrast, games with q-bots can avoid this pitfall and easily achieve a conflict-free state because q-bots can induce other nodes to break the infinite loop of unresolvable conflicts by choosing the color with the highest reward obtained from iterative learning.

To validate the above-mentioned hypothesis, we perform an experiment and show the result in Fig. 2(c). Because games without q-bots often sink into an infinite loop of unresolvable conflicts, games without q-bots rarely complete within the upper limit, and the number of rounds in games without q-bots does not change with increasing generations. In contrast, the number of rounds in games with q-bots decreases with increasing generations, and more and more games complete within the upper limit. This result supports our hypothesis that q-bots help to improve the group coordination problem.

B. ANALYSIS OF KEY LEARNING AGENT PARAMETERS
There are many learning agent parameters, but our model is based on networks, and q-bots are based on the learning algorithm; therefore, we consider only the key parameters, including network location and ε. In this section, we manipulate the q-bot network location and ε to evaluate 10 conditions, including 1 control condition without q-bots and 9 condition combinations of 3 types of network locations (random, central and peripheral) and 3 levels of ε (0.1, 0.3 and 0.5).

Figure 3 shows the results of this experiment as the relationship between rounds and the survival proportion in 20 games under the 10 conditions. With decreasing ε, the survival proportion obviously decreases, which indicates that q-bots with a smaller ε solve the group coordination problem better. There is a clear distinction between the three types of network locations. The central location case is the best, and the peripheral location case is the worst; the random location case lies between them. The q-bots with central locations have many more neighbors than those with the other two types of locations; therefore, we assume that q-bots act on coordination networks by affecting not only the individuals to whom they are connected directly but also other individuals. This assumption is proven in Section IV-C. Additionally, the performance of the game with q-bots with different ε levels and locations can be either better or worse. For example, when q-bots act with ε = 0.1 and are placed in central locations, games have the lowest proportion of survival in 20 games, and q-bots improve the game by more than 80%. This result supports our conclusion in Section IV-A that q-bots effectively solve the group coordination problem. However, when q-bots act with ε = 0.5 and are placed in peripheral locations, games have the maximum survival proportion, which demonstrates that q-bots can even play a negative role. Therefore, ε and location are vital to q-bots in solving group coordination. By properly selecting the parameters, q-bots can solve the problem with the best effect. Otherwise, q-bots play a negative role in group coordination.


FIGURE 2. The experiments to compare game performance without q-bots and with q-bots. We provide two animations for the game, one with q-bots and one without q-bots (see Supplementary Videos). (a) Bar chart of the number of conflicts and 6 snapshots of colored networks that compare the games without q-bots and with q-bots in one game (z = 20, h = 3 and u = 200). The upper and lower transverse axes show the results without q-bots and with q-bots, respectively. In the bar chart, dark blue and dark red bars show the number of unresolvable conflicts, and light blue and red bars show the number of resolvable conflicts. In the 6 snapshots, circle nodes show conventional agents, square nodes show q-bots, red edges and blue edges show conflicts, and black edges represent the absence of conflict. (b) Examples for the analysis of action rules for conventional agents and q-bots. (c) The error bar for the rounds that compare the games without q-bots and with q-bots in 10 different networks (each network was tested with 500 generations, z = 1000, h = 50 and u = 2000). The blue error bar shows the case without q-bots, and the red error bar shows the case with q-bots.

C. ACTION MECHANISM OF LEARNING AGENTS ON COORDINATION NETWORKS
To study the way q-bots act on group coordination in depth, we carry out an experiment to analyze the action mode of q-bots acting on coordination networks. We count the number of node color changes for every 100 generations in the process of iterative learning and normalize these data by dividing by their maximum.


FIGURE 3. Survival curves of the game according to ε and the locations of the q-bots. Each of the conditions is conducted over 20 different networks (each network was tested once with z = 20, h = 3 and u = 200). The curves show the percentages of games unsolved within the upper limit. Dark blue curves show the results of the games with q-bots according to the ε level (horizontal axis) and network location (vertical axis). Light blue curves show the results of the control condition of games without q-bots.

FIGURE 4. Effect of q-bots on the behavior of other nodes. The color of the nodes denotes the number of color changes each 100 generations in one game (z = 20, h = 3 and u = 200). The coloring of the network is based on the interval of generations (horizontal axis). Circle nodes show the conventional agents, and square nodes show the q-bots. (a) The games with q-bots by ε level (vertical axis). (b) The games without q-bots.

The processed data show the degree of color change in the nodes. In this section, q-bots are placed in a central location, and the results are shown in Fig. 4.
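A sketch of this measurement is given below; accumulating the counts over a 100-generation window in a dictionary and the function name color_change_degree are our assumptions about the bookkeeping, not the authors' code.

```python
def color_change_degree(change_counts):
    """Normalize per-node color-change counts accumulated over one
    100-generation window by dividing by their maximum."""
    peak = max(change_counts.values(), default=0) or 1   # avoid division by zero
    return {v: n / peak for v, n in change_counts.items()}

# During simulation, change_counts[v] would be incremented in each round in which
# node v's color differs from its color in the previous round.
```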
For games without q-bots, there is no clear distinction between the different intervals of generations, and the degree of color change is too high. For cases with q-bots, the degree of color change decreases with decreasing ε. This result matches the result in Section IV-B indicating that q-bots with smaller ε levels influence nodes more strongly. The degree of color change in the nodes decreases over the course of learning, but there is no clear distinction between the generation intervals [100, 200] and [200, 300], which shows that the effect of the q-bots on group coordination becomes saturated as learning proceeds.

It is apparent that q-bots influence their neighbors. However, if the q-bots could not influence other nodes, group coordination with q-bots would show almost no improvement. Therefore, we are more confident in the assumption given in Section IV-B. By comparing the results of games with and without q-bots, we find that the degree of color change for nodes connected with q-bots and for other nodes decreases with decreasing ε levels, but the decrease for nodes connected with q-bots is obviously larger than that for other nodes. Therefore, we validate the assumption mentioned in Section IV-B stating that q-bots have a direct effect on neighbor individuals and an indirect effect on non-neighbor individuals to act on the coordination network, although ''information'' is lost in the dissemination. Thus, q-bots act on the coordination network.

D. ACTION MECHANISM OF LEARNING AGENTS ON CONFLICT RELATIONSHIPS
To study the way q-bots act on group coordination in depth, we carry out an experiment to analyze the action mode of q-bots acting on conflict relationships among individuals. We propose the average rounds index for solving conflicts. For a conflict on one edge in one game, the average number of rounds required to solve the conflict is defined as

CSi = CLi / CNi, (6)

where CLi is the number of rounds in which edge ei sinks into this type of conflict and CNi is the number of intervals in which edge ei sinks into this type of conflict. For many edges or many games, we simply take the weighted mean of this index. This index is used to measure the efficiency of conflict solving. Based on this index, we analyze the action mode of q-bots acting on conflict edges in 10 different networks with 1000 nodes. In this section, q-bots are placed in a central location, and edges are classified as edges connected to q-bots and edges not connected to q-bots based on their relationship with q-bots.
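A direct way to compute this index from a per-round conflict log of one edge is sketched below; treating an ''interval'' as a maximal run of consecutive conflict rounds and the function name avg_rounds_to_solve are our assumptions about the bookkeeping behind (6).

```python
def avg_rounds_to_solve(conflict_flags):
    """Compute CS_i = CL_i / CN_i for one edge from a per-round boolean log:
    conflict_flags[r] is True if the edge is in conflict in round r.
    CL_i counts conflict rounds, CN_i counts maximal runs of consecutive conflict rounds."""
    cl = sum(conflict_flags)
    cn = sum(1 for r, f in enumerate(conflict_flags)
             if f and (r == 0 or not conflict_flags[r - 1]))   # start of a conflict interval
    return cl / cn if cn else 0.0
```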
Figure 5 shows the results of this experiment. For games without q-bots, there is no clear distinction in the average rounds required for solving conflicts with increasing generations. For games with q-bots, the average rounds required for solving conflicts decreases with increasing generations. The q-bots with a smaller ε level have better performance.


FIGURE 5. Effect of q-bots on the edges. The curves show changes in the average rounds required for solving conflict with increasing generations. Each case is performed in 10 different networks (each network was tested with 500 generations, z = 1000, h = 50 and u = 2000). Blue curves show the cases of edges connected to q-bots, and orange curves show the cases of edges not connected to q-bots. From left to right, the cases are in the sequence of the index for all conflicts, resolvable conflicts, and unresolvable conflicts. (a) The cases with q-bots by ε level (vertical axis). (b) The cases without q-bots.

Additionally, the improvement of this index for edges connected to q-bots is obviously higher than that for edges not connected to q-bots, which supports the results of the above experiment that q-bots solve the coordination problem effectively and have an effect on all individuals (with a larger effect on neighbors than on non-neighbors).

In addition, we find that the improvement magnitude for unresolvable conflicts is much higher than that for resolvable conflicts in games with q-bots. Therefore, we ignore the improvement of resolvable conflicts. By contrasting the games with and without q-bots, we conclude that q-bots eliminate the conflicts that cannot be solved by individuals. This result matches the example in Fig. 2(b). Thus, q-bots act on the conflict relationship among individuals in group coordination.

V. CONCLUSION
Group coordination often relies on social networks, but improving group coordination represents a challenging and fascinating problem. In the coordination process, individuals often have difficulty reaching the collective objective because of personal interest and the ''blind'' phenomenon in social networks. To satisfy the needs of society and solve the group coordination problem, we propose a solution on the basis of artificial intelligence. We use a networked color coordination game model to characterize group coordination and introduce q-bots into this model. The function of learning agents in solving the group coordination problem is as follows.

1) The group coordination problems caused by individuals being unable to reach the collective objective by eliminating only their local conflicts from an individual standpoint are improved by q-bots.

2) By properly selecting ε and the network location, q-bots drastically improve group coordination. The q-bots acting with low levels of ε and placed in central locations improve group coordination by more than 80%.


3) In acting on the coordination network, q-bots work by having not only a direct effect on neighbor individuals but also an indirect effect on non-neighbor individuals. In acting on the conflict relationship, q-bots eliminate conflicts that cannot be solved by an individual. In the above two ways, q-bots improve the group coordination problem and help groups achieve collective interests.

As a result, the findings in this paper show that q-bots improve group coordination by solving the sub-optimization problem arising from the individual ''blind'' phenomenon. For example, we set q-bots in positions (decision makers) that have the most neighbors in practice, which can improve the auxiliary function and provide certain decision-making support. Beyond this point, this work on group coordination applies not only to human populations but also to groups composed solely of robots attempting to coordinate their actions [12]. From a broad perspective, this work can be applied to more realistic or complex coordination problems, such as those involving military robots or autonomous moving vehicles. The code that supports the findings of this study is available from GitHub (http://github.com/JingtaoQi/color-game).

Although this paper proves our idea, we consider only some basic factors, including the ε level and network location of the learning agents [37]. Further work is required, including considering dynamic networks in which the connections between nodes may change [38], the effect of historical information on individual decision making [39], ''prosocial preferences'' in which individuals imitate the majority behavior [40], the effect of the social environment on individual decision making [41], [42] and the effect of individual reputations on cooperative behaviors (individuals can maximize their reputation by cooperating with surrounding high-reputation individuals) [43], [44]. Furthermore, because of the particularity of group coordination in society, how to control complex group coordination is also a promising research topic.

REFERENCES
[1] E. A. Smith, ''Communication and collective action: Language and the evolution of human cooperation,'' Evol. Hum. Behav., vol. 31, no. 4, pp. 231–245, 2010.
[2] E. Fehr and U. Fischbacher, ''Social norms and human cooperation,'' Trends Cognit. Sci., vol. 8, no. 4, pp. 185–190, 2004.
[3] M. Scatà, A. Di Stefano, A. La Corte, P. Liò, E. Catania, E. Guardo, and S. Pagano, ''Combining evolutionary game theory and network theory to analyze human cooperation patterns,'' Chaos, Solitons Fractals, vol. 91, pp. 17–24, Oct. 2016.
[4] D. G. Rand and M. A. Nowak, ''Human cooperation,'' Trends Cogn. Sci., vol. 17, no. 8, pp. 413–425, Aug. 2013.
[5] R. Axelrod and W. D. Hamilton, ''The evolution of cooperation,'' Science, vol. 211, no. 4489, pp. 1390–1396, 1981.
[6] S. Strang and S. Q. Park, ''Human cooperation and its underlying mechanisms,'' in Social Behavior From Rodents to Humans. Cham, Switzerland: Springer, 2016, pp. 223–239.
[7] M. Tomasello and A. Vaish, ''Origins of human cooperation and morality,'' Annu. Rev. Psychol., vol. 64, pp. 231–255, Jan. 2013.
[8] D. D. P. Johnson, P. Stopka, and S. Knights, ''The puzzle of human cooperation,'' Nature, vol. 421, no. 6926, pp. 911–912, 2003.
[9] N. Miller, S. Garnier, A. T. Hartnett, and I. D. Couzin, ''Both information and social cohesion determine collective decisions in animal groups,'' Proc. Nat. Acad. Sci. USA, vol. 110, no. 13, pp. 5263–5268, 2013.
[10] A. Lerer and A. Peysakhovich, ''Maintaining cooperation in complex social dilemmas using deep reinforcement learning,'' 2017, arXiv:1707.01068. [Online]. Available: https://arxiv.org/abs/1707.01068
[11] R. M. Dawes, ''Social dilemmas,'' Annu. Rev. Psychol., vol. 31, no. 1, pp. 169–193, 1980.
[12] H. Shirado and N. A. Christakis, ''Locally noisy autonomous agents improve global human coordination in network experiments,'' Nature, vol. 545, no. 7654, pp. 370–374, 2017.
[13] R. L. Calvert, ''Leadership and its basis in problems of social coordination,'' Int. Political Sci. Rev., vol. 13, no. 1, pp. 7–24, 1992.
[14] J. R. Dyer, A. Johansson, D. Helbing, I. D. Couzin, and J. Krause, ''Leadership, consensus decision making and collective behaviour in humans,'' Philos. Trans. Roy. Soc. B, Biol. Sci., vol. 364, no. 1518, pp. 781–789, 2008.
[15] J. B. Van Huyck, R. C. Battalio, and R. O. Beil, ''Tacit coordination games, strategic uncertainty, and coordination failure,'' Amer. Econ. Rev., vol. 80, no. 1, pp. 234–248, 1990.
[16] Y.-Y. Liu, J.-J. Slotine, and A.-L. Barabási, ''Controllability of complex networks,'' Nature, vol. 473, no. 7346, pp. 167–173, 2011.
[17] Y.-D. Xiao, S.-Y. Lao, L.-L. Hou, and L. Bai, ''Edge orientation for optimizing controllability of complex networks,'' Phys. Rev. E, Stat. Phys. Plasmas Fluids Relat. Interdiscip. Top., vol. 90, no. 4, 2014, Art. no. 042804.
[18] V. Capraro, ''A model of human cooperation in social dilemmas,'' PLoS One, vol. 8, no. 8, 2013, Art. no. e72427.
[19] M. Perc, J. Gomez-Gardenes, A. Szolnoki, L. M. Floría, and Y. Moreno, ''Evolutionary dynamics of group interactions on structured populations: A review,'' J. Roy. Soc. Interface, vol. 10, no. 80, 2013, Art. no. 20120997.
[20] M. Perc, J. J. Jordan, D. G. Rand, Z. Wang, S. Boccaletti, and A. Szolnoki, ''Statistical physics of human cooperation,'' Phys. Rep., vol. 687, pp. 1–51, May 2017.
[21] Z. Wang, L. Wang, A. Szolnoki, and M. Perc, ''Evolutionary games on multilayer networks: A colloquium,'' Eur. J. Phys. B, vol. 88, p. 124, May 2015.
[22] D. Helbing, D. Brockmann, T. Chadefaux, K. Donnay, U. Blanke, O. Woolley-Meza, M. Moussaid, A. Johansson, J. Krause, S. Schutte, and M. Perc, ''Saving human lives: What complexity science and information systems can contribute,'' J. Stat. Phys., vol. 158, no. 3, pp. 735–781, Feb. 2015.
[23] D. P. Enemark, M. D. McCubbins, R. Paturi, and N. Weller, ''Does more connectivity help groups to solve social problems,'' in Proc. 12th ACM Conf. Electron. Commerce, Jun. 2011, pp. 21–26.
[24] D. G. Rand, M. A. Nowak, J. H. Fowler, and N. A. Christakis, ''Static network structure can stabilize human cooperation,'' Proc. Nat. Acad. Sci. USA, vol. 111, no. 48, pp. 17093–17098, 2014.
[25] D. P. Enemark, M. D. McCubbins, R. Paturi, and N. Weller, ''Can we think locally, act globally? Understanding when local information can facilitate global coordination,'' PSN, Lab. Exp., Topic, Nov. 2010. doi: 10.2139/ssrn.1711582.
[26] A. Peysakhovich and A. Lerer, ''Consequentialist conditional cooperation in social dilemmas with imperfect information (short workshop version),'' in Proc. 32nd Workshops AAAI Conf. Artif. Intell., 2018, pp. 310–316.
[27] N. Salazar, J. A. Rodriguez-Aguilar, J. L. Arcos, A. Peleteiro, and J. C. Burguillo-Rial, ''Emerging cooperation on complex networks,'' in Proc. 10th Int. Conf. Auto. Agents Multiagent Syst., vol. 2, 2011, pp. 669–676.
[28] A. Peleteiro, J. C. Burguillo, and S. Y. Chong, ''Exploring indirect reciprocity in complex networks using coalitions and rewiring,'' in Proc. Int. Conf. Auton. Agents Multi-Agent Syst., 2014, pp. 669–676.
[29] R. Shibusawa, T. Otsuka, and T. Sugawara, ''Spread of cooperation in complex agent networks based on expectation of cooperation,'' in Proc. Int. Conf. Princ. Pract. Multi-Agent Syst. Cham, Switzerland: Springer, 2016, pp. 76–91.
[30] M. Kearns, S. Suri, and N. Montfort, ''An experimental study of the coloring problem on human subject networks,'' Science, vol. 313, no. 5788, pp. 824–827, 2006.
[31] Y. Xiao, M. T. Angulo, J. Friedman, M. K. Waldor, S. T. Weiss, and Y.-Y. Liu, ''Mapping the ecological networks of microbial communities,'' Nature Commun., vol. 8, no. 1, 2017, Art. no. 2042.
[32] I. Sghir, J.-K. Hao, I. B. Jaafar, and K. Ghédira, ''A distributed hybrid algorithm for the graph coloring problem,'' in Proc. Int. Conf. Artif. Evol. (Evol. Artificielle). Cham, Switzerland: Springer, 2015, pp. 205–218.
[33] A.-L. Barabási and R. Albert, ''Emergence of scaling in random networks,'' Science, vol. 286, no. 5439, pp. 509–512, 1999.


[34] C. J. C. H. Watkins and P. Dayan, ''Q-learning,'' Mach. Learn., vol. 8, nos. 3–4, pp. 279–292, 1992.
[35] L. Waltman and U. Kaymak, ''A theoretical analysis of cooperative behavior in multi-agent Q-learning,'' in Proc. IEEE Int. Symp. Approx. Dyn. Program. Reinforcement Learn., Apr. 2007, pp. 84–91.
[36] E. R. Gomes and R. Kowalczyk, ''Dynamic analysis of multiagent Q-learning with ε-greedy exploration,'' in Proc. 26th Annu. Int. Conf. Mach. Learn., 2009, pp. 369–376.
[37] A. Sánchez, ''Physics of human cooperation: Experimental evidence and theoretical models,'' J. Stat. Mech., Theory Exp., vol. 2018, no. 2, 2018, Art. no. 024001.
[38] D. G. Rand, S. Arbesman, and N. A. Christakis, ''Dynamic social networks promote cooperation in experiments with humans,'' Proc. Nat. Acad. Sci. USA, vol. 108, no. 48, pp. 19193–19198, 2011.
[39] M. Cebrian, R. Paturi, and D. Ricketts, ''Experimental study of the impact of historical information in human coordination,'' 2012, arXiv:1202.2503. [Online]. Available: https://arxiv.org/abs/1202.2503
[40] M. N. Burton-Chellew and S. A. West, ''Prosocial preferences do not explain human cooperation in public-goods games,'' Proc. Nat. Acad. Sci. USA, vol. 110, no. 1, pp. 216–221, 2013.
[41] K. A. Cronin, D. J. Acheson, P. Hernández, and A. Sánchez, ''Hierarchy is detrimental for human cooperation,'' Sci. Rep., vol. 5, Dec. 2015, Art. no. 18634.
[42] P. Van Den Berg, L. Molleman, J. Junikka, M. Puurtinen, and F. J. Weissing, ''Human cooperation in groups: Variation begets variation,'' Sci. Rep., vol. 5, Nov. 2015, Art. no. 16144.
[43] Y. Dong, S. Sun, C. Xia, and M. Perc, ''Second-order reputation promotes cooperation in the spatial prisoner's dilemma game,'' IEEE Access, vol. 7, pp. 82532–82540, 2019.
[44] M.-H. Chen, L. Wang, S.-W. Sun, J. Wang, and C.-Y. Xia, ''Evolution of cooperation in the spatial public goods game with adaptive reputation assortment,'' Phys. Lett. A, vol. 380, nos. 1–2, pp. 40–47, Jan. 2016.

JINGTAO QI received the B.E. degree in simulation engineering from the National University of Defense Technology, Changsha, China, in 2018, where he is currently pursuing the M.E. degree in control science and engineering. His current research interests include studying complex systems with a combination of theoretical tools and data analysis, mathematical modeling of heterogeneous information networks, applying network methodologies to analyze the development of complex collection intelligence, and the data-driven study of collective behavior.

LIANG BAI received the B.E. and B.M. degrees from Xi'an Jiao Tong University, in 2002, and the M.E. and Ph.D. degrees from the National University of Defense Technology, in 2005 and 2008, respectively. He is currently a Professor with the School of System Engineering, National University of Defense Technology. His research interests include multimedia content analysis and access, particularly for video and images, big multimedia data, complex system modeling, and network analysis. Dr. Bai is a member of the Chinese Computer Federation.

YANDONG XIAO received the B.E. degree in information system engineering and the Ph.D. degree in system engineering from the National University of Defense Technology, Changsha, China, in 2012 and 2018, respectively, and the Ph.D. degree from Harvard University, Boston, MA, USA, where he was also a Research Trainee. He is currently a Lecturer with the School of System Engineering, National University of Defense Technology. His research interests include complex system modeling, artificial intelligence, swarm intelligence, interdisciplinary principles for community ecology, and social networks.
