A novel Byzantine fault tolerance consensus for Green IoT with intelligence based on reinforcement
Peng Chen (first author) a,1, Dezhi Han (first author) a,1, Tien-Hsiung Weng b, Kuan-Ching Li b,∗, Arcangelo Castiglione c

a Department of Computer Science and Technology, Shanghai Maritime University, Shanghai 200120, China
b Department of Computer Science and Information Engr. (CSIE), Providence University, Taichung 43301, Taiwan
c Department of Computer Science, University of Salerno, Via Giovanni Paolo II, 132, 84084, Fisciano, Salerno, Italy
Keywords: Blockchain; Green IoT; Byzantine fault tolerance; Reinforcement learning; Consensus algorithm; Smart city

To enhance the consensus performance of Blockchain in the Green Internet of Things (G-IoT), and to address the static network structure and high communication overhead of the Practical Byzantine Fault Tolerance (PBFT) consensus algorithm, in this paper we propose a Credit Reinforcement Byzantine Fault Tolerance (CRBFT) consensus algorithm based on reinforcement learning. The CRBFT algorithm divides the nodes into three types with different responsibilities (master node, sub-nodes, and candidate nodes) and assigns a credit attribute to each node. A node's credit is adjusted adaptively through the reinforcement learning algorithm, which dynamically changes the state of nodes. The CRBFT algorithm can automatically identify malicious and invalid nodes and make them exit the consensus network. Experimental results show that the CRBFT algorithm effectively improves the security of the consensus network. Besides, compared with the PBFT algorithm, CRBFT reduces the consensus delay by about 40% and the traffic overhead by more than 45%. This reduction helps save energy and reduce emissions.
1. Introduction

Green Internet of Things (G-IoT) is a new design concept of the Internet of Things (IoT) that optimizes network devices and introduces new technologies to achieve energy saving, pollution reduction, and lower operating costs [1]. In recent years, with the development of IoT technology, energy consumption problems in network maintenance, communication overhead, and data management have become increasingly prominent. Therefore, the demand for G-IoT is becoming more and more urgent [2]. A Blockchain is a list of records linked and protected by cryptography that has the following main characteristics: decentralization, distribution, and a peer-to-peer (P2P) architecture [3]. The essence of a Blockchain is a distributed database technology [4]. More precisely, through cryptography, consensus algorithms, distributed storage, and point-to-point technology, each node in the network maintains the consistency and validity of the network data, and together the nodes form a decentralized distributed system [5]. The use of Blockchain in the IoT can enhance security and mitigate the energy consumption problems caused by centralized servers [6,7]. As a result, the combination of IoT and Blockchain can effectively reduce energy consumption, which helps the IoT save energy and reduce operating costs, in line with the design concept of G-IoT. As the technological foundation of smart cities, the IoT provides smart cities with urban perception capabilities. Therefore, the development of G-IoT contributes to the sustainable development of smart cities [8,9].

The consensus algorithm is an essential part of Blockchain technology, and it must tolerate Byzantine faults. A suitable consensus algorithm helps reduce consensus energy consumption and resource waste. A Byzantine fault takes its name from the Byzantine generals problem, proposed by Lamport [10]. The Byzantine generals problem is typically used to characterize consistency problems in distributed systems and represents the underlying issue for Blockchain consensus algorithms. More precisely, the Byzantine generals problem in a computer system can be expressed as follows: assuming there is a reliable channel for the transmission of messages, how can the influence of malicious nodes be avoided, so that the whole system runs correctly despite such nodes and the integrity, reliability, and consistency of information data are ensured [11]?

The Blockchain consensus algorithm aims to have transaction data verified by more than half of the nodes. Research on consensus algorithms has long been underway. Pease and Lamport first proposed
∗ Corresponding author.
E-mail addresses: dzhan@shmtu.edu.cn (D. Han), thweng@pu.edu.tw (T.-H. Weng), kuancli@pu.edu.tw (K.-C. Li), arcastiglione@unisa.it (A. Castiglione).
1 Authors Peng Chen and Dezhi Han contributed equally to this work.
https://doi.org/10.1016/j.jisa.2021.102821
the Byzantine Fault Tolerance (BFT) algorithm in the 1980s [12]. This algorithm relies on the mutual exchange of information between nodes to reach a deterministic consensus result. However, BFT is not practical, since the complexity of the messages exchanged between the nodes is exponential, and nodes joining and leaving require special processing. In 1993, Cynthia Dwork and Moni Naor first proposed the proof-of-work (PoW) algorithm [13]. In this algorithm, the client needs to perform a certain amount of complicated computation. Therefore, this algorithm requires more nodes and computing power, resulting in long transaction times. In 1999, Miguel Castro and Barbara Liskov improved BFT by proposing the Practical Byzantine Fault Tolerance (PBFT) algorithm [14]. PBFT inherits the advantages of BFT while reducing the algorithm complexity from exponential to polynomial. However, the PBFT algorithm adopts the Client–Server (C/S) structure [15], and its consensus nodes are fixed, so it cannot dynamically perceive changes in the number of nodes. Therefore, with the further development of Blockchain technology, many researchers have proposed improved algorithms to overcome the limitations of PBFT.

In particular, Malkhi et al. introduce Flexible BFT [16], a new approach to BFT consensus that is resilient to higher levels of malicious behavior than is possible in a pure Byzantine fault model. However, the new fault model it proposes cannot predict what replicas would do if they were able to violate safety. Duan et al. present hBFT, a hybrid Byzantine fault-tolerant algorithm [17], which can detect and identify faulty clients. However, when clients participate, the master node must be correct to maintain performance. Liu introduces the Dynamic authorization of the Byzantine fault-tolerant (DDBFT) algorithm [18], which applies the Delegated Proof of Stake (DPoS) algorithm to PBFT. The DDBFT algorithm is dynamic, which improves throughput and reduces delay. On the other hand, the system blocks when the block size (transmitted data size) exceeds the node processing capacity, wasting resources. Li and Zhang [19] describe a Group-Hierarchy (GH) algorithm based on PBFT, which divides replicas into groups. Each group executes the normal-case operation of PBFT concurrently to reach a local consensus. Then, each group's primary, acting as the consensus representative, reaches an agreement with the other primaries to achieve a global consensus. This algorithm reaches consensus faster than PBFT but does not deal with malicious nodes, which wastes many resources and weakens system stability. The above algorithms have drawbacks, especially in resource consumption, and therefore do not meet G-IoT requirements.

In recent years, with the surge of research on machine learning (ML), the combination of Blockchain and ML has attracted researchers' attention [9,20,21]. For example, ML has been applied to the design of a fair data transaction protocol based on Blockchain [22], and deep learning (DL) has found application in Blockchain-based financial investment research [23]–[24]. Reinforcement learning (RL) is a behavior-based ML method in which an agent adjusts its behavior through interaction with the environment. In particular, RL has attracted much attention in artificial intelligence (AI) for its excellent decision-making ability [25]–[26].

To address the above problems, in this paper we propose a Credit Reinforcement Byzantine Fault Tolerance (CRBFT) consensus algorithm based on RL, where ‘‘credit’’ quantifies the credibility of each consensus node; that is, ‘‘credit’’ indicates whether a node can participate in the consensus. Besides, a node with a higher credit value has a higher priority to participate in the consensus. The CRBFT consensus algorithm combines RL with the PBFT algorithm, using RL to identify malicious and invalid nodes automatically. Furthermore, CRBFT is reliable, efficient, safe, and dynamic. It can reduce communication overhead and consensus energy consumption, which is conducive to G-IoT development.

In detail, the main contributions of this paper are as follows:

(1) We apply, for the first time, reinforcement learning to the design of a BFT consensus algorithm, introduce the concept of credit, and propose CRBFT, a reinforcement-based BFT consensus algorithm. With the proposed algorithm, the network gains cognitive intelligence, which allows it to identify malicious and failed nodes automatically and to adjust node credit adaptively, so that the states of nodes change dynamically over time. This reduces the interference of malicious nodes with the consensus process and improves the security of the consensus network;
(2) We improve the PBFT algorithm: we change from the C/S paradigm to the P2P paradigm, remove the confirmation phase, and perform synchronous verification when the master node changes. This makes the algorithm conform to decentralization, reduces communication overhead, and lowers consensus energy consumption;
(3) The experimental results show that, compared with the traditional PBFT consensus algorithm, the proposed CRBFT algorithm significantly reduces the consensus delay, and its performance is less affected by an increase in the number of nodes.

The remainder of this paper is organized as follows. Section 2 introduces the preliminary knowledge and background needed to understand the consensus algorithm design. Section 3 presents the design ideas and details of the CRBFT consensus algorithm based on reinforcement learning. In Section 4, we show the experimental results achieved through simulation. Finally, in Section 5, we provide some concluding remarks and future research prospects.

2. Preliminaries and problem formulation

To better illustrate the design details of the Credit Reinforcement Byzantine Fault Tolerance (CRBFT) consensus algorithm proposed in this paper, in this section we introduce the basics of the Practical Byzantine Fault Tolerance (PBFT) algorithm and the concepts underlying the reinforcement learning (RL) algorithm used in our proposal.

2.1. Practical Byzantine Fault Tolerance (PBFT)

Lamport et al. showed that an effective Byzantine Fault Tolerant algorithm exists when the number of traitors (nodes that may behave arbitrarily) in the Byzantine system is less than one third of the total. Conversely, if it reaches one third or more, there is no guarantee that the system will achieve a consistent result [27]. Therefore, the Byzantine problem can be solved if Eq. (1) is satisfied:

$$ n \geq 3f + 1 \tag{1} $$

where $n$ is the total number of nodes and $f$ is the maximum tolerable number of Byzantine nodes.

The PBFT algorithm requires that all nodes produce the same result when executing an operation under the same service state and parameters, and all nodes must start execution from the same state. Under this constraint, even if there are failed nodes, the non-failed nodes agree on a total execution order, thereby ensuring safety [28]. The PBFT algorithm divides all nodes into three types: a client node, a master node, and backup nodes. It also divides the consensus process into three stages: the pre-preparation stage, the preparation stage, and the confirmation stage. We show the consensus process of the PBFT algorithm in Fig. 1. The result of the process is not valid until the client node receives the same result from at least $f + 1$ normal nodes that do not make mistakes.
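The two quantitative rules of Section 2.1 (the bound of Eq. (1) and the $f + 1$ matching-reply rule) can be checked with a few lines of code. The following is a minimal sketch, not the authors' implementation; the function names `max_faulty`, `tolerates`, and `accept_result` are hypothetical and chosen only for illustration.

```python
from collections import Counter
from typing import Hashable, Iterable, Optional


def max_faulty(n: int) -> int:
    """Largest f that satisfies Eq. (1), n >= 3f + 1, for n >= 1 nodes."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return (n - 1) // 3


def tolerates(n: int, f: int) -> bool:
    """True when n nodes can tolerate f Byzantine nodes according to Eq. (1)."""
    return n >= 3 * f + 1


def accept_result(replies: Iterable[Hashable], f: int) -> Optional[Hashable]:
    """Return the most frequent reply if at least f + 1 nodes sent it, else None.

    With at most f faulty nodes, f + 1 identical replies must include at least
    one correct node, so faulty nodes alone cannot make a value acceptable.
    """
    counts = Counter(replies)
    if not counts:
        return None
    value, count = counts.most_common(1)[0]
    return value if count >= f + 1 else None


if __name__ == "__main__":
    for n in (4, 7, 10):
        print(f"n={n}: tolerates up to f={max_faulty(n)} Byzantine nodes")
    print(accept_result(["ok", "ok", "bad", "ok"], f=1))  # ok
```

Running it prints $f = 1, 2, 3$ for $n = 4, 7, 10$, which are the $(N, F)$ pairs later used in the delay test of Section 4.2.2, i.e., configurations that sit exactly on the bound $n = 3f + 1$.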
P. Chen et al. Journal of Information Security and Applications 59 (2021) 102821
Fig. 1. The PBFT algorithm consensus process.
Fig. 2. Schematic diagram for implementations of the actor-critic.
2.2. Reinforcement learning (RL)

Reinforcement learning (RL) is a type of machine learning inspired by animal psychology that draws on psychology, control theory, and other related subjects [29]–[30]. In simple terms, RL is a cyclic process in which an agent interacts with the environment, taking actions that change its state in order to gain rewards. Markov Decision Processes (MDPs) provide a framework for the study of RL. A typical MDP can be expressed as a five-tuple $(S, A, P, R, \gamma)$, where $S$ is a set of states and $A$ is a set of actions. The transition probability $P$ describes the probability distribution of the agent's transition from the current state $s \in S$ to other states after taking action $a \in A$. $R$ is the reward function, which defines the reward for action $a \in A$. $\gamma$ is a discount factor that is mainly used to balance current and future rewards. The objective of an MDP is to obtain the maximum return $G_t$ when taking the corresponding action $a$ in state $s$. The return $G_t$ is the total discounted reward from time-step $t$. Formally, $G_t$ is defined by Eq. (2):

$$ G_t = R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \cdots \tag{2} $$

where $0 < \gamma < 1$ and $R_{t+1}$ is the reward for the transition from state $s_t$ to $s_{t+1}$. The optimal strategy, which obtains the maximum return, can be found by solving the optimal Bellman equations:

$$ v_*(s) = \max_a q_*(s,a) = \max_a \Big( R_s^a + \gamma \sum_{s'} P_{ss'}^a \, v_*(s') \Big) \tag{3} $$

$$ q_*(s,a) = \max_\pi q_\pi(s,a) = R_s^a + \gamma \sum_{s'} P_{ss'}^a \max_{a'} q_*(s',a') \tag{4} $$

where $v_*(s)$ is the optimal long-term value of state $s$, that is, the value of the state when all possible actions are considered and the optimal action is selected; the strategy $\pi$ is the method by which the agent chooses actions; $q_\pi(s,a)$ is the action-value function, which represents the expected return of taking action $a$ in state $s$ and following strategy $\pi$; $q_*(s,a)$ is the optimal value among the action-value functions generated under all strategies; $R_s^a$ is the expected reward of taking action $a$ in state $s$; and $P_{ss'}^a$ is the probability of moving from state $s$ to state $s'$ when taking action $a$.

We can solve the optimal Bellman equation with the Temporal-Difference (TD) algorithm [31]. TD is a model-free RL algorithm that updates the state value $V(s_t)$ toward the TD target $R_{t+1} + \gamma V(s_{t+1})$:

$$ V(s_t) \leftarrow V(s_t) + \alpha \big[ R_{t+1} + \gamma V(s_{t+1}) - V(s_t) \big] \tag{5} $$

where $\alpha$ is the step size (learning rate). The TD update aims to reduce the error between the predicted value and the true value. The TD error $\delta_t$ is defined by Eq. (6):

$$ \delta_t \doteq R_{t+1} + \gamma V(s_{t+1}) - V(s_t) \tag{6} $$

The Actor-Critic method [32] is an effective RL algorithm. In this method, the Actor selects an action according to the current system state. The Critic, in turn, evaluates the current system state and the action selected by the Actor: it parameterizes the action-value function and updates its parameters using the reward $R_{t+1}$ to obtain a higher return. The Actor then updates its action selection based on the value provided by the Critic. An objective function is constructed, and the parameters are updated iteratively until the output meets the threshold of the objective function. The final output is an approximately optimal solution of the Bellman equation. Therefore, the Actor-Critic method can solve the optimal Bellman equation adaptively, as shown by the diagram in Fig. 2.

3. Credit Reinforcement Byzantine Fault Tolerance consensus algorithm design

In this section, we provide a detailed description of our proposal. More precisely, we first present the design approach underlying the Credit Reinforcement Byzantine Fault Tolerance (CRBFT) consensus algorithm, and then we give the related details.

3.1. Algorithm design approach

In the Blockchain, the primary purpose is to reach a consensus on the Blockchain transaction information across the entire network, which does not involve the order of consensus requests. Therefore, to better adapt the Byzantine fault-tolerant consensus algorithm to the Blockchain, we propose the Credit Reinforcement Byzantine Fault Tolerance (CRBFT) consensus algorithm.

First of all, we adapt the PBFT algorithm so that it can be applied effectively in Blockchain systems. More precisely, we propose the following improvements to the PBFT algorithm:

1. Following the decentralized architecture of Blockchain, we shift from the Client–Server (C/S) paradigm to the Peer-to-Peer (P2P) paradigm. Thus, there is no client in the system;
2. We divide the consensus nodes into three types: master node, sub-node, and candidate node;
3. We set a ‘‘credit’’ attribute for consensus nodes. In this way, the system can dynamically assign the types of consensus nodes, and nodes can dynamically join and leave the system;
4. We remove the confirmation phase from the consensus process of the PBFT algorithm and carry out a synchronous verification process when the master node changes;
5. We perform adaptive consensus confirmation and reputation adjustment based on reinforcement learning.

We remark that removing the confirmation phase of the protocol reduces the network's bandwidth consumption to a certain extent. However, to remove state inconsistency and ensure system consistency after the master node changes, a synchronous verification process is performed after such a change. More precisely, the synchronous verification process works as follows. The sub-node sends a synchronization request to the new master node to verify whether the master node number is consistent. After the synchronization succeeds, the master node sends backup data to the sub-node, and the sub-node then verifies the validity of the backup data. Through this operation, we reduce
Fig. 5. Schematic diagram for the implementation of a nonlinear critic network using a feedforward network with one hidden layer.

to quantify the output of $u(k)$ better. Then, the action output $u(k)$ is compared with $re_k$ to select the reinforcement signal $r(k)$. If $u(k)$ and $re_k$ have the same sign, the output is regarded as successful, and $r(k)$ is ‘‘0’’. If the signs differ, the output is considered a failure, and $r(k)$ is ‘‘1’’.

In the initial stage of consensus, the weights of the action network and the critic network are random. During the learning process, the critic network uses $r(k)$ to update its weights and obtain the optimal $J(k)$. The action network then uses the optimal $J(k)$ to update its weights and obtain the optimal output. The detailed design of the neural networks is given in parts A and B of this subsection. In Fig. 6, we show a diagram of the process for tuning the parameters of the Actor-Critic method.

A. Critic network. The learning objective of the critic network is to minimize the error between the approximated value of the value function and its actual value, while optimizing the maximum return. The prediction error $e_c(k)$ of the critic network and the objective function $E_c(k)$ to be minimized are defined as follows:

$$ e_c(k) = \alpha J(k) - \big[ J(k-1) - r(k) \big] \tag{7} $$

$$ E_c(k) = \frac{1}{2} e_c^2(k) \tag{8} $$

where $\varphi_i$ is the $i$th hidden node input of the critic network; $n_{in} + 1$ is the total number of inputs into the critic network, including the analog action value from the action network; $\phi_i$ is the corresponding output; $N_h$ is the total number of hidden nodes in the critic network; and $w_c$ is the weight vector of the critic network.

According to the error propagation equation of the back propagation algorithm and the chain rule, the gradient of the objective function with respect to the network weights can be obtained. We summarize the adaptation of the critic network as follows:

• $\Delta w_c^{(2)}$ (hidden to output layer)

$$ \Delta w_{c_i}^{(2)}(k) = l_c(k)\left[-\frac{\partial E_c(k)}{\partial w_{c_i}^{(2)}(k)}\right] = -l_c(k)\left[\frac{\partial E_c(k)}{\partial J(k)}\frac{\partial J(k)}{\partial w_{c_i}^{(2)}(k)}\right] = -l_c(k)\big[\alpha e_c(k)\phi_i(k)\big] \tag{12} $$

• $\Delta w_c^{(1)}$ (input to hidden layer)

$$ \Delta w_{c_{ij}}^{(1)}(k) = l_c(k)\left[-\frac{\partial E_c(k)}{\partial w_{c_{ij}}^{(1)}(k)}\right] = -l_c(k)\left[\frac{\partial E_c(k)}{\partial J(k)}\frac{\partial J(k)}{\partial \phi_i(k)}\frac{\partial \phi_i(k)}{\partial \varphi_i(k)}\frac{\partial \varphi_i(k)}{\partial w_{c_{ij}}^{(1)}(k)}\right] = -\alpha\, l_c(k)\, e_c(k)\, w_{c_i}^{(2)}(k)\left[\frac{1}{2}\left(1 - \phi_i^2(k)\right)\right] x_j(k) \tag{13} $$

where $l_c(k) > 0$ is the learning rate of the critic network when dealing with the $i$th message. We remark that this rate usually decreases over time to a small value.

B. Action Network. Previously, we defined that if the output is regarded as successful, $r(k)$ is ‘‘0’’. That is, ‘‘0’’ was defined as the reinforcement signal for ‘‘success’’. To satisfy the Bellman equation and maximize the state value function, the ultimate learning target of the action network, denoted by $U_c$, is set to ‘‘0’’ in the algorithm. Through observation, we found that the parameter adjustment principle of the action network
We define the list NT to record nodes where $re$ is ‘‘1’’, and the list NF to record nodes where $re$ is ‘‘−1’’. The consensus is confirmed when the number of nodes in NT is greater than or equal to $2F$. Conversely, when the number of nodes in NF is greater than or equal to $2F$, the sub-node should not trust the master node.

4. Results and discussion

This section first explains the simulation settings. Then, we test the performance of the CRBFT algorithm and compare it with other related works to verify its effectiveness and availability. Finally, we present two applications of the algorithm.

4.2. Performance evaluation

4.2.1. Algorithm performance test

In the experiment, we set $F$ to 3 and $N$ to 13, and we consider 10 consensus rounds. In addition, we set node 3 to become malicious at the 3rd consensus, node 2 to fail at the 4th consensus, and node 5 to become malicious at the 6th consensus. We show the credit trends of each node in Fig. 7.

In Fig. 7, we can observe that in the 3rd consensus, node 3 becomes a malicious node and its credit decreases. When the credit of node 3 falls below $C_{base}$, node 11 changes from a candidate node to a sub-node, and node 3 becomes a candidate node. Node 2 becomes a failed node at the 4th consensus, and its credit also decreases, but more slowly than that of malicious nodes. When its credit falls below $C_{base}$, it becomes a candidate node.
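The role changes observed in Fig. 7, together with the NT/NF confirmation rule at the end of Section 3, can be illustrated with a small sketch. It is a hypothetical illustration under stated assumptions, not the authors' code: the credit-update rule itself is not modeled (credits are given as input), the numeric credit values and $C_{base} = 0.3$ are invented, promoting the highest-credit eligible candidate is an assumption based on the statement that higher-credit nodes have higher priority, and the split of the 13 nodes into 1 master, 9 sub-nodes, and 3 candidates is assumed. The names `Roles`, `reassign_roles`, and `tally` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Roles:
    """Hypothetical container for the three CRBFT node types."""
    master: int
    sub_nodes: Set[int]
    candidates: Set[int]


def reassign_roles(roles: Roles, credit: Dict[int, float], c_base: float) -> Roles:
    """Demote sub-nodes whose credit fell below c_base and refill their seats
    with the highest-credit candidates that are at or above c_base (assumed rule)."""
    demoted = {n for n in roles.sub_nodes if credit[n] < c_base}
    eligible: List[int] = sorted(
        (n for n in roles.candidates if credit[n] >= c_base),
        key=lambda n: credit[n],
        reverse=True,
    )
    promoted = set(eligible[: len(demoted)])
    sub_nodes = (roles.sub_nodes - demoted) | promoted
    candidates = (roles.candidates - promoted) | demoted
    return Roles(roles.master, sub_nodes, candidates)


def tally(votes: Dict[int, int], f: int) -> str:
    """NT/NF rule: votes maps node id -> re value (1 or -1)."""
    nt = [node for node, re_value in votes.items() if re_value == 1]
    nf = [node for node, re_value in votes.items() if re_value == -1]
    if len(nt) >= 2 * f:
        return "confirmed"
    if len(nf) >= 2 * f:
        return "distrust_master"
    return "pending"


if __name__ == "__main__":
    # Hypothetical credits loosely echoing Fig. 7: node 3 has misbehaved.
    credit = {n: 0.8 for n in range(1, 14)}
    credit.update({3: 0.2, 11: 0.7, 12: 0.5, 13: 0.4})
    roles = Roles(master=1, sub_nodes=set(range(2, 11)), candidates={11, 12, 13})
    new_roles = reassign_roles(roles, credit, c_base=0.3)
    print(sorted(new_roles.sub_nodes))   # [2, 4, 5, 6, 7, 8, 9, 10, 11]
    print(sorted(new_roles.candidates))  # [3, 12, 13]
    print(tally({n: 1 for n in range(2, 9)}, f=3))  # confirmed
```

With these invented credits, node 3 is demoted and node 11 is promoted, matching the qualitative behavior reported for the 3rd consensus; the real trajectory depends on the credit-adjustment rules of Section 3, which this sketch does not reproduce.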
We can conclude that the proposed algorithm can adaptively adjust the credit of consensus nodes, effectively identify malicious and failed nodes, and dynamically adjust the types of nodes, improving consensus security. It can also be observed that successful sub-nodes are given priority to take part in the next consensus, and the current master node has an advantage in being selected as the master node next time.

The comparison of single consensus times is shown in Fig. 8. We can observe that the first consensus takes the longest time; as the neural network learns, the consensus time decreases until learning ends. The consensus time stabilizes at around 200 ms.

4.2.2. Delay test

Consensus delay is an important index for measuring the speed of a consensus algorithm. A low consensus delay allows transactions to be confirmed quickly, which makes the Blockchain more secure and practical. More precisely, the consensus delay $T_d$ tested in this paper is the consensus completion time, defined by Eq. (23):

$$ T_d = T_{tc} - T_{tr} \tag{23} $$

where $T_{tr}$ is the transaction start time and $T_{tc}$ is the consensus completion time. $F$ is set to 1, 2, and 3, and $N$ is set to 4, 7, and 10, respectively. The average value is obtained over 10 tests and compared with the PBFT algorithm. The statistical results are shown in Fig. 9.

In Fig. 9, we can observe that as the number of nodes increases, the consensus delay of both algorithms increases, but the increase for PBFT is larger than for CRBFT, so a growing number of consensus nodes has a more significant impact on PBFT. Besides, with the same number of nodes, the consensus delay of CRBFT is significantly lower than that of PBFT, reduced by about 40%.

4.2.3. Communication frequency comparison

According to the PBFT algorithm execution flow, the number of messages in each phase is simplified to obtain Eq. (24):

$$ Z_{\mathrm{PBFT}} = 2n(n-1) \tag{24} $$

where $n$ is the number of nodes participating in the consensus. Similarly, the number of messages in a single consensus for CRBFT is:

$$ Z_{\mathrm{CRBFT}} = 4n - 3 \tag{25} $$

The total number of messages in GH [19] is given below:

$$ Z_{\mathrm{GH}} = 2\frac{n^2}{g} + 2\sum_{j=1}^{g} M_j^2 + 2g^2 + g(n-2) - n \tag{26} $$

where $g$ is the number of groups and $M_j$ is the number of nodes in the $j$th group. We set the number of nodes in each group to 2 in the experiment. We compare the number of messages of these algorithms for different numbers of nodes in Fig. 10. As shown in Fig. 10, the number of messages exchanged by all three algorithms increases with the number of nodes, but PBFT increases faster. Although GH has fewer messages than PBFT, it still has more messages than CRBFT. In other words, the message count of CRBFT grows only slightly; compared with PBFT and GH, the number of messages is reduced by more than 45%. CRBFT can therefore effectively reduce the communication frequency of a single consensus process and the resource consumption of the network.

4.3. Algorithm application

4.3.1. Smart contracts in power system

Blockchain technology can enable more accurate and reliable perception, transmission, and recording in power systems, and more intelligent and efficient mining, integration, and analysis of physical characteristics and internal connections. Besides, Blockchain technology can provide an open, transparent, and credible platform for electricity market transactions, reducing contract execution risks and regulatory costs. Simultaneously, it assists people in thinking and decision-making in the power system and its related social systems, which improves the
management level [33]. Its related applications include, for example, the Blockchain mode of power system state estimation shown in Fig. 11, a Blockchain-based power system state estimation model. In Fig. 11, the data center and users form a Blockchain power system, and both have communication and verification functions. The data center receives power state estimation data from users and stores the data. To tamper with the information of the data center or a single user, an attacker would need to attack most users in the Blockchain system at the same time, which incurs a huge cracking cost. Therefore, the Blockchain model can better prevent data intrusion and form a data protection barrier.

Fig. 10. The number of messages between the related algorithms.
Fig. 11. The Blockchain mode of power system state estimation.

More precisely, combining Fig. 11 with Blockchain technology, we can apply the CRBFT algorithm to power systems. Based on the tamper-proof characteristics of Blockchain data, Blockchain nodes store energy consumption information collected from the IoT's intelligent metering devices, and consumers in the Blockchain then self-execute smart contracts through the CRBFT algorithm. When the smart contract legally satisfies all the judgment conditions and the consensus nodes reach a consensus, the smart contract is enforced automatically, improving transaction flexibility and efficiency. Concurrently, increases and decreases of credibility in the CRBFT algorithm are used as rewards and punishments to balance energy demand against grid energy production rules, thereby balancing energy supply and demand. When there is an external attack, the CRBFT algorithm can screen out erroneous nodes and take defensive measures to ensure the information security of the power system. The CRBFT algorithm ensures each transaction's legitimacy, guarantees the power system's safety, and is conducive to the construction and development of smart grids.

4.3.2. Transaction settlement and clearing in finance

In the field of clearing and settlement, in the traditional transaction model both parties keep separate accounts, and since each party records the data itself, authenticity is difficult to guarantee. In contrast, the data in a Blockchain is distributed, each node has access to all the transaction information, and once changes are detected, the whole network can be notified to prevent tampering.

This paper proposes a Credit Reinforcement Byzantine Fault Tolerance Consensus (CRBFT) algorithm based on reinforcement learning. In the CRBFT algorithm, a credit attribute is assigned to each node. The node credit is then adaptively adjusted through the reinforcement learning algorithm, giving the network a specific cognitive ability. The proposed algorithm can automatically identify malicious and failed nodes in the consensus network, thus improving consensus network security, reducing consensus delay, and promoting energy saving and emission reduction. Moreover, compared with the PBFT algorithm, the CRBFT algorithm has lower consensus delay, fewer exchanged messages, and lower network resource consumption. These characteristics make our proposal suitable for constructing Green IoT and promoting the development of smart cities.

As future research, we want to optimize the algorithm details, for example, parameter selection in the credit adjustment rules. In particular, we intend to find new solutions to speed up the learning process and improve the stability of the neural network, besides further improving the algorithm's security, so as to apply it to different types of Blockchain in IoT, such as DAG-structured blockchains [34]–[35]. Meanwhile, we intend to explore the possibility of applying the Blockchain in the smart grid to improve the algorithm further and promote the smart grid's development.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgment

This study was funded by the National Natural Science Foundation of China, under grants 61873160 and 61672338.

References

[1] Huang J, Meng Y, Gong X, Liu Y, Duan Q. A novel deployment scheme for green internet of things. IEEE Internet Things J 2014;1(2):196–205. http://dx.doi.org/10.1109/JIOT.2014.2301819.
[2] Zhang X, Huang Y, Wang WB. Green internet of things: Requirements, development status and key technologies. Telecommun Sci 2012;28(8):96–104.
[3] Chang J, Han F. Blockchain: from digital currency to credit society. Beijing: China CITIC Press; 2016.
[4] Yuan Y, Wang FY. Blockchain: The state of the art and future trends. Acta Automat Sinica 2016.
[5] Bonneau J, Miller A, Clark J, Narayanan A, Kroll JA, Felten EW. SoK: Research perspectives and challenges for bitcoin and cryptocurrencies. In: 2015 IEEE Symposium on security and privacy, 2015, pp. 104–121.
[6] Tian Q, Han D, Li K-C, Liu X, Duan L, Castiglione A. An intrusion detection approach based on improved deep belief network. Appl Intell 2020;50:3162–78.
[7] Xiao L, Han D, Meng X, Liang W, Li K-C. A secure framework for data sharing in private blockchain-based WBANs. IEEE Access 2020;8:153956–68.
[8] Dorri A, Kanhere SS, Jurdak R, Gauravaram P. Blockchain for IoT security and privacy: The case study of a smart home. In: 2017 IEEE international conference on pervasive computing and communications workshops (PerCom Workshops), 2017.
[9] Liang W, Zhang D, Lei X, Tang M, Li K-C, Zomaya A. Circuit copyright blockchain: Blockchain-based homomorphic encryption for IP circuit protection. IEEE Trans Emerg Top Comput 2020. http://dx.doi.org/10.1109/TETC.2020.2993032.
P. Chen et al. Journal of Information Security and Applications 59 (2021) 102821
[10] Lamport L, Shostak R, Pease M. The byzantine generals problem. ACM Trans Program Lang Syst 1982;4(3):382–401.
[11] Lamport L. Seminal research document related to the field of byzantine fault tolerance. 1982.
[12] Pease M, Shostak R, Lamport L. Reaching agreement in the presence of faults. J ACM 1980;27:228–34.
[13] Dwork C, Naor M. Pricing via processing or combatting junk mail. In: Annual international cryptology conference, 1992.
[14] Castro M, Liskov B. Practical byzantine fault tolerance. ACM Trans Comput Syst 2002;20(4):398–461.
[15] Androutsellis-Theotokis S, Spinellis D. A survey of peer-to-peer content distribution technologies. ACM Comput Surv 2004;36(4):335–71.
[16] Malkhi D, Nayak K, Ren L. Flexible byzantine fault tolerance. In: Proceedings of the 2019 ACM SIGSAC conference on computer and communications security. CCS '19, New York, NY, USA: Association for Computing Machinery; 2019, p. 1041–53. http://dx.doi.org/10.1145/3319535.3354225.
[17] Duan S, Peisert S, Levitt KN. hBFT: Speculative byzantine fault tolerance with minimum cost. IEEE Trans Dependable Secure Comput 2015;12(1):58–70. http://dx.doi.org/10.1109/TDSC.2014.2312331.
[18] Liu XF. Research on blockchain performance improvement based on byzantine fault tolerance consensus algorithm based on dynamic authorization [Ph.D. thesis]. Hangzhou: Zhejiang University; 2017.
[19] Li QW. Research on consensus efficiency based on practical byzantine fault tolerance. In: ICMIC. Guiyang; 2018.
[20] Liang W, Fan Y, Li K-C, Zhang D, Gaudiot J-L. Secure data storage and recovery in industrial blockchain network environments. IEEE Trans Ind Inf 2020;16(10):6543–52.
[21] Liang W, Huang W, Long J, Zhang K, Li K-C, Zhang D. Deep reinforcement learning for resource protection and real-time detection in IoT environment. IEEE Internet Things J 2020;7(7):6392–401.
[22] Zhao YQ, Yu Y, Li YN, Han G, Du XJ. Machine learning based privacy-preserving fair data trading in big data market. Inform Sci 2019;478:449–60.
[23] Xie MH, Li HY, Zhao YJ. Blockchain financial investment based on deep learning network algorithm. J Comput Appl Math 2020;372:112723.
[24] Pang XW, Zhou YQ, Wang P, Lin WW, Chang V. An innovative neural network approach for stock market prediction. J Supercomput 2020;76(3):2098–118.
[25] Mendel JM, McLaren RW. Reinforcement learning control and pattern recognition systems. In: A prelude to neural networks. 1970.
[26] Busoniu L, Babuska R, Schutter BD, Ernst D. Reinforcement learning and dynamic programming using function approximators. 1st ed. USA: CRC Press, Inc.; 2010.
[27] Pease M, Shostak R, Lamport L. Reaching agreement in the presence of faults. J ACM 1980;27:228–34.
[28] Reiter MK. A secure group membership protocol. IEEE Trans Softw Eng 1996;22(1):31–42.
[29] Sutton R, Barto A. Reinforcement learning: An introduction (adaptive computation and machine learning). 1998.
[30] Lee D, Seo H, Jung MW. Neural basis of reinforcement learning and decision making. Annu Rev Neurosci 2012;35(1):287.
[31] Sutton RS. Learning to predict by the methods of temporal differences. Mach Learn 1988;3(1):9–44.
[32] Werbos P. Approximate dynamic programming for real-time control and neural modeling. 1992.
[33] Wang S, Guo CX, Feng B, Zhang H, Du ZD. Application of blockchain technology in power system: Prospects and ideas. Autom Electr Power Syst 2020;44(11):10–24.
[34] Suhail S, Hussain R, Khan A, Hong CS. Orchestrating product provenance story: When IOTA ecosystem meets electronics supply chain space. Comput Ind 2020;123:103334. http://dx.doi.org/10.1016/j.compind.2020.103334, https://www.sciencedirect.com/science/article/pii/S0166361520305686.
[35] Suhail S, Hussain R, Jurdak R, Hong CS. Trustworthy digital twins in the industrial internet of things with blockchain. 2020, arXiv:2010.12168.