
Applied Sciences
Article
State Merging and Splitting Strategies for Finite State Machines
Implemented in FPGA
Adam Klimowicz * and Valery Salauyou

Faculty of Computer Science, Bialystok University of Technology, Wiejska 45A, 15-351 Bialystok, Poland
* Correspondence: a.klimowicz@pb.edu.pl

Abstract: Different strategies for combining merging and splitting transformation procedures
for incompletely specified finite state machines implemented on field-programmable logic devices are
proposed. In these methods, optimization criteria such as the speed of operation, power consumption
and implementation cost are considered already in the early phase of finite state machine synthesis.
The methods also take into account the technological features of programmable logic devices and the
state assignment method. The transformation quality ratio is calculated on the basis of estimations of
consumed power, critical path delay and number of utilized logic cells. The user is also able to choose
the order of merging and splitting procedures and the direction of the optimization by setting weights
for each criterion. The methods for estimating the optimization criteria values are described, and
the experimental results are discussed.

Keywords: logic synthesis; field-programmable gate arrays; finite state machines; logic optimization;
state splitting; state merging

Citation: Klimowicz, A.; Salauyou, V. State Merging and Splitting Strategies for Finite State Machines Implemented in FPGA. Appl. Sci. 2022, 12, 8134. https://doi.org/10.3390/app12168134

Academic Editor: Alexander Barkalov

Received: 18 July 2022
Accepted: 12 August 2022
Published: 14 August 2022

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Copyright: © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Appl. Sci. 2022, 12, 8134. https://doi.org/10.3390/app12168134 https://www.mdpi.com/journal/applsci

1. Introduction
A digital system can be described as a collection of finite state machines (FSM) and combinational circuits. FSMs are often used as independent modules acting as control devices. Usually, the original machines must be created by the engineer whenever a new project is started. The success of the entire project is largely determined by the parameters of the FSMs built into the digital circuit. Therefore, the question of finding the optimal representation of a finite state machine is always topical.
Currently, field-programmable gate arrays (FPGA) are commonly applied for building digital systems. A substantial number of optimization algorithms for finite state machines are focused on their implementation in FPGA. The criteria for optimizing FSMs are generally the area (cost of implementation), speed (critical delay path) and power consumption (dissipation). The area criterion is not a critical restriction because new FPGAs have a great number of logical elements built from look-up tables (LUT). Recently, the most significant optimization criteria are critical path delay and energy consumption.
The traditional process of the synthesis of finite state machines contains the following phases, which are sequentially executed: a minimization of the number of states (state merging), an encoding of states (state assignment) and a synthesis of the combinational part of the finite state machine. Nevertheless, traditional methods frequently contradict the FSM optimization target at the stage of logic synthesis because all of the above-mentioned design phases totally ignore the characteristics of the technological base and the constraints of the logic synthesis process.
A classic approach to solving the problem of machine state merging relies on the creation of sets of matching states and searching for a minimal closed cover, which is an NP-complete problem [1]. Ref. [2] describes an exact minimization method based on the mapping of incompletely specified FSMs to an FSM tree. In [3], a branch-and-bound search algorithm for the determination of sets of matching states is presented. One of the best-known solutions is the STAMINA program [4], which can work in heuristic and exact variants and applies explicit enumeration to the solution of the state minimization task. Another implementation of the merging procedure is presented in Ref. [5], which describes a program for parallel state reduction and state encoding, where incompletely specified state codes can be built.
The value of splitting the states of FSMs in the state encoding procedure was noted
in [6] and soon after in [7], where the splitting operation was used to decrease the power
dissipation and resource utilization of the designed FSM. Ref. [8] describes an application of
state splitting for the simultaneous state minimization and state assignment of the FSM.
Many authors have studied synthesis methods for high-speed FSMs implemented on
programmable logic devices, using a large variety of approaches. Ref. [9] considers
the problem of state encoding and optimization of the combinational part upon
the implementation of high-performance FSMs in complex programmable logic devices
(CPLD). Ref. [10] presents a novel architecture that is particularly optimized for implementation
of reconfigurable FSMs; this architecture is called the transition-based reconfigurable
FSM (TR-FSM) and shows a significant improvement in area, speed, and power consumption
in relation to FPGA architectures. In [11], the implementation of finite state machines in
FPGA with the application of integral blocks of read-only memory (ROM) is described.
The presented approach shows two FSM structures with multiplexers on the inputs
of ROM blocks, which allow decreasing the area and increasing the FSM speed. Ref. [12]
presents BT-FSM, which is a finite state machine with a single bit input, where the state
transition graph is in a form of a binary tree. The architecture of FSM is based on the
previously developed model of the finite virtual state machine (FVSM) [13]. In [14], a modi-
fication of the feedback of asynchronous FSMs and convergent state encoding is proposed.
In this approach, asynchronous FSMs can be realized as simply as synchronous ones.
Ref. [15] presents the extended burst-mode architecture, based on local synchronization
signals, which allows using approaches for synchronous machines, for the synthesis of
asynchronous machines. The increase in speed of FSM can be achieved also by using a
state splitting procedure. Ref. [16] presents the method based on the splitting of internal
states, which makes it possible to decrease the ranks of transition functions and decrease
the number of logic levels of transition functions.
Many approaches to the power consumption reduction of state machines have been
recently proposed. They are mostly based on special state encoding procedures, decomposition,
device clocking control and others. In Ref. [17], a genetic fuzzy c-mean clustering-based
decomposition method, named GFCM-D, is offered for FSM partition into a set of
c-fuzzy clusters. For reaching low power consumption, the target function of GFCM-D is
minimized with the application of a genetic algorithm. Partitioning is widely used for
FSM power minimization because, most of the time, only one of the sub-FSMs has to be clocked; in
consequence, energy is saved. Ref. [18] proposes a multi-population evolution strategy,
denoted as MPES to accomplish the task of searching for a low power state encoding in
FSM synthesis. MPES resolves this problem by using inner and outer evolution strategies
(ES). In the inner strategy, subpopulations evolve independently and are responsible for
local search in separate regions, while the outer strategy plays the role of a shell to optimize
the subpopulations of inner-ES for improved solutions. Ref. [19] proposes a fast algorithm
based on state transitions probability and simple control logic to realize the partitioned
machines. An effective method for decreasing dynamic power by reducing the switching
activity is clock gating. Ref. [20] presents a consolidated and fine-grained architecture-level
clock gating mechanism for low-power hardware accelerators which are automatically
created by a high-level synthesis tool. Another method introduces the concept of clock
gating into both the state logic (DGS) and output logic (DGO) of the FSM individually and can
be applied to most FSMs [21]. The gating control logic automatically extracts
information from the FSM state description. The desired adjacency graph to reduce the
power dissipation is used in the method from [22]. A low-power state-encoding technique
with upper bound peak current constraints is proposed in [23]. Ref. [24] presents a synthesis
methodology dedicated to low power implementations of combinatorial circuits in
FPGA devices. In this method, Boolean functions are defined by BDD (binary decision
diagram). A brand-new structure of the switch activity BDD is suggested, which uses a
function decomposition to minimize the switching activity of the logic. The algorithm
of state encoding based on a decomposition and probabilistic description of the FSM is
proposed in [25]. In this method, a binary tree with nodes created by sharing a finite state
machine is used.
In several papers, the area is reduced concurrently with the minimization of the
power dissipation in the phase of state encoding. Most works [26–29] propose genetic
algorithms for this purpose. A new methodology of logic decomposition with application
of BDDs is offered in [30]. The core of the proposed algorithm is the multiple cutting of
a BDD. Additionally, methods of searching for the finest technology mapping focused
on the configurability of FPGA logic blocks are described. Refs. [31,32] propose a novel
technology-dependent design method which produces FSMs with three levels of logic
blocks and regular systems of connections between the logic levels. The algorithm is
based on splitting the set of internal states into two subsets. Each subset relates to a unique
fragment of an FSM. The offered algorithm is placed in the area of two-fold state assignment
techniques. In [33], the method based on constructing a partition for the set of output
variables is proposed. It minimizes the number of extra variables encoding the collections
of output variables (COVs).
The analysis of the known works shows that no existing work optimizes the occupied
area, speed, and power consumption simultaneously in the primary phase of the synthesis
process through the concurrent merging and splitting of the internal states
of the FSM. Methods that claim to consider several optimization criteria at the same
time are, in fact, reduced to the traditional approach, with several algorithms
proposed for each of its stages.
In this paper, three heuristic strategies for optimizing incompletely specified FSMs
are proposed. At the stage of merging and splitting states, they take into account the
parameters of the technological base and the method of state encoding used in the synthesis,
and they aim to improve such parameters as the area, speed, and power consumption. The considered
approach is focused on the implementation of finite automata on FPGAs based on look-up
tables. In system-on-programmable-chip (SoC) devices, the programmable section has the
FPGA architecture, so the proposed techniques can also be applied when implementing
state machines on SoC devices.

2. Materials and Methods


2.1. Idea of the Method
The idea of the approach is to execute sequential operations of splitting or merging
the states whenever this is possible. In this way, new equivalent FSMs are built, whose further
implementation can give various outcomes in terms of area, speed, and power consumption.
In this article, three different strategies for merging and splitting states are shown:
• Merge-then-split strategy (MS), where state merging is performed first and, when it is
impossible, state splitting is performed.
• Split-then-merge strategy (SM), where state splitting is performed first and, when it is
impossible, state merging is performed.
• Combined strategy (COMB), where the decision as to whether to perform state merging
or splitting is made at each subsequent finite automaton transformation operation.
Different optimization criteria can also be considered in the proposed strategies:
power consumption, speed, area, and balanced optimizations. To calculate the evaluation
parameters and then implement the FSM, a state assignment using the selected method
should be performed. To determine which state to split or which pair of states to merge,
trial merging and trial splitting operations are executed.
The implementation cost is not a critical constraint because contemporary FPGA
devices have a large number of logical elements. For this reason, the area parameter is not
considered in this work, but it was investigated in earlier works [34].

2.2. Estimation of Optimization Criteria


For the estimation of the optimization criteria, all states (for splitting) or pairs of
states (for merging) should be considered in sequence. For each state or pair of states,
a trial splitting or merging is executed. Next, the internal states are encoded by applying
one of the common encoding techniques, and the set of logic functions relating to the
combinational part of the FSM is constructed. After that, for the state to split or the pair of
states to merge, the power consumption Pi, critical delay path Si or transformation quality Qi
ratios are estimated.

2.2.1. Estimation of Power Consumption


In most cases, the power dissipation in digital circuits is a combination of two
components: static power, associated with staying in some state (e.g., a high level on the
outputs), and dynamic power, associated with changing the state of the device. The
dynamic power dissipation of the digital system depends on the frequency of switching
the output registers.
The static power (also called leakage power) is generally the result of the unwanted
subthreshold current in the transistor channel when the transistor is turned off. It depends
on the supply voltage, the switching threshold voltage, and the transistor size. All these
parameters depend on the technological base, which is used for circuit implementation.
Using equivalent transformations of the FSM and different state assignment methods at the
stage of logic synthesis, we cannot change these parameters. To reduce the static power,
the methods such as dynamic voltage and frequency scaling, multi-voltage threshold and
power gating should be used additionally.
A finite state machine is a tuple F = {A, X, Y, ϕ, ψ, areset }. In this notation, A is an
M-element set of internal states A = {a1 , . . . , aM }, with one selected initial state (areset ). The
set X is an L-element set of input values X = {x1 , . . . , xL }, and the set Y is an N-element set of
output values Y = {y1 , . . . , yN }. There are also two functions describing the behavior of the
FSM depending on the input vector: transition function ϕ and output function ψ. Transition
function ϕ : A × X → A defines the next FSM state, depending on the present FSM state
and the input vector. Output function ψ : A × X → Y for Mealy FSM, or ψ : A → Y for
Moore FSM, defines the output vector for a current state and input vector (Mealy) or only
for a current state (Moore).
Additionally, for real implementations of FSMs in digital systems, the tuple F also
contains a set of codes C = {c_1, . . . , c_M} whose cardinality is equal to the cardinality of the
set A because each code c_i relates to the state a_i. Each code can be stored as an R-bit vector,
where R ∈ ⟨⌈log2 M⌉, M⟩. R is also the number of memory elements needed to store the
code of the present FSM state. Moreover, all state codes should be mutually orthogonal, i.e.,
no two states may share the same code.
The method described in [35] can be used to compute the dynamic power
consumption of an FSM. The method is based on the state assignment and the probability
of a “1” (or “0”) appearing on each input. The power dissipation of an FSM can then be
described by the formula

$$P_{total} = \sum_{r=1}^{R} P_r, \tag{1}$$

where P_total is the entire dynamic power dissipation of the FSM and P_r is the dynamic power
dissipation of the r-th flip-flop (a state code memory element). The power consumption of each
flip-flop is determined by the expression

$$P_r = \frac{1}{2} V_{dd}^2 \times f \times C \times SA_r, \tag{2}$$

where Pr —power dissipated by memory element r; Vdd —supply voltage; f —operating
frequency; C—output capacitance of each flip-flop; and SAr —switching activity of r-th
flip-flop, r ∈ <1, R>.
Let ci be a binary vector used as a code of state ai . Assuming that the number of bits of
code ci is equal to R, let Vr (ci ) represent the value of r-th bit of code ci of state ai , r ∈ <1, R>.
Then, the switching activity SAr of memory element r is described by the following formula:

$$SA_r = \sum_{i=1}^{M} \sum_{j=1}^{M} P(a_i \to a_j) \times \left( V_r(c_i) \oplus V_r(c_j) \right), \tag{3}$$

where P(a_i → a_j) is the probability of a transition from state a_i to state a_j (a_i, a_j ∈ A), and ⊕ is
the logic operator “exclusive or” (XOR). The probability of transition P(a_i → a_j) can be calculated
using the following equation:

$$P(a_i \to a_j) = P(a_i) \times P(X(a_i, a_j)), \tag{4}$$

where P(ai )—probability that the ai is the current state of the FSM; and P(X(ai , aj ))—
probability that the input vector is equal to X(ai , aj ), which causes a transition from state ai
to state aj .
Let V_b(X) represent the value of the b-th variable of input vector X. The probability
P(X(a_i, a_j)) that the input vector of the FSM is identical to X(a_i, a_j) is described by the equation

$$P(X(a_i, a_j)) = \prod_{b=1}^{L} P\left( V_b(X(a_i, a_j)) = d \right), \tag{5}$$

where d ∈ {“1”, “0”, “–”}, and P(x_b = d) is the probability that input variable x_b from input
vector X(a_i, a_j) is identical to d.
In our method, we assume that the probabilities of both 0 and 1 on any FSM input are the
same, thus P(x_b = 0) = P(x_b = 1) = 1/2 and P(x_b = “–”) = 1. Notice that we do not consider the
correlations between the values on individual inputs.
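As a small illustration of Equation (5) under the equal-probability assumption above, the following sketch computes the probability of a single ternary input vector. The function name `input_probability` and the string encoding of input vectors (characters `0`, `1` and `-`) are assumptions made for this example and are not taken from [35].

```python
def input_probability(cond: str) -> float:
    """Probability of a ternary input vector, Eq. (5).

    Assumes P(x_b = 0) = P(x_b = 1) = 1/2 and P(x_b = '-') = 1,
    with inputs treated as uncorrelated.
    """
    specified_bits = sum(bit in "01" for bit in cond)
    return 0.5 ** specified_bits


# A vector with two specified bits and one don't-care has probability 1/4.
print(input_probability("1-0"))
```

When several input vectors cause the same transition, their probabilities would be summed to obtain the transition term used in the state-probability system.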
Next, the probability P(a_i) that the current state of the FSM is a_i, i ∈ ⟨1, M⟩, can be
determined from the following system of equations:

$$P(a_i) = \sum_{j=1}^{M} P(a_j) \times P(X(a_j, a_i)), \quad i \in \langle 1, M \rangle. \tag{6}$$

When no transitions between states a_j and a_i exist, it can be assumed that P(X(a_j, a_i)) = 0.
Otherwise, when transitions from the state a_j to state a_i exist, the value P(X(a_j, a_i)) is
defined as the sum of the probabilities of every input vector that causes a transition from
state a_j to state a_i.
Formula (6) denotes a linear system of M equations in M variables P(a_1), . . . ,
P(a_M). The system is linearly dependent, and the number of its solutions is infinite. However,
the machine is always in one of its internal states, so Formula (7) holds:

$$\sum_{i=1}^{M} P(a_i) = 1. \tag{7}$$

One of the equations in (6) should be substituted by Equation (7) to solve the
system (6). The power estimation algorithm is fully described in [35].
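The estimation described by Equations (1)–(4), (6) and (7) can be sketched numerically as follows. This is only an illustrative sketch, not the algorithm from [35]: the matrix `T` (with `T[j, i]` standing for P(X(a_j, a_i))), the function names and the choice of replacing the last equation of (6) with (7) are assumptions made for the example.

```python
import numpy as np


def state_probabilities(T: np.ndarray) -> np.ndarray:
    """Solve system (6) with its last equation replaced by Eq. (7).

    T[j, i] is assumed to hold P(X(a_j, a_i)), the probability of the
    inputs that move the FSM from state a_j to state a_i.
    """
    M = T.shape[0]
    A = T.T - np.eye(M)   # row i: sum_j P(a_j) T[j, i] - P(a_i) = 0
    A[-1, :] = 1.0        # Eq. (7): sum_i P(a_i) = 1
    b = np.zeros(M)
    b[-1] = 1.0
    return np.linalg.solve(A, b)


def dynamic_power(T, codes, vdd, f, c) -> float:
    """Total dynamic power of the state register, Eqs. (1)-(4)."""
    p_state = state_probabilities(T)
    p_trans = p_state[:, None] * T                    # Eq. (4)
    codes = np.asarray(codes, dtype=int)              # shape (M, R), bits 0/1
    # toggles[i, j, r] = V_r(c_i) XOR V_r(c_j)
    toggles = (codes[:, None, :] ^ codes[None, :, :]).astype(float)
    sa = np.einsum("ij,ijr->r", p_trans, toggles)     # Eq. (3)
    return float(np.sum(0.5 * vdd**2 * f * c * sa))   # Eqs. (2) and (1)
```

For example, a two-state machine that changes state with probability 1/2 from either state, encoded with one bit, has a switching activity of 0.5 for its single flip-flop.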

2.2.2. Estimation of Critical Path


In general, the architecture of contemporary FPGAs can be characterized as a set of
logic elements based on look-up tables (LUTs). The LUTs can implement any Boolean
function with a small number of input variables (usually 4–8), so they can be called
function generators. When the number of arguments of a logic function is greater than the
number of LUT inputs n, the logic function should be decomposed with respect to the number of
arguments [36]. The most common decomposition methods are linear (serial) and parallel.
The length of the critical path of the combinational part of the FSM defines the operating
speed of the entire FSM. This parameter is equal to the number of logic elements participating in the
critical path. The maximum number of arguments L_max of the logic functions implemented
in the combinational part of the FSM can be determined after state assignment and the creation
of the transition functions. If the technological base of implementation is an FPGA device, the
length of the critical path is defined only by the parameter L_max. When the linear decomposition
is applied, it can be formulated as

$$S_i = 1 + \mathrm{int}\left( (L_{max} - n)/(n - 1) \right). \tag{8}$$

When the parallel decomposition is used, it can be described as

$$S_i = \mathrm{int}\left( \log_n L_{max} \right). \tag{9}$$

The critical path estimation process is fully presented in [37].
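Equations (8) and (9) can be evaluated with a few lines of code. The sketch below assumes that int(·) denotes rounding up to the nearest integer and that any function with at most n arguments fits into a single LUT level; both are interpretations made for this example, and the function names are hypothetical.

```python
def ceil_log(value: int, base: int) -> int:
    """Smallest k such that base**k >= value, using integer arithmetic."""
    k, power = 0, 1
    while power < value:
        power *= base
        k += 1
    return k


def critical_path_linear(l_max: int, n: int) -> int:
    """Eq. (8), with int(.) read as rounding up (an assumption)."""
    if l_max <= n:
        return 1  # the whole function fits into one LUT
    return 1 + -(-(l_max - n) // (n - 1))  # ceiling division


def critical_path_parallel(l_max: int, n: int) -> int:
    """Eq. (9), with int(.) read as rounding up (an assumption)."""
    return max(1, ceil_log(l_max, n))


# With 4-input LUTs, a 10-argument function needs three chained LUT levels
# under linear decomposition and two levels under parallel decomposition.
print(critical_path_linear(10, 4), critical_path_parallel(10, 4))
```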

2.2.3. Estimation of Transformation Quality Ratio


If we want to use different criteria to evaluate the quality of merging or splitting
the states, a weighted sum can be used, which is one of the well-known methods of
discrete multicriteria optimization [38]. Due to its simplicity, this method is probably the most
widespread solution. In this method, a scalar cost function is specified as an aggregation of
costs with weights.
Let F = (F_1, . . . , F_d) be a d-dimensional function and let λ = (λ_1, . . . , λ_d) be a vector
which fulfils the following conditions:

$$\forall j \in [1 \ldots d], \quad \lambda_j > 0, \tag{10}$$

$$\sum_{j=1}^{d} \lambda_j = 1. \tag{11}$$

The λ-aggregation of F is the following function:

$$F_\lambda = \sum_{j=1}^{d} \lambda_j F_j. \tag{12}$$

Naturally, the elements of λ correspond to the relative significance (weight). In this
case, we have two criteria: power P_i and speed S_i. The weights for them can be specified
by the user as w_P (power) and w_S (speed), respectively. With respect to the above considerations,
the transformation quality (aggregation) function Q_i for a single state (for splitting) or any
pair of states (for merging) can be specified as follows:

$$Q_i = w_P \hat{P}_i + w_S \hat{S}_i, \tag{13}$$

where P̂_i and Ŝ_i are the normalized criteria parameters P_i and S_i. The normalization is
performed to eliminate the influence of the wide range of magnitudes of the considered
parameters. The normalization can be described by the formula

$$\hat{K}_i = (K_{max} - K_i)/(K_{max} - K_{min}), \tag{14}$$

where K_i is one of the considered criteria parameters (P_i or S_i), K_max = 2·K_i, K_min = 0. The
assumed values of K_min and K_max ensure that the initial value of the transformation quality
ratio will be equal to 0.5.
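A short numerical sketch of Equations (13) and (14) is given below. It assumes that K_max is fixed at twice the criterion value of the initial FSM, which is one reading consistent with the stated initial ratio of 0.5; the function and parameter names are hypothetical.

```python
def normalize(k: float, k_initial: float) -> float:
    """Eq. (14) with K_min = 0 and K_max = 2 * k_initial (assumed reading)."""
    k_max, k_min = 2.0 * k_initial, 0.0
    return (k_max - k) / (k_max - k_min)


def quality(p, s, p_initial, s_initial, w_p=0.5, w_s=0.5) -> float:
    """Transformation quality ratio, Eq. (13); weights should sum to 1."""
    return w_p * normalize(p, p_initial) + w_s * normalize(s, s_initial)


# The untransformed FSM scores 0.5; halving the power at equal speed scores higher.
print(quality(2.0, 3.0, 2.0, 3.0))
print(quality(1.0, 3.0, 2.0, 3.0))
```

Under this reading, values above 0.5 indicate an improvement over the initial machine, which matches the use of "greater than" comparisons in the balanced variant.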

2.3. State Merging Procedure


The merging procedure is based on the algorithm for the minimization of the number
of FSM states offered in [39]. The idea of this algorithm relies on the sequential merging
of only two states. For this reason, the set G of all pairs of internal states which satisfy
the merging conditions is determined at each step. Next, for every pair in the set G, a
trial merging is performed. Then, the pair that leaves the highest chance for
merging other pairs in the next step is selected for the final merging.
We can join two machine states a_s and a_t (replace them by one state a_st) if they are
equivalent, which means that the FSM behavior stays the same after merging.
The FSM operation does not change after merging states a_s and a_t if the conditions of transitions from
the states a_s and a_t that go to different states are orthogonal. If transitions from states a_s
and a_t go to the same state, then the conditions of the transitions should be equal. Additionally,
the output vectors produced at these transitions should not be orthogonal. Please note
that wait states can be created during the merging procedure. The methods of choosing
pairs of states to merge and the merging algorithm are fully described in [39].
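The merging conditions above can be sketched as a pairwise check over the outgoing transitions of the two states. This is a simplified illustration and not the procedure from [39]: transitions are assumed to be tuples of ternary strings, conditions that are orthogonal are assumed never to conflict, and wait states (self-loops) receive no special treatment. All names are hypothetical.

```python
from typing import List, Tuple

# (input condition, next state, output vector); vectors use '0', '1', '-'
Transition = Tuple[str, str, str]


def orthogonal(u: str, v: str) -> bool:
    """True if some position holds 0 in one vector and 1 in the other."""
    return any({a, b} == {"0", "1"} for a, b in zip(u, v))


def can_merge(trans_s: List[Transition], trans_t: List[Transition]) -> bool:
    """Check the merging conditions for two states (simplified sketch)."""
    for cond_s, next_s, out_s in trans_s:
        for cond_t, next_t, out_t in trans_t:
            if orthogonal(cond_s, cond_t):
                continue  # the two transitions can never fire together
            # Overlapping conditions: same target, equal condition,
            # and non-orthogonal outputs are required.
            if next_s != next_t or cond_s != cond_t or orthogonal(out_s, out_t):
                return False
    return True


a_s = [("0", "b", "1-"), ("1", "c", "00")]
a_t = [("0", "b", "10"), ("1", "c", "0-")]
print(can_merge(a_s, a_t))
```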

2.4. State Splitting Procedure


The procedure of splitting the internal states of the FSM is an equivalent transformation
of the FSM that does not change its behavior, general structure, and type. Therefore,
including splitting into the synthesis process while implementing the finite state machine
in FPGA devices is useful and can be simply added to the procedure of system design.
The state splitting procedure may lead to a decrease in power dissipation of the FSM [7]
and to a gain in its speed of operation [40]. Any splitting of states leads to a growth in the
number of states and hence, may lead to an increase in the number of memory elements
needed for FSM implementation (increasing the cost). For this reason, the state-splitting
procedure, taking into account the cost of realization of the FSM, is not considered in
this paper.

2.4.1. State Splitting Procedure for Power Minimization


Using the classic state encoding methods, exactly one orthogonal code is assigned to
each internal state. This implies applying codes with a Hamming distance greater than one.
It may lead to an increase in switching activity of the memory elements used for saving the
codes of FSM states. Of course, it is difficult or even impossible to guarantee a Hamming
distance equal to 1 for all codes. The splitting of the internal states is one of the solutions
to this problem. This operation gives more chances to find the couple of codes with the
Hamming distance equal to one. Therefore, this should lead also to the decrease in power
dissipation in the FSM [7].
Let X_P(a_i) = {z ∈ Z: ϕ(a_j, z) = a_i, a_i ∈ A, a_j ∈ A} be the set of all input vectors that
cause transitions to the state a_i. Let X_F(a_i) = {z ∈ Z: ϕ(a_i, z) = a_k, a_i ∈ A, a_k ∈ A} be the set
of all input vectors that trigger transitions from the state a_i.
Any state a_i with card(X_P(a_i)) > 1 can be split into two new states a_i^(1) and a_i^(2). After
this procedure, the state a_i is substituted with states a_i^(1) and a_i^(2) such that we have the
following:
• Sets X_F for the new states are the same as the set for the source state:

$$X_F\left(a_i^{(1)}\right) = X_F\left(a_i^{(2)}\right) = X_F(a_i), \tag{15}$$

• Set X_P of the source state a_i is split into two disjoint components:

$$X_P\left(a_i^{(1)}\right) \cup X_P\left(a_i^{(2)}\right) = X_P(a_i), \quad X_P\left(a_i^{(1)}\right) \cap X_P\left(a_i^{(2)}\right) = \emptyset. \tag{16}$$


The procedure of splitting the internal states of the FSM is reversible, hence the
machine can return to its previous form by merging the states a_i^(1) and a_i^(2) into
one state a_i. After splitting, the number of internal states of the final FSM is greater, but
the average number of input vectors that cause transitions to a state is lower.
Additionally, it becomes more feasible to assign codes with a smaller Hamming
distance, which lowers the power consumption of the synthesized FSM.
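The splitting transformation defined by Equations (15) and (16) can be illustrated on a transition list. The sketch below is an assumption-laden example rather than the authors' implementation: transitions are tuples of strings, the new state names are formed by appending "(1)" and "(2)", and the caller chooses which incoming transitions go to the first new state.

```python
from typing import List, Set, Tuple

# (source state, input condition, target state, output vector)
Transition = Tuple[str, str, str, str]


def split_state(transitions: List[Transition], state: str,
                first_inputs: Set[Tuple[str, str]]) -> List[Transition]:
    """Split `state` into two states.

    first_inputs holds the (source, condition) pairs of the incoming
    transitions redirected to the first new state; all other incoming
    transitions go to the second new state.
    """
    s1, s2 = state + "(1)", state + "(2)"
    result: List[Transition] = []
    for src, cond, dst, out in transitions:
        if dst == state:
            # Eq. (16): every incoming transition enters exactly one new state.
            dst = s1 if (src, cond) in first_inputs else s2
        # Eq. (15): both new states inherit all outgoing transitions.
        sources = (s1, s2) if src == state else (src,)
        result.extend((s, cond, dst, out) for s in sources)
    return result


fsm = [("a", "0", "b", "x"), ("c", "1", "b", "y"), ("b", "-", "a", "z")]
for t in split_state(fsm, "b", {("a", "0")}):
    print(t)
```

Merging the two new states back restores the original transition list, which reflects the reversibility noted above.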

2.4.2. State Splitting Procedure for Critical Path Minimization


The state splitting procedure for speed maximization comes from Ref. [40] but is
adapted to use both binary and one-hot types of encoding. Just like in [40], the
key strategy relies on searching for the set D of all states fulfilling the conditions for splitting:

$$\exists\, a_i \in A, \quad \mathrm{card}(B(a_i)) > 1, \tag{17}$$

$$\exists\, a_j \in B(a_i), \quad r_j \le r^{*}, \tag{18}$$

where r_j is the number of arguments of the function that initiates the transition to state a_i,
r* is the upper limit of the number of arguments for all transition functions, A is the set of
internal states, and B(a_i) is the set of states with transitions to state a_i.
If the conditions are satisfied, a trial splitting is made for each state from the set D.
Each state a_i ∈ D is split into two new states. The first state is related to the transitions from
the state a_j ∈ B(a_i) for which r_j is maximal. The second state is related to the remaining transitions to
state a_i. Finally, the state a_i that best fits the optimization criteria with regard to the FSM
operation speed (minimization of the critical path length S_i) is selected for real splitting.

2.5. General FSM Synthesis Method


The general synthesis method uses two equivalent transformations of FSMs: a splitting
and a merging. For this purpose, two sets are created: D—a set of states that can be split;
and G—a set of state pairs that can be joined. Next, for each equivalent machine, the power
consumption Pi , the maximum critical delay path (speed) Si , and the cost of implementation
(area) Ci , are calculated. The area parameter is not considered in this paper, as it was
mentioned before. From the obtained results, a state (for splitting) or a couple of states
(for merging) is selected for which the considered parameter is lowest (in case of speed or
power) or highest (when using the balanced method) after the modification of the FSM.
In the merge-then-split (MS) strategy, there is always a merging that is performed
first and after all possible merges, the splitting of states should be done. This strategy for
the speed minimization is described using Algorithm 1. If we want to consider another
criterion of optimization (e.g., power), we should replace the Si parameter with the Pi
parameter. In the case of using a balanced variant, we should use the transformation
quality ratio Qi and replace all “lower than” operators with “greater than” operators in
Algorithm 1.
At the start of Algorithm 1, an initial FSM form is saved as the best one (line 1). Next,
the subroutine for seeking couples for merging (building the set G) is executed (line 3).
If there is no possibility to merge the states, the algorithm moves to the splitting phase,
otherwise the trial merging is performed in the following way: first, the present FSM is
saved, merging is executed, then the states are encoded, and the critical path ratio for
current FSM is determined (lines 6–15). Among all solutions, the one is selected for which
the critical path ratio Si is minimal. After that, the real merging is performed, and a selection
of states for the next merging is executed once more (lines 16–21).
Algorithm 1. General algorithm for FSM synthesis (speed-aware merge-then-split strategy).

1: best_FSM ← FSM, last_FSM ← FSM
2: Si ← MAX_S_VALUE
3: G ← FindMergePairs(FSM)
4: WHILE G ≠ ∅ DO
5:     Sm ← MAX_S_VALUE
6:     WHILE G ≠ ∅ DO
7:         Save(FSM)
8:         FSM ← Merge(FSM, (as, at) ∈ G)
9:         Encode(FSM)
10:        IF CriticalPath(FSM) < Sm THEN
11:            Sm ← CriticalPath(FSM)
12:            Selected_Pair ← (as, at)
13:        END IF
14:        Restore(FSM)
15:    END WHILE
16:    FSM ← Merge(FSM, Selected_Pair)
17:    last_FSM ← FSM
18:    IF Sm < CriticalPath(best_FSM) THEN
19:        best_FSM ← FSM, Si ← Sm
20:    END IF
21:    G ← FindMergePairs(FSM)
22: END WHILE
23: D ← FindSplitStates(FSM)
24: WHILE D ≠ ∅ DO
25:    WHILE D ≠ ∅ DO
26:        Save(FSM)
27:        FSM ← Split(FSM, ai ∈ D)
28:        Encode(FSM)
29:        IF CriticalPath(FSM) < Si THEN
30:            Ss ← CriticalPath(FSM)
31:            Selected_State ← (ai)
32:        END IF
33:        Restore(FSM)
34:    END WHILE
35:    IF Ss < CriticalPath(last_FSM) THEN
36:        FSM ← Split(FSM, Selected_State)
37:        last_FSM ← FSM, No_Split ← FALSE
38:    ELSE
39:        No_Split ← TRUE
40:    END IF
41:    IF Ss < CriticalPath(best_FSM) THEN
42:        best_FSM ← FSM, Si ← Ss
43:    END IF
44:    IF No_Split = FALSE THEN
45:        D ← FindSplitStates(FSM)
46:    ELSE
47:        D ← ∅
48:    END IF
49: END WHILE
50: END
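The control flow of the merge-then-split strategy can also be expressed as a compact, generic routine. The sketch below is a simplification and not a transcription of Algorithm 1: the FSM is treated as an opaque value, the transformations are assumed to return new FSMs (which replaces Save/Restore), and state encoding is assumed to happen inside the hypothetical `cost` callback. All function and parameter names are illustrative.

```python
def best_trial(fsm, candidates, transform, cost):
    """Apply `transform` to every candidate; return (cost, fsm) of the cheapest, or None."""
    trials = [(cost(new), new) for new in (transform(fsm, c) for c in candidates)]
    return min(trials, key=lambda t: t[0]) if trials else None


def merge_then_split(fsm, find_merge_pairs, merge, find_split_states, split, cost):
    """Greedy merge-then-split search that returns the cheapest FSM seen."""
    best = fsm
    # Merging phase: merge as long as any pair satisfies the merging conditions.
    while (trial := best_trial(fsm, find_merge_pairs(fsm), merge, cost)) is not None:
        _, fsm = trial
        if cost(fsm) < cost(best):
            best = fsm
    # Splitting phase: accept a split only if it lowers the cost of the
    # last accepted FSM, which guarantees termination.
    while (trial := best_trial(fsm, find_split_states(fsm), split, cost)) is not None:
        trial_cost, candidate = trial
        if trial_cost >= cost(fsm):
            break
        fsm = candidate
        if trial_cost < cost(best):
            best = fsm
    return best


# Toy usage: the "FSM" is just its number of states and the cost is |n - 4|.
result = merge_then_split(
    5,
    find_merge_pairs=lambda n: [(0, 1)] if n > 3 else [],
    merge=lambda n, pair: n - 1,
    find_split_states=lambda n: [0] if n < 6 else [],
    split=lambda n, state: n + 1,
    cost=lambda n: abs(n - 4),
)
print(result)
```

Passing a power estimate or the negated quality ratio as `cost` would give the power-aware or balanced variants of the same loop.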

After the merging phase, the subroutine for seeking states for splitting (building the
set D) is performed (line 23). If there is no possibility to split any states, the algorithm stops;
otherwise, the trial splitting is executed as follows. First, the present FSM is saved, splitting
is executed, then the states are encoded, and the critical path ratio of the FSM is determined
(lines 25–34). Among all solutions, the one for which the critical path ratio Si is
minimal is selected. Then the real splitting is performed, and a selection of states for the next splitting
is executed once more (lines 35–48). The final FSM form is the one with the lowest critical
path ratio among all equivalent forms considered during the work of the algorithm.
The splitting process may diverge; therefore, a stop condition for splitting
is included in the algorithm. It is implemented in lines 35–40 of Algorithm 1, where the critical
path ratio Ss of the currently split FSM is compared with the same value determined
for the last completed splitting. If the splitting does not lead to a further decrease in the
critical path, it is not executed.
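The splitting phase and its stop condition can be sketched in Python as follows. This is only a minimal illustration under stated assumptions, not the authors' implementation: the FSM is treated as an opaque value, and `find_split_states`, `split` and `critical_path` are hypothetical callables standing in for FindSplitStates, Split, and Encode followed by CriticalPath. The toy model at the bottom (a tuple of weights whose maximum plays the role of the critical path) is likewise invented for the example.

```python
from typing import Callable, Iterable, TypeVar

F = TypeVar("F")  # opaque FSM representation (hypothetical)
S = TypeVar("S")  # state identifier (hypothetical)


def splitting_phase(
    fsm: F,
    find_split_states: Callable[[F], Iterable[S]],
    split: Callable[[F, S], F],
    critical_path: Callable[[F], float],
) -> F:
    """Greedy splitting: apply the best trial split while it shortens the critical path."""
    current = fsm  # plays the role of Last_FSM
    while True:
        candidates = list(find_split_states(current))
        if not candidates:
            break
        # Trial splits: each candidate is evaluated on a new value,
        # so the current FSM stays untouched (Save/Restore).
        trials = [split(current, state) for state in candidates]
        best_trial = min(trials, key=critical_path)
        # Stop condition: reject a split that does not improve on the last accepted FSM.
        if critical_path(best_trial) >= critical_path(current):
            break
        current = best_trial
    return current


# Toy model (assumption): an "FSM" is a tuple of weights, its critical path is the
# largest weight, and splitting a state halves its weight.
def toy_candidates(fsm):
    return [i for i, w in enumerate(fsm) if w > 1]


def toy_split(fsm, i):
    w = fsm[i]
    return fsm[:i] + (w // 2, w - w // 2) + fsm[i + 1:]


result = splitting_phase((8, 3), toy_candidates, toy_split, max)
print(result)
```

Because a split is accepted only when it strictly decreases the cost, the loop cannot run forever on a finite cost scale, which is the purpose of the stop condition described above.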
In the split-then-merge (SM) strategy, splitting is always performed first, and
the merging of states is done after all possible splits. The algorithm for this strategy
can be obtained from Algorithm 1 by exchanging the merging block (lines 3–22) with
the splitting block (lines 23–49).
In the combined strategy (COMB), both a trial merging and a trial splitting are
performed at each step. Then, depending on the selected criteria, a decision is made as to
which transformation (splitting or merging) should finally be performed. The combined strategy
for the balanced variant of optimization is described in the form of Algorithm 2.

Algorithm 2. General algorithm for FSM synthesis (balanced combined strategy).


1: best_FSM ← FSM, Last_FSM ← FSM
2: G ← FindMergePairs(FSM)
3: D ← FindSplitStates(FSM)
4: Qi ← 0.5
5: WHILE G ≠ ∅ and D ≠ ∅ DO
6: Qm ← 0
7: WHILE G ≠ ∅ DO
8: Save(FSM)
9: FSM ← Merge(FSM, (as , at ) ∈ G)
10: Encode(FSM)
11: IF TransformationRatio(FSM) > Qm THEN
12: Qm ← TransformationRatio(FSM)
13: Selected_Pair ← (as , at )
14: END IF
15: Restore(FSM)
16: END WHILE
17: Qs ← 0
18: WHILE D ≠ ∅ DO
19: Save(FSM)
20: FSM ← Split(FSM, ai ∈ D)
21: Encode(FSM)
22: IF TransformationRatio(FSM) > Qi THEN
23: Qs ← TransformationRatio(FSM)
24: Selected_State ← (ai )
25: END IF
26: Restore(FSM)
27: END WHILE
28: IF Qm > Qs THEN
29: FSM ← Merge(FSM, Selected_Pair)
30: Qi ← Qm
31: ELSE
32: IF Qs > TransformationRatio(Last_FSM) THEN
33: FSM ← Split(FSM, Selected_State)
34: Qi ← Qs
35: Last_FSM ← FSM, No_Split ← FALSE
36: ELSE
37: No_Split ← TRUE
38: END IF
39: END IF
40: IF Qi > TransformationRatio(best_FSM) THEN
41: best_FSM ← FSM
42: END IF
43: G ← FindMergePairs(FSM)
44: IF No_Split = FALSE THEN
45: D ← FindSplitStates(FSM)
46: ELSE
47: D←∅
48: END IF
49: END WHILE
50: END

At the start of Algorithm 2, the initial FSM form is stored as the best one (line 1).
Next, the subroutines that search for pairs to merge (building the set G) and states to
split (building the set D) are performed (lines 2–3). If merging or splitting of the states
is impossible, the algorithm ends; otherwise, the trial merging and splitting
procedures are executed as follows. First, the present FSM is stored; next, the
merging or splitting procedure is executed, the states are encoded, and the
transformation ratio of the FSM is determined (lines 7–27). Among all solutions, the one
with the maximal transformation ratio Qi = max(Qs , Qm ) is selected, where Qs and
Qm are the transformation quality ratios for splitting and merging, respectively. Finally,
the real merging or splitting procedure is performed, and the subroutine for the selection
of states for the next merging or splitting is executed again (lines 28–48). The final FSM
form is the one with the highest transformation quality ratio among all considered equivalent
forms (lines 40–41).
As in the previously described strategies, the splitting process may diverge.
For that reason, a stop condition for splitting is included. It is implemented in lines
32–38 of Algorithm 2.
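A single decision step of the combined strategy can be sketched in Python as shown below. This is a hedged illustration rather than the authors' code: `combined_step` is a hypothetical helper, the trial FSMs are assumed to be already merged or split and encoded, and `quality` stands in for TransformationRatio. The scores in the usage example are invented.

```python
from typing import Callable, Optional, Sequence, Tuple, TypeVar

F = TypeVar("F")  # opaque FSM representation (hypothetical)


def combined_step(
    current: F,
    merge_trials: Sequence[F],
    split_trials: Sequence[F],
    quality: Callable[[F], float],
) -> Tuple[F, Optional[str]]:
    """Choose between the best trial merge and the best trial split.

    Returns the new FSM and the name of the applied transformation,
    or (current, None) when no transformation is accepted.
    """
    q_merge = max(map(quality, merge_trials), default=0.0)  # Qm
    q_split = max(map(quality, split_trials), default=0.0)  # Qs
    if merge_trials and q_merge > q_split:
        return max(merge_trials, key=quality), "merge"
    # Stop condition for splitting: accept a split only if it beats the last accepted FSM.
    if split_trials and q_split > quality(current):
        return max(split_trials, key=quality), "split"
    return current, None


# Invented quality ratios for three FSM variants (assumption for illustration only).
scores = {"A": 0.50, "A_merged": 0.62, "A_split": 0.58, "A_weak": 0.40}
quality = scores.__getitem__

print(combined_step("A", ["A_merged"], ["A_split"], quality))
print(combined_step("A", [], ["A_split"], quality))
print(combined_step("A", [], ["A_weak"], quality))
```

The asymmetry is deliberate in this sketch: a merge is taken whenever it scores higher than the best split, while a split must additionally improve on the current FSM, mirroring the stop condition in lines 32–38.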
After one of the variants of the general synthesis algorithm has been executed, the
number of FSM transitions and the number of input variables should also be
minimized, if necessary, as explained in [39].

3. Results
The three proposed strategies for the synthesis of FSMs were implemented as part of a
system for the optimization of digital systems based on programmable logic devices. To
estimate the efficiency of the proposed strategies, we used the MCNC FSM benchmarks [41].
Four methods of state assignment were investigated: binary, one-hot, JEDI (default output
dominant algorithm) [42] and power-optimized sequential encoding [43]. For all three
strategies (MS, SM and COMB), three different optimization criteria were used: power
consumption, speed of operation and a balanced variant with identical weights for the power
and speed parameters (50%). Taking the four types of encoding into account, 36
different variants of the synthesis method are considered in the paper.
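The count of 36 variants follows from the Cartesian product of the three experimental dimensions. A short Python sketch (the string labels are just names chosen here for the example) makes the enumeration explicit:

```python
from itertools import product

strategies = ("MS", "SM", "COMB")
criteria = ("power", "speed", "balanced")
encodings = ("binary", "one-hot", "JEDI", "sequential")

# Every combination of strategy, optimization criterion and state encoding.
variants = list(product(strategies, criteria, encodings))
print(len(variants))  # 3 * 3 * 4 = 36
```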
The example experimental results for binary encoding and power oriented optimiza-
tion are presented in Table 1, where Name is a benchmark filename, C0 , S0 and P0 are,
respectively, the number of used logic elements (cost), maximum critical path described
by a number of logic levels (speed), and dissipated power in milliwatts of the initial FSM
before synthesis; C1 , S1 and P1 are, respectively, the cost, speed and dissipated power after
synthesis using the MS strategy; and C2 , S2 and P2 are, respectively, the cost, speed and
dissipated power after synthesis using the SM strategy. Finally, C3 , S3 and P3 are the same
parameters obtained using the COMB strategy. Power dissipation was evaluated using
the following values: output capacitance C = 3 pF, frequency f = 5 MHz, supply voltage
VCC = 5 V, input probability P(xi = 1) = 0.5. Values of #MX and #SX are the numbers of
merges and splits performed during the procedure. Similar tables were made for other
variants, but only the statistical parameters are presented in this section.
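To give a feeling for the scale set by these values, the sketch below evaluates the standard CMOS dynamic power relation P = ½·C·VCC²·f·N for a given switching activity N. This is an assumption made for illustration: the estimation model actually used for Table 1 is defined elsewhere in the paper and may differ, and `dynamic_power_mw` is a hypothetical helper name.

```python
def dynamic_power_mw(
    switching_activity: float,
    c_farads: float = 3e-12,   # output capacitance C = 3 pF
    f_hz: float = 5e6,         # clock frequency f = 5 MHz
    vcc_volts: float = 5.0,    # supply voltage VCC = 5 V
) -> float:
    """Dynamic power in milliwatts for a node with the given switching activity.

    Uses P = 0.5 * C * VCC^2 * f * N (textbook CMOS relation, assumed here).
    """
    watts = 0.5 * c_farads * vcc_volts**2 * f_hz * switching_activity
    return watts * 1e3


# A node that toggles once per clock cycle (N = 1).
print(dynamic_power_mw(1.0))
```

Under this assumed relation, a node toggling once per cycle dissipates about 0.19 mW, so the values in Table 1 would correspond to the summed activity of many such nodes.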

Table 1. The experimental results for binary encoding and power-oriented optimization.

Column groups: Initial FSM (C0, S0, P0); MS Strategy (C1, S1, P1, #M1, #S1); SM Strategy (C2, S2, P2, #M2, #S2); COMB Strategy (C3, S3, P3, #M3, #S3)

Name C0 S0 P0 C1 S1 P1 #M1 #S1 C2 S2 P2 #M2 #S2 C3 S3 P3 #M3 #S3
BBARA 6 3 62.14 5 2 61.71 3 1 5 2 61.71 4 1 5 2 61.71 4 2
BBSSE 11 3 226.01 11 3 162.49 3 1 11 3 162.49 4 1 11 3 226.01 3 1
BBTAS 5 2 134.51 5 2 112.50 0 2 5 2 105.16 2 2 5 2 112.50 1 2
BEECOUNT 7 2 113.28 7 2 91.89 2 1 7 2 91.89 4 2 7 2 91.89 2 1
CSE 11 3 58.06 12 3 55.22 0 3 12 3 55.22 0 3 12 3 55.22 0 3
DK14 8 2 239.67 8 2 239.67 0 2 8 2 228.18 2 2 8 2 225.34 2 3
DK16 8 2 401.14 8 2 391.58 0 4 8 2 387.33 4 4 8 2 385.38 3 3
EX1 24 4 204.39 24 4 191.12 0 1 24 4 178.23 1 1 24 4 178.23 2 2
EX4 13 2 165.18 13 2 163.22 0 2 13 2 163.22 2 2 13 2 165.18 3 3
EX6 11 2 274.01 11 2 274.01 0 1 11 2 274.01 1 1 11 2 274.01 1 1
LION9 5 2 192.57 3 1 84.38 5 2 3 1 84.38 8 3 4 2 77.26 4 1
PLANET 25 3 422.84 25 3 351.51 0 8 25 3 351.51 8 8 25 3 351.51 8 8
S1 11 4 329.30 11 4 286.21 0 4 11 4 286.21 4 4 11 4 261.20 11 11
S1488 25 4 72.68 25 4 71.62 0 3 25 4 71.62 3 3 25 4 71.28 5 6
S1494 25 4 73.04 25 4 71.72 0 4 25 4 71.72 4 4 25 4 71.72 3 4
S27 4 2 192.75 4 2 156.78 1 1 4 2 161.50 3 2 4 2 156.78 2 1
S386 11 3 170.33 11 3 169.10 0 1 11 3 169.10 1 1 11 3 170.33 1 1
S420 7 3 150.00 7 3 107.81 0 1 7 3 107.81 1 1 7 3 150.00 1 1
S510 13 3 240.57 13 3 240.57 0 1 13 3 237.03 1 1 13 3 233.49 1 3
S832 24 4 134.16 24 4 134.14 0 2 24 4 134.14 2 2 24 4 134.14 2 2
SAND 14 4 215.62 15 4 182.12 0 2 15 4 182.12 2 2 15 4 182.12 2 2
SSE 11 3 226.01 11 3 162.49 3 1 11 3 162.49 4 1 11 3 226.01 3 1
TBK 8 4 263.16 9 4 228.21 0 11 9 4 228.21 0 11 9 4 228.21 0 11
TRAIN11 5 2 101.90 3 1 93.75 7 1 3 1 83.33 11 4 3 1 93.75 8 1

Table 1 shows that, for power-oriented optimization, the power consumption of the
transformed FSMs is in all investigated cases lower than or equal to that of the initial
FSM. The number of merges and splits also depends on the strategy used: its average
values are lowest for the MS strategy and significantly higher for the SM and COMB
strategies.
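The observation about the number of merges and splits can be checked directly from the #M and #S columns of Table 1. The following Python sketch copies those columns and computes the per-strategy averages; the dictionary layout and variable names are choices made for this example.

```python
# (#M, #S) pairs copied from Table 1, in the order (MS, SM, COMB).
counts = {
    "BBARA":    ((3, 1), (4, 1), (4, 2)),
    "BBSSE":    ((3, 1), (4, 1), (3, 1)),
    "BBTAS":    ((0, 2), (2, 2), (1, 2)),
    "BEECOUNT": ((2, 1), (4, 2), (2, 1)),
    "CSE":      ((0, 3), (0, 3), (0, 3)),
    "DK14":     ((0, 2), (2, 2), (2, 3)),
    "DK16":     ((0, 4), (4, 4), (3, 3)),
    "EX1":      ((0, 1), (1, 1), (2, 2)),
    "EX4":      ((0, 2), (2, 2), (3, 3)),
    "EX6":      ((0, 1), (1, 1), (1, 1)),
    "LION9":    ((5, 2), (8, 3), (4, 1)),
    "PLANET":   ((0, 8), (8, 8), (8, 8)),
    "S1":       ((0, 4), (4, 4), (11, 11)),
    "S1488":    ((0, 3), (3, 3), (5, 6)),
    "S1494":    ((0, 4), (4, 4), (3, 4)),
    "S27":      ((1, 1), (3, 2), (2, 1)),
    "S386":     ((0, 1), (1, 1), (1, 1)),
    "S420":     ((0, 1), (1, 1), (1, 1)),
    "S510":     ((0, 1), (1, 1), (1, 3)),
    "S832":     ((0, 2), (2, 2), (2, 2)),
    "SAND":     ((0, 2), (2, 2), (2, 2)),
    "SSE":      ((3, 1), (4, 1), (3, 1)),
    "TBK":      ((0, 11), (0, 11), (0, 11)),
    "TRAIN11":  ((7, 1), (11, 4), (8, 1)),
}

names = ("MS", "SM", "COMB")
n = len(counts)
averages = {}
for k, name in enumerate(names):
    merges = sum(row[k][0] for row in counts.values())
    splits = sum(row[k][1] for row in counts.values())
    averages[name] = (merges / n, splits / n)

for name, (avg_m, avg_s) in averages.items():
    print(f"{name}: {avg_m:.2f} merges, {avg_s:.2f} splits on average")
```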
To examine the efficiency of the strategies under different state assignments
and optimization directions, gain/loss ratios were calculated. The gain/loss
ratio is the ratio of the value of the considered parameter for the initial FSM to its
value for the transformed FSM, so a ratio above 1 indicates a gain. The minimum, average
and maximum ratios for the MS strategy are presented in Table 2. The average values are
geometric means of the ratios calculated for the individual benchmarks.
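The calculation can be illustrated in Python on three rows of Table 1 (power before and after MS synthesis, binary encoding, power direction). This is only a sketch on a subset of the benchmarks, so its average differs from the value in Table 2; the variable names are chosen for the example.

```python
from statistics import geometric_mean

# (P0, P1) in mW from Table 1: initial FSM and FSM after MS synthesis.
power_mw = {
    "BBARA": (62.14, 61.71),
    "DK14": (239.67, 239.67),
    "LION9": (192.57, 84.38),
}

# Gain/loss ratio: initial value divided by transformed value (> 1 means a gain).
ratios = {name: p0 / p1 for name, (p0, p1) in power_mw.items()}
values = list(ratios.values())

summary = (min(values), geometric_mean(values), max(values))
print("min/avg/max = " + "/".join(f"{v:.2f}" for v in summary))
```

The minimum (DK14, no change) and the maximum (LION9) of this subset coincide with the 1.00 and 2.28 reported for binary encoding and the power direction in Table 2.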
Table 2. The gain/loss ratios of results for merge-then-split strategy.

Parameter (Min./Avg./Max.)  Encoding    Power Direction  Speed Direction  Balanced
Power                       Binary      1.00/1.15/2.28   0.65/1.02/2.05   1.00/1.15/2.28
Power                       One-hot     1.00/1.01/1.14   0.99/1.00/1.07   1.00/1.02/1.14
Power                       JEDI        1.00/1.13/1.58   0.38/0.92/1.21   0.69/1.08/1.45
Power                       Sequential  1.00/1.07/1.47   0.90/1.00/1.18   0.83/1.04/1.24
Critical Path               Binary      1.00/1.05/2.00   1.00/1.08/2.00   1.00/1.08/2.00
Critical Path               One-hot     1.00/1.09/2.00   1.00/1.08/2.00   1.00/1.08/2.00
Critical Path               JEDI        1.00/1.00/1.00   1.00/1.08/2.00   1.00/1.08/2.00
Critical Path               Sequential  0.67/0.98/1.00   1.00/1.08/2.00   1.00/1.08/2.00
Area                        Binary      0.89/1.03/1.67   1.00/1.05/1.67   0.89/1.04/1.67
Area                        One-hot     0.96/1.10/2.40   0.89/1.03/2.00   0.89/1.04/2.00
Area                        JEDI        0.89/1.00/1.25   1.00/1.05/1.67   0.89/1.03/1.67
Area                        Sequential  0.89/0.97/1.25   1.00/1.05/1.67   0.89/1.02/1.67
For the MS strategy, the average results obtained using the presented method are
in all cases better than the results for the initial FSM with regard to the parameter
corresponding to the optimization direction (e.g., power for the power direction), for all
encoding styles used. It can also be noticed that the encoding type, in many cases, has a major
influence on the result obtained using a specific optimization variant. When speed
optimization is used, the estimated power consumption increases significantly in many
cases, which confirms that these two directions contradict each other. Similar observations can
be made for the other two strategies, the results of which are shown in Table 3 (SM strategy)
and Table 4 (COMB strategy).

Table 3. The gain/loss ratios of results for split-then-merge strategy.

Parameter (Min./Avg./Max.)  Encoding    Power Direction  Speed Direction  Balanced
Power                       Binary      1.00/1.15/2.30   0.82/1.02/2.05   1.00/1.16/2.28
Power                       One-hot     1.00/1.01/1.14   0.86/1.00/1.14   1.00/1.02/1.14
Power                       JEDI        1.00/1.17/1.64   0.38/0.96/1.22   0.69/1.13/1.45
Power                       Sequential  1.00/1.07/1.47   0.77/0.99/1.13   0.99/1.06/1.24
Critical Path               Binary      1.00/1.02/1.50   1.00/1.08/2.00   1.00/1.08/2.00
Critical Path               One-hot     1.00/1.09/2.00   1.00/1.08/2.00   1.00/1.09/2.00
Critical Path               JEDI        1.00/1.00/1.00   1.00/1.08/2.00   1.00/1.05/2.00
Critical Path               Sequential  0.67/0.98/1.00   1.00/1.08/2.00   1.00/1.05/2.00
Area                        Binary      0.89/1.01/1.25   1.00/1.05/1.67   0.89/1.04/1.67
Area                        One-hot     0.96/1.10/2.40   0.89/1.04/2.00   0.96/1.09/2.00
Area                        JEDI        0.89/0.99/1.25   1.00/1.05/1.67   0.89/1.00/1.67
Area                        Sequential  0.89/0.97/1.25   1.00/1.05/1.67   0.89/1.00/1.67

Table 4. The gain/loss ratios of results for combined strategy.

Parameter (Min./Avg./Max.)  Encoding    Power Direction  Speed Direction  Balanced
Power                       Binary      1.00/1.11/2.49   0.83/1.01/1.45   1.00/1.12/2.49
Power                       One-hot     1.00/1.01/1.14   0.77/1.00/1.13   1.00/1.02/1.14
Power                       JEDI        1.00/1.14/1.62   0.38/0.92/1.35   0.69/1.08/1.45
Power                       Sequential  1.00/1.07/1.47   0.68/0.98/1.27   0.83/1.04/1.24
Critical Path               Binary      1.00/1.02/1.50   1.00/1.03/2.00   1.00/1.05/2.00
Critical Path               One-hot     1.00/1.09/2.00   1.00/1.08/2.00   1.00/1.09/2.00
Critical Path               JEDI        1.00/1.00/1.00   1.00/1.03/2.00   1.00/1.08/2.00
Critical Path               Sequential  0.67/0.98/1.00   1.00/1.03/2.00   1.00/1.08/2.00
Area                        Binary      0.89/1.01/1.25   1.00/1.03/1.67   0.89/1.03/1.67
Area                        One-hot     0.96/1.10/2.40   0.89/1.05/2.00   0.96/1.08/2.00
Area                        JEDI        0.89/1.00/1.25   1.00/1.03/1.67   0.89/1.03/1.67
Area                        Sequential  0.89/0.98/1.25   1.00/1.03/1.67   0.89/1.02/1.67

The proposed strategies were also compared with the methods described in our previous
works, where only the state merging procedure was considered. The results for the merging
strategy (M) are presented in Table 5. In addition to the three described optimization
directions (power, speed and balanced), the state minimization direction [39] was examined.
As can be seen, adding the state splitting procedure to the method increases, in most cases,
the gain ratios for all examined parameters, i.e., power, speed, and area.
Table 5. The gain/loss ratios of results for merging only strategy.

Parameter (Min./Avg./Max.)  Encoding    Power Direction  Speed Direction  Balanced        State min.
Power                       Binary      1.00/1.07/2.28   1.00/1.05/2.05   1.00/1.06/2.28  1.00/1.06/2.05
Power                       One-hot     1.00/1.01/1.14   0.99/1.00/1.07   1.00/1.01/1.14  0.92/1.01/1.14
Power                       JEDI        1.00/1.07/1.58   0.99/1.05/1.26   0.91/1.05/1.26  0.99/1.05/1.26
Power                       Sequential  1.00/1.01/1.13   0.90/1.00/1.13   0.83/1.00/1.13  0.90/1.00/1.13
Critical Path               Binary      1.00/1.05/2.00   1.00/1.08/2.00   1.00/1.08/2.00  1.00/1.08/2.00
Critical Path               One-hot     1.00/1.09/2.00   1.00/1.09/2.00   1.00/1.09/2.00  1.00/1.09/2.00
Critical Path               JEDI        1.00/1.02/2.00   1.00/1.08/2.00   1.00/1.08/2.00  1.00/1.08/2.00
Critical Path               Sequential  1.00/1.02/1.50   1.00/1.08/2.00   1.00/1.08/2.00  1.00/1.08/2.00
Area                        Binary      1.00/1.04/1.67   1.00/1.05/1.67   1.00/1.05/1.67  1.00/1.05/1.67
Area                        One-hot     1.00/1.10/2.40   1.00/1.07/2.00   1.00/1.10/2.40  1.00/1.11/2.40
Area                        JEDI        1.00/1.03/1.67   1.00/1.05/1.67   1.00/1.05/1.67  1.00/1.05/1.67
Area                        Sequential  1.00/1.02/1.25   1.00/1.05/1.67   1.00/1.05/1.67  1.00/1.05/1.67

To compare the three distinct aspects considered in the experiments (strategy, optimization
direction and encoding), the average values of all parameters were calculated for each
point of view. In Figure 1a, we can see that the MS strategy is best for speed and area
optimization, and the SM strategy for power optimization. We can also see that all strategies
produce better results than the initial FSMs, not only in the cases of speed and power, but
also for the area parameter.

Figure 1. Comparison of average results: (a) for different strategies; (b) for different optimization directions.

Figure 1b shows a comparison of the results for different optimization directions.
It confirms that the power direction gives better results in the power aspect, and the speed
direction in the speed aspect. It can also be noticed that balanced optimization gives
satisfactory results in all three aspects (power, speed, and area).
In the case of the M strategy (from previous works), the results in terms of speed and
area were significantly worse than those obtained from the MS, SM and COMB strategies.
Only the average power dissipation was similar to the value obtained using the COMB
strategy (Figure 1a). The state minimization direction gives the best results for the area
parameter and comparable results for power and speed parameters (Figure 1b).
Figure 2 shows a comparison of the results for different encoding methods. It shows
that one-hot encoding style gives the best results in terms of speed, as predicted. The
most suitable encoding methods for power optimization are the binary and JEDI methods.
The sequential assignment, which was designed especially for power optimization, gives
moderate results in terms of power consumption. Using all encoding types, the average
results for the transformed FSM are always better than those of the initial FSM.

Figure 2. Comparison of average results for different encoding styles.


To check the effectiveness of the proposed strategies, the benchmarks converted by the
proposed synthesis method were also synthesized and implemented using Intel Quartus
Prime and Xilinx (AMD) Vivado tools. Four scenarios were chosen:
• Initial FSM with default encoding (provided by Quartus Prime or Vivado);
• Power direction with JEDI encoding;
• Speed direction with one-hot and binary encoding;
• Balanced variant with binary encoding.
All scenarios were performed for all three considered strategies (MS, SM and COMB).
All benchmarks were synthesized using the same design flow parameters (balanced mode).
The Quartus Prime and Vivado tools also have their own optimization procedures for
performance, area, and power parameters, but they mostly operate in the phase of fitting
and routing. The equivalent conversions of FSM do not consider this phase; they operate
only in the pre-synthesis stage. For this reason, we decided to use default compiler
parameters for all examinations. Three output values were extracted from the report files
for further analysis: total logic elements, maximum clock frequency and total power. For
the implementation, the authors chose the EP4CE115F29I8L FPGA device from the Cyclone
IV E family (Intel) and the XC7A35TSCG324-1 FPGA device from the Artix-7 family (Xilinx).
The example results for the MS strategy are shown in Table 6, where C0 , F0 and P0 are,
respectively, the cost of implementation (number of used logic elements), maximum frequency
(in MHz) and dissipated power (in milliwatts) of the initial FSM (without transformation);
C1 , F1 and P1 are, respectively, the same parameters after power direction transformation
with JEDI encoding; C2 , F2 and P2 are, respectively, the same parameters after using the
speed direction variant with one-hot encoding; and finally, C3 , F3 and P3 are, respectively,
the same parameters after synthesis using the balanced variant with binary encoding.

Table 6. The example implementation results after using Quartus Prime tool for MS strategy.

Column groups: Initial FSM, Default Encoding (C0, F0, P0); Power Direction, JEDI Encoding (C1, F1, P1); Speed Direction, One-Hot Encoding (C2, F2, P2); Balanced, Binary Encoding (C3, F3, P3)

Name      C0   F0      P0      C1   F1      P1      C2   F2      P2      C3   F3      P3
BBARA     24   416.15  132.87  23   425.89  132.88  21   424.09  132.6   21   418.59  132.88
BBSSE     37   332.01  135.57  40   387.9   135.59  40   386.55  135.6   41   386.7   135.68
BBTAS     9    680.27  131.98  9    693     131.98  10   657.03  131.98  11   657.89  131.99
BEECOUNT  31   392.62  131.71  28   471.92  133.07  25   472.81  134.07  22   471.48  133.93
CSE       108  225.02  135.46  94   191.2   135.48  97   215.56  133.18  118  201.09  131.64
DK14      39   359.32  135.13  58   317.56  132.59  39   359.32  135.13  39   359.32  135.13
DK16      66   420.88  132.1   66   423.73  132.11  64   418.06  133.99  70   360.36  132.12
EX1       216  157.23  143.14  158  187.06  143.12  176  200.56  135.16  171  191.31  143.38
EX4       27   513.61  134.12  24   534.76  136.34  25   460.62  136.35  28   577.37  136.36
EX6       43   308.17  137.38  49   279.64  133.47  53   351.25  133.48  43   308.17  137.38
LION9     20   490.2   132.1   13   500     131.06  11   506.33  131.06  7    526.87  131.04
PLANET    128  432.53  146.81  131  423.91  147.4   125  454.34  138.19  145  335.68  147.66
S1        131  197.71  138.19  142  193.84  138.22  134  195.16  138.21  183  190.73  138.4
S1488     364  168.8   147.58  443  152.95  148.14  346  169.87  147.61  432  151.91  148.27
S1494     328  162.84  147.61  364  166.17  147.62  344  174.92  147.61  408  159.41  148.15
S27       46   358.29  135.83  45   347.58  131.84  46   321.23  135.82  45   316.56  131.83
S386      135  173.55  136.79  127  175.25  137.76  127  193.5   136.84  121  176.55  136.48
S420      71   648.09  142.7   79   515.46  142.72  71   641.85  140.28  77   650.62  142.71
S510      247  144.26  145.79  237  146.54  146.13  212  161.08  145.72  208  160.08  145.86
S832      221  164.1   140.45  218  178.57  140.7   198  189.86  140.75  183  199.48  140.8
SAND      37   332.01  135.57  40   387.9   135.59  40   386.55  135.6   41   386.7   135.68
SSE       194  182.58  139.6   217  169.09  139.65  206  174.28  139.59  207  171.03  139.57
TRAIN11   19   515.2   132.07  9    577.7   131.06  8    577.03  131.05  7    578.03  131.05

The comparison of average values of all investigated parameters for all scenarios
and variants in the case of using the Quartus Prime tool for implementation is depicted
in Figure 3. It can be noticed that the proposed strategies can be successfully used with
the Quartus Prime tool. The worst results were obtained using the power optimization
direction. Poor results in terms of power arise from the fact that the optimized power
parameter (dynamic power) is significantly less than the static device power and has
minimal influence on the total device power. Although all considered scenarios were
optimized for speed or power, the most significant gain was noticed for the area parameter.

Figure 3. Comparison of average results after implementation using Quartus Prime tool.
A similar comparison of average values of area, speed and power parameters for all
scenarios and variants in the case of implementation using the Vivado tool is presented
in Figure 4. It can be noticed that the proposed strategies can be successfully used in
most cases also with the Vivado tool. The worst results were obtained using a balanced
optimization direction in terms of area, but it was not the optimization goal. Similarly, as
for the Quartus Prime tool, the lack of significant gain in terms of power is due to the fact
that the optimized dynamic power is considerably less than the total device power.

Figure 4. Comparison of average results after implementation using Vivado tool.


4. Conclusions
4. Conclusions The transformation of finite state machines is an important phase in the FSM synthesis
process.ofTypically,
The transformation finite state state merging is
machines is an
used for the reduction
important phase inofthe theFSMnumber of memory
synthe-
elements, but state splitting can be useful for the minimization of the power dissipation and
sis process. Typically, state merging is used for the reduction of the number of memory
the critical path of the combinational part of FSM. In this paper, three different strategies
elements, but stateforsplitting can be usefulwere
FSM transformations for presented.
the minimization In addition,of the
the power
offered dissipation
approach allows to
and the critical path of the
reduce notcombinational
only the number part of FSM.
of FSM statesInandthisoptimize
paper, FSM
threekey different strat-but also
parameters,
egies for FSM transformations
to minimize thewere number presented. In addition,
of FSM transitions the offered
and input variablesapproach
due to theallowsapplication of
to reduce not onlyadditional
the number algorithms.
of FSM states and optimize FSM key parameters, but also
The merge-then-split strategy is the best solution for speed and area optimization, and
to minimize the number of FSM transitions and input variables due to the application of
the speed-then-merge strategy is better for power optimization. It can be noticed that all
additional algorithms.
strategies produce better results than initial FSMs, not only in terms of speed and power,
The merge-then-split strategy
but also taking is the best solution
into consideration the cost offor speed and area optimization,
implementation.
and the speed-then-merge strategy
The most is better
important for power
objective of this optimization.
approach is notIt tocan
findbe thenoticed that
FSM representation
with a minimal number of states, but to find such
all strategies produce better results than initial FSMs, not only in terms of speed and a form of the FSM that has the optimal
value of the considered parameter,
power, but also taking into consideration the cost of implementation. e.g., speed, power consumption or transformation
quality ratio (weights sum of the speed and power parameters). One of the most significant
The most important objective of this approach is not to find the FSM representation
conclusions from the research is that the finite state machine with a minimum number of
with a minimal number
states isof states,
not, in manybutcases,
to find such result
the finest a form in of the to
regard FSM that has
the power the optimal
consumption and speed.
value of the considered parameter, e.g., speed, power consumption or transformation quality ratio
(weighted sum of the speed and power parameters). One of the most significant conclusions from the
research is that the finite state machine with a minimum number of states is, in many cases, not
the best result with regard to power consumption and speed.
The implemented method can be successfully used with commercial EDA tools, such as Intel Quartus
Prime and Xilinx (AMD) Vivado. These tools have their own implementation optimization techniques,
but they do not perform equivalent transformations of FSMs; they can only change the style of the
state assignment or minimize the logic functions (output and transition functions) at the stage of
logic synthesis. Most of the optimization work in industrial EDA software is performed at
post-synthesis stages, such as fitting and routing, so it will also be applied to state machines
transformed with the proposed method.
In the proposed approach, only the merging of a pair of states is considered, and a state is split
into only two states. The algorithm can be reworked to merge a larger group of states and to split
a state into two or more states. Moreover, the offered method can be further improved by taking
into account incompletely specified values of the transition functions as supplementary conditions
for the possibility of merging or splitting the FSM internal states.
The automatic selection of the strategy and of the weights for the balanced variant is also under
consideration, so as to find the FSM that is optimal in terms of all criteria taken into account.
For that purpose, artificial intelligence methods, e.g., neural networks, could be useful.
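The weighted combination of criteria referred to in these conclusions can be illustrated with a short sketch. It is not the formula used in the paper: it assumes that each criterion (power, critical path delay, number of logic cells) is normalized to the untransformed FSM and that the weights are rescaled to sum to one. The names `Estimate` and `quality_ratio` are hypothetical and introduced only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Estimate:
    """Estimated criteria for one FSM variant (hypothetical container)."""
    power: float  # estimated consumed power
    delay: float  # estimated critical path delay
    cells: float  # estimated number of utilized logic cells


def quality_ratio(candidate, baseline, w_power, w_delay, w_cells):
    """Weighted sum of criteria, each normalized to the baseline FSM.

    Assumption: a value below 1.0 means the candidate is better than
    the baseline under the chosen weights. The paper's own definition
    may differ from this normalization.
    """
    total = w_power + w_delay + w_cells
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    weighted = (
        w_power * candidate.power / baseline.power
        + w_delay * candidate.delay / baseline.delay
        + w_cells * candidate.cells / baseline.cells
    )
    return weighted / total


if __name__ == "__main__":
    # Illustrative numbers only; they are not results from the paper.
    original = Estimate(power=100.0, delay=10.0, cells=50.0)
    after_split = Estimate(power=80.0, delay=9.0, cells=60.0)
    # Balanced speed/power variant: cost (cells) is ignored.
    print(quality_ratio(after_split, original, 1.0, 1.0, 0.0))
```

With the cost weight set to zero, the split variant in this example is preferred even though it uses more logic cells, which mirrors the observation that the FSM with the fewest states is not always the best result for power and speed.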

Author Contributions: Conceptualization, A.K. and V.S.; methodology, A.K.; software, A.K.; validation,
A.K. and V.S.; formal analysis, V.S.; investigation, A.K.; writing (original draft preparation),
A.K.; supervision, V.S. All authors have read and agreed to the published version of the manuscript.
Funding: The work was supported by the WZ/WI-IIT/4/2020 grant from Bialystok University
of Technology and funded with resources for research by the Ministry of Education and Science
in Poland.
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: The address of benchmark data set used in this paper is as follows:
https://ddd.fit.cvut.cz/www/prj/Benchmarks/MCNC.7z (accessed on 1 July 2022).
Conflicts of Interest: The authors declare no conflict of interest.

References
1. Pfleeger, C.F. State reduction in incompletely specified finite state machines. IEEE Trans. Comput. 1973, C-22, 1099–1102. [CrossRef]
2. Pena, J.M.; Oliveira, A.L. A new algorithm for exact reduction of incompletely specified finite state machines. IEEE Trans.
Comput.-Aided Des. 1999, 18, 1619–1632. [CrossRef]
3. Gören, S.; Ferguson, F. On state reduction of incompletely specified finite state machines. Comput. Electr. Eng. 2007, 33, 58–69.
[CrossRef]
4. Rho, J.-K.; Hachtel, G.; Somenzi, F.; Jacoby, R. Exact and heuristic algorithms for the minimization of incompletely specified state
machines. IEEE Trans. Comput. Aided Des. 1994, 13, 167–177.
5. Avedillo, M.J.; Quintana, J.M.; Huertas, J.L. SMAS: A program for concurrent state reduction and state assignment of finite state
machines. In Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS), Singapore, 11–14 June 1991;
pp. 1781–1784.
6. Yuan, L.; Qu, G.; Villa, T.; Sangiovanni-Vincentelli, A. An FSM reengineering approach to sequential circuit synthesis by state
splitting. IEEE Trans. Comput. Aided Des. 2008, 27, 1159–1164. [CrossRef]
7. Grzes, T.N.; Solov’ev, V.V. Minimization of Power Consumption of Finite State Machines by Splitting Their Internal States.
J. Comput. Syst. Sci. Int. 2015, 54, 367–374. [CrossRef]
8. Avedillo, M.J.; Quintana, J.M.; Huertas, J.L. State merging and state splitting via state assignment: A new FSM synthesis algorithm.
IEE Proc. Comput. Digital Tech. 1994, 141, 229–237. [CrossRef]
9. Czerwinski, R.; Kania, D. Synthesis method of high speed finite state machines. Bull. Pol. Acad. Sci. Tech. Sci. 2010, 4, 635–644.
[CrossRef]
10. Glaser, J.; Damm, M.; Haase, J.; Grimm, C. TR-FSM: Transition-based reconfigurable finite state machine. ACM Trans. Reconfig.
Technol. Syst. (TRETS) 2011, 3, 23. [CrossRef]
11. Garcia-Vargas, I.; Senhadji-Navarro, R. Finite state machines with input multiplexing: A performance study. IEEE Trans. Comput.
Aided Des. Integr. Circ. Syst. 2015, 5, 867–871. [CrossRef]
12. Senhadji-Navarro, R.; Garcia-Vargas, I. High-performance architecture for binary-tree-based finite state machines. IEEE Trans.
Comput. Aided Des. 2018, 37, 796–805. [CrossRef]
13. Senhadji Navarro, R.; García Vargas, I. Finite Virtual State Machines. IEICE Trans. Inf. Syst. 2012, E-95-D, 2544–2547. [CrossRef]
14. Pedroni, V.A. Introducing deglitched-feedback plus convergent encoding for straight hardware implementation of asynchronous
finite state machines. In Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS), Lisbon, Portugal,
24–27 May 2015; pp. 2345–2348.
15. De Faria Barbosa, F.T.; De Oliveira, D.L.; Curtinhas, T.S.; De Abreu Faria, L.; De Souza Luciano, J.F. Implementation of Locally-
Clocked XBM State Machines on FPGAs Using Synchronous CAD Tools. IEEE Trans. Circuits Syst. I Regul. Pap. 2017, 64,
1064–1074. [CrossRef]
16. Solov’ev, V.V. Synthesis of Fast Finite State Machines on Programmable Logic Integrated Circuits by Splitting Internal States.
J. Comput. Syst. Sci. Int. 2022, 61, 360–371. [CrossRef]
17. Tao, Y.; Wang, Q.; Zhang, Y. Genetic Fuzzy c-mean clustering-based decomposition for low power FSM synthesis. In Proceedings
of the IEEE Congress on Evolutionary Computation (CEC), San Sebastian, Spain, 5–8 June 2017; pp. 642–648.
18. Tao, Y.Y.; Zhang, L.J.; Wang, Q.Y.; Chen, R.; Zhang, Y.Z. A multi-population evolution strategy and its application in low
area/power FSM synthesis. Nat. Comput. 2019, 18, 139–161. [CrossRef]
19. Li, S.; Choi, K. A high performance low power implementation scheme for FSM. In Proceedings of the International SoC Design
Conference (ISOCC), Jeju, Korea, 3–6 November 2014; pp. 190–191.
20. Riahi Alam, M.; Salehi Nasab, M.E.; Fakhraie, S.M. Power Efficient High-Level Synthesis by Centralized and Fine-Grained Clock
Gating. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 2015, 34, 1954–1963. [CrossRef]
21. Nag, A.; Das, S.; Pradhan, S.N. Low-power FSM synthesis based on automated power and clock gating technique. J. Circuits Syst.
Comput. 2019, 28, 1920003. [CrossRef]
22. Sait, S.M.; Oughali, F.C.; Arafeh, A.M. FSM State-Encoding for Area and Power Minimization Using Simulated Evolution
Algorithm. J. Appl. Res. Technol. 2012, 10, 845–858. [CrossRef]
23. Wang, L.-Y.; Chu, Z.-F.; Xia, Y.-S. Low Power State Assignment Algorithm for FSMs Considering Peak Current Optimization.
J. Comput. Sci. Technol. 2013, 28, 1054–1062. [CrossRef]
24. Kubica, M.; Opara, A.; Kania, D. Logic Synthesis Strategy Oriented to Low Power Optimization. Appl. Sci. 2021, 11, 8797.
[CrossRef]
25. Kajstura, K.; Kania, D. Low Power Synthesis of Finite State Machines State Assignment Decomposition Algorithm. J. Circuits Syst.
Comput. 2018, 27, 1850041. [CrossRef]
26. Xia, Y.; Almaini, A.E.A. Genetic algorithm based state assignment for power and area optimization. IEE Proc. Comput. Digit. Tech.
2002, 149, 128–133. [CrossRef]
27. Chaudhury, S.; KrishnaTejaSistla, K.T.; Chattopadhyay, S. Genetic algorithm based FSM synthesis with area-power trade-offs.
Integr. VLSI J. 2009, 42, 376–384. [CrossRef]
28. Chattopadhyay, S.; Yadav, P.; Singh, R.K. Multiplexer targeted finite state machine encoding for area and power minimization. In
Proceedings of the IEEE India Annual Conference, Kharagpur, India, 20–22 December 2004; pp. 12–16.
29. Aiman, M.; Sadiq, S.M.; Nawaz, K.F. Finite state machine state assignment for area and power minimization. In Proceedings of
the IEEE International Symposium on Circuits and Systems (ISCAS), Island of Kos, Greece, 21–24 May 2006; pp. 5303–5306.
30. Kubica, M.; Kania, D. Area-oriented technology mapping for LUT-based logic blocks. Int. J. Appl. Math. Comput. Sci. 2017, 27,
207–222. [CrossRef]
31. Barkalov, A.; Titarenko, L.; Mielcarek, K. Improving characteristic of LUT based Mealey FSMs. Int. J. Appl. Math. Comput. Sci.
2020, 30, 745–759.
32. Barkalov, A.; Titarenko, L.; Chmielewski, S. Improving Characteristics of LUT-Based Moore FSMs. IEEE Access 2020, 8,
155306–155318. [CrossRef]
33. Barkalov, A.; Titarenko, L.; Chmielewski, S. Mixed encoding of collections of output variables for LUT-based mealy FSMs.
J. Circuits Syst. Comput. 2019, 28, 1950131. [CrossRef]
34. Klimowicz, A. Area Targeted Minimization Method of Finite State Machines for FPGA Devices. In Computer Information Systems
and Industrial Management. CISIM 2018; Saeed, K., Homenda, W., Eds.; Lecture Notes in Computer Science 2018; Springer: Cham,
Switzerland, 2018; Volume 11127, pp. 370–379.
35. Klimowicz, A.; Grzes, T. Combined State Merging and Splitting Procedure for Low Power Implementations of Finite State
Machines. In Advances in Systems Engineering. ICSEng 2021; Borzemski, L., Selvaraj, H., Świątek, J., Eds.; Lecture Notes in
Networks and Systems 2022; Springer: Cham, Switzerland, 2022; Volume 364, pp. 190–199.
36. Zakrevskij, A.D. Logic Synthesis of Cascade Circuits; Izdatel’stvo Nauka: Moscow, Russia, 1981. (In Russian)
37. Klimowicz, A. Combined State Splitting and Merging for Implementation of Fast Finite State Machines in FPGA. In Computer
Information Systems and Industrial Management. CISIM 2020; Saeed, K., Dvorský, J., Eds.; Lecture Notes in Computer Science;
Springer: Cham, Switzerland, 2020; Volume 12133, pp. 65–76.
38. Zadeh, L.A. Optimality and non-scalar-valued performance criteria. IEEE Trans. Automat. Control 1963, AC-8, 59–60. [CrossRef]
39. Klimowicz, A.S.; Solov’ev, V.V. Minimization of incompletely specified mealy finite-state machines by merging two internal states.
J. Comput. Syst. Sci. Int. 2013, 52, 400–409. [CrossRef]
40. Salauyou, V. Synthesis of High-Speed Finite State Machines in FPGAs by State Splitting. In Computer Information Systems
and Industrial Management. CISIM 2016; Saeed, K., Homenda, W., Eds.; Lecture Notes in Computer Science; Springer:
Berlin/Heidelberg, Germany, 2016; Volume 9842, pp. 741–751.
41. Yang, S. Logic Synthesis and Optimization Benchmarks User Guide. Version 3.0; Technical Report; Microelectronics Center of North
Carolina: Research Triangle Park, NC, USA, 1991.
42. Lin, B.; Newton, R.A. Synthesis of multiple level logic from symbolic high-level description languages. In Proceedings of the
International Conference on VLSI, Cambridge, MA, USA, 2–4 October 1989; pp. 187–196.
43. Grzes, T.N.; Solov’ev, V.V. Sequential algorithm for low-power encoding internal states of finite state machines. J. Comput. Syst.
Sci. Int. 2014, 53, 92–99. [CrossRef]
