
Autonomous Voltage Control for Grid Operation Using Hardware Implementation of Reinforcement Learning

Rizky Ardi M (13217054)


Devananda (13217061)
Kevin Sutardi (13217088)
School of Electrical Engineering and Informatics ITB
Abstract: Power systems currently rely on manually operated regulators to keep the voltage at a stable level, because the load drawn by customers changes frequently. Other technologies can be applied to this problem, one of which is artificial intelligence, in particular reinforcement learning. This method can be implemented on an FPGA and is useful for solving power system problems. In this experiment, a Verilog-based hardware design, a C-based software design, a memory simulation, and system integration on a Zybo board were carried out. The first three processes were completed successfully. Integration of the system on the Zybo board was attempted, but the output shown on the terminal did not match the expected values.

Keywords: Reinforcement Learning, Hardware, Software, Memory, System Integration

1. INTRODUCTION

Electrical energy is today one of the most widely used sources of energy. Producing it requires a series of processes to obtain the appropriate power and voltage. Transmission is a further issue: power generation requires a large dedicated site and is hazardous to people, so generators are located far from the loads they serve. A mechanism is therefore needed to transmit the power.

A simple model of power transmission consists of generators, transformers, line impedances, and electrical loads. In general, the generated voltage is raised by a step-up transformer, transmitted over lines to an area close to the consumers, and lowered again by a step-down transformer. This reduces the power lost to line resistance during transmission.

Load behavior changes continuously with energy use; a lamp, for example, is only used at night. This creates problems for the power system model. In general, changes in load power can cause overvoltage and undervoltage conditions. The parameters of the power system must therefore be adjusted to prevent overvoltage and undervoltage conditions that can damage the load.

Currently, voltage regulation is handled by a dedicated system that relies on human operators. As technology has developed, many models have become available for this problem. One that can approximate human capabilities is artificial intelligence (AI), which takes various forms such as machine learning, deep learning, and reinforcement learning. Reinforcement learning, in the form used here, learns with a Q-table that evaluates the actions the system should take in its current condition to reach its goal. One common reinforcement learning example is the maze solver, which has a clear representation of states and actions, so this application can be used to verify the hardware architecture.

Reinforcement learning can therefore handle voltage regulation more effectively, because the learning process does not require a human operator. The state is defined as the present voltage condition, and the actions change the generator voltage and the transformer taps. The next step is to realize the reinforcement learning system.

A simple way to realize it is to use an FPGA, onto which the reinforcement learning architecture can be synthesized. Using an FPGA for machine learning is expected to give better speed than high-level computing. Combining hardware and software architectures is expected to give the best results.

In this experiment, a reinforcement learning architecture is designed in hardware to perform the Q-table calculations and is integrated with software that determines the state. The objectives of this experiment are as follows.
1. Design the reinforcement learning hardware in Verilog and verify it with ModelSim.
2. Create a C-language software design for the chosen application and verify it.
3. Design the memory on the Zybo board and verify it.
4. Integrate the memory, hardware, and software to solve the problem.

2. LITERATURE REVIEW

2.1 REINFORCEMENT LEARNING

Reinforcement learning is a machine learning technique concerned with how software agents take actions. It is one of the three basic machine learning paradigms, alongside supervised learning and unsupervised learning.

It differs from supervised learning in that it does not require labeled input/output pairs, nor does it require sub-optimal actions to be corrected explicitly. Instead, the focus is on finding a balance between exploration (of uncharted territory) and exploitation (of current knowledge).

The environment in RL is usually expressed as a Markov decision process (MDP), because many RL algorithms for this setting use dynamic programming techniques. The main difference between classical dynamic programming methods and RL algorithms is that RL algorithms do not assume knowledge of an exact mathematical model of the MDP, and they target large MDPs for which exact methods are infeasible.

FIGURE 1 REINFORCEMENT LEARNING FRAMEWORK [3]

2.2 VERILOG DESIGN AND MODELSIM

Verilog is a hardware description language (HDL) used to model electronic systems. It is most often used in the design and verification of digital circuits at the register-transfer level of abstraction. It is also used in the verification of analog and mixed-signal circuits, as well as in the design of genetic circuits. In 2009, the Verilog standard was merged into the SystemVerilog standard, creating IEEE Standard 1800-2009. Since then, Verilog has officially been part of the SystemVerilog language. At the time of writing, the current version was IEEE Standard 1800-2017.

Hardware description languages such as Verilog resemble software programming languages, but they also include ways of describing propagation time and signal strength (sensitivity). There are two types of assignment operators: blocking (=) and non-blocking (<=). Non-blocking assignment lets designers describe state-machine updates without declaring and using temporary storage variables. Because these concepts are part of Verilog's language semantics, designers can quickly write descriptions of large circuits in a relatively compact and concise form. At the time of its introduction, Verilog represented a tremendous productivity improvement for circuit designers who were using graphical schematic capture software and specially written programs to document and simulate electronic circuits.

The designers of Verilog wanted a language with a syntax similar to the C programming language, which was already widely used in engineering software development. Like C, Verilog is case-sensitive and has a basic preprocessor (although it is less sophisticated

Laporan Ujian Akhir –Perancangan Sistem VLSI– STEI ITB 2


than ANSI C/C++). The control flow keywords (if/else, for, while, case, etc.) are the same, and operator precedence is compatible with C. Syntactic differences include the bit widths required in variable declarations, the demarcation of procedural blocks (Verilog uses begin/end instead of curly braces {}), and many other minor differences. Verilog requires that variables be given a definite size. In C, the size is implied by the variable's type (for example, an integer type might be 8 bits).

A Verilog design consists of a hierarchy of modules. Modules encapsulate the design hierarchy and communicate with other modules through a set of declared input, output, and bidirectional ports. Internally, a module can contain any combination of the following: net/variable declarations (wire, reg, integer, etc.), concurrent and sequential statement blocks, and instances of other modules (sub-hierarchies). Sequential statements are placed inside a begin/end block and are executed in order within the block. The blocks themselves, however, are executed concurrently, which makes Verilog a dataflow language.

Verilog's concept of a 'wire' consists of both signal values (4-state: "1, 0, floating, undefined") and signal strengths (strong, weak, etc.). This system allows abstract modeling of shared signal lines, where multiple sources drive a common net. When a wire has multiple drivers, its (readable) value is resolved by a function of the source drivers and their strengths.

A subset of Verilog statements is synthesizable. Verilog modules that conform to a synthesizable coding style, known as RTL (register-transfer level), can be physically realized by synthesis software. The synthesis software algorithmically transforms the (abstract) Verilog source into a netlist, a logically equivalent description consisting only of elementary logic primitives (AND, OR, NOT, flip-flops, etc.) that are available in a specific FPGA or VLSI technology. Further manipulation of the netlist ultimately leads to a circuit fabrication blueprint (such as a photomask set for an ASIC or a bitstream file for an FPGA).

2.3 MODELSIM

FIGURE 2 MODELSIM LOGO

ModelSim is a multi-language environment by Mentor Graphics for simulating hardware description languages such as VHDL, Verilog, and SystemC, and it includes a built-in C debugger. ModelSim can be used independently or in conjunction with Intel Quartus Prime, Xilinx ISE, or Xilinx Vivado. Simulations are performed through a graphical user interface (GUI) or automatically using scripts.

2.4 VIVADO

FIGURE 3 VIVADO LOGO

Vivado is software produced by Xilinx for synthesizing and analyzing HDL (hardware description language) designs that are to be implemented on an FPGA (field-programmable gate array). The FPGA used here is a Xilinx device. Before Vivado, its predecessor was Xilinx ISE.

2.5 ZYNQ ARCHITECTURE

Zynq is an FPGA SoC. The Zynq architectural block diagram is shown in Figure 4. Zynq contains a Processing System (PS) and Programmable Logic (PL). The PS holds a dual-core ARM Cortex-A9 processor, while the PL itself is an FPGA.

FIGURE 4 ZYNQ ARCHITECTURE

On the PL, a design can be written in Verilog or built from the IP cores provided by Xilinx. A design on the PL can communicate with the PS through several interfaces such as SGP, MGP, HP, and ACP. To communicate with the PS, the PL design must be compatible with the standard AXI bus.

2.6 REINFORCEMENT LEARNING ARCHITECTURE

The following reference architectures are used to realize the reinforcement learning module. The design is based on a journal article that discusses an efficient hardware design process for RL.

FIGURE 5 Q-LEARNING AGENT DESIGN

FIGURE 6 Q-LEARNING ACCELERATOR DESIGN

FIGURE 7 MAX BLOCK DESIGN

FIGURE 8 Q-UPDATER DESIGN

3. METHODOLOGY

In this experiment, several software tools and a board are used for the realization. The tools and software are as follows.
• Computer
• ModelSim
• Vivado
• Xilinx Zybo FPGA board

Realizing the reinforcement learning process on an FPGA requires four stages, in line with the objectives set out earlier. The stages are as follows.


1. Perform hardware design and verification.
2. Design the software in the C language.
3. Create a Vivado memory block.
4. Integrate the hardware, software, and memory.

A. Hardware Design

In the hardware design process, the circuit is described in Verilog and later implemented on a Zybo board. Realizing the reinforcement learning process requires an architecture that works well, so the architecture from the literature study is used as the basis for the hardware reinforcement learning model. The steps taken to produce the hardware design are as follows.

1. Build the blocks of the architecture.
2. Simulate each block.
3. Debug and verify each block.
4. Repeat steps 1 to 3 for every block.
5. Integrate the Verilog blocks.
6. Create the state selector and Control Unit.
7. Carry out the final verification with a specific application.

B. Software Design Process

The software that realizes the reinforcement learning model is written in C. This is done by studying the model in Matlab and adapting it to C. The process is as follows.

1. Search for the Matlab model of the system.
2. Adapt it to the C language.
3. Integrate the program.
4. Perform verification.
5. Adapt the code to fit the memory.

C. Block Memory Vivado

To be realized, the hardware needs real memory that can be accessed on the board. Memory located directly on the board, with reliable read and write operations, is therefore used. The process is as follows.

1. Create a project and set up Vivado.
2. Add the Memory Block and configure its settings.
3. Create a testbench.
4. Perform the memory simulation.

D. Integration Process

Finally, once all of the preceding processes have gone well, the PL, PS, and memory can be integrated. The procedure is as follows.

1. Create a Q-Agent module on the PL.
2. Simulate the Q-Agent.
3. Create an AXI interface.
4. Adapt the C program to provide the action.
5. Perform the integration.
6. Observe the results and demonstrate them.

Application Details of Reinforcement Learning on Voltage Control in Electric Power Systems

Conventional electric power systems face several challenges, such as fast and deep ramps and increasing uncertainty, that threaten the safety and economics of their operation. Under extreme conditions or local disturbances, a disturbance that is not properly controlled can spread to neighboring areas and cause cascading failures, potentially leading to a widespread blackout. Operating problems must therefore be detected early. In addition, the system may take a long time to return to its normal state.

FIGURE 9 GRID OPERATION AS RL ENVIRONMENT

The application developed here uses the power system model shown above. Its states, actions, and rewards are defined as follows.

1. State Space

The state is defined as a vector of system information that represents the system condition, namely the bus voltage magnitude. The load voltage is quantized with a step of 0.05.

2. Action Space

In manual control, the actions available for dealing with voltage problems include adjusting the generator terminal voltage set point, switching shunt elements, changing the transformer tap ratio, and so on. In this task, the action applied by reinforcement learning is adjusting the set point. For each generator, the terminal voltage can be set to one of the values [0.95, 0.975, 1.0, 1.025, 1.05]. The combinations over all available generators form the action space used to train the agent.

3. Rewards

Several voltage operating zones are defined to rate the quality of the voltage profile.

• Normal Zone (0.95-1.05 pu): 100 points
• Violation Zone (0.8-0.95 pu or 1.05-1.25 pu): -50 points
• Diverged Zone (<0.8 pu or >1.25 pu): -100 points

4. RESULT AND ANALYSIS

A. Block Diagram

1. General Design

FIGURE 10 GENERAL DESIGN BLOCK DIAGRAM

The system being developed has three parts: the memory module, the Q-Agent, and the software. As the diagram above shows, the BRAM stores the Q-values and is updated by the Q-Agent in every step. The Q-Agent is the hardware that performs the computation and issues the reads and writes to the BRAM. The software determines the state, determines the reward, and communicates with the environment outside the system.
2. Q-Agent Block

FIGURE 11 Q-AGENT BLOCK DIAGRAM

The figure above shows the block diagram of the Q-Agent, whose task is to compute the Q-values and determine the action to be selected. It has two major parts, the Q-learning accelerator and the policy generator. The Q-learning accelerator computes the Q-values, while the policy generator determines which action to choose. A number of delay blocks turn the next action, state, and reward into the current ones.

3. Policy Generator

FIGURE 12 POLICY GENERATOR BLOCK DIAGRAM

To determine the action, the policy generator uses two main blocks, the action selector and the randomizer, plus one delay block. The action is selected either greedily or by random exploration, depending on the epsilon value and a random value. An LFSR is therefore needed to generate the random values.

4. Q-Learning Accelerator

As mentioned, the Q-learning accelerator calculates the Q-values and stores them in memory. The diagram below shows its blocks and their functions. The Q-updater performs the multiplications with a barrel shifter and the additions that produce the new Q-value according to the Q-learning update rule, which in its standard form is

$$Q(s_t, a_t) \leftarrow (1 - \alpha)\,Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) \right]$$

The mux (with delay) and the Max block provide the inputs to the Q-updater. The mux selects Q(s_t, a_t), and the Max block finds the maximum Q-value of the next state, so that the terms of the equation above are available. A decoder takes the action and drives the enable signals on its outputs; it acts as the memory controller that determines which memory is active. Action RAM is a memory model used to simulate the BRAM.

FIGURE 13 Q-LEARNING ACCELERATOR BLOCK DIAGRAM

B. Hardware Simulation in Verilog

1. Implementation of Max Block

The Max block is implemented with continuous assign statements because the logic is combinational. if statements are avoided so that no latches are inferred. The comparators are arranged in 4 levels, and each comparator has 2 inputs. In total, the design uses 8 comparators.

FIGURE 14 MAX BLOCK SIMULATION RESULT USING MODELSIM

It can be observed that, of the nine inputs, the output is the maximum value of the input set.

2. Barrel Shifter

Multiplication in the reinforcement learning model is carried out with a barrel shifter. A hardware barrel shifter was therefore built, and the following results were obtained.

FIGURE 15 BARREL SHIFTER SIMULATION RESULT USING MODELSIM

3. Q-Updater

In this implementation, the Q-Matrix updater for now uses only one barrel shifter, because one is assumed to be accurate enough and it makes testing easier. If the system proves inadequate in a later experimental stage, the number of barrel shifters will be increased. State determination itself involves quantization, so high accuracy is not required in this process. The Verilog simulation results of the Q-updater module are shown below.

FIGURE 16 Q-UPDATER SIMULATION RESULT USING MODELSIM

4. Action RAM

Action RAM is a module that acts as the memory in which Q-values are stored (written) and from which they are later read as input to the Q-updater. The simulation results of the Action RAM are shown in the following figure.

FIGURE 17 ACTION RAM SIMULATION RESULT USING MODELSIM

5. Q-Learning Accelerator

The Q-Learning Accelerator module is realized by following the hardware diagram shown earlier. It was simulated in the same way, and the expected results were obtained.

FIGURE 18 Q-LEARNING ACCELERATOR SIMULATION RESULT USING MODELSIM

6. Action Selector

The action selector, which previously used only the greedy algorithm, now also includes random selection. The simulation results are shown below.

FIGURE 19 ACTION SELECTOR SIMULATION RESULT USING MODELSIM

7. Policy Generator

The policy generator module was also built and simulated. The results obtained are consistent with the expected behavior.

FIGURE 20 POLICY GENERATOR SIMULATION RESULT USING MODELSIM

8. Q-Agent

The Q-Agent was also compiled successfully, and its simulation matches expectations, as shown below.

FIGURE 21 Q-AGENT SIMULATION RESULT USING MODELSIM
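As a software reference for checking the Q-updater waveforms, the following C sketch models a shift-based Q-value update. It assumes the standard Q-learning rule written as Q + alpha * (r + gamma * max - Q), 16-bit signed Q-values, and learning and discount factors restricted to powers of two (alpha = 2^-alpha_shift, gamma = 2^-gamma_shift) so that each multiplication becomes one shift. The function names, the saturation step, and these parameter choices are illustrative assumptions, not the report's hardware.

```c
#include <stdint.h>

/* Multiply by 2^-shift. Negative values are shifted on their magnitude so the
 * result does not rely on implementation-defined signed right shifts. */
static int32_t scale_pow2(int32_t x, unsigned shift)
{
    return (x >= 0) ? (x >> shift) : -((-x) >> shift);
}

/* Clamp a 32-bit intermediate result to the signed 16-bit range. */
static int16_t saturate16(int32_t x)
{
    if (x > INT16_MAX) {
        return INT16_MAX;
    }
    if (x < INT16_MIN) {
        return INT16_MIN;
    }
    return (int16_t)x;
}

/* One Q-value update: q_old + alpha * (reward + gamma * q_next_max - q_old),
 * with alpha = 2^-alpha_shift and gamma = 2^-gamma_shift (assumed values). */
int16_t q_update(int16_t q_old, int16_t reward, int16_t q_next_max,
                 unsigned alpha_shift, unsigned gamma_shift)
{
    int32_t target = (int32_t)reward + scale_pow2(q_next_max, gamma_shift);
    int32_t delta = target - (int32_t)q_old;
    return saturate16((int32_t)q_old + scale_pow2(delta, alpha_shift));
}
```

With alpha = gamma = 0.5 (both shifts equal to 1), an old value of 50, a reward of 100, and a next-state maximum of 200, the target is 200 and the updated value is 125.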
C. Verifying Hardware Design with Maze Solver Example

The Q-Agent is then validated by simulation in ModelSim. The following simulation results show that the Q-table has been updated and that the agent has reached the destination.

FIGURE 22 MAZE SOLVER SIMULATION RESULT USING PROPOSED HARDWARE DESIGN

Training was run for 300 episodes. Completing it takes 5694 ns with a clock period of 2 ns, which corresponds to 5694 ns / 2 ns = 2847 clock cycles.

D. Software Design and Verification

Besides the hardware architecture, a C-based software system is created to determine the state and the rewards given to the Q-Agent. The flowchart of the C program is shown below.

FIGURE 23 SOFTWARE DESIGN FLOWCHART

First, the program asks the user for the load voltage, that is, the voltage at the load. The load voltage is then quantized to obtain the starting state of the system. Before training, the program displays the Q-table and the actions the system would take to bring the load voltage into the safe zone. The Q-table is initially filled with random values, so the actions taken by the system at this point are not optimal.

Training is then carried out for the desired number of episodes, each of which has up to 15 iterations. In each iteration, the Q-table value and the state of the system are updated, and the episode ends early if the state reached is already in the normal zone.

Once the desired number of episodes has been reached, the Q-table is displayed again, together with the actions the system will take to keep the load voltage in the safe zone. The final Q-table and actions should be closer to optimal than before. This verification is carried out entirely in C, and the results obtained are as follows.


FIGURE 24 RESULT OF VOLTAGE CONTROL LEARNING ALGORITHM IN C

The results above show that the C program has been verified and gives the desired results.
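The structure of one training episode (at most 15 iterations, a Q-table update in every iteration, and an early stop once the normal zone is reached) can be sketched in C as follows. This is an illustration under stated assumptions rather than the verified program: the toy environment `toy_step`, the level-based reward `level_reward`, the table sizes, and the purely greedy action choice (exploration is left out to keep the sketch deterministic) are all hypothetical.

```c
#define N_STATES   10   /* assumed quantized levels 0.80 ... 1.25 pu */
#define N_ACTIONS  5    /* five set points per generator, as in the text */
#define MAX_ITERS  15   /* iterations per episode, as in the text */

/* Hypothetical reward per level: levels 3..5 (0.95-1.05 pu) are normal. */
static int level_reward(int s)
{
    return (s >= 3 && s <= 5) ? 100 : -50;
}

/* Hypothetical toy environment: action a moves the level by (a - 2). */
static int toy_step(int s, int a)
{
    int next = s + (a - 2);
    if (next < 0) {
        next = 0;
    }
    if (next > N_STATES - 1) {
        next = N_STATES - 1;
    }
    return next;
}

/* Index of the largest entry in a Q-table row (first one wins on ties). */
static int argmax(const double *row)
{
    int best = 0;
    for (int a = 1; a < N_ACTIONS; a++) {
        if (row[a] > row[best]) {
            best = a;
        }
    }
    return best;
}

/* Run one greedy episode from `start`; returns the iterations used. */
int run_episode(double q[N_STATES][N_ACTIONS], int start,
                double alpha, double discount)
{
    int s = start;
    int iters = 0;
    while (iters < MAX_ITERS) {
        int a = argmax(q[s]);
        int next = toy_step(s, a);
        int r = level_reward(next);
        double best_next = q[next][argmax(q[next])];
        /* Q-learning update of the visited state-action pair */
        q[s][a] += alpha * (r + discount * best_next - q[s][a]);
        s = next;
        iters++;
        if (r == 100) {
            break;      /* normal zone reached: end the episode early */
        }
    }
    return iters;
}
```

Starting one level below the normal zone with a Q-table that already prefers the action raising the level by one step, the episode ends after a single iteration and only that table entry changes.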

E. Block Memory Simulation of Timing

The verification and simulation are also carried out with a timing diagram, targeting the Zybo board in the Vivado application. The simulation result of the BRAM block timing is shown below.

FIGURE 25 BRAM TIMING SIMULATION RESULT

It can be observed that the memory simulation with read and write operations has been carried out successfully. The write and read addresses and the other signals can also be observed.

F. Integration Process

First, the Q-Agent hardware is verified for the Zybo board. The following simulation, carried out for the Zybo board, integrates the hardware with the block memory.

FIGURE 26 Q-AGENT SIMULATION RESULT USING VIVADO

It can be observed from the timing diagram above that the memory and the hardware operate together.

Based on this simulation, the following utilization results are obtained.

FIGURE 27 UTILIZATION RESULT

The figure above shows that the system runs under certain timing condition parameters, such as slack.

FIGURE 28 TIMING CONDITION PARAMETER

In the simulation, a clock with the frequency and period shown above is used.

FIGURE 29 POWER CONSUMPTION


Based on the information above, the power dissipated by the system is 1.663 W.

The information below describes aspects of the design that are likely to affect its size. The following utilization figures relate to the slice logic.

FIGURE 30 GATE UTILIZATION

Information on the types of registers used is given in the following figure.

FIGURE 31 REGISTER UTILIZATION

Block memory information is also provided, showing the memory that is used and unused on the board.

FIGURE 32 MEMORY UTILIZATION

There is also information on the primitives and their functions, as follows.

FIGURE 33 PRIMITIVE FUNCTION UTILIZATION

The process has reached its final stage, namely integrating the software with the memory and hardware that have been simulated. However, after the C model code was adapted and the integration was attempted, the displayed results were not as desired: the selected action values should appear, but values that do not match them appear instead.

5. CONCLUSION

Based on the experiments carried out, the following conclusions were obtained.

• The hardware design of the Q-Agent has been successfully completed and simulated. Verification using the maze solver has also been carried out.
• The C-language software has been completed and gives appropriate results under verification.
• The timing simulation of the block memory has been carried out, with successful writes and reads.
• The integration process has been attempted. The hardware and memory have been successfully simulated for the Zybo board. However, when they are integrated with the software, an error appears in the displayed information.

REFERENCE

[1] Sergio Spanò, An Efficient Hardware Implementation of Reinforcement Learning: The Q-Learning Algorithm, IEEE Access, Rome, 2019.
[2] https://en.wikipedia.org/wiki/Verilog, 29 January 2020, 20:00
[3] Ruisheng Diao, Zhiwei Wang, Di Shi, Qianyun Chang, Jiajun Duan, & Xiaohu Zhang. (2019). Autonomous Voltage Control for Grid Operation Using Deep Reinforcement Learning.



APPENDIX
ActionRam.v

//action ram
module ActionRAM(clk, en, wr_addr, rd_addr, write_en, data_in, data_out);
    input clk;
    input en;
    input write_en;
    input [5:0] wr_addr;
    input [5:0] rd_addr;
    input [15:0] data_in;
    output reg [15:0] data_out;

    //File Name Parameter
    parameter FILENAME = "memory_in.list";

    //Memory Model: 64 words of 16 bits
    reg [15:0] mem [0:63];

    initial begin
        $readmemh(FILENAME, mem);
    end

    always @(posedge clk) begin
        //Synchronous read; the output is cleared while the RAM is disabled.
        //Non-blocking assignments are used throughout the clocked block.
        if (!en) begin
            data_out <= 16'd0;
        end
        else begin
            data_out <= mem[rd_addr];
        end

        //Synchronous write; the memory keeps its contents when write_en is low
        if (write_en) begin
            mem[wr_addr] <= data_in;
        end
    end
endmodule

ActionRam_tb.v

//Testbench Action RAM

`timescale 1 ns/10 ps // time-unit = 1 ns, precision = 10 ps

module ActionRAM_tb();

//input Declaration

reg clk = 1'b0;

reg en = 1'b0;

reg write_en = 1'b0;

reg [5:0] wr_addr ;

reg [5:0] rd_addr;

reg [15:0] data_in;

wire [15:0] data_out;

//port mapping

ActionRAM DUT(.clk(clk),

.en(en),

.write_en(write_en),

.wr_addr(wr_addr),

.rd_addr(rd_addr),

.data_in(data_in),

.data_out(data_out));

//clock generator

always begin

#10 clk = ~clk; //Clock with a period of 20 time units

end



//test case

initial begin

#10;

en = 1'b1;

#20;

write_en = 1'b1;

wr_addr = 6'd5;

rd_addr = 6'd5;

data_in = 16'd5;

#20;

write_en = 1'b0;

rd_addr = 6'd5;

#20;

write_en = 1'b1;

wr_addr = 6'd1;

rd_addr = 6'd1;

data_in = 16'd10;

#20;

write_en = 1'b0;

rd_addr = 6'd1;

#20;

$stop; //end the simulation; the free-running clock would otherwise keep it alive

end

//display monitor

initial begin

$monitor("time = %2d\n dout = %2d",

$time , data_out);

end

endmodule

ActionSelector.v

module ActionSelector (clk, start, q_values, epsilon, action);
    input clk, start;
    input [63:0] q_values; // Values of Q_Table at row equal to current state
    input [15:0] epsilon; // epsilon = 1 - episode/301
    output reg [3:0] action; // action that will be taken

    wire [15:0] max_1;
    wire [15:0] max_2;
    wire [15:0] max_value;
    wire [15:0] random;
    wire [15:0] random_value;

    // Two randomizers with different seeds: one decides between greedy and
    // random selection, the other picks the random action
    Randomizer randomizer_1 (16'b0011110011000011,start,clk,random);
    Randomizer randomizer_2 (16'b1100001100111100,start,clk,random_value);

    // Two-level comparator tree that finds the largest of the four Q-values
    assign max_1 = (q_values[15:0] >= q_values[31:16]) ? q_values[15:0] : q_values[31:16];
    assign max_2 = (q_values[47:32] >= q_values[63:48]) ? q_values[47:32] : q_values[63:48];
    assign max_value = (max_1 >= max_2) ? max_1 : max_2;

    always @(*) begin
        if (epsilon <= random) begin
            // Greedy selection: take the action whose Q-value is the maximum
            if (q_values[15:0] == max_value) begin
                action = 4'd4;
            end
            else if (q_values[31:16] == max_value) begin
                action = 4'd3;
            end
            else if (q_values[47:32] == max_value) begin
                action = 4'd2;
            end
            else if (q_values[63:48] == max_value) begin
                action = 4'd1;
            end
            else begin
                action = 4'd1;
            end
        end
        else begin
            // Random exploration: map random_value to one of the four actions
            if (random_value <= 16'b0000000001000000) begin
                action = 4'd1;
            end
            else if (random_value <= 16'b0000000010000000) begin
                action = 4'd2;
            end
            else if (random_value <= 16'b0000000011000000) begin
                action = 4'd3;
            end
            else begin
                action = 4'd4;
            end
        end
    end
endmodule
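When checking the ActionSelector waveforms, a small software reference model can be convenient. The C sketch below mirrors the selection logic of the module above, with the two Randomizer outputs passed in as plain arguments because the Randomizer module itself is not listed here. The function name `select_action` and the array layout (q[0] for q_values[15:0] up to q[3] for q_values[63:48]) are assumptions made for illustration.

```c
#include <stdint.h>

/* Reference model of the action selection.
 * q[0] corresponds to q_values[15:0] and q[3] to q_values[63:48].
 * Q-values are compared as unsigned 16-bit numbers, as in the Verilog. */
unsigned select_action(const uint16_t q[4], uint16_t epsilon,
                       uint16_t random, uint16_t random_value)
{
    if (epsilon <= random) {
        /* Greedy branch: find the maximum, then the first field equal to it */
        uint16_t max_value = q[0];
        for (int i = 1; i < 4; i++) {
            if (q[i] > max_value) {
                max_value = q[i];
            }
        }
        for (int i = 0; i < 4; i++) {
            if (q[i] == max_value) {
                return 4u - (unsigned)i;   /* q[0] -> 4, ..., q[3] -> 1 */
            }
        }
    }
    /* Exploration branch: thresholds taken from the Verilog constants */
    if (random_value <= 0x0040u) {
        return 1u;
    }
    if (random_value <= 0x0080u) {
        return 2u;
    }
    if (random_value <= 0x00C0u) {
        return 3u;
    }
    return 4u;
}
```

For the first stimulus of the testbench (fields 3, 2, 1, and 12 from the least significant field upward), the greedy branch returns action 1 because the largest value sits in q_values[63:48].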

ActionSelector_tb.v

module ActionSelector_tb();
    // clk and start are added so that the instance matches the five ports of
    // ActionSelector; start is assumed to load the Randomizer seeds
    reg clk = 1'b0;
    reg start = 1'b1;
    reg [63:0] q_values;
    reg [15:0] epsilon;
    wire [3:0] action;

    ActionSelector action_selector(clk, start, q_values, epsilon, action);

    always #5 clk = ~clk;

    initial begin
        q_values = 64'b0000000000001100000000000000000100000000000000100000000000000011;
        epsilon = 16'b0000000011100000;
        // epsilon is displayed as a fixed-point value with 8 fractional bits
        $display("Epsilon = %f", $itor(epsilon)*2.0**-8.0);
        #10;
        q_values = 64'b0000000000001100000000000000000100000000000000100000000000000111;
        epsilon = 16'b0000000011000000;
        $display("Epsilon = %f", $itor(epsilon)*2.0**-8.0);
        #10;
        $stop;
    end
endmodule

BarrelShifter.v

module BarrelShifter(op, shift_mag, result);

input [15:0] op;

input [3:0] shift_mag;

output [15:0] result;

wire [15:0] mux1_out;

wire [15:0] mux2_out;

wire [15:0] mux3_out;

mux mux_instance1(shift_mag[0], op, (op >> 1), mux1_out);

mux mux_instance2(shift_mag[1], mux1_out, (mux1_out >> 2), mux2_out);

mux mux_instance3(shift_mag[2], mux2_out, (mux2_out >> 4), mux3_out);

mux mux_instance4(shift_mag[3], mux3_out, (mux3_out >> 8), result);

endmodule
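The four-stage structure of the barrel shifter above can be mirrored in a short C reference model, which is handy for generating expected values for the testbench. The sketch assumes that each mux instance passes its shifted input when the select bit is 1 (the mux module is not listed here) and that the shift is a logical right shift on 16 bits; the function name `barrel_shift` is hypothetical.

```c
#include <stdint.h>

/* Reference model of a 16-bit logical right barrel shifter.
 * Stage k shifts by 2^k when bit k of shift_mag is set, as in the four
 * cascaded mux stages of the Verilog module. */
uint16_t barrel_shift(uint16_t op, unsigned shift_mag)
{
    uint16_t value = op;
    for (unsigned stage = 0; stage < 4; stage++) {
        if ((shift_mag >> stage) & 1u) {
            value = (uint16_t)(value >> (1u << stage));
        }
    }
    return value;
}
```

Both stimuli of the testbench (operand 1 shifted by 1, and operand 5 shifted by 5) give 0 in this model, since every set bit is shifted out.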

BarrelShifter_tb.v

module tb_barrel;

reg [15:0] op;

reg [3:0] shift_mag;

wire [15:0] result;

BarrelShifter shifter_instance(op, shift_mag, result);



initial begin

shift_mag = 4'b0001;

op = 16'b0000000000000001;

#10;

shift_mag = 4'b0101;

op = 16'b0000000000000101;

#10;

$stop;

end

endmodule
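
The testbench above applies two stimuli but does not check the result. As a minimal sketch, assuming the `mux` module (not shown in this listing) selects its second data input when the select bit is 1, the staged shift can be compared against the built-in `>>` operator for every shift magnitude. The module name `barrel_selfcheck_tb` and the test pattern `16'hA5C3` are illustrative and not part of the original design.

```verilog
// Sketch: exhaustive self-check of a 4-stage logical right barrel shift.
// The 2:1 mux of each stage is modelled with ?: (assumed mux behaviour).
module barrel_selfcheck_tb;
  reg  [15:0] op;
  reg  [3:0]  shift_mag;

  // Same stage structure as BarrelShifter: shift by 1, 2, 4, 8
  wire [15:0] s1     = shift_mag[0] ? (op >> 1) : op;
  wire [15:0] s2     = shift_mag[1] ? (s1 >> 2) : s1;
  wire [15:0] s3     = shift_mag[2] ? (s2 >> 4) : s2;
  wire [15:0] result = shift_mag[3] ? (s3 >> 8) : s3;

  integer i;
  integer errors;

  initial begin
    errors = 0;
    op = 16'hA5C3;                      // arbitrary test pattern
    for (i = 0; i < 16; i = i + 1) begin
      shift_mag = i[3:0];
      #1;                               // let the combinational logic settle
      if (result !== (op >> shift_mag)) begin
        errors = errors + 1;
        $display("FAIL: shift_mag=%0d result=%h", shift_mag, result);
      end
    end
    if (errors == 0) $display("PASS: all 16 shift magnitudes match");
    $finish;
  end
endmodule
```

Replacing the four `wire` expressions with an instance of `BarrelShifter` would turn the same loop into a check of the actual module.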

CU.v

module ControlUnit(clk,enb,epsilon,next_action,current_st,episode,fail,finish,print);

input clk;

input [15:0]epsilon;

input enb;

input [3:0] next_action; // remove when using the Q agent

//input [5:0] current_state;

output reg [9:0] episode;

output reg fail;

output reg finish;

output reg print;

output reg [5:0] current_st;

wire [3:0] w_next_action;

wire signed [15:0] w_next_reward ;

wire [5:0] next_state;

reg [5:0] current_state = 6'd0;

reg start;

reg en;

reg [1:0] current_condition = 2'b00;

reg [1:0] next_condition = 2'b00;

reg [5:0] counter;

RewardGenerator Reward(.next_state(next_state),

.next_reward(w_next_reward));

StateSelector State(.next_action(w_next_action),

.current_state(current_state),

.next_state(next_state));

QLearningAgent Agent(.clk(clk),

.en(en),

.start(start),

.next_reward(w_next_reward),

.next_state(next_state),

.epsilon(epsilon),

.next_action(w_next_action));

always@(*) begin

if(current_condition == 2'b00) begin

//State Start

current_state=6'd1;

current_st=6'd1;

start=1'b1;

en=1'b1;

print=1'b1;

counter=4'd0;

end

else if (current_condition ==2'b01) begin

//State Calculation

start=1'b0;

en=1'b1;

print=1'b1;

end

else if (current_condition ==2'b10) begin

//State finish

start=1'b0;

en=1'b0;

print=1'b0;

counter=4'd0;

finish=1'b1;

end

else begin

//State failed

start=1'b0;

en=1'b0;

print=1'b0;

counter=4'd0;

fail=1'b1; // assert fail in the failure state

end

if(current_state == 6'd25) begin

next_condition=2'b10;

end

else if(current_state == 5 || current_state == 8 || current_state == 7 ||


current_state == 20 || current_state == 14 || current_state == 17 ||

current_state == 19 || current_state == 22) begin

next_condition=2'b00;

end

else if (counter > 5'd14) begin

next_condition=2'b00;

end

else if (episode == 10'd256) begin

next_condition=2'b11;

end

else if (current_condition == 2'b00) begin

next_condition=2'b01;

end

end

always@(posedge clk) begin

current_state<=next_state;

current_st<=next_state;

current_condition<=next_condition;

if (enb == 1'b1) begin

current_condition<=2'b00;

episode=10'd0;

end

else if (current_condition == 2'b01) begin

counter=counter+1;

end

else if (current_condition == 2'b00) begin

episode= episode+1;

end

end

endmodule

CU_tb.v

`timescale 100 ps/1 ps // time-unit = 100 ps, precision = 1 ps

module ControlUnit_tb();

//input Declaration

reg clk = 1'b1;

reg enb = 1'b1;

reg [15:0] epsilon;

reg [3:0] next_action ;

wire [9:0] episode;

wire fail;

wire finish;

wire print;

wire [5:0] current_st;

wire [35:0] step;

// reg [3:0] gamma;

// reg [3:0] alpha;

//reg [16:0] action_counter;

//wire [15:0] result;

// wire [63:0] Q_out_action;

//port mapping

ControlUnit DUT(.clk(clk),

.enb(enb),

.epsilon(epsilon),

.next_action(next_action),

.current_st(current_st),

.episode(episode),

.fail(fail),

.finish(finish),

.print(print));

reg[3:0] memory_map[0:24];

//read initial memory access

initial begin

$readmemh ("memory_map.list", memory_map);

end

//clock generator

always begin

#10 clk = ~clk; // Clock with a period of 20 time units

end

assign step = ({episode, 8'b00000000}) * 18'b000000000000000001;

initial begin

#20;

//next_reward = 16'b00000111_00000000; //7

enb = 1'b0;

next_action = 4'b0000;

//current_action = 4'd1;

// alpha = 4'b1000;

// gamma = 4'b1110;

#20;

enb= 1'b0;

next_action = 4'b0000;

//next_reward = 16'b00000101_00000000; //7

//current_action = 4'd1;

#20;

next_action = 4'b0000;

#20;

next_action = 4'b0000;

#20;

next_action = 4'b0000;

#20;

next_action = 4'b0000;

#20;

next_action = 4'b0000;

#20;

next_action = 4'b0011;

#20;

next_action = 4'b0011;

#20;

next_action = 4'b0011;

#20;

next_action = 4'b0011;

#20;

next_action = 4'b0000;

#20;

next_action = 4'b0000;

#20;

next_action = 4'b0011;

#20;

next_action = 4'b0011;

#20;

next_action = 4'b0000;

#20;

next_action = 4'b0000;

end

always@(episode) begin

epsilon = 16'b0000000100000000 - (step[23:8]);

end

//display monitor

always@(negedge clk) begin

$readmemh ("memory_map.list", memory_map);

case(current_st)

5'b00001: begin memory_map[0]=3'd1; end

5'b00010: begin memory_map[1]=3'd1; end

5'b00011: begin memory_map[2]=3'd1; end

5'b00100: begin memory_map[3]=3'd1; end

5'b00101: begin memory_map[4]=3'd1; end

5'b00110: begin memory_map[5]=3'd1; end

5'b00111: begin memory_map[6]=3'd1; end

5'b01000: begin memory_map[7]=3'd1; end

5'b01001: begin memory_map[8]=3'd1; end

5'b01010: begin memory_map[9]=3'd1; end

5'b01011: begin memory_map[10]=3'd1; end

5'b01100: begin memory_map[11]=3'd1; end

5'b01101: begin memory_map[12]=3'd1; end

5'b01110: begin memory_map[13]=3'd1; end

5'b01111: begin memory_map[14]=3'd1; end

5'b10000: begin memory_map[15]=3'd1; end

5'b10001: begin memory_map[16]=3'd1; end

5'b10010: begin memory_map[17]=3'd1; end

5'b10011: begin memory_map[18]=3'd1; end

5'b10100: begin memory_map[19]=3'd1; end

5'b10101: begin memory_map[20]=3'd1; end

5'b10110: begin memory_map[21]=3'd1; end

5'b10111: begin memory_map[22]=3'd1; end

5'b11000: begin memory_map[23]=3'd1; end

5'b11001: begin memory_map[24]=3'd1; end

5'b11010: begin memory_map[25]=3'd1; end

default: begin memory_map[0]=3'd1; end

endcase

$monitor("%d %d %d %d %d\n%d %d %d %d %d\n%d %d %d %d %d\n%d %d %d %d %d\n%d %d %d %d %d",

memory_map[0], memory_map[1],memory_map[2],memory_map[3],memory_map[4],

memory_map[5], memory_map[6], memory_map[7], memory_map[8],


memory_map[9],

memory_map[10],memory_map[11],memory_map[12],memory_map[13],memory_map[14],

memory_map[15],memory_map[16],memory_map[17],memory_map[18],memory_map[19],

memory_map[20],memory_map[21],memory_map[22],memory_map[23],memory_map[24]);

end

endmodule

ControlUnit.v

module ControlUnit(clk,enb,epsilon,next_action,current_st,episode,fail,finish,print);

input clk;

input [15:0]epsilon;

input enb;

input [3:0] next_action; // remove when using the Q agent

//input [5:0] current_state;

output reg [8:0] episode;

output reg fail;

output reg finish;

output reg print;

output reg [5:0] current_st;

wire [3:0] w_next_action;

wire signed [15:0] w_next_reward ;

wire [5:0] next_state;

reg [5:0] current_state = 6'd0;

reg start;

reg en;

reg [1:0] current_condition = 2'b00;

reg [1:0] next_condition = 2'b00;

reg [5:0] counter;

//current_condition = 2'b00;

RewardGenerator Reward(.next_state(next_state),

.next_reward(w_next_reward));

StateSelector State(.next_action(next_action),

.current_state(current_state),

.next_state(next_state));

/*

QLearningAgent Agent(.clk(clk),

.en(en),

.reward(w_next_reward),

.next_state(w_next_state)

.next_action(w_next_action));

*/

always@(*) begin

if(current_condition == 2'b00) begin

//State Start

current_state=6'd1;

current_st=6'd1;

//next_state=6'd1;

start=1'b1;

en=1'b0;

print=1'b1;

counter=4'd0;

end

else if (current_condition ==2'b01) begin

//State Calculation

start=1'b0;

en=1'b1;

print=1'b1;

end

else if (current_condition ==2'b10) begin

//State finish

start=1'b0;

en=1'b0;

print=1'b0;

counter=4'd0;

finish=1'b1;

end

else begin

//State failed

start=1'b0;

en=1'b0;

print=1'b0;

counter=4'd0;

fail=1'b1; // assert fail in the failure state

end

if(current_state == 6'd25) begin

next_condition=2'b10;

end

else if(current_state == 5 || current_state == 8 || current_state == 7 ||


current_state == 20 || current_state == 14 || current_state == 17 ||

current_state == 19 || current_state == 22) begin

next_condition=2'b00;

end

else if (counter > 5'd14) begin

next_condition=2'b00;

end

else if (episode == 9'd300) begin

next_condition=2'b11;

end

else if (current_condition == 2'b00) begin

next_condition=2'b01;

end

end

always@(posedge clk) begin

current_state<=next_state;

current_st<=next_state;

current_condition<=next_condition;

if (enb == 1'b1) begin

current_condition<=2'b00;

end

else if (current_condition == 2'b01) begin

counter=counter+1;

end

else if (current_condition == 2'b00) begin

episode= episode+1;

//current_state=6'd1;

end

end

endmodule

ControlUnit_tb.v
`timescale 100 ps/1 ps // time-unit = 100 ps, precision = 1 ps

module ControlUnit_tb();

//input Declaration

reg clk = 1'b1;

reg enb = 1'b1;

reg [15:0] epsilon = 16'd0;

reg [3:0] next_action ;

wire [8:0] episode;

wire fail;

wire finish;

wire print;

wire [5:0] current_st;

// reg [3:0] gamma;

// reg [3:0] alpha;

//reg [16:0] action_counter;

//wire [15:0] result;

// wire [63:0] Q_out_action;

//port mapping

ControlUnit DUT(.clk(clk),

.enb(enb),

.epsilon(epsilon),

.next_action(next_action),

.current_st(current_st),

.episode(episode),

.fail(fail),

.finish(finish),

.print(print));

reg[3:0] memory_map[0:24];

//read initial memory access

initial begin

$readmemh ("memory_map.list", memory_map);

end

//clock generator

always begin

#10 clk = ~clk; // Clock with a period of 20 time units

end

initial begin

#20;

//next_reward = 16'b00000111_00000000; //7

enb = 1'b0;

next_action = 4'b0000;

//current_action = 4'd1;

// alpha = 4'b1000;

// gamma = 4'b1110;

#20;

enb= 1'b0;

next_action = 4'b0000;

//next_reward = 16'b00000101_00000000; //7

//current_action = 4'd1;

#20;

next_action = 4'b0000;

#20;

next_action = 4'b0000;

#20;

next_action = 4'b0000;

#20;

next_action = 4'b0000;

#20;

next_action = 4'b0000;

#20;

next_action = 4'b0011;

#20;

next_action = 4'b0011;

#20;

next_action = 4'b0011;

#20;

next_action = 4'b0011;

#20;

next_action = 4'b0000;

#20;

next_action = 4'b0000;

#20;

next_action = 4'b0011;

#20;

next_action = 4'b0011;

#20;

next_action = 4'b0000;

#20;

next_action = 4'b0000;

end

always@(posedge clk) begin

epsilon = 16'd256 - (episode * 32'd256) / 32'd300; // linear decay in 8.8 fixed point: 1.0 at episode 0, 0 at episode 300

end

//display monitor

always@(negedge clk) begin

$readmemh ("memory_map.list", memory_map);

case(current_st)

5'b00001: begin memory_map[0]=3'd1; end

5'b00010: begin memory_map[1]=3'd1; end

5'b00011: begin memory_map[2]=3'd1; end

5'b00100: begin memory_map[3]=3'd1; end

5'b00101: begin memory_map[4]=3'd1; end

5'b00110: begin memory_map[5]=3'd1; end

5'b00111: begin memory_map[6]=3'd1; end

5'b01000: begin memory_map[7]=3'd1; end

5'b01001: begin memory_map[8]=3'd1; end

5'b01010: begin memory_map[9]=3'd1; end

5'b01011: begin memory_map[10]=3'd1; end

5'b01100: begin memory_map[11]=3'd1; end

5'b01101: begin memory_map[12]=3'd1; end

5'b01110: begin memory_map[13]=3'd1; end

5'b01111: begin memory_map[14]=3'd1; end

5'b10000: begin memory_map[15]=3'd1; end

5'b10001: begin memory_map[16]=3'd1; end

5'b10010: begin memory_map[17]=3'd1; end

5'b10011: begin memory_map[18]=3'd1; end

5'b10100: begin memory_map[19]=3'd1; end

5'b10101: begin memory_map[20]=3'd1; end

5'b10110: begin memory_map[21]=3'd1; end

5'b10111: begin memory_map[22]=3'd1; end

5'b11000: begin memory_map[23]=3'd1; end

5'b11001: begin memory_map[24]=3'd1; end

5'b11010: begin memory_map[25]=3'd1; end

default: begin memory_map[0]=3'd1; end

endcase

$monitor("%d %d %d %d %d\n%d %d %d %d %d\n%d %d %d %d %d\n%d %d %d %d %d\n%d %d %d %d %d",

memory_map[0], memory_map[1],memory_map[2],memory_map[3],memory_map[4],

memory_map[5], memory_map[6], memory_map[7], memory_map[8],


memory_map[9],

memory_map[10],memory_map[11],memory_map[12],memory_map[13],memory_map[14],

memory_map[15],memory_map[16],memory_map[17],memory_map[18],memory_map[19],

memory_map[20],memory_map[21],memory_map[22],memory_map[23],memory_map[24]);

end

endmodule
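
The testbenches in this listing treat `epsilon` as an unsigned 8.8 fixed-point number (they print it as `$itor(epsilon)*2.0**-8.0`), so 1.0 is represented by 256. The following is a minimal sketch, under that assumption, of a linear decay from 1.0 to 0 over the 300 episodes used as the limit in ControlUnit.v. The module name `epsilon_decay_sketch` and the parameter `MAX_EPISODE` are illustrative and do not appear in the original design.

```verilog
// Sketch: linear epsilon decay in unsigned 8.8 fixed point (256 represents 1.0).
module epsilon_decay_sketch;
  localparam MAX_EPISODE = 300;         // assumed episode limit

  reg  [8:0]  episode;
  // Multiply before dividing, in 32 bits, so the fraction is not truncated to 0
  wire [31:0] decay   = (episode * 32'd256) / MAX_EPISODE;
  wire [15:0] epsilon = 16'd256 - decay[15:0];

  task show;
    begin
      #1 $display("episode = %0d, epsilon = %f", episode, $itor(epsilon) / 256.0);
    end
  endtask

  initial begin
    episode = 9'd0;   show;             // epsilon = 256 (1.0)
    episode = 9'd150; show;             // epsilon = 128 (0.5)
    episode = 9'd300; show;             // epsilon = 0
    $finish;
  end
endmodule
```

Dividing first, as in `episode / 300`, would yield 0 for every episode below 300 because Verilog integer division truncates, which is why the multiplication comes first here.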

Decoder.v

module Decoder (

input [3:0] at,

output en1,en2,en3,en4,en5,en6,en7,en8,en9,en10,en11,en12,en13,en14,en15);

reg [15:0] out;

always @(*) begin

case(at)

4'd1: begin out=16'b0000000000000001; end

4'd2: begin out=16'b0000000000000010; end

4'd3: begin out=16'b0000000000000100; end

4'd4: begin out=16'b0000000000001000; end

4'd5: begin out=16'b0000000000010000; end

4'd6: begin out=16'b0000000000100000; end

4'd7: begin out=16'b0000000001000000; end

4'd8: begin out=16'b0000000010000000; end

4'd9: begin out=16'b0000000100000000; end

4'd10: begin out=16'b0000001000000000; end

4'd11: begin out=16'b0000010000000000; end

4'd12: begin out=16'b0000100000000000; end

4'd13: begin out=16'b0001000000000000; end

4'd14: begin out=16'b0010000000000000; end

4'd15: begin out=16'b0100000000000000; end

default : begin out=16'b0000000000000000; end

endcase

end

assign en1 = out[0];

assign en2 = out[1];

assign en3 = out[2];

assign en4 = out[3];

assign en5 = out[4];

assign en6 = out[5];

assign en7 = out[6];

assign en8 = out[7];

assign en9 = out[8];

assign en10 = out[9];

assign en11 = out[10];

assign en12 = out[11];

assign en13 = out[12];

assign en14 = out[13];

assign en15 = out[14];

endmodule

Decoder_tb.v

//Testbench Decoder

`timescale 1 ns/10 ps // time-unit = 1 ns, precision = 10 ps

module decoder_tb();

//input Declaration

reg clk = 1'b0;

reg en = 1'b0;

reg[3:0] action;

wire en1,en2,en3,en4,en5,en6,en7,en8,en9,en10,en11,en12,en13,en14,en15;

//port mapping

Decoder DUT( .at(action),

.en1(en1),

.en2(en2),

.en3(en3),

.en4(en4),

.en5(en5),

.en6(en6),

.en7(en7),

.en8(en8),

.en9(en9),

.en10(en10),

.en11(en11),

.en12(en12),

.en13(en13),

.en14(en14),

.en15(en15));

//clock generator

always begin

#10 clk = ~clk; // Clock with a period of 20 time units

end

//test case

initial begin

#10;

action = 1;

#20;

action = 2;

#20;

action = 3;

#20;

action = 4;

end

//display monitor

// initial begin

// $monitor("time = %2d\n dout = %2d",

// $time , Q_max);

// end

endmodule

Delay.v

// Block Delay

module DelayActionRAM(clk, din,dout);

input clk;

input [15:0] din;

output reg [15:0] dout = 16'd0;

//buffer register;

reg [15:0] temp1,temp2,temp3;

always@(posedge clk) begin

temp1 <= din;

dout <= temp1;

end

endmodule

module DelayReward(clk,din,dout);

input clk;

input [15:0] din;

output reg [15:0] dout = 16'd0;

//buffer register;

reg [15:0] temp1,temp2;

always@(posedge clk) begin

//temp1 <= din;

dout <= din;

end

endmodule

module DelayState(clk,din,dout);

input clk;

input [5:0] din;

output reg [5:0] dout = 6'd0;

//buffer register;

reg [5:0] temp1,temp2;

always@(posedge clk) begin

temp1 <= din;

dout <= temp1;

end

endmodule

Delay_tb.v

//Testbench DelayActionRAM

`timescale 1 ns/10 ps // time-unit = 1 ns, precision = 10 ps

module Delay_tb();

reg clk = 0;

reg [15:0] din;

wire [15:0] dout;

DelayActionRAM DUT(.clk(clk),

.din(din),

.dout(dout));

//clock generator

always begin

#10 clk = ~clk; // Clock with a period of 20 time units

end

initial begin

#10;

din = 16'd10;

#20;

din = 16'd20;

end

endmodule

MaxBlock.v

//Model with one register at the end

module MaxBlock (

input [15:0] Q_Act1,

input [15:0] Q_Act2,

input [15:0] Q_Act3,

input [15:0] Q_Act4,

input [15:0] Q_Act5,

input [15:0] Q_Act6,

input [15:0] Q_Act7,

input [15:0] Q_Act8,

input [15:0] Q_Act9,

input [15:0] Q_Act10,

input [15:0] Q_Act11,

input [15:0] Q_Act12,

input [15:0] Q_Act13,

input [15:0] Q_Act14,

input [15:0] Q_Act15,

input clk,

output reg [15:0] out);

wire [15:0] a;

wire [15:0] b;

wire [15:0] c;

wire [15:0] d;

wire [15:0] e;

wire [15:0] f;

wire [15:0] g;

wire [15:0] h;

wire [15:0] i;

wire [15:0] j;

wire [15:0] k;

wire [15:0] l;

wire [15:0] m;

wire [15:0] RegOut;

assign a = (Q_Act1>=Q_Act2)? Q_Act1 : Q_Act2;

assign b = (Q_Act3>=Q_Act4)? Q_Act3 : Q_Act4;

assign c = (Q_Act5>=Q_Act6)? Q_Act5 : Q_Act6;

assign d = (Q_Act7>=Q_Act8)? Q_Act7 : Q_Act8;

assign e = (Q_Act9>=Q_Act10)? Q_Act9 : Q_Act10;

assign f = (Q_Act11>=Q_Act12)? Q_Act11 : Q_Act12;

assign g = (Q_Act13>=Q_Act14)? Q_Act13 : Q_Act14;

assign h = (a>=b)? a : b;

assign i = (c>=d)? c : d;

assign j = (e>=f)? e : f;

assign k = (g>=Q_Act15)? g : Q_Act15;

assign l = (h>=i)? h : i;

assign m = (j>=k)? j : k;

assign RegOut = (l>=m)? l : m;

always@(posedge clk) begin

out <= RegOut; // nonblocking assignment for the output register

end

endmodule

/*

//Model with a register at every level

module Max_Block (

input [15:0] Q_Act1,

input [15:0] Q_Act2,

input [15:0] Q_Act3,

input [15:0] Q_Act4,

input [15:0] Q_Act5,

input [15:0] Q_Act6,

input [15:0] Q_Act7,

input [15:0] Q_Act8,

input [15:0] Q_Act9,

input [15:0] Q_Act10,

input [15:0] Q_Act11,

input [15:0] Q_Act12,

input [15:0] Q_Act13,

input [15:0] Q_Act14,

input [15:0] Q_Act15,

input clk,

output reg [15:0] out);

wire [15:0] a;

wire [15:0] b;

wire [15:0] c;

wire [15:0] d;

wire [15:0] e;

wire [15:0] f;

wire [15:0] g;

wire [15:0] h;

wire [15:0] i;

wire [15:0] j;

wire [15:0] k;

wire [15:0] l;

wire [15:0] m;

wire [15:0] RegOut;

reg [15:0] Rega;

reg [15:0] Regb;

reg [15:0] Regc;

reg [15:0] Regd;

reg [15:0] Rege;

reg [15:0] Regf;

reg [15:0] Regg;

reg [15:0] Regh;

reg [15:0] Regi;

reg [15:0] Regj;

reg [15:0] Regk;

reg [15:0] Regl;

reg [15:0] Regm;

assign a = (Q_Act1>=Q_Act2)? Q_Act1 : Q_Act2;

assign b = (Q_Act3>=Q_Act4)? Q_Act3 : Q_Act4;

assign c = (Q_Act5>=Q_Act6)? Q_Act5 : Q_Act6;

assign d = (Q_Act7>=Q_Act8)? Q_Act7 : Q_Act8;

assign e = (Q_Act9>=Q_Act10)? Q_Act9 : Q_Act10;

assign f = (Q_Act11>=Q_Act12)? Q_Act11 : Q_Act12;

assign g = (Q_Act13>=Q_Act14)? Q_Act13 : Q_Act14;

assign h = (Rega>=Regb)? Rega : Regb;

assign i = (Regc>=Regd)? Regc : Regd;

assign j = (Rege>=Regf)? Rege : Regf;

assign k = (Regg>=Q_Act15)? Regg : Q_Act15;

assign l = (Regh>=Regi)? Regh : Regi;

assign m = (Regj>=Regk)? Regj : Regk;

assign RegOut = (Regl>=Regm)? Regl : Regm;

always@(posedge clk) begin

Rega=a;

Regb=b;

Regc=c;

Regd=d;

Rege=e;

Regf=f;

Regg=g;

Regh=h;

Regi=i;

Regj=j;

Regk=k;

Regl=l;

Regm=m;

out=RegOut;

end

endmodule

*/

MaxBlock_tb.v
//Testbench Max Block

`timescale 1 ns/10 ps // time-unit = 1 ns, precision = 10 ps

module MaxBlock_tb();

//input Declaration

reg clk = 1'b0;

reg en = 1'b0;

reg[15:0] Q_Act1;

reg[15:0] Q_Act2;

reg[15:0] Q_Act3;

reg[15:0] Q_Act4;

reg[15:0] Q_Act5;

reg[15:0] Q_Act6;

reg[15:0] Q_Act7;

reg[15:0] Q_Act8;

reg[15:0] Q_Act9;

reg[15:0] Q_Act10;

reg[15:0] Q_Act11;

reg[15:0] Q_Act12;

reg[15:0] Q_Act13;

reg[15:0] Q_Act14;

reg[15:0] Q_Act15;

wire[15:0] Q_max;

//port mapping

MaxBlock DUT(.clk(clk),

.Q_Act1(Q_Act1),

.Q_Act2(Q_Act2),

.Q_Act3(Q_Act3),

.Q_Act4(Q_Act4),

.Q_Act5(Q_Act5),

.Q_Act6(Q_Act6),

.Q_Act7(Q_Act7),

.Q_Act8(Q_Act8),

.Q_Act9(Q_Act9),

.Q_Act10(Q_Act10),

.Q_Act11(Q_Act11),

.Q_Act12(Q_Act12),

.Q_Act13(Q_Act13),

.Q_Act14(Q_Act14),

.Q_Act15(Q_Act15),

.out(Q_max));

//clock generator

always begin

#10 clk = ~clk; // Clock with a period of 20 time units

end

//test case

initial begin

#10;

Q_Act1 = 16'd1;

Q_Act2 = 16'd2;

Q_Act3 = 16'd3;

Q_Act4 = 16'd4;

Q_Act5 = 16'd0;

Q_Act6 = 16'd0;

Q_Act7 = 16'd0;

Q_Act8 = 16'd0;

Q_Act9 = 16'd0;

Q_Act10 = 16'd0;

Q_Act11 = 16'd0;

Q_Act12 = 16'd0;

Q_Act13 = 16'd0;

Q_Act14 = 16'd0;

Q_Act15 = 16'd0;

#10;

Q_Act1 = 16'd2;

Q_Act2 = 16'd3;

Q_Act3 = 16'd6;

Q_Act4 = 16'd1;

#30;

Q_Act1 = 16'd5;

Q_Act2 = 16'd4;

Q_Act3 = 16'd1;

Q_Act4 = 16'd2;

#20;

Q_Act1 = 16'd4;

Q_Act2 = 16'd7;

Q_Act3 = 16'd2;

Q_Act4 = 16'd3;

end

//display monitor

initial begin

$monitor("time = %2d\n dout = %2d",

$time , Q_max);

end

endmodule
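
MaxBlock compares its inputs with a plain `>=` on unsigned 16-bit ports. If the Q values are stored as signed two's-complement numbers, which the `signed` reward ports elsewhere in this listing suggest but do not confirm, an unsigned comparison ranks every negative value above every positive one. The sketch below illustrates the difference for a single comparator under that assumption. The module name `signed_compare_sketch` and the signals `q_a` and `q_b` are hypothetical.

```verilog
// Sketch: unsigned versus signed comparison of two 16-bit Q values.
module signed_compare_sketch;
  reg  [15:0] q_a;
  reg  [15:0] q_b;

  // Comparator as written in MaxBlock (operands treated as unsigned)
  wire [15:0] max_unsigned = (q_a >= q_b) ? q_a : q_b;
  // Same comparator with both operands interpreted as two's complement
  wire [15:0] max_signed   = ($signed(q_a) >= $signed(q_b)) ? q_a : q_b;

  initial begin
    q_a = 16'hFF9C;                     // -100 in two's complement
    q_b = 16'h0005;                     // +5
    #1;
    // The unsigned comparison selects 16'hFF9C, the signed one selects 16'h0005
    $display("unsigned max = %h, signed max = %h", max_unsigned, max_signed);
    $finish;
  end
endmodule
```

If the Q table never holds negative values, the unsigned comparison in MaxBlock is sufficient and no change is needed.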

MazeMap.v

//Maze Map

module MazeMap(clk,current_state, out_map_row_1, out_map_row_2, out_map_row_3,
out_map_row_4, out_map_row_5);

input clk;

input [6:0] current_state;

//output

output reg[3:0] out_map_row_1, out_map_row_2, out_map_row_3, out_map_row_4,


out_map_row_5;

reg[3:0] memory_map[0:24];

reg [6:0] prev_state;

//read initial memory access

initial begin

$readmemh ("memory_map.list", memory_map);

end

always @(posedge clk) begin

prev_state <= current_state;

out_map_row_1 <= memory_map[0:4];

out_map_row_2 <= memory_map[5:9];

out_map_row_3 <= memory_map[10:14];

out_map_row_4 <= memory_map[15:19];

out_map_row_5 <= memory_map[20:24];

end

always @(*) begin

case(prev_state)

5'b00000: begin memory_map[0]=3'd0; end

5'b00001: begin memory_map[1]=3'd0; end

5'b00010: begin memory_map[2]=3'd0; end

5'b00011: begin memory_map[3]=3'd0; end

5'b00100: begin memory_map[4]=3'd0; end

5'b00101: begin memory_map[5]=3'd0; end

5'b00110: begin memory_map[6]=3'd0; end

5'b00111: begin memory_map[7]=3'd0; end

5'b01000: begin memory_map[8]=3'd0; end

5'b01001: begin memory_map[9]=3'd0; end

5'b01010: begin memory_map[10]=3'd0; end

5'b01011: begin memory_map[11]=3'd0; end

5'b01100: begin memory_map[12]=3'd0; end

5'b01101: begin memory_map[13]=3'd0; end

5'b01111: begin memory_map[14]=3'd0; end

5'b10000: begin memory_map[15]=3'd0; end

5'b10001: begin memory_map[16]=3'd0; end

5'b10010: begin memory_map[17]=3'd0; end

5'b10011: begin memory_map[18]=3'd0; end

5'b10100: begin memory_map[19]=3'd0; end

5'b10101: begin memory_map[20]=3'd0; end

5'b10110: begin memory_map[21]=3'd0; end

5'b10111: begin memory_map[22]=3'd0; end

5'b11000: begin memory_map[23]=3'd0; end

5'b11001: begin memory_map[24]=3'd0; end

default: begin memory_map[0]=3'd0; end

endcase

case(current_state)

5'b00000: begin memory_map[0]=3'd1; end

5'b00001: begin memory_map[1]=3'd1; end

5'b00010: begin memory_map[2]=3'd1; end

5'b00011: begin memory_map[3]=3'd1; end

5'b00100: begin memory_map[4]=3'd1; end

5'b00101: begin memory_map[5]=3'd1; end

5'b00110: begin memory_map[6]=3'd1; end

5'b00111: begin memory_map[7]=3'd1; end

5'b01000: begin memory_map[8]=3'd1; end

5'b01001: begin memory_map[9]=3'd1; end

5'b01010: begin memory_map[10]=3'd1; end

5'b01011: begin memory_map[11]=3'd1; end

5'b01100: begin memory_map[12]=3'd1; end

5'b01101: begin memory_map[13]=3'd1; end

5'b01111: begin memory_map[14]=3'd1; end

5'b10000: begin memory_map[15]=3'd1; end

5'b10001: begin memory_map[16]=3'd1; end

5'b10010: begin memory_map[17]=3'd1; end

5'b10011: begin memory_map[18]=3'd1; end

5'b10100: begin memory_map[19]=3'd1; end

5'b10101: begin memory_map[20]=3'd1; end

5'b10110: begin memory_map[21]=3'd1; end

5'b10111: begin memory_map[22]=3'd1; end

5'b11000: begin memory_map[23]=3'd1; end

5'b11001: begin memory_map[24]=3'd1; end

default: begin memory_map[0]=3'd1; end

endcase

end

//display monitor

// initial begin

// $monitor("%d %d %d %d %d\n%d %d %d %d %d\n%d %d %d %d %d\n%d %d %d %d %d\n%d %d %d %d %d",

// memory_map[0], memory_map[1], memory_map[2], memory_map[3], memory_map[4],

// memory_map[5], memory_map[6], memory_map[7], memory_map[8], memory_map[9],

// memory_map[10], memory_map[11], memory_map[12], memory_map[13], memory_map[14],

// memory_map[15], memory_map[16], memory_map[17], memory_map[18], memory_map[19],

// memory_map[20], memory_map[21], memory_map[22], memory_map[23], memory_map[24]);

// end

endmodule

PolicyGenerator.v

module PolicyGenerator(clk, start, q_values, epsilon, current_action,next_action);

input clk; // Clock

input start;

input [15:0] epsilon;

input [63:0] q_values; // Q values for one state

output reg [3:0] current_action; // Current action (next_action delayed by one clock)

output [3:0] next_action; // Next action

wire [3:0] w_current_action; // Action chosen by ActionSelector

ActionSelector Act(.clk(clk),.start(start),.q_values(q_values), .epsilon(epsilon),


.action(w_current_action));

assign next_action=w_current_action;

//Delay Action

always@(posedge clk)

begin

current_action <= w_current_action;

end

endmodule

PolicyGenerator_tb.v

`timescale 1 ns/10 ps // time-unit = 1 ns, precision = 10 ps

module PolicyGenerator_tb();

reg clk = 1'b0; // Clock

reg [15:0] epsilon;

reg [63:0] q_values; // Q values for one state

wire [3:0] action;

PolicyGenerator policy_generator(.clk(clk), .q_values(q_values),


.epsilon(epsilon), .current_action(action));

//clock generator

always begin

#10 clk = ~clk; // Clock with a period of 20 time units

end

initial begin

q_values =
64'b0000000000001100000000000000000100000000000000100000000000000011;

epsilon = 16'b0000000011100000;

$display("epsilon = %f", $itor(epsilon)*2.0**-8.0);

#10;

q_values =
64'b0000000000001100000000000000000100000000000000100000000000000111;

epsilon = 16'b0000000011000000;

$display("epsilon = %f", $itor(epsilon)*2.0**-8.0);

#20;

q_values = 64'h0001100001000010;

$stop;

end

endmodule

QLearningAccelerator.v

// Q-Learning Accelerator

module QLearningAccelerator(clk, en, current_action, current_state, next_state,


current_reward, Q_out_action);

// Input and Output

input clk, en; // Control Signal

input[3:0] current_action; // Current Action

input [5:0] current_state; // Current State

input [5:0] next_state; //Next State

input signed [15:0] current_reward; // Current Reward

output reg[63:0] Q_out_action; // Q Values in Q Matrix of row equal to Current State

reg [15:0] Q_new; // Updated Q Value

//wiring Q new

wire [15:0] w_Q_new;

//wiring

wire[15:0] out_ram_1, out_delay_1,

out_ram_2, out_delay_2,

out_ram_3, out_delay_3,

out_ram_4, out_delay_4,

out_ram_5, out_delay_5,

out_ram_6, out_delay_6,

out_ram_7, out_delay_7,

out_ram_8, out_delay_8,

out_ram_9, out_delay_9,

out_ram_10, out_delay_10,

out_ram_11, out_delay_11,

out_ram_12, out_delay_12,

out_ram_13, out_delay_13,

out_ram_14, out_delay_14,

out_ram_15, out_delay_15;

//decoder to ram wire

wire en1,

en2,

en3,

en4,

en5,

en6,

en7,

en8,

en9,

en10,

en11,

en12,

en13,

en14,

en15;

//new q value wire

//reg [15:0] Q_new;

//wire output mux

wire[15:0] q_value_selected;

//wire q maximum

wire[15:0] q_max;

//module instantiation

ActionRAM ram_1(.clk(clk),

.en(en),

.wr_addr(current_state),

.rd_addr(next_state),

.write_en(en1),

.data_in(w_Q_new),

.data_out(out_ram_1));

ActionRAM ram_2(.clk(clk),

.en(en),

.wr_addr(current_state),

.rd_addr(next_state),

.write_en(en2),

.data_in(w_Q_new),

.data_out(out_ram_2));

ActionRAM ram_3(.clk(clk),

.en(en),

.wr_addr(current_state),

.rd_addr(next_state),

.write_en(en3),

.data_in(w_Q_new),

.data_out(out_ram_3));

ActionRAM ram_4(.clk(clk),

.en(en),

.wr_addr(current_state),

.rd_addr(next_state),

.write_en(en4),

.data_in(w_Q_new),

.data_out(out_ram_4));

ActionRAM ram_5(.clk(clk),

.en(en),

.wr_addr(current_state),

.rd_addr(next_state),

.write_en(en5),

.data_in(Q_new),

.data_out(out_ram_5));

ActionRAM ram_6(.clk(clk),

.en(en),

.wr_addr(current_state),

.rd_addr(next_state),

.write_en(en6),

.data_in(Q_new),

.data_out(out_ram_6));

ActionRAM ram_7(.clk(clk),

.en(en),

.wr_addr(current_state),

.rd_addr(next_state),

.write_en(en7),

.data_in(Q_new),

.data_out(out_ram_7));

ActionRAM ram_8(.clk(clk),

.en(en),

.wr_addr(current_state),

.rd_addr(next_state),

.write_en(en8),

.data_in(Q_new),

.data_out(out_ram_8));

ActionRAM ram_9(.clk(clk),

.en(en),

.wr_addr(current_state),

.rd_addr(next_state),

.write_en(en9),

.data_in(Q_new),

.data_out(out_ram_9));

ActionRAM ram_10(.clk(clk),

.en(en),

.wr_addr(current_state),

.rd_addr(next_state),

.write_en(en10),

.data_in(Q_new),

.data_out(out_ram_10));

ActionRAM ram_11(.clk(clk),

.en(en),

.wr_addr(current_state),

.rd_addr(next_state),

.write_en(en11),

.data_in(Q_new),

.data_out(out_ram_11));

ActionRAM ram_12(.clk(clk),

.en(en),

.wr_addr(current_state),

.rd_addr(next_state),

.write_en(en12),

.data_in(Q_new),

.data_out(out_ram_12));

ActionRAM ram_13(.clk(clk),

.en(en),

.wr_addr(current_state),

.rd_addr(next_state),

.write_en(en13),

.data_in(Q_new),

.data_out(out_ram_13));

ActionRAM ram_14(.clk(clk),

.en(en),

.wr_addr(current_state),

.rd_addr(next_state),

.write_en(en14),

.data_in(Q_new),

.data_out(out_ram_14));

ActionRAM ram_15(.clk(clk),

.en(en),

.wr_addr(current_state),

.rd_addr(next_state),

.write_en(en15),

.data_in(Q_new),

.data_out(out_ram_15));

//Delay

DelayActionRAM delay_1(.clk(clk),

.din(out_ram_1),

.dout(out_delay_1));

DelayActionRAM delay_2(.clk(clk),

.din(out_ram_2),

.dout(out_delay_2));

DelayActionRAM delay_3(.clk(clk),

.din(out_ram_3),

.dout(out_delay_3));

DelayActionRAM delay_4(.clk(clk),

.din(out_ram_4),

.dout(out_delay_4));

DelayActionRAM delay_5(.clk(clk),

.din(out_ram_5),

.dout(out_delay_5));

DelayActionRAM delay_6(.clk(clk),

.din(out_ram_6),

.dout(out_delay_6));

DelayActionRAM delay_7(.clk(clk),

.din(out_ram_7),

.dout(out_delay_7));

DelayActionRAM delay_8(.clk(clk),

.din(out_ram_8),

.dout(out_delay_8));

DelayActionRAM delay_9(.clk(clk),

.din(out_ram_9),

.dout(out_delay_9));

DelayActionRAM delay_10(.clk(clk),

.din(out_ram_10),

.dout(out_delay_10));

DelayActionRAM delay_11(.clk(clk),

.din(out_ram_11),

.dout(out_delay_11));

DelayActionRAM delay_12(.clk(clk),

.din(out_ram_12),

.dout(out_delay_12));

DelayActionRAM delay_13(.clk(clk),

.din(out_ram_13),

.dout(out_delay_13));

DelayActionRAM delay_14(.clk(clk),

.din(out_ram_14),

.dout(out_delay_14));

DelayActionRAM delay_15(.clk(clk),

.din(out_ram_15),

.dout(out_delay_15));

//multiplexer 16 to 1

Mux16to1 mux16to1(.sel(current_action),

.d1(out_delay_1),

.d2(out_delay_2),

.d3(out_delay_3),

.d4(out_delay_4),

.d5(out_delay_5),

.d6(out_delay_6),

.d7(out_delay_7),

.d8(out_delay_8),

.d9(out_delay_9),

.d10(out_delay_10),

.d11(out_delay_11),

.d12(out_delay_12),

.d13(out_delay_13),

.d14(out_delay_14),

.d15(out_delay_15),

.dout(q_value_selected));

//this is new

//Decoder

Decoder Decoder(.at(current_action),

.en1(en1),

.en2(en2),

.en3(en3),

.en4(en4),

.en5(en5),

.en6(en6),

.en7(en7),

.en8(en8),

.en9(en9),

.en10(en10),

.en11(en11),

.en12(en12),

.en13(en13),

.en14(en14),

.en15(en15));

//Max_Block

MaxBlock MaxBlock(.clk(clk),

.Q_Act1(out_ram_1),

.Q_Act2(out_ram_2),

.Q_Act3(out_ram_3),

.Q_Act4(out_ram_4),

.Q_Act5(out_ram_5),

.Q_Act6(out_ram_6),

.Q_Act7(out_ram_7),

.Q_Act8(out_ram_8),

.Q_Act9(out_ram_9),

.Q_Act10(out_ram_10),

.Q_Act11(out_ram_11),

.Q_Act12(out_ram_12),

.Q_Act13(out_ram_13),

.Q_Act14(out_ram_14),

.Q_Act15(out_ram_15),

.out(q_max));

//Q updater

QUpdater Qupdater(.old_Q(q_value_selected),

.max_Q(q_max),

.reward(current_reward),

.new_Q(w_Q_new));

always@(posedge clk) begin

Q_new <= w_Q_new;

Q_out_action[15:0] <= out_delay_1;

Q_out_action[31:16] <= out_delay_2;

Q_out_action[47:32] <= out_delay_3;

Q_out_action[63:48] <= out_delay_4;

end

endmodule
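As a cross-check for the datapath above, the following C sketch models what the `Mux16to1` and `MaxBlock` instances are wired to produce: the Q-value of the selected action and the largest of the 15 action-RAM outputs. The internals of `MaxBlock` are not shown in this listing, so the signed comparison is an assumption, and the names `q_select`, `q_max`, and `NUM_ACTIONS` are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_ACTIONS 15 /* action RAMs 1..15 in the listing */

/* Hypothetical model of Mux16to1: actions are numbered 1..15;
 * any other selector falls back to action 1, like the default branch. */
int16_t q_select(const int16_t q[NUM_ACTIONS], unsigned action)
{
    if (action < 1 || action > NUM_ACTIONS)
        action = 1;
    return q[action - 1];
}

/* Hypothetical model of MaxBlock: largest value, compared as signed
 * (assumption, the module body is not part of this listing). */
int16_t q_max(const int16_t q[], size_t n)
{
    assert(n > 0);
    int16_t best = q[0];
    for (size_t i = 1; i < n; i++) {
        if (q[i] > best)
            best = q[i];
    }
    return best;
}
```

In the accelerator the multiplexer reads the delayed RAM outputs while the max block reads the undelayed ones, so a software comparison would call the two helpers on different snapshots of the table.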

QLearningAccelerator_tb.v

//Testbench Q-Learning Accelerator

`timescale 1 ns/10 ps // time-unit = 1 ns, precision = 10 ps

module QLearningAccelerator_tb();

//input Declaration

reg clk = 1'b0;

reg en = 1'b0;

reg [3:0] current_action = 4'd0;

reg [5:0] current_state;

reg [5:0] next_state;

reg signed [15:0] current_reward;

// reg [3:0] gamma;

// reg [3:0] alpha;

reg [16:0] action_counter;

wire [15:0] result;

wire [63:0] Q_out_action;

//port mapping

QLearningAccelerator DUT(.clk(clk),

.en(en),

.current_action(current_action),

.current_state(current_state),

.next_state(next_state),

.current_reward(current_reward),

.Q_out_action(Q_out_action));

//clock generator

always begin

#10 clk = ~clk; //Clock with a period of 20 time units

end

//test case

// always@(*) begin

// if (next_state == 6'd25)

// reward = 16'd100;

// else if (action_counter == 16'd15)

// reward = 16'hFFCE;// -50

// else if (next_state == 6'd3)

// reward = 16'hFF9C; //-100

// else if (next_state == 6'd4)

// reward = 16'hFF9C; //-100

// else if (next_state == 6'd7)

// reward = 16'hFF9C; //-100

// else if (next_state == 6'd13)

// reward = 16'hFF9C; //-100

// else if (next_state == 6'd14)

// reward = 16'hFF9C; //-100

// else if (next_state == 6'd17)

// reward = 16'hFF9C; //-100

// else if (next_state == 6'd19)

// reward = 16'hFF9C; //-100

// else if (next_state == 6'd22)

// reward = 16'hFF9C; //-100

// else

// reward = 16'h0; // 0

// end

initial begin

#10;

current_reward = 16'b00000111_00000000; //7

en = 1'b1;

current_state = 6'b000001; //1

next_state = 6'b000010; //2

//current_action = 4'd1;

// alpha = 4'b1000;

// gamma = 4'b1110;

#20;

current_state = 6'b000010;

next_state = 6'b000011;

//current_action = 4'd1;

#20;

current_state = 6'd3;

next_state = 6'd8;

current_action = 4'd1;

#20;

current_state = 6'd8;

next_state = 6'd3;

current_action = 4'd4;

end

//display monitor

initial begin

$monitor("time = %2d\n result = %2d",

$time , result);

end

endmodule

QLearningAgent.v

//Module Q learning Agent

module QLearningAgent (clk, en, start, next_reward, next_state,epsilon,next_action);

input clk;

input en;

input start;

input[15:0] next_reward; // Reward

input[5:0] next_state; // Next State

input[15:0] epsilon;

output [3:0] next_action;

wire [63:0] w_q_values;

//wire w_next_reward;

//wire w_next_state;

wire [15:0] w_curr_reward;

wire [5:0] w_curr_state;

wire [3:0] w_curr_action;

QLearningAccelerator QLearningAccelerator(.clk(clk),

.en(en),

.current_action(w_curr_action),

.current_state(w_curr_state),

.next_state(next_state),

.current_reward(w_curr_reward),

.Q_out_action(w_q_values));

PolicyGenerator PolicyGenerator(.clk(clk),

.start(start),

.current_action(w_curr_action),

.epsilon(epsilon),

.q_values(w_q_values),

.next_action(next_action));

DelayReward DelayReward(.clk(clk),

.din(next_reward),

.dout(w_curr_reward));

DelayState DelayState(.clk(clk),

.din(next_state),

.dout(w_curr_state));

endmodule

QLearningAgent_tb.v

//Testbench Q-Learning Agent

`timescale 1 ns/10 ps // time-unit = 1 ns, precision = 10 ps

module QLearningAgent_tb();

//input Declaration

reg clk = 1'b0;

reg en = 1'b0;

reg [5:0] next_state;

reg signed [15:0] next_reward;

reg signed [15:0] next_reward_temp;

// reg [3:0] gamma;

// reg [3:0] alpha;

//reg [16:0] action_counter;

//wire [15:0] result;

// wire [63:0] Q_out_action;

//port mapping

QLearningAgent DUT(.clk(clk),

.en(en),

.next_state(next_state),

.next_reward(next_reward));

reg[3:0] memory_map[0:24];

//read initial memory access

initial begin

$readmemh ("memory_map.list", memory_map);

end

//clock generator

always begin

#10 clk = ~clk; //Clock with a period of 20 time units

end

//test case

always@(*) begin

if (next_state == 6'd25)

next_reward = 16'h6400; //100 in Q8.8, matching the -100 encoding (16'h9C00) below

else if (next_state == 6'd3)

next_reward = 16'h9C00; //-100

else if (next_state == 6'd4)

next_reward = 16'h9C00; //-100

else if (next_state == 6'd7)

next_reward = 16'h9C00; //-100

else if (next_state == 6'd13)

next_reward = 16'h9C00; //-100

else if (next_state == 6'd14)

next_reward = 16'h9C00; //-100

else if (next_state == 6'd17)

next_reward = 16'h9C00; //-100

else if (next_state == 6'd19)

next_reward = 16'h9C00; //-100

else if (next_state == 6'd22)

next_reward = 16'h9C00; //-100

else

next_reward = 16'h0; // 0

end

initial begin

#10;

//next_reward = 16'b00000111_00000000; //7

en = 1'b1;

memory_map[0] = 1;

next_state = 6'b000001; //1

//current_action = 4'd1;

// alpha = 4'b1000;

// gamma = 4'b1110;

#20;

next_state = 6'b000010; //2

memory_map[1] = 1;

//next_reward = 16'b00000101_00000000; //5

//current_action = 4'd1;

#20;

next_state = 6'b000111; //7

memory_map[2] = 1;

//next_reward = 16'b00000101_00000000; //5

#20;

next_state = 6'b001000; //8

memory_map[7] = 1;

#20;

memory_map[8] = 1;

//next_reward = 16'b00000101_00000000; //5

end

//display monitor

initial begin

$monitor("%d %d %d %d %d\n%d %d %d %d %d\n%d %d %d %d %d\n%d %d %d %d %d\n%d %d %d %d %d",

memory_map[0], memory_map[1], memory_map[2], memory_map[3], memory_map[4],

memory_map[5], memory_map[6], memory_map[7], memory_map[8], memory_map[9],

memory_map[10], memory_map[11], memory_map[12], memory_map[13], memory_map[14],

memory_map[15], memory_map[16], memory_map[17], memory_map[18], memory_map[19],

memory_map[20], memory_map[21], memory_map[22], memory_map[23], memory_map[24]);

end
endmodule

QUpdater.v

module QUpdater (old_Q, max_Q, reward, new_Q);

input [15:0] old_Q;

input [15:0] max_Q;

input [15:0] reward; // Current Reward

output [15:0] new_Q; // Updated Q value

wire [15:0] max_i;

wire [15:0] max_j;

wire [15:0] max_k;

wire [15:0] combined_Q;

wire [15:0] final_Q;

BarrelShifter barrel_max_i(max_Q, 4'd1, max_i);

BarrelShifter barrel_max_j(max_Q, 4'd2, max_j);

BarrelShifter barrel_max_k(max_Q, 4'd3, max_k);

BarrelShifter barrel_alpha(combined_Q, 4'd1, final_Q);

assign combined_Q = reward + max_i + max_j + max_k - old_Q;

assign new_Q = final_Q + old_Q;

endmodule
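The shift-and-add structure of `QUpdater` corresponds to the Q-learning update new_Q = old_Q + alpha * (reward + gamma * max_Q - old_Q), with gamma = 1/2 + 1/4 + 1/8 = 0.875 and alpha = 1/2. The C sketch below is a software reference model under two assumptions: that `BarrelShifter` (not shown in this listing) performs an arithmetic right shift, and that all values are signed Q8.8. The helper names `asr`, `wrap16`, and `q_update` are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Arithmetic right shift that does not rely on implementation-defined
 * behaviour for negative operands. */
static int32_t asr(int32_t v, unsigned n)
{
    assert(n < 31);
    return v >= 0 ? (v >> n) : ~((~v) >> n);
}

/* Wrap a 32-bit intermediate back to 16 bits, as 16-bit adders do. */
static int16_t wrap16(int32_t v)
{
    uint16_t u = (uint16_t)v;
    return (int16_t)(u >= 0x8000u ? (int32_t)u - 0x10000 : (int32_t)u);
}

/* Reference model of QUpdater; all arguments are signed Q8.8. */
int16_t q_update(int16_t old_q, int16_t max_q, int16_t reward)
{
    /* gamma * max_Q with gamma = 1/2 + 1/4 + 1/8 */
    int32_t discounted = asr(max_q, 1) + asr(max_q, 2) + asr(max_q, 3);
    int16_t combined = wrap16(reward + discounted - old_q);
    /* alpha = 1/2 */
    int16_t scaled = (int16_t)asr(combined, 1);
    return wrap16(scaled + old_q);
}
```

With the stimulus used in `QUpdater_tb.v` (old_Q = 2, max_Q = 2, reward = 7), the model returns 0x0560, which is 5.375 in Q8.8.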

QUpdater_tb.v

module QUpdater_tb();

reg signed [15:0] old_Q;

reg signed [15:0] max_Q;

reg [3:0] gamma;

reg [15:0] current_reward;

reg [3:0] alpha;

wire [15:0] new_Q;

localparam sf = 2.0**-8.0; // Q8.8 scaling factor is 2^-8

QUpdater q_updater(old_Q, max_Q, current_reward, new_Q);

initial begin

old_Q = 16'b00000010_00000000; // 2

max_Q = 16'b00000010_00000000; // 2

current_reward = 16'b00000111_00000000; //7

#10;

$display("new_Q = %f", $itor(new_Q)*sf);

$stop;

end

endmodule

Randomizer.v

module Randomizer(ic,start,clk,q); // main module for lfsr

input [15:0]ic;

input start, clk;

output [15:0]q;

wire s;

wire [15:0]lfs;

assign s=lfs[15]^lfs[10]^lfs[9]^lfs[5];

dff dff1(lfs[15],start?ic[15]:s,clk);

dff dff2(lfs[14],start?ic[14]:lfs[15],clk);

dff dff3(lfs[13],start?ic[13]:lfs[14],clk);

dff dff4(lfs[12],start?ic[12]:lfs[13],clk);

dff dff5(lfs[11],start?ic[11]:lfs[12],clk);

dff dff6(lfs[10],start?ic[10]:lfs[11],clk);

dff dff7(lfs[9],start?ic[9]:lfs[10],clk);

dff dff8(lfs[8],start?ic[8]:lfs[9],clk);

dff dff9(lfs[7],start?ic[7]:lfs[8],clk);

dff dff10(lfs[6],start?ic[6]:lfs[7],clk);

dff dff11(lfs[5],start?ic[5]:lfs[6],clk);

dff dff12(lfs[4],start?ic[4]:lfs[5],clk);

dff dff13(lfs[3],start?ic[3]:lfs[4],clk);

dff dff14(lfs[2],start?ic[2]:lfs[3],clk);

dff dff15(lfs[1],start?ic[1]:lfs[2],clk);

dff dff16(lfs[0],start?ic[0]:lfs[1],clk);

assign q = {8'b00000000, lfs[7:0]};

endmodule

module dff (Q, D, Clock); //sub module- d flip flop

output Q;

input D;

input Clock;

reg Q;

always @(posedge Clock)

begin

Q <= D;

end

endmodule
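`Randomizer` is a 16-bit linear-feedback shift register: the feedback bit is the XOR of bits 15, 10, 9, and 5, it is shifted in at bit 15 while the other bits move one position toward bit 0, and only the low 8 bits appear on `q`. A minimal C model of one clock step with `start` deasserted, which could be used to predict the values printed by `tb_random.v`, might look like this (the function names are illustrative):

```c
#include <stdint.h>

/* One clock step of the LFSR with start deasserted. */
uint16_t lfsr_step(uint16_t lfs)
{
    uint16_t s = (uint16_t)(((lfs >> 15) ^ (lfs >> 10) ^
                             (lfs >> 9) ^ (lfs >> 5)) & 1u);
    return (uint16_t)((lfs >> 1) | (uint16_t)(s << 15));
}

/* Value driven on q: upper byte forced to zero, lower byte from the register. */
uint16_t lfsr_output(uint16_t lfs)
{
    return (uint16_t)(lfs & 0x00FFu);
}
```

Starting from the seed `16'b0100001001000010` (0x4242) used in `tb_random.v`, the first two steps of this model give 0xA121 and then 0x5090.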

RewardGenerator.v

module RewardGenerator (next_state, next_reward);

input [5:0] next_state;

output reg signed [15:0] next_reward;

always @(*) begin

case (next_state)

6'd25 : next_reward = 16'd100; //100

6'd3 : next_reward = 16'hFF9C; //-100

6'd4 : next_reward = 16'hFF9C; //-100

6'd7 : next_reward = 16'hFF9C; //-100

6'd13 : next_reward = 16'hFF9C; //-100

6'd14 : next_reward = 16'hFF9C; //-100

6'd17 : next_reward = 16'hFF9C; //-100

6'd19 : next_reward = 16'hFF9C; //-100

6'd22 : next_reward = 16'hFF9C; //-100

default : next_reward = 16'h0; //0

endcase

end

endmodule
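The listings encode the same reward in two ways. `RewardGenerator` uses `16'hFF9C`, which is -100 as a plain 16-bit two's-complement integer, while `QLearningAgent_tb.v` and the C program use `9C00` hexadecimal, which is -100 in signed Q8.8 (8 integer bits and 8 fractional bits, the scaling that `QUpdater_tb.v` applies with `2.0**-8.0`). The C sketch below converts between real values and Q8.8 so the constants can be checked; the function names are illustrative and not part of the original design.

```c
#include <stdint.h>

/* Encode a real value as signed Q8.8, rounding to nearest.
 * Assumes -128.0 <= x < 128.0. */
uint16_t q88_encode(double x)
{
    double scaled = x * 256.0;
    int32_t rounded = (int32_t)(scaled >= 0.0 ? scaled + 0.5 : scaled - 0.5);
    return (uint16_t)rounded;
}

/* Decode a 16-bit pattern interpreted as signed Q8.8. */
double q88_decode(uint16_t raw)
{
    int32_t value = raw >= 0x8000u ? (int32_t)raw - 0x10000 : (int32_t)raw;
    return value / 256.0;
}
```

Read as Q8.8, the pattern `16'hFF9C` decodes to about -0.39 rather than -100, so a design would need to use one convention throughout.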

StateSelector.v

module StateSelector (

input [3:0] next_action,

input [5:0] current_state,

output reg [5:0] next_state);

always @(*) begin

case(next_action)

//Move right

4'b0000: begin

if(current_state%5 != 0)

begin

next_state=current_state+1;

end

else

begin

next_state=current_state;

end

end

//Move up

4'b0001: begin

if(current_state > 5)

begin

next_state=current_state-5;

end

else

begin

next_state=current_state;

end

end

//Move left

4'b0010: begin

if(current_state%5 != 1)

begin

next_state=current_state-1;

end

else

begin

next_state=current_state;

end

end

//Move down

4'b0011: begin

if(current_state < 21) //states 21..25 form the bottom row

begin

next_state=current_state+5;

end

else

begin

next_state=current_state;

end

end

default: begin next_state=current_state; end

endcase

end

endmodule
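`StateSelector` treats the 25 states as a 5x5 grid numbered 1 to 25 row by row, with actions 0 to 3 meaning right, up, left, and down; a move that would leave the grid keeps the current state. The following C sketch restates that transition rule. It assumes the move-down guard is intended to block moves out of the bottom row (states 21 to 25); the enum and function names are illustrative.

```c
#include <stdint.h>

enum { MOVE_RIGHT = 0, MOVE_UP = 1, MOVE_LEFT = 2, MOVE_DOWN = 3 };

/* Next state on a 5x5 grid with states numbered 1..25, row by row. */
uint8_t next_state(uint8_t state, uint8_t action)
{
    switch (action) {
    case MOVE_RIGHT: /* blocked in the right-hand column (5, 10, ..., 25) */
        return (state % 5 != 0) ? (uint8_t)(state + 1) : state;
    case MOVE_UP:    /* blocked in the top row (1..5) */
        return (state > 5) ? (uint8_t)(state - 5) : state;
    case MOVE_LEFT:  /* blocked in the left-hand column (1, 6, ..., 21) */
        return (state % 5 != 1) ? (uint8_t)(state - 1) : state;
    case MOVE_DOWN:  /* blocked in the bottom row (21..25), assumed intent */
        return (state < 21) ? (uint8_t)(state + 5) : state;
    default:         /* any other action code leaves the state unchanged */
        return state;
    }
}
```

For the state used in `StateSelector_tb.v` (state 8), this model gives 3 for up, 9 for right, 7 for left, and 13 for down.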

StateSelector_tb.v

module PenentuState_Block_tb;

reg [5:0] current_st;

reg [3:0] action;

wire [5:0] next_st;

StateSelector DUT(.current_state(current_st), .next_action(action), .next_state(next_st));

initial begin

current_st = 6'b001000;

action = 4'b0001;

#100;

current_st = 6'b001000;

action = 4'b0000;

#100;

current_st = 6'b001000;

action = 4'b0010;

#100;

current_st = 6'b001000;

action = 4'b0011;

#100;

$stop;

end

endmodule

Multiplexer.v

//multiplexer 16 to 1

module Mux16to1(sel, d0, d1,d2,d3,d4,d5,d6,d7,d8,d9,d10,d11,d12,d13,d14,d15, dout);

input[3:0] sel;

input[15:0] d0, d1,d2,d3,d4,d5,d6,d7,d8,d9,d10,d11,d12,d13,d14,d15;

output reg [15:0] dout;

always@(*) begin

case (sel)

4'd1 : dout = d1;

4'd2 : dout = d2;

4'd3 : dout = d3;

4'd4 : dout = d4;

4'd5 : dout = d5;

4'd6 : dout = d6;

4'd7 : dout = d7;

4'd8 : dout = d8;

4'd9 : dout = d9;

4'd10 : dout = d10;

4'd11 : dout = d11;

4'd12 : dout = d12;

4'd13 : dout = d13;

4'd14 : dout = d14;

4'd15 : dout = d15;

default : dout = d1;

endcase

end

// assign dout = (sel == 4'd0) ? d0 :

// (sel == 4'd1) ? d1 :

// (sel == 4'd2) ? d2 :

// (sel == 4'd3) ? d3 :

// (sel == 4'd4) ? d4 :

// (sel == 4'd5) ? d5 :

// (sel == 4'd6) ? d6 :

// (sel == 4'd7) ? d7 :

// (sel == 4'd8) ? d8 :

// (sel == 4'd9) ? d9 :

// (sel == 4'd10) ? d10 :

// (sel == 4'd11) ? d11 :

// (sel == 4'd12) ? d12 :

// (sel == 4'd13) ? d13 :

// d14;

endmodule

Multiplexer_tb.v

//Testbench Mux16to1

`timescale 1 ns/10 ps // time-unit = 1 ns, precision = 10 ps

module Mux16to1_tb();

//input Declaration

reg clk = 1'b0;

reg en = 1'b0;

reg[3:0] sel;

reg[15:0] d0, d1,d2,d3,d4,d5,d6,d7,d8,d9,d10,d11,d12,d13,d14,d15;

wire[15:0] dout;

//port mapping

Mux16to1 DUT( .sel(sel),

.d1(d1),

.d2(d2),

.d3(d3),

.d4(d4),

.d5(d5),

.d6(d6),

.d7(d7),

.d8(d8),

.d9(d9),

.d10(d10),

.d11(d11),

.d12(d12),

.d13(d13),

.d14(d14),

.d15(d15),

.dout(dout)

);

//clock generator

always begin

#10 clk = ~clk; //Clock with a period of 20 time units

end

//test case

initial begin

#10;

d0 = 4'd0;

d1 = 4'd1;

d2 = 4'd2;

d3 = 4'd3;

d4 = 4'd4;

d5 = 4'd5;

d6 = 4'd6;

d7 = 4'd7;

d8 = 4'd8;

d9 = 4'd9;

d10 = 4'd10;

d11 = 4'd11;

d12 = 4'd12;

d13 = 4'd13;

d14 = 4'd14;

d15 = 4'd15;

sel = 4'd0;

#20;

sel = 4'd1;

#20;

sel = 4'd2;

#20;

sel = 4'd3;

#20;

sel = 4'd4;

#20;

sel = 4'd5;

#20;

sel = 4'd6;

#20;

sel = 4'd7;

#20;

sel = 4'd8;

end

//display monitor

initial begin

$monitor("time = %2d\n dout = %2d",

$time , dout );

end

endmodule

Mux.v

module mux (sel, op1, op2, result);

input sel;

input [15:0] op1;

input [15:0] op2;

output [15:0] result;

assign result = (sel == 1'b0) ? op1 : op2;

endmodule

tb_action.v

module tb_action;

reg clk = 1'b1; // Clock

reg start = 1'b1; // Start signal, Active Low

reg [63:0] q_values;

reg [15:0] epsilon;

wire [3:0] action;

ActionSelector action_selector(clk, start, q_values, epsilon, action);

always begin

#10 clk = ~clk; //Clock with a period of 20 time units

end

initial begin

#10;

start = 1'b0;

q_values = 64'b0000000000001100000000000000000100000000000000100000000000000011;

epsilon = 16'b0000000011100000;

$display("Random = %f", $itor(epsilon)*2.0**-8.0);

#10;

q_values = 64'b0000000000001100000000000000000100000000000000100000000000000111;

epsilon = 16'b0000000011000000;

$display("Random = %f", $itor(epsilon)*2.0**-8.0);

#10;

$stop;

end

endmodule

tb_random.v

module tb_random;

reg [15:0] seed;

reg clk = 1'b1; // Clock

reg start = 1'b1; // Start signal, Active Low

wire [15:0] out;

always begin

#10 clk = ~clk; //Clock with a period of 20 time units

if (clk) begin

$displayb(out);

$display("Random Number = %f", $itor(out)*2.0**-8.0);

end

end

Randomizer randomizer(seed, start, clk, out);

initial begin

seed = 16'b0100001001000010;

#10;

start = 1'b0;

#1000;

$stop;

end

endmodule

axi_control_tb.v

`timescale 1ns / 1ps


module axi_control_tb();
localparam T = 8;
// *** Multiplier ***
reg clk = 0;
reg rst_n = 0;

wire [15:0] new_Q;


wire [3:0] current_action;

// *** AXI control ***


wire s_axi_arready;
reg [31:0] s_axi_araddr;
reg s_axi_arvalid;
wire s_axi_awready;

reg [31:0] s_axi_awaddr;


reg s_axi_awvalid;
reg s_axi_bready;
wire [1:0] s_axi_bresp;
wire s_axi_bvalid;
reg s_axi_rready;
wire [31:0] s_axi_rdata;
wire [1:0] s_axi_rresp;
wire s_axi_rvalid;
wire s_axi_wready;
reg [31:0] s_axi_wdata;
reg [3:0] s_axi_wstrb;
reg s_axi_wvalid;
integer i;

ActionRAM_wrapper uut ( .clk(clk),


.rst_n(rst_n),
.new_Q(new_Q),
.current_action(current_action),
.s_axi_araddr(s_axi_araddr),
.s_axi_arready(s_axi_arready),
.s_axi_arvalid(s_axi_arvalid),
.s_axi_awaddr(s_axi_awaddr),
.s_axi_awready(s_axi_awready),
.s_axi_awvalid(s_axi_awvalid),
.s_axi_bready(s_axi_bready),
.s_axi_bresp(s_axi_bresp),

.s_axi_bvalid(s_axi_bvalid),
.s_axi_rdata(s_axi_rdata),
.s_axi_rready(s_axi_rready),
.s_axi_rresp(s_axi_rresp),
.s_axi_rvalid(s_axi_rvalid),
.s_axi_wdata(s_axi_wdata),
.s_axi_wready(s_axi_wready),
.s_axi_wstrb(s_axi_wstrb),
.s_axi_wvalid(s_axi_wvalid) );

always begin
// *** Clock ***
clk= ~clk;
#(T/2);
end

initial begin
// *** Init ***
// axi_control_write(4'h8, 17'h000); //write epsilon and start
s_axi_awaddr = 0;
s_axi_awvalid = 0;
s_axi_wstrb = 0;
s_axi_wdata = 0;
s_axi_wvalid = 0;
s_axi_bready = 1;
s_axi_araddr = 0;
s_axi_arvalid = 0;
s_axi_rready = 1;

// *** Reset ***


rst_n = 0;
#(T*5);
rst_n = 1;
#(T*5);

// *** Configuration and start ***


axi_control_write(8'h0, 4'd1); //write next_state
axi_control_write(8'h4, 16'd0); //write next_reward
axi_control_write(8'h8, 17'h0c00); //write epsilon
axi_control_write(8'hc, 1'b0); //start
// Wait until process is done
#(T*50);
axi_control_read(8'h10); // 4'h10 would truncate to address 0

#(T*10);

// ### 2 ###
// *** Configuration and start ***
axi_control_write(8'h0, 4'd2); //write next_state
axi_control_write(8'h4, 16'hFF9C); //write next_reward
// axi_control_write(4'h8, 17'h0f000); //write epsilon and start
// Wait until process is done
#(T*50);
axi_control_read(8'h10);
#(T*10);
// *** Configuration and start ***
axi_control_write(8'h0, 4'd3); //write next_state
axi_control_write(8'h4, 16'hFF9C); //write next_reward
// Wait until process is done
#(T*50);
axi_control_read(8'h10);

// *** Read output ***

#(T*10);
// ### 4 ### // *** Configuration and start ***
axi_control_write(8'h0, 4'd4); //write next_state
axi_control_write(8'h4, 16'h0); //write next_reward
// Wait until process is done
#(T*50);
axi_control_read(8'h10);
end

task axi_control_write;
input [31:0] awaddr;
input [31:0] wdata;

begin // *** Write address ***


s_axi_awaddr = awaddr;
s_axi_awvalid = 1;
#T;
s_axi_awvalid = 0;
// *** Write data ***
s_axi_wdata = wdata;
s_axi_wstrb = 4'hf;
s_axi_wvalid = 1;
#T;
s_axi_wvalid = 0;
#T;
end
endtask

task axi_control_read;
input [31:0] araddr;
begin
// *** Read address ***
s_axi_araddr = araddr;
s_axi_arvalid = 1;
#T;
s_axi_arvalid = 0;
#T;
end
endtask
endmodule

Maze solver with AXI control

#include <stdio.h>
#include <math.h>
#include "platform.h"
#include "xil_printf.h"
#include <stdlib.h>
#include <stdint.h> // uint32_t, uint8_t

#define CTRL_BASE 0X4000000

int main()
{
//REGISTERS
volatile uint32_t *ctrl_p; // volatile: accesses to memory-mapped registers must not be optimized away
init_platform();
//Initialize pointer

ctrl_p = (uint32_t *) CTRL_BASE;

/* ctrl_p : next_state
* ctrl_p+1 : next_reward
* ctrl_p+2 : epsilon
* ctrl_p+3 : start
* ctrl_p+4 : next_action (input)
*/

int episode,new_action, t, reward;


double epsilon;

int state_i = 0, state_j = 0; // start in the top-left cell; previously read uninitialized


uint8_t state = 0;
int new_state = 0;
*(ctrl_p+3) = 1;
*(ctrl_p+3) = 0;

for (episode = 0; episode < 500; episode++)


{

epsilon = (double)(1.00 - (double)(episode+1)/501.00);

*(ctrl_p+2) = round(epsilon*pow(2,16));
// printf("\nepisode : %d epsilon : %d", episode, *(ctrl_p+2));
//writing start
*(ctrl_p+3) = 0;

for (t = 0; t < 15; t++)


{
if (state == 24)
{
break;
}
if (new_state == 24)
{
reward = 0x6400; //100
}
else if (t == 14)
{
reward = 0xf100;
}
else if (new_state == 4)
{
reward = 0x9C00;
}
else if (new_state == 6)
{
reward = 0x9C00;
}
else if (new_state == 7)
{
reward = 0x9C00;
}
else if (new_state == 13)
{
reward = 0x9C00;
}
else if (new_state == 16)

{
reward = 0x9C00;
}
else if (new_state == 18)
{
reward = 0x9C00;
}
else if (new_state == 19)
{
reward = 0x9C00;
}
else if (new_state == 21)
{
reward = 0x9C00;
}
else
{
reward = 0;
}
//read new action
new_action = (uint32_t)*(ctrl_p+4);
printf("%d->", new_action);

// write next reward


*(ctrl_p+1) = reward;

switch(new_action)
{
case 1:
state_j += 1; //down
break;
case 2:
state_i -= 1; //up
break;
case 3:
state_j -= 1; //left
break;
case 4:
state_i += 1; //right
break;
}

if (state_i < 0)
{
state_i = 0;
}
else if (state_j < 0)
{
state_j = 0;
}
else if (state_i > 4)
{
state_i = 4;
}
else if (state_j > 4)
{
state_j = 4;
}

// //writing next_state

new_state = state_i * 5 + state_j;
state = new_state;
*(ctrl_p + 0) = new_state;
// printf("\naction\n");

}
printf("\n");
}
cleanup_platform();
return 0;
}
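The register map in the comment near the top of `main` matches the byte addresses used by `axi_control_tb.v` (0x0, 0x4, 0x8, 0xC, and 0x10). Naming the offsets can make the pointer arithmetic easier to audit. The sketch below is one possible way to do that; the `REG_*` names and the `reg_write` and `reg_read` helpers are hypothetical and not part of the original program.

```c
#include <stdint.h>

/* Byte offsets of the AXI control registers, taken from the register
 * map comment in main(); the REG_* names are illustrative. */
enum {
    REG_NEXT_STATE  = 0x00,
    REG_NEXT_REWARD = 0x04,
    REG_EPSILON     = 0x08,
    REG_START       = 0x0C,
    REG_NEXT_ACTION = 0x10  /* read-only from the software side */
};

/* Write one 32-bit register at the given byte offset from base. */
static inline void reg_write(volatile uint32_t *base, uint32_t offset,
                             uint32_t value)
{
    base[offset / 4u] = value;
}

/* Read one 32-bit register at the given byte offset from base. */
static inline uint32_t reg_read(volatile uint32_t *base, uint32_t offset)
{
    return base[offset / 4u];
}
```

On the board, `base` would be the pointer derived from `CTRL_BASE`; in a host-side test it can point at an ordinary array of five words.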
