
PROPOSAL PRESENTATION

UNIVERSITY OF MAIDUGURI

FACULTY OF ENGINEERING

DEPARTMENT OF COMPUTER ENGINEERING


TOPIC: VLSI DESIGN AND IMPLEMENTATION OF A MATRIX PROCESSOR UNIT
BY
KUBURAH MOHAMMED

PGA/18/05/07/08806

SUPERVISORS
DR. S. J. BASSI

DR. P.Y. DIBAL

DATE:
Presentation Outline

• Background of the Study
• Related Works
• Problem Statement
• Aim and Objectives
• Scope
• Methodology
• Preliminary Results
• Conclusion
Background of the Study
• Matrix operations are among the most fundamental mathematical tools used
for the characterization and analysis of observations in experiments and
design (Naser, S. M., 2021; Al-Neama, M. W., 2014; Stroud, K. A., & Booth, 2007).
• At the frontier of physics, matrix operations are playing critical roles in
advancing the knowledge of quantum mechanics (Betzios et al., 2018; Scott,
T. C., et al., 2017).
• In critical fields like computing, signal processing, and communication
engineering, matrix operations make it possible to design the digital filters
that enable major improvements and innovations (Burrus, C. S., 2019;
Wang, X., & Serpedin, E., 2016).
• Engineers have made quite a number of efforts to design processors which
perform matrix operations in hardware and in real time (Wang, Li, Han,
Feng, & Lin, 2016; Jain et al., 2014).
Background cont’d
• Processors which perform matrix computations are called MATRIX
PROCESSORS. They take in a matrix, perform the desired operation, and
produce a result.
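The take-a-matrix, perform-an-operation, return-a-result flow described above can be sketched in software. The following is a minimal illustrative model in Python, not the proposed hardware; the operation names and dispatch scheme are our own assumptions.

```python
# Illustrative software model of a matrix processor: it receives a matrix
# and an operation name, performs the operation, and returns the result.

def matrix_transpose(a):
    # Transpose a matrix given as a list of rows.
    return [list(col) for col in zip(*a)]

def matrix_scale(a, k):
    # Multiply every element of the matrix by the scalar k.
    return [[k * x for x in row] for row in a]

def matrix_process(matrix, operation, *args):
    # Dispatch the matrix to the requested operation, mirroring the
    # take-in / operate / produce-result flow.
    ops = {"transpose": matrix_transpose, "scale": matrix_scale}
    return ops[operation](matrix, *args)

print(matrix_process([[1, 2], [3, 4]], "transpose"))  # [[1, 3], [2, 4]]
print(matrix_process([[1, 2]], "scale", 3))           # [[3, 6]]
```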
Related Works Table

S/N  REFERENCE                 LATENCY  RESOURCE     ACCURACY           POWER        HIGH LEVEL
                                        UTILIZATION                     CONSUMPTION  SYNTHESIS
1    Chetan, S. et al., 2020   2.0 µs   36,520       65.43% and 79.41%  Not clear    X
2    Rusul, S. et al., 2020    2.7 µs   40,051       18% and less                    X
                                                     than 22%
3    Ting, T. et al., 2013     1 µs     High         3.41%              √            X
4    Proposed work             √        √            √                  √            √
Related Works Table cont’d

• High power consumption is due to high latency; high latency, power consumption, and resource
utilization occur as a result of suboptimal input and output designs.
Problem statement
• Several authors have designed matrix processors, but the memory management, which determines power consumption, was
suboptimal in their designs (Zhu, R., Liu, B., Niu, D., Li, Z., & Zhao, H. V., 2017).

• Matrix processors have been designed by several authors to perform a variety of computations, but a design targeting an
FPGA and powered by floating-point computations, which is envisaged to yield higher accuracy, is yet to be considered
(Yu, Q., Maddah-Ali, M. A., & Avestimehr, A. S., 2020).

• Matrix processors capable of performing floating-point computations and based on field programmable gate arrays
(FPGAs) remain an open area of research, as they are quite challenging to develop.

• Several authors have designed matrix processors under different constraints like memory and speed, but their designs
showed high resource utilization because of the techniques they used (Weeks, M., & Bayoumi, M., 2003).

• In FPGA-based designs, several authors used either VHDL or Verilog. However, VHDL designs require extensive coding,
which is error prone; our work takes the abstraction higher to the high-level synthesis (HLS) level, which gives a better
level of abstraction.

• This work aims to fill this gap by developing a matrix processor using an FPGA as the target platform and the IEEE-754
single-precision floating-point representation of numbers.
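IEEE-754 single precision (binary32) packs a value into 1 sign bit, 8 exponent bits (bias 127), and 23 fraction bits. The sketch below decodes those fields using only the Python standard library; the helper name is our own.

```python
# Decode the three fields of an IEEE-754 binary32 encoding.
import struct

def float_to_fields(x):
    # Pack the value as big-endian binary32, then slice the 32 bits into
    # sign (1 bit), biased exponent (8 bits), and fraction (23 bits).
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    return sign, exponent, fraction

# 1.0 = +1 x 2^0 x 1.0, so the biased exponent is 0 + 127 = 127.
print(float_to_fields(1.0))   # (0, 127, 0)
# -2.0 = -1 x 2^1 x 1.0, so the biased exponent is 1 + 127 = 128.
print(float_to_fields(-2.0))  # (1, 128, 0)
```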


Aim and Objectives
The aim of this work is to develop a matrix co-processor unit based on the IEEE-754 single-precision floating-
point representation of numbers using an FPGA.

To achieve this aim, the following objectives are proposed:

i. To design a floating-point arithmetic and logic unit that performs the four fundamental mathematical
operations of floating-point addition, subtraction, multiplication, and division; these operations will play
critical roles in the performance of the processor.

ii. To design an architecture based on (i) for each of the matrix operations the processor will perform.

iii. To use VHDL and HLS to perform hardware description for each of the matrix operations based on the
architecture in (ii) and integrate them into a co-processor.

iv. To implement and evaluate the system performance.
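Objective (i) can be given a behavioural reference model in software: the four fundamental operations with inputs and results rounded to binary32. This models only the precision of the proposed ALU, not its hardware architecture; the function names are our own.

```python
# Behavioural reference for a binary32 ALU, standard library only.
import struct

def to_f32(x):
    # Round a Python float (binary64) to the nearest binary32 value by
    # packing and unpacking it as a 32-bit IEEE-754 float.
    return struct.unpack(">f", struct.pack(">f", x))[0]

def fp32_alu(a, b, op):
    # Round the operands first, as hardware would hold them in 32 bits.
    a32, b32 = to_f32(a), to_f32(b)
    # All four results are computed eagerly here, so b must be nonzero.
    results = {"add": a32 + b32, "sub": a32 - b32,
               "mul": a32 * b32, "div": a32 / b32}
    return to_f32(results[op])

print(fp32_alu(1.5, 0.5, "add"))  # 2.0 (exactly representable)
print(fp32_alu(1.0, 3.0, "div"))  # 1/3 rounded to binary32 precision
```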


Scope of the Study

• The scope of this study is confined to the development of a matrix processing unit based on the IEEE-754
single-precision floating-point representation of numbers, with VHDL as the language for coding and hardware
description. The matrix operations performed by the processor will be small-to-medium range operations,
which will be clearly defined in the methodology of this work.


Significance of the Study

• The significance of this study is that an enhancement to the speed of microprocessors will be achieved by removing the
responsibility of matrix computation from the core processor so that it can focus on other critical areas of computation.
Secondly, the application of the processor designed in this thesis in portable devices like mobile phones, laptops, and
tablets will significantly enhance their performance in terms of speed and memory management. Finally, this study
insulates the central microprocessor from the errors associated with computationally intensive sub-systems like matrix
operations, because a separate unit will be saddled with this responsibility.


Research procedure
• The methodology elucidates the step-by-step approach that will be followed to realize the objectives of
this work. The entire procedure is divided into three steps: the first is the input data, which is passed to
the MPU co-processor as shown below. This is followed by the operation type, which is also an input.
The MPU then uses this information to perform the necessary matrix operation before sending the
result back to the CPU.

The input unit interfaces the matrix co-processor to the CPU. The data to be used for the matrix
operation and the type of operation to be performed are supplied as inputs by the CPU. The matrix
co-processor then performs the exact operation specified by the CPU, as succinctly represented in the
figure below. Upon completion of the matrix operation, the co-processor sends the result to the CPU
through the output interface, represented as computation results in the figure below. The CPU then uses
the results for further processing of the initial task.

The matrix co-processor receives two inputs: an instruction code and matrix data. The instruction
code tells the co-processor what operation will be performed on the matrix data. The matrix co-
processor is made up of an instruction decoder and matrix operational units. The instruction decoder
interprets the instruction code and directs the matrix data to the appropriate operational unit.
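The decoder-and-operational-units flow above can be sketched as a small software model. The opcode values and unit names below are illustrative assumptions, not the instruction set of the proposed co-processor.

```python
# Software sketch of the MPU flow: the CPU supplies an instruction code
# and matrix data; the decoder selects an operational unit; the result
# is returned through the output interface.

def add_unit(a, b):
    # Element-wise matrix addition.
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def mul_unit(a, b):
    # Matrix multiplication: transpose b so its columns line up with rows.
    bt = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

# Hypothetical instruction set: opcode -> operational unit.
INSTRUCTION_SET = {0x1: add_unit, 0x2: mul_unit}

def mpu(opcode, a, b):
    # Decode the instruction and dispatch the data to the matching unit.
    return INSTRUCTION_SET[opcode](a, b)

print(mpu(0x2, [[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```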
BLOCK DIAGRAM
TOOLS

The tools to be used are:

• Xilinx Vivado
• Xilinx HLS
• MATLAB
MEASUREMENT METRICS

• Speed = computations / second

• Accuracy = (Expected result − Measured result) / Measured result × 100%

• Area utilization = ∑ utilized hardware

• Power consumption: P = VI
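The speed and accuracy metrics above can be computed directly; this sketch uses invented sample numbers purely for illustration.

```python
# Evaluation metrics from the slide, with made-up sample measurements.

def accuracy_error_pct(expected, measured):
    # (Expected result - Measured result) / Measured result x 100%
    return (expected - measured) / measured * 100.0

def speed(computations, seconds):
    # Throughput in computations per second.
    return computations / seconds

print(accuracy_error_pct(100.0, 98.0))  # ~2.04% deviation
print(speed(1_000_000, 0.5))            # 2000000.0 computations/s
```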
ARCHITECTURAL DESIGN
PRELIMINARY RESULTS

Floating point multiplication

PRELIMINARY RESULTS cont’d

Floating point addition
EXPECTED RESULTS

At the end of this work, the following results are expected:

i. A processor capable of performing an array of matrix operations will be developed.

ii. The processor will be developed into an IP core which can be integrated into any design.
CONCLUSION
• At the end of this work, a matrix co-processor capable of performing floating point-based matrix
computations in an efficient manner will be developed.
REFERENCES
IBIS World. (2020). Global Consumer Electronics Manufacturing Industry - Market Research
Report. Retrieved March 26, 2021, from Global Consumer Electronics Manufacturing industry
trends (2015-2020) website: https://www.ibisworld.com/global/market-research-reports/global-
consumer-electronics-manufacturing-industry/

Umuroglu, Y., Conficconi, D., Rasnayake, L., Preusser, T. B., & Sjalander, M. (2019). Optimizing Bit-
Serial Matrix Multiplication for Reconfigurable Computing. ArXiv E-Prints, 1901.000v2, 1–42.

Abdelfattah, A., Tomov, S., & Dongarra, J. (2020). Matrix multiplication on batches of small
matrices in half and half-complex precisions. Elsevier - Journal of Parallel and Distributed
Computing, 145, 188–201. https://doi.org/10.1016/j.jpdc.2020.07.001
Bigdeli, A., Biglari-Abhari, M., Salcic, Z., & Lay, Y. T. (2006). A New Pipelined Systolic Array-Based
Architecture for Matrix Inversion in FPGAs with Kalman Filter Case Study. Eurasip Journal of
Applied Signal Processing, 2006, 1–12. https://doi.org/10.1155/ASP2006/89186

Cadambi, S., Majumdar, A., Becchi, M., Chakradhar, S., & Graf, H. P. (2010). A Programmable
Parallel Accelerator for Learning and Classification. 19th International Conference on Parallel
Architectures and Compilation Techniques, 273–284.
REFERENCES CONT’D

• Zhu, R., Liu, B., Niu, D., Li, Z., & Zhao, H. V. (2017). Network Latency Estimation for
Personal Devices: A Matrix Completion Approach. IEEE/ACM Transactions on Networking,
25(2), 724–737. https://doi.org/10.1109/TNET.2016.2612695

• Yu, Q., Maddah-Ali, M. A., & Avestimehr, A. S. (2020). Straggler Mitigation in Distributed
Matrix Multiplication: Fundamental Limits and Optimal Coding. ArXiv E-Prints, 1801.07487,
1–14.

