UNIVERSITY OF MAIDUGURI
FACULTY OF ENGINEERING
PROPOSAL PRESENTATION
PGA/18/05/07/08806
SUPERVISORS
Dr. S. J. Bassi
DATE:
Presentation Outline
Related Works Table cont'd

S/N | Author(s)               | Result        | Latency         | Resources | Power     | Criteria met
1   | Chetan, S. et al., 2020 | 65.43%–79.41% | 2.0 µs          | 36,520    | Not clear | X
2   | Rusul S. et al., 2020   | 18%–22%       | 2.7 µs and less | 40,05     | –         | X
3   | Ting, T. et al., 2013   | 3.41%         | 1 µs            | –         | high      | √ X
4   | Proposed work           | √             | √               | √         | √         | √
• High latency, high power consumption, and poor resource utilization occur as a result of suboptimal input and output designs.
PROBLEM STATEMENT
• Several authors have designed matrix processors, but memory management, which drives power consumption, was suboptimal in their designs (Zhu, R., Liu, B., Niu, D., Li, Z., & Zhao, H. V., 2017).
• Matrix processors have been designed by several authors to perform a variety of computations, but a design targeting an FPGA and powered by floating point computations, which is envisaged to yield higher accuracy, is yet to be considered (Yu, Q., Maddah-Ali, M. A., & Avestimehr, A. S., 2020).
• Several authors have designed matrix processors under different constraints, such as memory and speed, but their designs showed high resource utilization because of the techniques they used (Weeks, M., & Bayoumi, M., 2003).
• In FPGA-based designs, several authors used either VHDL or Verilog. However, VHDL designs require extensive coding, which is error prone; this work raises the abstraction to the high-level synthesis (HLS) level, which gives a better level of abstraction.
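To illustrate the abstraction gap, at the HLS level a matrix operation is written as ordinary C loops that a synthesis tool turns into hardware, instead of hand-coded registers and state machines in VHDL. A minimal sketch (the function name and the fixed dimension N are illustrative, not from the proposal):

```c
#include <stddef.h>

#define N 4  /* hypothetical matrix dimension */

/* HLS-style description: plain C that a synthesis tool can map to
 * hardware; pipelining/unrolling would be requested with tool pragmas
 * rather than written out as register-transfer logic. */
void matmul(float a[N][N], float b[N][N], float c[N][N])
{
    for (size_t i = 0; i < N; i++) {
        for (size_t j = 0; j < N; j++) {
            float acc = 0.0f;              /* accumulator for one output element */
            for (size_t k = 0; k < N; k++)
                acc += a[i][k] * b[k][j];
            c[i][j] = acc;
        }
    }
}
```

The same behaviour in VHDL would require explicit entity/architecture declarations, signal definitions, and a control state machine, which is where the coding burden and error-proneness come from.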
• This work aims to fill this gap by developing a matrix processor using an FPGA as the target platform and the IEEE-754 single precision floating point representation of numbers. The objectives are:
i. To design a floating point arithmetic and logic unit that performs the four fundamental mathematical operations of floating point addition, subtraction, multiplication, and division; these operations will play critical roles in the performance of the processor.
ii. To design an architecture based on (i) for each of the matrix operations the processor will perform.
iii. To use VHDL and HLS to perform hardware description for each of the matrix operations based on the architecture in (ii) and integrate them into a co-processor.
• The scope of this study is confined to the development of a matrix processing unit based on the IEEE-754 single precision floating point representation of numbers, with VHDL as the language for coding and hardware description. The matrix operations performed by the processor will be on small-to-medium sized matrices.
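The IEEE-754 single precision format named in the scope packs a 32-bit word as 1 sign bit, 8 exponent bits (biased by 127), and 23 fraction bits; the ALU in objective (i) operates on exactly these fields. A minimal sketch of extracting them in C (the function name is illustrative, not from the proposal):

```c
#include <stdint.h>
#include <string.h>

/* Split an IEEE-754 single precision word into its three fields:
 * 1 sign bit, 8 exponent bits (biased by 127), 23 fraction bits. */
void fp32_fields(float x, uint32_t *sign, uint32_t *exponent, uint32_t *fraction)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);    /* reinterpret the 32 bits safely */
    *sign     = bits >> 31;            /* bit 31 */
    *exponent = (bits >> 23) & 0xFFu;  /* bits 30..23, biased by 127 */
    *fraction = bits & 0x7FFFFFu;      /* bits 22..0 of the mantissa */
}
```

For example, 1.0f decomposes to sign 0, exponent 127 (i.e. 2^0 after removing the bias), and fraction 0.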
• The significance of this study is that an enhancement to the speed of microprocessors will be achieved by removing the responsibility of matrix computation from the core processor, allowing it to focus on other critical areas of computation. Secondly, applying the processor designed in this thesis in portable devices like mobile phones, laptops, and tablets will significantly enhance their performance in terms of speed and memory management. Finally, this study insulates the central microprocessor from the errors associated with computationally intensive sub-systems like matrix operators, because a dedicated co-processor takes over those computations.
The input unit interfaces the matrix co-processor to the CPU. The data to be used for the matrix operation and the type of operation to be performed are supplied as inputs by the CPU. The matrix co-processor then performs the exact operation specified by the CPU, as represented in the figure below. Upon completion of the matrix operation, the co-processor sends the result to the CPU through the output interface, represented as "computation results" in the figure below. The CPU then uses the results for further processing of the initial task.
The matrix co-processor receives two inputs: an instruction code and matrix data. The instruction code tells the co-processor what operation will be performed on the matrix data. The matrix co-processor is made up of an instruction decoder and matrix operational units. The instruction decoder interprets the instruction code and routes the matrix data to the appropriate operational unit.
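The decode-and-dispatch behaviour described above can be sketched in C. The opcode names, unit functions, and the dimension N below are hypothetical placeholders for the actual instruction encoding the design will fix:

```c
#include <stddef.h>

#define N 2  /* hypothetical matrix dimension */

/* Hypothetical instruction codes the CPU places on the input interface. */
enum opcode { OP_ADD, OP_SUB };

/* Operational units: one per matrix operation. */
static void mat_add(float a[N][N], float b[N][N], float r[N][N])
{
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            r[i][j] = a[i][j] + b[i][j];
}

static void mat_sub(float a[N][N], float b[N][N], float r[N][N])
{
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            r[i][j] = a[i][j] - b[i][j];
}

/* Instruction decoder: selects the operational unit named by the
 * instruction code; r models the output interface back to the CPU. */
int coprocessor(enum opcode op, float a[N][N], float b[N][N], float r[N][N])
{
    switch (op) {
    case OP_ADD: mat_add(a, b, r); return 0;
    case OP_SUB: mat_sub(a, b, r); return 0;
    default:     return -1;  /* unrecognised instruction code */
    }
}
```

In hardware, the switch corresponds to the decoder enabling one operational unit, and the result array corresponds to the "computation results" path back to the CPU.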
BLOCK DIAGRAM
TOOLS
• Power consumption: P = VI, where V is the supply voltage and I is the current drawn.
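As a worked instance of the formula above (the voltage and current values are illustrative, not measured results):

```latex
P = VI = 1.0\,\text{V} \times 0.25\,\text{A} = 0.25\,\text{W}
```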
ARCHITECTURAL DESIGN
PRELIMINARY RESULTS
• The processor will be developed into an IP core which can be integrated into any design.
CONCLUSION
• At the end of this work, a matrix co-processor capable of performing floating point-based matrix
computations in an efficient manner will be developed.
REFERENCES
IBIS World. (2020). Global Consumer Electronics Manufacturing Industry - Market Research
Report. Retrieved March 26, 2021, from Global Consumer Electronics Manufacturing industry
trends (2015-2020) website: https://www.ibisworld.com/global/market-research-reports/global-
consumer-electronics-manufacturing-industry/
Umuroglu, Y., Conficconi, D., Rasnayake, L., Preusser, T. B., & Sjalander, M. (2019). Optimizing Bit-Serial Matrix Multiplication for Reconfigurable Computing. ArXiv E-Prints, 1901.000v2, 1–42.
Abdelfattah, A., Tomov, S., & Dongarra, J. (2020). Matrix multiplication on batches of small
matrices in half and half-complex precisions. Elsevier - Journal of Parallel and Distributed
Computing, 145, 188–201. https://doi.org/10.1016/j.jpdc.2020.07.001
Bigdeli, A., Biglari-Abhari, M., Salcic, Z., & Lay, Y. T. (2006). A New Pipelined Systolic Array-Based
Architecture for Matrix Inversion in FPGAs with Kalman Filter Case Study. Eurasip Journal of
Applied Signal Processing, 2006, 1–12. https://doi.org/10.1155/ASP2006/89186
Cadambi, S., Majumdar, A., Becchi, M., Chakradhar, S., & Graf, H. P. (2010). A Programmable Parallel Accelerator for Learning and Classification. 19th International Conference on Parallel Architectures and Compilation Techniques, 273–284.
REFERENCES CONT’D
Zhu, R., Liu, B., Niu, D., Li, Z., & Zhao, H. V. (2017). Network Latency Estimation for Personal Devices: A Matrix Completion Approach. IEEE/ACM Transactions on Networking, 25(2), 724–737. https://doi.org/10.1109/TNET.2016.2612695
Yu, Q., Maddah-Ali, M. A., & Avestimehr, A. S. (2020). Straggler Mitigation in Distributed Matrix Multiplication: Fundamental Limits and Optimal Coding. ArXiv E-Prints, 1801.07487, 1–14.