FPGA Implementations of HEVC Inverse DCT Using High-Level Synthesis

Conference Paper · September 2015

DOI: 10.1109/DASIP.2015.7367262

FPGA Implementations of HEVC Inverse DCT
Using High-Level Synthesis
Ercan Kalali, Ilker Hamzaoglu
Faculty of Engineering and Natural Sciences, Sabanci University
34956 Tuzla, Istanbul, Turkey
{ercankalali, hamzaoglu}@sabanciuniv.edu
Abstract: High Efficiency Video Coding (HEVC), the recently developed international video compression standard, has 50% better video compression efficiency than the H.264 video compression standard at the expense of significantly increased computational complexity. The HEVC Inverse Discrete Cosine Transform (IDCT) algorithm accounts for 11% of the computational complexity of an HEVC video encoder. Recently, commercial and academic high-level synthesis (HLS) tools have started to be used successfully for FPGA implementations of digital signal processing algorithms. Therefore, in this paper, the first FPGA implementations of the HEVC 2D IDCT algorithm using HLS tools in the literature are proposed. The proposed HEVC IDCT hardware is implemented on Xilinx FPGAs using three HLS tools: Xilinx Vivado HLS, LegUp, and MATLAB Simulink HDL Coder. Using HLS tools significantly reduced the FPGA development time, and the resulting FPGA implementations achieved real-time performance. Therefore, HLS tools can be used for FPGA implementation of an HEVC video encoder.

Keywords: HEVC, IDCT, FPGA Implementation, HLS.

I. INTRODUCTION

The ITU and ISO joint collaborative team on video coding (JCT-VC) recently developed a new international video compression standard called High Efficiency Video Coding (HEVC) [1]-[2]. HEVC has 50% better video compression efficiency than the H.264 video compression standard.

The HEVC standard uses the Discrete Cosine Transform (DCT) and the Inverse Discrete Cosine Transform (IDCT), as the H.264 standard does. However, the H.264 standard uses only 4x4 and 8x8 Transform Unit (TU) sizes for DCT/IDCT, whereas the HEVC standard uses 4x4, 8x8, 16x16, and 32x32 TU sizes. Larger TU sizes achieve better energy compaction. However, they increase the computational complexity significantly.

DCT and IDCT are heavily used in an HEVC encoder [3]-[5]. IDCT has high computational complexity. It accounts for 11% of the computational complexity of an HEVC video encoder. DCT and IDCT together account for 25% of the computational complexity of an all intra HEVC video encoder.

Recently, commercial and academic high-level synthesis (HLS) tools have started to be used successfully for FPGA implementations of digital signal processing algorithms. HLS tools accept their inputs in different formats [6]. For example, the Xilinx Vivado HLS and LegUp tools take C or C++ codes as input and generate Verilog or VHDL codes. MATLAB Simulink HDL Coder takes MATLAB Simulink models as input and generates Verilog or VHDL codes.

Since HEVC 2D IDCT performs matrix multiplication operations, it is suitable for HLS implementation. Therefore, in this paper, the first FPGA implementations of the HEVC 2D IDCT algorithm using HLS tools in the literature are proposed. The proposed HEVC 2D IDCT hardware is implemented on Xilinx FPGAs using three HLS tools: Xilinx Vivado HLS, LegUp, and MATLAB Simulink HDL Coder.

The inputs given to these HLS tools, C codes for Xilinx Vivado HLS and LegUp and a MATLAB Simulink model for MATLAB Simulink HDL Coder, are developed based on the HEVC 2D IDCT software implementation in the HEVC reference software encoder (HM) version 15 [7]. Two HEVC 2D IDCT HLS implementations are done. In one of them, multiplications with constants are implemented in the C codes and the MATLAB Simulink model using the multiplication operation. In the other one, they are implemented using addition and shift operations.

Some of the optimization options of these HLS tools, such as pipelining and loop unrolling, are used in order to increase the performance of the FPGA implementations. The Verilog RTL codes generated by these three HLS tools for the two HEVC 2D IDCT HLS implementations are verified to work in a Xilinx Virtex 6 FPGA.

Using HLS tools significantly reduced the FPGA development time. The implementation results show that the proposed HEVC 2D IDCT FPGA implementations using HLS achieved real-time performance with acceptable hardware area. Therefore, HLS tools can be used for FPGA implementation of an HEVC video encoder.

A few HLS implementations for the H.264 video compression standard are proposed in the literature [8]-[11]. There are a few HLS implementations based on MPEG reconfigurable video coding [12]-[13]. There are several HLS implementations for image and video processing algorithms such as sorting in the median filter [14]-[17]. Several handwritten Verilog RTL implementations for the HEVC video compression standard are also proposed in the literature [18]-[23].
The HEVC 2D IDCT HLS implementation proposed in this paper is the first HLS implementation for the HEVC video compression standard. It is also the first HLS implementation for the transform operations in video compression standards. In addition, the HEVC IDCT algorithm is one of the most computationally complex algorithms among the algorithms with HLS implementations for image processing and video compression.

The rest of the paper is organized as follows. The HEVC 2D IDCT algorithm is explained in Section II. In Section III, the proposed HLS implementations are explained, and the implementation results are given. Finally, Section IV presents the conclusions.

II. HEVC 2D IDCT ALGORITHM

The HEVC standard uses the Inverse Discrete Cosine Transform (IDCT), as the H.264 standard does. However, the H.264 standard uses only 4x4 and 8x8 Transform Unit (TU) sizes for DCT/IDCT, whereas the HEVC standard uses 4x4, 8x8, 16x16, and 32x32 TU sizes.

The HEVC standard performs the 2D transform by first performing a 1D column transform and then performing a 1D row transform. The coefficients in the DCT/IDCT matrices are derived from the DCT basis functions. However, integer DCT coefficients are used for simplicity.

First, an integer IDCT matrix for the 32x32 TU size is generated. The other TU sizes (4x4, 8x8, 16x16) use sub-sampled versions of the 32x32 matrix. The IDCT matrix for the 4x4 TU size is shown in (1).

\[
\mathrm{IDCT}_{4\times4} =
\begin{bmatrix}
64 & 64 & 64 & 64 \\
83 & 36 & -36 & -83 \\
64 & -64 & -64 & 64 \\
36 & -83 & 83 & -36
\end{bmatrix}
\tag{1}
\]

Because of the multiplications with large coefficients, truncation and clipping operations are added to HEVC IDCT after the horizontal 1D transform and the vertical 1D transform. The truncation operations are shown in (2) and (3).

\[
\mathrm{Truncation}_{1D} = \left(\mathrm{IDCT}(x, y) + 64\right) \gg 7
\tag{2}
\]

\[
\mathrm{Truncation}_{2D} = \left(\mathrm{IDCT}(x, y) + 2048\right) \gg 12
\tag{3}
\]

III. HEVC 2D IDCT HLS IMPLEMENTATIONS

The proposed HLS implementation of HEVC 2D IDCT for all TU sizes is shown in Fig. 1. IDCT inputs are selected depending on the size of the IDCT operation (4x4, 8x8, 16x16 or 32x32). First, 1D column IDCT is performed, and the resulting coefficients are clipped. Then, 1D row IDCT is performed using the transpose of the resulting matrix as input, and the resulting coefficients are clipped.

Fig. 1. HLS Implementation of HEVC 2D IDCT

Two HEVC 2D IDCT HLS implementations are done. These HLS implementations are synthesized to Verilog RTL using three different HLS tools. The inputs given to these HLS tools, C codes for Xilinx Vivado HLS and LegUp and a MATLAB Simulink model for MATLAB Simulink HDL Coder, are developed based on the HEVC 2D IDCT software implementation in the HEVC reference software encoder (HM) version 15 [7]. In one HLS implementation, multiplications with constants are implemented in the C codes and the MATLAB Simulink model using the multiplication operation. In the other one, they are implemented using addition and shift operations.

The Verilog RTL codes generated by these three HLS tools for the two HEVC 2D IDCT HLS implementations are verified with RTL simulations. The RTL simulation results matched the results of the HEVC 2D IDCT implementation in the HEVC reference software encoder (HM) version 15 [7]. The Verilog RTL codes are synthesized and mapped to a Xilinx XC6VLX550T FF1760 FPGA with speed grade 2 using Xilinx ISE 13.4. The FPGA implementations are verified with post place and route simulations.

A. Xilinx Vivado HLS

The Xilinx Vivado HLS tool can generate Verilog RTL codes from C/C++ and SystemC codes. It can optimize the area, speed and power consumption of the hardware implementation. It provides bit-accurate and cycle-accurate implementations. It has several optimization options such as pipelining, loop unrolling, and loop merging. It allows adding specific RAM blocks, FIFOs, or ROMs to the HLS implementation with directives. It also allows adding specific DSP blocks such as a multiplier, divider or square unit. In addition, it has an option to select I/O ports as bus, memory, FIFO or acknowledge type. It also allows adding high speed AXI-4 buses for data transfer.
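As a plain software reference for the column-row flow of Fig. 1 and the truncations in (2) and (3), the 4x4 case can be sketched in C as follows. This is only a minimal sketch, not the authors' HLS input. The names `idct4_pass`, `idct4x4` and `clip3` are hypothetical, the matrix is the 4x4 one from (1), applying it in transposed form is an assumption of the sketch, and an 8-bit video bit depth is assumed, which gives the shift amounts 7 and 12. HLS directives are left out so that the code builds with an ordinary C compiler.

```c
#include <assert.h>
#include <stdint.h>

/* 4x4 integer matrix from (1). */
static const int16_t M4[4][4] = {
    {64,  64,  64,  64},
    {83,  36, -36, -83},
    {64, -64, -64,  64},
    {36, -83,  83, -36}
};

/* Clamp v to the range [lo, hi]. */
static int32_t clip3(int32_t lo, int32_t hi, int32_t v)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* One 1D pass over the four columns of `in`.
 * The result is rounded, shifted, clipped to 16 bits and written
 * transposed, so calling the pass twice covers columns and then rows.
 * Note: >> on a negative int is implementation-defined in C; an
 * arithmetic shift is assumed here. */
static void idct4_pass(int16_t in[4][4], int16_t out[4][4], int shift)
{
    assert(shift >= 1);
    const int32_t round = 1 << (shift - 1);
    for (int line = 0; line < 4; line++) {
        for (int n = 0; n < 4; n++) {
            int32_t acc = 0;
            for (int k = 0; k < 4; k++)
                acc += M4[k][n] * in[k][line];
            out[line][n] = (int16_t)clip3(-32768, 32767, (acc + round) >> shift);
        }
    }
}

/* 2D inverse transform: first pass uses (2), second pass uses (3). */
void idct4x4(int16_t coef[4][4], int16_t resid[4][4])
{
    int16_t tmp[4][4];
    idct4_pass(coef, tmp, 7);    /* (x + 64)   >> 7  */
    idct4_pass(tmp, resid, 12);  /* (x + 2048) >> 12 */
}
```

With a single DC coefficient of 64 and all other coefficients zero, the first pass produces 32 in four positions and the second pass produces a flat 4x4 block of ones, which is a convenient sanity check for the two rounding offsets.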
/* 1D column pass of the 8-point inverse transform (partial butterfly).
 * int7, int15, int16 and int26 are bit-accurate integer types;
 * DCT_8 and Clip3 are defined elsewhere in the C codes. */
void COL_partialButterflyInverse8(
    int15 resid[DCT_8], int7 coeff[31], int7 coef8[16],
    int16 *Y1, int16 *Y2, int16 *Y3, int16 *Y4,
    int16 *Y5, int16 *Y6, int16 *Y7, int16 *Y8)
{
    char j, l, k = 0;
    int26 E[4], O[4], EE[2], EO[2];

    /* Odd part: four 4-term dot products, partially unrolled. */
    for (l = 0; l < 4; l++) {
#pragma HLS unroll factor=2
        O[l] = coef8[l*4]*resid[1] + coef8[l*4+1]*resid[3] +
               coef8[l*4+2]*resid[5] + coef8[l*4+3]*resid[7];
    }

    /* Even part. */
    EO[0] = coeff[1]*resid[2] + coeff[2]*resid[6];
    EO[1] = coeff[2]*resid[2] - coeff[1]*resid[6];
    EE[0] = coeff[0]*resid[0] + coeff[0]*resid[4];
    EE[1] = coeff[0]*resid[0] - coeff[0]*resid[4];
#pragma HLS pipeline

    E[0] = EE[0] + EO[0];
    E[1] = EE[0] - EO[0];
    E[2] = EE[1] + EO[1];
    E[3] = EE[1] - EO[1];

    /* Round, shift by 7 as in (2), and clip to 16 bits. */
    *Y1 = Clip3(-32768, 32767, (E[0] + O[0] + 64) >> 7);
    *Y2 = Clip3(-32768, 32767, (E[1] + O[1] + 64) >> 7);
    *Y3 = Clip3(-32768, 32767, (E[2] - O[2] + 64) >> 7);
    *Y4 = Clip3(-32768, 32767, (E[3] - O[3] + 64) >> 7);
    *Y5 = Clip3(-32768, 32767, (E[3] + O[3] + 64) >> 7);
    *Y6 = Clip3(-32768, 32767, (E[2] + O[2] + 64) >> 7);
    *Y7 = Clip3(-32768, 32767, (E[1] - O[1] + 64) >> 7);
    *Y8 = Clip3(-32768, 32767, (E[0] - O[0] + 64) >> 7);
}

Fig. 2. Xilinx Vivado C Code

A part of the C codes developed and given as input to the Xilinx Vivado HLS tool is shown in Fig. 2. Since HEVC 2D IDCT performs matrix multiplication operations, many for loops are used in the C codes. Therefore, the loop unrolling directive is used in the C codes to increase performance. The pipelining directive is also used in the C codes to increase performance.

The FPGA implementation results for the Verilog RTL codes generated by the Xilinx Vivado HLS tool for the two C codes, M (multiplications with constants are implemented using the multiplication operation) and AS (multiplications with constants are implemented using addition and shift operations), for each TU size and for all TU sizes are given in Table I.

TABLE I. XILINX VIVADO HLS IMPLEMENTATION RESULTS

TU     Imp.  LUTs   DFFs   Slices  BRAMs  I/O   Freq. (MHz)  Full HD fps
4x4    M     1081   560    356     1      134   222          142
4x4    AS    663    373    212     1      134   294          189
8x8    M     3486   2515   1193    3      262   222          72
8x8    AS    2834   2010   919     1      262   313          101
16x16  M     7159   5495   2554    13     518   208          80
16x16  AS    5000   4090   1602    1      518   333          128
32x32  M     13046  10366  4556    25     1030  208          54
32x32  AS    40764  28772  12605   13     1030  208          54
All    M     24542  18416  8193    39     1045  200          52
All    AS    50566  34955  14944   13     1045  208          54

B. LegUp

LegUp is an open-source HLS tool. It can generate Verilog RTL codes from C codes. It provides only a few optimization options such as pipelining and loop unrolling. However, data dependencies caused by loop unrolling cannot be detected by LegUp. It does not provide bit-accurate implementations. This increases the area and decreases the performance of the FPGA implementation. It generates RTL codes especially for Altera FPGA synthesis tools. However, the generated RTL codes can also be synthesized using the Xilinx ISE tool with some modifications.

The FPGA implementation results for the Verilog RTL codes generated by the LegUp HLS tool for the two C codes, M (multiplications with constants are implemented using the multiplication operation) and AS (multiplications with constants are implemented using addition and shift operations), for each TU size and for all TU sizes are given in Table II.

TABLE II. LEGUP IMPLEMENTATION RESULTS

TU     Imp.  LUTs   DFFs   Slices  BRAMs  I/O   Freq. (MHz)  Full HD fps
4x4    M     4546   1232   1484    2      781   149          36
4x4    AS    1431   623    554     1      781   162          39
8x8    M     8669   2701   2772    7      781   147          35
8x8    AS    4804   2474   1613    1      781   213          51
16x16  M     14557  4896   4829    27     781   130          31
16x16  AS    17199  8876   5582    21     781   150          36
32x32  M     26854  9420   8654    27     781   128          31
32x32  AS    51254  26427  15580   21     781   147          35
All    M     53605  17714  16950   56     829   130          31
All    AS    74072  38369  23481   41     781   143          35

C. MATLAB Simulink HDL Coder

MATLAB Simulink is a widely used modeling tool for many applications such as signal processing and communication. The MATLAB Simulink HDL Coder tool can generate Verilog RTL codes from MATLAB Simulink models. It has several optimization options such as pipelining, clock gating, resource sharing, RAM mapping, and delay balancing.

TABLE III. MATLAB SIMULINK HDL CODER IMPLEMENTATION RESULTS

TU     Imp.  LUTs   DFFs   Slices  BRAMs  I/O   Freq. (MHz)  Full HD fps
4x4    M     661    160    200     0      130   149          96
4x4    AS    377    134    133     0      116   166          107
8x8    M     2620   262    750     0      258   110          61
8x8    AS    1167   174    375     0      242   136          75
16x16  M     9649   484    2671    0      514   104          54
16x16  AS    3924   362    1210    0      488   120          62
32x32  M     34499  1005   9684    0      1026  100          50
32x32  AS    12861  668    3878    0      983   110          55
All    M     46471  1393   13727   0      1032  100          50
All    AS    13669  1140   3932    0      1000  110          55

It has many available models and it allows adding handwritten MATLAB codes. These speed up the design process. However, implementing large designs is the main drawback of Simulink, because large designs require complex graphical models, and this slows down the design process.

A part of the MATLAB Simulink model developed and given as input to the MATLAB Simulink HDL Coder tool is shown in Fig. 3. Pipelining and clock gating optimizations are used in the model. The FPGA implementation results for the Verilog RTL codes generated by the MATLAB Simulink HDL Coder tool for the two MATLAB Simulink models, M (multiplications with constants are implemented using the multiplication operation) and AS (multiplications with constants are implemented using addition and shift operations), for each TU size and for all TU sizes are given in Table III.

Fig. 3. MATLAB Simulink Model

D. Hardware Comparison

The implementation results of the three HLS tools (Xilinx Vivado HLS, LegUp, MATLAB Simulink HDL Coder) are compared with the implementation result of the handwritten Verilog RTL implementation of the HEVC 2D IDCT hardware in [23], excluding the energy reduction technique proposed there. The handwritten Verilog RTL codes are also synthesized and mapped to a Xilinx XC6VLX550T FF1760 FPGA with speed grade 2 using Xilinx ISE 13.4. The comparison results are given in Table IV.

The HEVC 2D IDCT hardware in [23] is shown in Fig. 4. In this hardware, IDCT inputs are selected depending on the size of the IDCT operation (4x4, 8x8, 16x16 or 32x32). The hardware uses an efficient butterfly structure for column and row transforms. After 1D column IDCT, the resulting coefficients are stored in a transpose memory, and they are used as input for 1D row IDCT. The multiplication operations are performed in the datapaths using only adders and shifters. In order to reduce the number and size of the adders in the hardware, the Hcub MCM algorithm [24] is used for calculating 4, 8, 16 and 32 point IDCT. The Hcub algorithm determines the necessary shift and addition operations in a multiplier block. There are 4 multiplier blocks in the 8x8 datapath, 8 multiplier blocks in the 16x16 datapath and 16 multiplier blocks in the 32x32 datapath.

Fig. 4. HEVC 2D IDCT Hardware [23]

These results show that the proposed HEVC 2D IDCT FPGA implementations using HLS achieved real-time performance with acceptable hardware area. In addition, the HLS implementations have area and clock frequency results comparable to those of the handwritten Verilog RTL implementation. However, the handwritten Verilog RTL implementation processes more frames per second (fps) than the HLS implementations. This is because the handwritten Verilog RTL implementation includes several optimizations such as a complex pipeline and resource sharing between different TU sizes. In addition, the HLS implementations spend some redundant clock cycles, especially for hand-shaking protocols.
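The difference between the M and AS variants of the HLS inputs, and the adder-and-shifter datapaths of the hardware in [23], both rest on rewriting a multiplication by a constant as shifts and additions. A minimal sketch for the three magnitudes that appear in the 4x4 matrix (64, 83 and 36) is given below. The function names are hypothetical, and the decompositions are plain binary expansions chosen for illustration; they are not the output of the Hcub algorithm and are not taken from the authors' code.

```c
#include <assert.h>
#include <stdint.h>

/* Left shift that stays well defined for negative inputs: the bit
 * pattern is shifted as unsigned and converted back (a two's
 * complement target is assumed). */
static int32_t shl(int32_t x, unsigned s)
{
    assert(s < 32);
    return (int32_t)((uint32_t)x << s);
}

/* M style: the constant multiplication is left to a multiplier. */
int32_t mul64_m(int32_t x) { return 64 * x; }
int32_t mul83_m(int32_t x) { return 83 * x; }
int32_t mul36_m(int32_t x) { return 36 * x; }

/* AS style: the same products built from shifts and additions.
 * 64 = 2^6, 83 = 2^6 + 2^4 + 2^1 + 2^0, 36 = 2^5 + 2^2. */
int32_t mul64_as(int32_t x) { return shl(x, 6); }
int32_t mul83_as(int32_t x) { return shl(x, 6) + shl(x, 4) + shl(x, 1) + x; }
int32_t mul36_as(int32_t x) { return shl(x, 5) + shl(x, 2); }

/* Compare both styles over the full 16-bit input range and return
 * the number of inputs for which they disagree. */
int count_as_mismatches(void)
{
    int bad = 0;
    for (int32_t x = -32768; x <= 32767; x++) {
        if (mul64_m(x) != mul64_as(x)) bad++;
        if (mul83_m(x) != mul83_as(x)) bad++;
        if (mul36_m(x) != mul36_as(x)) bad++;
    }
    return bad;
}
```

In this form each constant costs between zero and three adders. A multiple constant multiplication method such as the one in [24] goes further by sharing intermediate terms between constants that multiply the same input, which this sketch does not attempt.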
TABLE IV. HARDWARE COMPARISON

Implementation                LUTs   DFFs   Slices  BRAMs  Freq. (MHz)  fps
Xilinx Vivado HLS             50566  34955  14944   13     208          54 (Full HD)
LegUp                         74072  38369  23481   41     143          35 (Full HD)
MATLAB Simulink HDL Coder     13669  1140   3932    0      110          55 (Full HD)
Handwritten Verilog RTL [23]  38790  11762  11343   32     150          48 (Ultra HD)

IV. CONCLUSIONS

In this paper, FPGA implementations of the HEVC 2D IDCT algorithm for all TU sizes using HLS are proposed. The proposed HEVC 2D IDCT hardware is implemented on Xilinx FPGAs using three HLS tools: Xilinx Vivado HLS, LegUp, and MATLAB Simulink HDL Coder. Using HLS tools significantly reduced the FPGA development time. The implementation results show that the proposed HEVC 2D IDCT FPGA implementations using HLS achieved real-time performance with acceptable hardware area. Therefore, HLS tools can be used for FPGA implementation of an HEVC video encoder.

REFERENCES

[1] High Efficiency Video Coding, ITU-T Rec. H.265 and ISO/IEC 23008-2 (HEVC), ITU-T and ISO/IEC, Apr. 2013.
[2] M. T. Pourazad, C. Doutre, M. Azimi, P. Nasiopoulos, "HEVC: The New Gold Standard for Video Compression", IEEE Consumer Electronics Magazine, July 2012.
[3] Y. J. Ahn, W. J. Han, D. G. Sim, "Study of decoder complexity for HEVC and AVC standards based on tool-by-tool comparison", SPIE Applications of Digital Image Processing XXXV, vol. 8499, Aug. 2012.
[4] F. Bossen, B. Bross, K. Suhring, D. Flynn, "HEVC complexity and implementation analysis", IEEE Trans. on Circuits and Systems for Video Technology, vol. 22, no. 12, pp. 1685-1696, Dec. 2012.
[5] J. Vanne, M. Viitanen, T. D. Hämäläinen, A. Hallapuro, "Comparative rate-distortion-complexity analysis of HEVC and AVC video codecs", IEEE Trans. on Circuits and Systems for Video Technology, vol. 22, no. 12, pp. 1885-1898, Dec. 2012.
[6] W. Meeus, K. V. Beeck, T. Goedeme, J. Meel, D. Stroobandt, "An overview of today's high-level synthesis tools", Springer Design Automation for Embedded Systems, vol. 16, no. 3, pp. 31-51, Sept. 2012.
[7] K. McCann, B. Bross, W. J. Han, I. K. Kim, K. Sugimoto, G. J. Sullivan, "High Efficiency Video Coding (HEVC) Test Model (HM) 15 Encoder Description", JCTVC-Q1002, June 2014.
[8] T. Damak, I. Werda, N. Masmoudi, S. Bilavarn, "Fast prototyping H.264 deblocking filter using ESL tools", 8th International Multi-Conf. on System, Signals and Devices (SSD), pp. 1-4, March 2011.
[9] S. Kim, H. Kim, T. Chung, J-G. Kim, "Design of H.264 video encoder with C to RTL design tool", Int. SoC Design Conference, pp. 171-174, Nov. 2012.
[10] K. Fleming, C-C. Lin, N. D. Arvind, G. Raghavan, J. Hicks, "H.264 decoder: A case study in multiple design points", ACM/IEEE Int. Conf. on Formal Methods and Models for Co-Design, pp. 165-174, Jun. 2008.
[11] P. P. Carballo, O. Espino, R. Neris, P. H. Fernandez, T. M. Szydzik, A. Nunez, "Scalable video coding deblocking filter FPGA and ASIC implementation using high-level synthesis methodology", Euromicro Conf. on Digital System Design (DSD), pp. 415-422, Sept. 2013.
[12] S. S. Bhattacharyya, J. Eker, J. W. Janneck, C. Lucarz, M. Mattavelli, M. Raulet, "Overview of the MPEG reconfigurable video", Springer Journal of Signal Processing Systems, vol. 63, no. 2, pp. 251-263, May 2011.
[13] J. F. Nezan, N. Siret, M. Wipliez, F. Palumbo, L. Raffo, "Multi-purpose systems: A novel dataflow-based generation and mapping strategy", IEEE Int. Symposium on Circuits and Systems (ISCAS), pp. 3073-3076, May 2012.
[14] H. Ye, L. Lacassagne, D. Etiemble, L. Cabaret, J. Falcou, A. Romero, O. Florent, "Impact of high level transforms on high level synthesis for motion detection algorithm", Conf. on Design and Architectures for Signal and Image Processing (DASIP), pp. 1-8, Oct. 2012.
[15] G. Schewior, C. Zahl, H. Blume, S. Wonneberger, J. Effertz, "HLS-based FPGA implementation of a predictive block-based motion estimation algorithm - A field report", Conf. on Design and Architectures for Signal and Image Processing (DASIP), pp. 1-8, Oct. 2014.
[16] O. A. Abella, G. Ndu, N. Sonmez, M. Ghasempour, A. Armejach, J. Navaridas, W. Song, J. Mawer, A. Cristal, M. Lujan, "An empirical evaluation of high-level synthesis languages and tools for database acceleration", Int. Conf. on Field Programmable Logic and Applications (FPL), Sept. 2014.
[17] M. Schmid, N. Apelt, F. Hanning, J. Teich, "An image processing library for C-based high-level synthesis", Int. Conf. on Field Programmable Logic and Applications (FPL), Sept. 2014.
[18] E. Kalali, Y. Adibelli, I. Hamzaoglu, "A high performance and low energy intra prediction hardware for HEVC video decoding", Conf. on Design and Architectures for Signal and Image Processing (DASIP), pp. 1-8, Oct. 2012.
[19] T. Dias, N. Roma, L. Sousa, "High performance multi-standard architecture for DCT computation in H.264/AVC high profile and HEVC codecs", Conf. on Design and Architectures for Signal and Image Processing (DASIP), pp. 14-21, Oct. 2013.
[20] E. Ozcan, Y. Adibelli, I. Hamzaoglu, "A high performance deblocking filter hardware for high efficiency video coding", IEEE Transactions on Consumer Electronics, vol. 59, no. 3, pp. 714-720, Aug. 2013.
[21] E. Kalali, I. Hamzaoglu, "A low energy HEVC sub-pixel interpolation hardware", IEEE Int. Conference on Image Processing (ICIP), pp. 1218-1222, Oct. 2014.
[22] C. Diniz, M. Shafique, F. Dalcin, S. Bampi, J. Henkel, "A deblocking filter hardware architecture for the high efficiency video coding standard", Design Automation and Test in Europe Conference, March 2015.
[23] E. Kalali, E. Ozcan, O. M. Yalcinkaya, I. Hamzaoglu, "A low energy HEVC inverse transform hardware", IEEE Transactions on Consumer Electronics, vol. 60, no. 4, pp. 754-761, Nov. 2014.
[24] Y. Voronenko, M. Püschel, "Multiplierless Constant Multiple Multiplication", ACM Trans. on Algorithms, vol. 3, no. 2, May 2007.