Professional Documents
Culture Documents
Abstract—Histogram of Oriented Gradient (HOG) is a popular Additionally, in the age of cloud computing, the two last
feature description for the purpose of object detection. However, phases can be done in the server for IoT devices. Hence,
HOG algorithm requires a high performance system because of simplifying cell histogram generation part is very necessary,
its complex operation set. In HOG algorithm, the cell histogram
generation is one of the most complex part, it uses inverse especially for IoT devices.
tangent, square, square root, floating point multiplication. In this Many researchers have proposed diverse approximate meth-
paper, we propose an accurate and low complex cell histogram ods such as LUT or CORDIC or bin’s boundary angles in
generation by bypass the gradient of pixel computation. It cell histogram generation to reduce complexity levels. Even
employs the bin’s boundary angle method to determine the two though approximate models reduce the accuracy of extracting
quantized angles. However, instead of choosing an approximate
value of tan, the nearest greater and the nearest smaller of each pixel feature, they still require large memory areas or use many
tan value from the ratios between pixel’s derivative in y and x iterations.
direction are used. The magnitudes of two bins are the solutions In this work, we would like to propose a new method,
of a system of two equations, which represents the equality of the which extracts features of a pixel accurately and is suitable
gradient of a pixel and its two bins in both vertical and horizontal for low resource systems and hardware implementation. The
direction. The proposed method spends only 30 addition and
40 shift operations to caculate two bins of a pixel. Simulation difference in this method is that it is not an approximate model,
results show that the percentage of error when reconstructing the and it computes the quantized orientations and corresponding
differences in x and y direction are always less than 2% with 8- gradient magnitude of each quantized orientations directly. The
bit length of the fractional part. Additionally, manipulating the idea comes from the properties of decomposing the gradient
precision of gradient magnitude is very simple by pre-defined of a pixel into two components (bins). The proposed method
sine and cosine values of quantized angles. The synthesis results
of a hardware implementation of the proposed method occupy includes only addition and shift operation. As our simulation
3.57 KGEs in 45nm NanGate standard cell library. The hardware results, extracting non-normalization features of a pixel spends
module runs at the maximum frequency of 400 MHz, and the about 30 additions and 40 shifts for 8-bit of fraction of
throughput is 0.4 pixel/ns for a single module. It is able to support magnitude. In the term of accuracy, error when reconstructing
48 fps with 4K UHD resolution. the pixel’s derivatives in both direction from voted bins is
always less than 2%. A hardware implementation occupies
I. I NTRODUCTION
about 3.57 KGEs in NanGate 45nm standard cell library. The
The Histogram of Oriented Gradients (HOG) [1] is a feature maximum operating frequency of the hardware is 400MHz,
descriptor used in computer vision and image processing for and its throughput is up to 0.4 pixel/ns.
object detection. It gets the quantity of strong characteristics The rests of the paper are organized as follows. Section II
from the shape change of the object by dividing a local domain introduces the cell histogram generation and some previous
into plural blocks and making the incline of each histogram. optimization works. Section III shows the detail of proposed
Practically, HOG feature achieves very high accuracy level, method. Section IV presents details of the hardware imple-
it is up to 96.6% of the detection rate with 20.7% of the mentation of proposed method and simulation results. Finally,
false positive rate [2]. Hence, it is the key role of wide range the conclusion and the perspectives are presented in the section
application domains including robotic, security surveillance V.
and automotive. However, the complexity level of HOG is
II. C ELL HISTOGRAM GENERATION AND ITS
the most difficult problem when using in embedded systems
APPROXIMATIONS
or implementing into hardware.
The HOG algorithm can be separated into 3 phases: cell A. Conventional cell histogram generation
histogram generation, block normalization, and the SVM Histogram of Oriented Gradient (HOG), pioneered by
classification. The first phase plays as the most complex one DALAL and TRIGGS [1], becomes one of the most popular
and consumes the most energy. Because extracting the feature methods for feature extraction. In the conventional HOG, the
of each pixel requires series of complex operations: inverse cell histogram generation part is the most dominant power
tangent, square, square root, floating point multiplication. consumption, up to 58% HOG power in Takagi’s work [3].
range, but they have different orientations. They will have the
𝒚
same HOG features. 𝑴
III. P ROPOSED CELL HISTOGRAM GENERATION
In this work, we proposed a robust and low complexity 𝑩𝟐 × 𝐬𝐢𝐧 𝜽𝒊+𝟏 𝑩𝟐
method to identify two feature bins of a pixel. This method
has no calculation of the gradient of pixel. To determine the 𝒅𝒚
two quantized orientations exactly, a trusted form of Equation
d
11 is proposed. It is from the limitation of the dxy and the
tan θ values. Gradient magnitudes of two bins are solutions
of a system of two equations instead of voting gradient. 𝑩𝟏
𝑩𝟏 × 𝐬𝐢𝐧 𝜽𝒊 𝜽𝒊+𝟏
𝜽𝒊
A. Robust quantized angles determination
Equation 14 shows our basic idea to calculate the quantized 𝑶 𝒙
𝒅𝒙
angles. In stead of choosing fixed bit length of fractional
P iof tan(θ) values or representing a tan(θ) value by sum
part
𝑩𝟐 × 𝐜𝐨𝐬 𝜽𝒊+𝟏
2 , our proposed method limits each tan(θ) by two rational 𝑩𝟏 × 𝐜𝐨𝐬 𝜽𝒊
d As
numbers: its nearest smaller dxy , called Bs
and its nearest
dy Ag
greater dx , called Bg . Those fractions do not only depend
dy
on the tan values, it is from examining all cases of dx and Fig. 1. Decomposition a vector into form of two vectors
tan values to reach the expected results.
As Ag Ag d
< tan(θ) < (14) • B
g
is the nearest greater or equal number dxy of tan(θ)
Bs Bg value;
As A • As , Bs , Ag , Bg is integer number in |dx |, |dy | range.
Table I shows all B and Bgg in a case of 8-bit length pixel.
s • θ is quantized orientation;
With those rational numbers in the table, multiplying with
• α is the gradient orientation of the current pixel.
very large number such as 105 is not necessary to determine
quantized orientation exactly. B. Magnitude of two bins calculation
As the discussion in section II-D, the results of bin’s bound-
TABLE I ary angles do not provide enough arguments for voting. Our
dy
tan θ AND ITS NEAREST VALUES d
x work proposed a system of two equations from information:
θi , θi+1 , dx and dy to calculate gradient magnitude of two bins
Nearest smaller Nearest greater exactly.
tan(θ)
dy
| dx | of tan(θ) dy
| dx | of tan(θ) Back to the ideas of HOG, it decomposes gradient of a pixel
43 3 I(x, y) into a combination of two bins as described in Figure 1
244
tan(10) 17
and Equation 17. Hence, in each direction, the sum of two bins
56 97
57
tan(30) 168 equals to the gradient of the pixel. Equation 18 and 19 present
230
tan(50) 87 those equalities in the Ox and Oy axises, respectively. Finally,
193 73
250 11
the two magnitudes are the solutions of the two equations, and
tan(70)
91 4 have the form as Equation 20 and Equation 21, respectively.
two quantized angles. The most complex part are sin, cos
values, but they are pre-computed. The accurate level of HOG
ca be controlled by choosing these values. To achieve these
advantages, this method requires determining the quantized
orientations exactly. If there is any errors in determining
direction of two bins, it produces negative magnitudes.
C. Proposed cell histogram generation scheme
Figure 2 shows the data flow of proposed cell histogram
generation, which uses our method. As shown in Figure
2, the input contains 4 neighbor pixels I(x + 1)(y), I(x −
1, y), I(x, y + 1), I(x, y − 1) of pixel I(x, y). In the first step,
the gradient in x and y directions dx and dy are computed. In
the second step, we have absolute values of dx and dy , and
the quadrant position of the gradient of pixel I. If the sign
−−−−−→
of dx is opposite to the sign of dy , the gradient M (x, y) is
in the second quadrant. Otherwise, it is in the first quadrant.
Because the left side of Oy axis is a reflection of the right side
of Oy, so computing the magnitude of two voted bins in the
first quadrant is enough. The third step is to determine the two
quantized angles θi , θi+1 via comparison between A × dx and
B × dy , where A is the dividend, and B is the divisor in Table
d
I. In this scheme, we use the nearest greater values dxy of tan θ.
I. After the orientations of two bins are determined, the two
−→ −−−→
corresponding magnitudes ||Bθi ||, ||Bθi+1 || are obtained from
Equations 20 and 21. Finally, we have to correct the angles
from quadrant position information. If those angles are in the
second quadrant, converting the θ in the first quadrant into the
second quadrant is done by a subtraction θ = 180 − θ.
As shown in Figure 2, we have already known one of Fig. 2. Dataflow diagram of the proposed method
two numbers in multiplication operations. It allows converting
all multiplications into form of shifts and additions. Table II TABLE II
C ONVERTING ALL MULTIPLY TO SHIFT AND ADDITION OPERATIONS WITH
shows all of the transformations with 8-bit length of fractional 8- BIT OF FRACTIONAL PART
part in sine and cosine. It is equivalent to multiply by 256. The
size of fractional part of magnitude is controlled by the sine, Shift and Shift and
B × dy A × dx
cosine values in the Table II. For example, if an application Addition Addition
needs a 10-bit length of fractional part, the cosine, sine values 4 22 11 23 + 21 + 20
have to multiply 1024 instead of 256. From Table II and
73 26 + 23 + 20 87 26 + 24 + 23 − 20
Figure 2, the total number of operations to identify two feature
168 27 + 25 + 23 97 26 + 25 + 20
bins of a pixel are about 30 additions and nearly 40 shifts.
17 24 + 20 3 21 + 20
IV. H ARDWARE IMPLEMENTATION AND EXPERIMENTAL cos(10) × 256 28 − 22 sin(10) × 256 25 + 23 + 2 2
RESULTS
cos(30) × 256 28 − 25 − 21 sin(30) × 256 27
We implemented the proposed method into hardware and cos(50) × 256 27 + 25 + 22 + 20 sin(50) × 256 27 + 26 + 2 2
examined its results. Figure 3 shows the hardware module
cos(70) × 256 26 + 24 + 23 sin(70) × 256 28 − 24 + 20
of the proposed method. It includes 4 input pixels I(x +
1, y), I(x−1, y), I(x, y+1), I(x, y−1) with 8-bit pixel length. cos(90) × 256 0 sin(90) × 256 28
1 15
The output contains the magnitude and the direction of two sin(20)
2+ 16
bins. Angle is 4-bit length data, and magnitude is 17-bit length
data with 8-bit of fractional part.
As shown in Figure 3, the module includes 4 steps. Firstly, The fourth step produces two voted bins, including orientation
the difference dx and dy are computed from the 4 input and magnitude.
pixels. In the second step the absolute values of dx , dy are Figure 4 presents the detail of the fourth steps. It has two
computed, and the quadrant position of the gradient of pixel multiplexers. The first multiplexer has 4 inputs from 4 angle
I(x, y) is detected. After having those absolute values, the pre- comparisons. This multiplexer chooses the highest angle θ in
calculating step calculates all the multiplications in Table II. the first quadrant. For each quantized angle, it also produces
clk 𝑑𝑥
4 𝜃1
rst / 𝑑𝑦 𝑑𝑥′ − 𝑑𝑥
4
𝑒𝑥 =
𝜃2 𝑑𝑥
𝐼(𝑥 + 1, 𝑦) 𝑑𝑥 , 𝑑𝑦 , Angle and /
𝑑′𝑦 = 𝐵1 × 𝑠𝑖𝑛𝜃1 + 𝐵2 × 𝑠𝑖𝑛𝜃2
𝐵 × |𝑑𝑦 |
𝑑𝑥 , 𝑑𝑦 𝑞𝑢𝑎𝑑𝑟𝑎𝑛𝑡 𝐼𝐼 Magnitude 𝑑𝑦′ − 𝑑𝑦
𝐴 × |𝑑𝑥 |
𝐼(𝑥 − 1, 𝑦) Calculation 17 𝑑′𝑥 = 𝐵1 × 𝑐𝑜𝑠𝜃1 + 𝐵2 × 𝑐𝑜𝑠𝜃2 𝑒𝑦 =
𝐵1 𝑑𝑦
/
𝐼(𝑥, 𝑦 + 1) 17 𝐵2
/
𝐼(𝑥, 𝑦 − 1) Step 1 Step 2 Step 3 Step 4 Fig. 5. Error of calculation model
corresponding gradient magnitude from Equation 20 and 21. In this paper, we have proposed a new method to do
The second multiplexer converts the θ in first quadrant into cell histogram generation part without calculating the actual
the second quadrant if the sign(dx ) is opposite sign(dy ). gradient of a pixel. It identifies the two bins by the prop-
erties of decomposing gradient of a pixel into two standard
1 vectors. This method extracts HOG features of a pixel with
𝑞𝑢𝑎𝑑𝑟𝑎𝑛𝑡 𝐼𝐼 / 30 additions and 40 shifts. The error of the reconstructed
values of dx and dy from the two voted bins are always less
1 4 𝜃
4 1 than 2%. Additionally, it is very flexible, we only need to
17 × 𝑑𝑦 ≥ 3 × 𝑑𝑥 / /
/ define new constants for sine and cosine values of quantized
1 4 4
𝜃 angles to change the precision. The hardware implementation
168 × 𝑑𝑦 ≥ 97 × 𝑑𝑥 / / / 2
of the proposed method occupies about 3.57 KGEs with 45nm
1 17
𝐵1 NanGate standard cell library, at the maximum frequency
73 × 𝑑𝑦 ≥ 87 × 𝑑𝑥 / / 17
of 400MHz, and the throughput of 0.4 pixel/ns for a single
1 /
𝐵2 module.
4 × 𝑑𝑦 ≥ 11 × 𝑑𝑥 /
R EFERENCES
Fig. 4. Submodule angle and magnitude calculation
[1] N. Dalal and B. Triggs, “Histograms of Oriented Gradients for Human
Table III shows the hardware implementation results of Detection,” in IEEE Computer Society Conference on Computer Vision
some previous works and our proposed method. Pei-Yin and and Pattern Recognition (CVPR’05), vol. 1, pp. 886–893. [Online].
Available: http://ieeexplore.ieee.org/document/1467360/
Hsiao’s works calculated the gradient of current pixels with [2] K. Negi, K. Dohi et al., “Deep pipelined one-chip FPGA implementation
approximate model via the SRA approximation for magnitude of a real-time image-based human detection algorithm,” in 2011
and CORDIC for angle. Munteanu combined Equation 10 and International Conference on Field-Programmable Technology, pp. 1–8.
[Online]. Available: http://ieeexplore.ieee.org/document/6132679/
bin’s boundary angles method. As shown in the Table III, the
[3] K. Takagi, K. Mizuno et al., “A sub-100-milliwatt dual-core
proposed method and Munteanu’s works can run at the highest HOG accelerator VLSI for real-time multiple object detection,”
frequency of 400Mhz. They are also the two smallest areas cost in 2013 IEEE International Conference on Acoustics, Speech
of implementation. However, the area of Munteanu’s module and Signal Processing, pp. 2533–2537. [Online]. Available: http:
//ieeexplore.ieee.org/document/6638112/
is still more than five of our proposed method. [4] K. Mizuno, Y. Terachi et al., “Architectural Study of
HOG Feature Extraction Processor for Real-Time Object
TABLE III Detection,” in 2012 IEEE Workshop on Signal Processing
H ARDWARE COMPARISON OF C ELL H ISTOGRAM G ENERATION Systems. [Online]. Available: https://pdfs.semanticscholar.org/f2c3/
9ff8ee36aca20915f6ce7195317b4e1a34cd.pdf
Publications Technology Areas (KGates) Frequency [5] R. Kadota, H. Sugano et al., “Hardware Architecture for HOG
Pei-Yin [9] 130nm 85.5 167Mhz Feature Extraction.” IEEE, pp. 1330–1333. [Online]. Available:
Hsiao [8] 90nm 29.1 N/A http://ieeexplore.ieee.org/document/5337209/
Munteanu [10] N/A 20 400Mhz [6] F. N. Iandola, M. W. Moskewicz et al., “libHOG: Energy-Efficient
Our Proposed 45nm NanGate 3.57 400Mhz Histogram of Oriented Gradient Computation,” in IEEE 18th
International Conference on Intelligent Transportation Systems, pp.
1248–1254. [Online]. Available: http://ieeexplore.ieee.org/document/
Figure 5 shows our error of calculation model for the 7313297/
proposed method. From the voted bin, we calculate a model [7] A. Suleiman and V. Sze, “An Energy-Efficient Hardware Implementation
of dx and dy by Equation 18 and 19, called d0x and d0y . With of HOG-Based Object Detection at 1080HD 60 fps with Multi-
Scale Support,” vol. 84, no. 3, pp. 325–337. [Online]. Available:
the d0x , d0y , dx , dy , the percentage of calculation errors at both http://link.springer.com/10.1007/s11265-015-1080-7
Ox and Oy directions is computed. Tested with all cases, the [8] S.-F. Hsiao, J.-M. Chan et al., “Hardware design of histograms of
results prove that our proposed method provides the percentage oriented gradients based on local binary pattern and binarization,”
in 2016 IEEE Asia Pacific Conference on Circuits and Systems
of calculation errors ex ey , which are always less than 2% with (APCCAS). IEEE, pp. 433–435. [Online]. Available: http://ieeexplore.
8-bit length of the fractional part. ieee.org/document/7803995/