
2019 XXII Symposium on Image, Signal Processing and Artificial Vision (STSIVA)

Improving the Efficiency of Gabriel Graph-based Classifiers for Hardware-optimized Implementations
Alan C. Souza∗, Cristiano Leite Castro∗, Janier Arias Garcia∗, Luiz C. B. Torres∗†, Leidy J. Acevedo Jaimes‡, Brayan R. A. Jaimes∗
∗ Graduate Program in Electrical Engineering, Federal University of Minas Gerais, Belo Horizonte, Brazil
† Department of Computing and Systems, Federal University of Ouro Preto, Brazil
‡ Electricidad & Electronica, Universidad Francisco de Paula Santander, Cucuta, Colombia

Email: {alansouza, crislcastro, janier-arias, luiztorres, payo}@ufmg.br, leidyjuliethaj@ufps.edu.co

Abstract—This work evaluates strategies to reduce the computational cost of Gabriel graph-based classifiers in order to make them more suitable for hardware implementation. An analysis of the impact of bit precision provides insight into the models' robustness under lower-precision arithmetic. Additionally, a parallelization technique is proposed to improve the efficiency of the support-edge computation. The results show that the lower bit precision models are statistically equivalent to the reference double-precision ones. Also, the proposed parallel algorithm provides a significant reduction in running time when applied to large datasets, while maintaining accuracy.

Index Terms—Classification, Gabriel graph, FPGA, numerical representation, parallel computing.

This study was financed by the Minas Gerais State Agency for Research and Development (FAPEMIG) and the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001.

I. INTRODUCTION

In the last few years there has been growing effort towards the development of custom hardware architectures able to support new Machine Learning models in an optimal and efficient manner [1]. For embedded systems applications, there are constraints to be considered in the design flow, such as power consumption and memory resources, among others.

In this context, FPGA (Field Programmable Gate Array) based implementations have stood out among other hardware accelerators, such as Graphics Processing Units (GPUs). This is mainly due to their intrinsic parallelism as well as their low energy consumption [2], [3].

Another key factor to be considered in the design of hardware-based implementations of Machine Learning algorithms is the numerical representation, since this decision usually leads to a trade-off among model accuracy, resource consumption, and performance [4]. Hence, for applications that are more sensitive to model accuracy, a higher bit precision is often implemented. However, some studies have shown robustness even when a lower precision is adopted [5].

Recently, a new binary classifier (CHIP-clas) based on the Gabriel graph (GG) structure was proposed in [6], [7]. CHIP-clas is suitable for online learning since its training process occurs in a single pass, without the need to solve an optimization problem. However, its implementation in embedded systems is not straightforward: it has already been shown that the number of arithmetic operations is still high (it scales with the number of samples), demanding resource consumption that grows exponentially when implemented on an FPGA with single-precision floating-point representation [8].

In order to reduce the computational cost associated with the construction of the Gabriel graph, making CHIP-clas more suitable for implementation on FPGAs, this paper explores the trade-off between accuracy and bit precision for floating-point arithmetic. Also, a method that decomposes the construction of the Gabriel graph into chunks of data is evaluated.

The structure of the present paper is as follows. The CHIP-clas algorithm and related work are presented in Section II. Section III presents the methodology applied to compare lower bit precisions, as well as the proposed parallelization technique. Section IV describes the results and Section V presents the conclusions.

II. BACKGROUND

A. Support-edge Based Classifiers

The family of classifiers that uses the data structure information to find the separation border among classes is known as CLAS (support-edge classifiers) [9]. This information is obtained by computing the GG from the training data set. As reported in [10], two vertices A and B form an edge if and only if there is no other vertex inside the circle whose diameter is the segment AB; this statement generalizes to a hypersphere that has A and B as its diameter. Let V = {x_i ∈ K | i = 1, 2, ..., N} be the set of vertices corresponding to the training samples. The condition for a given edge (x_i, x_j) to be part of a Gabriel graph is presented in (1):

δ²(x_i, x_j) ≤ δ²(x_i, x_z) + δ²(x_j, x_z),   (1)

∀ x_z ∈ V with i ≠ j ≠ z, where δ(·) is the operator that computes the Euclidean distance between two vertices. The Gabriel graph for a binary classification problem is shown in Fig. 1(a).

The support edges (S) are formed by the subset of edges that connect two vertices with different class labels, as shown in Fig. 1(b). The GG computational cost is O(dn²) in the best case and O(dn³) in the average and worst cases [11], where n is the number of dataset samples and d is the dimension. However, by using the Delaunay triangulation structure, this complexity can be decreased to O(n log n) in the plane [12].
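For concreteness, the condition in (1) can be realized as a brute-force construction; the sketch below, in Python with NumPy, is illustrative only and not the authors' implementation (the function name and the O(dn³) double loop are assumptions for exposition).

    import numpy as np

    def gabriel_graph(X):
        # Brute-force Gabriel graph from condition (1): keep edge (i, j)
        # iff no other vertex z lies strictly inside the hypersphere
        # whose diameter is the segment x_i x_j.
        n = X.shape[0]
        # Pairwise squared Euclidean distances, shape (n, n).
        d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
        adj = np.zeros((n, n), dtype=bool)
        for i in range(n):
            for j in range(i + 1, n):
                z = np.delete(np.arange(n), [i, j])
                # Condition (1): d2(i,j) <= d2(i,z) + d2(j,z) for all z.
                if np.all(d2[i, j] <= d2[i, z] + d2[j, z]):
                    adj[i, j] = adj[j, i] = True
        return adj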



Fig. 1: (a) Gabriel graph; (b) support structure vertices marked with squares. The line segments between them are the support edges. (Both panels plot the two classes over the range [−2, 2] × [−2, 2].)

Fig. 2: Hyperplanes obtained by CHIP-clas. The result of the combination is represented by the "circle" between the classes.

B. Filtering Technique

In order to make Gabriel graph-based models more robust to data sets that may present class overlap, which is common in real-world applications, [7] designed a method to detect and eliminate overlapping training samples based on a quality measure proposed in [13]. This measure, presented in (2), considers the vertex degree, i.e., the number of edges incident to a vertex: A(x_i) is the vertex degree of x_i, and Â(x_i) is the number of vertices that share an edge with x_i and belong to the same class (label).

q(x_i) = Â(x_i) / A(x_i),   (2)
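As an illustration, the quality measure in (2) can be computed directly from the graph adjacency and the label vector. The sketch below reuses the gabriel_graph function from Section II-A; the 0.5 threshold rule is an assumption for illustration, not the exact criterion of [7].

    import numpy as np

    def quality(adj, y):
        # q(x_i) = Â(x_i) / A(x_i), as defined in (2).
        A = adj.sum(axis=1)                       # vertex degree A(x_i)
        same = adj & (y[:, None] == y[None, :])   # edges to same-class vertices
        A_hat = same.sum(axis=1)                  # Â(x_i)
        # Isolated vertices (degree 0) are given quality 0.
        return np.where(A > 0, A_hat / np.maximum(A, 1), 0.0)

    def filter_samples(X, y, threshold=0.5):
        # Hypothetical rule: drop samples whose Gabriel neighbourhood
        # is dominated by the other class.
        q = quality(gabriel_graph(X), y)
        keep = q >= threshold
        return X[keep], y[keep]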
C. Classification

The CHIP-clas algorithm can be considered a large-margin classifier and uses a hierarchical combination of hyperplanes to compute the separation surface among the classes, as presented in Fig. 2.

For embedded systems applications, the calculation of the hyperplanes that form the hierarchical combination may be highly expensive. Thus, a recent study proposed an alternative decision rule, namely NN-clas, in which the classification of a test sample is computed by considering only its nearest neighbor in the border region between the classes [14]. For details, see the pseudocode in Algorithm 1.

Algorithm 1 Classification NN-clas
input: the set of support edges S, their labels y_v, and the set of test samples X_t
output: the estimated labels ŷ for X_t
  for j in X_t do
    for i in S do
      compute the distance between the test sample and the vertices of the support-edge set:
      dist(i) ← δ(X_t(j), S(i))
    end for
    ŷ(j) ← label of the minimum-distance vertex in S
  end for
  return ŷ

In the classification step, the Euclidean distance between each test sample and the set of vertex pairs that form a support edge is computed. The test sample is assigned the class of the vertex with the minimum distance from it. This methodology presented results statistically equivalent to those of CHIP-clas [14].
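A minimal NumPy rendering of Algorithm 1 follows; the names are illustrative, with S holding the vertices that belong to some support edge and y_s their labels.

    import numpy as np

    def nn_clas_predict(S, y_s, X_test):
        # Algorithm 1: assign each test sample the label of its
        # nearest support-edge vertex.
        #   S      : (m, d) vertices belonging to a support edge
        #   y_s    : (m,) labels of those vertices
        #   X_test : (t, d) test samples
        d2 = np.sum((X_test[:, None, :] - S[None, :, :]) ** 2, axis=-1)
        nearest = np.argmin(d2, axis=1)   # index of the closest vertex
        return y_s[nearest]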
D. Related Work

In [8], an FPGA architecture was proposed for CHIP-clas using the previously presented NN-clas rule and implementing operations in standard IEEE-754 single-precision floating point. Although the computational complexity was reduced by using the NN-clas classification rule, the results presented an exponential growth in resource consumption (in terms of lookup tables and DSP blocks) as the number of data samples increases.

E. Numerical Representation

Although fixed-point arithmetic is more straightforward, since there is no need to deal with an exponent or scale factor, this representation has a much smaller range than floating point. As stated by [15], in floating-point arithmetic the hardware manages normalization and exponent settings, while in fixed-point arithmetic this must be handled by the user or designer, increasing the complexity of the hardware implementation.

III. METHODOLOGY

Despite the fact that the Gabriel graph can be computed in quasilinear time, as described in Section II-A, the amount of resources needed to compute the Delaunay triangulation structure makes that approach unfeasible for hardware-based implementations with memory restrictions. Hence, the methodology applied here computes the Euclidean distance between each data point and all the samples of the dataset.

The set of vertices that form the adjacency matrix is computed from (1); then the filtering technique described in Section II-B is used to remove noisy samples from the data. The adjacency matrix is computed again from the remaining samples, and the support edges are obtained from the set of edges whose vertices have different labels.

Finally, the estimated label of an arbitrary sample is obtained using the classification process described in Algorithm 1, since this approach demands fewer resources than the hyperplanes' hierarchical mixture presented in Section II-C.
A. Numerical Precision

In [8], single floating-point precision was chosen arbitrarily for the hardware implementation. In order to assess model performance at lower precision and to provide a decision-making framework for future designs, this work presents an analysis of the impact that bit-width precision has on model performance.

The NN-clas algorithm was implemented with two different bit precisions: the reference model, which uses IEEE-754 double floating-point precision, and a half-precision version, in which all data and operations were converted in Python to 16-bit precision.
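The text describes the conversion only as being set up in Python for 16-bit precision; one plausible realization, assumed here for illustration, is a cast to NumPy's IEEE-754 half-precision type before the graph construction and classification routines are run.

    import numpy as np

    def to_half(X):
        # Cast data to IEEE-754 half precision (binary16). Subsequent
        # NumPy operations on float16 arrays stay in half precision,
        # which mimics a 16-bit datapath.
        return np.asarray(X, dtype=np.float16)

    # Sketch of the half-precision pipeline: cast once, then reuse the
    # same routines as the double-precision reference model, e.g.:
    #   X16 = to_half(X_train)
    #   adj16 = gabriel_graph(X16)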
B. Parallelization Technique

Although [14] provides a solution that improves the CHIP-clas classification efficiency, its FPGA-based implementation still presents a resource demand that grows exponentially with the data size. This happens mainly because of the Gabriel graph construction process, which computes the Euclidean distances between all pairs of samples in the dataset.

The proposed method aims to exploit the intrinsic parallelism of the FPGA, implementing a framework to compute the adjacency matrix and obtain the set of support edges independently, by dividing the dataset into small random slots. This approach was inspired by the work of [16], where an incremental learning method based on the Gabriel graph classifier was presented; there, the data structure information provided by the graph was used to filter and classify data in a temporal manner, using information obtained from past data slots. To implement the parallelism, Python's concurrent.futures module was used. Let X_i be the i-th slot of the divided training data. Algorithm 2 describes the proposed parallelization technique: for each slot, the adjacency matrix is computed, and the output is the subset of vertices that belong to the support-edge set. All local support edges are then merged to become the new training data. With this final slot, the support edges are computed and the classification is carried out using the NN-clas approach. It is important to note that this process takes place after the data have been filtered using the technique described in Section II-B.

Algorithm 2 Parallel Support Edges Algorithm
  for each slot X_i do                        ▷ slots processed in parallel
    compute the local adjacency matrix A_i
    for each edge (v_i,1, v_i,2) ∈ A_i do
      if sign(v_i,1) ≠ sign(v_i,2) then
        Ŝ_i ← Ŝ_i ∪ {(v_i,1, v_i,2)}          ▷ local support edges Ŝ
      end if
    end for
  end for
  X_new ← merge of all local support edges Ŝ
  A ← adjacency matrix of X_new
  for each edge (v_j,1, v_j,2) ∈ A do
    if sign(v_j,1) ≠ sign(v_j,2) then
      S ← S ∪ {(v_j,1, v_j,2)}                ▷ global support edges S
    end if
  end for
  return S
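A compact sketch of Algorithm 2 using Python's concurrent.futures, which the text reports was the module employed, is given below. The helper names, slot count, and random seed are illustrative, and gabriel_graph is the sketch from Section II-A.

    import numpy as np
    from concurrent.futures import ProcessPoolExecutor

    def support_edge_vertices(X, y):
        # Vertices of Gabriel-graph edges whose endpoints have
        # different labels (the support edges).
        adj = gabriel_graph(X)
        i, j = np.where(np.triu(adj, k=1))
        cross = y[i] != y[j]
        keep = np.unique(np.concatenate([i[cross], j[cross]]))
        return X[keep], y[keep]

    def parallel_support_edges(X, y, n_slots=4, seed=0):
        # Algorithm 2: split the data into random slots, extract local
        # support-edge vertices in parallel, merge them, and compute
        # the global support edges on the merged set.
        order = np.random.default_rng(seed).permutation(len(X))
        slots = np.array_split(order, n_slots)
        with ProcessPoolExecutor() as pool:
            parts = list(pool.map(support_edge_vertices,
                                  [X[s] for s in slots],
                                  [y[s] for s in slots]))
        X_new = np.concatenate([p[0] for p in parts])
        y_new = np.concatenate([p[1] for p in parts])
        return support_edge_vertices(X_new, y_new)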
IV. RESULTS AND DISCUSSION

Table I outlines the average AUC-ROC results, along with the standard deviation, for 12 datasets from the UCI machine learning repository [17]. The AUC metric was chosen mainly because it remains informative on datasets with unbalanced class distributions. The data samples were normalized to the range [−1, 1], and the models were evaluated with 10-fold cross-validation in order to assess their generalization to different test sets. The dataset names are presented along with the number of samples (n) and dimensions (d).

The models are described as follows: NN-clas, presented in Section II-D, with double precision (NN-clas 64), is the reference model and serves as the statistical baseline for NN-clas with half precision (NN-clas 16). The parallelization technique proposed in Section III-B is also evaluated, for half and double precision (Par-clas 16 and Par-clas 64).

TABLE I: Average AUC results for the classifiers with half and double floating-point precision
Datasets          n     d   NN-clas16       NN-clas64       Par-clas16      Par-clas64
Banknote Auth.   1372   4   0.993 ± 0.009   1.000 ± 0.000   0.998 ± 0.003   1.000 ± 0.000
Austral. credit   690  14   0.815 ± 0.009   0.772 ± 0.056   0.808 ± 0.060   0.772 ± 0.056
Fertility         100   9   0.762 ± 0.194   0.762 ± 0.194   0.762 ± 0.194   0.762 ± 0.194
Haberman's S.     306   3   0.548 ± 0.158   0.477 ± 0.112   0.546 ± 0.110   0.484 ± 0.097
P.I. diabetes     768   8   0.656 ± 0.070   0.675 ± 0.090   0.642 ± 0.052   0.673 ± 0.090
Breast cancer     683   9   0.942 ± 0.034   0.955 ± 0.029   0.913 ± 0.055   0.951 ± 0.035
Climate M.S.C.    540  18   0.658 ± 0.093   0.663 ± 0.093   0.649 ± 0.080   0.663 ± 0.093
German Credit    1000  24   0.598 ± 0.066   0.625 ± 0.056   0.585 ± 0.064   0.625 ± 0.056
Parkinson         197  23   0.913 ± 0.023   0.913 ± 0.023   0.913 ± 0.023   0.913 ± 0.023
Sonar M.R.        208  60   0.902 ± 0.115   0.848 ± 0.117   0.902 ± 0.115   0.848 ± 0.117
Statlog heart     270  13   0.812 ± 0.060   0.778 ± 0.022   0.812 ± 0.060   0.778 ± 0.022
ILPD              583  10   0.557 ± 0.056   0.536 ± 0.079   0.537 ± 0.051   0.529 ± 0.081

Comparison of the average AUC measures was performed using Friedman's test, a robust non-parametric test for cases in which the homoscedasticity and normality assumptions about the data cannot be established with certainty.
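This comparison can be reproduced with SciPy's implementation of the test; in the sketch below, each variable is assumed to hold one model's column of 12 per-dataset average AUC values from Table I.

    from scipy.stats import friedmanchisquare

    # One argument per model; each is the 12 per-dataset average AUCs.
    stat, p_value = friedmanchisquare(auc_nn16, auc_nn64,
                                      auc_par16, auc_par64)
    # At a significance level of 0.01, p_value > 0.01 means the null
    # hypothesis of equal model performance cannot be rejected.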
It can be noticed that, at a significance level of 0.01, the results do not give any strong reason to reject the null hypothesis that there are no differences between the models' overall average results.

This result shows that classifiers based on the Gabriel graph structure are robust to lower bit precision arithmetic, since the half-precision models are statistically equivalent to the double-precision ones. Therefore, this finding can lead to hardware implementations that use no more than half of the memory resources yet perform at the same level as the double- and single-precision models, with less logic resource consumption.

TABLE II: Running time (s) for the NN-clas and parallel methods
Datasets     NN-clas16   NN-clas64   Par-clas16   Par-clas64
Bank Auth.   1,524.725   1,250.717    346.552      317.585
Aust. cr.      216.339     171.441    208.860      173.957
Fertility        2.074       1.986      4.002        3.954
Haberman        31.451      25.969     23.121       18.231
Diabetes       241.091     211.145    134.637      202.378
Brea. ca.      342.741     283.129     58.028       48.291
Climate         82.071      69.097     79.369       70.455
Germ. Cr.      545.567     330.612    532.542      355.853
Parkinson       10.074       9.062      8.477        7.935
Sonar           10.960       9.191     10.704        9.163
S. heart        18.204      16.921     22.138       20.448
ILPD           110.569      97.546     57.514       81.759
The results also show the feasibility of constructing the graph structure in parallel. This is very important for scaling these models to large datasets, since the computational cost grows steeply with the size of the data (up to O(dn³) in the worst case, as noted in Section II-A).

The running time (in seconds) for each dataset is presented in Table II. Experiments were run on a server with an Intel Xeon E5645 processor running at 2.40 GHz with 35 MB of RAM, running Linux version 4.15.0-43. Besides being statistically equivalent to the NN-clas results, the parallel method presented a significant decrease in running time for large datasets (those with more than 500 samples). For the Banknote Authentication dataset, with 1372 samples, the running time of the double-precision model decreased by more than 70% using the parallelization technique. A similar behaviour can be observed on the Breast Cancer dataset [18], with 683 samples.

However, for small datasets, the parallelization technique did not improve the models' efficiency. A deeper analysis can be made of the slot size: the methodology employed here computes the slot size as proportional to the number of samples in the data. Hence, for datasets with few samples, the parallel algorithm splits the data into fewer slots, because of the limited number of samples available for the graph computation. For that reason, its running time was similar to that of NN-clas on those datasets.

An analysis of the running-time results also shows that the double-precision models run faster. This can be explained by the fact that the half-precision models have an additional step to convert the data samples to 16-bit precision before computing the adjacency matrix and all the model's functions. However, this step does not affect a hardware implementation.

V. CONCLUSION

This work presented an analysis of the impact of decreased bit-precision arithmetic on the models' AUC results. Since the NN-clas algorithm did not present any statistically significant decrease in AUC under half-precision arithmetic, future FPGA and other hardware architectures for the Gabriel graph-based family of classifiers can be designed using lower bit precision arithmetic. Consequently, the chip size and memory demand of those implementations can be reduced.

The parallelization technique also presented statistically equivalent results and can be applied to Gabriel graph-based models in order to improve their efficiency and scalability on large datasets. Future work could extend this approach by analyzing the impact of the number of slots on model efficiency. An analysis of model performance using fixed-point precision could also be carried out. Finally, a new architecture for an FPGA-based implementation can be proposed, exploiting lower bit precision arithmetic and the parallelization method presented here.
REFERENCES
[1] Jeff Johnson. Rethinking floating point for deep learning. arXiv preprint
arXiv:1811.01721, 2018.
[2] Kalin Ovtcharov, Olatunji Ruwase, Joo-Young Kim, Jeremy Fowers,
Karin Strauss, and Eric S Chung. Accelerating deep convolutional neural
networks using specialized hardware. Microsoft Research Whitepaper,
2(11), 2015.
[3] Miquel Vidal, Beñat Arejita, Javier Diaz, Carlos Alvarez, Daniel
Jiménez-González, Xavier Martorell, Filippo Mantovani, et al. Imple-
mentation of the k-means algorithm on heterogeneous devices: a use case
based on an industrial dataset. In Parallel Computing is Everywhere
(serie: Advances in Parallel Computing), volume 32, pages 642–651.
IOS Press, 2018.
[4] Gene Frantz and Ray Simar. Comparing fixed- and floating-point DSPs.
Texas Instruments, Dallas, TX, USA, 2004.
[5] Suyog Gupta, Ankur Agrawal, Kailash Gopalakrishnan, and Pritish
Narayanan. Deep learning with limited numerical precision. In
International Conference on Machine Learning, pages 1737–1746, 2015.
[6] LCB Torres, CL Castro, F Coelho, F Sill Torres, and AP Braga.
Distance-based large margin classifier suitable for integrated circuit
implementation. Electronics Letters, 51(24):1967–1969, 2015.
[7] LCB Torres. Classificador por arestas de suporte (CLAS): Métodos
de aprendizado baseados em grafos de Gabriel. PhD thesis, Universidade
Federal de Minas Gerais, Brasil, 2016.
[8] Liliane dos Reis Gade. Estudo e desenvolvimento arquitetural para
implementação de um classificador geométrico de margem larga em
sistemas embarcados. Master’s thesis, Universidade Federal de Minas
Gerais, Brasil, 2018.
[9] Igor Pereira Gomes, Luiz Carlos Bambirra Torres, and Antônio
de Pádua Braga. Aprendizado de métrica supervisionado para classi-
ficador por arestas de suporte.
[10] K Ruben Gabriel and Robert R Sokal. A new statistical approach to
geographic variation analysis. Systematic zoology, 18(3):259–278, 1969.
[11] Wan Zhang and Irwin King. A study of the relationship between support
vector machine and gabriel graph. In Proceedings of the IEEE World
Congress on Computational Intelligence, International Joint Conference
on Neural Networks, 2002.
[12] David W Matula and Robert R Sokal. Properties of gabriel graphs
relevant to geographic variation research and the clustering of points in
the plane. Geographical analysis, 12(3):205–222, 1980.
[13] Michaël Aupetit and Thibaud Catz. High-dimensional labeled data
analysis with topology representing graphs. Neurocomputing, 63:139–
169, 2005.
[14] Liliane dos Reis Gade, Cristiano Leite de Castro, Luiz Carlos Bambirra
Torres, Frederico Gualberto Ferreira Coelho, Antônio Pádua Braga,
Janier Arias Garcia, and Frank Sill Torres. NN-clas: classificador
geométrico de margem larga baseado na regra do vizinho mais próximo.
2017.
[15] Christopher Inacio and Denise Ombres. The dsp decision: Fixed point
or floating? IEEE Spectrum, 33(9):72–74, 1996.
[16] Marcus Vinícius de Freitas Diadelmo. Aprendizado incremental com
memória parcial via grafo de gabriel. Master's thesis, Universidade
Federal de Minas Gerais, Brasil, 2016.
[17] Dua Dheeru and Efi Karra Taniskidou. UCI machine learning repository,
2017.
[18] Olvi L Mangasarian. Cancer diagnosis via linear programming. SIAM
news, 23(5):1–18, 1990.
