
Optimizing the Embedded Platform using OpenCV

February 17, 2012


Matt Weber
(mlweber1@rockwellcollins.com)

• Goals
• Project Approach & Results
• Future Ideas
• References

Goals
• To quantify the effects of the many available optimizations and see what effect, if any, power management has
• Most Important Requirements (MIRs)
  - Minimal startup and low-latency processing time
  - On-demand Power Management
• Background
  - Utilized an OMAP3 processor for image processing
  - Linux 2.6.39.4 kernel with OMAP PM patches
  - Buildroot w/ Crosstool-ng toolchain

Project Approach
• Cost/Benefit
  - Compiler + Co-Processor + Power Management + Specialized Cores
  - Supporting software (which kernel, packages, vendor libraries, etc.)
• Define benchmarking tool
• Gather metrics for optimization methods applied to
  - Platform (Kernel/rootfs)
  - Application
  - With power management active

Project Approach: Compiler/Toolchain
Know your toolchain!
• Gotchas
  - Is binary compatibility across architectures (armv5, v6, v7a, ...) masking a problem?
  - Are your Platform & App using the same toolchain?
  - Are features like VFP (Vector Floating Point) & the Advanced SIMD extension (aka NEON) enabled?
• Building your own has some additional benefits
  - Source control & ability to recreate/fix issues
  - Geared towards your CPU arch & hardware FPU
  - Could tailor kernel headers to get a newer feature
  - Possibly incorporate the latest Linaro GCC
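
One quick way to approach the "are VFP and NEON enabled?" question on a running ARM Linux target is to look at the Features line of /proc/cpuinfo. The sketch below is only an illustration: the helper name `cpu_features` and the sample text are assumptions, not something taken from the talk, and a feature being listed by the kernel does not by itself prove the toolchain emitted code for it.

```python
def cpu_features(cpuinfo_text):
    """Return the set of flags on the first 'Features' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "features":
            return set(value.split())
    return set()


# Illustrative sample only; it was not captured from the board used in the talk.
SAMPLE = (
    "Processor\t: ARMv7 Processor rev 2 (v7l)\n"
    "Features\t: swp half thumb fastmult vfp edsp neon vfpv3\n"
)

features = cpu_features(SAMPLE)
for flag in ("vfp", "neon"):
    print(flag, "present" if flag in features else "missing")
```

On the target itself, the same function could be fed the real file: `cpu_features(open("/proc/cpuinfo").read())`.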

Project Approach: Benchmarking Tool
• OpenCV 2.1
  - cvMatchTemplate() algorithm as the test case
      cvMatchTemplate( img, tpl, res, CV_TM_CCORR_NORMED );
  - Lots of matrix math
  - Each time measurement covers only the algorithm execution, not the image load time
  - A 5.5MB image is searched for the image of a small boat
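
To make the "lots of matrix math" point concrete, here is a pure-Python sketch of normalized cross-correlation, the measure selected by CV_TM_CCORR_NORMED: each output cell is the sum of products of template and window pixels, divided by the square root of the product of their summed squares. This is not the OpenCV implementation and not the benchmark from the talk; the function names and the tiny 4x4 test image are made up for illustration. Only the matching call is timed, mirroring the slide's rule of excluding image load time.

```python
import math
import time


def ccorr_normed(image, templ):
    """Normalized cross-correlation of templ against every window of image."""
    ih, iw = len(image), len(image[0])
    th, tw = len(templ), len(templ[0])
    templ_energy = sum(v * v for row in templ for v in row)
    result = []
    for y in range(ih - th + 1):
        out_row = []
        for x in range(iw - tw + 1):
            cross = 0.0
            window_energy = 0.0
            for j in range(th):
                for i in range(tw):
                    p = image[y + j][x + i]
                    cross += p * templ[j][i]
                    window_energy += p * p
            denom = math.sqrt(templ_energy * window_energy)
            out_row.append(cross / denom if denom else 0.0)
        result.append(out_row)
    return result


def best_match(result):
    """Return (x, y) of the highest score in the result map."""
    return max((v, (x, y)) for y, row in enumerate(result) for x, v in enumerate(row))[1]


# Hypothetical data: the template is embedded at x=2, y=1.
image = [
    [1, 1, 1, 1],
    [1, 1, 9, 2],
    [1, 1, 3, 8],
    [1, 1, 1, 1],
]
templ = [[9, 2], [3, 8]]

start = time.process_time()          # time only the algorithm, not data setup
scores = ccorr_normed(image, templ)
elapsed = time.process_time() - start

print("best match at", best_match(scores), "in", elapsed, "s of CPU time")
```

The four nested loops show why the operation is a good stress test: the work grows with image area times template area, and the inner multiply-accumulate is the kind of arithmetic a SIMD unit can vectorize.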

Project Approach: Metrics Test #1
• Test: Compiler Optimization
• Description: Kernel and rootfs are built with the same flags below and executed off an SD card.
• Flags:
    CFLAGS = -pipe -O3
• Result: ~19.35 sec @ 800MHz

Project Approach: Metrics Test #2
• Test: Compiler Optimization & use of hardware co-processors
• Description: Kernel and rootfs are built with the same flags below and executed off an SD card.
• Flags:
    CFLAGS = -pipe -O3 -mfpu=neon -ftree-vectorize -mfloat-abi=softfp
• Result: ~4.91 sec @ 800MHz
  ~75% reduction in processing time (about 3.9x faster)
[Figure: bar chart comparing O3 and O3 w/Neon]
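
The ~75% figure follows directly from the two measured times. A minimal sketch of the arithmetic, using the 19.35 s and 4.91 s results reported in Tests #1 and #2 (the function names are just for illustration):

```python
def percent_reduction(baseline_s, optimized_s):
    """Percentage of the baseline processing time that was eliminated."""
    return (baseline_s - optimized_s) / baseline_s * 100.0


def speedup_factor(baseline_s, optimized_s):
    """How many times faster the optimized run is than the baseline."""
    return baseline_s / optimized_s


o3_s = 19.35       # Test #1: -O3 only
o3_neon_s = 4.91   # Test #2: -O3 with NEON flags

print(round(percent_reduction(o3_s, o3_neon_s), 1), "% less time")
print(round(speedup_factor(o3_s, o3_neon_s), 2), "x faster")
```

Keeping the two measures separate avoids a common ambiguity: a 75% reduction in time corresponds to roughly a 4x speedup, not a 1.75x one.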

Project Approach: Metrics Test #3
• Test: Compiler Optimization & Power Management
• Description: Kernel and rootfs are built with the same flags below. Power management is enabled to idle and frequency-scale the CPU on demand between 300 and 800MHz. It uses the default scaling trigger threshold for the 2.6.39.4 kernel.
  (Note: Purely ARM core instructions.)
• Flags:
    CFLAGS = -pipe -O3
• Result: ~19.39 sec @ 300-800MHz
  ~40 msec (0.2%) increase in processing time w/ PM
• Comment: Running solely ARM instructions drives demand for a higher clock speed earlier, so power management adds only a small amount of processing time.
[Figure: bar chart of Time (Seconds) for O3 vs. O3 w/PM, axis range 19.33 to 19.4]
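
The "scaling trigger threshold" idea can be illustrated with a deliberately simplified model of an on-demand style decision: jump to the top frequency when load exceeds the threshold, otherwise pick the lowest frequency that would keep load under it. This is not the kernel's ondemand governor code. The function name, the threshold value of 80, and the restriction to just the two frequencies named on the slide are all assumptions made for the sketch.

```python
def ondemand_target(load_pct, current_khz, freqs_khz, up_threshold=80):
    """Simplified on-demand style choice of the next CPU frequency (kHz).

    load_pct     -- busy percentage observed at current_khz
    up_threshold -- hypothetical trigger threshold; not the kernel default
    """
    freqs = sorted(freqs_khz)
    if load_pct > up_threshold:
        return freqs[-1]                       # busy: go straight to max
    # Frequency at which the same work would sit right at the threshold.
    needed_khz = current_khz * load_pct / up_threshold
    for f in freqs:
        if f >= needed_khz:
            return f                           # lowest frequency that suffices
    return freqs[-1]


FREQS = [300000, 800000]   # the 300 and 800MHz endpoints from the slide

print(ondemand_target(95, 300000, FREQS))   # heavy load while at 300MHz
print(ondemand_target(20, 800000, FREQS))   # light load while at 800MHz
```

In this model a CPU-bound benchmark crosses the threshold almost immediately, which is consistent with the small penalty reported above: only the short interval before the first scale-up runs at the lower clock.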

Project Approach: Metrics Test #4
• Test: Compiler Optimization, co-processors and Power Management
• Description: Kernel and rootfs are built with the same flags below. Power management is enabled to idle and frequency-scale the CPU on demand between 300 and 800MHz. It uses the default scaling trigger threshold for the 2.6.39.4 kernel.
  (Note: ARM core and NEON instructions.)
• Flags:
    CFLAGS = -pipe -O3 -mfpu=neon -ftree-vectorize -mfloat-abi=softfp
• Result: ~5.12 sec @ 300-800MHz
  ~210 msec (4%) increase in processing time w/ PM
• Comment: Because the NEON core offloads some of the processing, less time is spent executing ARM instructions, which causes more execution at 300MHz and a slight increase in processing time.
[Figure: bar chart of Time (Seconds) for O3 w/Neon vs. O3 w/Neon & PM, axis range 4.8 to 5.15]

Project Approach: Future Tests
• Finish testing with DSP and TI Codec Engine
  - Initial tests with CMEM, LPM, DSPLINK, TI Codec Engine are working
  - Issues were found with the C6Accel used in SoC OpenCV DSP work (newer TI libraries, kernel and compiler issues...)
  - TI measurements with Integra SOC (floating point DSP) show an 86% speed up for the match template algorithm

Project Approach: Performance Metric Summary
The key to the next step is controlling offloading overhead

Test                           Result (sec)
#1  -O3                        19.35
#2  -O3 & Neon                 4.91
#3  -O3 w/ PM                  19.39
#4  -O3 & Neon w/ PM           5.12
#5  -O3 & Neon w/ PM & DSP     Est. ~3.07

Project Approach: Power Management Test
• Tools: bench power supply and data-logging multimeter
• Startup board (power supply is set to a 1A limit at 5V)
• First test is on-demand
    [root@buildroot ~]# echo "800000" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
    [root@buildroot ~]# echo "ondemand" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
    cpufreq-omap: transition: 800000 --> 300000
    [root@buildroot ~]# ./opencv_templatematch
    WORKING>>>
    cpufreq-omap: transition: 300000 --> 800000
    5.120000 seconds of processing
    t1: 320000 t2: 5440000
    Clockspersec: 1000000
    cpufreq-omap: transition: 800000 --> 300000
    [root@buildroot ~]#
• Second test is userspace set frequency
    [root@buildroot ~]# echo "userspace" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
    [root@buildroot ~]# echo "800000" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_setspeed
    cpufreq-omap: transition: 300000 --> 800000
    [root@buildroot ~]# ./opencv_templatematch
    WORKING>>>
    4.910000 seconds of processing
    t1: 110000 t2: 5020000
    Clockspersec: 1000000
    [root@buildroot ~]#
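
The two console sessions above can be scripted. The sketch below wraps the same sysfs writes (scaling_max_freq, scaling_governor, scaling_setspeed) in small helpers and reproduces the elapsed-time arithmetic from the t1/t2/Clockspersec lines of the log. The helper names are hypothetical, and the demonstration writes to a scratch directory instead of the real /sys path, since writing to sysfs needs root on a target with cpufreq support.

```python
import os
import tempfile

# Real location on the target, as used in the console sessions above.
CPUFREQ_DIR = "/sys/devices/system/cpu/cpu0/cpufreq"


def write_attr(cpufreq_dir, name, value):
    """Write one cpufreq attribute, equivalent to `echo value > dir/name`."""
    with open(os.path.join(cpufreq_dir, name), "w") as f:
        f.write(str(value))


def use_ondemand(cpufreq_dir, max_khz):
    """First test: cap the frequency and let the ondemand governor scale."""
    write_attr(cpufreq_dir, "scaling_max_freq", max_khz)
    write_attr(cpufreq_dir, "scaling_governor", "ondemand")


def use_fixed(cpufreq_dir, khz):
    """Second test: userspace governor pinned to one frequency."""
    write_attr(cpufreq_dir, "scaling_governor", "userspace")
    write_attr(cpufreq_dir, "scaling_setspeed", khz)


def elapsed_seconds(t1, t2, clocks_per_sec):
    """Processing time from two clock-tick readings, as printed in the log."""
    return (t2 - t1) / clocks_per_sec


# Dry run against a scratch directory (assumption: stands in for CPUFREQ_DIR).
scratch = tempfile.mkdtemp()
use_fixed(scratch, 800000)
print(open(os.path.join(scratch, "scaling_governor")).read())

# Values taken from the on-demand log above.
print(elapsed_seconds(320000, 5440000, 1000000), "seconds of processing")
```

On the board, the same helpers would be called with `CPUFREQ_DIR` in place of `scratch`.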

Project Approach: Initial Power Measurements
• Note: the DSP adds an additional ~375mW, shown in yellow, & prevents the ARM from scaling up to 800MHz. The chart shows only an estimate of DSP power draw[5] and an approximate timeline from TI whitepaper findings.
• If an OMAP GPU option was added, the approx. power draw would increase by ~90mW. We're not sure yet how much overhead this would cause on the ARM...
[Figure: BeagleBoard-xM - OpenCV Template Match Power Draw. Power (watts) vs. Time (Seconds); series: ARM@800MHz, ARM@300-800MHz, Est. ARM@300-800MHz & DSP; annotation: Orig. Processing]
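
Power draw alone does not settle the trade-off; what matters for a battery is energy, the area under the power-versus-time curve. Assuming the data-logging multimeter yields (time, power) samples, a trapezoidal sum gives an estimate in joules. The sample values below are invented for illustration and are not measurements from the talk.

```python
def energy_joules(samples):
    """Trapezoidal integral of (time_s, power_w) samples, in joules."""
    total = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        total += (p0 + p1) / 2.0 * (t1 - t0)
    return total


# Hypothetical log: idle, a two-second burst of processing, back to idle.
samples = [(0.0, 1.0), (1.0, 2.0), (2.0, 2.0), (3.0, 1.0)]

print(energy_joules(samples), "J")
```

Comparing this number across configurations shows whether a faster run at higher power (for example with a DSP or GPU enabled) costs more or less total energy than a slower run at lower power.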

Future Ideas
• Investigate the new issues of Power Management in a multi-core world
  - How could load statistics be maintained for dynamic power control across cores?
  - Maybe add hooks into the existing CPUFreq framework for on-demand scaling based on anticipated completion from other cores?
  - What if Linux on the primary CPU(s) were suspended while the offloaded task is being processed?

Future Ideas
• GSoC project: OpenCV DSP Acceleration (2010)
  - Investigate OpenCV code issues (lots of floating point and STL)
  - Gather power, timing and latency/IPC overhead numbers using the TI Codec Engine approach
  - Possibly implement custom DSP approach based on results
• GPU
  - Investigate (future) SGX Graphics SDK with OpenCL support
  - Currently the only published vendors supporting OpenCL are ZiiLABS (ZMS SOC) and TI (OMAP5)

Project Information
• Hardware
  - BeagleBoard-xM
  - (optional) LI-5M03 camera
• Repository & Wiki
  Includes xloader, uboot, sdcard scripts, kernel & rootfs, test sequences
  git://github.com/matthew-l-weber/buildroot.git
  https://github.com/matthew-l-weber/buildroot/wiki
• Buildroot Overview
  http://free-electrons.com/pub/conferences/2011/elce/using-buildroot-real-project.pdf

Credits/References
[1] http://www.ti.com/lit/wp/spry175/spry175.pdf
[2] http://www.ti.com/lit/wp/spry144/spry144.pdf
[3] https://code.google.com/p/opencv-dsp-acceleration/wiki/GettingStarted1
[4] http://old.nabble.com/Request-for-comments-on-packages-for-TI%27s-OMAP3-and-DM365-processors-td29741226.html
[5] http://processors.wiki.ti.com/index.php/OMAP3530_Power_Estimation_Spreadsheet
[6] http://www.sakoman.com/OMAP/an-overiew-of-omap3-power-management-with-2639-pm.html
[7] http://www.ti.com/general/docs/wtbu/wtbugencontent.tsp?templateId=6123&navigationId=11988&contentId=4638
