Bo Yang, Liang Guang, Tero Säntti, Juha Plosila

Parameter-Optimized Simulated Annealing for Application Mapping on Networks-on-Chip

Outline
• Introduction
• Application Mapping
• Implementation of SA
• Nelder-Mead Simplex Method
• Experiment and Analysis
• Conclusion

Introduction
Moore's Law is still valid (ITRS's perspective)
What could we do with billions of transistors?
• Tens to hundreds of cores on a single chip
• 80-core Intel Terascale Chip
• Tilera TILE-Gx Family with 16 to 100 processing cores
• Intel: Why a 1,000-core chip is feasible. ZDNet, 2010
Manycore architecture has become the mainstream for parallel computing.

Introduction
Major Concern
• Communication, instead of computation
• Great impact on performance and energy consumption
• Conventional bus, point-to-point connections: bottleneck
Networks-on-Chip (NoC)
• Better scalability
• Higher reliability
• More reusability

Application Mapping
Application
• A set of concurrent tasks
• Modeled by the communication weighted graph (CWG)
Many-core NoC
• A set of tiles and links
• Modeled by the computation and communication resource graph (CCRG)

Application Mapping
The role of application mapping is to determine how to place each task on a tile of the NoC so that the specific design interests and constraints are fulfilled.
Objective: a mapping solution that minimizes the communication energy consumption.

Application Mapping
Energy model of NoC [Jingcao2005]
• The energy consumed by one communication is expressed in terms of:
• the data volume transferred from task i to task j
• the distance of the communication channel from node i to node j on the NoC
• the energy consumed by a switch and by a link for transferring one bit of data on the NoC

Application Mapping
Objective Formulation
• Communication energy consumption of an application: Eapp
• Given constant per-bit switch and link energies, Eapp is linearly proportional to the product of data volume and distance, summed over all communications
• This sum is the Weighted Communication of an Application (WCA)
• Smaller WCA means a better solution
The objective of the application mapping is to find the optimal solution with minimal WCA.
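The WCA described on this slide can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes a 2D mesh where the distance between two tiles is the Manhattan hop count, and the names `hop_distance`, `wca`, `volumes` and the two example mappings are hypothetical.

```python
from typing import Dict, Tuple

Tile = Tuple[int, int]  # (row, column) on an assumed 2D mesh


def hop_distance(a: Tile, b: Tile) -> int:
    """Manhattan distance between two tiles (assumes minimal mesh routing)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def wca(volumes: Dict[Tuple[str, str], float], mapping: Dict[str, Tile]) -> float:
    """Sum of (data volume * distance) over all communications."""
    return sum(
        vol * hop_distance(mapping[src], mapping[dst])
        for (src, dst), vol in volumes.items()
    )


# Hypothetical three-task application: (source, destination) -> data volume.
volumes = {("t0", "t1"): 100.0, ("t1", "t2"): 50.0, ("t0", "t2"): 10.0}

# Two candidate mappings of the same tasks onto a 2x2 mesh.
near = {"t0": (0, 0), "t1": (0, 1), "t2": (1, 1)}
far = {"t0": (0, 0), "t1": (1, 1), "t2": (0, 1)}

print(wca(volumes, near))  # heavy traffic on adjacent tiles
print(wca(volumes, far))   # heavy traffic two hops apart
```

Placing the heaviest communication on adjacent tiles gives the smaller WCA, which is the behaviour the objective rewards.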

Application Mapping
NP-hard problem
• To map m tasks on n cores (m ≤ n): $\frac{n!}{(n-m)!}$ possible solutions
• Search space increases exponentially with problem size m and n
• Exhaustive search is impossible (e.g., n = m = 25: 25! ≈ 1.55e25)
• Heuristic search including Simulated Annealing (SA), Tabu Search (TS), Greedy Incremental (GI), etc.
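The size of the search space can be checked directly. The sketch below counts one-to-one placements of m tasks on n cores with the standard library; the helper name `count_mappings` is hypothetical.

```python
import math


def count_mappings(n_cores: int, m_tasks: int) -> int:
    """Number of ways to place m distinct tasks on n cores, one task per core."""
    return math.perm(n_cores, m_tasks)  # n! / (n - m)!


# The slide's example: 25 tasks on 25 cores.
print(f"{count_mappings(25, 25):.2e}")
```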

Simulated Annealing
Pro and Con
• Able to find global optima
• Numerous computations and evaluations: long runtime
Parameters and Functions
• Initial temperature T0
• Final temperature Tf
• Cost function Cost(S)
• Temperature function Temp(i)
• Acceptance function Accept(ΔC, T)
• Termination function Terminate(i, R)
• Move function Move(S, T)
• etc.

Simulated Annealing
Cost function Cost(S)
• Cost(S) = WCA of solution S
Temperature function Temp(i)
$$\mathrm{Temp}(i) = T_0 \cdot q^{\lfloor i / L \rfloor}$$
• i: # of iterations
• q: cooling ratio
• L: # of iterations at each temperature
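A small sketch of the temperature function follows. It assumes the temperature is held for L iterations and then multiplied by q, that is, an integer division of i by L in the exponent; the function name `temp` and the sample values are illustrative only.

```python
def temp(i: int, t0: float, q: float, iters_per_temp: int) -> float:
    """Geometric cooling: the temperature drops by a factor q every L iterations."""
    return t0 * q ** (i // iters_per_temp)


# Illustrative values: T0 = 100, q = 0.9, L = 10.
for i in (0, 9, 10, 25):
    print(i, temp(i, 100.0, 0.9, 10))
```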

Simulated Annealing
Acceptance function Accept(ΔC, T)
• A move is accepted when random() < prob, where
$$\mathrm{prob} = \frac{1}{1 + \exp\left(\frac{\Delta C}{K C_0 T}\right)}$$
• C0: initial cost
• ΔC: cost difference
• K: normalized ratio
Termination function Terminate(i, R)
• Based on Temp(i) relative to Tf and on R relative to Rmax
Move function Move(S, T)
• Single random swapping
• A task in the current solution is randomly selected and swapped to a randomly selected tile to generate a new solution
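The acceptance and move functions on this slide can be sketched as below. This is a hedged illustration: a solution is assumed to be a list giving the tile index of each task, the overflow guard on the exponent is an implementation detail added here, and the function names are hypothetical.

```python
import math
import random
from typing import List


def accept_probability(delta_c: float, t: float, k: float, c0: float) -> float:
    """prob = 1 / (1 + exp(delta_c / (k * c0 * t)))."""
    x = min(delta_c / (k * c0 * t), 700.0)  # guard against overflow in exp
    return 1.0 / (1.0 + math.exp(x))


def accept(delta_c: float, t: float, k: float, c0: float, rng: random.Random) -> bool:
    """Accept the move when a uniform random number falls below prob."""
    return rng.random() < accept_probability(delta_c, t, k, c0)


def move(solution: List[int], n_tiles: int, rng: random.Random) -> List[int]:
    """Move one random task to a random tile, swapping with any task already there."""
    new = list(solution)
    task = rng.randrange(len(new))
    tile = rng.randrange(n_tiles)
    if tile in new:
        new[new.index(tile)] = new[task]  # the occupant takes the vacated tile
    new[task] = tile
    return new
```

With this form a zero cost difference is accepted with probability 0.5, and the probability falls as the cost increase grows or the temperature drops.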

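Simulated Annealing
Initial temperature T0 and final temperature Tf
• Solve the acceptance function for T:
$$T = \frac{\Delta C}{K C_0 \ln\left(\frac{1}{\mathrm{prob}} - 1\right)}$$
• T0 and Tf can be derived by:
$$T_0 = \frac{\Delta C_{\max}}{K C_0 \ln\left(\frac{1}{\mathrm{prob}_0} - 1\right)}, \qquad T_f = \frac{\Delta C_{\min}}{K C_0 \ln\left(\frac{1}{\mathrm{prob}_f} - 1\right)}$$
• prob0: probability of accepting ΔCmax at temperature T0
• probf: probability of accepting ΔCmin at temperature Tf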
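The derivation can be checked numerically by computing a temperature from a target probability and feeding it back into the acceptance function. The sketch assumes the acceptance probability has the form 1 / (1 + exp(ΔC / (K·C0·T))) and that the target probability is below 0.5, so that the logarithm is positive; all names and numbers are illustrative.

```python
import math


def accept_probability(delta_c: float, t: float, k: float, c0: float) -> float:
    """Assumed acceptance probability: 1 / (1 + exp(delta_c / (k * c0 * t)))."""
    return 1.0 / (1.0 + math.exp(delta_c / (k * c0 * t)))


def temperature_for(delta_c: float, prob: float, k: float, c0: float) -> float:
    """Temperature at which a cost increase delta_c is accepted with probability prob."""
    if not 0.0 < prob < 0.5:
        raise ValueError("prob must lie in (0, 0.5) for a positive temperature")
    return delta_c / (k * c0 * math.log(1.0 / prob - 1.0))


# Illustrative inputs, not taken from the experiments.
k, c0 = 0.5, 1000.0
t0 = temperature_for(delta_c=500.0, prob=0.4, k=k, c0=c0)   # uses the largest cost change
tf = temperature_for(delta_c=5.0, prob=0.05, k=k, c0=c0)    # uses the smallest cost change

# Round trip: at T0 the largest change is accepted with probability prob0.
print(accept_probability(500.0, t0, k, c0))
print(accept_probability(5.0, tf, k, c0))
```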

Simulated Annealing
To summarize, we need to determine the parameters:
• q: cooling ratio
• K: normalizing ratio
• prob0: probability of accepting ΔCmax at temperature T0
• probf: probability of accepting ΔCmin at temperature Tf
• ΔCmax, ΔCmin: computed using a finite number of trial moves
Considerations on parameter selection
• Problem-specific
• Jointly affect the performance of SA
The set of parameters should be selected in a systematic way, instead of being set manually and independently.

Nelder-Mead Simplex Method
Method for minimization of a function f(p)
• Proposed by Nelder et al. in 1965
• f(p): function with n variables x1, x2, …, xn
• n+1 points form the initial simplex; each point pk is an n-tuple (xk1, xk2, …, xkn)
• Sort the n+1 function values so that f(p0) ≤ f(p1) ≤ … ≤ f(pn)
To get the minimum of f(p), in each iteration:
• A new simplex is formed, either by replacing the point pn with the reflection point, the expansion point or the contraction point, or by updating all points when the preceding replacements failed
• Sort the n+1 function values of the points in the new simplex and continue the process

Nelder-Mead Simplex Method
The process terminates when f(p0), f(p1), …, f(pn) converge to one value, which is the approximation of the minimum value of function f(p).
For more detail, refer to J.A. Nelder and R. Mead, A simplex method for function minimization.
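The iteration described on these two slides can be sketched compactly in pure Python. This is a simplified variant for illustration, not the exact procedure of the cited paper: it uses the common coefficients (reflection 1, expansion 2, contraction 0.5, shrink 0.5), tries only an inside contraction, and stops when the spread of function values falls below a tolerance. The function name and the test function are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def nelder_mead(
    f: Callable[[Sequence[float]], float],
    simplex: Sequence[Sequence[float]],
    tol: float = 1e-10,
    max_iter: int = 2000,
) -> Tuple[List[float], float]:
    """Minimize f from an initial simplex of n+1 points; returns (point, value)."""
    pts = [(f(p), list(p)) for p in simplex]
    n = len(pts) - 1
    for _ in range(max_iter):
        pts.sort(key=lambda e: e[0])        # f(p0) <= f(p1) <= ... <= f(pn)
        if pts[-1][0] - pts[0][0] < tol:    # function values have converged
            break
        worst_f, worst = pts[-1]
        centroid = [sum(p[j] for _, p in pts[:-1]) / n for j in range(n)]

        def along(step: float) -> List[float]:
            # Point on the line through the centroid, away from the worst point.
            return [c + step * (c - w) for c, w in zip(centroid, worst)]

        refl = along(1.0)
        f_refl = f(refl)
        if f_refl < pts[0][0]:              # better than the best: try expanding
            expd = along(2.0)
            f_expd = f(expd)
            pts[-1] = (f_expd, expd) if f_expd < f_refl else (f_refl, refl)
        elif f_refl < pts[-2][0]:           # better than the second worst: reflect
            pts[-1] = (f_refl, refl)
        else:
            contr = along(-0.5)             # halfway between worst point and centroid
            f_contr = f(contr)
            if f_contr < worst_f:
                pts[-1] = (f_contr, contr)
            else:                           # all replacements failed: shrink toward best
                best = pts[0][1]
                shrunk = [[b + 0.5 * (x - b) for b, x in zip(best, p)] for _, p in pts[1:]]
                pts = [pts[0]] + [(f(p), p) for p in shrunk]
    pts.sort(key=lambda e: e[0])
    return pts[0][1], pts[0][0]


# Hypothetical test function with its minimum at (1, -2).
def bowl(p: Sequence[float]) -> float:
    return (p[0] - 1.0) ** 2 + (p[1] + 2.0) ** 2


point, value = nelder_mead(bowl, [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
print(point, value)
```

Only function values are compared, never gradients, which is why the method suits an objective that is itself the outcome of a search run.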

Nelder-Mead Simplex Method
Parameter-Optimized SA (POSA) algorithm
• Variables: q, K, prob0 and probf
• Initial simplex: 5 initial points consisting of selected values of the 4 variables
• The SA algorithm applies the set of parameters of each point and finds one mapping solution
• The WCA of the mapping solution found by the SA algorithm is defined as the value of function f(p) and compared with the others
• The Nelder-Mead method terminates when all 5 points converge to one point, which represents the set of optimized parameters we try to find
• This set of optimized parameters is then applied to the SA algorithm to find the best mapping solution
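The function f(p) that the Nelder-Mead method minimizes in POSA can be sketched as one SA run that returns the WCA it reaches. The code below is an assumption-laden illustration, not the authors' implementation: it assumes a 2D mesh with Manhattan distance, simplifies termination to "temperature below Tf" (the R and Rmax condition is left out), fixes the random seed so that f(p) is repeatable, and does not address how parameter values outside their valid ranges are handled. All names and the toy application are hypothetical.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Volumes = Dict[Tuple[int, int], float]  # (source task, destination task) -> data volume


def wca(volumes: Volumes, sol: List[int], width: int) -> float:
    """Sum of volume * Manhattan distance; tiles are numbered row by row."""
    total = 0.0
    for (src, dst), vol in volumes.items():
        (r1, c1), (r2, c2) = divmod(sol[src], width), divmod(sol[dst], width)
        total += vol * (abs(r1 - r2) + abs(c1 - c2))
    return total


def swap_move(sol: List[int], n_tiles: int, rng: random.Random) -> List[int]:
    """Move one random task to a random tile, swapping with any occupant."""
    new = list(sol)
    task, tile = rng.randrange(len(new)), rng.randrange(n_tiles)
    if tile in new:
        new[new.index(tile)] = new[task]
    new[task] = tile
    return new


def sa_objective(params: Sequence[float], volumes: Volumes, n_tasks: int,
                 width: int, height: int, seed: int = 0,
                 iters_per_temp: int = 20, trials: int = 100) -> float:
    """f(p): best WCA found by one SA run using p = (q, K, prob0, probf)."""
    q, k, prob0, probf = params
    assert 0 < q < 1 and k > 0 and 0 < prob0 < 0.5 and 0 < probf < 0.5
    rng = random.Random(seed)
    n_tiles = width * height
    sol = rng.sample(range(n_tiles), n_tasks)
    c0 = cur = best = wca(volumes, sol, width)
    # Trial moves estimate the largest and smallest non-zero cost change.
    deltas = [abs(wca(volumes, swap_move(sol, n_tiles, rng), width) - c0)
              for _ in range(trials)]
    deltas = [d for d in deltas if d > 0]
    if not deltas or c0 == 0:
        return best
    t0 = max(deltas) / (k * c0 * math.log(1 / prob0 - 1))
    tf = min(deltas) / (k * c0 * math.log(1 / probf - 1))
    i = 0
    while True:
        t = t0 * q ** (i // iters_per_temp)
        if t < tf:                       # simplified termination
            break
        cand = swap_move(sol, n_tiles, rng)
        delta = wca(volumes, cand, width) - cur
        x = min(delta / (k * c0 * t), 700.0)
        if rng.random() < 1.0 / (1.0 + math.exp(x)):
            sol, cur = cand, cur + delta
            best = min(best, cur)
        i += 1
    return best


# Toy application: a chain of 6 tasks on a 3x3 mesh.
volumes = {(i, i + 1): 10.0 * (i + 1) for i in range(5)}
for p in [(0.90, 0.5, 0.40, 0.05), (0.95, 0.7, 0.35, 0.05)]:
    print(p, sa_objective(p, volumes, n_tasks=6, width=3, height=3))
```

A simplex minimizer would call `sa_objective` once per candidate point and compare the returned WCA values; the final parameter set is then used for the actual mapping run.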

Experiment and Result
Setup
Four applications
• Video object plane decoder (VOPD, 16 tasks)
• MPEG4 (12 tasks)
• Multimedia systems application (MMS, 25 tasks)
• H.264 decoder (H264, 16 tasks)
Reference work: NoCMap ([Jingcao2005])
• Parameters are set manually
• q: 0.9, T0: 100, Tf: unbounded
• Exponential form of acceptance function
• Random swapping move function
The simulator in NoCMap is used to obtain the communication energy consumption for both POSA and NoCMap.

Experiment and Result
Optimized Parameters

Application   VOPD   MPEG4   MMS   H264
q             …      …       …     0.89
prob0         …      …       …     0.42
probf         …      …       …     0.05
K             …      …       …     0.49

Remaining table values (cell positions lost): 0.05, 0.05, 0.05, 0.34, 0.36, 0.36, 0.44, 0.62, 0.72, 0.91, 0.94, 0.95

Parameters are problem-specific.
Instead of using an identical set of parameters for different problems, different sets of parameters should be applied to the SA algorithm.

Experiment and Result
Number of Iterations

Application    VOPD   MPEG4   MMS   H264    Avg.
NoCMap         …      …       …     …       …
POSA           …      …       …     …       …
POSA/NoCMap    …      …       …     1.02%   0.94%

• POSA uses significantly fewer iterations
• On average, less than 1% of the iterations used by NoCMap

Experiment and Result
Runtime of SA (seconds)

App             VOPD   MPEG4   MMS   H264   Avg.
NoCMap          …      …       …     …      …
POSA            …      …       …     …      …
NoCMap/POSA     …      …       …     …      …
POSA'           …      …       …     …      …
NoCMap/POSA'    364    267     147   171    237

POSA includes the runtime of the Nelder-Mead method.
POSA' is the runtime of SA applying the optimized parameters.
On average, a 237-fold speedup is achieved.

Experiment and Result
Weighted Communication (WCA)
The mapping solution of POSA yields a WCA comparable to that of NoCMap.

Experiment and Result
Energy Consumption (EC)
Consistent with the WCA result: the mapping solution of POSA yields communication energy consumption comparable to that of NoCMap.

Conclusion
A method to systematically select the parameters of the SA algorithm for the application mapping problem is proposed.
With the set of optimized parameters, significantly fewer evaluations are processed in POSA and the SA algorithm is accelerated.
The accelerated POSA algorithm achieves energy consumption comparable to that of NoCMap.
For the set of benchmarks, POSA obtains mapping solutions of the same quality while using less than 1% of the iterations of NoCMap and achieving an average speedup of 237 times.

Thank you for your attention!
Comments and Questions