Function Unit Specialization through Code Analysis
Daniel Benyamin and William H. Mangione-SmithElectrical Engineering Department, UCLA, Los Angeles, CA, 90095
f
benyamin,billms
g
@icsl.ucla.edu
Abstract
Many previous attempts at ASIP synthesis have employed tem- plate matching techniques to target function units to applicationcode, or directly design new units toextract maximum performance.This paper presents an entirely new approach to specializing hard-ware for application specific needs. In our framework of a parame-terized VLIW processor, we use a post-modulo scheduling analysisto reduce the allocated hardware resources while increasing thecode’s performance. Initial results indicate significant savings inarea, as well as optimizations to increase FIR filter code perfor-mance 200% to 300%.
1 Introduction
Digital signal processing (DSP) has exhibited remarkablegrowth in many mainstream commercial applications, and as a re-sult, now challenges traditional design flows to meet the markets’needs. While these design approaches can take many paths, theusual choices are either the hardware design of an ASIC or the soft-ware design for a programmable DSP.Taking an ASIC approach provides unmatched performance andcost benefits. Unfortunately, ASICs by definition tend to target anarrow range of applications. On the other extreme, programmableprocessors have become more powerful to meet the growing com-putational demand. Doing so while maintaining programmability,however, usually produces devices too costly and inefficient. A re-cent trend is to consider Application-Specific Instruction set Pro-cessors (ASIPs) [2]. ASIPs provide dedicated resources for criticaltasks (e.g. motion estimation), while offering a complete instruc-tion set for flexibility. The motivation for employing ASIPs is clear.Thecompiler has agood view of theapplication at hand, and shouldutilize the hardware accordingly.Our approach is to allow the compiler to have complete con-trol over the architectural design of a VLIW processor, a frame-work which we call Application Specific Configurable Architec-ture. Within this framework lies techniques to specialize the VLIWfunction units through code analysis. In this paper, we focus onoptimizing loops, a technique which often yields the greatest per-formance gains in DSP applications. More specifically, we strivetowards two independent goals:1. Create function units for smaller hardware at the cost of run-time performance.2. Create function units for faster run-time performance at thecost of larger area.The key to our approach is to use
modulo scheduling
[3] as aguide. From the modulo schedule tables we can decide what func-
MdesElcorthis paperExecutablecodePerformanceMonitorCandidatespecializationsRun-timestatistics
Pick candidatesGenerate newhardware description
Figure 1:
The system overview.tionality is necessary in the datapath, and what operations shouldbe optimized for greatest reduction in cycle count. In contrast toprevious efforts, this approach allows for fast and accurate special-ization.
2 Related Work and System Overview
Most prior research efforts either attempted to generate code forcomplex architectural features, or to directly synthesize these fea-tures from a base instruction set. Liem [4] took the traditional ap-proach of instruction-set matching, but included techniques to tar-get specialized datapath units. One problem with this and relatedmethods, as seenlater, is thatmorecomplex functions areless likelyto be used. Razdan [1] and Choi [5], on the other hand, both in-vestigated methods of synthesizing new architectural features. Un-fortunately, Razdan’s methods were limited to optimizing Booleanoperations, with little capability of further performance gains. Choigenerated complex, multi-cycle units, but did not account for thepossible explosion in hardware cost.Our system augments existing VLIW compiler technology; Fig-ure 1 illustrates the system we are using. The
MDES
and
ELCOR
components are part of the Trimaran compiler system [7, 8, 9].
MDES
is a flexible machine description system well-suited toVLIW architectures, and
ELCOR
is Trimaran’s back-end com-piler; it is within the
ELCOR
tool that we have inserted our al-gorithms. Currently, candidate specializations are produced in aseparate file, but no feed-back loop exists. We plan to add the capa-bility of iterating through the design space. All specialization stud-ies take place
after
scheduling, and only “pinhole” optimizationsare necessary to target new functionality.
Add a Comment