105.100.000 DesignnTest ESL Special Edition Complete

SPECIAL ITC SECTION
SEPTEMBER-OCTOBER 2006
ALEXANDER TORRES 2006
Electronic System-Level Design

Component-Based Design Platform-Based Taxonomy Improving Transition Delay Test
Authorized licensed use limited to: COMSATS INSTITUTE OF INFORMATION TECHNOLOGY. Downloaded on May 18,2010 at 06:52:08 UTC from IEEE Xplore. Restrictions apply.
SeptemberOctober 2006 Volume 23 Number 5 http://www.computer.org/dt
Features
ITC Special Section
335 G
Copublished by the IEEE Computer Society and the IEEE Circuits and Systems Society
uest Editors Introduction: The True State of the Art of ESL Design Sandeep K. Shukla, Carl Pixley, and Gary Smith Component-Based Design Environment for ESL Design Patrick Schaumont and Ingrid Verbauwhede odeling Embedded Systems: From SystemC and Esterel to DFCharts Ivan Radojevic, Zoran Salcic, and Partha S. Roop Platform-Based Taxonomy for ESL Design Douglas Densmore, Roberto Passerone, and Alberto Sangiovanni-Vincentelli he Challenges of Synthesizing Hardware from C-Like Languages Stephen A. Edwards
388 G 390 E
uest Editors Introduction: ITC Helps Get More out of Test Kenneth M. Butler xtracting Defect Density and Size Distributions from Product ICs Jeffrey E. Nelson, Thomas Zanon, Jason G. Brown, Osei Poku, R.D. (Shawn) Blanton, Wojciech Maly, Brady Benware, and Chris Schuermyer mproving Transition Delay Test Using a Hybrid Method Nisar Ahmed and Mohammad Tehranipoor mpact of Thermal Gradients on Clock Skew and Testing Sebasti A. Bota, Josep L. Rossell, Carol de Benito, Ali Keshavarzi, and Jaume Segura
338 A
348 M
402 I 414 I
359 A
375 T
ISSN 0740-7475
Cover design by Alexander Torres
Departments
333 387 425 426 428 430 432
From the EIC Counterpoint TTTC Newsletter Book Reviews Standards CEDA Currents The Last Byte
TECHNICAL AREAS ____

Analog and Mixed-Signal Test: Michel Renovell, LIRMM; renovell@lirmm.fr CAE/CAD: Dwight Hill, Synopsys; hill@synopsys.com Congurable Computing: Fadi Kurdahi, University of California, Irvine; kurdahi@ece.uci.edu Deep-Submicron IC Design and Analysis: Sani Nassif, IBM; nassif@us.ibm.com Defect and Fault Tolerance: Michael Nicolaidis, iRoC Technologies; michael.nicolaidis@iroctech.com Defect-Based Test: Adit Singh, Auburn University, adsingh@eng.auburn.edu Design for Manufacturing, Yield, and Yield Analysis: Dimitris Gizopoulos, University of Piraeus; dgizop@unipi.gr Design Reuse: Grant Martin, Tensilica; gmartin@ieee.org Design Verication and Validation: Carl Pixley, Synopsys; cpixley@synopsys.com Economics of Design and Test: Magdy Abadir, Freescale; m.abadir@freescale.com Embedded Systems and Software: Sharad Malik, Princeton University; sharad@ee.princeton.edu Embedded Test: Cheng-Wen Wu, National Tsing Hua University; cww@ee.nthu.edu.tw Infrastructure IP: Andr Ivanov, University of British Columbia; ivanov@ece.ubc.ca Low Power: Anand Raghunathan, NEC USA; anand@nec-labs.com Memory Test: Fabrizio Lombardi, Northeastern University; lombardi@ece.neu.edu Microelectronic IC Packaging: Bruce Kim, University of Alabama; bruce.kim@ieee.org Nanotechnology Architectures and Design Technology: Seth Goldstein, Carnegie Mellon University; seth.goldstein@cs.cmu.edu Performance Issues in IC Design: Sachin Sapatnekar, University of Minnesota; sachin@ece.umn.edu SoC Design: Soha Hassoun, Tufts University; soha@cs.tufts.edu System Specication and Modeling: Sandeep Shukla, Virginia Polytechnic and State University; shukla@vt.edu Member at Large: Kaushik Roy, Purdue University; kaushik@ecn.purdue.edu
DEPARTMENTS
Book Reviews: Scott Davidson, Sun Microsystems, scott.davidson@sun.com; Grant Martin, Tensilica, gmartin@ieee.org; and Sachin Sapatnekar, Univ. of Minnesota, sachin@ece.umn.edu Conference Reports and Panel Summaries: Yervant Zorian, Virage Logic; zorian@viragelogic.com DATC Newsletter: Joe Damore; joepdamore@aol.com Interviews: Ken Wagner, Design Implementation and Ottawa Design Centre, PMC Sierra; ken_wagner@pmc-sierra.com The Last Byte: Scott Davidson, Sun Microsystems; scott.davidson@sun.com Perspectives: Alberto Sangiovanni-Vincentelli, University of California, Berkeley, alberto@eecs.berkeley.edu; and Yervant Zorian, Virage Logic, zorian@viragelogic.com The Road Ahead: Andrew Kahng, University of California, San Diego; abk@ucsd.edu Roundtables: William H. Joyner Jr., Semiconductor Research Corp.; william.joyner@src.org Standards: Victor Berman, Cadence Design Systems; vberman@cadence.com TTTC Newsletter: Bruce Kim, University of Alabama; bruce.kim@ieee.org
Staff Editor Rita Scanlan IEEE Computer Society 10662 Los Vaqueros Circle Los Alamitos, CA 90720-1314 Phone: +1 714 821 8380 Fax: +1 714 821 4010 rscanlan@computer.org Group Managing Editor Janet Wilson j.wilson@computer.org Assoc. Staff Editor Ed Zintel Magazine Assistant dt-ma@computer.org Contributing Editors Thomas Centrella Noel Deeley Tim Goldman Louise ODonald Joan Taylor Art Direction Joseph Daigle Cover Design Alexander Torres Publisher Angela Burgess aburgess@computer.org Associate Publisher Dick Price Membership/Circulation Marketing Manager Georgann Carter Business Devel. Mgr. Sandy Brown Advertising Coordinator Marian Anderson
Editor in Chief Kwang-Ting (Tim) Cheng Univ. of California, Santa Barbara timcheng@ece.ucsb.edu Editor in Chief Emeritus Rajesh K. Gupta Univ. of California, San Diego gupta@cs.ucsd.edu Associate EIC Magdy Abadir Freescale Semiconductor m.abadir@freescale.com CS Publications Board Jon Rokne (chair) Michael R. Blaha Mark Christensen Frank Ferrante Roger U. Fujii Phillip Laplante Bill N. Schilit Linda Shafer Steven L. Tanimoto Wenping Wang CS Magazine Operations Committee Bill N. Schilit (chair) Jean Bacon Pradip Bose Arnold (Jay) Bragg Doris L. Carver Kwang-Ting (Tim) Cheng Norman Chonacky George V. Cybenko John C. Dill Robert E. Filman David Alan Grier Warren Harrison James Hendler Sethuraman (Panch) Panchanathan Roy Want
D&T ALLIANCE PROGRAM______________

DTAP chair: Yervant Zorian, Virage Logic; zorian@viragelogic.com Asia: Hidetoshi Onodera, Kyoto University; onodera@i.kyoto-u.ac.jp CANDE: Richard C. Smith, EDA and Application Process Consulting; dsmith@topher.net DAC: Luciano Lavagno, Politecnico di Torino, lavagno@polito.it; and Andrew Kahng, University of California, San Diego DATC: Joe Damore; joepdamore@aol.com DATE: Ahmed Jerraya, TIMA; ahmed.jerraya@imag.fr Europe: Bernard Courtois, TIMACMP; bernard.courtois@imag.fr Latin America: Ricardo Reis, Universidade Federal do Rio Grande do Sul; reis@inf.ufrgs.br TTTC: Andr Ivanov, University of British Columbia; ivanov@ece.ubc.ca
Submission Information: Submit a Word, pdf, text, or PostScript version of your submission to Manuscript Central, http://cs-ieee.manuscriptcentral.com. Editorial: Unless otherwise stated, bylined articles and columns, as well as product and service descriptions, reect the authors or rms opinions. Inclusion in IEEE Design & Test of Computers does not necessarily constitute endorsement by the IEEE Computer Society or the IEEE Circuits and Systems Society. All submissions are subject to editing for clarity and space considerations. Copyright and reprint permissions: Copyright 2006 by the Institute of Electrical and Electronics Engineers, Inc. All rights reserved. Abstracting is permitted with credit to the source. Libraries are permitted to photocopy beyond the limits of US Copyright Law for private use of patrons (1) those post-1977 articles that carry a code at the bottom of the rst page, provided the per-copy fee indicated in the code is paid through the Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923; (2) for other copying, reprint, or republication permission, write to Copyrights and Permissions Department, IEEE Publications Administration, 445 Hoes Lane, PO Box 1331, Piscataway, NJ 08855-1331.
ADVISORY BOARD _____

Anthony Ambler, University of Texas at Austin Ivo Bolsens, Xilinx William Mann Tom Williams, Synopsys Yervant Zorian, Virage Logic
IEEE Design & Test of Computers (ISSN 0740-7475) is copublished bimonthly by the IEEE Computer Society and the IEEE Circuits and Systems Society. IEEE Headquarters: 345 East 47th St., New York, NY 10017-2394. IEEE Computer Society Publications Ofce: 10662 Los Vaqueros Circle, PO Box 3014, Los Alamitos, CA 90720-1314; phone +1 714 821 8380. IEEE Computer Society Headquarters: 1730 Massachusetts Ave. NW, Washington, DC 20036-1903. IEEE Circuits and Systems Society Executive Ofce, 445 Hoes Lane, Piscataway, NJ 08854; phone +1 732 465 5853. Annual subscription: $38 for CS members and $68 for other IEEE society members in addition to IEEE and Computer Society dues; $69 for members of other technical organizations outside the IEEE. Back issues: $20 for members and $96 for nonmembers. The Biomedical Engineering Citation Index on CD-ROM lists IEEE Design & Test of Computers articles.
Postmaster: Send undelivered copies and address changes to IEEE Design & Test of Computers, Circulation Dept., PO Box 3014, Los Alamitos, CA 90720-1314. Periodicals postage paid at New York, NY, and at additional mailing ofces. Canadian GST #125634188. Canada Post Corp. (Canadian distribution) Publications Mail Agreement #40013885. Return undeliverable Canadian addresses to 4960-2 Walker Road; Windsor, ON N9A 6J3. Printed in USA.
The new world of ESL design

DESIGNERS ARE HUNGRY for electronic systemlevel (ESL) methodologies and supporting tools that can raise the abstraction level of design entry and enhance the global analysis and exploration of design trade-offs. A recent report by Gartner Dataquest on worldwide EDA market trends forecasted a strong growth rate for ESL tools over the next five years. However, existing solutions remain inadequate, and a comprehensive ESL design infrastructure brings with it several challenges that design and test professionals must solve. This issue of IEEE Design & Test discusses some of these challenges and their corresponding solutions. Guest editors Sandeep Shukla, Carl Pixley, and Gary Smith have collected a set of interesting articles concerning languages, tools, and methodologies of ESL design. Id like to thank them for the great job theyve done in putting together this strong issue.
In addition, we are happy to present a special section highlighting the 2006 International Test Conference (ITC). In the sub-65-nanometer technology era, in which electronic products encounter a far wider variety of failure sources and a higher failure rate than ever, test has gradually expanded its role in the semiconductor industry. Test is no longer limited to defect detection. It has become a critical technology for debugging, yield improvement, and design for reliability as well. This trend inspired this years ITC theme, Getting More out of Test. Guest editor Ken Butler, 2005 ITC program chair, has selected three articles for this special section that highlight this theme. We also have some exciting plans for the next few issues of D&T. Special-issue themes will include impor-
tant industry topics such as process variation and stochastic design and test, biochips, functional validation, and IR drop and power supply noise effects on design and test. We will also present exciting roundtables, such as the one moderated by Andrew Kahng at the 43rd Design Automation Conference (DAC 06), on design and tool challenges for next-generation multimedia, game, and entertainment platforms. In addition, at the 6th International Forum on Application-Specic MultiProcessor SoC (MPSoC 06), Roundtables editor Bill Joyner moderated a roundtable on single-chip multiprocessor architectures, which we will include in a future issue of D&T. You will also see interesting interviews with key technologists, such as Texas Instruments Hans Stork, keynote speaker at this years DAC. If youd like to participate in a future D&T issue, please submit your theme or nontheme manuscript as soon as it is ready. To serve as a guest editor, submit your special-issue proposal for evaluation by the D&T editorial board. See D&Ts Web site (http://computer. org/dt) for guidelines. For additional information or clarication, please feel free to contact me directly.
Kwang-Ting (Tim) Cheng Editor in Chief IEEE Design & Test
0740-7475/06/$20.00 2006 IEEE
Copublished by the IEEE CS and the IEEE CASS
SeptemberOctober 2006
333
Guest Editors Introduction: The True State of the Art of ESL Design
Sandeep K. Shukla
Virginia Polytechnic and State University
Gary Smith
Gartner Dataquest
Carl Pixley
Synopsys
ESL, OR ELECTRONIC SYSTEM LEVEL, is a new buzzword in the EDA industry. It has found its way into the mainstream EDA vocabulary in the past few years because of increased interest in nding new ways to raise the abstraction level for the design entry point during the electronic-systems design process. In hardware design over the past three decades, the design entry point has moved upward in the abstraction hierarchyfrom hand-drawn schematics to gate-level design, to RTL descriptions. As hardware design complexity has become increasingly unmanageable, finding ways to design hardware ICs at higher abstraction levels and developing tools to automatically create the circuits actual layouts has gained more importance in industry and academia. This upward trend in abstraction has enabled engineers to exploit the scientic and engineering advances that have tracked Moores law quite closely. Since the late 1980s, the design entry points abstraction level had remained almost stagnant at the structural RTL. Behavioral synthesis had remained mostly elusive, with some domain-specic success areas, such as DSP chips. But by the late 1990s, recognition of the socalled productivity gap problem led to various attempts at abstraction enhancement. These attempts gave rise to various languages for system-level design, such as SpecC, SystemC, and variants of these. Tools and methodologies for using these languages for design entry have emerged in the market and in academic circles. In the meantime, integrated systems such as cell phones, network routers, consumer electronics, and personal electronic devices like PDAs started to dominate
the electronics landscape. In contrast to earlier computing devices such as microcontrollers and general-purpose microprocessors (GPPs), these systems had one thing in common: you could distribute their functionality into hardware or software with sufcient uidity based on various trade-offs in performance, power, cost, and so on. This development broke the hardware abstraction for software development hitherto used in traditional computing platforms, such as GPPs, as illustrated by the Windows and Intel platforms in desktop computing. Another phenomenon that had occurred for decades in avionics, automotive, and industrial-control systems also gained increased attention among EDA researchers. The design of such systems embedded software was typically at a much higher abstraction level, using synchronous languages, Argos-like visual environments, and so on to describe required control trajectories. Control-systems engineers had also used Matlab and similar mathematical and visual enhancements of such tools for decades to design, validate, and even synthesize their software. In the meantime, increasingly more computing devices were mixtures of software and hardware, and there was increased exibility for deciding the hardwaresoftware partitioning. Consequently, architectural exploration at the functional and architectural level became increasingly critical for nding the right trade-off points in the design. Its best to perform such explorations before the designers commit to the RTL hardware logic design, or before the embedded-software writers commit to the embedded-software code. These evolutionary trajectories of electronic-system design led to the introduction of ESL design. According
0740-7475/06/$20.00 2006 IEEE
335
to the popular ESL Now! Web site (http://www. esl-now.com), ESL design concerns the following:
the development of product architectures and specications, including the incorporation and conguration of IP, the mapping of applications to a product specication, including hardware/software partitioning and processor optimization, the creation of pre-silicon, virtual hardware platforms for software development, the determination/automation of a hardware implementation for that architecture, and the development of reference models for verifying the hardware.
In this special issue, we explore recent developments in ESL languages, tools, techniques, and methodologies related to improving productivity or enhancing design quality. We wanted this issue to answer the following key questions regarding ESL design:

What is ESL design, and what are the current languages that support ESL features? What tool chains and design flows are appropriate for ESL-based design and validation? What new validation techniques and methodologies are available if ESL abstractions are used in a design ow? Are there any test technology benets? Are there major industrial projects today that have been successful due to ESL usage? What are the market indicators and forces that might make or break ESL design?
Although the articles in this special issue dont necessarily answer all these questions, they address some key issues and are quite thought-provoking. In the rst article, Patrick Schaumont and Ingrid Verbauwhede focus on two properties they see as key to ESL design: abstraction and reuse. They present an ESL design ow using the Gezel language, and they show with several very different design examples how Gezel supports their case for reuse and abstraction. The second article, by Ivan Radojevic, Zoran Salcic, and Partha Roop, considers the need for directly expressing heterogeneous, hierarchical behaviors for modeling specic embedded systems. The authors examined two existing ESL languages: SystemC and Esterel. Their analysis led them to create a new computation model as well as a graphical language to gain the direct expressivity
they need for their model. Although there have been various attempts at changing SystemC and Esterel to t modeling requirements, these authors mainly consider standard SystemC and Esterel here. In the next article, Douglas Densmore, Roberto Passerone, and Alberto Sangiovanni-Vincentelli attempt to stem the seemingly ever-increasing tide of confusion that permeates the ESL world. Not only are software developers and hardware designers having difculty nding a common languageverbally, as well as designwisebut communication failures are common within those communities as well. Traditionally, there are three rules of design: First, there is a methodology, then there is a design ow, and last there are the tools necessary to ll that ow. But, as this article points out, we seem to have approached ESL backward. We have built tools, but we have no ow. And, it goes without saying, we have no methodology. No wonder then that the predictions of ESL taking off in the next four years seem to be overly optimistic. Still, the customer demand is there. But these customers have had to ll the need with internally developed ESL tools. The University of California, Berkeley, has long been the champion of platform-based design, and these authors base their taxonomy on a combination of UC Berkeleys platform work and Dan Gajskis Y-chart work (at UC Irvine). Hopefully, this taxonomy will help stem the tide of confusion and enable the design community to turn around its ESL efforts. Finally, the article by Stephen Edwards presents one side of an ongoing debate on the appropriateness of Clike languages as hardware description languages. In the ESL landscape, it is often assumed that a high-level programming language can substitute for a higherabstraction-level hardware description language. This article attempts to deconstruct such a myth about the C programming language by extensively documenting the shortcomings of such an approach and by identifying the features that an ESL language should have. A brief alternative opinion by John Sanguinetti immediately follows this article.
ESL DESIGN, METHODOLOGIES, LANGUAGES, AND

TOOLS are still not clearly identied and taxonomized, and the articles in this special issue attempt to reduce some of the confusion regarding the term ESL. However, we believe that we are still in the early stages of ESLbased design. Many more discussions, expository articles, and debates must take place before it can nd its permanent design entry point in industry.
336
IEEE Design & Test of Computers
The articles in this special issue could not cover everything. Although many of the synthesis technologies mentioned address algorithm designfor instance, for DSPtechnologies to synthesize high-level control logic are necessary for ESL design to address the breadth of circuits designed by hardware engineers. In the recent past, researchers insufciently addressed behavioral synthesis, but this segment is now showing increased activity. Bluespec (http://www.bluespec.com), for example, offers new technology to raise the abstraction level for complex control logic and to synthesize RTL design from these descriptions. Other behavioral-synthesis solutions are coming to the market as transaction-level models. We hope you find the articles in this special issue interesting. We encourage you to send us critiques, comments, or questions about this special issue. Letters to the editor for publication in future issues are also encouraged. Finally, we thank the authors, the reviewers, and the editorial staff at IEEE Design & Test for their help in making this issue possible.
Sandeep K. Shukla is an assistant professor of computer engineering at Virginia Tech. He is also founder and deputy director of the Center for Embedded Systems for Critical Applications (CESCA), and he directs the Fermat (Formal Engineering Research with Models, Abstractions, and Transformations) research lab. His research interests include design automation for embedded-systems design, especially system-level design languages, formal methods, formal specication languages, probabilistic modeling and model checking, dynamic power management, application of stochastic models and model analysis tools for defect-tolerant system design, and reliability measurement of defect-tolerant systems. Shukla has a PhD in computer science from the State University of New York (SUNY) at Albany. He has been elected as a College of Engineering Faculty Fellow at Virginia Tech, and he is on the editorial board of IEEE Design & Test. Carl Pixley is group director at Synopsys. His pioneering achievements include model checking based on binary decision diagrams (BDDs), Boolean equivalence, alignability equivalence, constraint-based verication, and C-toRTL verication. Pixley has a PhD in mathematics from SUNY at Binghamton. He is a member of the IEEE and
the Mathematics Association of America, and is verication editor for IEEE Design & Test. Gary Smith is a chief analyst at Gartner Dataquest, where he is part of the Design & Engineering Group and serves in the Electronic Design Automation Worldwide program. His research interests include design methodologies, ASICs, and IC design. Smith has a BS in engineering from the United States Naval Academy in Annapolis, Maryland. He is a member of the Design Technology Working Group for the International Technology Roadmap for Semiconductors (ITRS). Direct questions or comments about this special issue to Sandeep K. Shukla, Department of Electrical and Computer Engineering, Virginia Polytechnic and State University, Blacksburg, VA 24061, shukla@ vt.edu; Carl Pixley, Synopsys, 2025 NW Cornelius Pass Rd., Hillsboro, OR 97124, cpixley@synopsys.com; or Gary Smith, Gartner Dataquest, 281 River Oaks Pkwy, San Jose, CA 95134, gary.smith@gartner.com.
For further information on this or any other computing topic, visit our Digital Library at http://www.computer.org/ publications/dlib.
The IEEE Computer Society

publishes over 150 conference publications a year. For a preview of the latest papers in your eld, visit
www.computer.org/publications/
337
A Component-Based Design Environment for ESL Design

Patrick Schaumont
Virginia Tech
Ingrid Verbauwhede
Katholieke Universiteit Leuven
Editors note: This article focuses on two key properties that the authors see as critical to ESL design: abstraction and reuse. The authors present an ESL design flow using the Gezel language. Using several very different design examples, they show how this design flow supports their case for abstraction and reuse. Carl Pixley, Synopsys
ESL design has many faces
RECENTLY, there has been an increasingly greater variety of target architecture options for digital electronics design. Whereas the driving applications for these architectures are often governed by standards and thus tend to be regularized, there is still a lot of design freedom in the target architectures themselves. There is a wide range of programmable-processor architectures,1,2 and with any given application, designers must balance performance, power consumption, time to market, and silicon cost.3 The obvious question is how to choose the most appropriate target architecture for a given application. In this article, we present Gezel, a component-based, electronic system-level (ESL) design environment for heterogeneous designs. Gezel consists of a simple but extendable hardware description language (HDL) and an extensible simulation-and-refinement kernel. Our approach is to create a system by designing, integrating, and programming a set of programmable components. These components can be processor models or hardware simulation kernels. Using Gezel, designers can clearly distinguish between component design, platform integration, and platform programming, thus separating the roles of component builder, platform builder, and platform user. Embedded applications have driven the development of this ESL design environment. To demonstrate the broad scope of our component-based approach, we discuss three applications that use our environment; all are from the eld of embedded security.
A common definition for ESL design is the collection of design techniques for selecting and refining an architecture. But ESL design has many aspects and forms. Even within a single application domain, system-level design can show wide variations that are difficult to capture with universal design languages and architectures. Therefore, you can also think of ESL design as the ability to successfully assemble a system out of its constituent parts, regardless of their heterogeneity or nature. Consider the following three examples. All of them closely relate to design for secure embedded systems, but they also require very different design configurations. Thus, these examples show the need for a more general approach, which we achieve using Gezel.
Example 1: Public-key cryptography on 8-bit microcontrollers

Sensor networks and radio-frequency identication tags are examples of the next generation of distributed wireless and portable applications requiring embedded privacy and authentication. Public-key systems are preferable because they allow a more scalable, exible key distribution compared to secret-key cryptosystems. Unfortunately, public-key systems are computationally intensive and hence consume more power. Recent proposals suggest replacing the RSA (Rivest-ShamirAdleman) system with more economical solutions such as elliptic-curve cryptosystems (ECCs) or hyper-ellipticcurve cryptosystems (HECCs). ECCs and HECCs provide security levels equivalent to RSA but with shorter word lengths (a 1,024-bit RSA key is equivalent to a 160-bit ECC key and an 83-bit HECC key), at the expense of highly complex arithmetic. Figure 1 shows the hierarchy and mapping of such a system. On top is the HECC
338
0740-7475/06/$20.00 2006 IEEE
point multiplication operHyper-elliptic-curve ation, which consists of a cryptography (HECC) API sequence of basic ellipticcurve point operations. Each of these basic ellipC code Scalar multiplication tic-curve operations conSoftware sists of a sequence of 8051 CPU more elementary operaAssembly Point or divisor language tions in the underlying operations P0 P1 routines Galois field. For HECC, this field is 83 bits. If the Combined Microcode system were an ECC, this CTL DATA Galois field (2nelements) sequences eld would be 160 bits. operations Hardware Galois field (2n) We implemented this coprocessor design as an 8051 microBasic n Galois field (2 elements) Data path controller, extended with operations a hardware acceleration unit. The 8-bit microcontroller interfaces are quite Figure 1. Public-key cryptography on an 8-bit microcontroller. narrow compared to HECC word lengths. Therefore, when building a hardware acceleration unit, it is crucial to silicon.4 The protocol, shown in Figure 2a, accepts an consider overall system performance. Because of the hier- input ngerprint and compares it to a prestored, secret archy in the calculations, there are multiple ways to accel- template. The matching algorithm must treat this temerate the HECC operationsin contrast to secret-key plate as a secret, and the ThumbPod-2 system stores it algorithms, which have fewer hierarchy layers and thus in a secure circuit style that is resistant to side-channel offer fewer implementation choices. As a stand-alone opti- attacks. However, because the matching algorithm mized C implementation, an HECC point multiplication manipulates the template, part of the algorithms circuit takes 192 seconds to calculate. A small hardware accel- must also migrate to a secure circuit style. Because this erator, requiring only 480 extra FPGA lookup unit tables secure circuit style consumes twice the area of normal (LUTs) and 100 bytes of RAM, improves this time by a fac- circuits, mapping the complete matching protocol to it tor of 80, to only 2.5 seconds. Figure 1 indicates the result- would be inefcient. We therefore separated the protoing split between hardware and software, which is not yet col into an insecure software partition and a secure optimal for an 8051. hardware partition, and we ended up with the impleHardware acceleration makes HECC public key pos- mentation in Figure 2b. The software reads the input nsible on small, embedded microcontrollers. But the gerprint and feeds the data to the oracle inside the optimal implementation depends on the selection of secure partition. The oracle compares each input minuthe mathematical algorithms and the system-level archi- tia with the template minutia, returning only a globaltecture. Only a platform-based design approach makes matching result: reject or accept. It is impossible for an this design space exploration possible and discloses attacker with full access to the untrusted software to opportunities for global improvement. determine how the oracle has obtained this decision. The design and verification of the secure protocol requires continuous covalidation between hardware Example 2: Concurrent codesign for secure and software. We evaluated various attack scenarios partitioning The design of secure embedded systems leads to that attempt to extract the secret template from the design cases requiring tight interaction between hard- secure hardware partition, assuming that the attacker ware and softwareeven down to the single-statement can arbitrarily choose the software program at the inselevel. Figure 2 shows a ngerprint authentication design, cure partition. This led to an iterative renement of the the ThumbPod-2 system, which is resistant to side-chan- oracle interface and the driving software, which we nel attacks; we implemented and tested this design in designed completely within the Gezel environment.
339
RAM or Flash ThumbPod-2 client Minutiae extraction
Leon-2 processor
Matching algorithm
Bridge
AMBA
UART
Out port Matching algorithm Reject Load bogus Accept Template Load master Master key
In port
Chip command interface
Oracle
Cryptography module Master key
Session key Sk
AMBA Advanced Microcontroller Bus Architecture UART Universal asynchronous receiver transmitter
Template
Root of trust (b)
Secure circuit style
(a)
Figure 2. Partitioning for security in the ThumbPod-2 system: protocol for session key generation (a), and implementation (b).
Example 3: Accelerated embedded virtual machines

For a third application, shown in Figure 3, we had to provide hardware acceleration of a cryptographic library for an embedded virtual machine.5 We used a Java embedded virtual machine, the Kilobyte Virtual Machine (KVM), extended with native methods that allow hardware access directly from a Java application. We integrated an advanced encryption standard (AES) coprocessor into the Java virtual machines host processor, and we triggered execution of the coprocessor using a native method. The virtual machine handles all data management and synchronization. As Figure 3b shows, hardware acceleration can improve performance by two orders of magnitude. Moreover, data movement from Java, to and from the coprocessor, has two orders of magnitude of overhead compared to actual hardware execution. A combined optimization of the Java-native API, the coprocessor, and the coprocessor interface is necessary to avoid design errors and, more importantly, security holes in the nal system. All three examples are practical design problems from the eld of embedded security. There is no unied design platform or unied design language that could solve all of them. However, its still possible to general-
ize their underlying design principles by using a component-based approach.
Component-based ESL design

Each programmable architecture comes with a specific set of design techniques. ESL design, therefore, is no tightly knit set of techniques, tools, and data models. Unlike RTL design, which logic synthesis enabled, ESL design doesnt offer a standard design ow. In fact, ESL design might never be unied in a single design ow, given the architectural scope, the complexities in capturing all facets of an application, and the daunting task of developing tools for these different facets. Still, all ESL technologies share two fundamental objectives: facilitating design reuse and supporting design abstraction. These two objectives have guided every major technology step that has turned transistors into gates, and gates into systems. Reuse and abstraction for ESL design, however, are unique and different from other technology transitions. In ESL design, reuse relates not only to architectures but also to design environments. For example, when a designer builds a SoC architecture around a microprocessor, the microprocessors compiler and the instruction-set simulator (ISS) are as critical to the designs success as the actual microprocessor implementation.
340
CPU
Java application electronic code book aes() J2ME Kilobyte Virtual Machine (KVM) Cryptography
Java API interface 5.28 Overhead (no. of cycles, log10) 5 4 3 2 1 3.25 109 performance gain
KVM native interface (KNI) Advanced encryption or Driver C standard (AES) Cryptography hardware AES Core (a)
KNI interface Hardwaresoftware interface AES coprocessor
160 integration overhead
0 (b)
Java
Hardware
Figure 3. Accelerated embedded virtual machine: general structure (a) and performance improvements and associated overhead (b).
The compiler and the simulator are Programming interface reused design environments, and the microprocessor is a reused design artifact. Simulation-and-refinement As another example, consider SystemC. kernel Application Component You can view SystemC as a reusable independent design design environment for RTL hardware Integration interface components. As a C++ library, it can link Finite-state Instruction to any environment that needs RTL hardmachine with data set ware design capability; thus, the SystemC path (FSMD) architecture library itself is a reusable component. Abstraction in ESL design concerns Gezel Instruction not only the masking of implementation Application-domain Platform kernel set simulator specific design details but also platform programming mechanisms. Finding successful systemIP block Memory bus level abstractions is extremely difficult, Scheduling and interconnect because abstractions tend to restrict the scope of coverable target architectures. Gezel ARM-C For example, C is a successful programming abstraction for a single-core system, but it becomes cumbersome in multiApplication Platform-based Platform specific design core systems. Despite the multitude of system-level design languages, none has so far been able to unify the architecture Figure 4. Three phases for ESL design automation: component, platform, design space in a single programming and platform based. abstraction. These two elements of ESL designits reuse of as a single programmable element included in a platdesign environments and design artifacts, and the com- form. For example, a microprocessor, reconfigurable ponent-specific nature of its programming abstrac- hardware, a software virtual machine, and the SystemC tionsguided us toward a component-based approach simulation kernel are all programmable components. As Figure 4 shows, a component-based model for ESL in system design. In ESL design, we dene a component
341
Designers have used component-based design FSMD1 FSMD2 FSMD1 FSMD2 approaches, typically in software development, to Wire (input is address problems requirIP block instantaneously ing high architectural defined by connected output) flexibility. For example, (a) (b) Cesario et al. presents a component-based apFigure 5. Finite-state machine with data path (FSMD) network: pure (a) and extended (b). proach for multiprocessor SoC (MPSOC) design,8 design requires a design flow with three phases of based on four types of components: software tasks, design: component, platform, and platform based. These processor cores, IP cores, and interconnects. phases correspond to the creation, integration, and use of programmable components. Several different engi- Designing and integrating FSMD neers might work in each design phase, each with his components with Gezel own perspective on an application. These engineers genThe Gezel design environment (http://rijndael.ece.vt. erally fall into one of three categories: design automa- edu/gezel2) supports the modeling and design of hardtion, hardware design, or software design. Figure 4 offers ware components. By integrating the Gezel kernel with the perspective of the design automation engineer. other simulators (such as ISSs), we obtain a platform In component design, a design automation engineer simulator. The three examples we discussed all rely on develops a design environment for a single program- custom hardware design, each with a different platform. mable component. The engineers can do this indepen- Weve combined Gezel with other programmable comdent of the application. Two interfacesintegration and ponents, such as 32- and 8-bit cores. Weve also comprogrammingcharacterize a programmable compo- bined it with other types of programming environments, nent. Through the integration interface, a component including the SystemC simulation kernel and Java. For connects to another (possibly heterogeneous) compo- the parts of the design described in the Gezel language, nent. Between these two is a simulation-and-renement the Gezel design environment automatically creates kernel. Component design can be very elaborate, VHDL, enabling technology mapping into FPGA or including, for instance, the development of an ISS and standard cells. a C compiler for a new processor. In platform design, a design engineer or design Platform-based design using Gezel automation engineer selects various programmable The Gezel language captures hardware using a cyclecomponents and combines them into a single platform based description paradigm based on the finite-state by interconnecting their integration interfaces. Platform machine with data path (FSMD) model. Widely used for design requires the creation of a platform system sched- RTL hardware design, this model has been popularized uler to coordinate the individual components activi- through SpecCharts and SpecC.9 The FSMD model ties. This phase also requires the creation of expresses a single hardware module as a combination communication channels between components. The of a data path and its controller. You can combine sevnotion of a platform as an articulation point between eral different FSMDs into a network, as Figure 5a shows. application and architecture is a well-known concept.6,7 A pure FSMD network is only of limited value for a platIn platform-based design, a design engineer devel- form simulator, because such a network supports only ops an application by writing application programs for communication between FSMDs. Such a network doeseach programmable component in the platform. The nt have the means to communicate with any part of a platform simulator lets the designer instantiate a partic- platform that is not captured as an FSMD. To employ FSMDs as platform components, Gezel ular application and tweak overall system performance. For heterogeneous components, its important to bring supports extended FSMD networks, as Figure 5b shows. the individual components programming semantics Such an extended FSMD network also includes a secsufciently close together so that a designer can easily ond type of module called an IP block. An IP block has an interface similar to that of an FSMD, but the IP block migrate between them.
342
is implemented outside the Gezel language. A similar concept of capturing heterogeneity also exists in Ptolemy.10 Technically, an IP block is implemented as a shared library in C++ and thus can include arbitrary programming constructs within the boundaries of a cycle-based interface. To the Gezel programmer, the IP block looks like a simulation primitive. The platform designer defines the IP blocks behavior. In a component-based design model, these IP blocks implement communication channels, which connect Gezel to a wide range of other components, such as ISSs, virtual machines, and system simulation engines.
FSMD Application
Embedded software
Platform simulator Parser Gezel kernel (C++ library) VHDL code generator C++ code generator RT code generator Communication channel Executableobject hierarchy User-defined IP block implementation Instruction set simulator
Platform design using Gezel
Figure 6 illustrates a platform simulator that uses the Gezel kernel and several ISSs. Each component simulator exists Figure 6. Gezel platform simulator. as an individual (C++) library, linked together in a system simulation. For this platform simulator, we use IP blocks to implement the simulators to the Gezel FSMD model using IP blocks. cosimulation interfaces between the Gezel model and There are two categories of IP blocks, corresponding the ISS. In addition, a system scheduler calls all the to two different design scenarios: IP blocks that model included component simulators. We implement the a processors bus or a dedicated communication port to implement a coprocessor design scenario like the platform simulator in C++. The extended FSMD network in Gezel, combined one in Figure 7a. Other IP blocks capture a complete with the component-based design model, offers essen- component. Designers can also use the Gezel IP block construct tial advantages over a traditional HDL- or SystemCbased approach. VHDL has no means to natively to explore multiprocessor architectures, such as the support a simulation setup like the one in Figure 6, PicoBlaze microcontrollers shown in Figure 7b. In the because it lacks the equivalent of an IP block construct. multiprocessor design scenario, the Gezel model capConsequently, an HDL-based design ow usually imple- tures the complete platform, clearly improving exibilments such a simulation setup at the HDL level. This ity. In addition, this model allows dynamically selecting needlessly increases simulation detail and penalizes the number and types of cores. The Gezel language captures synchronous, single-clock hardware designs. The simulation performance. Its also possible to implement such a simulation platform simulators in Table 1, however, can accomsetup in SystemC. But the platform and the application modate multiple clock frequencies to the individual are no longer distinguishable, because SystemC cap- processors included within the simulation. Many of the environments in Table 1 are open tures everything in C++. This complicates the synthesis of the application onto the final platform. In other source, which greatly eases the construction of platform words, SystemC does not distinguish between the plat- simulators. In commercial environments, open source might still be an unattainable goal, but there are still sigform and platform-based design phases. Table 1 lists several platform components that nicant benets from using an open interface. Several weve used with Gezel to create platform simulators. of our cosimulators (including TSIM and SH-ISS) use They include 8- and 32-bit ISSs, Java (through its native commercial, closed-source components, built on the interface), and SystemC. We coupled each of these basis of an open interface.
Cycle-true system scheduler
343
(expressed in VHDL or SystemC, for example). Simulation Cross-compiler IP block interface The metadata provides Component engine* or assembler Core Port or bus additional language-neutral information on the IP 8-bit cores interface. However, a Atmel AVR Avrora GNU avr-gcc component-based design PicoBlaze kpicosim KCPSM3 assembler flow with Gezel does not 8051 Dalton ISS SDCC, Keil CC need this encapsulation, 32-bit cores because the language ARM Simit-ARM GNU arm-linux-gcc directly models the reuse Leon2-Sparc TSIM GNU sparc-rtems-gcc interfaces. Indeed, these SH3-mobile SH-ISS GNU sh-elf-gcc reuse interfaces correSimulation engines spond to the set of IP Java JVM 1.4 javac blocks that connect the SystemC SystemC 2.0.1 GNU g++ * Information on simulation engines is available as follows: Gezel models to other platAvrora: http://compilers.cs.ucla.edu/avrora (open source); form components. kpicosim: http://www.xs4all.nl/~marksix (open source); Consider the case in Dalton ISS (Dalton 8051): http://www.cs.ucr.edu/~dalton/i8051 (open source); which multiple parties Simit-ARM: http://sourceforge.net/projects/simit-arm (open source); participate in the platTSIM (TSIM 1.2; cross compiler, sparc-rtems-gcc 2.95.2): http://www.gaisler.com; form-based design phase. SH-ISS (Renesas SH3DSP simulator and debugger, v3.0; cross compiler: sh-elf-gcc 3.3): For example, for the simuhttp://www.kpitgnutools.com lator of Figure 6, assume that an IP developer creates hardware components in Gezel, and a system inteSystematic reuse with a component-based grator creates the system (embedded) software. In such approach We can also implement IP management with Gezel. a case, the IP developer expects a reasonable level of IP transfer is notoriously difficult because reuse inter- IP protection before releasing the actual implementafaces are hard to dene. Microprocessor buses have tra- tion, whereas the system integrator wants access to the ditionally been the reuse interface of choice. New hardware components as detailed and as soon as posindustry efforts such as the Open Core Protocol IP (OCP- sible. Gezel can support this scenario, as Figure 8 shows. IP, http://www.ocpip.org) and the Spirit consortium We define two phases in the IP transfer. In IP creation (http://www.spiritconsortium.com) have focused on and evaluation, the IP developer provides a cycle-based generically packaging IP components rather than using simulation model of the hardware IP as a black box to standard buses. Spirits approach is to provide a meta- the system integrator; this model provides a nonsyndata model that encapsulates existing IP components thesizable simulation view of the IP. When the system
Table 1. Platform simulators using Gezel. Gezel model IP block FSMD network IP block PicoBlaze IP block FSMD network PicoBlaze IP block Gezel model
ARM
(a)
(b)
Figure 7. Application of different IP block categories: coprocessor (a) and multiprocessor (b) design scenarios.
344
integrator decides to Hardware developer System integrator acquire the hardware IP, Gezel the second phase of the IP (Black-box IP transfer begins. Now the view) FSMD creation Gezel IP developer provides a and ARM-C IP block evaluation synthesizable version of the hardware IP in VHDL. VHDL code Generate Simulation The component-based generation simulation view library approach of Gezel is wellPlatform C++ suited for this IP design flow. We model black Implementation boxes as IP blocks. The IP block simulation views are IP ARM in binary format as shared IP transfer VHDL bus transfer core VHDL libraries, and thus of little Platform implementation value for this implementation. We wrote two code generators for FSMD networks in Gezel. The first Figure 8. IP reuse in the platform-based design phase. converts FSMDs into equivalent IP block simulation views. The second converts FSMD Table 2. IP model complexity. (NCLOC: noncommented source line of code) into synthesizable VHDL code. The IP Model line count (NCLOC) Area (no. developer can use them together to Design Gezel C++ (IP blocks) VHDL of LUTs)* Speed (ns)** implement the design ow of Figure 8. Table 2 shows several examples of IP CHKSUM 149 1,564 907 131 9.19 modules designed in Gezel. They range EUCLID 69 710 62 557 560.00 from simple components, such as an JPEG 526 8,091 719 5,514 14.62 Internet packet check-sum evaluation AES 292 2,653 1,807 3,332 8.29 module (CHKSUM) to complex IP modBOXMUL 763 6,105 6,282 4,225 20.30 ules, such as an AES module and a high* Target platform was Xilinx Virtex4, speed grade 12 speed Gaussian-noise generator for ** Speed is the clock period we recorded after place and route. bit-error-rate measurements (BOXMUL). For each module, Table 2 lists the line counts of the original Gezel design and the amount of 8051 ISS (http://www.cs.ucr.edu/~dalton/i8051/). Using generated code in C++ and VHDL. We also mapped the IP block models, we designed communication links VHDL code onto an FPGA, and Table 2 gives the area between the 8051 ISS and the coprocessor. We develand speed of the results. We expect the numbers shown oped the driver software running on the 8051 using the to be close to those of manually written VHDL. For Keil tool suite. The platform simulator maps the HECC example, a comparable AES design by Usselman on mathematical formulas into a combination of C, assemXilinx Spartan3 technology lists a LUT count of 3,497. bly language, and hardware. After obtaining a suitable partitioning, we converted the hardware coprocessor into VHDL. We then combined this coprocessor with a Design examples revisited Now, we briefly discuss how we used our compo- synthesizable view of the 8051 processor and mapped nent-based approach to support the three design exam- it into an FPGA. ples presented earlier.
Public-key cryptography
The platform simulator for the HECC application consisted of two components: the Gezel kernel and the
Security partitioning for an embedded ngerprint authentication design

This platform contains the Leon2 ISS and the Gezel kernel. We constructed it in a process similar to that of
345
constructing the public-key cryptography platform. We developed software using the GNU tool suite. In a later design phase, we used the VHDL code generator to convert the Gezel design into VHDL, eventually leading to a tested and fully functional chip.4 This design, however, requires tting the hardware coprocessor onto a nonstandard synthesis design flow based on logic for resisting side-channel attacks. So that chip designers could verify their custom synthesis ows, we extended the platform simulator to record trace stimuli for individual hardware modules. We can also provide this capability using the IP block approach. It is important to separate design flow issues, such as the stimuli recording facility, from actual design issues. The design flow in Figure 4 also supports this concept by distinguishing between the platform builder and the platform user. Gezel lets users write new IP blocks in C++ according to a standard template, and more advanced Gezel users can develop them as library plug-ins.
Designers of heterogeneous architectures will inevitably get in touch with new design cultures and practices, not only from those novel ESL tools but also from their colleague designers.
Acknowledgments
We thank the reviewers for their constructive feedback. We also thank the many students who have experimented with Gezel and whose designs weve mentioned in this article. This research has been made possible with the support of STMicroelectronics, Atmel, the National Science Foundation, University of California Microelectronics and Computer Research Opportunities (UC Micro), SRC, and FWO (Fonds voor Wetenschappelijk Onderzoek).
References
1. C. Rowen and S. Leibson, Engineering the Complex
SoC: Flexible Design with Congurable Processors,

Prentice Hall, 2004. 2. T.J. Todman et al., Recongurable Computing: Architectures and Design Methods, Proc. IEE, vol. 152, no. 2, Mar. 2005, pp. 193-207. 3. D. Talla et al., Anatomy of a Portable Digital Mediaprocessor, IEEE Micro, vol. 24, no. 2, Mar.-Apr. 2004, pp. 32-39. 4. K. Tiri et al., A Side-Channel Leakage Free Coprocessor IC in 0.18um CMOS for Embedded AES-Based Cryptographic and Biometric Processing, Proc. 42nd
Acceleration of embedded virtual machines

For the third design, we integrated three components: a port of the Java-embedded virtual machine, the SH3DSP ISS, and the Gezel kernel. We developed software in Java, C, and assembly language. In addition, this design required a considerable number of cryptographic support libraries. This kind of design demonstrates the importance of varying the design abstraction level within a single platform. The entire cryptographic application in Java can take millions of clock cycles, and the hardware coprocessor is active for a fraction of the time. On the one hand, we need increased simulation efciency (and decreased simulation detail) for much of the design, but on the other hand, at a few select places we must observe every bit that toggles in every gate. A component-based design approach can cope with this heterogeneity.
Design Automation Conf. (DAC 05), ACM Press, 2005,

pp. 222-227. 5. Y. Matsuoka et al., Java Cryptography on KVM and Its Performance and Security Optimization Using HW/SW Co-design Techniques, Proc. Intl Conf. Compilers,
Architecture, and Synthesis for Embedded Systems

(CASES 04), ACM Press, 2004, pp. 303-311. 6. T. Claassen, System on a Chip: Changing IC Design Today and in the Future, IEEE Micro, vol. 21, no. 3,
HETEROGENEOUS SYSTEM architectures will continue
May-June 2003, pp. 20-26. 7. A. Sangiovanni-Vincentelli, Dening Platform-Based Design, EE Times, Feb. 2002, http://www.eetimes.com/ news/design/showArticle.jhtml?articleID=16504380. 8. W.O. Cesario et al., Multiprocessor SoC Platforms: A Component-Based Design Approach, IEEE Design &
to dominate in applications that require dedicated, high-performance, and energy-efcient processing. The challenge at the electronic system level will be to design these architectures in increasingly shorter design cycles. New tools will have to quickly create not only derivative platforms but also entirely new platforms. We are exploring novel mechanisms in Gezel to further accelerate platform construction, and we are presently working on such a platform designer for FPGA technology. Wed also like to stress that ESL design requires not only new tools but also a change in design culture.
Test, vol. 19, no. 6, Nov.-Dec. 2002, pp. 52-63.

9. D. Gajski et al., SpecC: Specication Language and
Methodology, Kluwer Academic Publishers, 2000.

10. E. Lee, Overview of the Ptolemy Project, tech. memo UCB/ERL M03/25, Dept. of Electrical Eng. and Computer Science, Univ. of California, Berkeley, 2003.
346
F E AT U R I N G
Patrick Schaumont is an assistant professor in the Electrical and Computer Engineering Department at Virginia Tech. His research interests include design methods and architectures for embedded systems, with an emphasis on demonstrating new methodologies in practical applications. Schaumont has an MS in computer science from Ghent University, Belgium, and a PhD in electrical engineering from the University of California, Los Angeles. He is a senior member of the IEEE. Ingrid Verbauwhede is an associate professor at the University of California, Los Angeles, and an associate professor at Katholieke Universiteit Leuven, in Belgium. Her research interests include circuits, processor architectures, and design methodologies for real-time, embedded systems in application domains such as security, cryptography, DSP, and wireless. Verbauwhede has an electrical engineering degree and a PhD in applied sciences, both from Katholieke Universiteit Leuven. She is a senior member of the IEEE. Direct questions or comments about this article to Patrick Schaumont, 302 Whittemore Hall (0111), Virginia Tech, VA 24061; schaum@vt.edu.
IN 2007
Healthcare Mining a Sensor-Rich World Urban Computing Security & Privacy
IEEE Pervasive Computing delivers
the latest peer-reviewed developments in pervasive, mobile, and ubiquitous computing to developers, researchers, and educators who want to keep abreast of rapid technology change. With content thats accessible and useful today, this publication acts as a catalyst for progress in this emerging eld, bringing together the leading experts in such areas as Hardware technologies Software infrastructure Sensing and interaction with the physical world Graceful integration of human users Systems considerations, including scalability, security, and privacy
Sign Up Today for the IEEE Computer Societys e-News

Be alerted to
articles and special issues conference news registration deadlines
Subscribe Now!
Available for FREE to members.
VISIT www.computer.org/pervasive/subscribe.htm
347
www.computer.org/e-News
Modeling Embedded Systems: From SystemC and Esterel to DFCharts

Ivan Radojevic, Zoran Salcic, and Partha S. Roop
University of Auckland
Editors note: This article addresses the need for directly expressing heterogeneous, hierarchical behaviors for modeling specific embedded systems. After analyzing two existing ESL languages, SystemC and Esterel, the authors created a new model of computation and a graphical language to gain the direct expressivity they need for their model. Although researchers have suggested various changes to SystemC and Esterel to fit modeling requirements, this article considers mainly standard SystemC and Esterel. Sandeep K. Shukla, Virginia Polytechnic and State University
THE DESIGN PRODUCTIVITY of engineers has not
kept pace with rapid improvements in silicon technology. This has resulted in what is commonly known as the productivity gap. To close this gap, researchers have introduced various system-level design languages (SLDLs) to raise the design abstraction level by focusing on a systems behavior rather than low-level implementation details. A major challenge that SLDLs face stems from the behavioral heterogeneity of most embedded systems. For example, one part of an embedded system might perform intensive computations on samples that regularly arrive from an analog-to-digital converter. Another part of the same system might perform only minor computations while being ready to quickly respond to events that arrive asynchronously from the environment. An embedded systems behavior usually involves a set of concurrent, communicating processes. A model of computation (MoC) denes the rules for communication and synchronization between processes. Different MoCs are suitable for different behaviors. For example, hierarchical concurrent nite-state machines (HCFSMs), which the statecharts family uses,1 are suitable for describing control-dominated behavior, whereas dataflow models are good for data-dominated
behavior. SLDLs must support multiple MoCs to successfully cope with embedded systems behavioral heterogeneity. Using a case study of a practical, heterogeneous embedded system called frequency relay, we evaluate the modeling capabilities of two popular system-level languages, SystemC and Esterel.2,3 Based on this case study, we establish an expanded set of system-level language requirements, against which we evaluate the strengths and weaknesses of these two languages. Because of these languages limitations, we suggest a new MoC for heterogeneous systems called DFCharts, which SystemC and Esterel should follow to support better modeling of heterogeneous embedded systems. (The Related work sidebar discusses other efforts to compare languages for embedded-systems design.) DFCharts targets heterogeneous embedded systems by combining a data-dominated MoC called synchronous dataflow (SDF) with a control-dominated MoC called Argos (which, like statecharts, is based on HCFSMs).4,5 In terms of MoCs that are combined, DFCharts is similar to *charts,6 which also uses HCFSMs and SDF. However, *charts allows only hierarchical refinement of one model by another. At each hierarchical level, blocks must obey the semantics of a single MoC, but internally a designer can refine each block into a system that behaves according to some other model. The major problem with this approach has to do with the communication between hierarchical levels, which can lead to the loss of some of a given MoCs original characteristics. Unlike *charts, DFCharts lets SDFs and FSMs coexist at the same hierarchical level, and a rendezvous mechanism of communicating sequential
348
0740-7475/06/$20.00 2006 IEEE
processes (CSPs) enables communication between them.7 In this way, each model retains its characteristics, and there is more exibility in modeling.
Related work
There have been a few other attempts to describe and compare languages for embedded-systems design. Edwards reviews hardware description languages, programming languages, and system-level languages.1 Cai et al. compare specication languages SpecC and SystemC.2 Gorla et al. compare several languages for system specication.3 They also use the case study of a practical heterogeneous embedded system to illustrate relevant concepts. Brisolara et al. use the same case study to compare two variants of the Unified Modeling Language with Simulink.4 The key difference between our work and these is that we closely concentrate on the link between the specification languages and the models of computation (MoCs) suitable for heterogeneous systems. Moreover, we introduce a new MoC, DFCharts, to model heterogeneous systems. Other models that target heterogeneous embedded systems include Reactive Process Networks,5 Funstate,6 Composite Signal Flow,7 and Mode Automata.8
Initial system-level requirements for SystemC and Esterel

Whereas an industry consortium is proposing SystemC and it has no formal semantics, Esterel has formal semantics and formal verification capabilities. Hence, both languages represent differing perspectives on system-level modeling. Some of the key modeling requirements at the system level are as follows:
Separation of communication and computation. This makes the model suitable for reuse in an environment involving several independently developed, concurrent components. Concurrency and communication primitives at a high abstraction level. The purpose of the system-level design is to create a model involving several components, each having its own MoC. Therefore, the modeling language must combine several MoCs and facilitate communication among them. Functional hierarchy. The modeling language might need to express a particular functionality hierarchically to enable succinct specification. Hierarchy should allow mixing different MoCs that exist at different hierarchical levels. This requirement is also called hierarchical heterogeneity.8 Exception handling. Because exceptions are critical to embedded systems, the language must provide direct support to capture and handle exceptions.
References
1. S. Edwards, Languages for Digital Embedded Systems, Kluwer Academic Publishers, 2000. 2. L. Cai, S. Verma, and D.D. Gajski, Comparison of SpecC and
SystemC Languages for System Design, tech. report CECS-0311, Center for Embedded Computer Systems, Univ. of California, Irvine, 2003. 3. G. Gorla et al., System Specication Experiments on a Common Benchmark, IEEE Design & Test, vol. 17, no. 3, July-Sept. 2000, pp. 22-32. 4. L. Brisolara et al., Comparing High-Level Modeling Approaches for Embedded System Design, Proc. Asia and
In light of the frequency relay case study, we will expand these requirements.
South Pacic Design Automation Conf. (ASP-DAC 05), ACM

Press, 2005, pp. 986-989. 5. M. Geilen and T. Basten, Reactive Process Networks, Proc.
Case study: Frequency relay

Power systems need protection from overloading. When a power system is overloaded, its necessary to disconnect some loads to prevent damage. A signicant decrease in the main AC signals frequency level (the normal value is 50 Hz) indicates a dangerously overloaded system. The same problem also occurs when the AC signals rate of change (ROC) is too fast. The frequency relay is a system that measures the frequency and its ROC in a power network, comparing measurement results against a set of thresholds that a control system can modify via the Internet. If the current thresholds indicate that the frequency is too low or that its ROC is too fast, the frequency relay disconnects some loads from the network by
4th ACM Intl Conf. Embedded Software (EMSOFT 04), ACM

Press, 2004, pp. 137-146. 6. K. Strehl et al., FunStateAn Internal Design Representation for Codesign, IEEE Trans. Very Large Scale Integration (VLSI)
Systems, vol. 9, no. 4, Aug. 2001, pp. 524-544.

7. A. Jantsch and P. Bjureus, Composite Signal Flow: A Computational Model Combining Events, Sampled Streams, and Vectors, Proc. Design, Automation and Test in Europe
Conf. (DATE 00), IEEE CS Press, 2000, pp. 154-160.

8. F. Maraninchi and Y. Remond, Mode-Automata: A New Domain-Specic Construct for the Development of Safe Critical Systems, Science of Computer Programming, vol. 46, no. 3, Mar. 2003, pp. 219-254.
349
Figure 1 illustrates the main operation that we just described, mode1. Data-dominated processes perform a DSP operation Switches Frequency Switch Communication similar to autocorrelation; calculation control network this operation is necessary Parameter settings for frequency calculation. Rate-of-change Control-dominated pro(ROC) Timer calculation cesses perform various decision-making and minor computations. The Figure 1. Main operation of frequency relay mode1. (Clear boxes indicate dataparameter-settings process dominated processes. Shaded boxes indicate control-dominated processes.) monitors the interface with the Internet. The frequency calculation and ROC calculation processes determine the frequency and its ROC. t1/st Figure 2 shows the switch-control process, repret1/st S3 S2 senting it as an FSM with four states. The initial state is to S3. Each state determines how many switches are t2/st closed. For example, three switches are closed in S3, t2/st to/st t3/st t3/st whereas all three switches are open in S0. The state transitions come from inputs t1, t2, and t3, which indicate t3/st whether certain thresholds have been exceeded. The S0 S1 input from timer to (time-out) is also a factor. The to/st switch-control block can restart the timer by emitting t3/st t2/st output st. Figure 3 shows the frequency relays global states. The initial state, initialize, congures some system paraFigure 2. Switch-control process. meters. After this initialization, init_done, the next state is mode1, in which the main operation occurs (as described by the processes in Figure 1). If reset occurs, initialize mode2 the system reinitializes. When off occurs, mode1 terminates and mode2 begins. Nothing happens in this state; the system simply stops, and all switches close. If on occurs, the system enters mode1 again. The FSM in reset on Figure 3 represents the frequency relays top level. (The init_done off processes in Figure 1 are one level below this.) The arrows between the processes in Figure 1 denote directions of communication, but so far we have not discussed the communication semantics. Before writing mode1 the specification in SystemC and Esterel, we need to state the required communication mechanisms. Furthermore, we need to state how the computations Figure 3. Global states of frequency relay. inside the processes will occur. By identifying the required models for computation and communication, opening one or more switches (three in the case we pre- we can make a complete list of requirements against sent here), as determined by a decision algorithm. The which we will evaluate SystemC and Esterel. The three data-dominated blocks perform intensive system gradually reconnects loads if the frequency and computations on samples that regularly arrive from the its ROC improve.
AC waveform Averaging filter Symmetry function Peak detection
350
power system. Lee and Messerschmitt have successfully applied SDF for this type of behavior.4 In SDF, processes communicate through asynchronous FIFO buffers. Each process can re when its ring rule is satised, and this determines how many tokens must be present in the input buffers. Imperative statements (which programmers can write in C, for instance) describe the algorithms inside the processes. FSMs, such as the one for the switch-control process in Figure 2, can effectively capture the control-dominated processes behavior. FSMs can also be hierarchical. The most convenient communication model among concurrent FSMs appears to be synchronous reactive (SR).9 (In fact, most variants of statecharts use SR.1) Thus, we need hierarchical, concurrent FSMs with SR communication. In addition, we need imperative statements for minor computations performed on state transitions. Finally, we can use CSP-like rendezvous for communication between peak detection and frequency calculation processes. A high-level communication mechanism guarantees lossless transmission of data without buffers. The models listed thus far (HCFSM, SR, SDF, CSP, and imperative statements) cover the majority of models that Edwards et al. discuss.9 An important model that we havent discussed is discrete event. Although highly expressive, discrete-event models are very difcult to synthesize.9
Table 1. Level of support provided by SystemC and Esterel, in a scale of 0 to 3, with 3 being the highest level of support. Requirement Concurrent processes Rendezvous communication Support for dataow Support for HCFSMs Data transformations Hierarchy and preemption SystemC 3 2 2 2 3 0 Esterel 3 2 0 3 3 3
The first five requirements relate to the first two requirements given earlier (separation of communication and computation, and concurrency and communication primitives at a high abstraction level). The last requirement, hierarchy and preemption, relates to the last two requirements (functional hierarchy and exception handling) from the earlier list.
Evaluation of SystemC and Esterel based on these requirements

Now, we evaluate the level of support of SystemC and Esterel for each of the six expanded system-level requirements. Table 1 summarizes the results of this evaluation. Concurrent processes. SystemC relies on implicitly assumed concurrency; processes defined in a single module are concurrent. When multiple modules connect at any hierarchical level, they always execute concurrently. In fact, specifying the execution order of modules, as in sequential or pipelined execution (available in some other languages), is not possible in SystemC. The designer would have to use control signals to manipulate the execution order of modules. Esterel lets programmers explicitly create concurrency using parallel operator || at any hierarchical level. The || operator creates concurrent threads that communicate and synchronize using the synchronous broadcast. This approach is based on the SR MoC, which assumes there is a global clock. Esterel generates inputs and corresponding outputs in the same tick of the global clock, leading to the logical-zero delay model. Also, Esterel broadcasts events generated in any thread to all other threads. Clever programming would be necessary for any other form of concurrency, however. Rendezvous communication. SystemC has no higherlevel construct to implement rendezvous directly.
Suitability of SystemC and Esterel for modeling heterogeneous embedded systems

Based on the frequency relay case study, we expand the system-level language requirements given earlier into the following six requirements:

concurrent processes, an essential requirement and a precondition for all other points that follow; rendezvous communication; support for dataow, including buffered communication between processes and specication of ring rules for dataow modules; support for HCFSM models with synchronous communication; imperative statements to describe data transformations inside SDF actors, as well as smaller computations performed by FSMs; and hierarchy and preemption, multiple processes inside a hierarchical state, and instant termination of lowerlevel processes when any transition leaves the hierarchical state.
351
AC waveform Averaging filter Symmetry function Peak detection
Global state finite-state machine
Switches Switch control
Frequency calculation Parameter settings ROC calculation
Communication network
faster than they would otherwise need to work. A more efcient implementation would specify datadominated blocks as asynchronous tasks, taking more than one tick to complete computations. However, using asynchronous tasks leads to integration problems.
Support for HCFSMs. SystemC lets you describe FSMs using switch-case Figure 4. Modied frequency relay model for SystemC implementation. constructs, which can be nested for hierarchical However, creating rendezvous between two processes FSMs. This involves using multiple state variables. Signal using wait and notify statements should not be difcult. sensitivities and the wait statement support reactivity. Esterel does not allow direct specification of ren- However, SystemC cannot match powerful preemption dezvous. Instead, programmers must create rendezvous statements such as abort and trap in the SR-based Esterel using a combination of appropriately employed await language. Esterel obviously completely supports SR communiand emit statements. cation. Statements such as abort and trap can naturally Support for dataow. In SystemC, primitive channel describe preemption. Although Esterels imperative statesc_fo can implement FIFO buffers. Because of constant ments can easily describe an FSM, using visual syntax is data rates, its best to implement data-dominated blocks probably more convenient in most cases. This is where as method processes. Theres no need to use less efcient SyncCharts (http://www.esterel-technologies.com) comthread processes. However, only thread processes that plements Esterel. are dynamically scheduled by the SystemC kernel can use sc_fo buffers. Hence, implementing static schedul- Data transformations. SystemC, as an imperative language, provides excellent support for describing ing with the ring rules of the SDF model is difcult. Esterel allows the implementation of a FIFO buffer as sequential algorithms. In Esterel, C is available as a host a separate process (C function), thus separating compu- language; hence, Esterel can specify complex algotation and communication. However, the FIFO process rithms for data transformations inside transformational would still synchronize with the tick signal. Thus, the blocks similar to the way SystemC does. However, abstraction level would be lower than in asynchronous Esterel requires you to assume that computation of timeSDF buffers. In the frequency relay, the SDF blocks per- consuming algorithms is instantaneous. forming signal processing must be reactive, like all other processes in the system. The event to which they react is a Hierarchy and preemption. In SystemC, there is no sample from the analog-to-digital converter. The problem direct way to implement exceptions modeled by exits is that all processes must align with a single tick signal from higher-level hierarchical states. We indicated earlithat is, they must read inputs and produce outputs at the er that hierarchy in an FSM could be modeled by using same time instant. The most efcient solution for the SDF nested switch-case statements; however, this type of modprocesses is to have the tick signal coincide with the AC eling is not applicable here, because its not possible to input signals sampling frequency. The ticks must be fre- instantiate processes inside a case branch. Because prequent enough to capture all system inputs. Thus, the empting processes is not possible, one or more control process with the fastest system inputs determines the tick signals must control each process. Consequently, the signal rate. The result is an implementation that is likely global-state FSM in Figure 3 must be at the same hierarto be inefcient, because the data-dominated blocks work chical level as the processes in Figure 1 (see Figure 4).
Timer
352
Esterel supports behavioral hierarchy and has several statements that enable preemption. For example, concurrent blocks can run inside the body of an abort statement.
Table 2. SystemC les for frequency relay specication (effective lines of source code). SystemC les averaging_lter.cpp symmetry_function.cpp peak_detection.cpp frequency_calculation.cpp roc_calculation.cpp parameter_settings.cpp switch_control.cpp timer.cpp frequency_relay.cpp testbench.cpp Code size 85 95 66 93 100 239 135 38 251 412
Additional analysis of SystemC and Esterel

Tables 2 and 3 give the SystemC and Esterel specications of the frequency relay. The Esterel specication is a mixture of Esterel les with an .strl extension, and C files with a .c extension. Neither specification completely follows the model in Figure 1. The total code size for the SystemC specification, excluding the testbench le, was 1,102 lines. The total code size for the Esterel specication was 901 lines. This difference is not significant, considering that the SystemC specication has more les and thus more declarations. Each SystemC le contains one process. The rst three les in Table 2 contain thread processes; all others contain method processes. Although the time required to prepare a simulation is important, a more critical factor is the actual simulation time. The Esterel simulation took close to 4 hours, whereas the SystemC simulation took only 5 minutes. We performed both simulations on the same platform. For SystemC, we used Microsoft Visual C++ version 6 with SystemC class library 2.0.1. For Esterel, we used Esterel Studio version 4, which supports Esterel version 5. The latest, recently released version of Esterel (version 7) allows multiclock designs that are globally asynchronous, locally synchronous (GALS). Several factors might account for the huge difference in actual simulation times, but the most interesting one concerns modeling in Esterel. The entire system must run on one clock because Esterel doesnt support multiple clocks. The process with the fastest changing inputsthe parameter-setting blockdetermines the system speed. This speed is unnecessarily high for datadominated parts, which need to read inputs only when a sample arrives. Consequently, there are many ticks with absent inputs in this part of the system. Although simulation is the most widely used validation method, it is not the only one. The other method is formal verication, which Esterel specications (unlike SystemC) may employ. However, formal verication is not particularly helpful for the frequency relay, because any useful properties that could be veried would relate to data-dependent internal activities rather than inputs and outputs. It would be difficult to define such properties using Esterel observers, which check properties only in the control part.
Table 3. Esterel les for frequency relay specication (effective lines of source code). SystemC les dataow.strl averaging_lter.c symmetry_function.c measurement.strl freq_average.c roc_average.c parameter_settings.strl switch_control.strl frequency_relay.strl Code size 76 34 41 77 31 43 251 139 209
DFCharts
Because of the limitations of SystemC and Esterel, we introduced DFCharts as a model they should support to capture heterogeneous embedded systems. (We explain the detailed semantics of DFCharts elsewhere.10) DFCharts combines two well-known models, SDF and Argos,4,5 in a novel way. SDF is suitable for datadominated systems. Argos is suitable for control-dominated systems. SDF belongs to the family of dataflow models. In SDF, each process operates on streams of tokens in rings. A process firing rule specifies how many tokens each firing consumes and produces. In SDF, unlike dynamic dataow models, those numbers must be constant, which limits buffer size and makes it possible to construct efficient static schedules. Because of static scheduling, the iteration of an SDF graph is clearly identifiable: It is a series of process firings that return the buffers to their original state. In Figure 5a, some possi-
353
2 2
B C
1 1
1 A 1
1 S0 3
a
S1
S1 Refinement
b /c
FSM2
c /d
FSM3
FSM1 Synchronous Hiding parallel (b)
(a)
Figure 5. Example specications of the two models used in DFCharts: SDF (a) and Argos (b).
ble schedules that create a single iteration are BCA, CBA, or C and B running concurrently before A. The numbers next to the processes describe their firing rules. SDF is suitable for a wide range of signal-processing systems with constant data rates. Argos models consist of parallel and hierarchical compositions of FSMs. Argos execution is based on the synchrony hypothesis, which states that all computations and communications in the system are instantaneous. As a result, there is no delay between inputs and outputs; they are synchronous. Model execution involves a series of instants (called ticks) of a global clock. In each tick, Argos reads inputs and instantaneously produces outputs. Because all components react simultaneously, there is no need for scheduling. The three main operators that Argos uses to construct the HCFSM model are renement for hierarchy, synchronous parallel for concurrency, and hiding for synchronization. Figure 5b shows a simple Argos specication, which renes state S1 into two concurrent FSMs that synchronize using event c. When S1 is active and event b occurs, FSM2 makes the transition and emits c, causing the transition in FSM3 in the same instant, which in turn emits d. In the instant when the rened FSM leaves the hierarchical state, the rening FSMs can react. Thus, d is emitted even if signal a is present. (This corresponds to the notion of weak preemption, called weak abort in Esterel). Like Argos, DFCharts has synchronous parallel, renement, and hiding operators. However, it also has the additional asynchronous parallel operator, which it uses to connect an SDF graph with one or more FSMs. This operator is asynchronous because the SDF graph operates independently of FSMs. The SDF graph synchronizes with FSMs only between two iterations: when its receiving inputs for the next iteration and sending outputs produced during the previous iteration. SDF graphs can be at any level in the hierarchy of FSMs. All FSMs in a DFCharts specication use the same set of ticks (clock). When a tick occurs, every FSM makes a transition. However, SDF graphs operate at their own
speed. This produces a system with multiple clock domains, a different domain for each SDF graph and a single clock domain for all FSMs. This type of mixed synchronous and asynchronous specication supports efcient implementation. Moreover, because DFCharts allows FSMs and an SDF graph at the same hierarchical level, each retains its own characteristics. The example in Figure 6 illustrates the features of DFCharts. At the top level, state S2 is rened into two parallel FSMs that synchronize by local event e. S1 is also rened into two FSMs, connected by the synchronous parallel operator; in addition, the asynchronous parallel operator connects these two FSMs with SDF graph SDF1. The communication between the SDF graph and the FSMs passes through channels ch1 and ch2. The arrows indicate the direction of data exchange. For the SDF graph, ch1 is an output channel, and ch2 is an input channel. The communication through each channel occurs when both the SDF graph and the relevant FSM are ready for it. (The SDF graph and the FSM meet using CSP-style rendezvous operations.) If the sender attempts to send when the receiver is not ready, the sender will block itself. Similarly, if the receiver attempts to read while the sender is not ready, the receiver will block itself. FSMs communicate with SDF graphs from rendezvous states, which cannot be rened. A rendezvous state is one that has an outgoing transition triggered by a rendezvous action. In Figure 6, the rendezvous states are S7 and S9. When FSM4 is in S7, it is ready to receive data from SDF1 through ch1, as evident from transition ch1?x. We use CSP notation,7 where ? denotes a read action, and ! denotes a write action. When SDF1 is ready to send data, the communication occurs, triggering transition ch1?x. The data received from SDF1 is stored in variable x, event h is emitted, and state S8 begins. S8 can also follow S7 when event m is present, preempting rendezvous on ch1. On the other hand, FSM5 remains blocked in S9 until SDF1 is ready to receive data through ch2 from variable y. Figure 7 shows how DFCharts represents the frequency relay. Property verication in a DFCharts model is similar
354
S2 FSM1 a S1 FSM2 S0 FSM3 S3 Refinement S5
b /d
S2
n /f
S4
e e
g
S6
g /e
Asynchronous parallel Hiding S1 SDF1 ch4 1 1 1 B 3 1 1 ch3 S8 SDF2 FSM6 SDF3 C 2 2 ch2 ch1 ch1?x/h S8 FSM2 S7 FSM3
Synchronous parallel
S9
m l
ch2!y S10
y =y +1 h/assign y1 = y1 + 2 2 2
Figure 6. Example of DFCharts model.
freq_relay initialize mode2
init_done
reset
mode1 find_peaks
1 1 symmetry function 1 1
On
Off
1 ch1
averaging filter
peak detection
1 ch2
Timer
Switch control
Parameter settings
ROC calculation
Frequency calculation
Figure 7. Frequency relay in DFCharts.
to that in Argos. In the latter, combining FSMs removes hierarchy and concurrency. The result is a single, flat FSM, whose behavior is equivalent to the original model. In DFCharts, it is also necessary to integrate SDF graphs. DFCharts accomplishes this by representing the
operation of each SDF graph as an equivalent HCFSM. In general, the top-level FSM representing an SDF graph has two states: io (I/O) and iterate. Figure 8 gives a simple example of an SDF graph with one input channel and one output channel.
355
io state ioc
si1
cin?din
so1
init
init
iterate
so2
cout!dout itc
si2
so3
Figure 8. FSM representing the operation of a two-channel SDF graph.
The io state is rened by as many concurrent FSMs as there are inputs and outputs. The input FSM, which consists of two states, receives data through channel cin, and stores it into variable din. The output FSM sends data from variable dout through channel cout, as the transition from so2 to so3 indicates. If no iteration has occurred yet, which the presence of init indicates, there is nothing to send, and the output FSM enters so3 immediately after so1. Otherwise, init is absent (denoted by init in Figure 8), and so2 is entered from so1. When the input and output FSMs enter si2 and so3, respectively, ioc becomes present (io complete), and the top-level FSM enters iterate, thus completing a single iteration of the SDF graph. An FSM representing a particular schedule can further rene this state. However, this refinement isnt necessary for the global analysis.
will synchronize it to all SR blocks in the upper hierarchical level. Such assumptions are likely to produce inefficient implementations. With the parallel heterogeneity used in DFCharts, FSMs are free to react to external events, and SDF graphs can run at their own speed. The Communicating Reactive State Machines (CRSM) language also extends Argos with an asynchronous parallel operator, which uses rendezvous channels to connect parallel FSMs.11 Thus, DFCharts has more in common with CRSM than Argos. However, the purpose of the asynchronous parallel operator in CRSM is to connect parts in a distributed system, whereas in DFCharts this operator serves to connect physically close control-dominated and data-dominated parts. Another important difference is that in CRSM the asynchronous parallel operator can function only at the top level (in a GALS manner), whereas in DFCharts it can function at any hierarchical level.
Feature extensions of SystemC and Esterel

According to our analysis, SystemC only partially supports or does not support at all the expanded systemlevel requirements of rendezvous communication, dataflow, HCFSMs, and hierarchy and preemption. A designer can construct a rendezvous channel using wait and notify statements to create the necessary request and acknowledge lines for the rendezvous protocol, but this could take some effort. Ideally, the designer should add a standard rendezvous channel to the library of channels that includes sc_fifo, sc_signal, and so on. Asynchronous thread processes that communicate through FIFO channels using blocking reads provide a good foundation for dataow models. However, its also still difcult in SystemC to specify firing rules and construct static-scheduling orders, so improvements are necessary in this area as well. Synchronous processes can be created in SystemC, and this is essential for HCFSM support. Its also possible to model reactivity using signal sensitivities and wait and notify statements. But the absence of preemption is a serious disadvantage when modeling control-dominated behavior. Processes cannot be instantaneously terminated or interrupted, which is necessary for the hierarchy and preemption requirement. Overcoming this fundamental limitation would require making deep changes in SystemCs simulation semantics. SystemC-H is an extension of SystemC that incorporates some of these desired changes.12 SystemC-H has an extended SystemC kernel to better support SDF, CSP,
Comparison between DFCharts and other models

Besides DFCharts, the only other model that combines FSMs and SDFs is *charts,6 which is a part of Ptolemy.8 The Ptolemy environment hierarchically combines several MoCs. At each hierarchical level, blocks must obey a single MoCs semantics, but a designer can internally rene each block into a system that behaves according to some other model. The closest subset of Ptolemy to DFCharts is *charts, which focuses on mixing FSMs with other models. With hierarchical heterogeneity, it might be difficult in *charts to devise a meaningful communication mechanism between outer and inner models. The inner model might lose some properties while adjusting to the outer model. For example, if a network of SR blocks renes an SDF block, the refining blocks receive their inputs through blocking reads, so they are not really reactive. Conversely, if an SDF network renes an SR block, the SDF network must conform to the synchrony hypothesis. This means *charts will assume its iteration is instantaneous and
356
and FSM models. Constructing static schedules for SDF models is possible, and this increases simulation efficiency. Another important addition is hierarchical heterogeneity with SDF and FSM models. In its current form, though, SystemC-H probably wouldnt be able to support DFCharts entirely, because the former adheres to purely hierarchical heterogeneity, as in Ptolemy, whereas DFCharts represents a mixture of hierarchical and parallel heterogeneity. Like SystemC, Esterel does not directly support rendezvous, but using await and emit statements, a designer could construct rendezvous. The main problem with Esterel is its complete lack of support for the third expanded system-level requirement: support for dataow, including buffered communication between processes and specication of ring rules for dataow modules. The assumption made by the synchrony hypothesis (that all computations are instantaneous) is seldom valid for data-dominated systems. Furthermore, Esterel syntax is not appropriate for dataow. It would be possible to design a dataow network inside an asynchronous task. But, describing something in an asynchronous task means going outside Esterel and its development tools. Creating a solid basis for an integrated environment requires defining a MoC (such as SDF) for asynchronous tasks and interfacing this MoC with the SR model.
2. Open SystemC Initiative, SystemC Version 2.0 Users
Guide; http://www.systemc.org.
3. G. Berry and G. Gonthier, The Esterel Synchronous Programming Language: Design, Semantics, Implementation, Science of Computer Programming, vol. 19, no. 2, Nov. 1992, pp. 87-152. 4. E.A. Lee and D.G. Messerschmitt, Synchronous Data Flow, Proc. IEEE, vol. 75, no. 9, Sept. 1987, pp. 12351245. 5. F. Maraninchi and Y. Remond, Argos: An AutomationBased Synchronous Language, Computer Languages, vol. 27, nos. 1-3, 2001, pp. 61-92. 6. A. Girault, B. Lee, and E. Lee, Hierarchical Finite State Machines with Multiple Concurrency Models, IEEE
Trans. Computer-Aided Design of Integrated Circuits and Systems, vol. 18, no. 6, June 1999, pp. 742-760.
7. C.A.R. Hoare, Communicating Sequential Processes,
Comm. ACM, vol. 21, no. 8, Aug. 1978, pp. 666-677.

8. J. Eker et al., Taming HeterogeneityThe Ptolemy Approach, Proc. IEEE, vol. 91, no. 1, Jan. 2003, pp. 127-144. 9. S. Edwards et al., Design of Embedded Systems: Formal Methods, Validation, and Synthesis, Proc.
IEEE, vol. 85, no. 3, Mar. 1997, pp. 366-390.

10. I. Radojevic, Z. Salcic, and P. Roop, Modeling Heterogeneous Embedded Systems in DFCharts, Proc. Forum
Design and Specication Languages (FDL 05), European Chips and Systems Initiative, 2005, pp. 441-452. 11. S. Ramesh, Communicating Reactive State Machines:
WE INTEND TO CREATE a graphical environment for designing embedded systems using DFCharts. Therefore, weve implemented a Java class library to execute DFCharts specifications. This library incorporates methods for analyzing SDF graphs from Ptolemy II. In fact, this was one of the reasons we chose Java for the implementation. The next step is to create a graphical interface. Another direction of research, which is the focus of this article, is to modify widely accepted system-level languages such as SystemC and Esterel to support DFCharts.
Design, Model and Implementation, Proc. IFAC Work-
shop Distributed Computer Control Systems, Pergamon

Press, 1998; http://www.cfdvs.iitb.ac.in/projects/crsm/ ifac.ps. 12. H. Patel and S. Shukla, SystemC Kernel Extensions for
Heterogeneous System Modeling: A Framework for Multi-MoC Modeling & Simulation, Kluwer Academic
Publishers, 2004.
References
1. M. von der Beeck, A Comparison of Statecharts Variants, Proc. Formal Techniques in Real-Time and
Fault-Tolerant Systems, LNCS 863, Springer-Verlag,

1984, pp. 128-148.
Ivan Radojevic is a PhD candidate in the Department of Electrical and Computer Engineering at the University of Auckland in New Zealand. His research interests include design languages, models of computation, and hardware-software codesign for embedded systems. Radojevic has a BE in electrical engineering from the University of Auckland.
357
Zoran Salcic is a professor of computer systems engineering at the University of Auckland. His research interests include complex digitalsystems design, custom-computing machines, recongurable systems, FPGAs, processor and computer systems architectures, embedded systems and their implementation, design automation tools for embedded systems, hardware-software codesign, new computing architectures and models of computation for heterogeneous embedded systems, and related areas in computer systems engineering. Salcic has a BE, an ME, and a PhD in electrical and computer engineering from the University of Sarajevo. He did most of his PhD research at the City University of New York (CUNY). He is a senior member of the IEEE.
Partha S. Roop is a senior lecturer in the Department of Electrical and Computer Engineering at the University of Auckland. His research interests include the design and verification of embedded systemsespecially formal verification techniques such as model checking and module checking, and their applications in embedded systems. Roop has a BE in engineering from Anna University, Madras, India; an MTech from the Indian Institute of Technology, Kharagpur, India; and a PhD in computer science from the University of New South Wales, Sydney, Australia. Direct questions or comments about this article to Ivan Radojevic, Department of Electrical and Computer Engineering, University of Auckland, 38 Princess St., Auckland, New Zealand; irad002@ec. auckland.ac.nz.
IEEE Design & Test Call for Papers

IEEE Design & Test, a bimonthly publication of the IEEE Computer Society and the IEEE Circuits and Systems Society, seeks original manuscripts for publication. D&T publishes articles on current and near-future practice in the design and test of electronic-products hardware and supportive software. Tutorials, how-to articles, and real-world case studies are also welcome. Readers include users, developers, and researchers concerned with the design and test of chips, assemblies, and integrated systems. Topics of interest include Analog and RF design, Board and system test, Circuit testing, Deep-submicron technology, Design verication and validation, Electronic design automation, Embedded systems, Fault diagnosis, Hardware-software codesign, IC design and test, Logic design and test, Microprocessor chips, Power consumption, Recongurable systems, Systems on chips (SoCs), VLSI, and Related areas.
To submit a manuscript to D&T, access Manuscript Central, http://cs-ieee.manuscriptcentral.com. Acceptable le formats include MS Word, PDF, ASCII or plain text, and PostScript. Manuscripts should not exceed 5,000 words (with each average-size gure counting as 150 words toward this limit), including references and biographies; this amounts to about 4,200 words of text and ve gures. Manuscripts must be double-spaced, on A4 or 8.5-by-11-inch pages, and type size must be at least 11 points. Please include all gures and tables, as well as a cover page with author contact information (name, postal address, phone, fax, and e-mail address) and a 150-word abstract. Submitted manuscripts must not have been previously published or currently submitted for publication elsewhere, and all manuscripts must be cleared for publication. To ensure that articles maintain technical accuracy and reect current practice, D&T places each manuscript in a peer-review process. At least three reviewers, each with expertise on the given topic, will review your manuscript. Reviewers may recommend modications or suggest additional areas for discussion. Accepted articles will be edited for structure, style, clarity, and readability. Please read our author guidelines (including important style information) at http://www.computer.org/dt/author.htm. Submit your manuscript to IEEE Design & Test today! D&T will strive to reach decisions on all manuscripts within six months of submission.
358
A Platform-Based Taxonomy for ESL Design

Douglas Densmore
University of California, Berkeley
Alberto Sangiovanni-Vincentelli
University of California, Berkeley
Roberto Passerone
University of Trento
Editors note: This article presents a taxonomy for ESL tools and methodologies that combines UC Berkeleys platform-based design terminologies with Dan Gajskis Y-chart work. This is timely and necessary because in the ESL world we seem to be building tools without first establishing an appropriate design flow or methodology, thereby creating a lot of confusion. This taxonomy can help stem the tide of confusion. Gary Smith, Gartner Dataquest
seem quite aggressive, most agree that ESLs overarching benets include

raising the abstraction level at which designers express systems, enabling new levels of design reuse, and providing for design chain integration across tool ows and abstraction levels.
THE GROWTH OF THE EDA INDUSTRY has been less than satisfactory in the past few years. For example, in 2005 growth was only 0.6%,1 and in 2006 it is predicted to be less than 3%.2 The reasons are varied and are beyond the scope of this article. However, one of the main issues is the failure of EDA to address new customers. New customers imply a revenue potential that is not consuming present business, thus allowing real industry growth. Traditionally, EDA has served the IC industry, where the demand for tools has been rampant since the early 1980s. An obvious adjacent market for EDA growth is electronic system-level (ESL) design. (See the Trends affecting the ESL design market sidebar for a brief history and explanation of how various market factors have contributed to developments in ESL design.) The 2004 International Technology Roadmap for Semiconductors (ITRS) placed ESL a level above RTL, including both hardware and software design. The ITRS dened ESL to consist of a behavioral (before HW/SW partitioning) and architectural level (after) and claimed it would increase productivity by 200,000 gates per designer-year. The ITRS states that ESL will improve productivity by 60% over an Intelligent Testbench approachthe previously proposed ESL design improvement.3 Although these claims cannot yet be veried and
The purpose of this article is to paint the ESL design landscape by providing a unified framework for placing and analyzing existing and future tools in the context of an extensible design ow. This approach should help designers use tools more efficiently, clarify their flows entry and exit points, and highlight areas in the design process that could benet from additional tools and support packages. This framework is based on platform-based design concepts.4,5 Using this framework, weve classified more than 90 different academic and industrial ESL offerings and partitioned the tool space into metaclasses that span an ideal design flow. (Although we try to cover as much of the ESL tool space as possible, we make no claim of completeness. We apologize in advance to the authors of tools we have inadvertently ignored. Also, we dont analyze the extensive literature that describes these tools; rather, we identify Web sites that contain relevant information.) We used this framework to explore three design scenarios to demonstrate how those involved in ESL design at various levels and roles can effectively select tools to accomplish their tasks more efciently than in a traditional IC design flow. The ability to study design scenarios goes beyond mere classification, because our framework exposes the relationships and constraints
0740-7475/06/$20.00 2006 IEEE
359
Trends affecting the ESL design market

The number of electronic system-level (ESL) designers is reportedly several orders of magnitude larger than the number of IC designers. However, until the late 1990s, the system-level design market had been highly fragmented. Consumers were unwilling to pay a high price for tools, so EDA companies produced relatively simple tools. For most of the products in this market, the end products complexity was not a limiting factor. In the late 1990s, the situation began to change dramatically as system complexity reached an inection point with the appearance of increasingly powerful electronic devices. Demand increased for demonstrably safe, efficient, and fault-tolerant operation of transportation systems such as automobiles and airplanes. Demand also increased for greater functionality in IT and communication devices, such as computing equipment and cell phones. During the past 10 years, several recalls (consider those from BMW and Daimler-Chrysler alone in the past two years, for example) and delays in the launch of previously announced products in the consumer electronics sectors demonstrated that new design methods, tools, and ows were sorely needed to prevent expensive xes in the eld and to bring new products to the market more quickly and reliably. This situation created the conditions for the birth of new tool companies and new offerings in established EDA companies to address the needs of a changing market. However, because the system industry landscape is very diversewith companies varying as widely as Nokia and General Motors, Boeing and Otis Elevators, and HewlettPackard and ABBa design approach that could satisfy all these diverse needs would have required a large investment, with a high risk of failure. Hence, the bulk of the ESL design effort (with a few notable exceptions) has come from academia and some small start-up companies trying to address a subset of the many problems and geared toward a limited number of potential customers. For years, Gartner Dataquest has predicted dramatic growth in ESL tool revenues, which unfortunately has failed to materialize. One of the reasons for unrealized growth is the lack of a vision in EDA of what system-level design ought to be and of how various tools fit in an overall methodology that the system industry at large could satisfactorily adopt. Consequently, there is confusion about the very definition of ESL and about what role it could play in the overall design of electronic products. Some companies have adopted ESL methodologies and tools, developed either internally or in academic circles, integrating some commercial tools as well. However, we are certainly at a relatively early stage of adoption.
among different classes to the designer, who may wish to implement a specic integration ow. (The Related work sidebar discusses other efforts to categorize ESL design approaches.)
The ESL classication framework

The design framework shown in Figure 1 is based on the platform-based design (PBD) paradigm presented by Sangiovanni-Vincentelli and Martin.5 This framework treats the design process as a sequence of steps that repeat themselves as the design moves from higher abstraction levels to implementation. The primary structure is a Y shape; thus, it is similar to the famous Y-chart introduced by Gajski. The left branch expresses the functionality (what) that the designer wishes to implement; the right branch expresses the elements the designer can use to realize this functionality (how); and the lower branch identifies the elements the designer will use to implement the functionality (the mapping).6 In this context, the right branch is the platform, and it includes
a library of elements, including IP blocks and communication structures, and composition rules that express which elements can be combined and how; and a method to assess the quantities associated with each elementfor example, power consumed or time needed to carry out a computation.
Each legal composition of elements from the platform is a platform instance. Mapping involves selecting the design components (choosing the platform instance) and assigning functionality parts to each element, thus realizing the complete functionality, possibly with overlaps. Designers optimize this process according to a set of metrics and constraints defined from the cost gures provided, or quantities mentioned. The designers then use these metrics to evaluate the designs feasibility and quality. This view of the design process is basically an abstraction of a process that designers have used implicitly for years at particular abstraction levels. For exam-
360
Related work
We are not the rst to realize the importance of categorizing ESL design approaches. Smith and Nadamuni used two axes for this purpose.1 The first axis contains three methodology components: an algorithmic methodology, a processor and memory methodology, and a control-logic methodology. Each refers to the way in which a designer thinks about the design or its components. The second axis includes the abstraction levels to express the designs: behavioral, architectural, and platform based. Smith and Nadamuni examined approximately 50 approaches in this framework. Maniwa presented a similar approach, also based on two axes, to categorize industrial tools.2 The rst axis is the design style: embedded software, SoC (hardware), behavioral, or component. The second axis is the language (for example, C, C++, or Verilog) to describe the design. Maniwa examined approximately 41 approaches. Gries also used two axes to classify ESL tools developed in academia and industry.3 The axes in this case related to abstraction levels (for example, system level and microarchitectural level) and design stages (such as application, architecture, and exploration). Gries examined approximately 19 approaches. Finally, Bailey, Martin, and Anderson provided a comprehensive set of taxonomies: a model taxonomy, a functional-verification taxonomy, a platform-based design taxonomy, and a hardware-dependent software taxonomy.4 To the best of our knowledge, their book provides the best classification of high-level design tools, and we follow its denitions when appropriate. Compared to their approach, our paradigm places tools in a more general design context and gives guidelines on how to connect the available tools, and IP blocks and their models, in a design ow.
References
1. G. Smith and D. Nadamuni, ESL Landscape 2005, Gartner Dataquest, 2005. 2. T. Maniwa, Focus Report: Electronic System-Level (ESL) Tools, Chip Design, Apr./May 2004, http://www. chipdesignmag.com/display.php?articleId=23&issueId=4. 3. M. Gries, Methods for Evaluating and Covering the Design Space during Early Design Development, Integration: The
VLSI J., vol. 38, no. 2, Dec. 2004, pp. 131-138.

4. B. Bailey, G. Martin, and T. Anderson, Taxonomies for
the Development and Verication of Digital Systems,

Springer, 2005.
ple, interpreting the logic synthesis process in this framework, we nd the following:

RTL code or Boolean functions represent the designs functionality. The platform includes a library of gates, or highercomplexity logic blocks. Mapping is the actual logic synthesis step that implements the functionality as an interconnection of gates (platform instance) optimizing a set of metrics involving area, power, and timing; the synthesis tool then exports the mapped design (gate-level netlist) to the layout phase, and the physical design tool maps this representation to a physical platform.
Functionality
Platform
Mapping
The PBD paradigm applies equally well to the application and algorithmic levels, where functionality can be a mathematical descriptionfor example, a Moving Picture Experts Group (MPEG) encoding algorithm. Also, the platform can be a set of subalgorithms for implementing each functional block of the encoding method. The result of the mapping process then goes to a lower level, where the left branch is a mapped platform instance, and the right
Figure 1. Platform-based design classication framework elements. Functionality indicates functional representations of a design completely independent of implementation architectures. Platform concerns the modules used to implement the functional descriptionfor example, processors, memories, and custom hardware. Mapping refers to instances of the design in which the functionality has been assigned to a set of correctly interconnected modules.
361
branch is a new set of elements for implementing the mapped platform instance. This process repeats until the result of the mapping process is a fully implemented solution. Thus, the design process is partitioned into levels, where each level represents a particular abstraction. The corresponding platform and mapping process optimizes specic aspects of the design. This framework prescribes a unied design methodology and hence is useful for identifying where existing tools and ows t and how to integrate them in the overall system design process.
Classifying ESL tools

We use the PBD paradigm to classify several ESL-related tools. Doing so casts present system-level design efforts in a global framework that serves as a unifying element. Of course, existing approaches may fall into more than one classication category because they cover more than one step of PBD. We could consider this a fault of the classication method, because a classication is effective only if it can cleanly partition the various objects being classied. However, partitioning the design steps rather than the tool coverage is more powerful because it identies the tools roles in the overall design paradigm. Indeed, the classication criteria can provide hints on how to connect different tools to yield an encompassing design ow. Weve developed an environment for design space exploration called Metropolis, which completely reects the design paradigm followed here. Metropolis can serve as the unifying framework for system design, where tool developers can embed tools, libraries, and approaches if the appropriate interfaces are built. The classication classes reect the Y-shaped diagram, with an additional classication criterion related to the abstraction level at which the tools work (see Figure 1): Bin F consists of functional representations of a design independent of implementation architectures and with no associated physical quantity, such as time or power. For example, a Simulink diagram expressing an algorithm for automotive engine control and a Ptolemy II description of an MPEG-decoding algorithm both belong to this bin. These diagrams could be renements of more abstract representations such as metamodels, as in Metropolis. To this bin, we assign tools that manipulate, simulate, and formally or informally analyze functional descriptions. Bin P represents the library of modules for implementing the functional description. The modules are architectural elements such as processors, memories, coprocessors, FPGAs, custom hardware blocks, and
interconnections (buses, networks, and so on). The elements also include middleware, such as operating systems for processors and arbitration protocols for buses, because these software components present the architectural services that the hardware offers to the application software. To this bin, we assign tools for connecting or manipulating the modules, as well as tools for analyzing the property of the complete or partial platform instances obtained. Bin M represents mapped instances of the design in which the designer or an automatic mapping tool has assigned functionality to a set of correctly interconnected modules. The connection between bins F, P, and M represents the mapping process. In this bin, we classify any tool that assigns architectural elements to functionality or generates the designs mapped view. For example, bin M would include a high-level synthesis tool because the designer has assigned, perhaps manually, part of the functionality to a virtual hardware component in the platform and is asking the tool to generate the lower-level view, in this case an RTL description of the design. By the same token, we can classify a code generation tool in bin M because the designer has assigned (perhaps manually) part of the functionality to a software-programmable element of the library and is asking the tool to generate the lower-level view. In this case, the view is a software programwhether assembly language, C, or a higher-level languagewhich is then compiled to move toward implementation. In this article, we consider the compilation phase and the synthesis from RTL to gates to be part of a traditional design ow and thus not part of our ESL tool classication. Some tools can handle two or even all three aspects of the PBD paradigm. To classify these tools, we introduce metaclasses (or metabins), indicated by combinations of F, P, and M. For example, in metabin FM, we assign a synthesis tool that handles functional components along with their mappings to platform components. Tools classied in metaclasses cover several parts of the PBD design ow. Designers using these tools can benefit from the design view we propose by clearly decoupling function from architecture and mapping. Doing so can enhance reusability and help the designer reach a correct implementation efciently. To make the partitioning of the tools ner, we introduced another, orthogonal criterion for classication: the abstraction level at which the tools operate. Whereas PBD doesnt limit the abstraction levels that designers use per se, most of the tools we reviewed work at three levels, listed here from highest to lowest:
362
Table 1. Tools in bin F: Industrial. (C: component level; I: implementation level; S: system level) Provider MathWorks Tools Matlab Focus High-level technical computing language and interactive environment for algorithm development, data visualization, analysis, and numeric computation. Scilab Novas Software Mentor Graphics EDAptive Computing Time Rover DBRover, TemporalRover, StateRover Maplesoft Wolfram Research Maple Mathematica EDAStar SystemVision Mixed-signal and high-level simulation Military and aerospace system-level design Temporal rules checking, pattern recognition, and knowledge reasoning Mathematical problem development and solving Graphical mathematical development and problem solving with support for Java, C, and .Net Mesquite Software Agilent Technologies National Instruments LabView Test, measurement, and control application development Agilent Ptolemy CSIM 19 Process-oriented, general-purpose simulation toolkit for C and C++ Functional verication C: Timed synchronous dataow S: LabView programming language http://www.ni.com/labview http://www.agilent.com S: C, C++ http://www.mesquite.com S: Mathematical equations S: Mathematical equations http://www.wolfram.com http://www.maplesoft.com S: VHDL-AMS, Spice, C S: Performance models C: Statecharts assertions http://www.time-rover.com http://www.mentor.com/products/ sm/systemvision http://www.edaptive.com Scicos Verdi Graphically model, compile, and simulate dynamic systems Debugging for SystemVerilog I: Discrete event http://www.novas.com S: Hybrid systems http://www.scilab.org Abstraction S: Matlab language, vector, and matrix operations Web site http://www.mathworks.com/products/ matlab
System level S corresponds to heterogeneous designs that use different models of computation (MoCs) to represent function, platforms, and mappings. Component level C involves subsystems containing homogeneous components. Implementation level I comprises the final design step, when the design team considers the job complete.
Bin F
Tools in this bin often serve to capture designs and their specications quickly without making any assumptions about the underlying implementation details (see Tables 1-3). At this level, the descriptions might include behavioral issues such as concurrency, or communication concepts such as communication protocols. Some tools handle only one MoCfor example, nitestate machines (FSMs). Others are more general, handling a set of MoCs or having no restrictions. For example, the Simulink representation language handles discrete dataflow and continuous time. Hence, it is a limited heterogeneous modeling-and-analysis tool. Ptolemy II, with its actor-oriented abstract semantics,
We now present our classification, beginning with tools that fall into individual binsthose meant to be part of a larger tool ow or that work in a very specic application domain. We then address tools that cover larger portions of the design ow space.
363
Table 2. Tools in bin F: Academic. Provider Univ. of California, Berkeley Royal Inst. of Technology, Sweden Mozart Board Mozart ForSyDe Tools Ptolemy II Focus Modeling, simulation, and design of concurrent, real-time, embedded systems System design starts with a synchronous computational model, which captures system functionality Advanced development platform for intelligent, distributed applications S: Object-oriented GUI using Oz http://www.mozart-oz.org C: Synchronous MoC http://www.imit.kth.se Abstraction S: All MoCs Web site http://ptolemy.eecs.berkeley.edu
Table 3. Tools in bin F: Languages. Provider Celoxica Univ. of California, Irvine Tools Handel-C SpecC Focus Compiling programs into hardware images of FPGAs or ASICs ANSI-C with explicit support for behavioral and structural hierarchy, concurrency, state transitions, timing, and exception handling Inria Univ. of Kansas Esterel Rosetta Synchronous-reactive programming language Compose heterogeneous specications in a single declarative semantic environment Mozart Board Oz Advanced, concurrent, networked, soft real-time, and reactive applications Various ROOM Real-time object-oriented modeling S: Object oriented NA C: Dataow synchronization http://www.mozart-oz.org C: Synchronous reactive S: All MoCs http://www-sop.inria.fr/meije/ esterel/esterel-eng.html http://www.sldl.org Abstraction C: Communicating C: C language based Web site NA http://www.ics.uci.edu/~specc
sequential processes
can handle all MoCs. Depending on the MoC supported, design entry for each tool could start at a higher or a lower abstraction level.
focus on integration and communication components. This categorys main characteristic is congurability, which ensures the applicability of a platform or components to a wide variety of applications and design styles.
Bin P
This category includes providers of platforms or platform components, as well as tools and languages that describe, manipulate, or analyze unmapped platforms (see Tables 4 and 5). Similar to tools in bin F, those in bin P can span several abstraction layers and support different kinds of architectural components. For example, Xilinx and Altera mainly concern programmable hardware devices, whereas Tensilica focuses on congurable processors. Others, such as Sonics and Beach Solutions,
Bin M
This bin contains tools dedicated to rening a functional description into a mapped platform instance, including its performance evaluation and possibly the synthesis steps required to proceed to a more detailed abstraction level (see Tables 6-8). The tools in bin M vary widely in particular design style, MoC, and supported application area. To provide the necessary quality of results, the tools are typically very specic.
364
Table 4. Tools in bin P: Industrial. Provider Prosilog Tools Nepsys Focus Standards-based IP libraries and support tools (SystemC) Abstraction C: RTL and transaction-level SystemC; VHDL for SoCs Beach Solutions Altera Xilinx Mentor Graphics Sonics Xilinx Design and Reuse Stretch Sonics Studio ISE, EDK, XtremeDSP Hosted Extranet Services Software Congurable Processor compiler ProDesign CHIPit Transaction-based verication platform C: FPGA-based rapid prototyping http://www.prodesign-usa.com Compile a subset of C into hardware for instruction extensions C: Softwarecongurable processors http://www.stretchinc.com Quartus II Platform Studio Nucleus EASI-Studio Solutions to package and deploy IP in a repeatable, reliable manner FPGAs, CPLDs, and structured ASICs IP integration framework Family of real-time operating systems and development tools On-chip interconnection infrastructure FPGAs, CPLDs, and structured ASICs IP delivery systems I: Bus-functional models I: IP blocks, C, and RTL; FPGAs S: All types of IP http://www.design-reuse.com http://www.xilinx.com I: IP blocks, C, and RTL; FPGAs C: IP blocks, FPGAs S: Software http://www.xilinx.com http://www.mentor.com/products/ embedded_software/nucleus_rtos http://www.sonicsinc.com http://www.altera.com C: Interconnection http://www.beachsolutions.com Web site http://www.prosilog.com
Table 5. Tools in bin P: Languages. Provider Spirit Consortium Tools Spirit Focus IP exchange and integration standard written in XML Abstraction S: Various IP levels Web site http://www.spiritconsortium.com
Metabin FP
This category consists of languages that can express both functionality and architecture (see Tables 9 and 10 on p. 368). Typically, they express algorithms and different styles of communication and structure for different MoCs. Assertions, or constraints, complement the platform description. In the case of Unied Modeling Language (UML), the semantics are often left unspecied.
the often superior quality of achievable implementation results.
Metabin PM
This metabin includes tools that combine architectural services and mapping (see Tables 12-14 on pp. 369370). These tools have a tight coupling between the services they provide and how functionality can map to these services. They require the use of other tools for some aspect of system design (often in the way the design functionality is specied).
Metabin FM
This metabin reects tools that provide some combination of functional description and analysis capabilities plus mapping and synthesis capabilities (see Table 11 on p. 368). In this case, the platform architecture is typically xed. This lack of exibility is offset by
Metabin FPM
Entries in this category are the frameworks that support the PBD paradigm (see Tables 15 and 16 on p. 371).
365
Table 6. Tools in bin M: Industrial, set I. Provider MathWorks dSpace ETAS Tools Real-Time Workshop TargetLink Ascet Focus Code generation and embeddedsoftware design Optimized code generation and software development Modeling, algorithm design, code generation, and software development, with emphasis on the automotive market Y Explorations eXCite Take virtually unrestricted ISO or ANSI-C with channel I/O behavior and generate Verilog or VHDL RTL output for logic synthesis AccelChip Forte Design Systems Future Design System Center Automation Catalytic ACE Associate Compiler Experts Tenison VTOC RTL to C++ or SystemC I: RTL, transactional http://www.tenison.com Co-development Suite DeltaFX, RMS CoSy Synthesis of DSP algorithms on processors or ASICs Automatic generation of compilers for DSPs I: DSP-C and embedded-C language extensions http://www.ace.nl I: Matlab algorithms http://www.catalytic-inc.com ASCI-C to RTL synthesis toolset C: C to RTL http://www.future-da.com AccelChip and AccelWare Cynthesizer Behavioral synthesis C: SystemC to RTL http://www.forteds.com DSP synthesis; Matlab to RTL C: Matlab http://www.accelchip.com S: C language input http://www.yxi.com S: Ascet models http://en.etasgroup.com/products/ ascet/index.shtml Abstraction S: Simulink-level models S: Simulink models http://www.dspace.com Web site http://www.mathworks.com
In particular, Metropolis fully embodies this paradigm, covering all bins and all abstraction layers. In this category, we include design space exploration tools and languages that can separately describe the functionality on the one hand, and the possible architectures for an implementation on the other. These tools can also map the functionality onto the platform instances to obtain metrics for the implementations performance.
Design scenarios
Here, we use the PBD framework of Figure 1 to map three design flow scenarios on the tool landscape. Figure 2 (see p. 372) shows the metabins and the hierarchical levels where activities take place.
ture and modify the initial specification quickly; the ability to express concurrency, constraints, and other behavior-specific characteristics efficiently; and the ability to capture useful abstract services for implementing high-level specications into a more detailed functional view. The flow thus starts at the higher abstraction levels in bin F of our classication. We can expand these levels into a Y diagram of the same structure as the one described in Figure 1. This structure offers

Scenario 1: New application design from specication

The requirements of this scenario include the need to start from a high-level specication; the desire to cap-
exible specication captureno ties to a particular implementation style or platform; services that help move the abstract design toward a more constrained version (for example, algorithms that can implement functionality); and independent mapping of functionality onto algorithmic structures that enable reuse of the functional specication.
366
Table 7. Tools in bin M: Industrial, set II. Provider Sequence Design Tools ESL Power Technology, Power Theater, CoolTime, CoolPower PowerEscape PowerEscape (with CoWare) Architect, PowerEscape Synergy, PowerEscape Insight CriticalBlue Cascade Design ow for application-specic hardware acceleration coprocessors for ARM processors Synfora Actis Impulse Accelerated Technologies Poseidon Design Systems SynaptiCAD Avery Design Systems Emulation and ZeBu Verication Engine Functional verication I: Hardware emulation http://www.eve-team.com SynaptiCAD line TestWizard Triton Tuner, Triton Builder Design ow for application-specic hardware acceleration coprocessors Testbench generators and simulators Verilog HDL, VHDL, and C-based testbench automation I: RTL and C http://www.avery-design.info C: RTL and SystemC http://www.syncad.com C: C and SystemC http://www.poseidon-systems.com PICO Express AccurateC CoDeveloper C to RTL, or C to System C (transaction-level models) Static code analysis for SystemC C to FPGA I: Pipeline processor arrays C: C syntax and semantic checking C: C code http://www.impulsec.com http://www.actisdesign.com http://www.synfora.com I: C code to Verilog or VHDL http://www.criticalblue.com Memory hierarchy design, code performance analysis, complete proling C: C code http://www.coware.com/products/ powerescape.php Focus Power analysis and optimization Abstraction I: SystemC level Web site http://www.sequencedesign.com
Table 8. Tools in bin M: Academic. Provider Univ. of Illinois at UrbanaChampaign Tools Impact Compiler Focus Compilation development for instruction-level parallelism Abstraction S: C code for highperformance processors Web site http://www.crhc.uiuc.edu/Impact
Lets examine an example in the multimedia domain: the implementation of a JPEG encoder on a heterogeneous multiprocessor architecture such as the Intel MXP5800. This architecture has eight image signal
processors (ISP1 to ISP8) connected with programmable quad ports (eight per processor).7 The encoder compresses raw image data and emits a compressed bitstream. The first step in the scenario is to choose a
367
Table 9. Tools in metabin FP: Industrial. Provider MathWorks Tools Simulink, State Flow Focus Modeling, algorithm design, and software development Abstraction S: Timed dataow, FSMs Web site http://www.mathworks.com
Table 10. Tools in metabin FP: Languages. Provider Open SystemC Initiative Object Management Group Accellera Unied Modeling Language SystemVerilog Hardware description and verication S: Transaction level, language extension of Verilog RTL, assertions http://www.systemverilog.org Tools SystemC Focus Provide hardware-oriented constructs within the context of C++ Specify, visualize, and document software system models S: Object-oriented, diagrams http://www.uml.org Abstraction S: Transaction level to RTL Web site http://www.systemc.org
Table 11. Tools in metabin FM: Industrial. Provider Celoxica Tools DK Design Suite, Agility Compiler, Nexus-PDK BlueSpec BlueSpec Compiler, BlueSpec Simulator I-Logix Mentor Graphics Esterel Technologies Calypto SCADE, Esterel, Studio SLEC System Code generation for safety-critical applications such as avionics and automotive Functional verication between system level and RTL C: SystemC, RTL http://www.calypto.com I: Synchronous http://www.esterel-technologies.com Rhapsody and Statemate Catapult C Real-time UML-embedded applications C++ to RTL synthesis C: Untimed C++ http://www.mentor.com S: UML based http://www.ilogix.com BlueSpec SystemVerilog rules and libraries S: SystemVerilog and http://www.bluespec.com term-rewriting synthesis Focus Algorithmic design entry, behavioral design, simulation, and synthesis Abstraction C: Handel-C based Web site http://www.celoxica.com
particular MoC to describe the designs functionality. To be more efficient in applying our proposed design paradigm, the designer should use a MoC that is also suitable for describing the architectures capabilities. Hence, the designer eases the mapping task and the analysis of the mapped designs properties. In addition, a synthesis step could execute the mapping process automatically. Because this is a data-streaming application that
maps onto a highly concurrent architecture, it is natural to use a Kahn process networks (KPN) representation. In KPN, a set of processes communicate through one-way FIFO channels. Reads from channels are blocked when no tokens are present; processes cannot query the channel status. However, this model is Turing complete, so scheduling and buffer size are undecidable. The KPN model of the JPEG encoder algorithm is completely independent of the target architecture sat-
368
Table 12. Tools in metabin PM: Industrial, set I. Provider ARM Tools RealView MaxSim Tensilica Xtensa, XPRES Focus Embedded microprocessors and development tools; system-level development tools Programmable solutions with specialized Xtensa processor description from native C and C++ code Summit System Architect, Efciently design and analyze the Visual Elite architecture and implementation of multicore SoCs and large-scale systems VaST Systems Technology Virtio Virtio Virtual Platform Cadence Incisive High-performance software model of a complete system Integrated tool platform for verication, including simulation, formal methods, and emulation Mentor SpiraTech ARC International Arithmatica CellMath Tool Suite Platform Express XML-based integration environment Cohesive ARC Protocol abstraction transformers Embedded microprocessors and development tools Proprietary improvements for implementing silicon computational units C: XML-based structure C: Transaction level, IP blocks I: ISA extensions, microarchitectural level I: Microarchitectural datapath computation elements and design http://www.arithmatica.com http://www.arc.com http://www.spiratech.com http://www.mentor.com Comet, Meteor Very high-performance processor and architecture models S: Virtual processor, bus, and peripheral devices I: Virtual platform models at SystemC level S: RTL and SystemC assertions http://www.cadence.com http://www.virtio.com http://www.vastsystems.com C: SystemC http://www.sd.com Abstraction C: C++ ARM processor development C: Custom ISA processor, C and C++ code http://www.tensilica.com Web site http://www.arm.com
isfying the requirements for this scenario. We could use Ptolemy II to capture this model and simulate the selected algorithms behavior. To allow a better analysis and to rene the model toward implementation, we can map this model into another dataow model, similar to cyclostatic dataow,8 which permits only one writer per channel but allows multiple reader processes. For all channels, each reader process can read each data token exactly once. Also, this dataow model allows limited forms of data-dependent communication. To enable the execution of multiple processes on a single processing element, this MoC supports multitasking. In particular, the system may suspend a process only between rings. Because of the limitations
just discussed, this MoC lets designers decide scheduling, buffer sizing, and mapping. It is easy to express the model in Ptolemy II and to describe it in Simulink or the Signal Processing Worksystem (SPW). This rst stepmapping a more exible model for the functionality into a more restricted one that is easier to implement and analyze is critical in any system-level design. Subsequently, the mapped specification becomes the functional representation for the diagram in Figure 1. So, the ow can continue at lower abstraction levels with tools in metabin FM for an integrated solution, or in bin F followed by M for a multitool solution. Because most of the architecture is fixed, an efficient, specialized approach is more appropriate. Figure 2a shows a
369
Table 13. Tools in metabin PM: Industrial, set II. Provider Target Compiler Technologies Arteris ChipVision Design Systems Wind River Systems CoWare Various platform solutions ConvergenSC Provide various platforms for different design segments (auto, consumer) Capture, design, and verication for SystemC S: SystemC functionality input; SystemC, HDL services Carbon Design Systems GigaScale IC InCyte Chip estimation and architecture analysis Virtutech Virtutech Simics Build, modify, and program new virtual systems National Instruments CoWare LisaTek LabView 8 FPGA Create custom I/O and control hardware for FPGAs Embedded-processor design tool suite C: LabView graphical programming C: Lisa architecture description language http://www.coware.com http://www.ni.com/fpga S: High-level chip information (gate count, I/O, IP blocks) I: C language and ISAs http://www.virtutech.com http://www.chipestimate.com VSP Presilicon validation ow C: Verilog and VHDL, bus protocols http://www.carbondesignsystems.com http://www.coware.com I: Software API http://www.windriver.com Danube, NoCexplorer Orinoco Pre-RTL power prediction for behavioral synthesis C: SystemC algorithm input http://www.chipvision.com Tools Checkers (ISS) Focus developing, programming, and verifying embedded IP cores Synthesis of NoC Abstraction I: Mapping of C code to processors written in nML C: NoC dataow http://www.arteris.net Web site http://www.retarget.com
Chess (compiler), Retargetable tool suite for
Table 14. Tools in metabin PM: Academic. Provider Carnegie Mellon Univ. Tools MESH Focus Enable heterogeneous microdesign through new simulation, modeling, and design strategies Univ. of California, Los Angeles xPilot Automatically synthesize high-level behavioral descriptions for silicon platforms Abstraction C: C input; programmable, heterogeneous multiprocessors C: C, SystemC http://cadlab.cs.ucla.edu/soc Web site http://www.ece.cmu.edu/~mesh
potential traversal of the framework. For our JPEG case, we can map the functionality onto the MXP5800 using the Metropolis environment to analyze potential problems with the architecture or to optimize the applications coding for the chosen platform instance.
Scenario 2: New integration platform development

This scenario describes the development of a new integration platform: a hardware architecture, embedded-software architecture, design methodologies
370
Table 15. Tools in metabin FPM: Industrial. Provider CoFluent Design MLDesign Technologies MLDesigner Tools CoFluent Studio Focus Design space exploration through Y-chart modeling of functional and architectural models Integrated platform for modeling and analyzing the architecture, function, and performance of high-level system designs Mirabilis Design VisualSim product family Multidomain simulation kernel and extensive modeling library S: Discrete event, dynamic dataow, and synchronous dataow S: Discrete event, synchronous dataow, continuous time, and FSM Synopsys System Studio Algorithm and architecture capture, performance evaluation S: SystemC http://www.synopsys.com http://www.mirabilisdesign.com http://www.mldesigner.com Abstraction S: Transaction-level SystemC Web site http://www.couentdesign.com
Table 16. Tools in metabin FPM: Academic. Provider Univ. of California, Berkeley Seoul National Univ. Vanderbilt Univ. GME, Great, Desert Peace Tools Metropolis Focus Operational and denotational functionality and architecture capture, mapping, renement, and verication Codesign environment for rapid development of heterogeneous digital systems Metaprogrammable tool for navigating and pruning large design spaces S: Objected-oriented C++ kernel (Ptolemy based) S: Graph transformation, UML and XML based, and external component support Delft Univ. of Technology Univ. of California, Berkeley Artemis, Compaan and Laura, Sesame, Spade Mescal Programming of application-specic programmable platforms S: Extended Ptolemy II, http://www.gigascale.org/mescal network processors Workbench enabling methods and tools to model applications and SoC-based architectures C: Kahn process networks http://ce.et.tudelft.nl/artemis http://repo.isis.vanderbilt.edu http://peace.snu.ac.kr Abstraction S: All MoCs Web site http://www.gigascale.org/metropolis
(authoring and integration), design guidelines and modeling standards, virtual-components characterization and support, and design verification (hardwaresoftware, hardware prototype), focusing on a particular target application.9 Unlike the first scenario, this one is not concerned with the design of a particular application but rather with the development of a
substrate to realize several applications. Characteristic of this scenario is the service- and mapping-centric requirements that concern tools in metabin PM for development and analysis at the desired abstraction level. The platform developer builds the substrate, or platform, and uses the tools in metabin PM. The platform user proceeds in metabin FM to map the desired
371
Functionality
Platform
Functionality
Platform
Functionality
Platform
F0
P0 Step 1. FPM-based tool
F0
P0 Step 1. PM-based tool
F0
P0
Mapping
Mapping
Mapping
M0
M0
M0 Platform P1 Step 2. FM (augmented functionality)
Option 1. P-tool at appropriate abstraction level
Functionality
Option 1a. F ( (multitools)
Functionality
Functionality
F1 F1 Option 2. FM (integrated tools) Platform P1 Mapping Option 1b. M (multitools) Mapping M1 (a) (b) M1
F1
Platform
P1 Option 2. FP with synthesis to lower-level flows
Mapping
M1 (c)
Figure 2. Metabins and hierarchical levels for three design scenarios: new application design from specication (a), new integration platform development (b), and legacy design integration (c).
functionality to the selected platform instance. Figure 2b illustrates the metabin flows that support these development requirements. Consider as a test case the development of a new electronic control unit (ECU) platform for an automotive engine controller. The application designers have already developed the application code for the platform, but a Tier 1 supplier wants to improve the cost and performance of its part of the platform to avoid losing an important original equipment manufacturer (OEM) customer. If the designers employ the paradigm described in this article, the application becomes as independent on the ECU platform as possible. Next, in collaboration with a Tier 2 supplier (a chip maker), the Tier 1 supplier determines qualitatively that a dual-core architecture would offer better performance at a lower manufacturing cost. A platform designer then uses a tool for platform development, such as LisaTek, to capture the dual-core architecture. If the dual core is based
on ARM processing elements, the designers and the Tier 1 supplier can also use ARM models and tool chains. An appropriate new real-time operating system could exploit the implementations multicore nature. At this point, the designers map the application onto one of the possible dual-core architectures, considering the number of bits supported by the CPU, the set of peripherals to integrate, and the interconnect structure. For each choice, the designers simulate the mapped design with the engine control software or a subset of it to stress the architecture. These simulations can employ the ARM tools or VaST offerings to rapidly obtain important statistics such as interconnect latency and bandwidth, overall system performance, and power consumption. At the end of this exercise, the Tier 2 supplier is fairly condent that its architecture is capable of supporting a full-edged engine control algorithm. Any other Tier 1 supplier can use this product now for its engine control offering.
372
Scenario 3: Legacy design integration

The final scenario represents a common situation for many companies wishing to integrate their existing designs into new ESL flows. In this case, its difficult to separate functionality and architecture, because in most embedded systems the documentation refers to the final implementation, not to its original specifications and the relative implementation choices. If modifying the design is necessary to implement additional features, its very difficult to determine how the new functionality will affect the existing design. This situation calls for reverse engineering to extract functionality from the final implementation. The most effective way to do this might be to start the description of the functionality from scratch, using tools in bin F. An alternative might be an effective encapsulation of the legacy part of the design so that the new part interacts cleanly with the legacy part. We could then consider existing components as architectural elements that we must describe using tools in bin P. This, in turn, is possible at different abstraction levels. Because legacy components typically support a specific application, mapping is often unnecessary, and functional or architectural cosimulation can validate a new design. Metabin FP at the system level is therefore the appropriate flow model in this case. Figure 2c illustrates this scenario.
N.R. Satish, Qi Zhu, Simone Gambini, Wei Zheng, Will Plishker, Yang Yang, and Yanmei Li. A special thanks goes to Guang Yang and Trevor Meyerowitz for their valuable feedback. This work was done under partial support from the Center for Hybrid Embedded Software Systems and the Gigascale Systems Research Center.
References
1. G. Smith et al., Report on Worldwide EDA Market
Trends, Gartner Dataquest, Dec. 2005.

2. J. Vleeschouwer and W. Ho, The State of EDA: Just Slightly up for the Year to Date Technical and Design Software, The State of the Industry, Merrill Lynch report, Dec. 2005. 3. International Technology Roadmap for Semiconductors
2004 Update: Design, 2004, http://www.itrs.net/Links/

2004Update/2004_01_Design.pdf. 4. A. Sangiovanni-Vincentelli, Dening Platform-Based Design, EE Times, Feb. 2002, http://www.eetimes.com/ news/design/showArticle.jhtml?articleID=16504380. 5. A. Sangiovanni-Vincentelli and G. Martin, PlatformBased Design and Software Design Methodology for Embedded Systems, IEEE Design & Test, vol. 18, no. 6, Nov.-Dec. 2001, pp. 23-33. 6. D.D. Gajski and R.H. Kuhn, Guest Editors Introduction: New VLSI Tools, Computer, vol. 16, no. 12, Dec. 1983, pp. 11-14.
ESL WILL EVENTUALLY BE in the limelight of the
7. A. Davare et al., JPEG Encoding on the Intel MXP5800: A Platform-Based Design Case Study, Proc.
design arena. But structural conditions in the EDA and electronics industry must change to offer a sufficiently receptive environment that will allow the birth of new companies and the evolution of present ones into this exciting area. An important technical prerequisite is industry and academia agreement on a holistic view of the design process in which to cast existing and future tools and flows. Our unified design framework can act as a unifying element in the ESL domain. However, standardization of systemlevel design will take years and require significant effort to fully materialize.
3rd Workshop Embedded Systems for Real-Time Multimedia (ESTIMedia 05), IEEE CS Press, 2005, pp. 89-94.
8. G. Bilsen et al., Cyclo-Static Dataow, IEEE Trans. Sig-
nal Processing, vol. 44, no. 2, Feb. 1996, pp. 397-408.

9. H. Chang et al., Surviving the SOC Revolution: A Guide
to Platform-Based Design, Kluwer Academic Publishers,

1999.
Acknowledgments
We thank the following for their support in reviewing this article and in helping to classify the various ESL approaches. Without them, this article would not have been possible: Abhijit Davare, Alessandro Pinto, Alvise Bonivento, Cong Liu, Gerald Wang, Haibo Zeng, Jike Chong, Kaushik Ravindran, Kelvin Lwin, Mark McKelvin,
Douglas Densmore is a PhD candidate in the Department of Electrical Engineering and Computer Sciences at the University of California, Berkeley. His research interests focus on system-level architecture modeling, with emphasis on architecture refinement techniques for system-level design. Densmore has a BS in computer engineering from the University of Michigan, Ann Arbor, and an MS in electrical engineering from the University of California, Berkeley. He is a member of the IEEE.
373
Roberto Passerone is an assistant professor in the Department of Information and Communication Technology at the University of Trento, Italy. His research interests include systemlevel design, communication design, and hybrid systems. Passerone has a Laurea degree in electrical engineering from Politecnico di Torino, Italy, and an MS and a PhD in electrical engineering and computer sciences from the University of California, Berkeley. He is a member of the IEEE. Alberto Sangiovanni-Vincentelli holds the Buttner Endowed Chair of the Electrical Engineering and Computer Sciences Department at the University of California, Berkeley. His research interests include design tools and methodologies, large-
scale systems, embedded controllers, and hybrid systems. Sangiovanni-Vincentelli has a PhD in engineering from Politecnico di Milano. He is cofounder of Cadence and Synopsys, an IEEE Fellow, a member of the General Motors Scientic and Technology Advisory Board, and a member of the National Academy of Engineering.
Direct questions or comments about this article to Douglas Densmore, Dept. of Electrical Engineering and Computer Sciences, Univ. of California, Berkeley, 545Q Cory Hall (DOP Center), Berkeley, CA 94720; densmore@eecs.berkeley.edu.
DONT RUN THE RISK.
BE SECURE.
Ensure that your networks operate safely and provide critical services even in the face of attacks. Develop lasting security solutions, with this peer-reviewed publication. Top security professionals in the field share information you can rely on: Wireless Security Securing the Enterprise Designing for Security Infrastructure Security Privacy Issues Legal Issues Cybercrime Digital Rights Management Intellectual Property Protection and Piracy The Security Profession Education Order your subscription today.
www.computer.org/security/
Submit an article to IEEE Security & Privacy. Log onto Manuscript Central at http://cs-ieee.manuscriptcentral.com/.
374
The Challenges of Synthesizing Hardware from C-Like Languages

Stephen A. Edwards
Columbia University
Editors note: This article presents one side of an ongoing debate on the appropriateness of C-like languages as hardware description languages. The article examines various features of C and their mapping to hardware, and makes a cogent argument that vanilla C is not the right language for hardware description if synthesis is the goal. Sandeep K. Shukla, Virginia Polytechnic and State University
THE MAIN REASON people have proposed C-like languages for hardware synthesis is familiarity. Proponents claim that by synthesizing hardware from C, we can effectively turn every C programmer into a hardware designer. Another common motivation is hardware-software codesign: Designers often implement todays systems as a mix of hardware and software, and its often unclear at the outset which portions can be hardware and which can be software. The claim is that using a single language for both simplies the migration task. I argue that these claims are questionable and that pure C is a poor choice for specifying hardware. On the contrary, the semantics of C and similar imperative languages are distant enough from hardware that C-like thinking might be detrimental to hardware design. Instead, successful hardware synthesis from C seems to involve languages that vaguely resemble C, mostly its syntax. Examples of these languages include Celoxicas Handel-C1 and NECs Behavior Description Language (BDL).2 You can think of executing C code on a traditional sequential processor as synthesizing hardware from C, but the techniques presented here strive for more highly customized implementations that exploit greater parallelism, hardwares main advantage. Unfortunately, the C language has no support for userspecified parallelism, and so either the synthesis tool
must find it (a difficult task) or the designer must use language extensions and insert explicit parallelism. Neither solution is satisfactory, and the latter requires that C programmers think differently to design hardware. My main point is that giving C programmers tools is not enough to turn them into reasonable hardware designers. Efcient hardware is usually very difcult to describe in an unmodied C-like language, because the language inhibits specication or automatic inference of adequate concurrency, timing, types, and communication. The most successful C-like languages, in fact, bear little semantic resemblance to C, effectively forcing users to learn a new language (but perhaps not a new syntax). As a result, techniques for synthesizing hardware from C either generate inefcient hardware or propose a language that merely adopts part of C syntax. Here, I focus only on the use of C-like languages for hardware synthesis and deliberately omit discussion of other important uses of a design language, such as validation and algorithm exploration. C-like languages are far more compelling for these tasks, and one in particular, SystemC, is now widely used, as are many ad hoc variants.
A short history of C
Dennis Ritchie developed C in the early 1970s,3 based on experience with Ken Thompsons B language, which had evolved from Martin Richards Basic Combined Programming Language (BCPL). Ritchie described all three as close to the machine in the sense that their abstractions are similar to data types and operations supplied by conventional processors. A core principle of BCPL is its memory model: an
0740-7475/06/$20.00 2006 IEEE
375
Performance or bust
Throughout this article, I assume that optimizing performancefor example, speed under area and power constraintsis the main goal of hardware synthesis (beyond, of course, functional correctness). This assumption implicitly shapes all my criticisms of using C for hardware synthesis and should denitely be considered carefully. On the one hand, performance optimization has obvious economic advantages: An efficient circuit solves problems faster, is cheaper to manufacture, requires less power, and so forth. Historically, this has been the key focus of logic synthesis, high-level synthesis, and other automated techniques for generating circuits. On the other hand, optimization can have disadvantages such as design time and nonrecurring engineering costs. The distinction between full-custom ICs and ASICs illustrates this. A company like Intel, for example, is willing to invest an enormous number of hours in designing and hand-optimizing its next microprocessors layout because of the volume and margins the company commands. A company like Cisco, however, might implement its latest high-end router on an FPGA because it doesnt make economic sense to design a completely new chip. Both approaches are reasonable. A key question, then, is: What class of problems does hardware synthesis from C really target? This article assumes an audience of traditional hardware designers who want to design hardware more quickly, but other articles target designers who would otherwise implement their designs in software but need faster results. The soundness of my conclusions may well depend on which side of this fence youre on.
undifferentiated array of words. BCPL represents integers, pointers, and characters all in a single word; the language is effectively typeless. This made perfect sense on the word-addressed machines BCPL was targeting, but it wasnt acceptable for the byte-addressed PDP-11 on which C was rst developed. Ritchie modied BCPLs word array model to add the familiar character, integer, and oating-point types now supported by virtually every general-purpose processor. Ritchie considered Cs treatment of arrays to be characteristic of the language. Unlike other languages that have explicit array types, arrays in C are almost a side effect of its pointer semantics. Although this model leads to simple, efcient implementations, Ritchie observed that the prevalence of pointers in C means that compilers must use careful dataow techniques to avoid aliasing problems while applying optimizations.
Ritchie listed a number of infelicities in the language caused by historical accident. For example, the use of break to separate cases in switch statements arose because Ritchie copied an early version of BCPL; later versions used endcase. The precedence of bitwise-AND is lower than the equality operator because the logicalAND operator was added later. Many aspects of C are greatly simplified from their BCPL counterparts because of limited memory on the PDP-11 (24 Kbytes, of which 12 Kbytes were devoted to the nascent Unix kernel). For example, BCPL allowed the embedding of arbitrary control ow statements within expressions. This facility doesnt exist in C, because limited memory demanded a one-pass compiler. Thus, C has at least four dening characteristics: a set of types that correspond to what the processor directly manipulates, pointers instead of a rst-class array type, several language constructs that are historical accidents, and many others that are due to memory restrictions. These characteristics are well-suited to systems software programming, Cs original application. C compilers have always produced efcient code because the C semantics closely match the instruction set of most general-purpose processors. This also makes it easy to understand the compilation process. Programmers routinely use this knowledge to restructure source code for efficiency. Moreover, Cs type system, while generally very helpful, is easily subverted when needed for lowlevel access to hardware. These characteristics are troublesome for synthesizing hardware from C. Variable-width integers are natural in hardware, yet C supports only four sizes, all larger than a byte. Cs memory model is a large, undifferentiated array of bytes, yet hardware is most effective with many small, varied memories. Finally, modern compilers can assume that available memory is easily 10,000 times larger than that available to Ritchie.
C-like hardware synthesis languages

Table 1 lists some of the C-like hardware languages proposed since the late 1980s (see also De Micheli4). One of the earliest was Cones, from Stroud et al.5 From a strict subset of C, it synthesized single functions into combinational blocks. Figure 1 shows such a function. Cones could handle conditionals; loops, which it unrolled; and arrays treated as bit vectors. Ku and De Micheli developed HardwareC6 for input to their Olympus synthesis system.7 It is a behavioral hardware language with a C-like syntax and has extensive support for hardware-like structure and hierarchy.
376
Table 1. C-like languages for hardware synthesis. Language Cones HardwareC SystemC Ocapi C2Verilog BDL Handel-C SpecC Bach C CASH Catapult C Comment Early, combinational only Behavioral synthesis centered Verilog in C++ Algorithmic structural descriptions Comprehensive Many extensions and restrictions (NEC) C with CSP (Celoxica) Resolutely renement based Untimed semantics (Sharp) Synthesizes asynchronous circuits ANSI C++ subset (Mentor Graphics)
Transmogrier C Limited scope
INPUTS: IN[5]; OUTPUT: OUT[3]; rd53() { int count, i; count = 0; for (i=0 ; i<5 ; i++) if (IN[i] == 1) count = count + 1; for (i=0 ; i<3 ; i++) { OUT[i] = count & 0x01; count = count >> 1; } } Figure 1. A function that returns a count of the number of 1s in a ve-bit vector in Cones. The function is translated into a combinational circuit.
#dene SIZE 8 process gcd (xi, yi, rst, ou) in port xi[SIZE], yi[SIZE]; in port rst; out port ou[SIZE]; { boolean x[SIZE], y[SIZE]; write ou = 0; if ( rst ) < x = read(xi); y = read(yi); > if ((x != 0) & (y != 0)) repeat { while (x >= y) x = x y; < x = y; /* swap x and y */ y = x; > } until (y == 0); else x = 0; write ou = x; } Figure 2. Greatest common divisor algorithm in HardwareC. Statements within a < > block run in parallel; statements within a { } block execute in parallel when data dependencies allow.
#pragma intbits 8 seven_seg(x) #pragma intbits 4 int x; { #pragma intbits 8 int result; x = x & 0xf; result = 0; if (x == 0x0) result = 0xfc; if (x == 0x1) result = 0x60; if (x == 0x2) result = 0xda; if (x == 0x3) result = 0xf2; if (x == 0x4) result = 0x66; if (x == 0x5) result = 0xb6; if (x == 0x6) result = 0xbe; if (x == 0x7) result = 0xe0; if (x == 0x8) result = 0xfe; if (x == 0x9) result = 0xf6; return(~result); } twodigit(y) int y; { int tens; int leftdigit, rightdigit; outputport(leftdigit, 37, 44, 40, 29, 35, 36, 38, 39); outputport(rightdigit, 41, 51, 50, 45, 46, 47, 48, 49); tens = 0; while (y >= 10) { tens++; y = 10; } leftdigit = seven_seg(tens); rightdigit = seven_seg(y); }
Figure 2 shows the greatest common divisor (GCD) algorithm in HardwareC. Galloways Transmogrier C is a fairly small C subset that supports integer arithmetic, conditionals, and loops.8 Unlike Cones, it generates sequential designs by inferring a state at function calls and at the beginning of while loops. Figure 3 shows a decoder in Transmogrier C.
Figure 3. Two-digit decimal-to-seven-segment decoder in Transmogrier C. Output-port declarations assign pin numbers.
377
#include systemc.h #include <stdio.h> struct decoder : sc_module { sc_in<sc_uint<4> > number; sc_out<sc_bv<7> > segments; void compute() { static sc_bv<7> codes[10] = { 0x7e, 0x30, 0x6d, 0x79, 0x33, 0x5b, 0x5f, 0x70, 0x7f, 0x7b }; if (number.read() < 10) segments = codes[number.read()]; } SC_CTOR(decoder) { SC_METHOD(compute); sensitive << number; } }; struct counter : sc_module { sc_out<sc_uint<4> > tens; sc_out<sc_uint<4> > ones; sc_in_clk clk; void tick() { int one = 0, ten = 0; for (;;) { if (++one == 10) { one = 0; if (++ten == 10) ten = 0; } ones = one; tens = ten; wait(); } } SC_CTOR(counter) { SC_CTHREAD(tick, clk.pos()); } };
Figure 4. A two-digit, decimal-to-seven-segment decoder in SystemC. The decoder produces combinational logic; the counter produces sequential logic.
S0 !eof/sfg 3 S1 /sfg 1
fsm f; initials0; state s1; s0 << always << sfg1 << s1;
s1 << cnd(eof) << sfg2 << s1; eof/sfg 2 s1 << !cnd(eof)<< sfg3 << s0;
Figure 5. FSM described in Ocapi. This is a declarative style executed to build data structures for synthesis rather than compiled in the traditional sense.
SystemC is a C++ dialect that supports hardware and system modeling.9 Its popularity stems mainly from its simulation facilities (it provides concurrency with lightweight threads), but a subset of the language can be synthesized. SystemC uses the C++ class mechanism to model hierarchical structure and describes hardware through combinational and sequential processes, much as Verilog and VHDL do. Cynlib, from Forte Design Systems, is similar. Figure 4 shows a decoder in SystemC. The Ocapi system from IMEC (the Interuniversity Microelectronics Center in Belgium) is also C++ based but takes a different approach.10 Instead of being parsed, analyzed, and synthesized, the C++ program is run to generate in-memory data structures that represent the hardware systems structure. Supplied classes provide mechanisms for specifying data paths, finitestate machines (FSMs), and similar constructs. These data structures are then translated into languages such as Verilog and passed to conventional synthesis tools. Figure 5 shows an FSM in Ocapi. The C2Verilog compiler developed at CompiLogic (later called C Level Design and, since November 2001, part of Synopsys) is one of the few compilers that can claim broad support of ANSI C. It can translate pointers, recursion, dynamic memory allocation, and other thorny C constructs. Panchul, Soderman, and Coleman hold a broad patent covering C-to-Verilog-like translation, which describes their compiler in detail.11 NECs Cyber system accepts BDL.2 Like HardwareC, Cyber is targeted at behavioral synthesis. BDL has been in industrial use for many years and deviates greatly from ANSI C by including processes with I/O ports, hardware-specific types and operations, explicit clock cycles, and many synthesis-related pragmas. Celoxicas Handel-C is a C variant that extends the language with constructs for parallel statements and Occam-like rendezvous communication.1 Handel-Cs timing model is uniquely simple: Each assignment statement takes one cycle. Figure 6 shows a four-place buffer in Handel-C. Gajski et al.s SpecC language12 is a superset of ANSI C, augmented with many system- and hardware-modeling constructs, including constructs for FSMs, concurrency, pipelining, and structure. The latest language reference manual lists 33 new keywords.13 SpecC imposes a renement methodology. Thus, the entire language is not directly synthesizable, but a series of manual and automated rewrites can rene a SpecC description into one that can be synthesized. Figure 7 shows a state machine described in a synthesizable RTL dialect of SpecC.
378
const dw = 8; void main(chan (in) c4 : dw, chan (out) c0 : dw) { int d0, d1, d2, d3; chan c1, c2, c3; void void void void e0() e1() e2() e3() { { { { while while while while (1) (1) (1) (1) { { { { c1 c2 c3 c4 ? ? ? ? d0; d1; d2; d3; c0 c1 c2 c3 ! ! ! ! d0; d1; d2; d3; } } } } } } } }
par { e0(); e1(); e2(); e3(); } }
behavior even( in event clk, in unsigned bit[1] rst, in bit[31:0] Inport, out bit[31:0] Outport, in bit[1] Start, out bit[1] Done, out bit[31:0] idata, in bit[31:0] iocount, out bit[1] istart, in bit[1] idone, in bit[1] ack_istart, out bit[1] ack_idone) { void main(void) { bit[31:0] ocount; bit[31:0] mask; enum state { S0, S1, S2, S3 } state; state = S0; while (1) { wait(clk); if (rst == 1b) state = S0; switch (state) { case S0: Done = 0b; istart = 0b; ack_idone = 0b; if (Start == 1b) state = S1; else state = S0; break; case S1: mask = 0x0001; idata = Inport; istart = 1b; if (ack_istart == 1b) state = S2; else state = S1; break; case S2: istart = 0b; ocount = iocount; if (idone == 1b) state = S3; else state = S2; break; case S3: Outport = ocount & mask; ack_idone = 1b; Done = 1b; if (idone == 0) state = S0; else state = S3; break; } } } };
Figure 6. Four-place buffer in Handel-C. The ? and ! operators are CSP-inspired receive and transmit operators.
Like Handel-C, Sharps Bach C is an ANSI C variant with explicit concurrency and rendezvous communication.14 However, Bach C only imposes sequencing rather than assigning a particular number of cycles to each operation. Also, although it supports arrays, Bach C does not support pointers. Budiu and Goldsteins CASH compiler is unique among the C synthesizers because it generates asynchronous hardware.15 It accepts ANSI C, identifies instruction-level parallelism (ILP), and generates an asynchronous dataow circuit. Mentor Graphics recent (2004) Catapult C performs behavioral synthesis from an ANSI C++ subset. Because it is a commercial product, details of its features and limitations are not publicly available. However, it appears to be a strict subset of ANSI C++ (that is, with few, if any, language extensions).
Concurrency
The biggest difference between hardware and software is its execution model. Software follows a sequential, memory-based execution model derived from Turing machines, whereas hardware is fundamentally concurrent. Thus, sequential algorithms that are efficient in software are rarely the best choice in hardware. This has serious implications for software programmers designing hardwaretheir familiar toolkit of algorithms is suddenly far less useful. Why is so little software developed for parallel hardware? The plummeting cost of parallel hardware would make such software appear attractive, yet concurrent programming has had limited success compared with its sequential counterpart. One funda-
Figure 7. State machine in a synthesizable RTL dialect of SpecC. The wait(clk) statement denotes a clock cycle boundary.
mental reason is that humans have difficulty conceiving of parallel algorithms, and thus many more sequential algorithms exist. Another problem is disagreement about the preferred parallel-programming
379
model (for example, shared memory versus message passing), as demonstrated by the panoply of parallelprogramming languages, none of which has emerged as a clear winner. Rather than exposing concurrency to the programmer and encouraging the use of parallel algorithms, the more successful approach has been to automatically expose parallelism in sequential code. Because C does not naturally support user-specied concurrency, such a technique is virtually mandatory for synthesizing efcient hardware from plain C. Unfortunately, these techniques are limited.
Finding parallelism in sequential code

There are three main approaches to exposing parallelism in sequential code, distinguished by their granularity. Instruction-level parallelism (ILP) dispatches groups of nearby instructions simultaneously. Although this has become the preferred approach in the computer architecture community, programmers recognize that there are fundamental limits to the amount of ILP that can be exposed in typical programs.16 Adding hardware to approach these limits, usually through speculation, results in diminishing returns. The second approach, pipelining, requires less hardware than ILP but can be less effective. A pipeline dispatches instructions in sequence but overlaps themthe second instruction starts before the rst completes. Like ILP, interinstruction dependencies and control-ow transfers tend to limit the maximum amount of achievable parallelism. Pipelines work well for regular loops, such as those in scientific or signal-processing applications, but are less effective in general. The third approach, process-level parallelism, dispatches multiple threads of control simultaneously. This approach can be more effective than ner-grained parallelism, depending on the algorithm, but process-level parallelism is difcult to identify automatically. Hall et al. attempt to invoke multiple iterations of outer loops simultaneously,17 but unless the code is written to avoid dependencies, this technique might not be effective. Exposing process-level parallelism is thus usually the programmers responsibility. Such parallelism is usually controlled through the operating system (for example, Posix threads) or the language itself (for example, Java).
Approaches to concurrency
The C-to-hardware compilers considered here take either of two approaches to concurrency. The first approach adds parallel constructs to the language,
thereby forcing the programmer to expose most of the concurrency. SystemC, BDL, and Ocapi all provide process-level parallel constructs. HardwareC, HandelC, SpecC, and Bach C additionally provide statementlevel parallel constructs. SystemCs parallelism resembles that of standard hardware description languages (HDLs) such as Verilog, in which a system is a collection of clock-edge-triggered processes. HardwareC, Handel-C, SpecC, and Bach Cs approaches are more like software, providing constructs that dispatch collections of instructions in parallel. The other approach lets the compiler identify parallelism. Although the languages that provide parallel constructs also identify some parallelism, Cones, Transmogrier C, C2Verilog, Catapult C, and CASH rely on the compiler to expose all possible parallelism. The Cones compiler takes the most extreme approach, attening an entire C function with loops and conditionals into a single two-level combinational function evaluated in parallel. The CASH compiler takes an approach closer to compilers for VLIW processors, carefully examining interinstruction dependencies and scheduling instructions to maximize parallelism. None of these compilers attempts to identify process-level parallelism. Both approaches have drawbacks. The latter approach places the burden on the compiler and therefore limits the parallelism achievable with normal, sequential algorithms. Although carefully selecting easily parallelized algorithms could mitigate this problem, such thinking is foreign to most software programmers and may be more difcult than thinking in an explicitly concurrent language. The former approach, by adding parallel constructs to C, introduces a fundamental and far-reaching change to the language, again demanding substantially different thinking by the programmer. Even for a programmer experienced in concurrent programming with, say, Posix threads, the parallel constructs in hardware-like languages differ greatly from the thread-and-sharedmemory concurrency model typical of software. A good hardware specification language must be able to express parallel algorithms, because they are the most efficient for hardware. Its inherent sequentiality and often undisciplined use of pointers make C a poor choice for this purpose. Which concurrency model the next hardware design language should employ remains an open question, but the usual software modelasynchronously running threads communicating through shared memoryis clearly not the one.
380
Timing
The C language is mute on the subject of time. It guarantees causality among most sequences of statements but says nothing about the amount of time it takes to execute each sequence. This exibility simplies life for compilers and programmers alike but makes it difficult to achieve specific timing constraints. Cs compilation technique is transparent enough to make gross performance improvements easy to understand and achieve, and differences in efciency of sequential algorithms is a well-studied problem. Nevertheless, wringing another 5% speedup from an arbitrary piece of code can be difcult. Achieving a performance target is fundamental to hardware design. Miss a timing constraint by a few percentage points and the circuit will fail to operate or the product will fail to sell. Achieving a performance target under power and cost constraints is usually the only reason to implement a particular function in hardware rather than using an off-the-shelf processor. Thus, an adequate hardware specication technique needs mechanisms for specifying and achieving timing constraints. This disparity leads to yet another fundamental question in using C-like languages for hardware design: where to put the clock cycles. Figure 8 shows a program fragment that is interpreted in at least three different ways by different compilers. Most of the compilers described here generate synchronous logic in which the clock cycle boundaries have been dened. There are only two exceptions: Cones and CASH. Cones only generates combinational logic; CASH generates self-timed logic. Compilers use various techniques for inserting clock cycle boundaries, which range from fully explicit to fully implicit. Ocapis clocks are the most explicit. The designer specifies explicit state machines, and each state gets a cycle. At some point in the SpecC refinement ow, the state machines are also explicit, although clock boundaries might not be explicit earlier in the ow. The clocks in the Cones system are also explicit, but in an odd waybecause Cones generates only combinational logic, clocks are implicit at function boundaries. SystemCs clock boundaries are also explicit; as in Cones, the clock boundaries of combinational processes are at the edges, and in sequential processes, explicit wait statements delay a prescribed number of cycles. BDL takes a similar approach. HardwareC lets the user specify clock constraints, an approach common in high-level synthesis tools. For example, the user can require that three particular statements should execute in two cycles. This presents a
for (i = 0 ; i < 8 ; i++) { a[i] = c[i]; b[i] = d[i] || f[i]; }
Figure 8. It is not clear how many cycles it should take to execute this (contrived) loop written in C. Cones does it in one (it is combinational), Transmogrier-C chooses eight (one per iteration), and Handel-C chooses 25 (one per assignment). Others, such as HardwareC, allow the user to specify the number.
greater challenge to the compiler and is sometimes more subtle for the designer, but it allows exibility that can lead to a better design. Bach C takes a similar approach. Like HardwareC, the C2Verilog compiler also inserts cycles using fairly complex rules and provides mechanisms for imposing timing constraints. Unlike HardwareC, however, these constraints are outside the language. Transmogrifier C and Handel-C use fixed implicit rules for inserting clocks. Handel-Cs are the simplest: Each assignment and delay statement takes one cycle; everything else executes in the same clock cycle. Transmogrier Cs rules are nearly as simple: Each loop iteration and function call takes a cycle. Unfortunately, such simple rules can make it difcult to achieve a particular timing constraint. To speed up a Handel-C specification, assignment statements might require fusing, and Transmogrier C might require loops to be manually unrolled. The ability to specify or constrain detailed timing in hardware is another fundamental requirement. Whereas slow software is an annoyance, slow hardware is a disaster. When something happens in hardware is usually as important as what happens. This is another big philosophical difference between software and hardware, and again hardware requires different skills. A good hardware specication language needs the ability to specify detailed timing, both explicitly and through constraints, but should not demand the programmer to provide too many details. The best-effort model of software is inadequate by itself.
Types
Data types are another central difference between hardware and software languages. The most fundamental type in hardware is a single bit traveling through a memoryless wire. By contrast, each base type in C and
381
C++ is one or more bytes stored in memory. Although Cs base types can be implemented in hardware, C has almost no support for types smaller than a byte. (The one exception is that the number of bits for each eld in a struct can be specied explicitly. Oddly, none of these languages even mimics this syntax.) As a result, straight C code can easily be interpreted as bloated hardware. Compilers take three approaches to introducing hardware types to C programs. The first, and perhaps the purest, neither modies nor augments Cs types but allows the compiler or designer to adjust the width of the integer types outside the language. For example, the C2Verilog compiler provides a GUI that lets the user set the width of each variable in the program. In Transmogrier C, the user can set each integers width through a preprocessor pragma. The second approach is to add hardware types to the C language. HardwareC, for instance, adds a Boolean vector type. Handel-C, Bach C, and BDL add integers with an explicit width. SpecC adds all these types and many others that cannot be synthesized, such as pure events and simulated time. The third approach, used by C++-based languages, is to provide hardware-like types through C++s type system. C++ supports a one-bit Boolean type by default, and its class mechanism makes it possible to add more types, such as arbitrary-width integers, to the language. The SystemC libraries include variable-width integers and an extensive collection of types for xed-point fractional numbers. Ocapi, because it is an algorithmic mechanism for generating structure, also effectively takes this approach, letting the user explicitly request wires, buses, and so on. Catapult C presumably has a similar library of hardware-like types. Each approach, however, is a fairly radical departure from Cs call-it-an-integer-and-forget-about-it approach. Even the languages that support only C types compel a user to provide each integers actual size. Worrying about the width of each variable in a program is not something a typical C programmer does. Compared with timing and concurrency, however, adding appropriate hardware types is a fairly easy problem to solve when adapting C to hardware. C++s type system is exible enough to accommodate hardware types, and minor extensions to C sufce. A larger question, which none of the languages adequately addresses, is how to apply higher-level types such as classes and interfaces to hardware description. SystemC has some facilities for inheritance, but the inheritance mechanism is simply the one used for software; it is not clear that this
mechanism is convenient for adding to or modifying the behavior of existing hardware. Incidentally, SystemC has supported more high-level modeling constructs such as templates and more elaborate communication protocols since version 2.0, but they are not typically synthesizable. A good HDL needs a rich type system that allows precise definition of hardware types, but it should also assist in ensuring program correctness. C++s type system is denitely an improvement over Cs in this regard.
Communication
C-like languages are built on the very flexible RAM communication model. They implicitly treat all memory locations as equally costly to access, but this is not true in modern memory hierarchies. At any point, it can take hundreds or even thousands of times longer to access certain locations. Designers can often predict the behavior of these memories, specifically caches, and use them more efciently. But doing so is very difcult, and C-like languages provide scant support for it. Long, nondeterministic communication delays are anathema in hardware. Timing predictability is mandatory, so large, uniform-looking memory spaces are rarely the primary communication mechanism. Instead, hardware designers use various mechanisms, ranging from simple wires to complex protocols, depending on the systems needs. An important characteristic of this approach is the need to understand a systems communication channels and patterns before it is running because communication channels must be hardwired.
The problem with pointers

Communication patterns in software are often difcult to determine a priori because of the frequent use of pointers. These are memory addresses computed at runtime, and as such are often data dependent and cannot be known completely before a system is running. Implementing such behavior in hardware mandates, at least, small memory regions. Aliasing, when a single value can be accessed through multiple sources, is an even more serious problem. Without a good understanding of when a variable can be aliased, a hardware compiler must place that variable into a large, central memory, which is necessarily slower than a small memory local to the computational units that read and feed it. One of Cs strengths is its flexible memory model, which allows complicated pointer arithmetic and essentially uncontrolled memory access. Although very useful for system programs such as operating systems, these
382
abilities make analyzing an arbitrary C programs communication patterns especially difcult. The problem is so great, in fact, that software compilers often have an easier time analyzing a Fortran program than an equivalent C program. Any technique that implements a C-like program in hardware must analyze the program to understand all possible communication pathways; resort to large, slow memories; or do some combination of the two. Smria, Sato, and De Micheli applied pointer analysis algorithms from the software compiler literature to estimate the communication patterns of C programs for hardware synthesis.18 Although this is an impressive body of work, it illustrates the difculty of the problem. Pointer analysis identies the data to which each pointer can refer, allowing memory to be divided. Solving the pointer analysis problem precisely is undecidable, so researchers use approximations. These are necessarily conservative and hence might miss opportunities to split memory regions, leading to higher-cost implementations. Finally, pointer analysis is a costly algorithm with many variants.
Communication costs
Softwares event-oriented communication style is another key difference from hardware. Every bit of data communicated among parts of a software program has a cost (that is, a read or write operation to registers or memory), and thus communication must be explicitly requested in software. Communicating the first bit is very costly in hardware because it requires the addition of a wire, but after that, communication is actually more costly to disable than to continue. This difference leads to a different set of concerns. Good hardware communication design tries to minimize the number of pathways among parts of the design, whereas good software design minimizes the number of transactions. For example, good software design avoids forwarding through copying, preferring instead to pass a reference to the data being forwarded. This is a good strategy for hardware that stores large blocks of data in memory, but is rarely appropriate in other cases. Instead, good hardware design considers alternate data encodings, such as serialization.
Languages that ignore Cs memory model dont support arrays or pointers. Instead they look only at how local variables communicate between statements. Cones is the simplest; all variables, arrays included, are interpreted as wires. HardwareC and Transmogrier C dont support arrays or memories. Ocapi also falls into this class, although arrays and pointers can assist during system construction. BDL is perhaps the richest of this group, supporting multidimensional arrays, but it doesnt support pointers or dynamic memory allocation. Languages in the second group go to great lengths to preserve Cs memory model. The CASH compiler takes the most brute-force approach. It synthesizes one large memory and puts all variables and arrays into it. The Handel-C and C2Verilog compilers can split memory into multiple regions and assign each to a separate memory element. Handel-C adds explicit constructs to the language for specifying these elements. SystemC also supports explicit declaration of separate memory regions. Other languages provide communication primitives whose semantics differ greatly from Cs memory style of communication. HardwareC, Handel-C, and Bach C provide blocking, rendezvous-style (unbuffered) communication primitives for communicating between concurrently running processes. SpecC and later versions of SystemC provide a large library of communication primitives. Again, the difference between appropriate software and hardware design is substantial. Software designers usually ignore memory access patterns. Although this can slow overall memory access speed, it is usually acceptable. Good hardware design, in contrast, usually starts with a block diagram detailing every communication channel and attempts to minimize communication pathways. So, software designers usually ignore the fundamental communication cost issues common in hardware. Furthermore, automatically extracting efficient communication structures from software is challenging because of the pointer problem in C-like languages. Although pointer analysis can help mitigate the problem, it is imprecise and cannot improve an algorithm with poor communication patterns. A good hardware specication language should make it easy to specify efcient communication patterns.
Communication approaches
The languages considered here fall broadly into two groups: those that effectively ignore Cs memory model and look only at communication through variables, and those that adopt the full C memory model.
Metadata
A high-level construct can be implemented in many different ways. However, because hardware is at a far lower level than software, there are many more ways to
383
Table 2. The big challenges for hardware languages. Challenge Concurrency model Specifying timing Types Communication patterns Hints and constraints Comment Specifying parallel algorithms How many clock cycles? Need bits and bit-precise vectors Need isolated memories How to implement something
implement a particular C construct in hardware. For example, consider an addition operation. A processor probably has only one useful addition instruction, whereas in hardware there are a dizzying number of different adder architecturesfor example, ripple carry, carry look-ahead, and carry save. The translation process for hardware therefore has more decisions to make than translation for software. Making many decisions correctly is difcult and computationally expensive. Furthermore, the right set of decisions varies with design constraints. For example, a designer might prefer a ripple-carry adder if area and power are at a premium and speed is a minor concern, but a carry-look-ahead adder if speed is a greater concern. Much effort has gone into improving optimization algorithms, but it remains unrealistic to expect all these decisions to be automated. Instead, designers need mechanisms that let them ask for exactly what they want. Such designer guidance takes two forms: manual rewriting of high-level constructs into the desired lower-level ones (for example, replacing a + operator with a collection of gates that implement a carry-lookahead adder) or annotations such as constraints or hints about how to implement a particular construct. Both are common RTL design approaches. Designers routinely specify complex data paths at the gate level instead of using higher-level constructs. Constraint information, often supplied in an auxiliary le, usually drives logic optimization algorithms. Although it might seem possible to use C++s operator-overloading mechanism to specify, for example, when a carry-look-ahead adder should implement an addition, using this mechanism is probably very difficult. C++s overloading mechanism uses argument types to resolve ambiguities, which is natural when you want to treat different data types differently. But the choice of algorithm in hardware is usually driven by resource constraints (such as area or delay) rather than data representation (although, of course, data representation does matter). Concurrency is the fundamental problem.
In software, there is little reason to have multiple implementations of the same algorithm, but it happens all the time in hardware. Not surprisingly, C++ doesnt support this sort of thing. The languages considered here take two approaches to specifying such metadata. One group places it within the program itself, hiding it in comments, pragmas, or added constructs. The other group places it outside the program, either in a text file or in a database populated by the user through a GUI. C has a standard way of supplying extra information to the compiler: the #pragma directive. By denition, a compiler ignores such lines unless it understands them. Transmogrifier C uses the directive to specify integer width, and Bach C uses it to specify timing and mapping constraints. HardwareC provides three language-level constructs: timing constraints, resource constraints, and arbitrary string-based attributes, whose semantics are much like a C #pragma. BDL has similar constructs. SpecC takes the other approach; many tools for synthesizing and rening SpecC require the user to specify, using a GUI, how to interpret various constructs. Constructs such as addition, which are low level in software, are effectively high level in hardware. Thus, there must be a mechanism for conveying designer intent to any hardware synthesis procedure, regardless of the source language. A good hardware specication language needs a way of guiding the synthesis procedure to select among different implementations, trading off between, say, power and speed.
WHY BOTHER generating hardware from C? It is clearly not necessary, because there are many excellent processors and software compilers, which are certainly the cheapest and easiest way to run a C program. So why consider using hardware? Efciency is the logical answer. Although general-purpose processors get the job done, well-designed customized hardware can always do it faster, using fewer transistors and less energy. Thus, the utility of any hardware synthesis procedure depends on how well it produces efcient hardware specialized for an application. Table 2 summarizes the key challenges of a successful hardware specication language. Concurrency is fundamental for efcient hardware, but C-like languages impose sequential semantics and require the use of sequential algorithms. Automatically exposing concurrency in sequential programs is limited in effectiveness, so a successful language requires explicit concurrency, something missing from most
384
C-like languages. Adding such a construct is easy, but teaching software programmers to use concurrent algorithms is difcult. Careful timing design is also required for efficient hardware, but C-like languages provide essentially no control over timing, so the language needs added timing control. The problem amounts to where to put the clock cycles, and the languages offer a variety of solutions, both implicit and explicit. The bigger problem, though, is changing programmer habits to consider such timing details. Using software-like types is also a problem in hardware, which wants to manipulate individual bits for efciency. The problem is easier to solve for C-like languages. Some languages add the ability to specify the number of bits used for each integer, for example, and C++s exible type system allows hardware types to be dened. The type problem is the easiest to address. Communication also presents a challenge. Cs exible global-memory communication model is not efficient for hardware. Instead, memory should be broken into smaller regions, often as small as a single variable. Compilers can do so to a limited degree, but efciency often demands explicit control over this. A fundamental problem, again, is that C programmers generally dont worry about memory, and C programs are rarely written with memory behavior in mind. A high-level HDL must let the designer provide constraints or hints to the synthesis system because of the wide semantic gap between a C program and efcient hardware. There are many ways to implement a construct such as addition in hardware, so the synthesis system needs a way to select an implementation. Constraints and hints are the two main ways to control the algorithm, but standard C has no such facility. Although presenting designers with a higher level of abstraction is obviously desirable, presenting them with an inappropriate level of abstractionone in which they cannot effectively ask for what they wantis not much help. Unfortunately, C-like languages, because they provide abstractions geared toward the generation of efcient software, do not naturally lend themselves to the synthesis of efcient hardware. The next great hardware specification language wont closely resemble C or any other familiar software language. Software languages work well only for software, and a hardware language that does not produce efficient hardware is of little use. Another important issue will be the languages ability to build systems from existing pieces (known as IP-based design), which none
of these languages addresses. This ability appears necessary to raise designer productivity to the level needed for the next generation of chips. Looming over all these issues, however, is verication. What we really need are languages that let us create correct systems faster by making it easier to check for, identify, and correct mistakes. Raising the abstraction level and facilitating efcient simulation are two well-known ways to achieve this, but are there others?
Acknowledgments
Edwards is supported by the National Science Foundation, Intel, Altera, the SRC, and New York States NYSTAR program.
References
1. Handel-C Language Reference Manual, RM-1003-4.0, Celoxica, 2003. 2. K. Wakabayashi and T. Okamoto, C-Based SoC Design Flow and EDA Tools: An ASIC and System Vendor Perspective, IEEE Trans. Computer-Aided Design of Inte-
grated Circuits and Systems, vol. 19, no. 12, Dec. 2000,
pp. 1507-1522. 3. D.M. Ritchie, The Development of the C Language,
History of Programming Languages-II, T.J. Bergin Jr.

and R.J. Gibson Jr., eds., ACM Press and Addison-Wesley, 1996. 4. G. De Micheli, Hardware Synthesis from C/C++ Models, Proc. Design, Automation and Test in Europe (DATE 99), IEEE Press, 1999, pp. 382-383. 5. C.E. Stroud, R.R. Munoz, and D.A. Pierce, Behavioral Model Synthesis with Cones, IEEE Design & Test, vol. 5, no. 3, July 1988, pp. 22-30. 6. D.C. Ku and G. De Micheli, HardwareC: A Language for
Hardware Design, Version 2.0, tech. report CSTL-TR90-419, Computer Systems Lab, Stanford Univ., 1990. 7. G. De Micheli et al., The Olympus Synthesis System,
IEEE Design & Test, vol. 7, no. 5, Oct. 1990, pp. 37-53.
8. D. Galloway, The Transmogrier C Hardware Description Language and Compiler for FPGAs, Proc. Symp.
FPGAs for Custom Computing Machines (FCCM 95),

IEEE Press, 1995, pp. 136-144. 9. T. Grtker et al., System Design with SystemC, Kluwer Academic Publishers, 2002. 10. P. Schaumont et al., A Programming Environment for the Design of Complex High Speed ASICs, Proc. 35th
Design Automation Conf. (DAC 98), ACM Press, 1998,

pp. 315-320. 11. Y. Panchul, D.A. Soderman, and D.R. Coleman, System
for Converting Hardware Designs in High-Level
385
Programming Language to Hardware Implementations,

US patent 6,226,776, Patent and Trademark Ofce, 2001. 12. D.D. Gajski et al., SpecC: Specication Language and
Structures, IEEE Trans. Very Large Scale Integration
(VLSI) Systems, vol. 9, no. 6, Dec. 2001, pp. 743-756.
Methodology, Kluwer Academic Publishers, 2000.

13. R. Dmer, A. Gerstlauer, and D. Gajski, SpecC
Language Reference Manual, Version 2.0, SpecC Consortium, 2001. 14. T. Kambe et al., A C-Based Synthesis System, Bach, and Its Application, Proc. Asia South Pacic Design
Automation Conf. (ASP-DAC 01), ACM Press, 2001, pp.

151-155. 15. M. Budiu and S.C. Goldstein, Compiling ApplicationSpecic Hardware, Proc. 12th Intl Conf. Field-Program-
mable Logic and Applications (FPL 02), LNCS 2438,

Springer-Verlag, 2002, pp. 853-863. 16. D.W. Wall, Limits of Instruction-Level Parallelism, Proc.
Stephen A. Edwards is an associate professor in the Computer Science Department of Columbia University. His research interests include embedded-system design, domain-specific languages, and compilers. Edwards has a BS from the California Institute of Technology and an MS and a PhD from the University of California, Berkeley, all in electrical engineering. He is an associate editor of IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems. He is a senior member of the IEEE. Direct questions and comments about this article to Stephen A. Edwards, Dept. of Computer Science, Columbia University, 1214 Amsterdam Ave. MC 0401, New York, NY 10027; sedwards@cs.columbia.edu.
4th Intl Conf. Architectural Support for Programming Languages and Operating Systems (ASPLOS 91), Sigplan
Notices, vol. 26, no. 4, ACM Press, 1991, pp. 176-189. 17. M.W. Hall et al., Detecting Coarse-Grain Parallelism Using an Interprocedural Parallelizing Compiler, Proc.
Supercomputing Conf., IEEE Press, p. 49.

18. L. Smria, K. Sato, and G. De Micheli, Synthesis of Hardware Models in C with Pointers and Complex Data
ADVERTISER INDEX SEPTEMBER/OCTOBER 2006

FUTURE ISSUE:
Advertising Personnel Marion Delaney IEEE Media, Advertising Director Phone: +1 415 863 4717 Email: md.ieeemedia@ieee.org Marian Anderson Advertising Coordinator Phone: +1 714 821 8380 Fax: +1 714 821 4010 Email: manderson@computer.org Sandy Brown IEEE Computer Society, Business Development Manager Phone: +1 714 821 8380 Fax: +1 714 821 4010 Email: sb.ieeemedia@ieee.org
November-December 2006
Process Variation and Stochastic Design and Test

Advertising close date: 01 October 06
Advertising Sales Representatives
Product Display
John Restchack Phone: +1 212 419 7578 Fax: +1 212 419 7589 Email: j.restchack@ieee.org
New England (recruitment) John Restchack Phone: +1 212 419 7578 Fax: +1 212 419 7589 Email: j.restchack@ieee.org Southeast (recruitment) Thomas M. Flynn Phone: +1 770 645 2944 Fax: +1 770 993 4423 Email: flynntom@mindspring.com
Midwest/Southwest (recruitment) Darcy Giovingo Phone: +1 847 498-4520 Fax: +1 847 498-5911 Email: dg.ieeemedia@ieee.org Northwest/Southern CA (recruitment) Tim Matteson Phone: +1 310 836 4064 Fax: +1 310 836 4067 Email: tm.ieeemedia@ieee.org
Japan (recruitment) Tim Matteson Phone: +1 310 836 4064 Fax: +1 310 836 4067 Email: tm.ieeemedia@ieee.org Europe (recuirtment) Hilary Turnbull Phone: +44 1875 825700 Fax: +44 1875 825701 Email: impress@impressmedia.com
Recruitment Display
Mid Atlantic (recruitment) Dawn Becker Phone: +1 732 772 0160 Fax: +1 732 772 0164 Email: db.ieeemedia@ieee.org
A different view: Hardware synthesis from SystemC is a maturing technology

John Sanguinetti Forte Design Systems
IN THE CHALLENGES of Synthesizing Hardware from C-Like Languages, Stephen Edwards has provided a good survey of the many attempts to adapt C to hardware creation. His thesis, that pure C is a poor choice for specifying hardware, was recognized by all the people doing this work. Unfortunately, he does not recognize the evolution of these C variations. The last in this line, SystemC, has satisfactorily addressed all the language issues. Edwards acknowledges that using pragmas to direct the synthesis process is a satisfactory way to provide the necessary additional information for efcient hardware creation, yet he criticizes the language for not providing a different means of doing so. In Table 2 of the article, two of the ve listed challenges are language issues (concurrency and data types), and three are synthesis issues. Edwards acknowledges that SystemC adequately solves the concurrency model and types challenges but seems unaware that existing modern synthesis products have solved the other threespecifying timing, communication patterns, and hints and constraintswith pragmas.
ucts results from starting at a higher abstraction level, not from the language.
Little trouble for competent hardware designer

Edwards says, My main point is that giving C programmers tools is not enough to turn them into reasonable hardware designers. This statement is unarguably true. Giving people C compilers is not enough to turn them into reasonable programmers either. Tuning code for performance has long been recognized as a separate skill, closely related to the underlying target processor. For efcient performance, vector, SIMD (single instruction, multiple data), SMP (symmetric multiprocessing), and VLIW (very long instruction word ) machines all require special techniques, encompassing both coding style and pragmas. It should surprise no one that when the underlying target processor is raw gates, additional skill and knowledge are required. In fact, a competent hardware designer has little trouble creating efcient hardware using SystemC and a modern synthesis product.
Problems have largely been solved

Confusing language with the synthesis process, Edwards comes to the conclusion that C-like languages do not naturally lend themselves to the synthesis of efcient hardware. That is simply wrong. Commercial SystemC synthesis tools routinely produce more efficient hardware than handwritten RTL code typically produces. Edwards argues that properties of C-like languages make this synthesis process computationally hard and time-consuming. Although some of the properties he has cited do make synthesis more difficult, those problems have largely been solved. Fundamentally, the complexity imposed on these synthesis prodIN THE END, though, Edwards thesis is beside the
point. As IC capacity increases, it is becoming routine to implement increasingly larger algorithms in hardware, for the performance and efficiency reasons Edwards cites. Those algorithms nearly always start out in C or C++. It is far better to operate on the original version directly than to manually translate it to a different language before beginning to transform it into hardware. Recognition of this fact has motivated most of the efforts surveyed here. Sure, there are challenges, but the benets are worth it.
0740-7475/06/$20.00 2006 IEEE
387
ITC Special Section
Guest Editors Introduction: ITC Helps Get More out of Test

Kenneth M. Butler
Texas Instruments
THIS SPECIAL SECTION of IEEE Design & Test of Computers, along with the International Test Conference 2006, highlights the value that test adds to the electronics manufacturing business. It leads us to think about test in a whole new way. The theme for ITC 2006 is Getting More out of Test, which is very appropriate in light of recent advances and changes in our industry. These days, everybody is talking about things like design for manufacturability (DFM), yield enhancement technologies, test-based outlier techniques, and the like. Based on these concepts, whole companies have been founded and have prospered, such as PDF Solutions, whose CEO was keynote speaker at ITC 2005. What makes these developments truly exciting is the role test plays in all of these new technologies. Test is truly the cornerstone on which the disciplines of yield and reliability engineering are built. And were not just talking about characterization test or an occasional product lot, but large production volumes analyzed with new and ever-more powerful data mining and data reduction techniques. We have also had to rethink what it means for a die, chip, board, or system to pass or fail a test. In the early days, particularly for digital products, we could always devise a test whose results were clear indicators of good or bad units. Yes, there was (and is) the perennial question of the tests coverage or thoroughness. But, that aspect related more to the effort level expended to incorporate good test-access mechanisms into the design and less to the technology in which the product was manufactured. Today, however, we see ample evidence of electronics failure mechanisms increasingly subtle nature. We can view this problem from two perspectives: the time zero or test escape question, and the separate but equally important reliability aspect.
A good example of the former is the relatively recent proliferation of fault models and test approaches that various groups are advocating. Everybody continues to rely on the workhorse stuck-at fault model for bulk static defect coverage. But how long will that strategy continue to work for us? At what point must we supplement, or dare I say replace, stuck-at testing with other candidate test techniques such as N-detect tests, extracted bridging fault tests, or other nontraditional forms of testing? Authors in this magazine, at ITC, and at other venues continue to grapple with this question. On the reliability side, the underlying mechanisms, such as channel hot carrier (CHC) effects and negative bias temperature instability (NBTI), have always been there. We have known about them for decades, but their impact on quality and product lifetime was relatively invisible to us. Unfortunately, that statement is no longer true. NBTI and other reliability mechanisms degrade product lifetime and performance and demand that we add margins for their occurrence. So, again, we must call on test to help us identify these problems when they occur, quantify the magnitude of the yield/reliability impact, and screen the material before it gets into the consumers hands. Overall, therefore, we can see that test must play an ever-more-important role in more aspects of the electronics business. The first article in this special section, Extracting Defect Density and Size Distributions from Product ICs by Jeffrey Nelson et al., is a classic example of learning all you can about the manufacturing process via production test. Today, the cost to construct and populate an IC wafer fabrication facility is measured in billions of dollars, and the cost of a mask set in an advanced technology is approaching or can exceed $1 million. The inevitable outcome of these spiraling costs is that
388
0740-7475/06/$20.00 2006 IEEE
fewer companies can afford to maintain captive IC manufacturing sites and thus are moving to fabless, foundrybased business models. But how do you learn and respond to important yield and defect Pareto information when design and manufacturing are in two completely separate companies, often geographically distant from each other, without having to devote costly wafer volume to test vehicles? This article addresses that important and timely question. Improving Transition Delay Test Using a Hybrid Method by Nisar Ahmed and Mohammad Tehranipoor deals with the increasingly complex subject of delay test. Starting somewhere around the 130-nm technology node, and perhaps spurred by the advent of copper metallization, delay defects suddenly became something that, left untested, could result in too large an escape rate as seen by the customer. The industry responded in earnest by applying delay test techniques to large numbers of production ICs. Immediately, users of this technology discovered issues with things like pattern volume, realizable coverage, and test generation tool runtimes. This article is an example of the types of new thinking being applied to this problem to make delay test more tractable and more usable, thus getting more out of it. The final article, Impact of Thermal Gradients on Clock Skew and Testing by Sebasti Bota et al., in some sense turns the ITC theme on its ear. To get more out of test, we must fundamentally understand not only its capabilities but also its limitations. As die sizes grow increasingly larger and clock rates continue to climb, so, too, do power requirements, driving die temperatures higher as well. Within-die thermal gradients can have negative effects on timing and clocking, which degrade testings accuracy and results. This article systematically examines the issue of thermal effects, introduces a methodology for quantifying them, and proposes a design technique for counteracting them.
these, interact with their authors, visit the exhibits oor and see the new products that leverage the best test has to offer, and, most importantly, share your thoughts and ideas on how we can get more out of test. I would like to take this opportunity to thank Editorin-Chief Tim Cheng and the entire IEEE D&T editorial staff for their encouragement and assistance in producing this special issue.
Kenneth M. Butler is a TI Fellow at Texas Instruments in Dallas. His research interests include outlier techniques for quality and reliability and test-data-driven decision making. Butler has a BS from Oklahoma State University and an MS and a PhD from the University of Texas at Austin, all in electrical engineering. He was the program chair of ITC 2005 and currently serves on the program and steering committees. He is a Senior Member of the IEEE and a member of the ACM. Direct questions and comments about this special section to Kenneth M. Butler, Texas Instruments, 13121 TI Boulevard, MS 366, Dallas, TX 75243; kenb@ti.com.
JOIN A THINK TANK

L
ooking for a community targeted to your area of expertise? IEEE Computer Society Technical Committees explore a variety of computing niches and provide forums for dialogue among peers. These groups inuence our standards development and offer leading conferences in their elds.
TAKEN AS A WHOLE, the articles demonstrate the
changing role of test in the entire electronics industry and how its not just for pass/fail anymore. Contributors to ITC, IEEE Design & Test, and numerous other IEEE test conferences and workshops are continually inventing and demonstrating new ways in which the test process can increase our rate of product and process learning, speed products to yield and reliability entitlement, and generally contribute more to our collective bottom line. I hope that this information will inspire you to come to ITC, see the presentations of articles like
Join a community that targets your discipline. In our Technical Committees, youre in good company.
www.computer.org/TCsignup/
389
ITC Special Section
Extracting Defect Density and Size Distributions from Product ICs

Jeffrey E. Nelson, Thomas Zanon, Jason G. Brown, Osei Poku, R.D. (Shawn) Blanton, and Wojciech Maly
Carnegie Mellon University
Brady Benware and Chris Schuermyer

LSI Logic
Editors note: Defect density and size distributions are difficult to characterize, especially if you have little or no access to test vehicles specifically designed for the purpose. The authors propose a new methodology for extracting that information directly from production test data on actual products. Ken Butler, Texas Instruments
itselfrather than memory or specialized test structures that waste silicon area and often do not represent the products design style.
Background
DEFECTS FREQUENTLY OCCUR during IC manufacture. Modeling the resulting yield loss is an important part of any design-for-manufacturability strategy. Of the many mechanisms that cause yield loss, some have sufciently accurate models and are well understood, whereas others are unpredictable and difcult to characterize. Current yield-related research focuses mainly on systematic defects. In contrast, this article addresses random spot defects, which affect all processes and currently require a heavy silicon investment to characterize. We propose a new approach for characterizing random spot defects in a process. This approach enables accurate measurement of parameters for the critical-area yield modelthe workhorse of modern yield-learning strategies. IC manufacturers often neglect the need to tune the yield modelthat is, to continuously update yield model parametersbecause of the silicon area required to characterize a process. But the inherently stochastic nature of yield makes frequent process characterization necessary for accurate yield models. We present a system that overcomes the obstacle of silicon area overhead by using available wafer sort test results to measure critical-area yield model parameters. We use only wafer sort test results, so no additional silicon area is required. Our strategy uses the most realistic characterization vehicle for the product ICthe product
Defect density and size distributions (DDSDs) are important parameters for characterizing spot defects in a process. A DDSD tells us what the defect density is for a given defect radius that is, the number of defects per unit area. The distribution gives this information for all defect radii. Typically, though, as defect radius increases, defect density quickly decreases. Thus, we can generally curtail the distribution and measure only defect density for a range of defect radii, because larger defects have a density approaching zero. This inherent feature becomes useful in attempting to discretize the DDSD. We can subdivide the distributions characterizing a process beyond defect size. Each metal layer of the process can potentially have a different DDSD. Ideally, wed like to measure each layers DDSD rather than attempt to characterize all layers simultaneously with a single distribution. These distributions are parameters for the critical-area yield model.1-3 IC manufacturers measure DDSDs primarily with specialized test structures on a wafer. Test structures contain geometries specically designed to observe defects. When a defect occurs in a particular region of a test structure, that structure observes the defect, making it easy for the process engineer to identify what the defect mechanism is, where it occurred, and to learn about the defects size. The price we pay for this convenience is that test structures consume silicon area on the wafer.
390
0740-7475/06/$20.00 2006 IEEE
Thus, test structures impose a trade-off between area cost and defect observability. Consider the three wafers in Figure 1. In Figure 1a, the entire wafer is dedicated to test structures. This conguration allows excellent defect observability, but the obvious drawback is that no product can be manufactured from itproduct volume is zero. Manufacturers typically use a full wafer of test structures only during the earliest yield-learning phase, when the yield improvement realized from these structures signicantly outweighs manufacturing cost. In Figure 1b, products have replaced many of the test structures, raising volume to a medium level. However, observability has decreased because now there is a signicant amount of area where defects can occur with no direct ability to characterize them. The wafer in Figure 1b also contains test structures in the scribe lines. This configuration is a compromise between defect observability and volume. Manufacturers typically use it during yield ramp, when volume is necessary, but the ability to characterize defectsparticularly systematic defectsis still required. Finally, the wafer conguration shown in Figure 1c uses the entire silicon area to manufacture products. The scribe lines still contain test structures because they dont affect product volume. As in the Figure 1b configuration, this configuration provides limited area to observe defects, but it is even more extreme because it relegates the test structures to the scribe lines. This conguration is used most during the volume phase of yield ramp, when characterization of random spot defects is most important for predicting yield. The observability-versus-area trade-off has led to research that seeks the best of both worlds: high observability and low (or no) area overhead. In particular, researchers have used SRAMs to extract DDSDs.4 This technique requires no additional overhead, because the characterization vehicle (the SRAM) is a useful product itself. SRAMs, however, have undesirable characterization characteristics, such as confinement to a few metal layers, which limits the scope of observable defects. SRAMs extremely regular structure means that if the replicated cell has a narrow scope of geometric features for defect observation, this limitation will extend over the entire chip. These limitations are only noteworthy when the memories are extracting DDSDs for yield-loss prediction for random-logic circuits. A preferable defect characterization vehicle in such cases is a random-logic product. Other researchers have suggested using a random-
(a)
(b)
(c)
Figure 1. Wafers with different test structure congurations and varying levels of defect observability (gray areas and scribe lines represent test structures): all test structures and no products (a), some test structures replaced by products (b), and entire area used for products, with test structures in scribe lines only (c).
logic product to estimate the defect pareto in a process using only test results.5 That work, in conjunction with the SRAM work, inspired the initial idea that we could extract a DDSD for each process layer using a randomlogic product IC as a virtual test structure.6 The rst publication describing an investigation of this idea appeared in March 2006.7 Here, we elaborate on that publication and present new findings from an experiment conducted on test data from silicon wafers provided by LSI Logic.
Proposed approach
Our system accurately characterizes spot defects that contribute to yield loss by measuring defect density in each metal IC layer, without the silicon overhead required by current techniques. The various geometries and line spacing in a typical layout lead to defects of different sizes with varying effects on the IC (some small
391
ITC Special Section
E1
E4
E6 b
technique that relates the analyzed ICs test responses to defect characteristics that could cause such test responses. We will describe two mappings: one between defect probabilities and fault probabilities, and one between faults and test responses.
E3 b E2 (a) c E5
Microevents and macroevents

A spot of extra conducting material deposited in a metal layer can introduce an extra, unwanted bridge connection between nonequipotential metal regions in the layer. In most cases, a bridge will affect the circuits electrical behavior. An instance of a bridge that connects two or more nonequipotential metal islands is called a microevent.4 Each microevent involves a set of circuit nodes, S = {n1, n2, , nm}, that are bridged by the spot defect of a specific radius. We can calculate the probability of a single, independent microevent using the critical-area yield model.7 Equation 1 shows the probability that microevent i will occur, where Ci is the microevents critical area, and Dj(ri) is the defect density for defects of radius ri (the same radius as microevent i) in layer j, the layer in which microevent i occurs. pi = e Ci D j ( ri )
(b)
Figure 2. Sample layout with six microevents: four in metal layer 1 (a), and two in metal layer 2 (b). Microevents E1 to E3 have radius r1 (solid boxes) and E4 to E6 have radius r2 (dashed boxes), where r1 < r2. Spot defects are circles.
defects may have a negligible impact). Therefore, in addition to defect density, we must measure the distribution of defect sizes. The strategy for achieving this goal is straightforward.6-8 By nature, each spot defect affects only a small subset of nodes in close proximity to one another. Each spot defect leads to a unique, defective circuit response. Likewise, given a circuit response, there are some potential spot defects that cause that response. Using results from structural testing, we can estimate the probability of a particular circuit response and consequently the probabilities of defect occurrence. By grouping responses according to specic characteristics, such as the size of a defect necessary to cause that circuit response, we can determine the occurrence probabilities of defects of that size. Using a modeling strategy to predict faulty circuit responses as a function of defect characteristics in the process, we can mathematically derive defect characteristics that minimize the difference between the modeled test response probabilities and the estimated test response probabilities. Thus, the calculated defect characteristics must represent the actual defect characteristics in the process. Of course, for this to be true, certain conditions must be met. We propose a defect characterization methodology based on this concept. That is, we develop and apply a modeling strategy that predicts probabilities of test responses depending on a DDSD, and then we find the DDSD that leads to agreement between circuit test responses measured by a tester and test responses predicted by the model. To accomplish this, we have developed a modeling
(1)
Here, we dene microevent Ei as a bridge, thus limiting our scope to spot defects causing bridges. We do this for two reasons: First, it is important that the physics of the investigated yield loss mechanism be well understood, which is indeed the case for bridges. Second, spots of extra conducting material are still a major reason for IC malfunctions in many processes. An ICs vulnerability to random spot defects greatly depends on the layout. The critical-area concept was developed to provide a metric of design sensitivity to defects.1,9 Critical area is the layout region where, if a spot defect of radius r occurs, a circuit can fail. Figure 2 shows a small portion of a sample layout with signal lines in metal 1 and metal 2. The figure illustrates six microevents: four in metal 1 and two in metal 2. Four sample spot defects demonstrate how a microevent can occur. Each microevent has an associated critical area for a specic defect radius. For example, microevents E1 to E3 have critical area for a defect of radius r1, represented by the solid boxes associated with each microevent label. Likewise, microevents E4 to E6 have critical area for radius r2, represented by the dashed boxes. This example shows that even within a single metal layer, microevents involving the same circuit node set S can occur in several discrete regions. In this
392
case, S = {b, c}. Each discrete region of critical area represents a separate microevent. In addition, microevents involving the same set of circuit nodes can exist in different metal layers. Critical-area measurement occurs in steps. First, we measure critical area for all potential microevents in a layout for a given radius, rstart. In each subsequent step, the defect radius is incremented by a small amount and the rst step repeated for the new radius. This process repeats, continuing over a specified range of defect radii until reaching rend. We can now define a macroevent as the set of all microevents that exist for the same set of circuit nodes S. As mentioned, many microevents involving S can exist in different layers for different defect radii. So, a collection of independent microevents describes each macroevent. Figure 2 shows a single macroevent, occurring between lines b and c, which consists of microevents 1 through 6. Because a macroevent is a set of independent microevents, the probability of a macroevent involving S is one minus the product of the probabilities of each microevent involving S not occurring. Thus, in this example, the probability of the macroevent involving b and c occurring is one minus the product of the probabilities of each of the six microevents not occurring. Critical-area extraction for a range of defect radii provides a list of microevents and their associated critical areas. With those measurements, we can calculate microevent probabilities, and thus macroevent probabilities, as a function of defect densities. Because a macroevent represents a multiline bridge, we have in fact extracted a list of potential bridge defects along with their occurrence probability. This results in the rst mapping between defects and faults.
Logic-level modeling
The nal modeling stage necessary for mapping defect characteristics to test responses is a mapping between the macroevent list and the test responses. This mapping is embodied by the T matrix, which we calculate by simulating the entire test set against each macroevent. Because simulation time for a large number of macroevents (even a small circuit can have hundreds of thousands) can be enormous, we model them as logiclevel faults, making efcient simulation possible. To maintain accuracy when simulating at the logic level, we rst derive an accurate gate-level model of the circuit. Typical standard-cell representations obscure the cells internal workings, causing the omission of impor-
tant signal lines from the logic-level netlist. This netlist includes only standard-cell ports, even if the standard cell contains several CMOS logic gates. Therefore, we map a standard-cell layout to a logic-level description that captures the structure of static CMOS gates in the cell, using the gate primitives NAND, NOR, and NOT. This change lets us consider gate outputs routed in metal 1 in a standard cell during microevent extraction and tie them to logic signals in the netlist. An AND-gate standard cell illustrates this issue. Typically, an AND gate is implemented in CMOS by a NAND gate followed by an inverter, with the connection between the two routed in metal 1. Microevents involving the internal metal 1 routing might occur, but without the layout-to-logic mapping used here, we have no basis for forming a logic-level fault model that includes this metal line. With our mapping, we can efciently handle critical area that involves all metal lines in a standard cell (which can account for a signicant portion of the chips total critical area). However, some standard cells might still contain metal structures that are not mapped to the logic level. These polygonal structures are metal lines that dont correspond to a CMOS logic gates output (these structures do not include power and ground, which easily map to logic 1 and 0). They are typically in complex CMOS gates such as AND-OR-INVERT gates, multiplexers, and other complex logic functions. Although we could ignore macroevents involving these polygons, they will become an additional source of error. We developed a technique to handle the polygons by mapping their logic functions to standard cell ports, and we used this technique in the silicon experiment that we describe later. The extracted macroevents represent bridges that can involve two or more signal lines. Test engineers commonly use bridge faults10 to model two-line bridge defects, but because macroevents can involve more than two lines, more-advanced fault models are necessary. We use the voting-bridge fault model,11 in which pull-up and pull-down network drive strengths determine the erroneous lines. We form a voting model for each macroevent by separately summing the drive strengths of all lines in the macroevent driven to logic 0 and logic 1. We then compare the two sums to determine which logic value will be imposed on the other lines. An error occurs on each line with the weaker logic value. To implement the voting model described here, we use fault tuples, a generalized fault representation mechanism.12 Despite the
393
ITC Special Section
complex models we use, the behavior of real spot defects is unpredictable and therefore can be a source of error. To simulate the macroevents modeled as votingbridge faults, we use FATSIM, a concurrent fault simulator for fault tuples.12 To determine which test vectors detect which macroevents, we use no fault dropping during simulation. The resulting data is stored in the T matrix, which has the following form: t11 , T = M t V ,1 t1,2 tV ,2 L t1,M M L tV ,M
cretely using some number of points, we can concatenate all the DDSDs defect densities into a single vector. The linear regressions output will be this vector, which can then be split into a DDSD for each metal layer. We present a detailed mathematical description of these steps elsewhere.9,10
Simulation experiment
To evaluate the proposed approach, we performed an experiment based on a simulated, articial process. We assumed DDSDs for each layer of the artificial process and inserted defects into the process based on these distributions. We measured the estimated yield per test vector by emulating a tester. We then applied the DDSD extraction strategy to the circuit and compared the extracted DDSDs with the inserted DDSDs.
where V is the number of test vectors simulated, M is the total number of macroevents, and ts,i is a 1 (0), indicating that macroevent i is detected (undetected) by test vector s. The T matrix provides the mapping between logic-level faults and circuit test responses. We have veried qualitatively that an inaccurate T matrix can signicantly decrease the overall accuracy of our DDSD extraction approach. When we use a random T matrix, the resulting DDSDs have no resemblance to the expected distribution. Therefore, it is critical that macroevents be modeled precisely and simulated correctly; otherwise, the T matrixs quality will be questionable. Simulation techniques that are more detailed than a logic-level model (for example, transistor-level Spice simulation) could possibly lead to greater accuracy, but they would increase the required simulation time considerably.
Demonstration circuit
For this experiment, we used circuit c3540 from the 1985 International Symposium on Circuits and Systems (ISCAS) benchmark suite.13 We logically optimized the c3540 implementation and technology-mapped it to a 0.18-micron commercial standard-cell library. The nal layout was routed in ve metal layers and used approximately 100 m 100 m of area. In modern manufacturing processes, a design of this size would typically be free of defects because of relatively low defect densities. To ease the simulation burden, we assumed that a single die consisted of 10,000 parallel instances of c3540, with each instance retaining its original controllability and observability. As a result, each die had an area of approximately 1 cm2 and could still be tested with a test set for a single instance of c3540. Although this die had a total critical area comparable to typical chips, it lacked the diverse geometrical features that a die would normally exhibit. However, the impact of design diversity on the DDSD extraction technique was not the experiments focus. After preparing the demonstration circuit, we extracted macroevents, modeled them using fault tuples, and simulated them with FATSIM to generate the T matrix. The production test set consisted of 155 stuck-at test patterns. During macroevent extraction, we determined critical area for a range of defect sizes to build a critical-area function for each macroevent. For metal layers 1 through 4, the critical-area function domain was 0.2 micron to 2 microns, and for metal layer 5, it was 0.34 micron to 2 microns, with samples spaced at 50-nm intervals. This resulted in 182 critical-area points. We determined the limits on the basis of minimum line spacing for the
DDSD extraction
As discussed earlier, we can measure DDSDs by minimizing the difference between the predicted and the observed probability of passing tests (yield per test). We have described the various components necessary to predict probability pi of test i passing. We adapt the critical-area yield model for this task, using critical-area functions of macroevents, and the DDSD per layer as parameters of the model. After measuring the T matrix and critical-area functions of macroevents, the DDSDs are the only unknown parameters of the model. We can easily measure observed yield per test pi from tester results as the ratio of the number of chips that pass test i to the total number of chips manufactured. We can find the DDSDs that minimize the error between pi and pi by using linear regression. The key idea is to abandon the concept of individual DDSDs per layer. Because we will capture each distribution dis-
394
lower bound and selected the upper bound to capture a sufcient portion of the DDSDs tail. Figure 3 shows the total discretized critical-area function (sum of critical area functions of all microevents involving the layer) for each of the five metal layers for one instance of c3540.
12,000 1 2 10,000 3 4 5 Metal

182 bins 19 bins
Tester emulation
In the proposed DDSD extraction 6,000 methodology, we measure the yield per test from the structural test results of a 4,000 real tester. In the simulation experiment, we substituted tester emulation for actual test results. We generated defects 2,000 according to a stochastic Poisson process in which each potential defect is an independent event. The assumed 0 DDSD followed the well-known power 0.2 0.3 0.5 1 2 law, with the defect densities shown in Defect radius (m) Table 1. We increased defect densities to levels well beyond realistic figures to reduce the simulation time required for Figure 3. Critical-area functions (white symbols) extracted from all metal test emulation. layers of a single instance of circuit c3540 from the ISCAS 85 benchmark We consider each macroevents suite. Black symbols represent critical-area functions after combining a occurrence an independent Poisson range of defect sizes. process because we assume that each defects occurrence is independent of all others. As a result, each macroevent occurs with a freTable 1. Injected defect density and size distributions quency dictated by a Poisson process at a rate deter(DDSDs) following the power law distribution, with power mined from the critical-area function of the macroevent parameter p and peak-probability parameter X0 = 0.05 m and the DDSDs. Table 2 shows the percentage of dies for each metal layer. D0 [cm2] represents defect density. containing zero, one, two, or three macroevents in a Metal layer sample size of 50,000 for this experiment. From this Parameter 1 2 3 4 5 table, we reach two conclusions:
D0 (cm2)
Critical area (m2)
8,000
1 3
2 4
2 3
1 2
3 3
Because the occurrence rates of the number of macroevents per die align with the theoretical occurrence rates, 50,000 dies are sufcient. Of the simulated dies, multiple macroevents affect only a small percentage.
Table 2. Occurrence rates for the number of macroevents per die for a sample size of 50,000. Parameter No. of macroevents per die Percentage of dies 0 94.17 Occurrence rate 1 5.67 2 0.15 3 0.01
From the artificial process simulation, we knew which macroevents occurred on each faulty die. We then obtained the yield per test by inspecting the T matrix. The yield per test varied slightly around an average of 98% for each test. We assume that no masking effects occur for dies affected by multiple macroevents. Thus, if a test detects any of the individual macroevents, we assume that the test will fail. Table 2 shows that the
assumption that no masking occurs applies to about 0.16% of all dies; thus, any impact from this assumption is minimal.
395
ITC Special Section
DDSD extraction
We formulated the DDSD extraction process as a minimization problem to be solved using linear regression analysis. Here, we detail the regression procedure for the demonstration circuit. As already mentioned, the total number critical-area points from the critical-area analysis for all layers is 182. It is natural to likewise want to discretize the DDSDs by solving for their values at the same points as the critical area points. Each of these is referred to as a bin. The individual defect densities in the 182 bins comprise the DDSD vector we wish to derive. However, given that there are only 155 test vectors, we can obtain only 155 yields per test. Consequently, there are more unknowns than equations, which means the minimization is an undetermined problem with an innite number of solutions. To reformulate the problem so that it is solvable, we grouped sample points for defect size ranges into fewer, wider bins, thus reducing the overall number of densities to be derived. Figure 3 shows the 19 bins used for this experiment. We recalculated critical-area functions for the new bin arrangements, represented by the black symbols in Figure 3. This reconstruction doesnt affect the T matrix, so there is no need to resimulate the faults. We used principal component regression to find the values for the 19 bins that make up the DDSDs. We obtained 95% confidence intervals for the extracted DDSDs, using standard bootstrapping techniques.14 Figure 4 shows the nal extracted results of the analysis for all five metal layers. The triangles represent the 19 extracted DDSD vector components, and the small circles represent the assumed DDSD components. Although the results arent perfect, the inserted DDSD and the extracted DDSD correlate wella positive and promising result. Figure 4 also shows the 95% confidence intervals for each DDSD component. Some of the condence intervals are quite large. The source of this variance can be traced to the properties of the criticalarea functions and the T matrix. Specifically, criticalarea functions that contribute to one tests failing correlate strongly with critical-area functions contributing to other test patterns.
Silicon experiment
After the success of the simulation experiment, we conducted a similar experiment on a chip manufactured in a commercial facility. The chip is an array of 64-bit ALUs manufactured in a 0.11-micron process. LSI
Logic designed the chip as a process development and silicon-debugging vehicle closely mimicking the design style of the companys other digital-logic products. Hence, the chip is ideally suited for testing and validating our DDSD extraction strategy. Each die contains 384 ALUs, each independently controllable and observable (similar to the assumption made in the simulation experiment). The chips structure is convenient from the perspective of scale because the die is partitioned into many small blocks, each a single ALU. Although not all designs are this regular, large designs are frequently partitioned into smaller blocks and tested separately with scan chains. Analyzing each block independently or limiting the analysis to just a handful of blocks is one strategy for coping with the large number of macroevents associated with an industrial design. We performed the experiment in almost the same manner as that of the simulation experiment. We adjusted the critical-area bins to account for the smaller feature size. The bin edges were 0.1, 0.2, 0.4, 1, and 2 microns. The silicon chip was routed in six layers rather than five and thus required 23 bins (like metal layer 5, metal layer 6 was captured with only three bins). Another difference in this experiment was that we used real test results for a test set containing 262 patterns provided by the manufacturer. We extracted the results using 451 failing ALUs; the parts yield is IP, so we dont disclose the total number of manufactured ALUs. Figure 5 shows the extracted DDSDs for the six metal layers. We did not simply parameterize an assumed model, yet the extracted curve for each layer follows a power law distribution, a DDSD shape typically found in manufacturing processes. This strongly indicates that these results are meaningful. Additionally, the plots indicate that although the distributions dont vary widely, there are differences in defect densities from layer to layer. The y-axis in each graph has the same range, making plot comparisons easier. Finally, the large confidence intervals for the smallest defect sizes in metal layers 5 and 6 occur because there is very little critical area for small defects in the higher metal layers, as Figure 6 shows. This can be the result of either design rules that force lines to be farther apart or simply the decreased routing density in those layers. Either way, there is limited ability to observe small defects in those layershence, the large confidence intervals. The results of the experiment on chips fabricated in
396
Metal 1 1.0 1 2 0.8 Defect density (cm )

2
Metal 2 1.0 4 Defect density (cm2) 5 6 0.8 7 8
Extracted defect density Presumed defect density
0.6
0.6
0.4
0.4
0.2
0.2
0 0.2 (a) 1.0 9 0.8 Defect density (cm ) 95% confidence interval 0.6 Defect density (cm2)
2
0 0.5 1 Defect radius (m) Metal 3 1.0 10 11 12 0.8 14 15 16 2 (b) 0.2 0.5 1 Defect radius (m) Metal 4 2
0.6
0.4
0.4
0.2
0.2
0 0.2 (c) 0.5 1 Defect radius (m) 1.0 17 0.8 Defect density (cm2) 18 2 (d) Metal 5
0 0.2 0.5 1 Defect radius (m) 2
19
0.6
0.4
0.2
0 0.3 0.5 (e) 1 Defect radius (m) 2
Figure 4. Assumed and extracted DDSDs for all metal layers and corresponding 95% condence intervals: metal 1 (a), metal 2 (b), metal 3 (c), metal 4 (d), and metal 5 (e).
397
ITC Special Section
Metal 1 1 2 3 4 5 6 7
Metal 2 8 9 10 11
Metal 3 12
0.2 0.5 (a)
2 (b)
0.2 0.5
2 (c)
Defect radius (m) Metal 4 14 15 16
Defect radius (m) Metal 5 18 19 20
0.2 0.5 1 Defect radius (m) Metal 6 21 22 23
(d)
0.2 0.5 1 Defect radius (m)
2 (e)
2 (f)
Figure 5. Extracted DDSDs for all metal layers in a fabricated 64-bit ALU test chip, and corresponding 95% condence intervals. Defect densities are hidden to protect IP, but the scale of all plots is identical. Metal 1 (a), metal 2 (b), metal 3 (c), metal 4 (d), metal 5 (e), and metal 6 (f).
silicon conrm the results of the simulation experiment: We can measure DDSDs that characterize a process in ordinary digital circuits using only slow, structural test results from the product.
RATHER THAN DISCARDING pass/fail test results once a
part has been sorted, we can derive valuable process characteristics from the test data. Our strategy extracts DDSDs consistent with those wed expect to see for a modern manufacturing processan achievement not previously accomplished without using additional silicon area. Our ongoing research is looking for ways to improve accuracy by using high-delity fault models and greater data volume, as well as by accounting for yield loss due to other defect types such as open circuits.
Many manufacturers continue to rely on inspection techniques whose quality degrades with every new process generation. Our approach to extracting process characteristics doesnt suffer from the same degradation. Although manufacturers stand to gain much from using this approach, our strategy also offers an opportunity for fabless companies to gain insight into the fabrication of their chips. For the rst time, such companies can independently compute their products defect characteristics and improve design yield by tuning designs for a given fabline.
Acknowledgments
Semiconductor Research Corporation supported this work under contract 1172.001.
398
5.E+04
Metal 1 Metal 2 Metal 3 Metal 4 Metal 5 Metal 6
4.E+04
Critical area (m2)
3.E+04
2.E+04
1.E+04
0.E+00 0 0.5 1.0 Defect radius (m) 1.5 2.0
Figure 6. Total critical-area functions per layer extracted from all metal layers of a 64-bit ALU.
References
1. W. Maly and J. Deszczka, Yield Estimation Model for VLSI Artwork Evaluation, Electronic Letters, vol. 19, no. 6, Mar. 1983, pp. 226-227. 2. D. Schmitt-Landsiedel et al., Critical Area Analysis for Design-Based Yield Improvement of VLSI Circuits, Qual-
8. J.E. Nelson et al., Extraction of Defect Density and Size
Distributions from Wafer Probe Test Results, tech. report

CSSI 05-02, Center for Silicon System Implementation, Carnegie Mellon Univ., 2005. 9. C.H. Stapper, Modeling of Integrated Circuit Defect Sensitivities, IBM J. Research and Development, vol. 27, no. 6, Nov. 1983, pp. 549-557. 10. K.C.Y. Mei, Bridging and Stuck-at Faults, IEEE Trans.
ity and Reliability Eng. Intl, vol. 11, 1995, pp. 227-232.
3. D.J. Ciplickas, X. Li, and A.J. Strojwas, Predictive Yield Modeling of VLSICs, Proc. 5th Intl Workshop Statistical
Computers, vol. 23, no. 7, July 1974, pp. 720-727.

11. R.C. Aitken and P.C. Maxwell, Biased Voting: A Method for Simulating CMOS Bridging Faults in the Presence of Variable Gate Logic Thresholds, Proc. Intl Test Conf. (ITC 93), IEEE Press, 1993, pp. 63-72. 12. R.D. Blanton, Methods for Characterizing, Generating
Metrology (WSM 00), IEEE Press, 2000, pp. 28-37.

4. J. Khare, D. Feltham, and W. Maly, Accurate Estimation of Defect-Related Yield Loss in Recongurable VLSI Circuits, IEEE J. Solid-State Circuits, vol. 8, no. 2, Feb. 1993, pp. 146-156. 5. Y.J. Kwon and D.M.H. Walker, Yield Learning via Functional Test Data, Proc. Intl Test Conf. (ITC 95), IEEE Press, 1995, pp. 626-635. 6. W. Maly, Spot Defect Size Measurements Using Results
Test Sequences for, and Simulating Integrated Circuit Faults Using Fault Tuples and Related Systems and Computer Program Products, US Patent 6,836,856,
Patent and Trademark Ofce, 2004. 13. F. Brglez and H. Fujiwara, A Neutral Netlist of 10 Combinational Benchmark Designs and a Special Translator in Fortran, Proc. Intl Symp. Circuits and Systems (ISCAS 85), IEEE Press, 1985, pp. 695-698. 14. B. Efron and R.J. Tibshirani, An Introduction to the Boot-
of Functional Test for Yield Loss Modeling of VLSI IC,

white paper, Carnegie Mellon Univ., 2004. 7. J.E. Nelson et al., Extraction of Defect Density and Size Distributions from Wafer Sort Test Results, Proc.
Design, Automation and Test in Europe (DATE 06),

IEEE Press, 2006, pp. 913-918.
strap, Chapman & Hall, 1993.
399
ITC Special Section
Jeffrey E. Nelson is a PhD candidate in the Department of Electrical and Computer Engineering at Carnegie Mellon University. His research interests include process characterization and testing of digital systems. He has a BS and an MS in electrical and computer engineering from Rutgers University and Carnegie Mellon University, respectively. He is a member of the IEEE. Thomas Zanon is a PhD candidate in the Department of Electrical and Computer Engineering at Carnegie Mellon University and a yield ramping consulting engineer at PDF Solutions, in San Jose, California. His research interests include defect and process characterization based on test results. Zanon has a Dipl. Ing. degree in electrical engineering and information technology from the Technische Universitaet Muenchen. He is a member of the IEEE and EDFAS. Jason G. Brown is a PhD candidate in the Department of Electrical and Computer Engineering at Carnegie Mellon University. His research interests include defect-based test, inductive fault analysis, and layout-driven diagnosis. He has a BS in electrical engineering from Worcester Polytechnic Institute and an MS in computer engineering from Carnegie Mellon University. Osei Poku is a PhD candidate in the Department of Electrical and Computer Engineering at Carnegie Mellon University. His research interests include various aspects in test and diagnosis of VLSI circuits, such as automatic test pattern generation, volume diagnosis, and diagnosisbased yield learning. Poku has a BS in electrical engineering from Hampton University and an MS in electrical and computer engineering from Carnegie Mellon University. R.D. (Shawn) Blanton is a professor in the Department of Electrical and Computer Engineering at Carnegie Mellon University, where he is the associate director of the Center for Silicon System Implementation (CSSI). His research interests
include test and diagnosis of integrated, heterogeneous systems. He has a BS in engineering from Calvin College, an MS in electrical engineering from the University of Arizona, and a PhD in computer science and engineering from the University of Michigan, Ann Arbor. Wojciech Maly is the Whitaker Professor of Electrical and Computer Engineering at Carnegie Mellon University. His research interests focus on the interfaces between VLSI design, testing, and manufacturing, with emphasis on the stochastic nature of phenomena relating these three VLSI domains. Maly has an MSc in electronic engineering from the Technical University of Warsaw and a PhD from the Institute of Applied Cybernetics, Polish Academy of Sciences. Brady Benware is a staff engineer in the Product Engineering group at LSI Logic, where his current focus is on developing defect-based test methods to achieve very low defective-parts-per-million levels. Benware has a PhD in electrical engineering from Colorado State University. Chris Schuermyer is an engineer in the Advanced Defect Screening group at LSI Logic. His research interests include test for yield and defect learning, defect-based testing, and logic diagnosis. He has a BS in physics and a BS and an MS in electrical engineering, all from Portland State University.
Direct questions or comments about this article to R.D. Blanton, Dept. of Electrical and Computer Engineering, Carnegie Mellon University, 5000 Forbes Ave., Pittsburgh, PA 15213; blanton@ece.cmu.edu.
400
PURPOSE The IEEE Computer Society is
PUBLICATIONS AND ACTIVITIES
the worlds largest association of computing professionals, and is the leading provider of technical information in the field.
MEMBERSHIP Members receive the
Computer. The flagship publication of the IEEE Computer Society, Computer publishes peer-reviewed technical content that covers all aspects of computer science, computer engineering, technology, and applications.
Periodicals. The society publishes 14 AVAILABLE INFORMATION
monthly magazine Computer, discounts, and opportunities to serve (all activities are led by volunteer members). Membership is open to all IEEE members, affiliate society members, and others interested in the computer field.
COMPUTER SOCIETY WEB SITE
To obtain more information on any of the following, contact the Publications Office: Membership applications Publications catalog Draft standards and order forms Technical committee list Technical committee application Chapter start-up procedures Student scholarship information Volunteer leaders/staff directory IEEE senior member grade application (requires 10 years practice and significant performance in five of those 10)
magazines, 10 transactions, and one letters. Refer to membership application or request information as noted at left.
Conference Proceedings & Books.
The IEEE Computer Societys Web site, at www.computer.org, offers information and samples from the societys publications and conferences, as well as a broad range of information about technical committees, standards, student activities, and more.
OMBUDSMAN Members experiencing prob-
Conference Publishing Services publishes more than 175 titles every year. CS Press publishes books in partnership with John Wiley & Sons.
Standards Working Groups. More
than 150 groups produce IEEE standards used throughout the world.
Technical Committees. TCs provide
Conferences/Education. The society To check membership status or report a holds about 150 conferences each year change of address, call the IEEE toll-free and sponsors many educational activities, CHAPTERS Regular and student chapters number, +1 800 678 4333. Direct all other including computing science accreditation. worldwide provide the opportunity to interComputer Society-related questions to the act with colleagues, hear technical experts, Publications Office, +1 714 821 8380. and serve the local professional community. E X E C U T I V E C O M M I T T E E
President: DEBORAH M. COOPER* PO Box 8822 Reston, VA 20195 Phone: +1 703 716 1164 Fax: +1 703 716 1159 d.cooper@computer.org
President-Elect: MICHAEL R. WILLIAMS* Past President: GERALD L. ENGEL* VP, Conferences and Tutorials: RANGACHAR KASTURI (1ST VP)* VP, Standards Activities: SUSAN K. (KATHY) LAND (2ND VP)* VP, Chapters Activities: CHRISTINA M. SCHOBER* VP, Educational Activities: MURALI R. VARANASI VP, Electronic Products and Services: SOREL REISMAN VP, Publications: JON G. ROKNE VP, Technical Activities: STEPHANIE M. WHITE* Secretary: ANN Q. GATES* Treasurer: STEPHEN B. SEIDMAN 20062007 IEEE Division V Director: OSCAR N. GARCIA 20052006 IEEE Division VIII Director: STEPHEN L. DIAMOND 2006 IEEE Div. VIII Director-Elect: THOMAS W. WILLIAMS Computer Editor in Chief: DORIS L. CARVER Executive Director: DAVID W. HENNAGE
* voting member of the Board of Governors nonvoting member of the Board of Governors
lemsmagazine delivery, membership status, or unresolved complaintsmay write to the ombudsman at the Publications Office or send an e-mail to help@computer.org.
professional interaction in over 30 technical areas and directly influence computer engineering conferences and publications.
BOARD OF GOVERNORS
Term Expiring 2006: Mark Christensen, Alan Clements, Robert Colwell, Annie Combelles, Ann Q. Gates, Rohit Kapur, Bill N. Schilit Term Expiring 2007: Jean M. Bacon, George V. Cybenko, Antonio Doria, Richard A. Kemmerer, Itaru Mimura, Brian M. OConnell, Christina M. Schober Term Expiring 2008: Richard H. Eckhouse, James D. Isaak, James W. Moore, Gary McGraw, Robert H. Sloan, Makoto Takizawa, Stephanie M. White Next Board Meeting: 01 Nov. 06, San Diego, CA
COMPUTER SOCIETY O F F I C E S
Washington Office 1730 Massachusetts Ave. NW Washington, DC 20036-1992 Phone: +1 202 371 0101 Fax: +1 202 728 9614 E-mail: hq.ofc@computer.org Los Alamitos Office 10662 Los Vaqueros Cir., PO Box 3014 Los Alamitos, CA 90720-1314 Phone:+1 714 821 8380 E-mail: help@computer.org Membership and Publication Orders: Phone: +1 800 272 6657 Fax: +1 714 821 4641 E-mail: help@computer.org Asia/Pacific Office Watanabe Building 1-4-2 Minami-Aoyama,Minato-ku, Tokyo107-0062, Japan Phone: +81 3 3408 3118 Fax: +81 3 3408 3553 E-mail: tokyo.ofc@computer.org
IEEE
OFFICERS
President: MICHAEL R. LIGHTNER President-Elect: LEAH H. JAMIESON Past President: W. CLEON ANDERSON Executive Director: JEFFRY W. RAYNES Secretary: J. ROBERTO DE MARCA Treasurer: JOSEPH LILLIE VP, Educational Activities: MOSHE KAM VP, Publication Services and Products: SAIFUR RAHMAN VP, Regional Activities: PEDRO RAY President, Standards Association: DONALD N. HEIRMAN VP, Technical Activities: CELIA DESMOND IEEE Division V Director: OSCAR N. GARCIA IEEE Division VIII Director: STEPHEN L. DIAMOND President, IEEE-USA: RALPH W. WYNDRUM, JR. rev. 2 Aug. 06
EXECUTIVE
STAFF
Executive Director: DAVID W. HENNAGE Assoc. Executive Director: ANNE MARIE KELLY Publisher: ANGELA BURGESS Associate Publisher: DICK PRICE Director, Administration: VIOLET S. DOAN Director, Business & Product Development: PETER TURNER Director, Finance and Accounting: JOHN MILLER
ITC Special Section
Improving Transition Delay Test Using a Hybrid Method

Nisar Ahmed and Mohammad Tehranipoor
University of Connecticut
Editor's note: Structured delay test using scan transition tests is becoming commonplace. But high coverage and compact tests can still be elusive in some situations. The authors propose a novel technique combining the cost-effectiveness of launch-from-capture test with the coverage/pattern volume advantages of launch-from-shift. Ken Butler, Texas Instruments
THIS TRANSITION-FAULT-TESTING TECHNIQUE
combines the launch-off-shift method and an enhanced launch-off-capture method for scan-based designs. The technique improves fault coverage and reduces pattern count and scan-enable design effort. It is practice oriented, suitable for low-cost testers, and implementable with commercial ATPG tools. Scan-based structural tests increasingly serve as a costeffective alternative to the at-speed functional-pattern approach to transition delay testing.1,2 Transition fault testing involves applying a pattern pair (V1, V2) to the circuit under test. V1 is the initialization pattern, and V2 is the launch pattern. V2 launches the desired signal transition (0 1 or 1 0) at the target node, and the response of the circuit under test is captured at functional speed (the rated clock period). The entire operation consists of three cycles:

initializationa scan-in operation applies V1; launcha transition is launched at the target gate terminal (V2 is applied); and capturethe transition is captured at an observable point.
Transition fault test patterns can be generated and applied in three ways: the launch-off-shift (LOS) or skewed-load method, the launch-off-capture (LOC) or broadside method, or the enhanced-scan method. In this article, we focus only on the rst two methods. In LOS,
the transition at a target gate output is launched in the last shift cycle during the shift operation. Figure 1a shows the waveforms during a LOS operations cycles. The launch cycle is part of the shift operation and is immediately followed by a fast capture pulse. The time period for the scan-enable signal (SEN) to make this 1 0 transition corresponds to the functional frequency. Hence, LOS requires that SEN be timing critical. In LOC, the transition is launched and captured through the functional pin (D) of any ip-op in the scan chain. Figure 1b shows the waveforms of the LOC method, which separates the launch cycle from the shift operation. Because launch pattern V2 depends on the functional response of initialization vector V1, the launch path is less controllable, so test coverage is low. LOC relaxes the at-speed constraint on SEN and adds dead cycles after the last shift to provide enough time for SEN to settle low. As device frequencies become higher, production test equipment capabilities limit the ability to test a device at speed. Rather than purchasing a more expensive tester, test engineers use one of several on-chip DFT alternatives, such as an on-chip clock generator for atspeed clock, pipeline SEN generation, or on-chip atspeed SEN generation3 for LOS transition fault testing. The LOS method is preferable to the LOC method in terms of ATPG complexity and pattern count. However, because of increasing design sizes, the SEN fan-out exceeds any other net in the design. LOS constrains SEN to be timing critical, requiring a design effort that makes it difficult to implement products in reasonable turnaround times. Thats the main reason for the widespread use of the LOC method, especially on very low-cost testers.2 In this article, we propose a hybrid technique that uses both LOS and LOC in scan-based designs, pro-
402
0740-7475/06/$20.00 2006 IEEE
viding higher fault coverage and lower pattern count with a small scan-enable design effort. (The Related work sidebar discusses other approaches to improving transition delay test quality.)
Initialization
Launch
Capture
Clock
Overview
Our proposed scan architecture controls a small subset of selected scan cells by the LOS method, and controls the remaining scan cells by the enhanced launch-off-capture, or ELOC, method (see the Related work sidebar). We use an efficient ATPG-based controllability-and-observability measurement approach to select the scan cells controlled by LOS or ELOC. The selection criteria improve fault coverage and reduce the overall pattern count. Because a few scan cells are LOS controlled, only a small subset of the scan chains SEN signals must be timing closed; this reduces the scanenable design effort. The method is robust and practice oriented, and it uses existing commercial ATPG tools.4 To control the scan chain operation mode (LOS or ELOC), two new cells called local scan-enable generators (LSEGs) generate on-chip SEN signals. The scanenable control information for the launch and capture cycles is embedded in the test data itself. The LSEGs can be inserted anywhere in the scan chain with negligible hardware area overhead. The proposed technique is suitable for low-cost testers because it doesnt require external at-speed SEN.
SEN
(a)
Scan-in pattern i Scan-out response i 1
Scan-in pattern i + 1 Scan-out response i
Initialization
Launch
Capture
Clock
SEN
(b)
Scan-in pattern i Scan-out response i 1
Scan-in pattern i + 1 Scan-out response i
Figure 1. Transition delay fault pattern generation methods: launch-off-shift (LOS) (a) and launch-off-capture (LOC) (b).
Motivation
ELOC improves the controllability of launching a transition through either the scan path or the functional path.5 However, it provides less observability than LOS does because a scan chain working in shift mode to launch a transition is not observable at the time of capture (SEN is held high during the launch and capture cycles). Therefore, ELOCs fault coverage is less than that of LOS but greater than that of LOC. Figure 2a (on p. 405) shows fault coverage analysis for the three transition fault methods. A common set of transition faults is detected by both LOS and LOC, and some faults in the LOC transition fault set are not detected by LOS, such as shift-dependency untestable faults.6,7 However, ELOC covers LOCs entire transition fault set and also detects some extra faults in the LOS-detected fault set. This is because LOC is a special case in which all local SEN signals are held at 0 during the launch and capture cycles. ELOC provides an intermediate fault coverage point between LOS and the conventional LOC method.5 To improve fault coverage and identify the union of
fault sets detected in both the LOS and ELOC modes, the scan cells must be controllable in both modes. Also, to reduce the design effort for at-speed, scan-enable signal (required for LOS), we must determine the minimum number of scan cells that require very high controllability and observability during pattern generation. We must control the resulting smaller subset of scan cells in LOS mode, and the remaining scan cells in ELOC mode. This reduces the design effort to timingclose the SEN signal at speed as required for LOS-controlled scan ip-ops. Figure 2b shows an example of a hybrid scan architecture with eight scan chains. The LOS-controlled scan ip-ops are stitched in separate scan chains. A fast SEN signal controls the first three scan chains containing LOS-controlled ip-ops, and a slow SEN signal controls the remaining scan chains in ELOC mode. Moreover, this architecture also requires conguring the LOS-controlled scan chains in functional mode because some faults are detected only by LOC and not by LOS.
Local SEN generation

The new method for testing transition faults provides more controllability in launching a transition but requires an independent SEN for each scan chain. We can use multiple scan-enable ports, but this increases
403
ITC Special Section
Related work
Wang, Liu, and Chakradhar propose a hybrid scan architecture that controls a small subset of selected scan cells by launch-off shift (LOS), and the rest by launch-off capture (LOC).1 The authors have designed a fast scanenable signal (SEN) generator that drives the LOS-controlled scan flip-flops. The selection criteria of the LOS-controlled scan flip-flops determine the methods effectiveness. In some cases, the number of patterns generated by the hybrid method exceeds the LOC pattern count. Moreover, the LOS-controlled flip-flops cannot be used in LOC mode. Figure A1 shows the SEN waveforms of this hybrid technique. In a new scan-based, at-speed test called enhanced launch-off-capture (ELOC), the ATPG tool deterministically targets the transition launch path either through a functional path or the scan path.2 The technique improves transition fault testing controllability and fault coverage, Initialization Launch Capture and it does not require SEN to change at speed. Figure CLK A2 shows SEN waveforms in the ELOC technique. The SEN1 LOC SEN signal of a subset of SEN2 scan chains stays at 1 (SEN1) LOS during the launch and capture cycles to launch the Scan-in pattern i Scan-in pattern i + 1 transition only. The second Scan-out response i 1 Scan-out response i (1) SEN signal (SEN2) controls the remaining scan chains to Initialization Launch Capture launch a transition through the functional path during the CLK launch cycle and capture the response during the capture SEN1 Shift-mode cycle. Figure A3 shows a cirSEN2 cuit with two scan chains, LOC chain 1 acting as a shift register, and chain 2 acting in Scan-in pattern i Scan-in pattern i + 1 functional mode. The conScan-out response i 1 Scan-out response i (2) ventional LOC method is a special condition of the Chain 2 ELOC method in which the SEN signals of all chains are Controlled 0 during the launch and capby SEN2 ture cycles. Two other proposed techCombo logic niques improve LOS fault Controlled coverage by reducing shift by SEN1 dependency.3,4 A technique by Li et al. reorders the scan flip-flops to minimize the number of undetectable Chain 1 (3) faults, and restricts the distance by which a scan flipFigure A. Previously proposed techniques: SEN waveforms in hybrid scan flop can be moved to create technique (1), SEN waveforms in enhanced LOC (ELOC) technique (2); ELOC the new scan chain order. controllabilitychain 1 used in shift mode, and chain 2 in functional mode (3).
404
Gupta et al. propose a technique that inserts dummy flipops and reorders scan ip-ops, considering wire length costs to improve path delay fault coverage. Wang and Chakradhar propose using a special ATPG to identify pairs of adjacent flip-flops and inserting test points (dummy gates or ip-ops) between them.5
Enhanced Launch-off-Capture Transition Fault Testing,
Proc. Intl Test Conf. (ITC 05), IEEE Press, 2005, pp. 246255. 3. W. Li et al., Distance Restricted Scan Chain Reordering to Enhance Delay Fault Coverage, Proc. 18th Intl Conf.
VLSI Design, IEEE Press, 2005, pp. 471-478.

4. P. Gupta et al., Layout-Aware Scan Chain Synthesis for Improved Path Delay Fault Coverage, Proc. Intl Conf.
References
1. S. Wang, X. Liu, and S.T. Chakradhar, Hybrid Delay Scan: A Low Hardware Overhead Scan-Based Delay Test Technique for High Fault Coverage and Compact Test Sets, Proc. Design, Automation and Test in Europe (DATE 03), IEEE Press, 2004, pp. 1296-1301. 2. N. Ahmed, M. Tehranipoor, and C.P. Ravikumar,
Computer-Aided Design (ICCAD 03), IEEE Press, 2003,

pp. 754-759. 5. S. Wang and S.T. Chakradhar, Scalable Scan-Path Test Point Insertion Technique to Enhance Delay Fault Coverage for Standard Scan Designs, Proc. Intl Test
Conf. (ITC 03), IEEE Press, 2003, pp. 574-583.
the number of pins. Two types of SEN signals must be generated on chip. The scan-enable control information for the scan ip-ops differs only during the patterns launch and capture cycles. Hence, we can use the low-speed SEN signal from the external tester for the scan shift operation and internally generate the scan-enable control information for only the launch and capture cycles.
AU LOS LOC
(a)
ELOC
SEN1
Fast SEN signal LOS-controlled
LSEG cells
Because our hybrid technique uses both LOS and enhanced LOC techniques, we must generate both fast and slow local SEN signals. We propose two LSEG cells to generate on-chip local SENs using a low-speed external SEN generated by a low-cost tester.
SEN2
Slow SEN signal ELOC-controlled
Slow scan-enable generator (SSEG). We designed an LSEG to control a scan Figure 2. Hybrid method analysis and architecture: Fault analysis of LOS, ip-ops transition launch path.5 In this LOC, and ELOC techniques (a), and hybrid scan architecture: with LOSarticle, we refer to this cell as the slow controlled scan chains using fast SEN signal and ELOC-controlled scan scan-enable generator (SSEG) because chains using slow SEN signal (b). the local SEN signal does not make an atspeed transition. Figure 3a shows the SSEG cell architecture. It consists of a single ip-op that scan-enable (LSEN) signal. Therefore, after going to conloads the control information required for the launch trol state Q at the end of the shift operation (that is, after and capture cycles. The input scan-enable (SENin) pin GSEN is deasserted), LSEN remains in this state as long connected to the external SEN signal from the tester is as GSEN asynchronously sets it to 1. The SSEG cell essencalled global scan-enable (GSEN). An additional out- tially holds the value 0 or 1 loaded at the end of the shift put scan-enable pin (SENout LSEN) represents the local operation (GSEN = 1) for the launch and capture cycles:
(b)
405
ITC Special Section
ture, which we call the fast scan-enable generator (FSEG). Table 2 shows 0 0 D Q Q Q D Q the FSEG cells operation 1 SD SD 1 FF FF modes. As in SSEG cell CLK CLK operation, GSEN = 1 repSENin (GSEN) SENin (GSEN) resents the patterns normal shift operation. When GSEN = 0 and Q = 1, LSEN SENout (LSEN) SENout (LSEN) = 1 and the scan ip-ops (b) (a) act in the shift-launch-capture mode to launch the transition from the scan Figure 3. LSEG cells: slow scan-enable generator (SSEG) cell (a) and fast scan-enable path and capture the generator (FSEG) cell (b). response at the next capture cycle (conventional LOS method). The LSEN from the FSEG cell makes a Table 1. SSEG operation, where GSEN is the global scan-enable signal, Q 1 0 at-speed transition at the launch cycle. The LSENis the ip-ops state, and LSEN is the local scan-enable signal. controlled scan ip-ops act in the conventional LOC GSEN Q LSEN Operation mode when GSEN = 0 and Q = 0 (functional-launchcapture mode). 1 X 1 Shift
0 0 1 0 11 00 Shift-launch (no capture) Functional launch and capture
LSEG cell operation

LSEG cells inserted in the scan chains pass control information as part of the test data. The scan-enable control information is part of each test pattern and is stored in the testers memory. Figure 4a shows the normal scan architecture with a single SEN signal from the external tester. The scan chain contains eight scan ip-ops, and the shifted test pattern is 10100110. Figure 4b shows the same circuit, which generates an LSEN signal from the test pattern data for the hybrid transition fault test method. The main objective is to deassert the external GSEN signal after the entire shift operation and then generate the LSEN signal from the test data during the launch and capture cycles. In this case, the shifted pattern is modied to [C]10100110, where C is the scan-enable control bit stored in the LSEG cell at the end of the scan operation. The GSEN signal asynchronously controls the shift operation. GSEN is deasserted after the nth shift (initialization) cycle, where n = 9; n is the length of the scan chain after insertion of the LSEG cell. After the GSEN signal is deasserted at the end of the shift operation, the scan-enable control during the launch and capture cycles is control bit C stored in the LSEG. At the end of the capture cycle, GSEN asynchronously sets the LSEN signal to 1 for scanning out the response. Figure 4c shows the process of test pattern application and the timing waveforms for the two LSEG cells, SSEG and FSEG.
Table 2. FSEG operation.
GSEN
1 0 0
Q
X 1 0
LSEN
1 10 00
Operation Shift Shift-launch capture Functional launch and capture
1 if GSEN = 1 LSEN = (GSEN + Q ) = Q if GSEN = 0 Table 1 shows the SSEG cells operation modes. GSEN = 1 represents the patterns normal shift operation. When GSEN = 0 and Q = 1, LSEN = 1 and the controlled scan flip-flops act in the shift mode to launch the transitions-only, shift-launch (no-capture) mode. Moreover, there is no capture, because the LSEN signal is 1 (LSEN = 1 1 at the launch edge). The other observable scan flip-flops perform the capture. The LSEN-controlled scan ip-ops act in the conventional LOC mode when GSEN = 0 and Q = 0 (functionallaunch-capture mode). Fast scan-enable generator (FSEG). Figure 3b shows our new local, at-speed, scan-enable generator architec-
406
10100110 Scan input (a)
Scan output
GSEN
[C]10100110 LSEG (FSEG or SSEG)
GSEN
(b)
LSEN
1 2 3 4 Shift operation 5 6 7 8 9 IC CLK LC CC
GSEN
C C C SSEG cell
LSEN = (GSEN + C)
(c) C FSEG cell
Figure 4. LSEG cell operation: Scan chain architecture (a), LSEN generation using LSEG (b), and
LSEN generation process and waveforms (c).
Flip-op selection: Measuring controllability and observability

In the LOS technique, the fault activation path (scan path), unlike the functional path used in the LOC method, is fully controllable from the scan chain input. Hence, in most cases, for the same detected fault, a LOS pattern requires fewer care bits than a LOC pattern. The controllability measure of a scan flip-flop is the percentage of patterns in the entire pattern set (P) for which a care bit is required in the scan flip-flop to enable activation or propagation of a fault effect. Figure 5 shows a scan flip-flop with an input (observability) and output (controllability) logic cone. A large output logic cone implies that the scan ip-op will control a greater number of faults; that is, a care bit will be required in their activation or propagation. Similarly, the input logic cone determines a scan ip-ops observability. We define this observability as the percentage of patterns in the entire pattern set (P) for which a valid care bit is observed in the scan ip-op. In a transition fault test pattern pair (V1, V2), initialization pattern V1 is essentially an IDDQ pattern to initialSeptemberOctober 2006
Observability logic cone
Controllability logic cone
D
Scan input
Scan flip-flop Figure 5. Scan ip-op controllability-andobservability measure.
ize the target gate to a known state. In the next time frame, pattern V2 is a stuck-at-fault test pattern to activate and propagate the required transition at the target node to an observable point. Therefore, to nd the controllability-observability measure of a scan ip-op, we use an ATPG tool to generate stuck-at patterns and force it to fill in dont-care (X) values for scan flip-flops that dont affect any faults activation or propagation. The
407
ITC Special Section
FSEG 1 2
Test architecture
The LSEG-based solution presented here provides a method of generating LOS-controlled internal LSEN signals from pattern data, and GSEN signals from the tester. The SSEG overhead of generating the LSEN signal is LSENi i the additional LSEG (SSEG or FSEG) cell in the scan chain. An LSEG cells area ELOC-controlled overhead is a few extra gates, which is negligible in modern designs. We assume n that the area overhead of the buffer tree required to drive all the LOS-controlled scan flip-flops through the LSEG cells is GSEN equal to the overhead of applying an atspeed GSEN signal from external ATE. Figure 6. Hybrid scan test architecture: FSEG cells driving LOS-controlled Figure 6 shows a multiple-scan-chain scan chains, and SSEG cells driving ELOC-controlled scan chains. architecture with n scan chains. The LOScontrolled scan ip-ops determined by the controllability-observability measurement are stitched Table 3. Case study design characteristics. in separate scan chains. Each scan chain i, where 1 i n, Characteristics No. consists of an LSEG (FSEG or SSEG) cell that generates signal LSENi for the respective scan chain. The GSEN sigClock domains 6 nal connects only to the SENin port of the LSEG cells. Scan chains 16
Scan ip-ops Nonscan ip-ops Transition delay faults 10,477 13 320,884
Study description
In this case study, we experimented with a subchip of an industrial-strength design with the characteristics listed in Table 3. One LSEG cell is inserted per scan chain. The test strategy was to get the highest possible transition fault test coverage. When generating test patterns for transition faults, we targeted only faults in the same clock domain. During pattern generation, only one clock is active during the launch and capture cycles. Hence, only faults in that particular clock domain are tested. All primary inputs remain unchanged, and all primary outputs are unobservable during test-pattern generation for transition faults. This is because the very low cost testers are not fast enough to provide PI values and strobe POs at speed.
ith scan ip-ops controllability is Ci = pc/P , where pc is the number of patterns with a care bit in the scan ipop during scan-in, and P is the total number of stuckat patterns. Similarly, observability is Oi = po/P, where po is the number of patterns with a care bit in the scan ipop during scan-out. We then use each scan ip-ops measured controllability and observability factors to determine cost function CFi = CiOi. The scan ip-ops are arranged in decreasing order of cost function, and a subset with very high cost functions is selected as LOS-controlled ipops. The ATPG-based controllability-observability measurement technique overcomes the limitation of the Scoap-based method8 used by Wang, Liu, and Chakradhar,6 which makes it possible to select a scan ipop that has high 0(1) controllability but is not controlled to 0(1) during pattern generation by the ATPG tool.
DFT insertion
We measure a scan ip-ops cost function (controllability observability) using the ATPG-based technique explained earlier. Figure 7 shows the cost function of each scan ip-op in our design. Approximately only 20% to 30% of the ip-ops require very high controllability and observability. Hence, SEN need not be at speed for all scan ip-ops. We arrange the scan ip-ops in decreasing order of cost function, and we use this order during scan insertion.
Case study
The following case study illustrates DFT insertion and ATPG ow of our hybrid scan transition fault-testing technique. It includes an analysis of extra detected faults.
408
In the new order of scan chains, the few 1.0 initial chains consist of very high controlCost function lability-observability ip-ops, and we select them for LOS according to their aver0.8 age cost function. We measure a scan chains average cost function as CFi /N, where CFi = Ci Oi is the cost function of 0.6 the ith scan ip-op in the chain, and N is the number of ip-ops in the scan chain. Figure 8 shows each chains average cost 0.4 function for normal scan insertion and after scan insertion based on controllability and 0.2 observability. As Figure 8b shows, after this scan insertion, the average cost function of the rst ve scan chains is very high (due 0 to scan ip-ops with very high cost func0 2,000 4,000 6,000 8,000 10,000 tions) and very low for the rest of the No. of cells chains. Therefore, we can design the rst ve chains SEN signal to be at speed (con- Figure 7. Cost functions of scan ip-ops in case study design. trolled by the FSEG cell), and the rest of the scan chains can use a slow-speed SEN (controlled by the SSEG cell). the basic-scan and sequential engines. The tool underWe used the Synopsys DFT Compiler for scan chain stands the LSEG technique and can choose the launch insertion.4 To insert the LSEG cells, the synthesis tool must path for the target transition fault deterministically. recognize the LSEG cell as a scan cell and stitch it into the Hence, there is no fundamental difference in ATPG chain. This means that the LSEG cell must be dened as methodology when we use the LSEG-based solution. The SEN signal for the ip-ops in the launch and capa new library cell with scan cell attributes. A workaround is to design the LSEG cell as a module, instantiate it, and ture cycles comes from an internally generated signal. declare it as a scan segment of length 1. The GSEN signal The OR gate in the LSEG cell generates the LSEN signal is connected to all LSEG cell SENin pins. During scan inser- through a logical OR of the ip-ops GSEN and Q output tion, we specify only the LSEG cell in the scan path (see Figure 3). The GSEN signal is high during scan shift because the tool will stitch the rest of the cells, including operation. The tool determines each chains LSEN and the LSEG cell, and balance the scan chain, depending on shifts the control value into the LSEG cell during pattern the longest scan chain length parameter. Additionally, the shift for launch and capture. It also deterministically tool provides the exibility to hook up each LSEG cells decides the combination of scan chains to work in shift SENout port in a particular chain to all the SENin ports of the or functional launch mode, to activate a transition fault. Table 4 shows results for conventional LOS and LOC scan ip-ops in the respective chain. (normal scan insertion), ELOC, and hybrid transition delay ATPG on the case study design. We see that LOS ATPG The ATPG tool must understand the LSEG methodol- gave approximately 3% higher fault coverage than LOC. ogy and deterministically choose the transition fault acti- ELOC gave approximately 1.9% higher fault coverage than vation path. We used a commercial ATPG tool, Synopsys the LOC method. The hybrid technique gave better fault TetraMax,4 which supports two ATPG modes: basic scan coverage than the other methods and provided a better and sequential. Basic-scan ATPG is a combinational-only pattern count than the LOC and ELOC methods. The patmode with only one capture clock between pattern tern count was greater than that of LOS but at the advanscan-in and response scan-out; the sequential mode uses tage of less scan-enable design effortonly ve scan a sequential time-frame ATPG algorithm. By default, chains being timing closed for at-speed SEN. (The hybrid when generating test patterns for the transition fault scan technique proposed by Wang, Liu, and Chakradhar6 model in functional launch mode, the ATPG tool uses a sometimes gives a greater pattern count than the LOC two-clock ATPG algorithm that has some features of both technique.) Our hybrid method used more CPU time than
Cost function = controllability observability
12,000
409
ITC Special Section
0.07
Analysis of extra detected faults
As Rearick discusses, the detection of functionally untestable faults poses a potential yield loss prob0.05 lem.9 We analyzed the additional faults detected 0.04 by the hybrid scan architecture over the conven0.03 tional LOC technique. To determine the nature of these extra faults, we per0.02 formed conventional LOC ATPG on them. For exam0.01 ple, for ITC99 benchmark circuit b17, the hybrid scan 0 method detected 17,926 c0 c1 c2 c3 c4 c5 c6 c7 c8 c9 c10 c11 c12 c13 c14 c15 extra faults. LOC ATPG on (a) Scan chains these faults showed all of them as nonobservable 0.25 faultsfaults that can be controlled but cannot be propagated to an observable point. 0.02 It can be argued that some of these nonobservable detected faults can 0.15 result in yield loss because some of them might be functionally untestable. 0.10 However, some of these faults are actually functionally testable but 0.05 become nonobservable because of low-cost tester ATPG constraints such as no primary input changes 0 c0 c1 c2 c3 c4 c5 c6 c7 c8 c9 c10 c11 c12 c13 c14 c15 or no primary output measures. For example, of the Scan chains (b) LOS 17,926 extra faults detected by hybrid scan in the nonobservable class, 1,155 Figure 8. Average cost function before (a) and after (b) scan insertion based on were detectable without controllability and observability. the low-cost tester constraints. Also, Lai, Krstic, the other techniques because for hard-to-detect faults, the and Cheng show that functionally untestable nonobATPG tool must do more processing to determine the pos- servable faults might not need testing if the defect doessible combinations of the SSEG-controlled scan chains in nt cause a delay exceeding twice the clock period.10 shift register mode or functional mode. With technology scaling and increasing operating fre0.06 Average chain cost function Average chain cost function
410
quencies, detecting multicycle delay faults might become important, and more than two vectors are required to detect such faults.10 The hybrid scan technique can be advantageous because it eases ATPG and detects multicycle faults using a two-vector pair.
Table 4. Case study ATPG results. Parameter Detected faults Test coverage (%) Fault coverage (%) Pattern count CPU time (s) LOS 292342 91.30 91.11 1,112 329.30 LOC 282658 88.27 88.09 2,145 896.96 ELOC 288681 90.15 89.96 2,014 924.74 Hybrid 295288 91.92 91.74 1,799 1,014.60
Experimental results
We experimented with our hybrid scan technique on the three largest 1999 International Test Conference (ITC) benchmark circuits and on four more industrial designs ranging in size from 10,000 to 100,000 ip-ops. We inserted 16 scan chains in each design. For the LOS and LOC techniques, we used the Synopsys DFT Compiler to perform normal scan insertion. For the ELOC and hybrid techniques, we performed scan insertion based on controllability and observability, and we inserted one LSEG cell in each scan chain. In the case of ELOC, we inserted only SSEG cells in each scan chain. In the hybrid technique, we selected only the rst four scan chains to be LOS controlled (FSEG) after controllability-observability measurement; the remaining scan chains were ELOC controlled (SSEG). This reduced the at-speed scan-enable design effort signicantly because the SEN signal to only one fourth of the scan ip-ops needed to be timing closed. During ATPG, the faults related to clocks, scan-enable, and set or reset pins, referred to as untestable faults, are not added to the fault list. Table 5 shows the ATPG results, comparing the LOS, LOC, ELOC, and hybrid methods. The ELOC method provides higher fault coverage than the LOC method (up to 15.6% for design b19), and in most cases an intermediate fault coverage and pattern count between LOS and LOC. The hybrid method provides better coverage than all other methods because it has the
exibility to use combinations of functional and scan paths for launching a transition. This method provides higher fault coverage, by up to 2.68% (design D) and 19.12% (design b19) than LOS and LOC, respectively. In a worst-case analysis, the lower bound for ELOC is LOC with no extra faults detected over LOC, and the upper bound is LOS. Similarly, for the hybrid technique, the lower bound is ELOC, and the upper bound can be greater than or equal to LOS. However, in the worst-case scenario, for a given fault coverage, the hybrid method will still benefit in test-pattern count reduction compared to LOC, thereby reducing test time, with minimum scan-enable design effort. In some cases, the CPU time for the hybrid or ELOC method is greater than that of the LOC method because the ATPG tool needs a larger search space to nd the transition launch activation path for hard-to-detect faults. Typically, in an ASIC design ow, scan insertion takes place in a bottom-up manner, independent of a physical synthesis step. The DFT insertion tool stitches the scan chains based on the alphanumeric order of scan ip-op names in each module. The resulting scan chains are then reordered during physical synthesis to reduce the scan chain routing area. At the top level, the module-level
Table 5. ATPG results for 1999 International Test Conference (ITC) benchmark circuits and industrial designs. LOS No. of FFs Design b17 b18 b19 A B C D (1,000s) 1.4 3.3 6.6 10 30 50 104 Fault coverage (%) 95.09 92.67 85.98 91.11 87.94 81.10 92.15 No. of 1,088 1,451 2,280 1,112 4,305 6,869 5,933 CPU 95.4 279.7 645.3 329 3,569 8,415 6,559 patterns time (s) Fault coverage (%) 81.02 77.50 69.21 88.09 85.14 79.42 91.56 No. of patterns 1,190 1,309 1,153 2,145 8,664 12,073 10,219 CPU time (s) 1,000.8 1,020.9 1,050.4 896 7,800 22,930 12,088 LOC Fault coverage (%) 94.29 93.01 84.81 89.96 86.57 80.48 92.28 No. of patterns 1,328 1,876 1,422 2,014 8,539 11,583 12,505 CPU time (s) 325 726 1,000 924 8,702 25,642 47,788 ELOC Fault coverage (%) 96.50 95.18 88.33 91.74 88.03 83.29 94.83 No. of 1,179 1,334 1,590 1,799 8,062 CPU 187.9 336.6 1,000.9 1,014 6,611 patterns time (s) Hybrid
8,134 14,451 9,674 18,410
411
ITC Special Section
scan chains are stitched together. Similarly, in our bottom-up scan insertion ow, the scan chains in each module are stitched based on the decreasing order of scan ip-ops cost functions, and the resulting scan chains are reordered during physical synthesis to reduce the scan chain routing area. Therefore, the new scan insertion method will not be affected signicantly because scan insertion and physical synthesis are not performed for the entire chip. Although, it can be argued that our scan chain stitch for controllability and observability might slightly increase the scan chain routing area in some cases, the decreases in scan-enable design effort and area overhead compared with LOS are signicant. Moreover, the technique has the exibility to shufe and reorder the different groups of scan chains (LOS controlled and ELOC controlled) if any scan-chain-routing problem arises.
Proc. Intl Test Conf. (ITC 05), IEEE Press, 2005, pp.
246-255. 6. S. Wang, X. Liu, and S.T. Chakradhar, Hybrid Delay Scan: A Low Hardware Overhead Scan-Based Delay Test Technique for High Fault Coverage and Compact Test Sets, Proc. Design, Automation and Test in Europe (DATE 03), IEEE Press, 2004, pp. 1296-1301. 7. S. Wang and S.T. Chakradhar, Scalable Scan-Path Test Point Insertion Technique to Enhance Delay Fault Coverage for Standard Scan Designs, Proc. Intl Test
Conf. (ITC 03), IEEE Press, 2003, pp. 574-583.

8. L.H. Goldstein and E.L. Thigpen, SCOAP: Sandia Controllability/Observability Analysis Program, Proc. 17th
Design Automation Conf. (DAC 80), IEEE Press, 1980,

pp. 190-196. 9. K.J. Rearick, Too Much Delay Fault Coverage Is a Bad Thing, Proc. Intl Test Conf. (ITC 01), IEEE Press, 2001, pp. 624-633. 10. W.C. Lai, A. Krstic, and K.T. Cheng, On Testing the
THE PROPOSED HYBRID TECHNIQUE significantly
Path Delay Faults of a Microprocessor Using Its Instruction Set, Proc. 19th VLSI Test Symp. (VTS 00), IEEE Press, 2000, pp. 15-20.
reduces the design effort and eases the timing closure by selecting a small subset of scan chains to be controlled using LOS. The experimental results also show that the pattern count is reduced and fault coverage is considerably increased. A statistical analysis is required to find the optimum number of LOS-controlled scan chains. Minimizing the number of LOS-controlled scan chains will even further reduce the design effort, and future work must focus on this issue.
Acknowledgments
Mohammad Tehranipoors work was supported in part by SRC grant no. 2005-TJ-1322. Nisar Ahmed performed the implementation work at Texas Instruments, India.
Nisar Ahmed is a PhD student in the Electrical and Computer Engineering Department of the University of Connecticut. His research interests include design for testability, at-speed testing, and CAD. Ahmed has an MS in electrical engineering from the University of Texas at Dallas. He is a member of the IEEE. Mohammad Tehranipoor is an assistant professor in the Electrical and Computer Engineering Department at the University of Connecticut. He has a PhD in electrical engineering from the University of Texas at Dallas. His research interests include computer-aided design and test, DFT, delay fault testing, test resource partitioning, and test and defect tolerance for nanoscale devices. He is a member of the IEEE, the ACM, and ACM SIGDA. Direct questions and comments about this article to Mohammad Tehranipoor, ECE Dept. of Univ. of Connecticut, Storrs, CT 06268; tehrani@engr.uconn.edu.
References
1. X. Lin et al., High-Frequency, At-Speed Scan Testing,
IEEE Design & Test, vol. 20, no. 5, Sept.-Oct. 2003, pp.
17-25. 2. J. Saxena et al., Scan-Based Transition Fault Testing Implementation and Low Cost Test Challenges, Proc. Intl
Test Conf. (ITC 02), IEEE Press, 2002, pp. 1120-1129.

3. N. Ahmed et al., At-Speed Transition Fault Testing with Low Speed Scan Enable, Proc. 24th VLSI Test Symp. (VTS 05), IEEE Press, 2005, pp. 42-47. 4. User Manual for Synopsys Toolset Version 2005.09, Synopsys, 2005. 5. N. Ahmed, M. Tehranipoor, and C.P. Ravikumar, Enhanced Launch-off-Capture Transition Fault Testing,
412
Special ITC Section
Impact of Thermal Gradients on Clock Skew and Testing

Sebasti A. Bota, Josep L. Rossell, and Carol de Benito
University of the Balearic Islands
Ali Keshavarzi
Intel
Jaume Segura
University of the Balearic Islands
Editor's note: It is a well-known phenomenon that test-mode switching activity and power consumption can exceed that of mission mode. Thus, testing can induce localized heating and temperature gradients with deleterious results. The authors quantify this problem and propose a novel design scheme to circumvent it. Ken Butler, Texas Instruments
CMOS TECHNOLOGY SCALING has brought circuit
applications using hundreds of millions of transistors with dimensions below 65 nm and operating frequencies beyond 4 GHz. Among the many challenges imposed by this scaling race during the past decade, increasing power consumption from generation to generation is a major concern. Two factors have caused most of the increase in total circuit power consumption: a scaling model based on supply voltage reduction, forcing the same trend on transistor threshold voltage, and an increase in operating frequency. The rst factor contributes to static or leakage power increase because of the exponential dependence of the transistors off-state current on the threshold voltage. The second factor determines active power because of short-circuit and capacitor charging/discharging current components. Researchers have pursued the development of advanced techniques to control IC total power consumption; these techniques span many domains, including manufacturing technology, device design, circuit design, and architecture. In addition to increasing overall power, a related effect drawing signicant attention is increasing power density. This increase is due to circuit critical-dimension reduction, which packs more active devices per unit area and therefore increases both static and dynamic power density components. This trend has a direct impact on circuit junction temperature, with a resulting increase of overall average operating temperature. Power density
relates closely to circuit activity and generally is not uniformly distributed within the circuit. As a result, thermal gradients between circuit regions can be as high as 40C to 50C in high-performance designs, creating nonuniform thermal maps.1 This phenomenon can lead to hot spots in localized IC regions. The main challenges to the accurate prediction of power density distribution and control stem from a lack of tools capable of handling the various mechanisms that determine hot-spot appearance. Such capabilities would include accurate layout-based determination of induced activity and resulting power distribution, circuit thermal-impedance computation, and heat ux distribution determination. Power containment tools and methods have traditionally targeted overall mean power or peak power estimation and reduction and in general are not valid to predict hot spots. Predetermination of circuit hot spots is important not only for reliability (for example, an increase in wire temperature accelerates interconnect electromigration), but also because of the circuits delay dependency on temperature. Hot spots can slow specic circuit regions with respect to other blocks or the clock line and can cause circuit failure because of timing-rule violation. Circuit hot spots can also directly affect the clock line at a given point, causing timing violations. These problems pose two concerns for circuit testing:
normal circuit operation can induce a given thermal map that is not reproduced during circuit testing, and activity induced during circuit testing can lead to modified thermal maps that can cause a circuit to erroneously pass or fail the test.
414
0740-7475/06/$20.00 2006 IEEE
Differences in thermal-map distribution between normal and test mode operations lead to a nonuniform effect on relative path delay within logic blocks. Test-induced hot spots can articially slow noncritical paths or accelerate critical ones with respect to the clock, causing the entire die to fail (pass) delay testing for a good (bad) part. Therefore, if designers dont properly consider higher activity during test mode and its effect on the clock network, a given percentage of dies can fail during test due to test-induced thermal-map modication. This would cause increased yield loss because the thermal maps impact on path delay during normal operation is different from that induced during test. This article shows that clock circuit distribution plays an important role in determining the effect of these mechanisms on circuit behavior. The evolution of VLSI chips toward larger die sizes and faster clock speeds makes clock design an increasingly important issue. In a synchronous digital IC, the clock network signicantly inuences circuit speed, area, and power dissipation. Because the clock function is vital to a synchronous systems operation, clock signal characteristics and distribution networks have drawn much attention. Any uncertainty in clock arrival times between two points, especially if these points are near each other, can limit overall circuit performance or even cause functional errors. Clock signals typically carry the largest fanouts, travel over the longest distances, and operate at the highest speeds of any signal, either control or data, in the entire chip. Furthermore, technology scaling particularly affects clock signals because long global interconnect lines become more resistive. In addition, as technology feature size shrinks, global metal layers that carry the clock signal are closer to the substrate while the use of low-k dielectrics for intralevel gap lling can signicantly increase thermal effects because these dielectrics have lower thermal conductivity than SiO2. Both effects contribute to a higher impact of substrate temperature nonuniformities on the clock line thermal distribution. Therefore, designers must investigate the possibility that the nonuniform substrate temperatures effect on clock skew is a new delay fault mechanism, even with exact zero-skew clock-routing algorithms. In this article, we analyze the impact of within-die thermal gradients on clock skew, considering temperatures effect on active devices and the interconnect system. This effect, along with the fact that the test-induced thermal map can differ from the normal-mode thermal map, motivates the need for a careful consideration of the impact of temperature gradients on delay during test. After our analysis, we propose a dual-VDD clocking
D/2 Figure 1. Symmetric three-level H-tree layout for clock distribution. D is the length of the H tree.
strategy that reduces temperature-related clock skew effects during test.
Clock networks and clock skew

Clock network design is a critical task in developing high-performance circuits because circuit performance and functionality depend directly on this subsystems performance. When distributing the clock signal over the chip, clock edges might reach various circuit registers at different times. The difference in clock arrival time between the rst and last registers receiving the signal is called clock skew. With tens of millions of transistors integrated on the chip, distributing the clock signal with near-zero skew introduces important constraints in the clock distribution networks physical implementation and affects overall circuit power and area. Researchers have done extensive work on automatic clock network design to minimize the effect of unbalanced clock path delays resulting from routing or differences in capacitive loading at the clock sinks.2 Most clock distribution schemes exploit the irrelevance of the absolute delay from a central clock source to clocking elementsonly the relative phase between two clocking points is important. Early methods used symmetric structures such as H trees or balanced trees. Figure 1 shows the H-tree clock topology, which consists of trunks (vertical lines) and branches (horizontal lines). In nonbuffered trees, top-level interconnect segments are wider than lower-level segments. Furthermore, top-
415
Special ITC Section
level global interconnect segments are routed through upper metal layers, whereas low-level local segments are routed through lower metal layers. In addition to zero skew, a second important requirement for a clock network is obtaining a high slew rate to get sharp clock edges. Designers achieve this by inserting buffers and repeaters in the clock network, creating a multistage clock tree, to isolate downstream capacitance and reduce transition times. Clock networks with several buffer stages are common in highperformance digital designs. Researchers have also proposed approaches that incorporate uneven loading and buffering effects resulting in non-H-tree topologies.3 Current designs incorporate clock distribution networks consisting of two parts: a global clock network and a local network. The global clock network distributes the clock signal from the clock source to local regions and usually has a symmetric structure. The local distribution network delivers clock signals to registers in a local area using a nonsymmetric structure because register location in the circuit is typically not regular. Any phenomenon that affects a nets delay can contribute to skew, so we can no longer ignore the portion of clock skew caused by process variations in nanometer technologies. Process variationssuch as effective gate length, doping concentrations, oxide thickness, and interlayer dielectric thicknesscause uncertain device and interconnect characteristics and can be a source of signicant clock skew. Dynamic variations such as power supply variations, coupling noise, and junction temperaturecan contribute to additional skew during circuit operation.4 Temperature is difcult to model and predict because of the switching activities of the various blocks composing the circuit and their variation over time. Thus temperature is an important source of skew. A nonuniform temperature gradient created by a hot spot can signicantly impact clock tree performance and worsen worst-case clock skew. Algorithms used to design zero-skew clock tree networks usually dont consider process variations or nonuniform thermal distributions as possible sources for clock skew. Researchers have proposed grid-based clock networks driven by one or more lines of buffers as an alternative to tree topologies. This method has proved highly effective in reducing sensitivity to process variations and environmental effects, typically at the cost of consuming more wire resources and power. A recent trend is to use hybrid structures formed by a symmetric tree and a mesh for the global clock network.5 Mori et al. demonstrated that adding a mesh to bottom-
level leaves of an H tree helps signicantly reduce clock skew caused by process variations.6 We focus on the relative impact of temperature and nonuniform thermal maps on hybrid clock networks, as they are widely used to achieve low clock skew and power consumption.
Temperature effects on delay

The impact of environmental variations on skew is difficult to analyze given its dependence on circuit activity that changes over time. The two major sources of environmental variations are power supply variations and temperature. Power supply variations are the main source of jitter, whereas temperature is a source of skew (typical time constants for temperature changes are on the order of milliseconds). Temperature affects the delay of both interconnect lines and clock buffers. The main sources of temperature generation in the chip are switching activities of the cells over the substrate and joule heating of the interconnects when current passes through them. In a highperformance design, junction temperature can vary more than 50C and reach an absolute temperature of 120C in some circuit regions. To explain these mechanisms, we introduce the temperature dependence of interconnect and buffer parameters.
Interconnect temperature dependence

Interconnect delay relates to metal resistance and the parasitic capacitance of wires that connect gates. An interconnects resistance has a polynomial relationship to its temperature. Assuming a rst-order approximation, this dependence is
R(T) = r0(1 + (T T0))
(1)
where r0 is the unit length resistance at reference temperature T0, and is the temperature coefcient of resistance (C1). The dependence of capacitance on temperature is usually small and is not comparable to resistance variations. Deutch et al. reported that temperature variation has a marked impact on wire delay for long interconnects that are basically resistance limited in terms of delay (as compared with capacitive and inductive components).7 Interconnect line resistance changes are about 20%, for a variation of 75C from ambient temperature.
Buffer temperature dependence

Buffer delay also changes with temperature through transistor parameters dependence on junction tem-
416
perature. These parameters include threshold voltage (VT), mobility (), and silicon energy band gap (Eg). Energy band gap thermal variations are usually small and not comparable to VT and variations. The expressions for the relationships of these last two components with temperature are
3.2E-11 2.8E-11 Time (s) 2.4E-11 2.0E-11 1.6E-11 1.2E-11 50 30 10 10 30 50 70 Temperature (C) High-to-low Low-to-high
VT(T) = VT(T0) (T T0)

and
90 110 130
(T) = (T0)(T/T0)M
Figure 2. Delay versus temperature in a 70-nm
where T0 is room temperature (T0 = 300 K); is the threshold voltage temperature coefcient, whose typical value is 2.5 mV/K, and M is the temperature exponent, whose typical value is 1.5. Junction temperature variation is an important source of driver resistance variation and can have a significant impact on buffer propagation delay. Figure 2 shows the variation of high-to-low and low-to-high propagation time for a 70-nm inverter, obtained from electrical simulations using Berkeley Predictive Technology models (http://www.eas.asu.edu/~ptm). The switching speed of CMOS inverters used as buffers is basically a function of resistance-capacitance (RC) time constants. To determine switching speed in Figure 2, we measured the 50% transition delay of an inverter loaded with another inverter stage and ideal wires. We assume that capacitance is temperature independent. Figure 2 shows that a model similar to the one in Equation 1 can approximate driver resistance variation with temperature. Our analysis of interconnect and buffer delay variation with temperature makes clear that a uniform increase of IC junction temperature results in a net increase in absolute delay through the clock distribution path (clock latency). In balanced trees, this effect is irrelevant because the main parameter for setting the system clock period is the worst-case delay of logic blocks between two consecutive register stages. The key parameter affecting skew is the relative arrival of the clock edge at registers at the end of each clock path.
low-leakage inverter gate.
R3 R2 R4 C2 R1 n0 C1 R6 R5 R7 C5
(a)
n1
C3
n2
C4
n3
C6
n4
C7
n1
n2
n0
n3
(b )
n4
Figure 3. RC tree used to compute Elmore delay (a) and equivalent one-level H tree (b).
Nonuniform thermal map effects

As mentioned earlier, an ICs power dissipation distribution is not uniform and depends on device and interconnect electrical characteristics, layout circuit placement, and the relative switching activity of different chip blocks. In this sense, dynamic thermal gradients are inevitable during normal circuit operation. Here, we compare temperature effects on nonbuffered and buffered clock tree networks.
Nonbuffered trees
We model nonbuffered trees using a lumped-RC tree. Figure 3a shows an example RC tree. We assume that the
417
Special ITC Section
Table 1. Comparison of unbuffered clock tree designs. Design style A B C Mean delay (ps) 356.11 68.95 107.0 Mean skew (ps) 35.84 4.18 1.06 Sigma skew (ps) 15.09 0.88 0.33 Maximum skew (ps) 110.1 7.45 2.60
= (3 103)C1, rsh = 0.077 /sq at T0, and csh = 7.681018 F/m2 as unit sheet resistance and unit area capacitance, respectively. We analyzed clock tree structures with three different designs:

tree has been designed such that the only sources of skew are process variations and environmental conditions. Using an Elmore delay metric, the delay from root node n0 to sink node ni in the RC tree is Di =
R C
j k j k
Design A is a clock tree using minimum-width interconnects. Design B has interconnect widths computed with Chen and Wongs algorithm,8 which optimizes for both clock delay and minimum skew. Design C is the same as Design B except that it has a grid shorting the H trees sink nodes. This modication has moderate impact on mean delay but provides signicant skew reduction.
(2)
where Rj is the set of resistances in the path between the source (root) and node ni, and Ck is the downstream capacitance at j, defined as the sum of all capacitances at any node k such that the unique path in the tree from k to the root must pass through j. As an example, we can compute the delay from root node n0 to node n3 in the H tree of Figure 3b as follows: D1 = R1 D3 = R1
C + R C + R C
k 2 k 3 k =1 k=2 k =7
k =7
k= 4
C + R C + R C
k 5 k 6 k =1 k=5
k =7
Tree symmetry leads us to assume that at the reference temperature, R2(T0) = R5(T0) = RL1,0, C2 = C5 = CL1, R3(T0) = R4(T0) = R6(T0) = R7(T0) = RL2,0, and C3 = C4 = C6 = C7 = CL2, therefore there is no skew between nodes n1, n2, n3, and n4. Given that resistances are temperature dependent and parameter is positive, performance degrades with increasing temperature (worsening the effective signal delay). In addition, because a nonuniform thermal profile doesnt impact all regions of the clock network distribution but slows only a restricted area, it has a major effect on skew. Therefore, as a result of temperature nonuniformities, the H trees symmetry cannot guarantee zero skew. For simplicity and without loss of generality, we considered a symmetric three-level H-tree clock structure to evaluate and compare the effects of variability and temperature gradients in nonbuffered structures. The area covered by the tree is 5 mm 5 mm. We considered circuit parameters for AlCu interconnects with
We investigated the impacts of parameter variation and temperature gradients on skew for each structure. Table 1 shows mean delay, mean skew, sigma skew, and maximum skew obtained from Monte Carlo simulations for 1,000 samples at a uniform room temperature. Both mean delay and skew from design A (wmean = 0.45 m, 3 = 20%) are much higher than those obtained from design B, which used the optimization algorithm. Design C provides the best values for sigma and maximum skew distributions, while providing about one third of additional overall delay with respect to design B. Redundancy created by mesh loops smoothes out undesirable variations between signal nodes spatially distributed over the chip.9 Figure 4 shows skew induced by a local hot spot of radius D/8 (D is the length of the H tree shown in Figure 1) when located at different positions of the H tree obtained for designs A and B. Our most significant observations from these experiments are as follows:

Total skew depends on hot-spot position. In nonoptimized trees, worst-case skew occurs when the hot spot appears near the clock driver. A design algorithm to optimize clock tree skew also optimizes the impact of thermal-induced hot spots. In optimized clock trees, depending on the hot spots magnitude and size, its impact can be about 20% of the skew from parameter variations.
Figure 5 compares worst-case clock skew caused by hot spots affecting one whole quadrant for designs B (no grid) and C. For C, we considered an ideal grid (no parasitic capacitance) and a realistic grid (with parasitic capacitance). The amount of worst-case skew caused by a 10C difference is of the same order of magnitude as the
418
1.0E-11 8.0E-12
1.4E-12 1.2E-12 1.0E-12 Skew (s) Skew (s) 50
6.0E-12 4.0E-12 2.0E-12 0.0E+00 Hot spot (a)
8.0E-13 6.0E-13 4.0E-13 2.0E-13 0.0E+00 Hot spot (b)
Figure 4. Hot-spot-induced skew at different clock tree positions for design A (a) and design B (b). The skew is computed across the whole clock tree; only the quadrant where the hot spot is placed is shown for simplicity.
Skew (s)
delay of one clock buffer, while the skew caused by a 50C difference is of the same order of magnitude as the skew from process parameter variations. Figure 5 also shows that inserting a grid reduces skew resulting from nonuniform thermal maps.
7E-12 6E-12 5E-12 4E-12 3E-12 2E-12 1E-12 0 0 10 20 30 40 Temperature increase (C)
Without grid Nonideal grid Ideal grid (C = 0)
Buffered trees
Buffers isolate downstream capacitance in the clock network (see Equation 2), thus reducing latency and transition times. In these networks, buffers are a primary source of total clock skew for two reasons. First, device parameter variation with temperature is much larger than interconnect variation. Delay degradation caused by temperature effects on the drivers on-resistance are far more severe than delay caused by interconnect resistance thermal dependency. Second, delay related to wiring length between two consecutive buffer stages is independent of the RC parameters of previous and subsequent wiring stages. We designed a buffered H-tree clock network (design A) and a clock network with a grid shorting the buffered H trees sink nodes (design B) in a 1-V nominal supply voltage, 70-nm technology (http://www.eas.asu.edu/~ptm). For design B we considered an ideal grid B1 (no parasitic capacitance) and a realistic grid B2 (with parasitic capacitance). We considered a 2-mm 2-mm chip and synthesized a three-level symmetric H tree using the method described by Cheng et al.,10 obtaining ve buffer stages between the clock source and any of the 64 sinks.
Figure 5. Impact of a hot spot on skew in one clock network quadrant for an optimized clock tree without a meshing grid, with an ideal grid without parasitic capacitances, and with a grid including parasitic capacitances.
To compute process variabilitys inuence on skew, we repeated the Monte Carlo analysis described earlier (a 3 variation of 30% in threshold voltage and 20% in interconnection width). Table 2 shows mean delay, mean skew, sigma skew, and maximum skew at a uniform room temperature. Again, redundancy created by mesh loops noticeably reduces undesirable variations between signal nodes spatially distributed over the chip.
419
Special ITC Section
Table 2. Comparison of buffered clock tree designs. Design style A B1 (C = 0) B2 Mean delay (ps) 403 403 466 Mean skew (ps) 15.77 0.17 1.054 Sigma skew (ps) 4.36 0.23 0.39 Maximum skew (ps) 32.6 1.2 2.6 Skew (ns)
8E-03 6E-03 4E-03 2E-03

Gradual Nongradual
0E+00
Without grid With ideal grid without parasitic capacitances With grid including parasitic capacitances
9E-11 7E-11
2E-03 10
10
20 30 Trise (C)
40
50
Figure 7. Skew due to ve inverters at a hot spot Skew (s) 5E-11 3E-11 1E-11 1E-11 3E-11 20 10 0 10 20 30 40 Hot spot temperature increment (C) of temperature T in an eight-inverter chain (nongradual), and eight inverters at temperatures 1/4T, 1/2T, 3/4T, T, T, 3/4T, 1/2T, and 1/4T, respectively (gradual).
50
Figure 6. Impact of a hot spot on skew. Measurements used 27C as the reference temperature with buffers biased at nominal supply voltage.
Comparing designs B1 and B2 shows the impact of the additional capacitance caused by the grid. Figure 6 plots total skews dependency on the magnitude of temperature increase between two different clock paths (we assume that the hot spot affects all stages of one path, while the other path remains at the reference temperature). The gure shows that skew is roughly proportional to T. A comparison of results in Table 2 and Figure 6 indicates that in a clock network without a grid, skew related to a hot spot that increases temperature by 30C can be as much as 20% of overall clock latency (mean delay). The skew plotted in Figure 6 is due only to the thermal gradients effect; if the combined effect of thermal maps and process variability is included, skew increases 1.3 times in case A and 2.0 times in case B2. The interconnect system plays a fundamental role in overall delay (which decreases by 50%from 403 ps to 142 psif the interconnect is neglected through zero
wire resistance and capacitance). Despite this benet, the interconnect systems impact on thermal-induced skew is around 7%. Therefore, although overall delay is interconnect dominated, its heat-related variation is mainly due to active devices. We also ran two experiments to investigate the relative impact on delay of the number of inverters relative to spot size. In the first experiment (the nongradual case), we computed the skew caused by eight equal-size inverters, five at the same hot-spot-elevated temperature, and three at a reference temperature. In the second experiment, we considered a chain affected by a gradual hot spotnot all inverters affected by the hot spot had the same temperature, but the chain had a nonuniform, gradual thermal profile in terms of the hot spots peak temperature T above Tref. Temperature distribution decreased from the central inverters to the side inverters. We considered eight inverters on the chain at the following respective temperature increments: 1/4T, 1/2T, 3/4T, T, T, 3/4T, 1/2T, 1/4T. Note that the sum of all temperature values is 5T, the same as the sum of all temperature increments for the nongradual case with five inverters at temperature Tref + T. Figure 7 compares skew results obtained for the nongradual and gradual cases, showing that skew is almost identical in the two cases. This suggests that we can compute the additional delay of n buffers (Dn), each at temperature Ti as
420
Dn
(T T
i i =1
ref
10 Temperature increase (C) 8 6 4 2 0

90% fmax 50% fmax
where Tref is a reference temperature. Therefore, the skew between two different clock sinks i and j is proportional to Di D j
(T
k
i ,k
T j ,k )
If the sum is performed for all tree stages, Ti,k is the junction temperature of the k stage in the path from the root to sink i, and Tj,k is the junction temperature of the k stage in the path from the root to sink j. Finally, from our comparison of buffered and nonbuffered clock trees, we conclude the following:

10 18 26 34 42 50 58 66 74 82 90 98 Activity (%)
Figure 8. Temperature increase versus activity, while controlling inputs of an array of independent logic circuits.
In buffered trees, skew is less dependent on the hot spots position in the tree. The relative impact of thermal gradients on skew with respect to parameter-variation-induced skew is greater in buffered clock trees than in nonbuffered trees.
Temperature impact on testing

Operating frequency and circuit activity are the two main factors that determine a circuits active power and, therefore, contribute to nonuniformities in junction temperature distribution. Active power increases almost linearly with operating frequency, but circuit activitys effect on relative temperature at different operating frequencies has not been investigated. This issue is important in comparing a circuits relative temperature increase during normal and test modes. Typically, a circuit working in normal mode operates at its maximum frequency, but only a small fraction of its internal blocks are active. Designers determine power constraints for normal circuit operation, usually assuming that random logic blocks will have about 20% to 30% of activity with respect to the clock signal. On the other hand, circuit activity is substantially higher in test mode than in normal operating mode, although the effective operating frequency is much lower because test stimuli must be scanned in and out through DFT structures. Such switching activity increases the devices overall energy, peak power, and average power consumption. The resulting elevated average power will affect the chips temperature distribution, and might not only increase overall chip temperature but also increase hot-spot apparition. Figure 8 compares relative temperature increase with internal circuit activity at two different operating
frequencies (50% and 90% of maximum frequency) for a circuit constructed from a 7 7 array of c432 ISCAS benchmark circuits. We obtained these results with Rossell et al.s thermal and power computation models.11 We used the resulting power map density to obtain a thermal map and calculate temperature increase. The results show that we can obtain a relatively equal junction temperature increase by running the circuit near full speed with a typical circuit activity of 20% (normal mode), or at half speed with increasing activity to about 80%. Therefore, since we can achieve similar thermal levels during normal and test mode operations, it is worthwhile to investigate the effect of thermal maps on delay during test mode. Our results are in line with other work showing the relative impact of increased power dissipation during test mode.12 Researchers have proposed strategies for limiting test-induced power excess by controlling either peak or average power. Some propose a proper selection of test vectors to reduce power dissipation and energy consumption while achieving high fault coverage. Many of these techniques rely on power-constrained test-scheduling algorithms and focus on reducing or maintaining circuit power consumption within safe operating margins. These methods dont pursue uniform power distribution over the die and therefore dont guarantee a uniform thermal map. We have explored a possible method for avoiding the delay impact of artificially created thermal maps due to test activity and the consequent masking of test results. Bellaouar et al. have shown that the rate of dri-
421
Special ITC Section
HLconverter
LHconverter
3.0E-11 2.5E-11 2.0E-11 Skew (s) 1.5E-11 1.0E-11 5.0E-12 0.0E+00 5.0E-12 1.0E-11 10 0 10 20 30 40 Temperature increase (C) 50
VDD = 1.2 V VDDopt = 0.8 V
VDDopt region
VDD region
Figure 10. Skew versus temperature increase for a three-level buffered clock tree without grid, biased at nominal supply voltage VDD and at
Figure 9. Dual-voltage clock scheme.
VDDopt for various hot-spot temperatures.
3.5E-03 Average power per cycle (W) 3.0E-03 2.5E-03 2.0E-03 1.5E-03 1.0E-03 5.0E-04 2.0E-11 1.5E-11 Skew (HS = 10C) 1.0E-11 5.0E-12 0.0E+00
With grid Without grid
5.0E-12 1.0E-11 0.7 0.8 0.9 1.0 1.1 1.2
VDD
Figure 11. Isopower skew improvement gained by using clock grid design for low-voltage section of clock tree shown in Figure 9.
ver resistance variations due to temperature uctuations is strongly dependent on power supply voltage, and that an optimum bias voltage (VDDopt) minimizes these vari-
ations.13 We have proposed a dual-supply-voltage clock tree to reduce skew related to temperature gradients.14 Figure 9 shows such a tree. The high-to-low converter (HLconverter) is a buffer that converts the incoming clock signal to the chip from a standard swing to a lower voltage swing. The HLconverters structure is relatively straightforward. To convert the clock swing from the standard voltage range to a lower voltage range, we use a conventional buffer driven by supply voltage VDDopt. The clock signal is then transmitted on the chip as a lowvoltage signal. At the utilization points at the sink flipops, the low-to-high converter (LHconverter) converts the signal using the block to the higher voltage swing, which is the voltage used by the logic network. The LHconverters structure is more involved; some design examples appear in other works.15 We performed a simulation experiment on a 130-nm technology to test a multiple-supply-voltage scheme that uses a bias supply selected to compensate for temperature-related effects. As Figure 10 shows, changing the clock buffers supply voltage from VDD to VDDopt significantly reduced total skew. A related advantage of this clock scheme is a reduction in power consumption. However, side effects such as noise on the supply network could be significant. Also, an increased impact of process parameter variations on delay at the reduced supply voltage16 could compromise the compensation effects. Our results suggest an alternative clock skew optimization approach: introducing a clock grid in the
422
clock region polarized at VDDopt during testing. This option would minimize the impact of process parameter variations and noise on clock distribution. It would also optimize power dissipation because the clock distribution circuitry would not contain a grid mesh avoiding mesh-based architectures power penalty of about 40%.5 To verify this optimization, we compared average power per cycle dissipated by the clock tree distribution and the resulting clock skew in two designs: a design without a grid and a design with a grid on the portion of clock distribution at the reduced optimal supply voltage. Figure 11 shows the results, which conrm the benet in overall skew reduction for an isopower comparison between these two design alternatives. The isopower scheme refers to the comparison between the two designs. The upper graph in Figure 11 is used to set the power limit (2.1 mW in this case), thus the horizontal line intersects two points from each curve in this graph being at the same power value. The intersection of the horizontal curve (iso power) with the supply voltage of the With Grid design determines the supply voltage of the clock tree; the lower graph in Figure 11 provides the value of the skew reduction. The isopower scheme doesnt provide the full gain achievable because the reduced supply voltage provided by the isopower requirement (around 0.97 V) is slightly beyond the optimal supply voltage for this technology (0.8 V). Nevertheless, skew decreased from 12 ps to less than 1 ps.
the overall delay magnitude is interconnect dominated. In this case, the hot spots relative position has much less impact than in nonbuffered trees. Interestingly, we have also observed that in buffered trees the hot spots impact on delay can be quantied without computing the hot spots exact thermal spatial prole with respect to the buffers. This might signicantly affect future CAD tool development. Our results show the importance of having a temperature-aware clock tree design. The combination of cross-link insertion and multiple-supply-voltage clock schemes is likely to provide the best trade-off between skew reduction and power-conscious design.
Acknowledgments
This work was partially supported by the Spanish Ministry of Science and Technology, the Regional European Development Fund under EU project TEC2005-05712/MIC, and Intel Research Labs.
References
1. S. Borkar et al., Parameter Variations and Impact on Circuits and Microarchitecture, Proc. 40th Design
Automation Conf. (DAC 03), ACM Press, 2003, pp. 338342. 2. B. Lu et al., Process Variation Aware Clock Tree Routing, Proc. Intl Symp. Physical Design (ISPD 03), ACM Press, 2003, pp. 174-181.
CLOCK SKEW has as much impact on overall parametric yield as any propagation delay. Large clock skews can cause timing violations because of the erosion in setup or hold times. Researchers have reported that process parameter variations, parasitics, and noise effects such as crosstalk affect the delay of each clock tree branch. We have shown that temperature gradients can also be an important source of clock skew, causing spatially correlated variations. Nonbuffered and buffered clock-tree networks respond differently to nonuniform thermal maps. In nonbuffered trees, a hot spots relative location in the tree structure has a high impact on overall thermal skew. The clock network distribution temperature is difcult to evaluate because network parasitics come from resistive components distributed in different metal layers and at different levels from those of the main power sources. In buffered trees, the main contributions to skew are differences in clock tree buffer delay, even if
3. G.E. Tellez and M. Sarrafzadeh, Minimal Buffer Insertion Clock Trees with Skew and Skew Rate Constraints,
IEEE Trans. Computer-Aided Design of Integrated Circuits and Systems, vol. 16, no. 4, Apr. 1997, pp. 333342. 4. G. Bai, S. Bobba, and N. Hajj, Static Timing Analysis Including Power Supply Noise Effect on Propagation Delay, Proc. 38th Design Automation Conf. (DAC 01), ACM Press, 2001, pp. 295-300. 5. C. Yeh et al., Clock Distribution Architectures: A Comparative Study, Proc. 7th Intl Symp. Quality Electronic
Design (ISQED 06), IEEE Press, 2006, pp. 85-91.

6. M. Mori et al., A Multiple Level Network Approach for Clock Skew Minimization with Process Variations, Proc.
Asia South Pacic Design Automation Conf. (ASP-DAC

04), ACM Press, 2004, pp. 263-268. 7. A. Deutch et al., On-Chip Wiring Design Challenges for Gigahertz Operation, Proc. IEEE, vol. 89, no. 4, Apr. 2001, pp. 529-555.
423
Special ITC Section
8. Y. Chen and D. Wong, An Algorithm for Zero-Skew Clock Tree Routing with Buffer Insertion, Proc.
European Design and Test Conf. (ED&TC 96), IEEE

Press, 1996, pp. 230-236. 9. A. Rajaram, J. Hu, and R. Mahapatra, Reducing Clock Skew Variability via Cross Links, Proc. 41st Design
Automation Conf. (DAC 04), ACM Press, 2004, pp. 18-23.

10. C.K. Cheng et al., Interconnect Analysis and Synthesis, Wiley InterScience, 2000. 11. J. Rossell et al., A Fast Concurrent Power-Thermal Model for Sub-100 nm Digital ICs, Proc. Design
Carol de Benito is an associate professor in the Electronic Technology Group of the University of the Balearic Islands. Her research interests include device and circuit modeling and lowtemperature CMOS design. De Benito has an MS in physics from the University of the Balearic Islands. Ali Keshavarzi is a research scientist at Circuit Research Laboratories (CRL) of Intel. His research interests include low-power/high-performance circuit techniques and transistor device structures for future generations of microprocessors. He has a PhD in electrical engineering from Purdue University. Jaume Segura is an associate professor in the Electronic Technology Group of the University of the Balearic Islands. His research interests include device and circuit modeling and very large-scale integration design and test. Segura has a PhD in physics from the Polytechnic University of Catalunya. Direct questions and comments about this article to Sebasti A. Bota or Jaume Segura, Electronic Tech. Group, Univ. Illes Balears, Cra. Valldemossa, km. 7.5, 07122 Palma de Mallorca, Spain; sebastia.bota@ uib.es, jaume.segura@uib.es.
Automation and Test in Europe (DATE 05), vol. 1, IEEE

Press, 2005, pp. 206-211. 12. E. Larsson and Z. Peng, Power-Aware Test Planning in the Early System-on-Chip Design Exploration Process,
IEEE Trans. Computers, vol. 55, no. 2, Feb. 2006, pp.

227-239. 13. A. Bellaouar et al., Supply Voltage Scaling for Temperature Insensitive CMOS Circuit Operation, IEEE Trans.
Circuits and Systems II, vol. 45, no. 3, Mar. 1998, pp.
415-417. 14. S. Bota et al., Within Die Thermal Gradient Impact on Clock-Skew: A New Type of Delay-Fault Mechanism,
Proc. Intl Test Conf. (ITC 04), IEEE Press, 2004, pp.
1276-1284. 15. J. Pangjun and S. Sapatnekar, Low-Power Clock Distribution Using Multiple Voltages and Reduced Swings,
IEEE Trans. Very Large Scale (VLSI) Integration Systems, vol. 10, no. 3, June 2002, pp. 309-318.
16. S. Bota et al., Low VDD vs. Delay: Is It Really a Good Correlation Metric for Nanometer ICs? Proc. 24th VLSI
Test Symp. (VTS 06), IEEE Press, 2006, pp. 358-363.
Sebasti A. Bota is an associate professor in the Electronic Technology Group of the University of the Balearic Islands, Palma de Mallorca, Spain. His research interests include very large-scale integration design and test and lowtemperature CMOS design. Bota has a PhD in physics from University of Barcelona in Spain. Josep L. Rossell is an associate professor in the Electronic Technology Group of the University of the Balearic Islands. His research interests include device and circuit modeling, very large-scale integration design and test, and low-temperature CMOS design. Rossell has a PhD in physics from the University of the Balearic Islands.
Members save 25%

on all conferences sponsored by the IEEE Computer Society.
Not a member? Join online today!
www.computer.org/publications/
424
TEST TECHNOLOGY TC
NEWSLETTER
UPCOMING TTTC EVENTS
12th International Workshop on Thermal Investigations of ICs and Systems
27-29 September 2006 Nice, France http://tima.imag.fr/conferences/therminic/ Therminic workshops are offered annually to address the essential thermal questions of microelectronic microstructures, and of electronic parts in general. This years workshop discusses issues in thermal simulation, monitoring, and cooling. To develop more appropriate fault models, designers and test engineers must have a good handle on both systematic and random defect mechanisms to support the manufacturability of ICs for defect-based test approaches. Because of increasing design complexity and process variability, the focus is shifting to such approaches. This workshop addresses these issues.
First IEEE International Design and Test Workshop (IDT 06)

19-20 November 2006 Dubai, United Arab Emirates http://www.tttc-idt.org/index_les/IDT.CFP.06.pdf This event provides a unique forum in the Middle East and Africa region for researchers and practitioners of VLSI design, test, and fault tolerance to discuss new research ideas and results. IDT will run in conjunction with the annual Innovations of IT Conference and in parallel with Global IT Exhibitions (GITEX).
21st IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems (DFT 06)
4-6 October 2006 Arlington, Va. http://netgroup.uniroma2.it/DFT06/cfp.html DFT provides an open forum for discussing defect and fault tolerance in VLSI systems, including emerging technologies. Topics include all aspects of design, manufacturing, test, reliability, and availability affected by defects during manufacturing or by faults during system operation.
7th International Workshop on Microprocessor Test and Verication

4-5 December 2006 Austin, Texas http://mtv.ece.ucsb.edu/MTV This workshop brings together researchers and practitioners from verication and test to discuss todays difcult challenges in the processor and SoC design environments. Its the ideal environment for joint test and verication experiences and innovative solutions.
International Test Conference (ITC 06)

24-26 October 2006 Santa Clara, Calif. http://www.itctestweek.org/ ITC is the worlds premier conference on the electronic test of devices, boards, and systems. It covers the complete cycle from design verication, test, diagnosis, and failure analysis to process and design improvement. At ITC, test and design professionals can confront the challenges the industry faces and learn how academia, design-tool and equipment suppliers, designers, and test engineers address these challenges.
NEWSLETTER EDITORS INVITATION

Id appreciate input and suggestions about the newsletter from the test community. Please forward your ideas, contributions, and information on awards, conferences, and workshops to Bruce C. Kim, Dept. of Electrical and Computer Engineering, Univ. of Alabama, 317 Houser Hall, Tuscaloosa, AL 354870286; bruce.kim@ieee.org. Bruce C. Kim Editor, TTTC Newsletter
IEEE International Workshop on Current & Defect Based Testing (DBT 06)
26-27 October 2006 Santa Clara, Calif. http://www.cs.colostate.edu/~malaiya/dbt.html
CONTRIBUTIONS TO THIS NEWSLETTER: Send contributions to Bruce C. Kim, Dept. of Electrical and Computer Engineering, Univ. of Alabama, 317 Houser Hall, Tuscaloosa, AL 35487-0286; bruce.kim@ieee.org. For more information, see the TTTC Web page: http://tab.computer.org/tttc/.
Book Reviews
A comprehensive EDA handbook

Scott Davidson Sun Microsystems
IN THE November-December 2004 Last Byte, I bemoaned the fact that design has become so complex that no one person can understand all of it, and that EDA tools have become so diverse and complicated that we confine ourselves to a small subset of their functionality. The massive book under review here, Electronic Design Automation for Integrated Circuits Handbook, represents the best way I know to address this problem.
Reviewed in this issue

Electronic Design Automation for Integrated Circuits Handbook, edited by Louis Scheffer, Luciano Lavagno, and Grant Martin (CRC Press, 2006, ISBN 0-849-33096-3, 2 vols., 1152 pp., $149.95).
This two-volume set contains 49 articles on EDA, ranging from high-level design to technology CAD. The rst volume, EDA for IC System Design, Verication, and Testing, has ve sections: An introductory section outlines and summarizes the design process. A section on system-level design discusses modeling languages, processor and system modeling, performance metrics, and system-level power management. The microarchitectural design section describes performance estimation, power management, and design planning at this level. Six chapters on logic verication cover design and verification languages, and various verification methods. The nal section, on test, focuses on DFT, test generation, and analog test. The second volume, EDA for IC Implementation, Circuit Design, and Process Technology, focuses on the second part of the IC design flow. It includes sections on synthesis, place and route, analog and mixed-signal design, physical verification, and technology CAD. Chapters within these sections cover topics such as synthesis, power management at all levels, design rule checking, design for manufacturability, timing analysis, noise analysis, and libraries. I confess that I did not read all the chapters in this book, wishing to complete the review before we move to
biochips. In fact, when I rst received this text, I was certain that Id never nish it. But Im not so sure now. I have already read more chapters than Id originally intended, and I think the reason for this explains why this handbook is a success. In most cases, the material covers the important points without going into so much detail or length as to be intimidating. Chapters range from seven pages to 33 pages, with an average of 15 to 20, each including an extensive list of references. This seemed just right for the surveys making up this handbook. You cannot completely learn EDA from a book like this, of course, but you can learn quite a lot about EDA. There are three types of chapters throughout the book. Some are introductory in nature, surveying a topic such as design flow at a high level. Some target EDA users, showing the types of tools that are available and putting them into context. Others target EDA developers, describing the algorithms underlying the tools, with information on the benets of each. Some subjects are covered from several angles. Most subjects could be, of course, but that would balloon this work into three or four volumes. For the most part, I was happy with the choice of angle; only in a few cases, such as the chapter on design rule checking, would I have preferred a more user-oriented approach.
426
0740-7475/06/$20.00 2006 IEEE
One danger of a handbook approach is repetition, as important subjects tend to get covered more than once. However, I found very little redundancy in this book. There was practically none in the sections on test. The editors must have done an excellent job reviewing chapter outlines. The most important thing, though, is how good the individual chapters are. So, I will give my impressions of some of the ones I read. The first full chapter is The Integrated Design Process and Electronic Design Automation. This chapter starts a bit abruptly, but quickly progresses into an excellent overview of the design process. Id recommend it to everyone who reads these volumes. I wish it had some pointers to subsequent chapters, however. Chapter 5, SoC Block-Based Design and IP Assembly, is an excellent tutorial focusing on real issues, especially in the area of verication. Chapter 8, Processor Modeling and Design Tools, provides a taxonomy and survey of architecture description languages (ADLs). The taxonomy is excellent, describing very clearly what ADLs are and what they are not. However, I would have liked to see more of an industrial focus in the survey. This chapter had a bit more of an academic slant than most of the others. The chapter on Design and Verication Languages, on the other hand, covers commercially available languages with excellent examples. An outline gives the salient points of each language, with strong points and weak points, and includes a taste of how to do coding in each language. At 28 pages, this is one of the longer chapters, but hardly a word is wasted. There are three chapters on test, two of which Id like to discuss here. Chapter 21, Design-for-Test, is one of the longest (35 pages) in the book. It contains even more text than the page count indicates, in fact, because it includes absolutely no gures or diagrams. This chapter covers a lot, from the objectives and history of DFT; through scan, BIST, and compression for logic testing; to memory test. It ends with a short section on FPGA test (which could easily have been cut). The reader of this chapter might have a hard time distinguishing which of these concepts are truly important and which are minor. In addition, there is often too much detail for an introductory survey. For example, there is almost an entire page in the section on logic BIST about structural dependencies and scan chain
lengths. These are issues, but they could have been eliminated to make the chapter shorter and more readable. The chapter on Automatic Test Pattern Generation is more developer oriented, with a survey of ATPG algorithms. Its somewhat academic, with a large section on Boolean satisability (SAT) solvers for test generation, but this is balanced by an excellent section on applications for ATPGs beyond test generation. The chapter on Logic Synthesis (in volume 2) is 15 pages and has 11 references. It gives a very high-level view of a well-known subject. I think the author was right to avoid trying to cover all aspects of this area in depth, instead pointing the reader to places for further study. Chapter 6, Static Timing Analysis, is one of the best chapters I read. It is at the right length and depth, and it provides helpful pseudocode for the major algorithms discussed. Chapter 9, Exploring Challenges of Libraries for Electronic Design, considers not cell libraries, but IP libraries. At eight pages, it is very short and supercial. The last three subsections are basically only outlines. The last chapter I want to highlight is the one on Design Databases. This chapter is excellent. It targets users, but displays a deep knowledge of the implementations of design databases. It is also very readable.
EVERY DESIGN GROUP should have a copy of this handbook in its library. It is an excellent reference text. It can also serve as outstanding background reading for new engineers exposed to some of these areas for the rst time. The material here is better organized and better written than what could be found on the Web. Putting together such a high-quality, substantive work is quite an achievement. Ill be reading more chapters for quite some time to come. Direct questions and comments about this department to Scott Davidson, Sun Microsystems, 910 Hermosa Court, M/S USUN05-217, Sunnyvale, CA 94085; scott.davidson@sun.com.
427
Standards
DASC sees moves toward formality in design

Victor Berman Cadence Design Systems
AT A RECENT IEEE Design Automation Standards Committee (DASC) meeting (http://www.dasc.org/ meetings/2006-07/20060727_DASC_Minutes.doc), we discussed two interesting standardization proposals: Rosetta and Esterel version 7. Both are based on technology that has been under development for a long time, and both target the formalization of system-level design and verication. But, otherwise, they take very different approaches. We hear a lot of talk about movement to more abstract design paradigms. Are these proposals confirmation of this trend, or are they yet another false start? Read these brief outlines, and decide for yourself.
The Rosetta language

System-level design involves consolidating information from multiple domains to predict the effects of design decisions. To support system-level design, a language must allow heterogeneous specification while providing mechanisms to compose information across domains. The goal of the Rosetta system-level design language (http://www.sldl.org/standards.htm) is to compose heterogeneous specifications in a single semantic environment. Rosetta provides modeling support for different design domains, employing semantics and syntax appropriate for each. Thus, Rosetta lets designers write individual specifications with semantics and vocabulary appropriate for their domains. Users compose information across specication domains by dening interactions between them. To achieve this end, Rosetta provides a collection of domains, called facets, for describing system models. Interactions provide a mechanism for dening constraints between domains. Facets dene system models from one engineering perspective. Users can write facets by extending a domain that provides vocabulary and semantics for the
model. Using the design abstractions that its domain provides, a facet describes a systems requirements, behavior, constraints, or function. Domains provide vocabulary and semantics for defining facets. Each domain provides mechanisms for describing data, computation, and communication models appropriate for one area of systems design. Interactions dene how information from one engineering domain is reected in another. Domains dont share a common set of semantics, but rather share information when necessary using interactions. Thus, Rosetta denes each design facet by using appropriate design abstractions from that facets domain rather than forcing a common design model across all facets. Facet algebra expressions use facets, domains, and interactions to compose models into system descriptions. Users can evaluate local design decisions from a systems perspective by using interactions to understand how these decisions impact other system domains. Work on Rosetta is ongoing, with this Web site serving as a clearinghouse for language definition and usage information. The various Web pages provide definition and tutorial documents, as well as examples and standardization information.
The Esterel language

Esterel (http://www.esterel-technologies.com) is a formal synchronous language for unambiguously specifying and implementing hardware and software embedded systems. Esterel was initially developed in academia, with strong cooperation by industrial users. The Esterel developer community has developed the current Esterel version 7 language as a proposed standard. The developers derived this version from the previous Esterel v5 academic version by adding new features necessary for hardware design. Because of the formal character of the language and semantic kernels, you can fully and faith-
428
0740-7475/06/$20.00 2006 IEEE
fully translate Esterel programs either to hardware circuit descriptions written in conventional hardware description languages (HDLs) or to equivalent conventional software programs, with the very same behavior in both cases. Its also possible to translate Esterel programs to input for formal-verication systems (for example, model checkers) so that veried properties will be guaranteed to hold in hardware and software implementations. The proposed project will create an initial IEEE standard based on Esterel v7, ensuring unambiguous denition of the language syntax and semantics and, therefore, full interoperability between Esterel-based program implementation, static analysis, and verication tools. The output of the project will be the standard Esterel Language Reference Manual. This projects purpose is to provide the EDA, semiconductor, and systems-design communities with a welldened, ofcial IEEE denition of the Esterel language. This is necessary because Esterel is not a minor variant of existing languages that could be defined with an addendum to existing standards. Rather, Esterel is unique in the way it formally merges sequencing as typically only software languages do, uses single-clock or multiclock concurrency as typically only HDLs do, and employs unique temporal primitives that drive the life and death of activities within programs. Esterel also supports formal denition of data paths based on arbitrary precision and exact arithmetic, bit vectors, and arrays of arbitrary dimensions and types. These language primitives facilitate, by at least one order of magnitude, the
expression of complex behavior, providing the user with unmatched clarity and productivity for specification, design, and verication activities. Esterel lets you obtain equivalent hardware and software targets from a single source, so hardware simulation using software is more tenable. Esterel also lets you perform late choices between hardware and software nal implementation. The key technical objective is to stabilize and fully dene the languages syntax and semantics. The technical aspects to be scrutinized concern the data paths arbitrary-precision and exact-arithmetic features, the temporal statements particular to Esterel, and the life and death of activities and signals. I have no doubt that developers can solve all the involved questions in a completely rigorous way, thus providing a fully solid basis for both users and tool builders, and ensuring full interoperability between tools from diverse origins. A derived objective is to ensure that its possible to effectively compile a given Esterel design to other standardized languages such as VHDL, Verilog, SystemVerilog, C, and SystemC, with the same guaranteed behavior for all these different targets. This will require checking that all Esterel constructs are synthesizable in hardware or software, up to well-identied limitations of back-end synthesis or compilation tools.
Direct questions and comments about this department to Victor Berman, Cadence Design Systems, 270 Billerica Road, Chelmsford, MA 01824; vberman@ cadence.com.
Get access
to individual IEEE Computer Society documents online. More than 100,000 articles and conference papers available!
US$9 per article for members US$19 for nonmembers
http://computer.org/publications/dlib/
429
CEDA Currents
A Conversation with Robert Brayton

On the occasion of Robert Brayton receiving the 2006 EDAA (European Design and Automation Association) Lifetime Achievement Award and the 2006 IEEE Emanuel R. Piore Award, Karti Mayaram from CEDA Newsletter spoke to him about his career, achievements, and moments of inspiration. Brayton also had some practical advice for young researchers.
It was a pleasure talking with Bob Brayton. All of us who have been affiliated with the EDA field are well aware of the many fundamental contributions hes made. His impact on the industry has been tremendous.
Back to school
After spending 26 years at IBM Research, Bob started a second career as a professor in the Electrical Engineering and Computer Sciences Department at the University of California, Berkeley, in 1987. He had spent a year at UC Berkeley on sabbatical from IBM Research in 1985, during which he had worked with some very talented students on logic synthesis and the development of industrial-quality tools. When he returned in 1986, IBM Research was offering early retirement, which he decided to accept, turning his sights toward academia. When UC Berkeleyhis top choicemade him an offer, he accepted, and he has continued to make important contributions in logic synthesis.
The Aha! moment

The most thrilling moment in his career was the development of the Sparse Tableau Approach (STA) for assembling and solving circuit equations. He and his colleague, Gary Hachtel (now a professor at the University of Colorado in Boulder), were having a conversation after a game of tennis. Theyd been thinking about an elegant solution for assembling circuit equations for some time. Suddenly, all the pieces of the puzzle were falling into place. This was the start of STA. Before their work, circuit equation assembly required different kinds of manipulations and reductions. STA provided a simple way to assemble electrical-circuit equations. There was no need for reducing equations; with STA, you could directly apply Gaussian elimination. This work was one of the cornerstones of IBMs circuit simulator, Astap (Advanced Statistical Analysis Program).
The early years

Bob grew up in Ames, Iowa, and attended Iowa State University, where he graduated with a BS in electrical engineering in 1956. After a 6-month stint in the US Army, he went to MIT to pursue a PhD in mathematics. He chose math because he thought he lacked mathematical foundations and he had a strong interest in this field. Bob believes this unique combination of an undergraduate EE degree and a PhD in math has been a major contributor to his success. He not only has a good understanding of the application area but also a strong foundation in mathematical tools to solve relevant problems. A year before completing the PhD program, Bob accepted a summer job at IBM Research (T.J. Watson Research Center) in the Mathematical Sciences Department. This was such a wonderful experience that he joined the department upon completing his PhD. Looking back, he realizes that both MIT and IBM Research were very influential in his life. He had the opportunity to work with exceptional people. Moreover, at IBM Research he had freedom to work on the research topics that most interested him. This combination of wonderful colleagues and exibility to pursue appealing subjects helped shape Bobs career.
Some thoughts on EDA developments

I also asked Bob what he thought were the most exciting developments in EDA. He said there is always a progression of things, and newer developments overshadow some of the developments of the past. But he named two topics that he saw as step functions. One was binary decision diagrams. BDDs provided a way for efciently manipulating large logic equations, and they proved important for logic synthesis and verification. The other topic was the work on solving stiff differential
430
0740-7475/06/$20.00 2006 IEEE
equations, performing equation assembly and solution, and integrating these techniques as packagessuch as the Astap and Spice circuit simulatorsfor solving circuit problems.
National Science Foundation

Power, Controls and Adaptive Networks (PCAN) Deadline: 7 September - 7 October 2006 http://nsf.gov/funding/pgm_summ.jsp?pims_id=13380
What lies ahead

Bob named deep-submicron and nanometer design as the greatest challenges facing EDA. We no longer have the luxury of working on independent problems that can be solved separately. Electrical interference and manufacturing variations make very low nanometer CMOS design a difcult problem. Then there are the new technologies that will require effective design tools.
Foundations of Computing Processes and Artifacts (NSF 06-585)

Deadline: 10 October 2006 http://www.nsf.gov/pubs/2006/nsf06585/ nsf06585.htm
Upcoming CEDA Events

Please see these Web sites for upcoming events:
Some practical advice

His advice to young researchers in a challenging funding environment is to work on relevant problems and to keep putting out proposals. Being able to solve relevant problems in interesting ways can be a big motivator. This is what motivates Bob. He is able to identify such problems and find interesting ways to solve themjust like putting together the pieces of a puzzle. The whole notion of fads driving research funding and publications is not appealing to him. Such an approach takes away resources from basic research. There should be more emphasis on fundamental work. Asked how he felt about receiving the two recent awards, he replied, surprised. He also thanked the people who took the time and effort to put together the nominations for these awards. Bob is an extremely modest and wonderful person who has made seminal contributions to EDA. We all are happy to see him get the recognition that he deserves.

CODES+ISSS: http://www.esweek.org Nano-Net: http://www.nanonets.org FMCAD: http://www.cs.utexas.edu/users/hunt/FMCAD ICCAD: http://www.iccad.com PATMOS: http://www.patmos-conf.org VLSI-SoC: http://tima.imag.fr/conferences/ VLSI-SoC06
CEDA Distinguished Speaker Reception

The Councils Distinguished Speaker Series features detailed presentations of the most signicant research results in EDA over the past year, as demonstrated by awards at our top conferences and journals. The second presentation in this series took place at the Moscone Center in San Francisco during DAC 2006. The featured article was by Janusz Rajski, Jerzy Tyszer, Mark Kassab, and Nilanjan Mukherjee, the authors of this years IEEE Transactions on Computer Aided Design Donald O. Pederson Best Paper Award. Their presentation, which covered several aspects of VLSI testing, had significant tutorial value and will be archived at the Councils Web site (http://www.c-eda.org).
CEDA Currents is a publication of the IEEE Council on Electronic Design Automation. Please send contributions to Kartikeya Mayaram (karti@eecs. oregonstate.edu) or Preeti Ranjan Panda (panda@ cse.iitd.ac.in).
Upcoming Research Funding Opportunities

US Department of Defense
Experimental and Theoretical Development of Quantum Information Science Deadline: 11 December 2006 http://www.arl.army.mil/main/Main/ DownloadedInternetPages/CurrentPages/ DoingBusinesswithARL/research/QC06Final6Jul06.pdf
431
The Last Byte
Getting more out of ITC

Anne Gattiker IBM Austin Research Lab
THE 2006 INTERNATIONAL TEST CONFERENCE
theme encourages us to consider ways for getting more out of test. How about getting more out of the International Test Conference? Technical paper sessions are the heart and soul of ITCand theres something there for everyone, from classic microprocessor and ATE sessions to delay, test compression, test power, and more. But there are plenty of ways to get more out of ITC. Weve changed the structure of ITC and Test Week (22-27 October) for this years new site, Silicon Valley. The new format offers some great opportunities. First, be sure to arrive in time for Mondays test Q&A panel (23 October), starting at 4:45 p.m. Come hear the experts discuss diverse test topics unrehearsed. Remind yourself that even the experts dont have all the answers; theres still plenty to debate on every topic. Get up the next morning to attend the Tuesday plenary, which starts at 9:30 a.m. The plenary kicks off a day specially organized to include material for those who manage test. Dont miss Tuesday afternoons executive test panel, which boasts an impressive array of participants sharing unique perspectives on the cost of quality. Watch the users of silicon debate the providers, and find out their views on how we can get more out of test. Afterward, enjoy the welcome reception, where you can meet friends and colleagues and find out which panelists perspectives they plan to take home with them. Be sure to schedule enough time to visit the exhibit floor. How else can you improve your standing with your children by bringing home all sorts of nifty gadgets and at the same time nd out about the latest offerings from the key vendors in test-related elds? Dont forget the free lunch Tuesday, Wednesday, and Thursday on
the exhibit floor. If the line looks long, grab a few colleagues and discuss the latest developments youve heard. Afterward, take advantage of an opportunity to hear industry authorities address your favorite topic and minetest, of courseat each days invited address, conveniently located adjacent to the exhibit hall. ITC has only one regular panel slot this year, so be sure not to miss itand dont forget the wine-andcheese party afterward. In addition to these treats, we have our usual outstanding set of papers, so you can learn what advances are on the way. We also have an interesting lecture series, providing you with information that you can take back to work to use right away. This year marks ITCs rst visit to Silicon Valley. For those who work in the area, this is the easiest ITC to attend yet. Getting to the Santa Clara Convention Center might be a shorter commute than going to workand parking is free.
I LOOK FORWARD to seeing you at ITC. Lets learn some,
do some business, have some laughs, and get inspired to get more out of test. You can nd out all about ITC and Test Week at http://itctestweek.org.
Anne Gattiker is a research staff member at IBM Austin Research Lab. Contact her at gattiker@us. ibm.com Direct questions, comments, and contributions about this department to Scott Davidson, Sun Microsystems, 910 Hermosa Court, M/S USUN05-217, Sunnyvale, CA 94085; scott.davidson@sun.com.
432
0740-7475/06/$20.00 2006 IEEE
Here now from the IEEE Computer Society
IEEE ReadyNotes
Looking for accessible tutorials on software development, project management, and emerging technologies? Then have a look at ReadyNotes, another new product from the IEEE Computer Society.
ReadyNotes are guidebooks that serve as quick-start references for busy computing professionals.
Available as immediately downloadable PDFs (with a credit card purchase), ReadyNotes sell for $19 or less.
www.computer.org/ReadyNotes
IEEE DESIGN & TEST OF COMPUTERS September-October 2006
CAREER ACCELERATOR FORUM

AUTUMN 2006
improve your career
in the long run.

Changing Lanes Webinars 12 October 2006
Visit our online Conference Pavilion and participate in our FREE LIVE WEBINARS on 12 October, featuring renowned career specialists, educators and prominent techscience professionals. Get insight and advice on advancing your degree, getting ahead in your field, landing your dream job and even changing your career. Plus, this interactive event makes it possible for you to question our expert panel all from the convenience of your desktop!
Featuring panelists in the fields of Engineering, Computer Science and MBA Education
Webinar 1: The Risks and Rewards of Entrepreneurship in Todays Technology Environment
TIME: 12:00 PM ET/9 AM PT/16:00 GMT Topics and Speakers TBA
It worked for Bill Gates, Andy Grove, Irwin Jacobs, and Scott McNally. Why not you? This may be just the time to start your own business or take an executive position at a start-up. Experienced executives and experts in entrepreneurship will tell you how to get started, what to expect, and how to leverage education and work experience to ensure success.
Interactive Exhibition Hall

Visit our Online Exhibition Hall to view presentations by leading colleges and universities. Find key information on graduate and postgraduate studies, take online campus tours, get career counseling advice and download admissions applications. Search at your own pace 24/7.
Webinar 2: Programming Your Future in Computer Science

TIME: 1 PM ET/10 AM PT/17:00 GMT Topics and Speakers TBA
Hit your stride at this innovative online event. Sign up today at
Wireless Internet and security are currently the hot technologies, and require a more diverse level of skills in computer science. But there are also opportunities for positions as systems analysts for those with degrees in related fields. Industry experts will tell you what employers are looking for, how to match your training and skills with the best jobs, and how to use relevant work experience to your advantage.
www.spectrum.ieee.org/caforum
VOLUME 23 NUMBER 5

105.100.000 DesignnTest ESL Special Edition Complete

Uploaded by

Document Information

Original Description:

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

105.100.000 DesignnTest ESL Special Edition Complete

Uploaded by

Copyright:

Available Formats

SPECIAL ITC SECTION

ALEXANDER TORRES 2006

Electronic System-Level Design

SeptemberOctober 2006 Volume 23 Number 5 http://www.computer.org/dt

ITC Special Section

Cover design by Alexander Torres

333 387 425 426 428 430 432

TECHNICAL AREAS ____

D&T ALLIANCE PROGRAM______________

ADVISORY BOARD _____

The new world of ESL design

Kwang-Ting (Tim) Cheng Editor in Chief IEEE Design & Test

0740-7475/06/$20.00 2006 IEEE

Copublished by the IEEE CS and the IEEE CASS

0740-7475/06/$20.00 2006 IEEE

Copublished by the IEEE CS and the IEEE CASS

Electronic System-Level Design

ESL DESIGN, METHODOLOGIES, LANGUAGES, AND

IEEE Design & Test of Computers

The IEEE Computer Society

Electronic System-Level Design

A Component-Based Design Environment for ESL Design

ESL design has many faces

Example 1: Public-key cryptography on 8-bit microcontrollers

0740-7475/06/$20.00 2006 IEEE

Copublished by the IEEE CS and the IEEE CASS

IEEE Design & Test of Computers

Electronic System-Level Design

RAM or Flash ThumbPod-2 client Minutiae extraction

Chip command interface

Cryptography module Master key

Root of trust (b)

Secure circuit style

Example 3: Accelerated embedded virtual machines

ize their underlying design principles by using a component-based approach.

Component-based ESL design

IEEE Design & Test of Computers

KNI interface Hardwaresoftware interface AES coprocessor

160 integration overhead

Electronic System-Level Design

IEEE Design & Test of Computers

Platform design using Gezel

Cycle-true system scheduler

Electronic System-Level Design

IEEE Design & Test of Computers

Security partitioning for an embedded ngerprint authentication design

Electronic System-Level Design

SoC: Flexible Design with Congurable Processors,

Acceleration of embedded virtual machines

Design Automation Conf. (DAC 05), ACM Press, 2005,

Architecture, and Synthesis for Embedded Systems

HETEROGENEOUS SYSTEM architectures will continue

Test, vol. 19, no. 6, Nov.-Dec. 2002, pp. 52-63.

Methodology, Kluwer Academic Publishers, 2000.

IEEE Design & Test of Computers

Sign Up Today for the IEEE Computer Societys e-News

Available for FREE to members.

Electronic System-Level Design

Modeling Embedded Systems: From SystemC and Esterel to DFCharts

THE DESIGN PRODUCTIVITY of engineers has not

0740-7475/06/$20.00 2006 IEEE

Copublished by the IEEE CS and the IEEE CASS

IEEE Design & Test of Computers