Coping with the Complexity of Microprocessor Design at Intel – A CAD History

Patrick Gelsinger, Desmond Kirkpatrick, Avinoam Kolodny and Gadi Singer

Abstract — Necessity has driven the evolution of microprocessor design practices and CAD tools at Intel Corporation, as the transistor count has grown by a factor of about 4X with each processor generation. In order to cope with the complexity of design tasks, Intel's engineers were early adopters and adapters of innovative CAD research from universities. A unique partnership with Alberto’s group at U.C. Berkeley during the 1980's created one of the first industrial-strength synthesis-based design flows, which became the prevalent paradigm for the whole electronics industry. This paradigm enabled the semiconductor foundry business model and facilitated the proliferation of fab-less semiconductor companies all over the world, while enabling Intel designers to keep pace with Moore’s Law.

I. INTRODUCTION

During the 1980's, Intel Corp. transformed itself from a semiconductor company producing memory chips into a computer company [1]. Intel's transformation was part of a revolution in the whole electronics industry. At the beginning of the decade, microprocessors were considered toys, and the computer industry was dominated by mainframes and minicomputers made by vertically-integrated companies. By the end of that decade, microprocessors had become the standard engines for computing platforms, and the whole industry had been restructured. Many more vendors entered the industry, each specializing in different areas. These changes were fueled by the continuous scaling of MOS technology, which followed Moore's law.

Interestingly, in his original 1965 paper [2], Gordon Moore expressed a concern that the growth rate he predicted might not be sustainable, because the effort required to define and design products of such rapidly growing complexity might not keep up with that rate. However, the highly competitive business environment drove the industry to fully exploit technology scaling. The number of available transistors doubled with every generation of process technology, which occurred roughly every two years. As shown in Table I, major architecture changes in microprocessors occurred with each 4X increase in transistor count, approximately every second process generation. Intel's microprocessor design teams had to come up with ways to keep pace with the size and scope of every new project.

TABLE I: INTEL PROCESSORS 1971-1993

Processor      Intro Date   Process   Transistors   Frequency
4004           1971         10 µm     2,300         108 kHz
8080           1974         6 µm      6,000         2 MHz
8086           1978         3 µm      29,000        10 MHz
80286          1982         1.5 µm    134,000       12 MHz
80386          1985         1.5 µm    275,000       16 MHz
Intel486 DX    1989         1 µm      1.2 M         33 MHz
Pentium        1993         0.8 µm    3.1 M         60 MHz

This incredible growth rate could not be achieved by hiring an exponentially growing number of design engineers. It was achieved instead by adopting new design methodologies and by introducing innovative design automation software in every processor generation. These methodologies and tools consistently raised the level of design abstraction. They became increasingly precise in circuit and parasitic modeling while using ever-increasing levels of hierarchy, regularity, and automatic synthesis. As a rule, whenever a task became too painful to perform using the old methods, a new method and an associated tool were conceived to solve the problem. In this way, tools and design practices evolved, always addressing the most labor-intensive task at hand. Naturally, the evolution of tools occurred bottom-up, from layout tools to circuit, logic, and architecture. At each abstraction level, the verification problem was typically the most painful, so it was addressed first. The synthesis problem at that level was addressed much later.

This paper tells the story of the co-evolution of design methodologies, practices and CAD tools in Intel's design environment. That environment had to cope with growing complexity from the turbulent 80's until recent years. It is interesting to note that at the beginning of this process the engineering culture advocated a tall, thin designer. Nowadays, VLSI engineers are highly specialized in different areas of the design discipline, with specialized tools used in each area. This mirrors the restructuring of the whole computer industry from vertical to horizontal.

In the 80's, the CAD industry itself was nascent at best. Some areas, like schematic or layout entry, had solid commercial offerings. However, this young industry was evolving so rapidly in complexity that there was little hope for commercial tool offerings at the time. Thus, most tools emerged from internal development, from external university research, or often from a co-evolving blend of internal work with external tools and research. While there were a number of corporate-university relationships at the time, none was as significant as that of Intel with U.C. Berkeley. In particular, Alberto and his collaborative research team consisting of Prof. Robert
Brayton, Prof. Richard Newton and many graduate students, had developed a strong partnership with Intel and its microprocessor teams. This long partnership with Intel stands as one of the most fruitful relationships in EDA, with fundamental breakthroughs in multiple elements of microprocessor logic, synthesis and layout. Many of these early successes brought enormous benefit to Intel. They eventually made their way into the EDA industry as key enablers of many EDA tools and of today's fab-less/ASIC/SoC semiconductor industry.

II. DESIGN ENVIRONMENT FOR THE EARLY X86 PROCESSORS

A. Inherited tools from memory chips

Intel's initial design environment was formed to serve the needs of memory chips. During the 70's, the primary CAD tools were layout capture and verification tools, used by draftsmen to generate and check mask layouts. These tools were put in place because the layouts were already too complicated to develop and maintain solely on paper or Mylar. Polygon-based layout representations therefore had to be stored and handled by computerized tools, initially on dedicated systems such as the Calma or Applicon.

Engineers did circuit and logic design at the transistor level, usually by hand, and produced hand-drawn transistor-level schematics for the layout designers. The engineers did most of their design work using pencil and paper. They also had circuit simulation tools derived from the industry-standard Spice [3] program, which originated in Don Pederson's group at U.C. Berkeley and was later refined by Newton, Alberto and students. Intel's version was known as ISPEC. With it, engineers could simulate and check logic behavior and timing waveforms for small circuits of up to a few hundred transistors.

As Intel started doing logic products, including the first microprocessors (the Intel 4004, 8008, and 8080), design engineers inherited all of these tools and methods, which had been conceived for memory chip design. Some engineers preferred to perform logic design using gate-level schematics. This met some push-back from the layout designers, who were familiar with transistor representations that directly matched the layout. Translating logic gate symbols into transistor structures was not trivial, because the early microprocessors and numeric co-processors (8087, 80387) were designed in NMOS technology. Circuit operation relied on device strength ratios, so each gate symbol had to be accompanied by specific transistor sizes. In addition, the prevailing design style used many complex gate pull-down devices, pass transistors for clocking structures, and dynamic circuits. It also used numerous other clever and often "tricky" structures that could not be cleanly represented by a simple logic gate abstraction. Consequently, logic design itself was performed at the transistor level (a.k.a. switch level). Even well-known techniques such as logic minimization by Karnaugh maps, which were taught at engineering schools, were not widely used by VLSI engineers in those NMOS days. The clever NMOS design tricks typically resulted in superior densities, albeit with the commensurate complexities they inherently carried with them.

B. Evolution of Intel's logic design and RTL modeling

Debugging the logic behavior of processor circuits by hand became too error-prone. Verifying that behavior by circuit simulation with continuous waveforms was too time-consuming. People at Intel therefore looked for an executable functional model. At that time, the mainframe computer industry was already using gate-level logic simulators, which used variable-delay models for TTL gates (made with bipolar junction transistors). An attempt to adopt logic simulation at Intel ended in failure. A gate-level logic simulator called LOCIS was developed at Intel in the mid-1970's, and the 8086 design engineers converted their transistor-level schematics into an equivalent logic model using LOCIS gate models. However, the simulator's generic gate models did not match the tricky MOS logic structures of the 8086 schematics. Its gate-delay models also burdened the users with too many irrelevant timing-related messages and glitch warnings.

After this experience, engineers turned to building functional models with general-purpose programming languages. One of the first Register-Transfer Level (RTL) models at Intel was developed for the 8087 numeric co-processor in 1978. It was a FORTRAN program that described the logic behavior of circuits, as extracted by human interpretation of the transistor-level schematics. It was used for verifying and debugging the microcode programs stored on the chip.

In the design of the 80286 processor, the starting point was already a functional RTL model. This model was manually translated into schematics in a top-down fashion, rather than the other way around! The model was written in MainSail [4], an Algol-like general-purpose programming language derived from Stanford's AI Language (SAIL). The RTL was modeled and simulated by a compiled program in a standard language, in which logic propagation between gates is assumed to occur without any delay. This was made possible by a strictly synchronous design methodology with two non-overlapping clock signals, Phi1 and Phi2. During each clock phase, new signal values could propagate through the logic network. The logic designer only cared about the final, steady-state values, which were latched at the end of the phase. As a separate task, someone (a circuit designer) had to ensure that the cycle time was long enough for the circuit to reach a steady state in each phase.

The RTL program simulated the circuit behavior at cycle-by-cycle timing resolution by invoking code for each clock phase in turn. This approach was inspired by Mead and Conway's famous book [5]. Today, this approach seems trivially obvious. However, in that era logic design was typically done in the context of detailed timing-dependent behavior, where timing and logical function were verified simultaneously. With the synchronous methodology, separating functional simulation from timing issues enabled successful large-scale design. It also created two kinds of engineers, who could worry about two
separate problems. The logic designers focused on functional correctness, and the circuit designers focused on transistor sizes, voltage levels, parasitic capacitances and gate delays. This kind of separation of concerns continues to be a powerful mechanism in design automation.

MainSail supported dynamic linking of separately compiled modules. Taking advantage of this, RTL models of large circuit blocks were coded as program modules, and a simulator, SIM [6], was developed to control and monitor their execution. The first Intel design to use such a scheme was the iAPX 432 chip set, developed in Oregon and released in 1981.

In the 80286 design, the blocks of RTL were manually translated into a schematic design of gates and transistors. That design was manually entered into the schematic capture system, which generated netlists of the design. The schematics were simulated with the switch-level simulator MOSSIM [7] and compared to the RTL design on a per-clock, per-signal basis. This was a laborious procedure, but it verified the logical integrity of the RTL against that of the entered schematic design. Design changes were always challenging, as they had to be synchronized between the RTL and schematic databases.

There was a separate path for the handful of programmable logic arrays (PLAs). The PLA functions were optimized using the internal LOGMIN tool, which automated the logic minimization process. The resulting PLA codes were loaded into the RTL as a macro function and into the schematic system, and were used to program the PLA arrays in the layout. Much of the early automation in PLA synthesis at Intel was enabled by Alberto's U.C. Berkeley research. This included two-level logic minimization by Espresso [8] and physical automation (e.g., PLA folding [9]), which made large control circuit synthesis using PLAs practical.

C. The issue of performance verification

The RTL-based functional design methodology separated the issue of timing from the issue of functional correctness. It assumed that the synchronous methodology was enforced and that the clock was slow enough for all logic paths to settle to a steady logic state within each clock phase. However, critical paths received only modest attention during the design phase at this time. Tools were lacking, and designers relied on their knowledge of the design and of the "likely" critical areas. The large majority of critical paths were not fixed until they were discovered on silicon. The clock could be slowed down until no critical path failure existed. The clock frequency was then raised, while specific clock pulses were extended to help isolate the failing circuit. For example, the 49th clock pulse during the test program could be made longer, to allow a slow logic operation somewhere in the chip to complete. This was done with special clock-stretcher debugging equipment. However, the 286 design had many second sources, and very quickly those manufacturers were finding clever ways to speed up their designs to rival Intel's. This led to a minor crisis within Intel, as the industry was quickly putting pressure on Intel in the architecture and design it had created, and the tools to dig into this problem were weak and laborious.

This crisis triggered the introduction of Static Timing Analysis at Intel and the development of the Coarse-Level Circuit Debugger (CLCD) tool [10]. CLCD was a schematic-level analysis tool for electrical rule checking and critical path finding, which could discover circuit-level bugs and resolve device sizing issues. It could also extract the logic functionality of transistor-level circuit structures and represent it as logical expressions. However, the new capabilities were only applied in the next-generation microprocessor, the 386, which was no longer in NMOS but rather in CMOS.

III. THE 386 DESIGN ENVIRONMENT

When moving to the 386 during 1982, the design team quickly ported the 286 design modules to the 386 design environment as a starting point. Some of these blocks, in particular those for the complex memory protection model of the 286, made it to the final 386 with minimal changes. However, most of the rest of the design went through radical changes with the move to a 32-bit datapath width and the introduction of the flat paging model. The design work iterated rapidly, with the RTL at the center of the logic design team's efforts. For the first time, RTL simulation dominated the design team's overall computing load, as logical correctness became the focus of the team's activity.

With the team focused on RTL design and the substantial complexity increase from the 286, the question was how to translate more effectively to schematics and to the logical representation of the chip. In particular, we were looking to accelerate the design process, minimize manual translation errors, and handle the rapidly increasing design complexity. With these goals in mind, the relationship with U.C. Berkeley and ASV quickly became central to our efforts. Albert Yu (manager of the Microprocessor Division) and Pat Gelsinger (leading new design methods in Corporate CAD at the time) visited Berkeley. They explored some of Alberto's research work and its affinity to our problems, as well as the possibility of partnering on these challenges.

The meeting focused on topics such as the regularization of layout and the potential use of YACR (Yet Another Channel Router) [11], TimberWolf [12], and logic synthesis. It also covered the potential of multi-level logic synthesis, in which the path between input and output could propagate through several logic gates rather than just two as in a PLA. Albert Yu's position was that Intel needed to keep a two-year beat for developing each new microprocessor. He believed that the only way to keep the beat was to introduce new tools and methods. Albert and Pat fully appreciated the potential of multi-level logic synthesis and of regular layout. Albert proposed to support the research at U.C. Berkeley, to introduce multi-level logic synthesis and automatic layout for the control logic of the 386, and to set up an internal group to implement the plan, although Alberto pointed out that multi-level synthesis had not been released even internally to
other research groups in U.C. Berkeley. The design manager of the project, Gene Hill, put Alberto on a consulting contract to facilitate the above topics. Alberto also reviewed the overall floor plan to better understand how advanced CAD methods could be applied more broadly to the design.

It is important to note that with the 386, the era of CMOS began at Intel. We were still far from the power wall of the early 2000s, but NMOS power was increasing at a near-exponential rate. CMOS brought with it a reasonable P-type device and a strong bias towards complementary logic structures. These structures eliminate steady-state power dissipation, give symmetric rise and fall times, and produce full-swing logic voltage levels regardless of transistor sizes and transition speeds. With CMOS, there was much less benefit to gain from a cleverly ratioed design. There were still arguments for complex domino-type design approaches. Nevertheless, the inherent nature of CMOS design drove a strong move toward a standard set of gates from a cell library, rather than the individually sized and customized gate structures common in the days of NMOS.

Working with a cell library, we could employ U.C. Berkeley tools such as Espresso for logic minimization and TimberWolf for cell placement by simulated annealing. We were quickly demonstrating large regular blocks of reasonably well optimized logic designs. The idea of simulated annealing seemed rather chaotic at best, but the results were quite good. An oft-repeated lesson in science and engineering is to apply proven techniques from other fields to similar problems in your own. In this case, simulated annealing proved to be the perfect answer. With ample computing cycles available on the IBM 3081, one could experiment at length with its parameters to find ever better layouts. After global placement by TimberWolf, detailed placement into standardized rows of standard cells and routing channels was done with a tool called P3APR. P3APR was developed by Manfred Wiesel, who came to Intel from the BellMac project at AT&T.

In fact, the results were good enough that the design team eliminated all the small PLAs from the 286 and simply converted them to interconnected logic gates (i.e., random logic). This made the logic blocks larger, with greater potential for further logic design optimization. Only the I/O ring, the data and address path, the microcode array and three large PLAs were not taken through the synthesis tool chain on the 386. There were many early skeptics, but the results spoke for themselves.

With the layout of standard cell blocks generated automatically, the layout and circuit designers could focus narrowly on the highly optimized blocks, such as the datapath and I/O ring, where their creativity could have much greater impact. These few large blocks also greatly simplified the overall chip floor planning effort, allowing much faster final chip assembly with far fewer errors. Final connectivity was verified by an in-house program called CVS, written by Todd Wagner [13].

Today the 386's 275,000 transistors seem trivial, but at the time the chip was a monumental feat, breaking ground in performance, ISA compatibility and design methodology.

Figure 1: Intel 80386 Processor – Taking a clockwise path around the chip: the upper right was bus interface and instruction decode, the lower right was test and control logic and the large microcode ROM, and the lower left was the data path for primary instruction execution. Moving up the data stack on the left of the chip came the segment and virtual address generation, and finally, in the top left, paging and final physical address generation. Synthesized random logic blocks stand out clearly in the middle, given their rows of cells and routing channel characteristics. Photo courtesy of Intel Corporation.

IV. THE 486 DESIGN ENVIRONMENT

A. The challenge of logic design effort in the 486

The 386 design heavily leveraged the logic design of the 286, but the 486 was a more radical departure. It moved to a fully pipelined design and integrated a large floating point unit. It also introduced the first on-chip cache, a whopping 8K byte write-through cache used for both code and data. Substantially less of the design was leveraged from prior designs, and the transistor count increased 4X, so there was enormous pressure for yet another leap in design productivity. We could have simply increased manpower. However, there were questions about our ability to afford those engineers, find them, train them, and then effectively manage the team. That team would have needed to be much larger than the roughly 100 people who eventually made up the 486 design team.

With this challenge in front of us, several aggressive goals were proposed to enable our small team to tackle the 486 design:
• A fully automated translation from RTL to layout (we called it RLS: RTL to Layout Synthesis)
• No manual schematic design (direct synthesis of gate-level netlists from RTL, without graphical schematics of the circuits)
• Multi-level logic synthesis for the control functions
• Automated gate sizing and optimization
• Inclusion of parasitic element estimation
• Full chip layout and floor planning tools

To execute this visionary design flow, we needed to put together a CAD system that did not exist yet. We traveled once more to our now good friend Alberto at U.C. Berkeley to extend our previous collaboration with new tool development. A liaison from Intel (Gary Gannot) was stationed in Berkeley for two years as a participant in Alberto's research team.

While we were working on the 386, academic CAD research was going through a major renaissance at U.C. Berkeley. The original CAD research there was being combined into the “Berkeley Synthesis Project”, with a focus on merging logic synthesis and layout generation efforts. Dr. Robert Brayton had collaborated with Alberto at IBM in 1980-1982 and spent a sabbatical at Berkeley in 1985. He joined the U.C. Berkeley faculty full-time in 1987. The three main CAD professors, Alberto, Brayton, and Newton, then joined forces to build what became a highly prolific period in CAD. Alberto dubbed this era the “age of the heroes”, a “vibrant era of creativity and expansion”, in his tour de force DAC 2003 keynote speech. In hindsight, Alberto and his colleagues fostered strong industrial collaboration through their decision to make the results of U.C. Berkeley research (including software systems) freely available to everyone. Through this arrangement, the close technical collaboration between Intel and the U.C. Berkeley CAD group benefited both academia and industry, which in turn fueled even more research advances.

As the 486 project was starting in 1986, Gene Hill (Director of microprocessor development) was deliberating whether to take the full risk or to work on a conservative plan in parallel. Gary recalls: “He asked me if I felt comfortable that the code written by the students at Berkeley would be reliable enough in a production-worthy environment. Since I was proud to be part of the MIS team, I immediately responded that I felt very comfortable.” Finally, Hill decided to go for it. He transferred “open requisitions” to hire 15 engineers from his budget to the Corporate CAD department. Gene agreed with Albert Yu and Mike Aymar (who headed Corporate CAD) to form a central methodology development group under Rafi Nave, with Pat Gelsinger and Jim Nadir at its center. Jim Nadir's primary focus was on library and physical design. Pat Gelsinger was in charge of the methodology and the tools, working closely with the CAD teams in the US and Israel and with U.C. Berkeley and Alberto. He did not expect it at the time, but his next assignment would be managing the 486 design, so he quickly became the customer for the very tool chain he was driving.

B. Intel's Hardware Description Language

A major technical challenge in enabling a direct link from RTL to logic synthesis was the input language for RTL modeling. Languages like MainSail or general C didn't have the formalism required to describe synthesizable hardware. Languages like VHDL were still being invented at the time, but were considered hopelessly complex given the broad industry process being used to define them.

Thus, we launched the iHDL effort. The goal was a language definition with specifically the formalism required for synthesis. It needed clear semantics for items such as busses, native algebraic and Boolean logic functions, and the basic control flow mechanisms that a logic design required. The iHDL language, defined by Tzvi Ben Tzur, Randy Steck, Gadi Singer and Pat Gelsinger, fit the bill. In a series of summits between Israel, Oregon and Santa Clara in 1985 and 1986, we converged on a language definition while the CAD team in Israel developed the language compiler. The result was a formal language description for RTL development and for logic/layout synthesis from that description. U.C. Berkeley's adoption of a standard intermediate format for logic representation was a key enabler for Intel (and others) to develop higher-level description languages. Remarkably, Intel did not replace iHDL with Verilog until 2005, simply because of iHDL's expressive completeness and effectiveness for synthesis. This gave the language a 20-year life.

C. Intel's first standard cell library

The vision of automatic conversion of RTL to layout also hinged on the existence of a standard cell library. The library cells had to serve multiple tools. They needed a standard “height” and standard ports to enable automatic placement and routing. Their delay characteristics had to be modeled for static timing analysis, and the whole library had to serve as input to the logic synthesis tools. Beyond this, a decision was made to develop a single library for use by multiple design teams across Intel, to gain productivity from large-scale reuse and modularity. Given Intel's long history of individual transistor optimization, getting agreement on standard cells was no small assignment.

Jim Nadir in Corporate CAD was given the assignment to create the common cell library, working closely with people in Intel's Technology Development group in Oregon. This turned out to be one of the more difficult and political assignments anywhere in the company at the time, as each project group wanted to have some unique cells. The resistance to a standard cell library sounds absurd today, when libraries are offered to design houses as the basic access interface to semiconductor manufacturers.

D. Intel's adaptation of logic synthesis from Berkeley

We decided that our RLS system would be based on a tool called MIS (Multi-level Logic Interactive Synthesis System) [14]. MIS was actually an experimental workbench, being developed by graduate students at Berkeley, for executing various restructuring operations on combinational logic blocks. Gary, our liaison, regularly sent software releases of MIS from Berkeley, CA to the Intel
CAD team in Haifa, Israel. The team in Haifa wrote software programs to perform a series of tasks:
• compile iHDL models into intermediate data structures
• decompose the compiled blocks into separate combinational blocks and sequential elements (latches or flip-flops)
• feed each combinational block into MIS for logic minimization (the output was a network of generic NAND gates)
• convert the generic form into a combination of actual gates from the library
• combine all the results, along with the sequential elements, into a final netlist that could be handed over to the layout synthesis tools

The library mapping step was developed by U.C. Berkeley at our request [15], as it was essential for our program.

A key challenge in applying logic synthesis to our industrial design was the clocking style. Intel designs commonly used transparent latches to allow more flexibility in the number of logic levels between state elements. Furthermore, skew penalties apply only once to a loop of transparent latches, rather than to every sampling element as with flip-flops, where the "hold time" is wasted in each flop. Finally, latches were smaller. Yet this made synthesis more complex, because it had to cope with a two-phase clocking system both at the logic level and during place-and-route. Design debate raged then, as it does now, about whether to flop or to latch in any given design, so our CAD programs had to cope with both. We also needed to address timing, parasitic estimation and the automated sizing of gates. For this we used the internally developed CLCD tool [10]. In addition, we developed a central timing tool called TISS, which managed the global timing signal requirements. Some of these requirements were generated automatically from the synthesis, and some were determined globally by external requirements. Others came from critical design signals, such as critical datapath signals, that were highly optimized by manual circuit design and layout.

Much of the RLS integration and development effort was done in Israel, owing to the central role of the CLCD tool and the relative stability of the other tools in the flow. Pat Gelsinger recalls some of the sensitivity involved in working across a geographical and cultural barrier. He says: “I demanded the CLCD team work directly for me as I knew how central it was to the overall flow. Mike Aymar, who ran corporate CAD at the time, refused, claiming I needed to learn how to manage indirectly and through influence. I wanted to kill him at the time, knowing the Israelis were tough and remote and I didn't have time for such nonsense if we were going to pull the overall RLS system off in time for the 486 program to start up on it. It was a valuable learning and overall pretty well. In those days I got hooked on email. I remember describing to Andy Grove how amazing it worked in allowing folks to communicate between various sites and time zones. He didn't buy it at the time. This was one of the rare times, maybe the only time, I anticipated the importance of an emerging trend before he did!”

E. Physical design automation in RLS

With the advent of multi-layer metal process technologies, layout synthesis became competitive with manual layout artwork. The complexity of creating dense designs now made automation more suitable and acceptable to engineers. At the time, Intel still relied on manual effort for more regular structures such as memories and datapaths, but control logic was synthesized at both the logic and layout levels. Place and route algorithms came from Alberto's students at U.C. Berkeley. TimberWolf used simulated annealing and was used directly and heavily to create optimized placements. The routers were written for industrial use, but their algorithms were heavily based on technology from Alberto's group. This included the famous YACR2 algorithm [11] and the Chameleon [16] multi-layer approach by Doug Braun, who joined Intel in 1987 and wrote most of the routing compaction algorithms.

The physical design automation software was written in MainSail, as were most CAD tools at Intel at the time. The team, led by Manfred Wiesel, produced a series of capabilities. The DAPR [17][18] standard-cell tool placed and routed blocks of several thousand standard cells in double-back rows (shared power supply), with diffusion sharing, routing over the cell, and double-layer metal technology. On the 486 we developed, for the first time, a full-chip floorplanning and assembly tool called ChPPR [19]. ChPPR used a ½-design-rule boundary abstraction to create correct-by-construction abutting block placements and over-the-cell routing. It also provided a hierarchical global abutment and connectivity check that bypassed traditional connectivity verification [13]. Because it eliminated layout extraction, this check was orders of magnitude faster. The ChPPR hierarchical tool was actively used on mainstream microprocessors at Intel until about 2005, almost twenty years.

F. Complete RLS flow for random logic synthesis

All these tools were stitched together into a system called RLS. RLS was the first RTL-to-layout system ever employed in a major microprocessor development program, although similar synthesis projects were implemented at several other companies in the 1980s. RLS was used only for control logic in the 486 chip,
and development experience for me as a manager, even if I covering the most complex and tedious logic design effort,
despised Aymar for at least a year for making me live while the highly regular data path was done manually for
through such a challenging management experience”. achieving high density and speed.
Aymar put a strong emphasis on continually pulling the RLS succeeded because it combined the power of three
teams together, between Oregon, California and Israel. He essential ingredients:
recalls: "This placed significant demands on people's  CMOS (which enabled the use of a cell library)
personal lives as they had to spend quite a bit of extended  A Hardware Description Language (providing a
time in remote sites from their home site. It worked though, convenient input mechanism to capture design intent)
• Synthesis (which provided the automatic conversion from RTL to gates and layout)
This was the “magic and powerful triumvirate”. None of these elements alone could revolutionize design productivity; a combination of all three was necessary! These three elements were later standardized and integrated by the EDA industry. This kind of system became the basis for the entire ASIC industry, and the common interface for the fab-less semiconductor industry.
Figure 2: Intel486 Processor. Counter-clockwise from top-left: memory interface, 8K unified cache, floating point unit. At the bottom right is the decode logic, with the microcode ROM at the bottom; these are mostly hand-crafted. Going up, the die splits into datapath on the left (hand-crafted) and control on the right (all synthesized, with a small hand-crafted section in the middle of the die), with crossing control signals handled by full-chip assembly. Photo courtesy of Intel Corporation.
At the end of the design, Pat, Gene and Alberto were featured in a video that Intel distributed worldwide to universities [20]. The video described how microprocessor design was done at Intel, and how we had revolutionized CAD by working with Alberto's team to bring in new technology, delivering stunning acceleration in the 486 program. Our commercial-to-academic collaboration was widely recognized in the industry as extremely effective. As part of that video, Pat joked about that “small school in the Bay”. For a Stanford graduate, a partnership with U.C. Berkeley might seem a bit unexpected. However, with our US-to-Israel, U.C. Berkeley, and commercial-to-university collaborations, we had created an extraordinary sense of teamwork, crossing numerous unwritten barriers to diversity and creativity.
V. DESIGN ENVIRONMENT OF PENTIUM PROCESSORS
The 486 processor was followed by Pentium, Pentium Pro and more advanced generations, which integrated numerous architectural extensions and continuously increased complexity. It is interesting to note that the same basic design methodology and design flow has remained in effect through all of those generations, while the initial set of tools was replaced by more robust and better integrated tool sets. As the EDA industry has matured, some of the in-house tools were replaced by commercial tools. Starting with the Pentium generation, the two-phase clocking scheme was largely replaced by a single clock and master-slave flip-flops, which were simpler to synthesize and check, and are easily supported by commercial tools. RTL remains the primary entry point into the design cycle. No higher-level synthesis has emerged in the design of processors, although higher-level models are used in defining and verifying system architectures.
The first Pentium was a superscalar microprocessor design, and the micro-architecture included new features like microcode-based instructions, a fast 64-bit external data bus and a completely revamped Floating Point Unit with unprecedented levels of performance (e.g., FMUL had about 15 times the throughput of the 486). At 3.1 million transistors, Pentium required stronger EDA capabilities. Avtar Saini, the Pentium design manager, met Gadi Singer, who had relocated from Israel to California in the summer of 1990, designated to be the next Intel liaison in Alberto's group. Avtar talked to Gadi at Intel's Santa Clara cafeteria on the evening before Gadi was to drive to Berkeley, and convinced him to retarget his stay and become the Pentium DA manager. That shift did not turn out to be a total loss for the Intel-Berkeley interaction, as the Pentium DA team continued a very deep and effective interaction with Alberto, Newton, and the rest of the U.C. Berkeley team.
Logic and layout synthesis for the control circuits in the Pentium could be performed by the RLS flow, and was no longer a problem. The productivity bottleneck for the Pentium design was mainly in the much more complex datapath circuits, which were still designed at the schematic level, by manual conversion of the RTL model. In particular, the translation of schematics to layout was too slow. The layout designers were using a new symbolic editor, but due to well-entrenched practices they continued to lay down wires and gates in a polygon-oriented manner. With a combination of basic training and a set of automation tools to aid symbolic layout, productivity tripled in a matter of weeks. This was an important lesson for the future: the human factor is a major aspect in getting value out of new capabilities.
Manually designed datapath circuits had to be checked to verify that their behavior was identical to the RTL model. This area required substantial investment in developing test vectors that would be executed on both the schematic and the RTL and cover all functionality branches with high coverage. Simulating the schematics at switch level was a major sink of computing resources, and incomplete coverage left holes in verification that manifested as circuit bugs.
Gadi developed a new technology to formally and completely validate the correlation between schematics and RTL. It was a combination of two existing capabilities in a brand new context. First, the datapath circuit schematics were automatically analyzed for their logic expressions and translated into an RTL representation [10], [21]. Then, the extracted logic models (in iHDL) were fed into the logic synthesis programs co-developed with Alberto's group, which could take two logic descriptions, turn them into canonical form and compare them mathematically. By using this new Schematic Formal Verification (SFV) functionality, all circuits that reside between latch/memory elements could be fully verified against their original RTL descriptions without a single simulation cycle. This removed a whole domain of investment during the Pentium project: test development for schematic verification was reduced to zero, run time dropped to a fraction of that of the previous dynamic verification, and the quality level moved toward zero schematic mismatches.
Still, functional verification of the full-chip RTL model has grown non-linearly with the size of the processor. The importance of verification was exemplified by the infamous “Pentium FDIV bug”, where a rare and minute numerical inaccuracy in some mathematical calculations created a business crisis. The technical challenge then was to formally verify that the floating-point arithmetic logic, as well as all associated microcode, was functionally correct. This spawned another phase of Intel's close collaboration with academic researchers [22] (though not with a U.C. Berkeley emphasis), which led to the creation of the Intel Strategic CAD Labs.
Formal verification looked like a promising approach. In principle, it is a static method which examines the design without simulating its behavior over time and does not require test inputs. However, the promise did not fully materialize. Functional equivalence checking of RTL to gates has been added to the design flow as a static check. Widely used at Intel, SALT and PEPPER are two internally developed tools for combinational and sequential equivalence checking, respectively. However, dynamic verification remains the main way to address the functional verification problem. Formal techniques were helpful for property checking by simulation instrumentation (tracking violations of formally specified properties). RTL simulations, carefully designed to “cover” the enormous space of processor states and logical conditions, remain the primary verification vehicle, and they still consume more than 80% of the computing resources.
Yet another area which became critical in the Pentium era was full-chip timing and modeling of interconnects for static timing analysis. In previous products, smaller circuits were designed using accurate extractions, but large-scale static timing analysis was based on a simplified lumped-capacitance extraction model. However, this was insufficient to support the aggressive timing requirements and the new cross-unit interdependencies that introduced many long-haul signals. Distributed RC extraction and modeling was introduced for the Pentium, as was power grid analysis.
It is important to note that since the 1990s a very significant productivity gain has been achieved by increasing the computational power available to design teams, so the computing environment at Intel design centers deserves attention. Interestingly, beginning with the 386 development, Intel adopted UNIX as its primary engineering development environment because it was more flexible and engineering-oriented. In fact, Pat's entry to the design team was because of their zeal for UNIX. The 386 design team was fed up with the DEC 20 and the IBM CMS environment, and was highly attracted to the flexibility of the UNIX environment. However, the only machine big enough at the time (and available) was the IBM 370-168, later replaced with a 3081. Given that Pat was a bit of a UNIX hacker at the time, he set up the entire design team inside his CMS account, which was running the UTS UNIX environment from Amdahl. Thus, he was 'root' on the UTS environment for the entire 386 design team. Everyone was extremely motivated to get to UNIX and thus quickly overlooked Pat's naiveté in logic design as a way to get away from the Corporate IT environment. “Live Free or Die” UNIX license plates commonly adorned design team members' offices.
Late in the 486 design, and entirely for the Pentium generations, Intel's whole computing environment was moved to local UNIX workstations. In addition to the interactive performance, the design team was extremely motivated to develop the 486 on 386 machines, Pentium on 486 machines and so on. The functional simulations could be easily partitioned into different jobs for running on different workstations. A major invention at Intel was called NetBatch. The idea was to utilize all of the engineering workstations at Intel world-wide as a virtual pool for running verification tasks in parallel, exploiting time-zone differences among sites. This is conceptually similar to grid and cloud computing, which became commercially available several years later.
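Returning to the Schematic Formal Verification (SFV) idea described above (extracting logic from schematics and comparing it with the RTL in canonical form), the principle can be illustrated with a small Python sketch. It assumes reduced ordered binary decision diagrams (ROBDDs) as the canonical form; the text does not say which representation the Berkeley-derived synthesis programs used, and every name below (`BDD`, `rtl_carry`, `schematic_carry`, `buggy_schematic_carry`) is hypothetical. Because both descriptions are built over the same variable order, equivalence reduces to comparing node ids, with no test vectors and no simulation cycles.

```python
# Sketch: combinational equivalence via a canonical form (ROBDD).
# All names here are hypothetical illustrations, not Intel's SFV tool
# or the Berkeley synthesis programs.


class BDD:
    """Tiny reduced ordered binary decision diagram (ROBDD) manager.

    Ids 0 and 1 are the FALSE/TRUE terminals. Nodes are hash-consed and
    redundant tests are dropped, so two functions built over the same
    variable order are equivalent exactly when they get the same id.
    """

    def __init__(self, var_order):
        self.level = {name: i for i, name in enumerate(var_order)}
        self.nodes = [None, None]  # id -> (level, low, high); 0/1 terminals
        self.unique = {}           # (level, low, high) -> id
        self.cache = {}            # memo table for ite()

    def _mk(self, lvl, low, high):
        if low == high:            # both branches equal: test is redundant
            return low
        key = (lvl, low, high)
        if key not in self.unique:
            self.unique[key] = len(self.nodes)
            self.nodes.append(key)
        return self.unique[key]

    def var(self, name):
        return self._mk(self.level[name], 0, 1)

    def _top(self, u):
        # Terminals are ordered below every variable.
        return self.nodes[u][0] if u > 1 else len(self.level)

    def _cof(self, u, lvl):
        # (low, high) cofactors of u with respect to the variable at lvl.
        if u > 1 and self.nodes[u][0] == lvl:
            return self.nodes[u][1], self.nodes[u][2]
        return u, u

    def ite(self, f, g, h):
        """If f then g else h: every Boolean operator reduces to this."""
        if f == 1:
            return g
        if f == 0 or g == h:
            return h
        if (f, g, h) not in self.cache:
            lvl = min(self._top(f), self._top(g), self._top(h))
            f0, f1 = self._cof(f, lvl)
            g0, g1 = self._cof(g, lvl)
            h0, h1 = self._cof(h, lvl)
            self.cache[(f, g, h)] = self._mk(
                lvl, self.ite(f0, g0, h0), self.ite(f1, g1, h1))
        return self.cache[(f, g, h)]

    def NOT(self, f):
        return self.ite(f, 0, 1)

    def AND(self, f, g):
        return self.ite(f, g, 0)

    def OR(self, f, g):
        return self.ite(f, 1, g)

    def XOR(self, f, g):
        return self.ite(f, self.NOT(g), g)


def rtl_carry(m, a, b, c):
    # "RTL" view: full-adder carry-out written as a majority function.
    return m.OR(m.OR(m.AND(a, b), m.AND(a, c)), m.AND(b, c))


def nand(m, x, y):
    return m.NOT(m.AND(x, y))


def schematic_carry(m, a, b, c):
    # Logic as it might be extracted from a NAND-based schematic.
    return nand(m, nand(m, a, b), nand(m, c, m.XOR(a, b)))


def buggy_schematic_carry(m, a, b, c):
    # Same structure with one miswired input (c instead of b in the first NAND).
    return nand(m, nand(m, a, c), nand(m, c, m.XOR(a, b)))


m = BDD(["a", "b", "c"])
a, b, c = m.var("a"), m.var("b"), m.var("c")
ref = rtl_carry(m, a, b, c)
print(schematic_carry(m, a, b, c) == ref)        # True
print(buggy_schematic_carry(m, a, b, c) == ref)  # False
```

Cutting the design at latch and memory boundaries is what keeps the check purely combinational: each cone of logic between state elements is compared on its own. ROBDD size depends on the variable order and can grow exponentially for some functions (multipliers are the classic example), so this sketch only illustrates the principle rather than a production-scale checker.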
Figure 3: Intel Pentium Processor. Counter-clockwise from top-left: floating point unit (hand-crafted datapath on the left and synthesized controls on the right). The middle of the die consists of the primary datapath (hand-crafted, on the right) with a control section on the left (all synthesized) and a channel for chip assembly. The top right consists of an 8K data cache. The bus interface logic resides below the data cache. The 8K instruction cache occupies the lower right of the die. The instruction fetch and the branch target buffer memory are on the lower left. The microcode ROM and logic were drawn below the floating point unit. Photo courtesy of Intel Corporation.
VI. DISCUSSION AND TRENDS
At each step of the CAD evolution, higher productivity was enabled by increased automation, which leveraged increasing compute capacity, higher abstraction, higher regularity, more usage of hierarchy, and a more disciplined and restrictive methodology.
In the evolution we have described, using RTL instead of schematics was an example of higher abstraction. Using a cell library was an example of higher regularity. Hierarchical decomposition (“divide and conquer”) was achieved when complex problems were divided into independent pieces (e.g., separation of logic verification from timing verification, separation of logic synthesis from library mapping). This decomposition led to specialization in the expertise of engineers: for example, due to RTL and synthesis, logic designers have become programmers.
Examples of a restrictive methodology are numerous: the synchronous design paradigm, the specific iHDL language design for synthesis, the cell library, and static CMOS all involve some self-imposed restrictions as part of the engineering discipline. Disciplined restrictions are essential in every methodology. However, the introduction of new methods and tools did not proceed smoothly, but rather encountered skepticism and resistance from designers who did not want to give up their work habits, their control of details and their wild creative “rights”. They did not want to accept the standards/restrictions of new methodologies (which were chosen in order to save verification effort and allow automation). This kind of conservatism goes together with risk-avoidance, as people stick to their familiar methods and tools, trying to minimize risks in large-scale engineering programs.
It is also interesting that while manufacturing technology scaling proceeded predictably via coordinated efforts, with Moore's law and Dennard's theory as a top-down roadmap and a strategic guideline, the evolution of CAD and design methodology happened bottom-up and via numerous controversies.
Finally, many of the breakthroughs described in this paper were only accomplished by significant cross-discipline cooperation. The design teams took significant risks in embracing new methods that were yet to be proven. Design tools were being invented simultaneously with the design teams' requirements.
Collaboration between Intel, Alberto and U.C. Berkeley continues to this day in a broad range of areas of computer architecture, and in particular in the area of platform-based design. It is very likely that in order to achieve the next step function in design productivity, people in the electronic design community will have to take such radical co-development risks once again in large-scale engineering programs where failure is not an option. With such risky endeavors, the “era of heroes” may be upon us once more.
ACKNOWLEDGMENTS
The authors wish to thank their many collaborators on the design teams from across Intel and U.C. Berkeley. Such friends and memories stand as some of the finest of our collective careers. Further, such an overview paper is prone to errors of memory. While the authors have attempted to be as accurate as possible, we are certain that errors of recollection exist and that there are significant contributions that should be recognized and better chronicled for a better history of CAD and microprocessor development.
REFERENCES
[1] A. S. Grove, Only the Paranoid Survive: How to Exploit the Crisis Points that Challenge Every Company, Doubleday, 1996.
[2] G. Moore, “Cramming more components onto integrated circuits,” Electronics, vol. 38, no. 8, April 19, 1965.
[3] T. Quarles, A. R. Newton, D. O. Pederson, and A. Sangiovanni-Vincentelli, “SPICE3B1 User's Guide,” Univ. of California, Berkeley, Apr. 1987.
[4] C. R. Wilcox, M. L. Dageforde, and G. A. Jirak, “Mainsail Implementation Overview,” Stanford Computer Systems Laboratory Report No. CSL TR-167, March 1980.
[5] C. Mead and L. Conway, Introduction to VLSI Systems, Addison-Wesley, 1980. (Out of print; pre-print drafts available: http://ai.eecs.umich.edu/people/conway/VLSI/VLSIText/VLSIText.html)
[6] K. Tham, R. Willoner, and D. Wimp, “Functional Design Verification by Multi-Level Simulation,” Proceedings 21st Design Automation Conference, 1984, pp. 473-478.
[7] R. E. Bryant, “MOSSIM: A switch-level simulator for MOS LSI,” Proceedings of the 18th Design Automation Conference, 1981, pp. 786-790.
[8] R. K. Brayton, G. D. Hachtel, C. T. McMullen, and A. L. Sangiovanni-Vincentelli, Logic Minimization Algorithms for VLSI Synthesis, The Kluwer International Series in Engineering and Computer Science, vol. 2, Boston, MA: Kluwer Academic Publishers, 1984.
[9] G. D. Hachtel, A. L. Sangiovanni-Vincentelli, and A. R. Newton, “Some results in optimal PLA folding (Invited Paper),” in Proc. IEEE Intl. Conf. on Circuits and Computers (ICCC '80), vol. 2, New York, NY: IEEE, 1980, pp. 1023-1027.
[10] A. Kolodny, R. Friedman, and T. Ben-Tzur, “Rule-based Static Debugger and Simulation Compiler for VLSI Schematics,” Proceedings of the IEEE International Conference on Computer-Aided Design (ICCAD), Santa Clara, CA, Nov. 1985.
[11] J. Reed, A. L. Sangiovanni-Vincentelli, and M. Santomauro, “A New Symbolic Channel Router: YACR2,” IEEE Transactions on CAD of Integrated Circuits and Systems, 1985, pp. 208-219.
[12] C. Sechen and A. L. Sangiovanni-Vincentelli, “TimberWolf 3.2: A New Standard Cell Placement and Global Routing Package,” Proceedings 23rd Design Automation Conference, 1986, pp. 432-439.
[13] T. J. Wagner, “Hierarchical Layout Verification,” Proceedings 21st Design Automation Conference, 1984, pp. 484-489.
[14] R. K. Brayton, R. Rudell, A. L. Sangiovanni-Vincentelli, and A. R. Wang, “MIS: A multiple-level logic optimization system,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. CAD-6, no. 6, Nov. 1987, pp. 1062-1081.
[15] E. Detjens, G. Gannot, R. Rudell, A. Sangiovanni-Vincentelli, and A. Wang, “Technology mapping in MIS,” Proceedings of the IEEE International Conference on Computer-Aided Design (ICCAD-87), Nov. 1987, pp. 116-119.
[16] D. Braun, J. Burns, S. Devadas, K. H. Ma, K. Mayaram, F. Romeo, and A. L. Sangiovanni-Vincentelli, “Chameleon: A New Multi-Layer Channel Router,” Proceedings 23rd Design Automation Conference, 1986, pp. 495-502.
[17] M. Rose, M. Wiesel, D. Kirkpatrick, and N. Nettleton, “Dense, Performance Directed, Auto Place and Route,” IEEE Custom Integrated Circuits Conference, 1988, pp. 11.1.1-4.
[18] M. Rose, N. Papakonstantinou, G. Wellington, D. Kirkpatrick, and M. Wiesel, “Synthesis for High Performance Random Layout,” Proceedings IEEE International Symposium on Circuits and Systems, 1990, pp. 885-889.
[19] S. Meier, N. Nettleton, D. Kirkpatrick, and D. Braun, “ChPPR - Chip Planning, Placement and Routing,” IEEE Custom Integrated Circuits Conference, Section 2, 1990.
[20] G. Hill, “Design and Development of the Intel 80386 Microprocessor,” (video recording) University Video Communications, Stanford, CA, 1988.
[21] D. Fischer, Y. Levhari, and G. Singer, “NETHDL: abstraction of schematics to high-level HDL,” Proceedings of the Conference on European Design Automation, 1990, pp. 90-96.
[22] Y. Chen, E. M. Clarke, P. Ho, Y. V. Hoskote, T. Kam, M. Khaira, J. W. O'Leary, and X. Zhao, “Verification of All Circuits in a Floating-Point Unit Using Word-Level Model Checking,” Proceedings of Formal Methods in Computer-Aided Design, 1996, pp. 19-33.
Pat Gelsinger has been President and COO of EMC Corporation's Infrastructure Products since 2009. Pat held numerous roles at Intel in his nearly 30 years with the company, including Sr. VP and GM of the Digital Enterprise Group, first-ever CTO of Intel, CTO of the Intel Architecture Group, GM of Desktop Products, design manager for the Pentium Pro and the 80486, architect of the 80486, and CAD logic methodology manager and designer on the 80386. Pat has a Master's in EECS from Stanford, a BS in EECS from Santa Clara and an honorary doctorate from William Jessup University. He has received a variety of industry recognition awards, has published several books and many papers, and is an IEEE Fellow. He is married with four adult children.
Desmond Kirkpatrick is a Principal Engineer responsible for Intel's research roadmap in design efficiency. From 1991 to 1999, he was a member of the Pentium Pro and Pentium 4 microprocessor design teams, contributing to full-chip assembly and interconnect performance management as well as to the specification of Intel's 130 nm process technology. In 1999, he became Intel's first Technical Liaison to the Gigascale Silicon Research Center at U.C. Berkeley. In 1986, he joined Intel, where he contributed to hierarchical, full-chip timing analysis, floor-planning, layout synthesis, and extraction, earning two Intel Achievement Awards. He received the S.B. degree in electrical engineering from Massachusetts Institute of Technology in 1986 and the Ph.D. degree in electrical engineering and computer sciences from U.C. Berkeley in 1997.
Avinoam Kolodny is an associate professor of electrical engineering at Technion – Israel Institute of Technology. He joined Intel after completing his doctorate in microelectronics at the Technion in 1980. During twenty years with the company he was engaged in diverse areas including non-volatile memory device physics, electronic design automation and organizational development. He pioneered static timing analysis of processors as the lead developer of the CLCD tool, served as Intel's corporate CAD system architect in California during the co-development of the RLS system and the 486 processor, and was manager of Intel's performance verification CAD group in Israel. He has been a member of the Faculty of Electrical Engineering at the Technion since 2000. His current research is focused primarily on interconnect issues in VLSI systems, covering all levels from physical design of wires to networks on chip and multi-core systems.
Gadi Singer is vice president of the Intel Architecture Group and general manager of the SoC Enabling Group for Intel Corporation. Singer joined Intel in 1983 and has held a variety of senior technical and management positions. He was appointed VP in 1999 and CTO of the Intel Communications Group in 2004, among other accomplishments. From 2005 through 2007, Singer served as general manager of the Ultra Mobility Group. Among his prior roles, Singer was GM of Intel's Design Technology Division, co-GM of the IA-64 Processor Division and GM of the Enterprise Processors Division. Singer received three Intel Achievement Awards for his technical contributions. He received his bachelor's degree in electrical engineering from Technion University, Israel, in 1983, where he also pursued graduate studies from 1986 to 1988.