
Design for Reliable Data Processing and Storage M

Prof. Cecilia Metra


cecilia.metra@unibo.it
Tel: 051 2093013

Design for Reliable Data Processing and Storage M Cecilia Metra

Class Goals
 Basic concepts for the design of high reliability integrated circuits (ICs):

 Why is reliability needed?

 ICs may be fabricated with faults or become faulty...

– Which kinds of faults? With which effects?
– How can they be detected?
– How can Integrated Circuits (ICs) and systems be designed to guarantee correct operation despite faults?
Class Goals (cnt’d)

 Possible Physical Defects and Fault Models

 Testing Approaches

 Design For Testability Approaches

 Fault Tolerant Design


Recommended Books
 J. Segura, C. F. Hawkins, “CMOS Electronics: How It Works, How It Fails”, IEEE Press / Wiley, 2004
 M. L. Bushnell, V. D. Agrawal, “Essentials of Electronic Testing”, Kluwer Academic Publishers, 2000
 M. Abramovici, M. A. Breuer, A. D. Friedman, “Digital Systems Testing and Testable Design”, Computer Science Press, 1990
 S. Mourad, Y. Zorian, “Principles of Testing Electronic Systems”, Wiley, 2000
 N. K. Jha, S. Kundu, “Testing and Reliable Design of CMOS Circuits”, Kluwer Academic Publishers, 1990
 P. K. Lala, “Self-Checking and Fault Tolerant Digital Design”, Morgan Kaufmann Publ., 2001

Introduction to Testing: Summary
 Testing and Reliability: challenges due to
continuous scaling of technology
 Testing definition
 Why Testing -- Examples
 Testing as a part of the VLSI chip fabrication flow
 Process Yield
 Cost of VLSI chip production
 Defect Level
 Testing Economical Impact
 Some Testing kinds
 Problems

Today’s Electronics


Today’s Electronics: Pervasive and Connected - Internet of Things (IoT)

M. Bohr, Z. Ball, "Building Winning Products with Intel® Advanced Technologies and Custom Foundry
Platforms”, Intel Developer Forum, 2016
Huge Amount of Data Generated by
Autonomous Vehicles
 Huge amount of data to be processed and stored.

R. Mariani, “Making the Autonomous Dream Work”, Intel Fellow, University of Bologna presentation, May 2018

 Decisions are increasingly driven by such data (autonomous vehicles, smart factories, etc.).
But can we rely on the correctness of such data? Is the electronics
generating, processing and storing such data reliable?

Today’s Electronic Technology

M. Bohr, “Continuing Moore‘s Law“, Technology and Manufacturing Day, 19 September 2017

Today’s Electronic Technology (cont’d)
 How small is 14nm?

M. Bohr, “14nm Process Technology: Opening New Horizons ”, Intel Developer Forum, 2014


Development of Electronic Technology

 Moore’s law (1965) has driven the evolution of microelectronic technology and is driving its future developments.

Courtesy of Intel Corporation


Intel Techn. Journal, 2007

https://en.wikipedia.org/wiki/Moore%27s_law
Moore Law: Example
Intel Microprocessor                 Year of Introduction   Transistors
4004                                 1971                   2,300
8008                                 1972                   2,500
8080                                 1974                   4,500
8086                                 1978                   29,000
286                                  1982                   134,000
386™ processor                       1985                   275,000
486™ DX processor                    1989                   1,200,000
Pentium® processor                   1993                   3,100,000
Pentium® II processor                1997                   7,500,000
Pentium® III processor               1999                   9,500,000
Pentium® 4 processor                 2000                   42,000,000
Itanium® processor                   2001                   25,000,000
Itanium® 2 processor                 2003                   220,000,000
Itanium® 2 processor (9MB cache)     2004                   592,000,000
Dual Core Itanium® 2 processor       2005                   1,200,000,000
Int. Techn. Journal, 2005
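
As a rough illustration of the trend in the table, the following minimal Python sketch estimates the average doubling period of the transistor count between the first and the last row. The function name `doubling_period_years` is hypothetical, and the calculation assumes simple exponential growth between the two end points only.

```python
import math

# Two rows taken from the table: (year of introduction, transistor count).
first = (1971, 2_300)            # Intel 4004
last = (2005, 1_200_000_000)     # Dual Core Itanium 2

def doubling_period_years(start, end):
    """Average years per doubling, assuming exponential growth between two points."""
    (year0, count0), (year1, count1) = start, end
    doublings = math.log2(count1 / count0)   # number of doublings over the interval
    return (year1 - year0) / doublings

period = doubling_period_years(first, last)
print(f"Average doubling period: {period:.2f} years")
```

With these two rows the estimate comes out slightly below two years per doubling, in line with the usual statement of Moore's law.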

Moore Law: Example (cnt’d)

http://www.intel.com/content/www/us/en/silicon-innovations/moores-law-technology.html

How Has It Been Possible to Follow Moore’s Law?

 Architectural Changes: multicore/many-core systems (since 2000)

 Material Changes: high-k gate insulator (since 2007)

 Device Changes: Tri-gate transistors (since 2011)


How Has It Been Possible to Follow Moore’s Law?
 Architectural Changes: multicore/many-core systems (since 2000)
 June 15, 2010: experimental microprocessor with 48 cores (http://www.intel.com/pressroom/innovation, June 15, 2010)
 A trend that will continue (IEEE Computer Society 2022 Report, 2014)
How Is It Possible To Follow Moore’s Law?
 Material Changes: high-k gate insulator (since 2007)

 Intel 45nm dual-core, Hafnium-based High-k Metal Gate process.

Intel Press Kit, November, 2007



How Is It Possible To Follow Moore’s Law? (cont’d)
 Intel Atom (March 3, 2008):
 die area < 25 mm² (11 dies fit on a US penny)
 47 million transistors
 Power consumption 0.6 – 2.5 W
 Up to 1.8 GHz speed
 45nm Hafnium-based High-k Metal Gate process

Intel Press Release, March 3, 2008

How Is It Possible To Follow Moore’s Law? (cont’d)
 Hafnium-based High-k Metal Gate process advantages:

Intel’s High-k/Metal Gate Announcement, November 4th, 2003


How Is It Possible To Follow Moore’s Law? (cont’d)
Features of Hafnium-based High-k Metal Gate process:
 Very high current when the transistor is ON
 Very low current when the transistor is OFF

Intel’s High-k/Metal Gate Announcement, November 4th, 2003

How Is It Possible To Follow Moore’s Law? (cont’d)
 Device Changes: Tri-gate transistors (since 2011):
 Tri-Gate Transistors → higher speed and lower IOFF (→ low power consumption) [2002] (R. S. Chau, Technology @ Intel Magazine, August 2006)
 Tri-Gate Transistors used in 22nm SRAM demonstrated in 2009
 Tri-Gate Transistors used in 22nm microprocessor demonstrated in April 2009

How Is It Possible To Follow Moore’s Law? (cont’d)
[Figure panels: Planar Transistor; Tri-Gate Transistor; Tri-Gate Transistor with 2 fins; Tri-Gate Transistor with 3 fins]

Bohr, Mistry, “22nm Details_Presentation”, May 2011


How Is It Possible To Follow Moore’s Law? (cont’d)

 Higher Speed
 Reduced Leakage (IOFF)

Bohr, Mistry, “22nm Details_Presentation”, May 2011


How Is It Possible To Follow Moore’s Law? (cont’d)
 Intel® Core™ M Processor (announced on September 5th, 2014):
 14 nm, 2nd generation Tri-Gate transistor technology
 1.3 billion transistors
 Compared to previous Intel Core processors:
 ↑ 50% performance
 ↑ 40% graphics processing speed
 ↑ 20% battery life
Intel Developer Forum San Francisco 2014

How Is It Possible To Follow the Moore Law?(cnt’d)
2nd generation 3-gate transistors

Intel Developer Forum San Francisco 2014


 Closer fins → ↑ integration density
 Thinner and higher fins → ↑ performance
 Lower number of fins → ↑ integration density

How Is It Possible To Follow the Moore Law?(cnt’d)

 2nd generation 3-gate transistors

Intel Developer Forum San Francisco 2014


How Is It Possible To Follow Moore’s Law? (cont’d)
 10nm process using the 3rd generation of Tri-Gate transistors:
10nm fins are approx. 25% taller and approx. 25% more closely spaced than 14nm fins
[Figure: fin profiles at 22nm, 14nm and 10nm]

M. Bohr, “Technology Leadership“, Technology and Manufacturing Day, 19 September 2017


How Is It Possible To Follow Moore’s Law? (cont’d)

 10nm process: compared to 14nm, higher transistor density (2.7x), higher performance (25%), and lower power (45%)

https://newsroom.intel.com/newsroom/wp-content/uploads/sites/11/2017/09/10-nm-icf-fact-sheet.pdf
How Is It Possible To Follow Moore’s Law? (cont’d)
 Intel Optane: announced on March 19th, 2017, available since April 24th, 2017 (16GB, 32GB)
(https://newsroom.intel.com/news/intel-introduces-worlds-most-responsive-data-center-solid-state-drive/)
 Intermediate solution between DRAM and Flash memories:
 DRAM: faster than Flash, less dense than Flash, and volatile
 Flash (used in current SSDs): non-volatile, denser than DRAM, slower than DRAM
 Optane: non-volatile, denser (10X) than DRAM and faster (1000X) than Flash
 Technology “ideal for …devices, applications, services…requiring fast access to large sets of data”
(http://www.intel.com/content/www/us/en/architecture-and-technology/intel-optane-technology.html)

How Is It Possible To Follow Moore’s Law? (cont’d)
 Intel Optane: 3D XPoint Technology

http://wccftech.com/intel-storage-roadmap-2017-optane-nand/
 Vertical stack (3D) of structures composed of columns (cell, selector) → ↑ density
 Each cell can be written/read by changing only the voltage sent to the selector → ↑ speed
How Will It Be Possible To Follow Moore’s Law?
 Following Moore’s law has made it possible to ↑ integration density and complexity, as well as ↑ performance, but it has also brought:
 During fabrication:
 ↑ Magnitude of process parameter variations
 ↑ Likelihood of physical defects
→ New Challenges for Test and Diagnosis
(Figure courtesy of Jerry Soden, Sandia Lab (USA))
 In the Field:
 ↑ Vulnerability to transient faults
 ↑ Likelihood of ageing phenomena
→ New Challenges for Reliability
(Figure courtesy of Dr. Monica Alderighi, INAF (Italy))
Testing: Definition

 Definition: Process guaranteeing that a fabricated chip behaves according to its design constraints (behaves correctly).

 Objectives: Testing should be able to:
 detect incorrectly operating chips (which are then discarded);
 “pass” correctly operating chips.


“Actual” Testing
 Some correctly operating chips are discarded. The fraction (or percentage) of such chips is called yield loss.

 Some incorrectly operating chips (“faulty”) pass the test. The fraction (or percentage) of such chips is called defect level.

Testing

Fabricated chips:
 Correct chips, Prob(correct chip) = y → Prob(pass test) = high → classified as correct chips (mainly)
 Faulty chips, Prob(faulty chip) = 1 - y → Prob(fail test) = high → classified as faulty chips (mainly)
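
The effect of an imperfect test on these probabilities can be sketched numerically. The Python snippet below is only an illustration: the function `screening_statistics` and the numeric values are hypothetical, yield loss is assumed to be measured as a fraction of all fabricated chips, and defect level is taken as the fraction of faulty chips among those that pass the test.

```python
def screening_statistics(y, p_pass_given_good, p_fail_given_faulty):
    """Return (yield_loss, defect_level) for an imperfect test.

    y                    -- probability that a fabricated chip is correct
    p_pass_given_good    -- probability that a correct chip passes the test
    p_fail_given_faulty  -- probability that a faulty chip fails the test
    """
    good_pass = y * p_pass_given_good
    good_fail = y * (1.0 - p_pass_given_good)            # correct chips discarded
    faulty_pass = (1.0 - y) * (1.0 - p_fail_given_faulty)  # test escapes

    yield_loss = good_fail                                # assumed: fraction of all chips
    defect_level = faulty_pass / (good_pass + faulty_pass)
    return yield_loss, defect_level

# Hypothetical values, chosen only for illustration.
yield_loss, defect_level = screening_statistics(0.8, 0.99, 0.999)
print(f"yield loss   = {yield_loss:.4f}")
print(f"defect level = {defect_level * 1e6:.0f} ppm")
```

Even with both probabilities "high", a small fraction of correct chips is discarded and a small fraction of faulty chips is shipped.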

Why Testing?
 Faults (which could compromise the correct operation of the chip) have multiple causes:

 Design errors
 Fabrication process problems
 Material defects
 Environmental factors
 Physical phenomena of various kinds


Design Errors

 Examples:

 Pentium “division bug”

 Critical race on signal propagation paths

 Fluctuations of power supply

Fabrication Process Problems
 Missing/undesired electrical connection, for
instance due to:
 mask alignment errors
 mask defects

 Parasitic transistors

 Dielectric breakdown


Material Defects

 Material aging (e.g. oxide wear-out, variations of resistances, capacitances, inductances)

 Material physical irregularities. Examples:
 Substrate defects (breaks, crystallographic imperfections)
 Surface impurities (ion migration)

Environmental Factors (I)

Alpha Particles: charged particles generated by the radioactive decay of Uranium and Thorium impurities present within packages → generation of electron-hole pairs.

Cosmic Rays: high energy neutrons coming from space which hit the Si substrate → generation of electron-hole pairs.


Environmental Factors (II)

Air pollution (interconnection shorts).
Humidity (transient shorts).
Temperature (transient logical errors).
Pressure (transient interconnection opens/shorts).
Vibrations (transient interconnection opens).

Physical Phenomena of Various Kinds

 Corrosion (e.g. corroded chip-package connection)

 Electromigration (movement of metal atoms in metal wires carrying a high current density)

 Electromagnetic Interference (undesired coupling).


Example

[Figure labels: Metal-2, W-Via, Metal-1]

Courtesy of Jerry Soden, Sandia Lab (USA)

Temporal Characteristics of Produced Effects
 Permanent: always present.

 Transient: temporarily present, then the system goes back to its correct operation; time dependent, usually due to environmental conditions.

 Intermittent: sometimes present, sometimes not present.


Testing - Diagnosis - Failure Analysis

 Testing: identifies the faulty chips (to be discarded).

 Diagnosis: identifies the faults within the discarded chips.

 Failure Mode Analysis (FMA): identifies the causes of faults, e.g., problems in the design and/or fabrication process.

VLSI Chip Fabrication Flow

User Needs → Constraint definition → Design (at various levels) → Fabrication → Testing
Testing → (if faulty chip) → Diagnosis & FMA


Verification
 Checks the correct behavior of an IC prior to
fabrication.

 It is performed at various stages of the IC design

 It employs simulation tools.


Process Yield
 A fabrication defect is a “defective” area of the chip due to errors in the fabrication process and/or material defects.

 A chip without fabrication defects is called “good”.

 The percentage of “good” chips over all chips fabricated with a given process is called Yield, Y:

$$Y = \frac{\text{Number of good chips}}{\text{Number of fabricated chips}} \times 100$$

Yield (dependencies)

[Figure: two wafers showing defects, good chips and defective chips]
 “Non-clustered” defects: Yield = 12/22 = 0.55
 “Clustered” defects: Yield = 17/22 = 0.77
Courtesy of V. D. Agrawal, Agere (USA)

Yield (dependencies)
 Process yield depends on:

 Defect density d = average number of defects per unit area (process parameter);

 Clustering parameter a (process parameter);

 Chip area A (with increasing chip area, the likelihood of defects increases).

Yield Model
 $Y = (1 + Ad/a)^{-a}$
 d = defect density (= average number of defects per unit area);
 A = chip area;
 a = clustering parameter

 Example: Ad = 1.0, a = 0.5 (typical values for VLSI chips) → Y = 0.58 (acceptable for new processes)
 If defects are unclustered (a = ∞) → $Y = e^{-Ad}$
 Example: Ad = 1.0 → Y = 0.37 (far too low!)
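
The two examples of the yield model can be checked with a few lines of Python. This is a minimal sketch; the function names `yield_clustered` and `yield_unclustered` are hypothetical, and the symbols A, d and a are those defined in the model, with the product Ad passed as a single argument.

```python
import math

def yield_clustered(Ad, a):
    """Yield with clustered defects: Y = (1 + Ad/a)^(-a)."""
    return (1.0 + Ad / a) ** (-a)

def yield_unclustered(Ad):
    """Limit of the model for a -> infinity: Y = exp(-Ad)."""
    return math.exp(-Ad)

print(f"clustered   (Ad=1.0, a=0.5): Y = {yield_clustered(1.0, 0.5):.2f}")
print(f"unclustered (Ad=1.0):        Y = {yield_unclustered(1.0):.2f}")

# For a very large clustering parameter the two expressions agree.
print(f"clustered   (Ad=1.0, a=1e6): Y = {yield_clustered(1.0, 1e6):.4f}")
```

For the same average number of defects per chip (Ad), clustering concentrates defects on fewer chips, which is why the clustered case gives the higher yield.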

Yield and Production Cost (I)

$$\text{Chip cost} = \frac{\text{Fabrication and wafer test cost}}{\text{Yield} \times \text{Number of chips on the wafer}}$$

Low yield → high chip cost.
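
The dependence of chip cost on yield can be illustrated as follows. This is a sketch under stated assumptions: the function `chip_cost` is hypothetical, the wafer cost and the number of chips per wafer are made-up values, and the yield is expressed as a fraction between 0 and 1 rather than as a percentage.

```python
def chip_cost(wafer_cost, yield_fraction, chips_per_wafer):
    """Cost per good chip: wafer fabrication and test cost spread over the good chips."""
    if not 0.0 < yield_fraction <= 1.0:
        raise ValueError("yield_fraction must be in (0, 1]")
    return wafer_cost / (yield_fraction * chips_per_wafer)

WAFER_COST = 5000.0      # hypothetical fabrication + wafer test cost
CHIPS_PER_WAFER = 200    # hypothetical number of chips on the wafer

for y in (0.37, 0.58, 0.90):
    cost = chip_cost(WAFER_COST, y, CHIPS_PER_WAFER)
    print(f"yield = {y:.2f} -> cost per chip = {cost:.2f}")
```

The cost of each good chip grows as the yield drops, because the same wafer cost is shared among fewer good chips.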


Yield and Production Cost (II)

 New process → typically low yield → design/process changes are needed.

 Established process → high yield (> 60%).

 To improve yield: Testing + Diagnosis + elimination of defect causes (design/process modifications, etc.) → yield increase → production cost reduction!


Defect Level - DL
 Ratio between the number of faulty chips passing the test and the total number of chips passing the test.
 It is a quantitative measure of testing effectiveness/quality.
 It is measured in parts per million (ppm).
 For commercial VLSI chips, a DL > 500 ppm is considered unacceptable.
 It can be estimated from the returned chips (chips which fail in the field and are returned to the foundry) → DL = number of returned chips per million chips sold.
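
The estimate of DL from field returns is a simple normalization, sketched below in Python. The function `defect_level_ppm` and the numbers of sold and returned chips are hypothetical; only the 500 ppm limit comes from the slide.

```python
DL_LIMIT_PPM = 500  # limit quoted for commercial VLSI chips

def defect_level_ppm(returned_chips, sold_chips):
    """Estimated defect level: returned chips per million chips sold."""
    return returned_chips * 1_000_000 / sold_chips

# Hypothetical field data, for illustration only.
dl = defect_level_ppm(returned_chips=120, sold_chips=400_000)
verdict = "acceptable" if dl <= DL_LIMIT_PPM else "unacceptable"
print(f"DL = {dl:.0f} ppm ({verdict})")
```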


Testing Economical Impact

 The usage time of a product is shorter than its design (and fabrication) time.

 The “Time to market” should be as short as possible.

Time to Market

[Figure: revenues vs. time in months, showing the time to market T and the loss of revenues]
Courtesy of Y. Zorian, Viragelogic (USA)
Some Kinds of Testing

 Characterization Testing (or Design Debug, or Verification Testing)

 Manufacturing (or Production) Testing

 Burn-In

 Incoming Inspection


Characterization Testing (I)

 Performed on chips with a new design; checks the correctness of the design (according to constraints) and usually leads to design changes.
 Very expensive.
 May require:
 Electron beam
 Optical analysis for some defects
 Artificial intelligence based methods (expert systems)
 Several functional tests
 etc.

Characterization Testing (II)
 Requires:

 Choosing a statistically representative sample of chips.
 Choosing the test which discriminates between chips to pass and chips to discard.
 Repeating the tests for every combination of environmental variables.
 Performing diagnosis and design error correction.


Manufacturing Testing
 Performed on all fabricated chips.

 Identifies whether the fabricated chips behave according to constraints.

 Should cover a high percentage of faults.
 Should test every circuit on the chip.
 Should minimize the testing time (to reduce cost).
 Should be performed at the guaranteed speed.

Burn-in
 Testing performed by exposing the chip to high temperatures and power supply voltages during test pattern application, thus accelerating the occurrence of faults which would otherwise occur in the first years of operation in the field.

 It identifies:
 Infant mortality (damaged chips which would otherwise fail in the first years of usage in the field → they are not sold)


Incoming Inspection (I)

 Testing performed for the customer by external experts, prior to integrating a component into a system.

 Avoids integrating a faulty component into a system, where the cost of diagnosis would exceed the cost of Incoming Inspection.

Incoming Inspection (II)

 Can be performed:
 similarly to manufacturing testing;
 more extensively than manufacturing testing;
 according to specific application needs.

 Often performed on a random sample of devices, whose size depends on quality needs.

Testing: Problems
 Extremely high number of possible physical defects that testing has to target.

 Testing is therefore based on fault models.
