
VLSI Implementation of a Cellular Neural Network for Solving Partial Differential Equations

By Richard David Yentis, Jr.
B.Sc. June 1992, Lehigh University

A Thesis submitted to The Faculty of The School of Engineering and Applied Science of the George Washington University in partial satisfaction of the requirements for the degree of Master of Science

January 31, 1994

Thesis directed by Mona Elwakkad Zaghloul, Professor of Engineering and Applied Science

Abstract
This thesis presents a locally connected neural network for solving a class of partial differential equations. The network is based on previous theoretical work by others, but is extended to the two-dimensional case. The network is designed and simulated with SPICE. Each neural cell is designed using active and passive components. An architecture is described to control the weights between the neurons. This architecture is usable by other neural networks, but is demonstrated with this cellular neural network. The major benefit of the architecture is that it does not require additional space outside of the cell for routing the control lines, no matter how many cells are used. The CMOS VLSI implementation was fabricated and measured. A sixteen-cell network is also simulated and measured to solve the steady-state heat flow problem under several different sets of conditions. The results of this network are compared to the numerical solution of the partial differential equations.

Acknowledgments
At this point, I would like to acknowledge my deep appreciation to the many people who helped me throughout this work. First and foremost I would like to express my thanks to my advisor, Professor Mona E. Zaghloul, for her invaluable support, encouragement and guidance. Without her help, this work would not have been possible. I would also like to thank Dr. Desa Gobovic for getting me started, both for her personal help and for the previous papers on which this work is based. I also appreciate all of the members of my committee for any criticism, suggestions, or comments. A special thanks to Mr. Norris C. Hekimian for the fellowship which allowed me to continue this research. I am grateful to my colleague Michael Salter for convincing me to follow my instincts and to Charles Hsu, who was always willing to give advice. I would also like to thank Mr. White and his staff at SEASCF (especially Sheryl and Robert) for their assistance with computing matters. Mr. Petrella and his staff at the EE&CS Lab were also of great assistance with both equipment and analog testing advice. Last, but not least, I would like to thank both my parents and my sister for their support, both financially and emotionally.

Contents
1 Introduction 1
  1.1 Background 1
  1.2 Statement of Problem 2
  1.3 Contribution of Thesis 3

2 Formulation of the Problem 5
  2.1 Equidistant Case 5
  2.2 Increased Accuracy 9
  2.3 An Illustration 14
    2.3.1 Equidistant Case 15
    2.3.2 Non-Equidistant Case 19

3 Architecture of VLSI Chip 22
  3.1 Variable Resistors 22
    3.1.1 Generic 23
    3.1.2 Implementation 26
  3.2 Boundary Conditions and Outputs 30

4 CMOS Implementation 32
  4.1 Design Circuitry 32
    4.1.1 Variable Resistor 32
    4.1.2 Buffer 34
    4.1.3 Capacitor 37
    4.1.4 Glue Logic 37
  4.2 Simulation Results 37
    4.2.1 Variable Resistor 38
    4.2.2 Buffer 38
    4.2.3 Cell 39
    4.2.4 Matrix 42
    4.2.5 Glue Logic 42

5 Chip Design and Measurement 43
  5.1 Layout 43
    5.1.1 Buffer 44
    5.1.2 Variable Resistors 44
    5.1.3 Capacitor 46
    5.1.4 Cell 48
    5.1.5 Matrix 48
    5.1.6 Complete Chip 48
  5.2 Measurement 51
    5.2.1 Test Circuits 51
    5.2.2 Matrix 55

6 Conclusion and Future Work 64
  6.1 Summary 64
  6.2 Future Work 66

A Complete Matrix Spice File 68

B Shift Register Simulations 72

C Glue Logic Simulations 77
  C.1 Shift Register 77
  C.2 Decoder 78

D Pin Out 81

E MOSIS Parameters 84

F Sample LV500 file 91

List of Figures
2.1 Neighboring mesh points. 6
2.2 Locally connected neurons with weights. 8
2.3 Notation for the Non-Equidistant Case. 10
2.4 Laplace's equation model for our heated plate example. 15
2.5 Discretization of a plate domain by a rectangular grid of points. 16
2.6 4x4 Cellular Neural Network for our heated plate example. 17
2.7 An example of a cell circuit. 18
3.1 A general cell with many weights. 24
3.2 A variable resistor. 24
3.3 A local memory cell consisting of shift registers. 25
3.4 Two cells with connected variable resistors. 27
3.5 Matrix with External Shift Registers. 28
3.6 Processing elements with local and global memory. 29
3.7 Matrix with Output Control Circuitry. 31
4.1 A one bit shift register. 35
4.2 The buffer used in each cell. 36
4.3 AWB circuit of a buffer. 39
4.4 AWB simulation of the buffer. 40
4.5 AWB circuit of a single cell. 41
5.1 Layout of the analog buffer. 45
5.2 Picture of the analog buffer. 45
5.3 Layout of the five way variable resistor. 46
5.4 Picture of the five way variable resistor. 46
5.5 Layout of the large capacitor. 47
5.6 Picture of the large capacitor connected to a cell. 47
5.7 Layout of the complete cell. 49
5.8 Picture of the complete cell. 50
5.9 Layout of the complete chip. 52
5.10 Picture of the complete chip. 53
5.11 Input and output of the test buffer. 54
5.12 Output from the matrix of the example in Section 2.3.2. 56
5.13 Output from the matrix showing rise and fall times. 57
5.14 Output from the equidistant case with the first set of boundary conditions. 60
5.15 Output from the "column" weight set and the second set of boundary conditions. 61
5.16 Voltage sets used to test the complete matrix. 63
B.1 Shift Register Cell used with esim. 72
B.2 Picture of the Shift Register Cell. 73
B.3 AWB circuit of a 1-bit shift register driving a tap. 75
B.4 AWB "oscilloscope" of a 1-bit shift register driving a tap. 76
C.1 An 8-bit shift register used to set control lines for the variable resistors. 77
C.2 Picture of the 8-bit shift register. 77
C.3 A 3-to-8 Decoder used to select an output column. 79
C.4 Picture of the 3-to-8 Decoder. 79
D.1 The layout of the pins on the MOSIS Tiny Chip frame. 82

List of Tables
2.1 The voltages obtained by Spice simulation for the equidistant case. 19
2.2 The voltages obtained by Maple for the equidistant case. 19
2.3 The voltages obtained by Spice simulation for the variable case. 20
2.4 The voltages obtained by Maple for the variable case. 21
2.5 The differences between Table 2.3 and Table 2.4. 21
4.1 The five basic variations the resistor can achieve. 34
4.2 The aspect ratios (w/l) for the devices in each cell's buffer. 35
5.1 The aspect ratios (w/l) and sizes used for the buffer. 44
5.2 Results from the example 55
D.1 The pin out used for the chip. 83

Chapter 1

Introduction
1.1 Background
Partial differential equations (PDEs) are very useful in understanding the physical environment in which we live. They can be used to model numerous aspects of life, from economics (the rate of change in production due to changes in capital) to music (the physics of a vibrating string) and, of course, to traditional engineering problems. The usual way of solving these equations is to use a digital computer. Unfortunately, solving sets of partial differential equations can be costly in both time and computer resources. One class of PDEs which is fairly common is referred to as the elliptic boundary problem, which is frequently solved using the finite difference method. An example of the many problems that fill this class is that of modeling heat flow. The traditional method, which uses difference equations, requires that the space over which the problem is to be solved is divided up into points and an equation must be solved at each point. For accurate results a large number of points, and thus equations, are needed. This requires a lot of computing time as well as a large amount of memory. Numerous books and journals have been dedicated to finding faster and less memory intensive methods. In a previous paper by Gobovic and Zaghloul [GZ93] an idea was presented for solving this type of problem with neural networks.

Our understanding of neural networks, both natural and artificial, is based on the McCulloch-Pitts model [MP43]. This model essentially consists of a cell body, called the soma, many inputs from other neurons called dendrites, and one axon which is the output. One axon can connect to many other neurons, even those that are not nearby. The junction on the dendrite that connects to the axons of other cells is called a synapse. In general, each cell connects to many other cells. An Artificial Neural Network (ANN) is a system for processing information based on the model of a biological neuron. ANNs, which can be implemented as digital or analog circuits, have a major drawback when implemented as a VLSI design because each neuron may have many connections to many other neurons. A Cellular Neural Network (CNN) is an ANN which is made up of cells that are typically identical. If these cells are locally connected, that is, each neuron is only connected to its closest neighboring neurons, then it is well suited for VLSI implementations [CY88b]. In essentially all neural networks, both natural and artificial, the neurons are connected by weights which govern how much of an effect the information from the other neuron will have on the current neuron. The first paper by Gobovic and Zaghloul had an accuracy that was limited by the size of the mesh on which the problem domain was discretized (i.e., the number of neurons available to be used). In a second paper [GZ94] this problem was addressed in the same way that it has been addressed for the digital computer solution: by changing the distance between the nodes of the mesh. That paper developed the mathematical theory for a one-dimensional case. This thesis extends the improvement made in [GZ94] to the two-dimensional case and presents an analog CMOS VLSI design and layout. An architecture for controlling the weights in the CNN is also presented.

1.2 Statement of Problem


The basic problem is simply to solve partial differential equations in a fast manner. Since the traditional method involves solving these types of problems using digital computers in a serial manner, which tends to be slow and even slower as the problem gets larger, a completely new method is needed. Ideally the problem should be solved in some parallel manner so that the time required to generate a solution is not dependent upon the size of the problem. Upon a closer examination of the traditional algorithm, which is the finite difference method, it becomes apparent that several processing elements could work in parallel to solve the problem. Since the space over which the problem is to be solved is discretized, one processing element at each node should be able to solve the problem rapidly. And in fact it is possible to solve this problem on large and expensive parallel processing computers. In this thesis a method for solving the problem more accurately using the traditional method has been developed. The proposed technique divides the problem space in an uneven manner so that there are more nodes in areas of greater change. Again, this slightly more advanced model could be solved using a parallel processing computer. But these computers tend to be too large and complex to be used in an embedded system. Thus a Cellular Neural Network is proposed to solve the problem. The proposed system can solve these PDEs rapidly, and is smaller and less complicated than digital parallel processing computers. As with digital computers, a novel architecture is also introduced for controlling the processing elements.

1.3 Contribution of Thesis


A solution to these problems is proposed in this thesis. Our solution is to use a neural network. The same properties of the finite difference method that allow the problem to be solved with multiple processors also allow the network of processors to be locally connected. This fact allows us to use a cellular neural network. In order to use a CNN a cell must be designed that is able to handle this particular problem. It is most logical for the weights of the cell, which control how each cell interacts with its neighbors, to be used to represent the distance from the other nodes. Therefore, if the cell designed has variable resistors it will be able to solve the problem more accurately, just as the method with variable distance is able to. For these reasons a cell is proposed that has variable resistors and is able to minimize an appropriate energy function, based on the specifics of the elliptic boundary problem. The proposed theory shows that a CNN can be used for both one- and two-dimensional spaces, and the equidistant and variable-distance cases will also be presented. To control the variable resistances a new architecture is also proposed in this thesis. The architecture that is proposed is generic and is able to control the individual weights of a neural network, not only a CNN. This architecture could be used in many types of neural networks, including the cellular neural network for solving partial differential equations. The architecture proposed will also have the property that no additional routing will be required outside of a cell to control it. In this way a large number of cells can be used without the amount of routing per cell increasing. The new architecture and the two-dimensional partial differential theory will be demonstrated by designing an electronic CMOS circuit. The circuit will be laid out for an Orbit 2μ analog process. The chip will then be fabricated by the MOSIS service. Finally, the CNN will be tested to verify both the theory and the design.

The second chapter of this thesis describes the mathematical model behind solving PDEs and how those methods can be related to CNNs. In addition it provides a theoretical model for a cell and uses that cell in two illustrations. Chapter three describes an architecture for controlling neural networks in general and the CNN described in this thesis in particular. The fourth chapter describes how each part of the cell is implemented in CMOS. Chapter five explains how each part was laid out in preparation for VLSI fabrication. The testing procedures of the chip are also reviewed along with the results of the tests. The final chapter contains conclusions and suggestions for future work based on the results of the fabricated chip. The results of this thesis are expected to serve as an important step toward making large neural networks more controllable and thus more practical. It is also expected to provide partial differential equation solutions to embedded systems, which would have previously required a digital computer subsystem.

Chapter 2

Formulation of the Problem


The basic problem to be solved is a system of partial differential equations. When these systems are combined with initial and boundary conditions they can be used in many areas of engineering to help model the physical world. A finite element analysis method is usually used. This method divides a continuous problem domain into discrete points and then concentrates on solving the system at each of those points. The solution can be made more accurate if the function is divided among more points. Another way of increasing the accuracy is to make the points closer together in regions where the most change is taking place. To derive our approach we use two-dimensional equations. We will consider the first case, where all of the distances between the points are equal, and see how it relates to a cellular neural network. Then we will consider the more complex case where the distances between points are not all equal. Finally we will give an example to illustrate the theory.

2.1 Equidistant Case


Consider the Poisson two-dimensional equation:

\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = f(x, y) \qquad (2.1)

defined on a region R, where u(x, y) is a continuous unknown scalar function which satisfies some given boundary conditions at the boundary of region R, f(x, y) is a given function, and x and y are space variables. If we let R be a square bounded region in the (x, y) plane then it can be divided evenly into a square mesh with mesh size h. We can define P(x, y) = P(ih, jh) as the node on the mesh at (i, j). In a two-dimensional mesh each node has four neighbors: (i-1, j), (i+1, j), (i, j-1), (i, j+1).

Figure 2.1: Neighboring mesh points.

The continuous function u(x, y) can be approximated by the set of values at the nodes of the mesh. The partial derivatives of u(x, y) can be replaced by the difference equations:

\frac{\partial^2 u(x,y)}{\partial x^2}\bigg|_{P_{ij}} = \frac{u(i+1,j) + u(i-1,j) - 2u(i,j)}{h^2}, \qquad
\frac{\partial^2 u(x,y)}{\partial y^2}\bigg|_{P_{ij}} = \frac{u(i,j+1) + u(i,j-1) - 2u(i,j)}{h^2} \qquad (2.2)

Partial differential equation (2.1) is then approximated by a set of n^2 linear equations:

-u(i-1,j) - u(i+1,j) - u(i,j-1) - u(i,j+1) + 4u(i,j) + h^2 f(i,j) = 0, \qquad i, j = 1, \ldots, n \qquad (2.3)

In order to simplify the system of equations into a matrix form we define a vector u to be the n^2 values of u:

u_i = \begin{pmatrix} u(i,1) \\ u(i,2) \\ \vdots \\ u(i,n) \end{pmatrix}, \quad i = 1, 2, \ldots, n, \qquad
u = \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{pmatrix} \qquad (2.4)

In a similar manner we can also define a vector F to be the values of F(i,j) at the nodes of the mesh:

F_i = \begin{pmatrix} F(i,1) \\ F(i,2) \\ \vdots \\ F(i,n) \end{pmatrix}, \quad i = 1, 2, \ldots, n, \qquad
F = \begin{pmatrix} F_1 \\ F_2 \\ \vdots \\ F_n \end{pmatrix} \qquad (2.5)

Then the system may be written as:

A u + h^2 F + b = 0 \qquad (2.6)

where the boundary conditions are in b, and A is

A = \begin{pmatrix} B & -I & & \\ -I & B & \ddots & \\ & \ddots & \ddots & -I \\ & & -I & B \end{pmatrix}, \qquad
B = \begin{pmatrix} 4 & -1 & & \\ -1 & 4 & \ddots & \\ & \ddots & \ddots & -1 \\ & & -1 & 4 \end{pmatrix} \qquad (2.7)

I is an n × n identity matrix. Matrix A in (2.7) is an n × n (or n^2 × n^2 after B is substituted) symmetric block tridiagonal matrix which is also positive definite [GZ93], [GZ94], [OR70a]. Thus this system can be solved with a neural network approach with an energy function defined as:

E(v) = \tfrac{1}{2}\, v^{T} A v + v^{T} \varphi, \qquad \text{where } \varphi = b + h^{2} F \qquad (2.8)

and the neural net will try to minimize (2.8) such that:

\frac{du_{ij}}{dt} = -\frac{\partial E}{\partial v_{ij}}\bigl(v_{11}, v_{12}, \ldots, v_{1n}, v_{21}, \ldots, v_{ij}, \ldots, v_{nn}\bigr) \qquad (2.9)

where u_{ij} is the input of the ij-neuron in the net and v_{ij} is its output. Note that v_{ij} = u_{ij} [Zur92], [TH86]. Thus

\frac{du_{ij}}{dt} = -\sum_{kl} a_{ij,kl}\, v_{kl} - \varphi_{ij}, \qquad v_{ij} = u_{ij}, \qquad i, j = 1, \ldots, n \qquad (2.10)

where a_{ij,kl} is an element of matrix A at row ij and column kl. The matrix A is diagonally dominant, and thus the neural network can be made up of locally connected cells. An ij-neuron and its locally connected neighbors are illustrated in Figure 2.2 [GZ93].

Figure 2.2: Locally connected neurons with weights.

The weights of the connections are frequently described using a template:

T = \begin{pmatrix} 0 & 1 & 0 \\ 1 & -4 & 1 \\ 0 & 1 & 0 \end{pmatrix} \qquad (2.11)
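To make the equidistant formulation concrete, the following is a minimal sketch (Python with NumPy; the function names, step size, and right-hand side are illustrative choices, not part of the thesis design) of how the template (2.11) generates the matrix A of (2.7) and how the gradient dynamics of (2.10) settle at the solution of the linear system (2.6).

```python
import numpy as np

def build_A(n):
    """Assemble the n^2 x n^2 matrix A of (2.7) from the template (2.11):
    +4 on the diagonal, -1 for each of the four nearest neighbours."""
    A = np.zeros((n * n, n * n))
    for i in range(n):
        for j in range(n):
            k = i * n + j
            A[k, k] = 4.0
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ii, jj = i + di, j + dj
                if 0 <= ii < n and 0 <= jj < n:
                    A[k, ii * n + jj] = -1.0
    return A

def run_dynamics(A, phi, steps=20000, dt=1e-3):
    """Forward-Euler integration of du/dt = -(A v + phi), with v = u (eq. 2.10)."""
    v = np.zeros(len(phi))
    for _ in range(steps):
        v = v - dt * (A @ v + phi)
    return v

if __name__ == "__main__":
    n = 4
    A = build_A(n)
    phi = np.full(n * n, -1.0)                 # arbitrary right-hand side, for illustration
    v = run_dynamics(A, phi)
    print(np.allclose(A @ v, -phi, atol=1e-4)) # dynamics settle where A v + phi = 0
```

Because A is positive definite, the Euler iteration is a strict descent on the energy (2.8) for a small enough step, which is the discrete-time counterpart of the convergence argument given above.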

2.2 Increased Accuracy


The accuracy of a partial differential equation solution may be increased by either increasing the size of the mesh or making the network more sensitive to the most active part of the system being modeled. The first choice requires that more nodes, and thus more cells, be used. This translates to increased surface area and therefore expense in a VLSI circuit. The second option provides better results for a set number of nodes. One method for determining where a function is more active is to examine the gradient of the function f(x, y). In the regions of R where the values of f(x, y) change rapidly, the h parameter (distance between the nodes) should be smaller. We will now derive the equations from Section 2.1 on page 5 for a variable distance between the nodes (h) in the solution to the partial differential equation. Note that our notation must change at this point: n is a distance in the y direction above the node ("north"), s is to the "south" of the node, and e and w are to the "east" and "west" respectively (see Figure 2.3 on the following page). Let us again consider the Poisson two-dimensional equation and again assume that it is twice continuously differentiable on the region R and has some given boundary conditions.1

\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = f(x, y) \qquad (2.12)

1 This equation is the same as (2.1).


Figure 2.3: Notation for the Non-Equidistant Case.

where \partial^2 u/\partial x^2 and \partial^2 u/\partial y^2 can be approximated by taking the Taylor expansions about the mesh point [Mit80]:

u_{i,j+n_{ij}} \approx u_{ij} + n_{ij}\frac{\partial u}{\partial y} + \frac{n_{ij}^2}{2}\frac{\partial^2 u}{\partial y^2}, \qquad
u_{i,j-s_{ij}} \approx u_{ij} - s_{ij}\frac{\partial u}{\partial y} + \frac{s_{ij}^2}{2}\frac{\partial^2 u}{\partial y^2},

u_{i+e_{ij},j} \approx u_{ij} + e_{ij}\frac{\partial u}{\partial x} + \frac{e_{ij}^2}{2}\frac{\partial^2 u}{\partial x^2}, \qquad
u_{i-w_{ij},j} \approx u_{ij} - w_{ij}\frac{\partial u}{\partial x} + \frac{w_{ij}^2}{2}\frac{\partial^2 u}{\partial x^2} \qquad (2.13)

which can be combined to yield2

\frac{\partial^2 u}{\partial x^2} \approx \frac{2\left[w_{ij}\left(u_{i+e_{ij},j} - u_{ij}\right) + e_{ij}\left(u_{i-w_{ij},j} - u_{ij}\right)\right]}{e_{ij}\, w_{ij}\left(e_{ij} + w_{ij}\right)}, \qquad
\frac{\partial^2 u}{\partial y^2} \approx \frac{2\left[s_{ij}\left(u_{i,j+n_{ij}} - u_{ij}\right) + n_{ij}\left(u_{i,j-s_{ij}} - u_{ij}\right)\right]}{n_{ij}\, s_{ij}\left(n_{ij} + s_{ij}\right)} \qquad (2.14)

Thus the equation can readily be rewritten as

\frac{2\left[w_{ij}\left(u_{i+e_{ij},j} - u_{ij}\right) + e_{ij}\left(u_{i-w_{ij},j} - u_{ij}\right)\right]}{e_{ij}\, w_{ij}\left(e_{ij} + w_{ij}\right)} +
\frac{2\left[s_{ij}\left(u_{i,j+n_{ij}} - u_{ij}\right) + n_{ij}\left(u_{i,j-s_{ij}} - u_{ij}\right)\right]}{n_{ij}\, s_{ij}\left(n_{ij} + s_{ij}\right)} - f(x, y) = 0 \qquad (2.15)

2 Note the differences between (2.14) and (2.2).

This equation can be written more concisely if the following functions are defined:

N_{ij} = \frac{2}{n_{ij}\left(n_{ij} + s_{ij}\right)}, \quad
S_{ij} = \frac{2}{s_{ij}\left(n_{ij} + s_{ij}\right)}, \quad
E_{ij} = \frac{2}{e_{ij}\left(e_{ij} + w_{ij}\right)}, \quad
W_{ij} = \frac{2}{w_{ij}\left(e_{ij} + w_{ij}\right)}, \quad
\Sigma_{ij} = N_{ij} + S_{ij} + E_{ij} + W_{ij} \qquad (2.16)

After the functions are substituted we have a five-part equation similar to (2.1):

N_{ij}\, u_{i,j+n_{ij}} + S_{ij}\, u_{i,j-s_{ij}} + E_{ij}\, u_{i+e_{ij},j} + W_{ij}\, u_{i-w_{ij},j} - \Sigma_{ij}\, u_{ij} = 0,

or, multiplying through by -1,

-N_{ij}\, u_{i,j+n_{ij}} - S_{ij}\, u_{i,j-s_{ij}} - E_{ij}\, u_{i+e_{ij},j} - W_{ij}\, u_{i-w_{ij},j} + \Sigma_{ij}\, u_{ij} = 0 \qquad (2.17)

This can be written in matrix notation as

A u + \Phi + b = 0 \qquad (2.18)

The matrix A can be written

A = \begin{pmatrix}
\Sigma_{1,1} & -N_{1,1} & & & -E_{1,1} & & \\
-S_{1,2} & \Sigma_{1,2} & -N_{1,2} & & & -E_{1,2} & \\
& -S_{1,3} & \Sigma_{1,3} & -N_{1,3} & & & -E_{1,3} \\
& & -S_{1,4} & \Sigma_{1,4} & & & & \ddots \\
-W_{2,1} & & & & \Sigma_{2,1} & -N_{2,1} & & \\
& \ddots & & & \ddots & \ddots & \ddots &
\end{pmatrix} \qquad (2.19)

but may be clearer if it is broken into sub-matrices

A = \begin{pmatrix} B_1 & E_1 & & \\ W_2 & B_2 & E_2 & \\ & W_3 & B_3 & E_3 \\ & & W_4 & B_4 \end{pmatrix} \qquad (2.20)

B_i = \begin{pmatrix} \Sigma_{i,1} & -N_{i,1} & & \\ -S_{i,2} & \Sigma_{i,2} & -N_{i,2} & \\ & -S_{i,3} & \Sigma_{i,3} & -N_{i,3} \\ & & -S_{i,4} & \Sigma_{i,4} \end{pmatrix} \qquad (2.21)

E_i = \begin{pmatrix} -E_{i,1} & & & \\ & -E_{i,2} & & \\ & & -E_{i,3} & \\ & & & -E_{i,4} \end{pmatrix}, \qquad
W_i = \begin{pmatrix} -W_{i,1} & & & \\ & -W_{i,2} & & \\ & & -W_{i,3} & \\ & & & -W_{i,4} \end{pmatrix} \qquad (2.22)

Notice how this matrix (2.20) is similar in form to (2.7). The I, the identity matrices in (2.7), which are, of course, diagonal, are represented in (2.20) by the diagonal matrices E_i in the upper triangle and W_i in the lower. The B_i matrices are also very similar to the B matrix in (2.7). To complete the matrix equation (2.18) we need only to define Φ:

\Phi_i = \begin{pmatrix} F(i,1) \\ F(i,2) \\ \vdots \\ F(i,n) \end{pmatrix}, \quad i = 1, 2, \ldots, n, \qquad
\Phi = \begin{pmatrix} \Phi_1 \\ \Phi_2 \\ \vdots \\ \Phi_n \end{pmatrix} \qquad (2.23)

The boundary vector b is defined as it was in the previous section, but will become clearer after an illustration which will be given in Section 2.3 on the next page.

The matrix A given in (2.20) is irreducibly diagonally dominant, has positive diagonal elements, and non-positive off-diagonal elements; therefore it is an M-matrix (§2.4.14 of [OR70b]). It follows that the solution to the system of equations will be a homeomorphism, and will therefore have a unique and stable solution (§5.4.1 of [OR70b]). Since the system has a unique solution, a convex energy function can be defined as it was in Section 2.1 on page 5 [GZ93], [OR70c], [Zur92].
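As a quick consistency check on the definitions above, the short sketch below (Python; the function name and the test spacings are illustrative only) computes N_ij, S_ij, E_ij, W_ij and Σ_ij from (2.16) and verifies that for uniform spacing n = s = e = w = h they all reduce to 1/h² with Σ_ij = 4/h², so that (2.17) collapses to the equidistant equation (2.3) scaled by 1/h².

```python
def coefficients(n, s, e, w):
    """Weights of (2.16) for one node with spacings n (north), s, e, w."""
    N = 2.0 / (n * (n + s))
    S = 2.0 / (s * (n + s))
    E = 2.0 / (e * (e + w))
    W = 2.0 / (w * (e + w))
    return N, S, E, W, N + S + E + W   # last entry is the diagonal term Sigma_ij

# Uniform spacing reduces to the equidistant template scaled by 1/h^2.
h = 0.2
N, S, E, W, Sigma = coefficients(h, h, h, h)
assert abs(N - 1.0 / h**2) < 1e-12
assert abs(Sigma - 4.0 / h**2) < 1e-12

# A non-uniform example: finer spacing to the south and east of the node.
print(coefficients(0.3, 0.1, 0.1, 0.3))
```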

2.3 An Illustration
As an example, let us consider a rectangular plate that can be heated or cooled by applying temperature sources (heaters or refrigerators) around its perimeter. The classical model for the steady-state heat flow in such a plate is Poisson's equation [Ric83]:

\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = f(x, y) \qquad (2.24)

where u(x, y) denotes the temperature at the point (x, y). A sample physical plate is shown in Figure 2.4 on the next page. The temperature distribution inside the physical plate is modeled by Laplace's equation:

\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = 0 \qquad (2.25)

Suppose that the temperature in the interior of the plate is initially at room temperature. The temperature will then change until it reaches a new steady-state temperature which is determined by the given conditions. This process can be modeled by

\frac{du}{dt} = \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} \qquad (2.26)

The problem is now a function of time: u(x, y, t). The initial condition u(x, y, 0) = 20 models room temperature. As du/dt approaches zero, equation (2.26) becomes (2.25). Thus the steady state is not dependent on time t, nor is it dependent on the initial conditions of the interior of the physical plate.

Figure 2.4: Laplace's equation model for our heated plate example: a unit square plate with ∂²u/∂x² + ∂²u/∂y² = 0 in the interior and boundary conditions u(x,0) = 100 (heated side), u(1,y) = 100 (heated side), u(x,1) = 0 (cooled side), and u(0,y) = 100(1-y) (partially heated side).

2.3.1 Equidistant Case


To find the steady-state solution of this problem we divide the plate into a mesh. For simplicity we will consider the equidistant case from the first section of this chapter, where h is constant for all of the cells. (See Figure 2.5 on the following page.) Difference equations (2.2) are used to replace the partial differentials in equation (2.25). This creates a difference equation at each point on the mesh. If we set n = 4 then we have a set of 16 equations and 16 unknowns labeled u(i,j) for i, j = 1, …, 4, which can be written in matrix form:

A u + b = 0 \qquad (2.27)

where matrix A is the block tridiagonal 4 × 4 block matrix given by (2.7) and u is the unknown temperature vector as defined in (2.4). The vector b contains the boundary conditions for this problem and is defined as

b = (180,\ 60,\ 40,\ 20,\ 100,\ 0,\ 0,\ 0,\ 100,\ 0,\ 0,\ 0,\ 200,\ 100,\ 100,\ 100)^T \qquad (2.28)
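The boundary vector (2.28) can be rebuilt directly from the boundary temperatures shown in Figure 2.5, and the resulting 16 × 16 system solved numerically. The sketch below does this (Python with NumPy; the helper names are illustrative, and it assumes the convention that the boundary sums enter the right-hand side of A u = b). Dividing the resulting temperatures by the scaling factor of 20 gives voltages that should be comparable to the Maple values of Table 2.2; exact agreement depends on how the corner boundary points are assigned.

```python
import numpy as np

n = 4
# Boundary temperatures (degrees C), mesh indices 0..5 on each side (Figure 2.5).
north = lambda j: 20.0 * (5 - j)   # partially heated side, u(0, j)
south = lambda j: 100.0            # heated side,           u(5, j)
west  = lambda i: 100.0            # heated side,           u(i, 0)
east  = lambda i: 0.0              # cooled side,           u(i, 5)

A = np.zeros((n * n, n * n))
b = np.zeros(n * n)
for i in range(1, n + 1):
    for j in range(1, n + 1):
        k = (i - 1) * n + (j - 1)
        A[k, k] = 4.0
        # Interior neighbours contribute -1; boundary neighbours move to b.
        for ii, jj in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
            if 1 <= ii <= n and 1 <= jj <= n:
                A[k, (ii - 1) * n + (jj - 1)] = -1.0
            elif ii == 0:
                b[k] += north(j)
            elif ii == 5:
                b[k] += south(j)
            elif jj == 0:
                b[k] += west(i)
            else:
                b[k] += east(i)

print(b)                            # reproduces the boundary vector of (2.28)
u = np.linalg.solve(A, b)           # steady-state temperatures
print((u / 20.0).reshape(n, n))     # in volts, cf. Tables 2.1 and 2.2
```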

Figure 2.5: Discretization of a plate domain by a rectangular grid of points. The boundary nodes carry the values u(i,0) = 100 and u(5,j) = 100 (heated sides), u(i,5) = 0 (cooled side), and u(0,j) = 20(5-j) (partially heated side).

These equations were solved using both a numerical method and our cellular neural network method. To solve the equations using the CNN, voltages must be defined for the temperatures, and we arbitrarily chose a scaling factor of 20 (i.e., 100°C corresponds to 5V). (See Figure 2.6 on the next page.) A circuit diagram for a cell is given in Figure 2.7 on page 18. The first-order circuit analysis for the cell (i,j) is

C\frac{du_{ij}}{dt} + \frac{u_{ij} - v_{i-1,j}}{R} + \frac{u_{ij} - v_{i+1,j}}{R} + \frac{u_{ij} - v_{i,j-1}}{R} + \frac{u_{ij} - v_{i,j+1}}{R} = I_{ij}, \qquad
v_{ij} = u_{ij}, \qquad i, j = 1, \ldots, 4 \qquad (2.29)

Figure 2.6: 4×4 Cellular Neural Network for our heated plate example. The boundary cells are driven at 5V along the two heated sides, 0V along the cooled side, and 4V, 3V, 2V, 1V along the partially heated side.


Figure 2.7: An example of a cell circuit.

this set of equations can be written as

\tau\frac{du_{ij}}{dt} = -4u_{ij} + v_{i-1,j} + v_{i+1,j} + v_{i,j-1} + v_{i,j+1} + R I_{ij}, \qquad
v_{ij} = u_{ij}, \qquad i, j = 1, \ldots, 4 \qquad (2.30)

where τ = RC is the time constant of the circuit. Equation (2.30) has the form described earlier by equation (2.10), where R I_{ij} = -φ_{ij}. As before, equation (2.10) converges to the steady state due to the positive definiteness of matrix A, and therefore equation (2.30) will also converge. In our simulation a value of R = 100kΩ and C = 1pF was used, giving a time constant of 25ns. The results obtained by Spice simulation (see Table 2.1) match those obtained by numerical analysis using the software program Maple (see Table 2.2).
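A short back-of-the-envelope check on the timing, under the assumption (not stated explicitly in the text) that the quoted 25ns refers to the effective constant a node sees through its four connection resistors in parallel:

```python
R = 100e3                  # ohms, per connection resistor
C = 1e-12                  # farads, cell capacitor
tau_rc = R * C             # single-resistor time constant: 100 ns
tau_node = (R / 4.0) * C   # four neighbours in parallel:    25 ns
print(tau_rc, tau_node)
# The cell voltages settle within a few of these time constants.
```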
4.3687  4.2965  4.4270  4.6748
3.1783  3.3904  3.7365  4.2724
1.9542  2.3502  2.8563  3.6783
0.78848 1.1997  1.6602  2.5846

Table 2.1: The voltages obtained by Spice simulation for the equidistant case.

4.3687  4.2965  4.4270  4.6748
3.1783  3.3904  3.7365  4.2724
1.9542  2.3502  2.8563  3.6783
0.78848 1.1997  1.6602  2.5846

Table 2.2: The voltages obtained by Maple for the equidistant case.

2.3.2 Non-Equidistant Case


The accuracy can be increased if the distance between the nodes is smaller in the region where the change is the greatest. In this example there is one region where the rate of change is very large. That region is the southeast corner, where the boundary conditions are 0°C to the east and 100°C to the south. Although the topic of how to design a suitable mesh for any particular problem is beyond the scope of this thesis, a possible mesh could be described using four matrices, one for the weight in each direction. In these matrices a "1" indicates the least resistance and a "4" the most.

w = [w_{ij}], \quad n = [n_{ij}], \quad e = [e_{ij}], \quad s = [s_{ij}] \qquad (2.31)

where each of w, n, e and s is a 4 × 4 matrix of per-cell direction weights with entries between 1 and 4, the smallest values lying toward the southeast corner of the plate.

Again the results from the Spice simulation (see Table 2.3) match those obtained from the numerical solution using the same mesh to within about one hundredth of a volt; only two entries have a difference greater than one hundredth of a volt (0.0119V and 0.0116V). (See Table 2.5 on the following page.) Although the Spice and numerical solutions match for both the equidistant and the variable-distance cases, the latter is more accurate since it gives more solutions to the PDE in the area of highest volatility at the expense of fewer solutions in the area of lowest volatility.

4.3256  4.1895  4.2241  4.4858
3.1128  3.2084  3.4717  3.9110
1.9174  2.2808  2.7065  3.2409
0.97039 1.4790  2.0023  2.5608

Table 2.3: The voltages obtained by Spice simulation for the variable case.


4.3242 4.1854 4.2122 4.4742

3.1115 3.2054 3.4669 3.9067

1.9165 2.2792 2.7045 3.2393

0.96988 1.4781 2.0012 2.5601

Table 2.4: The voltages obtained by Maple for the variable case.

0.0014 0.0041 0.0119 0.0116

0.0013 0.0030 0.0048 0.0043

0.0009 0.0016 0.0020 0.0016

0.00051 0.0009 0.0011 0.0007

Table 2.5: The differences between Table 2.3 and Table 2.4.

Chapter 3

Architecture of VLSI Chip


As mentioned previously the architecture of the VLSI chip is the main contribution of this thesis. From the point of view of the architecture the basic problem requires that there be a matrix of cells, that each cell have four variable connection weights and an output.

3.1 Variable Resistors


The variable weights are implemented by variable resistors. The resistors that are required by the cells must be individually controlled; that is, every cell may have different values for its own resistors. Note that with other CNNs [CY88b], [CY88a], including the equidistant case mentioned in the previous chapter, the resistances used in each cell are the same and are described by a "template." We are more concerned with the non-equidistant case, where the resistors may be different, because it allows more accurate results with the same number of cells. As will be discussed in the next chapter there are several ways of making a variable resistor in a CMOS design. The most common is to use some sort of active resistance. This would require that at least one analog line be routed to each resistor in order to set the value. This requires too much routing for a large matrix. For example, with the Orbit 2μ technology a typical wire is 4λ wide with 3λ separating each wire. For a matrix that is 100 × 100 where each cell has four neighbors, we would need a path that was 50 · 4 · (3λ + 4λ) = 1400λ tall just for the routing of one row of cells. We wanted a method that would not require much additional control routing no matter how large the matrix was.

3.1.1 Generic
A generic method for controlling neural networks, especially CNNs, with a minimal amount of control routing would be to use a distributed memory where each cell stores its control information locally. Since the cell weights frequently need to be controlled individually, each weight should have its own local memory. In other types of neural networks, which are not covered in this thesis, the gain of the buffer also needs to be controlled. In those cases, the cell's buffer should also have its own memory. From an architectural point of view all of the local memories need to be addressable so that they can be written to. The memories will, in turn, control the cell. There are numerous types of memory cells that could be used in this generic architecture. The idea will work with both analog and digital memories, providing that the memory type will function with the cell part that needs to be controlled. Figure 3.1 on the next page shows a more general cell with an arbitrary number of weights, and therefore neighbors. The method we chose uses several passive resistors connected in series.1 At the point between each resistor is a digitally controlled analog switch.2 The other node of each switch is connected to a bus. All of the switches in one variable resistor are connected to the same bus, and that bus acts as the output of the variable resistor. So the effective resistance of the circuit is controlled by which switch is on. The number of switches and resistor segments depends on how much controllability is required for any particular application. The size of each resistor segment also depends on the application for which the digitally controlled resistor will be used. (See Figure 3.2 on the following page.) It might appear that all of these switches have made the routing problem worse, because each switch needs a control line. For example, if a variable resistor with five possible values was required (e.g., 0%, 25%, 50%, 75%, and 100% of the full resistance), you would need five switches and therefore five lines routed to each resistor to control those switches. Although this number can be reduced by using a multiplexer and demultiplexer, the number of lines is still dependent on the number of variable resistors, which is in turn dependent on the size of the matrix.

1 The other reasons besides control routing that this choice was made will be discussed in the next chapter.
2 The switch is always either on or off, but when on it allows an analog signal to pass.

Figure 3.1: A general cell with many weights, each weight with its own local memory.

Figure 3.2: A variable resistor: series segments r1, r2, …, rn between the input and the cell output, with a select line for each resistance value.

In order to solve the problem of controlling all of the switches a system of digital shift registers can be used.3 Like any other shift register, each register needs two clocks (one the inverse of the other). As the clocks are cycled the input of one shift register moves to the input of the next and so on. A one-bit shift register is needed for each switch of each variable resistor (i.e., one for each port, or empty circle, of Figure 3.2 on the page before). This allows us to reduce the number of lines that must be routed to each variable resistor to three (the input and two clocks), as shown in Figure 3.3.

Figure 3.3: A local memory cell consisting of shift registers.

In order to reduce the number further, the output of the last shift register of a variable resistor can be connected to the input of the first shift register of the next variable resistor. In this way it would be possible to reduce the total number of lines used to control all of the variable resistors to three. However, that would require a great number of clock cycles to initialize all of the variable resistors, or weights, of the neural network.4 After having considered this case it becomes clearer that this method of controlling weights could be applied to other neural networks. The shape of the network is not important; in fact it does not even need to be cellular. In order to set or change the set of weight values in the neural network using the shift register method you would, in general, need to shift in a completely new set of weights. For example, if we had a single variable resistor to set and it had five possible values, we would need to shift in a "1" followed by four "0"s in order to set it to its highest resistive value, or four "0"s followed by a "1" to set the resistor to its smallest value. This requires one complete two-phase clock cycle for each full-bit shift register. Although for larger matrices this could take a lot of time, it is still relatively short, since each phase of the clock cycle need only change the state of one inverter. Thus the clock cycle for the circuit should be as fast as is possible for any clock with that type of technology. As previously mentioned, active resistors can be used for variable resistors as well, and a system of memory cells could be used to store their control values. Since shift registers typically only store digital signals and active resistors require analog values, the analog memory cells would have to be written to in some other way. In a two-dimensional matrix the simplest approach would be a raster type system where a bus line would deliver the signal and a horizontal and a vertical line would select each memory cell for a write. Note that active resistors still have a problem with regard to the number and size of the transistors compared to the linear range over which they function, as well as the resistive range. This aspect will be covered in the next chapter.

3 Since the shift registers use only two values they are digital, but the two values needed are not the standard TTL digital of 0V and 5V. See the next chapter for more information.
4 The equidistant case and other "template" models can also be solved in this manner. Only the two clocks and one set of shift registers for each variable resistor of the template cell are needed. The switches of all of the other cells can be controlled by these shift registers using bus lines. In this way, far fewer shift registers are needed and thus the layout can be made more compact.
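The one-hot selection scheme described above can be sketched behaviourally as follows (Python; the class and method names are illustrative only). Shifting in a single "1" followed by "0"s leaves exactly one tap selected, and which tap it is determines the fraction of the series resistance switched onto the bus.

```python
class TapChain:
    """Five one-bit shift registers, one per tap of a variable resistor."""

    def __init__(self, taps=5):
        self.bits = [0] * taps

    def clock(self, shift_in):
        """One two-phase clock cycle: every bit moves one stage along the chain."""
        self.bits = [shift_in] + self.bits[:-1]
        return self.bits[-1]          # shift-out, available to the next resistor's chain

    def selected_fraction(self):
        """Fraction of the full resistance seen at the bus (one-hot pattern assumed)."""
        tap = self.bits.index(1)
        return tap / (len(self.bits) - 1)

chain = TapChain()
for bit in (1, 0, 0, 0, 0):           # a "1" followed by four "0"s ...
    chain.clock(bit)
print(chain.bits, chain.selected_fraction())   # ... selects 100% of the resistance
```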

3.1.2 Implementation
In our implementation we chose a compromise. Each cell consists of four variable resistors which are stacked vertically (in the north direction). The cells are, of course, in a matrix. The shift registers run in the horizontal direction. The outputs of the shift registers of one cell are connected to the same-level resistor in the next cell, which is in the next column. (See Figure 3.4 on the next page.) The clock lines run along a bus vertically through the matrix. In this way you can shift in the settings for a whole row of resistors from one input, which of course means that all of the rows can be set in parallel. In this way the weights can be set very rapidly, since a shift register is only as slow as the switching time of one pass transistor and one inverter. Thus the clock speed is essentially as fast as it can be for any circuit with a given technology. Our chip has a 4 × 4 matrix that has five equally spaced resistor values. Therefore, you need five clock cycles to set each variable resistor. Since the rows are done in parallel the total number of clocks is twenty (5 clocks × 4 columns). Although this would be a simple and quick way to set all of the resistors, it requires one input pin for each row of resistors. In our matrix there are 16 such rows. In general pins are at a premium in VLSI designs, so it is desirable to reduce the number used as much as possible.


Figure 3.4: Two cells with connected variable resistors.

After the complete layout was considered, including test circuitry, it was clear that no more than four pins could be used for the variable resistors' shift register inputs. In order to reduce the number from sixteen to four an extra bank of two shift registers was used. This technique adds two more clocks and uses two inputs for a total of four pins. Under these conditions the two new shift registers, each of which holds one bit of the setting information for eight rows, must be filled before each clock cycle for the matrix. (See Figure 3.5.) Taken together, one hundred sixty clock cycles are needed (8 × 20 = 160). In a proven design fewer pins would be needed for testing, therefore more could be used for shift register inputs and the total number of clocks could be reduced.
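A trivial behavioural count of the loading sequence just described (Python; purely illustrative) confirms the figures in the text: 5 taps × 4 columns = 20 matrix clocks when every row of resistors has its own input pin, and 8 × 20 = 160 clocks when the two external 8-bit staging registers must be refilled before each matrix clock.

```python
TAPS = 5            # shift-register bits per variable resistor
COLUMNS = 4         # cells per row; all 16 rows of resistors load in parallel
STAGING_BITS = 8    # depth of each external staging shift register

# Direct scheme: one input pin per resistor row.
matrix_clocks = TAPS * COLUMNS
print(matrix_clocks)                      # 20

# Pin-limited scheme: refill the 8-bit staging registers before every matrix clock.
total_clocks = STAGING_BITS * matrix_clocks
print(total_clocks)                       # 160
```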

Figure 3.5: Matrix with External Shift Registers.

To a certain extent these two extra banks of shift registers act as a global or shared memory which is used as a staging area to initialize the local or distributed memory cells of the weights. When looking at the complete system from this point of view it is analogous to a SIMD parallel computer with a two-dimensional mesh interconnection network. (See Figure 3.6 on the next page.) Parallel computers tend to have a host or controlling computer. In our case the test equipment, or a system in which the CNN were embedded, would have a task comparable to that of this host. As a final comparison we should point out that both the routing of the interconnection networks and handling the individual processing elements' I/O requirements are two of the major difficulties that must be overcome in digital parallel computer designs as well [HB84].

Figure 3.6: Processing elements with local and global memory.


3.2 Boundary Conditions and Outputs


The boundary conditions are analog and must be held throughout the evaluation of the circuit. Since current may be flowing through the boundaries, any storage technique would have to be buffered. Furthermore, it would be useful to allow more than one set of values to be obtained for a given set of weights. Therefore they should be easy to change. For all of these reasons the boundary conditions are best implemented by directly connecting the matrix to analog bonding pads at the boundary points. Another possibility would be to use an analog memory device such as a floating-gate capacitor for each boundary point. Those cells would then need to be filled using some sort of network, of which there are numerous viable possibilities. One such possibility, which is ideally suited for a very large number of boundary nodes, is a raster technique. This technique is not needed for a small network such as our 4 × 4 matrix, and it requires more complicated test equipment. The outputs, on the other hand, are only needed at the end of the CNN operation. Since the outputs of the cells come from the outputs of the buffer and the input of the buffer is at a large capacitor, the outputs will maintain their voltage for a relatively long time. Due to this fact we can read out the outputs more slowly, if needed, and therefore each output does not need its own pin. We chose to connect the output of each cell in a row to a bus line through a simple switch with the same characteristics as the one used in the variable resistors (see Section 4.1.1 on page 32). The bus lines then go directly to analog pads where they can be easily read externally from the chip. In this way, one cell from each row (an entire column) can be read in parallel. (See Figure 3.7 on the next page.) To a certain extent this is similar to the Massively Parallel Processor (MPP) developed at the NASA Goddard Space Flight Center, as well as other mesh-based parallel computers. The MPP is a 128 × 128 processor mesh which is able to handle its output a column at a time (128 bits at a time). It is not an exact analogy, since the MPP shifts digital bits in and out of the array, and shifting analog bits is not as practical [HB84]. To minimize the number of pins used for output a decoder or demultiplexer is used to control the switches. The decoder itself also has an off state so that all of the columns can be turned off. This helps to minimize the load of the pads and bus lines during the operation of the CNN. Another possibility would be to use a fully parallel output device, such as a pixel at each node. The intensity of the radiation from the pixel would indicate the output value of that node.


Figure 3.7: Matrix with Output Control Circuitry.

Chapter 4

CMOS Implementation
As can be seen from the preceding chapters, the Artificial Neural Network described in this thesis consists of a matrix of cells. In addition it is clear that each cell must have four variable resistors, a buffer and a capacitor. Although the larger the voltage range that the circuit could handle for input and output the more accurate our results would be, we decided that 5V was a reasonable voltage to design for. The CMOS design of each of the necessary analog parts will be discussed in the first section of this chapter. The second section will review the results of Spice simulations.

4.1 Design Circuitry


All of the parts were "over designed" in that tolerances of 0.8λ for the length and width of every transistor were allowed, which is beyond the limit set by both MOSIS and Orbit (the chip fabricator) [MOS93]. Although this uses more of the surface area of the chip, it was considered necessary with an unproven design.

4.1.1 Variable Resistor


As described in the previous chapter on the architecture of the neural net, the variable resistor consists of passive resistors, "taps" between those resistors, and a set of shift registers. The five taps that were used were equally spaced so that 0%, 25%, 50%, 75%, or 100% of the resistance could be realized. The only requirement on the size of the resistances is that it must be large compared to the parasitic resistances in the circuit. We determined based on simulations that approximately 20kΩ should be used for our 100% resistance.

Resistor
We chose to use a passive resistor for several reasons. The first and foremost is that the CNN requires a linear resistor across as wide a range of voltages and resistances as possible. Although active resistances can be found that are linear across fairly large voltage ranges, they do not have a wide range of resistances. For our purposes a resistor needed to be able to take several values from 0%·x to 100%·x, where x was larger than the parasitic resistances of the rest of the circuit. The active resistance designs that we reviewed could have a large range of resistances in Ohms, but not as a percentage [AH87a], [GAS90a]. For example, an active design might have had a range of 50kΩ to 75kΩ (which is 25kΩ). Although 25kΩ is a sufficient range, we require 0 to 25kΩ, not 50kΩ to 75kΩ.1 Passive resistors, on the other hand, are linear across any practical voltage range. A second reason for using passive resistors is that once they are formed they cannot be changed and therefore they do not need control wiring themselves. As mentioned in the chapter on architecture they can still be variable if taps, to allow the current to leave prematurely, are placed along them. A passive resistor in a CMOS VLSI design is simply a length of polysilicon, n-type or p-type diffusion, or a well. A contact is placed on either end of the length of material. We chose to use polysilicon because, although it has a lower resistance than the others per unit length, it can be snaked so that the total area used for a certain resistance is less than for other materials in the MOSIS 2μ analog process. A length of 1808λ that was 2λ wide was used.2 At 20Ω per square, each small resistance segment was 4520Ω. Each contact adds approximately 20Ω-50Ω. Each tap also adds a certain amount of resistance depending on the size of the transistor [MOS93]. Since the taps are essentially digital switches that will pass analog values they could be implemented as pass transistors. Simulations showed that true CMOS pass transistors were not required
1 We should point out that active designs that we determined to be too complex (large) in terms of the number of transistors were not fully examined, so it is possible that some active resistors do have these characteristics, but were impractical for size reasons.
2 The CMOS technology used was the MOSIS (Orbit) 2μ analog (low noise), which means 1λ = 1μ.

so NMOS pass transistors could be used to save space. In order to pass 5V linearly, NMOS transistors must have an "on" of 8V and an "off" of -8V rather than the usual 0V and 5V found in traditional digital circuits. The pass transistors were as wide as possible to minimize their resistance without increasing the size of the circuit. In this case that was 4λ. With a minimum length of 2λ the NMOS pass transistor has an average effective resistance of about 2kΩ when selected ("on"). To summarize, the total resistance is 4520Ω for each segment plus 2kΩ for the tap and 20Ω-50Ω per contact. The values for the five basic variations are given in Table 4.1.

Percentage  Resistance
0%          2kΩ
25%         6.6kΩ
50%         11.1kΩ
75%         15.6kΩ
100%        20.1kΩ

Table 4.1: The five basic variations the resistor can achieve.
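The entries of Table 4.1 can be recovered from the layout numbers quoted above. A minimal sketch (Python; the 20Ω/square sheet resistance, the 1808λ × 2λ poly strip divided into four equal segments, and the 2kΩ switch on-resistance are the values given in the text, and contact resistance is ignored here):

```python
SHEET_RES = 20.0      # ohms per square of polysilicon
LENGTH = 1808.0       # lambda, total length of the poly strip
WIDTH = 2.0           # lambda
SEGMENTS = 4          # four equal segments between the five taps
SWITCH_ON = 2000.0    # ohms, effective resistance of a selected NMOS tap

segment_res = (LENGTH / SEGMENTS / WIDTH) * SHEET_RES   # about 4520 ohms
for tap, pct in enumerate((0, 25, 50, 75, 100)):
    total = SWITCH_ON + tap * segment_res
    print(f"{pct:3d}%  {total / 1000:.1f} kOhm")
# Prints 2.0, 6.5, 11.0, 15.6, 20.1 kOhm -- within contact resistance of Table 4.1.
```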

Shift Register
The shift registers needed to be able to have a "1" at any register while having a "0" at all of the others. This required the use of full-bit shift registers. A full-bit shift register is simply a pass transistor and a buffer which is made from two inverters. Since we wanted to be able to select one of five settings, five shift registers were needed by each variable resistor. The output of each register only needed to be able to drive the next register and a tap, which is simply the gate of an NMOS transistor. Therefore each register needed to drive very little current, so the buffer could be minimum size. For the same reason the buffer's power rails needed to be at least 8V. The pass transistor only needed to drive the buffer, so if the buffer is CMOS the pass gate can be implemented using one NMOS transistor. (See Figure 4.1 on the next page.)

4.1.2 Buffer
The buffer we chose to use was implemented by an operational amplifier with the output tied to the negative input terminal. The operational amplifier design was taken from [AH87b] and then reduced in size by examining the simulations and making adjustments. Since we are more interested in the steady-state or DC performance, we were able to significantly reduce the size of the transistors from what the algorithms in [AH87b] suggested. For the same reason we were also able to remove the large "compensation capacitor." Another difference from [AH87b] is that a biasing transistor was removed and replaced with a voltage bus. The use of this bus makes the circuit slightly smaller and makes a slight runtime adjustment possible. The circuit is basically a differential amplifier (five transistors) with a little extra biasing circuitry (two transistors). (See Figure 4.2 on the following page.) The transistor aspect ratios (w/l) that were calculated and those that were actually used are given in Table 4.2.

Figure 4.1: A one bit shift register.

Device            M1    M2    M3   M4   M5     M6    M7
Calculated Ratio  42/1  42/1  1/1  1/1  100/1  44/1  10/1
Ratio Used        1/1   1/1   1/1  1/1  3/20   15/1  15/1

Table 4.2: The aspect ratios (w/l) for the devices in each cell's buffer.


Figure 4.2: The buffer used in each cell.


4.1.3 Capacitor
Although the previous circuit parts were all designed in a rigorous manner, the capacitor was more ad hoc. The only requirement was that the capacitor store the value at the gate of the buffer as long as necessary for it to be read. If it were larger than necessary the circuit would simply take longer to stabilize. If it were not large enough the result would decay too much before it could be read. Therefore we decided to make the capacitor as large as possible without increasing the size of the layout. In other words we used the extra space left by the width of the buffer compared to the width of the variable resistor. This turned out to be a total of 189λ × 70λ, of which only 154λ × 29λ was usable for the capacitor, since each capacitor requires its own well with a full guard ring. In addition this technology requires a 2λ border so that one plate of the capacitor is larger than the other. The capacitor is then between 1.92pF and 2.46pF (154λ × 29λ × (0.43fF/λ² to 0.55fF/λ²)) [MOS93].
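The quoted range follows directly from the usable plate area and the process capacitance per unit area; a one-line check (Python, using only the numbers from the paragraph above):

```python
area = 154 * 29                          # usable plate area in lambda^2
c_min, c_max = 0.43e-15, 0.55e-15        # farads per lambda^2 (MOSIS Orbit 2u analog)
print(area * c_min * 1e12, area * c_max * 1e12)   # ~1.92 pF to ~2.46 pF
```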

4.1.4 Glue Logic


In addition to the neural network sub-circuits, some "glue logic" was needed to facilitate the input of the control signals and the output of the CNN. These parts were described previously in Section 3.1.2 on page 26 and Section 3.2 on page 30. Both the 8-bit shift register and the 2-to-4 decoder with an enable line are standard digital parts, so they will not be discussed here except to mention that both sub-circuits operate from the ±8V rails rather than 0V and 5V. The layouts are given in Figure C.1 on page 77 and Figure C.3 on page 79, respectively (Appendix C on page 77).
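The behavior of the column-select decoder can be captured in a few lines; the truth table below matches the esim simulation listed in Appendix C (an enable plus two select bits produce four active-high outputs, or none when the enable is low). This is only a behavioral model, not the transistor-level circuit.

# Behavioral model of the 2-to-4 decoder with enable used for output select.
def decode(enable, s0, s1):
    outputs = [0, 0, 0, 0]            # OUT0..OUT3, active high
    if enable:
        outputs[s0 + 2 * s1] = 1      # select exactly one column
    return outputs

print(decode(1, 0, 0))   # [1, 0, 0, 0] -> column 0 selected
print(decode(1, 1, 1))   # [0, 0, 0, 1] -> column 3 selected
print(decode(0, 1, 0))   # [0, 0, 0, 0] -> no column selected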

4.2 Simulation Results


Each part was simulated individually with Analog Work Bench by Valid (AWB) using the "Spice+" simulator. The parts were connected to form a cell, and that cell was also tested with AWB. A 4×4 matrix was tested using Spice 3 from the University of California at Berkeley (UCB). All Spice simulations were performed using the same models, which are described in [MOS93] and are included in the MOSIS parameters given in Appendix E on page 84 of this thesis. All of the circuit parts were "over-designed" in that tolerances of 0.8λ for the length and width of every transistor were allowed for, which is beyond the limit set by both MOSIS and Orbit (the chip fabricator). [MOS93]

38

4.2.1 Variable Resistor


The variable resistor was simulated with a combination of analog and digital tools. The analog tools were used at the design stage while the digital tools were used at the layout stage.

Resistor
It was not possible to test the passive resistor directly from the layout because the circuit extractor of Magic considers the two ends of a long segment of polysilicon to be only one node. We used Spice at the design stage to ensure that the taps did not alter the resistance too much, but the actual polysilicon resistors were assumed to be ideal, with resistances as calculated in Table 4.1 on page 34.

Shift Register
Since the shift register is basically digital, it was possible to use the standard CMOS digital simulator esim to test its functionality. Spice was needed to ensure that the buffers in each shift register were able to drive a tap far enough on or off for the desired analog signal to pass through. The esim input and output for the basic shift register cell are given in Appendix B on page 72, as well as an AWB circuit and "oscilloscope" trace showing a shift register driving a tap (Figure B.3 on page 75).

4.2.2 Bu er
As mentioned previously, the design for the buffer was taken from a book and then adjusted considerably during simulations. Although variations in the size of the devices were taken into account for all of the devices, this was especially needed in the buffer since it had to be kept small. Normally a fairly large minimum size is selected and then all of the aspect ratios are multiplied by that size in the layout. We used actual device sizes and tried all extreme combinations of device size errors during simulations. This extra care kept problems from arising during the layout stage. The buffer that was simulated is shown in Figure 4.3 on the next page. In addition to the basic circuit, an extra resistor "z" was used to more accurately mimic the layout. A load on the output consisting of four resistors and four capacitors was also added to simulate the load that the neighbors of the buffer would cause. The biasing voltage, V_GG, was determined based on simulation results to be −3V.

The final buffer worked quite well when simulated. In Figure 4.4 on the following page, "Channel 3" shows the error (difference between input and output) in volts; it is at a different scale and centered at a different zero point than the input and output channels. The maximum and minimum errors are also given in the "Value" column (−12.9mV to −15.4mV).

Figure 4.3: AWB circuit of a buffer.

4.2.3 Cell
The buffer above was combined with four ideal resistors and a capacitor at its input to form a cell. The cell was then simulated with four resistors and four capacitors to mimic the load that the output of a cell would see. Figure 4.5 on page 41 shows this AWB circuit, whose simulation results were as expected.
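For a rough sense of how quickly the storage node of a single cell should settle, its RC time constant can be estimated from the ideal parts used here (the 2.23 pF capacitor and the maximum 20.1kΩ weight from Table 4.1). This is only an order-of-magnitude sketch that ignores the buffer and all parasitics.

# Order-of-magnitude settling estimate for one cell's storage node.
C_NODE = 2.23e-12          # farads, the cell capacitor
R_WEIGHT = 20.1e3          # ohms, one weight at its 100% setting

r_parallel = R_WEIGHT / 4.0            # four equal weights drive the node
tau = r_parallel * C_NODE
print("tau = %.1f ns" % (tau * 1e9))   # roughly 11 ns

The result, on the order of tens of nanoseconds, is far below the microsecond-scale rise times measured at the output pins in Chapter 5, which is consistent with the conclusion there that the output circuitry, not the internal nodes, dominates the delay.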

40

Figure 4.4: AWB simulation of the buffer.

41

Figure 4.5: AWB circuit of a single cell.

42

4.2.4 Matrix
Spice-3f4 from UCB was used to simulate the complete matrix with both ideal resistors and capacitors. The buffers were made from both ideal parts and from a layout which adds parasitic capacitances and resistances. It was not possible to test the complete layout due to the problems previously mentioned with the extraction process. The input file that was used for the simulations is given in Appendix A on page 68.
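What the matrix computes can be illustrated with a simple behavioral sketch: with all weights equal, each node of the resistive mesh settles to the average of its four neighbors, which is the finite-difference form of the steady-state heat-flow equation. The boundary voltages below are hypothetical and this is not the netlist of Appendix A; it only shows the relaxation the analog network performs in parallel.

# Jacobi relaxation of a 4x4 grid with fixed boundary voltages.
N = 4
top, bottom, left, right = 3.0, 1.0, 2.2, 1.6    # assumed boundary volts
v = [[0.0] * N for _ in range(N)]

for _ in range(500):                             # iterate until it settles
    new = [row[:] for row in v]
    for i in range(N):
        for j in range(N):
            up = v[i - 1][j] if i > 0 else top
            down = v[i + 1][j] if i < N - 1 else bottom
            west = v[i][j - 1] if j > 0 else left
            east = v[i][j + 1] if j < N - 1 else right
            new[i][j] = (up + down + west + east) / 4.0
    v = new

for row in v:
    print(["%.2f" % x for x in row])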

4.2.5 Glue Logic


The only simulations done on the 8-bit shift register and the 2-to-4 decoder with an enable line were standard digital simulations with esim to confirm the functionality of the design and the layout. The details are in Appendix C on page 77.

Chapter 5

Chip Design and Measurement


The layout for each of the parts will now be discussed. Many of the concepts used in designing the layout came from [Ism94]. After the layout was complete, the design was fabricated by MOSIS as a tiny chip. MOSIS produces and returns four chips, which were tested. Pictures were taken of one of the chips and are included here along with each layout. Note that the images are shown with the same orientation as the layouts.

5.1 Layout
Most of the parts had a very straightforward layout. The notable exception was the buffer, which required a fair amount of adjustment in order to make some of its large, odd-shaped devices fit in as small a space as possible. All of the layouts were originally designed with both N-wells and P-wells. The CIF extractor was then used to remove a well and fix up the remaining well. The layout had to be altered in certain areas where the wells were joining unintentionally. This was especially true around the buffer, which had odd-shaped wells, and the capacitor, which had to have its own well. Special care was taken so that the routing of each cell was modularized. This ensures that the mesh could be made arbitrarily large without having to change the routing. Minor changes to the layouts of the smaller parts were made to facilitate this modularity.

43

44

5.1.1 Buffer
As described in the last chapter, the devices in the buffer had very particular aspect ratios, given in Table 4.2 on page 35. The technology we used had a minimum length of 2λ and a minimum width of 3λ. In order to keep the size of the buffer small but still have good performance, we chose to make the largest ratios as small as possible while making the smaller ratios (1/1) larger than necessary to counteract the effect of errors in fabrication. These transistors (M1, M2, M3, and M4) are the differential pairs, so they need to be very similar in size; a slight change can make a big difference. With the help of Spice simulations, and taking into account the maximum error allowed under the fabrication rules, we arrived at the device sizes given in Table 5.1. [MOS93]

Device    Calculated Ratio    Ratio Used    Size Used (λ)
M1        42/1                1/1           8/8
M2        42/1                1/1           8/8
M3        1/1                 1/1           8/8
M4        1/1                 1/1           8/8
M5        100/1               3/20          3/20
M6        44/1                15/1          30/2
M7        10/1                15/1          30/2

Table 5.1: The aspect ratios (w/l) and sizes used for the buffer.

The original, straightforward layout of the buffer was much too large, so the devices M6 and M7 were "snaked." The transistor M5, which is used to bias the differential pairs, was bent to make the total circuit smaller and to make the connections easier. All of the contacts are made as large as possible. Figure 5.1 on the following page shows the final layout of the analog buffer while Figure 5.2 shows the buffer as it was fabricated.

5.1.2 Variable Resistors


The layout of the variable resistors began with the design of a single shift register in as small a space as possible. This register was then duplicated in order to make five connected shift registers. The space between the registers was minimized to make the whole 5-bit shift register as small as possible. This part was obviously much wider than it was tall, so the resistance, consisting of four "snakes" of polysilicon, was placed above it. In this way the width of the resistance portion was based on the shift registers. The height, or number of bends in the "snake," was determined by the approximate

45

(Figure 5.1 terminals: Vdd, Vgg, in, out, Vss)

Figure 5.1: Layout of the analog buffer.

Figure 5.2: Picture of the analog buffer.

resistance that we required (see Section 4.1.1 on page 33). Figures 5.3 and 5.4 show the final layout and physical image of the variable resistor.
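As a rough cross-check of the segment sizing, the number of squares of polysilicon needed for one 4520Ω segment can be estimated from the typical poly sheet resistance listed in Appendix E. This is a back-of-the-envelope sketch only; the actual snake geometry was set during layout.

# Approximate squares of poly needed for one 4520-ohm resistor segment.
R_SEGMENT = 4520.0     # ohms per segment (Section 4.1.1)
SHEET_RHO = 21.0       # ohms per square, typical poly1 (Appendix E)
print("about %.0f squares per segment" % (R_SEGMENT / SHEET_RHO))

With a minimum-width line this works out to a few hundred squares, which is why each segment had to be folded into a "snake."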

(Figure 5.3 labels: out, GND, in, a, c, Vdd)

Figure 5.3: Layout of the five-way variable resistor.

Figure 5.4: Picture of the five-way variable resistor.

5.1.3 Capacitor
As mentioned in Section 4.1.3 on page 37, the capacitor's size was determined by the space left over. The space must be used to connect the capacitor to V_SS, give it its own well, and (to shield that well from noise) as complete a guard ring as possible. The actual capacitor is made from two layers of polysilicon, with one layer overlapping the other by 2λ. The connection to the top of the capacitor is made with a grid in order to spread out the charge more evenly. Note that the grid can be connected to the rest of the cell with a second layer of metal. For the same reason the capacitor is connected in more than one place to V_SS. The guard ring is complete except for openings for those connections to the negative power rail. Figure 5.5 shows the complete capacitor, positive terminal, and power rails, while Figure 5.6 is a picture of the capacitor as connected to a cell.
(Figure 5.5 labels: Vdd, Vss)

Figure 5.5: Layout of the large capacitor.

Figure 5.6: Picture of the large capacitor connected to a cell.

48

5.1.4 Cell
The four sub-circuits described above are connected together to form the basic cell of the cellular neural network. The variable resistors are stacked four high with both the shift inputs and the analog inputs on the left and the shift outputs on the right. The analog outputs are on a bus running horizontally in each resistor, and the four variable resistors are in turn connected together on a vertical bus near the center of the cell that connects the resistors to the capacitor and the input of the buffer. The output of the buffer is connected through a switch (the same as the tap used in the variable resistor) to an output bus. As can be seen in Figure 5.7 on the following page, the capacitor and the buffer are on the bottom left and right of the cell. Also, ten clock lines for the shift registers run vertically through the cell, but are routed around the buffer, which is rotated on its side.

5.1.5 Matrix
The matrix is simply constructed by making copies of the cell in a 4×4 matrix. The power rails of the shift registers, buffer, and capacitor connect with their horizontal neighbors. The biasing buses for the buffers also connect to their neighbors. The clock lines for the shift registers connect every other line together along the bottom of the matrix. An additional set of four buses (one per column) runs vertically through the matrix in order to connect the output switches in each column together.

5.1.6 Complete Chip


The chip is completed by placing the matrix inside the analog frame provided by MOSIS. [MOS88] The pads which make up the frame are connected to the matrix. In addition, the large shift registers and the decoder glue logic are added between the frame and the matrix. Finally, the test circuitry, which consists of one variable resistor, one buffer, and one complete cell, is added so that each can be controlled and monitored through its own pins. The basic MOSIS analog tiny chip frame had to be altered in order to increase the number of pins available for our use. Normally the four corner pins are used to power the frame. We changed the frame so that it could be powered by the same pins that power the CNN and test circuitry. Those four pins were then used for extra voltage inputs. See Table D.1 on page 83 in the appendix for the complete pin out. Figure 5.9 on page 52 is the complete layout of the chip. A picture of the complete

49

Figure 5.7: Layout of the complete cell.

50

Figure 5.8: Picture of the complete cell.

chip is given in Figure 5.10 on page 53. Note that the picture is a composite of smaller pictures and is as accurate as possible with the available equipment.

5.2 Measurement
After the chip was fabricated it was tested in two stages. First, individual parts were tested (a buffer and a complete cell). In the second stage the complete matrix was tested with several topologies (sets of weights) and several sets of boundary conditions. Finally, we should note that although the chip was designed for power rails at ±8V, it was tested with the power rails set to 0V and 5V. This was done for two reasons. The main reason is that the chip consumed more current than was expected at the higher voltages (due to a slight error in calculations). The second reason is that although an acceptable amount of current was consumed at slightly higher voltages than a 5V swing, the available test equipment makes it difficult to ensure the accuracy of the digital pulses at voltages other than 0V and 5V.

5.2.1 Test Circuits


A buffer was placed in the chip which had its own pins for both its input and output ports. The buffer did share its V_DD, V_SS, and V_GG pins with the rest of the chip. A complete cell was also placed on the chip. It shared its V_DD, V_SS, V_GG and the matrix clock pins. It also shared four boundary pins with the rest of the matrix. The cell had two shift input pins, one for both the North and South resistors and another for both the East and West resistors. Finally, it had its own output pin.

Buffer
The test buffer was measured first since it was the most basic and traditional design. After values for the power rails (V_DD and V_SS) were selected, the value for V_GG was determined and applied. It should be noted that the power rails and V_GG should be applied at exactly the same time. If this is not possible, V_GG should be applied first so that the current allowed to flow between the power rails in the buffer will be a known amount. After the buffer was turned on, a sine wave was applied to the input pin of the buffer. A Hewlett Packard 100MHz digital scope was used to measure both the input (channel 1) and the output

52


Figure 5.9: Layout of the complete chip.

53

Figure 5.10: Picture of the complete chip.

(channel 4). The amplitude of the sine wave was then compared to the design data and the Spice simulations to confirm that it had the desired characteristics. The biasing voltage was also adjusted to confirm that it was set to a value which gave the best results (the largest amplitude with the least distortion). Although the buffer only needed to work at DC, it was tested at a range of frequencies to ensure that it functioned properly. One set of output results is given in Figure 5.11.

Figure 5.11: Input and output of the test buffer.

Cell
The test cell was tested by assuming the cell was in the top left-hand corner of the matrix and setting the boundary values, based on Spice simulations with ideal parts, to what the cell's neighbors should be producing. Proper digital values were then shifted into the cell from a digital test device (a Tektronix LV500) and the output was examined with a digital scope (a Tektronix TDS420). The scope was set to trigger based on an extra output from the LV500. This ensured that the value being examined came from the output pin at a known amount of time after the resistors were set. The

actual value used for comparison was an average calculated by the scope itself. This helped to cancel any noise that was on the signal. This is a valid technique since we are actually only concerned with DC values.

5.2.2 Matrix
The complete matrix was then tested by entering several topologies into the LV500 and running each of them against each of several different boundary conditions. (See Appendix F on page 91 for a sample "msa" listing from the LV500.) Again the results were taken with a digital scope and the values noted were averages. These numbers were compared with those calculated with both Spice (using ideal parts, including ideal buffers) and with numerical answers for the partial differential equation being solved. The results for the non-equidistant case example in Section 2.3.2 on page 19 are given in Table 5.2.

Cell    Spice   Numerical   Chip    Numerical − Chip
(1,1)   2.73    2.73        2.72     0.01
(1,2)   2.25    2.24        2.25    -0.01
(1,3)   1.77    1.77        1.76     0.01
(1,4)   1.39    1.39        1.38     0.01
(2,1)   2.69    2.67        2.68    -0.01
(2,2)   2.39    2.28        2.29    -0.01
(2,3)   2.08    1.91        1.89     0.02
(2,4)   1.80    1.59        1.56     0.03
(3,1)   2.68    2.68        2.69    -0.01
(3,2)   2.28    2.39        2.39     0.00
(3,3)   1.91    2.08        2.06     0.02
(3,4)   1.59    1.80        1.72     0.08
(4,1)   2.79    2.79        2.80    -0.01
(4,2)   2.56    2.56        2.59    -0.03
(4,3)   2.30    2.30        2.32    -0.02
(4,4)   2.02    2.02        1.98     0.04

Table 5.2: Results from the example. All values in volts.
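As a quick check of the agreement shown in Table 5.2, the error statistics can be computed directly from the Numerical and Chip columns of the table:

# Error statistics for Table 5.2 (numerical solution vs. measured chip output).
numerical = [2.73, 2.24, 1.77, 1.39, 2.67, 2.28, 1.91, 1.59,
             2.68, 2.39, 2.08, 1.80, 2.79, 2.56, 2.30, 2.02]
chip      = [2.72, 2.25, 1.76, 1.38, 2.68, 2.29, 1.89, 1.56,
             2.69, 2.39, 2.06, 1.72, 2.80, 2.59, 2.32, 1.98]

errors = [n - c for n, c in zip(numerical, chip)]
print("max |error|  = %.2f V" % max(abs(e) for e in errors))
print("mean |error| = %.3f V" % (sum(abs(e) for e in errors) / len(errors)))

All but one node agrees with the numerical solution to within a few hundredths of a volt; the worst case is the 0.08V difference at node (3,4).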

The chip values given in Table 5.2 were taken from the digital scope. An example screen is given in Figure 5.12 on the following page. Each waveform comes from the output of one row. The large changes in voltage come from changing the output select lines, which select the column being viewed. The widths of those changes are set by programmed delays in the LV500. Delays were selected that made the values easy to read, although the chip could have been run faster with different test equipment. This is because the scope will only display one trigger's worth of information at a time, and a separate trigger would be needed for each column select change to get completely accurate and exact timing results. If the chip were in an embedded system this would be the case, since the rest of the circuitry would be designed after taking the hold time into account.
(Figure 5.12 annotations: output of node (3,2) in volts; average value between lines; rows 1 through 4.)

Figure 5.12: Output from the matrix of the example in Section 2.3.2.

In order to determine the rise time for the chip, a shorter delay was used. Although this makes it very difficult to read accurate solutions to the problem in general, it does make it possible to get more accurate rise and fall times. Essentially this shorter delay allows us to zoom in on the edges of the curves. This plot is shown in Figure 5.13 on the next page. Note that both the time and the voltage scale are different from those in Figure 5.12. The longest rise time for the matrix of the example was

57

Figure 5.13: Output from the matrix showing rise and fall times.

for the node in the first column and the fourth row, with 3.22 µs. The next edge for that node is a falling edge which is slightly slower than the previous edge, with a fall time of 3.48 µs. The fact that the next edge has a slower fall time, but of the same order of magnitude, implies that the delay is caused by the load of the output circuitry. It is clear that all the nodes should take approximately the same time to stabilize; therefore all of the nodes must have stabilized before the first output was read. If the time it took the internal nodes to stabilize were a major cause of delay, the second and subsequent edges would be noticeably smaller, since that delay would have been accounted for in the first edge. To confirm this it is possible to allow the nodes plenty of time to stabilize, and then turn on the output circuitry. (Recall from the discussion of the architecture in Section 3.2 on page 30 that it is possible to select any one column for output or no columns for output.) When this is done the results are essentially the same as those described above. Unfortunately, this makes it impossible to determine the actual internal delay of the nodes themselves. Other combinations of voltages and weights were tested. The boundary conditions (in volts) in Figure 5.16 on page 63 were used with each set of weights, which will now be given and discussed. The following set is simply the non-variable case. All of the weights are at their maximum. Although ideally the weights could be anything as long as they were all the same, the maximum value is a better choice, because parasitic resistances will have a smaller effect. In addition the buffers will have to

provide less current. Figure 5.14 on the next page is the output from the sixteen nodes with the first set of boundary conditions (the same used in the first scope images).

w = n = e = s =
    [ 4  4  4  4 ]
    [ 4  4  4  4 ]
    [ 4  4  4  4 ]
    [ 4  4  4  4 ]                                        (5.1)

This second case is the variable case as described in the example in Section 2.3.2. It is also the one used to produce the plot in Figure 5.12 on page 56.

w, n, e, s = [the four 4×4 weight matrices of the non-equidistant example of Section 2.3.2, with entries between 1 and 4]        (5.2)

60

Figure 5.14: Output from the equidistant case with the first set of boundary conditions.

The next set of weights is designed to model a system where all of the weights are the same for entire columns. Figure 5.15 on the next page is the output from the sixteen nodes with the second set of boundary conditions (see Figure 5.16 on page 63 for the boundary conditions).

w, n, e, s = [4×4 weight matrices in which the weights are constant along each column; the vertical (n, s) weights are all 4, while the horizontal (w, e) weights take the values 1, 2, and 4 by column]        (5.3)

The fourth set of weights has two "islands" with equal weights around them.

Figure 5.15: Output from the "column" weight set and the second set of boundary conditions.

w, n, e, s = [4×4 weight matrices containing two "islands" of weight 2 in a background of weight 4]        (5.4)

The fifth and final set of weights has one odd-shaped island.

w, n, e, s = [4×4 weight matrices with a single odd-shaped "island" of reduced weights (values 1 to 3) in a background of weight 4]        (5.5)

Note that in all of the weight sets the weight between any two nodes was the same in both directions. Although this is logical for this application of using the network for partial differential equations, it is not a limitation of the architecture and is not necessarily desirable for all applications. [CY88b] [CY88a] While testing the chip it was clear that the clock cycles could be run as fast as the test equipment could generate them (8ns for each phase). This result is logical since the only devices that each clock phase has to drive are one pass transistor and one inverter. I would predict that with faster test equipment the clock phases could be run as fast as 3ns or 4ns. Although each inverter should be able to switch in approximately 1ns, the clock lines are very long and therefore have a large amount of parasitic capacitance which will slow the signal down.

63

(Figure 5.16 shows four sets of boundary voltages, labeled Voltage Set 1 through Voltage Set 4, applied around the 4×4 matrix; the boundary values range from 1 V to 3 V.)

Figure 5.16: Voltage sets used to test the complete matrix.

Chapter 6

Conclusion and Future Work


6.1 Summary
An artificial Cellular Neural Network to solve Partial Differential Equations was first proposed in [GZ93]. The CNN was enhanced in [GZ94] to increase the accuracy of the solution without adding more cells. This thesis extended this previous work to two dimensions. It also introduced an architecture to control the weights of neural networks. This new architecture can be used to control other types of neural networks as well. It has the advantage of needing few pins, without any loss of control. Furthermore, it can be used in an arbitrarily large network without increasing the space needed for routing outside of each "weight." A CMOS VLSI test chip was designed to implement the CNN. Its purpose was to prove that both the neural network and the architecture would perform as the theory showed. The chip was fabricated by the MOSIS service and it was tested. Each of the test chip's two goals was met. The chip proved capable of calculating the solutions to the PDEs very rapidly. Since all of the nodes arrived at their individual solutions in parallel, a problem with a greater number of nodes could be solved in the same amount of time with a larger chip. This fact emphasizes the importance of the CNN design. In addition to the solution of elliptic boundary problems being ideally suited to CNNs in general, this design in particular showed how the control of a CNN could be accomplished without routing becoming a problem as the number of cells grows.

During the testing it became clear that the chip worked as well as expected. The architecture was able to control all of the weights as designed. The time between clock cycles was also as expected (as rapid as the test equipment could produce). Most of the solutions generated by the test chip were within one hundredth of a volt of what was predicted by both Spice simulations and the numerical solution of the PDEs. The quality of the results was very similar for twenty different combinations of boundary conditions and weights between the neurons. One of the disadvantages of the proposed architecture is that it required digital test equipment to test it. This limits the range over which the chip could be tested. It was due mostly to the need to refresh the dynamic shift registers used to set the weights. Approximately three seconds were available to make the changes to the inputs between clocks, but this is a very short time for a human in a test environment. Of course, with an analog test set which allows analog values to be preprogrammed, as a digital test set allows digital values to be preprogrammed, this limitation could have been avoided. If the chip were used in an embedded system the interface could also avoid this timing problem. In addition, the current required during shifts needs to be compensated for, especially if the circuit were to be used at higher voltage levels. Several possibilities exist for solving this problem, including using either more power pads and/or more bonding wires to the power pads. Extra bonding wires could easily be added, with the right equipment, after fabrication. Thus, with the right equipment, it would still be possible to test the circuit at higher voltages. Another possibility is to use on-chip capacitors, since the demand is a switching current. The architecture worked as well as it had been designed to. It allowed each of the weights to be set individually while only using six pins of the chip package. Although as few as two pins could have been used, having six pins made part of the weights' setup occur in parallel. In addition, the physical layout of the architecture, which allowed it to be used with an arbitrarily large network, worked properly without increasing the routing outside each cell. This feature is very important since routing is generally considered a major problem with neural networks.

66

6.2 Future Work


Although the digital architecture used to set up the weights worked well, other types of memory cells should be considered. For example, SRAM, DRAM, and static shift registers might be considered, with their respective advantages and disadvantages studied. More work is needed on the analog boundary conditions and the analog outputs from each cell. The output circuitry appears to slow down the circuit. One option to increase the speed with the same output circuitry is to add a buffer at each output pin. A better method might be to reconsider the output circuitry entirely. In the test chip the output was obtained from the nodes in the test cell using a parallelized raster technique. One possibility would be to put analog-to-digital converters on the chip to get the output from the chip as digital words. This would save pins in large networks. Of course it would be preferable to get all of the output at the same time in a fully parallel manner. Since there would not be enough pins even with a small matrix like that on our test chip, this is not directly feasible. However, other less obvious methods should be looked at, such as having each node drive a pixel so that the output would be visible to the human eye or to a specialized camera. Both the ADC and the pixel method would have the added benefit of not having to have each node drive the capacitance associated with the bonding pad, wire, and pins. The boundary conditions can use a lot of pins as well. This problem should also be looked into. It could be solved with a raster method, where analog memory cells are used to store the value needed at each boundary point. Digital-to-analog converters could also be used on chip if digital memory cells were preferable. All of the theory used throughout this thesis is for the steady-state case. The other case, where time and initial conditions are important, should also be studied. In order to do this a method of getting the preconditions into the cells is also needed. The same methods used for the output could be used, but instead of pixels, artificial retinas would be needed. Finally, the Reduced Instruction Set Computer (RISC) versus Complex Instruction Set Computer (CISC) problem should be studied with respect to this matrix. The basic idea behind this computer engineering problem is that if you simplify the architecture so that the CPU in question has to do more work, it may be faster, since there will be more chip area available which, if used properly, can make the total performance better. In the present case this would mean using simpler cells without variable weights, but since each cell could be made smaller the total result would be more accurate

solutions. The analog I/O problems mentioned above would have to be solved in a physically compact manner for this to be realized. Note that the architecture in this paper would remain useful for other types of neural networks. This particular simplification would not work for networks that were not designed for this particular problem (solving PDEs).

Appendix A

Complete Matrix Spice File


The following is the input file used for testing the buffer in a complete 4×4 matrix. Note that all of the directions have equal length (weights). Since Spice-3f4 does not allow resistance values to be given in subcircuit parameters, the files for different sets of variable weights are very long and are not included in this thesis. As mentioned previously, the circuit extractor does not handle the polysilicon resistors, so they are modeled by ideal parts, as is the main capacitor. At the end of the file are the Spice device model parameters used with both Spice-3f4 and AWB simulations.
* mat.spice XC1 10 11 XC2 20 21 XC3 30 31 XC4 40 41 XC5 XC6 XC7 XC8 50 60 70 80 51 61 71 81

21 31 41 1 61 71 81 1 101 111 121 1 141

4 3 2 1 11 21 31 41

5 11 21 31

51 61 71 81

7 7 7 7

8 8 8 8

9 9 9 9 9 9 9 9 9 9 9 9

CELL CELL CELL CELL CELL CELL CELL CELL CELL CELL CELL CELL

5 91 7 8 51 101 7 8 61 111 7 8 71 121 7 8 7 7 7 7 8 8 8 8

XC9 90 91 XCA 100 101 XCB 110 111 XCC 120 121 XCD 130 131

51 5 131 61 91 141 71 101 151 81 111 161 91 5

5 7 8 9 CELL

68

69
XCE 140 141 XCF 150 151 XCG 160 161 V1 1 V2 2 V3 3 V4 4 V5 5 V7 7 V8 8 V9 9 .OP 0 0 0 0 0 0 0 0 0 1.5 3 5 5 -3 8 -8 151 101 131 161 111 141 1 121 151 5 7 8 9 CELL 5 7 8 9 CELL 5 7 8 9 CELL

* inv/cell.sub
.SUBCKT CELL 40 41 51 52 53 54 7 8 9
XB 40 41 7 8 9 Buf
*EB 41 0 40 0 1
RR 51 40 2k
RU 52 40 2k
RL 53 40 2k
RD 54 40 2k
C1 40 0 2.233p
.ENDS

.SUBCKT Buf 119 105 111 1 108 * in 119 * out 105 * VGG 111 * VDD 1 * VSS 108 ** NODE: 0 = GND ** NODE: 1 = Vdd ** NODE: 2 = Error ** SPICE file created for circuit buffer ** Technology: scmos ** ** NODE: 0 ** NODE: 1 ** NODE: 2 RLUMP0 100 RLUMP1 100 M0 101 102 RLUMP2 103 RLUMP3 105 RLUMP4 100 = GND = Vdd = Error 101 176.5 102 176.5 1 1 pfet L=8.0U W=8.0U 104 140.5 106 576.0 107 176.5

70
RLUMP5 108 109 2168.5 M1 104 106 107 109 nfet L=8.0U W=8.0U RLUMP6 108 110 2168.5 RLUMP7 111 112 160.5 RLUMP8 103 113 140.5 RLUMP9 108 114 2168.5 M2 110 112 113 114 nfet L=20.0U W=3.0U RLUMP10 115 116 381.5 RLUMP11 100 117 176.5 M3 116 117 1 1 pfet L=8.0U W=8.0U RLUMP12 103 118 140.5 RLUMP13 119 120 31.5 RLUMP14 115 121 381.5 RLUMP15 108 122 2168.5 M4 118 120 121 122 nfet L=8.0U W=8.0U RLUMP16 105 123 576.0 RLUMP17 115 124 381.5 M5 123 124 1 1 pfet L=2.0U W=30.0U RLUMP18 108 125 2168.5 RLUMP19 111 126 160.5 RLUMP20 105 127 576.0 RLUMP21 108 128 2168.5 M6 125 126 127 128 nfet L=2.0U W=30.0U ** NODE: 0 = GND! C0 108 0 36F ** NODE: 108 = Vss ** NODE: 119 = in C1 115 0 42F ** NODE: 115 = 7_34_18# C2 111 0 13F ** NODE: 111 = Vgg C3 103 0 42F ** NODE: 103 = 7_114_36# C4 105 0 90F ** NODE: 105 = out C5 1 0 78F ** NODE: 1 = Vdd! C6 100 0 47F ** NODE: 100 = 7_22_40# *VDD 1 0 8 *GG 111 0 -3 *VSS 108 0 -8 .ENDS .MODEL nfet NMOS LEVEL=2 PHI=0.600000 TOX=4.1000E-08 XJ=0.200000U TPG=1 + VTO=0.8630 DELTA=6.6420E+00 LD=2.4780E-07 KP=4.7401E-05 + UO=562.8 UEXP=1.5270E-01 UCRIT=7.7040E+04 RSH=2.4000E+01 + GAMMA=0.4374 NSUB=4.0880E+15 NFS=1.980E+11 NEFF=1.0000E+00 + VMAX=5.8030E+04 LAMBDA=3.1840E-02 CGDO=3.1306E-10

71
+ CGSO=3.1306E-10 CGBO=4.3449E-10 CJ=9.5711E-05 MJ=0.7817 + CJSW=5.0429E-10 MJSW=0.346510 PB=0.800000 * Weff = Wdrawn - Delta_W * The suggested Delta_W is -5.4940E-07 .MODEL pfet PMOS LEVEL=2 PHI=0.600000 TOX=4.1000E-08 XJ=0.200000U TPG=-1 + VTO=-0.9629 DELTA=5.7540E+00 LD=3.0910E-07 KP=1.7106E-05 + UO=203.1 UEXP=2.1320E-01 UCRIT=8.0280E+04 RSH=5.6770E+01 + GAMMA=0.6180 NSUB=8.1610E+15 NFS=3.270E+11 NEFF=1.5000E+00 + VMAX=9.9990E+05 LAMBDA=4.5120E-02 CGDO=3.9050E-10 + CGSO=3.9050E-10 CGBO=4.1280E-10 CJ=3.2437E-04 MJ=0.5637 + CJSW=3.3912E-10 MJSW=0.275876 PB=0.800000 * Weff = Wdrawn - Delta_W * The suggested Delta_W is -4.1580E-07

.END

Appendix B

Shift Register Simulations


Figure B.1 is the 1-bit shift register used with the esim simulations. Figure B.2 on the following page is a close up of one bit of a shift register as fabricated.
(Figure B.1 terminals: in, out, clk, clkb, Vdd, GND)

Figure B.1: Shift Register Cell used with esim.

72

73

Figure B.2: Picture of the Shift Register Cell.

The following is the esim input file for the 1-bit shift register.
V V V w G clk 10 clkb 01 in 00001111001100111100110011110000 in out

The output generated by esim from the previous file is given next. Note that the "clock" is two characters wide so everything is repeated.
Funny looking header line in .sim file. using UCB format 6 transistors, 9 nodes (0 pulled up) >00001111001100111100110011110000:in >X0000111100110011110011001111000:out

Figure B.3 on the next page is the AWB circuit that was used to simulate the amount of signal that would be lost by using an NMOS pass transistor as our tap. It was also used to show that the shift register could drive the tap sufficiently. Figure B.4 on page 76 is the "oscilloscope" used to show the output of the simulation.

75

Figure B.3: AWB circuit of a 1-bit shift register driving a tap.

76

Figure B.4: AWB \oscilloscope" of a 1-bit shift register driving a tap.

Appendix C

Glue Logic Simulations


C.1 Shift Register
The 8-bit shift register in Figure C.1 is simply eight copies of Figure B.1 on page 72. Figure C.2 is a photograph of the same part after fabrication. See Figure B.2 on page 73 for a close up of one bit of the shift register.
(Figure C.1 terminals: in, GND, clk, clkb, Vdd, output bits 1–8)

Figure C.1: An 8-bit shift register used to set control lines for the variable resistors.

Figure C.2: Picture of the 8-bit shift register.

The following is the esim input file for the shift register.

77

78
V V V w G clk clkb in in 8 64 10 01 0011110000001111 7 6 5 4 3 2 1

The output generated by esim from the previous file is given next. The label "1" is for the last bit, which is the farthest from the input to the register and also the first bit to be entered. Note that the "clock" is two characters wide so everything is repeated, and that the "X"s are expected until the signal has had a chance to propagate.
Funny looking header line in .sim file. using UCB format 48 transistors, 37 nodes (0 pulled up) >0011110000001111001111000000111100111100000011110011110000001111:in >X001111000000111100111100000011110011110000001111001111000000111:8 >XXX0011110000001111001111000000111100111100000011110011110000001:7 >XXXXX00111100000011110011110000001111001111000000111100111100000:6 >XXXXXXX001111000000111100111100000011110011110000001111001111000:5 >XXXXXXXXX0011110000001111001111000000111100111100000011110011110:4 >XXXXXXXXXXX00111100000011110011110000001111001111000000111100111:3 >XXXXXXXXXXXXX001111000000111100111100000011110011110000001111001:2 >XXXXXXXXXXXXXXX0011110000001111001111000000111100111100000011110:1

C.2 Decoder
The 3-to-8 decoder is shown in Figure C.3 on the next page. Note that this circuit only has four output lines. It is actually a 3-to-8 decoder with half of the output lines removed, but logically it is a 2-to-4 decoder with an enable line. Figure C.4 on the following page is an image of the same part after fabrication. The following is the esim input file for our decoder.
w V V V G S S0 S1 OUT0 OUT1 OUT2 OUT3 S 00001111 S0 01 S1 0011

The output generated by esim from the previous file is given next.

79

(Figure C.3 terminals: S, S0, S0B, S1, S1B, OUT0–OUT3, Vdd, GND)

Figure C.3: A 3-to-8 Decoder used to select an output column.

Figure C.4: Picture of the 3-to-8 Decoder.

80
Funny looking header line in .sim file. using UCB format 36 transistors, 23 nodes (0 pulled up) >00001111:S >01010101:S0 >00110011:S1 >00001000:OUT0 >00000100:OUT1 >00000010:OUT2 >00000001:OUT3

Appendix D

Pin Out
Figure D.1 on the next page shows the layout of the bonding pads with their pin numbers, while Table D.1 on page 83 lists the logical names associated with each pin of the final chip. Note that the table is in the form that is used in the packaging, with both the first and last pin on the top row.

81

82

(Pin numbers 1–40 arranged around the perimeter of the tiny chip frame.)

Figure D.1: The layout of the pins on the MOSIS Tiny Chip frame.

83

Pin
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20

N. Input at Cell 11 N. Input at Cell 12 W. Input at Cell 11 N. Input at Cell 13 Output Enable N. Input at Cell 14 Output Select 0 E. Input at Cell 14 TB In TR Shift In Output Select 1 TR Out Top Shift In TR In Shift Clock Row 1 Out W. Input at Cell 21 E. Input at Cell 24 and N. Input at TC V Row 2 Out W. Input at Cell 31 V Bottom Shift In E. Input at Cell 34 and W. Input at TC W. Input at Cell 41 Row 3 Out Shift Clock TC N. and S. Shift In TB Output TC E. and W. Shift In Matrix Clock Test Cell Out S. Input at Cell 41 V S. Input at Cell 42 E. Input at Cell 44 and TC S. Input at Cell 43 Row 4 Out Matrix Clock S. Input at Cell 44 and Test Cell
SS DD GG

Logic Name

Logic Name

Pin
40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21

Table D.1: The pin out used for the chip. N, S, E, and W stand for North or Northern, South or Southern, East or Eastern, and West or Western. TC stands for the test cell, TB for the test buffer, and TR for the test resistor.

Appendix E

MOSIS Parameters
The following is copied from [MOS93], but is only the relevant part of the file.
SCNA20_ORBIT_SPECS] ORBIT ELECTRICAL PARAMETERS 2UM, DOUBLE METAL, DOUBLE POLY, N-WELL CMOS POLY 1 AND POLY 2 ACTIVE GATES POLY 1 / 2 CAPACITORS DEPLETION IMPLANT ADJUST FOR BURIED CHANNEL POTENTIAL

A.1 A.1.1 A.1.2 A.1.3 A.1.4 A.1.5 A.1.6 A.1.7 A.1.8 A.2 A.2.1 A.2.2 A.2.3

Oxide Thicknesses (Angstroms) Poly 1 gate oxide Poly 2 gate oxide Field oxide (Poly 1 & 2 to Sub) Metal 1 to Poly 1 & 2 Metal 1 to Sub Metal 1 to N+/P+ Diff Metal 2 to Metal 1 Poly 1 to Poly 2 Conductors Poly 1 Poly 2 Metal 1

MIN --370 470 5500 8000 13500 8500 6000 650

TYP --400 500 6000 8500 14500 9000 6500 750

MAX --430 530 6500 9000 15500 9500 7000 850

3700 3700 5500

4000 4000 6000

4300 4300 6500

84

85
A.2.4 Metal 1 10500 11500 12500

B. B.1 B.1.1 B.1.2 B.1.3 B.1.4 B.1.5 B.1.6 B.1.7 B.2 B.2.1 B.2.2 B.2.3 B.2.4 B.2.5 B.2.6 B.2.7 B.3 B.3.1 B.3.2 B.3.3 B.3.4 B.3.5 B.3.6 B.3.7 B.4 B.4.1 B.4.2 B.4.3 B.4.4 B.4.5 B.4.6 B.4.7 B.5 B.5.1 B.5.2

TRANSISTOR SPECIFICATIONS P Channel Poly 1 Threshold (volts) Gamma (volts **.) K'=uCox/2 (uA/v**2) (Vds=0.1V, Vgs=2-3V) Punchthrough for min. length channel (volts) Subthreshold slope (volts**-3/decade) Delta width = effective-mask (microns) Delta length = effective-mask (microns) P Channel Poly 2 Threshold (volts) Gamma (volts**.5 K'=uCox/2 (uA/v**2) Punchthrough for min. length channel (volts) Subthreshold slope (volts**-3/decade) Delta width = effective-mask (microns) Delta length = effective-mask (microns) N Channel Poly 1 Threshold (volts) Gamma (volts**.5) K'=ucox/2 (uA/v**2) (Vds=0.1V, Vgs=2-3V) Subthreshold slope (volts**-3/decade) Punchthrough for min. length channel (volts) Delta width = effective-mask (microns) Delta length = effective-mask (microns) N Channel Poly 2 Threshold (volts) Gamma (volts**.5) K'=uCox/2 (uA/v**2) Subthreshold slope (volts**-3/decade) Punchthrough for min. length channel (volts) Delta width = effective-drawn (microns) Delta length = effective-drawn (microns) CCD Channel Potential (volts) Poly 1 Poly 2 VG=0 VG=0 3.0 3.0 5.0 5.0 8.0 8.0 0.7 0.21 18 10 -0.8 1.1 0.3 20 14 -0.4 1.4 0.4 22 16 -0.1 0.5 .15 20 90 10 -0.7 0.75 .25 23 100 14 -0.3 1.0 .35 26 110 16 0 -1.5 0.5 5.0 -16 -1.15 0.6 6.0 -14 -0.8 0.8 7.0 -10 -1.0 .45 6.0 -16 90 -0.7 -0.75 .55 7.5 -14 100 -0.4 -0.5 .65 8.5 -10 110 -0.1

-0.8

-0.5

-0.2

86
B.6 B.6.1 B.6.2 B.6.3 B.6.4 B.6.5 B.6.6 B.6.7 B.6.8 B.6.9 B.6.10 NPN Transistor in the N-well Beta= 80 to 200 at IB = 1 uA BVEBO = 10 V BVCEO > -10 V BVCES > 10 V BVCBO > -60 V P-base Xj 0.45 to 0.50 micron N+emitter Xj 0.3 micron Rcollector 1.0 +/- 0.2 Kohm/sq P-base resistance 1.2 +/- 0.2 Kohm/sq Early Voltage > 30 V

C. C.1 C.2 C.3 C.4.1 C.4.2 C.5 C.6

SHEET RESISTANCES (OHMS PER SQUARE) P+ Active N+ Active N Well (with field implant) Poly1 Poly2 Metal1 Metal2

MIN --40 20 2000 15 18 .050 .030

TYP --57 28 2500 21 25 .070 .040

MAX --80 40 3000 30 30 .090 .050

D. D.1 D.2 D.3.1 D.3.2 D.4

CONTACT RESISTANCE (OHMS) Metal1 Metal1 Metal1 Metal1 Metal1 to to to to to P+ Active N+ Active Poly1 Poly2 Metal2 (single contact 2 by 2um) 35 20 20 20 0.4 75 50 50 50 0.7

E. E.1.1 E.1.2 E.2 E.3 E.4.1 E.4.2 E.5 E.6 E.7.1 E.7.2 E.8

FIELD INVERSION AND BREAKDOWN VOLTAGES (VOLTS) N Channel Poly1 field inversion N Channel Poly2 field inversion N Channel Metal1 field inversion N Channel Metal2 field inversion Channel Poly1 field inversion P Channel Poly2 field inversion P Channel Metal1 field inversion P Channel Metal2 field inversion N Diffusion to substrate junction breakdown P Diffusion to substrate junction breakdown N-well to P- sub junction breakdown 10 10 10 14 14 14 -14 -14 -14 14 15 50 -10 -10 -10 16 18 90

87

INTERLAYER CAPACITANCES (PLATE: 10 ** -5 PF / MICRON ** 2 FRINGE: 10 ** -5 PF / MICRON)

GATE OXIDE PLATE POLY1 GATE OXIDE PLATE POLY2 FIELD POLY1 TO SUBS FRINGE FIELD POLY2 TO SUBS FRINGE POLY1 TO POLY2 OVER ACTIVE POLY1 TO POLY2 OVER FIELD METAL1 TO ACTIVE PLATE METAL1 TO ACTIVE FRINGE METAL1 TO SUBS PLATE METAL1 TO POLY PLATE METAL1 TO POLY FRINGE METAL2 TO ACTIVE PLATE METAL2 TO ACTIVE FRINGE METAL2 TO SUBS PLATE METAL2 TO SUBS FRINGE METAL2 TO POLY PLATE METAL2 TO POLY FRINGE METAL2 TO METAL1 PLATE METAL2 TO METAL1 FRINGE

Capacitance MIN MAX ----78 90 64 70

Equiv. Thickness MIN MAX ----370 Ang 430 Ang 470 Ang 530 Ang

43 43 3.6 2.2 3.7 1.9 1.5 1.9 4.6

55 55 4.0 2.5 4.4 2.4 1.65 2.4 5.6

650 Ang 650 Ang 8500 Ang

850 Ang 850 Ang 9500 Ang

13500 Ang 15500 Ang 8000 Ang 9000 Ang 14500 Ang 17500 Ang 19500 Ang 22000 Ang 14500 Ang 17500 Ang 6000 Ang 7500 Ang

88
N32A.PRM]

MOSIS PARAMETRIC TEST RESULTS ----------------------------RUN: N32A / ALINE TECHNOLOGY: SCNA I. VENDOR: ORBIT FEATURE SIZE: 2.0um

INTRODUCTION. This report contains the lot average results obtained by MOSIS from measurements of the MOSIS test structures placed on this fabrication run. The SPICE parameters obtained from similiar measurements on a representative wafer from this run are also attached.

COMMENTS: II. TRANSISTOR PARAMETERS: W/L N-CHANNEL P-CHANNEL UNITS -----------------------------------------------------------------------------Vth (Vds=.05V) 3.0/2.00.934 0.996 V Vth (Vds=.05V) Idss (Vgds=5V) Vpt (Id=1.0uA) Vth (Vds=.05V) Vbkd (Ij=1.0uA) Gamma (2.5v,5.0v) 18.0/2.00.839 0.971 V 2698.0 -1383.0 uA *************** *************** V 50.0/50.00.865 15.9 0.174 0.961 -15.3 0.692 V V V^0.5

Kp (Uo*Cox/2) Delta Length Delta Width (Effective=Drawn-Delta) COMMENTS:

26.4

-9.2

uA/V^2

0.424 0.253 um *************** *************** um

III. FIELD OXIDE TRANSISTOR SOURCE/DRAIN SOURCE/DRAIN PARAMETERS: GATE N + ACTIVE P + ACTIVE UNITS ----------------------------------------------------------------------------Vth (Vbs=0,I=1uA) Poly 16.1 -13.1 V Vth (Vbs=0,I=1uA) Metal1 27.6 -36.3 V Vth (Vbs=0,I=1uA) Metal2 49.1 -62.8 V COMMENTS: IV. PROCESS N P N P N METAL METAL

89
PARAMETERS: POLY POLY DIFF DIFF WELL 1 2 UNITS -----------------------------------------------------------------------------Sheet Resistance 22.8 22.0 27.2 61.6 2459.0 0.047 0.028 Ohm/sq Width Variation 0.225 0.223 0.490 0.317 ---0.088 0.322 um (Measured - Drawn) Contact Resist. 12.55 14.58 30.84 41.65 ------0.033 Ohms (Metal1 to Layer) Gate Oxide Thickness: COMMENTS: V. CAPACITANCE N P METAL METAL PARAMETERS: POLY DIFF DIFF 1 2 UNITS -----------------------------------------------------------------------------Area Cap 0.058 0.119 0.346 0.026 0.016 fF/um^2 (Layer to subs) Area Cap ---------0.039 0.020 fF/um^2 (Layer to Poly) Area Cap ------------0.034 fF/um^2 (Layer to Metal1) Fringe Cap ---0.527 0.263 ------fF/um (Layer to subs) COMMENTS: VI. CIRCUIT PARAMETERS: -----------------------------------------------------------------------------Vinv, K = 1 0.00 V Vinv, K = 1.5 0.00 V Vlow, Vhigh, Vinv, Gain, K K K K = = = = 2.0 2.0 2.0 2.0 0.00 V 5.00 V 2.48 V -11.18 37.37 MHz ( 31 stages @ 5.0V)

----

----

423.

409.

----

----

----

Angst.

Ring Oscillator Frequency COMMENTS:

N32A

SPICE LEVEL 2 PARAMETERS

.MODEL CMOSN NMOS LEVEL=2 PHI=0.600000 TOX=4.1000E-08 XJ=0.200000U TPG=1 + VTO=0.8630 DELTA=6.6420E+00 LD=2.4780E-07 KP=4.7401E-05 + UO=562.8 UEXP=1.5270E-01 UCRIT=7.7040E+04 RSH=2.4000E+01 + GAMMA=0.4374 NSUB=4.0880E+15 NFS=1.980E+11 NEFF=1.0000E+00 + VMAX=5.8030E+04 LAMBDA=3.1840E-02 CGDO=3.1306E-10

90
+ CGSO=3.1306E-10 CGBO=4.3449E-10 CJ=9.5711E-05 MJ=0.7817 + CJSW=5.0429E-10 MJSW=0.346510 PB=0.800000 * Weff = Wdrawn - Delta_W * The suggested Delta_W is -5.4940E-07 .MODEL CMOSP PMOS LEVEL=2 PHI=0.600000 TOX=4.1000E-08 XJ=0.200000U TPG=-1 + VTO=-0.9629 DELTA=5.7540E+00 LD=3.0910E-07 KP=1.7106E-05 + UO=203.1 UEXP=2.1320E-01 UCRIT=8.0280E+04 RSH=5.6770E+01 + GAMMA=0.6180 NSUB=8.1610E+15 NFS=3.270E+11 NEFF=1.5000E+00 + VMAX=9.9990E+05 LAMBDA=4.5120E-02 CGDO=3.9050E-10 + CGSO=3.9050E-10 CGBO=4.1280E-10 CJ=3.2437E-04 MJ=0.5637 + CJSW=3.3912E-10 MJSW=0.275876 PB=0.800000 * Weff = Wdrawn - Delta_W * The suggested Delta_W is -4.1580E-07

Appendix F

Sample LV500 File
The following is a complete LV500 program used to perform the test described in the variable case example throughout this thesis. It was extracted from the LV500 as an "msa" file. Each set of weights requires its own file, but only one is given here to save space. Note that the chip was shown to run much faster than this file indicates, but when reading the solution with a scope this speed is more practical.
v64 setup version 0 1 1 /* config section */ resolution = 20ns dev_supply_voltage = 5.00v dev_supply_current = 0.20a term_supply_voltage = 3.00v force_high_family_v1 = 5.00v force_low_family_v1 = 0.50v compare_family_v1 = 1.40v force_high_family_v2 = 4.50v force_low_family_v2 = 0.50v compare_family_v2 = 2.50v sector_logic_family = { v1, v1, v1, v1, , , , }

91

92

/* group section */ group "Main_Input" { radix = bin force_fmt = dnrz_l compare_fmt = edge_t phase = 0a signal "Top_Shift_In" { dut = "U_7" sector = 0h0 channel = 0hc } signal "Bottom_Shift_In" { dut = "U_12" sector = 0h1 channel = 0h6 } } group "Output" { radix = bin force_fmt = dnrz_l compare_fmt = edge_t phase = 0c signal "Output_Enable" { dut = "U_3" sector = 0h0 channel = 0h4 } signal "Output_Select_0" { dut = "U_4" sector = 0h0 channel = 0h6 } signal "Output_Select_1" { dut = "U_6" sector = 0h0 channel = 0ha } } group "TRIGGER" { radix = bin force_fmt = r0 compare_fmt = edge_t phase = 0d signal "TRIGGER" { dut = "TRIG"

93
sector = 0h2 channel = 0hf } } group "Shift_Clock" { radix = bin force_fmt = r0 compare_fmt = edge_t phase = 0a signal "Shift_Clock" { dut = "U_14" sector = 0h1 channel = 0ha } indep_signal "Shift_Clock_Bar" { dut = "U_8" sector = 0h0 channel = 0he phase = 0b force_fmt = r0 compare_fmt = edge_t } } group "Matrix_Clock" { radix = bin force_fmt = r0 compare_fmt = edge_t phase = 0a signal "Matrix_Clock" { dut = "U_16" sector = 0h1 channel = 0he } indep_signal "Matrix_Clock_Bar" { dut = "U_20" sector = 0h2 channel = 0h6 phase = 0b force_fmt = r0 compare_fmt = edge_t } } group "TC_Input" { radix = bin force_fmt = dnrz_l compare_fmt = edge_t phase = 0a

94
signal "TC_NS_Shift_In" { dut = "U_27" sector = 0h1 channel = 0hb } signal "TC_EW_Shift_In" { dut = "U_26" sector = 0h1 channel = 0hd } } group "TR_Input" { radix = bin force_fmt = dnrz_l compare_fmt = edge_t phase = 0a signal "TR_Shift_In" { dut = "U_36" sector = 0h0 channel = 0h9 } } /* template section */ template "template_0" { cycle = 1000ns phase 0a {delay = 0ns width = phase 0b {delay = 300ns width phase 0c {delay = 0ns width = phase 0d {delay = 800ns width group "Main_Input" { function = force } group "Output" { function = force } group "TRIGGER" { function = force } group "Shift_Clock" { function = force signal "Shift_Clock_Bar" { function = force } } group "Matrix_Clock" { function = force signal "Matrix_Clock_Bar" {

200ns } = 200ns } 940ns } = 80ns }

95
function = force } } group "TC_Input" { function = force } group "TR_Input" { function = force } } template "TRIGGER" { cycle = 1000ns phase 0a {delay = 0ns width = phase 0b {delay = 300ns width phase 0c {delay = 0ns width = phase 0d {delay = 800ns width group "Main_Input" { function = mask } group "Output" { function = mask } group "TRIGGER" { function = force } group "Shift_Clock" { function = mask signal "Shift_Clock_Bar" { function = mask } } group "Matrix_Clock" { function = mask signal "Matrix_Clock_Bar" { function = mask } } group "TC_Input" { function = mask } group "TR_Input" { function = mask } } /* schmoo define section */ schmoo_var_x = not_selected schmoo_var_y = not_selected

200ns } = 200ns } 80ns } = 100ns }

96
/* macro section */ macro shift_0() { * "This macro will shift "template_0" 00 000 0 11 "template_0" 00 000 0 11 "template_0" 00 000 0 11 "template_0" 00 000 0 11 "template_0" 00 000 0 11 "template_0" 00 000 0 11 "template_0" 00 000 0 11 "template_0" 00 000 0 11 } macro shift_1() { * "this macro will shift "template_0" 11 000 0 11 "template_0" 11 000 0 11 "template_0" 11 000 0 11 "template_0" 11 000 0 11 "template_0" 11 000 0 11 "template_0" 11 000 0 11 "template_0" 11 000 0 11 "template_0" 11 000 0 11 } macro col_to_4() { * "set a whole column shift_1 "template_0" 00 000 0 shift_0 "template_0" 00 000 0 shift_0 "template_0" 00 000 0 shift_0 "template_0" 00 000 0 shift_0 "template_0" 00 000 0 }

in 00 00 00 00 00 00 00 00

all 0" 00 0 00 0 00 0 00 0 00 0 00 0 00 0 00 0

in 00 00 00 00 00 00 00 00

all 1" 00 0 00 0 00 0 00 0 00 0 00 0 00 0 00 0

to 4" 00 11 00 0 00 11 00 0 00 11 00 0 00 11 00 0 00 11 00 0

macro mat_clk() { * "Clock the matrix" "template_0" 00 000 0 00 11 00 0 } /* define format info */

define_format { }

97
/* pattern section */ pattern * "This is a "template_0" "template_0" "template_0" "template_0" "template_0" "template_0" "template_0" "template_0" "mat_clk" "template_0" "template_0" "template_0" "template_0" "template_0" "template_0" "template_0" "template_0" "mat_clk" "template_0" "template_0" "template_0" "template_0" "template_0" "template_0" "template_0" "template_0" "mat_clk" "template_0" "template_0" "template_0" "template_0" "template_0" "template_0" "template_0" "template_0" "mat_clk" "shift_0" "mat_clk" * "start col "template_0" "template_0" "template_0" "template_0" "template_0" "template_0" "template_0"

tesst of 00 000 0 00 000 0 00 000 0 00 000 0 00 000 0 10 000 0 00 000 0 10 000 0 00 10 00 10 10 00 10 00 10 00 10 00 00 01 00 01 01 01 01 01 01 00 01 00 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

the whole matrix equadist" 11 00 00 0 11 00 00 0 11 00 00 0 11 00 00 0 11 00 00 0 11 00 00 0 11 00 00 0 11 00 00 0 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

3" 00 00 00 00 00 00 10

000 000 000 000 000 000 000

0 0 0 0 0 0 0

11 11 11 11 11 11 11

00 00 00 00 00 00 00

00 00 00 00 00 00 00

0 0 0 0 0 0 0

98
"template_0" "mat_clk" "template_0" "template_0" "template_0" "template_0" "template_0" "template_0" "template_0" "template_0" "mat_clk" "template_0" "template_0" "template_0" "template_0" "template_0" "template_0" "template_0" "template_0" "mat_clk" "template_0" "template_0" "template_0" "template_0" "template_0" "template_0" "template_0" "template_0" "mat_clk" "shift_0" "mat_clk" * "start col "template_0" "template_0" "template_0" "template_0" "template_0" "template_0" "template_0" "template_0" "mat_clk" "template_0" "template_0" "template_0" "template_0" "template_0" "template_0" "template_0" "template_0" "mat_clk" 10 000 0 11 00 00 0 00 00 10 10 10 10 00 00 11 10 01 00 00 00 01 01 00 01 00 01 01 01 00 00 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

2" 00 00 10 10 10 10 10 10 11 10 01 00 00 00 01 01

000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11

00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

99
"template_0" 00 000 0 "template_0" 01 000 0 "template_0" 00 000 0 "template_0" 01 000 0 "template_0" 01 000 0 "template_0" 01 000 0 "template_0" 00 000 0 "template_0" 00 000 0 "mat_clk" "shift_0" "mat_clk" "shift_0" "mat_clk" * "start col 1" "template_0" 11 000 0 "template_0" 10 000 0 "template_0" 11 000 0 "template_0" 11 000 0 "template_0" 11 000 0 "template_0" 10 000 0 "template_0" 11 000 0 "template_0" 11 000 0 "mat_clk" "template_0" 00 000 0 "template_0" 01 000 0 "template_0" 00 000 0 "template_0" 00 000 0 "template_0" 00 000 0 "template_0" 01 000 0 "template_0" 00 000 0 "template_0" 00 000 0 "mat_clk" "shift_0" "mat_clk" "shift_0" "mat_clk" "shift_0" "mat_clk" WAIT "100000" "TRIGGER" 00 000 1 00 "template_0" 00 100 0 WAIT "400000" "template_0" 00 110 0 WAIT "400000" "template_0" 00 101 0 WAIT "400000" "template_0" 00 111 0 WAIT "400000" "template_0" 00 000 0 * "END OF PROGRAM" 11 11 11 11 11 11 11 11 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0 0 0 0 0 0 0 0

11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11

00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

00 00 0 00 00 00 0 00 00 00 0 00 00 00 0 00 00 00 0 00 00 00 0

Bibliography
[AH87a] Phillip E. Allen and Douglas R. Holberg. CMOS Analog Circuit Design, section 5.2. In Electrical and Computer Engineering [AH87c], 1987.

[AH87b] Phillip E. Allen and Douglas R. Holberg. CMOS Analog Circuit Design, section 8.3. In Electrical and Computer Engineering [AH87c], 1987.

[AH87c] Phillip E. Allen and Douglas R. Holberg. CMOS Analog Circuit Design. Electrical and Computer Engineering. Holt, Rinehart and Winston, Inc., New York, 1987.

[CY88a] L. O. Chua and L. Yang. Cellular neural networks: Applications. IEEE Transactions on Circuits and Systems, 35(10):1273–1290, October 1988.

[CY88b] L. O. Chua and L. Yang. Cellular neural networks: Theory. IEEE Transactions on Circuits and Systems, 35(10):1257–1272, October 1988.

[GAS90a] Randall L. Geiger, Phillip E. Allen, and Noel R. Strader. VLSI Design Techniques for Analog and Digital Circuits, section 5.2. In Electrical Engineering [GAS90c], 1990.

[GAS90b] Randall L. Geiger, Phillip E. Allen, and Noel R. Strader. VLSI Design Techniques for Analog and Digital Circuits, page 213. In Electrical Engineering [GAS90c], 1990.

[GAS90c] Randall L. Geiger, Phillip E. Allen, and Noel R. Strader. VLSI Design Techniques for Analog and Digital Circuits. Electrical Engineering. McGraw-Hill Publishing Company, New York, 1990.

[GZ93] D. Gobovic and M. E. Zaghloul. Design of locally connected CMOS neural cells to solve the steady-state heat flow problem. In Proceedings of the IEEE 36th Midwest Symposium on Circuits and Systems, Detroit, August 1993. The Institute of Electrical and Electronics Engineers, Inc.

[GZ94] D. Gobovic and M. E. Zaghloul. Analog cellular neural network with application to partial differential equations with variable mesh-size. In Proceedings of the IEEE International Symposium on Circuits and Systems, London, May 1994. The Institute of Electrical and Electronics Engineers, Inc.

[HB84] Kai Hwang and Faye A. Briggs. Computer Architecture and Parallel Processing. McGraw-Hill Publishing Company, New York, 1984.

[Ism94] M. Ismail. Analog VLSI: Signal and Information Processing, chapter 16. McGraw-Hill, Inc., New York, 1994.

[Mit80] Andrew Ronald Mitchell. The Finite Difference Method in Partial Differential Equations, chapter 3. John Wiley & Sons Ltd., 1980.

[MOS88] The Information Sciences Institute of the University of Southern California (USC/ISI), Marina del Rey, California. MOSIS Users Manual, 1988.

[MOS93] Orbit electrical parameters. FTPed from ftp.mosis.edu in the file named scna20-orbittech.inf, which can be found in the directory /pub/mosis/vendors/orbit-scna20, 22 July 1993.

[MP43] W. McCulloch and W. Pitts. A logical calculus of the ideas immanent in nervous activity. Bulletin of Mathematical Biophysics, 35(5):115–133, 1943.

[OR70a] J. M. Ortega and W. C. Rheinboldt. Iterative Solution of Nonlinear Equations in Several Variables, section 4.4. In Computer Science and Applied Mathematics [OR70b], 1970.

[OR70b] J. M. Ortega and W. C. Rheinboldt. Iterative Solution of Nonlinear Equations in Several Variables. Computer Science and Applied Mathematics. Academic Press, New York, 1970.

[OR70c] J. M. Ortega and W. C. Rheinboldt. Iterative Solution of Nonlinear Equations in Several Variables, section 5.4. In Computer Science and Applied Mathematics [OR70b], 1970.

[Ric83] John R. Rice. Numerical Methods, Software and Analysis: IMSL Reference Edition, section 10.1. McGraw-Hill Book Company, New York, 1983.

[TH86] D. W. Tank and J. J. Hopfield. Simple `neural' optimization networks: An A/D converter, signal decision circuit, and a linear programming circuit. IEEE Transactions on Circuits and Systems, CAS-33(5):533–541, May 1986.

[Zur92] J. M. Zurada. Introduction to Artificial Neural Systems. West Publishing Company, 1992.
