June 1992, Lehigh University A Thesis submitted to The Faculty of The School of Engineering and Applied Science of the George Washington University in partial satisfaction of the requirements for the degree of Master of Science January 31, 1994 Thesis directed by Mona Elwakkad Zaghloul Professor of Engineering and Applied Science
Abstract
This thesis presents a locally connected neural network for solving a class of partial differential equations. The network is based on previous theoretical work by others, but is extended to the two-dimensional case. The network is designed and simulated with SPICE. Each neural cell is designed using active and passive components. An architecture is described to control the weights between the neurons. This architecture is usable by other neural networks, but is demonstrated with this cellular neural network. The major benefit of the architecture is that it does not require additional space outside of the cell for routing the control lines, no matter how many cells are used. The CMOS VLSI implementation was fabricated and measured. A sixteen cell network is also simulated and measured to solve the steady-state heat flow problem under several different sets of conditions. The results of this network are compared to the numerical solution of the partial differential equations.
Acknowledgments
At this point, I would like to acknowledge my deep appreciation to the many people who helped me throughout this work. First and foremost I would like to express my thanks to my advisor, Professor Mona E. Zaghloul, for her invaluable support, encouragement and guidance. Without her help, this work would not be possible. I would also like to thank Dr. Desa Gobovic for getting me started, both for her personal help and for the previous papers on which this work is based. I also appreciate all of the members of my committee for any criticism, suggestions, or comments. A special thanks to Mr. Norris C. Hekimian for the fellowship which allowed me to continue this research. I am grateful to my colleague Michael Salter for convincing me to follow my instincts and to Charles Hsu, who was always willing to give advice. I would also like to thank Mr. White and his staff at SEASCF (especially Sheryl and Robert) for their assistance with computing matters. Mr. Petrella and his staff at the EE&CS Lab were also of great assistance with both equipment and analog testing advice. Last, but not least, I would like to thank both my parents and my sister for their support, both financially and emotionally.
Contents
1 Introduction
  1.1 Background
  1.2 Statement of Problem
  1.3 Contribution of Thesis
  2.1 Equidistant Case
  2.2 Increased Accuracy
  2.3 An Illustration
    2.3.1 Equidistant Case
    2.3.2 Non-Equidistant Case
  3.1 Variable Resistors
    3.1.1 Generic
    3.1.2 Implementation
  3.2 Boundary Conditions and Outputs
4 CMOS Implementation
  4.1 Design Circuitry
    4.1.1 Variable Resistor
    4.1.2 Buffer
    4.1.3 Capacitor
    4.1.4 Glue Logic
  4.2 Simulation Results
    4.2.1 Variable Resistor
    4.2.2 Buffer
    4.2.3 Cell
    4.2.4 Matrix
    4.2.5 Glue Logic
A Complete Matrix Spice File
B Shift Register Simulations
C Glue Logic Simulations
D Pin Out
List of Figures
2.1 Neighboring mesh points.
2.2 Locally connected neurons with weights.
2.3 Notation for the Non-Equidistant Case.
2.4 Laplace's equation model for our heated plate example.
2.5 Discretization of a plate domain by a rectangular grid of points.
2.6 4x4 Cellular Neural Network for our heated plate example.
2.7 An example of a cell circuit.
3.1 A general cell with many weights.
3.2 A variable resistor.
3.3 A local memory cell consisting of shift registers.
3.4 Two cells with connected variable resistors.
3.5 Matrix with External Shift Registers.
3.6 Processing elements with local and global memory.
3.7 Matrix with Output Control Circuitry.
4.1 A one bit shift register.
4.2 The buffer used in each cell.
4.3 AWB circuit of a buffer.
4.4 AWB simulation of the buffer.
4.5 AWB circuit of a single cell.
5.2 Picture of the analog buffer.
5.3 Layout of the five way variable resistor.
5.4 Picture of the five way variable resistor.
5.5 Layout of the large capacitor.
5.6 Picture of the large capacitor connected to a cell.
5.7 Layout of the complete cell.
5.8 Picture of the complete cell.
5.9 Layout of the complete chip.
5.10 Picture of the complete chip.
5.11 Input and output of the test buffer.
5.12 Output from the matrix of the example in Section 2.3.2.
5.13 Output from the matrix showing rise and fall times.
5.14 Output from the equidistant case with the first set of boundary conditions.
5.15 Output from the "column" weight set and the second set of boundary conditions.
5.16 Voltage sets used to test the complete matrix.
B.1 Shift Register Cell used with esim.
B.2 Picture of the Shift Register Cell.
B.3 AWB circuit of a 1-bit shift register driving a tap.
B.4 AWB "oscilloscope" of a 1-bit shift register driving a tap.
C.1 An 8-bit shift register used to set control lines for the variable resistors.
C.2 Picture of the 8-bit shift register.
C.3 A 3-to-8 Decoder used to select an output column.
C.4 Picture of the 3-to-8 Decoder.
D.1 The layout of the pins on the MOSIS Tiny Chip frame.
List of Tables
2.1 The voltages obtained by Spice simulation for the equal distant case.
2.2 The voltages obtained by Maple for the equal distant case.
2.3 The voltages obtained by Spice simulation for the variable case.
2.4 The voltages obtained by Maple for the variable case.
2.5 The differences between Table 2.3 and Table 2.4.
4.1 The five basic variations the resistor can achieve.
4.2 The aspect ratios (w/l) for the devices in each cell's buffer.
5.1 The aspect ratios (w/l) and sizes used for the buffer.
5.2 Results from the example.
D.1 The pin out used for the chip.
Chapter 1
Introduction
1.1 Background
Partial differential equations (PDEs) are very useful in understanding the physical environment in which we live. They can be used to model numerous aspects of life, from economics (the rate of change in production due to changes in capital) to music (the physics of a vibrating string) and, of course, traditional engineering problems. The usual way of solving these equations is to use a digital computer. Unfortunately, solving sets of partial differential equations can be costly in both time and computer resources.

One class of PDEs which is fairly common is referred to as the elliptic boundary problem, which is frequently solved using the finite difference method. An example of the many problems that fill this class is that of modeling heat flow. The traditional method, which uses difference equations, requires that the space over which the problem is to be solved be divided up into points, and an equation must be solved at each point. For accurate results a large number of points, and thus equations, is needed. This requires a lot of computing time as well as a large amount of memory. Numerous books and journals have been dedicated to finding faster and less memory-intensive methods. In a previous paper by Gobovic and Zaghloul [GZ93] an idea was presented for solving this type of problem with neural networks.

Our understanding of neural networks, both natural and artificial, is based on the McCulloch-Pitts model [MP43]. This model essentially consists of a cell body, called the soma, many inputs from other neurons, called dendrites, and one axon, which is the output. One axon can connect to many other neurons, even those that are not nearby. The junction on the dendrite that connects to the axons of other cells is called a synapse. In general, each cell connects to many other cells. An Artificial Neural Network (ANN) is a system for processing information based on the model of a biological neuron. ANNs, which can be implemented as digital or analog circuits, have a major drawback when implemented as a VLSI design because each neuron may have many connections to many other neurons. A Cellular Neural Network (CNN) is an ANN which is made up of cells that are typically identical. If these cells are locally connected, that is, each neuron is only connected to its closest neighboring neurons, then the network is well suited for VLSI implementation [CY88b]. In essentially all neural networks, both natural and artificial, the neurons are connected by weights which govern how much of an effect the information from the other neuron will have on the current neuron.

The first paper by Gobovic and Zaghloul had an accuracy that was limited by the size of the mesh on which the problem domain was discretized (i.e. the number of neurons available to be used). In a second paper [GZ94] this problem was addressed in the same way that it has been addressed for the digital computer solution: by changing the distance between the nodes of the mesh. That paper developed the mathematical theory for the one-dimensional case. This thesis extends the improvement made in [GZ94] to the two-dimensional case and presents an analog CMOS VLSI design and layout. An architecture for controlling the weights in the CNN is also presented.
1.2 Statement of Problem

... should be able to solve the problem rapidly. In fact, it is possible to solve this problem on large and expensive parallel processing computers. In this thesis a method for solving the problem more accurately using the traditional method has been developed. The proposed technique divides the problem space in an uneven manner so that there are more nodes in areas of greater change. Again, this slightly more advanced model could be solved using a parallel processing computer, but these computers tend to be too large and complex to be used in an embedded system. Thus a Cellular Neural Network is proposed to solve the problem. The proposed system can solve these PDEs rapidly, and is smaller and less complicated than digital parallel processing computers. As with digital computers, a novel architecture is also introduced for controlling the processing elements.
1.3 Contribution of Thesis

... have the property that no additional routing will be required outside of a cell to control it. In this way a large number of cells can be used without the amount of routing per cell increasing. The new architecture and the two-dimensional partial differential theory will be demonstrated by designing an electronic CMOS circuit. The circuit will be laid out for an Orbit 2 µm analog process. The chip will then be fabricated by the MOSIS service. Finally, the CNN will be tested to verify both the theory and the design.

The second chapter of this thesis describes the mathematical model behind solving PDEs and how those methods can be related to CNNs. In addition it provides a theoretical model for a cell and uses that cell in two illustrations. Chapter three describes an architecture for controlling neural networks in general and the CNN described in this thesis in particular. The fourth chapter describes how each part of the cell is implemented in CMOS. Chapter five explains how each part was laid out in preparation for VLSI fabrication. The testing procedures for the chip are also reviewed, along with the results of the tests. The final chapter contains conclusions and suggestions for future work based on the results of the fabricated chip.

The results of this thesis are expected to serve as an important step toward making large neural networks more controllable and thus more practical. They are also expected to provide partial differential equation solutions to embedded systems, which would previously have required a digital computer subsystem.
Chapter 2
\[
\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = f(x, y) \tag{2.1}
\]
defined on a region $R$, where $u(x, y)$ is a continuous unknown scalar function which satisfies some given boundary conditions at the boundary of region $R$, $f(x, y)$ is a given function, and $x$ and $y$ are space variables. If we let $R$ be a square bounded region in the $(x, y)$ plane, then it can be divided evenly into a square mesh with mesh size $h$. We can define $P(x, y) = P(ih, jh)$ as the node on the mesh at $(i, j)$. In a two-dimensional mesh each node has four neighbors: $(i-1, j)$, $(i+1, j)$, $(i, j-1)$, $(i, j+1)$.
Figure 2.1: Neighboring mesh points.

The continuous function $u(x, y)$ can be approximated by the set of values at the nodes of the mesh. The partial derivatives of $u(x, y)$ can be replaced by the difference equations:
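\[
\left. \frac{\partial^2 u(x, y)}{\partial x^2} \right|_{P_{ij}} = \frac{u(i+1, j) + u(i-1, j) - 2u(i, j)}{h^2} \tag{2.2}
\]
\[
\left. \frac{\partial^2 u(x, y)}{\partial y^2} \right|_{P_{ij}} = \frac{u(i, j+1) + u(i, j-1) - 2u(i, j)}{h^2} \tag{2.3}
\]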
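As a quick sanity check of (2.2) and (2.3), the following minimal Python sketch applies the five-point difference formula to a sampled function. The function name `laplacian_5pt` and the test function $u = x^2 + y^2$ are illustrative assumptions and do not come from the thesis.

```python
def laplacian_5pt(u, i, j, h):
    """Sum of the difference quotients (2.2) and (2.3) at mesh node (i, j)."""
    d2x = (u[i + 1][j] + u[i - 1][j] - 2.0 * u[i][j]) / h**2
    d2y = (u[i][j + 1] + u[i][j - 1] - 2.0 * u[i][j]) / h**2
    return d2x + d2y


# Illustrative test function u(x, y) = x^2 + y^2, whose Laplacian is exactly 4.
h = 0.1
n = 6
u = [[(i * h) ** 2 + (j * h) ** 2 for j in range(n)] for i in range(n)]
approx = laplacian_5pt(u, 2, 3, h)
```

For a quadratic test function the difference quotients are exact up to rounding, so `approx` should agree with the analytic Laplacian; for general functions the error shrinks with the mesh size $h$.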
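In order to simplify the system of equations into a matrix form we define a vector $u$ to be the $n^2$ values of $u$
\[
u = \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{pmatrix}, \qquad
u_i = \begin{pmatrix} u(i, 1) \\ u(i, 2) \\ \vdots \\ u(i, n) \end{pmatrix}, \qquad i = 1, 2, \ldots, n \tag{2.4}
\]
In a similar manner we can also define vector $F$ to be the values of $F(i, j)$ at the nodes of the mesh
\[
F = \begin{pmatrix} F_1 \\ F_2 \\ \vdots \\ F_n \end{pmatrix}, \qquad
F_i = \begin{pmatrix} F(i, 1) \\ F(i, 2) \\ \vdots \\ F(i, n) \end{pmatrix}, \qquad i = 1, 2, \ldots, n \tag{2.5}
\]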
\[
A u + h^2 F + b = 0 \tag{2.6}
\]
where the boundary conditions are in $b$, and $A$ is
(2.7)
$I$ is an $n \times n$ identity matrix. Matrix $A$ in (2.7) is an $n \times n$ (or $n^2 \times n^2$ after $B$ is substituted) symmetric block tridiagonal matrix which is also positive definite [GZ93], [GZ94], [OR70a]. Thus this system can be solved with a neural network approach with an energy function defined as:
\[
E(v) = \frac{1}{2} v^T A v + v^T \varphi, \qquad \text{where } \varphi = b + h^2 F \tag{2.8}
\]
and the neural net will try to minimize (2.8) such that:
\[
\frac{du_{ij}}{dt} = -\frac{\partial E(v_{11}, v_{12}, \ldots, v_{1n}, v_{21}, \ldots, v_{ij}, \ldots, v_{nn})}{\partial v_{ij}} \tag{2.9}
\]
where $u_{ij}$ is the input of the $ij$-neuron in the net and $v_{ij}$ is its output. Note that $v_{ij} = u_{ij}$ [Zur92], [TH86]. Thus
\[
\frac{du_{ij}}{dt} = -\sum_{kl} a_{ij,kl}\, v_{kl} - \varphi_{ij}, \qquad v_{ij} = u_{ij}, \qquad i, j = 1, \ldots, n \tag{2.10}
\]
where $a_{ij,kl}$ is an element of matrix $A$ at row $ij$ and column $kl$. The matrix $A$ is diagonally dominant, and thus the neural network can be made up of locally connected cells. An $ij$-neuron and its locally connected neighbors are illustrated in Figure 2.2 [GZ93].
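The dynamics in (2.10) amount to a gradient descent on the energy (2.8). The sketch below integrates them with a forward-Euler step for a small system; the $2 \times 2$ matrix, the vector `phi`, the step size and the function names are hypothetical choices made only for illustration and are not taken from the thesis.

```python
def simulate(A, phi, u0, dt=0.01, steps=5000):
    """Forward-Euler integration of du/dt = -(A u + phi), i.e. (2.10) with v = u."""
    u = list(u0)
    n = len(u)
    for _ in range(steps):
        du = [-(sum(A[r][c] * u[c] for c in range(n)) + phi[r]) for r in range(n)]
        u = [u[r] + dt * du[r] for r in range(n)]
    return u


def energy(A, phi, v):
    """Energy function (2.8): E(v) = 1/2 v^T A v + v^T phi."""
    n = len(v)
    quad = sum(v[r] * A[r][c] * v[c] for r in range(n) for c in range(n))
    return 0.5 * quad + sum(v[r] * phi[r] for r in range(n))


# Hypothetical 2x2 positive definite example (not from the thesis).
A = [[4.0, -1.0], [-1.0, 4.0]]
phi = [-5.0, -10.0]
u_final = simulate(A, phi, [0.0, 0.0])
```

Because this `A` is positive definite, the trajectory settles at the unique point where $A u + \varphi = 0$, which is also the minimum of the energy.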
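Figure 2.2: Locally connected neurons with weights. (The center cell $(i, j)$ has weight $-4$; its neighbors $(i-1, j)$, $(i+1, j)$, $(i, j-1)$ and $(i, j+1)$ each have weight $1$.)

\[
T = \begin{bmatrix} 0 & 1 & 0 \\ 1 & -4 & 1 \\ 0 & 1 & 0 \end{bmatrix} \tag{2.11}
\]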
\[
\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = f(x, y) \tag{2.12}
\]
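Figure 2.3: Notation for the Non-Equidistant Case. (Node $u_{i,j}$ with neighbors $u_{i,j+n_{i,j}}$, $u_{i,j-s_{i,j}}$, $u_{i+e_{i,j},j}$ and $u_{i-w_{i,j},j}$ at distances $n_{i,j}$, $s_{i,j}$, $e_{i,j}$ and $w_{i,j}$.)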
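where $\partial^2 u / \partial x^2$ and $\partial^2 u / \partial y^2$ can be approximated by taking the Taylor expansions about the mesh point [Mit80] of
\[
u_{i,j+n_{i,j}}, \qquad u_{i,j-s_{i,j}}, \qquad u_{i+e_{i,j},j}, \qquad u_{i-w_{i,j},j} \tag{2.13}
\]
which gives
\[
\begin{aligned}
\frac{\partial^2 u}{\partial x^2} &\approx \frac{2\left[ w_{i,j}\,(u_{i+e_{i,j},j} - u_{i,j}) + e_{i,j}\,(u_{i-w_{i,j},j} - u_{i,j}) \right]}{e_{i,j}\, w_{i,j}\, (e_{i,j} + w_{i,j})} \\
\frac{\partial^2 u}{\partial y^2} &\approx \frac{2\left[ s_{i,j}\,(u_{i,j+n_{i,j}} - u_{i,j}) + n_{i,j}\,(u_{i,j-s_{i,j}} - u_{i,j}) \right]}{n_{i,j}\, s_{i,j}\, (n_{i,j} + s_{i,j})}
\end{aligned} \tag{2.14}
\]
Substituting (2.14) into (2.12) gives
\[
\frac{2\left[ w_{i,j}\,(u_{i+e_{i,j},j} - u_{i,j}) + e_{i,j}\,(u_{i-w_{i,j},j} - u_{i,j}) \right]}{e_{i,j}\, w_{i,j}\, (e_{i,j} + w_{i,j})}
+ \frac{2\left[ s_{i,j}\,(u_{i,j+n_{i,j}} - u_{i,j}) + n_{i,j}\,(u_{i,j-s_{i,j}} - u_{i,j}) \right]}{n_{i,j}\, s_{i,j}\, (n_{i,j} + s_{i,j})}
- f(x, y) = 0 \tag{2.15}
\]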
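The uneven-mesh difference quotient used in (2.14) and (2.15) can be checked numerically. The sketch below is only an illustration: the helper `second_derivative`, the quadratic `g`, and the spacings are assumed for the example. With equal spacings the same formula reduces to the equidistant quotients (2.2) and (2.3).

```python
def second_derivative(u_minus, u_center, u_plus, d_minus, d_plus):
    """Three-point estimate of u'' on an uneven mesh, in the form of (2.14).

    d_plus plays the role of e (or n), d_minus the role of w (or s).
    """
    num = 2.0 * (d_minus * (u_plus - u_center) + d_plus * (u_minus - u_center))
    return num / (d_plus * d_minus * (d_plus + d_minus))


def g(x):
    # Illustrative quadratic with g''(x) = 6 everywhere.
    return 3.0 * x * x + 2.0 * x + 1.0


x0, e, w = 1.0, 0.5, 0.2
estimate = second_derivative(g(x0 - w), g(x0), g(x0 + e), w, e)

# With equal spacing h the formula collapses to (u+ + u- - 2u) / h^2.
h = 0.25
equal = second_derivative(g(x0 - h), g(x0), g(x0 + h), h, h)
```

The estimate is exact for quadratics even when the two spacings differ, which is what allows the mesh to be refined locally.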
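This equation can be written more concisely if the following functions are defined
\[
\begin{aligned}
E_{i,j} &= \frac{2}{e_{i,j}\,(e_{i,j} + w_{i,j})}, &
W_{i,j} &= \frac{2}{w_{i,j}\,(e_{i,j} + w_{i,j})}, \\
N_{i,j} &= \frac{2}{n_{i,j}\,(n_{i,j} + s_{i,j})}, &
S_{i,j} &= \frac{2}{s_{i,j}\,(n_{i,j} + s_{i,j})}, \\
\Sigma_{i,j} &= N_{i,j} + S_{i,j} + E_{i,j} + W_{i,j}
\end{aligned} \tag{2.16}
\]
After the functions are substituted we have a five part equation similar to (2.1)
\[
\begin{aligned}
N_{i,j}\, u_{i,j+n_{i,j}} + S_{i,j}\, u_{i,j-s_{i,j}} + E_{i,j}\, u_{i+e_{i,j},j} + W_{i,j}\, u_{i-w_{i,j},j} - \Sigma_{i,j}\, u_{i,j} - f(x, y) &= 0 \\
-N_{i,j}\, u_{i,j+n_{i,j}} - S_{i,j}\, u_{i,j-s_{i,j}} - E_{i,j}\, u_{i+e_{i,j},j} - W_{i,j}\, u_{i-w_{i,j},j} + \Sigma_{i,j}\, u_{i,j} + f(x, y) &= 0
\end{aligned} \tag{2.17}
\]
In matrix form this is
\[
A u + \Phi + b = 0 \tag{2.18}
\]
The matrix $A$ can be written
\[
A = \begin{pmatrix}
B_1 & E_1 & & \\
W_2 & B_2 & E_2 & \\
 & \ddots & \ddots & \ddots \\
 & & W_n & B_n
\end{pmatrix} \tag{2.20}
\]
\[
B_i = \begin{pmatrix}
\Sigma_{i,1} & -N_{i,1} & & & \\
-S_{i,2} & \Sigma_{i,2} & -N_{i,2} & & \\
 & -S_{i,3} & \Sigma_{i,3} & -N_{i,3} & \\
 & & \ddots & \ddots & \ddots \\
 & & & -S_{i,n} & \Sigma_{i,n}
\end{pmatrix} \tag{2.21}
\]
\[
E_i = \operatorname{diag}\left(-E_{i,1}, -E_{i,2}, \ldots, -E_{i,n}\right), \qquad
W_i = \operatorname{diag}\left(-W_{i,1}, -W_{i,2}, \ldots, -W_{i,n}\right) \tag{2.22}
\]
Notice how this matrix (2.20) is similar in form to (2.7). The $I$, the identity matrices in (2.7), which are, of course, diagonal, are represented in (2.20) by the diagonals $E_i$ in the upper triangle and $W_i$ in the lower. The $B_i$ matrix is also very similar to the $B$ matrix in (2.7). To complete the matrix equation (2.18) we need only to define $\Phi$.
\[
\Phi = \begin{pmatrix} \Phi_1 \\ \Phi_2 \\ \vdots \\ \Phi_n \end{pmatrix}, \qquad
\Phi_i = \begin{pmatrix} F(i, 1) \\ F(i, 2) \\ \vdots \\ F(i, n) \end{pmatrix}, \qquad i = 1, 2, \ldots, n \tag{2.23}
\]
The boundary vector $b$ is defined as it was in the previous section, but will become more clear after the illustration given in Section 2.3.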
The matrix $A$ given in (2.20) is irreducibly diagonally dominant, has positive diagonal elements, and has non-positive off-diagonal elements; therefore it is an M-matrix (§2.4.14 of [OR70b]). It follows that the mapping defined by the system of equations is a homeomorphism, and the system therefore has a unique and stable solution (§5.4.1 of [OR70b]). Since the system has a unique solution, a convex energy function can be defined as it was in Section 2.1 [GZ93], [OR70c], [Zur92].
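The sign pattern and diagonal dominance claimed for $A$ can be checked on a small example. The sketch below assembles $A$ from per-direction weights of the form $2 / (d\,(d + d'))$, where $d$ is the distance to a neighbor and $d'$ the distance to the opposite neighbor. The input format (lists `hx`, `hy` of spacings between consecutive mesh lines, boundary lines included), the chosen spacings, and the function names are assumptions made for illustration. Irreducibility is not tested here.

```python
def weights(n_d, s_d, e_d, w_d):
    """Weights N, S, E, W and their sum for one node."""
    N = 2.0 / (n_d * (n_d + s_d))
    S = 2.0 / (s_d * (n_d + s_d))
    E = 2.0 / (e_d * (e_d + w_d))
    W = 2.0 / (w_d * (e_d + w_d))
    return N, S, E, W, N + S + E + W


def assemble(hx, hy):
    """Assemble A for a square mesh; node (i, j) maps to row i * size + j."""
    size = len(hx) - 1
    dim = size * size
    A = [[0.0] * dim for _ in range(dim)]
    for i in range(size):
        for j in range(size):
            N, S, E, W, total = weights(hy[j + 1], hy[j], hx[i + 1], hx[i])
            r = i * size + j
            A[r][r] = total
            if j + 1 < size:
                A[r][r + 1] = -N      # neighbor (i, j+1)
            if j > 0:
                A[r][r - 1] = -S      # neighbor (i, j-1)
            if i + 1 < size:
                A[r][r + size] = -E   # neighbor (i+1, j)
            if i > 0:
                A[r][r - size] = -W   # neighbor (i-1, j)
    return A


def has_m_matrix_pattern(A, tol=1e-12):
    """Positive diagonal, non-positive off-diagonal, weak row dominance
    with at least one strictly dominant row."""
    dim = len(A)
    off = [sum(-A[r][c] for c in range(dim) if c != r) for r in range(dim)]
    signs = all(A[r][r] > 0 and all(A[r][c] <= 0 for c in range(dim) if c != r)
                for r in range(dim))
    weak = all(A[r][r] >= off[r] - tol for r in range(dim))
    strict = any(A[r][r] > off[r] + tol for r in range(dim))
    return signs and weak and strict


# Hypothetical uneven 4x4 mesh, finer toward one corner.
A = assemble([1.0, 0.5, 0.25, 0.5, 1.0], [1.0, 1.0, 0.5, 0.25, 0.25])
```

Rows belonging to nodes next to the boundary are strictly dominant because the weights toward boundary nodes are moved into the vector $b$ rather than kept in $A$.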
2.3 An Illustration
As an example, let us consider a rectangular plate that can be heated or cooled by applying temperature sources (heaters or refrigerators) around its perimeter. The classical model for the steady-state heat flow in such a plate is Poisson's equation [Ric83]:
\[
\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = f(x, y) \tag{2.24}
\]
where $u(x, y)$ denotes the temperature at the point $(x, y)$. A sample physical plate is shown in Figure 2.4. The temperature distribution inside the physical plate is modeled by Laplace's equation:
\[
\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = 0 \tag{2.25}
\]
Suppose that the temperature in the interior of the plate is initially at room temperature. The temperature will then change until it reaches a new steady-state temperature which is caused by the given boundary conditions. This process can be modeled by
\[
\frac{du}{dt} = \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} \tag{2.26}
\]
The problem is now a function of time: $u(x, y, t)$. The initial condition $u(x, y, 0) = 20$ models room temperature. As $du/dt$ approaches zero, equation (2.26) becomes (2.25). Thus the steady state is not dependent on time $t$, nor is it dependent on the initial conditions of the interior of the physical plate.
Figure 2.4: Laplace's equation model for our heated plate example. (Boundary labels: heated side $u(x, 0) = 100$; cooled side $u(x, 1) = 0$; partially heated side $u(0, y) = 100(1 - y)$; Laplace's equation holds in the interior.)
\[
A u + b = 0 \tag{2.27}
\]
where matrix $A$ is a block tridiagonal $4 \times 4$ matrix given by (2.7) and $u$ is the unknown temperature vector as defined in (2.4). The vector $b$ holds the boundary conditions for this problem and is defined as
(2.28)
Figure 2.5: Discretization of a plate domain by a rectangular grid of points. (Boundary labels: partially heated side $u(0, j) = 20(5 - j)$; heated sides $u(i, 0) = 100$ and $u(5, j) = 100$; cooled side $u(i, 5) = 0$; mesh lines at $ih$ and $jh$ from $0$ to $5h$.)

These equations were solved using both a numerical method and our cellular neural network method. To solve the equations using the CNN, voltages must be defined for the temperatures, and we arbitrarily chose a scaling factor of 20 (i.e. 100 °C corresponds to 5 V). (See Figure 2.6.) A circuit diagram for a cell is given in Figure 2.7. The first-order circuit analysis for the cell $(i, j)$ is
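\[
C \frac{du_{i,j}}{dt} + \frac{u_{i,j} - v_{i-1,j}}{R} + \frac{u_{i,j} - v_{i+1,j}}{R} + \frac{u_{i,j} - v_{i,j-1}}{R} + \frac{u_{i,j} - v_{i,j+1}}{R} = I_{i,j} \tag{2.29}
\]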
Figure 2.6: 4x4 Cellular Neural Network for our heated plate example. (Cells (1,1) through (4,4); boundary voltages of 4 V, 3 V, 2 V and 1 V along the partially heated side, 5 V along the two heated sides, and 0 V along the cooled side.)
Figure 2.7: An example of a cell circuit.
\[
\tau \frac{du_{i,j}}{dt} = -4u_{i,j} + v_{i-1,j} + v_{i+1,j} + v_{i,j-1} + v_{i,j+1} + R I_{i,j}, \qquad v_{i,j} = u_{i,j}, \qquad i, j = 1, \ldots, 4 \tag{2.30}
\]
where $\tau = RC$ is the time constant of the circuit. Equation (2.30) has the form described earlier by equation (2.10), where $R I_{i,j} = -\varphi_{i,j}$. As before, equation (2.10) converges to the steady state due to the positive definiteness of matrix $A$, and therefore equation (2.30) will also converge. In our simulation we used $R = 100\,\mathrm{k}\Omega$ and $C = 1\,\mathrm{pF}$, giving a time constant of 25 ns (each node sees its four resistors in parallel, so the effective time constant is $RC/4$). The results obtained by Spice simulation (see Table 2.1) match those obtained by numerical analysis using the software program Maple (see Table 2.2).
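The behavior of the cell equation (2.30) can be explored with a few lines of Python. This is a minimal sketch, not the Spice model used in the thesis: it assumes $I_{i,j} = 0$, takes the boundary voltages from the relations $u(0, j) = 20(5 - j)$, $u(i, 0) = 100$, $u(5, j) = 100$ and $u(i, 5) = 0$ scaled by 20 °C per volt, and integrates with a forward-Euler step whose size is an arbitrary choice. No attempt is made here to reproduce the tabulated values.

```python
R = 100e3        # ohms, value quoted in the text
C = 1e-12        # farads, value quoted in the text
TAU = R * C      # time constant appearing in (2.30)


def make_grid(initial=1.0):
    """6x6 array: rows/columns 0 and 5 hold boundary voltages, 1..4 hold cells."""
    v = [[initial] * 6 for _ in range(6)]
    for k in range(1, 5):
        v[0][k] = 5.0 - k   # u(0, j) = 20(5 - j) degrees C -> 4, 3, 2, 1 V
        v[k][0] = 5.0       # u(i, 0) = 100 degrees C -> 5 V
        v[5][k] = 5.0       # u(5, j) = 100 degrees C -> 5 V
        v[k][5] = 0.0       # u(i, 5) = 0 degrees C -> 0 V
    return v


def simulate(v, dt=1e-9, steps=4000):
    """Forward-Euler integration of (2.30) with I = 0 and v = u."""
    for _ in range(steps):
        new = [row[:] for row in v]
        for i in range(1, 5):
            for j in range(1, 5):
                rhs = (-4.0 * v[i][j] + v[i - 1][j] + v[i + 1][j]
                       + v[i][j - 1] + v[i][j + 1])
                new[i][j] = v[i][j] + dt * rhs / TAU
        v = new
    return v


def residual(v):
    """Largest violation of the steady-state condition 4u = sum of neighbors."""
    return max(abs(4.0 * v[i][j] - v[i - 1][j] - v[i + 1][j]
                   - v[i][j - 1] - v[i][j + 1])
               for i in range(1, 5) for j in range(1, 5))


from_room = simulate(make_grid(initial=1.0))   # 20 degrees C start
from_zero = simulate(make_grid(initial=0.0))   # different start, same boundary
```

Starting the interior from two different voltages and arriving at the same steady state illustrates the earlier remark that the steady state does not depend on the initial conditions of the interior.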
Table 2.1: The voltages obtained by Spice simulation for the equal distant case.

4.3687    4.2965    4.4270    4.6748
3.1783    3.3904    3.7365    4.2724
1.9542    2.3502    2.8563    3.6783
0.78848   1.1997    1.6602    2.5846

Table 2.2: The voltages obtained by Maple for the equal distant case.
... the south. Although the topic of how to design a suitable mesh for any particular problem is beyond the scope of this thesis, a possible mesh could be described using four matrices, one for the weight in each direction. In these matrices a "1" indicates the least resistance, and a "4" the most.
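\[
w = \begin{bmatrix} 4 & 4 & 4 & 3 \\ 4 & 4 & 3 & 2 \\ 4 & 3 & 2 & 1 \\ 4 & 3 & 2 & 1 \end{bmatrix}, \quad
n = \begin{bmatrix} 4 & 4 & 4 & 4 \\ 4 & 4 & 3 & 3 \\ 4 & 3 & 2 & 2 \\ 4 & 2 & 1 & 1 \end{bmatrix}, \quad
e = \begin{bmatrix} 4 & 4 & 3 & 4 \\ 4 & 3 & 2 & 3 \\ 3 & 2 & 1 & 2 \\ 3 & 2 & 1 & 1 \end{bmatrix}, \quad
s = \begin{bmatrix} 4 & 4 & 3 & 3 \\ 4 & 3 & 2 & 2 \\ 4 & 2 & 1 & 1 \\ 4 & 3 & 2 & 1 \end{bmatrix}
\tag{2.31}
\]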
Again the results from the Spice simulation (see Table 2.3) match those obtained from the numerical solution using the same mesh to within about one hundredth of a volt: only two values differ by more than one hundredth of a volt (0.0119 V and 0.0116 V). (See Table 2.5.) Although the Spice and numerical solutions match for both the equidistant and the variable distant cases, the latter is more accurate, since it places more solution points in the area where the solution changes most rapidly at the expense of fewer points in the area where it changes least.

4.3256    4.1895    4.2241    4.4858
3.1128    3.2084    3.4717    3.9110
1.9174    2.2808    2.7065    3.2409
0.97039   1.4790    2.0023    2.5608

Table 2.3: The voltages obtained by Spice simulation for the variable case.

Table 2.4: The voltages obtained by Maple for the variable case.

Table 2.5: The differences between Table 2.3 and Table 2.4.
Chapter 3
... wanted a method that would not require much additional control routing no matter how large the matrix was.
3.1.1 Generic
A generic method for controlling neural networks, especially CNNs, with a minimal amount of control routing would be to use a distributed memory where each cell stores its control information locally. Since the cell weights frequently need to be controlled individually, each weight should have its own local memory. In other types of neural networks, which are not covered in this thesis, the gain of the buffer also needs to be controlled. In those cases, the cell's buffer should also have its own memory. From an architectural point of view, all of the local memories need to be addressable so that they can be written to. The memories will, in turn, control the cell.

There are numerous types of memory cells that could be used in this generic architecture. The idea will work with both analog and digital memories, provided that the memory type will function with the cell part that needs to be controlled. Figure 3.1 shows a more general cell with an arbitrary number of weights, and therefore neighbors.

The method we chose uses several passive resistors connected in series.¹ At the point between each pair of resistors is a digitally controlled analog switch.² The other node of each switch is connected to a bus. All of the switches in one variable resistor are connected to the same bus, and that bus acts as the output of the variable resistor. So the effective resistance of the circuit is controlled by which switch is on. The number of switches and resistor segments depends on how much controllability is required for any particular application. The size of each resistor segment also depends on the application for which the digitally controlled resistor will be used. (See Figure 3.2.)

It might appear that all of these switches have made the routing problem worse, because each switch needs a control line. For example, if a variable resistor with five possible values were required (e.g. 0 Ω, 25 Ω, 50 Ω, 75 Ω, and 100 Ω), you would need five switches and therefore five lines routed to each resistor to control those switches. Although this number can be reduced by using a multiplexer and demultiplexer, the number of lines is still dependent on the number of variable resistors, which is in turn dependent on the size of the matrix.

¹ The other reasons, besides control routing, that this choice was made will be discussed in the next chapter.
² The switch is always either on or off, but when on it allows an analog signal to pass.
Figure 3.1: A general cell with many weights.

Figure 3.2: A variable resistor.
In order to solve the problem of controlling all of the switches, a system of digital shift registers can be used.³ Like any other shift register, each register needs two clocks (one the inverse of the other). As the clocks are cycled, the input of one shift register moves to the input of the next, and so on. A one-bit shift register is needed for each switch of each variable resistor (i.e. one for each port, or empty circle, of Figure 3.2). This allows us to reduce the number of lines that must be routed to each variable resistor to three (the input and two clocks), as shown in Figure 3.3.

Figure 3.3: A local memory cell consisting of shift registers.

In order to reduce the number further, the output of the last shift register of a variable resistor can be connected to the input of the first shift register of the next variable resistor. In this way it would be possible to reduce the total number of lines used to control all of the variable resistors to three. However, that would require a great number of clock cycles to initialize all of the variable resistors, or weights, of the neural network.⁴ After having considered this case it becomes clearer that this method of controlling weights could be applied to other neural networks. The shape of the network is not important; in fact, it does not even need to be cellular.

In order to set or change the set of weight values in the neural network using the shift register method you would, in general, need to shift in a completely new set of weights. For example, if we had a single variable resistor to set and it had five possible values, we would need to shift in a "1" followed by four "0"s in order to set it to its highest resistive value, or four "0"s followed by a "1" to set the resistor to its smallest value. This requires one complete two-phase clock cycle for each one-bit shift register. Although for larger matrices this could take a lot of time, it is still relatively short, since each phase of the clock cycle need only change the state of one inverter. Thus the clock cycle for the circuit should be as fast as is possible for any clock with that type of technology.

As previously mentioned, active resistors can be used for variable resistors as well. A system of memory cells could also be used to store the control values for active resistors. Since shift registers typically only store digital signals and active resistors require analog values, the analog memory cells would have to be written to in some other way. In a two-dimensional matrix the simplest way would be a raster-type system where a bus line would deliver the signal and a horizontal and a vertical line would select each memory cell for a write. Note that active resistors still have a problem with regard to the number and size of the transistors compared to the linear range over which they function, as well as the resistive range. This aspect will be covered in the next chapter.

³ Since the shift registers use only two values they are digital, but the two values needed are not the standard TTL digital levels of 0 V and 5 V. See the next chapter for more information.
⁴ The equal distant case and other "template" models can also be solved in this manner. Only the two clocks and one set of shift registers for each variable resistor of the template cell are needed. The switches of all of the other cells can be controlled by these shift registers using bus lines. In this way, far fewer shift registers are needed and thus the layout can be made more compact.
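The shift-in procedure described above can be modeled in a few lines of Python. This is only a behavioral sketch: the 25 Ω segment size follows the five-value example given earlier, and the assumption that stage $k$ of the register chain drives the tap after $k$ series segments (so that the stage nearest the input selects 0 Ω) is an illustrative choice, as are the function names.

```python
SEGMENT_OHMS = 25.0  # illustrative segment size; five taps give 0..100 ohms


def shift_in(bits, n_stages=5):
    """Clock a bit sequence into a chain of one-bit shift registers.

    Each loop iteration stands for one complete two-phase clock cycle.
    The first bit clocked in ends up farthest from the input.
    """
    stages = [0] * n_stages
    for b in bits:
        stages = [b] + stages[:-1]
    return stages


def resistance(stages):
    """Resistance selected by the single closed switch (one-hot stages).

    Assumes tap k sits after k series segments, so stage 0 selects 0 ohms.
    """
    if sum(stages) != 1:
        raise ValueError("exactly one switch should be on")
    return stages.index(1) * SEGMENT_OHMS


highest = resistance(shift_in([1, 0, 0, 0, 0]))  # "1" followed by four "0"s
lowest = resistance(shift_in([0, 0, 0, 0, 1]))   # four "0"s followed by a "1"
```

Five clock cycles are needed in either case, one per register stage, which matches the cost of one complete clock cycle per one-bit register noted in the text.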
3.1.2 Implementation
In our implementation we chose a compromise. Each cell consists of four variable resistors which are stacked vertically (in the north direction). The cells are, of course, arranged in a matrix. The shift registers run in the horizontal direction. The outputs of the shift registers of one cell are connected to the resistor at the same level in the next cell, which is in the next column. (See Figure 3.4 on the next page.) The clock lines run along a bus vertically through the matrix. In this way the settings for a whole row of resistors can be shifted in from one input, which means that all of the rows can be set in parallel. The weights can therefore be set very rapidly, since a shift register is only as slow as the switching time of one pass transistor and one inverter. Thus the clock speed is essentially as fast as it can be for any circuit with a given technology. Our chip has a 4 × 4 matrix of cells whose variable resistors have five equally spaced resistor values. Therefore, five clock cycles are needed to set each variable resistor. Since the rows are done in parallel, the total number of clocks is twenty (5 clocks × 4 columns). Although this would be a simple and quick way to set all of the resistors, it requires one input pin for each row of resistors. In our matrix there are 16 such rows. In general pins are at a premium in VLSI designs, so it is desirable to reduce the number used as much as possible.
[Figure 3.4 (labels: Shift Ins, Shift Outs).]
After the complete layout was considered, including test circuitry, it was clear that no more than four pins could be used for the inputs of the variable resistors' shift registers. In order to reduce the number from sixteen to four, an extra bank of two shift registers was used. This technique adds two more clocks and uses two inputs, for a total of four pins. Under these conditions the two new shift registers, each of which holds one bit of the setting information for eight rows, must be filled before each clock cycle for the matrix. (See Figure 3.5.) Taken together, one hundred sixty clock cycles are needed (8 × 20 = 160). In a proven design fewer pins would be needed for testing; therefore more could be used for shift register inputs and the total number of clocks could be reduced.
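The clock counts quoted above follow from simple arithmetic, which the short Python sketch below reproduces. The constant and variable names are chosen for this illustration only; the numbers themselves are the ones given in the text.

```python
TAPS_PER_RESISTOR = 5    # five resistor values, one shift register per tap
COLUMNS = 4              # cells per row in the 4 x 4 matrix
ROWS_OF_CELLS = 4
RESISTORS_PER_CELL = 4   # stacked vertically in each cell

# One input pin per row of resistors: every row is loaded in parallel.
resistor_rows = ROWS_OF_CELLS * RESISTORS_PER_CELL   # 16 input pins
clocks_parallel = TAPS_PER_RESISTOR * COLUMNS        # 5 clocks x 4 columns

# Extra bank of two shift registers, each staging one bit for eight rows;
# the bank must be refilled before every clock cycle of the matrix.
STAGING_REGISTERS = 2
rows_per_staging_register = resistor_rows // STAGING_REGISTERS
clocks_staged = rows_per_staging_register * clocks_parallel

print(resistor_rows, clocks_parallel, clocks_staged)
```

The trade-off is visible directly: going from sixteen input pins to two multiplies the number of clock cycles by eight.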
Figure 3.5: Matrix with External Shift Registers.

To a certain extent these two extra banks of shift registers act as a global or shared memory which is used as a staging area to initialize the local or distributed memory cells of the weights. Viewed in this way, the complete system is analogous to a SIMD parallel computer with a two-dimensional mesh interconnection network. (See Figure 3.6 on the next page.) Parallel computers tend to have a host or controlling computer. In our case the test equipment, or a system in which the CNN is embedded, would have a task comparable to that of the host. As a final comparison, we should point out that the routing of the interconnection network and the handling of the I/O requirements of the individual processing elements are two of the major difficulties that must be overcome in digital parallel computer designs as well. [HB84]
[Figure 3.6 (labels: Global Memory, Processing Element, Local Memory).]
Chapter 4
CMOS Implementation
As can be seen from the preceding chapters, the Artificial Neural Network described in this thesis consists of a matrix of cells. In addition it is clear that each cell must have four variable resistors, a buffer and a capacitor. Although a larger input and output voltage range would make our results more accurate, we decided that 5V was a reasonable voltage to design for. The CMOS design of each of the necessary analog parts will be discussed in the first section of this chapter. The second section will review the results of Spice simulations.
resistances in the circuit. We determined, based on simulations, that approximately 20kΩ should be used for our 100% resistance.
Resistor
We chose to use a passive resistor for several reasons. The first and foremost is that the CNN requires a linear resistor across as wide a range of voltages and resistances as possible. Although active resistances can be found that are linear across fairly large voltage ranges, they do not have a wide range of resistances. For our purposes a resistor needed to be able to take several values from 0% of x to 100% of x, where x is larger than the parasitic resistances of the rest of the circuit. The active resistance designs that we reviewed could have a large range of resistances in ohms, but not as a percentage. [AH87a] [GAS90a] For example, an active design might have a range of 50kΩ to 75kΩ (a span of 25kΩ). Although 25kΩ is a sufficient range, we require 0 to 25kΩ, not 50kΩ to 75kΩ.¹ Passive resistors, on the other hand, are linear across any practical voltage range. A second reason for using passive resistors is that once they are formed they cannot be changed, and therefore they do not need control wiring themselves. As mentioned in the chapter on architecture, they can still be made variable if taps, which allow the current to leave prematurely, are placed along them.

A passive resistor in a CMOS VLSI design is simply a length of polysilicon, n-type or p-type diffusion, or a well, with a contact placed on either end of the length of material. We chose to use polysilicon because, although it has a lower resistance per unit length than the others, it can be snaked so that the total area used for a certain resistance is less than for other materials in the MOSIS 2μm analog process. The resistor was a length of 1808λ that was 2λ wide.² At 20Ω per square, each small resistance segment was 4520Ω. Each contact adds approximately 20Ω to 50Ω. Each tap also adds a certain amount of resistance depending on the size of the transistor. [MOS93] Since the taps are essentially digital switches that will pass analog values, they could be implemented as pass transistors.
Simulations showed that true CMOS pass transistors were not required, so NMOS pass transistors could be used to save space. In order to pass 5V linearly, NMOS transistors must have an "on" of 8V and an "off" of −8V rather than the usual 0V and 5V found in traditional digital circuits. The pass transistors were made as wide as possible, to minimize their resistance, without increasing the size of the circuit. In this case that was 4λ. With a minimum length of 2λ, the NMOS pass transistor has an average effective resistance of about 2kΩ when selected ("on"). To summarize, the total resistance is 4520Ω for each segment, plus 2kΩ for the tap and 20Ω to 50Ω per contact. The values for the five basic variations are given in Table 4.1.

¹ We should point out that active designs that we determined to be too complex (large) in terms of the number of transistors were not fully examined, so it is possible that some active resistors do have these characteristics but were impractical for size reasons.
² The CMOS technology used was the MOSIS (Orbit) 2μm analog (low noise) process, which means 1λ = 1μm.
Percentage    Resistance
0%            2kΩ
25%           6.6kΩ
50%           11.1kΩ
75%           15.6kΩ
100%          20.1kΩ
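As a rough cross-check of Table 4.1, the following Python sketch recomputes the nominal values from the figures quoted in the text (20Ω per square, a strip 1808λ long and 2λ wide, about 2kΩ for the selected tap). It assumes that the strip is divided into four equal segments, and it ignores the 20Ω to 50Ω contributed by each contact, so small differences from the table are expected. All names are illustrative only.

```python
SHEET_RESISTANCE = 20.0   # ohms per square (from the text)
TOTAL_LENGTH = 1808.0     # lambda
WIDTH = 2.0               # lambda
SEGMENTS = 4              # assumption: four equal segments between five taps
TAP_RESISTANCE = 2000.0   # ohms, approximate "on" resistance of the NMOS tap

squares_per_segment = (TOTAL_LENGTH / SEGMENTS) / WIDTH
segment_resistance = squares_per_segment * SHEET_RESISTANCE


def nominal_resistance(segments_in_path):
    """Tap resistance plus the polysilicon segments in the current path.

    Contact resistance is deliberately left out of this sketch.
    """
    return TAP_RESISTANCE + segments_in_path * segment_resistance


# Values of Table 4.1 in ohms, keyed by percentage setting.
table_4_1 = {0: 2000.0, 25: 6600.0, 50: 11100.0, 75: 15600.0, 100: 20100.0}
nominal = {pct: nominal_resistance(pct // 25) for pct in table_4_1}

for pct in table_4_1:
    print(f"{pct:3d}%  nominal {nominal[pct]:7.0f} ohm  table {table_4_1[pct]:7.0f} ohm")
```

Under these assumptions each segment comes out at the 4520Ω stated in the text, and the nominal totals stay within about 100Ω of the tabulated values.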
Shift Register
The shift registers needed to be able to hold a "1" in any one register while holding a "0" in all of the others. This required the use of full-bit shift registers. A full-bit shift register is simply a pass transistor and a buffer made from two inverters. Since we wanted to be able to select one of five settings, five shift registers were needed by each variable resistor. The output of each register only needed to be able to drive the next register and a tap, which is simply the gate of an NMOS transistor. Therefore each register needed to drive very little current, so the buffer could be minimum size. Because the register drives the gate of a tap, the buffer's power rails needed to be at least 8V. The pass transistor only needed to drive the buffer, so if the buffer is CMOS the pass gate can be implemented using one NMOS transistor. (See Figure 4.1 on the next page.)
4.1.2 Buffer
The buffer we chose to use was implemented by an operational amplifier with the output tied to the negative input terminal. The operational amplifier design was taken from [AH87b] and then reduced in size by examining the simulations and making adjustments. Since we are more interested in the steady-state or DC performance, we were able to significantly reduce the size of the transistors from what the algorithms in [AH87b] suggested. For the same reason we were also able to remove the large "compensation capacitor." Another difference from [AH87b] is that a biasing transistor was removed and replaced with a voltage bus. The use of this bus makes the circuit slightly smaller and makes a slight runtime adjustment possible. The circuit is basically a differential amplifier (five transistors) with a little extra biasing circuitry (two transistors). (See Figure 4.2 on the following page.) The transistor aspect ratios (w/l) that were calculated and those that were actually used are given in Table 4.2.

Figure 4.1: A one-bit shift register. (Labels: Clock, Vdd, Vss, Shift In, Shift Out, Resistor Select.)
Table 4.2: The aspect ratios (w/l) for the devices in each cell's buffer.
[Figure 4.2 (labels: Vdd, Vss, Vbias, +Vin, Vout; transistors M1 to M7).]
4.1.3 Capacitor
Although the previous circuit parts were all designed in a rigorous manner, the design of the capacitor was more ad hoc. The only requirement was that the capacitor store the value at the gate of the buffer for as long as necessary for it to be read. If it were larger than necessary, the circuit would simply take longer to stabilize. If it were not large enough, the result would decay too much before it could be read. Therefore we decided to make the capacitor as large as possible without increasing the size of the layout. In other words, we used the extra space left by the width of the buffer compared to the width of the variable resistor. This turned out to be a total of 189λ × 70λ, of which only 154λ × 29λ was usable for the capacitor, since each capacitor requires its own well with a full guard ring. In addition this technology requires a 2λ border, so that one plate of the capacitor is larger than the other. The capacitance is then between 1.92pF and 2.46pF (154 × 29 × (0.43fF/μm² to 0.55fF/μm²)). [MOS93]
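The capacitance range quoted for the cell capacitor can be reproduced with the short Python sketch below. It assumes that 1λ corresponds to 1μm for this process, so that the usable plate area in λ² can be used directly as μm²; the variable names are illustrative only.

```python
PLATE_WIDTH_UM = 154.0          # usable plate width (1 lambda = 1 um assumed)
PLATE_HEIGHT_UM = 29.0          # usable plate height
CAP_PER_AREA_MIN_FF = 0.43      # fF per square micron, lower bound
CAP_PER_AREA_MAX_FF = 0.55      # fF per square micron, upper bound

plate_area_um2 = PLATE_WIDTH_UM * PLATE_HEIGHT_UM

# Convert from femtofarads to picofarads (1 pF = 1000 fF).
c_min_pf = plate_area_um2 * CAP_PER_AREA_MIN_FF / 1000.0
c_max_pf = plate_area_um2 * CAP_PER_AREA_MAX_FF / 1000.0

print(f"area = {plate_area_um2:.0f} um^2, C = {c_min_pf:.2f} pF to {c_max_pf:.2f} pF")
```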
Resistor
It was not possible to test the passive resistor directly from the layout because the circuit extractor of Magic treats the two ends of a long segment of polysilicon as a single node. We used Spice at the design stage to ensure that the taps did not alter the resistance too much, but the actual polysilicon resistors were assumed to be ideal, with resistances as calculated in Table 4.1 on page 34.
Shift Register
Since the shift register is basically digital, it was possible to use the standard CMOS digital simulator esim to test its functionality. Spice was needed to ensure that the buffers in each shift register were able to drive a tap far enough on or off for the desired analog signal to pass through. The esim input and output for the basic shift register cell are given in Appendix B on page 72, as well as an AWB circuit and "oscilloscope" showing a shift register driving a tap (Figure B.3 on page 75).
4.2.2 Buffer
As mentioned previously, the design for the buffer was taken from a book and then adjusted considerably during simulations. Although variations in device size were taken into account for all of the devices, this was especially necessary in the buffer since it had to be kept small. Normally a fairly large minimum size is selected and then all of the aspect ratios are multiplied by that size in the layout. We used actual device sizes and tried all extreme combinations of device size errors during simulations. This extra care kept problems from arising during the layout stage. The buffer that was simulated is shown in Figure 4.3 on the next page. In addition to the basic circuit, an extra resistor "z" was used to mimic the layout more accurately. A load on the output consisting of four resistors and four capacitors was also added to simulate the load that the neighbors of the buffer would cause. The biasing voltage, V_GG, was determined based on simulation results to be −3V.

The final buffer worked quite well when simulated. In Figure 4.4 on the following page, "Channel 3" shows the error (the difference between input and output) in volts; it is drawn at a different scale, and centered at a different zero point, than the input and output channels. The maximum and minimum errors are also given in the "Value" column (−12.9mV to −15.4mV).
4.2.3 Cell
The buffer above was combined with four ideal resistors and a capacitor at its input to form a cell. The cell was then simulated with four resistors and four capacitors on its output to mimic the load that the output of a cell would see. Figure 4.5 on page 41 shows this AWB circuit, whose simulation results were as expected.
4.2.4 Matrix
Spice-3f4 from UCB was used to simulate the complete matrix with ideal resistors and capacitors. The buffers were simulated both as ideal parts and as extracted from the layout, which adds parasitic capacitances and resistances. It was not possible to test the complete layout due to the problems with the extraction process mentioned previously. The input file that was used for the simulations is given in Appendix A on page 68.
Chapter 5
5.1 Layout
Most of the parts had a very straightforward layout. The notable exception was the buffer, which required a fair number of adjustments in order to make some of its large, odd-shaped devices fit in as small a space as possible. All of the layouts were originally designed with both N-wells and P-wells. The CIF extractor was then used to remove one well and fix up the remaining well. The layout had to be altered in certain areas where the wells were joining unintentionally. This was especially true around the buffer, which had odd-shaped wells, and the capacitor, which had to have its own well. Special care was taken so that the routing of each cell was modularized. This ensures that the mesh could be made arbitrarily large without having to change the routing. Minor changes to the layouts of the smaller parts were made to facilitate this modularity.
5.1.1 Buffer
As described in the last chapter, the devices in the buffer had very particular aspect ratios, given in Table 4.2 on page 35. The technology we used had a minimum length of 2λ and a minimum width of 3λ. In order to keep the size of the buffer small but still have good performance, we chose to make the largest ratios as small as possible while making the smaller ratios (1/1) larger than necessary to counteract the effect of error in fabrication. The latter transistors (M1, M2, M3, and M4) are the differential pairs and so they need to be very similar in size. A slight change can make a big difference. With the help of Spice simulations and taking into account the maximum error allowed under the fabrication rules, we arrived at the device sizes given in Table 5.1. [MOS93]
Table 5.1: The aspect ratios (w/l) and sizes used for the buffer.
The original, straightforward layout of the buffer was much too large, so the devices M6 and M7 were "snaked." The transistor M5, which is used to bias the differential pairs, was bent to make the total circuit smaller and to make the connections easier. All of the contacts are made as large as possible. Figure 5.1 on the following page shows the final layout of the analog buffer while Figure 5.2 shows the buffer as it was fabricated.
[Figure 5.1 (labels: Vdd!, Vgg, in, out, Vss).]
resistance that we required (see Section 4.1.1 on page 33). Figures 5.3 and 5.4 show the final layout and physical image of the variable resistor.
[Figure 5.3 (labels: in, out, GND, Vdd, a, c).]
5.1.3 Capacitor
As mentioned in Section 4.1.3 on page 37, the capacitor's size was determined by the space left over. The space must be used to connect the capacitor to V_SS, to give it its own well, and, to shield that well from noise, to provide as complete a guard ring as possible. The actual capacitor is made from two layers of polysilicon, with one layer overlapping the other by 2λ. The connection to the top of the capacitor is made with a grid in order to spread out the charge more evenly. Note that the grid can be connected to the rest of the cell with a second layer of metal. For the same reason the capacitor is connected to V_SS in more than one place. The guard ring is complete except for openings for those connections to the negative power rail. Figure 5.5 shows the complete capacitor, positive terminal, and power rails, while Figure 5.6 is a picture of the capacitor as connected to a cell.
[Figure 5.5 (labels: Vdd, Vss).]
5.1.4 Cell
The four sub-circuits described above are connected together to form the basic cell of the cellular neural network. The variable resistors are stacked four high, with both the shift inputs and the analog inputs on the left and the shift outputs on the right. The analog outputs are on a bus running horizontally in each resistor, and the four variable resistors are in turn connected together on a vertical bus near the center of the cell that connects the resistors to the capacitor and the input of the buffer. The output of the buffer is connected through a switch (the same as the tap used in the variable resistor) to an output bus. As can be seen in Figure 5.7 on the following page, the capacitor and the buffer are on the bottom left and right of the cell. Also, ten clock lines for the shift registers run vertically through the cell, but are routed around the buffer, which is rotated on its side.
5.1.5 Matrix
The matrix is constructed simply by making copies of the cell in a 4 × 4 matrix. The power rails of the shift registers, buffer, and capacitor connect with their horizontal neighbors. The biasing buses for the buffers also connect to their neighbors. The clock lines for the shift registers are connected, every other line together, along the bottom of the matrix. An additional set of four buses (one per column) runs vertically through the matrix in order to connect the output switches in each column together.
chip is given in Figure 5.10 on page 53. Note that the picture is a composite of smaller pictures and is as accurate as possible with the available equipment.
5.2 Measurement
After the chip was fabricated it was tested in two stages. First, individual parts were tested (a buffer and a complete cell). In the second stage the complete matrix was tested with several topologies (sets of weights) and several sets of boundary conditions. Finally, we should note that although the chip was designed for power rails at 8V, it was tested with power rails set to 0V and 5V. This was done for two reasons. The main reason is that the chip consumed more current than was expected at the higher voltages (due to a slight error in calculations). The second reason is that, although an acceptable amount of current was consumed at voltages slightly higher than a 5V swing, the available test equipment makes it difficult to ensure the accuracy of the digital pulses at voltages other than 0V and 5V.
Buffer
The test buffer was measured first since it was the most basic and traditional design. After values for the power rails (V_DD and V_SS) were selected, the value for V_GG was determined and applied. It should be noted that the power rails and V_GG should be applied at exactly the same time. If this is not possible, V_GG should be applied first so that the current allowed to flow between the power rails in the buffer will be a known amount. After the buffer was turned on, a sine wave was applied to the input pin of the buffer. A Hewlett Packard 100MHz digital scope was used to measure both the input (channel 1) and the output (channel 4). The amplitude of the sine wave was then compared to the design data and the Spice simulations to confirm that it had the desired characteristics. The biasing voltage was also adjusted to confirm that it was set to a value which gave the best results (the largest amplitude with the least distortion). Although the buffer only needed to work at DC, it was tested at a range of frequencies to ensure that it functioned properly. One set of output results is given in Figure 5.11.
Cell
The test cell was tested by assuming the cell was in the top left-hand corner of the matrix and setting the boundary values, based on Spice simulations with ideal parts, to what the cell's neighbors should be producing. Proper digital values were then shifted into the cell from a digital test device (a Tektronix LV500) and the output was examined with a digital scope (a Tektronix TDS420). The scope was set to trigger based on an extra output from the LV500. This ensured that the value being examined came from the output pin at a known amount of time after the resistors were set. The actual value used for comparison was an average calculated by the scope itself. This helped to cancel any noise that was on the signal. This is a valid technique since we are actually only concerned with DC values.
5.2.2 Matrix
The complete matrix was then tested by entering several topologies into the LV500 and running each of them against each of several different boundary conditions. (See Appendix F on page 91 for a sample "msa" listing from the LV500.) Again the results were taken with a digital scope and the values noted were averages. These numbers were compared with those calculated with Spice (using ideal parts, including ideal buffers) and with numerical answers for the partial differential equation being solved. The results for the non-equidistant case example in Section 2.3.2 on page 19 are given in Table 5.2.

Cell     Spice    Numerical    Chip    Numerical − Chip
(1,1)    2.73     2.73         2.72     0.01
(1,2)    2.25     2.24         2.25    -0.01
(1,3)    1.77     1.77         1.76     0.01
(1,4)    1.39     1.39         1.38     0.01
(2,1)    2.69     2.67         2.68    -0.01
(2,2)    2.39     2.28         2.29    -0.01
(2,3)    2.08     1.91         1.89     0.02
(2,4)    1.80     1.59         1.56     0.03
(3,1)    2.68     2.68         2.69    -0.01
(3,2)    2.28     2.39         2.39     0.00
(3,3)    1.91     2.08         2.06     0.02
(3,4)    1.59     1.80         1.72     0.08
(4,1)    2.79     2.79         2.80    -0.01
(4,2)    2.56     2.56         2.59    -0.03
(4,3)    2.30     2.30         2.32    -0.02
(4,4)    2.02     2.02         1.98     0.04
Table 5.2: Results from the example. All values in volts.

The chip values given in Table 5.2 were taken from the digital scope. An example screen is given in Figure 5.12 on the following page. Each waveform comes from the output of one row. The large changes in voltage come from changing the output select lines, which select the column being viewed. The widths of those changes are set by programmed delays in the LV500. Delays were selected that made the values easy to read, although the chip could have been run faster with different test equipment. This is because the scope will only display one trigger's worth of information at a time, and a separate trigger would be needed for each column select change to get completely accurate and exact timing results. If the chip were in an embedded system this would be the case, since the rest of the circuitry would be designed after taking the hold time into account.
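The last column of Table 5.2 can be recomputed, and the largest deviation located, with a short Python sketch such as the one below. The data are copied from the table; the function and variable names are illustrative, and rounding to two decimals is an assumption that matches the 0.01V resolution of the tabulated values.

```python
# (cell, Spice, numerical, chip), all in volts, copied from Table 5.2.
TABLE_5_2 = [
    ((1, 1), 2.73, 2.73, 2.72), ((1, 2), 2.25, 2.24, 2.25),
    ((1, 3), 1.77, 1.77, 1.76), ((1, 4), 1.39, 1.39, 1.38),
    ((2, 1), 2.69, 2.67, 2.68), ((2, 2), 2.39, 2.28, 2.29),
    ((2, 3), 2.08, 1.91, 1.89), ((2, 4), 1.80, 1.59, 1.56),
    ((3, 1), 2.68, 2.68, 2.69), ((3, 2), 2.28, 2.39, 2.39),
    ((3, 3), 1.91, 2.08, 2.06), ((3, 4), 1.59, 1.80, 1.72),
    ((4, 1), 2.79, 2.79, 2.80), ((4, 2), 2.56, 2.56, 2.59),
    ((4, 3), 2.30, 2.30, 2.32), ((4, 4), 2.02, 2.02, 1.98),
]


def chip_errors(rows):
    """Return {cell: numerical - chip}, rounded to the table's 0.01 V resolution."""
    return {cell: round(numerical - chip, 2) for cell, _spice, numerical, chip in rows}


errors = chip_errors(TABLE_5_2)
worst_cell = max(errors, key=lambda cell: abs(errors[cell]))
within_10mv = sum(1 for err in errors.values() if abs(err) <= 0.01)

print(f"largest deviation {errors[worst_cell]:+.2f} V at cell {worst_cell}")
print(f"{within_10mv} of {len(errors)} cells within 0.01 V of the numerical solution")
```

A summary of this kind makes it easy to see which cells dominate the error for a given set of weights and boundary conditions.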
Figure 5.12: Output from the matrix of the example in Section 2.3.2. (Label: Average value between lines.)

In order to determine the rise time for the chip a shorter delay was used. Although it makes it very difficult to read accurate solutions to the problem in general, it does make it possible to get more accurate rise and fall times. Essentially this shorter delay allows us to zoom in on the edges of the curves. This plot is shown in Figure 5.13 on the next page. Note that both the time and the voltage scales are different from those in Figure 5.12. The longest rise time for the matrix of the example was for the node in the first column and the fourth row, at 3.22μs. The next edge for that node is a falling edge which is slightly slower than the previous edge, with a fall time of 3.48μs. The fact that the next edge is slower, but of the same order of magnitude, implies that the delay is caused by the load of the output circuitry. All of the nodes should take approximately the same time to stabilize; therefore all of the nodes must have stabilized before the first output was read. If the time it took the internal nodes to stabilize were a major cause of delay, the second and subsequent edges would be noticeably faster, since that delay would already have been accounted for in the first edge. To confirm this it is possible to allow the nodes plenty of time to stabilize and then turn on the output circuitry. (Recall from the discussion of the architecture in Section 3.2 on page 30 that it is possible to select any one column for output or no columns for output.) When this is done the results are essentially the same as those described above. Unfortunately, this makes it impossible to determine the actual internal delay of the nodes themselves.

Figure 5.13: Output from the matrix showing rise and fall times.

Other combinations of voltages and weights were tested. The boundary conditions (in volts) in Figure 5.16 on page 63 were used with each of the sets of weights which will now be given and discussed. The following set is simply the non-variable case: all of the weights are at their maximum. Although ideally the weights could be anything as long as they were all the same, the maximum value is a better choice, because parasitic resistances will have a smaller effect. In addition the buffers will have to provide less current. Figure 5.14 on the next page is the output from the sixteen nodes with the first set of boundary conditions (the same set used in the first scope images).
2 64 64 6 w=6 64 6 4
4
2 3 4 4 4 47 6 64 4 4 47 6 7 n=6 64 4 4 47 7 6 7 4 5 4 4 4 4 3 2 3 4 4 47 4 4 4 47 6 7 e = 64 4 4 47 6 7 4 4 47 7 6 7 64 4 4 47 7 6 7 4 4 47 5 4 5 4 4 4 4 4 4 4 2 3 4 4 4 47 6 64 4 4 47 6 7 s=6 64 4 4 47 7 6 7 4 5
4 4 4 4
(5.1)
This second case is the variable case as described in the example in Section 2.3.2. It is also the one used to produce the plot in Figure 5.12 on page 56.
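\[
w = \begin{bmatrix} 4 & 4 & 4 & 3 \\ 4 & 4 & 3 & 2 \\ 4 & 3 & 2 & 1 \\ 4 & 3 & 2 & 1 \end{bmatrix}, \quad
n = \begin{bmatrix} 4 & 4 & 4 & 4 \\ 4 & 4 & 3 & 3 \\ 4 & 3 & 2 & 2 \\ 4 & 2 & 1 & 1 \end{bmatrix}, \quad
e = \begin{bmatrix} 4 & 4 & 3 & 4 \\ 4 & 3 & 2 & 3 \\ 3 & 2 & 1 & 2 \\ 3 & 2 & 1 & 1 \end{bmatrix}, \quad
s = \begin{bmatrix} 4 & 4 & 3 & 3 \\ 4 & 3 & 2 & 2 \\ 4 & 2 & 1 & 1 \\ 4 & 3 & 2 & 1 \end{bmatrix}
\tag{5.2}
\]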
Figure 5.14: Output from the equidistant case with the first set of boundary conditions.

The next set of weights is designed to model a system where all of the weights are the same for entire columns. Figure 5.15 on the next page is the output from the sixteen nodes with the second set of boundary conditions (see Figure 5.16 on page 63 for the boundary conditions).
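\[
w = \begin{bmatrix} 4 & 2 & 1 & 2 \\ 4 & 2 & 1 & 2 \\ 4 & 2 & 1 & 2 \\ 4 & 2 & 1 & 2 \end{bmatrix}, \quad
n = \begin{bmatrix} 4 & 4 & 4 & 4 \\ 4 & 4 & 4 & 4 \\ 4 & 4 & 4 & 4 \\ 4 & 4 & 4 & 4 \end{bmatrix}, \quad
e = \begin{bmatrix} 2 & 1 & 2 & 4 \\ 2 & 1 & 2 & 4 \\ 2 & 1 & 2 & 4 \\ 2 & 1 & 2 & 4 \end{bmatrix}, \quad
s = \begin{bmatrix} 4 & 4 & 4 & 4 \\ 4 & 4 & 4 & 4 \\ 4 & 4 & 4 & 4 \\ 4 & 4 & 4 & 4 \end{bmatrix}
\tag{5.3}
\]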
Figure 5.15: Output from the "column" weight set and the second set of boundary conditions.

The fourth set of weights has two "islands" with equal weights around them.
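\[
w = \begin{bmatrix} 4 & 4 & 4 & 4 \\ 4 & 2 & 2 & 4 \\ 4 & 4 & 2 & 2 \\ 4 & 4 & 4 & 4 \end{bmatrix}, \quad
n = \begin{bmatrix} 4 & 4 & 4 & 4 \\ 4 & 2 & 4 & 4 \\ 4 & 2 & 2 & 4 \\ 4 & 4 & 2 & 4 \end{bmatrix}, \quad
e = \begin{bmatrix} 4 & 4 & 4 & 4 \\ 2 & 2 & 4 & 4 \\ 4 & 2 & 2 & 4 \\ 4 & 4 & 4 & 4 \end{bmatrix}, \quad
s = \begin{bmatrix} 4 & 2 & 4 & 4 \\ 4 & 2 & 2 & 4 \\ 4 & 4 & 2 & 4 \\ 4 & 4 & 4 & 4 \end{bmatrix}
\tag{5.4}
\]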
The fifth and final set of weights has one odd-shaped island.
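\[
w = \begin{bmatrix} 4 & 4 & 4 & 4 \\ 4 & 2 & 3 & 4 \\ 4 & 4 & 4 & 4 \\ 4 & 4 & 4 & 4 \end{bmatrix}, \quad
n = \begin{bmatrix} 4 & 4 & 4 & 4 \\ 4 & 1 & 4 & 4 \\ 4 & 4 & 4 & 4 \\ 4 & 4 & 4 & 4 \end{bmatrix}, \quad
e = \begin{bmatrix} 4 & 4 & 4 & 4 \\ 2 & 3 & 4 & 4 \\ 4 & 4 & 4 & 4 \\ 4 & 4 & 4 & 4 \end{bmatrix}, \quad
s = \begin{bmatrix} 4 & 1 & 4 & 4 \\ 4 & 4 & 4 & 4 \\ 4 & 4 & 4 & 4 \\ 4 & 4 & 4 & 4 \end{bmatrix}
\tag{5.5}
\]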
Note that in all of the weight sets the weight between any two nodes was the same in both directions. Although this is logical for the application of this network to partial differential equations, it is not a limitation of the architecture and is not necessarily desirable for all applications. [CY88b] [CY88a]

While testing the chip it was clear that the clock cycles could be run as fast as the test equipment could generate them (8ns for each phase). This result is logical since the only devices that each clock phase has to drive are one pass transistor and one inverter. I would predict that with faster test equipment the clock phases could be run as fast as 3ns or 4ns. Although each inverter should be able to switch in approximately 1ns, the clock lines are very long and therefore have a large amount of parasitic capacitance, which will slow the signal down.
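The symmetry noted above, that the weight between any two nodes is the same in both directions, can be checked mechanically. The Python sketch below does so for the second weight set, with values as read from Equation 5.2. It assumes an indexing convention that the text does not state explicitly: entry [i][j] of w, n, e, and s holds the weight from the cell in row i, column j toward its west, north, east, and south neighbor respectively. The function name is illustrative.

```python
# Weight set of Equation 5.2 (0-based row i, column j); indexing is an assumption.
w = [[4, 4, 4, 3], [4, 4, 3, 2], [4, 3, 2, 1], [4, 3, 2, 1]]
n = [[4, 4, 4, 4], [4, 4, 3, 3], [4, 3, 2, 2], [4, 2, 1, 1]]
e = [[4, 4, 3, 4], [4, 3, 2, 3], [3, 2, 1, 2], [3, 2, 1, 1]]
s = [[4, 4, 3, 3], [4, 3, 2, 2], [4, 2, 1, 1], [4, 3, 2, 1]]


def is_symmetric(w, n, e, s):
    """True if every internal connection has the same weight in both directions.

    The east weight of a cell must equal the west weight of its right-hand
    neighbor, and the south weight must equal the north weight of the cell below.
    Weights that face the boundary are not constrained.
    """
    size = len(w)
    horizontal = all(e[i][j] == w[i][j + 1]
                     for i in range(size) for j in range(size - 1))
    vertical = all(s[i][j] == n[i + 1][j]
                   for i in range(size - 1) for j in range(size))
    return horizontal and vertical


symmetric = is_symmetric(w, n, e, s)
```

A check of this kind is only meaningful for applications such as the PDE solver described here; as noted above, other applications may deliberately use asymmetric weights.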
[Figure 5.16: boundary conditions in volts for each voltage set (label: Voltage Set 1; the boundary values shown are 3, 2.2, 1.6, and 1).]
Chapter 6
During the testing it became clear that the chip worked as well as expected. The architecture was able to control all of the weights as designed. The time between clock cycles was also as expected (as rapid as the test equipment could produce). Most of the solutions generated by the test chip were within one hundredth of a volt of what was predicted by both the Spice simulations and the numerical solution of the PDEs. The quality of the results was very similar for twenty different combinations of boundary conditions and weights between the neurons.

One of the disadvantages of the proposed architecture is that it required digital test equipment to test it. This limits the range over which the chip could be tested. It was due mostly to the need to refresh the dynamic shift registers used to set the weights. Approximately three seconds were available to make the changes to the inputs between clocks, but this is a very short time for a human in a test environment. Of course, with an analog test set that allows analog values to be preprogrammed, in the way a digital test set allows digital values to be preprogrammed, this limitation could have been avoided. If the chip were used in an embedded system the interface could also avoid this timing problem.

In addition, the current required during shifts needs to be compensated for, especially if the circuit were to be used at higher voltage levels. Several possibilities exist for solving this problem, including using more power pads and/or more bonding wires to the power pads. Extra bonding wires could easily be added, with the right equipment, after fabrication. Thus, with the right equipment, it would still be possible to test the circuit at higher voltages. Another possibility is to use on-chip capacitors, since it is a switching current.

The architecture worked as well as it had been designed to. It allowed each of the weights to be set individually while using only six pins of the chip package. Although as few as two pins could have been used, having six pins allowed part of the weights' setup to occur in parallel. In addition, the physical layout of the architecture, which allows it to be used with an arbitrarily large network, worked properly without increasing the routing outside each cell. This feature is very important since routing is generally considered a major problem with neural networks.
solutions. The analog I/O problems mentioned above would have to be solved in a physically compact manner for this to be realized. Note that the architecture in this paper would remain useful for other types of neural networks. This particular simplification would not work for networks that were not designed for this particular problem (solving PDEs).
Appendix A
4 3 2 1 11 21 31 41
5 11 21 31
51 61 71 81
7 7 7 7
8 8 8 8
9 9 9 9 9 9 9 9 9 9 9 9
CELL CELL CELL CELL CELL CELL CELL CELL CELL CELL CELL CELL
XC9 90 91 XCA 100 101 XCB 110 111 XCC 120 121 XCD 130 131
5 7 8 9 CELL
XCE 140 141 XCF 150 151 XCG 160 161 V1 1 V2 2 V3 3 V4 4 V5 5 V7 7 V8 8 V9 9 .OP 0 0 0 0 0 0 0 0 0 1.5 3 5 5 -3 8 -8 151 101 131 161 111 141 1 121 151 5 7 8 9 CELL 5 7 8 9 CELL 5 7 8 9 CELL
.SUBCKT Buf 119 105 111 1 108
* in 119
* out 105
* VGG 111
* VDD 1
* VSS 108
** NODE: 0 = GND
** NODE: 1 = Vdd
** NODE: 2 = Error
** SPICE file created for circuit buffer
** Technology: scmos
**
** NODE: 0 = GND
** NODE: 1 = Vdd
** NODE: 2 = Error
RLUMP0 100 101 176.5
RLUMP1 100 102 176.5
M0 101 102 1 1 pfet L=8.0U W=8.0U
RLUMP2 103 104 140.5
RLUMP3 105 106 576.0
RLUMP4 100 107 176.5
RLUMP5 108 109 2168.5
M1 104 106 107 109 nfet L=8.0U W=8.0U
RLUMP6 108 110 2168.5
RLUMP7 111 112 160.5
RLUMP8 103 113 140.5
RLUMP9 108 114 2168.5
M2 110 112 113 114 nfet L=20.0U W=3.0U
RLUMP10 115 116 381.5
RLUMP11 100 117 176.5
M3 116 117 1 1 pfet L=8.0U W=8.0U
RLUMP12 103 118 140.5
RLUMP13 119 120 31.5
RLUMP14 115 121 381.5
RLUMP15 108 122 2168.5
M4 118 120 121 122 nfet L=8.0U W=8.0U
RLUMP16 105 123 576.0
RLUMP17 115 124 381.5
M5 123 124 1 1 pfet L=2.0U W=30.0U
RLUMP18 108 125 2168.5
RLUMP19 111 126 160.5
RLUMP20 105 127 576.0
RLUMP21 108 128 2168.5
M6 125 126 127 128 nfet L=2.0U W=30.0U
** NODE: 0 = GND!
C0 108 0 36F
** NODE: 108 = Vss
** NODE: 119 = in
C1 115 0 42F
** NODE: 115 = 7_34_18#
C2 111 0 13F
** NODE: 111 = Vgg
C3 103 0 42F
** NODE: 103 = 7_114_36#
C4 105 0 90F
** NODE: 105 = out
C5 1 0 78F
** NODE: 1 = Vdd!
C6 100 0 47F
** NODE: 100 = 7_22_40#
*VDD 1 0 8
*GG 111 0 -3
*VSS 108 0 -8
.ENDS
.MODEL nfet NMOS LEVEL=2 PHI=0.600000 TOX=4.1000E-08 XJ=0.200000U TPG=1
+ VTO=0.8630 DELTA=6.6420E+00 LD=2.4780E-07 KP=4.7401E-05
+ UO=562.8 UEXP=1.5270E-01 UCRIT=7.7040E+04 RSH=2.4000E+01
+ GAMMA=0.4374 NSUB=4.0880E+15 NFS=1.980E+11 NEFF=1.0000E+00
+ VMAX=5.8030E+04 LAMBDA=3.1840E-02 CGDO=3.1306E-10
+ CGSO=3.1306E-10 CGBO=4.3449E-10 CJ=9.5711E-05 MJ=0.7817
+ CJSW=5.0429E-10 MJSW=0.346510 PB=0.800000
* Weff = Wdrawn - Delta_W
* The suggested Delta_W is -5.4940E-07
.MODEL pfet PMOS LEVEL=2 PHI=0.600000 TOX=4.1000E-08 XJ=0.200000U TPG=-1
+ VTO=-0.9629 DELTA=5.7540E+00 LD=3.0910E-07 KP=1.7106E-05
+ UO=203.1 UEXP=2.1320E-01 UCRIT=8.0280E+04 RSH=5.6770E+01
+ GAMMA=0.6180 NSUB=8.1610E+15 NFS=3.270E+11 NEFF=1.5000E+00
+ VMAX=9.9990E+05 LAMBDA=4.5120E-02 CGDO=3.9050E-10
+ CGSO=3.9050E-10 CGBO=4.1280E-10 CJ=3.2437E-04 MJ=0.5637
+ CJSW=3.3912E-10 MJSW=0.275876 PB=0.800000
* Weff = Wdrawn - Delta_W
* The suggested Delta_W is -4.1580E-07
.END
Appendix B
[Figures: the 1-bit shift register; labeled signals are clk, clkb, in, out, and GND.]
The following is the esim input file for the 1-bit shift register.
V clk 10
V clkb 01
V in 00001111001100111100110011110000
w in out
G
The output generated by esim from the previous file is given next. Note that the "clock" is two characters wide so everything is repeated.
Funny looking header line in .sim file. using UCB format
6 transistors, 9 nodes (0 pulled up)
>00001111001100111100110011110000:in
>X0000111100110011110011001111000:out
Figure B.3 is the AWB circuit that was used to simulate the amount of signal that would be lost by using an NMOS pass transistor as our tap. It was also used to show that the shift register could drive the tap sufficiently. Figure B.4 is the "oscilloscope" used to show the output of the simulation.
Appendix C
Figure C.1: An 8-bit shift register used to set control lines for the variable resistors.
Figure C.2: Picture of the 8-bit shift register.
The following is the esim input file for the shift register.
V clk 10
V clkb 01
V in 0011110000001111
w in 8 7 6 5 4 3 2 1
G 64
The output generated by esim from the previous file is given next. The label "1" is for the last bit, which is the farthest from the input to the register and also the first bit to be entered. Note that the "clock" is two characters wide so everything is repeated, and that the "X"s are expected until the signal has had a chance to propagate.
Funny looking header line in .sim file. using UCB format
48 transistors, 37 nodes (0 pulled up)
>0011110000001111001111000000111100111100000011110011110000001111:in
>X001111000000111100111100000011110011110000001111001111000000111:8
>XXX0011110000001111001111000000111100111100000011110011110000001:7
>XXXXX00111100000011110011110000001111001111000000111100111100000:6
>XXXXXXX001111000000111100111100000011110011110000001111001111000:5
>XXXXXXXXX0011110000001111001111000000111100111100000011110011110:4
>XXXXXXXXXXX00111100000011110011110000001111001111000000111100111:3
>XXXXXXXXXXXXX001111000000111100111100000011110011110000001111001:2
>XXXXXXXXXXXXXXX0011110000001111001111000000111100111100000011110:1
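The shape of the listing above can be reproduced with a short behavioral model. This is only a sketch in Python, not a switch-level simulation like esim: it assumes that the input vector repeats until the run ends, and that tap k of an n-bit register lags the input by 2(n - k) + 1 characters, a delay read off the listing rather than derived from the circuit. The names `input_trace` and `tap_trace` are hypothetical.

```python
from itertools import cycle, islice


def input_trace(vector: str, steps: int) -> str:
    """Repeat the input vector until `steps` characters have been produced."""
    return "".join(islice(cycle(vector), steps))


def tap_trace(vector: str, steps: int, tap: int, n_taps: int = 8) -> str:
    """Trace seen at `tap`, where tap n_taps is the one nearest the input.

    Each stage adds two characters of delay because the clock is two
    characters wide; the nearest tap already lags the input by one.
    Unknown values before the data arrives are shown as 'X'.
    """
    delay = 2 * (n_taps - tap) + 1
    return ("X" * delay + input_trace(vector, steps))[:steps]


if __name__ == "__main__":
    vector = "0011110000001111"
    print(">" + input_trace(vector, 64) + ":in")
    for tap in range(8, 0, -1):
        print(">" + tap_trace(vector, 64, tap) + ":" + str(tap))
```

Calling `tap_trace` with `n_taps=1` and `tap=1` gives the single-character lag seen for the 1-bit register in Appendix B.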
C.2 Decoder
The 3-to-8 decoder is shown in Figure C.3. Note that this circuit only has four output lines. It is actually a 3-to-8 decoder with half of the output lines removed, but logically it is a 2-to-4 decoder with an enable line. Figure C.4 is an image of the same part after fabrication. The following is the esim input file for our decoder.
w S S0 S1 OUT0 OUT1 OUT2 OUT3
V S 00001111
V S0 01
V S1 0011
G
The output generated by esim from the previous file is given next.
Funny looking header line in .sim file. using UCB format
36 transistors, 23 nodes (0 pulled up)
>00001111:S
>01010101:S0
>00110011:S1
>00001000:OUT0
>00000100:OUT1
>00000010:OUT2
>00000001:OUT3
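The logical behavior described for the decoder, a 2-to-4 decoder with an enable line, can be checked against the esim listing with a few lines of Python. This is a sketch under assumptions taken from the listing itself: S is treated as the enable, S0 as the low select bit, S1 as the high select bit, and the outputs as active high. The function names `decode` and `decoder_traces` are hypothetical.

```python
def decode(s: int, s0: int, s1: int) -> tuple:
    """Return (OUT0, OUT1, OUT2, OUT3) for enable s and select bits s0, s1."""
    selected = s0 + 2 * s1
    return tuple(int(s == 1 and selected == k) for k in range(4))


def decoder_traces(s_vec: str, s0_vec: str, s1_vec: str, steps: int) -> list:
    """Apply repeating input vectors for `steps` steps; one trace per output."""
    traces = ["", "", "", ""]
    for t in range(steps):
        outs = decode(
            int(s_vec[t % len(s_vec)]),
            int(s0_vec[t % len(s0_vec)]),
            int(s1_vec[t % len(s1_vec)]),
        )
        for k in range(4):
            traces[k] += str(outs[k])
    return traces


if __name__ == "__main__":
    for k, trace in enumerate(decoder_traces("00001111", "01", "0011", 8)):
        print(">" + trace + ":OUT" + str(k))
```

With the input vectors from the esim file, every output stays low while S is low, and exactly one output goes high per step once S is high.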
Appendix D
Pin Out
Figure D.1 shows the layout of the bonding pads with their pin numbers, while Table D.1 lists the logical names associated with each pin of the final chip. Note that the table is in the form that is used in the packaging, with both the first and last pin on the top row.
Figure D.1: The layout of the pins on the MOSIS Tiny Chip frame.
Pin  Logic Name             Logic Name                                Pin
1    N. Input at Cell 11    N. Input at Cell 12                       40
2    W. Input at Cell 11    N. Input at Cell 13                       39
3    Output Enable          N. Input at Cell 14                       38
4    Output Select 0        E. Input at Cell 14                       37
5    TB In                  TR Shift In                               36
6    Output Select 1        TR Out                                    35
7    Top Shift In           TR In                                     34
8    Shift Clock Bar        Row 1 Out                                 33
9    W. Input at Cell 21    E. Input at Cell 24 and N. Input at TC    32
10   V_SS                   Row 2 Out                                 31
11   W. Input at Cell 31    V_DD                                      30
12   Bottom Shift In        E. Input at Cell 34 and W. Input at TC    29
13   W. Input at Cell 41    Row 3 Out                                 28
14   Shift Clock            TC N. and S. Shift In                     27
15   TB Output              TC E. and W. Shift In                     26
16   Matrix Clock           Test Cell Out                             25
17   S. Input at Cell 41    V_GG                                      24
18   S. Input at Cell 42    E. Input at Cell 44 and TC                23
19   S. Input at Cell 43    Row 4 Out                                 22
20   Matrix Clock Bar       S. Input at Cell 44 and Test Cell         21
Table D.1: The pin out used for the chip. N., S., E., and W. stand for North or Northern, South or Southern, East or Eastern, and West or Western. TC stands for the test cell, TB for the test buffer, and TR for the test resistor.
Appendix E
MOSIS Parameters
The following is copied from [MOS93], but is only the relevant part of the file.
[SCNA20_ORBIT_SPECS]
ORBIT ELECTRICAL PARAMETERS
2UM, DOUBLE METAL, DOUBLE POLY, N-WELL CMOS
POLY 1 AND POLY 2 ACTIVE GATES
POLY 1 / 2 CAPACITORS
DEPLETION IMPLANT ADJUST FOR BURIED CHANNEL POTENTIAL
A.1 Oxide Thicknesses (Angstroms)
A.1.1 Poly 1 gate oxide
A.1.2 Poly 2 gate oxide
A.1.3 Field oxide (Poly 1 & 2 to Sub)
A.1.4 Metal 1 to Poly 1 & 2
A.1.5 Metal 1 to Sub
A.1.6 Metal 1 to N+/P+ Diff
A.1.7 Metal 2 to Metal 1
A.1.8 Poly 1 to Poly 2
A.2 Conductors
A.2.1 Poly 1
A.2.2 Poly 2
A.2.3 Metal 1
A.2.4 Metal 1 10500 11500 12500
B. TRANSISTOR SPECIFICATIONS
B.1 P Channel Poly 1
B.1.1 Threshold (volts)
B.1.2 Gamma (volts**.5)
B.1.3 K'=uCox/2 (uA/v**2) (Vds=0.1V, Vgs=2-3V)
B.1.4 Punchthrough for min. length channel (volts)
B.1.5 Subthreshold slope (volts**-3/decade)
B.1.6 Delta width = effective-mask (microns)
B.1.7 Delta length = effective-mask (microns)
B.2 P Channel Poly 2
B.2.1 Threshold (volts)
B.2.2 Gamma (volts**.5)
B.2.3 K'=uCox/2 (uA/v**2)
B.2.4 Punchthrough for min. length channel (volts)
B.2.5 Subthreshold slope (volts**-3/decade)
B.2.6 Delta width = effective-mask (microns)
B.2.7 Delta length = effective-mask (microns)
B.3 N Channel Poly 1
B.3.1 Threshold (volts)
B.3.2 Gamma (volts**.5)
B.3.3 K'=uCox/2 (uA/v**2) (Vds=0.1V, Vgs=2-3V)
B.3.4 Subthreshold slope (volts**-3/decade)
B.3.5 Punchthrough for min. length channel (volts)
B.3.6 Delta width = effective-mask (microns)
B.3.7 Delta length = effective-mask (microns)
B.4 N Channel Poly 2
B.4.1 Threshold (volts)
B.4.2 Gamma (volts**.5)
B.4.3 K'=uCox/2 (uA/v**2)
B.4.4 Subthreshold slope (volts**-3/decade)
B.4.5 Punchthrough for min. length channel (volts)
B.4.6 Delta width = effective-drawn (microns)
B.4.7 Delta length = effective-drawn (microns)
B.5 CCD Channel Potential (volts)
B.5.1 Poly 1 VG=0
B.5.2 Poly 2 VG=0
3.0 3.0 5.0 5.0 8.0 8.0 0.7 0.21 18 10 -0.8 1.1 0.3 20 14 -0.4 1.4 0.4 22 16 -0.1 0.5 .15 20 90 10 -0.7 0.75 .25 23 100 14 -0.3 1.0 .35 26 110 16 0 -1.5 0.5 5.0 -16 -1.15 0.6 6.0 -14 -0.8 0.8 7.0 -10 -1.0 .45 6.0 -16 90 -0.7 -0.75 .55 7.5 -14 100 -0.4 -0.5 .65 8.5 -10 110 -0.1 -0.8 -0.5 -0.2
B.6 NPN Transistor in the N-well
B.6.1 Beta = 80 to 200 at IB = 1 uA
B.6.2 BVEBO = 10 V
B.6.3 BVCEO > -10 V
B.6.4 BVCES > 10 V
B.6.5 BVCBO > -60 V
B.6.6 P-base Xj 0.45 to 0.50 micron
B.6.7 N+ emitter Xj 0.3 micron
B.6.8 Rcollector 1.0 +/- 0.2 Kohm/sq
B.6.9 P-base resistance 1.2 +/- 0.2 Kohm/sq
B.6.10 Early Voltage > 30 V
SHEET RESISTANCES (OHMS PER SQUARE) P+ Active N+ Active N Well (with field implant) Poly1 Poly2 Metal1 Metal2
CONTACT RESISTANCE (OHMS) (single contact 2 by 2um)
Metal1 to P+ Active    35     75
Metal1 to N+ Active    20     50
Metal1 to Poly1        20     50
Metal1 to Poly2        20     50
Metal1 to Metal2       0.4    0.7
E. FIELD INVERSION AND BREAKDOWN VOLTAGES (VOLTS)
E.1.1 N Channel Poly1 field inversion
E.1.2 N Channel Poly2 field inversion
E.2 N Channel Metal1 field inversion
E.3 N Channel Metal2 field inversion
E.4.1 P Channel Poly1 field inversion
E.4.2 P Channel Poly2 field inversion
E.5 P Channel Metal1 field inversion
E.6 P Channel Metal2 field inversion
E.7.1 N Diffusion to substrate junction breakdown
E.7.2 P Diffusion to substrate junction breakdown
E.8 N-well to P- sub junction breakdown
10 10 10 14 14 14 -14 -14 -14 14 15 50 -10 -10 -10 16 18 90
GATE OXIDE PLATE POLY1 GATE OXIDE PLATE POLY2 FIELD POLY1 TO SUBS FRINGE FIELD POLY2 TO SUBS FRINGE POLY1 TO POLY2 OVER ACTIVE POLY1 TO POLY2 OVER FIELD METAL1 TO ACTIVE PLATE METAL1 TO ACTIVE FRINGE METAL1 TO SUBS PLATE METAL1 TO POLY PLATE METAL1 TO POLY FRINGE METAL2 TO ACTIVE PLATE METAL2 TO ACTIVE FRINGE METAL2 TO SUBS PLATE METAL2 TO SUBS FRINGE METAL2 TO POLY PLATE METAL2 TO POLY FRINGE METAL2 TO METAL1 PLATE METAL2 TO METAL1 FRINGE
Equiv. Thickness MIN MAX ----370 Ang 430 Ang 470 Ang 530 Ang
13500 Ang 15500 Ang 8000 Ang 9000 Ang 14500 Ang 17500 Ang 19500 Ang 22000 Ang 14500 Ang 17500 Ang 6000 Ang 7500 Ang
[N32A.PRM]
MOSIS PARAMETRIC TEST RESULTS
-----------------------------
RUN: N32A / ALINE     VENDOR: ORBIT
TECHNOLOGY: SCNA      FEATURE SIZE: 2.0um
I. INTRODUCTION. This report contains the lot average results obtained by MOSIS from measurements of the MOSIS test structures placed on this fabrication run. The SPICE parameters obtained from similar measurements on a representative wafer from this run are also attached.
COMMENTS:
II. TRANSISTOR PARAMETERS:   W/L          N-CHANNEL         P-CHANNEL         UNITS
------------------------------------------------------------------------------
Vth (Vds=.05V)               3.0/2.0      0.934             0.996             V
Vth (Vds=.05V)               18.0/2.0     0.839             0.971             V
Idss (Vgds=5V)                            2698.0            -1383.0           uA
Vpt (Id=1.0uA)                            ***************   ***************   V
Vth (Vds=.05V)               50.0/50.0    0.865             0.961             V
Vbkd (Ij=1.0uA)                           15.9              -15.3             V
Gamma (2.5v,5.0v)                         0.174             0.692             V^0.5
                                          26.4              -9.2              uA/V^2
III. FIELD OXIDE                          SOURCE/DRAIN   SOURCE/DRAIN
TRANSISTOR PARAMETERS:       GATE         N + ACTIVE     P + ACTIVE     UNITS
------------------------------------------------------------------------------
Vth (Vbs=0,I=1uA)            Poly         16.1           -13.1          V
Vth (Vbs=0,I=1uA)            Metal1       27.6           -36.3          V
Vth (Vbs=0,I=1uA)            Metal2       49.1           -62.8          V
COMMENTS:
IV. PROCESS                  N        P        N        P        N         METAL    METAL
PARAMETERS:                  POLY     POLY     DIFF     DIFF     WELL      1        2        UNITS
------------------------------------------------------------------------------
Sheet Resistance             22.8     22.0     27.2     61.6     2459.0    0.047    0.028    Ohm/sq
Width Variation              0.225    0.223    0.490    0.317    ---       0.088    0.322    um
(Measured - Drawn)
Contact Resist.              12.55    14.58    30.84    41.65    ---       ---      0.033    Ohms
(Metal1 to Layer)
Gate Oxide Thickness:        ----     ----     423.     409.     ----      ----     ----     Angst.
COMMENTS:
V. CAPACITANCE                        N        P        METAL    METAL
PARAMETERS:                  POLY     DIFF     DIFF     1        2        UNITS
------------------------------------------------------------------------------
Area Cap                     0.058    0.119    0.346    0.026    0.016    fF/um^2
(Layer to subs)
Area Cap                     ---      ---      ---      0.039    0.020    fF/um^2
(Layer to Poly)
Area Cap                     ---      ---      ---      ---      0.034    fF/um^2
(Layer to Metal1)
Fringe Cap                   ---      0.527    0.263    ---      ---      fF/um
(Layer to subs)
COMMENTS:
VI. CIRCUIT PARAMETERS:
------------------------------------------------------------------------------
Vinv,  K = 1      0.00 V
Vinv,  K = 1.5    0.00 V
Vlow,  K = 2.0    0.00 V
Vhigh, K = 2.0    5.00 V
Vinv,  K = 2.0    2.48 V
Gain,  K = 2.0    -11.18
37.37 MHz (31 stages @ 5.0V)
N32A
.MODEL CMOSN NMOS LEVEL=2 PHI=0.600000 TOX=4.1000E-08 XJ=0.200000U TPG=1
+ VTO=0.8630 DELTA=6.6420E+00 LD=2.4780E-07 KP=4.7401E-05
+ UO=562.8 UEXP=1.5270E-01 UCRIT=7.7040E+04 RSH=2.4000E+01
+ GAMMA=0.4374 NSUB=4.0880E+15 NFS=1.980E+11 NEFF=1.0000E+00
+ VMAX=5.8030E+04 LAMBDA=3.1840E-02 CGDO=3.1306E-10
+ CGSO=3.1306E-10 CGBO=4.3449E-10 CJ=9.5711E-05 MJ=0.7817
+ CJSW=5.0429E-10 MJSW=0.346510 PB=0.800000
* Weff = Wdrawn - Delta_W
* The suggested Delta_W is -5.4940E-07
.MODEL CMOSP PMOS LEVEL=2 PHI=0.600000 TOX=4.1000E-08 XJ=0.200000U TPG=-1
+ VTO=-0.9629 DELTA=5.7540E+00 LD=3.0910E-07 KP=1.7106E-05
+ UO=203.1 UEXP=2.1320E-01 UCRIT=8.0280E+04 RSH=5.6770E+01
+ GAMMA=0.6180 NSUB=8.1610E+15 NFS=3.270E+11 NEFF=1.5000E+00
+ VMAX=9.9990E+05 LAMBDA=4.5120E-02 CGDO=3.9050E-10
+ CGSO=3.9050E-10 CGBO=4.1280E-10 CJ=3.2437E-04 MJ=0.5637
+ CJSW=3.3912E-10 MJSW=0.275876 PB=0.800000
* Weff = Wdrawn - Delta_W
* The suggested Delta_W is -4.1580E-07
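As a consistency check on the model cards above, the KP entries can be recomputed from UO and TOX using the usual relation KP = UO * Cox with Cox = eps_ox / TOX. The following Python sketch assumes a relative permittivity of 3.9 for the gate oxide and the vacuum permittivity rounded to 8.854E-12 F/m; neither constant appears in the MOSIS file, and the function name `process_kp` is hypothetical.

```python
EPS0 = 8.854e-12   # vacuum permittivity in F/m (assumed constant)
EPS_OX_REL = 3.9   # relative permittivity of SiO2 (assumed constant)


def process_kp(uo_cm2_per_vs: float, tox_m: float) -> float:
    """Return KP in A/V^2 from mobility UO (cm^2/V-s) and oxide thickness TOX (m)."""
    cox = EPS_OX_REL * EPS0 / tox_m       # gate capacitance per area, F/m^2
    return uo_cm2_per_vs * 1e-4 * cox     # convert cm^2 to m^2 before multiplying


if __name__ == "__main__":
    # UO and TOX are taken from the CMOSN and CMOSP model cards.
    print("NMOS KP = %.4e A/V^2" % process_kp(562.8, 4.1e-8))
    print("PMOS KP = %.4e A/V^2" % process_kp(203.1, 4.1e-8))
```

Both computed values agree with the listed KP entries (4.7401E-05 and 1.7106E-05) to well within 0.1 percent, which suggests that KP in these cards was derived from UO and TOX rather than fitted independently.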
Appendix F
Sample LV500 File
The following is a complete LV500 program used to perform the test described in the variable case example throughout this thesis. It was extracted from the LV500 as an "msa" file. Each set of weights requires its own file, but only one is given here to save space. Note that the chip was shown to run much faster than this file indicates, but when reading the solution with a scope this speed is more practical.
v64 setup version 0 1 1
/* config section */
resolution = 20ns
dev_supply_voltage = 5.00v
dev_supply_current = 0.20a
term_supply_voltage = 3.00v
force_high_family_v1 = 5.00v
force_low_family_v1 = 0.50v
compare_family_v1 = 1.40v
force_high_family_v2 = 4.50v
force_low_family_v2 = 0.50v
compare_family_v2 = 2.50v
sector_logic_family = { v1, v1, v1, v1, , , , }
/* group section */ group "Main_Input" { radix = bin force_fmt = dnrz_l compare_fmt = edge_t phase = 0a signal "Top_Shift_In" { dut = "U_7" sector = 0h0 channel = 0hc } signal "Bottom_Shift_In" { dut = "U_12" sector = 0h1 channel = 0h6 } } group "Output" { radix = bin force_fmt = dnrz_l compare_fmt = edge_t phase = 0c signal "Output_Enable" { dut = "U_3" sector = 0h0 channel = 0h4 } signal "Output_Select_0" { dut = "U_4" sector = 0h0 channel = 0h6 } signal "Output_Select_1" { dut = "U_6" sector = 0h0 channel = 0ha } } group "TRIGGER" { radix = bin force_fmt = r0 compare_fmt = edge_t phase = 0d signal "TRIGGER" { dut = "TRIG"
sector = 0h2 channel = 0hf } } group "Shift_Clock" { radix = bin force_fmt = r0 compare_fmt = edge_t phase = 0a signal "Shift_Clock" { dut = "U_14" sector = 0h1 channel = 0ha } indep_signal "Shift_Clock_Bar" { dut = "U_8" sector = 0h0 channel = 0he phase = 0b force_fmt = r0 compare_fmt = edge_t } } group "Matrix_Clock" { radix = bin force_fmt = r0 compare_fmt = edge_t phase = 0a signal "Matrix_Clock" { dut = "U_16" sector = 0h1 channel = 0he } indep_signal "Matrix_Clock_Bar" { dut = "U_20" sector = 0h2 channel = 0h6 phase = 0b force_fmt = r0 compare_fmt = edge_t } } group "TC_Input" { radix = bin force_fmt = dnrz_l compare_fmt = edge_t phase = 0a
signal "TC_NS_Shift_In" { dut = "U_27" sector = 0h1 channel = 0hb } signal "TC_EW_Shift_In" { dut = "U_26" sector = 0h1 channel = 0hd } } group "TR_Input" { radix = bin force_fmt = dnrz_l compare_fmt = edge_t phase = 0a signal "TR_Shift_In" { dut = "U_36" sector = 0h0 channel = 0h9 } } /* template section */ template "template_0" { cycle = 1000ns phase 0a {delay = 0ns width = phase 0b {delay = 300ns width phase 0c {delay = 0ns width = phase 0d {delay = 800ns width group "Main_Input" { function = force } group "Output" { function = force } group "TRIGGER" { function = force } group "Shift_Clock" { function = force signal "Shift_Clock_Bar" { function = force } } group "Matrix_Clock" { function = force signal "Matrix_Clock_Bar" {
function = force } } group "TC_Input" { function = force } group "TR_Input" { function = force } } template "TRIGGER" { cycle = 1000ns phase 0a {delay = 0ns width = phase 0b {delay = 300ns width phase 0c {delay = 0ns width = phase 0d {delay = 800ns width group "Main_Input" { function = mask } group "Output" { function = mask } group "TRIGGER" { function = force } group "Shift_Clock" { function = mask signal "Shift_Clock_Bar" { function = mask } } group "Matrix_Clock" { function = mask signal "Matrix_Clock_Bar" { function = mask } } group "TC_Input" { function = mask } group "TR_Input" { function = mask } } /* schmoo define section */ schmoo_var_x = not_selected schmoo_var_y = not_selected
/* macro section */ macro shift_0() { * "This macro will shift "template_0" 00 000 0 11 "template_0" 00 000 0 11 "template_0" 00 000 0 11 "template_0" 00 000 0 11 "template_0" 00 000 0 11 "template_0" 00 000 0 11 "template_0" 00 000 0 11 "template_0" 00 000 0 11 } macro shift_1() { * "this macro will shift "template_0" 11 000 0 11 "template_0" 11 000 0 11 "template_0" 11 000 0 11 "template_0" 11 000 0 11 "template_0" 11 000 0 11 "template_0" 11 000 0 11 "template_0" 11 000 0 11 "template_0" 11 000 0 11 } macro col_to_4() { * "set a whole column shift_1 "template_0" 00 000 0 shift_0 "template_0" 00 000 0 shift_0 "template_0" 00 000 0 shift_0 "template_0" 00 000 0 shift_0 "template_0" 00 000 0 }
in 00 00 00 00 00 00 00 00
all 0" 00 0 00 0 00 0 00 0 00 0 00 0 00 0 00 0
in 00 00 00 00 00 00 00 00
all 1" 00 0 00 0 00 0 00 0 00 0 00 0 00 0 00 0
to 4" 00 11 00 0 00 11 00 0 00 11 00 0 00 11 00 0 00 11 00 0
macro mat_clk() { * "Clock the matrix" "template_0" 00 000 0 00 11 00 0 } /* define format info */
define_format { }
/* pattern section */ pattern * "This is a "template_0" "template_0" "template_0" "template_0" "template_0" "template_0" "template_0" "template_0" "mat_clk" "template_0" "template_0" "template_0" "template_0" "template_0" "template_0" "template_0" "template_0" "mat_clk" "template_0" "template_0" "template_0" "template_0" "template_0" "template_0" "template_0" "template_0" "mat_clk" "template_0" "template_0" "template_0" "template_0" "template_0" "template_0" "template_0" "template_0" "mat_clk" "shift_0" "mat_clk" * "start col "template_0" "template_0" "template_0" "template_0" "template_0" "template_0" "template_0"
tesst of 00 000 0 00 000 0 00 000 0 00 000 0 00 000 0 10 000 0 00 000 0 10 000 0 00 10 00 10 10 00 10 00 10 00 10 00 00 01 00 01 01 01 01 01 01 00 01 00 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3" 00 00 00 00 00 00 10
0 0 0 0 0 0 0
11 11 11 11 11 11 11
00 00 00 00 00 00 00
00 00 00 00 00 00 00
0 0 0 0 0 0 0
"template_0" "mat_clk" "template_0" "template_0" "template_0" "template_0" "template_0" "template_0" "template_0" "template_0" "mat_clk" "template_0" "template_0" "template_0" "template_0" "template_0" "template_0" "template_0" "template_0" "mat_clk" "template_0" "template_0" "template_0" "template_0" "template_0" "template_0" "template_0" "template_0" "mat_clk" "shift_0" "mat_clk" * "start col "template_0" "template_0" "template_0" "template_0" "template_0" "template_0" "template_0" "template_0" "mat_clk" "template_0" "template_0" "template_0" "template_0" "template_0" "template_0" "template_0" "template_0" "mat_clk" 10 000 0 11 00 00 0 00 00 10 10 10 10 00 00 11 10 01 00 00 00 01 01 00 01 00 01 01 01 00 00 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2" 00 00 10 10 10 10 10 10 11 10 01 00 00 00 01 01
000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
"template_0" 00 000 0 "template_0" 01 000 0 "template_0" 00 000 0 "template_0" 01 000 0 "template_0" 01 000 0 "template_0" 01 000 0 "template_0" 00 000 0 "template_0" 00 000 0 "mat_clk" "shift_0" "mat_clk" "shift_0" "mat_clk" * "start col 1" "template_0" 11 000 0 "template_0" 10 000 0 "template_0" 11 000 0 "template_0" 11 000 0 "template_0" 11 000 0 "template_0" 10 000 0 "template_0" 11 000 0 "template_0" 11 000 0 "mat_clk" "template_0" 00 000 0 "template_0" 01 000 0 "template_0" 00 000 0 "template_0" 00 000 0 "template_0" 00 000 0 "template_0" 01 000 0 "template_0" 00 000 0 "template_0" 00 000 0 "mat_clk" "shift_0" "mat_clk" "shift_0" "mat_clk" "shift_0" "mat_clk" WAIT "100000" "TRIGGER" 00 000 1 00 "template_0" 00 100 0 WAIT "400000" "template_0" 00 110 0 WAIT "400000" "template_0" 00 101 0 WAIT "400000" "template_0" 00 111 0 WAIT "400000" "template_0" 00 000 0 * "END OF PROGRAM" 11 11 11 11 11 11 11 11 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0 0 0 0 0 0 0 0
11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
00 00 0 00 00 00 0 00 00 00 0 00 00 00 0 00 00 00 0 00 00 00 0
Bibliography
[AH87a] Phillip E. Allen and Douglas R. Holberg. CMOS Analog Circuit Design, section 5.2. In Electrical and Computer Engineering [AH87c], 1987.
[AH87b] Phillip E. Allen and Douglas R. Holberg. CMOS Analog Circuit Design, section 8.3. In Electrical and Computer Engineering [AH87c], 1987.
[AH87c] Phillip E. Allen and Douglas R. Holberg. CMOS Analog Circuit Design. Electrical and Computer Engineering. Holt, Rinehart and Winston, Inc., New York, 1987.
[CY88a] L. O. Chua and L. Yang. Cellular neural networks: Applications. IEEE Transactions on Circuits and Systems, 35(10):1273-1290, October 1988.
[CY88b] L. O. Chua and L. Yang. Cellular neural networks: Theory. IEEE Transactions on Circuits and Systems, 35(10):1257-1272, October 1988.
[GAS90a] Randall L. Geiger, Phillip E. Allen, and Noel R. Strader. VLSI Design Techniques for Analog and Digital Circuits, section 5.2. In Electrical Engineering [GAS90c], 1990.
[GAS90b] Randall L. Geiger, Phillip E. Allen, and Noel R. Strader. VLSI Design Techniques for Analog and Digital Circuits, page 213. In Electrical Engineering [GAS90c], 1990.
[GAS90c] Randall L. Geiger, Phillip E. Allen, and Noel R. Strader. VLSI Design Techniques for Analog and Digital Circuits. Electrical Engineering. McGraw-Hill Publishing Company, New York, 1990.
[GZ93] D. Gobovic and M. E. Zaghloul. Design of locally connected CMOS neural cells to solve the steady-state heat flow problem. In Proceedings of the IEEE 36th Midwest Symposium on Circuits and Systems, Detroit, August 1993. The Institute of Electrical and Electronics Engineers, Inc.
[GZ94] D. Gobovic and M. E. Zaghloul. Analog cellular neural network with application to partial differential equations with variable mesh-size. In Proceedings of the IEEE International Symposium on Circuits and Systems, London, May 1994. The Institute of Electrical and Electronics Engineers, Inc.
[HB84] Kai Hwang and Faye A. Briggs. Computer Architecture and Parallel Processing. McGraw-Hill Publishing Company, New York, 1984.
[Ism94] M. Ismail. Analog VLSI: Signal and Information Processing, chapter 16. McGraw-Hill, Inc., New York, 1994.
[Mit80] Andrew Ronald Mitchell. The Finite Difference Method in Partial Differential Equations, chapter 3. John Wiley & Sons Ltd., 1980.
[MOS88] The Information Science Institute of the University of Southern California (USC/ISI) in Marina del Rey, California. MOSIS Users Manual, 1988.
[MOS93] Orbit electrical parameters. FTPed from ftp.mosis.edu in the file named scna20-orbittech.inf, which can be found in the directory /pub/mosis/vendors/orbit-scna20, 22 July 1993.
[MP43] W. McCulloch and W. Pitts. A logical calculus of the ideas immanent in nervous activity. Bulletin Mathematical Biophysics, 35(5):115-133, 1943.
[OR70a] J. M. Ortega and W. C. Rheinboldt. Iterative Solution of Nonlinear Equations in Several Variables, section 4.4. In Computer Science and Applied Mathematics [OR70b], 1970.
[OR70b] J. M. Ortega and W. C. Rheinboldt. Iterative Solution of Nonlinear Equations in Several Variables. Computer Science and Applied Mathematics. Academic Press, New York, 1970.
[OR70c] J. M. Ortega and W. C. Rheinboldt. Iterative Solution of Nonlinear Equations in Several Variables, section 5.4. In Computer Science and Applied Mathematics [OR70b], 1970.
[Ric83] John R. Rice. Numerical Methods, Software and Analysis: IMSL Reference Edition, section 10.1. McGraw-Hill Book Company, New York, 1983.
[TH86] D. W. Tank and J. J. Hopfield. Simple `neural' optimization networks: An A/D converter, signal decision circuit, and a linear programming circuit. IEEE Transactions on Circuits and Systems, CAS-33(5):533-541, May 1986.
[Zur92] J. M. Zurada. Introduction to Artificial Neural Systems. West Publishing Company, 1992.