A Digital Simulation Scheme For Real-Time Systems

By

Stephen Lesavich

A Thesis Submitted in Partial Fulfillment of the Requirements for the Degree of

Master of Science

in

Computer Science

at

The University of Wisconsin-Milwaukee

November 1986

Copyright © 1986 by Stephen Lesavich.

All rights reserved.

A Digital Simulation Scheme For Real-Time Systems

By

Stephen Lesavich

The University of Wisconsin-Milwaukee, 1986

Under the Supervision of Professor Leonard P. Levine

Abstract

The quantization errors that affect the practical implementation of a real-time digital system using fixed-point arithmetic are examined. A digital simulation scheme, written in the C programming language and suitable for use on a microcomputer, is presented to help analyze these quantization errors. The simulation scheme is an interactive program that includes a graphics module, which allows the user to change the characteristics of system components and see their effect on overall system performance. The simulation is used to analyze both a digital filtering system and a computer-controlled system. The results obtained from the digital simulation are compared to results obtained experimentally using actual real-time systems.

Table of Contents

Title Page
Approval Page
Abstract
Acknowledgements
Table of Contents
List of Figures

Chapter 1
1.0 Introduction To Digital Signal Processing
1.1 Digital Filtering Systems
1.2 Digital Filter Definitions
1.3 Conventional Digital Filter Realization Forms
1.4 State Space Representation of Digital Filters
1.5 Digital Filter Design Methods
1.6 Summary of Chapter 1

Chapter 2
2.0 Quantization Effects in Digital Filters
2.1 Signal Conversion Quantization
2.2 ADC Quantization Effects
2.3 DAC Quantization Effects
2.4 Coefficient Quantization
2.5 Arithmetic Quantization
2.6 Realization Form Quantization Errors
2.7 Summary of Chapter 2

Chapter 3
3.0 Historical Review of the Literature
3.1 Early Research 1965-1970
3.2 Papers On Coefficient Quantization Effects
3.3 Papers On Arithmetic Quantization Effects
3.4 Quantization Effects in State Space Representation
3.5 Quantization Effects in Other Digital Filter Representations
3.6 Survey Papers
3.7 Summary of Chapter 3

Chapter 4
4.0 Digital Simulation of Real-Time Systems
4.1 Justification for Developing a New Digital Simulation Scheme
4.2 Microcomputer Simulation Scheme
4.3 Programming Implementation
4.4 Simulation Characteristics
4.5 Digital Filtering Simulation
4.6 Closed-Loop Control Simulation
4.7 Graphics Routines
4.8 Summary of Chapter 4

Chapter 5
5.0 Digital Simulation Examples
5.1 Digital Filter T = 0.010 sec
5.2 Digital Filter Coefficient Quantization T = 0.010 sec
5.3 Digital Filter Interface Quantization T = 0.010 sec
5.4 Digital Filter T = 0.005 sec
5.5 Digital Filter Coefficient Quantization T = 0.005 sec
5.6 Digital Filter Interface Quantization T = 0.005 sec
5.7 Closed-Loop Control
5.8 Closed-Loop Control Coefficient Quantization
5.9 Closed-Loop Control Interface Quantization
5.10 Arithmetic Quantization
5.11 Summary of Chapter 5

Chapter 6
6.0 Real-Time System Verification
6.1 Real-Time System Hardware Configuration
6.2 Real-Time System Software Configuration
6.3 Real-Time Software Scheme
6.4 PDP-11 Assembly Interface Routines
6.5 Whitesmiths C Algorithm Implementation
6.6 Verification of Digital Filter Simulation Results T = 0.010
6.7 Verification of Digital Filter Simulation Results T = 0.005
6.8 Verification of Closed-Loop Control Simulation Results
6.9 Summary of Chapter 6

Chapter 7
7.0 Summary of Chapters 1-6
7.1 Suggestions for Improvement of the Simulation Scheme

Appendix
A Digital Simulation Program Source Code
B Digital Simulation Input/Output Examples
C Real-Time System Program Source Code

List of References

List of Figures

Chapter 2
2.1 ADC Quantization Example

Chapter 4
4.1 Block Diagram of Simulation Digital Filtering System
4.2 Block Diagram of Simulation Closed-Loop Control System
4.3 Typical Simulation Digital Filter Graphics Screen Dump

Chapter 5
5.1 Block Diagram of Simulation Digital Filtering System
5.2 Plot of Magnitude vs. Frequency for 4th Order Analog Filter
5.3 Plot of Phase Angle vs. Frequency for 4th Order Analog Filter
5.4 Plot of Magnitude vs. Frequency for Digital Filter T = 0.010 sec
5.5 Plot of Phase Angle vs. Frequency for Digital Filter T = 0.010 sec
5.6 Plot of Digital Filter T = 0.010 sec, Floating-point Coefficient Storage
5.7 Plot of 6-bit Representation of Digital Filter T = 0.010 sec
5.8 Plot of 8-bit Representation of Digital Filter T = 0.010 sec
5.9 Plot of 12-bit Representation of Digital Filter T = 0.010 sec
5.10 Plot of 16-bit Representation of Digital Filter T = 0.010 sec
5.11 Plot of Coefficient Quantization RMS Error Digital Filter T = 0.010 sec
5.12 Plot of ADC Quantization RMS Error Digital Filter T = 0.010 sec
5.13 Plot of DAC Quantization RMS Error Digital Filter T = 0.010 sec
5.14 Plot of Magnitude vs. Frequency for Digital Filter T = 0.005 sec
5.15 Plot of Phase Angle vs. Frequency for Digital Filter T = 0.005 sec
5.16 Plot of Digital Filter T = 0.005 sec, Floating-point Coefficient Storage
5.17 Plot of 10-bit Representation of Digital Filter T = 0.005 sec
5.18 Plot of 12-bit Representation of Digital Filter T = 0.005 sec
5.19 Plot of 14-bit Representation of Digital Filter T = 0.005 sec
5.20 Plot of 16-bit Representation of Digital Filter T = 0.005 sec
5.21 Plot of Coefficient Quantization RMS Error Digital Filter T = 0.005 sec
5.22 Plot of ADC Quantization RMS Error Digital Filter T = 0.005 sec
5.23 Plot of DAC Quantization RMS Error Digital Filter T = 0.005 sec
5.24 Block Diagram of Simulation Closed-Loop Control System
5.25 Plot of Closed-Loop Control System T = 0.1 sec

Chapter 6
6.1 Real-Time Software Scheme Flow Chart
6.2 Real-Time System Output of Digital Filter T = 0.010 sec
6.3 Real-Time System Output of 6-bit Representation of the Digital Filter T = 0.010 sec
6.4 Real-Time System Output of 8-bit Representation of the Digital Filter T = 0.010 sec
6.5 Real-Time System Output of 12-bit Representation of the Digital Filter T = 0.010 sec
6.6 Real-Time System Output of 16-bit Representation of the Digital Filter T = 0.010 sec
6.7 Real-Time System Output of 6-16 bit Representations of the Digital Filter T = 0.010 sec
6.8 Real-Time System Output of Digital Filter T = 0.005 sec
6.9 Real-Time System Output of 10-bit Representation of the Digital Filter T = 0.005 sec
6.10 Real-Time System Output of 12-bit Representation of the Digital Filter T = 0.005 sec
6.11 Real-Time System Output of 14-bit Representation of the Digital Filter T = 0.005 sec
6.12 Real-Time System Output of 16-bit Representation of the Digital Filter T = 0.005 sec
6.13 Real-Time System Output of Digital Controller T = 0.1 sec
6.14 Real-Time System Output of 6-bit Representation of the Digital Controller T = 0.1 sec
6.15 Real-Time System Output of 8-bit Representation of the Digital Controller T = 0.1 sec
6.16 Real-Time System Output of 12-bit Representation of the Digital Controller T = 0.1 sec
6.17 Real-Time System Output of 16-bit Representation of the Digital Controller T = 0.1 sec

Appendix B
B.1 Input Sequence for the Digital Filtering System Simulation
B.2 Simulation Graphics Screen Dump of Digital Filtering System
B.3 Input Sequence for the Closed-Loop Control Simulation
B.4 Simulation Graphics Screen Dump of Closed-Loop Control System

1.0 Introduction to Digital Signal Processing

Signal processing has become an important tool in many diverse fields of science and engineering. A signal can be defined as a "function that conveys information about the state of a physical system" [66]. Since most signals must be processed to extract useful information, a large number of signal processing techniques have been developed. These techniques usually take the form of a transformation. A signal is transformed into another more desirable signal suitable for a given application. For example, a signal may contain disturbances (noise) mixed in with the useful information. It then becomes necessary to process the signal to block (reject) the noise. The system or device that accomplishes this is called a filter.

In control systems, as another example of signal transformation, it often becomes necessary to modify the dynamics of a given plant by closed-loop control. The system or device that performs the desired changes is called a controller or compensator.

In the past most of the signal transformations done by filters and controllers were accomplished by analog hardware. However, the increased speed and decreased cost of modern microprocessors have permitted the digital computer to be used to implement signal processing algorithms in software in real time. [82]

The software implementation of signal processing algorithms offers several advantages [79]. First, it offers flexibility, because the characteristics of the transform algorithm can easily be changed by reading in a new set of parameter values. Second, a software implementation results in a more reliable system, because there are no physical components to age or deteriorate. Finally, a greater degree of accuracy can be obtained by programming a digital signal processing algorithm, since the parameters are controlled by the designer, who can adjust them to obtain a system that best meets the problem specification without redesigning the hardware.

The software implementation of digital signal processing algorithms does have limitations that have to be considered. Inaccuracies and errors are introduced when a digital filter or digital controller is implemented on a digital computer with finite wordlength and finite arithmetic elements. These errors are referred to as quantization errors and will be considered in depth in this paper.

1.1 Digital Filtering Systems

Signals are represented as functions of one or more independent variables. In most cases, signals are functions of an independent variable time, or t. If the independent variable t is allowed to take on a continuum of values, the signal is said to be a continuous-time signal. A continuous (or continuous-time) system is a system whose inputs and outputs are continuous (analog) signals. On the other hand, discrete-time (digital) signals are generated by uniformly sampling a continuous-time signal every T seconds. The constant T represents the sampling period. Discrete-time signals are therefore represented as sequences of numbers, and these signals are defined only at discrete instants of time. If the sampling frequency, 1/T, is high enough according to the Sampling Theorem (more than twice the highest frequency present in the signal), the continuous-time signal can be reconstructed from the discrete samples with little (or no) loss of information. A discrete (or discrete-time) system, then, is a system whose inputs and outputs are discrete signals.

A digital filter is a discrete system that operates on a discrete input signal to produce a discrete output signal according to a specified algorithm [82]. With analog-to-digital (ADC) and digital-to-analog (DAC) converter interfaces around a digital computer, a digital filter can also be used to process analog signals. A continuous-time signal is uniformly sampled every T seconds with an ADC and each voltage amplitude value is converted to a binary number. This digital (discrete-time) signal can now be processed by the signal processing algorithm programmed on the digital computer to achieve the desired results. At the output of the computer, the digital signal is converted back to analog form by a DAC. The numerical content of a computer register is converted to an analog voltage which the DAC holds constant until the content of the register is updated. Most real-time digital filter/control systems are based on the configuration just described.

1.2 Digital Filter Definitions

A digital filter can be represented by an Nth order difference equation as

$$y_k = \sum_{i=0}^{n} a_i x_{k-i} - \sum_{i=1}^{n} b_i y_{k-i}. \tag{1.1}$$

Thus a digital filter is merely a linear combination of equally spaced input samples $x_{k-i}$ of some signal $x(t)$ together with the computed values of the output $y_{k-i}$ [26]. The current output $y_k$ is said to be a linear combination of the input elements $x_k, x_{k-1}, x_{k-2}, \ldots, x_{k-n}$ if and only if

$$y_k = a_0 x_k + a_1 x_{k-1} + a_2 x_{k-2} + \cdots + a_n x_{k-n}, \tag{1.2}$$

where $a_0, a_1, a_2, \ldots$ are constants independent of the input elements $x_{k-i}$, and at least one of the $a_i \neq 0$ [10]. The filter coefficients $a_i$ and $b_i$ of Eq. (1.1) are assumed to be constants and unvarying with time. Such filters are called linear time-invariant constant coefficient filters.

If at least one $b_i$ and one $a_i$ coefficient in Eq. (1.1) are non-zero, the digital filter is said to be recursive; that is, the present value of the output depends on the present and previous values of the input as well as on the previous values of the output. If all the $b_i$ coefficients of $y_{k-i}$ are zero, the filter is called nonrecursive. The output of a nonrecursive filter depends only on the present and past values of the input.
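As an illustration of Eq. (1.1), the following is a minimal C sketch of one filter step. It is not taken from the simulation program in Appendix A; the function name `filter_step` and the history-array layout are assumptions made for this example only. Setting every `b[i]` to zero turns the same routine into a nonrecursive filter.

```c
#include <stddef.h>

/* One step of the Nth order difference equation of Eq. (1.1):
 *   y[k] = sum_{i=0..n} a[i]*x[k-i]  -  sum_{i=1..n} b[i]*y[k-i]
 *
 * Hypothetical layout: a, b, xhist and yhist each have n+1 elements.
 * Between calls, xhist[0] and yhist[0] hold the most recent input and
 * output, xhist[1] and yhist[1] the ones before that, and so on.
 * b[0] is never used. */
double filter_step(const double *a, const double *b, size_t n,
                   double *xhist, double *yhist, double x_new)
{
    size_t i;
    double y;

    /* Age the histories by one sample so index i means "i samples ago". */
    for (i = n; i > 0; --i) {
        xhist[i] = xhist[i - 1];
        yhist[i] = yhist[i - 1];
    }
    xhist[0] = x_new;

    /* Feed-forward terms minus feedback terms. */
    y = a[0] * xhist[0];
    for (i = 1; i <= n; ++i)
        y += a[i] * xhist[i] - b[i] * yhist[i];

    yhist[0] = y;
    return y;
}
```

With `a = {0.5, 0}` and `b = {0, -0.5}` the routine realizes $y_k = 0.5 x_k + 0.5 y_{k-1}$, whose response to a unit step begins 0.5, 0.75, 0.875.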

A digital filter whose impulse response has an infinite number of samples is known as an Infinite Impulse Response (IIR) filter. A digital filter whose impulse response is of finite duration is called a Finite Impulse Response (FIR) filter. IIR filters are generally implemented by a recursive realization like Eq. (1.1). FIR filters are usually implemented by a nonrecursive structure and can be represented by

$$y_k = \sum_{i=0}^{n} a_i x_{k-i}. \tag{1.3}$$

FIR filters operate on a predetermined number of input samples, considered to be a data window.

The frequency response of a digital filter refers to the steady state response due to a sinusoidal input of angular frequency $\omega$. The angular frequency $\omega$ is related to the rotational frequency $f$ by $\omega = 2\pi f$. The frequency response gives the transmission of the digital filter for every value of $\omega$. The frequency response is completely described by the magnitude (input-to-output) ratio and the phase angle. The bandwidth of a digital filter is defined as the frequency range from $\omega = 0$ to the point at which the magnitude response is 0.707 of, or 3 dB below, its zero frequency or steady-state value. It is also called the passband because it describes the range of frequencies of the input signal over which a system performs satisfactorily. The cutoff frequency $\omega_c$ is defined as the frequency at which the magnitude ratio is 3 dB below its zero frequency value. Thus, $\omega_c$ specifies the bandwidth of the system.

A low-pass filter is a system which passes sinusoidal inputs whose radian frequencies fall in the range $0 \le \omega \le \omega_c$, and rejects sinusoidal inputs whose frequencies are greater than the cutoff frequency $\omega_c$.

A high-pass filter is a system which rejects sinusoidal inputs whose frequencies are smaller than $\omega_c$, and passes sinusoidal inputs whose frequencies are greater than $\omega_c$.

A band-pass filter is a system which passes sinusoidal inputs of frequencies in the range $\omega_1 \le \omega \le \omega_2$, and rejects sinusoidal inputs whose frequencies lie outside this range.

A band-reject filter is a system which rejects sinusoidal inputs whose frequencies lie in the range $\omega_1 \le \omega \le \omega_2$, and passes all sinusoidal inputs whose frequencies fall outside this range.

The z transform will be used to represent and manipulate the discrete sequence $f(k)$. Given a discrete sequence $f(k)$, the z transform of $f(k)$ is defined as

$$F(z) = \sum_{k=0}^{\infty} f(k) z^{-k}, \tag{1.4}$$

where $z = e^{sT}$ and $T$ is the sampling time of the system. Expanding the summation form of $F(z)$, we can also write the z transform of $f(k)$ as

$$F(z) = f(0) + f(1) z^{-1} + f(2) z^{-2} + \cdots + f(n) z^{-n} + \cdots. \tag{1.5}$$

If $x_k$ is the input to a discrete digital filter with impulse response $g_k$, and $y_k$ is the output, then the input and the output are related by

$$Y(z) = X(z) G(z), \tag{1.6}$$

where $X(z)$, $G(z)$ and $Y(z)$ are the z transforms of $x(k)$, $g(k)$ and $y(k)$ respectively. It can easily be shown that

$$G(z) = \frac{Y(z)}{X(z)}. \tag{1.7}$$

$G(z)$ is called the system transfer function or the pulse transfer function. Given an Nth order difference equation of the form of Eq. (1.1), the corresponding discrete transfer function can easily be obtained by z transforming both sides of the difference equation and obtaining

$$G(z) = \frac{Y(z)}{X(z)} = \frac{a_0 + a_1 z^{-1} + \cdots + a_n z^{-n}}{1 + b_1 z^{-1} + \cdots + b_n z^{-n}}, \tag{1.8}$$

in factored form as

$$G(z) = \frac{Y(z)}{X(z)} = \frac{a_0 (z - z_1)(z - z_2) \cdots (z - z_m)}{(z - p_1)(z - p_2) \cdots (z - p_n)}, \tag{1.9}$$

or in a closed form expression

$$G(z) = \frac{Y(z)}{X(z)} = \frac{\sum_{i=0}^{n} a_i z^{-i}}{1 + \sum_{i=1}^{n} b_i z^{-i}}. \tag{1.10}$$

A typical way of displaying the information about a digital filter that the z transform contains is in terms of the poles and zeros of the transfer function $G(z)$. The roots of the numerator polynomial are the values of z for which $G(z) = 0$, and are referred to as the zeros of $G(z)$. Values of z for which $G(z)$ is infinite are referred to as the poles of $G(z)$. The poles of $G(z)$ for finite values of z are the roots of the denominator polynomial [66]. The zeros and poles of Eq. (1.9) are represented by $z_1, z_2, \ldots, z_m$ and $p_1, p_2, \ldots, p_n$ respectively.

A digital system is said to be stable if every bounded input produces a bounded output. If g(k) is the response of a discrete system to a digital impulse (impulse response), then the discrete system is stable if and only if

$$\sum_{k=0}^{\infty} |g(k)| < \infty. \tag{1.11}$$

The stability of a digital system, as well as its performance characteristics, can be determined from the location of its poles in the complex z plane. A linear discrete system is stable if and only if all of its poles are inside the unit circle on the z plane. A discrete system is unstable if the transfer function $G(z)$ has poles outside the unit circle. A critically-stable or oscillatory system has simple poles on the unit circle, but repeated poles on the unit circle constitute an unstable system.

1.3 Conventional Digital Filter Realization Forms

There are many ways to realize a digital filter, but there are three basic forms that are commonly used. If the digital filter is derived from the direct relation to the z transform, the filter is said to be in the direct form. The direct form of Eq. (1.10) uses separate direct paths for both the numerator and denominator terms. If the transfer function of Eq. (1.10) is written as a partial fraction expansion of first and second-order terms,

$$G(z) = \sum_{i=1}^{K} G_i(z), \tag{1.12}$$

the entire filter may be visualized as a parallel connection of the simpler filters $G_i(z)$. In this case the filter is said to be realized in the parallel form. If $G(z)$ is written as the product of first and second order factors,

$$G(z) = \prod_{i=1}^{K} G_i(z), \tag{1.13}$$

then the filter can be visualized as a cascade of simpler filters, and the original filter is said to be in the cascade form. The practical implementation of a digital filter is very dependent on the precise digital filter structure. The advantages and disadvantages of these realization forms will be discussed in detail later in this paper.
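The structural difference between Eqs. (1.12) and (1.13) can be sketched in C as follows. The type `section_t` and the three function names are hypothetical and chosen for this illustration; each section is a second-order instance of Eq. (1.1), and a first-order section is obtained by setting `a2` and `b2` to zero.

```c
#include <stddef.h>

/* One section G_i(z):
 *   y[k] = a0*x[k] + a1*x[k-1] + a2*x[k-2] - b1*y[k-1] - b2*y[k-2] */
typedef struct {
    double a0, a1, a2;   /* numerator coefficients   */
    double b1, b2;       /* denominator coefficients */
    double x1, x2;       /* stored x[k-1], x[k-2]    */
    double y1, y2;       /* stored y[k-1], y[k-2]    */
} section_t;

static double section_step(section_t *s, double x)
{
    double y = s->a0 * x + s->a1 * s->x1 + s->a2 * s->x2
             - s->b1 * s->y1 - s->b2 * s->y2;

    s->x2 = s->x1;  s->x1 = x;   /* age the stored inputs  */
    s->y2 = s->y1;  s->y1 = y;   /* age the stored outputs */
    return y;
}

/* Cascade form, Eq. (1.13): each section feeds the next one. */
double cascade_step(section_t *sections, size_t count, double x)
{
    size_t i;
    for (i = 0; i < count; ++i)
        x = section_step(&sections[i], x);
    return x;
}

/* Parallel form, Eq. (1.12): every section sees the same input
 * and the section outputs are summed. */
double parallel_step(section_t *sections, size_t count, double x)
{
    double y = 0.0;
    size_t i;
    for (i = 0; i < count; ++i)
        y += section_step(&sections[i], x);
    return y;
}
```

Because every section keeps its own stored samples, each section's coefficients can be quantized and examined separately, which is one reason the choice of structure matters in practice.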

1.4 State Space Representations of Digital Filters

A digital filter can also be described with state space techniques. The state space or state variable method allows an Nth order system to be represented by a set of N simultaneous, first-order differential or difference equations given in matrix form. In the conventional representations of a digital filter described above, only the relationships between the input and output signals are known. In contrast, the state-space representation gives a total description of the internal as well as the external variables of the system. The state space method can also be used to describe multiple input and multiple output systems. The dynamic equations of a state space representation are

$$\mathbf{x}(k+1) = A\,\mathbf{x}(k) + B\,\mathbf{u}(k) \tag{1.14}$$

$$\mathbf{y}(k) = C\,\mathbf{x}(k) + D\,\mathbf{u}(k) \tag{1.15}$$

For an Nth order system with p inputs and q outputs, $\mathbf{x}(k)$ is the vector of n state variables, $\mathbf{u}(k)$ is a vector of p inputs, and A and B are coefficient matrices, referred to as the (n×n) system matrix and the (n×p) driving or input matrix respectively. In the second dynamic equation, $\mathbf{y}(k)$ is the vector of q outputs, C denotes the (q×n) output matrix, and D the (q×p) transmission matrix [82]. State space representations are synonymous with "modern" systems theory, and their matrix form makes this representation ideal for obtaining numerical solutions for systems on a digital computer. Consequently, state space techniques have recently become a popular way to represent digital filters and control systems.
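A minimal C sketch of one update of the dynamic equations (1.14) and (1.15) is given below, restricted to a single input and a single output (p = q = 1) to keep it short. The function name `state_space_step`, the bound `MAX_STATES`, and the row-major storage of A are assumptions made for this example.

```c
#include <stddef.h>

#define MAX_STATES 8   /* illustrative upper bound on the order n */

/* One step of Eqs. (1.14)-(1.15) for a single-input, single-output system:
 *   y(k)   = C x(k) + D u(k)
 *   x(k+1) = A x(k) + B u(k)
 * A is n-by-n in row-major order; B, C and x each have n elements,
 * with n <= MAX_STATES. The state vector x is updated in place. */
double state_space_step(const double *A, const double *B, const double *C,
                        double D, double *x, size_t n, double u)
{
    double x_next[MAX_STATES];
    double y = D * u;
    size_t i, j;

    for (i = 0; i < n; ++i) {
        y += C[i] * x[i];                 /* output uses the current state */
        x_next[i] = B[i] * u;
        for (j = 0; j < n; ++j)
            x_next[i] += A[i * n + j] * x[j];
    }
    for (i = 0; i < n; ++i)
        x[i] = x_next[i];                 /* commit the next state */
    return y;
}
```

As a check, the first-order filter $y_k = 0.5 x_k + 0.5 y_{k-1}$ has the transfer function $0.5 + 0.25/(z - 0.5)$, which corresponds to A = 0.5, B = 1, C = 0.25, D = 0.5 and reproduces the step response 0.5, 0.75, 0.875.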

1.5 Digital Filter Design Methods

The design of digital filters is usually carried out by one of two methods.

The first is to design an analog filter that meets the specifications for a given application. Once the analog filter transfer function is obtained, it is converted to a digital filter transfer function by some approximation technique. The justification for this method is that the design techniques for analog filters are highly developed and analog filters for many applications have already been developed. The second method used is to design the digital filter in the discrete frequency or discrete time domain directly. The advantage of this method is that the exact location of the poles and zeros of the digital filter can be positioned to achieve the desired frequency response. The first method described above, obtaining an analog filter transfer function, then converting it to a digital filter transfer function by some approximation technique, will be the method used throughout this paper.

There are several popular methods for obtaining a digital filter transfer function from an analog filter transfer function. The technique selected is based on the response characteristics desired. For example, if the impulse response of a digital filter G(z) is to be the same as the impulse response of the analog filter, then the Impulse Invariance Method is used. Similarly the Step Invariance or Zero-Order-Hold Method is used to match a step response, and the Pole-Zero Matching Method and Bilinear Transformation are used to match frequency characteristics of the analog filter. There are many other methods to design digital filters, too numerous to consider here.
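To make the conversion step concrete, the sketch below applies the Bilinear Transformation, $s = \frac{2}{T}\,\frac{1 - z^{-1}}{1 + z^{-1}}$, to a first-order analog low-pass filter $G(s) = \omega_c/(s + \omega_c)$. This analog prototype is a hypothetical example chosen for brevity and is not the fourth-order filter analyzed later in the thesis; frequency prewarping is omitted. Substituting and normalizing the denominator gives $a_0 = a_1 = \omega_c T/(\omega_c T + 2)$ and $b_1 = (\omega_c T - 2)/(\omega_c T + 2)$ in the notation of Eq. (1.8).

```c
/* Bilinear transformation of the first-order analog low-pass
 * G(s) = wc / (s + wc) into
 *   G(z) = (a0 + a1*z^-1) / (1 + b1*z^-1).
 * wc is the analog cutoff in rad/sec and T the sampling period in
 * seconds. The function name is hypothetical; no prewarping is done. */
void bilinear_lowpass1(double wc, double T,
                       double *a0, double *a1, double *b1)
{
    double k = wc * T;

    *a0 = k / (k + 2.0);
    *a1 = *a0;
    *b1 = (k - 2.0) / (k + 2.0);
}
```

The zero-frequency gain of the result, $(a_0 + a_1)/(1 + b_1)$, equals 1 for any $\omega_c$ and $T$, matching the analog filter at $\omega = 0$.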

1.6 Summary of Chapter 1

This chapter has presented an introduction to digital filters, including the terminology, definitions, representation forms, and design methods that will be used throughout this paper. The methods described above permit a digital filter to be designed to meet the requirements of a specific application. However, its practical implementation is subject to errors and inaccuracies due to the finite wordlength of the digital hardware it will be used on. These errors, called quantization errors, are the focus of the next chapter.

2.0 Quantization Effects in Digital Filters

The practical realization of any digital filter is dependent on the finite arithmetic constraints of the digital computer used to program the filter algorithm. Recall from Chapter 1, that a digital filter can be represented by an Nth order difference equation as

$$y_k = \sum_{i=0}^{n} a_i x_{k-i} - \sum_{i=1}^{n} b_i y_{k-i}. \tag{2.1}$$

Since $y_{k-i}$, $x_{k-i}$, $a_i$, and $b_i$ are considered to take on a continuum of values, infinite precision is assumed. However, when a digital filter is implemented on a digital computer, only finite precision is available because of the binary format in which the numbers are stored. Therefore errors and inaccuracies are introduced due to the finite wordlength of the digital computer used to store the filter algorithm. The same is true for the finite wordlength of the interface devices, the ADC and DAC, used to encode and decode the input and output signals respectively. These errors are collectively referred to as quantization errors and affect any practical implementation of filters or controllers using digital hardware.

It is well known that digital systems are associated with two types of arithmetic: fixed-point and floating-point. Fixed-point arithmetic is widely used for real-time applications since it is faster and more economical than floating-point arithmetic. When a digital filter is implemented using floating-point arithmetic, one is usually not concerned with the quantization errors mentioned above since infinite precision is assumed.

However, errors can occur if digital filters are implemented on a digital computer using floating-point arithmetic. Sandberg [87] was the first to analyze roundoff errors for digital filters implemented using floating-point arithmetic. Weinstein and Oppenheim [97], Oppenheim [64], and Kaneko and Liu [39] also looked at the floating-point roundoff errors of digital filters. Kaneko and Liu did an in-depth error analysis of digital filters realized with floating-point arithmetic.

Although errors can, and do, occur when digital filters are implemented with floating-point arithmetic, and the analysis of these errors is important, floating-point errors will not be considered further in this paper. It will be assumed that a digital filter implemented with floating-point arithmetic has infinite precision and is error free. The focus of this paper will be the quantization errors that affect digital filters implemented with fixed-point arithmetic on finite wordlength computers using finite-arithmetic elements.

It is well documented in the literature [3][22][40][44][66][71], that there are four major sources of quantization errors that affect digital filters implemented with fixed-point arithmetic on a digital system with a computer and interface devices that have a finite wordlength. These quantization errors are:

1) Signal Conversion Quantization

2) Coefficient Quantization

3) Arithmetic Quantization

4) Realization Form Quantization

Each of these four error sources will now be explored in detail in the sections that follow.

2.1 Signal Conversion Quantization

Since most digital filtering systems contain analog as well as digital signals, it is necessary to convert analog signals to digital signals with an ADC, and digital signals to analog signals with a DAC. Let us first consider the A/D conversion quantization errors.

2.2 ADC Quantization Errors

A discrete input sequence $x(k)$ to a digital filter is usually produced by the uniform sampling of an analog signal by the analog circuitry of an ADC. Since the number of bits in the digital register of the ADC is finite, only a finite resolution can be attained by the ADC. A B-bit binary word defines $2^B$ distinct levels; thus the ADC digital word provides a resolution of one part in $2^B$ [44]. Since the digital output of the ADC can only assume a finite number of levels, it is necessary to quantize the analog level into the nearest digital level. The quantization level in volts, $q_v$, for a B-bit ADC is given by

$$q_v = 2^{-m} \times (\text{full analog scale in volts}), \tag{2.2}$$

where $m = B - 1$. The value m, the length of the data word, is equal to the B-bit ADC word minus one bit for the sign bit. (This assumes a bi-polar ADC; the factor in Eq. (2.2) would be $2^{-B}$ for a uni-polar ADC.) The input value $x_k$ is rounded to the nearest quantization level $n q_v$, where $n q_v$ is an integral multiple of the quantization level in volts, $q_v$. The value n is one of the $2^B$ distinct quantization levels for a B-bit ADC word and falls in the range $0 < n \le 2^B$. (For example, if $q_v = 0.125$, then $2q_v = 0.250$, $3q_v = 0.375$, etc. in the positive voltage direction.) Truncation may also be used instead of rounding, but truncating an input value $x_k$ down to the nearest level that does not exceed it introduces a larger quantization error and is rarely used in practice.

Consider as a simple example a 3-bit (2-bit-plus-sign), ±2 volt ADC. Since we are dealing with a 3-bit ADC, the number of distinct quantization levels is $2^3 = 8$. The data word length m is $3 - 1 = 2$, so the quantization level in volts is

$$q_v = 2^{-2} \times 2.0 = 0.5 \text{ volts}. \tag{2.3}$$

Figure 2.1 shows a diagram of the A/D conversion process for this example. (This ADC is assumed to use 2's complement arithmetic, with a digital word of the form y.xx, where 'y' is the sign bit, '.' is the binary point, and 'xx' is the data word.)

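Fig. 2.1 ADC Quantization: 3-bit (2-bit-plus-sign), ±2 volt ADC

Analog Level (volts)    Digital Output
 1.5                    0.11
 1.0                    0.10
 0.5                    0.01
 0.0                    0.00
-0.5                    1.11
-1.0                    1.10
-1.5                    1.01
-2.0                    1.00

Input A = 0.825 volts lies between the 0.5 and 1.0 volt levels; input B = -0.52 volts lies between the -1.0 and -0.5 volt levels.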

The analog input value represented by A in Figure 2.1 has a value of 0.825 volts and lies between the two quantization levels 0.5 and 1.0 volts. Since A is closer to the 1.0 volt level ($1.00 - 0.825 = 0.175 < q_v/2$), the value A will be quantized to the 1.0 volt level and produce a digital output of $0.10_2$. By similar arguments, the analog input represented by B in Figure 2.1, which has a value of -0.52 volts, will be rounded to the -0.500 volt level and produce a digital output value of $1.11_2$. It should be noted that since $q_v = 0.5$ volts, the resolution of the ADC considered is very "coarse", and all voltages will be rounded to the nearest 0.5 volt level. The 3-bit ADC was used for example purposes only, and an ADC this small would almost never be used in a real digital system. In practice, most ADCs have a $q_v$ in the millivolt range. A 12-bit (11-bit-plus-sign), ±10 volt ADC, for example, will have $2^{11} = 2048$ quantization levels of each polarity and a $q_v = 2^{-11} \times 10 \approx 0.005$ volts, or about a 5 mV resolution.

The difference between the analog input signal and the digital output is called the signal conversion quantization error, $q_e$, of the ADC [40]. The output of the ADC can be viewed as being the sum of the actual input $x_k$ and an error component $q_e$, or

$$x_{qk} = x_k + q_e. \tag{2.4}$$

The quantized input value $x_{qk}$ available at the output of the ADC introduces an error component with each input sample value. The ADC quantization error $q_e$ has been thoroughly investigated [3][66][67][85][86], and has been described via statistical methods to be a uniformly distributed random variable in the interval $(-q_v/2, +q_v/2)$, taking on any value between $-q_v/2$ and $+q_v/2$ with equal probability. The usual approach is to treat the effect of input quantization $q_e$ (with rounding) as white noise that has a zero mean and a variance of

$$\sigma_e^2 = \frac{q_v^2}{12}. \tag{2.5}$$

It is clear from Eq. (2.5) that the quantization error of the ADC will tend to zero as the number of bits in the ADC digital word tends to infinity. An infinite number of bits in the ADC digital word would imply that the ADC has infinite precision. It should be noted, however, that in order to improve the resolution of the ADC by increasing the number of bits in the digital word, the full scale analog signal input value has to be maintained as a constant. In practice, the ADC is usually one of the most expensive components of a digital system, and its wordlength is typically in the range of 8-16 bits [79]. ADC wordlengths of 8-16 bits will introduce small quantization errors into each input sample, but these errors are usually small enough not to adversely affect the performance of a digital system.

2.3 DAC Quantization Errors

The process of D/A conversion does not lead to any error in reconstructing an analog waveform that is equivalent to the digital signal at its input. [85] However, there are two problems associated with the DAC's finite wordlength that can indirectly introduce errors into the practical implementation of a digital filter. First, the digital wordlength of the DAC is always of finite length and must be large enough to reconstruct an analog signal with the desired resolution. The wordlengths of the DAC, ADC and the computer used in a digital system are all interdependent and must be properly coordinated. [40] If the DAC wordlength is too small, the analog signal output from the DAC may be too "coarse" for the desired application. To illustrate this point, consider a 12-bit (11-bit-plus-sign), ±10 volt ADC used with a 9-bit (8-bit-plus-sign). ± 10 volt DAC. It is assumed the digital computer between these interface devices has a sufficient word length. The ADC can encode an analog input signal to one of 212 = 4096 distinct levels, and the ADC has a quantization step size in volts qv = 2-11 * 10 = 0.005 volts.

However, the DAC can only decode its digital word into an analog signal with $2^{9} = 512$ distinct levels, with $q_v = 2^{-8} \times 10 \approx 0.04$ volts. Consequently, the DAC can reconstruct an analog input signal with only 1/8 the resolution of the original analog input signal encoded by the ADC. The analog output signal in this case may or may not be acceptable for the desired application. It is up to the system designer to choose a DAC with the proper digital wordlength to assure the analog output signal meets the system specifications.
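The ADC and DAC figures in this example can be checked with a short C sketch. It assumes a bipolar converter whose word includes the sign bit; the function names converter_levels and converter_step are hypothetical and are not part of the simulation program.

```c
/* Number of distinct codes for a converter whose digital word has
   `bits` bits in total (sign bit included). */
long converter_levels(int bits)
{
    return 1L << bits;
}

/* Quantization step in volts for a bipolar converter spanning
   -full_scale .. +full_scale: q = full_scale * 2^-(bits - 1). */
double converter_step(int bits, double full_scale)
{
    return full_scale / (double)(1L << (bits - 1));
}
```

For the converters above, converter_step(12, 10.0) is 10/2048 volts and converter_step(9, 10.0) is 10/256 volts, so the DAC step is 8 times the ADC step.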

The other potential source of D/A conversion error is due to the difference in wordlength sizes of the digital computer used in a digital system and the DAC. The wordlength of the computer is usually larger than the wordlength of the DAC, and a discrete output value Yk must somehow be scaled to fit in the DAC digital word for conversion to an analog level. For example, a 16-bit
digital computer will produce a 16-bit output value Yk, and if this computer is used with a 12-bit DAC, Yk must be scaled to a 12-bit value that represents the same sign and magnitude as the original 16-bit value. At some point Yk must be rounded or truncated after scaling to fit in 12-bits, and these actions are sure to introduce errors even though they are usually small. These errors will be discussed later in Section 2.5.
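The scaling step described above can be sketched in C as follows. This is a minimal illustration, assuming a 16-bit 2's complement output value and a 12-bit DAC, so that four low-order bits are discarded. The function names are hypothetical, and division is used instead of a right shift so the result does not depend on how a compiler shifts negative values.

```c
#include <stdint.h>

/* floor(v / 16): drop the four low-order bits of a 2's complement value. */
static int32_t floor_div16(int32_t v)
{
    return (v >= 0) ? v / 16 : -((-v + 15) / 16);
}

/* Scale a 16-bit value to 12 bits by truncation. */
int16_t scale_truncate(int16_t y)
{
    return (int16_t)floor_div16((int32_t)y);
}

/* Scale a 16-bit value to 12 bits by rounding: add half of the new
   least significant bit, then truncate. The result is clamped because
   rounding the largest positive values would exceed the 12-bit range. */
int16_t scale_round(int16_t y)
{
    int32_t q = floor_div16((int32_t)y + 8);
    if (q > 2047)
        q = 2047;
    return (int16_t)q;
}
```

For an output value of 1000, truncation gives 62 and rounding gives 63, a difference of one least significant bit of the DAC word.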

It has been shown that the quantization of an analog input signal by an ADC will always introduce errors into the discrete input sequence x(k). These errors were examined and have been modeled using statistical techniques. Their effect can be reduced by using an ADC with a proper digital word length for a given application. It was also shown that the DAC did not directly introduce quantization errors into the digital system, but its digital wordlength must also be of sufficient size to provide the proper analog output resolution and avoid roundoff errors. It is necessary that the system designer properly coordinate the wordlengths of the ADC, DAC, and the digital computer of a digital system to minimize the quantization errors discussed above and provide an adequate system for a given application. In the next section, errors introduced by the quantization of filter coefficients will be examined.

2.4 Coefficient Quantization

When a digital filter is designed, it is usually assumed that the resulting filter coefficients will be stored with as many bits as needed, or stored as an infinite precision value. This is not the case however, as each infinite precision coefficient value must be replaced by a t-bit representation as a result of using fixed-point arithmetic on a finite-wordlength computer. The t-bit representation of a coefficient will introduce inaccuracies into the coefficient value and may change the characteristics of the digital filter designed. These inaccuracies are called coefficient quantization.

Depending on the way negative numbers are represented, there are three forms of fixed-point arithmetic commonly used. They are the sign-magnitude, 2's complement, and 1's complement representations. [40] The sign-magnitude representation uses the leading bit to represent the sign, 0 for positive values and 1 for negative values. The rest of the bits of the digital word are used to represent the magnitude of the number. The 2's complement number system represents positive numbers in the same way as sign-magnitude, but negative numbers are obtained by complementing all bits of the positive number's binary representation and adding a 1 in the least significant bit. The 1's complement number system also represents positive numbers the same way as sign-magnitude, but negative numbers are obtained by complementing all bits of the positive number. The 2's complement number system is the number system used in most digital computers and is therefore the one most commonly used to implement digital filters. [83] Consequently, the 2's complement representation will be the only type of fixed-point arithmetic considered in this paper.
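The two complement rules just described can be illustrated with a small C sketch operating on a B-bit word held in an unsigned integer. The function names are hypothetical, and the sketch assumes a wordlength between 2 and 16 bits.

```c
#include <stdint.h>

/* Mask selecting the low `bits` bits of a word (assumes 2 <= bits <= 16). */
static uint32_t word_mask(int bits)
{
    return (1u << bits) - 1u;
}

/* 2's complement negation: complement all bits, then add 1 in the
   least significant bit, keeping only the low `bits` bits. */
uint32_t twos_negate(uint32_t word, int bits)
{
    return (~word + 1u) & word_mask(bits);
}

/* 1's complement negation: complement all bits. */
uint32_t ones_negate(uint32_t word, int bits)
{
    return ~word & word_mask(bits);
}

/* Interpret a `bits`-bit 2's complement pattern as a signed integer. */
int32_t twos_value(uint32_t word, int bits)
{
    uint32_t sign = 1u << (bits - 1);
    word &= word_mask(bits);
    if (word & sign)
        return (int32_t)word - (int32_t)(1u << bits);
    return (int32_t)word;
}
```

In a 4-bit word, the pattern 0011 (3) negates to 1101 in 2's complement and to 1100 in 1's complement.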

Given a fractional decimal number C representing a filter coefficient, assume C falls in the range, or is scaled so that it falls in the range, -1 < C < 1. If C has a wordlength of B bits and a 2's complement representation, then

$$C = b_s \,\Delta\, b_1 b_2 \cdots b_m, \qquad B = m + 1, \tag{2.6}$$

where $b_s$ is the sign bit, $\Delta$ is the binary point, and $b_1 \ldots b_m$ are the m data bits. The sign bit $b_s$ has a value of 0 to represent positive numbers and a value of 1 to represent negative numbers, or

$$b_s = \begin{cases} 1, & C < 0 \\ 0, & C \ge 0. \end{cases} \tag{2.7}$$

The binary digits $b_1 \ldots b_m$ each have the binary value 0 or 1, and the coefficient wordlength is B = m data bits + 1 sign bit.

There are two common practices used to quantize digital filter coefficients for a fixed-point representation on a finite wordlength computer: truncation and rounding. Truncation of a fixed-point number to t bits, where t < m, the number of data bits, is the process of dropping all bits beyond the least significant bit $b_t$. [3] The original (infinite precision) filter coefficient $C_o$ is represented by an infinite series of powers of 2 in 2's complement notation by

$$C_o = -b_s + \sum_{i=1}^{\infty} b_i 2^{-i}. \tag{2.8}$$

The truncated filter coefficient $C_t$, truncated to t bits, is represented by

$$C_t = -b_s + \sum_{i=1}^{t} b_i 2^{-i}. \tag{2.9}$$

On the other hand, rounding of a number to t bits is the process of adding a 1 to bit (t+1) of $C_o$, and then truncating to t bits. The rounded coefficient $C_r$, rounded to t bits, is represented by

$$\begin{array}{rl} C_o = & b_s \,\Delta\, b_1 b_2 b_3 \cdots b_t \, b_{t+1} \cdots b_m \\ + & 0 \,\Delta\, 0 \; 0 \; 0 \cdots 0 \;\; 1 \cdots 0 \end{array} \tag{2.10}$$

Consider as an example the filter coefficient $C_o$ represented in decimal notation as

$$C_o = 0.6777343_{10}.$$

$C_o$ has a binary 2's complement representation of

$$C_o = 0.101011011_2. \tag{2.11}$$

If $C_o$ is truncated to 5 bits, then $C_t$, the truncated representation, is

$$C_t = 0.10101_2 = 0.65625_{10}. \tag{2.12}$$

If $C_o$ is rounded to 5 bits, then $C_r$, the rounded representation, is

$$C_r = 0.101011011_2 + 0.000001000_2 \tag{2.13}$$

which, truncated to 5 bits, gives

$$C_r = 0.10110_2 = 0.6875_{10}. \tag{2.14}$$

For the truncated representation $C_t$, the quantization error is

$$E_t = |C_o - C_t| = |0.6777343 - 0.65625| = 0.0214843, \tag{2.15}$$

a relative error of about 3.17% of $C_o$. The rounded representation $C_r$ has a quantization error of

$$E_r = |C_o - C_r| = |0.6777343 - 0.6875| = 0.0097657. \tag{2.16}$$

The rounded coefficient value $C_r$ represents a relative error of about 1.44% of $C_o$. Note that the rounded value of $0.6875_{10}$ is much closer in magnitude to the original value of $0.6777343_{10}$ than is the truncated value of $0.65625_{10}$.
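The truncation and rounding of this example can be reproduced with a minimal C sketch. It assumes the coefficient lies in the range -1 < C < 1 and is held in a double before quantization; the function names are hypothetical. Truncation is implemented as rounding toward minus infinity, which is what dropping low-order bits does in 2's complement.

```c
/* Largest integer not greater than v (a floor without math.h). */
static long floor_long(double v)
{
    long n = (long)v;          /* conversion truncates toward zero */
    if ((double)n > v)
        n -= 1;                /* correct negative non-integers */
    return n;
}

/* Truncate coefficient c to t fractional bits. */
double quantize_truncate(double c, int t)
{
    double scale = (double)(1L << t);   /* 2^t */
    return (double)floor_long(c * scale) / scale;
}

/* Round coefficient c to t fractional bits: add a 1 in bit (t+1),
   then truncate. */
double quantize_round(double c, int t)
{
    double scale = (double)(1L << t);
    return (double)floor_long(c * scale + 0.5) / scale;
}
```

With c = 0.677734375 (the exact value of the 9-bit pattern above) and t = 5, truncation returns 0.65625 and rounding returns 0.6875, matching the values in the example.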

When a digital filter with a pulse transfer function of

$$G(z) = \frac{\sum_{i=0}^{n} a_i z^{-i}}{1 + \sum_{i=1}^{n} b_i z^{-i}} \tag{2.17}$$

is actually implemented using fixed-point arithmetic on a finite wordlength computer, the $a_i$ and $b_i$ coefficients have to be quantized to a specific number of bits using rounding or truncation. Suppose B bits (m data bits + 1 sign bit) are used to represent the coefficients, and let the quantized coefficients be represented by

$$a_{Qi} = a_i + \alpha_B \tag{2.18}$$

$$b_{Qi} = b_i + \beta_B \tag{2.19}$$

where $\alpha_B$ and $\beta_B$ represent the error terms of a B-bit representation.

Since t bits are used for quantization, $\alpha_B$ and $\beta_B$ are bounded for truncation by

$$|\alpha_B| \le 2^{-t}, \qquad |\beta_B| \le 2^{-t} \tag{2.20}$$

and for rounding by

$$|\alpha_B| \le 2^{-t}/2, \qquad |\beta_B| \le 2^{-t}/2. \tag{2.21}$$

The quantized coefficient transfer function can now be represented by

$$G_Q(z) = \frac{\sum_{i=0}^{n} a_{Qi} z^{-i}}{1 + \sum_{i=1}^{n} b_{Qi} z^{-i}}. \tag{2.22}$$

By quantizing the coefficients, the filter obtained is actually different from the one originally designed. If the B-bit wordlength is not large enough, certain undesirable effects may occur. For instance, the frequency characteristics (e.g. magnitude, phase, and bandwidth) of the quantized filter may differ appreciably from those of the original filter. An even bigger problem results if the poles of the original digital filter are close to the unit circle (e.g. $a_i$ = 0.98987): the poles of the quantized filter may then lie just outside the unit circle, resulting in an unstable implementation. However, if a reasonable wordlength is chosen to quantize the coefficients, the actual coefficients used will be very close to the ideal or infinite precision coefficients, and the quantized filter will meet the specifications for the application it was designed for.

The problem of assessing the effect of changes in coefficient values on the characteristics of a filter can be approached in a number of ways. First, a simple and practical approach could be used. For example, to evaluate changes in frequency response, the frequency response of the filter with t-bit quantized coefficients could be compared with the ideal response for the original design (e.g. the original filter implemented with floating-point arithmetic). Changes in the frequency response could then be easily evaluated. The movements of the poles and zeros of the transfer function due to coefficient quantization can also be calculated, and then used to study the changes in digital filter response. Suppose the poles of G(z) of Eq. (2.17) are $p_1, p_2, \ldots, p_L$, and the poles of the quantized filter transfer function $G_Q(z)$ of Eq. (2.22) are $p_i + \Delta p_i$, where i = 1, 2, ..., L. It can be shown that the changes in pole position $\Delta p_i$ are given by

$$\Delta p_i = \sum_{k=1}^{L} \frac{p_i^{\,k+1}}{\prod_{m=1,\, m \ne i}^{L} \left(1 - p_i/p_m\right)} \, b_{Qk} \tag{2.23}$$

(after Liu [51]), where $b_{Qk}$ are the quantized coefficient values described in Eq. (2.19). Similar results can be obtained for the movement of the zeros in the numerator of the transfer function. From these movements, the change in the overall filter response can be studied.
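The stability risk described above for poles near the unit circle can be checked numerically in the simplest case. The sketch below is an illustration only, assuming a first-order filter $y_k = x_k - b_1 y_{k-1}$, whose single pole lies at z = $-b_1$, and reusing the coefficient value 0.98987 quoted above. The function names are hypothetical. The sketch also ignores the fact that a fractional fixed-point word cannot actually hold the value 1.0, so a real implementation would saturate or wrap instead.

```c
/* Round coefficient c to t fractional bits (add half an LSB, then floor). */
static double round_to_bits(double c, int t)
{
    double scale = (double)(1L << t);
    double shifted = c * scale + 0.5;
    long n = (long)shifted;            /* truncates toward zero */
    if ((double)n > shifted)
        n -= 1;                        /* floor for negative values */
    return (double)n / scale;
}

/* A first-order section with pole at z = -b1 is stable only if |b1| < 1. */
int first_order_stable(double b1)
{
    return (b1 > -1.0) && (b1 < 1.0);
}

/* Stability of the same section after rounding b1 to t fractional bits. */
int stable_after_rounding(double b1, int t)
{
    return first_order_stable(round_to_bits(b1, t));
}
```

With b1 = 0.98987, rounding to 5 fractional bits gives 32/32 = 1.0, which places the pole on the unit circle, while rounding to 7 fractional bits gives 127/128 and the section remains stable.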

A second, more rigorous approach is to treat the changes in coefficient values as random statistical perturbations and to estimate the changes that may be expected by a statistical method, or to derive bounds on these changes. [71] Significant research has been done using statistical as well as practical methods to describe and qualify digital filter coefficient quantization. A sampling of some of the major work in this area can be found in Chapter 3.

In this section it has been shown that all digital filters designed with infinite precision arithmetic and implemented using fixed-point arithmetic on a finite wordlength computer suffer from coefficient quantization problems. If an adequate wordlength is used to store the quantized filter coefficients, the characteristics of the actual filter will be close to those of the ideal filter. If the wordlength is insufficient, the filter characteristics may be changed dramatically. A few methods used to examine the effects of coefficient quantization were presented. In the next section the quantization effects introduced by mathematical operations on the practical implementation of a digital filter will be investigated.

2.5 Arithmetic Quantization

It has been shown in Chapter 1 that a digital filter can be described by an Nth order difference equation as

$$y_k = \sum_{i=0}^{n} a_i x_{k-i} - \sum_{i=1}^{n} b_i y_{k-i}. \tag{2.24}$$

To generate the current filter output $y_k$, the filter has a multiplier to calculate the individual products $a_i x_{k-i}$ and $b_i y_{k-i}$, and an accumulator to add (subtract) these products. These arithmetic operations can lead to quantization errors in the form of roundoff errors, overflows, and limit cycle oscillations.
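The multiply and accumulate structure of Eq. (2.24) can be sketched in C as follows. This is an idealized floating-point version, assuming the caller keeps the past inputs and outputs in two history arrays; the function name and the array layout are hypothetical and are not taken from the simulation program.

```c
/* One output sample of the difference equation
       y_k = sum_{i=0..n} a[i]*x_{k-i}  -  sum_{i=1..n} b[i]*y_{k-i}.
   xhist[i] holds x_{k-i} (xhist[0] is the current input) and
   yhist[i] holds y_{k-i} for i = 1..n (yhist[0] and b[0] are unused). */
double direct_form_output(const double a[], const double b[],
                          const double xhist[], const double yhist[],
                          int n)
{
    double acc = 0.0;          /* the accumulator */
    int i;

    for (i = 0; i <= n; i++)
        acc += a[i] * xhist[i];    /* feed-forward products */
    for (i = 1; i <= n; i++)
        acc -= b[i] * yhist[i];    /* feedback products */
    return acc;
}
```

Each call forms the N+N+1 products of the equation. In a fixed-point implementation every one of these products, and the accumulated sum, would be subject to the quantization effects discussed in this section.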

When a digital filter is implemented on a digital computer with a finite wordlength using some arithmetic representation and number system, arithmetic operations operate on numbers within a limited range called the dynamic range of the number system. The dynamic range is defined to be the range of numbers that can be represented from the smallest nonzero magnitude to the largest magnitude in a B-bit word, where B = m data bits plus one sign bit. For example, the dynamic range of a 16-bit (15 data-bits-plus-sign) 2's complement fixed-point implementation would be -32768 through +32767. Floating-point implementations have a larger dynamic range than do fixed-point implementations, and a 16-bit, 2's complement floating-point implementation would have a dynamic range of approximately $\pm 10^{\pm 38}$.

In general, multiplications lead to an increase in the wordlength required for the result of the operation. To form the products $a_i x_{k-i}$ and $b_i y_{k-i}$, a B-bit word (including sign bit) representing a filter coefficient and a B-bit word representing an input value $x_{k-i}$ are multiplied, leaving a (B+B)-bit result. Since the digital filters being considered are implemented using fixed-point arithmetic, the (B+B)-bit product must be shortened to fit in B bits. There are two ways to accomplish this, rounding and truncation. The rounding and truncation of products are similar to the rounding and truncation of coefficients discussed in Section 2.4. The exact products, if infinite precision storage were available, would be

(2.25)

However, since the products must be rounded to fit in B-bits, the actual or quantized products are

(2.26)

The rounding of a product is bounded in absolute value by $2^{-t}/2$ as described above in Eq. (2.21). This rounding quantization error can be modeled as described in Section 2.2 as white noise with zero mean and a variance of

$$\sigma^2 = 2^{-m}/3.$$

Hence the product roundoff error is

(2.27)

(2.28)

The current output $y_k$ requires N+N+1 multiplications to be produced, [52] and each of these products will introduce quantization errors. The roundoff of products can be minimized by using alternate representations to implement a digital filter. This topic will be discussed in detail in the next section.
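The shortening of a (B+B)-bit product back to B bits can be sketched in C for the common case B = 16. This is an illustration only, assuming both operands are fractions in the range -1 to 1 stored with 15 fractional bits; the function names are hypothetical. Division is used in place of a right shift so that negative products are handled portably.

```c
#include <stdint.h>

/* floor(v / 2^15) for a 32-bit product. */
static int32_t floor_div_2_15(int32_t v)
{
    return (v >= 0) ? v / 32768 : -((-v + 32767) / 32768);
}

/* Keep a result inside the 16-bit range (only (-1)*(-1) needs this). */
static int16_t clamp16(int32_t v)
{
    if (v > 32767)
        return 32767;
    if (v < -32768)
        return -32768;
    return (int16_t)v;
}

/* 16-bit x 16-bit fractional product, shortened by truncation. */
int16_t fixed_mul_truncate(int16_t a, int16_t x)
{
    int32_t product = (int32_t)a * (int32_t)x;      /* 32-bit result */
    return clamp16(floor_div_2_15(product));
}

/* Same product, shortened by rounding: add half an LSB, then truncate. */
int16_t fixed_mul_round(int16_t a, int16_t x)
{
    int32_t product = (int32_t)a * (int32_t)x;
    return clamp16(floor_div_2_15(product + 16384));
}
```

For example, with a = 3 and x = 16384 (the fraction 0.5), the exact product is 1.5 least significant bits; truncation returns 1 and rounding returns 2.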

When two B-bit numbers are added, the resulting sum should also fit in a B-bit word. If this is not the case, overflow may result. If a fixed-point 2's complement arithmetic is assumed, it is clear that the sign bit of a positive number is 0. If the sum of two positive numbers overflows, the carry that results will change the sign bit to a 1. The resulting sum will then be incorrectly interpreted as a negative number. The overflow introduces a severe non-linearity into the filter output, and it can be shown that this phenomenon will lead to large oscillations. As an example, consider a computer with a 4-bit (3-bits-plus-sign) wordlength and a fixed-point 2's complement arithmetic. If the two numbers to be summed were $6_{10}$ and $7_{10}$ respectively, the following result would be obtained

$$\begin{array}{rr} 0\,\Delta\,110 & 6_{10} \\ +\;\; 0\,\Delta\,111 & 7_{10} \\ \hline 1\,\Delta\,101 & -3_{10} \end{array} \tag{2.29}$$

Instead of a large positive number, a negative number is obtained, and since an overflow occurred, an overflow oscillation would be expected. Since this phenomenon is undesirable, there are a few strategies employed to try to avoid overflow. The first is to scale the input signal values to assure all sums will fall in the dynamic range of the digital filtering system being used. Second, special hardware adders could be used that would "saturate" on overflow. A saturation adder outputs the maximum value of the dynamic range on overflow and will not allow the adder to change the sign of the sum. However, for most implementations where no special hardware or scaling of the input signal is used, the strategy is to monitor overflow and, if it occurs, simply set the sum equal to a limit value within the dynamic range of the computer.
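The difference between an adder that wraps on overflow and one that saturates can be sketched in C for the 4-bit example above. The function names are hypothetical, and the operands are assumed to lie in the 4-bit range of -8 to +7.

```c
/* Add two values in a 4-bit 2's complement word. On overflow the carry
   runs into the sign bit and the result wraps around. */
int add4_wrap(int a, int b)
{
    unsigned sum = ((unsigned)a + (unsigned)b) & 0xFu;  /* keep 4 bits */
    return (sum & 0x8u) ? (int)sum - 16 : (int)sum;     /* apply sign bit */
}

/* Add two 4-bit values with saturation: on overflow the result is held
   at the nearest limit of the dynamic range and the sign is preserved. */
int add4_saturate(int a, int b)
{
    int sum = a + b;
    if (sum > 7)
        return 7;
    if (sum < -8)
        return -8;
    return sum;
}
```

For the operands 6 and 7, add4_wrap returns -3, the erroneous negative value of the example, while add4_saturate returns +7.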

It has also been observed that the actual output of a digital filter may not converge to zero even if the input to the filter is zero. It was shown above that all input values $x_{k-i}$ can only attain a finite number of values, since they are quantized and bounded in amplitude. This makes the digital filter a finite-state machine. [15] Since the filter has a finite number of states, a non-zero output with zero input must become periodic after a finite length of time. This periodic oscillation is referred to as a limit cycle. Filter output with zero input is referred to as a zero input limit cycle. Limit cycles are caused by the finite wordlengths used to manipulate data values within a digital filter.

Consider as an example a first order digital filter represented by

$$y_k = x_k - 0.935\, y_{k-1}. \tag{2.30}$$

An impulse of 11 volts is applied, where the impulse sequence is described by

$$x_k = \begin{cases} 11, & k = 0 \\ 0, & k > 0. \end{cases} \tag{2.31}$$

If infinite precision arithmetic (floating-point) is assumed, then Yk tends to 0 with increasing values of k as is shown in Table 2.1.

Table 2.1 Filter Output with No Rounding (Floating-Point)

k     x_k    y_{k-1}     y_k = x_k - 0.935 y_{k-1}
-------------------------------------------------
0     11       0           11.00
1      0      11.00       -10.285
2      0     -10.285        9.616
3      0       9.616       -8.991
4      0      -8.991        8.407
5      0       8.407       -7.861
6      0      -7.861        7.350
7      0       7.350       -6.872
8      0      -6.872        6.425
∞      0       0            0

If fixed-point arithmetic is used with the same impulse input, then only finite precision is available for the mathematics to manipulate each data value. If it is assumed that each data value is rounded to the nearest integer value, then Table 2.2 describes the filter output. It is clear from Table 2.2 that the output sequence is periodic beyond k = 5 even though there is no input.

Table 2.2 Filter Output with Rounding (Fixed-Point)

k     x_k    y_{k-1}     y_k = x_k - 0.935 y_{k-1}
-------------------------------------------------
0     11       0           11.00   [11]
1      0      11          -10.285  [-10]
2      0     -10            9.350  [9]
3      0       9           -8.415  [-8]
4      0      -8            7.480  [7]
5      0       7           -6.545  [-7]
6      0      -7            6.545  [7]
7      0       7           -6.545  [-7]
8      0      -7            6.545  [7]
∞      0      ±7           ∓6.545  [∓7]

[ ] = round to the nearest integer

Clearly it is the lack of sufficient accuracy that leads to this type of limit cycle. Therefore it is important to have wordlengths large enough to allow arithmetic operations to be carried out with the proper precision such that these limit cycle oscillations are kept small.
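The zero input limit cycle of Table 2.2 can be reproduced with a short C sketch. It assumes the first order filter of Eq. (2.30) with every output rounded to the nearest integer; the function names are hypothetical.

```c
/* Round to the nearest integer, with halves rounded away from zero. */
static long round_nearest(double v)
{
    return (v >= 0.0) ? (long)(v + 0.5) : -(long)(-v + 0.5);
}

/* Impulse response of y_k = x_k - b1*y_{k-1} when each output is
   rounded to the nearest integer before being stored and fed back.
   The first n outputs are written to y[0..n-1]. */
void impulse_response_rounded(double b1, long amplitude, long y[], int n)
{
    long prev = 0;             /* y_{k-1}, initially at rest */
    int k;

    for (k = 0; k < n; k++) {
        long x = (k == 0) ? amplitude : 0;   /* impulse input */
        y[k] = round_nearest((double)x - b1 * (double)prev);
        prev = y[k];
    }
}
```

Called with b1 = 0.935, an amplitude of 11 and n = 9, the routine produces 11, -10, 9, -8, 7, -7, 7, -7, 7: the output never decays below a magnitude of 7, as in Table 2.2.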

In this section, quantization errors due to arithmetic operations were examined. These arithmetic quantization errors included roundoff errors, overflows, and limit cycle oscillations. These errors can usually be minimized if the wordlength of the computer used to implement the digital system is of sufficient length. In the next section the quantization errors resulting from the realization form used to implement digital filters will be examined.

2.6 Realization Form Quantization Errors

A digital filter transfer function can be realized in a variety of ways. As was discussed in Chapter 1, the three most common realization forms are the direct, cascade, and parallel. Since any practical digital filter realization is dependent on the precise digital filter structure, it is appropriate to choose a realization form that will not be adversely affected by the quantization errors discussed so far. Recall the pulse transfer function of a digital filter is

$$G(z) = \frac{Y(z)}{X(z)} = \frac{\sum_{i=0}^{n} a_i z^{-i}}{1 + \sum_{i=1}^{n} b_i z^{-i}}. \tag{2.32}$$

A difference equation relating the discrete sequences y(k) and x(k) can be derived by cross-multiplying the terms of Eq. (2.32) to give

$$Y(z) + \sum_{i=1}^{n} b_i z^{-i} Y(z) = \sum_{i=0}^{n} a_i z^{-i} X(z). \tag{2.33}$$

Using the inverse z-transform, where the inverse transform of $z^{-i}Y(z)$ is $y_{k-i}$, Eq. (2.33) becomes

$$y_k = \sum_{i=0}^{n} a_i x_{k-i} - \sum_{i=1}^{n} b_i y_{k-i}. \tag{2.34}$$

This difference equation was formed from the direct relation of the z-transform transfer function and is said to be a direct form realization. If G(z) is written as a partial fraction expansion as

$$G(z) = \sum_{i=1}^{K} G_i(z), \tag{2.35}$$

where Gi(z) is either a first-order section of the form

(2.36)

or a 2nd order section of the form

Gi(z) = ---------------------------

(2.37)

then the filter is said to be realized in the parallel form. If G(z) is written as a product of 1st and 2nd order terms as

$$G(z) = \prod_{i=1}^{K} G_i(z), \tag{2.38}$$

where the 1st order terms have the form

$$G_i(z) = \frac{1 + a_{1i} z^{-1}}{1 + b_{1i} z^{-1}}, \tag{2.39}$$

and the 2nd order terms have the form

$$G_i(z) = \frac{1 + a_{1i} z^{-1} + a_{2i} z^{-2}}{1 + b_{1i} z^{-1} + b_{2i} z^{-2}}, \tag{2.40}$$

then the filter is said to be in the cascade form. [85]
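The cascade structure of Eqs. (2.38) to (2.40) can be sketched in C as a chain of second-order blocks, where the output of each block is the input to the next. This is an idealized floating-point illustration; the Section type and the function names are hypothetical. A 1st order term is obtained by setting a2 and b2 to zero.

```c
/* One block with transfer function
       (1 + a1 z^-1 + a2 z^-2) / (1 + b1 z^-1 + b2 z^-2)
   together with its own delayed inputs and outputs. */
typedef struct {
    double a1, a2;     /* numerator coefficients */
    double b1, b2;     /* denominator coefficients */
    double x1, x2;     /* x_{k-1}, x_{k-2} for this block */
    double y1, y2;     /* y_{k-1}, y_{k-2} for this block */
} Section;

/* Process one sample through a single block and update its delays. */
double section_step(Section *s, double x)
{
    double y = x + s->a1 * s->x1 + s->a2 * s->x2
                 - s->b1 * s->y1 - s->b2 * s->y2;
    s->x2 = s->x1;
    s->x1 = x;
    s->y2 = s->y1;
    s->y1 = y;
    return y;
}

/* Process one sample through `count` blocks connected in cascade. */
double cascade_step(Section sections[], int count, double x)
{
    int i;
    for (i = 0; i < count; i++)
        x = section_step(&sections[i], x);
    return x;
}
```

Because each block keeps its own short history, its poles are set by only two coefficients. In a fixed-point version, the value passed from one block to the next is the intermediate result that may need scaling to avoid overflow.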

These realization forms are usually analyzed with respect to the coefficient and roundoff quantization errors. First, let us look at the coefficient quantization problem. As was discussed in Section 2.4, filter coefficients stored using fixed-point arithmetic on a finite wordlength computer are subject to quantization errors. For a filter realized in the direct form of Eq. (2.32), the poles depend only on the denominator coefficients $b_i$. As a result of this form, it is possible to show that small errors in the coefficients can cause large shifts in the poles. Thus if the poles are tightly clustered, as in any narrow frequency range filter, the poles of the digital filter using the direct form will be very sensitive to coefficient quantization. The higher the order of the system (the larger the number of poles), the greater the sensitivity. This problem is especially critical when the poles are clustered near the unit circle and may be quantized to values outside the unit circle, leading to an unstable filter realization. [65]

Since in general parallel and cascade realizations of a digital filter are really a connection of smaller 1st or 2nd order blocks, the poles are realized separately and are not as tightly coupled. The error in a given pole is independent of its distance from the other poles of the system. For this reason it can be stated that the cascade and parallel realization forms are superior to the direct form. Since the direct form has a severe coefficient sensitivity problem, it is rarely used in a practical implementation of a digital filter. [36]

The other important error introduced by a filter realization form is the product roundoff error from the multiplications performed to calculate a filter output. The arithmetic quantization errors discussed in Section 2.5 were for the direct form realization of a digital filter. It was shown that the roundoff quantization of this form can be thought of as white noise with a zero mean and a variance of $2^{-m}/3$. The parallel and cascade forms also introduce roundoff errors that can be modeled as white noise with zero mean, but unlike the direct form they have complicated variance expressions (see [85] and [51]) that will not be treated further in this paper. Since there are N+N+1 products formed when the digital filter is implemented using Eq. (2.32), there will be N+N+1 quantization error terms, which are added together and then added to the output of the filter. The direct implementation forms products with the larger values of the high-gain poles, and thus introduces larger roundoff errors than the parallel or cascade form. Since the parallel and cascade realization forms are actually the sums and products of smaller 1st and 2nd order blocks respectively, the poles are associated with smaller gains and introduce small roundoff errors in each block. The errors introduced into each block are also summed with the filter output, but the resulting error is generally smaller than that of the direct form. For these reasons, the direct realization form is also inferior to the parallel and cascade forms with regard to product quantization errors. [40]

The cascade realization form also has a few difficulties associated with its use. Since the cascade form is actually a direct programming of small 1st and 2nd order filter blocks, the output of each block (i) is the input to the next block (i + 1). The intermediate products may have to be scaled to avoid the overflow problems discussed above. Also, it is sometimes difficult to decide which poles to pair with which zeros to get the smallest gains for each block. [52] Therefore, the parallel realization form may be slightly better than the cascade, but the direct form is definitely inferior with regard to both the coefficient and roundoff quantization problems.

The digital filter realization forms discussed above are the three most widely used in the practical implementation of digital filters. There are countless variations that could be used. For example, a combination of cascade and parallel forms could be used with each realizing a part of the digital filter, or the direct form may be restructured to give improved results. The resulting hybrid filters have the same transfer functions, but different quantization properties that would fall somewhere in between those of the three realization forms discussed above. The choice of a realization form is dependent on the cost and performance desired for a given application, but the problems associated with the actual realization forms of digital filters are well documented and must be taken into account.

2.7 Summary of Chapter 2

In this chapter, the four major quantization errors affecting the practical implementation of a digital filter were presented and examined. These four quantization errors, signal conversion quantization, coefficient quantization, arithmetic quantization, and realization form quantization, will always introduce errors into the input and output of a digital filter implemented on digital hardware using fixed-point arithmetic and finite-arithmetic elements.

Any one of these four errors could become critical and dramatically change the performance of the filter originally designed. When designing a digital filtering system, it is important to take these quantization errors into account and try to minimize their overall effect on the performance of the system.

In the next chapter, a review of research works dealing with the quantization errors discussed in sections 2.1-2.6 will be presented.

3.0 Historical Review of the Literature

A review of the literature covering the quantization errors affecting the practical implementations of linear, time-invariant constant coefficient, IIR digital filters due to finite wordlength is presented below. It would not be possible, nor is it the focus of this paper to make the list of references complete. The literature is so large that selecting a proper subset was the only alternative. This subset includes the major research contributions on the quantization errors discussed in Sections 2.1-2.6 of Chapter 2, that were known to exist at the time this paper was written. The summary of these papers begins with the work of Kaiser in 1965, who conducted some of the earliest research on the practical implementation of digital filters.

3.1 Early Research 1965-1970

It appears that Kaiser [37] in 1965 was the first to examine the problems arising from the actual implementation of digital filters on a finite wordlength computer. Up to that time, no one had adequately treated the problems connected with the realization of digital filters using finite-arithmetic elements.

Kaiser examined the relationship between the accuracy of the representation of the digital filter coefficients due to rounding, and the stability of the filter due to errors in the poles and zeros of the z-transfer function. He also tried to relate filter coefficient accuracy to both the sampling rate and filter complexity deriving an absolute minimum bound on the number of decimal digits needed to represent a stable Nth-order digital filter. Kaiser investigated the coefficient accuracy problem of the three popular realization forms, the direct, cascade, and parallel. He concluded the direct form is definitely inferior in performance to both the cascade and parallel forms, and the parallel form may be slightly favorable to the cascade realization.

Kaiser's findings agreed with Knowles and Edwards [41], who earlier in 1965 had used the three realization schemes mentioned above for a discrete controller in a closed-loop digital control system. They examined the degradation in system performance due to roundoff errors in the multiplication and addition operations performed for each of the three realization forms. Knowles and Edwards also investigated the quantization errors introduced by the signal converting devices.

Gold and Rader [23] looked at the effects of rounding of the multiplication products in a digital filter showing that this rounding introduces additive noise whose magnitude and spectral shape may be modeled by z-transform techniques. Rader and Gold [84] also examined the effects of filter coefficient rounding, and proposed a realization scheme that was less sensitive to parameter quantization than the standard realizations normally used for a digital filter.

Knowles and Olcayto [42] investigated the deviation in the frequency response of a digital filter realized by a finite wordlength machine, from that which would have been obtained with an infinite wordlength machine. They developed a method which showed that the quantization of a digital filter's coefficients can be modeled by a transfer function of error terms used in parallel with the corresponding ideal filter transfer function. Using their model and certain statistical assumptions, Knowles and Olcayto showed the mean-squared difference in frequency response between the actual and ideal filter for different wordlength computers. They used their method to evaluate the Root-Mean-Square (RMS) value of the filter output noise and multiplication roundoff errors, however their method proved to be unsuitable for filters realized in the cascade form.

Gold and Rader [24] in chapter 4 of their book, summarized the quantization errors that have to be considered when a digital filter is designed on the basis of infinite accuracy, and implemented with finite accuracy arithmetic. They concluded that the four important sources of quantization errors that affect the actual realization of a digital filter are: ADC quantization of the input signal, inaccuracies introduced when rounding and truncation are used to represent coefficients on a finite wordlength computer, roundoff quantization of the addition and multiplication operations, and realization form quantization errors.

Research papers appearing in the literature after 1969 fell roughly into two major areas: papers dealing with the coefficient quantization issue, and papers dealing with arithmetic quantization. The other two quantization errors, namely signal conversion quantization and realization form quantization, were almost always considered within papers dealing with the two topics described above. Therefore, a historical review of the literature will be presented based primarily on papers grouped into the coefficient and arithmetic quantization categories. This review will also include a section on the quantization problems for digital filters realized with state equations. State space representation of digital filters and controllers has become an important issue recently, and the quantization problems for state representation must be considered. Finally, research done on quantization error effects for other digital filters, such as the popular fast Fourier transform and finite-impulse response filters, will be considered.

3.2 Papers On Coefficient Quantization Effects

As was discussed in Section 2.4, when a digital filter is designed, it is usually designed assuming the coefficients of the digital filter's transfer function can be represented by as many decimal digits as needed, or by using infinite precision arithmetic. This is rarely the case however, since the filters are implemented in a fixed-point representation, on a computer with a finite wordlength. The coefficients therefore, must be represented by a finite number of bits, and a source of error known as coefficient quantization is introduced. A summary of the major research contributions on the coefficient quantization problem is presented here in chronological order.

Otnes and McNamee [69] derived a threshold of stability for digital filters as a function of the number of bits used to store the fractional portion of the digital filter coefficients for sine and tangent Butterworth low pass filters. They considered the theoretical and actual instability of these filters implemented with both fixed-point and floating-point arithmetic. Otnes and McNamee expanded upon earlier work done by Kaiser [37] by establishing their stability threshold directly in terms of filter bandwidth.

Avenhaus [1] [2] proposed two methods, one direct and one statistical, for calculating the proper wordlength to store coefficients for a digital filter implemented on a finite wordlength machine. His research also included a way to reduce the wordlength obtained by at least three bits by optimizing the filter coefficients in the discrete parameter space.

Suk and Mitra [93] present a random search optimization algorithm to obtain coefficients for the design of digital filters on a computer with finite wordlength. Their algorithm is used to describe a computer-aided approach to reduce the errors associated with coefficient quantization when a digital filter is designed for use on a computer with a specific wordlength.

Sekey [89] measured the RMS error of the actual filter output relative to the "ideal" output for high order digital filters realized as cascade connections of second-order stages. He analyzed the performance difference of high order digital filters in terms of coefficient quantization and internal state variable quantization. He used sine waves, impulses, and triangular waves as reference inputs, and varied the wordlength used to store the filter coefficients from 6 to 16 bits. Sekey found the RMS deviation in output to be a linear function of the number of bits used to store the coefficients (with a slope of -6 dB/bit) for both sources of quantization error, regardless of the type of input signal.

Kwan [45] [46] developed a two stage method to reduce the wordlength needed to store the coefficients of a digital filter. In the first stage, the statistical wordlength is minimized by optimizing the primary parameters of the amplitude-frequency characteristics of the digital filter being designed. Then in stage two, using a non-linear algorithm, the coefficients obtained are further optimized into a discrete set of values.

Zhukov [100] developed a method of synthesizing digital filters whose amplitude-frequency characteristics have a complex shape, taking into account the finite wordlength used to store the filter coefficients. He also discussed the possibility of developing an optimization procedure to minimize the wordlength needed to store the coefficients once they have been obtained.

Ishii [31] introduces a statistical approach for calculating sufficient wordlength needed to store the filter coefficients such that the magnitude of the filter transfer function is described within a prescribed error. He also presents an optimization procedure to obtain optimal coefficient values and minimize the wordlength needed to store the coefficients.

Mingazin [56] developed a method similar to that of Kwan [45] [46] for synthesizing digital filter coefficients of finite wordlength based on the optimum selection of the primary parameters of the amplitude-frequency characteristics of the filter. His method is easier and more practical to implement than Kwan's, but does not include a method for optimizing the discrete coefficient values.

There are many signals, such as medical X-rays, seismic, magnetic, and gravitational data, and photographic data, that are inherently two or more dimensional and require special signal processing techniques [85]. When a digital filter is designed for a two or more dimensional application, the effects of coefficient quantization must still be considered. Sicuranza [90] and Swamy et al. [94] looked at these coefficient quantization effects for 2-dimensional and N-dimensional digital filters respectively. N-dimensional signal processing has seen rapid development in the past few years, and research to identify and quantify the quantization errors for the implementation of N-dimensional digital filters actively continues.

3.3 Papers On Arithmetic Quantization Effects

When a digital filter is implemented on a finite wordlength computer, the internal mathematical operations such as additions, subtractions, and multiplications can introduce quantization errors that affect the performance of the digital filter (see Section 2.5). A chronological summary of arithmetic quantization research papers is presented below.

Jackson [32] did an analysis of the quantization problem due to rounding of the multiplication products occurring in the fixed-point implementation of digital filters. He derived an estimate on the limit cycle amplitude based on an effective-value linear model.

Ebert et al. [19] looked at the overflow oscillation problems of addition operations in cascade and parallel realizations of digital filters using 2's complement arithmetic on a finite wordlength computer. They presented a modified 2's complement adder that could eliminate oscillations when overflow occurred.

Jackson [33] [34] in two later papers did an analysis of the roundoff noise for the fixed-point realization of digital filters in the direct, cascade, and parallel form. He found that roundoff noise was sensitive to the form of the realization used to represent the digital filter. Jackson showed the cascade realization was especially sensitive to the way in which terms were grouped in the factorization of the transfer function, as well as sensitive to the grouping of the sub-filters. Jackson also provided rules of thumb to help choose the proper representation form for a digital filter, but relied on trial and error to obtain optimal results.


Sandberg and Kaiser [88] established a bound on the RMS value of limit cycles due to roundoff errors in the fixed-point implementation of digital filters. Long and Trick [54] derived an absolute bound on the amplitude of limit cycle oscillations due to roundoff errors in fixed-point digital filters. They showed this bound is equal to the RMS bound of Sandberg and Kaiser [88] for real roots, and will never exceed the RMS bound by more than a factor of two for second-order digital filters.

Hwang [28] looked at the generation of roundoff noise in a fixed-point digital filter as a multistage process and proposed a minimum noise realization for cascade digital filters by dynamic programming. Patney and Roy [70] introduced two methods to calculate the multiplicative roundoff noise in digital filters that proved to be computationally more efficient than methods developed earlier. Finally, Chang [13] did a numerical comparison of several low roundoff noise fixed-point digital filters.

3.4 Quantization Effects in State Space Representation

Recently, the representation of digital filters by state equations has grown in importance. State space representation and design of digital filters offers several advantages over the traditional realization forms. First, it describes the relationships of all variables of the system, internal and external, not just the input and output variables. This allows the system designer to understand the total system. Second, this method can be applied to systems that are nonlinear, time-varying, or have several inputs and outputs. Finally, the matrix form of the state space representation allows it to be used with complex systems, and leads to mathematical formulations that are suitable for computer solution [82]. The following is a summary of papers dealing with the quantization errors in a state space representation of digital filters that have appeared recently in the literature.

Barnes and Fam [5] investigated digital filters free of overflow limit cycles based on the norm of the system matrix obtained from the state space description of the filter. They showed that filter realizations that minimize this norm are free of overflow limit cycles, and that overflow stable filters of any order can be built with parallel-cascade structures of these minimum norm filters regardless of the location of the filter poles.


Barnes [6] later did an analysis of the overflow and roundoff noise interaction for fixed-point digital filters with state equations that possess the minimum norm property. He developed an expression for calculating the roundoff noise using scaling for the dynamic range of each of the state variables. Barnes's expression can be applied to the filter as a whole, or to sub-filter sections to obtain structures that yield minimum roundoff noise.

Fam and Barnes [20] showed that every stable, fixed-point, linear digital filter has a state space realization form that, when implemented using 2's complement arithmetic, will be free of all overflow limit cycles. These realizations are nonminimal (minimum norm realizations were described by Barnes and Fam [5] above) and require more multiplications than the standard realization forms, but will always yield limit-cycle-free structures.

Hwang [28] did an analysis of the quantization errors generated when a digital filter is represented with state space equations. He developed an expression to describe the roundoff noise for fixed-point digital filters described by state equations, introducing the effects of amplitude scaling and structure transformation into the noise expression so the interaction of the roundoff noise, filter structure, and dynamic range constraints could easily be studied. There continues to be a sizable research interest looking at the quantization errors for digital filters realized by state equations.

3.5 Quantization Effects on Other Digital Filter Representations

The fast-Fourier transform (FFT) algorithm has proved to be a very popular method for designing and implementing digital filters [10] [26] [66] [85]. Early work on the quantization errors affecting FFT digital filters was done by Weinstein [98] and Welch [99]. Oppenheim and Weinstein [65] did an in-depth analysis of the quantization effects on FFT digital filters. They treated all the quantization errors discussed in Sections 2.1-2.6 of Chapter 2 in relation to the FFT using fixed-point, floating-point, and block floating-point digital filters. Their paper contains a list of over 70 references covering early FFT filter work.

Another popular digital filter design method is the finite-impulse-response, or FIR, digital filter. FIR filters, unlike the infinite-impulse-response (IIR) filters discussed so far, are usually implemented by a nonrecursive structure, and the output of this nonrecursive structure depends only on the present and past values of the input, not on previous output values. FIR filters operate on a predetermined number of input samples considered to be a data window. IIR filters generally achieve excellent amplitude responses at the expense of nonlinear phase. In contrast, FIR filters can have exact linear phase [66]. Linear phase filters are important for applications (e.g. speech processing and data transmission) where frequency dispersion due to nonlinear phase is harmful [85]. Therefore, design techniques for FIR filters are of considerable interest.

The papers by Hermann and Schussler [27], Chan and Rabiner [12], Lim et al. [50], Kodek [43], and D'Addio and Galati [17] are a sampling of research done to deal with the quantization problems that arise when FIR digital filters are implemented using finite-arithmetic elements. The paper by Chan and Rabiner [12] has a good list of references of early work done on the quantization errors affecting FIR digital filters.

3.6 Survey Papers

There are three papers worthy of special mention since they contain a concise analysis of the quantization problems associated with the implementation of digital filters on a finite wordlength computer. All three review techniques, both theoretical and practical, developed to deal with the quantization problems discussed so far, and offer a large list of references for research done up to the time each of the papers was published.

The first, by Liu [51] in 1971, provided a summary of the most common sources of error due to finite wordlength. Liu suggested that there were five major sources of quantization error affecting the implementation of a digital filter: quantization of the input signal, quantization of the filter coefficients, roundoff errors of arithmetic operations, quantization errors of the realization form of the filter, and errors introduced by the type of arithmetic. Liu described the effects of each of these five sources of error, and then reviewed the techniques used to investigate these errors and the solutions proposed to eliminate them that appeared in the literature prior to and including 1971.

Oppenheim and Weinstein [65], in a twenty page paper in 1972, did a very detailed analysis of the quantization errors that affect the implementation of a digital filter on a finite wordlength computer. They did a broader and deeper analysis of the problem than Liu [51] did earlier, including other possible quantization sources such as roundoff and truncation errors of different number systems, the internal representation of numbers in a digital computer, limit cycles, etc. Quantization errors of fixed-point and floating-point digital filters were discussed. All error sources were then examined with respect to the FFT implementation of digital filters.

The third and final paper was done by Classen et al. [15] in 1976. Their goal was to categorically summarize the research done up to 1976 on the quantization errors that affect digital filters when they are implemented on a finite wordlength computer. Unlike Liu [51] and Oppenheim and Weinstein [65], Classen et al. focused their paper on the roundoff and overflow quantization errors affecting digital filters. Their summary included concise and easy-to-use tables listing research done on the roundoff problem, divided into several categories. A list of over 80 international references is included.

These three papers were singled out since they offer a concise and complete explanation of the major quantization errors that occur and can be expected when a digital filter is designed and implemented on a finite wordlength computer using finite arithmetic elements. Anyone interested in gaining an overview of the quantization problem for digital filters could (and should) start with these three papers, and then refer to the literature cited to examine a particular problem of interest in depth.

3.7 Summary of Chapter 3

It has been well documented in the literature that quantization errors will occur when a digital filter is designed using infinite-precision arithmetic and implemented using finite-precision arithmetic on a finite wordlength computer. There appears to be agreement that the major sources of quantization errors affecting the implementation of a digital filter with a finite wordlength digital system are: signal conversion quantization, coefficient quantization, arithmetic quantization, and quantization caused by the realization form of the digital filter. There are of course other quantization errors that can and do affect digital filters. In the next chapter a method for simulating digital filters on a microcomputer will be introduced.


4.0 Digital Simulation of Real-Time Systems

In the past, most of the signal transformations implied by filters and controllers were accomplished by analog hardware. However, in the last two decades there has been a rapid movement toward the digital processing of signals. Low cost digital computers have allowed the use of digital processing of signals to implement filtering and control algorithms in real-time. These signal processing algorithms were often complex, and required many sophisticated calculations. Consequently, digital signal processing was usually done using floating-point arithmetic. Floating-point arithmetic is assumed to offer infinite precision, so the filter coefficient storage and internal mathematical calculations are treated as free of the quantization errors discussed in Chapter 2. Digital computers with floating-point arithmetic usually have a slower computational speed and are more expensive than computers using only fixed-point (all integer) arithmetic. In many real-time applications fixed-point arithmetic is used because it results in faster and more economical implementations. In Chapter 2, it was shown that quantization errors can occur when a digital filter or controller is implemented on a digital computer using fixed-point arithmetic and finite-arithmetic elements. These quantization errors, introduced by the digital computer and the interface devices of a digital system, have to be considered. The desired end result for the system designer is a cost effective system that minimizes these errors and meets the specifications for a given application.

Traditionally, digital system designers used knowledge gained over many years of personal experience, together with systematic methods, to design and build digital systems. Many times, however, these methods involved trial and error testing and significant design changes before the digital system met its design criteria. If design tools were used at all, the designers of digital systems used large simulation packages on mainframe computers. These simulation packages, such as CSMP (the Continuous System Modeling Program), were written in the FORTRAN programming language and functioned in a batch environment. Although these simulation packages were very powerful and extremely valuable [72], their use usually meant that large computing resources were needed, programming expertise was required, and there was a slow turnaround time for experimental results. The simulation results and calculations were always obtained with floating-point arithmetic, and offered no option of simulating a digital system that would be implemented with fixed-point arithmetic. Although interface devices such as the DAC and the ADC were represented, there was no option to define or change the word size of these devices to show their effect on the overall system. Consequently, a realistic picture of the interdependencies of the components of the complete digital system could not be easily obtained.

4.1 Justification for Developing a New Digital Simulation Scheme

In the past few years, with the availability of low cost modern microcomputers, and computer language compilers for general use, it is feasible that most every system designer will have their own microcomputer with a selection of programming languages.

It was desirable to develop a digital simulation scheme for real-time systems that is suitable for use on a microcomputer, is programmed in a readily available and popular programming language, offers distinct advantages over existing simulation packages, and is a valuable tool when used with the traditional design methods mentioned above. This new simulation scheme should be as flexible as possible, allowing the system designer to investigate and analyze all components of the system. It should be interactive, allowing system parameters to be changed from the keyboard with the results of these changes displayed immediately. It should be as user friendly as possible, and require no further programming or manipulation. The algorithm for the simulation should model real-time systems implemented with either fixed-point or floating-point arithmetic, and allow the characteristics of the interface devices to be changed. The simulation should give results that are accurate and can be used with confidence to build an actual real-time system whose performance characteristics are very close to those determined with the simulation. This simulation scheme should be a practical and applied tool that system designers can use to quickly obtain a digital system that meets the design criteria.

A digital simulation scheme suitable for a microcomputer that has all of the characteristics listed above was designed, developed, and implemented. It is written in the C programming language. The C programming language was chosen because it is a portable, high level language that also allows many features otherwise available only in assembly language. Compilers for the C language are readily available for most microcomputers, and programs designed for microcomputer simulation use can also be used on actual real-time systems.

4.2 Microcomputer Simulation Scheme

The digital simulation scheme was developed to simulate either a digital filtering (open-loop) system or a closed-loop computer control system. A software switch allows the feedback path to be connected or disconnected. The digital filtering system is based on the block diagram of Figure 4.1.

[Figure 4.1: x(t) -> ADC -> x(k) -> digital filter Gc(z) -> y(k) -> DAC (ZOH), (1 - e^{-sT})/s -> y(t)]

Fig. 4.1 Block Diagram of Simulation Digital Filtering System.

This digital filtering system consists of the digital filter transfer function, Gc(z), and the interface devices, the Analog-to-Digital (ADC) and Digital-to-Analog (DAC) converters. The ADC is a sampler, and the DAC can be represented mathematically by the transfer function of the Zero-Order-Hold (ZOH) given by

$$G_h(s) = \frac{1 - e^{-sT}}{s} \tag{4.1}$$

where T represents the sampling time of the system.

The same simulation scheme can also be used to simulate closed-loop computer control systems. The digital control system is based on the block diagram in Figure 4.2.

[Figure 4.2: r(k), R(z) -> summing junction -> e(k), E(z) -> controller Gc(z) -> m(k), M(z) -> DAC (ZOH) Gh(s) -> m(t), M(s) -> plant Gp(s) -> c(t), C(s); feedback c(k), C(z) returns through the ADC to the summing junction]

Fig. 4.2 Block Diagram of Simulation Closed-loop Control System

It consists of the digital controller transfer function Gc(z), the transfer function of the analog plant, Gp(s), and the interface devices, the Analog-to-Digital (ADC) and Digital-to-Analog (DAC) converters. In Fig. 4.2, r represents the reference input, e the error, m the manipulation, and c the controlled output. If the feedback path through the analog plant Gp(s) is disconnected, the control system of Fig. 4.2 becomes the digital filtering system of Fig. 4.1, with r(k) = x(k), m(k) = y(k), and y(t) = m(t). Consequently, the same algorithm can be used to simulate both systems by connecting or disconnecting the feedback path. The simulation was implemented based on the terminology of the closed-loop control system of Fig. 4.2.

The assumed form of the digital filter/controller is given by its 9th order transfer function

$$G_c(z) = \frac{M(z)}{E(z)} = \frac{A_0 + A_1 z^{-1} + \cdots + A_9 z^{-9}}{1 + B_1 z^{-1} + \cdots + B_9 z^{-9}} \tag{4.2}$$

where $A_0$ through $A_9$ and $B_1$ through $B_9$ represent the filter/controller coefficients. If Eq. (4.2) is cross-multiplied, and the negative powers of z are interpreted as the delay operators they represent, then the difference equation form of the algorithm is

$$m_k = (A_0 e_k + A_1 e_{k-1} + \cdots + A_9 e_{k-9}) - (B_1 m_{k-1} + B_2 m_{k-2} + \cdots + B_9 m_{k-9}). \tag{4.3}$$

The variables $e_k$ through $e_{k-9}$ represent the present and previous error values, and $m_k$ through $m_{k-9}$ represent the present and previous manipulation values. In the case of digital filtering or open-loop control, the error value is set equal to the reference input value since there is no feedback.
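The difference equation of Eq. (4.3) can be sketched directly in C. The fragment below is only an illustration in floating-point arithmetic; the names `DiffEq` and `diffeq_step` and the array layout are assumptions made here and are not taken from the program listed in Appendix A.

```c
#include <stddef.h>

#define ORDER 9  /* maximum filter/controller order, as in Eq. (4.2) */

/* Hypothetical container for the coefficients and delay lines of Eq. (4.3). */
typedef struct {
    double a[ORDER + 1];  /* A0..A9, numerator coefficients           */
    double b[ORDER + 1];  /* B1..B9 in b[1]..b[9]; b[0] is not used   */
    double e[ORDER + 1];  /* e[i] holds the error value e(k-i)        */
    double m[ORDER + 1];  /* m[i] holds the manipulation value m(k-i) */
} DiffEq;

/* Advance the difference equation by one sample and return m(k). */
double diffeq_step(DiffEq *f, double e_k)
{
    size_t i;
    double acc;

    /* Shift the delay lines so that index i again means "i samples ago". */
    for (i = ORDER; i > 0; i--) {
        f->e[i] = f->e[i - 1];
        f->m[i] = f->m[i - 1];
    }
    f->e[0] = e_k;

    /* m(k) = sum of A_i * e(k-i) minus sum of B_i * m(k-i), Eq. (4.3). */
    acc = f->a[0] * f->e[0];
    for (i = 1; i <= ORDER; i++)
        acc += f->a[i] * f->e[i] - f->b[i] * f->m[i];

    f->m[0] = acc;
    return acc;
}
```

A lower order filter is obtained by leaving the unused coefficients at zero. For example, with A0 = 0.5 and B1 = -0.5 and a unit step input, successive calls return 0.5, 0.75, and 0.875.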

This simulation was written to represent events as they would occur in an actual real-time system. In an actual real-time system, the digital computer obtains a reference value every T seconds via the ADC, and each voltage amplitude value is converted into a binary number. The converted value is then fed into the digital algorithm to calculate the manipulation value, which is then output via the DAC. The numerical content of the DAC digital register is converted to an analog voltage, which the DAC holds constant until the content of the register is updated. The digital filter/control algorithm and the ADC and DAC were modeled to give an accurate representation of the components of a digital system. The overall functionality of the simulation scheme should match that of an actual real-time system.

4.3 Programming Implementation

This microcomputer simulation scheme was written in Computer Innovations' C86 C for use on IBM and IBM-compatible microcomputers. It consists of 5 modules: a portable module, 'digsim.c', which is the main program and contains the actual filtering/control algorithm; an IBM-specific graphics module, 'graphics.c', which allows the simulation to be displayed in an interactive graphics format; and three header files, 'defines.h', 'plot.h', and 'graph_va.h', that contain program constants, graphics constants, and global variables for the graphics module respectively. The 5 modules are listed in Appendix A. The main module was written to be portable. If the calls to the graphics routines in digsim.c (the main program module) are removed, and an additional C programming statement that sets 'done' to a nonzero value is added before the ' /* end while(!done) */ ' comment, the main programming module becomes a portable module suitable for use in any C environment. The results obtained from the simulation run are then displayed in a tabular numeric form. The module digsim.c was compiled, linked, and run on IBM, Apple Macintosh, and Compaq microcomputers, as well as with UNIX C under 4.2BSD UNIX and VAX/VMS C under version 4.4 of VMS, with no changes required. The graphics module, graphics.c, is IBM PC dependent and relies on the Computer Innovations C86 C graphics routines for the IBM PC. If this module is to be modified to work on other microcomputers, the calls to these routines must be replaced with machine-specific calls. The rest of the graphics module is portable C code.

4.4 Simulation Characteristics

The simulation begins by asking the user to choose either a digital filtering system or a closed-loop computer control system. The same algorithm and internal calculations are done in both cases, but a software switch disconnects/connects the feedback path. The variable names within the algorithm reflect the closed-loop control system of Fig. 4.2.

The next choice the user has is between the floating-point and fixed-point arithmetic modes. If the floating-point mode is used, the filter/controller coefficients are stored as floating-point values, and all internal calculations are carried out with floating-point arithmetic. By using floating-point arithmetic, it is assumed infinite precision is available, and none of the internal quantization errors described in Chapter 2 will occur. On the other hand, the fixed-point mode stores the coefficients as scaled integer values, and all internal calculations are done with integer values. These integer values are double precision integers. They occupy at least 32 bits of storage on IBM PC microcomputers and may occupy more on other machines. A 32-bit word allows a program variable to take on a value of approximately ±2 x 10^9. Many programming languages, such as Basic, do not allow double precision integer values, and this fact would prevent the user from inputting a coefficient or interface word size of greater than 15 bits, since 2^16 = 65536 and this would overflow the maximum signed integer value of 32767. Internal calculations would also tend to overflow. This problem is easily solved by using the long int, or double precision integer, values available in the C programming language. The fixed-point mode allows the user to simulate an assembly language implementation. The quantization errors due to finite wordlength and finite arithmetic elements can be analyzed using this mode.

The start, finish, and sampling time of the system are then input. The start time is usually assumed to be T = 0. If the user has chosen digital filter mode, the input signal, noise signal, and noise multiplier are input. The input signal and noise signal are input as rotational frequencies (Hz), and converted to radian frequencies internally by the program. The noise multiplier determines the magnitude of the noise signal with respect to the input signal, and is typically 10%. A limit value, which is used so the filter/control output does not exceed a predetermined limit, is input. This value helps prevent limit cycles due to overflow and is more important for the closed-loop control option of the program. The limit is also important in real-time systems since too large an output voltage may damage sensitive equipment.

Unlike most existing simulation schemes, the characteristics of the interface devices can be varied in this simulation program. The voltage level and the number of bits in the ADC and DAC wordlength are input by the user. A scale factor is determined for each interface device. The ADC scale is given by

$$\text{adc\_scale} = \frac{2^{(\text{bits\_adc}-1)} - 1}{\text{volts\_adc}}, \tag{4.4}$$

while the DAC scale is

$$\text{dac\_scale} = \frac{\text{volts\_dac}}{2^{(\text{bits\_dac}-1)} - 1}. \tag{4.5}$$

As an example, if the program was used to simulate a system which has a ±10 volt, 12-bit ADC and DAC, then the ADC scale would be

$$\text{adc\_scale} = \frac{2^{12-1} - 1}{10} = \frac{2^{11} - 1}{10} = 204.7. \tag{4.6}$$

Similarly, the DAC scale would be

$$\text{dac\_scale} = \frac{10}{2^{11} - 1} = 0.0048851. \tag{4.7}$$
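The two scale factors of Eqs. (4.4) and (4.5) can be sketched as a pair of small C functions. This is an illustrative fragment only; the function names `adc_scale_for` and `dac_scale_for` are hypothetical, and the arguments follow the quantities named in the text (wordlength in bits, full-scale voltage in volts).

```c
/* Eq. (4.4): counts per volt for an ADC of the given wordlength.
 * One bit is reserved for the sign, so full scale is 2^(bits-1) - 1 counts. */
double adc_scale_for(int bits_adc, double volts_adc)
{
    return (double)((1L << (bits_adc - 1)) - 1) / volts_adc;
}

/* Eq. (4.5): volts per count for a DAC of the given wordlength. */
double dac_scale_for(int bits_dac, double volts_dac)
{
    return volts_dac / (double)((1L << (bits_dac - 1)) - 1);
}
```

For the ±10 volt, 12-bit devices of the example, `adc_scale_for(12, 10.0)` gives 2047/10 = 204.7 and `dac_scale_for(12, 10.0)` gives 10/2047, so the product of the two scale factors is 1 and a sampled value passes through with its magnitude unchanged.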


If an input of 1 volt was used, and all the algorithm did was sample and display the input, the following sequence would take place

input_value --> input_value * adc_scale --> internal_value --> internal_value * dac_scale --> output_value (4.8)

Using the ±10 volt, 12-bit ADC and DAC described above, Eq. (4.8) becomes

1 --> 1 * 204.7 --> 204.7 --> 204.7 * 0.0048851 --> 1. (4.9)

As can be seen from Eq. (4.9), the input and output values are of the same magnitude, so the ADC and DAC scale factors are consistent. If the two interface devices do not have the same wordlength, the scale of the smaller device has to be adjusted. This assures the input and output values will be correctly encoded and decoded with the proper magnitudes. If the DAC has a larger wordlength than the ADC, the ADC scale is adjusted by

$$\text{adc\_scale} = \text{adc\_scale} \times 2^{(\text{bits\_dac} - \text{bits\_adc})}. \tag{4.10}$$

If the ADC has a larger wordlength than the DAC, the DAC scale is adjusted by

$$\text{dac\_scale} = \frac{\text{dac\_scale}}{2^{(\text{bits\_adc} - \text{bits\_dac})}}. \tag{4.11}$$

To illustrate this point, consider a ±10 volt, 11-bit ADC and a ±10 volt, 10-bit DAC. The adc_scale for this example is

$$\text{adc\_scale} = \frac{2^{10} - 1}{10} = 102.3. \tag{4.12}$$

The dac_scale is

$$\text{dac\_scale} = \frac{10}{2^{9} - 1} = 0.0195694. \tag{4.13}$$

An input of 1 volt, with a sample and display algorithm, would be

1 --> 1 * 102.3 --> 102.3 --> 102.3 * 0.0195694 --> 2. (4.14)

It is clear from Eq. (4.14) that the 1 volt input was not properly scaled. If the DAC scale, since it has the smaller wordlength, is adjusted by using Eq. (4.11), the new dac_scale becomes

$$\text{dac\_scale} = \frac{0.0195694}{2^{11-10}} = 0.0097847. \tag{4.15}$$

Substituting the new dac_scale from Eq. (4.15) into Eq. (4.14), the new output value becomes

1 --> 1 * 102.3 --> 102.3 --> 102.3 * 0.0097847 --> 1. (4.16)
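The adjustment of Eqs. (4.10) and (4.11) can be sketched as a single C helper that modifies whichever scale factor belongs to the device with the smaller wordlength. The name `match_scales` and its pointer-based interface are assumptions made for this illustration, not part of the program described here.

```c
/* Hypothetical helper: apply Eq. (4.10) or Eq. (4.11) in place.
 * When the wordlengths are equal, neither scale factor is changed. */
void match_scales(int bits_adc, int bits_dac,
                  double *adc_scale, double *dac_scale)
{
    if (bits_dac > bits_adc) {
        /* The DAC is wider: scale the ADC value up, Eq. (4.10). */
        *adc_scale *= (double)(1L << (bits_dac - bits_adc));
    } else if (bits_adc > bits_dac) {
        /* The ADC is wider: scale the DAC value down, Eq. (4.11). */
        *dac_scale /= (double)(1L << (bits_adc - bits_dac));
    }
}
```

For the 11-bit ADC and 10-bit DAC of the example, the ADC scale stays at 102.3 and the DAC scale is halved from 10/511 to 10/1022, which is approximately 0.0097847.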

The input value now matches the output value in Eq. (4.16). If the two interface devices differ in size, the smaller of the two will be scaled to achieve the proper input and output values.

If the user has chosen to use the simulation in the fixed-point mode, the number of bits used to store the controller coefficients is input. For a fixed-point implementation, all coefficient values are multiplied by a scale factor, truncated, and stored as integer values. The number of bits for the coefficient storage is used to determine a scale factor to store the filter/controller coefficients. This coefficient scale factor is

$$\text{coeff\_scale} = 2^{\text{bits\_coeff}}. \tag{4.17}$$

Up to a 9th order filter/controller can be input. The filter/controller coefficients input are stored in one of two ways. In the floating-point mode, they are stored as floating-point values, which are assumed to have infinite precision. In the fixed-point mode, the coefficients are multiplied by a scale factor (Eq. (4.17)) based on the number of bits that will be used to store the coefficients. For example, if a filter coefficient was 0.345, and it was to be stored in the fixed-point mode with 16 bits, the actual value stored would be

$$\mathrm{TRUNC}(0.345 \times 2^{16}) = \mathrm{TRUNC}(0.345 \times 65536) = \mathrm{TRUNC}(22609.92) = 22609. \tag{4.18}$$
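The coefficient scaling of Eqs. (4.17) and (4.18) can be sketched in C with long integers. The function names below are hypothetical and the fragment is a minimal illustration under the assumptions stated in the text (multiply by 2^bits_coeff, then truncate); it is not the code of the simulation itself.

```c
/* Eq. (4.17): scale factor for coefficients stored with bits_coeff bits. */
long coeff_scale_for(int bits_coeff)
{
    return 1L << bits_coeff;
}

/* Scale a real coefficient and truncate toward zero, as in Eq. (4.18).
 * The cast from double to long discards the fractional part. */
long quantize_coeff(double coeff, int bits_coeff)
{
    return (long)(coeff * (double)coeff_scale_for(bits_coeff));
}

/* Divide an accumulated fixed-point result by the coefficient scale
 * factor to return it to its proper magnitude. Integer division
 * truncates, so a further fraction may be lost at this step. */
long remove_coeff_scale(long scaled_value, int bits_coeff)
{
    return scaled_value / coeff_scale_for(bits_coeff);
}
```

With 16 bits, `quantize_coeff(0.345, 16)` returns 22609. Multiplying that stored coefficient by an integer input of 100 gives 2260900, and removing the scale factor yields 34, whereas the exact product 0.345 * 100 is 34.5. The difference is the kind of arithmetic quantization error the fixed-point mode is meant to expose.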

Truncation is used instead of rounding to simulate a fixed-point assembly language implementation. This scale factor is removed before a value is output by the filter or controller: the final output value is divided by the coefficient scale factor to return its magnitude to the proper value. If the user has chosen the closed-loop control option, the coefficients representing the pulse transfer function model of the analog plant are input. Up to a 9th order plant can be used. All the plant coefficients are floating-point values. The closed-loop control option allows only a step input to the controller. After the plant model is input, the size of the step input (or reference) is obtained from the user.

The actual filtering/control algorithm is done with either fixed or floating-point arithmetic depending on the option chosen by the user, not both. The variable names used in the algorithm are based on the closed-loop control terminology of Fig. 4.2. Software flags which depend on the options chosen by the user determine how the algorithm will be processed.

4.5 Digital Filtering Simulation

If the floating-point option is chosen when simulating a digital filtering system, the sequence of events is as follows. The first action performed is to calculate the input signal and noise signal at the sampling time T. These signal values are obtained by a call to the built-in sine function of the C programming language, applied to the signal radian frequency multiplied by the sampling time. The noise signal value is then multiplied by the noise multiplier. The signal and noise values are added together to form the current reference value. Since there is no feedback in a digital filtering system, the plant output is zero. The reference value is converted to an integer value by multiplying it by the ADC scale and truncating. This is done implicitly in the C programming language. For example, ireference in the C statement, 'ireference = REF[pt] * adc_scale;', is an integer variable that receives the truncated product of the reference input and adc_scale. The current error value, E[0], in the C statement, 'E[0] = ireference - iplant;', is a real value, and is set equal to the reference input (an integer value) minus the plant output (which is zero for a digital filtering system).


This kind of mixed-type conversion is not allowed in many programming languages, which do not permit values of different types to be combined, but it is one of the handy features of the C programming language. In an actual real-time system, the ADC value that is input to a filter/control algorithm is always an integer value. Floating-point ADCs are rare [44]. (The input value to the digital word of an ADC is usually truncated, and this loss of information becomes especially critical in closed-loop control systems.) Since there is no feedback, the error value is set equal to the reference input. From here on all calculations are done with floating-point values.

As can be seen in Eq. (4.3), the only term of the difference equation that depends on the present time is the product A0·e_k. All other terms depend on the previous values of m and e. This suggests that the term sum, in the C statement 'M[0] = A0 * E[0] + sum;', can be calculated ahead of time. When the program is first started, for instance, the term sum is zero, and the new sum is calculated for the next loop iteration as soon as the present manipulation value is moved out through the DAC. This scheme ensures that the time between sampling the input signal and sending out the new manipulation (output) is kept to a minimum. This sequence of events is meant to duplicate actual real-time programming practices, where the emphasis is placed on speed of computation. In an actual real-time implementation, after the manipulation value is calculated, it is checked against a limit value. If the manipulation value is greater than the limit, it is set to the limit. If it is smaller than the negative limit, it is set to the negative limit. The manipulation value is output through the DAC by multiplying it by the DAC scale factor.
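The split between the time-critical output calculation and the precomputed sum might be sketched as follows. This is a sketch under stated assumptions, not the simulation's own code: the struct layout, the function names, and the fixed order are hypothetical, and the difference equation is assumed to have the form m_k = A0·e_k + A1·e_{k-1} + ... - B1·m_{k-1} - ..., which may differ in sign convention from Eq. (4.3).

```c
#define ORDER 2   /* illustrative filter order (assumption) */

typedef struct {
    double A[ORDER + 1];  /* numerator coefficients A0..An */
    double B[ORDER + 1];  /* denominator coefficients; B[0] is unused */
    double E[ORDER + 1];  /* E[0] = e_k, E[1] = e_{k-1}, ... */
    double M[ORDER + 1];  /* M[0] = m_k, M[1] = m_{k-1}, ... */
    double sum;           /* terms that depend only on past values */
    double limit;         /* output saturates at +limit and -limit */
} Filter;

/* Time-critical part: one multiply, one add, then the limit check. */
double filter_output(Filter *f, double error)
{
    f->E[0] = error;
    f->M[0] = f->A[0] * f->E[0] + f->sum;
    if (f->M[0] > f->limit)
        f->M[0] = f->limit;
    if (f->M[0] < -f->limit)
        f->M[0] = -f->limit;
    return f->M[0];
}

/* Done after the output has been sent: shift the stored values back one
 * sample, then precompute the sum for the next sampling time.
 * ASSUMPTION: denominator terms enter with a minus sign. */
void filter_update(Filter *f)
{
    int i;

    for (i = ORDER; i >= 1; i--) {
        f->E[i] = f->E[i - 1];
        f->M[i] = f->M[i - 1];
    }
    f->sum = 0.0;
    for (i = 1; i <= ORDER; i++)
        f->sum += f->A[i] * f->E[i] - f->B[i] * f->M[i];
}
```

In a sampling loop, filter_output() would be called as soon as the new error value is available, and filter_update() only after the manipulation has been sent to the DAC, so that the delay between input and output is a single multiply and add.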

The output value is a floating-point number representing an analog voltage. At this point in the simulation, the values of the sampling time, the input signal, noise signal, signal plus noise (reference) and the filtered output (manipulation) are printed. These values are also saved in arrays so that they can be written to a file later if the user wants to save the values of the simulation run. After printing out the values for the current time T, the error and manipulation values of Eq. (4.3) are updated, i.e. e_{k-9} = e_{k-8}, ..., e_{k-1} = e_k and m_{k-9} = m_{k-8}, ..., m_{k-1} = m_k. The previous error and manipulation values are updated only to the order of the filter/controller. If the filter/controller were 2nd order, only e_k, e_{k-1}, e_{k-2} and m_k, m_{k-1}, m_{k-2} would be used and updated. The sum representing the rest of the difference equation in Eq. (4.3) is then recalculated to be ready for the next sampling time. This whole process is repeated in a 'for' loop. Each iteration of the loop represents one sampling time T. The 'for' loop is executed from the start time to the finish time the user input, with an increment of the sampling time.

If the user had chosen the digital filtering mode with fixed-point arithmetic instead of floating-point, the same actions are performed, but calculations are done with integer values. The signal and noise values are obtained the same way as for the floating-point mode, and then summed and multiplied by the ADC scale to obtain the reference value.

The manipulation value is calculated by multiplying the A0 coefficient by the error value, and adding the sum of the difference equation terms as described for the floating-point example above. The calculations here are all integer, and the coefficients have been scaled at input and stored as integer values. The variable names are the same with the exception that the integer variables all have a prefix of 'I'. For example, A0 becomes IA0, A1 becomes IA1, etc. The manipulation value is divided by the coefficient scale before it is output. This ensures that the manipulation output magnitude will match the input magnitude of the signal. Since all the coefficient values were scaled up by the coefficient scale to be stored as integers, all algorithm calculations are much larger in magnitude, so the scale factor must be removed before output. The manipulation value is then checked against an integer limit and a negative integer limit, then output through the DAC by multiplying it by the DAC scale. The error and manipulation values are updated the same way as described above, and the sum is calculated for the next sampling time. The fixed-point option has one additional feature.

Since the calculations are done with integers, dividing the manipulation value by the coefficient scale gives only the quotient; the remainder is dropped by the integer division. However, the user has the option of saving the remainder and using it in subsequent calculations. The remainder is saved by performing the modulus operation on the manipulation before it is divided. This is done in the C statement, 'rem_int = IM[0] % coeff_scale;'. The modulus operator returns the remainder of an integer division. If the user chooses to use this option, the remainder value is added to the next manipulation along with the sum of the difference equation terms. This recovers some of the precision that would otherwise be lost to the truncation done by integer division. Only one of the two sets of calculations is carried out, depending on what math mode the user has chosen. Internal values from either mode can be printed out at any time if the user wishes to monitor the internals of the algorithm.
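The remainder option can be illustrated with a small sketch. The struct and function names below are hypothetical, and the way the carried remainder interacts with the stored manipulation history in the actual program is not shown here; the sketch only demonstrates saving the remainder with the modulus operator before the division and adding it to the next manipulation.

```c
typedef struct {
    long IA0;          /* A0 coefficient, already scaled to an integer */
    long coeff_scale;  /* 2 raised to the number of coefficient bits */
    long rem_int;      /* remainder saved from the previous sample */
    int  use_rem;      /* 1 = add the saved remainder to the next sample */
} FixedStage;

/* One fixed-point output calculation. isum stands for the precomputed
 * integer sum of the remaining difference equation terms. */
long fixed_output(FixedStage *s, long error, long isum)
{
    long im = s->IA0 * error + isum;

    if (s->use_rem)
        im += s->rem_int;              /* carry from the previous sample */
    s->rem_int = im % s->coeff_scale;  /* saved before the division */
    return im / s->coeff_scale;        /* integer division drops the remainder */
}
```

With a coefficient of 0.5 stored in 6 bits (IA0 = 32, coeff_scale = 64) and a constant error of 3, the ideal output is 1.5 per sample. Without the remainder every sample yields 1; with the remainder the outputs alternate between 1 and 2, so the average matches the ideal value.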

4.6 Closed-Loop Control Simulation

The events for a closed-loop control system follow the block diagram of Fig. 4.2. The feedback path through the analog plant is connected, so the system functions around a closed loop. At time T = 0, the output of the analog plant to be controlled is obtained. The plant output is calculated in the same fashion as the manipulation value. The plant output is set equal to the current plant input times Q0, a plant model numerator coefficient, plus the plant_sum. This is done in the C statement, 'PLANT_OUPUT[0] = VM[0] * Q[0] + plant_sum;'.

The plant_sum is similar to the variable sum in the difference equation form of the algorithm. The plant_sum starts off with a value of zero and, like the variable sum, is calculated for the time T + 1 during sampling time T. The plant output is sampled via the ADC, i.e. the plant output is multiplied by the ADC scale and truncated to an integer value. The reference in this case is the value of the step input entered at the keyboard by the user. This reference value is also scaled by multiplying it by the ADC scale. The step input was entered as a voltage, so that voltage is scaled to represent the proper magnitude value at the ADC input. The error is then calculated by subtracting the current plant output from the current reference value. The error term in this closed-loop control system is the difference between the reference input and the actual plant output. The manipulation is adjusted according to the error term to try to keep the plant output as close to the reference input as possible. This error term is used in the calculation to obtain the manipulation as was done in the digital filter, i.e. M[0] = A0 * E[0] + sum. The manipulation is then checked against the positive and negative limit values. If it exceeds the limit in either direction, the manipulation is set equal to either the positive or negative limit value. The manipulation is then output through the DAC by multiplying it by the DAC scale.

For this closed-loop option, the sampling time, reference input, error value times the DAC scale, manipulation times the DAC scale (after output through the DAC), and plant output are printed. Since the error value is an actual error value based on the internal calculations, it is multiplied by the DAC scale to allow the user to compare this term with the manipulation output value. If this were not done, the error term would be much larger, since its value is determined from a subtraction of two numbers that have been multiplied by the ADC scale. The same reasoning is used for the manipulation value. This print statement can be changed to monitor the internal values if the user desires. Updates are then done, and the new sum value is calculated as was done for the digital filter. Since there is feedback, and the manipulation is output to the analog plant, the terms in the difference equation representing the model of the analog plant have to be updated in a fashion similar to that of the difference equation in the actual filter/control algorithm. The plant sum is then calculated for the next sampling time T.
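One pass around the closed loop can be sketched as follows. Everything specific in this sketch is an assumption made for illustration: the controller is reduced to a single gain (so its sum term is zero), the plant is a hypothetical first-order model y_k = q0·u + plant_sum with plant_sum = -d1·y_{k-1}, and the struct and function names do not come from the simulation program.

```c
typedef struct {
    double adc_scale;  /* counts per volt at the ADC */
    double dac_scale;  /* volts per count at the DAC */
    double a0;         /* controller gain (hypothetical one-term controller) */
    double q0;         /* plant numerator coefficient (hypothetical plant) */
    double d1;         /* plant denominator coefficient (hypothetical plant) */
    double vm;         /* voltage currently held at the DAC output */
    double plant_sum;  /* plant terms that depend on past values */
    double limit;      /* manipulation limit, in counts */
} Loop;

/* Simulate one sampling time; returns the plant output in volts. */
double loop_step(Loop *s, double step_volts)
{
    double plant_out = s->vm * s->q0 + s->plant_sum;     /* plant output */
    long iplant = (long)(plant_out * s->adc_scale);      /* sampled, truncated */
    long ireference = (long)(step_volts * s->adc_scale); /* scaled reference */
    double error = (double)(ireference - iplant);
    double m = s->a0 * error;                            /* manipulation, counts */

    if (m > s->limit)
        m = s->limit;
    if (m < -s->limit)
        m = -s->limit;
    s->vm = m * s->dac_scale;                 /* output through the DAC */
    s->plant_sum = -s->d1 * plant_out;        /* ready for the next sample */
    return plant_out;
}
```

With a 12-bit, ±10 volt interface (adc_scale = 204.7, dac_scale = 1/204.7), unity controller gain, q0 = 0.5 and d1 = -0.5, a 1 volt step settles near 0.5 volt, with a small offset caused by the truncation at the ADC.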

The closed-loop control system in the fixed-point mode functions the same way as in the floating-point mode, except that the coefficient storage and the internal calculations are done with integer arithmetic. The same internal logic that applied to the fixed-point digital filter applies to the closed-loop control system, except that there is now feedback through the analog plant.

As should be apparent in the above examples, the same algorithm is used for both systems. If the input and output through the analog plant is disconnected, the algorithm is an open-loop control or digital filtering system. If the analog plant feedback path is used, the algorithm is a closed-loop computer control system. This makes the simulation very flexible, and allows software switches to control what really happens.

After the algorithm has run from start time to finish time, the program switches to the graphics mode. The values that were printed out during the execution of the algorithm were saved in arrays. These values are now sent to a graphics routine to be plotted on the screen. Since the graphics routines were written to handle double precision values, if the simulation was used in the fixed-point mode, the values saved as integers in arrays during the algorithm execution are copied into double precision real number arrays before being sent to the plotting package. This step avoids duplicating the whole graphics package to handle the plotting of integer values instead of reals.

The graphics routines plot the values sent over on the screen. These graphics routines are completely IBM PC dependent. The user is then allowed to change system parameters in an interactive box at the bottom of the graphics screen. The algorithm is re-run with the new parameters until the user enters an 'e', for end, in the graphics mode. There is a flag named 'done' that controls the execution of the algorithm. This flag remains clear until the user enters an 'e'; the done flag is then set, and the program terminates. Until that time, the user may continually change system parameters and see the results on the graphics screen. The actual values are not printed on the screen as was done during the first run of the simulation, but are plotted graphically. When the user inputs an 'e', the program asks whether the points saved in the arrays for the current simulation run should be saved in a file. If the user does want to save these points, a file is created and the current simulation values are saved. The program then terminates. The user can re-run the algorithm as many times as desired without re-entering system parameters.

There are 3 other subroutines included in the main module. They are file_open(), getnum(), and dgetnum(). The routine file_open() is used to open a file with error checking. The routines getnum() and dgetnum() are used to input an integer and a real value, respectively, from the keyboard. The scanf() function of the C programming language is not well suited to keyboard input, because it does not recover from improper input typed by the user. Two recursive routines were therefore written to input numbers from the keyboard. These routines buffer characters until a <return> is hit. The routines then try to convert the input buffer into a valid integer or real number. If they find improper input, they call themselves until a proper value is input. These routines save the user the frustration of having to start the program over if a mistake causes a bad input. The comments preceding these routines explain their operation in detail.
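A buffered, validated integer input of the kind described above might look like the following sketch. It is not the thesis code: the names parse_int and getnum_sketch are hypothetical, the standard library routines fgets() and strtol() are used for buffering and conversion, and a loop is used in place of the recursion described in the text.

```c
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/* Convert a buffered line to an integer. Returns 1 on success, or 0 if the
 * buffer is empty, holds stray characters, or is out of range for an int. */
int parse_int(const char *buf, int *value)
{
    char *end;
    long v;

    errno = 0;
    v = strtol(buf, &end, 10);
    if (end == buf || errno == ERANGE || v < INT_MIN || v > INT_MAX)
        return 0;
    while (*end == ' ' || *end == '\t' || *end == '\n' || *end == '\r')
        end++;                       /* allow trailing white space */
    if (*end != '\0')
        return 0;                    /* anything else is improper input */
    *value = (int)v;
    return 1;
}

/* Buffer characters until <return>, then retry until the line is valid.
 * Returns 0 only if the input stream ends before a valid number is read. */
int getnum_sketch(FILE *in, int *value)
{
    char buf[128];

    while (fgets(buf, sizeof buf, in) != NULL) {
        if (parse_int(buf, value))
            return 1;
        fputs("Invalid input, please re-enter: ", stdout);
    }
    return 0;
}
```

A real-valued counterpart could follow the same pattern with strtod() in place of strtol().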

The graphics routines mentioned above will be explained in detail in the next section.


4.7 Graphics Routines

The graphics routines used to plot the simulation results are based on Computer Innovations C-86 C graphics library calls. These library routines are listed in the comments at the top of the graphics module. The calls to these routines must be changed if this module is ported to another microcomputer. This graphics module plots points in the medium resolution IBM graphics mode. The main graphics routine is graph(). It calls other routines to produce a concise graphics screen with plots of the simulation results.

The graphics module begins by determining the maximum and minimum values of the signal, noise, reference (signal+noise), and manipulation (output) arrays if the user has chosen the digital filtering option, or the maximum and minimum of the reference, error, manipulation, and plant output arrays for the closed-loop control option. The maximum sampling time value is also determined. The x and y axis scale values are determined next. The maximum and minimum values for the simulation results arrays are then printed for the user. The user is asked to input values for the upper and lower limits of the y-axis. This is done to avoid crunching the curve against the top or bottom of the plot. This feature could easily be made automatic, but the graphics module was designed to be as flexible as possible. If the user has chosen the closed-loop mode, the maximum and minimum values of the manipulation are also printed. The user is asked for a scale factor to scale the manipulation values. In a closed-loop control system, the manipulation values are usually significantly larger than the other values, and if the manipulation is not scaled, it overshadows the other variables, which are then too small to be studied on the graphics plot. The manipulation array is scaled by the scale factor the user has input so it fits within the range of the other simulation values.

The graphics module then switches into the graphics mode, and the first action it takes is to draw the box on the screen in which the simulation results will be plotted. The x-axis is then marked. The x-axis represents time and ranges from 0 to the finish time the user has input, divided into four equal parts. The y-axis is marked next. The y-axis represents the magnitude of all simulation values. It is also divided into four equal sections. The legend for each curve is then printed at the top of the graph. A box in the color that will be used to plot the actual data is drawn after each curve label. These labels are different for the digital filter and the closed-loop option. The user also has the option of having the graphics routines draw a distinct symbol for each point on the graph curve. The symbols that will be used to mark points on the curves are drawn after the legend on the top of the graph box. This option is disabled by default, but can be turned on by setting the value of SYMBOLS in the main program to 1. This can also be done in the graphics mode via the interactive box. However, if the simulation run contains too many points, the plot becomes messy as symbols overlap. A line representing zero is plotted, and then the curves representing the simulation points are plotted. Finally, a box at the bottom of the graphics screen is drawn with the message "Change Parameter?" inside of it. This completes one complete run of the simulation, including both the algorithm and the graphics modules. A typical example of the graphics screen produced by a digital filtering system is shown in Fig. 4.3.

The most important feature of this graphics module is the interactive box at the bottom of the graphics screen. It can be used to vary any of the system parameters and then re-run the filtering/control algorithm without leaving the graphics mode. The user can vary system parameters by entering a 1-3 character sequence representing a system value. For example, to examine the word size of the ADC the user would input AD <return>. The program would respond with

A/D bits = xx

and wait for an input. If the user wishes to change this or any system value, they input a '/' (divide character or slash) with the new value immediately following the '/'. This action will input and store a new value for this system parameter. For example, if the wordlength of the ADC for a simulation run was 12 bits, the user could verify this fact by entering

AD <return>.

The program would respond

A/D bits = 12

and wait for input. Any input but the character '/' will cause the old value for the wordlength of the ADC to be retained. If the user wished to change the ADC wordlength to 14 bits they would use the following sequence

AD <return>

A/D bits = 12/14 <return>.

The value for the number of bits for the ADC word length is now 14. If the user wishes to determine the effect of this new ADC wordlength value on the output of a digital system, they input the character 'g' for go. The algorithm is re-run, and the user chooses the maximum and minimum y-axis values as described above (and also the manipulation scale if the closed-loop option is used). All I/O is done in the interactive graphics box at the bottom of the graphics screen. This method for examining and changing system variables is the one commonly used for actual real-time systems. [See 78]. When the new y-axis values have been input, the screen is refreshed with the new simulation data. The user can repeat this process indefinitely. When finished, the user inputs an 'e'. This sets a flag in the main module and the program terminates as was described in Sect. 4.6.
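The examine-and-change convention of the interactive box can be sketched as a small helper. The function name apply_reply is hypothetical, and the sketch handles integer parameters only; it simply illustrates that a reply beginning with '/' stores a new value while any other reply retains the old one.

```c
#include <stdlib.h>

/* Apply the reply typed after a prompt such as "A/D bits = 12".
 * A reply of the form "/14" stores the new value; any other reply keeps the
 * old one. Returns 1 if the value was changed, 0 if it was retained. */
int apply_reply(const char *reply, int *value)
{
    char *end;
    long v;

    if (reply[0] != '/')
        return 0;                  /* no slash: retain the old value */
    v = strtol(reply + 1, &end, 10);
    if (end == reply + 1)
        return 0;                  /* slash with no number: retain */
    *value = (int)v;
    return 1;
}
```

For the sequence shown above, the reply "/14" changes the stored ADC wordlength from 12 to 14, and an empty reply leaves it unchanged.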

The following is an alphabetical list of the 1-3 character sequences that are recognized by the simulation program. The sequences marked with a '*' are digital filter specific, and those marked with a '$' are closed-loop control specific. The other sequences are available to both options. Input can be lowercase or uppercase, and an improper input will just clear the interactive box. The input of new values for system variables is protected by two routines, ggetnum() and gdgetnum(). These routines are identical to the getnum() and dgetnum() routines of the main module for inputting integer and real values respectively. They do, however, contain additional code to function in the graphics mode and ensure that all I/O happens in the interactive graphics box. An explanation of the specifics of each graphics routine can be found in the comments preceding each routine.

The strings currently recognized by the simulation program are

(* = filter only, $ = closed-loop control only, & = fixed-point only)

A0-A9   -- Algorithm numerator coefficients
AD      -- Wordlength of the ADC (bits)
&BC     -- Wordlength to store coefficients (bits) (fixed-point only)
B1-B9   -- Algorithm denominator coefficients
CO      -- Controller/filter order
$D0-D9  -- Analog plant model denominator coefficients
E       -- End the program
$ERR    -- Toggle error signal on/off, 0 = off, 1 = on
DA      -- Wordlength of the DAC (bits)
FP      -- Change between internal fixed and floating-point modes, 0 = floating-point, 1 = fixed-point
FT      -- Finish time
G       -- Go, re-run the main program algorithm after changing values
L       -- Output limit value
$MAN    -- Toggle manipulation on/off, 0 = off, 1 = on
*N      -- Toggle noise signal on/off, 0 = off, 1 = on
*NM     -- Noise multiplier
*NOI    -- Noise signal rotational frequency (Hz)
$PO     -- Plant order
$Q0-Q9  -- Analog plant model numerator coefficients
$REF    -- Toggle reference on/off, 0 = off, 1 = on
REM     -- Use remainder, 0 = off, 1 = on (fixed-point only)
*S      -- Toggle signal on/off, 0 = off, 1 = on
*SIG    -- Signal rotational frequency (Hz)
*SN     -- Toggle signal+noise on/off, 0 = off, 1 = on
SP      -- Save the points of this simulation run in a file
SYM     -- Draw symbols at each point on graph curve, 0 = off, 1 = on
$STE    -- Reference step input size
T       -- Sampling time
VAD     -- Voltage level of the ADC
VDA     -- Voltage level of the DAC

4.8 Summary of Chapter 4

In this chapter, a new digital simulation scheme was introduced and described. This new simulation scheme offers many advantages over existing schemes, and was designed to be as flexible and realistic as possible. It includes an interactive graphics module that allows the user to see the effects of varying system parameters immediately. Results can be displayed graphically as well as in a tabular numeric form. This digital simulation scheme gives the system designer a powerful design tool that is user friendly and easy to implement.

In the next chapter, this simulation scheme will be used to analyze a digital filtering and a closed-loop control system.

[Screen dump of the graphics display: curve legend at the top of the plot, the simulation curves, and the "Change Parameter?" box at the bottom of the screen.]

Fig. 4.3 Typical Simulation Digital Filter Graphics Screen Dump

5.0 Digital Simulation Examples

To illustrate the power of this digital simulation scheme consider the following examples.

5.1 Digital Filter T = 0.010 sec

Recall from Chapter 4, the digital filtering scheme is based on the following block diagram.

[Block diagram, DIGITAL FILTERING SYSTEM: x(t) -> ADC (sampling time T) -> x(k) -> DIGITAL FILTER Gc(z) -> y(k) -> DAC (ZOH), (1 - e^{-sT})/s -> y(t)]

Fig. 5.1 Block Diagram of Simulation Digital Filtering System

Suppose it is desired to build a 4th order low pass digital filtering system with a cutoff frequency of 10 Hz and a sampling time of T = 0.010 sec. Signals with frequencies less than 10 Hz should be allowed to pass with full magnitude, while signals with frequencies greater than 10 Hz will be blocked. A 4th order Butterworth filter will be used. Butterworth filters are characterized by the property that their magnitude is maximally flat in the passband and converges rapidly to zero in the stopband [85]. The transfer function of a 4th order analog Butterworth filter is

$$G(s) = \frac{\omega_c^4}{s^4 + 2.613\,\omega_c s^3 + 3.414\,\omega_c^2 s^2 + 2.613\,\omega_c^3 s + \omega_c^4} \tag{5.1}$$

where ω_c is the cutoff frequency. The rotational cutoff frequency of ω_r = 10 Hz given in the system specifications is equivalent to an angular cutoff frequency of

$$\omega_c = 2\pi\omega_r = 2\pi(10) = 62.831853 \text{ radians/sec}. \tag{5.2}$$

Substituting the ω_c obtained in Eq. (5.2) into Eq. (5.1), the desired analog filter transfer function becomes

$$G(s) = \frac{15585455}{s^4 + 164.18s^3 + 13477.94s^2 + 648155.21s + 15585455}. \tag{5.3}$$

The frequency characteristics of any filter must be investigated to determine if the filter will meet the design criteria. The first step in obtaining the frequency response of an analog filter is to replace s by jω in the transfer function. The complex form of a system described by its transfer function is written in rectangular coordinates as

$$G(j\omega) = G(s)\big|_{s=j\omega} = \mathrm{Re}(\omega) + j\,\mathrm{Im}(\omega) \tag{5.4}$$

where Re(ω) and Im(ω) represent the real and imaginary parts of G(jω). [82]

Eq. (5.4) can be expressed in polar form as

$$G(j\omega) = |G(j\omega)| \angle F(\omega) \tag{5.5}$$

where |G(jω)| is the magnitude output-to-input ratio given by

$$|G(j\omega)| = \sqrt{[\mathrm{Re}(\omega)]^2 + [\mathrm{Im}(\omega)]^2} \tag{5.6}$$

and F(ω) is the phase angle given by

$$F(\omega) = -\tan^{-1}\left[\mathrm{Im}(\omega)/\mathrm{Re}(\omega)\right]. \tag{5.7}$$

Substituting s = jω into Eq. (5.3), it can be shown that |G(jω)|, the magnitude output-to-input ratio, is

$$|G(j\omega)| = \frac{1}{\sqrt{\left[1 + (\omega/62.83)^4 - 3.414\,(\omega/62.83)^2\right]^2 + \left[2.613\,(\omega/62.83) - 2.613\,(\omega/62.83)^3\right]^2}} \tag{5.8}$$

and the phase angle F(ω) is

$$F(\omega) = -\tan^{-1}\left[\frac{2.613\,(\omega/62.83) - 2.613\,(\omega/62.83)^3}{1 + (\omega/62.83)^4 - 3.414\,(\omega/62.83)^2}\right]. \tag{5.9}$$

Equations (5.8) and (5.9) were used to display the frequency characteristics of the analog filter of Eq. (5.3). A plot of magnitude versus frequency is shown in Figure 5.2, and a plot of phase angle versus frequency is shown in Figure 5.3, where ω is varied from 6.28 to 628 rad/sec. As can be seen in Figure 5.2, the analog filter has a magnitude of unity in the passband, and a magnitude that converges rapidly to zero in the stopband beyond the cutoff frequency of 62.8 radians/sec (10 Hz). The phase angle plot can be used to determine additional filter characteristics.

It is desired to obtain a digital filter that exhibits the same characteristics as this 4th order analog filter. To obtain the digital filter, the bilinear transformation will be used. The bilinear transformation maps the complex s plane into the complex z plane and is defined by

$$z = \frac{1 + s}{1 - s} \tag{5.10}$$

or, equivalently, solving for s,

$$s = \frac{z - 1}{z + 1}. \tag{5.11}$$

The relationship between the analog and digital frequencies is given by

$$\omega_{con} = \frac{\sin(\omega_{dig}T/2)}{\cos(\omega_{dig}T/2)} = \tan(\omega_{dig}T/2), \tag{5.12}$$

where ω_con and ω_dig represent the continuous (analog) and digital frequencies respectively, and T is the sampling time of the system. Equation (5.12) is a nonlinear relationship, e.g. a zero analog frequency is mapped into a zero digital frequency, but the point ω_con = ∞ is mapped into the digital frequency π/T, the folding frequency. The bilinear transformation provides a one-to-one mapping of the jω-axis onto the unit circle, but it compresses the infinite range of analog frequencies into a finite range of digital frequencies. The frequency response of the analog filter over an infinite range is transformed into the range from 0 to π/T for the digital filter. The frequency compression phenomenon associated with the bilinear transformation is known as frequency warping and must be treated when designing digital filters. This can be done by prewarping the critical frequencies, usually the cutoff frequencies, of the analog filter, so that the critical frequencies of the digital filter end up where they belong [82]. Substituting the cutoff frequency of 62.8 radians/sec from Eq. (5.2) and the sampling time of 0.010 sec into Eq. (5.12), the prewarped analog frequency becomes

$$\omega_c = \tan(62.831853 \times 0.010/2) = 0.3249197. \tag{5.13}$$

The desired digital filter transfer function is obtained by using the relation

$$G(z) = G(s)\big|_{s = (z-1)/(z+1)}. \tag{5.14}$$

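Using the relation described in Eq. (5.14), the analog filter transfer function from Eq. (5.1) becomes

$$G(z) = G(s)\big|_{s = (z-1)/(z+1)} = \frac{\omega_c^4}{\left(\frac{z-1}{z+1}\right)^4 + 2.613\,\omega_c\left(\frac{z-1}{z+1}\right)^3 + 3.414\,\omega_c^2\left(\frac{z-1}{z+1}\right)^2 + 2.613\,\omega_c^3\left(\frac{z-1}{z+1}\right) + \omega_c^4}. \tag{5.15}$$

Substituting the value ω_c = 0.3249197 from Eq. (5.13), Eq. (5.15) becomes

$$G(z) = \frac{0.0111456}{\left(\frac{z-1}{z+1}\right)^4 + 0.8490151\left(\frac{z-1}{z+1}\right)^3 + 0.3604255\left(\frac{z-1}{z+1}\right)^2 + 0.0896329\left(\frac{z-1}{z+1}\right) + 0.0111456}. \tag{5.16}$$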

Finally, after algebraic manipulation the desired digital filter is obtained and given by

$$G_c(z) = \frac{0.0048244z^4 + 0.0192979z^3 + 0.0289468z^2 + 0.0192979z + 0.0048244}{z^4 - 2.369551z^3 + 2.314076z^2 - 1.0547282z + 0.187394}. \tag{5.17}$$
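The algebraic manipulation leading from Eq. (5.15) to Eq. (5.17) can be checked numerically. The sketch below is an illustration written for this purpose, with hypothetical function names: it multiplies the numerator and denominator of Eq. (5.15) by (z + 1)^4, expands each term (z - 1)^k (z + 1)^(4-k), and normalizes by the coefficient of z^4.

```c
#define N 4  /* filter order */

/* Multiply polynomial p (degree dp, lowest power first) by (z + c) in place.
 * The result has degree dp + 1. */
static void mul_linear(double *p, int dp, double c)
{
    int i;

    p[dp + 1] = p[dp];
    for (i = dp; i >= 1; i--)
        p[i] = p[i - 1] + c * p[i];
    p[0] = c * p[0];
}

/* Substitute s = (z-1)/(z+1) into the 4th order Butterworth filter of
 * Eq. (5.1). num[] and den[] receive the coefficients of z^0 .. z^4,
 * normalized so that den[4] = 1. wc is the prewarped cutoff frequency. */
void bilinear_butter4(double wc, double num[N + 1], double den[N + 1])
{
    double a[N + 1];     /* analog denominator: a[k] multiplies s^k */
    double term[N + 1];  /* expansion of (z - 1)^k (z + 1)^(N - k) */
    double lead;
    int i, k, deg;

    a[0] = wc * wc * wc * wc;
    a[1] = 2.613 * wc * wc * wc;
    a[2] = 3.414 * wc * wc;
    a[3] = 2.613 * wc;
    a[4] = 1.0;

    for (i = 0; i <= N; i++)
        num[i] = den[i] = 0.0;

    for (k = 0; k <= N; k++) {
        deg = 0;
        term[0] = 1.0;
        for (i = 0; i < k; i++)
            mul_linear(term, deg++, -1.0);
        for (i = 0; i < N - k; i++)
            mul_linear(term, deg++, 1.0);
        for (i = 0; i <= N; i++)
            den[i] += a[k] * term[i];
        if (k == 0)                    /* numerator is wc^4 (z + 1)^4 */
            for (i = 0; i <= N; i++)
                num[i] = a[0] * term[i];
    }

    lead = den[N];                     /* normalize by the z^4 coefficient */
    for (i = 0; i <= N; i++) {
        num[i] /= lead;
        den[i] /= lead;
    }
}
```

Called with wc = 0.3249197, the routine returns coefficients that agree with Eq. (5.17) to within the rounding of the printed values.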

The frequency characteristics of this digital filter are determined in a manner similar to the analog filter. The z's in the digital filter transfer function are replaced by e^{jωT}, where T is the sampling time of the system. The substitution

$$z = e^{j\omega T} \tag{5.18}$$

is commonly done by hand on up to 2nd order systems, while the frequency components of higher order systems are usually determined numerically by using a computer program. The substitution z = e^{jωT} was made on the digital filter of Eq. (5.17). Making use of Euler's identity

$$e^{j\omega T} = \cos\omega T + j\sin\omega T \tag{5.19}$$

the real and imaginary components were grouped to determine the magnitude, |G(jω)|, and the phase angle, F(ω), of this digital filter for T = 0.010 sec. The magnitude and phase angle were used to determine the frequency response of this digital filter. Plots of magnitude versus frequency and phase angle versus frequency are shown in Figures 5.4 and 5.5 respectively. The digital filter designed in Eq. (5.17) has frequency response characteristics very close to those of the original analog filter of Eq. (5.3). The magnitude spike in Fig. 5.4 is at the sampling frequency of 2π/T, or 628.32 rad/sec, where the periodic frequency response of the digital filter repeats. The phase angle of this digital filter compares favorably to the analog phase angle. It appears that the digital filter designed in Eq. (5.17) from the analog filter in Eq. (5.3) has the desired frequency characteristics specified in the system requirements. This digital filter will now be used to demonstrate the digital simulation.


5.2 Digital Filter Coefficient Quantization T = 0.010 sec

Recall that it was assumed that a digital filter implemented with floating-point arithmetic is not subject to the quantization errors described in Sections 2.1-2.6 of Chapter 2. To get an idea of the "ideal" filter output, the digital filter of Eq. (5.17) was used with the simulation in the floating-point mode, i.e. coefficient storage and internal arithmetic were done with floating-point arithmetic. A 12-bit, ±10 volt ADC/DAC was used with a signal frequency of 2 Hz (a 1 volt sine wave) and a noise frequency of 60 Hz with a noise multiplier of 0.1 (a 0.1 volt sine wave). The 12-bit, ±10 volt ADC/DAC was chosen because this interface configuration is commonly available for actual real-time systems. The 60 Hz noise is at a frequency beyond the cutoff frequency of 10 Hz, so it should be blocked or filtered. Figure 1 of Appendix B shows the input sequence that is needed to use the simulation program with the digital filter of Eq. (5.17) and the system characteristics described above. Table 5.1 shows the actual tabular output summary generated by the simulation using the characteristics described above. The simulation output was saved in a file and plotted using Lotus 1-2-3. Figure 5.6 is a plot of this simulation run. Figure 2 of Appendix B is a screen dump of the graphics plot generated by the simulation program. Both plots show the signal, noise, signal+noise, and the filtered output for one cycle of the input signal. Note that the filtered signal is free of noise and matches the input signal except for the delay caused by computation of the filter algorithm. Lotus 1-2-3 will be used to plot all output examples since it offers advantages over a screen dump.

Suppose it is desired to build a digital filtering system with the smallest computer wordlength possible using fixed-point arithmetic with a 12-bit, ± 10 volt ADC/DAC based on the digital filter of Eq. (5.17) for a 10 msec sampling time. All coefficients are multiplied by scale factor and are stored as integers in a fixed-point implementation. Consider storing the coefficients first in a word length of 6-bits. The coefficients are stored as integers by multiplying their decimal value by the scale factor of 64 (26 = 64). Assuming that truncation (not rounding) is used, the coefficient 0.0048244 will be stored as TRUNC(.0048244 fr 64) , or TRUNC(0.3087616) = 0, the coefficient 0.0192979 as TRUNC(1.2350656) = 1, etc. Consequently, digital filter transfer function of Eq. (5.17) is represented by

$$G_c(z) = \frac{z^3 + z^2 + z}{64z^4 - 151z^3 + 148z^2 - 67z + 11}, \quad \text{or} \tag{5.20}$$

$$G_c(z) = \frac{0.015625z^3 + 0.015625z^2 + 0.015625z}{z^4 - 2.359375z^3 + 2.3125z^2 - 1.046875z + 0.171875}, \tag{5.21}$$

where Eq. (5.21) was obtained from Eq. (5.20) by dividing by 64. Using a root-finder program, it was determined that the 6-bit filter given by Eq. (5.21) has zeros (numerator roots) and poles (denominator roots) at

zeros: 0, -0.5 ± j0.866025 (5.22)

poles: 0.37438, 0.664099, 0.660919 ± j0.506192 (5.23)

The original digital filter of Eq. (5.17) has zeros and poles at

zeros: -0.922899, -1.08352, 0.996825 ± j0.0799559 (5.24)

poles: 0.524302 ± j0.145781, 0.660474 ± j0.443346 (5.25)
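The scale-and-truncate step described above can be sketched in C, the language of the simulation. This is a minimal illustration, not the thesis code: the function names `quantize_coeff`, `stored_value`, and `print_quantized` are hypothetical, and only the two numerator coefficients quoted in the text are used.

```c
#include <stdio.h>

/* Hypothetical helper (not from the thesis): store a real coefficient as an
 * integer by multiplying by the scale factor 2^bits and truncating. */
long quantize_coeff(double coeff, int bits)
{
    long scale = 1L << bits;            /* e.g. bits = 6 gives 64 */
    return (long)(coeff * scale);       /* the cast truncates toward zero */
}

/* Value that the stored integer actually represents once the scale
 * factor is divided out again. */
double stored_value(long q, int bits)
{
    return (double)q / (double)(1L << bits);
}

/* Print how the two coefficients quoted in the text are stored for the
 * coefficient wordlengths examined in this section. */
void print_quantized(void)
{
    const double coeffs[2] = { 0.0048244, 0.0192979 };
    const int bits[4] = { 6, 8, 12, 16 };
    int i, j;

    for (i = 0; i < 4; i++) {
        for (j = 0; j < 2; j++) {
            long q = quantize_coeff(coeffs[j], bits[i]);
            printf("%2d bits: %.7f -> %ld (represents %.8f)\n",
                   bits[i], coeffs[j], q, stored_value(q, bits[i]));
        }
    }
}
```

Calling `print_quantized()` from a small `main` lists the stored integers for each wordlength. With 6 bits the first coefficient truncates to zero, which is why the corresponding terms drop out of the numerator of Eq. (5.20).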

Note the disappearance of two terms in the numerator for this 6-bit approximation and how the poles and zeros of this filter have shifted away from the original values. The filter of Eq. (5.17) was used in the simulation program in the fixed-point mode, i.e. filter coefficient storage and internal arithmetic were done with fixed-point (integer) arithmetic. A 6-bit wordlength was used to store the filter coefficients. Since the simulation uses the techniques described above, the digital filter of Eq. (5.21) is what is actually used for the filtering algorithm. Figure 5.7 is a plot of the simulation run for one cycle of the input signal. The other system characteristics used in the simulation are the same as those described for the floating-point example, and will be used with the remaining examples unless otherwise noted. The magnitude values on the y-axis of Fig. 5.7 are internal integer numbers representing the filtered signal magnitude before output to the DAC. For example, since a ±10 volt, 12-bit ADC and DAC were used, a one volt sine


wave input to the ADC would be represented by the number

$$\frac{1 \times (2^{11} - 1)}{10} = \frac{1 \times 2047}{10} = 204.7. \tag{5.26}$$

If each of the points were multiplied by the DAC scale of 10/2047 =

0.0048851, the output magnitudes would have the same scale as the floating-point simulation run. Since the fixed-point mode has been chosen, it is desirable to examine the filter using the internal integer representation of the algorithm values. The output can be compared to the input signal on the same plot. Note the decreased amplitude, the shape of the output signal, and the delay before the filtered output is seen for this 6-bit representation. The smaller amplitude is due to the loss of terms in the numerator, since these numerator coefficients were too small to store with a fixed-point representation in a 6-bit wordlength. The shape and the magnitude of the filtered output may also be explained by the shifts in the poles of the digital filter transfer function for this 6-bit representation. The larger output delay is due to the fact that integer arithmetic is used, and values must be at least 1 to appear as filter output. Since the 6-bit case may be marginal, a coefficient wordlength of 8-bits will be tried next. An 8-bit representation will use a scale factor of 2^8 = 256. The digital filter transfer function is represented in an 8-bit wordlength by

$$G_c(z) = \frac{z^4 + 4z^3 + 7z^2 + 4z + 1}{256z^4 - 606z^3 + 592z^2 - 270z + 47}, \quad \text{or} \tag{5.27}$$

$$G_c(z) = \frac{0.00390625z^4 + 0.015625z^3 + 0.02734375z^2 + 0.015625z + 0.00390625}{z^4 - 2.3671875z^3 + 2.3125z^2 - 1.0546875z + 0.18359375}, \tag{5.28}$$

where Eq. (5.28) was obtained from (5.27) by dividing by 256. The 8-bit filter given by (5.28) has zeros and poles at

zeros: -0.375189 ± j0.300243, -1.624810 ± j1.300240 (5.29)

poles: 0.534489 ± j0.0541343, 0.649105 ± j0.4634620 (5.30)
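To make the effect of integer arithmetic concrete, the 8-bit filter of Eq. (5.27) can be written as a difference equation and evaluated with integers only. The sketch below assumes a direct-form realization with a truncating divide by the scale factor 256; the thesis does not state the realization used by the simulation, and the names `FilterState`, `filter_reset`, and `filter_step` are hypothetical.

```c
/* Integer coefficients taken from Eq. (5.27); A[0] is the scale factor. */
#define ORDER 4
static const long B[ORDER + 1] = { 1, 4, 7, 4, 1 };
static const long A[ORDER + 1] = { 256, -606, 592, -270, 47 };

/* Hypothetical state record: x[i] and y[i] hold the input and output
 * from i samples ago. */
typedef struct {
    long x[ORDER + 1];
    long y[ORDER + 1];
} FilterState;

void filter_reset(FilterState *s)
{
    int i;
    for (i = 0; i <= ORDER; i++) {
        s->x[i] = 0;
        s->y[i] = 0;
    }
}

/* One sample of the assumed direct-form difference equation:
 *   A[0]*y(k) = sum B[i]*x(k-i) - sum_{i>=1} A[i]*y(k-i)
 * The final integer divide by A[0] discards the remainder (truncation). */
long filter_step(FilterState *s, long input)
{
    long acc = 0;
    int i;

    for (i = ORDER; i > 0; i--) {       /* age the stored samples */
        s->x[i] = s->x[i - 1];
        s->y[i] = s->y[i - 1];
    }
    s->x[0] = input;

    for (i = 0; i <= ORDER; i++)
        acc += B[i] * s->x[i];
    for (i = 1; i <= ORDER; i++)
        acc -= A[i] * s->y[i];

    s->y[0] = acc / A[0];
    return s->y[0];
}
```

Feeding a constant internal value of 204 (1 volt on a 12-bit, ±10 volt ADC) gives a first output of zero, because 204/256 truncates to 0. This is consistent with the delay noted above before integer outputs appear. The coefficient sums also give a DC gain of 17/19, which is one way to see the depressed output magnitude of the 8-bit filter.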

A plot of the simulation run with an 8-bit coefficient wordlength is shown in Fig. 5.8. Note how this 8-bit representation is closer to the original input signal than the 6-bit representation was, but it still has a depressed magnitude. Coefficient wordlengths of 12 and 16 bits were also tried. For 12-bits the scale factor is 2^12 = 4096, and the transfer function of the 12-bit representation is given by

$$G_c(z) = \frac{19z^4 + 79z^3 + 118z^2 + 79z + 19}{4096z^4 - 9705z^3 + 9478z^2 - 4320z + 767}, \quad \text{or} \tag{5.31}$$

$$G_c(z) = \frac{0.0046387z^4 + 0.0192871z^3 + 0.028809z^2 + 0.0192871z + 0.0046387}{z^4 - 2.3693847z^3 + 2.3139648z^2 - 1.0546875z + 0.187255}. \tag{5.32}$$

Eq. (5.32) was obtained by dividing Eq. (5.31) by the scale factor of 2^12 = 4096. The poles and zeros of Eq. (5.32) are

zeros: -0.531543, -1.88131, -0.872518 ± j0.488581 (5.33)

poles: 0.524599 ± j0.143579, 0.660093 ± j0.444168 (5.34)

The transfer function of the 16-bit representation of the digital filter is given by

$$G_c(z) = \frac{316z^4 + 1264z^3 + 1897z^2 + 1264z + 316}{65536z^4 - 155290z^3 + 151655z^2 - 69122z + 12281}, \quad \text{or} \tag{5.35}$$

$$G_c(z) = \frac{0.00482178z^4 + 0.0192871z^3 + 0.0289459z^2 + 0.0192871z + 0.00482178}{z^4 - 2.3695374z^3 + 2.3140717z^2 - 1.0547180z + 0.1873931}. \tag{5.36}$$

Eq. (5.36) was obtained by dividing Eq. (5.35) by 2^16 = 65536. The poles and zeros of Eq. (5.36) are

zeros: -0.833463 ± j0.140768, -1.665400 ± j0.197026 (5.37)

poles: 0.524247 ± j0.145843, 0.660522 ± j0.443364 (5.38)

Simulation run plots for the 12-bit and 16-bit coefficient wordlengths are shown in Figures 5.9 and 5.10. Note how closely the 12-bit and 16-bit filtered output signals compare with the floating-point output signal of Fig. 5.6. The tabular output from a simulation run with 16-bit coefficient storage is shown in Table 5.2. Table 5.3 contains a summary of simulation runs varying the coefficient wordlength from 6 to 18-bits by two bits. The numbers represent internal magnitudes for a ±10 volt, 12-bit ADC/DAC, where a 1 volt input = 204, before output through the DAC. For example, consider the output at T = 0.17 sec. Notice the magnitude difference between the 6-bit output and the 18-bit output. The filter output values do not change once a coefficient wordlength of at least 12-bits is used, and the output with 10-bit coefficient storage does not differ much in magnitude from the 12 to 18-bit representations. Coefficients stored in a 12-bit wordlength would achieve the desired result of the smallest wordlength computer with a 12-bit ADC and DAC. Note how the output using 12-bits reaches a peak of 203, which is just below the maximum magnitude of the input signal, 204 (at time T = 0.13; the phase difference is due to the delay caused by the algorithm computation needed to obtain an output value). Arithmetic quantization due to finite-arithmetic elements causes a small loss in output precision, but the difference is negligible using 12-bits to store the filter coefficients. The Root Mean Square (RMS) error was calculated from the difference between the


original input signal and the filtered output signal. The RMS error decreases as the coefficient wordlength increases, since the representation of the filter coefficients approaches that of the floating-point filter. Figure 5.11 shows a plot of the difference in RMS magnitude from the steady state value that results from using the digital filter of Eq. (5.17) in the simulation and varying the coefficient wordlength from 6 to 20-bits. For example, using 6-bits, the RMS error is over 4 times greater than the RMS steady state value. It should be noted that the digital filter of Eq. (5.17) cannot be represented by less than 6-bits, as all terms in the numerator would disappear and give an output of zero for every input. Also, the RMS error reaches a steady state value when 12 or more bits are used to represent the filter, so using a larger wordlength to store the filter coefficients would not increase the resolution of the system for this particular filtering example. However, digital computer wordlengths are usually based on a multiple of 8-bits, i.e. 8, 16, 32. In this example, 8-bits would not provide the proper resolution, and a 16-bit computer wordlength would probably be the logical choice for this digital filter at T = 0.010 sec.
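The text does not give the RMS formula explicitly. A conventional definition, assumed here for reference, with x(k) the original input signal, y(k) the filtered output, and N the number of samples compared (all three symbols are introduced for illustration), is:

```latex
e_{\mathrm{RMS}} = \sqrt{\frac{1}{N} \sum_{k=0}^{N-1} \bigl( x(k) - y(k) \bigr)^{2}}
```

Under this definition the constant computational delay between input and output also contributes to the error, which may be why the error settles at a nonzero steady state value rather than at zero.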

5.3 Digital Filter Interface Quantization T = 0.010 sec

Assume that it is now desired to determine the effect that the wordlength of the ADC and DAC has on the performance of the digital filtering system. The objective here also is to use the smallest wordlength ADC/DAC possible. Since the ADC and DAC are interdependent, it is necessary to properly coordinate this interface pair if the real-time application is to be successful and cost effective. Let us first examine the effect of changing the wordlength of the ADC. Suppose the digital filter of Eq. (5.17) is used with a 12-bit, ±10 volt DAC and a 16-bit coefficient wordlength in a fixed-point implementation, with the ADC wordlength varied from 6 to 18 bits. (Note that 16-bit coefficient storage was determined to be more than adequate in the coefficient quantization example above, and the 12-bit DAC wordlength choice is based upon its common use with actual real-time systems.) Table 5.4 shows the original signal and the filtered output for the ADC wordlength varied by two bits from 6 to 18 bits. Varying the ADC wordlength changes the internal representation of a 1 volt input: with a 12-bit, ±10 volt ADC a 1 volt input is represented internally as 204 (see Eq. (5.26) above), with 14-bits as 819, with 16-bits as 3275, etc. For this reason the filtered output listed in Table 5.4 is the value after output through the DAC, i.e. the internal value × DAC scale. The internal arithmetic and coefficient storage were still done with integer arithmetic, but the internal integer output value was printed after output through the DAC. This allows the output numbers to be compared more easily than numbers of significantly different magnitude. If the filtered output results are compared for a specific sampling time instant, the effects of the ADC wordlength quantization can be examined. For example, the output of the filter is compared at T = 0.17 sec. Note the significant difference in magnitude between the 6 and 16-bit wordlength representations, and how the 16-bit and 18-bit representations are essentially the same.

The RMS error was again calculated from the difference between the original input signal and the filtered output signal. Figure 5.12 shows the results of using the digital filter of Eq. (5.17) in the simulation and varying the ADC wordlength from 6 to 18-bits.
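The interface scaling used in this section can be sketched as a pair of C helpers. This is an illustration under assumptions, not the simulation's code: it follows the form of Eq. (5.26) with truncation to an integer, it does no clipping of out-of-range voltages, and the names `adc_encode` and `dac_decode` are hypothetical.

```c
/* Hypothetical ADC model: map a voltage to the internal integer code of
 * a bipolar converter, following the form of Eq. (5.26),
 *   code = TRUNC(volts * (2^(bits-1) - 1) / full_scale).
 * No range clipping is done in this sketch. */
long adc_encode(double volts, int bits, double full_scale)
{
    long max_code = (1L << (bits - 1)) - 1;     /* 2047 for 12 bits */
    return (long)(volts * max_code / full_scale);
}

/* Hypothetical DAC model: multiply an internal integer by the DAC scale
 * full_scale / (2^(bits-1) - 1) to return to volts. */
double dac_decode(long code, int bits, double full_scale)
{
    long max_code = (1L << (bits - 1)) - 1;
    return code * full_scale / max_code;
}
```

For a ±10 volt interface, `adc_encode(1.0, 12, 10.0)` yields the internal value 204 and `adc_encode(1.0, 14, 10.0)` yields 819, matching the values quoted above. Passing 204 back through `dac_decode` with 12 bits returns slightly less than 1 volt because of the truncation at the ADC.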

The RMS error decreased as the ADC wordlength increased reaching a steady state value at 17-bits. This is also apparent in Table 5.4 as the output values for 16-bits and 18-bits are essentially equivalent. Using a 6-bit word length for the ADC, the RMS error was just 1.07 times greater than the steady state RMS error.

From Fig. 5.12 it is easy to see that a 10 to 12-bit ADC would be acceptable for this digital filtering system. Using 12 or more bits in the ADC wordlength, the error between the input signal and the filtered output signal begins to reach a steady state value. The magnitude of the filtered output for the different ADC wordlengths can be examined numerically using Table 5.4. It appears that a 12-bit ADC may provide an adequate output magnitude when used with a 12-bit DAC and a 16-bit coefficient wordlength for this digital filtering system at T = 0.010 sec, but 14-bits may give slightly better results. It was shown that using more than 16-bits would not provide any additional resolution, and using an ADC with a wordlength longer than 16-bits would just increase the cost of the system. It would then be up to the system designer to determine the proper ADC wordlength for this filtering application.

Similar results were obtained by using a ±10 volt, 12-bit ADC with a 16-bit coefficient wordlength and varying the DAC wordlength from 6 to 20 bits.


Table 5.5 shows the original input signal and the filtered output with the DAC wordlength varied by 2-bits from 6 to 18 bits. Again the filtered output is the value after output through the DAC, to avoid comparing dissimilar numbers. A comparison of output values is done at T = 0.17 sec. Figure 5.13 shows the RMS error for varying the DAC wordlength. A steady state value was reached when 14-bits were used for the DAC wordlength, but the 10 to 12-bit values are very close to the 14-bit values. Conclusions similar to those for the ADC above can be drawn. The system designer could conclude, based on the simulation results, that a ±10 volt, 12-bit ADC/DAC and 16-bits to store the filter coefficients would provide a cost effective digital filtering system that meets the design criteria. However, a 16-bit system with a ±10 volt, 14-bit ADC/DAC may provide better input and output resolution.

5.4 Digital Filter T = 0.005 sec

As another example, assume the analog filter of Eq. (5.3) was discretized using T = 0.005 sec. Using the bilinear transformation method as above, the corresponding digital filter transfer function is given by

$$G_c(z) = \frac{0.0004166z^4 + 0.0016664z^3 + 0.0024996z^2 + 0.0016664z + 0.0004166}{z^4 - 3.180667z^3 + 3.861268z^2 - 2.112175z + 0.43828269}. \tag{5.39}$$

The frequency characteristics of the filter of Eq. (5.39) were determined in the same manner as for the digital filter for T = 0.010 sec in Eq. (5.17), by substituting z = e^(jωT) into the digital filter transfer function. A plot of magnitude versus frequency is shown in Fig. 5.14, and a plot of phase angle versus frequency is shown in Fig. 5.15. Note the folding frequency for this example is

$$\frac{2\pi}{T} = \frac{2\pi}{0.005} = 1256.64 \text{ rad/sec}. \tag{5.40}$$
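For reference, the substitution z = e^(jωT) mentioned above can be written out as follows. The symbols M(ω) and φ(ω) for the plotted magnitude and phase angle are introduced here for illustration and are not the thesis's notation.

```latex
z = e^{j\omega T} = \cos(\omega T) + j\,\sin(\omega T), \qquad
M(\omega) = \left| G_c\!\left(e^{j\omega T}\right) \right|, \qquad
\phi(\omega) = \angle\, G_c\!\left(e^{j\omega T}\right)
```

Evaluating the numerator and denominator polynomials of Eq. (5.39) at this complex value of z for each frequency of interest gives one point on each of the magnitude and phase plots.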

The coefficients of the new filter are smaller by 1-2 orders of magnitude. A significant decrease in performance should be expected if too small a wordlength is used to store the filter coefficients. The low pass filter of Eq. (5.39) was used in the simulation in the floating-point mode with the same system characteristics as before, except that the noise frequency for the 5 msec filter was changed to 40 Hz. A plot of the simulation run is shown in Fig. 5.16. It can be seen from Fig. 5.16 that this filter for T = 0.005 sec has the desired output qualities.

5.5 Digital Filter Coefficient Quantization T = 0.005 sec

Suppose it was also desired to determine the smallest wordlength necessary to store the filter coefficients for this digital filter with a sampling time of T = 0.005 sec. At this 5 msec sampling time, at least 10-bits were needed to represent the filter. A coefficient wordlength below 10-bits was not sufficient to store the filter coefficients: the numerator disappeared, so the filter output was always zero.

Table 5.6 shows the filter output for this filter at T = 0.005 sec with the coefficient wordlength varied from 10 to 20-bits by two bits. These numbers represent internal magnitudes with 1 volt = 204. If the filtered output is examined at T = 0.17 sec, as was done above for the filter at 10 msec, Table 5.6 shows that if at least 16-bits were used to store the filter coefficients, the filtered output differed very little from the input signal. Also, the 14-bit representation was not much different from the 16 to 20-bit representations. Table 5.7 shows the transfer functions for the digital filter of Eq. (5.39) using 10, 12, 14, 16, and 18-bits to store the filter coefficients. These transfer functions were obtained using the same method as for the digital filter at 10 msec. The poles and zeros of each of these filters are listed in Table 5.8.

The transfer functions for the 10, 12, 14, and 16-bit representations listed in Table 5.8 were used in the simulation in the fixed-point mode. The system characteristics were the same as those used for the filter at 10 msec, except that the frequency used for the noise signal was changed to 40 Hz.

Plots of the output of these simulation runs appear in Figs. 5.17-5.20. The 10-bit coefficient wordlength caused the output signal to have a depressed magnitude and a bumpy shape. The 12-bit representation is much better, but still has a depressed magnitude. The 14-bit and 16-bit output signals differ very little from the input signal. This can also be seen in Table 5.6.

The RMS error was again calculated from the difference between input


signal and filtered output signal as the coefficient wordlength size was varied from 10 to 20-bits. The RMS error reached a steady state value if at least 18-bits were used to store the filter coefficients. Figure 5.21 shows the increase in RMS error plotted versus coefficient word length. Using 10-bits to store the filter coefficients, the RMS error was about 3.5 times the steady state value, and fell off rapidly toward the steady state value.

Comparing Fig. 5.21 with Fig. 5.11, it can be seen that the RMS error tends to approach a steady state value more quickly for the 5 msec filter than for the 10 msec filter. This trend can possibly be explained by the fact that at 5 msec twice as many points are obtained, and the signal can be reconstructed with more accuracy using more sampling points. The trends in both graphs are the same: using more bits to store the filter coefficients decreases the error in the filter output, since the coefficients approach those of the original digital filter.

5.6 Digital Filter Interface Quantization T = 0.005 sec

The quantization effects of the interface devices were investigated for the filter of Eq. (5.39). The digital filter of Eq. (5.39) was used in the simulation with a 16-bit coefficient wordlength and a 12-bit, ±10 volt DAC. The 16-bit wordlength was determined to be adequate in Section 5.5 above. The wordlength of the ADC was varied from 10 to 18-bits by two bits. Table 5.9 is a summary of the output from each of the simulation runs. These values are output after the DAC to avoid comparing numbers of different magnitudes, as was mentioned above. If a specific sampling instant of T = 0.17 sec is examined, it can be seen that the 12-bit ADC output is smaller than the 14 to 18-bit outputs, and the 16 and 18-bit values are essentially the same. The 10-bit output is much larger than any of the other values. This phenomenon is probably due to the fact that a 10-bit ADC wordlength does not provide the proper input resolution, and the internal calculations cause the output values to have a deceptively high magnitude.

The RMS error was calculated as the ADC wordlength was varied from 10 to 20-bits. The RMS error had a magnitude of 1.16 times that of the steady state value at 10-bits and rapidly approached the steady state value. If this curve is compared with that of Fig. 5.12, the RMS error for the ADC at 10 msec, the RMS error approaches the steady state value more quickly at 5 msec,


and the curve has a steeper shape. The RMS error falls off quickly to near the steady state value if at least 14-bits are used. An ADC wordlength of less than 14-bits does not appear to be sufficient for this filter at 5 msec.

The DAC wordlength was varied next. The filter of Eq. (5.39) was used with a 16-bit coefficient wordlength and a 12-bit, ±10 volt ADC. The 12-bit ADC may be marginal, but was chosen because a ±10 volt, 12-bit ADC/DAC pair will be used to verify the simulation results on an actual real-time system. Table 5.10 shows the results of varying the DAC wordlength from 10 to 18-bits. If the sampling instant of T = 0.17 sec is examined, it appears that 14-bits may be the lower acceptable limit, and the 16 and 18-bit values are close to being the same. The 10-bit DAC gives false results, as the 10-bit ADC did.

Figure 5.23 shows a plot of the RMS error for varying the DAC wordlength from 10 to 20-bits. The curve is much steeper than that of Fig. 5.13 for the 10 msec filter. It appears that at least a 14-bit DAC should be used in this example.

It has been shown that lowering the sampling time for a digital filtering application can require larger wordlengths for the computer and interface devices used to implement a digital filter. The ADC and DAC must provide enough resolution, and the coefficient wordlength must be large enough to store the coefficients, to achieve the proper results. It is up to the system designer to properly coordinate the ADC, DAC, coefficient wordlength and sampling time if the application is to be successful. If the sampling time is cut in half, the wordlengths of the ADC, the DAC and the coefficient storage must be lengthened to compensate for the higher sampling frequency. It has been shown in the previous two examples that the filter designed for a 10 msec sampling time required a 12-bit ADC, a 12-bit DAC, and at least 12-bits to store the filter coefficients. At a 5 msec sampling time, the same digital filtering system required at least 14-bits for the ADC (although 16-bits gave significantly better results), at least a 14-bit DAC, and 16-bits to store the filter coefficients.

The simulation was used to obtain output for the filters at both sampling times. This output was used in graphical as well as tabular form. The simulation allowed the user to examine each component of the digital filtering system and determine the quantization effects of these components by varying their characteristics. This allowed the best choice of system components to be made based on the given design criteria. In the next section the effects of these quantization errors on a closed-loop control system will be examined.

5.7 Closed-Loop Control

Recall that the closed-loop control system for the simulation is based on the following block diagram.

                 CONTROLLER          DAC (ZOH)            PLANT
r(k)  +   e(k)               m(k)               m(t)               c(t)
----->(Σ)----->[  Gc(z)  ]------->[  Gh(s)  ]------->[  Gp(s)  ]---+--->
R(z)  - ^ E(z)               M(z)               M(s)               | C(s)
        |                                                          |
        +----------------------[  ADC  ]<--------------------------+
                               c(k), C(z)

Fig. 5.24 Block Diagram of Simulation Closed-Loop Control System

The simulation will be used in the closed-loop mode, so the feedback path will be connected. As an example of a closed-loop control system, consider the plant transfer function

$$G_p(s) = \frac{20}{(s+1)(s+5)}. \tag{5.41}$$

Assume that the plant of Eq. (5.41) is to be controlled according to the scheme of Fig. 5.24, using a sampling time of T = 0.1 sec. Assume also that the digital control system is required to track step inputs in deadbeat fashion. Specifically, it is required that the output match the reference step from the second point on and that there be little or no oscillation after that point [82]. A digital controller for T = 0.1 sec is needed. It can be shown that the digital controller is given by

$$G_c(z) = \frac{6.67672 - 10.091z^{-1} + 3.66426z^{-2}}{1 - 0.54979z^{-1} - 0.45022z^{-2}}. \tag{5.42}$$

The pulse transfer function model of the plant is given by

$$G_hG_p(z) = \frac{0.08234z^{-1} + 0.06743z^{-2}}{1 - 1.51137z^{-1} + 0.54881z^{-2}}. \tag{5.43}$$

The controller of Eq. (5.42) and the plant of Eq. (5.43) were used in the simulation in the floating-point mode with a step input of 1 volt. Figure 3 of Appendix B shows the input sequence needed for the simulation to be in the closed-loop control mode with the controller of Eq. (5.42) and the plant of Eq. (5.43). Table 5.11 shows the output from this simulation run. The sampling time, reference input, error, manipulation and plant output are shown in the table. A plot of this simulation run is shown in Fig. 5.25. Figure 4 of Appendix B shows a screen dump of the graphics plot for this closed-loop control system. Since the manipulation ranges from -3.3 to 6.6, it was scaled by 0.167 to bring it into range with the other simulation values. This manipulation scaling allows the plot to be studied with all simulation values in the same range. Note that the controller achieves the desired result of matching the reference input after two sampling times.
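The floating-point closed-loop run described above can be approximated by iterating the difference equations implied by Eqs. (5.42) and (5.43). The following C sketch is an assumption-laden illustration, not the simulation program: it ignores the ADC and DAC quantization entirely, starts from zero initial conditions, and the function name `simulate_step_response` is hypothetical.

```c
/* Hypothetical floating-point closed-loop model (no ADC/DAC quantization).
 * Plant, from Eq. (5.43):
 *   c(k) = 0.08234 m(k-1) + 0.06743 m(k-2) + 1.51137 c(k-1) - 0.54881 c(k-2)
 * Controller, from Eq. (5.42):
 *   m(k) = 6.67672 e(k) - 10.091 e(k-1) + 3.66426 e(k-2)
 *          + 0.54979 m(k-1) + 0.45022 m(k-2)
 * r is the step reference; c[] and m[] receive n samples each. */
void simulate_step_response(double r, int n, double c[], double m[])
{
    double e1 = 0.0, e2 = 0.0;      /* previous errors        */
    double m1 = 0.0, m2 = 0.0;      /* previous manipulations */
    double c1 = 0.0, c2 = 0.0;      /* previous plant outputs */
    int k;

    for (k = 0; k < n; k++) {
        double ck = 0.08234 * m1 + 0.06743 * m2
                  + 1.51137 * c1 - 0.54881 * c2;
        double ek = r - ck;         /* feedback path with unity gain */
        double mk = 6.67672 * ek - 10.091 * e1 + 3.66426 * e2
                  + 0.54979 * m1 + 0.45022 * m2;

        c[k] = ck;
        m[k] = mk;

        e2 = e1;  e1 = ek;          /* age the stored samples */
        m2 = m1;  m1 = mk;
        c2 = c1;  c1 = ck;
    }
}
```

With a unit step reference the plant output in this sketch is within 0.001 of the reference at the second sample after the step, in line with the deadbeat requirement, and the first manipulation value is about 6.68, in line with the manipulation range quoted above.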

5.8 Closed-Loop Control Coefficient Quantization

Although the quantization issues affecting the implementation of a closed-loop control system were not discussed, they are similar to those that arise for a digital filtering system. Research work on the quantization problem can be found in [4], [9], [15], [16], [18], [41], [55], [59], [60], [79], [80], [91], and [96].

The objective here is to investigate the effects of coefficient quantization, using fixed-point arithmetic, on the performance of this closed-loop system. One possibility is to monitor the shift of the controller (or overall system) poles and zeros as the coefficient resolution varies, as was done above in the digital filtering examples. Another approach is to use the simulation program to track the reference input under the same circumstances. The results of the latter approach, using the simulation in the fixed-point mode, will be considered. The simulation was run in the fixed-point mode using the same input characteristics as were used in the floating-point example. Table 5.12 shows the output results for the simulation run. Note that the input value of 1 volt is the number 204 since the ADC is 12-bits, as was described above. The coefficient wordlength for the controller was varied from 6 to 16-bits by two bits. The output is tabulated in Table 5.13. The controller transfer functions for 6, 8, 12, and 16-bits are listed in Table 5.14. They were obtained in the same manner as was used to determine the digital filter transfer functions. The output values are analog plant output values and are printed as floating-point numbers even though fixed-point arithmetic was used internally. It can be seen from Table 5.13 that printing the output values as internal integer numbers would make it hard to compare the differences in output as the coefficient wordlength was varied, so the output values are printed as floating-point values. It appears that storing the control coefficients with 16-bits gives the best output results, but the 14-bit wordlength output is very close to the 16-bit output. Since this control system finishes its work after just two sampling times, the RMS error was not calculated. However, one would expect trends similar to those exhibited by the digital filter at the 10 msec sampling time.

5.9 Closed-Loop Control Interface Quantization

Since we are dealing with a closed-loop control system, the quantization errors introduced by the interface devices tend to have a greater effect on system performance than they would in an open-loop or digital filtering system. The input value to the control algorithm and the manipulation output value must be properly encoded and decoded by the ADC and DAC respectively. This interface pair is interdependent and has to be properly coordinated if the control application is to be successful. If either the ADC or DAC does not provide enough resolution, the overall system performance will be degraded. The simulation was used in the fixed-point mode with a ±10 volt, 12-bit DAC and 16-bit coefficient storage. The ADC wordlength was varied from 6 to 16-bits by two bits. Table 5.15 shows the output results for the even word sizes. Note how the output is closer to the input value of 1 volt if at least a 14-bit ADC is used. Table 5.16 shows the results of using a ±10 volt, 12-bit ADC and 16-bit coefficient storage and varying the DAC wordlength from 6 to 16-bits. Again, the 14-bit DAC gives a better output result than the 12-bit DAC. In both cases, there is no difference between the 14 and 16-bit wordlengths for these interface devices.

A new controller for a lower sampling time could be designed and the tests performed above could be repeated. Results similar to those for the digital filter designed at the lower sampling time would be expected. That is, at a lower sampling time, or higher sampling frequency, storing the control coefficients in too small a wordlength, or using interface devices with too small a wordlength, causes a serious degradation in system performance.

5.10 Arithmetic Quantization

To illustrate the effects of arithmetic quantization, the simulation has an option that allows the user to choose whether the remainder is saved when an integer divide is performed. If the user chooses to save the remainder, the modulus operator is used on the final output value. The modulus operator allows the remainder of an integer divide to be determined. The final output value is taken modulo the coefficient scale to determine the remainder. It is then divided by the coefficient scale, and the quotient of this divide is the value output through the DAC. Recall that all filter/controller coefficients were multiplied by a coefficient scale upon input. It is necessary to remove this scale factor before outputting the final value. When an integer divide is performed using fixed-point arithmetic, the quotient is kept and the remainder is lost. If the remainder is not used, the final output value is truncated and some precision is lost. Since the filter/controller output is determined using difference equations, and the algorithm calculations depend on the present as well as the previous output values, a truncation will affect future output points. If the remainder is saved, it is added to the next output value at time T + 1, so there is no loss of precision.
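The remainder option described above can be sketched in C as follows. This is a minimal illustration of the idea rather than the simulation's own code: the `Descaler` type and the `descale` function are hypothetical names, and the sketch assumes the saved remainder is added to the next scaled value before that value is divided.

```c
/* Hypothetical record for the remainder option: 'remainder' carries the
 * part of the scaled value that the previous integer divide discarded. */
typedef struct {
    long remainder;
    int  save_remainder;    /* nonzero: carry the remainder forward */
} Descaler;

/* Remove the coefficient scale from a scaled output value.
 * Without the option the integer divide simply truncates.
 * With the option the previous remainder is added first, and the new
 * remainder (value mod scale) is kept for the next sample. */
long descale(Descaler *d, long value, long scale)
{
    if (d->save_remainder) {
        value += d->remainder;
        d->remainder = value % scale;
    }
    return value / scale;
}
```

As a usage example, descaling the value 100 three times with a scale of 64 returns 1, 1, 1 when the remainder is discarded, so 36 units are lost at every sample. With the remainder saved the results are 1, 2, 1 and a remainder of 44 is still held, so the three outputs together with the held remainder account for all 300 units.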


The digital filter designed in Eq. (5.17) was used in the simulation in the fixed-point mode. The simulation was run twice, first without using the remainder of the divide (truncation), and then using the remainder. The results of these simulation runs are shown in Table 5.17. Note how the filter output using the remainder appears before that of the filter where the remainder was not used. Also, the magnitude of the filter output using the remainder peaks at 203 while the other peaks at 197. It is clear from this table that using the remainder, and not truncating, gives results closer to the input signal and reduces arithmetic quantization errors. The divide is the only real source of arithmetic quantization error, since the adds, subtracts, and multiplies are within the dynamic range of the computer used to implement them. Overflows or limit cycles can occur if too large a value is used for the coefficient wordlength or the interface wordlength in the fixed-point mode, since all filter/controller coefficients are multiplied by a scale factor upon input. A large scale factor may cause overflows and thus limit cycles in the algorithm calculations.

Similar, but more dramatic results can be obtained by using the digital filter of Eq. (5.39) for the 5 msec sampling time.

The closed-loop control system of Eqs. (5.42) and (5.43) was also used with and without saving the remainder. The results of these two simulation runs are shown in Table 5.18. Note how the control system that does not save the remainder has a manipulation that oscillates, and has an output value that does not match the reference input value of 204. When the remainder is saved, the control system output is error free after the first two sampling times, and the manipulation settles at a steady state value with no oscillation.

This type of arithmetic quantization becomes especially critical for the closed-loop control systems, since there is a feedback path, or with a filter/controller at a lower sampling time.

5.11 Summary of Chapter 5

The digital filtering and closed-loop control examples presented in this chapter help to illustrate how the simulation can be used to develop a complete digital system. The system components can be varied, with the results examined graphically, or numerically in tabular form. The simulation allows the user to change system parameters and see the effects on the system. The actual numeric values can be captured to be used in a table, or used to obtain a formal plot with existing software packages such as Lotus 1-2-3. All the plots in this chapter were done with Lotus 1-2-3. These Lotus plots are very similar to what appears on the graphics screen. The Lotus 1-2-3 software offers many advantages over a graphics screen dump to the printer, so it was chosen as an alternative for formal plotting, but the interactive graphics available with the simulation are a valuable tool for obtaining an immediate visual picture of a change in one of the system components. The graphic and numeric representations of simulation results provide a wealth of information for the system designer. Since it is assumed that the simulation is a reliable source of information, the data obtained could be used with confidence by the system designer to obtain an actual digital system that will perform in a manner very close to that of the simulation.

In the next chapter the results obtained from simulation of the digital filters and closed-loop control system will be verified experimentally using actual real-time systems.


Fig. 5.2 Magnitude vs Frequency (4th-Order Low-Pass Analog Filter); horizontal axis: Omega (rad/sec).

Fig. 5.3 Phase Angle vs Frequency (4th-Order Low-Pass Analog Filter).

Fig. 5.4 Magnitude vs Frequency (Digital Filter, T = 0.010 sec); horizontal axis: Omega (rad/sec).

Fig. 5.5 Phase Angle vs Frequency (Digital Filter, T = 0.010 sec); horizontal axis: Omega (rad/sec).

Fig. 5.6 Floating-pt. Coeff. Storage (12-bit ADC/DAC, T = 0.010 sec); horizontal axis: Time (sec).

Fig. 5.7 6-bit Coefficient Wordlength (12-bit ADC/DAC, T = 0.010 sec); horizontal axis: Time (sec).

Fig. 5.8 8-bit Coefficient Wordlength (12-bit ADC/DAC, T = 0.010 sec); horizontal axis: Time (sec).

Fig. 5.9 12-bit Coefficient Wordlength (12-bit ADC/DAC, T = 0.010 sec).

Fig. 5.10 16-bit Coefficient Wordlength (12-bit ADC/DAC, T = 0.010 sec); horizontal axis: Time (sec).

Fig. 5.11 Coefficient Quantization (12-bit ADC/DAC, T = 0.010 sec).

Fig. 5.12 ADC Quantization (12-bit DAC, [illegible]-bit Coeff, T = 0.010 sec); horizontal axis: ADC Wordlength (bits).

Fig. 5.13 DAC Quantization (12-bit ADC, [illegible]-bit Coeff, T = 0.010 sec); horizontal axis: DAC Wordlength (bits).


Fig. 5.14 Magnitude vs Frequency (Digital Filter, T = 0.005 sec); horizontal axis: Omega (rad/sec).

Fig. 5.15 Phase Angle vs Frequency (Digital Filter, T = 0.005 sec); horizontal axis: Omega (rad/sec).

Fig. 5.16 Floating-Pt. Coeff. Storage (12-bit ADC/DAC, T = 0.005 sec); horizontal axis: Time (sec).

Fig. 5.17 10-bit Coefficient Wordlength (12-bit ADC/DAC, T = 0.005 sec); horizontal axis: Time (sec).

Fig. 5.18 12-bit Coefficient Wordlength (12-bit ADC/DAC, T = 0.005 sec); horizontal axis: Time (sec).

Fig. 5.19 14-bit Coefficient Wordlength (12-bit ADC/DAC, T = 0.005 sec); horizontal axis: Time (sec).

Fig. 5.20 16-bit Coefficient Wordlength (12-bit ADC/DAC, T = 0.005 sec); horizontal axis: Time (sec).

Fig. 5.21 Coefficient Quantization (12-bit ADC/DAC, T = 0.005 sec); horizontal axis: Coefficient Wordlength (bits), 10 to 20.


Fig. 5.22 ADC Quantization (12-bit DAC, [illegible]-bit Coeff, T = 0.005 sec); horizontal axis: ADC Wordlength (bits).

Fig. 5.23 DAC Quantization (12-bit ADC, [illegible]-bit Coeff, T = 0.005 sec); horizontal axis: DAC Wordlength (bits).
