Unit 4

UNIT-IV FINITE WORD LENGTH EFFECTS
4.1 INTRODUCTION:
 Digital signal processing algorithms are realized either with special purpose digital
hardware or as programs for a general purpose digital computer.
 In both cases the numbers and Coefficients are stored in finite length registers.
 Therefore, Coefficients and numbers are quantized by truncation or rounding off
when they are stored.
4.2 TYPES OF NUMBER REPRESENTATION:

 There are three common forms used to represent numbers in a digital computer or any other digital hardware:
 Fixed Point Representation
 Floating Point Representation
 Block Floating Point Representation
4.3 FIXED POINT REPRESENTATION
 In fixed point, the position of the binary point is fixed.
 The bits to the right of the binary point represent the fractional part of the number and those to the left represent the integer part.
Example:
 Convert the decimal number 30.275 to binary.
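The conversion asked for in the example can be sketched in Python as follows. This is only an illustration of the usual "multiply the fraction by 2 and take the integer part" method; the function name `to_binary` and the choice of 8 fractional bits are assumptions, not part of the notes.

```python
from fractions import Fraction

def to_binary(value, frac_bits):
    """Convert a non-negative decimal (given as a string) to binary,
    truncated to frac_bits fractional bits."""
    number = Fraction(value)      # exact decimal, avoids float round-off
    integer = int(number)         # integer part
    frac = number - integer       # fractional part
    bits = []
    for _ in range(frac_bits):
        frac *= 2                 # multiply the fraction by 2
        bit = int(frac)           # the integer part is the next bit
        bits.append(str(bit))
        frac -= bit               # keep only the remaining fraction
    return f"{integer:b}." + "".join(bits)

print(to_binary("30.275", 8))     # 11110.01000110
```

Since 0.275 has no finite binary expansion, the result is only an approximation whose accuracy depends on the number of fractional bits kept.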
 The manner in which negative numbers are represented gives three different forms of fixed point arithmetic.
 Sign Magnitude form
 One's Complement form
 Two's Complement form
SIGN MAGNITUDE FORM (S&M):

 In this representation the most significant bit is set to '1' to represent the negative sign.
Example: $(1.75)_{10}$
0.75 x 2 = 1.5
0.5 x 2 = 1.0
$(+1.75)_{10} = (01.110000)_2$
$(-1.75)_{10} = (11.110000)_2$
 In sign magnitude form the number '0' has two representations, i.e., 00.000000 (or) 10.000000.
 With 'b' bits only $2^b - 1$ numbers can be represented.
ONE'S COMPLEMENT FORM:
 In one's complement form the positive numbers are represented as in the S&M form.
 The negative number is obtained by complementing all the bits of the positive number.
Example: $(0.875)_{10}$
0.875 x 2 = 1.75
0.75 x 2 = 1.5
0.5 x 2 = 1.0
$(0.875)_{10} = (0.111000)_2$
Complementing each bit gives
$(-0.875)_{10} = (1.000111)_2$
In one's complement form the magnitude of the negative number is given by
$1 - \sum_{i=1}^{b} c_i 2^{-i} - 2^{-b}$
In this type of representation '0' can be represented as 0.000000 (or) 1.111111.
 So with 'b' bits $2^b - 1$ numbers can be represented exactly.
TWO'S COMPLEMENT FORM
 In two's complement representation positive numbers are represented as in sign magnitude and one's complement.
 The negative number is obtained by complementing all the bits of the positive number and adding one to the least significant bit.
Example:
$(0.875)_{10} = (0.111000)_2$
One's complement (from the previous example): 1.000111
Adding 1 to the least significant bit:
  1.000111
+ 0.000001
  --------
  1.001000
$(-0.875)_{10} = (1.001000)_2$
The magnitude of the negative number is given by
$1 - \sum_{i=1}^{b} c_i 2^{-i}$
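The three fixed point forms can be compared side by side with a short Python sketch. It assumes a word of one sign bit followed by b fractional bits and a value that is exactly representable; the function name `encode` is hypothetical.

```python
def encode(value, b):
    """Return (sign magnitude, one's complement, two's complement) bit
    strings of a fraction |value| < 1, using 1 sign bit + b fraction bits."""
    mag = round(abs(value) * 2**b)        # magnitude in units of 2^-b
    mask = (1 << (b + 1)) - 1             # keeps the word b+1 bits wide
    if value >= 0:
        sm = oc = tc = mag                # positive numbers: all forms agree
    else:
        sm = (1 << b) | mag               # set the sign bit
        oc = ~mag & mask                  # complement every bit
        tc = (oc + 1) & mask              # add 1 to the least significant bit

    def fmt(word):
        bits = format(word, f"0{b + 1}b")
        return bits[0] + "." + bits[1:]   # sign bit, binary point, fraction

    return fmt(sm), fmt(oc), fmt(tc)

print(encode(-0.875, 6))   # ('1.111000', '1.000111', '1.001000')
```

The one's and two's complement strings match the worked examples for $(-0.875)_{10}$ above.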
4.4 FLOATING POINT REPRESENTATION:

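In floating point representation a positive number is represented as
$F = 2^{C} \cdot M$
C - Exponent (the number of divisions or multiplications by 2 needed to bring the number into the mantissa range)
M - Mantissa
The mantissa lies in the range $\frac{1}{2} \le M < 1$.
Example:
Represent the following numbers in floating point form: (1) 7 (2) -7 (3) 0.25 (4) -0.25
Answers:
Let the mantissa M have 5 bits and the exponent C have 3 bits.
1) 7
7/2 = 3.5, 3.5/2 = 1.75, 1.75/2 = 0.875 (it lies in the range $0.5 \le M < 1$)
Three divisions by 2 were needed, so C = 3 and M = 0.875.
0.875 x 2 = 1.75
0.75 x 2 = 1.5
0.5 x 2 = 1.0
$(0.875)_{10} = (0.11100)_2$
$F = 2^{C} \cdot M = 2^{3} \times 0.875 = 2^{011} \times 0.11100$ (3-bit exponent, 5-bit mantissa)
2) -7
$F = 2^{011} \times 1.11100$ (the sign bit of the mantissa is set to 1)
3) 0.25
0.25 x 2 = 0.5 (it lies in the range)
One multiplication by 2 was needed, so C = -1 and M = 0.5.
$(0.5)_{10} = (0.10000)_2$
$F = 2^{C} \cdot M = 2^{-1} \times 0.5$
Taking the 3-bit exponent in sign magnitude form, -1 is written as 101:
$F = 2^{101} \times 0.10000$
4) -0.25
$F = 2^{101} \times 1.10000$ (the sign bit of the mantissa is set to 1)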
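The normalization step (repeated division or multiplication by 2 until the mantissa lies between 0.5 and 1) is what Python's standard `math.frexp` performs, so it can be used to check the four cases. The helper name `normalize` is an assumption; the bit-level encoding of the exponent and sign is not modelled here.

```python
import math

def normalize(value):
    """Return (M, C) with value = M * 2**C and 0.5 <= |M| < 1."""
    mantissa, exponent = math.frexp(value)
    return mantissa, exponent

for v in (7, -7, 0.25, -0.25):
    m, c = normalize(v)
    print(f"{v}: M = {m}, C = {c}")
```

For 7 this gives M = 0.875 with C = 3, and for 0.25 it gives M = 0.5 with C = -1; the sign of the number is carried by the mantissa.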
COMPARISON OF FIXED POINT AND FLOATING POINT REPRESENTATION
FIXED POINT REPRESENTATION | FLOATING POINT REPRESENTATION
 In a b-bit binary word the range of numbers represented is smaller than in floating point representation. |  In a b-bit binary word the range of numbers represented is larger than in fixed point representation.
 The position of the binary point is fixed. |  The position of the binary point is variable.
 The resolution is uniform throughout the range. |  The resolution is variable.
 The accuracy of the result is lower due to the smaller dynamic range. |  The accuracy of the result is higher due to the larger dynamic range.
 Speed of processing is high. |  Speed of processing is low.
 Hardware implementation is cheaper. |  Hardware implementation is costlier.
 Fixed point arithmetic can be used for real time computations. |  Floating point arithmetic cannot be used for real time computations.
 Quantization error occurs only in multiplication. |  Quantization error occurs in both multiplication and addition.
4.5 BLOCK FLOATING POINT NUMBERS:
 A compromise between fixed and floating point systems is block floating point arithmetic.
 Here, the set of signals to be handled is divided into blocks.
 Each block has the same value for the exponent.
 The arithmetic operations within the block use fixed point arithmetic and only one exponent per block is stored.
 This representation of numbers is most suitable in certain FFT flow graphs and in digital audio applications.
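A minimal sketch of the idea, assuming the shared exponent is chosen from the largest magnitude in the block and the mantissas are rounded to b fractional bits. The function names and the rounding choice are illustrative assumptions, not a description of any particular processor.

```python
import math

def block_float_encode(block, b):
    """Return (exponent, mantissas): one shared exponent per block and
    fixed point mantissas rounded to b fractional bits."""
    peak = max(abs(v) for v in block)
    # peak = m * 2**exponent with 0.5 <= m < 1
    exponent = math.frexp(peak)[1] if peak > 0 else 0
    scale = 2 ** b
    mantissas = [math.floor(v / 2**exponent * scale + 0.5) / scale
                 for v in block]
    return exponent, mantissas

def block_float_decode(exponent, mantissas):
    """Rebuild the sample values from the shared exponent."""
    return [m * 2**exponent for m in mantissas]

exponent, mantissas = block_float_encode([7, -3.5, 0.25], 5)
print(exponent, mantissas)
print(block_float_decode(exponent, mantissas))
```

Only one exponent is stored for the three samples; small samples in a block with a large peak keep fewer significant bits, which is the price paid for the shared exponent.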
4.6 QUANTIZATION:
The process of converting a discrete time continuous amplitude signal $x(n)$ into a discrete time discrete amplitude signal $x_q(n)$ is known as quantization.
QUANTIZATION NOISE (OR) A/D CONVERSION NOISE
[Figure: $x(t)$ → Sampler → $x(n)$ → Quantizer → $x_q(n)$]
Fig: Block Diagram of A/D Converter

 An A/D converter converts an analog signal into a digital signal.
 First the signal $x(t)$ is sampled at regular intervals $t = nT$, where T is the sampling interval and n = 0, 1, 2, ..., to create a sequence $x(n)$. This is done by a sampler.
 Then the numeric equivalent of each sample $x(n)$ is expressed by a finite number of bits, giving the sequence $x_q(n)$.
 The difference signal $e(n) = x_q(n) - x(n)$ is called quantization noise (or) A/D conversion noise.
Let,
 The input be a sinusoidal signal varying between +1 and -1, i.e., with a dynamic range of 2.
 The ADC used to convert the sinusoidal signal employ (b+1) bits (including the sign bit).
 Then the number of levels available for quantizing $x(n)$ is $2^{b+1}$.
 Thus the interval between successive levels is
$q = \frac{2}{2^{b+1}} = 2 \cdot 2^{-(b+1)} = 2 \cdot 2^{-b} \cdot 2^{-1}$
$q = 2^{-b}$
where 'q' is known as the step size.
 The common methods of quantization are
 Truncation
 Rounding
TRUNCATION:
Truncation is the process of discarding all bits less significant than the least significant bit that is retained.
Example:
0.00110011 (8 bits) is truncated to 0.0011 (4 bits)
ROUNDING:
Rounding a number to 'b' bits is accomplished by choosing as the rounded result the b-bit number closest to the original unrounded number.
Example:
0.11010 is rounded to 0.110 (or) 0.111 (3 bits), since it lies exactly midway between the two.
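The two examples can be reproduced with exact rational arithmetic. In this sketch the names `truncate` and `round_to` are hypothetical, and ties in rounding are resolved upward, which is an assumption (the notes allow either neighbour).

```python
import math
from fractions import Fraction

def truncate(x, b):
    """Discard all bits beyond the b-th fractional bit (non-negative x)."""
    return Fraction(math.floor(x * 2**b), 2**b)

def round_to(x, b):
    """Choose the nearest b-bit value; ties go up (an assumption)."""
    return Fraction(math.floor(x * 2**b + Fraction(1, 2)), 2**b)

x1 = Fraction(0b00110011, 2**8)   # 0.00110011 in binary
x2 = Fraction(0b11010, 2**5)      # 0.11010 in binary

print(truncate(x1, 4))            # 3/16 = 0.0011 in binary
print(truncate(x2, 3))            # 3/4  = 0.110 in binary
print(round_to(x2, 3))            # 7/8  = 0.111 in binary
```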
The following errors arise due to quantization of numbers:

 Input quantization error.
 Product quantization error.
 Coefficient quantization error.
 INPUT QUANTIZATION ERROR:
o The conversion of a continuous time input signal into digital values produces an error, which is known as input quantization error.
o This error arises due to the representation of the input signal by a fixed number of digits in the A/D conversion process.
 PRODUCT QUANTIZATION ERROR:
o It arises at the output of a multiplier.
o Multiplication of a 'b' bit data word with a 'b' bit coefficient results in a product having '2b' bits.
o Since a 'b' bit register is used, the multiplier output must be rounded or truncated to 'b' bits, which produces an error.
 COEFFICIENT QUANTIZATION ERROR:
o The filter coefficients are computed to infinite precision in theory.
o If they are quantized, the frequency response of the resulting filter may differ from the desired response and sometimes the filter may fail to meet the desired specifications.
o If the poles of the desired filter are close to the unit circle, then those of the filter with quantized coefficients may lie just outside the unit circle, leading to instability.
o The other errors arising from quantization are round off noise and limit cycle oscillations.
4.7 ERROR DUE TO TRUNCATION AND ROUNDING:
TRUNCATION
 Truncation is the process of reducing the size of a binary number by discarding all bits less significant than the least significant bit that is retained.
 In the truncation of a binary number to 'b' bits, all the less significant bits beyond the b-th bit are discarded.
Fig: Input-Output Characteristic of Quantizer for Truncation.

 Ranges of unquantized numbers (N) are marked on the X-axis.
 Quantized numbers ($N_t$) are marked on the Y-axis.
 Any positive unquantized number in the range $0 \le N < 1 \cdot 2^{-b}$ will be assigned the quantization step $0 \cdot 2^{-b}$.
 Any positive unquantized number in the range $1 \cdot 2^{-b} \le N < 2 \cdot 2^{-b}$ will be assigned the quantization step $1 \cdot 2^{-b}$, and so on.
 The error due to truncation of a negative number depends on the type of representation of the number.
N → Unquantized number
$N_t$ → Number quantized by truncation
 Quantization error due to truncation is given by
Truncation error $e_t = N_t - N$
Number and its representation | Range of error when truncated to 'b' bits
Positive number | $0 \ge e_t > -2^{-b}$
Sign magnitude negative number | $0 \le e_t < 2^{-b}$
One's complement negative number | $0 \le e_t < 2^{-b}$
Two's complement negative number | $0 \ge e_t > -2^{-b}$
 Relative error due to truncation,
$\epsilon_t = \frac{N_{tf} - N_f}{N_f}$
$N_f$ → Unquantized number
$N_{tf}$ → Number quantized by truncation
$N_{tf} = N_f + N_f \epsilon_t$
[Figure panels: (a) $P(e_t)$, fixed point, two's complement; (b) $P(e_t)$, fixed point, one's complement (or) sign magnitude; (c) $P(\epsilon_t)$, floating point with mantissa in two's complement; (d) $P(\epsilon_t)$, floating point with mantissa in one's complement or sign magnitude]
Fig: Probability density functions for truncation
ERROR DUE TO ROUNDING:

Rounding is the process of reducing the size of a binary number to a finite word size of 'b' bits such that the rounded b-bit number is closest to the original unquantized number.
 Any positive unquantized number in the range $1 \cdot 2^{-b} - \frac{2^{-b}}{2} \le N < 1 \cdot 2^{-b} + \frac{2^{-b}}{2}$ will be assigned the quantization step $1 \cdot 2^{-b}$.
 Any positive unquantized number in the range $2 \cdot 2^{-b} - \frac{2^{-b}}{2} \le N < 2 \cdot 2^{-b} + \frac{2^{-b}}{2}$ will be assigned the quantization step $2 \cdot 2^{-b}$, and so on.
 Rounding error, $e_r = N_r - N$
N → Unquantized fixed point binary number
$N_r$ → Fixed point binary number quantized by rounding
 Range of error: $-\frac{2^{-b}}{2} \le e_r \le \frac{2^{-b}}{2}$ (fixed point)
 Relative error due to rounding,
$\varepsilon_r = \frac{N_{rf} - N_f}{N_f}$, where $N_{rf} = N_f + N_f \varepsilon_r$
$N_f$ → Unquantized floating point binary number
$N_{rf}$ → Floating point binary number quantized by rounding
Range of error: $-2^{-b} \le \varepsilon_r \le 2^{-b}$ (floating point)
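[Figure panels: (a) rounding, fixed point: $P(e_r) = 2^{b}$ over $-\frac{2^{-b}}{2} \le e_r \le \frac{2^{-b}}{2}$; (b) rounding, floating point: $P(\varepsilon_r) = \frac{2^{b}}{2}$ over $-2^{-b} \le \varepsilon_r \le 2^{-b}$]
Fig: Quantization noise probability density functions for rounding.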
4.8 INPUT QUANTIZATION ERROR:
 Input quantization error arises when a continuous signal is converted into digital form. The quantization error is given by
$e(n) = x_q(n) - x(n)$
$x_q(n)$ → Sampled quantized value
$x(n)$ → Sampled unquantized value
 Depending on the way in which $x(n)$ is quantized, different distributions of quantization noise may be obtained.
 If rounding is used to get $x_q(n)$, then the error signal satisfies the relation
$-\frac{q}{2} \le e(n) \le \frac{q}{2}$
[Figure panels: (a) probability density function for round off error, $P(e) = 1/q$ over $-q/2 \le e \le q/2$; (b) probability density function of truncation error, $P(e) = 1/q$ over $-q \le e \le 0$]
4.9 PRODUCT QUANTIZATION ERROR
 In fixed point arithmetic the product of two 'b' bit numbers results in a number '2b' bits long.
 In digital signal processing applications it is necessary to round this product to a 'b' bit number, which produces an error known as product quantization error (or) product round off noise.
[Figure: multiplier output $a\,x_q(n)$ with an additive noise source $e(n)$, giving $y(n) = a\,x_q(n) + e(n)$]
Fig: Fixed point product round off noise model

$e(n)$ is uniformly distributed over the range $-\frac{q}{2}$ to $\frac{q}{2}$, i.e., the mean of $e(n)$ is 0.
Variance $\sigma_e^2 = \frac{2^{-2b}}{12}$
$e(n)$ is a stationary white noise sequence.
Fig: Quantization noise model for a first order system.
Fig: Quantization noise model for a second order system with five noise sources.
Fig: Quantization noise model for a second order system with a single noise source.
The output noise power due to the k-th noise source, seen through the transfer function $H_k(z)$, is
$\sigma_{ok}^2 = \sigma_e^2 \cdot \frac{1}{2\pi j} \oint_c H_k(z) H_k(z^{-1}) z^{-1} \, dz$
4.10 COEFFICIENT QUANTIZATION ERROR:
 In the design of a digital filter the coefficients are evaluated with infinite precision.
 But when they are quantized, the frequency response of the actual filter deviates from that which would have been obtained with an infinite word length representation, and the filter may actually fail to meet the desired specifications.
 If the poles of the desired filter are close to the unit circle, then those of the filter with quantized coefficients may lie just outside the unit circle, leading to instability.
PROBLEM:
Consider a second order IIR filter with
$H(z) = \frac{1.0}{(1 - 0.5z^{-1})(1 - 0.45z^{-1})}$
Find the effect of quantization on the pole locations of the given system function in direct form and in cascade form. Take b = 3 bits.

Direct form:
$H(z) = \frac{1}{1 - 0.45z^{-1} - 0.5z^{-1} + 0.225z^{-2}}$
$H(z) = \frac{1}{1 - 0.95z^{-1} + 0.225z^{-2}}$
-0.95 converted into binary:

0.95 x 2 = 1.9
0.9 x 2 = 1.8
0.8 x 2 = 1.6
0.6 x 2 = 1.2
0.2 x 2 = 0.4
0.4 x 2 = 0.8
$(0.95)_{10} = (0.111100\ldots)_2$
$(-0.95)_{10} = (1.111100\ldots)_2$
After truncation to 3 bits:

$(-0.95)_{10} \approx (1.111)_2$
Convert $(1.111)_2$ into decimal (the leading 1 is the sign bit):

1 x 2^-1 = 0.5
1 x 2^-2 = 0.25
1 x 2^-3 = 0.125
Sum = 0.875, so $(1.111)_2 = (-0.875)_{10}$
0.225 converted into binary:
0.225 x 2 = 0.45
0.45 x 2 = 0.9
0.9 x 2 = 1.8
0.8 x 2 = 1.6
0.6 x 2 = 1.2
$(0.225)_{10} = (0.00111\ldots)_2$
After truncation to 3 bits:

$(0.225)_{10} \approx (0.001)_2$
Convert $(0.001)_2$ into decimal:
0 x 2^-1 = 0
0 x 2^-2 = 0
1 x 2^-3 = 0.125
Sum = 0.125, so $(0.001)_2 = (0.125)_{10}$
Direct form before quantization:
$H(z) = \frac{1}{1 - 0.95z^{-1} + 0.225z^{-2}}$
Direct form after quantization:
$H(z) = \frac{1}{1 - 0.875z^{-1} + 0.125z^{-2}}$
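CASCADE FORM:
$H(z) = \frac{1}{(1 - 0.5z^{-1})(1 - 0.45z^{-1})}$
-0.5 converted into binary:
0.5 x 2 = 1.0
$(-0.5)_{10} = (1.100)_2$
0.45 converted into binary:
0.45 x 2 = 0.9
0.9 x 2 = 1.8
0.8 x 2 = 1.6
0.6 x 2 = 1.2
0.2 x 2 = 0.4
$(0.45)_{10} = (0.01110\ldots)_2$
$(-0.45)_{10} = (1.01110\ldots)_2$
After truncation to 3 bits:

$(-0.45)_{10} \approx (1.011)_2$
Convert $(1.011)_2$ into decimal (the leading 1 is the sign bit):
0 x 2^-1 = 0
1 x 2^-2 = 0.25
1 x 2^-3 = 0.125
Sum = 0.375, so $(1.011)_2 = (-0.375)_{10}$
Convert $(1.100)_2$ into decimal:
1 x 2^-1 = 0.5
0 x 2^-2 = 0
0 x 2^-3 = 0
Sum = 0.5, so $(1.100)_2 = (-0.5)_{10}$
Cascade form after quantization:
$H(z) = \frac{1}{(1 - 0.5z^{-1})(1 - 0.375z^{-1})}$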
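To see the effect on the pole locations numerically, the quantized denominators can be factored. The sketch below truncates the coefficient magnitudes to 3 fractional bits, as in the worked solution; the helper names `truncate` and `poles` are assumptions.

```python
import math
import cmath

def truncate(value, b):
    """Truncate a non-negative coefficient magnitude to b fractional bits."""
    return math.floor(value * 2**b) / 2**b

def poles(a1, a2):
    """Roots of z^2 + a1*z + a2, i.e. poles of 1/(1 + a1 z^-1 + a2 z^-2)."""
    root = cmath.sqrt(a1 * a1 - 4 * a2)
    return (-a1 + root) / 2, (-a1 - root) / 2

b = 3
# Direct form: quantize the coefficients of the expanded denominator.
a1_q = -truncate(0.95, b)      # -0.875
a2_q = truncate(0.225, b)      # 0.125
direct = poles(a1_q, a2_q)

# Cascade form: each first order section keeps its own pole.
cascade = (truncate(0.5, b), truncate(0.45, b))

print("unquantized poles:", poles(-0.95, 0.225))
print("direct form poles:", direct)
print("cascade form poles:", cascade)
```

With these numbers the direct form poles move from 0.5 and 0.45 to about 0.695 and 0.180, while in the cascade form only the second pole moves, from 0.45 to 0.375. The cascade realization is therefore less sensitive to coefficient quantization in this example.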
4.11 QUANTIZATION NOISE MODEL:
Steady State Input Noise Power:
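In digital processing of analog signals, the quantization error is commonly viewed as an additive noise signal, i.e., $x_q(n) = x(n) + e(n)$
[Figure: $x(t)$ → Sampler → $x(n)$ → Quantizer → $x_q(n)$ → Digital system → $y(n)$; equivalent model: $x(t)$ → Sampler → $x(n)$, with $e(n)$ added to give $x_q(n) = x(n) + e(n)$]
Fig: Quantization Noise Model
 If rounding is used for quantization then the quantization error $e(n) = x_q(n) - x(n)$ is bounded by $-\frac{q}{2} \le e(n) \le \frac{q}{2}$
 In general, for a variable x uniformly distributed between the limits $x_1$ (lower limit) and $x_2$ (upper limit),
$E(x) = \frac{1}{x_2 - x_1} \int_{x_1}^{x_2} x \, dx$
Variance $\sigma^2 = E(x^2) - E^2(x)$
Mean of the error:
$E(e) = \frac{1}{\frac{q}{2} - \left(-\frac{q}{2}\right)} \int_{-q/2}^{q/2} e \, de$
$= \frac{1}{q} \left[ \frac{e^2}{2} \right]_{-q/2}^{q/2}$
$= \frac{1}{2q} \left[ \left(\frac{q}{2}\right)^2 - \left(-\frac{q}{2}\right)^2 \right]$
$= \frac{1}{2q} \left[ \frac{q^2}{4} - \frac{q^2}{4} \right]$
$E(e) = 0$
Mean square value of the error:
$E(e^2) = \frac{1}{\frac{q}{2} - \left(-\frac{q}{2}\right)} \int_{-q/2}^{q/2} e^2 \, de$
$= \frac{1}{q} \left[ \frac{e^3}{3} \right]_{-q/2}^{q/2}$
$= \frac{1}{3q} \left[ \left(\frac{q}{2}\right)^3 - \left(-\frac{q}{2}\right)^3 \right]$
$= \frac{1}{3q} \left[ \frac{q^3}{8} + \frac{q^3}{8} \right] = \frac{1}{3q} \cdot \frac{2q^3}{8} = \frac{q^2}{12}$
Variance of the error:
$\sigma_e^2 = E(e^2) - E^2(e) = \frac{q^2}{12} - 0$
$\sigma_e^2 = \frac{q^2}{12}$
$q = 2^{-b}$ (1)
From (1): $\sigma_e^2 = \frac{(2^{-b})^2}{12} = \frac{2^{-2b}}{12}$
$\sigma_e^2 = \frac{2^{-2b}}{12}$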
$\sigma_e^2$ → Input noise power
Note:
 Signal to noise ratio:
 If the input signal is $x(n)$ and its variance is $\sigma_x^2$, then the ratio of signal power to noise power is known as the signal to noise ratio.
$\frac{\sigma_x^2}{\sigma_e^2} = \frac{\sigma_x^2}{2^{-2b}/12} = 12 \cdot 2^{2b} \sigma_x^2$
$SNR = 12 \cdot 2^{2b} \sigma_x^2$
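As a quick numerical illustration of these two results, the sketch below evaluates the noise power and the SNR in decibels. The function names are assumptions, and expressing the SNR in dB is an added convenience rather than part of the derivation.

```python
import math

def noise_variance(b):
    """Input noise power for step size q = 2^-b (rounding): q^2 / 12."""
    q = 2.0 ** (-b)
    return q * q / 12.0

def snr_db(b, signal_variance):
    """10*log10 of SNR = 12 * 2^(2b) * sigma_x^2."""
    return 10.0 * math.log10(signal_variance / noise_variance(b))

print(noise_variance(4))                    # about 3.255e-4
print(snr_db(9, 0.5) - snr_db(8, 0.5))      # about 6.02 dB per extra bit
```

Because the SNR is proportional to $2^{2b}$, every additional bit improves it by $20\log_{10} 2 \approx 6.02$ dB.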
4.12 STEADY STATE OUTPUT NOISE POWER:
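The quantized input signal of a digital system can be represented as the sum of the unquantized signal $x(n)$ and the error signal $e(n)$. The output noise $e_0(n)$ is the response of the system, with impulse response $h(n)$, to $e(n)$.
Fig: Representation of A/D conversion noise

The autocorrelation function of $e_0(n)$ is
$\gamma_{e_0 e_0}(m) = E\left[ e_0^*(n) \, e_0(n+m) \right]$ (1)
$e_0^*(n) = \sum_{k=0}^{\infty} h(k) \, e^*(n-k)$ (2)
$e_0(n+m) = \sum_{k=0}^{\infty} h(k) \, e(n+m-k)$ (3)
Substitute (2) and (3) in (1):
$\gamma_{e_0 e_0}(m) = E\left[ \sum_{k=0}^{\infty} h(k) \, e^*(n-k) \sum_{k=0}^{\infty} h(k) \, e(n+m-k) \right]$
Since $e(n)$ is white, only the terms with the same index k in both sums have a non-zero expected value:
$= \sum_{k=0}^{\infty} h^2(k) \, E\left[ e^*(n-k) \, e(n+m-k) \right]$
$\gamma_{e_0 e_0}(m) = \sum_{k=0}^{\infty} h^2(k) \, \gamma_{ee}(m)$ (4)
At m = 0, substitute $\gamma_{e_0 e_0}(0) = \sigma_{e0}^2$ and $\gamma_{ee}(0) = \sigma_e^2$ in equation (4). Equation (4) becomes
$\sigma_{e0}^2 = \sigma_e^2 \sum_{k=0}^{\infty} h^2(k)$ (5)
Substitute k = n in equation (5):
$\sigma_{e0}^2 = \sigma_e^2 \sum_{n=0}^{\infty} h^2(n)$ (6)
where
$\sigma_{e0}^2$ → Output noise power
$\sigma_e^2$ → Input noise power
By Parseval's theorem,
$\sum_{n=0}^{\infty} h^2(n) = \frac{1}{2\pi j} \oint_c H(z) H(z^{-1}) z^{-1} \, dz$ (7)
Substitute (7) in (6):
$\sigma_{e0}^2 = \sigma_e^2 \cdot \frac{1}{2\pi j} \oint_c H(z) H(z^{-1}) z^{-1} \, dz$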
PROOF OF PARSEVAL'S THEOREM:

The Z transform is defined as
$Z[h(n)] = H(z) = \sum_{n=0}^{\infty} h(n) z^{-n}$ (8)
The inverse Z transform is defined as
$Z^{-1}[H(z)] = h(n) = \frac{1}{2\pi j} \oint_c H(z) z^{n-1} \, dz$ (9)
$\sum_{n=0}^{\infty} h^2(n) = \sum_{n=0}^{\infty} h(n) \, h(n)$
$= \sum_{n=0}^{\infty} h(n) \cdot \frac{1}{2\pi j} \oint_c H(z) z^{n-1} \, dz$ (from 9)
Interchanging the order of summation and integration,
$= \frac{1}{2\pi j} \oint_c H(z) \left[ \sum_{n=0}^{\infty} h(n) z^{n} \right] z^{-1} \, dz$
$= \frac{1}{2\pi j} \oint_c H(z) \left[ \sum_{n=0}^{\infty} h(n) (z^{-1})^{-n} \right] z^{-1} \, dz$
$= \frac{1}{2\pi j} \oint_c H(z) H(z^{-1}) z^{-1} \, dz$ (from 8, with z replaced by $z^{-1}$)
$\sum_{n=0}^{\infty} h^2(n) = \frac{1}{2\pi j} \oint_c H(z) H(z^{-1}) z^{-1} \, dz$
Hence proved.
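PROBLEM: Find the output round off noise power for the system having the transfer function
$H(z) = \frac{1}{(1 - 0.5z^{-1})(1 + 0.4z^{-1})}$
Assume the word length is 4 bits.
$H_1(z) = \frac{1}{1 - 0.5z^{-1}}$, $H_2(z) = \frac{1}{1 + 0.4z^{-1}}$
$\frac{Y_1(z)}{X(z)} = \frac{1}{1 - 0.5z^{-1}}$, $\frac{Y_2(z)}{Y_1(z)} = \frac{1}{1 + 0.4z^{-1}}$
$Y_1(z)(1 - 0.5z^{-1}) = X(z)$, $Y_2(z)(1 + 0.4z^{-1}) = Y_1(z)$
$Y_1(z) - 0.5z^{-1} Y_1(z) = X(z)$, $Y_2(z) + 0.4z^{-1} Y_2(z) = Y_1(z)$
$Y_1(z) = X(z) + 0.5z^{-1} Y_1(z)$, $Y_2(z) = Y_1(z) - 0.4z^{-1} Y_2(z)$
$\sigma_{e0}^2 = \sigma_{01}^2 + \sigma_{02}^2$
$\sigma_{01}^2$ → Output noise power due to the noise source that passes through $H_1(z)$ and $H_2(z)$
$\sigma_{02}^2$ → Output noise power due to the noise source that passes through $H_2(z)$ only
$\sigma_{e0}^2 = \sigma_e^2 \cdot \frac{1}{2\pi j} \oint_c H(z) H(z^{-1}) z^{-1} \, dz$
$H(z) = H_1(z) H_2(z) = \frac{1}{(1 - 0.5z^{-1})(1 + 0.4z^{-1})}$
$H(z^{-1}) = \frac{1}{(1 - 0.5z)(1 + 0.4z)}$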
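$\sigma_{01}^2 = \sigma_e^2 \cdot \frac{1}{2\pi j} \oint_c \frac{1}{(1 - 0.5z^{-1})(1 + 0.4z^{-1})} \cdot \frac{1}{(1 - 0.5z)(1 + 0.4z)} z^{-1} \, dz$
$= \sigma_e^2 \cdot \frac{1}{2\pi j} \oint_c \frac{1}{z^{-1}(z - 0.5) \, z^{-1}(z + 0.4)} \cdot \frac{1}{(-0.5)\left(z - \frac{1}{0.5}\right)(0.4)\left(z + \frac{1}{0.4}\right)} z^{-1} \, dz$
$= \sigma_e^2 \cdot \frac{1}{2\pi j} \oint_c \frac{z^2 \, z^{-1}}{(z - 0.5)(z + 0.4)(-0.2)(z - 2)(z + 2.5)} \, dz$
$\sigma_{01}^2 = \sigma_e^2 \cdot \frac{1}{2\pi j} \oint_c \frac{-5z}{(z - 0.5)(z + 0.4)(z - 2)(z + 2.5)} \, dz$
Stable poles (inside the unit circle): $z_1 = 0.5$, $z_2 = -0.4$
Unstable poles (outside the unit circle): $z_3 = 2$, $z_4 = -2.5$
Only the residues at the poles inside the unit circle contribute to the integral.
$\sigma_{01}^2 = \sigma_e^2 I_1$
$I_1 = \lim_{z \to 0.5} (z - 0.5) \frac{-5z}{(z - 0.5)(z + 0.4)(z - 2)(z + 2.5)} + \lim_{z \to -0.4} (z + 0.4) \frac{-5z}{(z - 0.5)(z + 0.4)(z - 2)(z + 2.5)}$
$I_1 = 0.617 + 0.440$
$I_1 = 1.057$
$\sigma_e^2 = \frac{2^{-2b}}{12} = \frac{2^{-2(4)}}{12}$
$\sigma_e^2 = 3.255 \times 10^{-4}$
$\sigma_{01}^2 = \sigma_e^2 I_1 = 3.255 \times 10^{-4} \times 1.057$
$\sigma_{01}^2 = 3.44 \times 10^{-4}$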
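$H_2(z) = \frac{1}{1 + 0.4z^{-1}}$
$H_2(z^{-1}) = \frac{1}{1 + 0.4z}$
$\sigma_{02}^2 = \sigma_e^2 \cdot \frac{1}{2\pi j} \oint_c H_2(z) H_2(z^{-1}) z^{-1} \, dz$
$= \sigma_e^2 \cdot \frac{1}{2\pi j} \oint_c \frac{1}{1 + 0.4z^{-1}} \cdot \frac{1}{1 + 0.4z} z^{-1} \, dz$
$= \sigma_e^2 \cdot \frac{1}{2\pi j} \oint_c \frac{1}{z^{-1}(z + 0.4)} \cdot \frac{1}{0.4\left(z + \frac{1}{0.4}\right)} z^{-1} \, dz$
$\sigma_{02}^2 = \sigma_e^2 \cdot \frac{1}{2\pi j} \oint_c \frac{2.5}{(z + 0.4)(z + 2.5)} \, dz$
Stable pole: $z_1 = -0.4$
Unstable pole: $z_2 = -2.5$
$\sigma_{02}^2 = \sigma_e^2 I_2$
$I_2 = \lim_{z \to -0.4} (z + 0.4) \frac{2.5}{(z + 0.4)(z + 2.5)} = \frac{2.5}{2.1} = 1.1905$
$\sigma_{02}^2 = \sigma_e^2 I_2 = 3.255 \times 10^{-4} \times 1.1905$
$\sigma_{02}^2 = 3.875 \times 10^{-4}$
$\sigma_{e0}^2 = \sigma_{01}^2 + \sigma_{02}^2 = 3.44 \times 10^{-4} + 3.875 \times 10^{-4}$
Output round off noise power:
$\sigma_{e0}^2 = 7.315 \times 10^{-4}$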
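The contour integrals can be cross-checked numerically, because by Parseval's theorem each one equals the sum of the squared impulse response of the corresponding path. The sketch below is only a check under that assumption; the helper `first_order` and the choice of 200 terms are illustrative.

```python
def first_order(pole, x):
    """Response of y(n) = x(n) + pole * y(n-1) to the sequence x."""
    y, prev = [], 0.0
    for sample in x:
        prev = sample + pole * prev
        y.append(prev)
    return y

N = 200                                   # enough terms for the sums to converge
impulse = [1.0] + [0.0] * (N - 1)

h2 = first_order(-0.4, impulse)                       # H2(z) alone
h = first_order(-0.4, first_order(0.5, impulse))      # H1(z) followed by H2(z)

I1 = sum(v * v for v in h)                # about 1.058
I2 = sum(v * v for v in h2)               # about 1.190
sigma_e2 = 2.0 ** (-2 * 4) / 12.0         # b = 4
total = sigma_e2 * (I1 + I2)

print(I1, I2, total)
```

The total comes out near $7.32 \times 10^{-4}$; the small difference from the hand calculation is due to rounding of the intermediate values.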
4.13 LIMIT CYCLE OSCILLATIONS
 In recursive systems, when the input is zero or some non zero constant value, the nonlinearities due to finite precision arithmetic operations may cause periodic oscillations in the output.
 During periodic oscillations, the output $y(n)$ of a system will oscillate between a finite positive and negative value for increasing n, or the output will become constant for increasing n. Such oscillations are called limit cycles.
 These oscillations are due to round off errors in multiplication and overflow in addition.
 There are two types of limit cycle oscillations.
 Zero input limit cycle oscillations.
 Overflow limit cycle oscillations.
ZERO INPUT LIMIT CYCLE OSCILLATIONS:

 In recursive systems, if the system output enters a limit cycle, it will continue to
remain in limit cycle even when the input is made zero. Hence these limit cycles
are also called zero input limit cycle oscillations.
 The system output remains in the limit cycle until another input of sufficient magnitude is applied to drive the system out of the limit cycle.
OVERFLOW LIMIT CYCLE OSCILLATIONS:

 In fixed point addition of two binary numbers, overflow occurs when the sum exceeds the finite word length of the register used to store the sum. Overflow in addition may lead to oscillations in the output, which are referred to as overflow limit cycle oscillations.
 The overflow occurs when the sum exceeds the dynamic range of the number system.
 The overflow oscillations can be eliminated if saturation arithmetic is performed.
 In saturation arithmetic, when an overflow is sensed, the output (sum) is set equal to the maximum allowable value, and when an underflow is sensed, the output (sum) is set equal to the minimum allowable value.
[Figure: $f(x)$ versus x, linear between -1 and 1 and held at the limiting values outside that range]
Fig: Characteristics of Saturation adder.

 The saturation arithmetic introduces nonlinearity in the adder, and the signal distortion due to this nonlinearity is small if the saturation occurs infrequently.
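The difference between an adder that overflows and a saturating adder can be sketched as below. The wrap-around model assumes two's complement fractions in the range [-1, 1), and the function names are hypothetical.

```python
def wrap(x):
    """Two's complement overflow: the sum wraps around into [-1, 1)."""
    return (x + 1.0) % 2.0 - 1.0

def saturate(x, b):
    """Saturation arithmetic: clamp the sum to [-1, 1 - 2^-b]."""
    return max(-1.0, min(x, 1.0 - 2.0 ** (-b)))

total = 0.75 + 0.5            # true sum 1.25 exceeds the range
print(wrap(total))            # -0.75: large error with the wrong sign
print(saturate(total, 3))     # 0.875: held at the maximum value
```

With wrap-around a small overflow turns into a large error of opposite sign, which can sustain oscillations in a recursive filter; saturation keeps the error bounded by the amount of the overflow.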
PROBLEM:
Explain the characteristics of a limit cycle oscillation with respect to the system described by the difference equation
$y(n) = 0.95 \, y(n-1) + x(n)$
Determine the dead band of the filter. Assume b = 4.
Dead band $= \frac{\frac{1}{2} 2^{-b}}{1 - |\alpha|}$
$\alpha = 0.95$ (from the difference equation)

Dead band $= \frac{\frac{1}{2} 2^{-4}}{1 - 0.95} = 0.625$
Dead band = 0.625
The input amplitude 0.95 converted into binary (4 bits):
0.95 x 2 = 1.9
0.9 x 2 = 1.8
0.8 x 2 = 1.6
0.6 x 2 = 1.2
$(0.95)_{10} \approx (0.1111)_2$
Converted back into decimal:
1 x 2^-1 = 0.5
1 x 2^-2 = 0.25
1 x 2^-3 = 0.125
1 x 2^-4 = 0.0625
Sum = 0.9375
$x(n) = 0.9375$ for n = 0, and $x(n) = 0$ otherwise
n = 0: $y(0) = 0.95 \, y(-1) + x(0) = 0.95 \times 0 + 0.9375$
y(0) = 0.9375
n = 1: $y(1) = 0.95 \, y(0) + x(1) = 0.95 \times 0.9375 + 0 = 0.8906$
Convert into binary:
0.8906 x 2 = 1.7812
0.7812 x 2 = 1.5624
0.5624 x 2 = 1.1248
0.1248 x 2 = 0.2496
$(0.8906)_{10} \approx (0.1110)_2$
1 x 2^-1 = 0.5
1 x 2^-2 = 0.25
1 x 2^-3 = 0.125
0 x 2^-4 = 0
Sum = 0.875
y(1) = 0.875
n = 2: $y(2) = 0.95 \, y(1) + x(2) = 0.95 \times 0.875 + 0 = 0.83125$
0.83125 x 2 = 1.6625
0.6625 x 2 = 1.325
0.325 x 2 = 0.65
0.65 x 2 = 1.3
$(0.83125)_{10} \approx (0.1101)_2$
1 x 2^-1 = 0.5
1 x 2^-2 = 0.25
0 x 2^-3 = 0
1 x 2^-4 = 0.0625
Sum = 0.8125
y(2) = 0.8125
n = 3: $y(3) = 0.95 \, y(2) + x(3) = 0.95 \times 0.8125 + 0 = 0.771875$
0.771875 x 2 = 1.54375
0.54375 x 2 = 1.0875
0.0875 x 2 = 0.175
0.175 x 2 = 0.35
$(0.771875)_{10} \approx (0.1100)_2$
1 x 2^-1 = 0.5
1 x 2^-2 = 0.25
0 x 2^-3 = 0
0 x 2^-4 = 0
Sum = 0.75
y(3) = 0.75
n = 4: $y(4) = 0.95 \, y(3) + x(4) = 0.95 \times 0.75 + 0 = 0.7125$
0.7125 converted into binary: 0.1011
0.1011 converted into decimal: 0.6875
y(4) = 0.6875
n = 5: $y(5) = 0.95 \, y(4) + x(5) = 0.95 \times 0.6875 = 0.653125$
0.653125 converted into binary: 0.1010
0.1010 converted into decimal: 0.625
y(5) = 0.625, which is equal to the dead band.
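The behaviour can be reproduced with a short simulation. The sketch below assumes that each product is rounded to 4 fractional bits with ties rounded up; exact fractions are used so that ordinary floating point error does not disturb the tie at n = 6. The function name `simulate` is hypothetical.

```python
import math
from fractions import Fraction

def simulate(alpha, y0, b, steps):
    """Zero-input response of y(n) = Q[alpha * y(n-1)], where Q rounds
    the product to b fractional bits (ties up, an assumption)."""
    scale = 2 ** b
    y = [y0]
    for _ in range(steps):
        product = alpha * y[-1]
        y.append(Fraction(math.floor(product * scale + Fraction(1, 2)), scale))
    return y

outputs = simulate(Fraction(95, 100), Fraction(15, 16), 4, 8)
print([float(v) for v in outputs])
# [0.9375, 0.875, 0.8125, 0.75, 0.6875, 0.625, 0.625, 0.625, 0.625]
```

Once the output reaches 0.625 it no longer decays: the product 0.95 x 0.625 = 0.59375 rounds back to 0.625, so the output stays inside the dead band instead of going to zero.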
SIGNAL SCALING
 The two methods of preventing overflow are saturation arithmetic and scaling the input signal to the adder.
 In saturation arithmetic, undesirable signal distortion is introduced.
 In order to limit the signal distortion due to frequent overflows, the input signal to the adder can be scaled such that overflow becomes a rare event.
Fig: Realization of Second Order IIR filter.
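The overall input output transfer function is
$H(z) = S_0 \frac{b_0 + b_1 z^{-1} + b_2 z^{-2}}{1 + a_1 z^{-1} + a_2 z^{-2}}$
$H(z) = S_0 \frac{N(z)}{D(z)}$
From the figure, the transfer function from the input to the adder output $w(n)$ is
$\frac{W(z)}{X(z)} = \frac{S_0}{1 + a_1 z^{-1} + a_2 z^{-2}} = \frac{S_0}{D(z)}$
$W(z) = \frac{S_0 X(z)}{D(z)} = S_0 S(z) X(z)$
where $S(z) = \frac{1}{D(z)}$ (A)
$w(n) = \frac{S_0}{2\pi} \int S(e^{j\theta}) X(e^{j\theta}) e^{jn\theta} \, d\theta$
$w^2(n) = \frac{S_0^2}{4\pi^2} \left| \int S(e^{j\theta}) X(e^{j\theta}) e^{jn\theta} \, d\theta \right|^2$
Using the Schwarz inequality,
$w^2(n) \le S_0^2 \left[ \frac{1}{2\pi} \int \left| S(e^{j\theta}) \right|^2 d\theta \right] \left[ \frac{1}{2\pi} \int \left| X(e^{j\theta}) \right|^2 d\theta \right]$
Apply Parseval's theorem:

$w^2(n) \le S_0^2 \sum_{n=0}^{\infty} x^2(n) \cdot \frac{1}{2\pi} \int \left| S(e^{j\theta}) \right|^2 d\theta$ (1)
$z = e^{j\theta}$
$dz = j e^{j\theta} \, d\theta$
$d\theta = \frac{dz}{j e^{j\theta}} = \frac{dz}{jz}$ (2)
Substitute (2) in (1):
$w^2(n) \le S_0^2 \sum_{n=0}^{\infty} x^2(n) \cdot \frac{1}{2\pi j} \oint_c \left| S(z) \right|^2 z^{-1} \, dz$
$= S_0^2 \sum_{n=0}^{\infty} x^2(n) \cdot \frac{1}{2\pi j} \oint_c S(z) S(z^{-1}) z^{-1} \, dz$
$w^2(n) \le \sum_{n=0}^{\infty} x^2(n)$
when
$S_0^2 \cdot \frac{1}{2\pi j} \oint_c S(z) S(z^{-1}) z^{-1} \, dz = 1$
$S_0^2 = \frac{1}{\frac{1}{2\pi j} \oint_c S(z) S(z^{-1}) z^{-1} \, dz}$
$S_0^2 = \frac{1}{\frac{1}{2\pi j} \oint_c \frac{z^{-1} \, dz}{D(z) D(z^{-1})}} = \frac{1}{I}$ (from A)
where
$I = \frac{1}{2\pi j} \oint_c \frac{z^{-1} \, dz}{D(z) D(z^{-1})}$
$S_0$ → Scaling factor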
Problem:
Given that $H(z) = \frac{0.5 + 0.4z^{-1}}{1 - 0.312z^{-1}}$ is the transfer function of a digital filter, find the scaling factor $S_0$ to avoid overflow in the adder of the digital filter.

$S_0^2 = \frac{1}{I}$; $I = \frac{1}{2\pi j} \oint_c \frac{z^{-1} \, dz}{D(z) D(z^{-1})}$
From the problem, $D(z) = 1 - 0.312z^{-1}$
$I = \frac{1}{2\pi j} \oint_c \frac{z^{-1}}{(1 - 0.312z^{-1})(1 - 0.312z)} \, dz$
$= \frac{1}{2\pi j} \oint_c \frac{z^{-1}}{z^{-1}(z - 0.312)(-0.312)\left(z - \frac{1}{0.312}\right)} \, dz$
$I = \frac{1}{2\pi j} \oint_c \frac{-3.205}{(z - 0.312)(z - 3.205)} \, dz$
Stable pole: z = 0.312

Unstable pole: z = 3.205
$I = \lim_{z \to 0.312} (z - 0.312) \frac{-3.205}{(z - 0.312)(z - 3.205)}$
I = 1.1078
$S_0 = \frac{1}{\sqrt{I}} = \frac{1}{\sqrt{1.1078}} = 0.9501$
$S_0 = 0.9501$
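The value of I can be checked without contour integration: for $D(z) = 1 - a z^{-1}$ the sequence $s(n)$ corresponding to $S(z) = 1/D(z)$ is $a^n$, and by Parseval's theorem I equals the sum of $s^2(n)$. The sketch below relies on that observation; the function name and the number of terms are assumptions.

```python
import math

def scaling_factor(a, n_terms=200):
    """Return (I, S0) for D(z) = 1 - a z^-1, with I = sum of s(n)^2
    where s(n) = a^n, and S0 = 1 / sqrt(I)."""
    I = sum((a ** n) ** 2 for n in range(n_terms))
    return I, 1.0 / math.sqrt(I)

I, s0 = scaling_factor(0.312)
print(I, s0)    # about 1.1078 and 0.9501
```

The sum converges to $1/(1 - a^2)$, so for a first order denominator the scaling factor is simply $S_0 = \sqrt{1 - a^2}$.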
