
Say we store a number x as a single precision floating point number; it will look something like this:

```
bit 31 (sign) | bits 30 to 23 (biased exponent) | bits 22 to 0 (mantissa)
     sign     | e7 e6 e5 e4 e3 e2 e1 e0         | m22 m21 ... m2 m1 m0
```

That is, the first 23 bits, from bit 0 to bit 22, are reserved for the mantissa, which represents the binary rational number $0.m$ such that

$$0.m = \sum_{i=0}^{22} m_i \times \left(\frac{1}{2}\right)^{23-i} = \frac{m_{22}}{2} + \frac{m_{21}}{4} + \frac{m_{20}}{8} + \cdots + \frac{m_1}{2^{22}} + \frac{m_0}{2^{23}}$$

where each of the $m_i$ can be either the binary bit 0 or 1. You can easily show that $\sum_{i=1}^{\infty} \left(\frac{1}{2}\right)^i = 1$, which means that $0.m$ must be strictly less than 1.

Furthermore, the next 8 bits, bits 23 to 30, are reserved for the biased exponent, which represents the binary integer $e + 127$ such that

$$e + 127 = \sum_{i=0}^{7} e_i \times 2^i = e_0 + 2e_1 + 4e_2 + \cdots + 2^6 e_6 + 2^7 e_7$$

where each of the $e_i$ is also either 0 or 1. Now, the decimal x is represented as

$$x = (1 + 0.m_x) \times 2^{(e_x + 127) - 127}$$

Let's look at an example before moving on. Say we are given the following floating point number (call its value "?"), and we are asked to find its decimal equivalent:

```
bit 31 (sign) | bits 30 to 23 (biased exponent) | bits 22 to 0 (mantissa)
      0       | 01111111                        | 10011001100110011001100
```

We start by finding out what $e_? + 127$ is. From the above formula, we have

$$e_? + 127 = \sum_{i=0}^{7} e_i \times 2^i = 1 + 2 + 4 + 8 + 16 + 32 + 64 + 0 = 127$$

Note that $e_0$ in the above diagram is all the way at the right of the exponent field while $e_7$ is at its leftmost. Similarly, we need to figure out what $0.m_?$ is. We start off with the formula given above:

$$0.m_? = \sum_{i=0}^{22} m_i \times \left(\frac{1}{2}\right)^{23-i} = \frac{m_{22}}{2} + \frac{m_{21}}{4} + \frac{m_{20}}{8} + \cdots + \frac{m_1}{2^{22}} + \frac{m_0}{2^{23}}$$
$$= \frac{1}{2} + \frac{0}{4} + \frac{0}{8} + \frac{1}{16} + \frac{1}{32} + \cdots + \frac{0}{2^{22}} + \frac{0}{2^{23}} \approx 0.5999999046325684$$

So then this means that

$$? = (1 + 0.m_?) \times 2^{(e_? + 127) - 127} = (1 + 0.5999999046325684) \times 2^{127 - 127} = 1.5999999046325684$$

**How to square root**

Taking the above definition of a floating point number, let's see how the square root affects its components:

$$y = \sqrt{x} = \sqrt{(1 + 0.m_x) \times 2^{(e_x + 127) - 127}}$$

Now, y itself can be stored as a floating point number:

$$y = (1 + 0.m_y) \times 2^{(e_y + 127) - 127} = \sqrt{(1 + 0.m_x) \times 2^{(e_x + 127) - 127}} = \sqrt{1 + 0.m_x} \times \sqrt{2^{(e_x + 127) - 127}} = \sqrt{1 + 0.m_x} \times 2^{\frac{(e_x + 127) - 127}{2}}$$

If you've taken a physics course on waves, you'll probably remember having to use the following approximation:

$$\sqrt{1 + x} \approx 1 + \frac{x}{2}$$

This is taken from the first-order Taylor expansion of the square root function. Let $f(x) = \sqrt{x}$; then

$$f(1 + \epsilon) = f(1) + \epsilon f'(1) + \frac{\epsilon^2}{2} f''(1) + O(\epsilon^3) = \sqrt{1} + \frac{\epsilon}{2\sqrt{1}} - \frac{\epsilon^2}{8\sqrt{1}^3} + O(\epsilon^3) = 1 + \frac{\epsilon}{2} - \frac{\epsilon^2}{8} + O(\epsilon^3)$$

So we can simplify y further:

$$y = (1 + 0.m_y) \times 2^{(e_y + 127) - 127} = \sqrt{1 + 0.m_x} \times 2^{\frac{(e_x + 127) - 127}{2}} \approx \left(1 + \frac{0.m_x}{2}\right) \times 2^{\frac{(e_x + 127) - 127}{2}}$$

By matching up the e's and m's, the above suggests that for $y = \sqrt{x}$,

$$0.m_y \approx \frac{0.m_x}{2}$$

and

$$(e_y + 127) - 127 = \frac{(e_x + 127) - 127}{2} \iff (e_y + 127) = \frac{(e_x + 127) - 127}{2} + 127$$

That is, the mantissa bits (bits 0 to 22) of y should be approximately half of the mantissa of x, and the "biased exponent" of y, $(e_y + 127)$, should be about half of (the biased exponent of x minus 127), with 127 added back on. This is what we will attempt to write.

**Onto the code**

In this section, we'll try to fit the above logic into C from scratch, then figure out how to optimize it. We can't access the bits of a floating point number conveniently in C, so we need to find a way to convert it into an underlying type where it is easy to work with the bits. You can't just cast it to an int, as the compiler will simply truncate the number to an integer and discard all of the underlying information about how the number was originally represented. Instead, we can put it in memory, and then dereference that location as an integer pointer to get the actual bits representing the floating point number.

```c
float x;
int i = *(int *)&x; // this gives you the bit array representing x
```

**Alternatively, you can also use the following:**

```c
float x;
union {
    float x;
    int i;
} u;
u.x = x;
int i = u.i; // this gives you the bit array representing x
```

On gcc 4.7 with -O3, both compile to a single mov instruction from an xmm register to a general-purpose register (that is, from a floating point register to an integer register). Once we've figured out how to get the bit array of a floating point number into a variable i, we need to be able to isolate the "biased exponent" bits (23 to 30) and the mantissa bits (0 to 22). We can do this by creating a bit mask and bitwise ANDing the bit array with the mask to extract the required parts.

```c
// the exponent is between bits 23 and 30
//            10987654321098765432109876543210
#define EXP  0b01111111100000000000000000000000
#define MAN  0b00000000011111111111111111111111
#define BIAS (127 << 23) // the bias of 127, already shifted into the exponent's position

int m_x = i & MAN;
int e_x = (i & EXP) - BIAS; // this is the unbiased exponent (still shifted left by 23 bits)
```

**Next, we find an approximation for y, as derived above:**

```c
// m_y = m_x / 2
int m_y = m_x / 2;
// e_y = e_x / 2, where e_y and e_x are the unbiased exponents;
// be careful: if e_x is odd, e_x / 2 will get truncated
int e_y = e_x / 2;
// however, we need to turn the unbiased exponent into the biased one;
// we do this by adding the bias
e_y += BIAS;
```

Next, we need to concatenate the exponent and the mantissa together. We can do this by ORing the two together, which, since the two bit fields don't overlap, is the same as adding them:

```c
int j = (e_y & EXP) + (m_y & MAN);
float y = *(float *)&j;
```

Finally, if the unbiased exponent of x is odd, then the division of the exponent in the previous piece of code truncates it, losing a factor of $2^{1/2}$, which is just $\sqrt{2}$. Hence, we need to test whether $e_x$ is odd, and if it is, multiply y by $\sqrt{2}$:

```c
#define SQRT2 1.4142135623730951
if (e_x & (1 << 23)) {
    y *= SQRT2;
}
return y;
```

However, we can rewrite this more succinctly. Convince yourself that the above computation of j can be written as:

```c
int j = (((i / 2 - BIAS / 2) + BIAS) & EXP) + ((i & MAN) / 2);
```

**Crude error compensation**

As we've seen previously, the first-order Taylor approximation of $\sqrt{1 + 0.m_x}$ has an error that can be approximated by $\frac{\epsilon^2}{8}$. Now, there's no easy way of calculating $\epsilon^2$ in integer arithmetic, so what we do instead is replace $\epsilon^2$ by a bound on its average. Since the distribution of the digits follows Benford's law, we can assume that the expected value of $0.m$ is a little bit above 0.59. Around this neighborhood, $0.m^2$ ranges from roughly $\frac{0.m}{2}$ to $0.m$, so we compromise and say that $\frac{\epsilon^2}{8} \approx \frac{\epsilon}{10}$, which means that $1 + \frac{\epsilon}{2} - \frac{\epsilon^2}{8} \approx 1 + \frac{2\epsilon}{5}$. This is how we got the expression:

```c
int j = (((i / 2 - BIAS / 2) + BIAS) & EXP) + ((i & MAN) * 2 / 5);
```


Method used to find the square root first guess by exploiting IEEE properties
