
What Every JavaScript Developer Should Know

About Floating Point Numbers


 Posted on February 24, 2014

After I gave my talk on JavaScript (https://blog.chewxy.com/2014/01/27/JavaScript-wat-again/) (really, I was there trying to shamelessly plug my book, Underhanded JavaScript (https://leanpub.com/underhandedJavaScript), and its alternate title, JavaScript Technical Interview Questions (https://leanpub.com/jsinterviewquestions)), there was a Q&A session. I could answer most questions, but Khalid Hilaby (https://twitter.com/hilaby) asked me a very interesting and quite general question on JavaScript number types. He simply wanted to know more about floats in JavaScript and why they act so strangely. While I could answer the question, I felt I didn't answer it well enough. I loaded my article on Pointer Tagging in Go to explain the structure of a floating point number, explained a bit about floating point arithmetic and how in the past machines had to have special CPUs for floating points (FPUs), and then sort of meandered from there.

Now that I am back in Sydney and well rested, I thought I'd give the question a second try. The result is the article What Every JavaScript Developer Should Know About Floating Points (http://flippinawesome.org/2014/02/17/what-every-JavaScript-developer-should-know-about-floating-points/) on Flippin' Awesome. This is the full unedited version, before I edited it down for length and appropriateness for Flippin' Awesome.

This article assumes the reader is familiar with base-2 representations of base-10 numbers (i.e. 1 is 1b, 2 is 10b, 3 is 11b, 4 is 100b… etc). In this article, the word "decimal" mostly refers to the decimal representation of numbers (for example: 2.718). The word "binary" refers to a machine representation. Written representations will be referred to as "base-10" and "base-2".

Floating Points
To figure out what a floating point is, we first start with the idea that there are many kinds of numbers, which we will go through. We call 1 an integer - it is a whole number with no fractional values in it.

½ is what's called a fraction. It implies that the whole number 1 is being divided into 2. The concept of fractions is a very important one in deriving floating points.

0.5 is commonly known as a decimal number. However, a very important distinction needs to be made - 0.5 is actually the decimal (base-10) representation of the fraction ½. This is how ½ is represented when written as a base-10 number - call it the positional notation. We call 0.5 a finite representation because the digits in the representation of the fraction are finite - there are no more digits after the 5 in 0.5. An infinite representation would, for example, be 0.3333… when representing ⅓. Again, this idea is an important one later on.

There exists, too, another way of representing numbers other than as whole numbers, fractions or decimal notations. You might have actually seen it before. It looks something like this: 6.022 × 10^23. It's commonly known as the standard form, or scientific notation. That form can be generalized to something that looks like this:

D1.D2D3D4…Dp × B^E

The general form is called a floating point.

The sequence of p digits D1.D2D3D4…Dp is called the significand, or mantissa. p is the number of significant digits, commonly called the precision. In the case of Avogadro's number above, p is 4. The × follows the mantissa (and is part of the notation; × will be used as the multiplication symbol throughout this article). The base digit comes after, followed by the exponent. The exponent can be a positive or negative number.

The beauty of the floating point is that it can be used to represent ANY number at all. For example, the integer 1 can be represented as 1.0 × 10^0. The speed of light can be represented as 2.99792458 × 10^8 metres per second. ½ can be represented in base-2 as 0.1 × 2^0.
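JavaScript can render any of its numbers in exactly this notation via the built-in Number.prototype.toExponential(); a quick sketch:

```javascript
// Number.prototype.toExponential() renders a number in its
// floating point (scientific notation) form.
const avogadro = 6.022e23;                 // 6.022 x 10^23, written directly
console.log(avogadro.toExponential());     // "6.022e+23"
console.log((299792458).toExponential(8)); // "2.99792458e+8" - speed of light
console.log((1).toExponential());          // "1e+0" - the integer 1 as 1.0 x 10^0
```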

The Radix Point

If the last example above seemed a little strange, it's because we don't normally see a representation of fractions in base-2. In case you were wondering how to represent fractions in binary with a radix point, I'm going to show you how.

But first, let's have a look at the decimal representation. Why is ½ 0.5? If you're like me, you learned in school how to do long division. It was also the way we were shown why ½ is 0.5 - you simply divide 1 by 2:

    0.5
   ----
2 ) 1.0
    1 0
    ---
      0

There is another way to look at fractions - look at them in terms of the number base and exponent. ½ can be expressed as a fraction with 10^1 as the denominator: 5⁄10. In fact, that is the rule when it comes to determining whether a fraction can be finitely represented with a radix point - if it can be expressed as a fraction with a power of the base as the denominator, it can be finitely expressed with the radix point notation.
The idea behind the positional notation is a simple one. Let's look at an example. Consider the number 19.95 (the price I'm considering for my books - Underhanded JavaScript (https://leanpub.com/underhandedJavaScript) and JavaScript Technical Interview Questions (https://leanpub.com/jsinterviewquestions)). It can be broken down into the positions as follows:

  1      9    .    9      5
10^1   10^0   .  10^-1  10^-2

This says that there is 1 unit in the 10 position, 9 units in the 1 position, 9 units in the 0.1 position and 5 units in the 0.01 position. This concept can likewise be extended to base-2 numbers. Instead of powers of 10, the positional notation for base-2 numbers has powers of 2 as the positions. This is why 10 in base-2 is 2, and why 100 in base-2 is 4.
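You can check these positional values in JavaScript itself - parseInt takes the base (radix) as its second argument:

```javascript
// parseInt with radix 2 interprets the string as a base-2 numeral.
console.log(parseInt('10', 2));  // 2 - one unit in the 2^1 position
console.log(parseInt('100', 2)); // 4 - one unit in the 2^2 position
console.log(parseInt('11', 2));  // 3 - one unit each in the 2^1 and 2^0 positions
```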

To determine if a number can be finitely expressed in base-2, the same method as above applies - check to see if the fraction can be expressed with a denominator that is a power of 2. Let's take a simple example: 0.75. 0.75 can be expressed as 3⁄4, of which 4 is 100 in base-2. So it can be written as 11⁄100. We know then that this can be finitely expressed as 0.11. Doing long division with base-2 numbers yields the same result.
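The built-in Number.prototype.toString(radix) confirms this - fractions with power-of-2 denominators get finite base-2 expansions:

```javascript
console.log((0.75).toString(2));  // "0.11"  - 3/4 terminates in base-2
console.log((0.5).toString(2));   // "0.1"   - 1/2 terminates
console.log((0.625).toString(2)); // "0.101" - 5/8 terminates too
```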

There is also a short-cut method to convert from decimal to base-2 radix point representation, which I use for quick mental estimation:

1. Take the non-integral part of the decimal and multiply it by 2: 0.75 × 2 = 1.50.
2. Reserve the integral part of the result - 1. The base-2 radix point representation now reads 0.1.
3. Take the non-integral part of the result and multiply it by 2: 0.5 × 2 = 1.00.
4. Repeat steps 2 and 3 until finished: the radix point now reads 0.11.
5. Replace any integral part of the original decimal with its base-2 equivalent.

Now, try it for yourself with either method on the fraction 1⁄10. Interesting result, isn't it? This will be important later on.
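The multiply-by-2 steps above can be sketched as a small function (fracToBinary is a name of my own making; the bits parameter caps the output for fractions with infinite expansions):

```javascript
// A sketch of the multiply-by-2 method for the fractional part of a decimal.
function fracToBinary(frac, bits = 16) {
  let out = '0.';
  for (let i = 0; i < bits && frac > 0; i++) {
    frac *= 2;                       // step 1/3: multiply the fraction by 2
    const intPart = Math.floor(frac);
    out += intPart;                  // step 2: reserve the integral part
    frac -= intPart;                 // keep only the non-integral part
  }
  return out;
}

console.log(fracToBinary(0.75)); // "0.11" - terminates after two steps
console.log(fracToBinary(0.1));  // "0.0001100110011001" - 0011 repeats forever
```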

Removing the Radix Point

In the above examples, we're still quite tied to having a radix point (the dot in the number). This presents some problems when it comes to representing something in binary. Given an arbitrary number, say π, we can represent it as a floating point as such: 3.14159 × 10^0. In a base-2 representation, it would look something like this: 11.00100100001111…. Assuming that the number is represented in a 16-bit manner, the digits would be laid out in the machine like this: 1100100100001111. The question now is this: where is the radix point supposed to be? This doesn't even yet involve the exponent (we implicitly assume the base is base-2).

What if the number was 5.14159? The integral part would be 101 instead of 11, requiring one more bit in the field. Of course, we could specify that the first n bits of the field belong to the integer part (i.e. the left of the radix point) and the rest belongs to the fractional part, but that's the topic for another article about fixed point numbers.

Once we remove the radix point, we only have two things to keep track of: the exponent and the mantissa. We can remove the radix point by applying a transformation formula, making the generalized floating point look like this:

D1D2D3D4…Dp ⁄ B^(p-1) × B^E

This is where we derive most of our binary floating points from. Note that the significand is now an integer. This makes it far simpler to store a floating point number in a machine. In fact, the most widely used method of representing floating points in binary is the IEEE 754 format.

IEEE 754
The representation of floating points in JavaScript follows the format specified in IEEE 754 (http://en.wikipedia.org/wiki/IEEE_floating_point). Specifically, it is the double-precision format, meaning that 64 bits are allocated for each floating point. Although it is not the only way to represent floating points in binary, it is by far the most widely used format. The format is laid out in 64 bits of binary like so:

seeeeeee eeeeffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff

(s: 1 sign bit, e: 11 exponent bits, f: 52 mantissa bits)

Of the 64 bits available, 1 bit is used for the sign - whether a number is positive or not. 11 bits are used for the exponent - this allows for exponents up to 1023. The reason for this is that the exponent actually uses something called offset binary encoding to encode negative numbers. What this basically means is that if all 11 bit fields are set to 0 (the decimal equivalent is 0), the exponent is actually -1023 in decimal. When all 11 bit fields are set to 1 (the decimal equivalent is 2047), the exponent is actually 1024 in decimal. The exponent of 2047 is reserved for special numbers, as described below.
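You can watch the offset encoding in action by pulling the 11 raw exponent bits out of a number with a DataView (exponentBits is a name of my own making):

```javascript
// Extract the 11 raw (biased) exponent bits of a JS number.
function exponentBits(x) {
  const view = new DataView(new ArrayBuffer(8));
  view.setFloat64(0, x);                    // big-endian: sign and exponent come first
  return (view.getUint16(0) >> 4) & 0x7ff;  // drop 4 mantissa bits, mask off the sign
}

console.log(exponentBits(1));        // 1023 - true exponent 0, offset by 1023
console.log(exponentBits(0.75));     // 1022 - true exponent -1
console.log(exponentBits(Infinity)); // 2047 - the reserved all-ones pattern
console.log(exponentBits(0));        // 0    - the all-zeros pattern
```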

The remaining 52 bits are allocated for the mantissa. Even that is interesting. Look through a list of scientific constants - they're all written in scientific notation. Notice that to the left of the radix point there is usually only one non-zero digit. This is called the normalized form. Likewise with floating points, there is a concept of a normalized form - in fact, floating points are stored in normalized form in binary according to the IEEE 754 standard. However, there is an interesting feature when storing the normalized form.

Let us consider the fraction 3⁄4. In base-2, it is written 0.11. This is not the normalized form. The normalized form is written 1.1 × 2^-1 - recall that the integral part of the positional notation cannot be 0 in the normalized form. It is the normalized form that is stored according to the specification.

Because in base-2 digits can only be 0 or 1, the normalized form of the floating point always has the form 1.xxxx × 2^E. This is a convenient feature - you don't need to store the first digit, since it's implied to always be 1. This gives one whole extra bit of precision. So the mantissa always stores the bits beginning after the radix point. In the case of 3⁄4, the mantissa is
1000000000000000000000000000000000000000000000000000. Laid out in memory, this is what 3⁄4 looks like:

00111111 11101000 00000000 00000000 00000000 00000000 00000000 00000000
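You can reproduce this memory layout yourself; a sketch using a DataView (doubleToBits is a name of my own making):

```javascript
// Render the 64 IEEE 754 bits of a JS number as a string of 0s and 1s.
function doubleToBits(x) {
  const view = new DataView(new ArrayBuffer(8));
  view.setFloat64(0, x); // big-endian by default
  let bits = '';
  for (let i = 0; i < 8; i++) {
    bits += view.getUint8(i).toString(2).padStart(8, '0');
  }
  return bits;
}

console.log(doubleToBits(0.75));
// 0011111111101000 followed by 48 zeros:
// sign 0 | exponent 01111111110 (1022, i.e. -1 after the bias) | mantissa 1000...0
```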

The specification also allows for special numbers. Both infinity and NaN, for example, are encoded with 2047 in the exponent: the mantissa ranges from 1 (only the last mantissa bit is 1) to 4503599627370495 (all the mantissa bits are 1) for NaNs, while the mantissa is 0 for infinity. When the exponent is 2047, the particular value in the mantissa field no longer contributes to the number's value. Since pointers on today's 64-bit platforms only use 48 bits, this allows for some really cool hacking - such as storing pointers inside NaNs.

This floating point format also explains why in JavaScript there exist +0 and -0 as well as +Infinity and -Infinity - the sign bit in front denotes that. The IEEE 754 specification also specifies that NaN will always compare unordered to any operand, even itself, which is why in JavaScript NaN === NaN yields false.
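A few one-liners show these special values behaving exactly as the specification demands:

```javascript
console.log(NaN === NaN);       // false - NaN is unordered, even against itself
console.log(Number.isNaN(NaN)); // true  - the reliable way to test for NaN
console.log(Object.is(+0, -0)); // false - the two zeros are distinct bit patterns
console.log(1 / -0);            // -Infinity - the sign of zero survives division
```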

If you ever want to look at how numbers are encoded in JavaScript, the IEEE 754 Decimal Converter (http://www.h-schmidt.net/FloatConverter/) is a good site to check out.

Rounding Errors
With the introduction to floating points done, we now enter a more prickly topic - rounding errors. They are the bane of all developers who work with floating point numbers, JavaScript developers doubly so, because the only number format available to JavaScript developers is the floating point.

It was mentioned earlier that fractions like ⅓ cannot be finitely represented in base-10. This is actually true in any base - every base has fractions it cannot finitely represent. For example, in base-2, 1⁄10 cannot be finitely represented. It is represented as 0.000110011001100110011…. Note that 0011 repeats infinitely. This particular quirk is what causes rounding errors.
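Ask JavaScript directly and you can see both the repeating 0011 pattern and where the 64-bit double cuts it off:

```javascript
// toString(2) prints exactly the bits the double stores: the infinite
// 0011 pattern, truncated (and rounded) at 53 significant bits.
console.log((0.1).toString(2));
// 0.000110011001100110011... with the final stored bit rounded up
```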

But first, a primer on rounding errors. Consider one of the most famous irrational numbers, Pi: 3.141592653589793…. Most people remember the first 5 digits (3.1415) really well - that's an example of rounding down, which we will use for this example. The rounding error is simply the difference between the rounded number and the actual one:

|R − A|

where R stands for the rounded number and A stands for the actual number. When truncating to p significant digits in base B, this error is at most one unit in the last place, B^(1-p) - here 10^(1-5) = 0.0001. So the oft-remembered Pi has a rounding error of 0.00009265….
While this does not sound quite so severe, let's try this idea with base-2 numbers. Consider the fraction 1⁄10. In base-10, it's written as 0.1. In base-2, it is 0.000110011001100110011…. Assuming we round to just 5 mantissa digits, it'd be written as 0.0001. But 0.0001 in binary is actually 1⁄16 (or 0.0625)! That's a rounding error of 0.0375, which is rather large. Imagine doing basic mathematics like 0.1 + 0.2 and getting back 0.2625!

Fortunately, the floating point specification that ECMAScript uses specifies 52 mantissa bits (making it 53 bits of information with the clever hack above), so the rounding errors are quite small. In fact, the specification goes into the details of the errors, using a fascinating metric called the ulp (units in the last place) to define the precision of a floating point. Because conducting arithmetic operations on floating points causes errors to build up over time, the IEEE 754 specification also comes with specific algorithms for mathematical operations.

However, it should be noted that despite all that, the associative property of binary operations (like addition, subtraction, multiplication and division) is not guaranteed when dealing with floating points, even high-precision ones. What I mean by that is that ((x + y) + a + b) is not necessarily equal to ((x + y) + (a + b)).
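A concrete demonstration, using JavaScript's own numbers:

```javascript
// Same operands, different grouping, different answers.
const left  = (0.1 + 0.2) + 0.3; // 0.1 + 0.2 rounds to 0.30000000000000004 first
const right = 0.1 + (0.2 + 0.3); // 0.2 + 0.3 happens to round to exactly 0.5
console.log(left);               // 0.6000000000000001
console.log(right);              // 0.6
console.log(left === right);     // false
```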

And that is the cause of the bane of JavaScript developers. For example, in JavaScript, 0.1 + 0.2 === 0.3 yields false. Hopefully, by now, you know why. What is worse, of course, is that rounding errors add up with each successive mathematical operation performed.
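The standard workaround is to compare within a tolerance rather than with ===. A common sketch (nearlyEqual is a name of my own making) uses Number.EPSILON, the gap between 1 and the next representable double:

```javascript
console.log(0.1 + 0.2);         // 0.30000000000000004
console.log(0.1 + 0.2 === 0.3); // false

// Compare two floats within a relative tolerance instead of exactly.
function nearlyEqual(a, b, eps = Number.EPSILON) {
  return Math.abs(a - b) <= eps * Math.max(Math.abs(a), Math.abs(b), 1);
}

console.log(nearlyEqual(0.1 + 0.2, 0.3)); // true
```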

Handling Floating Points in JavaScript

I have one suggestion as to how to handle floating points in JavaScript: don't. But of course, given that JavaScript is such a shit language and only has one numerical type, that is unavoidable. There have been plenty of suggestions, both good and bad, when it comes to dealing with JavaScript numbers. Most of these suggestions have to do with rounding numbers in JavaScript before or after binary operations.

The worst advice I've heard so far is to "expect floating point rounding errors, and duct tape around it". The advice then follows on to say: if you expect 0.1 to be 0.10000000000000001, then work as if you're working with 0.10000000000000001 all the time. I mean, wtf is with that kind of ridiculous advice? Sorry, but that's plain dumb.

Another suggestion - one that isn't actually too bad on the surface but shows all sorts of problems once you've given it some thought - is storing everything as an integer number (not the type) for operations, and then formatting it for display. An example can be seen in how Stripe works - amounts are stored in cents. This has a notable problem: not all currencies in the world are actually decimal (Mauritania's, for one). There also exist currencies with no subunits (the Japanese yen), non-100 subunits (the Jordanian dinar, which divides into 1000 fils), or more than one subunit (the Chinese renminbi). Eventually, you'd just end up recreating the floating point. Probably poorly, too.

The best suggestion I've seen for handling floating points is to use properly tested libraries like sinful.js (https://github.com/guipn/sinful.js) or mathjs (http://mathjs.org). I personally prefer mathjs (but really, for anything mathematics related I wouldn't even go near JavaScript). BigDecimal (https://github.com/dtrebbien/BigDecimal.js) is also extremely useful when arbitrary precision math needs to be done.

Another oft-repeated piece of advice is to use the built-in toPrecision() and toFixed() methods on numbers. A big warning to anyone thinking of using them: those methods return strings. So if you have something like:

function foo(x, y) {
    return x.toPrecision() + y.toPrecision();
}

> foo(0.1, 0.2)
"0.10.2"
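Note the concatenation: toPrecision() returned strings, and + joined them. If you do need a number back after formatting, convert explicitly and keep the formatting at the display boundary; a sketch:

```javascript
const sum = 0.1 + 0.2;               // 0.30000000000000004
console.log(sum.toFixed(2));         // "0.30" - a string, fine for display
console.log(Number(sum.toFixed(2))); // 0.3 - re-parsed back into a number
// But beware: re-parsing rounds again, so only do this at the display boundary.
```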

The built-in methods toPrecision() and toFixed() are really only for display purposes. Use them with caution! Now go forth and multiply (safely)!

Conclusion
JavaScript numbers are really just floating points as specified by IEEE 754. Due to inadequacies when representing numbers in base-2, as well as a finite machine, we are left with a format that is filled with rounding errors. This article explained those rounding errors and why they occur. Always use a good library for numbers instead of building your own. If you are interested in reading more about floating points, I highly recommend the fairly awesome Handbook of Floating Point Arithmetic (http://www.amazon.com/gp/product/081764704X). It's a bit pricey and a bit difficult to read, but if you take your time with it, it will come through.

Another book that is a good but very difficult read is Modern Computer Arithmetic (http://www.amazon.com/gp/product/0521194695). I haven't finished it - I mainly skipped the proofs and used it for reference.

If you're interested in comparing floats, the seminal paper is one written by Bruce Dawson (http://www.cygnus-software.com/papers/comparingfloats/comparingfloats.htm). It's a must-read if you're serious about comparing floating points (but really, don't do it. I've done it and it was terrible). In fact, Bruce Dawson's body of work is quite a good read - his posts are scattered everywhere on the Internet though, so you will have to go find them yourself.

Incidentally, this topic on floating points is also covered in my books (Underhanded JavaScript (https://leanpub.com/underhandedJavaScript) and its alternate title, JavaScript Technical Interview Questions (https://leanpub.com/jsinterviewquestions)) - with a little more detail on the representation of numbers and the like. I hadn't originally wanted to address floating points, but on the flight back I wrote a long chapter on it while figuring out how to write this article. So if you liked this article, do buy the damn book :P


Comments

Zach B • 3 years ago


Good explanation of floating-point math, though you make it sound like javascript is unique in
having floating-point precision errors. Any language implementing IEEE754 has the same
behavior. A slightly stronger point is that JS doesn't have any fixed-point types that would
allow for easily avoiding these errors -- although very few languages have built-in fixed-point
support, so just like in JS you have to use a library.

I'm curious why you say "for anything mathematics related I wouldn’t even go near
JavaScript." Floating-point is so common today because it's sufficient in the vast majority of
cases (and it's easy to work with). The notable exception is currencies/banking. Likewise,
developers also need to understand fixed-point types -- they're subject to precision loss and
overflow.

PS: 1/10 is 0.00011... -- you're missing a 0 before the first 1 the first time you show it.

The Doctor > Zach B • 3 years ago


Fix'd :) Thanks.

Jamie • 3 years ago


I was trying to understand the toString method in JS for binary conversion:
http://www.w3schools.com/js...

Can you guide me on how they are coming up with these values? What is the radix doing here? Please explain the calculation.

function myFunction() {
    var num = 15;
    var a = num.toString();
    var b = num.toString(2);
    var c = num.toString(8);
    var d = num.toString(16);

    var n = a + "<br>" + b + "<br>" + c + "<br>" + d;

    document.getElementById("demo").innerHTML = n;
}

Output:

15
1111
17
f

The Doctor > Jamie • 3 years ago

The parameter in the toString() method is optional. But if you DO pass in a parameter to the method, it is treated as a radix. In the case above, the parameter 2 indicates that the string will be the string representation of the number in base 2 (binary). Likewise, 8 indicates that the number will be a string representation of 15 in base 8 (which is octal), and so on and so forth.

liveFor10 • 2 years ago

Wow, just thank you. I read your entire article and understood a little bit of it, maybe. I liked how you explained NaN and therefore how/why isNaN works. I liked how you just outright said there are only floats in JavaScript. I haven't gotten into it far enough or thought about it long enough to have known that was true for myself. I especially liked how you combined the exponent form with the "binary registers" we all used in our intro computer science classes back in college. And the super extra awesome bonus for me was that you talked about pi. I am using something called the dailyProgrammer (on reddit) for assignments/challenges to learn JavaScript. #6 was to calculate pi to 30 decimal points. So between that assignment and your article and my other googling I have learned a lot about decimals. And that the limitation of my pi calculation is JavaScript, not me! I used the Nilakantha Series method if that means anything to you. "npm install BigDecimal" returns no errors/seems to complete successfully. Btw, will that get me out to 30 places or just prevent the weird 0.1 + 0.2 = 0.30000000000000004 stuff? Is BigDecimal still the best bet?


Chewxy  •  2018  •  Bigger on the Inside (https://blog.chewxy.com)
