Do you know how computers deal with floating-point numbers?

Roshni Silva
Dec 8, 2021

Have you ever thought about how computers deal with floating-point numbers? Whether you are a software engineer, a lead software engineer, or a programmer, a good understanding of the fundamentals is the base for building a good career. There are edge cases where a computer gives unexpected results even for small calculations, and floating-point numbers are a common cause of them. That's why it's important to know how computers handle these numbers.

Computers represent floating-point numbers using a standard called IEEE 754. It takes a floating-point number and divides it into 3 components:

  1. Sign
  2. Exponent
  3. Mantissa
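To make the three components concrete, here is a minimal Python sketch (not from the original article) that uses the standard `struct` module to reinterpret a single-precision float as a 32-bit integer and slice out its sign, exponent, and mantissa bits. The helper name `split_float32` is hypothetical.

```python
import struct

def split_float32(x: float) -> tuple[str, str, str]:
    """Hypothetical helper: sign, exponent and mantissa bits of x stored as a float32."""
    # '>f' packs x as a big-endian 32-bit float; '>I' reads the same 4 bytes as an unsigned int.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    pattern = format(bits, "032b")                 # zero-padded 32-character bit string
    return pattern[0], pattern[1:9], pattern[9:]   # 1 sign + 8 exponent + 23 mantissa bits

sign, exponent, mantissa = split_float32(-9.1)
print(sign, exponent, mantissa)  # 1 10000010 00100011001100110011010
```

Using -9.1 shows the sign bit set to 1; the exponent and mantissa are the same as for 9.1, which the walkthrough below derives by hand.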

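According to this standard, floating-point numbers can be stored in single precision (32 bits: 1 sign bit, 8 exponent bits, 23 mantissa bits), double precision (64 bits: 1 sign bit, 11 exponent bits, 52 mantissa bits), or, on some platforms, an extended precision that C and C++ expose as long double.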

Representation of Single, Double and Long Double
IEEE 754 Representation for Single Precision
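To see single and double precision side by side, the sketch below packs the same number into both formats with Python's standard `struct` module ('f' is 4 bytes, 'd' is 8 bytes). It assumes CPython, whose `float` is an IEEE 754 double; the standard library has no long double format, so that case is left out.

```python
import struct

value = 9.1
single = struct.pack(">f", value)  # 32 bits: 1 sign + 8 exponent + 23 mantissa
double = struct.pack(">d", value)  # 64 bits: 1 sign + 11 exponent + 52 mantissa

print(len(single) * 8, single.hex())  # 32 4111999a
print(len(double) * 8, double.hex())  # 64 4022333333333333

# Reading the single-precision bytes back shows how much precision was lost.
print(struct.unpack(">f", single)[0])  # 9.100000381469727
```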

Let's take 9.1, convert it into the IEEE 754 single-precision format, and see what value the computer actually stores for 9.1. To convert a decimal value into IEEE 754 format, the computer follows these steps:

  1. Convert the given floating-point number into a binary representation.
  2. Convert the binary representation into scientific notation.
  3. Write the scientific notation in an IEEE 754 format.

Let's walk through them one by one.

Convert the given floating-point number into a binary representation.

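As the image above shows, we divide 9.1 into 2 parts: the whole part (9) and the fractional part (0.1). Converting the whole part 9 into binary gives 1001. Converting 0.1 into binary (by repeatedly multiplying by 2 and taking the digit that crosses the point) gives the digits 00011001100110011… after the point, where the pattern 0011 repeats forever. So 0.1 has no exact binary representation.

Finally, we have 1001.00011001100110011… as our binary representation of 9.1.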
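The same two-part conversion can be sketched in Python. The helper `to_binary` is hypothetical: it converts the whole part with `bin()` and the fractional part by repeated doubling. To avoid the very rounding error this article is about, it works on exact `fractions.Fraction` values, and it stops after a fixed number of fractional bits because 0.1 never terminates in binary.

```python
import math
from fractions import Fraction

def to_binary(value: Fraction, frac_bits: int = 20) -> str:
    """Hypothetical helper: binary expansion of a non-negative value, cut after frac_bits bits."""
    whole = math.floor(value)       # whole part, e.g. 9
    frac = value - whole            # fractional part, e.g. 1/10
    digits = []
    for _ in range(frac_bits):
        frac *= 2                   # shift one binary place to the left
        bit = math.floor(frac)      # the digit that crossed the binary point (0 or 1)
        digits.append(str(bit))
        frac -= bit
    return bin(whole)[2:] + "." + "".join(digits)

print(to_binary(Fraction("9.1")))  # 1001.00011001100110011001
```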

Convert the binary representation into scientific notation.

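Once we have the binary representation, we can convert it into scientific notation by moving the binary point until only the leading 1 is to its left. For 9.1 the point moves three places to the left, giving 1.00100011001100110011… × 2³. Have a look below.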

Scientific notation of 9.1
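A minimal sketch of this normalization step, assuming the input is a binary string such as 1001.0001… from the first step and that the value is at least 1 (numbers below 1 would need the point moved to the right instead). `normalize` is a hypothetical helper name.

```python
def normalize(binary: str) -> tuple[str, int]:
    """Hypothetical helper: 'IIII.FFFF' (value >= 1) -> ('1.xxxx', e), value = 1.xxxx * 2**e."""
    whole, frac = binary.split(".")
    whole = whole.lstrip("0")       # drop leading zeros so the first digit is the leading 1
    if not whole:
        raise ValueError("this sketch only handles values >= 1")
    exponent = len(whole) - 1       # number of places the binary point moves to the left
    return "1." + whole[1:] + frac, exponent

significand, exponent = normalize("1001.00011001100110011001")
print(significand, "x 2^" + str(exponent))  # 1.00100011001100110011001 x 2^3
```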

Write the scientific notation in an IEEE 754 format.

The first bit is the SIGN. If the value is positive, the sign bit is 0; if the value is negative, the sign bit is 1. Our value is positive, so the sign bit is 0.

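Then we have the EXPONENT, which uses 8 bits. Instead of storing the exponent directly, IEEE 754 adds a fixed offset of 127, called the “exponent bias”, and stores the sum as an unsigned 8-bit number (0 to 255). The stored values 0 and 255 are reserved for special cases (zero, subnormal numbers, infinity and NaN), so normal numbers have exponents from -126 to +127. The bias is added whether the exponent is positive or negative.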

In our case, the scientific notation has 2³, so the exponent is 3. Adding the bias gives 3 + 127 = 130. Converting 130 into binary gives 10000010, which is our exponent value for 9.1.

The binary value of 130
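In code, the stored exponent is simply the true exponent plus the bias of 127, written as an 8-bit binary number. The sketch below (the function name `encode_exponent` is hypothetical) also rejects exponents outside the normal single-precision range of -126 to +127, since the stored values 0 and 255 are reserved.

```python
BIAS = 127  # single-precision exponent bias

def encode_exponent(exponent: int) -> str:
    """Hypothetical helper: 8-bit biased exponent field of a normalized float32."""
    if not -126 <= exponent <= 127:
        raise ValueError("exponent outside the normal float32 range")
    return format(exponent + BIAS, "08b")

print(encode_exponent(3))   # 10000010  (3 + 127 = 130)
print(encode_exponent(-3))  # 01111100  (-3 + 127 = 124)
```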

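Finally, we have the MANTISSA. In normalized scientific notation the digit before the binary point is always 1, so IEEE 754 does not store it (it is often called the hidden bit). We ignore that 1 and take the next 23 bits after the point as the mantissa. So our mantissa for 9.1 is 00100011001100110011001.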
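A sketch of the mantissa step, starting from a normalized significand string: drop the implied leading "1." and keep the next 23 bits. This only truncates; rounding is a separate step discussed next. The helper name `mantissa_bits` is hypothetical.

```python
def mantissa_bits(significand: str, width: int = 23) -> str:
    """Hypothetical helper: drop the hidden '1.' and keep the next `width` bits (truncation only)."""
    if not significand.startswith("1."):
        raise ValueError("expected a normalized significand of the form '1.xxxx'")
    fraction = significand[2:]
    return fraction[:width].ljust(width, "0")  # pad with zeros if fewer bits are available

# 9.1 normalized, written out to 29 bits after the point
print(mantissa_bits("1.00100011001100110011001100110"))  # 00100011001100110011001
```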

Putting the sign (0), exponent (10000010), and mantissa (00100011001100110011001) together, our final IEEE 754 representation for 9.1 is 01000001000100011001100110011001.
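Assembling the fields is plain string concatenation. This small check hard-codes the bit strings derived by hand above and confirms the result has the expected 32-bit layout.

```python
sign = "0"                             # 9.1 is positive
exponent = format(3 + 127, "08b")      # biased exponent: 10000010
mantissa = "00100011001100110011001"   # first 23 fraction bits, truncated

pattern = sign + exponent + mantissa
assert len(pattern) == 32              # 1 + 8 + 23 bits
print(pattern)               # 01000001000100011001100110011001
print(hex(int(pattern, 2)))  # 0x41119999
```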

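But if you check 9.1 in any calculator that implements the IEEE 754 standard, you won't get the answer above. Instead, you get 01000001000100011001100110011010, because IEEE 754 rounds the mantissa instead of just cutting it off. Under the default rule (round to nearest), the computer checks the 24th bit. If the 24th bit is ‘1’ and the bits after it are not all zero (as here, since 0011 keeps repeating), the dropped part is more than half of the last place, so 1 is added to the 23rd bit. On the other hand, if the 24th bit is ‘0’, the extra bits are simply dropped, no worries 😏. (In the exact tie, where the 24th bit is 1 and every later bit is 0, the result is rounded so that the last bit is even.)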

The way of rounding Mantissa
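IEEE 754's default rounding mode is round to nearest, ties to even; the 24th-bit rule is how it plays out for 9.1. Below is a minimal sketch of that mode on bit strings, assuming a finite sample of the repeating tail is enough to stand in for the infinite expansion. It is cross-checked against Python's `struct` module, which rounds the same way when packing a float32 (strictly, `struct` rounds the double nearest to 9.1, which gives the same bits here). Carry out of the mantissa into the exponent is not handled. `round_mantissa` is a hypothetical name.

```python
import struct

def round_mantissa(bits: str, width: int = 23) -> str:
    """Hypothetical helper: round a fraction bit string to `width` bits, ties to even."""
    kept, rest = bits[:width], bits[width:]
    guard = rest[:1] == "1"        # the first discarded bit (the 24th bit)
    sticky = "1" in rest[1:]       # any later 1 means 'more than halfway'
    round_up = guard and (sticky or kept.endswith("1"))  # exact tie: round to even
    if not round_up:
        return kept
    # Add 1 to the kept bits; this sketch ignores a carry into the exponent.
    return format(int(kept, 2) + 1, f"0{width}b")

fraction = "00100011001100110011001" + "10011001100110011"  # 23 bits + part of the repeating tail
rounded = round_mantissa(fraction)
print(rounded)  # 00100011001100110011010

(bits,) = struct.unpack(">I", struct.pack(">f", 9.1))
print(format(bits, "032b")[9:] == rounded)  # True
```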

Truncated IEEE 754 representation for 9.1 (before rounding): 01000001000100011001100110011001

IEEE 754 calculator value for 9.1 (after rounding): 01000001000100011001100110011010

The two IEEE 754 representations above differ in their last two bits. Neither is exactly 9.1, because 9.1 needs infinitely many binary digits and only 23 mantissa bits are available. We call this the ‘floating-point rounding problem’.

Now let's try to convert this IEEE 754 value back to decimal. What do you think? Do we get our original value (9.1)? The answer is ‘NOOO’: you won't get 9.1 as your answer.

Let’s see how this happens.
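Here is a sketch of the reverse conversion, value = (-1)^sign × 1.mantissa × 2^(exponent - 127), computed with exact `Fraction` arithmetic so that no further rounding sneaks in. It only handles normalized numbers (stored exponent 1 to 254); zero, subnormals, infinity and NaN are out of scope. The function name `decode_float32` is hypothetical.

```python
from decimal import Decimal
from fractions import Fraction

def decode_float32(pattern: str) -> Fraction:
    """Hypothetical helper: exact value of a normalized float32 given as a 32-bit string."""
    sign = -1 if pattern[0] == "1" else 1
    exponent = int(pattern[1:9], 2) - 127             # remove the bias
    if not -126 <= exponent <= 127:
        raise ValueError("only normalized numbers are handled in this sketch")
    mantissa = Fraction(int(pattern[9:], 2), 2**23)   # 0.xxxx as an exact fraction
    return sign * (1 + mantissa) * Fraction(2) ** exponent  # add back the hidden 1

value = decode_float32("01000001000100011001100110011010")
print(Decimal(value.numerator) / Decimal(value.denominator))  # 9.1000003814697265625
```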

Working it out, binary 1.00100011001100110011010 × 2³ equals exactly 9.1000003814697265625, so our final value is 9.10000038, not 9.1. The additional amount comes from the rounding in the IEEE 754 conversion, and errors like this lead to unexpected results in certain scenarios. As a best practice, don't use float or double for calculations where exactness matters (money is the classic example); always make sure to use an appropriate data type when you program.
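One common way to follow this advice in Python is the standard `decimal` module, which stores decimal digits exactly (another common approach is keeping money as integer cents). This is an illustration of one option, not something the article itself prescribes.

```python
from decimal import Decimal

# Binary floating point: neither 0.1 nor 0.2 is stored exactly, so the sum is slightly off.
print(0.1 + 0.2)         # 0.30000000000000004
print(0.1 + 0.2 == 0.3)  # False

# Decimal built from strings keeps the decimal digits exactly.
total = Decimal("0.1") + Decimal("0.2")
print(total)                    # 0.3
print(total == Decimal("0.3"))  # True

# Building a Decimal from a float copies the float's binary error instead.
print(Decimal(0.1) == Decimal("0.1"))  # False
```

Note the last line: the exactness only holds if the values never pass through a float on the way in.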
