IEEE-754


Sometimes you are assigned to task to import legacy data and one of the datatypes is FLOAT (or REAL).

IEEE-754 is stored in three parts,.

For FLOAT, IEEE-754 uses 1 bit for the sign, 11 bits for the exponent and 52 bits for the fraction.
REAL uses 1 bit for the sign, 8 bits for the exponent and 23 bits for the mantissa.

Each bit is valued at 1 / (2^bit), ie
1/2 = 0.5
1/4 = 0.25
1/8 = 0.125
1/16 = 0.0625
1/32 = 0.03125
1/64 = 0.015625
1/128 = 0.0078125
1/256 = 0.00390625
1/512 = 0.0019531

and so on. This means 0.15625 equals to 1/8 + 1/32 (0.125 + 0.03125). However the hexadecimal value above is 0x3E200000.
SELECT CAST(0x3E200000 AS REAL) can’t be directly converted to a REAL, so we have to deal with all parts separately and then combine them into a REAL.

The binary representation of the REAL value of 0.15625 is 00111110 00100000 00000000 00000000.
We handle the 8 bits for exponent (green) and 23 bits for the fraction (red) together with the 1 bit sign (black).

We also have to remember that IEEE-754 also holds special values such as NaN, Infiinty and so on. So we also have to check for those special values, and we do that by checking the exponent. If the exponent is decimal 255 for REAL, the number is not valid.

Easiest is the start with the sign, multiply with the fraction part and then multiply with the exponent part.

See how I handle the 8 bits for the exponent (0x7f8) and the 23 bits for the fraction (0x007FFFFF)? When applying the same logic for FLOAT, the function look like this.

These two IEEE-754 representations are usually named BINARY32 (real) and BINARY64 (float). There are also BINARY16 and BINARY128.
BInary16 uses 5 bits for the exponent and 10 bits for the fraction, whereas BINARY128 uses 15 bits for the exponent and 112 bits for the fraction.
Not to mention BINARY256 that uses 19 bits for the exponent and 236 bits for the fraction.
We can use the same logic as written in the two functions above, but there is no compatible datatype that can store the value properly.