Bug 55: Float32 To Float8 With A Fault
- This converts a 32-bit IEEE 754 binary32 (single-precision) float to an 8-bit non-standard E4M3 float.
- An E4M3 float has (from most- to least-significant): 1 sign bit, 4 exponent bits, and 3 mantissa bits.
- E4M3 cannot encode infinities. E4M3 has exactly two NaN bit patterns (0x7F and 0xFF): all exponent and mantissa bits are 1, with either sign bit.
- E4M3 exponent bias is 7. Unbiased binary32 exponents -7 and below yield E4M3 signed zero or subnormals (rounding may carry the largest such inputs up to the smallest normal).
- Unbiased binary32 exponents -6 to 8 (inclusive) yield E4M3 normals, avoiding E4M3 NaN bit patterns.
- Higher binary32 exponents (and infinities) clamp to the maximum non-NaN E4M3 exponent/mantissa bit pattern (126, i.e. 0x7E, magnitude 448).
- E4M3 NaN is generated if and only if the input is a binary32 NaN; the NaN sign bit is preserved.
- Rounding follows the IEEE 754 default mode: round to nearest, with ties to even.
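To make the bit layout above concrete, here is a minimal sketch of the opposite direction: decoding an E4M3 byte back to a float32. It is not part of the puzzle code; the name decodeE4M3 is hypothetical, and the sketch only assumes the layout, bias, and NaN patterns listed above. A decoder like this can be handy for checking a converter's output by hand.

```go
package main

import (
	"fmt"
	"math"
)

// decodeE4M3 is a hypothetical helper (not from the puzzle code) that
// interprets b as sign(1) | exponent(4) | mantissa(3) with exponent bias 7.
func decodeE4M3(b uint8) float32 {
	exp := int(b>>3) & 0xF // 4 exponent bits
	man := int(b & 0x7)    // 3 mantissa bits

	var mag float64
	switch {
	case exp == 0xF && man == 0x7:
		// The only NaN patterns: all exponent and mantissa bits set.
		mag = math.NaN()
	case exp == 0:
		// Zero or subnormal: 0.mmm * 2^-6 = man * 2^-9.
		mag = math.Ldexp(float64(man), -9)
	default:
		// Normal: 1.mmm * 2^(exp-7) = (8+man) * 2^(exp-7-3).
		mag = math.Ldexp(float64(8+man), exp-10)
	}
	if b&0x80 != 0 {
		// Copysign also handles negative zero and NaN.
		mag = math.Copysign(mag, -1)
	}
	return float32(mag)
}

func main() {
	fmt.Println(decodeE4M3(0x7E)) // largest finite value: 448
	fmt.Println(decodeE4M3(0x38)) // exponent field 7, mantissa 0: 1
}
```

Under these assumptions, 0x08 is the smallest normal (2^-6) and 0x01 the smallest subnormal (2^-9), which matches the exponent ranges in the list.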
Fix The Tiny Bug In This Go Code: