AI Roundtable Poll · 7 models
Is there a C++ library for this, or can you explain whether a 16-bit floating-point type that stays within [-1, 1] but has a similar property to regular floating-point numbers (numbers closer to zero have higher precision) can be implemented in software/C++? I have vertex data in regular float/f32 that contains values between [-1, 1]. I want the option to:
1. convert from float to this float16 type (the original data is between [-1, 1]), which will then be saved to disk (impl. elsewhere);
2. when loading from disk in this format (impl. elsewhere), convert the stored float16 type (which contains values between [-1, 1]) to a regular float, which will then be used at runtime.
I need the f16 type to reduce the storage space on disk. I implemented a conversion routine that maps [-1, 1] f32 to int16_t with equidistant precision, but I'm not satisfied with the results: I need more precision when the number is close to 0 than when it is close to -1/1.
Result: Unanimous — Yes (7/7 votes)
Options
- Yes (7 votes)
- No (0 votes)
Model Responses
Qwen 3 Coder Next
Vote: Yes
Yes: a 16-bit floating-point format with non-uniform precision (finer near zero, coarser near ±1) over the range [-1, 1] can be implemented in software. While IEEE-754 binary16 (half-precision) covers a much wider range (up to ±65504), its structure inherently offers more precision near zero, which is exactly what you need. You can safely clamp all f32 inputs to [-1, 1] and convert them to binary16 using standard tooling such as the F16C intrinsics (`_mm_cvtps_ph`/`_mm_cvtph_ps`) or a C++ wrapper library.
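A minimal sketch of that clamp-and-convert path using the F16C intrinsics, assuming an x86 CPU and a build with F16C enabled (e.g. `-mf16c`); the function names are illustrative:

```cpp
#include <immintrin.h>   // F16C: _mm_cvtps_ph / _mm_cvtph_ps
#include <cstdint>
#include <cstring>

// Convert four f32 values to four binary16 bit patterns for storage.
void f32x4_to_f16x4(const float src[4], std::uint16_t dst[4]) {
    __m128 v = _mm_loadu_ps(src);
    // Clamp to [-1, 1] so every input lands in the band we care about.
    v = _mm_min_ps(_mm_max_ps(v, _mm_set1_ps(-1.0f)), _mm_set1_ps(1.0f));
    __m128i h = _mm_cvtps_ph(v, _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC);
    std::memcpy(dst, &h, 4 * sizeof(std::uint16_t));   // the four halves sit in the low 64 bits
}

// Convert four stored binary16 bit patterns back to f32 at load time.
void f16x4_to_f32x4(const std::uint16_t src[4], float dst[4]) {
    __m128i h = _mm_setzero_si128();
    std::memcpy(&h, src, 4 * sizeof(std::uint16_t));
    _mm_storeu_ps(dst, _mm_cvtph_ps(h));
}
```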
Sonar Pro
Vote: Yes
Yes, both C++ libraries and a custom software implementation are feasible for a 16-bit floating-point type in [-1, 1] with higher precision near zero, matching the user's needs. Existing libraries like `half_float` [1] and `float16_t` [4] provide C++ 16-bit IEEE 754 half-precision types (1 sign bit, 5 exponent bits, 10 mantissa bits). These inherently allocate higher relative precision near zero due to the normalized mantissa representation, similar to full floats. Users can convert their [-1, 1] f32 vertex data (e.g., `half h = static_cast<half>(f32_val);`) for storage and convert back (`f32_val = static_cast<float>(h);`) on load, directly supporting disk I/O and runtime use with reduced storage. C++23's `std::float16_t` [2] offers native support on compatible compilers, with the same properties. No scaling is needed since values are already in [-1, 1], and half-precision gives ~3-4 decimal digits of precision near zero versus the uniform spacing of int16_t. For a fully custom implementation, map [-1, 1] to a 16-bit float-like format in software: use 1 sign bit, 4-5 exponent bits (biased for the [-1, 1] range), and 10-11 mantissa bits. Conversion functions can be written in C++ (e.g., via bit manipulation), preserving higher precision near zero without libraries.
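A minimal round-trip sketch along these lines, assuming the header-only half library (half.sourceforge.net) is on the include path; the function names are illustrative:

```cpp
#include <half.hpp>       // half_float::half, a 16-bit IEEE 754 binary16 type
#include <vector>

// Pack [-1, 1] vertex floats into 2-byte halves for writing to disk.
std::vector<half_float::half> pack_positions(const std::vector<float>& src) {
    std::vector<half_float::half> out;
    out.reserve(src.size());
    for (float f : src)
        out.push_back(half_float::half(f));    // f32 -> f16; values stay in [-1, 1]
    return out;
}

// Expand the stored halves back to f32 for runtime use.
std::vector<float> unpack_positions(const std::vector<half_float::half>& src) {
    std::vector<float> out;
    out.reserve(src.size());
    for (half_float::half h : src)
        out.push_back(static_cast<float>(h));  // f16 -> f32
    return out;
}
```

Each `half_float::half` occupies 2 bytes, so the packed vector halves the on-disk footprint of the original f32 data.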
GPT-5.4
Vote: Yes
Yes. What you want is absolutely implementable in software. In fact, standard IEEE 754 half-precision (`binary16`) already has the key property you want: non-uniform spacing, with much finer precision near zero and coarser precision for larger magnitudes. Since your values are restricted to [-1, 1], half-float is especially reasonable. In that range, it gives denser representable values around 0, including subnormals very close to zero, which is exactly the floating-point behavior you described. You do not need a custom 16-bit type unless you want to use all 16 bits exclusively for [-1,1] and optimize that range further. A custom software format is also possible: use 1 sign bit and a mini exponent/mantissa layout, then write float->packed16 and packed16->float conversion routines. That is straightforward in C++. If you want an existing solution, libraries and APIs for half precision already exist, and many platforms support `half`/`float16` conversions. Even if hardware arithmetic is unavailable, storing as 16-bit and converting to/from 32-bit in software is common. So the correct answer is yes: either use existing half-precision support, or implement your own 16-bit logarithmic/floating-style encoding for [-1,1].
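One possible shape for such a custom format, as a rough sketch rather than a finished implementation: spend 1 sign bit, 4 exponent bits, and 11 mantissa bits, which gives every magnitude in (0, 1] one more significand bit than IEEE binary16. The names and the exact bit split here are illustrative choices:

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Layout: 1 sign bit | 4 exponent bits | 11 mantissa bits.
// Stored exponent 1..15 maps to unbiased exponents -14..0 (magnitudes up to 1.0);
// stored exponent 0 holds zero and subnormals with spacing 2^-25.
// Codes above 1.0 exist in the format but are never produced by the encoder.

std::uint16_t encode_unit16(float x) {
    x = std::clamp(x, -1.0f, 1.0f);
    unsigned sign = std::signbit(x) ? 0x8000u : 0u;
    float a = std::fabs(x);
    if (a == 0.0f) return (std::uint16_t)sign;              // +/-0
    int e;
    float m = std::frexp(a, &e);                            // a = m * 2^e with m in [0.5, 1)
    int ue = e - 1;                                         // a = (2m) * 2^ue with 2m in [1, 2)
    if (ue < -14) {                                         // subnormal range: value = frac * 2^-25
        long frac = std::lround(std::ldexp(a, 25));
        if (frac >= 2048)                                   // rounded up into the smallest normal
            return (std::uint16_t)(sign | (1u << 11));
        return (std::uint16_t)(sign | (unsigned)frac);
    }
    long frac = std::lround((2.0f * m - 1.0f) * 2048.0f);   // 11-bit fraction of the significand
    if (frac == 2048) { frac = 0; ++ue; }                   // rounding carried into the exponent
    return (std::uint16_t)(sign | ((unsigned)(ue + 15) << 11) | (unsigned)frac);
}

float decode_unit16(std::uint16_t h) {
    float sign = (h & 0x8000u) ? -1.0f : 1.0f;
    int   exp  = (h >> 11) & 0xF;
    int   frac = h & 0x7FF;
    if (exp == 0)                                           // zero / subnormal
        return sign * std::ldexp((float)frac, -25);
    return sign * std::ldexp(1.0f + (float)frac / 2048.0f, exp - 15);
}
```

The extra mantissa bit roughly halves the worst-case rounding error compared with binary16 for the same 16 bits of storage, at the price of conversion routines that always run in software (no hardware instruction covers this layout).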
Grok 4.1 Fast
Vote: Yes
Standard half-precision floating-point (FP16) uses 16 bits (1 sign, 5 exponent, 10 mantissa) and provides higher relative precision near zero, just like full FP32, due to its normalized representation and denormals. It supports values well beyond [-1, 1] (up to ~65k), but for data clamped to [-1, 1], it offers excellent precision exactly where needed—denser near zero. C++ libraries exist for FP16: OpenEXR's 'half' type, GLM's f16vec types, or NVIDIA's cuda_fp16. C++23 introduces std::float16_t. Conversion routines like f32_to_fp16 and fp16_to_f32 are straightforward to implement or use via these libs. To pack: convert f32 in [-1,1] to FP16 (clamping if needed), store as uint16_t. On load, unpack to f32. This beats uniform int16_t quantization, matching the desired non-linear precision for vertex data compression.
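A scalar sketch of that pack/unpack flow using the compiler-provided `_Float16` type, assuming a recent GCC or Clang on a target that supports it; the function names are illustrative:

```cpp
#include <algorithm>
#include <cstdint>
#include <cstring>

// f32 -> 16-bit pattern for disk storage.
std::uint16_t pack_f16(float f) {
    f = std::clamp(f, -1.0f, 1.0f);       // clamp so nothing falls outside the expected band
    _Float16 h = (_Float16)f;             // compiler emits the f32 -> f16 conversion
    std::uint16_t bits;
    std::memcpy(&bits, &h, sizeof bits);  // reinterpret the 16-bit pattern as an integer
    return bits;
}

// 16-bit pattern from disk -> f32 for runtime use.
float unpack_f16(std::uint16_t bits) {
    _Float16 h;
    std::memcpy(&h, &bits, sizeof h);
    return (float)h;
}
```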
Claude Opus 4.6
Vote: Yes
Yes, this is absolutely achievable in C++. There are multiple approaches:

1. **IEEE 754 half-precision (float16/FP16)** already has the desired property of higher precision near zero. The standard IEEE 754 half-precision format (1 sign bit, 5 exponent bits, 10 mantissa bits) naturally covers the range [-1, 1] and provides logarithmically higher precision near zero due to how floating-point representation works. Libraries like OpenEXR's `half` type, or manual bit manipulation with `std::bit_cast` from `<bit>`, can handle the conversion. Since IEEE FP16 can represent values well beyond [-1, 1], all values in that range are representable, and you get the non-uniform precision distribution you want.
2. **Custom implementation**: You can design a custom 16-bit format stored as `uint16_t` that uses 1 sign bit and allocates the remaining 15 bits between exponent and mantissa in a way optimized for the [-1, 1] range. This would give you even more precision near zero than standard FP16, since you wouldn't waste exponent range on values outside [-1, 1].
3. **Simple existing solutions**: The standard IEEE 754 half-precision float is directly suitable. Libraries such as `half` (from the Half library on SourceForge), GLM's `half` type, or even manual bit manipulation using f32 → f16 conversion routines found in many graphics codebases would work. The conversion is straightforward: cast float to half for storage, cast half back to float for runtime use.
4. **Alternative non-linear mapping**: You could also map your [-1, 1] float through a non-linear function (like a power curve or sinh-based mapping) to an int16_t, giving you more precision near zero. For example, mapping via `sign(x) * sqrt(|x|)` before quantizing to int16 would concentrate more of the integer range near zero (see the sketch after this list).

All of these are well-established techniques in computer graphics for vertex data compression, and they can be implemented in software C++ without any special hardware support.
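A minimal sketch of that fourth option, the sqrt-warped quantizer, under the assumption that inputs are already in (or clampable to) [-1, 1]; the curve and the int16 target are illustrative choices:

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Warp through sign(x) * sqrt(|x|), then quantize uniformly: small |x| values
// get a larger share of the 16-bit code space, so precision is highest near 0.
std::int16_t quantize_sqrt(float x) {
    x = std::clamp(x, -1.0f, 1.0f);
    float warped = std::copysign(std::sqrt(std::fabs(x)), x);
    return (std::int16_t)std::lround(warped * 32767.0f);
}

// Invert the warp on load: x = sign(w) * w^2.
float dequantize_sqrt(std::int16_t q) {
    float warped = (float)q / 32767.0f;
    return std::copysign(warped * warped, warped);
}
```

Near zero the reconstructed step size shrinks toward zero (it is proportional to |warped|), while near ±1 it grows to roughly 6×10⁻⁵; that is the trade-off this mapping makes.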
Kimi K2.5
Vote: Yes
Yes, this is absolutely possible using IEEE 754 half-precision (float16), which natively provides higher precision near zero. Standard float16 uses 5 exponent bits and 10 mantissa bits, giving approximately 3-4 decimal digits of precision, with much finer spacing around zero (~6×10⁻⁸) than near 1 (~10⁻³). For C++ implementation, you have several options: (1) C++23 `<stdfloat>` provides `std::float16_t`; (2) widely-used header-only libraries like `half` (half.sourceforge.net) or GLM (`glm::hvec3`); (3) compiler-provided types like `_Float16` (Clang/GCC) or `__half` (CUDA). If you need even more precision in [-1, 1] than standard float16 provides, you can implement a custom 16-bit nonlinear encoding (e.g., storing `sign(x) * sqrt(|x|)` or a logarithmic scale quantized to uint16_t) and convert back via the inverse function. This exploits the fact that you don't need to represent values outside [-1, 1], allowing you to dedicate all bits to that range with the precision distribution you specified.
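A short sketch of option (1) with the C++23 facility, assuming a standard library that actually defines `std::float16_t` (the type is optional); the function names are illustrative:

```cpp
#include <stdfloat>   // std::float16_t (C++23, only present if the implementation supports it)
#include <bit>        // std::bit_cast
#include <cstdint>

// f32 -> raw 16-bit pattern suitable for writing to disk.
std::uint16_t to_storage(float f) {
    std::float16_t h = static_cast<std::float16_t>(f);
    return std::bit_cast<std::uint16_t>(h);
}

// Raw 16-bit pattern from disk -> f32 for runtime use.
float from_storage(std::uint16_t bits) {
    return static_cast<float>(std::bit_cast<std::float16_t>(bits));
}
```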
Gemini 3.1 Pro
Vote: Yes
Yes, this is completely possible and widely used. What you are describing is exactly how standard floating-point numbers work, and a 16-bit version already exists under the IEEE 754 standard (known as FP16 or binary16). Like all floating-point formats, FP16 naturally provides non-uniform precision, giving you much higher precision for numbers closer to zero than for numbers near -1 or 1. There are several ways to use this in C++:

1. Open-source libraries: The C++ `half` library (available on SourceForge/GitHub) is a header-only library that provides an IEEE 754 half-precision floating-point type with seamless conversions to and from standard 32-bit floats.
2. Compiler extensions: Many modern compilers (like GCC and Clang) have built-in support for 16-bit floats via types like `__fp16` or `_Float16`.
3. Custom software implementation: You can easily implement the conversion yourself using bitwise operations. By extracting the sign, exponent, and mantissa of the 32-bit float, you can adjust the exponent bias and truncate the mantissa to fit into a 16-bit integer (`uint16_t`) for storage, and reverse the process upon loading (see the sketch after this list).
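A minimal sketch of that bit-level procedure, under simplifying assumptions that hold for this use case: inputs are pre-clamped to [-1, 1] (so no Inf/NaN or overflow handling), values below 2⁻¹⁴ are flushed to zero instead of producing f16 subnormals, and the mantissa is truncated rather than rounded:

```cpp
#include <bit>       // std::bit_cast (C++20)
#include <cstdint>

// f32 -> IEEE binary16 bit pattern (truncating, flush-to-zero).
std::uint16_t f32_to_f16_bits(float f) {
    std::uint32_t u    = std::bit_cast<std::uint32_t>(f);
    std::uint32_t sign = (u >> 16) & 0x8000u;                           // move sign into bit 15
    std::int32_t  exp  = (std::int32_t)((u >> 23) & 0xFFu) - 127 + 15;  // re-bias the exponent
    std::uint32_t man  = (u >> 13) & 0x3FFu;                            // keep the top 10 mantissa bits
    if (exp <= 0)  return (std::uint16_t)sign;                          // too small for a normal f16: flush to 0
    if (exp >= 31) return (std::uint16_t)(sign | (30u << 10) | 0x3FFu); // clamp to max finite (unreachable for |f| <= 1)
    return (std::uint16_t)(sign | ((std::uint32_t)exp << 10) | man);
}

// IEEE binary16 bit pattern -> f32 (subnormal halves decode as zero here).
float f16_bits_to_f32(std::uint16_t h) {
    std::uint32_t sign = (std::uint32_t)(h & 0x8000u) << 16;
    std::uint32_t exp  = (h >> 10) & 0x1Fu;
    std::uint32_t man  = h & 0x3FFu;
    if (exp == 0) return std::bit_cast<float>(sign);                    // zero (and flushed subnormals)
    std::uint32_t u = sign | ((exp + 112u) << 23) | (man << 13);        // re-bias: 127 - 15 = 112
    return std::bit_cast<float>(u);
}
```

Production converters usually add round-to-nearest-even and subnormal support; libraries such as the `half` library mentioned above already do both.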
Final Summary
The AI Roundtable reached a unanimous consensus that a 16-bit floating-point representation meets the non-uniform precision requirement. Models like Qwen 3 Coder Next and GPT-5.4 noted that standard IEEE-754 half-precision naturally provides higher resolution near zero, while Claude Opus 4.6 and Kimi K2.5 suggested custom bit layouts to further optimize the [-1, 1] range. Sonar Pro, Grok 4.1 Fast, and Gemini 3.1 Pro confirmed that existing libraries and software-based bit manipulation make this both feasible and efficient in C++.
Unanimous agreement after 1 round of discussion. Final vote: Yes: 7
Strongest Arguments
- Yes: Standard half-precision floating-point (FP16) inherently allocates higher relative precision near zero due to the normalized mantissa representation; if even more precision is needed within [-1, 1], a custom bit-layout can be implemented to avoid 'wasting' bits on values outside that range.