Half-precision floats

Enoki provides a compact implementation of a 16-bit half-precision floating point type that is compatible with the FP16 format on GPUs and high dynamic range image libraries such as OpenEXR. To use this feature, include the following header:

#include <enoki/half.h>

Most current processors don’t natively implement half-precision arithmetic, hence mathematical operations involving this type always involve a half \(\to\) float \(\to\) half round trip. For this reason, it is unwise to rely on it in performance-critical parts of a computation.

The main reason for including a dedicated half-precision type in Enoki is that it provides an ideal storage format for floating point data that does not require the full accuracy of the single precision representation, which leads to an immediate storage saving of \(2\times\).
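To make the \(2\times\) figure concrete, the following sketch compares the footprint of the same number of values stored in single and half precision. It does not use Enoki; the helper names ``bytes_single`` and ``bytes_half`` and the image dimensions are hypothetical and chosen only for illustration, and the code assumes a 32-bit ``float``.

```cpp
#include <cstddef>
#include <cstdint>

// Assumption: float is a 32-bit type, as on all mainstream platforms.
static_assert(sizeof(float) == 2 * sizeof(std::uint16_t),
              "expected a 32-bit float and a 16-bit half storage type");

// Bytes needed to store `count` values in single precision.
constexpr std::size_t bytes_single(std::size_t count) {
    return count * sizeof(float);
}

// Bytes needed to store `count` values as 16-bit half-precision bit patterns.
constexpr std::size_t bytes_half(std::size_t count) {
    return count * sizeof(std::uint16_t);
}

// Hypothetical example: a 1920 x 1080 image with four channels per pixel.
constexpr std::size_t value_count = std::size_t(1920) * 1080 * 4;
```

For this image, ``bytes_single(value_count)`` is 33,177,600 bytes, whereas ``bytes_half(value_count)`` is 16,588,800 bytes.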

Note

If supported by the target architecture, Enoki uses the F16C instruction set to perform efficient vectorized conversion between half and single precision variables (however, this only affects conversion and no other arithmetic operations). ARM NEON also provides native conversion instructions.

Usage

The following example shows how to use the enoki::half type in a typical use case.

using Color4f = Array<float, 4>;
using Color4h = Array<half, 4>;

uint8_t *image_ptr = ...;

Color4f pixel(load<Color4h>(image_ptr)); // <- conversion vectorized using F16C

/* ... update 'pixel' using single-precision arithmetic ... */

store(image_ptr, Color4h(pixel)); // <- conversion vectorized using F16C

Reference

class half

A half instance encodes a sign bit, a 5-bit exponent, and 10 explicitly stored mantissa bits.

All standard mathematical operators are overloaded and implemented using the processor’s floating point unit after a conversion to IEEE 754 single precision. The result of the operation is then converted back to half precision.
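The bit layout and the convert, compute, convert-back pattern described above can be sketched in portable C++. This is not Enoki's implementation: the function names ``half_bits_to_float``, ``float_to_half_bits`` and ``add_via_float`` are hypothetical, and the sketch assumes the usual exponent bias of 15 and the default round-to-nearest-even floating point mode.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Decode a 16-bit pattern: 1 sign bit, 5 exponent bits (bias 15), 10 mantissa bits.
float half_bits_to_float(std::uint16_t h) {
    const std::uint32_t sign     = (h >> 15) & 0x1u;
    const std::uint32_t exponent = (h >> 10) & 0x1Fu;
    const std::uint32_t mantissa = h & 0x3FFu;

    float magnitude;
    if (exponent == 0)          // zero or subnormal: mantissa * 2^-24
        magnitude = std::ldexp(static_cast<float>(mantissa), -24);
    else if (exponent == 31)    // infinity (mantissa == 0) or NaN
        magnitude = mantissa ? std::numeric_limits<float>::quiet_NaN()
                             : std::numeric_limits<float>::infinity();
    else                        // normal: (1024 + mantissa) * 2^(exponent - 25)
        magnitude = std::ldexp(static_cast<float>(mantissa | 0x400u),
                               static_cast<int>(exponent) - 25);
    return sign ? -magnitude : magnitude;
}

// Encode a float as half-precision bits. Rounding relies on std::nearbyint,
// i.e. on the current rounding mode (round-to-nearest-even by default).
std::uint16_t float_to_half_bits(float f) {
    const std::uint32_t sign = std::signbit(f) ? 0x8000u : 0x0000u;
    if (std::isnan(f))
        return static_cast<std::uint16_t>(sign | 0x7E00u);    // quiet NaN
    const float a = std::fabs(f);
    if (a >= 65536.0f)                                         // too large, or infinity
        return static_cast<std::uint16_t>(sign | 0x7C00u);

    const float min_normal = 6.103515625e-05f;                 // 2^-14
    const bool normal = a >= min_normal;
    const int e = normal ? std::ilogb(a) : -14;                // unbiased exponent
    // Scale so that one unit equals the spacing of half values at this exponent.
    const std::uint32_t rounded =
        static_cast<std::uint32_t>(std::nearbyint(std::ldexp(a, 10 - e)));
    // For normal values the implicit leading 1 (1024) is dropped; a carry out
    // of the mantissa (rounded == 2048) correctly increments the exponent.
    const std::uint32_t bits =
        normal ? (static_cast<std::uint32_t>(e + 15) << 10) + (rounded - 1024u)
               : rounded;
    return static_cast<std::uint16_t>(sign | bits);
}

// Addition via the half -> float -> half round trip.
std::uint16_t add_via_float(std::uint16_t a, std::uint16_t b) {
    return float_to_half_bits(half_bits_to_float(a) + half_bits_to_float(b));
}
```

With these helpers, ``add_via_float(0x3C00, 0x3C00)`` (1 + 1) yields ``0x4000`` (2), while ``add_via_float(0x6800, 0x3C00)`` (2048 + 1) yields ``0x6800`` again: the single precision sum 2049 is not representable with 10 stored mantissa bits and rounds back to 2048 during the final conversion.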

uint16_t value

Stores the represented half precision value as an unsigned 16-bit integer.

half(float value)

Constructs a half-precision value from the given single precision argument.

operator float() const

Implicit half to float conversion operator.

static half from_binary(uint16_t value)

Reinterpret a 16-bit unsigned integer as a half-precision variable.

half operator+(half h) const

Addition operator.

half &operator+=(half h)

Addition compound assignment operator.

half operator-() const

Unary minus operator.

half operator*(half h) const

Multiplication operator.

half &operator*=(half h)

Multiplication compound assignment operator.

half operator/(half h) const

Division operator.

half &operator/=(half h)

Division compound assignment operator.

bool operator<(half h) const

Less-than comparison operator.

bool operator<=(half h) const

Less-than-or-equal comparison operator.

bool operator>(half h) const

Greater-than comparison operator.

bool operator>=(half h) const

Greater-than-or-equal comparison operator.

bool operator==(half h) const

Equality operator.

bool operator!=(half h) const

Inequality operator.

friend std::ostream &operator<<(std::ostream &os, const half &h)

Stream insertion operator.