digitalmars.com                        
Last update Thu Jan 9 18:31:35 2025

Implementing Half Floats in D

written by Walter Bright

01/18/2013

Programming languages usually come with a suite of built-in types, such as int, long, char, float, double, etc. Built-in types enjoy advantages over library types, such as better optimization, compile time execution, specialized literals, etc. D has an unusually large set of them. But there are never enough, and there's always pressure to add more.

For example, some graphics systems employ a half float type. This is a 16 bit type used to store an IEEE floating point value. (Floating point types are normally 32 or 64 bits.) The half float type is used only for compact storage: it is expanded to a 32 bit float for computation, and crushed back down to 16 bits for storage.
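The 16 bit layout is the IEEE 754 binary16 format: 1 sign bit, 5 exponent bits with a bias of 15, and 10 fraction bits. For normal numbers the value is (-1)^sign * 2^(exponent - 15) * (1 + fraction/1024). The sketch below only splits a raw ushort into those three fields; HalfFields and halfFields are hypothetical helpers, not part of the article's code.

```d
/// The three fields of a raw IEEE 754 binary16 value (hypothetical helper).
struct HalfFields
{
    uint sign;      // 1 bit
    uint exponent;  // 5 bits, biased by 15
    uint fraction;  // 10 bits; normal numbers have an implicit leading 1
}

/// Splits a raw 16 bit pattern into its sign, exponent and fraction fields.
HalfFields halfFields(ushort bits)
{
    return HalfFields(bits >> 15, (bits >> 10) & 0x1F, bits & 0x3FF);
}
```

For example, 0x3C00 is 1.0 (sign 0, biased exponent 15, fraction 0), and 0x3D33 has fraction 0x133, giving 1 + 307/1024 = 1.2998046875, the closest half float to 1.3.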

The alternative to a built-in type is, of course, a user defined library type. How close can D get to implementing a half float type as a user defined type? Is it close enough that the pressure for making it built-in is eased? Let's find out.

Laying down some requirements:

  1. Implicit promotion to float for all computations. Hardware floating point is so fast that it is sure to beat a software emulated 16 bit computation. And besides, 16 bits loses too much precision in intermediate results.
  2. Explicit conversion from float to half float.
  3. A nice literal for the half floats.

That doesn't look so daunting. The promotion rules mirror those of C for shorts (and for floats, too, which C allows to be promoted to double for computation).
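For comparison, here is the short behavior being mirrored, as D does it: both operands are promoted to int, and getting the result back into a short takes an explicit cast. addShorts is a made-up name used only for illustration.

```d
/// Hypothetical helper showing C-style short arithmetic in D.
short addShorts(short a, short b)
{
    // Both operands are promoted to int before the addition...
    static assert(is(typeof(a + b) == int));
    // ...so narrowing the result back to short needs an explicit cast.
    // The cast truncates: short.max + 1 wraps around to short.min.
    return cast(short)(a + b);
}
```

addShorts(1000, 2000) returns 3000, while addShorts(short.max, 1) wraps around to short.min. HalfFloat follows the same pattern, with float playing the role of int.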

Starting with the obvious,

struct HalfFloat {
  private:
    ushort s = 0x7C01;
}

The 0x7C01 forms the default initializer, and that value is a NaN for half floats.
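A binary16 value is a NaN when all five exponent bits are set and the fraction is nonzero, which is why 0x7C01 qualifies; 0x7C00, with a zero fraction, is positive infinity instead. Defaulting to NaN matches D's built-in float and double, whose .init is also a NaN. The helpers below are a hypothetical sketch for checking raw bit patterns.

```d
/// True if a raw binary16 bit pattern is a NaN (hypothetical helper):
/// exponent field all ones and a nonzero fraction.
bool isHalfNaN(ushort bits)
{
    return (bits & 0x7C00) == 0x7C00 && (bits & 0x03FF) != 0;
}

/// True if a raw binary16 bit pattern is +infinity or -infinity:
/// exponent field all ones and a zero fraction, either sign.
bool isHalfInfinity(ushort bits)
{
    return (bits & 0x7FFF) == 0x7C00;
}
```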

In order to enable casting to a HalfFloat, add a constructor:

this(float f) {
    s = floatToShort(f);
}

The floatToShort() function keeps the dirty details of manipulating floating point bits out of the way, so our struct can focus on the mechanics of creating a new type. Back to the constructor: it's a bit greedy. It'll accept any argument that implicitly converts to a float, such as an int. In order to restrict it to only accepting floats, turn it into a template and check the type with static assert:

this(T : float)(T f) {
    static assert(is(T == float));
    s = floatToShort(f);
}
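The greediness is easy to see with two throwaway structs, Loose and Strict (hypothetical names, each storing a plain float so nothing else gets in the way). The plain constructor quietly accepts an int, while the templated one sees the argument's real type T and can refuse it with a readable message.

```d
/// Hypothetical: a plain float constructor accepts anything that
/// implicitly converts to float, such as an int.
struct Loose
{
    float v;
    this(float f) { v = f; }
}

/// Hypothetical: the templated constructor sees the argument's real
/// type T, so static assert can reject everything except float.
struct Strict
{
    float v;
    this(T : float)(T f)
    {
        static assert(is(T == float), "Strict wants a float, not " ~ T.stringof);
        v = f;
    }
}
```

Loose(42) compiles and stores 42.0f, and Strict(1.5f) compiles, but Strict(42) is rejected at compile time by the static assert, with a message naming int.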

And now we can write expressions like:

float f;
HalfFloat hf = cast(HalfFloat)f;

Implementing implicit conversions to float is a bit less obvious:

@property float toFloat() { return shortToFloat(s); }
alias toFloat this;

The alias toFloat this construct is unusual: it tells the compiler that if it can't find the struct member it is looking for, or can't use a HalfFloat directly where a value is needed (as in an arithmetic expression), to resolve it to the toFloat member. The toFloat member is a property that yields the HalfFloat converted to a float. Again, we've abstracted away the bit twiddling into shortToFloat().
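The article deliberately leaves shortToFloat() out. Here is one possible version, assuming the signature float shortToFloat(ushort) implied by the call shortToFloat(s). Widening is the easy direction, since every binary16 value is exactly representable as a float, so nothing needs rounding. This is a sketch, not the article's own implementation.

```d
/// Sketch of a binary16 to float conversion (not the article's code).
/// Every half value is exactly representable as a float, so nothing rounds.
float shortToFloat(ushort h)
{
    uint sign = (h & 0x8000u) << 16;         // half bit 15 -> float bit 31
    uint exp = (h >> 10) & 0x1F;             // 5 bit biased exponent
    uint mant = h & 0x3FF;                   // 10 bit fraction
    uint bits;

    if (exp == 0x1F)                         // infinity or NaN
        bits = sign | 0x7F80_0000 | (mant << 13);
    else if (exp != 0)                       // normal: rebias from 15 to 127
        bits = sign | ((exp + 112) << 23) | (mant << 13);
    else if (mant == 0)                      // signed zero
        bits = sign;
    else                                     // subnormal half: normalize it
    {
        uint e = 113;
        while ((mant & 0x400) == 0)          // shift until the leading 1 reaches bit 10
        {
            mant <<= 1;
            --e;
        }
        bits = sign | (e << 23) | ((mant & 0x3FF) << 13);
    }
    return *cast(float*)&bits;               // reinterpret the bits as a float
}
```

For example, shortToFloat(0x3C00) gives 1.0f, shortToFloat(0x3D33) gives 1.2998046875f, and shortToFloat(0x0001), the smallest subnormal, gives 2^-24.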

Now, we can write:

HalfFloat g;
HalfFloat hf = cast(HalfFloat)(g + 3.2f);

and g gets implicitly converted to a float before being added to 3.2f using the floating point hardware.
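Stripped of the bit twiddling, the alias this mechanism can be seen in isolation with a toy type. Boxed is a hypothetical struct that simply stores a float; it uses the same toFloat property, the same alias toFloat this, and the same explicit cast back through the constructor as HalfFloat does.

```d
/// Hypothetical toy type using HalfFloat's alias this trick.
/// It stores a plain float, so the "conversion" is trivial.
struct Boxed
{
    private float v;   // float.init is NaN, so a default Boxed is NaN too

    this(float f) { v = f; }

    /// Yields the value as a float, like HalfFloat.toFloat.
    @property float toFloat() const { return v; }

    /// Lets a Boxed be used wherever a float is expected.
    alias toFloat this;
}
```

With Boxed g = Boxed(1.5f), the expression g + 3.25f converts g through toFloat and yields 4.75f, and cast(Boxed)(g + 3.25f) goes back through the constructor.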

No, I didn't forget the HalfFloat literals. They're as simple as:

template hf(float v) {
    enum hf = HalfFloat(v);
}

and used like:

HalfFloat h = hf!1.3f;

The template takes a single argument of type float, which is the 1.3f, constructs a HalfFloat out of it and assigns it to the manifest constant hf, which becomes the result of the template.
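Whether a template like this really runs its constructor at compile time can be checked with static assert, which only accepts compile-time constants. The sketch below has the same shape as hf, but uses a hypothetical Tenths type whose conversion (rounding a nonnegative float to the nearest tenth) is plainly evaluable by the compiler; it stands in for HalfFloat and floatToShort, which are not reproduced here.

```d
/// Hypothetical stand-in for floatToShort(): an ordinary function the
/// compiler can evaluate at compile time. Assumes a nonnegative input.
ushort toTenths(float f)
{
    return cast(ushort)(f * 10 + 0.5f);    // round to the nearest tenth
}

/// Hypothetical stand-in for HalfFloat.
struct Tenths
{
    ushort s;
    this(float f) { s = toTenths(f); }
}

/// Same shape as the article's hf template.
template tenthsLit(float v)
{
    enum tenthsLit = Tenths(v);   // enum forces compile time evaluation
}

// static assert accepts only compile-time constants, so this compiling
// shows the constructor and toTenths() ran inside the compiler.
enum lit = tenthsLit!1.3f;
static assert(lit.s == 13);
```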

I know what you're thinking: "Like hell that's a user defined literal. It's calling a bunch of runtime code!" Allow me:

HalfFloat foo() {
     return hf!1.3f;
}

Compiling it, then disassembling the code generated for foo():

    push    EAX
    mov     word ptr [EAX],0x3D33
    pop     ECX
    ret

The 0x3D33 is, indeed, the half float representation of 1.3f. What happened is that the compiler, in turning HalfFloat(v) into a manifest constant, ran the constructor, including its call to floatToShort(f), entirely at compile time.
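For the curious, here is one way floatToShort() might be written. It is a sketch, not the article's implementation: it assumes the signature ushort floatToShort(float) implied by the constructor, rounds to nearest with ties to even, overflows to infinity, produces subnormals for tiny values, and keeps NaNs as NaNs without preserving their payload. It reinterprets the float's bits through a pointer cast; whether a particular compiler accepts that during compile time evaluation is an assumption to check, not something shown here.

```d
/// Sketch of a float to binary16 conversion, rounding to nearest even.
/// Not the article's implementation; the name matches the constructor's call.
ushort floatToShort(float f)
{
    uint x = *cast(uint*)&f;                 // raw bits of the float
    uint sign = (x >> 16) & 0x8000;          // float bit 31 -> half bit 15
    int exp = cast(int)((x >> 23) & 0xFF);   // 8 bit biased exponent
    uint mant = x & 0x7F_FFFF;               // 23 bit fraction

    if (exp == 0xFF)                         // infinity, or NaN (kept quiet, payload dropped)
        return cast(ushort)(sign | 0x7C00 | (mant ? 0x200 : 0));

    int e = exp - 127 + 15;                  // rebias from 127 to 15
    if (e >= 0x1F)                           // too large for a half: infinity
        return cast(ushort)(sign | 0x7C00);

    if (e <= 0)                              // subnormal half, or zero
    {
        if (e < -10)                         // below half the smallest subnormal
            return cast(ushort)sign;         // rounds to a signed zero
        mant |= 0x80_0000;                   // make the implicit leading 1 explicit
        uint shift = cast(uint)(14 - e);     // 14..24 low bits are dropped
        uint sub = mant >> shift;
        uint subRem = mant & ((1u << shift) - 1);
        uint subHalf = 1u << (shift - 1);
        if (subRem > subHalf || (subRem == subHalf && (sub & 1)))
            ++sub;                           // round to nearest, ties to even
        return cast(ushort)(sign | sub);
    }

    uint h = (cast(uint)e << 10) | (mant >> 13);
    uint rem = mant & 0x1FFF;                // the 13 fraction bits dropped
    if (rem > 0x1000 || (rem == 0x1000 && (h & 1)))
        ++h;                                 // ties to even; a carry may bump the
                                             // exponent, possibly up to infinity
    return cast(ushort)(sign | h);
}
```

It maps 1.3f to 0x3D33, agreeing with the disassembly above, and maps 65504.0f, the largest finite half float, to 0x7BFF.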

So there we have it, a nice user defined half float implementation, and one that can serve as a model for creating many other types of unusual user defined arithmetic types.

Here is the complete half float implementation, unit tests and all.

Imperfections

While the half float type looks and acts pretty much like a built-in type, some subtle differences remain. Built-in types tend to run faster, because the optimizer can take advantage of mathematical identities and any available hardware acceleration, and because back end optimizations often can only deal with built-in types.

Conclusion

D offers a set of features, the confluence of which enables the creation of user defined types that work well enough to take the pressure off adding more built-in types. Adding a built-in type is a major disruptive change to any language, and so there's a really high bar to justify it. But a user defined type can be quickly added by anyone with just a few lines of code.

Acknowledgements

Thanks to Jason House for reviewing a draft of this.
