The IEEE 754-2008 standard for floating-point arithmetic defines minNum(x, y) and maxNum(x, y) functions that compute the minimum and maximum of two floating-point numbers respectively. This is simple as long as one number is greater than the other, but the functions have some aggravating corner cases:
def maxNum(x, y):
    if x > y:
        return x
    if x < y:
        return y
    if x == y:
        # What is maxNum(+0, -0)?
        return bitand(x, y)
    # What about NaN?
    ...
Negative zero
The first corner case is maxNum(+0, -0); should the result be +0 or -0? The IEEE standard leaves this up to the implementation. The ARM and MIPS architectures both choose to compare as if -0 < +0 for the purposes of min and max. This can be implemented as the bitwise and of x and y when they compare equal using the normal floating-point equality, as shown above.
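As a sketch of why the bit trick works (my illustration, assuming IEEE 754 doubles and using Python's struct module to reach the raw bit patterns): +0 is the all-zeros pattern and -0 sets only the sign bit, so ANDing the two patterns clears the sign bit and yields +0, which is what max wants; a min implementation would use bitwise or instead, keeping the sign bit and producing -0.

```python
import math
import struct

def bits(x):
    # Raw 64-bit pattern of an IEEE 754 double.
    return struct.unpack("<Q", struct.pack("<d", x))[0]

def from_bits(b):
    return struct.unpack("<d", struct.pack("<Q", b))[0]

pos_zero = bits(+0.0)  # 0x0000000000000000
neg_zero = bits(-0.0)  # 0x8000000000000000 (only the sign bit set)

# AND clears the sign bit: max(+0, -0) comes out as +0.
max_zero = from_bits(pos_zero & neg_zero)
# OR keeps the sign bit: min(+0, -0) comes out as -0.
min_zero = from_bits(pos_zero | neg_zero)

print(math.copysign(1.0, max_zero))  # 1.0, i.e. +0
print(math.copysign(1.0, min_zero))  # -1.0, i.e. -0
```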
The Intel SSE maxss and maxsd instructions mimic the C expression x > y ? x : y, so they would return y in this case:
>>> maxss(+0, -0)
-0
>>> maxss(-0, +0)
+0
These instructions are not commutative.
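A small Python model (my sketch, capturing only the x > y ? x : y behavior described above) makes the asymmetry easy to see: whenever the comparison is false, including when either operand is a NaN, the second operand wins.

```python
import math

def maxss(x, y):
    # Model of SSE maxss/maxsd: literally `x > y ? x : y`.
    # When the comparison is false (equal operands, or any NaN),
    # the second operand is returned.
    return x if x > y else y

# +0 and -0 compare equal, so the second operand wins either way:
print(math.copysign(1.0, maxss(+0.0, -0.0)))  # -1.0, i.e. -0
print(math.copysign(1.0, maxss(-0.0, +0.0)))  # 1.0, i.e. +0

# NaN handling is asymmetric for the same reason:
print(maxss(1.0, float("nan")))  # nan
print(maxss(float("nan"), 1.0))  # 1.0
```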
Not a number
What if one of the operands is a NaN? Since many other IEEE primitives like addition and subtraction always produce a NaN when one of the operands is a NaN, it would make sense for min and max to do the same. Indeed, ARM has an fmax instruction which does exactly that:
def fmax(x, y):
    if x > y:
        return x
    if x < y:
        return y
    if x == y:
        return bitand(x, y)
    return NaN
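Filling in bitand with actual bit manipulation gives a runnable Python model of this NaN-propagating behavior (a sketch assuming doubles; the real fmax instruction operates on hardware registers):

```python
import math
import struct

def bitand(x, y):
    # Bitwise AND of the two doubles' bit patterns.
    xb = struct.unpack("<Q", struct.pack("<d", x))[0]
    yb = struct.unpack("<Q", struct.pack("<d", y))[0]
    return struct.unpack("<d", struct.pack("<Q", xb & yb))[0]

def fmax(x, y):
    # Model of ARM fmax: any NaN operand produces a NaN result.
    if x > y:
        return x
    if x < y:
        return y
    if x == y:
        return bitand(x, y)
    return float("nan")

print(fmax(1.0, 2.0))                        # 2.0
print(math.copysign(1.0, fmax(+0.0, -0.0)))  # 1.0: -0 < +0 for max
print(fmax(1.0, float("nan")))               # nan
print(fmax(float("nan"), 1.0))               # nan
```

Unlike maxss, this function is commutative: a NaN poisons the result no matter which operand it is.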
IEEE, however, treats NaN as a missing value for the purposes of the minNum and maxNum functions. They will suppress a single NaN operand and return the number instead:
def maxNum(x, y):
    if x > y:
        return x
    if x < y:
        return y
    if x == y:
        return oneOf(x, y)
    if not isNaN(x):
        return x
    if not isNaN(y):
        return y
    return NaN
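The pseudocode translates directly into a runnable Python sketch (quiet NaNs only; Python offers no way to construct a signaling NaN, and I gloss over the ±0 case by just returning x when the operands compare equal):

```python
import math

def maxNum(x, y):
    # Model of IEEE 754-2008 maxNum: a single quiet NaN is suppressed.
    if x > y:
        return x
    if x < y:
        return y
    if x == y:
        return x  # glossing over the +0/-0 choice here
    if not math.isnan(x):
        return x
    if not math.isnan(y):
        return y
    return float("nan")

nan = float("nan")
print(maxNum(1.0, nan))  # 1.0: the NaN is suppressed
print(maxNum(nan, 1.0))  # 1.0: commutative, unlike maxss
print(maxNum(nan, nan))  # nan
```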
The rationale for this behavior is not completely clear to me. This is from Prof. Kahan’s notes:

Some familiar functions have yet to be defined for NaN. For instance max{x, y} should deliver the same result as max{y, x} but almost no implementations do that when x is NaN. There are good reasons to define max{NaN, 5} := max{5, NaN} := 5 though many would disagree.
There is further discussion of this choice in the comments on a Julia language issue.
Signaling NaNs
There are two types of NaNs: quiet and signaling NaNs. Quiet NaNs can be produced by invalid operations like 0 / 0 or ∞ - ∞, but signaling NaNs are not produced by normal arithmetic operations, although they can be propagated by certain functions like negate and copySign that only manipulate the sign bit. All other operations are invalid with a signaling NaN operand, causing them to produce a quiet NaN result. This includes the minNum and maxNum functions:
def maxNum(x, y):
    if isSignaling(x) or isSignaling(y):
        return NaN
    if x > y:
        return x
    if x < y:
        return y
    if x == y:
        return oneOf(x, y)
    if not isNaN(x):
        return x
    if not isNaN(y):
        return y
    return NaN
This means that these functions suppress quiet NaNs while signaling NaNs are converted to a quiet NaN:
>>> maxNum(1, NaN)
1
>>> maxNum(1, sNaN)
NaN
The conversion of signaling NaNs to quiet NaNs as they are propagated has the unfortunate effect of making these functions non-associative:
>>> maxNum(1, maxNum(sNaN, 2))
1
>>> maxNum(maxNum(1, sNaN), 2)
2
It is quite unfortunate that computing the maximum of a set containing the value
2 can produce the result 1. It seems that the answer should be either 2 or NaN.
This bad behavior when maxNum functions are chained appears because:
- Signaling NaNs are propagated as quiet NaNs.
- Quiet NaNs are suppressed rather than propagated.
Signaling NaNs also latch an invalid operation floating-point exception, but very few programs use the floating-point status flags. Trapping on floating-point exceptions is almost always disabled.
The ARMv8 instruction set has an fmaxnmv instruction which computes the IEEE maxNum function across the lanes of a SIMD vector:
def fmaxnmv(v):
    return maxNum(maxNum(v[0], v[1]), maxNum(v[2], v[3]))
Since the underlying function is non-associative, we can get different results simply by permuting the lanes of the SIMD vector:
>>> fmaxnmv([sNaN, 1, 2, 3])
3
>>> fmaxnmv([1, 2, 3, sNaN])
2
This sets the invalid operation bit in the FPSR floating-point status register as the only indication that something might be wrong.
Mitigation
So how can we compute the min or max of a set of floating-point numbers without losing our sanity? There’s a couple of options:
- Use min and max functions that propagate all NaN operands, like the ARM
fmax
instruction described above. If any member of the set is a NaN (quiet or signaling), the result will be NaN. - Use min and max functions that suppress all NaN operands, quiet or signaling.
This is valid to do in a C implementation since the C11 specification leaves
the handling of signaling NaNs implementation-defined. On OS X, this if how
the
<math.h>
fmin()
andfmax()
functions behave. - Detect invalid operation floating point exceptions. Signaling NaN operands cause operations to latch an invalid operation exception and return NaN. If such an exception was latched while computing the maximum element of a set, the result can’t be trusted.