# IEEE 754-2008 minNum and maxNum

The IEEE 754-2008 standard for floating-point arithmetic defines `minNum(x, y)`
and `maxNum(x, y)` functions that compute the minimum and maximum of two
floating-point numbers respectively. This is simple as long as one number is
greater than the other, but the functions have some aggravating corner cases:

```
def maxNum(x, y):
    if x > y:
        return x
    if x < y:
        return y
    if x == y:
        # What is maxNum(+0, -0)?
        return bitand(x, y)
    # What about NaN?
    ...
```

## Negative zero

The first corner case is `maxNum(+0, -0)`; should the result be `+0` or `-0`?
The IEEE standard leaves this up to the implementation. The ARM and MIPS
architectures both choose to compare as if `-0 < +0` for the purposes of min and
max. This can be implemented as the bitwise `and` of `x` and `y` when they
compare equal using the normal floating-point equality as shown above.
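As a sketch of the bit trick in Python (the helper names `bits` and `from_bits` are mine, not anything from the standard):

```python
import math
import struct

def bits(x):
    # Interpret a float's IEEE 754 binary64 encoding as an unsigned integer.
    return struct.unpack("<Q", struct.pack("<d", x))[0]

def from_bits(b):
    # Reinterpret an unsigned 64-bit integer as a binary64 float.
    return struct.unpack("<d", struct.pack("<Q", b))[0]

# +0.0 encodes as all-zero bits; -0.0 sets only the sign bit.
# The bitwise AND therefore yields +0.0 unless both operands are -0.0.
result = from_bits(bits(0.0) & bits(-0.0))
print(math.copysign(1.0, result))  # 1.0, i.e. the result is +0
```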

The Intel SSE `maxss` and `maxsd` instructions mimic the C expression
`x > y ? x : y`, so they would return `y` in this case:

```
>>> maxss(+0, -0)
-0
>>> maxss(-0, +0)
+0
```

These instructions are not commutative.
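A minimal Python model of this behavior (`maxss` here is my emulation of the instruction's semantics, not the instruction itself):

```python
import math

def maxss(x, y):
    # Emulates Intel's maxss/maxsd semantics: the C expression `x > y ? x : y`.
    return x if x > y else y

# +0 and -0 compare equal, so the second operand is always returned:
print(math.copysign(1.0, maxss(0.0, -0.0)))  # -1.0: the result is -0
print(math.copysign(1.0, maxss(-0.0, 0.0)))  # 1.0: the result is +0
```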

## Not a number

What if one of the operands is a NaN? Since many other IEEE primitives like
addition and subtraction always produce a NaN when one of the operands is a NaN,
it would make sense for min and max to do the same. Indeed, ARM has an `fmax`
instruction which does that:

```
def fmax(x, y):
    if x > y:
        return x
    if x < y:
        return y
    if x == y:
        return bitand(x, y)
    return NaN
```

IEEE, however, treats NaN as a missing value for the purposes of the `minNum`
and `maxNum` functions. They will suppress a single NaN operand and return the
number instead:

```
def maxNum(x, y):
    if x > y:
        return x
    if x < y:
        return y
    if x == y:
        return oneOf(x, y)
    if not isNaN(x):
        return x
    if not isNaN(y):
        return y
    return NaN
```
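The same behavior can be sketched as runnable Python for quiet NaNs (signaling NaNs are not modeled, since Python offers no portable way to construct one; `max_num` is my name for the sketch):

```python
import math

def max_num(x, y):
    # Sketch of IEEE 754-2008 maxNum for quiet NaNs: a single NaN
    # operand is suppressed and the number is returned instead.
    if math.isnan(x):
        return y  # still NaN if both operands are NaN
    if math.isnan(y):
        return x
    if x == y:
        # Resolve max_num(+0, -0) the ARM/MIPS way: treat -0 < +0.
        return y if math.copysign(1.0, x) < 0 else x
    return x if x > y else y

print(max_num(1.0, float("nan")))   # 1.0
print(max_num(float("nan"), 1.0))   # 1.0
```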

The rationale for this behavior is not completely clear to me. This is from Prof. Kahan’s notes:

> Some familiar functions have yet to be defined for NaN. For instance
> `max{x, y}` should deliver the same result as `max{y, x}`, but almost no
> implementations do that when `x` is NaN. There are good reasons to define
> `max{NaN, 5} := max{5, NaN} := 5` though many would disagree.

There is further discussion of this choice in the comments on a Julia language issue.

## Signaling NaNs

There are two types of NaNs: *quiet* and *signaling* NaNs. Quiet NaNs can be
produced by invalid operations like `0 / 0` or `∞ - ∞`, but signaling NaNs are
not produced by normal arithmetic operations, although they can be propagated by
certain functions like `negate` and `copySign` that only manipulate the sign
bit.

All other operations are invalid with a signaling NaN operand, causing them to
produce a quiet NaN result. This includes the `minNum` and `maxNum` functions:

```
def maxNum(x, y):
    if isSignaling(x) or isSignaling(y):
        return NaN
    if x > y:
        return x
    if x < y:
        return y
    if x == y:
        return oneOf(x, y)
    if not isNaN(x):
        return x
    if not isNaN(y):
        return y
    return NaN
```

This means that these functions suppress quiet NaNs while signaling NaNs are converted to a quiet NaN:

```
>>> maxNum(1, NaN)
1
>>> maxNum(1, sNaN)
NaN
```

The conversion of signaling NaNs to quiet NaNs as they are propagated has the unfortunate effect of making these functions non-associative:

```
>>> maxNum(1, maxNum(sNaN, 2))
1
>>> maxNum(maxNum(1, sNaN), 2)
2
```

It is quite unfortunate that computing the maximum of a set containing the value
2 can produce the result 1. It seems that the answer should be either 2 or NaN.
This bad behavior when `maxNum` calls are chained appears because:

- Signaling NaNs are propagated as quiet NaNs.
- Quiet NaNs are suppressed rather than propagated.

Signaling NaNs also latch an *invalid operation* floating-point exception, but
very few programs use the floating-point status flags. Trapping on
floating-point exceptions is almost always disabled.

The ARMv8 instruction set has an `fmaxnmv` instruction which computes the IEEE
`maxNum` function across the lanes of a SIMD vector:

```
def fmaxnmv(v):
    return maxNum(maxNum(v[0], v[1]), maxNum(v[2], v[3]))
```

Since the underlying function is non-associative, we can get different results simply by permuting the lanes of the SIMD vector:

```
>>> fmaxnmv([sNaN, 1, 2, 3])
3
>>> fmaxnmv([1, 2, 3, sNaN])
2
```

This sets the *invalid operation* bit in the `FPSR` floating-point status
register as the only indication that something might be wrong.

## Mitigation

So how can we compute the min or max of a set of floating-point numbers without
losing our sanity? There are a couple of options:

- Use min and max functions that propagate all NaN operands, like the ARM
  `fmax` instruction described above. If any member of the set is a NaN (quiet
  or signaling), the result will be NaN.
- Use min and max functions that suppress all NaN operands, quiet or signaling.
  This is valid to do in a C implementation since the C11 specification leaves
  the handling of signaling NaNs implementation-defined. On OS X, this is how
  the `<math.h>` `fmin()` and `fmax()` functions behave.
- Detect *invalid operation* floating-point exceptions. Signaling NaN operands
  cause operations to latch an invalid operation exception and return NaN. If
  such an exception was latched while computing the maximum element of a set,
  the result can’t be trusted.
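The first option can be sketched in Python (`fmax_propagate` is my name for an ARM-`fmax`-style reduction; again only quiet NaNs are modeled):

```python
import math
from functools import reduce

def fmax_propagate(x, y):
    # ARM fmax-style maximum: any NaN operand makes the result NaN.
    if math.isnan(x) or math.isnan(y):
        return float("nan")
    if x == y:
        # Order the zeros as -0 < +0.
        return x if math.copysign(1.0, x) > 0 else y
    return x if x > y else y

# Propagation makes the reduction insensitive to operand order:
print(reduce(fmax_propagate, [1.0, float("nan"), 2.0]))  # nan
print(reduce(fmax_propagate, [2.0, 1.0, float("nan")]))  # nan
print(reduce(fmax_propagate, [1.0, 3.0, 2.0]))           # 3.0
```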