4.2.4 Floating-Point Operations

Java provides a number of operators that act on floating-point values:

Other useful constructors, methods, and constants are predefined in the classes Float (§20.9), Double (§20.10), and Math (§20.11).

If at least one of the operands to a binary operator is of floating-point type, then the operation is a floating-point operation, even if the other is integral.

If at least one of the operands to a numerical operator is of type double, then the operation is carried out using 64-bit floating-point arithmetic, and the result of the numerical operator is a value of type double. (If the other operand is not a double, it is first widened to type double by numeric promotion (§5.6).) Otherwise, the operation is carried out using 32-bit floating-point arithmetic, and the result of the numerical operator is a value of type float. If the other operand is not a float, it is first widened to type float by numeric promotion.

Operators on floating-point numbers behave exactly as specified by IEEE 754. In particular, Java requires support of IEEE 754 denormalized floating-point numbers and gradual underflow, which make it easier to prove desirable properties of particular numerical algorithms. Floating-point operations in Java do not "flush to zero" if the calculated result is a denormalized number.

Java requires that floating-point arithmetic behave as if every floating-point operator rounded its floating-point result to the result precision. Inexact results must be rounded to the representable value nearest to the infinitely precise result; if the two nearest representable values are equally near, the one with its least significant bit zero is chosen. This is the IEEE 754 standard's default rounding mode known as round to nearest.

Java uses round toward zero when converting a floating value to an integer (§5.1.3), which acts, in this case, as though the number were truncated, discarding the mantissa bits. Rounding toward zero chooses at its result the format's value closest to and no greater in magnitude than the infinitely precise result.

Java floating-point operators produce no exceptions (§11). An operation that overflows produces a signed infinity, an operation that underflows produces a signed zero, and an operation that has no mathematically definite result produces NaN. All numeric operations with NaN as an operand produce NaN as a result. As has already been described, NaN is unordered, so a numeric comparison operation involving one or two NaNs returns false and any != comparison involving NaN returns true, including x!=x when x is NaN.

The example program:


class Test {

	public static void main(String[] args) {

		// An example of overflow:
		double d = 1e308;
		System.out.print("overflow produces infinity: ");
		System.out.println(d + "*10==" + d*10);

		// An example of gradual underflow:
		d = 1e-305 * Math.PI;
		System.out.print("gradual underflow: " + d + "\n      ");
		for (int i = 0; i < 4; i++)
			System.out.print(" " + (d /= 100000));
		System.out.println();

		// An example of NaN:
		System.out.print("0.0/0.0 is Not-a-Number: ");
		d = 0.0/0.0;
		System.out.println(d);

		// An example of inexact results and rounding:
		System.out.print("inexact results with float:");
		for (int i = 0; i < 100; i++) {
			float z = 1.0f / i;
			if (z * i != 1.0f)
				System.out.print(" " + i);
		}
		System.out.println();

		// Another example of inexact results and rounding:
		System.out.print("inexact results with double:");
		for (int i = 0; i < 100; i++) {
			double z = 1.0 / i;
			if (z * i != 1.0)
				System.out.print(" " + i);
		}
		System.out.println();

		// An example of cast to integer rounding:
		System.out.print("cast to int rounds toward 0: ");
		d = 12345.6;
		System.out.println((int)d + " " + (int)(-d));
	}
}

produces the output:


overflow produces infinity: 1.0e+308*10==Infinity
gradual underflow: 3.141592653589793E-305
	3.1415926535898E-310 3.141592653E-315 3.142E-320 0.0
0.0/0.0 is Not-a-Number: NaN
inexact results with float: 0 41 47 55 61 82 83 94 97
inexact results with double: 0 49 98
cast to int rounds toward 0: 12345 -12345

This example demonstrates, among other things, that gradual underflow can result in a gradual loss of precision.

The inexact results when i is 0 involve division by zero, so that z becomes positive infinity, and z * 0 is NaN, which is not equal to 1.0.