New functions in the Java Math class: floating-point numbers

Java Language Specification version 5 added 10 new methods to java.lang.Math and java.lang.StrictMath, and Java 6 added 10 more. This two-part series looks at these new mathematical methods. Part 1 covered the functions that would be familiar to mathematicians; in Part 2, I focus on the functions designed for manipulating floating-point numbers rather than abstract real numbers.

As I mentioned, the difference between a real number (such as e or 0.2) and its computer representation (such as a Java double) is important. The ideal number has infinite precision, but a Java number is represented in a fixed number of bits (32 for a float, 64 for a double). The maximum value of a float is about 3.4 * 10^38, which is not large enough to represent some quantities, such as the number of electrons in the universe.

The maximum value of a double is about 1.8 * 10^308, which covers almost any physical quantity. When it comes to abstract mathematical quantities, though, it is easy to exceed this range. For example, the factorial 171! (171 * 170 * 169 * 168 * ... * 1) is larger than the largest double, and a float cannot even hold 35!. Very small numbers (values close to 0) cause trouble too, and calculations that mix very large and very small numbers are especially dangerous.
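As a quick illustration (a sketch of my own, not a listing from the article), the following loop builds a factorial in double arithmetic and reports where it overflows to infinity:

public class FactorialOverflow {
    public static void main(String[] args) {
        double factorial = 1.0;
        for (int n = 1; n <= 200; n++) {
            factorial *= n;
            if (Double.isInfinite(factorial)) {
                // 170! still fits in a double; multiplying by 171 overflows to Infinity
                System.out.println(n + "! overflows a double");
                break;
            }
        }
    }
}

Running it prints "171! overflows a double"; switching the accumulator to float moves the overflow down to 35!.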

To deal with these problems, the IEEE 754 standard for floating-point arithmetic adds the special values Infinity and NaN, which stand for infinity and "not a number" respectively. IEEE 754 also defines positive and negative zero (in ordinary mathematics zero has no sign, but in computer arithmetic it can be positive or negative). These values upset some traditional rules. For example, the law of trichotomy no longer holds when NaN is involved: NaN is not less than, greater than, or equal to any value, not even itself, so x == x can be false.
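A few lines of Java make these special values concrete (a small sketch of my own, illustrating the behavior just described):

public class SpecialValues {
    public static void main(String[] args) {
        double nan = 0.0 / 0.0;                                     // NaN, no exception thrown
        System.out.println(nan == nan);                             // false: NaN is not equal even to itself
        System.out.println(nan < 1.0 || nan > 1.0 || nan == 1.0);   // false: trichotomy fails

        System.out.println(1.0 / 0.0);                              // Infinity
        System.out.println(0.0 == -0.0);                            // true: the two zeros compare equal...
        System.out.println(1.0 / 0.0 == 1.0 / -0.0);                // false: ...but dividing by them does not
    }
}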

Beyond the sheer size of numbers, precision is an even more practical problem. Consider a common loop that repeatedly adds 0.1: after one hundred additions, the result is not 10 but 9.99999999999998.
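A loop along these lines shows the effect (a minimal sketch, assuming a plain accumulation of 0.1):

public class PrecisionDrift {
    public static void main(String[] args) {
        double sum = 0.0;
        for (int i = 0; i < 100; i++) {
            sum += 0.1;            // 0.1 has no exact binary representation
        }
        System.out.println(sum);   // prints 9.99999999999998, not 10.0
    }
}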

For simple applications, you can usually let java.text.DecimalFormat round the final output to the nearest integer, and all is well. In scientific and engineering applications, where you cannot assume the result ought to be an integer, you need to be more careful. You need to be especially careful when subtracting nearly equal large numbers to obtain a much smaller one, and when dividing by a very small number. These operations can turn a tiny error into a large one and seriously affect real applications: the small rounding errors introduced by limited-precision floating-point numbers can badly distort calculations that are mathematically exact.
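A short sketch of my own illustrates both points: DecimalFormat happily rounds the drifted sum from the loop above back to 10, while subtracting two nearly equal numbers exposes the representation error:

import java.text.DecimalFormat;

public class RoundingAndCancellation {
    public static void main(String[] args) {
        // Formatting hides the drift from the 0.1 loop
        System.out.println(new DecimalFormat("0").format(9.99999999999998));  // prints 10

        // Catastrophic cancellation: subtracting nearly equal numbers
        double a = 1000000.000001;
        double b = 1000000.0;
        System.out.println(a - b);  // close to 1.0E-6 but not exactly; the tiny error
                                    // in representing a is now a large relative error
    }
}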

An IEEE 754 float, as implemented by Java, has 32 bits. The first bit is the sign bit: 0 for positive, 1 for negative. The next 8 bits hold the exponent, whose value ranges from -126 to +127. The final 23 bits hold the mantissa (sometimes called the significand), which, together with an implicit leading bit, gives 24 bits of precision. Put together, a float is interpreted as sign * mantissa * 2^exponent.

Keen readers may have noticed that these numbers don't quite add up. First, the 8 bits of the exponent should cover -128 to 127, just like a signed byte. In fact the exponents are stored with an offset of 127: you subtract 127 from the unsigned value of the bits (0 to 255) to get the true exponent (-127 to 128). However, 128 and -127 are special values. When the exponent bits are all 1s (128), the number is Infinity, -Infinity, or NaN; you have to check the mantissa to tell which. When the exponent bits are all 0s (-127), the number is subnormal (more on that in a moment), and its effective exponent is -126.

The mantissa is normally read as an unsigned integer, which is simple enough. Twenty-three bits can hold values from 0 to 2^23 - 1, that is, 8,388,607. But wait: didn't I just say the mantissa has 24 bits of precision, that is, values up to 2^24 - 1, or 16,777,215? Where does the extra bit come from? It turns out the exponent tells you what the first bit is. If the exponent bits are all 0s, the first mantissa bit is 0; otherwise it is 1. Because you always know what the first bit is, it doesn't have to be stored, and you get one extra bit for free. Weird, isn't it?

Floats whose leading mantissa bit is 1 are normal numbers; read as a binary fraction, the mantissa's value lies between 1 and 2. Floats whose leading mantissa bit is 0 are subnormal; they can represent much smaller magnitudes, even though the exponent is pinned at -126. Doubles are encoded the same way, except with a 52-bit mantissa and an 11-bit exponent for greater precision; the exponent offset for doubles is 1023.

The two getExponent() methods added in Java 6 return the unbiased exponent used in the representation of a float or a double. For floats this ranges from -126 to +127, and for doubles from -1022 to +1023 (Infinity and NaN give +128 and +1024 respectively). For example, Listing 1 compares the result of Math.getExponent() with the more usual way of computing a base-2 logarithm, Math.log(x)/Math.log(2):

public static void main(String[] args) {
    System.out.println("x\tlg(x)\tMath.getExponent(x)");
    for (int i = -255; i < 256; i++) {
        double x = Math.pow(2, i);
        System.out.println(x + "\t" + lg(x) + "\t" + Math.getExponent(x));
    }
}

public static double lg(double x) {
    return Math.log(x) / Math.log(2);
}

For some values that are subject to rounding, Math.getExponent() is more accurate than the log-based calculation, and if you perform a lot of these calculations it is also faster. Note, however, that it only works for exact powers of 2. Change the loop to powers of 3, for example, and the two columns no longer agree, because getExponent() ignores the mantissa entirely; that part is handled by Math.log(). With a few extra steps you can extract the mantissa, take its logarithm, and add it to the exponent, but it's a bit laborious.
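Here is one way that might look (a sketch of my own, valid only for positive, normal values of x; the method name lg2 is illustrative, not from the article):

public static double lg2(double x) {
    int exponent = Math.getExponent(x);
    double mantissa = x / Math.pow(2, exponent);   // in [1, 2) for positive, normal x
    return exponent + Math.log(mantissa) / Math.log(2);
}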
Math.getExponent() is useful when you want a quick estimate of an order of magnitude rather than an exact value. Unlike Math.log(), Math.getExponent() never returns NaN or infinity. If the argument is NaN or infinite, the result is 128 for a float and 1024 for a double. If the argument is zero, the result is -127 for a float and -1023 for a double. If the argument is negative, its exponent is the same as the exponent of its absolute value; for example, the exponent of -8 is 3, the same as the exponent of 8.

There is no corresponding getMantissa() method, but you can build one with a little arithmetic. Less obviously, you can also recover the mantissa by bit masking: to extract the stored bits, just compute Double.doubleToLongBits(x) & 0x000FFFFFFFFFFFFFL. You then have to account for the implicit leading 1 bit of normal numbers and convert the result back into a floating-point value in the range 1 to 2.

Real numbers are dense: between any two distinct real numbers there is always another real number. Floating-point numbers are not like that. For a given float or double there is a next float or double, and there is a minimum finite distance between consecutive values. The Math.nextUp() method returns the nearest floating-point number greater than its argument. For example, Listing 2 prints all the floats between 1.0 and 2.0:

public static void main(String[] args) {
    float x = 1.0F;
    int numFloats = 0;
    while (x <= 2.0) {
        numFloats++;
        System.out.println(x);
        x = Math.nextUp(x);
    }
    System.out.println(numFloats);
}

It turns out there are 8,388,609 floats between 1.0 and 2.0; a lot, but hardly the infinitely many real numbers in the same interval. Adjacent values are about 0.0000001 apart. That distance is called an ULP, short for "unit of least precision" or "unit in the last place".

If you need to go the other way and find the nearest floating-point number smaller than a given number, use Math.nextAfter() instead. The second argument specifies whether to look above or below the first: if direction is greater than start, nextAfter() returns the next number above start; if direction is less than start, it returns the next number below start; if direction equals start, it returns start itself.

These methods are useful in some kinds of modeling and graphics. Numerically you might want to sample values at 10,000 points between a and b, but if you only have enough precision to distinguish 1,000 separate points between a and b, nine tenths of the work is redundant; you could do a tenth of the work and get the same result. Of course, if you really do need the extra precision, you can choose a more precise type such as double or BigDecimal. I have seen this, for example, in a Mandelbrot set explorer where you can zoom in until the graph falls between the two nearest doubles. The Mandelbrot set is subtle and complex at every scale, but a float or double loses the ability to distinguish adjacent points before you reach that level of detail.

Math.ulp() returns the distance from a number to its nearest neighbor.
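At 1.0, for example, the pieces fit together like this (a small sketch of my own; the comments show the values Java prints):

public class UlpDemo {
    public static void main(String[] args) {
        float x = 1.0F;
        System.out.println(Math.nextUp(x));           // 1.0000001, the next float above 1.0
        System.out.println(Math.nextAfter(x, 0.0));   // 0.99999994, the next float below 1.0
        System.out.println(Math.ulp(x));              // 1.1920929E-7
        System.out.println(Math.nextUp(x) - x);       // also 1.1920929E-7: one ULP above 1.0
    }
}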
Listing 3 lists the ULPs of various powers of 2:

public static void main(String[] args) {
    for (float x = 1.0F; x <= Float.MAX_VALUE; x *= 2.0F) {
        System.out.println(Math.getExponent(x) + "\t" + x + "\t" + Math.ulp(x));
    }
}

The output shows that floats are extremely accurate for small powers of 2. In many applications, though, this accuracy becomes a problem once values reach about 2^20. Near the upper limit of the float range, adjacent values are separated by enormous gaps, on the order of 10^31 (there is hardly an everyday word for a number that size).

As Listing 3 shows, the size of an ULP is not fixed; as numbers get larger, the floats between them get sparser. For example, there are only 1,025 floats between 10,000 and 10,001, about 0.001 apart. There are only 17 floats between 1,000,000 and 1,000,001, about 0.06 apart. Precision is inversely proportional to magnitude. By the time a float reaches roughly ten million, its ULP has grown to 1.0, and beyond that point multiple integer values map onto the same float. For doubles this doesn't happen until about 4.5e15, but it is still a concern.

The limited precision of floating-point numbers leads to a surprising result: beyond a certain point, x + 1 == x is true. For example, this simple-looking loop is actually infinite:

for (float x = 16777213f; x < 16777218f; x += 1.0f) {
    System.out.println(x);
}

The loop gets stuck at a fixed point, exactly 16,777,216. That number is 2^24, and at that magnitude the ULP (2.0) is larger than the increment (1.0).

Math.ulp() has a practical use in testing. Obviously you don't normally compare two floating-point numbers for exact equality; instead you check that they are equal within some tolerance. For example, in JUnit you compare an expected and an actual floating-point value with a call such as assertEquals(expectedValue, actualValue, 0.02), which asserts that the actual value is within 0.02 of the expected one. But is 0.02 a reasonable tolerance? If the expected value is 10.5 or -107.82, it is probably fine; if the expected value is in the billions, 0.02 is indistinguishable from zero. What you usually want to test is the relative error, measured in ULPs. The tolerance is typically chosen somewhere between 1 and 10 ULPs, depending on how accurate the calculation needs to be. For example, writing assertEquals(expectedValue, actualValue, 5 * Math.ulp(expectedValue)) requires the actual result to be within 5 ULPs of the true value; depending on the expected value, that tolerance might be vanishingly small or it might be in the millions.

scalb

Math.scalb(x, y) multiplies x by 2^y; scalb is short for "scale binary". For example, Math.scalb(3, 4) returns 3 * 2^4, that is 3 * 16, or 48.0. You can also use Math.scalb() to implement getMantissa(), as sketched below. What is the difference between Math.scalb(x, scaleFactor) and x * Math.pow(2, scaleFactor)? In the result, none: for any input they return exactly the same value.
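A sketch of how that getMantissa() might look, together with a quick check of the claim that scalb and pow-based scaling agree (my own illustration; the helper name and test values are assumptions, not from the article):

public static double getMantissa(double x) {
    // Divide x by 2^getExponent(x); the result is in [1, 2) for normal x
    return Math.scalb(x, -Math.getExponent(x));
}

public static void main(String[] args) {
    System.out.println(getMantissa(48.0));  // 1.5, because 48 = 1.5 * 2^5
    for (int n = -5; n <= 5; n++) {
        double viaScalb = Math.scalb(3.0, n);
        double viaPow = 3.0 * Math.pow(2, n);
        System.out.println(n + "\t" + viaScalb + "\t" + (viaScalb == viaPow));  // true every time
    }
}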
The difference is performance. Math.pow() performs notoriously poorly. It has to be able to handle genuinely strange cases, such as raising 3.14 to the power -0.078, and it typically picks a completely inappropriate algorithm for small integer powers such as 2 and 3 (or base 2, which can be special-cased). I worry that this hurts overall performance. Some compilers and VMs are smarter: some optimizers recognize x * Math.pow(2, y) as a special case and convert it to Math.scalb(x, y) or something equivalent, so there is no performance penalty at all. However, I can assure you that some VMs are not that smart. For example, testing with Apple's Java 6 VM, Math.scalb() was almost always about two orders of magnitude faster than x * Math.pow(2, y). Usually this makes no difference, of course; but in special cases, such as performing millions of exponentiations, it is worth considering whether they can be converted to use Math.scalb().

Copysign

The Math.copySign() method sets the sign of its first argument to the sign of its second argument. The simplest implementation would look something like Listing 4:

public static double copySign(double magnitude, double sign) {
    if (sign < 0) {
        if (magnitude < 0) return magnitude;
        else return -magnitude;
    }
    if (magnitude < 0) return -magnitude;
    return magnitude;
}

The real implementation, however, lives in sun.misc.FpUtils and works on the raw bits. Look closely at those bits and you can see that a NaN's sign is treated as positive. Strictly speaking, Math.copySign() does not guarantee that; the guarantee comes from StrictMath.copySign().
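A quick check (my own sketch) shows why working on the bits matters: a naive (sign < 0) test treats -0.0 as positive, while Math.copySign() honors its sign bit:

public class CopySignDemo {
    public static void main(String[] args) {
        System.out.println(Math.copySign(3.0, -0.0));  // -3.0: the sign bit of -0.0 is set
        System.out.println(-0.0 < 0);                  // false: so the naive test would return +3.0
    }
}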
