Java – how to deal with underflow in scientific computing?
I'm working on probabilistic models. When doing inference on these models, the estimated probabilities can become very small. To avoid underflow, I am currently working in the log domain (I store the logs of the probabilities). Multiplying probabilities then becomes an addition, and summing them is done with the formula:
log(exp(a) + exp(b)) = log(exp(a - m) + exp(b - m)) + m
where m = max(a, b).
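The formula above can be sketched in Java as follows. This is only an illustration: the class and method names (LogSumExp, logAdd) are made up for this example, and the code uses the equivalent form m + log1p(exp(-|a - b|)), which needs a single exp call per pair.

```java
final class LogSumExp {
    private LogSumExp() {}

    /** Returns log(exp(a) + exp(b)) for two log-domain values a and b. */
    static double logAdd(double a, double b) {
        double m = Math.max(a, b);
        // Both inputs represent log(0), so the sum is log(0) as well.
        if (m == Double.NEGATIVE_INFINITY) {
            return Double.NEGATIVE_INFINITY;
        }
        // exp(-|a - b|) lies in [0, 1], so it can neither overflow nor
        // produce a sum that underflows to zero.
        return m + Math.log1p(Math.exp(-Math.abs(a - b)));
    }
}
```

For example, logAdd(-1000.0, -1000.0) stays finite (about -1000 + log 2), whereas computing Math.log(Math.exp(-1000.0) + Math.exp(-1000.0)) directly gives negative infinity.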
I use some very large matrices, and I have to take the element-wise exponential of these matrices to compute matrix-vector multiplications. This step is quite expensive, and I was wondering whether there are other ways to deal with underflow when working with probabilities.
Edit: For efficiency reasons, I'm looking for a solution that uses primitive types rather than objects storing arbitrary-precision representations of real numbers.
Edit 2: I'm looking for a faster solution than the log-domain trick, not a more accurate one. I'm happy with the accuracy I currently get, but I need a faster method. In particular, the summations happen during matrix-vector multiplications, and I would like to be able to use efficient BLAS routines.
Solution: After a discussion with Jonathan Dursi, I decided to factorize each matrix and vector by its largest element and to store that factor in the log domain. Multiplications are then straightforward. Before an addition, I have to rescale one of the two matrices/vectors by the ratio of the two factors. I update the factor every ten operations.
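A minimal sketch of how such a scaled representation might look in Java is given below. It is one possible reading of the scheme described above, not the asker's actual code: the class ScaledVector and its methods are hypothetical, the matrix is assumed to be already rescaled with its own log factor passed in as aLogScale, and the plain loops stand in for whatever BLAS routine would be used in practice. For simplicity it renormalizes after every operation rather than every ten.

```java
final class ScaledVector {
    final double[] values;   // entries rescaled so the largest magnitude is 1
    final double logScale;   // log of the factor that was divided out

    ScaledVector(double[] values, double logScale) {
        this.values = values;
        this.logScale = logScale;
    }

    /** Divides out the largest magnitude and folds it into the log factor. */
    static ScaledVector normalize(double[] raw, double logScale) {
        double max = 0.0;
        for (double v : raw) {
            max = Math.max(max, Math.abs(v));
        }
        if (max == 0.0) {
            return new ScaledVector(raw.clone(), logScale); // all zeros
        }
        double[] scaled = new double[raw.length];
        for (int i = 0; i < raw.length; i++) {
            scaled[i] = raw[i] / max;
        }
        return new ScaledVector(scaled, logScale + Math.log(max));
    }

    /** y = A x: multiply the rescaled entries, add the log factors. */
    static ScaledVector multiply(double[][] a, double aLogScale, ScaledVector x) {
        double[] y = new double[a.length];
        for (int i = 0; i < a.length; i++) {
            double s = 0.0;
            for (int j = 0; j < x.values.length; j++) {
                s += a[i][j] * x.values[j];
            }
            y[i] = s;
        }
        return normalize(y, aLogScale + x.logScale);
    }

    /** x + y: rescale the operand with the smaller factor by the ratio of factors. */
    static ScaledVector add(ScaledVector x, ScaledVector y) {
        ScaledVector big = x.logScale >= y.logScale ? x : y;
        ScaledVector small = (big == x) ? y : x;
        double ratio = Math.exp(small.logScale - big.logScale); // in [0, 1]
        double[] sum = new double[big.values.length];
        for (int i = 0; i < sum.length; i++) {
            sum[i] = big.values[i] + ratio * small.values[i];
        }
        return normalize(sum, big.logScale);
    }
}
```

The point of the design is that exp and log are called once per vector rather than once per element, so the inner loops work on ordinary doubles.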
Solution
This issue also came up recently on the Computational Science Stack Exchange site. Although the immediate worry there was overflow, the issues are more or less the same.
Transforming into log space is certainly one reasonable approach. Whatever space you are in, there are a couple of methods you can use to improve the accuracy of your summations when you need to do a large number of sums correctly. Compensated summation approaches, most famously Kahan summation, keep both a sum and what is effectively a "remainder"; this gives you some of the advantages of higher-precision arithmetic without all of the cost (and using only primitive types). The remainder term also gives you some indication of how well you are doing.
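As an illustration, Kahan summation over a double array could be written roughly like this. The class name KahanSum and its methods are invented for this sketch; only primitive doubles are used.

```java
final class KahanSum {
    private double sum = 0.0;
    private double compensation = 0.0; // running estimate of the lost low-order bits

    /** Adds one term, carrying the rounding error of the previous additions. */
    void add(double value) {
        double y = value - compensation; // correct the incoming term
        double t = sum + y;              // low-order bits of y may be lost here
        compensation = (t - sum) - y;    // recover what was lost (negated)
        sum = t;
    }

    double value() {
        return sum;
    }

    /** Convenience method: compensated sum of all entries. */
    static double sum(double[] values) {
        KahanSum acc = new KahanSum();
        for (double v : values) {
            acc.add(v);
        }
        return acc.value();
    }
}
```

A case where it matters: summing 1.0 followed by ten terms of 1e-16 with a plain loop returns exactly 1.0, because each small term is rounded away, while the compensated sum stays close to 1.000000000000001.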
In addition to improving the actual mechanics of your addition, changing the order in which you add your terms can make a big difference. Sorting your terms so that you sum from smallest to largest can help, because you are then less often adding terms of very different magnitude (which can cause significant roundoff problems). In some cases, doing log2 N rounds of pairwise sums can also be an improvement over a straight linear sum, depending on what your terms look like.
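Both reorderings can be sketched in a few lines of Java. The names below (OrderedSums, sortedSum, pairwiseSum) are hypothetical, and sortedSum assumes non-negative terms such as probabilities, so that ascending order is also smallest-to-largest in magnitude.

```java
import java.util.Arrays;

final class OrderedSums {
    private OrderedSums() {}

    /** Sums a copy of the terms in ascending order (assumes non-negative terms). */
    static double sortedSum(double[] values) {
        double[] copy = values.clone();
        Arrays.sort(copy);
        double s = 0.0;
        for (double v : copy) {
            s += v;
        }
        return s;
    }

    /** Recursively sums values[from..to) by halves, giving about log2(n) levels. */
    static double pairwiseSum(double[] values, int from, int to) {
        int n = to - from;
        if (n <= 0) {
            return 0.0;
        }
        if (n == 1) {
            return values[from];
        }
        int mid = from + n / 2;
        return pairwiseSum(values, from, mid) + pairwiseSum(values, mid, to);
    }
}
```

Sorting costs O(n log n) and a copy of the data, while the pairwise version needs no extra memory beyond the recursion stack, so which one is worthwhile depends on how often the sum is computed.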
The usefulness of all these approaches depends a lot on the properties of your data. Arbitrary-precision math libraries, while enormously expensive in compute time (and possibly memory), have the advantage of being a fairly general solution.