Does the Java optimizer remember calculated values?
Which fizzbuzz implementation is more efficient?
public static void fizzBuzz1(int n)
{
boolean fizzed,buzzed;
for(int i = 1; i <= n; i++)
{
fizzed = buzzed = false;
if(i % 3 == 0)
fizzed = true;
if(i % 5 == 0)
buzzed = true;
if(fizzed && buzzed)
System.out.println("FizzBuzz");
else if(fizzed)
System.out.println("Fizz");
else if(buzzed)
System.out.println("Buzz");
else
System.out.println(i);
}
}
public static void fizzBuzz2(int n)
{
for(int i = 1; i <= n; i++)
{
if(i % 3 == 0 && i % 5 == 0)
System.out.println("FizzBuzz");
else if(i % 3 == 0)
System.out.println("Fizz");
else if(i % 5 == 0)
System.out.println("Buzz");
else
System.out.println(i);
}
}
Solution
This time, some interesting turns were taken
First, I'll try to look at the assembly generated using the original method However, JIT has some inlining and optimization, including system out. Println call, so the generated assembly output is too large (for me) to be reasonably analyzed in a reasonable time
So I simplified the whole process so that I could focus on practical problems Finally, I ran the following program:
class Test04
{
public static void main(String args[])
{
long sum = 0;
for (int i=1000; i<12000; i++)
{
sum += fizzBuzz1(i);
sum += fizzBuzz2(i);
}
System.out.println(sum);
}
public static long fizzBuzz1(int n)
{
long sum = 0;
for(int i = 1; i <= n; i++)
{
sum += fizzBuzzCore1(i);
}
return sum;
}
public static long fizzBuzzCore1(int i)
{
boolean fizzed = false;
boolean buzzed = false;
if(i % 3 == 0)
fizzed = true;
if(i % 5 == 0)
buzzed = true;
if(fizzed && buzzed)
return 4;
else if(fizzed)
return 3;
else if(buzzed)
return 2;
else
return 1;
}
public static long fizzBuzz2(int n)
{
long sum = 0;
for(int i = 1; i <= n; i++)
{
sum += fizzBuzzCore2(i);
}
return sum;
}
public static long fizzBuzzCore2(int i)
{
if(i % 3 == 0 && i % 5 == 0)
return 4;
else if(i % 3 == 0)
return 3;
else if(i % 5 == 0)
return 2;
else
return 1;
}
}
The return value is designed to prevent him from completely optimizing the call and to extract the "core" method designed to keep the size of the component output that must be compared as small as possible
(Note: of course, these modifications may affect optimization. For example, a JIT method may have a limit on the number of bytecode instructions. Before it is considered too large to be inlined, - XX: maxinlinesize = 35, but the effects of the two methods are roughly the same, so the required information about practical problems can still be obtained
Moreover, it is not a surprise: after the last optimization, the assembly code of both methods contains the same instructions - here is the assembly of fizzbuzzcore1 for reference:
Decoding compiled method 0x00000000026c0090:
Code:
[Entry Point]
[Verified Entry Point]
[Constants]
# {method} {0x0000000057260528} 'fizzBuzzCore1' '(I)J' in 'Test04'
# parm0: rdx = int
# [sp+0x20] (sp of caller)
0x00000000026c01c0: sub $0x18,%rsp
0x00000000026c01c7: mov %rbp,0x10(%rsp) ;*synchronization entry
; - Test04::fizzBuzzCore1@-1 (line 27)
0x00000000026c01cc: movslq %edx,%r10
0x00000000026c01cf: mov %edx,%r11d
0x00000000026c01d2: sar $0x1f,%r11d ;*irem
; - Test04::fizzBuzzCore1@6 (line 29)
0x00000000026c01d6: imul $0x66666667,%r10,%r8
0x00000000026c01dd: imul $0x55555556,%r10
0x00000000026c01e4: sar $0x21,%r8
0x00000000026c01e8: sar $0x20,%r10
0x00000000026c01ec: mov %r8d,%r8d
0x00000000026c01ef: sub %r11d,%r8d ;*irem
; - Test04::fizzBuzzCore1@14 (line 31)
0x00000000026c01f2: mov %r10d,%r10d
0x00000000026c01f5: sub %r11d,%r10d ;*irem
; - Test04::fizzBuzzCore1@6 (line 29)
0x00000000026c01f8: mov %r8d,%r11d
0x00000000026c01fb: shl $0x2,%r11d
0x00000000026c01ff: add %r8d,%r11d ;*irem
; - Test04::fizzBuzzCore1@14 (line 31)
0x00000000026c0202: mov %r10d,%r9d
0x00000000026c0205: shl %r9d
0x00000000026c0208: add %r10d,%r9d ;*irem
; - Test04::fizzBuzzCore1@6 (line 29)
0x00000000026c020b: cmp %r9d,%edx
0x00000000026c020e: jne 0x00000000026c021c ;*ifeq
; - Test04::fizzBuzzCore1@21 (line 33)
0x00000000026c0210: cmp %r11d,%edx
0x00000000026c0213: jne 0x00000000026c021c ;*ifeq
; - Test04::fizzBuzzCore1@25 (line 33)
0x00000000026c0215: mov $0x4,%eax
0x00000000026c021a: jmp 0x00000000026c0239 ;*iload_1
; - Test04::fizzBuzzCore1@32 (line 35)
0x00000000026c021c: cmp %r9d,%edx
0x00000000026c021f: jne 0x00000000026c0228 ;*ifeq
; - Test04::fizzBuzzCore1@33 (line 35)
0x00000000026c0221: mov $0x3,%eax
0x00000000026c0226: jmp 0x00000000026c0239
0x00000000026c0228: cmp %r11d,%edx
0x00000000026c022b: jne 0x00000000026c0234 ;*ifeq
; - Test04::fizzBuzzCore1@41 (line 37)
0x00000000026c022d: mov $0x2,%eax
0x00000000026c0232: jmp 0x00000000026c0239
0x00000000026c0234: mov $0x1,%eax ;*irem
; - Test04::fizzBuzzCore1@6 (line 29)
0x00000000026c0239: add $0x10,%rsp
0x00000000026c023d: pop %rbp
0x00000000026c023e: test %eax,-0x2470244(%rip) # 0x0000000000250000
; {poll_return}
0x00000000026c0244: retq
0x00000000026c0245: hlt
0x00000000026c0246: hlt
0x00000000026c0247: hlt
0x00000000026c0248: hlt
0x00000000026c0249: hlt
0x00000000026c024a: hlt
0x00000000026c024b: hlt
0x00000000026c024c: hlt
0x00000000026c024d: hlt
0x00000000026c024e: hlt
0x00000000026c024f: hlt
0x00000000026c0250: hlt
0x00000000026c0251: hlt
0x00000000026c0252: hlt
0x00000000026c0253: hlt
0x00000000026c0254: hlt
0x00000000026c0255: hlt
0x00000000026c0256: hlt
0x00000000026c0257: hlt
0x00000000026c0258: hlt
0x00000000026c0259: hlt
0x00000000026c025a: hlt
0x00000000026c025b: hlt
0x00000000026c025c: hlt
0x00000000026c025d: hlt
0x00000000026c025e: hlt
0x00000000026c025f: hlt
[Exception Handler]
[Stub Code]
0x00000000026c0260: jmpq 0x000000000261c560 ; {no_reloc}
[Deopt Handler Code]
0x00000000026c0265: callq 0x00000000026c026a
0x00000000026c026a: subq $0x5,(%rsp)
0x00000000026c026f: jmpq 0x00000000025f6f40 ; {runtime_call}
0x00000000026c0274: hlt
0x00000000026c0275: hlt
0x00000000026c0276: hlt
0x00000000026c0277: hlt
But
However, it is surprising that it does not calculate modular operations at all!
At least, there is no clear explanation: there is no IDIV instruction in this code! So JIT really tries to avoid costly differences by doing some annoying, annoying and bitter hacker attacks: instructions
0x00000000026c01d6: imul $0x66666667,%r10 (and following...)
It is a "no division of labor" implementation of a department For example, this method
private static int divideBy3(int n)
{
long r10 = n;
r10 *= 0x55555556L;
r10 >>>= 0x20;
long r10d = r10 & 0xFFFFFFFFL;
return (int)r10d;
}
Use these magic constants and shift to calculate divide by 3 (similarly, for 5 divided by another constant) I haven't done mathematical calculations myself, but I can find an explanation on how to get modular operations from the "integer division by constants" document from hacker's right on page 32 of the "integer division by constants"
