Does the Java optimizer remember calculated values?
Which fizzbuzz implementation is more efficient?
public static void fizzBuzz1(int n) { boolean fizzed,buzzed; for(int i = 1; i <= n; i++) { fizzed = buzzed = false; if(i % 3 == 0) fizzed = true; if(i % 5 == 0) buzzed = true; if(fizzed && buzzed) System.out.println("FizzBuzz"); else if(fizzed) System.out.println("Fizz"); else if(buzzed) System.out.println("Buzz"); else System.out.println(i); } } public static void fizzBuzz2(int n) { for(int i = 1; i <= n; i++) { if(i % 3 == 0 && i % 5 == 0) System.out.println("FizzBuzz"); else if(i % 3 == 0) System.out.println("Fizz"); else if(i % 5 == 0) System.out.println("Buzz"); else System.out.println(i); } }
Solution
This time, some interesting turns were taken
First, I'll try to look at the assembly generated using the original method However, JIT has some inlining and optimization, including system out. Println call, so the generated assembly output is too large (for me) to be reasonably analyzed in a reasonable time
So I simplified the whole process so that I could focus on practical problems Finally, I ran the following program:
class Test04 { public static void main(String args[]) { long sum = 0; for (int i=1000; i<12000; i++) { sum += fizzBuzz1(i); sum += fizzBuzz2(i); } System.out.println(sum); } public static long fizzBuzz1(int n) { long sum = 0; for(int i = 1; i <= n; i++) { sum += fizzBuzzCore1(i); } return sum; } public static long fizzBuzzCore1(int i) { boolean fizzed = false; boolean buzzed = false; if(i % 3 == 0) fizzed = true; if(i % 5 == 0) buzzed = true; if(fizzed && buzzed) return 4; else if(fizzed) return 3; else if(buzzed) return 2; else return 1; } public static long fizzBuzz2(int n) { long sum = 0; for(int i = 1; i <= n; i++) { sum += fizzBuzzCore2(i); } return sum; } public static long fizzBuzzCore2(int i) { if(i % 3 == 0 && i % 5 == 0) return 4; else if(i % 3 == 0) return 3; else if(i % 5 == 0) return 2; else return 1; } }
The return value is designed to prevent him from completely optimizing the call and to extract the "core" method designed to keep the size of the component output that must be compared as small as possible
(Note: of course, these modifications may affect optimization. For example, a JIT method may have a limit on the number of bytecode instructions. Before it is considered too large to be inlined, - XX: maxinlinesize = 35, but the effects of the two methods are roughly the same, so the required information about practical problems can still be obtained
Moreover, it is not a surprise: after the last optimization, the assembly code of both methods contains the same instructions - here is the assembly of fizzbuzzcore1 for reference:
Decoding compiled method 0x00000000026c0090: Code: [Entry Point] [Verified Entry Point] [Constants] # {method} {0x0000000057260528} 'fizzBuzzCore1' '(I)J' in 'Test04' # parm0: rdx = int # [sp+0x20] (sp of caller) 0x00000000026c01c0: sub $0x18,%rsp 0x00000000026c01c7: mov %rbp,0x10(%rsp) ;*synchronization entry ; - Test04::fizzBuzzCore1@-1 (line 27) 0x00000000026c01cc: movslq %edx,%r10 0x00000000026c01cf: mov %edx,%r11d 0x00000000026c01d2: sar $0x1f,%r11d ;*irem ; - Test04::fizzBuzzCore1@6 (line 29) 0x00000000026c01d6: imul $0x66666667,%r10,%r8 0x00000000026c01dd: imul $0x55555556,%r10 0x00000000026c01e4: sar $0x21,%r8 0x00000000026c01e8: sar $0x20,%r10 0x00000000026c01ec: mov %r8d,%r8d 0x00000000026c01ef: sub %r11d,%r8d ;*irem ; - Test04::fizzBuzzCore1@14 (line 31) 0x00000000026c01f2: mov %r10d,%r10d 0x00000000026c01f5: sub %r11d,%r10d ;*irem ; - Test04::fizzBuzzCore1@6 (line 29) 0x00000000026c01f8: mov %r8d,%r11d 0x00000000026c01fb: shl $0x2,%r11d 0x00000000026c01ff: add %r8d,%r11d ;*irem ; - Test04::fizzBuzzCore1@14 (line 31) 0x00000000026c0202: mov %r10d,%r9d 0x00000000026c0205: shl %r9d 0x00000000026c0208: add %r10d,%r9d ;*irem ; - Test04::fizzBuzzCore1@6 (line 29) 0x00000000026c020b: cmp %r9d,%edx 0x00000000026c020e: jne 0x00000000026c021c ;*ifeq ; - Test04::fizzBuzzCore1@21 (line 33) 0x00000000026c0210: cmp %r11d,%edx 0x00000000026c0213: jne 0x00000000026c021c ;*ifeq ; - Test04::fizzBuzzCore1@25 (line 33) 0x00000000026c0215: mov $0x4,%eax 0x00000000026c021a: jmp 0x00000000026c0239 ;*iload_1 ; - Test04::fizzBuzzCore1@32 (line 35) 0x00000000026c021c: cmp %r9d,%edx 0x00000000026c021f: jne 0x00000000026c0228 ;*ifeq ; - Test04::fizzBuzzCore1@33 (line 35) 0x00000000026c0221: mov $0x3,%eax 0x00000000026c0226: jmp 0x00000000026c0239 0x00000000026c0228: cmp %r11d,%edx 0x00000000026c022b: jne 0x00000000026c0234 ;*ifeq ; - Test04::fizzBuzzCore1@41 (line 37) 0x00000000026c022d: mov $0x2,%eax 0x00000000026c0232: jmp 0x00000000026c0239 0x00000000026c0234: mov $0x1,%eax ;*irem ; - Test04::fizzBuzzCore1@6 (line 29) 0x00000000026c0239: add $0x10,%rsp 0x00000000026c023d: pop %rbp 0x00000000026c023e: test %eax,-0x2470244(%rip) # 0x0000000000250000 ; {poll_return} 0x00000000026c0244: retq 0x00000000026c0245: hlt 0x00000000026c0246: hlt 0x00000000026c0247: hlt 0x00000000026c0248: hlt 0x00000000026c0249: hlt 0x00000000026c024a: hlt 0x00000000026c024b: hlt 0x00000000026c024c: hlt 0x00000000026c024d: hlt 0x00000000026c024e: hlt 0x00000000026c024f: hlt 0x00000000026c0250: hlt 0x00000000026c0251: hlt 0x00000000026c0252: hlt 0x00000000026c0253: hlt 0x00000000026c0254: hlt 0x00000000026c0255: hlt 0x00000000026c0256: hlt 0x00000000026c0257: hlt 0x00000000026c0258: hlt 0x00000000026c0259: hlt 0x00000000026c025a: hlt 0x00000000026c025b: hlt 0x00000000026c025c: hlt 0x00000000026c025d: hlt 0x00000000026c025e: hlt 0x00000000026c025f: hlt [Exception Handler] [Stub Code] 0x00000000026c0260: jmpq 0x000000000261c560 ; {no_reloc} [Deopt Handler Code] 0x00000000026c0265: callq 0x00000000026c026a 0x00000000026c026a: subq $0x5,(%rsp) 0x00000000026c026f: jmpq 0x00000000025f6f40 ; {runtime_call} 0x00000000026c0274: hlt 0x00000000026c0275: hlt 0x00000000026c0276: hlt 0x00000000026c0277: hlt
But
However, it is surprising that it does not calculate modular operations at all!
At least, there is no clear explanation: there is no IDIV instruction in this code! So JIT really tries to avoid costly differences by doing some annoying, annoying and bitter hacker attacks: instructions
0x00000000026c01d6: imul $0x66666667,%r10 (and following...)
It is a "no division of labor" implementation of a department For example, this method
private static int divideBy3(int n) { long r10 = n; r10 *= 0x55555556L; r10 >>>= 0x20; long r10d = r10 & 0xFFFFFFFFL; return (int)r10d; }
Use these magic constants and shift to calculate divide by 3 (similarly, for 5 divided by another constant) I haven't done mathematical calculations myself, but I can find an explanation on how to get modular operations from the "integer division by constants" document from hacker's right on page 32 of the "integer division by constants"