Why is my C++ sum slower than the same sum in Java?
Here is the code:

    $ cat sum100000000.cpp && cat sum100000000.java
    #include <cstdio>
    using namespace std;
    int main(){
        long N=1000000000,sum=0;
        for( long i=0; i<N; i++ ) sum+= i;
        printf("%ld\n",sum);
    }
    public class sum100000000 {
        public static void main(String[] args) {
            long sum=0;
            for(long i = 0; i < 1000000000; i++) sum += i;
            System.out.println(sum);
        }
    }
This is the result:
    $ time ./a.out && time java sum100000000
    499999999500000000

    real    0m2.675s
    user    0m2.673s
    sys     0m0.002s
    Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8
    499999999500000000

    real    0m0.439s
    user    0m0.470s
    sys     0m0.027s
I can't see anything unusual in the disassembled binary, yet the C binary is significantly slower. I don't understand why.

My guess is that something may be wrong with the toolchain:
    $ clang -v
    Apple LLVM version 5.1 (clang-503.0.40) (based on LLVM 3.4svn)
    Target: x86_64-apple-darwin13.4.0
    Thread model: posix
    $ uname -a
    Darwin MacBook-Pro.local 13.4.0 Darwin Kernel Version 13.4.0: Sun Aug 17 19:50:11 PDT 2014; root:xnu-2422.115.4~1/RELEASE_X86_64 x86_64
Addendum: I compiled the C/C++ version without any special flags, and compiling with either of these does not change the result:

    gcc sum1b.cpp
    clang sum1b.cpp
Addendum: for those wondering about LLVM, nothing really changes:

    $ gcc sum100000000.cpp && time ./a.out
    499999999500000000

    real    0m2.722s
    user    0m2.717s
    sys     0m0.003s
Edit: with -O2 it is faster, but it looks like cheating: there is no loop left at all, and the sum is passed to printf as a precomputed constant:

    $ otool -tV a.out
    a.out:
    (__TEXT,__text) section
    _main:
    0000000100000f50    pushq   %rbp
    0000000100000f51    movq    %rsp, %rbp
    0000000100000f54    leaq    0x37(%rip), %rdi    ## literal pool for: "%ld\n"
    0000000100000f5b    movabsq $0x6f05b59b5e49b00, %rsi
    0000000100000f65    xorl    %eax, %eax
    0000000100000f67    callq   0x100000f70         ## symbol stub for: _printf
    0000000100000f6c    xorl    %eax, %eax
    0000000100000f6e    popq    %rbp
    0000000100000f6f    ret
I now believe this is related to optimization, so my question becomes: what does the JIT do to speed up this calculation?
Solution
The problem is probably that you did not compile the C version with optimization enabled. With aggressive optimization enabled, the binary GCC generates wins. The JVM's JIT is good, but the simple fact is that the JVM has to load and apply the JIT at run time, whereas GCC can optimize the binary at compile time.
Leaving off all GCC flags gave me a binary that is quite slow, just like yours. Using -O2 gave me a binary that barely lost to the Java version. Using -O3 gave me one that easily beats the Java version (this is on my Linux Mint 16 64-bit machine with GCC 4.8.1 and Java 1.8.0_20, i.e. Java 8 update 20). larsmans checked the disassembly of the -O3 version and confirmed that the compiler did not precompute the result (my C and assembly fu are very weak these days; thanks to larsmans for double-checking). Interestingly, though, mat's investigation shows that this is actually a by-product of my using GCC 4.8.1: earlier and later versions of GCC seem willing to precompute the result. A happy accident for us.
Here is my pure C version [I also updated it to reflect Ajax's comment that you use a constant in the Java version but a variable N in the C version (it makes no real difference, but...)]:
sum.c:

    #include <stdio.h>
    int main(){
        long sum=0;
        long i;
        for( i=0; i<1000000000; i++ ) sum+= i;
        printf("%ld\n",sum);
    }
My Java version is the same as yours except for the class name, which I changed because it was hard for me to keep track of the zeroes:
sum.java:

    public class sum {
        public static void main(String[] args) {
            long sum=0;
            for(long i = 0; i < 1000000000; i++) sum += i;
            System.out.println(sum);
        }
    }
Results:

C binary run (compiled with gcc sum.c):

    $ time ./a.out
    499999999500000000

    real    0m2.436s
    user    0m2.429s
    sys     0m0.004s
Java run (compiled with no special flags, run with no special runtime flags):

    $ time java sum
    499999999500000000

    real    0m0.691s
    user    0m0.684s
    sys     0m0.020s
Java run (compiled with no special flags, run with -server -noverify, a minor improvement):

    $ time java -server -noverify sum
    499999999500000000

    real    0m0.651s
    user    0m0.649s
    sys     0m0.016s
C binary run (compiled with gcc -O2 sum.c):

    $ time ./a.out
    499999999500000000

    real    0m0.733s
    user    0m0.732s
    sys     0m0.000s
C binary run (compiled with gcc -O3 sum.c):

    $ time ./a.out
    499999999500000000

    real    0m0.373s
    user    0m0.372s
    sys     0m0.000s
Here is the main part of objdump -D a.out for my -O3 version:

    0000000000400470 :
      400470:  66 0f 6f 1d 08 02 00   movdqa 0x208(%rip),%xmm3   # 400680
      400477:  00
      400478:  31 c0                  xor    %eax,%eax
      40047a:  66 0f ef c9            pxor   %xmm1,%xmm1
      40047e:  66 0f 6f 05 ea 01 00   movdqa 0x1ea(%rip),%xmm0   # 400670
      400485:  00
      400486:  eb 0c                  jmp    400494
      400488:  0f 1f 84 00 00 00 00   nopl   0x0(%rax,%rax,1)
      40048f:  00
      400490:  66 0f 6f c2            movdqa %xmm2,%xmm0
      400494:  66 0f 6f d0            movdqa %xmm0,%xmm2
      400498:  83 c0 01               add    $0x1,%eax
      40049b:  66 0f d4 c8            paddq  %xmm0,%xmm1
      40049f:  3d 00 65 cd 1d         cmp    $0x1dcd6500,%eax
      4004a4:  66 0f d4 d3            paddq  %xmm3,%xmm2
      4004a8:  75 e6                  jne    400490
      4004aa:  66 0f 6f e1            movdqa %xmm1,%xmm4
      4004ae:  be 64 06 40 00         mov    $0x400664,%esi
      4004b3:  bf 01 00 00 00         mov    $0x1,%edi
      4004b8:  31 c0                  xor    %eax,%eax
      4004ba:  66 0f 73 dc 08         psrldq $0x8,%xmm4
      4004bf:  66 0f d4 cc            paddq  %xmm4,%xmm1
      4004c3:  66 0f 7f 4c 24 e8      movdqa %xmm1,-0x18(%rsp)
      4004c9:  48 8b 54 24 e8         mov    -0x18(%rsp),%rdx
      4004ce:  e9 8d ff ff ff         jmpq   400460
As I said, my assembly is weak, but I see a loop here rather than the compiler having done the math in advance.
For completeness, here is the main part of the javap -c output for the Java version:

    public static void main(java.lang.String[]);
      Code:
         0: lconst_0
         1: lstore_1
         2: lconst_0
         3: lstore_3
         4: lload_3
         5: ldc2_w        #2                  // long 1000000000l
         8: lcmp
         9: ifge          23
        12: lload_1
        13: lload_3
        14: ladd
        15: lstore_1
        16: lload_3
        17: lconst_1
        18: ladd
        19: lstore_3
        20: goto          4
        23: getstatic     #4                  // Field java/lang/System.out:Ljava/io/PrintStream;
        26: lload_1
        27: invokevirtual #5                  // Method java/io/PrintStream.println:(J)V
        30: return
It does not precompute the result at the bytecode level; I can't say what the JIT is doing.