C sum loop is slower than the same loop in Java

Here is the test case:

cat sum100000000.cpp && cat sum100000000.java 
#include <cstdio>

using namespace std;

int main(){
  long N=1000000000,sum=0;
  for( long i=0; i<N; i++ ) sum+= i;
  printf("%ld\n",sum);
}


public class sum100000000 {
    public static void main(String[] args) {
        long sum=0;
        for(long i = 0; i < 1000000000; i++) sum += i;
        System.out.println(sum);
    }
}

This is the result:

time ./a.out && time java sum100000000
499999999500000000

real    0m2.675s
user    0m2.673s
sys 0m0.002s
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8
499999999500000000

real    0m0.439s
user    0m0.470s
sys 0m0.027s

I can't see anything unusual in the disassembled binary, but the C binary is significantly slower. I don't understand why.

My guess is that there may be a problem with the toolchain:

clang -v
Apple LLVM version 5.1 (clang-503.0.40) (based on LLVM 3.4svn)
Target: x86_64-apple-darwin13.4.0
Thread model: posix

uname -a
Darwin MacBook-Pro.local 13.4.0 Darwin Kernel Version 13.4.0: Sun Aug 17 19:50:11 PDT 2014; root:xnu-2422.115.4~1/RELEASE_X86_64 x86_64

Addendum: the C/C++ source is compiled with no special flags. Switching compilers does not change the result:

gcc sum1b.cpp
clang sum1b.cpp

Addendum: for those who care about LLVM, nothing really changes:

$gcc sum100000000.cpp && time ./a.out
499999999500000000

real    0m2.722s
user    0m2.717s
sys 0m0.003s

Edit: -O2 is faster, but it looks like cheating:

$otool -tV a.out
a.out:
(__TEXT,__text) section
_main:
0000000100000f50    pushq   %rbp
0000000100000f51    movq    %rsp,%rbp
0000000100000f54    leaq    0x37(%rip),%rdi ## literal pool for: "%ld\n"
0000000100000f5b    movabsq $0x6f05b59b5e49b00,%rsi
0000000100000f65    xorl    %eax,%eax
0000000100000f67    callq   0x100000f70 ## symbol stub for: _printf
0000000100000f6c    xorl    %eax,%eax
0000000100000f6e    popq    %rbp
0000000100000f6f    ret

I now believe this is related to optimization, so the question becomes: what does the JIT do to speed up this calculation?
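The -O2 disassembly above contains no loop at all: the value loaded by movabsq, 0x6f05b59b5e49b00, is already the final sum, worked out at compile time. A minimal sketch that checks this, assuming a 64-bit integer type; the names closed_form_sum and loop_sum are illustrative and do not appear in the original post:

```c
#include <stdint.h>

/* Sum of 0..n-1 via Gauss's formula n*(n-1)/2; no loop needed. */
int64_t closed_form_sum(int64_t n) {
    /* Halve whichever factor is even before multiplying, to delay overflow. */
    if (n % 2 == 0) return (n / 2) * (n - 1);
    return n * ((n - 1) / 2);
}

/* The same sum computed the slow way, as in the question. */
int64_t loop_sum(int64_t n) {
    int64_t sum = 0;
    for (int64_t i = 0; i < n; i++) sum += i;
    return sum;
}
```

closed_form_sum(1000000000) gives 499999999500000000, which is 0x6f05b59b5e49b00 in hexadecimal, the same constant that appears in the movabsq instruction.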

Solution

The problem is probably that you did not compile the C version with optimization enabled. If you enable aggressive optimization, the binary generated by GCC should win. The JVM's JIT is good, but the simple fact is that the JVM has to load and apply the JIT at run time, whereas GCC can optimize the binary at compile time.

Leaving off all the GCC flags gives me a binary that is fairly slow, like yours. Using -O2 gives me a binary that barely loses to the Java version. Using -O3 gives me one that handily beats the Java version. (This is on my 64-bit Linux Mint 16 machine with GCC 4.8.1 and Java 1.8.0_20, i.e. Java 8 update 20.) larsmans looked at the disassembly of the -O3 version and assures me that the compiler is not precalculating the result (my C and assembly-fu are very weak these days; many thanks to larsmans for double-checking). Interestingly, though, an investigation by Mat suggests that this is a by-product of my using GCC 4.8.1; earlier and later versions of GCC seem willing to precalculate the result. A happy accident for us.
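To compare optimization levels without process startup in the measurement, the loop can also be timed from inside the program. A minimal sketch using the standard clock() function; timed_sum is a hypothetical helper and not part of the original answer:

```c
#include <stdint.h>
#include <time.h>

/* Sums 0..n-1 and reports the CPU time spent in the loop alone. */
int64_t timed_sum(int64_t n, double *cpu_seconds) {
    clock_t start = clock();
    int64_t sum = 0;
    for (int64_t i = 0; i < n; i++) sum += i;
    clock_t end = clock();
    /* clock() counts processor time in units of 1/CLOCKS_PER_SEC seconds. */
    *cpu_seconds = (double)(end - start) / CLOCKS_PER_SEC;
    return sum;
}
```

Building this once with no flags and once with -O2 or -O3 lets the reported seconds be compared directly. If the compiler replaces the loop with a constant or a closed-form computation, the reported time will be close to zero.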

Here is my pure C version [I also updated it in response to Ajax's comment that you used a constant in the Java version but a variable N in the C version (it made no real difference, but...)]:

sum.c:

#include <stdio.h>

int main(){
  long sum=0;
  long i;
  for( i=0; i<1000000000; i++ ) sum+= i;
  printf("%ld\n",sum);
}

My Java version differs from yours only in the class name; it is too easy for me to lose track of the zeros:

sum.java:

public class sum {
    public static void main(String[] args) {
        long sum=0;
        for(long i = 0; i < 1000000000; i++) sum += i;
        System.out.println(sum);
    }
}

Results:

C binary run (compiled with gcc sum.c):

$time ./a.out
499999999500000000

real    0m2.436s
user    0m2.429s
sys 0m0.004s

Java run (compiled with no special flags, run with no special runtime flags):

$time java sum
499999999500000000

real    0m0.691s
user    0m0.684s
sys 0m0.020s

Java run (compiled with no special flags, run with -server -noverify; minor improvement):

$time java -server -noverify sum
499999999500000000

real    0m0.651s
user    0m0.649s
sys 0m0.016s

C binary run (compiled with gcc -O2 sum.c):

$time ./a.out
499999999500000000

real    0m0.733s
user    0m0.732s
sys 0m0.000s

C binary run (compiled with gcc -O3 sum.c):

$time ./a.out
499999999500000000

real    0m0.373s
user    0m0.372s
sys 0m0.000s

Here is main from the output of objdump -d a.out for my -O3 build:

0000000000400470 <main>:
  400470:   66 0f 6f 1d 08 02 00    movdqa 0x208(%rip),%xmm3        # 400680 
  400477:   00 
  400478:   31 c0                   xor    %eax,%eax
  40047a:   66 0f ef c9             pxor   %xmm1,%xmm1
  40047e:   66 0f 6f 05 ea 01 00    movdqa 0x1ea(%rip),%xmm0        # 400670 
  400485:   00 
  400486:   eb 0c                   jmp    400494 
  400488:   0f 1f 84 00 00 00 00    nopl   0x0(%rax,%rax,1)
  40048f:   00 
  400490:   66 0f 6f c2             movdqa %xmm2,%xmm0
  400494:   66 0f 6f d0             movdqa %xmm0,%xmm2
  400498:   83 c0 01                add    $0x1,%eax
  40049b:   66 0f d4 c8             paddq  %xmm0,%xmm1
  40049f:   3d 00 65 cd 1d          cmp    $0x1dcd6500,%eax
  4004a4:   66 0f d4 d3             paddq  %xmm3,%xmm2
  4004a8:   75 e6                   jne    400490 
  4004aa:   66 0f 6f e1             movdqa %xmm1,%xmm4
  4004ae:   be 64 06 40 00          mov    $0x400664,%esi
  4004b3:   bf 01 00 00 00          mov    $0x1,%edi
  4004b8:   31 c0                   xor    %eax,%eax
  4004ba:   66 0f 73 dc 08          psrldq $0x8,%xmm4
  4004bf:   66 0f d4 cc             paddq  %xmm4,%xmm1
  4004c3:   66 0f 7f 4c 24 e8       movdqa %xmm1,-0x18(%rsp)
  4004c9:   48 8b 54 24 e8          mov    -0x18(%rsp),%rdx
  4004ce:   e9 8d ff ff ff          jmpq   400460 

As I said, my assembly-fu is weak, but I see a loop there rather than the compiler having done the math.

Just for completeness, here is the main part of the output of javap -c sum:
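One way to read the -O3 loop above: the counter in %eax runs up to 0x1dcd6500, which is 500,000,000, half the original trip count, and each paddq adds two 64-bit lanes at once. The sketch below mimics that in scalar C. It assumes the constants loaded from 0x400670 and 0x400680 are {0, 1} and {2, 2}, which the listing does not show, and two_lane_sum is a hypothetical name:

```c
#include <stdint.h>

/* Scalar model of a two-lane SIMD sum of 0..n-1 (n is assumed even). */
int64_t two_lane_sum(int64_t n) {
    int64_t idx[2] = {0, 1};         /* current indices, like %xmm0 (assumed start) */
    int64_t acc[2] = {0, 0};         /* per-lane partial sums, like %xmm1 */
    const int64_t step[2] = {2, 2};  /* per-iteration increment, like %xmm3 (assumed) */

    for (int64_t k = 0; k < n / 2; k++) {
        for (int lane = 0; lane < 2; lane++) {
            acc[lane] += idx[lane];   /* paddq %xmm0,%xmm1 */
            idx[lane] += step[lane];  /* paddq %xmm3,%xmm2 */
        }
    }
    /* Horizontal add of the two lanes, like psrldq followed by paddq. */
    return acc[0] + acc[1];
}
```

Lane 0 accumulates the even indices and lane 1 the odd ones, so the loop body runs n/2 times and the two partial sums are combined once at the end. That would be consistent with the -O3 build taking roughly half the time of the -O2 build in the timings above.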

  public static void main(java.lang.String[]);
    Code:
       0: lconst_0
       1: lstore_1
       2: lconst_0
       3: lstore_3
       4: lload_3
       5: ldc2_w        #2                  // long 1000000000l
       8: lcmp
       9: ifge          23
      12: lload_1
      13: lload_3
      14: ladd
      15: lstore_1
      16: lload_3
      17: lconst_1
      18: ladd
      19: lstore_3
      20: goto          4
      23: getstatic     #4                  // Field java/lang/System.out:Ljava/io/PrintStream;
      26: lload_1
      27: invokevirtual #5                  // Method java/io/PrintStream.println:(J)V
      30: return

The result is not precalculated at the bytecode level; I can't say what the JIT is doing.
