Assembly – What is vmovdqu doing here?
I have a Java loop as follows:
    public void testMethod() {
        int[] nums = new int[10];
        for (int i = 0; i < nums.length; i++) {
            nums[i] = 0x42;
        }
    }
The generated assembly is as follows:

    0x00000001296ac845: cmp    %r10d,%ebp
    0x00000001296ac848: jae    0x00000001296ac8b4
    0x00000001296ac84a: movl   $0x42,0x10(%rbx,%rbp,4)
    0x00000001296ac852: inc    %ebp
    0x00000001296ac854: cmp    %r11d,%ebp
    0x00000001296ac857: jl     0x00000001296ac845
    0x00000001296ac859: mov    %r10d,%r8d
    0x00000001296ac85c: add    $0xfffffffd,%r8d
    0x00000001296ac860: mov    $0x80000000,%r9d
    0x00000001296ac866: cmp    %r8d,%r10d
    0x00000001296ac869: cmovl  %r9d,%r8d
    0x00000001296ac86d: cmp    %r8d,%ebp
    0x00000001296ac870: jge    0x00000001296ac88e
    0x00000001296ac872: vmovq  -0xda(%rip),%xmm0
    0x00000001296ac87a: vpunpcklqdq %xmm0,%xmm0,%xmm0
    0x00000001296ac87e: xchg   %ax,%ax
    0x00000001296ac880: vmovdqu %xmm0,4)
    0x00000001296ac886: add    $0x4,%ebp
    0x00000001296ac889: cmp    %r8d,%ebp
    0x00000001296ac88c: jl     0x00000001296ac880
Solution
The optimizer has chosen to vectorize the loop, setting 4 values per iteration. The instructions preceding vmovdqu are fairly opaque, but presumably they splat 0x42 into all four lanes of xmm0: vmovq loads a 64-bit constant, and vpunpcklqdq copies that low quadword into both halves of the register. The "unaligned" variant of the store (the "u" in vmovdqu) is necessary because there is no guarantee that the array is SIMD-aligned in memory (after all, it stores int32s, not int32x4s).
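To make the loop shape in the disassembly easier to follow, here is a rough scalar emulation in Java. It is only a sketch of what the listing appears to do, not the JIT's actual algorithm. The class name VectorizedFillSketch, the method fill, and the parameter preLoopEnd are hypothetical names made up for illustration. The trailing clean-up loop is also an assumption, since the listing stops before that part.

```java
import java.util.Arrays;

// Hypothetical illustration: emulates the structure of the compiled loop
// with plain Java. The real code uses one 128-bit store per main-loop
// iteration; here that store is spelled out as four scalar writes.
final class VectorizedFillSketch {
    static final int LANES = 4; // four 32-bit ints fit in one XMM register

    // preLoopEnd is an assumed stand-in for the bound held in %r11d.
    static int[] fill(int length, int preLoopEnd) {
        int[] nums = new int[length];
        int i = 0;

        // Pre-loop: one element per iteration (the "movl $0x42" store),
        // with a bounds check against the array length (%r10d).
        int scalarEnd = Math.min(preLoopEnd, length);
        for (; i < scalarEnd; i++) {
            nums[i] = 0x42;
        }

        // Main loop limit: length - 3 (the "add $0xfffffffd" instruction),
        // so that i + 3 is always a valid index inside the loop.
        int limit = length - (LANES - 1);
        for (; i < limit; i += LANES) {
            // Stands in for the single vmovdqu store of four lanes.
            nums[i] = 0x42;
            nums[i + 1] = 0x42;
            nums[i + 2] = 0x42;
            nums[i + 3] = 0x42;
        }

        // Assumed post-loop: finish the up to three leftover elements.
        for (; i < length; i++) {
            nums[i] = 0x42;
        }
        return nums;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(fill(10, 1)));
    }
}
```

With length 10 and a one-element pre-loop, the main loop in this sketch runs twice (indices 1 to 8) and the clean-up loop writes the last element. In the real code each main-loop iteration is one unaligned 16-byte store rather than four separate writes.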