Java – * * busted * * how to use sun misc. Unsafe speed up byte [] lookup?
I'm trying to use unsafe to iterate memory instead of traversing the values in byte [] Use unsafe to allocate memory blocks The memory is sufficient to hold 65536 byte values
I'm trying this:
char aChar = some character if ((byte) 0 == (unsafe.getByte(base_address + aChar) & mask)){ // do something }
Substitute:
char aChar = some character if ((byte) 0 == ( lookup[aChar] & mask )){ // do something }
I think unsafe can access memory faster than using regular array access and perform index checks on each index
Just wishful thinking that the JVM will have a special operation (unsafe), which will make regular array access and iteration faster in some way In my opinion, the JVM works well in normal byte [] iterations and can quickly complete them using normal, pure, pure java code
@Millimoose hit the well-known "nail on the head"
"Insecurity may be useful for many things, but this degree of micro optimization is not one of them." – Millimeter“
It is faster to use unsafe under very strict and limited conditions:
>(64 bit JVM only) faster lookup for a single 65535 bytes [] and only one test at a time In this case, 64_ Unsafelookup on bit JVM_ 8b 24% faster If the test is repeated twice each time, the normal method is now 30% faster than the unsafe method In pure interpretation mode on cold JVMs, unsafe is faster so far – but only for the first time and only for small array sizes In 32-bit standard Oracle JVM 7 On X, it is normally three times faster than using unsafe
Using unsafe (in my test) is slow:
>Oracle Java 64 bit and 32-bit virtual machines are relatively slow > it will be slower regardless of the operating system and machine architecture (32-bit and 64 bit) > it will be slower even if the serverjvm option is called > in the 32-bit JVM (64 bit or even slower?) In the following code, the insecurity is 9% or more (1_gb array and unsafelookup_8b (the fastest one)) in 64 bit JVM, the insecurity is 234% or more (1_mb array and unsafelookup_1b (the fastest one) in the following code
Is there any reason**
When I run the following code yellowb (check 1GB bytes []), the normal is still the fastest:
C:\Users\wilf>java -Xms1600m -Xprof -jar "S:\wilf\testing\dist\testing.jar" initialize data... initialize data done! use normalLookup()... Not found '0' time : 1967737 us. use unsafeLookup_1B()... Not found '0' time : 2923367 us. use unsafeLookup_8B()... Not found '0' time : 2495663 us. Flat profile of 26.35 secs (2018 total ticks): main Interpreted + native Method 0.0% 1 + 0 test.StackOverflow.main 0.0% 1 + 0 Total interpreted Compiled + native Method 67.8% 1369 + 0 test.StackOverflow.main 11.7% 236 + 0 test.StackOverflow.unsafeLookup_8B 11.2% 227 + 0 test.StackOverflow.unsafeLookup_1B 9.1% 184 + 0 test.StackOverflow.normalLookup 99.9% 2016 + 0 Total compiled Stub + native Method 0.0% 0 + 1 sun.misc.Unsafe.getLong 0.0% 0 + 1 Total stub Flat profile of 0.00 secs (1 total ticks): DestroyJavaVM Thread-local ticks: 100.0% 1 Blocked (of total) Global summary of 26.39 seconds: 100.0% 2023 Received ticks C:\Users\wilf>java -version java version "1.7.0_07" Java(TM) SE Runtime Environment (build 1.7.0_07-b11) Java HotSpot(TM) Client VM (build 23.3-b01,mixed mode,sharing)
CPU: Intel Core 2 Duo E4600 @ 2.4GHz 4.00GB (available 3.25gb) operating system: Windows 7 (32)
Using Windows 7_ 64,32-bit Java runs tests on 4-core AMD64:
initialize data... initialize data done! use normalLookup()... Not found '0' time : 1631142 us. use unsafeLookup_1B()... Not found '0' time : 2365214 us. use unsafeLookup_8B()... Not found '0' time : 1783320 us.
Using Windows 7_ 64,64 bit Java runs tests on 4 core AMD64:
use normalLookup()... Not found '0' time : 655146 us. use unsafeLookup_1B()... Not found '0' time : 904783 us. use unsafeLookup_8B()... Not found '0' time : 764427 us. Flat profile of 6.34 secs (13 total ticks): main Interpreted + native Method 23.1% 3 + 0 java.io.PrintStream.println 23.1% 3 + 0 test.StackOverflow.unsafeLookup_8B 15.4% 2 + 0 test.StackOverflow.main 7.7% 1 + 0 java.io.DataInputStream.<init> 69.2% 9 + 0 Total interpreted Compiled + native Method 7.7% 0 + 1 test.StackOverflow.unsafeLookup_1B 7.7% 0 + 1 test.StackOverflow.main 7.7% 0 + 1 test.StackOverflow.normalLookup 7.7% 0 + 1 test.StackOverflow.unsafeLookup_8B 30.8% 0 + 4 Total compiled Flat profile of 0.00 secs (1 total ticks): DestroyJavaVM Thread-local ticks: 100.0% 1 Blocked (of total) Global summary of 6.35 seconds: 100.0% 14 Received ticks 42.9% 6 Compilation
Solution
I think the two functions you publish are basically the same, because they only read 1 byte and then convert it to int for further comparison
It is more effective to read 4 bytes int or 8 bytes at a time I wrote two functions to do the same thing: compare the contents of two bytes [] to see if they are the same:
Function 1:
public static boolean hadoopEquals(byte[] b1,byte[] b2) { if(b1 == b2) { return true; } if(b1.length != b2.length) { return false; } // Bring WritableComparator code local for(int i = 0;i < b1.length; ++i) { int a = (b1[i] & 0xff); int b = (b2[i] & 0xff); if (a != b) { return false; } } return true; }
Function 2:
public static boolean goodEquals(byte[] b1,byte[] b2) { if(b1 == b2) { return true; } if(b1.length != b2.length) { return false; } int baSEOffset = UnSafe.arrayBaSEOffset(byte[].class); int numLongs = (int)Math.ceil(b1.length / 8.0); for(int i = 0;i < numLongs; ++i) { long currentOffset = baSEOffset + (i * 8); long l1 = UnSafe.getLong(b1,currentOffset); long l2 = UnSafe.getLong(b2,currentOffset); if(0L != (l1 ^ l2)) { return false; } } return true; }
I ran the two functions (Corei7 2630qm, 8GB DDR3, 64bit win 7, 64bit hotspot JVM) on my laptop, and compared the two 400MB bytes []. The results are as follows:
Function 1: ~ 670ms
Function 2: ~ 80ms
2 faster
So my suggestion is to read 8 bytes at a time and use the XOR operator (^):
long l1 = UnSafe.getLong(byteArray,offset); //8 byte if(0L == l1 ^ 0xFF) //if the lowest byte == 0? /* do something */ if(0L == l1 ^ 0xFF00) //if the 2nd lowest byte == 0? /* do something */ /* go on... */
================================================== ==========================
Hi, wilf, I use your code to create a test class, as shown below. This class compares the speed between the three functions that find the first 0 in the byte array:
package test; import java.lang.reflect.Field; import sun.misc.Unsafe; /** * Test the speed in looking up the 1st 0 in a byte array * Set -Xms the same as -Xms to avoid Heap reallocation * * @author yellowb * */ public class StackOverflow { public static Unsafe UnSafe; public static Unsafe getUnsafe() throws SecurityException,NoSuchFieldException,IllegalArgumentException,illegalaccessexception { Field theUnsafe = Unsafe.class.getDeclaredField("theUnsafe"); theUnsafe.setAccessible(true); Unsafe unsafe = (Unsafe) theUnsafe.get(null); return unsafe; } /** * use 'byte[index]' form to read 1 byte every time * @param buf */ public static void normalLookup(byte[] buf) { for (int i = 0; i < buf.length; ++i) { if ((byte) 0 == buf[i]) { System.out.println("The 1st '0' is at position : " + i); return; } } System.out.println("Not found '0'"); } /** * use Unsafe.getByte to read 1 byte every time directly from the memory * @param buf */ public static void unsafeLookup_1B(byte[] buf) { int baSEOffset = UnSafe.arrayBaSEOffset(byte[].class); for (int i = 0; i < buf.length; ++i) { byte b = UnSafe.getByte(buf,(long) (baSEOffset + i)); if (0 == ((int) b & 0xFF)) { System.out.println("The 1st '0' is at position : " + i); return; } } System.out.println("Not found '0'"); } /** * use Unsafe.getLong to read 8 byte every time directly from the memory * @param buf */ public static void unsafeLookup_8B(byte[] buf) { int baSEOffset = UnSafe.arrayBaSEOffset(byte[].class); //The first (numLongs * 8) bytes will be read by Unsafe.getLong in below loop int numLongs = buf.length / 8; long currentOffset = 0L; for (int i = 0; i < numLongs; ++i) { currentOffset = baSEOffset + (i * 8); //the step is 8 bytes long l = UnSafe.getLong(buf,currentOffset); //Compare each byte(in the 8-Byte long) to 0 //PS:x86 cpu is little-endian mode if (0L == (l & 0xFF)) { System.out.println("The 1st '0' is at position : " + (i * 8)); return; } if (0L == (l & 0xFF00L)) { System.out.println("The 1st '0' is at position : " + (i * 8 + 1)); return; } if (0L == (l & 0xFF0000L)) { System.out.println("The 1st '0' is at position : " + (i * 8 + 2)); return; } if (0L == (l & 0xFF000000L)) { System.out.println("The 1st '0' is at position : " + (i * 8 + 3)); return; } if (0L == (l & 0xFF00000000L)) { System.out.println("The 1st '0' is at position : " + (i * 8 + 4)); return; } if (0L == (l & 0xFF0000000000L)) { System.out.println("The 1st '0' is at position : " + (i * 8 + 5)); return; } if (0L == (l & 0xFF000000000000L)) { System.out.println("The 1st '0' is at position : " + (i * 8 + 6)); return; } if (0L == (l & 0xFF00000000000000L)) { System.out.println("The 1st '0' is at position : " + (i * 8 + 7)); return; } } //If some rest bytes exists int rest = buf.length % 8; if(0 != rest) { currentOffset = currentOffset + 8; //Because the length of rest bytes < 8,we have to read them one by one for(; currentOffset < (baSEOffset + buf.length); ++currentOffset) { byte b = UnSafe.getByte(buf,(long)currentOffset); if (0 == ((int) b & 0xFF)) { System.out.println("The 1st '0' is at position : " + (currentOffset - baSEOffset)); return; } } } System.out.println("Not found '0'"); } public static void main(String[] args) throws SecurityException,illegalaccessexception { UnSafe = getUnsafe(); int len = 1024 * 1024 * 1024; //1G long startTime = 0L; long endTime = 0L; System.out.println("initialize data..."); byte[] byteArray1 = new byte[len]; for (int i = 0; i < len; ++i) { byteArray1[i] = (byte) (i % 128 + 1); //No byte will equal to 0 } //If you want to set one byte to 0,uncomment the below statement // byteArray1[2500] = (byte)0; System.out.println("initialize data done!"); System.out.println("use normalLookup()..."); startTime = System.nanoTime(); normalLookup(byteArray1); endTime = System.nanoTime(); System.out.println("time : " + ((endTime - startTime) / 1000) + " us."); System.out.println("use unsafeLookup_1B()..."); startTime = System.nanoTime(); unsafeLookup_1B(byteArray1); endTime = System.nanoTime(); System.out.println("time : " + ((endTime - startTime) / 1000) + " us."); System.out.println("use unsafeLookup_8B()..."); startTime = System.nanoTime(); unsafeLookup_8B(byteArray1); endTime = System.nanoTime(); System.out.println("time : " + ((endTime - startTime) / 1000) + " us."); } }
The output is:
initialize data... initialize data done! use normalLookup()... Not found '0' time : 1271781 us. use unsafeLookup_1B()... Not found '0' time : 716898 us. use unsafeLookup_8B()... Not found '0' time : 591689 us.
The results show that each use of unsafe Getbyte() reads 1 byte much faster than regular iteration byte [] Reading 8 bytes is the fastest