Java – * * busted * * how to use sun misc. Unsafe speed up byte [] lookup?
I'm trying to use unsafe to iterate memory instead of traversing the values in byte [] Use unsafe to allocate memory blocks The memory is sufficient to hold 65536 byte values
I'm trying this:
char aChar = some character
if ((byte) 0 == (unsafe.getByte(base_address + aChar) & mask)){
// do something
}
Substitute:
char aChar = some character
if ((byte) 0 == ( lookup[aChar] & mask )){
// do something
}
I think unsafe can access memory faster than using regular array access and perform index checks on each index
Just wishful thinking that the JVM will have a special operation (unsafe), which will make regular array access and iteration faster in some way In my opinion, the JVM works well in normal byte [] iterations and can quickly complete them using normal, pure, pure java code
@Millimoose hit the well-known "nail on the head"
"Insecurity may be useful for many things, but this degree of micro optimization is not one of them." – Millimeter“
It is faster to use unsafe under very strict and limited conditions:
>(64 bit JVM only) faster lookup for a single 65535 bytes [] and only one test at a time In this case, 64_ Unsafelookup on bit JVM_ 8b 24% faster If the test is repeated twice each time, the normal method is now 30% faster than the unsafe method In pure interpretation mode on cold JVMs, unsafe is faster so far – but only for the first time and only for small array sizes In 32-bit standard Oracle JVM 7 On X, it is normally three times faster than using unsafe
Using unsafe (in my test) is slow:
>Oracle Java 64 bit and 32-bit virtual machines are relatively slow > it will be slower regardless of the operating system and machine architecture (32-bit and 64 bit) > it will be slower even if the serverjvm option is called > in the 32-bit JVM (64 bit or even slower?) In the following code, the insecurity is 9% or more (1_gb array and unsafelookup_8b (the fastest one)) in 64 bit JVM, the insecurity is 234% or more (1_mb array and unsafelookup_1b (the fastest one) in the following code
Is there any reason**
When I run the following code yellowb (check 1GB bytes []), the normal is still the fastest:
C:\Users\wilf>java -Xms1600m -Xprof -jar "S:\wilf\testing\dist\testing.jar"
initialize data...
initialize data done!
use normalLookup()...
Not found '0'
time : 1967737 us.
use unsafeLookup_1B()...
Not found '0'
time : 2923367 us.
use unsafeLookup_8B()...
Not found '0'
time : 2495663 us.
Flat profile of 26.35 secs (2018 total ticks): main
Interpreted + native Method
0.0% 1 + 0 test.StackOverflow.main
0.0% 1 + 0 Total interpreted
Compiled + native Method
67.8% 1369 + 0 test.StackOverflow.main
11.7% 236 + 0 test.StackOverflow.unsafeLookup_8B
11.2% 227 + 0 test.StackOverflow.unsafeLookup_1B
9.1% 184 + 0 test.StackOverflow.normalLookup
99.9% 2016 + 0 Total compiled
Stub + native Method
0.0% 0 + 1 sun.misc.Unsafe.getLong
0.0% 0 + 1 Total stub
Flat profile of 0.00 secs (1 total ticks): DestroyJavaVM
Thread-local ticks:
100.0% 1 Blocked (of total)
Global summary of 26.39 seconds:
100.0% 2023 Received ticks
C:\Users\wilf>java -version
java version "1.7.0_07"
Java(TM) SE Runtime Environment (build 1.7.0_07-b11)
Java HotSpot(TM) Client VM (build 23.3-b01,mixed mode,sharing)
CPU: Intel Core 2 Duo E4600 @ 2.4GHz 4.00GB (available 3.25gb) operating system: Windows 7 (32)
Using Windows 7_ 64,32-bit Java runs tests on 4-core AMD64:
initialize data... initialize data done! use normalLookup()... Not found '0' time : 1631142 us. use unsafeLookup_1B()... Not found '0' time : 2365214 us. use unsafeLookup_8B()... Not found '0' time : 1783320 us.
Using Windows 7_ 64,64 bit Java runs tests on 4 core AMD64:
use normalLookup()...
Not found '0'
time : 655146 us.
use unsafeLookup_1B()...
Not found '0'
time : 904783 us.
use unsafeLookup_8B()...
Not found '0'
time : 764427 us.
Flat profile of 6.34 secs (13 total ticks): main
Interpreted + native Method
23.1% 3 + 0 java.io.PrintStream.println
23.1% 3 + 0 test.StackOverflow.unsafeLookup_8B
15.4% 2 + 0 test.StackOverflow.main
7.7% 1 + 0 java.io.DataInputStream.<init>
69.2% 9 + 0 Total interpreted
Compiled + native Method
7.7% 0 + 1 test.StackOverflow.unsafeLookup_1B
7.7% 0 + 1 test.StackOverflow.main
7.7% 0 + 1 test.StackOverflow.normalLookup
7.7% 0 + 1 test.StackOverflow.unsafeLookup_8B
30.8% 0 + 4 Total compiled
Flat profile of 0.00 secs (1 total ticks): DestroyJavaVM
Thread-local ticks:
100.0% 1 Blocked (of total)
Global summary of 6.35 seconds:
100.0% 14 Received ticks
42.9% 6 Compilation
Solution
I think the two functions you publish are basically the same, because they only read 1 byte and then convert it to int for further comparison
It is more effective to read 4 bytes int or 8 bytes at a time I wrote two functions to do the same thing: compare the contents of two bytes [] to see if they are the same:
Function 1:
public static boolean hadoopEquals(byte[] b1,byte[] b2)
{
if(b1 == b2)
{
return true;
}
if(b1.length != b2.length)
{
return false;
}
// Bring WritableComparator code local
for(int i = 0;i < b1.length; ++i)
{
int a = (b1[i] & 0xff);
int b = (b2[i] & 0xff);
if (a != b)
{
return false;
}
}
return true;
}
Function 2:
public static boolean goodEquals(byte[] b1,byte[] b2)
{
if(b1 == b2)
{
return true;
}
if(b1.length != b2.length)
{
return false;
}
int baSEOffset = UnSafe.arrayBaSEOffset(byte[].class);
int numLongs = (int)Math.ceil(b1.length / 8.0);
for(int i = 0;i < numLongs; ++i)
{
long currentOffset = baSEOffset + (i * 8);
long l1 = UnSafe.getLong(b1,currentOffset);
long l2 = UnSafe.getLong(b2,currentOffset);
if(0L != (l1 ^ l2))
{
return false;
}
}
return true;
}
I ran the two functions (Corei7 2630qm, 8GB DDR3, 64bit win 7, 64bit hotspot JVM) on my laptop, and compared the two 400MB bytes []. The results are as follows:
Function 1: ~ 670ms
Function 2: ~ 80ms
2 faster
So my suggestion is to read 8 bytes at a time and use the XOR operator (^):
long l1 = UnSafe.getLong(byteArray,offset); //8 byte if(0L == l1 ^ 0xFF) //if the lowest byte == 0? /* do something */ if(0L == l1 ^ 0xFF00) //if the 2nd lowest byte == 0? /* do something */ /* go on... */
================================================== ==========================
Hi, wilf, I use your code to create a test class, as shown below. This class compares the speed between the three functions that find the first 0 in the byte array:
package test;
import java.lang.reflect.Field;
import sun.misc.Unsafe;
/**
* Test the speed in looking up the 1st 0 in a byte array
* Set -Xms the same as -Xms to avoid Heap reallocation
*
* @author yellowb
*
*/
public class StackOverflow
{
public static Unsafe UnSafe;
public static Unsafe getUnsafe() throws SecurityException,NoSuchFieldException,IllegalArgumentException,illegalaccessexception
{
Field theUnsafe = Unsafe.class.getDeclaredField("theUnsafe");
theUnsafe.setAccessible(true);
Unsafe unsafe = (Unsafe) theUnsafe.get(null);
return unsafe;
}
/**
* use 'byte[index]' form to read 1 byte every time
* @param buf
*/
public static void normalLookup(byte[] buf)
{
for (int i = 0; i < buf.length; ++i)
{
if ((byte) 0 == buf[i])
{
System.out.println("The 1st '0' is at position : " + i);
return;
}
}
System.out.println("Not found '0'");
}
/**
* use Unsafe.getByte to read 1 byte every time directly from the memory
* @param buf
*/
public static void unsafeLookup_1B(byte[] buf)
{
int baSEOffset = UnSafe.arrayBaSEOffset(byte[].class);
for (int i = 0; i < buf.length; ++i)
{
byte b = UnSafe.getByte(buf,(long) (baSEOffset + i));
if (0 == ((int) b & 0xFF))
{
System.out.println("The 1st '0' is at position : " + i);
return;
}
}
System.out.println("Not found '0'");
}
/**
* use Unsafe.getLong to read 8 byte every time directly from the memory
* @param buf
*/
public static void unsafeLookup_8B(byte[] buf)
{
int baSEOffset = UnSafe.arrayBaSEOffset(byte[].class);
//The first (numLongs * 8) bytes will be read by Unsafe.getLong in below loop
int numLongs = buf.length / 8;
long currentOffset = 0L;
for (int i = 0; i < numLongs; ++i)
{
currentOffset = baSEOffset + (i * 8); //the step is 8 bytes
long l = UnSafe.getLong(buf,currentOffset);
//Compare each byte(in the 8-Byte long) to 0
//PS:x86 cpu is little-endian mode
if (0L == (l & 0xFF))
{
System.out.println("The 1st '0' is at position : " + (i * 8));
return;
}
if (0L == (l & 0xFF00L))
{
System.out.println("The 1st '0' is at position : " + (i * 8 + 1));
return;
}
if (0L == (l & 0xFF0000L))
{
System.out.println("The 1st '0' is at position : " + (i * 8 + 2));
return;
}
if (0L == (l & 0xFF000000L))
{
System.out.println("The 1st '0' is at position : " + (i * 8 + 3));
return;
}
if (0L == (l & 0xFF00000000L))
{
System.out.println("The 1st '0' is at position : " + (i * 8 + 4));
return;
}
if (0L == (l & 0xFF0000000000L))
{
System.out.println("The 1st '0' is at position : " + (i * 8 + 5));
return;
}
if (0L == (l & 0xFF000000000000L))
{
System.out.println("The 1st '0' is at position : " + (i * 8 + 6));
return;
}
if (0L == (l & 0xFF00000000000000L))
{
System.out.println("The 1st '0' is at position : " + (i * 8 + 7));
return;
}
}
//If some rest bytes exists
int rest = buf.length % 8;
if(0 != rest)
{
currentOffset = currentOffset + 8;
//Because the length of rest bytes < 8,we have to read them one by one
for(; currentOffset < (baSEOffset + buf.length); ++currentOffset)
{
byte b = UnSafe.getByte(buf,(long)currentOffset);
if (0 == ((int) b & 0xFF))
{
System.out.println("The 1st '0' is at position : " + (currentOffset - baSEOffset));
return;
}
}
}
System.out.println("Not found '0'");
}
public static void main(String[] args) throws SecurityException,illegalaccessexception
{
UnSafe = getUnsafe();
int len = 1024 * 1024 * 1024; //1G
long startTime = 0L;
long endTime = 0L;
System.out.println("initialize data...");
byte[] byteArray1 = new byte[len];
for (int i = 0; i < len; ++i)
{
byteArray1[i] = (byte) (i % 128 + 1); //No byte will equal to 0
}
//If you want to set one byte to 0,uncomment the below statement
// byteArray1[2500] = (byte)0;
System.out.println("initialize data done!");
System.out.println("use normalLookup()...");
startTime = System.nanoTime();
normalLookup(byteArray1);
endTime = System.nanoTime();
System.out.println("time : " + ((endTime - startTime) / 1000) + " us.");
System.out.println("use unsafeLookup_1B()...");
startTime = System.nanoTime();
unsafeLookup_1B(byteArray1);
endTime = System.nanoTime();
System.out.println("time : " + ((endTime - startTime) / 1000) + " us.");
System.out.println("use unsafeLookup_8B()...");
startTime = System.nanoTime();
unsafeLookup_8B(byteArray1);
endTime = System.nanoTime();
System.out.println("time : " + ((endTime - startTime) / 1000) + " us.");
}
}
The output is:
initialize data... initialize data done! use normalLookup()... Not found '0' time : 1271781 us. use unsafeLookup_1B()... Not found '0' time : 716898 us. use unsafeLookup_8B()... Not found '0' time : 591689 us.
The results show that each use of unsafe Getbyte() reads 1 byte much faster than regular iteration byte [] Reading 8 bytes is the fastest
