Java: improving Protocol Buffers performance

I'm writing an application that needs to deserialize millions of messages from a single file as quickly as possible.

The application's main job is to read a message from the file, do some work on it, and then discard it. Each message has about 100 fields. Not all of them are parsed every time, but I need them all, because the user of the application can decide which fields to process.

At the moment, the application consists of a loop that makes one readDelimitedFrom() call per iteration.

Is there a way to restructure the problem to better fit this use case (splitting the data into multiple files, etc.)? In addition, because of the number of messages and the size of each one, I currently need to gzip the file. This is quite effective at reducing its size, because the field values are very repetitive, but it also reduces performance.
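For context on the file layout: a file of "delimited" protobuf messages is a sequence of frames, each being the message size encoded as a base-128 varint followed by the message bytes. That is what writeDelimitedTo() writes and what parseDelimitedFrom() or mergeDelimitedFrom() read back. The stdlib-only sketch below produces that layout through gzip. The class DelimitedGzipWriter is hypothetical, and plain byte arrays stand in for serialized messages, so no protobuf classes are involved.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.List;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper: writes payloads in the varint-delimited layout used by
// writeDelimitedTo(), i.e. [size as base-128 varint][size payload bytes], through gzip.
// The payloads are placeholder byte arrays standing in for serialized messages.
final class DelimitedGzipWriter {

    private DelimitedGzipWriter() {}

    // Encodes a non-negative int as an unsigned base-128 varint: 7 bits per byte,
    // least-significant group first, high bit set on every byte except the last.
    static void writeVarint32(OutputStream out, int value) throws IOException {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    // Writes each payload as one delimited frame and returns the gzip-compressed bytes.
    static byte[] gzipFrames(List<byte[]> payloads) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        // Buffer in front of the compressor so the one-byte varint writes
        // are not each handed to the deflater separately.
        try (OutputStream out = new BufferedOutputStream(new GZIPOutputStream(sink), 64 * 1024)) {
            for (byte[] payload : payloads) {
                writeVarint32(out, payload.length);
                out.write(payload);
            }
        }
        return sink.toByteArray();
    }
}
```

For example, a 300-byte message gets the two-byte prefix 0xAC 0x02. In a real writer, the loop body would simply be message.writeDelimitedTo(out) on the same buffered gzip stream.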

Solution

If CPU time is your bottleneck (which is unlikely if you are reading directly from a hard disk with a cold cache, but may be the case in other situations), here are some ways to improve throughput:

- If possible, use C++ instead of Java, and reuse the same message object on every iteration of the loop. This reduces the time spent on memory management, because the same memory is reused each time.
- Instead of calling readDelimitedFrom() for each message, construct a single CodedInputStream and use it to read all of them, as shown below:

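import com.google.protobuf.CodedInputStream;

// Do this once (input is the InputStream for the whole file):
CodedInputStream cis = CodedInputStream.newInstance(input);

// Then read each message like so:
while (!cis.isAtEnd()) {
    int limit = cis.pushLimit(cis.readRawVarint32()); // bound reads to this message's size prefix
    builder.clear();           // start from an empty builder on every iteration
    builder.mergeFrom(cis);
    cis.popLimit(limit);
    cis.resetSizeCounter();    // keep the running byte count under the stream's size limit
    // ... work with builder ...
}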

(A similar approach works in C++.)
- Use Snappy or LZ4 compression instead of gzip. These algorithms still achieve reasonable compression ratios but are optimized for speed. (LZ4 may be the better of the two, although Snappy was developed by Google with protobufs in mind, so you may want to test both on your data set.)
- Consider using Cap'n Proto instead of Protocol Buffers. When this answer was first written there was no Java version; there is now capnproto-java, along with implementations in many other languages. In the languages it supports, it has been shown to be much faster. (Disclosure: I am the author of Cap'n Proto. I am also the author of Protocol Buffers v2, the open-source version released by Google.)
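The CodedInputStream suggestion comes down to creating one input object up front and reusing it, and its memory, for every message, rather than paying per-call setup in parseDelimitedFrom() (which, in the Java implementation, typically wraps the input in a new CodedInputStream on each call). As a stdlib-only illustration of that idea, not of protobuf internals, the hypothetical ReusingFrameReader below decodes the same varint-delimited framing from one long-lived stream and reuses a single byte array for all payloads.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical reader for varint-delimited frames. One instance is created per file
// and reused for every frame, and the payload buffer is reallocated only when a frame
// is larger than any seen so far, so steady-state reading allocates nothing per frame.
final class ReusingFrameReader {
    private final InputStream in;
    private byte[] buffer = new byte[256];

    // Pass a buffered stream: the varint prefix is read one byte at a time.
    ReusingFrameReader(InputStream in) {
        this.in = in;
    }

    // Decodes a base-128 varint size prefix; returns -1 if the stream ends cleanly first.
    private int readSize() throws IOException {
        int result = 0;
        for (int shift = 0; shift < 32; shift += 7) {
            int b = in.read();
            if (b < 0) {
                if (shift == 0) {
                    return -1;
                }
                throw new EOFException("truncated varint");
            }
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                if (result < 0) {
                    throw new IOException("negative frame size");
                }
                return result;
            }
        }
        throw new IOException("malformed varint32");
    }

    // Reads the next frame into the shared buffer and returns its length,
    // or -1 at end of input. The bytes stay valid only until the next call.
    int next() throws IOException {
        int size = readSize();
        if (size < 0) {
            return -1;
        }
        if (size > buffer.length) {
            buffer = new byte[Math.max(size, buffer.length * 2)];
        }
        int off = 0;
        while (off < size) {
            int n = in.read(buffer, off, size - off);
            if (n < 0) {
                throw new EOFException("truncated frame");
            }
            off += n;
        }
        return size;
    }

    // The shared buffer; only the first (last next() result) bytes belong to the current frame.
    byte[] buffer() {
        return buffer;
    }
}
```

A read loop then has the shape for (int len = reader.next(); len >= 0; len = reader.next()) { ... }. With protobuf on the classpath, the loop body is where builder.clear() and builder.mergeFrom(reader.buffer(), 0, len) would go. The gzip file would be opened once, for example as new BufferedInputStream(new GZIPInputStream(new FileInputStream(path), 64 * 1024), 64 * 1024), where the 64 KiB sizes are an untuned guess. Swapping in a Snappy or LZ4 decompressing InputStream from a third-party library would leave the reader itself unchanged.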
