How to avoid memory waste when storing UTF-8 characters (8 bits) in Java chars (16 bits)? Two in one?

2020-02-22 • Java

I'm afraid I have a question about a detail of a rather oversaturated topic. I searched a lot but couldn't find a clear answer to this specific and, in my opinion, obvious and important problem:

When converting a byte[] to a String using UTF-8, each byte (8 bits) becomes an 8-bit character encoded in UTF-8, but each UTF-8 character is stored as a 16-bit char in Java. Is that correct? If so, does that mean each stupid Java char uses only its first 8 bits and consumes twice the memory? Is that correct too? I wonder how this waste is considered acceptable.
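A minimal sketch of what the conversion described above actually does, using only `String` and `StandardCharsets` from the JDK. The class and method names (`Utf8Widths`, `utf8Bytes`, `charUnits`) are made up for illustration. It suggests that "one byte becomes one char" only holds for ASCII text: UTF-8 is variable-length, so several bytes can decode to a single 16-bit char.

```java
import java.nio.charset.StandardCharsets;

final class Utf8Widths {
    // Number of bytes the text occupies once encoded as UTF-8.
    static int utf8Bytes(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of 16-bit char units in the String decoded from UTF-8 bytes.
    static int charUnits(byte[] utf8) {
        return new String(utf8, StandardCharsets.UTF_8).length();
    }

    public static void main(String[] args) {
        // "cat" is ASCII, U+00E9 is e-acute, U+20AC is the euro sign.
        for (String s : new String[] {"cat", "\u00e9", "\u20ac"}) {
            byte[] encoded = s.getBytes(StandardCharsets.UTF_8);
            System.out.println(utf8Bytes(s) + " byte(s) -> " + charUnits(encoded) + " char(s)");
        }
        // prints:
        // 3 byte(s) -> 3 char(s)
        // 2 byte(s) -> 1 char(s)
        // 3 byte(s) -> 1 char(s)
    }
}
```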

Is there a trick to get a pseudo-String that is 8-bit? Would that actually reduce memory consumption? Or is there a way to store >two< 8-bit characters in one 16-bit Java char to avoid this waste? Thanks for any de-confusing answers...

Edit: Hi, thanks everybody for answering. I was aware of the variable-length property of UTF-8. However, since my source is bytes, which are 8 bits, I understood (apparently wrongly) that it only needs 8-bit words of UTF-8. Does the UTF-8 conversion actually preserve the strange symbols you see when you run "cat somebinary" on the CLI? I thought UTF-8 is simply used to map each possible 8-bit word of a byte to one particular 8-bit word of UTF-8. Wrong? I thought about using Base64, but it's bad because it only uses 7 bits.
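The "two 8-bit values in one 16-bit char" idea can be sketched as follows. This is only an illustration under stated assumptions: `BytePacker`, `pack`, and `unpack` are hypothetical names, and the original length has to be carried separately because an odd-length input leaves the last low half unused.

```java
final class BytePacker {
    // Packs two bytes into each char: even-indexed byte in the high half,
    // odd-indexed byte in the low half. A trailing odd byte leaves the low half 0.
    static char[] pack(byte[] data) {
        char[] out = new char[(data.length + 1) / 2];
        for (int i = 0; i < data.length; i++) {
            int b = data[i] & 0xFF; // treat the byte as unsigned
            if ((i & 1) == 0) {
                out[i / 2] = (char) (b << 8);
            } else {
                out[i / 2] |= b;
            }
        }
        return out;
    }

    // Reverses pack(); 'length' is the original byte count.
    static byte[] unpack(char[] packed, int length) {
        byte[] out = new byte[length];
        for (int i = 0; i < length; i++) {
            char c = packed[i / 2];
            out[i] = (byte) ((i & 1) == 0 ? c >>> 8 : c);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] data = {0x12, 0x34, (byte) 0xFF};
        char[] packed = pack(data);
        System.out.println(packed.length); // prints 2
        System.out.println(java.util.Arrays.equals(data, unpack(packed, data.length))); // prints true
    }
}
```

The packed chars are not meaningful text (they can be unpaired surrogates, for instance), so they should not be encoded or printed as a String. A plain byte[] already uses one byte per element, so this packing saves nothing compared with keeping the data as byte[].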

Reformulated question: is there a smarter way to convert a byte[] to something String-like? My favorite idea was to simply cast byte[] to char[], but then I still have 16-bit words.
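One way to explore the reformulated question is to check which charset round-trips arbitrary binary data. The sketch below uses hypothetical names (`BinaryRoundTrip`, `survives`, `allByteValues`) and only standard JDK calls. It suggests that ISO-8859-1 maps every byte value to exactly one char and back, while decoding arbitrary bytes as UTF-8 is lossy because malformed sequences are replaced during decoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class BinaryRoundTrip {
    // True if decoding the bytes to a String and encoding them again
    // with the same charset yields the original bytes.
    static boolean survives(byte[] data, Charset charset) {
        byte[] back = new String(data, charset).getBytes(charset);
        return Arrays.equals(data, back);
    }

    // Test input containing every possible byte value once.
    static byte[] allByteValues() {
        byte[] all = new byte[256];
        for (int i = 0; i < all.length; i++) {
            all[i] = (byte) i;
        }
        return all;
    }

    public static void main(String[] args) {
        byte[] all = allByteValues();
        System.out.println(survives(all, StandardCharsets.ISO_8859_1)); // prints true
        System.out.println(survives(all, StandardCharsets.UTF_8));      // prints false
    }
}
```

Even with ISO-8859-1 the resulting String still exposes 16-bit chars through its API, so keeping binary data as byte[] end to end avoids the conversion entirely.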

Other use case information:

I am adapting Jedis (the Java client for the NoSQL store Redis) as the "primitive storage layer" for HyperGraphDB. So Jedis is a database for another "database". My problem is that I have to feed Jedis byte[] data all the time, but internally >Redis< (the actual server) only deals with "binary-safe" strings. Since Redis is written in C, a char is 8 bits long, AFAIK not ASCII, which is 7 bits. In Jedis, however, in the Java world, every character is 16 bits long internally. I don't understand the code (yet), but I suppose Jedis then converts these 16-bit Java strings into Redis-conforming 8-bit strings ([here] [3]). It says it extends FilterOutputStream. My hope is to bypass the byte[] <-> String conversion altogether and use that FilterOutputStream...?

Now I wonder: if I have to convert between byte[] and String all the time, with data sizes ranging from very small to potentially very large, isn't it a huge waste of memory to pass each 8-bit character around as 16 bits within Java?

Solution

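Yes, make sure you have an up-to-date version of Java. Since Java 9, Compact Strings (JEP 254) are enabled by default, so a String whose characters all fit in Latin-1 is stored internally with one byte per character instead of two.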
