Java modified UTF-8 strings in Python

I am interfacing with a Java application from Python. I need to be able to construct byte sequences that contain UTF-8 strings. Java uses a modified UTF-8 encoding in DataInputStream.readUTF(), which Python does not support (yet, at least).

Can anyone point me in the right direction for constructing Java modified UTF-8 strings in Python?

Update #1: To learn more about Java's modified UTF-8, see the readUTF method of the DataInput interface on line 550 here, or here in the Java SE docs.

Update #2: I'm trying to interface with a third-party JBoss web application that uses this modified UTF-8 format, reading strings by calling DataInputStream.readUTF() (sorry for any confusion with normal Java UTF-8 string handling).

Thank you in advance

Solution

You can ignore the modified UTF-8 encoding (MUTF-8) and simply treat it as UTF-8. On the Python side, you can handle it like this:

1. Convert the string to normal UTF-8 and store the bytes in a buffer.
2. Write the 2-byte buffer length (not the string length) as binary in big-endian order.
3. Write the whole buffer.

I did this in PHP, and Java didn't complain about my encoding at all (at least in Java 5).

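MUTF-8 is mainly used in JNI and other systems with null-terminated strings. The main difference from normal UTF-8 is how U+0000 is encoded: normal UTF-8 uses a 1-byte encoding (0x00), while MUTF-8 uses 2 bytes (0xC0 0x80). First, U+0000 (the NUL control character) should rarely, if ever, appear in Unicode text. Second, DataInputStream.readUTF() does not enforce the 2-byte form, so it happily accepts either one. One caveat: MUTF-8 also encodes characters outside the Basic Multilingual Plane as a surrogate pair of two 3-byte sequences instead of one 4-byte sequence, and readUTF() rejects the 4-byte form, so this shortcut is safe only for BMP text.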
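If the text may contain U+0000 or characters outside the BMP, the byte layout described above can be produced by hand. The following is a minimal sketch, not part of the original answer; the function name encode_modified_utf8 is hypothetical, and it assumes Python 3.4 or later (for the 'surrogatepass' handler on the UTF-16 codec).

```python
def encode_modified_utf8(text):
    """Encode text as Java-style modified UTF-8 (payload only, no length prefix)."""
    out = bytearray()
    # Java strings are sequences of UTF-16 code units, so encode per unit.
    utf16 = text.encode('utf-16-be', 'surrogatepass')
    for i in range(0, len(utf16), 2):
        unit = (utf16[i] << 8) | utf16[i + 1]
        if 0x0001 <= unit <= 0x007F:
            # Plain ASCII except NUL: one byte.
            out.append(unit)
        elif unit <= 0x07FF:
            # Two bytes; this branch also turns U+0000 into C0 80.
            out.append(0xC0 | (unit >> 6))
            out.append(0x80 | (unit & 0x3F))
        else:
            # Three bytes; each surrogate half is encoded separately.
            out.append(0xE0 | (unit >> 12))
            out.append(0x80 | ((unit >> 6) & 0x3F))
            out.append(0x80 | (unit & 0x3F))
    return bytes(out)
```

For text made only of BMP characters other than U+0000, the result is byte-for-byte the same as text.encode('utf-8'), which is why the simpler approach works in most cases.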

Edit: The Python code should look like this:

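import struct

def writeUTF(data, text):
    # Encode as plain UTF-8 (treated here as equivalent to MUTF-8)
    utf8 = text.encode('utf-8')
    length = len(utf8)
    # 2-byte unsigned big-endian length in bytes, not characters
    data.append(struct.pack('!H', length))
    # '<length>s' packs the encoded bytes unchanged
    fmt = '!' + str(length) + 's'
    data.append(struct.pack(fmt, utf8))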
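
The framing can be sanity-checked without a Java process by writing a record into an in-memory stream and reading it back. This is only a sketch under the same "plain UTF-8 is close enough" assumption; the names write_utf and read_utf are hypothetical helpers, not an existing Python API. It also guards the 2-byte length field, which cannot describe more than 65535 encoded bytes.

```python
import io
import struct

def write_utf(stream, text):
    """Write a length-prefixed record in the layout readUTF() expects."""
    payload = text.encode('utf-8')
    if len(payload) > 0xFFFF:
        # The unsigned 16-bit prefix cannot hold a larger byte count.
        raise ValueError('encoded string exceeds 65535 bytes')
    stream.write(struct.pack('>H', len(payload)))
    stream.write(payload)

def read_utf(stream):
    """Read one record back: 2-byte big-endian length, then that many bytes."""
    (length,) = struct.unpack('>H', stream.read(2))
    return stream.read(length).decode('utf-8')

buf = io.BytesIO()
write_utf(buf, 'h\u00e9llo')
frame = buf.getvalue()   # the bytes that would be sent to the Java side
buf.seek(0)
roundtrip = read_utf(buf)
```

Here the length prefix is 6 rather than 5, because the accented character takes two bytes in UTF-8; this is the "buffer length, not string length" point from the steps above.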