Java – why does US-ASCII encoding accept non-US-ASCII characters?

Consider the following code:

public class ReadingTest {

    public void readAndPrint(String usingEncoding) throws Exception {
        ByteArrayInputStream bais = new ByteArrayInputStream(new byte[]{(byte) 0xC2,(byte) 0xB5}); // 'micro' sign UTF-8 representation
        InputStreamReader isr = new InputStreamReader(bais,usingEncoding);
        char[] cbuf = new char[2];
        isr.read(cbuf);
        System.out.println(cbuf[0]+" "+(int) cbuf[0]);
    }

    public static void main(String[] argv) throws Exception {
        ReadingTest w = new ReadingTest();
        w.readAndPrint("UTF-8");
        w.readAndPrint("US-ASCII");
    }
}

Observed output:

µ 181
? 65533

Why does the second call to readAndPrint() (the one using US-ASCII) succeed? I would expect it to throw an exception, because the input is not a valid US-ASCII byte sequence. Where is this behavior specified in the Java API or the JLS?

Solution

The default action when non-decodable bytes are found in the input stream is to replace them with the Unicode replacement character U+FFFD; the 65533 in the second line of your output is exactly that code point in decimal. This is the default CodingErrorAction (REPLACE) of the decoder that InputStreamReader creates internally when you construct it with just a charset name.
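
You can see the same substitution without going through a reader at all. Here is a small illustration (not from the original answer) using Charset.decode, which always applies the REPLACE action:

import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;

public class ReplacementDemo {
    public static void main(String[] args) {
        // Charset.decode always replaces malformed input, so the two bytes that are
        // not valid US-ASCII come back as the replacement character U+FFFD.
        CharBuffer cb = Charset.forName("US-ASCII")
                .decode(ByteBuffer.wrap(new byte[]{(byte) 0xC2, (byte) 0xB5}));
        System.out.println(cb.charAt(0) + " " + (int) cb.charAt(0)); // ? 65533
    }
}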

If you want to change this, pass a CharsetDecoder configured with a different CodingErrorAction to the InputStreamReader:

CharsetDecoder decoder = Charset.forName(usingEncoding).newDecoder();
decoder.onMalformedInput(CodingErrorAction.REPORT); // throw instead of substituting U+FFFD
InputStreamReader isr = new InputStreamReader(bais, decoder);
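
With that change, the US-ASCII call fails instead of silently substituting the replacement character. A minimal, self-contained sketch (the class name StrictReadingTest and the surrounding main method are just illustrative, not part of the original answer):

import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.MalformedInputException;

public class StrictReadingTest {

    // Reads the two UTF-8 bytes of the micro sign with a decoder that reports
    // malformed input instead of replacing it.
    static void readAndPrint(String usingEncoding) throws Exception {
        ByteArrayInputStream bais = new ByteArrayInputStream(new byte[]{(byte) 0xC2, (byte) 0xB5});
        CharsetDecoder decoder = Charset.forName(usingEncoding).newDecoder();
        decoder.onMalformedInput(CodingErrorAction.REPORT);
        InputStreamReader isr = new InputStreamReader(bais, decoder);
        char[] cbuf = new char[2];
        isr.read(cbuf);
        System.out.println(cbuf[0] + " " + (int) cbuf[0]);
    }

    public static void main(String[] args) throws Exception {
        readAndPrint("UTF-8");          // µ 181, as before
        try {
            readAndPrint("US-ASCII");
        } catch (MalformedInputException e) {
            // 0xC2 is not a valid US-ASCII byte, so the strict decoder rejects it
            System.out.println("US-ASCII: " + e);
        }
    }
}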