Java – why does US-ASCII encoding accept non-US-ASCII characters?

Consider the following code:

public class ReadingTest {

    public void readAndPrint(String usingEncoding) throws Exception {
        ByteArrayInputStream bais = new ByteArrayInputStream(new byte[]{(byte) 0xC2,(byte) 0xB5}); // 'micro' sign UTF-8 representation
        InputStreamReader isr = new InputStreamReader(bais,usingEncoding);
        char[] cbuf = new char[2];
        isr.read(cbuf);
        System.out.println(cbuf[0]+" "+(int) cbuf[0]);
    }

    public static void main(String[] argv) throws Exception {
        ReadingTest w = new ReadingTest();
        w.readAndPrint("UTF-8");
        w.readAndPrint("US-ASCII");
    }
}

Observed output:

µ 181
? 65533

Why does the second call to readAndPrint() (the one using US-ASCII) succeed? I would expect it to throw an exception, because the input is not a valid US-ASCII byte sequence. Where is this behavior specified in the Java API or the JLS?

Solution

The default action when non-decodable bytes are found in the input stream is to replace them with the Unicode replacement character U+FFFD; the 65533 in the second line of your output is exactly that code point in decimal. This is the default CodingErrorAction (REPLACE) of the decoder that InputStreamReader creates internally when you construct it with just a charset name.
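
You can see the same substitution without going through a reader at all. Here is a small illustration (not from the original answer) using Charset.decode, which always applies the REPLACE action:

import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;

public class ReplacementDemo {
    public static void main(String[] args) {
        // Charset.decode always replaces malformed input, so the two bytes that are
        // not valid US-ASCII come back as the replacement character U+FFFD.
        CharBuffer cb = Charset.forName("US-ASCII")
                .decode(ByteBuffer.wrap(new byte[]{(byte) 0xC2, (byte) 0xB5}));
        System.out.println(cb.charAt(0) + " " + (int) cb.charAt(0)); // ? 65533
    }
}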

If you want to change this, pass a CharsetDecoder configured with a different CodingErrorAction to the InputStreamReader:

CharsetDecoder decoder = Charset.forName(usingEncoding).newDecoder();
decoder.onMalformedInput(CodingErrorAction.REPORT); // throw instead of substituting U+FFFD
InputStreamReader isr = new InputStreamReader(bais, decoder);
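
With that change, the US-ASCII call fails instead of silently substituting the replacement character. A minimal, self-contained sketch (the class name StrictReadingTest and the surrounding main method are just illustrative, not part of the original answer):

import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.MalformedInputException;

public class StrictReadingTest {

    // Reads the two UTF-8 bytes of the micro sign with a decoder that reports
    // malformed input instead of replacing it.
    static void readAndPrint(String usingEncoding) throws Exception {
        ByteArrayInputStream bais = new ByteArrayInputStream(new byte[]{(byte) 0xC2, (byte) 0xB5});
        CharsetDecoder decoder = Charset.forName(usingEncoding).newDecoder();
        decoder.onMalformedInput(CodingErrorAction.REPORT);
        InputStreamReader isr = new InputStreamReader(bais, decoder);
        char[] cbuf = new char[2];
        isr.read(cbuf);
        System.out.println(cbuf[0] + " " + (int) cbuf[0]);
    }

    public static void main(String[] args) throws Exception {
        readAndPrint("UTF-8");          // µ 181, as before
        try {
            readAndPrint("US-ASCII");
        } catch (MalformedInputException e) {
            // 0xC2 is not a valid US-ASCII byte, so the strict decoder rejects it
            System.out.println("US-ASCII: " + e);
        }
    }
}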