Java: read and write files that contain UTF-8 (non-English) characters
I have a file that contains the following characters: “Joh 1:1ஆதியிலேஆதியிலே்த்தைதை்தது,அந்ததவாரதததைதைதைதைதைதைதைதேவனிடதததுததுததுததுததுததுததுததுததுததுததுததுததுததுததுததுததுததுதத ுததுததுததுதது”
Tamil code chart: www.unicode.org/charts/PDF/U0B80.pdf
When I use the following code:
bufferedWriter = new BufferedWriter (new OutputStreamWriter(System.out,"UTF8"));
The output shows boxes and other strange characters, like this:
“P = O ֛;< A Y ՠ;”
Can anyone help me?
Here is the complete code:
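File f = new File("E:\\bible.docx"); // note: a .docx file is a ZIP archive, not plain UTF-8 text
Reader decoded = new InputStreamReader(new FileInputStream(f), StandardCharsets.UTF_8);
BufferedWriter bufferedWriter = new BufferedWriter(new OutputStreamWriter(System.out, StandardCharsets.UTF_8));
char[] buffer = new char[1024];
int n;
StringBuilder build = new StringBuilder();
while (true) {
    n = decoded.read(buffer);
    if (n < 0) {
        break; // end of file
    }
    build.append(buffer, 0, n);         // append only the n chars read in this pass
    bufferedWriter.write(buffer, 0, n); // write only the n chars read in this pass
}
bufferedWriter.flush(); // push buffered output to the console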
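A self-contained sketch of the same read-and-echo loop, assuming the input is a plain-text file saved as UTF-8 (the default name bible.txt and the class name Utf8Copy are hypothetical). Files.newBufferedReader reports malformed input as an IOException instead of silently inserting replacement characters, so a file that is not really UTF-8 fails loudly.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class Utf8Copy {

    // Read the whole file as UTF-8 and return its contents as a String.
    static String readUtf8(Path path) throws IOException {
        StringBuilder text = new StringBuilder();
        try (BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
            char[] buffer = new char[1024];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                text.append(buffer, 0, n); // only the n chars filled by this read
            }
        }
        return text.toString();
    }

    // Encode text as UTF-8 onto the stream and flush, without closing the stream
    // (so it is safe to pass System.out).
    static void writeUtf8(String text, OutputStream out) throws IOException {
        Writer writer = new BufferedWriter(new OutputStreamWriter(out, StandardCharsets.UTF_8));
        writer.write(text);
        writer.flush();
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default path: a plain-text file saved as UTF-8 (not a .docx)
        Path input = Paths.get(args.length > 0 ? args[0] : "bible.txt");
        writeUtf8(readUtf8(input), System.out);
    }
}
```

Even with correct bytes on System.out, the console or IDE still needs a font with Tamil glyphs to show them.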
When I inspect the StringBuilder value, it shows the correct Unicode characters, but when the text is displayed in the output window, each character appears as a box.
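One way to tell a decoding problem from a font problem is to print the code points instead of the glyphs. A minimal sketch (the class CodePoints is hypothetical): if the output lists Tamil code points such as U+0B86, the String is correct and only the display font is missing glyphs; if it lists U+FFFD or unexpected values, the bytes were decoded with the wrong charset.

```java
public class CodePoints {

    // Describe each code point of the string as "U+XXXX", separated by spaces,
    // so the content can be checked without relying on any font.
    static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(describe("\u0B86\u0BA4\u0BBF")); // U+0B86 U+0BA4 U+0BBF
    }
}
```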
Update: I found the answer. The encoding was correct (UTF-8): Java reads the file as UTF-8, and the String holds the correct characters. The problem is that the font used by the NetBeans output panel has no glyphs for these characters. After changing the output panel's font (NetBeans > Tools > Options > Miscellaneous > Output tab), I got the expected result. The same applies when the text is displayed in a JTextArea: its font needs to be changed as well. However, we can't change the font of the Windows CMD prompt.
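For the JTextArea case, a minimal Swing sketch that picks the first installed font family for which Font.canDisplayUpTo reports that the whole string can be displayed (it returns -1 in that case). The class and method names are hypothetical, and which family gets chosen depends entirely on the fonts installed on the machine; the NetBeans output panel itself is configured through the options dialog, not through code like this.

```java
import java.awt.Font;
import java.awt.GraphicsEnvironment;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import javax.swing.JFrame;
import javax.swing.JScrollPane;
import javax.swing.JTextArea;
import javax.swing.SwingUtilities;

public class TamilTextArea {

    // Return the first family in the list whose font covers every character of text.
    static Optional<Font> firstDisplayable(String text, List<String> families, int size) {
        for (String family : families) {
            Font font = new Font(family, Font.PLAIN, size);
            // canDisplayUpTo returns -1 when the font can display the whole string
            if (font.canDisplayUpTo(text) == -1) {
                return Optional.of(font);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // The first word of the sample text in the question
        String text = "\u0B86\u0BA4\u0BBF\u0BAF\u0BBF\u0BB2\u0BC7";
        SwingUtilities.invokeLater(() -> {
            List<String> installed = Arrays.asList(
                    GraphicsEnvironment.getLocalGraphicsEnvironment().getAvailableFontFamilyNames());
            JTextArea area = new JTextArea(text, 5, 30);
            // If no installed font covers the text, the default font stays (boxes may remain)
            firstDisplayable(text, installed, 18).ifPresent(area::setFont);
            JFrame frame = new JFrame("Tamil text");
            frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
            frame.add(new JScrollPane(area));
            frame.pack();
            frame.setVisible(true);
        });
    }
}
```

Because getAvailableFontFamilyNames returns families in alphabetical order, "first match" is arbitrary; passing a short list of preferred families instead gives more predictable results.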
Solution
Because your output is encoded in UTF-8 but still contains replacement characters (U+FFFD), I believe the problem occurs when you read the data.
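A minimal sketch of how replacement characters appear when bytes are decoded with a charset other than the one used to write them. UTF-16LE as the "wrong" source encoding is an arbitrary assumption for illustration, and the class name is hypothetical; new String(byte[], Charset) never throws on bad input and substitutes U+FFFD instead.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class WrongCharsetDemo {

    // Encode text with one charset and decode the bytes with another,
    // simulating a file that is read with the wrong encoding.
    static String roundTrip(String text, Charset written, Charset read) {
        byte[] bytes = text.getBytes(written);
        // Malformed sequences are silently replaced with U+FFFD here
        return new String(bytes, read);
    }

    public static void main(String[] args) {
        String tamil = "\u0B86"; // U+0B86 TAMIL LETTER AA
        String same = roundTrip(tamil, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        String wrong = roundTrip(tamil, StandardCharsets.UTF_16LE, StandardCharsets.UTF_8);
        System.out.println(same.equals(tamil));           // true
        System.out.println(wrong.indexOf('\uFFFD') >= 0); // true: byte 0x86 cannot start a UTF-8 sequence
    }
}
```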
Make sure you know the encoding used by the input stream, and pass that encoding to the InputStreamReader. For Tamil text, I would guess it is UTF-8; I don't know whether Java supports TACE16. It looks like this:
StringBuilder text = new StringBuilder();
try (InputStream encoded = ...) {
    Reader decoded = new InputStreamReader(encoded, StandardCharsets.UTF_8);
    char[] buffer = new char[1024];
    while (true) {
        int n = decoded.read(buffer);
        if (n < 0) break;            // end of stream
        text.append(buffer, 0, n);   // append only the chars read in this pass
    }
}
String verse = text.toString();
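InputStreamReader built with a Charset replaces malformed input with U+FFFD, which hides the problem. A minimal sketch of strict decoding with a CharsetDecoder that reports errors instead (the class name is hypothetical). It also checks whether the running JVM knows a charset called "TACE16"; that exact name is only a guess, and the printed result depends on the JVM.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictDecode {

    // Decode bytes as UTF-8, throwing on malformed input instead of
    // silently substituting U+FFFD replacement characters.
    static String decodeStrict(byte[] bytes) throws CharacterCodingException {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return decoder.decode(ByteBuffer.wrap(bytes)).toString();
    }

    public static void main(String[] args) throws CharacterCodingException {
        // "TACE16" is a guessed charset name; the answer depends on this JVM
        System.out.println("TACE16 supported: " + Charset.isSupported("TACE16"));
        byte[] good = "\u0B86".getBytes(StandardCharsets.UTF_8);
        System.out.println(decodeStrict(good).equals("\u0B86")); // true
        byte[] bad = {(byte) 0xFF, (byte) 0xFE}; // not valid UTF-8
        try {
            decodeStrict(bad);
        } catch (CharacterCodingException e) {
            System.out.println("malformed input: " + e);
        }
    }
}
```

The same strict decoder can be passed to the stream reader with new InputStreamReader(encoded, decoder), so a file in an unexpected encoding fails with an exception at read time rather than producing boxes or replacement characters later.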