Strange behavior of Java Scanner when reading files
I ran into an interesting problem while using the Scanner class to read content from a file. I'm trying to read several output files, generated by a parsing application, from a directory in order to calculate some accuracy metrics.
My code simply walks through each file in the directory and opens it with a Scanner to process the content. For some reason, the Scanner did not read some of the files (all of them UTF-8 encoded). Even though the file was not empty, scanner.hasNextLine() returned false on the very first call (I stepped through it in the debugger and watched it happen). This happened whenever I initialized the Scanner directly from the File object (the File objects themselves were created successfully), namely:
File file = new File(pathName);
...
Scanner scanner = new Scanner(file);
I tried several things and finally solved the problem by initializing the Scanner as follows:
Scanner scanner = new Scanner(new FileInputStream(file));
Although I'm glad the problem is solved, I'm still curious about what could have been going wrong before. Any ideas? Thank you.
Solution
According to the Java source for Scanner in Java 6u23, a new line is detected with these patterns:
private static final String LINE_SEPARATOR_PATTERN = "\r\n|[\n\r\u2028\u2029\u0085]";
private static final String LINE_PATTERN = ".*("+LINE_SEPARATOR_PATTERN+")|.+$";
Therefore, you can check whether the following regular expression matches the contents of the files that are not being read:
.*(\r\n|[\n\r\u2028\u2029\u0085])|.+$
I would also check whether an exception is being thrown.
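On checking for exceptions: Scanner does not propagate an I/O error from its underlying source. It treats the error as end of input, so hasNextLine() simply returns false, and the last error is only available through scanner.ioException(). The sketch below shows one way to surface it. The class and method names (ScannerErrorCheck, countLines) are made up for illustration, and try-with-resources assumes Java 7 or later.

```java
import java.io.File;
import java.io.IOException;
import java.util.Scanner;

// Hypothetical helper, not part of the original question's code.
final class ScannerErrorCheck {

    // Counts the lines in a file and rethrows any I/O error that the
    // Scanner swallowed while reading (for example, a decoding failure).
    static int countLines(File file) throws IOException {
        try (Scanner scanner = new Scanner(file)) {
            int lines = 0;
            while (scanner.hasNextLine()) {
                scanner.nextLine();
                lines++;
            }
            // hasNextLine() returning false can mean "end of file" or
            // "the underlying read failed"; ioException() tells them apart.
            IOException swallowed = scanner.ioException();
            if (swallowed != null) {
                throw swallowed;
            }
            return lines;
        }
    }
}
```

If a non-empty file yields zero lines and this method throws, the exception message should point at the real cause rather than leaving an apparently empty file.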
Update: this made me curious, so I looked for the answer. It seems your problem has already been asked and answered here: "Java Scanner(File) misbehaving, but Scanner(FileInputStream) always works with the same file".
To sum up, it comes down to non-ASCII characters: the Scanner behaves differently depending on whether you initialize it with a File or with a FileInputStream.
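Since the files are said to be UTF-8, one option worth trying is to pass the charset to the Scanner explicitly instead of relying on the platform default. This is a minimal sketch, assuming the files really are valid UTF-8 and that the default charset on the machine is something else. The class and method names (Utf8ScannerExample, readLines) are hypothetical, and the code assumes Java 7 or later.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical helper, not part of the original question's code.
final class Utf8ScannerExample {

    // Reads all lines of a file, decoding it as UTF-8 regardless of
    // the platform default charset.
    static List<String> readLines(File file) throws IOException {
        List<String> lines = new ArrayList<String>();
        // Scanner(File, String) takes the charset name as its second argument.
        try (Scanner scanner = new Scanner(file, "UTF-8")) {
            while (scanner.hasNextLine()) {
                lines.add(scanner.nextLine());
            }
        }
        return lines;
    }
}
```

With the charset stated explicitly, the result should no longer depend on whether the Scanner is built from a File or from a stream.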