Junior sister learns Java IO: reading files

Introduction

Junior sister is a little confused about Readers and Streams in Java IO and doesn't know which one to use. What is the right way to read a file? Today, senior brother F answers her questions on the spot.

Characters and bytes

Junior sister has been puzzled lately: Senior brother F, last time you mentioned that IO reading falls into two categories, Reader and InputStream. What is the difference between them? And why do I see classes that seem to be both a Reader and a Stream, such as InputStreamReader?

Junior sister, do you know the three ultimate questions of philosophy? Who are you? Where do you come from? Where are you going?

Senior brother F, are you confused? I'm asking you about Java. Why are you talking about philosophy?

Junior sister, philosophy is actually the foundation of all knowledge. Do you know how "scientific principles" is rendered in English? "The philosophy of science."

What do you think code in a computer essentially is? In essence, it is a long string of binary digits, 0s and 1s. Put enough of them together and they become the code in the computer, that is, the binary code that the JVM can recognize and run.

Junior sister is impressed: what senior brother F says sounds very reasonable, but what does it have to do with Reader and InputStream?

Don't worry, everything is connected. Let me ask you a question first: what is the smallest unit of storage in Java?

Junior sister: let me think. The smallest thing in Java should be boolean, since true and false correspond to binary 1 and 0.

Close. Although a boolean carries only one bit of information, it still occupies a whole byte of space. The smallest unit of storage in Java is actually the byte. If you don't believe it, you can verify it with the JOL tool I introduced earlier:

[main] INFO com.flydean.JolUsage - java.lang.Boolean object internals:
 OFFSET  SIZE      TYPE DESCRIPTION                               VALUE
      0    12           (object header)                           N/A
     12     1   boolean Boolean.value                             N/A
     13     3           (loss due to the next object alignment)
Instance size: 16 bytes
Space losses: 0 bytes internal + 3 bytes external = 3 bytes total

The above is the boxed Boolean. Although the Boolean object occupies 16 bytes in total, the boolean value inside it takes only 1 byte.

The byte is the basic unit of storage in Java.

With bytes in hand, we can explain characters. Characters are made up of bytes: depending on the encoding, a character may take one, two, or more bytes. The Chinese characters and English letters that we humans can read with the naked eye are all characters.

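A Reader reads characters, decoding them according to a particular encoding, while an InputStream reads the underlying bytes directly. InputStreamReader is the bridge between the two: it wraps an InputStream and decodes its bytes into characters, which is why its name contains both words.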

Junior sister: I see. So we can use a Reader for text files and an InputStream for non-text files.

A teachable student indeed. Junior sister is making rapid progress.
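
The difference between bytes and characters can be checked with a minimal sketch. It is not from the original lesson; the class name BytesVsChars and the sample text are made up for illustration. It encodes a short mixed English and Chinese string as UTF-8 and compares the number of bytes with the number of characters a Reader decodes from them:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original article.
class BytesVsChars {

    // Number of bytes the text occupies once encoded as UTF-8.
    static int utf8ByteCount(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of chars a Reader yields when it decodes the given bytes as UTF-8.
    static int decodedCharCount(byte[] utf8Bytes) {
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(utf8Bytes), StandardCharsets.UTF_8)) {
            int count = 0;
            while (reader.read() != -1) { // read() returns one char, or -1 at the end
                count++;
            }
            return count;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // "Java" followed by two Chinese characters, written as Unicode escapes
        String text = "Java\u4f60\u597d";
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        System.out.println("bytes: " + bytes.length);            // prints bytes: 10
        System.out.println("chars: " + decodedCharCount(bytes)); // prints chars: 6
    }
}
```

The four ASCII letters take one byte each, while each of the two Chinese characters takes three bytes in UTF-8, so an InputStream sees 10 bytes where a Reader sees 6 characters.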

Read by character

Junior sister, next senior brother F will show you several ways to read a file by character. The first is FileReader. On its own, FileReader only offers the basic read() methods inherited from Reader, which return a single character or fill a char array. To read conveniently, we wrap the FileReader in a BufferedReader, which adds a read buffer and can read a whole line at a time:

public void withFileReader() throws IOException {
    File file = new File("src/main/resources/www.flydean.com");

    try (FileReader fr = new FileReader(file); BufferedReader br = new BufferedReader(fr)) {
        String line;
        while ((line = br.readLine()) != null) {
            if (line.contains("www.flydean.com")) {
                log.info(line);
            }
        }
    }
}
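
One detail worth knowing: FileReader(File) decodes with the JVM's default charset, which, as far as I know, depended on the platform before Java 18. A minimal sketch of the same line-by-line read with an explicit charset follows. The class ExplicitCharsetRead, its method names, and the temporary file contents are assumptions made up for this example, not part of the original lesson:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class; names are illustrative only.
class ExplicitCharsetRead {

    // Returns the lines of a UTF-8 text file that contain the keyword.
    static List<String> matchingLines(Path file, String keyword) throws IOException {
        List<String> matches = new ArrayList<>();
        // newBufferedReader decodes with the charset we pass in, not the platform default
        try (BufferedReader br = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = br.readLine()) != null) {
                if (line.contains(keyword)) {
                    matches.add(line);
                }
            }
        }
        return matches;
    }

    // Writes a small temporary file and reads it back, so the sketch runs anywhere.
    static List<String> runDemo() {
        try {
            Path tmp = Files.createTempFile("reader-demo", ".txt");
            try {
                Files.write(tmp, Arrays.asList("hello", "visit www.flydean.com", "bye"),
                        StandardCharsets.UTF_8);
                return matchingLines(tmp, "www.flydean.com");
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(runDemo()); // prints [visit www.flydean.com]
    }
}
```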

Since we read one line at a time, these lines can also be treated as a stream. Files.lines gives us a Stream<String>, which we can process with lambda expressions. This is the second method:

public void withStream() throws IOException {
    Path filePath = Paths.get("src/main/resources", "www.flydean.com");
    try (Stream<String> lines = Files.lines(filePath)) {
        List<String> filteredLines = lines.filter(s -> s.contains("www.flydean.com"))
                .collect(Collectors.toList());
        filteredLines.forEach(log::info);
    }
}

The third is less commonly used, but senior brother still wants to teach it to you: the Scanner class from java.util. A Scanner can split a file on line breaks, which also works well:

public void withScanner() throws FileNotFoundException {
    // try-with-resources closes the Scanner (and the stream it wraps) even if an exception is thrown
    try (Scanner scanner = new Scanner(new FileInputStream("src/main/resources/www.flydean.com"), "UTF-8")) {
        scanner.useDelimiter("\n");
        // with a line break as the delimiter, each token is one line
        while (scanner.hasNext()) {
            log.info(scanner.next());
        }
    }
}

Read by byte

Junior sister was very satisfied and urged me on: Senior brother F, I understand reading by character now. Quick, tell me about reading by byte.

I nodded. Junior sister, do you remember the essence we talked about in philosophy? Bytes are the essence of storage in Java. Only by grasping the essence can we see through all illusions.

Remember the Files utility class mentioned earlier? It provides many methods for file operations, including one that reads all bytes. Junior sister, take note: this reads all the bytes into memory at once! Use it with caution, and only for small files. Remember that.

public void readBytes() throws IOException {
    Path path = Paths.get("src/main/resources/www.flydean.com");
    byte[] data = Files.readAllBytes(path);
    log.info("{}", data);
}
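
One way to follow that advice is to check the file size before loading it. The following is only a sketch under assumptions: the class GuardedReadAll, the method readSmallFile, and the idea of passing a byte limit are invented for this example and are not part of the Files API:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper; the limit parameter is an assumption of this sketch.
class GuardedReadAll {

    // Reads the whole file only if it is no larger than maxBytes.
    static byte[] readSmallFile(Path path, long maxBytes) throws IOException {
        long size = Files.size(path);
        if (size > maxBytes) {
            throw new IOException("File too large to load at once: " + size + " bytes");
        }
        return Files.readAllBytes(path);
    }

    // Writes five bytes to a temporary file and reads them back with a 1 KiB limit.
    static int runDemo() {
        try {
            Path tmp = Files.createTempFile("small-demo", ".bin");
            try {
                Files.write(tmp, new byte[] {1, 2, 3, 4, 5});
                return readSmallFile(tmp, 1024).length;
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(runDemo()); // prints 5
    }
}
```

The right limit depends on the heap available to the application, so the value used here is only a placeholder.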

If it is a large file, you can use FileInputStream to read a certain number of bytes at a time:

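public void readWithStream() throws IOException {
    File file = new File("src/main/resources/www.flydean.com");
    // fixed-size chunk, independent of the file size
    byte[] buffer = new byte[1024];
    try (FileInputStream fileInputStream = new FileInputStream(file)) {
        int bytesRead;
        // read() returns the number of bytes actually read, or -1 at the end of the file
        while ((bytesRead = fileInputStream.read(buffer)) != -1) {
            for (int i = 0; i < bytesRead; i++) {
                log.info("{}", buffer[i]);
            }
        }
    }
}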
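
The important habit when reading in chunks is to use the return value of read(), because a call may fill only part of the array. As a small sketch of that pattern (the class ChunkedChecksum and its methods are made up for this example), here is a CRC32 checksum computed over a stream a few bytes at a time, using an in-memory stream in place of a real file:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical demo class; names are illustrative only.
class ChunkedChecksum {

    // Computes the CRC32 of everything in the stream, reading chunkSize bytes at a time.
    static long crc32Of(InputStream in, int chunkSize) throws IOException {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        CRC32 crc = new CRC32();
        byte[] chunk = new byte[chunkSize];
        int bytesRead;
        while ((bytesRead = in.read(chunk)) != -1) {
            // only the first bytesRead bytes of the chunk are valid
            crc.update(chunk, 0, bytesRead);
        }
        return crc.getValue();
    }

    // Uses an in-memory stream so the sketch does not depend on any file.
    static long runDemo() {
        byte[] data = "123456789".getBytes(StandardCharsets.US_ASCII);
        try (InputStream in = new ByteArrayInputStream(data)) {
            return crc32Of(in, 4); // 9 bytes in chunks of 4 forces several reads
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(Long.toHexString(runDemo())); // prints cbf43926
    }
}
```

Because memory use is bounded by the chunk size, the same loop works for a FileInputStream over a file of any size.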

A stream hands over its data as a plain sequence of bytes, which can be slow. We can use FileChannel and ByteBuffer from NIO, which read a block at a time, to speed things up:

public void readWithBlock() throws IOException {
    try (RandomAccessFile aFile = new RandomAccessFile("src/main/resources/www.flydean.com", "r");
         FileChannel inChannel = aFile.getChannel()) {
        ByteBuffer buffer = ByteBuffer.allocate(1024);
        while (inChannel.read(buffer) > 0) {
            buffer.flip();
            for (int i = 0; i < buffer.limit(); i++) {
                log.info("{}", buffer.get());
            }
            buffer.clear();
        }
    }
}
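
The flip() and clear() calls are easy to get wrong, so here is a minimal sketch of what they do to a buffer's position and limit. The class BufferStates and its state helper are made up for this illustration:

```java
import java.nio.ByteBuffer;

// Hypothetical demo class, not part of the original article.
class BufferStates {

    // Formats the three indices that describe a buffer's state.
    static String state(ByteBuffer b) {
        return "pos=" + b.position() + " lim=" + b.limit() + " cap=" + b.capacity();
    }

    public static void main(String[] args) {
        ByteBuffer buffer = ByteBuffer.allocate(8);
        System.out.println(state(buffer)); // prints pos=0 lim=8 cap=8

        // Writing (as channel.read(buffer) does) advances the position.
        buffer.put((byte) 1).put((byte) 2).put((byte) 3);
        System.out.println(state(buffer)); // prints pos=3 lim=8 cap=8

        // flip() switches to reading: limit = old position, position = 0.
        buffer.flip();
        System.out.println(state(buffer)); // prints pos=0 lim=3 cap=8

        // Each get() consumes one byte and advances the position.
        buffer.get();
        System.out.println(state(buffer)); // prints pos=1 lim=3 cap=8

        // clear() resets the indices for the next write; it does not erase the data.
        buffer.clear();
        System.out.println(state(buffer)); // prints pos=0 lim=8 cap=8
    }
}
```

Without flip(), the read loop would start at the position where writing stopped and would see only unwritten bytes.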

Junior sister: is there an even faster way to read very, very large files?

Of course. Remember the virtual address space mapping we talked about last time?

We can map the file directly into the process's virtual address space, so that user code and the operating system work on the same memory. This avoids the performance overhead of copying data between them:

public void copyWithMap() throws IOException {
    try (RandomAccessFile aFile = new RandomAccessFile("src/main/resources/www.flydean.com", "r");
         FileChannel inChannel = aFile.getChannel()) {
        // map() takes a mode, a start position, and a size; here the whole file is mapped from offset 0
        MappedByteBuffer buffer = inChannel.map(FileChannel.MapMode.READ_ONLY, 0, inChannel.size());
        buffer.load();
        for (int i = 0; i < buffer.limit(); i++) {
            log.info("{}", buffer.get());
        }
        buffer.clear();
    }
}
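
A single call to map() cannot cover more than Integer.MAX_VALUE bytes, so a really large file has to be mapped one window at a time. The following is a sketch of that idea under assumptions: the class WindowedMap, the method sumBytes, and the tiny window size are invented for this example, and a real program would use windows of many megabytes:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical demo class; names and window size are illustrative only.
class WindowedMap {

    // Sums all bytes of the file (as unsigned values), mapping windowSize bytes at a time.
    static long sumBytes(Path file, int windowSize) throws IOException {
        long sum = 0;
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = channel.size();
            for (long pos = 0; pos < size; pos += windowSize) {
                // the last window may be shorter than windowSize
                long len = Math.min(windowSize, size - pos);
                MappedByteBuffer window = channel.map(FileChannel.MapMode.READ_ONLY, pos, len);
                while (window.hasRemaining()) {
                    sum += window.get() & 0xFF;
                }
            }
        }
        return sum;
    }

    // Writes the bytes 1..10 to a temporary file and sums them through 4-byte windows.
    static long runDemo() {
        try {
            Path tmp = Files.createTempFile("map-demo", ".bin");
            // a mapped file may stay locked on some platforms, so delete it at JVM exit
            tmp.toFile().deleteOnExit();
            Files.write(tmp, new byte[] {1, 2, 3, 4, 5, 6, 7, 8, 9, 10});
            return sumBytes(tmp, 4);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(runDemo()); // prints 55
    }
}
```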

Finding the line number of an error

Junior sister: great! Senior brother F, you explain things very well. I have one more question: recently I have been parsing files, and some of them are not in a standard format. Parsing fails halfway through, but there is no error message, so it is hard to locate the problem. Is there a good solution?

It's getting late, so senior brother will teach you one more trick. Java has a class called LineNumberReader. You can use it to read a file and print line numbers along the way. Does that meet your needs?

public void useLineNumberReader() throws IOException {
    try (LineNumberReader lineNumberReader = new LineNumberReader(new FileReader("src/main/resources/www.flydean.com"))) {
        // print the initial line number (0 before anything is read)
        log.info("Line {}", lineNumberReader.getLineNumber());
        // reset the line number; counting continues from this value
        lineNumberReader.setLineNumber(2);
        // get the current line number
        log.info("Line {} ", lineNumberReader.getLineNumber());
        // read the whole file, logging each line with its number
        String line = null;
        while ((line = lineNumberReader.readLine()) != null) {
            log.info("Line {} is : {}", lineNumberReader.getLineNumber(), line);
        }
    }
}
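
For junior sister's parsing problem, the line number becomes useful at the moment parsing fails. Here is a minimal sketch, assuming a made-up file format in which every line must be an integer. The class ParseWithLineNumbers and its methods are hypothetical and only illustrate the idea:

```java
import java.io.IOException;
import java.io.LineNumberReader;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical demo class; the "one integer per line" format is an assumption.
class ParseWithLineNumbers {

    // Returns the 1-based number of the first line that is not an integer, or -1 if all lines parse.
    static int firstBadLine(Reader source) throws IOException {
        try (LineNumberReader reader = new LineNumberReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                try {
                    Integer.parseInt(line.trim());
                } catch (NumberFormatException e) {
                    // after readLine(), getLineNumber() is the number of the line just read
                    return reader.getLineNumber();
                }
            }
        }
        return -1;
    }

    // Convenience wrapper that parses text held in memory.
    static int firstBadLineIn(String text) {
        try {
            return firstBadLine(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(firstBadLineIn("10\n20\nabc\n40\n")); // prints 3
    }
}
```

In a real parser the same call to getLineNumber() can go into the exception message or the log entry, so a failure halfway through the file points straight at the offending line.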