Reading and writing Word Doc files in Java with Apache POI
The HWPF module of Apache POI is dedicated to reading and writing Word Doc (.doc) files. In HWPF, a Word document is represented by HWPFDocument, which works with the following concepts:
Range: a range of the document, which can be the whole document, a section, a paragraph, or a piece of text sharing the same attributes (CharacterRun).
Section: a section of a Word document. A Word document can consist of multiple sections.
Paragraph: a paragraph of a Word document. A section can consist of multiple paragraphs.
CharacterRun: a piece of text with the same attributes. A paragraph can consist of multiple CharacterRuns.
Table: a table.
TableRow: a row of a table.
TableCell: a cell of a table.
Section, Paragraph, CharacterRun and Table all inherit from Range.
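Because these classes all inherit from Range, the hierarchy can be walked from the whole-document Range down to individual CharacterRuns. The sketch below is a minimal illustration, not the article's original code: it assumes the path of an existing .doc file is passed as args[0], and the class name `RangeWalk` and helper `clean` are hypothetical. `clean` strips the trailing control characters (paragraph mark, cell mark) that HWPF usually includes in `text()`.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.usermodel.CharacterRun;
import org.apache.poi.hwpf.usermodel.Paragraph;
import org.apache.poi.hwpf.usermodel.Range;
import org.apache.poi.hwpf.usermodel.Section;

public class RangeWalk {

    /** Hypothetical helper: drops trailing \r, \n and the table cell mark (char 7). */
    static String clean(String text) {
        int end = text.length();
        while (end > 0) {
            char c = text.charAt(end - 1);
            if (c == '\r' || c == '\n' || c == (char) 7) {
                end--;
            } else {
                break;
            }
        }
        return text.substring(0, end);
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to an existing .doc file (assumed input)
        try (InputStream in = new FileInputStream(args[0])) {
            HWPFDocument doc = new HWPFDocument(in);
            Range range = doc.getRange(); // Range covering the whole document

            for (int s = 0; s < range.numSections(); s++) {
                Section section = range.getSection(s);
                for (int p = 0; p < section.numParagraphs(); p++) {
                    Paragraph paragraph = section.getParagraph(p);
                    System.out.println("Section " + s + ", paragraph " + p + ": "
                            + clean(paragraph.text()));
                    for (int r = 0; r < paragraph.numCharacterRuns(); r++) {
                        CharacterRun run = paragraph.getCharacterRun(r);
                        // getFontSize() is in half-points, so divide by 2 for points
                        System.out.println("  run: font=" + run.getFontName()
                                + ", size=" + (run.getFontSize() / 2.0) + "pt"
                                + ", bold=" + run.isBold()
                                + ", text=" + clean(run.text()));
                    }
                }
            }
        }
    }
}
```

Each level is just a Range with its own `numXxx()`/`getXxx(i)` accessors, which is what the shared superclass buys us.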
1. Reading Word Doc files
In everyday applications it is rare to read information from Word files; far more often we write content into them. There are two main ways to read data from a Word Doc file with POI: through WordExtractor and through HWPFDocument. Internally, WordExtractor itself obtains its information through an HWPFDocument.
1.1 Reading files through WordExtractor
With WordExtractor we can only read the text content of a file and some document-level properties; we cannot read the attributes of the content itself (fonts, styles and so on). To read those, use HWPFDocument instead.
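A minimal sketch of reading a file with WordExtractor, assuming the .doc path is passed as args[0]. It reads the full text, the text split into paragraphs, and two document-level properties from the summary information (which may be absent, hence the null check). The class `ExtractorDemo` and its helper `countNonBlank` are hypothetical names for illustration.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import org.apache.poi.hpsf.SummaryInformation;
import org.apache.poi.hwpf.extractor.WordExtractor;

public class ExtractorDemo {

    /** Hypothetical helper: counts paragraphs containing more than whitespace/control chars. */
    static int countNonBlank(String[] paragraphs) {
        int count = 0;
        for (String p : paragraphs) {
            // trim() removes all chars <= ' ', including \r and \n
            if (!p.trim().isEmpty()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to an existing .doc file (assumed input)
        try (InputStream in = new FileInputStream(args[0])) {
            WordExtractor extractor = new WordExtractor(in);

            // Plain text of the whole document
            System.out.println(extractor.getText());

            // The same text, one array element per paragraph
            String[] paragraphs = extractor.getParagraphText();
            System.out.println("Non-blank paragraphs: " + countNonBlank(paragraphs));

            // Document-level properties; the summary stream may be missing
            SummaryInformation info = extractor.getSummaryInformation();
            if (info != null) {
                System.out.println("Title: " + info.getTitle());
                System.out.println("Author: " + info.getAuthor());
            }
        }
    }
}
```

Nothing here exposes fonts or styles of the content, which is exactly the limitation described above.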
1.2 Reading documents through HWPFDocument
HWPFDocument represents the Word document itself and is more powerful than WordExtractor. With it we can read the tables and lists in a document, and add, modify and delete its content. These additions, modifications and deletions are stored only in the HWPFDocument; in other words, we change the in-memory HWPFDocument, not the file on disk. To make the changes take effect, call the write method of HWPFDocument to output the modified document to an output stream: the output stream of the original file, the output stream of a new file (equivalent to "Save As"), or any other output stream.
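A minimal sketch of both points: reading every table through HWPFDocument, then making a small change and writing the in-memory document to a new file ("Save As"). It assumes args[0] is an existing .doc file and args[1] the output path; `TableDemo`, `toTsv` and the inserted "[checked] " marker are hypothetical.

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;

import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.usermodel.Range;
import org.apache.poi.hwpf.usermodel.Table;
import org.apache.poi.hwpf.usermodel.TableCell;
import org.apache.poi.hwpf.usermodel.TableIterator;
import org.apache.poi.hwpf.usermodel.TableRow;

public class TableDemo {

    /** Hypothetical helper: joins rows of cell texts into tab-separated lines. */
    static String toTsv(List<List<String>> rows) {
        StringBuilder sb = new StringBuilder();
        for (List<String> row : rows) {
            for (int i = 0; i < row.size(); i++) {
                if (i > 0) {
                    sb.append('\t');
                }
                sb.append(row.get(i));
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        HWPFDocument doc;
        try (InputStream in = new FileInputStream(args[0])) {
            doc = new HWPFDocument(in); // the document is loaded into memory
        }
        Range range = doc.getRange();

        // Read every table in the document
        TableIterator tables = new TableIterator(range);
        while (tables.hasNext()) {
            Table table = tables.next();
            List<List<String>> rows = new ArrayList<>();
            for (int r = 0; r < table.numRows(); r++) {
                TableRow row = table.getRow(r);
                List<String> cells = new ArrayList<>();
                for (int c = 0; c < row.numCells(); c++) {
                    TableCell cell = row.getCell(c);
                    // trim() also drops the trailing cell mark character
                    cells.add(cell.text().trim());
                }
                rows.add(cells);
            }
            System.out.print(toTsv(rows));
        }

        // This edit changes only the in-memory HWPFDocument...
        range.insertBefore("[checked] ");

        // ...until it is written out, here to a new file ("Save As")
        try (OutputStream out = new FileOutputStream(args[1])) {
            doc.write(out);
        }
    }
}
```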
2. Writing Word Doc files
When writing a Word Doc file with POI, we must start from an existing .doc file, because writing goes through an HWPFDocument, and an HWPFDocument is always attached to a document. The usual approach is therefore to prepare a blank document on disk, create an HWPFDocument from it, add new content to the HWPFDocument, and then write it to another .doc file. The result is equivalent to generating a Word Doc file with POI.
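A minimal sketch of that workflow, assuming args[0] is a blank document saved from Word in .doc format and args[1] is the output path. The text is inserted at the start of the blank document's Range; `BlankDocWriter` and `isDocPath` are hypothetical names, and the extension check only reflects that HWPF handles the binary .doc format, not .docx.

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Locale;

import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.usermodel.Range;

public class BlankDocWriter {

    /** Hypothetical guard: HWPF reads and writes .doc files, not .docx. */
    static boolean isDocPath(String path) {
        return path.toLowerCase(Locale.ROOT).endsWith(".doc");
    }

    public static void main(String[] args) throws IOException {
        String blankPath = args[0];  // assumed: blank document saved as .doc
        String outputPath = args[1]; // the file to generate
        if (!isDocPath(blankPath) || !isDocPath(outputPath)) {
            throw new IllegalArgumentException("Expected .doc paths");
        }

        // HWPFDocument needs an existing document to attach to
        HWPFDocument doc;
        try (InputStream in = new FileInputStream(blankPath)) {
            doc = new HWPFDocument(in);
        }

        // Add content at the start of the (empty) document
        Range range = doc.getRange();
        range.insertBefore("Generated with Apache POI HWPF");

        // Write to a different file; the blank template stays untouched
        try (OutputStream out = new FileOutputStream(outputPath)) {
            doc.write(out);
        }
    }
}
```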
In practice, the Word files we generate are usually of one fixed type: the layout is fixed and only some fields differ. So instead of building the whole document through HWPFDocument, we create a Word document on disk whose content is the file we want to generate, with the variable parts replaced by placeholders such as "${paramName}". To generate a file from some data, we then only need to load an HWPFDocument from this template, call Range's replaceText() method to replace each placeholder with its value, and write the HWPFDocument to a new output stream. This approach is the more common one in practice, because it both reduces our workload and keeps the document's formatting clear. Let's work through an example of this approach.
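To make the "${paramName}" convention concrete independently of POI, here is a plain-Java sketch of the same substitution applied to a String: every `${key}` marker is swapped for its value. The class `PlaceholderDemo` and the sample values are hypothetical; in the POI version, Range.replaceText performs the substitution on the document text instead.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PlaceholderDemo {

    /** Replaces every ${key} in the template with the corresponding value. */
    static String fill(String template, Map<String, String> values) {
        String result = template;
        for (Map.Entry<String, String> e : values.entrySet()) {
            // String.replace substitutes all literal occurrences
            result = result.replace("${" + e.getKey() + "}", e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> values = new LinkedHashMap<>();
        values.put("name", "Alice");     // hypothetical sample data
        values.put("date", "2013-01-01");
        System.out.println(fill("Dear ${name}, your order of ${date} has shipped.", values));
        // prints: Dear Alice, your order of 2013-01-01 has shipped.
    }
}
```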
Suppose we have some information that changes from case to case, and we need to generate Word Doc files in a fixed format from it.
Following the approach described above, the first step is to create a .doc file in that format to serve as the template.
With this template, we create the corresponding HWPFDocument, replace each placeholder with its value, and then write the HWPFDocument to the target output stream.
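A minimal sketch of those steps, assuming args[0] is the template .doc containing placeholders like `${name}` and args[1] is the output path. The field names and values (`name`, `department`) are hypothetical, as are `TemplateFill` and its `placeholders` helper. A fresh Range is fetched for each replacement so its bounds reflect the earlier edits.

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.LinkedHashMap;
import java.util.Map;

import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.usermodel.Range;

public class TemplateFill {

    /** Hypothetical helper: turns {name -> value} into {"${name}" -> value}. */
    static Map<String, String> placeholders(Map<String, String> values) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : values.entrySet()) {
            result.put("${" + e.getKey() + "}", e.getValue());
        }
        return result;
    }

    static void fill(String templatePath, String outputPath, Map<String, String> values)
            throws IOException {
        // 1. Create the HWPFDocument from the template
        HWPFDocument doc;
        try (InputStream in = new FileInputStream(templatePath)) {
            doc = new HWPFDocument(in);
        }

        // 2. Replace each placeholder with its value
        for (Map.Entry<String, String> e : placeholders(values).entrySet()) {
            Range range = doc.getRange();
            range.replaceText(e.getKey(), e.getValue());
        }

        // 3. Write the result to a new file; the template is left unchanged
        try (OutputStream out = new FileOutputStream(outputPath)) {
            doc.write(out);
        }
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> values = new LinkedHashMap<>();
        values.put("name", "Alice");        // hypothetical field values
        values.put("department", "Sales");
        fill(args[0], args[1], values);
    }
}
```

Keeping the template on disk means the layout is edited in Word, while the code only supplies the variable values.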
(Note: this article is based on POI 3.9.)