Determine whether a document is DOC or DOCX in a Java application without knowing its extension

Our content management system has a constraint that requires all Word documents to be stored with a specific extension (different from doc or docx). However, when serving a document to users, we need to know whether it is a doc or docx file so that we can provide the correct MIME type.

So, is there any way to programmatically find out whether the document is doc or docx through its content?

Solution

The Forensics Wiki details many different file types. It describes the headers of DOC and DOCX files, so you should be able to read the first bytes of a file and determine its type.

A DOC file is an OLE compound file, which should begin with the following binary header:

d0 cf 11 e0 a1 b1 1a e1

In contrast, DOCX files are ZIP archives, so they begin with the binary signature:

50 4b
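
A minimal sketch of that check in Java follows, assuming Java 11 or later for `InputStream.readNBytes`. The class name `WordFormatSniffer` and the method `guessMimeType` are hypothetical names chosen for illustration. Note that both signatures are shared with other formats: the OLE header also starts legacy XLS and PPT files, and `50 4b` starts every ZIP archive, including XLSX and PPTX. The sketch therefore relies on the premise from the question that the file is already known to be a Word document.

```java
import java.io.IOException;
import java.io.InputStream;

final class WordFormatSniffer {

    // OLE compound file signature (legacy DOC, but also XLS, PPT, ...).
    private static final byte[] OLE_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // ZIP signature "PK" (DOCX, but also any other ZIP-based file).
    private static final byte[] ZIP_MAGIC = {0x50, 0x4B};

    static final String DOC_MIME = "application/msword";
    static final String DOCX_MIME =
        "application/vnd.openxmlformats-officedocument.wordprocessingml.document";

    /** Returns the guessed MIME type, or null if neither signature matches. */
    static String guessMimeType(byte[] head) {
        if (startsWith(head, OLE_MAGIC)) {
            return DOC_MIME;
        }
        if (startsWith(head, ZIP_MAGIC)) {
            return DOCX_MIME;
        }
        return null;
    }

    /** Reads only the first 8 bytes of the stream and inspects them. */
    static String guessMimeType(InputStream in) throws IOException {
        return guessMimeType(in.readNBytes(OLE_MAGIC.length));
    }

    // True if data begins with every byte of prefix, in order.
    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}
```

The stream overload consumes the bytes it reads, so when the same stream is later sent to the user it would need to be reopened or wrapped in a `BufferedInputStream` with `mark` and `reset`. For a stricter check, one could open the ZIP and look for the Word-specific parts inside it, but that goes beyond the signature comparison described above.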