Writing Russian text to a PDF using the Java PDFBox library

I'm using a Java library called PDFBox to write text to a PDF. It works for English text, but when I try to write Russian text to the PDF, the letters come out looking strange. The problem seems to be the font used, but I'm not sure, so I hope someone can guide me through this. Here are the important lines of code:

PDTrueTypeFont font = PDTrueTypeFont.loadTTF( pdfFile,new File( "fonts/VREMACCI.TTF" ) );  // Windows Russian font imported to write the Russian text.
font.setEncoding( new WinAnsiEncoding() );  // Define the Encoding used in writing.
// Some code here to open the PDF & define a new page.
contentStream.drawString( "отделом компьютерной" ); // Write the Russian text.

The source code of WinAnsiEncoding is: click here

------- Edited on November 18, 2009

After some investigation, I am now sure that this is an encoding problem. It looks like it can be solved by defining my own encoding with the PDFBox class named DictionaryEncoding.

I don't know how to use it, but here's what I've tried so far:

COSDictionary cosDic = new COSDictionary();
cosDic.setString( COSName.getPDFName("Ercyrillic"),"0420 " ); // Russian letter.
font.setEncoding( new DictionaryEncoding( cosDic ) );

This doesn't work, apparently because I'm filling the dictionary the wrong way. When I write a PDF page with this encoding, the page comes out blank.

The source code of DictionaryEncoding is: click here

Solution

The long story goes like this: to output Unicode text in a PDF from a TrueType font, the output must contain a lot of detailed and seemingly redundant information. It boils down to this: inside a TrueType font, glyphs are stored under glyph IDs. These glyph IDs are associated with specific Unicode characters (and IIRC, a single glyph may internally refer to several code points, such as é referring to e plus an acute accent; my memory is hazy). PDF doesn't really have Unicode support beyond saying that there is a mapping from the UTF-16BE values in a string to glyph IDs in a TrueType font, plus a mapping from UTF-16BE values to Unicode, even if that mapping is the identity.

The output needs a font dictionary of Subtype Type0 with:

- a DescendantFonts array with an entry described below
- a ToUnicode entry that maps UTF-16BE values to Unicode
- an Encoding set to Identity-H
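
With Identity-H as the encoding, every character in a shown string is written as a 2-byte code. As a minimal sketch (this is not PDFBox API; the class and method names are made up for illustration), a Java string could be turned into the UTF-16BE hex string for the content stream like this. It assumes every character is in the Basic Multilingual Plane and that the font's CIDToGIDMap maps those 2-byte codes to the right glyphs:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of PDFBox.
final class PdfHexString {

    // Encodes text as a PDF hex string of UTF-16BE code units, e.g. <043E0442>.
    // Assumption: BMP characters only; a surrogate pair would come out as two codes.
    static String toHexString(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_16BE); // no byte order mark
        StringBuilder sb = new StringBuilder(bytes.length * 2 + 2);
        sb.append('<');
        for (byte b : bytes) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        sb.append('>');
        return sb.toString();
    }

    public static void main(String[] args) {
        // \u043E\u0442 are the first two letters of the question's Russian text.
        System.out.println(toHexString("\u043E\u0442") + " Tj"); // prints <043E0442> Tj
    }
}
```

The Unicode escapes are used only so the example does not depend on the source file's encoding.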

The output of a unit test in my own tool is as follows:

13 0 obj
<< 
   /BaseFont /DejaVuSansCondensed 
   /DescendantFonts [ 4 0 R  ]   
   /ToUnicode 14 0 R 
   /Type /Font 
   /Subtype /Type0 
   /Encoding /Identity-H 
>> endobj

14 0 obj
<< /Length 346 >> stream
/CIDInit /ProcSet findresource begin 12 dict begin begincmap /CIDSystemInfo <<
/Registry (Adobe) /Ordering (UCS) /Supplement 0 >> def /CMapName /Adobe-Identity-UCS
def /CMapType 2 def 1 begincodespacerange <0000> <FFFF> endcodespacerange 1
beginbfrange <0000> <FFFF> <0000> endbfrange endcmap CMapName currentdict /CMap
defineresource pop end end

endstream % please note that the format of this stream is incorrect
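
The stream above is shown with its line breaks collapsed. As a rough sketch, the same identity ToUnicode CMap could be emitted from Java with one operator per line. The tokens are exactly those in the dump; where the line breaks go is an assumption based on the usual CMap layout, and the class and method names are hypothetical:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of PDFBox.
final class ToUnicodeCMap {

    // Returns the identity ToUnicode CMap from the dump, one statement per line.
    static String identityCMap() {
        return String.join("\n",
            "/CIDInit /ProcSet findresource begin",
            "12 dict begin",
            "begincmap",
            "/CIDSystemInfo << /Registry (Adobe) /Ordering (UCS) /Supplement 0 >> def",
            "/CMapName /Adobe-Identity-UCS def",
            "/CMapType 2 def",
            "1 begincodespacerange",
            "<0000> <FFFF>",
            "endcodespacerange",
            "1 beginbfrange",
            "<0000> <FFFF> <0000>",
            "endbfrange",
            "endcmap",
            "CMapName currentdict /CMap defineresource pop",
            "end",
            "end",
            "");
    }

    public static void main(String[] args) {
        String cmap = identityCMap();
        // /Length must be the byte count of the stream data actually written.
        int length = cmap.getBytes(StandardCharsets.US_ASCII).length;
        System.out.println("<< /Length " + length + " >> stream");
        System.out.print(cmap);
        System.out.println("endstream");
    }
}
```

Because the whitespace differs from the collapsed version, the /Length value will not match the 346 shown in the dump.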

A font dictionary of Subtype CIDFontType2 with:

- a CIDSystemInfo
- a FontDescriptor
- DW and W
- a CIDToGIDMap that maps character IDs to glyph IDs

Here is the one from that same test; this is the object in the DescendantFonts array:

4 0 obj
<< 
   /Subtype /CIDFontType2 
   /Type /Font 
   /BaseFont /DejaVuSansCondensed 
   /CIDSystemInfo 8 0 R 
   /FontDescriptor 9 0 R 
   /DW 1000 
   /W 10 0 R 
   /CIDToGIDMap 11 0 R 
>>

8 0 obj
<< 
   /Registry (Adobe)
   /Ordering (UCS)
   /Supplement 0 
>>
endobj
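
The /CIDToGIDMap stream referenced above (11 0 R) is not shown in the dump. In the PDF format it is a plain table of 2-byte big-endian glyph IDs, where the entry for CID c sits at bytes 2c and 2c + 1. A minimal sketch of building such a table in Java, assuming the code-to-glyph pairs have already been read from the font's cmap table (the class name, method name, and glyph IDs below are hypothetical):

```java
import java.util.Map;

// Hypothetical helper, not part of PDFBox.
final class CidToGidMap {

    // Builds the raw stream bytes for CIDs 0x0000..0xFFFF.
    // CIDs missing from the map keep glyph ID 0, which is the font's .notdef glyph.
    static byte[] build(Map<Integer, Integer> cidToGid) {
        byte[] table = new byte[2 * 0x10000];
        for (Map.Entry<Integer, Integer> entry : cidToGid.entrySet()) {
            int cid = entry.getKey();
            int gid = entry.getValue();
            if (cid < 0 || cid > 0xFFFF || gid < 0 || gid > 0xFFFF) {
                throw new IllegalArgumentException("CID and GID must fit in 16 bits");
            }
            table[2 * cid] = (byte) (gid >>> 8); // high-order byte first
            table[2 * cid + 1] = (byte) gid;     // low-order byte
        }
        return table;
    }

    public static void main(String[] args) {
        // Made-up glyph IDs; real ones come from the TrueType font being embedded.
        byte[] table = build(Map.of(0x043E, 0x0123, 0x0442, 0x0127));
        int offset = 2 * 0x043E;
        System.out.printf("%02X %02X%n", table[offset] & 0xFF, table[offset + 1] & 0xFF); // prints 01 23
    }
}
```

With this scheme the 2-byte codes in the page's strings are the UTF-16BE values themselves, so the table is indexed directly by the character's code.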

Why am I telling you this? What does it have to do with PDFBox? Just this: Unicode output in PDF is, frankly, a pain. Acrobat was developed before Unicode existed, and from the start it was painful to do CJK encodings without Unicode (I know; I worked on Acrobat). Unicode support was added later, but it really feels bolted on. One would hope you could just say /Encoding /Unicode, use strings that start with the thorn and y-dieresis characters (the UTF-16 byte order mark, FE FF), and be done. No such luck. If you don't put in every last detail (and really, Acrobat, an embedded PostScript program to translate to Unicode? WTH?), you get a blank page in Acrobat. I swear I'm not making this up.

At this point, I write PDF generation tools for a separate company (.NET at the moment, so it won't help you), and I made it a design goal to hide all of this nonsense. All text is Unicode: if you only use character codes that are the same as in WinAnsi, that's what you get. Use anything else and you get all this other stuff along with it. I'd be surprised if PDFBox does this work for you; it's a serious hassle.
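
The design choice described above (stay on the simple WinAnsi path when the text allows it, fall back to the composite-font machinery otherwise) might be sketched in Java as follows. This is not the .NET tool's code and not PDFBox API; the names are hypothetical, and Java's windows-1252 charset is used as an approximation of PDF's WinAnsiEncoding, which differs from it in a few code positions:

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

// Hypothetical helper, not part of PDFBox.
final class EncodingChooser {

    // True if every character of text has a windows-1252 code, i.e. a simple
    // WinAnsi-encoded TrueType font is (approximately) enough to show it.
    static boolean fitsWinAnsi(String text) {
        // Assumption: the JDK in use ships the windows-1252 charset.
        CharsetEncoder encoder = Charset.forName("windows-1252").newEncoder();
        return encoder.canEncode(text);
    }

    public static void main(String[] args) {
        System.out.println(fitsWinAnsi("Hello"));                    // prints true
        System.out.println(fitsWinAnsi("\u043E\u0442\u0434\u0435")); // Cyrillic: prints false
    }
}
```

A false result is the case in the question: Cyrillic letters have no WinAnsi codes, so the text needs the Type0 / CIDFontType2 setup instead.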
