DynaPDF Manual - Page 633
Function Reference
Whether it is possible to convert a PDF string to Unicode depends on whether the required
encoding information is available. This is always the case if a font uses a predefined encoding like
WinAnsi or MacRoman or if the glyph names of Type1 fonts are available in the Adobe Glyph List
or ZapfDingbats encoding.
Fonts which use a symbol encoding can provide a ToUnicode CMap which offers the required
mapping to Unicode. However, this CMap is optional and is not necessarily available. If a symbol
font does not contain a ToUnicode CMap, the strings are converted using code page 1252.
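
To make the code page 1252 fallback concrete, the following C sketch converts a byte string to UTF-16 code units. It is only an illustration and not DynaPDF's internal code: the function name cp1252_to_utf16 is hypothetical, and mapping the five undefined bytes to U+FFFD is an assumption of this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Unicode values for the code page 1252 bytes 0x80..0x9F.
   0xFFFD marks the five bytes that code page 1252 leaves undefined
   (an assumption of this sketch, not documented DynaPDF behaviour). */
static const uint16_t kCp1252High[32] = {
    0x20AC, 0xFFFD, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0xFFFD, 0x017D, 0xFFFD,
    0xFFFD, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0xFFFD, 0x017E, 0x0178
};

/* Converts len bytes of code page 1252 text to UTF-16 code units.
   dst must have room for len elements. Returns the number written. */
size_t cp1252_to_utf16(const unsigned char *src, size_t len, uint16_t *dst)
{
    size_t i;
    for (i = 0; i < len; i++) {
        unsigned char c = src[i];
        /* Outside 0x80..0x9F the code page matches ISO 8859-1, so the
           byte value is already the Unicode code point. */
        dst[i] = (c >= 0x80 && c <= 0x9F) ? kCp1252High[c - 0x80] : c;
    }
    return len;
}
```

For a symbol font the resulting code points rarely correspond to the glyphs that are actually drawn, so text obtained through this fallback should be treated with caution.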
External CMaps
A widely used technique to reduce the amount of data that must be stored in a PDF file is the use of
non-embedded CID fonts. CID fonts, whether embedded or not, can depend on external CMap files
which offer the required mapping to Unicode.
To process strings of such fonts correctly, DynaPDF must be able to load required CMap files if
necessary. Therefore, DynaPDF is delivered with the most important CMap files which are provided
by Adobe Systems. These CMaps can be found in the DynaPDF installation directory at
/Resource/CMap/. Applications which extract text from PDF files should include these CMaps so
that they can be loaded at runtime.
The search path to external CMaps must be set with SetCMapDir() before ParseContent() is executed
for the first time. The function creates a CMap cache that is held in memory until the PDF instance
is deleted. The search path(s) to external CMap files should be set only once per PDF instance,
and one PDF instance should be used to process as many PDF files as possible. This can significantly
improve processing speed.
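
Why reusing one instance helps can be shown with a toy model of a per-instance CMap cache. The sketch below is not DynaPDF's implementation: the type CMapCache and the function cmap_cache_get are hypothetical, and the loads counter merely stands in for reading and parsing a CMap file from disk.

```c
#include <stddef.h>
#include <string.h>

#define MAX_CMAPS 16
#define MAX_NAME  64

/* Toy model of a per-instance CMap cache (hypothetical, for illustration). */
typedef struct {
    char     names[MAX_CMAPS][MAX_NAME];
    size_t   count;   /* number of cached CMaps */
    unsigned loads;   /* number of simulated disk loads */
} CMapCache;

/* Returns the cache slot for name and "loads" the CMap on first request.
   Returns -1 if the cache is full or the name is too long. */
int cmap_cache_get(CMapCache *cache, const char *name)
{
    size_t i;
    for (i = 0; i < cache->count; i++)
        if (strcmp(cache->names[i], name) == 0)
            return (int)i;              /* cache hit: no disk access */
    if (cache->count == MAX_CMAPS || strlen(name) >= MAX_NAME)
        return -1;
    strcpy(cache->names[cache->count], name);
    cache->loads++;                     /* a real loader would parse the file here */
    return (int)cache->count++;
}
```

If several files processed with the same cache request the same CMap, only the first request causes a load. A new cache per file would repeat that work each time, which is the cost the recommendation above avoids.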
If a required CMap is absent, the Decoded parameter of the TShowTextArrayW callback function is
set to false. The string should be ignored in this case because no meaningful values can be
returned.
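
The handling of undecodable strings can be sketched as follows. The callback shown here is deliberately simplified and hypothetical: TextSink and on_show_text are names invented for this example, the real TShowTextArrayW callback has a different signature, and only the decoded flag mirrors the Decoded parameter described above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the user data passed to a text callback. */
typedef struct {
    char   text[256];  /* collected text (single-byte only in this sketch) */
    size_t length;     /* bytes used in text */
    int    skipped;    /* strings ignored because they could not be decoded */
} TextSink;

/* Hypothetical, simplified callback. decoded mirrors the Decoded parameter.
   Returns 0 so that parsing continues. */
int on_show_text(void *data, const char *str, size_t len, int decoded)
{
    TextSink *sink = (TextSink *)data;
    if (!decoded) {
        sink->skipped++;   /* no usable Unicode values: ignore the string */
        return 0;
    }
    if (sink->length + len >= sizeof sink->text)
        len = sizeof sink->text - 1 - sink->length;  /* truncate to fit */
    memcpy(sink->text + sink->length, str, len);
    sink->length += len;
    sink->text[sink->length] = '\0';
    return 0;
}
```

Counting the skipped strings instead of silently dropping them makes it easy to report afterwards that a CMap was probably missing from the search path.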
Inside the Callback Functions
The following sections describe important operations which must be executed in the callback
functions to achieve correct results.