DynaPDF Manual - Page 599

Function Reference
The function parses the content stream of the currently open page. The content parser can be used to
extract text, images, and vector graphics from a PDF file. The parameter Stack holds a set of callback
functions that are executed when the corresponding operators are found in the content stream. The
parameter Data is a user-defined pointer that is passed unchanged to the callback functions.
All callback functions are optional. Which callback functions must be set depends on the kind of
information to be extracted. For example, an application that extracts images must provide at least
the TInsertImage callback function.
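The following C sketch illustrates the dispatch model described above: a table of optional callbacks plus a user pointer that the parser forwards unchanged. All names here (MockOp, MockParseInterface, mock_parse, count_image) are hypothetical stand-ins; they do not reproduce DynaPDF's actual parse interface structure or the ParseContent() signature, and the pre-tokenized operator array replaces a real content stream.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical operator kinds; a real content stream is tokenized by DynaPDF. */
typedef enum { OP_TEXT, OP_IMAGE, OP_PATH } MockOp;

/* Hypothetical callback table. Every entry is optional: NULL means "not set". */
typedef struct {
    int (*insert_image)(void *data, int image_index);
    int (*show_text)(void *data, const char *text);
} MockParseInterface;

/* Walks the operators, fires a callback only if it is set, and forwards the
   user pointer 'data' unchanged. A non-zero callback result stops the walk. */
static int mock_parse(void *data, const MockParseInterface *stack,
                      const MockOp *ops, size_t count)
{
    int image_index = 0;
    assert(stack != NULL);
    for (size_t i = 0; i < count; i++) {
        int rc = 0;
        if (ops[i] == OP_IMAGE && stack->insert_image != NULL)
            rc = stack->insert_image(data, image_index++);
        else if (ops[i] == OP_TEXT && stack->show_text != NULL)
            rc = stack->show_text(data, "(text run)");
        /* OP_PATH has no callback in this sketch and is simply skipped. */
        if (rc != 0)
            return rc;
    }
    return 0;
}

/* Image extractor: only insert_image is needed; 'data' points to a counter. */
static int count_image(void *data, int image_index)
{
    (void)image_index;
    ++*(int *)data;
    return 0;
}
```

An image extractor in this model sets only the image entry, for example MockParseInterface s = { count_image, NULL }, and passes the address of an int counter as the user pointer; after mock_parse() returns, the counter holds the number of images encountered.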
All callback functions that return an integer value can stop processing if necessary. A return
value of zero indicates success, and processing continues. A return value of 1 from the TBeginTemplate
or TBeginPattern callback function indicates that the object should be skipped; its content stream
is not executed in this case. This can be useful when extracting images. Any other return value
stops processing.
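The return-value contract can be modeled as follows. This is a simplified sketch under stated assumptions, not DynaPDF code: MockParseInterface, mock_parse, and the OP_BEGIN_TEMPLATE/OP_END_TEMPLATE markers are hypothetical, and only the template case is modeled (the pattern case would behave the same way). A result of 1 from begin_template skips everything up to the matching end marker; a result of 1 from any other callback, or any other non-zero value, stops parsing and is returned to the caller.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical operators: an image and the bounds of a template's content. */
typedef enum { OP_IMAGE, OP_BEGIN_TEMPLATE, OP_END_TEMPLATE } MockOp;

/* Hypothetical callback table; NULL entries are not called. */
typedef struct {
    int (*begin_template)(void *data);
    int (*insert_image)(void *data);
} MockParseInterface;

/* 0 = continue, 1 from begin_template = skip that template's content,
   anything else = stop and hand the value back to the caller. */
static int mock_parse(void *data, const MockParseInterface *stack,
                      const MockOp *ops, size_t count)
{
    assert(stack != NULL);
    for (size_t i = 0; i < count; i++) {
        int rc = 0;
        switch (ops[i]) {
        case OP_BEGIN_TEMPLATE:
            if (stack->begin_template != NULL)
                rc = stack->begin_template(data);
            if (rc == 1) {
                /* Skip to the matching OP_END_TEMPLATE; templates may nest. */
                int depth = 1;
                while (depth > 0 && ++i < count) {
                    if (ops[i] == OP_BEGIN_TEMPLATE) depth++;
                    else if (ops[i] == OP_END_TEMPLATE) depth--;
                }
                rc = 0;
            }
            break;
        case OP_IMAGE:
            if (stack->insert_image != NULL)
                rc = stack->insert_image(data);
            break;
        case OP_END_TEMPLATE:
            break;
        }
        if (rc != 0)
            return rc;
    }
    return 0;
}

/* Example callbacks: skip every template, count images, or abort. */
static int skip_all_templates(void *data) { (void)data; return 1; }
static int count_image(void *data) { ++*(int *)data; return 0; }
static int stop_at_first_image(void *data) { (void)data; return -1; }
```

In this model, skipping templates means that images placed inside a template's content are never reported, while images drawn directly on the page still are.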
Notice:
Arbitrary objects may be written into the page while the content parser is running, but the
application must check whether a fatal error occurred whenever it writes something to the page.
In that case, the callback function must return a negative value to stop processing. This is
required because the parser does not detect fatal errors itself. New objects written to the page
during parsing are ignored by the parser.
ParseContent() has been part of DynaPDF since version 2.0.30, but it was not documented until now.
Because the function was undocumented, a few important changes were made in DynaPDF 2.5 that do
not break backward compatibility. However, an application that uses the following features must be
slightly changed when it is recompiled:
- The flag pfTranslateStrings is no longer defined because the callback function that is used
  already determines whether strings are converted to Unicode. The old constant value 1 is ignored.
- Two additional callback functions, TShowTextA and TShowTextW, were previously defined in the
  reserved fields Reserved0001 and Reserved0002. These callback functions are no longer defined and
  are no longer executed. They could only be used in combination with TShowTextArrayA or
  TShowTextArrayW, and DynaPDF now processes the entire text with the array versions, so existing
  applications still work as expected.
The function supports several flags that are useful when extracting images from a PDF file. These
flags are meaningful only if the TInsertImage callback function is set.