DynaPDF Manual - Page 477
.Net, and C#. Take a look at the examples text_extraction, text_extraction2, edit_text,
and text_search to see how the function GetPageText() can be used.
If you need only a text search algorithm, it is better to use the content parser of DynaPDF directly
because it is faster than GetPageText() (see the example text_search for further information).
The class CPDFEditText() (used in the example edit_text) already contains a rather complex and
complete text replacement algorithm that demonstrates how the functions ReplacePageText() and
ReplacePageTextEx() can be used. You should try to understand how this algorithm works so that
you can extend it. In particular, this class demonstrates how space characters can be identified and
how they must be handled when replacing text. However, note that PDF files are generally not
designed for editing existing content. Existing text should only be replaced if there is no other way to
achieve the same result or if only minor changes must be applied, e.g. replacing a misspelled word.
GetPageText() parses the content stream of the currently open page or template as it exists at the
time the function is called. The content stream contains all operators and values which were added
beforehand with DynaPDF functions, including the contents of imported PDF files. If text should be
replaced or deleted, it is usually best to process imported page(s) before adding new content.
If the function succeeds and if further records are available the return value is 1. If the function fails
or if no further records are available the return value is 0.
If a content stream contains no text, the return value is 0 and the members TextLen and KerningCount
are set to 0. If a content stream contains only one text record, the return value is also 0, which means
that no further records are available, but the members TextLen and KerningCount are set to values
greater than 0.
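The return convention described above means that the last record arrives together with a return value of 0, so a loop that only tests the return value would skip it. The following is a minimal sketch of a loop that handles this case. It does not call DynaPDF: MockStack, MockPage, mock_get_page_text() and count_text_records() are hypothetical stand-ins that only imitate the described return values and the members TextLen and KerningCount.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the text stack; only the two members
   named in the text are modelled. */
typedef struct {
    unsigned TextLen;
    unsigned KerningCount;
} MockStack;

/* Hypothetical mock page: a fixed list of text record lengths. */
typedef struct {
    const unsigned *lens;
    size_t count;
    size_t next;
} MockPage;

/* Imitates the described convention: returns 1 if further records are
   available after the one just delivered, 0 otherwise. */
int mock_get_page_text(MockPage *page, MockStack *stack)
{
    assert(page != NULL && stack != NULL);
    if (page->next >= page->count) {
        /* No text at all: both members are set to 0. */
        stack->TextLen = 0;
        stack->KerningCount = 0;
        return 0;
    }
    stack->TextLen = page->lens[page->next];
    stack->KerningCount = 1; /* placeholder value for the mock */
    page->next++;
    return page->next < page->count ? 1 : 0;
}

/* Counts all text records. TextLen is checked on every pass because the
   last record is delivered with a return value of 0. */
size_t count_text_records(MockPage *page)
{
    MockStack stack = {0, 0};
    size_t n = 0;
    int more;
    do {
        more = mock_get_page_text(page, &stack);
        if (stack.TextLen > 0)
            n++;
    } while (more);
    return n;
}
```

With a real page the same pattern would apply: process the record whenever TextLen is greater than 0, and stop once the return value is 0.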
const PPDF* IPDF) // Instance pointer
The function returns the width of the currently open page. If no open page can be detected the
return value is the default width which will be used for newly created pages. The page width refers
to the media box of a page. The visible size may be smaller if a crop box is present. The crop box takes
precedence because it crops the media box.
If SetUseVisibleCoords() is set to true, the function checks whether a crop box is present and
returns the width of this box if one is set. A PDF unit represents 1/72 inch. See also GetBBox().
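The relationship between media box, crop box, and the visible-coordinates setting can be modelled as follows. This is only an illustrative sketch of the behaviour described above, not DynaPDF code: the type Box and the functions effective_page_width() and units_to_mm() are hypothetical names introduced for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical rectangle in PDF units (1 unit = 1/72 inch). */
typedef struct {
    double left, bottom, right, top;
} Box;

/* Models the described rule: the media box width is used unless visible
   coordinates are enabled and a crop box is present (crop != NULL). */
double effective_page_width(const Box *media, const Box *crop,
                            bool use_visible_coords)
{
    assert(media != NULL);
    if (use_visible_coords && crop != NULL)
        return crop->right - crop->left;
    return media->right - media->left;
}

/* Converts PDF units to millimetres: 72 units per inch, 25.4 mm per inch. */
double units_to_mm(double units)
{
    return units / 72.0 * 25.4;
}
```

For a US Letter media box of 612 x 792 units with a crop box inset by 36 units on each side, this model yields a width of 540 units when visible coordinates are enabled and 612 units otherwise.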