DynaPDF Manual - Page 55

Content parsing & editing
DeleteText
Syntax:
LBOOL psrDeleteText(
   const PPDF* IPDF,        // PDF instance pointer
   const IPSR* Ctx,         // Parser instance pointer
   struct TFltRect* Area,   // Required
   TDeleteTextFlags Flags)  // See below
The function deletes every glyph or character that touches or lies inside the rectangle Area.
Area must be defined as if the page were viewed in a PDF viewer, that is, in bottom-up
coordinates with the page orientation taken into account (see GetPageOrientation()). The width
and height of a page must be calculated from the crop box if set, or from the media box otherwise
(see GetPageBBox()). Note also that the width and height must be exchanged if the orientation is
90, -90, 270, or -270 degrees.
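The size and coordinate rules above can be sketched in plain C. This is a minimal illustration only, not part of the library: the type ViewRect and the helpers view_size() and top_down_to_bottom_up() are hypothetical names introduced here. It assumes that the bounding box has already been obtained (crop box if set, media box otherwise), that the orientation is given in degrees, and that the origin of the viewed page is its lower left corner.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a rectangle in bottom-up coordinates.
   This is not the library's own rectangle type. */
typedef struct {
    float left, bottom, right, top;
} ViewRect;

/* Width and height of the page as it is displayed, computed from its
   bounding box and its orientation in degrees. Width and height are
   exchanged for 90, -90, 270, and -270 degrees. */
static void view_size(const ViewRect *bbox, int orientation,
                      float *width, float *height)
{
    assert(bbox != NULL && width != NULL && height != NULL);
    float w = bbox->right - bbox->left;
    float h = bbox->top - bbox->bottom;
    int angle = abs(orientation) % 360;
    if (angle == 90 || angle == 270) {
        *width  = h;
        *height = w;
    } else {
        *width  = w;
        *height = h;
    }
}

/* Convert a rectangle given in top-down coordinates (x, y measured from
   the upper left corner of the displayed page) into bottom-up coordinates.
   view_height is the displayed page height returned by view_size(). */
static ViewRect top_down_to_bottom_up(float x, float y, float w, float h,
                                      float view_height)
{
    ViewRect r;
    r.left   = x;
    r.right  = x + w;
    r.top    = view_height - y;
    r.bottom = view_height - (y + h);
    return r;
}
```

For an A4 page (595 x 842 units) with an orientation of 90 degrees, view_size() yields a width of 842 and a height of 595. A rectangle built this way would then have to be copied into the rectangle structure that is passed as Area.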
Note that this function deletes text only. Text can also occur in the form of images or vector
graphics. There are no functions yet to identify and delete text in such objects.
Return values:
If the function succeeds, the return value is 1. If the function fails, the return value is 0.
ExtractText
Syntax:
LBOOL psrExtractText(
   const PPDF* IPDF,            // PDF instance pointer
   const IPSR* Ctx,             // Parser instance pointer
   TTextExtractionFlags Flags,  // See below
   struct TFltRect* Area,       // Optional search area
   UI16** Text,                 // Required, address of UI16* variable
   UI32* TextLen)               // Required, address of a UI32 variable
typedef enum TTextExtractionFlags
{
   tefDefault               = 0, // Create text lines in the original order.
   tefSortTextX             = 1, // Sort text records in x-direction.
   tefSortTextY             = 2, // Sort text records in y-direction.
   tefSortTextXY            = tefSortTextX | tefSortTextY,
   tefDeleteOverlappingText = 4  // Text extraction only.
}TTextExtractionFlags;
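Because the enumeration values are single bits, flags can be combined with a bitwise or and tested with a bitwise and. The following sketch repeats the declaration so that it compiles on its own; when the library header is included, the repeated declaration must be left out. The helper has_flag() is a hypothetical name introduced for illustration and is not a library function.

```c
#include <assert.h>

/* Repeated from the declaration above so that the sketch is self-contained. */
typedef enum TTextExtractionFlags
{
   tefDefault               = 0,
   tefSortTextX             = 1,
   tefSortTextY             = 2,
   tefSortTextXY            = tefSortTextX | tefSortTextY,
   tefDeleteOverlappingText = 4
}TTextExtractionFlags;

/* Hypothetical helper: returns 1 if all bits of 'bit' are set in 'flags'.
   tefDefault is zero and therefore cannot be tested this way. */
static int has_flag(TTextExtractionFlags flags, TTextExtractionFlags bit)
{
   assert(bit != tefDefault);
   return ((unsigned)flags & (unsigned)bit) == (unsigned)bit;
}
```

A combined value such as (TTextExtractionFlags)(tefSortTextX | tefDeleteOverlappingText) would then be passed as the Flags parameter. The cast is not needed in C but is required when the code is compiled as C++.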
The function extracts the text of a page with the same algorithm that FindText() uses to find text
on a page. To get exactly the same result, the flag tefSortTextX must be set.
The function ExtractText() of the PDF instance in fact calls this function internally.
The optional parameter Area can be set to restrict text extraction to that rectangle. The rectangle
must be defined as if the page were viewed in a PDF viewer. That means in bottom up
 
