Monkeybread Software - DynaPDF Manual

Function Reference

Transform(m, x2, y2); // End point of the text record
double realTextWidth = CalcDistance(x1, y1, x2, y2);

The end point of a text record is usually required to determine whether the next record lies on the same line. An algorithm that can construct text lines in arbitrarily rotated coordinate systems is provided in the Text Extraction example, which is delivered with all DynaPDF versions.
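The same-line test mentioned above can be sketched as follows. This is only a minimal illustration, not the algorithm of the Text Extraction example: the Matrix struct is a local stand-in for a PDF transformation matrix, Transform() and CalcDistance() are re-implemented here under the assumption that they behave as their names suggest, and OnSameLine() with its tolerance parameter is a hypothetical helper.

```cpp
#include <cmath>

// Local stand-in for a PDF transformation matrix [a b c d x y].
struct Matrix { double a, b, c, d, x, y; };

// Transforms the point (x, y) in place (assumed behavior of Transform()).
void Transform(const Matrix& m, double& x, double& y)
{
   double tx = x;
   x = tx * m.a + y * m.c + m.x;
   y = tx * m.b + y * m.d + m.y;
}

// Euclidean distance between two points (assumed behavior of CalcDistance()).
double CalcDistance(double x1, double y1, double x2, double y2)
{
   return std::hypot(x2 - x1, y2 - y1);
}

// Hypothetical helper: returns true if the start point (nx, ny) of the next
// record lies on the baseline through (x1, y1) and (x2, y2), within tol.
bool OnSameLine(double x1, double y1, double x2, double y2,
                double nx, double ny, double tol)
{
   double len = CalcDistance(x1, y1, x2, y2);
   if (len == 0.0) return CalcDistance(x1, y1, nx, ny) <= tol;
   // Perpendicular distance of (nx, ny) from the baseline.
   double cross = (x2 - x1) * (ny - y1) - (y2 - y1) * (nx - x1);
   return std::fabs(cross) / len <= tol;
}
```

Because the test measures the perpendicular distance from the baseline, it works independently of the rotation of the coordinate system. A suitable tolerance depends on the font size and is left to the caller.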

Character Spacing

As described above, the current character spacing is already included in the text width that is provided in all text callback functions. However, the value must be stored in the graphics state if the width of a substring must be computed. Character spacing is measured in unscaled font units. The required transformation to text space is done in functions like GetTextWidth().

Word Spacing

Like character spacing, the current word spacing is already included in the text width that is provided in all text callback functions. However, word spacing applies to the space character of simple fonts only.

An application that extracts text from PDF files may want to preserve the original formatting of the text. In this case, the distance between two words in the same text record must be known, e.g. to insert a number of spaces that emulates the word spacing.

However, note that the current word spacing must be ignored if the font type is ftType0 (the font type is a parameter of the graphics state and is set with the TSetFont callback function).
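The two rules above, ignoring word spacing for ftType0 fonts and emulating a gap with space characters, might be sketched as follows. The FontType enum is a local stand-in (only ftType0 is named in the text; ftSimple is a hypothetical placeholder for all simple font types), and EffectiveWordSpacing() and SpacesForGap() are hypothetical helpers, not DynaPDF functions.

```cpp
#include <cmath>

// Local stand-in for the font type stored in the graphics state.
// ftSimple is a hypothetical placeholder for all simple font types.
enum FontType { ftSimple, ftType0 };

// Word spacing applies to the space character of simple fonts only,
// so it is treated as zero for ftType0 fonts.
double EffectiveWordSpacing(FontType type, double wordSpacing)
{
   return (type == ftType0) ? 0.0 : wordSpacing;
}

// Returns the number of space characters that approximates a gap.
// gap and spaceWidth must be given in the same units; spaceWidth is the
// width of one space character including any applicable spacing.
int SpacesForGap(double gap, double spaceWidth)
{
   if (gap <= 0.0 || spaceWidth <= 0.0) return 0;
   long n = std::lround(gap / spaceWidth);
   return n < 1 ? 1 : static_cast<int>(n);
}
```

Rounding to the nearest whole number of spaces, with a minimum of one for any positive gap, is just one possible policy. An application that targets a fixed-pitch output may prefer a different rule.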

Also note that word and character spacing are measured in unscaled font units. The width of a space character, including word spacing, can be calculated with the function GetTextWidth(), which is part of the font API (the name is fntGetTextWidth() in C/C++).

An algorithm that considers word spacing must check whether the source string contains space characters. If a space is found, the width of the substring that precedes it must be calculated so that the start and end points of the word can be determined. Additional spaces can be skipped, and the cursor position is updated to the position behind the spaces. Processing continues until the entire text of the record has been processed.
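Independent of the DynaPDF callback interface, the steps just described can be sketched as follows. This is a simplified illustration under stated assumptions: widthOf is a hypothetical callback that returns the width of a string in text space (in a real application a function like GetTextWidth() would play this role), WordSpan and SplitRecord() are names introduced here, and kerning space between the records of a text array is not handled.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Start and end position of one word, relative to the record's origin.
struct WordSpan { std::wstring text; double start; double end; };

// Splits the text of one record at space characters and computes the
// start and end position of every word. widthOf is a hypothetical
// callback that returns the width of a string in text space.
std::vector<WordSpan> SplitRecord(
   const std::wstring& text,
   const std::function<double(const std::wstring&)>& widthOf)
{
   std::vector<WordSpan> words;
   double cursor = 0.0;
   std::size_t i = 0;
   while (i < text.size())
   {
      std::size_t j = i;
      // Skip a run of spaces and move the cursor behind it.
      while (j < text.size() && text[j] == L' ') ++j;
      if (j > i)
      {
         cursor += widthOf(text.substr(i, j - i));
         i = j;
         continue;
      }
      // Collect all characters up to the next space.
      while (j < text.size() && text[j] != L' ') ++j;
      std::wstring word = text.substr(i, j - i);
      double w = widthOf(word);
      words.push_back({word, cursor, cursor + w});
      cursor += w;
      i = j;
   }
   return words;
}
```

The returned positions are offsets along the baseline in text space. To obtain page coordinates, each start and end point would still have to be transformed with the current matrix.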

An algorithm that processes text in this way essentially calculates the start and end coordinates of every text part that is separated either by spaces or by kerning space.

The required source code looks as follows (C++):

// The following code fragment uses the TShowTextArrayW callback function.
SI32 parseShowTextArrayW(const void* Data, const struct TTextRecordA* Source, struct TCTM* Matrix, const struct TTextRecordW* Kerning, UI32 Count, double Width, LBOOL Decoded)
{
   if (!Decoded) return 0;
   double x1 = 0.0;
   double y1 = 0.0;
