DynaPDF Manual - Page 607


Function Reference
Transform(m, x2, y2); // End point of the text record
double realTextWidth = CalcDistance(x1, y1, x2, y2);
The end point of a text record is usually required to determine whether the next record lies on the
same line. An algorithm that constructs text lines in arbitrarily rotated coordinate systems is
provided in the example Text Extraction, which is delivered with all DynaPDF versions.
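The same-line test mentioned above can be illustrated with a small, library-independent sketch. The type Matrix2D and the function IsOnSameLine() are hypothetical names introduced here for illustration; Transform() and CalcDistance() are written to match the call shape used in the fragment above, but they are assumptions and not the helpers shipped with the Text Extraction example. The check only measures the perpendicular distance of the next record's start point from the baseline of the previous record; it does not test whether the next record lies before or behind it.

```cpp
#include <cmath>

// Hypothetical affine matrix with the component order of a PDF matrix [a b c d x y].
struct Matrix2D { double a, b, c, d, x, y; };

// Maps the point (x, y) in place, e.g. from text space to user space.
inline void Transform(const Matrix2D& m, double& x, double& y)
{
    const double ox = x; // keep the original x, it is needed for the new y
    x = ox * m.a + y * m.c + m.x;
    y = ox * m.b + y * m.d + m.y;
}

// Euclidean distance between two points.
inline double CalcDistance(double x1, double y1, double x2, double y2)
{
    return std::hypot(x2 - x1, y2 - y1);
}

// Returns true if (nx, ny), the start point of the next record, is at most
// tol units away from the baseline that runs through (x1, y1) and (x2, y2).
// All coordinates must already be transformed into the same space.
inline bool IsOnSameLine(double x1, double y1, double x2, double y2,
                         double nx, double ny, double tol)
{
    const double len = CalcDistance(x1, y1, x2, y2);
    if (len == 0.0) return CalcDistance(x1, y1, nx, ny) <= tol; // empty record
    // |cross product| / length = perpendicular distance from the baseline
    const double cross = (x2 - x1) * (ny - y1) - (y2 - y1) * (nx - x1);
    return std::fabs(cross) / len <= tol;
}
```

Because the distance is measured perpendicular to the baseline, the test works for any rotation of the text. A reasonable tolerance is a fraction of the font size, but the right value depends on the documents being processed.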
Character Spacing
As described above, the current character spacing is already considered in the text width that is
provided in all text callback functions. However, the value must be stored in the graphics state if the
width of a substring must be computed. Character spacing is measured in unscaled font units. The
required transformation to text space is done in functions like GetTextWidth().
Word Spacing
Like character spacing, the current word spacing is already considered in the text width that is
provided in all text callback functions. However, word spacing applies to the space character of
simple fonts only.
An application that extracts text from PDF files may want to preserve the original formatting of
the text. In this case, the distance between two words in the same text record must be known, e.g. to
insert a number of spaces to emulate the word spacing.
However, note that the current word spacing must be ignored if the font type is ftType0 (the font
type is a parameter of the graphics state and is set with the TSetFont callback function).
Another thing that must be considered is that word and character spacing are measured in unscaled
font units. The width of a space character including word spacing can be calculated with the
function GetTextWidth() that is part of the font API (the name is fntGetTextWidth() in C/C++).
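To make the rule about ftType0 concrete, the following sketch computes the advance of a single space character. It follows the general PDF text model (glyph width in 1/1000 em, scaled by the font size, plus character and word spacing, times the horizontal scaling factor) and is not DynaPDF's implementation of GetTextWidth(). FontKind, TextState and SpaceAdvance() are hypothetical names, and the units assumed for the spacing values are stated in the comments.

```cpp
// Hypothetical stand-in for the font type stored in the graphics state.
enum class FontKind { Simple, Type0 };

// Hypothetical subset of the graphics state that a text callback would maintain.
struct TextState {
    FontKind kind;        // set when the font changes
    double   fontSize;    // font size in text space units
    double   charSpacing; // assumed here to be given in text space units
    double   wordSpacing; // assumed here to be given in text space units
    double   horizScale;  // horizontal scaling as a factor, 1.0 = 100 %
};

// Advance of one space glyph in text space.
// glyphWidth is the width of the space glyph in 1/1000 em (assumption).
inline double SpaceAdvance(const TextState& ts, double glyphWidth)
{
    // Word spacing is ignored for Type0 fonts, as described above.
    const double ws = (ts.kind == FontKind::Type0) ? 0.0 : ts.wordSpacing;
    return (glyphWidth / 1000.0 * ts.fontSize + ts.charSpacing + ws) * ts.horizScale;
}
```

For a 12 unit font with a 250 unit wide space glyph, a character spacing of 0.5 and a word spacing of 2.0, the advance is 5.5 for a simple font and 3.5 for a Type0 font. In real code the width should come from GetTextWidth(), which performs the required scaling itself.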
An algorithm that considers word spacing must check whether the source string contains space
characters. If a space is found, the width of the substring that precedes it must be calculated so
that the start and end point of the word can be determined. Additional spaces can be skipped, and the
cursor position is updated to the position behind the spaces. Processing continues until the entire
text of the record has been processed.
An algorithm that processes text in this way essentially calculates the start and end coordinates of
every text part that is separated by either spaces or kerning space.
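Independent of the DynaPDF callback interface, the scanning loop described above can be sketched as follows. WordSpan, WidthFn and SplitRecord() are hypothetical names; the width callback stands in for a function such as fntGetTextWidth() and is expected to return the width of a substring including character and word spacing. The resulting coordinates are offsets along the x-axis of text space and must still be transformed with the record's matrix. Kerning space between the records of a text array is not handled in this sketch.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// One word of a text record with its start and end offset in text space.
struct WordSpan {
    std::size_t first;  // index of the first character
    std::size_t length; // number of characters
    double start;       // x offset where the word begins
    double end;         // x offset where the word ends
};

// Returns the width of text[pos, pos + len). Stand-in for the font API.
using WidthFn = std::function<double(const std::wstring&, std::size_t, std::size_t)>;

// Splits one record into words and computes their start and end offsets.
// cursor is the x offset at which the record begins.
inline std::vector<WordSpan> SplitRecord(const std::wstring& text, double cursor,
                                         const WidthFn& widthOf)
{
    std::vector<WordSpan> words;
    const std::size_t n = text.size();
    std::size_t i = 0;
    while (i < n) {
        // Skip a run of spaces and move the cursor behind it.
        const std::size_t spaces = i;
        while (i < n && text[i] == L' ') ++i;
        if (i > spaces) cursor += widthOf(text, spaces, i - spaces);
        if (i >= n) break;
        // Collect the characters up to the next space.
        const std::size_t first = i;
        while (i < n && text[i] != L' ') ++i;
        const double width = widthOf(text, first, i - first);
        words.push_back({first, i - first, cursor, cursor + width});
        cursor += width;
    }
    return words;
}
```

Passing the width function as a parameter keeps the loop testable without a font and leaves it to the caller to decide whether word spacing is included, for example to suppress it for Type0 fonts.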
The required source code looks as follows (C++):
// The following code fragment uses the TShowTextArrayW callback function.
SI32 parseShowTextArrayW(const void* Data, const struct TTextRecordA* Source,
                         struct TCTM* Matrix, const struct TTextRecordW* Kerning,
                         UI32 Count, double Width, LBOOL Decoded)
{
if (!Decoded) return 0;
double x1 = 0.0;
double y1 = 0.0;
 
