Pixel Building Forum

Text Layouting => Character Positioning => Topic started by: PiotrGrochowski on September 26, 2020, 07:43:51 pm


Title: What is text layouting, character positioning, The standard method
Post by: PiotrGrochowski on September 26, 2020, 07:43:51 pm
The entire category of Text Layouting is something unknown to the FreeType library, because layouting is out of scope for FreeType.

However, it is important to focus on Text Layouting anyway. It is out of scope for FreeType, yet in scope for the TD renderer, and it is expected to be an important part of the Type Design ecosystem.

Text Layouting involves finding ways to lay out all text. It is not a trivial task because of newlines (0x000D 0x000A), surrogates (0xD800 to 0xDBFF, 0xDC00 to 0xDFFF), wraps (when a text line would be wider than the text box) and tab stops (0x0009).

Character Positioning is finding ways to position all characters. Standards-compliant Unicode handling, such as special Arabic forms and Thai combining marks, is out of scope for the standard method because it would be extremely complex.

What then, is the standard method?

This is what it looks like:

Before reading any input text, we find the font top and font bottom. In OpenType fonts these should be determined by taking WinAscent and WinDescent, scaling them to the ppem size and rounding (a half pixel rounds away from the baseline for the font height boundaries). In the raw bitmap font format there is no concept of a baseline, but the font height boundaries are implicitly the top and bottom of the bitmap once the font width and font height are known.

The difference between the font top and the font bottom is the font height. It is critical because every line advances by the font height in pixels.
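The scaling and rounding step could be sketched like this. This is only a sketch under assumptions: the function names are made up for illustration, both values are treated as positive distances from the baseline (so the font height is their sum), and units_per_em is the font's units-per-em value, which the post does not mention. In an OpenType font the two inputs would come from the usWinAscent and usWinDescent fields of the OS/2 table.

```python
def scale_round(font_units, ppem, units_per_em):
    """Scale a non-negative distance from the baseline to whole pixels.

    An exact half pixel rounds up, i.e. away from the baseline.
    Integer arithmetic avoids floating-point surprises at the halfway point.
    """
    return (2 * font_units * ppem + units_per_em) // (2 * units_per_em)


def font_bounds(win_ascent, win_descent, ppem, units_per_em):
    """Return (font_top, font_bottom, font_height) in pixels.

    font_top and font_bottom are distances above and below the baseline
    (an assumption about the sign convention).
    """
    top = scale_round(win_ascent, ppem, units_per_em)
    bottom = scale_round(win_descent, ppem, units_per_em)
    return top, bottom, top + bottom
```

For example, font_bounds(1854, 434, 16, 2048) gives a font top of 14, a font bottom of 3 and a font height of 17 pixels.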

Now we can start interpreting. Assume the text to render is encoded as UTF-16. What to render? The first character? Not yet. We have to keep special rules throughout Text Layouting.

We render the characters according to these rules:

0x0009 is the tab. Write a 0x0020 character and advance to the next tab stop after the current position. The distance between tab stops is computed as follows: sum the widths of 0x0041 to 0x005A and of 0x0061 to 0x007A, add 26, divide by 52, then multiply by 8.
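The tab stop arithmetic can be written out as a short sketch. Assumptions here: the division is integer division (adding 26 before dividing by 52 then rounds the average letter width to the nearest pixel), tab stops are measured from the left edge of the text box, and advance_width is a hypothetical callback that returns the pixel width of a code point.

```python
def tab_distance(advance_width):
    """Distance between tab stops: rounded average width of A-Z and a-z, times 8."""
    letters = list(range(0x0041, 0x005B)) + list(range(0x0061, 0x007B))
    total = sum(advance_width(cp) for cp in letters)
    # Adding 26 (half of 52) before the integer division rounds to nearest.
    return (total + 26) // 52 * 8


def next_tab_stop(x, distance):
    """First tab stop strictly after the current position x."""
    return (x // distance + 1) * distance
```

With every letter 6 pixels wide, the tab distance is 48 pixels. A tab at position 47 advances to 48, and a tab at position 48 advances to 96, because the next stop must be after the current position.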

0x000D 0x000A is the newline. This specific sequence will move the current position to the beginning of the line, then move down by the font height.
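A minimal sketch of the newline rule, assuming a fixed advance per character to keep it short. The function name is hypothetical. Because only the specific sequence 0x000D 0x000A counts as a newline, a lone 0x000D or 0x000A is passed through here as an ordinary character, which is an assumption about what the rule implies.

```python
def place_with_newlines(units, font_height, advance):
    """Return (code_unit, x, y) for each rendered UTF-16 code unit."""
    x, y = 0, 0
    placed = []
    i = 0
    while i < len(units):
        is_newline = (
            units[i] == 0x000D
            and i + 1 < len(units)
            and units[i + 1] == 0x000A
        )
        if is_newline:
            x = 0             # back to the beginning of the line
            y += font_height  # then down by the font height
            i += 2
            continue
        placed.append((units[i], x, y))
        x += advance
        i += 1
    return placed
```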

0xD800 to 0xDFFF are surrogates. Use the UTF-16 rule to interpret a pair of surrogates, 0xD800 to 0xDBFF followed by 0xDC00 to 0xDFFF, as a code point. It is still rendered as a single character.
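The surrogate rule is the usual UTF-16 decoding, sketched below. The post does not say what to do with an unpaired surrogate, so this sketch simply passes it through unchanged, which is an assumption.

```python
def decode_utf16_units(units):
    """Combine surrogate pairs in a list of UTF-16 code units into code points."""
    code_points = []
    i = 0
    while i < len(units):
        unit = units[i]
        has_low = i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF
        if 0xD800 <= unit <= 0xDBFF and has_low:
            high = unit - 0xD800
            low = units[i + 1] - 0xDC00
            code_points.append(0x10000 + (high << 10) + low)
            i += 2
        else:
            # Ordinary code unit, or an unpaired surrogate (passed through as-is).
            code_points.append(unit)
            i += 1
    return code_points
```

For example, the pair 0xD83D 0xDE00 decodes to the single code point 0x1F600.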

The wrap rule: Not everything will fit, so an automatic wrap is used. Wraps may occur on a space (0x0020) or tab (0x0009). Find the largest extent of text that will fit before a space or tab, then render it as well as the space or tab (possibly beyond the bounds). After a space or tab that hangs beyond the bounds, there may not be another space or tab on the same line; it must go on another line, moving down by the font height.
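A simplified sketch of the wrap rule follows. It makes several assumptions: every character has the same width, only spaces are handled (tabs are left out), and a word wider than the text box is placed on a line of its own and allowed to overflow, a case the rule does not cover. The function name is hypothetical.

```python
def wrap_lines(text, char_width, box_width):
    """Split text into lines; each line keeps its trailing space."""
    lines = []
    line = ""
    x = 0  # pen position in pixels on the current line
    i = 0
    while i < len(text):
        # Next run of non-space characters (empty between consecutive spaces).
        j = i
        while j < len(text) and text[j] != " ":
            j += 1
        word = text[i:j]
        word_width = len(word) * char_width
        # Wrap if the word does not fit, or if the previous space already
        # hangs beyond the bounds (nothing more may follow it on this line).
        if line and (x + word_width > box_width or x > box_width):
            lines.append(line)
            line = ""
            x = 0
        line += word
        x += word_width
        # The space after the word is rendered even if it goes beyond bounds.
        if j < len(text):
            line += " "
            x += char_width
            j += 1
        i = j
    if line:
        lines.append(line)
    return lines
```

With one-pixel characters and a 7-pixel box, "aaa bbb ccc" becomes the lines "aaa bbb " and "ccc": the space after "bbb" hangs one pixel beyond the bounds, and "ccc" moves to the next line.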

The select rule: Render all characters in the selection range in the selection foreground and background instead of the text box foreground and background. When a tab is selected, the entire extent of the tab is selected, not just the rendered space character.

Cursor inversion: The cursor is placed at the current position, a single pixel wide horizontally and covering the font height vertically. It inverts the colors.

Cursor on the right: If the vertical cursor would be rendered to the right beyond the bounds, it is rendered on the right edge instead.
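The two cursor rules can be sketched together. Assumptions: the text box is a list of pixel rows holding 24-bit RGB integers, inverting a color means flipping all 24 bits, and the right edge is the last pixel column of the box. The function name is hypothetical.

```python
def draw_cursor(pixels, x, top, font_height):
    """Invert a one-pixel-wide column covering the font height.

    Returns the column actually used, after clamping to the right edge.
    """
    width = len(pixels[0])
    column = min(x, width - 1)  # beyond the bounds: use the right edge instead
    for row in range(top, top + font_height):
        pixels[row][column] ^= 0xFFFFFF  # invert the 24-bit color
    return column
```

Drawing the cursor a second time at the same position restores the original pixels, since inverting twice is the identity.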