gtroff Internals

GNU troff processes input in three steps. One or more input characters are gathered into a token, the smallest meaningful unit of troff input. Then, one or more tokens are converted to a node, a data structure representing any object that may ultimately appear in the output, like a glyph or motion on the page. Finally, nodes are converted to the device-independent output language understood by all output drivers.
Actually, before step one happens, gtroff converts certain escape sequences into reserved input characters (not accessible by the user). These reserved characters are also used for other internal processing, which is why not all characters are valid input. See Identifiers for more on this topic.
For example, the input string ‘fi\[:u]’ is converted into a character token ‘f’, a character token ‘i’, and a special token ‘:u’ (representing u umlaut). Later on, the character tokens ‘f’ and ‘i’ are merged into a single node representing the ligature glyph ‘fi’ (provided the current font has a glyph for this ligature); the special token ‘:u’ likewise becomes a single glyph node. All output glyph nodes are ‘processed’, which means that they are invariably associated with a given font, font size, advance width, etc. During the formatting process, gtroff itself adds various nodes to control the data flow.
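The first two steps can be illustrated with a toy Python model. It is a sketch under stated assumptions, not groff's implementation (which is written in C++): the `Token` and `GlyphNode` classes, the font name `R`, and the glyph sets passed in are all hypothetical, and only the `\[name]` escape is recognized.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    kind: str  # "char" for an ordinary character, "special" for \[name]
    name: str  # the character itself, or the special character's name


@dataclass(frozen=True)
class GlyphNode:
    glyph: str  # glyph name; a real node also carries size, width, etc.
    font: str


def tokenize(text: str) -> list[Token]:
    """Step 1: gather input characters into tokens."""
    tokens = []
    i = 0
    while i < len(text):
        if text.startswith("\\[", i):
            end = text.index("]", i)  # assumes well-formed input
            tokens.append(Token("special", text[i + 2:end]))
            i = end + 1
        else:
            tokens.append(Token("char", text[i]))
            i += 1
    return tokens


def to_nodes(tokens: list[Token], font: str, glyphs: set[str]) -> list[GlyphNode]:
    """Step 2: turn tokens into glyph nodes, merging 'f' + 'i' into the
    ligature 'fi' only if the font provides that glyph."""
    nodes = []
    i = 0
    while i < len(tokens):
        pair = tokens[i:i + 2]
        if pair == [Token("char", "f"), Token("char", "i")] and "fi" in glyphs:
            nodes.append(GlyphNode("fi", font))
            i += 2
        else:
            nodes.append(GlyphNode(tokens[i].name, font))
            i += 1
    return nodes


tokens = tokenize("fi\\[:u]")
with_lig = to_nodes(tokens, "R", {"f", "i", "fi", ":u"})
without_lig = to_nodes(tokens, "R", {"f", "i", ":u"})
```

With the ligature available, `with_lig` holds two nodes (‘fi’ and ‘:u’); without it, `without_lig` holds three, one per token.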
Macros, diversions, and strings collect elements in two chained lists: a list of tokens that have been passed unprocessed, and a list of nodes. Consider the following diversion.
    .di xxx
    a
    \!b
    c
    .br
    .di
It contains these elements.
node list            | token list | element number
line start node      | -          | 1
glyph node a         | -          | 2
word space node      | -          | 3
-                    | b          | 4
-                    | \n         | 5
glyph node c         | -          | 6
vertical size node   | -          | 7
vertical size node   | -          | 8
-                    | \n         | 9
Elements 1, 7, and 8 are inserted by gtroff; the latter two (which are always present) specify the vertical extent of the last line, possibly modified by \x. The br request finishes the pending output line, inserting a newline token, which is subsequently converted to a space when the diversion is interpolated. Note that the word space node has a fixed width that is no longer adjustable. To convert horizontal space nodes back into tokens, use the unformat request.
Macros only contain elements in the token list (and the node list is empty); diversions and strings can contain elements in both lists.
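A toy model of the diversion above may make the two lists concrete. The classes `Element` and `Diversion` and the function `unformat` are hypothetical Python stand-ins, not groff's data structures; in particular, `unformat` here only turns word space nodes into space tokens, a simplification of the real request.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Element:
    # Each element lives in exactly one of the two lists.
    node: str | None = None   # e.g. "glyph node a"
    token: str | None = None  # e.g. "b" or "\n"


@dataclass
class Diversion:
    elements: list[Element] = field(default_factory=list)

    def node_list(self) -> list[str]:
        return [e.node for e in self.elements if e.node is not None]

    def token_list(self) -> list[str]:
        return [e.token for e in self.elements if e.token is not None]


def unformat(div: Diversion) -> Diversion:
    """Simplified stand-in: replace fixed-width word space nodes with
    ordinary space tokens so they become adjustable again."""
    out = []
    for e in div.elements:
        if e.node == "word space node":
            out.append(Element(token=" "))
        else:
            out.append(e)
    return Diversion(out)


# Diversion xxx from the example, elements 1 through 9.
xxx = Diversion([
    Element(node="line start node"),
    Element(node="glyph node a"),
    Element(node="word space node"),
    Element(token="b"),
    Element(token="\n"),
    Element(node="glyph node c"),
    Element(node="vertical size node"),
    Element(node="vertical size node"),
    Element(token="\n"),
])
```

A macro, in this model, would simply be a `Diversion` whose elements all have `node` set to `None`.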
The chop request simply reduces the number of elements in a macro, string, or diversion by one. The exceptions are compatibility save and compatibility ignore tokens, which chop ignores. The substring request also ignores those tokens.
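The chop rule can be sketched as follows, assuming a macro or diversion is a plain list of elements and that the two compatibility tokens are represented by hypothetical marker strings (groff's internal encoding differs).

```python
from __future__ import annotations

# Hypothetical markers; groff encodes these tokens differently.
COMPAT_SAVE = "<compat save>"
COMPAT_IGNORE = "<compat ignore>"
COMPAT_TOKENS = {COMPAT_SAVE, COMPAT_IGNORE}


def chop(elements: list[str]) -> list[str]:
    """Return a copy with the last ordinary element removed.

    Compatibility tokens are skipped (kept in place), so they neither count
    as the element to remove nor get removed themselves.
    """
    for i in range(len(elements) - 1, -1, -1):
        if elements[i] not in COMPAT_TOKENS:
            return elements[:i] + elements[i + 1:]
    raise ValueError("nothing to chop")


body = ["a", "b", "c", COMPAT_SAVE]
print(chop(body))  # ['a', 'b', '<compat save>']
```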
Some requests, like tr or cflags, work on glyph identifiers only; this means that the associated glyph can be changed without destroying this association. This can be very helpful for substituting glyphs. In the following example, we assume that glyph ‘foo’ isn’t available by default, so we provide a substitution using the fchar request and map it to input character ‘x’.
    .fchar \[foo] foo
    .tr x \[foo]
Now let us assume that we install an additional special font ‘bar’ that has glyph ‘foo’.
    .special bar
    .rchar \[foo]
Since glyphs defined with fchar are searched before glyphs in special fonts, we must call rchar to remove the definition of the fallback glyph. The translation, however, is still active; ‘x’ now maps to the real glyph ‘foo’.
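The translation survives because tr stores only the identifier ‘foo’, which is resolved each time a glyph is needed. A toy resolver illustrates this; the class `GlyphResolver` and its methods are hypothetical, and the lookup order (current font, then fchar definitions, then special fonts) is a simplification limited to the sources this example uses.

```python
from __future__ import annotations


class GlyphResolver:
    """Toy model of late glyph lookup (hypothetical, not groff's code).

    tr maps an input character to a glyph identifier; the identifier is
    looked up only when rendering, so replacing its definition later does
    not break the translation.
    """

    def __init__(self, current_font: dict[str, str]) -> None:
        self.current_font = current_font        # glyph id -> rendered glyph
        self.translations: dict[str, str] = {}  # input char -> glyph id (tr)
        self.fallbacks: dict[str, str] = {}     # glyph id -> fchar definition
        self.special_fonts: list[dict[str, str]] = []

    def tr(self, char: str, glyph_id: str) -> None:
        self.translations[char] = glyph_id

    def fchar(self, glyph_id: str, definition: str) -> None:
        self.fallbacks[glyph_id] = definition

    def rchar(self, glyph_id: str) -> None:
        self.fallbacks.pop(glyph_id, None)

    def special(self, font: dict[str, str]) -> None:
        self.special_fonts.append(font)

    def render(self, char: str) -> str:
        glyph_id = self.translations.get(char, char)
        # Simplified search order: current font, fchar fallbacks, special fonts.
        if glyph_id in self.current_font:
            return self.current_font[glyph_id]
        if glyph_id in self.fallbacks:
            return self.fallbacks[glyph_id]
        for font in self.special_fonts:
            if glyph_id in font:
                return font[glyph_id]
        raise LookupError(f"no glyph for {glyph_id!r}")


# Replaying the example: 'foo' is missing from the current font.
r = GlyphResolver(current_font={"x": "x"})
r.fchar("foo", "foo")                       # .fchar \[foo] foo
r.tr("x", "foo")                            # .tr x \[foo]
before = r.render("x")                      # the fallback definition
r.special({"foo": "<glyph foo from bar>"})  # .special bar
r.rchar("foo")                              # .rchar \[foo]
after = r.render("x")                       # the real glyph from font bar
```

Skipping the `rchar` call in this model would leave the fallback in effect, because fallbacks are consulted before special fonts.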
Macro and request arguments preserve compatibility mode enablement.
.cp 1 \" switch to compatibility mode .de xx \\$1 .. .cp 0 \" switch compatibility mode off .xx caf\['e] ⇒ café
Since compatibility mode is enabled while de is invoked, the macro xx enables compatibility mode when it is called. Argument $1 can still be handled properly because it inherits the compatibility mode enablement status that was active at the point where xx was called.
After interpolation of the parameters, the compatibility save and restore tokens are removed.
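A toy model of this mechanism follows. It assumes that a macro records the compatibility mode in effect at definition time and that each argument is wrapped in a save/restore pair carrying the caller's mode; the `Macro` class, the `$1` placeholder notation, and the one-entry special-character table are hypothetical, and the rule that `\[name]` is interpreted only outside compatibility mode is a simplification.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Assumed one-entry subset of groff's special character names.
SPECIAL_CHARS = {"'e": "\u00e9"}  # é


def expand_escapes(text: str, compat: bool) -> str:
    """Interpret \\[name] escapes, which (in this model) are recognized
    only when compatibility mode is off."""
    if compat:
        return text
    return re.sub(r"\\\[([^\]]+)\]",
                  lambda m: SPECIAL_CHARS.get(m.group(1), m.group(0)), text)


@dataclass
class Macro:
    body: str     # "$1" stands in for \\$1 (hypothetical notation)
    compat: bool  # compatibility mode in effect when .de was invoked


def call(macro: Macro, args: list[str], caller_compat: bool) -> str:
    # The macro's own text runs in its recorded mode; each argument is
    # bracketed by a save/restore pair holding the caller's mode, modeled
    # here as a separate (text, mode) segment.
    segments: list[tuple[str, bool]] = []
    pos = 0
    for m in re.finditer(r"\$(\d)", macro.body):
        segments.append((macro.body[pos:m.start()], macro.compat))
        segments.append((args[int(m.group(1)) - 1], caller_compat))
        pos = m.end()
    segments.append((macro.body[pos:], macro.compat))
    # After interpolation the save/restore markers are gone: only text remains.
    return "".join(expand_escapes(text, mode) for text, mode in segments)


xx = Macro(body="$1", compat=True)  # defined while .cp 1 was in effect
print(call(xx, ["caf\\['e]"], caller_compat=False))  # café
```

If the argument instead inherited the macro's own mode, `\['e]` would be left uninterpreted, which is the failure the save/restore tokens prevent.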