The groff
command’s -k option calls the
preconv
preprocessor to perform input character encoding
conversions. Input to the GNU troff
formatter itself, on the
other hand, must be in a single-byte encoding compatible with ISO 646:1991 IRV (US-ASCII).
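To make the conversion step concrete, here is a minimal Python sketch of the kind of rewriting an encoding preprocessor performs: characters in the US-ASCII range pass through unchanged, and every other character is replaced by a `\[uXXXX]` Unicode escape sequence. The function name `to_groff_escapes` is hypothetical and used only for illustration; the real preconv program does considerably more (for instance, working out the input encoding), so treat this as an approximation and not as a description of its implementation.

```python
def to_groff_escapes(text: str) -> str:
    """Replace every non-ASCII character with a \\[uXXXX] escape.

    Illustrative only: this is an assumed simplification of what an
    encoding preprocessor does, not preconv's actual algorithm.
    """
    pieces = []
    for ch in text:
        code_point = ord(ch)
        if code_point < 128:
            # US-ASCII characters are already acceptable formatter input.
            pieces.append(ch)
        else:
            # At least four uppercase hexadecimal digits.
            pieces.append(f"\\[u{code_point:04X}]")
    return "".join(pieces)


# Bytes as they might appear in a Latin-1 encoded input file.
latin1_input = b"caf\xe9"
converted = to_groff_escapes(latin1_input.decode("latin-1"))
print(converted)
```

The result contains only 7-bit characters, so it satisfies the formatter's input requirement regardless of the encoding the document started in.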
Certain macro files are responsible for translating input character
codes above 127 decimal to appropriate GNU troff
escape
sequences, and setting up hyphenation codes for letters their encodings
define; typically, they also invoke hcode
requests to case-fold
such letters where necessary so that they match hyphenation patterns.
As a rule, a localization file (see Manipulating Hyphenation)
loads one of these files; a document need not do so directly.
latin1
ISO Latin-1 is an encoding for Western European languages.
Any document encoded in ISO 646:1991 IRV (US-ASCII) or, equivalently, using only code points from the “C0 Controls” and “Basic Latin” parts of the Unicode character set, is also a valid ISO Latin-1 document; the two standards are interchangeable in their first 128 code points.(36)
koi8-r
To use KOI8-R, an encoding for the Russian language, either place
‘.mso koi8-r.tmac’ at the very beginning of your document or
supply ‘-m koi8-r’ as a command-line argument to groff. The
localization file ru.tmac takes care of this automatically; see
Manipulating Hyphenation.(37)
latin2
To use ISO Latin-2, an encoding for Central and Eastern European
languages, invoke ‘.mso latin2.tmac’ at the beginning of your
document or supply ‘-m latin2’ as a command-line argument to groff.
latin5
To use ISO Latin-5, an encoding for the Turkish language, invoke
‘.mso latin5.tmac’ at the beginning of your document or
supply ‘-m latin5’ as a command-line argument to groff.
latin9
ISO Latin-9 succeeds Latin-1; it includes a Euro sign and better
coverage for French. To use this encoding, invoke ‘.mso latin9.tmac’
at the beginning of your document or supply ‘-m latin9’ as a
command-line argument to groff.
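The relationships among these encodings can be checked directly. The following Python sketch relies only on the codecs in Python's standard library, not on anything groff provides; the helper name `differing_bytes` is hypothetical. It confirms that the first 128 code points decode identically as US-ASCII and as Latin-1, and lists the byte values at which Latin-9 departs from Latin-1.

```python
def differing_bytes(encoding_a: str, encoding_b: str) -> dict:
    """Map each byte value that the two single-byte encodings decode
    differently to the pair of characters it yields."""
    differences = {}
    for value in range(256):
        single = bytes([value])
        char_a = single.decode(encoding_a)
        char_b = single.decode(encoding_b)
        if char_a != char_b:
            differences[value] = (char_a, char_b)
    return differences


# US-ASCII and Latin-1 agree on every byte from 0 to 127.
low_half = bytes(range(128))
ascii_matches_latin1 = low_half.decode("ascii") == low_half.decode("latin-1")
print("ASCII is a subset of Latin-1:", ascii_matches_latin1)

# Latin-9 (ISO 8859-15) reassigns a handful of Latin-1 positions.
latin9_changes = differing_bytes("latin-1", "iso8859-15")
for value, (in_latin1, in_latin9) in sorted(latin9_changes.items()):
    print(f"0x{value:02X}: Latin-1 {in_latin1!r} -> Latin-9 {in_latin9!r}")
```

Among the reassigned positions are 0xA4, which becomes the Euro sign, and 0xBC and 0xBD, which become the French ligatures Œ and œ.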
Some characters from an input encoding may not be available with a particular output driver, or their glyphs may not have representation in the font used. For terminal devices, fallbacks are defined, like ‘EUR’ for the Euro sign and ‘(C)’ for the copyright sign. For typesetter devices, you may need to “mount” fonts that support glyphs required by the document. See Font Positions.
Because a Euro glyph was not historically defined in PostScript fonts,
groff
comes with a font called freeeuro.pfa that provides
the Euro in several styles. Standard PostScript fonts contain the
glyphs from Latin-5 and Latin-9 that Latin-1 lacks, so these
encodings are supported for the ps and pdf output
devices as groff
ships, while Latin-2 is not.
Unicode supports characters from all other input encodings; the utf8 output driver for terminals therefore does as well. The DVI output driver supports the Latin-2 and Latin-9 encodings if the command-line option ‘-m ec’ is used as well.(38)