When filling, GNU troff
hyphenates words as needed at
user-specified and automatically determined hyphenation points. The
machine-driven determination of hyphenation points in words requires
algorithms and data, and is susceptible to conventions and preferences.
Before tackling such automatic hyphenation, let us consider how
hyphenation points can be set explicitly.
Explicitly hyphenated words such as “mother-in-law” are eligible for
breaking after each of their hyphens. Relatively few words in a
language offer such obvious break points, however, and automatic
detection of syllabic (or phonetic) boundaries for hyphenation is not
perfect, particularly for
unusual words found in technical literature. We can instruct GNU
troff
how to hyphenate specific words if the need arises.
The hw request defines each argument word as a hyphenation exception. Each word comprises ordinary or special characters, and each hyphen-minus ‘-’ in word indicates a hyphenation point. For example, the request
.hw in-sa-lub-rious alpha
marks potential hyphenation points in “insalubrious”, and prevents “alpha” from being hyphenated at all.
Besides the space character, any character whose hyphenation code is
zero can be used to separate the arguments (see the hcode
request
below).
Hyphenation points specified with hw
are not subject to the
within-word placement restrictions imposed by the hy
request (see
below).
Hyphenation exceptions specified with the hw
request are
associated with the hyphenation language (see the hla
request
below) and environment (see Environments); invoking the hw
request in the absence of a hyphenation language is an error.
The request is ignored if there are no parameters.
You can obtain a report of hyphenation exceptions on the standard error
stream with the phw
request. See Debugging.
These are known as hyphenation exceptions in the expectation
that most users will avail themselves of automatic hyphenation; these
exceptions override any rules that would normally apply to a word
matching a hyphenation exception defined with hw.
Situations also arise when only a specific occurrence of a word needs its hyphenation altered or suppressed, or when a URL or similar specialized text needs to be breakable in sensible places without hyphenation.
To tell GNU troff
how to hyphenate words as they occur in input,
use the \%
escape sequence; it is the default hyphenation
character. Each instance within a word indicates to GNU troff
that the word may be hyphenated at that point, while prefixing a word
with this escape sequence prevents it from being otherwise hyphenated.
This mechanism affects only that occurrence of the word; to change the
hyphenation of a word for the remainder of input processing, use the
hw
request.
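The two uses of \% described above can be sketched as follows. This is an illustrative fragment only; the sentences are invented, and the break points chosen for “insalubrious” simply reuse those from the hw example earlier in this section.

```roff
.\" A leading \% keeps this occurrence of the word from being hyphenated.
The \%insalubrious conditions persisted for weeks.
.\" Interior \% escapes mark permissible break points for this
.\" occurrence only; later occurrences are unaffected.
The in\%sa\%lub\%rious conditions persisted for weeks.
```

Neither line changes how the word is treated elsewhere in the document; for that, the hw request is the appropriate tool.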
GNU troff
regards the escape sequences \X
and \Y
as
starting a word; that is, the \%
escape sequence in, say,
‘\X'...'\%foobar’ or ‘\Y'...'\%foobar’ no longer
prevents hyphenation of ‘foobar’ but inserts a hyphenation point
just prior to it; most likely this isn’t what you want.
See Postprocessor Access.
The \: escape sequence inserts a non-printing break point; that is, a
word can break there, but the soft hyphen glyph (see below) is not
written to the output if it does. The remainder of the word is subject
to hyphenation as normal.
You can combine \:
and \%
to control breaking of a file
name or URL, or to permit hyphenation only after certain explicit
hyphens within a word.
The \%Lethbridge-Stewart-\:\%Sackville-Baggins divorce was, in retrospect, inevitable once the contents of \%/var/log/\:\%httpd/\:\%access_log on the family web server came to light, revealing visitors from Hogwarts.
The hc request changes the hyphenation character to char. This
character then works as the \% escape sequence normally does, and thus
no longer appears in the output. Without an argument, hc resets the
hyphenation character to \% (the default). The hyphenation character is
associated with the environment (see Environments).
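A minimal sketch of hc in use follows. The choice of ‘^’ as the replacement character and the sample sentence are assumptions made for illustration; any character that does not otherwise occur in the text would serve.

```roff
.\" Use the caret as the hyphenation character in this environment.
.hc ^
.\" Carets now mark break points and do not appear in the output;
.\" a leading caret prevents hyphenation of that occurrence.
The in^sa^lub^rious climate was ^unhealthy.
.\" Without an argument, hc restores the default, \%.
.hc
```

Because the setting belongs to the environment, text formatted in a different environment still uses that environment's own hyphenation character.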
The shc request sets the soft hyphen character, inserted when a word is
hyphenated automatically or at a hyphenation character, to the ordinary
or special character c. If the argument is omitted, the soft hyphen
character is set to the default, \[hy]. If no glyph for c exists in the
font in use at a potential hyphenation point, then the line is not
broken there. Neither character definitions (specified with the char
and similar requests) nor translations (specified with the tr request)
are applied to c.
Several requests influence automatic hyphenation. Because conventions
vary, a variety of hyphenation modes is available to the hy
request; these determine whether hyphenation will apply to a
word prior to breaking a line at the end of a page (more or less; see
below for details), and at which positions within that word
automatically determined hyphenation points are permissible. The places
within a word that are eligible for hyphenation are determined by
language-specific data and lettercase relationships. Furthermore,
hyphenation of a word might be suppressed due to a limit on
consecutive hyphenated lines (hlm), a minimum line length
threshold (hym), or because the line can instead be adjusted with
additional inter-word space (hys).
The hy request sets the automatic hyphenation mode to mode, an integer
encoding conditions for hyphenation; if omitted, the configured
hyphenation mode default (see below) is implied. The hyphenation mode
is available in the read-only register ‘.hy’; it is associated with the
environment (see Environments). The hyphenation mode default depends on
the localization file loaded when GNU troff starts up; see the hpf
request below. If no localization file is loaded, the default is ‘1’.
Typesetting practice generally does not avail itself of every
opportunity for hyphenation, but the details differ by language and site
mandates. The hyphenation modes of AT&T troff
were
implemented with English-language publishing practices of the 1970s in
mind, not a scrupulous enumeration of conceivable parameters. GNU
troff
extends those modes such that finer-grained control is
possible, favoring compatibility with older implementations over a more
intuitive arrangement. The means of hyphenation mode control is a set
of numbers that can be added up to encode the behavior
sought. The entries in the
following table are termed values; the sum of the desired
values is the mode.
‘0’ disables hyphenation.
‘1’ enables hyphenation except after the first and before the last character of a word.

The remaining values “imply” 1; that is, they enable hyphenation under the same conditions as ‘.hy 1’, and then apply or lift restrictions relative to that basis.

‘2’ disables hyphenation of the last word on a page or column, even for explicitly hyphenated words.
‘4’ disables hyphenation before the last two characters of a word.
‘8’ disables hyphenation after the first two characters of a word.
‘16’ enables hyphenation before the last character of a word.
‘32’ enables hyphenation after the first character of a word.
Apart from value 2, restrictions imposed by the hyphenation mode
are not respected for words whose hyphenations have been
specified with the hyphenation character (‘\%’ by default) or the
hw
request.
Nonzero values in the previous table are additive. For example,
mode 12 causes GNU troff
to hyphenate neither the last two
nor the first two characters of a word. Some values cannot be used
together because they contradict; for instance, values 4 and 16,
and values 8 and 32. As noted, it is superfluous to add 1 to any
nonzero even mode.
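The arithmetic of combining values can be sketched with a few hy requests. The particular sums are examples rather than recommendations, and the tm request is used here only to report the resulting mode on the standard error stream.

```roff
.\" 4 + 8 = 12: no break before the last two characters of a word
.\" nor after the first two.
.hy 12
.tm hyphenation mode is now \n[.hy]
.\" 2 + 4 = 6: no break before the last two characters, and the last
.\" word on a page or column is not hyphenated.
.hy 6
.tm hyphenation mode is now \n[.hy]
.\" 0 disables automatic hyphenation.
.hy 0
```

Sums that combine 4 with 16, or 8 with 32, are best avoided because those values contradict each other.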
The automatic placement of hyphens in words is determined by pattern files, which are derived from TeX and available for several languages. These files are named hyphen.xx (for the patterns) and hyphenex.xx (for a list of exceptions in languages that require them) where xx is an ISO 639 language code; see the table below.
The number of characters at the beginning of a word after which the first hyphenation point should be inserted is determined by the patterns themselves; it can’t be reduced further without introducing additional, invalid hyphenation points (unfortunately, this information is not part of a pattern file—you have to know it in advance). The same is true for the number of characters at the end of a word before the last hyphenation point should be inserted. For example, you can supply the following input to ‘echo $(nroff)’.
.ll 1
.hy 48
splitting
You will get
s- plit- t- in- g
instead of the correct ‘split- ting’. English patterns as distributed
with GNU troff
need two characters at the beginning and three
characters at the end; this means that value 4 of hy
is
mandatory. Value 8 is possible as an additional restriction, but
values 16 and 32 should be avoided, as should mode 1.
Modes 4 and 6 are typical.
A table of left and right minimum character counts for hyphenation as
needed by the patterns distributed with GNU troff follows; see
the groff_tmac(5) man page for more information on GNU
troff’s language macro files.
language | pattern name | left min | right min |
---|---|---|---|
Czech | cs | 2 | 2 |
English | en | 2 | 3 |
French | fr | 2 | 3 |
German traditional | det | 2 | 2 |
German reformed | den | 2 | 2 |
Italian | it | 2 | 2 |
Russian | ru | 2 | 2 |
Spanish | es | 2 | 2 |
Swedish | sv | 1 | 2 |
Hyphenation exceptions within pattern files (i.e., the words within a
TeX \hyphenation group) obey the hyphenation restrictions
given by hy.
The nh request disables automatic hyphenation; i.e., it sets the
hyphenation mode to 0 (see above). The hyphenation mode of the last
call to hy is not remembered, but invoking hy without an argument
restores the hyphenation mode default; groff’s localization macro files
do so for the languages listed above.
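Since the previous mode is not remembered, a document that needs it back can save it in a register first. The following is a sketch under that assumption; the register name ‘saved-hy’ is hypothetical, chosen only for this illustration.

```roff
.\" Save the current hyphenation mode in a user-defined register.
.nr saved-hy \n[.hy]
.\" Disable automatic hyphenation (mode 0).
.nh
This passage is formatted without automatic hyphenation.
.\" Restore the saved mode; a bare .hy would select the default instead.
.hy \n[saved-hy]
```

If restoring the hyphenation mode default is acceptable, the register is unnecessary and ‘.hy’ without an argument suffices.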
The hydefault request sets the hyphenation mode default to mode,
configuring the value the automatic hyphenation mode takes if hy is
invoked without an argument. The hyphenation mode default is available
in the read-only register ‘.hydefault’; it is associated with the
environment (see Environments).
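The relationship between the default and the current mode can be sketched as follows, assuming the request is spelled hydefault like the register that reports it. The value 4 is chosen only because the text above calls modes 4 and 6 typical for English.

```roff
.\" Configure the value that a bare .hy will select.
.hydefault 4
.\" Disable hyphenation, then return to the configured default.
.nh
.hy
.\" Report both registers on the standard error stream.
.tm default \n[.hydefault], current \n[.hy]
```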
The hpf request reads hyphenation patterns from pattern-file, which is
sought in the same way that macro files are with the mso request or the
-m mac command-line option to groff. The pattern-file should have the
same format as (simple) TeX pattern files. More specifically, the
following scanning rules are implemented.
“Digraphs” like \$ are not supported.
^^xx (where each x is 0–9 or a–f) and ^^c (character c in the code point range 0–127 decimal) are recognized; other uses of ^ cause an error.
hpf checks for the expression \patterns{…} (possibly with whitespace before or after the braces). Everything between the braces is taken as hyphenation patterns. Consequently, { and } are not allowed in patterns.
Similarly, \hyphenation{…} gives a list of hyphenation exceptions.
\endinput is recognized also.
If \patterns is missing, the whole file is treated as a list of hyphenation patterns (except that the % character is recognized as the start of a comment).
The hpfa
request appends a file of patterns to the current list.
The set of hyphenation patterns is associated with the language set by
the hla
request (see below). The hpf
request is usually
invoked by a localization file loaded by the troffrc file.
A second call to hpf
(for the same language) replaces the
hyphenation patterns with the new ones. Invoking hpf
or
hpfa
causes an error if there is no hyphenation language. If no
hpf
request is specified (either in the document, in a file
loaded at startup, or in a macro package), GNU troff
won’t
automatically hyphenate at all.
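A localization file's setup might be sketched as below. The sequence is an assumption modeled on the description in this section, not a copy of any distributed file; the file names follow the hyphen.xx and hyphenex.xx convention described earlier, with ‘en’ for English.

```roff
.\" Select the hyphenation language first; hpf and hpfa cause an
.\" error if none is set.
.hla en
.\" Load the patterns for that language, replacing any already loaded.
.hpf hyphen.en
.\" Append the exception list to the current set.
.hpfa hyphenex.en
.\" English patterns need value 4 (see the table above).
.hy 4
```

Documents rarely need to do this themselves, since the troffrc file normally loads a localization file at startup.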
For automatic hyphenation to work, the formatter must know which letters
are equivalent; for example, the letter ‘E’ behaves like ‘e’;
only the latter typically appears in hyphenation pattern files. GNU
troff
expects characters that participate in automatic
hyphenation to be assigned hyphenation codes that define these
equivalence classes. At startup, GNU troff
assigns
hyphenation codes to the letters ‘a’–‘z’, applies the same
codes to ‘A’–‘Z’ in one-to-one correspondence, and assigns a
code of zero to all other characters.
The
hcode
request extends this principle to letters
outside the Unicode basic Latin set;
without it,
words containing such letters
won’t hyphenate properly
even if the corresponding hyphenation patterns contain them.
The hcode request sets the hyphenation code of ordinary or special
character dst1 to that of src1, and so on for each further pair of
arguments. dst1 must be an ordinary character
(other than a numeral) or a special character, and src1 must be an
ordinary character (other than a numeral) or a special character to
which a hyphenation code has already been applied. Assigning the code
of an ordinary character to itself effectively creates a unique
hyphenation code (which can then be copied to others). hcode
ignores spaces between arguments. If any argument is invalid,
hcode
reports an error and stops reading them.
For example, the following hcode
requests are necessary to assign
hyphenation codes to the letters ‘ÄäÖöÜüß’, needed for German.
.hcode ä ä Ä ä
.hcode ö ö Ö ö
.hcode ü ü Ü ü
.hcode ß ß
Without these assignments, GNU troff
treats the German word
‘Kindergärten’ (the plural form of ‘kindergarten’) as two words
‘kinderg’ and ‘rten’ because the hyphenation code of the
umlaut a is zero by default, just like a space. There is a German
hyphenation pattern that covers ‘kinder’, so GNU troff
finds the hyphenation ‘kin-der’. The other two hyphenation points
(‘kin-der-gär-ten’) are missed.
To remove a character’s hyphenation code, copy the code of a character with a hyphenation code value of zero to it. For example, ‘.hcode ß $’ removes the hyphenation code from ‘ß’ (unless ‘$’ has already been assigned a different one).
The phcode
request may be helpful to troubleshoot hyphenation
code assignments. See Debugging.
Caution: This request will be withdrawn in a future groff release. Use
hcode instead.
The hpfcode request defines mapping values for character codes in
pattern files. It is an older mechanism no longer used by GNU troff’s
own macro files. The hpf or hpfa request applies the mapping after
reading the patterns but before replacing or appending to the active
list of patterns. The arguments to hpfcode are pairs of character
codes, each an integer from 0 to 255. The request maps character
code a to code b, code c to code d, and so on. Character codes that
would otherwise be invalid in GNU troff can be used.
The hla request sets the hyphenation language to lang, or clears it if
there is no argument. Hyphenation exceptions specified with the hw
request and hyphenation patterns and exceptions specified with the hpf
and hpfa requests are associated with the hyphenation language.
The hla request is usually invoked by a localization file, which
is in turn loaded by the troffrc or troffrc-end file; see the
hpf request above.
The hyphenation language is available in the read-only string-valued register ‘.hla’; it is associated with the environment (see Environments).
If no hyphenation language is set, automatic hyphenation will not occur.
The hlm request sets the maximum quantity of consecutive hyphenated
lines to n. If n is negative, there is no maximum. If omitted, n
is -1. This value is associated with the environment
(see Environments). Only lines output from a given environment
count toward the maximum associated with that environment. Hyphens
resulting from \% are counted; explicit hyphens are not.
The .hlm read-only register stores this maximum. The count of
immediately preceding consecutive hyphenated lines is available in the
read-only register .hlc.
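A brief sketch of hlm and its two registers follows. The limit of 2 is an arbitrary example, and tm merely reports the values on the standard error stream.

```roff
.\" Permit at most two consecutive hyphenated lines in this environment.
.hlm 2
.\" Report the configured maximum and the current run length.
.tm limit \n[.hlm], consecutive hyphenated lines so far \n[.hlc]
.\" Without an argument, n is -1, so no maximum applies.
.hlm
```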
The hym request sets the (right) hyphenation margin to length. If the adjustment mode is not ‘b’ or ‘n’, the line is not hyphenated if it is shorter than length. Without an argument, the hyphenation margin is reset to its default value, 0. The default scaling unit is ‘m’. The hyphenation margin is associated with the environment (see Environments).
A negative argument resets the hyphenation margin to zero, emitting a warning in category ‘range’.
The hyphenation margin is available in the .hym
read-only
register.
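The hyphenation margin matters only when lines are not adjusted to both margins, so a sketch of its use pairs it with a ragged-right adjustment mode. The length of 2 ems is an arbitrary illustration.

```roff
.\" Left-aligned (ragged-right) text: the adjustment mode is neither
.\" b nor n, so the hyphenation margin is consulted.
.ad l
.\" Set the hyphenation margin to 2 ems.
.hym 2m
.tm hyphenation margin register: \n[.hym]
.\" Without an argument, the margin returns to its default, 0.
.hym
```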
The hys request suppresses hyphenation of the line in adjustment modes ‘b’ or ‘n’ if that adjustment can be achieved by adding no more than hyphenation-space extra space to each inter-word space. Without an argument, the hyphenation space adjustment threshold is set to its default value, 0. The default scaling unit is ‘m’. The hyphenation space adjustment threshold is associated with the environment (see Environments).
A negative argument resets the hyphenation space adjustment threshold to zero, emitting a warning in category ‘range’.
The hyphenation space adjustment threshold is available in the
.hys
read-only register.
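The counterpart for adjusted text can be sketched in the same way. The threshold of half an em is an assumption chosen for illustration, not a recommended setting.

```roff
.\" Adjust lines to both margins; hys applies in modes b and n.
.ad b
.\" Prefer stretching each inter-word space by up to 0.5 em over
.\" hyphenating the line.
.hys 0.5m
.tm hyphenation space threshold register: \n[.hys]
.\" Without an argument, the threshold returns to its default, 0.
.hys
```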