

5.10 Manipulating Hyphenation

When filling, GNU troff hyphenates words as needed at user-specified and automatically determined hyphenation points. The machine-driven determination of hyphenation points in words requires algorithms and data, and is susceptible to conventions and preferences. Before tackling such automatic hyphenation, let us consider how hyphenation points can be set explicitly.

Explicitly hyphenated words such as “mother-in-law” are eligible for breaking after each of their hyphens. Relatively few words in a language offer such obvious break points, however, and automatic detection of syllabic (or phonetic) boundaries for hyphenation is not perfect, particularly for unusual words found in technical literature. We can instruct GNU troff how to hyphenate specific words if the need arises.

Request: .hw word …

Define each word as a hyphenation exception. A word comprises ordinary or special characters, and each hyphen-minus ‘-’ in it indicates a hyphenation point. For example, the request

.hw in-sa-lub-rious alpha

marks potential hyphenation points in “insalubrious”, and prevents “alpha” from being hyphenated at all.

Besides the space character, any character whose hyphenation code is zero can be used to separate the arguments (see the hcode request below).

Hyphenation points specified with hw are not subject to the within-word placement restrictions imposed by the hy request (see below).

Hyphenation exceptions specified with the hw request are associated with the hyphenation language (see the hla request below) and environment (see Environments); invoking the hw request in the absence of a hyphenation language is an error.

The request is ignored if there are no parameters.

You can obtain a report of hyphenation exceptions on the standard error stream with the phw request. See Debugging.

These are known as hyphenation exceptions in the expectation that most users will avail themselves of automatic hyphenation; these exceptions override any rules that would normally apply to a word matching a hyphenation exception defined with hw.

Situations also arise when only a specific occurrence of a word needs its hyphenation altered or suppressed, or when a URL or similar specialized text needs to be breakable in sensible places without hyphenation.

Escape sequence: \%
Escape sequence: \:

To tell GNU troff how to hyphenate words as they occur in input, use the \% escape sequence; it is the default hyphenation character. Each instance within a word indicates to GNU troff that the word may be hyphenated at that point, while prefixing a word with this escape sequence prevents it from being otherwise hyphenated. This mechanism affects only that occurrence of the word; to change the hyphenation of a word for the remainder of input processing, use the hw request.

GNU troff regards the escape sequences \X and \Y as starting a word; that is, the \% escape sequence in, say, ‘\X'...'\%foobar’ or ‘\Y'...'\%foobar’ no longer prevents hyphenation of ‘foobar’ but inserts a hyphenation point just prior to it; most likely this isn’t what you want. See Postprocessor Access.

\: inserts a non-printing break point; that is, a word can break there, but the soft hyphen glyph (see below) is not written to the output if it does. The remainder of the word is subject to hyphenation as normal.

You can combine \: and \% to control breaking of a file name or URL, or to permit hyphenation only after certain explicit hyphens within a word.

The \%Lethbridge-Stewart-\:\%Sackville-Baggins divorce
was, in retrospect, inevitable once the contents of
\%/var/log/\:\%httpd/\:\%access_log on the family web
server came to light, revealing visitors from Hogwarts.

Request: .hc [char]

Change the hyphenation character to char. This character then works as the \% escape sequence normally does, and thus no longer appears in the output. Without an argument, hc resets the hyphenation character to \% (the default). The hyphenation character is associated with the environment (see Environments).
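
As a minimal sketch of this request, the following input makes ‘^’ the hyphenation character for one passage and then restores the default. The choice of ‘^’ and the sample words are illustrative assumptions only.

```roff
.\" Make ^ the hyphenation character in this environment.
.hc ^
.\" "ultraviolet" may now be hyphenated after "ultra"; the
.\" leading ^ keeps "database" from being hyphenated.
Exposure to ultra^violet light corrupted the ^database.
.\" Restore the default hyphenation character, \%.
.hc
```

Because ‘^’ no longer appears in the output while it serves as the hyphenation character, pick a character that the affected text does not otherwise need.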

Request: .shc [c]

Set the soft hyphen character, inserted when a word is hyphenated automatically or at a hyphenation character, to the ordinary or special character c. If the argument is omitted, the soft hyphen character is set to the default, \[hy]. If no glyph for c exists in the font in use at a potential hyphenation point, then the line is not broken there. Neither character definitions (specified with the char and similar requests) nor translations (specified with the tr request) are applied to c.

Several requests influence automatic hyphenation. Because conventions vary, a variety of hyphenation modes is available to the hy request; these determine whether hyphenation will apply to a word prior to breaking a line at the end of a page (more or less; see below for details), and at which positions within that word automatically determined hyphenation points are permissible. The places within a word that are eligible for hyphenation are determined by language-specific data and lettercase relationships. Furthermore, hyphenation of a word might be suppressed due to a limit on consecutive hyphenated lines (hlm), a minimum line length threshold (hym), or because the line can instead be adjusted with additional inter-word space (hys).

Request: .hy [mode]
Register: \n[.hy]

Set automatic hyphenation mode to mode, an integer encoding conditions for hyphenation; if omitted, the configured hyphenation mode default (see below) is implied. The hyphenation mode is available in the read-only register ‘.hy’; it is associated with the environment (see Environments). The hyphenation mode default depends on the localization file loaded when GNU troff starts up; see the hpf request below. If no localization file is loaded, the default is ‘1’.

Typesetting practice generally does not avail itself of every opportunity for hyphenation, but the details differ by language and site mandates. The hyphenation modes of AT&T troff were implemented with English-language publishing practices of the 1970s in mind, not a scrupulous enumeration of conceivable parameters. GNU troff extends those modes such that finer-grained control is possible, favoring compatibility with older implementations over a more intuitive arrangement. The means of hyphenation mode control is a set of numbers that can be added up to encode the behavior sought. The entries in the following table are termed values; the sum of the desired values is the mode.

0

disables hyphenation.

1

enables hyphenation except after the first and before the last character of a word.

The remaining values “imply” 1; that is, they enable hyphenation under the same conditions as ‘.hy 1’, and then apply or lift restrictions relative to that basis.

2

disables hyphenation of the last word on a page or column, even for explicitly hyphenated words.

4

disables hyphenation before the last two characters of a word.

8

disables hyphenation after the first two characters of a word.

16

enables hyphenation before the last character of a word.

32

enables hyphenation after the first character of a word.

Apart from value 2, restrictions imposed by the hyphenation mode are not respected for words whose hyphenations have been specified with the hyphenation character (‘\%’ by default) or the hw request.

Nonzero values in the previous table are additive. For example, mode 12 (the sum of values 4 and 8) prevents GNU troff from breaking a word either before its last two characters or after its first two. Some values cannot be used together because they contradict; for instance, values 4 and 16, and values 8 and 32. As noted, it is superfluous to add 1 to any nonzero even mode.
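
The arithmetic can be checked within a document by setting a mode from a sum and reading it back. This is only a sketch: the register name ‘hmode’ is a hypothetical choice, the sums are arbitrary examples, and the tm request is used here simply to write a line to the standard error stream.

```roff
.\" 2 + 4 = 6: no hyphenation of the last word on a page or
.\" column, and none before the last two characters of a word.
.hy 6
.tm hyphenation mode: \n[.hy]
.\" 4 + 8 = 12, computed in a register (name is hypothetical):
.\" none before the last two or after the first two characters.
.nr hmode 4+8
.hy \n[hmode]
.tm hyphenation mode: \n[.hy]
```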

The automatic placement of hyphens in words is determined by pattern files, which are derived from TeX and available for several languages. These files are named hyphen.xx (for the patterns) and hyphenex.xx (for a list of exceptions in languages that require them) where xx is an ISO 639 language code; see the table below.

The number of characters at the beginning of a word after which the first hyphenation point should be inserted is determined by the patterns themselves; it can’t be reduced further without introducing additional, invalid hyphenation points (unfortunately, this information is not part of a pattern file—you have to know it in advance). The same is true for the number of characters at the end of a word before the last hyphenation point should be inserted. For example, you can supply the following input to ‘echo $(nroff)’.

.ll 1
.hy 48
splitting

You will get

s- plit- t- in- g

instead of the correct ‘split- ting’. English patterns as distributed with GNU troff need two characters at the beginning and three characters at the end; this means that value 4 of hy is mandatory. Value 8 is possible as an additional restriction, but values 16 and 32 should be avoided, as should mode 1. Modes 4 and 6 are typical.

A table of left and right minimum character counts for hyphenation as needed by the patterns distributed with GNU troff follows; see the groff_tmac(5) man page for more information on GNU troff’s language macro files.

language             pattern name   left min   right min
Czech                cs             2          2
English              en             2          3
French               fr             2          3
German traditional   det            2          2
German reformed      den            2          2
Italian              it             2          2
Russian              ru             2          2
Spanish              es             2          2
Swedish              sv             1          2
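
Reading this table against the hyphenation mode values suggests a starting mode for each language. The following sketches that reasoning; it is not a statement of what groff’s localization files actually configure, and a real document would use only the line for its own language.

```roff
.\" Mode 1 alone forbids a break after the first or before the
.\" last character of a word: left min 2, right min 2
.\" (as listed for Czech, German, Italian, Russian, Spanish).
.hy 1
.\" English and French list right min 3: value 4 forbids a
.\" break before the last two characters.
.hy 4
.\" Swedish lists left min 1: value 32 permits a break after
.\" the first character.
.hy 32
```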

Hyphenation exceptions within pattern files (i.e., the words within a TeX \hyphenation group) obey the hyphenation restrictions given by hy.

Request: .nh

Disable automatic hyphenation; i.e., set the hyphenation mode to 0 (see above). The hyphenation mode of the last call to hy is not remembered, but invoking hy without an argument restores the hyphenation mode default; groff’s localization macro files do so for the languages listed above.
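
A common pattern, sketched here with made-up text, is to switch hyphenation off for a heading and restore the default mode for the body.

```roff
.\" Suppress automatic hyphenation for a heading.
.nh
Internationalization and Localization
.br
.\" Resume hyphenation with the configured default mode.
.hy
The body text that follows is hyphenated automatically again.
```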

Request: .hydefault [mode]
Register: \n[.hydefault]

Set hyphenation mode default to mode, configuring the value the automatic hyphenation mode takes if hy is invoked without an argument. The hyphenation mode default is available in the read-only register ‘.hydefault’; it is associated with the environment (see Environments).
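
The sketch below shows how the default interacts with nh and hy. It assumes a groff release that provides hydefault as described here, and the value 6 is only an example.

```roff
.\" Make 6 the mode that a bare "hy" selects.
.hydefault 6
.\" Turn automatic hyphenation off for a while ...
.nh
.\" ... then restore it; the mode comes from the default.
.hy
.tm mode: \n[.hy], default: \n[.hydefault]
```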

Request: .hpf pattern-file
Request: .hpfa pattern-file

Read hyphenation patterns from pattern-file, which is sought in the same way that macro files are with the mso request or the -m mac command-line option to groff. The pattern-file should have the same format as (simple) TeX pattern files. More specifically, the following scanning rules are implemented.

The hpfa request appends a file of patterns to the current list.

The set of hyphenation patterns is associated with the language set by the hla request (see below). The hpf request is usually invoked by a localization file loaded by the troffrc file.

A second call to hpf (for the same language) replaces the hyphenation patterns with the new ones. Invoking hpf or hpfa causes an error if there is no hyphenation language. If no hpf request is specified (either in the document, in a file loaded at startup, or in a macro package), GNU troff won’t automatically hyphenate at all.
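
The ordering constraints above can be sketched as follows. The file names follow the hyphen.xx and hyphenex.xx convention described earlier; whether those files are installed where macro files are sought is an assumption about the local system.

```roff
.\" A hyphenation language must exist before patterns are read.
.hla en
.\" Replace any patterns already loaded for "en".
.hpf hyphen.en
.\" Add the exception list without discarding those patterns.
.hpfa hyphenex.en
```

Using hpf for the second file instead of hpfa would discard the patterns just loaded, because a second hpf for the same language replaces them.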

For automatic hyphenation to work, the formatter must know which letters are equivalent; for example, the letter ‘E’ behaves like ‘e’; only the latter typically appears in hyphenation pattern files. GNU troff expects characters that participate in automatic hyphenation to be assigned hyphenation codes that define these equivalence classes. At startup, GNU troff assigns hyphenation codes to the letters ‘a’–‘z’, applies the same codes to ‘A’–‘Z’ in one-to-one correspondence, and assigns a code of zero to all other characters.

The hcode request extends this principle to letters outside the Unicode basic Latin set; without it, words containing such letters won’t hyphenate properly even if the corresponding hyphenation patterns contain them.

Request: .hcode dst1 src1 [dst2 src2] …

Set the hyphenation code of ordinary or special character dst1 to that of src1, and so on. dst1 must be an ordinary character (other than a numeral) or a special character, and src1 must be an ordinary character (other than a numeral) or a special character to which a hyphenation code has already been applied. Assigning the code of an ordinary character to itself effectively creates a unique hyphenation code (which can then be copied to others). hcode ignores spaces between arguments. If any argument is invalid, hcode reports an error and stops reading them.

For example, the following hcode requests are necessary to assign hyphenation codes to the letters ‘ÄäÖöÜüß’, needed for German.

.hcode ä ä  Ä ä
.hcode ö ö  Ö ö
.hcode ü ü  Ü ü
.hcode ß ß

Without these assignments, GNU troff treats the German word ‘Kindergärten’ (the plural form of ‘kindergarten’) as two words ‘kinderg’ and ‘rten’ because the hyphenation code of the umlaut a is zero by default, just like a space. There is a German hyphenation pattern that covers ‘kinder’, so GNU troff finds the hyphenation ‘kin-der’. The other two hyphenation points (‘kin-der-gär-ten’) are missed.

To remove a character’s hyphenation code, copy the code of a character with a hyphenation code value of zero to it. For example, ‘.hcode ß $’ removes the hyphenation code from ‘ß’ (unless ‘$’ has already been assigned a different one).

The phcode request may be helpful to troubleshoot hyphenation code assignments. See Debugging.

Request: .hpfcode a b [c d] …

Caution: This request will be withdrawn in a future groff release. Use hcode instead.

The hpfcode request defines mapping values for character codes in pattern files. It is an older mechanism no longer used by GNU troff’s own macro files. The hpf and hpfa requests apply the mapping after reading the patterns but before replacing or appending to the active list of patterns. The arguments to hpfcode are pairs of character codes, which are integers from 0 to 255. The request maps character code a to code b, code c to code d, and so on. Character codes that would otherwise be invalid in GNU troff can be used.

Request: .hla [lang]
Register: \n[.hla]

Set the hyphenation language to lang, or clear it if there is no argument. Hyphenation exceptions specified with the hw request and hyphenation patterns and exceptions specified with the hpf and hpfa requests are associated with the hyphenation language. The hla request is usually invoked by a localization file, which is in turn loaded by the troffrc or troffrc-end file; see the hpf request above.

The hyphenation language is available in the read-only string-valued register ‘.hla’; it is associated with the environment (see Environments).

If no hyphenation language is set, automatic hyphenation will not occur.
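
The following sketch illustrates how exceptions attach to the hyphenation language. The exception word is an arbitrary example, and tm is used only to write the register’s value to the standard error stream.

```roff
.hla en
.\" This exception is stored with the language "en".
.hw data-base
.tm hyphenation language: \n[.hla]
.\" Clear the hyphenation language.  Automatic hyphenation no
.\" longer occurs, and invoking hw now would be an error.
.hla
```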

Request: .hlm [n]
Register: \n[.hlm]
Register: \n[.hlc]

Set the maximum quantity of consecutive hyphenated lines to n. If n is negative, there is no maximum. If omitted, n is -1. This value is associated with the environment (see Environments). Only lines output from a given environment count toward the maximum associated with that environment. Hyphens resulting from \% are counted; explicit hyphens are not.

The .hlm read-only register stores this maximum. The count of immediately preceding consecutive hyphenated lines is available in the read-only register .hlc.
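
For instance, a document might cap runs of hyphenated lines and later lift the cap. The limit of 2 is an arbitrary example.

```roff
.\" Allow at most two consecutive hyphenated lines.
.hlm 2
.tm limit: \n[.hlm], current run: \n[.hlc]
.\" Remove the limit again; equivalent to ".hlm -1".
.hlm
```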

Request: .hym [length]
Register: \n[.hym]

Set the (right) hyphenation margin to length. If the adjustment mode is not ‘b’ or ‘n’, the line is not hyphenated if it falls short of the line length by no more than length. Without an argument, the hyphenation margin is reset to its default value, 0. The default scaling unit is ‘m’. The hyphenation margin is associated with the environment (see Environments).

A negative argument resets the hyphenation margin to zero, emitting a warning in category ‘range’.

The hyphenation margin is available in the .hym read-only register.
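
A minimal sketch of setting a hyphenation margin follows; the value 2m is an arbitrary example, and left adjustment is selected first because the margin has no effect in adjustment modes ‘b’ and ‘n’.

```roff
.\" Select an adjustment mode other than b or n.
.ad l
.\" Set a hyphenation margin of 2 ems.
.hym 2m
.tm hyphenation margin: \n[.hym]
.\" Reset the margin to its default of 0.
.hym
```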

Request: .hys [hyphenation-space]
Register: \n[.hys]

Suppress hyphenation of the line in adjustment modes ‘b’ or ‘n’ if that adjustment can be achieved by adding no more than hyphenation-space extra space to each inter-word space. Without an argument, the hyphenation space adjustment threshold is set to its default value, 0. The default scaling unit is ‘m’. The hyphenation space adjustment threshold is associated with the environment (see Environments).

A negative argument resets the hyphenation space adjustment threshold to zero, emitting a warning in category ‘range’.

The hyphenation space adjustment threshold is available in the .hys read-only register.
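
By way of illustration, the following input prefers stretching inter-word spaces over hyphenating. The threshold 0.2m is an arbitrary example, not a recommended setting.

```roff
.\" The threshold applies in adjustment modes b and n.
.ad b
.\" Do not hyphenate a line that can be adjusted by adding
.\" at most 0.2m to each inter-word space.
.hys 0.2m
.tm hyphenation space threshold: \n[.hys]
.\" Reset the threshold to its default of 0.
.hys
```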

