https://www.lunabase.org/~faber/Vault/software/grap/
Besides groff,
neatroff
is an exception.
Unix and related operating systems distinguish
standard output and standard error streams because of
troff:
https://www.tuhs.org/pipermail/tuhs/2013-December/006113.html.
See Line Layout.
roff is the language of historical Unix
manuals, and of man pages to this day.
POSIX has not standardized a mechanism
for the
man
command to distinguish pages by numeric category.
If
‘man 'groff(7)'’
produces an error,
attempt
‘man 7 groff’
or
‘man -s 7 groff’.
GNU
troff does not,
however,
accept newlines
(line feeds)
in file names supplied as arguments to requests.
The
mso request loads a macro file of any name.
See Host System Service Access.
See Device and Font Description Files.
The remainder of this chapter is based on
“Writing Papers with NROFF using -me” by Eric P. Allman,
which is distributed with groff as meintro.me.
While manual pages are older, early ones used macros supplanted by the man package of Seventh Edition Unix (1979). ms shipped with Sixth Edition (1975) and was documented by Mike Lesk in a Bell Labs internal memorandum.
defined in ms Footnotes
Distinguish a
document title from “titles”, which are what roff systems call
headers and footers collectively.
This idiosyncrasy arose through
feature accretion; for example, the B macro in Sixth Edition
Unix ms (1975) accepted only one argument, the text to be set in
boldface. By Version 7 (1979) it recognized a second argument; in
1990, groff ms added a “pre” argument, placing it third
to avoid breaking support for older documents.
Unix Version 7 ms, its descendants, and GNU
ms prior to groff version 1.23.0
You could reset it
after each call to 1C, 2C, or MC.
“Typing Documents on the UNIX System: Using the -ms Macros with Troff and Nroff”, M. E. Lesk, Bell Laboratories, 1978
Register values are converted to and stored as basic units. See Measurements.
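As a rough illustration of that conversion, the following Python sketch turns a measurement with a scaling unit into basic units. The function name, the small unit table, and the resolution of 72000 basic units per inch used in the usage note are assumptions made for the example; each output device declares its own resolution, and the formatter's rounding may differ from what is shown here.

```python
from fractions import Fraction

# How many of each scaling unit fit in one inch.  Only a handful of
# units are modeled; this table is an illustration, not a full list.
UNITS_PER_INCH = {
    "i": Fraction(1),        # inches
    "c": Fraction(127, 50),  # centimeters (2.54 per inch)
    "p": Fraction(72),       # points
    "P": Fraction(6),        # picas
}


def to_basic_units(value, unit, resolution):
    """Convert `value` expressed in `unit` to a whole number of basic units.

    `resolution` is the number of basic units per inch (assumed to be
    supplied by the caller; a real formatter reads it from the device
    description).
    """
    per_inch = UNITS_PER_INCH[unit]
    return round(Fraction(value) * resolution / per_inch)
```

With an assumed resolution of 72000, `to_basic_units(1, "i", 72000)` gives 72000 and `to_basic_units(12, "p", 72000)` gives 12000; storing the integer result is what lets later arithmetic on the register avoid repeated unit conversions.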
If you redefine the ms PT macro
and desire special treatment of certain page numbers (like ‘1’),
you may need to handle a non-Arabic page number format, as groff
ms’s PT does; see the macro package source.
In
groff ms,
the PN and % registers are aliases.
Removal beforehand is necessary
because
groff ms
aliases these macros with a diagnostic one;
you want to reorient the aliased name
before (re-)populating the macro.
See Device and Font Description Files.
Tabs
and leaders also separate words.
Escape sequences can function as word characters,
word separators,
or neither—the last simply have no effect on
GNU
troff’s idea of whether an input character is within a word.
We’ll discuss all of these in due course.
A
well-researched jeremiad appreciated by groff contributors on
both sides of the sentence-spacing debate can be found at
https://web.archive.org/web/20171217060354/http://www.heracliteanriver.com/?p=324.
This statement oversimplifies; there are escape sequences whose purpose is precisely to produce glyphs on the output device, and input characters that aren’t part of escape sequences can undergo a great deal of processing before getting to the output.
The mnemonics for the special characters shown here are “dagger”, “double dagger”, “right (double) quote”, and “closing (single) quote”. See groff_char(7).
See text lines.
“Tab” abbreviates “tabulation”, suggesting a table arrangement mechanism.
The backspace character is also meaningful; see Page Motions.
The \RET escape sequence
can alter how an input line is classified;
see Line Continuation.
Argument handling in macros is more flexible but also more complex. See Calling Macros.
Some escape sequences undergo interpolation as well.
GNU
troff
offers additional ones.
See Writing Macros.
Macro files and packages frequently define registers and strings as well.
The semantics of certain punctuation code points have gotten stricter with the successive standards, a cause of some frustration among man page writers; see groff_char(7).
It also emits a warning in category ‘input’. See Warnings.
Historically,
control characters like
ASCII
STX,
ETX,
and
BEL
(Control+B,
Control+C,
and
Control+G,
respectively)
have been observed in
roff
documents,
particularly in macro packages employing them as delimiters
with the output comparison operator
to try to avoid collisions
with the content of arbitrary user-supplied parameters
(see Operators in Conditionals).
We discourage this expedient;
in
GNU
troff it is unnecessary
(outside of compatibility mode)
because the program parses delimited arguments
at a different input level than their surrounding context.
See Implementation Differences.
KOI8-R code points in the range
0x80–0x9F are not valid input to GNU troff;
recall Input Format.
This restriction should be no impediment to practical documents,
as these KOI8-R code points do not encode letters,
but box-drawing symbols and characters
that are better obtained via special character escape sequences;
see groff_char(7).
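A document author can check a KOI8-R file for the affected bytes before formatting it. The Python sketch below is one possible way to do so; the function name is hypothetical, and the check covers only the 0x80–0x9F range named above.

```python
def invalid_koi8r_offsets(data: bytes) -> list[int]:
    """Return the offsets of bytes that fall in the range 0x80-0x9F."""
    return [i for i, byte in enumerate(data) if 0x80 <= byte <= 0x9F]


# Cyrillic letters encode outside the problematic range.
letters = "привет".encode("koi8_r")

# 0x80 is a box-drawing symbol in KOI8-R, not a letter.
box_drawing = bytes([0x80]).decode("koi8_r")
```

Here `invalid_koi8r_offsets(letters)` is an empty list, while `box_drawing` is U+2500 (BOX DRAWINGS LIGHT HORIZONTAL), which shows why excluding the range costs ordinary text nothing.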
The DVI output device defaults to using the Computer Modern (CM) fonts; ec.tmac loads the EC fonts instead, which provide Euro ‘\[Eu]’ and per mille ‘\[%0]’ glyphs.
Emacs: fill-column: 72; Vim: textwidth=72
groff does not yet support right-to-left
scripts.
groff’s
terminal output devices have page offsets of zero.
See Registers.
See Numeric Expressions.
Provision is made for interpreting and reporting decimal fractions in certain cases.
If that’s not enough, see the groff_tmac(5) man page for the 62bit.tmac macro package.
If overflow would
occur,
GNU
troff emits a warning in category
‘range’.
See Warnings.
GNU
troff emits a warning in category
‘number’.
See Warnings.
Control structure syntax creates an exception to this rule, but is designed to remain useful: recalling our example, ‘.if 1 .Underline this’ would underline only “this”, precisely. See Conditionals and Loops.
See Diversions.
Use of escape sequences in identifiers
is not portable.
For example,
DWB 3.3
troff accepts
\_.
Plan 9
troff does too,
along with
\',
\`,
and
\-.
Solaris
troff rejects all of these except
\_,
but accepts
\&,
\{,
\},
\SPC,
\%,
and
\c.
Heirloom Doctools
troff rejects all of these,
including
\_,
but accepts
\!,
which the others reject.
GNU
troff rejects all of the foregoing.
GNU
troff emits a warning in category
‘mac’.
See Warnings.
GNU
troff emits a warning in category
‘reg’.
See Warnings.
Recall Identifiers.
In compatibility mode, a space is not necessary after a request or macro name of two characters’ length.
Plan 9
troff does.
GNU
troff emits a warning in category
‘mac’.
See Warnings.
\~ is fairly
portable; see Other Differences.
Strictly, you can neglect to close the last quoted macro argument, relying on the end of the control line to do so. We consider this lethargic practice poor style.
GNU
troff emits a warning in category
‘escape’.
See Warnings.
The omission of spaces before the comment escape sequences is necessary; see Strings.
TeX does have such a mechanism.
The
GNU
eqn(1)
and
tbl(1)
preprocessors use parameterized but non-delimited special character
escape sequences
\(
and
\[
to bracket portions of their output.
See Page Layout.
See Operators in Conditionals.
See Implementation Differences.
This claim may be more aspirational than descriptive.
except in copy mode on Plan 9
troff
See Copy Mode.
See Conditional Blocks.
Exception: auto-incrementing registers defined outside
the ignored region will be modified if interpolated with
\n± inside it. See Auto-increment.
See Page Motions.
GNU
troff emits a warning in category
‘reg’.
See Warnings.
GNU
troff emits a warning in category
‘reg’.
See Warnings.
A negative auto-increment can be considered an “auto-decrement”.
GNU troff dynamically allocates memory for as
many registers as required.
See Environments.
though not necessarily to the output device; see Diversions
If you’re not sure whether an input line has been
productive, you can use the pline request before and after it to
see whether it produced any output nodes. See Debugging.
See Line Continuation.
The
.R
register interpolates the largest value that
GNU
troff can work with.
Recall
Built-in Registers.
Recall Filling and Sentences for the definitions of word and sentence boundaries, respectively.
See Font Description File Format. This request is incorrectly documented in the AT&T
troff manual as using units of 1/36 em.
Whether a perfect algorithm for this application is even possible is an unsolved problem in computer science: https://tug.org/docs/liang/liang-thesis.pdf.
GNU
troff emits a warning in category
‘missing’.
See Warnings.
\% itself stops marking
hyphenation points but still produces no output glyph.
“Soft” because it appears in output only where a hyphenation break is performed; a “hard” hyphen, as in “long-term”, always appears.
The mode is a vector of Boolean values encoded as an integer. To a programmer, this fact is easily deduced from the exclusive use of powers of two for the configuration parameters; they are computationally easy to “mask off” and compare to zero. To almost everyone else, the arrangement seems recondite and unfriendly.
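For readers outside that audience, the "mask off and compare to zero" idiom can be sketched in Python. The flag names below are invented for this example and do not reproduce the formatter's documented meanings for each value; only the technique, one bit per Boolean, is the point.

```python
from enum import IntFlag


class ModeFlag(IntFlag):
    """Hypothetical names for four power-of-two mode values."""
    OPTION_1 = 1
    OPTION_2 = 2
    OPTION_4 = 4
    OPTION_8 = 8


def is_enabled(mode: int, flag: ModeFlag) -> bool:
    """Mask off one bit of `mode` and compare the result to zero."""
    return (mode & flag) != 0


def enabled_flags(mode: int) -> list[ModeFlag]:
    """List every modeled flag whose bit is set in `mode`."""
    return [flag for flag in ModeFlag if is_enabled(mode, flag)]


# Adding distinct powers of two combines options without ambiguity.
mode = ModeFlag.OPTION_1 + ModeFlag.OPTION_4
```

Because each option occupies its own bit, a sum such as 1 + 4 can be decomposed in only one way, so `is_enabled(mode, ModeFlag.OPTION_4)` is true and `is_enabled(mode, ModeFlag.OPTION_2)` is false.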
The formatter prevents hyphenation if the next page location trap is closer to the vertical drawing position than the next text baseline would be. See Page Location Traps. A macro package might also employ value ‘2’ to prevent hyphenation before a display; recall Displays and Keeps.
See subsection “Localization packages” of groff_tmac(5).
See Environments.
For more detail on localization, see groff_tmac(5).
See the discussion of the
ds
request in Strings.
GNU
troff also emits a warning in category
‘range’.
See Warnings.
GNU
troff also emits a warning in category
‘range’.
See Warnings.
See Page Location Traps.
To shift the text baseline for
part of an output line (to set super- or subscripts, for
instance), use the \v escape sequence. See Page Motions.
See Drawing Geometric Objects.
or geometric objects; see Drawing Geometric Objects
to the top-level diversion; see Diversions
Plan 9 troff
uses the register .S for this purpose.
Pronounce “leader” to rhyme with “feeder”; it refers to how the glyphs “lead” the eye across the page to the corresponding page number or other datum.
A
GNU
nroff program is available for convenience;
it runs
GNU
troff
to perform formatting;
see
nroff(1).
See Conditionals and Loops for more on built-in conditions.
See Copy Mode.
Historically, the \c escape
sequence has proven challenging to characterize. Some sources say it
“connects the next input text” (to the input line on which it
appears); others describe it as “interrupting” text, on the grounds
that a text line is interrupted without breaking, perhaps to inject a
request invocation or macro call.
See Diversions.
Terminals and some typesetters have fonts that render at
only one or two sizes. As examples, take the groff lj4
device’s Lineprinter, and lbp’s Courier and Elite faces.
Font designers prepare families such that the styles share esthetic properties.
Historically, the fonts troffs dealt with were not
Free Software or, as with the Graphic Systems C/A/T, did not even exist
in the digital domain.
See Font Description File Format.
It also emits a warning in category ‘font’ or ‘range’, as appropriate. See Warnings.
See DESC File Format.
Depending on the breadth
of the output device’s glyph repertoire,
the characters
',
-,
^,
`,
and
~
can be exceptions to this rule.
"
and
\
are not exceptions,
but because they are syntactically meaningful to the formatter,
access to their glyphs
may require use of special characters
(or changing or disabling the escape character).
See groff_char(7).
Fonts do not necessarily arrange their glyphs per a standard character encoding.
See Strings.
See Device and Font Description Files.
Not all versions of the
man
program support the
-T
option;
use the subsequent example for an alternative.
This is “Normalization Form D” as documented in Unicode Standard Annex #15 (https://unicode.org/reports/tr15/).
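To see what this normalization form does to a precomposed character, Python's standard unicodedata module can be used as follows. This is only an illustration of the Unicode operation; the helper names are made up for the example, and nothing here describes how any groff component implements it.

```python
import unicodedata


def decompose(text: str) -> str:
    """Apply Unicode Normalization Form D (canonical decomposition)."""
    return unicodedata.normalize("NFD", text)


def code_points(text: str) -> list[str]:
    """Render each character of `text` in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in text]


# U+00E9 is LATIN SMALL LETTER E WITH ACUTE, a precomposed character.
decomposed = code_points(decompose("\u00e9"))
```

The result is `["U+0065", "U+0301"]`: the base letter followed by a combining acute accent.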
See Compatibility Mode.
See Character Classes.
See GNU troff Internals.
Mutually recursive character definitions are handled similarly.
See Font Description File Format.
See Environments.
See Miscellaneous.
See Environments.
See Miscellaneous.
A monospaced font may possess glyphs for ligatures, but they are nevertheless seldom used to set text.
Opinions of this escape sequence’s best name abound.
“Zero-width space” is a popular misnomer: roff formatters do
not treat it like a space; when filling, they do not break a line where
\& appears. Ossanna called it a “non-printing, zero-width
character”, but the character causes output even though it does
not “print”. If no output line is pending, the dummy character starts
one. Contrast an empty input document with one containing only
\&. The former produces no output; the latter, a blank page.
In text fonts, parentheses are often the tallest
glyphs, but a font’s glyphs may not match the nominal type size! In the
standard PostScript font families, 10-point Times sets better with
9-point Helvetica and 11-point Courier than if all were used at
10 points. Recall the fzoom request in Selecting Fonts for a remedy.
Rhyme with “sledding”; mechanical typography used lead metal (Latin plumbum).
The claim appears to have been true of Ossanna
troff for the C/A/T device; Kernighan made device-independent
troff more flexible.
In compatibility mode only, a non-zero n must be in the range 4–39. See Compatibility Mode.
See Device and Font Description Files.
These are known vulgarly as “ANSI” colors, after the ANSI X3.64 standard, now withdrawn.
See Copy Mode.
GNU
troff emits a warning in category
‘mac’.
See Warnings.
We refer to
vtroff,
which converted the C/A/T command stream
produced by early-vintage AT&T
troff
to input suitable for Versatec and Benson-Varian plotters.
Strictly,
letters not otherwise recognized
are
treated as output comparison delimiters.
A portable document avoids using letters not in the list above;
for example,
Plan 9
troff uses
‘h’
to test a mode it calls
htmlroff,
and GNU
troff may provide additional operators in the future.
Because formatting of the comparands takes place in a dummy environment, vertical motions within them cannot spring traps. See Traps.
All
of this is to say that the lists of nodes created by formatting
xxx and yyy must be identical.
See GNU troff Internals.
See Copy Mode.
This bizarre behavior maintains compatibility with
AT&T troff.
See while.
See Copy Mode.
unless you redefine it
“somewhat less” because things other than macro calls can be on the input stack
See Copy Mode.
While it is possible to define and call a macro ‘.’, you can’t use it as an end macro: during a macro definition, ‘..’ is never handled as calling ‘.’, even if ‘.de name .’ explicitly precedes it.
Its structure is adapted from, and isomorphic to, part of a solution by Tadziu Hoffman to the problem of reflowing text multiple times to find an optimal configuration for it. https://lists.gnu.org/archive/html/groff/2008-12/msg00006.html
as trace.tmac does
See Copy Mode.
If they were not,
parameter interpolations would be similar to command-line
parameters—fixed for the entire duration of a roff program’s
run. The advantage of interpolating \$ escape sequences even in
copy mode is that they can interpolate different contents from one call
to the next, like function parameters in a procedural language. The
additional escape character is the price of this power.
Compare this to the
\def and \edef commands in TeX.
These are lightly adapted from the groff
implementation of the ms macros.
See Page Location Traps.
At the
grops defaults of 10-point type on 12-point vertical spacing, the
difference between half a vee and half an em can be subtle: large
spacings like ‘.vs .5i’ make it obvious.
Historically,
tools named
nrchbar
and
changebar
were developed for marking changes with margin characters
and could be found in archives of the
comp.sources.unix
Usenet group.
Some proprietary Unices also offer(ed) a
diffmk
program.
(hc, vc) is adjusted to the point nearest the perpendicular bisector of the arc’s chord.
A trap planted at ‘20i’ or ‘-30i’ cannot spring on a page of length ‘11i’.
It may help to think of each trap location as
maintaining a queue; wh operates on the head of the queue, and
ch operates on its tail. Only the trap at the head of the queue
is visible.
See Debugging.
See Diversions.
While processing an end-of-input macro, the formatter assumes that the next page break must be the last; it goes into “sudden death overtime”.
See GNU troff Internals.
GNU
troff emits a warning in category
‘mac’.
See Warnings.
See Environments.
GNU
troff emits a warning in category
‘di’.
See Warnings.
Thus, the “water” gets “higher” proceeding down the page.
We must double the backslash. Recall Copy Mode.
See Debugging.
GNU
troff emits a warning in category
‘file’.
See Warnings.
See GNU troff Internals.
POSIX
command environments
and roff formatters employ different integer-to-Boolean
interpretation conventions;
a POSIX command exits with a zero status if it succeeds
and a positive one if it fails,
whereas a roff register
tests “true” if it has a positive value.
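The two conventions can be set side by side in a short Python sketch. All three function names are hypothetical; the last one shows one possible way to translate an exit status into a value that tests true on success, and is not a description of what any request does.

```python
def posix_succeeded(exit_status: int) -> bool:
    """A POSIX command succeeded if and only if its exit status is zero."""
    return exit_status == 0


def roff_register_is_true(value: int) -> bool:
    """A roff register tests true if and only if its value is positive."""
    return value > 0


def exit_status_to_truth_value(exit_status: int) -> int:
    """Map an exit status to a value that tests true on success (assumed helper)."""
    return 1 if posix_succeeded(exit_status) else 0
```

Note that the same integer gives opposite answers under the two conventions: 0 means success to the shell but tests false in a roff conditional, so a status must be inverted rather than tested directly.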
See Debugging.
See GNU troff Output.
See GNU troff Internals.
When encountered, these produce warnings in category ‘char’. See Warnings.
When not in copy mode,
the formatter does not tokenize the escape sequences
\f,
\F,
\H,
\m,
\M,
\R,
\s,
and
\S,
but instead updates the environment.
GNU
troff
encodes tokens that aren’t Unicode Basic Latin characters
as code points in the C0 and C1 control ranges;
we plan to move them to the Unicode Private Use Area (PUA)
or to code points outside the Unicode encoding space
in a future release.
Because
GNU
troff’s internals are subject to revision,
we do not show the output of these examples.
The names and structures of node types may change over time.
The JSON interpreter
jq(1)
is not essential,
but can be helpful in understanding the topology of the node trees
populating output lines and diversions in particular.
You may
wonder why a glyph node for
‘hy’
exists when this example
doesn’t produce one on the output.
That’s because the break is discretionary;
at the time a word is formatted into nodes,
GNU
troff doesn’t know where the output line will break.
Later,
when processing a pending output line,
GNU
troff has that knowledge,
and iterates through the output line’s node list,
using its discretion to discard these hyphen glyph nodes
everywhere except when hyphenating a word at the end of the line.
The
Graphic Systems C/A/T phototypesetter
(the original device target for
AT&T
troff)
supported only a few discrete type sizes
in the range 6–36 points,
so Ossanna contrived a special case in the parser
to do what the user must have meant.
Kernighan warned of this in the 1992 revision
of CSTR #54 (§2.3),
and more recently,
McIlroy referred to it as a “living fossil”.
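The special case can be modeled in a few lines of Python. This sketch reflects our reading of the historical rule, consistent with the 4–39 range noted for compatibility mode: a leading digit of 1, 2, or 3 is taken as the first of two digits, and any other digit stands alone. The function name and error handling are invented for the example.

```python
DIGITS = "0123456789"


def parse_compat_size(text: str) -> tuple[int, str]:
    """Split `text` (what follows the size escape) into (size, remainder).

    A first digit of 1-3 consumes a second digit; any other digit is a
    complete size by itself.
    """
    if not text or text[0] not in DIGITS:
        raise ValueError("expected a digit")
    first = int(text[0])
    if 1 <= first <= 3:
        if len(text) < 2 or text[1] not in DIGITS:
            raise ValueError("expected a second digit")
        return first * 10 + int(text[1]), text[2:]
    return first, text[1:]
```

Under this model `parse_compat_size("36x")` yields `(36, "x")`, but `parse_compat_size("40")` yields `(4, "0")`: the size is 4 and the zero is left over as ordinary text, which is the surprise the warnings above concern.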
Recall Strings.
Thus,
    .ll 10n
    \%antidisestablishmen\%tarianism
    .br
    \&\%antidisestablishmen\%tarianism
    .pl \n(nlu
produces different results with each of the three formatters.
Naturally, if you’ve changed
the escape character, you need to prefix the e with whatever it
is—and you’ll likely get something other than a backslash in the
output.
AT&T
troff’s font description files
did not define the
rs
special character,
but those of
its descendant Heirloom Doctools
troff do,
as of its 060716 release (July 2006).
In
GNU
troff, node objects produce these commands;
recall GNU troff Internals.
GNU
troff also reads files that don’t satisfy
the strict POSIX definition of a text file—for example,
those lacking a final newline character—and the
cf
and
trf requests read arbitrary files.
Recall Host System Service Access.
Plan 9 troff has also abandoned the binary
format.
groff requests and escape sequences
interpret non-negative integers as mounting positions instead. Further,
a font named ‘0’ cannot be automatically mounted by the
fonts directive of a DESC file.
On typesetters, this directive is misnamed since it starts a list of glyphs, not characters.
that is, any integer parsable by the C standard library’s strtol(3) function
The parser for device-independent output can be found in the file groff-source-dir/src/libs/libdriver/input.cpp.
See “A Typesetter-independent TROFF”, Bell Labs CSTR #97, 1982.