<html>
<head>
<!-- This file has been generated by unroff 1.0, 03/21/96 19:29:17. -->
<!-- Do not edit! -->
<link rev="made" href="mailto:net@informatik.uni-bremen.de">
<!-- $Revision: 1.16 $ -->
<title>Manual page for unroff(1)</title>
</head>
<body>
<h2>
unroff - programmable, extensible troff translator
<hr></h2>
<h2>SYNOPSIS</h2>
<b>unroff
</b>[
<b>-f</b><i>format
</i>] [
<b>-m</b><i>package
</i>] [
<b>-h</b><i>heapsize
</i>] [
<b>-C
</b>]
[
<b>-t
</b>] [
<i>file</i> | <i>option...
</i>]
<h2>OVERVIEW</h2>
<i>unroff
</i>reads and parses documents with embedded troff markup
and translates them to a different format--typically
to a different markup language such as SGML.<tt> </tt>
The actual output format is not hard-wired into
<i>unroff</i>;
instead, the translation is performed by a set of user-supplied rules
and functions written in the
<i>Scheme
</i>programming language.<tt> </tt>
<i>unroff
</i>employs the Extension Language Kit
<i>Elk
</i>to achieve programmability based on the Scheme language:
a fully-functional Scheme interpreter is embedded in the translator.<tt> </tt>
<p>
The documents that can be processed by
<i>unroff
</i>are not restricted to a specific troff macro set.<tt> </tt>
Translation rules for a new macro package can be added by supplying
a set of corresponding Scheme procedures (a ``back-end'').<tt> </tt>
Predefined sets of such procedures exist for a number of combinations
of target language and troff macro package:
<i>unroff
</i>1.0 supports translation to the ``Hypertext Markup Language''
(HTML) version 2.0 for the
<b>-man
</b>and
<b>-ms
</b>macro packages as well as ``bare'' troff (see
<b>unroff-html</b>(1),
<b>unroff-html-man</b>(1),
and
<b>unroff-html-ms</b>(1)
for a description).<tt> </tt>
<p>
Unlike conventional troff conversion tools,
<i>unroff
</i>includes a full troff parser and can therefore handle user-defined
macros, strings, and number registers, nested if-else requests
(with text blocks enclosed by `\{' and `\}' escape sequences), arbitrary
fonts and font positions, troff ``copy mode'', low-level formatting
requests such as `\l' and `\h', and the subtle
differences between request and macro invocations that are inherent
in the troff processing model.<tt> </tt>
<i>unroff
</i>has adopted a number of troff extensions introduced by
<i>groff</i>,
among them long names for macros, strings, number registers, and
special characters, and the `\$@' and `\$*' escape sequences.<tt> </tt>
<p>
<i>unroff
</i>interprets its input stream as a sequence of ``events''.<tt> </tt>
Events include the invocation of a troff request or macro, the use of a
troff escape sequence or special character, a troff string
or number register reference, end of sentence, start
of a new input file, and so on.<tt> </tt>
For each event encountered
<i>unroff
</i>invokes a Scheme procedure associated with that event.<tt> </tt>
Some types of events require a procedure that returns a string (or an
object that can be coerced into a string),
which is then interpolated into the input or output stream;
for other types of events, the event procedures are just called
for their side-effects.<tt> </tt>
<p>
The set of Scheme procedures to be used by
<i>unroff
</i>is determined by the output format and the name of the troff
macro package.<tt> </tt>
In addition, users can supply event procedures for their own macro
definitions (or replace existing ones) in the form of a simple Scheme
program passed to
<i>unroff
</i>along with the troff input files; Scheme code can even be directly
embedded in the troff input as described below.<tt> </tt>
<p>
The full capabilities of
<i>unroff
</i>and the Scheme primitives required to write extensions or support
for new output formats are described in the
<i>Unroff Programmer's Manual</i>.<tt> </tt>
<h2>GENERIC OPTIONS</h2>
<dl>
<dt><b>-f</b><i>format
</i><dd>
Specifies the output format into which the troff input files are
translated.<tt> </tt>
If no
<b>-f
</b>option is given, a default output format is used (for
<i>unroff
</i>version 1.0 the default is
<b>-f</b><i>html</i>).<tt> </tt>
This default can be overridden by setting the
<b>UNROFF_FORMAT
</b>environment variable.<tt> </tt>
<dt><b>-m</b><i>name
</i><dd>
Specifies the name of the macro package that would be used by ordinary
troff to typeset the document.<tt> </tt>
In contrast to troff,
<i>unroff
</i>does not actually load the macro package.<tt> </tt>
Instead, the specified name, in combination with the specified output
format, selects a set of Scheme files providing the procedure definitions
that control the translation process (see
<b>FILES
</b>below).<tt> </tt>
Therefore a corresponding
<b>tmac
</b>file need not exist for a given
<b>-m
</b>option.<tt> </tt>
<dt><b>-h</b><i>heapsize
</i><dd>
This option can be used to specify a non-standard heap size (in Kbytes)
for the Scheme interpreter included in
<i>unroff</i>;
see
<b>elk</b>(1).<tt> </tt>
<dt><b>-C
</b><dd>
Enables troff compatibility mode.<tt> </tt>
In compatibility mode certain
<i>groff
</i>extensions such as long names are not recognized.<tt> </tt>
<dt><b>-t
</b><dd>
Enables test mode.<tt> </tt>
Instead of processing troff input files,
<i>unroff
</i>enters an interactive Scheme top-level.<tt> </tt>
This can be useful to interactively experiment with the Scheme
primitives defined by
<i>unroff
</i>or to test or debug user-defined Scheme procedures.<tt> </tt>
</dl>
<h2>KEYWORD/VALUE OPTIONS</h2>
In addition to the generic options, a set of output-format-specific
options can be set from the command line and from within troff and
Scheme input files.<tt> </tt>
When specified on the command line, these options have the form
<dl><dt><dd>
<pre>
<i>option</i>=<i>value</i>
</pre>
</dl>
where the format of
<i>value
</i>depends on the
<i>type
</i>of the option.<tt> </tt>
For example, most output formats define an option
<b>document
</b>whose value is used as a prefix for all output files created during
the translation.<tt> </tt>
The option is assigned a value by specifying a token such as
<dl><dt><dd>
<pre>
document=thesis
</pre>
</dl>
on the command line.<tt> </tt>
This option's value is interpreted as a plain string, i.e.
its type is
<b>string</b>.<tt> </tt>
<p>
The Scheme back-ends and user-supplied extensions can define their
own option types, but at least the following types are recognized:
<dl>
<dt><b>integer
</b><dd>
the option value is composed of an optional sign and an (arbitrary)
string of digits
<dt><b>boolean
</b><dd>
the option value must be either the character 1 (true) or the
character 0 (false)
<dt><b>character
</b><dd>
a single character must be specified as the option value
<dt><b>string
</b><dd>
an arbitrary string of characters can be specified
<dt><b>dynstring
</b><dd>
``dynamic string''; the option value is either
</dl>
<dl><dt><dd>
<dl>
<dt><i>string
</i><dd>
to assign a string to the option in the normal way, or
<dt><b>+</b><i>string
</i><dd>
to append the characters after the plus sign
to the option's current value, or
<dt><b>-</b><i>string
</i><dd>
to remove the characters after the minus sign from the
option's current value.<tt> </tt>
</dl>
</dl>
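<p>
The option types listed above can be illustrated with a small Python
sketch. It is not part of unroff, whose option handling is written in
Scheme; the function name parse_option is hypothetical, and the sketch
assumes that the minus form of a dynstring removes each listed character
individually.
<dl><dt><dd>
<pre>
```python
import re


def parse_option(kind, text, current=""):
    """Convert a command-line option value according to its declared type."""
    if kind == "integer":
        # Optional sign followed by a string of digits.
        if not re.fullmatch(r"[+-]?[0-9]+", text):
            raise ValueError("not an integer: " + repr(text))
        return int(text)
    if kind == "boolean":
        # Only the characters 1 (true) and 0 (false) are accepted.
        if text not in ("0", "1"):
            raise ValueError("boolean must be 0 or 1: " + repr(text))
        return text == "1"
    if kind == "character":
        if len(text) != 1:
            raise ValueError("expected a single character: " + repr(text))
        return text
    if kind == "string":
        return text
    if kind == "dynstring":
        if text.startswith("+"):
            # Append the characters after the plus sign.
            return current + text[1:]
        if text.startswith("-"):
            # Assumption: each listed character is removed individually.
            return "".join(c for c in current if c not in text[1:])
        # No prefix: plain assignment.
        return text
    raise ValueError("unknown option type: " + kind)
```
</pre>
</dl>
With a current value of "to", the dynstring value "+x" would yield "tox"
and the value "-t" would yield "o" under these assumptions.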
<p>
These extension-specific options must appear after the generic
<i>unroff
</i>options and may be mixed with the file name arguments.<tt> </tt>
As the option assignments and specified input files are processed in
order, the value given for an option is in effect for all the input
files that appear on the command line to the right of the option.<tt> </tt>
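<p>
The left-to-right rule can be sketched in Python as follows. This is an
illustration only, not unroff's own argument handling: the function name
plan_run is hypothetical, and the sketch assumes that any argument
containing an equals sign is an option assignment.
<dl><dt><dd>
<pre>
```python
def plan_run(args):
    """Pair each input file with the option values in effect when it is reached."""
    options = {}
    plan = []
    for arg in args:
        if "=" in arg:
            # Assumption: a token containing '=' is an option assignment.
            name, value = arg.split("=", 1)
            options[name] = value
        else:
            # Snapshot the options seen so far; later assignments do not apply.
            plan.append((arg, dict(options)))
    return plan


if __name__ == "__main__":
    # Hypothetical file names; split=1 only affects two.tr.
    for name, opts in plan_run(["document=a", "one.tr", "split=1", "two.tr"]):
        print(name, opts)
```
</pre>
</dl>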
<p>
The exact set of keyword/value options is determined by the
Scheme code loaded for a given combination of output format
and macro package name and is described in the corresponding
manuals.<tt> </tt>
The following few options can always be set, regardless of the
actual output format:
<dl>
<dt><b>include-files</b> (boolean)
<dd>
If true,
<b>.so
</b>requests are executed by
<i>unroff
</i>in the normal way (that is, the named input file is read and
parsed), otherwise
<b>.so
</b>requests are ignored.<tt> </tt>
The default value is 1.<tt> </tt>
<dt><b>if-true</b> (dynstring)
<dd>
the specified characters are assigned to (appended to, removed from)
the set of one-character conditions that are regarded as true
by the
<b>.if
</b>and
<b>.ie
</b>requests.<tt> </tt>
The default value is "to".<tt> </tt>
<dt><b>if-false</b> (dynstring)
<dd>
like
<b>if-true</b>;
specifies the one-character conditions regarded as false.<tt> </tt>
The default value is "ne".<tt> </tt>
</dl>
<h2>FILES</h2>
<h3>INPUT FILES</h3>
On startup,
<i>unroff
</i>loads the Scheme source files that control the translation process.<tt> </tt>
All these files are loaded from subdirectories of a site-specific
``library directory'', typically something like
<b>/usr/local/lib/unroff</b>.<tt> </tt>
The directory is usually chosen by the system administrator when
installing the software and can be overridden by setting the
<b>UNROFF_DIR
</b>environment variable.<tt> </tt>
The path names mentioned in the following are relative to this
library directory.<tt> </tt>
<p>
The first Scheme file loaded is
<b>scm/troff.scm
</b>which contains basic definitions such as the built-in options
and option types, implementations for troff requests that are
not output-format specific, and utility functions to be used
by the back-ends or by user-supplied extensions.<tt> </tt>
Next, the file
<b>scm/</b><i>format</i><b>/common.scm
</b>is loaded, where
<i>format
</i>is the value of the option
<b>-f
</b>as given on the command line (or its default value).<tt> </tt>
The file implements the translation of the basic troff
requests, escape sequences, and special characters, etc.<tt> </tt>
The code dealing with macro invocations is loaded from
<b>scm/</b><i>format</i><b>/</b><i>package</i><b>.scm
</b>where
<i>package
</i>is the value of the option
<b>-m
</b>with the letter `m' prepended.<tt> </tt>
<p>
Finally, the file
<b>.unroff
</b>is loaded from the caller's home directory if present.<tt> </tt>
Arbitrary Scheme code can be placed in this initialization file.<tt> </tt>
It is typically used to assign values to package-specific
keyword/value options according to the user's preferences
(by means of the
<i>set-option!
</i>Scheme primitive as explained in the Programmer's Manual).<tt> </tt>
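<p>
The load order described above can be summarized by the following Python
sketch, which only computes the candidate path names. The function name
startup_files is hypothetical, the fallback library directory is the
example path mentioned above, and skipping the package file when no
macro package is given is an assumption.
<dl><dt><dd>
<pre>
```python
import os


def startup_files(fmt=None, package=None, env=None):
    """List the Scheme files considered at startup, in load order."""
    env = os.environ if env is None else env
    # UNROFF_DIR overrides the site-specific library directory.
    libdir = env.get("UNROFF_DIR", "/usr/local/lib/unroff")
    # An explicit format wins; otherwise UNROFF_FORMAT, then html.
    fmt = fmt or env.get("UNROFF_FORMAT", "html")
    files = [
        libdir + "/scm/troff.scm",
        libdir + "/scm/" + fmt + "/common.scm",
    ]
    if package is not None:
        # The option -ms has the value "s"; prepending "m" gives ms.scm.
        files.append(libdir + "/scm/" + fmt + "/m" + package + ".scm")
    # The initialization file is only loaded if it actually exists.
    home = env.get("HOME", os.path.expanduser("~"))
    files.append(home + "/.unroff")
    return files
```
</pre>
</dl>
For the format html and the option -ms, the third entry would be
scm/html/ms.scm below the library directory.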
<p>
When the initial files have been loaded, any troff input files specified
in the command line are read and parsed.<tt> </tt>
The special file name
`<b>-</b>'
can be used to indicate standard input (usually in combination with
ordinary file names).<tt> </tt>
If no file name is given,
<i>unroff
</i>reads from standard input.<tt> </tt>
<p>
In addition to troff input files, files containing Scheme code can
be mentioned in the command line.<tt> </tt>
Scheme files (which by convention end in
<b>.scm</b>)
are loaded into the Scheme interpreter and usually contain
user-defined Scheme procedures to translate specific macros or
to replace existing procedures, or other user-supplied extensions
of any kind.<tt> </tt>
Scheme files named in the command line (or loaded explicitly from
within other files) are resolved against the directory
<b>scm/misc/
</b>which may hold site-specific extensions or other supplementary
packages.<tt> </tt>
troff files and Scheme files can be mixed freely in the command line.<tt> </tt>
<h3>OUTPUT FILES</h3>
Whether
<i>unroff
</i>sends its output to standard output or produces one or more output
files is not hard-wired but determined by the combination of output
format and macro package.<tt> </tt>
Generally, if no troff input files are specified, output is directed
to standard output, but this rule is not mandatory and may
be overridden by specific back-ends.<tt> </tt>
The
<b>document
</b>option is usually honored, although other rules may be employed to
determine the names of output files (for example, the extension
that implements
<b>-man
</b>for a given output format may derive the name of the output file
for a manual page from the input file name; see
<b>unroff-html-man</b>(1)).<tt> </tt>
<p>
If
<i>unroff
</i>is interrupted or quits early, any output files produced so far may be
incomplete or may contain wrong or inconsistent data, because
several passes may be required to complete an output file (for example,
to resolve cross references between a set of files), or because
an output file is not necessarily produced as a whole, but
<i>unroff
</i>may work on several files simultaneously.<tt> </tt>
<h2>EXAMPLES</h2>
<p>
To translate a troff document composed of two files and written with the
``ms'' macro package to HTML 2.0,
<i>unroff
</i>might be called like this:
<dl><dt><dd>
<pre>
unroff -fhtml -ms doc.tr doc.tr
</pre>
</dl>
Two options specific to the combination of
<b>-fhtml
</b>and
<b>-ms
</b>might be added to specify a prefix for output files and to have
the resulting output split into separate files after each section
(see
<b>unroff-html-ms</b>(1)):
<dl><dt><dd>
<pre>
unroff -fhtml -ms document=out/ split=1 doc.tr doc.tr
</pre>
</dl>
Additional features may be loaded from Scheme files specified in the
command line, e.g.
<b>hyper.scm
</b>which implements general Hypertext requests (and gets loaded from
<b>scm/misc/</b>)
and a user-supplied file in the current directory providing translation
rules for user-defined troff macros:
<dl><dt><dd>
<pre>
unroff -fhtml -ms document=out/ split=1 hyper.scm doc.scm \
doc.tr doc.tr
</pre>
</dl>
<h2>TROFF SUPPORT AND EXTENSIONS</h2>
As
<i>unroff
</i>translates troff input into another language rather than typesetting
the text in the usual way, its processing model necessarily differs
from that of conventional troff.<tt> </tt>
For a detailed description refer to the Programmer's Manual.<tt> </tt>
<p>
In brief,
<i>unroff
</i>copies characters from input to output, optionally performing
target-language-specific character translations.<tt> </tt>
For each request or macro invocation, string or number register
reference, special character, escape sequence, sentence end, or
<b>eqn</b>(1)
inline equation encountered in the input stream,
<i>unroff
</i>checks whether an ``event value'' has been specified by
the Scheme code (user-supplied or part of the back-end).<tt> </tt>
An event value is either a plain string, which is then treated as
if it had been part of the input stream, or a Scheme procedure,
which is then invoked and must in turn return a string.<tt> </tt>
The Scheme procedures are passed arguments, e.g. the macro
or request arguments in case of a procedure attached to a macro
or request, or an escape sequence argument for functions such as
`\f' or `\w'.<tt> </tt>
<p>
If no event value has been associated with a particular macro,
string, or number register,
<i>unroff
</i>checks whether a definition has been supplied in the normal way,
i.e. by means of
<b>.de</b>,
<b>.ds</b>,
or
<b>.nr</b>.<tt> </tt>
In this case, the value of the macro, string, or register is
interpolated as done by ordinary troff.<tt> </tt>
If no definition can be found, a fallback definition is looked up
as a last resort; and if everything fails, a warning is printed
and the event is ignored.<tt> </tt>
Similarly, event procedures are invoked at end of input line,
when an input file is opened or closed, at program start and
termination, and for each option specified in the command line;
but these procedures are called solely for their side-effects
(i.e. the return values are ignored).<tt> </tt>
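<p>
The lookup order for a macro, string, or number register can be modeled
roughly as follows. This Python sketch is a simplification and not
unroff's implementation: the function name resolve and the three
dictionaries are hypothetical, and a normal definition is returned as
plain text without modeling troff's interpolation of macro bodies.
<dl><dt><dd>
<pre>
```python
def resolve(name, args, events, definitions, fallbacks, warn=print):
    """Return the text for a macro, string, or register reference."""
    if name in events:
        value = events[name]
        # An event value is a plain string or a procedure returning one.
        return value(*args) if callable(value) else value
    if name in definitions:
        # Defined in the normal way with .de, .ds, or .nr.
        return definitions[name]
    if name in fallbacks:
        # Last resort before giving up.
        return fallbacks[name]
    warn("no definition for " + name)
    return ""


if __name__ == "__main__":
    events = {"B": lambda *words: "[b]" + " ".join(words) + "[/b]"}
    print(resolve("B", ["hello", "world"], events, {}, {}))
    print(repr(resolve("XX", [], events, {}, {})))
```
</pre>
</dl>
The event value takes precedence even when a normal definition exists,
which is the behavior the description above relies on.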
<p>
Most Scheme procedures just emit the target language's representation
of the event with which they are associated.<tt> </tt>
Other procedures perform various kinds of bookkeeping; the procedure
associated with the
<b>.de
</b>request, for example, sets the text that follows
aside for later expansion, and the event procedures attached to
the requests
<b>.ds
</b>and
<b>.nr
</b>and to the escape sequences `\*' and `\n'
implement troff strings and number registers.<tt> </tt>
This way, even basic troff functions need not be hard-wired and can
be altered or replaced freely without recompiling
<i>unroff</i>.<tt> </tt>
<p>
The rule that an event value associated with a macro has precedence
over the actual macro definition accommodates higher-level,
structure-oriented target languages (such as SGML).<tt> </tt>
While the micro-formatting contained in a typical
<b>-ms
</b>macro definition, for example, makes sense to an ordinary typesetting
program, it is usually impossible to infer the macro's
<i>structural
</i>function from it (new paragraph, quotation, etc.).<tt> </tt>
On the other hand, troff documents often define a few additional,
simple macros that just serve as an abbreviation for a sequence
of predefined macros; in this case event procedures need not be
specified, as
<i>unroff
</i>will then perform normal macro expansion.<tt> </tt>
<p>
<i>unroff
</i>usually takes care to not rescan the characters returned by event
procedures as if their results had been normal input, because
most event procedures already return code in the target language rather
than troff input that can be rescanned.<tt> </tt>
This, however, cannot always be avoided; for example, if a troff string
reference occurs at macro definition time (because `\*' is used rather
than `\\*'), the string value ends up in the macro body and will still
be rescanned when the macro is invoked.<tt> </tt>
A few other pitfalls caused by differences in the processing models of
troff and
<i>unroff
</i>are listed in the BUGS section below.<tt> </tt>
<p>
The scaling performed for the usual troff scale indicators
can be manipulated by calling a Scheme primitive from within
the Scheme code implementing a particular back-end.<tt> </tt>
<h3>NEW TROFF REQUESTS</h3>
To aid transparent output of code in the target language and
evaluation of inline Scheme code,
<i>unroff
</i>supports two new requests and two extensions to the
<b>.ig
</b>(ignore input lines) troff request.<tt> </tt>
<p>
If
<b>.ig
</b>is called with the symbol
<b>&gt;&gt;
</b>as its first argument, all input lines up to (but not including)
the terminating
<b>.&gt;&gt;
</b>are sent to the current output file.<tt> </tt>
Example:
when translating to the Hypertext Markup Language, the construct
could be used to emit literal HTML code like this:
<dl><dt><dd>
<pre>
.ig &gt;&gt;
&lt;address&gt;
Bart Simpson&lt;br&gt;
Springfield
&lt;/address&gt;
.&gt;&gt;
</pre>
</dl>
<p>
To produce a single line of output, the new request
<b>.&gt;&gt;
</b>can be used as in this HTML example:
<dl><dt><dd>
<pre>
.&gt;&gt; "&lt;code&gt;result = i+1;&lt;/code&gt;"
</pre>
</dl>
<p>
If the
<b>.ig
</b>request is called with the argument
<b>##</b>,
everything up to the terminating
<b>.##
</b>is passed to the Scheme interpreter for evaluation.<tt> </tt>
This allows users to embed Scheme code in a troff document which
is executed when the document is processed by
<i>unroff</i>.<tt> </tt>
One use of this construct is to provide a Scheme event procedure
for a user-defined macro by placing the corresponding Scheme
definition in the same source file right below the troff macro definition.<tt> </tt>
Similarly, the request
<b>.##
</b>can be used to evaluate a short S-expression; all arguments to
the request are concatenated and then passed to the Scheme
interpreter.<tt> </tt>
<p>
Note that inline Scheme code is a potentially dangerous feature,
as a document received from someone else may contain embedded code
that does something unexpected when the file is processed by
<i>unroff
</i>(but it is probably not more dangerous than the standard troff
<b>.pi
</b>request or the
<b>.sy
</b>request of
<i>ditroff</i>).<tt> </tt>
<p>
<i>unroff
</i>defines the following new read-only number registers:
<dl>
<dt><b>.U
</b><dd>
This register always expands to 1.<tt> </tt>
It can be used by macros to determine whether the document is
being processed by
<i>unroff</i>.<tt> </tt>
<dt><b>.C
</b><dd>
Expands to 1 if troff compatibility mode has been enabled
by using the option
<b>-C</b>,
to 0 otherwise.<tt> </tt>
</dl>
<p>
The following new escape sequences are available in a macro
body during macro expansion:
<dl>
<dt><b>\$0
</b><dd>
The name of the current macro.<tt> </tt>
<dt><b>\$*
</b><dd>
The concatenation of all arguments, separated by spaces.<tt> </tt>
<dt><b>\$@
</b><dd>
The concatenation of all arguments, separated by spaces, and
with each argument enclosed by double quotes.<tt> </tt>
</dl>
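<p>
The difference between the two concatenation sequences matters when an
argument itself contains spaces. The following Python sketch mimics the
two forms; the function names are hypothetical and the code is not taken
from unroff.
<dl><dt><dd>
<pre>
```python
def join_plain(args):
    """All arguments separated by spaces; argument boundaries are lost."""
    return " ".join(args)


def join_quoted(args):
    """All arguments separated by spaces, each enclosed in double quotes."""
    return " ".join('"' + arg + '"' for arg in args)


if __name__ == "__main__":
    args = ["a", "b c"]
    print(join_plain(args))
    print(join_quoted(args))
```
</pre>
</dl>
When the quoted form is passed on to another macro, the arguments can be
split again exactly as they were received, which the plain form does not
allow.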
<p>
The names of strings, macros, number registers, and fonts may be of
any length.<tt> </tt>
As in
<i>groff</i>,
square brackets can be used for names of arbitrary length:
<dl><dt><dd>
<pre>
\f[font] \*[string] \n[numreg] ...
</pre>
</dl>
<p>
There is no limit on the number of macro arguments, and the following
syntax can be used to reference the 10th, 11th, etc. macro argument:
<dl><dt><dd>
<pre>
\$(12 \$[12] \$[123]
</pre>
</dl>
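<p>
As a rough illustration of the three reference forms, the following
Python sketch substitutes argument references in a macro body that has
already been read in copy mode. The names ARG_REF and expand_args are
hypothetical, and the sketch ignores every other escape sequence; it is
not how unroff parses its input.
<dl><dt><dd>
<pre>
```python
import re

# One digit, two digits after an opening parenthesis, or any number in brackets.
ARG_REF = re.compile(r"\\\$(?:\[([0-9]+)\]|\(([0-9]{2})|([0-9]))")


def expand_args(body, name, args):
    """Replace argument references in a macro body with the actual arguments."""
    def substitute(match):
        index = int(match.group(1) or match.group(2) or match.group(3))
        if index == 0:
            # Argument 0 is the name of the current macro.
            return name
        # Missing arguments expand to the empty string, as in troff.
        return args[index - 1] if index - 1 in range(len(args)) else ""
    return ARG_REF.sub(substitute, body)
```
</pre>
</dl>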
<p>
Unless troff compatibility mode has been enabled, the arguments to the
<i>groff</i>-specific
escape sequences `\A', `\C', `\L', `\N', `\R', `\V', `\Y',
and `\Z' are recognized and parsed, so that event procedures
can be implemented correctly for these escape sequences.<tt> </tt>
<h2>SEE ALSO</h2>
<b>unroff-html</b>(1),
<b>unroff-html-man</b>(1),
<b>unroff-html-ms</b>(1);
<br>
<b>troff</b>(1),
<b>groff</b>(1);
<b>elk</b>(1).<tt> </tt>
<p>
Unroff Programmer's Manual.<tt> </tt>
<p>
http://www.informatik.uni-bremen.de/~net/unroff
<h2>AUTHOR</h2>
Oliver Laumann, net@cs.tu-berlin.de
<h2>BUGS</h2>
A number of low-level formatting features of troff (such as the
absolute position indicator in numerical expressions)
are not yet supported by
<i>unroff
</i>version 1.0, which is not critical for higher-level,
structure-oriented target languages such as the Hypertext
Markup Language.<tt> </tt>
<p>
Diversions are not supported, although specific back-ends are
free to add this functionality.<tt> </tt>
<p>
Special characters are not handled correctly in certain contexts;
in particular, special characters may not be used in place
of plain characters where the characters act as some kind of
delimiter as in
<dl><dt><dd>
<pre>
.if \(bsfoo\(bsbar\(bs ...
</pre>
</dl>
<p>
Spaces in an
<b>.if
</b>condition do not work; e.g. the following fails:
<dl><dt><dd>
<pre>
.if ' ' ' ...
</pre>
</dl>
<p>
Conditional input is subject to string and number register
expansion even if the corresponding if-condition evaluates to false.<tt> </tt>
<p>
There are no number register formats, i.e. the request
<b>.af
</b>does not work.<tt> </tt>
<p>
The set of punctuation marks that indicate end of sentence
should be configurable.<tt> </tt>
<p>
Empty input lines and leading space should trigger a special
event, so that their break semantics can be implemented correctly.<tt> </tt>
<p>
A comment in a line by itself currently does not generate a
blank line.<tt> </tt>
<p><hr>
Markup created by <em>unroff</em> 1.0,&#160;<tt> </tt>&#160;<tt> </tt>March 21, 1996.
</body>
</html>