The documents that can be processed by unroff are not restricted to a specific troff macro set. Translation rules for a new macro package can be added by supplying a set of corresponding Scheme procedures (a ``back-end''). Predefined sets of such procedures exist for a number of combinations of target language and troff macro package: unroff 1.0 supports translation to the ``Hypertext Markup Language'' (HTML) version 2.0 for the -man and -ms macro packages as well as ``bare'' troff (see unroff-html(1), unroff-html-man(1), and unroff-html-ms(1) for a description).
Unlike conventional troff conversion tools, unroff includes a full troff parser and can therefore handle user-defined macros, strings, and number registers, nested if-else requests (with text blocks enclosed by `\{' and `\}' escape sequences), arbitrary fonts and font positions, troff ``copy mode'', low-level formatting requests such as `\l' and `\h', and the subtle differences between request and macro invocations that are inherent in the troff processing model. unroff has adopted a number of troff extensions introduced by groff, among them long names for macros, strings, number registers, and special characters, and the `\$@' and `\$*' escape sequences.
unroff interprets its input stream as a sequence of ``events''. Events include the invocation of a troff request or macro, the use of a troff escape sequence or special character, a troff string or number register reference, end of sentence, start of a new input file, and so on. For each event encountered unroff invokes a Scheme procedure associated with that event. Some types of events require a procedure that returns a string (or an object that can be coerced into a string), which is then interpolated into the input or output stream; for other types of events, the event procedures are just called for their side-effects.
The set of Scheme procedures to be used by unroff is determined by the output format and the name of the troff macro package. In addition, users can supply event procedures for their own macro definitions (or replace existing ones) in the form of a simple Scheme program passed to unroff along with the troff input files; Scheme code can even be embedded directly in the troff input as described below.
The full capabilities of unroff and the Scheme primitives required to write extensions or support for new output formats are described in the Unroff Programmer's Manual.
option=value
document=thesis
The Scheme back-ends and user-supplied extensions can define their own option types, but at least the following types are recognized:
These extension-specific options must appear after the generic unroff options and may be mixed with the file name arguments. As the option assignments and specified input files are processed in order, the value given for an option is in effect for all the input files that appear on the command line to the right of the option.
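The left-to-right rule above can be illustrated with a small Python sketch. It is a conceptual model, not unroff's implementation: the function name assign_options is hypothetical, and it assumes that any argument containing `=' is an option assignment and anything else is a file name.

```python
def assign_options(args):
    """Pair each input file with the option values in effect when it is reached.

    Assumption: an argument containing '=' is an option assignment and
    every other argument is a file name.
    """
    options = {}
    plan = []
    for arg in args:
        if "=" in arg:
            key, _, value = arg.partition("=")
            options[key] = value
        else:
            # Snapshot the options so later assignments do not affect this file.
            plan.append((arg, dict(options)))
    return plan


for name, opts in assign_options(["split=1", "a.tr", "split=2", "b.tr"]):
    print(name, opts)
```

In this model a.tr is processed with split=1 and b.tr with split=2, because the second assignment appears to the right of a.tr.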
The exact set of keyword/value options is determined by the Scheme code loaded for a given combination of output format and macro package name and is described in the corresponding manuals. The following few options can always be set, regardless of the actual output format:
The first Scheme file loaded is scm/troff.scm which contains basic definitions such as the built-in options and option types, implementations for troff requests that are not output-format specific, and utility functions to be used by the back-ends or by user-supplied extensions. Next, the file scm/format/common.scm is loaded, where format is the value of the option -f as given on the command line (or its default value). The file implements the translation of the basic troff requests, escape sequences, and special characters, etc. The code dealing with macro invocations is loaded from scm/format/package.scm where package is the value of the option -m with the letter `m' prepended.
Finally, the file .unroff is loaded from the caller's home directory if present. Arbitrary Scheme code can be placed in this initialization file. It is typically used to assign values to package-specific keyword/value options according to the user's preferences (by means of the set-option! Scheme primitive as explained in the Programmer's Manual).
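The load order described above can be summarized in a short Python sketch. The function name initial_load_order and the home directory used in the example are hypothetical; the scm/ paths are relative to unroff's library directory, whose location is not stated here.

```python
def initial_load_order(fmt, package, home):
    """Return the initialization files in the order they are loaded.

    fmt is the value of -f (for example "html"); package is the value
    of -m (for example "s" for -ms), to which the letter 'm' is prepended.
    """
    return [
        "scm/troff.scm",                # basic definitions and utilities
        f"scm/{fmt}/common.scm",        # requests, escapes, special characters
        f"scm/{fmt}/m{package}.scm",    # macro package support
        f"{home}/.unroff",              # loaded only if present
    ]


for path in initial_load_order("html", "s", "/home/user"):
    print(path)
```

For the options -fhtml -ms this lists scm/troff.scm, scm/html/common.scm, scm/html/ms.scm, and finally the .unroff file in the (assumed) home directory.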
When the initial files have been loaded, any troff input files specified in the command line are read and parsed. The special file name `-' can be used to indicate standard input (usually in combination with ordinary file names). If no file name is given, unroff reads from standard input.
In addition to troff input files, files containing Scheme code can be named in the command line. Scheme files (which by convention end in .scm) are loaded into the Scheme interpreter and usually contain user-defined Scheme procedures to translate specific macros or to replace existing procedures, or other user-supplied extensions of any kind. Scheme files named in the command line (or loaded explicitly from within other files) are resolved against the directory scm/misc/ which may hold site-specific extensions or other supplementary packages. troff files and Scheme files can be mixed freely in the command line.
If unroff is interrupted or quits early, any output files produced so far may be incomplete or may contain wrong or inconsistent data, because several passes may be required to complete an output file (for example, to resolve cross references between a set of files), or because an output file is not necessarily produced as a whole, but unroff may work on several files simultaneously.
To translate a troff document composed of two files and written with the ``ms'' macro package to HTML 2.0, unroff might be called like this:
unroff -fhtml -ms doc.tr doc.tr
unroff -fhtml -ms document=out/ split=1 doc.tr doc.tr
unroff -fhtml -ms document=out/ split=1 hyper.scm doc.scm \
    doc.tr doc.tr
In brief, unroff copies characters from input to output, optionally performing target-language-specific character translations. For each request or macro invocation, string or number register reference, special character, escape sequence, sentence end, or eqn(1) inline equation encountered in the input stream, unroff checks whether an ``event value'' has been specified by the Scheme code (user-supplied or part of the back-end). An event value is either a plain string, which is then treated as if it had been part of the input stream, or a Scheme procedure, which is then invoked and must in turn return a string. The Scheme procedures are passed arguments, e.g. the macro or request arguments in case of a procedure attached to a macro or request, or an escape sequence argument for functions such as `\f' or `\w'.
If no event value has been associated with a particular macro, string, or number register, unroff checks whether a definition has been supplied in the normal way, i.e. by means of .de, .ds, or .nr. In this case, the value of the macro, string, or register is interpolated as done by ordinary troff. If no definition can be found, a fallback definition is looked up as a last resort; and if everything fails, a warning is printed and the event is ignored. Similarly, event procedures are invoked at end of input line, when an input file is opened or closed, at program start and termination, and for each option specified in the command line; but these procedures are called solely for their side-effects (i.e. the return values are ignored).
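The lookup order for macros, strings, and number registers can be modelled roughly as follows. This is a Python sketch rather than unroff's Scheme code: the function resolve and the dictionary arguments are hypothetical, ordinary interpolation is simplified to returning the stored text, and fallback definitions are assumed to behave like event values.

```python
def resolve(name, args, event_values, definitions, fallbacks, warnings):
    """Return the replacement text for a macro, string, or register event."""
    if name in event_values:
        value = event_values[name]
        # An event value is either a plain string or a procedure returning one.
        return value(*args) if callable(value) else value
    if name in definitions:
        # Defined by .de, .ds, or .nr: interpolate as ordinary troff would
        # (simplified here: no argument substitution).
        return definitions[name]
    if name in fallbacks:
        value = fallbacks[name]  # assumption: same shape as an event value
        return value(*args) if callable(value) else value
    warnings.append(f"no definition for {name}")
    return ""  # the event is ignored


warnings = []
events = {"PP": lambda *args: "<p>\n"}
definitions = {"PP": ".sp\n.ti +5n\n", "XX": ".PP\n"}
print(repr(resolve("PP", [], events, definitions, {}, warnings)))
print(repr(resolve("XX", [], events, definitions, {}, warnings)))
print(repr(resolve("ZZ", [], events, definitions, {}, warnings)), warnings)
```

In the example, PP has both an event value and a macro definition, and the event value wins; XX falls through to its macro definition; ZZ has neither and produces a warning.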
Most Scheme procedures just emit the target language's representation of the event with which they are associated. Other procedures perform various kinds of bookkeeping; the procedure associated with the .de request, for example, puts the text following aside for later expansion, and the event procedures attached to the requests .ds and .nr and to the escape sequences `\*' and `\n' implement troff strings and number registers. This way, even basic troff functions need not be hard-wired and can be altered or replaced freely without recompiling unroff.
The rule that an event value associated with a macro has precedence over the actual macro definition accommodates higher-level, structure-oriented target languages (such as SGML). While the micro-formatting contained in a typical -ms macro definition, for example, makes sense to an ordinary typesetting program, it is usually impossible to infer the macro's structural function from it (new paragraph, quotation, etc.). On the other hand, troff documents often define a few additional, simple macros that just serve as an abbreviation for a sequence of predefined macros; in this case event procedures need not be specified, as unroff will then perform normal macro expansion.
unroff usually takes care to not rescan the characters returned by event procedures as if their results had been normal input, because most event procedures already return code in the target language rather than troff input that can be rescanned. This, however, cannot always be avoided; for example, if a troff string reference occurs at macro definition time (because `\*' is used rather than `\\*'), the string value ends up in the macro body and will still be rescanned when the macro is invoked. A few other pitfalls caused by differences in the processing models of troff and unroff are listed in the BUGS section below.
The scaling performed for the usual troff scale indicators can be manipulated by calling a Scheme primitive from within the Scheme code implementing a particular back-end.
If .ig is called with the symbol >> as its first argument, all input lines up to (but not including) the terminating .>> are sent to the current output file. Example: when translating to the Hypertext Markup Language, the construct could be used to emit literal HTML code like this:
.ig >>
<address>
Bart Simpson<br>
Springfield
</address>
.>>
To produce a single line of output, the new request .>> can be used as in this HTML example:
.>> "<code>result = i+1;</code>"
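The effect of the two pass-through forms can be sketched in Python. This is only a model of the behavior described above, not unroff's parser: the function name passthrough is hypothetical, quoting is handled by simply stripping surrounding double quotes, and all other input lines are ignored.

```python
def passthrough(lines):
    """Collect the text that '.ig >>' blocks and '.>>' requests send to output."""
    out = []
    in_block = False
    for line in lines:
        if in_block:
            if line.strip() == ".>>":
                in_block = False      # terminator itself is not copied
            else:
                out.append(line)      # literal line, copied unchanged
        elif line.split() == [".ig", ">>"]:
            in_block = True
        elif line.startswith(".>> "):
            # Single-line form: the argument is the text to emit.
            out.append(line[4:].strip().strip('"'))
    return out


source = [
    ".ig >>",
    "<address>",
    "Bart Simpson<br>",
    "</address>",
    ".>>",
    '.>> "<code>result = i+1;</code>"',
]
print("\n".join(passthrough(source)))
```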
If the .ig request is called with the argument ##, everything up to the terminating .## is passed to the Scheme interpreter for evaluation. This allows users to embed Scheme code in a troff document which is executed when the document is processed by unroff. One use of this construct is to provide a Scheme event procedure for a user-defined macro by placing the corresponding Scheme definition in the same source file right below the troff macro definition. Similarly, the request .## can be used to evaluate a short S-expression; all arguments to the request are concatenated and then passed to the Scheme interpreter.
Note that inline Scheme code is a potentially dangerous feature, as a document received from someone else may contain embedded code that does something unexpected when the file is processed by unroff (but it is probably not more dangerous than the standard troff .pi request or the .sy request of ditroff).
unroff defines the following new read-only number registers:
The following new escape sequences are available in a macro body during macro expansion:
The names of strings, macros, number registers, and fonts may be of any length. As in groff, square brackets can be used for names of arbitrary length:
\f[font] \*[string] \n[numreg] ...
There is no limit on the number of macro arguments, and the following syntax can be used to reference the 10th, 11th, etc. macro argument:
\$(12 \$[12] \$[123]
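To make the three argument reference forms concrete, here is a small Python sketch that expands them in a macro body. It is an illustration only: the names ARG_REF and expand_args are hypothetical, the pattern covers only `\$N', `\$(NN', and `\$[N...]', and a reference to a missing argument is replaced by the empty string.

```python
import re

# \$[digits], \$(two digits, or \$ followed by a single digit 1-9
ARG_REF = re.compile(r"\\\$(?:\[(\d+)\]|\((\d{2})|([1-9]))")


def expand_args(body, args):
    """Replace macro argument references in body with the matching argument."""
    def repl(match):
        index = int(match.group(1) or match.group(2) or match.group(3))
        # Arguments are numbered from 1; missing ones expand to nothing.
        return args[index - 1] if 1 <= index <= len(args) else ""
    return ARG_REF.sub(repl, body)


args = [f"a{i}" for i in range(1, 13)]   # twelve arguments: a1 ... a12
print(expand_args(r"\$1 \$(12 \$[12]", args))
```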
Unless troff compatibility mode has been enabled, the arguments to the groff-specific escape sequences `\A', `\C', `\L', `\N', `\R', `\V', `\Y', and `\Z' are recognized and parsed, so that event procedures can be implemented correctly for these escape sequences.
Unroff Programmer's Manual.
http://www.informatik.uni-bremen.de/~net/unroff
Diversions are not supported, although specific back-ends are free to add this functionality.
Special characters are not treated right in certain contexts; in particular, special characters may not be used in place of plain characters where the characters act as some kind of delimiter as in
.if \(bsfoo\(bsbar\(bs ...
Spaces in an .if condition do not work; e.g. the following fails:
.if ' ' ' ...
Conditional input is subject to string and number register expansion even if the corresponding if-condition evaluates to false.
There are no number register formats, i.e. the request .af does not work.
The set of punctuation marks that indicate end of sentence should be configurable.
Empty input lines and leading space should trigger a special event, so that their break semantics can be implemented correctly.
A comment in a line by itself currently does not generate a blank line.