scsh-0.6/doc/scsh-manual/procnotation.tex

544 lines
23 KiB
TeX
Raw Normal View History

2001-07-13 02:59:22 -04:00
%&latex -*- latex -*-
\chapter{Process notation}
\label{sec:proc-forms}
Scsh has a notation for controlling {\Unix} processes that takes the
form of s-expressions; this notation can then be embedded inside of
standard {\Scheme} code.
The basic elements of this notation are \emph{process forms},
\emph{extended process forms}, and \emph{redirections}.
\section{Extended process forms and i/o redirections}
An \emph{extended process form} is a specification of a {\Unix} process to
run, in a particular I/O environment:
\codex{\var{epf} {\synteq} (\var{pf} $ \var{redir}_1$ {\ldots} $ \var{redir}_n $)}
where \var{pf} is a process form and the $\var{redir}_i$ are redirection specs.
A \emph{redirection spec} is one of:
\begin{inset}
\begin{tabular}{@{}l@{\qquad{\tt; }}l@{}}
\ex{(< \var{[fdes]} \var{file-name})} & \ex{Open file for read.}
\\\ex{(> \var{[fdes]} \var{file-name})} & \ex{Open file create/truncate.}
\\\ex{(<< \var{[fdes]} \var{object})} & \ex{Use \var{object}'s printed rep.}
\\\ex{(>> \var{[fdes]} \var{file-name})} & \ex{Open file for append.}
\\\ex{(= \var{fdes} \var{fdes/port})} & \ex{Dup2}
\\\ex{(- \var{fdes/port})} & \ex{Close \var{fdes/port}.}
\\\ex{stdports} & \ex{0,1,2 dup'd from standard ports.}
\end{tabular}
\end{inset}
The input redirections default to file descriptor 0;
the output redirections default to file descriptor 1.
The subforms of a redirection are implicitly backquoted,
and symbols stand for their print-names.
So \ex{(> ,x)} means
``output to the file named by {\Scheme} variable \ex{x},''
and \ex{(< /usr/shivers/.login)} means ``read from \ex{/usr/shivers/.login}.''
\pagebreak
Here are two more examples of i/o redirection:
%
\begin{center}
\begin{codebox}
(< ,(vector-ref fv i))
(>> 2 /tmp/buf)\end{codebox}
\end{center}
%
These two redirections cause the file \ex{fv[i]} to be opened on stdin, and
\ex{/tmp/buf} to be opened for append writes on stderr.
The redirection \ex{(<< \var{object})} causes input to come from the
printed representation of \var{object}.
For example,
\codex{(<< "The quick brown fox jumped over the lazy dog.")}
causes reads from stdin to produce the characters of the above string.
The object is converted to its printed representation using the \ex{display}
procedure, so
\codex{(<< (A five element list))}
is the same as
\codex{(<< "(A five element list)")}
is the same as
\codex{(<< ,(reverse '(list element five A))){\rm.}}
(Here we use the implicit backquoting feature to compute the list to
be printed.)
The redirection \ex{(= \var{fdes} \var{fdes/port})} causes \var{fdes/port}
to be dup'd into file descriptor \var{fdes}.
For example, the redirection
\codex{(= 2 1)}
causes stderr to be the same as stdout.
\var{fdes/port} can also be a port, for example:
\codex{(= 2 ,(current-output-port))}
causes stderr to be dup'd from the current output port.
In this case, it is an error if the port is not a file port
(\eg, a string port).
More complex redirections can be accomplished using the \ex{begin}
process form, discussed below, which gives the programmer full control
of i/o redirection from {\Scheme}.
\subsection{Port and file descriptor sync}
\begin{sloppypar}
It's important to remember that rebinding Scheme's current I/O ports
(\eg, using \ex{call-with-input-file} to rebind the value of
\ex{(current-input-port)})
does \emph{not} automatically ``rebind'' the file referenced by the
{\Unix} stdio file descriptors 0, 1, and 2.
This is impossible to do in general, since some {\Scheme} ports are
not representable as {\Unix} file descriptors.
For example, many {\Scheme} implementations provide ``string ports,''
that is, ports that collect characters sent to them into memory buffers.
The accumulated string can later be retrieved from the port as a string.
If a user were to bind \ex{(current-output-port)} to such a port, it would
be impossible to associate file descriptor 1 with this port, as it
cannot be represented in {\Unix}.
So, if the user subsequently forked off some other program as a subprocess,
that program would of course not see the {\Scheme} string port as its standard
output.
\end{sloppypar}
To keep stdio synced with the values of {\Scheme}'s current i/o ports,
use the special redirection \ex{stdports}.
This causes 0, 1, 2 to be redirected from the current {\Scheme} standard ports.
It is equivalent to the three redirections:
\begin{code}
(= 0 ,(current-input-port))
(= 1 ,(current-output-port))
(= 2 ,(error-output-port))\end{code}
%
The redirections are done in the indicated order. This will cause an error if
one of the current i/o ports isn't a {\Unix} port (\eg, if one is a string
port).
This {\Scheme}/{\Unix} i/o synchronisation can also be had in {\Scheme} code
(as opposed to a redirection spec) with the \ex{(stdports->stdio)}
procedure.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Process forms}
A \emph{process form} specifies a computation to perform as an independent
{\Unix} process. It can be one of the following:
%
\begin{leftinset}
\begin{codebox}
(begin . \var{scheme-code})
(| \vari{pf}{\!1} {\ldots} \vari{pf}{\!n})
(|+ \var{connect-list} \vari{pf}{\!1} {\ldots} \vari{pf}{\!n})
(epf . \var{epf})
(\var{prog} \vari{arg}{1} {\ldots} \vari{arg}{n})
\end{codebox}
\qquad
\begin{codebox}
; Run \var{scheme-code} in a fork.
; Simple pipeline
; Complex pipeline
; An extended process form.
; Default: exec the program.
\end{codebox}
\end{leftinset}
%
The default case \ex{(\var{prog} \vari{arg}1 {\ldots} \vari{arg}n)}
is also implicitly backquoted.
That is, it is equivalent to:
%
\codex{(begin (apply exec-path `(\var{prog} \vari{arg}1 {\ldots} \vari{arg}n)))}
%
\ex{Exec-path} is the version of the \ex{\urlh{http://www.FreeBSD.org/cgi/man.cgi?query=exec&apropos=0&sektion=0&manpath=FreeBSD+4.3-RELEASE&format=html}{exec()}} system call that
uses scsh's path list to search for an executable.
The program and the arguments must be either strings, symbols, or integers.
Symbols and integers are coerced to strings.
A symbol's print-name is used.
Integers are converted to strings in base 10.
Using symbols instead of strings is convenient, since it suppresses the
clutter of the surrounding \ex{"\ldots"} quotation marks.
To aid this purpose, scsh reads symbols in a case-sensitive manner,
so that you can say
\codex{(more Readme)}
and get the right file.
A \var{connect-list} is a specification of how two processes are to be wired
together by pipes.
It has the form \ex{((\vari{from}1 \vari{from}2 {\ldots} \var{to}) \ldots)}
and is implicitly backquoted.
For example,
%
\codex{(|+ ((1 2 0) (3 1)) \vari{pf}{\!1} \vari{pf}{\!2})}
%
runs \vari{pf}{\!1} and \vari{pf}{\!2}.
The first clause \ex{(1 2 0)} causes \vari{pf}{\!1}'s
stdout (1) and stderr (2) to be connected via pipe
to \vari{pf}{\!2}'s stdin (0).
The second clause \ex{(3 1)} causes \vari{pf}{\!1}'s file descriptor 3 to be
connected to \vari{pf}{\!2}'s file descriptor 1.
%---this is unusual, and not expected to occur very often.
The \ex{begin} process form does a \ex{stdio->stdports} synchronisation
in the child process before executing the body of the form.
This guarantees that the \ex{begin} form, like all other process forms,
``sees'' the effects of any associated I/O redirections.
Note that {\R4RS} does not specify whether or not \ex{|} and \ex{|+}
are readable symbols. Scsh does.
\section{Using extended process forms in \Scheme}
Process forms and extended process forms are \emph{not} {\Scheme}.
They are a different notation for expressing computation that, like {\Scheme},
is based upon s-expressions.
Extended process forms are used in {\Scheme} programs by embedding them inside
special {\Scheme} forms.
There are three basic {\Scheme} forms that use extended process forms:
\ex{exec-epf}, \cd{&}, and \ex{run}.
\dfn {exec-epf} {. \var{epf}} {\noreturn} {syntax}
\dfnx {\&} {. \var{epf}} {proc} {syntax}
\dfnx {run} {. \var{epf}} {proc} {syntax}
\begin{desc}
\index{exec-epf} \index{\&} \index{run}
The \ex{(exec-epf . \var{epf})} form nukes the current process: it establishes
the i/o redirections and then overlays the current process with the requested
computation.
The \ex{(\& . \var{epf})} form is similar, except that the process is forked
off in background. The form returns the subprocess' process object.
The \ex{(run . \var{epf})} form runs the process in foreground:
after forking off the computation, it waits for the subprocess to exit,
and returns its exit status.
These special forms are macros that expand into the equivalent
series of system calls.
The definition of the \ex{exec-epf} macro is non-trivial,
as it produces the code to handle i/o redirections and set up pipelines.
However, the definitions of the \cd{&} and \ex{run} macros are very simple:
\begin{leftinset}
\begin{tabular}{@{}l@{\quad$\equiv$\quad}l@{}}
\cd{(& . \var{epf})} & \ex{(fork (\l{} (exec-epf . \var{epf})))} \\
\ex{(run . \var{epf})} & \cd{(wait (& . \var{epf}))}
\end{tabular}
\end{leftinset}
\end{desc}
\subsection{Procedures and special forms}
It is a general design principle in scsh that all functionality
made available through special syntax is also available in a
straightforward procedural form.
So there are procedural equivalents for all of the process notation.
In this way, the programmer is not restricted by the particular details of
the syntax.
Here are some of the syntax/procedure equivalents:
\begin{inset}
\begin{tabular}{@{}|ll|@{}}
\hline
Notation & Procedure \\ \hline \hline
\ex{|} & \ex{fork/pipe} \\
\ex{|+} & \ex{fork/pipe+} \\
\ex{exec-epf} & \ex{exec-path} \\
redirection & \ex{open}, \ex{dup} \\
\cd{&} & \ex{fork} \\
\ex{run} & $\ex{wait} + \ex{fork}$ \\
\hline
\end{tabular}
\end{inset}
%
Having a solid procedural foundation also allows for general notational
experimentation using {\Scheme}'s macros.
For example, the programmer can build his own pipeline notation on top of the
\ex{fork} and \ex{fork/pipe} procedures.
Chapter~\ref{chapt:syscalls} gives the full story on all the procedures
in the syscall library.
\subsection{Interfacing process output to {\Scheme}}
\label{sec:io-interface}
There is a family of procedures and special forms that can be used
to capture the output of processes as {\Scheme} data.
%
\dfn {run/port} {. \var{epf}} {port} {syntax}
\dfnx{run/file} {. \var{epf}} {\str} {syntax}
\dfnx{run/string} {. \var{epf}} {\str} {syntax}
\dfnx{run/strings} {. \var{epf}} {{\str} list} {syntax}
\dfnx{run/sexp} {. \var{epf}} {object} {syntax}
\dfnx{run/sexps} {. \var{epf}} {list} {syntax}
\begin{desc}
These forms all fork off subprocesses, collecting the process' output
to stdout in some form or another.
The subprocess runs with file descriptor 1 and the current output port
bound to a pipe.
\begin{desctable}{0.7\linewidth}
\ex{run/port} & Value is a port open on process's stdout.
Returns immediately after forking child. \\
\ex{run/file} & Value is name of a temp file containing process's output.
Returns when process exits. \\
\ex{run/string} & Value is a string containing process' output.
Returns when eof read. \\
\ex{run/strings}& Splits process' output into a list of
newline-delimited strings. Returns when eof read. \\
\ex{run/sexp} & Reads a single object from process' stdout with \ex{read}.
Returns as soon as the read completes. \\
\ex{run/sexps} & Repeatedly reads objects from process' stdout with \ex{read}.
Returns accumulated list upon eof.
\end{desctable}
The delimiting newlines are not included in the strings returned by
\ex{run/strings}.
These special forms just expand into calls to the following analogous
procedures.
\end{desc}
\defun {run/port*} {thunk} {port}
\defunx {run/file*} {thunk} {\str}
\defunx {run/string*} {thunk} {\str}
\defunx {run/strings*} {thunk} {{\str} list}
\defunx {run/sexp*} {thunk} {object}
\defunx {run/sexps*} {thunk} {object list}
\begin{desc}
For example, \ex{(run/port . \var{epf})} expands into
\codex{(run/port* (\l{} (exec-epf . \var{epf}))).}
\end{desc}
The following procedures are also of utility for generally parsing
input streams in scsh:
\defun {port->string} {port} {\str}
\defunx {port->sexp-list} {port} {list}
\defunx {port->string-list} {port} {{\str} list}
\defunx {port->list} {reader port} {list}
\begin{desc}
\ex{Port->string} reads the port until eof,
then returns the accumulated string.
\ex{Port->sexp-list} repeatedly reads data from the port until eof,
then returns the accumulated list of items.
\ex{Port->string-list} repeatedly reads newline-terminated strings from the
port until eof, then returns the accumulated list of strings.
The delimiting newlines are not part of the returned strings.
\ex{Port->list} generalises these two procedures.
It uses \var{reader} to repeatedly read objects from a port.
It accumulates these objects into a list, which is returned upon eof.
The \ex{port->string-list} and \ex{port->sexp-list} procedures
are trivial to define, being merely \ex{port->list} curried with
the appropriate parsers:
\begin{code}\cddollar
(port->string-list \var{port}) $\equiv$ (port->list read-line \var{port})
(port->sexp-list \var{port}) $\equiv$ (port->list read \var{port})\end{code}
%
The following compositions also hold:
\begin{code}\cddollar
run/string* $\equiv$ port->string $\circ$ run/port*
run/strings* $\equiv$ port->string-list $\circ$ run/port*
run/sexp* $\equiv$ read $\circ$ run/port*
run/sexps* $\equiv$ port->sexp-list $\circ$ run/port*\end{code}
\end{desc}
\defun{port-fold}{port reader op . seeds} {\object\star}
\begin{desc}
This procedure can be used to perform a variety of iterative operations
over an input stream.
It repeatedly uses \var{reader} to read an object from \var{port}.
If the first read returns eof, then the entire \ex{port-fold}
operation returns the seeds as multiple values.
If the first read operation returns some other value $v$, then
\var{op} is applied to $v$ and the seeds:
\ex{(\var{op} \var{v} . \var{seeds})}.
This should return a new set of seed values, and the reduction then loops,
reading a new value from the port, and so forth.
(If multiple seed values are used, then \var{op} must return multiple values.)
For example, \ex{(port->list \var{reader} \var{port})}
could be defined as
\codex{(reverse (port-fold \var{port} \var{reader} cons '()))}
An imperative way to look at \ex{port-fold} is to say that it
abstracts the idea of a loop over a stream of values read from
some port, where the seed values express the loop state.
\remark{This procedure was formerly named \texttt{\indx{reduce-port}}.
The old binding is still provided, but is deprecated and will
2001-11-08 03:35:19 -05:00
probably vanish in a future release.}
2001-07-13 02:59:22 -04:00
\end{desc}
\section{More complex process operations}
The procedures and special forms in the previous section provide for the
common case, where the programmer is only interested in the output of the
process.
These special forms and procedures provide more complicated facilities
for manipulating processes.
\subsection{Pids and ports together}
\dfn {run/port+proc} {. \var{epf}} {[port proc]} {syntax}
\defunx {run/port+proc*} {thunk} {[port proc]}
\begin{desc}
This special form and its analogous procedure can be used
if the programmer also wishes access to the process' pid, exit status,
or other information.
They both fork off a subprocess, returning two values:
a port open on the process' stdout (and current output port),
and the subprocess's process object.
A process object encapsulates the subprocess' process id and exit code;
it is the value passed to the \ex{wait} system call.
For example, to uncompress a tech report, reading the uncompressed
data into scsh, and also be able to track the exit status of
the decompression process, use the following:
\begin{code}
(receive (port child) (run/port+proc (zcat tr91-145.tex.Z))
(let* ((paper (port->string port))
(status (wait child)))
{\rm\ldots{}use \ex{paper}, \ex{status}, and \ex{child} here\ldots}))\end{code}
%
Note that you must \emph{first} do the \ex{port->string} and
\emph{then} do the wait---the other way around may lock up when the
zcat fills up its output pipe buffer.
\end{desc}
\subsection{Multiple stream capture}
Occasionally, the programmer may want to capture multiple distinct output
streams from a process. For instance, he may wish to read the stdout and
stderr streams into two distinct strings. This is accomplished with the
\ex{run/collecting} form and its analogous procedure, \ex{run/collecting*}.
%
2001-11-08 03:35:19 -05:00
\dfn {run/collecting} {fds . epf} {[status port\ldots]} {syntax}
\defunx {run/collecting*} {fds thunk} {[status port\ldots]}
2001-07-13 02:59:22 -04:00
\begin{desc}
\ex{Run/collecting} and \ex{run/collecting*} run processes that produce
multiple output streams and return ports open on these streams. To avoid
issues of deadlock, \ex{run/collecting} doesn't use pipes. Instead, it first
runs the process with output to temp files, then returns ports open on the
temp files. For example,
%
\codex{(run/collecting (1 2) (ls))}
%
runs \ex{ls} with stdout (fd 1) and stderr (fd 2) redirected to temporary
files.
When the \ex{ls} is done, \ex{run/collecting} returns three values: the
\ex{ls} process' exit status, and two ports open on the temporary files. The
files are deleted before \ex{run/collecting} returns, so when the ports are
closed, they vanish. The \ex{fds} list of file descriptors is implicitly
backquoted by the special-form version.
For example, if Kaiming has his mailbox protected, then
\begin{code}
(receive (status out err)
(run/collecting (1 2) (cat /usr/kmshea/mbox))
(list status (port->string out) (port->string err)))\end{code}
%
might produce the list
\codex{(256 "" "cat: /usr/kmshea/mbox: Permission denied")}
What is the deadlock hazard that causes \ex{run/collecting} to use temp files?
Processes with multiple output streams can lock up if they use pipes
to communicate with {\Scheme} i/o readers. For example, suppose
some {\Unix} program \ex{myprog} does the following:
\begin{enumerate}
\item First, outputs a single ``\ex{(}'' to stderr.
\item Then, outputs a megabyte of data to stdout.
\item Finally, outputs a single ``\ex{)}'' to stderr, and exits.
\end{enumerate}
Our scsh programmer decides to run \ex{myprog} with stdout and stderr redirected
\emph{via {\Unix} pipes} to the ports \ex{port1} and \ex{port2}, respectively.
He gets into trouble when he subsequently says \ex{(read port2)}.
The {\Scheme} \ex{read} routine reads the open paren, and then hangs in a
\ex{\urlh{http://www.FreeBSD.org/cgi/man.cgi?query=read&apropos=0&sektion=0&manpath=FreeBSD+4.3-RELEASE&format=html}{read()}} system call trying to read a matching close paren.
But before \ex{myprog} sends the close paren down the stderr
pipe, it first tries to write a megabyte of data to the stdout pipe.
However, {\Scheme} is not reading that pipe---it's stuck waiting for input on
stderr.
So the stdout pipe quickly fills up, and \ex{myprog} hangs, waiting for the
pipe to drain.
The \ex{myprog} child is stuck in a stdout/\ex{port1} write;
the {\Scheme} parent is stuck in a stderr/\ex{port2} read.
Deadlock.
Here's a concrete example that does exactly the above:
\begin{code}
(receive (status port1 port2)
(run/collecting (1 2)
(begin
;; Write an open paren to stderr.
(run (echo "(") (= 1 2))
;; Copy a lot of stuff to stdout.
(run (cat /usr/dict/words))
;; Write a close paren to stderr.
(run (echo ")") (= 1 2))))
;; OK. Here, I have a port PORT1 built over a pipe
;; connected to the BEGIN subproc's stdout, and
;; PORT2 built over a pipe connected to the BEGIN
;; subproc's stderr.
(read port2) ; Should return the empty list.
(port->string port1)) ; Should return a big string.\end{code}
%
In order to avoid this problem, \ex{run/collecting} and \ex{run/collecting*}
first run the child process to completion, buffering all the output
streams in temp files (using the \ex{temp-file-channel} procedure, see below).
When the child process exits, ports open on the buffered output are returned.
This approach has two disadvantages over using pipes:
\begin{itemize}
\item The total output from the child output is temporarily written
to the disk before returning from \ex{run/collecting}. If this output
is some large intermediate result, the disk could fill up.
\item The child producer and {\Scheme} consumer are serialised; there is
no concurrency overlap in their execution.
\end{itemize}
%
However, it remains a simple solution that avoids deadlock. More
sophisticated solutions can easily be programmed up as
needed---\ex{run/collecting*} itself is only 12 lines of simple code.
See \ex{temp-file-channel} for more information on creating temp files
as communication channels.
\end{desc}
\section{Conditional process sequencing forms}
These forms allow conditional execution of a sequence of processes.
\dfn{||} {\vari{pf}1 \ldots \var{pf}n} {\boolean} {syntax}
\begin{desc}
Run each proc until one completes successfully (\ie, exit status zero).
Return true if some proc completes successfully; otherwise \sharpf.
\end{desc}
\dfn{\&\&} {\vari{pf}1 \ldots \var{pf}n} {\boolean} {syntax}
\begin{desc}
Run each proc until one fails (\ie, exit status non-zero).
Return true if all procs complete successfully; otherwise \sharpf.
\end{desc}
\section{Process filters}
These procedures are useful for forking off processes to filter
text streams.
\begin{defundesc}{char-filter}{filter}{\proc}
The \var{filter} argument is a character$\rightarrow$character procedure.
Returns a procedure that when called, repeatedly reads a character
from the current input port, applies \var{filter} to the character,
and writes the result to the current output port.
The procedure returns upon reaching eof on the input port.
For example, to downcase a stream of text in a spell-checking pipeline,
instead of using the {\Unix} \ex{tr A-Z a-z} command, we can say:
\begin{code}
(run (| (delatex)
(begin ((char-filter char-downcase))) ; tr A-Z a-z
(spell)
(sort)
(uniq))
(< scsh.tex)
(> spell-errors.txt))\end{code}
\end{defundesc}
\begin{defundesc}{string-filter}{filter [buflen]}{\proc}
The \var{filter} argument is a string$\rightarrow$string procedure.
Returns a procedure that when called, repeatedly reads a string
from the current input port, applies \var{filter} to the string,
and writes the result to the current output port.
The procedure returns upon reaching eof on the input port.
The optional \var{buflen} argument controls the number of characters
each internal read operation requests; this means that \var{filter}
will never be applied to a string longer than \var{buflen} chars.
The default \var{buflen} value is 1024.
\end{defundesc}