%&latex -*- latex -*-
\chapter{System Calls}
\label{chapt:syscalls}
Scsh provides (almost) complete access to the basic {\Unix} kernel services:
processes, files, signals and so forth. These procedures comprise a
{\Scheme} binding for {\Posix}, with a few of the more standard extensions
thrown in (\eg, symbolic links, \ex{fchown}, \ex{fstat}, sockets).
\section{Errors}
Scsh syscalls never return error codes, and do not use a global
\ex{errno} variable to report errors.
Errors are consistently reported by raising exceptions.
This frees up the procedures to return useful values,
and allows the programmer to assume that
\emph{if a syscall returns, it succeeded.}
This greatly simplifies the flow of the code from the programmer's point
of view.
Since {\Scheme} does not yet have a standard exception system, the scsh
definition remains somewhat vague on the actual form of exceptions
and exception handlers. When a standard exception system is defined,
scsh will move to it. For now, scsh uses the {\scm} exception system,
with a simple sugaring on top to hide the details in the common case.
System call error exceptions contain the {\Unix} \ex{errno} code reported by
the system call. Unlike C, the \ex{errno} value is part of the exception
packet; it is \emph{not} accessed through a global variable.
For reference purposes, the {\Unix} \ex{errno} numbers
are bound to the variables \ex{errno/perm}, \ex{errno/noent}, {\etc}
System calls never return \ex{errno/intr}---they
automatically retry. (Currently only true for I/O calls.)
\begin{dfndesc}
{errno-error}{errno syscall .\ data}{\noreturn}{procedure}
Raises a {\Unix} error exception for {\Unix} error number \var{errno}.
The \var{syscall} and \var{data} arguments are packaged up in the exception
packet passed to the exception handler.
\end{dfndesc}
\defunx{with-errno-handler*}{handler thunk}{value(s) of thunk}
\begin{dfndescx}
{with-errno-handler}{handler-spec . body}{\valueofbody}{syntax}
{\Unix} syscalls raise error exceptions by calling \ex{errno-error}.
Programs can use \ex{with-errno-handler*} to establish
handlers for these exceptions.
If a {\Unix} error arises while \var{thunk} is executing,
\var{handler} is called on two arguments:
\codex{(\var{handler} \var{errno} \var{packet})}
\var{packet} is a list of the form
$$\var{packet} = \ex{(\var{errno-msg} \var{syscall} . \var{data})},$$
where \var{errno-msg} is the standard {\Unix} error message for the error,
\var{syscall} is the procedure that generated the error,
and \var{data} is a list of information generated by the error,
which varies from syscall to syscall.
If \var{handler} returns, the handler search continues upwards.
\var{Handler} can acquire the exception by invoking a saved continuation.
This procedure can be sugared over with the following syntax:
%
\begin{code}
(with-errno-handler
((\var{errno} \var{packet}) \var{clause} \ldots)
\var{body1}
\var{body2}
\ldots)\end{code}
%
This form executes the body forms with a particular errno handler installed.
When an errno error is raised, the handler search machinery will
bind variable \var{errno} to the error's integer code, and variable
\var{packet} to the error's auxiliary data packet.
Then, the clauses will be checked for a match.
The first clause that matches is executed, and its value is the
value of the entire \ex{with-errno-handler} form.
If no clause matches, the handler search continues.
Error clauses have two forms:
%
\begin{code}
((\var{errno} \ldots) \var{body} \ldots)
(else \var{body} \ldots)\end{code}
%
In the first type of clause, the \var{errno} forms are integer expressions.
They are evaluated and compared to the error's errno value.
An \ex{else} clause matches any errno value.
Note that the \var{errno} and \var{packet}
variables are lexically visible to the error clauses.
Example:
\begin{code}
(with-errno-handler
((errno packet) ; Only handle 3 particular errors.
((errno/wouldblock errno/again)
(loop))
((errno/acces)
(format #t "Not allowed access!")
#f))
(foo frobbotz)
(blatz garglemumph))\end{code}
%
It is not defined what dynamic context the handler executes in,
so fluid variables cannot reliably be referenced.
Note that Scsh system calls always retry when interrupted, so that
the \ex{errno/intr} exception is never raised.
If the programmer wishes to abort a system call on an interrupt, he
should have the interrupt handler explicitly raise an exception or
invoke a stored continuation to throw out of the system call.
\remark{This is not strictly true in the current implementation---only
some of the i/o syscalls loop.
But BSD variants never return \ex{EINTR} anyway, unless you explicitly
request it, so we'll live w/it for now.}
\end{dfndescx}
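The procedural form \ex{with-errno-handler*} can be used directly when the
handler needs to escape through a saved continuation.
The following is a minimal sketch, not taken from the scsh sources;
the procedure name \ex{readable-file?} is hypothetical.
The handler escapes with {\sharpf} on \ex{errno/noent} or \ex{errno/acces};
for any other error it simply returns, so the handler search continues upwards.
%
\begin{code}
(define (readable-file? fname)
  (call-with-current-continuation
    (lambda (k)
      (with-errno-handler*
        (lambda (errno packet)
          ;; Escape only for the two errors we understand.
          (if (or (= errno errno/noent) (= errno errno/acces))
              (k #f)))
        (lambda ()
          ;; If open-input-file returns, the open succeeded.
          (close (open-input-file fname))
          #t)))))\end{code}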
\subsection{Interactive mode and error handling}
Scsh runs in two modes: interactive and script mode. It starts up in
interactive mode if the scsh interpreter is started up with no script
argument. Otherwise, scsh starts up in script mode. The mode determines
whether scsh prints prompts in between reading and evaluating forms, and it
affects the default error handler. In interactive mode, the default error
handler will report the error, and generate an interactive breakpoint so that
the user can interact with the system to examine, fix, or dismiss from the
error. In script mode, the default error handler causes the scsh process to
exit.
When scsh forks a child with \ex{(fork)}, the child resets to script mode.
This can be overridden if the programmer wishes.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{I/O}
\subsection{Standard {\R4RS} I/O procedures}
In scsh, most standard {\R4RS} i/o operations (such as \ex{display} or
\ex{read-char}) work on both integer file descriptors and {\Scheme} ports.
When doing i/o with a file descriptor, the i/o operation is done
directly on the file, bypassing any buffered data that may have
accumulated in an associated port.
Note that character-at-a-time operations such as \ex{read-char}
are likely to be quite slow when performed directly upon file
descriptors.
The standard {\R4RS} procedures \ex{read-char}, \ex{char-ready?}, \ex{write},
\ex{display}, \ex{newline},
and \ex{write-char} are all generic, accepting integer file descriptor
arguments as well as ports.
Scsh also mandates the availability of \ex{format}, and further requires
\ex{format} to accept file descriptor arguments as well as ports.
The procedures \ex{peek-char} and \ex{read} do \emph{not} accept
file descriptor arguments, since these functions require the ability to
read ahead in the input stream, a feature not supported by {\Unix} I/O.
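As a small illustration of the generic procedures, the following sketch writes
a message directly to file descriptor 2; the message text is arbitrary.
Because the output bypasses any port buffer, it may appear out of order
relative to output written through a port open on the same descriptor.
%
\begin{code}
(display "warning: something looks wrong" 2)
(newline 2)\end{code}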
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Port manipulation and standard ports}
\defun {close-after} {port consumer} {value(s) of consumer}
\begin{desc}
Returns \ex{(\var{consumer} \var{port})}, but closes the port on return.
No dynamic-wind magic. \remark{Is there a less-awkward name?}
\end{desc}
\defun {error-output-port}{} {port}
\begin{desc}
This procedure is analogous to \ex{current-output-port}, but produces
a port used for error messages---the scsh equivalent of stderr.
\end{desc}
\defun {with-current-input-port*} {port thunk} {value(s) of thunk}
\defunx {with-current-output-port*} {port thunk} {value(s) of thunk}
\defunx {with-error-output-port*} {port thunk} {value(s) of thunk}
\begin{desc}
These procedures install \var{port} as the current input, current output,
and error output port, respectively, for the duration of a call to
\var{thunk}.
\end{desc}
\dfn {with-current-input-port} {port . body} {value(s) of body} {syntax}
\dfnx {with-current-output-port} {port . body} {value(s) of body} {syntax}
\dfnx {with-error-output-port} {port . body} {value(s) of body} {syntax}
\begin{desc}
These special forms are simply syntactic sugar for the
{\ttt with\=current\=input\=port*} procedure and friends.
\end{desc}
\defun {set-current-input-port!} {port}{\undefined}
\defunx{set-current-output-port!}{port}{\undefined}
\defunx{set-error-output-port!} {port}{\undefined}
\begin{desc}
These procedures alter the dynamic binding of the current I/O port procedures
to new values.
\end{desc}
\defun {close} {port/fd} {\boolean}
\begin{desc}
Close the port or file descriptor.
If \var{port/fd} is a file descriptor, and it has a port allocated to it,
the port is shifted to a new file descriptor created with \ex{(dup
port/fd)} before closing \ex{port/fd}. The port then has its revealed
count set to zero. This reflects the design criterion that ports are not
associated with file descriptors, but with open files.
To close a file descriptor, and any associated port it might have, you
must instead say one of (as appropriate):
\begin{code}
(close (fdes->inport fd))
(close (fdes->outport fd))\end{code}
The procedure returns true if it closed an open port.
If the port was already closed, it returns false;
this is not an error.
\end{desc}
\defun {stdports->stdio}{} {\undefined}
\defunx {stdio->stdports}{} {\undefined}
\begin{desc}
These two procedures are used to synchronise Unix' standard I/O
file descriptors and Scheme's current I/O ports.
\ex{(stdports->stdio)} causes the standard I/O file descriptors
(0, 1, and 2) to take their values from the current I/O ports.
It is exactly equivalent to the series of
redirections:\footnote{Why not \ex{move->fdes}?
Because the current output port and error port
might be the same port.}
\begin{code}
(dup (current-input-port) 0)
(dup (current-output-port) 1)
(dup (error-output-port) 2)\end{code}
%
\ex{stdio->stdports} causes the bindings of the current I/O ports
to be changed to ports constructed over the standard I/O file
descriptors.
It is exactly equivalent to the series of assignments
\begin{code}
(set-current-input-port! (fdes->inport 0))
(set-current-output-port! (fdes->outport 1))
(set-error-output-port! (fdes->outport 2))\end{code}
However, you are more likely to find the dynamic-extent variant,
\ex{with-stdio-ports*}, below, to be of use in general programming.
\end{desc}
\defun{with-stdio-ports*} {thunk} {value(s) of thunk}
\dfnx {with-stdio-ports} {body \ldots} {value(s) of body}{syntax}
\begin{desc}
\ex{with-stdio-ports*} binds the standard ports \ex{(current-input-port)},
\ex{(current-output-port)}, and \ex{(error-output-port)} to be ports
on file descriptors 0, 1, 2, and then calls \var{thunk}.
It is equivalent to:
\begin{code}
(with-current-input-port (fdes->inport 0)
(with-current-output-port (fdes->outport 1)
(with-error-output-port (fdes->outport 2)
(thunk))))\end{code}
%
The \ex{with-stdio-ports} special form is merely syntactic sugar.
\end{desc}
\subsection{String ports}
{\scm} has string ports, which you can use. Scsh has not committed to the
particular interface or names that {\scm} uses, so be warned that the
interface described herein may be liable to change.
\defun {make-string-input-port} {string} {\port}
\begin{desc}
Returns a port that reads characters from the supplied string.
\end{desc}
\defun {make-string-output-port} {} {\port}
\defunx {string-output-port-output} {port} {\str}
\begin{desc}
A string output port is a port that collects the characters given to it into
a string.
The accumulated string is retrieved by applying \ex{string-output-port-output}
to the port.
\end{desc}
\defun {call-with-string-output-port} {procedure} {\str}
\begin{desc}
The \var{procedure} value is called on a port. When it returns,
\ex{call-with-string-output-port} returns a string containing the
characters that were written to that port during the execution
of \var{procedure}.
\end{desc}
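For illustration, here are two ways to capture the printed representation of
a value in a string using the procedures above.
This is only a sketch; the names \ex{display->string} and
\ex{display->string/cw} are hypothetical and not part of scsh.
%
\begin{code}
;; Explicit port management.
(define (display->string x)
  (let ((port (make-string-output-port)))
    (display x port)
    (string-output-port-output port)))
;; The same thing, letting call-with-string-output-port supply the port.
(define (display->string/cw x)
  (call-with-string-output-port
    (lambda (port) (display x port))))\end{code}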
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Revealed ports and file descriptors}
The material in this section and the following one is not critical for most
applications.
You may safely skim or completely skip this section on a first reading.
Dealing with {\Unix} file descriptors in a {\Scheme} environment is difficult.
In {\Unix}, open files are part of the process environment, and are referenced
by small integers called \emph{file descriptors}. Open file descriptors are
the fundamental way i/o redirections are passed to subprocesses, since
file descriptors are preserved across fork's and exec's.
{\Scheme}, on the other hand, uses ports for specifying i/o sources. Ports are
garbage-collected {\Scheme} objects, not integers. Ports can be garbage
collected; when a port is collected, it is also closed. Because file
descriptors are just integers, it's impossible to garbage collect them---you
wouldn't be able to close file descriptor 3 unless there were no 3's in the
system, and you could further prove that your program would never again
compute a 3. This is difficult at best.
If a {\Scheme} program only used {\Scheme} ports, and never actually used
file descriptors, this would not be a problem. But {\Scheme} code
must descend to the file descriptor level in at least two circumstances:
%
\begin{itemize}
\item when interfacing to foreign code
\item when interfacing to a subprocess.
\end{itemize}
%
This causes a problem. Suppose we have a {\Scheme} port constructed
on top of file descriptor 2. We intend to fork off a program that
will inherit this file descriptor. If we drop references to the port,
the garbage collector may prematurely close file 2 before we fork
the subprocess. The interface described below is intended to fix this and
other problems arising from the mismatch between ports and file descriptors.
The {\Scheme} kernel maintains a port table that maps a file descriptor
to the {\Scheme} port allocated for it (or, {\sharpf} if there is no port
allocated for this file descriptor). This is used to ensure that
there is at most one open port for each open file descriptor.
The port data structure for file ports has two fields besides the descriptor:
\var{revealed} and \var{closed?}.
When a file port is closed with \ex{(close port)},
the port's file descriptor is closed, its entry in the port table is cleared,
and the port's \var{closed?} field is set to true.
When a file descriptor is closed with \ex{(close fdes)}, any associated
port is shifted to a new file descriptor created with \ex{(dup fdes)}.
The port has its revealed count reset to zero (and hence becomes eligible
for closing on GC). See discussion below.
To really put a stake through a descriptor's heart without waiting for
associated ports to be GC'd, you must say one of
%
\begin{code}
(close (fdes->inport fdes))
(close (fdes->outport fdes))\end{code}
The \var{revealed} field is an aid to garbage collection. It is an integer
semaphore. If it is zero, the port's file descriptor can be closed when
the port is collected. Essentially, the \var{revealed} field reflects whether
or not the port's file descriptor has escaped to the {\Scheme} user. If
the {\Scheme} user doesn't know what file descriptor is associated with
a given port, then he can't possibly retain an ``integer handle'' on the
port after dropping pointers to the port itself, so the garbage collector
is free to close the file.
Ports allocated with \ex{open-output-file} and \ex{open-input-file} are
unrevealed ports---\ie, \var{revealed} is initialised to 0.
No one knows the port's file descriptor, so the file descriptor can be closed
when the port is collected.
The functions \ex{fdes->outport}, \ex{fdes->inport}, and \ex{port->fdes}
are used to shift back and forth between file descriptors and ports. When
\ex{port->fdes} reveals a port's file descriptor, it increments the port's
\var{revealed} field. When the user is through with the file descriptor, he
can call \ex{(release-port-handle \var{port})}, which decrements the count.
The function \ex{(call/fdes fdes/port \var{proc})} automates this protocol.
\ex{call/fdes} uses \ex{dynamic-wind} to enforce the protocol.
If \var{proc} throws out of the \ex{call/fdes} application,
the unwind handler releases the descriptor handle;
if the user subsequently tries to throw \emph{back} into \var{proc}'s
context, the wind handler raises an error. When the user maps a file
descriptor to a port with \ex{fdes->outport} or \ex{fdes->inport}, the port
has its revealed field incremented.
Not all file descriptors are created by requests to make ports. Some are
inherited on process invocation via \ex{exec(2)}, and are simply part of the
global environment. Subprocesses may depend upon them, so if a port is later
allocated for these file descriptors, it should be considered as a revealed
port. For example, when the {\Scheme} shell's process starts up, it opens ports
on file descriptors 0, 1, and 2 for the initial values of
\ex{(current-input-port)}, \ex{(current-output-port)}, and
\ex{(error-output-port)}.
These ports are initialised with \var{revealed} set to 1,
so that stdin, stdout, and stderr are not closed even if the user drops the
port.
Unrevealed file ports have the nice property that they can be closed when all
pointers to the port are dropped. This can happen during gc, or at an
\ex{exec()}---since all memory is dropped at an \ex{exec()}. No one knows the
file descriptor associated with the port, so the exec'd process certainly
can't refer to it.
This facility preserves the transparent close-on-collect property
for file ports that are used in straightforward ways, yet allows
access to the underlying {\Unix} substrate without interference from
the garbage collector. This is critical, since shell programming
absolutely requires access to the {\Unix} file descriptors, as their
numerical values are a critical part of the process interface.
A port's underlying file descriptor can be shifted around with \ex{dup(2)}
when convenient. That is, the actual file descriptor on top of which a port is
constructed can be shifted around underneath the port by the scsh kernel when
necessary. This is important, because when the user is setting up file
descriptors prior to an \ex{exec(2)}, he may explicitly use a file descriptor
that has already been allocated to some port. In this case, the scsh kernel
just shifts the port's file descriptor to some new location with \ex{dup},
freeing up its old descriptor. This prevents errors from happening in the
following scenario. Suppose we have a file open on port \ex{f}. Now we want
to run a program that reads input on file 0, writes output to file 1, errors
to file 2, and logs execution information on file 3. We want to run this
program with input from \ex{f}.
So we write:
%
\begin{code}
(run (/usr/shivers/bin/prog)
(> 1 output.txt)
(> 2 error.log)
(> 3 trace.log)
(= 0 ,f))\end{code}
%
Now, suppose by ill chance that, unbeknownst to us, when the operating system
opened \ex{f}'s file, it allocated descriptor 3 for it. If we blindly redirect
\ex{trace.log} into file descriptor 3, we'll clobber \ex{f}! However, the
port-shuffling machinery saves us: when the \ex{run} form tries to dup
\ex{trace.log}'s file descriptor to 3, \ex{dup} will notice that file
descriptor 3 is already associated with an unrevealed port (\ie, \ex{f}). So,
it will first move \ex{f} to some other file descriptor. This keeps \ex{f}
alive and well so that it can subsequently be dup'd into descriptor 0 for
\ex{prog}'s stdin.
The port-shifting machinery makes the following guarantee: a port is only
moved when the underlying file descriptor is closed, either by a \ex{close()}
or a \ex{dup2()} operation. Otherwise a port/file-descriptor association is
stable.
Under normal circumstances, all this machinery just works behind the scenes to
keep things straightened out. The only time the user has to think about it is
when he starts accessing file descriptors from ports, which he should almost
never have to do. If a user starts asking what file descriptors have been
allocated to what ports, he has to take responsibility for managing this
information.
\subsection{Port-mapping machinery}
The procedures provided in this section are almost never needed.
You may safely skim or completely skip this section on a first reading.
Here are the routines for manipulating ports in scsh. The important
points to remember are:
\begin{itemize}
\item A file port is associated with an open file, not a particular file
descriptor.
\item The association between a file port and a particular file descriptor
is never changed \emph{except} when the file descriptor is explicitly
closed. ``Closing'' includes being used as the target of a \ex{dup2}, so
the set of procedures below that close their targets are
\ex{close}, two-argument \ex{dup}, and \ex{move->fdes}.
If the target file descriptor of one of these routines has an
allocated port, the port will be shifted to another freshly-allocated
file descriptor, and marked as unrevealed, thus preserving the port
but freeing its old file descriptor.
\end{itemize}
These rules are what is necessary to ``make things work out'' with no
surprises in the general case.
\defun {fdes->inport} {fd} {port}
\defunx {fdes->outport} {fd} {port}
\defunx {port->fdes} {port} {\fixnum}
\begin{desc}
These increment the port's revealed count.
\end{desc}
\defun {port-revealed} {port} {{\integer} or \sharpf}
\begin{desc}
Return the port's revealed count if positive, otherwise \sharpf.
\end{desc}
\defun{release-port-handle} {port} {\undefined}
\begin{desc}
Decrement the port's revealed count.
\end{desc}
\defun {call/fdes} {fd/port consumer} {value(s) of consumer}
\begin{desc}
Calls \var{consumer} on a file descriptor;
takes care of revealed bookkeeping.
If \var{fd/port} is a file descriptor, this is just
\ex{(\var{consumer} \var{fd/port})}.
If \var{fd/port} is a port,
calls \var{consumer} on its underlying file descriptor.
While \var{consumer} is running, the port's revealed count is incremented.
When \ex{call/fdes} is called with port argument, you are not allowed to
throw into \var{consumer} with a stored continuation, as that would violate
the revealed-count bookkeeping.
\end{desc}
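The revealed-count bookkeeping can be sketched as follows, assuming a readable
file \ex{/etc/passwd} exists; the variable names are illustrative only, and
the values noted in the comments are those implied by the descriptions above.
%
\begin{code}
(define p (open-input-file "/etc/passwd"))
(port-revealed p)             ; #f: the port starts out unrevealed
(define fd (port->fdes p))    ; reveals the descriptor; count becomes 1
;; ... hand fd to foreign code or a subprocess here ...
(release-port-handle p)       ; count drops back to 0
;; call/fdes performs the same increment and decrement
;; around the call to its consumer.
(call/fdes p
  (lambda (fd) (read-string 10 fd)))\end{code}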
\defun{move->fdes} {fd/port target-fd} {port or fdes}
\begin{desc}
Maps fd$\rightarrow$fd and port$\rightarrow$port.
If \var{fd/port} is a file-descriptor not equal to \var{target-fd},
dup it to \var{target-fd} and close it. Returns \var{target-fd}.
If \var{fd/port} is a port, it is shifted to \var{target-fd},
by duping its underlying file-descriptor if necessary.
\var{Fd/port}'s original file descriptor is
closed (if it was different from \var{target-fd}).
Returns the port.
This operation resets \var{fd/port}'s revealed count to 1.
In all cases when \var{fd/port} is actually shifted, if there is a port
already using \var{target-fd}, it is first relocated to some other file
descriptor.
\end{desc}
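For example, a program that must leave a log file open on descriptor 3 for a
subprocess might use \ex{move->fdes} as in the following sketch
(the file name and the variable \ex{trace} are arbitrary):
%
\begin{code}
(define trace (move->fdes (open-output-file "trace.log") 3))
(port-revealed trace)   ; 1, per the description above\end{code}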
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{{\Unix} I/O}
\defun {dup} {port/fd [newfd]} {port/fd}
\defunx{dup->inport} {port/fd [newfd]} {port}
\defunx{dup->outport} {port/fd [newfd]} {port}
\defunx{dup->fdes} {port/fd [newfd]} {fd}
\begin{desc}
These procedures provide the functionality of C's \ex{dup()} and \ex{dup2()}.
The different routines return different types of values:
\ex{dup->inport}, \ex{dup->outport}, and \ex{dup->fdes} return
input ports, output ports, and integer file descriptors, respectively.
\ex{dup}'s return value depends on the type of
\var{port/fd}---it maps fd$\rightarrow$fd and port$\rightarrow$port.
These procedures use the {\Unix} \ex{dup()} syscall to replicate
the file descriptor or file port \var{port/fd}.
If a \var{newfd} file descriptor is given, it is used as the target of
the dup operation, \ie, the operation is a \ex{dup2()}.
In this case, procedures that return a port (such as \ex{dup->inport})
will return one with the revealed count set to one.
For example, \ex{(dup (current-input-port) 5)} produces
a new port with underlying file descriptor 5, whose revealed count is 1.
If \var{newfd} is not specified,
then the operating system chooses the file descriptor,
and any returned port is marked as unrevealed.
If the \var{newfd} target is given,
and some port is already using that file descriptor,
the port is first quietly shifted (with another \ex{dup})
to some other file descriptor (zeroing its revealed count).
Since {\Scheme} doesn't provide read/write ports,
\ex{dup->inport} and \ex{dup->outport} can be useful for
getting an output version of an input port, or \emph{vice versa}.
For example, if \ex{p} is an input port open on a tty, and
we would like to do output to that tty, we can simply use
\ex{(dup->outport p)} to produce an equivalent output port for the tty.
\end{desc}
\defun {seek} {fd/port offset [whence]} {\integer}
\begin{desc}
Reposition the I/O cursor for a file descriptor or port.
\var{whence} is one of \{\ex{seek/set}, \ex{seek/delta}, \ex{seek/end}\},
and defaults to \ex{seek/set}.
If \ex{seek/set}, then \var{offset} is an absolute index into the file;
if \ex{seek/delta}, then \var{offset} is a relative offset from the current
I/O cursor;
if \ex{seek/end}, then \var{offset} is a relative offset from the end of file.
The \var{fd/port} argument may be a port or an integer file descriptor.
Not all such values are seekable;
this is dependent on the OS implementation.
The return value is the resulting position of the I/O cursor in the I/O stream.
\oops{The current implementation doesn't handle \var{offset} arguments
that are not immediate integers (\ie, representable in 30 bits).}
\end{desc}
\defun {tell} {fd/port} {\integer}
\begin{desc}
Returns the position of the I/O cursor in the I/O stream.
Not all file descriptors or ports support cursor-reporting;
this is dependent on the OS implementation.
\end{desc}
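As a sketch of \ex{seek} in use, the size of a file can be found by seeking to
its end and returning the resulting cursor position.
The procedure name \ex{file-size-by-seek} is hypothetical, and the sketch
assumes \var{fname} names a seekable regular file.
%
\begin{code}
(define (file-size-by-seek fname)
  (let* ((fd (open-fdes fname open/read))
         (size (seek fd 0 seek/end)))   ; offset 0 from end of file
    (close fd)
    size))\end{code}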
\begin{defundesc} {open-file} {fname flags [perms]} {\port}
\var{Perms} defaults to \cd{#o666}.
\var{Flags} is an integer bitmask, composed by or'ing together the following
constants:
\begin{code}\codeallowbreaks
open/read ; You may only
open/write ; choose one
open/read+write ; of these three
open/no-control-tty
open/nonblocking
open/append
open/create
open/truncate
open/exclusive
. ; Your Unix may have
. ; a few more.\end{code}
%
Returns a port. The port is an input port if the \var{flags} permit it,
otherwise an output port. \R4RS/\scm/scsh do not have input/output ports,
so it's one or the other. This should be fixed. (You can hack simultaneous
i/o on a file by opening it r/w, taking the result input port,
and duping it to an output port with \ex{dup->outport}.)
\end{defundesc}
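The simultaneous-i/o workaround can be sketched as follows, assuming the
process has a controlling terminal reachable as \ex{/dev/tty};
the variable names are illustrative only.
%
\begin{code}
;; Opened read/write, so the result is an input port.
(define tty-in (open-file "/dev/tty" open/read+write))
;; An output port on the same open file.
(define tty-out (dup->outport tty-in))\end{code}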
\defun{open-input-file}{fname [flags]}\port
\begin{defundescx}{open-output-file}{fname [flags perms]}\port
These are equivalent to \ex{open-file}, after first setting the
read/write bits of the \var{flags} argument to \ex{open/read} or
\ex{open/write}, respectively.
\var{Flags} defaults to zero for \ex{open-input-file},
and
\codex{(bitwise-ior open/create open/truncate)}
for \ex{open-output-file}.
These defaults make the procedures backwards-compatible with their
unary {\R4RS} definitions.
\end{defundescx}
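To append to a file rather than truncate it, the default flags of
\ex{open-output-file} can be overridden explicitly.
A minimal sketch, with an arbitrary file name and variable name:
%
\begin{code}
(define log-port
  (open-output-file "run.log"
                    (bitwise-ior open/create open/append)))\end{code}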
\begin{defundesc} {open-fdes} {fname flags [perms]} \integer
Returns a file descriptor.
\end{defundesc}
\begin{defundesc}{pipe}{} {[\var{rport} \var{wport}]}
Returns two ports, the read and write end-points of a {\Unix} pipe.
\end{defundesc}
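Since \ex{pipe} returns two values, its result must be received with a
multiple-value form.
The following sketch uses \ex{call-with-values} (assumed to be available in
the underlying Scheme) to send a short string through a pipe within a single
process and read it back.
%
\begin{code}
(call-with-values pipe
  (lambda (rport wport)
    (display "hello" wport)
    (close wport)                 ; finish writing before we read
    (let ((s (read-string 5 rport)))
      (close rport)
      s)))\end{code}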
\defun{read-string}{nbytes [fd/port]} {{\str} or \sharpf}
\dfnix{read-string!} {str [fd/port start end]} {nread or \sharpf}{procedure}
{read-string"!@\texttt{read-string"!}}
\begin{desc}
These calls read exactly as much data as you requested, unless
there is not enough data (eof).
\ex{read-string!} reads the data into string \var{str}
at the indices in the half-open interval $[\var{start},\var{end})$;
the default interval is the whole string: $\var{start}=0$ and
$\var{end}=\ex{(string-length \var{str})}$.
They will persistently retry on partial reads and when interrupted
until (1) error, (2) eof, or (3) the input request is completely
satisfied.
Partial reads can occur when reading from an intermittent source,
such as a pipe or tty.
\ex{read-string} returns the string read; \ex{read-string!} returns
the number of characters read. They both return false at eof.
A request to read zero bytes returns immediately, with no eof check.
The values of \var{start} and \var{end} must specify a well-defined
interval in \var{str},
\ie, $0 \le \var{start} \le \var{end} \le \ex{(string-length \var{str})}$.
Any partially-read data is included in the error exception packet.
Error returns on non-blocking input are considered an error.
\end{desc}
\defun {read-string/partial} {nbytes [fd/port]} {{\str} or \sharpf}
\dfnix{read-string!/partial} {str [fd/port start end]} {nread or \sharpf}
{procedure}{read-string"!/partial@\texttt{read-string"!/partial}}
\begin{desc}
%
These are atomic best-effort/forward-progress calls.
Best effort: they may read less than you request if there is a
lesser amount of data immediately available (\eg, because you
are reading from a pipe or a tty).
Forward progress: if no data is immediately available
(\eg, empty pipe), they will block.
Therefore, if you request an $n>0$ byte read,
while you may not get everything you asked for, you will always get something
(barring eof).
There is one case in which the forward-progress guarantee is cancelled:
when the programmer explicitly sets the port to non-blocking i/o.
In this case, if no data is immediately available,
the procedure will not block, but will immediately return a zero-byte read.
\ex{read-string/partial} reads the data into a freshly allocated string,
which it returns as its value.
\ex{read-string!/partial} reads the data into string \var{str}
at the indices in the half-open interval $[\var{start},\var{end})$;
the default interval is the whole string: $\var{start}=0$ and
$\var{end}=\ex{(string-length \var{str})}$.
The values of \var{start} and \var{end} must specify a well-defined
interval in \var{str},
\ie, $0 \le \var{start} \le \var{end} \le \ex{(string-length \var{str})}$.
It returns the number of bytes read.
A request to read zero bytes returns immediately, with no eof check.
In sum, there are only three ways you can get a zero-byte read:
(1) you request one, (2) you turn on non-blocking i/o, or (3) you
try to read at eof.
These are the routines to use for non-blocking input.
They are also useful when you wish to efficiently process data
in large blocks, and your algorithm is insensitive to the block size
of any particular read operation.
\end{desc}
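A block-copy loop is a typical use of the partial-read procedures.
The following is a sketch under stated assumptions: the name \ex{copy-data}
and the 4096-character buffer size are arbitrary, and \var{in} is assumed to
be in blocking mode.
The loop stops on either a false result or a zero-byte read, so it also
terminates (rather than spinning) if \var{in} turns out to be non-blocking.
%
\begin{code}
(define (copy-data in out)
  (let ((buf (make-string 4096)))
    (let loop ()
      (let ((n (read-string!/partial buf in)))
        (if (and n (> n 0))
            (begin
              ;; Write only the part of the buffer that was filled.
              (display (substring buf 0 n) out)
              (loop)))))))\end{code}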
\defun {select }{rvec wvec evec [timeout]}{rvec' wvec' evec'}
\defunx{select!}{rvec wvec evec [timeout]}{nr nw ne}
\begin{desc}
The \ex{select} procedure allows a process to block and wait for events on
multiple I/O channels.
The \var{rvec} and \var{evec} arguments are vectors of input ports and
integer file descriptors; \var{wvec} is a vector of output ports and
integer file descriptors.
The procedure returns three vectors whose elements are subsets of the
corresponding arguments.
Every element of \var{rvec'} is ready for input;
every element of \var{wvec'} is ready for output;
every element of \var{evec'} has an exceptional condition pending.
The \ex{select} call will block until at least one of the I/O channels
passed to it is ready for operation.
The \var{timeout} value can be used to force the call to time-out
after a given number of seconds. It defaults to the special value
\ex{\#f}, meaning wait indefinitely. A zero value can be used to poll
the I/O channels.
If an I/O channel appears more than once in a given vector---perhaps
occurring once as a Scheme port, and once as the port's underlying
integer file descriptor---only one of these two references may appear
in the returned vector.
Buffered I/O ports are handled specially---if an input port's buffer is
not empty, or an output port's buffer is not yet full, then these
ports are immediately considered eligible for I/O without using
the actual, primitive \ex{select} system call to check the underlying
file descriptor.
This works pretty well for buffered input ports, but is a little
problematic for buffered output ports.
The \ex{select!} procedure is similar, but indicates the subset
of active I/O channels by side-effecting the argument vectors.
Non-active I/O channels in the argument vectors are overwritten with
{\sharpf} values.
The call returns the number of active elements remaining in each
vector.
As a convenience, the vectors passed in to \ex{select!} are
allowed to contain {\sharpf} values as well as integers and ports.
\remark{I have found the \ex{select!} interface to be the more
useful of the two. After the system call, it allows you
to check a specific I/O channel in constant time.}
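A minimal sketch of a timed wait with \ex{select} follows.
The ports \ex{in1} and \ex{in2} are hypothetical input ports, the
multiple return values are bound with the \ex{receive} form, and the sketch
assumes that a call which times out returns empty vectors.
\begin{code}
;; Wait at most 5 seconds for input on IN1 or IN2.
(receive (rvec wvec evec)
         (select (vector in1 in2) (vector) (vector) 5)
  (if (zero? (vector-length rvec))
      'timed-out
      (vector-ref rvec 0)))  ; a channel that is ready for input\end{code}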
\end{desc}
\begin{defundescx}{write-string}{string [fd/port start end]}\undefined
This procedure writes all the data requested.
If the procedure cannot perform the write with a single kernel call
(due to interrupts or partial writes),
it will perform multiple write operations until all the data is written
or an error has occurred.
A non-blocking i/o error is considered an error.
(Error exception packets for this syscall include the amount of
data partially transferred before the error occurred.)
The data written are the characters of \var{string} in the half-open
interval $[\var{start},\var{end})$.
The default interval is the whole string: $\var{start}=0$ and
$\var{end}=\ex{(string-length \var{string})}$.
The values of \var{start} and \var{end} must specify a well-defined
interval in \var{string},
\ie, $0 \le \var{start} \le \var{end} \le \ex{(string-length \var{string})}$.
A zero-byte write returns immediately, with no error.
Output to buffered ports: \ex{write-string}'s efforts end as soon
as all the data has been placed in the output buffer.
Errors and true output may not happen until a later time, of course.
\end{defundescx}
\begin{defundescx}{write-string/partial}{string [fd/port start end]}{nwritten}
This routine is the atomic best-effort/forward-progress analog
to \ex{write-string}.
It returns the number of bytes written, which may be less than you
asked for.
Partial writes can occur when (1) we write off the physical end of
the media, (2) the write is interrupted, or (3) the file descriptor
is set for non-blocking i/o.
If the file descriptor is not set up for non-blocking i/o, then
a successful return from these procedures makes a forward progress
guarantee---that is, a partial write took place of at least one byte:
\begin{itemize}
\item If we are at the end of physical media, and no write takes place,
an error exception is raised.
So a return implies we wrote \emph{something}.
\item If the call is interrupted after a partial transfer, it returns
immediately. But if the call is interrupted before any data transfer,
then the write is retried.
\end{itemize}
If we request a zero-byte write, then the call immediately returns 0.
If the file descriptor is set for non-blocking i/o, then the call
may return 0 if it was unable to immediately write anything
(\eg, full pipe).
Barring these two cases, a write either returns $\var{nwritten} > 0$,
or raises an error exception.
Non-blocking i/o is only available on file descriptors and unbuffered
ports. Doing non-blocking i/o to a buffered port is not well-defined,
and is an error (the problem is the subsequent flush operation).
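To illustrate the forward-progress guarantee, here is a sketch of how a
complete write could be layered on \ex{write-string/partial}.
The name \ex{write-fully} is illustrative, and \var{fd} is assumed not to be
set for non-blocking i/o, so every call either transfers at least one byte
or raises an exception.
\begin{code}
(define (write-fully str fd)
  (let ((end (string-length str)))
    (let loop ((start 0))
      (if (< start end)
          (loop (+ start
                   (write-string/partial str fd start end)))))))\end{code}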
\end{defundescx}
\subsection{Buffered I/O}
{\scm} ports use buffered I/O---data is transferred to or from the
OS in blocks. Scsh provides control of this mechanism: the programmer
may force saved-up output data to be transferred to the OS when
he chooses,
and may also choose which I/O buffering policy to employ for a given
port (or turn buffering off completely).
It can be useful to turn I/O buffering off in some cases, for example
when an I/O stream is to be shared by multiple subprocesses.
For this reason, scsh allocates an unbuffered port for file descriptor 0
at start-up time.
Because shells frequently share stdin with subprocesses, if the shell
does buffered reads, it might ``steal'' input intended for a subprocess. For
this reason, all shells, including sh, csh, and scsh, read stdin unbuffered.
Applications that can tolerate buffered input on stdin can reset
\ex{(current-input-port)} to block buffering for higher performance.
\begin{defundesc}{set-port-buffering}{port policy [size]}\undefined
This procedure allows the programmer to assign a particular I/O buffering
policy to a port, and to choose the size of the associated buffer.
It may only be used on new ports, \ie, before I/O is performed on the port.
There are three buffering policies that may be chosen:
\begin{inset}
\begin{tabular}{l@{\qquad}l}
\ex{bufpol/block} & General block buffering (general default) \\
\ex{bufpol/line} & Line buffering (tty default) \\
\ex{bufpol/none} & Direct I/O---no buffering
\end{tabular}
\end{inset}
The line buffering policy flushes output whenever a newline is output;
whenever the buffer is full; or whenever an input is read from stdin.
Line buffering is the default for ports open on terminal devices.
The \var{size} argument requests an I/O buffer of \var{size} bytes.
If not given, a reasonable default is used; if given and zero,
buffering is turned off
(\ie, $\var{size} = 0$ for any policy is equivalent to
$\var{policy} = \ex{bufpol/none}$).
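For instance, a filter that does not share stdin with subprocesses might
begin as sketched below; the 16384-byte size is an arbitrary illustrative
value.
\begin{code}
;; Must happen before any input is read from the port.
(set-port-buffering (current-input-port) bufpol/block 16384)\end{code}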
\end{defundesc}
\begin{defundesc}{force-output} {[fd/port]}{\noreturn}
This procedure does nothing when applied to an integer file descriptor
or unbuffered port.
It flushes buffered output when applied to a buffered port,
and raises a write-error exception on error. Returns no value.
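A typical use is to make a prompt visible before blocking for input,
as in this sketch (the prompt text is illustrative):
\begin{code}
(write-string "Continue? " (current-output-port))
(force-output (current-output-port))
(read-line)\end{code}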
\end{defundesc}
\begin{defundesc}{flush-all-ports} {}{\noreturn}
This procedure flushes all open output ports with buffered data.
\end{defundesc}
\subsection{File locking}
Scsh provides {\Posix} advisory file locking.
\emph{Advisory} locks are locks that can be checked by user code,
but do not affect other I/O operations.
For example, if a process has an exclusive lock on a region of a file,
other processes will not be able to obtain locks on that region of the file,
but they will still be able to read and write the file with no hindrance.
Using advisory locks requires cooperation amongst the agents accessing
the shared resource.
\remark{
Unfortunately, {\Posix} file locks are associated with actual files,
not with associated open file descriptors.
Once a process locks a file, using some file descriptor \var{fd},
the next time \emph{any} file descriptor referencing that file is closed,
all associated locks are released.
Scsh moves Scheme ports from file descriptor to file descriptor with
\ex{dup()} and \ex{close()} as required by the runtime,
so it is impossible to keep file locks open across one of these shifts.
Hence we can only offer {\Posix} advisory file locking directly on raw
integer file descriptors;
regrettably, there are no facilities for locking Scheme ports.
Note that once a Scheme port is revealed in scsh, the runtime will not
shift the port around with \ex{dup()} and \ex{close()}.
This means the file-locking procedures can then be applied to the port's
associated file descriptor.
NeXTSTEP users should also note that even minimalist {\Posix} file locking
is not supported for NFS-mounted files in NeXTSTEP; NeXT claims they will
fix this in NS release 4.
}
{\Posix} allows the user to lock a region of a file with either
an exclusive or shared lock.
Locked regions are described by the \emph{lock-region} record:
\begin{code}
(define-record lock-region
exclusive?
start
len
whence
pid)\end{code}
\index{lock-region?}
\index{lock-region:exclusive?} \index{lock-region:whence}
\index{lock-region:start} \index{lock-region:end}
\index{lock-region:len} \index{lock-region:pid}
%
The \ex{exclusive?} field is true if the lock is exclusive;
false if it is shared.
The \ex{whence} field is one of the values from the \ex{seek} call:
\ex{seek/set}, \ex{seek/delta}, or \ex{seek/end},
and determines the interpretation of the \ex{start} field:
\begin{itemize}
\item If \ex{seek/set}, the \ex{start} value is simply an absolute index
into the file.
\item If \ex{seek/delta}, the \ex{start} value is an offset from the
file descriptor's current position in the file.
\item If \ex{seek/end}, the \ex{start} value is an offset from the
end of the file.
\end{itemize}
The region of the file being locked is given by the \ex{start} and \ex{len}
fields.
The \ex{pid} field gives the process id of the process holding the region
lock, when relevant (see \ex{get-lock-region} below).
\begin{defundesc}{make-lock-region}{exclusive? start len [whence]}{lock-region}
This procedure makes a lock-region record.
The \ex{whence} field defaults to \ex{seek/set}.
\end{defundesc}
\defun {lock-region}{fdes lock}{\undefined}
\defunx{lock-region/no-block}{fdes lock}{\boolean}
\begin{desc}
These procedures lock a region of the file referenced by file descriptor
\var{fdes}.
The \ex{lock-region} procedure blocks until the lock is granted;
the non-blocking variant returns a boolean indicating whether or not
the lock was granted.
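A sketch of trying for a shared lock without blocking follows;
\var{fd} is an open file descriptor and \ex{scan-region} is a hypothetical
procedure that does the work.
\begin{code}
(let ((lock (make-lock-region #f 0 100)))  ; shared, bytes 0-99
  (if (lock-region/no-block fd lock)
      (begin (scan-region fd)
             (unlock-region fd lock))
      (write-string "Region is locked; try again later.")))\end{code}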
\end{desc}
\begin{defundesc}{get-lock-region}{fdes lock}{lock-region or \sharpf}
Return the first lock region on \var{fdes} that overlaps with
the lock region \var{lock}.
If there is no such lock, return false.
This procedure fills out the \ex{pid} field of the returned lock region,
and is the only procedure that has anything to do with this field.
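For example, the following sketch reports which process, if any, holds a
lock that conflicts with an exclusive lock on the first 512 bytes of the
file open on \var{fd}. The name \ex{header-lock-holder} is illustrative.
\begin{code}
(define (header-lock-holder fd)
  (let ((holder (get-lock-region fd (make-lock-region #t 0 512))))
    (and holder (lock-region:pid holder))))\end{code}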
\end{defundesc}
\begin{defundesc}{unlock-region}{fdes lock}{\undefined}
Release a lock from a file.
\end{defundesc}
\defun{with-region-lock*}{fdes lock thunk}{value(s) of thunk}
\dfnx{with-region-lock}{fdes lock body \ldots}{value(s) of body}{syntax}
\begin{desc}
This procedure obtains the requested lock, and then calls
\ex{(\var{thunk})}. When \var{thunk} returns, the lock is released.
A non-local exit (\eg, throwing to a saved continuation or raising
an exception) also causes the lock to be released.
After a normal return from \var{thunk}, its return values are returned
by \ex{with-region-lock*}.
The \ex{with-region-lock} special form is equivalent syntactic sugar.
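A short sketch of both forms follows; \var{fd} is an open file descriptor
and \ex{update-header} is a hypothetical thunk.
\begin{code}
(define header-lock (make-lock-region #t 0 512))  ; exclusive
\cb
(with-region-lock* fd header-lock update-header)
\cb
;; The same, using the special form:
(with-region-lock fd header-lock
  (update-header))\end{code}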
\end{desc}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{File system}
Besides the following procedures, which allow access to the
computer's file system, scsh also provides a set of procedures
which manipulate file \emph{names}. These string-processing
procedures are documented in section \ref{sec:filenames}.
\defun {create-directory} {fname [perms override?]} {\undefined}
\defunx{create-fifo} {fname [perms override?]} {\undefined}
\defunx{create-hard-link} {oldname newname [override?]} {\undefined}
\begin{defundescx}
{create-symlink} {old-name new-name [override?]} {\undefined}
These procedures create objects of various kinds in the file system.
The \var{override?} argument controls the action if there is already an
object in the file system with the new name:
\begin{optiontable}
\sharpf & signal an error (default) \\
'query & prompt the user \\
\textnormal{\emph{other}}& \parbox[t]{0.7\linewidth}{
delete the old object (with \ex{delete-file}
or \ex{delete-directory}, as appropriate) before
creating the new object.}
\end{optiontable}
\var{Perms} defaults to \cd{#o777} (but is masked by the current umask).
\remark{Currently, if you try to create a hard or symbolic link from a
file to itself, you will error out with \var{override?} false, and simply
delete your file with \var{override?} true. Catching this will require
some sort of true-name procedure, which I currently do not have.}
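The following sketch shows the effect of the optional arguments;
the file names are illustrative.
\begin{code}
(create-directory "build" #o755)       ; error if "build" exists
(create-symlink "build" "latest" #t)   ; replaces any old "latest"
(create-fifo "jobs" #o600 'query)      ; asks before replacing "jobs"\end{code}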
\end{defundescx}
\defun {delete-directory} {fname} \undefined
\defunx{delete-file} {fname} \undefined
\begin{defundescx} {delete-filesys-object} {fname} \undefined
These procedures delete objects from the file system.
The {\ttt delete\=filesys\=object} procedure will delete an object
of any type from the file system: files, (empty) directories, symlinks, fifos,
\etc.
\end{defundescx}
\begin{defundescx}{read-symlink}{fname} \str
Return the filename referenced by symbolic link \ex{fname}.
\end{defundescx}
\begin{defundescx} {rename-file} {old-fname new-fname [override?]} \undefined
If you override an existing object, then \var{old-fname}
and \var{new-fname} must type-match---either both directories,
or both non-directories.
This is required by the semantics of {\Unix} \ex{rename()}.
\remark{
There is an unfortunate atomicity problem with the \ex{rename-file}
procedure: if you
specify no-override, but create file \ex{new-fname} sometime between
\ex{rename-file}'s existence check and the actual rename operation,
your file will be clobbered with \ex{old-fname}. There is no way to fix
this problem, given the semantics of {\Unix} \ex{rename()};
at least it is highly unlikely to occur in practice.
}
\end{defundescx}
\defun {set-file-mode} {fname/fd/port mode} \undefined
\defunx{set-file-owner} {fname/fd/port uid} {\undefined}
\defunx{set-file-group} {fname/fd/port gid} {\undefined}
\begin{desc}
These procedures set the permission bits, owner id, and group id of a
file, respectively.
The file can be specified by giving the file name, or either an
integer file descriptor or a port open on the file.
Setting file user or group ownership usually requires root privileges.
\end{desc}
\defun {set-file-times} {fname [access-time mod-time]} {\undefined}
\begin{desc}
This procedure sets the access and modified times for the file
\var{fname} to the supplied values (see section~\ref{sec:time}
for the scsh representation of time).
If neither time argument is supplied, they are both taken to be
the current time. You must provide both times or neither.
If the procedure completes successfully, the file's time of last
status-change (\ex{ctime}) is set to the current time.
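For example, the following sketch first sets both times of a file to the
current time, and then copies one file's time stamps to another using the
\ex{file-last-access} and \ex{file-last-mod} procedures described later in
this section. The file names are illustrative, and the sketch assumes the
integer times reported by those procedures are acceptable as time arguments
here.
\begin{code}
(set-file-times "stamp")                ; both times become "now"
\cb
(set-file-times "copy"
                (file-last-access "original")
                (file-last-mod "original"))\end{code}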
\end{desc}
\defun {sync-file} {fd/port} \undefined
\defunx{sync-file-system}{} \undefined
\begin{desc}
Calling \ex{sync-file}
causes {\Unix} to update the disk data structures for a given file.
If \var{fd/port} is a port, any buffered data it may have is first
flushed.
Calling \ex{sync-file-system} synchronises the kernel's entire file
system with the disk.
These procedures are not {\Posix}.
Interestingly enough, \ex{sync\=file\=system} doesn't actually
do what it is claimed to do. We just threw it in for humor value.
See the \ex{sync(2)} man page for {\Unix} enlightenment.
\end{desc}
\begin{defundesc} {truncate-file} {fname/fd/port len} \undefined
The specified file is truncated to \var{len} bytes in length.
\end{defundesc}
\begin{defundesc}{file-info} {fname/fd/port [chase?]} {file-info-record}
The \ex{file-info} procedure
returns a record structure containing everything
there is to know about a file. If the \var{chase?} flag is true
(the default), then the procedure chases symlinks and reports on
the files to which they refer. If \var{chase?} is false, then
the procedure checks the actual file itself, even if it's a symlink.
The \var{chase?} flag is ignored if the file argument is a file descriptor
or port.
The value returned is a \emph{file-info record}, defined to have the
following structure:
\begin{code}
(define-record file-info
type ; \{block-special, char-special, directory,
; fifo, regular, socket, symlink\}
device ; Device file resides on.
inode ; File's inode.
mode ; File's mode bits: permissions, setuid, setgid
nlinks ; Number of hard links to this file.
uid ; Owner of file.
gid ; File's group id.
size ; Size of file, in bytes.
atime ; Time of last access.
mtime ; Time of last mod.
ctime) ; Time of last status change.\end{code}
\index{file-info:type}\index{file-info:device}\index{file-info:inode}%
\index{file-info:mode}\index{file-info:nlinks}\index{file-info:uid}%
\index{file-info:gid}\index{file-info:size}\index{file-info:atime}%
\index{file-info:mtime}\index{file-info:ctime}%
%
The uid field of a file-info record is accessed with the procedure
\codex{(file-info:uid x)}
and similarly for the other fields.
The \ex{type} field is a symbol; all other fields are integers.
A file-info record is discriminated with the \ex{file-info?} predicate.
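As a sketch, the following extracts a few fields from a record.
The file name \ex{/etc/passwd} is only an example, and \ex{latest} is
assumed to name a symlink.
\begin{code}
(let ((info (file-info "/etc/passwd")))
  (list (file-info:type info)     ; a symbol, e.g. regular
        (file-info:size info)     ; size in bytes
        (file-info:mtime info)))  ; time of last modification
\cb
;; Report on a symlink itself rather than on its target:
(file-info:type (file-info "latest" #f))\end{code}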
The following procedures all return selected information about
a file; they are built on top of \ex{file-info}, and are
called with the same arguments that are passed to it.
\begin{inset}
\newcommand{\Ex}[1]{\ex{#1}\index{#1@{\tt{#1}}}}
\begin{tabular}{ll}
Procedure & returns \\\hline
\Ex{file-type} & type \\
\Ex{file-inode} & inode \\
\Ex{file-mode} & mode \\
\Ex{file-nlinks} & nlinks \\
\Ex{file-owner} & uid \\
\Ex{file-group} & gid \\
\Ex{file-size} & size \\
\Ex{file-last-access} & atime \\
\Ex{file-last-mod} & mtime \\
\Ex{file-last-status-change} & ctime
\end{tabular}
\end{inset}
%
Example:
\begin{code}
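;; All my files in /usr/tmp.  The names returned by directory-files
;; are relative to the directory, so we cd there first.
(with-cwd "/usr/tmp"
  (filter (\l{f} (= (file-owner f) (user-uid)))
          (directory-files)))\end{code}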
\remark{\ex{file-info} was named \ex{file-attributes} in releases of scsh
prior to release 0.4. We changed the name to \ex{file-info} for
consistency with the other information-retrieval procedures in
scsh: \ex{user-info}, \ex{group-info}, \ex{host-info},
\ex{network-info}, \ex{service-info}, and \ex{protocol-info}.
The \ex{file-attributes} binding is still supported in the current
release of scsh, but is deprecated, and may go away in a future
release.}
\end{defundesc}
\defun {file-directory?}{fname/fd/port [chase?]}{\boolean}
\defunx {file-fifo?}{fname/fd/port [chase?]}{\boolean}
\defunx {file-regular?}{fname/fd/port [chase?]}{\boolean}
\defunx {file-socket?}{fname/fd/port [chase?]}{\boolean}
\defunx {file-special?}{fname/fd/port [chase?]}{\boolean}
\defunx {file-symlink?}{fname/fd/port}{\boolean}
\begin{desc}
These procedures are file-type predicates that test the
type of a given file.
They are applied to the same arguments to which \ex{file-info} is applied;
the sole exception is \ex{file-symlink?}, which does not take
the optional \var{chase?} second argument.
\begin{inset}
\newcommand{\Ex}[1]{\ex{#1}\index{\tt{#1}}}
\begin{tabular}{l@{\qquad}l}
\end{tabular}
\end{inset}
For example,
\codex{(file-directory? "/usr/dalbertz")\qquad\evalto\qquad\sharpt}
\end{desc}
\defun {file-not-readable?} {fname} \boolean
\defunx{file-not-writable?} {fname} \boolean
\defunx{file-not-executable?} {fname} \boolean
\begin{desc}
Returns:
\begin{optiontable}
\textnormal{Value} & meaning \\ \hline
\sharpf & Access permitted \\
'search-denied & {\renewcommand{\arraystretch}{1}%
\begin{tabular}[t]{@{}l@{}}
Can't stat---a protected directory \\
is blocking access.\end{tabular}} \\
'permission & Permission denied. \\
'no-directory & Some directory doesn't exist. \\
'nonexistent & File doesn't exist.
\end{optiontable}
%
A file is considered writable if either (1) it exists and is writable
or (2) it doesn't exist and the directory is writable.
Since symlink permission bits are ignored by the filesystem, these
calls do not take a \var{chase?} flag.
\oops{\ex{file-not-writable?} does not currently do the directory
check.}
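Because the failure values are symbols, the result can be dispatched on
directly. In this sketch the name \ex{explain-unreadable} and the message
strings are illustrative.
\begin{code}
(define (explain-unreadable fname)
  (case (file-not-readable? fname)
    ((#f)            "readable")
    ((search-denied) "a protected directory is blocking access")
    ((permission)    "permission denied")
    ((no-directory)  "some directory does not exist")
    ((nonexistent)   "file does not exist")
    (else            "unrecognized result")))\end{code}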
\end{desc}
\defun {file-readable?} {fname} \boolean
\defunx {file-writable?} {fname} \boolean
\defunx {file-executable?} {fname} \boolean
\begin{desc}
These procedures are the logical negation of the
preceding \ex{file-not-\ldots?} procedures.
\end{desc}
\begin{defundesc}{file-not-exists?} {fname [chase?]} \object
Returns:
\begin{optiontable}
\sharpf & Exists. \\
\sharpt & Doesn't exist. \\
'search-denied & \parbox[t]{0.5\linewidth}{\sloppy\raggedright
Some protected directory
is blocking the search.}
\end{optiontable}
\end{defundesc}
\begin{defundesc}{file-exists?} {fname [chase?]} \boolean
This is simply
\ex{(not (file-not-exists? \var{fname} \var{[chase?]}))}
\end{defundesc}
\defun {directory-files} {[dir dotfiles?]} {string list}
\begin{desc}
Return the list of files in directory \var{dir},
which defaults to the current working directory.
The \var{dotfiles?} flag (default {\sharpf}) causes dot files to be
included in the list.
Regardless of the value of \var{dotfiles?}, the two files \ex{.} and
\ex{..} are \emph{never} returned.
The directory \var{dir} is not prepended to each file name in the
result list. That is,
\codex{(directory-files "/etc")}
returns
\codex{("chown" "exports" "fstab" \ldots)}
\emph{not}
\codex{("/etc/chown" "/etc/exports" "/etc/fstab" \ldots)}
To use the files in the returned list, the programmer can either manually
prepend the directory:
\codex{(map (\l{f} (string-append dir "/" f)) files)}
or cd to the directory before using the file names:
%
\begin{code}
(with-cwd dir
(for-each delete-file (directory-files)))\end{code}
%
or use the \ex{glob} procedure, defined below.
A directory list can be generated by \ex{(run/strings (ls))}, but this
is unreliable, as filenames with whitespace in their names will be
split into separate entries. Using \ex{directory-files} is reliable.
\end{desc}
\defun {glob} {\vari{pat}1 \ldots} {string list}
\begin{desc}
Glob each pattern against the filesystem and return the sorted list.
Duplicates are not removed. Patterns matching nothing are not included
literally.\footnote{Why bother to mention such a silly possibility?
Because that is what sh does.}
C shell \verb|{a,b,c}| patterns are expanded. Backslash quotes
characters, turning off the special meaning of
\verb|{|, \verb|}|, \cd{*}, \verb|[|, \verb|]|, and \verb|?|.
Note that the rules of backslash for {\Scheme} strings and glob patterns
work together to require four backslashes in a row to specify a
single literal backslash. Fortunately, this should be a rare
occurrence.
A glob subpattern will not match against dot files unless the first
character of the subpattern is a literal ``\ex{.}''.
Further, a dot subpattern will not match the files \ex{.} or \ex{..}
unless it is a constant pattern, as in \ex{(glob "../*/*.c")}.
So a directory's dot files can be reliably generated
with the simple glob pattern \ex{".*"}.
Some examples:
\begin{inset}
\begin{verbatim}
(glob "*.c" "*.h")
;; All the C and #include files in my directory.
(glob "*.c" "*/*.c")
;; All the C files in this directory and
;; its immediate subdirectories.
(glob "lexer/*.c" "parser/*.c")
(glob "{lexer,parser}/*.c")
;; All the C files in the lexer and parser dirs.
(glob "\\{lexer,parser\\}/*.c")
;; All the C files in the strange
;; directory "{lexer,parser}".
(glob "*\\*")
;; All the files ending in "*", e.g.
;; ("foo*" "bar*")
(glob "*lexer*")
("mylexer.c" "lexer1.notes")
;; All files containing the string "lexer".
(glob "lexer")
;; Either ("lexer") or ().\end{verbatim}
\end{inset}
%
If the first character of the pattern (after expanding braces) is a slash,
the search begins at root; otherwise, the search begins in the current
working directory.
If the last character of the pattern (after expanding braces) is a slash,
then the result matches must be directories, \eg,
\begin{code}
(glob "/usr/man/man?/") \evalto
("/usr/man/man1/" "/usr/man/man2/" \ldots)\end{code}
Globbing can sometimes be useful when we need a list of a directory's files
where each element in the list includes the pathname for the file.
Compare:
\begin{code}
(directory-files "../include") \evalto
("cig.h" "decls.h" \ldots)
(glob "../include/*") \evalto
("../include/cig.h" "../include/decls.h" \ldots)\end{code}
\end{desc}
\defun{glob-quote}{str}\str
\begin{desc}
Returns a constant glob pattern that exactly matches \var{str}.
All wild-card characters in \var{str} are quoted with a backslash.
\begin{code}
(glob-quote "Any *.c files?")
{\evalto}"Any \\*.c files\\?"\end{code}
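This is useful when part of a pattern comes from data rather than from the
program text. In the sketch below, \ex{c-files-in} is an illustrative name;
quoting \var{dir} keeps any wild-card characters in the directory name from
being interpreted.
\begin{code}
(define (c-files-in dir)
  (glob (string-append (glob-quote dir) "/*.c")))\end{code}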
\end{desc}
\begin{defundesc}{file-match}{root dot-files? \vari{pat}1 \vari{pat}2 {\ldots} \vari{pat}n}{string list}
\ex{file-match} provides a more powerful file-matching service, at the
expense of a less convenient notation. It is intermediate in
power between most shell matching machinery and recursive \ex{find(1)}.
Each pattern is a regexp. The procedure searches from \var{root},
matching the first-level files against pattern \vari{pat}1, the
second-level files against \vari{pat}2, and so forth.
The list of files matching the whole path pattern is returned,
in sorted order.
The matcher uses Spencer's regular expression package.
The files \ex{.} and \ex{..} are never matched. Other dot files are only
matched if the \var{dot-files?} argument is \sharpt.
A given \vari{pat}i pattern is matched as a regexp, so it is not forced
to match the entire file name. \Eg, pattern \ex{"t"} matches any
file containing a ``t'' in its name, while pattern \verb|"^t$"| matches
only a file whose entire name is ``\ex{t}''.
The \vari{pat}i patterns can be more general than stated above.
\begin{itemize}
\item A single pattern can specify multiple levels of the path by
embedding \ex{/} characters within the pattern. For example,
the pattern \ex{"a/b/c"} gives a match equivalent to the
list of patterns \ex{"a" "b" "c"}.
\item A \vari{pat}i pattern can be a procedure,
which is used as a match predicate.
It will be repeatedly called with a candidate file-name to test.
The file-name will be the entire path accumulated.
If the procedure raises an error condition, \ex{file-match} will
catch the error and treat it as a failed match.
This keeps \ex{file-match} from being blown out of the water
by applying tests to dangling symlinks and other similar situations.
\end{itemize}
Some examples:
%% UGH. Because we are using code instead of verbatim, we have to
%% double up on backslashes.
\begin{tightleftinset}
\begin{code}
(file-match "/usr/lib" #f "m$" "^tab") \evalto
("/usr/lib/term/tab300" "/usr/lib/term/tab300-12" \ldots)
\cb
(file-match "." #f "^lex|parse|codegen$" "\\\\.c$") \evalto
("lex/lex.c" "lex/lexinit.c" "lex/test.c"
"parse/actions.c" "parse/error.c" "parse/test.c"
"codegen/io.c" "codegen/walk.c")
\cb
(file-match "." #f "^lex|parse|codegen$/\\\\.c$")
;; The same.
\cb
(file-match "." #f file-directory?)
;; Return all subdirs of the current directory.
\cb
(file-match "/" #f file-directory?) \evalto
("/bin" "/dev" "/etc" "/tmp" "/usr")
;; All subdirs of root.
\cb
(file-match "." #f "\\\\.c")
;; All the C files in my directory.
\cb
(define (ext extension)
(\l{fn} (string-suffix? fn extension)))
\cb
(define (true . x) #t)
\cb
(file-match "." #f "./\\\\.c")
(file-match "." #f "" "\\\\.c")
(file-match "." #f true "\\\\.c")
(file-match "." #f true (ext "c"))
;; All the C files of all my immediate subdirs.
\cb
(file-match "." #f "lexer") \evalto
("mylexer.c" "lexer.notes")
;; Compare with (glob "lexer"), above.\end{code}
\end{tightleftinset}
Note that when \var{root} is the current working directory (\ex{"."}),
when it is converted to directory form, it becomes \ex{""}, and doesn't
show up in the result file-names.
It is regrettable that the regexp wild card char, ``\ex{.}'',
is such an important file name literal, as dot-file prefix and extension
delimiter.
\end{defundesc}
\begin{defundesc} {create-temp-file} {[prefix]} \str
\ex{Create-temp-file} creates a new temporary file and returns its name.
The optional argument specifies the filename prefix to use, and defaults
to \ex{"/usr/tmp/\var{pid}"}, where \var{pid} is the current process' id.
The procedure generates a sequence of filenames that have \var{prefix} as
a common prefix, looking for a filename that doesn't already exist in the
file system. When it finds one, it creates it with permission \cd{#o600}
and returns the filename. (The file permission can be changed to a more
permissive permission with \ex{set-file-mode} after the file is created.)
This file is guaranteed to be brand new. No other process will have it
open. This procedure does not simply return a filename that is very
likely to be unused. It returns a filename that definitely did not exist
at the moment \ex{create-temp-file} created it.
It is not necessary for the process' pid to be a part of the filename
for the uniqueness guarantees to hold. The pid component of the default
prefix simply serves to scatter the name searches into sparse regions, so
that collisions are less likely to occur. This speeds things up, but does
not affect correctness.
Security note: doing i/o to files created this way in \ex{/usr/tmp/} is
not necessarily secure. General users have write access to \ex{/usr/tmp/},
so even if an attacker cannot access the new temp file, he can delete it
and replace it with one of his own. A subsequent open of this filename
will then give you his file, to which he has access rights. There are
several ways to defeat this attack:
\begin{enumerate}
\item Use \ex{temp-file-iterate}, below, to return the file descriptor
allocated when the file is opened. This will work if the file
only needs to be opened once.
\item If the file needs to be opened twice or more, create it in a
protected directory, \eg, \verb|$HOME|.
\item Ensure that \ex{/usr/tmp} has its sticky bit set. This
requires system administrator privileges.
\end{enumerate}
The actual default prefix used is controlled by the dynamic variable
\ex{*temp-file-template*}, and can be overridden for increased security.
See \ex{temp-file-iterate}.
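A sketch of typical use follows. The prefix and the procedure
\ex{fill-scratch-file} are hypothetical, and the caller remains responsible
for deleting the file.
\begin{code}
(let ((tmp (create-temp-file "/usr/tmp/scratch.")))
  (fill-scratch-file tmp)
  (delete-file tmp))\end{code}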
\end{defundesc}
\defunx {temp-file-iterate} {maker [template]} {\object\+}
\defvarx {*temp-file-template*} \str
\begin{desc}
This procedure can be used to perform certain atomic transactions on
the file system involving filenames. Some examples:
\begin{itemize}
\item Linking a file to a fresh backup temp name.
\item Creating and opening an unused, secure temp file.
\item Creating an unused temporary directory.
\end{itemize}
This procedure uses \var{template} to generate a series of trial file
names.
\var{Template} is a \ex{format} control string, and defaults to
\codex{"/usr/tmp/\var{pid}.\~a"}
where \var{pid} is the current process' process id.
File names are generated by calling \ex{format} to instantiate the
template's \verb|~a| field with a varying string.
\var{Maker} is a procedure which is serially called on each file name
generated. It must return at least one value; it may return multiple
values. If the first return value is {\sharpf} or if \var{maker} raises the
\ex{errno/exist} errno exception, \ex{temp-file-iterate} will loop,
generating a new file name and calling \var{maker} again. If the first
return value is true, the loop is terminated, returning whatever value(s)
\var{maker} returned.
After a number of unsuccessful trials, \ex{temp-file-iterate} may give up
and signal an error.
Thus, if we ignore its optional \var{prefix} argument,
\ex{create-temp-file} could be defined as:
\begin{code}
(define (create-temp-file)
  (let ((flags (bitwise-ior open/create open/exclusive)))
    (temp-file-iterate
      (\l{f}
        (close (open-output-file f flags #o600))
        f))))\end{code}
To rename a file to a temporary name:
\begin{code}
(temp-file-iterate (\l{backup}
                     (create-hard-link old-file backup)
                     backup)
                   ".#temp.\~a") ; Keep link in cwd.
(delete-file old-file)\end{code}
Recall that scsh reports syscall failure by raising an error
exception, not by returning an error code. This is critical
to this example---the programmer can assume that if the
\ex{temp-file-iterate} call returns, it returns successfully.
So the following \ex{delete-file} call can be reliably invoked,
safe in the knowledge that the backup link has definitely been established.
To create a unique temporary directory:
\begin{code}
(temp-file-iterate (\l{dir} (create-directory dir) dir)
                   "/usr/tmp/tempdir.\~a")\end{code}
%
Similar operations can be used to generate unique symlinks and fifos,
or to return values other than the new filename (\eg, an open file
descriptor or port).
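For instance, here is a minimal sketch of a \var{maker} that returns two
values, an open output port and the file name, so that the file never has
to be reopened by name. It assumes the \ex{receive} form for binding
multiple values; the string written is purely illustrative.
\begin{code}
(receive (port fname)
    (temp-file-iterate
      (\l{f}
        (let ((flags (bitwise-ior open/create open/exclusive)))
          ;; The port is a true first value, so the loop stops here.
          (values (open-output-file f flags #o600) f))))
  (write-string "scratch data" port)
  (close port)
  fname)\end{code}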
The default template is in fact taken from the value of the dynamic
variable \ex{*temp-file-template*}, which itself defaults to
\ex{"/usr/tmp/\var{pid}.\~a"}, where \var{pid} is the scsh process'
pid.
For increased security, a user may wish to change the template
to use a directory not allowing world write access
(\eg, his home directory).
\end{desc}
\defun{temp-file-channel}{} {[inp outp]}
\begin{desc}
This procedure can be used to provide an interprocess communications
channel with arbitrary-sized buffering. It returns two values, an input
port and an output port, both open on a new temp file. The temp file
itself is deleted from the {\Unix} file tree before \ex{temp-file-channel}
returns, so the file is essentially unnamed, and its disk storage is
reclaimed as soon as the two ports are closed.
\ex{Temp-file-channel} is analogous to \ex{port-pipe} with two exceptions:
\begin{itemize}
\item If the writer process gets ahead of the reader process, it will
not hang waiting for some small pipe buffer to drain. It will simply
buffer the data on disk. This is good.
\item If the reader process gets ahead of the writer process, it will
also not hang waiting for data from the writer process. It will
simply see and report an end of file. This is bad.
In order to ensure that an end-of-file returned to the reader is
legitimate, the reader and writer must serialise their i/o. The
simplest way to do this is for the reader to delay doing input
until the writer has completely finished doing output, or exited.
\end{itemize}
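As a rough sketch of the serialisation described above, the parent below
does not read until the writing child has exited. It assumes the
\ex{receive} form for binding multiple values; the message text is
illustrative only.
\begin{code}
(receive (inp outp) (temp-file-channel)
  (let ((child (fork (\l{}
                       (write-string "result from child" outp)
                       (close outp)))))
    (close outp)        ; The parent only reads.
    (wait child)        ; Writer has exited, so end-of-file is genuine.
    (read-line inp)))\end{code}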
\end{desc}
\section{Processes}
\defun {exec} {prog arg1 \ldots argn} \noreturn
\defunx {exec-path} {prog arg1 \ldots argn} \noreturn
\defunx {exec/env} {prog env arg1 \ldots argn} \noreturn
\defunx {exec-path/env} {prog env arg1 \ldots argn} \noreturn
\begin{desc}
The \ex{\ldots/env} variants take an environment specified as a
string$\rightarrow$string alist.
An environment of {\sharpt} is taken to mean the current process' environment
(\ie, the value of the external char \ex{**environ}).
[Rationale: {\sharpf} is a more convenient marker for the current environment
than {\sharpt}, but would cause an ambiguity on Schemes that identify
{\sharpf} and \ex{()}.]
The path-searching variants search the directories in the list
{\ttt exec\=path\=list} for the program.
A path-search is not performed if the program name contains
a slash character---it is used directly. So a program with a name like
\ex{"bin/prog"} always executes the program \ex{bin/prog} in the current working
directory. See \verb|$path| and \verb|exec-path-list|, below.
Note that there is no analog to the C function \ex{execv()}.
To get the effect just do
\codex{(apply exec prog arglist)}
All of these procedures flush buffered output and close unrevealed ports
before executing the new binary.
To avoid flushing buffered output, see \verb|%exec| below.
Note that the C \ex{exec()} procedure allows the zeroth element of the
argument vector to be different from the file being executed, \eg
%
\begin{inset}
\begin{verbatim}
char *argv[] = {"-", "-f", 0};
execve("/bin/csh", argv, envp);\end{verbatim}
\end{inset}
%
The scsh \ex{exec}, \ex{exec-path}, \ex{exec/env}, and \ex{exec-path/env}
procedures do not give this functionality---element 0 of the arg vector is
always identical to the \ex{prog} argument. In the rare case the user wishes
to differentiate these two items, he can use the low-level \verb|%exec| and
\verb|exec-path-search| procedures.
These procedures never return under any circumstances.
As with any other system call, if there is an error, they raise
an exception.
\end{desc}
\defun {\%exec} {prog arglist env} \undefined
\defunx{exec-path-search} {fname pathlist} {{\str} or \sharpf}
\begin{desc}
The \ex{\%exec} procedure is the low-level interface to the system call.
The \var{arglist} parameter is a list of arguments;
\var{env} is either a string$\rightarrow$string alist or {\sharpt}.
The new program's \cd{argv[0]} will be taken from \ex{(car \var{arglist})},
\emph{not} from \var{prog}.
An environment of {\sharpt} means the current process' environment.
\verb|%exec| does not flush buffered output
(see \ex{flush-all-ports}).
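For example, the C fragment shown earlier, which starts \ex{csh} with a
zeroth argument of \ex{"-"}, might be rendered in scsh roughly as follows.
This is a sketch only; the explicit flush is included because \verb|%exec|
will not do it.
\begin{code}
(flush-all-ports)                    ; \%exec does not flush for us.
(\%exec "/bin/csh" '("-" "-f") #t)    ; argv[0] is "-", not "/bin/csh".\end{code}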
All exec procedures, including \verb|%exec|, coerce the \cd{prog} and \cd{arg}
values to strings using the usual conversion rules: numbers are converted to
decimal numerals, and symbols converted to their print-names.
\ex{exec-path-search} searches the directories of \var{pathlist} looking for
an occurrence of file \ex{fname}. If no executable file is found, it returns
{\sharpf}. If \ex{fname} contains a slash character, the path search is
short-circuited, but the procedure still checks to ensure that the file exists
and is executable---if not, it still returns {\sharpf}.
Users of this procedure should be aware that it invites a potential race
condition: between checking the file with \ex{exec-path-search} and executing
it with \ex{\%exec}, the file's status might change.
The only atomic way to do the search is to loop over the candidate
file names, exec'ing each one and looping when the exec operation fails.
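A sketch of such a loop follows. The helper names \ex{try-exec} and
\ex{exec-first} are hypothetical, and the code assumes that
\ex{with-errno-handler*} calls its handler with an errno and a packet, and
that the handler must throw to a saved continuation to stop the exception
from propagating further. Note that \var{arglist} must include the desired
\cd{argv[0]}.
\begin{code}
;;; Try to exec FILE; return #f if the exec fails.
(define (try-exec file arglist)
  (call-with-current-continuation
    (\l{k}
      (with-errno-handler* (\l{errno packet} (k #f))
        (\l{} (\%exec file arglist #t))))))

;;; Exec the first DIR/PROG that works; return #f if none does.
(define (exec-first prog arglist dirs)
  (and (pair? dirs)
       (or (try-exec (string-append (car dirs) "/" prog) arglist)
           (exec-first prog arglist (cdr dirs)))))\end{code}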
See \cd{$path} and \ex{exec-path-list}, below.
\end{desc}
\defun {exit} {[status]} \noreturn
\defunx {\%exit} {[status]} \noreturn
\begin{desc}
These procedures terminate the current process with a given exit status.
The default exit status is 0.
The low-level \verb|%exit| procedure immediately terminates the process
without flushing buffered output.
\end{desc}
\begin{defundesc} {call-terminally} {thunk} \noreturn
\ex{call-terminally} calls its thunk. When the thunk returns, the process
exits. Although \ex{call-terminally} could be implemented as
\codex{(\l{thunk} (thunk) (exit 0))}
an implementation can take advantage of the fact that this procedure never
returns. For example, the runtime can start with a fresh stack and also
start with a fresh dynamic environment, where shadowed bindings are
discarded. This can allow the old stack and dynamic environment to be
collected (assuming this data is not reachable through some live
continuation).
\end{defundesc}
\begin{defundesc}{suspend}{} \undefined
Suspend the current process with a SIGSTOP signal.
\end{defundesc}
\defun {fork} {[thunk]} {proc or \sharpf}
\defunx {\%fork} {[thunk]} {proc or \sharpf}
\begin{desc}
\ex{fork} with no arguments is like C \ex{fork()}.
In the parent process, it returns the child's \emph{process object}
(see below for more information on process objects).
In the child process, it returns {\sharpf}.
\ex{fork} with an argument only returns in the parent process, returning
the child's process object.
The child process calls \var{thunk} and then exits.
\ex{fork} flushes buffered output before forking, and sets the child
process to non-interactive. \verb|%fork| does not perform this bookkeeping;
it simply forks.
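The two calling conventions can be sketched as follows; the messages
printed are illustrative only.
\begin{code}
;;; No argument: both processes return from fork.
(let ((child (fork)))
  (if child
      (wait child)                       ; Parent: child is a process object.
      (begin (display "in the child")    ; Child: fork returned #f.
             (newline)
             (exit 0))))

;;; With a thunk: only the parent returns.
(wait (fork (\l{} (display "in the child") (newline))))\end{code}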
\end{desc}
\defun {fork/pipe} {[thunk]} {proc or \sharpf}
\defunx{\%fork/pipe} {[thunk]} {proc or \sharpf}
\begin{desc}
Like \ex{fork} and \ex{\%fork}, but the parent and child communicate via a
pipe connecting the parent's stdin to the child's stdout. These procedures
side-effect the parent by changing his stdin.
In effect, \ex{fork/pipe} splices a process into the data stream
immediately upstream of the current process.
This is the basic function for creating pipelines.
Long pipelines are built by performing a sequence of \ex{fork/pipe} calls.
For example, to create a background two-process pipe \ex{a | b}, we write:
%
\begin{code}
(fork (\l{} (fork/pipe a) (b)))\end{code}
%
which returns the process object for \ex{b}'s process.
To create a background three-process pipe \ex{a | b | c}, we write:
%
\begin{code}
(fork (\l{} (fork/pipe a)
            (fork/pipe b)
            (c)))\end{code}
%
which returns the process object for \ex{c}'s process.
Note that these procedures affect file descriptors, not ports.
That is, the pipe is allocated connecting the child's file descriptor
1 to the parent's file descriptor 0.
\emph{Any previous Scheme port built over these affected file descriptors
is shifted to a new, unused file descriptor with \ex{dup} before
allocating the I/O pipe.}
This means, for example, that the ports bound to \ex{(current-input-port)}
and \ex{(current-output-port)} in either process are not affected---they
still refer to the same I/O sources and sinks as before.
Remember the simple scsh rule: Scheme ports are bound to I/O sources
and sinks, \emph{not} particular file descriptors.
If the child process wishes to rebind the current output port
to the pipe on file descriptor 1, it can do this using
\ex{with-current-output-port} or a related form.
Similarly, if the parent wishes to change the current input port
to the pipe on file descriptor 0, it can do this using
\ex{set-current-input-port!} or a related form.
Here is an example showing how to set up the I/O ports on both sides
of the pipe:
\begin{code}
(fork/pipe (\l{}
             (with-current-output-port (fdes->outport 1)
               (display "Hello, world.\\n"))))
(set-current-input-port! (fdes->inport 0))
(read-line)     ; Read the string output by the child.\end{code}
None of this is necessary when the I/O is performed by an exec'd
program in the child or parent process, only when the pipe will
be referenced by Scheme code through one of the default current I/O
ports.
\end{desc}
\defun {fork/pipe+} {conns [thunk]} {proc or \sharpf}
\defunx {\%fork/pipe+} {conns [thunk]} {proc or \sharpf}
\begin{desc}
Like \ex{fork/pipe}, but the pipe connections between the child and parent
are specified by the connection list \var{conns}.
See the
\codex{(|+ \var{conns} \vari{pf}{\!1} \ldots{} \vari{pf}{\!n})}
process form for a description of connection lists.
\end{desc}
\subsection{Process objects and process reaping}
\label{sec:proc-objects}
Scsh uses \emph{process objects} to represent Unix processes.
They are created by the \ex{fork} procedure, and have the following
exposed structure:
\begin{code}
(define-record proc
  pid)\end{code}
\index{proc}\index{proc?}\index{proc:pid}
The only exposed slot in a proc record is the process' pid,
the integer id assigned by Unix to the process.
The only exported primitive procedures for manipulating process objects
are \ex{proc?} and \ex{proc:pid}.
Process objects are created with the \ex{fork} procedure.
\begin{defundesc}{pid->proc}{pid [probe?]}{proc}
This procedure maps integer Unix process ids to scsh process objects.
It is intended for use in interactive and debugging code,
and is deprecated for use in production code.
If there is no process object in the system indexed by the given pid,
\ex{pid->proc}'s action is determined by the \var{probe?} parameter
(default \sharpf):
\begin{center}
\begin{tabular}{|l|l|}
\hline
\var{probe?} & Return \\ \hline\hline
\sharpf & \emph{signal error condition.} \\ \hline
\ex{'create} & Create new proc object. \\ \hline
True value & \sharpf \\ \hline
\end{tabular}
\end{center}
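To make the table concrete, suppose no process object is indexed by
pid 1234 (an arbitrary number chosen for illustration):
\begin{code}
(pid->proc 1234)            ; Signals an error.
(pid->proc 1234 #t)         ; Returns #f.
(pid->proc 1234 'create)    ; Creates a new proc object.\end{code}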
\end{defundesc}
Sometime after a child process terminates, scsh will perform a \ex{wait}
system call on the child in background, caching the process' exit status
in the child's proc object.
This is called ``reaping'' the process.
Once the child has been waited, the Unix kernel can free the storage allocated
for the dead process' exit information, so process reaping prevents the process
table from becoming cluttered with un-waited dead child processes
(a.k.a. ``zombies'').
This can be especially severe if the scsh process never waits on child
processes at all; if the process table overflows with forgotten zombies,
the OS may be unable to fork further processes.
Reaping a child process moves its exit status information from the kernel
into the scsh process, where it is cached inside the child's process object.
If the scsh user drops all pointers to the process object, it will simply be
garbage collected.
On the other hand, if the scsh program retains a pointer to the process object,
it can use scsh's \ex{wait} system call to synchronise with the child and
retrieve its exit status multiple times (this is not possible with simple
Unix integer pids in C---the programmer can only wait on a pid once).
Thus, process objects allow the scsh programmer to do two things not allowed
in other programming environments:
\begin{itemize}
\item Subprocesses that are never waited on are still removed from the
process table, and their associated exit status data is eventually
automatically garbage collected.
\item Subprocesses can be waited on multiple times.
\end{itemize}
However, note that once a child has exited, if the scsh programmer
drops all pointers to the child's proc object, the child's exit status
will be reaped and thrown away.
This is the intended behaviour, and it means that integer pids are not
enough to cause a process's exit status to be retained by the scsh runtime.
(This is because it is clearly impossible to GC data referenced by integers.)
As a convenience for interactive use and debugging, all procedures that
take process objects will also accept integer Unix pids as arguments,
coercing them to the corresponding process objects.
Since integer process ids are not reliable ways to keep a child's exit
status from being reaped and garbage collected, programmers are encouraged
to use process objects in production code.
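As a small sketch of waiting on a process object more than once: because
the exit status is cached in the process object, both calls to \ex{wait}
below should yield the same status. The exit code 3 is arbitrary.
\begin{code}
(let* ((child (fork (\l{} (exit 3))))
       (first (wait child))
       (again (wait child)))     ; Answered from the cached status.
  (list (status:exit-val first)
        (status:exit-val again)))\end{code}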
\begin{defundesc}{autoreap-policy}{[policy]}{old-policy}
The scsh programmer can choose different policies for automatic
process reaping.
The policy is determined by applying this procedure to one of the
values \ex{'early}, \ex{'late}, or {\sharpf} (\ie, no autoreap).
\begin{description}
\item [early]
The child is reaped from the {\Unix} kernel's process table
into scsh as soon as possible after it dies. In the current
release of scsh, this happens at the next call to
\ex{wait}---when scsh is asked to wait for a particular
child to exit, it will reap \emph{all} outstanding zombies.
When signal handlers are added to a future release of scsh,
early autoreaping will use the \ex{SIGCHLD} signal to reap
zombies with minimum delay.
\item [late]
The child is not autoreaped until it dies \emph{and} the scsh program
drops all pointers to its process object. That is, the process
table is cleaned out during garbage collection.
\item [\sharpf]
If autoreaping is turned off, process reaping is completely under
control of the programmer, who can force outstanding zombies to
be reaped by manually calling the \ex{reap-zombies} procedure
(see below).
\end{description}
Note that under any of the autoreap policies, a particular process $p$ can
be manually reaped into scsh by simply calling \ex{(wait $p$)}.
\emph{All} zombies can be manually reaped with \ex{reap-zombies}.
The \ex{autoreap-policy} procedure returns the policy's previous value.
Calling \ex{autoreap-policy} with no arguments returns the current
policy without changing it.
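For example, a program might switch autoreaping off around a region where
it manages its own children, then restore the earlier policy. This is only
a sketch; the comment in the body stands for the program's own code.
\begin{code}
(let ((old-policy (autoreap-policy #f)))   ; Returns the previous policy.
  ;; Fork and wait for children by hand here.
  (reap-zombies)                           ; Collect any stragglers.
  (autoreap-policy old-policy))\end{code}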
\end{defundesc}
\begin{defundesc}{reap-zombies}{}{\boolean}
This procedure reaps all outstanding exited child processes into scsh.
It returns true if there are no more child processes to wait on, and
false if there are outstanding processes still running or suspended.
\end{defundesc}
\subsubsection{Issues with process reaping}
Reaping a process does not reveal its process group at the time of
death; this information is lost when the process is reaped.
This means that a dead, reaped process is \emph{not eligible} as a return
value for a future \ex{wait-process-group} call.
This is not likely to be a problem for most code, as programs almost
never wait on exited processes by process group.
Process group waiting is usually applied to \emph{stopped} processes,
which are never reaped.
%%% Actually, this is *not* a problem if you stick with proc objects, instead
%%% of using pids, so I commented it out.
%
%\paragraph{Pid aliasing}
%Second, once a process has been reaped, its 16-bit process id becomes
%available to Unix for re-use.
%So it is conceivable that a long time in the future, a \ex{fork} operation
%could produce a subprocess with the identical pid, causing \ex{wait}
%operations on the old, dead, reaped child, and the new child to become
%confused.
%This kind of pid aliasing is intrinsic to the nature of Unix's single-use pid
%deallocation policy,
%but is very, very unlikely to happen in practice,
%given the 16-bit size of the pid space.
%Scsh will detect occurences of pid aliasing,
%in the unlikely event that one occurs.
%When \ex{fork} creates a proc object, it checks to see if the scsh heap
%contains an already existing proc object with the same pid as the newly forked
%process.
%If so, an exception is raised; if not handled by the program, this will stop
%the program, either killing the process or invoking an interactive debugger.
Automatic process reaping is a useful programming convenience.
However, if a program is careful to wait for all children, and does not wish
automatic reaping to happen, the programmer can simply turn process
autoreaping off.
Programs that do not wish to use automatic process reaping should be
aware that some scsh routines create subprocesses but do not return
the child's pid: \ex{run/port*}, and its related procedures and
special forms (\ex{run/strings}, \emph{et al.}).
Automatic process reaping will clean the child processes created by
these procedures out of the kernel's process table.
If a program doesn't use process reaping, it should either avoid these
forms, or use \ex{wait-any} to wait for the children to exit.
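One way to do the latter is a loop such as this sketch, which relies on
\ex{wait-any} returning {\sharpf} as its first value once no children
remain. The name \ex{wait-for-all-children} is hypothetical, and
\ex{receive} is assumed for binding the two return values.
\begin{code}
(define (wait-for-all-children)
  (receive (proc status) (wait-any)
    (if proc (wait-for-all-children))))\end{code}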
\subsection{Process waiting}
\defun {wait} {proc/pid [flags]} {status}
\begin{desc}
This procedure waits until a child process exits, and returns its
exit code. The \var{proc/pid} argument is either a process object
(section \ref{sec:proc-objects}) or an integer process id.
\ex{Wait} returns the child's exit status code (or suspension code,
if the \ex{wait/stopped-children} option is used, see below).
Status values can be queried with the procedures in section
\ref{sec:wait-codes}.
The \var{flags} argument is an integer whose bits specify
additional options. It is composed by or'ing together the following
flags:
\begin{center}
\begin{tabular}{|l|l|}
\hline
Flag & Meaning \\ \hline \hline
\ex{wait/poll} & Return {\sharpf} immediately if
child still active. \\ \hline
\ex{wait/stopped-children} & Wait for suspend as well as exit. \\ \hline
\end{tabular}
\end{center}
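For instance, assuming \ex{child} is bound to a process object returned by
\ex{fork}, the flags might be used like this:
\begin{code}
(wait child)                     ; Block until the child exits.
(wait child wait/poll)           ; #f if the child is still active.
(wait child (bitwise-ior wait/poll
                         wait/stopped-children))\end{code}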
\end{desc}
\begin{defundesc} {wait-any} {[flags]} {[proc status]}
The optional \var{flags} argument is as for \ex{wait}.
This procedure waits for any child process to exit (or stop, if the
\ex{wait/stopped-children} flag is used).
It returns the process' process object and status code.
If there are no children left for which to wait, the two values
\ex{[{\sharpf} {\sharpt}]} are returned.
If the \ex{wait/poll} flag is used, and none of the children
are immediately eligible for waiting,
then the values \ex{[{\sharpf} {\sharpf}]} are returned:
\begin{center}
\begin{tabular}{|l|l|}
\hline
[{\sharpf} {\sharpf}] & Poll, none ready \\ \hline
[{\sharpf} {\sharpt}] & No children \\ \hline
\end{tabular}
\end{center}
\ex{Wait-any} will not return a process that has been previously waited
by any other process-wait procedure (\ex{wait}, \ex{wait-any},
and \ex{wait-process-group}).
It will return reaped processes that haven't yet been waited.
The use of \ex{wait-any} is deprecated.
\end{defundesc}
\begin{defundesc} {wait-process-group} {proc/pid [flags]} {[proc status]}
This procedure waits for any child whose process group is \var{proc/pid}
(either a process object or a pid).
The \var{flags} argument is as for \ex{wait}.
Note that if the programmer wishes to wait for exited processes
by process group, the program should take care not to use process
reaping (section \ref{sec:proc-objects}), as this loses
process group information. However, most process-group waiting is
for stopped processes (to implement job control), so this is rarely
an issue, as stopped processes are not subject to reaping.
\end{defundesc}
\subsection{Analysing process status codes}
\label{sec:wait-codes}
When a child process dies (or is suspended), its parent can call the \ex{wait}
procedure to recover the exit (or suspension) status of the child.
The exit status is a small integer that encodes information
describing how the child terminated.
The bit-level format of the exit status is not defined by {\Posix};
you must use the following three functions to decode one.
However, if a child terminates normally with exit code 0,
{\Posix} does require \ex{wait} to return an exit status that is exactly
zero.
So \ex{(zero? \var{status})} is a correct way to test for non-error,
normal termination, \eg,
\begin{code}
(if (zero? (run (rcp scsh.tar.gz lambda.csd.hku.hk:)))
(delete-file "scsh.tar.gz"))\end{code}
\defun {status:exit-val}{status}{{\integer} or \sharpf}
\defunx{status:stop-sig}{status}{{\integer} or \sharpf}
\defunx{status:term-sig}{status}{{\integer} or \sharpf}
\begin{desc}
For a given status value produced by calling \ex{wait},
exactly one of these routines will return a true value.
If the child process exited normally, \ex{status:exit-val} returns the
exit code for the child process (\ie, the value the child passed to \ex{exit}
or returned from \ex{main}). Otherwise, this function returns false.
If the child process was suspended by a signal, \ex{status:stop-sig}
returns the signal that suspended the child.
Otherwise, this function returns false.
If the child process terminated abnormally, \ex{status:term-sig}
returns the signal that terminated the child.
Otherwise, this function returns false.
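These accessors combine naturally in a \ex{cond}. In the sketch below, the
procedure name \ex{describe-status} and the symbols it returns are
hypothetical. Note that an exit code of 0 is a true value in {\Scheme}, so
the first clause also fires for successful termination.
\begin{code}
(define (describe-status status)
  (cond ((status:exit-val status) => (\l{code} (list 'exited code)))
        ((status:term-sig status) => (\l{sig} (list 'killed sig)))
        ((status:stop-sig status) => (\l{sig} (list 'stopped sig)))))\end{code}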
\end{desc}
%% Dereleased until we have a more portable implementation.
%\defun{halts?}{proc}\boolean
%\begin{desc}
%This procedure, ported from early T implementations,
%returns true iff \ex{(\var{proc})} returns at all.
%\remark{The current implementation is a constant function returning {\sharpt},
% which suffices for all {\Unix} implementations of which we are aware.}
%\end{desc}
\section{Process state}
\defun {umask}{} \fixnum
\defunx {set-umask} {perms} \undefined
\defunx {with-umask*} {perms thunk} {values of thunk}
\dfnx {with-umask} {perms . body} {values of body} {syntax}
\begin{desc}
The process' current umask is retrieved with \ex{umask}, and set with
\ex{(set-umask \var{perms})}. Calling \ex{with-umask*} changes the umask
to \var{perms} for the duration of the call to \var{thunk}. If the
program throws out of \var{thunk} by invoking a continuation, the umask is
reset to its external value. If the program throws back into \var{thunk}
by calling a stored continuation, the umask is restored to the \var{perms}
value. The special form \ex{with-umask} is equivalent in effect to
the procedure \ex{with-umask*}, but does not require the programmer
to explicitly wrap a \ex{(\l{} \ldots)} around the body of the code
to be executed.
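For example, either of the following forms creates a directory that is
private regardless of the umask inherited from the environment. The
directory name is illustrative.
\begin{code}
;;; Special form:
(with-umask #o077
  (create-directory "private-work"))

;;; Or, equivalently, the procedure:
(with-umask* #o077
  (\l{} (create-directory "private-work")))\end{code}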
\end{desc}
\defun {chdir} {[fname]} \undefined
\defunx {cwd}{} \str
\defunx {with-cwd*} {fname thunk} {value(s) of thunk}
\dfnx {with-cwd} {fname . body} {value(s) of body} {syntax}
\begin{desc}
These forms manipulate the current working directory.
The cwd can be changed with \ex{chdir}
(although in most cases, \ex{with-cwd} is preferable).
If \ex{chdir} is called with no arguments, it changes the cwd to
the user's home directory.
The \ex{with-cwd*} procedure calls \ex{thunk} with the cwd temporarily
set to \var{fname}; when \var{thunk} returns, or is exited in a non-local
fashion (\eg, by raising an exception or by invoking a continuation),
the cwd is returned to its original value.
The special form \ex{with-cwd} is simply syntactic sugar for \ex{with-cwd*}.
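A brief sketch of the temporary change of directory; the directory and the
command run are illustrative only.
\begin{code}
(with-cwd "/usr/tmp"
  (run (ls -l)))     ; Runs with the cwd temporarily set to /usr/tmp.
(cwd)                ; Back in the original directory.\end{code}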
\end{desc}
\defun {pid}{} \fixnum
\defunx {parent-pid}{} \fixnum
\defunx {process-group} {} \fixnum
\defunx {set-process-group} {[proc] pgrp} \undefined % [not implemented]
\begin{desc}
\ex{(pid)} and \ex{(parent-pid)} retrieve the process id for the
current process and its parent.
\ex{(process-group)} returns the process group of the current process.
A process' process-group can be set with \ex{set-process-group};
the value \var{proc} specifies the affected process. It may be either
a process object or an integer process id, and defaults to the current
process.
\end{desc}
\defun {set-priority} {which who priority} \undefined %; priority stuff unimplemented
\defunx {priority} {which who} \fixnum % ; not implemented
\defunx {nice} {[pid delta]} \undefined %; not implemented
\begin{desc}
These procedures set and access the priority of processes.
I can't remember how \ex{set-priority} and \ex{priority} work, so no
documentation, and besides, they aren't implemented yet, anyway.
\end{desc}
\defunx {user-login-name}{} \str
\defunx {user-uid}{} \fixnum
\defunx {user-effective-uid}{} \fixnum
\defunx {user-gid}{} \fixnum
\defunx {user-effective-gid}{} \fixnum
\defunx {user-supplementary-gids}{} {{\fixnum} list}
\defunx {set-uid} {uid} \undefined
\defunx {set-gid} {gid} \undefined
\begin{desc}
These routines get and set the effective and real user and group ids.
The \ex{set-uid} and \ex{set-gid} routines correspond to the {\Posix}
\ex{setuid()} and \ex{setgid()} procedures.
\end{desc}
\defun {process-times} {} {[{\fixnum} {\fixnum} {\fixnum} \fixnum]}
\begin{desc}
Returns four values:
\begin{tightinset}
\begin{flushleft}
user CPU time in clock-ticks \\
system CPU time in clock-ticks \\
user CPU time of all descendant processes \\
system CPU time of all descendant processes
\end{flushleft}
\end{tightinset}
Note that CPU time clock resolution is not the same as
the real-time clock resolution provided by \ex{time+ticks}.
That's Unix.
\end{desc}
\defun{cpu-ticks/sec}{} {integer}
\begin{desc}
Returns the resolution of the CPU timer in clock ticks per second.
This can be used to convert the times reported by \ex{process-times}
to seconds.
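For example, the CPU time consumed so far by the current process alone
could be computed in seconds along these lines (a sketch, assuming the
\ex{receive} form for binding the four return values):
\begin{code}
(receive (user system child-user child-system) (process-times)
  (/ (+ user system) (cpu-ticks/sec)))\end{code}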
\end{desc}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{User and group database access}
These procedures are used to access the user and group databases
(\eg, the ones traditionally stored in \ex{/etc/passwd} and \ex{/etc/group}).
\defun {user-info} {uid/name} {record}
\begin{desc}
Return a \ex{user-info} record giving the recorded information for a
particular user:
\index{user-info}
\index{user-info:name}
\index{user-info:uid}
\index{user-info:gid}
\index{user-info:home-dir}
\index{user-info:shell}
\begin{code}
(define-record user-info
  name uid gid home-dir shell)\end{code}
The \var{uid/name} argument is either an integer uid or a string user-name.
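For instance, this sketch looks up the record for the user running the
current process and extracts a few of its fields:
\begin{code}
(let ((me (user-info (user-uid))))
  (list (user-info:name me)
        (user-info:home-dir me)
        (user-info:shell me)))\end{code}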
\end{desc}
\defun {->uid} {uid/name} \fixnum
\defunx {->username} {uid/name} \str
\begin{desc}
These two procedures coerce integer uid's and user names to a particular
form.
\end{desc}
\defun {group-info} {gid/name} {record}
\begin{desc}
Return a \ex{group-info} record giving the recorded information for a
particular group:
\index{group-info}
\index{group-info:name}
\index{group-info:gid}
\index{group-info:members}
\begin{code}
(define-record group-info
  name gid members)\end{code}
The \var{gid/name} argument is either an integer gid or a string group-name.
\end{desc}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Accessing command-line arguments}
\defvar {command-line-arguments}{{\str} list}
\defunx {command-line}{} {{\str} list}
\begin{desc}
The list of strings \ex{command-line-arguments} contains the arguments
passed to the scsh process on the command line.
Calling \ex{(command-line)} returns the complete \ex{argv}
string list, including the program. So if we run a scsh program
\codex{/usr/shivers/bin/myls -CF src}
then \ex{command-line-arguments} is
\codex{("-CF" "src")}
and \ex{(command-line)} returns
\codex{("/usr/shivers/bin/myls" "-CF" "src")}
\ex{command-line} returns a fresh list each time it is called.
In this way, the programmer can get a fresh copy of the original
argument list if \ex{command-line-arguments} has been modified or is lexically
shadowed.
\end{desc}
\defun {arg} {arglist n [default]} \str
\defunx {arg*} {arglist n [default-thunk]} \str
\defunx {argv} {n [default]} \str
\begin{desc}
These procedures are useful for accessing arguments from argument
lists.
\ex{arg} returns the $n^{\rm{th}}$ element of \var{arglist}.
The index is 1-based.
If \var{n} is too large, \var{default} is returned;
if no \var{default}, then an error is signaled.
\ex{arg*} is similar, except that the \var{default-thunk} is called to generate
the default value.
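To illustrate the 1-based indexing and the defaulting behaviour on a
literal argument list (the default string \ex{"none"} is arbitrary):
\begin{code}
(arg '("-CF" "src") 1)                  ; "-CF"
(arg '("-CF" "src") 2)                  ; "src"
(arg '("-CF" "src") 3 "none")           ; "none", since n is too large.
(arg* '("-CF" "src") 3 (\l{} "none"))    ; Default computed by the thunk.\end{code}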
\ex{(argv \var{n})} is simply \ex{(arg (command-line) (+ \var{n} 1))}.
The +1 offset ensures that the two forms
%
\begin{code}
(arg command-line-arguments \var{n})
(argv \var{n})\end{code}
%
return the same argument
(assuming the user has not rebound or modified \ex{command-line-arguments}).
Example:
%
\begin{code}
(if (null? command-line-arguments)
    (& (xterm -n ,host -title ,host
              -name ,(string-append "xterm_" host)))
    (let* ((progname (file-name-nondirectory (argv 1)))
           (title (string-append host ":" progname)))
      (& (xterm -n     ,title
                -title ,title
                -e     ,@command-line-arguments))))\end{code}
%
A subtlety: when the scsh interpreter is used to execute a scsh program,
the program name reported in the head of the \ex{(command-line)} list
is the scsh program, {\em not} the interpreter.
For example, if we have a shell script in file \ex{fullecho}:
\begin{code}
#!/usr/local/bin/scsh -s
!#
(for-each (\l{arg} (display arg) (display " "))
          (command-line))\end{code}
and we run the program
\codex{fullecho hello world}
the program will print out
\codex{fullecho hello world}
not
\codex{/usr/local/bin/scsh -s fullecho hello world}
This argument line processing ensures that if a scsh program is subsequently
compiled into a standalone executable or byte-compiled to a heap-image
executable by the {\scm} virtual machine, its semantics will be
unchanged---the arglist processing is invariant. In effect, the
\codex{/usr/local/bin/scsh -s}
is not part of the program;
it's a specification for the machine to execute the program on, so it is
not properly part of the program's argument list.
\end{desc}
\section{System parameters}
%\defun {maximum-fds}{}\fixnum
%\defunx {page-size}{} \fixnum
\defun {system-name}{} \str
\begin{desc}
Returns the name of the host on which we are executing.
This may be a local name, such as ``solar,'' as opposed to a
fully-qualified domain name such as ``solar.csie.ntu.edu.tw.''
\end{desc}
\section{Signal system}
Signal numbers are bound to the variables \ex{signal/hup}, \ex{signal/int},
\ldots
\defun {signal-process} {proc sig} \undefined
\defunx {signal-process-group} {prgrp sig} \undefined
\begin{desc}
These two procedures send signals to a specific process, and all the processes
in a specific process group, respectively.
The \var{proc} and \var{prgrp} arguments are either processes
or integer process ids.
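
A minimal sketch of sending a signal to a subprocess.
It assumes, as elsewhere in scsh, that the \ex{\&} form returns the
child's process object; the \ex{sleep} program and its 60-second argument
are only placeholders for a long-running job.
\begin{code}
(let ((child (& (sleep 60))))          ; Start a long-running subprocess,
  (signal-process child signal/int))   ; then interrupt it.\end{code}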
\end{desc}
I haven't done signal handlers yet. Should be straightforward: a mechanism
to assign procedures to signals.
\defun{itimer}{???} \undefined
\defunx{pause-until-interrupt}{} \undefined
\defun{sleep}{secs} \undefined
\begin{desc}
Sleeping is defined, but we don't offer a way to sleep for a more precise
interval (\eg, a microsecond timer), as this is not in {\Posix}.
\end{desc}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Time}
\label{sec:time}
This time package does not currently work with NeXTSTEP, as NeXTSTEP
does not provide a {\Posix}-compliant time library that will even link.
Scsh's time system is fairly sophisticated, particularly with respect
to its careful treatment of time zones.
However, casual users shouldn't be intimidated;
all of the complexity is optional,
and defaulting all the optional arguments reduces the system
to a simple interface.
\subsection{Terminology}
``UTC'' and ``UCT'' stand for ``universal coordinated time,'' which is the
official name for what is colloquially referred to as ``Greenwich Mean
Time.''
{\Posix} allows a single time zone to specify \emph{two} different offsets
from UTC: one standard one, and one for ``summer time.''
Summer time is frequently some sort of daylight savings time.
The scsh time package consistently uses this terminology: we never say
``gmt'' or ``dst;'' we always say ``utc'' and ``summer time.''
\subsection{Basic data types}
We have two types: \emph{time} and \emph{date}.
\index{time}
A \emph{time} specifies an instant in the history of the universe.
It is location and time-zone independent.\footnote{Physics pedants please note:
The scsh authors live in a Newtonian universe. We disclaim responsibility
for calculations performed in non-ANSI standard light-cones.}
A time is a real value
giving the number of elapsed seconds since the Unix ``epoch''
(Midnight, January 1, 1970 UTC).
Time values provide arbitrary time resolution,
limited only by the number system of the underlying Scheme system.
\index{date}
A \emph{date} is a name for an instant in time that is specified
relative to some location/time-zone in the world, \eg:
\begin{tightinset}
Monday October 31, 1994 3:47:21 pm EST.
\end{tightinset}
Dates provide one-second resolution,
and are expressed with the following record type:
%
\begin{code}\index{date}
(define-record date     ; A Posix tm struct
  seconds       ; Seconds after the minute [0-59]
  minute        ; Minutes after the hour [0-59]
  hour          ; Hours since midnight [0-23]
  month-day     ; Day of the month [1-31]
  month         ; Months since January [0-11]
  year          ; Years since 1900
  tz-name       ; Time-zone name: #f or a string.
  tz-secs       ; Time-zone offset: #f or an integer.
  summer?       ; Summer (Daylight Savings) time in effect?
  week-day      ; Days since Sunday [0-6]
  year-day)     ; Days since Jan. 1 [0-365]\end{code}
%
If the \ex{tz-secs} field is given, it specifies the time-zone's offset from
UTC in seconds. If it is specified, the \ex{tz-name} and \ex{summer?}
fields are ignored when using the date structure to determine a specific
instant in time.
If the \ex{tz-name} field is given, it is a time-zone string such as
\ex{"EST"} or \ex{"HKT"} understood by the OS.
Since {\Posix} time-zone strings can specify dual standard/summer time-zones
(\eg, \ex{"EST5EDT"} specifies U.S. Eastern Standard/Eastern Daylight Time),
the value of the \ex{summer?} field is used to resolve the ambiguous
boundary cases. For example, on the morning of the Fall daylight savings
change-over, 1:00am--2:00am happens twice. Hence the date 1:30 am
on this morning can specify two different seconds;
the \ex{summer?} flag says which one.
A date with $\ex{tz-name} = \ex{tz-secs} = \ex{\#f}$ is a date that
is specified in terms of the system's current time zone.
There is redundancy in the \ex{date} data structure.
For example, the \ex{year-day} field is redundant
with the \ex{month-day} and \ex{month} fields.
Either of these, together with the \ex{year} field, implies the value of
the \ex{week-day} field.
The \ex{summer?} and \ex{tz-name} fields are redundant with the \ex{tz-secs}
field in terms of specifying an instant in time.
This redundancy is provided because consumers of dates may want it broken out
in different ways.
The scsh procedures that produce date records fill them out completely.
However, when date records produced by the programmer are passed to
scsh procedures, the redundancy is resolved by ignoring some of the
secondary fields.
This is described for each procedure below.
\defun{make-date} {s min h mday mon y [tzn tzs summ? wday yday]} {date}
\begin{desc}
When making a \ex{date} record, the last five elements of the record
are optional, and default to \ex{\#f}, \ex{\#f}, \ex{\#f}, 0,
and 0 respectively.
This is useful when creating a \ex{date} record to pass as an
argument to \ex{time}.
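
For instance, the example date given earlier (3:47:21 pm on October 31, 1994)
could be built as follows; this is a sketch that relies only on the field
conventions of the record definition above (months count from 0, years
from 1900), with the time zone left to default to the local zone.
\begin{code}
;; Arguments: seconds minute hour month-day month year
(define halloween (make-date 21 47 15 31 9 94))
(time halloween)        ; The corresponding time value\end{code}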
\end{desc}
\subsection{Time zones}
Several time procedures take time zones as arguments. When optional,
the time zone defaults to the local time zone. Otherwise the time zone
can be one of:
\begin{inset}
\begin{tabular}{lp{0.7\linewidth}}
\ex{\#f} & Local time \\
Integer & Seconds of offset from UTC. For example,
New York City is -18000 (-5 hours), San Francisco
is -28800 (-8 hours). \\
String & A {\Posix} time zone string understood by the OS
(\ie, the sort of time zone assigned to the \ex{\$TZ}
environment variable).
\end{tabular}
\end{inset}
An integer time zone gives the number of seconds you must add to UTC
to get time in that zone. It is \emph{not} ``seconds west'' of UTC---that
flips the sign.
To get UTC time, use a time zone of either 0 or \ex{"UCT0"}.
\subsection{Procedures}
\defun {time+ticks} {} {[secs ticks]}
\defunx{ticks/sec} {} \real
\begin{desc}
The current time, with sub-second resolution.
Sub-second resolution is not provided by {\Posix},
but is available on many systems.
The time is returned as elapsed seconds since the Unix epoch, plus
a number of sub-second ``ticks.''
The length of a tick may vary from implementation to implementation;
it can be determined from \ex{(ticks/sec)}.
The system clock is not required to report time at the full resolution
given by \ex{(ticks/sec)}. For example, on BSD, time is reported at
$1\mu$s resolution, so \ex{(ticks/sec)} is 1,000,000. That doesn't mean
the system clock has micro-second resolution.
If the OS does not support sub-second resolution, the \var{ticks} value
is always 0, and \ex{(ticks/sec)} returns 1.
\begin{remarkenv}
I chose to represent system clock resolution as ticks/sec
instead of sec/tick to increase the odds that the value could
be represented as an exact integer, increasing efficiency and
making it easier for Scheme implementations that don't have
sophisticated numeric support to deal with the quantity.
You can combine a seconds count and a tick count into a single value
in seconds with the expression
\codex{(+ \var{secs} (/ \var{ticks} (ticks/sec)))}
Given that, why not have the fine-grain time procedure just
return a non-integer real for time? Following Common Lisp, I chose to
allow the system clock to report sub-second time in its own units to
lower the overhead of determining the time. This would be important
for a system that wanted to precisely time the duration of some
event. Time stamps could be collected with little overhead, deferring
the overhead of precisely calculating with them until after collection.
This is all a bit academic for the {\scm} implementation, where
we determine time with a heavyweight system call, but it's nice
to plan for the future.
\end{remarkenv}
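
As a sketch of the timing use mentioned in the remark, the following
hypothetical helper (\ex{elapsed-seconds} is not part of scsh) collects
two raw time stamps and defers the arithmetic until afterwards.
It assumes that the two return values of \ex{time+ticks} can be received
with the standard \ex{call-with-values} procedure.
\begin{code}
;; Hypothetical helper: wall-clock seconds taken by calling THUNK.
(define (elapsed-seconds thunk)
  (call-with-values time+ticks
    (lambda (s0 t0)                     ; Time stamp before the event
      (thunk)
      (call-with-values time+ticks
        (lambda (s1 t1)                 ; Time stamp after the event
          (+ (- s1 s0)
             (/ (- t1 t0) (ticks/sec))))))))\end{code}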
\end{desc}
\defun {date} {} {date-record}
\defunx{date} {[time tz]} {date-record}
\begin{desc}
Simple \ex{(date)} returns the current date, in the local time zone.
With the optional arguments, \ex{date} converts the time to the date as
specified by the time zone \var{tz}.
\var{Time} defaults to the current time; \var{tz} defaults to local time,
and is as described in the time-zone section.
If the \var{tz} argument is an integer, the date's \ex{tz-name}
field is a {\Posix} time zone of the form
``\ex{UTC+\emph{hh}:\emph{mm}:\emph{ss}}'';
the trailing \ex{:\emph{mm}:\emph{ss}} portion is deleted if it is zeroes.
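
The three kinds of time-zone argument can be sketched as follows;
the particular offsets and zone strings are taken from the time-zone
section and are only examples.
\begin{code}
(date)                     ; Current date, local time zone
(date (time) #f)           ; The same, with the defaults spelled out
(date (time) 0)            ; Current date in UTC
(date (time) -18000)       ; Five hours behind UTC
(date (time) "EST5EDT")    ; A Posix time-zone string
(date 0 "UCT0")            ; The Unix epoch, as a UTC date\end{code}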
\end{desc}
\defun {time} {} \integer
\defunx{time} {[date]} \integer
\begin{desc}
Simple \ex{(time)} returns the current time.
With the optional date argument, \ex{time} converts a date to a time.
\var{Date} defaults to the current date.
Note that the input \var{date} record is overconstrained.
\ex{time} ignores \var{date}'s \ex{week-day} and \ex{year-day} fields.
If the date's \ex{tz-secs} field is set, the \ex{tz-name} and
\ex{summer?} fields are ignored.
If the \ex{tz-secs} field is \ex{\#f}, then the time-zone is taken
from the \ex{tz-name} field. A false \ex{tz-name} means the system's
current time zone. When calculating with time-zones, the date's
\ex{summer?} field is used to resolve ambiguities:
\begin{tightinset}
\begin{tabular}{ll}
\ex{\#f} & Resolve an ambiguous time in favor of non-summer time. \\
true & Resolve an ambiguous time in favor of summer time.
\end{tabular}
\end{tightinset}
This is useful in boundary cases during the change-over. For example,
in the Fall, when US daylight savings time changes over at 2:00 am,
1:30 am happens twice---it names two instants in time, an hour apart.
Outside of these boundary cases, the \ex{summer?} flag is ignored. For
example, if the standard/summer change-overs happen in the Fall and the
Spring, then the value of \ex{summer?} is ignored for a January or
July date. A January date would be resolved with standard time, and a
July date with summer time, regardless of the \ex{summer?} value.
The \ex{summer?} flag is also ignored if the time zone doesn't have
a summer time---for example, simple UTC.
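
The boundary case can be sketched as follows.
The example assumes that the OS's rules for \ex{"EST5EDT"} place the
1994 Fall change-over on the morning of October 30;
\ex{ambiguous-date} is a hypothetical helper, not part of scsh.
\begin{code}
;; 1:30 am on the morning of the Fall change-over names two instants.
(define (ambiguous-date summer?)
  (make-date 0 30 1 30 9 94 "EST5EDT" #f summer?))

(time (ambiguous-date #t))    ; 1:30 am, summer time
(time (ambiguous-date #f))    ; 1:30 am, standard time
;; Under the assumption above, the two results differ by one hour.\end{code}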
\end{desc}
\defun {date->string} {date} \str
\defunx{format-date} {fmt date} \str
\begin{desc}
\ex{Date->string} formats the date as a 24-character string of the
form:
\begin{tightinset}
Sun Sep 16 01:03:52 1973
\end{tightinset}
\ex{Format-date} formats the date according to the format string
\var{fmt}. The format string is copied verbatim, except that tilde
characters indicate conversion specifiers that are replaced by fields from
the date record. Figure \ref{fig:dateconv} gives the full set of
conversion specifiers supported by \ex{format-date}.
\begin{boxedfigure}{tbp}
\renewcommand{\arraystretch}{1.25}
\begin{tabular}{l>{\raggedrightparbox}p{0.9\linewidth}}
\verb|~~| & Converted to the \verb|~| character. \\
\verb|~a| & abbreviated weekday name \\
\verb|~A| & full weekday name \\
\verb|~b| & abbreviated month name \\
\verb|~B| & full month name \\
\verb|~c| & time and date using the time and date representation
for the locale (\verb|~X ~x|) \\
\verb|~d| & day of the month as a decimal number (01-31) \\
\verb|~H| & hour based on a 24-hour clock
as a decimal number (00-23) \\
\verb|~I| & hour based on a 12-hour clock
as a decimal number (01-12) \\
\verb|~j| & day of the year as a decimal number (001-366) \\
\verb|~m| & month as a decimal number (01-12) \\
\verb|~M| & minute as a decimal number (00-59) \\
\verb|~p| & AM/PM designation associated with a 12-hour clock \\
\verb|~S| & second as a decimal number (00-61) \\
\verb|~U| & week number of the year;
Sunday is first day of week (00-53) \\
\verb|~w| & weekday as a decimal number (0-6), where Sunday is 0 \\
\verb|~W| & week number of the year;
Monday is first day of week (00-53) \\
\verb|~x| & date using the date representation for the locale \\
\verb|~X| & time using the time representation for the locale \\
\verb|~y| & year without century (00-99) \\
\verb|~Y| & year with century (\eg, 1990) \\
\verb|~Z| & time zone name or abbreviation, or no characters
if no time zone is determinable
\end{tabular}
\caption{\texttt{format-date} conversion specifiers}
\label{fig:dateconv}
\end{boxedfigure}
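
A few illustrative calls; the format strings are only examples built from
the conversion specifiers in figure~\ref{fig:dateconv}.
\begin{code}
(date->string (date))                      ; Fixed 24-character form
(format-date "~Y-~m-~d ~H:~M:~S" (date))   ; Numeric date and time
(format-date "~A, ~B ~d" (date))           ; Weekday, month name, day
(format-date "100~~ done at ~H:~M" (date)) ; ~~ yields a literal tilde\end{code}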
\end{desc}
%\defun{utc-offset} {[time tz]} \integer
%\begin{desc}
% Returns the offset from UTC of time zone \var{tz} at instant \var{time}.
% \var{time} defaults to the current time; \var{tz} defaults to local time,
% and is as described in the time-zone section.
%
% The offset is the number of seconds you add to UTC time to get
% local time.
%
% Note: Be aware that other time interfaces (\eg, the BSD C interface)
% give offsets as seconds \emph{west} of UTC, which flips the sign. The scsh
% definition is chosen for arithmetic simplicity. It's easy to remember
% the definition of the offset: what you add to UTC to get local.
%\end{desc}
%
%\defun{time-zone} {[summer? tz]} \str
%\begin{desc}
% Returns the name of the time zone as a string. \var{Summer?} is
% used to choose between the summer name and the standard name
% (\eg, ``EST'' and ``EDT'')\@. \var{Summer?} is interpreted as follows:
% \begin{inset}
% \begin{tabular}{lp{0.7\linewidth}}
% Integer & A time value.
% The variant in use at that time is returned. \\
% \ex{\#f} & The standard time name is returned. \\
% \emph{Otherwise} & The summer time name is returned.
% \end{tabular}
% \end{inset}
% \ex{Summer?} defaults to the case that pertains at the time of the call.
% It is ignored if the time zone doesn't have a summer variant.
%\end{desc}
\dfni {fill-in-date!}{date}{date}{procedure}
{fill-in-date"!@\texttt{fill-in-date"!}}
\begin{desc}
This procedure fills in missing, redundant slots in a date record.
In decreasing order of priority:
\begin{itemize}
\itum{year, month, month-day $\Rightarrow$ year-day}
If the \ex{year}, \ex{month}, and \ex{month-day} fields are all
defined (are all integers), the \ex{year-day}
field is set to the corresponding value.
\itum{year, year-day $\Rightarrow$ month, month-day}
If the \ex{month} and \ex{month-day} fields aren't set, but
the \ex{year} and \ex{year-day} fields are set, then
\ex{month} and \ex{month-day} are calculated.
\itum{year, month, month-day, year-day $\Rightarrow$ week-day}
If either of the above rules is able to determine what day it is,
the \ex{week-day} field is then set.
\itum{tz-secs $\Rightarrow$ tz-name}
If \ex{tz-secs} is defined, but \ex{tz-name} is not, it is assigned
a time-zone name of the form ``\ex{UTC+\emph{hh}:\emph{mm}:\emph{ss}}'';
the trailing \ex{:\emph{mm}:\emph{ss}} portion is deleted if it
is zeroes.
\itum{tz-name, date, summer? $\Rightarrow$ tz-secs, summer?}
If the date information is provided up to second resolution,
\ex{tz-name} is also provided, and \ex{tz-secs} is not set,
then \ex{tz-secs} and \ex{summer?} are set to their correct values.
Summer-time ambiguities are resolved using the original value of
\ex{summer?}. If the time zone doesn't have a
summer time variant, then \ex{summer?} is set to \ex{\#f}.
\itum{local time, date, summer? $\Rightarrow$ tz-name, tz-secs, summer?}
If the date information is provided up to second resolution,
but no time zone information is provided (both \ex{tz-name} and
\ex{tz-secs} aren't set), then we proceed as in the above case,
except the system's current time zone is used.
\end{itemize}
These rules allow one particular ambiguity to escape:
if both \ex{tz-name} and \ex{tz-secs} are set, they are not brought
into agreement. It isn't clear how to do this, nor is it clear which
one should take precedence.
\oops{\ex{fill-in-date!} isn't implemented yet.}
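
Until the procedure exists, the first rule (year, month, month-day
$\Rightarrow$ year-day) can be sketched in portable {\Scheme}.
The helpers \ex{leap-year?} and \ex{year-day} below are hypothetical and
are not part of scsh; they follow the field conventions of the \ex{date}
record (years since 1900, months 0--11, result 0--365).
\begin{code}
;; Hypothetical helpers, not part of scsh.
(define (leap-year? year)               ; YEAR is years since 1900
  (let ((y (+ year 1900)))
    (and (= 0 (remainder y 4))
         (or (not (= 0 (remainder y 100)))
             (= 0 (remainder y 400))))))

(define (year-day year month month-day)
  ;; Days preceding each month in a non-leap year.
  (let ((days-before '#(0 31 59 90 120 151 181 212 243 273 304 334)))
    (+ (vector-ref days-before month)
       (if (and (> month 1) (leap-year? year)) 1 0)
       (- month-day 1))))

(year-day 94 9 31)      ; October 31, 1994 is day 303\end{code}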
\end{desc}
\section{Environment variables}
\defun {setenv} {var val} \undefined
\defunx {getenv} {var} \str
\begin{desc}
These functions get and set the process environment, stored in the
external C variable \ex{char **environ}.
An environment variable \var{var} is a string.
If an environment variable is set to a string \var{val},
then the process' global environment structure is altered with an entry
of the form \ex{"\var{var}=\var{val}"}.
If \var{val} is {\sharpf}, then any entry for \var{var} is deleted.
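
A short sketch; the variable name and value are arbitrary.
\begin{code}
(setenv "EDITOR" "emacs")     ; Adds or replaces the entry EDITOR=emacs
(getenv "EDITOR")             ; "emacs"
(setenv "EDITOR" #f)          ; Deletes the entry for EDITOR\end{code}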
\end{desc}
\defun {env->alist}{} {{\str$\rightarrow$\str} alist}
\begin{desc}
The \ex{env->alist} procedure converts the entire environment into
an alist, \eg,
\begin{code}
(("TERM" . "vt100")
 ("SHELL" . "/bin/csh")
 ("EDITOR" . "emacs")
 \ldots)\end{code}
\end{desc}
\defun {alist->env} {alist} \undefined
\begin{desc}
\var{Alist} must be an alist whose keys are all strings, and whose values
are all either strings or string lists. String lists are converted to
colon lists (see below). The alist is installed as the current {\Unix}
environment (\ie, converted to a null-terminated C vector of
\ex{"\var{var}=\var{val}"} strings which is assigned to the global
\ex{char **environ}).
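
For example, the following sketch installs a two-variable environment;
the variable values are arbitrary.
The \ex{PATH} value is given as a string list, which is converted to a
colon list as described above.
\begin{code}
(alist->env '(("TERM" . "vt100")
              ("PATH" . ("/bin" "/usr/bin"))))
(getenv "PATH")         ; "/bin:/usr/bin"\end{code}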
\end{desc}
The following three functions help the programmer manipulate alist
tables in some generally useful ways. They are all defined using
\ex{equal?} for key comparison.
\begin{defundesc} {alist-delete} {key alist} {alist}
Delete any entry labelled by value \var{key}.
\end{defundesc}
\begin{defundesc} {alist-update} {key val alist} {alist}
Delete \var{key} from \var{alist}, then cons on a
\ex{(\var{key} . \var{val})} entry.
\end{defundesc}
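
A small sketch of these two procedures on an environment-style alist;
the name \ex{env} and its contents are only illustrative.
\begin{code}
(define env '(("TERM" . "vt100") ("SHELL" . "/bin/csh")))

(alist-delete "TERM" env)
    ;; (("SHELL" . "/bin/csh"))
(alist-update "TERM" "xterm" env)
    ;; (("TERM" . "xterm") ("SHELL" . "/bin/csh"))\end{code}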
\defun{alist-compress} {alist} {alist}
\begin{desc}
Compresses \var{alist} by removing shadowed entries.
Example:
\begin{code}
;;; Shadowed (1 . c) entry removed.
(alist-compress '( (1 . a) (2 . b) (1 . c) (3 . d) ))
{\evalto} ((1 . a) (2 . b) (3 . d))\end{code}
\end{desc}
\defun {with-env*} {env-alist-delta thunk} {value(s) of thunk}
\defunx {with-total-env*} {env-alist thunk} {value(s) of thunk}
\begin{desc}
These procedures call \var{thunk} in the context of an altered
environment. They return whatever values \var{thunk} returns.
Non-local returns restore the environment to its outer value;
throwing back into the thunk by invoking a stored continuation
restores the environment back to its inner value.
The \var{env-alist-delta} argument specifies
a \emph{modification} to the current en\-vi\-ron\-ment---\var{thunk}'s
environment is the original environment overridden with the
bindings specified by the alist delta.
The \var{env-alist} argument specifies a complete environment
that is installed for \var{thunk}.
\end{desc}
\dfn {with-env} {env-alist-delta . body} {value(s) of body} {syntax}
\dfnx {with-total-env} {env-alist . body} {value(s) of body} {syntax}
\begin{desc}
These special forms provide syntactic sugar for \ex{with-env*}
and {\ttt with\=total\=env*}.
The env alists are not evaluated positions, but are implicitly backquoted.
In this way, they tend to resemble binding lists for \ex{let} and
\ex{let*} forms.
\end{desc}
Example: These four pieces of code all run the mailer with special
\cd{$TERM} and \cd{$EDITOR} values.
{\small
\begin{code}
(with-env (("TERM" . "xterm") ("EDITOR" . ,my-editor))
  (run (mail shivers@lcs.mit.edu)))
\cb
(with-env* `(("TERM" . "xterm") ("EDITOR" . ,my-editor))
  (\l{} (run (mail shivers@csd.hku.hk))))
\cb
(run (begin (setenv "TERM" "xterm")       ; Env mutation happens
            (setenv "EDITOR" my-editor)   ; in the subshell.
            (exec-epf (mail shivers@research.att.com))))
\cb
;; In this example, we compute an alternate environment ENV2
;; as an alist, and install it with an explicit call to the
;; EXEC-PATH/ENV procedure.
(let* ((env (env->alist))               ; Get the current environment,
       (env1 (alist-update "TERM" "xterm" env))        ; and compute
       (env2 (alist-update "EDITOR" my-editor env1)))  ; the new env.
  (run (begin (exec-path/env "mail" env2 "shivers@cs.cmu.edu"))))\end{code}}
\subsection{Path lists and colon lists}
Environment variables such as \ex{\$PATH} encode a list of strings
by separating the list elements with colon delimiters.
Once parsed into actual lists, these ordered lists can be manipulated
with the following two functions.
To convert between the colon-separated string encoding and the
list-of-strings representation, see the \ex{field-reader} and
\ex{join-strings} functions in section~\ref{sec:field-reader}.
\remark{An earlier release of scsh provided the \ex{split-colon-list}
and \ex{string-list->colon-list} functions. These have been
removed from scsh, and are replaced by the more general
parsers and unparsers of the field-reader module.}
%\defun {split-colon-list} {string} {{\str} list}
%\defunx {string-list->colon-list} {string-list} \str
%\begin{desc}
% Many {\Unix} lists, such as the \cd{$PATH} search path,
% are stored as ``colon lists.''
% A colon list is a string containing elements delimited by colon characters.
% These functions provide conversions between colon lists and true
% {\Scheme} lists.
%%
%\begin{code}
%(split-colon-list "/foo:/bar::/usr/tmp") \evalto
% ("/foo" "/bar" "" "/usr/tmp")\end{code}
%%
% \ex{string-list->colon-list} is the inverse function.
%
% \ex{with-env*}, \ex{with-total-env*}, and \ex{alist->env} all coerce
% string lists to colon lists where appropriate.
%\end{desc}
\defun {add-before} {elt before list} {list}
\defunx {add-after} {elt after list} {list}
\begin{desc}
These functions are for modifying search-path lists, where element order
is significant.
\ex{add-before} adds \var{elt} to the list immediately
before the first occurrence of \var{before} in the list.
If \var{before} is not in the list, \var{elt} is added to the end
of the list.
\ex{add-after} is similar:
\var{elt} is added after the last occurrence of \var{after}.
If \var{after} is not found,
\var{elt} is added to the beginning of the list.
Neither function destructively alters the original path-list.
The result may share structure with the original list.
Both functions use \ex{equal?} for comparing elements.
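
The rules above can be illustrated on a short path list;
the directory names are arbitrary.
\begin{code}
(add-before "/usr/local/bin" "/bin" '("/usr/bin" "/bin" "/etc"))
    ;; ("/usr/bin" "/usr/local/bin" "/bin" "/etc")
(add-after "/usr/games" "/bin" '("/usr/bin" "/bin" "/etc"))
    ;; ("/usr/bin" "/bin" "/usr/games" "/etc")

;; When the reference element is missing:
(add-before "/opt/bin" "/nowhere" '("/usr/bin" "/bin"))
    ;; ("/usr/bin" "/bin" "/opt/bin")
(add-after "/opt/bin" "/nowhere" '("/usr/bin" "/bin"))
    ;; ("/opt/bin" "/usr/bin" "/bin")\end{code}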
\end{desc}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{\protect{\tt\$USER}, \protect{\tt\$HOME}, and \protect{\tt\$PATH}}
Like sh and unlike csh, scsh has \emph{no} interactive dependencies on
environment variables.
It does, however, initialise certain internal values at startup time from the
initial process environment, in particular \cd{$HOME} and \cd{$PATH}.
Scsh never uses \cd{$USER} at all.
It computes \ex{(user-login-name)} from the system call \ex{(user-uid)}.
\defvar {home-directory} \str
\defvarx {exec-path-list} {{\str} list}
\begin{desc}
Scsh accesses \cd{$HOME} at start-up time, and stores the value in the
global variable \ex{home-directory}. It uses this value for \ex{\~}
lookups and for returning to home on \ex{(chdir)}.
Scsh accesses \cd{$PATH} at start-up time, colon-splits the path list, and
stores the value in the global variable \ex{exec-path-list}. This list is
used for \ex{exec-path} and \ex{exec-path/env} searches.
\end{desc}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\input{tty}