*** empty log message ***

olin-shivers 2001-06-01 17:49:20 +00:00
parent 85003bce1d
commit 32b0c4bea5
8 changed files with 193 additions and 703 deletions


@ -76,6 +76,7 @@ characters.
\subsection{Parsing fields}
\label{sec:field-splitter}
\defun {field-splitter} {[field num-fields]} \proc
\defunx {infix-splitter} {[delim num-fields handle-delim]} \proc


@ -1,297 +0,0 @@
%&latex -*- latex -*-
\chapter{Changes from previous releases}
\label{sec:changes}
\newcommand{\itam}[1]{\item {#1} \\}
\section{Changes from the previous release}
This section details changes that have been made in scsh since
the previous release.
Scsh is now much more robust.
All known bugs have been fixed.
There have been many improvements and extensions made.
These new features and changes are listed below, in no particular order;
the relevant sections of the manual give the full details.
Scsh now supports complete {\Posix}, including signal handlers.
Early autoreaping of child processes is now handled by a \ex{SIGCHLD}
signal handler, so children are reaped as early as possible with no
user intervention required.
A functional static heap linker is included in this release.
It is ugly, limited in functionality, and extremely slow, but it works.
It can be used to build scsh binaries that start up instantly.
The regular expression system has been sped up.
Regular-expression compilation is now provided,
and the \ex{awk} macro has been rewritten to pre-compile
regexps used in rules outside the loop.
It is still, however, slower than it should be.
Execing programs should be faster in this release, since we now use the
\ex{CLOEXEC} status bit to get automatic closing of unrevealed
port file descriptors.
{\scm}'s floating point support was inadvertently omitted from the last
release. It has been reinstated.
There is now a new command-line switch, \ex{-sfd \var{num}},
which causes scsh to read its script from file descriptor \var{num}.
\section{Changes from the penultimate release}
This section details changes that have been made in scsh since
the penultimate release.
Scsh is now much more robust.
All known bugs have been fixed.
There have been many improvements and extensions made.
We have also made some incompatible changes.
The sections below briefly describe these new features and changes;
the relevant sections of the manual give the full details.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{New features}
This release incorporates several new features into scsh.
\begin{itemize}
\itam{Control of buffered I/O}
Scsh now allows you to control the buffering policy used for doing I/O
on a Scheme port.
\itam{Here-strings}
Scsh now has a new lexical feature, \verb|#<<|, that provides
the ability to enter long, multi-line string constants in scsh programs.
Such a string is called a ``here string,'' by analogy to the common
shell ``here document'' \ex{<<} redirection.
\itam{Delimited readers and read-line}
Scsh now has a powerful set of delimited readers.
These can be used to read input delimited by
a newline character (\ex{read-line}),
a blank line (\ex{read-paragraph}),
or the occurrence of any character in an arbitrary set (\ex{read-delimited}).
While these procedures can be applied to any Scheme input port,
there is native-code support for performing delimited reads on
Unix input sources, so doing block input with these procedures should be
much faster than the equivalent character-at-a-time Scheme code.
\itam{New system calls}
With the sole exception of signal handlers, scsh now has all of {\Posix}.
This release introduces
\begin{itemize}
\item \ex{select},
\item full terminal device control,
\item support for pseudo-terminal ``pty'' devices,
\item file locking,
\item process timing,
\item \ex{set-file-times},
\item \ex{seek} and \ex{tell}.
\end{itemize}
Note that having \ex{select}, pseudo-terminals, and tty device control means
that it is now possible to implement interesting network protocols, such as
telnet servers and clients, directly in Scheme.
\itam{New command-line switches}
There is a new set of command-line switches that make it possible
to write shell scripts using the {\scm} module system.
Scripts can use the new command-line switches to open dependent
modules and load dependent source code.
Scripts can also be written in the {\scm} module language,
which allows you to use it both as a standalone shell script,
and as a code module that can be loaded and used by other Scheme programs.
\itam{Static heap linking}
There is a new facility that allows you to compile a heap image
to a \ex{.o} file that can be linked with the scsh virtual machine.
This produces a standalone executable binary, makes startup time
near-instantaneous, and greatly improves memory performance---the
initial heap image is placed in the process' text pages,
where it is shared by different scsh processes, and does not occupy
space in the run-time heap.
\oops{The static heap linker was not documented and installed in time
for this release.}
\end{itemize}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Incompatible improvements}
Some features of scsh have been improved in ways that are
not backwards-compatible with previous releases.
These changes should not affect most code;
however, please note the changes and modify your code accordingly.
\begin{itemize}
\itam{New process-object data-type returned by \ex{fork}}
Previous releases were prone to fill up the kernel's process table
if a program forked large numbers of processes and subsequently failed
to use \ex{wait} to reclaim the entries in the kernel's process table.
(This is a problem in standard C environments, as well.)
Scsh 0.4 introduces a new mechanism for automatically managing subprocesses.
Processes are no longer represented by an integer process id,
which is impossible to garbage-collect, but by an
abstract process data type that encapsulates the process id.
All processes are represented using the new data structures;
see the relevant section of the manual for further details.
\itam{Better stdio/current-port synchronisation}
The \ex{(begin \ldots)} process form now does a \ex{stdio->stdports}
call before executing its body.
This means that the Scheme code in the body ``sees'' any external
redirections.
For example, it means that if a \ex{begin} form in the middle of a pipeline
performs I/O on the current input and output ports, it will be communicating
with its upstream and downstream pipes.
\Eg, this code works as intended without the need for explicit synchronisation:
\begin{verbatim}
(run (| (gunzip)
        ;; Kill line 1 and insert double-sided
        ;; code at head of Postscript.
        (begin (read-line)  ; Eat first line.
               (display "%!PS-Adobe-2.0\n")
               (display "statusdict /setduplexmode known ")
               (display "{statusdict begin true ")
               (display "setduplexmode end} if\n")
               (exec-epf (cat)))
        (lpr))
     (< paper.ps))\end{verbatim}
Arranging for the \ex{begin} process form to synchronise
the current I/O ports with stdio means that all process forms now
see their epf's redirections.
\itam{\ex{file-match} more robust}
The \ex{file-match} procedure now catches any error condition
signalled by a match procedure,
and treats it as if the procedure had simply returned {\sharpf},
\ie, match failure.
This means \ex{file-match} no longer gets blown out of the water by
trying to apply a function like \ex{file-directory?} to a dangling symlink,
and other related OS errors.
\itam{Standard input now unbuffered}
Scsh's startup code now makes the initial current input port
(corresponding to file descriptor 0) unbuffered.
This keeps the shell from ``stealing'' input meant for subprocesses.
However, it does slow down character-at-a-time input processing.
If you are writing a program that is tolerant of buffered input,
and wish the efficiency gains, you can reset the buffering policy
yourself.
\itam{``writeable'' now spelled ``writable''}
We inconsistently spelled \ex{file-writable?} and \ex{file-not-writable?}
in the manual and the implementation.
We have now standardised on the common spelling ``writable'' in both.
The older bindings still exist in release 0.4, but will go away in future
releases.
\itam{\protect\ex{char-set-member?} replaced}
We have de-released the \ex{char-set-member?} procedure.
The scsh 0.3 version of this procedure took arguments
in the following order:
\codex{(char-set-member? \var{char} \var{char-set})}
This argument order is in accordance with standard mathematical usage
(\ie, $x \in S$), and also consistent with the R4RS
\ex{member}, \ex{memq} and \ex{memv} procedures.
It is, however, exactly opposite from the argument order
used by the \ex{char-set-member?} in MIT Scheme's character-set library.
If we left things as they were, we risked problems with code
ported over from MIT Scheme.
On the other hand, changing to conformance with MIT Scheme meant
inconsistency with common mathematical notation and other long-standing
Scheme procedures.
Either way was bound to introduce confusion.
We've taken the approach of simply removing the \ex{char-set-member?}
procedure altogether, and replacing it with a new procedure:
\codex{(char-set-contains? \var{cset} \var{char})}
Note that the argument order is consistent with the name.
\itam{\ex{file-attributes} now \ex{file-info}}
In keeping with the general convention in scsh of naming procedures
that retrieve information about system resources \ex{\ldots-info}
(\eg, \ex{tty-info}, \ex{user-info}, \ex{group-info}),
the \ex{file-attributes} procedure is now named \ex{file-info}.
We continue to export a \ex{file-attributes} binding for the current
release, but it will go away in future releases.
\itam{Renaming of I/O synchronisation procedures}
The \ex{(stdio->stdports \var{thunk})} procedure has been
renamed \ex{with-stdio-ports*};
there is now a corresponding \ex{with-stdio-ports} special form.
The \ex{stdio->stdports} procedure is now a nullary procedure
that side-effects the current set of current I/O port bindings.
\itam{New meta-arg line-two syntax}
Scsh now uses a simplified grammar for describing command-line
arguments read by the ``meta-arg'' switch from line two of a shell script.
If you were using this feature in previous releases, the three incompatible
changes of which to be aware are:
(1) tab is no longer allowed as an argument delimiter,
(2) a run of space characters is not equivalent to a single space,
(3) empty arguments are written a different way.
\end{itemize}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Backwards-compatible improvements}
Some existing features in scsh have been improved in ways that will
not affect existing code.
\begin{itemize}
\itam{Improved error reporting}
Exception handlers that print out error messages and warnings now
print their messages on the error output port,
instead of the current output port.
Previous releases used the current output port,
a problem inherited from Scheme 48.
Previous scsh releases flushed the Scheme 48 debugging tables when
creating the standard scsh heap image.
This trimmed the size of the heap image, but made error messages much
less comprehensible.
We now retain the debugging tables.
This bloats the heap image up by about 600kb. And worth it, too.
(We also have some new techniques for eliminating the run-time memory
penalty imposed by these large heap images.
Scsh's new static-heap technology allows for this data to be linked
into the text pages of the vm's binary, where it will not be touched
by the GC or otherwise affect the memory system until it is referenced.)
Finally, scsh now generates more informative error messages for syscall
errors.
For example, a file-open error previously told you what the error was
(\eg, ``Permission denied,'' or ``No such file or directory''),
but not which file you had tried to open.
We've improved this.
\itam{Closing a port twice allowed}
Scsh used to generate an error if you attempted to close a port
that had already been closed.
This is now allowed.
The close procedure returns a boolean to indicate whether the port had
already been closed or not.
\itam{Better time precision}
The \ex{time+ticks} procedure now returns sub-second precision on OS's
that support it.
\itam{Nicer print-methods for basic data-types}
Scsh's standard record types now print more informatively.
For example, a process object includes the process id in its
printed representation: the process object for process id 2653
prints as \verb|#{proc 2653}|.
\end{itemize}


@ -23,6 +23,8 @@
\def\maketildeactive{\catcode`\~=13}
\def\~{\char`\~}
\newcommand{\evalsto}{\ensuremath{\Rightarrow}}
% One-line code examples
%\newcommand{\codex}[1]% One line, centred. Tight spacing.
% {$$\abovedisplayskip=.75ex plus 1ex minus .5ex%


@ -18,6 +18,30 @@ This manual gives a complete description of scsh.
A general discussion of the design principles behind scsh can be found
in a companion paper, ``A Scheme Shell.''
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Copyright \& source-code license}
Scsh is open source. The complete sources come with the standard
distribution, which can be downloaded off the net.
For years, scsh's underlying Scheme implementation, Scheme 48, did not have an
open-source copyright. However, around 1999/2000, the Scheme 48 authors
graciously retrofitted a BSD-style open-source copyright onto the system.
Swept up by the fervor, we tacked an ideologically hip license onto scsh
source, ourselves (BSD-style, as well). Not that we ever cared before what you
did with the system.
As a result, the whole system is now open source, top-to-bottom.
We note that the code is a rich source for other Scheme implementations
to mine. Not only the \emph{code}, but the \emph{APIs} are available
for implementors working on Scheme environments for systems programming.
These APIs represent years of work, and should provide a big head-start
on any related effort. (Just don't call it ``scsh,'' unless it's
\emph{exactly} compliant with the scsh interfaces.)
Take all the code you like; we'll just write more.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Obtaining scsh}
Scsh is distributed via net publication.
@ -100,14 +124,11 @@ but the system as-released does not currently provide these features.
In the current release, the system has some rough edges.
It is quite slow to start up---loading the initial image into the
{\scm} virtual machine takes about a cpu second.
{\scm} virtual machine induces a noticeable delay.
This can be fixed with the static heap linker provided with this release.
This manual is very, very rough.
At some point, we hope to polish it up, finish it off, and re-typeset it
using markup, so we can generate html, info nodes, and {\TeX} output from
the single source without having to deal with Texinfo.
But it's all there is, for now.
We welcome parties interested in porting the manual to a more portable
XML or SGML format; please contact us if you are interested in doing so.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Naming conventions}
@ -375,11 +396,17 @@ All told, the \ex{define-record} form above defines the following procedures:
(ship:size \var{ship}) & Retrieve the \var{size} field. \\
\hline
(set-ship:x \var{ship} \var{new-x}) & Assign the \var{x} field. \\
(set-ship:y \var{ship} \var{new-y}) & Assign the \var{x} field. \\
(set-ship:y \var{ship} \var{new-y}) & Assign the \var{y} field. \\
(set-ship:size \var{ship} \var{new-size}) & Assign the \var{size} field. \\
\hline
(modify-ship:x \var{ship} \var{xfun}) & Modify \var{x} field with \var{xfun}. \\
(modify-ship:y \var{ship} \var{yfun}) & Modify \var{y} field with \var{yfun}. \\
(modify-ship:size \var{ship} \var{sizefun}) & Modify \var{size} field with \var{sizefun}. \\
\hline
(ship? \var{object}) & Type predicate. \\
\hline
(copy-ship \var{ship}) & Shallow-copy of the record. \\
\hline
\end{tabular}
\end{center}
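As a rough sketch of how these procedures fit together, suppose the record
was declared as \ex{(define-record ship x y (size 100))} and that
\ex{make-ship} takes the \ex{x} and \ex{y} fields as arguments.
Both the declaration and the constructor's argument list are assumptions made
for illustration only, as is the reading of \ex{modify-ship:x} as
``replace the field with the result of applying \var{xfun} to its old value.''
\begin{verbatim}
;; Hypothetical declaration, assumed for this example only.
(define-record ship x y (size 100))

(define s (make-ship 10 20))             ; Assumed: x = 10, y = 20.
(set-ship:y s 25)                        ; Assign the y field.
(modify-ship:x s (lambda (x) (+ x 1)))   ; Presumably sets x to (xfun x).
(ship? s)                                ; Type predicate.
(define s2 (copy-ship s))                ; Shallow copy of the record.\end{verbatim}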
%
@ -387,7 +414,9 @@ All told, the \ex{define-record} form above defines the following procedures:
An implementation of \ex{define-record} is available as a macro for Scheme
programmers to define their own record types;
the syntax is accessed by opening the package \ex{defrec-package}, which
exports the single syntax form \ex{define-record}.
exports the single syntax form \ex{define-record}.
See the source code for the \ex{defrec-package} module
for further details of the macro.
You must open this package to access the form.
Scsh does not export a record-definition package by default as there are
@ -417,21 +446,9 @@ you could not read and internalise such a twisted account without
bleeding from the nose and ears.
However, you might keep in mind the following simple fact: of all the
standards, {\Posix}, as far as I have been able to determine,
is the least common denominator.
standards, {\Posix} is the least common denominator.
So when this manual repeatedly refers to {\Posix}, the point is ``the
thing we are describing should be portable just about anywhere.''
Scsh sticks to {\Posix} when at all possible; it's major departure is
Scsh sticks to {\Posix} when at all possible; its major departure is
symbolic links, which aren't in {\Posix} (see---it
really \emph{is} a least common denominator).
However, just because {\Posix} is the l.c.d. standard doesn't mean everyone
supports all of it.
The guerilla PC {\Unix} implementations that have been springing up on
the net (\eg, NetBSD, Linux, FreeBSD, and so forth) are only recently coming
into compliance with the standard---although they are getting there.
We have been able to implement scsh completely on all of these systems,
however---the single exception is NeXTSTEP, whose buggy {\Posix} libraries
restrict us to partial support (these lacunae are indicated where relevant
in the rest of the manual).\footnote{Feel like porting scsh from {\Posix} to
NeXT's BSD API? Send us your fixes; we'll fold them in.}


@ -1,4 +1,4 @@
%&latex -*- latex -*-
% -*- latex -*-
% This is the reference manual for the Scheme Shell.
@ -47,7 +47,6 @@
\include{awk}
\include{miscprocs}
\include{running}
\include{changes}
\include{todo}
\backmatter


@ -1,27 +1,57 @@
% -*- latex -*-
\chapter{Strings and characters}
Scsh provides a set of procedures for processing strings and characters.
The procedures provided match regular expressions, search strings,
parse file-names, and manipulate sets of characters.
Also see chapters \ref{chapt:sre}, \ref{chapt:rdelim} and \ref{chapt:fr-awk}
on regular-expressions, record I/O, field parsing, and the awk loop.
The procedures documented there allow you to search and pattern-match strings,
read character-delimited records from ports,
use regular expressions to split the records into fields
(for example, splitting a string at every occurrence of colon or white-space),
and loop over streams of these records in a convenient way.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{String manipulation}
\label{sec:stringmanip}
Strings are the basic communication medium for {\Unix} processes, so a
shell language must have reasonable facilities for manipulating them.
Unix programming environment must have reasonable facilities for manipulating
them.
Scsh provides a powerful set of procedures for processing strings and
characters.
Besides the facilities described in this chapter, scsh also provides
\begin{itemize}
\itum{Regular expressions (chapter~\ref{chapt:sre})}
A complete regular-expression system.
\itum{Field parsing, delimited record I/O and the awk loop
(chapter~\ref{chapt:fr-awk})}
These procedures let you read in chunks of text delimited by selected
characters, and
parse each record into fields based on regular expressions
(for example, splitting a string at every occurrence of colon or
white-space).
The \ex{awk} form allows you to loop over streams of these records
in a convenient way.
\itum{The SRFI-13 string libraries}
This pair of libraries contains procedures that create, fold, iterate over,
search, compare, assemble, cut, hash, case-map, and otherwise manipulate
strings.
They are provided by the \ex{string-lib} and \ex{string-lib-internals}
packages, and are also available in the default \ex{scsh} package.
More documentation on these procedures can be found at URLs
\begin{tightinset}
% The gratuitous mbox makes xdvi render the hyperlinks better.
\mbox{\url{http://srfi.schemers.org/srfi-13/srfi-13.html}}\\
\url{http://srfi.schemers.org/srfi-13/srfi-13.txt}
\end{tightinset}
\itum{The SRFI-14 character-set library}
This library provides a set-of-characters abstraction, which is frequently
useful when searching, parsing, filtering or otherwise operating on
strings and character data. The SRFI is provided by the \ex{char-set-lib}
package; its bindings are also available in the default \ex{scsh} package.
More documentation on this library can be found at URLs
\begin{tightinset}
% The gratuitous mbox makes xdvi render the hyperlinks better.
\mbox{\url{http://srfi.schemers.org/srfi-14/srfi-14.html}}\\
\url{http://srfi.schemers.org/srfi-14/srfi-14.txt}
\end{tightinset}
\end{itemize}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Manipulating file-names}
\section{Manipulating file names}
\label{sec:filenames}
These procedures do not access the file-system at all; they merely operate
@ -30,7 +60,7 @@ design. Perhaps a more sophisticated system would be better, something
like the pathname abstractions of {\CommonLisp} or MIT Scheme. However,
being {\Unix}-specific, we can be a little less general.
\subsubsection{Terminology}
\subsection{Terminology}
These procedures carefully adhere to the {\Posix} standard for file-name
resolution, which occasionally entails some slightly odd things.
This section will describe these rules, and give some basic terminology.
@ -95,7 +125,7 @@ interpreted in file-name form, \ie, as root.
\subsubsection{Procedures}
\subsection{Procedures}
\defun {file-name-directory?} {fname} \boolean
\defunx {file-name-non-directory?} {fname} \boolean
@ -355,38 +385,7 @@ is also frequently useful for expanding file-names.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Other string manipulation facilities}
\defun {index} {string char [start]} {{\fixnum} or false}
\defunx {rindex} {string char [start]} {{\fixnum} or false}
\begin{desc}
These procedures search through \var{string} looking for an occurrence
of character \var{char}. \ex{index} searches left-to-right; \ex{rindex}
searches right-to-left.
\ex{index} returns the smallest index $i$ of \var{string} greater
than or equal to \var{start} such that $\var{string}[i] = \var{char}$.
The default for \var{start} is zero. If there is no such match,
\ex{index} returns false.
\ex{rindex} returns the largest index $i$ of \var{string} less than
\var{start} such that $\var{string}[i] = \var{char}$.
The default for \var{start} is \ex{(string-length \var{string})}.
If there is no such match, \ex{rindex} returns false.
\end{desc}
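For instance, the definitions above imply the following results.
This is a sketch worked out by hand from the description, not output captured
from a running scsh.
\begin{verbatim}
;; Indices of "hello":  h=0 e=1 l=2 l=3 o=4
(index  "hello" #\l)     ; => 2, the leftmost l
(index  "hello" #\l 3)   ; => 3, the first l at or after index 3
(rindex "hello" #\l)     ; => 3, the rightmost l
(rindex "hello" #\l 3)   ; => 2, the last l strictly before index 3
(index  "hello" #\z)     ; => false, no match\end{verbatim}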
I should probably snarf all the MIT Scheme string functions, and stick them
in a package. {\Unix} programs need to mung character strings a lot.
MIT string match commands:
\begin{tightcode}
[sub]string-match-{forward,backward}[-ci]
[sub]string-{prefix,suffix}[-ci]?
[sub]string-find-{next,previous}-char[-ci]
[sub]string-find-{next,previous}-char-in-set
[sub]string-replace[!]
\ldots\etc\end{tightcode}
These are not currently provided.
\section{Other string manipulation facilities}
\begin{defundesc} {substitute-env-vars} {fname} \str
Replace occurrences of environment variables with their values.
@ -412,315 +411,72 @@ These are not currently provided.
\end{desc}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Character sets}
\label{sec:char-sets}
\section{Character predicates}
Scsh provides a \ex{char-set} type for expressing sets of characters.
These sets are used by some of the delimited-input procedures
(section~\ref{sec:field-reader}).
Scsh's character set package was adapted and extended from
Project Mac's MIT Scheme package.
Note that the character type used in the current implementation corresponds
to the ASCII character set---but you would be wise not to build this
assumption into your code if you can help it.\footnote{
Actually, it's slightly uglier than that, albeit somewhat more
useful. The current character type corresponds to an eight-bit
superset of ASCII. The \ex{ascii->char} and \ex{char->ascii}
functions will preserve this eighth bit. However, none of the
high 128 characters appear in any of the standard character
sets defined in section~\ref{sec:std-csets}, except for
\ex{char-set:full}. If someone would email the authors a listing
of the full Latin-1 definition, we'll be happy to upgrade these
sets' definitions to make them Latin-1 compliant.}
\defun{char-set?}{x}\boolean
\begin{desc}
Is the object \var{x} a character set?
\end{desc}
\defun{char-set=}{\vari{cs}1 \vari{cs}2\ldots}\boolean
\begin{desc}
Are the character sets equal?
\end{desc}
\defun{char-set<=}{\vari{cs}1 \vari{cs}2\ldots}\boolean
\begin{desc}
Returns true if every character set \vari{cs}{i} is
a subset of character set \vari{cs}{i+1}.
\end{desc}
\defun{char-set-fold}{kons knil cs}\object
\begin{desc}
This is the fundamental iterator for character sets.
Applies the function \var{kons} across the character set \var{cs} using
initial state value \var{knil}.
That is, if \var{cs} is the empty set, the procedure returns \var{knil}.
Otherwise, some element \var{c} of \var{cs} is chosen; let \var{cs'} be
the remaining, unchosen characters.
The procedure returns
\begin{tightcode}
(char-set-fold \var{kons} (\var{kons} \var{c} \var{knil}) \var{cs'})\end{tightcode}
For example, we could define \ex{char-set-members} (see below)
as
\begin{tightcode}
(lambda (cs) (char-set-fold cons '() cs))\end{tightcode}
\remark{This procedure was formerly named \texttt{\indx{reduce-char-set}}.
The old binding is still provided, but is deprecated and will
probably vanish in a future release.}
\end{desc}
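As a further sketch of the same iterator, a fold that adds one per character
counts the members of a set, and so should agree with \ex{char-set-size}.
The name \ex{my-char-set-size} is hypothetical and is introduced here only for
illustration.
\begin{verbatim}
;; Hypothetical helper: kons receives a character and the running count.
(define (my-char-set-size cs)
  (char-set-fold (lambda (c count) (+ count 1)) 0 cs))

;; "banana" contains three distinct characters: b, a and n.
(my-char-set-size (string->char-set "banana"))   ; => 3\end{verbatim}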
\defun{char-set-for-each}{p cs}{\undefined}
\begin{desc}
Apply procedure \var{p} to each character in the character set \var{cs}.
Note that the order in which \var{p} is applied to the characters in the
set is not specified, and may even change from application to application.
\end{desc}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Creating character sets}
\defun{char-set}{\vari{char}1\ldots}{char-set}
\begin{desc}
Return a character set containing the given characters.
\end{desc}
\defun{chars->char-set}{chars}{char-set}
\begin{desc}
Return a character set containing the characters in the list \var{chars}.
\end{desc}
\defun{string->char-set}{s}{char-set}
\begin{desc}
Return a character set containing the characters in the string \var{s}.
\end{desc}
\defun{predicate->char-set}{pred}{char-set}
\begin{desc}
Returns a character set containing every character \var{c} such that
\ex{(\var{pred} \var{c})} returns true.
\end{desc}
\defun{ascii-range->char-set}{lower upper}{char-set}
\begin{desc}
Returns a character set containing every character whose {\Ascii}
code lies in the half-open range $[\var{lower},\var{upper})$.
\end{desc}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Querying character sets}
\defun {char-set-members}{char-set}{character-list}
\begin{desc}
This procedure returns a list of the members of \var{char-set}.
\end{desc}
\defunx{char-set-contains?}{char-set char}\boolean
\begin{desc}
This procedure tests \var{char} for membership in set \var{char-set}.
\remark{Previous releases of scsh called this procedure \ex{char-set-member?},
reversing the order of the arguments.
This made sense, but was unfortunately the reverse of the order in which the
arguments appear in MIT Scheme.
A reasonable argument order was not backwards-compatible with MIT Scheme;
on the other hand, the MIT Scheme argument order was counter-intuitive
and at odds with common mathematical notation and the \ex{member} family
of R4RS procedures.
We sought to escape the dilemma by shifting to a new name.}
\end{desc}
\defun{char-set-size}{cs}\integer
\begin{desc}
Returns the number of elements in character set \var{cs}.
\end{desc}
\defun{char-set-every?}{pred cs}\boolean
\defunx{char-set-any?}{pred cs}\object
\begin{desc}
The \ex{char-set-every?} procedure returns true if predicate \var{pred}
returns true of every character in the character set \var{cs}.
Likewise, \ex{char-set-any?} applies \var{pred} to every character in
character set \var{cs}, and returns the first true value it finds.
If no character produces a true value, it returns false.
The order in which these procedures sequence through the elements of
\var{cs} is not specified.
\end{desc}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Character-set algebra}
\defun {char-set-invert}{char-set}{char-set}
\defunx{char-set-union}{\vari{char-set}1\ldots}{char-set}
\defunx{char-set-intersection}{\vari{char-set}1 \vari{char-set}2\ldots}{char-set}
\defunx{char-set-difference}{\vari{char-set}1 \vari{char-set}2\ldots}{char-set}
\begin{desc}
These procedures implement set complement, union, intersection, and difference
for character sets.
The union, intersection, and difference operations are n-ary, associating
to the left; the difference function requires at least one argument, while
union and intersection may be applied to zero arguments.
\end{desc}
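A small sketch of the algebra, assuming an ASCII character type in which
\ex{char-set:alphabetic} and \ex{char-set:numeric} are disjoint, so that
removing the digits from the alphanumerics leaves exactly the letters:
\begin{verbatim}
;; Letters are the alphanumerics with the digits taken away.
(char-set= (char-set-difference char-set:alphanumeric
                                char-set:numeric)
           char-set:alphabetic)          ; Expected to be true under ASCII.

;; N-ary difference associates to the left, so these two are equal:
(char-set-difference char-set:printing
                     char-set:alphabetic
                     char-set:numeric)
(char-set-difference (char-set-difference char-set:printing
                                          char-set:alphabetic)
                     char-set:numeric)\end{verbatim}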
\defun {char-set-adjoin}{cs \vari{char}1\ldots}{char-set}
\defunx{char-set-delete}{cs \vari{char}1\ldots}{char-set}
\begin{desc}
Add/delete the \vari{char}i characters to/from character set \var{cs}.
\end{desc}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Standard character sets}
\label{sec:std-csets}
Several character sets are predefined for convenience:
\begin{center}
\newcommand{\entry}[1]{\ex{#1}\index{#1}}
\begin{tabular}{|ll|}
\hline
\entry{char-set:lower-case} & Lower-case alphabetic chars \\
\entry{char-set:upper-case} & Upper-case alphabetic chars \\
\entry{char-set:alphabetic} & Alphabetic chars \\
\entry{char-set:numeric} & Decimal digits: 0--9 \\
\entry{char-set:alphanumeric} & Alphabetic or numeric \\
\entry{char-set:graphic} & Printing characters except space \\
\entry{char-set:printing} & Printing characters including space \\
\entry{char-set:whitespace} & Whitespace characters \\
\entry{char-set:control} & Control characters \\
\entry{char-set:punctuation} & Punctuation characters \\
\entry{char-set:hex-digit} & A hexadecimal digit: 0--9, A--F, a--f \\
\entry{char-set:blank} & Blank characters \\
\entry{char-set:ascii} & A character in the ASCII set. \\
\entry{char-set:empty} & Empty set \\
\entry{char-set:full} & All characters \\
\hline
\end{tabular}
\end{center}
The first eleven of these correspond to the character classes defined in
Posix.
Note that there may be characters in \ex{char-set:alphabetic} that are
neither upper nor lower case---this might occur in implementations that
use a character type richer than ASCII, such as Unicode.
A ``graphic character'' is one that would put ink on your page.
While the exact composition of these sets may vary depending upon the
character type provided by the Scheme system upon which scsh is running,
here are the definitions for some of the sets in an ASCII character set:
\begin{center}
\newcommand{\entry}[1]{\ex{#1}\index{#1}}
\begin{tabular}{|ll|}
\hline
char-set:alphabetic & A--Z and a--z \\
char-set:lower-case & a--z \\
char-set:upper-case & A--Z \\
char-set:graphic & Alphanumeric + punctuation \\
char-set:whitespace & Space, newline, tab, page,
vertical tab, carriage return \\
char-set:blank & Space and tab \\
char-set:control & ASCII 0--31 and 127 \\
char-set:punctuation & \verb|!"#$%&'()*+,-./:;<=>|\verb#?@[\]^_`{|}~# \\
\hline
\end{tabular}
\end{center}
\defun {char-alphabetic?}\character\boolean
\defun {char-letter?}\character\boolean
\defunx{char-lower-case?}\character\boolean
\defunx{char-upper-case?}\character\boolean
\defunx{char-numeric? }\character\boolean
\defunx{char-alphanumeric?}\character\boolean
\defunx{char-title-case?}\character\boolean
\defunx{char-digit?}\character\boolean
\defunx{char-letter+digit?}\character\boolean
\defunx{char-graphic?}\character\boolean
\defunx{char-printing?}\character\boolean
\defunx{char-whitespace?}\character\boolean
\defunx{char-blank?}\character\boolean
\defunx{char-control?}\character\boolean
\defunx{char-iso-control?}\character\boolean
\defunx{char-punctuation?}\character\boolean
\defunx{char-hex-digit?}\character\boolean
\defunx{char-ascii?}\character\boolean
\begin{desc}
These predicates are defined in terms of the above character sets.
Each of these predicates tests for membership in one of the standard
character sets provided by the SRFI-14 character-set library.
Additionally, the following redundant bindings are provided for {R5RS}
compatibility:
\begin{inset}
\begin{tabular}{ll}
{R5RS} name & scsh definition \\ \hline
\ex{char-alphabetic?} & \ex{char-letter?} \\
\ex{char-numeric?} & \ex{char-digit?} \\
\ex{char-alphanumeric?} & \ex{char-letter+digit?}
\end{tabular}
\end{inset}
\end{desc}
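As a sketch of how these sets can be used, the following procedure checks
whether a string consists solely of hexadecimal digits.
The name \ex{hex-string?} is hypothetical (it is not an scsh binding), and the
example assumes that the SRFI-14 procedure \ex{char-set-contains?} is visible
in the current environment.
\begin{verbatim}
;; hex-string? is a hypothetical helper, not part of scsh.
(define (hex-string? s)
  (let loop ((i 0))
    (or (= i (string-length s))          ; ran off the end: every char passed
        (and (char-set-contains? char-set:hex-digit (string-ref s i))
             (loop (+ i 1))))))

(hex-string? "1aF9")    ; true
(hex-string? "12g4")    ; false: g is not a hex digit
\end{verbatim}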
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Linear-update character-set operations}
These procedures have a hybrid pure-functional/side-effecting semantics:
they are allowed, but not required, to side-effect one of their parameters
in order to construct their result.
An implementation may legally implement these procedures as pure,
side-effect-free functions, or it may implement them using side effects,
depending upon the details of what is the most efficient or simple to
implement in terms of the underlying representation.
\section{Deprecated character-set procedures}
\label{sec:char-sets}
What this means is that clients of these procedures \emph{may not} rely
upon these procedures working by side effect.
For example, this is not guaranteed to work:
\begin{verbatim}
(let ((cs (char-set #\a #\b #\c)))
  (char-set-adjoin! cs #\d)
  cs) ; Could be either {a,b,c} or {a,b,c,d}.
\end{verbatim}
However, this is well-defined:
\begin{verbatim}
(let ((cs (char-set #\a #\b #\c)))
  (char-set-adjoin! cs #\d)) ; {a,b,c,d}
\end{verbatim}
So clients of these procedures write in a functional style, but must
additionally be sure that, when the procedure is called, there are no
other live pointers to the potentially-modified character set (hence the term
``linear update'').
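The linear-update discipline can be sketched with a small constructor that
builds a character set from the characters of a string.
The name \ex{string->cset} is hypothetical and is used only for illustration.
The accumulator starts as a private copy, so no other live pointer to it
exists, and every call uses the \emph{result} of \ex{char-set-adjoin!} rather
than relying on a side effect.
\begin{verbatim}
;; string->cset is a hypothetical helper, not an scsh binding.
(define (string->cset s)
  (let loop ((i 0)
             (cs (char-set-copy char-set:empty)))  ; private copy: safe to update
    (if (= i (string-length s))
        cs
        (loop (+ i 1)
              ;; Always continue with the returned set, never the old one.
              (char-set-adjoin! cs (string-ref s i))))))

(char-set-contains? (string->cset "abc") #\b)      ; true
\end{verbatim}
Because the loop never touches \ex{char-set:empty} itself, this works the same
whether the implementation updates in place or allocates a fresh set.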
The SRFI-14 character-set library grew out of an earlier library developed
for scsh.
However, the SRFI standardisation process introduced incompatibilities with
the original scsh bindings.
The current version of scsh provides the library
\ex{obsolete-char-set-lib}, which contains the old bindings found in
previous releases of scsh.
The following table lists the members of this library, along with
the equivalent SRFI-14 binding. This obsolete library is deprecated and
\emph{not} open by default in the standard \ex{scsh} environment;
new code should use the SRFI-14 bindings.
\begin{inset}
\begin{tabular}{ll}
Old \ex{obsolete-char-set-lib} & SRFI-14 \ex{char-set-lib} \\ \hline
There are two benefits to this convention:
\begin{itemize}
\item Implementations are free to provide the most efficient possible
implementation, either functional or side-effecting.
\item Programmers may nonetheless continue to assume that character sets
are purely functional data structures: they may be reliably shared
without needing to be copied, uniquified, and so forth.
\end{itemize}
In practice, these procedures are most useful for efficiently constructing
character sets in a side-effecting manner, in some limited local context,
before passing the character set outside the local construction scope to be
used in a functional manner.
Scsh provides no assistance in checking the linearity of the potentially
side-effected parameters passed to these functions --- there's no linear
type checker or run-time mechanism for detecting violations.
\defun{char-set-copy}{cs}{char-set}
\begin{desc}
Returns a copy of the character set \var{cs}.
``Copy'' means that if either the input parameter or the
result value of this procedure is passed to one of the linear-update
procedures described below, the other character set is guaranteed
not to be altered.
(A system that provides pure-functional implementations of the rest of
the linear-operator suite could implement this procedure as the
identity function.)
\end{desc}
\defun{char-set-adjoin!}{cs \vari{char}1\ldots}{char-set}
\begin{desc}
Add the \vari{char}i characters to character set \var{cs}, and
return the result.
This procedure is allowed, but not required, to side-effect \var{cs}.
\end{desc}
\defun{char-set-delete!}{cs \vari{char}1\ldots}{char-set}
\begin{desc}
Remove the \vari{char}i characters from character set \var{cs}, and
return the result.
This procedure is allowed, but not required, to side-effect \var{cs}.
\end{desc}
\defun {char-set-invert!}{char-set}{char-set}
\defunx{char-set-union!}{\vari{char-set}1 \vari{char-set}2\ldots}{char-set}
\defunx{char-set-intersection!}{\vari{char-set}1 \vari{char-set}2\ldots}{char-set}
\defunx{char-set-difference!}{\vari{char-set}1 \vari{char-set}2\ldots}{char-set}
\begin{desc}
These procedures implement set complement, union, intersection, and difference
for character sets.
They are allowed, but not required, to side-effect their first parameter.
The union, intersection, and difference operations are n-ary, associating
to the left.
\end{desc}
\ex{char-set-members} & \ex{char-set->list} \\
\ex{chars->char-set} & \ex{list->char-set} \\
\ex{ascii-range->char-set} & \ex{ucs-range->char-set} (not exact) \\
\ex{predicate->char-set} & \ex{char-set-filter} (not exact) \\
\ex{char-set-every?} & \ex{char-set-every} \\
\ex{char-set-any?} & \ex{char-set-any} \\
\\
\ex{char-set-invert} & \ex{char-set-complement} \\
\ex{char-set-invert!} & \ex{char-set-complement!} \\
\\
\ex{char-set:alphabetic} & \ex{char-set:letter} \\
\ex{char-set:numeric} & \ex{char-set:digit} \\
\ex{char-set:alphanumeric} & \ex{char-set:letter+digit} \\
\ex{char-set:control} & \ex{char-set:iso-control}
\end{tabular}
\end{inset}
Note also that the \ex{->char-set} procedure no longer handles a predicate
argument.
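As a rough migration sketch, the fragment below uses only the newer bindings
from the right-hand column of the table, with the old name noted in a comment
beside each call.
The variable \ex{vowels} is illustrative.
\begin{verbatim}
(define vowels
  (list->char-set '(#\a #\e #\i #\o #\u)))    ; was chars->char-set

(char-set->list vowels)                       ; was char-set-members
(char-set-complement vowels)                  ; was char-set-invert
(char-set-every char-lower-case? vowels)      ; was char-set-every?

;; Roughly what predicate->char-set did, restricted to a base set:
(char-set-filter char-upper-case? char-set:letter)
\end{verbatim}
The last form is marked ``not exact'' in the table because
\ex{char-set-filter} selects from a set supplied by the caller instead of
ranging over all characters implicitly.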

@ -963,11 +963,6 @@ Note that once a Scheme port is revealed in scsh, the runtime will not
shift the port around with \ex{dup()} and \ex{close()}.
This means the file-locking procedures can then be applied to the port's
associated file descriptor.
NeXTSTEP users should also note that even minimalist {\Posix} file locking
is not supported for NFS-mounted files in NeXTSTEP; NeXT claims they will
fix this in NS release 4.
We'd appreciate hearing from users when and if this happens.
}
{\Posix} allows the user to lock a region of a file with either
@ -1392,8 +1387,8 @@ Returns:
Note that the rules of backslash for {\Scheme} strings and glob patterns
work together to require four backslashes in a row to specify a
single literal backslash. Fortunately, this should be a rare
occurrence.
single literal backslash. Fortunately, it is very rare that a backslash
occurs in a Unix file name.
A glob subpattern will not match against dot files unless the first
character of the subpattern is a literal ``\ex{.}''.
@ -2623,9 +2618,6 @@ all of the complexity is optional,
and defaulting all the optional arguments reduces the system
to a simple interface.
\remark{This time package does not currently work with NeXTSTEP, as NeXTSTEP
does not provide a {\Posix}-compliant time library that will even link.}
\subsection{Terminology}
``UTC'' and ``UCT'' stand for ``universal coordinated time,'' which is the
official name for what is colloquially referred to as ``Greenwich Mean
@ -2992,7 +2984,8 @@ If \var{val} is {\sharpf}, then any entry for \var{var} is deleted.
an alist, \eg,
\begin{code}
(("TERM" . "vt100")
("SHELL" . "/bin/csh")
("SHELL" . "/usr/local/bin/scsh")
("PATH" . "/sbin:/usr/sbin:/bin:/usr/bin")
("EDITOR" . "emacs")
\ldots)\end{code}
\end{desc}
@ -3005,6 +2998,21 @@ If \var{val} is {\sharpf}, then any entry for \var{var} is deleted.
environment (\ie, converted to a null-terminated C vector of
\ex{"\var{var}=\var{val}"} strings which is assigned to the global
\ex{char **environ}).
\begin{code}
;;; Note $PATH entry is converted
;;; to /sbin:/usr/sbin:/bin:/usr/bin.
(alist->env '(("TERM" . "vt100")
              ("PATH" "/sbin" "/usr/sbin" "/bin" "/usr/bin")
              ("SHELL" . "/usr/local/bin/scsh")))
\end{code}
Note that \ex{env->alist} and \ex{alist->env} are not exact
inverses---\ex{alist->env} will convert a list value into a single
colon-separated string, but \ex{env->alist} will not parse colon-separated
values into lists. (See the \ex{\$PATH} element in the examples given for
each procedure.)
\end{desc}
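The asymmetry between the two procedures can be seen in a short session.
This is only a sketch: it overwrites the environment of the running scsh
process, and the values are illustrative.
\begin{code}
;;; Install an environment whose PATH is given as a list.
(alist->env '(("TERM" . "vt100")
              ("PATH" "/sbin" "/bin")))

;;; Reading it back yields the joined string, not the list.
(cdr (assoc "PATH" (env->alist)))   ; expected: "/sbin:/bin"
\end{code}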
The following three functions help the programmer manipulate alist
@ -3082,18 +3090,30 @@ Example: These four pieces of code all run the mailer with special
\subsection{Path lists and colon lists}
Environment variables such as \ex{\$PATH} encode a list of strings
by separating the list elements with colon delimiters.
Once parsed into actual lists, these ordered lists can be manipulated
with the following two functions.
When environment variables such as \ex{\$PATH} need to encode a list of
strings (such as a list of directories to be searched),
the common Unix convention is to separate the list elements with
colon delimiters.\footnote{\ldots and hope the individual list elements
don't contain colons themselves.}
To convert between the colon-separated string encoding and the
list-of-strings representation, see the \ex{field-reader} and
\ex{join-strings} functions in section~\ref{sec:field-reader}.
\remark{An earlier release of scsh provided the \ex{split-colon-list}
and \ex{string-list->colon-list} functions. These have been
removed from scsh, and are replaced by the more general
parsers and unparsers of the field-reader module.}
list-of-strings representation, see the \ex{infix-splitter} function
(section~\ref{sec:field-splitter}) and the string library's
\ex{string-join} function.
For example,
\begin{code}
(define split (infix-splitter (rx ":")))
(split "/sbin:/bin::/usr/bin") {\evalsto}
'("/sbin" "/bin" "" "/usr/bin")
(string-join '("/sbin" "/bin" "" "/usr/bin") ":") {\evalsto}
"/sbin:/bin::/usr/bin"\end{code}
The following two functions are useful for manipulating these ordered lists,
once they have been parsed from their colon-separated form.
%\remark{An earlier release of scsh provided the \ex{split-colon-list}
% and \ex{string-list->colon-list} functions. These have been
% removed from scsh, and are replaced by the more general
% parsers and unparsers of the field-reader module.}
%
%\defun {split-colon-list} {string} {{\str} list}
%\defunx {string-list->colon-list} {string-list} \str
%\begin{desc}
@ -3146,15 +3166,18 @@ Scsh never uses \cd{$USER} at all.
It computes \ex{(user-login-name)} from the system call \ex{(user-uid)}.
\defvar {home-directory} \str
\defvarx {exec-path-list} {{\str} list}
\defvarx {exec-path-list} {{\str} list fluid}
\begin{desc}
Scsh accesses \cd{$HOME} at start-up time, and stores the value in the
global variable \ex{home-directory}. It uses this value for \ex{\~}
lookups and for returning to home on \ex{(chdir)}.
Scsh accesses \cd{$PATH} at start-up time, colon-splits the path list, and
stores the value in the global variable \ex{exec-path-list}. This list is
stores the value in the fluid \ex{exec-path-list}. This list is
used for \ex{exec-path} and \ex{exec-path/env} searches.
To access, rebind or side-effect fluid cells, you must open
the \ex{fluids} package.
\end{desc}
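A minimal sketch of rebinding the search path for a limited dynamic extent
follows.
It assumes that the \ex{fluids} package has been opened and that it exports
the accessor \ex{fluid} and the binder \ex{let-fluid}; these two names are an
assumption here and should be checked against your scsh release.
\begin{code}
;;; Assumes the fluids package is open (e.g. ,open fluids at the REPL).
(let-fluid exec-path-list
           (cons "/usr/local/bin" (fluid exec-path-list))
  (lambda ()
    ;; exec-path searches made inside this thunk see the extended list.
    (car (fluid exec-path-list))))   ; "/usr/local/bin"
\end{code}
Once the thunk returns, \ex{exec-path-list} has its previous value again.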
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

@ -17,19 +17,8 @@ an elegant language; go wild.
\item An X gui interface. (Needs threads.)
\item A better C function/data-structure interface. This is not easy.
\item More network protocols. Telnet and ftp would be the most important.
\item An ILU interface.
\item An RPC system, with ``tail-recursion.''
\item Interfaces to relational db's.
This would be quite useful for Web servers.
An s-expression embedding of SQL would be a key design component
of such a system, along the lines of scsh's process notation or
\ex{awk} notation.
\item Port Edwin, an emacs text editor written in MIT Scheme, to scsh.
Combine it with scsh's OS interfaces to make a visual shell.
\item An \ex{expect} knock-off.
\item A \ex{make} replacement, using scsh's process notation in the build
rules.
\item Manual hacking.
\begin{itemize}
\item The {\LaTeX} hackery needs yet another serious pass. Most importantly,