% -*- latex -*-
\chapter{Strings and characters}
Scsh provides a set of procedures for processing strings and characters.
The procedures provided match regular expressions, search strings,
parse file-names, and manipulate sets of characters.
Also see chapters \ref{chapt:rdelim} and \ref{chapt:fr-awk}
on record I/O, field parsing, and the awk loop.
The procedures documented there allow you to read character-delimited
records from ports, use regular expressions to split the records into fields
(for example, splitting a string at every occurrence of colon or white-space),
and loop over streams of these records in a convenient way.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{String manipulation}
\label{sec:stringmanip}
Strings are the basic communication medium for {\Unix} processes, so a
shell language must have reasonable facilities for manipulating them.
\subsection{Regular expressions}
\label{sec:regexps}
The following functions perform regular expression matching.
The code uses Henry Spencer's regular expression package.
\begin{defundesc}{string-match} {regexp string [start]} {match or false}
Search \var{string} starting at position \var{start}, looking for a match
for \var{regexp}. If a match is found, return a match structure describing
the match, otherwise {\sharpf}. \var{Start} defaults to 0.
\end{defundesc}
\begin{defundesc} {regexp-match?} {obj} \boolean
Is the object a regular expression match?
\end{defundesc}
\begin{defundesc} {match:start} {match [match-number]} {{\fixnum} or false}
Returns the start position of the match denoted by \var{match-number}.
Match number 0 denotes the whole regexp; each higher number denotes
a parenthesized \ex{(\ldots)} sub-expression, counting opening
parentheses from the left.
\var{Match-number} defaults to 0.
If the regular expression matches as a whole,
but a particular parenthesized sub-expression does not match, then
\ex{match:start} returns {\sharpf}.
\end{defundesc}
\begin{defundesc} {match:end} {match [match-number]} {{\fixnum} or false}
Returns the end position of the match denoted by \var{match-number}.
\var{Match-number} defaults to 0 (the whole match).
If the regular expression matches as a whole,
but a particular parenthesized sub-expression does not match, then
\ex{match:end} returns {\sharpf}.
\end{defundesc}
\begin{defundesc} {match:substring} {match [match-number]} {{\str} or false}
Returns the substring matched by the sub-expression denoted by
\var{match-number}.
\var{Match-number} defaults to 0 (the whole match).
If the regular expression matches as a whole,
but that particular sub-expression does not match, returns {\sharpf}.
\end{defundesc}
Regular expression matching compiles patterns into special data
structures which can be efficiently used to match against strings.
The overhead of compiling patterns that will be used for multiple
searches can be avoided by these lower-level routines:
%
\begin{defundesc} {make-regexp} {str} {re}
Generate a compiled regular expression from the given string.
\end{defundesc}
\begin{defundesc} {regexp?} {obj} \boolean
Is the object a regular expression?
\end{defundesc}
\begin{defundesc} {regexp-exec} {regexp str [start]} {match or false}
Apply the regular expression \var{regexp} to the string \var{str} starting
at position \var{start}. If the match succeeds, it returns a match structure;
otherwise, it returns {\sharpf}. \var{Start} defaults to 0.
\end{defundesc}
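For readers who know Python, the matching procedures above correspond roughly to Python's standard \ex{re} module. The sketch below is an analogy, not scsh's implementation: the Python names are hypothetical transliterations, \ex{None} stands in for {\sharpf}, and Python's regexp dialect is assumed to agree with Spencer's package on these simple patterns. Compiling once with \ex{re.compile} plays the role of \ex{make-regexp}.

```python
import re

def string_match(regexp, string, start=0):
    """Analogue of string-match: a match object, or None (for #f)."""
    return re.compile(regexp).search(string, start)

def regexp_exec(compiled, string, start=0):
    """Analogue of regexp-exec: reuse a pattern compiled once with re.compile."""
    return compiled.search(string, start)

def match_start(m, n=0):
    i = m.start(n)              # Python reports -1 for a group that did not match
    return None if i == -1 else i

def match_end(m, n=0):
    i = m.end(n)
    return None if i == -1 else i

def match_substring(m, n=0):
    return m.group(n)           # None when group n took no part in the match

# Sub-expression 2 is optional, so it may be absent from a successful match.
m = string_match(r"([a-z]+)(-([0-9]+))?", "ID: abc ok")
print(match_start(m), match_end(m), match_substring(m, 1), match_start(m, 2))
# 4 7 abc None
```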
\defun{regexp-quote}{str}{\str}
\begin{desc}
Returns a regular expression that matches the string \var{str} exactly.
In other words, it quotes the regular expression, prepending backslashes
to all the special regexp characters in \var{str}.
\begin{code}
(regexp-quote "*Hello* world.")
{\evalto}"\\\\*Hello\\\\* world\\\\."\end{code}
\end{desc}
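A minimal Python sketch of the quoting rule. It assumes the usual POSIX-style special characters (listed in the code); the exact set that scsh quotes may differ. The usage line checks that the quoted pattern matches its source string literally.

```python
import re

# Assumed set of characters that are special in POSIX-style regexps.
SPECIAL = set(".[]()*+?{}|^$\\")

def regexp_quote(s):
    """Prefix each special character of s with a backslash."""
    return "".join("\\" + c if c in SPECIAL else c for c in s)

print(regexp_quote("*Hello* world."))            # \*Hello\* world\.
print(re.fullmatch(regexp_quote("a+b (c)?"), "a+b (c)?") is not None)   # True
```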
\defun{regexp-substitute}{port match . items}{{\str} or \undefined}
\begin{desc}
This procedure can be used to perform string substitutions based on
regular expression matches.
The results of the substitution can be either output to a port or
returned as a string.
The \var{match} argument is a regular expression match structure
that controls the substitution.
If \var{port} is an output port, the \var{items} are written out to
the port:
\begin{itemize}
\item If an item is a string, it is copied directly to the port.
\item If an item is an integer, the corresponding submatch from \var{match}
is written to the port.
\item If an item is \ex{'pre},
the prefix of the matched string (the text preceding the match)
is written to the port.
\item If an item is \ex{'post},
the suffix of the matched string is written.
\end{itemize}
If \var{port} is {\sharpf}, nothing is written, and a string is constructed
and returned instead.
\end{desc}
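A minimal Python sketch of this item protocol. It assumes a Python \ex{re} match object in place of an scsh match structure, \ex{None} in place of {\sharpf} for \var{port}, and two hypothetical sentinel objects, \ex{PRE} and \ex{POST}, standing in for the symbols \ex{'pre} and \ex{'post}.

```python
import io
import re

PRE = object()   # hypothetical stand-in for the symbol 'pre
POST = object()  # hypothetical stand-in for the symbol 'post

def regexp_substitute(port, m, *items):
    """Write items to port, or build and return a string when port is None."""
    out = io.StringIO() if port is None else port
    for item in items:
        if item is PRE:
            out.write(m.string[:m.start()])   # text preceding the match
        elif item is POST:
            out.write(m.string[m.end():])     # text following the match
        elif isinstance(item, int):
            # Numbered submatch; writing "" for an unmatched group is an assumption.
            out.write(m.group(item) or "")
        else:
            out.write(item)                   # strings are copied directly
    return out.getvalue() if port is None else None

m = re.search(r"([a-z]+)@([a-z]+)", "mail shivers@lcs now")
print(regexp_substitute(None, m, PRE, 2, "!", 1, POST))   # mail lcs!shivers now
```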
\defun{regexp-substitute/global}{port regexp string . items}
{{\str} or \undefined}
\begin{desc}
This procedure is similar to \ex{regexp-substitute},
but can be used to perform repeated match/substitute operations over
a string.
It differs from \ex{regexp-substitute} in the following ways:
\begin{itemize}
\item It takes a regular expression and string to be matched as
parameters, instead of a completed match structure.
\item If the regular expression doesn't match the string, this
procedure is the identity transform---it returns or outputs the
string.
\item If an item is \ex{'post}, the procedure recurses on the suffix string
(the text from \var{string} following the match).
Including a \ex{'post} in the list of items is how one gets multiple
match/substitution operations.
\item If an item is a procedure, it is applied to the match structure for
the current match; the string it returns is used in the result.
\end{itemize}
Some examples:
{\small
\begin{widecode}
;;; Replace occurrences of "Cotton" with "Jin".
(regexp-substitute/global #f "Cotton" s
   'pre "Jin" 'post)
;;; mm/dd/yy -> dd/mm/yy date conversion.
(regexp-substitute/global #f "([0-9]+)/([0-9]+)/([0-9]+)" ; mm/dd/yy
   s ; Source string
   'pre 2 "/" 1 "/" 3 'post)
;;; "9/29/61" -> "Sep 29, 1961" date conversion.
(regexp-substitute/global #f "([0-9]+)/([0-9]+)/([0-9]+)" ; mm/dd/yy
   s ; Source string
   'pre
   ;; Sleazy converter -- ignores "year 2000" issue, and blows up if
   ;; month is out of range.
   (lambda (m)
     (let ((mon (vector-ref '#("Jan" "Feb" "Mar" "Apr" "May" "Jun"
                               "Jul" "Aug" "Sep" "Oct" "Nov" "Dec")
                            (- (string->number (match:substring m 1)) 1)))
           (day (match:substring m 2))
           (year (match:substring m 3)))
       (string-append mon " " day ", 19" year)))
   'post)
;;; Remove potentially offensive substrings from string S.
(regexp-substitute/global #f "Windows|tcl|Intel" s
   'pre 'post)\end{widecode}}
\end{desc}
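The recursion on \ex{'post} is what turns a single substitution into a global one. The Python sketch below models that, reusing the hypothetical \ex{PRE} and \ex{POST} sentinels for \ex{'pre} and \ex{'post} and a Python callable for a procedure item. It is an illustration only: it literally recurses on the suffix string, as the text describes, and it simply rejects patterns that match the empty string rather than guessing how scsh handles them.

```python
import re

PRE = object()   # hypothetical stand-in for the symbol 'pre
POST = object()  # hypothetical stand-in for the symbol 'post

def regexp_substitute_global(regexp, string, *items):
    """Return string with every match of regexp rewritten according to items."""
    m = re.search(regexp, string)
    if m is None:
        return string                          # no match: the identity transform
    if m.start() == m.end():
        raise ValueError("this sketch assumes regexp never matches the empty string")
    out = []
    for item in items:
        if item is PRE:
            out.append(string[:m.start()])
        elif item is POST:
            # Recursing on the suffix is what makes the substitution global.
            out.append(regexp_substitute_global(regexp, string[m.end():], *items))
        elif isinstance(item, int):
            out.append(m.group(item) or "")
        elif callable(item):
            out.append(item(m))                # a procedure maps the match to a string
        else:
            out.append(item)                   # literal string
    return "".join(out)

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
DATE = r"([0-9]+)/([0-9]+)/([0-9]+)"           # mm/dd/yy

def pretty_date(m):
    return f"{MONTHS[int(m.group(1)) - 1]} {m.group(2)}, 19{m.group(3)}"

print(regexp_substitute_global(DATE, "born 9/29/61.", PRE, pretty_date, POST))
# born Sep 29, 1961.
```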
\subsection{Other string manipulation facilities}
\defun {index} {string char [start]} {{\fixnum} or false}
\defunx {rindex} {string char [start]} {{\fixnum} or false}
\begin{desc}
These procedures search through \var{string} looking for an occurrence
of character \var{char}. \ex{index} searches left-to-right; \ex{rindex}
searches right-to-left.
\ex{index} returns the smallest index $i$ of \var{string} greater
than or equal to \var{start} such that $\var{string}[i] = \var{char}$.
The default for \var{start} is zero. If there is no such match,
\ex{index} returns false.
\ex{rindex} returns the largest index $i$ of \var{string} less than
\var{start} such that $\var{string}[i] = \var{char}$.
The default for \var{start} is \ex{(string-length \var{string})}.
If there is no such match, \ex{rindex} returns false.
\end{desc}
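The definitions of \ex{index} and \ex{rindex} above translate directly into Python's \ex{str.find} and \ex{str.rfind}. This is an illustrative sketch only, with \ex{None} standing in for {\sharpf}; note in particular that \var{start} is an exclusive upper bound for \ex{rindex}.

```python
def index(string, char, start=0):
    """Smallest i >= start with string[i] == char, or None (for #f)."""
    i = string.find(char, start)
    return None if i == -1 else i

def rindex(string, char, start=None):
    """Largest i < start with string[i] == char, or None (for #f)."""
    if start is None:
        start = len(string)
    i = string.rfind(char, 0, start)    # searches string[0:start]
    return None if i == -1 else i

print(index("usr:bin:etc", ":"), rindex("usr:bin:etc", ":"))   # 3 7
```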
I should probably snarf all the MIT Scheme string functions and stick them
in a package; {\Unix} programs need to mung character strings a lot.
The MIT Scheme string-matching procedures include:
\begin{tightcode}
[sub]string-match-{forward,backward}[-ci]
[sub]string-{prefix,suffix}[-ci]?
[sub]string-find-{next,previous}-char[-ci]
[sub]string-find-{next,previous}-char-in-set
[sub]string-replace[!]
\ldots\etc\end{tightcode}
These are not currently provided.
\begin{defundesc} {substitute-env-vars} {fname} \str
Replace occurrences of environment variables in \var{fname} with their values.
An environment variable is denoted by a dollar sign followed either by
a name made of alphanumeric characters and underscores, or by such a name
enclosed in braces.
\begin{exampletable}
\splitline{\ex{(substitute-env-vars "\$USER/.login")}}
{\ex{"shivers/.login"}} \\
\cd{(substitute-env-vars "$\{USER\}_log")} & \cd{"shivers_log"}
\end{exampletable}
\end{defundesc}
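A minimal Python sketch of this syntax. The variable-name rule follows the text; expanding undefined variables to the empty string is an assumption, as is passing the environment in as a mapping so the example is reproducible.

```python
import os
import re

# $NAME or ${NAME}, where NAME is made of letters, digits, and underscores.
VAR = re.compile(r"\$(?:\{([A-Za-z0-9_]+)\}|([A-Za-z0-9_]+))")

def substitute_env_vars(fname, env=os.environ):
    """Replace each $NAME or ${NAME} in fname with its value from env."""
    def lookup(m):
        name = m.group(1) or m.group(2)
        return env.get(name, "")        # assumption: unset variables expand to ""
    return VAR.sub(lookup, fname)

env = {"USER": "shivers"}
print(substitute_env_vars("${USER}_log", env))   # shivers_log
```

The braces matter in that example: without them, \ex{\$USER\_log} would be read as a reference to a variable named \ex{USER\_log}.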
\subsection{Manipulating file-names}
\label{sec:filenames}
These procedures do not access the file-system at all; they merely operate
on file-name strings. Much of this structure is patterned after the GNU Emacs
design. A more sophisticated system, such as the pathname abstractions of
{\CommonLisp} or MIT Scheme, might be better; however, since scsh is
{\Unix}-specific, it can afford to be a little less general.
\subsubsection{Terminology}
These procedures carefully adhere to the {\Posix} standard for file-name
resolution, which occasionally entails some slightly odd things.
This section describes these rules and introduces some basic terminology.
A \emph{file-name} is either the file-system root (``/''),
or a series of slash-terminated directory components, followed by
a file component.
Root is the only file-name that may end in slash.
Some examples:
\begin{center}
\begin{tabular}{lll}
File name & Dir components & File component \\\hline
\ex{src/des/main.c} & \ex{("src" "des")} & \ex{"main.c"} \\
\ex{/src/des/main.c} & \ex{("" "src" "des")} & \ex{"main.c"} \\
\ex{main.c} & \ex{()} & \ex{"main.c"} \\
\end{tabular}
\end{center}
Note that the relative filename \ex{src/des/main.c} and the absolute filename
\ex{/src/des/main.c} are distinguished by the presence of the root component
\ex{""} in the absolute path.
Multiple embedded slashes within a path have the same meaning as
a single slash.
More than two leading slashes at the beginning of a path have the same
meaning as a single leading slash---they indicate that the file-name
is an absolute one, with the path leading from root.
However, {\Posix} permits the OS to give special meaning to
\emph{two} leading slashes.
For this reason, the routines in this section do not simplify two leading
slashes to a single slash.
A file-name in \emph{directory form} is either a file-name terminated by
a slash, \eg, ``\ex{/src/des/}'', or the empty string, ``''.
The empty string corresponds to the current working directory,
whose file-name is dot (``\ex{.}'').
Working backwards from the append-a-slash rule,
we extend the syntax of {\Posix} file-names to define the empty string
to be a file-name form of the root directory ``\ex{/}''.
(However, ``\ex{/}'' is also acceptable as a file-name form for root.)
So the empty string has two interpretations:
as a file-name form, it is the file-system root;
as a directory form, it is the current working directory.
Slash is also an ambiguous form: \ex{/} is both a directory-form and
a file-name form.
The directory form of a file-name is very rarely used.
Almost all of the procedures in scsh name directories by giving
their file-name form (without the trailing slash), not their directory form.
So, you say ``\ex{/usr/include}'', and ``\ex{.}'', not
``\ex{/usr/include/}'' and ``''.
The sole exceptions are
\ex{file-name-as-directory} and \ex{directory-as-file-name},
whose jobs are to convert back-and-forth between these forms,
and \ex{file-name-directory}, whose job it is to split out the
directory portion of a file-name.
However, most procedures that expect a directory argument will also accept
one in directory form, coercing it to file-name form by removing the
trailing slash.
Bear in mind that the ambiguous case, the empty string, will be
interpreted in file-name form, \ie, as root.
\subsubsection{Procedures}
\defun {file-name-directory?} {fname} \boolean
\defunx {file-name-non-directory?} {fname} \boolean
\begin{desc}
These predicates return true if the string is in directory form or in
file-name form, respectively (see the above discussion of these two forms).
Note that they both return true on the ambiguous case of the empty string,
which is both a directory (the current working directory) and a file-name
(the file-system root).
\begin{center}
\begin{tabular}{lll}
File name & \ex{\ldots-directory?} & \ex{\ldots-non-directory?} \\
\hline
\ex{"src/des"} & \ex{\sharpf} & \ex{\sharpt} \\
\ex{"src/des/"} & \ex{\sharpt} & \ex{\sharpf} \\
\ex{"/"} & \ex{\sharpt} & \ex{\sharpf} \\
\ex{"."} & \ex{\sharpf} & \ex{\sharpt} \\
\ex{""} & \ex{\sharpt} & \ex{\sharpt}
\end{tabular}
\end{center}
\end{desc}
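The table above follows from a simple rule, sketched here in Python under the assumption that the trailing slash is the only thing the predicates inspect. The Python names are hypothetical.

```python
def is_file_name_directory(fname):
    """Directory form: ends in a slash, or is the empty string."""
    return fname == "" or fname.endswith("/")

def is_file_name_non_directory(fname):
    """File-name form: no trailing slash, or the empty string."""
    return fname == "" or not fname.endswith("/")

print(is_file_name_directory("src/des/"), is_file_name_non_directory("src/des/"))
# True False
```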
\begin{defundesc} {file-name-as-directory} {fname} \str
Convert a file-name to directory form.
Basically, add a trailing slash if needed:
\begin{exampletable}
\ex{(file-name-as-directory "src/des")} & \ex{"src/des/"} \\
\ex{(file-name-as-directory "src/des/")} & \ex{"src/des/"} \\[2ex]
%
\header{\ex{.}, \ex{/}, and \ex{""} are special:}
\ex{(file-name-as-directory ".")} & \ex{""} \\
\ex{(file-name-as-directory "/")} & \ex{"/"} \\
\ex{(file-name-as-directory "")} & \ex{"/"}
\end{exampletable}
\end{defundesc}
\begin{defundesc} {directory-as-file-name} {fname} \str
Convert a directory to a simple file-name.
Basically, kill a trailing slash if one is present:
\begin{exampletable}
\ex{(directory-as-file-name "foo/bar/")} & \ex{"foo/bar"} \\[2ex]
%
\header{\ex{/} and \ex{""} are special:}
\ex{(directory-as-file-name "/")} & \ex{"/"} \\
\ex{(directory-as-file-name "")} & \ex{"."} (\ie, the cwd) \\
\end{exampletable}
\end{defundesc}
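A Python sketch that reproduces the two tables above, handling the special cases first. It assumes at most one trailing slash needs removing; how scsh treats inputs such as \ex{"foo//"} is not specified here, and the Python names are hypothetical.

```python
def file_name_as_directory(fname):
    if fname == ".":
        return ""            # the cwd, in directory form
    if fname == "":
        return "/"           # "" as a file-name is root
    return fname if fname.endswith("/") else fname + "/"

def directory_as_file_name(fname):
    if fname == "":
        return "."           # "" as a directory is the cwd
    if fname == "/":
        return "/"           # root keeps its slash
    return fname[:-1] if fname.endswith("/") else fname

print(file_name_as_directory("src/des"), directory_as_file_name(""))   # src/des/ .
```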
\begin{defundesc} {file-name-absolute?} {fname} \boolean
Does \var{fname} begin with a root or \ex{\~} component?
(Recognising \ex{\~} as a home-directory specification
is an extension of {\Posix} rules.)
%
\begin{exampletable}
\ex{(file-name-absolute? "/usr/shivers")} & {\sharpt} \\
\ex{(file-name-absolute? "src/des")} & {\sharpf} \\
\ex{(file-name-absolute? "\~/src/des")} & {\sharpt} \\[2ex]
%
\header{Non-obvious case:}
\ex{(file-name-absolute? "")} & {\sharpt} (\ie, root)
\end{exampletable}
\end{defundesc}
\begin{defundesc} {file-name-directory} {fname} {{\str} or false}
Return the directory component of \var{fname} in directory form.
If the file-name is already in directory form, return it as-is.
%
\begin{exampletable}
\ex{(file-name-directory "/usr/bdc")} & \ex{"/usr/"} \\
{\ex{(file-name-directory "/usr/bdc/")}} &
{\ex{"/usr/bdc/"}} \\
\ex{(file-name-directory "bdc/.login")} & \ex{"bdc/"} \\
\ex{(file-name-directory "main.c")} & \ex{""} \\[2ex]
%
\header{Root has no directory component:}
\ex{(file-name-directory "/")} & \ex{""} \\
\ex{(file-name-directory "")} & \ex{""}
\end{exampletable}
\end{defundesc}
\begin{defundesc} {file-name-nondirectory} {fname} \str
Return the non-directory component of \var{fname}.
%
\begin{exampletable}
{\ex{(file-name-nondirectory "/usr/ian")}} &
{\ex{"ian"}} \\
\ex{(file-name-nondirectory "/usr/ian/")} & \ex{""} \\
{\ex{(file-name-nondirectory "ian/.login")}} &
{\ex{".login"}} \\
\ex{(file-name-nondirectory "main.c")} & \ex{"main.c"} \\
\ex{(file-name-nondirectory "")} & \ex{""} \\
\ex{(file-name-nondirectory "/")} & \ex{"/"}
\end{exampletable}
\end{defundesc}
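Both procedures split \var{fname} at its last slash, with root as the special case. A Python sketch consistent with the two tables above (hypothetical names; unlike the documented signature of \ex{file-name-directory}, this sketch never returns {\sharpf}):

```python
def file_name_directory(fname):
    """Everything up to and including the last slash; root has none."""
    if fname == "/":
        return ""
    return fname[:fname.rfind("/") + 1]     # rfind gives -1 when there is no slash

def file_name_nondirectory(fname):
    """Everything after the last slash; root is its own non-directory part."""
    if fname == "/":
        return "/"
    return fname[fname.rfind("/") + 1:]

print(file_name_directory("bdc/.login"), file_name_nondirectory("bdc/.login"))
# bdc/ .login
```

For every example in the two tables, the sketch's two results concatenate back to the original file-name.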
\begin{defundesc} {split-file-name} {fname} {{\str} list}
Split a file-name into its components.
%
\begin{exampletable}
\splitline{\ex{(split-file-name "src/des/main.c")}}
{\ex{("src" "des" "main.c")}} \\[1.5ex]
%
\splitline{\ex{(split-file-name "/src/des/main.c")}}
{\ex{("" "src" "des" "main.c")}} \\[1.5ex]
%
\splitline{\ex{(split-file-name "main.c")}} {\ex{("main.c")}} \\[1.5ex]
%
\splitline{\ex{(split-file-name "/")}} {\ex{("")}}
\end{exampletable}
\end{defundesc}
\begin{defundesc} {path-list->file-name} {path-list [dir]} \str
Inverse of \ex{split-file-name}.
\begin{code}
(path-list->file-name '("src" "des" "main.c"))
{\evalto} "src/des/main.c"
(path-list->file-name '("" "src" "des" "main.c"))
{\evalto} "/src/des/main.c"
\cb
{\rm{}Optional \var{dir} arg anchors relative path-lists:}
(path-list->file-name '("src" "des" "main.c")
"/usr/shivers")
{\evalto} "/usr/shivers/src/des/main.c"\end{code}
%
A useful value for the optional \var{dir} argument is \ex{(cwd)}.
\end{defundesc}
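A Python sketch of \ex{split-file-name} and its inverse. It assumes names without repeated slashes: the text says runs of slashes mean a single slash, but this sketch does not collapse them. The optional directory is called \ex{directory} here to avoid shadowing Python's built-in \ex{dir}; all names are hypothetical.

```python
def split_file_name(fname):
    """Split at slashes; an absolute name yields a leading "" component."""
    if fname == "/":
        return [""]                          # root is the lone root component
    return fname.split("/")

def path_list_to_file_name(path_list, directory=None):
    """Join components with slashes, anchoring relative lists at directory."""
    if path_list == [""]:
        return "/"
    name = "/".join(path_list)
    if directory is not None and path_list and path_list[0] != "":
        prefix = directory if directory.endswith("/") else directory + "/"
        name = prefix + name
    return name

print(path_list_to_file_name(split_file_name("src/des/main.c"), "/usr/shivers"))
# /usr/shivers/src/des/main.c
```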
\begin{defundesc} {file-name-extension} {fname} \str
Return the file-name's extension.
%
\begin{exampletable}
\ex{(file-name-extension "main.c")} & \ex{".c"} \\
\ex{(file-name-extension "main.c.old")} & \ex{".old"} \\
\ex{(file-name-extension "/usr/shivers")} & \ex{""}
\end{exampletable}
%
\begin{exampletable}
\header{Weird cases:}
\ex{(file-name-extension "foo.")} & \ex{"."} \\
\ex{(file-name-extension "foo..")} & \ex{"."}
\end{exampletable}
%
\begin{exampletable}
\header{Dot files are not extensions:}
\ex{(file-name-extension "/usr/shivers/.login")} & \ex{""}
\end{exampletable}
\end{defundesc}
\begin{defundesc} {file-name-sans-extension} {fname} \str
Return everything but the extension.
%
\begin{exampletable}
\ex{(file-name-sans-extension "main.c")} & \ex{"main"} \\
\ex{(file-name-sans-extension "main.c.old")} & \ex{"main.c"} \\
\splitline{\ex{(file-name-sans-extension "/usr/shivers")}}
{\ex{"/usr/shivers"}}
\end{exampletable}
%
\begin{exampletable}
\header{Weird cases:}
\ex{(file-name-sans-extension "foo.")} & \ex{"foo"} \\
\ex{(file-name-sans-extension "foo..")} & \ex{"foo."} \\[2ex]
%
\header{Dot files are not extensions:}
\splitline{\ex{(file-name-sans-extension "/usr/shivers/.login")}}
{\ex{"/usr/shivers/.login"}}
\end{exampletable}
Note that appending the result of \ex{file-name-extension} to that of
{\ttt file\=name\=sans\=extension} produces the original file-name in all cases.
\end{defundesc}
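All of the cases above, including the weird ones, are consistent with one rule: the extension starts at the last dot of the non-directory part, unless that dot is the part's first character (a dot file). A Python sketch of that rule; the rule is inferred from the examples, not taken from scsh's source, and the names are hypothetical.

```python
def _extension_start(fname):
    """Index where the extension begins, or len(fname) if there is none."""
    base = fname.rfind("/") + 1          # start of the non-directory part
    dot = fname.rfind(".", base)
    # A dot at the very start of the non-directory part marks a dot file.
    return dot if dot > base else len(fname)

def file_name_extension(fname):
    return fname[_extension_start(fname):]

def file_name_sans_extension(fname):
    return fname[:_extension_start(fname)]

print(file_name_extension("main.c.old"), file_name_sans_extension("main.c.old"))
# .old main.c
```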
\begin{defundesc} {parse-file-name} {fname} {[dir name extension]}
Let $f$ be \ex{(file-name-nondirectory \var{fname})}.
This function returns the three values:
\begin{itemize}
\item \ex{(file-name-directory \var{fname})}
\item \ex{(file-name-sans-extension \var{f})}
\item \ex{(file-name-extension \var{f}\/)}
\end{itemize}
The inverse of \ex{parse-file-name}, in all cases, is \ex{string-append}.
The boundary case of \ex{/} was chosen to preserve this inverse.
\end{defundesc}
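A Python sketch of \ex{parse-file-name} and \ex{replace-extension}, returning a tuple for the three values. The extension rule is the one implied by the examples under \ex{file-name-extension}, and the Python names are hypothetical.

```python
def parse_file_name(fname):
    """Return (directory, base name, extension) as three strings."""
    if fname == "/":
        directory, f = "", "/"           # boundary case chosen to keep the inverse
    else:
        slash = fname.rfind("/")
        directory, f = fname[:slash + 1], fname[slash + 1:]
    dot = f.rfind(".")
    cut = dot if dot > 0 else len(f)     # a leading dot marks a dot file
    return directory, f[:cut], f[cut:]

def replace_extension(fname, ext):
    directory, name, _ = parse_file_name(fname)
    return directory + name + ext

print(parse_file_name("src/des/main.c"))   # ('src/des/', 'main', '.c')
```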
\begin{defundesc} {replace-extension} {fname ext} \str
This procedure replaces \var{fname}'s extension with \var{ext}.
It is exactly equivalent to
\codex{(string-append (file-name-sans-extension \var{fname}) \var{ext})}
\end{defundesc}
\defun{simplify-file-name}{fname}\str
\begin{desc}
Removes leading and internal occurrences of dot.
A trailing dot is left alone, as the parent could be a symlink.
Removes internal and trailing double-slashes.
A leading double-slash is left alone, in accordance with {\Posix}.
However, three or more leading slashes are reduced to a single slash,
also in accordance with {\Posix}.
Double-dots (parent directory) are left alone, in case they come after
symlinks or appear in a \ex{/../\var{machine}/\ldots} ``super-root'' form
(which {\Posix} permits).
\end{desc}
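One reading of these rules, sketched in Python. Edge cases the text does not cover (for example, a name made only of dot components, or a dot just before a trailing slash) are resolved by assumption here and may differ from scsh; the Python name is hypothetical.

```python
def simplify_file_name(fname):
    rest = fname.lstrip("/")
    nslash = len(fname) - len(rest)
    # Keep exactly two leading slashes; reduce one, or three or more, to one.
    root = "//" if nslash == 2 else ("/" if nslash > 0 else "")
    trailing_slash = rest.endswith("/")
    comps = [c for c in rest.split("/") if c != ""]   # collapses runs of slashes
    last = len(comps) - 1
    # Drop "." components, except a final one (its parent may be a symlink).
    # ".." components are always kept.
    comps = [c for i, c in enumerate(comps)
             if c != "." or (i == last and not trailing_slash)]
    body = "/".join(comps)
    if trailing_slash and comps:
        body += "/"
    return root + body

print(simplify_file_name("///usr/./lib//x/."))   # /usr/lib/x/.
```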
\defun{resolve-file-name}{fname [dir]}\str
\begin{desc}
\begin{itemize}
\item Do \ex{\~} expansion.
\item If \var{dir} is given,
convert a relative file-name to an absolute file-name,
relative to directory \var{dir}.
\end{itemize}
\end{desc}
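A rough Python approximation, using the standard \ex{os.path.expanduser} for \ex{\~} expansion. This is only an analogy: scsh performs its own tilde expansion, and the way this sketch joins \var{dir} to a relative name is an assumption. The optional directory is called \ex{directory} to avoid shadowing Python's built-in \ex{dir}.

```python
import os

def resolve_file_name(fname, directory=None):
    """Expand a leading ~ or ~user, then anchor relative names at directory."""
    fname = os.path.expanduser(fname)    # on POSIX, uses $HOME or the password database
    if directory is not None and not fname.startswith("/"):
        prefix = directory if directory.endswith("/") else directory + "/"
        fname = prefix + fname
    return fname

print(resolve_file_name("src/des", "/usr/shivers"))   # /usr/shivers/src/des
```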
\begin{defundesc} {expand-file-name} {fname [dir]} \str
Resolve and simplify the file-name.
\end{defundesc}
\begin{defundesc} {home-dir} {[user]} \str
\ex{home-dir} returns \var{user}'s home directory.
\var{User} defaults to the current user.
\begin{exampletable}
\ex{(home-dir)} & \ex{"/user1/lecturer/shivers"} \\
\ex{(home-dir "ctkwan")} & \ex{"/user0/research/ctkwan"}
\end{exampletable}
\end{defundesc}
\begin{defundesc} {home-file} {[user] fname} \str
Returns file-name \var{fname} relative to \var{user}'s home directory;
\var{user} defaults to the current user.
%
\begin{exampletable}
\ex{(home-file "man")} & \ex{"/usr/shivers/man"} \\
\ex{(home-file "fcmlau" "man")} & \ex{"/usr/fcmlau/man"}
\end{exampletable}
\end{defundesc}
The general \ex{substitute-env-vars} string procedure,
defined in the previous section,
is also frequently useful for expanding file-names.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{ASCII encoding}
\defun {char->ascii}{\character} \integer
\defunx {ascii->char}{\integer} \character
\begin{desc}
These are identical to \ex{char->integer} and \ex{integer->char} except that
they use the {\Ascii} encoding.
\end{desc}
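Python's \ex{ord} and \ex{chr} agree with {\Ascii} on codes 0 through 127, so a sketch of these two procedures needs only a range check. Raising an error outside that range is an assumption of this sketch, not documented scsh behavior.

```python
def char_to_ascii(ch):
    code = ord(ch)
    if code > 127:
        raise ValueError(f"{ch!r} is not an ASCII character")   # assumed behavior
    return code

def ascii_to_char(code):
    if not 0 <= code <= 127:
        raise ValueError(f"{code} is not an ASCII code")        # assumed behavior
    return chr(code)

print(char_to_ascii("A"), ascii_to_char(97))   # 65 a
```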
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Character sets}
\label{sec:char-sets}
Scsh provides a \ex{char-set} type for expressing sets of characters.
These sets are used by some of the delimited-input procedures
(section~\ref{sec:field-reader}).
The character set package that scsh uses was taken from Project Mac's
MIT Scheme.
\defun{char-set?}{x}\boolean
\begin{desc}
Returns true if the object \var{x} is a character set.
\end{desc}
\subsection{Creating character sets}
\defun{char-set}{\vari{char}1\ldots}{char-set}
\begin{desc}
Return a character set containing the given characters.
\end{desc}
\defun{chars->char-set}{chars}{char-set}
\begin{desc}
Return a character set containing the characters in the list \var{chars}.
\end{desc}
\defun{string->char-set}{s}{char-set}
\begin{desc}
Return a character set containing the characters in the string \var{s}.
\end{desc}
\defun{predicate->char-set}{pred}{char-set}
\begin{desc}
Returns a character set containing every character \var{c} such that
\ex{(\var{pred} \var{c})} returns true.
\end{desc}
\defun{ascii-range->char-set}{lower upper}{char-set}
\begin{desc}
Returns a character set containing every character whose {\Ascii}
code lies in the half-open range $[\var{lower},\var{upper})$.
\end{desc}
\subsection{Querying character sets}
\defun {char-set-members}{char-set}{character-list}
\begin{desc}
This procedure returns a list of the members of \var{char-set}.
\end{desc}
\defun{char-set-contains?}{char-set char}\boolean
\begin{desc}
This procedure tests \var{char} for membership in set \var{char-set}.
\remark{Previous releases of scsh called this procedure \ex{char-set-member?},
reversing the order of the arguments.
This made sense, but was unfortunately the reverse of the order in which the
arguments appear in MIT Scheme.
A reasonable argument order was not backwards-compatible with MIT Scheme;
on the other hand, the MIT Scheme argument order was counter-intuitive
and at odds with common mathematical notation and the \ex{member} family
of R4RS procedures.
We sought to escape the dilemma by shifting to a new name.}
\end{desc}
\subsection{Character set algebra}
\defun {char-set-invert}{char-set}{char-set}
\defunx{char-set-union}{\vari{char-set}1 \vari{char-set}2}{char-set}
\defunx{char-set-intersection}{\vari{char-set}1 \vari{char-set}2}{char-set}
\defunx{char-set-difference}{\vari{char-set}1 \vari{char-set}2}{char-set}
\begin{desc}
These procedures implement set complement, union, intersection, and difference
for character sets.
\end{desc}
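A Python sketch of the creation, query, and algebra procedures above, modeling a character set as a \ex{frozenset} of one-character strings. The 256-character universe used for \ex{char-set-invert} and \ex{predicate->char-set} is an assumption, the member-list order is a choice of the sketch, and the Python names are hypothetical transliterations.

```python
# Assumption: an 8-bit universe of characters, chr(0) through chr(255).
UNIVERSE = frozenset(chr(i) for i in range(256))

def char_set(*chars):
    return frozenset(chars)

def chars_to_char_set(chars):
    return frozenset(chars)

def string_to_char_set(s):
    return frozenset(s)

def predicate_to_char_set(pred):
    return frozenset(c for c in UNIVERSE if pred(c))

def ascii_range_to_char_set(lower, upper):
    return frozenset(chr(i) for i in range(lower, upper))   # half-open [lower, upper)

def char_set_members(cs):
    return sorted(cs)                    # list order is a choice of this sketch

def char_set_contains(cs, ch):
    return ch in cs                      # set first, then character

def char_set_invert(cs):
    return UNIVERSE - cs

def char_set_union(cs1, cs2):
    return cs1 | cs2

def char_set_intersection(cs1, cs2):
    return cs1 & cs2

def char_set_difference(cs1, cs2):
    return cs1 - cs2

print(char_set_members(char_set_difference(string_to_char_set("abcde"),
                                           string_to_char_set("aeiou"))))
# ['b', 'c', 'd']
```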
\subsection{Standard character sets}
Several character sets are predefined for convenience:
\begin{center}
\newcommand{\entry}[1]{\ex{#1}\index{#1}}
\begin{tabular}{|ll|}
\hline
\entry{char-set:upper-case} & A--Z \\
\entry{char-set:lower-case} & a--z \\
\entry{char-set:numeric} & 0--9 \\
\entry{char-set:whitespace} & space, newline, tab, linefeed, page,
return \\
\entry{char-set:not-whitespace} & Complement of \ex{char-set:whitespace} \\
\entry{char-set:alphabetic} & A--Z and a--z \\
\entry{char-set:alphanumeric} & Alphabetic or numeric \\
\entry{char-set:graphic} & Printing characters and space \\
\hline
\end{tabular}
\end{center}
\defun {char-upper-case?}\character\boolean
\defunx{char-lower-case?}\character\boolean
\defunx{char-numeric?}\character\boolean
\defunx{char-whitespace?}\character\boolean
\defunx{char-alphabetic?}\character\boolean
\defunx{char-alphanumeric?}\character\boolean
\defunx{char-graphic?}\character\boolean
\begin{desc}
These predicates are defined in terms of the above character sets.
\end{desc}
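A Python sketch of the table of standard sets and of how the predicates can be derived from them. Set membership follows the table under an {\Ascii} reading; taking ``graphic'' to mean codes 32 through 126, and taking the complement of the whitespace set over 256 characters, are assumptions. The Python names are hypothetical.

```python
import string

CHAR_SET_UPPER_CASE = frozenset(string.ascii_uppercase)
CHAR_SET_LOWER_CASE = frozenset(string.ascii_lowercase)
CHAR_SET_NUMERIC = frozenset(string.digits)
# Space, newline (= linefeed), tab, page (form feed), and return.
CHAR_SET_WHITESPACE = frozenset(" \n\t\f\r")
# Assumption: complement taken over an 8-bit universe.
CHAR_SET_NOT_WHITESPACE = frozenset(chr(i) for i in range(256)) - CHAR_SET_WHITESPACE
CHAR_SET_ALPHABETIC = CHAR_SET_UPPER_CASE | CHAR_SET_LOWER_CASE
CHAR_SET_ALPHANUMERIC = CHAR_SET_ALPHABETIC | CHAR_SET_NUMERIC
# Assumption: printing characters and space are ASCII codes 32 through 126.
CHAR_SET_GRAPHIC = frozenset(chr(i) for i in range(32, 127))

def _predicate(cs):
    """Build a one-character membership test for the set cs."""
    def pred(ch):
        return ch in cs
    return pred

is_upper_case = _predicate(CHAR_SET_UPPER_CASE)
is_lower_case = _predicate(CHAR_SET_LOWER_CASE)
is_numeric = _predicate(CHAR_SET_NUMERIC)
is_whitespace = _predicate(CHAR_SET_WHITESPACE)
is_alphabetic = _predicate(CHAR_SET_ALPHABETIC)
is_alphanumeric = _predicate(CHAR_SET_ALPHANUMERIC)
is_graphic = _predicate(CHAR_SET_GRAPHIC)

print(is_alphanumeric("7"), is_alphabetic("7"))   # True False
```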