\chapter{Strings and characters}

Scsh provides a set of procedures for processing strings and characters.
The procedures provided match regular expressions, search strings,
parse file-names, and manipulate sets of characters.

Also see chapter \ref{chapt:fr-awk} on record I/O, field parsing,
and the awk loop.
The procedures documented there allow you to read character-delimited
records from ports, use regular expressions to split the records into fields
(for example, splitting a string at every occurrence of colon or white-space),
and loop over streams of these records in a convenient way.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{String manipulation}
\label{sec:stringmanip}

Strings are the basic communication medium for {\Unix} processes, so a
shell language must have reasonable facilities for manipulating them.

\subsection{Regular expressions}
\label{sec:regexps}

The following functions perform regular expression matching.
The code uses Henry Spencer's regular expression package.

\begin{defundesc}{string-match} {regexp string [start]} {match or false}
Search \var{string} starting at position \var{start}, looking for a match
for \var{regexp}. If a match is found, return a match structure describing
the match, otherwise {\sharpf}. \var{Start} defaults to 0.
\end{defundesc}

\begin{defundesc} {regexp-match?} {obj} \boolean
Is the object a regular expression match?
\end{defundesc}

\begin{defundesc} {match:start} {match [match-number]} \fixnum
Returns the start position of the match denoted by \var{match-number}.
Match number 0 is the whole regexp. Each higher number denotes the text
matched by the corresponding \ex{(\ldots)} section.
\var{Match-number} defaults to 0.
\end{defundesc}

\begin{defundesc} {match:end} {match [match-number]} \fixnum
Returns the end position of the match denoted by \var{match-number}.
\var{Match-number} defaults to 0 (the whole match).
\end{defundesc}

\begin{defundesc} {match:substring} {match [match-number]} \str
Returns the substring matched by match \var{match-number}.
\var{Match-number} defaults to 0 (the whole match).
\end{defundesc}
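
The matching procedures above might be combined as in the following sketch.
The pattern and the input string are illustrative assumptions, and the results
shown assume egrep-style syntax in which \ex{(\ldots)} groups and \ex{+}
repeats.
\begin{code}
;;; Hypothetical example: pick apart a user@host address.
(define m (string-match "([a-z]+)@([a-z.]+)"
                        "mail shivers@mit.edu now"))

(regexp-match? m)      {\evalto} {\sharpt}
(match:substring m)    {\evalto} "shivers@mit.edu"
(match:substring m 1)  {\evalto} "shivers"
(match:substring m 2)  {\evalto} "mit.edu"
(match:start m)        {\evalto} 5\end{code}
%
Since \ex{string-match} returns {\sharpf} when nothing matches, a program
should test the result before handing it to the \ex{match:} accessors.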

\remark{
What do these guys do when there is no match corresponding to
\var{match-number}?
Return {\sharpf} or signal error? {\sharpf} probably best.}

Regular expression matching compiles patterns into special data
structures which can be efficiently used to match against strings.
The overhead of compiling patterns that will be used for multiple
searches can be avoided by these lower-level routines:
%
\begin{defundesc} {make-regexp} {str} {re}
Generate a compiled regular expression from the given string.
\end{defundesc}

\begin{defundesc} {regexp?} {obj} \boolean
Is the object a regular expression?
\end{defundesc}

\begin{defundesc} {regexp-exec} {regexp str [start]} {match or false}
Apply the regular expression \var{regexp} to the string \var{str} starting
at position \var{start}. If the match succeeds it returns a regexp-match,
otherwise {\sharpf}. \var{Start} defaults to 0.
\end{defundesc}
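
When one pattern is applied to many strings, the intended usage is to compile
it once and reuse the result. The following is only a sketch:
\ex{count-matches} is a hypothetical helper, not part of scsh, and the
pattern and strings are made up for illustration.
\begin{code}
;;; Compile the pattern once, outside the loop.
(define digits (make-regexp "[0-9]+"))

;;; COUNT-MATCHES is a hypothetical helper: it counts the
;;; strings in which the compiled regexp RE matches somewhere.
(define (count-matches re strings)
  (let loop ((strings strings) (n 0))
    (cond ((null? strings) n)
          ((regexp-exec re (car strings))
           (loop (cdr strings) (+ n 1)))
          (else (loop (cdr strings) n)))))

(count-matches digits '("abc" "a1" "42" "x"))
    {\evalto} 2\end{code}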

\begin{remarkenv}
The truth: S48 doesn't have the facilities for extending the garbage
collector to malloc'd C storage (unlike elk). So we do not really export
regular expression compilation. What we currently do is this:
\begin{tightcode}
(define regexp? string?)
(define (make-regexp str) str)
(define (regexp-exec regexp str [start])
  (string-match regexp str [start]))\end{tightcode}
%
This could be improved upon in another implementation (like elk).
\end{remarkenv}


\defun{regexp-quote}{str}{\str}
\begin{desc}
Returns a regular expression that matches the string \var{str} exactly.
In other words, it quotes the regular expression, prepending backslashes
to all the special regexp characters in \var{str}.
\begin{code}
(regexp-quote "*Hello* world.")
{\evalto}"\\*Hello\\* world\\."\end{code}
\end{desc}

\oops{Scsh regex matching doesn't currently flag un-matched subexpressions
in the \ex{match:start}, \ex{match:end}, and \ex{match:substring} functions.
This needs to be fixed.}

\subsection{Other string manipulation facilities}

\defun {index} {string char [start]} {{\fixnum} or false}
\defunx {rindex} {string char [start]} {{\fixnum} or false}
\begin{desc}
These procedures search through \var{string} looking for an occurrence
of character \var{char}. \ex{index} searches left-to-right; \ex{rindex}
searches right-to-left.

\ex{index} returns the smallest index $i$ of \var{string} greater
than or equal to \var{start} such that $\var{string}[i] = \var{char}$.
The default for \var{start} is zero. If there is no such match,
\ex{index} returns false.

\ex{rindex} returns the largest index $i$ of \var{string} less than
\var{start} such that $\var{string}[i] = \var{char}$.
The default for \var{start} is \ex{(string-length \var{string})}.
If there is no such match, \ex{rindex} returns false.
\end{desc}
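
A few calls, worked out by hand from the definitions above, may make the
role of \var{start} clearer. The file-name string is only an illustration.
\begin{code}
(index  "src/des/main.c" \#\\/)     {\evalto} 3
(index  "src/des/main.c" \#\\/ 4)   {\evalto} 7
(rindex "src/des/main.c" \#\\/)     {\evalto} 7
(rindex "src/des/main.c" \#\\/ 7)   {\evalto} 3
(index  "main.c" \#\\/)             {\evalto} {\sharpf}\end{code}
%
Note the asymmetry: \ex{index} considers position \var{start} itself,
while \ex{rindex} only considers positions strictly before it.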

I should probably snarf all the MIT Scheme string functions, and stick them
in a package. {\Unix} programs need to mung character strings a lot.

MIT string match commands:
\begin{tightcode}
[sub]string-match-{forward,backward}[-ci]
[sub]string-{prefix,suffix}[-ci]?
[sub]string-find-{next,previous}-char[-ci]
[sub]string-find-{next,previous}-char-in-set
[sub]string-replace[!]
\ldots\etc\end{tightcode}
These are not currently provided.

\begin{defundesc} {substitute-env-vars} {fname} \str
Replace occurrences of environment variables with their values.
An environment variable is denoted by a dollar sign followed either by
a name made of alphanumeric chars and underscores, or by such a name
surrounded by braces.

\begin{exampletable}
\splitline{\ex{(substitute-env-vars "\$USER/.login")}}
{\ex{"shivers/.login"}} \\
\cd{(substitute-env-vars "$\{USER\}_log")} & \cd{"shivers_log"}
\end{exampletable}
\end{defundesc}

\subsection{Manipulating file-names}
\label{sec:filenames}

These procedures do not access the file-system at all; they merely operate
on file-name strings. Much of this structure is patterned after the GNU Emacs
design. Perhaps a more sophisticated system would be better, something
like the pathname abstractions of {\CommonLisp} or MIT Scheme. However,
being {\Unix}-specific, we can be a little less general.

\subsubsection{Terminology}
These procedures carefully adhere to the {\Posix} standard for file-name
resolution, which occasionally entails some slightly odd things.
This section will describe these rules, and give some basic terminology.

A \emph{file-name} is either the file-system root (``/''),
or a series of slash-terminated directory components, followed by
a file component.
Root is the only file-name that may end in slash.
Some examples:
\begin{center}
\begin{tabular}{lll}
File name & Dir components & File component \\\hline
\ex{src/des/main.c} & \ex{("src" "des")} & \ex{"main.c"} \\
\ex{/src/des/main.c} & \ex{("" "src" "des")} & \ex{"main.c"} \\
\ex{main.c} & \ex{()} & \ex{"main.c"} \\
\end{tabular}
\end{center}

Note that the relative filename \ex{src/des/main.c} and the absolute filename
\ex{/src/des/main.c} are distinguished by the presence of the root component
\ex{""} in the absolute path.

Multiple embedded slashes within a path have the same meaning as
a single slash.
More than two leading slashes at the beginning of a path have the same
meaning as a single leading slash---they indicate that the file-name
is an absolute one, with the path leading from root.
However, {\Posix} permits the OS to give special meaning to
\emph{two} leading slashes.
For this reason, the routines in this section do not simplify two leading
slashes to a single slash.

A file-name in \emph{directory form} is either a file-name terminated by
a slash, \eg, ``\ex{/src/des/}'', or the empty string, ``''.
The empty string corresponds to the current working directory, whose
file-name is dot (``\ex{.}'').
Working backwards from the append-a-slash rule,
we extend the syntax of {\Posix} file-names to define the empty string
to be a file-name form of the root directory ``\ex{/}''.
(However, ``\ex{/}'' is also acceptable as a file-name form for root.)
So the empty string has two interpretations:
as a file-name form, it is the file-system root;
as a directory form, it is the current working directory.
Slash is also an ambiguous form: \ex{/} is both a directory-form and
a file-name form.

The directory form of a file-name is very rarely used.
Almost all of the procedures in scsh name directories by giving
their file-name form (without the trailing slash), not their directory form.
So, you say ``\ex{/usr/include}'', and ``\ex{.}'', not
``\ex{/usr/include/}'' and ``''.
The sole exceptions are
\ex{file-name-as-directory} and \ex{directory-as-file-name},
whose jobs are to convert back-and-forth between these forms,
and \ex{file-name-directory}, whose job it is to split out the
directory portion of a file-name.
However, most procedures that expect a directory argument will coerce
a file-name in directory form to file-name form if it does not have
a trailing slash.
Bear in mind that the ambiguous case, empty string, will be
interpreted in file-name form, \ie, as root.

\subsubsection{Procedures}

\begin{defundesc} {file-name-as-directory} {fname} \str
Convert a file-name to directory form.
Basically, add a trailing slash if needed:
\begin{exampletable}
\ex{(file-name-as-directory "src/des")} & \ex{"src/des/"} \\
\ex{(file-name-as-directory "src/des/")} & \ex{"src/des/"} \\[2ex]
%
\header{\ex{.}, \ex{/}, and \ex{""} are special:}
\ex{(file-name-as-directory ".")} & \ex{""} \\
\ex{(file-name-as-directory "/")} & \ex{"/"} \\
\ex{(file-name-as-directory "")} & \ex{"/"}
\end{exampletable}
\end{defundesc}

\begin{defundesc} {directory-as-file-name} {fname} \str
Convert a directory to a simple file-name.
Basically, kill a trailing slash if one is present:
\begin{exampletable}
\ex{(directory-as-file-name "foo/bar/")} & \ex{"foo/bar"} \\[2ex]
%
\header{\ex{/} and \ex{""} are special:}
\ex{(directory-as-file-name "/")} & \ex{"/"} \\
\ex{(directory-as-file-name "")} & \ex{"."} (\ie, the cwd) \\
\end{exampletable}
\end{defundesc}

\begin{defundesc} {file-name-absolute?} {fname} \boolean
Does \var{fname} begin with a root or \ex{\~} component?
(Recognising \ex{\~} as a home-directory specification
is an extension of {\Posix} rules.)
%
\begin{exampletable}
\ex{(file-name-absolute? "/usr/shivers")} & {\sharpt} \\
\ex{(file-name-absolute? "src/des")} & {\sharpf} \\
\ex{(file-name-absolute? "\~/src/des")} & {\sharpt} \\[2ex]
%
\header{Non-obvious case:}
\ex{(file-name-absolute? "")} & {\sharpt} (\ie, root)
\end{exampletable}
\end{defundesc}


\begin{defundesc} {file-name-directory} {fname} {{\str} or false}
Return the directory component of \var{fname} in directory form.
If the file-name is already in directory form, return it as-is.
%
\begin{exampletable}
\ex{(file-name-directory "/usr/bdc")} & \ex{"/usr/"} \\
{\ex{(file-name-directory "/usr/bdc/")}} &
{\ex{"/usr/bdc/"}} \\
\ex{(file-name-directory "bdc/.login")} & \ex{"bdc/"} \\
\ex{(file-name-directory "main.c")} & \ex{""} \\[2ex]
%
\header{Root has no directory component:}
\ex{(file-name-directory "/")} & \ex{""} \\
\ex{(file-name-directory "")} & \ex{""}
\end{exampletable}
\end{defundesc}


\begin{defundesc} {file-name-nondirectory} {fname} \str
Return the non-directory component of \var{fname}.
%
\begin{exampletable}
{\ex{(file-name-nondirectory "/usr/ian")}} &
{\ex{"ian"}} \\
\ex{(file-name-nondirectory "/usr/ian/")} & \ex{""} \\
{\ex{(file-name-nondirectory "ian/.login")}} &
{\ex{".login"}} \\
\ex{(file-name-nondirectory "main.c")} & \ex{"main.c"} \\
\ex{(file-name-nondirectory "")} & \ex{""} \\
\ex{(file-name-nondirectory "/")} & \ex{"/"}
\end{exampletable}
\end{defundesc}


\begin{defundesc} {split-file-name} {fname} {{\str} list}
Split a file-name into its components.
%
\begin{exampletable}
\splitline{\ex{(split-file-name "src/des/main.c")}}
{\ex{("src" "des" "main.c")}} \\[1.5ex]
%
\splitline{\ex{(split-file-name "/src/des/main.c")}}
{\ex{("" "src" "des" "main.c")}} \\[1.5ex]
%
\splitline{\ex{(split-file-name "main.c")}} {\ex{("main.c")}} \\[1.5ex]
%
\splitline{\ex{(split-file-name "/")}} {\ex{("")}}
\end{exampletable}
\end{defundesc}


\begin{defundesc} {path-list->file-name} {path-list [dir]} \str
Inverse of \ex{split-file-name}.
\begin{code}
(path-list->file-name '("src" "des" "main.c"))
{\evalto} "src/des/main.c"
(path-list->file-name '("" "src" "des" "main.c"))
{\evalto} "/src/des/main.c"
\cb
{\rm{}Optional \var{dir} arg anchors relative path-lists:}
(path-list->file-name '("src" "des" "main.c")
                      "/usr/shivers")
{\evalto} "/usr/shivers/src/des/main.c"\end{code}
%
A useful value for the optional \var{dir} argument is \ex{(cwd)}.
\end{defundesc}


\begin{defundesc} {file-name-extension} {fname} \str
Return the file-name's extension.
%
\begin{exampletable}
\ex{(file-name-extension "main.c")} & \ex{".c"} \\
\ex{(file-name-extension "main.c.old")} & \ex{".old"} \\
\ex{(file-name-extension "/usr/shivers")} & \ex{""}
\end{exampletable}
%
\begin{exampletable}
\header{Weird cases:}
\ex{(file-name-extension "foo.")} & \ex{"."} \\
\ex{(file-name-extension "foo..")} & \ex{"."}
\end{exampletable}
%
\begin{exampletable}
\header{Dot files are not extensions:}
\ex{(file-name-extension "/usr/shivers/.login")} & \ex{""}
\end{exampletable}
\end{defundesc}


\begin{defundesc} {file-name-sans-extension} {fname} \str
Return everything but the extension.
%
\begin{exampletable}
\ex{(file-name-sans-extension "main.c")} & \ex{"main"} \\
\ex{(file-name-sans-extension "main.c.old")} & \ex{"main.c"} \\
\splitline{\ex{(file-name-sans-extension "/usr/shivers")}}
{\ex{"/usr/shivers"}}
\end{exampletable}
%
\begin{exampletable}
\header{Weird cases:}
\ex{(file-name-sans-extension "foo.")} & \ex{"foo"} \\
\ex{(file-name-sans-extension "foo..")} & \ex{"foo."} \\[2ex]
%
\header{Dot files are not extensions:}
\splitline{\ex{(file-name-sans-extension "/usr/shivers/.login")}}
{\ex{"/usr/shivers/.login"}}
\end{exampletable}

Note that appending the result of \ex{file-name-extension} to the result of
{\ttt file\=name\=sans\=extension} in all cases produces the original file-name.
\end{defundesc}
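
The round-trip property just stated can be spelled out as a small sketch.
\ex{rejoin} is a hypothetical helper introduced only for this illustration;
the results follow from the example tables above.
\begin{code}
;;; REJOIN is a hypothetical helper, not part of scsh.
;;; It glues the two halves of FNAME back together.
(define (rejoin fname)
  (string-append (file-name-sans-extension fname)
                 (file-name-extension fname)))

(rejoin "main.c.old")           {\evalto} "main.c.old"
(rejoin "foo..")                {\evalto} "foo.."
(rejoin "/usr/shivers/.login")  {\evalto} "/usr/shivers/.login"\end{code}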


\begin{defundesc} {parse-file-name} {fname} {[dir name extension]}
Let $f$ be \ex{(file-name-nondirectory \var{fname})}.
This function returns the three values:
\begin{itemize}
\item \ex{(file-name-directory \var{fname})}
\item \ex{(file-name-sans-extension \var{f}\/)}
\item \ex{(file-name-extension \var{f}\/)}
\end{itemize}
The inverse of \ex{parse-file-name}, in all cases, is \ex{string-append}.
The boundary case of \ex{/} was chosen to preserve this inverse.
\end{defundesc}
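
As a sketch of how the three values might be received, the following uses the
standard Scheme procedure \ex{call-with-values}, which is assumed here to be
available in scsh. The file-name is only an illustration, and the results are
derived from the definitions above.
\begin{code}
;;; Collect the three values into a list.
(call-with-values
  (lambda () (parse-file-name "/usr/shivers/main.c"))
  list)
    {\evalto} ("/usr/shivers/" "main" ".c")

;;; STRING-APPEND puts the pieces back together.
(call-with-values
  (lambda () (parse-file-name "/usr/shivers/main.c"))
  string-append)
    {\evalto} "/usr/shivers/main.c"\end{code}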

\begin{defundesc} {replace-extension} {fname ext} \str
This procedure replaces \var{fname}'s extension with \var{ext}.
It is exactly equivalent to
\codex{(string-append (file-name-sans-extension \var{fname}) \var{ext})}
\end{defundesc}
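
Some results worked out from that equivalence and the
\ex{file-name-sans-extension} examples; the file-names are illustrative only.
Note that \var{ext} is appended verbatim, so it should include its own
leading dot.
\begin{code}
(replace-extension "main.c" ".o")
    {\evalto} "main.o"
(replace-extension "main.c.old" ".bak")
    {\evalto} "main.c.bak"
(replace-extension "/usr/shivers" ".txt")
    {\evalto} "/usr/shivers.txt"\end{code}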

\defun{simplify-file-name}{fname}\str
\begin{desc}
Removes leading and internal occurrences of dot.
A trailing dot is left alone, as the parent could be a symlink.
Removes internal and trailing double-slashes.
A leading double-slash is left alone, in accordance with {\Posix}.
However, triple and more leading slashes are reduced to a single slash,
in accordance with {\Posix}.
Double-dots (parent directory) are left alone, in case they come after
symlinks or appear in a \ex{/../\var{machine}/\ldots} ``super-root'' form
(which {\Posix} permits).
\end{desc}

\defun{resolve-file-name}{fname [dir]}\str
\begin{desc}
\begin{itemize}
\item Do \ex{\~} expansion.
\item If \var{dir} is given,
convert a relative file-name to an absolute file-name,
relative to directory \var{dir}.
\end{itemize}
\end{desc}

\begin{defundesc} {expand-file-name} {fname [dir]} \str
Resolve and simplify the file-name.
\end{defundesc}
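
Read literally, this suggests that \ex{expand-file-name} behaves like a
composition of the two preceding procedures. The sketch below is an
assumption about that relationship, not the actual definition:
\ex{my-expand-file-name} is a hypothetical name, it takes \var{dir} as a
required rather than optional argument, and the order (resolve first, then
simplify) is inferred from the wording above.
\begin{code}
;;; Hypothetical sketch, for illustration only.
(define (my-expand-file-name fname dir)
  (simplify-file-name (resolve-file-name fname dir)))\end{code}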

\begin{defundesc} {home-dir} {[user]} \str
\ex{home-dir} returns \var{user}'s home directory.
\var{User} defaults to the current user.

\begin{exampletable}
\ex{(home-dir)} & \ex{"/user1/lecturer/shivers"} \\
\ex{(home-dir "ctkwan")} & \ex{"/user0/research/ctkwan"}
\end{exampletable}
\end{defundesc}

\begin{defundesc} {home-file} {[user] fname} \str
Returns file-name \var{fname} relative to \var{user}'s home directory;
\var{user} defaults to the current user.
%
\begin{exampletable}
\ex{(home-file "man")} & \ex{"/usr/shivers/man"} \\
\ex{(home-file "fcmlau" "man")} & \ex{"/usr/fcmlau/man"}
\end{exampletable}
\end{defundesc}

The general \ex{substitute-env-vars} string procedure,
defined in the previous section,
is also frequently useful for expanding file-names.


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{ASCII encoding}

\defun {char->ascii}{\character} \integer
\defunx {ascii->char}{\integer} \character
\begin{desc}
These are identical to \ex{char->integer} and \ex{integer->char} except that
they use the {\Ascii} encoding.
\end{desc}
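
For instance, using the standard {\Ascii} code points for the letters:
\begin{code}
(char->ascii \#\\A)   {\evalto} 65
(ascii->char 97)     {\evalto} \#\\a

;;; Step to the next character in ASCII order.
(ascii->char (+ 1 (char->ascii \#\\a)))
    {\evalto} \#\\b\end{code}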

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Character sets}
\label{sec:char-sets}

Scsh provides a \ex{char-set} type for expressing sets of characters.
These sets are used by some of the delimited input procedures
(section~\ref{sec:field-reader}).
The character set package that scsh uses was taken from Project MAC's
MIT Scheme.

\defun{char-set?}{x}\boolean
\begin{desc}
Returns true if the object \ex{x} is a character set.
\end{desc}

\subsection{Creating character sets}

\defun{char-set}{\vari{char}1\ldots}{char-set}
\begin{desc}
Return a character set containing the given characters.
\end{desc}

\defun{chars->char-set}{chars}{char-set}
\begin{desc}
Return a character set containing the characters in the list \var{chars}.
\end{desc}

\defun{string->char-set}{s}{char-set}
\begin{desc}
Return a character set containing the characters in the string \var{s}.
\end{desc}

\defun{predicate->char-set}{pred}{char-set}
\begin{desc}
Returns a character set containing every character \var{c} such that
\ex{(\var{pred} \var{c})} returns true.
\end{desc}

\defun{ascii-range->char-set}{lower upper}{char-set}
\begin{desc}
Returns a character set containing every character whose {\Ascii}
code lies in the range $[\var{lower},\var{upper}]$ inclusive.
\end{desc}

\subsection{Querying character sets}
\defun {char-set-members}{char-set}{character-list}
\begin{desc}
This procedure returns a list of the members of \var{char-set}.
\end{desc}

\defunx{char-set-member?}{char char-set}\boolean
\begin{desc}
This procedure tests \var{char} for membership in set \var{char-set}.
\end{desc}
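
A short sketch of building sets and querying them. The names \ex{vowels} and
\ex{xyz} are hypothetical, and the argument order of \ex{char-set-member?}
follows the signature given above.
\begin{code}
;;; Three ways of building a set.
(define vowels (string->char-set "aeiou"))
(define xyz    (chars->char-set (string->list "xyz")))
(define digits (predicate->char-set char-numeric?))

(char-set? vowels)                  {\evalto} {\sharpt}
(char-set-member? \#\\e vowels)      {\evalto} {\sharpt}
(char-set-member? \#\\x vowels)      {\evalto} {\sharpf}
(char-set-member? \#\\y xyz)         {\evalto} {\sharpt}
(char-set-member? \#\\7 digits)      {\evalto} {\sharpt}
(length (char-set-members vowels))  {\evalto} 5\end{code}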

\subsection{Character set algebra}
\defun {char-set-invert}{char-set}{char-set}
\defunx{char-set-union}{\vari{char-set}1 \vari{char-set}2}{char-set}
\defunx{char-set-intersection}{\vari{char-set}1 \vari{char-set}2}{char-set}
\defunx{char-set-difference}{\vari{char-set}1 \vari{char-set}2}{char-set}
\begin{desc}
These procedures implement set complement, union, intersection, and difference
for character sets.
\end{desc}
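
These operations compose in the obvious way. In the sketch below,
\ex{ident-chars} and \ex{consonants} are hypothetical names, built from the
predefined sets \ex{char-set:alphanumeric} and \ex{char-set:lower-case}.
\begin{code}
;;; Letters, digits and hyphen.
(define ident-chars
  (char-set-union char-set:alphanumeric
                  (char-set \#\\-)))

;;; Lower-case letters that are not vowels.
(define consonants
  (char-set-difference char-set:lower-case
                       (string->char-set "aeiou")))

(char-set-member? \#\\- ident-chars)  {\evalto} {\sharpt}
(char-set-member? \#\\z consonants)   {\evalto} {\sharpt}
(char-set-member? \#\\e consonants)   {\evalto} {\sharpf}
(char-set-member? \#\\e (char-set-invert consonants))
    {\evalto} {\sharpt}\end{code}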

\subsection{Standard character sets}
Several character sets are predefined for convenience:

\begin{center}
\newcommand{\entry}[1]{\ex{#1}\index{#1}}
\begin{tabular}{|ll|}
\hline
\entry{char-set:upper-case} & A--Z \\
\entry{char-set:lower-case} & a--z \\
\entry{char-set:numeric} & 0--9 \\
\entry{char-set:whitespace} & space, newline, tab, linefeed, page,
return \\
\entry{char-set:not-whitespace} & Complement of \ex{char-set:whitespace} \\
\entry{char-set:alphabetic} & A--Z and a--z \\
\entry{char-set:alphanumeric} & Alphabetic or numeric \\
\entry{char-set:graphic} & Printing characters and space \\
\hline
\end{tabular}
\end{center}


\defun {char-upper-case?}\character\boolean
\defunx{char-lower-case?}\character\boolean
\defunx{char-numeric?}\character\boolean
\defunx{char-whitespace?}\character\boolean
\defunx{char-alphabetic?}\character\boolean
\defunx{char-alphanumeric?}\character\boolean
\defunx{char-graphic?}\character\boolean
\begin{desc}
These predicates are defined in terms of the above character sets.
\end{desc}
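
In other words, each predicate presumably amounts to a membership test on the
corresponding set. A hypothetical equivalent of \ex{char-numeric?}, written
out only to show the relationship:
\begin{code}
;;; MY-CHAR-NUMERIC? is a hypothetical stand-in,
;;; not the definition scsh actually uses.
(define (my-char-numeric? c)
  (char-set-member? c char-set:numeric))

(my-char-numeric? \#\\7)  {\evalto} {\sharpt}
(my-char-numeric? \#\\x)  {\evalto} {\sharpf}\end{code}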