% -*- latex -*-

\chapter{Strings and characters}

Scsh provides a set of procedures for processing strings and characters.
The procedures provided match regular expressions, search strings,
parse file-names, and manipulate sets of characters.

Also see chapters \ref{chapt:sre}, \ref{chapt:rdelim} and \ref{chapt:fr-awk}
on regular-expressions, record I/O, field parsing, and the awk loop.
The procedures documented there allow you to search and pattern-match strings,
read character-delimited records from ports,
use regular expressions to split the records into fields
(for example, splitting a string at every occurrence of colon or white-space),
and loop over streams of these records in a convenient way.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{String manipulation}
\label{sec:stringmanip}

Strings are the basic communication medium for {\Unix} processes, so a
shell language must have reasonable facilities for manipulating them.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Manipulating file-names}
\label{sec:filenames}

These procedures do not access the file-system at all; they merely operate
on file-name strings. Much of this structure is patterned after the GNU Emacs
design. Perhaps a more sophisticated system would be better, something
like the pathname abstractions of {\CommonLisp} or MIT Scheme. However,
being {\Unix}-specific, we can be a little less general.

\subsubsection{Terminology}
These procedures carefully adhere to the {\Posix} standard for file-name
resolution, which occasionally entails some slightly odd things.
This section will describe these rules, and give some basic terminology.

A \emph{file-name} is either the file-system root (``/''),
or a series of slash-terminated directory components, followed by
a file component.
Root is the only file-name that may end in slash.
Some examples: \begin{center} \begin{tabular}{lll} File name & Dir components & File component \\\hline \ex{src/des/main.c} & \ex{("src" "des")} & \ex{"main.c"} \\ \ex{/src/des/main.c} & \ex{("" "src" "des")} & \ex{"main.c"} \\ \ex{main.c} & \ex{()} & \ex{"main.c"} \\ \end{tabular} \end{center} Note that the relative filename \ex{src/des/main.c} and the absolute filename \ex{/src/des/main.c} are distinguished by the presence of the root component \ex{""} in the absolute path. Multiple embedded slashes within a path have the same meaning as a single slash. More than two leading slashes at the beginning of a path have the same meaning as a single leading slash---they indicate that the file-name is an absolute one, with the path leading from root. However, {\Posix} permits the OS to give special meaning to \emph{two} leading slashes. For this reason, the routines in this section do not simplify two leading slashes to a single slash. A file-name in \emph{directory form} is either a file-name terminated by a slash, \eg, ``\ex{/src/des/}'', or the empty string, ``''. The empty string corresponds to the current working directory, whose file-name is dot (``\ex{.}''). Working backwards from the append-a-slash rule, we extend the syntax of {\Posix} file-names to define the empty string to be a file-name form of the root directory ``\ex{/}''. (However, ``\ex{/}'' is also acceptable as a file-name form for root.) So the empty string has two interpretations: as a file-name form, it is the file-system root; as a directory form, it is the current working directory. Slash is also an ambiguous form: \ex{/} is both a directory-form and a file-name form. The directory form of a file-name is very rarely used. Almost all of the procedures in scsh name directories by giving their file-name form (without the trailing slash), not their directory form. So, you say ``\ex{/usr/include}'', and ``\ex{.}'', not ``\ex{/usr/include/}'' and ``''. 
The sole exceptions are \ex{file-name-as-directory} and
\ex{directory-as-file-name}, whose jobs are to convert back-and-forth
between these forms, and \ex{file-name-directory}, whose job it is to split
out the directory portion of a file-name.
However, most procedures that expect a directory argument will coerce
a file-name in directory form to file-name form if it does not have
a trailing slash.
Bear in mind that the ambiguous case, empty string, will be
interpreted in file-name form, \ie, as root.

\subsubsection{Procedures}

\defun {file-name-directory?} {fname} \boolean
\defunx {file-name-non-directory?} {fname} \boolean
\begin{desc}
These predicates return true if the string is in directory form, or
file-name form (see the above discussion of these two forms).
Note that they both return true on the ambiguous case of empty string,
which is both a directory (current working directory), and a file name
(the file-system root).
\begin{center}
\begin{tabular}{lll}
File name & \ex{\ldots-directory?} & \ex{\ldots-non-directory?} \\
\hline
\ex{"src/des"} & \ex{\sharpf} & \ex{\sharpt} \\
\ex{"src/des/"} & \ex{\sharpt} & \ex{\sharpf} \\
\ex{"/"} & \ex{\sharpt} & \ex{\sharpf} \\
\ex{"."} & \ex{\sharpf} & \ex{\sharpt} \\
\ex{""} & \ex{\sharpt} & \ex{\sharpt}
\end{tabular}
\end{center}
\end{desc}

\begin{defundesc} {file-name-as-directory} {fname} \str
Convert a file-name to directory form.
Basically, add a trailing slash if needed:
\begin{exampletable}
\ex{(file-name-as-directory "src/des")} & \ex{"src/des/"} \\
\ex{(file-name-as-directory "src/des/")} & \ex{"src/des/"} \\[2ex]
%
\header{\ex{.}, \ex{/}, and \ex{""} are special:}
\ex{(file-name-as-directory ".")} & \ex{""} \\
\ex{(file-name-as-directory "/")} & \ex{"/"} \\
\ex{(file-name-as-directory "")} & \ex{"/"}
\end{exampletable}
\end{defundesc}

\begin{defundesc} {directory-as-file-name} {fname} \str
Convert a directory to a simple file-name.
Basically, kill a trailing slash if one is present:
\begin{exampletable}
\ex{(directory-as-file-name "foo/bar/")} & \ex{"foo/bar"} \\[2ex]
%
\header{\ex{/} and \ex{""} are special:}
\ex{(directory-as-file-name "/")} & \ex{"/"} \\
\ex{(directory-as-file-name "")} & \ex{"."} (\ie, the cwd) \\
\end{exampletable}
\end{defundesc}

\begin{defundesc} {file-name-absolute?} {fname} \boolean
Does \var{fname} begin with a root or \ex{\~} component?
(Recognising \ex{\~} as a home-directory specification
is an extension of {\Posix} rules.)
%
\begin{exampletable}
\ex{(file-name-absolute? "/usr/shivers")} & {\sharpt} \\
\ex{(file-name-absolute? "src/des")} & {\sharpf} \\
\ex{(file-name-absolute? "\~/src/des")} & {\sharpt} \\[2ex]
%
\header{Non-obvious case:}
\ex{(file-name-absolute? "")} & {\sharpt} (\ie, root)
\end{exampletable}
\end{defundesc}

\begin{defundesc} {file-name-directory} {fname} {{\str} or false}
Return the directory component of \var{fname} in directory form. If the
file-name is already in directory form, return it as-is.
%
\begin{exampletable}
\ex{(file-name-directory "/usr/bdc")} & \ex{"/usr/"} \\
{\ex{(file-name-directory "/usr/bdc/")}} & {\ex{"/usr/bdc/"}} \\
\ex{(file-name-directory "bdc/.login")} & \ex{"bdc/"} \\
\ex{(file-name-directory "main.c")} & \ex{""} \\[2ex]
%
\header{Root has no directory component:}
\ex{(file-name-directory "/")} & \ex{""} \\
\ex{(file-name-directory "")} & \ex{""}
\end{exampletable}
\end{defundesc}

\begin{defundesc} {file-name-nondirectory} {fname} \str
Return the non-directory component of \var{fname}.
%
\begin{exampletable}
{\ex{(file-name-nondirectory "/usr/ian")}} & {\ex{"ian"}} \\
\ex{(file-name-nondirectory "/usr/ian/")} & \ex{""} \\
{\ex{(file-name-nondirectory "ian/.login")}} & {\ex{".login"}} \\
\ex{(file-name-nondirectory "main.c")} & \ex{"main.c"} \\
\ex{(file-name-nondirectory "")} & \ex{""} \\
\ex{(file-name-nondirectory "/")} & \ex{"/"}
\end{exampletable}
\end{defundesc}

\begin{defundesc} {split-file-name} {fname} {{\str} list}
Split a file-name into its components.
%
\begin{exampletable}
\splitline{\ex{(split-file-name "src/des/main.c")}}
          {\ex{("src" "des" "main.c")}} \\[1.5ex]
%
\splitline{\ex{(split-file-name "/src/des/main.c")}}
          {\ex{("" "src" "des" "main.c")}} \\[1.5ex]
%
\splitline{\ex{(split-file-name "main.c")}}
          {\ex{("main.c")}} \\[1.5ex]
%
\splitline{\ex{(split-file-name "/")}}
          {\ex{("")}}
\end{exampletable}
\end{defundesc}

\begin{defundesc} {path-list->file-name} {path-list [dir]} \str
Inverse of \ex{split-file-name}.
\begin{code}
(path-list->file-name '("src" "des" "main.c"))
    {\evalto} "src/des/main.c"
(path-list->file-name '("" "src" "des" "main.c"))
    {\evalto} "/src/des/main.c"
\cb
{\rm{}Optional \var{dir} arg anchors relative path-lists:}
(path-list->file-name '("src" "des" "main.c")
                      "/usr/shivers")
    {\evalto} "/usr/shivers/src/des/main.c"\end{code}
%
The optional \var{dir} argument is usefully \ex{(cwd)}.
\end{defundesc}

\begin{defundesc} {file-name-extension} {fname} \str
Return the file-name's extension.
%
\begin{exampletable}
\ex{(file-name-extension "main.c")} & \ex{".c"} \\
\ex{(file-name-extension "main.c.old")} & \ex{".old"} \\
\ex{(file-name-extension "/usr/shivers")} & \ex{""}
\end{exampletable}
%
\begin{exampletable}
\header{Weird cases:}
\ex{(file-name-extension "foo.")} & \ex{"."} \\
\ex{(file-name-extension "foo..")} & \ex{"."}
\end{exampletable}
%
\begin{exampletable}
\header{Dot files are not extensions:}
\ex{(file-name-extension "/usr/shivers/.login")} & \ex{""}
\end{exampletable}
\end{defundesc}

\begin{defundesc} {file-name-sans-extension} {fname} \str
Return everything but the extension.
%
\begin{exampletable}
\ex{(file-name-sans-extension "main.c")} & \ex{"main"} \\
\ex{(file-name-sans-extension "main.c.old")} & \ex{"main.c"} \\
\splitline{\ex{(file-name-sans-extension "/usr/shivers")}}
          {\ex{"/usr/shivers"}}
\end{exampletable}
%
\begin{exampletable}
\header{Weird cases:}
\ex{(file-name-sans-extension "foo.")} & \ex{"foo"} \\
\ex{(file-name-sans-extension "foo..")} & \ex{"foo."} \\[2ex]
%
\header{Dot files are not extensions:}
\splitline{\ex{(file-name-sans-extension "/usr/shivers/.login")}}
          {\ex{"/usr/shivers/.login"}}
\end{exampletable}

Note that appending the results of \ex{file-name-extension} and
{\ttt file\=name\=sans\=extension} in all cases produces the original file-name.
\end{defundesc}

\begin{defundesc} {parse-file-name} {fname} {[dir name extension]}
Let $f$ be \ex{(file-name-nondirectory \var{fname})}.
This function returns the three values:
\begin{itemize}
\item \ex{(file-name-directory \var{fname})}
\item \ex{(file-name-sans-extension \var{f})}
\item \ex{(file-name-extension \var{f}\/)}
\end{itemize}
The inverse of \ex{parse-file-name}, in all cases, is \ex{string-append}.
The boundary case of \ex{/} was chosen to preserve this inverse.
\end{defundesc}

\begin{defundesc} {replace-extension} {fname ext} \str
This procedure replaces \var{fname}'s extension with \var{ext}.
It is exactly equivalent to \codex{(string-append (file-name-sans-extension \var{fname}) \var{ext})} \end{defundesc} \defun{simplify-file-name}{fname}\str \begin{desc} Removes leading and internal occurrences of dot. A trailing dot is left alone, as the parent could be a symlink. Removes internal and trailing double-slashes. A leading double-slash is left alone, in accordance with {\Posix}. However, triple and more leading slashes are reduced to a single slash, in accordance with {\Posix}. Double-dots (parent directory) are left alone, in case they come after symlinks or appear in a \ex{/../\var{machine}/\ldots} ``super-root'' form (which {\Posix} permits). \end{desc} \defun{resolve-file-name}{fname [dir]}\str \begin{desc} \begin{itemize} \item Do \ex{\~} expansion. \item If \var{dir} is given, convert a relative file-name to an absolute file-name, relative to directory \var{dir}. \end{itemize} \end{desc} \begin{defundesc} {expand-file-name} {fname [dir]} \str Resolve and simplify the file-name. \end{defundesc} \begin{defundesc} {absolute-file-name} {fname [dir]} \str Convert file-name \var{fname} into an absolute file name, relative to directory \var{dir}, which defaults to the current working directory. The file name is simplified before being returned. This procedure does not treat a leading tilde character specially. \end{defundesc} \begin{defundesc} {home-dir} {[user]} \str \ex{home-dir} returns \var{user}'s home directory. \var{User} defaults to the current user. \begin{exampletable} \ex{(home-dir)} & \ex{"/user1/lecturer/shivers"} \\ \ex{(home-dir "ctkwan")} & \ex{"/user0/research/ctkwan"} \end{exampletable} \end{defundesc} \begin{defundesc} {home-file} {[user] fname} \str Returns file-name \var{fname} relative to \var{user}'s home directory; \var{user} defaults to the current user. 
%
\begin{exampletable}
\ex{(home-file "man")} & \ex{"/usr/shivers/man"} \\
\ex{(home-file "fcmlau" "man")} & \ex{"/usr/fcmlau/man"}
\end{exampletable}
\end{defundesc}

The general \ex{substitute-env-vars} string procedure,
defined in the following section, is also frequently useful for expanding
file-names.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Other string manipulation facilities}

\defun {index} {string char [start]} {{\fixnum} or false}
\defunx {rindex} {string char [start]} {{\fixnum} or false}
\begin{desc}
These procedures search through \var{string} looking for an occurrence
of character \var{char}. \ex{index} searches left-to-right; \ex{rindex}
searches right-to-left.

\ex{index} returns the smallest index $i$ of \var{string} greater
than or equal to \var{start} such that $\var{string}[i] = \var{char}$.
The default for \var{start} is zero. If there is no such match,
\ex{index} returns false.

\ex{rindex} returns the largest index $i$ of \var{string} less than
\var{start} such that $\var{string}[i] = \var{char}$.
The default for \var{start} is \ex{(string-length \var{string})}.
If there is no such match, \ex{rindex} returns false.
\end{desc}

I should probably snarf all the MIT Scheme string functions, and stick them
in a package. {\Unix} programs need to mung character strings a lot.

MIT string match commands:
\begin{tightcode}
[sub]string-match-{forward,backward}[-ci]
[sub]string-{prefix,suffix}[-ci]?
[sub]string-find-{next,previous}-char[-ci]
[sub]string-find-{next,previous}-char-in-set
[sub]string-replace[!]
\ldots\etc\end{tightcode}
These are not currently provided.

\begin{defundesc} {substitute-env-vars} {fname} \str
Replace occurrences of environment variables with their values.
An environment variable is denoted by a dollar sign followed either by
a run of alphanumeric characters and underscores, or by a name
surrounded by braces.
\begin{exampletable}
\splitline{\ex{(substitute-env-vars "\$USER/.login")}}
          {\ex{"shivers/.login"}} \\
\cd{(substitute-env-vars "$\{USER\}_log")} & \cd{"shivers_log"}
\end{exampletable}
\end{defundesc}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{ASCII encoding}

\defun {char->ascii}{\character} \integer
\defunx {ascii->char}{\integer} \character
\begin{desc}
These are identical to \ex{char->integer} and \ex{integer->char} except that
they use the {\Ascii} encoding.
\end{desc}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Character sets}
\label{sec:char-sets}

Scsh provides a \ex{char-set} type for expressing sets of characters.
These sets are used by some of the delimited-input procedures
(section~\ref{sec:field-reader}).
Scsh's character set package was adapted and extended from
Project Mac's MIT Scheme package.
Note that the character type used in the current implementation corresponds
to the ASCII character set---but you would be wise not to build this
assumption into your code if you can help it.\footnote{
  Actually, it's slightly uglier than that, albeit somewhat more
  useful. The current character type corresponds to an eight-bit
  superset of ASCII. The \ex{ascii->char} and \ex{char->ascii}
  functions will preserve this eighth bit. However, none of the
  high 128 characters appear in any of the standard character sets
  defined in section~\ref{sec:std-csets}, except for
  \ex{char-set:full}. If someone would email the authors a listing
  of the full Latin-1 definition, we'll be happy to upgrade these
  sets' definitions to make them Latin-1 compliant.}

\defun{char-set?}{x}\boolean
\begin{desc}
Is the object \var{x} a character set?
\end{desc}

\defun{char-set=}{\vari{cs}1 \vari{cs}2\ldots}\boolean
\begin{desc}
Are the character sets equal?
\end{desc}

\defun{char-set<=}{\vari{cs}1 \vari{cs}2\ldots}\boolean
\begin{desc}
Returns true if every character set \vari{cs}{i} is
a subset of character set \vari{cs}{i+1}.
\end{desc}

\defun{char-set-fold}{kons knil cs}\object
\begin{desc}
This is the fundamental iterator for character sets.
Applies the function \var{kons} across the character set \var{cs} using
initial state value \var{knil}.
That is, if \var{cs} is the empty set, the procedure returns \var{knil}.
Otherwise, some element \var{c} of \var{cs} is chosen; let \var{cs'} be
the remaining, unchosen characters.
The procedure returns
\begin{tightcode}
(char-set-fold \var{kons} (\var{kons} \var{c} \var{knil}) \var{cs'})\end{tightcode}
For example, we could define \ex{char-set-members} (see below)
as
\begin{tightcode}
(lambda (cs) (char-set-fold cons '() cs))\end{tightcode}

\remark{This procedure was formerly named \texttt{\indx{reduce-char-set}}.
        The old binding is still provided, but is deprecated and will
        probably vanish in a future release.}
\end{desc}

\defun{char-set-for-each}{p cs}{\undefined}
\begin{desc}
Apply procedure \var{p} to each character in the character set \var{cs}.
Note that the order in which \var{p} is applied to the characters in the
set is not specified, and may even change from application to application.
\end{desc}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Creating character sets}

\defun{char-set}{\vari{char}1\ldots}{char-set}
\begin{desc}
Return a character set containing the given characters.
\end{desc}

\defun{chars->char-set}{chars}{char-set}
\begin{desc}
Return a character set containing the characters in the list \var{chars}.
\end{desc}

\defun{string->char-set}{s}{char-set}
\begin{desc}
Return a character set containing the characters in the string \var{s}.
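
As a rough illustration (a sketch, not taken from the scsh sources), the
three constructors described so far should all be able to build the same
set. The comparison uses \ex{char-set=}, documented earlier in this section:
\begin{verbatim}
;; Three ways to build the set {a, b, c}.
(char-set= (char-set #\a #\b #\c)
           (chars->char-set '(#\a #\b #\c))
           (string->char-set "abc"))   ; expected to be true
\end{verbatim}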
\end{desc}

\defun{predicate->char-set}{pred}{char-set}
\begin{desc}
Returns a character set containing every character \var{c} such that
\ex{(\var{pred} \var{c})} returns true.
\end{desc}

\defun{ascii-range->char-set}{lower upper}{char-set}
\begin{desc}
Returns a character set containing every character whose {\Ascii}
code lies in the half-open range $[\var{lower},\var{upper})$.
\end{desc}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Querying character sets}

\defun {char-set-members}{char-set}{character-list}
\begin{desc}
This procedure returns a list of the members of \var{char-set}.
\end{desc}

\defunx{char-set-contains?}{char-set char}\boolean
\begin{desc}
This procedure tests \var{char} for membership in set \var{char-set}.
\remark{Previous releases of scsh called this procedure \ex{char-set-member?},
reversing the order of the arguments.
This made sense, but was unfortunately the reverse of the order in which the
arguments appear in MIT Scheme.
A reasonable argument order was not backwards-compatible with MIT Scheme;
on the other hand, the MIT Scheme argument order was counter-intuitive
and at odds with common mathematical notation and the \ex{member} family
of R4RS procedures.

We sought to escape the dilemma by shifting to a new name.}
\end{desc}

\defun{char-set-size}{cs}\integer
\begin{desc}
Returns the number of elements in character set \var{cs}.
\end{desc}

\defun{char-set-every?}{pred cs}\boolean
\defunx{char-set-any?}{pred cs}\object
\begin{desc}
The \ex{char-set-every?} procedure returns true if predicate \var{pred}
returns true for every character in the character set \var{cs}.

Likewise, \ex{char-set-any?} applies \var{pred} to every character in
character set \var{cs}, and returns the first true value it finds.
If no character produces a true value, it returns false.

The order in which these procedures sequence through the elements of
\var{cs} is not specified.
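
A small sketch of how these two procedures might be used, relying only on
procedures documented in this chapter. The \ex{lambda} in the last
expression is an illustrative predicate that returns the character itself,
so that \ex{char-set-any?} hands back the matching element rather than a
bare boolean:
\begin{verbatim}
(char-set-every? char-numeric? (string->char-set "2024"))   ; true
(char-set-any? char-upper-case? (string->char-set "scsh"))  ; false

;; The set {a, 1} holds exactly one digit, so the result does not
;; depend on the (unspecified) traversal order.
(char-set-any? (lambda (c) (and (char-numeric? c) c))
               (string->char-set "a1"))                      ; #\1
\end{verbatim}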
\end{desc}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Character-set algebra}

\defun {char-set-invert}{char-set}{char-set}
\defunx{char-set-union}{\vari{char-set}1\ldots}{char-set}
\defunx{char-set-intersection}{\vari{char-set}1 \vari{char-set}2\ldots}{char-set}
\defunx{char-set-difference}{\vari{char-set}1 \vari{char-set}2\ldots}{char-set}
\begin{desc}
These procedures implement set complement, union, intersection, and difference
for character sets.
The union, intersection, and difference operations are n-ary, associating
to the left; the difference function requires at least one argument, while
union and intersection may be applied to zero arguments.
\end{desc}

\defun {char-set-adjoin}{cs \vari{char}1\ldots}{char-set}
\defunx{char-set-delete}{cs \vari{char}1\ldots}{char-set}
\begin{desc}
Add/delete the \vari{char}i characters to/from character set \var{cs}.
\end{desc}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Standard character sets}
\label{sec:std-csets}

Several character sets are predefined for convenience:

\begin{center}
\newcommand{\entry}[1]{\ex{#1}\index{#1}}
\begin{tabular}{|ll|}
\hline
\entry{char-set:lower-case} & Lower-case alphabetic chars \\
\entry{char-set:upper-case} & Upper-case alphabetic chars \\
\entry{char-set:alphabetic} & Alphabetic chars \\
\entry{char-set:numeric} & Decimal digits: 0--9 \\
\entry{char-set:alphanumeric} & Alphabetic or numeric \\
\entry{char-set:graphic} & Printing characters except space \\
\entry{char-set:printing} & Printing characters including space \\
\entry{char-set:whitespace} & Whitespace characters \\
\entry{char-set:control} & Control characters \\
\entry{char-set:punctuation} & Punctuation characters \\
\entry{char-set:hex-digit} & A hexadecimal digit: 0--9, A--F, a--f \\
\entry{char-set:blank} & Blank characters \\
\entry{char-set:ascii} & A character in the ASCII set.
\\
\entry{char-set:empty} & Empty set \\
\entry{char-set:full} & All characters \\
\hline
\end{tabular}
\end{center}

The first twelve of these correspond to the character classes defined in
{\Posix}.
Note that there may be characters in \ex{char-set:alphabetic} that are
neither upper nor lower case---this might occur in implementations that
use a character type richer than ASCII, such as Unicode.
A ``graphic character'' is one that would put ink on your page.
While the exact composition of these sets may vary depending upon the
character type provided by the Scheme system upon which scsh is running,
here are the definitions for some of the sets in an ASCII character set:
\begin{center}
\newcommand{\entry}[1]{\ex{#1}\index{#1}}
\begin{tabular}{|ll|}
\hline
char-set:alphabetic & A--Z and a--z \\
char-set:lower-case & a--z \\
char-set:upper-case & A--Z \\
char-set:graphic & Alphanumeric + punctuation \\
char-set:whitespace & Space, newline, tab, page,
                      vertical tab, carriage return \\
char-set:blank & Space and tab \\
char-set:control & ASCII 0--31 and 127 \\
char-set:punctuation & \verb|!"#$%&'()*+,-./:;<=>|\verb#?@[\]^_`{|}~# \\
\hline
\end{tabular}
\end{center}

\defun {char-alphabetic?}\character\boolean
\defunx{char-lower-case?}\character\boolean
\defunx{char-upper-case?}\character\boolean
\defunx{char-numeric?}\character\boolean
\defunx{char-alphanumeric?}\character\boolean
\defunx{char-graphic?}\character\boolean
\defunx{char-printing?}\character\boolean
\defunx{char-whitespace?}\character\boolean
\defunx{char-blank?}\character\boolean
\defunx{char-control?}\character\boolean
\defunx{char-punctuation?}\character\boolean
\defunx{char-hex-digit?}\character\boolean
\defunx{char-ascii?}\character\boolean
\begin{desc}
These predicates are defined in terms of the above character sets.
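
For instance, assuming each predicate is a plain membership test on the
corresponding set, a predicate such as \ex{char-blank?} could be sketched
as follows. The name \ex{my-char-blank?} is hypothetical and used only for
this illustration; it is not part of scsh:
\begin{verbatim}
;; Hypothetical re-definition, for illustration only.
(define (my-char-blank? c)
  (char-set-contains? char-set:blank c))

(my-char-blank? #\space)  ; true: space is in char-set:blank
(my-char-blank? #\a)      ; false
\end{verbatim}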
\end{desc}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Linear-update character-set operations}

These procedures have a hybrid pure-functional/side-effecting semantics:
they are allowed, but not required, to side-effect one of their parameters
in order to construct their result.
An implementation may legally implement these procedures as pure,
side-effect-free functions, or it may implement them using side effects,
depending upon the details of what is the most efficient or simple to
implement in terms of the underlying representation.

What this means is that clients of these procedures \emph{may not} rely
upon these procedures working by side effect.
For example, this is not guaranteed to work:
\begin{verbatim}
(let ((cs (char-set #\a #\b #\c)))
  (char-set-adjoin! cs #\d)
  cs) ; Could be either {a,b,c} or {a,b,c,d}.
\end{verbatim}
However, this is well-defined:
\begin{verbatim}
(let ((cs (char-set #\a #\b #\c)))
  (char-set-adjoin! cs #\d)) ; {a,b,c,d}
\end{verbatim}
So clients of these procedures write in a functional style, but must
additionally be sure that, when the procedure is called, there are no
other live pointers to the potentially-modified character set (hence the term
``linear update'').

There are two benefits to this convention:
\begin{itemize}
\item Implementations are free to provide the most efficient possible
      implementation, either functional or side-effecting.
\item Programmers may nonetheless continue to assume that character sets
      are purely functional data structures: they may be reliably shared
      without needing to be copied, uniquified, and so forth.
\end{itemize}

In practice, these procedures are most useful for efficiently constructing
character sets in a side-effecting manner, in some limited local context,
before passing the character set outside the local construction scope to be
used in a functional manner.
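
The following sketch illustrates this local-construction pattern. The name
\ex{c-identifier-chars} is hypothetical and not part of scsh; the sketch
assumes only the procedures documented in this section:
\begin{verbatim}
;; Hypothetical example: the characters allowed in a C identifier.
(define c-identifier-chars
  (let* ((cs (char-set-copy char-set:alphabetic))    ; private copy
         (cs (char-set-union! cs char-set:numeric))  ; keep the result
         (cs (char-set-adjoin! cs #\_)))             ; keep the result
    cs))  ; from here on, treat the set as immutable
\end{verbatim}
Each step rebinds \ex{cs} to the value returned by the linear-update
procedure and never touches the previous binding again, so the code is
correct whether or not the implementation works by side effect.
The initial \ex{char-set-copy} keeps \ex{char-set:alphabetic} itself from
being altered.
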
Scsh provides no assistance in checking the linearity of the potentially
side-effected parameters passed to these functions --- there's no linear
type checker or run-time mechanism for detecting violations.

\defun{char-set-copy}{cs}{char-set}
\begin{desc}
Returns a copy of the character set \var{cs}.
``Copy'' means that if either the input parameter or the
result value of this procedure is passed to one of the linear-update
procedures described below, the other character set is guaranteed
not to be altered.
(A system that provides pure-functional implementations of the rest of
the linear-operator suite could implement this procedure as the
identity function.)
\end{desc}

\defun{char-set-adjoin!}{cs \vari{char}1\ldots}{char-set}
\begin{desc}
Add the \vari{char}i characters to character set \var{cs}, and
return the result.
This procedure is allowed, but not required, to side-effect \var{cs}.
\end{desc}

\defun{char-set-delete!}{cs \vari{char}1\ldots}{char-set}
\begin{desc}
Remove the \vari{char}i characters from character set \var{cs}, and
return the result.
This procedure is allowed, but not required, to side-effect \var{cs}.
\end{desc}

\defun {char-set-invert!}{char-set}{char-set}
\defunx{char-set-union!}{\vari{char-set}1 \vari{char-set}2\ldots}{char-set}
\defunx{char-set-intersection!}{\vari{char-set}1 \vari{char-set}2\ldots}{char-set}
\defunx{char-set-difference!}{\vari{char-set}1 \vari{char-set}2\ldots}{char-set}
\begin{desc}
These procedures implement set complement, union, intersection, and difference
for character sets.
They are allowed, but not required, to side-effect their first parameter.
The union, intersection, and difference operations are n-ary, associating
to the left.
\end{desc}
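
As an illustration of the n-ary linear-update operations, here is one way
a list of character sets might be merged using a linear-update accumulator.
The name \ex{char-set-union-list} is hypothetical and not part of scsh:
\begin{verbatim}
;; Hypothetical helper: union of all the character sets in a list.
(define (char-set-union-list sets)
  ;; Start from a private copy so char-set:empty is never altered.
  (let loop ((acc (char-set-copy char-set:empty))
             (sets sets))
    (if (null? sets)
        acc
        ;; Only the first parameter may be side-effected, and the old
        ;; value of acc is never used again after this call.
        (loop (char-set-union! acc (car sets))
              (cdr sets)))))

;; Given the descriptions of the standard sets, this should be true:
(char-set= (char-set-union-list
            (list char-set:alphabetic char-set:numeric))
           char-set:alphanumeric)
\end{verbatim}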