140 lines
5.7 KiB
TeX
140 lines
5.7 KiB
TeX
%&latex -*- latex -*-
|
|
|
|
\chapter{Reading delimited strings}
|
|
\label{chapt:rdelim}
|
|
|
|
Scsh provides a set of procedures that read delimited strings from
|
|
input ports.
|
|
There are procedures to read a single line of text
|
|
(terminated by a newline character),
|
|
a single paragraph (terminated by a blank line),
|
|
and general delimited strings
|
|
(terminated by a character belonging to an arbitrary character set).
|
|
|
|
These procedures can be applied to any Scheme input port.
|
|
However, the scsh virtual machine has native-code support for performing
|
|
delimited reads on Unix ports, and these input operations should be
|
|
particularly fast---much faster than doing the equivalent character-at-a-time
|
|
operation from Scheme code.
|
|
|
|
All of the delimited input operations described below take a \ex{handle-delim}
|
|
parameter, which determines what the procedure does with the terminating
|
|
delimiter character.
|
|
There are four possible choices for a \ex{handle-delim} parameter:
|
|
\begin{inset}
|
|
\begin{tabular}{|l|l|} \hline
|
|
\ex{handle-delim} & Meaning \\ \hline\hline
|
|
\ex{'trim} & Ignore delimiter character. \\
|
|
\ex{'peek} & Leave delimiter character in input stream. \\
|
|
\ex{'concat} & Append delimiter character to returned value. \\
|
|
\ex{'split} & Return delimiter as second value. \\
|
|
\hline
|
|
\end{tabular}
|
|
\end{inset}
|
|
The last three cases allow the programmer to distinguish between strings
|
|
that are terminated by a delimiter character, and strings that are
|
|
terminated by an end-of-file.
|
|
|
|
|
|
\begin{defundesc} {read-line} {[port handle-newline]} {{\str} or eof-object}
|
|
Reads and returns one line of text; on eof, returns the eof object.
|
|
A line is terminated by newline or eof.
|
|
|
|
\var{handle-newline} determines what \ex{read-line} does with the
|
|
newline or EOF that terminates the line; it takes the general set
|
|
of values described for the general \ex{handle-delim} case above,
|
|
and defaults to \ex{'trim} (discard the newline).
|
|
Using this argument allows one to tell whether or not the last line of
|
|
input in a file is newline terminated.
|
|
\end{defundesc}
|
|
|
|
\defun{read-paragraph} {[port handle-delim]} {{\str} or eof}
|
|
\begin{desc}
|
|
This procedure skips blank lines,
|
|
then reads text from a port until a blank line or eof is found.
|
|
A ``blank line'' is a (possibly empty) line composed only of white space.
|
|
The \var{handle-delim} parameter determines how the terminating
|
|
blank line is handled.
|
|
It is described above, and defaults to \ex{'trim}.
|
|
The \ex{'peek} option is not available.
|
|
\end{desc}
|
|
|
|
|
|
The following procedures read in strings from ports delimited by characters
|
|
belonging to a specific set.
|
|
See section~\ref{sec:char-sets} for information on character set manipulation.
|
|
|
|
\defun{read-delimited}{char-set [port handle-delim]} {{\str} or eof}
|
|
\begin{desc}
|
|
Read until we encounter one of the chars in \var{char-set} or eof.
|
|
The \var{handle-delim} parameter determines how the terminating character
|
|
is handled. It is described above, and defaults to \ex{'peek}.
|
|
|
|
The \var{char-set} argument may be a charset, a string, a character, or a
|
|
character predicate; it is coerced to a charset.
|
|
\end{desc}
|
|
|
|
\dfni{read-delimited!} {char-set buf [port handle-delim start end]}
|
|
{nchars or eof or \#f}{procedure}
|
|
{read-delimited"!@\texttt{read-delimited"!}}
|
|
\begin{desc}
|
|
A side-effecting variant of \ex{read-delimited}.
|
|
|
|
The data is written into the string \var{buf} at the indices in the
|
|
half-open interval $[\var{start},\var{end})$; the default interval is the
|
|
whole string: $\var{start}=0$ and $\var{end}=\ex{(string-length
|
|
\var{buf})}$. The values of \var{start} and \var{end} must specify a
|
|
well-defined interval in \var{str}, \ie, $0 \le \var{start} \le \var{end}
|
|
\le \ex{(string-length \var{buf})}$.
|
|
|
|
It returns \var{nbytes}, the number of bytes read. If the buffer filled up
|
|
without a delimiter character being found, \ex{\#f} is returned. If
|
|
the port is at eof when the read starts, the eof object is returned.
|
|
|
|
If an integer is returned (\ie, the read is successfully terminated by
|
|
reading a delimiter character), then the \var{handle-delim} parameter
|
|
determines how the terminating character is handled.
|
|
It is described above, and defaults to \ex{'peek}.
|
|
\end{desc}
|
|
|
|
|
|
|
|
\dfni{\%read-delimited!} {char-set buf gobble? [port start end]}
|
|
{[char-or-eof-or-\#f \integer]}{procedure}
|
|
{"%read-delimited"!@\verb:"%read-delimited"!:}
|
|
\begin{desc}
|
|
This low-level delimited reader uses an alternate interface.
|
|
It returns two values: \var{terminator} and \var{num-read}.
|
|
\begin{description}
|
|
\item [terminator]
|
|
A value describing why the read was terminated:
|
|
\begin{flushleft}
|
|
\begin{tabular}{l@{\qquad$\Rightarrow$\qquad}l}
|
|
Character or eof-object & Read terminated by this value. \\
|
|
\ex{\#f} & Filled buffer without finding a delimiter.
|
|
\end{tabular}
|
|
\end{flushleft}
|
|
|
|
\item [num-read]
|
|
Number of characters read into \var{buf}.
|
|
\end{description}
|
|
|
|
If the read is successfully terminated by reading a delimiter character,
|
|
then the \var{gobble?} parameter determines what to do with the terminating
|
|
character.
|
|
If true, the character is removed from the input stream;
|
|
if false, the character is left in the input stream where a subsequent
|
|
read operation will retrieve it.
|
|
In either case, the character is also the first value returned by
|
|
the procedure call.
|
|
\end{desc}
|
|
|
|
%Note:
|
|
%- Invariant: TERMINATOR = #f => NUM-READ = END - START.
|
|
%- Invariant: TERMINATOR = eof-object and NUM-READ = 0 => at EOF.
|
|
%- When determining the TERMINATOR return value, ties are broken
|
|
% favoring character or the eof-object over #f. That is, if the buffer
|
|
% fills up, %READ-DELIMITED! will peek at one more character from the
|
|
% input stream to determine if it terminates the input. If so, that
|
|
% is returned, not #f.
|