Complete and up-to-date documentation for the RFC822 library.

sperber 2003-01-09 13:47:19 +00:00
parent 9b11ac1572
commit 1b31924b80
1 changed files with 74 additions and 119 deletions

@ -1,104 +1,73 @@
\chapter{Handle RFC822 headers}\label{cha:rfc822} \chapter{RFC~822 Library}\label{cha:rfc822}
% %
\begin{description} The \ex{rfc822} structure provides rudimentary support for parsing
\item[Used files:] rfc822.scm headers according to RFC 822 \textit{Standard for the format of ARPA
\item[Name of the package:] rfc822 Internet text messages}. These headers show up in SMTP messages,
\end{description} HTTP headers, etc.
%
\section{What users want to know}
\section{A note on line-terminators} \defun{read-rfc822-field} {[port] [read-line]} {name body}
Line-terminating sequences are always a drag, because there's no \begin{desc}
agreement on them -- the Net protocols and DOS use
carriage-return/line-feed (\ex{cr}/\ex{lf}); Unix uses \ex{lf}; the
Mac uses \ex{cr}. On the one hand, you'd like to use the code for all of
the above, on the other, you'd also like to use the code for strict
applications that need definitely not to recognise bare \ex{cr}'s or
\ex{lf}'s as terminators.
RFC 822 requires a \ex{cr}/\ex{lf} (carriage-return/line-feed) pair to
terminate lines of text. On the other hand, careful perusal of the
text shows up some ambiguities (there are maybe three or four of
these, and I'm too lazy to write them all down). Furthermore, it is an
unfortunate fact that many Unix apps separate lines of RFC~822 text
with simple linefeeds (e.g., messages kept in \ex{/usr/spool/mail}).
As a result, this code takes a broad-minded view of line-terminators:
lines can be terminated by either \ex{cr}/\ex{lf} or just \ex{lf}, and
either terminating sequence is trimmed.
If you need stricter parsing, you can call the lower-level procedure
\ex{\%read\=rfc822\=field} and \ex{\%read\=rfc822\=headers}. They take
the read-line procedure as an extra parameter. This means that you can
pass in a procedure that recognises only \ex{cr}/\ex{lf}'s, or only
\ex{cr}'s (for a Mac app, perhaps), and you can determine whether or
not the terminators get trimmed. However, your read-line procedure
must indicate the header-terminating empty line by returning \emph{either}
the empty string or the two-char string \ex{cr}/\ex{lf} (or the EOF object).
\section{Description of the procedures}
\defun{read-rfc822-field} {\ovar{port}} {name body}
\begin{defundescx}{\%read-rfc822-field } {read-line port} {name body}
Read one field from the port, and return two values: Read one field from the port, and return two values:
\begin{description} \begin{description}
\item[\var{name}] Symbol such as \ex{'subject} or \ex{'to}. The \item[\var{name}] This is a symbol describing the RFC 822 field
field name is converted to a symbol using the Scheme name, such as \ex{subject} or \ex{to}. The symbol consists of all
implementation's preferred case. If the implementation reads lower-case letters.\footnote{In fact, \ex{read-rfc822-field}
symbols in a case-sensitive fashion (e.g., scsh), lowercase is uses the preferred case for symbols of the underlying Scheme
used. This means you can compare these symbols to quoted constants implementation which, in the case of scsh, happens to be lower-case.}
using \ex{eq?}. When printing these field names out, it looks best \item[\var{body}] This is a list of strings which are the field's
if you capitalize them with \ex{(capitalize\=string (symbol->string field\=name))}. body. Each list element is one line from the field's body,
so if the field spreads out over three lines, then the body is a
\item[\var{body}] List of strings which are the field's body, e.g. list of three strings. The terminating \ex{cr}/\ex{lf}'s are
(``shivers\discretionary{@}{}{@}lcs.mit.edu''). Each list element is one line from trimmed from each string. Note that header bodies frequently contain
the field's body, so if the field spreads out over three lines, space after the colon like this:
then the body is a list of three strings. The terminating %
\ex{cr}/\ex{lf}'s are trimmed from each string. A leading space or \begin{verbatim}
a leading horizontal tab is also trimmed, but one and only one. Subject: RFC 822 can format itself in the ARPA
\end{verbatim}
%
In this case, \var{body} will be
\begin{verbatim}
(" RFC 822 can format itself in the ARPA")
\end{verbatim}
\end{description} \end{description}
%
When there are no more fields -- EOF or a blank line has terminated When there are no more fields---EOF or a blank line has terminated
the header section -- then the procedure returns [\sharpf\ \sharpf]. the header section---then \ex{read-rfc822-field} returns [\sharpf\ \sharpf].
The \ex{\%read-rfc822-field} variant allows you to specify your own
read-line procedure. The one used by \ex{read-rfc822-field}
terminates lines with either \ex{cr}/\ex{lf} or just \ex{lf}, and it
trims the terminator from the line. Your read-line procedure should
trim the terminator of the line, so an empty line is returned as an
empty string.
The procedures raise an error if the syntax of the read field (the
line returned by the read-line-function) is illegal (regarding
RFC~822).
\end{defundescx}
\defun{read-rfc822-headers} {\ovar{port}} {association list} \var{Port} is an optional input port to read from---it defaults to
\begin{defundescx}{\%read-rfc822-headers} {read-line port} the value of \ex{(current-input-port)}.
{association list}
Read in and parse up a section of text that looks like the header \var{Read-line} is an optional parameter specifying a procedure of
portion of an RFC~822 message. Return an association list mapping a one argument (the input port) used to read the raw header lines.
field name (a symbol such as 'date or 'subject) to a list of field The default used by \ex{read-rfc822-field} terminates lines with
bodies -- one for each occurrence of the field in the header. So if either \ex{cr}/\ex{lf} or just \ex{lf}, and it trims the terminator
there are five ``Received-by:'' fields in the header, the alist maps from the line. A procedure you pass as \var{read-line} should likewise trim the terminator of the
'received-by to a five element list. Each body is in turn line, so an empty line is returned as an empty string.
represented by a list of strings -- one for each line of the field.
So a field spread across three lines would produce a three element
body.
The \ex{\%read-rfc822-headers} variant allows you to specify your The procedure raises an error if the syntax of the read field (the
own read-line procedure. See \emph{A note on line-terminators} above line returned by the read-line-function) is illegal according to
for reasons why. RFC~822.
\end{desc}
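To make the return convention concrete, here is a minimal sketch that
reads a single field from a string port. It assumes that the
\ex{rfc822} structure is open and that scsh's
\ex{make-string-input-port} is available; the names \ex{nl} and
\ex{field-port} are invented for this example.
%
\begin{verbatim}
;; A line feed, built portably rather than via a string escape.
(define nl (string #\newline))

;; One header field followed by the blank line that ends the header.
(define field-port
  (make-string-input-port
   (string-append "Subject: RFC 822 can format itself in the ARPA" nl
                  nl)))

;; read-rfc822-field returns two values, so receive them both.
(call-with-values
    (lambda () (read-rfc822-field field-port))
  (lambda (name body)
    (write name) (newline)
    (write body) (newline)))
\end{verbatim}
%
Going by the description above, \var{name} should be the symbol
\ex{subject} and \var{body} a one-element list holding the line with
its leading space intact. A second call on the same port should then
yield [\sharpf\ \sharpf], because only the blank line remains.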
Hint: If you want to get familiar with these procedures, you might \defun{read-rfc822-headers} {[port] [read-line]} {association-list}
find \ex{make\=string\=input\=port}, that makes a port out of a \begin{desc}
string, helpful. This procedure reads in and parses a section of text that looks like
\end{defundescx} the header portion of an RFC~822 message. It returns an association
list mapping a field name (a symbol such as 'date or 'subject) to a
list of field bodies, one for each occurrence of the field in the
header. So if there are five \ex{Received-by} fields in the header,
the alist maps \ex{received-by} to a five-element list. Each body is
in turn represented by a list of strings, one for each line of the
field. So a field spread across three lines would produce a
three-element body.
\var{Port} and \var{read-line} are as with \ex{read-rfc822-field}.
\end{desc}
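As a rough illustration, the following sketch parses a small header
block from a string port and looks up one field. It assumes that the
\ex{rfc822} structure is open and that scsh's
\ex{make-string-input-port} is available; \ex{eol}, \ex{hdr-port} and
\ex{parsed} are names made up for this example.
%
\begin{verbatim}
;; A line feed, built portably rather than via a string escape.
(define eol (string #\newline))

;; Two header fields followed by the terminating blank line.
(define hdr-port
  (make-string-input-port
   (string-append "To: ziggy" eol
                  "Subject: hello" eol
                  eol)))

;; Parse the whole header section into an association list.
(define parsed (read-rfc822-headers hdr-port))

;; assq returns the entry for the field, or #f if the field is absent.
(write (assq 'subject parsed))
(newline)
\end{verbatim}
%
Field names are symbols, so \ex{assq} is enough for the lookup. The
exact shape of each entry is as described above: a field name
associated with its list of bodies.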
\begin{defundesc}{rejoin-header-lines} {alist \ovar{separator}} \defun{rejoin-header-lines} {alist [separator]} {association list}
{association list} \begin{desc}
Takes a field \var{alist} such as is returned by Takes a field \var{alist} such as is returned by
\ex{read-rfc822-headers} and returns an equivalent association list. \ex{read-rfc822-headers} and returns an equivalent association list.
@ -110,8 +79,8 @@ the empty string or the two-char string \ex{cr}/\ex{lf} (or the EOF object).
To rejoin a single body list, use scsh's \ex{join-strings} To rejoin a single body list, use scsh's \ex{join-strings}
procedure. procedure.
\end{defundesc} \end{desc}
%
For the following definitions' examples, let's use this set of For the following definitions' examples, let's use this set of
RFC~822 headers: RFC~822 headers:
\begin{alltt} \begin{alltt}
@ -122,51 +91,37 @@ RFC~822 headers:
\end{alltt} \end{alltt}
% %
\begin{defundesc}{get-header-all} {headers name} {string list list} \defun{get-header-all} {headers name} {string list list}
\begin{desc}
Returns all entries or \sharpf, e.g.\ Returns all entries or \sharpf, e.g.\
\codex{(get-header-all hdrs 'to)} \codex{(get-header-all hdrs 'to)}
results to returns
\codex{'((" ziggy," " newts") (" gjs, tk"))} \codex{'((" ziggy," " newts") (" gjs, tk"))}
\end{defundesc} \end{desc}
\begin{defundesc}{get-header-lines} {headers name} {string list} \defun{get-header-lines} {headers name} {string list}
\begin{desc}
Returns all lines of the first entry or \sharpf, e.g.\ Returns all lines of the first entry or \sharpf, e.g.\
\codex{(get-header-lines hdrs 'to)} \codex{(get-header-lines hdrs 'to)}
results to returns
\codex{'(" ziggy," " newts")} \codex{(" ziggy," " newts")}
\end{defundesc} \end{desc}
\begin{defundesc}{get-headers} {headers name \ovar{separator}} {string} \defun{get-header} {headers name [separator]} {string}
\begin{desc}
Returns the first entry with the lines joined together by separator Returns the first entry with the lines joined together by separator
(newline by default), e.g.\ (newline by default), e.g.\
\codex{(get-header hdrs 'to)} \codex{(get-header hdrs 'to)}
results to returns
\begin{alltt} \begin{alltt}
" ziggy, " ziggy,
newts" newts"
\end{alltt} \end{alltt}
% %
Note that \ex{newts} is preceded by two spaces. Note that \ex{newts} is preceded by two spaces.
\end{defundesc} \end{desc}
\begin{defundesc}{string->symbol-pref}{string}{symbol}
Takes a string and converts it to a symbol using the Scheme
implementation's preferred case. (The preferred case is determined
once, by converting \ex{'a} with \ex{symbol->string}.)
\end{defundesc}
\section{Desirable functionalities}
\begin{itemize}
\item Unfolding long lines.
\item Lexing structured fields.
\item Unlexing structured fields into canonical form.
\item Parsing and unparsing dates.
\item Parsing and unparsing addresses.
\end{itemize}
%%% Local Variables: %%% Local Variables:
%%% mode: latex %%% mode: latex
%%% TeX-master: man.tex %%% TeX-master: "man"
%%% End: %%% End: