Complete and up-to-date documentation for the RFC822 library.

This commit is contained in:
sperber 2003-01-09 13:47:19 +00:00
parent 9b11ac1572
commit 1b31924b80
1 changed files with 74 additions and 119 deletions

View File

@ -1,104 +1,73 @@
\chapter{Handle RFC822 headers}\label{cha:rfc822}
\chapter{RFC~822 Library}\label{cha:rfc822}
%
\begin{description}
\item[Used files:] rfc822.scm
\item[Name of the package:] rfc822
\end{description}
%
\section{What users want to know}
The \ex{rfc822} structure provides rudimentary support for parsing
headers according to RFC 822 \textit{Standard for the format of ARPA
Internet text messages}. These headers show up in SMTP messages,
HTTP headers, etc.
\section{A note on line-terminators}
Line-terminating sequences are always a drag, because there's no
agreement on them -- the Net protocols and DOS use
carriage-return/line-feed (\ex{cr}/\ex{lf}); Unix uses \ex{lf}; the
Mac uses \ex{cr}. One one hand, you'd like to use the code for all of
the above, on the other, you'd also like to use the code for strict
applications that need definitely not to recognise bare \ex{cr}'s or
\ex{lf}'s as terminators.
RFC 822 requires a \ex{cr}/\ex{lf} (carriage-return/line-feed) pair to
terminate lines of text. On the other hand, careful perusal of the
text shows up some ambiguities (there are maybe three or four of
these, and I'm too lazy to write them all down). Furthermore, it is an
unfortunate fact that many Unix apps separate lines of RFC~822 text
with simple linefeeds (e.g., messages kept in \ex{/usr/spool/mail}).
As a result, this code takes a broad-minded view of line-terminators:
lines can be terminated by either \ex{cr}/\ex{lf} or just \ex{lf}, and
either terminating sequence is trimmed.
If you need stricter parsing, you can call the lower-level procedure
\ex{\%read\=rfc822\=field} and \ex{\%read\=rfc822\=headers}. They take
the read-line procedure as an extra parameter. This means that you can
pass in a procedure that recognises only \ex{cr}/\ex{lf}'s, or only
\ex{cr}'s (for a Mac app, perhaps), and you can determine whether or
not the terminators get trimmed. However, your read-line procedure
must indicate the header-terminating empty line by returning \emph{either}
the empty string or the two-char string \ex{cr}/\ex{lf} (or the EOF object).
\section{Description of the procedures}
\defun{read-rfc822-field} {\ovar{port}} {name body}
\begin{defundescx}{\%read-rfc822-field } {read-line port} {name body}
\defun{read-rfc822-field} {[port] [read-line]} {name body}
\begin{desc}
Read one field from the port, and return two values:
\begin{description}
\item[\var{name}] Symbol such as \ex{'subject} or \ex{'to}. The
field name is converted to a symbol using the Scheme
implementation's preferred case. If the implementation reads
symbols in a case-sensitive fashion (e.g., scsh), lowercase is
used. This means you can compare these symbols to quoted constants
using \ex{eq?}. When printing these field names out, it looks best
if you capitalize them with \ex{(capitalize\=string (symbol->string field\=name))}.
\item[\var{body}] List of strings which are the field's body, e.g.
(``shivers\discretionary{@}{}{@}lcs.mit.edu''). Each list element is one line from
the field's body, so if the field spreads out over three lines,
then the body is a list of three strings. The terminating
\ex{cr}/\ex{lf}'s are trimmed from each string. A leading space or
a leading horizontal tab is also trimmed, but one and onyl one.
\item[\var{name}] This is a symbol describing the RFC 822 field
name, such as \ex{subject} or \ex{to}. The symbol consists of all
lower-case letters.\footnote{In fact, it \ex{read-rfc822-field}
uses the preferred case for symbols of the underlying Scheme
implementation which, in the case of scsh, happens to be lower-case.}
\item[\var{body}] This is list of strings which are the field's
body, e.g. Each list element is one line from the field's body,
so if the field spreads out over three lines, then the body is a
list of three strings. The terminating \ex{cr}/\ex{lf}'s are
trimmed from each string. Note that header bodies frequently contain
space after the colon like this:
%
\begin{verbatim}
Subject: RFC 822 can format itself in the ARPA
\end{verbatim}
%
In this case, \var{body} will be
\begin{verbatim}
(" RFC 822 can format itself in the ARPA")
\end{verbatim}
\end{description}
When there are no more fields -- EOF or a blank line has terminated
the header section -- then the procedure returns [\sharpf\ \sharpf].
The \ex{\%read-rfc822-field} variant allows you to specify your own
read-line procedure. The one used by \ex{read-rfc822-field}
terminates lines with either \ex{cr}/\ex{lf} or just \ex{lf}, and it
trims the terminator from the line. Your read-line procedure should
trim the terminator of the line, so an empty line is returned as an
empty string.
The procedures raise an error if the syntax of the read field (the
line returned by the read-line-function) is illegal (regarding
RFC~822).
\end{defundescx}
%
When there are no more fields---EOF or a blank line has terminated
the header section---then \ex{read-rfc822-field} returns [\sharpf\ \sharpf].
\defun{read-rfc822-headers} {\ovar{port}} {association list}
\begin{defundescx}{\%read-rfc822-headers} {read-line port}
{association list}
\var{Port} is an optional input port to read from---it defaults to
the value of \ex{(current-input-port)}.
Read in and parse up a section of text that looks like the header
portion of an RFC~822 message. Return an association list mapping a
field name (a symbol such as 'date or 'subject) to a list of field
bodies -- one for each occurence of the field in the header. So if
there are five ``Received-by:'' fields in the header, the alist maps
'received-by to a five element list. Each body is in turn
represented by a list of strings -- one for each line of the field.
So a field spread across three lines would produce a three element
body.
\var{Read-line} is an optional parameter specifying a procedure of
one argument (the input port) used to read the raw header lines.
The default used by \ex{read-rfc822-field} terminates lines with
either \ex{cr}/\ex{lf} or just \ex{lf}, and it trims the terminator
from the line. This procedure should trim the terminator of the
line, so an empty line is returned as an empty string.
The \ex{\%read-rfc822-headers} variant allows you to specify your
own read-line procedure. See \emph{A note on line-terminators} above
for reasons why.
The procedure raises an error if the syntax of the read field (the
line returned by the read-line-function) is illegal according to
RFC~822.
\end{desc}
Hint: If you want to get familiar with these procedures, you might
find \ex{make\=string\=input\=port}, that makes a port out of a
string, helpful.
\end{defundescx}
\defun{read-rfc822-headers} {[port] [read-line]} {association-list}
\begin{desc}
This procedure reads in and parses a section of text that looks like
the header portion of an RFC~822 message. It returns an association
list mapping a field name (a symbol such as 'date or 'subject) to a
list of field bodies----one for each occurence of the field in the
header. So if there are five \ex{Received-by} fields in the header,
the alist maps \ex{received-by} to a five-element list. Each body is
in turn represented by a list of strings----one for each line of the
field. So a field spread across three lines would produce a
three-element body.
\var{Port} and \var{read-line} are as with \ex{read-rfc822-field}.
\end{desc}
\begin{defundesc}{rejoin-header-lines} {alist \ovar{seperator}}
{association list}
\defun{rejoin-header-lines} {alist [seperator]} {association list}
\begin{desc}
Takes a field \var{alist} such as is returned by
\ex{read-rfc822-headers} and returns an equivalent association list.
@ -110,8 +79,8 @@ the empty string or the two-char string \ex{cr}/\ex{lf} (or the EOF object).
To rejoin a single body list, use scsh's \ex{join-strings}
procedure.
\end{defundesc}
\end{desc}
%
For the following definitions' examples, let's use this set of of
RFC~822 headers:
\begin{alltt}
@ -122,51 +91,37 @@ RFC~822 headers:
\end{alltt}
%
\begin{defundesc}{get-header-all} {headers name} {string list list}
\defun{get-header-all} {headers name} {string list list}
\begin{desc}
Returns all entries or \sharpf, e.g.\
\codex{(get-header-all hdrs 'to)}
results to
returns
\codex{'((" ziggy," " newts") (" gjs, tk"))}
\end{defundesc}
\end{desc}
\begin{defundesc}{get-header-lines} {headers name} {string list}
\defun{get-header-lines} {headers name} {string list}
\begin{desc}
Returns all lines of the first entry or \sharpf, e.g.\
\codex{(get-header-lines hdrs 'to)}
results to
\codex{'(" ziggy," " newts")}
\end{defundesc}
returns
\codex{(" ziggy," " newts")}
\end{desc}
\begin{defundesc}{get-headers} {headers name \ovar{seperator}} {string}
\defun{get-header} {headers name [separator]} {string}
\begin{desc}
Returns the first entry with the lines joined together by seperator
(newline by default), e.g.\
\codex{(get-header hdrs 'to)}
results to
returns
\begin{alltt}
" ziggy,
newts"
\end{alltt}
%
Note, that \ex{newts} is led by two spaces.
\end{defundesc}
\begin{defundesc}{string->symbol-pref}{string}{symbol}
Takes a \string and converts it to a symbol using the Scheme
implementation's preferred case. (The preferred case is recognized by
a doing once a \ex{symbol->string} conversion of \ex{'a}.)
\end{defundesc}
\section{Desireable functionalities}
\begin{itemize}
\item Unfolding long lines.
\item Lexing structured fields.
\item Unlexing structured fields into canonical form.
\item Parsing and unparsing dates.
\item Parsing and unparsing addresses.
\end{itemize}
\end{desc}
%%% Local Variables:
%%% mode: latex
%%% TeX-master: man.tex
%%% TeX-master: "man"
%%% End: