\section{Handle RFC822 headers}\label{sec:rfc822}
%
\begin{description}
\item[Used files:] rfc822.scm
\item[Name of the package:] rfc822
\end{description}
%
\subsection{What users want to know}
\subsubsection*{A note on line-terminators}
Line-terminating sequences are always a drag, because there's no
agreement on them -- the Net protocols and DOS use
carriage-return/line-feed (\ex{cr}/\ex{lf}); Unix uses \ex{lf}; the
Mac uses \ex{cr}. On one hand, you'd like to use the code for all of
the above; on the other, you'd also like to use the code for strict
applications that must not recognise bare \ex{cr}'s or \ex{lf}'s as
terminators.

RFC 822 requires a \ex{cr}/\ex{lf} (carriage-return/line-feed) pair to
terminate lines of text. On the other hand, careful perusal of the
text shows up some ambiguities (there are maybe three or four of
these, and I'm too lazy to write them all down). Furthermore, it is an
unfortunate fact that many Unix apps separate lines of RFC~822 text
with simple linefeeds (e.g., messages kept in \ex{/usr/spool/mail}).
As a result, this code takes a broad-minded view of line-terminators:
lines can be terminated by either \ex{cr}/\ex{lf} or just \ex{lf}, and
either terminating sequence is trimmed.

If you need stricter parsing, you can call the lower-level procedures
\ex{\%read\=rfc822\=field} and \ex{\%read\=rfc822\=headers}. They take
the read-line procedure as an extra parameter. This means that you can
pass in a procedure that recognises only \ex{cr}/\ex{lf}'s, or only
\ex{cr}'s (for a Mac app, perhaps), and you can determine whether or
not the terminators get trimmed. However, your read-line procedure
must indicate the header-terminating empty line by returning \emph{either}
the empty string or the two-char string \ex{cr}/\ex{lf} (or the EOF object).
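
As a minimal sketch of such a strict reader (it is not part of the
package; the name \ex{read-crlf-line} is hypothetical, and the code
assumes scsh's \ex{ascii->char}), the following procedure treats only
\ex{cr}/\ex{lf} as a terminator and trims it, so bare \ex{cr}'s and
\ex{lf}'s stay in the line. An empty line comes back as the empty
string, as required above.
\begin{code}
;; Hypothetical helper, for illustration only.
(define cr (ascii->char 13))
(define lf (ascii->char 10))

;; Read characters up to a cr/lf pair and return them as a string
;; without the pair. Returns the EOF object if nothing could be read.
(define (read-crlf-line port)
  (let loop ((chars '()))
    (let ((c (read-char port)))
      (cond ((eof-object? c)
             (if (null? chars)
                 c
                 (list->string (reverse chars))))
            ((and (char=? c lf)
                  (pair? chars)
                  (char=? (car chars) cr))
             ;; drop the cr that precedes this lf
             (list->string (reverse (cdr chars))))
            (else (loop (cons c chars)))))))\end{code}
%
It could then be passed as the read-line argument, e.g.\
\ex{(\%read\=rfc822\=headers read-crlf-line port)}.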
\subsubsection*{Description of the procedures}
\defun{read-rfc822-field} {\ovar{port}} {name body}
\begin{defundescx}{\%read-rfc822-field } {read-line port} {name body}
Read one field from the port, and return two values:
\begin{description}
\item{\var{name}} Symbol such as \ex{'subject} or \ex{'to}. The
field name is converted to a symbol using the Scheme
implementation's preferred case. If the implementation reads
symbols in a case-sensitive fashion (e.g., scsh), lowercase is
used. This means you can compare these symbols to quoted constants
using \ex{eq?}. When printing these field names out, it looks best
if you capitalize them with \ex{(capitalize\=string (symbol->string field\=name))}.
\item{\var{body}} List of strings which are the field's body, e.g.
(``shivers\discretionary{@}{}{@}lcs.mit.edu''). Each list element is one line from
the field's body, so if the field spreads out over three lines,
then the body is a list of three strings. The terminating
\ex{cr}/\ex{lf}'s are trimmed from each string. A leading space or
a leading horizontal tab is also trimmed, but one and only one.
\end{description}
When there are no more fields -- EOF or a blank line has terminated
the header section -- then the procedure returns [\sharpf\ \sharpf].
The \ex{\%read-rfc822-field} variant allows you to specify your own
read-line procedure. The one used by \ex{read-rfc822-field}
terminates lines with either \ex{cr}/\ex{lf} or just \ex{lf}, and it
trims the terminator from the line. Your read-line procedure should
trim the terminator of the line, so an empty line is returned as an
empty string.
Both procedures raise an error if a line returned by the read-line
procedure is not a syntactically valid field according to RFC~822.
\end{defundescx}
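
The two return values can be consumed with \ex{call-with-values}. The
following sketch (the name \ex{header-field-names} is hypothetical)
reads fields until the false values described above signal the end of
the header section, and collects the field names in order:
\begin{code}
;; Hypothetical helper, for illustration only.
;; Return the names of all fields in the header section read from PORT.
(define (header-field-names port)
  (let loop ((names '()))
    (call-with-values
      (lambda () (read-rfc822-field port))
      (lambda (name body)
        (if name
            (loop (cons name names))
            (reverse names))))))\end{code}
%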
\defun{read-rfc822-headers} {\ovar{port}} {association list}
\begin{defundescx}{\%read-rfc822-headers} {read-line port}
{association list}
Read in and parse up a section of text that looks like the header
portion of an RFC~822 message. Return an association list mapping a
field name (a symbol such as \ex{'date} or \ex{'subject}) to a list of
field bodies -- one for each occurrence of the field in the header. So
if there are five ``Received-by:'' fields in the header, the alist maps
\ex{'received-by} to a five-element list. Each body is in turn
represented by a list of strings -- one for each line of the field.
So a field spread across three lines would produce a three-element
body.
The \ex{\%read-rfc822-headers} variant allows you to specify your
own read-line procedure. See \emph{A note on line-terminators} above
for reasons why.
Hint: If you want to get familiar with these procedures, you might
find \ex{make\=string\=input\=port} helpful; it makes a port out of a
string.
\end{defundescx}
\begin{defundesc}{rejoin-header-lines} {alist \ovar{separator}}
{association list}
Takes a field \var{alist} such as is returned by
\ex{read-rfc822-headers} and returns an equivalent association list.
Each body (\str list) in the input \var{alist} is joined into a
single string in the output alist. \var{separator} is the string used
to join these elements together; it defaults to a single space, but
can usefully be ``\verb|\n|'' (linefeed) or ``\verb|\r\n|''
(carriage-return/line-feed).
To rejoin a single body list, use scsh's \ex{join-strings}
procedure.
\end{defundesc}
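
As a sketch of how the result might be used (the name
\ex{joined-bodies} is hypothetical, and the code assumes that the
\ex{cdr} of each alist entry is the list of bodies for that field):
\begin{code}
;; Hypothetical helper, for illustration only.
;; Return every body of field NAME as a single string each,
;; with the lines of a body joined by ", ".
(define (joined-bodies headers name)
  (let ((entry (assq name (rejoin-header-lines headers ", "))))
    (if entry
        (cdr entry)
        '())))\end{code}
%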
For the examples in the following definitions, let's use this set of
RFC~822 headers:
\begin{code}
From: shivers
To: ziggy,
  newts
To: gjs, tk\end{code}
%
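The examples below refer to a variable \ex{hdrs}. One way it might be
obtained is sketched here; the helper \ex{nl} is hypothetical, and the
code assumes scsh's \ex{ascii->char} together with
\ex{make\=string\=input\=port}:
\begin{code}
;; A one-character string holding a linefeed.
(define nl (string (ascii->char 10)))

;; Parse the sample headers from a string port. The final empty
;; line terminates the header section.
(define hdrs
  (read-rfc822-headers
    (make-string-input-port
      (string-append "From: shivers" nl
                     "To: ziggy," nl
                     "  newts" nl
                     "To: gjs, tk" nl
                     nl))))\end{code}
%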
\begin{defundesc}{get-header-all} {headers name} {string list list}
Returns all entries for the field \var{name}, or \sharpf\ if there is
none, e.g.\
\codex{(get-header-all hdrs 'to)}
yields
\codex{'((" ziggy," " newts") (" gjs, tk"))}
\end{defundesc}
\begin{defundesc}{get-header-lines} {headers name} {string list}
Returns all lines of the first entry for the field \var{name}, or
\sharpf\ if there is none, e.g.\
\codex{(get-header-lines hdrs 'to)}
yields
\codex{'(" ziggy," " newts")}
\end{defundesc}
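
Because both procedures return a false value for a missing field,
callers usually need to check the result first. A minimal sketch (both
procedure names are hypothetical):
\begin{code}
;; Hypothetical helpers, for illustration only.
;; How often does field NAME occur in HEADERS?
(define (header-count headers name)
  (let ((entries (get-header-all headers name)))
    (if entries (length entries) 0)))

;; How many lines does the first NAME field span?
(define (header-line-count headers name)
  (let ((lines (get-header-lines headers name)))
    (if lines (length lines) 0)))\end{code}
%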
\begin{defundesc}{get-header} {headers name \ovar{separator}} {string}
Returns the first entry with the lines joined together by
\var{separator} (newline by default), e.g.\
\codex{(get-header hdrs 'to)}
yields
\begin{code}
" ziggy,
newts"\end{code}
%
Note that \ex{newts} is preceded by two spaces.
\end{defundesc}
\begin{defundesc}{string->symbol-pref}{string}{symbol}
Takes a string and converts it to a symbol using the Scheme
implementation's preferred case. (The preferred case is determined
once, by converting \ex{'a} with \ex{symbol->string}.)
\end{defundesc}
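
The detection of the preferred case can be sketched as follows. This
is an illustrative re-implementation under the assumption stated in
the description above, not the package's actual code, and both names
are hypothetical:
\begin{code}
;; Hypothetical re-implementation, for illustration only.
;; True if the implementation prints the symbol a in lowercase.
(define symbols-are-lowercase?
  (string=? (symbol->string 'a) "a"))

;; Fold S to the preferred case, then intern it as a symbol.
(define (my-string->symbol-pref s)
  (string->symbol
    (list->string
      (map (if symbols-are-lowercase?
               char-downcase
               char-upcase)
           (string->list s)))))\end{code}
%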
\subsubsection*{Desirable functionality}
\begin{itemize}
\item Unfolding long lines.
\item Lexing structured fields.
\item Unlexing structured fields into canonical form.
\item Parsing and unparsing dates.
\item Parsing and unparsing addresses.
\end{itemize}
%%% Local Variables:
%%% mode: latex
%%% TeX-master: t
%%% End: