Complete and up-to-date documentation for the RFC822 library.
parent
9b11ac1572
commit
1b31924b80
|
@ -1,104 +1,73 @@
|
||||||
\chapter{Handle RFC822 headers}\label{cha:rfc822}
|
\chapter{RFC~822 Library}\label{cha:rfc822}
|
||||||
%
|
%
|
||||||
\begin{description}
|
The \ex{rfc822} structure provides rudimentary support for parsing
|
||||||
\item[Used files:] rfc822.scm
|
headers according to RFC 822 \textit{Standard for the format of ARPA
|
||||||
\item[Name of the package:] rfc822
|
Internet text messages}. These headers show up in SMTP messages,
|
||||||
\end{description}
|
HTTP headers, etc.
|
||||||
%
|
|
||||||
\section{What users want to know}
|
|
||||||
|
|
||||||
\section{A note on line-terminators}
|
\defun{read-rfc822-field} {[port] [read-line]} {name body}
|
||||||
Line-terminating sequences are always a drag, because there's no
|
\begin{desc}
|
||||||
agreement on them -- the Net protocols and DOS use
|
|
||||||
carriage-return/line-feed (\ex{cr}/\ex{lf}); Unix uses \ex{lf}; the
|
|
||||||
Mac uses \ex{cr}. On the one hand, you'd like to use the code for all of
|
|
||||||
the above, on the other, you'd also like to use the code for strict
|
|
||||||
applications that need definitely not to recognise bare \ex{cr}'s or
|
|
||||||
\ex{lf}'s as terminators.
|
|
||||||
|
|
||||||
RFC 822 requires a \ex{cr}/\ex{lf} (carriage-return/line-feed) pair to
|
|
||||||
terminate lines of text. On the other hand, careful perusal of the
|
|
||||||
text shows up some ambiguities (there are maybe three or four of
|
|
||||||
these, and I'm too lazy to write them all down). Furthermore, it is an
|
|
||||||
unfortunate fact that many Unix apps separate lines of RFC~822 text
|
|
||||||
with simple linefeeds (e.g., messages kept in \ex{/usr/spool/mail}).
|
|
||||||
As a result, this code takes a broad-minded view of line-terminators:
|
|
||||||
lines can be terminated by either \ex{cr}/\ex{lf} or just \ex{lf}, and
|
|
||||||
either terminating sequence is trimmed.
|
|
||||||
|
|
||||||
If you need stricter parsing, you can call the lower-level procedure
|
|
||||||
\ex{\%read\=rfc822\=field} and \ex{\%read\=rfc822\=headers}. They take
|
|
||||||
the read-line procedure as an extra parameter. This means that you can
|
|
||||||
pass in a procedure that recognises only \ex{cr}/\ex{lf}'s, or only
|
|
||||||
\ex{cr}'s (for a Mac app, perhaps), and you can determine whether or
|
|
||||||
not the terminators get trimmed. However, your read-line procedure
|
|
||||||
must indicate the header-terminating empty line by returning \emph{either}
|
|
||||||
the empty string or the two-char string \ex{cr}/\ex{lf} (or the EOF object).
|
|
||||||
|
|
||||||
\section{Description of the procedures}
|
|
||||||
|
|
||||||
\defun{read-rfc822-field} {\ovar{port}} {name body}
|
|
||||||
\begin{defundescx}{\%read-rfc822-field } {read-line port} {name body}
|
|
||||||
|
|
||||||
Read one field from the port, and return two values:
|
Read one field from the port, and return two values:
|
||||||
|
|
||||||
\begin{description}
|
\begin{description}
|
||||||
\item[\var{name}] Symbol such as \ex{'subject} or \ex{'to}. The
|
\item[\var{name}] This is a symbol describing the RFC 822 field
|
||||||
field name is converted to a symbol using the Scheme
|
name, such as \ex{subject} or \ex{to}. The symbol consists of all
|
||||||
implementation's preferred case. If the implementation reads
|
lower-case letters.\footnote{In fact, \ex{read-rfc822-field}
|
||||||
symbols in a case-sensitive fashion (e.g., scsh), lowercase is
|
uses the preferred case for symbols of the underlying Scheme
|
||||||
used. This means you can compare these symbols to quoted constants
|
implementation which, in the case of scsh, happens to be lower-case.}
|
||||||
using \ex{eq?}. When printing these field names out, it looks best
|
\item[\var{body}] This is a list of strings which are the field's
|
||||||
if you capitalize them with \ex{(capitalize\=string (symbol->string field\=name))}.
|
body. Each list element is one line from the field's body,
|
||||||
|
so if the field spreads out over three lines, then the body is a
|
||||||
\item[\var{body}] List of strings which are the field's body, e.g.
|
list of three strings. The terminating \ex{cr}/\ex{lf}'s are
|
||||||
(``shivers\discretionary{@}{}{@}lcs.mit.edu''). Each list element is one line from
|
trimmed from each string. Note that header bodies frequently contain
|
||||||
the field's body, so if the field spreads out over three lines,
|
a space after the colon, like this:
|
||||||
then the body is a list of three strings. The terminating
|
%
|
||||||
\ex{cr}/\ex{lf}'s are trimmed from each string. A leading space or
|
\begin{verbatim}
|
||||||
a leading horizontal tab is also trimmed, but one and only one.
|
Subject: RFC 822 can format itself in the ARPA
|
||||||
|
\end{verbatim}
|
||||||
|
%
|
||||||
|
In this case, \var{body} will be
|
||||||
|
\begin{verbatim}
|
||||||
|
(" RFC 822 can format itself in the ARPA")
|
||||||
|
\end{verbatim}
|
||||||
\end{description}
|
\end{description}
|
||||||
|
%
|
||||||
When there are no more fields -- EOF or a blank line has terminated
|
When there are no more fields---EOF or a blank line has terminated
|
||||||
the header section -- then the procedure returns [\sharpf\ \sharpf].
|
the header section---then \ex{read-rfc822-field} returns [\sharpf\ \sharpf].
|
||||||
|
|
||||||
The \ex{\%read-rfc822-field} variant allows you to specify your own
|
|
||||||
read-line procedure. The one used by \ex{read-rfc822-field}
|
|
||||||
terminates lines with either \ex{cr}/\ex{lf} or just \ex{lf}, and it
|
|
||||||
trims the terminator from the line. Your read-line procedure should
|
|
||||||
trim the terminator of the line, so an empty line is returned as an
|
|
||||||
empty string.
|
|
||||||
|
|
||||||
The procedures raise an error if the syntax of the read field (the
|
|
||||||
line returned by the read-line-function) is illegal (regarding
|
|
||||||
RFC~822).
|
|
||||||
\end{defundescx}
|
|
||||||
|
|
||||||
\defun{read-rfc822-headers} {\ovar{port}} {association list}
|
\var{Port} is an optional input port to read from---it defaults to
|
||||||
\begin{defundescx}{\%read-rfc822-headers} {read-line port}
|
the value of \ex{(current-input-port)}.
|
||||||
{association list}
|
|
||||||
|
|
||||||
Read in and parse up a section of text that looks like the header
|
\var{Read-line} is an optional parameter specifying a procedure of
|
||||||
portion of an RFC~822 message. Return an association list mapping a
|
one argument (the input port) used to read the raw header lines.
|
||||||
field name (a symbol such as 'date or 'subject) to a list of field
|
The default used by \ex{read-rfc822-field} terminates lines with
|
||||||
bodies -- one for each occurrence of the field in the header. So if
|
either \ex{cr}/\ex{lf} or just \ex{lf}, and it trims the terminator
|
||||||
there are five ``Received-by:'' fields in the header, the alist maps
|
from the line. A custom \var{read-line} procedure should likewise trim the terminator of the
|
||||||
'received-by to a five element list. Each body is in turn
|
line, so an empty line is returned as an empty string.
|
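As a sketch of how a stricter \var{read-line} might look, the
hypothetical procedure \ex{read-crlf-line} below (it is not part of
the library) accepts only \ex{cr}/\ex{lf} as a terminator and trims
it, so that an empty line comes back as the empty string. It uses only
standard Scheme procedures and assumes that the character codes 13 and
10 denote \ex{cr} and \ex{lf}:
%
\begin{verbatim}
;; Hypothetical helper, not provided by the rfc822 structure.
;; Reads one line terminated strictly by cr/lf and returns it without
;; the terminator. Returns the EOF object if no characters are left.
(define (read-crlf-line port)
  (let loop ((chars '()))
    (let ((c (read-char port)))
      (cond ((eof-object? c)
             ;; EOF: hand back a partial last line, or EOF itself.
             (if (null? chars)
                 c
                 (list->string (reverse chars))))
            ((and (char=? c (integer->char 13))          ; cr ...
                  (let ((next (peek-char port)))
                    (and (char? next)
                         (char=? next (integer->char 10))))) ; ... then lf
             (read-char port)                            ; consume the lf
             (list->string (reverse chars)))
            (else
             ;; A bare cr or lf is kept as an ordinary character.
             (loop (cons c chars)))))))

;; The port has to be given explicitly to reach the second argument.
(read-rfc822-field (current-input-port) read-crlf-line)
\end{verbatim}
%
Because \var{port} comes first in the argument list, it must be
supplied whenever a \var{read-line} procedure is passed.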
||||||
represented by a list of strings -- one for each line of the field.
|
|
||||||
So a field spread across three lines would produce a three element
|
|
||||||
body.
|
|
||||||
|
|
||||||
The \ex{\%read-rfc822-headers} variant allows you to specify your
|
The procedure raises an error if the syntax of the read field (the
|
||||||
own read-line procedure. See \emph{A note on line-terminators} above
|
line returned by the read-line-function) is illegal according to
|
||||||
for reasons why.
|
RFC~822.
|
||||||
|
\end{desc}
|
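A minimal way to try \ex{read-rfc822-field} is to read from a string
port. The sketch below assumes that the \ex{rfc822} structure is open
and that scsh's \ex{make-string-input-port} is available; the header
text is the example from the description above, followed by the blank
line that ends a header section:
%
\begin{verbatim}
;; Sketch only: builds the input without relying on string escapes.
(define port
  (make-string-input-port
   (string-append "Subject: RFC 822 can format itself in the ARPA"
                  (string #\newline)
                  (string #\newline))))

;; read-rfc822-field returns two values, so receive them explicitly.
(call-with-values
  (lambda () (read-rfc822-field port))
  (lambda (name body)
    (write name) (newline)     ; the field name, as a symbol
    (write body) (newline)))   ; the list of body lines
\end{verbatim}
%
Going by the description above, this should show the symbol
\ex{subject} and a one-element list holding the body line with its
leading space. A second call on the same port would then meet the
blank line and return [\sharpf\ \sharpf].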
||||||
|
|
||||||
Hint: If you want to get familiar with these procedures, you might
|
\defun{read-rfc822-headers} {[port] [read-line]} {association-list}
|
||||||
find \ex{make\=string\=input\=port}, that makes a port out of a
|
\begin{desc}
|
||||||
string, helpful.
|
This procedure reads in and parses a section of text that looks like
|
||||||
\end{defundescx}
|
the header portion of an RFC~822 message. It returns an association
|
||||||
|
list mapping a field name (a symbol such as \ex{date} or \ex{subject}) to a
|
||||||
|
list of field bodies---one for each occurrence of the field in the
|
||||||
|
header. So if there are five \ex{Received-by} fields in the header,
|
||||||
|
the alist maps \ex{received-by} to a five-element list. Each body is
|
||||||
|
in turn represented by a list of strings---one for each line of the
|
||||||
|
field. So a field spread across three lines would produce a
|
||||||
|
three-element body.
|
||||||
|
|
||||||
|
\var{Port} and \var{read-line} are as with \ex{read-rfc822-field}.
|
||||||
|
\end{desc}
|
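To illustrate the shape of the result, the following sketch walks the
association list and prints how often each field occurs. The name
\ex{show-header-counts} is made up for this example, and the code
assumes that each entry is a list whose first element is the field
name and whose remaining elements are the bodies:
%
\begin{verbatim}
;; Hypothetical helper, not part of the rfc822 structure.
(define (show-header-counts port)
  (for-each
   (lambda (entry)
     ;; Assumption: entry has the shape (name body ...), so
     ;; (cdr entry) is the list of bodies for that field.
     (write (car entry))
     (display ": ")
     (write (length (cdr entry)))
     (newline))
   (read-rfc822-headers port)))
\end{verbatim}
%
For a header section with five \ex{Received-by} fields, the line for
\ex{received-by} would therefore report a count of 5.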
||||||
|
|
||||||
\begin{defundesc}{rejoin-header-lines} {alist \ovar{separator}}
|
\defun{rejoin-header-lines} {alist [separator]} {association list}
|
||||||
{association list}
|
\begin{desc}
|
||||||
|
|
||||||
Takes a field \var{alist} such as is returned by
|
Takes a field \var{alist} such as is returned by
|
||||||
\ex{read-rfc822-headers} and returns an equivalent association list.
|
\ex{read-rfc822-headers} and returns an equivalent association list.
|
||||||
|
@ -110,8 +79,8 @@ the empty string or the two-char string \ex{cr}/\ex{lf} (or the EOF object).
|
||||||
|
|
||||||
To rejoin a single body list, use scsh's \ex{join-strings}
|
To rejoin a single body list, use scsh's \ex{join-strings}
|
||||||
procedure.
|
procedure.
|
||||||
\end{defundesc}
|
\end{desc}
|
||||||
|
%
|
||||||
For the following definitions' examples, let's use this set of
|
For the following definitions' examples, let's use this set of
|
||||||
RFC~822 headers:
|
RFC~822 headers:
|
||||||
\begin{alltt}
|
\begin{alltt}
|
||||||
|
@ -122,51 +91,37 @@ RFC~822 headers:
|
||||||
\end{alltt}
|
\end{alltt}
|
||||||
%
|
%
|
||||||
|
|
||||||
\begin{defundesc}{get-header-all} {headers name} {string list list}
|
\defun{get-header-all} {headers name} {string list list}
|
||||||
|
\begin{desc}
|
||||||
Returns all entries or \sharpf, e.g.\
|
Returns all entries or \sharpf, e.g.\
|
||||||
\codex{(get-header-all hdrs 'to)}
|
\codex{(get-header-all hdrs 'to)}
|
||||||
results to
|
returns
|
||||||
\codex{'((" ziggy," " newts") (" gjs, tk"))}
|
\codex{'((" ziggy," " newts") (" gjs, tk"))}
|
||||||
\end{defundesc}
|
\end{desc}
|
||||||
|
|
||||||
\begin{defundesc}{get-header-lines} {headers name} {string list}
|
\defun{get-header-lines} {headers name} {string list}
|
||||||
|
\begin{desc}
|
||||||
Returns all lines of the first entry or \sharpf, e.g.\
|
Returns all lines of the first entry or \sharpf, e.g.\
|
||||||
\codex{(get-header-lines hdrs 'to)}
|
\codex{(get-header-lines hdrs 'to)}
|
||||||
results to
|
returns
|
||||||
\codex{'(" ziggy," " newts")}
|
\codex{(" ziggy," " newts")}
|
||||||
\end{defundesc}
|
\end{desc}
|
||||||
|
|
||||||
\begin{defundesc}{get-headers} {headers name \ovar{separator}} {string}
|
\defun{get-header} {headers name [separator]} {string}
|
||||||
|
\begin{desc}
|
||||||
Returns the first entry with the lines joined together by \var{separator}
|
Returns the first entry with the lines joined together by \var{separator}
|
||||||
(newline by default), e.g.\
|
(newline by default), e.g.\
|
||||||
\codex{(get-header hdrs 'to)}
|
\codex{(get-header hdrs 'to)}
|
||||||
results to
|
returns
|
||||||
\begin{alltt}
|
\begin{alltt}
|
||||||
" ziggy,
|
" ziggy,
|
||||||
newts"
|
newts"
|
||||||
\end{alltt}
|
\end{alltt}
|
||||||
%
|
%
|
||||||
Note that \ex{newts} is preceded by two spaces.
|
Note that \ex{newts} is preceded by two spaces.
|
||||||
\end{defundesc}
|
\end{desc}
|
||||||
|
|
||||||
|
|
||||||
\begin{defundesc}{string->symbol-pref}{string}{symbol}
|
|
||||||
Takes a \var{string} and converts it to a symbol using the Scheme
|
|
||||||
implementation's preferred case. (The preferred case is recognized by
|
|
||||||
performing a single \ex{symbol->string} conversion of \ex{'a}.)
|
|
||||||
\end{defundesc}
|
|
||||||
|
|
||||||
\section{Desirable functionalities}
|
|
||||||
|
|
||||||
\begin{itemize}
|
|
||||||
\item Unfolding long lines.
|
|
||||||
\item Lexing structured fields.
|
|
||||||
\item Unlexing structured fields into canonical form.
|
|
||||||
\item Parsing and unparsing dates.
|
|
||||||
\item Parsing and unparsing addresses.
|
|
||||||
\end{itemize}
|
|
||||||
|
|
||||||
%%% Local Variables:
|
%%% Local Variables:
|
||||||
%%% mode: latex
|
%%% mode: latex
|
||||||
%%% TeX-master: man.tex
|
%%% TeX-master: "man"
|
||||||
%%% End:
|
%%% End:
|
||||||
|
|