\section{Handle RFC822 headers}\label{sec:rfc822}
%
\begin{description}
\item[Used files:] rfc822.scm
\item[Name of the package:] rfc822
\end{description}
%
\subsection{What users want to know}

\subsubsection*{A note on line-terminators}

Line-terminating sequences are always a drag, because there's no
agreement on them -- the Net protocols and DOS use
carriage-return/line-feed (\ex{cr}/\ex{lf}); Unix uses \ex{lf}; the
Mac uses \ex{cr}. On the one hand, you'd like to use the code for all
of the above; on the other, you'd also like to use the code for strict
applications that must not recognise bare \ex{cr}'s or \ex{lf}'s as
terminators.

RFC 822 requires a \ex{cr}/\ex{lf} (carriage-return/line-feed) pair to
terminate lines of text. On the other hand, careful perusal of the
text shows up some ambiguities (there are maybe three or four of
these, and I'm too lazy to write them all down). Furthermore, it is an
unfortunate fact that many Unix apps separate lines of RFC~822 text
with simple linefeeds (e.g., messages kept in \ex{/usr/spool/mail}).
As a result, this code takes a broad-minded view of line-terminators:
lines can be terminated by either \ex{cr}/\ex{lf} or just \ex{lf}, and
either terminating sequence is trimmed.

If you need stricter parsing, you can call the lower-level procedures
\ex{\%read\=rfc822\=field} and \ex{\%read\=rfc822\=headers}. They take
the read-line procedure as an extra parameter. This means that you can
pass in a procedure that recognises only \ex{cr}/\ex{lf}'s, or only
\ex{cr}'s (for a Mac app, perhaps), and you can determine whether or
not the terminators get trimmed. However, your read-line procedure
must indicate the header-terminating empty line by returning
\emph{either} the empty string or the two-char string \ex{cr}/\ex{lf}
(or the EOF object).
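
A strict reader of the kind described above might look like the
following sketch. It is only an illustration under stated assumptions:
\ex{read-crlf-line} and \ex{read-strict-headers} are hypothetical
names, not part of the package, and the code assumes that
\ex{ascii->char} is available for building the \ex{cr} and \ex{lf}
characters. The reader ends a line only at a \ex{cr}/\ex{lf} pair,
trims that pair, and leaves any bare \ex{cr} or \ex{lf} in the line.
\begin{alltt}
;; Assumption: ascii->char is available (as in scsh).
(define cr (ascii->char 13))
(define lf (ascii->char 10))

;; Hypothetical helper, not part of the rfc822 package.
;; Returns the next line without its cr/lf terminator, or the
;; EOF object if no characters are left on the port.
(define (read-crlf-line port)
  (let loop ((chars '()))
    (let ((c (read-char port)))
      (cond ((eof-object? c)
             (if (null? chars)
                 c
                 (list->string (reverse chars))))
            ((and (char=? c cr)
                  (let ((next (peek-char port)))
                    (and (not (eof-object? next))
                         (char=? next lf))))
             (read-char port)           ; consume the lf
             (list->string (reverse chars)))
            (else
             (loop (cons c chars)))))))

;; Hypothetical wrapper: parse headers, accepting only cr/lf.
(define (read-strict-headers port)
  (%read-rfc822-headers read-crlf-line port))
\end{alltt}
Because the terminator is trimmed, the empty line that ends the header
section comes back as the empty string, which is one of the forms the
lower-level procedures accept.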
\subsubsection*{Description of the procedures}

\defun{read-rfc822-field} {\ovar{port}} {name body}
\begin{defundescx}{\%read-rfc822-field} {read-line port} {name body}
Read one field from the port, and return two values:
\begin{description}
\item[\var{name}] Symbol such as \ex{'subject} or \ex{'to}. The field
  name is converted to a symbol using the Scheme implementation's
  preferred case. If the implementation reads symbols in a
  case-sensitive fashion (e.g., scsh), lowercase is used. This means
  you can compare these symbols to quoted constants using \ex{eq?}.
  When printing these field names out, it looks best if you capitalize
  them with \ex{(capitalize\=string (symbol->string field\=name))}.
\item[\var{body}] List of strings which are the field's body, e.g.
  (``shivers\discretionary{@}{}{@}lcs.mit.edu''). Each list element is
  one line from the field's body, so if the field spreads out over
  three lines, then the body is a list of three strings. The
  terminating \ex{cr}/\ex{lf}'s are trimmed from each string. A
  leading space or a leading horizontal tab is also trimmed, but one
  and only one.
\end{description}

When there are no more fields -- EOF or a blank line has terminated
the header section -- then the procedure returns [\sharpf\ \sharpf].

The \ex{\%read-rfc822-field} variant allows you to specify your own
read-line procedure. The one used by \ex{read-rfc822-field} terminates
lines with either \ex{cr}/\ex{lf} or just \ex{lf}, and it trims the
terminator from the line. Your read-line procedure should trim the
terminator of the line, so an empty line is returned as an empty
string.

The procedures raise an error if the syntax of the field read (the
line returned by the read-line procedure) is illegal according to
RFC~822.
\end{defundescx}

\defun{read-rfc822-headers} {\ovar{port}} {association list}
\begin{defundescx}{\%read-rfc822-headers} {read-line port} {association list}
Read in and parse up a section of text that looks like the header
portion of an RFC~822 message.
Return an association list mapping a field name (a symbol such as
\ex{'date} or \ex{'subject}) to a list of field bodies -- one for each
occurrence of the field in the header. So if there are five
``Received-by:'' fields in the header, the alist maps
\ex{'received-by} to a five-element list. Each body is in turn
represented by a list of strings -- one for each line of the field. So
a field spread across three lines would produce a three-element body.

The \ex{\%read-rfc822-headers} variant allows you to specify your own
read-line procedure. See \emph{A note on line-terminators} above for
reasons why.

Hint: If you want to get familiar with these procedures, you might
find \ex{make\=string\=input\=port}, which makes a port out of a
string, helpful.
\end{defundescx}

\begin{defundesc}{rejoin-header-lines} {alist \ovar{separator}} {association list}
Takes a field \var{alist} such as is returned by
\ex{read-rfc822-headers} and returns an equivalent association list.
Each body (\str list) in the input \var{alist} is joined into a single
string in the output alist. \var{separator} is the string used to join
these elements together; it defaults to a single space, but can
usefully be ``\verb|\n|'' (linefeed) or ``\verb|\r\n|''
(carriage-return/line-feed).

To rejoin a single body list, use scsh's \ex{join-strings} procedure.
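
As a small illustration (a sketch only: \ex{hdrs} is a hand-written
alist in the shape described above, not the output of an actual
parse):
\begin{alltt}
;; Hypothetical sample data: one `from' field, two `to' fields.
(define hdrs
  '((from (" shivers"))
    (to (" ziggy," " newts") (" gjs, tk"))))

;; Join the lines of every body with the default separator.
(rejoin-header-lines hdrs)

;; Rejoin the lines of a single body by hand.
(join-strings '(" ziggy," " newts") " ")
\end{alltt}
Going by the description above, the first call should map \ex{to} to
two strings, one per occurrence of the field, and \ex{from} to one.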
\end{defundesc}

For the following definitions' examples, let's use this set of
RFC~822 headers:
\begin{alltt}
From: shivers
To: ziggy,
  newts
To: gjs, tk
\end{alltt}
%
\begin{defundesc}{get-header-all} {headers name} {string list list}
Returns all entries or \sharpf, e.g.\
\codex{(get-header-all hdrs 'to)}
results in
\codex{'((" ziggy," " newts") (" gjs, tk"))}
\end{defundesc}

\begin{defundesc}{get-header-lines} {headers name} {string list}
Returns all lines of the first entry or \sharpf, e.g.\
\codex{(get-header-lines hdrs 'to)}
results in
\codex{'(" ziggy," " newts")}
\end{defundesc}

\begin{defundesc}{get-header} {headers name \ovar{separator}} {string}
Returns the first entry with the lines joined together by
\var{separator} (newline by default), e.g.\
\codex{(get-header hdrs 'to)}
results in
\begin{alltt}
" ziggy,
  newts"
\end{alltt}
%
Note that \ex{newts} is led by two spaces.
\end{defundesc}

\begin{defundesc}{string->symbol-pref}{string}{symbol}
Takes a string and converts it to a symbol using the Scheme
implementation's preferred case. (The preferred case is determined
once, by converting \ex{'a} with \ex{symbol->string}.)
\end{defundesc}

\subsubsection*{Desirable functionalities}

\begin{itemize}
\item Unfolding long lines.
\item Lexing structured fields.
\item Unlexing structured fields into canonical form.
\item Parsing and unparsing dates.
\item Parsing and unparsing addresses.
\end{itemize}

%%% Local Variables:
%%% mode: latex
%%% TeX-master: man.tex
%%% End: