
\section{Handle RFC822 headers}
%
\begin{description}
\item[Used files:] rfc822.scm
\item[Name of the package:] rfc822
\end{description}
%
\subsection{What users want to know}

\subsubsection*{A note on line-terminators}
Line-terminating sequences are always a drag, because there's no
agreement on them -- the Net protocols and DOS use
carriage-return/line-feed (\ex{cr}/\ex{lf}); Unix uses \ex{lf}; the
Mac uses \ex{cr}. On the one hand, you'd like to use the code for all
of the above; on the other, you'd also like to use the code for strict
applications that must not recognise bare \ex{cr}'s or \ex{lf}'s as
terminators.

RFC~822 requires a \ex{cr}/\ex{lf} (carriage-return/line-feed) pair to
terminate lines of text. On the other hand, careful perusal of the
text shows up some ambiguities (there are maybe three or four of
these, and I'm too lazy to write them all down). Furthermore, it is an
unfortunate fact that many Unix apps separate lines of RFC~822 text
with simple linefeeds (e.g., messages kept in \ex{/usr/spool/mail}).
As a result, this code takes a broad-minded view of line-terminators:
lines can be terminated by either \ex{cr}/\ex{lf} or just \ex{lf}, and
either terminating sequence is trimmed.

If you need stricter parsing, you can call the lower-level procedures
\ex{\%read\=rfc822\=field} and \ex{\%read\=rfc822\=headers}. They take
the read-line procedure as an extra parameter. This means that you can
pass in a procedure that recognises only \ex{cr}/\ex{lf}'s, or only
\ex{cr}'s (for a Mac app, perhaps), and you can determine whether or
not the terminators get trimmed. However, your read-line procedure
must indicate the header-terminating empty line by returning \emph{either}
the empty string or the two-char string \ex{cr}/\ex{lf} (or the EOF object).
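
As a minimal sketch of such a stricter reader, the following procedure
accepts only \ex{cr}/\ex{lf} as a terminator and trims it, so a bare
\ex{cr} or \ex{lf} stays part of the line. It assumes that the
read-line procedure is called with the port as its only argument,
which this document does not state explicitly. The name
\ex{read-crlf-line} is hypothetical and not part of the package.
\begin{code}
;; Hypothetical helper, not part of rfc822.scm.
;; Character codes: 13 = carriage return, 10 = line feed (ASCII).
(define (read-crlf-line port)
  (let loop ((chars '()))
    (let ((c (read-char port)))
      (cond ((eof-object? c)
             ;; EOF: return the EOF object if nothing was read,
             ;; otherwise the unterminated last line.
             (if (null? chars)
                 c
                 (list->string (reverse chars))))
            ((and (char=? c (integer->char 13))
                  (let ((next (peek-char port)))
                    (and (char? next)
                         (char=? next (integer->char 10)))))
             ;; cr followed by lf: consume the lf and trim both.
             (read-char port)
             (list->string (reverse chars)))
            (else
             ;; Any other character, including a bare cr or lf.
             (loop (cons c chars)))))))\end{code}
%
Under that assumption it could be passed as
\ex{(\%read-rfc822-headers read-crlf-line port)}. An empty line comes
back as the empty string, which is one of the two ways of signalling
the end of the header section.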

\subsubsection*{Description of the procedures}

\defun{read-rfc822-field} {\ovar{port}} {name body}
\begin{defundescx}{\%read-rfc822-field} {read-line port} {name body}
  
  Read one field from the port, and return two values:

  \begin{description}
  \item{\var{name}} Symbol such as \ex{'subject} or \ex{'to}. The
    field name is converted to a symbol using the Scheme
    implementation's preferred case. If the implementation reads
    symbols in a case-sensitive fashion (e.g., scsh), lowercase is
    used. This means you can compare these symbols to quoted constants
    using \ex{eq?}. When printing these field names out, it looks best
    if you capitalize them with \ex{(capitalize\=string (symbol->string field\=name))}.
    
  \item{\var{body}} List of strings which are the field's body, e.g.
    (``shivers\discretionary{@}{}{@}lcs.mit.edu''). Each list element is one line from
    the field's body, so if the field spreads out over three lines,
    then the body is a list of three strings. The terminating
    \ex{cr}/\ex{lf}'s are trimmed from each string. A leading space or
    a leading horizontal tab is also trimmed, but one and only one.
  \end{description}
    
  When there are no more fields -- EOF or a blank line has terminated
  the header section -- then the procedure returns [\sharpf\ \sharpf].
 
  The \ex{\%read-rfc822-field} variant allows you to specify your own
  read-line procedure. The one used by \ex{read-rfc822-field}
  terminates lines with either \ex{cr}/\ex{lf} or just \ex{lf}, and it
  trims the terminator from the line. Your read-line procedure should
  trim the terminator of the line, so an empty line is returned as an
  empty string.
  
  The procedures raise an error if the field that was read (the line
  returned by the read-line procedure) does not conform to the syntax
  of RFC~822.
\end{defundescx}
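
The two return values and the [\sharpf\ \sharpf] end marker suggest a
simple loop over all fields of a header. The following is only a
sketch: \ex{for-each-field} is a hypothetical name, not a procedure of
the package, and it relies on nothing beyond the behaviour described
above.
\begin{code}
;; Hypothetical helper, not part of rfc822.scm.
;; Calls (proc name body) for every field read from port and stops
;; as soon as read-rfc822-field returns a false name.
(define (for-each-field proc port)
  (let loop ()
    (call-with-values
      (lambda () (read-rfc822-field port))
      (lambda (name body)
        (if name
            (begin (proc name body)
                   (loop)))))))

;; Possible use: print each field name on its own line.
(define (show-field-names port)
  (for-each-field (lambda (name body)
                    (display name)
                    (newline))
                  port))\end{code}
%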

\defun{read-rfc822-headers} {\ovar{port}} {association list}
\begin{defundescx}{\%read-rfc822-headers} {read-line port}
  {association list}
  
  Read in and parse a section of text that looks like the header
  portion of an RFC~822 message. Return an association list mapping a
  field name (a symbol such as \ex{'date} or \ex{'subject}) to a list
  of field bodies -- one for each occurrence of the field in the
  header. So if there are five ``Received-by:'' fields in the header,
  the alist maps \ex{'received-by} to a five-element list. Each body
  is in turn represented by a list of strings -- one for each line of
  the field. So a field spread across three lines would produce a
  three-element body.
  
  The \ex{\%read-rfc822-headers} variant allows you to specify your
  own read-line procedure. See \emph{A note on line-terminators} above
  for reasons why.

  Hint: To get familiar with these procedures, you may find
  \ex{make\=string\=input\=port} helpful; it makes a port out of a
  string.
\end{defundescx}

\begin{defundesc}{rejoin-header-lines} {alist \ovar{separator}}
  {association list}
  
  Takes a field \var{alist} such as is returned by
  \ex{read-rfc822-headers} and returns an equivalent association list.
  Each body (\str list) in the input \var{alist} is joined into a
  single string in the output alist. \var{separator} is the string
  used to join these elements together; it defaults to a single space,
  but can usefully be ``\verb|\n|'' (linefeed) or ``\verb|\r\n|''
  (carriage-return/line-feed).
  
  To rejoin a single body list, use scsh's \ex{join-strings}
  procedure.
\end{defundesc}
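
A short sketch of both ways of joining follows. The alist is written
by hand in the shape described for \ex{read-rfc822-headers} (a field
name followed by its bodies), which is an assumption about the exact
list structure. The call to \ex{join-strings} assumes that it takes
the string list followed by the delimiter.
\begin{code}
;; Hand-written alist: one field name followed by two bodies,
;; the first body having two lines.
(define sample-alist
  '((to (" ziggy," "  newts") (" gjs, tk"))))

;; Join the lines of every body with the default separator.
(rejoin-header-lines sample-alist)

;; Join the lines of a single body by hand.
(join-strings '(" ziggy," "  newts") " ")\end{code}
%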

For the examples in the following definitions, let's use this set of
RFC~822 headers:
\begin{code}
     From: shivers
     To: ziggy,
       newts
     To: gjs, tk\end{code}
%
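
One way to obtain the \ex{hdrs} value used in these examples is to
read the header text from a string port, as the hint on
\ex{make\=string\=input\=port} suggests. This is a sketch only: the
helper name \ex{nl} is hypothetical, and the indentation of the
continuation line is copied from the listing above.
\begin{code}
;; nl is a one-character string holding a line feed (ASCII 10).
(define nl (string (integer->char 10)))

;; The final nl produces the empty line that ends the header section.
(define hdrs
  (read-rfc822-headers
   (make-string-input-port
    (string-append "From: shivers" nl
                   "To: ziggy," nl
                   "  newts" nl
                   "To: gjs, tk" nl
                   nl))))\end{code}
%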

\begin{defundesc}{get-header-all} {headers name} {string list list}
  Returns all entries for \var{name}, or \sharpf\ if there are none, e.g.\
  \codex{(get-header-all hdrs 'to)}
  returns
  \codex{'((" ziggy," "  newts") (" gjs, tk"))}
\end{defundesc}

\begin{defundesc}{get-header-lines} {headers name} {string list}
  Returns all lines of the first entry for \var{name}, or \sharpf\ if
  there is none, e.g.\
  \codex{(get-header-lines hdrs 'to)}
  returns
  \codex{'(" ziggy," "  newts")}
\end{defundesc}

\begin{defundesc}{get-header} {headers name \ovar{separator}} {string}
  Returns the first entry with the lines joined together by
  \var{separator} (newline by default), e.g.\
  \codex{(get-header hdrs 'to)}
  returns
  \begin{code}
" ziggy,
  newts"\end{code}
%
  Note that \ex{newts} is preceded by two spaces.
\end{defundesc}


\begin{defundesc}{string->symbol-pref}{string}{symbol}
  Takes a string and converts it to a symbol using the Scheme
  implementation's preferred case. (The preferred case is determined
  once, by converting \ex{'a} with \ex{symbol->string}.)
\end{defundesc}
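
The case detection described above can be sketched in portable Scheme
as follows. This is an illustration under the stated description, not
the package's actual definition, and the name
\ex{string->symbol-pref-sketch} is hypothetical.
\begin{code}
;; Hypothetical re-implementation, for illustration only.
;; The preferred case is probed once, when the definition is evaluated.
(define string->symbol-pref-sketch
  (let ((fold (if (char=? (string-ref (symbol->string 'a) 0)
                          (string-ref "a" 0))
                  char-downcase    ; symbols print in lowercase
                  char-upcase)))   ; symbols print in uppercase
    (lambda (str)
      (string->symbol
       (list->string (map fold (string->list str)))))))\end{code}
%
On an implementation that prefers lowercase, the result of
\ex{(string->symbol-pref-sketch "Subject")} can be compared with
\ex{'subject} using \ex{eq?}.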

\subsubsection*{Desirable functionality}

\begin{itemize}
\item Unfolding long lines.
\item Lexing structured fields.
\item Unlexing structured fields into canonical form.
\item Parsing and unparsing dates.
\item Parsing and unparsing addresses.
\end{itemize}

%%% Local Variables: 
%%% mode: latex
%%% TeX-master: t
%%% End:
% Created LaTeX style documentation. All documentation in plain text files was moved to this LaTeX doc (man.tex). Currently, not everything is documented. (2002-02-12 06:50:54 -05:00)