This file documents names defined in rfc822.scm:
NOTES
A note on line-terminators:
Line-terminating sequences are always a drag, because there's no
agreement on them -- the Net protocols and DOS use cr/lf; Unix uses
lf; the Mac uses cr. On the one hand, you'd like to be able to use the
code for all of the above; on the other, you'd also like to use it for
strict applications that must not recognise bare cr's or lf's as
terminators.
RFC 822 requires a cr/lf (carriage-return/line-feed) pair to terminate
lines of text. On the other hand, careful perusal of the text shows up
some ambiguities (there are maybe three or four of these, and I'm too
lazy to write them all down). Furthermore, it is an unfortunate fact
that many Unix apps separate lines of RFC 822 text with simple
linefeeds (e.g., messages kept in /usr/spool/mail). As a result, this
code takes a broad-minded view of line-terminators: lines can be
terminated by either cr/lf or just lf, and either terminating sequence
is trimmed.
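To make the broad-minded convention concrete, here is a small Python sketch (not the scsh code) of a read-line that accepts either cr/lf or a bare lf and trims whichever terminator it finds. The name read_line_lenient is hypothetical, and Python's None stands in for the Scheme EOF object.

```python
import io

def read_line_lenient(port):
    """Hypothetical lenient read-line: cr/lf or a bare lf ends a line.

    The terminator is trimmed. Returns None (standing in for the Scheme
    EOF object) when no input is left.
    """
    line = port.readline()        # io.StringIO splits on lf only and keeps any cr
    if line == "":
        return None               # end of input
    if line.endswith("\r\n"):
        return line[:-2]          # trim cr/lf
    if line.endswith("\n"):
        return line[:-1]          # trim a bare lf
    return line                   # final line with no terminator at all

port = io.StringIO("From: shivers\r\nTo: ziggy\n\r\n")
print(read_line_lenient(port))    # From: shivers
print(read_line_lenient(port))    # To: ziggy
```

A bare cr is not treated as a terminator here, which matches the cr/lf-or-lf rule described above.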
If you need stricter parsing, you can call the lower-level procedures
%READ-RFC822-FIELD and %READ-RFC822-HEADERS. They take the
read-line procedure as an extra parameter. This means that you can
pass in a procedure that recognises only cr/lf's, or only cr's (for a
Mac app, perhaps), and you can determine whether or not the
terminators get trimmed. However, your read-line procedure must
indicate the header-terminating empty line by returning *either* the
empty string or the two-char string cr/lf (or the EOF object).
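As an illustration of the kind of read-line procedure you might pass in, the following Python sketch recognises only the two-char sequence cr/lf, can optionally keep the terminator, and tests for the header-terminating line in the three forms just listed. read_line_crlf, its trim flag, and ends_header are hypothetical names; None again stands in for the EOF object.

```python
import io

def read_line_crlf(port, trim=True):
    """Hypothetical strict read-line: only the two-char cr/lf ends a line.

    A bare cr or a bare lf is kept as ordinary line data. With trim=False the
    cr/lf terminator is left on the returned string. Returns None (standing in
    for the EOF object) when no input is left.
    """
    chars = []
    while True:
        c = port.read(1)
        if c == "":                                   # end of input
            return "".join(chars) if chars else None
        chars.append(c)
        if c == "\n" and len(chars) >= 2 and chars[-2] == "\r":
            line = "".join(chars)
            return line[:-2] if trim else line

def ends_header(line):
    """True for the header-terminating line: "", cr/lf, or EOF (None)."""
    return line is None or line == "" or line == "\r\n"
```

Whether or not the terminator is trimmed, the blank line still satisfies ends_header, which is the contract the lower-level procedures rely on.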
DEFINITIONS AND DESCRIPTIONS
(read-rfc822-field [port])
(%read-rfc822-field read-line port)
Read one field from the port, and return two values [NAME BODY]:
- NAME Symbol such as 'subject or 'to. The field name is converted
to a symbol using the Scheme implementation's preferred
case. If the implementation reads symbols in a case-sensitive
fashion (e.g., scsh), lowercase is used. This means you can
compare these symbols to quoted constants using EQ?. When
printing these field names out, it looks best if you capitalise
them with (CAPITALIZE-STRING (SYMBOL->STRING FIELD-NAME)).
- BODY List of strings which are the field's body, e.g.
("shivers@lcs.mit.edu"). Each list element is one line from
the field's body, so if the field spreads out over three lines,
then the body is a list of three strings. The terminating
cr/lf's are trimmed from each string. A single leading space or
horizontal tab is also trimmed; any further leading whitespace is kept.
When there are no more fields -- EOF or a blank line has terminated
the header section -- then the procedure returns [#f #f].
The %READ-RFC822-FIELD variant allows you to specify your own
read-line procedure. The one used by READ-RFC822-FIELD accepts either
cr/lf or a bare lf as a line terminator, and trims the terminator from
the line. Your read-line procedure should also trim the terminator, so
that an empty line is returned as the empty string (returning the
two-char string cr/lf for the blank line is also accepted, as described
in the note on line-terminators).
Both procedures raise an error if the field read (that is, the line
returned by the read-line procedure) is not legal RFC 822 syntax.
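The field-reading loop can be sketched in Python as follows. The sketch assumes the RFC 822 folding rule, under which a line that starts with a space or horizontal tab continues the previous field. This is an illustration, not the rfc822.scm source. read_field, read_line_lenient and trim_one are hypothetical names. The field name is a lowercased Python string standing in for a symbol. The syntax check is simplified to "a non-empty name before a colon". The one-character trim is applied to every body line, as the BODY description says. The port must support tell() and seek() so that the next character can be peeked.

```python
import io

HTAB = "\t"

def read_line_lenient(port):
    """Read one line, trimming a cr/lf or bare lf terminator; None at EOF."""
    line = port.readline()
    if line == "":
        return None
    if line.endswith("\r\n"):
        return line[:-2]
    return line[:-1] if line.endswith("\n") else line

def trim_one(s):
    """Trim at most one leading space or horizontal tab."""
    return s[1:] if s[:1] in (" ", HTAB) else s

def read_field(read_line, port):
    """Hypothetical sketch of %READ-RFC822-FIELD: (name, body) or (None, None)."""
    line1 = read_line(port)
    if line1 is None or line1 in ("", "\r\n"):
        return None, None                  # EOF or blank line: no more fields
    colon = line1.find(":")
    if colon <= 0:                         # simplified syntax check
        raise ValueError("illegal RFC 822 field: %r" % line1)
    name = line1[:colon].lower()           # lowercase, like a case-sensitive Scheme
    body = [trim_one(line1[colon + 1:])]
    while True:
        pos = port.tell()
        peeked = port.read(1)              # peek at the next character...
        port.seek(pos)                     # ...and put it back
        if peeked not in (" ", HTAB):      # only whitespace-led lines continue the field
            return name, body
        body.append(trim_one(read_line(port)))
```

Passing a different read_line (for example a strict cr/lf-only one) changes only how lines are split, not how fields are assembled. That is the point of the % variant.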
(read-rfc822-headers [port])
(%read-rfc822-headers read-line port)
Read and parse a section of text that looks like the header
portion of an RFC 822 message. Return an alist mapping a field name (a
symbol such as 'date or 'subject) to a list of field bodies -- one for
each occurrence of the field in the header. So if there are five
"Received-by:" fields in the header, the alist maps 'received-by to a
five-element list. Each body is in turn represented by a list of
strings -- one for each line of the field. So a field spread across
three lines would produce a three-element body.
The %READ-RFC822-HEADERS variant allows you to specify your own
read-line procedure. See the note on line-terminators above for why
you might want to.
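Here is a compact Python sketch of the grouping described above. It uses the same line and continuation rules as a lenient field reader. To stay self-contained, it reads from a string rather than a port and inlines the field parsing. read_headers is a hypothetical name, and the alist is represented as a list of (name, bodies) pairs in order of first appearance.

```python
import io

def read_headers(text):
    """Hypothetical sketch of READ-RFC822-HEADERS over a string.

    Returns a list of (name, bodies) pairs standing in for the alist; bodies
    has one entry per occurrence of the field, each a list of line strings.
    """
    alist = []          # [(name, [body, ...]), ...] in order of first appearance
    index = {}          # name -> position of its pair in alist
    current = None      # line list of the field currently being read
    for raw in io.StringIO(text):          # splits on lf only, keeping any cr
        if raw.endswith("\r\n"):           # trim cr/lf or a bare lf
            line = raw[:-2]
        elif raw.endswith("\n"):
            line = raw[:-1]
        else:
            line = raw
        if line == "":                     # blank line ends the header section
            break
        if line[0] in (" ", "\t"):         # continuation of the previous field
            if current is None:
                raise ValueError("continuation line before any field")
            current.append(line[1:])       # trim exactly one space or tab
            continue
        name, colon, rest = line.partition(":")
        if not colon or not name:
            raise ValueError("illegal RFC 822 field: %r" % line)
        name = name.lower()
        current = [rest[1:] if rest[:1] in (" ", "\t") else rest]
        if name not in index:
            index[name] = len(alist)
            alist.append((name, []))
        alist[index[name]][1].append(current)   # one body per occurrence
    return alist
```

A repeated field such as "To:" therefore yields one pair whose bodies list has one element per occurrence, rather than two separate pairs.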
(rejoin-header-lines alist [separator])
Takes a field alist such as is returned by READ-RFC822-HEADERS and
returns an equivalent alist. Each body (string list) in the input
alist is joined into a single string in the output alist. SEPARATOR is
the string used to join these elements together; it defaults to a
single space " ", but can usefully be "\n" or "\r\n".
To rejoin a single body list, use scsh's JOIN-STRINGS procedure.
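In Python terms, the rejoining is one join per body. This assumes the grouped alist shape described for READ-RFC822-HEADERS, represented here as a list of (name, bodies) pairs. rejoin_header_lines is a hypothetical sketch, not the scsh procedure.

```python
def rejoin_header_lines(alist, separator=" "):
    """Hypothetical sketch of REJOIN-HEADER-LINES.

    Each body (a list of line strings) becomes a single string; the field
    names and the number of bodies per field are unchanged.
    """
    return [(name, [separator.join(body) for body in bodies])
            for name, bodies in alist]

hdrs = [("from", [[" shivers"]]),
        ("to", [[" ziggy,", " newts"], [" gjs, tk"]])]
print(rejoin_header_lines(hdrs, "\r\n"))
```

Note that the default single-space separator does not remove the whitespace already at the start of each continuation line, so folded lines can end up with doubled spaces.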
For the following definitions' examples, let's use this set of RFC 822
headers, as returned by READ-RFC822-HEADERS and bound to HDRS:
    From: shivers
    To: ziggy,
      newts
    To: gjs, tk
(get-header-all headers name)
Returns a list of all bodies of the field NAME, or #f if there are
none, e.g.
(get-header-all hdrs 'to) -> ((" ziggy," " newts") (" gjs, tk"))
(get-header-lines headers name)
Returns the lines of the first body of the field NAME, or #f, e.g.
(get-header-lines hdrs 'to) -> (" ziggy," " newts")
(get-header headers name [separator])
Returns the first body of the field NAME with its lines joined by
SEPARATOR (a newline, "\n", by default), e.g.
(get-header hdrs 'to) -> "ziggy,\n newts"
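The three accessors can be sketched in Python over the example headers. The headers are written out by hand as the list of (name, bodies) pairs that the get-header-all example implies. The function names mirror the Scheme ones but are hypothetical, and None stands in for #f. This get_header simply joins the lines, so it returns " ziggy,\n newts". The example above shows the first line's leading space removed, and this sketch does not assume that extra trimming.

```python
# The example headers, in the (name, bodies) shape implied by get-header-all.
hdrs = [("from", [[" shivers"]]),
        ("to", [[" ziggy,", " newts"], [" gjs, tk"]])]

def get_header_all(headers, name):
    """All bodies of field NAME, or None (for #f) if it does not occur."""
    for field, bodies in headers:
        if field == name:
            return bodies
    return None

def get_header_lines(headers, name):
    """Lines of the first body of field NAME, or None."""
    bodies = get_header_all(headers, name)
    return bodies[0] if bodies else None

def get_header(headers, name, separator="\n"):
    """First body of field NAME with its lines joined by SEPARATOR, or None."""
    lines = get_header_lines(headers, name)
    return None if lines is None else separator.join(lines)

print(get_header_lines(hdrs, "to"))   # [' ziggy,', ' newts']
```

Building get_header on get_header_lines, and that on get_header_all, keeps the "first occurrence" rule in one place.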
htab
is the horizontal-tab character (ASCII code 9).
string->symbol-pref
is a procedure that takes a string and converts it to a symbol
using the Scheme implementation's preferred case. The preferred case
is detected by applying SYMBOL->STRING to 'a (the result is "a" or "A").
DESIRABLE FUNCTIONALITY
- Unfolding long lines.
- Lexing structured fields.
- Unlexing structured fields into canonical form.
- Parsing and unparsing dates.
- Parsing and unparsing addresses.