This file documents names defined in rfc822.scm:

NOTES

A note on line-terminators:

Line-terminating sequences are always a drag, because there's no
agreement on them -- the Net protocols and DOS use cr/lf; Unix uses
lf; the Mac uses cr. On one hand, you'd like to use the code for all
of the above; on the other, you'd also like to use the code for strict
applications that definitely must not recognise bare cr's or lf's
as terminators.

RFC 822 requires a cr/lf (carriage-return/line-feed) pair to terminate
lines of text. On the other hand, careful perusal of the text turns up
some ambiguities (there are maybe three or four of these, and I'm too
lazy to write them all down). Furthermore, it is an unfortunate fact
that many Unix apps separate lines of RFC 822 text with simple
linefeeds (e.g., messages kept in /usr/spool/mail). As a result, this
code takes a broad-minded view of line-terminators: lines can be
terminated by either cr/lf or just lf, and either terminating sequence
is trimmed.
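
The lenient policy just described can be sketched as a small portable
R7RS procedure. READ-LINE/LENIENT is a hypothetical name, not something
rfc822.scm exports; the sketch only illustrates the rule that a line ends
at lf, a cr immediately before that lf is dropped too, and a cr anywhere
else is kept as data.

```scheme
(import (scheme base))

;; Hypothetical helper (not exported by rfc822.scm): read one line from PORT.
;; A line ends at lf; a cr immediately before that lf is dropped too, so both
;; cr/lf and bare lf terminators are accepted and trimmed. A cr anywhere else
;; is kept as ordinary data. Returns the EOF object at end of input.
(define (read-line/lenient port)
  (let loop ((chars '()))                ; characters read so far, reversed
    (let ((c (read-char port)))
      (cond ((eof-object? c)
             ;; Unterminated last line: return it; nothing read: return EOF.
             (if (null? chars) c (list->string (reverse chars))))
            ((char=? c #\newline)
             (list->string
              (reverse (if (and (pair? chars) (char=? (car chars) #\return))
                           (cdr chars)   ; drop the cr of a cr/lf pair
                           chars))))
            (else (loop (cons c chars)))))))
```

Reading "Subject: hi\r\nTo: x\n\r\n" with it yields "Subject: hi",
"To: x", the empty string (the header-terminating blank line), and then
the EOF object.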

If you need stricter parsing, you can call the lower-level procedures
%READ-RFC822-FIELD and %READ-RFC822-HEADERS. They take the
read-line procedure as an extra parameter. This means that you can
pass in a procedure that recognises only cr/lf's, or only cr's (for a
Mac app, perhaps), and you can determine whether or not the
terminators get trimmed. However, your read-line procedure must
indicate the header-terminating empty line by returning *either* the
empty string or the two-char string cr/lf (or the EOF object).
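
As an illustration of such a stricter procedure, here is a sketch of a
read-line that treats only a cr/lf pair as a terminator, keeps bare cr's
and lf's as ordinary characters, and trims the terminator, so the
header-terminating empty line comes back as the empty string.
READ-LINE/CRLF is a hypothetical name, written in portable R7RS Scheme
rather than taken from rfc822.scm.

```scheme
(import (scheme base))

;; Hypothetical helper (not exported by rfc822.scm): read one line from PORT,
;; treating only a cr/lf pair as the terminator. A bare cr or a bare lf is
;; kept as an ordinary character. The terminator is trimmed, so an empty line
;; is returned as "". Returns the EOF object at end of input.
(define (read-line/crlf port)
  (let loop ((chars '()))                ; characters read so far, reversed
    (let ((c (read-char port)))
      (cond ((eof-object? c)
             (if (null? chars) c (list->string (reverse chars))))
            ((and (char=? c #\return)
                  (eqv? (peek-char port) #\newline))
             (read-char port)            ; consume the lf of the cr/lf pair
             (list->string (reverse chars)))
            (else (loop (cons c chars)))))))
```

In scsh it could then be supplied as the READ-LINE argument, as in
(%read-rfc822-headers read-line/crlf port).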


DEFINITIONS AND DESCRIPTIONS

(read-rfc822-field [port])
(%read-rfc822-field read-line port)

Read one field from the port, and return two values [NAME BODY]:

- NAME  Symbol such as 'subject or 'to. The field name is converted
        to a symbol using the Scheme implementation's preferred
        case. If the implementation reads symbols in a case-sensitive
        fashion (e.g., scsh), lowercase is used. This means you can
        compare these symbols to quoted constants using EQ?. When
        printing these field names out, it looks best if you capitalise
        them with (CAPITALIZE-STRING (SYMBOL->STRING FIELD-NAME)).

- BODY  List of strings which are the field's body, e.g.
        ("shivers@lcs.mit.edu"). Each list element is one line from
        the field's body, so if the field spreads out over three lines,
        then the body is a list of three strings. The terminating
        cr/lf's are trimmed from each string. A leading space or a
        leading horizontal tab is also trimmed, but only one such
        character.

When there are no more fields -- EOF or a blank line has terminated
the header section -- then the procedure returns [#f #f].

The %READ-RFC822-FIELD variant allows you to specify your own
read-line procedure. The one used by READ-RFC822-FIELD accepts either
cr/lf or just lf as a line terminator, and it trims the terminator from
the line. Your read-line procedure should likewise trim the terminator,
so that an empty line is returned as the empty string.

The procedures raise an error if the syntax of the field read (the
line returned by the read-line procedure) is illegal under RFC 822.
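
The NAME entry above suggests printing field names capitalised. In a
Scheme without CAPITALIZE-STRING, a portable R7RS sketch can do the same
job. CAPITALIZE-FIELD-NAME is a hypothetical name, and treating every
non-letter (such as a hyphen) as a word boundary is an assumption about
how capitalisation should work here.

```scheme
(import (scheme base) (scheme char))

;; Hypothetical helper (not part of rfc822.scm): render a field-name symbol
;; such as 'received-by as "Received-By" for printing. A letter is upcased
;; if it starts the name or follows a non-letter (such as a hyphen); every
;; other letter is downcased. Non-letters are copied unchanged.
(define (capitalize-field-name sym)
  (let* ((s (string-copy (symbol->string sym)))   ; fresh, mutable copy
         (n (string-length s)))
    (let loop ((i 0) (word-start? #t))
      (if (= i n)
          s
          (let ((c (string-ref s i)))
            (if (char-alphabetic? c)
                (begin
                  (string-set! s i (if word-start?
                                       (char-upcase c)
                                       (char-downcase c)))
                  (loop (+ i 1) #f))
                (loop (+ i 1) #t)))))))
```

For example, (capitalize-field-name 'received-by) gives "Received-By".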


read-rfc822-headers [port]
%read-rfc822-headers read-line port

Read in and parse a section of text that looks like the header
portion of an RFC 822 message. Return an alist mapping a field name (a
symbol such as 'date or 'subject) to a list of field bodies -- one for
each occurrence of the field in the header. So if there are five
"Received-by:" fields in the header, the alist maps 'received-by to a
five-element list. Each body is in turn represented by a list of
strings -- one for each line of the field. So a field spread across
three lines would produce a three-element body.
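
To make the shape of that alist concrete, here is a sketch in portable
R7RS Scheme that groups already-read fields by name. GROUP-FIELDS is
hypothetical and is not taken from rfc822.scm; in particular, ordering
the keys by first appearance is an assumption, because the description
above does not say what order the alist is in.

```scheme
(import (scheme base))

;; Hypothetical sketch (not the rfc822.scm code). FIELDS is a list of
;; (name . body) pairs in the order the fields were read, where BODY is a
;; list of line strings. The result maps each name to the list of all its
;; bodies, first occurrence first; keys appear in order of first appearance.
(define (group-fields fields)
  (let loop ((fields fields)
             (acc '()))   ; entries (name body_k ... body_1), newest entry first
    (if (null? fields)
        ;; Restore occurrence order within each entry, then entry order.
        (reverse (map (lambda (entry)
                        (cons (car entry) (reverse (cdr entry))))
                      acc))
        (let* ((name (car (car fields)))
               (body (cdr (car fields)))
               (entry (assq name acc)))
          (if entry
              (begin
                (set-cdr! entry (cons body (cdr entry)))   ; prepend body
                (loop (cdr fields) acc))
              (loop (cdr fields) (cons (list name body) acc)))))))
```

Two To: fields thus end up as one entry of the form
(to body-1 body-2), where each body is itself a list of line strings.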

The %READ-RFC822-HEADERS variant allows you to specify your own
read-line procedure. See the notes above (A note on line-terminators)
for the reasons why.


rejoin-header-lines alist [separator]

Takes a field alist such as is returned by READ-RFC822-HEADERS and
returns an equivalent alist in which each body (string list) of the
input alist is joined into a single string. SEPARATOR is the string
used to join the lines together; it defaults to a single space " ",
but can usefully be "\n" or "\r\n".
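
The rejoining step can be sketched in portable R7RS Scheme as follows.
JOIN-WITH and REJOIN-HEADER-LINES/SKETCH are hypothetical names that
mirror the documented behaviour; they are not the rfc822.scm definitions.

```scheme
(import (scheme base))

;; Join STRINGS with SEP between adjacent elements (no leading/trailing SEP).
(define (join-with sep strings)
  (if (null? strings)
      ""
      (let loop ((acc (car strings)) (rest (cdr strings)))
        (if (null? rest)
            acc
            (loop (string-append acc sep (car rest)) (cdr rest))))))

;; Hypothetical sketch of the documented behaviour (not the rfc822.scm code):
;; every body, a list of line strings, becomes one string joined with SEP,
;; which defaults to a single space.
(define (rejoin-header-lines/sketch alist . maybe-sep)
  (let ((sep (if (null? maybe-sep) " " (car maybe-sep))))
    (map (lambda (entry)
           (cons (car entry)
                 (map (lambda (body) (join-with sep body))
                      (cdr entry))))
         alist)))
```

The separator only goes between lines and any leading whitespace on the
lines is kept, so the lines " ziggy," and " newts" joined with the
default separator give " ziggy,  newts" (two spaces).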

To rejoin a single body list, use scsh's JOIN-STRINGS procedure.


For the following definitions' examples, let's use this set of
RFC822 headers (the line " newts" begins with a space, so it continues
the first To: field):

From: shivers
To: ziggy,
 newts
To: gjs, tk


get-header-all headers name

returns all entries (one body per occurrence of the field NAME),
or #f if there are none, e.g.

(get-header-all hdrs 'to) -> ((" ziggy," " newts") (" gjs, tk"))


get-header-lines headers name

returns all lines of the first entry, or #f if there is none, e.g.

(get-header-lines hdrs 'to) -> (" ziggy," " newts")
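
Both lookups amount to an ASSQ on the header alist. Assuming the alist
shape described under READ-RFC822-HEADERS, the following R7RS sketch
reproduces the two examples above; the /SKETCH names are hypothetical
and are not the rfc822.scm definitions.

```scheme
(import (scheme base))

;; The sample headers above, in the alist shape READ-RFC822-HEADERS returns.
(define hdrs
  '((from (" shivers"))
    (to (" ziggy," " newts") (" gjs, tk"))))

;; All bodies recorded for NAME, or #f if the field is absent.
(define (get-header-all/sketch headers name)
  (let ((entry (assq name headers)))
    (and entry (cdr entry))))

;; The lines of the first body for NAME, or #f if the field is absent.
(define (get-header-lines/sketch headers name)
  (let ((bodies (get-header-all/sketch headers name)))
    (and bodies (car bodies))))
```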


get-header headers name [separator]

returns the first entry with its lines joined together by SEPARATOR
(a newline, "\n", by default), e.g.

(get-header hdrs 'to) -> "ziggy,\n newts"


htab

is the horizontal tab character (ASCII code 9).


string->symbol-pref

is a procedure that takes a string and converts it to a symbol
using the Scheme implementation's preferred case. The preferred case
is recognized by doing a symbol->string conversion of 'a.
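
The case test just described can be sketched portably as follows.
STRING->SYMBOL-PREF/SKETCH is a hypothetical name, and the sketch assumes
that an implementation which folds symbols to upper case prints 'a as
"A". R7RS readers are case-sensitive, so under R7RS the lowercase branch
is always taken.

```scheme
(import (scheme base) (scheme char))

;; Hypothetical sketch (not the rfc822.scm definition): decide once, when this
;; definition is evaluated, which case the reader uses for symbols by printing
;; 'a, then fold every field name to that case before interning it.
(define string->symbol-pref/sketch
  (if (string=? (symbol->string 'a) "a")
      (lambda (s) (string->symbol (string-downcase s)))   ; lowercase reader
      (lambda (s) (string->symbol (string-upcase s)))))   ; uppercase reader
```

This is what makes EQ? comparisons such as
(eq? (string->symbol-pref/sketch "Subject") 'subject) succeed.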


DESIRABLE FUNCTIONALITY

- Unfolding long lines.
- Lexing structured fields.
- Unlexing structured fields into canonical form.
- Parsing and unparsing dates.
- Parsing and unparsing addresses.