2001-05-20 15:01:37 -04:00
This file documents names specified in uri.scm.
NOTES
URIs have the following syntax:
[scheme :] path [? search] [# fragmentid]
Parts in [] may be omitted. The last part is referred to as frag-id in
this document.
DEFINITIONS AND DESCRIPTIONS
char-set
uri-reserved
The set of reserved characters: semicolon (;), slash (/), hash (#),
question mark (?), colon (:) and space.
procedure
parse-uri uri-string --> (scheme, path, search, frag-id)
Multiple-value return: scheme, path, search and frag-id, in this
order. scheme, search and frag-id are each either #f or a string. path
is a nonempty list of strings; an empty path is represented as a list
containing the empty string. parse-uri tries to tolerate the various
ways people build broken URIs out there on the Net, so it does not
conform strictly to RFC 1630.
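A minimal sketch of the splitting parse-uri performs, written in portable R7RS Scheme rather than taken from uri.scm. It assumes that a scheme is present only when the first reserved character is a colon, that the fragment starts at the first # after the scheme, and that the search part starts at the first ? before the fragment. The real procedure is more tolerant of broken URIs, and the helpers index-of and split-at-slashes are hypothetical.

```scheme
(import (scheme base))

;; Reserved characters as listed for uri-reserved.
(define reserved-chars (string->list ";/#?: "))

;; Hypothetical helper: first index i in [start, end) with (pred (string-ref s i)), or #f.
(define (index-of s pred start end)
  (let loop ((i start))
    (cond ((>= i end) #f)
          ((pred (string-ref s i)) i)
          (else (loop (+ i 1))))))

;; Hypothetical helper: split s[start, end) at every slash.
(define (split-at-slashes s start end)
  (let loop ((i (- end 1)) (seg-end end) (acc '()))
    (cond ((< i start) (cons (substring s start seg-end) acc))
          ((char=? (string-ref s i) #\/)
           (loop (- i 1) i (cons (substring s (+ i 1) seg-end) acc)))
          (else (loop (- i 1) seg-end acc)))))

;; Sketch only: returns scheme, path, search and frag-id as four values.
(define (parse-uri s)
  (let* ((len (string-length s))
         ;; A scheme exists only if the first reserved char is a colon.
         (rs1 (index-of s (lambda (c) (memv c reserved-chars)) 0 len))
         (colon (and rs1 (char=? (string-ref s rs1) #\:) rs1))
         (path-start (if colon (+ colon 1) 0))
         ;; Assumption: fragment starts at the first # after the scheme.
         (sharp (index-of s (lambda (c) (char=? c #\#)) path-start len))
         ;; Assumption: search starts at the first ? before the fragment.
         (ques (index-of s (lambda (c) (char=? c #\?)) path-start (or sharp len)))
         (path-end (or ques sharp len)))
    (values (and colon (substring s 0 colon))
            (split-at-slashes s path-start path-end)
            (and ques (substring s (+ ques 1) (or sharp len)))
            (and sharp (substring s (+ sharp 1) len)))))
```

For example, (parse-uri "http://www.foo.org/bar/baz.html?x=1#top") returns the four values "http", ("" "" "www.foo.org" "bar" "baz.html"), "x=1" and "top"; the two leading empty strings come from the "//".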
procedure
unescape-uri string [start [end]] --> string
Unescapes a string, replacing each escape sequence by the character it
encodes. Use this procedure only *after* the URI has been parsed, since
unescaping may introduce characters that break the parse (which is why
escape sequences are used in URIs in the first place). Escape sequences
have the form %hh, where h is a hexadecimal digit; e.g. %20 is a space
(ASCII character 32).
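A minimal sketch of the decoding step in portable R7RS Scheme, not the uri.scm code. It omits the optional start and end arguments, and it assumes that a % not followed by two hexadecimal digits is copied through unchanged; the helper hex-char? is hypothetical.

```scheme
(import (scheme base))

(define (hex-char? c)
  ;; True if c is one of 0-9, a-f, A-F.
  (and (memv c (string->list "0123456789abcdefABCDEF")) #t))

(define (unescape-uri s)
  ;; Replace every %hh (h = hexadecimal digit) by the character with code hh.
  ;; Assumption: a % that is not followed by two hex digits is copied unchanged.
  (let ((len (string-length s)))
    (let loop ((i 0) (acc '()))               ; acc holds the output in reverse
      (cond ((= i len) (list->string (reverse acc)))
            ((and (char=? (string-ref s i) #\%)
                  (< (+ i 2) len)
                  (hex-char? (string-ref s (+ i 1)))
                  (hex-char? (string-ref s (+ i 2))))
             (loop (+ i 3)
                   (cons (integer->char
                          (string->number (substring s (+ i 1) (+ i 3)) 16))
                         acc)))
            (else (loop (+ i 1) (cons (string-ref s i) acc)))))))
```

With this sketch, (unescape-uri "foo%20bar") gives "foo bar" and (unescape-uri "%2f") gives "/", which illustrates why unescaping before parsing can introduce a slash that breaks the path.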
procedure
hex-digit? character --> boolean
Returns #t if character is a hexadecimal digit (i.e., one of 0-9, a-f,
A-F), #f otherwise.
procedure
hexchar->int character --> number
Translates the given hexadecimal digit character to an integer, e.g.
(hexchar->int #\a) => 10.
procedure
int->hexchar integer --> character
Translates the given integer in the range 0-15 into a hexadecimal digit
character (using uppercase letters), e.g. (int->hexchar 14) => #\E.
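The three hex helpers can be sketched in portable R7RS Scheme as follows. These are standalone re-implementations of the behaviour described above, not the code in uri.scm, and int->hexchar assumes its argument is already in the range 0-15.

```scheme
(import (scheme base))

(define (hex-digit? c)
  ;; #t for 0-9, a-f and A-F, #f otherwise.
  (or (char<=? #\0 c #\9) (char<=? #\a c #\f) (char<=? #\A c #\F)))

(define (hexchar->int c)
  ;; Map a hex digit character to its value 0-15.
  (cond ((char<=? #\0 c #\9) (- (char->integer c) (char->integer #\0)))
        ((char<=? #\a c #\f) (+ 10 (- (char->integer c) (char->integer #\a))))
        ((char<=? #\A c #\F) (+ 10 (- (char->integer c) (char->integer #\A))))
        (else (error "hexchar->int: not a hexadecimal digit" c))))

(define (int->hexchar i)
  ;; Assumes 0 <= i <= 15; uses uppercase letters for 10-15.
  (string-ref "0123456789ABCDEF" i))
```

The two conversions are inverses on digit values: (hexchar->int (int->hexchar 11)) is 11, although the round trip in the other direction maps #\b to #\B.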
char-set
uri-escaped-chars
The set of characters that escape-uri escapes by default: every
character except letters, digits and the following characters: dollar
($), minus (-), underscore (_), at (@), dot (.), ampersand (&),
exclamation mark (!), asterisk (*), double quote ("), single quote ('),
open parenthesis ((), close parenthesis ()), comma (,) and plus (+).
Space, slash, colon and control characters, for instance, are escaped.
procedure
escape-uri string [escaped-chars] --> string
Escapes each character of string that is in escaped-chars, which
defaults to uri-escaped-chars. Be careful when applying this procedure
to chunks of text containing syntactically meaningful reserved
characters (e.g., paths with URI slashes or colons): they will be
escaped and lose their special meaning. For example, it would be a
mistake to apply escape-uri to "//lcs.mit.edu:8001/foo/bar.html",
because the slashes and colons would be escaped. Note that escape-uri
does not check for this.
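A minimal sketch of escape-uri in portable R7RS Scheme, not the uri.scm code. Because R7RS has no character sets, it assumes escaped-chars is passed as a predicate, and the hypothetical uri-escaped-char? stands in for uri-escaped-chars (everything except ASCII letters, digits and $-_@.&!*"'(),+). It also assumes every escaped character has a code below 256, so two uppercase hex digits suffice.

```scheme
(import (scheme base))

;; Hypothetical stand-in for the uri-escaped-chars set: true for every
;; character except ASCII letters, digits and $-_@.&!*"'(),+
(define (uri-escaped-char? c)
  (not (or (char<=? #\a c #\z)
           (char<=? #\A c #\Z)
           (char<=? #\0 c #\9)
           (memv c (string->list "$-_@.&!*\"'(),+")))))

(define (escape-uri s . maybe-escaped?)
  ;; Optional argument: a predicate selecting the characters to escape.
  (let ((escaped? (if (pair? maybe-escaped?) (car maybe-escaped?) uri-escaped-char?))
        (hex "0123456789ABCDEF"))
    (let loop ((chars (string->list s)) (acc '()))  ; acc holds output in reverse
      (cond ((null? chars) (list->string (reverse acc)))
            ((escaped? (car chars))
             ;; Emit %hh; assumes the character code is below 256.
             (let ((n (char->integer (car chars))))
               (loop (cdr chars)
                     (cons (string-ref hex (remainder n 16))
                           (cons (string-ref hex (quotient n 16))
                                 (cons #\% acc))))))
            (else (loop (cdr chars) (cons (car chars) acc)))))))
```

Applied to "//lcs.mit.edu:8001/foo", this sketch returns "%2F%2Flcs.mit.edu%3A8001%2Ffoo": the dots survive, but every slash and the colon are escaped, which is exactly the loss of structure the warning above describes.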
procedure
resolve-uri cscheme cp scheme p --> (scheme, path)
Sorry, I can't figure out what resolve-uri is intended to do. Perhaps
I will find out later.
The code seems to have a bug: in the body of the receive there is a
loop in which j should, according to the comment, count sequential
slashes. But j counts nothing: either 0 is added, as in (lp (cdr
cp-tail) (cons (car cp-tail) rhead) (+ j 0)), or j is set to 1, as in
(lp (cdr cp-tail) (cons (car cp-tail) rhead) 1). Nevertheless, j is
expected to reach the value numsl, which can be larger than 1. I am
confused.
procedure
rev-append list-a list-b --> list
Equivalent to (append (reverse list-a) list-b). A comment in the source
says it should be defined in a list package, but I wonder how often it
will be used.
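A tail-recursive sketch of rev-append in R7RS Scheme (not necessarily how uri.scm defines it). Consing the elements of list-a onto list-b one at a time allocates only one new pair per element of list-a, whereas (append (reverse list-a) list-b) first builds the reversed list and then copies it.

```scheme
(import (scheme base))

(define (rev-append list-a list-b)
  ;; Move each element of list-a onto the front of list-b;
  ;; list-b itself is shared, not copied.
  (if (null? list-a)
      list-b
      (rev-append (cdr list-a) (cons (car list-a) list-b))))
```

For example, (rev-append '(1 2 3) '(4 5)) evaluates to (3 2 1 4 5), the same as (append (reverse '(1 2 3)) '(4 5)).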
procedure
split-uri-path uri start end --> list
Splits uri at each slash (/). Only the substring from start (inclusive)
to end (exclusive) is considered. start and end - 1 must be valid
indices into uri; otherwise an index-out-of-range error is raised.
Example: (split-uri-path "foo/bar/colon" 4 11) ==>
'("bar" "col")
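A sketch of split-uri-path in R7RS Scheme (not the uri.scm code). It scans the substring from end back to start and conses each segment onto the result, so the list comes out in order without a final reverse; an empty range yields the one-element list (""), matching the convention that an empty path is a list containing the empty string.

```scheme
(import (scheme base))

(define (split-uri-path uri start end)
  ;; seg-end is the exclusive end of the segment currently being scanned.
  (let loop ((i (- end 1)) (seg-end end) (acc '()))
    (cond ((< i start)                        ; reached start: emit first segment
           (cons (substring uri start seg-end) acc))
          ((char=? (string-ref uri i) #\/)    ; slash closes the segment after it
           (loop (- i 1) i (cons (substring uri (+ i 1) seg-end) acc)))
          (else (loop (- i 1) seg-end acc)))))
```

A leading slash produces a leading "", and "//" or a trailing slash produce further empty strings; for instance (split-uri-path "foo/bar//" 0 9) gives ("foo" "bar" "" "").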
procedure
simplify-uri-path path --> list
Removes "." and ".." entries from path. The result is a (possibly empty)
list representing a path that contains no "." or ".." entries. The
list can be empty only if the path did not start with "/" (for the
rare occasion someone wants to simplify a relative path). The result
is #f if the path tries to back up past the root, for example with
"/..", "/foo/../.." or just "..". An empty entry, as produced by "//"
or by a trailing "/", refers to the root: everything before it is
discarded, and the path cannot be backed up past it.
Examples:
(simplify-uri-path (split-uri-path "/foo/bar/baz/.." 0 15))
==> '("" "foo" "bar")
(simplify-uri-path (split-uri-path "foo/bar/baz/../../.." 0 20))
==> '()
(simplify-uri-path (split-uri-path "/foo/../.." 0 10))
==> #f ; tries to back up past root
(simplify-uri-path (split-uri-path "foo/bar//" 0 9))
==> '("") ; "//" refers to root
(simplify-uri-path (split-uri-path "foo/bar/" 0 8))
==> '("") ; last "/" also refers to root
(simplify-uri-path (split-uri-path "/foo/bar//baz/../.." 0 19))
==> #f ; tries to back up past root
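A sketch of simplify-uri-path in R7RS Scheme, written as an assumption about how the procedure behaves rather than as the uri.scm code. It keeps a reversed stack of segments: "." is dropped, ".." pops, an empty segment (from a leading "/", from "//" or from a trailing "/") resets the stack to the root marker "", and popping the root or an empty stack yields #f. This reproduces every example above; inputs are given already split so the block does not depend on split-uri-path.

```scheme
(import (scheme base))

(define (simplify-uri-path path)
  ;; stack holds the simplified path in reverse; "" at the bottom marks root.
  (let loop ((rest path) (stack '()))
    (if (null? rest)
        (reverse stack)
        (let ((seg (car rest)))
          (cond ((string=? seg ".")               ; "." is a no-op
                 (loop (cdr rest) stack))
                ((string=? seg "")                ; leading "/", "//" or trailing "/"
                 (loop (cdr rest) (list "")))     ; restart at root
                ((string=? seg "..")
                 (if (or (null? stack) (string=? (car stack) ""))
                     #f                           ; would back up past root
                     (loop (cdr rest) (cdr stack))))
                (else (loop (cdr rest) (cons seg stack))))))))
```

Treating the empty segment as a reset is what makes "foo/bar/" simplify to ("") in the examples; a variant that merely skipped empty segments would instead keep ("foo" "bar").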