sunet/doc/uri.scm.doc

This file documents names specified in uri.scm.


NOTES

URIs have the following syntax:

[scheme] : path [? search ] [# fragmentid]

Parts in [] may be omitted. The last part is usually referred to as
frag-id in this document.


DEFINITIONS AND DESCRIPTIONS


char-set
uri-reserved

The set of reserved characters (semicolon, slash, hash, question mark,
colon and space).

procedure
parse-uri uri-string --> (scheme, path, search, frag-id)

Multiple-value return: scheme, path, search, frag-id, in this
order. scheme, search and frag-id are either #f or a string. path is a
nonempty list of strings. An empty path is a list containing the empty
string. parse-uri tries to be tolerant of the various ways people
build broken URIs out there on the Net, so it does not conform
strictly to RFC 1630.


procedure
unescape-uri string [start [end]] --> string

Unescapes a string. This procedure should only be used *after* the URI
has been parsed, since unescaping may introduce characters that blow
up the parse (that's why escape sequences are used in URIs ;).
Escape sequences have the form %hh, where each h is a hexadecimal
digit. E.g. %20 is space (ASCII character 32).
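
To make the %hh rule concrete, here is a minimal sketch in plain
Scheme. It is not the code from uri.scm: hex-value and sketch-unescape
are hypothetical names, there are no start/end arguments, and leaving
malformed sequences untouched is an assumption of this sketch.

```scheme
;; Sketch only: hex-value and sketch-unescape are hypothetical names,
;; not procedures from uri.scm.

;; Value of a hexadecimal digit character, or #f for any other character.
(define (hex-value c)
  (cond ((and (char>=? c #\0) (char<=? c #\9))
         (- (char->integer c) (char->integer #\0)))
        ((and (char-ci>=? c #\a) (char-ci<=? c #\f))
         (+ 10 (- (char->integer (char-downcase c)) (char->integer #\a))))
        (else #f)))

;; Replace every well-formed %hh in s by the character with that code.
;; Assumption: a malformed sequence such as "%zz" is copied unchanged.
(define (sketch-unescape s)
  (let ((len (string-length s)))
    (let loop ((i 0) (acc '()))
      (if (>= i len)
          (list->string (reverse acc))
          (let* ((c (string-ref s i))
                 (hi (and (char=? c #\%)
                          (< (+ i 2) len)
                          (hex-value (string-ref s (+ i 1)))))
                 (lo (and hi (hex-value (string-ref s (+ i 2))))))
            (if lo
                (loop (+ i 3) (cons (integer->char (+ (* 16 hi) lo)) acc))
                (loop (+ i 1) (cons c acc))))))))

(sketch-unescape "foo%20bar")   ; => "foo bar"
```

Note that (sketch-unescape "a%2Fb") yields "a/b": the slash only
appears after unescaping, which is why unescaping has to wait until
the URI has been parsed.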


procedure
hex-digit? character --> boolean

Returns #t if character is a hexadecimal digit (i.e., one of 0-9, a-f,
A-F), #f otherwise.


procedure
hexchar->int character --> number

Translates the given hexadecimal character to an integer, e.g.
(hexchar->int #\a) => 10.


procedure
int->hexchar integer --> character

Translates the given integer from range 0-15 into a hexadecimal
character (uses uppercase letters), e.g. (int->hexchar 14) => #\E.
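
The three hexadecimal helpers above could look roughly like this. This
is a sketch under assumptions, not the code from uri.scm: the sketch-
names are hypothetical, and the arithmetic assumes that the digits and
the letters a-f have consecutive character codes, as in ASCII.

```scheme
;; Sketch only; the sketch- names are hypothetical, not those of uri.scm.
(define (sketch-hex-digit? c)
  (or (and (char>=? c #\0) (char<=? c #\9))
      (and (char>=? c #\a) (char<=? c #\f))
      (and (char>=? c #\A) (char<=? c #\F))))

;; Assumes c satisfies sketch-hex-digit?.
(define (sketch-hexchar->int c)
  (if (char>=? (char-downcase c) #\a)
      (+ 10 (- (char->integer (char-downcase c)) (char->integer #\a)))
      (- (char->integer c) (char->integer #\0))))

;; Assumes 0 <= n <= 15; indexes into a table of uppercase digits.
(define (sketch-int->hexchar n)
  (string-ref "0123456789ABCDEF" n))

(sketch-hexchar->int #\a)    ; => 10
(sketch-int->hexchar 14)     ; => #\E
```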


char-set
uri-escaped-chars

A set of characters that are escaped in URIs. These are the following
characters: dollar ($), minus (-), underscore (_), at (@), dot (.),
ampersand (&), exclamation mark (!), asterisk (*), backslash (\),
double quote ("), single quote ('), open parenthesis ((), close
parenthesis ()), comma (,), plus (+) and all other characters that are
neither letters nor digits (such as space and control characters).


procedure
escape-uri string [escaped-chars] --> string

Escapes the characters of string that are members of escaped-chars.
escaped-chars defaults to uri-escaped-chars. Be careful when applying
this procedure to chunks of text with syntactically meaningful reserved
characters (e.g., paths with URI slashes or colons) -- they'll be
escaped, and lose their special meaning. E.g. it would be a mistake to
apply escape-uri to "//lcs.mit.edu:8001/foo/bar.html" because the
slashes and colons would be escaped. Note that escape-uri doesn't
check for this, since such a check would defeat its purpose.
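
The pitfall is easy to reproduce with a small sketch. This is not the
code from uri.scm: sketch-escape is a hypothetical name, and it takes
a predicate instead of a char-set so that it runs in plain Scheme. It
also assumes character codes below 256.

```scheme
;; Sketch only: sketch-escape and its escape? predicate are hypothetical;
;; the real escape-uri takes a char-set instead of a predicate.
(define hex-digits "0123456789ABCDEF")

;; Replace each character c for which (escape? c) is true by %hh.
;; Assumes character codes below 256.
(define (sketch-escape s escape?)
  (let loop ((chars (string->list s)) (acc '()))
    (cond ((null? chars)
           (list->string (reverse acc)))
          ((escape? (car chars))
           (let ((n (char->integer (car chars))))
             ;; acc is built in reverse, so push %, high digit, low digit.
             (loop (cdr chars)
                   (cons (string-ref hex-digits (remainder n 16))
                         (cons (string-ref hex-digits (quotient n 16))
                               (cons #\% acc))))))
          (else
           (loop (cdr chars) (cons (car chars) acc))))))

;; Escaping a whole URI chunk destroys its separators:
(sketch-escape "//lcs.mit.edu:8001/foo/bar.html"
               (lambda (c) (memv c '(#\/ #\: #\space))))
;; => "%2F%2Flcs.mit.edu%3A8001%2Ffoo%2Fbar.html"
```

The safe order is to split the URI first and escape each path
component on its own.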


procedure
resolve-uri cscheme cp scheme p --> (scheme, path)

Sorry, I can't figure out what resolve-uri is intended to do. Perhaps
I will find out later.

The code seems to have a bug: in the body of receive, there's a
loop. According to the comment, j should count consecutive slashes,
but j counts nothing in the body. Either zero is added ((lp (cdr cp-tail)
(cons (car cp-tail) rhead) (+ j 0))) or j is set to 1 ((lp (cdr
cp-tail) (cons (car cp-tail) rhead) 1))). Nevertheless, j is expected
to reach the value numsl, which can be larger than one. So what? I am
confused.


procedure
rev-append list-a list-b --> list

Performs (append (reverse list-a) list-b). The comment says it
should be defined in a list package, but I wonder how often it
will be used.
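
For illustration, the same result can be had without building the
intermediate reversed list. This is a sketch, not the code from
uri.scm, and sketch-rev-append is a hypothetical name.

```scheme
;; Sketch only: move the elements of a, one by one, onto the front of b.
(define (sketch-rev-append a b)
  (if (null? a)
      b
      (sketch-rev-append (cdr a) (cons (car a) b))))

(sketch-rev-append '(3 2 1) '(4 5))   ; => (1 2 3 4 5)
```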


procedure
split-uri-path uri start end --> list

Splits uri at /'s. Only the substring given by start (inclusive) and
end (exclusive) is considered. start and end - 1 have to be within the
range of the uri string; otherwise an index-out-of-range exception
will be raised.
Example:
(split-uri-path "foo/bar/colon" 4 11)
==> '("bar" "col")
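
A minimal sketch of such a split, scanning the substring from right to
left so that the result list comes out in order. It is not the code
from uri.scm: sketch-split-path is a hypothetical name, and it does no
range checking of its own beyond what substring and string-ref do.

```scheme
;; Sketch only: split the characters of s in [start, end) at slashes.
;; seg-end is the exclusive end of the segment currently being scanned.
(define (sketch-split-path s start end)
  (let loop ((i (- end 1)) (seg-end end) (result '()))
    (cond ((< i start)
           (cons (substring s start seg-end) result))
          ((char=? (string-ref s i) #\/)
           (loop (- i 1) i (cons (substring s (+ i 1) seg-end) result)))
          (else
           (loop (- i 1) seg-end result)))))

(sketch-split-path "foo/bar/colon" 4 11)   ; => ("bar" "col")
(sketch-split-path "foo/bar/" 0 8)         ; => ("foo" "bar" "")
```

In this sketch a leading or trailing slash produces an empty string
at that end of the list, and an empty range produces a list
containing only the empty string.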


procedure
simplify-uri-path path --> list

Removes "." and ".." entries from path. The result is a (possibly empty)
list representing a path that does not contain any "." or "..". The
list can only be empty if the path did not start with "/" (for the
rare occasion someone wants to simplify a relative path). The result
is #f if the path tries to back up past root, for example by "/.." or
"/foo/../.." or just "..". "//" may occur anywhere in the path; it
refers to root, and that root cannot be backed up past either.
Examples:
(simplify-uri-path (split-uri-path "/foo/bar/baz/.." 0 15))
==> '("" "foo" "bar")

(simplify-uri-path (split-uri-path "foo/bar/baz/../../.." 0 20))
==> '()

(simplify-uri-path (split-uri-path "/foo/../.." 0 10))
==> #f          ; tried to back up past root

(simplify-uri-path (split-uri-path "foo/bar//" 0 9))
==> '("")       ; "//" refers to root

(simplify-uri-path (split-uri-path "foo/bar/" 0 8))
==> '("")       ; last "/" also refers to root

(simplify-uri-path (split-uri-path "/foo/bar//baz/../.." 0 19))
==> #f          ; tries to back up past root
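
The rules above can be sketched with a stack of path components. This
is not the code from uri.scm: sketch-simplify is a hypothetical name,
and it works on an already-split path (a list of strings) so that it
runs on its own.

```scheme
;; Sketch only: stack holds the simplified path so far, in reverse order.
(define (sketch-simplify path)
  (let loop ((rest path) (stack '()))
    (cond ((null? rest)
           (reverse stack))
          ((string=? (car rest) ".")          ; "." changes nothing
           (loop (cdr rest) stack))
          ((string=? (car rest) "..")         ; back up one component
           (if (or (null? stack) (string=? (car stack) ""))
               #f                             ; nothing left, or at root
               (loop (cdr rest) (cdr stack))))
          ((string=? (car rest) "")           ; empty component: back to root
           (loop (cdr rest) (list "")))
          (else
           (loop (cdr rest) (cons (car rest) stack))))))

(sketch-simplify '("" "foo" "bar" "baz" ".."))   ; => ("" "foo" "bar")
(sketch-simplify '("foo" "bar" "" ""))           ; => ("")
(sketch-simplify '("" "foo" ".." ".."))          ; => #f
```

On the split forms of the six example paths above, this sketch gives
the same results as listed there.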