finally adapt documentation to new uri lib procs

2005-04-10 13:03:33 +00:00 · 2005-04-10 13:03:33 +00:00 · e9bc839cd5
parent 90fc61473e
commit e9bc839cd5
1 changed file with 30 additions and 111 deletions
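
Note on the revised text: the new documentation states that escaping can only be done per URI component, and that a URI must be split into its components before unescaping. The following is a minimal Python sketch of that rule, not the library's Scheme implementation. The names `unescape` and `escape` merely mirror the documented procedures; `PATH_SEGMENT_RESERVED` and `escape_path_segment` are hypothetical, and the reserved set shown is an assumption chosen for illustration.

```python
import re


def unescape(s):
    """Replace each %hh escape sequence by the character with that octet code."""
    return re.sub(r"%([0-9A-Fa-f]{2})", lambda m: chr(int(m.group(1), 16)), s)


def escape(s, reserved):
    """Replace every character matched by `reserved` with its %hh form.

    Assumes `reserved` is a compiled pattern matching one character at a
    time and that matched characters have codes below 256.
    """
    return reserved.sub(lambda m: "%{:02X}".format(ord(m.group(0))), s)


# Hypothetical reserved set for one path segment (an assumption, not taken
# from the library): delimiters, the escape character itself, and space.
PATH_SEGMENT_RESERVED = re.compile(r"[/;=?%# ]")


def escape_path_segment(segment):
    """Specialized escape procedure for a single path segment."""
    return escape(segment, PATH_SEGMENT_RESERVED)


# Build the path from its component parts, escaping each part separately.
segments = ["docs", "a/b c"]
path = "/".join(escape_path_segment(s) for s in segments)

# Split first, then unescape each part; this recovers the original segments.
parts = [unescape(p) for p in path.split("/")]

# Unescaping before splitting would turn the escaped slash into a delimiter.
wrong_parts = unescape(path).split("/")
```

Here `parts` equals the original `segments`, while `wrong_parts` contains three segments instead of two, which is why unescaping has to wait until the URI has been separated into its parts.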
--- a/doc/latex/uri.tex
+++ b/doc/latex/uri.tex
@@ -1,129 +1,48 @@
-\chapter{Parsing and Processing URIs}\label{cha:uri}
+\chapter{Processing URIs}\label{cha:uri}

-The \ex{uri} structure contains a library for dealing with URIs.
-and is out-of-date by now---it is build up on RFC 1360 of 1994 which
-was replaced by RFC 1738, RFC 1808, and finally RFC 2396 of 1998.
+The \ex{uri} structure contains library functions for dealing with URIs.

 \section{Notes on URI Syntax}

-A URI (Uniform Resource Identifier) is of following syntax:
-%
-\begin{inset}
-[\var{scheme}] \verb|:| \var{path} [\verb|?| \var{search}] [\verb|#| \var{fragid}]
-\end{inset}
-%
-Parts in brackets may be omitted.
+The generic syntax of URIs (Uniform Resource Identifiers) is defined in
+RFC~2396; see Appendix~A of that RFC for the collected BNF for URIs.

-The URI contains characters like \verb|:| to indicate its different
-parts.  Some special characters are \emph{escaped} if they are a
-regular part of a name and not indicators for the structure of a URI.
-Escape sequences are of following scheme: \verb|%|\var{h}\var{h} where \var{h}
-is a hexadecimal digit.  The hexadecimal number refers to the
-ASCII of the escaped character, e.g.\ \verb|%20| is space (ASCII
-32) and \verb|%61| is `a' (ASCII 97). This module
-provides procedures to escape and unescape strings that are meant to
-be used in a URI.
+Within a URI, non-printable US-ASCII characters are represented by an
+\emph{escape encoding}. \emph{Reserved} characters, which serve as
+delimiters between the different parts of a URI, must also be
+\emph{escaped} if they are to appear as regular data in a URI component. The
+set of characters actually \emph{reserved} within any given URI
+component is defined by that component. Therefore,
+\emph{escaping} can only be done when the URI is being created from
+its component parts; likewise, a URI must be separated into its
+component parts before \emph{unescaping} can be done.
+
+Escape sequences have the following form: \verb|%|\var{h}\var{h},
+where \var{h}\var{h} are the two hexadecimal digits representing the octet code. For
+example, \verb|%20| is the escape encoding of the US-ASCII space character.

 \section{Procedures}

-\defun{unescape-uri}{string [start] [end]}{string}
+\defun{unescape}{string}{string}
 \begin{desc}
-  \ex{Unescape-uri} unescapes a string. If \var{start} and/or \var{end} are
-  specified, they specify start and end positions within \var{string}
-  should be unescaped.
+  \ex{Unescape} replaces each escape sequence in \var{string} by the character it encodes.
 \end{desc}
 %
-This procedure should only be used \emph{after} the URI was parsed,
-since unescaping may introduce characters that blow up the
-parse---that's why escape sequences are used in URIs.
+This procedure may only be used \emph{after} the URI has been parsed into
+its component parts (see above).

-\defvar{uri-escaped-chars}{char-set}
+\defun{escape} {string regexp} {string}
 \begin{desc}
-  This is a set of characters (in the sense of SRFI~14) which are
-  escaped in URIs.  RFC 2396 defines this set as all characters which 
-  are neither letters, nor digits, nor one of the following characters:
-   \verb|-|, \verb|_|, \verb|.|, \verb|!|, %$
-   \verb|~|, \verb|*|, \verb|'|, \verb|(|, \verb|)|.
+  \ex{Escape} replaces reserved or excluded characters in \var{string}
+  by their escaped representation. \var{regexp} defines which
+  characters are reserved or excluded within the particular URI component
+  being escaped.
 \end{desc}

-\defun{escape-uri} {string [escaped-chars]} {string}
-\begin{desc}
-  This procedure escapes characters of \var{string} that are in
-  \var{escaped\=chars}. \var{Escaped\=chars} defaults to
-  \ex{uri\=escaped\=chars}.  
-\end{desc}
-%
-Be careful with using this procedure to chunks of text with
-syntactically meaningful reserved characters (e.g., paths with URI
-slashes or colons)---they'll be escaped, and lose their special
-meaning. E.g.\ it would be a mistake to apply \ex{escape-uri} to
-\begin{verbatim}
-//lcs.mit.edu:8001/foo/bar.html
-\end{verbatim}
-%
-because the sla\-shes and co\-lons would be escaped.
-
-\defun{split-uri}{uri start end} {list}
-\begin{desc}
-  This procedure splits \var{uri} at slashes. Only the substring given
-  with \var{start} (inclusive) and \var{end} (exclusive) as indices is
-  considered.  \var{start} and $\var{end} - 1$ have to be within the
-  range of \var{uri}.  Otherwise an \ex{index-out-of-range} exception
-  will be raised.
-  
-  Example: \codex{(split-uri "foo/bar/colon" 4 11)} returns
-  \codex{("bar" "col")}
-\end{desc}
-
-\defun{uri-path->uri}{path}{string}
-\begin{desc}
-  This procedure generates a path out of a URI path list by inserting
-  slashes between the elements of \var{plist}.
-\end{desc}
-%
-If you want to use the resulting string for further operation, you
-should escape the elements of \var{plist} in case they contain
-slashes, like so:
-%
-\begin{verbatim}
-(uri-path->uri (map escape-uri pathlist))
-\end{verbatim}
-
-\defun{simplify-uri-path}{path}{list}
-\begin{desc}
-  This procedure simplifies a URI path.  It removes \verb|"."| and
-  \verb|"/.."| entries from path, and removes parts before a root.
-  The result is a list, or \sharpf{} if the path tries to back up past
-  root.
-\end{desc}
-%
-According to RFC~2396, relative paths are considered not to start with
-\verb|/|.  They are appended to a base URL path and then simplified.
-So before you start to simplify a URL try to find out if it is a
-relative path (i.e. it does not start with a \verb|/|).
-
-Examples:
-%
-\begin{alltt}
-(simplify-uri-path (split-uri  "/foo/bar/baz/.."  0 15))
-\(\Rightarrow\) ("" "foo" "bar")
-
-(simplify-uri-path (split-uri "foo/bar/baz/../../.." 0 20))
-\(\Rightarrow\) ()
-
-(simplify-uri-path (split-uri "/foo/../.." 0 10))
-\(\Rightarrow\) #f
-
-(simplify-uri-path (split-uri "foo/bar//" 0 9))
-\(\Rightarrow\) ("")     
-
-(simplify-uri-path (split-uri "foo/bar/" 0 8))
-\(\Rightarrow\) ("")
-
-(simplify-uri-path (split-uri "/foo/bar//baz/../.." 0 19))
-\(\Rightarrow\) #f
-\end{alltt}
-
+This procedure may only be used on a URI component part, not on a
+complete URI made up of several component parts (see above). Use it to
+write specialized escape procedures for the respective component
+part (see the \ex{url} structure for examples).

 %%% Local Variables: 
 %%% mode: latex