diff --git a/doc/latex/uri.tex b/doc/latex/uri.tex
index 3feb146..7e28b23 100644
--- a/doc/latex/uri.tex
+++ b/doc/latex/uri.tex
@@ -1,129 +1,48 @@
-\chapter{Parsing and Processing URIs}\label{cha:uri}
+\chapter{Processing URIs}\label{cha:uri}
 
-The \ex{uri} structure contains a library for dealing with URIs.
-and is out-of-date by now---it is build up on RFC 1360 of 1994 which
-was replaced by RFC 1738, RFC 1808, and finally RFC 2396 of 1998.
+The \ex{uri} structure contains library functions for dealing with URIs.
 
 \section{Notes on URI Syntax}
 
-A URI (Uniform Resource Identifier) is of following syntax:
-%
-\begin{inset}
-[\var{scheme}] \verb|:| \var{path} [\verb|?| \var{search}] [\verb|#| \var{fragid}]
-\end{inset}
-%
-Parts in brackets may be omitted.
+The generic syntax of URIs (Uniform Resource Identifiers) is defined in
+RFC~2396, whose Appendix~A collects the complete BNF for URIs.
 
-The URI contains characters like \verb|:| to indicate its different
-parts. Some special characters are \emph{escaped} if they are a
-regular part of a name and not indicators for the structure of a URI.
-Escape sequences are of following scheme: \verb|%|\var{h}\var{h} where \var{h}
-is a hexadecimal digit. The hexadecimal number refers to the
-ASCII of the escaped character, e.g.\ \verb|%20| is space (ASCII
-32) and \verb|%61| is `a' (ASCII 97). This module
-provides procedures to escape and unescape strings that are meant to
-be used in a URI.
+Within a URI, characters that may not appear literally (for example
+US-ASCII control characters and the space character) are represented
+by an \emph{escape encoding}. \emph{Reserved} characters, which serve
+as delimiters between the parts of a URI, must also be \emph{escaped}
+when they occur as ordinary data within a URI component. The set of
+characters actually \emph{reserved} within a given URI component is
+defined by that component. Therefore \emph{escaping} can only be done
+while a URI is being assembled from its component parts; likewise, a
+URI must be split into its components before it can be \emph{unescaped}.
+
+An escape sequence has the form \verb|%|\var{h}\var{h}, where the two
+hexadecimal digits \var{h}\var{h} give the code of the escaped octet.
+For example, \verb|%20| is the escape sequence for the US-ASCII space.
 
 \section{Procedures}
 
-\defun{unescape-uri}{string [start] [end]}{string}
+\defun{unescape}{string}{string}
 \begin{desc}
-  \ex{Unescape-uri} unescapes a string. If \var{start} and/or \var{end} are
-  specified, they specify start and end positions within \var{string}
-  should be unescaped.
+  \ex{Unescape} decodes every escape sequence in \var{string}.
 \end{desc}
 %
-This procedure should only be used \emph{after} the URI was parsed,
-since unescaping may introduce characters that blow up the
-parse---that's why escape sequences are used in URIs.
+This procedure may only be used \emph{after} the URI has been split
+into its component parts (see above).
 
-\defvar{uri-escaped-chars}{char-set}
+\defun{escape}{string regexp}{string}
 \begin{desc}
-  This is a set of characters (in the sense of SRFI~14) which are
-  escaped in URIs. RFC 2396 defines this set as all characters which
-  are neither letters, nor digits, nor one of the following characters:
-  \verb|-|, \verb|_|, \verb|.|, \verb|!|, %$
-  \verb|~|, \verb|*|, \verb|'|, \verb|(|, \verb|)|.
+  \ex{Escape} replaces the reserved or excluded characters in \var{string}
+  by their escape sequences. \var{Regexp} defines which characters are
+  reserved or excluded within the particular URI component being
+  escaped.
 \end{desc}
 
-\defun{escape-uri} {string [escaped-chars]} {string}
-\begin{desc}
-  This procedure escapes characters of \var{string} that are in
-  \var{escaped\=chars}. \var{Escaped\=chars} defaults to
-  \ex{uri\=escaped\=chars}.
-\end{desc}
-%
-Be careful with using this procedure to chunks of text with
-syntactically meaningful reserved characters (e.g., paths with URI
-slashes or colons)---they'll be escaped, and lose their special
-meaning. E.g.\ it would be a mistake to apply \ex{escape-uri} to
-\begin{verbatim}
-//lcs.mit.edu:8001/foo/bar.html
-\end{verbatim}
-%
-because the sla\-shes and co\-lons would be escaped.
-
-\defun{split-uri}{uri start end} {list}
-\begin{desc}
-  This procedure splits \var{uri} at slashes. Only the substring given
-  with \var{start} (inclusive) and \var{end} (exclusive) as indices is
-  considered. \var{start} and $\var{end} - 1$ have to be within the
-  range of \var{uri}. Otherwise an \ex{index-out-of-range} exception
-  will be raised.
-
-  Example: \codex{(split-uri "foo/bar/colon" 4 11)} returns
-  \codex{("bar" "col")}
-\end{desc}
-
-\defun{uri-path->uri}{path}{string}
-\begin{desc}
-  This procedure generates a path out of a URI path list by inserting
-  slashes between the elements of \var{plist}.
-\end{desc}
-%
-If you want to use the resulting string for further operation, you
-should escape the elements of \var{plist} in case they contain
-slashes, like so:
-%
-\begin{verbatim}
-(uri-path->uri (map escape-uri pathlist))
-\end{verbatim}
-
-\defun{simplify-uri-path}{path}{list}
-\begin{desc}
-  This procedure simplifies a URI path. It removes \verb|"."| and
-  \verb|"/.."| entries from path, and removes parts before a root.
-  The result is a list, or \sharpf{} if the path tries to back up past
-  root.
-\end{desc}
-%
-According to RFC~2396, relative paths are considered not to start with
-\verb|/|. They are appended to a base URL path and then simplified.
-So before you start to simplify a URL try to find out if it is a
-relative path (i.e. it does not start with a \verb|/|).
-
-Examples:
-%
-\begin{alltt}
-(simplify-uri-path (split-uri "/foo/bar/baz/.." 0 15))
-\(\Rightarrow\) ("" "foo" "bar")
-
-(simplify-uri-path (split-uri "foo/bar/baz/../../.." 0 20))
-\(\Rightarrow\) ()
-
-(simplify-uri-path (split-uri "/foo/../.." 0 10))
-\(\Rightarrow\) #f
-
-(simplify-uri-path (split-uri "foo/bar//" 0 9))
-\(\Rightarrow\) ("")
-
-(simplify-uri-path (split-uri "foo/bar/" 0 8))
-\(\Rightarrow\) ("")
-
-(simplify-uri-path (split-uri "/foo/bar//baz/../.." 0 19))
-\(\Rightarrow\) #f
-\end{alltt}
-
+This procedure should be applied only to a single URI component, not
+to a complete URI made up of several components (see above). Use it
+to write specialized escape procedures for the individual components;
+the \ex{url} structure contains examples.
 
 %%% Local Variables:
 %%% mode: latex
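The escape encoding described in the new Notes on URI Syntax section can be illustrated with a small, self-contained R7RS Scheme sketch. In that encoding, each escaped octet is written as % followed by two hexadecimal digits, and the set of characters to escape is chosen per component. The names percent-escape, percent-unescape, hex-digit? and not-unreserved? are hypothetical helpers written only for this illustration. They are not the uri structure's escape and unescape, and they take a character predicate where escape takes a regexp. The sketch assumes every escaped character has a code below 256 (a single octet). The predicate not-unreserved? escapes everything outside the RFC 2396 unreserved set: ASCII letters, digits, and - _ . ! ~ * ' ( ).

```scheme
(import (scheme base) (scheme char))

;; Hypothetical predicate: true for every character that is NOT in the
;; RFC 2396 "unreserved" set (ASCII letters, digits, - _ . ! ~ * ' ( )).
(define (not-unreserved? c)
  (not (and (char<? c #\x80)
            (or (char-alphabetic? c)
                (char-numeric? c)
                (memv c '(#\- #\_ #\. #\! #\~ #\* #\' #\( #\)))))))

;; Hypothetical helper: write each character of S for which RESERVED?
;; is true as %hh (two uppercase hex digits of its code).
;; Assumes every such character has a code below 256 (one octet).
(define (percent-escape s reserved?)
  (let ((out (open-output-string)))
    (string-for-each
     (lambda (c)
       (if (reserved? c)
           (let ((code (char->integer c)))
             (write-char #\% out)
             (when (< code 16) (write-char #\0 out)) ; pad to two digits
             (write-string (string-upcase (number->string code 16)) out))
           (write-char c out)))
     s)
    (get-output-string out)))

(define (hex-digit? c)
  (or (char<=? #\0 c #\9)
      (char<=? #\a (char-downcase c) #\f)))

;; Hypothetical helper: replace every well-formed %hh sequence in S by
;; the character with that code; a % not followed by two hex digits is
;; copied unchanged.
(define (percent-unescape s)
  (let ((out (open-output-string))
        (n (string-length s)))
    (let loop ((i 0))
      (cond ((>= i n) (get-output-string out))
            ((and (char=? (string-ref s i) #\%)
                  (< (+ i 2) n)
                  (hex-digit? (string-ref s (+ i 1)))
                  (hex-digit? (string-ref s (+ i 2))))
             (write-char (integer->char
                          (string->number (substring s (+ i 1) (+ i 3)) 16))
                         out)
             (loop (+ i 3)))
            (else
             (write-char (string-ref s i) out)
             (loop (+ i 1)))))))

;; Escape a single component, then decode it again.
(percent-escape "a b/c" not-unreserved?)   ; => "a%20b%2Fc"
(percent-unescape "a%20b%2Fc")             ; => "a b/c"
```

Because the reserved set differs between components, a caller would pass a different predicate for each kind of component. For example, a path-segment predicate could also leave : and @ unescaped, since RFC 2396 allows them in path segments. This mirrors the component-wise use of escape described in the new text.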