diff --git a/doc/latex/uri.tex b/doc/latex/uri.tex
index bcf42c4..3feb146 100644
--- a/doc/latex/uri.tex
+++ b/doc/latex/uri.tex
@@ -1,6 +1,8 @@
 \chapter{Parsing and Processing URIs}\label{cha:uri}
 
 The \ex{uri} structure contains a library for dealing with URIs.
+It is out of date by now: it is built upon RFC 1630 of 1994, which
+was replaced by RFC 1738, RFC 1808, and finally RFC 2396 of 1998.
 
 \section{Notes on URI Syntax}
 
@@ -24,43 +26,6 @@ be used in a URI.
 
 \section{Procedures}
 
-\defun{parse-uri} {uri-string } {scheme path search
- frag-id} \label{proc:parse-uri}
-\begin{desc}
- Parses an \var{uri\=string} into its four fields.
- The fields are \emph{not} unescaped, as the rules for
- parsing the \var{path} component in particular need unescaped
- text, and are dependent on \var{scheme}. The URL parser is
- responsible for doing this. If the \var{scheme}, \var{search}
- or \var{fragid} portions are not specified, they are \sharpf.
- Otherwise, \var{scheme}, \var{search}, and \var{fragid} are
- strings. \var{path} is a non-empty string list---the path split
- at slashes.
-\end{desc}
-
-Here is a description of the parsing technique. It is inwards from
-both ends:
-\begin{itemize}
-\item First, the code searches forwards for the first reserved
- character (\verb|=|, \verb|;|, \verb|/|, \verb|#|, \verb|?|,
- \verb|:| or \verb|space|). If it's a colon, then that's the
- \var{scheme} part, otherwise there is no \var{scheme} part. At
- all events, it is removed.
-\item Then the code searches backwards from the end for the last reserved
- char. If it's a sharp, then that's the \var{fragid} part---remove it.
-\item Then the code searches backwards from the end for the last reserved
- char. If it's a question-mark, then that's the \var{search}
- part----remove it.
-\item What's left is the path. The code split it at slashes. The
- empty string becomes a list containing the empty string.
-\end{itemize}
-%
-This scheme is tolerant of the various ways people build broken
-URI's out there on the Net\footnote{So it does not absolutely conform
- to RFC~1630.}, e.g.\ \verb|=| is a reserved character, but used
-unescaped in the search-part. It was given to me\footnote{That's
- Olin Shivers.} by Dan Connolly of the W3C and slightly modified.
-
 \defun{unescape-uri}{string [start] [end]}{string}
 \begin{desc}
 \ex{Unescape-uri} unescapes a string. If \var{start} and/or \var{end} are
diff --git a/doc/latex/url.tex b/doc/latex/url.tex
index cceb609..bf5a289 100644
--- a/doc/latex/url.tex
+++ b/doc/latex/url.tex
@@ -56,6 +56,21 @@ For details about escaping and unescaping see Chapter~\ref{cha:uri}.
 
 \section{HTTP URLs}
 
+\defun{parse-uri} {uri-string } {host port path query} \label{proc:parse-uri}
+\begin{desc}
+ Parses an HTTP 1.1 \var{uri\=string} into its four fields.
+ The fields returned are \emph{not} decoded.
+ If \var{uri\=string} is not an http URL but an abs\_path,
+ the \var{host}, \var{port},
+ and \var{query} portions are not specified and are \sharpf.
+ Otherwise, \var{host}, \var{port}, and \var{query} are
+ strings. \var{path} is a non-empty string list---the path split
+ at slashes.
+\end{desc}
+This parser does not strictly conform to RFC 2616, in that it allows
+a fragment suffix. Furthermore, only http URLs, not absolute URIs in general, are
+recognized.
+
 \defun{make-http-url}{server path search frag-id}{http-url}
 \defunx{http-url?}{thing}{boolean}
 \defunx{http-url-server}{http-url}{server}