From 020f8264f1cf28c0a89416f765f8ff96c5599ab9 Mon Sep 17 00:00:00 2001 From: sperber Date: Fri, 10 Jan 2003 13:27:05 +0000 Subject: [PATCH] First stab at rudimentary up-to-date documentation for the HTTP server. --- doc/latex/httpd.tex | 816 ++++++++++++++++++++++++-------------------- 1 file changed, 442 insertions(+), 374 deletions(-) diff --git a/doc/latex/httpd.tex b/doc/latex/httpd.tex index 08a33f6..9c712bd 100644 --- a/doc/latex/httpd.tex +++ b/doc/latex/httpd.tex @@ -1,256 +1,411 @@ \chapter{HTTP server}\label{cha:httpd} % -\begin{description} -\item[Used files:] httpd/core.scm, httpd/handlers.scm, httpd/options.scm, -\item[Name of the packages:] httpd-core, httpd-basic-handler, httpd-make-options -\end{description} -There are also some other files and packages that are used internally. - % - -The SUnet web system is a collection of packages of \Scheme code that -provides utilities for interacting with the World-Wide Web. This -includes: -\begin{itemize} -\item A Web server. -\item URI and URL parsers and un-parsers (see Chapters \ref{cha:uri} - and \ref{cha:url}). -\item RFC822-style header parsers (see Chapter \ref{cha:rfc822}). -\item Code for performing structured html output -\item Code to assist in writing CGI \Scheme programs that can be used by - any CGI-compliant HTTP server (such as NCSA's httpd, or the SUnet - Web server). -\end{itemize} +The SUnet HTTP Server is a complete industrial-strength implementation +of the HTTP 1.0 protocol. It is highly configurable and allows the writing +of dynamic web pages that run inside the server without going through +complicated and slow protocols like CGI or Fast/CGI. -The server has three main design goals: -\begin{description} -\item[Extensibility] - The server is in fact nothing but extensions, using a mechanism - called ``path handlers'' to define URL-specific services. It has a - toolkit of services that can be used as-is, extended or built - upon. 
User extensions have exactly the same status as the base - services. - - The extension mechanism allows for easy implementation of new - services without the overhead of the CGI interface. Since the - server is written on top of the Scheme shell, the full set of Unix - system calls and program tools is available to the implementor. - -\item[Mobile code] - The server allows Scheme code to be uploaded for direct execution - inside the server. The server has complete control over the code, - and can safely execute it in restricted environments that do not - provide access to potentially dangerous primitives (such as the - ``delete file'' procedure.) - -\item[Clarity] - I\footnote{That's Olin Shivers (\ex{shivers@ai.mit.edu}, - \ex{http://www.\ob{}ai.\ob{}mit.\ob{}edu/\ob{}people/\ob{}shivers/}). - For the rest of the documentation, if not mentioned otherwise, - `I' refers to him.} wrote this server to help myself understand - the Web. It is voluminously commented, and I hope it will prove to - be an aid in understanding the low-level details of the Web - protocols. - - The SUnet web server has the ability to upload code from Web clients - and execute that code on behalf of the client in a protected - environment. - - Some simple documentation on the server is available. - \end{description} +\section{Starting and configuring the server} -\section{Basic server structure} - -The Web server is started by calling the httpd procedure, which takes -one argument, a \ex{httpd\=options}-record: +All procedures described in this section are exported by the +\texttt{httpd} structure. + +The Web server is started by calling the \ex{httpd} procedure, which takes +one argument, an options value: \defun{httpd}{options}{\noreturn} \begin{desc} - This procedure starts the server. The various \semvar{options} can - be set via the options transformers that are explained below. + This procedure starts the server. 
The \var{options} argument + specifies various configuration parameters, explained below. The server's basic loop is to wait on the port for a connection from an HTTP client. When it receives a connection, it reads in and - parses the request into a special request data structure. Then the - server forks a thread, who binds the current I/O ports to the + parses the request into a special request data structure. Then the + server forks a thread which binds the current I/O ports to the connection socket, and then hands off to the top-level - \semvar{path-handler} (the first argument to httpd). The - \semvar{path-handler} procedure is responsible for actually serving - the request -- it can be any arbitrary computation. Its output goes + request handler (which must be specified in the options). The + request handler is responsible for actually serving + the request---it can be any arbitrary computation. Its output goes directly back to the HTTP client that sent the request. - Before calling the path handler to service the request, the HTTP + Before calling the request handler to service the request, the HTTP server installs an error handler that fields any uncaught error, sends an error reply to the client, and aborts the request - transaction. Hence any error caused by a path-handler will be + transaction. Hence any error caused by a request handler will be handled in a reasonable and robust fashion. - - The basic server loop, and the associated request data structure are - the fixed architecture of the SUnet Web server; its flexibility lies - in the notion of path handlers. +\end{desc} +% +The options argument can be constructed through a number of procedures +with names of the form \texttt{with-\ldots}. Each of these procedures +either creates a fresh options value or adds a configuration parameter +to an old options argument. The configuration parameter value is +always the first argument, the (old) options value the optional second +one. 
Here they are: + +\defun{with-port}{port [options]}{options} +\begin{desc} + This specifies the port on which the server listens. Defaults to 80. \end{desc} -\defun{with-port}{port \ovar{options}}{options} -\defunx{with-root-directory}{root-directory - \ovar{options}}{options} -\defunx{with-fqdn}{fqdn \ovar{options}}{options} -\defunx{with-reported-port}{reported-port - \ovar{options}}{options} -\defunx{with-path-handler}{path-handler - \ovar{options}}{options} -\defunx{with-server-admin}{mail-address - \ovar{options}}{options} -\defunx{with-simultaneous-requests}{requests - \ovar{options}}{options} -\defunx{with-logfile}{logfile \ovar{options}}{options} -\defunx{with-syslog?}{syslog? \ovar{options}}{options} -\defunx{with-resolve-ip?}{resolve-ip? \ovar{options}}{options} +\defun{with-root-directory}{root-directory [options]}{options} \begin{desc} - As noted above, these transformers set the options for the web - server. Every transformer changes one aspect of the - \semvar{options} (for the \ex{httpd}). If this optional argument is missing, the - default values are used. These are the following: + This specifies the current directory of the server. Note that this + is \emph{not} the document root directory. Defaults to \texttt{/}. +\end{desc} - \begin{tabular}{ll} - \bf{transformer} & \bf{default value} \\ - \hline - \ex{with\=port} & 80 \\ - \ex{with\=root\=directory} & ``\ex{/}'' \\ - \ex{with\=fqdn} & \sharpf \\ - \ex{with\=reported-port} & \sharpf \\ - \ex{with\=path\=handler} & \sharpf \\ - \ex{with\=server\=admin} & \sharpf \\ - \ex{with\=simultaneous\=requests} & \sharpf \\ - \ex{with\=logfile} & ``\ex{/logfile.log}''\\ - \ex{with\=syslog?} & \sharpt \\ - \ex{with\=resolve\=ip?} & \sharpt - \end{tabular} +\defun{with-fqdn}{fqdn [options]}{options} +\begin{desc} + This specifies the fully-qualified domain name the server uses in + automatically generated replies, or \ex{\#f} if the server should + query DNS for the fully-qualified domain name. 
Defaults to \ex{\#f}. +\end{desc} -% that can be found in the \ex{httpd\=make\=options}-structure: -% \ex{with\=port}, \ex{with\=root\=directory}, \ex{with\=fqdn}, -% \ex{with\=reported-port}, \ex{with\=path\=handler}, -% \ex{with\=server\=admin}, \ex{with\=simultaneous-requests}, -% \ex{with\=logfile}, \ex{with\=syslog?} that set the port the server -% is listening to, the root-directory of the server, the FQDN of the -% server, the port the server assumes it is listening to, the -% path-handler of the server (see below), the mail-address of the -% server-admin, the maximum number of simultaneous handled requests, -% the name of the file or the port logging in the Common Log Format -% (CLF) is output to and if the server shall create syslog messages, -% respectively. The port defaults to 80, the root directory defaults -% to ``\ex{/}'', the mail address of the server-admin defaults to -% ``\ex{sperber@\ob{}informatik.\ob{}uni\=tuebingen.\ob{}de}'', -% \FIXME{Why does the server admin mail address have -% sperber@informatik... as default value?}logging is done to -% ``\ex{httpd.log}'' and syslog is enabled. All other options default -% to \sharpf. +\defun{with-reported-port}{reported-port [options]}{options} +\begin{desc} + This specifies the port number the server uses in automatically + generated replies or \ex{\#f} if the reported port is the same as + the port the server is listening on. (This is useful if you're + running the server through an accelerating proxy.) Defaults to + \ex{\#f}. +\end{desc} - For example -\begin{alltt} -(httpd (with-path-handler - (rooted-file-handler "/usr/local/etc/httpd") - (with-root-directory "/usr/local/etc/httpd"))) -\end{alltt} +\defun{with-server-admin}{mail-address [options]}{options} +\begin{desc} + This specifies the email address of the server administrator the + server uses in automatically generated replies. Defaults to \ex{\#f}. 
+\end{desc} + +\defun{with-icon-name}{icon-name [options]}{options} +\begin{desc} + This specifies how to generate the links to various decorative icons + for the listings. It can be a procedure which gets passed an + icon tag (a symbol) and is expected to return a link pointing to the icon. If + it is a string, that is taken as a prefix to which the icon's file name is + appended. If \ex{\#f}, just the plain file names will be used. Defaults to \ex{\#f}. + The valid icon tags, together with the default names of their icon + files, are: + + \begin{center} + \begin{tabular}{|l|l|} + \hline + \texttt{directory} & \texttt{directory.xbm}\\\hline + \texttt{text} & \texttt{text.xbm}\\\hline + \texttt{doc} & \texttt{doc.xbm}\\\hline + \texttt{image} & \texttt{image.xbm}\\\hline + \texttt{movie} & \texttt{movie.xbm}\\\hline + \texttt{audio} & \texttt{sound.xbm}\\\hline + \texttt{archive} & \texttt{tar.xbm}\\\hline + \texttt{compressed} & \texttt{compressed.xbm}\\\hline + \texttt{uu} & \texttt{uu.xbm}\\\hline + \texttt{binhex} & \texttt{binhex.xbm}\\\hline + \texttt{binary} & \texttt{binary.xbm}\\\hline + \texttt{blank} & \texttt{blank.xbm}\\\hline + \texttt{back} & \texttt{back.xbm}\\\hline + unknown & \texttt{unknown.xbm}\\\hline + \end{tabular} + + Example icons can be found as part of the CERN httpd distribution + at \url{http://www.w3.org/pub/WWW/Daemon/}. +\end{center} +\end{desc} + +\defun{with-request-handler}{request-handler [options]}{options} +\begin{desc} + This specifies the request handler of the server to which the server + delegates the actual work. More on that subject below in + Section~\ref{httpd:request-handlers}. This parameter must be specified. +\end{desc} + +\defun{with-simultaneous-requests}{requests [options]}{options} +\begin{desc} + This specifies a limit on the number of simultaneous requests the + server serves. 
If that limit is exceeded during operation, the + server will hold off on new requests until the number of + simultaneous requests has sunk below the limit again. If this + parameter is \ex{\#f}, no limit is imposed. Defaults to \ex{\#f}. +\end{desc} + +\defun{with-logfile}{logfile [options]}{options} +\begin{desc} + This specifies the name of a log file for the server where it writes + Common Log Format logging information. It can also be a port in + which case the information is logged to that port, or \ex{\#f} for + no logging. Defaults to \ex{\#f}. + + To allow rotation of logfiles, the server re-opens the logfile + whenever it receives a \texttt{USR1} signal. +\end{desc} + +\defun{with-syslog?}{syslog? [options]}{options} +\begin{desc} + This specifies whether the server will log information about + incoming requests to the Unix syslog facility. Defaults to \ex{\#t}. +\end{desc} + +\defun{with-resolve-ip?}{resolve-ip? [options]}{options} +\begin{desc} + This specifies whether the server writes the domain names rather + than numerical IPs to the output log it produces. Defaults to + \ex{\#t}. +\end{desc} + +To avoid parenthesitis, the \ex{make-httpd-options} procedure eases the +construction of the options argument: + +\defun{make-httpd-options}{transformer value \ldots}{options} +\begin{desc} + This constructs an options value from an argument list of parameter + transformers and parameter values. The arguments come in pairs, + each an option transformer from the list above, and a value for that + parameter. \ex{Make-httpd-options} returns the resulting options value. +\end{desc} + +For example, +\begin{alltt} +(httpd (make-httpd-options + with-request-handler (rooted-file-handler "/usr/local/etc/httpd") + with-root-directory "/usr/local/etc/httpd")) +\end{alltt} + % starts the server on port 80 with - ``\ex{/usr/\ob{}local/\ob{}etc/\ob{}httpd}'' as root directory and + \ex{/usr/local/etc/httpd} as its root directory and lets it serve any file out from this directory. 
- \ex{rooted\=file\=handler} creates a path handler and is explained - below. You see, the transformers are used nested. So, every - transformer changes one aspect of the options that the following - transformer returns and the last transformer (here: - \ex{with\=root\=directory}) changes an aspect of the default values + % #### note about rooted-file-handler - - \semvar{port} is the port the server is listening to, - \semvar{root-directory} is the directory in the file system the - server uses as root, \semvar{fqdn} is the fully qualified domain - name the server reports, \semvar{reported-port} is the port the - server reports it is listening to and \semvar{server-admin} is the - mail address of the server admin. \semvar{requests} denote the - maximum number of allowed simultaneous requests to the server. - \sharpf\ means infinite. \semvar{logfile} is either a string, then - it is the file name of the logfile, or a port, where the log entries - are written to, or \sharpf, that means no logging is made. The - logfile is in Common Log Format (CLF). To allow rotation of - logfiles, the server will reopen the logfile when it receives the - signal \texttt{USR1}. \semvar{syslog?} tells the server to write - syslog messages (\sharpt) or not (\sharpf). -\end{desc} -\section{Path handlers} -\label{httpd:path-handlers} - - A path handler is a procedure taking two arguments: -\defun{path-handler}{path req}{value} +\section{Requests} +\label{httpd:requests} + +Request handlers operate on \textit{requests} which contain the +information needed to generate a page. 
The relevant procedures to +dissect requests are defined in the \texttt{httpd-requests} structure: + +\defun{request?}{value}{boolean} +\defunx{request-method}{request}{string} +\defunx{request-uri}{request}{string} +\defunx{request-url}{request}{url} +\defunx{request-version}{request}{pair} +\defunx{request-headers}{request}{list} +\defunx{request-socket}{request}{socket} \begin{desc} - The \semvar{req} argument is a request record giving all the details - of the client's request; it has the following structure: \FIXME{Make - the record's structure a table} -\begin{alltt} -(define-record request - method ; A string such as "GET", "PUT", etc. - uri ; The escaped URI string as read from request line. - url ; An http URL record (see url.scm). - version ; A (major . minor) integer pair. - headers ; An rfc822 header alist (see rfc822.scm). - socket) ; The socket connected to the client. -\end{alltt} - -The \semvar{path} argument is the URL's path, parsed and split at -slashes into a string list. For example, if the Web client -dereferences URL -\codex{http://\ob{}clark.\ob{}lcs.\ob{}mit.\ob{}edu:\ob{}8001/\ob{}h/\ob{}shi\ob{}vers/\ob{}co\ob{}de/\ob{}web.\ob{}tar.\ob{}gz} -then the server would pass the following path to the top-level -handler: \ex{("h"\ob{} "shivers"\ob{} "code"\ob{} - "web.\ob{}tar.\ob{}gz")} - -The \semvar{path} argument's pre-parsed representation as a string -list makes it easy for the path handler to implement recursive -operations dispatch on URL paths. + These procedures inspect request values. \ex{Request?} is a predicate + for requests. \ex{Request-method} extracts the method of the HTTP + request; it's a string such as \verb|"GET"| or \verb|"PUT"|. + \ex{Request-uri} returns the escaped URI string as read from the request + line. \ex{Request-url} returns an HTTP URL value (see the + description of the \ex{url} structure in \ref{secchap:url}). + \ex{Request-version} returns a \verb|(major . 
minor)| integer pair + representing the version specified in the HTTP request. + \ex{Request-headers} returns an association list of header field + names and their values, each represented by a list of strings, one + for each line. \ex{Request-socket} returns the socket connected + to the client.\footnote{Request handlers should not perform I/O on the + request record's socket. Request handlers are frequently called + recursively, and doing I/O directly to the socket might bypass a + filtering or other processing step interposed on the current I/O ports + by some superior request handler.} \end{desc} - -Path handlers can do anything they like to respond to HTTP requests; -they have the full range of Scheme to implement the desired -functionality. When handling HTTP requests that have an associated -entity body (such as POST), the body should be read from the current -input port. Path handlers should in all cases write their reply to the -current output port. Path handlers should not perform I/O on the -request record's socket. Path handlers are frequently called -recursively, and doing I/O directly to the socket might bypass a -filtering or other processing step interposed on the current I/O ports -by some superior path handler. -\section{Basic path handlers} - -Although the user can write any path-handler he likes, the SUnet web server -comes with a useful toolbox of basic path handlers that can be used -and built upon (exported by the \ex{httpd\=basic\=handlers}-structure): - -\begin{defundesc}{alist-path-dispatcher}{ph-alist default-ph}{path-handler} - This procedure takes a \ex{string->\ob{}path\=handler} alist, and a - default path handler, and returns a handler that dispatches on its - path argument. When the new path handler is applied to a path - \ex{("foo"\ob{} "bar"\ob{} "baz")}, it uses the first element of - the path -- ``\ex{foo}'' -- to index into the alist. 
If it finds an - associated path handler in the alist, it hands the request off to - that handler, passing it the tail of the path, \ex{("bar"\ob{} - "baz")}. On the other hand, if the path is empty, or the alist - search does not yield a hit, we hand off to the default path - handler, passing it the entire original path, \ex{("foo"\ob{} - "bar"\ob{} "baz")}. - +\section{Responses} +\label{sec:http-responses} + +A request handler must return a \textit{response} value representing the +content to be sent to the client. The machinery presented here for +constructing responses lives in the \ex{httpd-responses} structure. + +\defun{make-response}{status-code maybe-message seconds mime extras + body}{response} +\begin{desc} + This procedure constructs a response value. \var{Status-code} is an + HTTP status code (more on that below). \var{Maybe-message} is a + message elaborating on the circumstances of the status code; it can + also be \sharpf{} meaning that the server should send a default + message associated with the status code. \var{Seconds} is a natural + number indicating the time the content was created, typically the + value of \verb|(time)|. \var{Mime} is a string indicating the MIME + type of the response (such as \verb|"text/html"| or + \verb|"application/octet-stream"|). \var{Extras} is an association + list with extra headers to be added to the response; its elements + are pairs, each of which consists of a symbol representing the field + name and a string representing the field value. \var{Body} + represents the body of the response; more on that below. +\end{desc} + +\defun{make-error-response}{status-code request [message] extras \ldots}{response} +\begin{desc} + This is a helper procedure for constructing error responses. + \var{Status-code} is the status code of the response (see below). \var{Request} + is the request that led to the error. 
\var{Message} is an optional + string containing an error message written in HTML, and \var{extras} + are further optional arguments containing further message lines to + be added to the web page that's generated. + + \ex{Make-error-response} constructs a response value which generates + a web page containing a short explanatory message for the error at hand. +\end{desc} + +\begin{table}[htb] + \centering + \begin{tabular}{|l|l|l|} + \hline + ok & 200 & OK\\\hline + created & 201 & Created\\\hline + accepted & 202 & Accepted\\\hline + prov-info & 203 & Provisional Information\\\hline + no-content & 204 & No Content\\\hline + + mult-choice & 300 & Multiple Choices\\\hline + moved-perm & 301 & Moved Permanently\\\hline + moved-temp & 302 & Moved Temporarily\\\hline + method & 303 & Method (obsolete)\\\hline + not-mod & 304 & Not Modified\\\hline + + bad-request & 400 & Bad Request\\\hline + unauthorized & 401 & Unauthorized\\\hline + payment-req & 402 & Payment Required\\\hline + forbidden & 403 & Forbidden\\\hline + not-found & 404 & Not Found\\\hline + method-not-allowed & 405 & Method Not Allowed\\\hline + none-acceptable & 406 & None Acceptable\\\hline + proxy-auth-required & 407 & Proxy Authentication Required\\\hline + timeout & 408 & Request Timeout\\\hline + conflict & 409 & Conflict\\\hline + gone & 410 & Gone\\\hline + internal-error & 500 & Internal Server Error\\\hline + not-implemented & 501 & Not Implemented\\\hline + bad-gateway & 502 & Bad Gateway\\\hline + service-unavailable & 503 & Service Unavailable\\\hline + gateway-timeout & 504 & Gateway Timeout\\\hline + \end{tabular} + \caption{HTTP status codes} + \label{tab:status-code-names} +\end{table} + +\dfn{status-code}{\synvar{name}}{status-code}{syntax} +\defunx{name->status-code}{symbol}{status-code} +\defunx{status-code-number}{status-code}{integer} +\defunx{status-code-message}{status-code}{string} +\begin{desc} + The \ex{status-code} syntax returns a status code where + \synvar{name} is the name 
from Table~\ref{tab:status-code-names}. + \ex{Name->status-code} also returns a status code for a name + represented as a symbol. For a given status code, + \ex{status-code-number} extracts its number, and + \ex{status-code-message} extracts its associated default message. +\end{desc} + +\section{Request Handlers} + +A request handler generates the actual content for a request; request +handlers form a simple algebra and may be combined and composed in +various ways. + + +A request handler is a procedure of two arguments like this: +\defun{request-handler}{path req}{response} +\begin{desc} + \var{Req} is a request. The \semvar{path} argument is the URL's + path, parsed and split at slashes into a string list. For example, + if the Web client dereferences URL + % +\begin{verbatim} +http://clark.lcs.mit.edu:8001/h/shivers/code/web.tar.gz +\end{verbatim} + then the server would pass the following path to the top-level + handler: + % +\begin{verbatim} +("h" "shivers" "code" "web.tar.gz") +\end{verbatim} + % + The \var{path} argument's pre-parsed representation as a string + list makes it easy for the request handler to dispatch + recursively on URL paths. + + The request handler must return an HTTP response. +\end{desc} + +\subsection{Basic Request Handlers} + +The web server comes with a useful toolbox of basic request handlers +that can be used and built upon. The following procedures are +exported by the \ex{httpd\=basic\=handlers} structure: + +\defvar{null-request-handler}{request-handler} +\begin{desc} + This request handler always generates a \ex{not-found} error + response, no matter what the request is. 
+\end{desc} + +\defun{make-predicate-handler}{predicate handler + default-handler}{request-handler} +\begin{desc} + The request handler returned by this procedure first calls + \var{predicate} on its path and request; it then acts like + \var{handler} if the predicate returned a true value, and like + \var{default-handler} if the predicate returned \sharpf. +\end{desc} + +\defun{make-host-name-handler}{hostname handler default-handler}{request-handler} +\begin{desc} + The request handler returned by this procedure compares the host + name specified in the request with \var{hostname}: if they match, it + acts like \var{handler}, otherwise, it acts like + \var{default-handler}. +\end{desc} + +\defun{make-path-predicate-handler}{predicate handler + default-handler}{request-handler} +\begin{desc} + The request handler returned by this procedure first calls + \var{predicate} on its path; it then acts like \var{handler} if the + predicate returned a true value, and like \var{default-handler} if + the predicate returned \sharpf. +\end{desc} + +\defun{make-path-prefix-handler}{path-prefix handler default-handler}{request-handler} +\begin{desc} + This constructs a request handler that checks whether + \var{path-prefix} (a string) is the first element of the + requested path; if so, it calls \var{handler} on the rest of the path and + the original request. Otherwise, the handler acts like + \var{default-handler}. +\end{desc} + +\defun{alist-path-dispatcher}{handler-alist default-handler}{request-handler} +\begin{desc} + This procedure takes as arguments an alist mapping strings to request + handlers, and a default request handler, and returns a handler that + dispatches on its path argument. When the new request handler is + applied to a path +\begin{verbatim} +("foo" "bar" "baz") +\end{verbatim} + it uses the + first element of the path---\ex{foo}---to index into the + alist. 
If it finds an associated request handler in the alist, it + hands the request off to that handler, passing it the tail of the + path, in this case +\begin{verbatim} +("bar" "baz") +\end{verbatim} + % + On the other hand, if the path is + empty, or the alist search does not yield a hit, we hand off to the + default request handler, passing it the entire original path, +\begin{verbatim} +("foo" "bar" "baz") +\end{verbatim} + % This procedure is how you say: ``If the first element of the URL's - path is `foo', do X; if it's `bar', do Y; otherwise, do Z.'' If one - takes an object-oriented view of the process, an alist path-handler - does method lookup on the requested operation, dispatching off to - the appropriate method defined for the URL. - + path is `foo', do X; if it's `bar', do Y; otherwise, do Z.'' The slash-delimited URI path structure implies an associated tree of - names. The path-handler system and the alist dispatcher allow you to + names. The request-handler system and the alist dispatcher allow you to procedurally define the server's response to any arbitrary subtree of the path space. - Example: A typical top-level path handler is + Example: A typical top-level request handler is \begin{alltt} (define ph (alist-path-dispatcher @@ -265,9 +420,9 @@ and built upon (exported by the \ex{httpd\=basic\=handlers}-structure): \item If the path looks like \ex{("h"\ob{} "shivers"\ob{} "code"\ob{} "web.\ob{}tar.\ob{}gz")}, pass the path \ex{("shivers"\ob{} "code"\ob{} "web.\ob{}tar.\ob{}gz")} to a - home-directory path handler. + home-directory request handler. \item If the path looks like \ex{("cgi-\ob{}bin"\ob{} "calendar")}, - pass ("calendar") off to the CGI path handler. + pass ("calendar") off to the CGI request handler. \item If the path looks like \ex{("seval"\ob{} \ldots)}, the tail of the path is passed off to the code-uploading seval path handler. 
@@ -276,133 +431,21 @@ and built upon (exported by the \ex{httpd\=basic\=handlers}-structure): \ex{/usr/\ob{}lo\ob{}cal/\ob{}etc/\ob{}httpd/\ob{}htdocs}, and serve that file. \end{itemize} -\end{defundesc} - -\begin{defundesc}{home-dir-handler}{subdir}{path-handler} - This procedure builds a path handler that does basic file serving - out of home directories. If the resulting \semvar{path-handler} is - passed a path of \ex{(user . file\=path)}, then it serves the file - \ex{user's\=ho\ob{}me\=di\ob{}rec\ob{}to\ob{}ry/\ob{}sub\ob{}dir/\ob{}file\=path} - - The path handler only handles GET requests; the filename is not - allowed to contain \ex{..} elements. -\end{defundesc} - -\begin{defundesc}{tilde-home-dir-handler}{subdir default-path-handler}{path-handler} - This path handler examines the car of the path. If it is a string - beginning with a tilde, e.g., \ex{"~ziggy"}, then the string is - taken to mean a home directory, and the request is served similarly - to a home-dir-handler path handler. Otherwise, the request is passed - off in its entirety to the \semvar{default-path-handler}. - - This procedure is useful for implementing servers that provide the - semantics of the NCSA httpd server. -\end{defundesc} - -\begin{defundesc}{cgi-handler}{cgi-directory}{path-handler} - This procedure returns a path-handler that passes the request off to - some program using the CGI interface. The script name is taken from - the car of the path; it is checked for occurrences of \ex{..}'s. If - the path is \ex{("my\=prog"\ob{} "foo"\ob{} "bar")} then the - program executed is - \ex{cgi\=di\ob{}rec\ob{}to\ob{}ry\ob{}my\=prog}. +\end{desc} - When the CGI path handler builds the process environment for the CGI - script, several elements (e.g., \ex{\$PATH and \$SERVER\_SOFTWARE}) are request-invariant, and can be - computed at server start-up time. This can be done by calling - \codex{(initialise-request-invariant-cgi-env)} - when the server starts up. 
This is not necessary, but will make CGI - requests a little faster. -\end{defundesc} - -\begin{defundesc}{rooted-file-handler}{root-dir}{path-handler} - Returns a path handler that serves files from a particular root in - the file system. Only the GET operation is provided. The path - argument passed to the handler is converted into a filename, and - appended to root-dir. The file name is checked for \ex{..} - components, and the transaction is aborted if it does. Otherwise, - the file is served to the client. -\end{defundesc} +\subsection{Static Content Request Handlers} -\begin{defundesc}{rooted-file-or-directory-handler}{root -icon-name}{path-handler} +The request handlers described in this section are for serving static +content off directory trees in the file system. They live in the +\ex{httpd-file-directory-handlers} structure. -Dito, but also serve directory indices for directories without -\ex{index.\ob{}html}. \semvar{icon-name} specifies how to generate -the links to various decorative icons for the listings. It can either -be a procedure which gets passed one of the icon tags listed below and -is expected to return a link pointing to the icon. If it is a string, -that is taken as prefix to which the file names of the tags listed -below are appended. - -\begin{tabular}{ll} -Tag & Icon's file name \\ -\hline -\ex{directory} & \ex{directory.xbm}\\ -\ex{text} & \ex{text.xbm}\\ -\ex{doc} & \ex{doc.xbm}\\ -\ex{image} & \ex{image.xbm}\\ -\ex{movie} & \ex{movie.xbm}\\ -\ex{audio} & \ex{sound.xbm}\\ -\ex{archive} & \ex{tar.xbm}\\ -\ex{compressed} & \ex{compressed.xbm}\\ -\ex{uu} & \ex{uu.xbm}\\ -\ex{binhex} & \ex{binhex.xbm}\\ -\ex{binary} & \ex{binary.xbm}\\ -\ex{blank} & \ex{blank.xbm}\\ -\ex{back} & \ex{back.xbm}\\ -\ex{\it{}else} & \ex{unknown.xbm}\\ -\end{tabular} -\end{defundesc} - -\begin{defundesc}{null-path-handler}{path req}{\noreturn} - This path handler is useful as a default handler. 
It handles no - requests, always returning a ``404 Not found'' reply to the client. -\end{defundesc} - -\section{HTTP errors} - -Authors of path-handlers need to be able to handle errors in a -reasonably simple fashion. The SUnet Web server provides a set of error -conditions that correspond to the error replies in the HTTP protocol. -These errors can be raised with the \ex{http\=error} procedure. When -the server runs a path handler, it runs it in the context of an error -handler that catches these errors, sends an error reply to the client, -and closes the transaction. - -\begin{defundesc}{http-error}{reply-code req \ovar{extra \ldots}}{\noreturn} - This raises an http error condition. The reply code is one of the - numeric HTTP error reply codes, which are bound to the variables - \ex{http\=re\ob{}ply/\ob{}ok, http\=re\ob{}ply/\ob{}not\=found, - http\=re\ob{}ply/\ob{}bad\=request}, and so forth. The - \semvar{req} argument is the request record that caused the error. - Any following extra args are passed along for informational - purposes. Different HTTP errors take different types of extra - arguments. For example, the ``301 moved permanently'' and ``302 - moved temporarily'' replies use the first two extra values as the - \ex{URI:} and \ex{Lo\-ca\-tion:} fields in the reply header, - respectively. See the clauses of the - \ex{send\=http\=er\ob{}ror\=re\ob{}ply} procedure for details. -\end{defundesc} - -\begin{defundesc}{send-http-error-reply}{reply-code request \ovar{extra \ldots}}{\noreturn} - This procedure writes an error reply out to the current output port. - If an error occurs during this process, it is caught, and the - procedure silently returns. The http server's standard error handler - passes all http errors raised during path-handler execution to this - procedure to generate the error reply before aborting the request - transaction. 
-\end{defundesc} - -\section{Simple directory generation} - -Most path-handlers that serve files to clients eventually call an -internal procedure named \ex{file\=serve}, which implements a simple -directory-generation service using the following rules: +The request handlers in this section eventually call an internal +procedure named \ex{file\=serve} for serving files, which implements a +simple directory-generation service using the following rules: \begin{itemize} \item If the filename has the form of a directory (i.e., it ends with a slash), then \ex{file\=serve} actually looks for a file named - ``index.html'' in that directory. + \ex{index.html} in that directory. \item If the filename names a directory, but is not in directory form (i.e., it doesn't end in a slash, as in ``\ex{/usr\ob{}in\ob{}clu\ob{}de}'' or ``\ex{/usr\ob{}raj}''), @@ -415,49 +458,74 @@ directory-generation service using the following rules: \item If the filename names a regular file, it is served to the client. \end{itemize} + +\defun{rooted-file-handler}{root-dir}{request-handler} +\begin{desc} + This returns a request handler that serves files from a particular + root in the file system. Only the \ex{GET} operation is provided. + The path argument passed to the handler is converted into a + filename, and appended to root-dir. The file name is checked for + \ex{..} components, and the transaction is aborted if it contains any. + Otherwise, the file is served to the client. +\end{desc} + +\defun{rooted-file-or-directory-handler}{root}{request-handler} +\begin{desc} +Ditto, but also serves directory indices for directories without +\ex{index.html}. +\end{desc} + +\defun{home-dir-handler}{subdir}{request-handler} +\begin{desc} + This procedure builds a request handler that does basic file serving + out of home directories. If the resulting \var{request-handler} is + passed a path of the form \ex{(\var{user} .
\var{file-path})}, then it serves the file + \ex{\var{subdir}/\var{file-path}} inside the user's home directory. + + The request handler only handles GET requests; the filename is not + allowed to contain \ex{..} elements. +\end{desc} + +\defun{tilde-home-dir-handler}{subdir + default-request-handler}{request-handler} +\begin{desc} + This returns a request handler that examines the car of the path. If + it is a string beginning with a tilde, e.g., \ex{"~ziggy"}, then the + string is taken to mean a home directory, and the request is served + similarly to a home-dir-handler request handler. Otherwise, the + request is passed off in its entirety to the + \var{default-request-handler}. +\end{desc} \section{CGI Server} -\begin{defundesc}{cgi-handler}{bin-dir \ovar{cgi-bin-dir}}{path-handler} - Returns a path handler (see \ref{httpd:path-handlers} for details - about path handlers) for cgi-scripts located in - \semvar{bin-dir}. \semvar{cgi-bin-dir} specifies the value of the -\ex{PATH} variable of the environment the cgi-scripts run in. It defaults -to -``\ex{/bin:\ob{}/usr/bin:\ob{}/usr/ucb:\ob{}/usr/bsd:\ob{}/usr/local/bin}'' -but is overwritten by the current \ex{PATH} environment variable at -the time \ex{cgi-handler} ist called. The cgi-scripts are called as -specified by CGI/1.1\footnote{see -\ex{http://hoohoo.ncsa.uiuc.edu/cgi/interface.html} for a sort of -specification.}. - -\begin{itemize} -\item Various environment variables are set (like - \ex{QUERY\_STRING} or \ex{REMOTE\_HOST}). -\item ISINDEX queries get their arguments as command line arguments. -\item Scripts are handled differently according to their name: +\defun{cgi-handler}{bin-dir [cgi-bin-path]}{request-handler} +\begin{desc} + Returns a request handler for CGI scripts located in + \var{bin-dir}. \var{Cgi-bin-path} specifies the value of the + \ex{PATH} variable of the environment the CGI scripts run in.
It defaults + to +\begin{verbatim} +/bin:/usr/bin:/usr/ucb:/usr/bsd:/usr/local/bin +\end{verbatim} + The CGI scripts are called as specified by CGI/1.1\footnote{see + \url{http://hoohoo.ncsa.uiuc.edu/cgi/interface.html} for a sort of + specification.}. + Note that the CGI handler looks at the name of the CGI script to + determine how it should be handled: \begin{itemize} - - \item If the name of the script starts with `\ex{nph-}', its reply - is read, the RFC~822-fields like ``Content-Type'' and ``Status'' + \item If the name of the script starts with `\ex{nph-}', its reply + is read, the RFC~822-fields like \ex{Content-Type} and \ex{Status} are parsed and the client is sent back a real HTTP reply, - containing the rest of the script's output. - - \item If the name of the script doesn't start with `\ex{nph-}', + containing the rest of the script's output. + + \item If the name of the script doesn't start with `\ex{nph-}', its output is sent back to the client directly. If its return code is not zero, an error message is generated. - \end{itemize} -\end{itemize} -\end{defundesc} +\end{desc} -\section{Support procs} - -The source files contain a host of support procedures which will be of -utility to anyone writing a custom path-handler. Read the files first. -\FIXME{Let us read the files and paste the contents here.} - %%% Local Variables: %%% mode: latex %%% TeX-master: "man"