First stab at rudimentary up-to-date documentation for the HTTP server.
sperber 2003-01-10 13:27:05 +00:00
parent 9eeb665323
commit 020f8264f1
1 changed files with 442 additions and 374 deletions


@ -1,256 +1,411 @@
\chapter{HTTP server}\label{cha:httpd} \chapter{HTTP server}\label{cha:httpd}
% %
\begin{description} The SUnet HTTP Server is a complete industrial-strength implementation
\item[Used files:] httpd/core.scm, httpd/handlers.scm, httpd/options.scm, of the HTTP 1.0 protocol. It is highly configurable and allows the writing
\item[Name of the packages:] httpd-core, httpd-basic-handler, httpd-make-options of dynamic web pages that run inside the server without going through
\end{description} complicated and slow protocols like CGI or Fast/CGI.
There are also some other files and packages that are used internally.
%
The SUnet web system is a collection of packages of \Scheme code that
provides utilities for interacting with the World-Wide Web. This
includes:
\begin{itemize}
\item A Web server.
\item URI and URL parsers and un-parsers (see Chapters \ref{cha:uri}
and \ref{cha:url}).
\item RFC822-style header parsers (see Chapter \ref{cha:rfc822}).
\item Code for performing structured html output
\item Code to assist in writing CGI \Scheme programs that can be used by
any CGI-compliant HTTP server (such as NCSA's httpd, or the SUnet
Web server).
\end{itemize}
The server has three main design goals: \section{Starting and configuring the server}
\begin{description}
\item[Extensibility]
The server is in fact nothing but extensions, using a mechanism
called ``path handlers'' to define URL-specific services. It has a
toolkit of services that can be used as-is, extended or built
upon. User extensions have exactly the same status as the base
services.
The extension mechanism allows for easy implementation of new
services without the overhead of the CGI interface. Since the
server is written on top of the Scheme shell, the full set of Unix
system calls and program tools is available to the implementor.
\item[Mobile code]
The server allows Scheme code to be uploaded for direct execution
inside the server. The server has complete control over the code,
and can safely execute it in restricted environments that do not
provide access to potentially dangerous primitives (such as the
``delete file'' procedure.)
\item[Clarity]
I\footnote{That's Olin Shivers (\ex{shivers@ai.mit.edu},
\ex{http://www.\ob{}ai.\ob{}mit.\ob{}edu/\ob{}people/\ob{}shivers/}).
For the rest of the documentation, if not mentioned otherwise,
`I' refers to him.} wrote this server to help myself understand
the Web. It is voluminously commented, and I hope it will prove to
be an aid in understanding the low-level details of the Web
protocols.
The SUnet web server has the ability to upload code from Web clients
and execute that code on behalf of the client in a protected
environment.
Some simple documentation on the server is available.
\end{description}
\section{Basic server structure} All procedures described in this section are exported by the
\texttt{httpd} structure.
The Web server is started by calling the httpd procedure, which takes
one argument, a \ex{httpd\=options}-record: The Web server is started by calling the \ex{httpd} procedure, which takes
one argument, an options value:
\defun{httpd}{options}{\noreturn} \defun{httpd}{options}{\noreturn}
\begin{desc} \begin{desc}
This procedure starts the server. The various \semvar{options} can This procedure starts the server. The \var{options} argument
be set via the options transformers that are explained below. specifies various configuration parameters, explained below.
The server's basic loop is to wait on the port for a connection from The server's basic loop is to wait on the port for a connection from
an HTTP client. When it receives a connection, it reads in and an HTTP client. When it receives a connection, it reads in and
parses the request into a special request data structure. Then the parses the request into a special request data structure. Then the
server forks a thread, who binds the current I/O ports to the server forks a thread which binds the current I/O ports to the
connection socket, and then hands off to the top-level connection socket, and then hands off to the top-level
\semvar{path-handler} (the first argument to httpd). The request handler (which must be specified in the options). The
\semvar{path-handler} procedure is responsible for actually serving request handler is responsible for actually serving
the request -- it can be any arbitrary computation. Its output goes the request---it can be any arbitrary computation. Its output goes
directly back to the HTTP client that sent the request. directly back to the HTTP client that sent the request.
Before calling the path handler to service the request, the HTTP Before calling the request handler to service the request, the HTTP
server installs an error handler that fields any uncaught error, server installs an error handler that fields any uncaught error,
sends an error reply to the client, and aborts the request sends an error reply to the client, and aborts the request
transaction. Hence any error caused by a path-handler will be transaction. Hence any error caused by a request handler will be
handled in a reasonable and robust fashion. handled in a reasonable and robust fashion.
\end{desc}
The basic server loop, and the associated request data structure are %
the fixed architecture of the SUnet Web server; its flexibility lies The options argument can be constructed through a number of procedures
in the notion of path handlers. with names of the form \texttt{with-\ldots}. Each of these procedures
either creates a fresh options value or adds a configuration parameter
to an old options argument. The configuration parameter value is
always the first argument, the (old) options value the optional second
one. Here they are:
\defun{with-port}{port [options]}{options}
\begin{desc}
This specifies the port on which the server listens. Defaults to 80.
\end{desc} \end{desc}
\defun{with-port}{port \ovar{options}}{options} \defun{with-root-directory}{root-directory [options]}{options}
\defunx{with-root-directory}{root-directory
\ovar{options}}{options}
\defunx{with-fqdn}{fqdn \ovar{options}}{options}
\defunx{with-reported-port}{reported-port
\ovar{options}}{options}
\defunx{with-path-handler}{path-handler
\ovar{options}}{options}
\defunx{with-server-admin}{mail-address
\ovar{options}}{options}
\defunx{with-simultaneous-requests}{requests
\ovar{options}}{options}
\defunx{with-logfile}{logfile \ovar{options}}{options}
\defunx{with-syslog?}{syslog? \ovar{options}}{options}
\defunx{with-resolve-ip?}{resolve-ip? \ovar{options}}{options}
\begin{desc} \begin{desc}
As noted above, these transformers set the options for the web This specifies the current directory of the server. Note that this
server. Every transformer changes one aspect of the is \emph{not} the document root directory. Defaults to \texttt{/}.
\semvar{options} (for the \ex{httpd}). If this optional argument is missing, the \end{desc}
default values are used. These are the following:
\begin{tabular}{ll} \defun{with-fqdn}{fqdn [options]}{options}
\bf{transformer} & \bf{default value} \\ \begin{desc}
\hline This specifies the fully-qualified domain name the server uses in
\ex{with\=port} & 80 \\ automatically generated replies, or \ex{\#f} if the server should
\ex{with\=root\=directory} & ``\ex{/}'' \\ query DNS for the fully-qualified domain name. Defaults to \ex{\#f}.
\ex{with\=fqdn} & \sharpf \\ \end{desc}
\ex{with\=reported-port} & \sharpf \\
\ex{with\=path\=handler} & \sharpf \\
\ex{with\=server\=admin} & \sharpf \\
\ex{with\=simultaneous\=requests} & \sharpf \\
\ex{with\=logfile} & ``\ex{/logfile.log}''\\
\ex{with\=syslog?} & \sharpt \\
\ex{with\=resolve\=ip?} & \sharpt
\end{tabular}
% that can be found in the \ex{httpd\=make\=options}-structure: \defun{with-reported-port}{reported-port [options]}{options}
% \ex{with\=port}, \ex{with\=root\=directory}, \ex{with\=fqdn}, \begin{desc}
% \ex{with\=reported-port}, \ex{with\=path\=handler}, This specifies the port number the server uses in automatically
% \ex{with\=server\=admin}, \ex{with\=simultaneous-requests}, generated replies or \ex{\#f} if the reported port is the same as
% \ex{with\=logfile}, \ex{with\=syslog?} that set the port the server the port the server is listening on. (This is useful if you're
% is listening to, the root-directory of the server, the FQDN of the running the server through an accelerating proxy.) Defaults to
% server, the port the server assumes it is listening to, the \ex{\#f}.
% path-handler of the server (see below), the mail-address of the \end{desc}
% server-admin, the maximum number of simultaneous handled requests,
% the name of the file or the port logging in the Common Log Format
% (CLF) is output to and if the server shall create syslog messages,
% respectively. The port defaults to 80, the root directory defaults
% to ``\ex{/}'', the mail address of the server-admin defaults to
% ``\ex{sperber@\ob{}informatik.\ob{}uni\=tuebingen.\ob{}de}'',
% \FIXME{Why does the server admin mail address have
% sperber@informatik... as default value?}logging is done to
% ``\ex{httpd.log}'' and syslog is enabled. All other options default
% to \sharpf.
For example \defun{with-server-admin}{mail-address [options]}{options}
\begin{alltt} \begin{desc}
(httpd (with-path-handler This specifies the email address of the server administrator the
(rooted-file-handler "/usr/local/etc/httpd") server uses in automatically generated replies. Defaults to \ex{\#f}.
(with-root-directory "/usr/local/etc/httpd"))) \end{desc}
\end{alltt}
\defun{with-icon-name}{icon-name [options]}{options}
\begin{desc}
This specifies how to generate the links to various decorative icons
for the listings. It can be a procedure, which is passed an
icon tag (a symbol) and is expected to return a link pointing to the
icon. If it is a string, it is taken as a prefix to which the icon
tag is appended. If it is \ex{\#f}, just the plain file names are
used. Defaults to \ex{\#f}.
The valid icon tags, together with the default names of their icon
files, are:
\begin{center}
\begin{tabular}{|l|l|}
\hline
\texttt{directory} & \texttt{directory.xbm}\\\hline
\texttt{text} & \texttt{text.xbm}\\\hline
\texttt{doc} & \texttt{doc.xbm}\\\hline
\texttt{image} & \texttt{image.xbm}\\\hline
\texttt{movie} & \texttt{movie.xbm}\\\hline
\texttt{audio} & \texttt{sound.xbm}\\\hline
\texttt{archive} & \texttt{tar.xbm}\\\hline
\texttt{compressed} & \texttt{compressed.xbm}\\\hline
\texttt{uu} & \texttt{uu.xbm}\\\hline
\texttt{binhex} & \texttt{binhex.xbm}\\\hline
\texttt{binary} & \texttt{binary.xbm}\\\hline
\texttt{blank} & \texttt{blank.xbm}\\\hline
\texttt{back} & \texttt{back.xbm}\\\hline
unknown & \texttt{unknown.xbm}\\\hline
\end{tabular}
Example icons can be found as part of the CERN httpd distribution
at \url{http://www.w3.org/pub/WWW/Daemon/}.
\end{center}
\end{desc}
\defun{with-request-handler}{request-handler [options]}{options}
\begin{desc}
This specifies the request handler of the server to which the server
delegates the actual work. More on that subject below in
Section~\ref{httpd:request-handlers}. This parameter must be specified.
\end{desc}
\defun{with-simultaneous-requests}{requests [options]}{options}
\begin{desc}
This specifies a limit on the number of simultaneous requests the
server serves. If that limit is exceeded during operation, the
server will hold off on new requests until the number of
simultaneous requests has sunk below the limit again. If this
parameter is \ex{\#f}, no limit is imposed. Defaults to \ex{\#f}.
\end{desc}
\defun{with-logfile}{logfile [options]}{options}
\begin{desc}
This specifies the name of a log file for the server where it writes
Common Log Format logging information. It can also be a port in
which case the information is logged to that port, or \ex{\#f} for
no logging. Defaults to \ex{\#f}.
To allow rotation of logfiles, the server re-opens the logfile
whenever it receives a \texttt{USR1} signal.
\end{desc}
\defun{with-syslog?}{syslog? [options]}{options}
\begin{desc}
This specifies whether the server will log information about
incoming requests to the Unix syslog facility. Defaults to \ex{\#t}.
\end{desc}
\defun{with-resolve-ip?}{resolve-ip? [options]}{options}
\begin{desc}
This specifies whether the server writes the domain names rather
than numerical IPs to the output log it produces. Defaults to
\ex{\#t}.
\end{desc}
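
Since each transformer takes the old options value as its optional
second argument, the calls nest. The following is a minimal sketch of
that style, not a recommended setup; the port number and the directory
are illustrative values only:
\begin{alltt}
;; Sketch: build the options by nesting transformers.  The innermost
;; call starts from the defaults; each outer call adds one parameter.
(httpd (with-port 8080
         (with-request-handler
           (rooted-file-handler "/usr/local/etc/httpd")
           (with-root-directory "/usr/local/etc/httpd"))))
\end{alltt}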
To avoid deeply nested parentheses, the \ex{make-httpd-options} procedure eases the
construction of the options argument:
\defun{make-httpd-options}{transformer value \ldots}{options}
\begin{desc}
This constructs an options value from an argument list of parameter
transformers and parameter values. The arguments come in pairs,
each an option transformer from the list above, and a value for that
parameter. \ex{Make-httpd-options} returns the resulting options value.
\end{desc}
For example,
\begin{alltt}
(httpd (make-httpd-options
with-request-handler (rooted-file-handler "/usr/local/etc/httpd")
with-root-directory "/usr/local/etc/httpd"))
\end{alltt}
%
starts the server on port 80 with starts the server on port 80 with
``\ex{/usr/\ob{}local/\ob{}etc/\ob{}httpd}'' as root directory and \ex{/usr/local/etc/httpd} as its root directory and
lets it serve any file out from this directory. lets it serve any file out from this directory.
\ex{rooted\=file\=handler} creates a path handler and is explained % #### note about rooted-file-handler
below. You see, the transformers are used nested. So, every
transformer changes one aspect of the options that the following
transformer returns and the last transformer (here:
\ex{with\=root\=directory}) changes an aspect of the default values
\semvar{port} is the port the server is listening to,
\semvar{root-directory} is the directory in the file system the
server uses as root, \semvar{fqdn} is the fully qualified domain
name the server reports, \semvar{reported-port} is the port the
server reports it is listening to and \semvar{server-admin} is the
mail address of the server admin. \semvar{requests} denote the
maximum number of allowed simultaneous requests to the server.
\sharpf\ means infinite. \semvar{logfile} is either a string, then
it is the file name of the logfile, or a port, where the log entries
are written to, or \sharpf, that means no logging is made. The
logfile is in Common Log Format (CLF). To allow rotation of
logfiles, the server will reopen the logfile when it receives the
signal \texttt{USR1}. \semvar{syslog?} tells the server to write
syslog messages (\sharpt) or not (\sharpf).
\end{desc}
\section{Path handlers} \section{Requests}
\label{httpd:path-handlers} \label{httpd:requests}
A path handler is a procedure taking two arguments: Request handlers operate on \textit{requests} which contain the
\defun{path-handler}{path req}{value} information needed to generate a page. The relevant procedures to
dissect requests are defined in the \texttt{httpd-requests} structure:
\defun{request?}{value}{boolean}
\defunx{request-method}{request}{string}
\defunx{request-uri}{request}{string}
\defunx{request-url}{request}{url}
\defunx{request-version}{request}{pair}
\defunx{request-headers}{request}{list}
\defunx{request-socket}{request}{socket}
\begin{desc} \begin{desc}
The \semvar{req} argument is a request record giving all the details These procedures inspect request values. \ex{Request?} is a predicate
of the client's request; it has the following structure: \FIXME{Make for requests. \ex{Request-method} extracts the method of the HTTP
the record's structure a table} request; it's a string such as \verb|"GET"| or \verb|"PUT"|.
\begin{alltt} \ex{Request-uri} returns the escaped URI string as read from the request
(define-record request line. \ex{Request-url} returns an HTTP URL value (see the
method ; A string such as "GET", "PUT", etc. description of the \ex{url} structure in \ref{secchap:url}).
uri ; The escaped URI string as read from request line. \ex{Request-version} returns a \verb|(major . minor)| integer pair
url ; An http URL record (see url.scm). representing the version specified in the HTTP request.
version ; A (major . minor) integer pair. \ex{Request-headers} returns an association list of header field
headers ; An rfc822 header alist (see rfc822.scm). names and their values, each represented by a list of strings, one
socket) ; The socket connected to the client. for each line. \ex{Request-socket} returns the socket connected
\end{alltt} to the client.\footnote{Request handlers should not perform I/O on the
request record's socket. Request handlers are frequently called
The \semvar{path} argument is the URL's path, parsed and split at recursively, and doing I/O directly to the socket might bypass a
slashes into a string list. For example, if the Web client filtering or other processing step interposed on the current I/O ports
dereferences URL by some superior request handler.}
\codex{http://\ob{}clark.\ob{}lcs.\ob{}mit.\ob{}edu:\ob{}8001/\ob{}h/\ob{}shi\ob{}vers/\ob{}co\ob{}de/\ob{}web.\ob{}tar.\ob{}gz}
then the server would pass the following path to the top-level
handler: \ex{("h"\ob{} "shivers"\ob{} "code"\ob{}
"web.\ob{}tar.\ob{}gz")}
The \semvar{path} argument's pre-parsed representation as a string
list makes it easy for the path handler to implement recursive
operations dispatch on URL paths.
\end{desc} \end{desc}
Path handlers can do anything they like to respond to HTTP requests;
they have the full range of Scheme to implement the desired
functionality. When handling HTTP requests that have an associated
entity body (such as POST), the body should be read from the current
input port. Path handlers should in all cases write their reply to the
current output port. Path handlers should not perform I/O on the
request record's socket. Path handlers are frequently called
recursively, and doing I/O directly to the socket might bypass a
filtering or other processing step interposed on the current I/O ports
by some superior path handler.
\section{Basic path handlers} \section{Responses}
\label{sec:http-responses}
Although the user can write any path-handler he likes, the SUnet web server
comes with a useful toolbox of basic path handlers that can be used A request handler must return a \textit{response} value representing the
and built upon (exported by the \ex{httpd\=basic\=handlers}-structure): content to be sent to the client. The machinery presented here for
constructing responses lives in the \ex{httpd-responses} structure.
\begin{defundesc}{alist-path-dispatcher}{ph-alist default-ph}{path-handler}
This procedure takes a \ex{string->\ob{}path\=handler} alist, and a \defun{make-response}{status-code maybe-message seconds mime extras
default path handler, and returns a handler that dispatches on its body}{response}
path argument. When the new path handler is applied to a path \begin{desc}
\ex{("foo"\ob{} "bar"\ob{} "baz")}, it uses the first element of This procedure constructs a response value. \var{Status-code} is an
the path -- ``\ex{foo}'' -- to index into the alist. If it finds an HTTP status code (more on that below). \var{Maybe-message} is a
associated path handler in the alist, it hands the request off to message elaborating on the circumstances of the status code; it can
that handler, passing it the tail of the path, \ex{("bar"\ob{} also be \sharpf{}, meaning that the server should send a default
"baz")}. On the other hand, if the path is empty, or the alist message associated with the status code. \var{Seconds} is a natural
search does not yield a hit, we hand off to the default path number indicating the time the content was created, typically the
handler, passing it the entire original path, \ex{("foo"\ob{} value of \verb|(time)|. \var{Mime} is a string indicating the MIME
"bar"\ob{} "baz")}. type of the response (such as \verb|"text/html"| or
\verb|"application/octet-stream"|). \var{Extras} is an association
list with extra headers to be added to the response; its elements
are pairs, each of which consists of a symbol representing the field
name and a string representing the field value. \var{Body}
represents the body of the response; more on that below.
\end{desc}
\defun{make-error-response}{status-code request [message] extras \ldots}{response}
\begin{desc}
This is a helper procedure for constructing error responses.
\var{Status-code} is the status code of the response (see below). \var{Request}
is the request that led to the error. \var{Message} is an optional
string containing an error message written in HTML, and \var{extras}
are further optional arguments containing further message lines to
be added to the web page that's generated.
\ex{Make-error-response} constructs a response value which generates
a web page containing a short explanatory message for the error at hand.
\end{desc}
\begin{table}[htb]
\centering
\begin{tabular}{|l|l|l|}
\hline
Name & Number & Default message\\\hline
ok & 200 & OK\\\hline
created & 201 & Created\\\hline
accepted & 202 & Accepted\\\hline
prov-info & 203 & Provisional Information\\\hline
no-content & 204 & No Content\\\hline
mult-choice & 300 & Multiple Choices\\\hline
moved-perm & 301 & Moved Permanently\\\hline
moved-temp & 302 & Moved Temporarily\\\hline
method & 303 & Method (obsolete)\\\hline
not-mod & 304 & Not Modified\\\hline
bad-request & 400 & Bad Request\\\hline
unauthorized & 401 & Unauthorized\\\hline
payment-req & 402 & Payment Required\\\hline
forbidden & 403 & Forbidden\\\hline
not-found & 404 & Not Found\\\hline
method-not-allowed & 405 & Method Not Allowed\\\hline
none-acceptable & 406 & None Acceptable\\\hline
proxy-auth-required & 407 & Proxy Authentication Required\\\hline
timeout & 408 & Request Timeout\\\hline
conflict & 409 & Conflict\\\hline
gone & 410 & Gone\\\hline
internal-error & 500 & Internal Server Error\\\hline
not-implemented & 501 & Not Implemented\\\hline
bad-gateway & 502 & Bad Gateway\\\hline
service-unavailable & 503 & Service Unavailable\\\hline
gateway-timeout & 504 & Gateway Timeout\\\hline
\end{tabular}
\caption{HTTP status codes}
\label{tab:status-code-names}
\end{table}
\dfn{status-code}{\synvar{name}}{status-code}{syntax}
\defunx{name->status-code}{symbol}{status-code}
\defunx{status-code-number}{status-code}{integer}
\defunx{status-code-message}{status-code}{string}
\begin{desc}
The \ex{status-code} syntax returns a status code where
\synvar{name} is the name from Table~\ref{tab:status-code-names}.
\ex{Name->status-code} also returns a status code for a name
represented as a symbol. For a given status code,
\ex{status-code-number} extracts its number, and
\ex{status-code-message} extracts its associated default message.
\end{desc}
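
As a small illustration, assuming the names and numbers of
Table~\ref{tab:status-code-names}, the following expressions should
yield the number and the default message for \ex{not-found}. The exact
message string is whatever the server associates with the code:
\begin{alltt}
;; Both forms denote the same status code; the first uses the syntax,
;; the second looks the code up from a symbol at run time.
(status-code-number (status-code not-found))          ; 404 in the table
(status-code-message (name->status-code 'not-found))  ; its default message
\end{alltt}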
\section{Request Handlers}
A request handler generates the actual content for a request; request
handlers form a simple algebra and may be combined and composed in
various ways.
A request handler is a procedure of two arguments like this:
\defun{request-handler}{path req}{response}
\begin{desc}
\var{Req} is a request. The \var{path} argument is the URL's
path, parsed and split at slashes into a string list. For example,
if the Web client dereferences the URL
%
\begin{verbatim}
http://clark.lcs.mit.edu:8001/h/shivers/code/web.tar.gz
\end{verbatim}
then the server would pass the following path to the top-level
handler:
%
\begin{verbatim}
("h" "shivers" "code" "web.tar.gz")
\end{verbatim}
%
The \var{path} argument's pre-parsed representation as a string
list makes it easy for the request handler to implement recursive
dispatch on URL paths.
The request handler must return an HTTP response.
\end{desc}
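
The following sketch shows the shape of a request handler. It wraps
another handler and answers every request whose method is not
\verb|"GET"| with an error response. The name \ex{get-only} is
hypothetical and not part of the server; \ex{request-method},
\ex{make-error-response} and \ex{status-code} are used as described in
the preceding sections:
\begin{alltt}
;; get-only is a hypothetical helper, not part of SUnet.
;; It returns a request handler that delegates GET requests to
;; handler and rejects everything else with an error response.
(define (get-only handler)
  (lambda (path req)
    (if (string=? (request-method req) "GET")
        (handler path req)
        (make-error-response (status-code method-not-allowed) req))))
\end{alltt}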
\subsection{Basic Request Handlers}
The web server comes with a useful toolbox of basic request handlers
that can be used and built upon. The following procedures are
exported by the \ex{httpd\=basic\=handlers} structure:
\defvar{null-request-handler}{request-handler}
\begin{desc}
This request handler always generates a \ex{not-found} error
response, no matter what the request is.
\end{desc}
\defun{make-predicate-handler}{predicate handler
default-handler}{request-handler}
\begin{desc}
The request handler returned by this procedure first calls
\var{predicate} on its path and request; it then acts like
\var{handler} if the predicate returned a true value, and like
\var{default-handler} if the predicate returned \sharpf.
\end{desc}
\defun{make-host-name-handler}{hostname handler default-handler}{request-handler}
\begin{desc}
The request handler returned by this procedure compares the host
name specified in the request with \var{hostname}: if they match, it
acts like \var{handler}, otherwise, it acts like
\var{default-handler}.
\end{desc}
\defun{make-path-predicate-handler}{predicate handler
default-handler}{request-handler}
\begin{desc}
The request handler returned by this procedure first calls
\var{predicate} on its path; it then acts like \var{handler} if the
predicate returned a true value, and like \var{default-handler} if
the predicate returned \sharpf.
\end{desc}
\defun{make-path-prefix-handler}{path-prefix handler default-handler}{request-handler}
\begin{desc}
This constructs a request handler that checks whether
\var{path-prefix} (a string) is the first element of the requested
path. If so, it calls \var{handler} on the rest of the path and
the original request. Otherwise, the handler acts like
\var{default-handler}.
\end{desc}
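
Because each of these constructors takes handlers and returns a
handler, they compose. The following is a sketch only: the host name,
the prefix and the directory are made-up values, and
\ex{rooted-file-handler} is the file-serving handler used in the
example of the first section:
\begin{alltt}
;; Serve files below the "docs" prefix for one host name; answer
;; every other request with the not-found response.
(define top-handler
  (make-host-name-handler
    "www.example.org"
    (make-path-prefix-handler
      "docs"
      (rooted-file-handler "/usr/local/etc/httpd/htdocs")
      null-request-handler)
    null-request-handler))
\end{alltt}
With this handler, a request for \verb|/docs/index.html| on that host
would reach the file handler with the path \verb|("index.html")|.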
\defun{alist-path-dispatcher}{handler-alist default-handler}{request-handler}
\begin{desc}
This procedure takes as arguments an alist mapping strings to request
handlers, and a default request handler, and returns a handler that
dispatches on its path argument. When the new request handler is
applied to a path
\begin{verbatim}
("foo" "bar" "baz")
\end{verbatim}
it uses the
first element of the path---\ex{foo}---to index into the
alist. If it finds an associated request handler in the alist, it
hands the request off to that handler, passing it the tail of the
path, in this case
\begin{verbatim}
("bar" "baz")
\end{verbatim}
%
On the other hand, if the path is
empty, or the alist search does not yield a hit, we hand off to the
default request handler, passing it the entire original path,
\begin{verbatim}
("foo" "bar" "baz")
\end{verbatim}
%
This procedure is how you say: ``If the first element of the URL's This procedure is how you say: ``If the first element of the URL's
path is `foo', do X; if it's `bar', do Y; otherwise, do Z.'' If one path is `foo', do X; if it's `bar', do Y; otherwise, do Z.''
takes an object-oriented view of the process, an alist path-handler
does method lookup on the requested operation, dispatching off to
the appropriate method defined for the URL.
The slash-delimited URI path structure implies an associated tree of The slash-delimited URI path structure implies an associated tree of
names. The path-handler system and the alist dispatcher allow you to names. The request-handler system and the alist dispatcher allow you to
procedurally define the server's response to any arbitrary subtree procedurally define the server's response to any arbitrary subtree
of the path space. of the path space.
Example: A typical top-level path handler is Example: A typical top-level request handler is
\begin{alltt} \begin{alltt}
(define ph (define ph
(alist-path-dispatcher (alist-path-dispatcher
@ -265,9 +420,9 @@ and built upon (exported by the \ex{httpd\=basic\=handlers}-structure):
\item If the path looks like \ex{("h"\ob{} "shivers"\ob{} \item If the path looks like \ex{("h"\ob{} "shivers"\ob{}
"code"\ob{} "web.\ob{}tar.\ob{}gz")}, pass the path "code"\ob{} "web.\ob{}tar.\ob{}gz")}, pass the path
\ex{("shivers"\ob{} "code"\ob{} "web.\ob{}tar.\ob{}gz")} to a \ex{("shivers"\ob{} "code"\ob{} "web.\ob{}tar.\ob{}gz")} to a
home-directory path handler. home-directory request handler.
\item If the path looks like \ex{("cgi-\ob{}bin"\ob{} "calendar")}, \item If the path looks like \ex{("cgi-\ob{}bin"\ob{} "calendar")},
pass ("calendar") off to the CGI path handler. pass ("calendar") off to the CGI request handler.
\item If the path looks like \ex{("seval"\ob{} \ldots)}, the tail \item If the path looks like \ex{("seval"\ob{} \ldots)}, the tail
of the path is passed off to the code-uploading seval path of the path is passed off to the code-uploading seval path
handler. handler.
@ -276,133 +431,21 @@ and built upon (exported by the \ex{httpd\=basic\=handlers}-structure):
\ex{/usr/\ob{}lo\ob{}cal/\ob{}etc/\ob{}httpd/\ob{}htdocs}, \ex{/usr/\ob{}lo\ob{}cal/\ob{}etc/\ob{}httpd/\ob{}htdocs},
and serve that file. and serve that file.
\end{itemize} \end{itemize}
\end{defundesc} \end{desc}
\begin{defundesc}{home-dir-handler}{subdir}{path-handler}
This procedure builds a path handler that does basic file serving
out of home directories. If the resulting \semvar{path-handler} is
passed a path of \ex{(user . file\=path)}, then it serves the file
\ex{user's\=ho\ob{}me\=di\ob{}rec\ob{}to\ob{}ry/\ob{}sub\ob{}dir/\ob{}file\=path}
The path handler only handles GET requests; the filename is not
allowed to contain \ex{..} elements.
\end{defundesc}
\begin{defundesc}{tilde-home-dir-handler}{subdir default-path-handler}{path-handler}
This path handler examines the car of the path. If it is a string
beginning with a tilde, e.g., \ex{"~ziggy"}, then the string is
taken to mean a home directory, and the request is served similarly
to a home-dir-handler path handler. Otherwise, the request is passed
off in its entirety to the \semvar{default-path-handler}.
This procedure is useful for implementing servers that provide the
semantics of the NCSA httpd server.
\end{defundesc}
\begin{defundesc}{cgi-handler}{cgi-directory}{path-handler}
This procedure returns a path-handler that passes the request off to
some program using the CGI interface. The script name is taken from
the car of the path; it is checked for occurrences of \ex{..}'s. If
the path is \ex{("my\=prog"\ob{} "foo"\ob{} "bar")} then the
program executed is
\ex{cgi\=di\ob{}rec\ob{}to\ob{}ry\ob{}my\=prog}.
When the CGI path handler builds the process environment for the CGI \subsection{Static Content Request Handlers}
script, several elements (e.g., \ex{\$PATH and \$SERVER\_SOFTWARE}) are request-invariant, and can be
computed at server start-up time. This can be done by calling
\codex{(initialise-request-invariant-cgi-env)}
when the server starts up. This is not necessary, but will make CGI
requests a little faster.
\end{defundesc}
\begin{defundesc}{rooted-file-handler}{root-dir}{path-handler}
Returns a path handler that serves files from a particular root in
the file system. Only the GET operation is provided. The path
argument passed to the handler is converted into a filename, and
appended to root-dir. The file name is checked for \ex{..}
components, and the transaction is aborted if it does. Otherwise,
the file is served to the client.
\end{defundesc}
\begin{defundesc}{rooted-file-or-directory-handler}{root The request handlers described in this section are for serving static
icon-name}{path-handler} content off directory trees in the file system. They live in the
\ex{httpd-file-directory-handlers} structure.
Dito, but also serve directory indices for directories without The request handlers in this section eventually call an internal
\ex{index.\ob{}html}. \semvar{icon-name} specifies how to generate procedure named \ex{file\=serve} for serving files which implements a
the links to various decorative icons for the listings. It can either simple directory-generation service using the following rules:
be a procedure which gets passed one of the icon tags listed below and
is expected to return a link pointing to the icon. If it is a string,
that is taken as prefix to which the file names of the tags listed
below are appended.
\begin{tabular}{ll}
Tag & Icon's file name \\
\hline
\ex{directory} & \ex{directory.xbm}\\
\ex{text} & \ex{text.xbm}\\
\ex{doc} & \ex{doc.xbm}\\
\ex{image} & \ex{image.xbm}\\
\ex{movie} & \ex{movie.xbm}\\
\ex{audio} & \ex{sound.xbm}\\
\ex{archive} & \ex{tar.xbm}\\
\ex{compressed} & \ex{compressed.xbm}\\
\ex{uu} & \ex{uu.xbm}\\
\ex{binhex} & \ex{binhex.xbm}\\
\ex{binary} & \ex{binary.xbm}\\
\ex{blank} & \ex{blank.xbm}\\
\ex{back} & \ex{back.xbm}\\
\ex{\it{}else} & \ex{unknown.xbm}\\
\end{tabular}
\end{defundesc}
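
The two forms of \semvar{icon-name} can be illustrated with a minimal
sketch. It is not taken from the SUnet sources: the names
\ex{icon-file-name} and \ex{resolve-icon} are hypothetical, and the
sketch assumes that icon tags are symbols.

\begin{verbatim}
;; Hypothetical helper: map an icon tag (assumed to be a symbol)
;; to the file name given in the table above.
(define (icon-file-name tag)
  (case tag
    ((audio)   "sound.xbm")
    ((archive) "tar.xbm")
    ((directory text doc image movie compressed
      uu binhex binary blank back)
     (string-append (symbol->string tag) ".xbm"))
    (else "unknown.xbm")))

;; Hypothetical helper: a procedure is called with the tag,
;; a string is used as a prefix for the icon's file name.
(define (resolve-icon icon-name tag)
  (if (procedure? icon-name)
      (icon-name tag)
      (string-append icon-name (icon-file-name tag))))

(resolve-icon "/icons/" 'audio)   ; => "/icons/sound.xbm"
\end{verbatim}

Passing a procedure instead of a string is useful when the icons do
not all live under a common prefix.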
\begin{defundesc}{null-path-handler}{path req}{\noreturn}
This path handler is useful as a default handler. It handles no
requests, always returning a ``404 Not found'' reply to the client.
\end{defundesc}
\section{HTTP errors}
Authors of path-handlers need to be able to handle errors in a
reasonably simple fashion. The SUnet Web server provides a set of error
conditions that correspond to the error replies in the HTTP protocol.
These errors can be raised with the \ex{http\=error} procedure. When
the server runs a path handler, it runs it in the context of an error
handler that catches these errors, sends an error reply to the client,
and closes the transaction.
\begin{defundesc}{http-error}{reply-code req \ovar{extra \ldots}}{\noreturn}
This raises an http error condition. The reply code is one of the
numeric HTTP error reply codes, which are bound to the variables
\ex{http\=re\ob{}ply/\ob{}ok, http\=re\ob{}ply/\ob{}not\=found,
http\=re\ob{}ply/\ob{}bad\=request}, and so forth. The
\semvar{req} argument is the request record that caused the error.
Any following extra args are passed along for informational
purposes. Different HTTP errors take different types of extra
arguments. For example, the ``301 moved permanently'' and ``302
moved temporarily'' replies use the first two extra values as the
\ex{URI:} and \ex{Lo\-ca\-tion:} fields in the reply header,
respectively. See the clauses of the
\ex{send\=http\=er\ob{}ror\=re\ob{}ply} procedure for details.
\end{defundesc}
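
As an illustration, a path handler might raise an error as in the
following sketch. The wrapper \ex{top-level-only} is hypothetical, and
treating an empty path as the top-level request is an assumption made
for the example only; \ex{http-error} and \ex{http-reply/not-found}
are used as described above.

\begin{verbatim}
;; Hypothetical wrapper: pass requests with an empty path on to
;; HANDLER, and answer everything else with a 404 error reply.
(define (top-level-only handler)
  (lambda (path req)
    (if (null? path)
        (handler path req)
        (http-error http-reply/not-found req))))
\end{verbatim}

Since \ex{http-error} does not return, no code is needed after the
call to clean up the transaction; the server's error handler takes
over.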
\begin{defundesc}{send-http-error-reply}{reply-code request \ovar{extra \ldots}}{\noreturn}
This procedure writes an error reply out to the current output port.
If an error occurs during this process, it is caught, and the
procedure silently returns. The http server's standard error handler
passes all http errors raised during path-handler execution to this
procedure to generate the error reply before aborting the request
transaction.
\end{defundesc}
\section{Simple directory generation}
Most path-handlers that serve files to clients eventually call an
internal procedure named \ex{file\=serve}, which implements a simple
directory-generation service using the following rules:
\begin{itemize} \begin{itemize}
\item If the filename has the form of a directory (i.e., it ends with \item If the filename has the form of a directory (i.e., it ends with
a slash), then \ex{file\=serve} actually looks for a file named a slash), then \ex{file\=serve} actually looks for a file named
``index.html'' in that directory. \ex{index.html} in that directory.
\item If the filename names a directory, but is not in directory form \item If the filename names a directory, but is not in directory form
(i.e., it doesn't end in a slash, as in (i.e., it doesn't end in a slash, as in
``\ex{/usr\ob{}in\ob{}clu\ob{}de}'' or ``\ex{/usr\ob{}raj}''), ``\ex{/usr\ob{}in\ob{}clu\ob{}de}'' or ``\ex{/usr\ob{}raj}''),
@ -415,49 +458,74 @@ directory-generation service using the following rules:
\item If the filename names a regular file, it is served to the \item If the filename names a regular file, it is served to the
client. client.
\end{itemize} \end{itemize}
\defun{rooted-file-handler}{root-dir}{request-handler}
\begin{desc}
This returns a request handler that serves files from a particular
root in the file system. Only the \ex{GET} operation is provided.
The path argument passed to the handler is converted into a
filename, and appended to \var{root-dir}. The file name is checked for
\ex{..} components, and the transaction is aborted if it contains any.
Otherwise, the file is served to the client.
\end{desc}
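
The check for \ex{..} components can be sketched as follows. This is
an illustration only, not the SUnet implementation: the name
\ex{dotdot-free?} is hypothetical, and the sketch assumes that the
path reaches the handler as a list of strings, one per component.

\begin{verbatim}
;; Hypothetical helper: #t if PATH, a list of path components
;; (strings), contains no ".." component.
(define (dotdot-free? path)
  (not (member ".." path)))

(dotdot-free? '("docs" "index.html"))          ; => #t
(dotdot-free? '("docs" ".." "etc" "passwd"))   ; => #f
\end{verbatim}

Rejecting such paths keeps a request from climbing out of the root
directory the handler was created for.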
\defun{rooted-file-or-directory-handler}{root}{request-handler}
\begin{desc}
Like \ex{rooted-file-handler}, but also serves directory indices for
directories without \ex{index.html}.
\end{desc}
\defun{home-dir-handler}{subdir}{request-handler}
\begin{desc}
This procedure builds a request handler that does basic file serving
out of home directories. If the resulting \var{request-handler} is
passed a path of the form \ex{(\var{user} . \var{file-path})}, then it serves the file
\ex{\var{subdir}/\var{file-path}} inside the user's home directory.
The request handler only handles GET requests; the filename is not
allowed to contain \ex{..} elements.
\end{desc}
\defun{tilde-home-dir-handler}{subdir
default-request-handler}{request-handler}
\begin{desc}
This returns a request handler that examines the car of the path. If
it is a string beginning with a tilde, e.g., \ex{"~ziggy"}, then the
string is taken to mean a home directory, and the request is served
similarly to a home-dir-handler request handler. Otherwise, the
request is passed off in its entirety to the
\var{default-request-handler}.
\end{desc}
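
The dispatch on the first path component can be sketched as follows.
The procedures \ex{tilde-component?} and \ex{tilde-dispatch} are
hypothetical and only mirror the description above; they are not the
SUnet source. The sketch assumes that the path is a list of strings.

\begin{verbatim}
;; Hypothetical helper: #t if S is a string starting with a tilde.
(define (tilde-component? s)
  (and (string? s)
       (> (string-length s) 0)
       (char=? (string-ref s 0) #\~)))

;; Hypothetical dispatcher: call ON-HOME with the user name and
;; the rest of the path, or ON-DEFAULT with the whole path.
(define (tilde-dispatch path on-home on-default)
  (if (and (pair? path) (tilde-component? (car path)))
      (on-home (substring (car path) 1 (string-length (car path)))
               (cdr path))
      (on-default path)))

(tilde-dispatch '("~ziggy" "pub" "a.html") list list)
;; => ("ziggy" ("pub" "a.html"))
\end{verbatim}

In the server itself, the home-directory branch corresponds to the
behaviour of a handler built with \ex{home-dir-handler}, and the other
branch to the \var{default-request-handler}.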
\section{CGI Server} \section{CGI Server}
\begin{defundesc}{cgi-handler}{bin-dir \ovar{cgi-bin-dir}}{path-handler} \defun{cgi-handler}{bin-dir [cgi-bin-path]}{request-handler}
Returns a path handler (see \ref{httpd:path-handlers} for details \begin{desc}
about path handlers) for cgi-scripts located in Returns a request handler for CGI scripts located in
\semvar{bin-dir}. \semvar{cgi-bin-dir} specifies the value of the \var{bin-dir}. \var{Cgi-bin-dir} specifies the value of the
\ex{PATH} variable of the environment the cgi-scripts run in. It defaults \ex{PATH} variable of the environment the CGI scripts run in. It defaults
to to
``\ex{/bin:\ob{}/usr/bin:\ob{}/usr/ucb:\ob{}/usr/bsd:\ob{}/usr/local/bin}'' \begin{verbatim}
but is overwritten by the current \ex{PATH} environment variable at /bin:/usr/bin:/usr/ucb:/usr/bsd:/usr/local/bin
the time \ex{cgi-handler} is called. The cgi-scripts are called as \end{verbatim}
specified by CGI/1.1\footnote{see The CGI scripts are called as specified by CGI/1.1\footnote{see
\ex{http://hoohoo.ncsa.uiuc.edu/cgi/interface.html} for a sort of \url{http://hoohoo.ncsa.uiuc.edu/cgi/interface.html} for a sort of
specification.}. specification.}.
\begin{itemize}
\item Various environment variables are set (like
\ex{QUERY\_STRING} or \ex{REMOTE\_HOST}).
\item ISINDEX queries get their arguments as command line arguments.
\item Scripts are handled differently according to their name:
Note that the CGI handler looks at the name of the CGI script to
determine how it should be handled:
\begin{itemize} \begin{itemize}
\item If the name of the script starts with `\ex{nph-}', its reply
\item If the name of the script starts with `\ex{nph-}', its reply is read, the RFC~822-fields like \ex{Content-Type} and \ex{Status}
is read, the RFC~822-fields like ``Content-Type'' and ``Status''
are parsed and the client is sent back a real HTTP reply, are parsed and the client is sent back a real HTTP reply,
containing the rest of the script's output. containing the rest of the script's output.
\item If the name of the script doesn't start with `\ex{nph-}', \item If the name of the script doesn't start with `\ex{nph-}',
its output is sent back to the client directly. If its return code its output is sent back to the client directly. If its return code
is not zero, an error message is generated. is not zero, an error message is generated.
\end{itemize} \end{itemize}
\end{itemize} \end{desc}
\end{defundesc}
\section{Support procs}
The source files contain a host of support procedures which will be of
utility to anyone writing a custom path-handler. Read the files first.
\FIXME{Let us read the files and paste the contents here.}
%%% Local Variables: %%% Local Variables:
%%% mode: latex %%% mode: latex
%%% TeX-master: "man" %%% TeX-master: "man"