\chapter{HTTP server}\label{cha:httpd}
%
\begin{description}
\item[Used files:] httpd/core.scm, httpd/handlers.scm, httpd/options.scm
\item[Name of the packages:] httpd-core, httpd-basic-handlers, httpd-make-options
\end{description}
There are also some other files and packages that are used internally.
%

The SUnet web system is a collection of packages of \Scheme code that
provides utilities for interacting with the World-Wide Web. This
includes:
\begin{itemize}
\item A Web server.
\item URI and URL parsers and un-parsers (see Chapters \ref{cha:uri}
  and \ref{cha:url}).
\item RFC822-style header parsers (see Chapter \ref{cha:rfc822}).
\item Code for performing structured HTML output.
\item Code to assist in writing CGI \Scheme programs that can be used by
  any CGI-compliant HTTP server (such as NCSA's httpd, or the SUnet
  Web server).
\end{itemize}

The server has three main design goals:
\begin{description}
\item[Extensibility]
  The server is in fact nothing but extensions, using a mechanism
  called ``path handlers'' to define URL-specific services. It has a
  toolkit of services that can be used as-is, extended or built
  upon. User extensions have exactly the same status as the base
  services.

  The extension mechanism allows for easy implementation of new
  services without the overhead of the CGI interface. Since the
  server is written on top of the Scheme shell, the full set of Unix
  system calls and program tools is available to the implementor.

\item[Mobile code]
  The server allows Scheme code to be uploaded for direct execution
  inside the server. The server has complete control over the code,
  and can safely execute it in restricted environments that do not
  provide access to potentially dangerous primitives (such as the
  ``delete file'' procedure).

\item[Clarity]
  I\footnote{That's Olin Shivers (\ex{shivers@ai.mit.edu},
    \ex{http://www.\ob{}ai.\ob{}mit.\ob{}edu/\ob{}people/\ob{}shivers/}).
    For the rest of the documentation, if not mentioned otherwise,
    `I' refers to him.} wrote this server to help myself understand
  the Web. It is voluminously commented, and I hope it will prove to
  be an aid in understanding the low-level details of the Web
  protocols.

  The SUnet web server has the ability to upload code from Web clients
  and execute that code on behalf of the client in a protected
  environment.

  Some simple documentation on the server is available.
\end{description}

\section{Basic server structure}

The Web server is started by calling the \ex{httpd} procedure, which takes
one argument, an \ex{httpd\=options} record:

\defun{httpd}{options}{\noreturn}
\begin{desc}
  This procedure starts the server. The various \semvar{options} can
  be set via the options transformers that are explained below.

  The server's basic loop is to wait on the port for a connection from
  an HTTP client. When it receives a connection, it reads in and
  parses the request into a special request data structure. Then the
  server forks a thread, which binds the current I/O ports to the
  connection socket, and then hands off to the top-level
  \semvar{path-handler} (set in the \semvar{options} with
  \ex{with\=path\=handler}). The
  \semvar{path-handler} procedure is responsible for actually serving
  the request -- it can be any arbitrary computation. Its output goes
  directly back to the HTTP client that sent the request.

  Before calling the path handler to service the request, the HTTP
  server installs an error handler that fields any uncaught error,
  sends an error reply to the client, and aborts the request
  transaction. Hence any error caused by a path handler will be
  handled in a reasonable and robust fashion.

  The basic server loop and the associated request data structure are
  the fixed architecture of the SUnet Web server; its flexibility lies
  in the notion of path handlers.
\end{desc}

\defun{with-port}{port \ovar{options}}{options}
\defunx{with-root-directory}{root-directory \ovar{options}}{options}
\defunx{with-fqdn}{fqdn \ovar{options}}{options}
\defunx{with-reported-port}{reported-port \ovar{options}}{options}
\defunx{with-path-handler}{path-handler \ovar{options}}{options}
\defunx{with-server-admin}{mail-address \ovar{options}}{options}
\defunx{with-simultaneous-requests}{requests \ovar{options}}{options}
\defunx{with-logfile}{logfile \ovar{options}}{options}
\defunx{with-syslog?}{syslog? \ovar{options}}{options}
\defunx{with-resolve-ip?}{resolve-ip? \ovar{options}}{options}
\begin{desc}
  As noted above, these transformers set the options for the web
  server. Every transformer takes a value and, optionally, an
  \semvar{options} argument, and returns the \semvar{options} (for
  \ex{httpd}) with that one aspect changed. If the optional
  \semvar{options} argument is missing, the transformer starts from
  the default values, which are the following:

  \begin{tabular}{ll}
    \textbf{transformer} & \textbf{default value} \\
    \hline
    \ex{with\=port} & 80 \\
    \ex{with\=root\=directory} & ``\ex{/}'' \\
    \ex{with\=fqdn} & \sharpf \\
    \ex{with\=reported\=port} & \sharpf \\
    \ex{with\=path\=handler} & \sharpf \\
    \ex{with\=server\=admin} & \sharpf \\
    \ex{with\=simultaneous\=requests} & \sharpf \\
    \ex{with\=logfile} & ``\ex{/logfile.log}''\\
    \ex{with\=syslog?} & \sharpt \\
    \ex{with\=resolve\=ip?} & \sharpt
  \end{tabular}

% that can be found in the \ex{httpd\=make\=options}-structure:
% \ex{with\=port}, \ex{with\=root\=directory}, \ex{with\=fqdn},
% \ex{with\=reported-port}, \ex{with\=path\=handler},
% \ex{with\=server\=admin}, \ex{with\=simultaneous-requests},
% \ex{with\=logfile}, \ex{with\=syslog?} that set the port the server
% is listening to, the root-directory of the server, the FQDN of the
% server, the port the server assumes it is listening to, the
% path-handler of the server (see below), the mail-address of the
% server-admin, the maximum number of simultaneous handled requests,
% the name of the file or the port logging in the Common Log Format
% (CLF) is output to and if the server shall create syslog messages,
% respectively. The port defaults to 80, the root directory defaults
% to ``\ex{/}'', the mail address of the server-admin defaults to
% ``\ex{sperber@\ob{}informatik.\ob{}uni\=tuebingen.\ob{}de}'',
% \FIXME{Why does the server admin mail address have
% sperber@informatik... as default value?}logging is done to
% ``\ex{httpd.log}'' and syslog is enabled. All other options default
% to \sharpf.

  For example,
\begin{alltt}
(httpd (with-path-handler
        (rooted-file-handler "/usr/local/etc/httpd")
        (with-root-directory "/usr/local/etc/httpd")))
\end{alltt}

  starts the server on port 80 with
  ``\ex{/usr/\ob{}local/\ob{}etc/\ob{}httpd}'' as root directory and
  lets it serve any file from this directory.
  \ex{rooted\=file\=handler} creates a path handler and is explained
  below. Note that the transformers are nested: each
  transformer changes one aspect of the options returned by the
  transformer nested inside it, and the innermost transformer (here:
  \ex{with\=root\=directory}) changes one aspect of the default values.

  \semvar{port} is the port the server is listening on,
  \semvar{root-directory} is the directory in the file system the
  server uses as root, \semvar{fqdn} is the fully qualified domain
  name the server reports, \semvar{reported-port} is the port the
  server reports it is listening on and \semvar{mail-address} is the
  mail address of the server admin. \semvar{requests} denotes the
  maximum number of allowed simultaneous requests to the server;
  \sharpf\ means that there is no limit. \semvar{logfile} is either a
  string, in which case it is the file name of the logfile, or a port
  to which the log entries are written, or \sharpf, which means that
  no logging is done. The
  logfile is in Common Log Format (CLF). To allow rotation of
  logfiles, the server will reopen the logfile when it receives the
  signal \texttt{USR1}. \semvar{syslog?} tells the server whether to
  write syslog messages (\sharpt) or not (\sharpf).
\end{desc}

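As a further illustration, several transformers can be nested to
override more than one default at once. The following is only a
sketch: the port number, the log file name and the directories are
made-up values, and it assumes that the structures named at the start
of this chapter have been opened.

\begin{alltt}
;; Sketch only: all concrete values below are illustrative assumptions.
(httpd (with-port 8080
        (with-logfile "/var/log/sunet-httpd.log"
         (with-syslog? #f
          (with-path-handler
           (rooted-file-handler "/var/www/htdocs")
           (with-root-directory "/var/www/htdocs"))))))
\end{alltt}

Read from the inside out, \ex{with\=root\=directory} starts from the
default values, and each enclosing transformer changes one more
setting before the result is handed to \ex{httpd}.
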
\section{Path handlers}
\label{httpd:path-handlers}

A path handler is a procedure taking two arguments:
\defun{path-handler}{path req}{value}
\begin{desc}
  The \semvar{req} argument is a request record giving all the details
  of the client's request; it has the following structure: \FIXME{Make
    the record's structure a table}
\begin{alltt}
(define-record request
  method     ; A string such as "GET", "PUT", etc.
  uri        ; The escaped URI string as read from request line.
  url        ; An http URL record (see url.scm).
  version    ; A (major . minor) integer pair.
  headers    ; An rfc822 header alist (see rfc822.scm).
  socket)    ; The socket connected to the client.
\end{alltt}

  The \semvar{path} argument is the URL's path, parsed and split at
  slashes into a string list. For example, if the Web client
  dereferences the URL
  \codex{http://\ob{}clark.\ob{}lcs.\ob{}mit.\ob{}edu:\ob{}8001/\ob{}h/\ob{}shi\ob{}vers/\ob{}co\ob{}de/\ob{}web.\ob{}tar.\ob{}gz}
  then the server would pass the following path to the top-level
  handler: \ex{("h"\ob{} "shivers"\ob{} "code"\ob{}
    "web.\ob{}tar.\ob{}gz")}.

  The \semvar{path} argument's pre-parsed representation as a string
  list makes it easy for the path handler to implement recursive
  dispatch on URL paths.
\end{desc}

Path handlers can do anything they like to respond to HTTP requests;
they have the full range of Scheme to implement the desired
functionality. When handling HTTP requests that have an associated
entity body (such as POST), the body should be read from the current
input port. Path handlers should in all cases write their reply to the
current output port. Path handlers should not perform I/O on the
request record's socket. Path handlers are frequently called
recursively, and doing I/O directly to the socket might bypass a
filtering or other processing step interposed on the current I/O ports
by some superior path handler.

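Because a path handler is an ordinary procedure of two arguments,
handlers can be wrapped by other handlers. The following sketch is
not part of SUnet; \ex{reject\=dotdot} is a hypothetical helper
introduced only to illustrate the idea.

\begin{alltt}
;; Hypothetical helper, not part of SUnet.
;; Returns a path handler that passes the request to HANDLER unless
;; the path contains a ".." element, in which case FALLBACK is used.
(define (reject-dotdot handler fallback)
  (lambda (path req)
    (if (member ".." path)
        (fallback path req)
        (handler path req))))
\end{alltt}

A call such as \ex{(reject\=dotdot some\=handler null\=path\=handler)}
would then answer suspicious paths with the reply of
\ex{null\=path\=handler}, which is described later in this chapter;
\ex{some\=handler} stands for any path handler.
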
\section{Basic path handlers}

Although users can write any path handler they like, the SUnet web server
comes with a useful toolbox of basic path handlers that can be used
and built upon (exported by the \ex{httpd\=basic\=handlers} structure):

\begin{defundesc}{alist-path-dispatcher}{ph-alist default-ph}{path-handler}
  This procedure takes a \ex{string->\ob{}path\=handler} alist, and a
  default path handler, and returns a handler that dispatches on its
  path argument. When the new path handler is applied to a path
  \ex{("foo"\ob{} "bar"\ob{} "baz")}, it uses the first element of
  the path -- ``\ex{foo}'' -- to index into the alist. If it finds an
  associated path handler in the alist, it hands the request off to
  that handler, passing it the tail of the path, \ex{("bar"\ob{}
    "baz")}. On the other hand, if the path is empty, or the alist
  search does not yield a hit, it hands off to the default path
  handler, passing it the entire original path, \ex{("foo"\ob{}
    "bar"\ob{} "baz")}.

  This procedure is how you say: ``If the first element of the URL's
  path is `foo', do X; if it's `bar', do Y; otherwise, do Z.'' If one
  takes an object-oriented view of the process, an alist path handler
  does method lookup on the requested operation, dispatching off to
  the appropriate method defined for the URL.

  The slash-delimited URI path structure implies an associated tree of
  names. The path-handler system and the alist dispatcher allow you to
  procedurally define the server's response to any arbitrary subtree
  of the path space.

  Example: A typical top-level path handler is
\begin{alltt}
(define ph
  (alist-path-dispatcher
   `(("h"       . ,(home-dir-handler "public\_html"))
     ("cgi-bin" . ,(cgi-handler "/usr/local/etc/httpd/cgi-bin"))
     ("seval"   . ,seval-handler))
   (rooted-file-handler "/usr/local/etc/httpd/htdocs")))
\end{alltt}

  This means:
  \begin{itemize}
  \item If the path looks like \ex{("h"\ob{} "shivers"\ob{}
      "code"\ob{} "web.\ob{}tar.\ob{}gz")}, pass the path
    \ex{("shivers"\ob{} "code"\ob{} "web.\ob{}tar.\ob{}gz")} to a
    home-directory path handler.
  \item If the path looks like \ex{("cgi-\ob{}bin"\ob{} "calendar")},
    pass \ex{("calendar")} off to the CGI path handler.
  \item If the path looks like \ex{("seval"\ob{} \ldots)}, the tail
    of the path is passed off to the code-uploading seval path
    handler.
  \item Otherwise, the whole path is passed to a rooted file handler,
    which converts it into a filename, rooted at
    \ex{/usr/\ob{}lo\ob{}cal/\ob{}etc/\ob{}httpd/\ob{}htdocs},
    and serves that file.
  \end{itemize}
\end{defundesc}

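The dispatch rule described for \ex{alist\=path\=dispatcher} can be
written down in a few lines of plain Scheme. This is a minimal sketch
of the rule as stated in the text, not the SUnet source; the name
\ex{sketch\=alist\=path\=dispatcher} is made up for the illustration.

\begin{alltt}
;; Sketch of the dispatch rule, not the SUnet source.
(define (sketch-alist-path-dispatcher ph-alist default-ph)
  (lambda (path req)
    (let ((entry (and (pair? path)
                      (assoc (car path) ph-alist))))
      (if entry
          ((cdr entry) (cdr path) req)   ; hit: pass the tail of the path
          (default-ph path req)))))      ; miss or empty path: pass it all
\end{alltt}

Since the handlers stored in the alist may themselves be dispatchers,
nesting such calls yields a handler for a whole tree of paths.
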
\begin{defundesc}{home-dir-handler}{subdir}{path-handler}
  This procedure builds a path handler that does basic file serving
  out of home directories. If the resulting \semvar{path-handler} is
  passed a path of \ex{(user . file\=path)}, then it serves the file
  \ex{user's\=ho\ob{}me\=di\ob{}rec\ob{}to\ob{}ry/\ob{}sub\ob{}dir/\ob{}file\=path}.

  The path handler only handles GET requests; the filename is not
  allowed to contain \ex{..} elements.
\end{defundesc}

\begin{defundesc}{tilde-home-dir-handler}{subdir default-path-handler}{path-handler}
  This path handler examines the car of the path. If it is a string
  beginning with a tilde, e.g., \ex{"\textasciitilde{}ziggy"}, then the string is
  taken to mean a home directory, and the request is served similarly
  to a \ex{home\=dir\=handler} path handler. Otherwise, the request is passed
  off in its entirety to the \semvar{default-path-handler}.

  This procedure is useful for implementing servers that provide the
  semantics of the NCSA httpd server.
\end{defundesc}

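A short sketch of how \ex{tilde\=home\=dir\=handler} might be combined
with a rooted file handler; the subdirectory and document root are
illustrative values, and \ex{ncsa\=style\=ph} is a made-up name.

\begin{alltt}
;; Illustrative values; the directories are assumptions.
(define ncsa-style-ph
  (tilde-home-dir-handler
   "public\_html"
   (rooted-file-handler "/usr/local/etc/httpd/htdocs")))
\end{alltt}

With this handler, a path whose first element begins with a tilde is
served from that user's \ex{public\_html} directory, and every other
path is handed to the rooted file handler.
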
\begin{defundesc}{cgi-handler}{cgi-directory}{path-handler}
  This procedure returns a path handler that passes the request off to
  some program using the CGI interface. The script name is taken from
  the car of the path; it is checked for occurrences of \ex{..}. If
  the path is \ex{("my\=prog"\ob{} "foo"\ob{} "bar")} then the
  program executed is
  \ex{cgi\=di\ob{}rec\ob{}to\ob{}ry/\ob{}my\=prog}.

  When the CGI path handler builds the process environment for the CGI
  script, several elements (e.g., \ex{\$PATH} and
  \ex{\$SERVER\_SOFTWARE}) are request-invariant, and can be
  computed at server start-up time. This can be done by calling
  \codex{(initialise-request-invariant-cgi-env)}
  when the server starts up. This is not necessary, but will make CGI
  requests a little faster.
\end{defundesc}

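A start-up sequence that precomputes the request-invariant CGI
environment could look like the following sketch. The directory name
is an illustrative value; \ex{null\=path\=handler} is described later
in this section.

\begin{alltt}
;; Sketch: compute the request-invariant CGI environment once,
;; then start the server.  The directory name is illustrative.
(initialise-request-invariant-cgi-env)
(httpd (with-path-handler
        (alist-path-dispatcher
         `(("cgi-bin" . ,(cgi-handler "/usr/local/etc/httpd/cgi-bin")))
         null-path-handler)))
\end{alltt}
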
\begin{defundesc}{rooted-file-handler}{root-dir}{path-handler}
  Returns a path handler that serves files from a particular root in
  the file system. Only the GET operation is provided. The path
  argument passed to the handler is converted into a filename, and
  appended to \semvar{root-dir}. The file name is checked for \ex{..}
  components, and the transaction is aborted if it contains any. Otherwise,
  the file is served to the client.
\end{defundesc}

\begin{defundesc}{rooted-file-or-directory-handler}{root icon-name}{path-handler}
  Like \ex{rooted\=file\=handler}, but also serves directory indices
  for directories without
  \ex{index.\ob{}html}. \semvar{icon-name} specifies how to generate
  the links to various decorative icons for the listings. It can be
  either a procedure or a string. A procedure is passed one of the
  icon tags listed below and
  is expected to return a link pointing to the icon. A string
  is taken as a prefix to which the file names listed
  below are appended.

  \begin{tabular}{ll}
    Tag & Icon's file name \\
    \hline
    \ex{directory} & \ex{directory.xbm}\\
    \ex{text} & \ex{text.xbm}\\
    \ex{doc} & \ex{doc.xbm}\\
    \ex{image} & \ex{image.xbm}\\
    \ex{movie} & \ex{movie.xbm}\\
    \ex{audio} & \ex{sound.xbm}\\
    \ex{archive} & \ex{tar.xbm}\\
    \ex{compressed} & \ex{compressed.xbm}\\
    \ex{uu} & \ex{uu.xbm}\\
    \ex{binhex} & \ex{binhex.xbm}\\
    \ex{binary} & \ex{binary.xbm}\\
    \ex{blank} & \ex{blank.xbm}\\
    \ex{back} & \ex{back.xbm}\\
    \ex{\it{}else} & \ex{unknown.xbm}\\
  \end{tabular}
\end{defundesc}

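The string form of \semvar{icon-name} is the simpler of the two. In
the following sketch both the document root and the prefix
\ex{/icons/} are assumed values, and \ex{listing\=ph} is a made-up
name.

\begin{alltt}
;; Illustrative values: "/icons/" is an assumed URL prefix under
;; which the icon files listed above are reachable.
(define listing-ph
  (rooted-file-or-directory-handler "/usr/local/etc/httpd/htdocs"
                                    "/icons/"))
\end{alltt}

Following the rule above, the link for the \ex{directory} tag would
then be \ex{/icons/directory.xbm}.
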
\begin{defundesc}{null-path-handler}{path req}{\noreturn}
  This path handler is useful as a default handler. It handles no
  requests, always returning a ``404 Not Found'' reply to the client.
\end{defundesc}

\section{HTTP errors}

Authors of path handlers need to be able to handle errors in a
reasonably simple fashion. The SUnet Web server provides a set of error
conditions that correspond to the error replies in the HTTP protocol.
These errors can be raised with the \ex{http\=error} procedure. When
the server runs a path handler, it runs it in the context of an error
handler that catches these errors, sends an error reply to the client,
and closes the transaction.

\begin{defundesc}{http-error}{reply-code req \ovar{extra \ldots}}{\noreturn}
  This raises an HTTP error condition. The reply code is one of the
  numeric HTTP error reply codes, which are bound to the variables
  \ex{http\=re\ob{}ply/\ob{}ok, http\=re\ob{}ply/\ob{}not\=found,
    http\=re\ob{}ply/\ob{}bad\=request}, and so forth. The
  \semvar{req} argument is the request record that caused the error.
  Any following extra args are passed along for informational
  purposes. Different HTTP errors take different types of extra
  arguments. For example, the ``301 moved permanently'' and ``302
  moved temporarily'' replies use the first two extra values as the
  \ex{URI:} and \ex{Lo\-ca\-tion:} fields in the reply header,
  respectively. See the clauses of the
  \ex{send\=http\=er\ob{}ror\=re\ob{}ply} procedure for details.
\end{defundesc}

\begin{defundesc}{send-http-error-reply}{reply-code request \ovar{extra \ldots}}{\noreturn}
  This procedure writes an error reply out to the current output port.
  If an error occurs during this process, it is caught, and the
  procedure silently returns. The HTTP server's standard error handler
  passes all HTTP errors raised during path-handler execution to this
  procedure to generate the error reply before aborting the request
  transaction.
\end{defundesc}

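As an illustration of \ex{http\=error}, the following sketch wraps a
path handler so that it only answers GET requests. It is not part of
SUnet and rests on two assumptions that this chapter does not confirm:
that the method field of the request record is read with an accessor
named \ex{request:method}, and that a message string is an acceptable
extra argument for this reply code.

\begin{alltt}
;; Hypothetical wrapper, not part of SUnet.  It assumes that the
;; request record's method field is read with request:method (the
;; accessor name is an assumption) and that a message string is an
;; acceptable extra argument.
(define (get-only handler)
  (lambda (path req)
    (if (string=? (request:method req) "GET")
        (handler path req)
        (http-error http-reply/bad-request req
                    "Only GET is supported here."))))
\end{alltt}

Because the server runs every path handler inside its error handler,
the wrapper does not have to write the error reply itself.
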
\section{Simple directory generation}

Most path handlers that serve files to clients eventually call an
internal procedure named \ex{file\=serve}, which implements a simple
directory-generation service using the following rules:
\begin{itemize}
\item If the filename has the form of a directory (i.e., it ends with
  a slash), then \ex{file\=serve} actually looks for a file named
  \ex{index.\ob{}html} in that directory.
\item If the filename names a directory, but is not in directory form
  (i.e., it doesn't end in a slash, as in
  ``\ex{/usr/\ob{}in\ob{}clu\ob{}de}'' or ``\ex{/usr/\ob{}raj}''),
  then \ex{file\=serve} sends back a ``301 moved permanently''
  message, redirecting the client to a slash-terminated version of the
  original URL. For example, the URL
  \ex{http://\ob{}clark.\ob{}lcs.\ob{}mit.\ob{}edu/\ob{}\textasciitilde{}shi\ob{}vers}
  would be redirected to
  \ex{http://\ob{}clark.\ob{}lcs.\ob{}mit.\ob{}edu/\ob{}\textasciitilde{}shi\ob{}vers/}.
\item If the filename names a regular file, it is served to the
  client.
\end{itemize}

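The three rules can be summarised as a small decision procedure. The
following is a sketch in plain Scheme, not the code of
\ex{file\=serve}: the procedure names and the result lists are made up
for the illustration, and the test for directories is supplied by the
caller instead of consulting the file system.

\begin{alltt}
;; Sketch of the three rules, not the SUnet source.  IS-DIRECTORY?
;; is a caller-supplied predicate standing in for a file-system test.
(define (ends-with-slash? name)
  (let ((n (string-length name)))
    (and (> n 0)
         (string=? (substring name (- n 1) n) "/"))))

(define (file-serve-action name is-directory?)
  (cond ((ends-with-slash? name)
         (list 'serve (string-append name "index.html")))
        ((is-directory? name)
         (list 'redirect-301 (string-append name "/")))
        (else
         (list 'serve name))))
\end{alltt}

For instance, \ex{(file-serve-action "/usr/include" (lambda (f) \#t))}
evaluates to \ex{(redirect-301 "/usr/include/")}.
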
\section{CGI Server}

\begin{defundesc}{cgi-handler}{bin-dir \ovar{cgi-bin-dir}}{path-handler}
  Returns a path handler (see Section~\ref{httpd:path-handlers} for details
  about path handlers) for CGI scripts located in
  \semvar{bin-dir}. \semvar{cgi-bin-dir} specifies the value of the
  \ex{PATH} variable of the environment the CGI scripts run in. It defaults
  to
  ``\ex{/bin:\ob{}/usr/bin:\ob{}/usr/ucb:\ob{}/usr/bsd:\ob{}/usr/local/bin}''
  but is overwritten by the current \ex{PATH} environment variable at
  the time \ex{cgi-handler} is called. The CGI scripts are called as
  specified by CGI/1.1\footnote{See
    \ex{http://hoohoo.ncsa.uiuc.edu/cgi/interface.html} for a sort of
    specification.}.

  \begin{itemize}
  \item Various environment variables are set (like
    \ex{QUERY\_STRING} or \ex{REMOTE\_HOST}).
  \item ISINDEX queries get their arguments as command line arguments.
  \item Scripts are handled differently according to their name:

    \begin{itemize}

    \item If the name of the script doesn't start with `\ex{nph-}', its reply
      is read, the RFC~822-fields like ``Content-Type'' and ``Status''
      are parsed and the client is sent back a real HTTP reply,
      containing the rest of the script's output.

    \item If the name of the script starts with `\ex{nph-}'
      (``non-parsed headers''),
      its output is sent back to the client directly. If its return code
      is not zero, an error message is generated.

    \end{itemize}
  \end{itemize}
\end{defundesc}

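For orientation, a CGI script written in scsh might look like the
following sketch. It is not taken from the SUnet sources; the
interpreter path is an assumption, and the script only echoes the
query string that the server passes in the environment.

\begin{alltt}
#!/usr/local/bin/scsh -s
!#
;; Sketch of a CGI script written in scsh.  The interpreter path
;; above is an assumption; adjust it to the local installation.
(display "Content-Type: text/plain")
(newline)
(newline)                                ; blank line ends the header
(display "Query string: ")
(display (or (getenv "QUERY\_STRING") ""))
(newline)
\end{alltt}

Placed as an executable file in \semvar{bin-dir}, such a script would
be run by the CGI path handler whenever its name is the first element
of the path.
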
\section{Support procs}

The source files contain a host of support procedures which will be of
use to anyone writing a custom path handler. Read the files first.
\FIXME{Let us read the files and paste the contents here.}

%%% Local Variables:
%%% mode: latex
%%% TeX-master: "man"
%%% End: