\chapter{HTTP server}\label{cha:httpd}
%
\begin{description}
\item[Used files:] httpd/core.scm, httpd/handlers.scm, httpd/options.scm
\item[Name of the packages:] httpd-core, httpd-basic-handler, httpd-make-options
\end{description}
There are also some other files and packages that are used internally.
%
The SUnet web system is a collection of packages of \Scheme code that
provides utilities for interacting with the World-Wide Web. This
includes:
\begin{itemize}
\item A Web server.
\item URI and URL parsers and un-parsers (see Chapters \ref{cha:uri}
  and \ref{cha:url}).
\item RFC822-style header parsers (see Chapter \ref{cha:rfc822}).
\item Code for performing structured HTML output.
\item Code to assist in writing CGI \Scheme programs that can be used by
  any CGI-compliant HTTP server (such as NCSA's httpd, or the SUnet
  Web server).
\end{itemize}

The server has three main design goals:
\begin{description}
\item[Extensibility] The server is in fact nothing but extensions,
  using a mechanism called ``path handlers'' to define URL-specific
  services. It has a toolkit of services that can be used as-is,
  extended or built upon. User extensions have exactly the same status
  as the base services.

  The extension mechanism allows for easy implementation of new
  services without the overhead of the CGI interface. Since the server
  is written on top of the Scheme shell, the full set of Unix system
  calls and program tools is available to the implementor.

\item[Mobile code] The server allows Scheme code to be uploaded for
  direct execution inside the server. The server has complete control
  over the code, and can safely execute it in restricted environments
  that do not provide access to potentially dangerous primitives (such
  as the ``delete file'' procedure).

\item[Clarity] I\footnote{That's Olin Shivers (\ex{shivers@ai.mit.edu}, \ex{http://www.\ob{}ai.\ob{}mit.\ob{}edu/\ob{}people/\ob{}shivers/}).
For the rest of the documentation, if not mentioned otherwise, `I' refers to him.}
  wrote this server to help myself understand the Web. It is
  voluminously commented, and I hope it will prove to be an aid in
  understanding the low-level details of the Web protocols.

  The SUnet web server has the ability to upload code from Web clients
  and execute that code on behalf of the client in a protected
  environment.

  Some simple documentation on the server is available.
\end{description}

\section{Basic server structure}

The Web server is started by calling the \ex{httpd} procedure, which
takes one argument, an \ex{httpd\=options} record:

\defun{httpd}{options}{\noreturn}
\begin{desc}
  This procedure starts the server. The various \semvar{options} can
  be set via the options transformers that are explained below.

  The server's basic loop is to wait on the port for a connection from
  an HTTP client. When it receives a connection, it reads in and
  parses the request into a special request data structure. Then the
  server forks a thread, which binds the current I/O ports to the
  connection socket, and then hands off to the top-level
  \semvar{path-handler} (set in the options with
  \ex{with\=path\=handler}). The \semvar{path-handler} procedure is
  responsible for actually serving the request -- it can be an
  arbitrary computation. Its output goes directly back to the HTTP
  client that sent the request.

  Before calling the path handler to service the request, the HTTP
  server installs an error handler that fields any uncaught error,
  sends an error reply to the client, and aborts the request
  transaction. Hence any error caused by a path handler will be
  handled in a reasonable and robust fashion.

  The basic server loop and the associated request data structure are
  the fixed architecture of the SUnet Web server; its flexibility lies
  in the notion of path handlers.
\end{desc}

\defun{with-port}{port \ovar{options}}{options}
\defunx{with-root-directory}{root-directory \ovar{options}}{options}
\defunx{with-fqdn}{fqdn \ovar{options}}{options}
\defunx{with-reported-port}{reported-port \ovar{options}}{options}
\defunx{with-path-handler}{path-handler \ovar{options}}{options}
\defunx{with-server-admin}{mail-address \ovar{options}}{options}
\defunx{with-simultaneous-requests}{requests \ovar{options}}{options}
\defunx{with-logfile}{logfile \ovar{options}}{options}
\defunx{with-syslog?}{syslog? \ovar{options}}{options}
\defunx{with-resolve-ip?}{resolve-ip? \ovar{options}}{options}
\begin{desc}
  As noted above, these transformers set the options for the web
  server. Every transformer changes one aspect of the
  \semvar{options} (for \ex{httpd}). If the optional
  \semvar{options} argument is missing, the transformer starts from
  the default values, which are as follows:

  \begin{tabular}{ll}
    \bf{transformer} & \bf{default value} \\ \hline
    \ex{with\=port} & 80 \\
    \ex{with\=root\=directory} & ``\ex{/}'' \\
    \ex{with\=fqdn} & \sharpf \\
    \ex{with\=reported\=port} & \sharpf \\
    \ex{with\=path\=handler} & \sharpf \\
    \ex{with\=server\=admin} & \sharpf \\
    \ex{with\=simultaneous\=requests} & \sharpf \\
    \ex{with\=logfile} & ``\ex{/logfile.log}''\\
    \ex{with\=syslog?} & \sharpt \\
    \ex{with\=resolve\=ip?} & \sharpt
  \end{tabular}
% that can be found in the \ex{httpd\=make\=options}-structure:
% \ex{with\=port}, \ex{with\=root\=directory}, \ex{with\=fqdn},
% \ex{with\=reported-port}, \ex{with\=path\=handler},
% \ex{with\=server\=admin}, \ex{with\=simultaneous-requests},
% \ex{with\=logfile}, \ex{with\=syslog?} that set the port the server
% is listening to, the root-directory of the server, the FQDN of the
% server, the port the server assumes it is listening to, the
% path-handler of the server (see below), the mail-address of the
% server-admin, the maximum number of simultaneous handled requests,
% the name of the file or the port logging in the Common Log Format
% (CLF) is output to and if the server shall create syslog messages,
% respectively. The port defaults to 80, the root directory defaults
% to ``\ex{/}'', the mail address of the server-admin defaults to
% ``\ex{sperber@\ob{}informatik.\ob{}uni\=tuebingen.\ob{}de}'',
% \FIXME{Why does the server admin mail address have
% sperber@informatik... as default value?}logging is done to
% ``\ex{httpd.log}'' and syslog is enabled. All other options default
% to \sharpf.

  For example
\begin{alltt}
(httpd (with-path-handler
        (rooted-file-handler "/usr/local/etc/httpd")
        (with-root-directory "/usr/local/etc/httpd")))
\end{alltt}
  starts the server on port 80 with
  ``\ex{/usr/\ob{}local/\ob{}etc/\ob{}httpd}'' as root directory and
  lets it serve any file from this directory.
  \ex{rooted\=file\=handler} creates a path handler and is explained
  below.

  As the example shows, transformers are nested: each transformer
  changes one aspect of the options returned by the transformer nested
  inside it, and the innermost transformer (here
  \ex{with\=root\=directory}) changes one aspect of the default
  values.

  \semvar{port} is the port the server is listening to,
  \semvar{root-directory} is the directory in the file system the
  server uses as root, \semvar{fqdn} is the fully qualified domain
  name the server reports, \semvar{reported-port} is the port the
  server reports it is listening to and \semvar{server-admin} is the
  mail address of the server admin. \semvar{requests} denotes the
  maximum number of allowed simultaneous requests to the server;
  \sharpf\ means no limit. \semvar{logfile} is either a string naming
  the logfile, a port to which the log entries are written, or
  \sharpf, which disables logging. The logfile is in Common Log
  Format (CLF). To allow rotation of logfiles, the server reopens the
  logfile when it receives the signal \texttt{USR1}. \semvar{syslog?}
  tells the server to write syslog messages (\sharpt) or not
  (\sharpf).
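
  The nesting of transformers described above can be modelled in a few
  lines of plain Scheme. This is only a sketch of the idea, not
  SUnet's implementation: it assumes the options are an association
  list (the real server uses an \ex{httpd\=options} record), and
  \ex{model\=with\=port}, \ex{model\=with\=root\=directory} and
  \ex{option\=ref} are hypothetical names introduced for illustration.

```scheme
;; Model only: options are an association list here, which is an
;; assumption made for illustration, not SUnet's record type.
(define default-options '((port . 80) (root-directory . "/")))

;; Return new options in which KEY maps to VALUE.
;; Newer entries shadow older ones, so the input is left untouched.
(define (set-option key value options)
  (cons (cons key value) options))

;; Each transformer takes a value and an optional options argument;
;; when the options are missing it starts from the defaults.
(define (model-with-port port . maybe-options)
  (set-option 'port port
              (if (null? maybe-options)
                  default-options
                  (car maybe-options))))

(define (model-with-root-directory dir . maybe-options)
  (set-option 'root-directory dir
              (if (null? maybe-options)
                  default-options
                  (car maybe-options))))

(define (option-ref key options)
  (cdr (assq key options)))

;; The innermost transformer modifies the defaults; the outer one
;; modifies the result of the inner one.
(define opts
  (model-with-port 8080 (model-with-root-directory "/var/www")))

(display (option-ref 'port opts)) (newline)            ; 8080
(display (option-ref 'root-directory opts)) (newline)  ; /var/www
```

  Because each transformer returns a fresh options value, the order of
  nesting only matters when the same option is set twice, in which
  case the outermost setting wins in this model.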
\end{desc}

\section{Path handlers}
\label{httpd:path-handlers}

A path handler is a procedure taking two arguments:

\defun{path-handler}{path req}{value}
\begin{desc}
  The \semvar{req} argument is a request record giving all the details
  of the client's request; it has the following structure:
  \FIXME{Make the record's structure a table}
\begin{alltt}
(define-record request
  method   ; A string such as "GET", "PUT", etc.
  uri      ; The escaped URI string as read from request line.
  url      ; An http URL record (see url.scm).
  version  ; A (major . minor) integer pair.
  headers  ; An rfc822 header alist (see rfc822.scm).
  socket)  ; The socket connected to the client.
\end{alltt}

  The \semvar{path} argument is the URL's path, parsed and split at
  slashes into a string list. For example, if the Web client
  dereferences URL
  \codex{http://\ob{}clark.\ob{}lcs.\ob{}mit.\ob{}edu:\ob{}8001/\ob{}h/\ob{}shi\ob{}vers/\ob{}co\ob{}de/\ob{}web.\ob{}tar.\ob{}gz}
  then the server would pass the following path to the top-level
  handler:
  \ex{("h"\ob{} "shivers"\ob{} "code"\ob{} "web.\ob{}tar.\ob{}gz")}

  The \semvar{path} argument's pre-parsed representation as a string
  list makes it easy for the path handler to implement recursive
  dispatch on URL paths.
\end{desc}

Path handlers can do anything they like to respond to HTTP requests;
they have the full range of Scheme to implement the desired
functionality. When handling HTTP requests that have an associated
entity body (such as POST), the body should be read from the current
input port. Path handlers should in all cases write their reply to the
current output port. Path handlers should not perform I/O on the
request record's socket. Path handlers are frequently called
recursively, and doing I/O directly to the socket might bypass a
filtering or other processing step interposed on the current I/O ports
by some superior path handler.
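
As an illustration of recursive dispatch on the path list, here is a
minimal sketch of two cooperating handlers written in plain Scheme. It
is not taken from the SUnet sources: \ex{greeting\=text},
\ex{greeting\=handler} and \ex{site\=handler} are hypothetical names,
the HTTP reply headers a real handler would have to emit are omitted,
and \sharpf\ stands in for the request record so that the code can be
tried outside the server.

```scheme
;; Hypothetical helper: build the reply text for a path of the
;; form ("<name>"); anything else yields a usage message.
(define (greeting-text path)
  (if (and (pair? path) (null? (cdr path)))
      (string-append "Hello, " (car path))
      "Usage: /greet/<name>"))

;; Leaf handler: writes its reply to the current output port,
;; never to the request's socket.
(define (greeting-handler path req)
  (display (greeting-text path))
  (newline))

;; Top-level handler: consumes the first path element and hands the
;; tail of the path to the handler responsible for that subtree.
(define (site-handler path req)
  (cond ((and (pair? path) (string=? (car path) "greet"))
         (greeting-handler (cdr path) req))
        (else
         (display "Not found")
         (newline))))

;; Stand-alone trial run; #f stands in for a request record.
(site-handler '("greet" "world") #f)   ; writes "Hello, world"
(site-handler '("other") #f)           ; writes "Not found"
```

Each handler only looks at the head of the list it receives, so a
handler can be mounted at any depth of the URL tree without change.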
\section{Basic path handlers}

Although users can write any path handler they like, the SUnet web
server comes with a useful toolbox of basic path handlers that can be
used and built upon (exported by the
\ex{httpd\=basic\=handlers}-structure):

\begin{defundesc}{alist-path-dispatcher}{ph-alist default-ph}{path-handler}
  This procedure takes a \ex{string->\ob{}path\=handler} alist, and a
  default path handler, and returns a handler that dispatches on its
  path argument. When the new path handler is applied to a path
  \ex{("foo"\ob{} "bar"\ob{} "baz")}, it uses the first element of
  the path -- ``\ex{foo}'' -- to index into the alist. If it finds an
  associated path handler in the alist, it hands the request off to
  that handler, passing it the tail of the path,
  \ex{("bar"\ob{} "baz")}. On the other hand, if the path is empty, or
  the alist search does not yield a hit, we hand off to the default
  path handler, passing it the entire original path,
  \ex{("foo"\ob{} "bar"\ob{} "baz")}.

  This procedure is how you say: ``If the first element of the URL's
  path is `foo', do X; if it's `bar', do Y; otherwise, do Z.'' If one
  takes an object-oriented view of the process, an alist path-handler
  does method lookup on the requested operation, dispatching off to
  the appropriate method defined for the URL.

  The slash-delimited URI path structure implies an associated tree of
  names. The path-handler system and the alist dispatcher allow you to
  procedurally define the server's response to any arbitrary subtree
  of the path space.

  Example: A typical top-level path handler is
\begin{alltt}
(define ph
  (alist-path-dispatcher
   `(("h"       . ,(home-dir-handler "public\_html"))
     ("cgi-bin" . ,(cgi-handler "/usr/local/etc/httpd/cgi-bin"))
     ("seval"   . ,seval-handler))
   (rooted-file-handler "/usr/local/etc/httpd/htdocs")))
\end{alltt}

  This means:
  \begin{itemize}
  \item If the path looks like \ex{("h"\ob{} "shivers"\ob{}
      "code"\ob{} "web.\ob{}tar.\ob{}gz")}, pass the path
    \ex{("shivers"\ob{} "code"\ob{} "web.\ob{}tar.\ob{}gz")} to a
    home-directory path handler.
  \item If the path looks like \ex{("cgi-\ob{}bin"\ob{} "calendar")},
    pass \ex{("calendar")} off to the CGI path handler.
  \item If the path looks like \ex{("seval"\ob{} \ldots)}, the tail
    of the path is passed off to the code-uploading seval path
    handler.
  \item Otherwise, the whole path is passed to a rooted file handler,
    which converts it into a filename, rooted at
    \ex{/usr/\ob{}lo\ob{}cal/\ob{}etc/\ob{}httpd/\ob{}htdocs}, and
    serves that file.
  \end{itemize}
\end{defundesc}

\begin{defundesc}{home-dir-handler}{subdir}{path-handler}
  This procedure builds a path handler that does basic file serving
  out of home directories. If the resulting \semvar{path-handler} is
  passed a path of \ex{(user . file\=path)}, then it serves the file
  \ex{user's\=ho\ob{}me\=di\ob{}rec\ob{}to\ob{}ry/\ob{}sub\ob{}dir/\ob{}file\=path}.
  The path handler only handles GET requests; the filename is not
  allowed to contain \ex{..} elements.
\end{defundesc}

\begin{defundesc}{tilde-home-dir-handler}{subdir default-path-handler}{path-handler}
  This path handler examines the car of the path. If it is a string
  beginning with a tilde, e.g., \ex{"~ziggy"}, then the string is
  taken to mean a home directory, and the request is served in the
  same way as by a \ex{home\=dir\=handler} path handler. Otherwise,
  the request is passed off in its entirety to the
  \semvar{default-path-handler}.

  This procedure is useful for implementing servers that provide the
  semantics of the NCSA httpd server.
\end{defundesc}

\begin{defundesc}{cgi-handler}{cgi-directory}{path-handler}
  This procedure returns a path handler that passes the request off to
  some program using the CGI interface.
  The script name is taken from the car of the path; it is checked
  for occurrences of \ex{..}. If the path is
  \ex{("my\=prog"\ob{} "foo"\ob{} "bar")} then the program executed
  is \ex{cgi\=di\ob{}rec\ob{}to\ob{}ry/\ob{}my\=prog}.

  When the CGI path handler builds the process environment for the CGI
  script, several elements (e.g., \ex{\$PATH} and
  \ex{\$SERVER\_SOFTWARE}) are request-invariant, and can be computed
  at server start-up time. This can be done by calling
  \codex{(initialise-request-invariant-cgi-env)}
  when the server starts up. This is not necessary, but will make CGI
  requests a little faster.
\end{defundesc}

\begin{defundesc}{rooted-file-handler}{root-dir}{path-handler}
  Returns a path handler that serves files from a particular root in
  the file system. Only the GET operation is provided. The path
  argument passed to the handler is converted into a filename, and
  appended to \semvar{root-dir}. The file name is checked for \ex{..}
  components, and the transaction is aborted if it contains any.
  Otherwise, the file is served to the client.
\end{defundesc}

\begin{defundesc}{rooted-file-or-directory-handler}{root icon-name}{path-handler}
  Like \ex{rooted\=file\=handler}, but also serves directory indices
  for directories without \ex{index.\ob{}html}. \semvar{icon-name}
  specifies how to generate the links to various decorative icons for
  the listings. It can be either a procedure or a string. A procedure
  is passed one of the icon tags listed below and is expected to
  return a link pointing to the icon. A string is taken as a prefix to
  which the icon file names listed below are appended.

  \begin{tabular}{ll}
    Tag & Icon's file name \\ \hline
    \ex{directory} & \ex{directory.xbm}\\
    \ex{text} & \ex{text.xbm}\\
    \ex{doc} & \ex{doc.xbm}\\
    \ex{image} & \ex{image.xbm}\\
    \ex{movie} & \ex{movie.xbm}\\
    \ex{audio} & \ex{sound.xbm}\\
    \ex{archive} & \ex{tar.xbm}\\
    \ex{compressed} & \ex{compressed.xbm}\\
    \ex{uu} & \ex{uu.xbm}\\
    \ex{binhex} & \ex{binhex.xbm}\\
    \ex{binary} & \ex{binary.xbm}\\
    \ex{blank} & \ex{blank.xbm}\\
    \ex{back} & \ex{back.xbm}\\
    \ex{\it{}else} & \ex{unknown.xbm}\\
  \end{tabular}
\end{defundesc}

\begin{defundesc}{null-path-handler}{path req}{\noreturn}
  This path handler is useful as a default handler. It handles no
  requests, always returning a ``404 Not found'' reply to the client.
\end{defundesc}

\section{HTTP errors}

Authors of path handlers need to be able to handle errors in a
reasonably simple fashion. The SUnet Web server provides a set of
error conditions that correspond to the error replies in the HTTP
protocol. These errors can be raised with the \ex{http\=error}
procedure. When the server runs a path handler, it runs it in the
context of an error handler that catches these errors, sends an error
reply to the client, and closes the transaction.

\begin{defundesc}{http-error}{reply-code req \ovar{extra \ldots}}{\noreturn}
  This raises an HTTP error condition. The reply code is one of the
  numeric HTTP error reply codes, which are bound to the variables
  \ex{http\=re\ob{}ply/\ob{}ok},
  \ex{http\=re\ob{}ply/\ob{}not\=found},
  \ex{http\=re\ob{}ply/\ob{}bad\=request}, and so forth. The
  \semvar{req} argument is the request record that caused the error.
  Any following extra args are passed along for informational
  purposes. Different HTTP errors take different types of extra
  arguments. For example, the ``301 moved permanently'' and ``302
  moved temporarily'' replies use the first two extra values as the
  \ex{URI:} and \ex{Lo\-ca\-tion:} fields in the reply header,
  respectively. See the clauses of the
  \ex{send\=http\=er\ob{}ror\=re\ob{}ply} procedure for details.
\end{defundesc}

\begin{defundesc}{send-http-error-reply}{reply-code request \ovar{extra \ldots}}{\noreturn}
  This procedure writes an error reply out to the current output
  port. If an error occurs during this process, it is caught, and the
  procedure silently returns. The HTTP server's standard error handler
  passes all HTTP errors raised during path-handler execution to this
  procedure to generate the error reply before aborting the request
  transaction.
\end{defundesc}

\section{Simple directory generation}

Most path handlers that serve files to clients eventually call an
internal procedure named \ex{file\=serve}, which implements a simple
directory-generation service using the following rules:
\begin{itemize}
\item If the filename has the form of a directory (i.e., it ends with
  a slash), then \ex{file\=serve} actually looks for a file named
  ``\ex{index.\ob{}html}'' in that directory.
\item If the filename names a directory, but is not in directory form
  (i.e., it doesn't end in a slash, as in
  ``\ex{/usr/\ob{}in\ob{}clu\ob{}de}'' or ``\ex{/usr/\ob{}raj}''),
  then \ex{file\=serve} sends back a ``301 moved permanently''
  message, redirecting the client to a slash-terminated version of the
  original URL. For example, the URL
  \ex{http://\ob{}clark.\ob{}lcs.\ob{}mit.\ob{}edu/\ob{}~shi\ob{}vers}
  would be redirected to
  \ex{http://\ob{}clark.\ob{}lcs.\ob{}mit.\ob{}edu/\ob{}~shi\ob{}vers/}
\item If the filename names a regular file, it is served to the
  client.
\end{itemize}

\section{CGI Server}

\begin{defundesc}{cgi-handler}{bin-dir \ovar{cgi-bin-dir}}{path-handler}
  Returns a path handler (see Section~\ref{httpd:path-handlers} for
  details about path handlers) for CGI scripts located in
  \semvar{bin-dir}. \semvar{cgi-bin-dir} specifies the value of the
  \ex{PATH} variable of the environment the CGI scripts run in.
  It defaults to
  ``\ex{/bin:\ob{}/usr/bin:\ob{}/usr/ucb:\ob{}/usr/bsd:\ob{}/usr/local/bin}''
  but is overridden by the value of the \ex{PATH} environment
  variable at the time \ex{cgi\=handler} is called. The CGI scripts
  are called as specified by CGI/1.1\footnote{See
    \ex{http://hoohoo.ncsa.uiuc.edu/cgi/interface.html} for a sort of
    specification.}.
  \begin{itemize}
  \item Various environment variables are set (like
    \ex{QUERY\_STRING} or \ex{REMOTE\_HOST}).
  \item ISINDEX queries get their arguments as command line
    arguments.
  \item Scripts are handled differently according to their name:
    \begin{itemize}
    \item If the name of the script starts with `\ex{nph-}', its
      output is sent back to the client directly. If its return code
      is not zero, an error message is generated.
    \item If the name of the script doesn't start with `\ex{nph-}',
      its reply is read, the RFC~822 fields like ``Content-Type'' and
      ``Status'' are parsed and the client is sent back a real HTTP
      reply, containing the rest of the script's output.
    \end{itemize}
  \end{itemize}
\end{defundesc}

\section{Support procs}

The source files contain a host of support procedures which will be
of utility to anyone writing a custom path handler. Read the files
first. \FIXME{Let us read the files and paste the contents here.}

%%% Local Variables:
%%% mode: latex
%%% TeX-master: "man"
%%% End: