From d9fc32433d068b9de6c6cb0e1c49c8806830a757 Mon Sep 17 00:00:00 2001 From: interp Date: Wed, 27 Feb 2002 19:28:27 +0000 Subject: [PATCH] Made a HTML to LaTeX from the existant HTML docu on the Web server. There are still a lots of FIXMEs. --- doc/latex/httpd.tex | 353 +++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 352 insertions(+), 1 deletion(-) diff --git a/doc/latex/httpd.tex b/doc/latex/httpd.tex index ceda935..3fc8621 100644 --- a/doc/latex/httpd.tex +++ b/doc/latex/httpd.tex @@ -5,4 +5,355 @@ \item[Name of the package:] ftpd \end{description} % -Not documented yet. \ No newline at end of file + +\subsection{Introduction} + +The \Scheme underground Web system is a package of \Scheme code +that provides utilities for interacting with the World-Wide Web. +This includes: +\begin{itemize} +\item A Web server. +\item URI and URL parsers and un-parsers (see sections \ref{sec:uri} + and \ref{sec:url}). +\item RFC822-style header parsers (see section \ref{sec:rfc822}). +\item Code for performing structured html output +\item Code to assist in writing CGI \Scheme programs that can be used by + any CGI-compliant HTTP server (such as NCSA's httpd, or the S.U. + Web server). +\end{itemize} + +The code can be obtained via anonymous +ftp\footnote{\ttt{}ftp://ftp-swiss.ai.mit.edu/pub/scsh/contrib/net/net.tar.gz} +and is implemented in \scm, using the system calls and support +procedures of scsh, the \Scheme Shell. The code was written to be +clear and modifiable -- it is voluminously commented and all non-\RnRS +dependencies are described at the beginning of each source file. + +\FIXME{We should remove the note to read the source files and insert +the essentials here instead.} +I do not have the time to write detailed documentation for these +packages. However, they are very thoroughly commented, and I strongly +recommend reading the source files; they were written to be read, and +the source code comments should provide a clear description of the +system. 
The remainder of this note gives an overview of the server's +basic architecture and interfaces. + +\subsection{The Scheme Underground Web Server} + +The server was designed with three principal goals in mind: + +\begin{description} +\item{Extensibility} \\ + The server is designed to make it easy to extend the basic + functionality. In fact, the server is nothing but extensions. There + is no distinction between the set of basic services provided by the + server implementation and user extensions -- they are both + implemented in Scheme, and have equal status. The design is ``turtles + all the way down''. + +\item{Mobile code} \\ + Because the server is written in \scm, it is simple to use the \scm + module system to upload programs to the server for safe execution + within a protected, server-chosen environment. The server comes with + a simple example upload service to demonstrate this capability. + +\item{Clarity of implementation} \\ + Because the server is written in a high-level language, it should + make for a clearer exposition of the HTTP protocol and the + associated URL and URI notations than one written in a low-level + language such as C. This also should help to make the server easy to + modify and adapt to different uses. +\end{description} + +\subsubsection*{Basic server structure} + +The Web server is started by calling the httpd procedure, which takes +one required and two optional arguments: + +\defun{httpd}{path-handler \ovar{port working-directory}}{\noreturn} +\begin{desc} + The server accepts connections from the given \semvar{port}, which + defaults to 80. The server runs with the \semvar{working-directory} set to + the given value, which defaults to \ex{/usr/local/etc/httpd}. + + The server's basic loop is to wait on the port for a connection from + an HTTP client. When it receives a connection, it reads in and + parses the request into a special request data structure. 
Then the + server \FIXME{Does the server still fork or does it make a thunk. Is + this a difference? (Do not know)} forks a child process, which binds + the current I/O ports to the connection socket, and then hands off + to the top-level \semvar{path-handler} (the first argument to + httpd). The \semvar{path-handler} procedure is responsible for + actually serving the request -- it can be any arbitrary computation. + Its output goes directly back to the HTTP client that sent the + request. + + Before calling the path handler to service the request, the HTTP + server installs an error handler that fields any uncaught error, + sends an error reply to the client, and aborts the request + transaction. Hence any error caused by a path-handler will be + handled in a reasonable and robust fashion. + + The basic server loop and the associated request data structure are + the fixed architecture of the S.U. Web server; its flexibility lies + in the notion of path handlers. +\end{desc} + +\subsubsection*{Path handlers} + + A path handler is a procedure taking two arguments: +\defun{path-handler}{path req}{value} +\begin{desc} + The \semvar{req} argument is a request record giving all the details + of the client's request; it has the following structure: \FIXME{Make + the record's structure a table} + \begin{code} +(define-record request + method ; A string such as "GET", "PUT", etc. + uri ; The escaped URI string as read from request line. + url ; An http URL record (see url.scm). + version ; A (major . minor) integer pair. + headers ; An rfc822 header alist (see rfc822.scm). + socket) ; The socket connected to the client.\end{code} + +The \semvar{path} argument is the URL's path, parsed and split at +slashes into a string list. 
For example, if the Web client +dereferences the URL +\codex{http://\ob{}clark.\ob{}lcs.\ob{}mit.\ob{}edu:\ob{}8001/\ob{}h/\ob{}shi\ob{}vers/\ob{}co\ob{}de/\ob{}web.\ob{}tar.\ob{}gz} +then the server would pass the following path to the top-level +handler: \ex{("h"\ob{} "shivers"\ob{} "code"\ob{} + "web.\ob{}tar.\ob{}gz")} + +The \semvar{path} argument's pre-parsed representation as a string +list makes it easy for the path handler to implement recursive +dispatch on URL paths. +\end{desc} + +Path handlers can do anything they like to respond to HTTP requests; +they have the full range of Scheme to implement the desired +functionality. When handling HTTP requests that have an associated +entity body (such as POST), the body should be read from the current +input port. Path handlers should in all cases write their reply to the +current output port. Path handlers should not perform I/O on the +request record's socket. Path handlers are frequently called +recursively, and doing I/O directly to the socket might bypass a +filtering or other processing step interposed on the current I/O ports +by some superior path handler. + +\subsubsection*{Basic path handlers} + +Although users can write any path handler they like, the S.U. server +comes with a useful toolbox of basic path handlers that can be used +and built upon: + +\begin{defundesc}{alist-path-dispatcher}{ph-alist default-ph}{path-handler} + This procedure takes a \ex{string->\ob{}path\=handler} alist, and a + default path handler, and returns a handler that dispatches on its + path argument. When the new path handler is applied to a path + \ex{("foo"\ob{} "bar"\ob{} "baz")}, it uses the first element of + the path -- ``\ex{foo}'' -- to index into the alist. If it finds an + associated path handler in the alist, it hands the request off to + that handler, passing it the tail of the path, \ex{("bar"\ob{} + "baz")}. 
On the other hand, if the path is empty, or the alist + search does not yield a hit, we hand off to the default path + handler, passing it the entire original path, \ex{("foo"\ob{} + "bar"\ob{} "baz")}. + + This procedure is how you say: ``If the first element of the URL's + path is `foo', do X; if it's `bar', do Y; otherwise, do Z.'' If one + takes an object-oriented view of the process, an alist path-handler + does method lookup on the requested operation, dispatching off to + the appropriate method defined for the URL. + + The slash-delimited URI path structure implies an associated tree of + names. The path-handler system and the alist dispatcher allow you to + procedurally define the server's response to any arbitrary subtree + of the path space. + + Example: A typical top-level path handler is +\begin{code} +(define ph + (alist-path-dispatcher + `(("h" . ,(home-dir-handler "public_html")) + ("cgi-bin" . ,(cgi-handler "/usr/local/etc/httpd/cgi-bin")) + ("seval" . ,seval-handler)) + (rooted-file-handler "/usr/local/etc/httpd/htdocs")))\end{code} + + This means: +\begin{itemize} +\item If the path looks like \ex{("h"\ob{} "shivers"\ob{} + "code"\ob{} "web.\ob{}tar.\ob{}gz")}, pass the path + \ex{("shivers"\ob{} "code"\ob{} "web.\ob{}tar.\ob{}gz")} to a + home-directory path handler. +\item If the path looks like \ex{("cgi-\ob{}bin"\ob{} "calendar")}, + pass \ex{("calendar")} off to the CGI path handler. + \item If the path looks like \ex{("seval"\ob{} \ldots)}, the tail + of the path is passed off to the code-uploading seval path + handler. + \item Otherwise, the whole path is passed to a rooted file handler, + which will convert it into a filename, rooted at + \ex{/usr/\ob{}lo\ob{}cal/\ob{}etc/\ob{}httpd/\ob{}htdocs}, + and serve that file. +\end{itemize} +\end{defundesc} + +\begin{defundesc}{home-dir-handler}{subdir}{path-handler} + This procedure builds a path handler that does basic file serving + out of home directories. 
If the resulting \semvar{path-handler} is + passed a path of \ex{(user . file\=path)}, then it serves the file + \ex{user's\=ho\ob{}me\=di\ob{}rec\ob{}to\ob{}ry/\ob{}sub\ob{}dir/\ob{}file\=path}. + + The path handler only handles GET requests; the filename is not + allowed to contain \ex{..} elements. +\end{defundesc} + +\begin{defundesc}{tilde-home-dir-handler}{subdir default-path-handler}{path-handler} + This path handler examines the car of the path. If it is a string + beginning with a tilde, e.g., \ex{"~ziggy"}, then the string is + taken to mean a home directory, and the request is served similarly + to a home-dir-handler path handler. Otherwise, the request is passed + off in its entirety to the \semvar{default-path-handler}. + + This procedure is useful for implementing servers that provide the + semantics of the NCSA httpd server. +\end{defundesc} + +\begin{defundesc}{cgi-handler}{cgi-directory}{path-handler} + This procedure returns a path-handler that passes the request off to + some program using the CGI interface. The script name is taken from + the car of the path; it is checked for occurrences of \ex{..}. If + the path is \ex{("my\=prog"\ob{} "foo"\ob{} "bar")} then the + program executed is + \ex{cgi\=di\ob{}rec\ob{}to\ob{}ry/\ob{}my\=prog}. + + When the CGI path handler builds the process environment for the CGI + script, several elements (e.g., \ex{\$PATH} and \ex{\$SERVER\_SOFTWARE}) are request-invariant, and can be + computed at server start-up time. This can be done by calling + \codex{(initialise-request-invariant-cgi-env)} + when the server starts up. This is not necessary, but will make CGI + requests a little faster. +\end{defundesc} + +\begin{defundesc}{rooted-file-handler}{root-dir}{path-handler} + Returns a path handler that serves files from a particular root in + the file system. Only the GET operation is provided. The path + argument passed to the handler is converted into a filename, and + appended to root-dir. 
The file name is checked for \ex{..} + components, and the transaction is aborted if it contains any. Otherwise, + the file is served to the client. +\end{defundesc} + +\begin{defundesc}{null-path-handler}{path req}{\noreturn} + This path handler is useful as a default handler. It handles no + requests, always returning a ``404 Not found'' reply to the client. +\end{defundesc} + +\subsection{HTTP errors} + +Authors of path-handlers need to be able to handle errors in a +reasonably simple fashion. The S.U. Web server provides a set of error +conditions that correspond to the error replies in the HTTP protocol. +These errors can be raised with the \ex{http\=error} procedure. When +the server runs a path handler, it runs it in the context of an error +handler that catches these errors, sends an error reply to the client, +and closes the transaction. + +\begin{defundesc}{http-error}{reply-code req \ovar{extra \ldots}}{\noreturn} + This raises an HTTP error condition. The reply code is one of the + numeric HTTP error reply codes, which are bound to the variables + \ex{http\=re\ob{}ply/\ob{}ok, http\=re\ob{}ply/\ob{}not\=found, + http\=re\ob{}ply/\ob{}bad\=request}, and so forth. The + \semvar{req} argument is the request record that caused the error. + Any following extra args are passed along for informational + purposes. Different HTTP errors take different types of extra + arguments. For example, the ``301 moved permanently'' and ``302 + moved temporarily'' replies use the first two extra values as the + \ex{URI:} and \ex{Lo\-ca\-tion:} fields in the reply header, + respectively. See the clauses of the + \ex{send\=http\=er\ob{}ror\=re\ob{}ply} procedure for details. +\end{defundesc} + +\begin{defundesc}{send-http-error-reply}{reply-code request \ovar{extra \ldots}}{\noreturn} + This procedure writes an error reply out to the current output port. + If an error occurs during this process, it is caught, and the + procedure silently returns. 
The HTTP server's standard error handler + passes all HTTP errors raised during path-handler execution to this + procedure to generate the error reply before aborting the request + transaction. +\end{defundesc} + +\subsection{Simple directory generation} + +Most path-handlers that serve files to clients eventually call an +internal procedure named \ex{file\=serve}, which implements a simple +directory-generation service using the following rules: +\begin{itemize} +\item If the filename has the form of a directory (i.e., it ends with + a slash), then \ex{file\=serve} actually looks for a file named + ``index.html'' in that directory. +\item If the filename names a directory, but is not in directory form + (i.e., it doesn't end in a slash, as in + ``\ex{/usr/\ob{}in\ob{}clu\ob{}de}'' or ``\ex{/usr/\ob{}raj}''), + then \ex{file\=serve} sends back a ``301 moved permanently'' + message, redirecting the client to a slash-terminated version of the + original URL. For example, the URL + \ex{http://\ob{}clark.\ob{}lcs.\ob{}mit.\ob{}edu/\ob{}~shi\ob{}vers} + would be redirected to + \ex{http://\ob{}clark.\ob{}lcs.\ob{}mit.\ob{}edu/\ob{}~shi\ob{}vers/} +\item If the filename names a regular file, it is served to the + client. +\end{itemize} + +\subsection{Support procs} + +The source files contain a host of support procedures which will be of +utility to anyone writing a custom path-handler. Read the files first. +\FIXME{Let us read the files and paste the contents here.} + +\subsection{Losing} + + Be aware of two Unix problems, which may require workarounds: +\begin{enumerate} +\item NeXTSTEP's Posix implementation of the \ex{get\ob{}pwnam()} + routine will silently tell you that every user has uid 0. This means + that if your server, running as root, does a + \codex{(set-uid (user->uid "nobody"))} + it will essentially do a + \codex{(set-uid 0)} + and you will thus still be running as root. 
The fix is to manually + find out who user nobody is (he's -2 on my system), and to hard-wire + this into the server: + \codex{(set-uid -2)} + This problem is NeXTSTEP-specific. If you are not using + NeXTSTEP, you are not affected. +\item On NeXTSTEP, the \ex{ip\=ad\ob{}dress->\ob{}host\=name} + translation routine (in C, \ex{get\ob{}host\ob{}by\ob{}addr()}; in + scsh, \ex{(host\=in\ob{}fo addr)}) does not use the DNS system; it + goes through NeXT's proprietary Netinfo system, and may not return a + fully-qualified domain name. For example, on my system, I get + ``\ex{ame\ob{}lia\=ear\ob{}hart}'', when I want + ``\ex{ame\ob{}lia\=ear\ob{}hart.\ob{}lcs.\ob{}mit.\ob{}edu}''. Since + the server uses this name to construct redirection URLs to be sent + back to the Web client, they need to be FQDNs. + + This problem may occur on other OSes; I cannot determine if + \ex{get\ob{}host\ob{}by\ob{}addr()} is required to return a FQDN or + not. (I would appreciate hearing the answer if you know; my local + Internet gurus couldn't tell me.) + + If your system doesn't give you a complete Internet address when you + say + \codex{(host-info:name (host-info (system-name)))} + then you have this problem. + + The server has a workaround. There is a procedure exported from the + \ex{httpd\=core} package: + \codex{(set-my-fqdn name)} + Call this to crow-bar the server's idea of its own Internet host + name before running the server, and all will be well. +\end{enumerate} + +%%% Local Variables: +%%% mode: latex +%%% TeX-master: t +%%% End: