From d9fc32433d068b9de6c6cb0e1c49c8806830a757 Mon Sep 17 00:00:00 2001
From: interp <interp>
Date: Wed, 27 Feb 2002 19:28:27 +0000
Subject: [PATCH] Made a HTML to LaTeX from the existant HTML docu on the Web
 server. There are still a lots of FIXMEs.

---
 doc/latex/httpd.tex | 353 +++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 352 insertions(+), 1 deletion(-)
diff --git a/doc/latex/httpd.tex b/doc/latex/httpd.tex
index ceda935..3fc8621 100644
--- a/doc/latex/httpd.tex
+++ b/doc/latex/httpd.tex
@@ -5,4 +5,355 @@
 \item[Name of the package:] ftpd
 \end{description}
 %
-Not documented yet.
\ No newline at end of file
+
+\subsection{Introduction}
+
+The \Scheme underground Web system is a package of \Scheme code
+that provides utilities for interacting with the World-Wide Web.
+This includes:
+\begin{itemize}
+\item A Web server.
+\item URI and URL parsers and un-parsers (see sections \ref{sec:uri}
+  and \ref{sec:url}).
+\item RFC822-style header parsers (see section \ref{sec:rfc822}).
+\item Code for performing structured HTML output.
+\item Code to assist in writing CGI \Scheme programs that can be used by
+  any CGI-compliant HTTP server (such as NCSA's httpd, or the S.U.
+  Web server).
+\end{itemize}
+       
+The code can be obtained via anonymous
+ftp\footnote{\ttt{}ftp://ftp-swiss.ai.mit.edu/pub/scsh/contrib/net/net.tar.gz}
+and is implemented in \scm, using the system calls and support
+procedures of scsh, the \Scheme Shell. The code was written to be
+clear and modifiable -- it is voluminously commented and all non-\RnRS
+dependencies are described at the beginning of each source file.
+   
+\FIXME{We should remove the note to read the source files and insert
+the essentials here instead.}
+I do not have the time to write detailed documentation for these
+packages. However, they are very thoroughly commented, and I strongly
+recommend reading the source files; they were written to be read, and
+the source code comments should provide a clear description of the
+system. The remainder of this note gives an overview of the server's
+basic architecture and interfaces.
+   
+\subsection{The Scheme Underground Web Server}
+
+The server was designed with three principal goals in mind:
+
+\begin{description}   
+\item[Extensibility] \mbox{} \\
+  The server is designed to make it easy to extend the basic
+  functionality. In fact, the server is nothing but extensions.  There
+  is no distinction between the set of basic services provided by the
+  server implementation and user extensions -- they are both
+  implemented in Scheme, and have equal status. The design is ``turtles
+  all the way down''.
+          
+\item[Mobile code] \mbox{} \\
+  Because the server is written in \scm, it is simple to use the \scm
+  module system to upload programs to the server for safe execution
+  within a protected, server-chosen environment. The server comes with
+  a simple example upload service to demonstrate this capability.
+          
+\item[Clarity of implementation] \mbox{} \\
+  Because the server is written in a high-level language, it should
+  make for a clearer exposition of the HTTP protocol and the
+  associated URL and URI notations than one written in a low-level
+  language such as C. This also should help to make the server easy to
+  modify and adapt to different uses.
+\end{description}
+
+\subsubsection*{Basic server structure}
+  
+The Web server is started by calling the \ex{httpd} procedure, which takes
+one required and two optional arguments:
+
+\defun{httpd}{path-handler \ovar{port working-directory}}{\noreturn}
+\begin{desc}
+  The server accepts connections on the given \semvar{port}, which
+  defaults to 80. The server runs with the \semvar{working-directory} set to
+  the given value, which defaults to \ex{/usr/local/etc/httpd}.
+  
+  The server's basic loop is to wait on the port for a connection from
+  an HTTP client. When it receives a connection, it reads in and
+  parses the request into a special request data structure. Then the
+  server \FIXME{Does the server still fork, or does it create a thunk?
+    Does this make a difference? (Not known.)} forks a child process, which binds
+  the current I/O ports to the connection socket, and then hands off
+  to the top-level \semvar{path-handler} (the first argument to
+  \ex{httpd}). The \semvar{path-handler} procedure is responsible for
+  actually serving the request -- it can be any arbitrary computation.
+  Its output goes directly back to the HTTP client that sent the
+  request.
+   
+  Before calling the path handler to service the request, the HTTP
+  server installs an error handler that fields any uncaught error,
+  sends an error reply to the client, and aborts the request
+  transaction.  Hence any error caused by a path-handler will be
+  handled in a reasonable and robust fashion.
+   
+  The basic server loop and the associated request data structure are
+  the fixed architecture of the S.U. Web server; its flexibility lies
+  in the notion of path handlers.
+\end{desc}   
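+
+As a minimal sketch of starting the server (not taken from the
+distribution), the call below serves on port 8080 from \ex{/tmp} and
+answers every request with a ``404 Not found'' reply via
+\ex{null\=path\=handler}. The port number and directory are
+arbitrary values chosen for illustration.
+\begin{code}
+;; Illustrative values: port 8080 and "/tmp" are assumptions,
+;; not defaults of the server.
+(httpd null-path-handler 8080 "/tmp")\end{code}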
+
+\subsubsection*{Path handlers}
+  
+   A path handler is a procedure taking two arguments:
+\defun{path-handler}{path req}{value}
+\begin{desc}
+  The \semvar{req} argument is a request record giving all the details
+  of the client's request; it has the following structure: \FIXME{Make
+    the record's structure a table}
+  \begin{code}
+(define-record request
+ method            ; A string such as "GET", "PUT", etc.
+ uri               ; The escaped URI string as read from request line.
+ url               ; An http URL record (see url.scm).
+ version           ; A (major . minor) integer pair.
+ headers           ; An rfc822 header alist (see rfc822.scm).
+ socket)           ; The socket connected to the client.\end{code}
+
+The \semvar{path} argument is the URL's path, parsed and split at
+slashes into a string list. For example, if the Web client
+dereferences URL
+\codex{http://\ob{}clark.\ob{}lcs.\ob{}mit.\ob{}edu:\ob{}8001/\ob{}h/\ob{}shi\ob{}vers/\ob{}co\ob{}de/\ob{}web.\ob{}tar.\ob{}gz}
+then the server would pass the following path to the top-level
+handler: \ex{("h"\ob{} "shivers"\ob{} "code"\ob{}
+  "web.\ob{}tar.\ob{}gz")}
+
+The \semvar{path} argument's pre-parsed representation as a string
+list makes it easy for the path handler to implement recursive
+dispatch on URL paths.
+\end{desc}
+   
+Path handlers can do anything they like to respond to HTTP requests;
+they have the full range of Scheme to implement the desired
+functionality. When handling HTTP requests that have an associated
+entity body (such as POST), the body should be read from the current
+input port. Path handlers should in all cases write their reply to the
+current output port. Path handlers should not perform I/O on the
+request record's socket. Path handlers are frequently called
+recursively, and doing I/O directly to the socket might bypass a
+filtering or other processing step interposed on the current I/O ports
+by some superior path handler.
+
+\subsubsection*{Basic path handlers}
+  
+Although the user can write any path-handler he likes, the S.U. server
+comes with a useful toolbox of basic path handlers that can be used
+and built upon:
+   
+\begin{defundesc}{alist-path-dispatcher}{ph-alist default-ph}{path-handler}
+  This procedure takes a \ex{string->\ob{}path\=handler} alist, and a
+  default path handler, and returns a handler that dispatches on its
+  path argument. When the new path handler is applied to a path
+  \ex{("foo"\ob{} "bar"\ob{} "baz")}, it uses the first element of
+  the path -- ``\ex{foo}'' -- to index into the alist. If it finds an
+  associated path handler in the alist, it hands the request off to
+  that handler, passing it the tail of the path, \ex{("bar"\ob{}
+    "baz")}. On the other hand, if the path is empty, or the alist
+  search does not yield a hit, we hand off to the default path
+  handler, passing it the entire original path, \ex{("foo"\ob{}
+    "bar"\ob{} "baz")}.
+          
+  This procedure is how you say: ``If the first element of the URL's
+  path is `foo', do X; if it's `bar', do Y; otherwise, do Z.'' If one
+  takes an object-oriented view of the process, an alist path-handler
+  does method lookup on the requested operation, dispatching off to
+  the appropriate method defined for the URL.
+          
+  The slash-delimited URI path structure implies an associated tree of
+  names. The path-handler system and the alist dispatcher allow you to
+  procedurally define the server's response to any arbitrary subtree
+  of the path space.
+          
+  Example: A typical top-level path handler is
+\begin{code}          
+(define ph
+  (alist-path-dispatcher
+      `(("h"       . ,(home-dir-handler "public_html"))
+        ("cgi-bin" . ,(cgi-handler "/usr/local/etc/httpd/cgi-bin"))
+        ("seval"   . ,seval-handler))
+      (rooted-file-handler "/usr/local/etc/httpd/htdocs")))\end{code}
+    
+    This means:
+\begin{itemize}          
+\item If the path looks like \ex{("h"\ob{} "shivers"\ob{}
+    "code"\ob{} "web.\ob{}tar.\ob{}gz")}, pass the path
+  \ex{("shivers"\ob{} "code"\ob{} "web.\ob{}tar.\ob{}gz")} to a
+  home-directory path handler.
+\item If the path looks like \ex{("cgi-\ob{}bin"\ob{} "calendar")},
+    pass \ex{("calendar")} off to the CGI path handler.
+  \item If the path looks like \ex{("seval"\ob{} \ldots)}, the tail
+    of the path is passed off to the code-uploading seval path
+    handler.
+  \item Otherwise, the whole path is passed to a rooted file handler,
+    which converts it into a filename, rooted at
+    \ex{/usr/\ob{}lo\ob{}cal/\ob{}etc/\ob{}httpd/\ob{}htdocs},
+    and serve that file.
+\end{itemize}
+\end{defundesc}
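+
+The dispatch rule described above can be sketched in a few lines of
+\Scheme. This is an illustration of the described behaviour, not the
+server's actual source; the name \ex{sketch\=alist\=path\=dispatcher}
+is hypothetical.
+\begin{code}
+;; Hypothetical re-implementation, for illustration only.
+(define (sketch-alist-path-dispatcher ph-alist default-ph)
+  (lambda (path req)
+    (cond ((and (pair? path) (assoc (car path) ph-alist))
+           ;; Hit: hand the tail of the path to the handler found.
+           => (lambda (entry) ((cdr entry) (cdr path) req)))
+          ;; Empty path or no hit: default gets the whole path.
+          (else (default-ph path req)))))\end{code}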
+            
+\begin{defundesc}{home-dir-handler}{subdir}{path-handler}
+  This procedure builds a path handler that does basic file serving
+  out of home directories. If the resulting \semvar{path-handler} is
+  passed a path of \ex{(user . file\=path)}, then it serves the file
+  \ex{user's\=ho\ob{}me\=di\ob{}rec\ob{}to\ob{}ry/\ob{}sub\ob{}dir/\ob{}file\=path}.
+    
+  The path handler only handles GET requests; the filename is not
+  allowed to contain \ex{..} elements.
+\end{defundesc}
+          
+\begin{defundesc}{tilde-home-dir-handler}{subdir default-path-handler}{path-handler}
+  This path handler examines the car of the path. If it is a string
+  beginning with a tilde, e.g., \ex{"~ziggy"}, then the string is
+  taken to mean a home directory, and the request is served similarly
+  to a home-dir-handler path handler. Otherwise, the request is passed
+  off in its entirety to the \semvar{default-path-handler}.
+          
+  This procedure is useful for implementing servers that provide the
+  semantics of the NCSA httpd server.
+\end{defundesc}
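+
+For instance, a top-level handler in the NCSA style might be built as
+follows. This is a sketch using only the procedures described in this
+section; the name \ex{ncsa\=style\=ph} and both directory names are
+illustrative assumptions.
+\begin{code}
+;; "~user/..." paths are served from the user's public_html
+;; directory; everything else falls through to the htdocs tree.
+(define ncsa-style-ph
+  (tilde-home-dir-handler
+      "public_html"
+      (rooted-file-handler "/usr/local/etc/httpd/htdocs")))\end{code}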
+          
+\begin{defundesc}{cgi-handler}{cgi-directory}{path-handler}
+  This procedure returns a path-handler that passes the request off to
+  some program using the CGI interface. The script name is taken from
+  the car of the path; it is checked for occurrences of \ex{..}'s. If
+  the path is \ex{("my\=prog"\ob{} "foo"\ob{} "bar")} then the
+  program executed is
+  \ex{cgi\=di\ob{}rec\ob{}to\ob{}ry/\ob{}my\=prog}.
+
+  When the CGI path handler builds the process environment for the CGI
+  script, several elements (e.g., \ex{\$PATH} and
+  \ex{\$SERVER\_SOFTWARE}) are request-invariant, and can be
+  computed at server start-up time. This can be done by calling
+  \codex{(initialise-request-invariant-cgi-env)} 
+  when the server starts up. This is not necessary, but will make CGI
+  requests a little faster.
+\end{defundesc}
+          
+\begin{defundesc}{rooted-file-handler}{root-dir}{path-handler} 
+  Returns a path handler that serves files from a particular root in
+  the file system. Only the GET operation is provided. The path
+  argument passed to the handler is converted into a filename, and
+  appended to \semvar{root-dir}.  The file name is checked for \ex{..}
+  components, and the transaction is aborted if it contains any.  Otherwise,
+  the file is served to the client.
+\end{defundesc}
+          
+\begin{defundesc}{null-path-handler}{path req}{\noreturn}
+  This path handler is useful as a default handler. It handles no
+  requests, always returning a ``404 Not found'' reply to the client.
+\end{defundesc}
+          
+\subsection{HTTP errors}
+  
+Authors of path-handlers need to be able to handle errors in a
+reasonably simple fashion. The S.U. Web server provides a set of error
+conditions that correspond to the error replies in the HTTP protocol.
+These errors can be raised with the \ex{http\=error} procedure. When
+the server runs a path handler, it runs it in the context of an error
+handler that catches these errors, sends an error reply to the client,
+and closes the transaction.
+   
+\begin{defundesc}{http-error}{reply-code req \ovar{extra \ldots}}{\noreturn}
+  This raises an HTTP error condition. The reply code is one of the
+  numeric HTTP error reply codes, which are bound to the variables
+  \ex{http\=re\ob{}ply/\ob{}ok, http\=re\ob{}ply/\ob{}not\=found,
+    http\=re\ob{}ply/\ob{}bad\=request}, and so forth. The
+  \semvar{req} argument is the request record that caused the error.
+  Any following extra args are passed along for informational
+  purposes. Different HTTP errors take different types of extra
+  arguments. For example, the ``301 moved permanently'' and ``302
+  moved temporarily'' replies use the first two extra values as the
+  \ex{URI:} and \ex{Lo\ob{}ca\ob{}tion:} fields in the reply header,
+  respectively. See the clauses of the
+  \ex{send\=http\=er\ob{}ror\=re\ob{}ply} procedure for details.
+\end{defundesc}
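+
+As a sketch of raising an error from a handler, the wrapper below
+passes GET requests to another handler and rejects everything else.
+The names \ex{get\=only\=handler} and \ex{inner} are hypothetical, and
+the accessor \ex{request:method} is an assumption based on the field
+accessors that scsh's \ex{define\=record} usually generates; check
+the sources for the exact names.
+\begin{code}
+;; Hypothetical wrapper, for illustration only.
+(define (get-only-handler inner)
+  (lambda (path req)
+    (if (string=? (request:method req) "GET")
+        (inner path req)
+        ;; Does not return: the server's error handler sends
+        ;; the reply and aborts the transaction.
+        (http-error http-reply/bad-request req))))\end{code}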
+          
+\begin{defundesc}{send-http-error-reply}{reply-code request \ovar{extra \ldots}}{\noreturn}
+  This procedure writes an error reply out to the current output port.
+  If an error occurs during this process, it is caught, and the
+  procedure silently returns. The HTTP server's standard error handler
+  passes all HTTP errors raised during path-handler execution to this
+  procedure to generate the error reply before aborting the request
+  transaction.
+\end{defundesc}
+          
+\subsection{Simple directory generation}
+  
+Most path-handlers that serve files to clients eventually call an
+internal procedure named \ex{file\=serve}, which implements a simple
+directory-generation service using the following rules:
+\begin{itemize}
+\item If the filename has the form of a directory (i.e., it ends with
+  a slash), then \ex{file\=serve} actually looks for a file named
+  ``index.html'' in that directory.
+\item If the filename names a directory, but is not in directory form
+  (i.e., it doesn't end in a slash, as in
+  ``\ex{/usr/\ob{}in\ob{}clu\ob{}de}'' or ``\ex{/usr/\ob{}raj}''),
+  then \ex{file\=serve} sends back a ``301 moved permanently''
+  message, redirecting the client to a slash-terminated version of the
+  original URL. For example, the URL
+  \ex{http://\ob{}clark.\ob{}lcs.\ob{}mit.\ob{}edu/\ob{}~shi\ob{}vers}
+  would be redirected to
+  \ex{http://\ob{}clark.\ob{}lcs.\ob{}mit.\ob{}edu/\ob{}~shi\ob{}vers/}.
+\item If the filename names a regular file, it is served to the
+  client.
+\end{itemize}
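+
+The rules above amount to a small decision procedure, sketched here.
+This is not the source of \ex{file\=serve}: the name
+\ex{classify\=file\=request} is hypothetical, the procedure only
+returns a description of the action instead of performing it, and the
+\ex{not\=found} cases are an assumption about names that match none
+of the rules. The file-name and file-type predicates are taken from
+scsh.
+\begin{code}
+;; Hypothetical helper, for illustration only.
+(define (classify-file-request fname)
+  (cond ((file-name-directory? fname)       ; Ends with a slash.
+         (list 'serve (string-append fname "index.html")))
+        ((file-not-exists? fname)           ; Assumed behaviour.
+         (list 'not-found fname))
+        ((file-directory? fname)            ; Directory, no slash.
+         (list 'redirect (file-name-as-directory fname)))
+        ((file-regular? fname)
+         (list 'serve fname))
+        (else (list 'not-found fname))))\end{code}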
+       
+\subsection{Support procedures}
+  
+The source files contain a host of support procedures which will be of
+utility to anyone writing a custom path-handler. Read the files first.
+\FIXME{Let us read the files and paste the contents here.}
+   
+\subsection{Losing}
+  
+   Be aware of two Unix problems, which may require workarounds:
+\begin{enumerate}
+\item NeXTSTEP's POSIX implementation of the \ex{get\ob{}pwnam()}
+  routine will silently tell you that every user has uid 0. This means
+  that if your server, running as root, does a
+  \codex{(set-uid (user->uid "nobody"))}
+  it will essentially do a 
+  \codex{(set-uid 0)}
+  and you will thus still be running as root.  The fix is to manually
+  find out who user nobody is (he's -2 on my system), and to hard-wire
+  this into the server: 
+  \codex{(set-uid -2)} 
+  This problem is NeXTSTEP-specific. If you are not using
+  NeXTSTEP, no problem.
+\item On NeXTSTEP, the \ex{ip\=ad\ob{}dress->\ob{}host\=name}
+  translation routine (in C, \ex{get\ob{}host\ob{}by\ob{}addr()}; in
+  scsh, \ex{(host\=in\ob{}fo addr)}) does not use the DNS system; it
+  goes through NeXT's proprietary NetInfo system, and may not return a
+  fully-qualified domain name. For example, on my system, I get
+  ``\ex{ame\ob{}lia\=ear\ob{}hart}'', when I want
+  ``\ex{ame\ob{}lia\=ear\ob{}hart.\ob{}lcs.\ob{}mit.\ob{}edu}''. Since
+  the server uses this name to construct redirection URLs to be sent
+  back to the Web client, it needs to be an FQDN.
+  
+  This problem may occur on other OSes; I cannot determine if
+  \ex{get\ob{}host\ob{}by\ob{}addr()} is required to return an FQDN or
+  not. (I would appreciate hearing the answer if you know; my local
+  Internet gurus couldn't tell me.)
+  
+  If your system doesn't give you a complete Internet address when you
+  say
+  \codex{(host-info:name (host-info (system-name)))}
+  then you have this problem. 
+
+  The server has a workaround. There is a procedure exported from the
+  \ex{httpd\=core} package:
+  \codex{(set-my-fqdn name)}
+  Call this to crow-bar the server's idea of its own Internet host
+  name before running the server, and all will be well.
+\end{enumerate}
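+
+A start-up sequence using this workaround might look as follows. This
+is a sketch: the host name is the example from above, and \ex{ph}
+stands for whatever top-level path handler is in use.
+\begin{code}
+;; Set the host name before the server starts building
+;; redirection URLs.
+(set-my-fqdn "amelia-earhart.lcs.mit.edu")
+(httpd ph)\end{code}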
+
+%%% Local Variables: 
+%%% mode: latex
+%%% TeX-master: t
+%%% End: