Converted the existing HTML documentation on the Web server to LaTeX.
There are still a lot of FIXMEs.
This commit is contained in:
parent
3a5a0e7867
commit
d9fc32433d
@ -5,4 +5,355 @@
\item[Name of the package:] ftpd
\end{description}
%
Not documented yet.

\subsection{Introduction}

The \Scheme underground Web system is a package of \Scheme code
that provides utilities for interacting with the World-Wide Web.
This includes:
\begin{itemize}
\item A Web server.
\item URI and URL parsers and un-parsers (see sections \ref{sec:uri}
  and \ref{sec:url}).
\item RFC822-style header parsers (see section \ref{sec:rfc822}).
\item Code for performing structured HTML output.
\item Code to assist in writing CGI \Scheme programs that can be used by
  any CGI-compliant HTTP server (such as NCSA's httpd, or the S.U.
  Web server).
\end{itemize}

The code can be obtained via anonymous
ftp\footnote{\ttt{ftp://ftp-swiss.ai.mit.edu/pub/scsh/contrib/net/net.tar.gz}}
and is implemented in \scm, using the system calls and support
procedures of scsh, the \Scheme Shell. The code was written to be
clear and modifiable -- it is voluminously commented and all non-\RnRS
dependencies are described at the beginning of each source file.

\FIXME{We should remove the note to read the source files and insert
  the essentials here instead.}
I do not have the time to write detailed documentation for these
packages. However, they are very thoroughly commented, and I strongly
recommend reading the source files; they were written to be read, and
the source code comments should provide a clear description of the
system. The remainder of this note gives an overview of the server's
basic architecture and interfaces.

\subsection{The Scheme Underground Web Server}

The server was designed with three principal goals in mind:

\begin{description}
\item[Extensibility] \hfill \\
  The server is designed to make it easy to extend the basic
  functionality. In fact, the server is nothing but extensions. There
  is no distinction between the set of basic services provided by the
  server implementation and user extensions -- they are both
  implemented in Scheme, and have equal status. The design is ``turtles
  all the way down''.

\item[Mobile code] \hfill \\
  Because the server is written in \scm, it is simple to use the \scm
  module system to upload programs to the server for safe execution
  within a protected, server-chosen environment. The server comes with
  a simple example upload service to demonstrate this capability.

\item[Clarity of implementation] \hfill \\
  Because the server is written in a high-level language, it should
  make for a clearer exposition of the HTTP protocol and the
  associated URL and URI notations than one written in a low-level
  language such as C. This should also help to make the server easy to
  modify and adapt to different uses.
\end{description}

\subsubsection*{Basic server structure}

The Web server is started by calling the \ex{httpd} procedure, which takes
one required and two optional arguments:

\defun{httpd}{path-handler \ovar{port working-directory}}{\noreturn}
\begin{desc}
  The server accepts connections on the given \semvar{port}, which
  defaults to 80. The server runs with the \semvar{working-directory} set to
  the given value, which defaults to \ex{/usr/local/etc/httpd}.

  The server's basic loop is to wait on the port for a connection from
  an HTTP client. When it receives a connection, it reads in and
  parses the request into a special request data structure. Then the
  server \FIXME{Does the server still fork or does it make a thunk?
    Does this make a difference? (Do not know)} forks a child process, which binds
  the current I/O ports to the connection socket, and then hands off
  to the top-level \semvar{path-handler} (the first argument to
  \ex{httpd}). The \semvar{path-handler} procedure is responsible for
  actually serving the request -- it can be any arbitrary computation.
  Its output goes directly back to the HTTP client that sent the
  request.

  Before calling the path handler to service the request, the HTTP
  server installs an error handler that fields any uncaught error,
  sends an error reply to the client, and aborts the request
  transaction. Hence any error caused by a path handler will be
  handled in a reasonable and robust fashion.

  The basic server loop and the associated request data structure are
  the fixed architecture of the S.U. Web server; its flexibility lies
  in the notion of path handlers.
\end{desc}
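
The hand-off described above can be modelled in a few lines. The
following is only a sketch in Python rather than \Scheme, so that it is
self-contained; it is not the server's code. The names
\ex{serve\_request}, \ex{hello\_handler} and \ex{broken\_handler} are
hypothetical, a string buffer stands in for the connection socket, and
the choice of a 500 reply for an uncaught error is an assumption.

```python
import contextlib
import io


def serve_request(path_handler, path, req):
    """Run one request and return everything written back to the client.

    Standard output plays the role of the current output port; the
    StringIO buffer stands in for the connection socket.
    """
    out = io.StringIO()
    with contextlib.redirect_stdout(out):
        try:
            path_handler(path, req)
        except Exception as exc:
            # Field any uncaught error and send an error reply instead
            # of letting the failure escape from the server loop.
            print(f"HTTP/1.0 500 Internal Server Error\r\n\r\n{exc}", end="")
    return out.getvalue()


def hello_handler(path, req):
    # A handler simply writes its reply to the current output port.
    print("HTTP/1.0 200 OK\r\n\r\nhello", end="")


def broken_handler(path, req):
    raise RuntimeError("handler failed")
```

In this model a failing handler still produces a well-formed reply,
which is the property the error handler is meant to guarantee.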

\subsubsection*{Path handlers}

A path handler is a procedure taking two arguments:
\defun{path-handler}{path req}{value}
\begin{desc}
  The \semvar{req} argument is a request record giving all the details
  of the client's request; it has the following structure: \FIXME{Make
    the record's structure a table}
  \begin{code}
(define-record request
  method     ; A string such as "GET", "PUT", etc.
  uri        ; The escaped URI string as read from request line.
  url        ; An http URL record (see url.scm).
  version    ; A (major . minor) integer pair.
  headers    ; An rfc822 header alist (see rfc822.scm).
  socket)    ; The socket connected to the client.\end{code}

  The \semvar{path} argument is the URL's path, parsed and split at
  slashes into a string list. For example, if the Web client
  dereferences the URL
  \codex{http://\ob{}clark.\ob{}lcs.\ob{}mit.\ob{}edu:\ob{}8001/\ob{}h/\ob{}shi\ob{}vers/\ob{}co\ob{}de/\ob{}web.\ob{}tar.\ob{}gz}
  then the server would pass the following path to the top-level
  handler: \ex{("h"\ob{} "shivers"\ob{} "code"\ob{}
    "web.\ob{}tar.\ob{}gz")}.

  The \semvar{path} argument's pre-parsed representation as a string
  list makes it easy for the path handler to implement recursive
  dispatch on URL paths.
\end{desc}
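
The path splitting can be illustrated with a short Python sketch. It is
not the server's parser: \ex{url\_path\_to\_list} is a hypothetical
helper built on Python's standard \ex{urllib.parse}, and dropping empty
segments (so that a trailing slash leaves no trace) is a simplifying
assumption.

```python
from urllib.parse import unquote, urlsplit


def url_path_to_list(url):
    """Split the path of a URL at slashes into a list of strings."""
    path = urlsplit(url).path
    # Split first, then unescape each piece, so that an escaped slash
    # inside a segment is not mistaken for a separator.
    return [unquote(part) for part in path.split("/") if part]


parts = url_path_to_list(
    "http://clark.lcs.mit.edu:8001/h/shivers/code/web.tar.gz")
# parts is ["h", "shivers", "code", "web.tar.gz"]
```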

Path handlers can do anything they like to respond to HTTP requests;
they have the full range of Scheme to implement the desired
functionality. When handling HTTP requests that have an associated
entity body (such as POST), the body should be read from the current
input port. Path handlers should in all cases write their reply to the
current output port. Path handlers should not perform I/O on the
request record's socket. Path handlers are frequently called
recursively, and doing I/O directly to the socket might bypass a
filtering or other processing step interposed on the current I/O ports
by some superior path handler.

\subsubsection*{Basic path handlers}

Although users can write any path handler they like, the S.U. server
comes with a useful toolbox of basic path handlers that can be used
and built upon:

\begin{defundesc}{alist-path-dispatcher}{ph-alist default-ph}{path-handler}
  This procedure takes a \ex{string->\ob{}path\=handler} alist, and a
  default path handler, and returns a handler that dispatches on its
  path argument. When the new path handler is applied to a path
  \ex{("foo"\ob{} "bar"\ob{} "baz")}, it uses the first element of
  the path -- ``\ex{foo}'' -- to index into the alist. If it finds an
  associated path handler in the alist, it hands the request off to
  that handler, passing it the tail of the path, \ex{("bar"\ob{}
    "baz")}. On the other hand, if the path is empty, or the alist
  search does not yield a hit, it hands off to the default path
  handler, passing it the entire original path, \ex{("foo"\ob{}
    "bar"\ob{} "baz")}.

  This procedure is how you say: ``If the first element of the URL's
  path is `foo', do X; if it's `bar', do Y; otherwise, do Z.'' If one
  takes an object-oriented view of the process, an alist path handler
  does method lookup on the requested operation, dispatching off to
  the appropriate method defined for the URL.

  The slash-delimited URI path structure implies an associated tree of
  names. The path-handler system and the alist dispatcher allow you to
  procedurally define the server's response to any arbitrary subtree
  of the path space.

  Example: A typical top-level path handler is
  \begin{code}
(define ph
  (alist-path-dispatcher
   `(("h"       . ,(home-dir-handler "public_html"))
     ("cgi-bin" . ,(cgi-handler "/usr/local/etc/httpd/cgi-bin"))
     ("seval"   . ,seval-handler))
   (rooted-file-handler "/usr/local/etc/httpd/htdocs")))\end{code}

  This means:
  \begin{itemize}
  \item If the path looks like \ex{("h"\ob{} "shivers"\ob{}
      "code"\ob{} "web.\ob{}tar.\ob{}gz")}, pass the path
    \ex{("shivers"\ob{} "code"\ob{} "web.\ob{}tar.\ob{}gz")} to a
    home-directory path handler.
  \item If the path looks like \ex{("cgi-\ob{}bin"\ob{} "calendar")},
    pass \ex{("calendar")} off to the CGI path handler.
  \item If the path looks like \ex{("seval"\ob{} \ldots)}, the tail
    of the path is passed off to the code-uploading seval path
    handler.
  \item Otherwise, the whole path is passed to a rooted file handler,
    which will convert it into a filename, rooted at
    \ex{/usr/\ob{}lo\ob{}cal/\ob{}etc/\ob{}httpd/\ob{}htdocs},
    and serve that file.
  \end{itemize}
\end{defundesc}
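
The dispatch rule can be restated as a small executable model. The
sketch below is written in Python rather than \Scheme and is not the
server's implementation; a list of pairs stands in for the alist, and
\ex{make\_tagging\_handler} is a hypothetical helper whose handlers
merely report which branch was taken and with which path.

```python
def alist_path_dispatcher(handler_alist, default_handler):
    """Return a handler that dispatches on the first path element."""
    def dispatch(path, req):
        if path:
            for name, handler in handler_alist:
                if name == path[0]:
                    # Hit: hand off the tail of the path.
                    return handler(path[1:], req)
        # Empty path or no hit: hand off the entire original path.
        return default_handler(path, req)
    return dispatch


def make_tagging_handler(tag):
    # Hypothetical stand-in for a real handler: it just reports which
    # handler ran and which path it was given.
    return lambda path, req: (tag, path)


ph = alist_path_dispatcher(
    [("h", make_tagging_handler("home")),
     ("cgi-bin", make_tagging_handler("cgi"))],
    make_tagging_handler("files"))
```

Because a dispatcher is itself a path handler, dispatchers can be
nested to cover deeper levels of the path tree.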

\begin{defundesc}{home-dir-handler}{subdir}{path-handler}
  This procedure builds a path handler that does basic file serving
  out of home directories. If the resulting \semvar{path-handler} is
  passed a path of \ex{(user . file\=path)}, then it serves the file
  \ex{user's\=ho\ob{}me\=di\ob{}rec\ob{}to\ob{}ry/\ob{}sub\ob{}dir/\ob{}file\=path}.

  The path handler only handles GET requests; the filename is not
  allowed to contain \ex{..} elements.
\end{defundesc}

\begin{defundesc}{tilde-home-dir-handler}{subdir default-path-handler}{path-handler}
  This procedure returns a path handler that examines the car of the
  path. If it is a string
  beginning with a tilde, e.g., \ex{"\textasciitilde{}ziggy"}, then the string is
  taken to mean a home directory, and the request is served similarly
  to a \ex{home\=dir\=handler} path handler. Otherwise, the request is passed
  off in its entirety to the \semvar{default-path-handler}.

  This procedure is useful for implementing servers that provide the
  semantics of the NCSA httpd server.
\end{defundesc}

\begin{defundesc}{cgi-handler}{cgi-directory}{path-handler}
  This procedure returns a path handler that passes the request off to
  some program using the CGI interface. The script name is taken from
  the car of the path; it is checked for occurrences of \ex{..}. If
  the path is \ex{("my\=prog"\ob{} "foo"\ob{} "bar")} then the
  program executed is
  \ex{cgi\=di\ob{}rec\ob{}to\ob{}ry/\ob{}my\=prog}.

  When the CGI path handler builds the process environment for the CGI
  script, several elements (e.g., \ex{\$PATH} and
  \ex{\$SERVER\_SOFTWARE}) are request-invariant, and can be
  computed at server start-up time. This can be done by calling
  \codex{(initialise-request-invariant-cgi-env)}
  when the server starts up. This is not necessary, but will make CGI
  requests a little faster.
\end{defundesc}

\begin{defundesc}{rooted-file-handler}{root-dir}{path-handler}
  Returns a path handler that serves files from a particular root in
  the file system. Only the GET operation is provided. The path
  argument passed to the handler is converted into a filename, and
  appended to \semvar{root-dir}. The file name is checked for \ex{..}
  components, and the transaction is aborted if it contains any. Otherwise,
  the file is served to the client.
\end{defundesc}
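
The filename construction and the \ex{..} check can be sketched as
follows. This is an illustration in Python, not the server's code:
\ex{rooted\_filename} is a hypothetical name, raising \ex{ValueError}
stands in for aborting the transaction, and the extra rejection of
components containing a slash is an added assumption.

```python
import posixpath


def rooted_filename(root_dir, path):
    """Convert a path (list of strings) into a filename under root_dir."""
    for part in path:
        # A ".." component could escape root_dir; a "/" inside a
        # component could do the same after URL unescaping.
        if part == ".." or "/" in part:
            raise ValueError("illegal path component: " + repr(part))
    return posixpath.join(root_dir, *path)
```

For example, the path \ex{["a", "b.html"]} under the root
\ex{/srv/htdocs} yields \ex{/srv/htdocs/a/b.html}, while any path
containing \ex{..} is rejected before the file system is touched.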

\begin{defundesc}{null-path-handler}{path req}{\noreturn}
  This path handler is useful as a default handler. It handles no
  requests, always returning a ``404 Not found'' reply to the client.
\end{defundesc}

\subsection{HTTP errors}

Authors of path handlers need to be able to handle errors in a
reasonably simple fashion. The S.U. Web server provides a set of error
conditions that correspond to the error replies in the HTTP protocol.
These errors can be raised with the \ex{http\=error} procedure. When
the server runs a path handler, it runs it in the context of an error
handler that catches these errors, sends an error reply to the client,
and closes the transaction.

\begin{defundesc}{http-error}{reply-code req \ovar{extra \ldots}}{\noreturn}
  This raises an HTTP error condition. The reply code is one of the
  numeric HTTP error reply codes, which are bound to the variables
  \ex{http\=re\ob{}ply/\ob{}ok, http\=re\ob{}ply/\ob{}not\=found,
    http\=re\ob{}ply/\ob{}bad\=request}, and so forth. The
  \semvar{req} argument is the request record that caused the error.
  Any following extra arguments are passed along for informational
  purposes. Different HTTP errors take different types of extra
  arguments. For example, the ``301 moved permanently'' and ``302
  moved temporarily'' replies use the first two extra values as the
  \ex{URI:} and \ex{Lo\-ca\-tion:} fields in the reply header,
  respectively. See the clauses of the
  \ex{send\=http\=er\ob{}ror\=re\ob{}ply} procedure for details.
\end{defundesc}

\begin{defundesc}{send-http-error-reply}{reply-code request \ovar{extra \ldots}}{\noreturn}
  This procedure writes an error reply out to the current output port.
  If an error occurs during this process, it is caught, and the
  procedure silently returns. The HTTP server's standard error handler
  passes all HTTP errors raised during path-handler execution to this
  procedure to generate the error reply before aborting the request
  transaction.
\end{defundesc}
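
The interplay between raising an HTTP error and the server's error
handler can be modelled as below. This is a Python sketch, not the
server's condition system: the class \ex{HttpError}, the table
\ex{REASONS} and the procedure \ex{run\_with\_error\_handler} are
hypothetical names, and the reply is reduced to a bare status line.

```python
class HttpError(Exception):
    """Carries a reply code, the offending request and any extra values."""

    def __init__(self, reply_code, req, *extra):
        super().__init__(reply_code)
        self.reply_code = reply_code
        self.req = req
        self.extra = extra


# Reason phrases for the few codes used in this sketch.
REASONS = {400: "Bad Request", 404: "Not Found"}


def null_path_handler(path, req):
    # Handles no requests: always signals "404 Not found".
    raise HttpError(404, req)


def run_with_error_handler(path_handler, path, req):
    """Run a handler; turn a raised HttpError into an error status line."""
    try:
        return path_handler(path, req)
    except HttpError as err:
        reason = REASONS.get(err.reply_code, "Error")
        return f"HTTP/1.0 {err.reply_code} {reason}"
```

A path handler written against such an interface never has to format
an error reply itself; it only names the reply code and the request.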

\subsection{Simple directory generation}

Most path handlers that serve files to clients eventually call an
internal procedure named \ex{file\=serve}, which implements a simple
directory-generation service using the following rules:
\begin{itemize}
\item If the filename has the form of a directory (i.e., it ends with
  a slash), then \ex{file\=serve} actually looks for a file named
  \ex{index.html} in that directory.
\item If the filename names a directory, but is not in directory form
  (i.e., it doesn't end in a slash, as in
  ``\ex{/usr/\ob{}in\ob{}clu\ob{}de}'' or ``\ex{/usr/\ob{}raj}''),
  then \ex{file\=serve} sends back a ``301 moved permanently''
  message, redirecting the client to a slash-terminated version of the
  original URL. For example, the URL
  \ex{http://\ob{}clark.\ob{}lcs.\ob{}mit.\ob{}edu/\ob{}\textasciitilde{}shi\ob{}vers}
  would be redirected to
  \ex{http://\ob{}clark.\ob{}lcs.\ob{}mit.\ob{}edu/\ob{}\textasciitilde{}shi\ob{}vers/}.
\item If the filename names a regular file, it is served to the
  client.
\end{itemize}
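
The three rules can be written down as a small decision function. The
sketch below is in Python and only models the decision, not
\ex{file\=serve} itself: \ex{file\_serve\_action} is a hypothetical
name, and whether the filename names a directory is passed in as an
argument rather than looked up in the file system.

```python
def file_serve_action(filename, is_directory):
    """Decide what to do with a requested filename.

    Returns a pair (action, target), where action is "serve" or
    "redirect-301".
    """
    if filename.endswith("/"):
        # Directory form: serve the index file inside it.
        return ("serve", filename + "index.html")
    if is_directory:
        # A directory without the trailing slash: redirect the client
        # to the slash-terminated name.
        return ("redirect-301", filename + "/")
    # A regular file is served as is.
    return ("serve", filename)
```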

\subsection{Support procs}

The source files contain a host of support procedures which will be of
utility to anyone writing a custom path handler. Read the files first.
\FIXME{Let us read the files and paste the contents here.}

\subsection{Losing}

Be aware of two Unix problems, which may require workarounds:
\begin{enumerate}
\item NeXTSTEP's POSIX implementation of the \ex{get\ob{}pwnam()}
  routine will silently tell you that every user has uid 0. This means
  that if your server, running as root, does a
  \codex{(set-uid (user->uid "nobody"))}
  it will essentially do a
  \codex{(set-uid 0)}
  and you will thus still be running as root. The fix is to manually
  find out the uid of user nobody (it is -2 on my system), and to hard-wire
  this into the server:
  \codex{(set-uid -2)}
  This problem is NeXTSTEP-specific. If you are not using
  NeXTSTEP, it does not affect you.
\item On NeXTSTEP, the \ex{ip\=ad\ob{}dress->\ob{}host\=name}
  translation routine (in C, \ex{get\ob{}host\ob{}by\ob{}addr()}; in
  scsh, \ex{(host\=in\ob{}fo addr)}) does not use the DNS system; it
  goes through NeXT's proprietary Netinfo system, and may not return a
  fully-qualified domain name. For example, on my system, I get
  ``\ex{ame\ob{}lia\=ear\ob{}hart}'', when I want
  ``\ex{ame\ob{}lia\=ear\ob{}hart.\ob{}lcs.\ob{}mit.\ob{}edu}''. Since
  the server uses this name to construct redirection URLs to be sent
  back to the Web client, it needs to be an FQDN.

  This problem may occur on other operating systems; I cannot determine if
  \ex{get\ob{}host\ob{}by\ob{}addr()} is required to return an FQDN or
  not. (I would appreciate hearing the answer if you know; my local
  Internet gurus couldn't tell me.)

  If your system doesn't give you a fully-qualified domain name when you
  say
  \codex{(host-info:name (host-info (system-name)))}
  then you have this problem.

  The server has a workaround. There is a procedure exported from the
  \ex{httpd\=core} package:
  \codex{(set-my-fqdn name)}
  Call this to crow-bar the server's idea of its own Internet host
  name before running the server, and all will be well.
\end{enumerate}

%%% Local Variables:
%%% mode: latex
%%% TeX-master: t
%%% End: