Converted the existing HTML documentation on the Web server to LaTeX.
There are still a lot of FIXMEs.
This commit is contained in:
parent
3a5a0e7867
commit
d9fc32433d
|
@ -5,4 +5,355 @@
|
|||
\item[Name of the package:] ftpd
|
||||
\end{description}
|
||||
%
|
||||
Not documented yet.
|
||||
|
||||
\subsection{Introduction}
|
||||
|
||||
The \Scheme underground Web system is a package of \Scheme code
|
||||
that provides utilities for interacting with the World-Wide Web.
|
||||
This includes:
|
||||
\begin{itemize}
|
||||
\item A Web server.
|
||||
\item URI and URL parsers and un-parsers (see sections \ref{sec:uri}
|
||||
and \ref{sec:url}).
|
||||
\item RFC822-style header parsers (see section \ref{sec:rfc822}).
|
||||
\item Code for performing structured HTML output.
|
||||
\item Code to assist in writing CGI \Scheme programs that can be used by
|
||||
any CGI-compliant HTTP server (such as NCSA's httpd, or the S.U.
|
||||
Web server).
|
||||
\end{itemize}
|
||||
|
||||
The code can be obtained via anonymous
|
||||
ftp\footnote{\ttt{}ftp://ftp-swiss.ai.mit.edu/pub/scsh/contrib/net/net.tar.gz}
|
||||
and is implemented in \scm, using the system calls and support
|
||||
procedures of scsh, the \Scheme Shell. The code was written to be
|
||||
clear and modifiable -- it is voluminously commented and all non-\RnRS
|
||||
dependencies are described at the beginning of each source file.
|
||||
|
||||
\FIXME{We should remove the note to read the source files and insert
|
||||
the essentials here instead.}
|
||||
I do not have the time to write detailed documentation for these
|
||||
packages. However, they are very thoroughly commented, and I strongly
|
||||
recommend reading the source files; they were written to be read, and
|
||||
the source code comments should provide a clear description of the
|
||||
system. The remainder of this note gives an overview of the server's
|
||||
basic architecture and interfaces.
|
||||
|
||||
\subsection{The Scheme Underground Web Server}
|
||||
|
||||
The server was designed with three principal goals in mind:
|
||||
|
||||
\begin{description}
|
||||
\item[Extensibility] \hfill \\
|
||||
The server is designed to make it easy to extend the basic
|
||||
functionality. In fact, the server is nothing but extensions. There
|
||||
is no distinction between the set of basic services provided by the
|
||||
server implementation and user extensions -- they are both
|
||||
implemented in Scheme, and have equal status. The design is ``turtles
|
||||
all the way down''.
|
||||
|
||||
\item[Mobile code] \hfill \\
|
||||
Because the server is written in \scm, it is simple to use the \scm
|
||||
module system to upload programs to the server for safe execution
|
||||
within a protected, server-chosen environment. The server comes with
|
||||
a simple example upload service to demonstrate this capability.
|
||||
|
||||
\item[Clarity of implementation] \hfill \\
|
||||
Because the server is written in a high-level language, it should
|
||||
make for a clearer exposition of the HTTP protocol and the
|
||||
associated URL and URI notations than one written in a low-level
|
||||
language such as C. This also should help to make the server easy to
|
||||
modify and adapt to different uses.
|
||||
\end{description}
|
||||
|
||||
\subsubsection*{Basic server structure}
|
||||
|
||||
The Web server is started by calling the httpd procedure, which takes
|
||||
one required and two optional arguments:
|
||||
|
||||
\defun{httpd}{path-handler \ovar{port working-directory}}{\noreturn}
|
||||
\begin{desc}
|
||||
The server accepts connections from the given \semvar{port}, which
|
||||
defaults to 80. The server runs with the \semvar{working-directory} set to
|
||||
the given value, which defaults to \ex{/usr/local/etc/httpd}.
|
||||
|
||||
The server's basic loop is to wait on the port for a connection from
|
||||
an HTTP client. When it receives a connection, it reads in and
|
||||
parses the request into a special request data structure. Then the
|
||||
server \FIXME{Does the server still fork or does it make a thunk. Is
|
||||
this a difference? (Do not know)} forks a child process, which binds
|
||||
the current I/O ports to the connection socket, and then hands off
|
||||
to the top-level \semvar{path-handler} (the first argument to
|
||||
httpd). The \semvar{path-handler} procedure is responsible for
|
||||
actually serving the request -- it can be any arbitrary computation.
|
||||
Its output goes directly back to the HTTP client that sent the
|
||||
request.
|
||||
|
||||
Before calling the path handler to service the request, the HTTP
|
||||
server installs an error handler that fields any uncaught error,
|
||||
sends an error reply to the client, and aborts the request
|
||||
transaction. Hence any error caused by a path-handler will be
|
||||
handled in a reasonable and robust fashion.
|
||||
|
||||
The basic server loop and the associated request data structure are
|
||||
the fixed architecture of the S.U. Web server; its flexibility lies
|
||||
in the notion of path handlers.
|
||||
\end{desc}
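As a minimal sketch of a start-up call, assuming the S.U. net package is
loaded: the port number 8080 is an arbitrary choice for illustration, and
\ex{rooted-file-handler} is one of the basic path handlers described later
in this section.
\begin{code}
;; Serve files below htdocs on port 8080 (arbitrary port for this sketch).
;; The second and third arguments are optional; httpd does not return.
(httpd (rooted-file-handler "/usr/local/etc/httpd/htdocs")
       8080
       "/usr/local/etc/httpd")\end{code}
Leaving out the two optional arguments would give the defaults stated
above: port 80 and \ex{/usr/local/etc/httpd}.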
|
||||
|
||||
\subsubsection*{Path handlers}
|
||||
|
||||
A path handler is a procedure taking two arguments:
|
||||
\defun{path-handler}{path req}{value}
|
||||
\begin{desc}
|
||||
The \semvar{req} argument is a request record giving all the details
|
||||
of the client's request; it has the following structure: \FIXME{Make
|
||||
the record's structure a table}
|
||||
\begin{code}
|
||||
(define-record request
|
||||
method ; A string such as "GET", "PUT", etc.
|
||||
uri ; The escaped URI string as read from request line.
|
||||
url ; An http URL record (see url.scm).
|
||||
version ; A (major . minor) integer pair.
|
||||
headers ; An rfc822 header alist (see rfc822.scm).
|
||||
socket) ; The socket connected to the client.\end{code}
|
||||
|
||||
The \semvar{path} argument is the URL's path, parsed and split at
|
||||
slashes into a string list. For example, if the Web client
|
||||
dereferences the URL
|
||||
\codex{http://\ob{}clark.\ob{}lcs.\ob{}mit.\ob{}edu:\ob{}8001/\ob{}h/\ob{}shi\ob{}vers/\ob{}co\ob{}de/\ob{}web.\ob{}tar.\ob{}gz}
|
||||
then the server would pass the following path to the top-level
|
||||
handler: \ex{("h"\ob{} "shivers"\ob{} "code"\ob{}
|
||||
"web.\ob{}tar.\ob{}gz")}
|
||||
|
||||
The \semvar{path} argument's pre-parsed representation as a string
|
||||
list makes it easy for the path handler to implement recursive
|
||||
operation dispatch on URL paths.
|
||||
\end{desc}
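The splitting step can be illustrated with a few lines of portable
Scheme. This is only a sketch: \ex{split-path} and \ex{slash} are
hypothetical names, not the server's own parser. The sketch drops empty
components and does no unescaping, which the real parser may handle
differently.
\begin{code}
;; The slash character, obtained without a character literal.
(define slash (string-ref "/" 0))

;; Split a path string at slashes into a list of strings.
;; Simplification: empty components (e.g. from a leading slash) are dropped.
(define (split-path str)
  (let loop ((chars (string->list str)) (cur '()) (acc '()))
    ;; Push the component collected so far, if any, onto acc.
    (define (flush)
      (if (null? cur)
          acc
          (cons (list->string (reverse cur)) acc)))
    (cond ((null? chars) (reverse (flush)))
          ((char=? (car chars) slash) (loop (cdr chars) '() (flush)))
          (else (loop (cdr chars) (cons (car chars) cur) acc)))))

(split-path "/h/shivers/code/web.tar.gz")
;; => ("h" "shivers" "code" "web.tar.gz")\end{code}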
|
||||
|
||||
Path handlers can do anything they like to respond to HTTP requests;
|
||||
they have the full range of Scheme to implement the desired
|
||||
functionality. When handling HTTP requests that have an associated
|
||||
entity body (such as POST), the body should be read from the current
|
||||
input port. Path handlers should in all cases write their reply to the
|
||||
current output port. Path handlers should not perform I/O on the
|
||||
request record's socket. Path handlers are frequently called
|
||||
recursively, and doing I/O directly to the socket might bypass a
|
||||
filtering or other processing step interposed on the current I/O ports
|
||||
by some superior path handler.
|
||||
|
||||
\subsubsection*{Basic path handlers}
|
||||
|
||||
Although users can write any path handler they like, the S.U. server
|
||||
comes with a useful toolbox of basic path handlers that can be used
|
||||
and built upon:
|
||||
|
||||
\begin{defundesc}{alist-path-dispatcher}{ph-alist default-ph}{path-handler}
|
||||
This procedure takes a \ex{string->\ob{}path\=handler} alist, and a
|
||||
default path handler, and returns a handler that dispatches on its
|
||||
path argument. When the new path handler is applied to a path
|
||||
\ex{("foo"\ob{} "bar"\ob{} "baz")}, it uses the first element of
|
||||
the path -- ``\ex{foo}'' -- to index into the alist. If it finds an
|
||||
associated path handler in the alist, it hands the request off to
|
||||
that handler, passing it the tail of the path, \ex{("bar"\ob{}
|
||||
"baz")}. On the other hand, if the path is empty, or the alist
|
||||
search does not yield a hit, we hand off to the default path
|
||||
handler, passing it the entire original path, \ex{("foo"\ob{}
|
||||
"bar"\ob{} "baz")}.
|
||||
|
||||
This procedure is how you say: ``If the first element of the URL's
|
||||
path is `foo', do X; if it's `bar', do Y; otherwise, do Z.'' If one
|
||||
takes an object-oriented view of the process, an alist path-handler
|
||||
does method lookup on the requested operation, dispatching off to
|
||||
the appropriate method defined for the URL.
|
||||
|
||||
The slash-delimited URI path structure implies an associated tree of
|
||||
names. The path-handler system and the alist dispatcher allow you to
|
||||
procedurally define the server's response to any arbitrary subtree
|
||||
of the path space.
|
||||
|
||||
Example: A typical top-level path handler is
|
||||
\begin{code}
|
||||
(define ph
|
||||
(alist-path-dispatcher
|
||||
`(("h" . ,(home-dir-handler "public_html"))
|
||||
("cgi-bin" . ,(cgi-handler "/usr/local/etc/httpd/cgi-bin"))
|
||||
("seval" . ,seval-handler))
|
||||
(rooted-file-handler "/usr/local/etc/httpd/htdocs")))\end{code}
|
||||
|
||||
This means:
|
||||
\begin{itemize}
|
||||
\item If the path looks like \ex{("h"\ob{} "shivers"\ob{}
|
||||
"code"\ob{} "web.\ob{}tar.\ob{}gz")}, pass the path
|
||||
\ex{("shivers"\ob{} "code"\ob{} "web.\ob{}tar.\ob{}gz")} to a
|
||||
home-directory path handler.
|
||||
\item If the path looks like \ex{("cgi-\ob{}bin"\ob{} "calendar")},
|
||||
pass \ex{("calendar")} off to the CGI path handler.
|
||||
\item If the path looks like \ex{("seval"\ob{} \ldots)}, the tail
|
||||
of the path is passed off to the code-uploading seval path
|
||||
handler.
|
||||
\item Otherwise, the whole path is passed to a rooted file handler,
|
||||
which converts it into a filename rooted at
|
||||
\ex{/usr/\ob{}lo\ob{}cal/\ob{}etc/\ob{}httpd/\ob{}htdocs},
|
||||
and serve that file.
|
||||
\end{itemize}
|
||||
\end{defundesc}
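The dispatch rule described for \ex{alist-path-dispatcher} can be modelled
in a few lines of portable Scheme. The sketch below is not the server's
source; \ex{model-alist-path-dispatcher}, \ex{tagged-handler} and
\ex{demo-ph} are hypothetical names, and the handlers merely report what
they were given instead of serving anything.
\begin{code}
;; Model of the dispatch rule: look up the first path element in the
;; alist; on a hit pass the tail of the path, otherwise pass the whole
;; path to the default handler.
(define (model-alist-path-dispatcher ph-alist default-ph)
  (lambda (path req)
    (let ((entry (and (pair? path) (assoc (car path) ph-alist))))
      (if entry
          ((cdr entry) (cdr path) req)
          (default-ph path req)))))

;; Hypothetical handler that returns its tag and the path it received.
(define (tagged-handler tag)
  (lambda (path req) (list tag path)))

(define demo-ph
  (model-alist-path-dispatcher
   (list (cons "h" (tagged-handler 'home))
         (cons "cgi-bin" (tagged-handler 'cgi)))
   (tagged-handler 'default)))

(demo-ph '("h" "shivers" "code") 'no-request)
;; => (home ("shivers" "code"))
(demo-ph '("foo" "bar" "baz") 'no-request)
;; => (default ("foo" "bar" "baz"))\end{code}
Because the result is itself a path handler, dispatchers built this way
can be nested to cover deeper subtrees of the path space.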
|
||||
|
||||
\begin{defundesc}{home-dir-handler}{subdir}{path-handler}
|
||||
This procedure builds a path handler that does basic file serving
|
||||
out of home directories. If the resulting \semvar{path-handler} is
|
||||
passed a path of \ex{(user . file\=path)}, then it serves the file
|
||||
\ex{user's\=ho\ob{}me\=di\ob{}rec\ob{}to\ob{}ry/\ob{}sub\ob{}dir/\ob{}file\=path}.
|
||||
|
||||
The path handler only handles GET requests; the filename is not
|
||||
allowed to contain \ex{..} elements.
|
||||
\end{defundesc}
|
||||
|
||||
\begin{defundesc}{tilde-home-dir-handler}{subdir default-path-handler}{path-handler}
|
||||
This path handler examines the car of the path. If it is a string
|
||||
beginning with a tilde, e.g., \ex{"\textasciitilde{}ziggy"}, then the string is
|
||||
taken to mean a home directory, and the request is served similarly
|
||||
to a home-dir-handler path handler. Otherwise, the request is passed
|
||||
off in its entirety to the \semvar{default-path-handler}.
|
||||
|
||||
This procedure is useful for implementing servers that provide the
|
||||
semantics of the NCSA httpd server.
|
||||
\end{defundesc}
|
||||
|
||||
\begin{defundesc}{cgi-handler}{cgi-directory}{path-handler}
|
||||
This procedure returns a path-handler that passes the request off to
|
||||
some program using the CGI interface. The script name is taken from
|
||||
the car of the path, which is checked for \ex{..} elements. If
|
||||
the path is \ex{("my\=prog"\ob{} "foo"\ob{} "bar")} then the
|
||||
program executed is
|
||||
\ex{cgi\=di\ob{}rec\ob{}to\ob{}ry/\ob{}my\=prog}.
|
||||
|
||||
When the CGI path handler builds the process environment for the CGI
|
||||
script, several elements (e.g., \ex{\$PATH} and \ex{\$SERVER\_SOFTWARE}) are request-invariant, and can be
|
||||
computed at server start-up time. This can be done by calling
|
||||
\codex{(initialise-request-invariant-cgi-env)}
|
||||
when the server starts up. This is not necessary, but will make CGI
|
||||
requests a little faster.
|
||||
\end{defundesc}
|
||||
|
||||
\begin{defundesc}{rooted-file-handler}{root-dir}{path-handler}
|
||||
Returns a path handler that serves files from a particular root in
|
||||
the file system. Only the GET operation is provided. The path
|
||||
argument passed to the handler is converted into a filename, and
|
||||
appended to root-dir. The file name is checked for \ex{..}
|
||||
components, and the transaction is aborted if it contains any. Otherwise,
|
||||
the file is served to the client.
|
||||
\end{defundesc}
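The filename construction and the \ex{..} check can be sketched in
portable Scheme as follows. All three procedure names are hypothetical,
and returning \ex{\#f} merely stands in for aborting the transaction; the
real handler's internals may differ.
\begin{code}
;; True if any path element is exactly "..".
(define (has-dotdot? path)
  (cond ((null? path) #f)
        ((string=? (car path) "..") #t)
        (else (has-dotdot? (cdr path)))))

;; Append the path elements to root-dir, separated by slashes.
(define (path->filename root-dir path)
  (if (null? path)
      root-dir
      (path->filename (string-append root-dir "/" (car path))
                      (cdr path))))

;; The filename to serve, or #f where the real handler would abort.
(define (model-rooted-filename root-dir path)
  (if (has-dotdot? path)
      #f
      (path->filename root-dir path)))

(model-rooted-filename "/usr/local/etc/httpd/htdocs" '("a" "b.html"))
;; => "/usr/local/etc/httpd/htdocs/a/b.html"\end{code}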
|
||||
|
||||
\begin{defundesc}{null-path-handler}{path req}{\noreturn}
|
||||
This path handler is useful as a default handler. It handles no
|
||||
requests, always returning a ``404 Not found'' reply to the client.
|
||||
\end{defundesc}
|
||||
|
||||
\subsection{HTTP errors}
|
||||
|
||||
Authors of path-handlers need to be able to handle errors in a
|
||||
reasonably simple fashion. The S.U. Web server provides a set of error
|
||||
conditions that correspond to the error replies in the HTTP protocol.
|
||||
These errors can be raised with the \ex{http\=error} procedure. When
|
||||
the server runs a path handler, it runs it in the context of an error
|
||||
handler that catches these errors, sends an error reply to the client,
|
||||
and closes the transaction.
|
||||
|
||||
\begin{defundesc}{http-error}{reply-code req \ovar{extra \ldots}}{\noreturn}
|
||||
This raises an HTTP error condition. The reply code is one of the
|
||||
numeric HTTP error reply codes, which are bound to the variables
|
||||
\ex{http\=re\ob{}ply/\ob{}ok, http\=re\ob{}ply/\ob{}not\=found,
|
||||
http\=re\ob{}ply/\ob{}bad\=request}, and so forth. The
|
||||
\semvar{req} argument is the request record that caused the error.
|
||||
Any following extra args are passed along for informational
|
||||
purposes. Different HTTP errors take different types of extra
|
||||
arguments. For example, the ``301 moved permanently'' and ``302
|
||||
moved temporarily'' replies use the first two extra values as the
|
||||
\ex{URI:} and \ex{Lo\-ca\-tion:} fields in the reply header,
|
||||
respectively. See the clauses of the
|
||||
\ex{send\=http\=er\ob{}ror\=re\ob{}ply} procedure for details.
|
||||
\end{defundesc}
|
||||
|
||||
\begin{defundesc}{send-http-error-reply}{reply-code request \ovar{extra \ldots}}{\noreturn}
|
||||
This procedure writes an error reply out to the current output port.
|
||||
If an error occurs during this process, it is caught, and the
|
||||
procedure silently returns. The HTTP server's standard error handler
passes all HTTP errors raised during path-handler execution to this
|
||||
procedure to generate the error reply before aborting the request
|
||||
transaction.
|
||||
\end{defundesc}
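A path handler might raise such an error as sketched below. This assumes
the S.U. net package is loaded, so it is not runnable on its own.
\ex{request:method} is the accessor that scsh's \ex{define-record} is
assumed to generate for the \ex{method} field, \ex{serve-it} is a
hypothetical procedure standing for the real work, and the choice of
reply code and of the extra string argument is only illustrative.
\begin{code}
;; Sketch: refuse anything but GET before doing the real work.
(define (get-only-handler path req)
  (if (string=? (request:method req) "GET")
      (serve-it path req)   ; hypothetical: serve the request
      (http-error http-reply/bad-request req
                  "only GET is supported")))\end{code}
Since \ex{http-error} does not return, the handler needs no further
clean-up after the call; the server's error handler sends the reply and
closes the transaction.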
|
||||
|
||||
\subsection{Simple directory generation}
|
||||
|
||||
Most path-handlers that serve files to clients eventually call an
|
||||
internal procedure named \ex{file\=serve}, which implements a simple
|
||||
directory-generation service using the following rules:
|
||||
\begin{itemize}
|
||||
\item If the filename has the form of a directory (i.e., it ends with
|
||||
a slash), then \ex{file\=serve} actually looks for a file named
|
||||
``index.html'' in that directory.
|
||||
\item If the filename names a directory, but is not in directory form
|
||||
(i.e., it doesn't end in a slash, as in
|
||||
``\ex{/usr/\ob{}in\ob{}clu\ob{}de}'' or ``\ex{/usr/\ob{}raj}''),
|
||||
then \ex{file\=serve} sends back a ``301 moved permanently''
|
||||
message, redirecting the client to a slash-terminated version of the
|
||||
original URL. For example, the URL
|
||||
\ex{http://\ob{}clark.\ob{}lcs.\ob{}mit.\ob{}edu/\ob{}\textasciitilde{}shi\ob{}vers}
|
||||
would be redirected to
|
||||
\ex{http://\ob{}clark.\ob{}lcs.\ob{}mit.\ob{}edu/\ob{}\textasciitilde{}shi\ob{}vers/}.
|
||||
\item If the filename names a regular file, it is served to the
|
||||
client.
|
||||
\end{itemize}
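These three rules can be summarised as a small decision procedure. The
sketch below is a model in portable Scheme, not the code of
\ex{file-serve}: both procedure names are hypothetical, and the
\ex{directory?} argument stands in for a check against the file system.
\begin{code}
;; True if the name is in directory form, i.e. ends with a slash.
(define (ends-with-slash? name)
  (let ((n (string-length name)))
    (and (> n 0)
         (string=? (substring name (- n 1) n) "/"))))

;; Returns (serve FILE) or (redirect-301 NEW-NAME) following the rules.
(define (model-file-serve-action name directory?)
  (cond ((ends-with-slash? name)
         (list 'serve (string-append name "index.html")))
        (directory?
         (list 'redirect-301 (string-append name "/")))
        (else
         (list 'serve name))))

(model-file-serve-action "/usr/include/" #t)
;; => (serve "/usr/include/index.html")
(model-file-serve-action "/usr/include" #t)
;; => (redirect-301 "/usr/include/")\end{code}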
|
||||
|
||||
\subsection{Support procedures}
|
||||
|
||||
The source files contain a host of support procedures which will be of
|
||||
utility to anyone writing a custom path-handler. Read the files first.
|
||||
\FIXME{Let us read the files and paste the contents here.}
|
||||
|
||||
\subsection{Losing}
|
||||
|
||||
Be aware of two Unix problems, which may require workarounds:
|
||||
\begin{enumerate}
|
||||
\item NeXTSTEP's POSIX implementation of the \ex{get\ob{}pwnam()}
|
||||
routine will silently tell you that every user has uid 0. This means
|
||||
that if your server, running as root, does a
|
||||
\codex{(set-uid (user->uid "nobody"))}
|
||||
it will essentially do a
|
||||
\codex{(set-uid 0)}
|
||||
and you will thus still be running as root. The fix is to manually
|
||||
find out the uid of user \ex{nobody} (it is -2 on my system), and to hard-wire
|
||||
this into the server:
|
||||
\codex{(set-uid -2)}
|
||||
This problem is NeXTSTEP-specific. If you are not using
NeXTSTEP, there is no problem.
|
||||
\item On NeXTSTEP, the \ex{ip\=ad\ob{}dress->\ob{}host\=name}
|
||||
translation routine (in C, \ex{get\ob{}host\ob{}by\ob{}addr()}; in
|
||||
scsh, \ex{(host\=in\ob{}fo addr)}) does not use the DNS system; it
|
||||
goes through NeXT's proprietary NetInfo system, and may not return a
|
||||
fully-qualified domain name. For example, on my system, I get
|
||||
``\ex{ame\ob{}lia\=ear\ob{}hart}'', when I want
|
||||
``\ex{ame\ob{}lia\=ear\ob{}hart.\ob{}lcs.\ob{}mit.\ob{}edu}''. Since
|
||||
the server uses this name to construct redirection URLs to be sent
back to the Web client, it needs to be an FQDN.
|
||||
|
||||
This problem may occur on other operating systems; I cannot determine whether
|
||||
\ex{get\ob{}host\ob{}by\ob{}addr()} is required to return a FQDN or
|
||||
not. (I would appreciate hearing the answer if you know; my local
|
||||
Internet gurus couldn't tell me.)
|
||||
|
||||
If your system doesn't give you a complete Internet address when you
|
||||
say
|
||||
\codex{(host-info:name (host-info (system-name)))}
|
||||
then you have this problem.
|
||||
|
||||
The server has a workaround. There is a procedure exported from the
|
||||
\ex{httpd\=core} package:
|
||||
\codex{(set-my-fqdn name)}
|
||||
Call this to crow-bar the server's idea of its own Internet host
|
||||
name before running the server, and all will be well.
|
||||
\end{enumerate}
|
||||
|
||||
%%% Local Variables:
|
||||
%%% mode: latex
|
||||
%%% TeX-master: t
|
||||
%%% End:
|
||||
|
|