Long obsolete.
parent a95181e4bc
commit 4898196703
@ -1,352 +0,0 @@

The Scheme Underground Web system
Olin Shivers
7/95
Additions by Mike Sperber, 10/96

The Scheme Underground Web system is a package of Scheme code that provides
utilities for interacting with the World-Wide Web. This includes:

    - A Web server.
    - URI and URL parsers and un-parsers.
    - RFC822-style header parsers.
    - Code for performing structured HTML output.
    - Code to assist in writing CGI Scheme programs
      that can be used by any CGI-compliant HTTP server
      (such as NCSA's httpd, or the S.U. Web server).

The code can be obtained via anonymous FTP and is implemented in Scheme 48,
using the system calls and support procedures of scsh, the Scheme Shell. The
code was written to be clear and modifiable -- it is voluminously commented
and all non-R4RS dependencies are described at the beginning of each source
file.

I do not have the time to write detailed documentation for these packages.
However, they are very thoroughly commented, and I strongly recommend reading
the source files; they were written to be read, and the source code comments
should provide a clear description of the system. The remainder of this note
gives an overview of the server's basic architecture and interfaces.


* The Scheme Underground Web Server
The server was designed with three principal goals in mind:

- Extensibility
    The server is designed to make it easy to extend the basic
    functionality. In fact, the server is nothing but extensions. There is
    no distinction between the set of basic services provided by the server
    implementation and user extensions -- they are both implemented in
    Scheme, and have equal status. The design is "turtles all the way down."

- Mobile code
    Because the server is written in Scheme 48, it is simple to use the
    Scheme 48 module system to upload programs to the server for safe
    execution within a protected, server-chosen environment. The server
    comes with a simple example upload service to demonstrate this
    capability.

- Clarity of implementation
    Because the server is written in a high-level language, it should make
    for a clearer exposition of the HTTP protocol and the associated URL
    and URI notations than one written in a low-level language such as C.
    This should also help to make the server easy to modify and adapt to
    different uses.


** Basic server structure

The Web server is started by calling the HTTPD procedure, which takes
one required and two optional arguments:

    (httpd path-handler [port working-directory])

The server accepts connections on the given port, which defaults to 80.
The server runs with the working directory set to the given value,
which defaults to
    /usr/local/etc/httpd

The server's basic loop is to wait on the port for a connection from an HTTP
client. When it receives a connection, it reads in and parses the request into
a special request data structure. Then the server forks a child process, which
binds the current I/O ports to the connection socket and then hands off to
the top-level path handler (the first argument to httpd). The path-handler
procedure is responsible for actually serving the request -- it can be any
arbitrary computation. Its output goes directly back to the HTTP client that
sent the request.

Before calling the path handler to service the request, the HTTP server
installs an error handler that fields any uncaught error, sends an
error reply to the client, and aborts the request transaction. Hence
any error caused by a path handler will be handled in a reasonable and
robust fashion.

The basic server loop and the associated request data structure are the fixed
architecture of the S.U. Web server; its flexibility lies in the notion of
path handlers.


** Path handlers

A path handler is a procedure taking two arguments:

    (path-handler path req)

The REQ argument is a request record giving all the details of the
client's request; it has the following structure:

    (define-record request
      method        ; A string such as "GET", "PUT", etc.
      uri           ; The escaped URI string as read from request line.
      url           ; An http URL record (see url.scm).
      version       ; A (major . minor) integer pair.
      headers       ; An rfc822 header alist (see rfc822.scm).
      socket)       ; The socket connected to the client.


The PATH argument is the URL's path, parsed and split at slashes into a string
list. For example, if the Web client dereferences the URL

    http://clark.lcs.mit.edu:8001/h/shivers/code/web.tar.gz

then the server would pass the following path to the top-level handler:

    ("h" "shivers" "code" "web.tar.gz")

The path argument's pre-parsed representation as a string list makes it easy
for the path handler to implement recursive dispatch on URL paths.

Path handlers can do anything they like to respond to HTTP requests; they have
the full range of Scheme to implement the desired functionality. When
handling HTTP requests that have an associated entity body (such as POST), the
body should be read from the current input port. Path handlers should in all
cases write their reply to the current output port. Path handlers should *not*
perform I/O on the request record's socket. Path handlers are frequently
called recursively, and doing I/O directly to the socket might bypass a
filtering or other processing step interposed on the current I/O ports by some
superior path handler.
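
To make the calling convention concrete, here is a minimal sketch of a
hand-written path handler in portable Scheme. The names GREETING and
HELLO-HANDLER are hypothetical and not part of the S.U. code, and the request
argument is a stand-in symbol rather than a real request record. A real
handler would presumably also have to write an HTTP status line and headers
before the body; that is left out here.

```scheme
;; Hypothetical helper: build the reply text from the parsed path.
;; PATH is a list of strings, as described above.
(define (greeting path)
  (string-append "Hello, "
                 (if (null? path) "world" (car path))))

;; Hypothetical path handler. REQ is accepted but unused. The reply is
;; written to the current output port, never to the request's socket.
(define (hello-handler path req)
  (display (greeting path))
  (newline))

;; Called directly with a stand-in request, outside any server:
(hello-handler '("shivers") 'no-request)   ; prints "Hello, shivers"
(hello-handler '() 'no-request)            ; prints "Hello, world"
```

Because the handler only touches the current output port, it can be tried at
the REPL without a client connection.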


*** Basic path handlers

Although users can write any path handlers they like, the S.U. server comes
with a useful toolbox of basic path handlers that can be used and built upon:

    (alist-path-dispatcher ph-alist default-ph) -> path-handler
        This procedure takes a string->path-handler alist and a default
        path handler, and returns a handler that dispatches on its path
        argument. When the new path handler is applied to a path
        ("foo" "bar" "baz"), it uses the first element of the path -- "foo" --
        to index into the alist. If it finds an associated path handler in the
        alist, it hands the request off to that handler, passing it the tail
        of the path, ("bar" "baz"). On the other hand, if the path is empty,
        or the alist search does not yield a hit, it hands off to the default
        path handler, passing it the entire original path,
        ("foo" "bar" "baz").

        This procedure is how you say: "If the first element of the URL's path
        is `foo', do X; if it's `bar', do Y; otherwise, do Z." If one takes
        an object-oriented view of the process, an alist path handler does
        method lookup on the requested operation, dispatching off to the
        appropriate method defined for the URL.

        The slash-delimited URI path structure implies an associated
        tree of names. The path-handler system and the alist dispatcher
        allow you to procedurally define the server's response to any
        arbitrary subtree of the path space.

        Example:
        A typical top-level path handler is

            (define ph
              (alist-path-dispatcher
                `(("h"       . ,(home-dir-handler "public_html"))
                  ("cgi-bin" . ,(cgi-handler "/usr/local/etc/httpd/cgi-bin"))
                  ("seval"   . ,seval-handler))
                (rooted-file-handler "/usr/local/etc/httpd/htdocs")))


        This means:
        - If the path looks like ("h" "shivers" "code" "web.tar.gz"),
          pass the path ("shivers" "code" "web.tar.gz") to a
          home-directory path handler.

        - If the path looks like ("cgi-bin" "calendar"),
          pass ("calendar") off to the CGI path handler.

        - If the path looks like ("seval" ...), the tail of the path
          is passed off to the code-uploading seval path handler.

        - Otherwise, the whole path is passed to a rooted file handler, which
          converts it into a filename, rooted at /usr/local/etc/httpd/htdocs,
          and serves that file.
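
The dispatch rule described above can be sketched in a few lines of portable
Scheme. This is only an illustration of the rule as stated in this note, not
the server's own source; SKETCH-ALIST-PATH-DISPATCHER, TAGGED-HANDLER and
DEMO-PH are hypothetical names, and the handlers are stand-ins that return a
list describing what they were given instead of serving anything.

```scheme
;; Sketch of the dispatch rule: look up the first path element in the
;; alist; on a hit pass the tail of the path, otherwise pass the whole path
;; to the default handler.
(define (sketch-alist-path-dispatcher ph-alist default-ph)
  (lambda (path req)
    ;; ASSOC compares keys with EQUAL?, so string keys match by content.
    (let ((entry (and (pair? path) (assoc (car path) ph-alist))))
      (if entry
          ((cdr entry) (cdr path) req)    ; hit: hand over the tail
          (default-ph path req)))))       ; miss or empty path: whole path

;; Stand-in handlers that just report which handler saw which path.
(define (tagged-handler tag)
  (lambda (path req) (list tag path)))

(define demo-ph
  (sketch-alist-path-dispatcher
   (list (cons "h" (tagged-handler 'home))
         (cons "cgi-bin" (tagged-handler 'cgi)))
   (tagged-handler 'files)))

(demo-ph '("h" "shivers" "code") 'no-request)  ; => (home ("shivers" "code"))
(demo-ph '("index.html") 'no-request)          ; => (files ("index.html"))
(demo-ph '() 'no-request)                      ; => (files ())
```

Since a dispatcher is itself a path handler, dispatchers built this way can be
nested to cover deeper subtrees of the path space.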


    (home-dir-handler subdir) -> path-handler
        This procedure builds a path handler that does basic file serving
        out of home directories. If the resulting path handler is passed
        a path of (<user> . <file-path>), then it serves the file
            <user's-home-directory>/<subdir>/<file-path>
        The path handler only handles GET requests; the filename is not
        allowed to contain .. elements.


    (tilde-home-dir-handler subdir default-path-handler) -> path-handler
        This path handler examines the car of the path. If it is a string
        beginning with a tilde, e.g., "~ziggy", then the string is taken to
        mean a home directory, and the request is served similarly to a
        HOME-DIR-HANDLER path handler. Otherwise, the request is passed off in
        its entirety to the default path handler.

        This procedure is useful for implementing servers that provide the
        semantics of the NCSA httpd server.
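
The tilde test itself is simple string inspection. The sketch below shows one
way it might look in portable Scheme; TILDE-USER is a hypothetical helper, not
a procedure from the S.U. sources, and treating a bare "~" as "not a home
directory" is an assumption made for this example.

```scheme
;; Return the user name if S has the form "~user", else #f.
;; Assumption: a bare "~" does not name a home directory.
(define (tilde-user s)
  (let ((n (string-length s)))
    (and (> n 1)
         (char=? (string-ref s 0) #\~)
         (substring s 1 n))))

(tilde-user "~ziggy")       ; => "ziggy"
(tilde-user "index.html")   ; => #f
(tilde-user "~")            ; => #f
```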


    (cgi-handler cgi-directory) -> path-handler
        This procedure returns a path handler that passes the request off to
        some program using the CGI interface. The script name is taken from
        the car of the path; it is checked for .. components. If the path is
            ("my-prog" "foo" "bar")
        then the program executed is
            <cgi-directory>/my-prog

        When the CGI path handler builds the process environment for the
        CGI script, several elements (e.g., $PATH and $SERVER_SOFTWARE)
        are request-invariant, and can be computed at server start-up time.
        This can be done by calling
            (initialise-request-invariant-cgi-env)
        when the server starts up. This is *not* necessary, but will make CGI
        requests a little faster.


    (rooted-file-handler root-dir) -> path-handler
        Returns a path handler that serves files from a particular root in the
        file system. Only the GET operation is provided. The path argument
        passed to the handler is converted into a filename, and appended to
        ROOT-DIR. The file name is checked for .. components, and the
        transaction is aborted if it contains any. Otherwise, the file is
        served to the client.
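
The path-to-filename step can be illustrated with a small portable Scheme
sketch. SKETCH-PATH->FILENAME is a hypothetical name, and returning #f for a
rejected path is an assumption made here so the example stays self-contained;
according to the description above, the real handler aborts the transaction
instead.

```scheme
;; Join the path components onto ROOT-DIR with slashes.
;; Returns #f if any component is "..", so a request can never climb
;; out of ROOT-DIR by way of a parent-directory component.
(define (sketch-path->filename root-dir path)
  (if (member ".." path)
      #f
      (let loop ((rest path) (name root-dir))
        (if (null? rest)
            name
            (loop (cdr rest)
                  (string-append name "/" (car rest)))))))

(sketch-path->filename "/usr/local/etc/httpd/htdocs" '("docs" "index.html"))
;; => "/usr/local/etc/httpd/htdocs/docs/index.html"
(sketch-path->filename "/usr/local/etc/httpd/htdocs" '("docs" ".." "passwd"))
;; => #f
```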


    (rooted-file-or-directory-handler root-dir icon-name) -> path-handler
        The same as rooted-file-handler, except it can also serve
        directory index listings for directories that do not contain a
        file index.html. ICON-NAME is an object describing how to get at
        the various icons required for generating directory listings. The
        listings use the icons provided by CERN httpd 3.0. ICON-NAME can be
        either a string or a procedure. If it is a string, it is used as a
        prefix for generating the icon URLs. If it is a procedure, it should
        accept an icon tag (read httpd-handlers.scm for reference) and return
        an icon name. If it is neither, the handler just uses the plain icon
        name, which is almost guaranteed not to work.


    (null-path-handler path req)
        This path handler is useful as a default handler. It handles no
        requests, always returning a "404 Not found" reply to the client.


** HTTP errors

Authors of path handlers need to be able to handle errors in a reasonably
simple fashion. The S.U. Web server provides a set of error conditions that
correspond to the error replies in the HTTP protocol. These errors can be
raised with the HTTP-ERROR procedure. When the server runs a path handler,
it runs it in the context of an error handler that catches these errors,
sends an error reply to the client, and closes the transaction.

    (http-error reply-code req [extra ...])
        This raises an HTTP error condition. The reply code is one of the
        numeric HTTP error reply codes, which are bound to the variables
        HTTP-REPLY/OK, HTTP-REPLY/NOT-FOUND, HTTP-REPLY/BAD-REQUEST, and so
        forth. The REQ argument is the request record that caused the error.
        Any following EXTRA args are passed along for informational purposes.
        Different HTTP errors take different types of extra arguments. For
        example, the "301 moved permanently" and "302 moved temporarily"
        replies use the first two extra values as the URI: and Location: fields
        in the reply header, respectively. See the clauses of the
        SEND-HTTP-ERROR-REPLY procedure for details.

    (send-http-error-reply reply-code request [extra ...])
        This procedure writes an error reply out to the current output
        port. If an error occurs during this process, it is caught, and
        the procedure silently returns. The HTTP server's standard error
        handler passes all HTTP errors raised during path-handler execution
        to this procedure to generate the error reply before aborting the
        request transaction.


** Simple directory generation

Most path handlers that serve files to clients eventually call an internal
procedure named FILE-SERVE, which implements a simple directory-generation
service using the following rules:

    - If the filename has the *form* of a directory (i.e., it ends with a
      slash), then FILE-SERVE actually looks for a file named "index.html"
      in that directory.

    - If the filename names a directory, but is not in directory form
      (i.e., it doesn't end in a slash, as in "/usr/include" or "/usr/raj"),
      then FILE-SERVE sends back a "301 moved permanently" message,
      redirecting the client to a slash-terminated version of the original
      URL. For example, the URL
          http://clark.lcs.mit.edu/~shivers
      would be redirected to
          http://clark.lcs.mit.edu/~shivers/

    - If the filename names a regular file, it is served to the client.
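
The three rules above amount to a small decision procedure, sketched here in
portable Scheme. This is not FILE-SERVE itself: the names
SKETCH-FILE-SERVE-ACTION, ENDS-WITH-SLASH? and DEMO-DIRECTORY? are
hypothetical, the sketch works on filenames only (the real redirect is issued
for the URL), and the directory test is passed in as a parameter so the
example runs without touching the file system. Under scsh, a predicate such as
file-directory? could presumably be passed instead.

```scheme
;; Does S end with a slash, i.e. have the *form* of a directory name?
(define (ends-with-slash? s)
  (let ((n (string-length s)))
    (and (> n 0) (char=? (string-ref s (- n 1)) #\/))))

;; Decide what to do with FNAME. DIRECTORY? is a caller-supplied predicate
;; (an assumption of this sketch) telling whether a name is a directory.
(define (sketch-file-serve-action fname directory?)
  (cond ((ends-with-slash? fname)                 ; rule 1: directory form
         (list 'serve (string-append fname "index.html")))
        ((directory? fname)                       ; rule 2: directory, no slash
         (list 'redirect-301 (string-append fname "/")))
        (else                                     ; rule 3: regular file
         (list 'serve fname))))

;; Stand-in predicate for the demo: only "/usr/include" is a directory.
(define (demo-directory? name) (string=? name "/usr/include"))

(sketch-file-serve-action "/usr/include/" demo-directory?)
;; => (serve "/usr/include/index.html")
(sketch-file-serve-action "/usr/include" demo-directory?)
;; => (redirect-301 "/usr/include/")
(sketch-file-serve-action "/usr/include/stdio.h" demo-directory?)
;; => (serve "/usr/include/stdio.h")
```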


** Support procs

The source files contain a host of support procedures which will be of utility
to anyone writing a custom path handler. Read the files first.

** Local customization

The httpd-core package exports a procedure:

    (set-server/admin! admin-name)

which allows you to set the name of the site administrator. If you
don't set this, Olin may get unwanted mail and visit
disproportionate violence on you in return.

There is a procedure exported from the httpd-core package:

    (set-my-fqdn! name)

Call this to crow-bar the server's idea of its own Internet host
name before running the server, and all will be well.

You may want this for one of several reasons. On NeXTSTEP and on
systems that do DNS via NIS/Yellow Pages, you only get an
unqualified hostname. Also, in case of aliased names, you just
might get the wrong one. Furthermore, you may get screwed in the
presence of a server accelerator such as Squid.

There is a similar procedure in httpd-core:

    (set-my-port! portnum)

Call this to set the local port of your server. This may be
important to get redirection right in the presence of a web server
accelerator.

** Losing

Be aware of certain Unix problems which may require workarounds:
1. NeXTSTEP's POSIX implementation of the getpwnam() routine
   will silently tell you that every user has uid 0. This means
   that if your server, running as root, does a
       (set-uid (user->uid "nobody"))
   it will essentially do a
       (set-uid 0)
   and you will thus still be running as root.

   The fix is to manually find out the uid of user nobody (it is -2 on my
   system), and to hard-wire this into the server:
       (set-uid -2)
   This problem is NeXTSTEP specific. If you are not using NeXTSTEP,
   no problem.