The code can be obtained via anonymous ftp and is implemented in Scheme 48, using the system calls and support procedures of scsh, the Scheme Shell. The code was written to be clear and modifiable -- it is voluminously commented and all non-R4RS dependencies are described at the beginning of each source file.
I do not have the time to write detailed documentation for these packages. However, they are very thoroughly commented, and I strongly recommend reading the source files; they were written to be read, and the source code comments should provide a clear description of the system. The remainder of this note gives an overview of the server's basic architecture and interfaces.
The server is started with the httpd procedure, which takes one required and two optional arguments:

    (httpd path-handler [port working-directory])

The server accepts connections on the given port, which defaults to 80. It runs with its working directory set to the given value, which defaults to /usr/local/etc/httpd.
The server's basic loop is to wait on the port for a connection from an HTTP client. When it receives a connection, it reads and parses the request into a special request data structure. The server then forks a child process, which binds the current I/O ports to the connection socket and then hands off to the top-level path handler (the first argument to httpd).
The path-handler procedure is responsible for actually serving the request; it can be any arbitrary computation. Its output goes directly back to the HTTP client that sent the request.
Before calling the path handler to service the request, the HTTP server installs an error handler that fields any uncaught error, sends an error reply to the client, and aborts the request transaction. Hence any error caused by a path-handler will be handled in a reasonable and robust fashion.
The basic server loop and the associated request data structure are the fixed architecture of the S.U. Web server; its flexibility lies in the notion of path handlers.
A path handler is a procedure of two arguments:

    (path-handler path req)

The req argument is a request record giving all the details of the client's request; it has the following structure:

    (define-record request
      method    ; A string such as "GET", "PUT", etc.
      uri       ; The escaped URI string as read from the request line.
      url       ; An http URL record (see url.scm).
      version   ; A (major . minor) integer pair.
      headers   ; An rfc822 header alist (see rfc822.scm).
      socket)   ; The socket connected to the client.

The path argument is the URL's path, parsed and split at slashes into a list of strings. For example, if the Web client dereferences the URL

    http://clark.lcs.mit.edu:8001/h/shivers/code/web.tar.gz

then the server would pass the following path to the top-level handler:

    ("h" "shivers" "code" "web.tar.gz")
Because the path argument arrives pre-parsed as a list of strings, it is easy for a path handler to dispatch recursively on URL paths.
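To make the path representation concrete, here is a minimal portable Scheme (R7RS) sketch of splitting a URL path at slashes. split-url-path and add-component are hypothetical helpers, not part of the server; this version drops empty components and does no %-unescaping, whereas the real parser (see url.scm) may treat trailing slashes and escapes differently.

```scheme
(import (scheme base))

;; Hypothetical helper, not the server's parser: split a URL path
;; string at slashes into a list of strings. Empty components are
;; dropped, so leading, trailing, and doubled slashes are ignored.
;; No %-unescaping is done.
(define (split-url-path str)
  (let loop ((i 0) (start 0) (acc '()))
    (cond ((= i (string-length str))
           (reverse (add-component str start i acc)))
          ((char=? (string-ref str i) #\/)
           (loop (+ i 1) (+ i 1) (add-component str start i acc)))
          (else (loop (+ i 1) start acc)))))

;; Push the substring [start, end) onto acc unless it is empty.
(define (add-component str start end acc)
  (if (< start end)
      (cons (substring str start end) acc)
      acc))
```

Applied to "/h/shivers/code/web.tar.gz", it yields the list shown above.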
Path handlers can do anything they like to respond to HTTP requests; they have the full range of Scheme to implement the desired functionality. When handling HTTP requests that have an associated entity body (such as POST), the body should be read from the current input port. Path handlers should in all cases write their reply to the current output port. Path handlers should not perform I/O on the request record's socket. Path handlers are frequently called recursively, and doing I/O directly to the socket might bypass a filtering or other processing step interposed on the current I/O ports by some superior path handler.
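The rule about current I/O ports can be sketched in portable R7RS Scheme. Both procedures below are hypothetical: hello-handler writes a reply body to the current output port, and make-upcasing-handler builds a superior handler that interposes a filter by rebinding current-output-port around the inner handler. For brevity the sketch writes only a body; a real handler would also have to write the HTTP status line and headers.

```scheme
(import (scheme base) (scheme char))

;; Hypothetical inner path handler: writes to the current output
;; port and never touches the request record's socket.
(define (hello-handler path req)
  (write-string "You asked for:")
  (for-each (lambda (component)
              (write-string " ")
              (write-string component))
            path)
  (newline))

;; Hypothetical superior path handler: runs the inner handler with
;; current-output-port bound to a string port, then writes an
;; upcased copy of that output to the real current output port.
(define (make-upcasing-handler inner)
  (lambda (path req)
    (let ((buffer (open-output-string)))
      (parameterize ((current-output-port buffer))
        (inner path req))
      (write-string (string-upcase (get-output-string buffer))))))
```

Had hello-handler written to the socket directly, its output would have reached the client without passing through the filter.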
    (alist-path-dispatcher ph-alist default-ph) -> path-handler

This procedure takes an alist mapping strings to path handlers, together with a default path handler, and returns a new path handler that dispatches on the first element of its path. When the new handler is applied to a path such as ("foo" "bar" "baz"), it uses the first element of the path, "foo", to index into the alist. If it finds an associated path handler in the alist, it hands the request off to that handler, passing it the tail of the path, ("bar" "baz"). On the other hand, if the path is empty, or the alist search does not yield a hit, it hands off to the default path handler, passing it the entire original path, ("foo" "bar" "baz").
This procedure is how you say: "If the first element of the URL's path is `foo', do X; if it's `bar', do Y; otherwise, do Z." If one takes an object-oriented view of the process, an alist path-handler does method lookup on the requested operation, dispatching off to the appropriate method defined for the URL.
The slash-delimited URI path structure implies an associated tree of names. The path-handler system and the alist dispatcher allow you to procedurally define the server's response to any arbitrary subtree of the path space.
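The dispatch rule can be sketched in a few lines of portable Scheme. my-alist-path-dispatcher is a hypothetical stand-in written from the description above, not the server's source; it relies only on assoc, which compares keys with equal? and so works with string keys.

```scheme
(import (scheme base))

;; Hypothetical re-implementation of the dispatch rule. Handlers
;; take (path req), like any path handler.
(define (my-alist-path-dispatcher ph-alist default-ph)
  (lambda (path req)
    (let ((entry (and (pair? path) (assoc (car path) ph-alist))))
      (if entry
          ((cdr entry) (cdr path) req)   ; hit: pass the tail of the path
          (default-ph path req)))))      ; miss or empty path: whole path
```

Because the result is itself a path handler, dispatchers built this way can be nested inside one another's alists.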
Example: a typical top-level path handler is

    (define ph
      (alist-path-dispatcher
       `(("h"       . ,(home-dir-handler "public_html"))
         ("cgi-bin" . ,(cgi-handler "/usr/local/etc/httpd/cgi-bin"))
         ("seval"   . ,seval-handler))
       (rooted-file-handler "/usr/local/etc/httpd/htdocs")))

This means:

- If the path looks like ("h" "shivers" "code" "web.tar.gz"), pass the path ("shivers" "code" "web.tar.gz") to a home-directory path handler.
- If the path looks like ("cgi-bin" "calendar"), pass ("calendar") off to the CGI path handler.
- If the path looks like ("seval" ...), the tail of the path is passed off to the code-uploading seval path handler.
- Otherwise, treat the whole path as a file name relative to the directory /usr/local/etc/httpd/htdocs, and serve that file.
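    (home-dir-handler subdir) -> path-handler

This procedure returns a path handler that serves files out of users' home directories. If the path handler is passed a path of the form (user . file-path), then it serves the file

    user's-home-directory/subdir/file-path

The path handler only handles GET requests; the file name is not allowed to contain .. elements.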
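A sketch of the file-name mapping home-dir-handler performs, under stated assumptions: home-dir-of is a hypothetical caller-supplied procedure mapping a user name to a home directory (or #f), standing in for the system's user database, and the sketch returns #f where we assume a real handler would send an HTTP error reply.

```scheme
(import (scheme base))

;; Hypothetical helper. Returns the file name to serve, or #f when
;; the request is refused. home-dir-of is an assumed lookup
;; procedure: user name -> home directory string, or #f.
(define (home-dir-file-name home-dir-of subdir method path)
  (and (string=? method "GET")            ; only GET is handled
       (pair? path)                       ; need (user . file-path)
       (not (member ".." (cdr path)))     ; no .. elements allowed
       (let ((home (home-dir-of (car path))))
         (and home
              (join-with-slashes
               (cons home (cons subdir (cdr path))))))))

;; Join a non-empty list of strings with "/" separators.
(define (join-with-slashes parts)
  (if (null? (cdr parts))
      (car parts)
      (string-append (car parts) "/" (join-with-slashes (cdr parts)))))
```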
    (tilde-home-dir-handler subdir default-path-handler) -> path-handler

This returns a path handler that examines the first element of the request path. If it begins with a tilde, as in "~ziggy", then the string is taken to mean a home directory, and the request is served as it would be by a home-dir-handler path handler. Otherwise, the request is passed off in its entirety to the default path handler.

This procedure is useful for implementing servers that provide the semantics of the NCSA httpd server.
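The tilde test can be sketched in the same style. my-tilde-dispatcher is hypothetical; in particular, sending a bare "~" to the default handler is a choice made here, not something the text specifies.

```scheme
(import (scheme base))

;; Hypothetical sketch: if the first path element starts with "~",
;; strip the tilde and pass (user . rest) to home-ph; otherwise pass
;; the whole path to default-ph. Both take (path req). A bare "~"
;; goes to default-ph (an assumption of this sketch).
(define (my-tilde-dispatcher home-ph default-ph)
  (lambda (path req)
    (if (and (pair? path)
             (> (string-length (car path)) 1)
             (char=? (string-ref (car path) 0) #\~))
        (home-ph (cons (substring (car path) 1 (string-length (car path)))
                       (cdr path))
                 req)
        (default-ph path req))))
```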
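    (cgi-handler cgi-directory) -> path-handler

This returns a path handler that runs CGI programs found in cgi-directory. The path is not allowed to contain ..'s. If the path is ("my-prog" "foo" "bar"), then the program executed is

    cgi-directory/my-prog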
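A sketch of the program-name mapping in the example above, assuming (as the text indicates) that paths containing .. are rejected. cgi-program-name is hypothetical, and it says nothing about how the remaining path elements ("foo" "bar") reach the program.

```scheme
(import (scheme base))

;; Hypothetical helper: only the first path element names the
;; program. Returns #f for an empty path or one containing "..".
(define (cgi-program-name cgi-directory path)
  (and (pair? path)
       (not (member ".." path))
       (string-append cgi-directory "/" (car path))))
```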
When the CGI path handler builds the process environment for the CGI script, several elements (e.g., $PATH and $SERVER_SOFTWARE) are request-invariant and can be computed at server start-up time. This can be done by calling

    (initialise-request-invariant-cgi-env)

when the server starts up. This is not necessary, but it makes CGI requests a little faster.
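    (rooted-file-handler root-dir) -> path-handler

This returns a path handler that serves files from the directory tree rooted at root-dir. The request path is checked for .. components, and the transaction is aborted if it contains any. Otherwise, the file is served to the client.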
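The check rooted-file-handler performs can be sketched as follows. rooted-file-name is a hypothetical helper; where it returns #f, the real handler aborts the transaction.

```scheme
(import (scheme base))

;; Hypothetical helper: reject any path containing a ".." component,
;; else join the path components under root-dir.
(define (rooted-file-name root-dir path)
  (if (member ".." path)
      #f
      (let loop ((parts path) (acc root-dir))
        (if (null? parts)
            acc
            (loop (cdr parts) (string-append acc "/" (car parts)))))))
```

Rejecting .. outright, rather than normalizing it away, keeps every served file inside root-dir.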
    (null-path-handler path req)

Path handlers can signal HTTP errors with the http-error procedure. When the server runs a path handler, it runs it in the context of an error handler that catches these errors, sends an error reply to the client, and closes the transaction.
    (http-error reply-code req [extra ...])

This raises an HTTP error. The reply-code argument is one of the HTTP reply-code values, such as http-reply/ok, http-reply/not-found, http-reply/bad-request, and so forth. The req argument is the request record that caused the error. Any following extra args are passed along for informational purposes. Different HTTP errors take different types of extra arguments. For example, the "301 moved permanently" and "302 moved temporarily" replies use the first two extra values as the URI: and Location: fields in the reply header, respectively. See the clauses of the send-http-error-reply procedure for details.
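The error architecture (path handlers raise, the server's wrapper catches and replies) can be sketched with R7RS raise and guard. Everything named here is hypothetical: my-http-error stands in for http-error, run-path-handler for the server's wrapper, and plain numbers like 404 stand in for the http-reply/... values. The else clause mirrors the server's handling of any other uncaught error, assuming a 500 reply.

```scheme
(import (scheme base))

;; Hypothetical condition carrying what http-error is given.
(define-record-type http-error-condition
  (make-http-error-condition reply-code req extras)
  http-error-condition?
  (reply-code http-error-condition-reply-code)
  (req http-error-condition-req)
  (extras http-error-condition-extras))

;; Stand-in for http-error: raise a condition with the reply code,
;; the request, and any extra informational arguments.
(define (my-http-error reply-code req . extras)
  (raise (make-http-error-condition reply-code req extras)))

;; Stand-in for the server's wrapper: run the path handler and turn
;; any raised error into a (symbolic) error reply.
(define (run-path-handler ph path req)
  (guard (c ((http-error-condition? c)
             (list 'error-reply
                   (http-error-condition-reply-code c)
                   (http-error-condition-extras c)))
            (else (list 'error-reply 500 '())))
    (ph path req)))
```

Because the handler escapes through raise, a deeply nested path handler can abort the whole transaction without every caller checking for errors.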
    (send-http-error-reply reply-code request [extra ...])

The procedure file-serve implements a simple directory-generation service using the following rules:

- If the file name is a slash-terminated directory name, file-serve actually looks for a file named "index.html" in that directory.
- If the file name names a directory but is not slash-terminated (e.g., "/usr/include" or "/usr/raj"), then file-serve sends back a "301 moved permanently" message, redirecting the client to a slash-terminated version of the original URL. For example, the URL

      http://clark.lcs.mit.edu/~shivers

  would be redirected to

      http://clark.lcs.mit.edu/~shivers/
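The two directory rules can be sketched as a pure decision procedure. directory-action and ends-with-slash? are hypothetical helpers, and directory? is passed in by the caller so the sketch needs no file-system access. The result is (serve file-name) or (redirect url), the latter corresponding to the "301 moved permanently" reply.

```scheme
(import (scheme base))

;; Hypothetical decision procedure for the rules above.
;; directory? is an assumed caller-supplied predicate.
(define (directory-action file-name url directory?)
  (cond ((not (directory? file-name))
         (list 'serve file-name))                      ; ordinary file
        ((ends-with-slash? file-name)
         (list 'serve (string-append file-name "index.html")))
        (else
         (list 'redirect (string-append url "/")))))   ; 301 target

(define (ends-with-slash? s)
  (let ((n (string-length s)))
    (and (> n 0) (char=? (string-ref s (- n 1)) #\/))))
```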
On NeXTSTEP, the getpwnam() routine will silently tell you that every user has uid 0. This means that if your server, running as root, does a

    (set-uid (user->uid "nobody"))

it will essentially do a

    (set-uid 0)

and you will thus still be running as root.

The fix is to find out manually who user nobody is (he's -2 on my system), and to hard-wire this into the server:

    (set-uid -2)

This problem is NeXTSTEP-specific. If you are not using NeXTSTEP, there is no problem.
On NeXTSTEP, host-name lookup (gethostbyaddr(); in scsh, (host-info addr)) does not use the DNS system; it goes through NeXT's proprietary NetInfo system, and may not return a fully-qualified domain name. For example, on my system, I get "amelia-earhart" when I want "amelia-earhart.lcs.mit.edu". Since the server uses this name to construct redirection URLs to be sent back to the Web client, it needs to be an FQDN.

This problem may occur on other OSes; I cannot determine whether gethostbyaddr() is required to return an FQDN or not. (I would appreciate hearing the answer if you know; my local Internet gurus couldn't tell me.) If your system doesn't give you a complete Internet address when you say

    (host-info:name (host-info (system-name)))

then you have this problem.
The server has a workaround. The httpd-core package exports the procedure

    (set-my-fqdn name)

Call this to crowbar the server's idea of its own Internet host name before running the server, and all will be well.