diff --git a/scheme/httpd/su-httpd.txt b/scheme/httpd/su-httpd.txt
deleted file mode 100644
index 9d7cfa9..0000000
--- a/scheme/httpd/su-httpd.txt
+++ /dev/null
@@ -1,352 +0,0 @@
-The Scheme Underground Web system
-Olin Shivers
-7/95
-Additions by Mike Sperber, 10/96
-
-The Scheme underground Web system is a package of Scheme code that provides
-utilities for interacting with the World-Wide Web. This includes:
-
-    - A Web server.
-    - URI and URL parsers and un-parsers.
-    - RFC822-style header parsers.
-    - Code for performing structured html output
-    - Code to assist in writing CGI Scheme programs
-      that can be used by any CGI-compliant HTTP server
-      (such as NCSA's httpd, or the S.U. Web server).
-
-The code can be obtained via anonymous ftp and is implemented in Scheme 48,
-using the system calls and support procedures of scsh, the Scheme Shell. The
-code was written to be clear and modifiable -- it is voluminously commented
-and all non-R4RS dependencies are described at the beginning of each source
-file.
-
-I do not have the time to write detailed documentation for these packages.
-However, they are very thoroughly commented, and I strongly recommend reading
-the source files; they were written to be read, and the source code comments
-should provide a clear description of the system. The remainder of this note
-gives an overview of the server's basic architecture and interfaces.
-
-
-* The Scheme Underground Web Server
-The server was designed with three principle goals in mind:
-
-  - Extensibility
-    The server is designed to make it easy to extend the basic
-    functionality. In fact, the server is nothing but extensions. There is
-    no distinction between the set of basic services provided by the server
-    implementation and user extensions -- they are both implemented in
-    Scheme, and have equal status. The design is "turtles all the way down."
-
-  - Mobile code
-    Because the server is written in Scheme 48, it is simple to use the
-    Scheme 48 module system to upload programs to the server for safe
-    execution within a protected, server-chosen environment. The server
-    comes with a simple example upload service to demonstrate this
-    capability.
-
-  - Clarity of implementation
-    Because the server is written in a high-level language, it should make
-    for a clearer exposition of the HTTP protocol and the associated URL
-    and URI notations than one written in a low-level language such as C.
-    This also should help to make the server easy to modify and adapt to
-    different uses.
-
-
-** Basic server structure
-
-The Web server is started by calling the HTTPD procedure, which takes
-one required and two optional arguments:
-
-    (httpd path-handler [port working-directory])
-
-The server accepts connections from the given port, which defaults to 80.
-The server runs with the working directory set to the given value,
-which defaults to
-    /usr/local/etc/httpd
-
-The server's basic loop is to wait on the port for a connection from an HTTP
-client. When it receives a connection, it reads in and parses the request into
-a special request data structure. Then the server forks a child process, who
-binds the current I/O ports to the connection socket, and then hands off to
-the top-level path handler (the first argument to httpd). The path-handler
-procedure is responsible for actually serving the request -- it can be any
-arbitrary computation. Its output goes directly back to the HTTP client that
-sent the request.
-
-Before calling the path handler to service the request, the HTTP server
-installs an error handler that fields any uncaught error, sends an
-error reply to the client, and aborts the request transaction. Hence
-any error caused by a path-handler will be handled in a reasonable and
-robust fashion.
-
-The basic server loop, and the associated request data structure are the fixed
-architecture of the S.U. Web server; its flexibility lies in the notion of
-path handlers.
-
-
-** Path handlers
-
-A path handler is a procedure taking two arguments:
-
-    (path-handler path req)
-
-The REQ argument is a request record giving all the details of the
-client's request; it has the following structure:
-
-    (define-record request
-      method     ; A string such as "GET", "PUT", etc.
-      uri        ; The escaped URI string as read from request line.
-      url        ; An http URL record (see url.scm).
-      version    ; A (major . minor) integer pair.
-      headers    ; An rfc822 header alist (see rfc822.scm).
-      socket)    ; The socket connected to the client.
-
-
-The PATH argument is the URL's path, parsed and split at slashes into a string
-list. For example, if the Web client dereferences URL
-
-    http://clark.lcs.mit.edu:8001/h/shivers/code/web.tar.gz
-
-then the server would pass the following path to the top-level handler:
-
-    ("h" "shivers" "code" "web.tar.gz")
-
-The path argument's pre-parsed representation as a string list makes it easy
-for the path handler to implement recursive operations dispatch on URL paths.
-
-Path handlers can do anything they like to respond to HTTP requests; they have
-the full range of Scheme to implement the desired functionality. When
-handling HTTP requests that have an associated entity body (such as POST), the
-body should be read from the current input port. Path handlers should in all
-cases write their reply to the current output port. Path handlers should *not*
-perform I/O on the request record's socket. Path handlers are frequently
-called recursively, and doing I/O directly to the socket might bypass a
-filtering or other processing step interposed on the current I/O ports by some
-superior path handler.
-
-
-*** Basic path handlers
-
-Although the user can write any path-handler he likes, the S.U. server comes
-with a useful toolbox of basic path handlers that can be used and built upon:
-
-(alist-path-dispatcher ph-alist default-ph) -> path-handler
-    This procedure takes a string->path-handler alist, and a default
-    path handler, and returns a handler that dispatches on its path argument.
-    When the new path handler is applied to a path ("foo" "bar" "baz"),
-    it uses the first element of the path -- "foo" -- to index into
-    the alist. If it finds an associated path handler in the alist, it
-    hands the request off to that handler, passing it the tail of the path,
-    ("bar" "baz"). On the other hand, if the path is empty, or the alist
-    search does not yield a hit, we hand off to the default path handler,
-    passing it the entire original path, ("foo" "bar" "baz").
-
-    This procedure is how you say: "If the first element of the URL's path
-    is `foo', do X; if it's `bar', do Y; otherwise, do Z." If one takes
-    an object-oriented view of the process, an alist path-handler does
-    method lookup on the requested operation, dispatching off to the
-    appropriate method defined for the URL.
-
-    The slash-delimited URI path structure implies an associated
-    tree of names. The path-handler system and the alist dispatcher
-    allow you to procedurally define the server's response to any
-    arbitrary subtree of the path space.
-
-    Example:
-    A typical top-level path handler is
-
-    (define ph
-      (alist-path-dispatcher
-        `(("h" . ,(home-dir-handler "public_html"))
-          ("cgi-bin" . ,(cgi-handler "/usr/local/etc/httpd/cgi-bin"))
-          ("seval" . ,seval-handler))
-        (rooted-file-handler "/usr/local/etc/httpd/htdocs")))
-
-
-    This means:
-    - If the path looks like ("h" "shivers" "code" "web.tar.gz"),
-      pass the path ("shivers" "code" "web.tar.gz") to a
-      home-directory path handler.
-
-    - If the path looks like ("cgi-bin" "calendar"),
-      pass ("calendar") off to the CGI path handler.
-
-    - If the path looks like ("seval" ...), the tail of the path
-      is passed off to the code-uploading seval path handler.
-
-    - Otherwise, the whole path is passed to a rooted file handler, who
-      will convert it into a filename, rooted at /usr/local/etc/httpd/htdocs,
-      and serve that file.
-
-
-(home-dir-handler subdir) -> path-handler
-    This procedure builds a path handler that does basic file serving
-    out of home directories. If the resulting path handler is passed
-    a path of (<user> . <file-path>), then it serves the file
-        <user's-home-directory>/<subdir>/<file-path>
-    The path handler only handles GET requests; the filename is not
-    allowed to contain .. elements.
-
-
-(tilde-home-dir-handler subdir default-path-handler) -> path-handler
-    This path handler examines the car of the path. If it is a string
-    beginning with a tilde, e.g., "~ziggy", then the string is taken to
-    mean a home directory, and the request is served similarly to a
-    HOME-DIR-HANDLER path handler. Otherwise, the request is passed off in
-    its entirety to the default path handler.
-
-    This procedure is useful for implementing servers that provide the
-    semantics of the NCSA httpd server.
-
-
-(cgi-handler cgi-directory) -> path-handler
-    This procedure returns a path-handler that passes the request off to some
-    program using the CGI interface. The script name is taken from the
-    car of the path; it is checked for occurrences of ..'s. If the path is
-        ("my-prog" "foo" "bar")
-    then the program executed is
-        <cgi-directory>/my-prog
-
-    When the CGI path handler builds the process environment for the
-    CGI script, several elements (e.g., $PATH and $SERVER_SOFTWARE)
-    are request-invariant, and can be computed at server start-up time.
-    This can be done by calling
-        (initialise-request-invariant-cgi-env)
-    when the server starts up. This is *not* necessary, but will make CGI
-    requests a little faster.
-
-
-(rooted-file-handler root-dir) -> path-handler
-    Returns a path handler that serves files from a particular root in the
-    file system. Only the GET operation is provided. The path argument
-    passed to the handler is converted into a filename, and appended to
-    ROOT-DIR. The file name is checked for .. components, and the
-    transaction is aborted if it does. Otherwise, the file is served to the
-    client.
-
-
-(rooted-file-or-directory-handler root-dir icon-name) -> path-handler
-    The same as rooted-file-handler, except it can also serve
-    directory index listings for directories that do not contain a
-    file index.html. ICON-NAME is an object describing how to get at
-    the various icons required for generating directory listings. It
-    uses the icons provided by CERN httpd 3.0. ICON-NAME can either
-    be a string which is used as a prefix for generating the icon
-    URLs. If it is a procedure, it should accept an icon tag (read
-    httpd-handlers.scm for reference) and return an icon name. If it
-    is neither, it will just use the plain icon name, which is almost
-    guaranteed not to work.
-
-
-(null-path-handler path req)
-    This path handler is useful as a default handler. It handles no requests,
-    always returning a "404 Not found" reply to the client.
-
-
-** HTTP errors
-
-Authors of path-handlers need to be able to handle errors in a reasonably
-simple fashion. The S.U. Web server provides a set of error conditions that
-correspond to the error replies in the HTTP protocol. These errors can be
-raised with the HTTP-ERROR procedure. When the server runs a path handler,
-it runs it in the context of an error handler that catches these errors,
-sends an error reply to the client, and closes the transaction.
-
-(http-error reply-code req [extra ...])
-    This raises an http error condition. The reply code is one of the
-    numeric HTTP error reply codes, which are bound to the variables
-    HTTP-REPLY/OK, HTTP-REPLY/NOT-FOUND, HTTP-REPLY/BAD-REQUEST, and so
-    forth. The REQ argument is the request record that caused the error.
-    Any following EXTRA args are passed along for informational purposes.
-    Different HTTP errors take different types of extra arguments. For
-    example, the "301 moved permanently" and "302 moved temporarily"
-    replies use the first two extra values as the URI: and Location: fields
-    in the reply header, respectively. See the clauses of the
-    SEND-HTTP-ERROR-REPLY procedure for details.
-
-(send-http-error-reply reply-code request [extra ...])
-    This procedure writes an error reply out to the current output
-    port. If an error occurs during this process, it is caught, and
-    the procedure silently returns. The http server's standard error
-    handler passes all http errors raised during path-handler execution
-    to this procedure to generate the error reply before aborting the
-    request transaction.
-
-
-** Simple directory generation
-
-Most path-handlers that serve files to clients eventually call an internal
-procedure named FILE-SERVE, which implements a simple directory-generation
-service using the following rules:
-
-    - If the filename has the *form* of a directory (i.e., it ends with a
-      slash), then FILE-SERVE actually looks for a file named "index.html"
-      in that directory.
-
-    - If the filename names a directory, but is not in directory form
-      (i.e., it doesn't end in a slash, as in "/usr/include" or "/usr/raj"),
-      then FILE-SERVE sends back a "301 moved permanently" message,
-      redirecting the client to a slash-terminated version of the original
-      URL. For example, the URL
-          http://clark.lcs.mit.edu/~shivers
-      would be redirected to
-          http://clark.lcs.mit.edu/~shivers/
-
-    - If the filename names a regular file, it is served to the client.
-
-
-** Support procs
-
-The source files contain a host of support procedures which will be of utility
-to anyone writing a custom path-handler. Read the files first.
-
-** Local customization
-
-  The http-core package exports a procedure:
-
-      (set-server/admin! admin-name)
-
-  which allows you to set the name of the site administrator. If you
-  don't set this, Olin may get unwanted mail and visit
-  disproportionate violence on you in return.
-
-  There is a procedure exported from the httpd-core package:
-
-      (set-my-fqdn! name)
-
-  Call this to crow-bar the server's idea of its own Internet host
-  name before running the server, and all will be well.
-
-  You may want this for one of several reasons. On NeXTSTEP and on
-  systems that do DNS via NIS/Yellow Pages, you only get an
-  unqualified hostname. Also, in case of aliased names, you just
-  might get the wrong one. Furthermore, you may get screwed in the
-  presence of a server accelerator such as Squid.
-
-  There is a similar procedure in httpd-core:
-
-      (set-my-port! portnum)
-
-  Call this to set the local port of your server. This may be
-  important to get redirection right in the presence of a web server
-  accelerator.
-
-** Losing
-
-Be aware of certain Unix problems which may require workarounds:
-1. NeXTSTEP's Posix implementation of the getpwnam() routine
-   will silently tell you that every user has uid 0. This means
-   that if your server, running as root, does a
-       (set-uid (user->uid "nobody"))
-   it will essentially do a
-       (set-uid 0)
-   and you will thus still be running as root.
-
-   The fix is to manually find out who user nobody is (he's -2 on my
-   system), and to hard-wire this into the server:
-       (set-uid -2)
-   This problem is NeXTSTEP specific. If you are not using NeXTSTEP,
-   no problem.
-
-
-