<!-- check for *..* emphasis, etc., i.e., e.g. -->
<HTML>
<HEAD>
<TITLE>The Scheme Underground Web system</TITLE>
</HEAD>

<BODY>
<H1>The Scheme Underground Web System</H1>

<ADDRESS><A HREF="http://www.ai.mit.edu/people/shivers/">Olin Shivers</A>
       / <A HREF="plan-file">shivers@ai.mit.edu</A>
</ADDRESS>
July 1995

<BLOCKQUOTE>
Note: Netscape typesets description lists in a manner that makes the
procedure descriptions below blur together, even in the absence of the
HTML COMPACT attribute. You may wish to print out the simple
<A HREF="su-httpd.txt">ASCII version</A> of this note instead.
</BLOCKQUOTE>


<!---------------------------------------------------------------------------->
<H2>Introduction</H2>

The
<A HREF="http://www.ai.mit.edu/projects/su/su.html">Scheme underground</A>
Web system is a package of
<A HREF="http://www-swiss.ai.mit.edu/scheme-home.html">Scheme</A>
code that provides
utilities for interacting with the
<A HREF="http://www.w3.org/">World-Wide Web</A>.
This includes:
<UL>
<LI>  A Web server.
<LI>  URI and URL parsers and un-parsers.
<LI>  RFC822-style header parsers.
<LI>  Code for performing structured HTML output.
<LI>  Code to assist in writing CGI Scheme programs
      that can be used by any CGI-compliant HTTP server
      (such as NCSA's httpd, or the S.U. Web server).
</UL>

 <P>
The code can be obtained via
<A HREF="ftp://ftp-swiss.ai.mit.edu/pub/scsh/contrib/net/net.tar.gz">
anonymous ftp</A>
and is implemented in
<A HREF="http://www-swiss.ai.mit.edu/~jar/s48.html">Scheme 48</A>,
using the system calls and support procedures of
<A HREF="http://www-swiss.ai.mit.edu/scsh/scsh.html">scsh</A>,
the Scheme Shell.
The code was written to be clear and modifiable --
it is voluminously commented and all non-R4RS dependencies are
described at the beginning of each source file.

 <P>
I do not have the time to write detailed documentation for these packages.
However, they are very thoroughly commented, and I strongly recommend
reading the source files; they were written to be read, and the source
code comments should provide a clear description of the system.
The remainder of this note gives an overview of the server's basic
architecture and interfaces.

<H2>The Scheme Underground Web Server</H2>

The server was designed with three principal goals in mind:
<DL>
<DT> Extensibility
<DD> The server is designed to make it easy to extend the basic
     functionality.  In fact, the server is nothing but extensions.  There is
     no distinction between the set of basic services provided by the server
     implementation and user extensions -- they are both implemented in
     Scheme, and have equal status. The design is "turtles all the way down."

<DT> Mobile code
<DD> Because the server is written in Scheme 48, it is simple to use the
     Scheme 48 module system to upload programs to the server for safe
     execution within a protected, server-chosen environment. The server
     comes with a simple example upload service to demonstrate this
     capability.

<DT> Clarity of implementation
<DD> Because the server is written in a high-level language, it should make
     for a clearer exposition of the HTTP protocol and the associated URL
     and URI notations than one written in a low-level language such as C.
     This should also help to make the server easy to modify and adapt to
     different uses.
</DL>

<!---------------------------------------------------------------------------->
<H3>Basic server structure</H3>

The Web server is started by calling the <CODE>httpd</CODE> procedure,
which takes one required and two optional arguments:
<PRE>
    (httpd <VAR>path-handler</VAR> [<VAR>port</VAR> <VAR>working-directory</VAR>])
</PRE>

The server accepts connections on the given port, which defaults to 80.
The server runs with the working directory set to the given value,
which defaults to
<PRE>
    /usr/local/etc/httpd
</PRE>

 <P>
The server's basic loop is to wait on the port for a connection from an HTTP
client. When it receives a connection, it reads in and parses the request into
a special request data structure. Then the server forks a child process, which
binds the current I/O ports to the connection socket, and then hands off to
the top-level path handler (the first argument to <CODE>httpd</CODE>).
The path-handler procedure is responsible for actually serving the request --
it can be any arbitrary computation.
Its output goes directly back to the HTTP client that sent the request.

 <P>
Before calling the path handler to service the request, the HTTP server
installs an error handler that fields any uncaught error, sends an
error reply to the client, and aborts the request transaction. Hence
any error caused by a path handler will be handled in a reasonable and
robust fashion.

 <P>
The basic server loop and the associated request data structure are the fixed
architecture of the S.U. Web server; its flexibility lies in the notion of
path handlers.

<!---------------------------------------------------------------------------->
<H3>Path handlers</H3>

A path handler is a procedure taking two arguments:
<PRE>
    (path-handler <VAR>path</VAR> <VAR>req</VAR>)
</PRE>

The <VAR>req</VAR> argument is a request record giving all the details of the
client's request; it has the following structure:
<PRE>
    (define-record request
      method      ; A string such as "GET", "PUT", etc.
      uri         ; The escaped URI string as read from request line.
      url         ; An http URL record (see url.scm).
      version     ; A (major . minor) integer pair.
      headers     ; An rfc822 header alist (see rfc822.scm).
      socket)     ; The socket connected to the client.
</PRE>

The <VAR>path</VAR> argument is the URL's path,
parsed and split at slashes into a string list.
For example, if the Web client dereferences the URL
<PRE>
    http://clark.lcs.mit.edu:8001/h/shivers/code/web.tar.gz
</PRE>
then the server would pass the following path to the top-level handler:
<PRE>
    ("h" "shivers" "code" "web.tar.gz")
</PRE>
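
 <P>
The parsing step can be pictured with a small portable-Scheme sketch. It is
an illustration under stated assumptions, not the code in the S.U. sources:
the names <CODE>split-at-char</CODE>, <CODE>parse-version</CODE>,
<CODE>uri-path-&gt;list</CODE>, and <CODE>parse-request-line</CODE> are
hypothetical; it assumes a well-formed HTTP/1.x request line whose URI is a
plain absolute path; and it ignores %-escapes, which the real URL parser
handles.
<PRE>
;; Hypothetical sketch; not the S.U. source.

;; Split STR at every occurrence of the character CH.
(define (split-at-char str ch)
  (let loop ((i (- (string-length str) 1))
             (end (string-length str))
             (acc '()))
    (cond ((= i -1) (cons (substring str 0 end) acc))
          ((char=? (string-ref str i) ch)
           (loop (- i 1) i (cons (substring str (+ i 1) end) acc)))
          (else (loop (- i 1) end acc)))))

;; "HTTP/1.0" => (1 . 0), the (major . minor) pair of the request record.
(define (parse-version s)
  (let ((nums (split-at-char (cadr (split-at-char s #\/)) #\.)))
    (cons (string->number (car nums))
          (string->number (cadr nums)))))

;; "/h/shivers" => ("h" "shivers"): drop the empty string that precedes
;; the leading slash.
(define (uri-path->list uri)
  (let ((parts (split-at-char uri #\/)))
    (if (and (pair? parts) (string=? (car parts) ""))
        (cdr parts)
        parts)))

;; "GET /h/x HTTP/1.0" => ("GET" "/h/x" (1 . 0) ("h" "x"))
(define (parse-request-line line)
  (let ((fields (split-at-char line #\space)))
    (list (car fields)                            ; method
          (cadr fields)                           ; uri
          (parse-version (car (cddr fields)))     ; version
          (uri-path->list (cadr fields)))))       ; path argument
</PRE>
For a request for the URL above, the request line
<CODE>GET /h/shivers/code/web.tar.gz HTTP/1.0</CODE> yields
<CODE>("GET" "/h/shivers/code/web.tar.gz" (1 . 0) ("h" "shivers" "code" "web.tar.gz"))</CODE>;
the last element is the path list shown above.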

 <P>
The path argument's pre-parsed representation as a string list makes it easy
for the path handler to implement recursive dispatch on URL paths.

 <P>
Path handlers can do anything they like to respond to HTTP requests; they have
the full range of Scheme to implement the desired functionality.  When
handling HTTP requests that have an associated entity body (such as POST), the
body should be read from the current input port. Path handlers should in all
cases write their reply to the current output port. Path handlers should
<EM>not</EM> perform I/O on the request record's socket.
Path handlers are frequently called recursively, and doing I/O directly to the
socket might bypass a filtering or other processing step interposed on the
current I/O ports by some superior path handler.
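
 <P>
A minimal path handler that follows these rules might look like the sketch
below. The names <CODE>echo-path-handler</CODE> and
<CODE>capture-output</CODE> are hypothetical; the sketch assumes an HTTP/1.0
status line and a plain-text body, and it never touches the socket in
<VAR>req</VAR>. <CODE>capture-output</CODE> shows why that matters: a
superior handler can rebind the current output port (here with R7RS
<CODE>parameterize</CODE> and a string port) and post-process whatever the
inner handler writes.
<PRE>
;; Hypothetical example, not part of the S.U. distribution.
(define crlf (string #\return #\newline))

;; Reply with the path list it was given, one component per line.
;; All output goes to the current output port, never to the socket.
(define (echo-path-handler path req)
  (display (string-append "HTTP/1.0 200 OK" crlf
                          "Content-Type: text/plain" crlf
                          crlf))
  (for-each (lambda (component)
              (display component)
              (newline))
            path))

;; A superior handler can interpose on the output by rebinding the
;; current output port; here it simply collects the reply as a string.
(define (capture-output ph path req)
  (let ((port (open-output-string)))
    (parameterize ((current-output-port port))
      (ph path req))
    (get-output-string port)))
</PRE>
Had <CODE>echo-path-handler</CODE> written to the socket directly,
<CODE>capture-output</CODE> would see nothing and the text would go
straight to the client.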

<!---------------------------------------------------------------------------->
<H3>Basic path handlers</H3>

Although the user can write any path handler he likes, the S.U. server comes
with a useful toolbox of basic path handlers that can be used and built upon:

<DL>

<DT>
<CODE>(alist-path-dispatcher <VAR>ph-alist</VAR> <VAR>default-ph</VAR>) -> <VAR>path-handler</VAR>
</CODE>
<DD>
    This procedure takes a string->path-handler alist, and a default
    path handler, and returns a handler that dispatches on its path argument.
    When the new path handler is applied to a path
    <CODE>("foo" "bar" "baz")</CODE>,
    it uses the first element of the path -- <CODE>"foo"</CODE> -- to
    index into the alist.
    If it finds an associated path handler in the alist, it
    hands the request off to that handler, passing it the tail of the
    path, <CODE>("bar" "baz")</CODE>.
    On the other hand, if the path is empty, or the alist search does
    not yield a hit, it hands off to the default path handler,
    passing it the entire original path, <CODE>("foo" "bar" "baz")</CODE>.

    <P>
    This procedure is how you say: "If the first element of the URL's path
    is `foo', do X; if it's `bar', do Y; otherwise, do Z." If one takes
    an object-oriented view of the process, an alist path handler does
    method lookup on the requested operation, dispatching off to the
    appropriate method defined for the URL.

    <P>
    The slash-delimited URI path structure implies an associated
    tree of names. The path-handler system and the alist dispatcher
    allow you to procedurally define the server's response to any arbitrary
    subtree of the path space.

    <P>
    Example: <br>
    A typical top-level path handler is

<PRE>
  (define ph
    (alist-path-dispatcher
      `(("h"       . ,(home-dir-handler "public_html"))
        ("cgi-bin" . ,(cgi-handler "/usr/local/etc/httpd/cgi-bin"))
        ("seval"   . ,seval-handler))
      (rooted-file-handler "/usr/local/etc/httpd/htdocs")))
</PRE>

    This means:
<UL>
<LI> If the path looks like <CODE>("h" "shivers" "code" "web.tar.gz")</CODE>,
     pass the path <CODE>("shivers" "code" "web.tar.gz")</CODE> to a
     home-directory path handler.

<LI> If the path looks like <CODE>("cgi-bin" "calendar")</CODE>,
     pass <CODE>("calendar")</CODE> off to the CGI path handler.

<LI> If the path looks like <CODE>("seval" ...)</CODE>,
     the tail of the path is passed off to the code-uploading seval
     path handler.

<LI> Otherwise, the whole path is passed to a rooted file handler, which
     will convert it into a filename, rooted at
     <CODE>/usr/local/etc/httpd/htdocs</CODE>, and serve that file.
</UL>
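
    <P>
    The dispatch rule can be modelled in a few lines of portable Scheme.
    This is a sketch of the behaviour described above, not the code in the
    S.U. sources; <CODE>model-alist-path-dispatcher</CODE>,
    <CODE>tag-handler</CODE>, and <CODE>demo-ph</CODE> are hypothetical
    names, and the toy handlers return a list instead of writing a reply so
    that the routing is easy to see.
<PRE>
;; Sketch of the alist-path-dispatcher semantics (hypothetical name).
(define (model-alist-path-dispatcher ph-alist default-ph)
  (lambda (path req)
    ;; Look up the first path element; assoc compares strings with equal?.
    (let ((entry (and (pair? path) (assoc (car path) ph-alist))))
      (if entry
          ((cdr entry) (cdr path) req)   ; hit: pass the tail of the path
          (default-ph path req)))))      ; miss or empty path: whole path

;; Toy handlers that just report which handler ran and what path it saw.
(define (tag-handler name)
  (lambda (path req) (cons name path)))

(define demo-ph
  (model-alist-path-dispatcher
    (list (cons "h" (tag-handler 'home))
          (cons "cgi-bin" (tag-handler 'cgi)))
    (tag-handler 'default)))

(demo-ph '("h" "shivers" "code") #f)   ; => (home "shivers" "code")
(demo-ph '("cgi-bin" "calendar") #f)   ; => (cgi "calendar")
(demo-ph '("foo" "bar" "baz") #f)      ; => (default "foo" "bar" "baz")
(demo-ph '() #f)                       ; => (default)
</PRE>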

<DT> <CODE>(home-dir-handler <VAR>subdir</VAR>) ->
           <VAR>path-handler</VAR></CODE>
<DD>
    This procedure builds a path handler that does basic file serving
    out of home directories. If the resulting path handler is passed
    a path of <CODE>(<VAR>user</VAR> . <VAR>file-path</VAR>)</CODE>,
    then it serves the file
<PRE>
    <VAR>user's-home-directory</VAR>/<VAR>subdir</VAR>/<VAR>file-path</VAR>
</PRE>
    The path handler only handles GET requests; the filename is not
    allowed to contain <CODE>..</CODE> elements.
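
    <P>
    The filename construction can be sketched as follows. The sketch is
    hypothetical: to stay portable it takes the home-directory lookup as a
    procedure argument (<VAR>home-dir-of</VAR>) instead of consulting the
    password database, and it returns <CODE>#f</CODE> where the real handler
    would refuse the request. The names <CODE>join-with-slashes</CODE> and
    <CODE>home-file-name</CODE> are invented for this illustration.
<PRE>
;; Hypothetical sketch of how (user . file-path) becomes a filename.

;; Join strings with "/" between them.
(define (join-with-slashes strs)
  (if (null? strs)
      ""
      (let loop ((acc (car strs)) (rest (cdr strs)))
        (if (null? rest)
            acc
            (loop (string-append acc "/" (car rest)) (cdr rest))))))

;; PATH is (user . file-path); HOME-DIR-OF maps a user name to a directory.
;; Returns the file to serve, or #f if the path is empty or contains "..".
(define (home-file-name home-dir-of subdir path)
  (and (pair? path)
       (not (member ".." path))
       (join-with-slashes
         (append (list (home-dir-of (car path)) subdir)
                 (cdr path)))))
</PRE>
    For example, with a lookup that maps <CODE>"shivers"</CODE> to
    <CODE>/home/shivers</CODE>, the path
    <CODE>("shivers" "code" "web.tar.gz")</CODE> and subdirectory
    <CODE>"public_html"</CODE> give
    <CODE>/home/shivers/public_html/code/web.tar.gz</CODE>.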

<DT>
<CODE>(tilde-home-dir-handler <VAR>subdir</VAR> <VAR>default-path-handler</VAR>)
       -> <VAR>path-handler</VAR>
</CODE>
<DD>
    The path handler returned by this procedure examines the car of the path.
    If it is a string
    beginning with a tilde, <em>e.g.</em>, "<CODE>~ziggy</CODE>",
    then the string is taken
    to mean a home directory, and the request is served similarly to a
    <CODE>home-dir-handler</CODE> path handler.
    Otherwise, the request is passed off
    in its entirety to the default path handler.

    <P>
    This procedure is useful for implementing servers that provide the
    semantics of the NCSA httpd server.
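
    <P>
    The tilde test amounts to a check on the first character of the car of
    the path. A hypothetical sketch, again routing to toy handlers rather
    than serving files (the names <CODE>tilde-user</CODE> and
    <CODE>model-tilde-dispatcher</CODE> are illustrative only):
<PRE>
;; "~ziggy" => "ziggy"; any string not starting with a tilde => #f.
(define (tilde-user s)
  (and (> (string-length s) 0)
       (char=? (string-ref s 0) #\~)
       (substring s 1 (string-length s))))

;; If the path starts with "~user", hand (user . rest) to HOME-PH, which
;; plays the role of a home-dir-handler; otherwise pass the whole path on.
(define (model-tilde-dispatcher home-ph default-ph)
  (lambda (path req)
    (let ((user (and (pair? path) (tilde-user (car path)))))
      (if user
          (home-ph (cons user (cdr path)) req)
          (default-ph path req)))))
</PRE>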

<DT>
<CODE>(cgi-handler <VAR>cgi-directory</VAR>) -> <VAR>path-handler</VAR>
</CODE>
<DD>
    This procedure returns a path handler that passes the request off to some
    program using the CGI interface. The script name is taken from the
    car of the path; it is checked for occurrences of <CODE>..</CODE>.
    If the path is
<PRE>
    ("my-prog" "foo" "bar")
</PRE>
    then the program executed is
<PRE>
    <VAR>cgi-directory</VAR>/my-prog
</PRE>
    <P>
    When the CGI path handler builds the process environment for the
    CGI script, several elements
    (<em>e.g.</em>, <CODE>$PATH</CODE> and <CODE>$SERVER_SOFTWARE</CODE>)
    are request-invariant, and can be computed at server start-up time.
    This can be done by calling
<PRE>
    (initialise-request-invariant-cgi-env)
</PRE>
    when the server starts up. This is <EM>not</EM> necessary,
    but will make CGI requests a little faster.
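
    <P>
    The mapping from path to program can be sketched as below. This is a
    hypothetical illustration of the two rules stated above (script name
    taken from the car of the path, refusal of any name containing
    <CODE>..</CODE>); it does not build the CGI environment or run the
    program, and the names <CODE>contains-dotdot?</CODE> and
    <CODE>cgi-program-name</CODE> are invented here.
<PRE>
;; True if the string S contains two consecutive dots.
(define (contains-dotdot? s)
  (let loop ((i 0))
    (cond ((>= (+ i 1) (string-length s)) #f)
          ((and (char=? (string-ref s i) #\.)
                (char=? (string-ref s (+ i 1)) #\.))
           #t)
          (else (loop (+ i 1))))))

;; ("my-prog" "foo" "bar") => "CGI-DIRECTORY/my-prog", or #f if refused.
(define (cgi-program-name cgi-directory path)
  (and (pair? path)
       (not (contains-dotdot? (car path)))
       (string-append cgi-directory "/" (car path))))
</PRE>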

<DT>
<CODE>(rooted-file-handler <VAR>root-dir</VAR>) -> <VAR>path-handler</VAR>
</CODE>
<DD>
    Returns a path handler that serves files from a particular root
    in the file system. Only the GET operation is provided. The path
    argument passed to the handler is converted into a filename,
    and appended to <VAR>root-dir</VAR>.
    The file name is checked for <CODE>..</CODE> components,
    and the transaction is aborted if it contains any. Otherwise, the file is
    served to the client.

<DT>
<CODE>(null-path-handler <VAR>path</VAR> <VAR>req</VAR>)</CODE>
<DD>
    This path handler is useful as a default handler. It handles no requests,
    always returning a "404 Not found" reply to the client.

</DL>

<!---------------------------------------------------------------------------->
<H3>HTTP errors</H3>

Authors of path handlers need to be able to handle errors in a reasonably
simple fashion. The S.U. Web server provides a set of error conditions that
correspond to the error replies in the HTTP protocol. These errors can be
raised with the <CODE>http-error</CODE> procedure.
When the server runs a path handler,
it runs it in the context of an error handler that catches these errors,
sends an error reply to the client, and closes the transaction.

<DL>

<DT>
<CODE>(http-error <VAR>reply-code</VAR> <VAR>req</VAR> [<VAR>extra</VAR> ...])</CODE>
<DD>
    This raises an HTTP error condition. The reply code is one of the
    numeric HTTP reply codes, which are bound to the variables
    <CODE>http-reply/ok</CODE>, <CODE>http-reply/not-found</CODE>,
    <CODE>http-reply/bad-request</CODE>, and so
    forth. The <VAR>req</VAR> argument is the request record that caused
    the error.
    Any following <VAR>extra</VAR> args are passed along for
    informational purposes.
    Different HTTP errors take different types of extra arguments.
    For example, the "301 moved permanently" and "302 moved temporarily"
    replies use the first two <VAR>extra</VAR> values as the
    <CODE>URI:</CODE> and <CODE>Location:</CODE>
    fields in the reply header, respectively. See the clauses of the
    <CODE>send-http-error-reply</CODE> procedure for details.

<DT>
<CODE>(send-http-error-reply <VAR>reply-code</VAR> <VAR>request</VAR>
                             [<VAR>extra</VAR> ...])
</CODE>
<DD>
    This procedure writes an error reply out to the current output
    port. If an error occurs during this process, it is caught, and
    the procedure silently returns. The HTTP server's standard error
    handler passes all HTTP errors raised during path-handler execution
    to this procedure to generate the error reply before aborting the
    request transaction.
</DL>
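
 <P>
The control flow of <CODE>http-error</CODE> can be modelled with an escape
continuation. The sketch below is a simplified stand-in for the real
mechanism, which uses the Scheme 48 condition system: the names
<CODE>model-http-error</CODE>, <CODE>model-send-error-reply</CODE>, and
<CODE>model-run-path-handler</CODE> are hypothetical, only a few reply codes
are covered, and the error reply consists of a status line alone.
<PRE>
;; Escape continuation for the request currently being served.
(define *abort-transaction* #f)

;; Like http-error: abandon the path handler and jump back to the server.
(define (model-http-error reply-code req . extra)
  (*abort-transaction* (cons reply-code extra)))

;; HTTP/1.0 reason phrases for a few codes.
(define reason-phrases
  '((301 . "Moved Permanently") (302 . "Moved Temporarily")
    (400 . "Bad Request") (404 . "Not Found")))

;; Like send-http-error-reply, but writes only the status line.
(define (model-send-error-reply reply-code)
  (let ((entry (assv reply-code reason-phrases)))
    (display "HTTP/1.0 ")
    (display reply-code)
    (display " ")
    (display (if entry (cdr entry) "Error"))
    (display (string #\return #\newline))))

;; Run PH; if it signals an HTTP error, send the error reply and return
;; (reply-code . extra).  Returns #f if the handler finished normally.
(define (model-run-path-handler ph path req)
  (let ((err (call-with-current-continuation
               (lambda (k)
                 (set! *abort-transaction* k)
                 (ph path req)
                 #f))))
    (if err (model-send-error-reply (car err)))
    err))
</PRE>
A handler that calls <CODE>(model-http-error 404 req)</CODE> therefore never
returns normally: control jumps back into <CODE>model-run-path-handler</CODE>,
which writes <CODE>HTTP/1.0 404 Not Found</CODE> and returns
<CODE>(404)</CODE>.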

<!---------------------------------------------------------------------------->
<H3>Simple directory generation</H3>

Most path-handlers that serve files to clients eventually call an internal
procedure named <CODE>file-serve</CODE>,
which implements a simple directory-generation service using the
following rules:
<UL>
<LI> If the filename has the <EM>form</EM> of a directory
     (<EM>i.e.</EM>, it ends with a slash),
     then <CODE>file-serve</CODE> actually looks for a
     file named "<CODE>index.html</CODE>" in that directory.

<LI> If the filename names a directory, but is not in directory form
      (<EM>i.e.</EM>, it doesn't end in a slash,
      as in "<CODE>/usr/include</CODE>" or "<CODE>/usr/raj</CODE>"),
      then <CODE>file-serve</CODE> sends back a "301 moved permanently"
      message,
      redirecting the client to a slash-terminated version of the original
      URL. For example, the URL
<PRE>
    http://clark.lcs.mit.edu/~shivers
</PRE>
      would be redirected to
<PRE>
    http://clark.lcs.mit.edu/~shivers/
</PRE>

<LI> If the filename names a regular file, it is served to the client.
</UL>
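
 <P>
The three rules can be written as a small decision procedure. This is a
hypothetical sketch rather than <CODE>file-serve</CODE> itself: it receives
the file-system test as an argument (<VAR>directory?</VAR>) so that it runs
without touching the disk, it returns a description of the action instead
of performing it, and it assumes that anything which is not a directory is
a regular file to serve. The names <CODE>ends-with-slash?</CODE> and
<CODE>model-file-serve-action</CODE> are invented here.
<PRE>
(define (ends-with-slash? s)
  (let ((n (string-length s)))
    (and (> n 0) (char=? (string-ref s (- n 1)) #\/))))

;; FNAME is the file name; URL is the URL the client asked for.
(define (model-file-serve-action fname url directory?)
  (cond ((ends-with-slash? fname)            ; directory form
         (list 'serve (string-append fname "index.html")))
        ((directory? fname)                  ; directory, no trailing slash
         (list 'redirect 301 (string-append url "/")))
        (else                                ; assumed regular file
         (list 'serve fname))))
</PRE>
With a <VAR>directory?</VAR> test that answers true, the URL
<CODE>http://clark.lcs.mit.edu/~shivers</CODE> yields
<CODE>(redirect 301 "http://clark.lcs.mit.edu/~shivers/")</CODE>.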


<!---------------------------------------------------------------------------->
<H3>Support procs</H3>

The source files contain a host of support procedures that will be useful
to anyone writing a custom path handler. Read the files first.


<!---------------------------------------------------------------------------->
<H3>Losing</H3>

Be aware of two Unix problems, which may require workarounds:
<OL>

<LI>
   NeXTSTEP's Posix implementation of the <CODE>getpwnam()</CODE> routine
   will silently tell you that every user has uid 0. This means
   that if your server, running as root, does a
<PRE>
    (set-uid (user->uid "nobody"))
</PRE>
   it will essentially do a
<PRE>
    (set-uid 0)
</PRE>
   and you will thus still be running as root.

   <P>
   The fix is to manually find out who user nobody is (he's -2 on my
   system), and to hard-wire this into the server:
<PRE>
    (set-uid -2)
</PRE>
   This problem is NeXTSTEP specific. If you are not using NeXTSTEP,
   you will not run into it.

<LI>
   On NeXTSTEP, the ip-address->host-name translation routine
   (in C, <CODE>gethostbyaddr()</CODE>; in scsh,
   <CODE>(host-info addr)</CODE>) does not
   use the DNS system; it goes through NeXT's proprietary NetInfo
   system, and may not return a fully-qualified domain name (FQDN). For
   example, on my system, I get "amelia-earhart", when I want
   "amelia-earhart.lcs.mit.edu". Since the server uses this name
   to construct redirection URLs to be sent back to the Web client,
   it needs to be an FQDN.

   <P>
   This problem may occur on other OSes;
   I cannot determine if <CODE>gethostbyaddr()</CODE>
   is required to return an FQDN or not. (I would appreciate hearing the
   answer if you know; my local Internet gurus couldn't tell me.)

   <P>
   If your system doesn't give you a complete Internet address when
   you say
<PRE>
    (host-info:name (host-info (system-name)))
</PRE>
   then you have this problem.

   <P>
   The server has a workaround. There is a procedure exported from
   the httpd-core package:
<PRE>
    (set-my-fqdn name)
</PRE>
   Call this to crowbar the server's idea of its own Internet host name
   before running the server, and all will be well.
</OL>
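
 <P>
A start-up check for the second problem might look like the sketch below.
It rests on an assumption of ours, not a rule from the server: that a host
name with no dot in it is not fully qualified (the converse does not hold,
so the test is only a heuristic). The helper
<CODE>looks-fully-qualified?</CODE> is hypothetical; the scsh and
httpd-core calls that would use it appear only in the comment, with the
example host name from above standing in for yours.
<PRE>
;; Heuristic: an FQDN such as "amelia-earhart.lcs.mit.edu" contains a dot;
;; a bare NetInfo name such as "amelia-earhart" does not.
(define (looks-fully-qualified? name)
  (let loop ((i 0))
    (cond ((= i (string-length name)) #f)
          ((char=? (string-ref name i) #\.) #t)
          (else (loop (+ i 1))))))

;; In the server start-up code (scsh), something like:
;;   (let ((name (host-info:name (host-info (system-name)))))
;;     (if (not (looks-fully-qualified? name))
;;         (set-my-fqdn "amelia-earhart.lcs.mit.edu")))
</PRE>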

</BODY>
</HTML>