%&latex -*- latex -*- \documentstyle[code,11pt,lcs-note,boxedminipage,openbib,twoside, palatino,ct]{article} \input{headings} % Squeeeeeeze those figures onto the page. \renewcommand{\floatpagefraction}{0.7} \renewcommand{\topfraction}{.9} \renewcommand{\bottomfraction}{.9} \renewcommand{\textfraction}{.1} \raggedbottom \makeatletter %% For chapter and section quotes: \newcommand{\headingquote}[2] {\begin{flushright}\em\begin{tabular}{@{}l@{}}#1 \\ {\rm \qquad --- \begin{tabular}[t]{@{}l@{}}#2\end{tabular}}\end{tabular} \end{flushright}\par\noindent} \newcommand{\halfpage}[1]{\parbox[t]{0.5\linewidth}{#1}} \def\ie{\mbox{\em i.e.}} % \mbox keeps the last period from \def\Ie{\mbox{\em I.e.}} % looking like an end-of-sentence. \def\eg{\mbox{\em e.g.}} \def\Eg{\mbox{\em E.g.}} \def\etc{\mbox{\em etc.}} \def\Lisp{{\sc Lisp}} \def\CommonLisp{{\sc Common Lisp}} \def\Ascii{{\sc Ascii}} \def\Unix{{Unix}} % No \sc, according to Bart. \def\Scheme{{Scheme}} % No \sc. \def\scm{{Scheme 48}} \def\R4RS{R4RS} \newcommand{\synteq}{{\rm ::=}} % One-line code examples %\newcommand{\codex}[1]% One line, centred. Tight spacing. % {$$\abovedisplayskip=.75ex plus 1ex minus .5ex% % \belowdisplayskip=\abovedisplayskip% % \abovedisplayshortskip=0ex plus .5ex% % \belowdisplayshortskip=\abovedisplayshortskip% % \hbox{\ttt #1}$$} %\newcommand{\codex}[1]{\begin{tightinset}\ex{#1}\end{tightinset}\ignorespaces} \newcommand{\codex}[1]{\begin{leftinset}\ex{#1}\end{leftinset}\ignorespaces} % For multiletter vars in math mode: \newcommand{\var}[1]{{\it #1}} \newcommand{\vari}[2]{${\it #1}_{#2}$} %% What you frequently want when you say \tt: \def\ttt{\tt\catcode``=13\@noligs\frenchspacing} % Works in math mode; all special chars remain special; cheaper than \cd. % Will not be correct size in super and subscripts, though. 
\newcommand{\ex}[1]{\mbox{\ttt #1}} \newenvironment{inset} {\bgroup\parskip=1ex plus 1ex\begin{list}{}% {\topsep=0pt\rightmargin\leftmargin}% \item[]}% {\end{list}\leavevmode\egroup\global\@ignoretrue} \newenvironment{leftinset} {\bgroup\parskip=1ex plus 1ex\begin{list}{}% {\topsep=0pt}% \item[]}% {\end{list}\leavevmode\egroup\global\@ignoretrue} \newenvironment{tightinset} {\bgroup\parskip=0pt\begin{list}{}% {\topsep=0pt\rightmargin\leftmargin}% \item[]}% {\end{list}\leavevmode\egroup\ignorespaces} \newcommand{\remark}[1]{\mbox{$<<$}{\bf #1}\mbox{$>>$}} \newcommand{\note}[1]{\{Note #1\}} % For use in code. The \llap magicness makes the lambda exactly as wide as % the other chars in \tt; the \hskip shifts it right a bit so it doesn't % crowd the left paren -- which is necessary if \tt is cmtt. % Note that (\l{x y} (+ x y)) uses the same number of columns in TeX form % as it produces when typeset. This makes it easy to line up the columns % in your input. \l is bound to some useless command in LaTeX, so we have to % define it w/renewcommand. \let\oldl\l %Save the old \l on \oldl \renewcommand{\l}[1]{\ \llap{$\lambda$\hskip-.05em}\ (#1)} % This horrible hack is for typesetting procedure doc. 
\newcommand{\proto}[3] {\makebox[\protowidth][l]{{\ttt(#1 {\it #2}\/)} \hfill{\sl #3}}} \newcommand{\protoitem}[3]{\item[\proto{#1}{#2}{#3}]} \newlength{\protowidth} \protowidth \linewidth \newenvironment{protos}{\protowidth \linewidth \begin{description}} {\end{description}} \newenvironment{column}{\protowidth \linewidth\begin{tabular}{@{}l@{}}}{\end{tabular}} % For subcaptions \newcommand{\subcaption}[1] {\unskip\vspace{-2mm}\begin{center}\unskip\em#1\end{center}} \makeatother %%% End preamble %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \begin{document} \notenum{3} \project{Personal Information Architecture} \title{A {\Scheme} Shell} \author{Olin Shivers \\ {\ttt shivers@lcs.mit.edu}} \date{4/94} \maketitle \pagestyle{empty} \thispagestyle{empty} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \headingquote{ Although robust enough for general use, adventures \\ into the esoteric periphery of the C shell may reveal \\ unexpected quirks.} {SunOS 4.1 csh(1) man page, 10/2/89} \vspace{-2em} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \section*{Prologue} %\addcontentsline{toc}{section}{Prologue} Shell programming terrifies me. There is something about writing a simple shell script that is just much, much more unpleasant than writing a simple C program, or a simple {\CommonLisp} program, or a simple Mips assembler program. Is it trying to remember what the rules are for all the different quotes? Is it having to look up the multi-phased interaction between filename expansion, shell variables, quotation, backslashes and alias expansion? Maybe it's having to subsequently look up which of the twenty or thirty flags I need for my grep, sed, and awk invocations. 
Maybe it just gets on my nerves that I have to run two complete programs simply to count the number of files in a directory (\ex{ls | wc -l}), which seems like several orders of magnitude more cycles than was really needed. Whatever it is, it's an object lesson in angst. Furthermore, during late-night conversations with office mates and graduate students, I have formed the impression that I am not alone. In late February\footnote{February 1992, that is.}, I got embroiled in a multi-way email flamefest about just exactly what it was about Unix that drove me nuts. In the midst of the debate, I did a rash thing. I claimed that it would be easy and so much nicer to do shell programming from {\Scheme}. Some functions to interface to the OS and a few straightforward macros would suffice to remove the spectre of \cd{#!/bin/csh} from my life forever. The serious Unix-philes in the debate expressed their doubts. So I decided to go do it. Probably only take a week or two. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % Keywords page for the MIT TR { \clearpage \vspace*{\fill} \newcommand{\keywords}[1]% {\newlength{\kwlength}\settowidth{\kwlength}{\bf Keywords: }% \setlength{\kwlength}{-\kwlength}\addtolength{\kwlength}{\linewidth}% \noindent{\bf Keywords: }\parbox[t]{\kwlength}{\raggedright{}#1.}} \keywords{operating systems, programming languages, Scheme, Unix, shells, functional languages, systems programming} } %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \cleardoublepage \tableofcontents \cleardoublepage \setcounter{page}{1} \pagestyle{plain} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \section{Introduction} The central artifact of this paper is a new {\Unix} shell called scsh. However, I have a larger purpose beyond simply giving a description of the new system. 
It has become fashionable recently to claim that ``language doesn't matter.'' After twenty years of research, operating systems and systems applications are still mainly written in C and its complex successor, C++. Perhaps advanced programming languages offer too little for the price they demand in efficiency and formal rigor. I disagree strongly with this position, and I would like to use scsh, in comparison to other {\Unix} systems programming languages, to make the point that language {\em does\/} matter. After presenting scsh in the initial sections of the paper, I will describe its design principles, and make a series of points concerning the effect language design has upon systems programming. I will use scsh, C, and the traditional shells as linguistic exemplars, and show how their various notational and semantic tradeoffs affect the programmer's task. In particular, I wish to show that a functional language such as Scheme is an excellent tool for systems programming. Many of the linguistic points I will make are well-known to the members of the systems programming community that employ modern programming languages, such as DEC SRC's Modula-3 \cite{Nelson}. In this respect, I will merely be serving to recast these ideas in a different perspective, and perhaps diffuse them more widely. The rest of this paper is divided into four parts: \begin{itemize} \item In part one, I will motivate the design of scsh (section~\ref{sec:shells}), and then give a brief tutorial on the system (\ref{sec:proc-forms}, \ref{sec:syscall-lib}). \item In part two, I discuss the design issues behind scsh, and cover some of the relevant implementation details (\ref{sec:zen}--\ref{sec:size}). \item Part three concerns systems programming with advanced languages. I will illustrate my points by comparing scsh to other {\Unix} programming systems (\ref{sec:scm-sysprog}, \ref{sec:opl}). \item Finally, we conclude, with some indication of future directions and a few final thoughts. 
\end{itemize} %\part{Shell Programming} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \section{Unix shells} \label{sec:shells} Unix shells, such as sh or csh, provide two things at once: an interactive command language and a programming language. Let us focus on the latter function: the writing of ``shell scripts''---interpreted programs that perform small tasks or assemble a collection of Unix tools into a single application. Unix shells are real programming languages. They have variables, if/then conditionals, and loops. But they are terrible programming languages. The data structures typically consist only of integers and vectors of strings. The facilities for procedural abstraction are non-existent to minimal. The lexical and syntactic structures are multi-phased, unprincipled, and baroque. If most shell languages are so awful, why does anyone use them? There are a few important reasons. \begin{itemize} \item A programming language is a notation for expressing computation. Shells have a notation that is specifically tuned for running Unix programs and hooking them together. For example, suppose you want to run programs \ex{foo} and \ex{bar} with \ex{foo} feeding output into \ex{bar}. If you do this in C, you must write: two calls to \ex{fork()}, two calls to \ex{exec()}, one call to \ex{pipe()}, several calls to \ex{close()}, two calls to \ex{dup()}, and a lot of error checks (fig.~\ref{fig:C-pipe}). This is a lot of picky bookkeeping: tedious to write, tedious to read, and easy to get wrong on the first try. In sh, on the other hand, you simply write ``\ex{foo | bar}'' which is much easier to write and much clearer to read. One can look at this expression and instantly understand it; one can write it and instantly be sure that it is correct. 
\begin{figure}
\begin{boxedminipage}{\linewidth}\vskip 1.5ex
\footnotesize
\begin{verbatim}
int fork_foobar(void) /* foo | bar in C */
{
    int pid1 = fork();
    int pid2, fds[2];

    if( pid1 == -1 ) {
        perror("foo|bar");
        return -1;
    }

    if( pid1 ) {                /* Parent: wait for the pipeline. */
        int status;
        if( -1 == waitpid(pid1, &status, 0) ) {
            perror("foo|bar");
            return -1;
        }
        return status;
    }

    if( -1 == pipe(fds) ) {
        perror("foo|bar");
        exit(-1);
    }

    pid2 = fork();
    if( pid2 == -1 ) {
        perror("foo|bar");
        exit(-1);
    }

    if( !pid2 ) {               /* foo writes into the pipe... */
        close(fds[0]);
        dup2(fds[1], 1);
        execlp("foo", "foo", NULL);
        perror("foo|bar");
        exit(-1);
    }

    close(fds[1]);              /* ...and bar reads from it. */
    dup2(fds[0], 0);
    execlp("bar", "bar", NULL);
    perror("foo|bar");
    exit(-1);
}\end{verbatim}
\caption{Why we program with shells.}
\label{fig:C-pipe}
\end{boxedminipage}
\end{figure}

\item They are interpreted.
Debugging is easy and interactive; programs are small.
On my workstation, the ``hello, world'' program is
16kb as a compiled C program, and 29 bytes as an interpreted sh script.
In fact, \ex{/bin/sh} is just about the only language interpreter
that a programmer can absolutely rely upon having available
on the system, so this is just about the only reliable way to get
interpreted-code density and know that one's program will run
on any Unix system.

\item Because the shell is the programmer's command language,
the programmer is usually very familiar with its commonly-used
command-language subset (this familiarity tails off rapidly, however,
as the demands of shell programming move the programmer out into
the dustier recesses of the language's definition).
\end{itemize}

There is a tension between the shell's dual role as interactive command
language and shell-script programming language.

A command language should be terse and convenient to type.
It doesn't have to be comprehensible.
Users don't have to maintain or understand a command they typed
into a shell a month ago.
A command language can be ``write-only,'' because commands are thrown
away after they are used.
However, it is important that most commands fit on one line, because most
interaction is through tty drivers that don't let the user back up and edit
a line after its terminating newline has been entered.
This seems like a trivial point, but imagine how irritating it would be if
typical shell commands required several lines of input.
Terse notation is important for interactive tasks.
Shell syntax is also carefully designed to allow it to be parsed
on-line---that is, to allow parsing and interpretation to be interleaved.
This usually penalizes the syntax in other ways
(for example, consider rc's clumsy if/then/else syntax \cite{rc}).

Programming languages, on the other hand, can be a little more verbose, in
return for generality and readability. The programmer enters programs into a
text editor, so the language can spread out a little more.
The constraints of the shell's role as command language are one of the things
that make it unpleasant as a programming language.

The really compelling advantage of shell languages over other programming
languages is the first one mentioned above.
Shells provide a powerful notation for connecting processes and files together.
In this respect, shell languages are extremely well-adapted to the general
paradigm of the Unix operating system.
In Unix, the fundamental computational agents are programs, running as
processes in individual address spaces.
These agents cooperate and communicate among themselves to solve a problem by
communicating over directed byte streams called pipes.
Viewed at this level, Unix is a data-flow architecture.
From this perspective, the shell serves a critical role as the language
designed to assemble the individual computational agents to solve a
particular task.

As a programming language, this interprocess ``glue'' aspect of the shell
is its key desirable feature.
This leads us to a fairly obvious idea: instead of adding weak programming
features to a Unix process-control language, why not add process invocation
features to a strong programming language?

What programming language would make a good base?
We would want a language that was powerful and high-level.
It should allow for implementations based on interactive interpreters, for
ease of debugging and to keep programs small.
Since we want to add new notation to the language, it would help if the
language was syntactically extensible.
High-level features such as automatic storage allocation would help keep
programs small and simple.

{\Scheme} is an obvious choice.
It has all of the desired features, and its weak points, such as its lack of a
module system or its poor performance relative to compiled C on certain
classes of program, do not apply to the writing of shell scripts.

I have designed and implemented a {\Unix} shell called scsh that is embedded
inside {\Scheme}.
I had the following design goals and non-goals:

\begin{itemize}
\item
The general systems architecture of {\Unix} is cooperating computational
agents that are realised as processes running in separate, protected address
spaces, communicating via byte streams.
The point of a shell language is to act as the glue to connect up these
computational agents.
That is the goal of scsh.
I resisted the temptation to delve into other programming models.
Perhaps cooperating lightweight threads communicating through shared memory
is a better way to live, but it is not {\Unix}.
The goal here was not to come up with a better systems architecture, but
simply to provide a better way to drive {\Unix}.
\note{Agenda}

\item
I wanted a programming language, not a command language, and I was unwilling
to compromise the quality of the programming language to make it a better
command language.
I was not trying to replace use of the shell as an interactive command
language.
I was trying to provide a better alternative for writing shell scripts.
So I did not focus on issues that might be important for a command language,
such as job control, command history, or command-line editing.
There are no write-only notational conveniences.
I made no effort to hide the base {\Scheme} syntax, even though an interactive
user might find all the necessary parentheses irritating.
(However, see section \ref{sec:future-work}.)

\item
I wanted the result to fit naturally within {\Scheme}.
For example, this ruled out complex non-standard control-flow paradigms, such
as awk's or sed's.
\end{itemize}

The resulting design, scsh, has two dependent components, embedded within a
very portable {\Scheme} system:
\begin{itemize}
    \item A high-level process-control notation.
    \item A complete library of {\Unix} system calls.
\end{itemize}
The process-control notation allows the user to control {\Unix} programs with
a compact notation.
The syscall library gives the programmer full low-level access to the kernel
for tasks that cannot be handled by the high-level notation.
In this way, scsh's functionality spans a spectrum of detail that is not
available to either C or sh.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Process notation}
\label{sec:proc-forms}

Scsh has a notation for controlling {\Unix} processes that takes the form of
s-expressions; this notation can then be embedded inside of standard {\Scheme}
code.
The basic elements of this notation are {\em process forms},
{\em extended process forms}, and {\em redirections}.

\subsection{Extended process forms and i/o redirections}
An {\em extended process form\/} is a specification of a {\Unix} process to
run, in a particular I/O environment:
\codex{\var{epf} {\synteq} (\var{pf} $\var{redir}_1$ {\ldots} $\var{redir}_n$)}
where \var{pf} is a process form and the $\var{redir}_i$ are redirection specs.
A {\em redirection spec} is one of: \begin{inset} \begin{tabular}{@{}l@{\qquad{\tt; }}l@{}} \ex{(< \var{[fdes]} \var{file-name})} & \ex{Open file for read.} \\\ex{(> \var{[fdes]} \var{file-name})} & \ex{Open file create/truncate.} \\\ex{(<< \var{[fdes]} \var{object})} & \ex{Use \var{object}'s printed rep.} \\\ex{(>> \var{[fdes]} \var{file-name})} & \ex{Open file for append.} \\\ex{(= \var{fdes} \var{fdes/port})} & \ex{Dup2} \\\ex{(- \var{fdes/port})} & \ex{Close \var{fdes/port}.} \\\ex{stdports} & \ex{0,1,2 dup'd from standard ports.} \end{tabular} \end{inset} The \var{fdes} file descriptors have these defaults: \begin{center} {\ttt \begin{tabular}{|cccc|}\hline < & << & > & >> \\ 0 & 0 & 1 & 1 \\ \hline \end{tabular} } \end{center} The subforms of a redirection are implicitly backquoted, and symbols stand for their print-names. So \ex{(> ,x)} means ``output to the file named by {\Scheme} variable \ex{x},'' and \ex{(< /usr/shivers/.login)} means ``read from \ex{/usr/shivers/.login}.'' This implicit backquoting is an important feature of the process notation, as we'll see later (sections~\ref{sec:zen} and \ref{sec:sexp}). Here are two more examples of i/o redirection: % \begin{center} \begin{codebox} (< ,(vector-ref fv i)) (>> 2 /tmp/buf)\end{codebox} \end{center} % These two redirections cause the file \ex{fv[i]} to be opened on stdin, and \ex{/tmp/buf} to be opened for append writes on stderr. The redirection \ex{(<< \var{object})} causes input to come from the printed representation of \var{object}. For example, \codex{(<< "The quick brown fox jumped over the lazy dog.")} causes reads from stdin to produce the characters of the above string. The object is converted to its printed representation using the \ex{display} procedure, so \codex{(<< (A five element list))} is the same as \codex{(<< "(A five element list)")} is the same as \codex{(<< ,(reverse '(list element five A))){\rm.}} (Here we use the implicit backquoting feature to compute the list to be printed.) 
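Putting these pieces together, here is a complete extended process form (the
file names are hypothetical, and \ex{sort} is an ordinary {\Unix} program):
%
\begin{center}
\begin{codebox}
((sort) (< /usr/dict/words)
        (> sorted-words)
        (>> 2 sort.log))\end{codebox}
\end{center}
%
By the defaults in the table above, the \ex{<} redirection opens
\ex{/usr/dict/words} on fdes 0 and \ex{>} opens \ex{sorted-words} on fdes 1,
while the explicit 2 directs the append-mode redirection at stderr.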
The redirection \ex{(= \var{fdes} \var{fdes/port})} causes \var{fdes/port} to be dup'd into file descriptor \var{fdes}. For example, the redirection \codex{(= 2 1)} causes stderr to be the same as stdout. \var{fdes/port} can also be a port, for example: \codex{(= 2 ,(current-output-port))} causes stderr to be dup'd from the current output port. In this case, it is an error if the port is not a file port (\eg, a string port). \note{No port sync} More complex redirections can be accomplished using the \ex{begin} process form, discussed below, which gives the programmer full control of i/o redirection from {\Scheme}. \subsection{Process forms} A {\em process form\/} specifies a computation to perform as an independent {\Unix} process. It can be one of the following: % \begin{leftinset} \begin{codebox} (begin . \var{scheme-code}) (| \vari{pf}{\!1} {\ldots} \vari{pf}{\!n}) (|+ \var{connect-list} \vari{pf}{\!1} {\ldots} \vari{pf}{\!n}) (epf . \var{epf}) (\var{prog} \vari{arg}{1} {\ldots} \vari{arg}{n}) \end{codebox} \qquad \begin{codebox} ; Run \var{scheme-code} in a fork. ; Simple pipeline ; Complex pipeline ; An extended process form. ; Default: exec the program. \end{codebox} \end{leftinset} % The default case \ex{(\var{prog} \vari{arg}1 {\ldots} \vari{arg}n)} is also implicitly backquoted. That is, it is equivalent to: % \codex{(begin (apply exec-path `(\var{prog} \vari{arg}1 {\ldots} \vari{arg}n)))} % \ex{Exec-path} is the version of the \ex{exec()} system call that uses scsh's path list to search for an executable. The program and the arguments must be either strings, symbols, or integers. Symbols and integers are coerced to strings. A symbol's print-name is used. Integers are converted to strings in base 10. Using symbols instead of strings is convenient, since it suppresses the clutter of the surrounding \ex{"{\ldots}"} quotation marks. 
To aid this purpose, scsh reads symbols in a case-sensitive manner, so that you can say \codex{(more Readme)} and get the right file. (See section \ref{sec:lex} for further details on lexical issues.) A \var{connect-list} is a specification of how two processes are to be wired together by pipes. It has the form \ex{((\vari{from}1 \vari{from}2 {\ldots} \var{to}) \ldots)} and is implicitly backquoted. For example, % \codex{(|+ ((1 2 0) (3 3)) \vari{pf}{\!1} \vari{pf}{\!2})} % runs \vari{pf}{\!1} and \vari{pf}{\!2}. The first clause \ex{(1 2 0)} causes \vari{pf}{\!1}'s stdout (1) and stderr (2) to be connected via pipe to \vari{pf}{\!2}'s stdin (0). The second clause \ex{(3 3)} causes \vari{pf}{\!1}'s file descriptor 3 to be connected to \vari{pf}{\!2}'s file descriptor 3. %---this is unusual, and not expected to occur very often. %[Note that {\R4RS} does not specify whether or not | and |+ are readable %symbols. Scsh does.] \subsection{Using extended process forms in \Scheme} Process forms and extended process forms are {\em not\/} {\Scheme}. They are a different notation for expressing computation that, like {\Scheme}, is based upon s-expressions. Extended process forms are used in {\Scheme} programs by embedding them inside special Scheme forms. \pagebreak There are three basic {\Scheme} forms that use extended process forms: \ex{exec-epf}, \cd{&}, and \ex{run}: \begin{inset} \begin{codebox}[t] (exec-epf . \var{epf}) (& . \var{epf}) (run . \var{epf}) \end{codebox} \quad \begin{codebox}[t] ; Nuke the current process. ; Run \var{epf} in background; return pid. ; Run \var{epf}; wait for termination. ; Returns exit status.\end{codebox} \end{inset} These special forms are macros that expand into the equivalent series of system calls. The definition of the \ex{exec-epf} macro is non-trivial, as it produces the code to handle i/o redirections and set up pipelines. 
However, the definitions of the \cd{&} and \ex{run} macros are very simple:
\begin{leftinset}
\begin{tabular}{@{}l@{\quad$\Rightarrow$\quad}l@{}}
\cd{(& . \var{epf})}   & \ex{(fork (\l{} (exec-epf . \var{epf})))} \\
\ex{(run . \var{epf})} & \cd{(wait (& . \var{epf}))}
\end{tabular}
\end{leftinset}

Figures \ref{fig:ex1} and \ref{fig:ex2} show a series of examples employing a
mix of the process notation and the syscall library.
Note that regular Scheme is used to provide the control structure, variables,
and other linguistic machinery needed by the script fragments.
%
\begin{figure}[bp]\footnotesize
\begin{boxedminipage}{\linewidth}\vskip 1.5ex
\begin{center}\begin{codebox}
;; If the resource file exists, load it into X.
(if (file-exists? f)
    (run (xrdb -merge ,f)))

;; Decrypt my mailbox; key is "xyzzy".
(run (crypt xyzzy) (< mbox.crypt) (> mbox))

;; Dump the output from ls, fortune, and from into log.txt.
(run (begin (run (ls))
            (run (fortune))
            (run (from)))
     (> log.txt))

;; Compile FILE with FLAGS.
(run (cc ,file ,@flags))

;; Delete every file in DIR containing the string "/bin/perl":
(with-cwd dir
  (for-each (\l{file} (if (zero? (run (grep -s /bin/perl ,file)))
                          (delete-file file)))
            (directory-files)))\end{codebox}
\end{center}
\caption{Example shell script fragments (a)}
\label{fig:ex1}
\end{boxedminipage}
\end{figure}

\begin{figure}\footnotesize
\begin{boxedminipage}{\linewidth}\vskip 1.5ex
\begin{center}\begin{codebox}
;; M4 preprocess each file in the current directory, then pipe
;; the input into cc. Errlog is foo.err, binary is foo.exe.
;; Run compiles in parallel.
(for-each (\l{file}
            (let ((outfile (replace-extension file ".exe"))
                  (errfile (replace-extension file ".err")))
              (& (| (m4) (cc -o ,outfile))
                 (< ,file)
                 (> 2 ,errfile))))
          (directory-files))

;; Same as above, but parallelise even the computation
;; of the filenames.
(for-each (\l{file} (& (begin (let ((outfile (replace-extension file ".exe")) (errfile (replace-extension file ".err"))) (exec-epf (| (m4) (cc -o ,outfile)) (< ,file) (> 2 ,errfile)))))) (directory-files)) ;; DES encrypt string PLAINTEXT with password KEY. My DES program ;; reads the input from fdes 0, and the key from fdes 3. We want to ;; collect the ciphertext into a string and return that, with error ;; messages going to our stderr. Notice we are redirecting Scheme data ;; structures (the strings PLAINTEXT and KEY) from our program into ;; the DES process, instead of redirecting from files. RUN/STRING is ;; like the RUN form, but it collects the output into a string and ;; returns it (see following section). (run/string (/usr/shivers/bin/des -e -3) (<< ,plaintext) (<< 3 ,key)) ;; Delete the files matching regular expression PAT. ;; Note we aren't actually using any of the process machinery here -- ;; just pure Scheme. (define (dsw pat) (for-each (\l{file} (if (y-or-n? (string-append "Delete " file)) (delete-file file))) (file-match #f pat)))\end{codebox} \end{center} \caption{Example shell script fragments (b)} \label{fig:ex2} \end{boxedminipage} \end{figure} \subsection{Procedures and special forms} It is a general design principle in scsh that all functionality made available through special syntax is also available in a straightforward procedural form. So there are procedural equivalents for all of the process notation. In this way, the programmer is not restricted by the particular details of the syntax. 
Here are some of the syntax/procedure equivalents: \begin{inset} \begin{tabular}{@{}|ll|@{}} \hline Notation & Procedure \\ \hline \hline \ex{|} & \ex{fork/pipe} \\ \ex{|+} & \ex{fork/pipe+} \\ \ex{exec-epf} & \ex{exec-path} \\ redirection & \ex{open}, \ex{dup} \\ \cd{&} & \ex{fork} \\ \ex{run} & $\ex{wait} + \ex{fork}$ \\ \hline \end{tabular} \end{inset} % Having a solid procedural foundation also allows for general notational experimentation using Scheme's macros. For example, the programmer can build his own pipeline notation on top of the \ex{fork} and \ex{fork/pipe} procedures. %Because the shell notation has {\Scheme} escapes %(\eg, the \ex{begin} process form), %the programmer can move back and forth easily, using the simple notation %where possible, and escaping to general {\Scheme} only where necessary. \begin{protos} \protoitem{fork}{[thunk]}{procedure} \ex{Fork} spawns a {\Unix} subprocess. Its exact behavior depends on whether it is called with the optional \var{thunk} argument. With the \var{thunk} argument, \ex{fork} spawns off a subprocess that calls \var{thunk}, exiting when \var{thunk} returns. \ex{Fork} returns the subprocess' pid to the parent process. Without the \var{thunk} argument, \ex{fork} behaves like the C \ex{fork()} routine. It returns in both the parent and child process. In the parent, \ex{fork} returns the child's pid; in the child, \ex{fork} returns \cd{#f}. \protoitem{fork/pipe}{[thunk]}{procedure} Like \ex{fork}, but the parent and child communicate via a pipe connecting the parent's stdin to the child's stdout. This function side-effects the parent by changing his stdin. In effect, \ex{fork/pipe} splices a process into the data stream immediately upstream of the current process. This is the basic function for creating pipelines. Long pipelines are built by performing a sequence of \ex{fork/pipe} calls. 
\pagebreak For example, to create a background two-process pipe \ex{a | b}, we write: % \begin{tightcode} (fork (\l{} (fork/pipe a) (b)))\end{tightcode} % which returns the pid of \ex{b}'s process. To create a background three-process pipe \ex{a | b | c}, we write: % \begin{code} (fork (\l{} (fork/pipe a) (fork/pipe b) (c)))\end{code} % which returns the pid of \ex{c}'s process. \protoitem{fork/pipe+}{conns [thunk]}{procedure} Like \ex{fork/pipe}, but the pipe connections between the child and parent are specified by the connection list \var{conns}. See the \codex{(|+ \var{conns} \vari{pf}{\!1} \ldots{} \vari{pf}{\!n})} process form for a description of connection lists. \end{protos} \subsection{Interfacing process output to {\Scheme}} \label{sec:io-interface} There is a family of procedures and special forms that can be used to capture the output of processes as {\Scheme} data. Here are the special forms for the simple variants: \\[2ex]%\begin{center} \begin{codebox} (run/port . \var{epf}) ; Return port open on process's stdout. (run/file . \var{epf}) ; Process > temp file; return file name. (run/string . \var{epf}) ; Collect stdout into a string and return. (run/strings . \var{epf}) ; Stdout->list of newline-delimited strings. (run/sexp . \var{epf}) ; Read one sexp from stdout with READ. (run/sexps . \var{epf}) ; Read list of sexps from stdout with READ.\end{codebox} \\[2ex]%\end{center} % \ex{Run/port} returns immediately after forking off the process; other forms wait for either the process to die (\ex{run/file}), or eof on the communicating pipe (\ex{run/string}, \ex{run/strings}, \ex{run/sexps}). 
These special forms just expand into calls to the following analogous
procedures:
%
\begin{center}
\begin{column}
\proto{run/port*}    {thunk}{procedure} \\
\proto{run/file*}    {thunk}{procedure} \\
\proto{run/string*}  {thunk}{procedure} \\
\proto{run/strings*} {thunk}{procedure} \\
\proto{run/sexp*}    {thunk}{procedure} \\
\proto{run/sexps*}   {thunk}{procedure}
\end{column}
\end{center}
%
For example, \ex{(run/port . \var{epf})} expands into
\codex{(run/port* (\l{} (exec-epf . \var{epf}))).}

These procedures can be used to manipulate the output of {\Unix} programs
with {\Scheme} code.
For example, the output of the \ex{xhost(1)} program can be manipulated with
the following code:
\begin{code}
;;; Before asking host REMOTE to do X stuff,
;;; make sure it has permission.
(while (not (member remote (run/strings (xhost))))
  (display "Pausing for xhost...")
  (read-char))\end{code}

The following procedures are also generally useful for parsing input streams
in scsh:
%(port->string \var{port})
%(port->sexp-list \var{port})
%(port->string-list \var{port})
%(port->list \var{reader} \var{port})
\begin{center}
\begin{column}
\proto{port->string}{port}{procedure} \\
\proto{port->sexp-list}{port}{procedure} \\
\proto{port->string-list}{port}{procedure} \\
\proto{port->list}{reader port}{procedure}
\end{column}
\end{center}

\ex{Port->string} reads the port until eof,
then returns the accumulated string.
\ex{Port->sexp-list} repeatedly reads data from the port until eof,
then returns the accumulated list of items.
\ex{Port->string-list} repeatedly reads newline-terminated strings from the
port until eof, then returns the accumulated list of strings.
The delimiting newlines are not part of the returned strings.
\ex{Port->list} generalises these two procedures.
It uses \var{reader} to repeatedly read objects from a port.
It accumulates these objects into a list, which is returned upon eof.
The \ex{port->string-list} and \ex{port->sexp-list} procedures are trivial to define, being merely \ex{port->list} curried with the appropriate parsers: \begin{code}\cddollar (port->string-list \var{port}) $\equiv$ (port->list read-line \var{port}) (port->sexp-list \var{port}) $\equiv$ (port->list read \var{port})\end{code} % The following compositions also hold: \begin{code}\cddollar run/string* $\equiv$ port->string $\circ$ run/port* run/strings* $\equiv$ port->string-list $\circ$ run/port* run/sexp* $\equiv$ read $\circ$ run/port* run/sexps* $\equiv$ port->sexp-list $\circ$ run/port*\end{code} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \section{System calls} \label{sec:syscall-lib} We've just seen scsh's high-level process-form notation, for running programs, creating pipelines, and performing I/O redirection. This notation is at roughly the same level as traditional {\Unix} shells. The process-form notation is convenient, but does not provide detailed, low-level access to the operating system. This is provided by the second component of scsh: its system-call library. Scsh's system-call library is a nearly complete set of {\sc Posix} bindings, with some extras, such as symbolic links. As of this writing, network and terminal i/o controls have not yet been implemented; work on them is underway. Scsh also provides a convenient set of systems programming utility procedures, such as routines to perform pattern matching on file-names and general strings, manipulate {\Unix} environment variables, and parse file pathnames. Although some of the procedures have been described in passing, a detailed description of the system-call library is beyond the scope of this note. The reference manual \cite{ref-man} contains the full details.
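For a small taste of the library's flavor, symbolic links are manipulated with ordinary procedures; this sketch assumes the argument conventions given in the reference manual:
\begin{code}
(create-symlink "/usr/local/src/scsh" "scsh")  ; ln -s
(read-symlink "scsh")  ; => "/usr/local/src/scsh"\end{code}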
%\part{Design Notes} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \section{The Tao of {\Scheme} and {\Unix}} \label{sec:zen} Most attempts at embedding shells in functional programming languages \cite{fsh,ellis} try to hide the difference between running a program and calling a procedure. That is, if the user tries \codex{(lpr "notes.txt")} the shell will first treat \ex{lpr} as a procedure to be called. If \ex{lpr} isn't found in the variable environment, the shell will then do a path search of the file system for a program. This sort of transparency is in analogy to the function-binding mechanisms of traditional shells, such as ksh. This is a fundamental error that has hindered these previous designs. Scsh, in contrast, is explicit about the distinction between procedures and programs. In scsh, the programmer must know which are which---the mechanisms for invocation are different for the two cases (procedure call {\em versus\/} the \ex{(run . \var{epf})} special form), and the namespaces are different (the program's lexical environment {\em versus\/} \ex{\$PATH} search in the file system). Linguistically separating these two mechanisms was an important design decision in the language. It was done because the two computational models are fundamentally different; any attempt to gloss over the distinctions would have made the semantics ugly and inconsistent. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \begin{figure} \begin{boxedminipage}{\linewidth}\vskip 1.5ex \begin{center} \begin{tabular}{ll} \bf Unix: & \begin{tabular}[t]{l} Computational agents are processes, \\ communicate via byte streams. \end{tabular} \\ \\ \bf Scheme: & \begin{tabular}[t]{l} Computational agents are procedures, \\ communicate via procedure call/return. 
\end{tabular} \end{tabular} \end{center} \caption{The Tao of {\Scheme} and {\Unix}} \label{fig:tao} \end{boxedminipage} \end{figure} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% There are two computational worlds here (figure~\ref{fig:tao}), where the basic computational agents are procedures or processes. These agents are composed differently. In the world of applicative-order procedures, agents execute serially, and are composed with function composition: \ex{(g (f x))}. In the world of processes, agents execute concurrently and are composed with pipes, in a data-flow network: \ex{f | g}. A language with both of these computational structures, such as scsh, must provide a way to interface them. \note{Normal order} In scsh, we have ``adapters'' for crossing between these paradigms: %(figure~\ref{fig:cross-connect}). %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %\begin{figure}[bhp] %\begin{center} \begin{inset} \def\foo{\rule[-1.5ex]{0in}{4ex}} \begin{tabular}{l|c|c|} \multicolumn{1}{l}{} & \multicolumn{1}{c}{Scheme} & \multicolumn{1}{c}{Unix} \\ \cline{2-3} \foo Scheme & \ex{(g (f x))} & \ex{(<< ,x)} \\ \cline{2-3} \foo Unix & \ex{run/string},\ldots & \ex{f | g} \\ \cline{2-3} \end{tabular} \end{inset} %\end{center} %\caption{Scheme/Unix cross-connectors} %\label{fig:cross-connect} %\end{figure} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% The \ex{run/string} form and its cousins (section~\ref{sec:io-interface}) map process output to procedure input; the \ex{<<} i/o redirection maps procedure output to process input. For example: \begin{code} (run/string (nroff -ms) (<< ,(texinfo->nroff doc-string)))\end{code} By separating the two worlds, and then providing ways for them to cross-connect, scsh can cleanly accommodate the two paradigms within one notational framework. 
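Both adapters can appear in a single expression, sending a Scheme string out through a {\Unix} filter and back; a sketch using \ex{tr(1)}:
\begin{code}
(run/string (tr a-z A-Z) (<< "hello, world"))
;; => "HELLO, WORLD"\end{code}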
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \section{I/O} \label{sec:io} Perhaps the most difficult part of the design of scsh was the integration of {\Scheme} ports and {\Unix} file descriptors. Dealing with {\Unix} file descriptors in a {\Scheme} environment is difficult. In {\Unix}, open files are part of the process state, and are referenced by small integers called {\em file descriptors}. Open file descriptors are the fundamental way i/o redirections are passed to subprocesses, since file descriptors are preserved across \ex{fork()} and \ex{exec()} calls. {\Scheme}, on the other hand, uses ports for specifying i/o sources. Ports are anonymous, garbage-collected Scheme objects, not integers. When a port is collected, it is also closed. Because file descriptors are just integers, it's impossible to garbage collect them---in order to close file descriptor 3, you must prove that the process will never again pass a 3 as a file descriptor to a system call doing I/O, and that it will never \ex{exec()} a program that will refer to file descriptor 3. This is difficult at best. If a {\Scheme} program only used {\Scheme} ports, and never directly used file descriptors, this would not be a problem. But {\Scheme} code must descend to the file-descriptor level in at least two circumstances: \begin{itemize} \item when interfacing to foreign code; \item when interfacing to a subprocess. \end{itemize} This causes problems. Suppose we have a {\Scheme} port constructed on top of file descriptor 2. We intend to fork off a C program that will inherit this file descriptor. If we drop references to the port, the garbage collector may prematurely close file descriptor 2 before we exec the C program. Another difficulty, caused by the tension between the anonymity of ports and the explicit naming of file descriptors, arises when the user explicitly manipulates file descriptors, as is required by {\Unix}.
For example, when a file port is opened in {\Scheme}, the underlying run-time {\Scheme} kernel must open a file and allocate an integer file descriptor. When the user subsequently explicitly manipulates particular file descriptors, perhaps preparatory to executing some {\Unix} subprocess, the port's underlying file descriptor could be silently redirected to some new file. Scsh's {\Unix} i/o interface is intended to fix this and other problems arising from the mismatch between ports and file descriptors. The fundamental principle is that in scsh, most ports are attached to files, not to particular file descriptors. When the user does an i/o redirection (\eg, with \ex{dup2()}) that must allocate a particular file descriptor \var{fd}, there is a chance that \var{fd} has already been inadvertently allocated to a port by a prior operation (\eg, an \ex{open-input-file} call). If so, \var{fd}'s original port will be shifted to some new file descriptor with a \ex{dup(\var{fd})} operation, freeing up \var{fd} for use. The port machinery is allowed to do this as it does not in general reveal which file descriptors are allocated to particular {\Scheme} ports. Not revealing the particular file descriptors allocated to {\Scheme} ports allows the system two important freedoms: \begin{itemize} \item When the user explicitly allocates a particular file descriptor, the run-time system is free to shuffle around the port/file-descriptor associations as required to free up that descriptor. \item When all pointers to an unrevealed file port have been dropped, the run-time system is free to close the underlying file descriptor. If the user doesn't know which file descriptor was associated with the port, then there is no way he could refer to that i/o channel by its file-descriptor name. This allows scsh to close file descriptors during gc or when performing an \ex{exec()}. \end{itemize} Users {\em can\/} explicitly manipulate file descriptors, if so desired. 
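For example, the \ex{port->fdes} procedure reveals a port's underlying file descriptor (a sketch; see the reference manual for the full reveal machinery):
\begin{code}
(define p (open-input-file "/etc/passwd"))
(port->fdes p)  ; Reveal p's file descriptor, e.g. 3.\end{code}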
In this case, the associated ports are marked by the run time as ``revealed,'' and are no longer subject to automatic collection. The machinery for handling this is carefully marked in the documentation and, with some simple invariants in mind, follows the user's intuitions. This facility preserves the transparent close-on-collect property for file ports that are used in straightforward ways, yet allows access to the underlying {\Unix} substrate without interference from the garbage collector. This is critical, since shell programming absolutely requires access to the {\Unix} file descriptors, as their numerical values are a central part of the process interface. Under normal circumstances, all this machinery just works behind the scenes to keep things straightened out. The only time the user has to think about it is when he starts accessing file descriptors from ports, which he should almost never have to do. If a user starts asking what file descriptors have been allocated to what ports, he has to take responsibility for managing this information. Further details on the port mechanisms in scsh are beyond the scope of this note; for more information, see the reference manual \cite{ref-man}. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \section{Lexical issues} \label{sec:lex} Scsh's lexical syntax is not fully {\R4RS}-compliant in two ways: \begin{itemize} \item In scsh, symbol case is preserved by \ex{read} and is significant on symbol comparison. This means \codex{(run (less Readme))} displays the right file. \item ``\ex{-}'' and ``\ex{+}'' are allowed to begin symbols. So the following are legitimate symbols: \codex{-O2 -geometry +Wn} \end{itemize} % Scsh also extends {\R4RS} lexical syntax in the following ways: \begin{itemize} \item ``\ex{|}'' and ``\ex{.}'' are symbol constituents. This allows \ex{|} for the pipe symbol, and \ex{..} for the parent-directory symbol.
(Of course, ``\ex{.}'' alone is not a symbol, but a dotted-pair marker.) \item A symbol may begin with a digit. So the following are legitimate symbols: \codex{9x15 80x36-3+440} \item Strings are allowed to contain the {\sc Ansi} C escape sequences such as \verb|\n| and \verb|\161|. \item \cd{#!} is a comment read-macro similar to \ex{;}. This is important for writing shell scripts. \end{itemize} The lexical details of scsh are perhaps a bit contentious. Extending the symbol syntax remains backwards compatible with existing correct {\R4RS} code. Since flags to {\Unix} programs always begin with a dash, not extending the syntax would have required the user to explicitly quote every flag to a program, as in \codex{(run (cc "-O" "-o" "-c" main.c)).} This is unacceptably obfuscatory, so the change was made to cover these sorts of common {\Unix} flags. More serious was the decision to make symbols read case-sensitively, which introduces a true backwards incompatibility with {\R4RS} {\Scheme}. This was a genuine case of clashing world-views: {\Unix}'s tokens are case-sensitive; {\Scheme}'s are not. It is also unfortunate that the single-dot token, ``\ex{.}'', is both a fundamental {\Unix} file name and a deep, primitive syntactic token in {\Scheme}---it means the following will not parse correctly in scsh: \codex{(run/strings (find . -name *.c -print))} You must instead quote the dot: \codex{(run/strings (find "." -name *.c -print))} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \section{Implementation} \label{sec:impl} Scsh is currently implemented on top of {\scm}, a freely-available {\Scheme} implementation written by Kelsey and Rees \cite{S48}. {\scm} uses a byte-code interpreter for portability, good code density, and medium efficiency. It is {\R4RS}-compliant, and includes a module system designed by Rees. The scsh design is not {\scm}-specific, although the current implementation is necessarily so.
Scsh is intended to be implementable in other {\Scheme} implementations---although such a port may require some work. (I would be very interested to see scsh ported to some of the {\Scheme} systems designed to serve as embedded command languages---\eg, elk, esh, or any of the other C-friendly interpreters.) Scsh scripts currently have a few problems owing to the current {\scm} implementation technology. \begin{itemize} \item Before running even the smallest shell script, the {\scm} vm must first load in a 1.4Mb heap image. This i/o load adds a few seconds to the startup time of even trivial shell scripts. \item Since the entire {\scm} and scsh runtime is in the form of byte-code data in the {\Scheme} heap, the heap is fairly large. As the {\scm} vm uses a non-generational gc, all of this essentially permanent data gets copied back and forth by the collector. \item The large heap size is compounded by {\Unix} forking. If you run a four-stage pipeline, \eg, \begin{code} (run (| (zcat paper.tex.Z) (detex) (spell) (enscript -2r)))\end{code} then, for a brief instant, you could have up to five copies of scsh forked into existence. This would briefly quintuple the virtual memory demand placed by a single scsh heap, which is fairly large to begin with. Since all the code is actually in the data pages of the process, the OS can't trivially share pages between the processes. Even if the OS is clever enough to do copy-on-write page sharing, it may insist on reserving enough backing store on disk for worst-case swapping requirements. If disk space is limited, this may overflow the paging area, causing the \ex{fork()} operations to fail. \end{itemize} % Byte-coded virtual machines are intended to be a technology that provides memory savings through improved code density. It is ironic that the straightforward implementation of such a byte-code interpreter actually has high memory cost through bad interactions with {\Unix} \ex{fork()} and the virtual memory system. 
The situation is not irretrievable, however. A recent release of {\scm} allows the pure portion of a heap image to be statically linked with the text pages of the vm binary. Putting static data---such as all the code for the runtime---into the text pages should drastically shorten start-up time, move a large amount of data out of the heap, improve paging, and greatly shrink the dynamic size. This should all lessen the impact of \ex{fork()} on the virtual memory system. Arranging for the garbage collector to communicate with the virtual memory system with the near-standard \ex{madvise()} system call would further improve the system. Also, breaking the system run-time into separate modules (\eg, bignums, list operations, i/o, string operations, scsh operations, compiler, \etc), each of which can be demand-loaded shared-text by the {\scm} vm (using \ex{mmap()}), will allow for a full-featured system with a surprisingly small memory footprint. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \section{Size} \label{sec:size} Scsh can justifiably be criticised for being a florid design. There are a lot of features---perhaps too many. The optional arguments to many procedures, the implicit backquoting, and the syntax/procedure equivalents are all easily synthesized by the user. For example, \ex{port->strings}, \ex{run/strings*}, \ex{run/sexp*}, and \ex{run/sexps*} are all trivial compositions and curries of other base procedures. The \ex{run/strings} and \ex{run/sexps} forms are easily written as macros, or simply written out by hand. Not only does scsh provide the basic \ex{file-attributes} procedure (\ie, the \ex{stat()} system call), it also provides a host of derived procedures: \ex{file-owner}, \ex{file-mode}, \ex{file-directory?}, and so forth. Still, my feeling is that it is easier and clearer to read \codex{(filter file-directory? (directory-files))} than \begin{code} (filter (\l{fname} (eq? 
'directory (fileinfo:type (file-attributes fname)))) (directory-files))\end{code} A full library can make for clearer user code. One measure of scsh's design is that the source code consists of a large number of small procedures: scsh has 448 top-level definitions, with an average length of 5 lines of code. That is, scsh is constructed by connecting together a lot of small, composable parts, instead of designing one inflexible monolithic structure. These small parts can also be composed and abstracted by the programmer into his own computational structures. Thus the total functionality of scsh is greater than that of more traditional, monolithic systems. %\part{Systems Programming} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \section{Systems programming in {\Scheme}} \label{sec:scm-sysprog} {\Unix} systems programming in {\Scheme} is a much more pleasant experience than {\Unix} systems programming in C. Several features of the language remove a lot of the painful or error-prone problems C systems programmers are accustomed to suffering. The most important of these features are: \begin{itemize} \item exceptions \item automatic storage management \item real strings \item higher-order procedures \item S-expression syntax and backquote \end{itemize} % Many of these features are available in other advanced programming languages, such as Modula-3 or ML. None are available in C. \subsection{Exceptions and robust error handling} In scsh, system calls never return the error codes that make careful systems programming in C so difficult. Errors are signaled by raising exceptions. Exceptions are usually handled by default handlers that either abort the program or invoke a run-time debugger; the programmer can override these when desired by using exception-handler expressions. Not having to return error codes frees up procedures to return useful values, which encourages procedural composition.
It also keeps the programmer from cluttering up his code with (or, as is all too often the case, just forgetting to include) error checks for every system call. In scsh, the programmer can assume that if a system call returns at all, it returns successfully. This greatly simplifies the flow of the code from the programmer's point of view, as well as greatly increasing the robustness of the program. \subsection{Automatic storage management} Further, {\Scheme}'s automatic storage allocation removes the ``result'' parameters from the procedure argument lists. When composite data is returned, it is simply returned in a freshly-allocated data structure. Again, this helps make it possible for procedures to return useful values. For example, the C system call \ex{readlink()} dereferences a symbolic link in the file system. A working definition for the system call is given in figure~\ref{fig:symlink}b. It is complicated by many small bookkeeping details, made necessary by C's weak linguistic facilities. In contrast, scsh's equivalent procedure, \ex{read-symlink}, has a much simpler definition (fig.~\ref{fig:symlink}a). % %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \begin{figure}\fboxsep=1.5em \renewcommand{\subcaption}[1] {\unskip\begin{center}\unskip\em#1\end{center}} \begin{boxedminipage}{\linewidth} \vskip 1.5 ex \ex{(read-symlink fname)}\\[1.5ex] \ex{read-symlink} returns the filename referenced by symbolic link \ex{fname}. An exception is raised if there is an error. \subcaption{(a) {\Scheme} definition of \ex{readlink}} \end{boxedminipage} \vskip 3ex plus 1fil \begin{boxedminipage}{\linewidth}\vskip 1.5ex \ex{readlink(char *path, char *buf, int bufsiz)}\\[1.5ex] \ex{readlink} dereferences the symbolic link \ex{path}. 
If the referenced filename is less than or equal to \ex{bufsiz} characters in length, it is written into the \ex{buf} array, which we fondly hope the programmer has arranged to be at least of size \ex{bufsiz} characters. If the referenced filename is longer than \ex{bufsiz} characters, the system call returns an error code; presumably the programmer should then reallocate a larger buffer and try again. If the system call succeeds, it returns the length of the result filename. When the referenced filename is written into \ex{buf}, it is {\em not\/} nul-terminated; it is the programmer's responsibility to leave space in the buffer for the terminating nul (remembering to subtract one from the actual buffer length when passing it to the system call), and deposit the terminal nul after the system call returns. If there is a real error, the procedure will, in most cases, return an error code. (We will gloss over the error-code mechanism for the sake of brevity.) % I will gloss over the -1/\ex{errno} mechanism involved, with its % dependency upon a global, shared variable, for the sake of % brevity. However, if the length of \ex{buf} does not actually match the argument \ex{bufsiz}, the system call may either% \begin{itemize}% \item succeed anyway, \item dump core, \item overwrite other storage and silently proceed, \item report an error, \item or perform some fifth action. \end{itemize}% It all depends. \subcaption{(b) C definition of \ex{readlink}} \end{boxedminipage} \caption{Two definitions of \protect\ex{readlink}} \label{fig:symlink} \end{figure} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % With the scsh version, there is no possibility that the result buffer will be too small. There is no possibility that the programmer will misrepresent the size of the result buffer with an incorrect \ex{bufsiz} argument. These sorts of issues are completely eliminated by the {\Scheme} programming model. 
Instead of having to worry about seven or eight trivial but potentially fatal issues, and write the necessary 10 or 15 lines of code to correctly handle the operation, the programmer can write a single function call and get on with his task. \subsection{Return values and procedural composition} Exceptions and automatic storage allocation make it easier for procedures to return useful values. This increases the odds that the programmer can use the compact notation of function composition---\ex{f(g(x))}---to connect producers and consumers of data, which is surprisingly difficult in C. %Making it possible for procedures to return useful values is quite %useful, as it encourages programmers to use the compact notation of function %composition---\ex{f(g(x))}---to indicate data flow, which is surprisingly %difficult in C. In C, if we wish to compose two procedure calls, we frequently must write: \begin{code} /* C style: */ g(x,&y); {\ldots}f(y)\ldots\end{code} Procedures that compute composite data structures for a result commonly return them by storing them into a data structure passed by-reference as a parameter. If \ex{g} does this, we cannot nest calls, but must write the code as shown. In fact, the above code is not quite what we want; we forgot to check \ex{g} for an error return. What we really wanted was: \begin{code} /* Worse/better: */ err=g(x,&y); if( err ) \{ <{\it{handle error on {\tt{g}} call}}> \} {\ldots}f(y)\ldots\end{code} The person who writes this code has to remember to check for the error; the person who reads it has to visually link up the data flow by connecting \ex{y}'s def and use points. % puzzle out the data flow that goes from \ex{g}'s output value \ex{y} to % \ex{f}'s input value. % This is the data-flow equivalent of puzzling out the control flow % of a program by tracing its \ex{goto}'s. This is the data-flow equivalent of \ex{goto}'s, with equivalent effects on program clarity. In {\Scheme}, none of this is necessary. 
We simply write \codex{(f (g x)) ; Scheme} Easy to write; easy to read and understand. Figure \ref{fig:stat-file} shows an example of this problem, where the task is determining if a given file is owned by root. \begin{figure}[bthp] \begin{boxedminipage}{\linewidth}\vskip 1.5ex \begin{tightcode} (if (zero? (fileinfo:owner (file-attributes fname))) \ldots)\end{tightcode} \subcaption{\Scheme} \medskip \begin{tightinset} \begin{verbatim} if( stat(fname,&statbuf) ) { perror(progname); exit(-1); } if( statbuf.st_uid == 0 ) ...\end{verbatim} \end{tightinset} \subcaption{C} \caption{Why we program with Scheme.} \label{fig:stat-file} \end{boxedminipage} \end{figure} \subsection{Strings} Having a true string datatype turns out to be surprisingly valuable in making systems programs simpler and more robust. The programmer never has to expend effort to make sure that a string length kept in a variable matches the actual length of the string; never has to expend effort wondering how it will affect his program if a nul byte gets stored into his string. This is a minor feature, but like garbage collection, it eliminates a whole class of common C programming bugs. \subsection{Higher-order procedures} Scheme's first-class procedures are very convenient for systems programming. Scsh uses them to parameterise the action of procedures that create {\Unix} processes. The ability to package up an arbitrary computation as a thunk turns out to be as useful in the domain of {\Unix} processes as it is in the domain of {\Scheme} computation. Being able to pass computations in this way to the procedures that create {\Unix} processes, such as \ex{fork}, \ex{fork/pipe} and \ex{run/port*} is a powerful programming technique. First-class procedures allow us to parameterise port readers over different parsers, with the \codex{(port->list \var{parser} \var{port})} procedure. This is the essential {\Scheme} ability to capture abstraction in a procedure definition. 
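For example, a reader that skips blank lines is all it takes to adapt \ex{port->list} to a slightly different input format (a sketch):
\begin{code}
(define (read-nonblank-line port)
  (let lp ()
    (let ((line (read-line port)))
      (if (and (string? line) (string=? line ""))
          (lp)        ; Skip blank lines.
          line))))    ; A real line, or the eof object.

(port->list read-nonblank-line (open-input-file "motd"))\end{code}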
If the user wants to read a list of objects written in some syntax from an i/o source, he need only write a parser capable of parsing a single object. The \ex{port->list} procedure can work with the user's parser as easily as it works with \ex{read} or \ex{read-line}. \note{On-line streams} First-class procedures also allow iterators such as \ex{for-each} and \ex{filter} to loop over lists of data. For example, to build the list of all my files in \ex{/usr/tmp}, I write: \begin{code} (filter (\l{f} (= (file-owner f) (user-uid))) (glob "/usr/tmp/*"))\end{code} To delete every C file in my directory, I write: \codex{(for-each delete-file (glob "*.c"))} \subsection{S-expression syntax and backquote} \label{sec:sexp} In general, {\Scheme}'s s-expression syntax is much, much simpler to understand and use than most shells' complex syntax, with their embedded pattern matching, variable expansion, alias substitution, and multiple rounds of parsing. This costs scsh's notation some compactness, at the gain of comprehensibility. \subsubsection*{Recursive embeddings and balls of mud} Scsh's ability to cover a high-level/low-level spectrum of expressiveness is a function of its uniform s-expression notational framework. Since scsh's process notation is embedded within Scheme, and Scheme escapes are embedded within the process notation, the programmer can easily switch back and forth as needed, using the simple notation where possible, and escaping to system calls and general {\Scheme} where necessary. This recursive embedding is what gives scsh its broad-spectrum coverage of systems functionality not available to either shells or traditional systems programming languages; it is essentially related to the ``ball of mud'' extensibility of the Lisp and Scheme family of languages. \subsubsection*{Backquote and reliable argument lists} Scsh's use of implicit backquoting in the process notation is a particularly nice feature of the s-expression syntax. 
%Most {\Unix} shells provide the user with a way to compute a list of strings %and use these strings as arguments to a program. Most {\Unix} shells provide the user with a way to take a computed string, split it into pieces, and pass them as arguments to a program. This usually requires the introduction of some sort of \ex{\$IFS} separator variable to control how the string is parsed into separate arguments. This makes things error prone in the cases where a single argument might contain a space or other parser delimiter. Worse than error prone, \ex{\$IFS} rescanning is in fact the source of a famous security hole in {\Unix} \cite{Reeds}. In scsh, data are used to construct argument lists using the implicit backquote feature of process forms, \eg: \begin{tightcode} (run (cc ,file -o ,binary ,@flags)).\end{tightcode} Backquote completely avoids the parsing issue because it deals with pre-parsed data: it constructs expressions from lists, not character strings. When the programmer computes a list of arguments, he has complete confidence that they will be passed to the program exactly as is, without running the risk of being re-parsed by the shell. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \section{Other programming languages} \label{sec:opl} Having seen the design of scsh, we can now compare it to other approaches in some detail. \subsection{Functional languages} The design of scsh could be ported without much difficulty to any language that provides first-class procedures, GC, and exceptions, such as {\CommonLisp} or ML. However, {\Scheme}'s syntactic extensibility (macros) plays an important role in making the shell features convenient to use. In this respect, {\Scheme} and {\CommonLisp} are better choices than ML. Using the \ex{fork/pipe} procedure with a series of closures involves more low-level detail than using scsh's \ex{(| \vari{pf}{\!1} {\ldots} \vari{pf}{\!n})} process form with the closures implied. 
Good notations suppress unnecessary detail. The payoff for using a language such as ML would come not with small shell scripts, but with larger programs, where the power provided by the module system and the static type checking would come into play. \subsection{Shells} Traditional {\Unix} shells, such as sh, have no advantage at all as scripting languages. \subsubsection*{Escaping the least common denominator trap} One of the attractions of scsh is that it is a {\Unix} shell that isn't constrained by the limits of {\Unix}'s uniform ``least common denominator'' representation of data as a text string. Since the standard medium of interchange at the shell level is {\Ascii} byte strings, shell programmers are forced to parse and reparse data, often with tools of limited power. For example, to determine the number of files in a directory, a shell programmer typically uses an expression of the form \ex{ls | wc -l}. This traditional idiom is in fact buggy: {\Unix} files are allowed to contain newlines in their names, which would defeat the simple \ex{wc} parser. Scsh, on the other hand, gives the programmer direct access to the system calls, and employs a much richer set of data structures. Scsh's \ex{directory-files} procedure returns a {\em list\/} of strings, directly taken from the system call. There is no possibility of a parsing error. As another example, consider the problem of determining if a file has its setuid bit set. The shell programmer must grep the text-string output of \ex{ls -l} for the ``s'' character in the right position. Scsh gives the programmer direct access to the \ex{stat()} system call, so that the question can be directly answered. \subsubsection*{Computation granularity and impedance matching} Sh and csh provide minimal computation facilities on the assumption that all real computation will happen in C programs invoked from the shell. This is a granularity assumption. 
As long as the individual units of computation are large, the cost of starting up a separate program is amortised over the actual computation. However, when the user wants to do something simple---\eg, split an X \verb|$DISPLAY| string at the colon, count the number of files in a directory, or lowercase a string---then the overhead of program invocation swamps the trivial computation being performed. One advantage of using a real programming language for the shell language is that we can get a wider-ranging ``impedance match'' of computation to process overhead. Simple computations can be done in the shell; large-grain computations can still be spawned off to other programs if necessary.

\subsection{New-generation scripting languages}
A newer generation of scripting languages has been supplanting sh in {\Unix}. Systems such as perl and tcl provide many of the advantages of scsh for programming shell scripts \cite{perl, tcl}. However, they are still limited by weak linguistic features. Perl and tcl still deal with the world primarily in terms of strings, which is both inefficient and expressively limiting. Scsh makes the full range of Scheme data types available to the programmer: lists, records, floating point numbers, procedures, and so forth. Further, the abstraction mechanisms in perl and tcl are much more limited than Scheme's lexically scoped, first-class procedures and lambda expressions. As convenient as tcl and perl are, they are in no sense full-fledged general systems-programming languages: you would not, for example, want to write an optimizing compiler in tcl. Scsh is Scheme, hence a powerful, full-featured general programming tool.

It is, however, instructive to consider the reasons for the popular success of tcl and perl. I would argue that good design is necessary but insufficient for a successful tool.
Tcl and perl are successful because they are more than just competently designed; critically, they are also available on the Net in turn-key form, with solid documentation. A potential user can simply download and compile them. Scheme, on the other hand, has existed in multiple mutually incompatible implementations that are not widely portable, do not portably address systems issues, and are frequently poorly documented. A contentious and standards-cautious Scheme community has not standardised on a record datatype or exception facility for the language, features critical for systems programming. Scheme solves the hard problems, but punts the necessary, simpler ones. This has made Scheme an impractical systems tool, banishing it to the realm of pedagogical programming languages. Scsh, together with Scheme 48, fills in these lacunae. Its facilities may not be the ultimate solutions, but they are usable technology: clean, consistent, portable and documented.

%\part{Conclusion}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Future work}
\label{sec:future-work}
Several extensions to scsh are being considered or implemented.

\subsection{Command language features}
The primary design effort of scsh was for programming. We are now designing and implementing features to make scsh a better interactive command language, such as job control. A top-level parser for an sh-like notation has been designed; the parser will allow the user to switch back to {\Scheme} notation when desired.

We are also considering a display-oriented interactive shell, to be created by merging the edwin screen editor and scsh. The user will interact with the operating system using single-keystroke commands, defining these commands using scsh, and reverting to {\Scheme} when necessary for complex tasks. Given a reasonable set of GUI widgets, the same trick could be played directly in X.
\subsection{Little languages}
Many {\Unix} tools are built around the idea of ``little languages,'' that is, custom, limited-purpose languages that are designed to fit the area of application. The problem with the little-languages approach is that these languages are usually ugly, idiosyncratic, and limited in expressiveness. The syntactic quirks of these little languages are notorious. The well-known problem with \ex{make}'s syntax distinguishing tab and space has been tripping up programmers for years. Because each little language is different from the next, the user is required to master a handful of languages, unnecessarily increasing the cognitive burden of using these tools.

An alternate approach is to embed the tool's primitive operations inside {\Scheme}, and use the rest of {\Scheme} as the procedural glue to connect the primitives into complex systems. This sort of approach doesn't require the re-invention of all the basic functionality needed by a language---{\Scheme} provides variables, procedures, conditionals, data structures, and so forth. This means there is a greater chance of the designer ``getting it right,'' since he is leveraging off of the enormous effort that went into the design of the {\Scheme} language. It also means the user doesn't have to learn five or six different little languages---just {\Scheme} plus the set of base primitives for each application. Finally, it means the base language is not limited because the designer didn't have the time or resources to implement all the features of a real programming language.

With the scsh {\Unix} library, these ``little language'' {\Unix} tools could easily be redesigned from a {\Scheme} perspective and have their interface and functionality significantly improved. Some examples under consideration are:
\begin{itemize}
\item The awk pattern-matching language can be implemented in scsh by adding a single record-input procedure to the existing code.
\item Expect is a scripting language used for automating the use of interactive programs, such as ftp. With the exception of the tty control syscalls currently under construction, all the pieces needed to design an alternate scsh-based {\Unix} scripting tool already exist in scsh. \item A dependency-directed system for controlling recompilation such as make could easily be implemented on top of scsh. Here, instead of embedding the system inside of {\Scheme}, we embed {\Scheme} inside of the system. The dependency language would use s-expression notation, and the embedded compilation actions would be specified as {\Scheme} expressions, including scsh notation for running {\Unix} programs. \end{itemize} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \section{Conclusion} Scsh is a system with several faces. From one perspective, it is not much more than a system-call library and a few macros. Yet, there is power in this minimalist description---it points up the utility of embedding systems in languages such as {\Scheme}. {\Scheme} is at core what makes scsh a successful design. Which leads us to three final thoughts on the subject of scsh and systems programming in {\Unix}: \begin{itemize} \item A Scheme shell wins because it is broad-spectrum. \item A functional language is an excellent tool for systems programming. \item Hacking Unix isn't so bad, actually, if you don't have to use C. \end{itemize} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \section{Acknowledgements} John Ellis' 1980 {\em SIGPLAN Notices\/} paper \cite{ellis} got me thinking about this entire area. Some of the design for the system calls was modeled after Richard Stallman's emacs \cite{emacs}, Project MAC's MIT {\Scheme} \cite{c-scheme}, and {\CommonLisp} \cite{cltl2}. Tom Duff's {\Unix} shell, rc, was also inspirational; his is the only elegant {\Unix} shell I've seen \cite{rc}. 
Flames with Bennet Yee and Scott Draves drove me to design scsh in the first place; polite discussions with John Ellis and Scott Nettles subsequently improved it. Douglas Orr was my private {\Unix} kernel consultant. Richard Kelsey and Jonathan Rees provided me with twenty-four hour turnaround time on requested modifications to {\scm}, and spent a great deal of time explaining the internals of the implementation to me. Their elegant {\Scheme} implementation was a superb platform for development. The design and the major portion of the implementation of scsh were completed while I was visiting on the faculty of the University of Hong Kong in 1992. It was very pleasant to work in such a congenial atmosphere. Doug Kwan was a cooperative sounding-board during the design phase. Hsu Suchu has patiently waited quite a while for this document to be finished. Members of the MIT LCS and AI Lab community encouraged me to polish the research prototype version of the shell into something releasable to the net. Henry Minsky and Ian Horswill did a lot of the encouraging; my students Dave Albertz and Brian Carlstrom did a lot of the polishing. Finally, the unix-haters list helped a great deal to maintain my perspective. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \cleardoublepage \begin{thebibliography}{MIT Scheme} \addcontentsline{toc}{section}{References} \sloppy \def\\{\newblock} \renewcommand{\=}{\discretionary{/}{}{/}} \renewcommand{\.}{\discretionary{.}{}{.}} \newcommand{\ob}{\linebreak[0]} \itemsep= 2ex plus 1fil \let\Bibitem=\bibitem \Bibitem[CLtL2]{cltl2} Guy L.~Steele Jr. \\ {\em Common Lisp: The Language.} \\ Digital Press, Maynard, Mass., second edition 1990. \Bibitem[Ellis]{ellis} John R.~Ellis. \\ A {\sc Lisp} shell. \\ {\em SIGPLAN Notices}, 15(5):24--34, May 1980. 
\Bibitem[emacs]{emacs} Bil Lewis, Dan LaLiberte, Richard M.~Stallman, {\em et al.} \\ {\em The GNU Emacs Lisp Reference Manual, vol.~2.} \\ Free Software Foundation, Cambridge, Mass., edition 2.1 September 1993. (Also available from many ftp sites.) \Bibitem[fsh]{fsh} Chris S.~McDonald. \\ {\em fsh}---A functional {\Unix} command interpreter. \\ {\em Software---Practice and Experience}, 17(10):685--700, October 1987. \Bibitem[MIT Scheme]{c-scheme} Chris Hanson. \\ {\em MIT Scheme Reference Manual.} \\ MIT Artificial Intelligence Laboratory Technical Report 1281, January 1991. (Also URL {\tt http://martigny\.ai\.mit\.edu\=emacs-html\.local\=scheme\_toc.html}) \Bibitem[Nelson]{Nelson} Greg Nelson, ed. \\ {\em Systems Programming with Modula-3.} \\ Prentice Hall, Englewood Cliffs, New Jersey, 1991. \Bibitem[perl]{perl} Larry Wall and Randal Schwartz. \\ {\em Programming Perl.} \\ O'Reilly \& Associates. \Bibitem[rc]{rc} Tom Duff. \\ Rc---A shell for Plan 9 and {\Unix} systems. \\ In {\em Proceedings of the Summer 1990 UKUUG Conference}, pages 21--33, July 1990, London. (A revised version is reprinted in ``Plan 9: The early papers,'' Computing Science Technical Report 158, AT\&T Bell Laboratories. Also available in Postscript form as URL \ex{ftp:{\ob}/\=research.att.com/dist/plan9doc/7}.) \Bibitem[Reeds]{Reeds} J.~Reeds. \\ \ex{/bin/sh}: the biggest UNIX security loophole. \\ 11217-840302-04TM, AT\&T Bell Laboratories (1988). \Bibitem[refman]{ref-man} Olin Shivers. \\ Scsh reference manual. \\ In preparation. \Bibitem[S48]{S48} Richard A.~Kelsey and Jonathan A.~Rees. \\ A tractable Scheme implementation. \\ To appear, {\em Lisp and Symbolic Computation}, Kluwer Academic Publishers, The Netherlands. (Also URL {\tt ftp:/\=altdorf\.ai\.mit\.edu\=pub\=jar\=lsc.ps}) \Bibitem[tcl]{tcl} John~K.~Ousterhout. \\ Tcl: An embeddable command language. \\ In {\em The Proceedings of the 1990 Winter USENIX Conference}, pp.~133--146. 
(Also URL {\tt ftp:{\ob}/\=ftp\.cs\.berkeley\.edu\=ucb\=tcl\=tclUsenix90.ps})

\vfill
\end{thebibliography}

\appendix
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\cleardoublepage
\section*{Notes}
\addcontentsline{toc}{section}{Notes}
\newcommand{\notetext}[1]{\subsection*{\{Note #1\}}}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\notetext{Agenda}
In fact, I have an additional hidden agenda. I do believe that computational agents should be expressed as procedures or procedure libraries, not as programs. Scsh is intended to be an incremental step in this direction, one that is integrated with {\Unix}. Writing a program as a Scheme 48 module should allow the user to make it available both as a subroutine library, callable from other Scheme 48 programs or from the interactive read-eval-print loop, and, by adding a small top-level, as a standalone {\Unix} program. So {\Unix} programs written this way will also be usable as linkable subroutine libraries---giving the programmer module interfaces superior to {\Unix}'s ``least common denominator'' of {\sc Ascii} byte streams sent over pipes.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\notetext{No port sync}
\begin{sloppypar}
In scsh, {\Unix}' stdio file descriptors and {\Scheme}'s standard i/o ports (\ie, the values of \ex{(current-input-port)}, \ex{(current-output-port)} and \ex{(error-output-port)}) are not necessarily synchronised. This is impossible to do in general, since some {\Scheme} ports are not representable as {\Unix} file descriptors. For example, many Scheme implementations provide ``string ports,'' that is, ports that collect characters sent to them into memory buffers. The accumulated string can later be retrieved from the port as a string.
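For instance, with the string-port constructors provided by {\scm}---the names below are taken from its library, and should be checked against the current release---one can write:
\begin{code}
(let ((port (make-string-output-port)))
  (write-string "hello, " port)
  (write-string "world" port)
  (string-output-port-output port))   ; Returns "hello, world".
\end{code}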
If a user were to bind \ex{(current-output-port)} to such a port, it would be impossible to associate file descriptor 1 with this port, as it cannot be represented in {\Unix}. So, if the user subsequently forked off some other program as a subprocess, that program would of course not see the Scheme string port as its standard output.
\end{sloppypar}

To keep stdio synced with the values of {\Scheme}'s current i/o ports, use the special redirection \ex{stdports}. This causes 0, 1, 2 to be redirected from the current {\Scheme} standard ports. It is equivalent to the three redirections:
\begin{code}
(= 0 ,(current-input-port))
(= 1 ,(current-output-port))
(= 2 ,(error-output-port))\end{code}
%
The redirections are done in the indicated order. This will cause an error if one of the current i/o ports isn't a {\Unix} port (\eg, if one is a string port). This {\Scheme}/{\Unix} i/o synchronisation can also be had in {\Scheme} code (as opposed to a redirection spec) with the \ex{(stdports->stdio)} procedure.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\notetext{Normal order}
Having to explicitly shift between processes and functions in scsh is in part due to the arbitrary-size nature of a {\Unix} stream. A better, more integrated approach might be to use a lazy, normal-order language as the glue or shell language. Then files and process output streams could be regarded as first-class values, and treated like any other sequence in the language. However, I suspect that the realities of {\Unix}, such as side-effects, will interfere with this simple model.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\notetext{On-line streams}
The \ex{(port->list \var{reader} \var{port})} procedure is a batch processor: it reads the port all the way to eof before returning a value.
As an alternative, we might write a procedure to take a port and a reader, and return a lazily-evaluated list of values, so that I/O can be interleaved with element processing. A nice example of the power of Scheme's abstraction facilities is the ease with which we can write this procedure: it can be done with five lines of code.
\begin{code}
;;; A <lazy-list> is either
;;;     (delay '()) or
;;;     (delay (cons data <lazy-list>)).

(define (port->lazy-list reader port)
  (let collector ()
    (delay (let ((x (reader port)))
             (if (eof-object? x) '()
                 (cons x (collector)))))))\end{code}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\notetext{Tempfile example}
For a more detailed example showing the advantages of higher-order procedures in {\Unix} systems programming, consider the task of making random temporary objects (files, directories, fifos, \etc) in the file system. Most {\Unix}'s simply provide a function such as \ex{tmpnam()} that creates a file with an unusual name, and hope for the best. Other {\Unix}'s provide functions that avoid the race condition between determining the temporary file's name and creating it, but they do not provide equivalent features for non-file objects, such as directories or symbolic links.
\pagebreak

This functionality is easily generalised with the procedure
\codex{(temp-file-iterate \var{maker} \var{[template]})}
This procedure can be used to perform atomic transactions on the file system involving filenames, \eg:
\begin{itemize}
\item Linking a file to a fresh backup temporary name.
\item Creating and opening an unused, secure temporary file.
\item Creating an unused temporary directory.%
\end{itemize}
%
The string \var{template} is a \ex{format} control string used to generate a series of trial filenames; it defaults to
%
\begin{tightinset}\verb|"/usr/tmp/<pid>.~a"|\end{tightinset}\ignorespaces
%
where \ex{<pid>} is the current process' process id.
Filenames are generated by calling \ex{format} to instantiate the template's \verb|~a| field with a varying string. (It is not necessary for the process' pid to be a part of the filename for the uniqueness guarantees to hold. The pid component of the default prefix simply serves to scatter the name searches into sparse regions, so that collisions are less likely to occur. This speeds things up, but does not affect correctness.) The \ex{maker} procedure is serially called on each filename generated. It must return at least one value; it may return multiple values. If the first return value is \ex{\#f} or if \ex{maker} raises the ``file already exists'' syscall error exception, \ex{temp-file-iterate} will loop, generating a new filename and calling \ex{maker} again. If the first return value is true, the loop is terminated, returning whatever \ex{maker} returned. After a number of unsuccessful trials, \ex{temp-file-iterate} may give up and signal an error. To rename a file to a temporary name, we write: \begin{code} (temp-file-iterate (\l{backup-name} (create-hard-link old-file backup-name) backup-name) ".#temp.~a") ; Keep link in cwd. (delete-file old-file)\end{code} Note the guarantee: if \ex{temp-file-iterate} returns successfully, then the hard link was definitely created, so we can safely delete the old link with the following \ex{delete-file}. To create a unique temporary directory, we write: % \codex{(temp-file-iterate (\l{dir} (create-directory dir) dir))} % Similar operations can be used to generate unique symlinks and fifos, or to return values other than the new filename (\eg, an open file descriptor or port). 
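For example, the secure-temporary-file transaction from the list above might be written as below. The \ex{open/create} and \ex{open/exclusive} flag names and the \ex{bitwise-or} combiner are drawn from scsh's syscall library, but treat the exact spellings as assumptions to be checked against the reference manual:
\begin{code}
(temp-file-iterate
    (\l{fname}
       (open-output-file fname
                         (bitwise-or open/create open/exclusive))))\end{code}
If another process creates the file first, the open raises the ``file already exists'' error, \ex{temp-file-iterate} retries with a fresh name, and on success the value returned is an open port on the new, unshared file.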
\end{document} % LocalWords: Mips grep sed awk ls wc email flamefest philes SRC's dup int pid % LocalWords: foobar fds perror waitpid execlp kb rc's epf pf fdes fv % LocalWords: stdports dup'd subforms backquoted usr backquoting ref tmp % LocalWords: buf stdin stderr stdout sync prog arg Readme xrdb xyzzy SunOS % LocalWords: mbox txt cc preprocess Errlog exe outfile errfile PLAINTEXT des % LocalWords: plaintext DIR perl cwd dir dsw ll conns xhost lpr ksh namespaces % LocalWords: ms texinfo doc fd RS Wn Ansi esh zcat tex detex enscript madvise % LocalWords: mmap stat fname eq fileinfo backquote readlink symlink fil nul % LocalWords: bufsiz def bthp statbuf progname uid Tempfile IFS pre Ascii bp % LocalWords: reparse setuid