some cleanup

Jeff Bezanson 2012-02-17 17:38:10 -05:00
parent 21dd640454
commit ed2b11a8ac
22 changed files with 5 additions and 697 deletions

femtolisp/.gitignore (new file)

@@ -0,0 +1,4 @@
/*.o
/*.do
/*.a
/flisp

@@ -22,7 +22,7 @@ SHIPFLAGS = -O2 -DNDEBUG $(FLAGS)
 default: release test
 test:
-	./flisp unittest.lsp
+	cd tests && ../flisp unittest.lsp
 %.o: %.c
 	$(CC) $(SHIPFLAGS) -c $< -o $@

@@ -1,62 +0,0 @@
1. Syntax
symbols
numbers
conses and vectors
comments
special prefix tokens: ' ` , ,@ ,.
other read macros: #. #' #\ #< #n= #n# #: #ctor
builtins
2. Data and execution models
3. Primitive functions
eq atom not set prog1 progn
symbolp numberp builtinp consp vectorp boundp
+ - * / <
apply eval
4. Special forms
quote if lambda macro while label cond and or
5. Data structures
cons car cdr rplaca rplacd list
alloc vector aref aset length
6. Other functions
read, print, princ, load, exit
equal, compare
gensym
7. Exceptions
trycatch raise
8. Cvalues
introduction
type representations
constructors
access
memory management concerns
ccall
If deliberate 50% heap utilization seems wasteful, consider:
- malloc has per-object overhead. For small allocations you might use
much more space than you think.
- any non-moving memory manager (whether malloc or a collector) can
waste arbitrary amounts of memory through fragmentation.
With a copying collector, you agree to give up 50% of your memory
up front, in exchange for significant benefits:
- really fast allocation
- heap compaction, improving locality and possibly speeding up computation
- collector performance O(1) in number of dead objects, essential for
maximal performance on generational workloads
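
To make the "really fast allocation" point concrete, here is a minimal C
sketch of bump allocation in one half of a two-space heap. It is not
femtoLisp's allocator: the names heap_words, from_space, next_free and
bump_alloc, and the 1 KiB heap size, are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical two-space layout: two equal halves, only one in use. */
enum { HEAP_BYTES = 1024, HALF_BYTES = HEAP_BYTES / 2 };

/* Backing store declared as 64-bit words so it is 8-byte aligned. */
static uint64_t heap_words[HEAP_BYTES / 8];
static unsigned char *from_space = (unsigned char *)heap_words;
static size_t next_free = 0;   /* offset of the first unused byte */

/* Allocate n bytes (rounded up to a multiple of 8) by bumping an offset.
   Returns NULL when the active half is full; a copying collector would
   run at that point and move live objects into the other half. */
static void *bump_alloc(size_t n) {
    assert(next_free <= HALF_BYTES);
    size_t rounded = (n + 7) & ~(size_t)7;
    if (rounded > HALF_BYTES - next_free)
        return NULL;
    void *p = from_space + next_free;
    next_free += rounded;
    return p;
}
```

Allocation here is one comparison and one addition, with no free lists
and no per-object header. The price is the idle half of the heap.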

@@ -1,428 +0,0 @@
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8" >
<title>femtoLisp</title>
</head>
<body bgcolor="#fcfcfc"> <!-"#fcfcc8">
<img src="flbanner.jpg">
<table border=0 width="100%" cellpadding=0 cellspacing=0>
<tr><td bgcolor="#2d3f5f" height=4></table>
<h1>0. Argument</h1>
This Lisp has the following characteristics and goals:
<ul>
<li>Lisp-1 evaluation rule (ala Scheme)
<li>Self-evaluating lambda (i.e. <tt>'(lambda (x) x)</tt> is callable)
<li>Full Common Lisp-style macros
<li>Dotted lambda lists for rest arguments (ala Scheme)
<li>Symbols have one binding
<li>Builtin functions are constants
<li><em>All</em> values are printable and readable
<li>Case-sensitive symbol names
<li>Only the minimal core built-in (i.e. written in C), but
enough to provide a practical level of performance
<li>Very short (but not necessarily simple...) implementation
<li>Generally use Common Lisp operator names
<li>Nothing excessively weird or fancy
</ul>
<h1>1. Syntax</h1>
<h2>1.1. Symbols</h2>
Any character string can be a symbol name, including the empty string. In
general, text between whitespace is read as a symbol except in the following
cases:
<ul>
<li>The text begins with <tt>#</tt>
<li>The text consists of a single period <tt>.</tt>
<li>The text contains one of the special characters <tt>()[]';`,\|</tt>
<li>The text is a valid number
<li>The text is empty
</ul>
In these cases the symbol can be written by surrounding it with <tt>| |</tt>
characters, or by escaping individual characters within the symbol using
backslash <tt>\</tt>. Note that <tt>|</tt> and <tt>\</tt> must always be
preceded by a backslash when writing a symbol name.
<h2>1.2. Numbers</h2>
A number consists of an optional + or - sign followed by one of the following
sequences:
<ul>
<li><tt>NNN...</tt> where N is a decimal digit
<li><tt>0xNNN...</tt> where N is a hexadecimal digit
<li><tt>0NNN...</tt> where N is an octal digit
</ul>
femtoLisp provides 30-bit integers, and it is an error to write a constant
less than -2<sup>29</sup> or greater than 2<sup>29</sup>-1.
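<p>
The 30-bit range follows from keeping a small type tag in the same machine
word as the number. The C sketch below assumes a 32-bit word with a 2-bit
tag in the low bits; that layout, the tag value 0, and the names
<tt>fits_fixnum</tt>, <tt>tag_fixnum</tt> and <tt>untag_fixnum</tt> are
assumptions for illustration, not femtoLisp's actual definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 32-bit word, low 2 bits are a type tag, 30 bits of payload. */
enum { FIXNUM_BITS = 30, TAG_BITS = 2 };

#define FIXNUM_MAX ((int32_t)((1 << (FIXNUM_BITS - 1)) - 1))  /*  2^29 - 1 */
#define FIXNUM_MIN ((int32_t)(-(1 << (FIXNUM_BITS - 1))))     /* -2^29     */

/* True if n can be written as a 30-bit integer constant. */
static bool fits_fixnum(int64_t n) {
    return n >= FIXNUM_MIN && n <= FIXNUM_MAX;
}

/* Pack an integer into a tagged word (tag 0 stands for "integer" here). */
static uint32_t tag_fixnum(int32_t n) {
    assert(fits_fixnum(n));
    return (uint32_t)n << TAG_BITS;
}

/* Recover the signed value. The word is a multiple of 4, so dividing is
   exact and avoids right-shifting a negative number. */
static int32_t untag_fixnum(uint32_t w) {
    return (int32_t)w / (1 << TAG_BITS);
}
```

With this layout <tt>fits_fixnum(536870911)</tt> holds and
<tt>fits_fixnum(536870912)</tt> does not, matching the limits stated above.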
<h2>1.3. Conses and vectors</h2>
The text <tt>(a b c)</tt> parses to the structure
<tt>(cons a (cons b (cons c nil)))</tt> where a, b, and c are arbitrary
expressions.
<p>
The text <tt>(a . b)</tt> parses to the structure
<tt>(cons a b)</tt> where a and b are arbitrary expressions.
<p>
The text <tt>()</tt> reads as the symbol <tt>nil</tt>.
<p>
The text <tt>[a b c]</tt> parses to a vector of expressions a, b, and c.
The syntax <tt>#(a b c)</tt> has the same meaning.
<h2>1.4. Comments</h2>
Text between a semicolon <tt>;</tt> and the next end-of-line is skipped.
Text between <tt>#|</tt> and <tt>|#</tt> is also skipped.
<h2>1.5. Prefix tokens</h2>
There are five special prefix tokens which parse as follows:<p>
<tt>'a</tt> is equivalent to <tt>(quote a)</tt>.<br>
<tt>`a</tt> is equivalent to <tt>(backquote a)</tt>.<br>
<tt>,a</tt> is equivalent to <tt>(*comma* a)</tt>.<br>
<tt>,@a</tt> is equivalent to <tt>(*comma-at* a)</tt>.<br>
<tt>,.a</tt> is equivalent to <tt>(*comma-dot* a)</tt>.
<h2>1.6. Other read macros</h2>
femtoLisp provides a few "read macros" that let you accomplish interesting
tricks for textually representing data structures.
<table border=1>
<tr>
<td>sequence<td>meaning
<tr>
<td><tt>#.e</tt><td>evaluate expression <tt>e</tt> and behave as if e's
value had been written in place of e
<tr>
<td><tt>#\c</tt><td><tt>c</tt> is a character; read as its Unicode value
<tr>
<td><tt>#n=e</tt><td>read <tt>e</tt> and label it as <tt>n</tt>, where n
is a decimal number
<tr>
<td><tt>#n#</tt><td>read as the identically-same value previously labeled
<tt>n</tt>
<tr>
<td><tt>#:gNNN or #:NNN</tt><td>read a gensym. NNN is a hexadecimal
constant. Future occurrences of the same <tt>#:</tt> sequence will read to
the identically-same gensym
<tr>
<td><tt>#sym(...)</tt><td>reads to the result of evaluating
<tt>(apply sym '(...))</tt>
<tr>
<td><tt>#&lt;</tt><td>triggers an error
<tr>
<td><tt>#'</tt><td>ignored; provided for compatibility
<tr>
<td><tt>#!</tt><td>single-line comment, for script execution support
<tr>
<td><tt>"str"</tt><td>UTF-8 character string; may contain newlines.
<tt>\</tt> is the escape character. All C escape sequences are supported, plus
<tt>\u</tt> and <tt>\U</tt> for unicode values.
</table>
When a read macro involves persistent state (e.g. label assignments), that
state is valid only within the closest enclosing call to <tt>read</tt>.
<h2>1.7. Builtins</h2>
Builtin functions are represented as opaque constants. Every builtin
function is the value of some constant symbol, so the builtin <tt>eq</tt>,
for example, can be written as <tt>#.eq</tt> ("the value of symbol eq").
Note that <tt>eq</tt> itself is still an ordinary symbol, except that its
value cannot be changed.
<p>
<table border=0 width="100%" cellpadding=0 cellspacing=0>
<tr><td bgcolor="#2d3f5f" height=4></table>
<h1>2. Data and execution models</h1>
<table border=0 width="100%" cellpadding=0 cellspacing=0>
<tr><td bgcolor="#2d3f5f" height=4></table>
<h1>3. Primitive functions</h1>
eq atom not set prog1 progn
symbolp numberp builtinp consp vectorp boundp
+ - * / <
apply eval
<table border=0 width="100%" cellpadding=0 cellspacing=0>
<tr><td bgcolor="#2d3f5f" height=4></table>
<h1>4. Special forms</h1>
quote if lambda macro while label cond and or
<table border=0 width="100%" cellpadding=0 cellspacing=0>
<tr><td bgcolor="#2d3f5f" height=4></table>
<h1>5. Data structures</h1>
cons car cdr rplaca rplacd list
alloc vector aref aset length
<table border=0 width="100%" cellpadding=0 cellspacing=0>
<tr><td bgcolor="#2d3f5f" height=4></table>
<h1>6. Other functions</h1>
read print princ load exit
equal compare
gensym
<table border=0 width="100%" cellpadding=0 cellspacing=0>
<tr><td bgcolor="#2d3f5f" height=4></table>
<h1>7. Exceptions</h1>
trycatch raise
<table border=0 width="100%" cellpadding=0 cellspacing=0>
<tr><td bgcolor="#2d3f5f" height=4></table>
<h1>8. Cvalues</h1>
<h2>8.1. Introduction</h2>
femtoLisp allows you to use the full range of C data types on
dynamically-typed Lisp values. The motivation for this feature is that
useful
interpreters must provide a large library of routines in C for dealing
with "real world" data like text and packed numeric arrays, and I would
rather not write yet another such library. Instead, all the
required data representations and primitives are provided so that such
features could be implemented in, or at least described in, Lisp.
<p>
The cvalues capability makes it easier to call C from Lisp by providing
ways to construct whatever arguments your C routines might require, and ways
to decipher whatever values your C routines might return. Here are some
things you can do with cvalues:
<ul>
<li>Call native C functions from Lisp without wrappers
<li>Wrap C functions in pure Lisp, automatically inheriting some degree
of type safety
<li>Use Lisp functions as callbacks from C code
<li>Use the Lisp garbage collector to reclaim malloc'd storage
<li>Annotate C pointers with size information for bounds checking or
serialization
<li>Attach symbolic type information to a C data structure, allowing it to
inherit Lisp services such as printing a readable representation
<li>Add datatypes like strings to Lisp
<li>Use more efficient representations for your Lisp programs' data
</ul>
<p>
femtoLisp's "cvalues" is inspired in part by Python's "ctypes" package.
Lisp doesn't really have first-class types the way Python does, but it does
have values, hence my version is called "cvalues".
<h2>8.2. Type representations</h2>
The core of cvalues is a language for describing C data types as
symbolic expressions:
<ul>
<li>Primitive types are symbols <tt>int8, uint8, int16, uint16, int32, uint32,
int64, uint64, char, wchar, long, ulong, float, double, void</tt>
<li>Arrays <tt>(array TYPE SIZE)</tt>, where TYPE is another C type and
SIZE is either a Lisp number or a C ulong. SIZE can be omitted to
represent incomplete C array types like "int a[]". As in C, the size may
only be omitted for the top level of a nested array; all array
<em>element</em> types
must have explicit sizes. Examples:
<ul>
<tt>int a[][2][3]</tt> is <tt>(array (array (array int32 3) 2))</tt><br>
<tt>int a[4][]</tt> would be <tt>(array (array int32) 4)</tt>, but this is
invalid.
</ul>
<li>Pointer <tt>(pointer TYPE)</tt>
<li>Struct <tt>(struct ((NAME TYPE) (NAME TYPE) ...))</tt>
<li>Union <tt>(union ((NAME TYPE) (NAME TYPE) ...))</tt>
<li>Enum <tt>(enum (NAME NAME ...))</tt>
<li>Function <tt>(c-function RET-TYPE (ARG-TYPE ARG-TYPE ...))</tt>
</ul>
A cvalue can be constructed using <tt>(c-value TYPE arg)</tt>, where
<tt>arg</tt> is some Lisp value. The system will try to convert the Lisp
value to the specified type. In many cases this will work better if some
components of the provided Lisp value are themselves cvalues.
<p>
Note the function type is called "c-function" to avoid confusion, since
functions are such a prevalent concept in Lisp.
<p>
The function <tt>sizeof</tt> returns the size (in bytes) of a cvalue or a
c type. Every cvalue has a size, but incomplete types will cause
<tt>sizeof</tt> to raise an error. The function <tt>typeof</tt> returns
the type of a cvalue.
<p>
You are probably wondering how 32- and 64-bit integers are constructed from
femtoLisp's 30-bit integers. The answer is that larger integers are
constructed from multiple Lisp numbers 16 bits at a time, in big-endian
fashion. In fact, the larger numeric types are the only cvalues
types whose constructors accept multiple arguments. Examples:
<ul>
<pre>
(c-value 'int32 0xdead 0xbeef) ; make 0xdeadbeef
(c-value 'uint64 0x1001 0x8000 0xffff) ; make 0x000010018000ffff
</pre>
</ul>
As you can see, missing zeros are padded in from the left.
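<p>
The assembly rule can be sketched in C as follows. This only illustrates
the arithmetic (16-bit chunks, most significant first, zero-filled on the
left); the function name <tt>combine16</tt> is hypothetical and this is not
the constructor's actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Combine up to four 16-bit chunks, most significant chunk first, into one
   64-bit value. Passing fewer than four chunks leaves the high bits zero. */
static uint64_t combine16(const uint16_t *chunks, size_t n) {
    assert(n <= 4);
    uint64_t v = 0;
    for (size_t i = 0; i < n; i++)
        v = (v << 16) | chunks[i];   /* shift earlier chunks up, append next */
    return v;
}
```

Given the chunks <tt>0xdead 0xbeef</tt> this yields <tt>0xdeadbeef</tt>, and
given <tt>0x1001 0x8000 0xffff</tt> it yields <tt>0x000010018000ffff</tt>,
as in the two examples above.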
<h2>8.3. Constructors</h2>
For convenience, a specialized constructor is provided for each
class of C type (primitives, pointer, array, struct, union, enum,
and c-function).
For example:
<ul>
<pre>
(uint32 0xcafe 0xd00d)
(int32 -4)
(char #\w)
(array 'int8 [1 1 2 3 5 8])
</pre>
</ul>
These forms can be slightly less efficient than <tt>(c-value ...)</tt>
because in many cases they will allocate a new type for the new value.
For example, the fourth expression must create the type
<tt>(array int8 6)</tt>.
<p>
Notice that calls to these constructors strongly resemble
the types of the values they create. This relationship can be expressed
formally as follows:
<pre>
(define (c-allocate type)
(if (atom type)
(apply (eval type) ())
(apply (eval (car type)) (cdr type))))
</pre>
This function produces an instance of the given type by
invoking the appropriate constructor. Primitive types (whose representations
are symbols) can be constructed with zero arguments. For other types,
the only required arguments are those present in the type representation.
Any arguments after those are initializers. Using
<tt>(cdr type)</tt> as the argument list provides only required arguments,
so the value you get will not be initialized.
<p>
The builtin <tt>c-value</tt> function is similar to this one, except that it
lets you pass initializers.
<p>
Cvalue constructors are generally permissive; they do the best they
can with whatever you pass in. For example:
<ul>
<pre>
(c-value '(array int8 1)) ; ok, full type provided
(c-value '(array int8)) ; error, no size information
(c-value '(array int8) [0 1]) ; ok, size implied by initializer
</pre>
</ul>
<p>
ccopy, c2lisp
<h2>8.4. Pointers, arrays, and strings</h2>
Pointer types are provided for completeness and C interoperability, but
they should not generally be used from Lisp. femtoLisp doesn't know
anything about a pointer except the raw address and the (alleged) type of the
value it points to. Arrays are much more useful. They behave like references
as in C, but femtoLisp tracks their sizes and performs bounds checking.
<p>
Arrays are used to allocate strings. All strings share
the incomplete array type <tt>(array char)</tt>:
<pre>
> (c-value '(array char) [#\h #\e #\l #\l #\o])
"hello"
> (sizeof that)
5
</pre>
<tt>sizeof</tt> reveals that the size is known even though it is not
reflected in the type (as is always the case with incomplete array types).
<p>
Since femtoLisp tracks the sizes of all values, there is no need for NUL
terminators. Strings are just arrays of bytes, and may contain zero bytes
throughout. However, C functions require zero-terminated strings. To
solve this problem, femtoLisp allocates magic strings that actually have
space for one more byte than they appear to. The hidden extra byte is
always zero. This guarantees that a C function operating on the string
will never overrun its allocated space.
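<p>
A minimal C sketch of the hidden-terminator idea, assuming a simple
length-plus-buffer layout. The struct <tt>sized_str</tt> and the function
<tt>make_sized_str</tt> are hypothetical names, not part of femtoLisp.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sized string: `len` is the visible size; `data` holds
   len + 1 bytes, and the last one is always zero. */
typedef struct {
    size_t len;
    char  *data;
} sized_str;

static sized_str make_sized_str(const char *bytes, size_t len) {
    sized_str s;
    s.len = len;
    s.data = malloc(len + 1);     /* one hidden extra byte */
    assert(s.data != NULL);
    memcpy(s.data, bytes, len);   /* contents may include zero bytes */
    s.data[len] = '\0';           /* the hidden terminator */
    return s;
}
```

A C function such as <tt>strlen</tt> can then be handed <tt>data</tt>
safely: it stops at the hidden byte at the latest, although it will stop
earlier if the contents include a zero byte.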
<p>
Such magic strings are produced by double-quoted string literals, and by
any explicit string-constructing function (such as <tt>string</tt>).
<p>
Unfortunately you still need to be careful, because it is possible to
allocate a non-magic character array with no terminator. The "hello"
string above is an example of this, since it was constructed from an
explicit vector of characters.
Such an array would cause problems if passed to a function expecting a
C string.
<p>
deref
<h2>8.5. Access</h2>
cref,cset,byteref,byteset,ccopy
<h2>8.6. Memory management concerns</h2>
autorelease
<h2>8.7. Guest functions</h2>
Functions written in C but designed to operate on Lisp values are
known here as "guest functions". Although they are foreign, they live in
Lisp's house and so live by its rules. Guest functions are what you
use to write interpreter extensions, for example to implement a function
like <tt>assoc</tt> in C for performance.
<p>
Guest functions must have a particular signature:
<pre>
value_t func(value_t *args, uint32_t nargs);
</pre>
Guest functions must also be aware of the femtoLisp API and garbage
collector.
<h2>8.8. Native functions</h2>
</body>
</html>

Binary file not shown.

Binary file not shown.

Binary file not shown.


@@ -1,206 +0,0 @@
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8" >
<title>femtoLisp</title>
</head>
<body>
<h1>femtoLisp</h1>
<hr>
femtoLisp is an elegant Lisp implementation. Its goal is to be a
reasonably efficient and capable interpreter with the shortest, simplest
code possible. As its name implies, it is small (10<sup>-15</sup>).
Right now it is just 1000 lines of C (give or take). It would make a great
teaching example, or a useful system anywhere a very small Lisp is wanted.
It is also a useful basis for developing other interpreters or related
languages.
<h2>The language implemented</h2>
femtoLisp tries to be a generic, simple Lisp dialect, influenced by McCarthy's
original.
<ul>
<li>Types: cons, symbol, 30-bit integer, builtin function
<li>Self-evaluating lambda, macro, and label forms
<li>Full Common Lisp-style macros
<li>Case-sensitive symbol names
<li>Scheme-style evaluation rule where any expression may appear in head
position as long as it evaluates to a callable
<li>Scheme-style formal argument lists (dotted lists for varargs)
<li>Transparent closure representation <tt>(lambda args body . env)</tt>
<li>A lambda body may contain only one form. Use explicit <tt>progn</tt> for
multiple forms. Included macros, however, allow <tt>defun</tt>,
<tt>let</tt>, etc. to accept multiple body forms.
<li>Builtin function names are constants and cannot be redefined.
<li>Symbols have one binding, as in Scheme.
</ul>
<b>Builtin special forms:</b><br>
<tt>quote, cond, if, and, or, lambda, macro, label, while, progn, prog1</tt>
<p>
<b>Builtin functions:</b><br>
<tt>eq, atom, not, symbolp, numberp, boundp, cons, car, cdr,
read, eval, print, load, set,
+, -, *, /, &lt;, apply, rplaca, rplacd</tt>
<p>
<b>Included library functions and macros:</b><br>
<tt>
setq, setf, defmacro, defun, define, let, let*, labels, dotimes,
macroexpand-1, macroexpand, backquote,
null, consp, builtinp, self-evaluating-p, listp, eql, equal, every, any,
when, unless,
=, !=, &gt;, &lt;=, &gt;=, compare, mod, abs, identity,
list, list*, length, last, nthcdr, lastcdr, list-ref, reverse, nreverse,
assoc, member, append, nconc, copy-list, copy-tree, revappend, nreconc,
mapcar, filter, reduce, map-int,
symbol-plist, set-symbol-plist, put, get
</tt>
<p>
<a href="system.lsp">system.lsp</a>
<h2>The implementation</h2>
<ul>
<li>Compacting copying garbage collector (<tt>O(1)</tt> in number of dead
objects)
<li>Tagged pointers for efficient type checking and fast integers
<li>Tail-recursive evaluator (tail calls use no stack space)
<li>Minimally-consing <tt>apply</tt>
<li>Interactive and script execution modes
</ul>
<p>
<a href="lisp.c">lisp.c</a>
<h2>femtoLisp2</h2>
This version includes robust reading and printing capabilities for
circular structures and escaped symbol names. It adds read and print support
for the Common Lisp read-macros <tt>#., #n#,</tt> and <tt>#n=</tt>.
This allows builtins to be printed in a readable fashion as e.g.
"<tt>#.eq</tt>".
<p>
The net result is that the interpreter achieves a highly satisfying property
of closure under I/O. In other words, every representable Lisp value can be
read and printed.
<p>
The traditional builtin <tt>label</tt> provides a purely-functional,
non-circular way
to write an anonymous recursive function. In femtoLisp2 you can
achieve the same effect "manually" using nothing more than the reader:
<br>
<tt>#0=(lambda (x) (if (&lt;= x 0) 1 (* x (#0# (- x 1)))))</tt>
<p>
femtoLisp2 has the following extra features and optimizations:
<ul>
<li> builtin functions <tt>error, exit,</tt> and <tt>princ</tt>
<li> read support for backquote expressions
<li> delayed environment consing
<li> collective allocation of cons chains
</ul>
The last two items, both optimizations, are a Big Deal.
<p>
<a href="lisp2.c">lisp2.c</a> (uses <a href="flutils.c">flutils.c</a>)
<h2>Performance</h2>
femtoLisp performs surprisingly well. It is faster than most
interpreters, and it is usually within a factor of 2-5 of compiled CLISP.
<table border=1>
<tr>
<td colspan=3><center><b>solve 5 queens problem 100x</b></center></td>
<tr>
<td> <td>interpreted<td>compiled
<tr>
<td>CLISP <td>4.02 sec <td>0.68 sec
<tr>
<td>femtoLisp2<td>2.62 sec <td>2.03 sec**
<tr>
<td>femtoLisp <td>6.02 sec <td>5.64 sec**
<tr>
<td colspan=3><center><b>recursive fib(34)</b></center></td>
<tr>
<td> <td>interpreted<td>compiled
<tr>
<td>CLISP <td>23.12 sec <td>4.04 sec
<tr>
<td>femtoLisp2<td>4.71 sec <td>n/a
<tr>
<td>femtoLisp <td>7.25 sec <td>n/a
<tr>
</table>
** femtoLisp is not a compiler; in this context "compiled" means macros
were pre-expanded.
<h2>"Installation"</h2>
Here is a <a href="Makefile">Makefile</a>. Type <tt>make</tt> to build
femtoLisp, <tt>make NAME=lisp2</tt> to build femtoLisp2.
<h2>Tail recursion</h2>
The femtoLisp evaluator is tail-recursive, following the idea in
<a href="http://library.readscheme.org/servlets/cite.ss?pattern=Ste-76b">
Lambda: The Ultimate Declarative</a> (should be required reading
for all schoolchildren).
<p>
The femtoLisp source provides a simple concrete example showing why a function
call is best viewed as a "renaming plus goto" rather than as a set of stack
operations.
<p>
Here is the non-tail-recursive evaluator code to evaluate the body of a
lambda (function), from <a href="lisp-nontail.c">lisp-nontail.c</a>:
<pre>
PUSH(*lenv); // preserve environment on stack
lenv = &amp;Stack[SP-1];
v = eval(*body, lenv);
POP();
return v;
</pre>
(Note that because of the copying garbage collector, values are referenced
through relocatable handles.)
<p>
Superficially, the call to <tt>eval</tt> is not a tail call, because work
remains after it returns&mdash;namely, popping the environment off the stack.
In other words, the control stack must be saved and restored to allow us to
eventually restore the environment stack. However, restoring the environment
stack is the <i>only</i> work to be done, and after this point the old
environment is never used! So restoring the environment stack isn't
necessary, and therefore restoring the control stack isn't either.
<p>
This perspective makes proper tail recursion seem like more than an
alternate design or optimization. It seems more correct.
<p>
Here is the corrected, tail-recursive version of the code:
<pre>
SP = saveSP; // restore stack completely
e = *body; // reassign arguments
*penv = *lenv;
goto eval_top;
</pre>
<tt>penv</tt> is a pointer to the old environment, which we overwrite.
(Notice that the variable <tt>penv</tt> does not even appear in the first code
example.)
So where is the environment saved and restored, if not here? The answer
is that the burden is shifted to the caller; a caller to <tt>eval</tt> must
expect that its environment might be overwritten, and take steps to save it
if it will be needed further after the call. In practice, this means
the environment is saved and restored around the evaluation of
arguments, rather than around function applications. Hence <tt>(f x)</tt>
might be a tail call to <tt>f</tt>, but <tt>(+ y (f x))</tt> is not.
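<p>
The "renaming plus goto" view can also be shown outside the evaluator. The
C sketch below is a standalone illustration, not code from femtoLisp: the
functions <tt>sum_recursive</tt> and <tt>sum_tail</tt> are made up, and the
depth limit in the first one is an arbitrary guard.

```c
#include <assert.h>
#include <stdint.h>

/* Not a tail call: the addition happens after the inner call returns,
   so every step needs its own stack frame. */
static uint64_t sum_recursive(uint64_t n) {
    assert(n < 100000);   /* arbitrary guard against deep recursion */
    if (n == 0) return 0;
    return n + sum_recursive(n - 1);
}

/* Accumulator version with the tail call written out by hand:
   rename the arguments, then jump. No frame is pushed per step. */
static uint64_t sum_tail(uint64_t n, uint64_t acc) {
top:
    if (n == 0) return acc;
    acc = acc + n;   /* new value for the second argument */
    n = n - 1;       /* new value for the first argument */
    goto top;        /* the "call" itself */
}
```

<tt>sum_tail</tt> runs in constant stack space however large <tt>n</tt> is,
which is the property the tail-recursive evaluator gives to Lisp code.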
</body>
</html>

Binary file not shown.

Binary file not shown.

Binary file not shown.