<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8" >
<title>femtoLisp</title>
</head>
<body bgcolor="#fcfcfc"> <!-- "#fcfcc8" -->
<img src="flbanner.jpg">
<table border=0 width="100%" cellpadding=0 cellspacing=0>
<tr><td bgcolor="#2d3f5f" height=4></table>
<h1>0. Argument</h1>
This Lisp has the following characteristics and goals:
<ul>
<li>Lisp-1 evaluation rule (à la Scheme)
<li>Self-evaluating lambda (i.e. <tt>'(lambda (x) x)</tt> is callable)
<li>Full Common Lisp-style macros
<li>Dotted lambda lists for rest arguments (à la Scheme)
<li>Symbols have one binding
<li>Builtin functions are constants
<li><em>All</em> values are printable and readable
<li>Case-sensitive symbol names
<li>Only the minimal core built-in (i.e. written in C), but
enough to provide a practical level of performance
<li>Very short (but not necessarily simple...) implementation
<li>Generally use Common Lisp operator names
<li>Nothing excessively weird or fancy
</ul>
<h1>1. Syntax</h1>
<h2>1.1. Symbols</h2>
Any character string can be a symbol name, including the empty string. In
general, text between whitespace is read as a symbol except in the following
cases:
<ul>
<li>The text begins with <tt>#</tt>
<li>The text consists of a single period <tt>.</tt>
<li>The text contains one of the special characters <tt>()[]';`,\|</tt>
<li>The text is a valid number
<li>The text is empty
</ul>
In these cases the symbol can be written by surrounding it with <tt>| |</tt>
characters, or by escaping individual characters within the symbol using
backslash <tt>\</tt>. Note that <tt>|</tt> and <tt>\</tt> must always be
preceded by a backslash when they appear in a symbol name.
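<p>To make these rules concrete, here is a C sketch of how a printer might decide when a symbol name needs bars and where backslashes go. It is an illustration under stated assumptions, not femtoLisp's printer: <tt>looks_like_number</tt>, <tt>symbol_needs_bars</tt>, and <tt>write_symbol</tt> are hypothetical names, symbol names are assumed to be NUL-terminated C strings, and whitespace is assumed to force bars because it would otherwise end the token.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: does the whole token match the number grammar of
   section 1.2 (optional sign, then decimal, 0x-hex, or 0-octal digits)? */
static bool looks_like_number(const char *s)
{
    assert(s != NULL);
    if (*s == '+' || *s == '-')
        s++;
    if (*s == '\0')
        return false;                       /* a lone sign is a symbol */
    if (s[0] == '0' && s[1] == 'x') {
        s += 2;
        if (*s == '\0')
            return false;
        for (; *s != '\0'; s++)
            if (strchr("0123456789abcdefABCDEF", *s) == NULL)
                return false;
        return true;
    }
    char max = (s[0] == '0') ? '7' : '9';   /* leading 0 means octal */
    for (; *s != '\0'; s++)
        if (*s < '0' || *s > max)
            return false;
    return true;
}

/* Hypothetical: true if `name` must be written between bars. Whitespace
   is included (an assumption) because it would otherwise end the token. */
static bool symbol_needs_bars(const char *name)
{
    return name[0] == '\0'                  /* empty name      */
        || name[0] == '#'                   /* begins with #   */
        || strcmp(name, ".") == 0           /* a single period */
        || strpbrk(name, "()[]';`,\\| \t\n") != NULL
        || looks_like_number(name);
}

/* Hypothetical: writes the printed form of `name` to `out`, which needs
   room for 2 * strlen(name) + 3 bytes. Bars and backslashes inside the
   name are always preceded by a backslash. */
static void write_symbol(const char *name, char *out)
{
    bool bars = symbol_needs_bars(name);
    if (bars)
        *out++ = '|';
    for (const char *p = name; *p != '\0'; p++) {
        if (*p == '|' || *p == '\\')
            *out++ = '\\';
        *out++ = *p;
    }
    if (bars)
        *out++ = '|';
    *out = '\0';
}
```

Under these rules <tt>+</tt> prints bare (a lone sign is not a number), while <tt>-0x1f</tt> and the empty name need bars.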
<h2>1.2. Numbers</h2>
A number consists of an optional + or - sign followed by one of the following
sequences:
<ul>
<li><tt>NNN...</tt> where N is a decimal digit
<li><tt>0xNNN...</tt> where N is a hexadecimal digit
<li><tt>0NNN...</tt> where N is an octal digit
</ul>
femtoLisp provides 30-bit integers, and it is an error to write a constant
less than -2<sup>29</sup> or greater than 2<sup>29</sup>-1.
<h2>1.3. Conses and vectors</h2>
The text <tt>(a b c)</tt> parses to the structure
<tt>(cons a (cons b (cons c nil)))</tt> where a, b, and c are arbitrary
expressions.
<p>
The text <tt>(a . b)</tt> parses to the structure
<tt>(cons a b)</tt> where a and b are arbitrary expressions.
<p>
The text <tt>()</tt> reads as the symbol <tt>nil</tt>.
<p>
The text <tt>[a b c]</tt> parses to a vector of expressions a, b, and c.
The syntax <tt>#(a b c)</tt> has the same meaning.
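<p>A list is just a chain of conses ending in <tt>nil</tt>. As a minimal C sketch (the <tt>cons_t</tt> type and its helpers are hypothetical, cars are plain <tt>int</tt>s, and <tt>NULL</tt> stands in for <tt>nil</tt>), building <tt>(a b c)</tt> means consing from the last element backwards. A dotted pair such as <tt>(a . b)</tt> would need a cdr that can hold a non-list value, which this sketch does not model.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical cons cell holding ints; NULL plays the role of nil. */
typedef struct cons {
    int car;
    struct cons *cdr;
} cons_t;

static cons_t *make_cons(int car, cons_t *cdr)
{
    cons_t *c = malloc(sizeof *c);
    if (c == NULL)
        abort();
    c->car = car;
    c->cdr = cdr;
    return c;
}

/* Builds (x0 x1 ... xn-1), i.e. (cons x0 (cons x1 ... (cons xn-1 nil))),
   by consing from the last element toward the first. */
static cons_t *list_from_array(const int *xs, size_t n)
{
    assert(xs != NULL || n == 0);
    cons_t *l = NULL;
    while (n > 0)
        l = make_cons(xs[--n], l);
    return l;
}

static size_t list_length(const cons_t *l)
{
    size_t n = 0;
    for (; l != NULL; l = l->cdr)
        n++;
    return n;
}

static void free_list(cons_t *l)
{
    while (l != NULL) {
        cons_t *next = l->cdr;
        free(l);
        l = next;
    }
}
```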
<h2>1.4. Comments</h2>
Text between a semicolon <tt>;</tt> and the next end-of-line is skipped.
Text between <tt>#|</tt> and <tt>|#</tt> is also skipped.
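<p>A C sketch of a reader skipping whitespace and both comment forms. The function name is hypothetical, and it assumes that <tt>#| |#</tt> blocks do not nest and that an unterminated block runs to the end of input; the text above does not say either way.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reader helper: returns a pointer to the first character
   that is neither whitespace nor part of a comment. Assumes #| |# blocks
   do not nest; an unterminated block runs to the end of the input. */
static const char *skip_ws_and_comments(const char *p)
{
    assert(p != NULL);
    for (;;) {
        if (*p == ' ' || *p == '\t' || *p == '\n' || *p == '\r') {
            p++;
        } else if (*p == ';') {
            while (*p != '\0' && *p != '\n')
                p++;                       /* skip to end of line */
        } else if (p[0] == '#' && p[1] == '|') {
            p += 2;
            while (*p != '\0' && !(p[0] == '|' && p[1] == '#'))
                p++;
            if (*p != '\0')
                p += 2;                    /* step over the closing |# */
        } else {
            return p;
        }
    }
}
```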
<h2>1.5. Prefix tokens</h2>
There are five special prefix tokens which parse as follows:<p>
<tt>'a</tt> is equivalent to <tt>(quote a)</tt>.<br>
<tt>`a</tt> is equivalent to <tt>(backquote a)</tt>.<br>
<tt>,a</tt> is equivalent to <tt>(*comma* a)</tt>.<br>
<tt>,@a</tt> is equivalent to <tt>(*comma-at* a)</tt>.<br>
<tt>,.a</tt> is equivalent to <tt>(*comma-dot* a)</tt>.
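<p>Because <tt>,@</tt> and <tt>,.</tt> both begin with <tt>,</tt>, a reader has to try the two-character tokens first. A hypothetical C sketch of that lookup (<tt>prefix_head</tt> is not a femtoLisp function); after recognizing a token, a reader would read the following datum <tt>a</tt> and build the two-element list <tt>(HEAD a)</tt>.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: returns the head symbol name for the prefix token at p
   and stores the token length in *len, or returns NULL if p does not
   start with a prefix token. Two-character tokens are listed first so
   the longest match wins. */
static const char *prefix_head(const char *p, size_t *len)
{
    static const struct { const char *tok; const char *head; } table[] = {
        { ",@", "*comma-at*"  },
        { ",.", "*comma-dot*" },
        { ",",  "*comma*"     },
        { "'",  "quote"       },
        { "`",  "backquote"   },
    };
    assert(p != NULL && len != NULL);
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++) {
        size_t n = strlen(table[i].tok);
        if (strncmp(p, table[i].tok, n) == 0) {
            *len = n;
            return table[i].head;
        }
    }
    return NULL;
}
```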
<h2>1.6. Other read macros</h2>
femtoLisp provides a few "read macros" that let you accomplish interesting
tricks for textually representing data structures.
<table border=1>
<tr>
<td>sequence<td>meaning
<tr>
<td><tt>#.e</tt><td>evaluate expression <tt>e</tt> and behave as if e's
value had been written in place of e
<tr>
<td><tt>#\c</tt><td><tt>c</tt> is a character; read as its Unicode value
<tr>
<td><tt>#n=e</tt><td>read <tt>e</tt> and label it as <tt>n</tt>, where n
is a decimal number
<tr>
<td><tt>#n#</tt><td>read as the identically-same value previously labeled
<tt>n</tt>
<tr>
<td><tt>#:gNNN or #:NNN</tt><td>read a gensym. NNN is a hexadecimal
constant. Future occurrences of the same <tt>#:</tt> sequence will read to
the identically-same gensym
<tr>
<td><tt>#sym(...)</tt><td>reads to the result of evaluating
<tt>(apply sym '(...))</tt>
<tr>
<td><tt>#&lt;</tt><td>triggers an error
<tr>
<td><tt>#'</tt><td>ignored; provided for compatibility
<tr>
<td><tt>#!</tt><td>single-line comment, for script execution support
<tr>
<td><tt>"str"</tt><td>UTF-8 character string; may contain newlines.
<tt>\</tt> is the escape character. All C escape sequences are supported, plus
<tt>\u</tt> and <tt>\U</tt> for Unicode code points.
</table>
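<p>The <tt>\u</tt> and <tt>\U</tt> string escapes name a Unicode code point, which must then be stored as UTF-8. A C sketch, assuming the C convention of exactly 4 hex digits after <tt>\u</tt> and exactly 8 after <tt>\U</tt> (femtoLisp may be more lenient); <tt>decode_unicode_escape</tt> and <tt>utf8_encode</tt> are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Writes the UTF-8 encoding of code point c into buf (room for 4 bytes).
   Returns the byte count, or 0 for surrogates and values above 0x10FFFF. */
static size_t utf8_encode(uint32_t c, unsigned char *buf)
{
    if (c < 0x80) {
        buf[0] = (unsigned char)c;
        return 1;
    }
    if (c < 0x800) {
        buf[0] = (unsigned char)(0xC0 | (c >> 6));
        buf[1] = (unsigned char)(0x80 | (c & 0x3F));
        return 2;
    }
    if (c >= 0xD800 && c <= 0xDFFF)
        return 0;
    if (c < 0x10000) {
        buf[0] = (unsigned char)(0xE0 | (c >> 12));
        buf[1] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
        buf[2] = (unsigned char)(0x80 | (c & 0x3F));
        return 3;
    }
    if (c <= 0x10FFFF) {
        buf[0] = (unsigned char)(0xF0 | (c >> 18));
        buf[1] = (unsigned char)(0x80 | ((c >> 12) & 0x3F));
        buf[2] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
        buf[3] = (unsigned char)(0x80 | (c & 0x3F));
        return 4;
    }
    return 0;
}

/* Hypothetical: s points just past the backslash, at 'u' or 'U'.
   Assumes exactly 4 hex digits after backslash-u and 8 after
   backslash-U, as in C. Returns the number of UTF-8 bytes written to
   out, or 0 if the escape is malformed. */
static size_t decode_unicode_escape(const char *s, unsigned char *out)
{
    assert(s != NULL && out != NULL);
    int ndigits = (*s == 'u') ? 4 : (*s == 'U') ? 8 : 0;
    if (ndigits == 0)
        return 0;
    uint32_t c = 0;
    for (int i = 1; i <= ndigits; i++) {
        char ch = s[i];
        int d;
        if (ch >= '0' && ch <= '9')      d = ch - '0';
        else if (ch >= 'a' && ch <= 'f') d = ch - 'a' + 10;
        else if (ch >= 'A' && ch <= 'F') d = ch - 'A' + 10;
        else                             return 0;   /* too few hex digits */
        c = (c << 4) | (uint32_t)d;
    }
    return utf8_encode(c, out);
}
```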
When a read macro involves persistent state (e.g. label assignments), that
state is valid only within the closest enclosing call to <tt>read</tt>.
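<p>One way to get this scoping is to keep the label table in per-read state and clear it at the start of every top-level <tt>read</tt>. A C sketch under that assumption; <tt>label_table</tt> and its functions are hypothetical, <tt>void *</tt> stands in for a Lisp value, and the fixed capacity is arbitrary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_LABELS 64   /* arbitrary limit for this sketch */

/* Hypothetical per-read label table for #n= and #n#. */
typedef struct {
    long n[MAX_LABELS];
    void *value[MAX_LABELS];
    size_t count;
} label_table;

/* Called at the start of every top-level read, so labels never leak
   from one read into the next. */
static void labels_reset(label_table *t)
{
    t->count = 0;
}

/* #n=e: record that label n denotes v. For circular data such as
   #1=(a . #1#), the label has to be recorded before reading of e
   finishes. Redefinition overwrites here (an assumption). */
static bool label_define(label_table *t, long n, void *v)
{
    assert(t != NULL);
    for (size_t i = 0; i < t->count; i++) {
        if (t->n[i] == n) {
            t->value[i] = v;
            return true;
        }
    }
    if (t->count == MAX_LABELS)
        return false;
    t->n[t->count] = n;
    t->value[t->count] = v;
    t->count++;
    return true;
}

/* #n#: the identical value labeled n, or NULL if n is undefined in the
   current read (a real reader would signal an error). */
static void *label_lookup(const label_table *t, long n)
{
    for (size_t i = 0; i < t->count; i++)
        if (t->n[i] == n)
            return t->value[i];
    return NULL;
}
```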
<h2>1.7. Builtins</h2>
Builtin functions are represented as opaque constants. Every builtin
function is the value of some constant symbol, so the builtin <tt>eq</tt>,
for example, can be written as <tt>#.eq</tt> ("the value of symbol eq").
Note that <tt>eq</tt> itself is still an ordinary symbol, except that its
value cannot be changed.
<p>
<table border=0 width="100%" cellpadding=0 cellspacing=0>
<tr><td bgcolor="#2d3f5f" height=4></table>
<h1>2. Data and execution models</h1>
<table border=0 width="100%" cellpadding=0 cellspacing=0>
<tr><td bgcolor="#2d3f5f" height=4></table>
<h1>3. Primitive functions</h1>
<tt>eq atom not set prog1 progn</tt><br>
<tt>symbolp numberp builtinp consp vectorp boundp</tt><br>
<tt>+ - * / &lt;</tt><br>
<tt>apply eval</tt>
<table border=0 width="100%" cellpadding=0 cellspacing=0>
<tr><td bgcolor="#2d3f5f" height=4></table>
<h1>4. Special forms</h1>
<tt>quote if lambda macro while label cond and or</tt>
<table border=0 width="100%" cellpadding=0 cellspacing=0>
<tr><td bgcolor="#2d3f5f" height=4></table>
<h1>5. Data structures</h1>
<tt>cons car cdr rplaca rplacd list</tt><br>
<tt>alloc vector aref aset length</tt>
<table border=0 width="100%" cellpadding=0 cellspacing=0>
<tr><td bgcolor="#2d3f5f" height=4></table>
<h1>6. Other functions</h1>
<tt>read print princ load exit</tt><br>
<tt>equal compare</tt><br>
<tt>gensym</tt>
<table border=0 width="100%" cellpadding=0 cellspacing=0>
<tr><td bgcolor="#2d3f5f" height=4></table>
<h1>7. Exceptions</h1>
<tt>trycatch raise</tt>
<table border=0 width="100%" cellpadding=0 cellspacing=0>
<tr><td bgcolor="#2d3f5f" height=4></table>
<h1>8. Cvalues</h1>
<h2>8.1. Introduction</h2>
femtoLisp allows you to use the full range of C data types on
dynamically-typed Lisp values. The motivation for this feature is that
useful
interpreters must provide a large library of routines in C for dealing
with "real world" data like text and packed numeric arrays, and I would
rather not write yet another such library. Instead, all the
required data representations and primitives are provided so that such
features could be implemented in, or at least described in, Lisp.
<p>
The cvalues capability makes it easier to call C from Lisp by providing
ways to construct whatever arguments your C routines might require, and ways
to decipher whatever values your C routines might return. Here are some
things you can do with cvalues:
<ul>
<li>Call native C functions from Lisp without wrappers
<li>Wrap C functions in pure Lisp, automatically inheriting some degree
of type safety
<li>Use Lisp functions as callbacks from C code
<li>Use the Lisp garbage collector to reclaim malloc'd storage
<li>Annotate C pointers with size information for bounds checking or
serialization
<li>Attach symbolic type information to a C data structure, allowing it to
inherit Lisp services such as printing a readable representation
<li>Add datatypes like strings to Lisp
<li>Use more efficient representations for your Lisp programs' data
</ul>
<p>
femtoLisp's "cvalues" is inspired in part by Python's "ctypes" package.
Lisp doesn't really have first-class types the way Python does, but it does
have values, hence my version is called "cvalues".
<h2>8.2. Type representations</h2>
The core of cvalues is a language for describing C data types as
symbolic expressions:
<ul>
<li>Primitive types are symbols <tt>int8, uint8, int16, uint16, int32, uint32,
int64, uint64, char, wchar, long, ulong, float, double, void</tt>
<li>Arrays <tt>(array TYPE SIZE)</tt>, where TYPE is another C type and
SIZE is either a Lisp number or a C ulong. SIZE can be omitted to
represent incomplete C array types like "int a[]". As in C, the size may
only be omitted for the top level of a nested array; all array
<em>element</em> types
must have explicit sizes. Examples:
<ul>
<tt>int a[][2][3]</tt> is <tt>(array (array (array int32 3) 2))</tt><br>
<tt>int a[4][]</tt> would be <tt>(array (array int32) 4)</tt>, but this is
invalid.
</ul>
<li>Pointer <tt>(pointer TYPE)</tt>
<li>Struct <tt>(struct ((NAME TYPE) (NAME TYPE) ...))</tt>
<li>Union <tt>(union ((NAME TYPE) (NAME TYPE) ...))</tt>
<li>Enum <tt>(enum (NAME NAME ...))</tt>
<li>Function <tt>(c-function RET-TYPE (ARG-TYPE ARG-TYPE ...))</tt>
</ul>
A cvalue can be constructed using <tt>(c-value TYPE arg)</tt>, where
<tt>arg</tt> is some Lisp value. The system will try to convert the Lisp
value to the specified type. In many cases this will work better if some
components of the provided Lisp value are themselves cvalues.
<p>
Note the function type is called "c-function" to avoid confusion, since
functions are such a prevalent concept in Lisp.
<p>
The function <tt>sizeof</tt> returns the size (in bytes) of a cvalue or a
C type. Every cvalue has a size, but incomplete types will cause
<tt>sizeof</tt> to raise an error. The function <tt>typeof</tt> returns
the type of a cvalue.
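<p>The rule that only the outermost array size may be omitted, together with <tt>sizeof</tt> failing on incomplete types, can be modeled with a small recursive function. A C sketch; the <tt>ctype</tt> struct covers only primitives and arrays, and its names are hypothetical, not femtoLisp's internal type representation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical in-memory form of a type expression: a primitive such as
   int32, or (array ELEM COUNT) where COUNT < 0 means "omitted". */
typedef struct ctype {
    bool is_array;
    size_t prim_size;            /* primitives: size in bytes       */
    const struct ctype *elem;    /* arrays: element type            */
    long count;                  /* arrays: length, or -1 if absent */
} ctype;

/* Size in bytes, or -1 for an incomplete type (the case where sizeof
   would raise an error). */
static long ctype_sizeof(const ctype *t)
{
    assert(t != NULL);
    if (!t->is_array)
        return (long)t->prim_size;
    if (t->count < 0)
        return -1;
    long e = ctype_sizeof(t->elem);
    return e < 0 ? -1 : t->count * e;
}

/* Only the outermost array may omit its size: the element type must be
   complete, which in turn means every level below it has a size. */
static bool ctype_valid(const ctype *t)
{
    return !t->is_array || ctype_sizeof(t->elem) >= 0;
}
```

With <tt>int32</tt> as a 4-byte primitive, <tt>(array (array int32 3) 2)</tt> has size 24, <tt>int a[][2][3]</tt> is valid but has no size, and <tt>int a[4][]</tt> is rejected.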
<p>
You are probably wondering how 32- and 64-bit integers are constructed from
femtoLisp's 30-bit integers. The answer is that larger integers are
constructed from multiple Lisp numbers 16 bits at a time, in big-endian
fashion. In fact, the larger numeric types are the only cvalue
types whose constructors accept multiple arguments. Examples:
<ul>
<pre>
(c-value 'int32 0xdead 0xbeef) ; make 0xdeadbeef
(c-value 'uint64 0x1001 0x8000 0xffff) ; make 0x000010018000ffff
</pre>
</ul>
As you can see, when fewer 16-bit chunks are supplied than the type needs,
the missing high-order chunks are filled with zeros.
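<p>The chunk rule amounts to shifting left by 16 bits per argument. A C sketch with a hypothetical <tt>combine_chunks</tt>; masking each chunk to 16 bits is an assumption about what happens to oversized arguments. For <tt>int32</tt>, the resulting bit pattern would then be interpreted as signed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: combines 16-bit chunks given most significant first
   (big-endian). Supplying fewer chunks leaves the high-order bits zero.
   Masking each chunk to 16 bits is an assumption. */
static uint64_t combine_chunks(const uint32_t *chunks, size_t n)
{
    assert(chunks != NULL || n == 0);
    assert(n <= 4);                    /* at most 64 bits */
    uint64_t v = 0;
    for (size_t i = 0; i < n; i++)
        v = (v << 16) | (chunks[i] & 0xffffu);
    return v;
}
```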
<h2>8.3. Constructors</h2>
For convenience, a specialized constructor is provided for each
class of C type (primitives, pointer, array, struct, union, enum,
and c-function).
For example:
<ul>
<pre>
(uint32 0xcafe 0xd00d)
(int32 -4)
(char #\w)
(array 'int8 [1 1 2 3 5 8])
</pre>
</ul>
These forms can be slightly less efficient than <tt>(c-value ...)</tt>
because in many cases they will allocate a new type for the new value.
For example, the fourth expression must create the type
<tt>(array int8 6)</tt>.
<p>
Notice that calls to these constructors strongly resemble
the types of the values they create. This relationship can be expressed
formally as follows:
<pre>
(define (c-allocate type)
  (if (atom type)
      (apply (eval type) ())
      (apply (eval (car type)) (cdr type))))
</pre>
This function produces an instance of the given type by
invoking the appropriate constructor. Primitive types (whose representations
are symbols) can be constructed with zero arguments. For other types,
the only required arguments are those present in the type representation.
Any arguments after those are initializers. Using
<tt>(cdr type)</tt> as the argument list provides only required arguments,
so the value you get will not be initialized.
<p>
The builtin <tt>c-value</tt> function is similar to this one, except that it
lets you pass initializers.
<p>
Cvalue constructors are generally permissive; they do the best they
can with whatever you pass in. For example:
<ul>
<pre>
(c-value '(array int8 1)) ; ok, full type provided
(c-value '(array int8)) ; error, no size information
(c-value '(array int8) [0 1]) ; ok, size implied by initializer
</pre>
</ul>
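<p>The three cases above can be mimicked in C with a hypothetical constructor for <tt>int8</tt> arrays, where a negative <tt>size</tt> means the type omitted it. Zero-filling unused elements and dropping surplus initializers are assumptions for this sketch, not documented femtoLisp behavior.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an (array int8 N) cvalue. */
typedef struct {
    size_t len;
    int8_t *data;
} int8_array;

/* size < 0 means the type omitted its size; init == NULL means no
   initializer. Unfilled elements are zeroed and extra initializer
   elements are dropped (both assumptions). */
static bool make_int8_array(long size, const int8_t *init, size_t ninit,
                            int8_array *out)
{
    assert(out != NULL);
    if (size < 0 && init == NULL)
        return false;                   /* no size information at all */
    size_t len = (size >= 0) ? (size_t)size : ninit;
    out->data = calloc(len > 0 ? len : 1, 1);
    if (out->data == NULL)
        return false;
    out->len = len;
    if (init != NULL)
        memcpy(out->data, init, ninit < len ? ninit : len);
    return true;
}
```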
<p>
ccopy, c2lisp
<h2>8.4. Pointers, arrays, and strings</h2>
Pointer types are provided for completeness and C interoperability, but
they should not generally be used from Lisp. femtoLisp doesn't know
anything about a pointer except the raw address and the (alleged) type of the
value it points to. Arrays are much more useful. They behave like references
as in C, but femtoLisp tracks their sizes and performs bounds checking.
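<p>The practical difference between a pointer and an array is that an array carries its length. A C sketch with a hypothetical <tt>sized_array</tt>: copying the struct copies only the pointer and the length, so copies share storage the way references do, and every access is checked against the length.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical array value: storage plus length, unlike a bare pointer. */
typedef struct {
    int *data;
    size_t len;
} sized_array;

/* Bounds-checked read: fails instead of reading past the end. */
static bool array_ref(sized_array a, size_t i, int *out)
{
    if (i >= a.len)
        return false;
    *out = a.data[i];
    return true;
}

/* Bounds-checked write. `a` is passed by value, but only the pointer and
   length are copied, so the write lands in the shared storage. */
static bool array_set(sized_array a, size_t i, int v)
{
    if (i >= a.len)
        return false;
    a.data[i] = v;
    return true;
}
```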
<p>
Arrays are used to allocate strings. All strings share
the incomplete array type <tt>(array char)</tt>:
<pre>
> (c-value '(array char) [#\h #\e #\l #\l #\o])
"hello"
> (sizeof that)
5
</pre>
<tt>sizeof</tt> reveals that the size is known even though it is not
reflected in the type (as is always the case with incomplete array types).
<p>
Since femtoLisp tracks the sizes of all values, there is no need for NUL
terminators. Strings are just arrays of bytes, and may contain zero bytes
throughout. However, C functions require zero-terminated strings. To
solve this problem, femtoLisp allocates magic strings that actually have
space for one more byte than they appear to. The hidden extra byte is
always zero. This guarantees that a C function operating on the string
will never overrun its allocated space.
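<p>A C sketch of this magic-string layout, with hypothetical names: allocate one byte more than the visible length and zero it. A C function still stops at the first embedded zero byte, so for a string containing zeros it sees only a prefix, but it never reads past the allocation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical magic string: `len` visible bytes plus one hidden zero
   byte, so `data` can be passed to C string functions directly. */
typedef struct {
    size_t len;      /* the size that would be reported */
    char *data;      /* len + 1 bytes allocated; data[len] == '\0' */
} magic_string;

static magic_string magic_string_new(const char *bytes, size_t len)
{
    assert(bytes != NULL);
    magic_string s = { len, malloc(len + 1) };
    if (s.data == NULL)
        abort();
    memcpy(s.data, bytes, len);
    s.data[len] = '\0';          /* hidden terminator, not counted in len */
    return s;
}
```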
<p>
Such magic strings are produced by double-quoted string literals, and by
any explicit string-constructing function (such as <tt>string</tt>).
<p>
Unfortunately you still need to be careful, because it is possible to
allocate a non-magic character array with no terminator. The "hello"
string above is an example of this, since it was constructed from an
explicit vector of characters.
Such an array would cause problems if passed to a function expecting a
C string.
<p>
deref
<h2>8.5. Access</h2>
cref,cset,byteref,byteset,ccopy
<h2>8.6. Memory management concerns</h2>
autorelease
<h2>8.7. Guest functions</h2>
Functions written in C but designed to operate on Lisp values are
known here as "guest functions". Although they are foreign, they live in
Lisp's house and so live by its rules. Guest functions are what you
use to write interpreter extensions, for example to implement a function
like <tt>assoc</tt> in C for performance.
<p>
Guest functions must have a particular signature:
<pre>
value_t func(value_t *args, uint32_t nargs);
</pre>
Guest functions must also be aware of the femtoLisp API and garbage
collector.
<h2>8.8. Native functions</h2>
</body>
</html>