<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8" >
<title>femtoLisp</title>
</head>
<body bgcolor="#fcfcfc">
<img src="flbanner.jpg">
<table border=0 width="100%" cellpadding=0 cellspacing=0>
<tr><td bgcolor="#2d3f5f" height=4></table>

<h1>0. Argument</h1>
This Lisp has the following characteristics and goals:
<ul>
<li>Lisp-1 evaluation rule (as in Scheme)
<li>Self-evaluating lambda (i.e. <tt>'(lambda (x) x)</tt> is callable)
<li>Full Common Lisp-style macros
<li>Dotted lambda lists for rest arguments (as in Scheme)
<li>Symbols have one binding
<li>Builtin functions are constants
<li><em>All</em> values are printable and readable
<li>Case-sensitive symbol names
<li>Only a minimal core is built in (i.e. written in C), but enough to provide a practical level of performance
<li>Very short (but not necessarily simple...) implementation
<li>Generally use Common Lisp operator names
<li>Nothing excessively weird or fancy
</ul>

<h1>1. Syntax</h1>
<h2>1.1. Symbols</h2>
Any character string can be a symbol name, including the empty string. In general, text between whitespace is read as a symbol except in the following cases:
<ul>
<li>The text begins with <tt>#</tt>
<li>The text consists of a single period <tt>.</tt>
<li>The text contains one of the special characters <tt>()[]';`,\|</tt>
<li>The text is a valid number
<li>The text is empty
</ul>
In these cases the symbol can be written by surrounding it with <tt>| |</tt> characters, or by escaping individual characters within the symbol using backslash <tt>\</tt>. Note that <tt>|</tt> and <tt>\</tt> must always be preceded by a backslash when writing a symbol name.

<h2>1.2.
Numbers</h2>
A number consists of an optional + or - sign followed by one of the following sequences:
<ul>
<li><tt>NNN...</tt> where N is a decimal digit
<li><tt>0xNNN...</tt> where N is a hexadecimal digit
<li><tt>0NNN...</tt> where N is an octal digit
</ul>
femtoLisp provides 30-bit integers, and it is an error to write a constant less than -2<sup>29</sup> or greater than 2<sup>29</sup>-1.

<h2>1.3. Conses and vectors</h2>
The text <tt>(a b c)</tt> parses to the structure <tt>(cons a (cons b (cons c nil)))</tt> where a, b, and c are arbitrary expressions.
<p>
The text <tt>(a . b)</tt> parses to the structure <tt>(cons a b)</tt> where a and b are arbitrary expressions.
<p>
The text <tt>()</tt> reads as the symbol <tt>nil</tt>.
<p>
The text <tt>[a b c]</tt> parses to a vector of expressions a, b, and c. The syntax <tt>#(a b c)</tt> has the same meaning.

<h2>1.4. Comments</h2>
Text between a semicolon <tt>;</tt> and the next end-of-line is skipped. Text between <tt>#|</tt> and <tt>|#</tt> is also skipped.

<h2>1.5. Prefix tokens</h2>
There are five special prefix tokens which parse as follows:<p>
<tt>'a</tt> is equivalent to <tt>(quote a)</tt>.<br>
<tt>`a</tt> is equivalent to <tt>(backquote a)</tt>.<br>
<tt>,a</tt> is equivalent to <tt>(*comma* a)</tt>.<br>
<tt>,@a</tt> is equivalent to <tt>(*comma-at* a)</tt>.<br>
<tt>,.a</tt> is equivalent to <tt>(*comma-dot* a)</tt>.

<h2>1.6. Other read macros</h2>
femtoLisp provides a few "read macros" that let you accomplish interesting tricks for textually representing data structures.
<table border=1>
<tr>
<td>sequence<td>meaning
<tr>
<td><tt>#.e</tt><td>evaluate expression <tt>e</tt> and behave as if e's value had been written in place of e
<tr>
<td><tt>#\c</tt><td><tt>c</tt> is a character; read as its Unicode value
<tr>
<td><tt>#n=e</tt><td>read <tt>e</tt> and label it as <tt>n</tt>, where n is a decimal number
<tr>
<td><tt>#n#</tt><td>read as the identically-same value previously labeled <tt>n</tt>
<tr>
<td><tt>#:gNNN</tt> or <tt>#:NNN</tt><td>read a gensym. NNN is a hexadecimal constant. Future occurrences of the same <tt>#:</tt> sequence will read to the identically-same gensym
<tr>
<td><tt>#sym(...)</tt><td>reads to the result of evaluating <tt>(apply sym '(...))</tt>
<tr>
<td><tt>#&lt;</tt><td>triggers an error
<tr>
<td><tt>#'</tt><td>ignored; provided for compatibility
<tr>
<td><tt>#!</tt><td>single-line comment, for script execution support
<tr>
<td><tt>"str"</tt><td>UTF-8 character string; may contain newlines. <tt>\</tt> is the escape character. All C escape sequences are supported, plus <tt>\u</tt> and <tt>\U</tt> for Unicode values.
</table>
When a read macro involves persistent state (e.g. label assignments), that state is valid only within the closest enclosing call to <tt>read</tt>.

<h2>1.7. Builtins</h2>
Builtin functions are represented as opaque constants. Every builtin function is the value of some constant symbol, so the builtin <tt>eq</tt>, for example, can be written as <tt>#.eq</tt> ("the value of symbol eq"). Note that <tt>eq</tt> itself is still an ordinary symbol, except that its value cannot be changed.
<p>

<table border=0 width="100%" cellpadding=0 cellspacing=0>
<tr><td bgcolor="#2d3f5f" height=4></table>
<h1>2. Data and execution models</h1>

<table border=0 width="100%" cellpadding=0 cellspacing=0>
<tr><td bgcolor="#2d3f5f" height=4></table>
<h1>3.
Primitive functions</h1>
eq atom not set prog1 progn symbolp numberp builtinp consp vectorp boundp
+ - * / &lt; apply eval

<table border=0 width="100%" cellpadding=0 cellspacing=0>
<tr><td bgcolor="#2d3f5f" height=4></table>
<h1>4. Special forms</h1>
quote if lambda macro while label cond and or

<table border=0 width="100%" cellpadding=0 cellspacing=0>
<tr><td bgcolor="#2d3f5f" height=4></table>
<h1>5. Data structures</h1>
cons car cdr rplaca rplacd list alloc vector aref aset length

<table border=0 width="100%" cellpadding=0 cellspacing=0>
<tr><td bgcolor="#2d3f5f" height=4></table>
<h1>6. Other functions</h1>
read print princ load exit equal compare gensym

<table border=0 width="100%" cellpadding=0 cellspacing=0>
<tr><td bgcolor="#2d3f5f" height=4></table>
<h1>7. Exceptions</h1>
trycatch raise

<table border=0 width="100%" cellpadding=0 cellspacing=0>
<tr><td bgcolor="#2d3f5f" height=4></table>
<h1>8. Cvalues</h1>
<h2>8.1. Introduction</h2>
femtoLisp allows you to use the full range of C data types on dynamically-typed Lisp values. The motivation for this feature is that useful interpreters must provide a large library of routines in C for dealing with "real world" data like text and packed numeric arrays, and I would rather not write yet another such library. Instead, all the required data representations and primitives are provided so that such features could be implemented in, or at least described in, Lisp.
<p>
The cvalues capability makes it easier to call C from Lisp by providing ways to construct whatever arguments your C routines might require, and ways to decipher whatever values your C routines might return.
Here are some things you can do with cvalues:
<ul>
<li>Call native C functions from Lisp without wrappers
<li>Wrap C functions in pure Lisp, automatically inheriting some degree of type safety
<li>Use Lisp functions as callbacks from C code
<li>Use the Lisp garbage collector to reclaim malloc'd storage
<li>Annotate C pointers with size information for bounds checking or serialization
<li>Attach symbolic type information to a C data structure, allowing it to inherit Lisp services such as printing a readable representation
<li>Add datatypes like strings to Lisp
<li>Use more efficient representations for your Lisp programs' data
</ul>
<p>
femtoLisp's "cvalues" is inspired in part by Python's "ctypes" package. Lisp doesn't really have first-class types the way Python does, but it does have values, hence my version is called "cvalues".

<h2>8.2. Type representations</h2>
The core of cvalues is a language for describing C data types as symbolic expressions:
<ul>
<li>Primitive types are symbols <tt>int8, uint8, int16, uint16, int32, uint32, int64, uint64, char, wchar, long, ulong, float, double, void</tt>
<li>Arrays <tt>(array TYPE SIZE)</tt>, where TYPE is another C type and SIZE is either a Lisp number or a C ulong. SIZE can be omitted to represent incomplete C array types like "int a[]". As in C, the size may only be omitted for the top level of a nested array; all array <em>element</em> types must have explicit sizes. Examples:
<ul>
<tt>int a[][2][3]</tt> is <tt>(array (array (array int32 3) 2))</tt><br>
<tt>int a[4][]</tt> would be <tt>(array (array int32) 4)</tt>, but this is invalid.
</ul>
<li>Pointer <tt>(pointer TYPE)</tt>
<li>Struct <tt>(struct ((NAME TYPE) (NAME TYPE) ...))</tt>
<li>Union <tt>(union ((NAME TYPE) (NAME TYPE) ...))</tt>
<li>Enum <tt>(enum (NAME NAME ...))</tt>
<li>Function <tt>(c-function RET-TYPE (ARG-TYPE ARG-TYPE ...))</tt>
</ul>
A cvalue can be constructed using <tt>(c-value TYPE arg)</tt>, where <tt>arg</tt> is some Lisp value.
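<p>
As a rough illustration of this notation, here is how a few C declarations might be written as type expressions. The field and enumerator names (<tt>x</tt>, <tt>y</tt>, <tt>red</tt>, and so on) are invented for the example. The two <tt>c-value</tt> calls at the end are assumptions made by analogy with the other examples in this document, not captured interpreter output.
<pre>
; uint8_t buf[16]
(array uint8 16)

; char *argv[]  (incomplete array of pointers)
(array (pointer char))

; struct { int32_t x; int32_t y; }   -- hypothetical field names
(struct ((x int32) (y int32)))

; enum { red, green, blue }          -- hypothetical enumerators
(enum (red green blue))

; double (*f)(double, double)
(pointer (c-function double (double double)))

; constructing values: the type is passed quoted, like any other data
(c-value 'int16 -4)
(c-value '(array uint8 4) [1 2 3 4])
</pre>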
The system will try to convert the Lisp value to the specified type. In many cases this will work better if some components of the provided Lisp value are themselves cvalues.
<p>
Note the function type is called "c-function" to avoid confusion, since functions are such a prevalent concept in Lisp.
<p>
The function <tt>sizeof</tt> returns the size (in bytes) of a cvalue or a C type. Every cvalue has a size, but incomplete types will cause <tt>sizeof</tt> to raise an error. The function <tt>typeof</tt> returns the type of a cvalue.
<p>
You are probably wondering how 32- and 64-bit integers are constructed from femtoLisp's 30-bit integers. The answer is that larger integers are constructed from multiple Lisp numbers 16 bits at a time, in big-endian fashion. In fact, the larger numeric types are the only cvalue types whose constructors accept multiple arguments. Examples:
<ul>
<pre>
(c-value 'int32 0xdead 0xbeef)            ; make 0xdeadbeef
(c-value 'uint64 0x1001 0x8000 0xffff)    ; make 0x000010018000ffff
</pre>
</ul>
As you can see, missing zeros are padded in from the left.

<h2>8.3. Constructors</h2>
For convenience, a specialized constructor is provided for each class of C type (primitives, pointer, array, struct, union, enum, and c-function). For example:
<ul>
<pre>
(uint32 0xcafe 0xd00d)
(int32 -4)
(char #\w)
(array 'int8 [1 1 2 3 5 8])
</pre>
</ul>
These forms can be slightly less efficient than <tt>(c-value ...)</tt> because in many cases they will allocate a new type for the new value. For example, the fourth expression must create the type <tt>(array int8 6)</tt>.
<p>
Notice that calls to these constructors strongly resemble the types of the values they create. This relationship can be expressed formally as follows:
<pre>
(define (c-allocate type)
  (if (atom type)
      (apply (eval type) ())
      (apply (eval (car type)) (cdr type))))
</pre>
This function produces an instance of the given type by invoking the appropriate constructor.
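<p>
To make the correspondence concrete, here is how two calls would unfold under the definition of <tt>c-allocate</tt> given above. This is a hand trace of that definition, not interpreter output.
<pre>
(c-allocate 'int32)
; type is an atom, so this evaluates (apply (eval 'int32) ()),
; which amounts to calling (int32)

(c-allocate '(array int8 6))
; (car type) is array and (cdr type) is (int8 6), so this evaluates
; (apply (eval 'array) '(int8 6)), which amounts to (array 'int8 6)
</pre>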
Primitive types (whose representations are symbols) can be constructed with zero arguments. For other types, the only required arguments are those present in the type representation. Any arguments after those are initializers. Using <tt>(cdr type)</tt> as the argument list provides only required arguments, so the value you get will not be initialized.
<p>
The builtin <tt>c-value</tt> function is similar to this one, except that it lets you pass initializers.
<p>
Cvalue constructors are generally permissive; they do the best they can with whatever you pass in. For example:
<ul>
<pre>
(c-value '(array int8 1))      ; ok, full type provided
(c-value '(array int8))        ; error, no size information
(c-value '(array int8) [0 1])  ; ok, size implied by initializer
</pre>
</ul>
<p>
ccopy, c2lisp

<h2>8.4. Pointers, arrays, and strings</h2>
Pointer types are provided for completeness and C interoperability, but they should not generally be used from Lisp. femtoLisp doesn't know anything about a pointer except the raw address and the (alleged) type of the value it points to. Arrays are much more useful. They behave like references as in C, but femtoLisp tracks their sizes and performs bounds checking.
<p>
Arrays are used to allocate strings. All strings share the incomplete array type <tt>(array char)</tt>:
<pre>
> (c-value '(array char) [#\h #\e #\l #\l #\o])
"hello"

> (sizeof that)
5
</pre>
<tt>sizeof</tt> reveals that the size is known even though it is not reflected in the type (as is always the case with incomplete array types).
<p>
Since femtoLisp tracks the sizes of all values, there is no need for NUL terminators. Strings are just arrays of bytes, and may contain zero bytes throughout. However, C functions require zero-terminated strings. To solve this problem, femtoLisp allocates magic strings that actually have space for one more byte than they appear to. The hidden extra byte is always zero.
This guarantees that a C function operating on the string will never overrun its allocated space.
<p>
Such magic strings are produced by double-quoted string literals, and by any explicit string-constructing function (such as <tt>string</tt>).
<p>
Unfortunately you still need to be careful, because it is possible to allocate a non-magic character array with no terminator. The "hello" string above is an example of this, since it was constructed from an explicit vector of characters. Such an array would cause problems if passed to a function expecting a C string.
<p>
deref

<h2>8.5. Access</h2>
cref,cset,byteref,byteset,ccopy

<h2>8.6. Memory management concerns</h2>
autorelease

<h2>8.7. Guest functions</h2>
Functions written in C but designed to operate on Lisp values are known here as "guest functions". Although they are foreign, they live in Lisp's house and so live by its rules. Guest functions are what you use to write interpreter extensions, for example to implement a function like <tt>assoc</tt> in C for performance.
<p>
Guest functions must have a particular signature:
<pre>
value_t func(value_t *args, uint32_t nargs);
</pre>
Guest functions must also be aware of the femtoLisp API and garbage collector.

<h2>8.8. Native functions</h2>

</body>
</html>