0. Argument

This Lisp has the following characteristics and goals:

Lisp-1 evaluation rule (à la Scheme)
Self-evaluating lambda (i.e. '(lambda (x) x) is callable)
Full Common Lisp-style macros
Dotted lambda lists for rest arguments (à la Scheme)
Symbols have one binding
Builtin functions are constants
All values are printable and readable
Case-sensitive symbol names
Only a minimal core is built in (i.e. written in C), but enough to provide a practical level of performance
Very short (but not necessarily simple...) implementation
Generally use Common Lisp operator names
Nothing excessively weird or fancy

1. Syntax

1.1. Symbols

Any character string can be a symbol name, including the empty string. In general, text between whitespace is read as a symbol except in the following cases:

The text begins with #
The text consists of a single period .
The text contains one of the special characters ()[]';`,\|
The text is a valid number
The text is empty

In these cases the symbol can be written by surrounding it with | | characters, or by escaping individual characters within the symbol using backslash \. Note that | and \ must always be preceded by a backslash when writing a symbol name.
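The quoting rule above can be modeled in a few lines. This is a sketch in Python, not femtoLisp's actual printer: the names needs_quoting and write_symbol are hypothetical, the number pattern is a simplified reading of section 1.2, and treating embedded whitespace as requiring quotes is an assumption drawn from "text between whitespace".

```python
import re

# Special characters listed in section 1.1.
SPECIAL = set("()[]';`,\\|")

# Simplified number syntax (assumption): optional sign, then hex or digits.
NUMBER = re.compile(r"[+-]?(0[xX][0-9a-fA-F]+|[0-9]+)")


def needs_quoting(name):
    """True if the bare text would not read back as this symbol."""
    return (
        name == ""
        or name == "."
        or name.startswith("#")
        or any(ch in SPECIAL or ch.isspace() for ch in name)
        or NUMBER.fullmatch(name) is not None
    )


def write_symbol(name):
    """Return a textual form of the symbol, using | | only when needed."""
    if not needs_quoting(name):
        return name
    # Escape backslashes first so the backslashes added for | stay single.
    escaped = name.replace("\\", "\\\\").replace("|", "\\|")
    return "|" + escaped + "|"
```

For example, write_symbol("foo") leaves the name alone, while write_symbol("42") wraps it in bars so it is not read as a number.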

1.2. Numbers

A number consists of an optional + or - sign followed by one of the following sequences:

NNN... where N is a decimal digit
0xNNN... where N is a hexadecimal digit
0NNN... where N is an octal digit

femtoLisp provides 30-bit integers, and it is an error to write a constant less than -2²⁹ or greater than 2²⁹-1.
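A rough model of these rules, written in Python rather than taken from the real reader. The function name parse_number is hypothetical, and two details are assumptions: a leading 0 followed by more digits is always taken as octal (so "08" is not a number), and text that is not a number yields None so the caller can treat it as a symbol.

```python
import re

FIXNUM_MIN = -(2 ** 29)
FIXNUM_MAX = 2 ** 29 - 1

# Groups: 1 = sign, 2 = hex digits, 3 = octal digits, 4 = decimal digits.
NUMBER = re.compile(
    r"([+-]?)(?:0[xX]([0-9a-fA-F]+)|0([0-7]+)|(0|[1-9][0-9]*))"
)


def parse_number(text):
    """Return the integer value of a number token, or None if not a number."""
    m = NUMBER.fullmatch(text)
    if m is None:
        return None
    sign, hex_digits, oct_digits, dec_digits = m.groups()
    if hex_digits is not None:
        magnitude = int(hex_digits, 16)
    elif oct_digits is not None:
        magnitude = int(oct_digits, 8)
    else:
        magnitude = int(dec_digits, 10)
    value = -magnitude if sign == "-" else magnitude
    # Constants outside the 30-bit range are an error, not a symbol.
    if not FIXNUM_MIN <= value <= FIXNUM_MAX:
        raise ValueError("constant out of 30-bit range: " + text)
    return value
```

The largest accepted constant is 536870911 (2²⁹-1); parse_number("536870912") raises instead of wrapping around.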

1.3. Conses and vectors

The text (a b c) parses to the structure (cons a (cons b (cons c nil))) where a, b, and c are arbitrary expressions.

The text (a . b) parses to the structure (cons a b) where a and b are arbitrary expressions.

The text () reads as the symbol nil.

The text [a b c] parses to a vector of expressions a, b, and c. The syntax #(a b c) has the same meaning.
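To make the mapping from text to structure concrete, here is a toy reader in Python. It is only a model of the four rules above, not the real parser: Cons, NIL, and read_expr are hypothetical names, symbols are represented as plain strings, vectors as Python lists, and malformed input is not handled gracefully.

```python
import re
from collections import namedtuple

Cons = namedtuple("Cons", ["car", "cdr"])
NIL = "nil"  # the symbol nil, modeled as a string like every other symbol


def tokenize(text):
    # "#(" is one token; brackets and parens stand alone; the rest are atoms.
    return re.findall(r"#\(|[()\[\]]|[^\s()\[\]]+", text)


def parse(tokens):
    tok = tokens.pop(0)
    if tok == "(":
        return parse_list_tail(tokens)
    if tok in ("[", "#("):
        closer = "]" if tok == "[" else ")"
        items = []
        while tokens[0] != closer:
            items.append(parse(tokens))
        tokens.pop(0)  # consume the closer
        return items
    return tok  # any other token is treated as a symbol


def parse_list_tail(tokens):
    """Parse the rest of a list after its opening parenthesis."""
    if tokens[0] == ")":
        tokens.pop(0)
        return NIL
    head = parse(tokens)
    if tokens[0] == ".":
        tokens.pop(0)
        tail = parse(tokens)
        if tokens.pop(0) != ")":
            raise SyntaxError("expected ) after dotted tail")
        return Cons(head, tail)
    return Cons(head, parse_list_tail(tokens))


def read_expr(text):
    return parse(tokenize(text))
```

Here read_expr("(a b c)") builds the same chain as Cons("a", Cons("b", Cons("c", NIL))), and read_expr("()") is just NIL.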

1.4. Comments

Text between a semicolon ; and the next end-of-line is skipped. Text between #| and |# is also skipped.

1.5. Prefix tokens

There are five special prefix tokens which parse as follows:

'a is equivalent to (quote a).
`a is equivalent to (backquote a).
,a is equivalent to (*comma* a).
,@a is equivalent to (*comma-at* a).
,.a is equivalent to (*comma-dot* a).
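The only subtlety in this table is that the two-character tokens ,@ and ,. must be tried before the bare comma. A small Python sketch of that longest-match rule follows. The function name expand_prefix is hypothetical, lists stand in for Lisp forms, and the operand is assumed to be a single atom.

```python
# Order matters: two-character prefixes come before the bare comma.
PREFIXES = [
    (",@", "*comma-at*"),
    (",.", "*comma-dot*"),
    (",", "*comma*"),
    ("'", "quote"),
    ("`", "backquote"),
]


def expand_prefix(text):
    """Expand leading prefix tokens of an atom into nested two-element forms."""
    for prefix, head in PREFIXES:
        if text.startswith(prefix):
            return [head, expand_prefix(text[len(prefix):])]
    return text
```

Prefixes nest, so expand_prefix("`,a") gives ["backquote", ["*comma*", "a"]].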

1.6. Other read macros

femtoLisp provides a few "read macros" that let you accomplish interesting tricks for textually representing data structures.

sequence	meaning
`#.e`	evaluate expression `e` and behave as if e's value had been written in place of e
`#\c`	`c` is a character; read as its Unicode value
`#n=e`	read `e` and label it as `n`, where n is a decimal number
`#n#`	read as the identically-same value previously labeled `n`
`#:gNNN` or `#:NNN`	read a gensym. NNN is a hexadecimal constant. Future occurrences of the same `#:` sequence will read to the identically-same gensym
`#sym(...)`	reads to the result of evaluating `(apply sym '(...))`
`#<`	triggers an error
`#'`	ignored; provided for compatibility
`#!`	single-line comment, for script execution support
`"str"`	UTF-8 character string; may contain newlines. `\` is the escape character. All C escape sequences are supported, plus `\u` and `\U` for Unicode values.

When a read macro involves persistent state (e.g. label assignments), that state is valid only within the closest enclosing call to read.

1.7. Builtins

Builtin functions are represented as opaque constants. Every builtin function is the value of some constant symbol, so the builtin eq, for example, can be written as #.eq ("the value of symbol eq"). Note that eq itself is still an ordinary symbol, except that its value cannot be changed.

2. Data and execution models

3. Primitive functions

eq atom not set prog1 progn symbolp numberp builtinp consp vectorp boundp + - * / < apply eval

4. Special forms

quote if lambda macro while label cond and or

5. Data structures

cons car cdr rplaca rplacd list alloc vector aref aset length

6. Other functions

read print princ load exit equal compare gensym

7. Exceptions

trycatch raise

8. Cvalues

8.1. Introduction

femtoLisp allows you to use the full range of C data types on dynamically-typed Lisp values. The motivation for this feature is that useful interpreters must provide a large library of routines in C for dealing with "real world" data like text and packed numeric arrays, and I would rather not write yet another such library. Instead, all the required data representations and primitives are provided so that such features could be implemented in, or at least described in, Lisp.

The cvalues capability makes it easier to call C from Lisp by providing ways to construct whatever arguments your C routines might require, and ways to decipher whatever values your C routines might return. Here are some things you can do with cvalues:

Call native C functions from Lisp without wrappers
Wrap C functions in pure Lisp, automatically inheriting some degree of type safety
Use Lisp functions as callbacks from C code
Use the Lisp garbage collector to reclaim malloc'd storage
Annotate C pointers with size information for bounds checking or serialization
Attach symbolic type information to a C data structure, allowing it to inherit Lisp services such as printing a readable representation
Add datatypes like strings to Lisp
Use more efficient representations for your Lisp programs' data

femtoLisp's "cvalues" is inspired in part by Python's "ctypes" package. Lisp doesn't really have first-class types the way Python does, but it does have values, hence my version is called "cvalues".

8.2. Type representations

The core of cvalues is a language for describing C data types as symbolic expressions:

Primitive types are symbols int8, uint8, int16, uint16, int32, uint32, int64, uint64, char, wchar, long, ulong, float, double, void
Arrays (array TYPE SIZE), where TYPE is another C type and SIZE is either a Lisp number or a C ulong. SIZE can be omitted to represent incomplete C array types like "int a[]". As in C, the size may only be omitted for the top level of a nested array; all array element types must have explicit sizes. Examples:
Pointer (pointer TYPE)
Struct (struct ((NAME TYPE) (NAME TYPE) ...))
Union (union ((NAME TYPE) (NAME TYPE) ...))
Enum (enum (NAME NAME ...))
Function (c-function RET-TYPE (ARG-TYPE ARG-TYPE ...))

A cvalue can be constructed using (c-value TYPE arg), where arg is some Lisp value. The system will try to convert the Lisp value to the specified type. In many cases this will work better if some components of the provided Lisp value are themselves cvalues.

Note the function type is called "c-function" to avoid confusion, since functions are such a prevalent concept in Lisp.

The function sizeof returns the size (in bytes) of a cvalue or a C type. Every cvalue has a size, but incomplete types will cause sizeof to raise an error. The function typeof returns the type of a cvalue.
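How sizeof might treat type expressions can be sketched in Python, with tuples standing in for Lisp lists. This is a model under assumptions, not the real implementation: sizeof_type is a hypothetical name, platform-dependent sizes are borrowed from Python's ctypes, and struct, union, enum, and void are left out because their layout rules are not described here.

```python
import ctypes

PRIMITIVE_SIZES = {
    "int8": 1, "uint8": 1, "int16": 2, "uint16": 2,
    "int32": 4, "uint32": 4, "int64": 8, "uint64": 8,
    "char": 1,
    # Platform-dependent sizes, taken from the host C ABI via ctypes.
    "wchar": ctypes.sizeof(ctypes.c_wchar),
    "long": ctypes.sizeof(ctypes.c_long),
    "ulong": ctypes.sizeof(ctypes.c_ulong),
    "float": ctypes.sizeof(ctypes.c_float),
    "double": ctypes.sizeof(ctypes.c_double),
}


def sizeof_type(ctype):
    """Size in bytes of a type expression such as ("array", "int32", 4)."""
    if isinstance(ctype, str):
        return PRIMITIVE_SIZES[ctype]
    kind = ctype[0]
    if kind == "pointer":
        return ctypes.sizeof(ctypes.c_void_p)
    if kind == "array":
        if len(ctype) < 3:
            # (array TYPE) with no SIZE is an incomplete type.
            raise TypeError("incomplete array type has no size")
        return sizeof_type(ctype[1]) * ctype[2]
    raise NotImplementedError("type class not modeled: " + str(kind))
```

Note that a pointer to an incomplete array still has a size; only asking for the size of the incomplete array type itself is an error.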

You are probably wondering how 32- and 64-bit integers are constructed from femtoLisp's 30-bit integers. The answer is that larger integers are constructed from multiple Lisp numbers 16 bits at a time, in big-endian fashion. In fact, the larger numeric types are the only cvalue types whose constructors accept multiple arguments. Examples:

(c-value 'int32 0xdead 0xbeef)         ; make 0xdeadbeef
(c-value 'uint64 0x1001 0x8000 0xffff) ; make 0x000010018000ffff

As you can see, missing zeros are padded in from the left.
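The arithmetic behind these two examples is a shift-and-or loop. A Python sketch, assuming each argument is an unsigned 16-bit chunk and that supplying more chunks than the type can hold is an error (the function name compose_chunks and the error behavior are my assumptions):

```python
def compose_chunks(bits, *chunks):
    """Combine 16-bit chunks, most significant first, into one integer."""
    max_chunks = bits // 16
    if not 1 <= len(chunks) <= max_chunks:
        raise ValueError("expected 1 to %d chunks" % max_chunks)
    value = 0
    for chunk in chunks:
        if not 0 <= chunk <= 0xFFFF:
            raise ValueError("chunk does not fit in 16 bits")
        # Earlier chunks move left; missing high chunks stay zero.
        value = (value << 16) | chunk
    return value
```

With this model, compose_chunks(32, 0xdead, 0xbeef) equals 0xdeadbeef, and compose_chunks(64, 0x1001, 0x8000, 0xffff) equals 0x000010018000ffff. The result is the raw bit pattern; reinterpreting it as a signed value is a separate step not shown here.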

8.3. Constructors

For convenience, a specialized constructor is provided for each class of C type (primitives, pointer, array, struct, union, enum, and c-function). For example:

(uint32 0xcafe 0xd00d)
(int32 -4)
(char #\w)
(array 'int8 [1 1 2 3 5 8])

These forms can be slightly less efficient than (c-value ...) because in many cases they will allocate a new type for the new value. For example, the fourth expression must create the type (array int8 6).

Notice that calls to these constructors strongly resemble the types of the values they create. This relationship can be expressed formally as follows:

(define (c-allocate type)
  (if (atom type)
      (apply (eval type) ())
      (apply (eval (car type)) (cdr type))))

This function produces an instance of the given type by invoking the appropriate constructor. Primitive types (whose representations are symbols) can be constructed with zero arguments. For other types, the only required arguments are those present in the type representation. Any arguments after those are initializers. Using (cdr type) as the argument list provides only required arguments, so the value you get will not be initialized.

The builtin c-value function is similar to this one, except that it lets you pass initializers.

Cvalue constructors are generally permissive; they do the best they can with whatever you pass in. For example:

(c-value '(array int8 1))      ; ok, full type provided
(c-value '(array int8))        ; error, no size information
(c-value '(array int8) [0 1])  ; ok, size implied by initializer
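The three cases above come down to one question: where does the element count come from? A Python sketch of that decision, with tuples standing in for type expressions. The name array_length is hypothetical, and letting a declared size take precedence over the initializer is an assumption.

```python
def array_length(array_type, initializer=None):
    """Number of elements to allocate for ("array", ELEMENT[, SIZE])."""
    if len(array_type) > 2:
        # Full type provided: the declared size is used.
        return array_type[2]
    if initializer is not None:
        # Incomplete type: the size is implied by the initializer.
        return len(initializer)
    raise TypeError("no size information")
```

In this model the count belongs to the value being allocated, not to the type, so an incomplete type with an initializer stays incomplete.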

ccopy, c2lisp

8.4. Pointers, arrays, and strings

Pointer types are provided for completeness and C interoperability, but they should not generally be used from Lisp. femtoLisp doesn't know anything about a pointer except the raw address and the (alleged) type of the value it points to. Arrays are much more useful. They behave like references as in C, but femtoLisp tracks their sizes and performs bounds checking.

Arrays are used to allocate strings. All strings share the incomplete array type (array char):

> (c-value '(array char) [#\h #\e #\l #\l #\o])
"hello"

> (sizeof that)
5

sizeof reveals that the size is known even though it is not reflected in the type (as is always the case with incomplete array types).

Since femtoLisp tracks the sizes of all values, there is no need for NUL terminators. Strings are just arrays of bytes, and may contain zero bytes throughout. However, C functions require zero-terminated strings. To solve this problem, femtoLisp allocates magic strings that actually have space for one more byte than they appear to. The hidden extra byte is always zero. This guarantees that a C function operating on the string will never overrun its allocated space.

Such magic strings are produced by double-quoted string literals, and by any explicit string-constructing function (such as string).

Unfortunately you still need to be careful, because it is possible to allocate a non-magic character array with no terminator. The "hello" string above is an example of this, since it was constructed from an explicit vector of characters. Such an array would cause problems if passed to a function expecting a C string.
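The difference between magic and non-magic character arrays can be modeled like this. It is a Python sketch of the idea only: the class CharArray and its methods are hypothetical, and the real allocator is written in C.

```python
class CharArray:
    """Model of a femtoLisp character array with an optional hidden NUL."""

    def __init__(self, data, magic):
        self.length = len(data)  # the size Lisp code sees
        # Magic strings reserve one extra byte that is always zero.
        self.storage = bytearray(data) + (b"\0" if magic else b"")

    def sizeof(self):
        return self.length

    def as_c_string(self):
        """What a C function scanning for NUL would see, if that is safe."""
        end = self.storage.find(0)
        if end < 0:
            # A C function would read past the end of the allocation.
            raise ValueError("no terminator inside the allocation")
        return bytes(self.storage[:end])
```

A magic array built from b"hello" reports a size of 5 but owns 6 bytes, so as_c_string() succeeds. The same bytes in a non-magic array raise an error, which corresponds to the overrun a real C function would cause. A string with an embedded zero byte is safe to pass, but C code sees only the part before that byte.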

deref

8.5. Access

cref,cset,byteref,byteset,ccopy

8.6. Memory management concerns

autorelease

8.7. Guest functions

Functions written in C but designed to operate on Lisp values are known here as "guest functions". Although they are foreign, they live in Lisp's house and so live by its rules. Guest functions are what you use to write interpreter extensions, for example to implement a function like assoc in C for performance.

Guest functions must have a particular signature:

value_t func(value_t *args, uint32_t nargs);

Guest functions must also be aware of the femtoLisp API and garbage collector.