Wayback 20050205230520
www.stripedgazelle.org/joey/src/dream/dream.html
This commit is contained in:
parent
91c4ac2da3
commit
d4be16d42e
|
@ -1,17 +1,30 @@
|
|||
<html>
|
||||
<title>Notes on the Design of the 'dream' Scheme Interpreter</title>
|
||||
<title>The 'dream' Scheme Interpreter</title>
|
||||
<h1>The 'dream' Scheme Interpreter</h1>
|
||||
Download source (gcc on x86): <a href="http://www.stripedgazelle.org/joey/src/dream.tar.gz">dream.tar.gz</a> or <a href="http://www.stripedgazelle.org/joey/src/dream.zip">dream.zip</a>
|
||||
<p>
|
||||
I am proud to announce the premier production version of my Scheme interpreter, 'dream', written entirely in x86 assembly language (GAS syntax). :-)
|
||||
All essential syntax and procedures from the R4RS standard are implemented.
|
||||
The interpreter passes all applicable tests from the 'r4rstest.scm' test suite.
|
||||
My overarching goal has been SIMPLICITY.
|
||||
I have chosen a design and implementation that is as straightforward and direct as I could imagine.
|
||||
The result is an executable that is quite small and, for an interpreter, quite fast as well. :-)
|
||||
The binary image 'dream' included in the files above was compiled on Debian GNU/Linux using 'as' and 'gcc' version 2.95.4.
|
||||
The binary image 'dream.exe' included in the files above was compiled on Debian GNU/Linux using 'i586-mingw32msvc-as' and 'i586-mingw32msvc-gcc' version 2.95.3-7.
|
||||
</p>
|
||||
<hr>
|
||||
<h1>Notes on the Design of the 'dream' Scheme Interpreter</h1>
|
||||
Download source (gcc on x86): <a href="http://www.stripedgazelle.org/joey/src/dream.tar.gz">dream.tar.gz</a>
|
||||
<p>
|
||||
The design for the 'dream' Scheme interpreter began with the design given in Abelson and Sussman's <u>Structure and Interpretation of Computer Programs</u>.
|
||||
</p>
|
||||
<h2>Scheme Object Storage and Garbage Collection</h2>
|
||||
<h2>Garbage Collection</h2>
|
||||
<p>
|
||||
Two areas of memory of equal size are used for the storage of scheme objects.
|
||||
Both are aligned on an 8-byte boundary.
|
||||
Only one of the two is used at a time by the scheme interpreter; when it becomes full, the garbage collector copies all scheme objects in use to the other memory area (which then becomes the active one).
|
||||
Dynamically allocated scheme objects other than symbols are represented by a discrete number of quad-words which are allocated consecutively within the active memory area.
|
||||
The simplest of these objects is the scheme pair which consists of two double-words each of which addresses a scheme object.
|
||||
<h2>Scheme Object Types</h2>
|
||||
The simplest object is the scheme pair which consists of two double-words each of which addresses a scheme object.
|
||||
</p>
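<p>
The two-area scheme described above can be sketched in C as follows. This is only an illustration under stated assumptions, not the interpreter's actual code (which is x86 assembly): the names <code>space_a</code>, <code>space_b</code>, <code>forward</code>, <code>collect</code>, and <code>cons</code> are hypothetical, cells hold two C pointers rather than two 32-bit double-words, and the "broken heart" forwarding marker is borrowed from the stop-and-copy collector in <u>Structure and Interpretation of Computer Programs</u>.
</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HEAP_CELLS 1024              /* hypothetical size of each area */

typedef struct cell {
    struct cell *car;                /* first half of the object  */
    struct cell *cdr;                /* second half of the object */
} cell;

static cell space_a[HEAP_CELLS], space_b[HEAP_CELLS];
static cell *active = space_a, *idle = space_b;
static size_t used = 0;              /* cells allocated in the active area */
static cell broken_heart;            /* marks a cell that has already moved */

static int in_space(const cell *p, const cell *space) {
    uintptr_t a = (uintptr_t)p;
    return a >= (uintptr_t)space && a < (uintptr_t)(space + HEAP_CELLS);
}

/* Copy p into the idle area unless it is static or already copied. */
static cell *forward(cell *p, cell *to, size_t *next) {
    if (p == NULL || !in_space(p, active)) return p;  /* not collected */
    if (p->car == &broken_heart) return p->cdr;       /* already moved */
    cell *copy = &to[(*next)++];
    *copy = *p;
    p->car = &broken_heart;          /* leave a forwarding address behind */
    p->cdr = copy;
    return copy;
}

/* Copy everything reachable from root, then swap the two areas. */
static cell *collect(cell *root) {
    size_t next = 0, scan = 0;
    root = forward(root, idle, &next);
    while (scan < next) {            /* scan the copies, forwarding their fields */
        idle[scan].car = forward(idle[scan].car, idle, &next);
        idle[scan].cdr = forward(idle[scan].cdr, idle, &next);
        scan++;
    }
    cell *tmp = active; active = idle; idle = tmp;
    used = next;
    return root;
}

/* Allocate the next consecutive cell in the active area. */
static cell *cons(cell *car, cell *cdr) {
    assert(used < HEAP_CELLS);       /* a real allocator would collect here */
    cell *c = &active[used++];
    c->car = car;
    c->cdr = cdr;
    return c;
}
```

<p>
In this sketch, any address outside the active area is treated as a static object and left alone, which mirrors the statement that symbols and statically allocated objects are not garbage-collected.
</p>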
|
||||
<p>
|
||||
Symbols and statically allocated objects are not garbage-collected, but they must begin on a 2-byte boundary so that the addresses stored in pairs are always divisible by 2.
|
||||
|
@ -31,24 +44,40 @@ The address of every symbol is stored in an array of double-words.
|
|||
</p>
|
||||
<p>
|
||||
Strings, unlike symbols, are stored along with the other dynamically allocated objects, and use the second double-word to store the address of their string of ASCII byte codes (ending with a null byte).
|
||||
Furthermore, unlike symbols, the high word of the type field is used to store the length of the string (prior to the terminating null byte.)
|
||||
Another pair of memory areas of equal size is used to store these strings of ascii byte codes.
|
||||
When the active string storage area becomes full, the garbage collector copies in-use string data to the other string storage area (which then becomes the active one).
|
||||
Otherwise the garbage collector leaves these string storage areas untouched.
|
||||
</p>
|
||||
<p>
|
||||
Vectors are stored as consecutive pairs (but the first half of the first pair is the vector type header.)
|
||||
All other objects which require more than a quad-word of storage simply store the address of a scheme pair in the second double-word and then use scheme pair and list structure to store everything they need.
|
||||
These types must set the low bit in the high byte of the low word of their type in order to indicate to the garbage collector that this address in the second double-word must be followed just as if it were the cdr of a pair.
|
||||
The high word of the type field is used to store the length of the vector.
|
||||
This and all other objects which require more than a quad-word of storage simply store the address of a scheme object in the second double-word and set the low bit in the high byte of the low word of their type to indicate to the garbage collector that this address in the second double-word must be followed just as if it were the cdr of a pair.
|
||||
</p>
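<p>
The use of the type field described above might be modelled in C roughly as follows. The bit positions follow the text (length in the high word, and the "follow" flag in the low bit of the high byte of the low word, i.e. bit 8). The type codes <code>TYPE_STRING</code> and <code>TYPE_VECTOR</code> and all function names are hypothetical; the real values are not given in these notes.
</p>

```c
#include <assert.h>
#include <stdint.h>

/* Low bit of the high byte of the low word: the second double-word
   must be followed by the garbage collector like the cdr of a pair. */
#define FOLLOW_FLAG 0x0100u

/* Hypothetical type codes, chosen only for illustration. */
#define TYPE_STRING 0x0002u
#define TYPE_VECTOR (0x0004u | FOLLOW_FLAG)

/* Build a 32-bit type field: length in the high word, type in the low word. */
static uint32_t make_type(uint32_t low_word, uint32_t length) {
    assert(low_word <= 0xFFFFu);
    assert(length <= 0xFFFFu);       /* the length must fit in 16 bits */
    return (length << 16) | low_word;
}

/* Length of a string or vector, read back from the high word. */
static uint32_t type_length(uint32_t type) {
    return type >> 16;
}

/* Should the collector trace the address in the second double-word? */
static int gc_must_follow(uint32_t type) {
    return (type & FOLLOW_FLAG) != 0;
}
```

<p>
One consequence visible in the sketch is that a length stored this way cannot exceed 65535. A string does not set the flag, since its second double-word addresses raw character data in the separate string storage area rather than a scheme object.
</p>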
|
||||
<p>
|
||||
Special forms and built-in procedures store an address to JMP to in the second double-word.
|
||||
When combinations are evaluated, if the 'car' of the combination is a symbol of the type MEMOIZABLE_SYMBOL bound to a special form or built-in procedure (in the top-level environment), then the 'car' of the combination is set! to the special form or built-in procedure to which the symbol was bound.
|
||||
This way, each time this combination is evaluated afterwards, the symbol lookup in the top-level environment will no longer be necessary, and the efficiency of the interpreter is greatly enhanced.
|
||||
However, the R4RS standard for scheme permits these symbols, bound to built-in procedures, to be redefined, and we may presume that any redefinition should then take effect retroactively for all occurrences of the symbol within the same environment.
|
||||
The memoizing technique described above is inconsistent with this requirement.
|
||||
Therefore, symbols of the type MEMOIZABLE_SYMBOL are allowed by R4RS to be bound only to special forms.
|
||||
Consequently, we provide this standards compliant behavior as a conditional compilation option, but for efficiency's sake we retain memoization of built-in procedures as the default.
|
||||
</p>
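<p>
The memoization described above, and the way it conflicts with later redefinition, can be illustrated with a small C model. Everything here is a simplifying assumption: the <code>obj</code> layout, the tag names other than MEMOIZABLE_SYMBOL, and the idea that a symbol keeps its top-level binding in its own <code>car</code> field are all hypothetical.
</p>

```c
#include <assert.h>
#include <stddef.h>

typedef enum { T_PAIR, T_SYMBOL, T_MEMOIZABLE_SYMBOL, T_BUILTIN } tag;

typedef struct obj {
    tag type;
    struct obj *car;   /* for a symbol: its top-level binding (simplification) */
    struct obj *cdr;
} obj;

static int lookups = 0;              /* counts top-level environment lookups */

static obj *lookup_top_level(obj *sym) {
    lookups++;
    return sym->car;
}

/* Return the operator of a combination, memoizing it when permitted. */
static obj *operator_of(obj *combination) {
    assert(combination->type == T_PAIR);
    obj *head = combination->car;
    if (head->type == T_MEMOIZABLE_SYMBOL) {
        obj *bound = lookup_top_level(head);
        if (bound != NULL && bound->type == T_BUILTIN) {
            combination->car = bound;    /* overwrite the symbol in place */
        }
        return bound;
    }
    if (head->type == T_SYMBOL) {
        return lookup_top_level(head);   /* ordinary symbols are never memoized */
    }
    return head;                         /* already memoized on an earlier pass */
}
```

<p>
After the first evaluation the combination no longer mentions the symbol at all, so rebinding the symbol afterwards has no effect on that combination. That is the retroactive-redefinition problem discussed above.
</p>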
|
||||
<p>
|
||||
Number types are distinguished by the high byte of the low word, which increases as the complexity of the type of number ascends the numeric tower. Integers simply store their 32-bit signed value in the second half of the quad-word. Rationals are stored as a pair of integers, thus the low bit in the high byte of the low word of their type is set so that the garbage collector will see the pair. Inexactness of a number is indicated by setting the lowest bit of the high word of the number's type. Note, however, that all internal representations of numbers are exact (no floating point numbers are used), and so inexactness is given only as an auxiliary property of the number.
|
||||
</p>
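<p>
As a rough C model of the number encoding above: the inexact flag sits in the lowest bit of the high word (bit 16), and the position in the numeric tower is read from the high byte of the low word. The specific codes <code>NUM_INTEGER</code> and <code>NUM_RATIONAL</code> are hypothetical; they are chosen only so that the rational code is larger and has the low bit of that byte set, as the text requires.
</p>

```c
#include <assert.h>
#include <stdint.h>

#define INEXACT_FLAG 0x00010000u     /* lowest bit of the high word */
#define FOLLOW_FLAG  0x00000100u     /* low bit of the high byte of the low word */

/* Hypothetical codes: only their ordering and the rational's
   follow bit are taken from the notes. */
#define NUM_INTEGER  0x0200u
#define NUM_RATIONAL 0x0300u

typedef struct {
    uint32_t type;                   /* first double-word  */
    int32_t  value;                  /* second double-word */
} integer_obj;

/* Build an integer object; inexact must be 0 or 1. */
static integer_obj make_integer(int32_t value, int inexact) {
    assert(inexact == 0 || inexact == 1);
    integer_obj n = { NUM_INTEGER | (inexact ? INEXACT_FLAG : 0u), value };
    return n;
}

/* Inexactness is only a flag; the stored value itself stays exact. */
static int is_inexact(uint32_t type) {
    return (type & INEXACT_FLAG) != 0;
}

/* Position in the numeric tower: the high byte of the low word. */
static unsigned number_rank(uint32_t type) {
    return (type >> 8) & 0xFFu;
}
```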
|
||||
<p>
|
||||
</p>
|
||||
Input ports use the low byte of the high word of the type field to store the last character read by the (peek) procedure. The second half of the quad-word for both input and output ports holds the FILE* pointer associated with the port.
|
||||
The ports returned by (current-input-port) and (current-output-port) are stored internally for efficiency's sake.
|
||||
They therefore must be treated by the garbage collector as if they were registers (which indeed they would be if only we had more registers to work with).
|
||||
<h2>The Stack</h2>
|
||||
<p>
|
||||
The scheme object stack is maintained as a scheme list (dynamically allocated as pairs).
|
||||
The garbage collector, when it runs, begins at the root of this scheme list.
|
||||
Hence when garbage collection commences, only the registers need be pushed onto this scheme object stack and popped off afterwards to ensure that all reachable objects are retained throughout the garbage collection process.
|
||||
The x86 native stack is used only for the flow of continuation control.
|
||||
Consequently when call-with-current-continuation is invoked, the native stack is copied to a scheme list with each address represented as a special form.
|
||||
Hence when garbage collection commences, only the registers and the current input and output ports need be pushed onto this scheme object stack and popped off afterwards to ensure that all reachable objects are retained throughout the garbage collection process.
|
||||
The stack pointed to by the esp register is used only for the flow of continuation control.
|
||||
Consequently when call-with-current-continuation is invoked, this native stack is copied to a scheme list with each address represented as a special form.
|
||||
</p>
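<p>
The idea of keeping the object stack as a list, so that the collector needs only one root, might look like this in C. The names are hypothetical, <code>malloc</code> and <code>free</code> stand in for allocation in the scheme object area, and <code>collect_from</code> is an empty placeholder for the real collector.
</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct pair {
    void *car;
    struct pair *cdr;
} pair;

static pair *stack = NULL;           /* the single root the collector starts from */

#define NUM_REGS 3                   /* hypothetical register count */
static void *regs[NUM_REGS];

/* Push: cons the value onto the front of the stack list. */
static void push(void *value) {
    pair *p = malloc(sizeof *p);     /* stand-in for allocating a scheme pair */
    assert(p != NULL);
    p->car = value;
    p->cdr = stack;
    stack = p;
}

/* Pop: take the car of the list and drop the first pair. */
static void *pop(void) {
    assert(stack != NULL);
    pair *top = stack;
    void *value = top->car;
    stack = top->cdr;
    free(top);                       /* in a collected heap this pair just becomes garbage */
    return value;
}

/* Placeholder: a real collector would copy everything reachable from root. */
static void collect_from(pair *root) {
    (void)root;
}

/* Save the registers on the stack, collect, then restore them. */
static void collect_garbage(void) {
    for (int i = 0; i < NUM_REGS; i++) push(regs[i]);
    collect_from(stack);
    for (int i = NUM_REGS - 1; i >= 0; i--) regs[i] = pop();
}
```

<p>
With a copying collector, the values popped back into the registers would be the relocated addresses, which is why the registers are saved on the list instead of being traced separately.
</p>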
|
||||
<h2>Scheme Registers</h2>
|
||||
<p>
|
||||
|
@ -59,7 +88,7 @@ The stop and copy garbage collector registers old, new, and scan in <u>Structure
|
|||
</p>
|
||||
<h2>Input/Output</h2>
|
||||
<p>
|
||||
The standard C library is used for file input and output in order to facilitate portability (albeit limited to x86 architectures). Specifically, only fgetc, fputc, fprintf, fopen, and fclose are used.
|
||||
The standard C library is used for file input and output in order to facilitate portability (albeit limited to x86 architectures). Specifically, only fgetc, fputc, fprintf, fopen, fclose, and fflush are used.
|
||||
</p>
|
||||
<hr>
|
||||
<a href="http://www.stripedgazelle.org/joey/index.html">Home</a>
|
||||
|
|