Wayback 20090703000521

www.stripedgazelle.org/joey/dream.html
Lassi Kortela 2023-02-22 15:58:58 +02:00
parent 35b13a5e22
commit b37fc5f6d4
1 changed files with 67 additions and 30 deletions


@ -10,10 +10,7 @@ All essential syntax and procedures from the <a href="http://www.swiss.ai.mit.ed
<br />
The interpreter is properly tail recursive and passes all applicable tests from the 'r4rstest.scm' test suite.
<br />
Rational arithmetic with 32 bit numerator and denominator is supported, but no Real or Complex numbers.
<br />
Bignums (arbitrary-precision integers) are in the works, but not yet complete.
(At this point they may be read and written, added and subtracted only).
Rational arithmetic is supported with numerator and denominator magnitudes of 32 bits, or of up to 262112 bits if the GMP library is available (the sign is stored separately). Real and Complex numbers are not yet supported, but they are in the works.
<hr>
Dream is compiled using an x86 <a href="assembler.html">assembler</a>
I have written in Scheme, with a syntax very similar to GAS.
@ -21,10 +18,23 @@ Dream is compiled using an x86 <a href="assembler.html">assembler</a>
Consequently Dream can compile itself. :-)
<hr>
<b>Download latest version for Linux on x86:</b>
<a href="/cgi-bin/wiki_joey/dream20090101.tar.gz">dream20090101.tar.gz</a>
<a href="/cgi-bin/wiki_joey/dream20090702.tar.gz">dream20090702.tar.gz</a>
<br />
This is an ELF executable that may use Linux syscalls only or, by default, is dynamically linked with <b>ld-linux.so.2</b> in order to provide access in scheme to dlopen, dlclose, and dlsym.
<br />
In particular, if <b>libgmp.so.3</b> is available then it is dynamically loaded in order to implement multiple precision integer arithmetic.
<hr>
<b>Download latest version for Windows on x86:</b>
<a href="/cgi-bin/wiki_joey/nightmare20090122.zip">nightmare20090122.zip</a>
<a href="/cgi-bin/wiki_joey/dream20090702.zip">dream20090702.zip</a>
<br />
Since dream.exe expects to find 'c:/dream/bootstrap.scm', unzip to your C: drive, or else place a copy of 'dream/bootstrap.scm' there.
<br />
This is a PE executable linked with <b>KERNEL.DLL</b>, giving access in scheme to call DLL functions via LoadLibraryA and GetProcAddress (known by 'dlopen' and 'dlsym' in the dream).
<br />
In particular, if <b>libgmp-3.dll</b> is available then it is automatically loaded in order to implement multiple precision integer arithmetic.
<hr>
Note that the sources for both versions above are the same.
The boolean 'WINDOWS' in make.scm has simply been set to #f or #t respectively, and then 'dream make.scm' was used to compile.
<hr>
<b>Check out my DreamOS based on the Dream Scheme
Interpreter as a bootable floppy disk:</b> <a href="dreamos.html">dreamos</a>
@ -34,6 +44,7 @@ Interpreter as a bootable floppy disk:</b> <a href="dreamos.html">dreamos</a>
<p>
The design for the 'dream' Scheme interpreter began with the design given in Abelson and Sussman's <a href="http://mitpress.mit.edu/sicp/full-text/book/book.html">Structure and Interpretation of Computer Programs</a>.
</p>
<p>In the following I will use the term 'byte' to refer to 8 bits, 'wyde' to refer to two bytes, 'tetra' to refer to four bytes, and 'octa' to refer to eight bytes.</p>
<h2>Garbage Collection</h2>
<p>
Two areas of memory of equal size are used for the storage of scheme objects.
@ -41,46 +52,72 @@ Both are aligned on an 8-byte boundary.
Only one of the two is used at a time by the scheme interpreter; when it becomes full, the garbage collector copies all scheme objects in use to the other memory area (which then becomes the active one).
Dynamically allocated scheme objects other than symbols are represented by a discrete number of quad-words which are allocated consecutively within the active memory area.
<h2>Scheme Object Types</h2>
The simplest object is the scheme pair which consists of two double-words each of which addresses a scheme object.
The simplest object is the scheme pair which consists of two tetras each of which addresses a scheme object.
</p>
<p>
Symbols and statically allocated objects are not garbage-collected, but they must begin on a 2-byte boundary so that the addresses stored in pairs are always divisible by 2.
By virtue of the fact that all scheme objects begin on a 2-byte boundary, scheme objects other than scheme pairs are differentiated from scheme pairs by storing a double-word which is NOT divisible by 2 in the first half of the quad-word.
This double-word represents the type of scheme object.
Symbols and statically allocated objects are not garbage-collected, but they must begin on a wyde boundary so that the addresses stored in pairs are always divisible by 2.
By virtue of the fact that all scheme objects begin on a wyde boundary, scheme objects other than scheme pairs are differentiated from scheme pairs by storing a tetra which is NOT divisible by 2 in the first tetra of the octa.
This tetra represents the type of scheme object.
The low byte of this type represents the major type classification used by the procedures boolean?, pair?, procedure?, char?, number?, symbol?, string?, vector?, input-port?, and output-port?.
Statically allocated objects are given a type which is negative so that the garbage collector can easily ignore them.
Statically allocated objects are given a type which is negative (sign bit set) so that the garbage collector can easily ignore them.
All 256 ascii chars, #t and #f, and the end-of-file object are statically allocated.
Only one double-word is necessary for these statically allocated objects.
In the case of chars and booleans, the value is stored in the high byte of the low word.
Only one tetra is necessary for these statically allocated objects.
In the case of chars and booleans, the value is stored in the high byte of the low wyde.
</p>
<table border>
<tr>
<th></th><th>Bit:</th><th>0-7</th><th>8</th><th>9</th><th>10-15</th><th>16</th><th>17</th><th>18-30</th><th>31</th>
</tr>
<tr>
<th colspan="2">Scheme Object</th><th>Major Type</th><th>CDR is scheme?</th><th colspan="2">Minor Type</th><th colspan="3">Type Specific Info</th><th>Statically Allocated?</th>
</tr>
<tr>
<td rowspan="2">NUMBER</td><td>INTEGER</td><td rowspan="2">3</td><td>0</td><td colspan="2">0</td><td rowspan="2">Exact?</td><td>Negative?</td><td>Length in tetras (or 0 if value is stored directly in CDR)</td><td rowspan="2">0</td>
</tr>
<tr>
<td>RATIONAL</td><td>1</td><td colspan="2">0</td><td colspan="2"></td>
</tr>
<tr>
<td rowspan="2">PORT</td><td>OUTPUT</td><td rowspan="2">5</td><td>0</td><td colspan="2">0</td><td colspan="3" rowspan="2"></td><td rowspan="2">0</td>
</tr>
<tr>
<td>INPUT</td><td>0</td><td colspan="2">1</td>
</tr>
<tr>
<td colspan="2">STRING</td><td>7</td><td>0</td><td>Immutable?</td><td colspan="4">Length in bytes</td><td>0</td>
</tr>
<tr>
<td colspan="2">VECTOR</td><td>9</td><td>1</td><td>0</td><td colspan="4">Length in objects</td><td>0</td>
</tr>
</table>
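<p>
As a rough illustration of the header layout in the table, the following Python sketch splits a type tetra into its fields. The function and constant names are hypothetical and do not come from the Dream sources; only the bit positions and the major type numbers are taken from the table. The sketch returns bit 16 as part of the raw type-specific info rather than interpreting it.
</p>

```python
# Bit positions are read off the header table; all names here are illustrative.
NUMBER, PORT, STRING, VECTOR = 3, 5, 7, 9   # major types (bits 0-7)

def is_pair(first_tetra):
    # A pair starts with an even address; a typed object starts with an odd header.
    return first_tetra & 1 == 0

def decode_header(tetra):
    """Split a 32-bit type header into the fields named in the table."""
    return {
        "major": tetra & 0xFF,                    # bits 0-7
        "cdr_is_scheme": bool((tetra >> 8) & 1),  # bit 8
        "minor": (tetra >> 9) & 0x7F,             # bits 9-15
        "info": (tetra >> 16) & 0x7FFF,           # bits 16-30
        "static": bool((tetra >> 31) & 1),        # bit 31
    }

def vector_header(length):
    # VECTOR row: major type 9, bit 8 set, length in bits 10-30.
    assert 0 <= length < (1 << 21)
    return VECTOR | (1 << 8) | (length << 10)

def sequence_length(tetra):
    # The STRING and VECTOR rows keep their length in bits 10-30.
    return (tetra >> 10) & 0x1FFFFF

def integer_length_in_tetras(tetra):
    # INTEGER row: bits 18-30; 0 means the value is stored directly in the CDR.
    return (tetra >> 18) & 0x1FFF

# A 13-bit length field counting 32-bit tetras gives the largest magnitude.
MAX_INTEGER_BITS = 0x1FFF * 32
```

<p>
With this layout the largest integer length is 8191 tetras, which is where the 262112 bit limit on magnitudes comes from (8191 times 32).
</p>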
<p>
Symbols are also given a negative type, since they are not garbage collected, but they are dynamically created in a separate memory area devoted to them.
Each symbol begins with the double-word type header (on a 2 byte boundary) which is followed by the bytes of ascii code that form the name of the symbol.
A null (0) byte marks the end of the symbol name.
Each symbol begins with the tetra type header (on a wyde boundary) which is followed by the bytes of ascii code that form the name of the symbol.
A zero byte marks the end of the symbol name.
The address of every symbol is stored in a simple hash keyed on just the first character of the symbol.
</p>
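<p>
A hash keyed on the first character can be pictured with the Python sketch below. It is only an approximation under stated assumptions: the class names are hypothetical, and keeping each bucket as a list searched linearly is an assumption, since the text does not say how symbols sharing a first character are chained.
</p>

```python
class Symbol:
    """Stand-in for a symbol object: a type header followed by the name bytes."""
    def __init__(self, name):
        self.name = name

class SymbolTable:
    def __init__(self):
        # first character -> list of symbols starting with it (assumed chaining)
        self.buckets = {}

    def intern(self, name):
        bucket = self.buckets.setdefault(name[:1], [])
        for symbol in bucket:
            if symbol.name == name:
                return symbol          # already known: reuse the same object
        symbol = Symbol(name)          # symbols are never collected, so just append
        bucket.append(symbol)
        return symbol
```

<p>
Interning the same name twice returns the same object, so symbols can be compared by address, while names such as car and cdr share the bucket for the character c.
</p>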
<p>
Strings, unlike symbols, are stored along with the other dynamically allocated objects, and use the second double-word to store the address of their string of ascii byte codes (ending with a null byte).
Furthermore, unlike symbols, the high word of the type field is used to store the length of the string (prior to the terminating null byte.)
Another pair of memory areas of equal size is used to store these strings of ascii byte codes.
Strings, unlike symbols, are stored along with the other dynamically allocated objects, and use the second tetra to store the address of their string of ascii byte codes.
Furthermore, unlike symbols, the length of the string is stored in the header.
For mutable strings (the default) another pair of memory areas of equal size is used to store these strings of ascii byte codes.
When the active string storage area becomes full, the garbage collector copies in-use string data to the other string storage area (which then becomes the active one).
Otherwise the garbage collector leaves these string storage areas untouched.
The ascii data for immutable strings (produced by symbol->string) are always ignored by the garbage collector.
</p>
<p>
Vectors are stored as consecutive pairs (but the first half of the first pair is the vector type header.)
The high word of the type field is used to store the length of the vector.
This and all other objects which require more than a quad-word of storage simply store the address of a scheme object in the second double-word and set the low bit in the high byte of the low word of their type to indicate to the garbage collector that this address in the second double-word must be followed just as if it were the cdr of a pair.
Vectors are stored as consecutive pairs (but the first tetra of the first pair is the vector type header.)
The length of the vector is stored in the header.
This and all other objects which require more than an octa of storage simply store the address of a scheme object in the second tetra and set the low bit in the high byte of the low wyde of their type to indicate to the garbage collector that this address in the second tetra must be followed just as if it were the cdr of a pair.
</p>
<p>
Procedures store their starting address in the second double-word.
Procedures store their starting address in the second tetra.
</p>
<p>
Number types are distinguished by the high byte of the low word, which increases as the complexity of the type of number ascends the numeric tower. Integers simply store their 32 bit signed value in the second half of the quad-word. Rationals are stored as a pair of integers, thus the low bit in the high byte of the low word of their type is set so that the garbage collector will see the pair. Inexactness of a number is indicated by setting the lowest bit of the high word of the number's type. Note, however, that all internal representations of numbers are exact (no floating point numbers are used), and so inexactness is given only as an auxiliary property of the number.
Number types are distinguished by the high byte of the low wyde, which increases as the complexity of the type of number ascends the numeric tower. Integers simply store their 32 bit unsigned value in the second tetra. But integers requiring more than 32 bits for their absolute value store their length (in tetras) in the high 14 bits of their type tetra (sign bit clear), and the address of their data (just like string storage) in the second tetra. Rationals are stored as a pair of integers, thus the low bit in the high byte of the low wyde of their type is set so that the garbage collector will copy the pair. Inexactness of a number is indicated by setting the lowest bit of the high wyde of the number's type. Note, however, that all internal representations of numbers are exact (no floating point numbers are used), and so inexactness is given only as an auxiliary property of the number.
</p>
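<p>
The sign-and-magnitude integer representation can be sketched in Python as follows. The function name and the dictionary fields are hypothetical, and the order of the tetras in the data area (least significant first) is an assumption, as the text does not specify it. The limit of 8191 tetras corresponds to the 262112 bit maximum magnitude.
</p>

```python
MAX_TETRAS = 8191   # 262112 bits / 32 bits per tetra

def encode_integer(n):
    """Return an illustrative sign-and-magnitude encoding of the integer n."""
    negative = n < 0
    magnitude = abs(n)
    if magnitude < (1 << 32):
        # Small case: length 0, the unsigned value sits in the second tetra.
        return {"negative": negative, "length": 0, "cdr": magnitude}
    tetras = []
    while magnitude:
        tetras.append(magnitude & 0xFFFFFFFF)   # least significant first (assumed)
        magnitude >>= 32
    if len(tetras) > MAX_TETRAS:
        raise OverflowError("magnitude exceeds 262112 bits")
    # Large case: the header holds the length, the second tetra would hold
    # the address of the data; here the data is simply returned alongside.
    return {"negative": negative, "length": len(tetras), "data": tetras}
```

<p>
For example, -5 fits in the second tetra with the negative flag set, while 2 to the power 32 needs two tetras of data.
</p>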
<p>
Input ports use the low byte of the high word of the type field to store the last character read by the (peek) procedure. The second half of the quad-word for both input and output ports holds the file descriptor associated with the port.
The ports returned by (current-input-port) and (current-output-port) are stored internally for efficiency's sake.
They therefore must be treated by the garbage collector as if they were registers.
Input ports use the low byte of the high wyde of the type field to store the last character read by the (peek) procedure. The second tetra for both input and output ports holds the file descriptor associated with the port.
The ports returned by (current-input-port) and (current-output-port) are stored internally for efficiency's sake; they therefore must be treated by the garbage collector as if they were registers (and hence saved on the scheme stack before garbage collection begins and restored afterward).
</p>
<p>
Closures are initially stored simply as scheme along with the enclosing environment.
@ -95,14 +132,14 @@ This garbage collection process for machine code may be invoked from scheme with
<p>
The scheme object stack is maintained as a scheme list (dynamically allocated as pairs).
The garbage collector, when it runs, begins at the root of this scheme list.
Hence when garbage collection commences, only the registers and the current input and output ports need be pushed on to this scheme object stack and popped off afterwards to insure that all reachable objects are retained thoughout the garbage collection process.
The native stack (pointed to by the ESP register) is used for the flow of continuation control.
Consequently when call-with-current-continuation is invoked, this native stack is copied to a scheme list with each address represented as an integer.
Hence when garbage collection commences, only the registers and the current input and output ports need be saved on this scheme object stack and restored afterward to ensure that all reachable objects are retained throughout the garbage collection process.
The native x86 stack (pointed to by the ESP register) is used for the flow of continuation control.
Consequently when call-with-current-continuation is invoked, this native stack is copied to a scheme list with each address represented as an Integer.
</p>
<h2>Scheme Registers</h2>
<p>
The registers denoted by EXP, ENV, UNEV, ARGL, VAL, and FREE in <a href="http://mitpress.mit.edu/sicp/full-text/book/book.html">Structure and Interpretation of Computer Programs</a> are implemented by the machine registers EDX, EBP, ESI, EDI, EAX, and EBX respectively.
The registers EXP, ENV, UNEV, ARGL, and VAL must point to a valid scheme object (or null) when the garbage collector is invoked.
The registers EXP, ENV, UNEV, ARGL, and VAL must point to a valid scheme object (or be zero) when the garbage collector is invoked.
Likewise, the garbage collector registers OLD, NEW, and SCAN are implemented by the machine registers ESI, EDI, and EAX respectively.
</p>
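<p>
The two-area copying scheme described under Garbage Collection, together with the OLD, NEW, and SCAN registers, resembles the stop-and-copy collector presented in Structure and Interpretation of Computer Programs. The Python sketch below shows that general technique for pairs only. It is not Dream's actual code: the heap model (a list of two-element cells, with a tagged tuple standing in for an address) and all names are assumptions made for illustration.
</p>

```python
BROKEN_HEART = object()   # forwarding marker, named as in SICP's collector

def collect(old, roots):
    """Copy every pair reachable from roots out of `old`.

    A heap is a list of [car, cdr] cells. ("ptr", i) refers to cell i;
    any other value is treated as an immediate. Returns (new, new_roots).
    Note that `old` is overwritten with forwarding addresses.
    """
    new = []

    def relocate(value):
        if not (isinstance(value, tuple) and value[0] == "ptr"):
            return value                    # immediate: nothing to copy
        cell = old[value[1]]
        if cell[0] is BROKEN_HEART:
            return cell[1]                  # already moved: use forwarding address
        new.append([cell[0], cell[1]])      # copy to the free end of the new area
        forwarded = ("ptr", len(new) - 1)
        cell[0], cell[1] = BROKEN_HEART, forwarded
        return forwarded

    new_roots = [relocate(root) for root in roots]
    scan = 0
    while scan < len(new):                  # scan chases the free end of `new`
        new[scan][0] = relocate(new[scan][0])
        new[scan][1] = relocate(new[scan][1])
        scan += 1
    return new, new_roots
```

<p>
Given a heap of three cells where only two are reachable from the root (and they refer to each other), the new area ends up holding exactly those two cells with their addresses rewritten.
</p>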
<hr>