<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD>
   <TITLE>Galapagos - Implementation</TITLE>
   <META NAME="Author" CONTENT="">
   <META NAME="GENERATOR" CONTENT="Mozilla/3.01Gold (Win95; I) [Netscape]">
</HEAD>
<BODY background=stone.jpg>

<CENTER>
<A href="gui.html"><IMG border=0 src=prev.gif ALT=" [PREV] "></A>
<A href="manual.html"><IMG border=0 src=next.gif ALT=" [NEXT] "></A>
<A href="index.html#toc"><IMG border=0 src=toc.gif ALT=" [TOC] "></A>
</CENTER>
<HR width=80% align=right color=blue>
<P>

<H1 ALIGN=CENTER><A NAME="top"></A>IMPLEMENTATION</H1>

<UL>
 <LI><A HREF="implementation.html#GARBAGE">Garbage Collection</A></LI>
 <LI><A HREF="implementation.html#BOARDS">Boards</A></LI>
 <LI><A HREF="implementation.html#TURTLES">Turtles</A></LI>
 <LI><A HREF="implementation.html#VISION">Vision</A></LI>
 <LI><A HREF="implementation.html#INTERRUPTS">Interrupts</A></LI>
 <LI><A HREF="implementation.html#CLASS">Class Organization</A></LI>
</UL>

<HR>
<P>Galapagos was written using Microsoft Visual C++, and is designed to
run under Windows 95. We chose Windows 95 because it seems to have the
largest potential Galapagos user base. The following sections describe
some implementation details of Galapagos.<BR>
</P>

<P>The SCM interpreter we used as a base is written in C. Most of the original
code we added was written in C++. The parts we used from SCM are almost
identical to the original. In fact, by changing the scmconfig.h file (which
contains machine-specific configuration) and #defining THREAD to be null,
the C files should become equivalent to the original SCM sources we started
from.<BR>
</P>

<P>We have implemented two MT-safe FIFO message queue classes. Both block
a reader that tries to read from an empty queue. <B>CMsgQx</B>, the extended
message queue, supports the same interface as the one provided in Scheme,
plus additional support for "Urgent Messages", which take
precedence over all other messages. <B>CMessageQueue</B> is a message queue
with exactly the same interface as the Scheme-level message queues, but
it contains internal logic to handle "Urgent" messages. These are used
to deal with cases where a synchronous response is needed, such as I/O, garbage
collection, and Scheme-level inter-thread communication.<BR>
</P>
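<P>As a rough illustration of the "Urgent Messages" idea, the following C++
sketch shows a blocking FIFO queue in which urgent messages overtake normal
ones. It is not the Galapagos source: the class name <TT>UrgentQueue</TT>, its
method names, and the use of the standard C++ threading library (rather than
Win32 primitives) are assumptions made for this example.<BR>
</P>

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical stand-in for an "extended" message queue; not the real CMsgQx.
class UrgentQueue {
public:
    // Appends a normal message and wakes one blocked reader.
    void post(std::string msg) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            normal_.push_back(std::move(msg));
        }
        ready_.notify_one();
    }

    // Appends an urgent message; it is read before any normal message.
    void post_urgent(std::string msg) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            urgent_.push_back(std::move(msg));
        }
        ready_.notify_one();
    }

    // Blocks while both lists are empty, then returns the oldest urgent
    // message if there is one, otherwise the oldest normal message.
    std::string read() {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !urgent_.empty() || !normal_.empty(); });
        std::deque<std::string>& source = urgent_.empty() ? normal_ : urgent_;
        std::string msg = std::move(source.front());
        source.pop_front();
        return msg;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::deque<std::string> urgent_;
    std::deque<std::string> normal_;
};

// Usage example: the urgent message is read first, the rest stay in FIFO order.
inline void urgent_queue_demo() {
    UrgentQueue q;
    q.post("draw line");
    q.post_urgent("synchronous request");
    q.post("print text");
    assert(q.read() == "synchronous request");
    assert(q.read() == "draw line");
    assert(q.read() == "print text");
}
```

<P>Keeping two separate lists, instead of inserting urgent messages at the
front of a single list, preserves FIFO order among the urgent messages
themselves.<BR>
</P>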

<H2><A NAME="FITTING SCM INTO"></A>FITTING SCM INTO WINDOWS<BR>
</H2>

<P>Galapagos is based on SCM, which is a single-threaded Scheme interpreter
built around a read-evaluate-print loop (repl). The most important issue in
migrating SCM was how to maintain the interpreter's natural repl-based approach,
yet allow multiple threads to interact and Windows messages to
be processed quickly.<BR>
</P>

<P>We used Win32's multithreading capabilities to solve these problems.
A single thread handles all aspects of the GUI - in a sense, "all
that is Windows": graphic boards, turtles, consoles, menus and so
on. Each interpreter runs in a thread of its own, interacting with the
GUI using a message queue similar to the one provided at the Scheme level.<BR>
</P>

<CENTER><P><IMG border=0 SRC="img00011.gif" HEIGHT=182 WIDTH=530><BR>
</P></CENTER>

<P>The GUI thread handles commands received both from the OS and from the
different interpreters running on their own threads. To keep
responses as fast as possible, priorities are used: OS messages (such as window
updates and input devices) get the highest priority; console (text) messages
come second, and graphics messages are last. This allows the interpreters
to run interactively in a satisfactory manner.<BR>
</P>
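<P>The three priority levels described above can be sketched as a small
priority inbox. This is only an illustration under assumed names
(<TT>Source</TT>, <TT>GuiMessage</TT>, <TT>GuiInbox</TT>); the text does not
say how the GUI thread actually orders its work, and the sequence number used
here to keep messages of the same kind in arrival order is an assumption.<BR>
</P>

```cpp
#include <cassert>
#include <cstdint>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Smaller value = served earlier. The three levels follow the text above.
enum class Source { OS = 0, Console = 1, Graphics = 2 };

struct GuiMessage {
    Source source;
    std::uint64_t seq;   // arrival order, used to break ties
    std::string text;
};

// std::priority_queue keeps the "largest" element on top, so this comparator
// returns true when `a` should be served AFTER `b`.
struct ServedLater {
    bool operator()(const GuiMessage& a, const GuiMessage& b) const {
        if (a.source != b.source) return a.source > b.source;
        return a.seq > b.seq;
    }
};

// Hypothetical inbox for the GUI thread (single-threaded sketch, no locking).
class GuiInbox {
public:
    void post(Source source, std::string text) {
        pending_.push(GuiMessage{source, next_seq_++, std::move(text)});
    }

    bool empty() const { return pending_.empty(); }

    // Removes and returns the most important pending message.
    GuiMessage take() {
        assert(!pending_.empty());
        GuiMessage msg = pending_.top();
        pending_.pop();
        return msg;
    }

private:
    std::priority_queue<GuiMessage, std::vector<GuiMessage>, ServedLater> pending_;
    std::uint64_t next_seq_ = 0;
};
```

<P>With this ordering, a burst of graphics messages from a busy interpreter
cannot delay a window repaint or a console echo that arrives later.<BR>
</P>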

<P>Each SCM interpreter thread runs in the old-fashioned repl mode. When
a computation is over, the interpreter blocks until new input comes from
the GUI thread. All blocking functions were modified to allow synchronous
messages (such as the ones generated by <B>tell-thread</B>) to work. In
addition, SCM's "poll routine" is used to force checking for
such messages even during computations.<BR>
</P>

<P>An additional thread is used for garbage collection. It is described
in detail in the section dealing with garbage collection.<BR>
</P>

<H2><A NAME="GARBAGE"></A>GARBAGE COLLECTION</H2>

<P>In this section we briefly describe SCM's garbage collector, and
then discuss the modifications made to adapt it to Galapagos's multithreaded
computations. It should be noted that the garbage collector used is a portable
garbage collector taken from "SCHEME IN ONE DEFUN, BUT IN C THIS TIME",
by George J Carrette &lt;gjc@world.std.com&gt;.<BR>
</P>

<P>SCM uses a conservative Mark &amp; Sweep garbage collector (GC). All
Scheme data objects are stored in units with a common layout (called "cells"):
they are 8-byte aligned; they reside in specially allocated memory segments
(called hplims); they are the size of two pointers (so a Scheme cons is
exactly a cell); and they contain a special GC bit used by the garbage
collector. This bit is 0 during actual computations. When a new cell is
needed and all the hplims are used up, garbage collection is initiated. If
it does not free enough space to pass a certain threshold, a new hplim
is allocated.<BR>
</P>

<P>The first stage in garbage collection is marking all cells which are
not to be deleted. Some terminology might be helpful here:<BR>
</P>

<UL>
<LI>A cell (or any data object) is called <B><I>alive</I></B> if it may
in some way influence the future of the computation. Needless to say, determining
exactly which cells are alive and which are not is impossible, because it
depends on the future of the computation. </LI>
</UL>

<UL>
<LI>A cell is called <B><I>reachable</I></B> if the computation can read
its value. Some data is <I>immediately reachable</I>: the data on the machine's
stack or in the CPU registers, for example; some interpreters also store some
information in a fixed location so it is permanently reachable. In SCM the
array <TT><FONT FACE="Courier New">sys_protects[]</FONT></TT> is used for
this purpose. The set of reachable cells is the union of all immediately
reachable cells and all cells pointed to by reachable cells, recursively.
</LI>
</UL>

<P>Obviously, all unreachable data is dead. Conservative garbage collectors
treat all reachable data as alive.<BR>
</P>

<P>The Mark stage of the garbage collector scans the <TT><FONT FACE="Courier New">sys_protects[]</FONT></TT>
array and the machine's stack and registers for anything that looks like
a valid cell pointer. All cell pointers have their 3 least significant bits zero,
and lie in one of a few known ranges (the hplims). The garbage collector
searches for anything matching a cell pointer's bit pattern, and treats it as an
immediately reachable cell pointer. In some cases, this means that an integer,
for example, happens to match the binary pattern and is thus interpreted
as a cell pointer. However, this only means that some cell or cells are
marked as reachable although they are not. Because of the uniform structure
of the cell and its limited range of possible locations, such an accident
is guaranteed not to corrupt memory. Furthermore, if we accept the assumption
that integers are usually relatively small, and memory addresses are relatively
big, we conclude that such accidents are not likely to happen often
anyway.<BR>
</P>
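<P>The bit-pattern test described above might look roughly like the following
C++ sketch. The names <TT>Segment</TT>, <TT>looks_like_cell</TT> and
<TT>find_roots</TT> are invented for this example, and a plain vector of words
stands in for the stack, the registers and <TT>sys_protects[]</TT>; SCM's own
code is organized differently.<BR>
</P>

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical description of one heap segment ("hplim"): [begin, end).
struct Segment {
    std::uintptr_t begin;
    std::uintptr_t end;
};

// True if `word` has the bit pattern of a cell pointer: its 3 least
// significant bits are zero (8-byte alignment) and it falls inside one of
// the known segments. Nothing is dereferenced, so a false positive is safe.
inline bool looks_like_cell(std::uintptr_t word, const std::vector<Segment>& segments) {
    if ((word & 0x7u) != 0) return false;
    for (const Segment& s : segments) {
        assert(s.begin <= s.end);
        if (word >= s.begin && word < s.end) return true;
    }
    return false;
}

// Scans a block of raw words and returns every word that must be treated
// as an immediately reachable cell pointer.
inline std::vector<std::uintptr_t> find_roots(const std::vector<std::uintptr_t>& words,
                                              const std::vector<Segment>& segments) {
    std::vector<std::uintptr_t> roots;
    for (std::uintptr_t word : words) {
        if (looks_like_cell(word, segments)) roots.push_back(word);
    }
    return roots;
}
```

<P>A small integer such as 5 fails both tests, which is the informal reason
why false positives are expected to be uncommon.<BR>
</P>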

<P>During the Mark stage, the garbage collector recursively finds (a superset
of) all live cells, and marks them by setting their special GC bit to 1.
The second stage is the Sweep stage, in which all the hplims are scanned
linearly, and every cell which is not marked is recognized as dead and
is reclaimed as free. Marked cells get unmarked so they are ready
for the next garbage collection.<BR>
</P>
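<P>The two stages can be tried out on a toy heap. In the sketch below, cells
refer to each other by index instead of by pointer, and an explicit work list
replaces the recursion; both choices, and all names (<TT>ToyCell</TT>,
<TT>ToyHeap</TT>), are simplifications for illustration and are not how SCM
lays out its heap.<BR>
</P>

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Index value meaning "no cell" (plays the role of a non-pointer value).
const std::size_t kNil = static_cast<std::size_t>(-1);

struct ToyCell {
    std::size_t car;
    std::size_t cdr;
    bool gc_mark;   // the special GC bit; false outside of a collection
    bool in_use;    // false while the cell is on the free list
};

class ToyHeap {
public:
    explicit ToyHeap(std::size_t cell_count)
        : cells_(cell_count, ToyCell{kNil, kNil, false, false}) {}

    // Allocates the first free cell, or returns kNil when none is left.
    std::size_t cons(std::size_t car, std::size_t cdr) {
        for (std::size_t i = 0; i < cells_.size(); ++i) {
            if (!cells_[i].in_use) {
                cells_[i] = ToyCell{car, cdr, false, true};
                return i;
            }
        }
        return kNil;
    }

    // Mark stage: sets the GC bit on everything reachable from `root`.
    void mark(std::size_t root) {
        std::vector<std::size_t> pending(1, root);
        while (!pending.empty()) {
            const std::size_t i = pending.back();
            pending.pop_back();
            if (i == kNil) continue;
            assert(i < cells_.size());
            if (!cells_[i].in_use || cells_[i].gc_mark) continue;
            cells_[i].gc_mark = true;
            pending.push_back(cells_[i].car);
            pending.push_back(cells_[i].cdr);
        }
    }

    // Sweep stage: linear scan; frees unmarked cells, unmarks the rest.
    // Returns the number of cells reclaimed.
    std::size_t sweep() {
        std::size_t freed = 0;
        for (ToyCell& cell : cells_) {
            if (cell.in_use && !cell.gc_mark) {
                cell.in_use = false;
                ++freed;
            }
            cell.gc_mark = false;
        }
        return freed;
    }

    bool in_use(std::size_t i) const { return cells_.at(i).in_use; }

private:
    std::vector<ToyCell> cells_;
};
```

<P>Checking the GC bit before following a cell is also what keeps the Mark
stage from looping forever on circular structures.<BR>
</P>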

<P>Mark &amp; Sweep garbage collection has two main disadvantages. First,
it requires all computation to stop while garbage collection is in
progress. In Galapagos, since all threads use a shared memory heap, this
means all threads must synchronize and halt while garbage is collected.
Second, Mark &amp; Sweep is very likely to cause memory fragmentation.
However, since cells are equally sized, fragmentation is only rarely a
problem.<BR>
</P>

<P>We chose to stick with Mark &amp; Sweep in Galapagos because of its
two major advantages: simplicity and efficiency. Mark &amp; Sweep GC does
not affect computation speed, because direct pointers are used. Most concurrent
garbage collectors work by making all pointers indirect, which may slow
computations down considerably. The need to halt all threads for GC is
accepted. Since memory is shared, it is only fair to stop all threads
when GC is needed: threads would probably halt anyway, since cells are allocated
continuously during computations.<BR>
</P>

<P>Two major issues arise when trying to multithread the garbage
collector. One is the synchronization of the different threads, which run
almost completely unaware of one another; the second is the need to mark
data from every thread's own stack, registers, and <TT><FONT FACE="Courier New">sys_protects[]</FONT></TT>.
We solved these two issues by combining them into one.<BR>
</P>

<P>The intuitive approach might be to let each thread mark its own information,
and then sweep centrally. However, since synchronization of threads is
mandatory, letting every thread mark its own data would lead only to redundant
marking and to excessive context switches, since each thread has to become
active. Therefore we created the "<B>Garbage Collection daemon</B>"
(GCd), which runs in a distinct thread and lasts for the whole Galapagos
session. The GCd is not an interpreter, but a mechanism which keeps track
of live threads and their need for GC. The GCd thread is always blocked,
except when a thread notifies it of its birth or death, or when a thread
indicates the need for garbage collection. Since the GC daemon is blocked
whenever it is not needed, and becomes the exclusive running thread
during actual GC (with the exception of the GUI thread), its existence
does not hurt performance.<BR>
</P>

<P>To explain how the GCd synchronizes all threads, let us examine the
three-way protocol involved. <TT><FONT FACE="Courier New">freelist</FONT></TT>
is a global pointer to a linked list of free cells. It can hold
either a cell pointer, a value indicating "busy" (thus implementing
busy/wait protection over it), or "end of memory", which is found
at the end of the linked list. MIB stands for <B><I>Memory Information
Block</I></B>, which is a block of memory containing all of a thread's
immediately reachable data.<BR>
</P>
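<P>One way to read the three states of <TT>freelist</TT> is sketched below
with a C++ atomic pointer. The sentinel objects, the class name
<TT>FreeList</TT> and the use of <TT>std::atomic</TT> are assumptions for this
example; the text only says that a "busy" value provides busy/wait protection
and that "end of memory" terminates the list.<BR>
</P>

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

struct Cell {
    Cell* next;              // link used while the cell is on the free list
    std::uintptr_t payload;  // second word of the two-word cell
};

// Hypothetical sentinel encodings for the two non-pointer states.
static Cell busy_marker;
static Cell end_marker;
Cell* const kBusy = &busy_marker;
Cell* const kEndOfMemory = &end_marker;

class FreeList {
public:
    // Links `count` cells into a list terminated by kEndOfMemory.
    FreeList(Cell* cells, std::size_t count)
        : head_(count > 0 ? cells : kEndOfMemory) {
        assert(cells != nullptr || count == 0);
        for (std::size_t i = 0; i < count; ++i) {
            cells[i].next = (i + 1 < count) ? &cells[i + 1] : kEndOfMemory;
        }
    }

    // Returns a free cell, or nullptr when the list is exhausted, in which
    // case the caller would have to request garbage collection.
    Cell* allocate() {
        for (;;) {
            Cell* head = head_.exchange(kBusy);   // claim the list
            if (head == kBusy) continue;          // another thread holds it: spin
            if (head == kEndOfMemory) {
                head_.store(kEndOfMemory);        // leave the exhausted state visible
                return nullptr;
            }
            head_.store(head->next);              // release with the new head
            return head;
        }
    }

private:
    std::atomic<Cell*> head_;
};
```

<P>Because the exhausted state is written back rather than cleared, every
other thread that tries to allocate afterwards sees it as well.<BR>
</P>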

<P><B><U>GCd scenario:</U></B> GCd is blocked until a thread sends a GC
request. </P>

<OL>
<LI>GCd scans through its list of active threads, and sends each a <I>MIB
request</I>. </LI>

<LI>It then blocks until all MIBs are received. GCd ignores any further
GC request messages it gets. </LI>

<LI>At this point all threads are blocked. The GCd has therefore gained
exclusive access to the hplims. The GCd now marks all reachable cells,
inspecting each MIB for immediately reachable cells and proceeding
recursively. Then, it sweeps. </LI>

<LI>If needed, the GCd allocates a fresh hplim. </LI>

<LI>GCd sends every thread a message allowing it to resume. Then it blocks,
waiting for the next GC request. </LI>
</OL>

<P><B><U>Scenario 1:</U></B> A thread needs to allocate a cell but can't.
</P>

<OL>
<LI>The thread sends GCd a <I>GC request message</I>. </LI>

<LI>Then it suspends until GCd sends it a <I>MIB request</I>. </LI>

<LI>When one arrives, the thread generates a MIB and sends it to the
GCd. </LI>

<LI>It then blocks again until GCd notifies it that GC is done. </LI>

<LI>At this point free cells are available and the computation can resume.
</LI>
</OL>

<P><B><U>Scenario 2:</U></B> A thread receives a <I>MIB request</I>. This
may happen within a computation or while the thread is otherwise blocked -
waiting for input, for example. </P>

<OL>
<LI>The thread generates a MIB and sends it to the GCd. </LI>

<LI>It then blocks until GCd notifies it that GC is done. </LI>
</OL>
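<P>The daemon's side of the scenarios above can be modeled as a small state
machine. The sketch below is single-threaded and only tracks bookkeeping: the
class <TT>GcDaemonModel</TT>, its method names, and the use of integer thread
ids and index-valued roots are all invented for illustration, and the actual
marking, sweeping and message passing are left out.<BR>
</P>

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical model of the GCd's bookkeeping; not the Galapagos code.
class GcDaemonModel {
public:
    // A thread announces its birth to the daemon.
    void thread_born(int id) { threads_.insert(id); }

    // Handles a GC request. Returns the ids of the threads that must now be
    // sent a MIB request; the result is empty when a collection is already
    // in progress, because further requests are ignored.
    std::vector<int> on_gc_request() {
        assert(!threads_.empty());
        if (collecting_) return {};
        collecting_ = true;
        waiting_for_ = threads_;
        return std::vector<int>(threads_.begin(), threads_.end());
    }

    // Handles a MIB from thread `id`. Returns true when it was the last one
    // outstanding: all threads are then blocked, the collection runs, and
    // the caller should send every thread its "resume" message.
    bool on_mib(int id, const std::vector<std::size_t>& roots) {
        assert(collecting_);
        assert(waiting_for_.count(id) == 1);
        waiting_for_.erase(id);
        roots_.insert(roots_.end(), roots.begin(), roots.end());
        if (!waiting_for_.empty()) return false;
        // Mark from roots_, sweep, and grow the heap if needed (omitted).
        ++collections_;
        roots_.clear();
        collecting_ = false;
        return true;
    }

    int collections() const { return collections_; }

private:
    std::set<int> threads_;
    std::set<int> waiting_for_;
    std::vector<std::size_t> roots_;
    bool collecting_ = false;
    int collections_ = 0;
};
```

<P>Note that <TT>on_gc_request</TT> never records who asked: a requesting
thread is treated exactly like every other thread, and simply waits for its
MIB request.<BR>
</P>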

<P>The important thing to note about this protocol is its indifference
to the GC "initiator". Several threads can "initiate"
GC, and each request is "satisfied", although of course only
one GC takes place. The GCd itself is unaware of the initiating thread's
identity, and completely ignores any further GC requests. It treats all
threads identically. This is important because it allows each thread that meets
a low-memory condition to initiate GC immediately. This is in fact the
mechanism which saves us from explicitly checking for a third-party GC
request during computation: if a thread runs out of memory, the <TT><FONT FACE="Courier New">freelist</FONT></TT>
variable is kept in the "out of memory" state, causing any other
thread that tries to allocate memory to initiate GC as well. This simplifies
the GC protocol (technically, if not conceptually), and does so with almost
no effect on computation speed.<BR>
</P>

<H2><A NAME="BOARDS"></A>BOARDS<BR>
</H2>

<P>A board (or a view, as it is called in MFC) is the environment in which a turtle
moves and with which it interacts. It holds two main data structures. The first is
the bitmap of the drawing, which is an 800x600 bitmap. Every time a turtle
draws on the board, it makes its pen the active pen on the board and draws
on it. Every time the picture needs refreshing (as signaled by the operating
system) it is the board's duty to copy the relevant section from the bitmap
to the screen. The second data structure is the turtles list, an expandable
array which holds pointers to all turtles on the specific board.<BR>
</P>

<P>The most important part of the board's work is to notify the turtles
of any event that happens on the board, such as drawing, changing the background,
or the movement of a turtle. If, for example, a user draws a line on the board,
the board (after drawing the line) goes through the turtles list and tells
each turtle that some event happened within a rectangle that contains the line.
Each turtle decides whether this is of any importance to it.<BR>
</P>
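<P>The notification step can be sketched as follows. The classes
<TT>BoardStub</TT> and <TT>TurtleStub</TT> are placeholders invented for this
example (they are not <B>CBoardView</B> or the real turtle class), drawing
into the bitmap is omitted, and each stub turtle simply counts the events that
touch an area it cares about.<BR>
</P>

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Axis-aligned rectangle with inclusive edges, in board pixels.
struct Rect {
    int left, top, right, bottom;
};

// Smallest rectangle that contains the line from (x0, y0) to (x1, y1).
inline Rect bounding_rect(int x0, int y0, int x1, int y1) {
    return Rect{std::min(x0, x1), std::min(y0, y1), std::max(x0, x1), std::max(y0, y1)};
}

inline bool intersects(const Rect& a, const Rect& b) {
    return a.left <= b.right && b.left <= a.right &&
           a.top <= b.bottom && b.top <= a.bottom;
}

// Hypothetical turtle: it only remembers an area of interest and counts
// the change notifications that overlap it.
class TurtleStub {
public:
    explicit TurtleStub(Rect area) : area_(area) {}

    void on_board_changed(const Rect& changed) {
        if (intersects(area_, changed)) ++relevant_events_;
    }

    int relevant_events() const { return relevant_events_; }

private:
    Rect area_;
    int relevant_events_ = 0;
};

// Hypothetical board: holds pointers to its turtles and tells each of them
// about the rectangle affected by a drawing operation.
class BoardStub {
public:
    void add_turtle(TurtleStub* turtle) {
        assert(turtle != nullptr);
        turtles_.push_back(turtle);
    }

    void draw_line(int x0, int y0, int x1, int y1) {
        // Drawing into the bitmap would happen here.
        const Rect changed = bounding_rect(x0, y0, x1, y1);
        for (TurtleStub* turtle : turtles_) turtle->on_board_changed(changed);
    }

private:
    std::vector<TurtleStub*> turtles_;
};
```

<P>The board does not need to know what each turtle is watching; it reports
one rectangle and leaves the decision to the turtles.<BR>
</P>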

<P>Apart from that, the board handles all user-interface actions from the menus
and the toolbar. The most obvious example is the move-turtle button, which,
when pressed, causes the board to find a turtle close enough to the click's
location. Then, on every movement of the mouse, it commands the turtle
to move to the mouse position.<BR>
</P>

<P>In order to support scrolling of the picture, we derived the <B>CBoardView</B>
class from <B>CScrollView</B>. The interface with the interpreter threads
is done via a message queue. The main function is <TT><FONT FACE="Courier New">ReadAndEval</FONT></TT>,
which gets a message, interprets it, and acts upon the result.<BR>
</P>

<H2><A NAME="TURTLES"></A>TURTLES<BR>
</H2>

<P>In addition to a pointer to its current board, and to inner-state variables
which affect its graphical aspects, every turtle holds an expandable array
of interrupts. When a turtle gets a message from the window signifying
that some change has happened, it passes this change to each of its interrupts
(only if the interrupt flag is on), and each interrupt is responsible for
sending an appropriate message if necessary. The turtle's location is stored
as floating-point x,y coordinates, to preserve accuracy in the turtle's location
and heading.<BR>
</P>
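<P>A short experiment shows why floating-point coordinates matter. In the
sketch below, both turtle types and the heading convention (degrees,
0 along the x axis) are assumptions made for the example; the point is only
that rounding to whole pixels after every step lets the error accumulate.<BR>
</P>

```cpp
#include <cassert>
#include <cmath>

const double kPi = 3.14159265358979323846;

// Keeps the exact position; pixels would be rounded only when drawing.
struct FloatTurtle {
    double x, y, heading_deg;

    void forward(double distance) {
        assert(std::isfinite(distance));
        const double rad = heading_deg * kPi / 180.0;
        x += distance * std::cos(rad);
        y += distance * std::sin(rad);
    }
};

// Rounds every step to whole pixels, so each step loses its fraction.
struct PixelTurtle {
    long x, y;
    double heading_deg;

    void forward(double distance) {
        assert(std::isfinite(distance));
        const double rad = heading_deg * kPi / 180.0;
        x += std::lround(distance * std::cos(rad));
        y += std::lround(distance * std::sin(rad));
    }
};
```

<P>After 100 one-unit steps at a heading of 45 degrees, the floating-point
turtle is at about (70.71, 70.71), while the pixel turtle has drifted to
(100, 100), because every step of roughly 0.707 was rounded up to 1.<BR>
</P>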

<P>The turtle interacts with the interpreter thread using a message queue.
As with the board, the main function here is <TT><FONT FACE="Courier New">ReadAndEval</FONT></TT>,
which translates these messages into valid function calls.<BR>
</P>

<H2><A NAME="VISION"></A>VISION<BR>
</H2>

<P>Every turtle holds a pointer to the bitmap it is drawing on. When it
is "looking" for a color, it calculates the minimal rectangle
that contains the desired area. Then it iterates over all the pixels in this
rectangle. First it checks whether the pixel is in the vision area, using the
sign rule to determine whether a point is clockwise or counterclockwise from a
line, and then it checks the distance. If the point is in the relevant area,
the turtle gets its color from the bitmap and compares it with the sought
color.<BR>
</P>
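<P>The sign rule mentioned above is commonly implemented with the sign of a
2D cross product. The sketch below assumes a vision area shaped like a sector
narrower than 180 degrees, described by two edge directions and a range; the
names <TT>Vec</TT>, <TT>VisionSector</TT> and <TT>sees</TT> are invented, and
the orientation comments assume a y axis that points up (on a screen with y
pointing down, clockwise and counterclockwise swap).<BR>
</P>

```cpp
#include <cassert>

struct Vec {
    double x, y;
};

// z component of the cross product a x b: positive when b lies
// counterclockwise from a, negative when clockwise, zero when collinear.
inline double cross(const Vec& a, const Vec& b) {
    return a.x * b.y - a.y * b.x;
}

// Hypothetical vision area: a sector with its apex at `eye`.
struct VisionSector {
    Vec eye;
    Vec right_edge;  // direction of the clockwise-most boundary
    Vec left_edge;   // direction of the counterclockwise-most boundary
    double range;    // maximum viewing distance
};

// True if point `p` is between the two edges and within range.
// Valid for sectors narrower than 180 degrees.
inline bool sees(const VisionSector& s, const Vec& p) {
    assert(s.range >= 0.0);
    const Vec d{p.x - s.eye.x, p.y - s.eye.y};
    const bool ccw_of_right = cross(s.right_edge, d) >= 0.0;
    const bool cw_of_left = cross(s.left_edge, d) <= 0.0;
    // Compare squared distances to avoid a square root per pixel.
    const bool near_enough = d.x * d.x + d.y * d.y <= s.range * s.range;
    return ccw_of_right && cw_of_left && near_enough;
}
```

<P>The two sign tests cost a few multiplications each and need no
trigonometry, which matters when the test runs for every pixel in the
bounding rectangle.<BR>
</P>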

<P>When looking for a specific turtle, the turtle gets that turtle's position
and checks whether this location is in the relevant area, using the same
algorithm. When looking for any turtle, the turtle passes the relevant
arguments to the view, which then applies the same algorithm to each turtle
in its turtles array.<BR>
</P>

<H2><A NAME="INTERRUPTS"></A>INTERRUPTS<BR>
</H2>

<P>Each turtle holds an expandable array of interrupts. Each interrupt
is an object that is much like a turtle's vision. The interrupt holds the
view area arguments and what it is looking for. It also holds the message
it needs to send and a pointer to the queue it was given. When the turtle notifies
the interrupt that a change has happened, the interrupt first checks whether
the change is in an area which is of interest to it. If so, it calls the turtle's
look function with its location and the sought object. According to the
turtle's answer and to the data stored inside the interrupt, the interrupt
sends the needed message, if any.<BR>
<BR>
</P>
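<P>An interrupt of this kind might be organized as in the following sketch.
Everything here is hypothetical: the class <TT>Interrupt</TT>, the callback
standing in for the turtle's look function, and the plain vector standing in
for the message queue. In particular, the rule of sending the message only
when the answer changes from "not seen" to "seen" is an assumption; the text
does not say which stored data decides whether a message is sent.<BR>
</P>

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

struct Area {
    int left, top, right, bottom;
};

inline bool overlaps(const Area& a, const Area& b) {
    return a.left <= b.right && b.left <= a.right &&
           a.top <= b.bottom && b.top <= a.bottom;
}

// Hypothetical interrupt object; not the Galapagos class.
class Interrupt {
public:
    // Stands in for the turtle's look function: reports whether the sought
    // object is currently visible in the given area.
    using LookFn = std::function<bool(const Area&)>;

    Interrupt(Area watched, LookFn look, std::string message,
              std::vector<std::string>* queue)
        : watched_(watched), look_(std::move(look)),
          message_(std::move(message)), queue_(queue) {
        assert(queue_ != nullptr);
    }

    // Called by the turtle when something changed on the board.
    void on_change(const Area& changed) {
        if (!overlaps(watched_, changed)) return;  // change is elsewhere
        const bool seen_now = look_(watched_);     // ask the turtle to look
        if (seen_now && !seen_before_) queue_->push_back(message_);
        seen_before_ = seen_now;
    }

private:
    Area watched_;
    LookFn look_;
    std::string message_;
    std::vector<std::string>* queue_;
    bool seen_before_ = false;  // assumed internal state
};
```

<P>The cheap overlap test comes first, so the more expensive look is only
performed for changes that could affect the answer.<BR>
</P>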

<H2><A NAME="CLASS"></A>CLASS ORGANIZATION<BR>
</H2>

<P><CENTER><IMG border=0 SRC="img00012.gif" HEIGHT=626 WIDTH=693><BR>
</CENTER>
</P>

<P>Message-passing mechanisms:<BR>
</P>

<P>
<CENTER><IMG border=0 SRC="img00013.gif" HEIGHT=198 WIDTH=694> </CENTER></P>
<FONT SIZE=+3></FONT>


<P>
<HR width=80% align=left color=blue>
<CENTER>
<A href="#top"><IMG border=0 src=back.gif ALT=" [TOP] "></A>
<A href="gui.html"><IMG border=0 src=prev.gif ALT=" [PREV] "></A>
<A href="manual.html"><IMG border=0 src=next.gif ALT=" [NEXT] "></A>
<A href="index.html#toc"><IMG border=0 src=toc.gif ALT=" [TOC] "></A>
</CENTER>
</BODY>
</HTML>