262 lines
		
	
	
		
			11 KiB
		
	
	
	
		
			Plaintext
		
	
	
	
			
		
		
	
	
			262 lines
		
	
	
		
			11 KiB
		
	
	
	
		
			Plaintext
		
	
	
	
| [This file and the accompanying source code were derived from
 | |
|  Oleg's code for Gambit available from
 | |
| 
 | |
|         http://okmij.org/ftp/Scheme/lib/treap.scm
 | |
| 
 | |
| ]
 | |
| 
 | |
| 
 | |
| An implementation of an ordered dictionary data structure, based
 | |
| on randomized search trees (treaps) by Seidel and Aragon:
 | |
| 
 | |
| 	R. Seidel and C. R. Aragon. Randomized Search Trees.
 | |
| 	Algorithmica, 16(4/5):464-497, 1996.
 | |
| 
 | |
| 
 | |
| This code defines a treap object that implements an ordered dictionary
 | |
| mapping of keys to values. The object responds to a variety of query and
 | |
| update messages, including efficient methods for finding the minimum and
 | |
| maximum keys and their associated values as well as traversing of a
 | |
| treap in an ascending or descending order of keys. Looking up an arbitrary
 | |
| or the min/max keys, and deleting the min/max keys require no more
 | |
| key comparisons than the depth of the treap, which is O(log n) where
 | |
| n is the total number of keys in the treap. Arbitrary key deletion and
 | |
| insertions run in O(log n) _amortized_ time.
 | |
| 
 | |
| This code is inspired by a Stefan Nilsson's article "Treaps in Java"
 | |
| (Dr.Dobb's Journal,  July 1997, p.40-44) and by the Java implementation
 | |
| of treaps described in the article. Yet this Scheme code has been
 | |
| developed completely from scratch, using the description of the algorithm
 | |
| given in the article, and insight gained from examining the Java source
 | |
| code. As a matter of fact, treap insertion and deletion algorithms
 | |
| implemented in this code differ from the ones described in the article
 | |
| and implemented in the Java code; this Scheme implementation uses fewer
 | |
| assignments and comparisons (see below for details). Some insight as
 | |
| to a generic tree interface gleaned from wttree.scm, "Weight balanced
 | |
| trees" by Stephen Adams (a part of The Scheme Library, slib2b1).
 | |
| 
 | |
| A treap is a regular binary search tree, with one extension. The extension
 | |
| is that every node in a tree, beside a key, a value, and references to
 | |
| its left and right children, has an additional constant field, a priority.
 | |
| The value of this field is set (to a random integer number) when the node
 | |
| is constructed, and is not changed afterwards. At any given moment,
 | |
| the priority of every non-leaf node never exceeds the priorities of its
 | |
| children. When a new node is inserted, we check that this invariant holds;
 | |
| otherwise, we perform a right or left rotation that swaps a parent and
 | |
| its kid, keeping the ordering of keys intact; the changes may need to be
 | |
| propagated recursively up. The priority property, and the fact they are
 | |
| chosen at random, makes a treap look like a binary search tree built by
 | |
| a random sequence of insertions. As the article shows, this makes a treap
 | |
| a balanced tree: the expected length of an average search path is roughly
 | |
| 1.4log2(n)-1, and the expected length of the longest search path is about
 | |
| 4.3log2(n). See the Stefan Nilsson's article for more details.
 | |
| 
 | |
| 
 | |
| (make-treap key-compare) -> treap
 | |
| 
 | |
| Creates a treap object. Here KEY-COMPARE-PROC is a user-supplied
 | |
| function
 | |
| 
 | |
|      KEY-COMPARE-PROC key1 key2
 | |
| 
 | |
| that takes two keys and returns a negative, positive, or zero number
 | |
| depending on how the first key compares to the second. The treap uses
 | |
| SRFI-27's DEFAULT-RANDOM-SOURCE so you may want to initialize this
 | |
| once with (random-source-randomize! default-random-source).
 | |
| 
 | |
| 
 | |
| (treap? thing) -> boolean
 | |
| 
 | |
| Type predicate for treaps.
 | |
| 
 | |
| 
 | |
| (treap-get treap key [default-clause]) -> key-value pair or value of default-clause
 | |
| 
 | |
| Searches the treap for an association with a given KEY, and returns a
 | |
| (key . value) pair of the found association.  If an association with
 | |
| the KEY cannot be located in the treap, the PROC returns the result of
 | |
| evaluating the DEFAULT-CLAUSE.  If the default clause is omitted, an
 | |
| error is signalled.  The KEY must be comparable to the keys in the
 | |
| treap by a key-compare predicate (which has been specified when the
 | |
| treap was created)
 | |
| 
 | |
| 
 | |
| (treap-get-min treap) -> key-value pair
 | |
| (treap-get-max treap) -> key-value pair
 | |
| 
 | |
| return a (key . value) pair for an association in the treap with the
 | |
| smallest/largest key. If the treap is empty, an error is signalled.
 | |
| 
 | |
| 
 | |
| (treap-delete-min! treap) -> key-value pair
 | |
| (treap-delete-max! treap) -> key-value pair
 | |
| 
 | |
| remove the min/max key and the corresponding association from the
 | |
| treap. Return a (key . value) pair of the removed association.  If
 | |
| the treap is empty, an error is signalled.
 | |
| 
 | |
| 
 | |
| (treap-empty? treap) -> boolean
 | |
| 
 | |
| returns #t if the treap is empty.
 | |
| 
 | |
| 
 | |
| (treap-size treap) -> integer
 | |
| 
 | |
| returns the size (the number of associations) in the treap.
 | |
| 
 | |
| 
 | |
| (treap-depth treap) -> integer
 | |
| 
 | |
| returns the depth of the tree. It requires the complete traversal of
 | |
| the tree, so use sparingly.
 | |
| 
 | |
| 
 | |
| (treap-clear! treap) -> unspecific
 | |
| 
 | |
| removes all associations from the treap (thus making it empty).
 | |
| 
 | |
| 
 | |
| (treap-put! treap key value) -> key-value pair or #f
 | |
| 
 | |
| adds the corresponding association to the treap. If an association
 | |
| with the same KEY already exists, its value is replaced with the VALUE
 | |
| (and the old (key . value) association is returned). Otherwise, the
 | |
| return value is #f.
 | |
| 
 | |
| 
 | |
| (treap-delete! treap key [default-clause]) -> key-value pair
 | |
| 
 | |
| searches the treap for an association with a given KEY, deletes it,
 | |
| and returns a (key . value) pair of the found and deleted association.
 | |
| If an association with the KEY cannot be located in the treap, the
 | |
| PROC returns the result of evaluating the DEFAULT-CLAUSE.  If the
 | |
| default clause is omitted, an error is signalled.
 | |
| 
 | |
| 
 | |
| (treap-for-each-ascending treap proc) -> unspecific
 | |
| 
 | |
| applies the given procedure PROC to each (key . value) association of
 | |
| the treap, from the one with the smallest key all the way to the one
 | |
| with the max key, in an ascending order of keys.  The treap must not
 | |
| be empty.
 | |
| 
 | |
| 
 | |
| (treap-for-each-descending treap proc) -> unspecific
 | |
| 
 | |
| applies the given procedure PROC to each (key . value) association of
 | |
| the treap, in the descending order of keys.  The treap must not be
 | |
| empty.
 | |
| 
 | |
| 
 | |
| (treap-debugprint treap) -> unspecific
 | |
| 
 | |
| prints out all the nodes in the treap, for debug purposes.
 | |
| 
 | |
| 
 | |
| 
 | |
| Notes on the algorithm
 | |
| As the DDJ paper shows, insertion of a node into a treap is a simple
 | |
| recursive algorithm, Example 1 of the paper. This algorithm is implemented
 | |
| in the accompanying source [Java] code as
 | |
| <BLOCKQUOTE>
 | |
|   private Tree insert(Tree node, Tree tree) {
 | |
|      if (tree == null) return node;
 | |
|      int comp = node.key.compareTo(tree.key);
 | |
|      if (comp < 0) {
 | |
|         tree.left = insert(node, tree.left);
 | |
|         if (tree.prio > tree.left.prio)
 | |
|            tree = tree.rotateRight();
 | |
|      } else if (comp > 0) {
 | |
|         tree.right = insert(node, tree.right);
 | |
|         if (tree.prio > tree.right.prio)
 | |
|            tree = tree.rotateLeft();
 | |
|      } else {
 | |
|         keyFound = true;
 | |
|         prevValue = tree.value;
 | |
|         tree.value = node.value;
 | |
|      }
 | |
|      return tree;
 | |
|   }
 | |
| </BLOCKQUOTE>
 | |
| 
 | |
| This algorithm, however, is not as efficient as it could be. Suppose we
 | |
| try to insert a node which turns out to already exist in the tree,
 | |
| at a depth D. The algorithm above would descend into this node in the
 | |
| winding phase of the algorithm, replace the node's value, and, in the
 | |
| unwinding phase of the recursion, would perform D assignments of the kind
 | |
|       tree.left = insert(node, tree.left);
 | |
| and D comparisons of nodes' priorities. None of these priority checks and
 | |
| assignments are actually necessary: as we haven't added any new node,
 | |
| the tree structure hasn't changed.
 | |
| 
 | |
| Therefore, the present Scheme code implements a different insertion
 | |
| algorithm, which avoids doing unnecessary operations. The idea is simple:
 | |
| if we insert a new node into some particular branch of the treap and verify
 | |
| that this branch conforms to the treap priority invariant, we are certain
 | |
| the invariant holds throughout the entire treap, and no further checks 
 | |
| (up the tree to the root) are necessary. In more detail:
 | |
|       - Starting from the root, we recursively descend until we find
 | |
|         a node with a given key, or a place a new node can be inserted.
 | |
|       - We insert the node and check to see if its priority is less than
 | |
|         that of its parent. If this is the case, we left- or right-rotate
 | |
|         the tree to make the old parent a child of the new node, and the
 | |
|         new node a new root of this particular branch. We return this new
 | |
|         parent as an indication that further checks up the tree are
 | |
|         necessary. If the new node conforms to the parent's priority, we
 | |
|         insert it and return #f
 | |
|       - On the way up, we check the priorities again and rotate the tree
 | |
|         to restore the priority invariant at the current level if needed.
 | |
|       - If no changes are made at the current level, we return a flag #f
 | |
|         meaning that no further changes or checks are necessary at the
 | |
|         higher levels.
 | |
| Thus, if a new node was originally inserted at a level D in the tree (level
 | |
| 0 being the root) but then migrated up by L levels (because of its priority),
 | |
| the original insertion algorithm would perform (D-1) assignments,
 | |
| (D-1) priority checks, plus L rotations (at a cost of 2 assignments in the
 | |
| treap each). Our algorithm does only (L+1) node priority checks and
 | |
| max(2(L-1)+2,1) assignments.
 | |
| Note if priorities are really (uniformly) random, L is uniformly distributed
 | |
| over [0,D], so the average cost of our algorithm is 
 | |
| 	D/2 +1 checks and D assignments
 | |
| compared to
 | |
|       D-1 checks and 2D-1 assignments
 | |
| for the original algorithm described in the DDJ paper.
 | |
| 
 | |
| The similar gripe applies to the Java implementation of a node deletion
 | |
| algorithm:
 | |
| <BLOCKQUOTE>
 | |
|   private Tree delete(Ordered key, Tree t) {
 | |
|      if (t == null) return null;
 | |
|      int comp = key.compareTo(t.key);
 | |
|      if (comp < 0)
 | |
|         t.left = delete(key, t.left);
 | |
|      else if (comp > 0)
 | |
|         t.right = delete(key, t.right);
 | |
|      else {
 | |
|         keyFound = true;
 | |
|         prevValue = t.value;
 | |
|         t = t.deleteRoot();
 | |
|      }
 | |
|      return t;
 | |
|   }
 | |
| </BLOCKQUOTE>
 | |
| 
 | |
| The algorithm as implemented looks fully-recursive. Furthermore, deletion
 | |
| of a node at a level D in the treap involves at least D assignments, most
 | |
| of them being unnecessary. Indeed, if a node being deleted is a leaf, only
 | |
| one change to the tree is needed to detach the node. Deleting a node
 | |
| obviously requires a left- or a right-kid field of the node's parent be
 | |
| modified (cleared). This change, however does NOT need to be propagated up:
 | |
| deleting of a node does not violate neither ordering nor priority invariants
 | |
| of the treap; all changes are confined to a branch rooted at the
 | |
| parent of the deleted node.
 | |
| This Scheme code implements node deletion algorithm in the optimal way,
 | |
| performing only those assignments which are absolutely necessary, and 
 | |
| replacing full recursions with tail recursions (which are simply iterations).
 | |
| Our implementation is also simpler and clearer, making use of a helper
 | |
| procedure join! to join two treap branches (while keeping both treap
 | |
| invariants intact). The deletion algorithm can then be expressed as
 | |
| replacing a node with a join of its two kids; compare this explanation
 | |
| to the one given in the DDJ paper!
 |