ffigen/chez.w

% -*- scheme -*-
%
% This is a nuweb document.
% nuweb is available by anon. ftp from cs.rice.edu in /public/preston.
%
% Author: Lars Thomas Hansen (lth@cs.uoregon.edu)
%
% Copyright (C) 1996 The University of Oregon.  All rights reserved.
% 
% This file may be freely redistributed in its entirety with or without
% modification provided that this copyright notice is not removed.  It
% may not be sold for profit or incorporated in commercial software
% products without the prior written permission of the copyright holder.

\documentstyle{article}
\oddsidemargin 0in
\evensidemargin 0in
\textwidth 6.5in
\topmargin -0.5in
\textheight 9in

\newcommand{\fixme}[1]{{\Large\bf FIXME: #1}}

\title{FFIGEN Back-end for Chez Scheme Version 5\footnote{This work
has been supported by ARPA under U.S.~Army grant No.~DABT63-94-C-0029, 
``Programming Environments, Compiler Technology and Runtime Systems 
for Object Oriented Parallel Processing''.}}
\author{Lars Thomas Hansen \\ {\tt lth@@cs.uoregon.edu}}
\date{February 7, 1996}


\begin{document}
\maketitle

\section{Introduction}

This document describes the FFIGEN back-end for Chez Scheme version 5.
The exposition is provided in the hope that it will make it reasonably
clear how back-ends for FFIGEN are constructed; many back-ends will
probably have a structure similar to that of the present program.

If you haven't read the {\em FFIGEN Manifesto and Overview}, now is a
good time to do so.

If you haven't read the {\em FFIGEN User's Manual,\/} which documents
the target-independent code for FFIGEN back-ends and the format for the
intermediate code, now is a good time to do so.

This document is organized in the following manner.  Section
\ref{overall} documents the overall structure of the translation for
Chez Scheme.  Section \ref{thecode} presents the policy choices made for
data types and most of the code implementing the policy;  section
\ref{utils} presents several utility procedures used by the
implementation.  Section \ref{discussion} discusses some of the
implementation choices, particularly in terms of performance, and
section \ref{future} summarizes future work on the Chez Scheme back-end.
Section \ref{stdlib} presents the implementation of the support
libraries.

This document is also the program: the \LaTeX\ source for the document,
the Scheme source code for the program, and the standard library sources
have been derived from the same file using the \verb|nuweb| system for
literate programming.\footnote{\verb|nuweb| was implemented by Preston
Briggs of Rice University; you may retrieve it by anonymous ftp from
\verb|cs.rice.edu| in the directory \verb|/public/preston|.}  The
document presents the program in an order which I believe is conducive
to understanding its workings.  Here's how you read it: each snippet of
code (called a {\em macro\/} or {\em scrap\/}) has a title which includes
the page number on which the scrap is defined and a letter following the
page number which shows the relative place on that page where you can
find the scrap (for example, 9a is the first scrap on page 9 and 6c is
the third scrap on page 6).  Scraps reference other scraps; these
references are the titles, so any cross-reference immediately gives you
the page number of the referenced code.  In addition, an index of global
identifiers can be found in the appendix.


\section{Overall Structure}
\label{overall}

The FFIGEN back-end for any FFI consists of two parts: a
target-independent part which reads the output of the front end into
global in-memory data structures, and a target-dependent part which does
the actual translation.  The target-dependent part is implemented by two
procedures, \verb|select-functions| and \verb|generate-translation|.

\subsection{The {\tt select-functions} procedure}

The procedure \verb|select-functions| is called during initial
processing to mark as selected those procedures which are interesting
for the translation.  This makes it possible to exclude some functions
from the translation.  For example, if you are translating the interface
to a library whose header file includes \verb|stdio.h|, you may wish to
exclude all function prototypes included from the latter file.  Since
each function record in the intermediate form has a field naming the
file the record came from, you can easily exclude functions from
\verb|stdio.h|.

The implementation of \verb|select-functions| for Chez Scheme simply 
marks all functions as selected, by turning on the ``referenced'' bit
in the record.

@d the select-functions interface procedure
@{(define (select-functions)
  (do ((functions functions (cdr functions)))
      ((null? functions))
    (referenced! (car functions))))
@| select-functions @}

\subsection{The {\tt generate-translation} procedure}

The procedure \verb|generate-translation| is called when the
target-independent part is done processing the data, and it must
implement all aspects of the target-dependent translation.  It takes no
arguments.  My version sets up the output files, initializes the files,
calls the generating functions, and cleans up.

@d the generate-translation interface procedure 
@{(define (generate-translation)
  @<initialize output files@>
  @<generate code for all constructs@>
  @<finalize output files@>
  #t)
@| generate-translation @}

\verb|generate-translation| calls on specialized procedures to 
generate code for the various intermediate language records:

@d generate code for all constructs 
@{(dump-structs)
(dump-unions)
(dump-functions)
(dump-variables)
(dump-enums)
(dump-macros)
@}

\subsection{Structure of the Scheme source file}

The final file is called \verb|chez.sch| and is laid out in the
following way:

@o chez.sch
@{@<back-end global variables@>
@<the select-functions interface...@>
@<the generate-translation interface...@>
@<Chez FFI names for primitive types@>
@<dump structs and unions@>
@<dump global variable accessors@>
@<dump function definitions@>
@<dump enum definitions@>
@<dump macro definitions@>
@<utility functions@>
@}

\subsection{The standard library}

To facilitate the translation, a small standard library has been created
ahead of time (see section \ref{stdlib} for the implementation).  The
library implements primitive dereferencing functions and a memory copy
procedure.  There is a C file and a Scheme file.  The C file must be
compiled and loaded into the Scheme system using the FFI facilities for
this (which happen to be operating-system dependent in Chez Scheme).
The Scheme FFI code for the standard library must be loaded into the
Scheme system at run time.

\section{Policy in the Chez Scheme Back-end}
\label{thecode}

Chez Scheme version 5 has a fairly simple native FFI.  It supports most
primitive C data types for both parameters and return values, and
performs data conversions on call and return.  A string is passed as a
\verb|char*| to the first character (in Chez Scheme, strings are
0-terminated for C compatibility), but is returned by copying the data
into a fresh Scheme string.  There is no direct support for the
\verb|short| datatype, variadic procedures, call by reference, calling
functions through a function pointer, struct/union arguments, or
struct/union return values, nor is there support for passing general
Scheme objects like pairs, arrays, or functions to foreign functions.

In this (first) version of the translation I have decided not to
implement any mechanisms for handling special cases like the
\verb|fgets()| example outlined in the {\em Manifesto}; that will have
to come later.  Such a mechanism might take the form of a file
containing rules for how to override certain function signatures, for
example.  See section \ref{future}.

\subsection{The Output}
\label{the-output}

Two output files are produced: a C file and a Scheme file.  The Scheme
file will hold the FFI code which interfaces to the library and its data
types.  The C file will hold supporting (``glue'') code generated along
the way: accessors, mutators, and so on.  The need for supporting code
will become clear as you read.

\verb|c-output| and \verb|sch-output| are global variables which hold
output ports for generated C code and Scheme code, respectively.

@d back-end global variables 
@{(define c-output #f)
(define sch-output #f)
@| c-output sch-output @}

The files are initialized in \verb|generate-translation| by computing
file names and assigning opened ports to the global variables; for
simplicity I use fixed names for the time being.  The use of
\verb|delete-file| is necessary in Chez Scheme, lest it complain when
opening an existing file for output.

@d initialize output files 
@{(delete-file "C-OUTPUT")
(delete-file "SCH-OUTPUT")
(set! c-output (open-output-file "C-OUTPUT"))
(set! sch-output (open-output-file "SCH-OUTPUT"))
@}

Once the files are opened they are initialized with standard
definitions.  The C file \verb|#include|s the FFIGEN standard library
header and the ANSI C standard library header; there is nothing to be
done in the Scheme case.  We should probably also emit a line here
which includes the original header file, but the name of that file
is not currently available in the intermediate format.

@d initialize output files 
@{(display "#include \"chez-stdlib.h\"" c-output) (newline c-output)
(display "#include \"stdlib.h\"" c-output) (newline c-output)
@}

When processing is done, the output files must be closed:

@d finalize output files 
@{(close-output-port c-output)
(close-output-port sch-output)
@}


\subsection{Names}

Since Scheme is case-insensitive and C is not, names which do not clash
in C may do so in Scheme.  This can be handled fairly simply by keeping
track of all generated Scheme names and warning about conflicts.  In
this version I have not implemented this, but it will be implemented
in the future.

In general, parameter names in generated C procedures are prepended with
underscores to prevent name clashes with \verb|typedef|s: names starting
with a single underscore are reserved for libraries and will in
principle not appear in user code.  However, this scheme is not
foolproof, considering that FFIGEN is used to translate library code.
We might need to come up with something better.

\subsection{Primitive Types}

The procedure \verb|chez-type| takes a primitive type structure and
returns the name of the Chez Scheme parameter or return type.  Already
we have to make hard choices: what do we do with pointers, in particular
character pointers?  In the current implementation I let a \verb|char*|
be a \verb|char*|; programs which wish to pass strings will have to 
first copy the string data into an allocated character buffer and then
pass the address of the buffer.  A function which performs the copy
operation is part of the standard library.

Other decisions I have made are to represent pointers as unsigned
32-bit ints (hence all pointers are the same size), and to represent
\verb|char|, \verb|signed char|, and \verb|unsigned char| all as the
 same \verb|char| FFI
type.  If the type is a \verb|short| variant, a warning is generated.%
\footnote{Chez Scheme does not support a \verb|short| FFI type.  Since
the ANSI C standard (or at least K\&R 2nd edition) seems to say that
a \verb|short| can be passed to a function with a prototype in scope
without being widened to \verb|int|, we can't simply use \verb|integer-32|
as the FFI argument type for a \verb|short|.  On many architectures using
a 32-bit integer would
work because the APIs don't support parameters smaller than 32 bits.  We
could also have used proxy functions; see section \ref{struct-vals}.}
Enums are assumed to be 32-bit integers.

@d Chez FFI names for primitive types 
@{(define (chez-type type)
  (case (record-tag type)
    ((pointer) 'unsigned-32)
    ((int long enum) 'integer-32)
    ((unsigned unsigned-long) 'unsigned-32)
    ((char unsigned-char signed-char) 'char)
    ((void) 'void)
    ((double) 'double-float)
    ((float) 'single-float)
    ((***invalid***) '***invalid***)
    (else
     (warn "Cannot translate this type: " type)
     (string->symbol (string-append (symbol->string '***invalid:)
				    (symbol->string (record-tag type))
				    "***")))))
@| chez-type @}

The special type \verb|***invalid***| and its variant
\verb|***invalid:tag***| are used to signal errors detected during
translation; by using these types in the output, no file that was
generated with errors can actually be used.

In addition, we have to consider how to dereference pointers to
primitive types (pointers to structured types are handled later).  For
example, if I have an \verb|int*| and I want to access the integer it
points to, how do I do this?  For this purpose, dereferencing functions
are provided in the standard library.  Each function takes a pointer
\verb|p| and an offset \verb|k| and fetches the element \verb|p[k]|.
There are also functions which store values through pointers to
primitive types.  See section \ref{stdlib} for the signatures and the
actual implementation.

By treating pointers to pointers as generic pointer types, we handle the
general case of recursive dereferencing.  Since pointers and integers
are the same size and type checking is lax, we can use the unsigned
integer access functions to follow pointer chains.  For example, if we
have an \verb|int** p| and want the integer, the code to access it would
be (in C) \verb|_ref_int(_ref_uint(p,0),0)|.

\subsection{Structured Types}

The translations for structs and unions are similar.  I'll talk about
structures, but the discussion pertains to unions as well.  The
procedures \verb|dump-structs| and \verb|dump-unions| are almost
identical, and call the same function to do their work:

@d dump structs and unions 
@{(define (dump-structs)
  (dump-struct/union structs struct-names "struct"))

(define (dump-unions)
  (dump-struct/union unions union-names "union"))
@| dump-structs dump-unions @}

\verb|struct-names| and \verb|union-names| are procedures which return a
list of all \verb|typedef|s which directly name a struct or union; see
the {\em FFIGEN User's Manual} for a discussion.

In the following, we will use this structure as an example:

\begin{flushleft} \small
\begin{minipage}{\linewidth}
\begin{list}{}{} \item
\mbox{}\verb@@struct X {@@\\
\mbox{}\verb@@  int i;@@\\
\mbox{}\verb@@  char c[5];@@\\
\mbox{}\verb@@  union {@@\\
\mbox{}\verb@@    char *s;@@\\
\mbox{}\verb@@    int *q;@@\\
\mbox{}\verb@@  } u;@@\\
\mbox{}\verb@@};@@\\
\end{list}
\vspace{-1ex}
\end{minipage}\\[4ex]
\end{flushleft}
\vspace{-3ex}

Each structure in the intermediate form has a tag (\verb|X| in the
example), although some of these tags were generated by the compiler.
Whether the tag was programmer-defined or not is important for how code
is generated (see below), and the procedure \verb|dump-struct/union|
takes it into account when generating FFI code for each referenced
structure.

Only structures which have programmer-defined tags have operations
generated for them with the name of the tag.  Structures which are
referred to by a \verb|typedef| must be handled specially.  There are
two cases.  If the \verb|typedef|'d structure also has a
programmer-defined tag, then Scheme definitions are emitted which define
operations using the \verb|typedef| name to be the same as the
operations using the structure tag.  If the type does not have a
programmer-defined tag, then operations are defined on the structure
using the \verb|typedef| name only; you get names like \verb|_get_X_i|.

@d dump structs and unions 
@{(define (dump-struct/union records typedef-name-getter qualifier)
  (for-each 
   (lambda (structure) 
     (if (referenced? structure)
	 (begin
	   (if (user-defined-tag? (tag structure))
	       (dump-struct/union-def structure qualifier (tag structure)))
	   (for-each (lambda (n)
		       (if (user-defined-tag? (tag structure))
			   (generate-reference-to-structure structure n qualifier)
			   (dump-struct/union-def structure "" n)))
		     (typedef-name-getter structure)))))
   records))
@| dump-struct/union @}

The procedure \verb|generate-reference-to-structure| takes a structure
which has a cached list of defined constructor, destructor, accessor,
and mutator function, a \verb|typedef|-name, and a qualifier (\verb|struct|,
\verb|union|, or the empty string) and generates Scheme definitions
which use the \verb|typedef|-name but which refer to the already-defined
functions using the qualified name and structure tag.  For example, if
there is already a foreign function called \verb|_get_struct_X_F| and
the \verb|typedef| name is \verb|Y|, then a variable called \verb|_get_Y_F|
will be bound to the value of the existing foreign function.

@d dump structs and unions
@{(define (generate-reference-to-structure structure typedef-name qualifier)
  (for-each (lambda (n)
	      (let ((newname (compute-newname n typedef-name (tag structure) qualifier)))
		(display `(define ,newname ,n) sch-output)
		(newline sch-output)))
	    (cached-names structure)))
@|  generate-reference-to-structure @}

The procedure \verb|compute-newname| takes an already emitted name and
generates a name which uses the \verb|typedef| name.

@d dump structs and unions 
@{(define (compute-newname oldname typedef-name tag qualifier)
  (let ((q (string-append qualifier "_" tag)))
    (let ((get   (string-append "_get_" q))
	  (set   (string-append "_set_" q))
	  (alloc (string-append "_alloc_" q))
	  (free  (string-append "_free_" q)))
      (cond ((string-prefix=? oldname get)
	     (string-append "_get_" typedef-name (substring oldname (string-length get)
							    (string-length oldname))))
	    ((string-prefix=? oldname set)
	     (string-append "_set_" typedef-name (substring oldname (string-length set)
							    (string-length oldname))))
	    ((string-prefix=? oldname alloc) (string-append "_alloc_" typedef-name))
	    ((string-prefix=? oldname free) (string-append "_free_" typedef-name))
	    (else (error "compute-newname: can't handle: " oldname))))))
@| compute-newname @}

The procedure \verb|dump-struct/union-def| takes a structure type, a
qualifier (a string, either \verb|struct|, \verb|union|, or the empty
string) and the C name of the structure (its tag or \verb|typedef|
name), and generates constructors, destructors, accessors, and mutators
for the structure.

@d dump structs and unions 
@{(define (dump-struct/union-def structure qualifier name)
  (let* ((funcname (if (string=? qualifier "")
		       name
		       (string-append qualifier "_" name)))
	 (cast     (if (string=? qualifier "")
		       name
		       (string-append qualifier " " name))))
    (generate-constructor-and-destructor structure funcname cast)
    (generate-accessors-and-mutators structure funcname cast "")))
@| dump-struct/union-def @}

\subsubsection{Constructors and destructors}

The procedure \verb|generate-constructor-and-destructor| generates
constructor and destructor procedures for its argument.  The constructor
procedure is called \verb|_alloc_struct_X|.  This procedure takes no
parameters and allocates memory for the given type, returning a pointer
(cast to an unsigned integer) or the literal integer 0 (which may be
different from the null pointer).

The destructor procedure is called \verb|_free_struct_X|; it takes a
value returned from the constructor and frees the allocated memory.
Generating a constructor may be excessive; a single interface to
\verb|free()| is probably enough.  The destructors may go away in the
future.

The choice of value for the null pointer warrants discussion.  I use the
integer 0 to make it possible to test for a null pointer by comparing
with 0 on the Scheme side; it is a decision which probably simplifies
the Scheme code.  Using an unadulterated null pointer would make
comparison with 0 nonportable, since a null pointer is not guaranteed to
be all-bits-zero.

An alternative would be to generate foreign procedures which return null
pointers, one procedure for each pointer type in the program, and these
return values could be used for pointer comparisons.\footnote{There's
one null pointer for every pointer type.}

@d dump structs and unions
@{(define (generate-constructor-and-destructor structure funcname cast)
  (function-pair constructor-template 
		 (vector funcname cast)
		 (string-append "_alloc_" funcname)
		 '((void ()))
		 `(pointer ,(struct/union-ref structure)))
  (function-pair destructor-template 
		 (vector funcname cast)
		 (string-append "_free_" funcname)
		 `((pointer ,(struct/union-ref structure)))
		 '(void ()))
  (cache-name structure (string-append "_alloc_" funcname))
  (cache-name structure (string-append "_free_" funcname)))
@| generate-constructor-and-destructor @}

In the templates for constructors and destructors, \verb|@@0| is the
name of the structure as we want it to be generated: \verb|struct_X|,
\verb|union_X|, or \verb|X| (for a \verb|typedef|); and \verb|@@1| is the
cast expression for the type: \verb|struct X|, \verb|union X|, or
\verb|X|.

@d dump structs and unions 
@{(define constructor-template
  "unsigned _alloc_@@0(void) { 
     @@1 *_p = (@@1 *)malloc(sizeof(@@1)); return (_p == 0 ? 0 : (unsigned)_p);
   }")

(define destructor-template
  "void _free_@@0(unsigned _p) { if (_p == 0) abort(); free((@@1 *)_p); }")
@| constructor-template destructor-template @}


\subsubsection{Accessors and Mutators}

For each field \verb|F| of basic or array type there will be an accessor
function \verb|_get_struct_X_F|, and for each field of basic type there
will be a mutator function \verb|_set_struct_X_F|.  The accessor takes a
pointer and returns the field value; the mutator takes a pointer and a
value and updates the field.

If the structure has nested structures, functions are generated which
will access fields in the nested structures in the expected way; the
naming scheme is to replace the ``.'' in the C syntax with an underscore
``\_''.  The generated C code for the accessors and mutators will look
like this:

\begin{flushleft} \small
\begin{minipage}{\linewidth}
\begin{list}{}{} \item
\mbox{}\verb@@int  _get_struct_X_i(unsigned _p) { return ((struct X*)_p)->i; }@@\\
\mbox{}\verb@@void _set_struct_X_i(unsigned _p, int _v) { ((struct X*)_p)->i = _v; }@@\\
\mbox{}\verb@@unsigned _get_struct_X_u_s(unsigned _p) { return (unsigned)((struct X*)_p)->u.s; }@@\\
\mbox{}\verb@@void _set_struct_X_u_s(unsigned _p, unsigned _v) { ((struct X*)_p)->u.s = (char*)_v; }@@\\
\end{list}
\vspace{-1ex}
\end{minipage}\\[4ex]
\end{flushleft}
\vspace{-3ex}

Procedures are not generated which directly dereference pointer fields;
the program must first fetch the pointer field and then dereference it.
Arrays in structures are handled like pointer fields, so
\verb|_get_struct_X_c| will be generated and will return a pointer to
the first element of \verb|c|, which can then be dereferenced.

The procedure \verb|generate-accessors-and-mutators| takes a structure
type, a name to be used for the function (initially just
\verb|struct_X|, \verb|union_X|, or \verb|X| for a typedef name), a cast
expression (\verb|struct X|, \verb|union X|, or just \verb|X|), and a field
selector expression (initially the empty string).  As the program
descends into the structure the function name and the selector
expression will be updated to reflect the field which is being accessed.
Appended to the function name will be an underscore and the field name,
so we'll see \verb|struct_X_i| and \verb|struct_X_u_s|, for example.
The selector will be the corresponding access expression for C, so
\verb|i| and \verb|u.s|.

When generating accessors and mutators, there are three cases: fields of
some basic type (primitive types and pointers), fields of array type,
and fields of structured type.

@d dump structs and unions 
@{(define (generate-accessors-and-mutators structure funcname cast selector)
  (for-each 
   (lambda (field)
     (let ((funcname (string-append funcname "_" (canonical-name (name field))))
	   (selector (string-append selector (if (string=? selector "") "" ".") (name field))))
       (cond ((basic-type? (type field))
	      (getset-basic-type structure funcname cast selector field))
	     ((array-type? (type field))
	      (getset-array-type structure funcname cast selector field))
	     ((structured-type? (type field))
	      (getset-structured-type structure funcname cast selector field))
	     (else (error 'generate-accessors-and-mutators "Unknown: " field)))))
   (fields structure)))
@| generate-accessors-and-mutators @}

The use of \verb|canonical-name| is important.  This procedure
transforms an identifier from its C syntax into a syntax acceptable to
Scheme by transforming the letters to the canonical case of the
representation.  (For example, if the symbol \verb|a| prints as \verb|A|
then the canonical case of the implementation is upper case.)

Fields of basic types get both accessors and mutators.\footnote{Fields
which are \verb|const| should not have a mutator; when we get around to
implementing general qualifiers on types, this must be fixed.}

@d dump structs and unions 
@{(define (getset-basic-type struct funcname cast selector field)
  (let* ((typename   (basic-type-name (type field)))
	 (fieldtype  (c-cast-expression (type field))))
    (function-pair accessor-template 
		   (vector typename funcname cast selector)
		   (string-append "_get_" funcname)
		   `((pointer ,(struct/union-ref struct)))
		   (type field))
    (function-pair mutator-template
		   (vector typename funcname cast selector fieldtype)
		   (string-append "_set_" funcname)
		   `((pointer ,(struct/union-ref struct)) ,(type field))
		   `(void ()))
    (cache-name struct (string-append "_get_" funcname))
    (cache-name struct (string-append "_set_" funcname))))
@| getset-basic-type @}

In the accessor and mutator templates, \verb|@@0| is the value return or
parameter type (pointers are always ``unsigned''), \verb|@@1| is the
function name, \verb|@@2| is a cast expression for the structure pointer
type, \verb|@@3| is the C field selector expression, and \verb|@@4| is
the cast expression from the parameter type to the field type (used in
mutators only).

@d dump structs and unions 
@{(define accessor-template 
  "@@0 _get_@@1( unsigned _p ) { return (@@0)((@@2*)_p)->@@3; }")

(define mutator-template 
  "void _set_@@1( unsigned _p, @@0 _v ) { ((@@2*)_p)->@@3 = (@@4)_v; }")
@| accessor-template mutator-template @}

Array types are just like basic types, but there is no mutator because
the array name is \verb|const|.  The name of the array denotes a pointer to
the first element, so that's what we return.

@d dump structs and unions 
@{(define (getset-array-type structure funcname cast selector field)
  (function-pair array-accessor-template
		 (vector funcname cast selector)
		 (string-append "_get_" funcname)
		 `((pointer ,(struct/union-ref structure)))
		 '(unsigned))
  (cache-name structure (string-append "_get_" funcname)))

(define array-accessor-template 
  "unsigned _get_@@0( unsigned _p ) { return (unsigned)(((@@1*)_p)->@@2); }")
@| getset-array-type array-accessor-template @}

Structured types are handled by adding the field name to the selector
and recurring; we ignore the distinction between struct and union here
as it doesn't matter.

@d dump structs and unions 
@{(define (getset-structured-type structure funcname cast selector field)
  (let (;(selector (string-append selector "." (name field)))
	;(funcname (string-append funcname "_" (canonical-name (name field))))
	(struct   (if (eq? (record-tag (type field)) 'struct-ref)
		      (lookup (tag (type field)) structs)
		      (lookup (tag (type field)) unions))))
    (generate-accessors-and-mutators struct funcname cast selector)))
@| getset-structured-type @}

\subsubsection{Structures as values}
\label{struct-vals}

As mentioned above, there are no provisions for using structures or
unions as values in the program, not even when they have the size of a
primitive type.  The reason for this is that it is not natural for
Scheme to assign objects which are larger than a single location (and
special-casing small structures because the are small seems silly).  For
example, the following function takes a structure value as an argument
(by copy, that is, by assignment):

\begin{flushleft} \small
\begin{minipage}{\linewidth}
\begin{list}{}{} \item
\mbox{}\verb@@void f( struct X x ) { ... }@@\\
\end{list}
\vspace{-1ex}
\end{minipage}\\[4ex]
\end{flushleft}
\vspace{-3ex}

There is no way to call this function directly in the generated FFI for
Chez Scheme.  We could fudge our way around it by generating a proxy
function which acts as a level of indirection:

\begin{flushleft} \small
\begin{minipage}{\linewidth}
\begin{list}{}{} \item
\mbox{}\verb@@void _proxy_f( struct X *x ) { f( *x ); }@@\\
\end{list}
\vspace{-1ex}
\end{minipage}\\[4ex]
\end{flushleft}
\vspace{-3ex}

\noindent but I have not implemented this.  If it becomes a problem it
needs to be fixed.  Libraries which manipulate complex numbers
represented as structures may need it.

Another upshot is that there is no way to ask for the field \verb|u| of
the structure defined earlier.\footnote{There is no way to ask for its
address, either, but that's easy to implement.}

\subsection{Global Variables}

In order to simplify things, global variables are handled by generating
a single function for each global variable; the function takes no
parameters and returns the address of the global.  Technically the
following code should be finessed for arrays, since the type of a global
array is not a pointer to an array, but an array, or at the very least, 
a pointer to the base type of the array.

@d dump global variable accessors 
@{(define (dump-variables)
  (for-each (lambda (v)
	      (let ((n (canonical-name (name v))))
		(function-pair global-template 
			       (vector n (name v))
			       (string-append "_glob_" n)
			       '((void ()))
			       `(pointer ,(type v)))))
	    vars))

(define global-template 
  "unsigned _glob_@@0( void ) { return (unsigned)&@@1; }")
@| dump-vars global-template @}


\subsection{Functions}

Supporting functions in all their generality is pretty much impossible,
and I'm not going to try.  The restrictions are:

\begin{description}

\item[Variadic functions] are not supported.  Such support can be
implemented but the resulting code is target-specific and probably
involves some assembly language.  Some C compilers (like Watcom C)
provide library functions which makes the job much easier, but most
don't.

\item[Old-style prototypes] (that is, declarations of the form
\verb|char *malloc()| where the number and types of the arguments are
not known) are hard, since the native Chez FFI requires an exact
argument count and the type for each argument.  My solution is to
generate a warning for this and translate the interface as if it had no
arguments, which is almost always wrong.

\item[Parameters of function type] work fine, but you can't pass Scheme
procedures to a C function.

\item[Calling through function pointers] is not implemented, since it's hard to
express this in the native FFI, which requires the name of the function
in order to fetch it from a library.  We can get around this by
generating a proxy function for each function pointer type in the header
file and calling the function through it by passing it the function
pointer and the arguments, essentially a trampoline.

\end{description}

\noindent \verb|Dump-functions| creates a Scheme-side interface for each
referenced function.  

@d dump function definitions 
@{(define (dump-functions)
  (for-each (lambda (f) (define-foreign (name f) (type f))) 
	    functions))
@| dump-functions @}

\verb|Define-foreign| checks that every argument type and return type is
valid and attempts to give a reasonable error message if not; if everything
is OK then a Scheme definition for the foreign function is emitted.:

@D dump function definitions 
@{(define (define-foreign name type)
  (let ((argtypes (arglist type))
	(returntype (rett type)))
    (let loop ((l argtypes))
      (cond ((null? l) #t)
	    ((structured-type? (car l))
	     (warn "Cannot pass structured value of type" 
		   (rational-typename (car l))
		   "to function" 
		   name)
	     (set-car! l '(***invalid***))
	     (loop (cdr l)))
	    (else
	     (loop (cdr l)))))
    (if (structured-type? returntype)
	(begin (warn "Cannot receive structured value of type"
		     (rational-typename returntype)
		     "from function"
		     name)
	       (set! returntype '(***invalid***))))

    (write
     `(define ,(string->symbol (canonical-name name))
	(foreign-function ,name
			  ,(chez-map-args argtypes name)
			  ,(chez-type returntype)))
     sch-output)
    (newline sch-output)))
@| define-foreign @}

The argument list must be handled carefully to deal with functions
without prototypes and functions with variable-length parameter lists:

@d dump function definitions 
@{(define (chez-map-args args name)
  (cond ((and (= (length args) 1)
	      (eq? (caar args) 'void))
	 '())
	((= (length args) 0)
	 (warn "Function without prototype assumed to take no arguments:"
	       name)
	 '())
	(else
	 (map (lambda (x)
		(if (eq? (record-tag x) 'void)
		    (begin (warn "Varargs *cannot* be handled for" name)
			   '***invalid***)
		    (chez-type x)))
	      args))))
@| chez-map-args @}

\subsection{Enums}

Enums are straightforward.  There is a definition for each enum-ident:

\begin{flushleft} \small
\begin{minipage}{\linewidth}
\begin{list}{}{} \item
\mbox{}\verb@@enum { A=100, B, C };@@\\
\end{list}
\vspace{-1ex}
\end{minipage}\\[4ex]
\end{flushleft}
\vspace{-3ex}

\noindent becomes 

\begin{flushleft} \small
\begin{minipage}{\linewidth}
\begin{list}{}{} \item
\mbox{}\verb@@(define a 100)@@\\
\mbox{}\verb@@(define b 101)@@\\
\mbox{}\verb@@(define c 102)@@\\
\end{list}
\vspace{-1ex}
\end{minipage}\\[4ex]
\end{flushleft}
\vspace{-3ex}

\noindent The enum's tag, should it have one, is ignored.  
The code is straightforward:

@d dump enum definitions
@{(define (dump-enums)
  (for-each (lambda (x)
	      (display (instantiate "(define @@0 @@1)"
				    (vector (canonical-name (name x))
					    (number->string (value x))))
		       sch-output)
	      (newline sch-output))
	    enum-idents))
@| dump-enums @}

\subsection{Macros}

Macros are hard in the general case, because the general case is that
the right-hand-side names some arbitrarily complex C statement which
might even depend on variables in some scope not in effect when the
macro is defined.  Handling C macros in their full generality is
therefore not an attainable goal for most targets.  However, we
can still do useful things.

If the left-hand-side is a simple identifier (no parameters) and the
right-hand-side parses as a number or a string, then we can emit a
simple definition which gives the name the value.

Being more ambitious we can take each macro which has parameters and
whose right-hand-side is a closed expression--that is, the macro which
results from expanding all applicable macros on the right-hand-side of
the macro has no free variables--and convert it to a function.

For now, we handle the simple case of an integer right-hand-side only.

@d dump macro definitions 
@{(define (dump-macros)
    (for-each (lambda (m)
		(if (and (valid-ident? (name m))
			 (valid-number? (value m)))
		    (begin 
		      (display `(define ,(canonical-name (name m)) 
				  ,(evaluate-number (value m)))
			       sch-output)
		      (newline sch-output))))
	      macros))
@| dump-macros @}

\verb|Valid-ident| and \verb|valid-number| check that their arguments
are valid C identifier and numbers, respectively.

@d dump macro definitions
@{(define (valid-ident? s)
    (andmap (lambda (c)
	      (or (char-upper-case? c)
		  (char-lower-case? c)
		  (char-numeric? c)
		  (char=? c #\_)))
	    (string->list s)))

(define (valid-number? s)
  (let ((n (evaluate-number s)))
    n))
@| valid-ident? valid-number? @}

\section{Utility functions}
\label{utils}

The procedure \verb|function-pair| generates corresponding definitions
in both languages.

@d utility functions
@{(define (function-pair c-template template-args scheme-name arglist rett)
  (display (instantiate c-template template-args) c-output)
  (newline c-output)
  (define-foreign scheme-name
    `(function ,arglist ,rett)))
@| function-pair @}

The C type name generated for a given basic type is very much a policy
issue; here are our encodings:

@d utility functions
@{(define (basic-type-name type)
  (let ((probe (assq (record-tag type)
		     '((char . "char")
		       (signed-char . "signed char")
		       (unsigned-char . "unsigned char")
		       (short . "short")
		       (unsigned-short "unsigned short")
		       (int . "int")
		       (enum . "int")
		       (unsigned . "unsigned")
		       (long . "long")
		       (unsigned-long . "unsigned long")
		       (void . "void")
		       (pointer . "unsigned")
		       (float . "float")
		       (double . "double")
		       ))))
    (if probe
	(cdr probe)
	(begin (warn "Unknown type " type)
	       "***invalid***"))))
@| basic-type-name @}

The procedure \verb|c-cast-expression| takes a type and returns a
C-syntax expression which can be used in a type cast for casting a value
to the given type.  It's a hard problem and the code below does not do a
very good job; it needs to be refined considerably.  See e.g.~K\&R~2nd
edition section 5.12.

One issue is what happens when we have a structured type which does not
have a user-defined tag.  There will be exactly one \verb|typedef| which
refers to the structure, and that name should be used instead.

@d utility functions
@{(define (c-cast-expression type)
  (cond ((primitive-type? type) 
	 (basic-type-name type))
	((pointer-type? type)
	 (string-append (c-cast-expression (cadr type)) "*"))
	((eq? (record-tag type) 'enum-ref)
	 (basic-type-name '(int ())))
	((memq (record-tag type) '(struct-ref union-ref))
	 (let ((t (tag type)))
	   (if (user-defined-tag? t)
	       (string-append (if (eq? (record-tag type) 'struct-ref) 
				  "struct " 
				  "union ")
			      t)
	       (let ((names (if (eq? (record-tag type) 'struct-ref)
				(struct-names type)
				(union-names type))))
		 (if (= (length names) 1)
		     (car names)
		     (error "c-cast-expression: bad: " type))))))
	(else
	 (warn "c-cast-expression: Too complicated: " type)
	 "unknown")))
@| c-cast-expression @}

\verb|String-prefix=?| takes a string and a prefix and tests whether the
prefix matches the string.

@d utility functions
@{(define (string-prefix=? s prefix)
  (let ((limit (string-length prefix)))
    (and (<= limit (string-length s))
	 (let loop ((i 0))
	   (or (= i limit)
	       (and (char=? (string-ref s i) (string-ref prefix i))
		    (loop (+ i 1))))))))
@| string-prefix=? @}

\verb|Rational-typename| takes a type; if the type is a
\verb|struct-ref| or \verb|union-ref| and the tag is auto-generated,
then a type is returned where the tag is replaced by the unique
\verb|typedef|-name which uses the structure (if one can be found).  This
procedure is useful for giving meaningful error messages.

@d utility functions
@{(define (rational-typename type)
  (case (record-tag type)
    ((struct-ref) 
     (if (user-defined-tag? (tag type)) 
	 type
	 (let ((t (lookup (tag type) structs)))
	   (if (not t)
	       type
	       (list 'struct-ref (tag t))))))
    ((union-ref)
     (if (user-defined-tag? (tag type)) 
	 type
	 (let ((t (lookup (tag type) unions)))
	   (if (not t)
	       type
	       (list 'union-ref (tag t))))))
    (else type)))
@| rational-typename @}

\verb|Evaluate-number| takes a string and attempts to parse it as a C
number, returning the value if the parse was successful and \verb|#f| if
not.  Currently only integers are handled reliably.

@d utility functions
@{(define (evaluate-number s)
  (let ((k (string->list s)))
    (cond ((null? k) #f)
	  ((not (char-numeric? (car k))) #f)
	  ((char=? (car k) #\0)
	   (cond ((null? (cdr k)) 0)
		 ((or (char=? (cadr k) #\x) (char=? (cadr k) #\X))
		  (string->number (list->string (cddr k)) 16))
		 (else
		  (string->number s 8))))
	  (else
	   (string->number s)))))
@| evaluate-number @}

\section{Discussion}
\label{discussion}

Clearly using a function call to access each field is not very
efficient, compared to C.  But if the foreign function interface is
reasonably efficient it shouldn't be too bad.  It's hard to know how
efficient it will be in Chez Scheme without some benchmarking, but a
call-out to a foreign function does not need to be much more expensive
than a call to a procedure in the same language.

\section{Future Work on the Chez Back-end}
\label{future}

\begin{itemize}
\item Handling name clashes, both finding them and working around them 
(if necessary).
\item Dealing with qualifiers on types in general; handling \verb|const| in
      particular.
\item Better generation of C \verb|typedef| casts.
\item Proper choice of names for the output files.
\item Experiment with policy choices defined in a file: for example, a
 file could contain a specification of which prototypes to override
 (like in the \verb|fgets| example), which file names to include or
 exclude, how to name procedures on the Scheme side, and how to compute
 the referenced or unreferenced sets.  Since we are programming in
 Scheme here, these decisions could be implemented as code in a file
 which is loaded at run-time.  As an example of what has been thought
 of but not implemented, read the file \verb|chez-policy.sch| in the 
 current distribution.


\end{itemize}

\section{Support libraries}
\label{stdlib}

The file \verb|chez-stdlib.h| defines the prototypes for the C-side
standard library procedures.

@O chez-stdlib.h 
@{int _ref_int( unsigned _p, int _k );
void _set_int( unsigned _p, int _k, int _v );
unsigned _ref_uint( unsigned _p, int _k );
void _set_uint( unsigned _p, int _k, unsigned _v );
char _ref_char( unsigned _p, int _k );
void _set_char( unsigned _p, int _k, char _v );
double _ref_double( unsigned _p, int _k );
void _set_double( unsigned _p, int _k, double _v );
float _ref_float( unsigned _p, int _k );
void _set_float( unsigned _p, int _k, float _v );
@}

The implementation of the library is in \verb|chez-stdlib.c|; it's
pretty obvious.

@O chez-stdlib.c 
@{#include "chez-stdlib.h"

int _ref_int( unsigned _p, int _k ) { return ((int*)_p)[_k]; }
void _set_int( unsigned _p, int _k, int _v ) { ((int*)_p)[_k] = _v; }

unsigned _ref_uint( unsigned _p, int _k ) { return ((unsigned*)_p)[_k]; }
void _set_uint( unsigned _p, int _k, unsigned _v ) { ((unsigned*)_p)[_k] = _v; }

char _ref_char( unsigned _p, int _k ) { return ((char*)_p)p[_k]; }
void _set_char( unsigned _p, int _k, char _v ) { ((char*)_p)[_k] = _v; }

double _ref_double( unsigned _p, int _k ) { return ((double*)_p)[_k]; }
void _set_double( unsigned _p, int _k, double _v ) { ((double*)_p)[_k] = _v; }

float _ref_float( unsigned _p, int _k ) { return ((float*)_p)[_k]; }
void _set_float( unsigned _p, int _k, float _v ) { ((float*)_p)[_k] = _v; }
@}

Finally, a Scheme file \verb|chez-stdlib.sch| defines the foreign function
interfaces to the standard library:

@O chez-stdlib.sch 
@{(define _ref_int
  (foreign-function "_ref_int" (unsigned-32 integer-32) integer-32))

(define _set_int
  (foreign-function "_set_int" (unsigned-32 integer-32 integer-32) void))

(define _ref_uint
  (foreign-function "_ref_uint" (unsigned-32 integer-32) integer-32))

(define _set_uint
  (foreign-function "_set_uint" (unsigned-32 integer-32 unsigned-32) void))

(define _ref_char
  (foreign-function "_ref_char" (unsigned-32 integer-32) char))

(define _set_char
  (foreign-function "_set_char" (unsigned-32 integer-32 char) void))

(define _ref_double
  (foreign-function "_ref_double" (unsigned-32 integer-32) double-float))

(define _set_double
  (foreign-function "_set_double" (unsigned-32 integer-32 double-float) void))

(define _ref_float
  (foreign-function "_ref_float" (unsigned-32 integer-32) single-float))

(define _set_float
  (foreign-function "_set_float" (unsigned-32 integer-32 single-float) void))

(define _memcpy
  (let ((memcpy-*-*
	 (foreign-function "memcpy" (unsigned-32 unsigned-32 unsigned-32) void))
	(memcpy-string-string
	 (foreign-function "memcpy" (string string unsigned-32) void))
	(memcpy-string-*
	 (foreign-function "memcpy" (string unsigned-32 unsigned-32) void))
	(memcpy-*-string
	 (foreign-function "memcpy" (unsigned-32 string unsigned-32) void)))
    (lambda (a b count)
      (cond ((string? a)
	     (cond ((string? b) (memcpy-string-string a b count))
		   ((integer? b) (memcpy-string-* a b count))
		   (else ???)))
	    ((integer? a)
	     (cond ((string? b) (memcpy-*-string a b count))
		   ((integer? b) (memcpy-*-* a b count))
		   (else ???)))
	    (else ???)))))
@}

\pagebreak
\section*{Index of Identifiers}

The underlined scrap number shows the defining scrap for the identifier;
all other scrap numbers are uses.  Also note that many procedures are
not defined in this file but in the target-independent part of the
back-end, in file {\tt process.sch}.

@u


\end{document}