% -*- scheme -*- % % This is a nuweb document. % nuweb is available by anon. ftp from cs.rice.edu in /public/preston. % % Author: Lars Thomas Hansen (lth@@cs.uoregon.edu) % % Copyright (C) 1996 The University of Oregon. All rights reserved. % % This file may be freely redistributed in its entirety with or without % modification provided that this copyright notice is not removed. It % may not be sold for profit or incorporated in commercial software % products without the prior written permission of the copyright holder. \documentstyle{article} \oddsidemargin 0in \evensidemargin 0in \textwidth 6.5in \topmargin -0.5in \textheight 9in \newcommand{\fixme}[1]{{\Large\bf FIXME: #1}} \title{FFIGEN Back-end for Chez Scheme Version 5\footnote{This work has been supported by ARPA under U.S.~Army grant No.~DABT63-94-C-0029, ``Programming Environments, Compiler Technology and Runtime Systems for Object Oriented Parallel Processing''.}} \author{Lars Thomas Hansen \\ {\tt lth@@cs.uoregon.edu}} \date{February 7, 1996} \begin{document} \maketitle \section{Introduction} This document describes the FFIGEN back-end for Chez Scheme version 5. The exposition is provided in the hope that it will make it reasonably clear how back-ends for FFIGEN are constructed; many back-ends will probably have a structure similar to that of the present program. If you haven't read the {\em FFIGEN Manifesto and Overview}, now is a good time to do so. If you haven't read the {\em FFIGEN User's Manual,\/} which documents the target-independent code for FFIGEN back-ends and the format for the intermediate code, now is a good time to do so. This document is organized in the following manner. Section \ref{overall} documents the overall structure of the translation for Chez Scheme. Section \ref{thecode} presents the policy choices made for data types and most of the code implementing the policy; section \ref{utils} presents several utility procedures used by the implementation. Section \ref{discussion} discusses some of the implementation choices, particularly in terms of performance, and section \ref{future} summarizes future work on the Chez Scheme back-end. Section \ref{stdlib} presents the implementation of the support libraries. This document is also the program: the \LaTeX\ source for the document, the Scheme source code for the program, and the standard library sources have been derived from the same file using the \verb|nuweb| system for literate programming.\footnote{\verb|nuweb| was implemented by Preston Briggs of Rice University; you may retrieve it by anonymous ftp from \verb|cs.rice.edu| in the directory \verb|/public/preston|.} The document presents the program in an order which I believe is conducive to understanding its workings. Here's how you read it: each snippet of code (called a {\em macro\/} or {\em scrap\/}) has a title which includes the page number on which the scrap is defined and a letter following the page number which shows the relative place on that page where you can find the scrap (for example, 9a is the first scrap on page 9 and 6c is the third scrap on page 6). Scraps reference other scraps; these references are the titles, so any cross-reference immediately gives you the page number of the referenced code. In addition, an index of global identifiers can be found in the appendix. \section{Overall Structure} \label{overall} The FFIGEN back-end for any FFI consists of two parts: a target-independent part which reads the output of the front end into global in-memory data structures, and a target-dependent part which does the actual translation. The target-dependent part is implemented by two procedures, \verb|select-functions| and \verb|generate-translation|. \subsection{The {\tt select-functions} procedure} The procedure \verb|select-functions| is called during initial processing to mark as selected those procedures which are interesting for the translation. This makes it possible to exclude some functions from the translation. For example, if you are translating the interface to a library whose header file includes \verb|stdio.h|, you may wish to exclude all function prototypes included from the latter file. Since each function record in the intermediate form has a field naming the file the record came from, you can easily exclude functions from \verb|stdio.h|. The implementation of \verb|select-functions| for Chez Scheme simply marks all functions as selected, by turning on the ``referenced'' bit in the record. @d the select-functions interface procedure @{(define (select-functions) (do ((functions functions (cdr functions))) ((null? functions)) (referenced! (car functions)))) @| select-functions @} \subsection{The {\tt generate-translation} procedure} The procedure \verb|generate-translation| is called when the target-independent part is done processing the data, and it must implement all aspects of the target-dependent translation. It takes no arguments. My version sets up the output files, initializes the files, calls the generating functions, and cleans up. @d the generate-translation interface procedure @{(define (generate-translation) @ @ @ #t) @| generate-translation @} \verb|generate-translation| calls on specialized procedures to generate code for the various intermediate language records: @d generate code for all constructs @{(dump-structs) (dump-unions) (dump-functions) (dump-variables) (dump-enums) (dump-macros) @} \subsection{Structure of the Scheme source file} The final file is called \verb|chez.sch| and is laid out in the following way: @o chez.sch @{@ @ @ @ @ @ @ @ @ @ @} \subsection{The standard library} To facilitate the translation, a small standard library has been created ahead of time (see section \ref{stdlib} for the implementation). The library implements primitive dereferencing functions and a memory copy procedure. There is a C file and a Scheme file. The C file must be compiled and loaded into the Scheme system using the FFI facilities for this (which happen to be operating-system dependent in Chez Scheme). The Scheme FFI code for the standard library must be loaded into the Scheme system at run time. \section{Policy in the Chez Scheme Back-end} \label{thecode} Chez Scheme version 5 has a fairly simple native FFI. It supports most primitive C data types for both parameters and return values, and performs data conversions on call and return. A string is passed as a \verb|char*| to the first character (in Chez Scheme, strings are 0-terminated for C compatibility), but is returned by copying the data into a fresh Scheme string. There is no direct support for the \verb|short| datatype, variadic procedures, call by reference, calling functions through a function pointer, struct/union arguments, or struct/union return values, nor is there support for passing general Scheme objects like pairs, arrays, or functions to foreign functions. In this (first) version of the translation I have decided not to implement any mechanisms for handling special cases like the \verb|fgets()| example outlined in the {\em Manifesto}; that will have to come later. Such a mechanism might take the form of a file containing rules for how to override certain function signatures, for example. See section \ref{future}. \subsection{The Output} \label{the-output} Two output files are produced: a C file and a Scheme file. The Scheme file will hold the FFI code which interfaces to the library and its data types. The C file will hold supporting (``glue'') code generated along the way: accessors, mutators, and so on. The need for supporting code will become clear as you read. \verb|c-output| and \verb|sch-output| are global variables which hold output ports for generated C code and Scheme code, respectively. @d back-end global variables @{(define c-output #f) (define sch-output #f) @| c-output sch-output @} The files are initialized in \verb|generate-translation| by computing file names and assigning opened ports to the global variables; for simplicity I use fixed names for the time being. The use of \verb|delete-file| is necessary in Chez Scheme, lest it complain when opening an existing file for output. @d initialize output files @{(delete-file "C-OUTPUT") (delete-file "SCH-OUTPUT") (set! c-output (open-output-file "C-OUTPUT")) (set! sch-output (open-output-file "SCH-OUTPUT")) @} Once the files are opened they are initialized with standard definitions. The C file \verb|#include|s the FFIGEN standard library header and the ANSI C standard library header; there is nothing to be done in the Scheme case. We should probably also emit a line here which includes the original header file, but the name of that file is not currently available in the intermediate format. @d initialize output files @{(display "#include \"chez-stdlib.h\"" c-output) (newline c-output) (display "#include \"stdlib.h\"" c-output) (newline c-output) @} When processing is done, the output files must be closed: @d finalize output files @{(close-output-port c-output) (close-output-port sch-output) @} \subsection{Names} Since Scheme is case-insensitive and C is not, names which do not clash in C may do so in Scheme. This can be handled fairly simply by keeping track of all generated Scheme names and warning about conflicts. In this version I have not implemented this, but it will be implemented in the future. In general, parameter names in generated C procedures are prepended with underscores to prevent name clashes with \verb|typedef|s: names starting with a single underscore are reserved for libraries and will in principle not appear in user code. However, this scheme is not foolproof, considering that FFIGEN is used to translate library code. We might need to come up with something better. \subsection{Primitive Types} The procedure \verb|chez-type| takes a primitive type structure and returns the name of the Chez Scheme parameter or return type. Already we have to make hard choices: what do we do with pointers, in particular character pointers? In the current implementation I let a \verb|char*| be a \verb|char*|; programs which wish to pass strings will have to first copy the string data into an allocated character buffer and then pass the address of the buffer. A function which performs the copy operation is part of the standard library. Other decisions I have made are to represent pointers as unsigned 32-bit ints (hence all pointers are the same size), and to represent \verb|char|, \verb|signed char|, and \verb|unsigned char| all as the same \verb|char| FFI type. If the type is a \verb|short| variant, a warning is generated.% \footnote{Chez Scheme does not support a \verb|short| FFI type. Since the ANSI C standard (or at least K\&R 2nd edition) seems to say that a \verb|short| can be passed to a function with a prototype in scope without being widened to \verb|int|, we can't simply use \verb|integer-32| as the FFI argument type for a \verb|short|. On many architectures using a 32-bit integer would work because the APIs don't support parameters smaller than 32 bits. We could also have used proxy functions; see section \ref{struct-vals}.} Enums are assumed to be 32-bit integers. @d Chez FFI names for primitive types @{(define (chez-type type) (case (record-tag type) ((pointer) 'unsigned-32) ((int long enum) 'integer-32) ((unsigned unsigned-long) 'unsigned-32) ((char unsigned-char signed-char) 'char) ((void) 'void) ((double) 'double-float) ((float) 'single-float) ((***invalid***) '***invalid***) (else (warn "Cannot translate this type: " type) (string->symbol (string-append (symbol->string '***invalid:) (symbol->string (record-tag type)) "***"))))) @| chez-type @} The special type \verb|***invalid***| and its variant \verb|***invalid:tag***| are used to signal errors detected during translation; by using these types in the output, no file that was generated with errors can actually be used. In addition, we have to consider how to dereference pointers to primitive types (pointers to structured types are handled later). For example, if I have an \verb|int*| and I want to access the integer it points to, how do I do this? For this purpose, dereferencing functions are provided in the standard library. Each function takes a pointer \verb|p| and an offset \verb|k| and fetches the element \verb|p[k]|. There are also functions which store values through pointers to primitive types. See section \ref{stdlib} for the signatures and the actual implementation. By treating pointers to pointers as generic pointer types, we handle the general case of recursive dereferencing. Since pointers and integers are the same size and type checking is lax, we can use the unsigned integer access functions to follow pointer chains. For example, if we have an \verb|int** p| and want the integer, the code to access it would be (in C) \verb|_ref_int(_ref_uint(p,0),0)|. \subsection{Structured Types} The translations for structs and unions are similar. I'll talk about structures, but the discussion pertains to unions as well. The procedures \verb|dump-structs| and \verb|dump-unions| are almost identical, and call the same function to do their work: @d dump structs and unions @{(define (dump-structs) (dump-struct/union structs struct-names "struct")) (define (dump-unions) (dump-struct/union unions union-names "union")) @| dump-structs dump-unions @} \verb|struct-names| and \verb|union-names| are procedures which return a list of all \verb|typedef|s which directly name a struct or union; see the {\em FFIGEN User's Manual} for a discussion. In the following, we will use this structure as an example: \begin{flushleft} \small \begin{minipage}{\linewidth} \begin{list}{}{} \item \mbox{}\verb@@struct X {@@\\ \mbox{}\verb@@ int i;@@\\ \mbox{}\verb@@ char c[5];@@\\ \mbox{}\verb@@ union {@@\\ \mbox{}\verb@@ char *s;@@\\ \mbox{}\verb@@ int *q;@@\\ \mbox{}\verb@@ } u;@@\\ \mbox{}\verb@@};@@\\ \end{list} \vspace{-1ex} \end{minipage}\\[4ex] \end{flushleft} \vspace{-3ex} Each structure in the intermediate form has a tag (\verb|X| in the example), although some of these tags were generated by the compiler. Whether the tag was programmer-defined or not is important for how code is generated (see below), and the procedure \verb|dump-struct/union| takes it into account when generating FFI code for each referenced structure. Only structures which have programmer-defined tags have operations generated for them with the name of the tag. Structures which are referred to by a \verb|typedef| must be handled specially. There are two cases. If the \verb|typedef|'d structure also has a programmer-defined tag, then Scheme definitions are emitted which define operations using the \verb|typedef| name to be the same as the operations using the structure tag. If the type does not have a programmer-defined tag, then operations are defined on the structure using the \verb|typedef| name only; you get names like \verb|_get_X_i|. @d dump structs and unions @{(define (dump-struct/union records typedef-name-getter qualifier) (for-each (lambda (structure) (if (referenced? structure) (begin (if (user-defined-tag? (tag structure)) (dump-struct/union-def structure qualifier (tag structure))) (for-each (lambda (n) (if (user-defined-tag? (tag structure)) (generate-reference-to-structure structure n qualifier) (dump-struct/union-def structure "" n))) (typedef-name-getter structure))))) records)) @| dump-struct/union @} The procedure \verb|generate-reference-to-structure| takes a structure which has a cached list of defined constructor, destructor, accessor, and mutator function, a \verb|typedef|-name, and a qualifier (\verb|struct|, \verb|union|, or the empty string) and generates Scheme definitions which use the \verb|typedef|-name but which refer to the already-defined functions using the qualified name and structure tag. For example, if there is already a foreign function called \verb|_get_struct_X_F| and the \verb|typedef| name is \verb|Y|, then a variable called \verb|_get_Y_F| will be bound to the value of the existing foreign function. @d dump structs and unions @{(define (generate-reference-to-structure structure typedef-name qualifier) (for-each (lambda (n) (let ((newname (compute-newname n typedef-name (tag structure) qualifier))) (display `(define ,newname ,n) sch-output) (newline sch-output))) (cached-names structure))) @| generate-reference-to-structure @} The procedure \verb|compute-newname| takes an already emitted name and generates a name which uses the \verb|typedef| name. @d dump structs and unions @{(define (compute-newname oldname typedef-name tag qualifier) (let ((q (string-append qualifier "_" tag))) (let ((get (string-append "_get_" q)) (set (string-append "_set_" q)) (alloc (string-append "_alloc_" q)) (free (string-append "_free_" q))) (cond ((string-prefix=? oldname get) (string-append "_get_" typedef-name (substring oldname (string-length get) (string-length oldname)))) ((string-prefix=? oldname set) (string-append "_set_" typedef-name (substring oldname (string-length set) (string-length oldname)))) ((string-prefix=? oldname alloc) (string-append "_alloc_" typedef-name)) ((string-prefix=? oldname free) (string-append "_free_" typedef-name)) (else (error "compute-newname: can't handle: " oldname)))))) @| compute-newname @} The procedure \verb|dump-struct/union-def| takes a structure type, a qualifier (a string, either \verb|struct|, \verb|union|, or the empty string) and the C name of the structure (its tag or \verb|typedef| name), and generates constructors, destructors, accessors, and mutators for the structure. @d dump structs and unions @{(define (dump-struct/union-def structure qualifier name) (if (not (null? (fields structure))) (let* ((funcname (if (string=? qualifier "") name (string-append qualifier "_" name))) (cast (if (string=? qualifier "") name (string-append qualifier " " name)))) (generate-constructor-and-destructor structure funcname cast) (generate-accessors-and-mutators structure funcname cast "")))) @| dump-struct/union-def @} Constructors, destructors, accessors and mutators are generated only if the field list for the structure is non-empty, as an empty field lists implies that the structure has merely been declared. \subsubsection{Constructors and destructors} The procedure \verb|generate-constructor-and-destructor| generates constructor and destructor procedures for its argument. The constructor procedure is called \verb|_alloc_struct_X|. This procedure takes no parameters and allocates memory for the given type, returning a pointer (cast to an unsigned integer) or the literal integer 0 (which may be different from the null pointer). The destructor procedure is called \verb|_free_struct_X|; it takes a value returned from the constructor and frees the allocated memory. Generating a constructor may be excessive; a single interface to \verb|free()| is probably enough. The destructors may go away in the future. The choice of value for the null pointer warrants discussion. I use the integer 0 to make it possible to test for a null pointer by comparing with 0 on the Scheme side; it is a decision which probably simplifies the Scheme code. Using an unadulterated null pointer would make comparison with 0 nonportable, since a null pointer is not guaranteed to be all-bits-zero. An alternative would be to generate foreign procedures which return null pointers, one procedure for each pointer type in the program, and these return values could be used for pointer comparisons.\footnote{There's one null pointer for every pointer type.} @d dump structs and unions @{(define (generate-constructor-and-destructor structure funcname cast) (function-pair constructor-template (vector funcname cast) (string-append "_alloc_" funcname) '((void ())) `(pointer ,(struct/union-ref structure))) (function-pair destructor-template (vector funcname cast) (string-append "_free_" funcname) `((pointer ,(struct/union-ref structure))) '(void ())) (cache-name structure (string-append "_alloc_" funcname)) (cache-name structure (string-append "_free_" funcname))) @| generate-constructor-and-destructor @} In the templates for constructors and destructors, \verb|@@0| is the name of the structure as we want it to be generated: \verb|struct_X|, \verb|union_X|, or \verb|X| (for a \verb|typedef|); and \verb|@@1| is the cast expression for the type: \verb|struct X|, \verb|union X|, or \verb|X|. @d dump structs and unions @{(define constructor-template "unsigned _alloc_@@0(void) { @@1 *_p = (@@1 *)malloc(sizeof(@@1)); return (_p == 0 ? 0 : (unsigned)_p); }") (define destructor-template "void _free_@@0(unsigned _p) { if (_p == 0) abort(); free((@@1 *)_p); }") @| constructor-template destructor-template @} \subsubsection{Accessors and Mutators} For each field \verb|F| of basic or array type there will be an accessor function \verb|_get_struct_X_F|, and for each field of basic type there will be a mutator function \verb|_set_struct_X_F|. The accessor takes a pointer and returns the field value; the mutator takes a pointer and a value and updates the field. If the structure has nested structures, functions are generated which will access fields in the nested structures in the expected way; the naming scheme is to replace the ``.'' in the C syntax with an underscore ``\_''. The generated C code for the accessors and mutators will look like this: \begin{flushleft} \small \begin{minipage}{\linewidth} \begin{list}{}{} \item \mbox{}\verb@@int _get_struct_X_i(unsigned _p) { return ((struct X*)_p)->i; }@@\\ \mbox{}\verb@@void _set_struct_X_i(unsigned _p, int _v) { ((struct X*)_p)->i = _v; }@@\\ \mbox{}\verb@@unsigned _get_struct_X_u_s(unsigned _p) { return (unsigned)((struct X*)_p)->u.s; }@@\\ \mbox{}\verb@@void _set_struct_X_u_s(unsigned _p, unsigned _v) { ((struct X*)_p)->u.s = (char*)_v; }@@\\ \end{list} \vspace{-1ex} \end{minipage}\\[4ex] \end{flushleft} \vspace{-3ex} Procedures are not generated which directly dereference pointer fields; the program must first fetch the pointer field and then dereference it. Arrays in structures are handled like pointer fields, so \verb|_get_struct_X_c| will be generated and will return a pointer to the first element of \verb|c|, which can then be dereferenced. The procedure \verb|generate-accessors-and-mutators| takes a structure type, a name to be used for the function (initially just \verb|struct_X|, \verb|union_X|, or \verb|X| for a typedef name), a cast expression (\verb|struct X|, \verb|union X|, or just \verb|X|), and a field selector expression (initially the empty string). As the program descends into the structure the function name and the selector expression will be updated to reflect the field which is being accessed. Appended to the function name will be an underscore and the field name, so we'll see \verb|struct_X_i| and \verb|struct_X_u_s|, for example. The selector will be the corresponding access expression for C, so \verb|i| and \verb|u.s|. When generating accessors and mutators, there are three cases: fields of some basic type (primitive types and pointers), fields of array type, and fields of structured type. @d dump structs and unions @{(define (generate-accessors-and-mutators structure funcname cast selector) (for-each (lambda (field) (let ((funcname (string-append funcname "_" (canonical-name (name field)))) (selector (string-append selector (if (string=? selector "") "" ".") (name field)))) (cond ((basic-type? (type field)) (getset-basic-type structure funcname cast selector field)) ((array-type? (type field)) (getset-array-type structure funcname cast selector field)) ((structured-type? (type field)) (getset-structured-type structure funcname cast selector field)) (else (error 'generate-accessors-and-mutators "Unknown: " field))))) (fields structure))) @| generate-accessors-and-mutators @} The use of \verb|canonical-name| is important. This procedure transforms an identifier from its C syntax into a syntax acceptable to Scheme by transforming the letters to the canonical case of the representation. (For example, if the symbol \verb|a| prints as \verb|A| then the canonical case of the implementation is upper case.) Fields of basic types get both accessors and mutators.\footnote{Fields which are \verb|const| should not have a mutator; when we get around to implementing general qualifiers on types, this must be fixed.} @d dump structs and unions @{(define (getset-basic-type struct funcname cast selector field) (let* ((typename (basic-type-name (type field))) (fieldtype (c-cast-expression (type field)))) (function-pair accessor-template (vector typename funcname cast selector) (string-append "_get_" funcname) `((pointer ,(struct/union-ref struct))) (type field)) (function-pair mutator-template (vector typename funcname cast selector fieldtype) (string-append "_set_" funcname) `((pointer ,(struct/union-ref struct)) ,(type field)) `(void ())) (cache-name struct (string-append "_get_" funcname)) (cache-name struct (string-append "_set_" funcname)))) @| getset-basic-type @} In the accessor and mutator templates, \verb|@@0| is the value return or parameter type (pointers are always ``unsigned''), \verb|@@1| is the function name, \verb|@@2| is a cast expression for the structure pointer type, \verb|@@3| is the C field selector expression, and \verb|@@4| is the cast expression from the parameter type to the field type (used in mutators only). @d dump structs and unions @{(define accessor-template "@@0 _get_@@1( unsigned _p ) { return (@@0)((@@2*)_p)->@@3; }") (define mutator-template "void _set_@@1( unsigned _p, @@0 _v ) { ((@@2*)_p)->@@3 = (@@4)_v; }") @| accessor-template mutator-template @} Array types are just like basic types, but there is no mutator because the array name is \verb|const|. The name of the array denotes a pointer to the first element, so that's what we return. @d dump structs and unions @{(define (getset-array-type structure funcname cast selector field) (function-pair array-accessor-template (vector funcname cast selector) (string-append "_get_" funcname) `((pointer ,(struct/union-ref structure))) '(unsigned)) (cache-name structure (string-append "_get_" funcname))) (define array-accessor-template "unsigned _get_@@0( unsigned _p ) { return (unsigned)(((@@1*)_p)->@@2); }") @| getset-array-type array-accessor-template @} Structured types are handled by adding the field name to the selector and recurring; we ignore the distinction between struct and union here as it doesn't matter. @d dump structs and unions @{(define (getset-structured-type structure funcname cast selector field) (let (;(selector (string-append selector "." (name field))) ;(funcname (string-append funcname "_" (canonical-name (name field)))) (struct (if (eq? (record-tag (type field)) 'struct-ref) (lookup (tag (type field)) structs) (lookup (tag (type field)) unions)))) (generate-accessors-and-mutators struct funcname cast selector))) @| getset-structured-type @} \subsubsection{Structures as values} \label{struct-vals} As mentioned above, there are no provisions for using structures or unions as values in the program, not even when they have the size of a primitive type. The reason for this is that it is not natural for Scheme to assign objects which are larger than a single location (and special-casing small structures because the are small seems silly). For example, the following function takes a structure value as an argument (by copy, that is, by assignment): \begin{flushleft} \small \begin{minipage}{\linewidth} \begin{list}{}{} \item \mbox{}\verb@@void f( struct X x ) { ... }@@\\ \end{list} \vspace{-1ex} \end{minipage}\\[4ex] \end{flushleft} \vspace{-3ex} There is no way to call this function directly in the generated FFI for Chez Scheme. We could fudge our way around it by generating a proxy function which acts as a level of indirection: \begin{flushleft} \small \begin{minipage}{\linewidth} \begin{list}{}{} \item \mbox{}\verb@@void _proxy_f( struct X *x ) { f( *x ); }@@\\ \end{list} \vspace{-1ex} \end{minipage}\\[4ex] \end{flushleft} \vspace{-3ex} \noindent but I have not implemented this. If it becomes a problem it needs to be fixed. Libraries which manipulate complex numbers represented as structures may need it. Another upshot is that there is no way to ask for the field \verb|u| of the structure defined earlier.\footnote{There is no way to ask for its address, either, but that's easy to implement.} \subsection{Global Variables} In order to simplify things, global variables are handled by generating a single function for each global variable; the function takes no parameters and returns the address of the global. Technically the following code should be finessed for arrays, since the type of a global array is not a pointer to an array, but an array, or at the very least, a pointer to the base type of the array. @d dump global variable accessors @{(define (dump-variables) (for-each (lambda (v) (let ((n (canonical-name (name v)))) (function-pair global-template (vector n (name v)) (string-append "_glob_" n) '((void ())) `(pointer ,(type v))))) vars)) (define global-template "unsigned _glob_@@0( void ) { return (unsigned)&@@1; }") @| dump-vars global-template @} \subsection{Functions} Supporting functions in all their generality is pretty much impossible, and I'm not going to try. The restrictions are: \begin{description} \item[Variadic functions] are not supported. Such support can be implemented but the resulting code is target-specific and probably involves some assembly language. Some C compilers (like Watcom C) provide library functions which makes the job much easier, but most don't. \item[Old-style prototypes] (that is, declarations of the form \verb|char *malloc()| where the number and types of the arguments are not known) are hard, since the native Chez FFI requires an exact argument count and the type for each argument. My solution is to generate a warning for this and translate the interface as if it had no arguments, which is almost always wrong. \item[Parameters of function type] work fine, but you can't pass Scheme procedures to a C function. \item[Calling through function pointers] is not implemented, since it's hard to express this in the native FFI, which requires the name of the function in order to fetch it from a library. We can get around this by generating a proxy function for each function pointer type in the header file and calling the function through it by passing it the function pointer and the arguments, essentially a trampoline. \end{description} \noindent \verb|Dump-functions| creates a Scheme-side interface for each referenced function. @d dump function definitions @{(define (dump-functions) (for-each (lambda (f) (define-foreign (name f) (type f))) functions)) @| dump-functions @} \verb|Define-foreign| checks that every argument type and return type is valid and attempts to give a reasonable error message if not; if everything is OK then a Scheme definition for the foreign function is emitted.: @D dump function definitions @{(define (define-foreign name type) (let ((argtypes (arglist type)) (returntype (rett type))) (let loop ((l argtypes)) (cond ((null? l) #t) ((structured-type? (car l)) (warn "Cannot pass structured value of type" (rational-typename (car l)) "to function" name) (set-car! l '(***invalid***)) (loop (cdr l))) (else (loop (cdr l))))) (if (structured-type? returntype) (begin (warn "Cannot receive structured value of type" (rational-typename returntype) "from function" name) (set! returntype '(***invalid***)))) (write `(define ,(string->symbol (canonical-name name)) (foreign-function ,name ,(chez-map-args argtypes name) ,(chez-type returntype))) sch-output) (newline sch-output))) @| define-foreign @} The argument list must be handled carefully to deal with functions without prototypes and functions with variable-length parameter lists: @d dump function definitions @{(define (chez-map-args args name) (cond ((and (= (length args) 1) (eq? (caar args) 'void)) '()) ((= (length args) 0) (warn "Function without prototype assumed to take no arguments:" name) '()) (else (map (lambda (x) (if (eq? (record-tag x) 'void) (begin (warn "Varargs *cannot* be handled for" name) '***invalid***) (chez-type x))) args)))) @| chez-map-args @} \subsection{Enums} Enums are straightforward. There is a definition for each enum-ident: \begin{flushleft} \small \begin{minipage}{\linewidth} \begin{list}{}{} \item \mbox{}\verb@@enum { A=100, B, C };@@\\ \end{list} \vspace{-1ex} \end{minipage}\\[4ex] \end{flushleft} \vspace{-3ex} \noindent becomes \begin{flushleft} \small \begin{minipage}{\linewidth} \begin{list}{}{} \item \mbox{}\verb@@(define a 100)@@\\ \mbox{}\verb@@(define b 101)@@\\ \mbox{}\verb@@(define c 102)@@\\ \end{list} \vspace{-1ex} \end{minipage}\\[4ex] \end{flushleft} \vspace{-3ex} \noindent The enum's tag, should it have one, is ignored. The code is straightforward: @d dump enum definitions @{(define (dump-enums) (for-each (lambda (x) (display (instantiate "(define @@0 @@1)" (vector (canonical-name (name x)) (number->string (value x)))) sch-output) (newline sch-output)) enum-idents)) @| dump-enums @} \subsection{Macros} Macros are hard in the general case, because the general case is that the right-hand-side names some arbitrarily complex C statement which might even depend on variables in some scope not in effect when the macro is defined. Handling C macros in their full generality is therefore not an attainable goal for most targets. However, we can still do useful things. If the left-hand-side is a simple identifier (no parameters) and the right-hand-side parses as a number or a string, then we can emit a simple definition which gives the name the value. Being more ambitious we can take each macro which has parameters and whose right-hand-side is a closed expression--that is, the macro which results from expanding all applicable macros on the right-hand-side of the macro has no free variables--and convert it to a function. For now, we handle the simple case of an integer right-hand-side only. @d dump macro definitions @{(define (dump-macros) (for-each (lambda (m) (if (and (valid-ident? (name m)) (valid-number? (value m))) (begin (display `(define ,(canonical-name (name m)) ,(evaluate-number (value m))) sch-output) (newline sch-output)))) macros)) @| dump-macros @} \verb|Valid-ident| and \verb|valid-number| check that their arguments are valid C identifier and numbers, respectively. @d dump macro definitions @{(define (valid-ident? s) (andmap (lambda (c) (or (char-upper-case? c) (char-lower-case? c) (char-numeric? c) (char=? c #\_))) (string->list s))) (define (valid-number? s) (let ((n (evaluate-number s))) n)) @| valid-ident? valid-number? @} \section{Utility functions} \label{utils} The procedure \verb|function-pair| generates corresponding definitions in both languages. @d utility functions @{(define (function-pair c-template template-args scheme-name arglist rett) (display (instantiate c-template template-args) c-output) (newline c-output) (define-foreign scheme-name `(function ,arglist ,rett))) @| function-pair @} The C type name generated for a given basic type is very much a policy issue; here are our encodings: @d utility functions @{(define (basic-type-name type) (let ((probe (assq (record-tag type) '((char . "char") (signed-char . "signed char") (unsigned-char . "unsigned char") (short . "short") (unsigned-short . "unsigned short") (int . "int") (enum . "int") (unsigned . "unsigned") (long . "long") (unsigned-long . "unsigned long") (void . "void") (pointer . "unsigned") (float . "float") (double . "double") )))) (if probe (cdr probe) (begin (warn "Unknown type " type) "***invalid***")))) @| basic-type-name @} The procedure \verb|c-cast-expression| takes a type and returns a C-syntax expression which can be used in a type cast for casting a value to the given type. It's a hard problem and the code below does not do a very good job; it needs to be refined considerably. See e.g.~K\&R~2nd edition section 5.12. One issue is what happens when we have a structured type which does not have a user-defined tag. There will be exactly one \verb|typedef| which refers to the structure, and that name should be used instead. @d utility functions @{(define (c-cast-expression type) (define (function-cast stars ftype) (string-append (c-cast-expression (rett ftype)) "(" stars ")(" (apply string-append (insert "," (map c-cast-expression (arglist ftype)))) ")")) (cond ((primitive-type? type) (basic-type-name type)) ((pointer-type? type) (let loop ((t (cadr type)) (stars "*")) (cond ((eq? 'function (record-tag t)) (function-cast stars t)) ((eq? 'pointer (record-tag t)) (loop (cadr t) (string-append "*" stars))) (else (string-append (c-cast-expression (cadr type)) "*"))))) ((eq? (record-tag type) 'enum-ref) (basic-type-name '(int ()))) ((memq (record-tag type) '(struct-ref union-ref)) (let ((t (tag type))) (if (user-defined-tag? t) (string-append (if (eq? (record-tag type) 'struct-ref) "struct " "union ") t) (let ((names (if (eq? (record-tag type) 'struct-ref) (struct-names type) (union-names type)))) (if (= (length names) 1) (car names) (error "c-cast-expression: bad: " type)))))) (else (warn "c-cast-expression: Too complicated: " type) "unknown"))) (define (insert x l) (define (loop l) (if (null? (cdr l)) l (cons (car l) (cons x (loop (cdr l)))))) (if (or (null? l) (null? (cdr l))) l (loop l))) @| c-cast-expression insert @} \verb|String-prefix=?| takes a string and a prefix and tests whether the prefix matches the string. @d utility functions @{(define (string-prefix=? s prefix) (let ((limit (string-length prefix))) (and (<= limit (string-length s)) (let loop ((i 0)) (or (= i limit) (and (char=? (string-ref s i) (string-ref prefix i)) (loop (+ i 1)))))))) @| string-prefix=? @} \verb|Rational-typename| takes a type; if the type is a \verb|struct-ref| or \verb|union-ref| and the tag is auto-generated, then a type is returned where the tag is replaced by the unique \verb|typedef|-name which uses the structure (if one can be found). This procedure is useful for giving meaningful error messages. @d utility functions @{(define (rational-typename type) (case (record-tag type) ((struct-ref) (if (user-defined-tag? (tag type)) type (let ((t (lookup (tag type) structs))) (if (not t) type (list 'struct-ref (tag t)))))) ((union-ref) (if (user-defined-tag? (tag type)) type (let ((t (lookup (tag type) unions))) (if (not t) type (list 'union-ref (tag t)))))) (else type))) @| rational-typename @} \verb|Evaluate-number| takes a string and attempts to parse it as a C number, returning the value if the parse was successful and \verb|#f| if not. Currently only integers are handled reliably. @d utility functions @{(define (evaluate-number s) (let ((k (string->list s))) (cond ((null? k) #f) ((not (char-numeric? (car k))) #f) ((char=? (car k) #\0) (cond ((null? (cdr k)) 0) ((or (char=? (cadr k) #\x) (char=? (cadr k) #\X)) (string->number (list->string (cddr k)) 16)) (else (string->number s 8)))) (else (string->number s))))) @| evaluate-number @} \section{Discussion} \label{discussion} Clearly using a function call to access each field is not very efficient, compared to C. But if the foreign function interface is reasonably efficient it shouldn't be too bad. It's hard to know how efficient it will be in Chez Scheme without some benchmarking, but a call-out to a foreign function does not need to be much more expensive than a call to a procedure in the same language. \section{Future Work on the Chez Back-end} \label{future} \begin{itemize} \item Handling name clashes, both finding them and working around them (if necessary). \item Dealing with qualifiers on types in general; handling \verb|const| in particular. \item Better generation of C \verb|typedef| casts. \item Proper choice of names for the output files. \item Experiment with policy choices defined in a file: for example, a file could contain a specification of which prototypes to override (like in the \verb|fgets| example), which file names to include or exclude, how to name procedures on the Scheme side, and how to compute the referenced or unreferenced sets. Since we are programming in Scheme here, these decisions could be implemented as code in a file which is loaded at run-time. As an example of what has been thought of but not implemented, read the file \verb|chez-policy.sch| in the current distribution. \end{itemize} \section{Support libraries} \label{stdlib} The file \verb|chez-stdlib.h| defines the prototypes for the C-side standard library procedures. @O chez-stdlib.h @{int _ref_int( unsigned _p, int _k ); void _set_int( unsigned _p, int _k, int _v ); unsigned _ref_uint( unsigned _p, int _k ); void _set_uint( unsigned _p, int _k, unsigned _v ); char _ref_char( unsigned _p, int _k ); void _set_char( unsigned _p, int _k, char _v ); double _ref_double( unsigned _p, int _k ); void _set_double( unsigned _p, int _k, double _v ); float _ref_float( unsigned _p, int _k ); void _set_float( unsigned _p, int _k, float _v ); @} The implementation of the library is in \verb|chez-stdlib.c|; it's pretty obvious. @O chez-stdlib.c @{#include "chez-stdlib.h" int _ref_int( unsigned _p, int _k ) { return ((int*)_p)[_k]; } void _set_int( unsigned _p, int _k, int _v ) { ((int*)_p)[_k] = _v; } unsigned _ref_uint( unsigned _p, int _k ) { return ((unsigned*)_p)[_k]; } void _set_uint( unsigned _p, int _k, unsigned _v ) { ((unsigned*)_p)[_k] = _v; } char _ref_char( unsigned _p, int _k ) { return ((char*)_p)p[_k]; } void _set_char( unsigned _p, int _k, char _v ) { ((char*)_p)[_k] = _v; } double _ref_double( unsigned _p, int _k ) { return ((double*)_p)[_k]; } void _set_double( unsigned _p, int _k, double _v ) { ((double*)_p)[_k] = _v; } float _ref_float( unsigned _p, int _k ) { return ((float*)_p)[_k]; } void _set_float( unsigned _p, int _k, float _v ) { ((float*)_p)[_k] = _v; } @} Finally, a Scheme file \verb|chez-stdlib.sch| defines the foreign function interfaces to the standard library: @O chez-stdlib.sch @{(define _ref_int (foreign-function "_ref_int" (unsigned-32 integer-32) integer-32)) (define _set_int (foreign-function "_set_int" (unsigned-32 integer-32 integer-32) void)) (define _ref_uint (foreign-function "_ref_uint" (unsigned-32 integer-32) integer-32)) (define _set_uint (foreign-function "_set_uint" (unsigned-32 integer-32 unsigned-32) void)) (define _ref_char (foreign-function "_ref_char" (unsigned-32 integer-32) char)) (define _set_char (foreign-function "_set_char" (unsigned-32 integer-32 char) void)) (define _ref_double (foreign-function "_ref_double" (unsigned-32 integer-32) double-float)) (define _set_double (foreign-function "_set_double" (unsigned-32 integer-32 double-float) void)) (define _ref_float (foreign-function "_ref_float" (unsigned-32 integer-32) single-float)) (define _set_float (foreign-function "_set_float" (unsigned-32 integer-32 single-float) void)) (define _memcpy (let ((memcpy-*-* (foreign-function "memcpy" (unsigned-32 unsigned-32 unsigned-32) void)) (memcpy-string-string (foreign-function "memcpy" (string string unsigned-32) void)) (memcpy-string-* (foreign-function "memcpy" (string unsigned-32 unsigned-32) void)) (memcpy-*-string (foreign-function "memcpy" (unsigned-32 string unsigned-32) void))) (lambda (a b count) (cond ((string? a) (cond ((string? b) (memcpy-string-string a b count)) ((integer? b) (memcpy-string-* a b count)) (else ???))) ((integer? a) (cond ((string? b) (memcpy-*-string a b count)) ((integer? b) (memcpy-*-* a b count)) (else ???))) (else ???))))) @} \pagebreak \section*{Index of Identifiers} The underlined scrap number shows the defining scrap for the identifier; all other scrap numbers are uses. Also note that many procedures are not defined in this file but in the target-independent part of the back-end, in file {\tt process.sch}. @u \end{document}