<!-- -*- mode: html; mode: font-lock -*-

Hand-translated from LaTeX to HTML by lth on 2000-05-16, and
converted footnotes to in-line text. Fixed a small number of
typos. No other changes. -->
|
|
|
|
<html>
|
|
<head>
|
|
<title>FFIGEN User's Manual</title>
|
|
</head>
|
|
|
|
<body>
|
|
|
|
<center>
<h1>FFIGEN User's Manual</h1><br>
(Preliminary)<br>
Lars Thomas Hansen<br>
<tt>lth@cs.uoregon.edu</tt><br>
February 6, 1996
</center>
|
|
|
|
<h2>1. Introduction</h2>
|
|
|
|
<p>FFIGEN is a program system which facilitates the writing of
|
|
translators from C header files to foreign function interfaces for
|
|
particular programming language implementations. This document
|
|
describes its structure and use. The discussion is aimed at translator
|
|
writers; everyone else should confine themselves to section 3. A
|
|
companion document, <a href="manifesto.html">FFIGEN Manifesto and
|
|
Overview</a>, motivates the work, and other companion documents describe
|
|
specific translator implementations. In particular, the document
|
|
<em>FFIGEN Back-end for Chez Scheme Version 5</em> describes one
|
|
translator in detail.</p>
|
|
|
|
<p>FFIGEN is based on the <em>lcc</em> C compiler, which is copyrighted
|
|
software. See Section 10 for a full copyright notice.</p>
|
|
|
|
<h2>2. Writing Translators</h2>
|
|
|
|
<p>To translate a header file, you first run the <em>ffigen</em>
command to produce an intermediate form of the C header files you want
to translate, and then run the back-end on the resulting files to
generate the foreign function interface for the library.</p>
|
|
|
|
<p>Your task, should you choose to accept it, is to implement the
|
|
target-specific parts of the back-end for your particular target (which
|
|
is to say, combination of host language implementation, operating
|
|
system, architecture, foreign language implementation, and translation
|
|
policy). You should be able to use the FFIGEN front-end and the
|
|
target-independent parts of the back-end pretty much as they are.</p>
|
|
|
|
<p>How to implement the target-specific parts of the back-end is
discussed in Section 6. Use of the front end is described in Section 3.
The intermediate format is described in Section 4, and the
target-independent parts of the back-end and their interface to the
target-dependent part are described in Section 5. Finally, Section 7
covers some issues which need to be tackled in the future.</p>
|
|
|
|
<h2>3. Running FFIGEN</h2>
|
|
|
|
<p>The command <em>ffigen</em> is run on a set of header files, together with
preprocessor options and include-directory options. Arguments are processed
in order. For each header file (type <tt>.h</tt>) and all the files it
includes, a single intermediate file (type <tt>.ffi</tt>) is
produced.</p>
|
|
|
|
<p>The options are:
|
|
<dl>
|
|
<dt><tt>-Dname[=value]</tt>
|
|
<dd>Define preprocessor macro.
|
|
<dt><tt>-Uname</tt>
|
|
<dd>Undefine preprocessor macro.
|
|
<dt><tt>-Idirectory</tt>
<dd>Add directory to the <em>beginning</em> of the list
of include directories. Standard directories include the <em>lcc</em> include
directory, <tt>/usr/include</tt>, and the current directory (in that order).
See the release notes for information about how to change the defaults.
|
|
</dl>
|
|
|
|
<em>ffigen</em> performs full syntax and type checks on its input.</p>
|
|
|
|
The back-end is run by starting your favorite Scheme system and then
|
|
loading first the target-independent file <tt>process.sch</tt> and second
|
|
the target-dependent part of the translator; in the case of the Chez
|
|
Scheme back-end the file is called <tt>chez.sch</tt>. You then call the
|
|
procedure <tt>process</tt> with the name of the <tt>.ffi</tt> file to
|
|
process, as discussed in section 5.
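<p>As a minimal sketch of such a session, assuming the Chez Scheme back-end
and a hypothetical intermediate file <tt>stdio.ffi</tt> already produced by
<em>ffigen</em> (the file name is an illustration only, and both
<tt>.sch</tt> files are assumed to be in the current directory):</p>

<pre>
(load "process.sch")    ; target-independent part, loaded first
(load "chez.sch")       ; target-dependent part (Chez Scheme back-end)
(process "stdio.ffi")   ; hypothetical .ffi file; see section 5
</pre>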
|
|
|
|
<h2>4. Intermediate Format</h2>
|
|
|
|
<p>The intermediate format consists of s-expressions following this grammar:
|
|
|
|
<pre>
&lt;file&gt;      -&gt; &lt;record&gt; ...
&lt;record&gt;    -&gt; (function &lt;filename&gt; &lt;name&gt; &lt;type&gt; &lt;attrs&gt;)
             | (var &lt;filename&gt; &lt;name&gt; &lt;type&gt; &lt;attrs&gt;)
             | (type &lt;filename&gt; &lt;name&gt; &lt;type&gt;)
             | (struct &lt;filename&gt; &lt;name&gt; ((&lt;name&gt; &lt;type&gt;) ...))
             | (union &lt;filename&gt; &lt;name&gt; ((&lt;name&gt; &lt;type&gt;) ...))
             | (enum &lt;filename&gt; &lt;name&gt; ((&lt;name&gt; &lt;value&gt;) ...))
             | (enum-ident &lt;filename&gt; &lt;name&gt; &lt;value&gt;)
             | (macro &lt;filename&gt; &lt;name+args&gt; &lt;body&gt;)
&lt;type&gt;      -&gt; (&lt;primitive&gt; &lt;attrs&gt;)
             | (struct-ref &lt;tag&gt;)
             | (union-ref &lt;tag&gt;)
             | (enum-ref &lt;tag&gt;)
             | (function (&lt;type&gt; ...) &lt;type&gt;)
             | (pointer &lt;type&gt;)
             | (array &lt;value&gt; &lt;type&gt;)
&lt;attrs&gt;     -&gt; (&lt;attr&gt; ...)
&lt;attr&gt;      -&gt; static | extern | const | volatile
&lt;primitive&gt; -&gt; char | signed-char | unsigned-char | short
             | unsigned-short | int | unsigned | long
             | unsigned-long | float | double | void
&lt;value&gt;     -&gt; &lt;integer&gt;
&lt;filename&gt;  -&gt; &lt;string&gt;
&lt;name&gt;      -&gt; &lt;string&gt;
&lt;body&gt;      -&gt; &lt;string&gt;
&lt;name+args&gt; -&gt; &lt;string&gt;
&lt;tag&gt;       -&gt; &lt;string&gt;
</pre>
|
|
|
|
Notes relating to the grammar:</p>
|
|
|
|
<ul>
|
|
<li> <tt>...</tt> means "zero or more of" the preceding item.
|
|
|
|
<li> The grammar is a little more general than the actual output
|
|
language. All structs, unions, and enums in parameter lists, return
|
|
types, and variable declarations are encoded as <tt>struct-ref</tt>,
|
|
<tt>union-ref</tt>, and <tt>enum-ref</tt>, respectively; structure, union,
|
|
and enum type definitions occur only in <tt>struct</tt>, <tt>union</tt>,
|
|
and <tt>enum</tt> records.
|
|
|
|
<li> The <tt>&lt;tag&gt;</tt> field in structs, unions, and enums (and their
<tt>-ref</tt> forms) is the tag of the type. If one of these types
has a user-defined tag, then that tag is used in the <tt>-ref</tt>
item for the type; if the type had no user-defined tag then a tag has been
generated by <em>lcc</em>. Generated tags have the syntax of positive
integers; in particular they start with a digit. There is one namespace
each for structs, unions, and enums.
|
|
|
|
<li>
|
|
<tt>typedef</tt> names are not used anywhere: they occur in <tt>type</tt>
|
|
records only.
|
|
|
|
<li>
|
|
The attributes on primitive types are <tt>const</tt> or <tt>volatile</tt>; the
|
|
attributes <tt>static</tt> and <tt>extern</tt> are used only on functions and
|
|
global variables.
|
|
|
|
<li>
Functions which are known to take no parameters (i.e., <tt>t f(void)</tt>) have
one parameter, of type <tt>(void ())</tt>. The void type appears in a
parameter list only as the last element.

<li>
Functions which take a variable number of arguments have at least one
defined non-void parameter and a last parameter of type <tt>(void ())</tt>.

<li>
Functions for which no parameters were defined (i.e., <tt>t f()</tt>) have
an empty parameter list.
|
|
|
|
<li>
|
|
The ordering of records in the input has no relation to the
|
|
relative ordering of declarations in the original source.
|
|
|
|
<li>
The <tt>&lt;value&gt;</tt> field in the array is its size. If the size is not
known, it is 0.
|
|
|
|
<li>
|
|
Multidimensional arrays are represented as nested array types with the
|
|
leftmost dimension outermost in the expected way; i.e., it looks like
|
|
an array of arrays.
|
|
|
|
<li>
|
|
Arrays are not valid return types.
|
|
|
|
<li>
|
|
Array parameters lose some semantic information in the translation in
|
|
the current system. An array parameter <tt>t a[n]</tt> is always
|
|
converted to a pointer: <tt>(pointer t)</tt> regardless of whether
|
|
<tt>n</tt> is known or not. As expected, then, something like
|
|
<tt>t a[n][m][o]</tt> gets the parameter type
|
|
<tt>(pointer (array m (array o t)))</tt>. Note that this only pertains to
|
|
parameter types; variables of array type are not converted in this manner.
|
|
(The semantic information claimed lost is the size of the leftmost
|
|
dimension. This lossage may make it impossible to perform array conversion
|
|
at call boundaries, for example.)
|
|
|
|
<li>
The grammar describes the current format, which will change: line number
and column information will be incorporated. You should always use the
accessor functions defined in the target-independent part of the
back-end; see section 5. The grammar does not allow
for bit fields or for qualifiers on anything but primitive
types, but these will be accommodated eventually.
|
|
|
|
</ul>
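<p>To make the notes above concrete, here is a small sketch in Scheme. The
records are hand-written illustrations of the grammar, not captured
<em>ffigen</em> output; the file name <tt>example.h</tt>, the C declarations
in the comments, and the <tt>extern</tt> attributes are assumptions. The
helper <tt>parameter-style</tt> is likewise hypothetical: it works directly
on the s-expression form of a parameter list to show the three parameter
conventions described above, whereas a real back-end should go through the
accessor functions of section 5.</p>

<pre>
;; Hand-written examples that follow the grammar (not real ffigen output).
(define example-records
  '(;; extern int puts(const char *s);
    (function "example.h" "puts"
              (function ((pointer (char (const)))) (int ()))
              (extern))
    ;; struct point { int x; int y; };
    (struct "example.h" "point"
            (("x" (int ())) ("y" (int ()))))
    ;; extern double table[4][8];   leftmost dimension is outermost
    (var "example.h" "table"
         (array 4 (array 8 (double ())))
         (extern))))

;; True if t is the primitive void type, e.g. (void ()).
(define (void-type? t)
  (and (pair? t) (eq? (car t) 'void)))

;; Last element of a non-empty list.
(define (last-element lst)
  (if (null? (cdr lst))
      (car lst)
      (last-element (cdr lst))))

;; Classify the parameter list of a function type.
(define (parameter-style params)
  (cond ((null? params) 'unspecified)                       ; t f()
        ((not (void-type? (last-element params))) 'fixed)   ; t f(int)
        ((null? (cdr params)) 'none)                        ; t f(void)
        (else 'variadic)))                                  ; t f(int, ...)

(parameter-style '())                                  ; unspecified
(parameter-style '((void ())))                         ; none
(parameter-style '((int ())))                          ; fixed
(parameter-style '((pointer (char (const))) (void ()))) ; variadic
</pre>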
|
|
|
|
|
|
<h2>5. The Target-Independent Back-End</h2>
|
|
|
|
<p>The target-independent back-end is a Scheme program called
|
|
<tt>process</tt> which reads the intermediate form into memory and
|
|
performs some initial processing. It exports some global variables and
|
|
a number of procedures which are used to access the structures in the
|
|
database of intermediate records, and imports two target-dependent
|
|
functions from the target-dependent back-end. This section describes
|
|
the interfaces.</p>
|
|
|
|
<p>The global variables which hold the database are:
|
|
|
|
<pre>
(define functions '())    ; list of function records
(define vars '())         ; list of var records
(define types '())        ; list of type records
(define structs '())      ; list of struct records
(define unions '())       ; list of union records
(define macros '())       ; list of macro records
(define enums '())        ; list of enum records
(define enum-idents '())  ; list of enum-ident records
</pre>
|
|
|
|
Each of these contains a list of all the records of the type indicated
|
|
by their names. Note that records may look different internally than
|
|
in the defined intermediate form, so accessor functions (see below) should
|
|
always be used.</p>
|
|
|
|
<p>In addition, there are two globals which are set but not used by
|
|
the target-independent back-end:
|
|
|
|
<pre>
|
|
(define source-file #f) ; name of the input file itself
|
|
(define filenames '()) ; names of all files in the input
|
|
</pre>
|
|
</p>
|
|
|
|
<p>The main entry point to the back end is the procedure <tt>process</tt>,
|
|
which takes a single file name as an argument. <tt>Process</tt>
|
|
initializes globals, reads the file, and processes the records.
|
|
|
|
<pre>
|
|
(define (process filename) ...)
|
|
</pre></p>
|
|
|
|
<p>Record processing consists of some general analysis and target-specific
|
|
code generation. First, the target-specific procedure
|
|
<tt>select-functions</tt> is called; it must set or reset the
|
|
"referenced" bit in each record depending on whether the function is
|
|
interesting to the back-end or not. After computing reachability of
|
|
structured types and setting the referenced bits of those types which
|
|
are reachable, a translation is generated by a call to the back-end
|
|
function <tt>generate-translation</tt>, which takes no arguments.
|
|
|
|
<pre>
|
|
(define (select-functions) ...)
|
|
(define (generate-translation) ...)
|
|
</pre></p>
|
|
|
|
<p>A number of data structure accessors and mutators are also available.
|
|
These are generic procedures which work on all of the record types.
|
|
|
|
<pre>
(define (file r) ...)          ; file name of record
(define (name r) ...)          ; name in records which have one
(define (type r) ...)          ; type in records which have one
(define (attrs r) ...)         ; attrs in records which have one
(define (fields r) ...)        ; fields in struct/union record
(define (value r) ...)         ; value of enum-ident record
(define (tag r) ...)           ; tag in struct-ref/union-ref/enum-ref record

(define (referenced? r) ...)   ; is record referenced?
(define (referenced! r) ...)   ; set referenced bit
(define (unreferenced! r) ...) ; reset referenced bit
</pre>
|
|
|
|
Arguably the <tt>tag</tt> accessor should go away and <tt>name</tt>
|
|
should simply be used in its place. As it is, <tt>name</tt> is not
|
|
defined on <tt>struct-ref</tt>, <tt>union-ref</tt>, and
|
|
<tt>enum-ref</tt> records.</p>
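<p>The two target-dependent procedures and the accessors fit together
roughly as follows. This is only a sketch of a trivial policy, not the Chez
Scheme back-end: it assumes that <tt>process.sch</tt> has already been loaded
(so that <tt>functions</tt>, <tt>referenced!</tt>, <tt>referenced?</tt>,
<tt>name</tt>, and <tt>file</tt> are defined), it selects every function, and
it prints one line per selected function instead of generating real
interface code.</p>

<pre>
;; Policy sketch: every function record is considered interesting.
(define (select-functions)
  (for-each referenced! functions))

;; Stand-in for real code generation: list the referenced functions.
(define (generate-translation)
  (for-each (lambda (r)
              (if (referenced? r)
                  (begin
                    (display (name r))
                    (display " declared in ")
                    (display (file r))
                    (newline))))
            functions))
</pre>

<p>A more selective policy would call <tt>unreferenced!</tt> on the records
it wants to skip.</p>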
|
|
|
|
<p>The procedure <tt>record-tag</tt> returns the tag of the record currently
|
|
being held. It can also be applied to types.
|
|
|
|
<pre>
|
|
(define (record-tag r) ...) ; get record tag
|
|
</pre></p>
|
|
|
|
<p>All records can have back-end specific values attached to them; usually
|
|
these are cached names for operations on structured values, so for now
|
|
the procedures which manipulate the back-end specific data are called
|
|
<tt>cache-name</tt> to remember a value and <tt>cached-names</tt> to return
|
|
the list of remembered values:
|
|
|
|
<pre>
|
|
(define (cache-name r v) ...) ; remember value in record
|
|
(define (cached-names r) ...) ; retrieve remembered values
|
|
</pre>
|
|
|
|
We should probably replace this with a more general property-list-like
|
|
mechanism.</p>
|
|
|
|
<p>In addition, two procedures extract parts of function types:
|
|
|
|
<pre>
|
|
(define (arglist r) ...) ; function argument types
|
|
(define (rett r) ...) ; function return type
|
|
</pre></p>
|
|
|
|
<p>Some utilities to deal with file names are also provided:
|
|
|
|
<pre>
|
|
(define (strip-extension fn) ...)
|
|
(define (strip-path fn) ...)
|
|
(define (get-path fn) ...)
|
|
</pre></p>
|
|
|
|
<p>A string macro expander makes it easier to generate C code, for the back
|
|
ends that need it. The macro expander is called <tt>instantiate</tt> and
|
|
is called with a string template and a vector of arguments (which are
|
|
also strings). The template contains patterns of the form <tt>@n</tt>
|
|
where <tt>n</tt> is a single digit; when such a pattern is seen it is
|
|
replaced with the corresponding value from the argument vector.
|
|
|
|
<pre>
|
|
(define (instantiate template arguments) ...)
|
|
</pre></p>
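<p>The description above can be sketched as a small stand-alone procedure.
This is not the code in <tt>process.sch</tt>; the name
<tt>instantiate-sketch</tt> is hypothetical, and the sketch assumes that
<tt>@0</tt> denotes the first element of the argument vector (the numbering
base is not stated above).</p>

<pre>
;; Replace each @n (n a single digit) in template with element n of
;; the vector arguments, whose elements must be strings.
(define (instantiate-sketch template arguments)
  (let loop ((chars (string-&gt;list template)) (out '()))
    (cond ((null? chars)
           (apply string-append (reverse out)))
          ((and (char=? (car chars) #\@)
                (pair? (cdr chars))
                (char-numeric? (cadr chars)))
           (let ((index (- (char-&gt;integer (cadr chars))
                           (char-&gt;integer #\0))))
             (loop (cddr chars)
                   (cons (vector-ref arguments index) out))))
          (else
           (loop (cdr chars)
                 (cons (string (car chars)) out))))))

(instantiate-sketch "extern @0 @1(void);" (vector "int" "getpid"))
;; returns "extern int getpid(void);"
</pre>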
|
|
|
|
<p>Two procedures, <tt>struct-names</tt> and <tt>union-names</tt>, take a
structure (or union) record and return a list of all the typedef names which
reference the structure (or union) directly.
|
|
|
|
<pre>
|
|
(define (struct-names struct) ...)
|
|
(define (union-names union) ...)
|
|
</pre></p>
|
|
|
|
<p>An association function which searches one of the record lists for a
|
|
given record by the <tt>name</tt> field is also available:
|
|
|
|
<pre>
|
|
(define (lookup key items) ...)
|
|
</pre></p>
|
|
|
|
<p>The procedure <tt>user-defined-tag?</tt> determines whether a tag was
|
|
defined by the user or generated by the system:
|
|
|
|
<pre>
|
|
(define (user-defined-tag? x) ...)
|
|
</pre></p>
|
|
|
|
<p>The procedure <tt>warn</tt> takes some arbitrary arguments and generates
|
|
a warning message on standard output:
|
|
|
|
<pre>
|
|
(define (warn msg . rest) ...)
|
|
</pre></p>
|
|
|
|
<p>Some standard predicates take a type and test its kind:
|
|
<tt>primitive-type?</tt> is true if the argument is of a primitive type as
|
|
outlined in the grammar above; <tt>basic-type?</tt> is true if the
|
|
argument is a primitive type or a pointer type; <tt>array-type?</tt> is
|
|
true if the argument is an array type, and finally,
|
|
<tt>structured-type?</tt> is true if the argument is a <tt>struct-ref</tt>
|
|
or <tt>union-ref</tt> type:
|
|
|
|
<pre>
(define (primitive-type? t) ...)
(define (basic-type? t) ...)
(define (array-type? t) ...)
(define (structured-type? t) ...)
</pre></p>
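<p>A back-end will typically dispatch on these predicates. The following
sketch assumes <tt>process.sch</tt> is loaded; the procedure name
<tt>describe-type</tt> and the strings it returns are hypothetical. Because
<tt>basic-type?</tt> is also true of primitive types, the primitive case is
tested first.</p>

<pre>
;; Return a short description of the kind of type t.
(define (describe-type t)
  (cond ((primitive-type? t)  "primitive")
        ((basic-type? t)      "pointer")          ; basic but not primitive
        ((array-type? t)      "array")
        ((structured-type? t) "struct or union")
        (else                 "other")))          ; e.g. enum-ref, function
</pre>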
|
|
|
|
<h2>6. Writing a Target-Dependent Back-End</h2>
|
|
|
|
<p>To write the target-dependent back-end, you must decide on the policy
|
|
for the translation and then implement the translation. The policy
|
|
covers such issues as: which constructs in C are or are not handled; the
|
|
translation for each handled construct; how non-handled constructs are
|
|
dealt with (ignored, detected with warnings, detected with errors); how
|
|
to deal with exceptional cases (consider the <tt>fgets</tt> example from
|
|
the <a href="manifesto.html">Manifesto</a>).</p>
|
|
|
|
<p>For a concrete example, see the companion document <em>FFIGEN Back-end
for Chez Scheme Version 5</em>, which addresses many of the choices to be
made and their possible solutions.</p>
|
|
|
|
<h2>7. Future Work</h2>
|
|
|
|
<p>A number of features <em>will</em> be supported in the future:</p>
|
|
|
|
<ul>
|
|
<li> There will be a line and a column field in each record, giving the
|
|
source line on which the identifier was defined.
|
|
|
|
<li> Bitfields will be supported.
|
|
|
|
<li> Qualifiers (currently called attributes, that is, <tt>const</tt> and
<tt>volatile</tt>) will be supported on all types, not just on primitive
non-pointer types as at present.
|
|
|
|
<li> The intermediate representation will include the name of the original
input file, and its path.
|
|
|
|
<li> The intermediate representation will include a representation of
|
|
the include file hierarchy which was traversed to produce the
|
|
intermediate representation.
|
|
|
|
</ul>
|
|
|
|
<p>A number of features will most likely be supported, but need
|
|
to be investigated:</p>
|
|
|
|
<ul>
|
|
<li> It would be nice to retain comments.
|
|
|
|
<li> Various popular extensions to C are not currently supported by
|
|
<em>lcc</em>, but would be extremely useful: <tt>long long</tt> is used
|
|
extensively in Unix header files, and header files for compilers on PCs
|
|
often use the common Microsoft extensions <tt>__huge</tt>, <tt>__far</tt>,
|
|
and <tt>__near</tt> (and their non-underscore equivalents). Some C compilers
|
|
support <tt>__inline</tt> declarations, and although we can't generate
|
|
code for in-line procedures we can at least parse them if the compiler
|
|
can cope with <tt>__inline</tt>. (<tt>__inline</tt> is the easiest, since it
can be ignored. The others must show up as type qualifiers or new types.)
|
|
|
|
<li> The current shell-program driver will probably be replaced by
|
|
something based on the lcc driver.
|
|
|
|
<li> I'm going to experiment with partial macro application in the
|
|
front end so that back-ends can have simple support for macro
|
|
definitions. Currently, for example, even something as simple as the
|
|
<tt>EOF</tt> macro will be ignored by the Chez Scheme back-end because its
|
|
form is <tt>"(-1)"</tt> rather than simply <tt>"-1"</tt>.
|
|
|
|
<li> Information about the layout of fields within structured types
|
|
should possibly be emitted; this information would be useful to
|
|
low-level FFIs which need byte offset and size to access the field of a
|
|
structure.
|
|
|
|
</ul>
|
|
|
|
<p>In addition, there are some issues to investigate in a larger
|
|
perspective:</p>
|
|
|
|
<ul>
|
|
<li> General (target-independent) support for useful policy mechanisms.
|
|
|
|
<li> How well can the intermediate language support other front-ends?
|
|
I don't want to fall into the UNCOL pit, but it would be interesting to
|
|
see how languages which resemble C in their parameter passing mechanisms
|
|
(Pascal, Modula, Oberon) could be mapped onto the intermediate language.
|
|
This is not high priority with me, however. If I embark on supporting
|
|
another front-end language it will probably be (sigh) C++.
|
|
|
|
</ul>
|
|
|
|
<h2>8. Please Contribute!</h2>
|
|
|
|
<p>My goal is to support as many target languages as is reasonable, but I
|
|
can't write all the translators myself (I lack the time and, in many
|
|
cases, the knowledge). Targets that I will take care of include STk,
|
|
and, if no-one beats me to it, Scsh, both Scheme systems. Someone has
|
|
already volunteered to write the ILU back-end. Others are interested
|
|
in back-ends for Modula-3 and Mercury.</p>
|
|
|
|
<p>Volunteers for any translator back-end are welcome to e-mail me.
I will coach, coordinate, and help out as much as
possible.</p>
|
|
|
|
<h2>9. Credits</h2>
|
|
|
|
<p>FFIGEN is based on the freely available <em>lcc</em> ANSI C compiler,
|
|
implemented by Christopher Fraser (of AT&T Bell Labs) and David Hanson
|
|
(of Princeton University).</p>
|
|
|
|
<p>I would like to thank Fraser and Hanson for producing such an excellent
system; <em>lcc</em> has been a joy to work with, and their book, <em>A
Retargetable C Compiler: Design and Implementation</em>, made it possible
to implement the FFIGEN front end in roughly a single
work day. Would that all software were this clean!</p>
|
|
|
|
<p>The development of FFIGEN was supported by ARPA
under U.S. Army grant No. DABT63-94-C-0029,
"Programming Environments, Compiler Technology and Runtime Systems
for Object Oriented Parallel Processing".</p>
|
|
|
|
<h2>10. Copyrights</h2>
|
|
|
|
<em>lcc</em> is covered by the following Copyright notice:
|
|
|
|
<blockquote>
|
|
<p>The authors of this software are Christopher W. Fraser and
|
|
David R. Hanson.</p>
|
|
|
|
<p>Copyright (c) 1991,1992,1993,1994,1995 by AT&T, Christopher W. Fraser,
|
|
and David R. Hanson. All Rights Reserved.</p>
|
|
|
|
<p>Permission to use, copy, modify, and distribute this software for any
|
|
purpose, subject to the provisions described below, without fee is
|
|
hereby granted, provided that this entire notice is included in all
|
|
copies of any software that is or includes a copy or modification of
|
|
this software and in all copies of the supporting documentation for
|
|
such software.</p>
|
|
|
|
<p>THIS SOFTWARE IS BEING PROVIDED "AS IS", WITHOUT ANY EXPRESS OR IMPLIED
|
|
WARRANTY. IN PARTICULAR, NEITHER THE AUTHORS NOR AT&T MAKE ANY
|
|
REPRESENTATION OR WARRANTY OF ANY KIND CONCERNING THE MERCHANTABILITY
|
|
OF THIS SOFTWARE OR ITS FITNESS FOR ANY PARTICULAR PURPOSE.</p>
|
|
|
|
<p>lcc is not public-domain software, shareware, and it is not protected
|
|
by a `copyleft' agreement, like the code from the Free Software
|
|
Foundation.</p>
|
|
|
|
<p>lcc is available free for your personal research and instructional use
|
|
under the `fair use' provisions of the copyright law. You may,
|
|
however, redistribute the lcc in whole or in part provided you
|
|
acknowledge its source and include this COPYRIGHT file.</P>
|
|
|
|
<p>You may not sell lcc or any product derived from it in which it is a
|
|
significant part of the value of the product. Using the lcc front end
|
|
to build a C syntax checker is an example of this kind of product.</p>
|
|
|
|
<p>You may use parts of lcc in products as long as you charge for only
|
|
those components that are entirely your own and you acknowledge the use
|
|
of lcc clearly in all product documentation and distribution media. You
|
|
must state clearly that your product uses or is based on parts of lcc
|
|
and that lcc is available free of charge. You must also request that
|
|
bug reports on your product be reported to you. Using the lcc front
|
|
end to build a C compiler for the Motorola 88000 chip and charging for
|
|
and distributing only the 88000 code generator is an example of this
|
|
kind of product.</p>
|
|
|
|
<p>Using parts of lcc in other products is more problematic. For example,
|
|
using parts of lcc in a C++ compiler could save substantial time and
|
|
effort and therefore contribute significantly to the profitability of
|
|
the product. This kind of use, or any use where others stand to make a
|
|
profit from what is primarily our work, is subject to negotiation.</p>
|
|
|
|
<p>Chris Fraser / cwf@research.att.com <br>
|
|
David Hanson / drh@cs.princeton.edu<br>
|
|
Fri Jun 17 11:57:07 EDT 1994</p>
|
|
</blockquote>
|
|
|
|
<hr>
|
|
<address>
|
|
<A HREF="mailto:lth@acm.org">lth@acm.org</A>
|
|
</address>
|
|
<em>24 May 2000</em>
|
|
</body>
|
|
</html>
|