
<!-- -*- mode: html; mode: font-lock -*-

  Hand-translated from LaTeX to HTML by lth on 2000-05-16,
  converted footnotes to in-line text, and inserted hyperlinks.
  No other changes. -->

<html>
<head>
<title>FFIGEN Manifesto and Overview</title>
</head>

<body>

<center>
<h1>FFIGEN Manifesto and Overview</h1><br>
Lars Thomas Hansen <br>
<tt>lth@cs.uoregon.edu</tt><br>
February 6, 1996
</center>

<blockquote>
<p>FFIGEN (Foreign Function Interface GENerator) is a program suite which
facilitates the writing of translators from C header files to foreign
function interfaces for particular language implementations.</p>

<p>On a more general level, FFIGEN is a statement about how such
translators should be structured for maximum usability, namely as a
single translator from C to a rational intermediate language and as
multiple translators from the intermediate language to separate FFI
translations.  In the present document I motivate this two-level
structure by arguing that the many policy questions inherent in choosing
a mapping from one language to another cannot be accommodated in a single
translator, and that the two-level structure promotes significant code
reuse.  Companion documents present the program suite itself.</p>
</blockquote>

<h2>1. Manifesto</h2>

<p>Many language implementations have mechanisms which provide support for
call-outs to other, typically more primitive, languages.  In particular,
implementations of very-high-level languages like Scheme, Common Lisp,
Standard ML, and Haskell support call-outs to system-level languages,
typically C.  Other examples include the support for call-outs to C and
assembly language in C++, the EXTRINSIC directive in HPF, and the
<tt>&lt;*EXTERNAL*&gt;</tt> pragma in DEC SRC Modula-3.  Mechanisms for calling out
to other languages are typically called <em>foreign function
interfaces</em> (FFIs).  The purpose of an FFI is often to gain access to
functionality which is not (efficiently) expressible in the language
itself; other times the FFI is used to allow the program to interface to
existing libraries.</p>

<p>FFIs are only rarely part of the language definition; the only examples
I can think of are the support for C and assembly in C++ and the
EXTRINSIC directive in HPF.  More typically, each language
implementation has its own idiosyncratic and often ad-hoc mechanism for
supporting foreign data types, functions, and variables.  The mechanisms
are probably not standardized because they depend to a large extent on
the calling conventions of the procedure being called, the operating
system on which the program is running, the architecture of the machine,
the data types of the language being called, the version of the
compilers for the host and foreign languages, and so on. (In the
following I will refer to a point in the space made from the product of
the preceding attributes as a <em>target</em>.)  Since the system
dependencies are considerable, it is unlikely that a fully general and
portable FFI can be defined for a language, and in addition, an
interface that works with all targets is likely to be neither functional
nor convenient.  The chances for any portable, standardized language to
adopt a non-trivial FFI therefore seem slight.  This is not to say that
an adequate job can't be done in many cases--for example, Franz Allegro
Common Lisp sports a sophisticated FFI which supports C and Fortran
seemingly very well--only that no <em>standard</em> and <em>general</em>
solution is likely to emerge.</p>

<p>Based on these observations, an approach to inter-language calling would
be to accept the fact that FFIs are implementation-dependent and instead
concentrate our effort on a higher level of abstraction: that of the
library interface.  Even if the FFI is target-dependent, most of the
time the interface to a library is not (which is the beauty of an
interface in the first place).  If, for each library, there existed a
reasonable definition of its interface, then a program could take that
definition and generate FFI code for the library for a given target.
This is the approach advocated by the creators of the ILU system (see
section 3).</p>

<p>However, manufacturers of libraries are <em>not</em> distributing
reasonable definitions of the interfaces to their libraries.  All you
usually get is a C or C++ header file.  A header file is not a
reasonable definition of the interface because of the baggage it
carries: nested include files, preprocessor macros, conditional
compilation, syntactic peculiarities, implementation language target
dependencies, and so on.  In the best of all worlds, the manufacturer
would distribute the interfaces in an interface definition language like
the Object Management Group's IDL or ILU's ISL, and maybe one day that
will be common.  In the meantime, we must fend for ourselves.</p>

<p>What we must do is to provide a translator which takes as its input not
a reasonable definition but instead a C or C++ header file or set of
header files, and produces as its output the FFI code for the library
for a given target.  However, such a program is likely to be complicated
and there will be one version for each target.  Maintaining all these
translators will be an unpleasant task.  We could of course have one
translator, to IDL or ISL, and translators from the interface language
to the FFI, and as we will see, this is a variation on the mechanism
implemented by FFIGEN.</p>

<p>An additional important problem is that there is not one but several
translations for every target.  A given interface can be translated to
any of several FFIs depending on the desired <em>policy</em> for the
translation.  For example, consider a function

<pre>
  char *fgets(char*, int, FILE*);
</pre>

What does <tt>char*</tt> translate to?  Consider the FFI provided by Chez
Scheme version 5.  It has a <tt>string</tt> type which in a parameter
position causes the address of the first character of the string
argument to be passed to the function, but which in the return position
causes the characters to be copied from the storage pointed to by the
return value (if not <tt>NULL</tt>) into a fresh Scheme string.  So if we
translate <tt>char*</tt> as <tt>string</tt>, we end up with (since
<tt>FILE*</tt> is translated as an <tt>unsigned int</tt>)

<pre>
  (define fgets
    (foreign-function "fgets"
       (string integer-32 unsigned-32)
       string))
</pre>

which is expensive because the string is (needlessly) copied on return.
On the other hand, we can treat a <tt>char*</tt> as "just a pointer" and
translate as:

<pre>
  (define fgets
    (foreign-function "fgets"
       (unsigned-32 integer-32 unsigned-32)
       unsigned-32))
</pre>

but this does not let us access the characters in the buffer using
Scheme's string functions, since the buffer is not a string.  In the
end, it appears that no fixed translation for <tt>char*</tt> is possible;
even if a fixed translation (and then: which one of them?) is adequate
in most situations, there will be special cases.  (Arguably, it
would have been better for <tt>fgets()</tt> to return a truth value or the
number of characters read.)</p>
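
<p>The dependence on policy can be sketched in a few lines of Python.
This is an illustration only and not part of FFIGEN: the names
<tt>CHAR_PTR_POLICIES</tt>, <tt>FIXED_TYPES</tt>, <tt>chez_type</tt>, and
<tt>emit_chez</tt> are hypothetical, a 32-bit target is assumed, and the
type table covers just the three C types that appear in <tt>fgets</tt>.
The same declaration yields either of the two definitions shown above,
depending only on the policy chosen for <tt>char*</tt>.</p>

```python
# Hypothetical sketch: one C declaration, two translation policies.
# Nothing here is FFIGEN's actual code or intermediate format.

# Policy table: how char* is rendered as a Chez Scheme FFI type.
CHAR_PTR_POLICIES = {
    "copy-as-string": "string",     # convenient, but copies on return
    "raw-pointer": "unsigned-32",   # cheap, but not a Scheme string
}

# Fixed mappings for the other types used by fgets (32-bit target assumed).
FIXED_TYPES = {
    "int": "integer-32",
    "FILE*": "unsigned-32",
}


def chez_type(c_type, policy):
    """Map a C type name to a Chez Scheme FFI type under the given policy."""
    if c_type == "char*":
        return CHAR_PTR_POLICIES[policy]
    return FIXED_TYPES[c_type]


def emit_chez(name, param_types, return_type, policy):
    """Return the text of a foreign-function definition for one C function."""
    params = " ".join(chez_type(t, policy) for t in param_types)
    result = chez_type(return_type, policy)
    return (
        f"(define {name}\n"
        f'  (foreign-function "{name}"\n'
        f"     ({params})\n"
        f"     {result}))"
    )


if __name__ == "__main__":
    # Print the translation of fgets under each policy in turn.
    for policy in CHAR_PTR_POLICIES:
        print(f";; policy: {policy}")
        print(emit_chez("fgets", ["char*", "int", "FILE*"], "char*", policy))
```

<p>In this sketch a further policy would be one more table entry rather
than a new translator, which is the kind of separation the rest of this
document argues for.</p>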

<p>The bottom line is, there is a lot of policy that goes into a
translation into a specific FFI.  Hence we have a slogan (the core of
the Manifesto):</p>

<blockquote>
A good foreign function interface is 25% code and 75% policy.
</blockquote>

<p>It should be a goal, then, to separate the arduous task of parsing and
type-checking C headers and translating them into a rational
intermediate form, from the task of translating the intermediate form
into an FFI specification for a given target and translation policy.</p>

<h2>2. The FFIGEN System</h2>

<p>I have written a program, which I call <em>ffigen</em>, which takes
as its input a C header file and produces as its output a rational
translation of the interface defined by the header file.  A rational
translation is one in which unnecessary or redundant syntax has been
removed, preprocessor macros have been expanded, and preprocessor
conditionals have been resolved so that definitions have been included
or excluded correspondingly.  The exact format of the intermediate code
is described in a companion document, the <a href="userman.html">FFIGEN
User's Manual</a>.  <em>ffigen</em> functions as the <em>front-end</em>
of a system which translates C headers into foreign function
interfaces.</p>

<p>Each target system will have one or more specific <em>back-ends</em> which
take the intermediate form and produce translations for particular
targets and translation policies.  Substantial parts of the back-end
code are target-independent and can therefore be shared by
multiple back-ends.</p>
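
<p>The division of labour between the front-end, the shared back-end
code, and the target-specific parts might look roughly as follows.  This
is a sketch under stated assumptions: the record layout in
<tt>INTERMEDIATE</tt> is invented for illustration and is not FFIGEN's
real intermediate format (see the User's Manual for that), and
<tt>translate</tt>, <tt>emit_chez</tt>, and <tt>emit_summary</tt> are
hypothetical names.  The second back-end is a plain-text summary that
merely stands in for some other target.</p>

```python
# Hypothetical sketch of the two-level structure. The intermediate form
# below is invented for illustration; it is not FFIGEN's real format.

# Front-end output: macros expanded, conditionals resolved, syntax removed.
INTERMEDIATE = [
    ("function", "fgets", ["char*", "int", "FILE*"], "char*"),
    ("function", "fclose", ["FILE*"], "int"),
    ("variable", "optind", "int"),
]


def functions(records):
    """Shared, target-independent code: select the function records."""
    return [r for r in records if r[0] == "function"]


def translate(records, type_map, emit):
    """Shared driver: apply a back-end's type policy and its emitter."""
    out = []
    for _, name, params, result in functions(records):
        out.append(emit(name, [type_map[p] for p in params], type_map[result]))
    return out


# Back-end A: Chez Scheme, with the policy "every pointer is a raw address".
CHEZ_RAW = {"char*": "unsigned-32", "FILE*": "unsigned-32", "int": "integer-32"}


def emit_chez(name, params, result):
    joined = " ".join(params)
    return f'(define {name} (foreign-function "{name}" ({joined}) {result}))'


# Back-end B: a plain-text summary, standing in for some other target.
SUMMARY = {"char*": "pointer", "FILE*": "pointer", "int": "int32"}


def emit_summary(name, params, result):
    joined = ", ".join(params)
    return f"{name}: ({joined}) returns {result}"


if __name__ == "__main__":
    for line in translate(INTERMEDIATE, CHEZ_RAW, emit_chez):
        print(line)
    for line in translate(INTERMEDIATE, SUMMARY, emit_summary):
        print(line)
```

<p>Only the type table and the emitter differ between the two back-ends
here; the record selection and the driver loop are written once.</p>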

<p>I have written one back-end to serve as a sample; it produces FFI code
for Chez Scheme version 5.  It is documented in a companion document,
<em>FFIGEN Back-end for Chez Scheme Version 5</em>.</p>


<h2>3. Related Work</h2>

<p>Kenneth B. Russell of MIT has implemented a system called Header2Scheme
which translates C++ to the FFI of the SCM Scheme system.  FFIGEN and
Header2Scheme are fairly different at this point.  My goal with FFIGEN
was to cover all of ANSI C including the preprocessor in a reasonable
way; this is doable because ANSI C is a small, fixed, and fairly simple
language.  C++, on the other hand, is a very large, changing, and
complex language, and Header2Scheme therefore handles only part of it at
this time (as of version 1.2, it does not handle preprocessor macros,
typedefs, or enums).  In addition, my emphasis was on not fixing policy
at all, which gives great freedom (and more work) to back-end writers,
whereas Russell has mostly fixed the policy.  On the other hand,
Header2Scheme allows some policy decisions to be expressed in auxiliary
files given to the translator, and I have yet to experiment with these
mechanisms in FFIGEN.  Header2Scheme is available from URL
<pre>
http://www-white.media.mit.edu/~kbrussel/Header2Scheme
</pre>
</p>

<p>A message (<tt>&lt;1996Jan17.121933.25825@chemabs.uucp&gt;</tt>) posted
to the Usenet group <tt>comp.lang.scheme</tt> (among others) alleged that
Apple has a translator for their Dylan implementation which will take a
C header file and generate Dylan FFI glue for it.  I know nothing else
about this system (but would appreciate hearing about it from anyone who
knows).</p>

<p>The ILU (Inter-Language Unification) system from Xerox PARC provides
cross-language calling functionality for modules which have interfaces
specified in ISL, the ILU interface definition language.  ILU will take
the interfaces and produce stubs (glue, as it were) for the languages so
that they can call each other.  The ISL file specifies the interface
somewhat abstractly in terms of data types which are meaningful in ISL
but which have various mappings in the target languages; again, one
mapping is assumed to fit all.</p>

<h2>4. Acknowledgements</h2>

<p>FFIGEN is based on the <em>lcc</em> ANSI C compiler.  See the <a
href="userman.html">FFIGEN User's Manual</a> for full acknowledgements
and a copyright notice.</p>

<p>This work has been supported by ARPA under U.S. Army grant
No. DABT63-94-C-0029, "Programming Environments, Compiler Technology
and Runtime Systems for Object Oriented Parallel Processing".</p>

<hr>
<address>
<A HREF="mailto:lth@acm.org">lth@acm.org</A>
</address>
<em>24 May 2000</em>
</body>
</html>