<!-- -*- mode: html; mode: font-lock -*-

Hand-translated from LaTeX to HTML by lth on 2000-05-16,
converted footnotes to in-line text, and inserted hyperlinks.
No other changes. -->

<html>
<head>
<title>FFIGEN Manifesto and Overview</title>
</head>

<body>

<center>
<h1>FFIGEN Manifesto and Overview</h1><br>
Lars Thomas Hansen <br>
<tt>lth@cs.uoregon.edu</tt><br>
February 6, 1996
</center>

<blockquote>
<p>FFIGEN (Foreign Function Interface GENerator) is a program suite which
facilitates the writing of translators from C header files to foreign
function interfaces for particular language implementations.</p>

<p>On a more general level, FFIGEN is a statement about how such
translators should be structured for maximum usability, namely as a
single translator from C to a rational intermediate language and as
multiple translators from the intermediate language to separate FFI
translations. In the present document I motivate this two-level
structure by arguing that the many policy questions inherent in choosing
a mapping from one language to another cannot be accommodated in a single
translator, and that the two-level structure promotes significant code
reuse. Companion documents present the program suite itself.</p>
</blockquote>

<h2>1. Manifesto</h2>

<p>Many language implementations have mechanisms which provide support for
call-outs to other, typically more primitive, languages. In particular,
implementations of very-high-level languages like Scheme, Common Lisp,
Standard ML, and Haskell support call-outs to system-level languages,
typically C. Other examples include the support for call-outs to C and
assembly language in C++, the EXTRINSIC directive in HPF, and the
<tt>&lt;*EXTERNAL*&gt;</tt> pragma in DEC SRC Modula-3. Mechanisms for calling out
to other languages are typically called <em>foreign function
interfaces</em> (FFIs). The purpose of an FFI is often to gain access to
functionality which is not (efficiently) expressible in the language
itself; at other times the FFI is used to allow the program to interface to
existing libraries.</p>

<p>FFIs are only rarely part of the language definition; the only examples
I can think of are the support for C and assembly in C++ and the
EXTRINSIC directive in HPF. More typically, each language
implementation has its own idiosyncratic and often ad-hoc mechanism for
supporting foreign data types, functions, and variables. The mechanisms
are probably not standardized because they depend to a large extent on
the calling conventions of the procedure being called, the operating
system on which the program is running, the architecture of the machine,
the data types of the language being called, the version of the
compilers for the host and foreign languages, and so on. (In the
following I will refer to a point in the space made from the product of
the preceding attributes as a <em>target</em>.) Since the system
dependencies are considerable, it is unlikely that a fully general and
portable FFI can be defined for a language, and in addition, an
interface that works with all targets is likely to be neither functional
nor convenient. The chances for any portable, standardized language to
adopt a non-trivial FFI therefore seem slight. This is not to say that
an adequate job can't be done in many cases (for example, Franz Allegro
Common Lisp sports a sophisticated FFI which seemingly supports C and
Fortran very well), only that no <em>standard</em> and <em>general</em>
solution is likely to emerge.</p>

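<p>To make the notion of a target concrete, here is a minimal Python sketch
that models a target as one tuple drawn from the product of the attributes
listed above. The field names and all sample values are assumptions chosen
purely for illustration; none of them come from FFIGEN.</p>

```python
from itertools import product
from typing import NamedTuple


class Target(NamedTuple):
    """One point in the target space (field names are hypothetical)."""
    calling_convention: str
    operating_system: str
    architecture: str
    foreign_language: str
    compiler_versions: str


# Illustrative values only; they are assumptions, not a list from the text.
# The key order matches the field order of Target.
AXES = {
    "calling_convention": ["cdecl", "pascal"],
    "operating_system": ["SunOS", "Linux", "MacOS"],
    "architecture": ["sparc", "x86", "ppc"],
    "foreign_language": ["C", "Fortran"],
    "compiler_versions": ["host 5 / cc 2.7"],
}

# Every combination of attribute values is a distinct target.
ALL_TARGETS = [Target(*values) for values in product(*AXES.values())]

if __name__ == "__main__":
    print(len(ALL_TARGETS), "targets, for example", ALL_TARGETS[0])
```

<p>Even with these few sample values the product already holds 36 distinct
targets, which hints at why a single portable FFI is hard to define.</p>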
<p>Based on these observations, one approach to inter-language calling would
be to accept the fact that FFIs are implementation-dependent and instead
concentrate our effort on a higher level of abstraction: that of the
library interface. Even if the FFI is target-dependent, most of the
time the interface to a library is not (which is the beauty of an
interface in the first place). If, for each library, there existed a
reasonable definition of its interface, then a program could take that
definition and generate FFI code for the library for a given target.
This is the approach advocated by the creators of the ILU system (see
section 3).</p>

<p>However, manufacturers of libraries are <em>not</em> distributing
reasonable definitions of the interfaces to their libraries. All you
usually get is a C or C++ header file. A header file is not a
reasonable definition of the interface because of the baggage it
carries: nested include files, preprocessor macros, conditional
compilation, syntactic peculiarities, implementation language target
dependencies, and so on. In the best of all worlds, the manufacturer
would distribute the interfaces in an interface definition language like
the Object Management Group's IDL or ILU's ISL, and maybe one day that
will be common. In the meantime, we must fend for ourselves.</p>

<p>What we must do is provide a translator which takes as its input not
a reasonable definition but instead a C or C++ header file or set of
header files, and produces as its output the FFI code for the library
for a given target. However, such a program is likely to be complicated,
and there will be one version for each target. Maintaining all these
translators will be an unpleasant task. We could of course have one
translator from header files to IDL or ISL, and separate translators from
the interface language to each FFI, and as we will see, this is a variation
on the mechanism implemented by FFIGEN.</p>

<p>An additional important problem is that there is not one but several
translations for every target. A given interface can be translated to
any of several FFIs depending on the desired <em>policy</em> for the
translation. For example, consider the function

<pre>
char *fgets(char*, int, FILE*);
</pre>

What does <tt>char*</tt> translate to? Consider the FFI provided by Chez
Scheme version 5. It has a <tt>string</tt> type which, in a parameter
position, causes the address of the first character of the string
argument to be passed to the function, but which, in the return position,
causes the characters to be copied from the storage pointed to by the
return value (if not <tt>NULL</tt>) into a fresh Scheme string. So if we
translate <tt>char*</tt> as <tt>string</tt>, we end up with (since
<tt>FILE*</tt> is translated as an <tt>unsigned int</tt>)

<pre>
(define fgets
  (foreign-function "fgets"
    (string integer-32 unsigned-32)
    string))
</pre>

which is expensive because the string is copied on return (needlessly,
since <tt>fgets()</tt> returns the very buffer it was passed).
On the other hand, we can treat a <tt>char*</tt> as "just a pointer" and
translate as:

<pre>
(define fgets
  (foreign-function "fgets"
    (unsigned-32 integer-32 unsigned-32)
    unsigned-32))
</pre>

but this does not let us access the characters in the buffer using
Scheme's string functions, since the buffer is not a string. In the
end, it appears that no fixed translation for <tt>char*</tt> is possible;
even if a fixed translation (and if so, which one?) is adequate
in most situations, there will be special cases. (Arguably, it
would have been better for <tt>fgets()</tt> to return a truth value or the
number of characters read.)</p>

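<p>The two translations of <tt>fgets</tt> above differ only in the policy
chosen for <tt>char*</tt>. As a rough illustration of treating that policy as
data rather than code, the Python sketch below renders both forms from one
signature. The names <tt>translate</tt>, <tt>BASE_TYPES</tt>, and
<tt>POLICIES</tt> are hypothetical and are not part of FFIGEN; the output
simply mirrors the <tt>foreign-function</tt> forms shown above.</p>

```python
# Hypothetical sketch: none of these names come from FFIGEN itself.
# A C signature is a (name, parameter types, return type) triple.
FGETS = ("fgets", ["char*", "int", "FILE*"], "char*")

# Mappings the text takes as given for this target.
BASE_TYPES = {"int": "integer-32", "FILE*": "unsigned-32"}

# Two policies for char*: copy into a Scheme string, or pass a raw address.
POLICIES = {
    "string": {"char*": "string"},
    "pointer": {"char*": "unsigned-32"},
}


def translate(signature, policy_name):
    """Render a definition in the style shown above under the named policy."""
    name, params, result = signature
    mapping = dict(BASE_TYPES)
    mapping.update(POLICIES[policy_name])
    param_types = " ".join(mapping[p] for p in params)
    return (
        f'(define {name}\n'
        f'  (foreign-function "{name}"\n'
        f'    ({param_types})\n'
        f'    {mapping[result]}))'
    )


if __name__ == "__main__":
    for policy in ("string", "pointer"):
        print(f";; policy: {policy}")
        print(translate(FGETS, policy))
```

<p>The code that walks the signature is the same in both cases; only the
small policy table changes. A real back-end would also need per-function
exceptions, which a single global table like this one cannot express.</p>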
<p>The bottom line is that a lot of policy goes into a
translation into a specific FFI. Hence we have a slogan (the core of
the Manifesto):</p>

<blockquote>
A good foreign function interface is 25% code and 75% policy.
</blockquote>

<p>It should be a goal, then, to separate the arduous task of parsing and
type-checking C headers and translating them into a rational
intermediate form from the task of translating the intermediate form
into an FFI specification for a given target and translation policy.</p>

<h2>2. The FFIGEN System</h2>

<p>I have written a program, which I call <em>ffigen</em>, which takes
as its input a C header file and produces as its output a rational
translation of the interface defined by the header file. A rational
translation is one in which unnecessary or redundant syntax has been
removed, preprocessor macros have been expanded, and preprocessor
conditionals have been resolved so that definitions have been included
or excluded correspondingly. The exact format of the intermediate code
is described in a companion document, the <a href="userman.html">FFIGEN
User's Manual</a>. <em>ffigen</em> functions as the <em>front-end</em>
of a system which translates C headers into foreign function
interfaces.</p>

<p>Each target system will have one or more specific <em>back-ends</em> which
take the intermediate form and produce translations for particular
targets and translation policies. Substantial parts of the back-end
code are target-independent and can therefore be shared by
multiple back-ends.</p>

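<p>The division of labour between front-end and back-ends can be sketched
as follows. This is only an illustration under stated assumptions: the
record layout and every function name below are invented, and the real
intermediate format is the one specified in the FFIGEN User's Manual.</p>

```python
# Hypothetical sketch: the record layout and all names are invented for
# illustration; they are not the format that ffigen actually emits.

# Stand-in for front-end output: macros expanded, conditionals resolved.
INTERMEDIATE = [
    {"kind": "function", "name": "fgets",
     "params": ["char*", "int", "FILE*"], "result": "char*"},
    {"kind": "function", "name": "fclose",
     "params": ["FILE*"], "result": "int"},
    {"kind": "constant", "name": "EOF", "value": -1},
]


def functions(records):
    """Target-independent helper that every back-end can share."""
    return [r for r in records if r["kind"] == "function"]


def backend_listing(records):
    """Back-end 1: a plain-text summary, one line per function."""
    return [
        "{}({}) returns {}".format(f["name"], ", ".join(f["params"]), f["result"])
        for f in functions(records)
    ]


def backend_pointer_census(records):
    """Back-end 2: name the functions that need a pointer policy decision."""
    needy = []
    for f in functions(records):
        types = f["params"] + [f["result"]]
        if any(t.endswith("*") for t in types):
            needy.append(f["name"])
    return needy


if __name__ == "__main__":
    print("\n".join(backend_listing(INTERMEDIATE)))
    print(backend_pointer_census(INTERMEDIATE))
```

<p>Both back-ends read the same records and reuse <tt>functions</tt>;
neither one has to parse C, which is the part the front-end is meant to
take care of once.</p>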
<p>I have written one back-end to serve as a sample; it produces FFI code
for Chez Scheme version 5. It is documented in a companion document,
<em>FFIGEN Back-end for Chez Scheme Version 5</em>.</p>

<h2>3. Related Work</h2>

<p>Kenneth B. Russell of MIT has implemented a system called Header2Scheme
which translates C++ to the FFI of the SCM Scheme system. FFIGEN and
Header2Scheme are fairly different at this point. My goal with FFIGEN
was to cover all of ANSI C including the preprocessor in a reasonable
way; this is doable because ANSI C is a small, fixed, and fairly simple
language. C++, on the other hand, is a very large, changing, and
complex language, and Header2Scheme therefore handles only part of it at
this time (as of version 1.2, it does not handle preprocessor macros,
typedefs, or enums). In addition, my emphasis was on not fixing policy
at all, which gives great freedom (and more work) to back-end writers,
whereas Russell has mostly fixed the policy. On the other hand,
Header2Scheme allows some policy decisions to be expressed in auxiliary
files given to the translator, and I have yet to experiment with such
mechanisms in FFIGEN. Header2Scheme is available from URL
<pre>
http://www-white.media.mit.edu/~kbrussel/Header2Scheme
</pre>
</p>

<p>A message (<tt>&lt;1996Jan17.121933.25825@chemabs.uucp&gt;</tt>) posted
to the Usenet group <tt>comp.lang.scheme</tt> (among others) alleged that
Apple has a translator for their Dylan implementation which will take a
C header file and generate Dylan FFI glue for it. I know nothing else
about this system (but would appreciate hearing about it from anyone who
knows).</p>

<p>The ILU (Inter-Language Unification) system from Xerox PARC provides
cross-language calling functionality for modules which have interfaces
specified in ISL, the ILU interface definition language. ILU will take
the interfaces and produce stubs (glue, as it were) for the languages so
that they can call each other. The ISL file specifies the interface
somewhat abstractly in terms of data types which are meaningful in ISL
but which have various mappings in the target languages; again, one
mapping per target language is assumed to fit all uses.</p>

<h2>4. Acknowledgements</h2>

<p>FFIGEN is based on the <em>lcc</em> ANSI C compiler. See the <a
href="userman.html">FFIGEN User's Manual</a> for full acknowledgements
and a copyright notice.</p>

<p>This work has been supported by ARPA under U.S. Army grant
No. DABT63-94-C-0029, "Programming Environments, Compiler Technology
and Runtime Systems for Object Oriented Parallel Processing".</p>

<hr>
<address>
<A HREF="mailto:lth@acm.org">lth@acm.org</A>
</address>
<em>24 May 2000</em>
</body>
</html>