Add wget mirror

wget --mirror --no-parent --no-host-directories --cut-dirs 3 \
  https://www.ccs.neu.edu/home/lth/ffigen/

The following files are omitted from this commit:

960213.tar.gz
ffigen.tar.gz
lcc-3.4b.tar.gz
robots.txt
Lassi Kortela 2023-05-19 10:18:38 +03:00
parent cce0f69fad
commit 61fc3061a1
8 changed files with 1112 additions and 0 deletions

www/chez-policy.sch Normal file

@@ -0,0 +1,90 @@
; -*- scheme -*-
;
; Suggestions for policy mechanisms in the FFIGEN back-end for Chez Scheme.
; These are currently *not* implemented, and are only intended as examples.
;
; Mechanisms fall into three categories: exclusion, overriding, and
; adaptation.
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;
; Exclusion:
; At the outset, everything in the .ffi file is marked as referenced.
; The mechanisms for excluding stuff are based on an item's name.
; exclude-file takes a file name or list of file names and excludes every
; item defined in that file and files included by it.
(exclude-file '())
; exclude-structure takes a structure name (i.e. either "struct FOO"
; or "union FOO" or "FOO") or list of names and inhibits generation of
; constructors, destructors, accessors, and mutators for it and all
; typedefs derived from it. If the name is a typedef name and the
; structure named has a compiler-generated tag, then the structure
; named by this typedef is also excluded.
(exclude-structure "FILE")
; exclude-function excludes the named function(s).
(exclude-function "select")
; exclude-global excludes the named global variable.
(exclude-global "__iob")
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;
; Overriding
; Override-prototype gives the named function a new prototype.
(override-prototype "fgets"
  `(function (,(primitive-type 'string)
              ,(primitive-type 'int)
              ,(pointer-type (struct-type "FILE")))
             ,(pointer-type (primitive-type 'char))))
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;
; Adaptation
; Short-policy controls how arguments of type short are handled. Three values
; are possible: warning, use-integer, and use-proxy. If use-integer is the
; value, then an integer-32 FFI argument will be used on the assumption that
; this is meaningful in the native API. If use-proxy is the value, then a
; proxy function is generated which takes an integer argument and calls
; the real function with the argument cast to short.
(short-policy 'use-integer)
; Struct-param-policy controls how structure parameters are handled.
; Values are: warning and use-proxy. If use-proxy is the value, then
; a foreign function (FF) will be generated which takes structure pointers
; and which names a proxy function (this is transparent to Scheme code),
; and the real function will be called by the proxy.
(struct-param-policy 'warning)
; Struct-return-policy controls how structure return values are handled.
; Values are: warning, alloc-new, and pass-placeholder. If alloc-new
; is the value, then a proxy function will be generated which receives
; the return value, allocates an object on the heap for it, copies the value
; into the allocated memory, and returns a pointer to the memory.
; If pass-placeholder is the value, then an FF and a proxy will be generated
; that take an extra argument (the first); that argument must be a pointer
; to a structure in which to place the value.
(struct-return-policy 'warning)
; Variadic-policy controls how variadic procedures are handled.
; Values are: warning and exclude. If the value is exclude, a warning will
; be given and no FFI code will be generated; if the value is warning, then
; invalid FFI code will be generated.
(variadic-policy 'warning)
; eof

BIN
www/chez.ps.gz Normal file

Binary file not shown.

www/index.html Normal file

@@ -0,0 +1,135 @@
<HTML>
<HEAD>
<TITLE>FFIGEN Home Page</TITLE>
<LINK REV="made" HREF="mailto:lth@acm.org">
</HEAD>
<BODY>
<H1>FFIGEN</H1>
"A good foreign function interface is 25% code and 75% policy."
<HR>
<P>
FFIGEN (Foreign Function Interface GENerator) is a program suite that
facilitates the writing of translators from C header files to foreign
function interfaces for particular language implementations.
<P>
<img src="../ball.red.gif" alt="*">
FFIGEN Manifesto and Overview
<A href="manifesto.html">(HTML)</a> <A href="manifesto.ps.gz">(ps.gz, 26 KB)</A>
<BR>
<img src="../ball.red.gif" alt="*">
FFIGEN User's Manual
<a href="userman.html">(HTML)</a> <A href="userman.ps.gz">(ps.gz, 30 KB)</A>
<BR>
<img src="../ball.red.gif" alt="*">
FFIGEN Back-end for Chez Scheme Version 5
<A href="chez.ps.gz">(ps.gz, 52 KB)</A>
<P>
There are three motivating observations behind FFIGEN. The first is
that C header files are hard to parse because of the preprocessor,
general syntactic grunge, and the problem of getting the data layouts
right. The second is that foreign function interfaces differ widely and
that translations to different FFIs can't be the same, yet should share
work as much as possible. The third is that not all translations are
suitable for all purposes; there may be multiple valid translations -
each of which serves a different need - for any given language's FFI.
<P>
For these reasons, a translator from C header syntax to an FFI should
have two parts: one target-independent front-end that translates from
the header file into a rational intermediate form and which can be used
with all translators, and a target-dependent back-end that translates
from the intermediate form to an FFI for the target system, using a
translation policy to guide the translation. This design nicely
facilitates writing back-ends for multiple languages, multiple FFIs per
language, and multiple policies per FFI.
<P>
FFIGEN is a system that implements the split-translation philosophy.
<UL>
<LI>The FFIGEN front-end is based on the front-end of the freely
available, production quality, ANSI C compiler <em>lcc</em>. Using
<em>lcc</em> makes
the FFIGEN front end portable, complete, and extensible for special
purposes.
<P>
<LI>The FFIGEN back-ends can be small (a back-end for Chez Scheme that
handles nearly all of C is 350 lines of Scheme code, for example), and
can be written in any language. Scheme is the preferred language for
back-ends right now, because the output syntax of the front-end,
although easily changeable, is that of S-expressions, and because a
back-end written in Scheme is already available for new back-ends to
build on.
</UL>
<P>
The current version of FFIGEN is available as a set of modifications to
<em>lcc</em> version 3.4b; you also need to get the <em>lcc</em> sources.
The FFIGEN
distribution includes documentation on how to write back-ends and a
documented example back-end for the FFI of Chez Scheme version 5.
<P>
<B> This is a preliminary release of FFIGEN. It works, but
is neither complete nor polished.</B>
<P>
<img src="../ball.red.gif" alt="*">
Click <A href="ffigen.tar.gz">here</A> to download the
full FFIGEN distribution. (148 KB)
<BR>
This archive has not been updated with the fixes in the bug fix file (below).
<P>
<img src="../ball.red.gif" alt="*">
Click <A href="960213.tar.gz">here</A> to download bug fixes up to February 13, 1996. (29 KB) <BR>
Fixes to chez.sch to handle structs/unions that are declared but not
defined; function pointers; and unsigned shorts (a typo). Also a minor fix
to policy.sch to remove gratuitous non-standard-ness (use of reverse! rather
than reverse). Also includes the generated standard libraries for the Chez Scheme
back-end (inadvertently left out of the distribution). Unpack in the <em>lcc</em> main
directory.
<P>
<img src="../ball.red.gif" alt="*">
Click <A href="chez-policy.sch">here</A> to download an example of a Chez
Scheme policy file, left out of the distribution.
<P>
<img src="../ball.red.gif" alt="*">
Click <A href="lcc-3.4b.tar.gz">here</A> to download the <em>lcc</em>
3.4b distribution.
(965 KB)
<P>
<HR>
<P>
Related systems:
<UL>
<LI> Kenneth Russell's <A href="http://www-white.media.mit.edu/~kbrussel/Header2Scheme">Header2Scheme</A>.
<LI> David Beazley's <A href="http://www.cs.utah.edu/~beazley/SWIG/">SWIG</A> system.
</UL>
<P>
<HR>
<P>
The <A HREF="todo.html">FFIGEN to-do list</A>.
<P>
<A HREF="mailto:lth@acm.org"><I>lth@acm.org</I></A><BR>
24 May 2000
</BODY>
</HTML>

www/manifesto.html Normal file

@@ -0,0 +1,241 @@
<!-- -*- mode: html; mode: font-lock -*-
Hand-translated from LaTeX to HTML by lth on 2000-05-16,
converted footnotes to in-line text, and inserted hyperlinks.
No other changes. -->
<html>
<head>
<title>FFIGEN Manifesto and Overview</title>
</head>
<body>
<center>
<h1>FFIGEN Manifesto and Overview</h1><br>
Lars Thomas Hansen <br>
<tt>lth@cs.uoregon.edu</tt><br>
February 6, 1996
</center>
<blockquote>
<p>FFIGEN (Foreign Function Interface GENerator) is a program suite which
facilitates the writing of translators from C header files to foreign
function interfaces for particular language implementations.</p>
<p>On a more general level, FFIGEN is a statement about how such
translators should be structured for maximum usability, namely as a
single translator from C to a rational intermediate language and as
multiple translators from the intermediate language to separate FFI
translations. In the present document I motivate this two-level
structure by arguing that the many policy questions inherent in choosing
a mapping from one language to another cannot be accommodated in a single
translator, and that the two-level structure promotes significant code
reuse. Companion documents present the program suite itself.</p>
</blockquote>
<h2>1. Manifesto</h2>
<p>Many language implementations have mechanisms which provide support for
call-outs to other, typically more primitive, languages. In particular,
implementations of very-high-level languages like Scheme, Common Lisp,
Standard ML, and Haskell support call-outs to system-level languages,
typically C. Other examples include the support for call-outs to C and
assembly language in C++, the EXTRINSIC directive in HPF, and the
<tt>&lt;*EXTERNAL*&gt;</tt> pragma in DEC SRC Modula-3. Mechanisms to call-out
to other languages are typically called <em>foreign function
interfaces</em> (FFIs). The purpose of an FFI is often to gain access to
functionality which is not (efficiently) expressible in the language
itself; other times the FFI is used to allow the program to interface to
existing libraries.</p>
<p>FFIs are only rarely part of the language definition; the only examples
I can think of are the support for C and assembly in C++ and the
EXTRINSIC directive in HPF. More typically, each language
implementation has its own idiosyncratic and often ad-hoc mechanism for
supporting foreign data types, functions, and variables. The mechanisms
are not standardized, probably because they depend to a large extent on
the calling conventions of the procedure being called, the operating
system on which the program is running, the architecture of the machine,
the data types of the language being called, the version of the
compilers for the host and foreign languages, and so on. (In the
following I will refer to a point in the space made from the product of
the preceding attributes as a <em>target</em>.) Since the system
dependencies are considerable, it is unlikely that a fully general and
portable FFI can be defined for a language, and in addition, an
interface that works with all targets is likely to be neither functional
nor convenient. The chances for any portable, standardized language to
adopt a non-trivial FFI therefore seem slight. This is not to say that
an adequate job can't be done in many cases--for example, Franz Allegro
Common Lisp sports a sophisticated FFI which supports C and Fortran
seemingly very well--only that no <em>standard</em> and <em>general</em>
solution is likely to emerge.</p>
<p>Based on these observations, an approach to inter-language calling would
be to accept the fact that FFIs are implementation-dependent and instead
concentrate our effort on a higher level of abstraction: that of the
library interface. Even if the FFI is target-dependent, most of the
time the interface to a library is not (which is the beauty of an
interface in the first place). If, for each library, there existed a
reasonable definition of its interface, then a program could take that
definition and generate FFI code for the library for a given target.
This is the approach advocated by the creators of the ILU system (see
section 3).</p>
<p>However, manufacturers of libraries are <em>not</em> distributing
reasonable definitions of the interfaces to their libraries. All you
usually get is a C or C++ header file. A header file is not a
reasonable definition of the interface because of the baggage it
carries: nested include files, preprocessor macros, conditional
compilation, syntactic peculiarities, implementation language target
dependencies, and so on. In the best of all worlds, the manufacturer
would distribute the interfaces in an interface definition language like
the Object Management Group's IDL or ILU's ISL, and maybe one day that
will be common. In the meantime, we must fend for ourselves.</p>
<p>What we must do is to provide a translator which takes as its input not
a reasonable definition but instead a C or C++ header file or set of
header files, and produces as its output the FFI code for the library
for a given target. However, such a program is likely to be complicated
and there will be one version for each target. Maintaining all these
translators will be an unpleasant task. We could of course have one
translator, to IDL or ISL, and translators from the interface language
to the FFI, and as we will see, this is a variation on the mechanism
implemented by FFIGEN.</p>
<p>An additional important problem is that there is not one but several
translations for every target. A given interface can be translated to
any of several FFIs depending on the desired <em>policy</em> for the
translation. For example, consider a function
<pre>
char *fgets(char*, int, FILE*);
</pre>
What does <tt>char*</tt> translate to? Consider the FFI provided by Chez
Scheme version 5. It has a <tt>string</tt> type which in a parameter
position causes the address of the first character of the string
argument to be passed to the function, but which in the return position
causes the characters to be copied from the storage pointed to by the
return value (if not <tt>NULL</tt>) into a fresh Scheme string. So if we
translate <tt>char*</tt> as <tt>string</tt>, we end up with (since
<tt>FILE*</tt> is translated as an <tt>unsigned int</tt>)
<pre>
(define fgets
  (foreign-function "fgets"
    (string integer-32 unsigned-32)
    string))
</pre>
which is expensive because the string is (needlessly) copied on return.
On the other hand, we can treat a <tt>char*</tt> as "just a pointer" and
translate as:
<pre>
(define fgets
  (foreign-function "fgets"
    (unsigned-32 integer-32 unsigned-32)
    unsigned-32))
</pre>
but this does not let us access the characters in the buffer using
Scheme's string functions, since the buffer is not a string. In the
end, it appears that no fixed translation for <tt>char*</tt> is possible;
even if a fixed translation (and then: which one of them?) is adequate
in most situations, there will be special cases. (Arguably, it
would have been better for <tt>fgets()</tt> to return a truth value or the
number of characters read.)</p>
<p>The bottom line is, there is a lot of policy that goes into a
translation into a specific FFI. Hence we have a slogan (the core of
the Manifesto):</p>
<blockquote>
A good foreign function interface is 25% code and 75% policy.
</blockquote>
<p>It should be a goal, then, to separate the arduous task of parsing and
type-checking C headers and translating them into a rational
intermediate form, from the task of translating the intermediate form
into a FFI specification for a given target and translation policy.</p>
<h2>2. The FFIGEN System</h2>
<p>I have written a program, which I call <em>ffigen</em>, which takes
as its input a C header file and produces as its output a rational
translation of the interface defined by the header file. A rational
translation is one in which unnecessary or redundant syntax has been
removed, preprocessor macros have been expanded, and preprocessor
conditionals have been resolved so that definitions have been included
or excluded correspondingly. The exact format of the intermediate code
is described in a companion document, the <a href="userman.html">FFIGEN
User's Manual</a>. <em>ffigen</em> functions as the <em>front-end</em>
of a system which translates C headers into foreign function
interfaces.</p>
<p>Each target system will have one or more specific <em>back-ends</em> which
take the intermediate form and produce translations for particular
targets and translation policies. Substantial parts of the back-end
code are largely target-independent and can therefore be shared by
multiple back-ends.</p>
<p>I have written one back-end to serve as a sample; it produces FFI code
for Chez Scheme version 5. It is documented in a companion document,
<em>FFIGEN Back-end for Chez Scheme Version 5</em>.</p>
<h2>3. Related Work</h2>
<p>Kenneth B. Russell of MIT has implemented a system called Header2Scheme
which translates C++ to the FFI of the SCM Scheme system. FFIGEN and
Header2Scheme are fairly different at this point. My goal with FFIGEN
was to cover all of ANSI C including the preprocessor in a reasonable
way; this is doable because ANSI C is a small, fixed, and fairly simple
language. C++, on the other hand, is a very large, changing, and
complex language, and Header2Scheme therefore handles only part of it at
this time (as of version 1.2, it does not handle preprocessor macros,
typedefs, and enums). In addition, my emphasis was on not fixing policy
at all, which gives great freedom (and more work) to back-end writers,
whereas Russell has mostly fixed the policy. On the other hand,
Header2Scheme allows some policy decisions to be expressed in auxiliary
files given to the translator, and I have yet to experiment with these
mechanisms in FFIGEN. Header2Scheme is available from URL
<pre>
http://www-white.media.mit.edu/~kbrussel/Header2Scheme
</pre>
</p>
<p>A message (<tt>&lt;1996Jan17.121933.25825@chemabs.uucp&gt;</tt>) posted
to the Usenet group <tt>comp.lang.scheme</tt> (among others) alleged that
Apple has a translator for their Dylan implementation which will take a
C header file and generate Dylan FFI glue for it. I know nothing else
about this system (but would appreciate hearing about it from anyone who
knows).</p>
<p>The ILU (Inter-Language Unification) system from Xerox PARC provides
cross-language calling functionality for modules which have interfaces
specified in ISL, the ILU interface definition language. ILU will take
the interfaces and produce stubs (glue, as it were) for the languages so
that they can call each other. The ISL file specifies the interface
somewhat abstractly in terms of data types which are meaningful in ISL
but which have various mappings in the target languages; again, one
mapping is assumed to fit all.</p>
<h2>4. Acknowledgements</h2>
<p>FFIGEN is based on the <em>lcc</em> ANSI C compiler. See the <a
href="userman.html">FFIGEN User's Manual</a> for full acknowledgements
and a copyright notice.</p>
<p>This work has been supported by ARPA under U.S. Army grant
No. DABT63-94-C-0029, "Programming Environments, Compiler Technology
and Runtime Systems for Object Oriented Parallel Processing".</p>
<hr>
<address>
<A HREF="mailto:lth@acm.org">lth@acm.org</A>
</address>
<em>24 May 2000</em>
</body>
</html>

BIN
www/manifesto.ps.gz Normal file

Binary file not shown.

www/todo.html Normal file

@@ -0,0 +1,92 @@
<HTML>
<HEAD>
<TITLE>FFIGEN To-do list</TITLE>
<LINK REV="made" HREF="mailto:lth@acm.org">
</HEAD>
<BODY>
<H2>FFIGEN To-do list</H2>
Updated 14 June 2000.
<H3>Intermediate format features</H3>
<UL>
<LI> Full ANSI C support:
<UL>
<LI> [done] Bitfields.
<LI> General support for type qualifiers.
</UL>
<LI> Output a machine description.
<LI> Output struct/union sizes.
<LI> Output structure field offsets.
<LI> Output line and column information.
<LI> Retain and output comments.
<LI> Output source file information (the name of the input file to
<code>lcc -ffigen</code>); this can
be useful since the back end can generate C files which
<code>#include</code> the source header file.
<LI> Support certain extensions: Microsoft __huge, __near, __far, __based,
__cdecl, __pascal; GNU __inline; others?
</UL>
<H3>Processing (both front-end and back-end)</H3>
<UL>
<LI> Some general support for a policy file?
<LI> More intelligent macro-expansion support: macros should be expanded
as far as possible, and extraneous cruft should be removed so that the
back end can produce better translations.
<LI> Support for some form of tokenized macros to support certain regular
and nice rewrites? C libraries like Open Inventor use macros
heavily in a virtual-function like style:
<pre>
#define SoSphSetOverride(_this, state) \
SoNodeSetOverride((SoNode *)_this, state)
</pre>
and it would be nice to provide some support for such cases in the form
of already-tokenized output.
</UL>
<H3>Implementation features</H3>
<UL>
<LI> Move to lcc 4.1, and ASDL.
<LI> [done] Move to lcc 3.6.
<LI> [done] Proper integration with lcc. Currently, it uses the lcc driver but
it generates code, performs assembly, and produces file.o (which it need
not do). In addition, the output file is called SYMBOLS but should
rather be called filename.ffi.
</UL>
<H3>Known bugs</H3>
<UL>
<LI> [done] Currently the rhs of a macro is output without any whitespace. This
is not correct if there are two adjacent identifiers or reserved words,
which happens in declarations (consider "const int blah"). [Harold]
</UL>
<H3>Back-ends</H3>
<UL>
<LI> Back-end for Scheme-to-C.
<LI> Back-end for Gambit-C (Harold's got one working, it also does
interesting things with Open Inventor macros (see above)).
<LI> Back-end for Tcl/Tk?
<LI> Back-ends for ILU and Modula-3.
<LI> Back-end for STk.
<LI> Improvements to Chez Scheme back-end.
</UL>
<H3>Miscellaneous</H3>
<UL>
<LI> Advertise on lcc mailing list.
</UL>
<HR>
<P>
Press <A HREF="index.html">here</A> to go to the FFIGEN home page.
<hr>
<address><a href="mailto:lth@acm.org">lth@acm.org</a></address>
</BODY>
</HTML>

www/userman.html Normal file

@@ -0,0 +1,554 @@
<!-- -*- mode: html; mode: font-lock -*-
Hand-translated from LaTeX to HTML by lth on 2000-05-16, and
converted footnotes to in-line text. Fixed a small number of
typos. No other changes. -->
<html>
<head>
<title>FFIGEN User's Manual</title>
</head>
<body>
<center>
<h1>FFIGEN User's Manual</h1><br>
(Preliminary)<br>
Lars Thomas Hansen<br>
<tt>lth@cs.uoregon.edu</tt><br>
February 6, 1996
</center>
<h2>1. Introduction</h2>
<p>FFIGEN is a program system which facilitates the writing of
translators from C header files to foreign function interfaces for
particular programming language implementations. This document
describes its structure and use. The discussion is aimed at translator
writers; everyone else should confine themselves to section 3. A
companion document, <a href="manifesto.html">FFIGEN Manifesto and
Overview</a>, motivates the work, and other companion documents describe
specific translator implementations. In particular, the document
<em>FFIGEN Back-end for Chez Scheme Version 5</em> describes one
translator in detail.</p>
<p>FFIGEN is based on the <em>lcc</em> C compiler, which is copyrighted
software. See Section 10 for a full copyright notice.</p>
<h2>2. Writing Translators</h2>
<p>To generate a translation of a header file you run the <em>ffigen</em>
command to generate an intermediate form of the C header files you want
to translate, and then run the back-end on the resulting files to
generate the foreign function interface for the library.</p>
<p>Your task, should you choose to accept it, is to implement the
target-specific parts of the back-end for your particular target (which
is to say, combination of host language implementation, operating
system, architecture, foreign language implementation, and translation
policy). You should be able to use the FFIGEN front-end and the
target-independent parts of the back-end pretty much as they are.</p>
<p>How to implement the target-specific parts of the back-end is
discussed in Section 6. Use of the front end is described in Section 3.
The intermediate format is described in Section 4, and the
target-independent parts of the back-end and their interface to the
target-dependent part are described in Section 5. Finally, Section 7
covers some issues which need to be tackled in the future.</p>
<h2>3. Running FFIGEN</h2>
<p>The command <em>ffigen</em> is run on a set of header files with
preprocessor and include-file options. Arguments are processed
in order. For each header file (type <tt>.h</tt>) and all the files it
includes, a single intermediate file (type <tt>.ffi</tt>) is
produced.</p>
<p>The options are:
<dl>
<dt><tt>-Dname[=value]</tt>
<dd>Define preprocessor macro.
<dt><tt>-Uname</tt>
<dd>Undefine preprocessor macro.
<dt><tt>-Idirectory</tt>
<dd>Add directory to the <em>beginning</em> of the list
of include directories. Standard directories include the <em>lcc</em> include
directory, <tt>/usr/include</tt>, and the current directory (in that order).
See the release notes for information about how to change the defaults.
</dl>
<em>ffigen</em> performs full syntax and type checks on its input.</p>
<p>The back-end is run by starting your favorite Scheme system and then
loading first the target-independent file <tt>process.sch</tt> and second
the target-dependent part of the translator; in the case of the Chez
Scheme back-end the file is called <tt>chez.sch</tt>. You then call the
procedure <tt>process</tt> with the name of the <tt>.ffi</tt> file to
process, as discussed in section 5.</p>
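<p>For example, assuming <em>ffigen</em> has already produced
<tt>foo.ffi</tt> from a hypothetical header <tt>foo.h</tt>, and that both
back-end files are in the current directory, a Chez Scheme session might
run the back-end as follows. This is a sketch of the steps described
above, not a script from the distribution; the file names are assumptions.
<pre>
(load "process.sch")   ; target-independent back-end
(load "chez.sch")      ; target-dependent part for Chez Scheme
(process "foo.ffi")    ; read, analyze, and translate foo.ffi
</pre></p>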
<h2>4. Intermediate Format</h2>
<p>The intermediate format consists of s-expressions following this grammar:
<pre>
&lt;file&gt; -&gt; &lt;record&gt; ...
&lt;record&gt; -&gt; (function &lt;filename&gt; &lt;name&gt; &lt;type&gt; &lt;attrs&gt;)
| (var &lt;filename&gt; &lt;name&gt; &lt;type&gt; &lt;attrs&gt;)
| (type &lt;filename&gt; &lt;name&gt; &lt;type&gt;)
| (struct &lt;filename&gt; &lt;name&gt; ((&lt;name&gt; &lt;type&gt;) ...))
| (union &lt;filename&gt; &lt;name&gt; ((&lt;name&gt; &lt;type&gt;) ...))
| (enum &lt;filename&gt; &lt;name&gt; ((&lt;name&gt; &lt;value&gt;) ...))
| (enum-ident &lt;filename&gt; &lt;name&gt; &lt;value&gt;)
| (macro &lt;filename&gt; &lt;name+args&gt; &lt;body&gt;)
&lt;type&gt; -&gt; (&lt;primitive&gt; &lt;attrs&gt;)
| (struct-ref &lt;tag&gt;)
| (union-ref &lt;tag&gt;)
| (enum-ref &lt;tag&gt;)
| (function (&lt;type&gt; ...) &lt;type&gt;)
| (pointer &lt;type&gt;)
| (array &lt;value&gt; &lt;type&gt;)
&lt;attrs&gt; -&gt; (&lt;attr&gt; ...)
&lt;attr&gt; -&gt; static | extern | const | volatile
&lt;primitive&gt; -&gt; char | signed-char | unsigned-char | short
| unsigned-short | int | unsigned | long
| unsigned-long | float | double | void
&lt;value&gt; -&gt; &lt;integer&gt;
&lt;filename&gt; -&gt; &lt;string&gt;
&lt;name&gt; -&gt; &lt;string&gt;
&lt;body&gt; -&gt; &lt;string&gt;
&lt;name+args&gt; -&gt; &lt;string&gt;
&lt;tag&gt; -&gt; &lt;string&gt;
</pre>
Notes relating to the grammar:</p>
<ul>
<li> <tt>...</tt> means "zero or more of" the preceding item.
<li> The grammar is a little more general than the actual output
language. All structs, unions, and enums in parameter lists, return
types, and variable declarations are encoded as <tt>struct-ref</tt>,
<tt>union-ref</tt>, and <tt>enum-ref</tt>, respectively; structure, union,
and enum type definitions occur only in <tt>struct</tt>, <tt>union</tt>,
and <tt>enum</tt> records.
<li> The <tt>&lt;tag&gt;</tt> field of the <tt>-ref</tt> forms, like the
<tt>&lt;name&gt;</tt> field of <tt>struct</tt>, <tt>union</tt>, and
<tt>enum</tt> records, holds the tag of the type. If the type has a
user-defined tag, then that tag is used; if it had no user-defined tag,
then a tag has been generated by <em>lcc</em>. Generated tags have the
syntax of positive integers; in particular, they start with a digit. There
is one namespace each for structs, unions, and enums.
<li>
<tt>typedef</tt> names are not used anywhere: they occur in <tt>type</tt>
records only.
<li>
The attributes on primitive types are <tt>const</tt> or <tt>volatile</tt>; the
attributes <tt>static</tt> and <tt>extern</tt> are used only on functions and
global variables.
<li>
Functions which are known to take no parameters (<em>i.e.</em> <tt>t f(void)</tt>) have
one parameter, of type <tt>(void ())</tt>. The void type appears in a
parameter list only as the last element.
<li>
Functions which take a variable number of arguments have at least one
defined non-void parameter and a last parameter of type <tt>(void ())</tt>.
<li>
Functions for which no parameters were defined (<em>i.e.</em> <tt>t f()</tt>) have
no parameters.
<li>
The ordering of records in the input has no relation to the
relative ordering of declarations in the original source.
<li>
The <tt>&lt;value&gt;</tt> field in the array is its size. If the size is not
known, it is 0.
<li>
Multidimensional arrays are represented as nested array types with the
leftmost dimension outermost in the expected way; i.e., it looks like
an array of arrays.
<li>
Arrays are not valid return types.
<li>
Array parameters lose some semantic information in the translation in
the current system. An array parameter <tt>t a[n]</tt> is always
converted to a pointer: <tt>(pointer t)</tt> regardless of whether
<tt>n</tt> is known or not. As expected, then, something like
<tt>t a[n][m][o]</tt> gets the parameter type
<tt>(pointer (array m (array o t)))</tt>. Note that this only pertains to
parameter types; variables of array type are not converted in this manner.
(The semantic information claimed lost is the size of the leftmost
dimension. This lossage may make it impossible to perform array conversion
at call boundaries, for example.)
<li>
The grammar describes the current format, which will change: line number
and column information will be incorporated. You should always use the
accessor functions defined in the target-independent part of the
back-end; see section 5. The grammar does not allow
for bit fields or qualifications on anything but primitive
types, but these will be accommodated eventually.
</ul>
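<p>As an illustration, the declaration
<tt>char *fgets(char *, int, FILE *);</tt> would, following the grammar
and the notes above, be expected to appear roughly as the record below.
This is a hand-written sketch rather than actual <em>ffigen</em> output:
the file name is assumed, and the structure tag <tt>_iobuf</tt> is
hypothetical, since it depends on how the system's
<tt>&lt;stdio.h&gt;</tt> defines <tt>FILE</tt> (an anonymous structure
would instead get a tag generated by <em>lcc</em>). Note that the typedef
name <tt>FILE</tt> does not appear, and that every primitive type carries
an attribute list.
<pre>
(function "/usr/include/stdio.h"                  ; assumed file name
          "fgets"
          (function ((pointer (char ()))          ; char *
                     (int ())                     ; int
                     (pointer (struct-ref "_iobuf"))) ; FILE *, tag assumed
                    (pointer (char ())))          ; returns char *
          ())                                     ; attrs (may include extern)
</pre></p>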
<h2>5. The Target-Independent Back-End</h2>
<p>The target-independent back-end is a Scheme program called
<tt>process</tt> which reads the intermediate form into memory and
performs some initial processing. It exports some global variables and
a number of procedures which are used to access the structures in the
database of intermediate records, and imports two target-dependent
functions from the target-dependent back-end. This section describes
the interfaces.</p>
<p>The global variables which hold the database are:
<pre>
(define functions '()) ; list of function records
(define vars '()) ; list of var records
(define types '()) ; list of type records
(define structs '()) ; list of struct records
(define unions '()) ; list of union records
(define macros '()) ; list of macro records
(define enums '()) ; list of enum records
(define enum-idents '()) ; list of enum-ident records
</pre>
Each of these contains a list of all the records of the type indicated
by their names. Note that records may look different internally than
in the defined intermediate form, so accessor functions (see below) should
always be used.</p>
<p>In addition, there are two globals which are set but not used by
the target-independent back-end:
<pre>
(define source-file #f) ; name of the input file itself
(define filenames '()) ; names of all files in the input
</pre>
</p>
<p>The main entry point to the back end is the procedure <tt>process</tt>,
which takes a single file name as an argument. <tt>Process</tt>
initializes globals, reads the file, and processes the records.
<pre>
(define (process filename) ...)
</pre></p>
<p>Record processing consists of some general analysis followed by
target-specific code generation. First, the target-specific procedure
<tt>select-functions</tt> is called; it must set or reset the
"referenced" bit in each function record depending on whether the function
is interesting to the back-end or not. After computing the reachability of
structured types and setting the referenced bits of those types which
are reachable, <tt>process</tt> generates a translation by calling the
target-specific procedure <tt>generate-translation</tt>, which takes no arguments.
<pre>
(define (select-functions) ...)
(define (generate-translation) ...)
</pre></p>
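<p>As an illustration only, a target-dependent <tt>select-functions</tt> might implement an exclusion policy along the lines of the <tt>exclude-function</tt> suggestion in <tt>chez-policy.sch</tt>. The record layout (a mutable vector), the constructor <tt>make-function-record</tt>, the sample database, and the list <tt>excluded-functions</tt> are hypothetical stand-ins; a real back-end would use the database and accessors exported by the target-independent back-end rather than defining its own.</p>

```scheme
;; Hypothetical stand-ins for the record representation and for the
;; accessors exported by the target-independent back-end.
;; A function record here is a vector: #(function NAME REFERENCED?).
(define (make-function-record fn-name) (vector 'function fn-name #t))
(define (name r) (vector-ref r 1))
(define (referenced? r) (vector-ref r 2))
(define (referenced! r) (vector-set! r 2 #t))
(define (unreferenced! r) (vector-set! r 2 #f))

;; Hypothetical database contents.
(define functions
  (list (make-function-record "fgets")
        (make-function-record "select")
        (make-function-record "printf")))

;; Policy (hypothetical): names of functions this back-end will not translate.
(define excluded-functions '("select" "printf"))

;; Mark every function as referenced unless the policy excludes it.
(define (select-functions)
  (for-each (lambda (r)
              (if (member (name r) excluded-functions)
                  (unreferenced! r)
                  (referenced! r)))
            functions))
```
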
<p>A number of data structure accessors and mutators are also available.
These are generic procedures which work on all of the record types.
<pre>
(define (file r) ...) ; file name of record
(define (name r) ...) ; name in records which have one
(define (type r) ...) ; type in records which have one
(define (attrs r) ...) ; attrs in records which have one
(define (fields r) ...) ; fields in struct/union record
(define (value r) ...) ; value of enum-ident record
(define (tag r) ...) ; tag in struct/union/enum and -ref records
(define (referenced? r) ...) ; is record referenced?
(define (referenced! r) ...) ; set referenced bit
(define (unreferenced! r) ...) ; reset referenced bit
</pre>
Arguably the <tt>tag</tt> accessor should go away and <tt>name</tt>
should simply be used in its place. As it is, <tt>name</tt> is not
defined on <tt>struct-ref</tt>, <tt>union-ref</tt>, and
<tt>enum-ref</tt> records.</p>
<p>The procedure <tt>record-tag</tt> returns the tag that identifies the
kind of the record it is given. It can also be applied to types.
<pre>
(define (record-tag r) ...) ; get record tag
</pre></p>
<p>All records can have back-end specific values attached to them; usually
these are cached names for operations on structured values, so for now
the procedures which manipulate the back-end specific data are called
<tt>cache-name</tt> to remember a value and <tt>cached-names</tt> to return
the list of remembered values:
<pre>
(define (cache-name r v) ...) ; remember value in record
(define (cached-names r) ...) ; retrieve remembered values
</pre>
We should probably replace this with a more general property-list-like
mechanism.</p>
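<p>One way the caching procedures could work is sketched below, under the assumption that each record reserves a slot for back-end data; the two-slot vector layout, <tt>make-record</tt>, and the generated C names are hypothetical. Newly cached values are pushed onto the front of the list, so <tt>cached-names</tt> returns the most recently remembered value first.</p>

```scheme
;; Hypothetical record layout: #(KIND BACK-END-DATA), where
;; BACK-END-DATA is the list of values remembered so far.
(define (make-record kind) (vector kind '()))

;; Remember v in record r (most recent value first).
(define (cache-name r v)
  (vector-set! r 1 (cons v (vector-ref r 1))))

;; Return the list of values remembered in record r.
(define (cached-names r)
  (vector-ref r 1))

;; Example: a back-end caches the names it generated for a structure.
(define file-struct (make-record 'struct))
(cache-name file-struct "make_FILE")
(cache-name file-struct "free_FILE")
```
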
<p>In addition, two procedures extract parts of function types:
<pre>
(define (arglist r) ...) ; function argument types
(define (rett r) ...) ; function return type
</pre></p>
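<p>A sketch of these two procedures, assuming a function type has the list shape used by <tt>override-prototype</tt> in <tt>chez-policy.sch</tt>, namely <tt>(function (<i>arg-type</i> ...) <i>return-type</i>)</tt>. The spellings of the primitive, pointer, and structure types in the example (<tt>(int)</tt>, <tt>(pointer (char))</tt>, <tt>(struct-ref "FILE")</tt>) are hypothetical.</p>

```scheme
;; Assumed shape of a function type: (function (ARG-TYPE ...) RETURN-TYPE)
(define (arglist t) (cadr t))   ; list of argument types
(define (rett t) (caddr t))     ; return type

;; The prototype of fgets, written in a hypothetical type notation.
(define fgets-type
  '(function ((pointer (char)) (int) (pointer (struct-ref "FILE")))
             (pointer (char))))
```
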
<p>Some utilities to deal with file names are also provided:
<pre>
(define (strip-extension fn) ...)
(define (strip-path fn) ...)
(define (get-path fn) ...)
</pre></p>
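<p>The exact behaviour of these utilities is not specified above. The sketch below assumes Unix-style <tt>/</tt> separators, that <tt>strip-extension</tt> removes a trailing <tt>.ext</tt> suffix, that <tt>strip-path</tt> removes the directory part, and that <tt>get-path</tt> returns the directory part including its trailing slash (or the empty string). The helper <tt>string-rindex</tt> is hypothetical.</p>

```scheme
;; Hypothetical helper: index of the last occurrence of ch in s, or #f.
(define (string-rindex s ch)
  (let loop ((i (- (string-length s) 1)))
    (cond ((negative? i) #f)
          ((char=? (string-ref s i) ch) i)
          (else (loop (- i 1))))))

;; "/usr/include/stdio.h" -> "/usr/include/stdio"
(define (strip-extension fn)
  (let ((dot (string-rindex fn #\.))
        (slash (string-rindex fn #\/)))
    ;; Only a dot after the last slash starts an extension.
    (if (and dot (or (not slash) (> dot slash)))
        (substring fn 0 dot)
        fn)))

;; "/usr/include/stdio.h" -> "stdio.h"
(define (strip-path fn)
  (let ((slash (string-rindex fn #\/)))
    (if slash
        (substring fn (+ slash 1) (string-length fn))
        fn)))

;; "/usr/include/stdio.h" -> "/usr/include/"
(define (get-path fn)
  (let ((slash (string-rindex fn #\/)))
    (if slash
        (substring fn 0 (+ slash 1))
        "")))
```
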
<p>A string macro expander makes it easier to generate C code in the
back-ends that need it. The expander, <tt>instantiate</tt>, takes a string
template and a vector of arguments (which are also strings). The template
contains patterns of the form <tt>@n</tt>, where <tt>n</tt> is a single digit;
each such pattern is replaced with the corresponding value from the
argument vector.
<pre>
(define (instantiate template arguments) ...)
</pre></p>
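<p>A minimal sketch of such an expander, assuming that <tt>@0</tt> refers to the first element of the argument vector and that an <tt>@</tt> not followed by a digit is copied through unchanged. The C template in the example is illustrative only.</p>

```scheme
;; Replace each @n (n a single digit) in template with
;; (vector-ref arguments n); copy every other character unchanged.
(define (instantiate template arguments)
  (let ((len (string-length template)))
    (let loop ((i 0) (pieces '()))
      (cond ((= i len)
             (apply string-append (reverse pieces)))
            ((and (char=? (string-ref template i) #\@)
                  (> len (+ i 1))
                  (char-numeric? (string-ref template (+ i 1))))
             ;; Convert the digit character to an index (ASCII digits assumed).
             (let ((n (- (char->integer (string-ref template (+ i 1)))
                         (char->integer #\0))))
               (loop (+ i 2) (cons (vector-ref arguments n) pieces))))
            (else
             (loop (+ i 1) (cons (string (string-ref template i)) pieces)))))))

;; Example: generate a C accessor for a structure field.
(define accessor-code
  (instantiate "int @0_@1(struct @0 *p) { return p->@1; }"
               (vector "point" "x")))
```
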
<p>Two procedures, <tt>struct-names</tt> and <tt>union-names</tt>, take a
structure (or union) and return a list of all the typedef names which
reference the structure directly.
<pre>
(define (struct-names struct) ...)
(define (union-names union) ...)
</pre></p>
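<p>A sketch of <tt>struct-names</tt>, assuming hypothetical list-based records: a typedef record <tt>(typedef <i>name</i> <i>type</i>)</tt>, a structure record <tt>(struct <i>tag</i> <i>fields</i>)</tt>, and a structure reference type <tt>(struct-ref <i>tag</i>)</tt>. Here "directly" is taken to mean that the typedef's type is exactly a reference to the structure, so a typedef for a pointer to it is not included. Under the same assumptions, <tt>union-names</tt> would differ only in looking for <tt>union-ref</tt>.</p>

```scheme
;; Hypothetical accessors for list-based records.
(define (name r) (cadr r))   ; typedef name
(define (type r) (caddr r))  ; typedef type
(define (tag r) (cadr r))    ; structure tag

;; Hypothetical typedef database.
(define types
  (list '(typedef "FILE" (struct-ref "_iobuf"))
        '(typedef "FILEPTR" (pointer (struct-ref "_iobuf")))
        '(typedef "div_t" (struct-ref "1"))))

;; Names of all typedefs whose type is exactly (struct-ref TAG).
(define (struct-names struct)
  (let ((target (list 'struct-ref (tag struct))))
    (let loop ((ts types) (acc '()))
      (cond ((null? ts) (reverse acc))
            ((equal? (type (car ts)) target)
             (loop (cdr ts) (cons (name (car ts)) acc)))
            (else (loop (cdr ts) acc))))))
```
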
<p>An association procedure, <tt>lookup</tt>, searches one of the record
lists for the record whose <tt>name</tt> field matches a given key:
<pre>
(define (lookup key items) ...)
</pre></p>
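<p>A sketch of <tt>lookup</tt> as a linear search, assuming list-based records whose second element is the name and assuming that, like <tt>assoc</tt>, it returns <tt>#f</tt> when no record matches; both the record layout and the sample database are hypothetical.</p>

```scheme
;; Hypothetical accessor: the name is the second element of a record.
(define (name r) (cadr r))

;; Return the first record in items whose name equals key, or #f.
(define (lookup key items)
  (cond ((null? items) #f)
        ((equal? key (name (car items))) (car items))
        (else (lookup key (cdr items)))))

;; Hypothetical database contents.
(define functions '((function "fgets") (function "puts")))
```
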
<p>The procedure <tt>user-defined-tag?</tt> determines whether a tag was
defined by the user or generated by the system:
<pre>
(define (user-defined-tag? x) ...)
</pre></p>
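<p>A sketch, assuming that the front end gives anonymous structures digit-string tags such as <tt>"1"</tt>. Since a tag written by the user is a C identifier, and a C identifier cannot begin with a digit, checking the first character is then enough to tell the two apart.</p>

```scheme
;; Assumption: compiler-generated tags are digit strings such as "1";
;; user-written tags are C identifiers, which cannot start with a digit.
(define (user-defined-tag? x)
  (not (and (> (string-length x) 0)
            (char-numeric? (string-ref x 0)))))
```
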
<p>The procedure <tt>warn</tt> takes a message and any number of additional
arguments and prints a warning message on standard output:
<pre>
(define (warn msg . rest) ...)
</pre></p>
<p>Some standard predicates take a type and test its kind:
<tt>primitive-type?</tt> is true if the argument is of a primitive type as
outlined in the grammar above; <tt>basic-type?</tt> is true if the
argument is a primitive type or a pointer type; <tt>array-type?</tt> is
true if the argument is an array type, and finally,
<tt>structured-type?</tt> is true if the argument is a <tt>struct-ref</tt>
or <tt>union-ref</tt> type:
<pre>
(define (primitive-type? t) ...)
(define (basic-type? t) ...)
(define (array-type? t) ...)
(define (structured-type? t) ...)
</pre></p>
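<p>The grammar above defines the real type syntax; the sketch below instead assumes a simplified, hypothetical notation in which primitive types are one-element lists such as <tt>(int)</tt> and compound types are <tt>(pointer <i>t</i>)</tt>, <tt>(array <i>n</i> <i>t</i>)</tt>, <tt>(struct-ref <i>tag</i>)</tt>, and <tt>(union-ref <i>tag</i>)</tt>. The list of primitive type names and the helpers <tt>kind</tt> and <tt>pointer-type?</tt> are illustrative.</p>

```scheme
;; Illustrative set of primitive type names (not the grammar's actual list).
(define primitive-type-names
  '(char signed-char unsigned-char short unsigned-short int unsigned
    long unsigned-long float double long-double void))

;; Leading symbol of a type, or #f if t is not a list.
(define (kind t) (and (pair? t) (car t)))

(define (primitive-type? t)
  (and (memq (kind t) primitive-type-names) #t))

(define (pointer-type? t) (eq? (kind t) 'pointer))

;; Primitive or pointer.
(define (basic-type? t)
  (or (primitive-type? t) (pointer-type? t)))

(define (array-type? t) (eq? (kind t) 'array))

;; struct-ref or union-ref.
(define (structured-type? t)
  (and (memq (kind t) '(struct-ref union-ref)) #t))
```
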
<h2>6. Writing a Target-Dependent Back-End</h2>
<p>To write the target-dependent back-end, you must decide on the policy
for the translation and then implement the translation. The policy
covers such issues as: which constructs in C are or are not handled; the
translation for each handled construct; how non-handled constructs are
dealt with (ignored, detected with warnings, detected with errors); how
to deal with exceptional cases (consider the <tt>fgets</tt> example from
the <a href="manifesto.html">Manifesto</a>).</p>
<p>For a concrete example, see the companion document <em>FFIGEN Backend
for Chez Scheme Version 5</em>, which addresses many of the choices to be
made and their possible solutions.</p>
<h2>7. Future Work</h2>
<p>A number of features <em>will</em> be supported in the future:</p>
<ul>
<li> There will be a line and a column field in each record, giving the
source position at which the identifier was defined.
<li> Bitfields will be supported.
<li> Qualifiers (what are now called attributes, that is, const and
volatile) will be supported on all types, not just on primitive
non-pointer types as now.
<li> The intermediate representation will include the name of the original
input file, and its path.
<li> The intermediate representation will include a representation of
the include file hierarchy which was traversed to produce the
intermediate representation.
</ul>
<p>A number of features will most likely be supported, but need
to be investigated:</p>
<ul>
<li> It would be nice to retain comments.
<li> Various popular extensions to C are not currently supported by
<em>lcc</em>, but would be extremely useful: <tt>long long</tt> is used
extensively in Unix header files, and header files for compilers on PCs
often use the common Microsoft extensions <tt>__huge</tt>, <tt>__far</tt>,
and <tt>__near</tt> (and their non-underscore equivalents). Some C compilers
support <tt>__inline</tt> declarations, and although we can't generate
code for in-line procedures we can at least parse them if the compiler
can cope with <tt>__inline</tt>. (<tt>__inline</tt> is the easiest to handle,
since it can be ignored; the others must show up as type qualifiers or new types.)
<li> The current shell-program driver will probably be replaced by
something based on the lcc driver.
<li> I'm going to experiment with partial macro application in the
front end so that back-ends can have simple support for macro
definitions. Currently, for example, even something as simple as the
<tt>EOF</tt> macro will be ignored by the Chez Scheme back-end because its
form is <tt>"(-1)"</tt> rather than simply <tt>"-1"</tt>.
<li> Information about the layout of fields within structured types
should possibly be emitted; this information would be useful to
low-level FFIs which need the byte offset and size of a field to access it
in a structure.
</ul>
<p>In addition, there are some issues to investigate in a larger
perspective:</p>
<ul>
<li> General (target-independent) support for useful policy mechanisms.
<li> How well can the intermediate language support other front-ends?
I don't want to fall into the UNCOL pit, but it would be interesting to
see how languages which resemble C in their parameter passing mechanisms
(Pascal, Modula, Oberon) could be mapped onto the intermediate language.
This is not high priority with me, however. If I embark on supporting
another front-end language it will probably be (sigh) C++.
</ul>
<h2>8. Please Contribute!</h2>
<p>My goal is to support as many target languages as is reasonable, but I
can't write all the translators myself (I lack the time and, in many
cases, the knowledge). Targets that I will take care of include STk,
and, if no-one beats me to it, Scsh, both Scheme systems. Someone has
already volunteered to write the ILU back-end. Others are interested
in back-ends for Modula-3 and Mercury.</p>
<p>Volunteers for any translator back-end are welcome to e-mail me and
volunteer their help. I will coach, coordinate, and help out as much as
possible.</p>
<h2>9. Credits</h2>
<p>FFIGEN is based on the freely available <em>lcc</em> ANSI C compiler,
implemented by Christopher Fraser (of AT&amp;T Bell Labs) and David Hanson
(of Princeton University).</p>
<p>I would like to thank Fraser and Hanson for producing such an excellent
system; <em>lcc</em> has been a joy to work with, and their book, <em>A
Retargetable C Compiler: Design and Implementation</em>, made it possible
to implement the FFIGEN front end in roughly a single work day.
Would that all software were this clean!</p>
<p>The development of FFIGEN was supported by ARPA
under U.S. Army grant No. DABT63-94-C-0029,
"Programming Environments, Compiler Technology and Runtime Systems
for Object Oriented Parallel Processing".</p>
<h2>10. Copyrights</h2>
<em>lcc</em> is covered by the following Copyright notice:
<blockquote>
<p>The authors of this software are Christopher W. Fraser and
David R. Hanson.</p>
<p>Copyright (c) 1991,1992,1993,1994,1995 by AT&amp;T, Christopher W. Fraser,
and David R. Hanson. All Rights Reserved.</p>
<p>Permission to use, copy, modify, and distribute this software for any
purpose, subject to the provisions described below, without fee is
hereby granted, provided that this entire notice is included in all
copies of any software that is or includes a copy or modification of
this software and in all copies of the supporting documentation for
such software.</p>
<p>THIS SOFTWARE IS BEING PROVIDED "AS IS", WITHOUT ANY EXPRESS OR IMPLIED
WARRANTY. IN PARTICULAR, NEITHER THE AUTHORS NOR AT&amp;T MAKE ANY
REPRESENTATION OR WARRANTY OF ANY KIND CONCERNING THE MERCHANTABILITY
OF THIS SOFTWARE OR ITS FITNESS FOR ANY PARTICULAR PURPOSE.</p>
<p>lcc is not public-domain software, shareware, and it is not protected
by a `copyleft' agreement, like the code from the Free Software
Foundation.</p>
<p>lcc is available free for your personal research and instructional use
under the `fair use' provisions of the copyright law. You may,
however, redistribute the lcc in whole or in part provided you
acknowledge its source and include this COPYRIGHT file.</P>
<p>You may not sell lcc or any product derived from it in which it is a
significant part of the value of the product. Using the lcc front end
to build a C syntax checker is an example of this kind of product.</p>
<p>You may use parts of lcc in products as long as you charge for only
those components that are entirely your own and you acknowledge the use
of lcc clearly in all product documentation and distribution media. You
must state clearly that your product uses or is based on parts of lcc
and that lcc is available free of charge. You must also request that
bug reports on your product be reported to you. Using the lcc front
end to build a C compiler for the Motorola 88000 chip and charging for
and distributing only the 88000 code generator is an example of this
kind of product.</p>
<p>Using parts of lcc in other products is more problematic. For example,
using parts of lcc in a C++ compiler could save substantial time and
effort and therefore contribute significantly to the profitability of
the product. This kind of use, or any use where others stand to make a
profit from what is primarily our work, is subject to negotiation.</p>
<p>Chris Fraser / cwf@research.att.com <br>
David Hanson / drh@cs.princeton.edu<br>
Fri Jun 17 11:57:07 EDT 1994</p>
</blockquote>
<hr>
<address>
<A HREF="mailto:lth@acm.org">lth@acm.org</A>
</address>
<em>24 May 2000</em>
</body>
</html>

BIN
www/userman.ps.gz Normal file

Binary file not shown.