gauche.cgen - Generating C code
A significant part of Gauche is written in Gauche or in S-expression
based DSLs. During the build process, these sources are converted
into C sources and then compiled by a C compiler.
The gauche.cgen module and its submodules expose
the functionality used by the Gauche build process for general use.
Required features for a C code generator differ greatly among
applications, and too much scaffolding could be a constraint
for the module users. So, instead of providing a single
solid framework, we provide a set of loosely coupled modules
so that you can combine necessary features freely. In fact,
some parts of the Gauche build process use only gauche.cgen.unit
and gauche.cgen.literal (see 'src/builtin-syms.scm',
for example).
The gauche.cgen module is a convenience module that extends
gauche.cgen.unit, gauche.cgen.literal,
gauche.cgen.type and gauche.cgen.cise together.
Usually you can just use gauche.cgen and don’t need
to think about individual submodules.
The following subsections are organized by submodules
only for the convenience of explanation.
| 9.3.1 Generating C source files | gauche.cgen.unit |
| 9.3.2 Generating Scheme literals | gauche.cgen.literal |
| 9.3.3 Conversions between Scheme and C | gauche.cgen.type |
| 9.3.4 CiSE - C in S expression | gauche.cgen.cise |
9.3.1 Generating C source files
One of the tricky issues in generating C source is that you have to put several fragments of code in different parts of the source file, even if you want to say just one thing. That is, sometimes you have to put a declaration before the actual definition, plus some setup code that needs to be run at initialization time.
A cgen-unit is a unit of C source generation.
It corresponds to one .c file, and optionally one .h file.
During processing, the "current unit" is kept in the parameter
cgen-current-unit, and most cgen APIs implicitly operate on it.
The following slots are for public use. They are used to tailor the output. Usually you set these slots at initialization time. The effect is undefined if you change them in the middle of the code generation process.
The name slot holds a string that names this unit. It is used for the default names of the generated files ('name.c' and 'name.h') and as the suffix of the default name of the initialization function. Other cgen modules may use it to generate names. Avoid using characters that are not valid in C identifiers.
You can override those default names by setting the other slots.
The c-file and h-file slots hold the names of the C source file and
the header file, as strings.
If they are #f (the default), the value of the name slot
is used as the file name, with the extension .c or .h
attached, respectively.
To get the names of the files to be generated, use the cgen-unit-c-file
and cgen-unit-h-file generic functions instead of reading
these slots.
A list of strings to be inserted at the top of the generated sources.
The default value is ("/* Generated by gauche.cgen */").
Each string appears in its
own line. Usually you don’t need anything
The init-prologue and init-epilogue slots hold a string to start or to
end the initialization function, respectively.
The default value of init-prologue is
"void Scm_Init_NAME(void) {" where NAME is the
value of the name slot. The default value of init-epilogue
is just "}". Each appears on its own line.
To get the default initialization function name, use the
cgen-unit-init-name generic function.
To customize the name, arguments and/or return type of the
initialization function, set init-prologue.
The body of the initialization function is filled with the code
fragments registered by cgen-init.
cgen-current-unit is a parameter that keeps the current cgen-unit.
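A typical flow of generating C code is as follows:
1. Create a <cgen-unit> and make it the current unit.
2. Submit code fragments with cgen-extern, cgen-decl, cgen-body and cgen-init; they are accumulated in the current unit.
3. Call cgen-emit-c and cgen-emit-h on the unit to write out the C source file and the header file.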
cgen-emit-c and cgen-emit-h write the code fragments accumulated in
cgen-unit to a C source file and a C header file, respectively.
The names of the files are determined by calling cgen-unit-c-file
and cgen-unit-h-file, respectively. If a file already exists,
its content is overwritten; you can't write to the files incrementally.
So these procedures are usually called as the last step of code
generation.
We’ll explain the details of how each file is organized under “Filling the content” section below.
cgen-unit-c-file and cgen-unit-h-file return a string that names
the C source file and the header file for cgen-unit, respectively.
The default method first looks at the c-file or h-file slot of the
cgen-unit, and if it is #f, uses the value of the name
slot with the extension .c or .h appended.
cgen-unit-init-name returns a string that names the initialization
function generated in C. It is used to create the default
init-prologue value.
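As a rough illustration of the slots and generic functions described above, the following sketch creates one unit that relies on the defaults and one that overrides them. The file names and the init function signature are hypothetical, and the sketch assumes each slot accepts an init keyword of the same name (as :name does in the examples in this section).

```scheme
(use gauche.cgen)

;; A unit that relies on the defaults derived from the name slot.
(define default-unit (make <cgen-unit> :name "foo"))

;; A unit that overrides the file names and the init function.
;; "foo-gen.c", "foo-gen.h" and init_foo are hypothetical names;
;; the init keywords other than :name are an assumption.
(define custom-unit
  (make <cgen-unit>
    :name "foo"
    :c-file "foo-gen.c"
    :h-file "foo-gen.h"
    :init-prologue "int init_foo(void) {"
    :init-epilogue "return 0; }"))

;; Query the names through the generic functions, not the slots.
(define default-c-file (cgen-unit-c-file default-unit)) ; name slot + ".c"
(define custom-c-file  (cgen-unit-c-file custom-unit))
(define custom-h-file  (cgen-unit-h-file custom-unit))

(print default-c-file)
(print custom-c-file)
(print custom-h-file)
(print (cgen-unit-init-name default-unit))
```

Going through cgen-unit-c-file and cgen-unit-h-file keeps the code working whether or not the slots were set explicitly.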
Filling the content
There are four parts to which you can add C code fragments. Within each part, code fragments are rendered in the same order as they were added.
extern: This part is put into the header file, if it exists.
decl: Placed at the beginning of the C source, after the standard prologue.
body: Placed in the C source, following the 'decl' part.
init: Placed inside the initialization function, which appears at the end of the C source.
The following procedures are the simple way to put source code fragments in an appropriate part:
cgen-extern, cgen-decl, cgen-body and cgen-init put the code fragments code … into the corresponding parts. Each fragment must be a string.
This is an almost minimal example to show the typical usage.
After running this code you’ll get my-cfile.c and
my-cfile.h in the current directory.
    (use gauche.cgen)
    (define *unit* (make <cgen-unit> :name "my-cfile"))
    (parameterize ([cgen-current-unit *unit*])
      ;; decl part: near the top of my-cfile.c
      (cgen-decl "#include <stdio.h>")
      ;; init part: inside the initialization function
      (cgen-init "fprintf(stderr, \"initialization function\\n\");")
      ;; body part: after the declarations in my-cfile.c
      (cgen-body "void foo(int n) { fprintf(stderr, \"got %d\\n\", n); }")
      ;; extern part: goes to my-cfile.h
      (cgen-extern "void foo(int n);"))
    (cgen-emit-c *unit*)
    (cgen-emit-h *unit*)
These are handy escaping procedures; they are useful even
if you don’t use other parts of the cgen modules.
cgen-safe-name, cgen-safe-name-friendly and cgen-safe-comment escape
characters that are invalid in C identifiers or C comments.
With cgen-safe-name, characters other than ASCII alphabets
and digits are converted to the form _XX, where XX is
the hexadecimal notation of the character code. (Note that the character
_ is also converted.) So the returned string can be used
safely as a C identifier. The mapping is injective; that is,
if the source strings differ, the result strings always differ.
On the other hand, cgen-safe-name-friendly converts
the input string into a more readable C identifier. -> becomes
_TO (e.g. char->integer becomes char_TOinteger),
other - and _ become _,
? becomes P (e.g. char? becomes charP),
! becomes X (e.g. set! becomes setX),
< and > become _LT and _GT respectively.
Other special characters except _ are converted to _XX
as in cgen-safe-name. The mapping is not injective; e.g.
both read-line and read_line map to read_line.
Use this only when you think some human needs to read the generated
C code (which is not recommended, by the way).
Much simpler is cgen-safe-comment, which just converts
/* and */ into / * and * / (inserting a space
between the two characters), so that the text won't terminate the
comment inadvertently. (Technically, escaping only */ suffices,
but some simple-minded C parsers might be confused by /* in
comments.) This conversion isn't injective either.
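The escaping rules above can be tried directly at the REPL. The sketch below only uses inputs taken from the text; it assumes the comment escaper is exported under the name cgen-safe-comment, in line with the other two procedures.

```scheme
(use gauche.cgen)

;; Readable but lossy mapping, using the inputs mentioned in the text.
(define friendly-names
  (map cgen-safe-name-friendly
       '("char->integer" "char?" "set!" "read-line")))

;; cgen-safe-name is injective, so these two inputs, which collide
;; under cgen-safe-name-friendly, must stay distinct here.
(define distinct?
  (not (equal? (cgen-safe-name "read-line")
               (cgen-safe-name "read_line"))))

;; A stray comment terminator is broken up by an inserted space.
(define comment (cgen-safe-comment "see foo */ bar"))

(print friendly-names)
(print distinct?)
(print comment)
```

Prefer cgen-safe-name when the generated identifiers must never clash, and reserve cgen-safe-name-friendly for output that people are expected to read.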
If you want to conditionalize a fragment by C preprocessor
#ifdefs, use the following macro:
Code fragments submitted in body … are protected
by #if cpp-expr and #endif.
If cpp-expr is a string, it is emitted literally:
    (cgen-with-cpp-condition "defined(FOO)"
      (cgen-init "foo();"))
    ;; will generate:
    #if defined(FOO)
    foo();
    #endif /* defined(FOO) */
You can also construct cpp-expr as an S-expression.
    <cpp-expr> : <string>
               | (defined <cpp-expr>)
               | (not <cpp-expr>)
               | (<n-ary-op> <cpp-expr> <cpp-expr> ...)
               | (<binary-op> <cpp-expr> <cpp-expr>)
    <n-ary-op> : and | or | + | * | - | /
    <binary-op> : > | >= | == | < | <= | !=
                | logand | logior | lognot | >> | <<
Example:
    (cgen-with-cpp-condition '(and (defined FOO)
                                   (defined BAR))
      (cgen-init "foo();"))
    ;; will generate:
    #if ((defined FOO)&&(defined BAR))
    foo();
    #endif /* ((defined FOO)&&(defined BAR)) */
You can nest cgen-with-cpp-condition.
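As a small sketch of nesting, the following guards a helper function with one condition and its call in the initialization function with an additional inner condition. HAVE_FOO, FOO_VERSION, foo and use_foo are hypothetical names chosen for illustration; the exact text of the emitted #if lines is not shown here because it depends on the implementation.

```scheme
(use gauche.cgen)

(define *unit* (make <cgen-unit> :name "conddemo"))

(parameterize ([cgen-current-unit *unit*])
  ;; Outer condition: applies to everything submitted inside it.
  (cgen-with-cpp-condition '(defined HAVE_FOO)
    (cgen-body "static void use_foo(void) { foo(); }")
    ;; Inner condition: this fragment is guarded by both conditions.
    ;; Operands are given as strings, following the grammar above.
    (cgen-with-cpp-condition '(>= "FOO_VERSION" "2")
      (cgen-init "use_foo();"))))

;; Writes conddemo.c into the current directory.
(cgen-emit-c *unit*)
```

The fragments end up in different parts of the file (body and init), and each carries the conditions that were in effect when it was submitted.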
When you try to abstract the code generation process,
calling individual procedures for each part (e.g. cgen-body
or cgen-init) becomes tedious, since such higher-level
constructs are likely to require generating code fragments
in various parts. Instead, you can create a customized class
that handles submission of fragments to the appropriate parts.
<cgen-node> is a base class to represent a set of code fragments.
The state of the C preprocessor condition (set by cgen-with-cpp-condition)
is captured when an instance of a subclass of this class is
created, so generating the appropriate #ifs and #endifs is
handled automatically.
You subclass <cgen-node>, then define method(s) for
one or more of the following generic functions:
The generic functions cgen-emit-xtrn, cgen-emit-decl, cgen-emit-body
and cgen-emit-init are called while writing out
the C source within cgen-emit-c and cgen-emit-h.
Inside these methods, anything written out to the
current output port goes into the output file.
While the .h file is generated by cgen-emit-h,
the cgen-emit-xtrn methods of all submitted nodes are
called in order of submission.
While the .c file is generated by cgen-emit-c,
the cgen-emit-decl methods of all submitted nodes are
called first, then the cgen-emit-body methods, then
the cgen-emit-init methods.
If you don't specialize one of these methods, the node doesn't generate code in that part.
Once you define your subclass and create an instance, you can submit it to the current cgen unit by this procedure:
Submit cgen-node to the current cgen unit. If the current unit is not set, cgen-node is simply ignored.
In fact, the procedures cgen-extern, cgen-decl,
cgen-body and cgen-init are just convenience
wrappers that create an instance of an internal subclass specialized
to generate a code fragment only in the designated part.
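A minimal sketch of such a subclass follows. The class <counter-node>, its slots and the C variable hits are hypothetical; the submission procedure is assumed to be cgen-add!, exported by gauche.cgen.unit, since its name is not spelled out in the text above. The header file name is set explicitly so that the sketch does not depend on the default.

```scheme
(use gauche.cgen)

;; Hypothetical node: one global C int with an extern declaration,
;; a definition, and an assignment in the initialization function.
(define-class <counter-node> (<cgen-node>)
  ((var     :init-keyword :var)
   (initial :init-keyword :initial :init-value 0)))

;; Goes to the header file.
(define-method cgen-emit-xtrn ((node <counter-node>))
  (print "extern int " (~ node 'var) ";"))

;; Goes to the body part of the C source.
(define-method cgen-emit-body ((node <counter-node>))
  (print "int " (~ node 'var) ";"))

;; Goes inside the initialization function.
(define-method cgen-emit-init ((node <counter-node>))
  (print "  " (~ node 'var) " = " (~ node 'initial) ";"))

(define *unit*
  (make <cgen-unit> :name "counters" :h-file "counters.h"))

(parameterize ([cgen-current-unit *unit*])
  ;; cgen-add! is assumed to be the submission procedure.
  (cgen-add! (make <counter-node> :var "hits" :initial 10)))

(cgen-emit-c *unit*)
(cgen-emit-h *unit*)
```

Because cgen-emit-decl is not specialized for <counter-node>, the node contributes nothing to the decl part.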
9.3.2 Generating Scheme literals
Sometimes you want to refer to a Scheme constant value in C code.
It is trivial if the value is a simple thing like Scheme boolean
(SCM_TRUE, SCM_FALSE), characters (SCM_MAKE_CHAR(code)),
small integers (SCM_MAKE_INT(value)), etc. You can directly
write it in C code. However, once you step outside of these simple
values, it gets tedious quickly, involving static data declarations
and/or runtime initialization code.
For example, to get a Scheme value of a list of symbols (a b c),
you have to (1) create ScmStrings for the names of the symbols,
(2) pass them to Scm_Intern to get Scheme symbols, then
(3) call Scm_Conses (or a convenience macro SCM_LIST3) to
build a list.
With gauche.cgen, such code can be generated automatically.
NOTE: If you use cgen-literal, make sure you call
(cgen-decl "#include <gauche.h>") to include 'gauche.h'
before the first call of cgen-literal, which may insert
declarations that need 'gauche.h'.
cgen-literal returns a <cgen-literal> object for a Scheme object obj,
and submits the necessary declarations and initialization code to the
current cgen unit.
For the above example, you can just call (cgen-literal '(a b c))
and the C code to set up the Scheme literal of the list of three
symbols will be generated.
The result of cgen-literal is an instance of <cgen-literal>;
the details of the class aren't for public use, but you can use it
to refer to the created literal in C code.
Returns a C code expression fragment of type ScmObj,
which represents the Scheme literal value.
The following example creates a C function printabc that prints
the literal value (a b c), created by cgen-literal.
    (define *unit* (make <cgen-unit> :name "foo"))
    (parameterize ((cgen-current-unit *unit*))
      ;; gauche.h must be included before the first cgen-literal call
      (cgen-decl "#include <gauche.h>")
      (let1 lit (cgen-literal '(a b c))
        (cgen-body
         (format "void printabc() { Scm_Printf(SCM_CUROUT, \"%S\", ~a); }"
                 (cgen-c-name lit)))))
    (cgen-emit-c *unit*)
If you examine the generated file ‘foo.c’, you’ll get a general idea of how it is handled.
One advantage of cgen-literal is that it tries to share
the same literal whenever possible. If you call
(cgen-literal '(a b c)) twice in the same cgen unit,
you’ll get one instance of cgen-literal. If you call
(cgen-literal '(b c)) then, it will share the tail
of the original list (a b c). So you can just use
cgen-literal whenever you need to have Scheme literal
values, without worrying about generating excessive amount of
duplicated code.
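The sharing behavior can be observed with a short experiment. This is only a sketch: the unit name sharedlits is hypothetical, and whether the comparison prints #t relies on the behavior described above rather than on anything verified here.

```scheme
(use gauche.cgen)

(define *unit* (make <cgen-unit> :name "sharedlits"))

(define same-instance?
  (parameterize ([cgen-current-unit *unit*])
    ;; Include gauche.h before the first cgen-literal call.
    (cgen-decl "#include <gauche.h>")
    (let ([first  (cgen-literal '(a b c))]
          [second (cgen-literal '(a b c))])
      ;; According to the text, this literal shares the tail of (a b c).
      (cgen-literal '(b c))
      ;; Compare the two results by identity.
      (eq? first second))))

(print same-instance?)

;; Writes sharedlits.c; inspect it to see how the data is laid out.
(cgen-emit-c *unit*)
```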
Certain Scheme objects cannot be generated as a literal; for example, an opened port can’t, since it carries lots of runtime information.
(There’s a machinery to allow programmers to extend the cgen-literal behavior for new types. The API isn’t fixed yet, though.)
9.3.3 Conversions between Scheme and C
In the C world, any Scheme object is uniformly of type ScmObj.
But it is often the case that you need to narrow down to the
specific type and convert it to a C value. Gauche maintains
a database of how to typecheck and map Scheme values to C values and
vice versa.
Note that the mapping isn't one-to-one: Scheme <integer>
can be mapped to C's short, long, unsigned int,
or even just ScmObj if the C routine wants to cover bignums.
So each mapping has its own name. For historical reasons, each
mapping is called a stub type. The names of stub types look
like Scheme types, but their semantics differ from Scheme types.
Remember: Each stub type represents a specific mapping between a
Scheme type and a C type.
Each stub type has a C-predicate, a boxer and an unboxer; each of them is a Scheme string naming a C function or C macro. A C-predicate takes a ScmObj object and returns a C boolean value that tells whether the given object has a valid type and range for the stub type. A boxer takes a C object and converts it to a Scheme object; it usually involves wrapping or boxing the C value in a tagged pointer or object, hence the name. An unboxer does the opposite: it takes a Scheme object and converts it to a C value. The Scheme object must be checked by the C-predicate before being passed to the unboxer.
The following table shows the predefined stub types.
Note that most of the aggregate types have one-to-one mappings.
The difficult ones are numeric types and strings.
Scheme numbers can represent a much wider range of numbers
than C, so you have to narrow them down according to the capability
of the C routine. Scheme strings have a byte size and a character length,
and the body may not be NUL-terminated; so the <string>
stub type maps a Scheme string to ScmString*. For convenience,
you can use <const-cstring>, which creates a NUL-terminated
C string; beware that it may incur some copying cost.
Stub type            Scheme               C                      Notes
----------------------------------------------------------------------------------------------------
<fixnum>             <integer>            int                    Integers within fixnum range
<integer>            <integer>            ScmObj                 Any exact integers
<real>               <real>               double                 Value converted to double
<number>             <number>             ScmObj                 Any numbers
<int>                <integer>            int                    Integers representable in C
<int8>               <integer>            int
<int16>              <integer>            int
<int32>              <integer>            int
<short>              <integer>            short
<long>               <integer>            long
<uint>               <integer>            uint                   Integers representable in C
<uint8>              <integer>            uint
<uint16>             <integer>            uint
<uint32>             <integer>            uint
<ushort>             <integer>            ushort
<ulong>              <integer>            ulong
<float>              <real>               float                  Unboxed value cast to float
<double>             <real>               double                 Alias of <real>
<boolean>            <boolean>            int                    Boolean value
<char>               <char>               ScmChar                Note: not a C char
<void>               -                    void                   (Used only as a return type.
                                                                 Scheme function returns #<undef>)
<string>             <string>             ScmString*             Note: not a C string
<const-cstring>      <string>             const char*            For arguments, string is unboxed
                                                                 by Scm_GetStringConst.
                                                                 For return values, C string is boxed
                                                                 by SCM_MAKE_STR_COPYING.
<pair>               <pair>               ScmPair*
<list>               <list>               ScmObj
<symbol>             <symbol>             ScmSymbol*
<keyword>            <keyword>            ScmKeyword*
<vector>             <vector>             ScmVector*
<uvector>            <uvector>            ScmUVector*
<s8vector>           <s8vector>           ScmS8Vector*
<u8vector>           <u8vector>           ScmU8Vector*
<s16vector>          <s16vector>          ScmS16Vector*
<u16vector>          <u16vector>          ScmU16Vector*
<s32vector>          <s32vector>          ScmS32Vector*
<u32vector>          <u32vector>          ScmU32Vector*
<s64vector>          <s64vector>          ScmS64Vector*
<u64vector>          <u64vector>          ScmU64Vector*
<f16vector>          <f16vector>          ScmF16Vector*
<f32vector>          <f32vector>          ScmF32Vector*
<f64vector>          <f64vector>          ScmF64Vector*
<hash-table>         <hash-table>         ScmHashTable*
<tree-map>           <tree-map>           ScmTreeMap*
<char-set>           <char-set>           ScmCharSet*
<regexp>             <regexp>             ScmRegexp*
<regmatch>           <regmatch>           ScmRegMatch*
<port>               <port>               ScmPort*
<input-port>         <input-port>         ScmPort*
<output-port>        <output-port>        ScmPort*
<procedure>          <procedure>          ScmProcedure*
<closure>            <closure>            ScmClosure*
<promise>            <promise>            ScmPromise*
<class>              <class>              ScmClass*
<method>             <method>             ScmMethod*
<module>             <module>             ScmModule*
<thread>             <thread>             ScmVM*
<mutex>              <mutex>              ScmMutex*
<condition-variable> <condition-variable> ScmConditionVariable*
A stub type can have a maybe variation, denoted by a
? suffix; e.g. <string>?. It is a union type of
the base type and boolean false (for <string>?, it
can be either <string> or #f). In the C world,
boolean false is mapped to a NULL pointer. It is convenient
for passing a C value that is allowed to be NULL back and forth: if
you pass #f from the Scheme world it comes out as NULL in
the C world, and vice versa. The maybe variation is only
meaningful when the C type is a pointer type.
An instance of the <cgen-type> class represents a stub type.
It can be looked up by a name such as <const-cstring> with
cgen-type-from-name.
Returns an instance of <cgen-type> that has name.
If the name is unknown, #f is returned.
c-expr is a string that denotes a C expression. Returns a string containing a C expression that boxes, unboxes, or typechecks c-expr according to cgen-type.
    ;; suppose foo() returns char*
    (cgen-box-expr (cgen-type-from-name '<const-cstring>) "foo()")
      ⇒ "SCM_MAKE_STR_COPYING(foo())"
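To see the three kinds of expressions side by side, the sketch below asks the <const-cstring> stub type for its typecheck, unbox and box expressions. Only cgen-box-expr is named in the text; cgen-unbox-expr and cgen-pred-expr are assumed here to be the corresponding procedures of gauche.cgen.type, and arg and foo() are hypothetical C expressions.

```scheme
(use gauche.cgen)

(define ctype (cgen-type-from-name '<const-cstring>))

;; C expression that checks whether the ScmObj `arg` fits this stub type.
;; cgen-pred-expr is an assumed name (see the note above).
(define check-expr (cgen-pred-expr ctype "arg"))

;; C expression that converts the checked ScmObj `arg` to const char*.
;; cgen-unbox-expr is an assumed name as well.
(define unbox-expr (cgen-unbox-expr ctype "arg"))

;; C expression that converts the result of foo() back to a ScmObj.
(define box-expr (cgen-box-expr ctype "foo()"))

;; An unknown name yields #f instead of a <cgen-type>.
(define missing (cgen-type-from-name '<no-such-stub-type>))

(print check-expr)
(print unbox-expr)
(print box-expr)
(print missing)
```

A generated wrapper would typically emit the check first, then the unbox expression for each argument, and finally the box expression around the C call.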
This document was generated by Shiro Kawai on May 28, 2012 using texi2html 1.82.