Generating C code (Gauche Users’ Reference)

9.4 `gauche.cgen` - Generating C code

A significant part of Gauche is written in Gauche or in an S-expression based DSL. During the build process, these sources are converted into C sources and then compiled by a C compiler. The gauche.cgen module and its submodules make the functionality used by the Gauche build process available for general use.

Required features for a C code generator differ greatly among applications, and too much scaffolding could be a constraint for the module users. So, instead of providing a single solid framework, we provide a set of loosely coupled modules so that you can combine necessary features freely. In fact, some parts of the Gauche build process use only gauche.cgen.unit and gauche.cgen.literal (see src/builtin-syms.scm, for example).

Module: gauche.cgen ¶: This is a convenience module that extends gauche.cgen.unit, gauche.cgen.literal, gauche.cgen.type and gauche.cgen.cise together.

Usually you can just use gauche.cgen and don’t need to think about individual submodules. The following subsections are organized by submodules only for the convenience of explanation.

9.4.1 Generating C source files

One of the tricky issues about generating C source is that you have to put several fragments of code in different parts of the source file, even if you want to say just one thing; that is, sometimes you have to put a declaration before the actual definition, plus some setup code that needs to be run at initialization time. The <cgen-unit> class takes care of such code placement.

Creating a frame

Class: <cgen-unit> ¶

{gauche.cgen} A cgen-unit is a unit of C source generation. It corresponds to one .c file, and optionally one .h file. During the processing, a "current unit" is kept in the parameter cgen-current-unit, and most cgen APIs implicitly operate on it.

The following slots are for public use. They are used to tailor the output. Usually you set those slots at initialization time. The effect is undefined if you change them in the middle of the code generation process.

Instance Variable of <cgen-unit>: name ¶

A string to name this unit. This is used for the default name of the generated files (name.c and name.h) and the suffix of the default name of initialization function. Other cgen modules may use this to generate names. Avoid using characters that are not valid for C identifiers.

You can override those default names by setting the other slots.

Instance Variable of <cgen-unit>: c-file ¶

Instance Variable of <cgen-unit>: h-file ¶

The names of the C source file and header file, as strings. If they are #f (the default), the value of the name slot is used as the file name, with the extension .c or .h attached, respectively.

To get the file names to be generated, use cgen-unit-c-file and cgen-unit-h-file generic functions, instead of reading these slots.

Instance Variable of <cgen-unit>: preamble ¶: A list of strings to be inserted at the top of the generated sources. The default value is ("/* Generated by gauche.cgen */"). Each string appears in its own line.

Instance Variable of <cgen-unit>: init-prologue ¶

Instance Variable of <cgen-unit>: init-epilogue ¶

A string to start or to end the initialization function, respectively. The default value of init-prologue is "void Scm_Init_NAME(void) {" where NAME is the value of the name slot. The default value of init-epilogue is just "}". Each string appears in its own line.

To get the default initialization function name, use cgen-unit-init-name generic function.

To customize initialization function name, arguments and/or return type, set init-prologue.

The content of initialization function is filled by the code fragments registered by cgen-init.

Parameter: cgen-current-unit ¶: A parameter to keep the current cgen-unit.

A typical flow of generating C code is as follows:

Create a <cgen-unit> instance and make it the current unit.
Call code insertion APIs with code fragments. Fragments are accumulated in the current unit.
Call emit method (cgen-emit-c, cgen-emit-h) on the unit, which generates a C file and optionally a header file.

Generic Function: cgen-emit-c cgen-unit ¶

Generic Function: cgen-emit-h cgen-unit ¶

{gauche.cgen} Write the accumulated code fragments in cgen-unit to a C source file and a C header file, respectively. The names of the files are determined by calling cgen-unit-c-file and cgen-unit-h-file, respectively. If the files already exist, their contents are overwritten; you can’t gradually write to the files. So, usually these procedures are called at the last step of the code generation.

We’ll explain the details of how each file is organized under “Filling the content” section below.

Generic Function: cgen-unit-c-file cgen-unit ¶
Generic Function: cgen-unit-h-file cgen-unit ¶: {gauche.cgen} Returns a string that names the C source file or the header file for cgen-unit, respectively. The default method first looks at the c-file or h-file slot of the cgen-unit, and if it is #f, uses the value of the name slot and appends the extension .c or .h.

Generic Function: cgen-unit-init-name cgen-unit ¶: {gauche.cgen} Returns a string that names the initialization function generated to C. It is used to create the default init-prologue value.
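The slots and accessors described above can be tried out as follows. This is only a sketch: the unit name "mylib", the file name "mylib-gen.c" and the function name mylib_init are made up for illustration, and we assume each slot can be set through an initialization keyword of the same name (as :name is in the other examples).

```scheme
(use gauche.cgen)

;; All names here ("mylib", "mylib-gen.c", mylib_init) are hypothetical.
;; Assumption: :c-file and :init-prologue are accepted as init keywords.
(define unit
  (make <cgen-unit>
    :name "mylib"
    :c-file "mylib-gen.c"                       ; override only the .c file name
    :init-prologue "void mylib_init(void) {"))  ; custom init function header

(print (cgen-unit-c-file unit))     ; taken from the c-file slot
(print (cgen-unit-h-file unit))     ; h-file is #f, so derived from the name slot
(print (cgen-unit-init-name unit))  ; default init function name, derived from name
```

Note that cgen-unit-init-name still reports the default name; overriding init-prologue changes only what is written to the C file.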

Filling the content

There are four parts to which you can add C code fragments. Within each part, code fragments are rendered in the same order as they were added.

extern: This part is put into the header file, if one is generated.
decl: Placed at the beginning of the C source, after the standard prologue.
body: Placed in the C source, following the ’decl’ part.
init: Placed inside the initialization function, which appears at the end of the C source.

The following procedures are the simplest way to put source code fragments into the appropriate part:

Function: cgen-extern code … ¶
Function: cgen-decl code … ¶
Function: cgen-body code … ¶
Function: cgen-init code … ¶: {gauche.cgen} Put code fragments code … to the appropriate parts. Each fragment must be a string.

This is a minimal example to show the typical usage. After running this code you’ll get my-cfile.c and my-cfile.h in the current directory.

(use gauche.cgen)

(define *unit* (make <cgen-unit> :name "my-cfile"))

(parameterize ([cgen-current-unit *unit*])
  (cgen-decl "#include <stdio.h>")
  (cgen-init "fprintf(stderr, \"initialization function\\n\");")
  (cgen-body "void foo(int n) { fprintf(stderr, \"got %d\\n\", n); }")
  (cgen-extern "void foo(int n);")
  )

(cgen-emit-c *unit*)
(cgen-emit-h *unit*)

These are handy escaping procedures; they are useful even if you don’t use other parts of the cgen modules.

Function: cgen-safe-name string ¶

Function: cgen-safe-name-friendly string ¶

Function: cgen-safe-string string ¶

Function: cgen-safe-comment string ¶

{gauche.cgen} Escapes characters invalid in C identifiers, C string literals or C comments.

With cgen-safe-name, characters other than ASCII letters and digits are converted to the form _XX, where XX is the hexadecimal notation of the character code. (Note that the character _ is also converted.) So the returned string can be used safely as a C identifier. The mapping is injective; that is, if the source strings differ, the result strings always differ.

On the other hand, cgen-safe-name-friendly converts the input string into more readable C identifier. -> becomes _TO (e.g. char->integer becomes char_TOinteger), other - and _ become _, ? becomes P (e.g. char? becomes charP), ! becomes X (e.g. set! becomes setX), < and > become _LT and _GT respectively. Other special characters except _ are converted to _XX as in cgen-safe-name. The mapping is not injective; e.g. both read-line and read_line map to read_line. Use this only when you think some human needs to read the generated C code (which is not recommended, by the way.)

If you want to write out a Scheme string as a C string literal, you can use cgen-safe-string. It escapes control characters and non-ASCII characters. If the Scheme string contains a character beyond ASCII, it is encoded in Gauche’s native encoding. (NB: It also escapes ?, to avoid accidental formation of C trigraphs.)

Much simpler is cgen-safe-comment, which just converts /* and */ into / * and * / (with a space between the two characters), so that it won’t terminate the comment inadvertently. (Technically, escaping only */ suffices, but some simple-minded C parsers might be confused by /* in the comments.) This conversion isn’t injective either.

(cgen-safe-name "char-alphabetic?")
  ⇒ "char_2dalphabetic_3f"
(cgen-safe-name-friendly "char-alphabetic?")
  ⇒ "char_alphabeticP"
(cgen-safe-string "char-alphabetic?")
  ⇒ "\"char-alphabetic\\077\""

(cgen-safe-comment "*/*")
  ⇒ "* / *"

If you want to conditionalize a fragment by C preprocessor #ifdefs, use the following macro:

Macro: cgen-with-cpp-condition cpp-expr body … ¶

{gauche.cgen} Code fragments submitted in body … are protected by #if cpp-expr and #endif.

If cpp-expr is a string, it is emitted literally:

(cgen-with-cpp-condition "defined(FOO)"
  (cgen-init "foo();"))

;; will generate:
#if defined(FOO)
foo();
#endif /* defined(FOO) */

You can also construct cpp-expr by S-expr.

<cpp-expr> : <string>
           | (defined <cpp-expr>)
           | (not <cpp-expr>)
           | (<n-ary-op> <cpp-expr> <cpp-expr> ...)
           | (<binary-op> <cpp-expr> <cpp-expr>)

<n-ary-op> : and | or | + | * | - | /

<binary-op> : > | >= | == | < | <= | !=
            | logand | logior | lognot | >> | <<

Example:

(cgen-with-cpp-condition '(and (defined FOO)
                               (defined BAR))
  (cgen-init "foo();"))

;; will generate:
#if ((defined FOO)&&(defined BAR))
foo();
#endif /* ((defined FOO)&&(defined BAR)) */

You can nest cgen-with-cpp-condition.
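As a small sketch of nesting, the inner fragment below is submitted while both conditions are in effect. The unit name "nested-cond" and the C variable names are made up for illustration; inspect the generated nested-cond.c to see how the #if and #endif lines are arranged.

```scheme
(use gauche.cgen)

;; "nested-cond", foo_enabled and foo_bar_enabled are hypothetical names.
(define *unit* (make <cgen-unit> :name "nested-cond"))

(parameterize ([cgen-current-unit *unit*])
  ;; Outer condition, written as an S-expression
  (cgen-with-cpp-condition '(defined FOO)
    (cgen-body "static int foo_enabled = 1;")
    ;; Inner condition, written as a literal string
    (cgen-with-cpp-condition "defined(BAR)"
      (cgen-body "static int foo_bar_enabled = 1;"))))

(cgen-emit-c *unit*)   ; writes nested-cond.c to the current directory
```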

Submitting code fragments to more than one part

When you try to abstract the code generation process, calling individual procedures for each part (e.g. cgen-body or cgen-init) becomes tedious, since such higher-level constructs are likely to require generating code fragments to various parts. Instead, you can create a customized class that handles submission of fragments to the appropriate parts.

Class: <cgen-node> ¶

{gauche.cgen} A base class to represent a set of code fragments.

The state of the C preprocessor condition (set by cgen-with-cpp-condition) is captured when an instance of a subclass of this class is created, so the appropriate #ifs and #endifs are generated automatically.

You subclass <cgen-node>, then define method(s) to one or more of the following generic functions:

Generic Function: cgen-emit-xtrn cgen-node ¶

Generic Function: cgen-emit-decl cgen-node ¶

Generic Function: cgen-emit-body cgen-node ¶

Generic Function: cgen-emit-init cgen-node ¶

{gauche.cgen} These generic functions are called during writing out the C source within cgen-emit-c and cgen-emit-h. Inside these methods, anything written out to the current output port goes into the output file.

While generating the .h file by cgen-emit-h, the cgen-emit-xtrn method is called for all submitted nodes, in order of submission.

While generating the .c file by cgen-emit-c, the cgen-emit-decl method is called first for all submitted nodes, then the cgen-emit-body method, then the cgen-emit-init method.

If you don’t specialize one of these methods, the node doesn’t generate code in that part.

Once you define your subclass and create an instance, you can submit it to the current cgen unit by this procedure:

Function: cgen-add! cgen-node ¶: {gauche.cgen} Submit cgen-node to the current cgen unit. If the current unit is not set, cgen-node is simply ignored.

In fact, the procedures cgen-extern, cgen-decl, cgen-body and cgen-init are just convenience wrappers that create an instance of an internal subclass specialized to generate a code fragment only to the designated part.
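Here is a minimal sketch of such a subclass. The class <my-counter>, its c-name slot, and the generated C variable are all hypothetical; the point is only that a single node contributes to the extern, body and init parts at once.

```scheme
(use gauche.cgen)

;; <my-counter> and its c-name slot are made up for this sketch.
(define-class <my-counter> (<cgen-node>)
  ((c-name :init-keyword :c-name)))

;; Each method writes its fragment to the current output port.
(define-method cgen-emit-xtrn ((node <my-counter>))
  (print "extern int " (~ node 'c-name) ";"))

(define-method cgen-emit-body ((node <my-counter>))
  (print "int " (~ node 'c-name) ";"))

(define-method cgen-emit-init ((node <my-counter>))
  (print "  " (~ node 'c-name) " = 0;"))

;; cgen-emit-decl is not specialized, so nothing goes to the decl part.

(define *unit* (make <cgen-unit> :name "counters"))

(parameterize ([cgen-current-unit *unit*])
  (cgen-add! (make <my-counter> :c-name "request_count")))

(cgen-emit-c *unit*)   ; writes counters.c
(cgen-emit-h *unit*)   ; writes counters.h
```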

9.4.2 Generating Scheme literals

Sometimes you want to refer to a Scheme constant value in C code. It is trivial if the value is a simple thing like Scheme boolean (SCM_TRUE, SCM_FALSE), characters (SCM_MAKE_CHAR(code)), small integers (SCM_MAKE_INT(value)), etc. You can directly write it in C code. However, once you step outside of these simple values, it gets tedious quickly, involving static data declarations and/or runtime initialization code.

For example, to get a Scheme value of a list of symbols (a b c), you have to (1) create ScmStrings for the names of the symbols, (2) pass them to Scm_Intern to get Scheme symbols, then (3) call Scm_Conses (or a convenience macro SCM_LIST3) to build a list.

With gauche.cgen, that code can be generated automatically.

NOTE: If you use cgen-literal, make sure you call (cgen-decl "#include <gauche.h>") to include gauche.h before the first call of cgen-literal, which may insert declarations that need gauche.h.

Function: cgen-literal obj ¶

{gauche.cgen} Returns a <cgen-literal> object for a Scheme object obj, and submits the necessary declarations and initialization code to the current cgen unit.

The details of the <cgen-literal> object aren’t for public use, but one thing you can do is to pass it to cgen-cexpr to obtain a C code fragment that refers to the Scheme value at runtime. See the example in the cgen-cexpr entry below.

Generic Function: cgen-cexpr cgen-literal ¶

{gauche.cgen} Returns a C code expression fragment of type ScmObj, which represents the Scheme literal value.

If the Scheme object is an immediate one such as a boolean or a fixnum, the C code is an immediate expression that yields the value, e.g. SCM_FALSE or SCM_MAKE_INT(1234). If the Scheme object requires allocation, cgen-literal allocates memory and initializes it, and cgen-cexpr returns a pointer to that object.

The following example creates a C function printabc that prints the literal value (a b c), created by cgen-literal.

(define *unit* (make <cgen-unit> :name "foo"))
(parameterize ((cgen-current-unit *unit*))
  ;; gauche.h must be included before the first call of cgen-literal
  (cgen-decl "#include <gauche.h>")
  (let1 lit (cgen-literal '(a b c))
    (cgen-body
     (format "void printabc() { Scm_Printf(SCM_CUROUT, \"%S\", ~a); }"
             (cgen-cexpr lit)))))
(cgen-emit-c *unit*)

If you examine the generated file foo.c, you’ll get a general idea of how it is handled.

One advantage of cgen-literal is that it tries to share the same literal whenever possible. If you call (cgen-literal '(a b c)) twice in the same cgen unit, you’ll get one instance of cgen-literal. If you call (cgen-literal '(b c)) then, it will share the tail of the original list (a b c). So you can just use cgen-literal whenever you need to have Scheme literal values, without worrying about generating excessive amount of duplicated code.

(Note that the literals registered with cgen-literal must be treated as immutable, just as in the Scheme world.)
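To observe the sharing and the difference between immediate and allocated literals yourself, you can print what cgen-literal and cgen-cexpr return. This is just an exploratory sketch; the unit name "litdemo" is made up, and the exact strings printed depend on the Gauche version, so none are shown here.

```scheme
(use gauche.cgen)

;; "litdemo" is a hypothetical unit name.
(define *unit* (make <cgen-unit> :name "litdemo"))

(define results
  (parameterize ([cgen-current-unit *unit*])
    (cgen-decl "#include <gauche.h>")     ; required before cgen-literal
    (let ([lit1 (cgen-literal '(a b c))]
          [lit2 (cgen-literal '(a b c))]  ; same value registered again
          [imm  (cgen-literal 1234)])     ; an immediate value
      (list (eq? lit1 lit2)               ; is the literal object shared?
            (cgen-cexpr lit1)             ; C expression for the list
            (cgen-cexpr imm)))))          ; C expression for the fixnum

(for-each print results)
```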

Certain Scheme objects cannot be generated as a literal; for example, an opened port can’t, since it carries lots of runtime information.

(There’s a machinery to allow programmers to extend the cgen-literal behavior for new types. The API isn’t fixed yet, though.)

9.4.3 Conversions between Scheme and C

In the C world, Scheme objects are uniformly represented as an opaque tagged pointer ScmObj. In order to access the actual objects, you need to check the runtime type information and retrieve the actual C value out of it.

Stub types are the objects that bridge Scheme runtime types and C types. Since mappings between Scheme types and C types are not one-to-one, there are more stub types than there are types of either kind; for example, the Scheme <integer> type may be bridged to the C int type by the stub type <int>, but it may also be bridged to the C short type by the stub type <short>.

Do not confuse stub types and Gauche’s runtime types—stub types are meta information associated to runtime types. You can look up a stub type by its name by cgen-type-from-name. The session below shows the difference of the runtime types and stub types:

gosh> <int>
#<native-type <int>>
gosh> ,d
#<native-type <int>> is an instance of class <native-type>
slots:
  name      : <int>
  super     : #<class <integer>>
  c-type-name: "int"
  size      : 4
  alignment : 4

gosh> (cgen-type-from-name '<int>)
#<cten-type <int>>
gosh> ,d
#<cten-type <int>> is an instance of class <cgen-type>
slots:
  name      : <int>
  scheme-type: #<native-type <int>>
  c-type    : "int"
  description: "int"
  cclass    : #f
  %c-predicate: "SCM_INTEGERP"
  %unboxer  : "Scm_GetInteger"
  %boxer    : "Scm_MakeInteger"
  %maybe    : #f

gosh> <integer>
#<class <integer>>
gosh> ,d
#<class <integer>> is an instance of class <integer-meta>
slots:
  name      : <integer>
  cpl       : (#<class <integer>> #<class <rational>> #<class <real>> #<cl
  direct-supers: (#<class <rational>>)
  accessors : ()
  slots     : ()
  direct-slots: ()
  num-instance-slots: 0
  direct-subclasses: ()
  direct-methods: ()
  initargs  : ()
  defined-modules: (#<module gauche>)
  redefined : #f
  category  : builtin
  core-size : 0

gosh> (cgen-type-from-name '<integer>)
#<cten-type <integer>>
gosh> ,d
#<cten-type <integer>> is an instance of class <cgen-type>
slots:
  name      : <integer>
  scheme-type: #<class <integer>>
  c-type    : "ScmObj"
  description: "exact integer"
  cclass    : #f
  %c-predicate: "SCM_INTEGERP"
  %unboxer  : ""
  %boxer    : "SCM_OBJ_SAFE"
  %maybe    : #f

Each stub type has a C-predicate, a boxer and an unboxer, each of which is a Scheme string naming a C function or C macro. A C-predicate takes a ScmObj object and returns a C boolean value indicating whether the given object has a valid type and range for the stub type. A boxer takes a C object and converts it to a Scheme object; it usually involves wrapping or boxing the C value in a tagged pointer or object, hence the name. An unboxer does the opposite: it takes a Scheme object and converts it to a C value. The Scheme object must be checked by the C-predicate before being passed to the unboxer.

We have a few categories of stub types.

Stub types that correspond to native types (see Native types).
Stub types that correspond to C-class types. These are Scheme objects whose structure is defined in C. They can be treated as ScmObj or can be cast to the specific C type; e.g. <symbol> can be cast to ScmSymbol*. Their unboxer is ScmObj -> C-TYPE*, and their boxer is C-TYPE* -> ScmObj.
Pass-through types. These are Scheme objects that are also handled as ScmObj at the C level. Such stub types only typecheck, and their boxer and unboxer are just the identity. They can be either purely Scheme-defined objects, or objects that can take multiple representations (e.g. <integer> can be a fixnum or ScmBignum*, so the stub generator passes it through, and the C routine handles the internals).
Maybe stub types. These are denoted by a question mark suffix. In the stub context, we are only concerned with maybe types that can be unboxed into a C pointer type. In addition to the objects of the original type, a maybe type maps Scheme’s #f to C’s NULL and vice versa. For example, <port>? maps Scheme’s <port> instance to C’s ScmPort*, and Scheme’s #f to C’s NULL.

Class: <cgen-type> ¶: {gauche.cgen} An instance of this class represents a stub type. It can be looked up by name such as <const-cstring> by cgen-type-from-name.

Function: cgen-type-from-name name ¶: {gauche.cgen} Returns an instance of <cgen-type> that has name. If the name is unknown, #f is returned.

Function: cgen-box-expr cgen-type c-expr ¶

Function: cgen-unbox-expr cgen-type c-expr ¶

Function: cgen-pred-expr cgen-type c-expr ¶

{gauche.cgen} c-expr is a string that denotes a C expression. Returns a string containing a C expression that boxes, unboxes, or typechecks c-expr according to cgen-type.

;; suppose foo() returns char*
(cgen-box-expr
 (cgen-type-from-name '<const-cstring>)
 "foo()")
 ⇒ "SCM_MAKE_STR_COPYING(foo())"
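The other two procedures can be explored in the same way. The helper show-conversions below is hypothetical, as is the C expression "arg"; it prints the typecheck and unbox expressions for a given stub type name, and handles the #f that cgen-type-from-name returns for an unknown name. The exact strings printed are not shown, since they depend on the stub type definitions.

```scheme
(use gauche.cgen)

;; show-conversions is a made-up helper for this sketch.
(define (show-conversions type-name c-expr)
  (let1 type (cgen-type-from-name type-name)
    (if type
      (begin
        (print type-name ":")
        (print "  check: " (cgen-pred-expr type c-expr))
        (print "  unbox: " (cgen-unbox-expr type c-expr)))
      (print type-name ": unknown stub type"))))

(show-conversions '<int> "arg")          ; a native type
(show-conversions '<symbol> "arg")       ; a C-class type
(show-conversions '<no-such-type> "arg") ; presumably not registered
```

In generated code, the typecheck expression would normally guard the unbox expression, since the unboxer assumes an already checked object.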

9.4.4 CiSE - C in S expression

Some low-level routines in Gauche are implemented in C, but they’re written in S-expression. We call it “C in S expression”, or CiSE.

The advantage of using S-expression is its readability, obviously. Another advantage is that it allows us to write macros as S-expr to S-expr translation, just like the legacy Scheme macros. That’s a powerful feature—effectively you can extend C language to suit your needs.

The gauche.cgen.cise module provides a set of tools to convert CiSE code into C code to be passed to the C compiler. It also has some support to overcome C quirks, such as preparing forward declarations.

Currently, we don’t rigorously check CiSE; you can pass a CiSE expression to the translator that yields invalid C code, which will cause the C compiler to emit errors. The translator inserts line directives by default so that the C compiler error messages point to the location in the original (CiSE) source instead of the generated code; however, sometimes you need to look at the generated code to figure out what went wrong. We hope this will be improved in the future.

In the Gauche source code, CiSE is extensively used in precompiled Scheme files and recognized by the precompiler. However, gauche.cgen.cise is an independent module that relies only on the basic features of gauche.cgen, so you can plug it into your own C code generating programs.

9.4.4.1 CiSE overview

Before diving into the details, it’s easier to grasp some basic concepts.

A CiSE fragment is an S-expression that follows CiSE syntax (see CiSE syntax). A CiSE fragment can be translated to a C code fragment by cise-render. Note that some translation may not be local, e.g. it may want to emit forward declarations before other C code fragments. So, the full translation requires buffering—you process all the CiSE fragments and save output, emit forward declarations, then emit the saved C code fragments. We have a wrapper procedure, cise-translate, to take care of it, but for your purpose you may want to roll your own wrapper.

A CiSE macro is Scheme code that translates a CiSE fragment to another CiSE fragment. There are a number of predefined CiSE macros. You can add your own CiSE macros with utilities such as define-cise-stmt and define-cise-expr.

A CiSE ambient is a bundle of information that affects fragment translation. It contains CiSE macro definitions, and also it keeps track of forward declarations.

If you’re not sure how a cise fragment corresponds to C code, you can interactively try it:

gosh> (cise-render-to-string
        '(.struct foo (i::int c::(const char*))))
"struct foo { int i; const char* c; } "
gosh> (cise-render-to-string
        '(define-cfn foo (x::int) (return (+ x 3)))
        'toplevel)
" ScmObj foo(int x){{return ((x)+(3));}}"

(The second argument of cise-render-to-string specifies the context; see CiSE procedures, for the details.)

9.4.4.2 CiSE syntax

In this section, we list the basic CiSE syntax. CiSE forms are just data from the viewpoint of Gauche, so you can build and manipulate them like any S-expression (quasiquote comes in pretty handy).

CiSE types

C types can be written either as a symbol (e.g. int) or a list (e.g. (const char *)). When used in a definition, the type is preceded by ::. The following example shows types used in local variable definitions:

(let* ([a :: int 0]
       [b :: (const char *) "abc"])
  ...)

For convenience and readability, you can concatenate the variable name, the double colon and the type name. You can also concatenate pointer suffixes (char* instead of char * in the following example):

(let* ([a::int 0]
       [b::(const char*) "abc"])
  ...)

The CiSE translator first breaks up these concatenated forms, then deals with types.

At this moment, CiSE does not check whether a type is a valid C type. It just passes along whatever is given.

There are a few special type notations for more complex types. These can appear in the middle of a type; for example, you can write (const .struct x (a::int b::double) *) to produce const struct x {int a; double b;} *.

CiSE Type: .array elt-type (dim …) ¶

Expands to C array type, whose element type is of elt-type and dimensions are dim ….

(cise-render-to-string '(.array char (3)))
  ⇒ "char [3]"

(cise-render-to-string '(.array int (2 5)))
  ⇒ "int [2][5]"

The last element of dim can be *, which corresponds to the C type without a specified array size:

(cise-render-to-string '(.array char (*)))
  ⇒ "char []"

(cise-render-to-string '(.array int (10 *)))
  ⇒ "int [10][]"

Here’s an example of global C variable definition of array type:

(cise-render-to-string '(define-cvar params ::(.array int (PARAM_SIZE)))
                       'toplevel)
  ⇒ " int params[PARAM_SIZE];"

CiSE Type: .struct [tag] [(field-spec …)] ¶
CiSE Type: .union [tag] [(field-spec …)] ¶

CiSE Type: .function (arg-spec …) ret-type ¶

CiSE statements

CiSE Statement: begin stmt … ¶: Code grouping with { and }

CiSE Statement: let* (binding …) stmt … ¶

Define local variables and optionally assign initial values to them. Each binding takes one of the following forms:

(name [:: type] [init-expr]): Define a C variable name of type type. type should be a CiSE type. If type is omitted, the default type is ScmObj. Optional init-expr is a CiSE expression to compute the initial value of name. Note that array initialization is not supported yet.
(_ cise-form): This is to allow arbitrary CiSE statement or expression cise-form between local variable definitions. See the example below.

The (_ cise-form) binding is useful when you want to do some check between other bindings, without having nested let*:

(let* ([len::ScmSmallInt (Scm_Length lis)]
       [_ (when (< len 1) (Scm_Error "Lis is too short: %S" lis))]
       [first (SCM_CAR lis)])
  ...)

CiSE Statement: if test-expr then-stmt [else-stmt] ¶
CiSE Statement: when test-expr stmt … ¶
CiSE Statement: unless test-expr stmt … ¶
CiSE Statement: cond (cond1 stmt1 …) … [ (else else-stmt …) ] ¶: Conditional statements.

CiSE Statement: case expr ((val1 …) stmt1 …) … [ (else else-stmt …) ] ¶
CiSE Statement: case/fallthrough expr ((val1 …) stmt1 …) … [ (else else-stmt …) ] ¶: Switch-case statement. case does not fall through between ’case’ blocks while case/fallthrough does.

CiSE Statement: for (start-expr test-expr update-expr) stmt … ¶
CiSE Statement: for () stmt … ¶
CiSE Statement: loop stmt … ¶
CiSE Statement: while test-expr body … ¶: Loop statements.

CiSE Statement: for-each (lambda (var) stmt …) expr ¶
CiSE Statement: dolist [var expr] stmt … ¶: expr must yield a list. Traverse the list, binding each element to var and executing stmt …. The lambda form is a fake; you don’t really create a closure.

CiSE Statement: pair-for-each (lambda (var) stmt …) expr ¶: Like for-each, but var is bound to each ’spine’ cell instead of each element of the list.

CiSE Statement: dopairs [var expr] stmt … ¶

CiSE Statement: dotimes (var expr) stmt … ¶: expr must yield an integer, n. Repeat stmt … by binding var from 0 to (n-1).

CiSE Statement: return [expr] ¶
CiSE Statement: break ¶
CiSE Statement: continue ¶: Return, break and continue statements.

CiSE Statement: label name ¶
CiSE Statement: goto name ¶: Label and goto statements. We always add a null statement after the label so that we can place (label name) at the end of a compound statement.

CiSE Statement: .if expr stmt [stmt] ¶

CiSE Statement: .when expr stmt … ¶

CiSE Statement: .unless expr stmt … ¶

CiSE Statement: .cond clause … ¶

CiSE Statement: .define name[(arg …)] [expr] ¶

CiSE Statement: .undef name ¶

CiSE Statement: .include path ¶

Preprocessor directives.

expr could be a string, a symbol, a number or one of the following forms:

(defined c)
(not c)
(and c)
(or c)
(op c …) where op is either + or *.
(op c c …) where op is either - or /.
(op c c) where op is either >, >=, ==, <, <=, !=, logand, logior, lognot, << or >>.

Note that defining a macro function without value

#define foo(abc)

is not supported because it’s ambiguous with

#define foo abc()

when written in CiSE syntax. (.define foo (abc)) always generates the latter.

.include could take a symbol. This is used for including system header files, e.g. (.include <stdint.h>).

CiSE Statement: define-cfn name (arg [:: type] …) [ret-type [qualifier …]] stmt … ¶

Defines a C function.

If type or ret-type is omitted, the default type is ScmObj.

Supported qualifiers are :static and :inline, corresponding to C’s static and inline keywords. If :static is specified, a forward declaration is automatically generated.
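As an illustration of how the statements above combine, the following sketch renders a small function definition to a string. The function name sum_below and its body are made up for this example; the exact layout of the generated C code is not shown, since it depends on the translator.

```scheme
(use gauche.cgen)

;; sum_below is a hypothetical function: it adds up 0 .. n-1.
(define c-code
  (cise-render-to-string
   '(define-cfn sum_below (n::int) ::int
      (let* ([acc::int 0])
        (dotimes [i n]        ; i runs from 0 to n-1
          (+= acc i))
        (return acc)))
   'toplevel))                ; define-cfn is a toplevel form

(display c-code)
(newline)
```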

CiSE Statement: define-cvar name [:: type] [qualifier …] [<init-expr>] ¶: Defines a global C variable. Supported qualifier is :static. Note that array initialization is not supported yet.

CiSE Statement: define-ctype name [:: type] ¶: Defines a new type using typedef

CiSE Statement: declare-cfn name (arg [:: type] …) [ret-type] ¶
CiSE Statement: declare-cvar name [:: type] ¶: Declares an external C function or variable.

CiSE Statement: .static-decls ¶: Produce declarations of static functions before function bodies.

CiSE Statement: .raw-c-code body … ¶

CiSE expressions

CiSE Expression: + expr … ¶
CiSE Expression: - expr … ¶
CiSE Expression: * expr … ¶
CiSE Expression: / expr … ¶
CiSE Expression: % expr1 expr2 ¶: Arithmetic operations.

CiSE Expression: and expr … ¶
CiSE Expression: or expr … ¶
CiSE Expression: not expr ¶: Boolean operations.

CiSE Expression: logand expr1 expr2 … ¶
CiSE Expression: logior expr1 expr2 … ¶
CiSE Expression: logxor expr1 expr2 … ¶
CiSE Expression: lognot expr ¶
CiSE Expression: << expr1 expr2 ¶
CiSE Expression: >> expr1 expr2 ¶: Bitwise operations.

CiSE Expression: * expr ¶
CiSE Expression: -> expr1 expr2 … ¶
CiSE Expression: ref expr1 expr2 … ¶
CiSE Expression: aref expr1 expr2 … ¶
CiSE Expression: & expr ¶: Dereference, reference and address operations. -> and ref correspond to C’s -> and . member access operators, respectively; aref is array reference.

CiSE Expression: pre++ expr ¶
CiSE Expression: post++ expr ¶
CiSE Expression: pre-- expr ¶
CiSE Expression: post-- expr ¶: Pre/Post increment or decrement.

CiSE Expression: < expr1 expr2 ¶
CiSE Expression: <= expr1 expr2 ¶
CiSE Expression: > expr1 expr2 ¶
CiSE Expression: >= expr1 expr2 ¶
CiSE Expression: == expr1 expr2 ¶
CiSE Expression: != expr1 expr2 ¶: Comparison.

CiSE Expression: set! lvalue1 expr1 lvalue2 expr2 … ¶
CiSE Expression: = lvalue1 expr1 lvalue2 expr2 … ¶
CiSE Expression: += lvalue expr ¶
CiSE Expression: -= lvalue expr ¶
CiSE Expression: *= lvalue expr ¶
CiSE Expression: /= lvalue expr ¶
CiSE Expression: %= lvalue expr ¶
CiSE Expression: <<= lvalue expr ¶
CiSE Expression: >>= lvalue expr ¶
CiSE Expression: logand= lvalue expr ¶
CiSE Expression: logior= lvalue expr ¶
CiSE Expression: logxor= lvalue expr ¶: Assignment expressions.

CiSE Expression: cast type expr ¶: Type casting.

CiSE Expression: ?: test-expr then-expr else-expr ¶: Conditional expression.

CiSE Expression: .type type ¶: Useful to place a type name, e.g. an argument of sizeof operator.

CiSE Expression: new type ¶

CiSE Expression: new (type expr ...) ¶

CiSE Expression: new type (dim ...) ¶

CiSE Expression: new (type expr ...) (dim ...) ¶

C++ new operator. The first argument can be just a type name, or a constructor call. The optional second argument specifies array dimensions.

(new MyClass)              ⇒ new MyClass;
(new (MyClass a b) (1 2))  ⇒ new MyClass(a, b)[1,2];

CiSE Expression: delete expr ¶

CiSE Expression: delete () expr ¶

C++ delete operator. The second form is to delete an array.

(delete (* foo))  ⇒ delete *foo;
(delete () foo)   ⇒ delete[] foo;

9.4.4.3 CiSE procedures

Parameter: cise-ambient ¶: {gauche.cgen}

Function: cise-default-ambient ¶: {gauche.cgen}

Function: cise-ambient-copy ambient ¶: {gauche.cgen}

Function: cise-ambient-decl-strings ambient ¶: {gauche.cgen}

Parameter: cise-emit-source-line ¶: {gauche.cgen}

Function: cise-render cise-fragment :optional port context ¶: {gauche.cgen}

Function: cise-render-to-string cise-fragment :optional context ¶: {gauche.cgen}

Function: cise-render-rec cise-fragment stmt/expr env ¶: {gauche.cgen}

Function: cise-translate inp outp :key environment ¶: {gauche.cgen}

Function: cise-register-macro! name expander :optional ambient ¶: {gauche.cgen}

Function: cise-lookup-macro name :optional ambient ¶: {gauche.cgen}

Macro: define-cise-stmt name [env] clause … [:where definition …] ¶
Macro: define-cise-expr name [env] clause … [:where definition …] ¶
Macro: define-cise-toplevel name [env] clause … [:where definition …] ¶: {gauche.cgen}
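
As an illustration of the clause syntax, here is a minimal sketch of two user-defined CiSE macros. The macro names square and unless-zero are hypothetical. Each clause pairs a pattern (with _ standing for the macro name) with a body that returns another CiSE form.

```scheme
(use gauche.cgen)
(use util.match)   ; the clauses are match-style patterns

;; Hypothetical expression macro: (square x) expands to (* x x).
;; Note that x appears twice in the expansion, so it is evaluated twice in C.
(define-cise-expr square
  [(_ x) `(* ,x ,x)])

;; Hypothetical statement macro built on the `when' statement.
(define-cise-stmt unless-zero
  [(_ var . body) `(when (!= ,var 0) ,@body)])

(define square-c
  (cise-render-to-string '(square n) 'expr))
(define guard-c
  (cise-render-to-string '(unless-zero n (set! m (square n))) 'stmt))

(print square-c)
(print guard-c)
```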

Macro: define-cise-macro (name form env) body … ¶
Macro: define-cise-macro name name2 ¶: {gauche.cgen}

9.4.5 Stub generation

Stub Form: define-stub-type NAME C-TYPE [DESC C-PREDICATE UNBOXER BOXER] ¶: Registers a new type to be recognized. This is a declaration rather than a definition; no C code is generated directly by this form.

Stub Form: define-cproc name (args …) [ret-type] [flag …] [qualifier …] stmt … ¶

Creates a Scheme procedure.

args specifies arguments:

arg … [:rest var] : Each arg is a variable name or var::type, and specifies a required argument. If :rest is given, a list of the excess arguments is passed to var.
arg … :optional spec … [:rest rest-var] : Optional arguments. spec is var or (var default). If no default is given, var receives SCM_UNBOUND; an error is raised if var isn’t of type ScmObj.
arg … :key spec … [:allow-other-keys [:rest rest-var]] : Keyword arguments. spec is var or (var default). If no default is given, var receives SCM_UNBOUND; an error is raised if var isn’t of type ScmObj.
arg … :optarray (var cnt max) [:rest rest-var] : A special syntax to receive optional arguments as a C array. var is a C variable of type ScmObj*. cnt is a C variable of type int, which receives the number of optional arguments in the ScmObj array. max specifies the maximum number of optional arguments that can be passed in the array form. If more than max args are given, a list of the excess arguments is passed to rest-var if it is specified.

ret-type specifies the return type of the function. It can be either :: typespec or ::typespec where typespec is a valid stub type, or (type …) when multiple values are returned. When omitted, the procedure is assumed to return <top>.

flag is a keyword to modify some aspects of the procedure. Supported flags are as follows:

:fast-flonum - indicates that the procedure accepts flonum arguments and won’t retain references to them. The VM can pass flonums on VM registers to the procedure with this flag. (This improves floating-point number handling, but its behavior is highly VM-specific; ordinary stub writers shouldn’t need to care about this flag at all.)
:constant - indicates that this procedure returns a constant value if all args are compile-time constants. The compiler may replace the call to this proc with the value, if it determines all arguments are known at the compile time. The resulting value should be serializable to the precompiled file.
NB: Since this procedure may be called at compile time, a subr that may return a different value for batch/cross compilation shouldn’t have this flag.

qualifier is a list that adds auxiliary information to the procedure. Currently the following qualifiers are officially supported.

(setter setter-name) : specifies the setter. setter-name should be a cproc name defined in the same stub file.
(setter (args …) body …) : specifies the setter anonymously.
(catch (decl c-stmt …) …) : when writing a stub for a C++ function that may throw an exception, use this spec to ensure the exception is caught and converted to a Gauche error condition.
(inliner insn-name) : only used in Gauche core procedures that can be inlined into a VM instruction.

stmt is a cise expression. Inside the expression, a cise macro (result expr …) can be used to assign the value(s) to return from the cproc. As a special case, if stmt is a single symbol, it names a C function to be called with the same arguments (modulo unboxing) as the cproc.
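
The following stub-file fragment is a minimal sketch of how these pieces fit together. All procedure names are hypothetical, and the last form assumes that <math.h> has been included elsewhere in the stub file (for instance through declcode). The fragment is input for the stub generator, not a program to run directly in gosh.

```scheme
;; Two required, typed arguments and a typed return value.
(define-cproc add-ints (a::<int> b::<int>) ::<int>
  (result (+ a b)))

;; An optional argument with a default value.
(define-cproc scale-int (x::<int> :optional (factor::<int> 2)) ::<int>
  (result (* x factor)))

;; A :rest argument receives the remaining arguments as a list.
(define-cproc count-args (first :rest others) ::<int>
  (result (+ 1 (Scm_Length others))))

;; A single symbol as the body names the C function to call
;; with the unboxed argument; here, the C library's sin().
(define-cproc my-sin (x::<double>) ::<double> :constant
  sin)
```

Untyped arguments such as first and others above are passed as ScmObj, which is why Scm_Length can be applied to others directly.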

Stub Form: define-cgeneric name c-name property-clause … ¶

Defines a generic function. c-name specifies a C variable name that keeps the generic function structure. One or more of the following clauses can appear in property-clause …:

(extern) : makes c-name visible from other files (i.e. do not define the structure as static).
(fallback "fallback") : specifies the fallback function.
(setter . setter-spec) : specifies the setter.
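
A minimal sketch of a define-cgeneric form, assuming c-name is given as a string; describe-shape and Shape_DescribeGeneric are hypothetical names chosen for illustration.

```scheme
;; Defines the generic function describe-shape.  The generic function
;; structure is kept in the C variable Shape_DescribeGeneric, which the
;; (extern) clause makes visible from other C files.
(define-cgeneric describe-shape "Shape_DescribeGeneric"
  (extern))
```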

Stub Form: define-cmethod name (arg …) body … ¶

Stub Form: define-cclass scheme-name [qualifier …] c-type-name c-class-name cpa (slot-spec …) property … ¶

Generates a C stub for a static class definition, slot accessors and initialization. The corresponding C struct has to be defined elsewhere.

The following qualifiers are supported:

:base generates a base class definition (inheritable from Scheme code).
:built-in generates a built-in class definition (not inheritable from Scheme code). This is the default if neither :base nor :built-in is specified.
:private generates the class declaration and standard macro definitions as well. (These would need to be in a separate header file if you want the C-level structure to be used from other C code. If the extension is small enough to be contained in one C file, this option is convenient.)

cpa lists ancestor classes in precedence order. They need to be C identifiers of Scheme class Scm_*Class, for the time being. Scm_TopClass is added at the end automatically.

slot-spec is defined as (slot-name [qualifier …]) or slot-name. The following qualifiers are supported:

:type cgen-type
:c-name c-name specifies the C field name if the autogenerated name from slot-name is not accurate.
:c-spec c-spec
:getter proc-spec specifies how to create the slot getter. proc-spec could be
- #f to omit the getter
- #t to generate a default one with type conversion according to type
- A string is interpreted as the C code to implement the getter
- (c c-name) specifies the C function name that implements the getter, which is implemented elsewhere.
:setter proc-spec specifies how to create the slot setter. The syntax is the same as :getter.

The following properties are supported:

(allocator proc-spec)
(printer proc-spec)
(comparer proc-spec)
(direct-supers string …)
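
As a rough sketch of how the pieces of define-cclass line up, consider a hypothetical two-slot class. ScmPoint, Scm_PointClass, point_allocate and point_print are assumptions made for this example and would have to be defined in C elsewhere; the ycoord field name is likewise invented.

```scheme
(define-cclass <point>
  "ScmPoint*" "Scm_PointClass"
  ()                               ; cpa: Scm_TopClass is added automatically
  ((x :type <double>)              ; default getter and setter
   (y :type <double>
      :c-name "ycoord"             ; C field name differs from the slot name
      :setter #f))                 ; no setter is generated for this slot
  (allocator (c "point_allocate"))
  (printer   (c "point_print")))
```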

Stub Form: define-cptr scheme-name [qualifier …] c-type c-name c-pred c-boxer c-unboxer [(flags flag …)] [(print print-proc)] [(cleanup cleanup-proc)] ¶

Defines a new foreign pointer class based on <foreign-pointer>. It is suitable when the C structure is mostly passed around using pointers; most typically, when the foreign library allocates the structure and returns the pointer to the Scheme world.

scheme-name is a Scheme variable name. This will be bound to a newly-created subclass of <foreign-pointer> to represent this C-ptr type.

c-type is the type of the C pointer we wrap.

c-name is the C variable name (of type ScmClass *). In initialization code, an instance of a class (the same one bound to scheme-name in the Scheme world) will be stored in this C variable.

c-pred is a macro name to determine if a ScmObj is of this type. c-boxer is a macro name to wrap a C pointer and return a ScmObj. c-unboxer is a macro name to extract a C pointer from a ScmObj.

The only supported qualifier is :private, which will generate c-pred, c-boxer and c-unboxer definitions automatically. Otherwise those definitions must be provided elsewhere.

The two supported flags are

:keep-identity (which is SCM_FOREIGN_POINTER_KEEP_IDENTITY in the C world) keeps a weak hash table that maps the wrapped C pointer to the wrapping ScmObj, so Scm_MakeForeignPointer (i.e. c-boxer when :private is used) returns an eq? object if the same C pointer is given.
This incurs some overhead, but the cleanup procedure can safely free the foreign object without worrying whether another ScmObj points to the same C pointer.

Do not use this flag if the C pointer is also allocated by GC_malloc. The hash table is weak only for its values, so the C pointer wouldn’t be GCed.
:map-null (which is SCM_FOREIGN_POINTER_MAP_NULL in the C world) makes Scm_MakeForeignPointer (i.e. c-boxer when :private is used) return SCM_FALSE when the C pointer is NULL.

Stub Form: define-symbol scheme-name [c-name] ¶: Defines a Scheme symbol. No Scheme binding is created. When c-name is given, the named C variable points to the created ScmSymbol.

Stub Form: define-variable scheme-name initializer ¶: Defines a Scheme variable.

Stub Form: define-constant scheme-name initializer ¶: Defines a Scheme constant.

Stub Form: define-enum name ¶: A define-constant specialized for enum values. This is useful for exporting C enums to Scheme.

Stub Form: define-enum-conditionally name ¶: Abbreviation of (if "defined(name)" (define-enum name))
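
For example, a stub file wrapping parts of the C library might export a few constants as follows. This is only a sketch; which names are worth exporting depends on the library being wrapped.

```scheme
;; SEEK_SET is always available from <stdio.h>, so export it unconditionally.
(define-enum SEEK_SET)

;; SIGWINCH is not defined on every platform, so export it only if the
;; C preprocessor sees a definition.  Per the description above, this
;; abbreviates (if "defined(SIGWINCH)" (define-enum SIGWINCH)).
(define-enum-conditionally SIGWINCH)
```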

Stub Form: define-cise-stmt name clause … ¶
Stub Form: define-cise-expr name clause … ¶
Stub Form: define-cfn … ¶
Stub Form: declare-cfn … ¶
Stub Form: define-cvar … ¶
Stub Form: declare-cvar … ¶
Stub Form: define-ctype … ¶
Stub Form: .define … ¶
Stub Form: .if … ¶
Stub Form: .include … ¶
Stub Form: .undef … ¶
Stub Form: .unless … ¶
Stub Form: .when … ¶: Cise macro definitions (see CiSE - C in S expression).

Stub Form: initcode c-code ¶: Inserts c-code literally in the initialization function.

Stub Form: declcode stmt … ¶: Inserts declaration code. stmt is usually .include or another preprocessor statement, but it can also be a string, which is treated as a C fragment.

Stub Form: begin form … ¶: Treats each form as if it were a toplevel stub form.
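
A minimal sketch combining these forms in one stub file follows. The variable call_count, the procedures my-hypot and call-count, and the feature macro HAVE_HYPOT are all hypothetical, and the .if condition is assumed to be accepted as a string.

```scheme
;; Declarations placed before the generated definitions.
(declcode (.include <math.h>)
          "static int call_count;")

;; Emitted literally into the initialization function.
(initcode "call_count = 0;")

;; Group several toplevel forms under one preprocessor condition.
(.if "defined(HAVE_HYPOT)"
  (begin
    (define-cproc my-hypot (x::<double> y::<double>) ::<double>
      (post++ call_count)
      (result (hypot x y)))
    (define-cproc call-count () ::<int>
      (result call_count))))
```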

Stub Form: if test then-stmt [else-stmt] ¶
Stub Form: when test stmt ¶: Deprecated. Please use .if and .when instead.

Stub Form: include file ¶: Includes and evaluates another stub file.
