Lexical structure (Gauche Users’ Reference)

4.1 Lexical structure

Gauche extends the R7RS Scheme lexical syntax in several ways. In addition, for historical reasons, a few of the default lexical conventions conflict with the R7RS specification. You can set a reader mode to make the reader R7RS compliant.

Hash-bang directives

Tokens beginning with #! may have special meanings to the reader. R7RS defines two such directives, #!fold-case and #!no-fold-case, which switch the reader to case-folding and non-case-folding mode for symbols, respectively.

See Hash-bang token below for all the directives Gauche supports.

Square brackets

Gauche adopts the R6RS syntax that treats [] the same as (). The two kinds of brackets are equivalent, but an opening bracket must be closed by a bracket of the same kind. Some seasoned Lispers may frown on them, but they help visually distinguish the different roles of parentheses.

A general convention is to use [] for groupings other than function and macro application. If such groupings nest, however, use () for the outer groupings. Examples:

(cond [(test1 x) (y z)]
      [(test2 x) (s t)]
      [else (u v)])

(let ([x (foo a b)]
      [y (bar c d)])
  (baz x y))

It is purely optional, so you don’t need to use them if you don’t like them. R7RS doesn’t adopt this syntax and leaves [] for extensions, so it is safe to stick to () in portable R7RS programs. (If the reader is in strict-r7 mode, an error is signalled when [] is used. See Reader lexical mode, for the details.)

Scheme-specific modes of some editors (e.g. Quack on Emacs) allow you to type just ) and insert either ] or ) depending on which kind of parenthesis is being closed. We recommend using such a mode if you follow this convention.

Symbol names

Symbol names are case sensitive by default (see Case-sensitivity). A symbol name can begin with a digit, ’+’ or ’-’, as long as the entire token does not form valid number syntax. Other unusual characters can be included in a symbol name by surrounding it with ’|’, e.g. ’|this is a symbol|’. See Symbols, for details.
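
The rule above can be checked at the REPL with a small sketch like the following. The variable names are made up for illustration; only standard predicates are used.

```scheme
;; A token that starts with a digit or a sign is a symbol unless the
;; whole token parses as a number.
(define digit-led  '1+)                  ; not valid number syntax: a symbol
(define sign-led   '-x)                  ; likewise a symbol
(define plain-num  '-5)                  ; valid number syntax: the integer -5
(define bar-quoted '|this is a symbol|)  ; spaces are allowed inside |...|

(print (symbol? digit-led))
(print (symbol? sign-led))
(print (number? plain-num))
(print (symbol->string bar-quoted))
```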

Numeric literals

Either the integral part or the fractional part of an inexact real number can be omitted if it is zero; for example, 30., .25, and -.4 are read as real numbers. The number reader recognizes ’#’ as a placeholder for insignificant digits. Complex numbers can be written both in the rectangular format (e.g. 1+0.3i) and in the polar format (e.g. 3.0@1.57). Inexact real numbers include the positive infinity, the negative infinity, and NaN, which are represented as +inf.0, -inf.0 and +nan.0, respectively. (-nan.0 is also read as NaN.)
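
As a rough illustration of the notations above, the sketch below reads a few such literals and inspects them with standard numeric predicates. The variable names are hypothetical, and no printed output is claimed for the complex values because it depends on floating-point rounding.

```scheme
;; Literals with an omitted integral or fractional part.
(define reals (list 30. .25 -.4))   ; same values as 30.0 0.25 -0.4

;; Rectangular and polar complex literals.
(define z-rect  1+0.3i)
(define z-polar 3.0@1.57)

(print (map inexact? reals))
(print (real-part z-rect) " " (imag-part z-rect))
(print (magnitude z-polar) " " (angle z-polar))

;; Infinities and NaN; -nan.0 is read as NaN as well.
(print (infinite? +inf.0) " " (infinite? -inf.0))
(print (nan? +nan.0) " " (nan? -nan.0))
```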

Gauche supports SRFI-169 (underscores in numbers) notation; you can insert a character _ between digits in numeric literals to improve readability, e.g. #b1100_1010_1111_1110. A valid underscore must be surrounded by digits; 1_2_3 is ok, but _123, 123_, and 12__3 are not.

Gauche also adopts Common Lisp style radix-prefixed numeric literals, e.g. #3r120 (120 in base 3, which is 15 in decimal). Radixes from 2 to 36 are recognized; the letters a-z and A-Z are used as digits beyond 9.

For the polar notation of complex numbers, Gauche allows the suffix pi to denote the phase in multiples of pi. Scheme syntax uses radians for the phase, but a floating-point number can only approximate pi, so angles other than zero cannot produce exactly round results.
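
The underscore and radix-prefix notations can be combined with the usual literals, as in this small sketch. It assumes a Gauche version recent enough to support the SRFI-169 notation; the variable names are only for illustration.

```scheme
;; Underscores between digits are ignored by the number reader.
(define cafe    #b1100_1010_1111_1110)  ; 51966, the same value as #xcafe
(define grouped 1_2_3)                  ; 123

;; Common Lisp style radix prefixes: #<radix>r<digits>.
(define base3  #3r120)                  ; 1*9 + 2*3 + 0 = 15
(define base16 #16rff)                  ; 255
(define base36 #36rz)                   ; z is the digit 35

(print (list cafe grouped base3 base16 base36))
```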

gosh> 2@3.141592653589793
-2.0+2.4492935982947064e-16i

With the pi suffix, you can get round numbers.

gosh> 2@1pi
-2.0
gosh> 2@0.5pi
0.0+2.0i
gosh> 2@-0.5pi
0.0-2.0i

Hex character escapes

You can denote a character by the hexadecimal notation of its character code in some literals: specifically, character literals, character set literals, string literals, symbols, and regular expression literals.

R7RS adopted the hex escape notation \xNNNN; for strings and symbols surrounded by vertical bars, and #\xNNNN for characters. The number of digits is variable, and the character code is a Unicode codepoint.
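
A minimal sketch of the R7RS notation, assuming the default reader mode. The variable names are hypothetical; U+0041 is the letter A and U+03BB is the Greek small letter lambda.

```scheme
(define ch  #\x41)         ; character literal: U+0041
(define str "\x41;BC")     ; string literal: the escape ends at the semicolon
(define sym '|\x41;bc|)    ; symbol surrounded by vertical bars
(define lam "\x3bb;")      ; the number of hex digits is variable

(print (char=? ch #\A))
(print (string=? str "ABC"))
(print (eq? sym 'Abc))
(print (string-length lam))
```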

Gauche had been using two types of escapes, \u and \x. In general, u is for a Unicode codepoint, while x is for the character code in the internal encoding. Besides, except in character literals, a fixed number of digits was used instead of the terminator ; as in R7RS.

Since 0.9.4, a \x-escape is interpreted as an R7RS hex escape whenever it forms a valid one; if it does not, the reader tries to interpret it as a legacy Gauche hex escape.

Although rare, there are cases that can be interpreted in both the R7RS syntax and the legacy Gauche syntax, yielding different characters. Reading legacy files with such literals in the current Gauche may cause unexpected behavior. You can switch the reader mode so that it becomes backward compatible. See Reader lexical mode, for the details.

Extended sharp syntax

Many more special tokens beginning with ’#’ are defined. See the table below.

4.1.1 Sharp syntax

The table below lists sharp-syntaxes.

`#!`	[R6RS][R7RS][SRFI-22] Either the beginning of an interpreter line (shebang) of a script, or a special token that affects the mode of the reader. See the Hash-bang token section below.
`#"`	Introduces an interpolated string. See String interpolation.
`##`, `#$`, `#%`, `#&`, `#'`	Unused.
`#(`	[R7RS] Introduces a vector.
`#)`	Unused.
`#*`	Bitvector or an incomplete string. See Strings.
`#+`	Unused.
`#,`	[SRFI-10] Introduces reader constructor syntax.
`#-`, `#.`	Unused.
`#/`	Introduces a literal regular expression. See Regular expressions.
`#0` … `#9`	`#n#`, `#n=`: [SRFI-38] Shared substructure definition and reference. `#nR`, `#nr`: Radix prefixed numeric literals.
`#:`	Uninterned symbol. See Symbols.
`#;`	[SRFI-62] S-expression comment. Reads the next S-expression and discards it.
`#<`	Introduces an unreadable object.
`#=`, `#>`	Unused.
`#?`	Introduces debug macros. See Debugging.
`#@`	Unused.
`#a`	Unused.
`#b`	[R7RS] Binary number prefix.
`#c`	Unused.
`#d`	[R7RS] Decimal number prefix.
`#e`	[R7RS] Exact number prefix.
`#f`	[R7RS] Boolean false, or introducing R7RS uniform vector. See Uniform vectors. R7RS defines both `#f` and `#false` as a boolean false value.
`#g`, `#h`	Unused.
`#i`	[R7RS] Inexact number prefix.
`#j`, `#k`, `#l`, `#m`, `#n`	Unused.
`#o`	[R7RS] Octal number prefix.
`#p`, `#q`, `#r`	Unused.
`#s`	[R7RS vector.@] introducing R7RS uniform vector. See Uniform vectors.
`#t`	[R7RS] Boolean true. R7RS defines `#t` and `#true` as a boolean true value.
`#u`	[R7RS vector.@] introducing R7RS uniform vector. See Uniform vectors. R7RS uses the `#u8` prefix for bytevectors, which are compatible with `u8` uniform vectors.
`#v`, `#w`	Unused.
`#x`	[R7RS] Hexadecimal number prefix.
`#y`, `#z`	Unused.
`#[`	Introduces a literal character set. See Character Sets.
`#\`	[R7RS] Introduces a literal character. See Characters.
`#]`, `#^`, `#_`	Unused.
``#` ``	Legacy syntax for string interpolation, superseded by `#"`.
`#{`	Unused.
`#|`	[SRFI-30] Introduces a block comment. The comment ends at the matching `|#`.
`#}`, `#~`	Unused.
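
A few of the entries in the table can be seen together in the following sketch. The variable names are hypothetical, and loading gauche.uvector is an assumption made so that u8vector-ref is available regardless of the Gauche version.

```scheme
(use gauche.uvector)                       ; for u8vector-ref (assumed needed on older versions)

(define vec   #(1 2 3))                    ; #( introduces a vector
(define bytes #u8(1 2 255))                ; #u introduces a uniform vector
(define kept  (list 1 #;(dropped) 2))      ; #; discards the next datum
#| A block comment: the reader skips everything
   up to the matching closing delimiter. |#
(define m     (rxmatch #/\d+/ "abc123"))   ; #/ introduces a regexp literal
(define lower #[a-z])                      ; #[ introduces a character set
(define nums  (list #b101 #o17 #xff #e1.5 #i1/4))  ; numeric prefixes

(print (vector-ref vec 0) " " (u8vector-ref bytes 2))
(print kept)
(print (rxmatch-substring m))
(print (char-set-contains? lower #\q))
(print nums)
```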

4.1.2 Hash-bang token

The character sequence #! has two completely different meanings, depending on how and where it occurs.

If a file begins with #!/ or with #! followed by a space (hash, bang, and a space), the reader assumes it is an interpreter line (shebang) of a script and ignores the remaining characters up to the end of the line. (Actually the source doesn’t need to be a file. The reader checks whether it is at the beginning of a port.)

In all other cases, #!identifier is read as a token with a special meaning. Such a token can be a special directive for the reader instead of being read as a datum.
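
For instance, a script file might look like the sketch below. The interpreter path is an assumption about the local installation, and the body simply follows the SRFI-22 convention of defining main; the first line is skipped only because it sits at the very beginning of the source.

```scheme
#!/usr/bin/env gosh
;; The line above is treated as an interpreter line and ignored by the reader.
(define (main args)
  ;; args holds the script name followed by the command-line arguments.
  (print "arguments: " (cdr args))
  0)   ; exit status
```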

By default, the following tokens are recognized.

#!fold-case
#!no-fold-case: Switches the reader’s case sensitivity; #!fold-case makes the reader case insensitive, and #!no-fold-case makes it case sensitive. (Also see Case-sensitivity).
#!r6rs: This token was introduced in R6RS and indicates that the program strictly conforms to R6RS. Gauche doesn’t conform to R6RS; currently it just issues a warning when it sees the #!r6rs token and keeps reading.
#!r7rs: Puts the reader in strict-r7 mode, which complies with R7RS. See Reader lexical mode, for the details.
#!gauche-legacy: Puts the reader in legacy mode, which is compatible with Gauche 0.9.3 and before. See Reader lexical mode, for the details.
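
The effect of the case-folding directives can be observed with a short sketch such as this one. The variable names are made up; the directives apply to whatever the reader reads after them from the same source.

```scheme
#!fold-case
;; While case folding is on, Hello and HELLO are both read as hello.
(define folded-same? (eq? 'Hello 'HELLO))      ; #t

#!no-fold-case
;; Back in case-sensitive mode, the two tokens name distinct symbols.
(define sensitive-same? (eq? 'Hello 'HELLO))   ; #f

(print folded-same? " " sensitive-same?)
```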