Multibyte scripts (Gauche Users’ Reference)

2.3 Multibyte scripts

You can use characters other than us-ascii not only in literal strings and characters, but also in comments, symbol names, literal regular expressions, and so on.

By default, Gauche assumes a Scheme program is written in utf-8. If you need to write source code in an encoding other than utf-8, you can add a “magic comment” near the beginning of the source file.

When Gauche finds a comment like the following within the first two lines of the program source, it assumes the rest of the source code is written in <encoding-name> and performs the appropriate character encoding conversion when reading it:

;; coding: <encoding-name>

More precisely, a comment on either the first or the second line that matches the regular expression #/coding[:=]\s*([\w.-]+)/ is recognized, and the first submatch is taken as the encoding name. If there are multiple matches, only the first one takes effect. For this mechanism to work, the first two lines must not contain characters other than us-ascii.
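As a rough illustration of the matching rule, the following sketch applies the same regular expression to a single line and extracts the first submatch. The procedure name detect-coding is hypothetical and is not part of Gauche. It only shows how the pattern behaves and does not check that the match occurs inside a comment, which the real mechanism requires.

```scheme
;; Hypothetical helper for illustration only; not part of Gauche's API.
;; Returns the encoding name declared on LINE, or #f if the
;; pattern does not match.
(define (detect-coding line)
  (cond [(#/coding[:=]\s*([\w.-]+)/ line)
         ;; The first submatch holds the encoding name.
         => (lambda (m) (rxmatch-substring m 1))]
        [else #f]))

(detect-coding ";; -*- coding: euc-jp -*-")  ; => "euc-jp"
(detect-coding ";; coding=shift_jis")        ; => "shift_jis"
(detect-coding "#!/usr/bin/gosh")            ; => #f
```

Both ":" and "=" are accepted as separators, and the name stops at the first character that is not a word character, a period, or a hyphen.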

The following example tells Gauche that the script is written in the EUC-JP encoding. Note that the "-*-" strings around the coding declaration are recognized by Emacs, which uses them to select the buffer’s encoding appropriately.

#!/usr/bin/gosh
;; -*- coding: euc-jp -*-

... script written in euc-jp ...

Internally, the handling of this magic comment is done by a special type of port. See Coding-aware ports for the details. See also Loading Scheme file for how to disable this feature.