For Gauche 0.9.5



2.3 Multibyte scripts

You can use characters other than us-ascii not only in literal strings and characters, but also in comments, symbol names, literal regular expressions, and so on.

By default, Gauche assumes a Scheme program is written in its internal character encoding. This is fine as long as you’re writing scripts for use in your own environment, but it becomes a problem if somebody else tries to use your script and finds that you’re using a different character encoding than theirs.

So, if Gauche finds a comment like the following within the first two lines of the program source, it assumes the rest of the source code is written in <encoding-name>, and performs the appropriate character encoding conversion when reading the source code:

;; coding: <encoding-name>

More precisely, a comment in either the first or the second line that matches the regular expression #/coding[:=]\s*([\w.-]+)/ is recognized, and the first submatch is taken as the encoding name. If there are multiple matches, only the first one is effective. The first two lines must not contain characters other than us-ascii in order for this mechanism to work.
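
The recognition rule can be illustrated with a short sketch. This is not Gauche's actual implementation (which lives in the coding-aware port); detect-coding is a hypothetical helper name, and treating everything after the first semicolon on a line as the comment is a simplifying assumption made for this example.

```scheme
;; Hypothetical helper, for illustration only.
;; Returns the encoding name declared in the first two lines of
;; SOURCE (a string), or #f if no declaration is found there.
(define (detect-coding source)
  (call-with-input-string source
    (lambda (in)
      (let loop ((n 0))
        (if (>= n 2)
            #f                        ; only the first two lines are examined
            (let ((line (read-line in)))
              (cond
               ((eof-object? line) #f)
               ;; Assumption: the comment is whatever follows the first ";".
               ((and-let* ((comment (string-scan line ";" 'after))
                           (m (rxmatch #/coding[:=]\s*([\w.-]+)/ comment)))
                  (rxmatch-substring m 1)))   ; first submatch = encoding name
               (else (loop (+ n 1))))))))))

(detect-coding "#!/usr/bin/gosh\n;; -*- coding: euc-jp -*-\n")  ; => "euc-jp"
```

Because the loop stops after two lines, a declaration placed on the third line or later is ignored by this sketch, mirroring the rule stated above.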

The following example tells Gauche that the script is written in the EUC-JP encoding. Note that the "-*-" markers around the coding declaration are recognized by Emacs, which uses them to select the buffer’s encoding appropriately.

#!/usr/bin/gosh
;; -*- coding: euc-jp -*-

... script written in euc-jp ...

Internally, the handling of this magic comment is done by a special type of port. See Coding-aware ports for the details. See also Loading Scheme file for how to disable this feature.

