Next: Character Sets, Previous: Keywords, Up: Core library [Contents][Index]
#\charname
[R7RS+] Denotes a literal character.
When the reader reads #\, it fetches the next character.
If that character is one of ( ) [ ] { } " \ | ; # or a space, the literal denotes that character itself.
Otherwise, the reader keeps reading characters until it sees a character that is not a word constituent.
If only one character was read, the literal denotes that character. Otherwise, the reader matches the characters read
against the predefined character names, and signals an error if none of them matches.
The following character names are recognized. These character names are case insensitive.
space
Space character (ASCII #x20)
newline, nl, lf
Newline (ASCII #x0a)
return, cr
Carriage return (ASCII #x0d)
tab, ht
Horizontal tab (ASCII #x09)
page
Form feed (ASCII #x0c)
alarm
Bell (ASCII #x07)
backspace
Backspace (ASCII #x08)
escape, esc
Escape (ASCII #x1b)
delete, del
Delete (ASCII #x7f)
null
NUL character (ASCII #x00)
xN
A character whose Unicode codepoint is the integer N, where N is a hexadecimal integer. This is R7RS lexical syntax (see the compatibility note below).
uN
A character whose Unicode codepoint is the integer N, where N is a 4-digit or 8-digit hexadecimal number.
This is legacy Gauche lexical syntax; use the #\xN syntax in new code (see the compatibility note below).
#\newline ⇒ #\newline ; newline character
#\x0a ⇒ #\newline ; ditto
#\x41 ⇒ #\A ; ASCII letter 'A'
#\x3042 ⇒ #\あ ; Hiragana letter A
#\x2a6b2 ⇒ the character U+2A6B2 ; JISX0213 Kanji 2-94-86
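As a quick check of the rules above, the following sketch confirms that the name aliases and #\xN escapes read as the expected characters. It assumes a Gauche build with the default (R7RS-compatible) reader mode; the variable alias-pairs is purely illustrative.

```scheme
;; Aliases and canonical names denote the same character
;; (names are case insensitive, so #\LF is accepted as well).
(define alias-pairs
  (list (cons #\nl #\newline)
        (cons #\LF #\newline)
        (cons #\ht #\tab)
        (cons #\esc #\escape)
        (cons #\del #\delete)))

(for-each (lambda (p)
            (unless (eqv? (car p) (cdr p))
              (error "literal mismatch:" p)))
          alias-pairs)

;; #\xN gives the codepoint in hex; ASCII codes are the same
;; in every internal encoding Gauche supports.
(display (char->integer #\space))   ; prints 32
(newline)
(display (eqv? #\x41 #\A))          ; prints #t
(newline)
```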
Compatibility note:
Before 0.9.4, the #\xNN syntax used Gauche's internal character encoding
rather than the Unicode codepoint. The two are the same if Gauche is
compiled with the internal encoding utf-8 or none (with none, only
characters up to U+00ff are supported, and in this range the characters
coincide with Unicode characters). If Gauche is compiled with the encoding
euc-jp or sjis, the meaning of #\xNN beyond the ASCII range differs from
0.9.3.3 and earlier.
If you set the reader mode to legacy (see Reader lexical mode), #\xNN is
read as before, preserving compatibility (but then it isn't compatible with
R7RS). Alternatively, you can use #\uNNNN, or the character itself, to
make the code work in both new and old versions of Gauche.
[R7RS base] Returns #t if obj is a character, #f otherwise.
[R7RS base] Compares characters. The comparison is done on the characters' codes in the internal character encoding.
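A minimal sketch of char? and the comparison procedures. It relies only on R7RS behaviour and on ASCII characters, whose codes are the same in every internal encoding Gauche supports.

```scheme
;; char? tells characters apart from one-character strings.
(display (char? #\a))            ; prints #t
(newline)
(display (char? "a"))            ; prints #f
(newline)

;; As in R7RS, the comparisons take two or more arguments.
(display (char<? #\a #\b #\c))   ; prints #t
(newline)
(display (char=? #\a #\a #\b))   ; prints #f
(newline)

;; Order follows the internal codes: upper-case ASCII letters (65-90)
;; sort before lower-case ones (97-122).
(display (char<? #\Z #\a))       ; prints #t
(newline)
```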
[R7RS char]
Compares characters in a case-insensitive way.
The comparison is done on the internal character codes of the case-folded
characters; see char-foldcase below.
In R7RS, these procedures are in the (scheme char) library.
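The difference from the plain comparisons shows up when case changes the order. This sketch uses only ASCII letters, so the results do not depend on the internal encoding.

```scheme
;; Plain comparison uses the raw codes: #\B (66) < #\a (97).
(display (char<? #\B #\a))       ; prints #t
(newline)
;; Case-insensitive comparison folds first, so it compares b with a.
(display (char-ci<? #\B #\a))    ; prints #f
(newline)
(display (char-ci=? #\q #\Q))    ; prints #t
(newline)
```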
[R7RS char][SRFI-129]
Returns true if the character char is an alphabetic character
(Unicode general category Lu, Ll, Lt, Lm, Lo, or Nl),
a numeric character (Nd),
a whitespace character (Zs, Zp, or Zl),
an upper case character (Lu),
a lower case character (Ll),
or a title case character (Lt), respectively.
In R7RS, all of these procedures except char-title-case? are in the
(scheme char) library, while char-title-case? is defined in SRFI-129.
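The predicates combine naturally into a classifier. In this sketch, classify is a hypothetical helper written for illustration, not part of Gauche; the order of the cond clauses matters because upper and lower case letters are also alphabetic.

```scheme
;; Hypothetical helper: classify a character using the predicates above.
(define (classify c)
  (cond ((char-upper-case? c) 'upper)
        ((char-lower-case? c) 'lower)
        ((char-alphabetic? c) 'other-alphabetic)  ; e.g. Lo characters
        ((char-numeric? c) 'numeric)
        ((char-whitespace? c) 'whitespace)
        (else 'other)))

(display (map classify (list #\A #\z #\7 #\space #\!)))
(newline)   ; prints (upper lower numeric whitespace other)
```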
Returns #t iff char is a character that R7RS allows in an identifier name
without escaping: ASCII alphanumeric characters, the extended alphabetic
characters (! $ % & * + - . / : < = > ? ^ _ ~), any character whose
general category is Lu, Ll, Lt, Lm, Lo, Mn, Mc, Me, Nd, Nl, No, Pd, Pc, Po,
Sc, Sm, Sk, So, or Co, Zero Width Non-Joiner (U+200C), and Zero Width
Joiner (U+200D).
[R6RS] Returns one of the following symbols, representing the Unicode general category of char.
Symbol | General category |
Cc | Other, Control |
Cf | Other, Format |
Cn | Other, Not Assigned |
Co | Other, Private Use |
Cs | Other, Surrogate |
Ll | Letter, Lowercase |
Lm | Letter, Modifier |
Lo | Letter, Other |
Lt | Letter, Titlecase |
Lu | Letter, Uppercase |
Mc | Mark, Spacing Combining |
Me | Mark, Enclosing |
Mn | Mark, Nonspacing |
Nd | Number, Decimal Digit |
Nl | Number, Letter |
No | Number, Other |
Pc | Punctuation, Connector |
Pd | Punctuation, Dash |
Pe | Punctuation, Close |
Pf | Punctuation, Final quote |
Pi | Punctuation, Initial quote |
Po | Punctuation, Other |
Ps | Punctuation, Open |
Sc | Symbol, Currency |
Sk | Symbol, Modifier |
Sm | Symbol, Math |
So | Symbol, Other |
Zl | Separator, Line |
Zp | Separator, Paragraph |
Zs | Separator, Space |
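A short sketch mapping a few ASCII characters to their category symbols. It assumes that the procedure documented in this entry is the R6RS-named char-general-category; letter-category? is a hypothetical helper, not part of Gauche.

```scheme
;; Assumes the procedure here is the R6RS-named char-general-category.
(display (map char-general-category
              (list #\A #\a #\5 #\space #\( #\$)))
(newline)   ; prints (Lu Ll Nd Zs Ps Sc)

;; Hypothetical helper: is c in one of the Letter (L*) categories?
(define (letter-category? c)
  (and (memq (char-general-category c) '(Lu Ll Lt Lm Lo)) #t))

(display (letter-category? #\x))   ; prints #t
(newline)
(display (letter-category? #\7))   ; prints #f
(newline)
```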
If Gauche is compiled with the euc-jp or shift_jis encoding, some characters have no corresponding single Unicode codepoint (each of them is represented by one Unicode character plus one Unicode modifier character). A provisional category is assigned to those characters. If future versions of Unicode incorporate these characters, the category may be reassigned.
SJIS | EUC | Cat | Unicode |
82F5 | A4F7 | Lo | U+304B U+309A (Semi-voiced Hiragana KA) |
82F6 | A4F8 | Lo | U+304D U+309A (Semi-voiced Hiragana KI) |
82F7 | A4F9 | Lo | U+304F U+309A (Semi-voiced Hiragana KU) |
82F8 | A4FA | Lo | U+3051 U+309A (Semi-voiced Hiragana KE) |
82F9 | A4FB | Lo | U+3053 U+309A (Semi-voiced Hiragana KO) |
8397 | A5F7 | Lo | U+30AB U+309A (Semi-voiced Katakana KA) |
8398 | A5F8 | Lo | U+30AD U+309A (Semi-voiced Katakana KI) |
8399 | A5F9 | Lo | U+30AF U+309A (Semi-voiced Katakana KU) |
839A | A5FA | Lo | U+30B1 U+309A (Semi-voiced Katakana KE) |
839B | A5FB | Lo | U+30B3 U+309A (Semi-voiced Katakana KO) |
839C | A5FC | Lo | U+30BB U+309A (Semi-voiced Katakana SE) |
839D | A5FD | Lo | U+30C4 U+309A (Semi-voiced Katakana TSU) |
839E | A5FE | Lo | U+30C8 U+309A (Semi-voiced Katakana TO) |
83F6 | A6F8 | Lo | U+31F7 U+309A (Semi-voiced small Katakana FU) |
8663 | ABC4 | Ll | U+00E6 U+0300 (Latin small ae with grave) |
8667 | ABC8 | Ll | U+0254 U+0300 (Latin small open o with grave) |
8668 | ABC9 | Ll | U+0254 U+0301 (Latin small open o with acute) |
8669 | ABCA | Ll | U+028C U+0300 (Latin small turned v with grave) |
866A | ABCB | Ll | U+028C U+0301 (Latin small turned v with acute) |
866B | ABCC | Ll | U+0259 U+0300 (Latin small schwa with grave) |
866C | ABCD | Ll | U+0259 U+0301 (Latin small schwa with acute) |
866D | ABCE | Ll | U+025A U+0300 (Latin small schwa with hook, with grave) |
866E | ABCF | Ll | U+025A U+0301 (Latin small schwa with hook, with acute) |
8685 | ABE5 | Sk | U+02E9 U+02E5 |
8686 | ABE6 | Sk | U+02E5 U+02E9 |
[R7RS base]
char->integer returns an exact integer that represents the internal
encoding of the character char. integer->char returns a character whose
internal encoding is the exact integer n. The following expression is
always true for any valid character char:
(eq? char (integer->char (char->integer char)))
Note: R7RS defines these procedures to deal with Unicode codepoints.
Gauche complies with this when compiled with the utf-8 or none internal
encoding (for the latter, only characters up to U+00ff are supported).
If Gauche is compiled with the euc-jp or sjis internal encoding, you need
to use char->ucs/ucs->char, described below, to convert between Unicode
codepoints and characters.
The result is undefined if you pass integer->char an n that has no
corresponding character.
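The round-trip property can be checked directly. This sketch restricts itself to printable ASCII, whose codes are identical in every internal encoding; it uses eqv?, the portable R7RS equality for characters (eq? also works in Gauche, as stated above). round-trips? is a hypothetical helper.

```scheme
(display (char->integer #\A))   ; prints 65
(newline)
(display (integer->char 97))    ; prints a
(newline)

;; Hypothetical helper: does c survive char->integer / integer->char?
(define (round-trips? c)
  (eqv? c (integer->char (char->integer c))))

;; Check every printable ASCII character (codes 32 to 126).
(let loop ((n 32))
  (when (<= n 126)
    (unless (round-trips? (integer->char n))
      (error "round trip failed at" n))
    (loop (+ n 1))))
```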
Converts a character char to an integer UCS codepoint, and an integer UCS codepoint n to a character, respectively.
If Gauche is compiled with the UTF-8 encoding, these procedures are the
same as char->integer and integer->char.
When Gauche's internal encoding differs from UTF-8, these procedures
implicitly load the gauche.charconv module to convert internal character
codes to UCS or vice versa (see gauche.charconv - Character Code Conversion).
If char doesn't have a corresponding UCS codepoint, char->ucs returns #f.
If UCS codepoint n can't be represented in the internal character encoding,
ucs->char returns #f, unless the conversion routine provides a substitution
character.
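A small sketch of converting through Unicode codepoints. It guards the non-ASCII case with when, because ucs->char may return #f if the codepoint is not representable in the internal encoding (for example, with none). The printed value assumes a utf-8, euc-jp, or sjis build.

```scheme
;; ASCII codepoints are the same in every encoding.
(display (char->ucs #\A))        ; prints 65
(newline)
(display (ucs->char 97))         ; prints a
(newline)

;; U+3042 HIRAGANA LETTER A may be unrepresentable with encoding none,
;; so check for #f before converting back.
(define hiragana-a (ucs->char #x3042))
(when hiragana-a
  (display (char->ucs hiragana-a))   ; prints 12354 (#x3042)
  (newline))
```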
[R7RS char][SRFI-129] Returns the upper case, lower case, title case and folded case of char, respectively.
The mapping is done according to the Unicode-defined character-by-character case mapping whenever possible. If the native encoding doesn't support the mapped character defined in Unicode, the operation is a no-op. If the native encoding is none, the characters are treated as Latin-1 (ISO-8859-1) characters. So upcasing the Latin-1 character small y with diaeresis (U+00ff) yields capital Y with diaeresis (U+0178) if the internal encoding is utf-8, but is a no-op if the internal encoding is none.
R7RS defines char-upcase, char-downcase, and char-foldcase in the
(scheme char) library, while char-titlecase is defined in SRFI-129.
R6RS defines all of them.
The character-by-character case mapping doesn't handle characters that map
to more than one character; a notable example is eszett (latin small letter
sharp s, U+00df), which is mapped to two capital S's in a string context,
but char-upcase #\ß returns #\ß. To get the full mapping, use
string-upcase etc. in the gauche.unicode module
(see Full string case conversion).
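The following sketch contrasts the character-level mapping with the full string mapping described above. It assumes a build with the utf-8 internal encoding (the #\x178 literal cannot be read otherwise). The last line uses string-upcase from gauche.unicode; its result is not shown here.

```scheme
(use gauche.unicode)   ; for the full-mapping string-upcase below

(display (list (char-upcase #\a) (char-downcase #\A)
               (char-titlecase #\a) (char-foldcase #\A)))
(newline)   ; prints (A a A a)

;; Latin-1 y with diaeresis upcases to U+0178 on a utf-8 build...
(display (eqv? (char-upcase #\xff) #\x178))   ; prints #t
(newline)
;; ...but eszett has no single-character upper case, so it is unchanged.
(display (eqv? (char-upcase #\xdf) #\xdf))    ; prints #t
(newline)

;; gauche.unicode's string-upcase applies the full, multi-character mapping.
(display (string-upcase (string #\s #\t #\r #\a #\xdf #\e)))
(newline)
```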
If the given character char is a valid digit character in radix radix,
the corresponding integer is returned. Otherwise #f is returned.
(digit->integer #\4) ⇒ 4
(digit->integer #\e 16) ⇒ 14
(digit->integer #\9 8) ⇒ #f
If the optional extended-range? argument is true, this procedure recognizes
not only ASCII digits but also all characters in the Nd general category,
such as FULLWIDTH DIGIT ZERO to NINE (U+ff10 - U+ff19).
R7RS has digit-value, which is equivalent to (digit->integer char 10 #t).
Note: CommonLisp has a similar function under the rather confusing name
digit-char-p.
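A sketch of the extended range and of a simple digit-string parser built on digit->integer. parse-digits is a hypothetical helper for illustration; it passes #t as extended-range? so fullwidth and other Nd digits are accepted too. The non-ASCII examples assume a utf-8 build.

```scheme
;; Without extended-range?, FULLWIDTH DIGIT FIVE (U+FF15) is rejected.
(display (digit->integer #\xff15))          ; prints #f
(newline)
;; With extended-range? true, any Nd character is accepted.
(display (digit->integer #\xff15 10 #t))    ; prints 5
(newline)
(display (digit->integer #\x0663 10 #t))    ; prints 3 (ARABIC-INDIC DIGIT THREE)
(newline)

;; Hypothetical helper: parse a string of digits in the given radix,
;; returning #f as soon as a character is not a valid digit.
(define (parse-digits str radix)
  (let loop ((cs (string->list str)) (acc 0))
    (cond ((null? cs) acc)
          ((digit->integer (car cs) radix #t)
           => (lambda (d) (loop (cdr cs) (+ (* acc radix) d))))
          (else #f))))

(display (parse-digits "2024" 10))   ; prints 2024
(newline)
(display (parse-digits "12z" 10))    ; prints #f
(newline)
```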
The reverse operation of digit->integer. Returns a character that
represents the number integer in the radix radix system.
If integer is out of the valid range, #f is returned.
(integer->digit 13 16) ⇒ #\d
(integer->digit 10) ⇒ #f
The optional basechar1 argument specifies the character that stands for
zero; by default, it's #\0. You can give an alternative character, for
example U+0660 (ARABIC-INDIC DIGIT ZERO), to convert an integer to an
Arabic-Indic digit character.
Another optional argument, basechar2, is used for digit values of 10 and
above; it specifies the character that stands for 10. The default value is
#\a. You can pass #\A to get upper-case hex digits, for example.
Note: CommonLisp has a similar function, digit-char.
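A sketch showing both optional base characters, plus a hypothetical helper render (not part of Gauche) that formats a non-negative integer in a given radix by calling integer->digit once per digit.

```scheme
;; basechar2 selects the character for 10; #\A gives upper-case hex.
(display (integer->digit 11 16 #\0 #\A))     ; prints B
(newline)
;; basechar1 selects the zero: with ARABIC-INDIC DIGIT ZERO (U+0660),
;; 5 maps to ARABIC-INDIC DIGIT FIVE (U+0665).
(display (eqv? (integer->digit 5 10 #\x0660) #\x0665))   ; prints #t
(newline)

;; Hypothetical helper: render a non-negative integer in the given radix
;; with upper-case digits.
(define (render n radix)
  (let ((d (integer->digit (remainder n radix) radix #\0 #\A)))
    (if (< n radix)
        (string d)
        (string-append (render (quotient n radix) radix) (string d)))))

(display (render 255 16))   ; prints FF
(newline)
(display (render 10 2))     ; prints 1010
(newline)
```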
Returns a symbol designating the native character encoding, selected at compile time. The possible return values are:
euc-jp
EUC-JP
utf-8
UTF-8
sjis
Shift JIS
none
No multibyte character support (8-bit fixed-length character).
To switch code at compile time according to the internal encoding,
you can use the feature identifiers gauche.ces.* (see
Using platform-dependent features).
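A run-time sketch that dispatches on the returned symbol. It assumes the procedure documented here is gauche-character-encoding; describe-native-encoding is a hypothetical helper. For compile-time switching, the gauche.ces.* feature identifiers mentioned above are the intended mechanism.

```scheme
;; Assumes the procedure described above is gauche-character-encoding.
;; Hypothetical helper: human-readable name of the native encoding.
(define (describe-native-encoding)
  (case (gauche-character-encoding)
    ((utf-8)  "UTF-8")
    ((euc-jp) "EUC-JP")
    ((sjis)   "Shift JIS")
    ((none)   "8-bit, no multibyte support")
    (else     "unknown")))

(display (describe-native-encoding))
(newline)
```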
Returns a list of string names of character encoding schemes that are supported in the native multibyte encoding scheme.