#\charname
[R7RS+] Denotes a literal character.
When the reader reads #\, it fetches the subsequent character. If it is one of ()[]{}" \|;#, the literal denotes that character itself. Otherwise, the reader reads subsequent characters until it sees a non-word-constituent character. If only one character was read, the literal denotes that character. Otherwise, the reader matches the characters read against the predefined character names; if none matches, an error is signaled.
The following character names are recognized. These character names are case insensitive.
space
Whitespace (ASCII #x20)
newline, nl, lf
Newline (ASCII #x0a)
return, cr
Carriage return (ASCII #x0d)
tab, ht
Horizontal tab (ASCII #x09)
page
Form feed (ASCII #x0c)
alarm
Bell (ASCII #x07)
backspace
Backspace (ASCII #x08)
escape, esc
Escape (ASCII #x1b)
delete, del
Delete (ASCII #x7f)
null
NUL character (ASCII #x00)
xN
A character whose Unicode codepoint is the integer N, where N is a hexadecimal integer. This is R7RS lexical syntax.
uN
A character whose Unicode codepoint is the integer N, where N is a 4-digit or 8-digit hexadecimal number. This is legacy Gauche lexical syntax. Use the #\xN syntax in new code.
#\newline ⇒ #\newline ; newline character
#\x0a ⇒ #\newline ; ditto
#\x41 ⇒ #\A ; ASCII letter ’A’
#\x3042 ⇒ #\あ ; Hiragana letter A
#\x2a6b2 ⇒ #\𪚲 ; JISX0213 Kanji 2-94-86
[R7RS base] Returns #t if obj is a character, #f otherwise.
[R7RS base] Compares characters. The comparison uses the character codes in the internal character encoding.
[R7RS char] Compares characters in a case-insensitive way. The comparison uses the internal character code of the folded case of each character; see char-foldcase below.
In R7RS, these procedures are in the (scheme char) library.
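The difference between the two families can be seen in a short sketch. It assumes the entries above describe the R7RS procedures char<?, char=? and their char-ci counterparts (the names are not shown in this section), evaluated in Gauche's default environment.

```scheme
;; Case-sensitive comparison looks at the character codes directly.
(char<? #\a #\b)      ; ⇒ #t
(char<? #\a #\B)      ; ⇒ #f  (#\B is U+0042, which is below #\a at U+0061)

;; Case-insensitive comparison folds case before comparing.
(char-ci=? #\a #\A)   ; ⇒ #t
(char-ci<? #\a #\B)   ; ⇒ #t  (compares #\a with #\b after folding)

;; R7RS allows more than two arguments; each adjacent pair is compared.
(char<? #\a #\b #\c)  ; ⇒ #t
```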
[R7RS char][SRFI-129] Returns true if the character char is an alphabetic character (Unicode character category Lu, Ll, Lt, Lm, Lo, Nl), a numeric character (Unicode character category Nd), a whitespace character (Unicode character category Zs, Zp, Zl), an upper case character (Unicode character category Lu), or a lower case character (Unicode character category Ll), respectively.
In R7RS, these procedures except char-title-case? are in the (scheme char) library, while char-title-case? is defined in SRFI-129.
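A few calls illustrate the classification. This sketch assumes the entry documents the R7RS predicates char-alphabetic?, char-numeric?, char-whitespace?, char-upper-case? and char-lower-case?; those names are taken from R7RS, not from the text above.

```scheme
(char-alphabetic? #\a)      ; ⇒ #t  (category Ll)
(char-alphabetic? #\1)      ; ⇒ #f
(char-numeric? #\7)         ; ⇒ #t  (category Nd)
(char-numeric? #\xff11)     ; ⇒ #t  (FULLWIDTH DIGIT ONE is also Nd)
(char-whitespace? #\space)  ; ⇒ #t  (category Zs)
(char-upper-case? #\A)      ; ⇒ #t  (category Lu)
(char-lower-case? #\A)      ; ⇒ #f
```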
Returns #t iff char is a character that R7RS allows in an identifier name without escaping: ASCII alphanumeric characters, extended alphabetic characters (! $ % & * + - . / : < = > ? ^ _ ~), any character whose category is Lu, Ll, Lt, Lm, Lo, Mn, Mc, Me, Nd, Nl, No, Pd, Pc, Po, Sc, Sm, Sk, So, or Co, Zero width non-joiner (U+200C), and Zero width joiner (U+200D).
[R6RS] Returns one of the following symbols, representing the Unicode general category of char.
Cc | Other, Control |
Cf | Other, Format |
Cn | Other, Not Assigned |
Co | Other, Private Use |
Cs | Other, Surrogate |
Ll | Letter, Lowercase |
Lm | Letter, Modifier |
Lo | Letter, Other |
Lt | Letter, Titlecase |
Lu | Letter, Uppercase |
Mc | Mark, Spacing Combining |
Me | Mark, Enclosing |
Mn | Mark, Nonspacing |
Nd | Number, Decimal Digit |
Nl | Number, Letter |
No | Number, Other |
Pc | Punctuation, Connector |
Pd | Punctuation, Dash |
Pe | Punctuation, Close |
Pf | Punctuation, Final quote |
Pi | Punctuation, Initial quote |
Po | Punctuation, Other |
Ps | Punctuation, Open |
Sc | Symbol, Currency |
Sk | Symbol, Modifier |
Sm | Symbol, Math |
So | Symbol, Other |
Zl | Separator, Line |
Zp | Separator, Paragraph |
Zs | Separator, Space |
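The returned symbols can be compared with eq? or searched with memq. The sketch below assumes the procedure described here is R6RS's char-general-category (its name is not shown in this section); letter? is a hypothetical helper introduced only for illustration.

```scheme
(char-general-category #\a)        ; ⇒ Ll
(char-general-category #\A)        ; ⇒ Lu
(char-general-category #\5)        ; ⇒ Nd
(char-general-category #\space)    ; ⇒ Zs
(char-general-category #\newline)  ; ⇒ Cc
(char-general-category #\()        ; ⇒ Ps

;; Hypothetical helper: true for any character in a Letter category.
(define (letter? ch)
  (and (memq (char-general-category ch) '(Lu Ll Lt Lm Lo)) #t))

(letter? #\x3042)  ; ⇒ #t  (Hiragana letter A is Lo)
(letter? #\5)      ; ⇒ #f
```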
[R7RS base] char->integer returns an exact integer that is the Unicode codepoint of the character char. integer->char returns the character whose Unicode codepoint is the exact integer n. The following expression is always true for a valid character char:
(eq? char (integer->char (char->integer char)))
The result is undefined if you pass integer->char an n that doesn’t have a corresponding character.
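As a small illustration of the round trip, the sketch below converts in both directions. char-succ is a hypothetical helper, not part of Gauche, and it inherits the caveat above: its result is undefined when the next codepoint has no corresponding character.

```scheme
(char->integer #\A)      ; ⇒ 65
(char->integer #\x3042)  ; ⇒ 12354  (that is, #x3042)
(integer->char 97)       ; ⇒ #\a

;; Hypothetical helper: the character at the next codepoint.
(define (char-succ ch)
  (integer->char (+ (char->integer ch) 1)))

(char-succ #\a)  ; ⇒ #\b

;; The round-trip identity stated above.
(eq? #\a (integer->char (char->integer #\a)))  ; ⇒ #t
```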
Deprecated. These are the same as char->integer and integer->char, respectively.
Gauche could formerly be built with an internal encoding other than utf-8, in which case integer character codes differed from Unicode codepoints. These procedures guaranteed that you were working with Unicode codepoints.
Now that Gauche always uses utf-8, these procedures are no longer needed.
[R7RS char][SRFI-129] Returns the upper case, lower case, title case and folded case of char, respectively. The mapping is done according to Unicode-defined character-by-character case mapping.
R7RS defines char-upcase, char-downcase, and char-foldcase in the (scheme char) library, while char-titlecase is defined in SRFI-129. R6RS defines all of them.
The character-by-character case mapping doesn’t handle a character that maps to more than one character. A notable example is eszett (latin small letter sharp S, U+00df), which is mapped to two capital S’s in string context, but (char-upcase #\ß) returns #\ß. To get a full mapping, use string-upcase etc. in the gauche.unicode module (see Full string case conversion).
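The contrast between character-level and string-level mapping can be sketched as follows, assuming the source file is read as utf-8 and that string-upcase from gauche.unicode performs the full mapping described above.

```scheme
(use gauche.unicode)  ; provides the full-mapping string-upcase

(char-upcase #\a)     ; ⇒ #\A
(char-downcase #\A)   ; ⇒ #\a
(char-foldcase #\A)   ; ⇒ #\a

;; Character-by-character mapping cannot expand one character into two.
(char-upcase #\ß)     ; ⇒ #\ß

;; The string-level procedure can.
(string-upcase "straße")  ; ⇒ "STRASSE"
```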
If the given character char is a valid digit character in radix radix, the corresponding integer is returned. Otherwise #f is returned.
(digit->integer #\4) ⇒ 4
(digit->integer #\e 16) ⇒ 14
(digit->integer #\9 8) ⇒ #f
If the optional extended-range? argument is true, this procedure recognizes not only ASCII digits but also all characters with the Nd general category, such as FULLWIDTH DIGIT ZERO to NINE (U+ff10 - U+ff19).
R7RS has digit-value, which is equivalent to (digit->integer char 10 #t).
Note: Common Lisp has a similar function with the rather confusing name digit-char-p.
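The following sketch shows the effect of extended-range? and one possible use of the #f result to reject invalid input. digits->number is a hypothetical helper written for this example, not a Gauche procedure.

```scheme
;; FULLWIDTH DIGIT THREE (U+ff13) is recognized only with extended-range?.
(digit->integer #\xff13)        ; ⇒ #f
(digit->integer #\xff13 10 #t)  ; ⇒ 3

;; Hypothetical helper: parse a string of digits in the given radix,
;; returning #f as soon as a non-digit character is found.
(define (digits->number str radix)
  (let loop ((chars (string->list str))
             (acc 0))
    (cond ((null? chars) acc)
          ((digit->integer (car chars) radix)
           => (lambda (d) (loop (cdr chars) (+ (* acc radix) d))))
          (else #f))))

(digits->number "ff" 16)   ; ⇒ 255
(digits->number "777" 8)   ; ⇒ 511
(digits->number "12a" 10)  ; ⇒ #f
```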
The reverse operation of digit->integer. Returns a character that represents the number integer in the radix radix system. If integer is out of the valid range, #f is returned.
(integer->digit 13 16) ⇒ #\d
(integer->digit 10) ⇒ #f
The optional basechar1 argument specifies the character that stands for zero; by default, it’s #\0. You can give an alternative character, for example U+0660 (ARABIC-INDIC DIGIT ZERO), to convert an integer to an Arabic-Indic digit character.
Another optional argument, basechar2, is used for integers of 10 and over. The default value is #\a. You can pass #\A to get upper-case hex digits, for example.
Note: This corresponds to Common Lisp’s digit-char.
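The optional arguments might be used as in the sketch below. It assumes they are positional in the order radix, basechar1, basechar2, which is the order in which the text introduces them; the signature itself is not shown in this section.

```scheme
;; Upper-case hex digit: keep #\0 as the zero character, use #\A for ten.
(integer->digit 13 16 #\0 #\A)  ; ⇒ #\D

;; Arabic-Indic digit: five positions above ARABIC-INDIC DIGIT ZERO (U+0660).
(integer->digit 5 10 #\x0660)   ; ⇒ #\x0665
```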
Deprecated. Returns the symbol utf-8.
This is kept for backward compatibility with code written for pre-1.0 Gauche, which could be compiled with internal encodings other than utf-8.
Returns a list of string names of character encoding schemes that are supported in the native multibyte encoding scheme.