Characters (Gauche Users’ Reference)

6.9 Characters

Builtin Class: <char> ¶

Reader Syntax: #\charname ¶

[R7RS+] Denotes a literal character.

When the reader reads #\, it fetches the next character. If it is one of ()[]{}" \|;#, the literal denotes that character itself. Otherwise, the reader keeps reading characters until it sees a non-word-constituent character. If only one character has been read, the literal denotes that character. Otherwise, the reader matches the characters read against the predefined character names; if none matches, an error is signaled.

The following character names are recognized. These character names are case insensitive.

space: Whitespace (ASCII #x20)
newline, nl, lf: Newline (ASCII #x0a)
return, cr: Carriage return (ASCII #x0d)
tab, ht: Horizontal tab (ASCII #x09)
page: Form feed (ASCII #x0c)
alarm: Bell (ASCII #x07)
backspace: Backspace (ASCII #x08)
escape, esc: Escape (ASCII #x1b)
delete, del: Delete (ASCII #x7f)
null: NUL character (ASCII #x00)
xN: A character whose Unicode codepoint is the integer N, where N is a hexadecimal integer. This is R7RS lexical syntax.
uN: A character whose Unicode codepoint is the integer N, where N is a 4-digit or 8-digit hexadecimal number. This is legacy Gauche lexical syntax; use the xN syntax in new code.

#\newline ⇒ #\newline ; newline character
#\x0a     ⇒ #\newline ; ditto
#\x41     ⇒ #\A       ; ASCII letter ’A’
#\x3042   ⇒ #\あ      ; Hiragana letter A
#\x2a6b2  ⇒ #\𪚲      ; JISX0213 Kanji 2-94-86
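The name table above implies that several spellings denote the same character. The following is a minimal sketch of that, assuming it is evaluated in a Gauche REPL or script; the expected values follow from the table and from ASCII codes.

```scheme
;; Aliases and the hexadecimal notation denote the same character.
(char=? #\newline #\nl #\lf #\x0a)   ; ⇒ #t
;; Character names are case insensitive.
(char=? #\space #\SPACE #\x20)       ; ⇒ #t
;; A delimiter character right after #\ stands for itself.
(char->integer #\()                  ; ⇒ 40
(char->integer #\;)                  ; ⇒ 59
```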

Function: char? obj ¶: [R7RS base] Returns #t if obj is a character, #f otherwise.

Function: char=? char1 char2 char3 … ¶
Function: char<? char1 char2 char3 … ¶
Function: char<=? char1 char2 char3 … ¶
Function: char>? char1 char2 char3 … ¶
Function: char>=? char1 char2 char3 … ¶: [R7RS base] Compares characters by their character codes in the internal character encoding.
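As an illustration of the comparison procedures, here is a short sketch using ASCII letters only, so the ordering is simply the ASCII code order.

```scheme
(char<? #\a #\b #\c)      ; ⇒ #t  (strictly increasing)
(char<? #\a #\c #\b)      ; ⇒ #f
(char<=? #\a #\a #\b)     ; ⇒ #t
;; Upper-case ASCII letters have smaller codes than lower-case ones.
(char<? #\Z #\a)          ; ⇒ #t
;; These comparisons are case sensitive.
(char=? #\a #\A)          ; ⇒ #f
```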

Function: char-ci=? char1 char2 char3 … ¶

Function: char-ci<? char1 char2 char3 … ¶

Function: char-ci<=? char1 char2 char3 … ¶

Function: char-ci>? char1 char2 char3 … ¶

Function: char-ci>=? char1 char2 char3 … ¶

[R7RS char] Compares characters in a case-insensitive way. The comparison uses the internal character code of the folded case of each character; see char-foldcase below.

In R7RS, these procedures are in the (scheme char) library.
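The sketch below contrasts the case-insensitive procedures with their case-sensitive counterparts, using ASCII letters only. The last line spells out the folding step described above.

```scheme
(char-ci=? #\a #\A)              ; ⇒ #t
(char-ci<? #\a #\B #\c)          ; ⇒ #t  (compared as a, b, c)
(char<? #\a #\B #\c)             ; ⇒ #f  (#\B precedes #\a in code order)
;; Same result as folding each argument first:
(char=? (char-foldcase #\a) (char-foldcase #\A))   ; ⇒ #t
```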

Function: char-alphabetic? char ¶

Function: char-numeric? char ¶

Function: char-whitespace? char ¶

Function: char-upper-case? char ¶

Function: char-lower-case? char ¶

Function: char-title-case? char ¶

[R7RS char][SRFI-129] Returns true if the character char is an alphabetic character (Unicode character category Lu, Ll, Lt, Lm, Lo, Nl), a numeric character (Unicode character category Nd), a whitespace character (Unicode character category Zs, Zp, Zl), an upper case character (Unicode character category Lu), a lower case character (Unicode character category Ll), or a title case character (Unicode character category Lt), respectively.

In R7RS, these procedures except char-title-case? are in the (scheme char) library, while char-title-case? is defined in SRFI-129.
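A few sample calls may make the category lists above more concrete. This is a sketch that picks characters whose categories are unambiguous: U+3042 is in category Lo and U+01C5 in category Lt.

```scheme
(char-alphabetic? #\a)        ; ⇒ #t
(char-alphabetic? #\x3042)    ; ⇒ #t  (Hiragana letter A, category Lo)
(char-numeric? #\7)           ; ⇒ #t
(char-numeric? #\a)           ; ⇒ #f
(char-whitespace? #\space)    ; ⇒ #t
(char-upper-case? #\A)        ; ⇒ #t
(char-lower-case? #\A)        ; ⇒ #f
(char-title-case? #\x01c5)    ; ⇒ #t  (U+01C5, category Lt)
(char-title-case? #\A)        ; ⇒ #f
```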

Function: char-word-constituent? char ¶: Returns #t iff char is a character R7RS allows to be in an identifier name without escaping: ASCII alphanumeric characters, extended alphabetic characters (! $ % & * + - . / : < = > ? ^ _ ~), any characters whose category is Lu, Ll, Lt, Lm, Lo, Mn, Mc, Me, Nd, Nl, No, Pd, Pc, Po, Sc, Sm, Sk, So, or Co, Zero width non-joiner (U+200C), and Zero width joiner (U+200D).

Function: char-general-category char ¶

[R6RS] Returns one of the following symbols, representing the Unicode general category of char.

`Cc`	Other, Control
`Cf`	Other, Format
`Cn`	Other, Not Assigned
`Co`	Other, Private Use
`Cs`	Other, Surrogate
`Ll`	Letter, Lowercase
`Lm`	Letter, Modifier
`Lo`	Letter, Other
`Lt`	Letter, Titlecase
`Lu`	Letter, Uppercase
`Mc`	Mark, Spacing Combining
`Me`	Mark, Enclosing
`Mn`	Mark, Nonspacing
`Nd`	Number, Decimal Digit
`Nl`	Number, Letter
`No`	Number, Other
`Pc`	Punctuation, Connector
`Pd`	Punctuation, Dash
`Pe`	Punctuation, Close
`Pf`	Punctuation, Final quote
`Pi`	Punctuation, Initial quote
`Po`	Punctuation, Other
`Ps`	Punctuation, Open
`Sc`	Symbol, Currency
`Sk`	Symbol, Modifier
`Sm`	Symbol, Math
`So`	Symbol, Other
`Zl`	Separator, Line
`Zp`	Separator, Paragraph
`Zs`	Separator, Space
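For illustration, here are some ASCII characters and the category symbols they map to. The helper letter? is hypothetical (it is not part of Gauche) and only sketches how the returned symbol might be used.

```scheme
(char-general-category #\a)       ; ⇒ Ll
(char-general-category #\A)       ; ⇒ Lu
(char-general-category #\5)       ; ⇒ Nd
(char-general-category #\space)   ; ⇒ Zs
(char-general-category #\newline) ; ⇒ Cc
(char-general-category #\-)       ; ⇒ Pd
(char-general-category #\+)       ; ⇒ Sm

;; Hypothetical helper: true if ch belongs to one of the letter categories.
(define (letter? ch)
  (and (memq (char-general-category ch) '(Lu Ll Lt Lm Lo)) #t))
(letter? #\x3042)   ; ⇒ #t
(letter? #\5)       ; ⇒ #f
```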

Function: char->integer char ¶

Function: integer->char n ¶

[R7RS base] char->integer returns the Unicode codepoint of the character char as an exact integer. integer->char returns the character whose Unicode codepoint is the exact integer n. The following expression is always true for a valid character char:

(eq? char (integer->char (char->integer char)))

The result is undefined if you pass integer->char an n that has no corresponding character.
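The conversions can be sketched as follows. The procedure rot-char is a hypothetical helper written for this example, not a Gauche procedure; it assumes its argument is a lower-case ASCII letter.

```scheme
(char->integer #\A)            ; ⇒ 65
(char->integer #\x3042)        ; ⇒ 12354  (#x3042)
(integer->char 97)             ; ⇒ #\a
(integer->char #x41)           ; ⇒ #\A

;; Hypothetical helper: shift a lower-case ASCII letter by n positions,
;; wrapping around after z.
(define (rot-char ch n)
  (let ((base (char->integer #\a)))
    (integer->char
     (+ base (modulo (+ (- (char->integer ch) base) n) 26)))))
(rot-char #\a 1)    ; ⇒ #\b
(rot-char #\z 1)    ; ⇒ #\a
```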

Function: char->ucs char ¶

Function: ucs->char n ¶

Deprecated. These are the same as char->integer and integer->char, respectively.

Gauche could formerly be built with an internal encoding other than utf-8, in which case integer character codes differed from Unicode codepoints. These procedures guaranteed that you were working with Unicode codepoints regardless of the internal encoding.

Now that Gauche always uses utf-8, these procedures are no longer needed.

Function: char-upcase char ¶

Function: char-downcase char ¶

Function: char-titlecase char ¶

Function: char-foldcase char ¶

[R7RS char][SRFI-129] Returns the upper case, lower case, title case and folded case of char, respectively. The mapping is done according to Unicode-defined character-by-character case mapping.

R7RS defines char-upcase, char-downcase, and char-foldcase in the (scheme char) library, while char-titlecase is defined in SRFI-129. R6RS defines all of them.

The character-by-character case mapping doesn’t handle a character that maps to more than one character; a notable example is eszett (latin small letter sharp S, U+00df), which is mapped to two capital S’s in string context, but (char-upcase #\ß) returns #\ß. To get a full mapping, use string-upcase etc. in the gauche.unicode module (see Full string case conversion).
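The difference between the two kinds of mapping can be sketched like this. The example assumes the gauche.unicode module is available, as described above, and uses U+01C6 because its titlecase form (U+01C5) differs from its uppercase form.

```scheme
(use gauche.unicode)   ; provides the full string-upcase

(char-upcase #\a)              ; ⇒ #\A
(char-downcase #\A)            ; ⇒ #\a
(char-foldcase #\A)            ; ⇒ #\a
(char->integer (char-titlecase #\x01c6))   ; ⇒ 453  (#x01c5)

;; The character-by-character mapping cannot expand eszett ...
(char=? (char-upcase #\ß) #\ß)   ; ⇒ #t
;; ... while the full string mapping can.
(string-upcase "straße")         ; ⇒ "STRASSE"
```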

Function: digit->integer char :optional (radix 10) (extended-range? #f) ¶

If the character char is a valid digit character in radix radix, the corresponding integer is returned. Otherwise #f is returned.

(digit->integer #\4) ⇒ 4
(digit->integer #\e 16) ⇒ 14
(digit->integer #\9 8) ⇒ #f

If the optional extended-range? argument is true, this procedure recognizes not only ASCII digits, but also all characters with Nd general category—such as FULLWIDTH DIGIT ZERO to NINE (U+ff10 - U+ff19).

R7RS has digit-value, which is equivalent to (digit->integer char 10 #t).

Note: Common Lisp has a similar function with a rather confusing name, digit-char-p.
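The following sketch exercises the radix and extended-range? arguments of digit->integer. The procedure parse-digits is a hypothetical helper (not part of Gauche) showing one way the #f result might be used to reject invalid input.

```scheme
;; Letters serve as digits above 9.
(digit->integer #\F 16)          ; ⇒ 15
(digit->integer #\z 36)          ; ⇒ 35
;; FULLWIDTH DIGIT FIVE (U+FF15) is accepted only with extended-range?.
(digit->integer #\xff15)         ; ⇒ #f
(digit->integer #\xff15 10 #t)   ; ⇒ 5

;; Hypothetical helper: parse a string of digits in the given radix,
;; returning #f if any character is not a valid digit.
(define (parse-digits str radix)
  (let loop ((chars (string->list str)) (acc 0))
    (cond ((null? chars) acc)
          ((digit->integer (car chars) radix)
           => (lambda (d) (loop (cdr chars) (+ (* acc radix) d))))
          (else #f))))
(parse-digits "ff" 16)    ; ⇒ 255
(parse-digits "129" 8)    ; ⇒ #f
```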

Function: integer->digit integer :optional (radix 10) (basechar1 #\0) (basechar2 #\a) ¶

Reverse operation of digit->integer. Returns a character that represents the number integer in the radix radix system. If integer is out of the valid range, #f is returned.

(integer->digit 13 16) ⇒ #\d
(integer->digit 10) ⇒ #f

The optional basechar1 argument specifies the character that stands for zero; by default, it’s #\0. You can give an alternative character, for example, U+0660 (ARABIC-INDIC DIGIT ZERO), to convert an integer to an Arabic-Indic digit character.

The other optional argument, basechar2, specifies the character that stands for 10 and is used for integers 10 and above. The default value is #\a. You can pass #\A to get upper-case hex digits, for example.

Note: This corresponds to Common Lisp’s digit-char.
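The optional arguments of integer->digit can be sketched as follows. The procedure render-digits is a hypothetical helper (not part of Gauche) that assumes a nonnegative integer and a radix for which integer->digit always yields a character.

```scheme
(integer->digit 13 16 #\0 #\A)       ; ⇒ #\D  (upper-case hex digit)
;; With U+0660 as the zero character, 5 maps to U+0665.
(char->integer (integer->digit 5 10 #\x0660))   ; ⇒ 1637  (#x0665)
(integer->digit 16 16)               ; ⇒ #f   (out of range for radix 16)

;; Hypothetical helper: render a nonnegative integer in the given radix.
(define (render-digits n radix)
  (let loop ((n n) (acc '()))
    (let ((acc (cons (integer->digit (remainder n radix) radix) acc))
          (q (quotient n radix)))
      (if (zero? q)
          (list->string acc)
          (loop q acc)))))
(render-digits 255 16)    ; ⇒ "ff"
(render-digits 0 2)       ; ⇒ "0"
```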

Function: gauche-character-encoding ¶: Deprecated. Returns the symbol utf-8. This is kept for backward compatibility with code written for pre-1.0 Gauche, which could be compiled with internal encodings other than utf-8.

Function: supported-character-encodings ¶: Returns a list of string names of character encoding schemes that are supported in the native multibyte encoding scheme.