For Gauche 0.9.5


6.12 Strings

Builtin Class: <string>

A string class. In Gauche, a string can be viewed in two ways: a sequence of characters, or a sequence of bytes.

It should be emphasized that Gauche’s internal string object, the string body, is immutable. To comply with R7RS, in which strings are mutable, a Scheme-level string object is an indirect pointer to a string body. Mutating a string means that Gauche creates a new immutable string body that reflects the changes, then swaps the pointer in the Scheme-level string object.

This may affect some assumptions about the cost of string operations.

Gauche does not attempt to make string mutation faster; (string-set! s k c) is exactly as slow as taking two substrings, before and after the k-th character, and concatenating them with a single-character string in between. So, just avoid string mutations; we believe it’s a better practice. See also String Constructors.

R7RS defines only a minimal set of string operations. Gauche supports some extra built-in operations, and also a rich string library defined in SRFI-13. See String library, for details about SRFI-13.


6.12.1 String syntax

Reader Syntax: " "

[R7RS+] Denotes a literal string. Inside the double quotes, the following backslash escape sequences are recognized.

\"
[R7RS] Double-quote character.

\\
[R7RS] Backslash character.

\n
[R7RS] Newline character (ASCII 0x0a).

\r
[R7RS] Return character (ASCII 0x0d).

\f
Form-feed character (ASCII 0x0c).

\t
[R7RS] Tab character (ASCII 0x09).

\a
[R7RS] Alarm character (ASCII 0x07).

\b
[R7RS] Backspace character (ASCII 0x08).

\0
ASCII NUL character (ASCII 0x00).

\<whitespace>*<newline><whitespace>*
[R7RS] Ignored. This can be used to break a long string literal for readability. This escape sequence was introduced in R6RS.

\xN;
[R7RS] A character whose Unicode codepoint is represented by the hexadecimal number N, which may have any number of hexadecimal digits. (See the compatibility notes below.)

\uNNNN
A character whose UCS2 code is represented by the four-digit hexadecimal number NNNN.

\UNNNNNNNN
A character whose UCS4 code is represented by the eight-digit hexadecimal number NNNNNNNN.
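
Here are a few of these escapes in use, as a small illustration. The example assumes the default reader mode and a Unicode-capable native encoding (UTF-8). The variable names are arbitrary.

```scheme
;; Hex escape terminated by ';' (R7RS syntax)
(define s1 "\x41;\x42;")
;; \t is a tab character
(define s2 "tab:\there")
;; \uNNNN takes exactly four hex digits, no terminator
(define s3 "\u00e9")

(string=? s1 "AB")                ; ⇒ #t
(char=? (string-ref s2 4) #\tab)  ; ⇒ #t
(char->ucs (string-ref s3 0))     ; ⇒ 233  (#xe9)
```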

The following code is an example of backslash-newline escape sequence:

(define *message* "\
  This is a long message \
  in a literal string.")

  ⇒ "This is a long message in a literal string."

Note the whitespace just after ‘message’. Since any whitespace before ‘in’ is eaten by the reader, you have to put a whitespace between ‘message’ and the following backslash. If you want to include an actual newline character in a string, and any indentation after it, you can put ‘\n’ at the beginning of the next line like this:

(define *message/newline* "\
  This is a long message, \
  \n   with a line break.")

Note on compatibility: We used to recognize the syntax \xNN (a two-digit hexadecimal number, without a semicolon terminator) as a character in a string; for example, "\x0d\x0a" was the same as "\r\n". For compatibility, we still support it when we don’t see the terminating semicolon. There are ambiguous cases: "\x0a;" means "\n" in the current syntax, but "\n;" in the legacy syntax.

Setting the reader mode to legacy restores the old behavior. Setting the reader mode to warn-legacy makes it work like the default behavior, but prints a warning when it finds the legacy syntax. See Reader lexical mode, for the details.

Reader Syntax: #*" "

Denotes an incomplete string. The same escape sequences as the complete string syntax are recognized.

Rationale of the syntax: ‘#*’ is used for bit vectors in Common Lisp. Since an incomplete string is really a byte vector, the two are similar. (Bit vectors can be added later, if necessary, and the two can coexist.)
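
As a quick check of the syntax, the snippet below reads an incomplete literal and compares it with an ordinary one. It is only a sketch. The name raw is arbitrary.

```scheme
;; #*"..." denotes an incomplete string (a byte sequence).
(define raw #*"abc")

(string-incomplete? raw)    ; ⇒ #t
(string-incomplete? "abc")  ; ⇒ #f  (an ordinary, complete literal)
(string-size raw)           ; ⇒ 3   (bytes)
```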


6.12.2 String Predicates

Function: string? obj

[R7RS] Returns #t if obj is a string, #f otherwise.

Function: string-immutable? obj

Returns #t if obj is an immutable string, #f otherwise.

Function: string-incomplete? obj

Returns #t if obj is an incomplete string, #f otherwise.
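
The following sketch shows the predicates side by side. It assumes, as in current Gauche, that string literals are immutable and that string-copy returns a fresh, mutable string.

```scheme
(string? "abc")                          ; ⇒ #t
(string? #\a)                            ; ⇒ #f  (a character, not a string)
(string-immutable? "abc")                ; ⇒ #t  (literal)
(string-immutable? (string-copy "abc"))  ; ⇒ #f  (fresh copy)
```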


6.12.3 String Constructors

Function: make-string k :optional char

[R7RS] Returns a string of length k. If the optional char is given, the new string is filled with it. Otherwise, the string is filled with spaces. The result string is always complete.

(make-string 5 #\x) ⇒ "xxxxx"

Note that the algorithm of allocating a string with make-string and then filling it one character at a time is extremely inefficient in Gauche, and should be avoided. Such algorithms assume a particular underlying string allocation and representation mechanism, which Gauche doesn’t follow. You can use an output string port for string construction (see String ports). Even creating a list of characters and calling list->string is faster than using make-string and string-set!.
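
Here is a minimal sketch of the two alternatives mentioned above. join-numbers is a hypothetical helper, not part of Gauche. It writes into the output string port supplied by call-with-output-string, so no string is mutated along the way.

```scheme
;; Build "0,1,...,n-1" by writing to an output string port.
(define (join-numbers n)
  (call-with-output-string
    (lambda (port)
      (do ((i 0 (+ i 1)))
          ((= i n))
        (unless (zero? i) (write-char #\, port))  ; separator
        (display i port)))))

(join-numbers 5)   ; ⇒ "0,1,2,3,4"

;; Collect characters in a list, then convert once.
(list->string (map (lambda (i) (integer->char (+ 65 i))) (iota 3)))
;; ⇒ "ABC"
```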

Function: make-byte-string k :optional byte

Creates and returns an incomplete string of size k. If byte is given, it must be an exact integer, and its lower 8 bits are used to initialize every byte in the created string.
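
For illustration, here is a three-byte incomplete string filled with the byte 65 (the ASCII code of A):

```scheme
(define b (make-byte-string 3 65))
(string-incomplete? b)   ; ⇒ #t
(string-size b)          ; ⇒ 3
(string-byte-ref b 0)    ; ⇒ 65
```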

Function: string char …

[R7RS] Returns a string consisting of the characters char ….

Generic Function: x->string obj

A generic coercion function. Returns a string representation of obj. The default methods are defined as follows: strings are returned as is, numbers are converted by number->string, symbols are converted by symbol->string, and other objects are converted by display.

Other classes may provide methods to customize the behavior.
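
As an example of such customization, here is a sketch that uses a hypothetical <point> class (not part of Gauche). A method defined on x->string also takes effect wherever x->string is called, such as in string interpolation.

```scheme
;; Hypothetical class, for illustration only.
(define-class <point> ()
  ((x :init-keyword :x)
   (y :init-keyword :y)))

;; Specialize the generic function x->string for <point>.
(define-method x->string ((p <point>))
  (format #f "(~a, ~a)" (slot-ref p 'x) (slot-ref p 'y)))

(define pt (make <point> :x 1 :y 2))
(x->string pt)    ; ⇒ "(1, 2)"
(x->string 42)    ; ⇒ "42"   (default method: number->string)
(x->string 'foo)  ; ⇒ "foo"  (default method: symbol->string)
```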


6.12.4 String interpolation

The term "string interpolation" is used in various scripting languages such as Perl and Python to refer to the feature of embedding expressions in a string literal; the expressions are evaluated at run time and their results are inserted into the string.

Scheme doesn’t define such a feature, but Gauche implements it as a reader macro.

Reader Syntax: # string-literal

Evaluates to a string. If string-literal contains the character sequence ~expr, where expr is a valid external representation of a Scheme expression, expr is evaluated and its result is inserted in the original place (by using x->string, see String Constructors).

The tilde and the following expression must be adjacent (without containing any whitespace characters), or it is not recognized as a special sequence.

To include a tilde itself immediately followed by a non-delimiting character, use ~~.

Other characters in the string-literal are copied as is.

If you use a variable as expr and need to delimit it from the subsequent string, you can use the symbol escape syntax using ‘|’ character, as shown in the last two examples below.

#"This is Gauche, version ~(gauche-version)."
 ⇒ "This is Gauche, version 0.9.5."

#"Date: ~(sys-strftime \"%Y/%m/%d\" (sys-localtime (sys-time)))"
 ⇒ "Date: 2002/02/18"

(let ((a "AAA")
      (b "BBB"))
 #"xxx ~a ~b zzz")
 ⇒ "xxx AAA BBB zzz"

#"123~~456~~789"
 ⇒ "123~456~789"

(let ((n 7)) #"R~|n|RS")
 ⇒ "R7RS"

(let ((x "bar")) #"foo~|x|.")
 ⇒ "foobar."

In fact, the reader expands this syntax into a macro call, which is then expanded into a call to string-append as follows:

#"This is Gauche, version ~(gauche-version)."
 ≡
(string-append "This is Gauche, version "
               (x->string (gauche-version))
               ".")

Reader Syntax: #` string-literal

This is the old style of string-interpolation. It is still recognized, but discouraged for the new code.

Inside string-literal, you can use ,expr (instead of ~expr) to evaluate expr. If a comma isn’t immediately followed by a character starting an expression, it loses its special meaning.

#`"This is Gauche, version ,(gauche-version)"

Rationale of the syntax: There is a wide variation of string interpolation syntax among scripting languages. It is usually linked with other syntax of the language (e.g. prefixing $ to mark the evaluated place is in sync with the variable reference syntax in some languages).

The old style of string interpolation syntax was taken from quasiquote syntax, because those two are conceptually similar operations (see Quasiquotation). However, since comma character is frequently used in string literals, it was rather awkward.

We decided that tilde is more suitable as the unquote character for the following reasons.

Note that Scheme allows a wider range of characters in identifier names than typical scripting languages do. Consequently, you will almost always need to use ‘|’ delimiters when you interpolate the value of a variable. For example, while you can write "$year/$month/$day $hour:$minutes:$seconds" in Perl, you should write #"~|year|/~|month|/~day ~|hour|:~|minutes|:~seconds". It may be better to always delimit direct variable references in this syntax to avoid confusion.


6.12.5 String Accessors & Modifiers

Function: string-length string

[R7RS] Returns the length of the (possibly incomplete) string string.

Function: string-size string

Returns the size of a (possibly incomplete) string. The size of a string is the number of bytes it occupies in memory. The same string may have different sizes if the native encoding scheme differs.

For an incomplete string, its length and its size always match.
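
Here is a small illustration of the difference. It assumes the native encoding is UTF-8, which you can check with gauche-character-encoding. Under UTF-8 the character U+00E9 occupies two bytes, so the size exceeds the length by one.

```scheme
(gauche-character-encoding)   ; ⇒ utf-8  (assumed)
(define word "caf\u00e9")
(string-length word)          ; ⇒ 4  (characters)
(string-size word)            ; ⇒ 5  (bytes, under UTF-8)
```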

Function: string-ref cstring k :optional fallback

[R7RS+] Returns the k-th character of a complete string cstring. It is an error to pass an incomplete string.

By default, an error is signaled if k is out of range (negative, or greater than or equal to the length of cstring). However, if an optional argument fallback is given, it is returned in such case. This is Gauche’s extension.
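
Here is the fallback argument in action. This is a sketch of the described behavior, not an exhaustive test:

```scheme
(string-ref "abc" 1)         ; ⇒ #\b
(string-ref "abc" 3 #f)      ; ⇒ #f     (index 3 is out of range)
(string-ref "abc" 3 'none)   ; ⇒ none
;; (string-ref "abc" 3) without a fallback signals an error.
```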

Function: string-byte-ref string k

Returns the k-th byte of a (possibly incomplete) string string. The returned value is an integer in the range between 0 and 255. k must be greater than or equal to zero, and less than (string-size string).

Function: string-set! string k char

[R7RS] Replaces the k-th character of string with char. k must be greater than or equal to zero, and less than (string-length string). The return value is undefined.

If string is an incomplete string, the lower 8 bits of the integer value of char are used to set string’s k-th byte.

See the notes in make-string about performance consideration.
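
Since string literals are immutable in Gauche, string-set! needs a mutable string, such as one returned by string-copy. Here is a minimal sketch. The name greeting is arbitrary.

```scheme
(define greeting (string-copy "hello"))  ; mutable copy of a literal
(string-set! greeting 0 #\j)
greeting   ; ⇒ "jello"
```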

Function: string-byte-set! string k byte

Replaces the k-th byte of string with the integer byte. byte must be in the range between 0 and 255, inclusive. k must be greater than or equal to zero, and less than (string-size string). If string is a complete string, it is turned into an incomplete string by this operation. The return value is undefined.
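
Here is a short sketch. It starts from an incomplete string, so the complete-to-incomplete conversion described above does not come into play.

```scheme
(define buf (make-byte-string 3 0))  ; three zero bytes
(string-byte-set! buf 1 255)
(string-byte-ref buf 1)              ; ⇒ 255
(string-incomplete? buf)             ; ⇒ #t
```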


6.12.6 String Comparison

Function: string=? string1 string2 string3 …

[R7RS] Returns #t iff all arguments are strings with the same content.

Function: string<? string1 string2 string3 …
Function: string<=? string1 string2 string3 …
Function: string>? string1 string2 string3 …
Function: string>=? string1 string2 string3 …

[R7RS] Compares strings in codepoint order. Returns #t iff all the arguments are ordered.
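
These procedures accept more than two arguments, so a single call can check that a whole sequence is ordered. The last example follows from codepoint order: #\Z is U+005A and #\a is U+0061.

```scheme
(string=? "abc" "abc" "abc")          ; ⇒ #t
(string<? "apple" "banana" "cherry")  ; ⇒ #t
(string<? "apple" "cherry" "banana")  ; ⇒ #f
(string<? "Zebra" "apple")            ; ⇒ #t
```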

Function: string-ci=? string1 string2 string3 …
Function: string-ci<? string1 string2 string3 …
Function: string-ci<=? string1 string2 string3 …
Function: string-ci>? string1 string2 string3 …
Function: string-ci>=? string1 string2 string3 …

Case-insensitive string comparison.

These procedures fold their arguments character-wise, according to the Unicode-defined character-by-character case mapping. See char-foldcase for the details (Characters). Character-wise case folding doesn’t handle cases like the German eszett:

(string-ci=? "\u00df" "SS") ⇒ #f

R7RS requires the string-ci* procedures to use string case folding. Gauche provides R7RS-conformant case-insensitive comparison procedures in gauche.unicode (see Full string case conversion). If you write R7RS code and import the (scheme char) library, you get gauche.unicode’s string-ci* procedures.
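
To illustrate the difference, the sketch below compares the built-in character-wise comparison with full string case folding through string-foldcase from gauche.unicode. It assumes that string-foldcase maps U+00DF to "ss", as Unicode full case folding specifies.

```scheme
;; Built-in, character-wise folding is enough for plain ASCII.
(string-ci=? "Hello" "hELLO")   ; ⇒ #t

;; Full string case folding handles the eszett.
(use gauche.unicode)
(string=? (string-foldcase "\u00df") (string-foldcase "SS"))  ; ⇒ #t
```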


6.12.7 String utilities

Function: substring string start end

[R7RS] Returns a substring of string, starting from the start-th character (inclusive) and ending at the end-th character (exclusive). The start and end arguments must satisfy 0 <= start <= end <= N, where N is the length of the string.

When start is zero and end is N, this procedure returns a copy of string.

Actually, the extended string-copy explained below is a superset of substring. This procedure is kept mostly for compatibility with R7RS programs. See also subseq in Sequence framework, for the generic version.

Function: string-append string …

[R7RS] Returns a newly allocated string whose content is concatenation of string ….

See also string-concatenate (see SRFI-13 String reverse & append).

Function: string->list string :optional start end
Function: list->string list

[R7RS] Converts a string to a list of characters or vice versa.

You can give optional start and end indexes to string->list.

For list->string, every element of list must be a character, or an error is signaled. If you want to build a string out of a mixed list of strings and characters, you may want to use tree->string in Lazy text construction.
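
Here are some examples of both directions, including the optional range of string->list:

```scheme
(string->list "hello")       ; ⇒ (#\h #\e #\l #\l #\o)
(string->list "hello" 1 3)   ; ⇒ (#\e #\l)
(list->string '(#\o #\k))    ; ⇒ "ok"
```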

Function: string-copy string :optional start end

[R7RS] Returns a copy of string. You can give start and/or end indexes to extract part of the original string (which effectively makes string-copy a superset of substring).

If only the start argument is given, a substring beginning from the start-th character (inclusive) to the end of string is returned. If both start and end arguments are given, a substring from the start-th character (inclusive) to the end-th character (exclusive) is returned. See substring above for the conditions that start and end must satisfy.

Note: R7RS’s destructive version string-copy! is provided by the srfi-13 module (see String library).
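
Here are a few examples of how string-copy relates to substring:

```scheme
(string-copy "abcdef")       ; ⇒ "abcdef"
(string-copy "abcdef" 2)     ; ⇒ "cdef"
(string-copy "abcdef" 2 4)   ; ⇒ "cd"
(substring "abcdef" 2 4)     ; ⇒ "cd"  (same range as the line above)
```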

Function: string-fill! string char :optional start end

[R7RS] Fills string with char. Optional start and end limit the affected range.

(define s (string-copy "orange"))   ; literals are immutable
(string-fill! s #\X)
s ⇒ "XXXXXX"
(define t (string-copy "orange"))
(string-fill! t #\X 2 4)
t ⇒ "orXXge"

Function: string-join strs :optional delim grammar

[SRFI-13] Concatenates the strings in the list strs, with the string delim as ‘glue’.

The argument grammar may be one of the following symbols, which specify how the strings are concatenated.

infix
Inserts delim between adjacent strings. This mode is the default. Note that this mode introduces ambiguity when strs is an empty list or a list with a null string.

(string-join '("apple" "mango" "banana") ", ")
  ⇒ "apple, mango, banana"
(string-join '() ":")
  ⇒ ""
(string-join '("") ":")
  ⇒ ""

strict-infix
Works like infix, but an empty list is not allowed as strs, thus avoiding ambiguity.

prefix
Puts delim before each string.

(string-join '("usr" "local" "bin") "/" 'prefix)
  ⇒ "/usr/local/bin"
(string-join '() "/" 'prefix)
  ⇒ ""
(string-join '("") "/" 'prefix)
  ⇒ "/"

suffix
Puts delim after each string.

(string-join '("a" "b" "c") "&" 'suffix)
  ⇒ "a&b&c&"
(string-join '() "&" 'suffix)
  ⇒ ""
(string-join '("") "&" 'suffix)
  ⇒ "&"

Function: string-scan string item :optional return
Function: string-scan-right string item :optional return

Scan item (either a string or a character) in string. While string-scan finds the leftmost match, string-scan-right finds the rightmost match.

The return argument specifies what value should be returned when item is found in string. It must be one of the following symbols.

index
Returns the index in string if item is found, or #f. This is the default behavior.

(string-scan "abracadabra" "ada") ⇒ 5
(string-scan "abracadabra" #\c) ⇒ 4
(string-scan "abracadabra" "aba") ⇒ #f

before
Returns the substring of string before item, or #f if item is not found.

(string-scan "abracadabra" "ada" 'before) ⇒ "abrac"
(string-scan "abracadabra" #\c 'before) ⇒ "abra"

after
Returns the substring of string after item, or #f if item is not found.

(string-scan "abracadabra" "ada" 'after) ⇒ "bra"
(string-scan "abracadabra" #\c 'after) ⇒ "adabra"

before*
Returns the substring of string before item, and the rest of string beginning with item. If item is not found, returns (values #f #f).

(string-scan "abracadabra" "ada" 'before*)
  ⇒ "abrac" and "adabra"
(string-scan "abracadabra" #\c 'before*)
  ⇒ "abra" and "cadabra"

after*
Returns the substring of string up to the end of item, and the rest. If item is not found, returns (values #f #f).

(string-scan "abracadabra" "ada" 'after*)
  ⇒ "abracada" and "bra"
(string-scan "abracadabra" #\c 'after*)
  ⇒ "abrac" and "adabra"

both
Returns the substrings of string before and after item. If item is not found, returns (values #f #f).

(string-scan "abracadabra" "ada" 'both)
  ⇒ "abrac" and "bra"
(string-scan "abracadabra" #\c 'both)
  ⇒ "abra" and "adabra"
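
The starred modes and both return two values, so their results can be bound with receive. Here is a minimal sketch that splits a key=value pair. The input strings are made up for illustration.

```scheme
;; Split at the first #\=; 'both returns the parts before and after it.
(receive (key val) (string-scan "lang=scheme" #\= 'both)
  (list key val))   ; ⇒ ("lang" "scheme")

;; When the item is absent, both values are #f.
(receive (key val) (string-scan "no-equals" #\= 'both)
  (list key val))   ; ⇒ (#f #f)
```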

Function: string-split string splitter :optional limit

Splits string by splitter and returns a list of strings. splitter can be a character, a character set, a string, a regexp, or a procedure.

If splitter is a character, the character is used as a delimiter.

If splitter is a character set, any consecutive characters that are members of the character set are used as a delimiter.

If a procedure is given to splitter, it is called for each character in string, and the consecutive characters that caused splitter to return a true value are used as a delimiter.

(string-split "/aa/bb//cc" #\/)    ⇒ ("" "aa" "bb" "" "cc")
(string-split "/aa/bb//cc" "/")    ⇒ ("" "aa" "bb" "" "cc")
(string-split "/aa/bb//cc" "//")   ⇒ ("/aa/bb" "cc")
(string-split "/aa/bb//cc" #[/])   ⇒ ("" "aa" "bb" "cc")
(string-split "/aa/bb//cc" #/\/+/) ⇒ ("" "aa" "bb" "cc")
(string-split "/aa/bb//cc" #[\w])  ⇒ ("/" "/" "//" "")
(string-split "/aa/bb//cc" char-alphabetic?) ⇒ ("/" "/" "//" "")

;; some boundary cases
(string-split "abc" #\/) ⇒ ("abc")
(string-split ""    #\/) ⇒ ("")

If limit is given and not #f, it must be a nonnegative integer and specifies the maximum number of matches to the splitter. Once the limit is reached, the rest of string is included in the result as is.

(string-split "a.b..c" "." 0)   ⇒ ("a.b..c")
(string-split "a.b..c" "." 1)   ⇒ ("a" "b..c")
(string-split "a.b..c" "." 2)   ⇒ ("a" "b" ".c")

See also string-tokenize (see SRFI-13 other string operations).


6.12.8 Incomplete strings

A string can be flagged as "incomplete" if it may contain byte sequences that do not constitute a valid multibyte character in Gauche’s native encoding.

Incomplete strings may be generated in several circumstances; for example, reading binary data as a string, reading string data that has been ‘chopped’ in the middle of a multibyte character, or concatenating a string with other incomplete strings.

Incomplete strings should be regarded as an exceptional case. They used to be a way to handle byte strings, but now we have u8vector (see Uniform vectors) for that purpose. In fact, we’re planning to remove them in future releases.

Just in case you happen to get an incomplete string, you can convert it to a complete string with the following procedure:

Function: string-incomplete->complete str :optional handling

Reinterprets the content of an incomplete string str and returns a newly created complete string from it. The handling argument specifies how to handle the illegal byte sequences in str.

#f
If str contains an illegal byte sequence, gives up the conversion and returns #f. This is the default behavior.

:omit
Omits any illegal byte sequences. Always returns a complete string.

a character
Replaces each byte in illegal byte sequences with the given character. Always returns a complete string.

If str is already a complete string, its copy is returned.
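
Here is a sketch of the handling argument. It assumes the native encoding is UTF-8. Under UTF-8, a lone #xe3 byte (the lead byte of a three-byte sequence with nothing after it) is illegal. The names ok and bad are arbitrary.

```scheme
(define ok  (make-byte-string 2 65))    ; bytes "AA", valid in any ASCII-based encoding
(define bad (make-byte-string 1 #xe3))  ; truncated multibyte sequence under UTF-8

(string-incomplete->complete ok)        ; ⇒ "AA"
(string-incomplete->complete "abc")     ; ⇒ "abc"  (already complete: a copy)
(string-incomplete->complete bad)       ; ⇒ #f     (default: give up)
(string-incomplete->complete bad #\?)   ; ⇒ "?"    (each illegal byte replaced)
```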
