For Development HEAD DRAFT

11.6 srfi.13 - String library

Module: srfi.13

Defines a large set of string-related procedures.

It was one of the earliest popular SRFIs, but some of its procedures turned out not to align well with recent Scheme developments. Two later libraries adapt this SRFI to a more “modern” form: srfi.152 - String library (reduced), and srfi.130 - Cursor-based string library. Consider using them for new development.

Many procedures in this SRFI specify the position of a character in a string with an integer index. Although that is portable, it is not optimal for multibyte strings. Gauche natively supports string cursors, which are more efficient than integer indexes (see String cursors). All SRFI-13 procedures that accept integer indexes can also accept string cursors.


11.6.1 General conventions

A few conventions are common across the string library API. They are described here and not repeated in each function description.

argument convention

The following argument names imply their types.

s, s1, s2

Those arguments must be strings.

char/char-set/pred

This argument can be a character, a character-set object, or a predicate that takes a single character and returns a boolean value. “Applying char/char-set/pred to a character” means the following: if char/char-set/pred is a character, it is compared to the given character; if char/char-set/pred is a character set, it is checked whether the character set contains the given character; if char/char-set/pred is a procedure, it is applied to the given character. “A character satisfies char/char-set/pred” means that such an application to the character yields a true value.

start, end

Many SRFI-13 functions take these two optional arguments, which limit the operation to the part of the input string from the start-th character (inclusive) to the end-th character (exclusive). When specified, the condition 0 <= start <= end <= length of the string must be satisfied. The default values of start and end are 0 and the length of the string, respectively.

shared variant

Some functions have variants with “/shared” attached to their names. SRFI-13 defines those functions so that the result may share part of the input string, for better performance. Gauche doesn’t have a concept of shared strings, and these functions are mere synonyms of their non-shared variants. However, Gauche internally shares the storage of strings, so generally you don’t need to worry about the overhead of copying substrings.

right variant

Most functions work from left to right on the input string. Some functions have variants with “-right” attached to their names, which work from right to left.


11.6.2 String predicates

Function: string-null? s

[SRFI-13]{srfi.13} Returns #t if s is an empty string, "".

Function: string-every char/char-set/pred s :optional start end

[SRFI-13]{srfi.13} Sees if every character in s satisfies char/char-set/pred. If so, string-every returns the value that is returned at the last application of char/char-set/pred. If any of the applications returns #f, string-every returns #f immediately.

Function: string-any char/char-set/pred s :optional start end

[SRFI-13]{srfi.13} Sees if any character in s satisfies char/char-set/pred. If so, string-any returns the value that is returned by the application. If no character satisfies char/char-set/pred, #f is returned.
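
As a brief illustration of these two predicates, the following sketch uses each of the three forms of char/char-set/pred. The sample strings are arbitrary and chosen only for this example.

```scheme
(use srfi.13)

;; Predicate form: every character must satisfy char-alphabetic?.
(string-every char-alphabetic? "Aloha")    ; ⇒ #t
(string-every char-alphabetic? "Aloha1")   ; ⇒ #f

;; Character-set form: is there a digit anywhere in the string?
(string-any #[0-9] "room 101")             ; ⇒ #t
(string-any #[0-9] "lobby")                ; ⇒ #f

;; Character form, limited by start and end to the range "Al".
(string-any #\o "Aloha" 0 2)               ; ⇒ #f
```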


11.6.3 String constructors

Function: string-tabulate proc len

[SRFI-13]{srfi.13} proc must be a procedure that takes an integer argument and returns a character. string-tabulate creates a string, whose i-th character is calculated by (proc i).

(string-tabulate
  (lambda (i) (integer->char (+ i #x30))) 10)
 ⇒ "0123456789"

Function: string-unfold p f g seed :optional base make-final

[SRFI-13]{srfi.13} A fundamental string builder. The p, f and g are procedures, taking the current seed value. The stop predicate p determines when to stop: If it returns a true value, string building stops. The mapping function f returns a character from the current seed value. The next seed function g returns a next seed value from the current seed value. The seed argument gives the initial seed value.

(string-unfold (^n (= n 10))
               (^n (integer->char (+ n 48)))
               (^n (+ n 1))
               0)
  ⇒ "0123456789"

The optional argument base is, when given, prepended to the result string. Another optional argument make-final is a procedure that takes the last return value of g and returns a string that becomes the suffix of the result string.

(string-unfold (^n (= n 10))
               (^n (integer->char (+ n 48)))
               (^n (+ n 1))
               0 "foo" x->string)
  ⇒ "foo012345678910"

Function: string-unfold-right p f g seed :optional base make-final

[SRFI-13]{srfi.13} Another fundamental string builder. The meanings of the arguments are the same as in string-unfold. The only difference is that the string is built right-to-left. The optional base, if given, becomes the suffix of the result, and the result of make-final becomes the prefix.

(string-unfold-right (^n (= n 10))
                     (^n (integer->char (+ n 48)))
                     (^n (+ n 1))
                     0 "foo" x->string)
  ⇒ "109876543210foo"

Function: reverse-list->string char-list

[SRFI-13]{srfi.13} ≡ (list->string (reverse char-list)).


11.6.4 String selection

Function: substring/shared s start :optional end

[SRFI-13]{srfi.13} In Gauche, this is the same as substring, except that the end argument is optional.

(substring/shared "abcde" 2) ⇒ "cde"

Function: string-copy! target tstart s :optional start end

[SRFI-13]{srfi.13} Copies a string s into a string target, starting at the position tstart. The target string must be mutable. The optional start and end arguments limit the range of s. If the copied string runs past the end of target, an error is signaled.

(define s (string-copy "abcde"))
(string-copy! s 2 "ZZ")
s ⇒ "abZZe"

It is ok to pass the same string as target and s; this always works, even if the source and destination regions overlap.

Note that Gauche encourages you to treat strings as immutable objects. Internally, a string is an indirect pointer to an immutable entity, and mutating a string means copying the original entity and creating a new one. It doesn’t “save allocations”. Always use the functional version string-copy unless you absolutely need to replace a string in-place. See String utilities.

Function: string-take s nchars
Function: string-drop s nchars
Function: string-take-right s nchars
Function: string-drop-right s nchars

[SRFI-13]{srfi.13} Returns a string consisting of the first nchars characters of s (string-take), or s without its first nchars characters (string-drop). The *-right variations count from the end of the string. The returned string is guaranteed to be a copy of s, even if no character is dropped.

(string-take "abcde" 2) ⇒ "ab"
(string-drop "abcde" 2) ⇒ "cde"

(string-take-right "abcde" 2) ⇒ "de"
(string-drop-right "abcde" 2) ⇒ "abc"

Function: string-pad s len :optional char start end
Function: string-pad-right s len :optional char start end

[SRFI-13]{srfi.13} If a string s is shorter than len, returns a string of length len in which char is padded on the left or right, respectively. If s is longer than len, the rightmost or leftmost len characters are taken, respectively. Char defaults to #\space. If start and end are provided, the substring of s is used as the source.

(string-pad "abc" 10)    ⇒ "       abc"
(string-pad "abcdefg" 3) ⇒ "efg"

(string-pad-right "abc" 10) ⇒ "abc       "

(string-pad "abcdefg" 10 #\+ 2 5)
  ⇒ "+++++++cde"

Function: string-trim s :optional char/char-set/pred start end
Function: string-trim-right s :optional char/char-set/pred start end
Function: string-trim-both s :optional char/char-set/pred start end

[SRFI-13]{srfi.13} Removes characters that match char/char-set/pred from s. String-trim removes the characters from left of s, string-trim-right does from right, and string-trim-both does from both sides. Char/char-set/pred defaults to #[\s], i.e. a char-set of whitespaces. If start and end are provided, the substring of s is used as the source.

(string-trim "   abc  ")       ⇒ "abc  "
(string-trim-right "   abc  ") ⇒ "   abc"
(string-trim-both "   abc  ")  ⇒ "abc"

11.6.5 String comparison

Function: string-compare s1 s2 proc< proc= proc> :optional start1 end1 start2 end2
Function: string-compare-ci s1 s2 proc< proc= proc> :optional start1 end1 start2 end2

[SRFI-13]{srfi.13} Compares two strings s1 and s2 codepoint-wise from the left. When a mismatch is found at the index k of s1, calls proc< with k if s1’s codepoint is smaller than the corresponding codepoint of s2, or calls proc> with k if s1’s codepoint is greater. If the two strings are the same, calls proc= with the index of the last compared position in s1.

(string-compare "abcd" "abzd"
                (^i `(< ,i)) (^i `(= ,i)) (^i `(> ,i)))
  ⇒ (< 2)

(string-compare "abcd" "abcd"
                (^i `(< ,i)) (^i `(= ,i)) (^i `(> ,i)))
  ⇒ (= 3)

The optional arguments restrict the range of the input strings; however, the index passed to one of the procedures is always an index from the beginning of s1.

(string-compare "zzabcdyy" "abcz"
   (^i `(< ,i)) (^i `(= ,i)) (^i `(> ,i)) 2 6 0 4)
 ⇒ (< 5)

(string-compare "zzabcdyy" "abcz"
   (^i `(< ,i)) (^i `(= ,i)) (^i `(> ,i)) 2 5 0 3)
 ⇒ (= 4)

The case-insensitive variant, string-compare-ci, compares each codepoint with character-wise case-folding. It won’t consider special case folding such as German eszett.

Function: string= s1 s2 :optional start1 end1 start2 end2
Function: string<> s1 s2 :optional start1 end1 start2 end2
Function: string< s1 s2 :optional start1 end1 start2 end2
Function: string<= s1 s2 :optional start1 end1 start2 end2
Function: string> s1 s2 :optional start1 end1 start2 end2
Function: string>= s1 s2 :optional start1 end1 start2 end2

[SRFI-13]{srfi.13} Compares two strings s1 and s2. The optional arguments can limit the portion of each string to be compared. Comparison is done character-wise.
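
The following sketch shows how the optional ranges select the portions to compare. The strings are arbitrary, and the comments only say whether a result is true or false, since this section does not state the exact true value returned.

```scheme
(use srfi.13)

;; Compare the tail of s1, starting at index 3, against all of s2.
(string= "foobar" "bar" 3)            ; true: "bar" vs. "bar"

;; Compare "bar" (indexes 3..6 of s1) with "baz" (indexes 0..3 of s2).
(string< "foobar" "bazaar" 3 6 0 3)   ; true: "bar" sorts before "baz"
(string<> "foobar" "bazaar" 3 6 0 3)  ; true: the two portions differ
(string>= "foobar" "bazaar" 3 6 0 3)  ; false
```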

Note: The builtin procedures string=? etc. can also be used for character-wise string comparison, but they take arguments differently. See String comparison.

Function: string-ci= s1 s2 :optional start1 end1 start2 end2
Function: string-ci<> s1 s2 :optional start1 end1 start2 end2
Function: string-ci< s1 s2 :optional start1 end1 start2 end2
Function: string-ci<= s1 s2 :optional start1 end1 start2 end2
Function: string-ci> s1 s2 :optional start1 end1 start2 end2
Function: string-ci>= s1 s2 :optional start1 end1 start2 end2

[SRFI-13]{srfi.13} Compares two strings s1 and s2 in a case-insensitive way. The optional arguments can limit the portion of each string to be compared. Case folding and comparison are done character-wise, so these procedures don’t consider case folding that affects multiple characters.

Note: There are two other sets of string comparison operations, both named string-ci=? etc. The builtin version (see String comparison) does character-wise comparison. The one in gauche.unicode uses full-string case conversion (see Full string case conversion). The R7RS version is the latter.

Function: string-hash s :optional bound start end
Function: string-hash-ci s :optional bound start end

[SRFI-13]{srfi.13} (Note: Gauche has builtin string-hash and string-ci-hash according to SRFI-128. See Hashing for the details. SRFI-13’s API is upward-compatible with SRFI-128’s. The underlying hash algorithm is the same as the builtin one, so string-hash returns the same value as the builtin one for the same string if the optional arguments are omitted. On the other hand, the builtin string-ci-hash uses string case folding (e.g. German eszett and SS are the same), while SRFI-13’s string-hash-ci uses character-wise case folding. Unless there’s a strong reason otherwise, we recommend that new code use the builtin SRFI-128 versions instead of SRFI-13’s.)

Calculates hash value of a string s. For string-hash-ci, character-wise case folding is done before calculating the hash value.

If the optional bound argument is given, it must be a positive exact integer, and the return value is a nonnegative integer less than bound. The optional start and end arguments restrict the calculation to that portion of s.
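
A minimal sketch of using bound to map a string key to a bucket number. The helper name bucket-index and the bucket count 16 are made up for this example; no concrete hash values are shown because they depend on the hash algorithm.

```scheme
(use srfi.13)

;; bucket-index is a hypothetical helper, not part of SRFI-13.
;; It returns an exact integer in the range [0, nbuckets).
(define (bucket-index key nbuckets)
  (string-hash key nbuckets))

(bucket-index "aloha" 16)

;; Hash only the portion of the string from index 2 onward.
(string-hash "aloha" 16 2)
```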


11.6.6 String prefixes & suffixes

Function: string-prefix-length s1 s2 :optional start1 end1 start2 end2
Function: string-suffix-length s1 s2 :optional start1 end1 start2 end2
Function: string-prefix-length-ci s1 s2 :optional start1 end1 start2 end2
Function: string-suffix-length-ci s1 s2 :optional start1 end1 start2 end2

[SRFI-13]{srfi.13} Returns the length of the longest common prefix/suffix of two strings, s1 and s2. The optional arguments restrict the range of search. The *-ci variations use case folding character comparison.

(string-prefix-length "abacus" "abalone")   ⇒ 3
(string-prefix-length "machine" "umbrella") ⇒ 0
(string-suffix-length "peeking" "poking")   ⇒ 4

(string-prefix-length "obvious" "oblivious" 2 7 4 9)
  ⇒ 5

Function: string-prefix? s1 s2 :optional start1 end1 start2 end2
Function: string-suffix? s1 s2 :optional start1 end1 start2 end2
Function: string-prefix-ci? s1 s2 :optional start1 end1 start2 end2
Function: string-suffix-ci? s1 s2 :optional start1 end1 start2 end2

[SRFI-13]{srfi.13} Returns true iff s1 is a prefix or suffix of s2, respectively. The optional arguments limit the range of s1 and s2 to look at. The *-ci variations use case folding character comparison.

(string-prefix? "sch" "scheme")   ⇒ #t
(string-prefix? "lisp" "scheme")  ⇒ #f

(string-suffix? "eme" "scheme")   ⇒ #t
(string-suffix? "eme" "lisp")     ⇒ #f

(string-prefix? "mit-scheme" "scheme-family" 4) ⇒ #t

11.6.7 String searching

Function: string-index s char/char-set/pred :optional start end
Function: string-index-right s char/char-set/pred :optional start end

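[SRFI-13]{srfi.13} Looks for the first element in a string s that matches char/char-set/pred, and returns its index. string-index searches from the left, while string-index-right searches from the right. If no element matching char/char-set/pred is found in s, returns #f. The optional start and end arguments limit the range of s to search.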

(string-index "Aloha oe" #\a) ⇒ 4
(string-index "Aloha oe" #[Aa]) ⇒ 0
(string-index "Aloha oe" #[\s]) ⇒ 5
(string-index "Aloha oe" char-lower-case?) ⇒ 1
(string-index "Aloha oe" #\o 3) ⇒ 6

See also the Gauche built-in procedure string-scan (String utilities), if you need speed over portability.

Function: string-skip s char/char-set/pred :optional start end
Function: string-skip-right s char/char-set/pred :optional start end

[SRFI-13]{srfi.13} Looks for the first element that does not match char/char-set/pred and returns its index. If such element is not found, returns #f. Optional start and end limit the range of s to search.
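
For instance, string-skip can locate the end of a run of leading characters. This is only a small sketch with arbitrary sample strings.

```scheme
(use srfi.13)

;; Index of the first character that is not a space.
(string-skip "   aloha" #\space)   ; ⇒ 3

;; Every character is a digit, so there is nothing to skip to.
(string-skip "12345" #[0-9])       ; ⇒ #f
```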

Function: string-count s char/char-set/pred :optional start end

[SRFI-13]{srfi.13} Counts the number of elements in s that match char/char-set/pred. The optional start and end arguments limit the range of s to search.
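
A short example, using an arbitrary sample string:

```scheme
(use srfi.13)

;; Count the occurrences of a single character.
(string-count "banana" #\a)          ; ⇒ 3

;; Count the vowels in the portion starting at index 2 ("nana").
(string-count "banana" #[aeiou] 2)   ; ⇒ 2
```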

Function: string-contains s1 s2 :optional start1 end1 start2 end2
Function: string-contains-ci s1 s2 :optional start1 end1 start2 end2

[SRFI-13]{srfi.13} Looks for a string s2 inside another string s1. If found, returns the index in s1 at which the matching string begins. Returns #f otherwise. The optional start1, end1, start2 and end2 arguments limit the ranges of s1 and s2.
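
A few illustrative calls, with sample strings chosen only for this sketch:

```scheme
(use srfi.13)

(string-contains "Aloha oe" "ha")      ; ⇒ 3
(string-contains "Aloha oe" "HA")      ; ⇒ #f
(string-contains-ci "Aloha oe" "HA")   ; ⇒ 3

;; Start searching at index 3 of s1; the result is still an index into s1.
(string-contains "Aloha oe" "o" 3)     ; ⇒ 6
```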

See also the Gauche built-in procedure string-scan (String utilities), if you need speed over portability.


11.6.8 String case mapping

Function: string-titlecase s :optional start end
Function: string-titlecase! s :optional start end
Function: string-upcase s :optional start end
Function: string-upcase! s :optional start end
Function: string-downcase s :optional start end
Function: string-downcase! s :optional start end

[SRFI-13]{srfi.13} Converts a string s to titlecase, upcase or downcase, respectively. These operations use the character-by-character mapping provided by char-upcase etc. That is, string-upcase and string-downcase can be understood as follows:

(string-upcase s)
  ≡ (string-map char-upcase s)
(string-downcase s)
  ≡ (string-map char-downcase s)

If you need full case mapping that handles the case where a character is mapped to more than one character, use the procedures with the same names in the gauche.unicode module (see Full string case conversion).

The linear-update versions string-titlecase!, string-upcase! and string-downcase! destroy s to store the result. Note that in Gauche, using those procedures doesn’t save anything, since string mutation is expensive by design. They are provided merely for completeness.
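
A small sketch of the functional versions on a plain ASCII string, where character-wise and full case mapping give the same result:

```scheme
(use srfi.13)

(string-upcase "aloha oe")      ; ⇒ "ALOHA OE"
(string-downcase "ALOHA OE")    ; ⇒ "aloha oe"
(string-titlecase "aloha oe")   ; ⇒ "Aloha Oe"
```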


11.6.9 String reverse & append

Function: string-reverse s :optional start end
Function: string-reverse! s :optional start end

[SRFI-13]{srfi.13} Returns a string in which the character positions are reversed from s. string-reverse! modifies s.

(string-reverse "mahalo") ⇒ "olaham"
(string-reverse "mahalo" 3) ⇒ "ola"
(string-reverse "mahalo" 1 4) ⇒ "aha"

(let ((s (string-copy "mahalo")))
  (string-reverse! s 1 5)
  s)
  ⇒ "mlahao"

Function: string-concatenate string-list

[SRFI-13]{srfi.13} Concatenates a list of strings.

(string-concatenate '("humuhumu" "nukunuku" "apua" "`a"))
  ⇒ "humuhumunukunukuapua`a"

Function: string-concatenate/shared string-list
Function: string-append/shared s …

[SRFI-13]{srfi.13} “Shared” versions of string-concatenate and string-append. In Gauche, these are just synonyms of the non-shared versions.

Function: string-concatenate-reverse string-list
Function: string-concatenate-reverse/shared string-list

[SRFI-13]{srfi.13} Reverses string-list before concatenation. “Shared” version works the same in Gauche.
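
This is handy when pieces are accumulated with cons, which leaves the newest piece first. The sketch below assumes a hypothetical helper, join-pieces, written only for this example.

```scheme
(use srfi.13)

(string-concatenate-reverse '("c" "b" "a"))   ; ⇒ "abc"

;; join-pieces is a hypothetical helper: it conses each piece onto an
;; accumulator (so the accumulator ends up reversed), then joins the
;; pieces in their original order in one step.
(define (join-pieces pieces)
  (let loop ((ps pieces) (acc '()))
    (if (null? ps)
        (string-concatenate-reverse acc)
        (loop (cdr ps) (cons (car ps) acc)))))

(join-pieces '("humu" "humu" "nuku"))         ; ⇒ "humuhumunuku"
```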


11.6.10 String mapping

Function: string-map! proc s :optional start end

[SRFI-13]{srfi.13} string-map! applies proc to every character of s, and stores the results into s. It is an error if proc returns a non-character.

(let ((s (string-copy "wikiwiki")))
  (string-map! char-upcase s 4)
  s)
  ⇒ "wikiWIKI"

Function: string-fold kons knil s :optional start end
Function: string-fold-right kons knil s :optional start end

[SRFI-13]{srfi.13} Like fold and fold-right (see Walking over lists), but works on a string instead of a list.

(string-fold cons '() "abcde")
  ⇒ (#\e #\d #\c #\b #\a)
(string-fold-right cons '() "abcde")
  ⇒ (#\a #\b #\c #\d #\e)

Function: string-for-each-index proc s :optional start end

[SRFI-13]{srfi.13} Calls proc on each index of the string s, from left to right. The result of proc is discarded. The optional start and end arguments can be integer indexes or string cursors, restricting the range of the input string to be traversed.
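
As a sketch, the following collects the positions at which a character occurs. The helper indexes-of is hypothetical, and the example assumes proc receives integer indexes when no cursors are passed in.

```scheme
(use srfi.13)

;; indexes-of is a hypothetical helper written for this example.
(define (indexes-of ch s)
  (let ((acc '()))
    (string-for-each-index
     (lambda (i)
       ;; Record i whenever the character at that position matches.
       (when (char=? (string-ref s i) ch)
         (set! acc (cons i acc))))
     s)
    (reverse acc)))

(indexes-of #\a "banana")   ; ⇒ (1 3 5)
```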


11.6.11 String rotation

Function: xsubstring s from :optional to start end

[SRFI-13]{srfi.13} Takes a substring of an infinite repetition of the string s, between the index from (inclusive) and the index to (exclusive). If to is omitted, the length of s is assumed.

For example, if s is "abcde", we repeat it infinitely to both sides. So the 5n-th character for an integer n is always #\a, which extends to negative n as well.

(xsubstring "abcde" 2 10)
  ⇒ "cdeabcde"
(xsubstring "abcde" -9 -2)
  ⇒ "bcdeabc"

The optional start and end arguments can be integer indexes or string cursors. When they are given, the procedure works as (xsubstring (substring s start end) from to).

Function: string-xcopy! target tstart s sfrom :optional sto start end

[SRFI-13]{srfi.13} It works as (string-copy! target tstart (xsubstring s sfrom sto start end)).


11.6.12 Other string operations

Function: string-replace s1 s2 start1 end1 :optional start2 end2

[SRFI-13]{srfi.13} Returns a new string whose content is a copy of a string s1, except the part beginning from the index start1 (inclusive) and ending at the index end1 (exclusive) is replaced by a string s2. When optional start2 and end2 arguments are given, s2 is trimmed first according to them. The size of the gap, (- end1 start1), doesn’t need to be the same as the size of the inserted string. Effectively, this is the same as the following code.

(string-append (substring s1 0 start1)
               (substring s2 start2 end2)
               (substring s1 end1 (string-length s1)))
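
A few concrete calls may make the equivalence easier to see. The strings are arbitrary samples.

```scheme
(use srfi.13)

;; Replace the two characters at indexes 2..4 with a longer string.
(string-replace "abcdef" "XYZ" 2 4)        ; ⇒ "abXYZef"

;; An empty gap (start1 = end1) inserts without removing anything.
(string-replace "abcdef" "-" 3 3)          ; ⇒ "abc-def"

;; Use only part of s2: indexes 1..3 of "WXYZ", that is, "XY".
(string-replace "abcdef" "WXYZ" 2 4 1 3)   ; ⇒ "abXYef"
```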

Function: string-tokenize s :optional token-set start end

[SRFI-13]{srfi.13} Splits the string s into a list of substrings, where each substring is a maximal non-empty contiguous sequence of characters from the character set token-set. The default of token-set is char-set:graphic (see Predefined character sets).
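
For example, with the default token-set, punctuation stays attached to the words because it is graphic; passing a narrower character set picks out only the matching runs. The second sample string is arbitrary.

```scheme
(use srfi.13)

(string-tokenize "Help make programs run, run, RUN!")
  ; ⇒ ("Help" "make" "programs" "run," "run," "RUN!")

;; Extract the runs of digits only.
(string-tokenize "a1b22c333" #[0-9])
  ; ⇒ ("1" "22" "333")
```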

See also Gauche’s built-in string-split (see String utilities), which provides a similar feature but splits on different criteria.


11.6.13 String filtering

Function: string-filter char/char-set/pred s :optional start end
Function: string-delete char/char-set/pred s :optional start end

[SRFI-13]{srfi.13} Returns a string consisting of the characters in a string s that pass (string-filter) or don’t pass (string-delete) the test indicated by char/char-set/pred.

(string-filter char-upper-case? "Hello, World!")
  ⇒ "HW"

(string-delete char-upper-case? "Hello, World!")
  ⇒ "ello, orld!"

(string-delete #\l "Hello, World!")
  ⇒ "Heo, Word!"

(string-filter #[\w] "Hello, World!")
  ⇒ "HelloWorld"

Note: SRFI-13 was revised after finalization to switch the order of the arguments char/char-set/pred and s. At the time of finalization, the order was (string-filter s pred) and Gauche implemented it accordingly. However, most existing implementations follow the revised order, since that was what the SRFI-13 reference implementation had.

So, from 0.9.4, we revised the API to comply with the current SRFI-13 spec, but we still accept the old order so as not to break old code. We recommend that new code use the new order.


11.6.14 Low-level string procedures

Here are some helper procedures useful to write other string-processing utilities similar to SRFI-13:

Function: string-parse-start+end proc s args
Function: string-parse-final-start+end proc s args

[SRFI-13]{srfi.13} Most SRFI-13 procedures take optional start and end arguments. These procedures look for them in the rest arguments args, and if they’re not provided, supply the default values.

The proc argument is the name (symbol) of the procedure, to be used in the error condition, and s is the string to be processed.

Both expect an optional start argument at the beginning of args, followed by an optional end argument. If the end argument is missing, the length of s is assumed. If the start argument is also missing, 0 is assumed. If the arguments are provided, they must be exact integers, and 0 <= start <= end <= (string-length s) must be satisfied; otherwise, an error is signaled.

If there are more than two arguments in args, string-parse-final-start+end raises an error (in other words, it requires that the argument list end with the end argument at most), while string-parse-start+end permits it and returns the rest of the arguments, along with the start and end indexes.

Both functions return three values: the rest of the argument list, the start index, and the end index.
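
As a minimal sketch of how a user-defined procedure might accept the usual optional range, the hypothetical count-vowels below parses its rest arguments with string-parse-start+end and passes the resulting indexes on to string-count.

```scheme
(use srfi.13)

;; count-vowels is a hypothetical procedure written for this example.
;; It accepts optional start and end arguments after the string.
(define (count-vowels s . args)
  (receive (rest start end) (string-parse-start+end 'count-vowels s args)
    ;; rest holds any arguments after end; it is unused here.
    (string-count s #[aeiou] start end)))

(count-vowels "education")       ; ⇒ 5
(count-vowels "education" 2)     ; ⇒ 4
(count-vowels "education" 2 5)   ; ⇒ 2
```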

Macro: let-string-start+end (start end [rest]) proc-exp s-exp args-exp body …

[SRFI-13]{srfi.13}

Function: check-substring-spec proc s start end
Function: substring-spec-ok? s start end

[SRFI-13]{srfi.13}

Function: make-kmp-restart-vector s :optional c= start end

[SRFI-13]{srfi.13}

Function: kmp-step pat rv c i c= p-start

[SRFI-13]{srfi.13}

Function: string-kmp-partial-search pat rv s i :optional c= p-start s-start s-end

[SRFI-13]{srfi.13}


