srfi.13 - String library
Defines a large set of string-related procedures.
It was one of the earliest popular SRFIs, but some of its procedures
turned out not to align well with recent Scheme developments.
See srfi.152 - String library (reduced) and srfi.130 - Cursor-based
string library, which adapt this SRFI to a more “modern” form.
Consider using them for new development.
Many procedures in this SRFI specify the position of a character in a string with an integer index. Although this is portable, it is not optimal for multibyte strings. Gauche natively supports string cursors, which are more efficient than integer indexes (see String cursors). All SRFI-13 procedures that accept integer indexes can also accept string cursors.
There are a few conventions common to the string library API, which are not repeated in each function description.
The following argument names imply their types.
s, s1, s2: These arguments must be strings.
char/char-set/pred: This argument can be a character, a character-set object, or a predicate that takes a single character and returns a boolean value. “Applying char/char-set/pred to a character” means the following. If char/char-set/pred is a character, it is compared to the given character. If it is a character set, it is checked whether the set contains the given character. If it is a procedure, it is applied to the given character. “A character satisfies char/char-set/pred” means that such an application to the character yields a true value.
start, end: Many SRFI-13 functions take these two optional arguments. They restrict the operation to the part of the input string from the start-th character (inclusive) to the end-th character (exclusive). When specified, the condition 0 <= start <= end <= length of the string must be satisfied. The default values of start and end are 0 and the length of the string, respectively.
“shared” variant: Some functions have variants with “/shared” attached to their names. SRFI-13 defines those functions to allow sharing part of the input string, for better performance. Gauche doesn’t have a concept of shared strings, and these functions are mere synonyms of their non-shared variants. However, Gauche internally shares the storage of strings, so generally you don’t need to worry about the overhead of copying substrings.
“right” variant: Most functions work from left to right over the input string. Some functions have variants with “-right” attached to their names, which work from right to left.
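A minimal sketch of these conventions, assuming Gauche with srfi.13 loaded. The sample string is arbitrary. The same string is searched with a character, a char-set literal and a predicate as char/char-set/pred, and then with an explicit start/end range.

```scheme
(use srfi.13)

;; One sample string, three forms of char/char-set/pred.
(define s "Mississippi")
(string-count s #\s)               ; => 4  (a character)
(string-count s #[sp])             ; => 6  (a char-set literal)
(string-count s char-upper-case?)  ; => 1  (a predicate)

;; start/end restrict the scan to the half-open range [start, end).
(string-count s #\s 0 4)           ; => 2  (only "Miss" is examined)
```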
[SRFI-13]{srfi.13}
Returns #t if s is an empty string, "".
[SRFI-13]{srfi.13}
Sees if every character in s satisfies char/char-set/pred. If so,
string-every returns the value returned by the last application of
char/char-set/pred. If any application returns #f, string-every
returns #f immediately.
[SRFI-13]{srfi.13}
Sees if any character in s satisfies char/char-set/pred. If so,
string-any returns the value returned by that application. If no
character satisfies char/char-set/pred, #f is returned.
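The following sketch assumes Gauche with srfi.13. It illustrates that both procedures return the predicate's own value, not just #t. The helper digit-value-of is hypothetical and handles only ASCII digits.

```scheme
(use srfi.13)

;; Hypothetical helper: the value of an ASCII decimal digit, or #f.
(define (digit-value-of c)
  (and (char<=? #\0 c #\9)
       (- (char->integer c) (char->integer #\0))))

(string-every char-alphabetic? "abc")  ; => #t
(string-every char-alphabetic? "ab1")  ; => #f
(string-every digit-value-of "123")    ; => 3, from the last application
(string-any digit-value-of "ab7c9")    ; => 7, the first true value
(string-any digit-value-of "abc")      ; => #f
```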
[SRFI-13]{srfi.13}
proc must be a procedure that takes an integer argument and returns a
character. string-tabulate creates a string whose i-th character is
computed by (proc i).
(string-tabulate (lambda (i) (integer->char (+ i #x30))) 10) ⇒ "0123456789"
[SRFI-13]{srfi.13}
A fundamental string builder. The arguments p, f and g are
procedures that take the current seed value. The stop predicate p
determines when to stop: if it returns a true value, string building
stops. The mapping function f returns a character
from the current seed value. The next-seed function g returns
the next seed value from the current seed value. The seed argument
gives the initial seed value.
(string-unfold (^n (= n 10)) (^n (integer->char (+ n 48))) (^n (+ n 1)) 0) ⇒ "0123456789"
The optional argument base is, when given, prepended to the result string. Another optional argument make-final is a procedure that takes the last return value of g and returns a string that becomes the suffix of the result string.
(string-unfold (^n (= n 10)) (^n (integer->char (+ n 48))) (^n (+ n 1)) 0 "foo" x->string) ⇒ "foo012345678910"
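To make the roles of p, f, g, base and make-final concrete, here is a simplified model of the semantics described above. It is not Gauche's actual implementation. The name my-string-unfold is hypothetical, and the :optional lambda-list syntax is a Gauche extension.

```scheme
(use srfi.13)

;; Simplified model of string-unfold (hypothetical name, not Gauche's code).
(define (my-string-unfold p f g seed :optional (base "") (make-final (lambda (x) "")))
  (let loop ((seed seed) (acc '()))
    (if (p seed)
      ;; Stop: base, then the generated characters, then make-final's suffix.
      (string-append base (list->string (reverse acc)) (make-final seed))
      ;; Otherwise emit (f seed) and continue with (g seed).
      (loop (g seed) (cons (f seed) acc)))))

(my-string-unfold (^n (= n 10)) (^n (integer->char (+ n 48))) (^n (+ n 1)) 0 "foo" x->string)
;; => "foo012345678910", the same result as string-unfold
```

Collecting characters in a list and reversing once keeps the sketch short. A real implementation would more likely use a string port or buffer.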
[SRFI-13]{srfi.13}
Another fundamental string builder. The meanings of the arguments
are the same as for string-unfold. The only difference is
that the string is built right-to-left. The optional base,
if given, becomes the suffix of the result, and the result of
make-final becomes the prefix.
(string-unfold-right (^n (= n 10)) (^n (integer->char (+ n 48))) (^n (+ n 1)) 0 "foo" x->string) ⇒ "109876543210foo"
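A short sketch, assuming Gauche with srfi.13. It compares the two builders on the same generator. Building right-to-left gives the reverse of building left-to-right. The helper digits-down is hypothetical.

```scheme
(use srfi.13)

;; Hypothetical helper: the digits 0..n-1, each new one placed at the front.
(define (digits-down n)
  (string-unfold-right (^i (= i n))
                       (^i (integer->char (+ i 48)))
                       (^i (+ i 1))
                       0))

(digits-down 5)  ; => "43210"

;; The same generator with string-unfold, then reversed.
(string-reverse (string-unfold (^i (= i 5)) (^i (integer->char (+ i 48))) (^i (+ i 1)) 0))
;; => "43210"
```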
[SRFI-13]{srfi.13}
≡ (list->string (reverse char-list)).
[SRFI-13]{srfi.13}
In Gauche, this is the same as substring, except
that the end argument is optional.
(substring/shared "abcde" 2) ⇒ "cde"
[SRFI-13]{srfi.13}
Copies a string s into a string target, starting at the position tstart.
The target string must be mutable.
Optional start and end arguments limit the range of s.
If the copied string runs past the end of target, an error is
signaled.
(define s (string-copy "abcde"))
(string-copy! s 2 "ZZ")
s ⇒ "abZZe"
It is OK to pass the same string as target and s; this always works even if the source and destination regions overlap.
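A minimal sketch of the overlapping case, assuming Gauche with srfi.13. A naive left-to-right character loop would repeatedly copy the first character and produce "aaaae". string-copy! produces the intended result.

```scheme
(use srfi.13)

;; Copy s[0,3) = "abc" onto position 1 of the same string.
(define s (string-copy "abcde"))   ; a fresh, mutable string
(string-copy! s 1 s 0 3)
s  ; => "aabce"
```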
Note that Gauche encourages you to treat strings as immutable objects.
Internally, a string is an indirect pointer to an immutable entity, and
mutating a string means copying the original entity and creating a new one,
so mutation doesn’t “save allocations”.
Always use the functional version string-copy unless you
absolutely need to replace a string in-place.
See String utilities.
[SRFI-13]{srfi.13}
string-take returns the first nchars characters of s, and string-drop
returns s without its first nchars characters. The *-right variations
count from the end of the string. The returned string is guaranteed to
be a copy of s, even if no character is dropped.
(string-take "abcde" 2) ⇒ "ab"
(string-drop "abcde" 2) ⇒ "cde"
(string-take-right "abcde" 2) ⇒ "de"
(string-drop-right "abcde" 2) ⇒ "abc"
[SRFI-13]{srfi.13}
If a string s is shorter than len, returns a string of length len in
which s is padded with char on the left (string-pad) or on the right
(string-pad-right). If s is longer than len, the rightmost
(string-pad) or leftmost (string-pad-right) len characters are taken.
Char defaults to #\space.
If start and end are provided, the substring of s is used as the source.
(string-pad "abc" 10) ⇒ "       abc"
(string-pad "abcdefg" 3) ⇒ "efg"
(string-pad-right "abc" 10) ⇒ "abc       "
(string-pad "abcdefg" 10 #\+ 2 5) ⇒ "+++++++cde"
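A small sketch, assuming Gauche with srfi.13. It uses string-pad to right-align numeric strings in a fixed-width column. Note that an over-long input keeps its rightmost characters.

```scheme
(use srfi.13)

;; Zero-pad numeric strings to a 5-character column.
(map (lambda (x) (string-pad x 5 #\0)) '("7" "42" "123456"))
;; => ("00007" "00042" "23456")   ; "123456" loses its leftmost digit

;; string-pad-right truncates from the other end.
(string-pad-right "abcdefg" 3)  ; => "abc"
```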
[SRFI-13]{srfi.13}
Removes characters that match char/char-set/pred from s.
string-trim removes the characters from the left of s,
string-trim-right from the right, and string-trim-both from both sides.
Char/char-set/pred defaults to #[\s], i.e. the char-set of whitespace characters.
If start and end are provided, the substring of s is used as the source.
(string-trim " abc ") ⇒ "abc "
(string-trim-right " abc ") ⇒ " abc"
(string-trim-both " abc ") ⇒ "abc"
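The examples above use the default whitespace set. A sketch with explicit char/char-set/pred arguments, assuming Gauche with srfi.13. The char-set is built with the SRFI-14 char-set constructor.

```scheme
(use srfi.13)

(string-trim-both "--==abc==--" (char-set #\- #\=))  ; => "abc"
(string-trim-right "notes.txt~~" #\~)               ; => "notes.txt"
(string-trim "0042" #\0)                            ; => "42"
(string-trim-both "   ")                            ; => ""
```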
[SRFI-13]{srfi.13}
Compares two strings s1 and s2 codepoint-wise from the left.
When a mismatch is found at index k of s1,
calls proc< with k if s1’s codepoint is smaller than
the corresponding one in s2, or calls proc> if s1’s is
greater than s2’s. If the two strings are the same, calls proc=
with the index of the last compared position in s1.
(string-compare "abcd" "abzd" (^i `(< ,i)) (^i `(= ,i)) (^i `(> ,i))) ⇒ (< 2)
(string-compare "abcd" "abcd" (^i `(< ,i)) (^i `(= ,i)) (^i `(> ,i))) ⇒ (= 3)
The optional arguments restrict the range of the input strings; however, the index passed to the procedures is always an index from the beginning of s1.
(string-compare "zzabcdyy" "abcz" (^i `(< ,i)) (^i `(= ,i)) (^i `(> ,i)) 2 6 0 4) ⇒ (< 5)
(string-compare "zzabcdyy" "abcz" (^i `(< ,i)) (^i `(= ,i)) (^i `(> ,i)) 2 5 0 3) ⇒ (= 4)
The case-insensitive variant, string-compare-ci, compares
each codepoint with character-wise case folding. It doesn’t consider
special case folding such as that of the German eszett.
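One common use of string-compare is a three-way comparator. The sketch below assumes Gauche with srfi.13. The helper compare3 is hypothetical and ignores the mismatch index that string-compare passes to each continuation.

```scheme
(use srfi.13)

;; Hypothetical helper: map the three continuations to -1, 0 and 1.
(define (compare3 a b)
  (string-compare a b
                  (lambda (i) -1)    ; a < b at mismatch index i
                  (lambda (i) 0)     ; a = b
                  (lambda (i) 1)))   ; a > b at mismatch index i

(compare3 "apple" "apricot")  ; => -1
(compare3 "pear" "pear")      ; => 0
(compare3 "pear" "peach")     ; => 1
```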
[SRFI-13]{srfi.13}
Compare two strings s1 and s2. Optional arguments
can limit the portion of the strings to be compared.
Comparison is done character-wise.
Note: The builtin procedures string=? etc. can also be
used for character-wise string comparison, but they take
arguments differently. See String comparison.
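A brief sketch, assuming Gauche with srfi.13. Note the four optional range arguments, which the builtin string=? does not take. Only truthiness is relied on here, not the exact true value returned.

```scheme
(use srfi.13)

;; Compare only s1[1,4) and s2[1,4), i.e. "bcd" against "bcd".
(string= "abcde" "xbcdy" 1 4 1 4)  ; => true
(string< "apple" "banana")         ; => true
(string<> "abc" "abd")             ; => true
(string= "abc" "abd")              ; => #f
```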
[SRFI-13]{srfi.13}
Compare two strings s1 and s2 in a case-insensitive way.
Optional arguments can limit the portion of the strings to be compared.
Case folding and comparison are done character-wise, so they don’t
consider case folding that affects multiple characters.
Note: There are two other sets of string comparison operations,
both named string-ci=? etc.
The builtin version (see String comparison) does character-wise
comparison. The one in gauche.unicode uses full-string
case conversion (see Full string case conversion).
The R7RS version is the latter.
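A minimal sketch of the case-insensitive comparisons, assuming Gauche with srfi.13. The inputs are plain ASCII, so character-wise and full-string case folding agree here.

```scheme
(use srfi.13)

(string-ci= "Hello" "hELLO")   ; => true
(string-ci< "apple" "BANANA")  ; => true
(string-ci= "abc" "abd")       ; => #f
```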
[SRFI-13]{srfi.13}
(Note: Gauche has builtin string-hash and string-ci-hash
according to SRFI-128. See Hashing for the details.
SRFI-13’s API is upward-compatible with SRFI-128’s. The underlying
hash algorithm is the same as the builtin one, so string-hash
returns the same value as the builtin version for the same string
if the optional arguments are omitted.
On the other hand, the builtin string-ci-hash uses string case
folding (e.g. German eszett and SS are treated as the same), while
SRFI-13’s string-hash-ci uses character-wise case folding.
Unless there’s a strong reason otherwise, we recommend that new code use
the builtin SRFI-128 version instead of SRFI-13’s.)
Calculates the hash value of a string s. For string-hash-ci,
character-wise case folding is done before calculating the hash value.
If the optional bound argument is given, it must be a positive exact integer, and the return value is limited to below it. The optional start and end arguments allow using only that portion of s for the calculation.
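A sketch of the properties described above, assuming Gauche with srfi.13 loaded so that string-hash refers to the SRFI-13 version. Concrete hash values are implementation details, so only relationships are shown.

```scheme
(use srfi.13)

;; With a bound, the hash value lies in [0, bound).
(define h (string-hash "scheme" 100))
(and (exact-integer? h) (<= 0 h 99))                             ; => #t

;; Equal strings hash equally, even as distinct objects.
(= (string-hash "scheme") (string-hash (string-copy "scheme")))  ; => #t

;; string-hash-ci ignores character-wise case differences.
(= (string-hash-ci "Scheme") (string-hash-ci "sCHEME"))          ; => #t
```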
[SRFI-13]{srfi.13}
Returns the length of the longest common prefix/suffix of two strings,
s1 and s2. The optional arguments restrict the range of
the search. The *-ci variations use case-folding character comparison.
(string-prefix-length "abacus" "abalone") ⇒ 3
(string-prefix-length "machine" "umbrella") ⇒ 0
(string-suffix-length "peeking" "poking") ⇒ 4
(string-prefix-length "obvious" "oblivious" 2 7 4 9) ⇒ 5
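As an illustration, string-prefix-length combines with fold to find the longest prefix shared by a whole list of strings. This assumes Gauche with srfi.13 (fold is built into Gauche). The helper common-prefix is hypothetical and expects a non-empty list.

```scheme
(use srfi.13)

;; Hypothetical helper: the longest prefix common to every string in strs.
(define (common-prefix strs)
  (fold (lambda (s acc)
          ;; Shrink the running prefix to what it shares with s.
          (string-take acc (string-prefix-length s acc)))
        (car strs)
        (cdr strs)))

(common-prefix '("interspecies" "interstellar" "interstate"))  ; => "inters"
(common-prefix '("dog" "racecar"))                            ; => ""
```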
[SRFI-13]{srfi.13}
Returns true iff s1 is a prefix or suffix of s2, respectively.
The optional arguments limit the range of s1 and s2 to look at.
The *-ci variations use case-folding character comparison.
(string-prefix? "sch" "scheme") ⇒ #t
(string-prefix? "lisp" "scheme") ⇒ #f
(string-suffix? "eme" "scheme") ⇒ #t
(string-suffix? "eme" "lisp") ⇒ #f
(string-prefix? "mit-scheme" "scheme-family" 4) ⇒ #t
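A small usage sketch, assuming Gauche with srfi.13. It filters words by prefix with the built-in filter and cut, and shows a case-insensitive suffix test.

```scheme
(use srfi.13)

;; Keep only the words that start with "str".
(filter (cut string-prefix? "str" <>) '("string" "struct" "list" "stream"))
;; => ("string" "struct" "stream")

(string-suffix-ci? ".TXT" "notes.txt")  ; => true
```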
[SRFI-13]{srfi.13}
Looks for the first character in a string s
that matches char/char-set/pred, and returns its index.
If no such character is found in s, returns #f.
Optional start and end limit the range of s to search.
(string-index "Aloha oe" #\a) ⇒ 4
(string-index "Aloha oe" #[Aa]) ⇒ 0
(string-index "Aloha oe" #[\s]) ⇒ 5
(string-index "Aloha oe" char-lower-case?) ⇒ 1
(string-index "Aloha oe" #\o 3) ⇒ 6
See also the Gauche built-in procedure string-scan
(String utilities), if you need speed over portability.
[SRFI-13]{srfi.13}
Looks for the first character that does not match
char/char-set/pred and returns its index.
If no such character is found, returns #f.
Optional start and end limit the range of s to search.
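A short sketch, assuming Gauche with srfi.13. It also assumes the -right variant string-skip-right, which SRFI-13 defines and which scans from the end.

```scheme
(use srfi.13)

(string-skip "   indented" #\space)  ; => 3, index of the first non-space
(string-skip "abc" char-alphabetic?)  ; => #f, every character matches
(string-skip-right "value;;;" #\;)    ; => 4, index of the last non-";"
```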
[SRFI-13]{srfi.13}
Counts the number of characters in s
that match char/char-set/pred.
Optional start and end limit the range of s to search.
[SRFI-13]{srfi.13}
Looks for a string s2 inside another string s1.
If found, returns the index in s1 where the matching substring
begins; otherwise returns #f.
Optional start1, end1, start2 and end2
limit the range of s1 and s2.
See also the Gauche built-in procedure string-scan
(String utilities), if you need speed over portability.
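A sketch, assuming Gauche with srfi.13. The first call restricts the search to s1[12,18) and still reports an index from the beginning of s1. The last line assumes the SRFI-13 case-insensitive variant string-contains-ci.

```scheme
(use srfi.13)

(string-contains "eek -- what a geek." "ee" 12 18)  ; => 15
(string-contains "banana" "nan")                  ; => 2
(string-contains "banana" "xyz")                  ; => #f
(string-contains-ci "Hello World" "WORLD")        ; => 6
```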
[SRFI-13]{srfi.13}
Converts a string s to titlecase, upcase or downcase,
respectively. These operations use the character-by-character
mapping provided by char-upcase etc. That is, string-upcase
and string-downcase can be understood as follows:
(string-upcase s) ≡ (string-map char-upcase s)
(string-downcase s) ≡ (string-map char-downcase s)
If you need full case mapping, which handles the case where
a character is mapped to more than one character, use
the procedures with the same names in the gauche.unicode module
(see Full string case conversion).
The linear-update versions string-titlecase!, string-upcase!
and string-downcase! destroy s to store the result.
Note that in Gauche, using those procedures doesn’t save anything,
since string mutation is expensive by design. They are provided merely
for completeness.
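A minimal sketch, assuming Gauche with srfi.13. It shows the functional and linear-update forms side by side. The linear-update version is applied to a fresh copy, because string literals are immutable.

```scheme
(use srfi.13)

(string-upcase "hello, world")    ; => "HELLO, WORLD"
(string-downcase "Hello, World")  ; => "hello, world"

;; Linear-update version: mutate a fresh copy, never a literal.
(define s (string-copy "abc"))
(string-upcase! s)
s                                 ; => "ABC"
```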
[SRFI-13]{srfi.13}
Returns a string in which the character positions of s are reversed.
string-reverse! modifies s.
(string-reverse "mahalo") ⇒ "olaham"
(string-reverse "mahalo" 3) ⇒ "ola"
(string-reverse "mahalo" 1 4) ⇒ "aha"
(let ((s (string-copy "mahalo"))) (string-reverse! s 1 5) s) ⇒ "mlahao"
[SRFI-13]{srfi.13}
Concatenates a list of strings.
(string-concatenate '("humuhumu" "nukunuku" "apua" "`a")) ⇒ "humuhumunukunukuapua`a"
[SRFI-13]{srfi.13}
“Shared” versions of string-concatenate and string-append.
In Gauche, these are just synonyms of them.
[SRFI-13]{srfi.13}
Reverses string-list before concatenation.
The “shared” version works the same in Gauche.
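This procedure fits the common pattern of accumulating pieces with cons, newest first, and joining them at the end. A sketch assuming Gauche with srfi.13. The helper join-numbers is hypothetical.

```scheme
(use srfi.13)

;; Hypothetical helper: "0" "1" ... "n-1" joined in order.
(define (join-numbers n)
  (let loop ((i 0) (acc '()))
    (if (= i n)
      (string-concatenate-reverse acc)   ; acc is newest-first
      (loop (+ i 1) (cons (number->string i) acc)))))

(join-numbers 5)  ; => "01234"
```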
[SRFI-13]{srfi.13}
string-map! applies proc to every character of s
and stores the results into s. It is an error if proc returns
a non-character.
(let ((s (string-copy "wikiwiki"))) (string-map! char-upcase s 4) s) ⇒ "wikiWIKI"
[SRFI-13]{srfi.13}
Like fold and fold-right (see Walking over lists),
but works on a string instead of a list.
(string-fold cons '() "abcde") ⇒ (#\e #\d #\c #\b #\a)
(string-fold-right cons '() "abcde") ⇒ (#\a #\b #\c #\d #\e)
[SRFI-13]{srfi.13}
Calls proc on each index of the string s, from left to right.
The result of proc is discarded.
The optional start and end arguments can be integer
indexes or string cursors, restricting the range
of the input string to be traversed.
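A sketch, assuming Gauche with srfi.13 and that this entry is SRFI-13's string-for-each-index, called as (string-for-each-index proc s). The helper char-positions is hypothetical. Since proc's result is discarded, the indexes are collected by side effect.

```scheme
(use srfi.13)

;; Hypothetical helper: all indexes at which character c occurs in s.
(define (char-positions s c)
  (let ((acc '()))
    (string-for-each-index
     (lambda (i)
       (when (char=? (string-ref s i) c)
         (set! acc (cons i acc))))
     s)
    (reverse acc)))

(char-positions "banana" #\a)  ; => (1 3 5)
```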
[SRFI-13]{srfi.13}
Takes a substring of the infinite repetition of string s
between index from (inclusive) and index to (exclusive).
If to is omitted, the length of s is assumed.
For example, if s is "abcde", we repeat it infinitely in both
directions, so the 5n-th character is always #\a for any integer n,
including negative n.
(xsubstring "abcde" 2 10) ⇒ "cdeabcde"
(xsubstring "abcde" -9 -2) ⇒ "bcdeabc"
The optional start and end arguments can be integer
indexes or string cursors; with them, it works as
(xsubstring (substring s start end) from to).
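Because indexes wrap around, xsubstring gives a one-line string rotation. A sketch assuming Gauche with srfi.13. The helper rotate-left is hypothetical, and to is passed explicitly so that the result always has the length of s.

```scheme
(use srfi.13)

;; Hypothetical helper: rotate s left by k positions (k may be negative).
(define (rotate-left s k)
  (xsubstring s k (+ k (string-length s))))

(rotate-left "abcde" 2)   ; => "cdeab"
(rotate-left "abcde" -1)  ; => "eabcd"
```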
[SRFI-13]{srfi.13}
It works as
(string-copy! target tstart (xsubstring s sfrom sto start end)).
[SRFI-13]{srfi.13}
Returns a new string whose content is a copy of a string s1, except
that the part beginning at index start1 (inclusive) and ending at
index end1 (exclusive) is replaced by a string s2. When the optional
start2 and end2 arguments are given, s2 is trimmed first according to them.
The size of the gap, (- end1 start1), doesn’t
need to be the same as the size of the inserted string.
Effectively, this is the same as the following code.
(string-append (substring s1 0 start1) (substring s2 start2 end2) (substring s1 end1 (string-length s1)))
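A few cases, assuming Gauche with srfi.13. They show a replacement of a different length, an empty gap (a pure insertion), and a trimmed s2.

```scheme
(use srfi.13)

(string-replace "abcde" "XYZ" 1 3)  ; => "aXYZde", "bc" is replaced
(string-replace "abcde" "XYZ" 2 2)  ; => "abXYZcde", empty gap inserts
(string-replace "The TCL programmer endured daily ridicule."
                "another miserable perl drone" 4 7 8 22)
;; => "The miserable perl programmer endured daily ridicule."
```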
[SRFI-13]{srfi.13}
Splits the string s into a list of substrings,
where each substring is a maximal non-empty contiguous
sequence of characters from the character set token-set.
The default of token-set is char-set:graphic
(see Predefined character sets).
See also Gauche’s built-in string-split
(see String utilities),
which provides similar features with different splitting criteria.
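A sketch, assuming Gauche with srfi.13. With the default char-set:graphic, punctuation stays attached to words. A custom token-set, here the Gauche char-set literal #[0-9], extracts only digit runs.

```scheme
(use srfi.13)

(string-tokenize "Help make programs run, run, RUN!")
;; => ("Help" "make" "programs" "run," "run," "RUN!")

(string-tokenize "a1,b22,,c333" #[0-9])  ; => ("1" "22" "333")
```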
[SRFI-13]{srfi.13}
Returns a string consisting of the characters in s that pass
(string-filter) or don’t pass (string-delete) the test indicated
by char/char-set/pred.
(string-filter char-upper-case? "Hello, World!") ⇒ "HW"
(string-delete char-upper-case? "Hello, World!") ⇒ "ello, orld!"
(string-delete #\l "Hello, World!") ⇒ "Heo, Word!"
(string-filter #[\w] "Hello, World!") ⇒ "HelloWorld"
Note: SRFI-13 was revised after finalization to swap
the order of the arguments char/char-set/pred and s.
At the time of finalization, the order was
(string-filter s pred), and Gauche implemented it accordingly.
However, most existing implementations follow the revised order,
since that was what the SRFI-13 reference implementation used.
So, from 0.9.4, we revised the API to comply with the current SRFI-13 spec, but we also accept the old order so as not to break old code. We recommend that new code use the new order.
Here are some helper procedures useful for writing other string-processing utilities similar to SRFI-13:
[SRFI-13]{srfi.13}
Most SRFI-13 procedures take optional start and end arguments.
These procedures look for them in the rest argument list args,
and supply the default values if they’re not provided.
The proc argument is the name (symbol) of the procedure, to be used in the error condition, and s is the string to be processed.
Both expect an optional start argument at the beginning of args,
followed by an optional end argument. If the end argument
is missing, the length of s is assumed. If the start argument
is also missing, 0 is assumed.
If the arguments are provided, they must be exact integers, and
0 <= start <= end <= (string-length s) must be satisfied;
otherwise, an error is signaled.
If args contains more than two elements, string-parse-final-start+end
raises an error (in other words, it requires that the argument list
end with the end argument at most), while string-parse-start+end
permits it and returns the rest of the arguments, along with the start
and end indexes.
Both functions return three values: the rest of the argument list, the start index, and the end index.
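A sketch of how a user-defined procedure might accept SRFI-13-style optional start/end arguments, assuming Gauche with srfi.13 (receive is built into Gauche). The procedure count-spaces is hypothetical. It uses string-parse-start+end, so any trailing arguments after end arrive in rest and are simply ignored here.

```scheme
(use srfi.13)

;; Hypothetical procedure: count spaces in s, with optional start/end.
(define (count-spaces s . args)
  (receive (rest start end) (string-parse-start+end 'count-spaces s args)
    ;; rest holds any arguments after start and end; unused here.
    (string-count s #\space start end)))

(count-spaces "a b c d")    ; => 3
(count-spaces "a b c d" 2)  ; => 2, only "b c d" is scanned
```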