Strings
Builtin Class: <string>
A string class.
It should be emphasized that Gauche’s internal string object, the string body, is immutable. To comply with R7RS, in which strings are mutable, a Scheme-level string object is an indirect pointer to a string body. Mutating a string means that Gauche creates a new immutable string body that reflects the changes, then swaps the pointer in the Scheme-level string object.
This may affect some assumptions about the cost of string operations. Gauche does not attempt to make string mutation faster; (string-set! s k c) is exactly as slow as taking two substrings, before and after the k-th character, and concatenating them with a single-character string in between. So, just avoid string mutations; we believe it’s a better practice.
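A minimal sketch of what the indirection means in practice: two variables bound to the same Scheme-level string both observe a mutation, because string-set! swaps the body inside the shared string object instead of producing a new object. The variable names original and alias are purely illustrative.

```scheme
;; String literals are immutable in Gauche, so mutate a fresh copy.
(define original (string-copy "abc"))
(define alias original)          ; both names refer to one Scheme-level string

;; Gauche builds a new immutable body and swaps it into the shared object.
(string-set! original 0 #\X)

alias                            ; ⇒ "Xbc"  the shared object sees the change
(eq? alias original)             ; ⇒ #t     object identity is preserved
```

The cost of the string-set! call above is that of building a whole new body, which is why the text recommends avoiding mutation altogether.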
See also String constructors.
R7RS string operations are very minimal. Gauche supports some extra built-in operations, and also a rich string library defined in SRFI-13. See srfi.13 - String library, for details about SRFI-13.
String syntax
Reader Syntax: "…"
[R7RS+] Denotes a literal string. Inside the double quotes, the following backslash escape sequences are recognized.
\"
[R7RS] Double-quote character
\\
[R7RS] Backslash character
\n
[R7RS] Newline character (ASCII 0x0a).
\r
[R7RS] Return character (ASCII 0x0d).
\f
Form-feed character (ASCII 0x0c).
\t
[R7RS] Tab character (ASCII 0x09).
\a
[R7RS] Alarm character (ASCII 0x07).
\b
[R7RS] Backspace character (ASCII 0x08).
\0
ASCII NUL character (ASCII 0x00).
\<whitespace>*<newline><whitespace>*
[R7RS] Ignored. This can be used to break a long string literal for readability. This escape sequence was introduced in R6RS.
\xN;
[R7RS] A character whose Unicode codepoint is represented by hexadecimal number N, which is any number of hexadecimal digits. (See the compatibility notes below.)
\uNNNN
A character whose UCS2 code is represented by four-digit hexadecimal number NNNN.
\UNNNNNNNN
A character whose UCS4 code is represented by eight-digit hexadecimal number NNNNNNNN.
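A short sketch of how several of the escapes above relate to each other; each comparison pairs two spellings of the same characters. The Greek lambda (U+03BB) is just an arbitrary non-ASCII example.

```scheme
(string=? "\x41;" "A")                ; ⇒ #t  R7RS hex escape
(string=? "\u03bb" "\x3bb;")          ; ⇒ #t  UCS2 escape vs. hex escape
(string-length "a\tb\n")              ; ⇒ 4   each escape is one character
(char->integer (string-ref "\0" 0))   ; ⇒ 0   NUL character
```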
The following code is an example of backslash-newline escape sequence:
(define *message*
  "\
  This is a long message \
  in a literal string.")

*message* ⇒ "This is a long message in a literal string."
Note the whitespace just after ‘message’. Since any whitespace before ‘in’ is eaten by the reader, you have to put a space between ‘message’ and the following backslash. If you want to include an actual newline character in a string, and any indentation after it, you can put ‘\n’ at the beginning of the next line like this:
(define *message/newline*
  "\
  This is a long message, \
  \n with a line break.")
Note for compatibility: We used to recognize the syntax \xNN (a two-digit hexadecimal number, without a semicolon terminator) as a character in a string; for example, "\x0d\x0a" was the same as "\r\n". We still support it for compatibility when we don’t see the terminating semicolon. There are ambiguous cases: "\x0a;" means "\n" in the current syntax, while it means "\n;" in the legacy syntax.
Setting the reader mode to legacy restores the old behavior. Setting the reader mode to warn-legacy makes it work like the default behavior, but prints a warning when it finds legacy syntax. See Reader lexical mode, for the details.
String predicates
Function: string? obj
[R7RS base] Returns #t if obj is a string, #f otherwise.
Function: string-immutable? obj
Returns #t if obj is an immutable string, #f otherwise.
String literals, and the strings returned from certain procedures such as symbol->string, are immutable. To ensure you get an immutable string in a program, you can use string-copy-immutable.
Function: string-incomplete? obj
Returns #t if obj is an incomplete string, #f otherwise.
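A quick sketch of the three predicates on typical inputs; the results for literals and symbol->string follow the immutability rules stated above.

```scheme
(string? "abc")                            ; ⇒ #t
(string? #\a)                              ; ⇒ #f  a character, not a string
(string-immutable? "abc")                  ; ⇒ #t  string literals are immutable
(string-immutable? (string-copy "abc"))    ; ⇒ #f  string-copy returns a mutable copy
(string-immutable? (symbol->string 'abc))  ; ⇒ #t
(string-incomplete? "abc")                 ; ⇒ #f
```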
String constructors
Function: make-string k :optional char
[R7RS base] Returns a string of length k. If the optional char is given, the new string is filled with it. Otherwise, the string is filled with spaces. The result string is always complete.
(make-string 5 #\x) ⇒ "xxxxx"
Note that allocating a string with make-string and then filling it one character at a time is extremely inefficient in Gauche, and should be avoided.
In Gauche, a string is simply a pointer to an immutable string content. If you mutate a string by, e.g., string-set!, Gauche allocates a whole new immutable string content, copies the original content with the modification, then swaps the pointer of the original string. It is no more efficient than making a new copy.
You can use an output string port for string construction (see String ports). Even creating a list of characters and using list->string is faster than using make-string and string-set!.
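The two alternatives recommended above can be sketched as follows. The helpers digits/port and digits/list are hypothetical names; both build the string "01…" of the first n decimal digits (n up to 10) without any string-set! call.

```scheme
;; Accumulate characters into an output string port (R7RS procedures).
(define (digits/port n)
  (let ((out (open-output-string)))
    (do ((i 0 (+ i 1)))
        ((= i n) (get-output-string out))
      (write-char (integer->char (+ i (char->integer #\0))) out))))

;; Collect characters in a list, then convert once with list->string.
(define (digits/list n)
  (let loop ((i (- n 1)) (acc '()))
    (if (< i 0)
        (list->string acc)
        (loop (- i 1)
              (cons (integer->char (+ i (char->integer #\0))) acc)))))

(digits/port 5)   ; ⇒ "01234"
(digits/list 5)   ; ⇒ "01234"
```

The port version is the more general pattern, since it also accepts whole strings via write-string.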
Function: make-byte-string k :optional byte
Creates and returns an incomplete string of size k. If byte is given, it must be an exact integer, and its lower 8 bits are used to initialize every byte in the created string.
Function: string char …
[R7RS base] Returns a string consisting of char ….
Generic Function: x->string obj
A generic coercion function. Returns a string representation of obj. The default methods are defined as follows: strings are returned as is, numbers are converted by number->string, symbols are converted by symbol->string, and other objects are converted by display.
Other classes may provide a method to customize the behavior.
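A small sketch exercising the constructors above; the x->string results follow the default methods just described (the list case goes through display, so the character and the string lose their external syntax).

```scheme
(string #\G #\o)                             ; ⇒ "Go"
(x->string "as is")                          ; ⇒ "as is"   returned unchanged
(x->string 42)                               ; ⇒ "42"      via number->string
(x->string 'sym)                             ; ⇒ "sym"     via symbol->string
(x->string '(1 #\a "b"))                     ; ⇒ "(1 a b)" via display
(string-size (make-byte-string 3 65))        ; ⇒ 3
(string-incomplete? (make-byte-string 3 65)) ; ⇒ #t
```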
String interpolation
The term "string interpolation" is used in various scripting languages such as Perl and Python to refer to the feature of embedding expressions in a string literal; the expressions are evaluated at run time and their results are inserted into the string.
Scheme doesn’t define such a feature, but Gauche implements it as a reader macro.
Reader Syntax: #string-literal
Evaluates to a string. If string-literal contains the character sequence ~expr, where expr is a valid external representation of a Scheme expression, expr is evaluated and its result is inserted in the original place (by using x->string, see String constructors).
The tilde and the following expression must be adjacent (without any whitespace characters in between), or the sequence is not recognized as special.
To include a literal tilde immediately followed by a non-delimiting character, use ~~.
Other characters in the string-literal are copied as is.
If you use a variable as expr and need to delimit it from the subsequent string, you can use the symbol escape syntax using the ‘|’ character, as shown in the last two examples below.
#"This is Gauche, version ~(gauche-version)."
  ⇒ "This is Gauche, version 0.9.15."

#"Date: ~(sys-strftime \"%Y/%m/%d\" (sys-localtime (sys-time)))"
  ⇒ "Date: 2002/02/18"

(let ((a "AAA")
      (b "BBB"))
  #"xxx ~a ~b zzz")
  ⇒ "xxx AAA BBB zzz"

#"123~~456~~789"
  ⇒ "123~456~789"

(let ((n 7)) #"R~|n|RS")
  ⇒ "R7RS"

(let ((x "bar")) #"foo~|x|.")
  ⇒ "foobar."
In fact, the reader expands this syntax into a macro call, which is then expanded into a call of string-append as follows:
#"This is Gauche, version ~(gauche-version)."
 ≡
(string-interpolate* ("This is Gauche, version " (gauche-version) "."))

;; then, it expands to...
(string-append "This is Gauche, version "
               (x->string (gauche-version))
               ".")
(NB: The exact spec of string-interpolate* might change in the future, so do not rely on the current behavior.)
Since the #"..." syntax is equivalent to a macro call of string-interpolate*, which is provided in the Gauche module, string-interpolate* must be visible from where you use the interpolation syntax. When you write Gauche code, you typically inherit the Gauche module implicitly, so you don’t need to worry; however, if you start from R7RS code, make sure you import string-interpolate* (by (import (gauche base)), for example) whenever you use the string interpolation syntax. Also be careful not to shadow string-interpolate* locally.
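A sketch comparing an interpolated literal with a hand-written string-append that mirrors the expansion shown above. The variables user and n are illustrative only; note the ‘|’ delimiters after user, and that the space after ~n is enough to end that expression.

```scheme
(define user "alice")
(define n 3)

;; Interpolated form.
(define greeting #"Hello, ~|user|! You have ~n new messages.")

;; Roughly what the reader and string-interpolate* turn it into.
(define greeting/manual
  (string-append "Hello, " (x->string user) "! You have "
                 (x->string n) " new messages."))

greeting                             ; ⇒ "Hello, alice! You have 3 new messages."
(string=? greeting greeting/manual)  ; ⇒ #t
```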
Reader Syntax: #`string-literal
This is the old style of string interpolation. It is still recognized, but discouraged for new code.
Inside string-literal, you can use ,expr (instead of ~expr) to evaluate expr. If a comma isn’t immediately followed by a character starting an expression, it loses its special meaning.
#`"This is Gauche, version ,(gauche-version)"
Rationale of the syntax: There is wide variation in string interpolation syntax among scripting languages. The syntax is usually linked with other syntax of the language (e.g. prefixing $ to mark the place to evaluate is in sync with the variable reference syntax in some languages).
The old style of string interpolation syntax was taken from the quasiquote syntax, because those two are conceptually similar operations (see Quasiquotation). However, since the comma character is frequently used in string literals, it was rather awkward.
We decided that tilde is more suitable as the unquote character for the following reasons.
- format uses ~ to introduce format directives (see Formatting output). Lispers are used to scanning ~’s in a string as variable portions.
- ~ is a universal accessor, and the operator has a nuance of “taking something out of it” (see Universal accessor).
- Some other Lisp dialects use ~ as the unquote character in their quasiquote syntax, instead of commas.
Note that Scheme allows a wider range of characters in identifier names than typical scripting languages do. Consequently, you will almost always need to use ‘|’ delimiters when you interpolate the value of a variable. For example, while you can write "$year/$month/$day $hour:$minutes:$seconds" in Perl, you should write #"~|year|/~|month|/~day ~|hour|:~|minutes|:~seconds". It may be better always to delimit direct variable references in this syntax to avoid confusion.
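A minimal sketch of the date example above with concrete, made-up values. Without the delimiters, the reader would take year/~month/~day as one symbol, since ‘/’ and ‘~’ are both valid identifier constituents.

```scheme
(define year 2024)
(define month 5)
(define day 17)

;; '|' ends each variable name so the following '/' is literal text.
#"~|year|/~|month|/~day"          ; ⇒ "2024/5/17"

;; #"~year/~month/~day" would instead refer to a single (unbound)
;; variable named year/~month/~day.
```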
String cursors
String cursors are opaque objects that point into strings, similar to indexes. Cursors, however, are more efficient. For example, to get a character with string-ref using an index on a multibyte string, Gauche needs to iterate from the beginning of the string until that position, which is O(n). Using cursors, you can access it in O(1). (For single-byte (ASCII) strings or indexed strings, Gauche does it in O(1) even with an index. See String indexing, for the details of indexed strings.)
For a string of length n, there can be n+1 cursors. The last cursor, at the end of the string, does not point to any valid character; it is usually used to indicate that nothing is found.
A string cursor is associated with a specific string and should not be used with another string. A string cursor also becomes invalid when the associated string is modified. Accessing an invalid cursor does not always fail, though. Running gosh with -fsafe-string-cursors could help catch these issues, with some performance overhead. See Invoking Gosh.
Most of the time, string cursors aren’t heap-allocated. A cursor is allocated in the heap only (1) when it points at a huge byte index, or (2) when you use -fsafe-string-cursors to enable extra run-time checks.
In the current implementation, the byte index threshold that causes a string cursor to be heap-allocated is 2^56 on 64-bit systems and 2^24 on 32-bit systems. On 64-bit systems you will never hit the threshold in practice. On 32-bit systems you may, if you have a huge string, but you may want to consider using another data structure rather than keeping such data in one string object.
Most procedures that take indexes in Gauche can also take cursors. Relying on this, though, is unportable. For example, the substring procedure in the RnRS standards does not mention anything about cursors, even though the Gauche version accepts cursors. For portable programs, you should only use cursors with procedures from the srfi.130 module (see srfi.130 - Cursor-based string library).
Builtin Class: <string-cursor>
Represents a cursor. When printed out, you’ll see the byte offset from the beginning of the string, not the character index.
(string-index->cursor "あかさたな" 2) ⇒ #<string-cursor 6>
Function: string-cursor? obj
[SRFI-130] Returns #t if obj is a string cursor, #f otherwise.
Function: string-cursor-start str
[SRFI-130] Returns a cursor pointing to the start of a string str. It returns a valid cursor on an empty string too; in that case, it is the same as string-cursor-end.
Function: string-cursor-end str
[SRFI-130] Returns a cursor pointing to the end of str (the point after the last character). If str is empty, it is the same as string-cursor-start. This cursor does not point to any valid character of the string.
Function: string-cursor-next str cur
[SRFI-130] Returns the cursor into str following cur. cur can also be an index. An error is signaled if cur points to the end of the string.
Function: string-cursor-prev str cur
[SRFI-130] Returns the cursor into str preceding cur. cur can also be an index. An error is signaled if cur points to the beginning of the string.
Function: string-cursor-forward str cur n
[SRFI-130] Returns the cursor into str following cur by n characters. cur can also be an index.
Function: string-cursor-back str cur n
[SRFI-130] Returns the cursor into str preceding cur by n characters. cur can also be an index.
Function: string-index->cursor str index
[SRFI-130] Converts an index to a cursor. If index is a cursor, it is returned as is.
Function: string-cursor->index str cur
[SRFI-130] Converts a cursor to an index. If cur is an index, it is returned as is.
Function: string-cursor-diff str start end
[SRFI-130] Returns the number of characters between start and end. The result is non-negative if start precedes end, and non-positive otherwise. start and end also accept indexes.
Function: string-cursor=? cur1 cur2
Function: string-cursor<? cur1 cur2
Function: string-cursor<=? cur1 cur2
Function: string-cursor>? cur1 cur2
Function: string-cursor>=? cur1 cur2
[SRFI-130] Compares two cursors or two indexes (but not a cursor and an index) and returns #t or #f accordingly.
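A sketch of converting between indexes and cursors on the same five-character string used in the <string-cursor> example; every character there occupies three bytes, so cursor and index values differ while the character positions agree.

```scheme
(define s "あかさたな")
(define c (string-index->cursor s 2))   ; cursor at the third character

(string-ref s c)                                        ; ⇒ #\さ
(string-cursor->index s c)                              ; ⇒ 2
(string-cursor->index s (string-cursor-next s c))       ; ⇒ 3
(string-cursor->index s (string-cursor-forward s c 2))  ; ⇒ 4
(string-cursor-diff s (string-cursor-start s) c)        ; ⇒ 2
(string-cursor<? (string-cursor-start s) c)             ; ⇒ #t
```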
String indexing
Since Gauche stores strings in a multibyte encoding, random access requires O(N) time by default. In most cases, string access is either sequential or follows a search-and-extract pattern, and Gauche provides direct means for these operations, so you don’t need to deal with indexed access. However, there may be cases where you need more efficient random access to a string (mostly when porting third-party code, we imagine).
There are a couple of ways to achieve O(1) random access.
First, instead of integer character indexes, you can use string cursors (see String cursors). They are defined by srfi.130, and you can use code that uses SRFI-130 as is, without worrying about slow access. However, if an external interface gives you integer character indexes, converting indexes to cursors and vice versa takes O(N) after all.
There’s another way. You can precompute a string index, a mapping from integer character indexes to positions in the multibyte string. It costs O(N) time and space to compute, but once computed, you have O(1) random access. (We store positions for every K characters, where K is between 16 and 256, so it won’t take up as much storage as the actual string body.)
For portability, SRFI-135 Immutable Texts provides O(1) accessible string as “texts”. On Gauche, a text is just an immutable string with index attached.
Function: string-build-index! str
Computes and attaches an index to a string str, and returns str itself. The operation doesn’t alter the content of str, and you can pass an immutable string as well.
If str is a single-byte string (ASCII-only, or incomplete), or a short one (less than 64 octets), no index is attached. It is OK to pass a string that already has an index; then the index computation is skipped.
The index is attached to the string’s content. If you alter str by, e.g., string-set!, the index is discarded.
Function: string-fast-indexable? str
Returns #t iff index access of a string str is effectively O(1), that is, str is either a single-byte string, a short string, or a long multibyte string with an index computed.
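A sketch of preparing a long multibyte string for O(1) indexing, assuming the default UTF-8 native encoding; 200 copies of あ take 600 bytes, well past the short-string limit described above. The variable name wide is illustrative.

```scheme
(define wide (make-string 200 #\あ))    ; long, multibyte, no index yet

(eq? (string-build-index! wide) wide)   ; ⇒ #t  returns its argument
(string-fast-indexable? wide)           ; ⇒ #t  index is now attached
(string-fast-indexable? "plain ASCII")  ; ⇒ #t  single-byte strings need no index
(string-ref wide 150)                   ; ⇒ #\あ
```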
String accessors & modifiers
Function: string-length string
[R7RS base] Returns the length of a (possibly incomplete) string string.
Function: string-size string
Returns the size of a (possibly incomplete) string. The size of a string is the number of bytes it occupies in memory. The same string may have different sizes if the native encoding scheme differs.
For an incomplete string, its length and its size always match.
Function: string-ref cstring k :optional fallback
[R7RS+ base] Returns the k-th character of a complete string cstring. It is an error to pass an incomplete string.
By default, an error is signaled if k is out of range (negative, or greater than or equal to the length of cstring). However, if the optional argument fallback is given, it is returned in such a case. This is Gauche’s extension.
If cstring is a multibyte string without an index attached, this procedure takes O(k) time. See String indexing, for ensuring O(1) access.
k can also be a string cursor (also Gauche’s extension). Cursor access is O(1).
Function: string-byte-ref string k
Returns the k-th byte of a (possibly incomplete) string string. The returned value is an integer in the range between 0 and 255. k must be greater than or equal to zero, and less than (string-size string).
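A sketch contrasting character length with byte size, assuming the default UTF-8 native encoding, in which λ (U+03BB) is encoded as the two bytes #xCE #xBB.

```scheme
(string-length "λx")        ; ⇒ 2    characters
(string-size "λx")          ; ⇒ 3    bytes
(string-byte-ref "λx" 0)    ; ⇒ 206  (#xCE), first byte of λ
(string-ref "abc" 1)        ; ⇒ #\b
(string-ref "abc" 10 #\?)   ; ⇒ #\? fallback instead of an error
```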
Function: string-set! string k char
[R7RS base] Substitutes char for string’s k-th character. k must be greater than or equal to zero, and less than (string-length string). The return value is undefined.
If string is an incomplete string, the integer value of the lower 8 bits of char is used to set string’s k-th byte.
See the notes in make-string about performance considerations.
Function: string-byte-set! string k byte
Substitutes the integer byte for string’s k-th byte. byte must be in the range between 0 and 255, inclusive. k must be greater than or equal to zero, and less than (string-size string).
If string is a complete string, it is turned into an incomplete string by this operation.
The return value is undefined.
String comparison
Function: string=? string1 string2 string3 …
[R7RS base] Returns #t iff all arguments are strings with the same content.
If any of the arguments is an incomplete string, it returns #t iff all arguments are incomplete and have exactly the same content. In other words, a complete string and an incomplete string are never equal to each other.
Function: string<? string1 string2 string3 …
Function: string<=? string1 string2 string3 …
Function: string>? string1 string2 string3 …
Function: string>=? string1 string2 string3 …
[R7RS base] Compares strings in codepoint order. Returns #t iff all the arguments are ordered.
Comparison between an incomplete string and a complete string, or between two incomplete strings, is done by octet-to-octet comparison. If a complete string and an incomplete string have exactly the same binary representation of the content, the complete string is smaller.
Function: string-ci=? string1 string2 string3 …
Function: string-ci<? string1 string2 string3 …
Function: string-ci<=? string1 string2 string3 …
Function: string-ci>? string1 string2 string3 …
Function: string-ci>=? string1 string2 string3 …
Case-insensitive string comparison.
These procedures fold arguments character-wise, according to the Unicode-defined character-by-character case mapping. See char-foldcase for the details (Characters).
Character-wise case folding doesn’t handle cases like the German eszett:
(string-ci=? "\u00df" "SS") ⇒ #f
R7RS requires the string-ci* procedures to use string case folding. Gauche provides R7RS-conformant case-insensitive comparison procedures in gauche.unicode (see Full string case conversion). If you write R7RS code that imports the (scheme char) library, you’ll use gauche.unicode’s string-ci* procedures.
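A sketch of the comparison procedures as they behave in a default Gauche module (that is, without importing (scheme char) or gauche.unicode, so string-ci=? is the character-wise version).

```scheme
(string=? "abc" "abc" "abc")          ; ⇒ #t
(string<? "apple" "banana" "cherry")  ; ⇒ #t  each adjacent pair is ordered
(string<? "banana" "apple")           ; ⇒ #f
(string-ci=? "Gauche" "GAUCHE")       ; ⇒ #t
(string-ci=? "\u00df" "SS")           ; ⇒ #f  character-wise folding only
```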
String utilities
Function: substring string start end
[R7RS+ base] Returns a substring of string, starting from the start-th character (inclusive) and ending at the end-th character (exclusive). The start and end arguments must satisfy 0 <= start <= end <= N, where N is the length of the string.
start and end can also be string cursors, but this is an extension of Gauche.
When start is zero and end is N, this procedure returns a copy of string. (See also opt-substring below, if you don’t want to copy when it isn’t necessary.)
Actually, the extended string-copy explained below is a superset of substring. This procedure is kept mostly for compatibility with R7RS programs.
See also subseq in gauche.sequence - Sequence framework, for the generic version.
Function: opt-substring string :optional start end
Like substring, returns a part of string between the start-th character (inclusive) and the end-th character (exclusive). However, if the entire string is used (e.g. start is 0 and end is the length of string, or the arguments are omitted, etc.), string is returned as is, without copying.
This is a typical handling of optional start/end indexes for many string utilities. Note that using substring forces copying the input string even when it’s not necessary.
Besides exact integers, #f or #<undef> is allowed as start and end, to indicate the argument is missing. In that case, 0 is assumed for start, and the length of string is assumed for end.
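A sketch contrasting substring with opt-substring, including #f as a placeholder for an omitted start, as described above.

```scheme
(define s "abcdef")

(substring s 1 4)            ; ⇒ "bcd"
(opt-substring s 2)          ; ⇒ "cdef"  end defaults to the length
(opt-substring s #f 3)       ; ⇒ "abc"   #f stands for an omitted start
(eq? (opt-substring s) s)    ; ⇒ #t      whole string: returned without copying
```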
Function: string-append string …
[R7RS base] Returns a newly allocated string whose content is the concatenation of string …. See also string-concatenate in String reverse & append.
Function: string->list string :optional start end
Function: list->string list
[R7RS base] Converts a string to a list of characters or vice versa. You can give optional start/end indexes to string->list.
For list->string, every element of list must be a character, or an error is signaled. If you want to build a string out of a mixed list of strings and characters, you may want to use tree->string in text.tree - Lazy text construction.
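A short sketch of string-append and the conversions between strings and character lists, including the optional range arguments of string->list.

```scheme
(string-append "foo" "-" "bar")   ; ⇒ "foo-bar"
(string->list "hello")            ; ⇒ (#\h #\e #\l #\l #\o)
(string->list "hello" 1 3)        ; ⇒ (#\e #\l)
(list->string '(#\o #\k))         ; ⇒ "ok"
;; (list->string '(#\a "bc")) signals an error: "bc" is not a character.
```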
Function: string-copy string :optional start end
[R7RS base] Returns a copy of string. You can give start and/or end indexes to extract part of the original string (which effectively makes string-copy a superset of substring).
If only the start argument is given, a substring beginning from the start-th character (inclusive) to the end of string is returned. If both start and end arguments are given, a substring from the start-th character (inclusive) to the end-th character (exclusive) is returned. See substring above for the conditions that start and end should satisfy.
Note: R7RS’s destructive version string-copy! is provided by the srfi.13 module (see srfi.13 - String library).
Function: string-copy-immutable string :optional start end
If string is immutable, returns it as is. Otherwise, returns an immutable copy of string. It is a dual of string-copy, which always returns a mutable copy.
The optional start and end arguments may be nonnegative integer character indexes and/or string cursors to restrict the range of string to be copied.
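A sketch of the two copying procedures side by side; lit and frozen are illustrative names. Since a literal is already immutable, string-copy-immutable hands it back unchanged.

```scheme
(string-copy "abcdef" 2)                ; ⇒ "cdef"
(string-copy "abcdef" 2 4)              ; ⇒ "cd"

(define m (string-copy "abcdef"))       ; mutable copy of a literal
(string-immutable? m)                   ; ⇒ #f

(define frozen (string-copy-immutable m))
(string-immutable? frozen)              ; ⇒ #t

(define lit "xyz")
(eq? (string-copy-immutable lit) lit)   ; ⇒ #t  already immutable: no copy
```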
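Function: string-fill! string char :optional start end
[R7RS base] Fills string with char. Optional start and end limit the effective area. Since string literals are immutable, the examples below fill mutable copies.
(let ((s (string-copy "orange")))
  (string-fill! s #\X)
  s) ⇒ "XXXXXX"
(let ((s (string-copy "orange")))
  (string-fill! s #\X 2 4)
  s) ⇒ "orXXge"
See the notes in make-string about performance considerations.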
Function: string-join strs :optional delim grammar
[SRFI-13] Concatenates the strings in the list strs, with a string delim as ‘glue’.
The argument grammar may be one of the following symbols to specify how the strings are concatenated.
infix
Use delim between each pair of strings. This mode is the default. Note that this mode introduces ambiguity when strs is an empty list or a list containing only an empty string, since both yield "".
(string-join '("apple" "mango" "banana") ", ") ⇒ "apple, mango, banana"
(string-join '() ":") ⇒ ""
(string-join '("") ":") ⇒ ""
strict-infix
Works like infix, but an empty list is not allowed as strs, thus avoiding the ambiguity.
prefix
Use delim before each string.
(string-join '("usr" "local" "bin") "/" 'prefix) ⇒ "/usr/local/bin"
(string-join '() "/" 'prefix) ⇒ ""
(string-join '("") "/" 'prefix) ⇒ "/"
suffix
Use delim after each string.
(string-join '("a" "b" "c") "&" 'suffix) ⇒ "a&b&c&"
(string-join '() "&" 'suffix) ⇒ ""
(string-join '("") "&" 'suffix) ⇒ "&"
Function: string-scan string item :optional return
Function: string-scan-right string item :optional return
Scans for item (either a string or a character) in string. While string-scan finds the leftmost match, string-scan-right finds the rightmost match.
The return argument specifies what value should be returned when item is found in string. It must be one of the following symbols.
index
Returns the index in string if item is found, or #f. This is the default behavior.
(string-scan "abracadabra" "ada") ⇒ 5
(string-scan "abracadabra" #\c) ⇒ 4
(string-scan "abracadabra" "aba") ⇒ #f
before
Returns a substring of string before item, or #f if item is not found.
(string-scan "abracadabra" "ada" 'before) ⇒ "abrac"
(string-scan "abracadabra" #\c 'before) ⇒ "abra"
after
Returns a substring of string after item, or #f if item is not found.
(string-scan "abracadabra" "ada" 'after) ⇒ "bra"
(string-scan "abracadabra" #\c 'after) ⇒ "adabra"
before*
Returns a substring of string before item, and the rest of string starting from item. If item is not found, returns (values #f #f).
(string-scan "abracadabra" "ada" 'before*) ⇒ "abrac" and "adabra"
(string-scan "abracadabra" #\c 'before*) ⇒ "abra" and "cadabra"
after*
Returns a substring of string up to the end of item, and the rest. If item is not found, returns (values #f #f).
(string-scan "abracadabra" "ada" 'after*) ⇒ "abracada" and "bra"
(string-scan "abracadabra" #\c 'after*) ⇒ "abrac" and "adabra"
both
Returns a substring of string before item and the substring after item. If item is not found, returns (values #f #f).
(string-scan "abracadabra" "ada" 'both) ⇒ "abrac" and "bra"
(string-scan "abracadabra" #\c 'both) ⇒ "abra" and "adabra"
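A sketch of a common use of the both mode: splitting "key=value" at the first ‘=’ and detecting the missing-delimiter case through the (values #f #f) result. The helper parse-pair is hypothetical.

```scheme
;; Split "key=value" at the leftmost #\=; return #f if there is none.
(define (parse-pair str)
  (receive (key val) (string-scan str #\= 'both)
    (and key (cons key val))))

(parse-pair "name=gauche")   ; ⇒ ("name" . "gauche")
(parse-pair "a=b=c")         ; ⇒ ("a" . "b=c")  leftmost match
(parse-pair "novalue")       ; ⇒ #f
```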
Function: string-split string splitter :optional grammar limit start end
Function: string-split string splitter :optional limit start end
[SRFI-152+] Splits string by splitter and returns a list of strings. splitter can be a character, a character set, a string, a regexp, or a procedure.
If splitter is a character or a string, it is used as a delimiter. Note that SRFI-152’s string-split only allows strings for splitter (it also interprets the first optional argument as a grammar; see below for the compatibility note.)
If splitter is a character set, any consecutive characters that are members of the character set are used as a delimiter.
If a procedure is given as splitter, it is called for each character in string, and the consecutive characters that caused splitter to return a true value are used as a delimiter.
(string-split "/aa/bb//cc" #\/)              ⇒ ("" "aa" "bb" "" "cc")
(string-split "/aa/bb//cc" "/")              ⇒ ("" "aa" "bb" "" "cc")
(string-split "/aa/bb//cc" "//")             ⇒ ("/aa/bb" "cc")
(string-split "/aa/bb//cc" #[/])             ⇒ ("" "aa" "bb" "cc")
(string-split "/aa/bb//cc" #/\/+/)           ⇒ ("" "aa" "bb" "cc")
(string-split "/aa/bb//cc" #[\w])            ⇒ ("/" "/" "//" "")
(string-split "/aa/bb//cc" char-alphabetic?) ⇒ ("/" "/" "//" "")

;; some boundary cases
(string-split "abc" #\/)                     ⇒ ("abc")
(string-split "" #\/)                        ⇒ ("")
The grammar argument is the same as in string-join above; it must be one of the symbols infix, strict-infix, prefix or suffix. When omitted, infix is assumed.
(string-split "/a/b/c/" "/" 'infix)  ⇒ ("" "a" "b" "c" "")
(string-split "/a/b/c/" "/" 'prefix) ⇒ ("a" "b" "c" "")
(string-split "/a/b/c/" "/" 'suffix) ⇒ ("" "a" "b" "c")
In general, the following relationship holds:
(string-join XS DELIM GRAMMAR) ⇒ S
(string-split S DELIM GRAMMAR) ⇒ XS
If limit is given and not #f, it must be a nonnegative integer and specifies the maximum number of matches of splitter. Once the limit is reached, the rest of string is included in the result as is.
(string-split "a.b..c" "." 'infix 0) ⇒ ("a.b..c")
(string-split "a.b..c" "." 'infix 1) ⇒ ("a" "b..c")
(string-split "a.b..c" "." 'infix 2) ⇒ ("a" "b" ".c")
Compatibility note: The grammar argument was added for consistency with the SRFIs (SRFI-130, SRFI-152; see srfi.152 - String library (reduced)). However, for backward compatibility and convenience, it also accepts limit without the grammar argument; the two are distinguishable since grammar is a symbol and limit is an integer. For code that’s compatible with SRFI-152, use the first form, which takes the grammar argument.
(string-split "a.b..c" "." 2) ⇒ ("a" "b" ".c")
The start and end arguments limit the input string to the given range before splitting.
See also string-tokenize (see Other string operations).
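A sketch of the join/split relationship stated above, using the prefix grammar for an absolute path, plus a limit of 1 to split only at the first delimiter. The data are illustrative.

```scheme
(define fields '("usr" "local" "bin"))

;; Same delimiter and grammar on both sides: the list round-trips.
(string-join fields "/" 'prefix)                  ; ⇒ "/usr/local/bin"
(string-split "/usr/local/bin" "/" 'prefix)       ; ⇒ ("usr" "local" "bin")

;; limit 1: split at the first match only, keep the rest intact.
(string-split "key: value: more" ": " 'infix 1)   ; ⇒ ("key" "value: more")
```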
Function: string-map proc string string2 …
Function: string-map proc string :optional start end
[R7RS base][SRFI-13] Applies proc over each character in the input string, gathers the characters returned from proc into a string, and returns it. It is an error if proc returns a non-character.
For historical reasons, this procedure has two interfaces. The first one takes one or more input strings, and proc receives as many characters as the number of input strings, each character taken from each string. Iteration stops on the shortest string. This is defined in R7RS-small, and consistent with map, vector-map, etc.
The second one takes only one string argument, and optional start/end arguments, which may be nonnegative integer indexes or string cursors to limit the input range of the string. This is defined in SRFI-13, the string library.
The order in which proc is applied is not guaranteed to be left to right. You shouldn’t depend on the order.
If proc saves a continuation and it is invoked later, the result already returned from string-map won’t be affected (as specified in R7RS).
(string-map char-upcase "apple") ⇒ "APPLE"
(string-map (^[a b] (if (char>? a b) a b)) "orange" "apple") ⇒ "orpng"
(string-map char-upcase "pineapple" 0 4) ⇒ "PINE"
Function: string-for-each proc string string2 …
Function: string-for-each proc string :optional start end
[R7RS base][SRFI-13] Applies proc over each character in the input string in left-to-right order. The results of proc are discarded.
For historical reasons, this procedure has two interfaces, the first one defined in R7RS and the second one defined in SRFI-13. See string-map above for the explanation.
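A sketch of string-for-each used for its side effects, plus the multi-string string-map form stopping at the shortest input. The helper count-vowels is hypothetical and only counts lowercase ASCII vowels.

```scheme
;; Count lowercase ASCII vowels by side effect.
(define (count-vowels str)
  (let ((n 0))
    (string-for-each
     (lambda (c)
       (when (memv c '(#\a #\e #\i #\o #\u))
         (set! n (+ n 1))))
     str)
    n))

(count-vowels "gauche scheme")   ; ⇒ 5

;; Pairwise minimum; iteration stops after the 3 characters of "bbb".
(string-map (lambda (a b) (if (char<? a b) a b)) "abcz" "bbb")  ; ⇒ "abb"
```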
Incomplete strings
A string can be flagged as "incomplete" if it may contain byte sequences that do not form valid multibyte characters in Gauche’s native encoding.
Incomplete strings may be generated in several circumstances: reading binary data as a string, reading string data that has been ‘chopped’ in the middle of a multibyte character, or concatenating a string with other incomplete strings, for example.
Incomplete strings should be regarded as an exceptional case. They used to be a way to handle byte strings, but now we have u8vectors (see Uniform vectors) for that purpose. In fact, we’re planning to remove them in future releases.
Just in case you happen to get an incomplete string, you can convert it to a complete string with string-incomplete->complete.
Reader Syntax: #**"…"
Denotes an incomplete string. The same escape sequences as the complete string syntax are recognized.
Rationale of the syntax: #* is used for bit vectors. Since an incomplete string is really a byte vector, the notations are similar.
Note: We used #*"...." for incomplete strings in 0.9.9 and before. It turned out that it couldn’t coexist with bitvectors, for #* is a valid bitvector literal (a zero-length vector), and " is a delimiter, so #*"...." can be parsed as a zero-length bitvector followed by a string. From 0.9.10, we changed the incomplete string literal to #**"...". It’s a bit lengthy, but incomplete strings are anomalies and shouldn’t be used often anyway.
For backward compatibility, #*"..." is still read as an incomplete string literal, unless the reader lexical mode is strict-r7 (see Reader lexical mode, for the details). If the reader lexical mode is warn-legacy, it is read as an incomplete string, but a warning is issued. If the mode is strict-r7, it is read as a zero-length bitvector followed by a string.
In future releases, #*"..." would trigger a warning by default, and later we’ll gradually move to the strict-r7 behavior.
Function: string-incomplete->complete str :optional handling filler
Reinterprets the content of an incomplete string str and returns a newly created complete string from it. The handling argument specifies how to handle illegal byte sequences in str.
#f
If str contains an illegal byte sequence, gives up the conversion and returns #f. This is the default behavior.
:omit
Omits any illegal byte sequences.
:replace
Replaces each byte in illegal byte sequences by the character given in the filler argument, which defaults to ?.
:escape
Replaces each byte in illegal byte sequences by a sequence of filler <hexdigit> <hexdigit>. In addition, each filler character in the original string is replaced with two filler characters.
If str is already a complete string, its copy is returned.
The procedure always returns a complete string, except when the handling argument is #f (the default) and the input is an incomplete string containing an illegal byte sequence, in which case #f is returned.
(string-incomplete->complete #**"_abc")
  ⇒ "_abc"      ; can be represented as a complete string
(string-incomplete->complete #**"_ab\x80;c")
  ⇒ #f          ; can't be represented as a complete string
(string-incomplete->complete #**"_ab\x80;c" :omit)
  ⇒ "_abc"      ; omit the illegal bytes
(string-incomplete->complete #**"_ab\x80;c" :replace #\_)
  ⇒ "_ab_c"     ; replace the illegal bytes
(string-incomplete->complete #**"_ab\x80;c" :escape #\_)
  ⇒ "__ab_80c"  ; escape the illegal bytes and the escape char itself
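A sketch of a lenient decoder built from the handling modes above: try a strict conversion first, and fall back to :replace only when that fails. The helper decode-lenient is hypothetical; the sample byte #x80 is the same illegal byte used in the examples above.

```scheme
;; Strict conversion when possible; otherwise replace bad bytes with #\?.
(define (decode-lenient bytes)
  (or (string-incomplete->complete bytes)
      (string-incomplete->complete bytes :replace #\?)))

(decode-lenient #**"ok")          ; ⇒ "ok"
(decode-lenient #**"a\x80;b")     ; ⇒ "a?b"
```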