A string class. In Gauche, a string can be viewed in two ways: a sequence of characters, or a sequence of bytes.
It should be emphasized that Gauche’s internal string object, the string body, is immutable. To comply with R5RS, in which strings are mutable, a Scheme-level string object is an indirect pointer to a string body. Mutating a string means that Gauche creates a new immutable string body that reflects the changes, then swaps the pointer in the Scheme-level string object.
This may affect some assumptions on the cost of string operations.
Gauche does not attempt to make string mutation faster;
(string-set! s k c) is exactly as slow as taking
two substrings, before and after the k-th character, and
concatenating them with a single-character string in between.
So, just avoid string mutations; we believe it’s a better practice.
See also String Constructors.
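The cost model described above can be sketched as follows. The helper name string-replace-char is hypothetical and exists only for illustration; it builds the result the way the text says string-set! effectively does, and is not Gauche's actual implementation.

```scheme
;; Hypothetical helper: returns a new string equal to S with its K-th
;; character replaced by C, built from two substrings and a
;; single-character string, as the text describes.
(define (string-replace-char s k c)
  (string-append (substring s 0 k)
                 (string c)
                 (substring s (+ k 1) (string-length s))))

(string-replace-char "abcde" 2 #\X)   ; ⇒ "abXde"

;; The mutating version. string-copy is used to get a fresh string
;; rather than mutating a literal.
(define s (string-copy "abcde"))
(string-set! s 2 #\X)
s                                     ; ⇒ "abXde"
```

Since both forms do comparable work, the non-mutating form is usually the clearer choice.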
R5RS string operations are very minimal. Gauche supports some
extra built-in operations, and also a rich string library
defined in SRFI-13. See section srfi-13 - String library, for details about SRFI-13.
"…"[R5RS+] Denotes a literal string. Inside the double quotes, the following backslash escape sequences are recognized.
\"[R5RS] Double-quote character
\\[R5RS] Backslash character
\n Newline character (ASCII 0x0a).
\r Return character (ASCII 0x0d).
\f Form-feed character (ASCII 0x0c).
\t Tab character (ASCII 0x09).
\0 ASCII NUL character (ASCII 0x00).
\<whitespace>*<newline><whitespace>* Ignored. This can be used to break a long string literal for readability. This escape sequence was introduced in R6RS.
\xNN A byte represented by the two-digit hexadecimal number NN. The byte is interpreted in the internal multibyte encoding.
\uNNNN A character whose UCS2 code is represented by the four-digit hexadecimal number NNNN.
\UNNNNNNNN A character whose UCS4 code is represented by the eight-digit hexadecimal number NNNNNNNN.
If Gauche is compiled with an internal encoding other than UTF-8,
the reader uses the gauche.charconv module to interpret
the \uNNNN and \UNNNNNNNN escape sequences.
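As a quick check of the single-character escapes listed above, the following sketch converts each escaped character to its code. It uses only standard procedures and assumes nothing beyond the escapes documented here.

```scheme
;; Each escape yields the single character whose ASCII code is listed above.
(map char->integer (string->list "\n\r\f\t\0"))   ; ⇒ (10 13 12 9 0)

;; \" and \\ each stand for exactly one character.
(string-length "\"\\")                             ; ⇒ 2
```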
The following code is an example of backslash-newline escape sequence:
(define *message* "\
  This is a long message \
  in a literal string.")

*message*
  ⇒ "This is a long message in a literal string."
Note the whitespace just after ‘message’. Since any whitespace before ‘in’ is eaten by the reader, you have to put a whitespace between ‘message’ and the following backslash. If you want to include an actual newline character in a string, and any indentation after it, you can put ‘\n’ in the next line like this:
(define *message/newline* "\
  This is a long message, \
  \n   with a line break.")
#*"…" Denotes an incomplete string. The same escape sequences as the complete string syntax are recognized.
Rationale of the syntax: ‘#*’ is used for bit vectors
in Common Lisp. Since an incomplete string is really a byte vector,
it has similarity. (Bit vectors can be added later, if necessary,
and the two can coexist.)
Function: string? obj
[R5RS] Returns #t if obj is a string, #f otherwise.
Function: string-immutable? obj
Returns #t if obj is an immutable string, #f otherwise.
Function: string-incomplete? obj
Returns #t if obj is an incomplete string, #f otherwise.
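A short sketch of the predicates, assuming the ones described above are string? and string-incomplete? as Gauche names them. Note that an incomplete string still satisfies string?.

```scheme
(string? "abc")               ; ⇒ #t
(string? 'abc)                ; ⇒ #f
(string? #*"abc")             ; ⇒ #t, an incomplete string is still a string
(string-incomplete? "abc")    ; ⇒ #f
(string-incomplete? #*"abc")  ; ⇒ #t
```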
Function: make-string k :optional char
[R5RS] Returns a string of length k. If the optional char is given, the new string is filled with it. Otherwise, the string is filled with whitespace. The resulting string is always complete.
(make-string 5 #\x) ⇒ "xxxxx"
Note that an algorithm that allocates a string by make-string and
then fills it one character at a time is extremely inefficient
in Gauche, and should be avoided. That kind of algorithm makes unnecessary
assumptions about the underlying string allocation and representation mechanism,
which Gauche doesn’t follow.
You can use an output string port for a string construction
(See section String ports).
Even creating a list of characters and
using list->string is faster than using make-string and
string-set!.
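The two recommended alternatives might look like the sketch below. The helper names letters-via-port and letters-via-list are hypothetical; call-with-output-string is Gauche's built-in procedure for collecting port output into a string.

```scheme
;; Hypothetical helper: builds a string of N lowercase letters
;; (wrapping around after #\z) by writing to an output string port.
(define (letters-via-port n)
  (call-with-output-string
    (lambda (out)
      (do ((i 0 (+ i 1)))
          ((= i n))
        (write-char (integer->char (+ 97 (modulo i 26))) out)))))

;; Hypothetical helper: same result, collecting the characters in a
;; list (built back to front) and converting once with list->string.
(define (letters-via-list n)
  (let loop ((i (- n 1)) (acc '()))
    (if (< i 0)
        (list->string acc)
        (loop (- i 1)
              (cons (integer->char (+ 97 (modulo i 26))) acc)))))

(letters-via-port 5)   ; ⇒ "abcde"
(letters-via-list 5)   ; ⇒ "abcde"
```

Neither version ever mutates a string, so no intermediate string bodies are created and discarded.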
Function: make-byte-string k :optional byte
Creates and returns an incomplete string of size k. If byte is given, it must be an exact integer, and its lower 8 bits are used to initialize every byte in the created string.
Function: string char …
[R5RS] Returns a string consisting of char ….
Generic Function: x->string obj
A generic coercion function.
Returns a string representation of obj.
The default methods are defined as follows: strings are returned
as is, numbers are converted by number->string, symbols are
converted by symbol->string, and other objects are
converted by display.
Other classes may provide a method to customize the behavior.
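The default methods, and a customized one, can be sketched like this. The class <point> and its slots are hypothetical and serve only to illustrate adding a method to x->string.

```scheme
(x->string "abc")    ; ⇒ "abc"
(x->string 42)       ; ⇒ "42"
(x->string 'foo)     ; ⇒ "foo"
(x->string '(1 2))   ; ⇒ "(1 2)", via display

;; <point> is a hypothetical class used only for illustration.
(define-class <point> ()
  ((x :init-keyword :x)
   (y :init-keyword :y)))

;; Customize how a <point> is coerced to a string.
(define-method x->string ((p <point>))
  (format #f "(~a, ~a)" (slot-ref p 'x) (slot-ref p 'y)))

(x->string (make <point> :x 1 :y 2))   ; ⇒ "(1, 2)"
```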
The term "string interpolation" is used in various scripting languages such as Perl and Python to refer to the feature to embed expressions in a string literal, which are evaluated and then their results are inserted into the string literal at run time.
Scheme doesn’t define such a feature, but Gauche implements it as a reader macro.
#`string-literal
Evaluates to a string. If string-literal contains the
character sequence ,expr, where
expr is a valid external representation
of a Scheme expression, expr is evaluated and
its result is inserted in the original place (by using x->string,
see String Constructors).
The comma and the following expression must be adjacent (without containing any whitespace characters), or it is not recognized as a special sequence.
Two adjacent commas are converted to a single comma. This lets you embed a comma before a non-whitespace character in string-literal.
Other characters in the string-literal are copied as is.
If you use a variable as expr and need to delimit it from the subsequent string, you can use the symbol escape syntax using ‘|’ character, as shown in the last two examples below.
#`"This is Gauche, version ,(gauche-version)."
⇒ "This is Gauche, version 0.9.3.3."
#`"Date: ,(sys-strftime \"%Y/%m/%d\" (sys-localtime (sys-time)))"
⇒ "Date: 2002/02/18"
(let ((a "AAA")
      (b "BBB"))
  #`"xxx ,a ,b zzz")
⇒ "xxx AAA BBB zzz"
#`"123,,456,,789"
⇒ "123,456,789"
(let ((n 5)) #`"R,|n|RS")
⇒ "R5RS"
(let ((x "bar")) #`"foo,|x|.")
⇒ "foobar."
In fact, the reader expands this syntax into a macro call,
which is then expanded into a call of string-append
as follows:
#`"This is Gauche, version ,(gauche-version)."
≡
(string-append "This is Gauche, version "
               (x->string (gauche-version))
               ".")
Rationale of the syntax:
Some other scripting languages use ‘$expr’ or ’#{...}’.
I chose this syntax with respect to the quasiquote (See section Quasiquotation).
Although it may be awkward to delimit variable names by ‘|’,
the comma syntax should be easier to read than the other exotic syntax
for seasoned Scheme programmers.
Note that Scheme allows a wider range of characters in valid identifier names
than typical scripting languages do.
Consequently, you will almost always need to use ‘|’ delimiters
when you interpolate the value of a variable.
For example, while you can write
"$year/$month/$day $hour:$minutes:$seconds" in Perl,
you should write #`",|year|/,|month|/,day ,|hour|:,|minutes|:,seconds".
It may be better always to delimit direct variable references
in this syntax to avoid confusion.
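A runnable version of the date example, reduced to three hypothetical variables (year, month, and day are defined here only for illustration). Without the ‘|’ delimiters, the ‘/’ that follows a variable would be read as part of the identifier name.

```scheme
;; Hypothetical values, defined only for this illustration.
(define year 2012)
(define month 5)
(define day 28)

;; year and month are followed by /, which is a valid identifier
;; character, so they are delimited with | |.  day is followed by
;; the end of the literal and needs no delimiter.
#`",|year|/,|month|/,day"    ; ⇒ "2012/5/28"
```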
Function: string-length string
[R5RS] Returns the length of a (possibly incomplete) string string.
Function: string-size string
Returns the size of a (possibly incomplete) string. The size of a string is the number of bytes the string occupies in memory. The same string may have different sizes if the native encoding scheme differs.
For an incomplete string, its length and its size always match.
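The difference between length and size can be sketched as below, assuming the two procedures described above are string-length and string-size. The size of the non-ASCII example depends on the native encoding, so no fixed value is promised for it except under the stated UTF-8 assumption.

```scheme
(string-length "abc")       ; ⇒ 3
(string-size "abc")         ; ⇒ 3

;; One character, but more than one byte in a multibyte encoding.
(string-length "\u03bb")    ; ⇒ 1
(string-size "\u03bb")      ; 2 if the native encoding is UTF-8

;; For an incomplete string, length and size are the same number.
(string-length #*"abc")     ; ⇒ 3
(string-size #*"abc")       ; ⇒ 3
```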
Function: string-ref cstring k :optional fallback
[R5RS+] Returns the k-th character of a complete string cstring. It is an error to pass an incomplete string.
By default, an error is signalled if k is out of range
(negative, or greater than or equal to the length of cstring).
However, if an optional argument fallback is given,
it is returned in such a case. This is Gauche’s extension.
Function: string-byte-ref string k
Returns the k-th byte of a (possibly incomplete) string string.
The returned value is an integer in the range between 0 and 255.
k must be greater than or equal to zero, and less than
(string-size string).
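A brief sketch of character and byte access, assuming the procedures described above are string-ref and string-byte-ref. The third argument to string-ref is the fallback extension.

```scheme
(string-ref "abc" 1)        ; ⇒ #\b
(string-ref "abc" 3 #f)     ; ⇒ #f, the fallback, since 3 is out of range
(string-byte-ref "abc" 0)   ; ⇒ 97, the byte value of #\a
```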
Function: string-set! string k char
[R5RS] Substitutes string’s k-th character with char.
k must be greater than or equal to zero, and less than
(string-length string).
The return value is undefined.
If string is an incomplete string, the integer value of the lower 8 bits of char is used to set string’s k-th byte.
See the notes in make-string about performance considerations.
Function: string-byte-set! string k byte
Substitutes string’s k-th byte with the integer byte.
byte must be in the range between 0 and 255, inclusive.
k must be greater than or equal to zero, and less than
(string-size string).
If string is a complete string, it is turned into an incomplete string
by this operation.
The return value is undefined.
[R5RS]
[R5RS]
Function: substring string start end
[R5RS]
Returns a substring of string, starting from start-th
character (inclusive) and ending at end-th character (exclusive).
The start and end arguments must satisfy
0 <= start < N,
0 <= end <= N, and
start <= end, where N is the length of the
string.
When start is zero and end is N, this procedure returns a copy of string.
Actually, the extended string-copy explained below
is a superset of substring. This procedure is
kept mostly for compatibility with R5RS programs.
See also subseq in gauche.sequence - Sequence framework,
for the generic version.
Function: string-append string …
[R5RS] Returns a newly allocated string whose content is the concatenation of string ….
See also string-concatenate in String reverse & append.
Function: string->list string :optional start end
Function: list->string list
[R5RS+][SRFI-13] Converts a string to a list of characters, or vice versa.
You can give optional start/end indexes to string->list,
as specified in SRFI-13.
For list->string, every element of list must be
a character, or an error is signalled. If you want to build
a string out of a mixed list of strings and characters, you
may want to use tree->string in text.tree - Lazy text construction.
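The conversions above, including the tree->string alternative for mixed lists, might be used as in this sketch. It assumes text.tree is loaded with use, as is usual for Gauche library modules.

```scheme
(use text.tree)   ; provides tree->string

(string->list "abcdef")        ; ⇒ (#\a #\b #\c #\d #\e #\f)
(string->list "abcdef" 2 4)    ; ⇒ (#\c #\d)
(list->string '(#\a #\b #\c))  ; ⇒ "abc"

;; A mixed, nested list of strings and characters, which
;; list->string would reject.
(tree->string '("ab" #\c ("de" #\f)))   ; ⇒ "abcdef"
```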
Function: string-copy string :optional start end
[R5RS+][SRFI-13]
Returns a copy of string. You can give a start and/or
end index to extract part of the original string
(this effectively makes string-copy a superset of substring).
If only the start argument is given, a substring beginning from
the start-th character (inclusive) to the end of string is
returned. If both start and end arguments are given,
a substring from the start-th character (inclusive) to
the end-th character (exclusive) is returned.
See substring above for the condition that start and
end should satisfy.
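The relationship between substring and the extended string-copy can be seen in a few calls. This is only an illustration of the index conventions described above.

```scheme
(substring "abcdef" 1 4)     ; ⇒ "bcd"
(string-copy "abcdef")       ; ⇒ "abcdef"
(string-copy "abcdef" 2)     ; ⇒ "cdef"
(string-copy "abcdef" 2 4)   ; ⇒ "cd"
```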
Function: string-fill! string char :optional start end
[R5RS+][SRFI-13] Fills string with char. The optional start and end limit the affected area.
(string-fill! "orange" #\X) ⇒ "XXXXXX"
(string-fill! "orange" #\X 2 4) ⇒ "orXXge"
Function: string-join strs :optional delim grammar
[SRFI-13] Concatenates strings in the list strs, with a string delim as ‘glue’.
The argument grammar may be one of the following symbols to specify how the strings are concatenated.
infix Use delim between each string. This mode is the default. Note that this mode introduces ambiguity when strs is an empty list or a list with a null string.
(string-join '("apple" "mango" "banana") ", ")
⇒ "apple, mango, banana"
(string-join '() ":")
⇒ ""
(string-join '("") ":")
⇒ ""
strict-infix Works like infix, but an empty list is not allowed for strs,
thus avoiding ambiguity.
prefix Use delim before each string.
(string-join '("usr" "local" "bin") "/" 'prefix)
⇒ "/usr/local/bin"
(string-join '() "/" 'prefix)
⇒ ""
(string-join '("") "/" 'prefix)
⇒ "/"
suffix Use delim after each string.
(string-join '("a" "b" "c") "&" 'suffix)
⇒ "a&b&c&"
(string-join '() "&" 'suffix)
⇒ ""
(string-join '("") "&" 'suffix)
⇒ "&"
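Function: string-scan string item :optional return
Function: string-scan-right string item :optional return
Scans item (either a string or a character) in string.
While string-scan finds the leftmost match, string-scan-right
finds the rightmost match.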
The return argument specifies what value should be returned when item is found in string. It must be one of the following symbols.
index Returns the index in string if item is found, or #f.
This is the default behavior.
(string-scan "abracadabra" "ada") ⇒ 5
(string-scan "abracadabra" #\c) ⇒ 4
(string-scan "abracadabra" "aba") ⇒ #f
before Returns a substring of string before item, or
#f if item is not found.
(string-scan "abracadabra" "ada" 'before) ⇒ "abrac"
(string-scan "abracadabra" #\c 'before) ⇒ "abra"
after Returns a substring of string after item, or
#f if item is not found.
(string-scan "abracadabra" "ada" 'after) ⇒ "bra"
(string-scan "abracadabra" #\c 'after) ⇒ "adabra"
before* Returns a substring of string before item, and
the substring after it. If item is not found, returns
(values #f #f).
(string-scan "abracadabra" "ada" 'before*) ⇒ "abrac" and "adabra"
(string-scan "abracadabra" #\c 'before*) ⇒ "abra" and "cadabra"
after* Returns a substring of string up to the end of item,
and the rest. If item is not found, returns
(values #f #f).
(string-scan "abracadabra" "ada" 'after*) ⇒ "abracada" and "bra"
(string-scan "abracadabra" #\c 'after*) ⇒ "abrac" and "adabra"
both Returns a substring of string before item and
after item. If item is not found, returns
(values #f #f).
(string-scan "abracadabra" "ada" 'both) ⇒ "abrac" and "bra"
(string-scan "abracadabra" #\c 'both) ⇒ "abra" and "adabra"
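The modes that return two values are convenient with receive. The sketch below splits a hypothetical "key=value" string; the input strings are made up for illustration.

```scheme
;; Split a "key=value" pair at the first "=".
(receive (key value) (string-scan "name=Gauche" "=" 'both)
  (list key value))                     ; ⇒ ("name" "Gauche")

;; When the item is absent, both values are #f.
(receive (key value) (string-scan "novalue" "=" 'both)
  (list key value))                     ; ⇒ (#f #f)

;; string-scan-right reports the rightmost match instead.
(string-scan "abracadabra" "a")         ; ⇒ 0
(string-scan-right "abracadabra" "a")   ; ⇒ 10
```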
Function: string-split string splitter
Splits string by splitter and returns a list of strings. splitter can be a character, a character set, a string, a regexp, or a procedure.
If splitter is a character, the character is used as a delimiter.
If splitter is a character set, any consecutive characters that are members of the character set are used as a delimiter.
If a procedure is given as splitter, it is called for each character in string, and the consecutive characters that caused splitter to return a true value are used as a delimiter.
(string-split "/aa/bb//cc" #\/) ⇒ ("" "aa" "bb" "" "cc")
(string-split "/aa/bb//cc" "/") ⇒ ("" "aa" "bb" "" "cc")
(string-split "/aa/bb//cc" "//") ⇒ ("/aa/bb" "cc")
(string-split "/aa/bb//cc" #[/]) ⇒ ("" "aa" "bb" "cc")
(string-split "/aa/bb//cc" #/\/+/) ⇒ ("" "aa" "bb" "cc")
(string-split "/aa/bb//cc" #[\w]) ⇒ ("/" "/" "//" "")
(string-split "/aa/bb//cc" char-alphabetic?) ⇒ ("/" "/" "//" "")
;; some boundary cases
(string-split "abc" #\/) ⇒ ("abc")
(string-split "" #\/) ⇒ ("")
See also string-tokenize in Other string operations.
A string can be flagged as "incomplete" if it may contain byte sequences that do not constitute valid multibyte characters in Gauche’s native encoding.
Incomplete strings may be generated in several circumstances: reading binary data as a string, reading string data that has been ‘chopped’ in the middle of a multibyte character, or concatenating a string with other incomplete strings, for example.
Incomplete strings should be regarded as an exceptional case.
They used to be a way to handle byte strings, but now we have
u8vector (See section gauche.uvector - Uniform vectors) for that purpose.
In fact, we’re planning to remove them in future releases.
Just in case, if you happen to get an incomplete string, you can convert it to a complete string by the following procedure:
Function: string-incomplete->complete str :optional handling
Reinterprets the content of an incomplete string str and returns a newly created complete string from it. The handling argument specifies how to handle illegal byte sequences in str.
#f If str contains an illegal byte sequence, give up the
conversion and return #f. This is the default behavior.
:omit Omit any illegal byte sequences. Always returns a complete string.
a character Replace each byte in illegal byte sequences by the given character. Always returns a complete string.
If str is already a complete string, its copy is returned.
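The default and :omit handling modes can be sketched as follows, assuming the procedure described above is string-incomplete->complete and that the native encoding is UTF-8, in which the byte 0xff is never valid. The variable names ok and bad are hypothetical.

```scheme
;; An incomplete string whose bytes all form valid characters.
(define ok #*"abc")

;; Concatenating with an incomplete string yields an incomplete string.
;; The middle byte, 0xff, is illegal under the assumed UTF-8 encoding.
(define bad (string-append "a" (make-byte-string 1 #xff) "z"))

(string-incomplete->complete ok)          ; ⇒ "abc"
(string-incomplete->complete bad)         ; ⇒ #f (default handling)
(string-incomplete->complete bad :omit)   ; ⇒ "az"
```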
This document was generated by Shiro Kawai on May 28, 2012 using texi2html 1.82.