text.tr - Transliterate characters
This module implements a transliteration function
that substitutes characters of the input string.
This functionality is provided by the Unix tr(1) command
and incorporated in various programs such as sed(1) and perl.
Gauche's tr is aware of multibyte characters.
tr reads from input and writes to output, transliterating characters in from-list to the corresponding ones in to-list. Characters that do not appear in from-list are passed through.
The default values of input and output are current input port and current output port, respectively.
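As a minimal sketch of the two ways to supply ports, the following redirects the current ports with string ports and then passes ports explicitly. The sample strings and the variable names (shouted, out, shouted-again) are illustrative assumptions, not part of the module.

```scheme
(use text.tr)

;; Default ports: tr reads the current input port and writes to the
;; current output port, so string ports can be swapped in around it.
(define shouted
  (with-output-to-string
    (lambda ()
      (with-input-from-string "hello, world"
        (lambda () (tr "a-z" "A-Z"))))))
;; shouted is "HELLO, WORLD"

;; Explicit ports via the input and output keyword arguments.
(define out (open-output-string))
(tr "a-z" "A-Z"
    :input (open-input-string "hello again")
    :output out)
(define shouted-again (get-output-string out))
;; shouted-again is "HELLO AGAIN"
```

Characters outside from-list (the comma and the spaces here) are copied unchanged.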
Both from-list and to-list must be strings. They may contain the following special syntax. Other characters that do not fit the syntax are taken as they are.
x-y
Expanded to the increasing sequence of characters from x to y,
inclusive. The order is determined by the internal character
encoding system; generally it is safer to limit use of this within
the range of the same character class. The character x
must be before y.
x*n
Repeats x n times, where n is a decimal number.
Meaningful only in
to-list; it is an error to use this form in from-list.
If n is omitted or zero,
x is repeated until to-list
matches the length of from-list (any character after it is ignored).
\x
Represents x itself. Use this escape to prevent a special
character (-, *, or \) from being interpreted specially. Note that if you place
a backslash in a string, you must write
\\, for the Scheme
reader also interprets backslash as a special character.
There is no special sequence to represent non-graphical characters, since you can write such characters using the string syntax of Scheme.
Here are some basic examples.
    ;; swaps case of input
    (tr "A-Za-z" "a-zA-Z")

    ;; replaces 7-bit non-graphical characters with '?'
    (tr "\x00-\x1f\x7f" "?*")
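The range, repeat, and escape forms can also be tried on literal strings. The sketch below uses string-tr, the string-input variant of tr from the same module; the sample inputs and the names r1 to r4 are illustrative assumptions.

```scheme
(use text.tr)

;; x-y: a range. Lowercase letters map to uppercase.
(define r1 (string-tr "Hello" "a-z" "A-Z"))       ; "HELLO"

;; x*: repeat x until to-list is as long as from-list.
(define r2 (string-tr "a1b22c" "0-9" "#*"))       ; "a#b##c"

;; x*n: repeat x exactly n times; to-list expands to "xxyyy".
(define r3 (string-tr "abcde" "a-e" "x*2y*3"))    ; "xxyyy"

;; \x: a literal hyphen. The Scheme reader needs the backslash doubled.
(define r4 (string-tr "2024-01-15" "\\-" "/"))    ; "2024/01/15"
```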
If to-list is shorter than from-list, the behavior depends on the keyword argument delete. If a true value is given, characters that appear in from-list but not in to-list are deleted. Otherwise, the extra characters in from-list are just passed through.
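A small sketch of the difference the delete keyword makes, again using string-tr on literal strings. The inputs and the names kept, dropped, and letters are assumptions chosen for illustration.

```scheme
(use text.tr)

;; from-list has 27 characters (a-z plus space); to-list has 26.
;; Without delete, the unmatched space is passed through.
(define kept (string-tr "hello world" "a-z " "A-Z"))
;; kept is "HELLO WORLD"

;; With delete, the unmatched space is removed.
(define dropped (string-tr "hello world" "a-z " "A-Z" :delete #t))
;; dropped is "HELLOWORLD"

;; An empty to-list with delete removes every character in from-list.
(define letters (string-tr "a1b2c3" "0-9" "" :delete #t))
;; letters is "abc"
```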
When a true value is specified for complement,
the character set in from-list is complemented.
Note that this implies a huge set of characters,
so it is not very useful unless either the output character
set is a single character (using '*') or it is used with
the delete keyword.
When a true value is specified for squeeze, each run of the same replaced character is squeezed to one. If to-list is empty, each run of the same character in from-list is squeezed.
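The following sketch shows typical uses of the complement and squeeze keywords with string-tr. The inputs and the names masked, digits, and squeezed are illustrative assumptions.

```scheme
(use text.tr)

;; complement with a single repeated output character:
;; every character that is NOT a digit becomes '?'.
(define masked (string-tr "a1b2" "0-9" "?*" :complement #t))
;; masked is "?1?2"

;; complement with delete: keep only the digits.
(define digits
  (string-tr "tel: 555-1234" "0-9" "" :complement #t :delete #t))
;; digits is "5551234"

;; squeeze: runs of the same replaced character collapse to one.
(define squeezed (string-tr "a   b    c" " " "_" :squeeze #t))
;; squeezed is "a_b_c"
```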
tr builds a table to map the characters for
efficiency. Since Gauche can deal with a potentially huge set
of characters, it limits the use of the table to characters
with small codes (below 256 by default). If you want to transliterate
multibyte characters in a large text, however, you might want
to use a larger table, trading memory usage for speed. You can specify
the internal table size with the table-size keyword argument.
For example, if you transliterate lots of EUC-JP hiragana text
to katakana, you may want to set the table size to a value greater than 42483
(the character code of the last katakana).
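As a hedged sketch of the table-size keyword, the following converts hiragana to katakana. It assumes the internal encoding is UTF-8 rather than the EUC-JP case mentioned above, so the table size used here (#x3100) is an assumption chosen to exceed the code of the last katakana in the range. The value affects speed and memory only, not the result.

```scheme
(use text.tr)

;; Hiragana to katakana. Both ranges contain the same number of
;; characters, so each hiragana maps to the katakana at the same offset.
;; ASSUMPTION: the internal encoding is UTF-8, where the last katakana
;; in the range has code #x30f3, so a table of #x3100 entries covers
;; every character involved.
(define katakana
  (string-tr "ひらがな" "ぁ-ん" "ァ-ン" :table-size #x3100))
;; katakana is "ヒラガナ"
```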
Note that the pre-calculation to build the transliteration table
incurs some overhead. If you want to call
tr many times
inside a loop, consider using
build-transliterator described below.
string-tr works like tr, except that input is taken from the string argument string, and the result is returned as a string.
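A minimal sketch of the string variant, string-tr, applied to a ROT13 substitution. The helper name rot13 and the sample text are hypothetical; the from-list and to-list both expand to 52 characters, so every letter has a counterpart.

```scheme
(use text.tr)

;; rot13 is a hypothetical helper: each letter is replaced by the one
;; 13 places later in the alphabet, wrapping around.
(define (rot13 s)
  (string-tr s "A-Za-z" "N-ZA-Mn-za-m"))

(define encoded (rot13 "Hello"))    ; "Uryyb"
(define decoded (rot13 encoded))    ; "Hello"
```

Applying the helper twice restores the original text, which makes it a convenient self-check.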
build-transliterator returns a procedure that does the actual transliteration. This effectively
"pre-compiles" the internal data structure. If you want to run
tr with the same sets repeatedly, you can build the procedure
once and apply it repeatedly, saving the overhead of initialization.
A note on an edge case: when the input and/or output keyword arguments are omitted, the created transliterator uses current-input-port and/or current-output-port as of the time the transliterator is called, not the time it is built.
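    ;; rebuilds the transliteration table for every line
    (with-input-from-file "huge-file.txt"
      (lambda ()
        (let loop ((line (read-line)))
          (unless (eof-object? line)
            (print (string-tr line "A-Za-z" "a-zA-Z"))
            (loop (read-line))))))

    ;; runs more efficiently: the table is built only once
    (with-input-from-file "huge-file.txt"
      (lambda ()
        (let ((ptr (build-transliterator "A-Za-z" "a-zA-Z")))
          (let loop ((line (read-line)))
            (unless (eof-object? line)
              (with-input-from-string line ptr)
              (newline)
              (loop (read-line)))))))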
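The note about omitted input and output arguments can be illustrated with a short sketch: one transliterator is built once and then called under different port bindings. The names upcase, upcase-string, and results are hypothetical and used only for this illustration.

```scheme
(use text.tr)

;; Build the transliterator once. No input or output is given, so the
;; ports are looked up each time the procedure is called.
(define upcase (build-transliterator "a-z" "A-Z"))

;; upcase-string is a hypothetical helper that rebinds both current
;; ports to string ports around a single call.
(define (upcase-string s)
  (with-output-to-string
    (lambda ()
      (with-input-from-string s upcase))))

(define results (map upcase-string '("first" "second" "third")))
;; results is ("FIRST" "SECOND" "THIRD")
```

Because the ports are resolved at call time, the same procedure can serve any number of inputs without rebuilding its table.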