For Development HEAD DRAFTSearch (procedure/syntax/module):

9.31 gauche.sequence - Sequence framework

Module: gauche.sequence

Provides a generic operations on sequences. A sequence is a collection with ordered elements. Besides all the operations applicable on collections, you can associate integer index to each element, and apply order-aware operations on the elements.

This module inherits gauche.collection (see gauche.collection - Collection framework). All the collection generic operations can be applied to a sequence as well.

Among Gauche builtin class, lists, vectors and strings are sequences and the specialized methods are defined for them. Other extension types, such as uniform vectors, have the methods as well.


9.31.1 Fundamental sequence accessors

Method: ref (seq <sequence>) index :optional fallback

{gauche.sequence} Returns index-th element of the sequence seq. This method enables uniform access for any sequence types.

When index is less than zero, or greater than or equal to the size of the sequence, fallback is returned if provided, or an error is signaled if not.

(ref '(a b c) 1)  ⇒ b
(ref '#(a b c) 1) ⇒ b
(ref "abc" 1)     ⇒ #\b
Method: (setter ref) (seq <sequence>) index value

{gauche.sequence} Sets value to the index-th element of the sequence seq. This is the uniform sequence modifier.

Note: Some sequences may not support arbitrary modification by index. For example, if you have a sequence representing a set of sorted integers, you cannot modify i-th element with arbitrary value. Yet such sequence may provide other means of modification, such as inserting or deleting elements.

(let ((x (list 'a 'b 'c)))
  (set! (ref x 1) 'z)
  x) ⇒ (a z c)

(let ((x (vector 'a 'b 'c)))
  (set! (ref x 1) 'z)
  x) ⇒ #(a z c)

(let ((x (string #\a #\b #\c)))
  (set! (ref x 1) #\z)
  x) ⇒ "azc"
Method: referencer (seq <sequence>)

{gauche.sequence}

Method: modifier (seq <sequence>)

{gauche.sequence}


9.31.2 Slicing sequence

Method: subseq (seq <sequence>) :optional start end

{gauche.sequence} Retrieve a subsequence of the sequence seq, from start-th element (inclusive) to end-th element (exclusive). If end is omitted, up to the end of sequence is taken. The type of the returned sequence is the same as seq.

(subseq '(a b c d e) 1 4)   ⇒ (b c d)
(subseq '#(a b c d e) 1 4)  ⇒ #(b c d)
(subseq "abcde" 1 4)        ⇒ "bcd"

(subseq '(a b c d e) 3)     ⇒ (d e)
Method: (setter subseq) (seq <sequence>) start end value-seq
Method: (setter subseq) (seq <sequence>) start value-seq

{gauche.sequence} Sets the elements of value-seq from the start-th element (inclusive) to the end-th element (exclusive) of the sequence seq. Value-seq can be any sequence, but its size must be larger than (end - start).

In the second form, end is figured out by the length of value-seq.

(define s (vector 'a 'b 'c 'd 'e))
(set! (subseq s 1 4) '(4 5 6))
s ⇒ #(a 4 5 6 e)
(set! (subseq s 0)   "ab")
s ⇒ #(#\a #\b 5 6 e)

9.31.3 Mapping over sequences

You can use extended fold, map, for-each and other generic functions on sequences, since a sequence is also a collection. However, sometimes you want to have index as well as the element itself during iteration. There are several generic functions for it.

Method: fold-with-index kons knil (seq <sequence>) …

{gauche.sequence} Like generic fold, except kons is given the index within seq, as the first argument, as well as each element from seqs and the accrued value.

(fold-with-index acons '() '(a b c))
  ⇒ ((2 . c) (1 . b) (0 . a))
Method: map-with-index proc (seq <sequence>) …
Method: map-to-with-index class proc (seq <sequence>) …
Method: for-each-with-index proc (seq <sequence>) …

{gauche.sequence} Like map, map-to and for-each, except proc receives the index as the first argument.

(map-with-index list '(a b c d) '(e f g h))
  ⇒ ((0 a e) (1 b f) (2 c g) (3 d h))

(map-to-with-index <vector> cons '(a b c d))
  ⇒ #((0 . a) (1 . b) (2 . c) (3 . d))
Method: find-with-index pred (seq <sequence>)

{gauche.sequence} Finds the first element in seq that satisfies pred like find, but returns two values, the index of the element and the element itself. If no element satisfies pred, two #f’s are returned.

(find-with-index char-upper-case? "abraCadabra")
  ⇒ 4 and #\C

(find-with-index char-numeric? "abraCadabra")
  ⇒ #f and #f
Method: find-index pred (seq <sequence>)

{gauche.sequence} Like find, but returns the index of the first element that satisfies pred in seq, instead of the element itself. If no element in seq satisfies pred, #f is returned.

(find-index char-upper-case? "abraCadabra")
  ⇒ 4

(find-index char-numeric? "abraCadabra")
  ⇒ #f

See also list-index in scheme.list (see scheme.list - R7RS lists).

Method: fold-right kons knil (seq <sequence>) …

{gauche.sequence} Generalization of fold-right on lists. Like fold, this method applies a higher-order function kons over given sequence(s), passing the "seed" value whose default is knil. The difference between fold and fold-right is the associative order of elements on which kons is applied.

When we have one sequence, [E0, E1, ..., En], fold and fold-right work as follows, respectively.

fold:
  (kons En (kons En-1 (kons ... (kons E1 (kons E1 knil)) ...)))

fold-right
  (kons E0 (kons E1 (kons ... (kons En-1 (kons En knil)) ...)))

This method isn’t defined on <collection>, since collections don’t care the order of elements.


9.31.4 Other operations over sequences

Selection and searching

Generic function: sequence-contains haystack needle :key test

{gauche.sequence} Both needle and haystack must be sequences. Searches needle from haystack from the beginning of haystack. If needle is found, the index in haystack where it begins is returned. Otherwise #f is returned. The keyword argument test is used to compare elements; its default is eqv?.

(sequence-contains '#(a b r a c a d a b r a) '#(b r a))
  ⇒ 1

(sequence-contains '#(a b r a c a d a b r a) '#(c r a))
  ⇒ #f

This can be regarded as generalization of string-contains in SRFI-13 (see String searching).

Function: break-list-by-sequence list needle :key test
Function: break-list-by-sequence! list needle :key test

{gauche.sequence} Searches a sequence needle from list, and if found, breaks list to two parts—the prefix of list up to right before needle begins, and the rest—and returns them. List must be a list, but needle can be any sequence. Elements are compared by test, defaulted to eqv?.

(break-list-by-sequence '(a b r a c a d a b r a) '(c a d))
  ⇒ (a b r a) and (c a d a b r a)

If needle isn’t found in list, it returns list itself and (). This behavior is aligned to span and break (see scheme.list - R7RS lists), which split a list by predicate but returns the whole list if split condition isn’t met.

(break-list-by-sequence '(a b r a c a d a b r c a) '(c a z))
  ⇒ (a b r a c a d a b r c a) and ()

The linear update version break-list-by-sequence! modifies list to create the return value if necessary, so list must be mutable. The caller must use the return value instead of relying on side-effects, though, for list may not be modified.

Function: sequence->kmp-stepper needle :key test

{gauche.sequence} This is an internal routine to search subsequence (needle) inside larger sequence, using Knuth-Morris-Pratt (KMP) algorithm. It is used in sequence-contains, break-list-by-sequence and break-list-by-sequence!.

Returns a procedure that performs one step of KMP search. The procedure takes two arguments, an element elt and an index k. It compares elt with (~ needle k), and returns two values—the next index and a flag indicating the match is completed. When the match is completed, the next index is equal to the length of needle.

As an edge case, if needle is an empty sequence, sequence->kmp-stepper returns #f.

Elements are compared using test, which is defaulted to eqv?.

The following is a skeleton of searcher using sequence->kmp-stepper. Here we assume haystack is a list, and we just return whether the needle is found or not, or needle is empty; you might want to carry around other info in the loop (e.g. sequence-contains tracks the current index of haystack in order to return the found index.)

(if-let1 stepper (sequence->kmp-stepper needle)
  (let loop ([haystack haystack]
             [k 0])
    (if (null? haystack)
      'not-found
      (receive (k found) (stepper (car haystack) k) ; KMP step
        (if found
          'found
          (loop (cdr haystack) k)))))
  'needle-is-empty)

Note that selection and searching methods for collections can also be applied to sequences. See Selection and searching in collection.

Grouping

Generic function: group-sequence seq :key key test

{gauche.sequence} Groups consecutive elements in a sequence seq which have the common key value. A key value of an element is obtained by applying the procedure key to the element; the default procedure is identity. For each element in seq, key is applied exactly once. The equal-ness of keys are compared by test procedure, whose default is eqv?.

(group-sequence '(1 1 1 2 3 4 4 2 2 3 1 1 3))
  ⇒ ((1 1 1) (2) (3) (4 4) (2 2) (3) (1 1) (3))

(group-sequence '(1 1 1 2 3 4 4 2 2 3 1 1 3)
                :key (cut modulo <> 2)))
  ⇒ ((1 1 1) (2) (3) (4 4 2 2) (3 1 1 3))

(group-sequence '#("a" "a" "b" "b" "c" "d" "d")
                :test string=?)
  ⇒ (("a" "a") ("b" "b") ("c") ("d" "d"))

(group-sequence "aabbcdd"
                :test char=?)
  ⇒ ((#\a #\a) (#\b #\b) (#\c) (#\d #\d))

This method is similar to Haskell’s group. If you want to group elements that are not adjacent, use group-collection (see Selection and searching in collection).

If you simply need to reduce each group for one instance, that is, removing adjacent duplicated elements, you can use delete-neighbor-dups below.

Generic function: group-contiguous-sequence seq :key key next test squeeze

{gauche.sequence} Group contiguous elements in seq.

(group-contiguous-sequence '(1 2 3 4 7 8 9 11 13 14 16))
  ⇒ ((1 2 3 4) (7 8 9) (11) (13 14) (16))

If the keyword argument squeeze is true, each subsequence is represented with its first and last elements, except when the subsequence has only one element.

(group-contiguous-sequence '(1 2 3 4 7 8 9 11 13 14 16) :squeeze #t)
  ⇒ ((1 4) (7 9) (11) (13 14) (16))

The keyword argument key must be a procedure taking one argument, and it is applied to every element in the sequence once, to construct the result. Its default is identity.

The keyword argument next must be a procedure taking one argument, which is the key value (whatever key procedure returns) and must return the “next” key value. Its default is (^n (+ n 1)).

The test argument must be a procedure taking two argument and used to compare two key values. Its default is eqv?.

(group-contiguous-sequence "AbCdFgH"
  :key char-upcase :next (^c (integer->char (+ 1 (char->integer c)))))
  ⇒ ((#\A #\B #\C #\D) (#\F #\G #\H))
Generic function: delete-neighbor-dups seq :key key test start end

{gauche.sequence} Returns a sequence of the same type as seq, in which elements in seq are included in the original order, except duplicate adjacent elements. The type of seq must has a builder.

(delete-neighbor-dups '(1 1 1 2 3 4 4 2 2 3 1 1 3))
  ⇒ (1 2 3 4 2 3 1 3)

(delete-neighbor-dups '#(1 1 1 2 3 4 4 2 2 3 1 1 3))
  ⇒ #(1 2 3 4 2 3 1 3)

(delete-neighbor-dups "1112344223113")
  ⇒ "12342313"

Elements are compared with eqv? by default. You can pass alternative procedure to test keyword argument; it is always called as (test x y), where x and y are the contiguous elements in seq. If elements are compared equal, the first one is kept:

(delete-neighbor-dups "AaaAbBBbCCcc" :test char-ci=?)
  ⇒ "AbC"

If key is provided, it must be a procedure that takes one arguments. It is applied to each element of seq at most once, and each resulting value is used for the comparison instead of elements themselves.

(delete-neighbor-dups
  '((1 . "a") (1 . "b") (2 . "c") (2 . "d"))
  :key car)
  ⇒ ((1 . "a") (2 . "c"))

The start and end arguments specify indexes in the seq to limit the range to look at. Where start is inclusive, end is exclusive.

(delete-neighbor-dups "1112344223113" :start 3 :end 9)
  ⇒ "2342"
Generic function: delete-neighbor-dups! seq :key key test start end

{gauche.sequence} Scan seq from left to right, dropping consecutive duplicated elements. The result is stored into seq, packed to left. Note seq must be modifiable by index, i.e. modifier method must be defined. The rest of seq will be untouched. Returns the next index after the last modified entry.

(let1 v (vector 1 1 2 2 3 3 2 2 4 4)
  (list (delete-neighbor-dups! v)
        v))
  ⇒ (5 #(1 2 3 2 4 3 2 2 4 4))

The semantics of keyword arguments key, test, start and end are the same as delete-neighbor-dups.

(let1 v (vector 1 1 2 2 3 3 2 2 4 4)
  (list (delete-neighbor-dups! v :start 2)
        v))
  ⇒ (6 #(1 1 2 3 2 4 2 2 4 4))

Note: This method works on any sequence with modifier method, but it’s not necessarily more efficient than delete-neighbor-dups, which creates a new sequence. If seq is a list or a string, each modification by index takes O(n) time (for a string even it costs O(n) extra storage), so the total cost is O(n^2), whereas delete-neighbor-dups needs O(n) time and storage. This works best for vectors and alike, with which it doesn’t cost extra allocation and runs in O(n) time.

Generic function: delete-neighbor-dups-squeeze! seq :key key test start end

{gauche.sequence} Operates like delete-neighbor-dups but reuses storage of seq, which will be resized by dropping duplicated elements. Returns the sequence after dupes are removed.

Not all sequences are resizable, so this method won’t be defined for such sequences. The gauche.sequence module only provides this method for <list>, in which dropping the middle of the sequence is very efficient as it is just a single set-cdr!.

(delete-neighbor-dups-squeeze! (list 1 1 1 2 2 2 3 3 3))
  ⇒ (1 2 3)

(delete-neighbor-dups-squeeze! (list 1 1 1 2 2 2 3 3 3 4 4)
                               :start 3 :end 9)
  ⇒ (2 3)

The semantics of keyword arguments key, test, start and end are the same as delete-neighbor-dups.

Prefix

Generic function: common-prefix (a <sequence>) (b <sequence>) :key key test

{gauche.sequence} Returns a new sequence of the same type of a which contains the common prefix of sequences a and b. The types of a and b doesn’t need to match. The type of a must have a builder.

For each corresponding element in a and b, the key procedure is applied (default identity), then compared with test procedure (default eqv?).

(common-prefix '(a b c d e) '(a b c e f))
  ⇒ (a b c)

(common-prefix "abcef" '#(#\a #\b #\c #\d #\e))
  ⇒ "abc"

For strings, SRFI-13 has a specific function with related feature: string-prefix-length (see String prefixes & suffixes).

Generic function: common-prefix-to (class <class>) (a <sequence>) (b <sequence>) :key key test

{gauche.sequence} Returns a new sequence of the type class which contains the common prefix of sequences a and b. The types of a and b doesn’t need to match, and neither needs to have a builder. The class must be a sequence class with a builder.

The meanings of keyword arguments are the same as common-prefix.

(common-prefix-to <list> "abcde" "ABCEF" :test char-ci=?)
  ⇒ '(#\a #\b #\c)

Permutation and shuffling

Generic function: permute (src <sequence>) (permuter <sequence>) :optional fallback

{gauche.sequence} Returns a newly created sequence of the same type as src, in which the elements are permuted from src according to permuter.

Permuter is a sequence of exact integers. When the k-th element of permuter is i, the k-th element of the result is (ref src i). Therefore, the size of the result sequence is the same as the size of permuter. Permuter can be any kind of sequence, unrelated to the type of src.

It is allowed that the same index i can appear more than once in permuter.

(permute '(a b c d) '(3 2 0 1))     ⇒ (d c a b)
(permute '(a b c d) '(0 2))         ⇒ (a c)
(permute '(a b c d) '(0 0 1 1 2 2)) ⇒ (a a b b c c)

If an integer in permuter is out of the valid range as the index of src, then an error is signaled unless fallback is given. If fallback is given, what value is used depends on the result of (ref src i fallback)—which usually returns fallback for the out-of-range index i.

(permute '#(a b c) '(3 2 1 0) 'foo) ⇒ #(foo c b a)

(permute "!,HWdelor" #(2 5 6 6 7 1 -1 3 7 8 6 4 0) #\space)
  ⇒ "Hello, World!"
Generic function: permute-to (class <class>) (src <sequence>) (permuter <sequence>) :optional fallback

{gauche.sequence} Like permute, but the result will be an instance of the given class instead of the class of src.

(permute-to <string> '(#\a #\b #\c #\d #\r)
            '(0 1 4 0 2 0 3 0 1 4 0))
  ⇒ "abracadabra"
Generic function: permute! (src <sequence>) (permuter <sequence>) :optional fallback

{gauche.sequence} Also like permute, but the result is stored back to src. Src must be a mutable sequence, and the length of src and permuter must be the same.

Generic function: shuffle (src <sequence>) :optional random-source

{gauche.sequence} Returns a new sequence of the same type and size as src, in which elements are randomly permuted.

(shuffle '(a b c d e))  ⇒ (e b d c a)
(shuffle "abcde")       ⇒ "bacde"

This generic function uses SRFI-27 (see srfi.27 - Sources of Random Bits). By default it uses default-random-source, but you can pass an alternative random source by the optional argument.

Generic function: shuffle-to (class <class>) (src <sequence>) :optional random-source

{gauche.sequence} Like shuffle, except that the result will be an instance of class instead of the class of src.

Generic function: shuffle! (src <sequence>) :optional random-source

{gauche.sequence} Like shuffle, but the result is stored back to src. Src must be a mutable sequence.


9.31.5 Implementing sequences

When you define your own aggregate class that holds ordered objects, you can follow the sequence protocol below to make it sequence.

Inherit <sequence>: You need to derive your class from <sequence>, so that sequence-related methods can be uniformly used. Note that <sequence> inherits <collection>, so your class will also be a collection.

Implement collection protocol: A sequence is also a collection, and you need to implement a collection protocol (see Implementing collections). For the very least, you need to have call-with-iterator method. If you want your sequence to be buildable, you also need to have call-with-builder method. All other collection generic functions have default method on top of these two, so they’ll work; though you can specialize other collection generic functions if it’s more efficient. We strongly recommend to specialize size-of, for its default method walks over the entire sequence to find its size, which can be very ineffient (O(N) time).

Implement sequence protocol: You need to define referencer, modifier, at least. If your sequence is immutable, you can leave modifier undefined (the default method raises an error). Again, you may want to specialize other generic functions if it’s more efficient. It is recommended to define call-with-reverse-iterator as well, which is used for operations that walks the sequence in the reverse order (e.g. fold-right). If call-with-reverse-iterator is not specialized, the default method uses call-with-iterator and keeps all the elements temporarily during execution, which takes O(N) space.



For Development HEAD DRAFTSearch (procedure/syntax/module):
DRAFT