Collection framework (Gauche Users’ Reference)

9.6 `gauche.collection` - Collection framework

Module: gauche.collection ¶

This module provides a set of generic functions (GFs) that iterate over various collections. The Scheme standard has some iterative primitives such as map and for-each, and scheme.list (see scheme.list - R7RS lists adds a rich set of such functions, but they work only on lists.

Using the method dispatch of the object system, this module efficiently extends those functions for other collection classes such as vectors and hash tables. It also provides a simple way for user-defined class to adapt those operations. So far, the following operations are defined.

Mapping: fold, fold2, fold3, map, map-to, map-accum, for-each
Selection and searching: find, find-min, find-max, find-min&max, filter, filter-to, remove, remove-to, partition, partition-to group-collection
Conversion: coerce-to
Miscellaneous: size-of, lazy-size-of
Fundamental iterator creator: call-with-iterator, call-with-builder, with-iterator, with-builder, call-with-iterators.

Those operations work on collections and its subclass, sequences. A collection is a certain form of a set of objects that you can traverse all the object in it in a certain way. A sequence is a collection that all its elements are ordered, so that you can retrieve its element by index.

The following Gauche built-in objects are treated as collections and/or sequences.

<list>: A sequence.
<vector>: A sequence.
<string>: A sequence (of characters)
<hash-table>: A collection. Each element is a pair of a key and a value.
<s8vector>, <u8vector>, … <f64vector>: A sequence (methods defined in gauche.uvector module, see Uniform vectors).

See gauche.sequence - Sequence framework, for it adds more sequence specific methods.

The methods that needs to return a set of objects, i.e. map, filter, remove and partition. returns a list (or lists). The corresponding “-to” variant (map-to, filter-to, remove-to and partition-to. takes a collection class argument and returns the collection of the class.

9.6.1 Mapping over collection

These generic functions extends the standard mapping procedures. See also Mapping over sequences, if you care the index as well as elements.

Generic function: fold proc knil coll coll2 … ¶

{gauche.collection} This is a natural extension of fold (see Other list procedures).

For each element Ei in the collection coll, proc is called as (proc Ei Ri-1), where Ri-1 is the result of (i-1)-th invocation of proc for i > 0, and R0 is knil. Returns the last invocation of proc.

(fold + 0 '#(1 2 3 4)) ⇒ 10
(fold cons '() "abc")  ⇒ (#\c #\b #\a)

If the coll is a sequence, it is guaranteed that the elements are traversed in order. Otherwise, the order of iteration is undefined.

Note: We don’t provide fold-right on collections, since the order of elements doesn’t matter, so only fold is sufficient for meaningful traversal. However, sequences do have fold-right; see Mapping over sequences.

You can fold more than one collection, although it doesn’t make much sense unless all of the collections are sequences. Suppose E(k, i) for i-th element of k-th collection. proc is called as

(proc E(0,i) E(1,i) … E(K-1,i) Ri-1)

Different types of collections can be mixed together.

(fold acons '() "abc" '#(1 2 3))
  ⇒ ((#\c 3) (#\b 2) (#\a 1))

;; calculates dot product of two vectors
(fold (lambda (a b r) (+ (* a b) r)) 0
      '#(3 5 7) '#(2 4 6))
  ⇒ 68

When more than one collection is given, fold terminates as soon as at least one of the collections exhausted.

Generic function: fold2 proc knil1 knil2 coll coll2 … ¶

Generic function: fold3 proc knil1 knil2 knil3 coll coll2 … ¶

{gauche.collection} Like fold, but they can carry two and three state values instead of one, respectively. The state values are initialized by knilN. The procedure proc is called with each element of collN, and the state values. It must return two (fold2) or three (fold3) values, which will be used as the state values of next iteration. The values returned in the last iteration will be the return values of fold2 and fold3.

(fold2 (lambda (elt a b) (values (min elt a) (max elt b)))
       256 0 '#u8(33 12 142 1 74 98 12 5 99))
 ⇒ 1 and 142  ;; find minimum and maximum values

9.6.2 Selection and searching in collection

Generic function: find pred coll ¶

{gauche.collection} Applies pred for each element of a collection coll until pred returns a true value. Returns the element on which pred returned a true value, or #f if no element satisfies pred.

If coll is a sequence, it is guaranteed that pred is applied in order. Otherwise the order of application is undefined.

(find char-upper-case? "abcDe") ⇒ #\D
(find even? '#(1 3 4 6)) ⇒ 4
(find even? '(1 3 5 7))  ⇒ #f

Generic function: find-min coll :key key compare default ¶

Generic function: find-max coll :key key compare default ¶

{gauche.collection} Returns a minimum or maximum element in the collection coll.

A one-argument procedure key, whose default is identity, is applied for each element to obtain a comparison value. Then a comparison value is compared by a two-argument procedure compare, whose default is <. If the collection has zero or one element, the compare procedure is never called.

When the collection is empty, a value given to default is returned, whose default is #f.

(find-min '((a . 3) (b . 9) (c . -1) (d . 7)) :key cdr) ⇒ (c . -1)

Generic function: find-min&max coll :key key compare default default-min default-max ¶: {gauche.collection} Does find-min and find-max simultaneously, and returns two values, the minimum element and the maximum element. The keyword arguments key, compare, and default are the same as find-min and find-max. Alternatively you can give default values for minimum and maximum separately, by default-min and default-max.

Generic function: filter pred coll ¶

{gauche.collection} Returns a list of elements of collection coll that satisfies the predicate pred. If the collection is a sequence, the order is preserved in the result.

(filter char-upper-case? "Hello, World")
  ⇒ (#\H #\W)
(filter even? '#(1 2 3 4)) ⇒ (2 4)

Generic function: filter-to class pred coll ¶

{gauche.collection} Same as filter, but the result is returned as a collection of class class.

(filter-to <vector> even? '#(1 2 3 4)) ⇒ #(2 4)
(filter-to <string> char-upper-case? "Hello, World")
  ⇒ "HW"

Generic function: remove pred coll ¶

{gauche.collection} Returns a list of elements of collection coll that does not satisfy the predicate pred. If the collection is a sequence, the order is preserved in the result.

(remove char-upper-case? "Hello, World")
  ⇒ (#\e #\l #\l #\o #\, #\space #\o #\r #\l #\d)
(remove even? '#(1 2 3 4)) ⇒ (1 3)

Generic function: remove-to class pred coll ¶

{gauche.collection} Same as remove, but the result is returned as a collection of class class.

(remove-to <vector> even? '#(1 2 3 4)) ⇒ #(1 3)
(remove-to <string> char-upper-case? "Hello, World")
  ⇒ "ello, orld"

Generic function: partition pred coll ¶

{gauche.collection} Does filter and remove the same time. Returns two lists, the first consists of elements of the collection coll that satisfies the predicate pred, and the second consists of elements that doesn’t.

(partition char-upper-case? "PuPu")
  ⇒ (#\P #\P) and (#\u #\u)
(partition even? '#(1 2 3 4))
  ⇒ (2 4) and (1 3)

Generic function: partition-to class pred coll ¶

{gauche.collection} Same as partition, except the results are returned in the collections of class class.

(partition-to <string> char-upper-case? "PuPu")
  ⇒ "PP" and "uu"
(partition-to <vector> even? '#(1 2 3 4))
  ⇒ #(2 4) and #(1 3)

Generic function: group-collection coll :key key test ¶

{gauche.collection} Generalized partition. Groups elements in coll into those who has the same key value, and returns the groups as of lists. Key values are calculated by applying the procedure key to each element of coll. The default value of key is identity. For each element of coll, key is applied exactly once. The equal-ness of keys are compared by test procedure, whose default is eqv?.

If coll is a sequence, then the order of elements in each group of the result is the same order in coll.

(group-collection '(1 2 3 2 3 1 2 1 2 3 2 3))
  ⇒ ((1 1 1) (2 2 2 2 2) (3 3 3 3))

(group-collection '(1 2 3 2 3 1 2 1 2 3 2 3) :key odd?)
  ⇒ ((1 3 3 1 1 3 3) (2 2 2 2 2))

(group-collection '(("a" 2) ("b" 5) ("c" 1) ("b" 3) ("a" 6))
  :key car :test string=?)
  ⇒ ((("a" 2) ("a" 6)) (("b" 5) ("b" 3)) (("c" 1)))

See also group-sequence in gauche.sequence (see Other operations over sequences), which only groups adjacent elements.

Generic function: group-collection->alist coll :key key value test ¶

{gauche.collection} Similar to group-collection, but returns the result as an assoc list keyed by extracted keys of comparison. It’s eaiser to see the example:

(group-collection->alist '((3 "a") (2 "b") (1 "c") (3 "d") (1 "e"))
                         :key car :value cadr)
  ⇒ ((3 "a" "d") (2 "b") (1 "c" "e"))

;; Compare with this
(group-collection '((3 "a") (2 "b") (1 "c") (3 "d") (1 "e"))
                  :key car)
  ⇒ (((3 "a") (3 "d")) ((2 "b")) ((1 "c") (1 "e")))

First, elements in the collection is grouped with the common keys. Then, a pair (cons key (map value-proc elements)) is computed, where value-proc is the value extraction procedure passed by the value keyword argument. List of those pairs is returned.

If omitted, key and value arguments are assumed to be identity, and test argument (used to compare keys) is assumed to be eqv?.

If the collection is ordered, the order of the alist is the same as the order the first element with each key appears in the collection, and within each alist entry, the values are the same order of appearance in the collection.

9.6.3 Miscellaneous operations on collection

Generic function: size-of coll ¶: {gauche.collection} Returns the number of elements in the collection. Default method iterates over the collection to calculate the size, which is not very efficient and may diverge if the collection is infinite. Some collection classes overload the method for faster calculation.

Generic function: lazy-size-of coll ¶: {gauche.collection} Returns either the size of the collection, or a promise to calculate it. The intent of this method is to avoid size calculation if it is expensive. In some cases, the caller wants to have size just for optimization, and it is not desirable to spend time to calculate the size. Such caller uses this method and just discards the information if it is a promise.

Generic function: coerce-to class coll ¶

{gauche.collection} Convert a collection coll to another collection which is an instance of class. If coll is a sequence and class is a sequence class, the order is preserved.

(coerce-to <vector> '(1 2 3 4))
  ⇒ #(1 2 3 4)

(coerce-to <string> '#(#\a #\b #\c))
  ⇒ "abc"

9.6.4 Fundamental iterator creators

These are fundamental methods on which all the rest of iterative method are built. The method interface is not intended to be called from general code, but suitable for building other iterator construct. The reason why I chose this interface as fundamental methods are explained at the bottom of this subsection.

Generic function: call-with-iterator collection proc :key start ¶

{gauche.collection} A fundamental iterator creator. This creates two procedures from collection, both take no argument, and then call proc with those two procedures. The first procedure is terminate predicate, which returns #t if the iteration is exhausted, or #f if there are still elements to be visited. The second procedure is an incrementer, which returns one element from the collection and sets the internal pointer to the next element. The behavior is undefined if you call the incrementer after the terminate predicate returns #t.

If the collection is actually a sequence, the incrementer is guaranteed to return elements in order, from 0-th element to the last element. If a keyword argument start is given, however, the iteration begins from start-th element and ends at the last element. If the collection is not a sequence, the iteration order is arbitrary, and start argument has no effect.

An implementation of call-with-iterator method may limit the extent of the iterator inside the dynamic scope of the method. For example, it allocates some resource (e.g. connect to a database) before calling proc, and deallocates it (e.g. disconnect from a database) after proc returns.

This method returns the value(s) proc returns.

(call-with-iterator '(1 2 3 4 5)
  (lambda (end? next)
    (do ((odd-nums 0))
        ((end?) odd-nums)
      (when (odd? (next)) (inc! odd-nums)))))
 ⇒ 3

See also with-iterator macro below, for it is easier to use.

Macro: with-iterator (collection end? next args …) body … ¶

{gauche.collection} A convenience macro to call call-with-iterator.

(with-iterator (coll end? next args …) body …)
 ≡
(call-with-iterator coll
  (lambda (end? next) body …)
   args …)

Function: call-with-iterators collections proc ¶: {gauche.collection} A helper function to write n-ary iterator method. This function applies call-with-iterator for each collections, and makes two lists, the first consists of terminate predicates and the second of incrementers. Then proc is called with those two lists. Returns whatever proc returns.

Generic function: call-with-builder collection-class proc :key size ¶

{gauche.collection} A fundamental builder creator. Builder is a way to construct a collection incrementally. Not all collection classes provide this method.

Collection-class is a class of the collection to be built. This method creates two procedures, adder and getter, then calls proc with those procedures. Adder procedure takes one argument and adds it to the collection being built. Getter takes no argument and returns a built collection object. The effect is undefined if adder is called after getter is called.

A keyword argument size may be specified if the size of the result collection is known. Certain collections may be built much more efficiently if the size is known; other collections may just ignore it. The behavior is undefined if more than size elements are added, or the collection is retrieved before size elements are accumulated.

If the collection class is actually a sequence class, adder is guaranteed to add elements in order. Otherwise, the order of elements are insignificant.

Some collection class may take more keyword arguments to initialize the collection.

This method returns the value(s) proc returned.

(call-with-builder <list>
  (lambda (add! get)
    (add! 'a) (add! 'b) (add! 'c) (get)))
 ⇒ (a b c)

(call-with-builder <vector>
  (lambda (add! get)
    (add! 'a) (add! 'b) (add! 'c) (get)))
 ⇒ #(a b c)

See also with-builder macro below, for it is much easier to use.

Macro: with-builder (collection add! get args …) body … ¶

{gauche.collection} A convenience macro to call call-with-builder.

(with-builder (coll add! get args …) body …)
 ≡
(call-with-builder coll
  (lambda (add! get) body …)
  args …)

Discussion: Other iterator methods are built on top of call-with-iterator and call-with-builder. By implementing those methods, you can easily adapt your own collection class to all of those iterative operations. Optionally you can overload some of higher-level methods for efficiency.

It is debatable that which set of operations should be primitives. I chose call-with-iterator style for efficiency of the applications I see most. The following is a discussion of other possible primitive iterators.

fold: It is possible to make fold a primitive method, and build other iterator method on top of it. Collection-specific iterating states can be kept in the stack of fold, thus it runs efficiently. The method to optimize a procedure that uses fold as a basic iterator construct. However, it is rather cumbersome to derive generator-style interface from it. It is also tricky to iterate irregularly over more than one collections.
CPS: Passes iteratee the continuation procedure that continues the iteration. The iteratee just returns when it want to terminate the iteration. It has resource management problem described in Oleg Kiselyov’s article (http://okmij.org/ftp/Scheme/enumerators-callcc.html).
Iterator object: Like C++ iterator or Common Lisp generator. Easy to write loop. The problem is that every call of checking termination or getting next element must be dispatched.
Series: Common Lisp’s series can be very efficient if the compiler can statically analyze the usage of series. Unfortunately it is not the case in Gauche. Even if it could, the extension mechanism doesn’t blend well with Gauche’s object system.
Macros: Iterator can be implemented as macros, and that will be very efficient; e.g. Scheme48’s iterator macro. It uses macros to extend, however, and that doesn’t blend well with Gauche’s object system.

The current implementation is close to the iterator object approach, but using closures instead of iterator objects so that avoiding dispatching in the inner loop. Also it allows the iterator implementor to take care of the resource problem.

9.6.5 Implementing collections

The minimum requirements of the collection class implementation is as follow:

The class inherits <collection> abstract class.
A method call-with-iterator is implemented.

This makes iterator methods such as map, for-each, find and filter to work.

Optionally, you can specialize other generic functions to optimize performance. Particularly, we strongly recommend to specialize size-of, since its default method iterates entire collection to find the size, which is O(N). Usually a collection has bette way to find out its size.

In order to make the constructive methods (e.g. map-to to create your collection), you have to implement call-with-builder method as well. Note that call-with-builder method must work a sort of class method, dispatched by class, rather than normal method dispatched by instance. In Gauche, you can implement it by using a metaclass. Then the minimal code will look like this:

(define-class <your-collection-meta> (<class>) ())

(define-class <your-collection> (<collection>)
 (...) ;; slots
 :metaclass <your-collection-meta>)

(define-method call-with-iterator
    ((coll <your-collection>) proc . options)
  …
  )

(define-method call-with-builder
     ((coll <your-collection-meta>) proc . options)
  …
  )

• Mapping over collection:
• Selection and searching in collection:
• Miscellaneous operations on collection:
• Fundamental iterator creators:
• Implementing collections:

9.6 gauche.collection - Collection framework