gauche.collection
- Collection framework ¶This module provides a set of generic functions (GFs) that iterate over
various collections. The Scheme standard has some iterative
primitives such as map
and for-each
,
and scheme.list
(see scheme.list
- R7RS lists adds a rich set of such functions,
but they work only on lists.
Using the method dispatch of the object system, this module efficiently extends those functions for other collection classes such as vectors and hash tables. It also provides a simple way for user-defined class to adapt those operations. So far, the following operations are defined.
fold
, fold2
, fold3
,
map
, map-to
, map-accum
, for-each
find
, find-min
, find-max
, find-min&max
,
filter
, filter-to
,
remove
, remove-to
, partition
, partition-to
group-collection
coerce-to
size-of
, lazy-size-of
call-with-iterator
, call-with-builder
,
with-iterator
, with-builder
, call-with-iterators
.
Those operations work on collections and its subclass, sequences. A collection is a certain form of a set of objects that you can traverse all the object in it in a certain way. A sequence is a collection that all its elements are ordered, so that you can retrieve its element by index.
The following Gauche built-in objects are treated as collections and/or sequences.
<list>
A sequence.
<vector>
A sequence.
<string>
A sequence (of characters)
<hash-table>
A collection. Each element is a pair of a key and a value.
<s8vector>, <u8vector>, … <f64vector>
A sequence (methods defined in gauche.uvector
module,
see Uniform vectors).
See gauche.sequence
- Sequence framework, for it adds more sequence specific
methods.
The methods that needs to return a set of objects, i.e.
map
, filter
, remove
and partition
.
returns a list (or lists). The corresponding “-to” variant
(map-to
, filter-to
, remove-to
and partition-to
.
takes a collection class argument and returns the collection of the class.
• Mapping over collection: | ||
• Selection and searching in collection: | ||
• Miscellaneous operations on collection: | ||
• Fundamental iterator creators: | ||
• Implementing collections: |
These generic functions extends the standard mapping procedures. See also Mapping over sequences, if you care the index as well as elements.
{gauche.collection
}
This is a natural extension of fold (see Other list procedures).
For each element Ei in the collection coll, proc is called as (proc Ei Ri-1), where Ri-1 is the result of (i-1)-th invocation of proc for i > 0, and R0 is knil. Returns the last invocation of proc.
(fold + 0 '#(1 2 3 4)) ⇒ 10 (fold cons '() "abc") ⇒ (#\c #\b #\a)
If the coll is a sequence, it is guaranteed that the elements are traversed in order. Otherwise, the order of iteration is undefined.
Note: We don’t provide fold-right
on collections, since the order
of elements doesn’t matter, so only fold
is sufficient for
meaningful traversal.
However, sequences do have fold-right
;
see Mapping over sequences.
You can fold more than one collection, although it doesn’t make
much sense unless all of the collections are sequences.
Suppose E(k, i)
for i-th element
of k-th collection. proc is called as
(proc E(0,i) E(1,i) ... E(K-1,i) Ri-1)
Different types of collections can be mixed together.
(fold acons '() "abc" '#(1 2 3))
⇒ ((#\c 3) (#\b 2) (#\a 1))
;; calculates dot product of two vectors
(fold (lambda (a b r) (+ (* a b) r)) 0
'#(3 5 7) '#(2 4 6))
⇒ 68
When more than one collection is given, fold
terminates
as soon as at least one of the collections exhausted.
{gauche.collection
}
Like fold
, but they can carry two and three state values
instead of one, respectively. The state values are
initialized by knilN. The procedure proc is called
with each element of collN, and the state values. It must return
two (fold2
) or three (fold3
) values, which will be used
as the state values of next iteration. The values returned in the
last iteration will be the return values of fold2
and fold3
.
(fold2 (lambda (elt a b) (values (min elt a) (max elt b))) 256 0 '#u8(33 12 142 1 74 98 12 5 99)) ⇒ 1 and 142 ;; find minimum and maximum values
See also map-accum
below.
{gauche.collection
}
This extends the built-in map
(see Walking over lists).
Apply proc for each element in the collection coll, and
returns a list of the results.
If the coll is a sequence, it is guaranteed that the elements are traversed in order. Otherwise, the order of iteration is undefined.
If more than one collection is passed, proc is called with
elements for each collection. In such case, map
terminates
as soon as at least one of the collection is exhausted. Note that passing
more than one collection doesn’t make much sense unless
all the collections are sequences.
(map (lambda (x) (* x 2)) '#(1 2 3)) ⇒ #(2 4 6) (map char-upcase "abc") ⇒ (#\A #\B #\C) (map + '#(1 2 3) '#(4 5 6)) ⇒ (5 7 9)
map
always returns a list. If you want to get the result
in a different type of collection, use map-to
described below.
If you wonder why (map char-upcase "abc")
doesn’t return
"ABC"
, read the discussion in the bottom of this subsection.
{gauche.collection
}
This works the same as map
, except the result is returned
in a collection of class class. Class
must be a
collection class and have a builder interface
(see Fundamental iterator creators).
(map-to <vector> + '#(1 2 3) '#(4 5 6)) ⇒ #(5 7 9) (map-to <string> char-upcase "def") ⇒ "DEF" (map-to <vector> char=? "bed" "pet") ⇒ #(#f #t #f)
{gauche.collection
}
Collects results of proc over collections, while passing
a state value. proc is called like this:
(proc elt1 elt2 ... seed)
Where elt1 elt2 … are the elements of
coll1 coll2 ….
It must return two values; the first value is collected into
a list (like map
), while the second value is passed as
seed to the next call of proc.
When one of the collections is exhausted, map-accum
returns
two values, the list of the first return values from proc,
and the second return value of the last call of proc.
If the given collections are sequences, it is guaranteed that proc is applied in order of the sequence.
This is similar to Haskell’s mapAccumL
, but note that
the order of proc
’s argument and return values are
reversed.
{gauche.collection
}
Extension of built-in for-each
(see Walking over lists).
Applies proc for each elements in the collection(s).
The result of proc is discarded. The return value of
for-each
is undefined.
If the coll is a sequence, it is guaranteed that the elements are traversed in order. Otherwise, the order of iteration is undefined.
If more than one collection is passed, proc is called with
elements for each collection. In such case, for-each
terminates
as soon as one of the collection is exhausted. Note that passing
more than one collection doesn’t make much sense unless
all the collections are sequences.
{gauche.collection
}
Partial-application version of fold
, map
and for-each
.
Discussion: It is debatable what type of collection map
should return when it operates on the collections other than lists.
It may seem more “natural” if (map * '#(1 2) '#(3 4))
returns a vector, and (map char-upcase "abc")
returns a string.
Although such interface seems work for simple cases, it’ll become
problematic for more general cases. What type of collection should
be returned if a string and a vector are passed? Furthermore,
some collection may only have iterator interface but no builder
interface, so that the result can’t be coerced to the argument type
(suppose you’re mapping over database records, for example).
And Scheme programmers are used to think map
returns a list,
and the result of map
are applied to the procedures that
takes list everywhere.
So I decided to add another method, map-to
, to specify
the return type explicitly
The idea of passing the return type is taken from CommonLisp’s map
function, but taking a class metaobject, map-to
is much flexible
to extend using method dispatch.
This protocol (“-to” variant takes a class metaobject
for the result collection) is used throughout the collection framework.
{gauche.collection
}
Applies pred for each element of a collection coll until
pred returns a true value. Returns the element on which pred
returned a true value, or #f
if no element satisfies pred.
If coll is a sequence, it is guaranteed that pred is applied in order. Otherwise the order of application is undefined.
(find char-upper-case? "abcDe") ⇒ #\D (find even? '#(1 3 4 6)) ⇒ 4 (find even? '(1 3 5 7)) ⇒ #f
{gauche.collection
}
Returns a minimum or maximum element in the collection coll.
A one-argument procedure key, whose default is identity
,
is applied for each element to obtain a comparison value.
Then a comparison value is compared by a two-argument procedure
compare, whose default is <
.
If the collection has zero or one element, the compare procedure
is never called.
When the collection is empty, a value given to default is
returned, whose default is #f
.
(find-min '((a . 3) (b . 9) (c . -1) (d . 7)) :key cdr) ⇒ (c . -1)
{gauche.collection
}
Does find-min
and find-max
simultaneously, and returns
two values, the minimum element and the maximum element.
The keyword arguments key, compare, and default are
the same as find-min
and find-max
. Alternatively
you can give default values for minimum and maximum separately,
by default-min and default-max.
{gauche.collection
}
Returns a list of elements of collection coll that satisfies
the predicate pred. If the collection is a sequence,
the order is preserved in the result.
(filter char-upper-case? "Hello, World") ⇒ (#\H #\W) (filter even? '#(1 2 3 4)) ⇒ (2 4)
{gauche.collection
}
Same as filter
, but the result is returned
as a collection of class class.
(filter-to <vector> even? '#(1 2 3 4)) ⇒ #(2 4) (filter-to <string> char-upper-case? "Hello, World") ⇒ "HW"
{gauche.collection
}
Returns a list of elements of collection coll that does not
satisfy the predicate pred. If the collection is a sequence,
the order is preserved in the result.
(remove char-upper-case? "Hello, World") ⇒ (#\e #\l #\l #\o #\, #\space #\o #\r #\l #\d) (remove even? '#(1 2 3 4)) ⇒ (1 3)
{gauche.collection
}
Same as remove
, but the result is returned
as a collection of class class.
(remove-to <vector> even? '#(1 2 3 4)) ⇒ #(1 3) (remove-to <string> char-upper-case? "Hello, World") ⇒ "ello, orld"
{gauche.collection
}
Does filter
and remove
the same time.
Returns two lists, the first consists of elements of the collection
coll that satisfies the predicate pred, and the second
consists of elements that doesn’t.
(partition char-upper-case? "PuPu") ⇒ (#\P #\P) and (#\u #\u) (partition even? '#(1 2 3 4)) ⇒ (2 4) and (1 3)
{gauche.collection
}
Same as partition
, except the results are returned
in the collections of class class.
(partition-to <string> char-upper-case? "PuPu") ⇒ "PP" and "uu" (partition-to <vector> even? '#(1 2 3 4)) ⇒ #(2 4) and #(1 3)
{gauche.collection
}
Generalized partition
. Groups elements in coll
into those who has the same key value, and returns the groups as
of lists. Key values are calculated by applying the procedure key
to each element of coll. The default value of key is
identity
. For each element of coll, key is applied
exactly once.
The equal-ness of keys are compared by
test procedure, whose default is eqv?
.
If coll is a sequence, then the order of elements in each group of the result is the same order in coll.
(group-collection '(1 2 3 2 3 1 2 1 2 3 2 3)) ⇒ ((1 1 1) (2 2 2 2 2) (3 3 3 3)) (group-collection '(1 2 3 2 3 1 2 1 2 3 2 3) :key odd?) ⇒ ((1 3 3 1 1 3 3) (2 2 2 2 2)) (group-collection '(("a" 2) ("b" 5) ("c" 1) ("b" 3) ("a" 6)) :key car :test string=?) ⇒ ((("a" 2) ("a" 6)) (("b" 5) ("b" 3)) (("c" 1)))
See also group-sequence
in gauche.sequence
(see Other operations over sequences),
which only groups adjacent elements.
{gauche.collection
}
Similar to group-collection
, but returns the result as an assoc
list keyed by extracted keys of comparison. It’s eaiser to see the example:
(group-collection->alist '((3 "a") (2 "b") (1 "c") (3 "d") (1 "e")) :key car :value cadr) ⇒ ((3 "a" "d") (2 "b") (1 "c" "e")) ;; Compare with this (group-collection '((3 "a") (2 "b") (1 "c") (3 "d") (1 "e")) :key car) ⇒ (((3 "a") (3 "d")) ((2 "b")) ((1 "c") (1 "e")))
First, elements in the collection is grouped with the common keys.
Then, a pair (cons key (map value-proc elements))
is computed, where value-proc
is the value extraction procedure
passed by the value keyword argument. List of those pairs is returned.
If omitted, key and value arguments are assumed to be
identity
, and test argument (used to compare keys)
is assumed to be eqv?
.
If the collection is ordered, the order of the alist is the same as the order the first element with each key appears in the collection, and within each alist entry, the values are the same order of appearance in the collection.
{gauche.collection
}
Returns the number of elements in the collection. Default method iterates over
the collection to calculate the size, which is not very efficient
and may diverge if the collection is infinite.
Some collection classes overload the method for faster calculation.
{gauche.collection
}
Returns either the size of the collection, or a promise to
calculate it. The intent of this method is to avoid
size calculation if it is expensive. In some cases, the caller
wants to have size just for optimization, and it is not desirable
to spend time to calculate the size. Such caller uses this method
and just discards the information if it is a promise.
{gauche.collection
}
Convert a collection coll to another collection
which is an instance of class.
If coll is a sequence and class is a sequence class,
the order is preserved.
(coerce-to <vector> '(1 2 3 4)) ⇒ #(1 2 3 4) (coerce-to <string> '#(#\a #\b #\c)) ⇒ "abc"
These are fundamental methods on which all the rest of iterative method are built. The method interface is not intended to be called from general code, but suitable for building other iterator construct. The reason why I chose this interface as fundamental methods are explained at the bottom of this subsection.
{gauche.collection
}
A fundamental iterator creator. This creates two procedures
from collection, both take no argument, and then call
proc with those two procedures. The first procedure is
terminate predicate, which returns #t
if the iteration
is exhausted, or #f
if there are still elements to be visited.
The second procedure is an incrementer,
which returns one element from the collection and sets the
internal pointer to the next element.
The behavior is undefined if you call the incrementer after
the terminate predicate returns #t
.
If the collection is actually a sequence, the incrementer is guaranteed to return elements in order, from 0-th element to the last element. If a keyword argument start is given, however, the iteration begins from start-th element and ends at the last element. If the collection is not a sequence, the iteration order is arbitrary, and start argument has no effect.
An implementation of call-with-iterator method may limit the extent of the iterator inside the dynamic scope of the method. For example, it allocates some resource (e.g. connect to a database) before calling proc, and deallocates it (e.g. disconnect from a database) after proc returns.
This method returns the value(s) proc returns.
(call-with-iterator '(1 2 3 4 5) (lambda (end? next) (do ((odd-nums 0)) ((end?) odd-nums) (when (odd? (next)) (inc! odd-nums))))) ⇒ 3
See also with-iterator
macro below, for it is easier to use.
{gauche.collection
}
A convenience macro to call call-with-iterator
.
(with-iterator (coll end? next args ...) body ...) ≡ (call-with-iterator coll (lambda (end? next) body ...) args ...)
{gauche.collection
}
A helper function to write n-ary iterator method.
This function applies call-with-iterator
for each collections,
and makes two lists, the first consists of terminate predicates
and the second of incrementers. Then proc is called
with those two lists. Returns whatever proc returns.
{gauche.collection
}
A fundamental builder creator. Builder is a way to construct
a collection incrementally. Not all collection classes provide
this method.
Collection-class is a class of the collection to be built. This method creates two procedures, adder and getter, then calls proc with those procedures. Adder procedure takes one argument and adds it to the collection being built. Getter takes no argument and returns a built collection object. The effect is undefined if adder is called after getter is called.
A keyword argument size may be specified if the size of the result collection is known. Certain collections may be built much more efficiently if the size is known; other collections may just ignore it. The behavior is undefined if more than size elements are added, or the collection is retrieved before size elements are accumulated.
If the collection class is actually a sequence class, adder is guaranteed to add elements in order. Otherwise, the order of elements are insignificant.
Some collection class may take more keyword arguments to initialize the collection.
This method returns the value(s) proc returned.
(call-with-builder <list> (lambda (add! get) (add! 'a) (add! 'b) (add! 'c) (get))) ⇒ (a b c) (call-with-builder <vector> (lambda (add! get) (add! 'a) (add! 'b) (add! 'c) (get))) ⇒ #(a b c)
See also with-builder
macro below, for it is much easier to use.
{gauche.collection
}
A convenience macro to call call-with-builder
.
(with-builder (coll add! get args ...) body ...) ≡ (call-with-builder coll (lambda (add! get) body ...) args ...)
Discussion: Other iterator methods are built on top of call-with-iterator and call-with-builder. By implementing those methods, you can easily adapt your own collection class to all of those iterative operations. Optionally you can overload some of higher-level methods for efficiency.
It is debatable that which set of operations should be primitives. I chose call-with-iterator style for efficiency of the applications I see most. The following is a discussion of other possible primitive iterators.
fold
It is possible to make fold
a primitive method, and
build other iterator method on top of it.
Collection-specific iterating states can be kept in the
stack of fold
, thus it runs efficiently. The method
to optimize a procedure that uses fold
as a basic
iterator construct.
However, it is rather cumbersome to derive
generator-style interface from it. It is also tricky
to iterate irregularly over more than one collections.
Passes iteratee the continuation procedure that continues the iteration. The iteratee just returns when it want to terminate the iteration. It has resource management problem described in Oleg Kiselyov’s article (http://okmij.org/ftp/Scheme/enumerators-callcc.html).
Like C++ iterator or Common Lisp generator. Easy to write loop. The problem is that every call of checking termination or getting next element must be dispatched.
Common Lisp’s series can be very efficient if the compiler can statically analyze the usage of series. Unfortunately it is not the case in Gauche. Even if it could, the extension mechanism doesn’t blend well with Gauche’s object system.
Iterator can be implemented as macros, and that will be very efficient; e.g. Scheme48’s iterator macro. It uses macros to extend, however, and that doesn’t blend well with Gauche’s object system.
The current implementation is close to the iterator object approach, but using closures instead of iterator objects so that avoiding dispatching in the inner loop. Also it allows the iterator implementor to take care of the resource problem.
The minimum requirements of the collection class implementation is as follow:
<collection>
abstract class.
call-with-iterator
is implemented.
This makes iterator methods such as map
, for-each
,
find
and filter
to work.
Optionally, you can specialize other generic functions to optimize
performance. Particularly, we strongly recommend to specialize
size-of
, since its default method iterates entire collection
to find the size, which is O(N). Usually a collection has bette way
to find out its size.
In order to make the constructive methods (e.g. map-to
to
create your collection), you have to implement call-with-builder
method as well. Note that call-with-builder
method must work
a sort of class method, dispatched by class, rather than normal method
dispatched by instance. In Gauche, you can implement it by using a
metaclass. Then the minimal code will look like this:
(define-class <your-collection-meta> (<class>) ())
(define-class <your-collection> (<collection>)
(...) ;; slots
:metaclass <your-collection-meta>)
(define-method call-with-iterator
((coll <your-collection>) proc . options)
...
)
(define-method call-with-builder
((coll <your-collection-meta>) proc . options)
...
)