For Development HEAD DRAFTSearch (procedure/syntax/module):

9.15 gauche.lazy - Lazy sequence utilities

Module: gauche.lazy

This module provides utility procedures that yield lazy sequences. For the details of lazy sequences, see Lazy sequences.

Since lazy sequences are forced implicitly and are indistinguishable from ordinary lists, we don’t need a separate set of procedures that take lists and lazy sequences; for example, we can use find to search both ordinary lists and lazy sequences.

However, we do need separate procedures depending on whether the result should be a list or a lazy sequence. For example, lmap can take any kind of sequence, and returns a lazy sequence (calling the procedure on demand).

This distinction is subtle, so I reiterate it. You can use both map and lmap on lazy sequences. If you want the whole result list at once, use map; it doesn’t have the overhead of delayed calculation. If you don’t know whether you’ll use the entire result, or you know the result will be a very large list and you don’t want to waste space on an intermediate list, use lmap.
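To make the difference concrete, here is a small sketch that counts how many times the mapped procedure runs. The names calls, double, eager-head and lazy-head are illustrative only; the code assumes the Gauche built-ins inc!, iota and take. Because lazy sequences may evaluate one element ahead, the exact lmap count is not asserted, only that it is small.

```scheme
(use gauche.lazy)

;; Count how many times the mapped procedure is actually called.
(define calls 0)
(define (double x) (inc! calls) (* x 2))

;; map is eager: double runs on all 1000 elements right away.
(define eager-head (take (map double (iota 1000)) 3))
(define calls-after-map calls)      ; 1000

;; lmap is lazy: double runs only as elements are demanded.
(set! calls 0)
(define lazy-head (take (lmap double (iota 1000)) 3))
(define calls-after-lmap calls)     ; a handful, not 1000
```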


9.15.1 Lazy sequence constructors

You can construct lazy sequences with the built-in procedures generator->lseq and lcons. This module provides a few more constructors.

Function: x->lseq obj

{gauche.lazy} A convenience function to coerce obj to a (possibly lazy) list. If obj is a list, it is returned as it is. If obj is another type of collection, the return value is a lazy sequence that iterates over the collection. If obj is any other object, it is returned as it is (you can think of it as a special case of a dotted list).

If you try x->lseq in the REPL, it looks as if it just converts the input collection to a list.

(x->lseq '#(a b c)) ⇒ (a b c)

But that’s because the lazy sequence is forced by the output routine of the REPL.
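A short sketch of the three cases described above, using accessors that force only what they need instead of relying on REPL printing. The variable lst is illustrative; the string case assumes strings are treated as collections of characters.

```scheme
(use gauche.lazy)

(define lst '(1 2 3))
(eq? (x->lseq lst) lst)        ; => #t, a list is returned as is
(x->lseq 42)                   ; => 42, a non-collection is returned as is
(car (x->lseq '#(a b c)))      ; => a, a vector becomes a lazy sequence
(take (x->lseq "hello") 2)     ; => (#\h #\e), so does a string
```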

Function: lunfold p f g seed :optional tail-gen

{gauche.lazy} A lazy version of unfold (see scheme.list - R7RS lists). The arguments p, f and g are procedures, each of which takes one argument, the current seed value. The predicate p determines when to stop, f creates each element, and g generates the next seed value. The seed argument gives the initial seed value. If tail-gen is given, it should also be a procedure that takes one argument, the last seed value (that is, the seed value for which (p seed) returned true). It must return a (possibly lazy) list that forms the tail of the resulting sequence.

(lunfold ($ = 10 $) ($ * 2 $) ($ + 1 $) 0 (^_ '(end)))
  ⇒ (0 2 4 6 8 10 12 14 16 18 end)
Function: literate proc seed

{gauche.lazy} Creates an infinite sequence of (seed (proc seed) (proc (proc seed)) …).

The same sequence can be created with (lunfold (^_ #f) identity proc seed), but literate is a lot more efficient.

See also util.stream, which has stream-iterate (see Stream constructors).

(take (literate (pa$ + 1) 0) 10)
 ⇒ (0 1 2 3 4 5 6 7 8 9)
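The equivalence with lunfold mentioned above can be checked on a prefix of both sequences. This sketch builds powers of two both ways; the variable names are illustrative, and it assumes the Gauche built-ins pa$ and identity.

```scheme
(use gauche.lazy)

;; Powers of two, built with literate and with the equivalent lunfold.
;; The lunfold stop predicate always returns #f, so the sequence is infinite.
(define via-literate (take (literate (pa$ * 2) 1) 8))
(define via-lunfold  (take (lunfold (^_ #f) identity (pa$ * 2) 1) 8))

via-literate                       ; => (1 2 4 8 16 32 64 128)
(equal? via-literate via-lunfold)  ; => #t
```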
Function: coroutine->lseq proc

{gauche.lazy} The proc procedure is called with one argument, yield, which is also a procedure that takes one argument. Whenever yield is called, the value passed to it becomes the next element of the resulting lseq. When proc returns, the lseq ends.

(coroutine->lseq (^[yield] (dotimes [i 10] (yield (square i)))))
  ⇒ (0 1 4 9 16 25 36 49 64 81)

See also generate (Generator constructors), and coroutine->cseq (control.cseq - Concurrent sequences).
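Because the resulting lseq is lazy, proc does not have to return at all. The sketch below (the name naturals is illustrative) uses an endless loop inside proc; only as many elements as are demanded get produced.

```scheme
(use gauche.lazy)

;; The loop never returns by itself, but proc only runs far enough
;; to produce the elements that are actually consumed.
(define naturals
  (coroutine->lseq
   (^[yield]
     (let loop ([i 0])
       (yield i)
       (loop (+ i 1))))))

(take naturals 5)              ; => (0 1 2 3 4)
```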


9.15.2 Lazy sequence operations

Function: lseq->list obj

{gauche.lazy} Returns obj, but if it is an lseq, fully computes all of its values first. It is useful when you need to ensure that the necessary computation is done by a certain point, e.g. when you want to make sure all data is read from a port before closing it.
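A minimal sketch of the port case, using the built-in port->char-lseq on a string port. The helper read-all-chars is a hypothetical name for this illustration.

```scheme
(use gauche.lazy)

;; Realize every character while the port is still open; a bare
;; port->char-lseq would read the rest of the port only on demand.
(define (read-all-chars port)
  (lseq->list (port->char-lseq port)))

(define in (open-input-string "abc"))
(define chars (read-all-chars in))
(close-input-port in)          ; safe: all data has already been read
chars                          ; => (#\a #\b #\c)
```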

Function: lmap proc seq seq2 …

{gauche.lazy} Returns a lazy sequence consisting of the values calculated by applying proc to the first elements of seq seq2 …, then to their second elements, and so on, until any of the inputs is exhausted. Applications of proc are delayed until needed.

(use math.prime)
;; *primes* is infinite; if you use map instead of lmap, this won't return
(take (lmap (pa$ * 2) *primes*) 10)
  ⇒ (4 6 10 14 22 26 34 38 46 58)
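With several input sequences, lmap stops at the shortest one, so an infinite sequence such as (lrange 10) can be paired safely with a finite list. A small sketch:

```scheme
(use gauche.lazy)

;; The finite list bounds the result; (lrange n) counts up from n forever.
(lmap + '(1 2 3) (lrange 10))    ; => (11 13 15)
(lmap list '(a b c) (lrange 0))  ; => ((a 0) (b 1) (c 2))
```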
Function: lmap-accum proc seed seq seq2 …

{gauche.lazy} The procedure proc takes one element from each of seq seq2 …, plus the current seed value. It must return two values: a result value and the next seed value. The result of lmap-accum is a lazy sequence consisting of the first values returned by each invocation of proc.

(use math.prime)
(take (lmap-accum (^[p sum] (values sum (+ p sum))) 0 *primes*) 10)
  ⇒ (0 2 5 10 17 28 41 58 77 100)

This is a lazy version of map-accum (see Mapping over collection), except that lmap-accum does not return the final seed value. The final seed value is known only after the input has been traversed to the end, so it can’t be calculated lazily.
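As another sketch, running totals over a finite list: proc emits the updated total and also passes it on as the next seed. The names sums and total are illustrative; let1 is Gauche's single-binding let.

```scheme
(use gauche.lazy)

;; Each call returns (values result next-seed); both are the new total.
(define sums
  (lmap-accum (^[x total] (let1 t (+ x total) (values t t)))
              0 '(1 2 3 4)))
sums                           ; => (1 3 6 10)
;; The final seed (10 here) is not returned by lmap-accum.
```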

Function: lappend seq …

{gauche.lazy} Returns a lazy sequence which is the concatenation of seq …. Unlike append, this procedure returns immediately, taking O(1) time. It is useful when you want to append large sequences but may use only part of the result.
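A sketch of the O(1) behavior: the first argument below is infinite, which append would have to copy in full and so could never return, while lappend returns at once. The symbol never-reached is just a placeholder.

```scheme
(use gauche.lazy)

;; An infinite first argument: only the demanded prefix is computed.
(define s (lappend (lrange 0) '(never-reached)))
(take s 5)                     ; => (0 1 2 3 4)

;; With finite arguments the result reads like append's.
(lappend '(a b) '(c) '(d e))   ; => (a b c d e)
```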

Function: lconcatenate seqs

{gauche.lazy} The seqs argument is a sequence of sequences. Returns a lazy sequence that is a concatenation of all the sequences in seqs.

This differs from (apply lappend seqs) in that lconcatenate can handle an infinite sequence of lazy sequences.
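For example, here seqs is an infinite lazy sequence of two-element lists, so (apply lappend seqs) would never return, because apply must build its whole argument list first. The name pairs is illustrative.

```scheme
(use gauche.lazy)

;; (0 0) (1 1) (2 2) ... forever.
(define pairs (lmap (^[n] (list n n)) (lrange 0)))
(take (lconcatenate pairs) 6)  ; => (0 0 1 1 2 2)
```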

Function: lappend-map proc seq1 seq …

{gauche.lazy} Lazy version of append-map. This differs from a simple composition of lappend and lmap, since (apply lappend (lmap proc seq1 seq …)) would evaluate the result of lmap to the end before passing it to lappend (because apply needs to determine the whole argument list before calling lappend).

It also differs from (lconcatenate (lmap proc seq1 seq …)) in a subtle way.

Recall that Gauche’s lazy sequences evaluate one element ahead. lconcatenate does that to the result of lmap. To see the effect, let’s define a procedure with a debug print:

(define (p x) #?=(list x x))

The following example shows that (apply lappend (lmap ...)) doesn’t delay any application of p:

gosh> (car (apply lappend (lmap p '(1 2 3))))
#?="(standard input)":4:(list x x)
#?-    (1 1)
#?="(standard input)":4:(list x x)
#?-    (2 2)
#?="(standard input)":4:(list x x)
#?-    (3 3)
1

How about lconcatenate?

gosh> (car (lconcatenate (lmap p '(1 2 3))))
#?="(standard input)":4:(list x x)
#?-    (1 1)
#?="(standard input)":4:(list x x)
#?-    (2 2)
1

Oops. We need only the first element, and the first result of lmap, (1 1), already supplies the second element as well; yet p has already been applied to the second input.

This is because the intermediate lazy list of the result of lmap is evaluated “one element ahead”. On the other hand, lappend-map doesn’t have this problem.

gosh> (car (lappend-map p '(1 2 3)))
#?="(standard input)":4:(list x x)
#?-    (1 1)
1
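Since lappend-map consumes its input lazily, the input may also be infinite. In this sketch each input n contributes n and its negation.

```scheme
(use gauche.lazy)

;; (lrange 1) is 1, 2, 3, ... without end.
(take (lappend-map (^[n] (list n (- n))) (lrange 1)) 6)
                               ; => (1 -1 2 -2 3 -3)
```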
Function: linterweave seq …

{gauche.lazy} Returns a lazy sequence of the first items from seq …, then their second items, and so on. If the shortest of seqs has length N, the length of the resulting sequence is (* N number-of-sequences). If all of seqs are infinite, the resulting sequence is also infinite.

(linterweave (lrange 0) '(a b c d e) (circular-list '*))
 ⇒ (0 a * 1 b * 2 c * 3 d * 4 e *)
Function: lfilter proc seq

{gauche.lazy} Lazy version of filter. Returns a lazy sequence that consists of the elements of seq for which proc returns a true value.

Function: lfilter-map proc seq seq2 …

{gauche.lazy} Lazy version of filter-map.
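The sketch below contrasts the two procedures on an infinite input, assuming lfilter keeps the elements that satisfy proc (like filter) while lfilter-map keeps the non-#f values proc returns (like filter-map).

```scheme
(use gauche.lazy)

;; lfilter keeps the elements for which the predicate returns true.
(take (lfilter odd? (lrange 0)) 5)      ; => (1 3 5 7 9)

;; lfilter-map keeps the non-#f results: odd n gives #f and is
;; dropped, even n gives its square.
(take (lfilter-map (^[n] (and (even? n) (* n n))) (lrange 0)) 4)
                                        ; => (0 4 16 36)
```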

Function: lstate-filter proc seed seq

{gauche.lazy} Lazy sequence version of gstate-filter (see Generator operations).
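A minimal sketch that drops consecutive duplicates, assuming the gstate-filter calling convention: proc receives the element and the current seed, and returns two values, a flag saying whether to keep the element and the next seed. Here the seed remembers the previous element.

```scheme
(use gauche.lazy)

;; Keep v only if it differs from the previous element; pass v on as seed.
(lstate-filter (^[v prev] (values (not (eqv? v prev)) v))
               #f
               '(a a b b b c a a))
                               ; => (a b c a)
```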

Function: ltake seq n :optional fill? padding
Function: ltake-while pred seq

{gauche.lazy} Lazy versions of take* and take-while (see List accessors and modifiers). Note that ltake works more like take* than take; that is, it won’t complain if the input sequence has fewer than n elements. Because of its lazy nature, ltake can’t know whether the input is too short before returning the sequence.

There are no ldrop and ldrop-while; you don’t need them. If you apply drop or drop-while to a lazy sequence, they return a lazy sequence.
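A few quick checks of the behavior described above; cut is Gauche's built-in SRFI 26 partial application. Note that ltake takes the sequence first and the count second.

```scheme
(use gauche.lazy)

;; Like take*, ltake does not complain when the input is too short.
(ltake '(a b) 5)                          ; => (a b)

;; ltake-while stops at the first element failing the predicate.
(ltake-while (cut < <> 4) (lrange 0))     ; => (0 1 2 3)

;; Plain drop works on a lazy sequence and returns a lazy sequence.
(take (drop (lrange 0) 5) 3)              ; => (5 6 7)
```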

Function: lrxmatch rx seq

{gauche.lazy} This is a lazy sequence version of grxmatch (see Generator operations).

The seq argument must be a sequence of characters (including ordinary strings). The return value is a lazy sequence of <rxmatch> objects, each representing a match of the regular expression rx.

This procedure is convenient for scanning lazy character sequences, but it may be slow if you’re looking for a rarely matching string in a very large non-string input. Unless seq is a string, lrxmatch buffers a certain length of input, and if no match is found, it extends the buffer and scans again from the beginning, since a match may span from the end of the previous chunk into the newly added portion.
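A short sketch that collects every run of digits in a string; rxmatch-substring extracts the matched text from each <rxmatch> object, and map consumes the lazy result.

```scheme
(use gauche.lazy)

;; Each element of the lazy result is an <rxmatch> for one run of digits.
(map rxmatch-substring (lrxmatch #/\d+/ "a12b345c6"))
                               ; => ("12" "345" "6")
```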

Function: lslices seq k :optional fill? padding

{gauche.lazy} Lazy version of slices (see List accessors and modifiers).

(lslices '(a b c d e f) 2)
  ⇒ ((a b) (c d) (e f))

9.15.3 Lazy sequence with positions

Treating an input data stream as a lazy sequence is a powerful abstraction; in particular, it allows unlimited lookahead with simple list manipulation.

However, it is difficult to know the position of the data within the input stream, e.g. for an error message. Unlike reading from a port, which gives you the current input position, a lazy sequence just looks like a list, and an unknown amount of data may have been prefetched from the real source.

Gauche has special pair objects, called extended pairs, that can carry auxiliary information (see Extended pairs and pair attributes). You can create a lazy sequence that carries positional information using this feature.

Class: <sequence-position>

{gauche.lazy} An immutable structure holding positional information. It is returned by lseq-position. The information can be queried with the following procedures.

Function: sequence-position-source seqpos
Function: sequence-position-line seqpos
Function: sequence-position-column seqpos
Function: sequence-position-item-count seqpos

{gauche.lazy} Query positional information from a <sequence-position> instance seqpos.

Returns the source name (usually the source file name), the line count (starting from 1), the column count (starting from 1), and the item count (number of characters, starting from 0).

The source name may be #f if it is not available.

Function: port->char-lseq/position :optional port :key source-name start-line start-column start-item-count

{gauche.lazy} Like port->char-lseq, returns a lazy sequence of characters read from the input port port. However, the sequence returned by this procedure has positional info attached, which can be retrieved with lseq-position.

The source-name, start-line, start-column and start-item-count arguments initialize the positional info before reading starts. The default values are (port-name port), 1, 1 and 0, respectively. If you’re reading from a freshly opened port, the defaults suffice; specify these if, for example, you’ve already read some data from the port.
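A minimal sketch reading from a string port; the source name "demo" is arbitrary. It assumes the first character carries the starting values described above (line 1, column 1, item count 0). The position of a later element is shown without a claimed result.

```scheme
(use gauche.lazy)

(define in (open-input-string "ab\ncd"))
(define chars (port->char-lseq/position in :source-name "demo"))

;; Position of the very first character.
(define start (lseq-position chars))
(sequence-position-source start)        ; => "demo"
(sequence-position-line start)          ; => 1
(sequence-position-column start)        ; => 1
(sequence-position-item-count start)    ; => 0

;; The pair holding #\c (after the newline) carries its own position,
;; which is what you would report in an error message.
(define at-c (lseq-position (drop chars 3)))
(list (sequence-position-line at-c) (sequence-position-column at-c))
```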

Function: generator->lseq/position char-gen :key source-name start-line start-column start-item-count

{gauche.lazy} Like generator->lseq, returns a lazy sequence of characters generated by char-gen. However, the sequence returned by this procedure has positional info attached, which can be retrieved with lseq-position.

The source-name, start-line, start-column and start-item-count arguments initialize the positional info before reading starts. The default values are #f, 1, 1 and 0, respectively.

Function: lseq-position seq

{gauche.lazy} If seq is a lazy sequence with positional info attached, retrieves it and returns a <sequence-position> instance.

If seq doesn’t have positional info, or is not a sequence at all, #f is returned.
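A sketch of both cases. Any thunk that returns characters and then EOF works as a generator; here (cut read-char in) reads from a string port. Without a source-name keyword, the source is the documented default #f.

```scheme
(use gauche.lazy)

(define in (open-input-string "xyz"))
(define s (generator->lseq/position (cut read-char in)))

(sequence-position-source (lseq-position s))   ; => #f (the default)
(sequence-position-line (lseq-position s))     ; => 1

;; Ordinary lists and non-sequences carry no positional info.
(lseq-position '(x y z))                       ; => #f
(lseq-position 42)                             ; => #f
```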


