Lazy evaluation (Gauche Users’ Reference)

6.18 Lazy evaluation

Gauche has two primitive lazy evaluation mechanisms.

The first one is an explicit mechanism, defined in the Scheme standard: You mark an expression to be evaluated lazily by delay, and you use force to make the evaluation happen when needed. Gauche also support another primitive lazy, as defined in SRFI-45, for space-efficient tail-recursive lazy algorithms.

The second one is a lazy sequence, in which evaluation happens implicitly. From a Scheme program, a lazy sequence just looks as a list—you can take its car and cdr, and you can apply map or other list procedures on it. However, internally, its element isn’t calculated until it is required.

6.18.1 Delay, force and lazy

Scheme has traditionally provided an explicit delayed evaluation mechanism using delay and force. After R5RS, however, it is found that it didn’t mix well with tail-recursive algorithms: It required unbound memory, despite that the body of the algorithm could be expressed in iterative manner. SRFI-45 showed that introducing another primitive syntax lazy addresses the issue. For the detailed explanation please look at the SRFI-45 document. Here we explain how to use those primitives.

Special Form: delay expression ¶

Special Form: lazy expression ¶

[R7RS lazy][SRFI-45] These forms creates a promise that delays the evaluation of expression. Expression will be evaluated when the promise is passed to force.

If expression itself is expected to yield a promise, you should use lazy. Otherwise, you should use delay. If you can think in types, the difference may be clearer.

lazy  : Promise a -> Promise a
delay : a -> Promise a

Since we don’t have static typing, we can’t enforce this usage. The programmer has to choose appropriate one from the context. Generally, lazy appears only to surround the entire body of function that express a lazy algorithm.

NB: In R7RS, lazy is called delay-force, for the operation is conceptually similar to (delay (force expr)) (note that the type of force is Promise a -> a).

For the real-world example of use of lazy, you may want to check the implementation of util.stream (see util.stream - Stream library).

Note that expression may be evaluated more than once; it may require the value of its own promise recursively, or more than one thread may evaluate expression simultaneously. In general, you shouldn’t have side-effecting computation in expression—even if it is only evaluated once, you may not exactly know when the evaluation occur, so having side-effects there is a bad ides. (See force below for multi-thread semantics).

Function: eager obj … ¶

Function: make-promise obj … ¶

[SRFI-45][SRFI-226] Returns a promise that returns obj …. The name eager is introduced in SRFI-45. It takes single argument, but Gauche enhanced it to take multiple values. SRFI-226 defines make-promise, which is the same as multi-value eager. (Note that R7RS-small also defines make-promise in scheme.lazy library (see scheme.lazy - R7RS lazy evaluation). It is slightly different that it only takes one argument, and if the argument is a promise, it will be returned as is. This change is intentional; see SRFI-226 for the details.)

Since that these are procedures, arguments are evaluated before they are called; so they work as a type converter (a -> Promise a) without delaying the evaluation. Used mainly to construct promise-returning functions.

Function: force promise ¶

[R7RS lazy][SRFI-226] If promise is not a promise, it is just returned.

Otherwise, if promise’s value hasn’t been computed, force makes promise’s encapsulated expression be evaluated in the same parameterization where the promise is created, and returns the result.

Once promise’s value is computed, it is memorized in it so that subsequent force on it won’t cause the computation.

In Gauche, if the delayed expression yields multiple values, force yields multiple values as well. (It is unspecified in R7RS).

Note that if uncomputed promise is forced simultaneously by more than one thread, its expression may be evaluated concurrently. The first delivered value becomes the value of the promise; if another thread finishes computation and see the value of promise is already determined, it discards its computation result and returns the determined value.

The following example shows the promise is evaluated with the parameterization of delay:

(define p (make-parameter 1))

(let1 v (parameterize ((p 2)) (delay (p)))
  (parameterize ((p 3))
    (force v)))
  ⇒ 2

Note: R7RS defines force to evaluate the unevaluated promise’s body in the dynamic environment of the caller of force, but it is regarded as a mistake. In general, you shouldn’t care which force call actually evaluates the promise’s body, so the result shouldn’t depend on the context of force call. SRFI-226 addressed it.

Function: promise? obj ¶: [R7RS lazy] Returns #t iff obj is a promise object.

6.18.2 Lazy sequences

Introduction

A lazy sequence is a list-like structure whose elements are calculated lazily. Internally we have a special type of pairs, whose cdr is evaluated on demand. However, in Scheme level, you’ll never see a distinct “lazy-pair” type. As soon as you try to access a lazy pair, Gauche automatically force the delayed calculation, and the lazy pair turns into an ordinary pair.

It means you can pass lazy sequences to ordinary list-processing procedures such as car, cdr or map.

Look at the following example; generator->lseq takes a procedure that generates one value at a time, and returns a lazy sequence that consists of those values.

(with-input-from-file "file"
  (^[] (let loop ([cs (generator->lseq read-char)] [i 0])
         (match cs
           [() #f]
           [(#\c (or #\a #\d) #\r . _) i]
           [(c . cs) (loop cs (+ i 1))]))))

It returns the position of the first occurrence of character sequence “car” or “cdr” in the file file. The loop treats the lazy sequence just like an ordinary list, but characters are read as needed, so once the sequence is found, the rest of the file won’t be read. If we do it eagerly, we would have to read the entire file first no matter how big it is, or to give up using the mighty match macro and to write a basic state machine that reads one character one at a time.

Other than implicit forcing, Gauche’s lazy sequences are slightly different than the typical lazy stream implementations in Scheme in the following ways:

When you construct a lazy sequence in an iterative lazy algorithm, only cdr side of the lazy pair is lazily evaluated; the car side is evaluated immediately. On the other hand, with stream-cons in util.stream (see util.stream - Stream library), both car and cdr sides won’t be evaluated until it is absolutely needed.
Gauche’s lazy sequence always evaluates one item ahead. Once you get a lazy pair, its car part is already calculated, even if you don’t use it. In most cases you don’t need to care, for calculating one item more is a negligible overhead. However, when you create a self-referential lazy structure, in which the earlier elements of a sequence is used to calculate the latter elements of itself, a bit of caution is needed; a valid code for fully lazy circular structure may not terminate in Gauche’s lazy sequences. We’ll show a concrete example later. This bit of eagerness is also visible when side effects are involved; for example, lazy character sequence reading from a port may read one character ahead.

Note: R7RS scheme.lseq (SRFI-127) provides a portable alternative of lazy sequence (see scheme.lseq - R7RS lazy sequences). It uses dedicated APIs (e.g. lseq-cdr) to operate on lazy sequences so that portable implementation is possible. In Gauche, we just use our built-in lazy sequence as SRFI-127 lazy sequence; if you want your code to be portable, consider using SRFI-127, but be careful not to mix lazy sequences and ordinary lists; Gauche won’t complain, but other Scheme implementation may choke on it.

Primitives

Function: generator->lseq generator ¶

Function: generator->lseq item … generator ¶

[R7RS lseq] Creates a lazy sequence that consists of items produced by generator, which is just a procedure with zero arguments that yields an item at a time. Returning EOF marks the end of the sequence (EOF itself isn’t included in the sequence). For example, read-char can work as a generator. Gauche has a set of convenient utilities to deal with generators (see gauche.generator - Generators).

In the second form, the returned lazy sequence is prepended by item …. Since there’s no way to distinguish lazy pairs and ordinary pairs, you can write it as (cons* item … (generator->lseq generator)), but that’s more verbose.

Internally, Gauche’s lazy sequence is optimized to be built on top of generators, so this procedure is the most efficient way to build lazy sequences.

Note: SRFI-127 also has generator->lseq, which is exactly the same as this in Gauche.

Macro: lcons car cdr ¶

Returns a lazy pair consists of car and cdr. The expression car is evaluated at the call of lcons, but evaluation of cdr is delayed.

You can’t distinguish a lazy pair from an ordinary pair. If you access either its car or cdr, or even you ask pair? to it, its cdr part is implicitly forced and you get an ordinary pair.

Unlike cons, cdr should be an expression that yields a (lazy or ordinary) list, including an empty list. In other words, lazy sequences can always be a null-terminated list when entirely forced; there are no “improper lazy sequences”. (Since Scheme isn’t statically typed, we can’t force the cdr expression to be a proper list before actually evaluating it. Currently if cdr expression yields non-list, we just ignore it and treat as if it yielded an empty list.)

(define z (lcons (begin (print 1) 'a) (begin (print 2) '())))
 ⇒ ; prints '1', since the car part is evaluated eagerly.

(cdr z) ⇒ () ;; and prints '2'

;; This also prints '2', for accessing car of a lazy pair forces
;; its cdr, even the cdr part isn't used.
(car (lcons 'a (begin (print 2) '()))) ⇒ a

;; So as this; asking pair? to a lazy pair causes forcing its cdr.
(pair? (lcons 'a (begin (print 2) '()))) ⇒ #t

;; To clarify: This doesn't print '2', because the second lazy
;; pair never be accessed, so its cdr isn't evaluated.
(pair? (lcons 'a (lcons 'b (begin (print 2) '())))) ⇒ #t

Now, let me show you a case where “one item ahead” evaluation becomes an issue. The following is an elegant definition of infinite Fibonacci sequence using self-referential lazy structure (lmap is a lazy map, defined in gauche.lazy module):

(use gauche.lazy)  ;; for lmap
(define *fibs* (lcons* 0 1 (lmap + *fibs* (cdr *fibs*)))) ;; BUGGY

Unfortunately, Gauche can’t handle it well.

(car *fibs*)
 ⇒ 0
(cadr *fibs*)
 ⇒ *** ERROR: Attempt to recursively force a lazy pair.

When we want to access the second argument (cadr) of *fibs*, we take the car of the second pair, which is a lazy pair of 1 and (lmap ...). The lazy pair is forced and its cdr part needs to be calculated. The first thing lmap returns needs to see the first and second element of *fibs*, but the second element of *fibs* is what we’re calculating now!

We can workaround this issue by avoiding accessing the immediately preceding value. Fibonacci numbers F(n) = F(n-1) + F(n-2) = 2*F(n-2) + F(n-3), so we can write our sequence as follows.

(define *fibs*
  (lcons* 0 1 1 (lmap (^[a b] (+ a (* b 2))) *fibs* (cdr *fibs*))))

And this works!

(take *fibs* 20)
  ⇒ (0 1 1 2 3 5 8 13 21 34 55 89 144 233
     377 610 987 1597 2584 4181)

Many lazy algorithms are defined in terms of fully-lazy cons at the bottom. When you port such algorithms to Gauche using lcons, keep this bit of eagerness in mind.

Note also that lcons needs to create a thunk to delay the evaluation. So the algorithm to construct lazy list using lcons has an overhead of making closure for each item. For performance-critical part, you want to use generator->lseq whenever possible.

Utilities

Macro: lcons* x … tail ¶

Macro: llist* x … tail ¶

A lazy version of cons* (see List constructors). Both lcons* and llist* do the same thing; both names are provided for the symmetry to cons*/list*.

The tail argument should be an expression that yields a (possibly lazy) list. It is evaluated lazily. Note that the preceding elements x … are evaluated eagerly. The following equivalences hold.

(lcons* a)           ≡ a
(lcons* a b)         ≡ (lcons a b)
(lcons* a b ... y z) ≡ (cons* a b … (lcons y z))

Function: lrange start :optional end step ¶

Creates a lazy sequence of numbers starting from start, increasing by step (default 1), to the maximum value that doesn’t exceed end. The default of end is +inf.0, so it creates an infinite list. (Don’t type just (lrange 0) in REPL, or it won’t terminate!)

If any of start or step is inexact, the resulting sequence has inexact numbers.

(take (lrange -1) 3) ⇒ (-1 0 1)

(lrange 0.0 5 0.5)
  ⇒ (0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5)

(lrange 1/4 1 1/8)
  ⇒ (1/4 3/8 1/2 5/8 3/4 7/8)

Function: liota :optional (count +inf.0) (start 0) (step 1) ¶

A lazy version of iota (see List constructors); returns a lazy sequence of count integers (default: positive infinity), starting from start (default: 0), stepping by step (default: 1).

Just like iota, the result consists of exact numbers if and only if both start and step are exact; otherwise the result consists of inexact numbers.

Function: port->char-lseq :optional port ¶

Function: port->byte-lseq :optional port ¶

Function: port->string-lseq :optional port ¶

Function: port->sexp-lseq :optional port ¶

These are the same as the following expressions, respectively. They are provided for the convenience, since this pattern appears frequently.

(generator->lseq (cut read-char port))
(generator->lseq (cut read-byte port))
(generator->lseq (cut read-line port))
(generator->lseq (cut read port))

If port is omitted, the current input port is used.

Note that the lazy sequence may buffer some items, so once you make an lseq from a port, only use the resulting lseq and don’t ever read from port directly.

Note that the lazy sequence terminates when EOF is read from the port, but the port isn’t closed. The port should be managed in larger dynamic extent where the lazy sequence is used.

You can also convert input data into various lists by the following expressions (see Input utility functions). Those procedures read the port eagerly until EOF and returns the whole data in a list, while lseq versions read the port lazily.

(port->list read-char port)
(port->list read-byte port)
(port->string-list port)
(port->sexp-list port)

Those procedures make (lazy) lists out of ports. The opposite can be done by open-input-char-list and open-input-byte-list; See gauche.vport - Virtual ports, for the details.

See also gauche.lazy - Lazy sequence utilities, for more utility procedures that creates lazy sequences.

Examples

Let’s consider calculating an infinite sequence of prime numbers. (Note: If you need prime numbers in your application, you don’t need to write one; just use math.prime. see math.prime - Prime numbers).

Just pretend we already have some prime numbers calculated in a variable *primes*, and you need to find a prime number equal to or grater than n (for simplicity, we assume n is an odd number).

(define (next-prime n)
  (let loop ([ps *primes*])
    (let1 p (car ps)
      (cond [(> (* p p) n) n]
            [(zero? (modulo n p)) (next-prime (+ n 2))]
            [else (loop (cdr ps))]))))

This procedure loops over the list of prime numbers, and if no prime number p less than or equal to (sqrt n) divides n, we can say n is prime. (Actual test is done by (> (* p p) n) instead of (> p (sqrt n)), for the former is faster.) If we find some p divides n, we try a new value (+ n 2) with next-prime.

Using next-prime, we can make a generator that keeps generating prime numbers. The following procedure returns a generator that returns primes above last.

(define (gen-primes-above last)
  (^[] (set! last (next-prime (+ last 2))) last))

Using generator->lseq, we can turn the generator returned by gen-primes-above into a lazy list, which can be used as the value of *prime*. The only caveat is that we need to have some pre-calculated prime numbers:

(define *primes* (generator->lseq 2 3 5 (gen-primes-above 5)))

Be careful not to evaluate *primes* directly on REPL, since it contains an infinite list and it’ll blow up your REPL. You can look at the first 20 prime numbers instead:

(take *primes* 20)
 ⇒ (2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71)

Or find what the 10000-th prime number is:

(~ *primes* 10000)
 ⇒ 104743

Or count how many prime numbers there are below 1000000:

(any (^[p i] (and (>= p 1000000) i)) *primes* (lrange 0))
 ⇒ 78498

Note: If you’re familiar with the lazy functional approach, this example may look strange. Why do we use side-effecting generators while we can define a sequence of prime numbers in pure functional way, as follows?

(use gauche.lazy)

(define (prime? n)
  (not (any (^p (zero? (mod n p)))
            (ltake-while (^k (<= (* k k) n)) *primes*))))

(define (primes-from k)
  (if (prime? k)
    (lcons k (primes-from (+ k 2)))
    (primes-from (+ k 2))))

(define *primes* (llist* 2 3 5 (primes-from 7)))

(The module gauche.lazy provides ltake-while, which is a lazy version of take-while. We don’t need lazy version of any, since it immediately stops when the predicate returns a true value.)

The use of lcons and co-recursion in primes-from is a typical idiom in functional programming. It’s perfectly ok to do so in Gauche; except that the generator version is much faster (when you take first 5000 primes, generator version ran 17 times faster than co-recursion version on the author’s machine).

It doesn’t mean you should avoid co-recursive code; if an algorithm can be expressed nicely in co-recursion, it’s perfectly ok. However, watch out the subtle semantic difference from lazy functional languages—straightforward porting may or may not work.