For Development HEAD DRAFT

12.18 data.random - Random data generators

Module: data.random

This module defines a set of generators and generator makers that yield random data of specific type and distribution.

A naming convention: procedures that take parameters and return a generator are suffixed with $ (e.g. integers$). Procedures that are generators themselves are not (e.g. fixnums). Procedures that are combinators, that is, those that take one or more generators and return a generator, generally end with a preposition (e.g. lists-of).

Global state

All the generators in this module share a global random state. The random seed is initialized to a fixed value when the module is loaded. You can get and set the random seed with the following procedure.

Function: random-data-seed
Function: (setter random-data-seed) seed-value

{data.random} Calling random-data-seed without arguments returns the random seed value used to initialize the current random state.

It can also be used with the generic setter to reinitialize the random state with seed-value.

The random seed value must be an exact integer; only its lower 32 bits are used.

; reinitialize the random state with a new random seed.
(set! (random-data-seed) 1)

(random-data-seed) ⇒ 1

Note: This procedure doesn’t have a parameter interface (altering the global value by passing the new value as an argument), since it doesn’t behave like a parameter (see Parameters). You can get the random seed value, but you can’t get the current random state itself: if you set the same random seed value again, the internal state is reset from that seed, rather than restored to the state it was in when you called random-data-seed.

If you want to use a different random state temporarily and make sure the original state is restored afterwards, use with-random-data-seed below.

Function: with-random-data-seed seed thunk

{data.random} Saves the current global random state, initializes the random state with seed, then executes thunk. When thunk returns, or when control otherwise exits thunk, the state saved at the time with-random-data-seed was called is restored.

Since the default random seed value is fixed, the random data generators below produce deterministic output as long as you don’t alter the random seed explicitly.
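Here is a minimal sketch of the save-and-restore behavior described above. The helper two-batches is a hypothetical name used only for illustration: it reseeds with 1, draws a batch of integers, optionally takes a detour under a temporary seed, then draws a second batch from the same generator. According to the description of with-random-data-seed, the detour should leave the outer sequence untouched. We deliberately don’t rely on the return value of with-random-data-seed, since it isn’t documented here.

```scheme
(use data.random)
(use gauche.generator)

;; two-batches is a hypothetical helper for illustration.
;; It reseeds with 1, draws 3 integers, optionally draws under a
;; temporary seed, then draws 3 more from the same generator.
(define (two-batches detour?)
  (set! (random-data-seed) 1)
  (let* ((gen (integers$ 256))
         (first-batch (generator->list gen 3)))
    (when detour?
      ;; Draws from a separate generator under seed 99; the global
      ;; state in effect before this call is restored afterwards.
      (with-random-data-seed 99
        (lambda () (generator->list (integers$ 1000) 10))))
    (list first-batch (generator->list gen 3))))

;; Both runs produce the same two batches.
(equal? (two-batches #f) (two-batches #t))  ; ⇒ #t
```

Using a separate generator for the detour keeps the comparison about the global random state alone.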

Generators of primitive data types

These generators produce uniformly distributed data.

In the following examples, we use generator->list to show some concrete data from the generators. It is provided by the gauche.generator module. See gauche.generator - Generators, for more utilities that work on generators.

Function: integers$ size :optional (start 0)
Function: integers-between$ lower-bound upper-bound

{data.random} Create exact integer generators. The first one, integers$, creates a generator that uniformly generates integers from start (inclusive) up to start+size (exclusive). The second one, integers-between$, creates a generator that uniformly generates integers between lower-bound and upper-bound (both inclusive).

;; A dice roller
(define dice (integers$ 6 1))

;; Roll the dice 10 times
(generator->list dice 10)
 ⇒ (6 6 2 4 2 5 5 1 2 2)
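The two constructors differ only in how the range is described. As a small sketch (die-a, die-b and in-1-6? are illustrative names), both of the following should describe the faces 1 to 6, and a range check over many draws makes the half-open versus closed conventions concrete.

```scheme
(use data.random)
(use gauche.generator)
(use srfi-1)   ; every

;; Illustrative names: two descriptions of the same 1..6 range.
(define die-a (integers$ 6 1))          ; size 6 from 1: 1 inclusive, 7 exclusive
(define die-b (integers-between$ 1 6))  ; both bounds inclusive

(define (in-1-6? n) (and (exact-integer? n) (<= 1 n 6)))

(every in-1-6? (generator->list die-a 1000))  ; ⇒ #t
(every in-1-6? (generator->list die-b 1000))  ; ⇒ #t
```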
Function: fixnums
Function: int8s
Function: uint8s
Function: int16s
Function: uint16s
Function: int32s
Function: uint32s
Function: int64s
Function: uint64s

{data.random} Uniform integer generators. They generate integers in the fixnum range, and 8/16/32/64-bit signed and unsigned integers, respectively.

(generator->list int8s 10)
 ⇒ (20 -101 50 -99 -111 -28 -19 -61 39 110)
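The names encode the value ranges directly. The following sketch checks a sample from several of these generators against the usual two’s-complement and unsigned bounds; all-within? is a hypothetical helper written for this example.

```scheme
(use data.random)
(use gauche.generator)
(use srfi-1)   ; every

;; all-within? is a hypothetical helper: are n draws from gen in [lo, hi]?
(define (all-within? gen lo hi n)
  (every (lambda (x) (<= lo x hi)) (generator->list gen n)))

(all-within? int8s     -128   127 1000)               ; ⇒ #t
(all-within? uint8s       0   255 1000)               ; ⇒ #t
(all-within? int16s  -32768 32767 1000)               ; ⇒ #t
(all-within? uint32s      0 (- (expt 2 32) 1) 1000)   ; ⇒ #t
(every fixnum? (generator->list fixnums 1000))        ; ⇒ #t
```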
Function: booleans

{data.random} Generates boolean values (#f and #t) with equal probability.

(generator->list booleans 10)
 ⇒ (#f #f #t #f #f #t #f #f #f #f)
Function: chars$ :optional char-set

{data.random} Creates a generator that uniformly generates characters from char-set. The default char-set is #[A-Za-z0-9].

(define alphanumeric-chars (chars$))

(generator->list alphanumeric-chars 10)
 ⇒ (#\f #\m #\3 #\S #\z #\m #\x #\S #\l #\y)
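Passing a char-set restricts the alphabet. As a sketch (hex-chars and token are illustrative names), here we draw eight lowercase hexadecimal digits and assemble them into a string with list->string.

```scheme
(use data.random)
(use gauche.generator)
(use srfi-1)   ; every

;; Illustrative names: an alphabet of lowercase hex digits.
(define hex-chars (chars$ #[0-9a-f]))

;; Assemble an 8-character token from eight draws.
(define token (list->string (generator->list hex-chars 8)))

(= (string-length token) 8)                                          ; ⇒ #t
(every (lambda (c) (char-set-contains? #[0-9a-f] c)) (string->list token))  ; ⇒ #t
```

For strings of varying length, strings-of (described below) does the assembly for you.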
Function: reals$ :optional size start
Function: reals-between$ lower-bound upper-bound

{data.random} Create a generator that uniformly generates real numbers within the given range. The first procedure, reals$, returns reals between start and start+size, inclusive. The default size is 1.0 and the default start is 0.0. The second procedure, reals-between$, returns reals between lower-bound and upper-bound, inclusive.

(define uniform-100 (reals$ 100))

(generator->list uniform-100 3)
 ⇒ (81.67965004942268 81.84927577572596 53.02443813660833)

Note that, unlike a generator from integers$, a generator from reals$ can generate the upper-bound value start+size. If you need to exclude the bound value, simply discard it; gfilter may come in handy.

(use data.random)
(use gauche.generator) ; provides gfilter
(define generate-from-0-below-1
  (gfilter (^r (not (= r 1.0))) (reals$ 1.0 0.0)))
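To make the defaults and the inclusive bounds concrete, the following sketch samples from (reals$) with no arguments, which per the text above covers 0.0 to 1.0, and from reals-between$ over a symmetric interval. The names unit and signed are illustrative.

```scheme
(use data.random)
(use gauche.generator)
(use srfi-1)   ; every

;; Illustrative names.
(define unit   (reals$))                   ; defaults: size 1.0, start 0.0
(define signed (reals-between$ -1.0 1.0))  ; closed interval [-1.0, 1.0]

(every (lambda (r) (<= 0.0 r 1.0))  (generator->list unit 1000))    ; ⇒ #t
(every (lambda (r) (<= -1.0 r 1.0)) (generator->list signed 1000))  ; ⇒ #t
```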
Function: samples$ collection

{data.random} Creates a generator that returns a randomly chosen item from collection each time it is called.

Do not confuse this with samples-from below, which combines multiple generators for sampling.

(define coin-toss (samples$ '(head tail)))

(generator->list coin-toss 5)
 ⇒ (head tail tail head tail)
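One consequence worth spelling out: assuming samples$ picks each position of the collection with equal probability (as the heading of this section suggests), repeating an item raises its share. In this sketch (biased and draws are illustrative names), hit appears three times out of four, so roughly three quarters of the draws should be hit.

```scheme
(use data.random)
(use gauche.generator)
(use srfi-1)   ; every, count

;; Illustrative names. Assumes each list position is equally likely,
;; so 'hit (listed 3 times) should come up about 3 times as often as 'miss.
(define biased (samples$ '(hit hit hit miss)))

(define draws (generator->list biased 4000))

(every (lambda (x) (or (eq? x 'hit) (eq? x 'miss))) draws)  ; ⇒ #t
(/ (count (lambda (x) (eq? x 'hit)) draws) 4000.0)          ; roughly 0.75
```

For weights that aren’t small integer ratios, weighted-samples-from below is the more direct tool.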
Function: regular-string$ regexp

{data.random} Creates an infinite generator that generates random strings, each of which matches the given regexp. The regexp shouldn’t include conditional patterns or lookahead/lookbehind assertions.

Note: It is hard to define what the distribution of the generated strings should look like. For now, we build an NFA from regexp and assign equal probability to each alternative whenever there are multiple choices, but that may not be very useful for typical use cases (e.g. generating test data). Please consider the current implementation strategy provisional.
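As a sketch of generating structured test data, the following uses a hypothetical "part number" format: three digits, a dash and two lowercase letters. We pass a pattern without anchors (since assertions may not be supported) and verify each result with an anchored version of the same pattern via rxmatch.

```scheme
(use data.random)
(use gauche.generator)
(use srfi-1)   ; every

;; Hypothetical format for illustration: ddd-ll, e.g. "042-qx".
(define part-numbers (regular-string$ #/[0-9][0-9][0-9]-[a-z][a-z]/))

(define samples (generator->list part-numbers 20))

;; Each generated string matches the whole pattern.
(every (lambda (s) (and (rxmatch #/^[0-9][0-9][0-9]-[a-z][a-z]$/ s) #t))
       samples)  ; ⇒ #t
```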

Nonuniform distributions

Function: reals-normal$ :optional mean deviation

{data.random} Creates a generator that yields real numbers from the normal distribution with the given mean and deviation (standard deviation). The default mean is 0.0 and the default deviation is 1.0.
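A quick sanity check for a distribution generator is to compare the sample mean with the requested mean. This sketch (sample-mean is a hypothetical helper) draws 10000 values with mean 10.0 and deviation 2.0; the standard error of the mean is then about 0.02, so a tolerance of 0.2 is very generous.

```scheme
(use data.random)
(use gauche.generator)
(use srfi-1)   ; fold

;; sample-mean is a hypothetical helper: average of n draws from gen.
(define (sample-mean gen n)
  (/ (fold + 0 (generator->list gen n)) n))

(define m (sample-mean (reals-normal$ 10.0 2.0) 10000))

(< (abs (- m 10.0)) 0.2)  ; expected to hold with overwhelming probability
```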

Function: reals-exponential$ mean

{data.random} Creates a generator that yields real numbers from the exponential distribution with the given mean.

Function: integers-geometric$ p

{data.random} Creates a generator that yields integers from the geometric distribution with success probability p (0 <= p <= 1). The mean is 1/p and the variance is (1-p)/p^2.

Function: integers-poisson$ L

{data.random} Creates a generator that yields integers from the Poisson distribution with mean L; its variance is also L.
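The documented means give an easy empirical check for the three generators above: mean for the exponential distribution, 1/p for the geometric one, and L for Poisson. In this sketch, sample-mean and close? are hypothetical helpers, and the tolerances are several standard errors wide for 10000 draws.

```scheme
(use data.random)
(use gauche.generator)
(use srfi-1)   ; fold, every

;; Hypothetical helpers for illustration.
(define (sample-mean gen n)
  (/ (fold + 0 (generator->list gen n)) n))
(define (close? x target tol) (< (abs (- x target)) tol))

;; Documented means: exponential -> mean, geometric -> 1/p, Poisson -> L.
(close? (sample-mean (reals-exponential$ 2.0) 10000) 2.0 0.2)    ; expected #t
(close? (sample-mean (integers-geometric$ 0.25) 10000) 4.0 0.3)  ; expected #t (1/p = 4)
(close? (sample-mean (integers-poisson$ 3) 10000) 3.0 0.2)       ; expected #t
(every integer? (generator->list (integers-poisson$ 3) 100))     ; ⇒ #t
```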

Aggregate data generators

Function: samples-from generators

{data.random} Takes a finite sequence of generators (sequence in the sense of gauche.sequence) and returns a generator. Each time the resulting generator is called, it picks one of the input generators with equal probability, then calls it to get a value.

(define g (samples-from (list uint8s (chars$ #[a-z]))))

(generator->list g 10)
 ⇒ (207 107 #\m #\f 199 #\o #\b 57 #\j #\e)

NB: To create a generator that samples from a fixed collection of items, use samples$ described above.
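Because samples-from only forwards the chosen generator’s value, every output belongs to one of the input generators’ ranges. The sketch below (mixed and expected-value? are illustrative names) mixes small integers with symbols drawn by samples$ and checks that nothing else appears.

```scheme
(use data.random)
(use gauche.generator)
(use srfi-1)   ; every

;; Illustrative names: mix integers 0..9 with the symbols x, y, z.
(define mixed (samples-from (list (integers$ 10) (samples$ '(x y z)))))

(define (expected-value? v)
  (or (and (exact-integer? v) (<= 0 v 9))
      (and (memq v '(x y z)) #t)))

(every expected-value? (generator->list mixed 200))  ; ⇒ #t
```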

Function: weighted-samples-from weight&gens

{data.random} The argument is a list of pairs, each consisting of a nonnegative real number and a generator. The real number determines the weight, that is, the relative probability that the generator is chosen. The weights don’t need to sum to 1.0.

The following example chooses the uint8 generator four times as frequently as the character generator.

(define g (weighted-samples-from
           `((4.0 . ,uint8s)
             (1.0 . ,(chars$)))))

(generator->list g 10)
 ⇒ (195 97 #\j #\W #\5 72 49 143 19 164)
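With weights 4.0 and 1.0, the integer generator should be chosen about 4/5 of the time. The following sketch rebuilds the generator from the example above and measures the share of integers over 5000 draws (integer-share is an illustrative name); the binomial standard deviation is under 0.006, so the 0.75 to 0.85 window is loose.

```scheme
(use data.random)
(use gauche.generator)
(use srfi-1)   ; count

(define g (weighted-samples-from
           `((4.0 . ,uint8s)
             (1.0 . ,(chars$)))))

;; About 4/5 of the draws should come from uint8s (illustrative name below).
(define draws (generator->list g 5000))
(define integer-share (/ (count integer? draws) 5000.0))

(< 0.75 integer-share 0.85)  ; expected to hold with overwhelming probability
```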
Function: pairs-of car-gen cdr-gen

{data.random} Returns a generator that yields pairs, whose car is generated from car-gen and whose cdr is generated from cdr-gen.

(define g (pairs-of int8s booleans))

(generator->list g 5)
 ⇒ ((113 . #t) (101 . #f) (12 . #t) (68 . #f) (-55 . #f))
Function: tuples-of gen …

{data.random} Returns a generator that yields lists, whose i-th element is generated from the i-th argument.

(define g (tuples-of int8s booleans (chars$)))

(generator->list g 3)
 ⇒ ((-43 #f #\8) (53 #f #\1) (-114 #f #\i))
Function: permutations-of seq

{data.random} Returns a generator that yields random permutations of seq.

The type of seq should be a sequence with a builder (see gauche.sequence - Sequence framework). The type of generated objects will be the same as seq.

(generator->list (permutations-of '(1 2 3)) 3)
 ⇒ ((1 2 3) (2 3 1) (3 2 1))

(generator->list (permutations-of "abc") 3)
 ⇒ ("cba" "cba" "cab")
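A simple property of any permutation is that sorting it recovers the sorted input. The sketch below checks this for a list and for a string; since the generated objects have the same type as seq, the string case yields strings. The names perms and sperms are illustrative.

```scheme
(use data.random)
(use gauche.generator)
(use srfi-1)   ; every

;; Illustrative names. Sorting any permutation gives back the sorted input.
(define perms (generator->list (permutations-of '(3 1 4 5 9)) 50))
(every (lambda (p) (equal? (sort p <) '(1 3 4 5 9))) perms)  ; ⇒ #t

;; For a string input, each result is a string with the same characters.
(define sperms (generator->list (permutations-of "gauche") 20))
(every (lambda (s)
         (and (string? s)
              (equal? (sort (string->list s) char<?)
                      (sort (string->list "gauche") char<?))))
       sperms)  ; ⇒ #t
```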
Function: combinations-of size seq

{data.random} Returns a generator that yields sequences of size elements randomly picked from seq.

The type of seq should be a sequence with a builder (see gauche.sequence - Sequence framework). The type of generated objects will be the same as seq.

(generator->list (combinations-of 2 '(a b c)) 5)
 ⇒ ((a c) (a b) (a c) (b a) (a c))

(generator->list (combinations-of 2 '#(a b c)) 5)
 ⇒ (#(a c) #(b c) #(c b) #(b a) #(b c))
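The following sketch checks two properties stated above: each result has exactly size elements, and every element comes from seq (picks is an illustrative name). Whether elements can repeat within one pick isn’t stated here, so we don’t test for distinctness.

```scheme
(use data.random)
(use gauche.generator)
(use srfi-1)   ; every

;; Illustrative name. Each pick has 3 elements, all taken from the input.
(define picks (generator->list (combinations-of 3 '(a b c d e)) 50))

(every (lambda (c)
         (and (= (length c) 3)
              (every (lambda (x) (and (memq x '(a b c d e)) #t)) c)))
       picks)  ; ⇒ #t
```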

The following procedures take an optional sizer argument, which can be either a nonnegative integer or a generator of nonnegative integers. The value of the sizer determines the length of the resulting data.

Unlike most Gauche procedures, the sizer argument, when given, comes before the last argument. We couldn’t resist the temptation to write something like (lists-of 3 booleans).

If sizer is omitted, the value of the parameter default-sizer is used. The default value of default-sizer is (integers-poisson$ 4).

Function: lists-of item-gen
Function: lists-of sizer item-gen
Function: vectors-of item-gen
Function: vectors-of sizer item-gen
Function: strings-of
Function: strings-of item-gen
Function: strings-of sizer item-gen

{data.random} Creates a generator that generates lists, vectors or strings, respectively, whose elements are taken from item-gen. The size of each datum is determined by sizer.

You can also omit item-gen for strings-of. In that case, a generator created by (chars$) is used.

(generator->list (lists-of 3 uint8s) 4)
 ⇒ ((254 46 0) (77 158 46) (1 134 156) (74 5 110))

;; using the default sizer
(generator->list (lists-of uint8s) 4)
 ⇒ ((93 249) (131 97) (98 206 144 247 241) (126 156 31))

;; using a generator for the sizer
(generator->list (strings-of (integers$ 8) (chars$)) 5)
 ⇒ ("dTJYVhu" "F" "PXkC" "w" "")
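Since default-sizer is a parameter, you can fix the size of every datum within a dynamic extent using parameterize instead of passing a sizer each time. In this sketch (triples and words are illustrative names), both creating and consuming the generators happen inside parameterize, so the result doesn’t depend on whether the parameter is read at creation time or at call time. It also uses strings-of with item-gen omitted.

```scheme
(use data.random)
(use gauche.generator)
(use srfi-1)   ; every

;; Illustrative names. Create and consume inside parameterize so it
;; doesn't matter when default-sizer is consulted.
(define triples
  (parameterize ((default-sizer 3))
    (generator->list (lists-of uint8s) 4)))

(define words
  (parameterize ((default-sizer 5))
    (generator->list (strings-of) 4)))   ; item-gen omitted: (chars$)

(every (lambda (l) (= (length l) 3)) triples)        ; ⇒ #t
(every (lambda (s) (= (string-length s) 5)) words)   ; ⇒ #t
```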
Function: sequences-of class item-gen
Function: sequences-of class sizer item-gen

{data.random} Creates a generator that yields sequences of class class, whose items are generated by item-gen. The size of each sequence is determined by sizer, or the value of default-sizer if omitted; the sizer can be a nonnegative integer, or a generator that yields nonnegative integers.

The class class must be a subclass of <sequence> and implement the builder interface.

(generator->list (sequences-of <u8vector> 4 uint8s) 3)
 ⇒ (#u8(95 203 243 46) #u8(187 199 153 152) #u8(39 114 39 25))
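The sizer of sequences-of may also be a generator, and any builder-capable class works. This sketch uses integers-between$ as the sizer for u8vectors, so each length should fall between 2 and 4, and uses <vector> with a fixed size. We assume here that <vector> implements the builder interface, as the vector support in gauche.sequence suggests.

```scheme
(use data.random)
(use gauche.generator)
(use gauche.uvector)
(use srfi-1)   ; every

;; Sizer given as a generator: each u8vector has 2, 3 or 4 elements.
(define g (sequences-of <u8vector> (integers-between$ 2 4) uint8s))

(every (lambda (v) (and (u8vector? v) (<= 2 (u8vector-length v) 4)))
       (generator->list g 20))  ; ⇒ #t

;; A fixed size with the <vector> class.
(every (lambda (v) (and (vector? v) (= (vector-length v) 3)))
       (generator->list (sequences-of <vector> 3 booleans) 20))  ; ⇒ #t
```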
Parameter: default-sizer

{data.random} The sizer used by lists-of, vectors-of, strings-of and sequences-of when the sizer argument is omitted.

The value must be either a nonnegative integer or a generator of nonnegative integers.
