For Development HEAD DRAFTSearch (procedure/syntax/module):

12.24 data.sparse - Sparse data containers

Module: data.sparse

This module provides a sparse vector and sparse matrix, a space efficient data container indexed by nonnegative integer(s), and a sparse table, a hash table using a sparse vector as a backing storage.

A sparse vector associates a nonnegative integer index to a value. It has vector in its name since it is indexed by an integer, but it isn’t like a flat array on contiguous memory; it’s more like an associative array. (Internally, the current implementation uses compact trie structure.) It is guaranteed that you can store a value with index at least up to 2^32-1; the actual maximum bits of indexes can be queried by sparse-vector-max-index-bits. (We have a plan to remove the maximum bits limitation in future).

Unlike ordinary vectors, you don’t need to specify the size of a sparse vector when you create one. You can just set a value to any index in the supported range.

(define v (make-sparse-vector))

(sparse-vector-set! v 0 'a)
(sparse-vector-ref v 0) ⇒ a

(sparse-vector-set! v 100000000 'b)
(sparse-vector-ref v 100000000) ⇒ b

;; set! also work
(set! (sparse-vector-ref v 100) 'c)
(sparse-vector-ref v 100) ⇒ c

If you try to access an element that hasn’t been set, an error is signaled by default. You can set a default value for each vector, or give a fallback value to sparse-vector-ref, to suppress the error.

(sparse-vector-ref v 1)        ⇒ error
(sparse-vector-ref v 1 'noval) ⇒ noval

(let1 w (make-sparse-vector #f :default 'x)
  (sparse-vector-ref w 1))     ⇒ x

A sparse matrix is like a sparse vector, except it can be indexed by a pair of integers.

A sparse table works just like a hash table, but it uses a sparse vector to store the values using hashed number of the keys.

The main reason of these sparse data containers are for memory efficiency. If you want to store values in a vector but knows you’ll use only some entries sparsely, obviously it is waste to allocate a large vector and to leave many entries unused. But it is worse than that; Gauche’s GC doesn’t like a large contiguous region of memory. Using lots of large vectors adds GC overhead quickly. It becomes especially visible when you store large number of entries (like >100,000) into hash tables, since Gauche’s builtin hash tables use a flat vector as a backing storage. You’ll see the heap size grows quickly and GC runs more frequently and longer. On the other hand, sparse table works pretty stable with large number of entries.

Sparse data containers does have overhead on access speed. They are a bit slower than the ordinary hash tables, and much slower than ordinary vectors. We should note, however, as the number of entries grow, access time on ordinary hash tables grows quicker than sparse tables and eventually two become comparable.

It depends on your application which you should use, and if you’re not sure, you need to benchmark. As a rule of thumb, if you use more than several hashtables each of which contains more than a few tens of thousands of entries, sparse tables may work better. If you see GC Warnings telling “repeated allocation of large blocks”, you should definitely consider sparse tables.


12.24.1 Sparse vectors

Class: <sparse-vector-base>

{data.sparse} An abstract base class of sparse vectors. Inherits <dictionary> and <collection>. Note that sparse vectors are not <sequence>; even they can be indexable by integers, they don’t have means of ordered access.

Sparse vector may be a general vector that can contain any Scheme objects (like <vector>), or a specialized vector that can contain only certain types of numbers (like <s8vector> etc.).

All of these sparse vectors can be accessed by the same API.

Sparse vectors also implements the Collection API (see gauche.collection - Collection framework) and the Dictionary API (see gauche.dictionary - Dictionary framework).

Class: <sparse-vector>
Class: <sparse-@vector>

{data.sparse} The actual sparse vector classes. Inherits <sparse-vector-base>. An instance of <sparse-vector> can contain any Scheme objects.

@ is either one of s8, u8, s16, u16, s32, u32, s64, u64, f16, f32, or f64. The range of values an instance of those classes can hold is the same as the corresponding <@vector> class in gauche.uvector (see Uniform vectors). That is, <sparse-u8vector> can have exact integer values between 0 and 255.

Function: make-sparse-vector :optional type :key default

{data.sparse} Creates an empty sparse vector. The type argument can be #f (default), one of subclasses of <sparse-vector-base>, or a symbol of either one of s8, u8, s16, u16, s32, u32, s64, u64, f16, f32, or f64.

If type is omitted or #f, a <sparse-vector> is created. If it is a class, an instance of the class is created (It is an error to pass a class that is not a subclass of <sparse-vector-base>.) If it is a symbol, an instance of corresponding <sparse-@vector> is created.

You can specify the default value of the vector by default keyword argument. If given, the vector behaves as if it is filled with the default value (but the vector iterator only picks the values explicitly set).

Note that you have to give the optional argument as well to specify the keyword argument.

(define v (make-sparse-vector 'u8 :default 128))

(sparse-vector-ref v 0) ⇒ 128
Function: sparse-vector-max-index-bits

{data.sparse} Returns maximum number of bits of allowed integer. If this returns 32, the index up to (expt 2 32) is supported. It is guaranteed that this is at least 32.

In the following entries, the argument sv denotes an instance of sparse vector; an error is signaled if other object is passed.

Function: sparse-vector-copy sv

{data.sparse} Returns a copy of a sparse vector sv.

Function: sparse-vector-ref sv k :optional fallback

{data.sparse} Returns k-th element of a sparse vector sv, where k must an exact integer.

If the sparse vector doesn’t have a value for k, it behaves as follows:

  • If fallback is given, it is returned.
  • Otherwise, if the vector has the default value, it is returned.
  • Otherwise, an error is signaled.
Function: sparse-vector-set! sv k value

{data.sparse} Sets value for k-th element of a sparse vector sv. K must be a nonnegative exact integer, and below the maximum allowed index.

If sv is a numeric sparse vector, value must also be within the allowed range, or an error is signaled.

Function: sparse-vector-num-entries sv

{data.sparse} Returns the number of entries in sv.

Function: sparse-vector-exists? sv k

{data.sparse} Returns #t if sv has an entry for index k, #f otherwise.

Function: sparse-vector-delete! sv k

{data.sparse} Deletes the k-th entry of sv. If sv had the entry , returns #t. If sv didn’t have the entry, returns #f.

Function: sparse-vector-clear! sv

{data.sparse} Empties a sparse vector.

Function: sparse-vector-inc! sv k delta :optional (fallback 0)

{data.sparse} This is a shortcut of the following. It is especially efficient for numeric sparse vectors.

(sparse-vector-set! sv k (+ (sparse-vector-ref sv k fallback) delta))

If the result of addition exceeds the allowed value range of sv, an error is signaled. In future we’ll allow an option to clamp the result value within the range.

Function: sparse-vector-update! sv k proc :optional fallback
Function: sparse-vector-push! sv k val
Function: sparse-vector-pop! sv k :optional fallback

{data.sparse} Convenience routines to fetch-and-update an entry of a sparse vector. Works just like hash-table-update!, hash-table-push! and hash-table-pop!; (see Hashtables).

The following procedures traverses a sparse vector. Note that elements are not visited in the order of index; it’s just like hash table traversers.

At this moment, if you want to walk a sparse vector with increasing/decreasing index order, you have to get a list of keys by sparse-vector-keys, sort it, then use it to retrieve values. We may add an option in future to make-sparse-vector so that those walk operation will be more convenient.

Function: sparse-vector-fold sv proc seed

{data.sparse} For each entry in sv, calls proc as (proc k_n v_n seed_n), where k_n is an index and v_n is a value for it, and seed_n is the returned value of the previous call to proc if n >= 1, and seed if n = 0. Returns the value of the last call of proc.

Function: sparse-vector-for-each sv proc
Function: sparse-vector-map sv proc

{data.sparse} Calls proc with index and value, e.g. (proc k value), for each element of sv.

The results of proc are discarded by sparse-vector-for-each, and gathered to a list and returned by sparse-vector-map.

Function: sparse-vector-keys sv
Function: sparse-vector-values sv

{data.sparse} Returns a list of all keys and all values in sv, respectively.


12.24.2 Sparse matrixes

A sparse matrix is like a sparse vector, except it can be indexed by two nonnegative integers.

Note: This implementation of sparse matrixes aims at a reasonable space efficiency for sparse matrixes without knowing its structure beforehand (imagine, for example, a 2D map with some scattered landmarks). If what you want is a sparse matrix implementation for efficient numeric calculations, with certain particular structures, probably the access speed of this module isn’t suitable.

Currently, each index can have half of bits of sparse-vector-max-index-bits. We’ll remove this limitation in future.

Class: <sparse-matrix-base>

{data.sparse} An abstract base class of sparse matrixes. Inherits <collection>.

Like sparse vectors, a sparse matrix can be of type that can store any Scheme objects, or that can store only certain types of numbers.

All of these sparse matrix subtypes can be accessed by the same API.

Class: <sparse-matrix>
Class: <sparse-@matrix>

{data.sparse} The actual sparse matrix classes. Inherits <sparse-matrix-base>. An instance of <sparse-matrix> can contain any Scheme objects.

@ is either one of s8, u8, s16, u16, s32, u32, s64, u64, f16, f32, or f64. The range of values an instance of those classes can hold is the same as the corresponding <@vector> class in gauche.uvector (see Uniform vectors). That is, <sparse-u8matrix> can have exact integer values between 0 and 255.

Function: make-sparse-matrix :optional type :key default

{data.sparse} Creates an empty sparse matrix. The type argument can be #f (default), one of subclasses of <sparse-matrix-base>, or a symbol of either one of s8, u8, s16, u16, s32, u32, s64, u64, f16, f32, or f64.

If type is omitted or #f, a <sparse-matrix> is created. If it is a class, an instance of the class is created (It is an error to pass a class that is not a subclass of <sparse-matrix-base>.) If it is a symbol, an instance of corresponding <sparse-@matrix> is created.

You can specify the default value of the matrix by default keyword argument. If given, the vector behaves as if it is filled with the default value (but the matrix iterator only picks the values explicitly set).

Note that you have to give the optional argument as well to specify the keyword argument.

Function: sparse-matrix-num-entries mat

{data.sparse} Returns the number of entries explicitly set in a sparse matrix mat.

Function: sparse-matrix-ref mat x y :optional fallback

{data.sparse} Returns an element indexed by (x, y) in a sparse matrix mat. If the indexed element isn’t set, fallback is returned if provided; otherwise, if the matrix has the default value, it is returned; otherwise, an error is raised.

Function: sparse-matrix-set! mat x y value

{data.sparse} Set value to the sparse matrix mat at the location (x, y).

Function: sparse-matrix-exists? mat x y

{data.sparse} Returns #t iff the sparse matrix mat has a value at (x, y).

Function: sparse-matrix-clear! mat

{data.sparse} Empties the sparse matrix mat.

Function: sparse-matrix-delete! mat x y

{data.sparse} Remove the value at (x, y) from the sparse matrix mat.

Function: sparse-matrix-copy mat

{data.sparse} Returns a fresh copy of mat.

Function: sparse-matrix-update! mat x y proc :optional fallback

{data.sparse} Call proc with the value at (x, y) of the sparse matrix, and sets the result of proc as the new value of the location.

The optional fallback argument works just like sparse-matrix-ref; if provided, it is passed to proc in case the matrix doesn’t have a value at (x, y). If fallback isn’t provided and the matrix doesn’t have a value at the location, the default value of the matrix is used if it has one. Otherwise, an error is signalled.

Function: sparse-matrix-inc! mat x y delta :optional fallback

{data.sparse}

(sparse-matrix-update! mat x y (cut + <> delta) fallback)
Function: sparse-matrix-push! mat x y val

{data.sparse}

(sparse-matrix-update! mat x y (cut cons val <>) '())
Function: sparse-matrix-pop! mat x y

{data.sparse}

(rlet1 r #f
  (sparse-matrix-update! mat x y (^p (set! r (car p)) (cdr p))))
Function: sparse-matrix-fold mat proc seed

{data.sparse} Loop over values in the sparse matrix mat. The procedure proc is called with four arguments, x, y, val and seed, for each index (x, y) which has the value val. The initial value of seed is the one given to sparse-matrix-fold, and the result of proc is passed as the next seed value. The last result of proc is returned from sparse-matrix-fold.

The procedure proc is only called on the entries that’s actually has a value, and the order of which the procedure is called is undefined.

Function: sparse-matrix-map mat proc

{data.sparse}

(sparse-matrix-fold sv (^[x y v s] (cons (proc x y v) s)) '()))
Function: sparse-matrix-for-each mat proc

{data.sparse}

(sparse-matrix-fold sv (^[x y v _] (proc x y v)) #f))
Function: sparse-matrix-keys mat

{data.sparse}

(sparse-matrix-fold sv (^[x y _ s] (cons (list x y) s)) '())
Function: sparse-matrix-values mat

{data.sparse}

(sparse-matrix-fold sv (^[x y v s] (cons v s)) '())

12.24.3 Sparse tables

Class: <sparse-table>

{data.sparse} A class for sparse table. Inherits <dictionary> and <collection>.

Operationally sparse tables are the same as hash tables, but the former consumes less memory in trade of slight slower access. (Roughly x1.5 to x2 access time when the table is small. As the table gets larger the difference becomes smaller.)

Function: make-sparse-table comparator

{data.sparse} Creates and returns an empty sparse table. The comparator argument specifies how to compare and hash keys; it must be either a comparator (see Basic comparators), or one of the symbols eq?, eqv?, equal? and string=?, like hash tables (see Hashtables). If it is a symbol, eq-comparator, eqv-comparator, equal-comparator or string-comparator are used, respectively.

Function: sparse-table-comparator st

{data.sparse} Returns the comparator used in the sparse table st.

Function: sparse-table-copy st

{data.sparse} Returns a copy of a sparse table st.

Function: sparse-table-num-entries st

{data.sparse} Returns the number of entries in a sparse table st.

Function: sparse-table-ref st key :optional fallback

{data.sparse} Retrieves a value associated to the key in st. If no entry with key exists, fallback is returned when it is provided, or an error is signaled otherwise.

Function: sparse-table-set! st key value

{data.sparse} Sets value with key in st.

Function: sparse-table-exists? st key

{data.sparse} Returns #t if an entry with key exists in st, #f otherwise.

Function: sparse-table-delete! st key

{data.sparse} Deletes an entry with key in st if it exists. Returns #t if an entry is actually deleted, or #f if there hasn’t been an entry with key.

Function: sparse-table-clear! st

{data.sparse} Empties st.

Function: sparse-table-update! st key proc :optional fallback
Function: sparse-table-push! st key val
Function: sparse-table-pop! st key :optional fallback

{data.sparse}

Function: sparse-table-fold st proc seed
Function: sparse-table-for-each st proc
Function: sparse-table-map st proc

{data.sparse}

Function: sparse-table-keys st
Function: sparse-table-values st

{data.sparse}



For Development HEAD DRAFTSearch (procedure/syntax/module):
DRAFT