[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

12.59 util.sparse - Sparse data containers

Module: util.sparse

This module provides a sparse vector, a space efficient data containter indexed by nonnegative integer, and a sparse table, a hash table using a sparse vector as a backing storage.

A sparse vector associates a nonnegative integer index to a value. It has vector in its name since it is indexed by an integer, but it isn’t like a flat array on contiguous memory; it’s more like an associative array. (Internally, the current implementation uses compact trie structure.) It is guaranteed that you can store a value with index at least up to 2^32-1; the actual maximum bits of indexes can be queried by sparse-vector-max-index-bits.

Unlike ordinary vectors, you don’t need to specify the size of a sparse vector when you create one. You can just set a value to any index in the supported range.

 
(define v (make-sparse-vector))

(sparse-vector-set! v 0 'a)
(sparse-vector-ref v 0) ⇒ a

(sparse-vector-set! v 100000000 'b)
(sparse-vector-ref v 100000000) ⇒ b

;; set! also work
(set! (sparse-vector-ref v 100) 'c)
(sparse-vector-ref v 100) ⇒ c

If you try to access an element that hasn’t been set, an error is signaled by default. You can give a fallback value to sparse-vector-ref to suppress the error.

 
(sparse-vector-ref v 1)        ⇒ error
(sparse-vector-ref v 1 'noval) ⇒ noval

A sparse table works just like a hash table, but it uses a sparse vector to store the values using hashed number of keys.

The main reason of these sparse data containers are for memory efficiency. If you want to store values in a vector but knows you’ll use only some entries sparsely, obviously it is waste to allocate a large vector and to leave many entries unused. But it is worse than that; Gauche’s GC doesn’t like a large contiguous region of memory. Using lots of large vectors adds GC overhead quickly. It becomes especially visible when you store large number of entries (like >100,000) into hash tables, since Gauche’s builtin hash tables use a flat vector as a backing storage. You’ll see the heap size grows quickly and GC runs more frequently and longer. On the other hand, sparse table works pretty stable with large number of entries.

Sparse data containers does have overhead on access speed. They are a bit slower than the ordinary hash tables, and much slower than ordinary vectors. We should note, however, as the number of entries grow, access speed on ordinary hash tables grows quicker than sparse tables and eventually two become comparable.

It depends on your application which you should use, and if you’re not sure, you need to benchmark. As a rule of thumb, if you use more than several hashtables each of which contains more than a few tens of thousands of entries, sparse tables may work better. If you see GC Warnings telling “repeated allocation of large blocks”, you should definitely consider sparse tables.


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

12.59.1 Sparse vectors

Class: <sparse-vector-base>

An abstract base class of sparse vectors. Inherits <dictionary> and <collection>. Note that sparse vectors are not <sequence>; even they can be indexable by integers, they don’t have means of ordered access.

Sparse vector may be a general vector that can contain any Scheme objects (like <vector>), or a specialized vector that can contain only certain types of numbers (like <s8vector> etc.).

All of these sparse vectors can be accessed by the same API.

Sparse vectors also implements the Collection API (See section gauche.collection - Collection framework) and the Dictionary API (See section gauche.dictionary - Dictionary framework).

Class: <sparse-vector>
Class: <sparse-TAGvector>

The actual sparse vector classes. Inherits <sparse-vector-base>. An instance of <sparse-vector> can contain any Scheme objects.

TAG either one of s8, u8, s16, u16, s32, u32, s64, u64, f16, f32, or f64. The range of values an instance of those classes can hold is the same as the corresponding <TAGvector> class in gauche.uvector (See section gauche.uvector - Uniform vectors). That is, <sparse-u8vector> can have exact integer values between 0 and 255.

Function: make-sparse-vector :optional type

Creates an empty sparse vector. The type argument can be #f (default), one of subclasses of <sparse-vector-base>, or a symbol of either one of s8, u8, s16, u16, s32, u32, s64, u64, f16, f32, or f64.

If type is omitted or #f, a <sparse-vector> is created. If it is a class, an instance of the class is created (It is an error to pass a class that is not a subclass of <sparse-vector-base>.) If it is a symbol, an instance of corresponding <sparse-TAGvector> is created.

Function: sparse-vector-max-index-bits

Returns maximum number of bits of allowed integer. If this returns 32, the index up to (expt 2 32) is supported. It is guaranteed that this is at least 32.

In the following entries, the argument sv denotes an instance of sparse vector; an error is signaled if other object is passed.

Function: sparse-vector-copy sv

Returns a copy of a sparse vector sv.

Function: sparse-vector-ref sv k :optional default

Returns k-th element of a sparse vector sv, where k must be a nonnegative exact integer. If k is equal to or greater than the allowed index range, an error is signaled.

If k is within the allowed range but the sparse vector doesn’t have a value for k, default is returned if it is provided, otherwise an error is signaled.

Function: sparse-vector-set! sv k value

Sets value for k-th element of a sparse vector sv. K must be a nonnegative exact integer, and below the maximum allowed index.

If sv is a numeric sparse vector, value must also be within the allowed range, or an error is signaled.

Function: sparse-vector-num-entries sv

Returns the number of entries in sv.

Function: sparse-vector-exists? sv k

Returns #t if sv has an entry for index k, #f otherwise.

Function: sparse-vector-delete! sv k

Deletes the k-th entry of sv. If sv had the entry , returns #t. If sv didn’t have the entry, returns #f.

Function: sparse-vector-clear! sv

Empties a sparse vector.

Function: sparse-vector-inc! sv k delta :optional (fallback 0)

This is a shortcut of the following. It is especially efficient for numeric sparse vectors.

 
(sparse-vector-set! sv k (+ (sparse-vector-ref sv k fallback) delta))

If the result of addition exceeds the allowed value range of sv, an error is signaled. In future we’ll allow an option to clamp the result value within the range.

Function: sparse-vector-update! sv k proc :optional fallback
Function: sparse-vector-push! sv k val
Function: sparse-vector-pop! sv k :optional fallback

Convenience routines to fetch-and-update an entry of a sparse vector. Works just like hash-table-update!, hash-table-push! and hash-table-pop!; (See section Hashtables).

The following procedures traverses a sparse vector. Note that elements are not visited in the order of index; it’s just like hash table traversers.

At this moment, if you want to walk a sparse vector with increasing/decreasing index order, you have to get a list of keys by sparse-vector-keys, sort it, then use it to retrieve values. We may add an option in future to make-sparse-vector so that those walk operation will be more convenient.

Function: sparse-vector-fold sv proc seed

For each entry in sv, calls proc as (proc k_n v_n seed_n), where k_n is an index and v_n is a value for it, and seed_n is the returned value of the previous call to proc if n >= 1, and seed if n = 0. Returns the value of the last call of proc.

Function: sparse-vector-for-each sv proc
Function: sparse-vector-map sv proc

Calls proc with index and value, e.g. (proc k value), for each element of sv.

The results of proc are discarded by sparse-vector-for-each, and gathered to a list and returned by sparse-vector-map.

Function: sparse-vector-keys sv
Function: sparse-vector-values sv

Returns a list of all keys and all values in sv, respectively.


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

12.59.2 Sparse tables

Class: <sparse-table>

A class for sparse table. Inherits <dictionary> and <collection>.

Operationally sparse tables are the same as hash tables, but the former consumes less memory in trade of slight slower access. (Roughly x1.5 to x2 access time when the table is small. As the table gets larger the difference becomes smaller.)

Function: make-sparse-table type

Creates and returns an empty sparse table. The type argument specifies how to compare the elements; currently it can be one of the symbols eq?, eqv?, equal? and string=?, like hash tables (See section Hashtables).

Function: sparse-table-copy st

Returns a copy of a sparse table st.

Function: sparse-table-num-entries st

Returns the number of entries in a sparse table st.

Function: sparse-table-ref st key :optional fallback

Retrieves a value associated to the key in st. If no entry with key exists, fallback is returned when it is provided, or an error is signaled otherwise.

Function: sparse-table-set! st key value

Sets value with key in st.

Function: sparse-table-exists? st key

Returns #t if an entry with key exists in st, #f otherwise.

Function: sparse-table-delete! st key

Deletes an entry with key in st if it exists. Returns #t if an entry is actually deleted, or #f if there hasn’t been an entry with key.

Function: sparse-table-clear! st

Empties st.

Function: sparse-table-update! st key proc :optional fallback
Function: sparse-table-push! st key val
Function: sparse-table-pop! st key :optional fallback
Function: sparse-table-fold st proc seed
Function: sparse-table-for-each st proc
Function: sparse-table-map st proc
Function: sparse-table-keys st
Function: sparse-table-values st

[ < ] [ > ]   [ << ] [ Up ] [ >> ]

This document was generated on July 19, 2014 using texi2html 1.82.