data.trie
- Trie ¶This module provides Trie, a dictionary-like data structure that maps keys to values, where a key is an arbitrary sequence. Internally it stores the data as a tree where each node corresponds to each element in the key sequence. Key lookup is O(n) where n is the length of the key, and not affected much by the number of total entries. Also it is easy to find a set of values whose keys share a common prefix.
The following example may give you the idea.
(define t (make-trie)) ;; create a trie (trie-put! t "pho" 3) ;; populate the trie (trie-put! t "phone" 5) (trie-put! t "phrase" 6) (trie-get t "phone") ⇒ 5 ;; lookup (trie-common-prefix t "pho") ;; common prefix search ⇒ (("phone" . 5) ("pho" . 3)) (trie-common-prefix-keys t "ph") ⇒ ("phone" "pho" "phrase")
Tries are frequently used with string keys, but you are not
limited to do so; any sequence (see gauche.sequence
- Sequence framework) can be
a key. If the types of keys differ, they are treated as different
keys:
(trie-put! t '(#\p #\h #\o) 8) ;; different key from "pho"
Trie inherits <collection>
and implements collection framework
including the builder. So you can apply generic collection
operations on a trie (see gauche.collection
- Collection framework).
When iterated, each element of a trie appears as a pair of a key
and a value.
{data.trie
}
A class for Trie. No slots are intended for public.
Use the following procedures to operate on tries.
This class also implements the dictionary interface (see Generic functions for dictionaries).
{data.trie
}
Creates and returns an empty trie. The optional arguments
are procedures to customize how the nodes of the internal
tree are managed.
Each node can have a table to store its child nodes, indexed by an element of the key sequence (e.g. if the trie uses strings as keys, a node’s table is indexed by characters).
tab-make
A procedure with no arguments. When called, creates and returns an empty table for a node.
tab-get tab elt
Returns a child node indexed by elt, or returns #f
if the table doesn’t have a child for elt.
tab-put! tab elt child-node
If child-node isn’t #f
,
stores a child-node with index elt.
If child-node is #f
,
removes the entry with index elt.
In both cases, this procedure should return the updated table.
tab-fold tab proc seed
Calls proc for every index and node in tab, while
passing a seed value, whose initial value is seed.
That is, proc has a type of (index, node, seed) -> seed
.
Should return the last result of proc.
tab-empty? tab
Returns #t
if tab is empty, #f
otherwise.
You can omit or pass #f
to this procedure; then we use
tab-fold
to check if tab is empty, which can be expensive.
The default assumes eqv?
-hashtables, i.e. the
following procedures are used.
tab-make: (lambda () (make-hash-table 'eqv?)) tab-get: (lambda (tab k) (hash-table-get tab k #f)) tab-put!: (lambda (tab k v) (if v (hash-table-put! tab k v) (hash-table-delete! tab k)) tab) tab-fold: hash-table-fold tab-empty?: (lambda (tab) (zero? (hash-table-num-entries tab)))
The following example creates a trie using assoc list to manage children, while comparing string keys with case-insensitive way:
(make-trie list (cut assoc-ref <> <> #f char-ci=?) (lambda (t k v) (if v (assoc-set! t k v char-ci=?) (alist-delete! k t char-ci=?))) (lambda (t f s) (fold f s t)) null?)
It is important that tab-put!
must return an updated
table—by that, you can replace the table structure on the fly.
For example, you may design a table which uses assoc list when
the number of children are small, and then switches to a vector
(indexed by character code) once the number of children grows over
a certain threshold.
{data.trie
}
Construct a trie with the initial contents
kv …, where each kv is a pair of a key and a value.
Params are a list of arguments
which will be given to make-trie
to create the trie.
The following example creates a trie with two entries
and the default table procedures.
(trie '() '("foo" . a) '("bar" . b))
{data.trie
}
A convenient version of trie
when you only concern
the keys. Each value is the same as its key.
The following example creates a trie with two entries
and the default table procedures.
(trie-with-keys '() "foo" "bar")
{data.trie
}
Returns #t
if obj is a trie, or #f
otherwise.
{data.trie
}
Returns the number of entries in trie.
{data.trie
}
Returns #t
if trie contains an entry with key,
or returns #f
otherwise.
(let1 t (trie '() '("foo" . ok)) (list (trie-exists? t "foo") (trie-exists? t "fo") (trie-exists? t "bar"))) ⇒ '(#t #f #f)
{data.trie
}
Returns #t
if there’s at least one key in trie that is
not equal to seq but seq matches its prefix. Note that
seq may or may not a key of trie; see the example below.
(define t (trie '() '("foo" . ok) '("fo" . ok))) (trie-partial-key? t "f") ⇒ #t (trie-partial-key? t "fo") ⇒ #t (trie-partial-key? t "foo") ⇒ #f (trie-partial-key? t "bar") ⇒ #f
{data.trie
}
Returns the value associated with key in trie, if
such an entry exists. When there’s no entry for key,
if fallback is given, it is returned; otherwise,
an error is signaled.
{data.trie
}
Puts value associated to key into trie.
{data.trie
}
Works like the following code, except that the
lookup of entry in trie is done only once.
(let ((val (trie-get trie key fallback))) (trie-put! trie key (proc val)))
{data.trie
}
Removes an entry associated with key from trie.
If there’s no such entry, this procedure does nothing.
{data.trie
}
Makes each entry in trie to a pair (key . value)
and returns a list of pairs of all entries. The order of entries
are undefined.
{data.trie
}
Returns a list of all keys and values in trie, respectively.
The order of keys/values are undefined.
{data.trie
}
Creates a hash table with type ht-type (see Hashtables,
about hash table types), and populates it with every key and value
pair in trie.
{data.trie
}
Returns a pair of the key and its value, where the key is the
longest prefix of seq. If no such key is found, fallback
is returned if it is provided, or an error is thrown.
Do not confuse this with trie-common-prefix-*
procedures below;
In this procedure, the key is the prefix of the given argument.
In trie-common-prefix-*
procedures, the given argument is
the prefix of the keys.
(let1 t (make-trie) (trie-put! t "a" 'a) (trie-put! t "ab" 'ab) (trie-longest-match t "abc") ⇒ ("ab" . ab) (trie-longest-match t "acd") ⇒ ("a" . a) (trie-longest-match t "ab") ⇒ ("ab" . ab) (trie-longest-match t "zy") ⇒ error )
{data.trie
}
Gathers all entries whose keys begin with prefix;
trie-common-prefix
returns those entries in a list of
pairs (key . value)
; trie-common-prefix-keys
returns
a list of keys; and trie-common-prefix-values
returns a list
of values. The order of entries in a returned list is undefined.
If trie contains no entry whose key has prefix, an
empty list is returned.
Note that prefix matching doesn’t consider the type of sequence;
if trie has entries for "foo"
and (#\f #\o #\o)
,
(trie-common-prefix trie "foo")
will return both entries.
{data.trie
}
For each entry whose key begins with prefix,
calls proc with three arguments, the entry’s key,
its value, and the current seed value. Seed is
used for the first seed value, and the value proc
returns is used for the seed value of the next call of proc.
The last returned value from proc is returned from
trie-common-prefix-fold
.
The order of entries on which proc is called is undefined.
If trie contains no entry whose key has prefix,
proc is never called and seed is returned.
{data.trie
}
These are to trie-common-prefix-fold as map
and for-each
are to fold
; trie-common-prefix-map
calls
proc with key and value for matching entries and
gathers its result to a list; trie-common-prefix-for-each
also applies proc, but discards its results.