For Development HEAD DRAFTSearch (procedure/syntax/module):

12.25 data.trie - Trie

Module: data.trie

This module provides Trie, a dictionary-like data structure that maps keys to values, where a key is an arbitrary sequence. Internally it stores the data as a tree where each node corresponds to each element in the key sequence. Key lookup is O(n) where n is the length of the key, and not affected much by the number of total entries. Also it is easy to find a set of values whose keys share a common prefix.

The following example may give you the idea.

(define t (make-trie))   ;; create a trie

(trie-put! t "pho" 3)    ;; populate the trie
(trie-put! t "phone" 5)
(trie-put! t "phrase" 6)

(trie-get t "phone")  ⇒ 5  ;; lookup

(trie-common-prefix t "pho")       ;; common prefix search
  ⇒ (("phone" . 5) ("pho" . 3))
(trie-common-prefix-keys t "ph")
  ⇒ ("phone" "pho" "phrase")

Tries are frequently used with string keys, but you are not limited to do so; any sequence (see gauche.sequence - Sequence framework) can be a key. If the types of keys differ, they are treated as different keys:

(trie-put! t '(#\p #\h #\o) 8)  ;; different key from "pho"

Trie inherits <collection> and implements collection framework including the builder. So you can apply generic collection operations on a trie (see gauche.collection - Collection framework). When iterated, each element of a trie appears as a pair of a key and a value.

Class: <trie>

{data.trie} A class for Trie. No slots are intended for public. Use the following procedures to operate on tries.

This class also implements the dictionary interface (see Generic functions for dictionaries).

Function: make-trie :optional tab-make tab-get tab-put! tab-fold tab-empty?

{data.trie} Creates and returns an empty trie. The optional arguments are procedures to customize how the nodes of the internal tree are managed.

Each node can have a table to store its child nodes, indexed by an element of the key sequence (e.g. if the trie uses strings as keys, a node’s table is indexed by characters).

tab-make

A procedure with no arguments. When called, creates and returns an empty table for a node.

tab-get tab elt

Returns a child node indexed by elt, or returns #f if the table doesn’t have a child for elt.

tab-put! tab elt child-node

If child-node isn’t #f, stores a child-node with index elt. If child-node is #f, removes the entry with index elt. In both cases, this procedure should return the updated table.

tab-fold tab proc seed

Calls proc for every index and node in tab, while passing a seed value, whose initial value is seed. That is, proc has a type of (index, node, seed) -> seed. Should return the last result of proc.

tab-empty? tab

Returns #t if tab is empty, #f otherwise. You can omit or pass #f to this procedure; then we use tab-fold to check if tab is empty, which can be expensive.

The default assumes eqv?-hashtables, i.e. the following procedures are used.

tab-make: (lambda () (make-hash-table 'eqv?))

tab-get:  (lambda (tab k) (hash-table-get tab k #f))

tab-put!: (lambda (tab k v)
            (if v
              (hash-table-put! tab k v)
              (hash-table-delete! tab k))
            tab)

tab-fold: hash-table-fold

tab-empty?: (lambda (tab) (zero? (hash-table-num-entries tab)))

The following example creates a trie using assoc list to manage children, while comparing string keys with case-insensitive way:

(make-trie list
           (cut assoc-ref <> <> #f char-ci=?)
           (lambda (t k v)
             (if v
               (assoc-set! t k v char-ci=?)
               (alist-delete! k t char-ci=?)))
           (lambda (t f s) (fold f s t))
           null?)

It is important that tab-put! must return an updated table—by that, you can replace the table structure on the fly. For example, you may design a table which uses assoc list when the number of children are small, and then switches to a vector (indexed by character code) once the number of children grows over a certain threshold.

Function: trie params kv …

{data.trie} Construct a trie with the initial contents kv …, where each kv is a pair of a key and a value. Params are a list of arguments which will be given to make-trie to create the trie. The following example creates a trie with two entries and the default table procedures.

(trie '() '("foo" . a) '("bar" . b))
Function: trie-with-keys params key …

{data.trie} A convenient version of trie when you only concern the keys. Each value is the same as its key. The following example creates a trie with two entries and the default table procedures.

(trie-with-keys '() "foo" "bar")
Function: trie? obj

{data.trie} Returns #t if obj is a trie, or #f otherwise.

Function: trie-num-entries trie

{data.trie} Returns the number of entries in trie.

Function: trie-exists? trie key

{data.trie} Returns #t if trie contains an entry with key, or returns #f otherwise.

(let1 t (trie '() '("foo" . ok))
  (list (trie-exists? t "foo")
        (trie-exists? t "fo")
        (trie-exists? t "bar")))
  ⇒ '(#t #f #f)
Function: trie-partial-key? trie seq

{data.trie} Returns #t if there’s at least one key in trie that is not equal to seq but seq matches its prefix. Note that seq may or may not a key of trie; see the example below.

(define t (trie '() '("foo" . ok) '("fo" . ok)))

(trie-partial-key? t "f")    ⇒ #t
(trie-partial-key? t "fo")   ⇒ #t
(trie-partial-key? t "foo")  ⇒ #f
(trie-partial-key? t "bar")  ⇒ #f
Function: trie-get trie key :optional fallback

{data.trie} Returns the value associated with key in trie, if such an entry exists. When there’s no entry for key, if fallback is given, it is returned; otherwise, an error is signaled.

Function: trie-put! trie key value

{data.trie} Puts value associated to key into trie.

Function: trie-update! trie key proc :optional fallback

{data.trie} Works like the following code, except that the lookup of entry in trie is done only once.

(let ((val (trie-get trie key fallback)))
  (trie-put! trie key (proc val)))
Function: trie-delete! trie key

{data.trie} Removes an entry associated with key from trie. If there’s no such entry, this procedure does nothing.

Function: trie->list trie

{data.trie} Makes each entry in trie to a pair (key . value) and returns a list of pairs of all entries. The order of entries are undefined.

Function: trie-keys trie
Function: trie-values trie

{data.trie} Returns a list of all keys and values in trie, respectively. The order of keys/values are undefined.

Function: trie->hash-table trie ht-type

{data.trie} Creates a hash table with type ht-type (see Hashtables, about hash table types), and populates it with every key and value pair in trie.

Function: trie-longest-match trie seq :optional fallback

{data.trie} Returns a pair of the key and its value, where the key is the longest prefix of seq. If no such key is found, fallback is returned if it is provided, or an error is thrown.

Do not confuse this with trie-common-prefix-* procedures below; In this procedure, the key is the prefix of the given argument. In trie-common-prefix-* procedures, the given argument is the prefix of the keys.

(let1 t (make-trie)
  (trie-put! t "a"  'a)
  (trie-put! t "ab" 'ab)

  (trie-longest-match t "abc")  ⇒ ("ab" . ab)
  (trie-longest-match t "acd")  ⇒ ("a"  . a)
  (trie-longest-match t "ab")   ⇒ ("ab" . ab)
  (trie-longest-match t "zy")   ⇒ error
  )
Function: trie-common-prefix trie prefix
Function: trie-common-prefix-keys trie prefix
Function: trie-common-prefix-values trie prefix

{data.trie} Gathers all entries whose keys begin with prefix; trie-common-prefix returns those entries in a list of pairs (key . value); trie-common-prefix-keys returns a list of keys; and trie-common-prefix-values returns a list of values. The order of entries in a returned list is undefined. If trie contains no entry whose key has prefix, an empty list is returned.

Note that prefix matching doesn’t consider the type of sequence; if trie has entries for "foo" and (#\f #\o #\o), (trie-common-prefix trie "foo") will return both entries.

Function: trie-common-prefix-fold trie prefix proc seed

{data.trie} For each entry whose key begins with prefix, calls proc with three arguments, the entry’s key, its value, and the current seed value. Seed is used for the first seed value, and the value proc returns is used for the seed value of the next call of proc. The last returned value from proc is returned from trie-common-prefix-fold. The order of entries on which proc is called is undefined. If trie contains no entry whose key has prefix, proc is never called and seed is returned.

Function: trie-common-prefix-map trie prefix proc
Function: trie-common-prefix-for-each trie prefix proc

{data.trie} These are to trie-common-prefix-fold as map and for-each are to fold; trie-common-prefix-map calls proc with key and value for matching entries and gathers its result to a list; trie-common-prefix-for-each also applies proc, but discards its results.

Function: trie-fold trie proc seed
Function: trie-map trie proc
Function: trie-for-each trie proc

{data.trie} These procedures are like their common-prefix versions, but traverse entire trie instead.



For Development HEAD DRAFTSearch (procedure/syntax/module):
DRAFT