Cache (Gauche Users’ Reference)

12.14 `data.cache` - Cache

Module: data.cache ¶: A cache works similarly as a dictionary, associating keys to values, but its entries may disappear according to the policy of the cache algorithm. This module defines a common protocol for cache datatypes, and also provides several typical cache implementations.

Examples

Let’s start from simple examples to get the idea.

Suppose you want to read given files and you want to cache the frequently read ones. The following code defines a cached version of file->string:

(use data.cache)
(use file.util)

(define file->string/cached
  (let1 file-cache (make-lru-cache 10 :comparator string-comparator)
    (^[path] (cache-through! file-cache path file->string))))

The procedure closes a variable file-cache, which is an LRU (least recently used) cache that associates string pathnames to the file contents. The actual logic is in cache-through!, which first consults the cache if it has an entry for the path. If the cache has the entry, its value (the file content) is returned. If not, it calls file->string with the path to fetch the file content, register it to the cache, and return it. The capacity of the cache is set to 10 (the first argument of make-lru-cache), so when the 11th file is read, the least recently used file will be evicted from the cache.

The effect of the cache isn’t very visible in the above example. You can insert some print stubs to see the cache is actually in action, as the following example. Try read various files using file->string/cached.

(define file->string/cached
  (let1 file-cache (make-lru-cache 10 :comparator string-comparator)
    (^[path]
      (print #"file->string/cached called on ~path")
      (cache-through! file-cache path
                      (^[path]
                        (print #"cache miss.  fetching ~path")
                        (file->string path))))))

Caveat: A cache itself isn’t MT-safe. If you are using it in multithreaded programs, you have to wrap it with an atom (see Synchronization primitives):

(use data.cache)
(use file.util)
(use gauche.threads)

(define file->string/cached
  (let1 file-cache (atom (make-lru-cache 10 :comparator string-comparator))
    (^[path]
      (atomic file-cache (cut cache-through! <> path file->string)))))

Common properties of caches

A cache of any kind has a comparator and a storage. The comparator is used to compare keys; in the example above, we use string-comparator to compare string pathnames (see Basic comparators, for more about comparators).

The storage is a dictionary that maps keys to internal structures of the cache. By default, a hashtable is created automatically using the given comparator (or, if a comparator is omitted, using default-comparator). The comparator must have the hash function.

Alternatively, you can give a pre-filled dictionary (copied from another instance of the same kind of cache) to start cache with some data already in it. Note that what the cache keeps in the dictionary totally depends on the cache algorithm, so you can’t just pass a random dictionary; it has to be created by the same kind of cache. If you pass in the storage, the comparator is taken from it.

Thus, the cache constructors uniformly take keyword arguments comparator and storage; you can specify either one, or omit both to use the defaults.

Predefined caches

For the storage and comparator keyword arguments, see above.

Function: make-fifo-cache capacity :key storage comparator ¶: {data.cache} Creates and returns a FIFO (first-in, first-out) cache that can hold up to capacity entries. If the number of entries exceeds capacity, the oldest entry is removed.

Function: make-lru-cache capacity :key storage comparator ¶: {data.cache} Creates and returns an LRU (least recently used) cache that can hold up to capacity entries. If the number of entries exceeds capacity, the least recently used entry is removed.

Function: make-ttl-cache timeout :key storage comparator timestamper ¶

{data.cache} Creates and returns a TTL (time to live) cache with the timeout value timeout. Each entry is timestamped when it’s inserted, and it is removed when the current time passes timeout unit from the timestamp. The actual entry removal is done when the cache is accessed.

By default, the Unix system time (seconds from Epoch) is used as a timestamp, and timeout is in seconds. It may not be fine-grained enough if you add multiple entries in shorter intervals than seconds. You can customize it by giving a thunk to timestamper; the thunk is called to obtain a timestamp, which can be any monotonically increasing real number (it doesn’t need to be associated with physical time). If you give timestamper, the unit of timeout value should be the same as whatever timestamper returns.

Function: make-ttlr-cache timeout :key storage comparator timestamper ¶

{data.cache} A variation of TTL cache, but the entry’s timestamp is updated (refreshed) whenever the entry is read. Hence we call it TTL with refresh (TTLR). But you can also think it as a variation of LRU cache with timeout.

The unit of timeout, and the role of timestamper argument, are the same as make-ttl-cache.

Common operations of caches

The following APIs are for the users of a cache.

Function: cache-lookup! cache key :optional default ¶

{data.cache} Look for an entry with key in cache, and returns its value if it exists. If there’s no entry, the procedure returns default if it is provided, or throws an error otherwise.

Some types of cache algorithms update cache by this operation (e.g. access timestamp), hence the bang is in the name.

Function: cache-through! cache key value-fn ¶: {data.cache} Look for an entry with key in cache, and returns its value if it exists. If there’s no entry, a procedure value-fn is called with key as the argument, and its return value is inserted into cache and also returned. The insertion may cause eviction of exising entries.

Generic function: cache-write! cache key value ¶

{data.cache} This inserts association of key and value into cache. If there’s already an entry with key, it is overwritten. Otherwise a new entry is created.

The same effect can be achieved by calling cache-evict! then cache-through!, but cache algorithms may provide efficient way through this method.

Generic function: cache-evict! cache key ¶: {data.cache} Removes an entry with key from cache, if it exists.

Generic function: cache-clear! cache ¶: {data.cache} Removes all entries from cache.

Implementing a cache algorithm

Each cache algorithm must define a class inheriting <cache>, and implement the following two essential methods. The higher-level API calls them.

Generic function: cache-check! cache key ¶: {data.cache} Looks for an entry with key in cache. If it exists, returns a pair of key and the associated value. Otherwise, returns #f. It may update the cache, for example, the timestamp of the entry for being read.

Generic function: cache-register! cache key value ¶: {data.cache} Add an entry with key and associated value into cache. This is called after key is confirmed not being in cache.

Additionally, the implementation should consider the following points.

The initialize method must call next-method first, which sets up the comparator and storage slots. You should check if storage has pre-filled entries, and if so, set up other internal structures appropriately.
The default methods of cache-evict! and cache-clear! only takes care of the storage of the cache. You should implement them if your auxiliary structure needs to be taken care of.
The default method of cache-write! is just cache-evict! followed by cache-register!. You may provide an alternative method if you can do it more efficiently, which is often the case.

There are several procedures that help implementing cache subclasses:

Function: cache-comparator cache ¶
Function: cache-storage cache ¶: {data.cache} Returns the comparator and the storage of the cache, respectively.

Typical caches may be constructed with a storage (dictionary) and a queue, where the storage maps keys to (<n> . <value>), and queues holds (<key> . <n>), <n> being a number (timestamp, counter, etc.) Here are some common operations work on this queue-and-dictionary scheme:

Function: cache-populate-queue! queue storage ¶: {data.cache} You can call this in the initialize method to set up the queue. This procedure walks storage (a dictionary that maps keys to (<n> . <value>)) to construct (<key> . <n>) pairs, sorts it in increasing order of <n>, and pushes them into the queue.

Function: cache-compact-queue! queue storage ¶: {data.cache} The queue may contain multiple pairs with the same key. Sometimes the queue gets to have too many duplicated entries (e.g. the same entry is read repeatedly, and you push the read timestamp to the queue for every read). This procedure scans the queue and removes duplicated entries but the up-to-date one. After this operation, the length of the queue and the number of entries in the storage should match.

Function: cache-renumber-entries! queue storage ¶

{data.cache} This procedure renumbers <n>s in the queue and the storage starting from 0, without changing their order, and returns the maximum <n>. The duplicated entries in the queue is removed as in cache-compact-queue!.

When you’re using monotonically increasing counter for <n> and you don’t want <n> to get too big (i.e. bignums), you can call this procedure occasionally to keep <n>’s in a reasonable range.

12.14 data.cache - Cache